Discrete dependent variable models are models for which the dependent variable Y takes discrete values only. There are three types of discrete dependent variable models:
As the term "count" indicates, in the case of count data models the dependent variable Y represents a quantity. For example, let Y be the number of accidents during rush hour at a particular intersection. Thus, in the case of count data the dependent variable Y is a count of something.
In EasyReg International you can choose from three count data models, depending on whether the dependent variable Y has an explicit finite upper bound or not.
These models are explained in LIMDEP.PDF.
Now suppose that the values of Y represent an ordering of items. For example, let Y be the outcome of a taste test, coded like
In this case Y is not a quantity, but nevertheless a larger value of Y means more, or better. In this case there exists a known smallest natural number m such that
similar to the Binomial Logit model, but now the values of Y represent a ranking rather than a count. The upper bound m is determined by EasyReg itself.
This type of data is usually modeled via a latent variable model:
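where the error term $\varepsilon$ is independent of X and has distribution function F(.). The latent dependent variable $Y^{*}$ is not observable, but is related to the observed Y in the following way:

$$P[Y = 0 \mid X] = P[Y^{*} \in (-\infty, 0] \mid X] = P[\varepsilon \in (-\infty, -\alpha - \beta'X] \mid X] = F(-\alpha - \beta'X)$$

$$P[Y \le 1 \mid X] = P[Y^{*} \in (-\infty, \mu_{1}] \mid X] = P[\varepsilon \in (-\infty, \mu_{1} - \alpha - \beta'X] \mid X] = F(\mu_{1} - \alpha - \beta'X)$$

$$\vdots$$

$$P[Y \le m-1 \mid X] = P[Y^{*} \in (-\infty, \mu_{m-1}] \mid X] = P[\varepsilon \in (-\infty, \mu_{m-1} - \alpha - \beta'X] \mid X] = F(\mu_{m-1} - \alpha - \beta'X)$$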
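where $0 < \mu_{1} < \mu_{2} < \dots < \mu_{m-1}$.
The latter conditions can be easily enforced by reparametrizing the $\mu$'s as follows: $\mu_{j} = \sum_{1 \le i \le j} \exp(\gamma_{i})$, $j = 1, \dots, m-1$.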
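Thus the model for Y now becomes:

$$P[Y = 0 \mid X] = F(-\alpha - \beta'X)$$

$$P[Y = 1 \mid X] = F(\exp(\gamma_{1}) - \alpha - \beta'X) - F(-\alpha - \beta'X)$$

$$\vdots$$

$$P[Y = j \mid X] = F\Big(\sum_{1 \le i \le j} \exp(\gamma_{i}) - \alpha - \beta'X\Big) - F\Big(\sum_{1 \le i \le j-1} \exp(\gamma_{i}) - \alpha - \beta'X\Big), \quad j = 2, \dots, m-1$$

$$\vdots$$

$$P[Y = m \mid X] = 1 - F\Big(\sum_{1 \le i \le m-1} \exp(\gamma_{i}) - \alpha - \beta'X\Big)$$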
In EasyReg you have only one option for F, namely the logistic distribution function $F(x) = 1/[1 + \exp(-x)]$.
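The ordered logit probabilities above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not EasyReg's own code: the function names `logistic_cdf` and `ordered_logit_probs` and the example parameter values are hypothetical, and the thresholds are built from the $\gamma$'s as cumulative sums of $\exp(\gamma_i)$.

```python
import math


def logistic_cdf(x):
    """Logistic distribution function F(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def ordered_logit_probs(alpha, beta, gammas, x):
    """Return [P(Y=0|X), ..., P(Y=m|X)] for the ordered logit model.

    gammas holds gamma_1, ..., gamma_{m-1}; the j-th threshold is the
    sum of exp(gamma_i) for i <= j, so the thresholds are increasing.
    """
    index = alpha + sum(b * xi for b, xi in zip(beta, x))
    # Cumulative probabilities P(Y <= j | X); the threshold for j = 0 is 0.
    threshold = 0.0
    cumulative = [logistic_cdf(threshold - index)]
    for g in gammas:
        threshold += math.exp(g)
        cumulative.append(logistic_cdf(threshold - index))
    cumulative.append(1.0)  # P(Y <= m | X) = 1
    # Differences of adjacent cumulative probabilities give P(Y = j | X).
    probs = [cumulative[0]]
    for j in range(1, len(cumulative)):
        probs.append(cumulative[j] - cumulative[j - 1])
    return probs


# Hypothetical parameter values, chosen only for illustration.
example = ordered_logit_probs(alpha=0.5, beta=[1.0, -0.5],
                              gammas=[0.0, 0.3], x=[1.0, 2.0])
print(example, sum(example))
```

Because each threshold increment is $\exp(\gamma_i) > 0$, every probability is positive for any real values of the $\gamma$'s, which is the point of the reparametrization.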
If the dependent variable Y is dichotomous, i.e., $P[Y \in \{0,1\}] = 1$, then it follows from the above discussion of the ordered logit model and the symmetry of the distribution function F that

$$P[Y = 0 \mid X] = F(-\alpha - \beta'X) = 1 - F(\alpha + \beta'X)$$

hence

$$P[Y = 1 \mid X] = F(\alpha + \beta'X).$$
In EasyReg you have two options for F, namely the logistic distribution function $F(x) = 1/[1 + \exp(-x)]$ (the Logit model) and the distribution function of the standard normal distribution (the Probit model). Read PROBIT_LOGIT.PDF for a comparison of Probit and Logit analysis.
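To make the two choices of F concrete, here is a minimal Python sketch that evaluates $P[Y = 1 \mid X] = F(\alpha + \beta'X)$ under both the logistic and the standard normal distribution function. The helper names and the parameter values are assumptions made for illustration; they are not taken from EasyReg.

```python
import math


def logit_cdf(x):
    """Logistic distribution function, used by the Logit model."""
    return 1.0 / (1.0 + math.exp(-x))


def probit_cdf(x):
    """Standard normal distribution function, used by the Probit model."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prob_y_equals_1(alpha, beta, x, cdf):
    """P[Y = 1 | X] = F(alpha + beta'X) for the chosen distribution function."""
    return cdf(alpha + sum(b * xi for b, xi in zip(beta, x)))


# Hypothetical parameters and regressors, for illustration only.
alpha, beta, x = -1.0, [0.8, 0.2], [1.5, 2.0]
print(prob_y_equals_1(alpha, beta, x, logit_cdf))
print(prob_y_equals_1(alpha, beta, x, probit_cdf))
```

Both functions are symmetric, F(-x) = 1 - F(x), which is the property used above to go from P[Y = 0 | X] to P[Y = 1 | X].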
The case Y = 1 usually stands for an attribute you are interested in. For example, let Y = 1 mean that "an applicant for a mortgage loan will default on the loan", and let X be the vector of characteristics of the applicant, such as income, family situation, credit rating, etcetera. Given a data set of people who have gotten a mortgage loan, together with their payment history and their characteristics X, you can estimate the parameters $\alpha$ and $\beta$, and then use
Remark: The Logit model in EasyReg is defined as
However, some statistical packages, in particular SAS, define the logit model as
Both are equivalent, of course, as $\gamma = -\alpha$ and $\delta = -\beta$,
but (1) makes more sense than (2) because (1) is monotonic increasing in
Now suppose that you have a vector of dependent dummy variables. For example, suppose you hold a survey in which you ask the respondents which brand of dishwasher detergent they use:
The dependent dummy variables involved can be recoded into a single variable Y, for example, let:
The standard model for this case is the multinomial logit model. For
where
In the case m = 1 this model reduces to the Logit model.
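A short Python sketch of the multinomial logit probabilities may help here. It uses the form printed in EasyReg's own output further down in this tour, P(y=0|x) = 1/[1+exp(b(1)'x)+..+exp(b(m)'x)] and P(y=j|x) = exp(b(j)'x)P(y=0|x), written with separate intercepts $\alpha_j$ and slopes $\beta_j$. The function name and the example numbers are hypothetical.

```python
import math


def multinomial_logit_probs(alphas, betas, x):
    """Return [P(Y=0|X), P(Y=1|X), ..., P(Y=m|X)].

    alphas[j-1] and betas[j-1] are the intercept and slope vector of
    category j = 1, ..., m; category 0 is the base category.
    """
    weights = []
    for a, b_vec in zip(alphas, betas):
        score = a + sum(b * xi for b, xi in zip(b_vec, x))
        weights.append(math.exp(score))
    p0 = 1.0 / (1.0 + sum(weights))
    return [p0] + [w * p0 for w in weights]


# Hypothetical parameters for m = 2 categories besides the base category.
probs = multinomial_logit_probs(alphas=[0.2, -0.4],
                                betas=[[0.5, 0.1], [-0.3, 0.7]],
                                x=[1.0, 2.0])
print(probs, sum(probs))

# With m = 1 the model collapses to the binary Logit model.
binary = multinomial_logit_probs([0.2], [[0.5, 0.1]], [1.0, 2.0])
print(binary[1], 1.0 / (1.0 + math.exp(-(0.2 + 0.5 * 1.0 + 0.1 * 2.0))))
```

The last two printed numbers agree, which illustrates the reduction to the Logit model for m = 1.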
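Note that

$$\frac{\partial P[Y = j \mid X]}{\partial X} = \exp(\alpha_{j} + \beta_{j}'X)\, P[Y = 0 \mid X]\, \beta_{j} - \exp(\alpha_{j} + \beta_{j}'X)\, (P[Y = 0 \mid X])^{2} \Big[\sum_{1 \le i \le m} \exp(\alpha_{i} + \beta_{i}'X)\, \beta_{i}\Big]$$

so that the direction of the effect of X on $P[Y = j \mid X]$ depends on all the parameters. On the other hand,

$$X'\frac{\partial P[Y = j \mid X]}{\partial \beta_{j}} = \exp(\alpha_{j} + \beta_{j}'X)\, P[Y = 0 \mid X]\, X'X - \big(\exp(\alpha_{j} + \beta_{j}'X)\, P[Y = 0 \mid X]\big)^{2} X'X$$

$$= \{P[Y = j \mid X] - (P[Y = j \mid X])^{2}\}\, X'X > 0.$$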
All the models discussed above are estimated by maximum likelihood. Moreover, in all cases the log-likelihood function is unimodal in the parameters. Therefore, the log-likelihood can be maximized by Newton iteration.
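As an illustration of Newton iteration on a log-likelihood, the following Python sketch fits a binary Logit model with an intercept and a single regressor. It is a simplified stand-in, not EasyReg's implementation: the function name, the stopping rule, and the toy data are all assumptions.

```python
import math


def logistic_cdf(z):
    """Logistic distribution function F(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def newton_logit(xs, ys, tol=1e-10, max_iter=50):
    """Maximize the Logit log-likelihood for P[Y=1|X] = F(alpha + beta*X).

    Returns (alpha, beta, number_of_iterations). X is a scalar here so
    that the 2x2 Newton system can be solved in closed form.
    """
    alpha, beta = 0.0, 0.0
    for iteration in range(1, max_iter + 1):
        # Gradient (g0, g1) and negative Hessian entries (h00, h01, h11).
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = logistic_cdf(alpha + beta * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve (negative Hessian) * step = gradient.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        alpha += step0
        beta += step1
        if max(abs(step0), abs(step1)) < tol:
            return alpha, beta, iteration
    raise RuntimeError("Newton iteration did not converge")


# Toy data (not from the EasyReg database); the outcomes overlap in x,
# so the maximum likelihood estimate exists.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
print(newton_logit(xs, ys))
```

At the returned estimates both components of the gradient are numerically zero, which is the first-order condition for the maximum.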
Download the following variables from the EasyReg database:
Open "Menu > Single equation models > Discrete dependent variable models", choose these variables, choose "Function level" as the dependent variable, and include an intercept.
EasyReg checks whether the dependent variable is discrete or not. If not, you will not be able to continue.
EasyReg will automatically recode the dependent variable as Y = Function level - 1.
Since the dependent variable represents an ordering, an ordered logit model is appropriate.
Click "Continue". Then the "What to do next?" window appears.
where X is the vector of explanatory variables, hence
The residuals are the estimates of U for each observation in the sample. If you use the option "Write residuals to the input file", the estimates of the conditional expectation g(X) can be constructed via Menu > Input > Transform variables, and the linear combination
The output involved (without the asymptotic variance matrix) is given below.
Ordered Logit model:
Dependent variable: Y = function level (1-9)

Characteristics: function level (1-9)
First observation = 1
Last observation = 2000
Number of usable observations: 2000
Minimum value: 1.0000000E+000
Maximum value: 9.0000000E+000
Sample mean: 3.7775000E+000
This variable is integer valued.
A discrete dependent variable model is suitable.

X variables:
X(1) = education level (1-7)
X(2) = male/female (1/2)
X(3) = age in years
X(4) = experience in years
X(5) = 1

Model:
P(Y-1=0|x) = F(-b'x)
P(Y-1=1|x) = F(-b'x+exp(b(6))) - F(-b'x)
For j=2,..,m-1,
P(Y-1=j|x) = F(-b'x+exp(b(6))+..+exp(b(5+j))) - F(-b'x+exp(b(6))+..+exp(b(4+j)))
and
P(Y-1=m|x) = 1 - F(-b'x+exp(b(6))+..+exp(b(4+m)))
where m = 8, b'x = b(1)x(1)+..+b(5)x(5), and F(u) = 1/[1+EXP(-u)].

Newton iteration successfully completed after 15 iterations
Last absolute parameter change = 0.0000
Last percentage change of the likelihood = 0.1018

Maximum likelihood estimation results:

Par.    ML estimate   s.e.       t-value   [p-value]   Variable
b(1)    1.015007      0.034594   29.34     [0.00000]   education level (1-7)
b(2)    0.315749      0.115353   2.74      [0.00620]   male/female (1/2)
b(3)    0.028351      0.004702   6.03      [0.00000]   age in years
b(4)    0.024434      0.006309   3.87      [0.00011]   experience in years
b(5)    0.373515      0.267649   1.40      [0.16285]   1
b(6)    0.964437      0.041202   23.41     [0.00000]
b(7)    0.825036      0.034042   24.24     [0.00000]
b(8)    2.233505      0.187036   11.94     [0.00000]
b(9)    2.169395      0.184080   11.79     [0.00000]
b(10)   0.383856      0.055109   6.97      [0.00000]
b(11)   0.140443      0.088979   1.58      [0.11448]
b(12)   0.587651      0.082432   7.13      [0.00000]

[The two-sided p-values are based on the normal approximation]

Log likelihood: -1.27523282560E+003
Sample size (n): 2000
Information criteria:
Akaike: 1.287232826
Hannan-Quinn: 1.299572029
Schwarz: 1.320838240
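The three information criteria in this output can be reproduced from the log-likelihood, the sample size n = 2000 and the number of parameters k = 12. The Python sketch below takes the log-likelihood as -1275.23282560 (the log-likelihood of a discrete model cannot be positive). The formulas are an inference from matching the printed values, not something taken from the EasyReg documentation, and the function name is hypothetical.

```python
import math


def information_criteria(log_likelihood, n_obs, n_params):
    """Per-observation Akaike, Hannan-Quinn and Schwarz criteria.

    Assumed forms (inferred from the printed output):
      Akaike       = (-2 lnL + 2 k) / n
      Hannan-Quinn = (-2 lnL + 2 k ln(ln n)) / n
      Schwarz      = (-2 lnL + k ln n) / n
    """
    base = -2.0 * log_likelihood / n_obs
    return {
        "akaike": base + 2.0 * n_params / n_obs,
        "hannan_quinn": base + 2.0 * n_params * math.log(math.log(n_obs)) / n_obs,
        "schwarz": base + n_params * math.log(n_obs) / n_obs,
    }


criteria = information_criteria(-1.27523282560e3, n_obs=2000, n_params=12)
for name, value in criteria.items():
    print(name, round(value, 9))
```

Under these assumed formulas the computed values agree with the printed Akaike, Hannan-Quinn and Schwarz figures to the digits shown.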
The parameters
where f is the density corresponding to F. Therefore, the interpretation of the effect of X on
Now estimate a multinomial logit model using the same data. Of course, the multinomial logit model does not take the ordering of Y into account, so that this model is not appropriate. Nevertheless, in doing so something interesting will happen:
Click "Yes":
The full text in this window is listed below.
What to do if you get the message that some X variables are constant for some values of Y in the multinomial logit model?

As an example, try to estimate the following multinomial logit model, using the cross-section data of Dutch wage earners in the EasyReg database:

Model variables:
y = function level (1-9)
x(1) = education level (1-7)
x(2) = male/female (1/2)
x(3) = age in years
x(4) = experience in years
x(5) = 1

Available observations: t = 1 -> 2000 = Chosen.

Model:
P(y-1=0|x) = 1/[1+exp(b(1)'x)+ ..+ exp(b(m)'x)]
P(y-1=j|x) = exp(b(j)'x)P(y=0|x), j=1,..,m,
where m = 8

Then you will get the message:

X(2) and X(5) are constant for Y = 5
Program aborted!

The problem is that there are no females in function level 5, so that X(2) = 1 for all workers in function level 5. The variable X(2) is therefore perfectly multicollinear with the intercept X(5)! The solution is to merge function level 5 with other function levels, for example with function levels 4 and 6, yielding a new dependent variable with values ranging from 1 to 7. In order to do this, we have to transform the function levels to dummy variables.

Create the following dummy variables, using the I(x=a) option in the transformation menu:

I[function level (1-9)=1]
I[function level (1-9)=2]
I[function level (1-9)=3]
I[function level (1-9)=4]
I[function level (1-9)=5]
I[function level (1-9)=6]
I[function level (1-9)=7]
I[function level (1-9)=8]
I[function level (1-9)=9]

where I(.) is the indicator function: I(true) = 1, I(false) = 0.

Next, create the following new dependent variable, using the linear combination option in the transformation menu:

New function level (1-7) =
1 x I[function level = 1]
+2 x I[function level = 2]
+3 x I[function level = 3]
+4 x I[function level = 4]
+4 x I[function level = 5]
+4 x I[function level = 6]
+5 x I[function level = 7]
+6 x I[function level = 8]
+7 x I[function level = 9]

Using this new dependent variable, the multinomial logit estimation results are:

Model variables:
y = New function level (1-7)
x(1) = education level (1-7)
x(2) = male/female (1/2)
x(3) = age in years
x(4) = experience in years
x(5) = 1

Available observations: t = 1 -> 2000 = Chosen.

Model:
P(y-1=0|x) = 1/[1+exp(b(1)'x)+ ..+ exp(b(m)'x)]
P(y-1=j|x) = exp(b(j)'x)P(y=0|x), j=1,..,m,
where m = 6

Maximum likelihood estimation results:

Variable                        ML estimate of b(.)      (t-value)   [p-value]
x(1)=education level (1-7)      b(1,1) = 0.6617117       (5.46)      [p-value = 0.00000]
x(2)=male/female (1/2)          b(1,2) = 0.2669086       (1.02)      [p-value = 0.30564]
x(3)=age in years               b(1,3) = 0.0161398       (1.41)      [p-value = 0.15763]
x(4)=experience in years        b(1,4) = 0.0489719       (2.35)      [p-value = 0.01897]
x(5)=1                          b(1,5) = 0.8470765       (1.37)      [p-value = 0.17029]

x(1)=education level (1-7)      b(2,1) = 1.2491677       (10.15)     [p-value = 0.00000]
x(2)=male/female (1/2)          b(2,2) = 0.2660622       (1.00)      [p-value = 0.31854]
x(3)=age in years               b(2,3) = 0.0132317       (1.13)      [p-value = 0.25919]
x(4)=experience in years        b(2,4) = 0.0968024       (4.62)      [p-value = 0.00000]
x(5)=1                          b(2,5) = 0.9266000       (1.45)      [p-value = 0.14587]

x(1)=education level (1-7)      b(3,1) = 1.9351154       (14.47)     [p-value = 0.00000]
x(2)=male/female (1/2)          b(3,2) = 0.8985736       (2.80)      [p-value = 0.00510]
x(3)=age in years               b(3,3) = 0.0201374       (1.52)      [p-value = 0.12764]
x(4)=experience in years        b(3,4) = 0.0994764       (4.48)      [p-value = 0.00001]
x(5)=1                          b(3,5) = 4.5914640       (6.18)      [p-value = 0.00000]

x(1)=education level (1-7)      b(4,1) = 2.5490455       (16.76)     [p-value = 0.00000]
x(2)=male/female (1/2)          b(4,2) = 1.3678589       (2.75)      [p-value = 0.00598]
x(3)=age in years               b(4,3) = 0.0453044       (2.60)      [p-value = 0.00939]
x(4)=experience in years        b(4,4) = 0.0929477       (3.57)      [p-value = 0.00035]
x(5)=1                          b(4,5) = 8.9553810       (8.57)      [p-value = 0.00000]

x(1)=education level (1-7)      b(5,1) = 2.8232526       (18.12)     [p-value = 0.00000]
x(2)=male/female (1/2)          b(5,2) = 0.8957252       (1.93)      [p-value = 0.05404]
x(3)=age in years               b(5,3) = 0.0847234       (4.87)      [p-value = 0.00000]
x(4)=experience in years        b(5,4) = 0.0526355       (1.98)      [p-value = 0.04734]
x(5)=1                          b(5,5) = 12.0151756      (11.05)     [p-value = 0.00000]

x(1)=education level (1-7)      b(6,1) = 2.4593404       (14.13)     [p-value = 0.00000]
x(2)=male/female (1/2)          b(6,2) = 2.3332418       (2.20)      [p-value = 0.02763]
x(3)=age in years               b(6,3) = 0.0763263       (3.54)      [p-value = 0.00041]
x(4)=experience in years        b(6,4) = 0.0456722       (1.45)      [p-value = 0.14763]
x(5)=1                          b(6,5) = 9.2689909       (5.73)      [p-value = 0.00000]

[The two-sided p-values are based on the normal approximation]

Log likelihood: -2.54992886753E+003
Sample size (n): 2000
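The recoding described in this window (dummy variables followed by a linear combination) can be mimicked outside EasyReg. The Python sketch below merges function levels 4, 5 and 6 into a single level, using the same weights as the linear combination above. The names `indicator`, `MERGE_WEIGHTS` and `new_function_level` are hypothetical, and the sample levels are made up.

```python
def indicator(condition):
    """Indicator function: I(true) = 1, I(false) = 0."""
    return 1 if condition else 0


# Weight attached to each dummy I[function level = k], as in the
# linear combination above: levels 4, 5 and 6 all map to 4.
MERGE_WEIGHTS = {1: 1, 2: 2, 3: 3, 4: 4, 5: 4, 6: 4, 7: 5, 8: 6, 9: 7}


def new_function_level(level):
    """Linear combination of the nine dummies for one observation."""
    return sum(weight * indicator(level == k)
               for k, weight in MERGE_WEIGHTS.items())


# Made-up function levels for a handful of workers.
levels = [1, 4, 5, 6, 9, 7]
print([new_function_level(level) for level in levels])
```

Exactly one dummy equals 1 for each observation, so the sum picks out the weight of that observation's original function level.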
Note that the parameter