Guided tour on discrete dependent variable models

Discrete dependent variable models are models for which the dependent variable Y takes discrete values only. There are three types of discrete dependent variable models:

Count data models

As the term "count" indicates, in the case of count data models the dependent variable Y represents a quantity. For example, let Y be the number of accidents during rush hour at a particular intersection. Thus, in the case of count data the dependent variable Y is a count of something.

In EasyReg International you can choose from three count data models, depending on whether the dependent variable Y has an explicit finite upper bound or not.

In the Binomial Logit case there exists a known smallest natural number m such that P[Y ∈ {0,1,2,...,m}] = 1, which will be determined by EasyReg itself. In the other two cases Y does not have an explicit upper bound. Note that the negative binomial distribution depends on an integer-valued parameter m ≥ 1, which has to be specified.

These models are explained in LIMDEP.PDF.

Ordered probability models

Now suppose that the values of Y represent an ordering of items. For example, let Y be the outcome of a taste test, coded like

In this case Y is not a quantity, but a larger value of Y nevertheless means more, or better. There again exists a known smallest natural number m such that

P[Y ∈ {0,1,2,...,m}] = 1,

similar to the Binomial Logit model, but now the values of Y represent a ranking rather than a count. The upper bound m is determined by EasyReg itself.

This type of data is usually modeled via a latent variable model:

Y* = α + β'X + ε,

where the error term ε is independent of X and has distribution function F(.). The latent dependent variable Y* is not observable, but is related to the observed Y in the following way:

P[Y = 0 | X] = P[Y* ∈ (-∞, 0] | X] = P[ε ∈ (-∞, -α - β'X] | X] = F(-α - β'X)

P[Y ≤ 1 | X] = P[Y* ∈ (-∞, μ1] | X] = P[ε ∈ (-∞, μ1 - α - β'X] | X] = F(μ1 - α - β'X)

........................

P[Y ≤ m-1 | X] = P[Y* ∈ (-∞, μm-1] | X] = P[ε ∈ (-∞, μm-1 - α - β'X] | X] = F(μm-1 - α - β'X)

where

0 < μ1 < μ2 < ..... < μm-1

The latter conditions can easily be enforced by reparametrizing the μ's as follows:

μj = Σ1≤i≤j exp(γi), j = 1,2,...,m-1.

Thus the model for Y now becomes:

P[Y = 0 | X] = F(-α - β'X)

P[Y = 1 | X] = F(exp(γ1) - α - β'X) - F(-α - β'X)

........................

P[Y = j | X] = F(Σ1≤i≤j exp(γi) - α - β'X) - F(Σ1≤i≤j-1 exp(γi) - α - β'X),
j = 2,...,m-1

........................

P[Y = m | X] = 1 - F(Σ1≤i≤m-1 exp(γi) - α - β'X)

In EasyReg you have only one option for F, namely the Logistic distribution function F(x) = 1/[1 + exp(-x)]. The corresponding model for Y is therefore called the ordered logit model.
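As a concrete illustration, the ordered logit probabilities above can be computed in a few lines. This is a minimal sketch with made-up parameter values (α, β and the γ's below are not taken from any EasyReg output); it also shows how the cumulative-exponential reparametrization keeps the thresholds ordered automatically.

```python
import numpy as np

def ordered_logit_probs(alpha, beta, gammas, x):
    """P[Y = j | x] for j = 0,...,m in the ordered logit model above.
    Thresholds are cumulative sums of exp(gamma_i), which automatically
    enforces 0 < mu_1 < ... < mu_{m-1}."""
    F = lambda z: 1.0 / (1.0 + np.exp(-z))                   # logistic cdf
    mu = np.concatenate(([0.0], np.cumsum(np.exp(gammas))))  # 0, mu_1, ..., mu_{m-1}
    index = alpha + beta @ x                                 # alpha + beta'x
    cdf = np.concatenate((F(mu - index), [1.0]))             # P[Y <= j | x], j = 0,...,m
    return np.diff(np.concatenate(([0.0], cdf)))             # successive differences

# made-up parameter values, purely for illustration
p = ordered_logit_probs(alpha=0.5, beta=np.array([1.0, -0.3]),
                        gammas=np.array([0.2, 0.4]), x=np.array([1.0, 2.0]))
print(p.round(4), p.sum())   # here m = 3, so four probabilities summing to 1
```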

Qualitative response models

Logit and Probit models

If the dependent variable Y is dichotomous, i.e., P[Y ∈ {0,1}] = 1, then it follows from the above discussion of the ordered logit model and the symmetry of the distribution function F that

P[Y = 0 | X] = F(-α - β'X) = 1 - F(α + β'X),

hence

P[Y = 1 | X] = F(α + β'X).

In EasyReg you have two options for F, namely the Logistic distribution function F(x) = 1/[1 + exp(-x)] (the Logit model) and the distribution function of the standard normal distribution (the Probit model). Read PROBIT_LOGIT.PDF for a comparison of Probit and Logit analysis.

The case Y = 1 usually stands for an attribute you are interested in. For example, let Y = 1 mean that "an applicant for a mortgage loan will default on the loan", and let X be the vector of characteristics of the applicant, such as income, family situation, credit rating, etcetera. Given a data set of people who have received a mortgage loan, together with their payment history and their characteristics X, you can estimate the parameters α and β, and then use F(α + β'X) to predict the probability that an applicant with characteristics X will default on the loan.

Remark: The Logit model in EasyReg is defined as

(1) P[Y = 1 | X] = 1 / [1 + exp(-α - β'X)]

However, some statistical packages, in particular SAS, define the logit model as

(2) P[Y = 1 | X] = 1 / [1 + exp(γ + δ'X)]

Both are equivalent, of course, with γ = -α and δ = -β, but (1) makes more sense than (2) because (1) is monotonically increasing in β'X. Thus, if you compare the Logit results of EasyReg with the Logit results of another software package and you find that the parameter estimates are the same in absolute value but with opposite signs, then the other software package uses model (2).
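A quick numeric check of this equivalence (the parameter values below are arbitrary, chosen only for illustration):

```python
import math

def logit_easyreg(a, b, x):      # model (1): P = 1/[1 + exp(-a - b*x)]
    return 1.0 / (1.0 + math.exp(-a - b * x))

def logit_sas_style(g, d, x):    # model (2): P = 1/[1 + exp(g + d*x)]
    return 1.0 / (1.0 + math.exp(g + d * x))

a, b, x = 0.7, -1.2, 2.0
# with g = -a and d = -b the two parameterizations give the same probability
assert abs(logit_easyreg(a, b, x) - logit_sas_style(-a, -b, x)) < 1e-12
```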

The multinomial logit model

Now suppose that you have a vector of dependent dummy variables. For example, suppose you hold a survey in which you ask the respondents which brand of dishwasher detergent they use:

The dependent dummy variables involved can be recoded into a single variable Y, for example, let:

The standard model for this case is the multinomial logit model. For j = 1,2,..,m,

P[Y = j | X] = exp(αj + βj'X)·P[Y = 0 | X],

where

P[Y = 0 | X] = 1 / [1 + Σ1≤i≤m exp(αi + βi'X)].

In the case m = 1 this model reduces to the Logit model.
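These two formulas can be sketched directly (hypothetical parameter values; with m = 1 the result is exactly the binary logit probability):

```python
import numpy as np

def mnl_probs(alphas, betas, x):
    """P[Y = j | x] for j = 0,...,m in the multinomial logit model,
    with category 0 as the base.  alphas has length m, betas is m x k."""
    scores = alphas + betas @ x             # alpha_j + beta_j'x, j = 1,...,m
    p0 = 1.0 / (1.0 + np.exp(scores).sum()) # P[Y = 0 | x]
    return np.concatenate(([p0], np.exp(scores) * p0))

# hypothetical parameters with m = 1: reduces to the binary logit model
p = mnl_probs(np.array([0.3]), np.array([[0.8, -0.5]]), np.array([1.0, 2.0]))
print(p)   # [P(Y=0|x), P(Y=1|x)]; p[1] equals 1/[1 + exp(-(0.3 + 0.8 - 1.0))]
```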

Note that

(∂/∂X')P[Y = j | X] = exp(αj + βj'X)·P[Y = 0 | X]·βj

- exp(αj + βj'X)·(P[Y = 0 | X])²·[Σ1≤i≤m exp(αi + βi'X)·βi]

so that the direction of the effect of X on P[Y = j | X] depends on all the parameters. On the other hand,

X'(∂/∂βj')P[Y = j | X] = exp(αj + βj'X)·P[Y = 0 | X]·X'X

- (exp(αj + βj'X))²·(P[Y = 0 | X])²·X'X

= {P[Y = j | X] - (P[Y = j | X])²}·X'X > 0.

Estimation method

All the models discussed above are estimated by maximum likelihood. Moreover, in all cases the log-likelihood function is unimodal in the parameters. Therefore, the log-likelihood can be maximized reliably by Newton iteration.
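To see why Newton iteration works so well here, consider the binary logit log-likelihood, whose gradient and Hessian have simple closed forms: the score is X'(y - p) and the Hessian is -X'WX with W = diag(p(1-p)), which is negative definite. The sketch below (simulated data and made-up "true" parameters; this is not the EasyReg routine itself) implements the iteration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
X = np.column_stack((np.ones(n), rng.normal(size=n)))  # intercept + one regressor
theta_true = np.array([-0.5, 1.0])                     # made-up true parameters
p_true = 1 / (1 + np.exp(-X @ theta_true))
y = (rng.uniform(size=n) < p_true).astype(float)       # simulated 0/1 outcomes

theta = np.zeros(2)
for it in range(25):
    p = 1 / (1 + np.exp(-X @ theta))
    grad = X.T @ (y - p)                    # score vector
    W = p * (1 - p)
    hess = -(X * W[:, None]).T @ X          # Hessian of the log-likelihood
    step = np.linalg.solve(-hess, grad)     # Newton step
    theta += step
    if np.abs(step).max() < 1e-10:          # converged
        break
print(theta)   # ML estimates, close to theta_true for large n
```

Because the log-likelihood is globally concave, the iteration converges from any starting point, typically in well under 25 steps.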

Ordered logit analysis with EasyReg

Download the following variables from the EasyReg database:

LIMDEP window

Open "Menu > Single equation models > Discrete dependent variable models", choose these variables, choose "Function level" as the dependent variable, and include an intercept.

LIMDEP window

EasyReg checks whether the dependent variable is discrete or not. If not, you will not be able to continue.

LIMDEP window

EasyReg will automatically recode the dependent variable as Y = Function level - 1.

Since the dependent variable represents an ordering, an ordered logit model is appropriate.

LIMDEP window

Click "Continue". Then the "What to do next?" window appears.

LIMDEP window

Options

Output

The output involved (without the asymptotic variance matrix) is given below.

Ordered Logit model:
Dependent variable:
Y = function level (1-9)

Characteristics:
function level (1-9)
  First observation = 1
  Last observation  = 2000
  Number of usable observations: 2000
  Minimum value: 1.0000000E+000
  Maximum value: 9.0000000E+000
  Sample mean:   3.7775000E+000
  This variable is integer valued.
  A discrete dependent variable model is suitable.

X variables:
X(1) = education level (1-7)
X(2) = male/female (1/2)
X(3) = age in years
X(4) = experience in years
X(5) = 1


Model:
P(Y-1=0|x) = F(-b'x)
P(Y-1=1|x) = F(-b'x+exp(b(6)))- F(-b'x)
For j=2,..,m-1,
P(Y-1=j|x) = F(-b'x+exp(b(6))+..+exp(b(5+j)))
          - F(-b'x+exp(b(6))+..+exp(b(4+j)))
and
P(Y-1=m|x) = 1 - F(-b'x+exp(b(6))+..+exp(b(4+m)))
where m =8, b'x = b(1)x(1)+..+b(5)x(5), and
F(u) = 1/[1+EXP(-u)].
Newton iteration succesfully completed after 15 iterations
Last absolute parameter change = 0.0000
Last percentage change of the likelihood = -0.1018

Maximum likelihood estimation results:
Par.   ML estimate     s.e. t-value [p-value] Variable              
b(1)      1.015007 0.034594   29.34 [0.00000] education level (1-7) 
b(2)     -0.315749 0.115353   -2.74 [0.00620] male/female (1/2)     
b(3)      0.028351 0.004702    6.03 [0.00000] age in years          
b(4)      0.024434 0.006309    3.87 [0.00011] experience in years   
b(5)     -0.373515 0.267649   -1.40 [0.16285] 1                     
b(6)      0.964437 0.041202   23.41 [0.00000]                       
b(7)      0.825036 0.034042   24.24 [0.00000]                       
b(8)     -2.233505 0.187036  -11.94 [0.00000]                       
b(9)     -2.169395 0.184080  -11.79 [0.00000]                       
b(10)     0.383856 0.055109    6.97 [0.00000]                       
b(11)    -0.140443 0.088979   -1.58 [0.11448]                       
b(12)     0.587651 0.082432    7.13 [0.00000]                       
[The two-sided p-values are based on the normal approximation]

Log likelihood:      -1.27523282560E+003
Sample size (n):                    2000
Information criteria:      
     Akaike:               1.287232826
     Hannan-Quinn:         1.299572029
     Schwarz:              1.320838240

The parameters b(1),...,b(4) are the components of β, b(5) is α, and b(6),...,b(12) are the parameters γ1,...,γ7. The parameters of interest are b(1),...,b(4). In order to interpret these parameters, observe that

(∂/∂X')P[Y = j | X] = -[f(Σ1≤i≤j exp(γi) - α - β'X) - f(Σ1≤i≤j-1 exp(γi) - α - β'X)]·β,

where f is the density corresponding to F. Therefore, the direction of the effect of X on P[Y = j | X] depends on whether f(z) is downward sloping or upward sloping in z = Σ1≤i≤j-1 exp(γi) - α - β'X. In the downward-sloping case a positive sign of a component of β implies a positive effect of the corresponding component of X, and in the upward-sloping case a negative effect. Which case applies depends on the parameter values and the value of X, so EasyReg cannot tell you how to interpret the results. However, if the signs of the β's are consistent with one of the two cases, you may interpret the results accordingly, as in the present case. The positive signs of b(1), b(3), b(4) and the negative sign of b(2) are consistent with the assumption that f(z) is downward sloping in z = Σ1≤i≤j-1 exp(γi) - α - β'X for all X, because otherwise the results make no sense. Thus, the positive signs and significance of b(1), b(3) and b(4) indicate that education, age and experience have a positive effect on function level, and the negative sign and significance of b(2) indicates that, ceteris paribus, being female is a disadvantage in attaining the same function level as a male.
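The dependence of the sign on the slope of f can be checked numerically. A small sketch with generic values (not the estimates above): the logistic density f(z) = F(z)(1 - F(z)) peaks at z = 0, so the same positive coefficient yields marginal effects of opposite sign on the two slopes.

```python
import numpy as np

F = lambda z: 1.0 / (1.0 + np.exp(-z))
f = lambda z: F(z) * (1 - F(z))          # logistic density, peaks at z = 0

# P[Y = j | X] = F(mu_j - a - b'X) - F(mu_{j-1} - a - b'X), so its derivative
# w.r.t. a component x_k is -[f(mu_j - index) - f(mu_{j-1} - index)] * b_k.
def effect_sign(mu_lo, mu_hi, index, b_k):
    return -(f(mu_hi - index) - f(mu_lo - index)) * b_k

# both thresholds on the downward slope of f (z > 0): positive b_k,
# positive effect
assert effect_sign(1.0, 2.0, 0.0, 1.0) > 0
# both thresholds on the upward slope (z < 0): same b_k, negative effect
assert effect_sign(-2.0, -1.0, 0.0, 1.0) < 0
```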

Multinomial logit analysis with EasyReg

Now estimate a multinomial logit model using the same data. Of course, the multinomial logit model does not take the ordering of Y into account, so this model is not appropriate. Nevertheless, something instructive happens along the way:

LIMDEP window

Click "Yes":

LIMDEP window

The full text in this window is listed below.

What to do if you get the message that some X variables are constant 
for some values of Y in the multinomial logit model?

       As an example, try to estimate the following multinomial 
logit model, using the cross-section data of Dutch wage earners in 
the EasyReg database:

 Model variables:
 y = function level (1-9)
 x(1) = education level (1-7)                                                                               
 x(2) = male/female (1/2)
 x(3) = age in years
 x(4) = experience in years
 x(5) = 1

 Available observations: t = 1 -> 2000
 = Chosen.
 
 Model:
 P(y-1=0|x) = 1/[1+exp(b(1)'x)+ ..+ exp(b(m)'x)]
 P(y-1=j|x) = exp(b(j)'x)P(y=0|x), j=1,..,m, where m =  8

Then you will get the message:

X(2) and X(5) are constant for Y = 5
Program aborted!

The problem is that there are no females in function level 5, so that
X(2) = 1 for all workers in function level 5. The variable X(2) is 
therefore perfectly multicollinear with the intercept X(5)!
       The solution is to merge function level 5 with other function 
levels, for example with function levels 4 and 6, yielding a new 
dependent variable with values ranging from 1 to 7. In order to do 
this, we have to transform the function levels to dummy variables.
       Create the following dummy variables, using the I(x=a) option
in the transformation menu:
 I[function level (1-9)=1]
 I[function level (1-9)=2]
 I[function level (1-9)=3]
 I[function level (1-9)=4]
 I[function level (1-9)=5]
 I[function level (1-9)=6]
 I[function level (1-9)=7]
 I[function level (1-9)=8]
 I[function level (1-9)=9]
where I(.) is the indicator function: I(true) = 1, I(false) = 0.
       Next, create the following new dependent variable, using 
the linear combination option in the transformation menu:
 New function level (1-7) =
  1 x I[function level = 1]
 +2 x I[function level = 2]
 +3 x I[function level = 3]
 +4 x I[function level = 4]
 +4 x I[function level = 5]
 +4 x I[function level = 6]
 +5 x I[function level = 7]
 +6 x I[function level = 8]
 +7 x I[function level = 9]
       Using this new dependent variable, the multinomial logit 
estimation results are:

 Model variables:
 y = New function level (1-7)
 x(1) = education level (1-7)                                                                               
 x(2) = male/female (1/2)
 x(3) = age in years
 x(4) = experience in years
 x(5) = 1

 Available observations: t = 1 -> 2000
 = Chosen.

 Model:
 P(y-1=0|x) = 1/[1+exp(b(1)'x)+ ..+ exp(b(m)'x)]
 P(y-1=j|x) = exp(b(j)'x)P(y=0|x), j=1,..,m, where m =  6

 Maximum likelihood estimation results:
 Variable                       ML estimate of b(.) (t-value)
 x(1)=education level (1-7)     b(1,1) =    0.6617117   (5.46)
                                           [p-value = 0.00000]
 x(2)=male/female (1/2)         b(1,2) =   -0.2669086  (-1.02)
                                           [p-value = 0.30564]
 x(3)=age in years              b(1,3) =   -0.0161398  (-1.41)
                                           [p-value = 0.15763]
 x(4)=experience in years       b(1,4) =    0.0489719   (2.35)
                                           [p-value = 0.01897]
 x(5)=1                         b(1,5) =    0.8470765   (1.37)
                                           [p-value = 0.17029]
 x(1)=education level (1-7)     b(2,1) =    1.2491677  (10.15)
                                           [p-value = 0.00000]
 x(2)=male/female (1/2)         b(2,2) =   -0.2660622  (-1.00)
                                           [p-value = 0.31854]
 x(3)=age in years              b(2,3) =   -0.0132317  (-1.13)
                                           [p-value = 0.25919]
 x(4)=experience in years       b(2,4) =    0.0968024   (4.62)
                                           [p-value = 0.00000]
 x(5)=1                         b(2,5) =   -0.9266000  (-1.45)
                                           [p-value = 0.14587]
 x(1)=education level (1-7)     b(3,1) =    1.9351154  (14.47)
                                           [p-value = 0.00000]
 x(2)=male/female (1/2)         b(3,2) =   -0.8985736  (-2.80)
                                           [p-value = 0.00510]
 x(3)=age in years              b(3,3) =    0.0201374   (1.52)
                                           [p-value = 0.12764]
 x(4)=experience in years       b(3,4) =    0.0994764   (4.48)
                                           [p-value = 0.00001]
 x(5)=1                         b(3,5) =   -4.5914640  (-6.18)
                                           [p-value = 0.00000]
 x(1)=education level (1-7)     b(4,1) =    2.5490455  (16.76)
                                           [p-value = 0.00000]
 x(2)=male/female (1/2)         b(4,2) =   -1.3678589  (-2.75)
                                           [p-value = 0.00598]
 x(3)=age in years              b(4,3) =    0.0453044   (2.60)
                                           [p-value = 0.00939]
 x(4)=experience in years       b(4,4) =    0.0929477   (3.57)
                                           [p-value = 0.00035]
 x(5)=1                         b(4,5) =   -8.9553810  (-8.57)
                                           [p-value = 0.00000]
 x(1)=education level (1-7)     b(5,1) =    2.8232526  (18.12)
                                           [p-value = 0.00000]
 x(2)=male/female (1/2)         b(5,2) =   -0.8957252  (-1.93)
                                           [p-value = 0.05404]
 x(3)=age in years              b(5,3) =    0.0847234   (4.87)
                                           [p-value = 0.00000]
 x(4)=experience in years       b(5,4) =    0.0526355   (1.98)
                                           [p-value = 0.04734]
 x(5)=1                         b(5,5) =  -12.0151756 (-11.05)
                                           [p-value = 0.00000]
 x(1)=education level (1-7)     b(6,1) =    2.4593404  (14.13)
                                           [p-value = 0.00000]
 x(2)=male/female (1/2)         b(6,2) =   -2.3332418  (-2.20)
                                           [p-value = 0.02763]
 x(3)=age in years              b(6,3) =    0.0763263   (3.54)
                                           [p-value = 0.00041]
 x(4)=experience in years       b(6,4) =    0.0456722   (1.45)
                                           [p-value = 0.14763]
 x(5)=1                         b(6,5) =   -9.2689909  (-5.73)
                                           [p-value = 0.00000]
 [The two-sided p-values are based on the normal approximation]

 Log likelihood:      -2.54992886753E+003
 Sample size (n):                    2000

Note that the parameters b(j,1),...,b(j,4) are the components of βj, and b(j,5) = αj.
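The dummy-variable construction in the window above amounts to a simple recoding of the dependent variable. As a lookup table (the sample values below are hypothetical):

```python
# map old function levels 1-9 to merged levels 1-7
# (levels 4, 5 and 6 are folded into a single level 4)
recode = {1: 1, 2: 2, 3: 3, 4: 4, 5: 4, 6: 4, 7: 5, 8: 6, 9: 7}

old_levels = [1, 5, 6, 9, 4, 7]            # hypothetical sample values
new_levels = [recode[y] for y in old_levels]
print(new_levels)   # [1, 4, 4, 7, 4, 5]
```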

This is the end of the guided tour on discrete dependent variable models