## Guided tour on discrete dependent variable models


Discrete dependent variable models are models for which the dependent variable Y takes discrete values only. There are three types of discrete dependent variable models:

• Count data models
• Ordered probability models
• Qualitative response models

### Count data models

As the term "count" indicates, in the case of count data models the dependent variable Y represents a quantity. For example, let Y be the number of accidents during rush hour at a particular intersection. Thus, in the case of count data the dependent variable Y is a count of something.

In EasyReg International you can choose from three count data models, depending on whether the dependent variable Y has an explicit finite upper bound or not.

• Binomial Logit
• Negative Binomial Logit
• Poisson

In the Binomial Logit case there exists a known smallest natural number m such that P[Y ∈ {0,1,2,...,m}] = 1, which will be determined by EasyReg itself. In the other two cases Y does not have an explicit upper bound. Note that the negative binomial distribution depends on an integer-valued parameter m ≥ 1, which has to be specified.

These models are explained in LIMDEP.PDF.
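As an illustration of the Poisson case, the standard Poisson regression model specifies P[Y = y | X = x] = exp(-λ)λ^y/y! with rate λ = exp(α + β'x). The sketch below assumes this standard parameterization (the exact EasyReg parameterization is documented in LIMDEP.PDF), with made-up parameter values:

```python
import math

def poisson_prob(y, x, alpha, beta):
    """P[Y = y | X = x] under the standard Poisson regression model,
    with rate lambda = exp(alpha + beta'x)."""
    lam = math.exp(alpha + sum(b * xi for b, xi in zip(beta, x)))
    return math.exp(-lam) * lam ** y / math.factorial(y)

# illustrative parameter values, not estimates
x = [1.0, 2.0]
probs = [poisson_prob(y, x, alpha=-0.5, beta=[0.3, 0.1]) for y in range(50)]
```

Since Y has no explicit upper bound here, the probabilities over y = 0, 1, 2, ... sum to 1 only in the limit; truncating at a large count captures essentially all the mass.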

### Ordered probability models

Now suppose that the values of Y represent an ordering of items. For example, let Y be the outcome of a taste test, coded like

• Y = 0: disgusting,
• Y = 1: OK,
• Y = 2: good,
• Y = 3: delicious.

In this case Y is not a quantity, but nevertheless a larger value of Y means more, or better. In this case there exists a known smallest natural number m such that

P[Y ∈ {0,1,2,...,m}] = 1,

similar to the Binomial Logit model, but now the values of Y represent a ranking rather than a count. The upper bound m is determined by EasyReg itself.

This type of data is usually modeled via a latent variable model:

Y* = α + β'X + ε,

where the error term ε is independent of X and has distribution function F(·). The latent dependent variable Y* is not observable, but is related to the observed Y in the following way:

P[Y = 0 | X] = P[Y* ∈ (-∞, 0] | X] = P[ε ∈ (-∞, -α - β'X] | X] = F(-α - β'X)

P[Y ≤ 1 | X] = P[Y* ∈ (-∞, μ_1] | X] = P[ε ∈ (-∞, μ_1 - α - β'X] | X] = F(μ_1 - α - β'X)

........................

P[Y ≤ m-1 | X] = P[Y* ∈ (-∞, μ_{m-1}] | X] = P[ε ∈ (-∞, μ_{m-1} - α - β'X] | X] = F(μ_{m-1} - α - β'X)

where

0 < μ_1 < μ_2 < ... < μ_{m-1}.

The latter conditions can easily be enforced by reparametrizing the μ's as follows:

μ_j = Σ_{1 ≤ i ≤ j} exp(γ_i), j = 1,2,...,m-1.

Thus the model for Y now becomes:

P[Y = 0 | X] = F(-α - β'X)

P[Y = 1 | X] = F(exp(γ_1) - α - β'X) - F(-α - β'X)

........................

P[Y = j | X] = F(Σ_{1 ≤ i ≤ j} exp(γ_i) - α - β'X) - F(Σ_{1 ≤ i ≤ j-1} exp(γ_i) - α - β'X), j = 2,...,m-1

........................

P[Y = m | X] = 1 - F(Σ_{1 ≤ i ≤ m-1} exp(γ_i) - α - β'X)

In EasyReg you have only one option for F, namely the Logistic distribution function F(x) = 1/[1 + exp(-x)]. The corresponding model for Y is therefore called the ordered logit model.
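The probabilities above can be computed directly for given parameter values. The following sketch (with made-up values of α, β, γ; not EasyReg's implementation) builds the thresholds μ_j = exp(γ_1) + ... + exp(γ_j) and evaluates P[Y = j | X = x] for each category:

```python
import math

def F(u):
    """Logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-u))

def ordered_logit_probs(x, alpha, beta, gamma):
    """P[Y = j | X = x] for j = 0,...,m, where m = len(gamma) + 1 and
    the thresholds are mu_j = exp(gamma_1) + ... + exp(gamma_j)."""
    z = alpha + sum(b * xi for b, xi in zip(beta, x))
    cutpoints, s = [0.0], 0.0
    for g in gamma:
        s += math.exp(g)
        cutpoints.append(s)
    cum = [F(c - z) for c in cutpoints]              # P[Y <= j | X = x]
    probs = [cum[0]]
    probs += [cum[j] - cum[j - 1] for j in range(1, len(cum))]
    probs.append(1.0 - cum[-1])                      # P[Y = m | X = x]
    return probs

# illustrative values: m = 3, so Y takes the values 0, 1, 2, 3
p = ordered_logit_probs(x=[1.0, 0.5], alpha=0.2, beta=[0.4, -0.3], gamma=[0.0, 0.5])
```

Because the cutpoints are strictly increasing by construction, all category probabilities are positive and sum to 1 for any parameter values.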

### Qualitative response models

#### Logit and Probit models

If the dependent variable Y is dichotomous, i.e., P[Y ∈ {0,1}] = 1, then it follows from the above discussion of the ordered logit model and the symmetry of the distribution function F that

P[Y = 0 | X] = F(-α - β'X) = 1 - F(α + β'X),

hence

P[Y = 1 | X] = F(α + β'X).

In EasyReg you have two options for F, namely the Logistic distribution function F(x) = 1/[1 + exp(-x)] (the Logit model) and the distribution function of the standard normal distribution (the Probit model). Read PROBIT_LOGIT.PDF for a comparison of Probit and Logit analysis.

The case Y = 1 usually stands for an attribute you are interested in. For example, let Y = 1 mean that "an applicant for a mortgage loan will default on the loan", and let X be the vector of characteristics of the applicant, such as income, family situation, credit rating, etcetera. Given a data set of people who have gotten a mortgage loan, together with their payment history and their characteristics X, you can estimate the parameters a and b, and then use F(a + b'X) to predict the probability that an applicant with characteristics X will default on the loan.

Remark: The Logit model in EasyReg is defined as

(1) P[Y = 1 | X] = 1 / [1 + exp(-α - β'X)]

However, some statistical packages, in particular SAS, define the logit model as

(2) P[Y = 1 | X] = 1 / [1 + exp(γ + δ'X)]

Both are equivalent, of course, with γ = -α and δ = -β, but (1) makes more sense than (2) because (1) is monotonically increasing in β'X. Thus, if you compare the Logit results of EasyReg with the Logit results of another software package and find that the parameter estimates are the same in absolute value but with opposite signs, then the other software package uses model (2).
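The equivalence of the two parameterizations is easy to verify numerically. A minimal sketch, with made-up parameter values:

```python
import math

def p_model_1(x, alpha, beta):
    """Model (1): P[Y=1|X] = 1/[1 + exp(-alpha - beta'x)]."""
    return 1.0 / (1.0 + math.exp(-alpha - sum(b * xi for b, xi in zip(beta, x))))

def p_model_2(x, gamma, delta):
    """Model (2): P[Y=1|X] = 1/[1 + exp(gamma + delta'x)]."""
    return 1.0 / (1.0 + math.exp(gamma + sum(d * xi for d, xi in zip(delta, x))))

x = [1.5, -0.7]
p1 = p_model_1(x, alpha=0.3, beta=[0.8, -1.2])
p2 = p_model_2(x, gamma=-0.3, delta=[-0.8, 1.2])  # gamma = -alpha, delta = -beta
```

With the sign flip applied to all parameters, the two functions return identical probabilities for every x.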

#### The multinomial logit model

Now suppose that you have a vector of dependent dummy variables. For example, suppose you hold a survey in which you ask the respondents which brand of dishwasher detergent they use:

• Y1 = 1: Brand 1
• Y2 = 1: Brand 2
• Y3 = 1: Brand 3
• ............
• Ym = 1: Brand m
• Ym+1 = 1: None of the above

The dependent dummy variables involved can be recoded into a single variable Y, for example, let:

• Y = 0 if Ym+1 = 1
• Y = 1 if Y1 = 1
• Y = 2 if Y2 = 1
• Y = 3 if Y3 = 1
• ............
• Y = m if Ym = 1

The standard model for this case is the multinomial logit model. For j = 1,2,..,m,

P[Y = j | X] = exp(α_j + β_j'X)·P[Y = 0 | X]

where

P[Y = 0 | X] = 1 / [1 + Σ_{1 ≤ i ≤ m} exp(α_i + β_i'X)]

In the case m = 1 this model reduces to the Logit model.
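The model above is straightforward to evaluate for given parameters. A sketch with made-up values (category 0 serves as the base category):

```python
import math

def multinomial_logit_probs(x, alphas, betas):
    """P[Y = j | X = x] for j = 0,...,m, multinomial logit with category 0
    as the base; alphas[j-1], betas[j-1] are the parameters of category j."""
    scores = [a + sum(b * xi for b, xi in zip(bv, x))
              for a, bv in zip(alphas, betas)]
    p0 = 1.0 / (1.0 + sum(math.exp(s) for s in scores))
    return [p0] + [math.exp(s) * p0 for s in scores]

x = [1.0, 2.0]
p = multinomial_logit_probs(x, alphas=[0.1, -0.4, 0.3],
                            betas=[[0.2, -0.1], [0.5, 0.0], [-0.3, 0.4]])
```

With a single non-base category (m = 1) the formula collapses to P[Y = 1 | X] = exp(α + β'X)/[1 + exp(α + β'X)] = 1/[1 + exp(-α - β'X)], i.e., the Logit model.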

Note that

(∂/∂X')P[Y = j | X] = exp(α_j + β_j'X)·P[Y = 0 | X]·β_j'

- exp(α_j + β_j'X)·(P[Y = 0 | X])²·[Σ_{1 ≤ i ≤ m} exp(α_i + β_i'X)·β_i']

so that the direction of the effect of X on P[Y = j | X] depends on all the parameters. On the other hand,

X'(∂/∂β_j)P[Y = j | X] = exp(α_j + β_j'X)·P[Y = 0 | X]·X'X

- (exp(α_j + β_j'X)·P[Y = 0 | X])²·X'X

= {P[Y = j | X] - (P[Y = j | X])²}·X'X > 0.

### Estimation method

All the models discussed above are estimated by maximum likelihood. Moreover, in all cases the log-likelihood function is unimodal in the parameters. Therefore, the log-likelihood is maximized by using the Newton iteration.
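For the logit model with a single regressor, the Newton iteration can be sketched from scratch as follows (an illustration of the method, not EasyReg's code): each step adds the inverse Hessian times the score of the log-likelihood to the current parameter values, and because the logit log-likelihood is globally concave, the iteration converges to the unique maximum.

```python
import math

def logit_ml_newton(xs, ys, tol=1e-10, max_iter=100):
    """ML estimation of P[Y=1|x] = 1/[1 + exp(-(a + b*x))] by Newton iteration."""
    a = b = 0.0
    for _ in range(max_iter):
        ga = gb = haa = hab = hbb = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            ga += y - p                      # score w.r.t. a
            gb += (y - p) * x                # score w.r.t. b
            w = p * (1.0 - p)                # minus-Hessian weights
            haa += w; hab += w * x; hbb += w * x * x
        det = haa * hbb - hab * hab
        da = (hbb * ga - hab * gb) / det     # Newton step: H^{-1} * score
        db = (haa * gb - hab * ga) / det
        a += da; b += db
        if abs(da) + abs(db) < tol:
            break
    return a, b

# a small artificial data set (no perfect separation, so the MLE is finite)
xs = [0, 0, 1, 1, 2, 2, 3, 3]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
a_hat, b_hat = logit_ml_newton(xs, ys)
```

At the maximum the score is zero, which is the natural convergence check for the iteration.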

## Ordered logit analysis with EasyReg

Open "Menu > Single equation models > Discrete dependent variable models", choose these variables, choose "Function level" as the dependent variable, and include an intercept.

EasyReg checks whether the dependent variable is discrete or not. If not, you will not be able to continue.

EasyReg will automatically recode the dependent variable as Y = Function level - 1.

Since the dependent variable represents an ordering, an ordered logit model is appropriate.

Click "Continue". Then the "What to do next?" window appears.

#### Options

• The conditional probability model for the discrete dependent variable Y implies a conditional expectation function

g(X) = E[Y|X],

where X is the vector of explanatory variables, hence

Y = g(X) + U, with E[U|X] = 0.

The residuals are the estimates of U for each observation in the sample. If you use the option "Write residuals to the input file" the estimates of the conditional expectation g(X) can be constructed via Menu > Input > Transform variables, and the linear combination Y - U.

• If you choose the option "Write probabilities to the input file", the probabilities involved can be viewed via Menu > Data analysis > Data table.

• The option "Wald test of linear parameter restrictions" works in the same way as in the linear regression case. See the guided tour on Ordinary Least Squares (OLS) estimation and extensions.

#### Output

The output involved (without the asymptotic variance matrix) is given below.

```
Ordered Logit model:
Dependent variable:
Y = function level (1-9)

Characteristics:
function level (1-9)
First observation = 1
Last observation  = 2000
Number of usable observations: 2000
Minimum value: 1.0000000E+000
Maximum value: 9.0000000E+000
Sample mean:   3.7775000E+000
This variable is integer valued.
A discrete dependent variable model is suitable.

X variables:
X(1) = education level (1-7)
X(2) = male/female (1/2)
X(3) = age in years
X(4) = experience in years
X(5) = 1

Model:
P(Y-1=0|x) = F(-b'x)
P(Y-1=1|x) = F(-b'x+exp(b(6)))- F(-b'x)
For j=2,..,m-1,
P(Y-1=j|x) = F(-b'x+exp(b(6))+..+exp(b(5+j)))
- F(-b'x+exp(b(6))+..+exp(b(4+j)))
and
P(Y-1=m|x) = 1 - F(-b'x+exp(b(6))+..+exp(b(4+m)))
where m =8, b'x = b(1)x(1)+..+b(5)x(5), and
F(u) = 1/[1+EXP(-u)].
Newton iteration successfully completed after 15 iterations
Last absolute parameter change = 0.0000
Last percentage change of the likelihood = -0.1018

Maximum likelihood estimation results:
Par.   ML estimate     s.e. t-value [p-value] Variable
b(1)      1.015007 0.034594   29.34 [0.00000] education level (1-7)
b(2)     -0.315749 0.115353   -2.74 [0.00620] male/female (1/2)
b(3)      0.028351 0.004702    6.03 [0.00000] age in years
b(4)      0.024434 0.006309    3.87 [0.00011] experience in years
b(5)     -0.373515 0.267649   -1.40 [0.16285] 1
b(6)      0.964437 0.041202   23.41 [0.00000]
b(7)      0.825036 0.034042   24.24 [0.00000]
b(8)     -2.233505 0.187036  -11.94 [0.00000]
b(9)     -2.169395 0.184080  -11.79 [0.00000]
b(10)     0.383856 0.055109    6.97 [0.00000]
b(11)    -0.140443 0.088979   -1.58 [0.11448]
b(12)     0.587651 0.082432    7.13 [0.00000]
[The two-sided p-values are based on the normal approximation]

Log likelihood:      -1.27523282560E+003
Sample size (n):                    2000
Information criteria:
Akaike:               1.287232826
Hannan-Quinn:         1.299572029
Schwarz:              1.320838240
```

The parameters b(1),...,b(4) are the components of β, b(5) is α, and b(6),...,b(12) are the parameters γ_1,...,γ_7. The parameters of interest are b(1),...,b(4). In order to interpret these parameters, observe that

(∂/∂X')P[Y = j | X] = -[f(Σ_{1 ≤ i ≤ j} exp(γ_i) - α - β'X) - f(Σ_{1 ≤ i ≤ j-1} exp(γ_i) - α - β'X)]·β',

where f is the density corresponding to F. Therefore, the interpretation of the effect of X on P[Y = j | X] depends on whether f(z) is downward sloping or upward sloping in z = Σ_{1 ≤ i ≤ j-1} exp(γ_i) - α - β'X. In the downward-sloping case a positive sign of a component of β implies a positive effect of the corresponding component of X; in the upward-sloping case, a negative effect. Which case applies depends on the parameter values and the value of X, so EasyReg cannot tell you how to interpret the results. However, if the signs of the β's are consistent with one of the two cases, you may interpret the results accordingly, as in the present case: the positive signs of b(1), b(3), b(4) and the negative sign of b(2) are consistent with the assumption that f(z) is downward sloping in z for all X, because otherwise the results make no sense. Thus, the positive and significant b(1), b(3), b(4) indicate that education, age and experience have a positive effect on function level, and the negative and significant b(2) indicates that, ceteris paribus, being female is a disadvantage in attaining the same function level as a male.
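The sign argument can be checked numerically: pick threshold values for which both density arguments fall in the region where the logistic density slopes downward, and verify by finite differences that the derivative of P[Y = j | X] with respect to X carries the sign of β. A self-contained sketch with made-up threshold and parameter values:

```python
import math

def F(u):
    """Logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-u))

def f(u):
    """Logistic density: f = F*(1 - F), symmetric and downward sloping for u > 0."""
    return F(u) * (1.0 - F(u))

# P[Y = j | X] = F(c_hi - alpha - beta*x) - F(c_lo - alpha - beta*x)
alpha, beta = 0.0, 1.0
c_lo, c_hi = 1.0, 2.0            # made-up thresholds, c_lo < c_hi

def prob(x):
    return F(c_hi - alpha - beta * x) - F(c_lo - alpha - beta * x)

x = 0.5                          # both density arguments are positive here
h = 1e-6
deriv = (prob(x + h) - prob(x - h)) / (2.0 * h)
exact = -(f(c_hi - alpha - beta * x) - f(c_lo - alpha - beta * x)) * beta
```

Since f is downward sloping at both evaluation points and β > 0, both the finite-difference and the analytic derivative are positive, matching the downward-sloping case described above.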

## Multinomial logit analysis with EasyReg

Now estimate a multinomial logit model using the same data. Of course, the multinomial logit model does not take the ordering of Y into account, so this model is not appropriate. Nevertheless, doing so reveals something interesting:

Click "Yes":

The full text in this window is listed below.

```What to do if you get the message that some X variables are constant
for some values of Y in the multinomial logit model?

As an example, try to estimate the following multinomial
logit model, using the cross-section data of Dutch wage earners in
the EasyReg database:

Model variables:
y = function level (1-9)
x(1) = education level (1-7)
x(2) = male/female (1/2)
x(3) = age in years
x(4) = experience in years
x(5) = 1

Available observations: t = 1 -> 2000
= Chosen.

Model:
P(y-1=0|x) = 1/[1+exp(b(1)'x)+ ..+ exp(b(m)'x)]
P(y-1=j|x) = exp(b(j)'x)P(y=0|x), j=1,..,m, where m =  8

Then you will get the message:

X(2) and X(5) are constant for Y = 5
Program aborted!

The problem is that there are no females in function level 5, so that
X(2) = 1 for all workers in function level 5. The variable X(2) is
therefore perfectly multicollinear with the intercept X(5)!
The solution is to merge function level 5 with other function
levels, for example with function levels 4 and 6, yielding a new
dependent variable with values ranging from 1 to 7. In order to do
this, we have to transform the function levels to dummy variables.
Create the following dummy variables, using the I(x=a) option
I[function level (1-9)=1]
I[function level (1-9)=2]
I[function level (1-9)=3]
I[function level (1-9)=4]
I[function level (1-9)=5]
I[function level (1-9)=6]
I[function level (1-9)=7]
I[function level (1-9)=8]
I[function level (1-9)=9]
where I(.) is the indicator function: I(true) = 1, I(false) = 0.
Next, create the following new dependent variable, using
the linear combination option in the transformation menu:
New function level (1-7) =
1 x I[function level = 1]
+2 x I[function level = 2]
+3 x I[function level = 3]
+4 x I[function level = 4]
+4 x I[function level = 5]
+4 x I[function level = 6]
+5 x I[function level = 7]
+6 x I[function level = 8]
+7 x I[function level = 9]
Using this new dependent variable, the multinomial logit
estimation results are:

Model variables:
y = New function level (1-7)
x(1) = education level (1-7)
x(2) = male/female (1/2)
x(3) = age in years
x(4) = experience in years
x(5) = 1

Available observations: t = 1 -> 2000
= Chosen.

Model:
P(y-1=0|x) = 1/[1+exp(b(1)'x)+ ..+ exp(b(m)'x)]
P(y-1=j|x) = exp(b(j)'x)P(y=0|x), j=1,..,m, where m =  6

Maximum likelihood estimation results:
Variable                       ML estimate of b(.) (t-value)
x(1)=education level (1-7)     b(1,1) =    0.6617117   (5.46)
[p-value = 0.00000]
x(2)=male/female (1/2)         b(1,2) =   -0.2669086  (-1.02)
[p-value = 0.30564]
x(3)=age in years              b(1,3) =   -0.0161398  (-1.41)
[p-value = 0.15763]
x(4)=experience in years       b(1,4) =    0.0489719   (2.35)
[p-value = 0.01897]
x(5)=1                         b(1,5) =    0.8470765   (1.37)
[p-value = 0.17029]
x(1)=education level (1-7)     b(2,1) =    1.2491677  (10.15)
[p-value = 0.00000]
x(2)=male/female (1/2)         b(2,2) =   -0.2660622  (-1.00)
[p-value = 0.31854]
x(3)=age in years              b(2,3) =   -0.0132317  (-1.13)
[p-value = 0.25919]
x(4)=experience in years       b(2,4) =    0.0968024   (4.62)
[p-value = 0.00000]
x(5)=1                         b(2,5) =   -0.9266000  (-1.45)
[p-value = 0.14587]
x(1)=education level (1-7)     b(3,1) =    1.9351154  (14.47)
[p-value = 0.00000]
x(2)=male/female (1/2)         b(3,2) =   -0.8985736  (-2.80)
[p-value = 0.00510]
x(3)=age in years              b(3,3) =    0.0201374   (1.52)
[p-value = 0.12764]
x(4)=experience in years       b(3,4) =    0.0994764   (4.48)
[p-value = 0.00001]
x(5)=1                         b(3,5) =   -4.5914640  (-6.18)
[p-value = 0.00000]
x(1)=education level (1-7)     b(4,1) =    2.5490455  (16.76)
[p-value = 0.00000]
x(2)=male/female (1/2)         b(4,2) =   -1.3678589  (-2.75)
[p-value = 0.00598]
x(3)=age in years              b(4,3) =    0.0453044   (2.60)
[p-value = 0.00939]
x(4)=experience in years       b(4,4) =    0.0929477   (3.57)
[p-value = 0.00035]
x(5)=1                         b(4,5) =   -8.9553810  (-8.57)
[p-value = 0.00000]
x(1)=education level (1-7)     b(5,1) =    2.8232526  (18.12)
[p-value = 0.00000]
x(2)=male/female (1/2)         b(5,2) =   -0.8957252  (-1.93)
[p-value = 0.05404]
x(3)=age in years              b(5,3) =    0.0847234   (4.87)
[p-value = 0.00000]
x(4)=experience in years       b(5,4) =    0.0526355   (1.98)
[p-value = 0.04734]
x(5)=1                         b(5,5) =  -12.0151756 (-11.05)
[p-value = 0.00000]
x(1)=education level (1-7)     b(6,1) =    2.4593404  (14.13)
[p-value = 0.00000]
x(2)=male/female (1/2)         b(6,2) =   -2.3332418  (-2.20)
[p-value = 0.02763]
x(3)=age in years              b(6,3) =    0.0763263   (3.54)
[p-value = 0.00041]
x(4)=experience in years       b(6,4) =    0.0456722   (1.45)
[p-value = 0.14763]
x(5)=1                         b(6,5) =   -9.2689909  (-5.73)
[p-value = 0.00000]
[The two-sided p-values are based on the normal approximation]

Log likelihood:      -2.54992886753E+003
Sample size (n):                    2000
```

Note that the parameters b(j,1),...,b(j,4) are the components of β_j, and b(j,5) = α_j.
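The dummy-variable construction above is what the EasyReg menus require; written directly, the same merging of function levels 4, 5 and 6 is just a mapping of old codes to new codes (the data values below are hypothetical, for illustration only):

```python
# merge function levels 4, 5 and 6 into a single category, exactly as in
# the dummy-variable linear combination described above
merge_map = {1: 1, 2: 2, 3: 3, 4: 4, 5: 4, 6: 4, 7: 5, 8: 6, 9: 7}

old_levels = [1, 5, 9, 4, 6, 2]          # hypothetical observations
new_levels = [merge_map[v] for v in old_levels]
```

The new dependent variable takes the values 1 through 7, so the multinomial logit model now has m = 6 non-base categories, and the offending category with no female workers has been absorbed.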