## Guided tour on interval-censored proportional hazard models


### The proportional hazard model

The proportional hazard model is a popular model for durations, for example unemployment spells. Let T be the duration and let X be a vector of explanatory variables (called covariates). The proportional hazard model assumes that for t > 0,

$$P[T > t \mid X] = \exp\left(-\exp(\beta'X)\,\Lambda(t \mid \alpha)\right) = S(t \mid \alpha, \beta'X),$$

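say, where $\Lambda(t \mid \alpha)$ is the integrated baseline hazard function, depending on a parameter (vector) $\alpha$, and $\exp(\beta'X)$ is the systematic hazard. Its derivative $\lambda(t \mid \alpha) = \partial\Lambda(t \mid \alpha)/\partial t$ is the baseline hazard function. The function $S(t \mid \alpha, \beta'X)$ is called the survival function.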

EasyReg assumes that the duration T is interval-censored: Given the interval endpoints

$$0 = b_0 < b_1 < b_2 < \dots < b_M,$$

the dependent variables are M dummy variables:

$$I\left(T \in (b_{i-1}, b_i]\right), \quad i = 1, 2, \dots, M,$$

where I(.) is the indicator function: I(true) = 1, I(false) = 0. Then

$$P\left(T \in (b_{i-1}, b_i] \mid X\right) = S(b_{i-1} \mid \alpha, \beta'X) - S(b_i \mid \alpha, \beta'X).$$
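The formula above can be turned into a log-likelihood directly. The following is a minimal Python sketch, not EasyReg's code. `integrated_hazard` is any callable $t \mapsto \Lambda(t \mid \alpha)$ with $\Lambda(0 \mid \alpha) = 0$, and `xb` stands for $\beta'X$. It also assumes something the tour does not state explicitly: an observation whose dummies are all zero (no event up to $b_M$) is treated as right-censored at $b_M$ and contributes $\ln S(b_M \mid \alpha, \beta'X)$.

```python
import math

def survival(t, integrated_hazard, xb):
    """S(t | alpha, beta'X) = exp(-exp(beta'X) * Lambda(t | alpha))."""
    return math.exp(-math.exp(xb) * integrated_hazard(t))

def interval_probabilities(brackets, integrated_hazard, xb):
    """P(T in (b_{i-1}, b_i] | X) for i = 1..M, given brackets [b_0, ..., b_M]."""
    s = [survival(b, integrated_hazard, xb) for b in brackets]
    return [s[i - 1] - s[i] for i in range(1, len(s))]

def log_likelihood(y_rows, xb_values, brackets, integrated_hazard):
    """Interval-censored log-likelihood summed over observations.

    Assumption: a row whose dummies are all 0 is right-censored at b_M
    and contributes ln S(b_M | alpha, beta'X).
    """
    total = 0.0
    for y, xb in zip(y_rows, xb_values):
        if sum(y) == 0:
            total += math.log(survival(brackets[-1], integrated_hazard, xb))
        else:
            probs = interval_probabilities(brackets, integrated_hazard, xb)
            total += math.log(probs[y.index(1)])
    return total
```

For example, `interval_probabilities([0, 0.5, 1, 2, 3], lambda t: t, 0.0)` gives the four interval probabilities when $\Lambda(t \mid \alpha) = t$ and $\beta'X = 0$. These four probabilities sum to $1 - S(3)$.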

### Options for the baseline hazard function

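EasyReg provides five options for the integrated baseline hazard function $\Lambda(t \mid \alpha)$: the piecewise linear integrated hazard, the standard Weibull hazard, the generalized Weibull hazard, the unimodal hazard and the generalized unimodal hazard.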

• #### The piecewise linear integrated baseline hazard

This is the most general option, because only the values of $\Lambda(t \mid \alpha)$ at the bracket points $t = b_i$ matter for the conditional probabilities $P(T \in (b_{i-1}, b_i] \mid X)$.

• #### The generalized Weibull baseline hazard

The difference from the standard Weibull case is that in the standard case the baseline hazard $\lambda(t \mid \alpha)$ approaches infinity as $t \downarrow 0$ if $\alpha_2 < 1$, whereas in the generalized Weibull case $\lambda(t \mid \alpha)$ stays bounded.

• #### The unimodal baseline hazard

Both Weibull hazards are monotonic. The unimodal hazard function, by contrast, attains its maximum at $t = \alpha_2$. Note that in this case $\lambda(0 \mid \alpha) = 0$.

• #### The generalized unimodal baseline hazard

The difference from the previous case is that now $\lambda(0 \mid \alpha) > 0$, while $\lambda(t \mid \alpha)$ still attains its maximum at $t = \alpha_2$.

The parameter $\alpha_1$ in the four parametric cases (standard Weibull, generalized Weibull, unimodal and generalized unimodal) plays the role of a scale parameter. Consequently, X must not include a constant, because $\alpha_1$ already plays that role indirectly: $\exp(\beta'X)\alpha_1 = \exp(\ln(\alpha_1) + \beta'X)$. The same applies to the piecewise linear integrated hazard, because in that case the integrated hazard is homogeneous of degree 1 in $\alpha$.
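The identification argument can be checked numerically. The sketch below assumes, as the text implies, that $\alpha_1$ enters $\Lambda(t \mid \alpha)$ multiplicatively. Under that assumption, multiplying $\alpha_1$ by any $c > 0$ while adding a hypothetical intercept $-\ln c$ to $\beta'X$ leaves $\exp(\beta'X)\Lambda(t \mid \alpha)$ unchanged. The survival function, and hence the likelihood, are therefore unchanged too. All numbers and the names `hazard_index` and `lambda_over_alpha1` are hypothetical and chosen only for illustration.

```python
import math

def hazard_index(intercept, xb, alpha1, lambda_over_alpha1):
    """exp(intercept + beta'X) * Lambda(t | alpha), with Lambda = alpha1 * lambda_over_alpha1."""
    return math.exp(intercept + xb) * alpha1 * lambda_over_alpha1

def survival_from_index(index):
    # S = exp(-exp(beta'X) * Lambda(t | alpha))
    return math.exp(-index)

# Hypothetical values, for illustration only.
xb, lambda_over_alpha1 = 0.4, 1.2
reference = survival_from_index(hazard_index(0.0, xb, 1.0, lambda_over_alpha1))
for c in (0.5, 2.0, 10.0):
    # Scale alpha1 by c and shift a would-be intercept by -ln(c).
    shifted = survival_from_index(hazard_index(-math.log(c), xb, c, lambda_over_alpha1))
    print(f"c = {c}: S = {reference:.12f} vs {shifted:.12f}")
```

Because every $c$ yields the same survival probabilities, the data cannot separate an intercept from $\ln(\alpha_1)$.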

For further information on the proportional hazard model, open SURVIVAL2.PDF.

## How to estimate interval-censored proportional hazard models via EasyReg

### The data and selection of variables

The data for this demonstration are available as a CSV file (readable by Excel): SURVIVAL2.CSV. This is the same subsample of released ex-convicts in Texas as used in the guided tour on right-censored proportional hazard models, taken from the larger data set used in:

• Bierens, H.J., and J.R. Carvalho (2007), "Semi-Nonparametric Competing Risk Analysis of Recidivism", forthcoming in the Journal of Applied Econometrics.

The only difference is that the duration involved, MISDEMEANOR RECIDIVISM, has now been converted to interval dummy variables. If you import this data file in EasyReg and then open "Menu > Single equation models > Interval-censored proportional hazard models", the following window appears.

The information window will not be shown if you have downloaded this guided tour.

Click "Clear" and select all the variables in the model:

The original variable MISDEMEANOR RECIDIVISM (years) is the time in years between the release from prison or jail and the first arrest after release. This variable has been converted to four dummy variables, each indicating whether the ex-convict was first arrested after release within the corresponding time interval. The variable AGE (DAYS/1000) is the age of the ex-convict in units of 1000 days, and the variable SENT (DAYS/1000) is the duration of the last sentence, also in units of 1000 days. The rescaling to units of 1000 days is done for numerical reasons. The dummy variables MALE and BLACK need no explanation. The dummy variable RELEASE is equal to 1 if the ex-convict was unconditionally released, and equal to 0 if the release was on parole or probation.
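The conversion from a duration to interval dummies can be sketched as follows. This is a hypothetical helper, not the procedure used to build SURVIVAL2.CSV. It assumes that an ex-convict with no recorded arrest (encoded here as `None`), or with a first arrest after the last bracket, gets all dummies equal to zero. The intervals are open on the left and closed on the right, matching $(b_{i-1}, b_i]$.

```python
def interval_dummies(duration, brackets):
    """Return the M dummies I(T in (b_{i-1}, b_i]), i = 1..M.

    duration: years from release to first arrest, or None if no arrest was
    recorded (a hypothetical encoding). Durations above b_M give all zeros.
    """
    dummies = [0] * (len(brackets) - 1)
    if duration is None:
        return dummies
    for i in range(1, len(brackets)):
        # Intervals are open on the left and closed on the right.
        if brackets[i - 1] < duration <= brackets[i]:
            dummies[i - 1] = 1
            break
    return dummies

brackets = [0, 0.5, 1, 2, 3]  # the brackets used in this tour, in years
print(interval_dummies(0.5, brackets))  # [1, 0, 0, 0]
print(interval_dummies(1.5, brackets))  # [0, 0, 1, 0]
```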

Click "Selection OK":

In this example I will not choose a subsample: Click "No" and "Continue":

The dummy variables involved are the dependent variables. Select them and click "Continue":

This window is for information only. Click "Continue":

Click "Check data validity". EasyReg will then check whether the Y variables are dummy variables, and whether the covariates are neither multicollinear nor constant. If the variables are OK, the text on the button "Check data validity" changes to "Get brackets". Click it.
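The checks described above can be sketched as follows. The functions are hypothetical and are not EasyReg's implementation. Multicollinearity is detected here through a matrix rank computed by Gaussian elimination with an absolute tolerance. The extra check that at most one dummy per row equals 1 is an added assumption, motivated by the intervals being disjoint.

```python
def matrix_rank(rows, tol=1e-10):
    """Rank of a list-of-rows matrix via Gaussian elimination with partial pivoting."""
    a = [[float(v) for v in row] for row in rows]
    n_rows, n_cols = len(a), len(a[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) <= tol:
            continue  # no usable pivot in this column
        a[rank], a[pivot] = a[pivot], a[rank]
        for r in range(rank + 1, n_rows):
            factor = a[r][col] / a[rank][col]
            for c in range(col, n_cols):
                a[r][c] -= factor * a[rank][c]
        rank += 1
    return rank

def check_data_validity(y_rows, x_rows, tol=1e-10):
    """Return a list of problems; an empty list means the data look OK."""
    problems = []
    for r, y in enumerate(y_rows):
        if any(v not in (0, 1) for v in y):
            problems.append(f"row {r}: Y values are not 0/1")
        elif sum(y) > 1:  # assumption: intervals are disjoint
            problems.append(f"row {r}: more than one interval dummy equals 1")
    k = len(x_rows[0])
    for j in range(k):
        if len({row[j] for row in x_rows}) == 1:
            problems.append(f"covariate {j + 1} is constant")
    if matrix_rank(x_rows, tol) < k:
        problems.append("covariates are multicollinear")
    return problems
```

A constant column is flagged twice in spirit, as "constant" and, when combined with the scale parameter $\alpha_1$, as unidentified. That is why no constant is included in X.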

EasyReg reads the brackets from the dummy variable names if they are present there; otherwise you have to type them in.
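Reading brackets from names such as "Dummy MISDEMEANOR RECIDIVISM (years) in (0,0.5]" can be sketched with a regular expression that matches a trailing `(lower,upper]`. The helper names below are hypothetical, and the exact naming rule EasyReg accepts is not documented in this tour. The sketch also checks that consecutive brackets join up and that the first one starts at $b_0 = 0$.

```python
import re

_NUMBER = r"(\d+(?:\.\d*)?|\.\d+)"
# A trailing "(lower,upper]", as in "... (years) in (0,0.5]".
_BRACKETS = re.compile(r"\(\s*" + _NUMBER + r"\s*,\s*" + _NUMBER + r"\s*\]\s*$")

def brackets_from_name(name):
    """Return (lower, upper) parsed from a dummy variable name, or None."""
    match = _BRACKETS.search(name)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))

def bracket_points(names):
    """Combine per-dummy brackets into [b_0, b_1, ..., b_M], checking consistency."""
    pairs = [brackets_from_name(name) for name in names]
    if any(pair is None for pair in pairs):
        return None  # the brackets must then be typed in by hand
    points = [pairs[0][0]]
    for lower, upper in pairs:
        if lower != points[-1] or upper <= lower:
            raise ValueError("brackets must be contiguous and increasing")
        points.append(upper)
    if points[0] != 0.0:
        raise ValueError("the first bracket must start at b_0 = 0")
    return points
```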

It is strongly recommended to keep the bracket values small, because EasyReg will run into numerical problems if the bracket values are too large. For example, I initially chose the number of days as the basis for the intervals, so that the intervals were (0,180], (180,360], (360,720] and (720,1080], but EasyReg got stuck.
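One plausible reason for this, offered as an illustration rather than a diagnosis of EasyReg's internals, is floating-point underflow at the first-step start values. With $\alpha_i = 1$ and $\beta = 0$, we have $\Lambda(t \mid \alpha) = t$ and $S(t) = \exp(-t)$. In double precision $\exp(-1080)$ is exactly 0, so the log-likelihood contribution $\ln S(1080)$ cannot be evaluated as the logarithm of $S$. With brackets in years, $S(3) = \exp(-3) \approx 0.05$ is harmless. The function names below are hypothetical.

```python
import math

def start_value_survival(t):
    """S(t) at the first-step start values described in this tour:
    alpha_i = 1 (Lambda(t | alpha) = t) and beta = 0 (exp(beta'X) = 1)."""
    return math.exp(-t)

def safe_log_survival(xb, integrated_hazard_value):
    # ln S = -exp(beta'X) * Lambda(t | alpha), computed without exp() then log().
    return -math.exp(xb) * integrated_hazard_value

for unit, points in (("years", [0.5, 1, 2, 3]), ("days", [180, 360, 720, 1080])):
    print(unit, [start_value_survival(b) for b in points])
```

In days, the last survival probability prints as `0.0` and the one before it is a subnormal number. Computing $\ln S$ directly, as `safe_log_survival` does, avoids the $\ln 0$ problem but not the badly scaled likelihood surface.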

After reading the last pair of brackets, the following window appears.

### Piecewise linear integrated hazard

Now you can choose one of the options for the (integrated) hazard function. I will first choose the default option:

EasyReg maximizes the log-likelihood function in two steps. In the first step the parameters $\alpha_i$ are fixed at 1, which corresponds to the Weibull integrated hazard $\Lambda(t \mid \alpha) = t$. The initial values of the $\beta$'s are set to zero. At this stage you are allowed to change the start values and the lower and upper bounds of the $\beta$'s.

The log-likelihood function will now be maximized using the simplex method of Nelder and Mead. See for example

• Press, W.H., B.P. Flannery, S.A. Teukolsky, and W.T. Vetterling (1986), Numerical Recipes, Cambridge University Press, pp. 292-293.

Click "Start SIMPLEX iteration":

Leave the "Auto restart" box checked. Then EasyReg will automatically restart the simplex iteration from the last solution until the log-likelihood and/or the parameters do not change anymore. The option "Batch mode" is only useful if your data set is very large, so that you have to run this module overnight. Thus, click "Start":
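The simplex search with automatic restarts can be sketched in Python as follows. This is a compact textbook Nelder-Mead minimizer, not a translation of the Numerical Recipes routine EasyReg uses. It uses reflection 1, expansion 2, contraction 0.5 and shrink 0.5, and it stops when the spread of function values over the simplex falls below `tol`. The restart loop reruns the search from the last solution until the objective no longer improves by more than `ftol`. That stopping rule is an assumption standing in for EasyReg's "log-likelihood and/or parameters" criterion. To maximize a log-likelihood, pass its negative.

```python
def nelder_mead(f, x0, step=0.1, tol=1e-10, max_iter=5000):
    """Minimize f from x0 with the Nelder-Mead simplex method (sketch)."""
    n = len(x0)
    # Initial simplex: x0 plus one vertex displaced along each coordinate.
    simplex = [list(map(float, x0))] + [
        [x + (step if j == i else 0.0) for j, x in enumerate(x0)] for i in range(n)
    ]
    values = [f(v) for v in simplex]
    for _ in range(max_iter):
        # Sort vertices from best (lowest f) to worst.
        order = sorted(range(n + 1), key=values.__getitem__)
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        if values[-1] - values[0] <= tol:
            break
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]
        reflected = [c + (c - w) for c, w in zip(centroid, worst)]
        f_r = f(reflected)
        if f_r < values[0]:
            expanded = [c + 2.0 * (c - w) for c, w in zip(centroid, worst)]
            f_e = f(expanded)
            simplex[-1], values[-1] = (expanded, f_e) if f_e < f_r else (reflected, f_r)
        elif f_r < values[-2]:
            simplex[-1], values[-1] = reflected, f_r
        else:
            contracted = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            f_c = f(contracted)
            if f_c < values[-1]:
                simplex[-1], values[-1] = contracted, f_c
            else:
                # Shrink all vertices toward the best one.
                best = simplex[0]
                simplex = [best] + [[b + 0.5 * (v - b) for b, v in zip(best, vert)]
                                    for vert in simplex[1:]]
                values = [values[0]] + [f(v) for v in simplex[1:]]
    i_best = min(range(n + 1), key=values.__getitem__)
    return simplex[i_best], values[i_best]


def simplex_with_auto_restart(f, x0, ftol=1e-10, max_restarts=50):
    """Restart from the last solution until f no longer improves by more than ftol."""
    x, fx = nelder_mead(f, x0)
    for _ in range(max_restarts):
        x_new, f_new = nelder_mead(f, x)
        if fx - f_new <= ftol:
            return x_new, f_new
        x, fx = x_new, f_new
    return x, fx
```

For example, `simplex_with_auto_restart(lambda th: -loglik(th), start)` would maximize a hypothetical `loglik` function. Restarting rebuilds a fresh simplex around the current best point, which helps when the previous simplex has collapsed prematurely.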

You may restart the iteration, but it is unlikely that you will get a further improvement. Thus, click "Done":

These are the start values for the second and final step. You can no longer change the parameter bounds. Thus, click "Restart SIMPLEX iteration":

We are now done with the simplex iteration. Thus, click "Done":

Click "Continue" to compute the scores of the log-likelihood function and the asymptotic variance matrix of the ML estimates. Then the "What to do next?" module will be activated:
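This tour does not spell out which variance estimator EasyReg computes from the scores. One standard score-based choice is the outer-product-of-gradients (OPG, or BHHH) estimator, which inverts the sum over observations of the outer products of the per-observation scores. The sketch below assumes that estimator and uses central-difference scores. All function names are hypothetical. The t-values in the output would then be the estimates divided by these standard errors.

```python
import math

def numerical_scores(loglik_i, theta, h=1e-6):
    """Central-difference gradient of one observation's log-likelihood."""
    scores = []
    for j in range(len(theta)):
        up, down = list(theta), list(theta)
        up[j] += h
        down[j] -= h
        scores.append((loglik_i(up) - loglik_i(down)) / (2.0 * h))
    return scores

def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (small, nonsingular matrices)."""
    n = len(matrix)
    a = [[float(v) for v in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                factor = a[r][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]

def opg_standard_errors(loglik_fns, theta):
    """Standard errors from the inverse of the summed outer product of scores.

    loglik_fns: one function per observation, mapping theta to its log-likelihood.
    """
    k = len(theta)
    opg = [[0.0] * k for _ in range(k)]
    for loglik_i in loglik_fns:
        s = numerical_scores(loglik_i, theta)
        for r in range(k):
            for c in range(k):
                opg[r][c] += s[r] * s[c]
    cov = invert(opg)
    return [math.sqrt(cov[j][j]) for j in range(k)]
```

As a sanity check, for a Bernoulli sample evaluated at its ML estimate $\hat p$, this returns $\sqrt{\hat p(1-\hat p)/n}$.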

Let us have a look at the integrated baseline hazard:

Next, let us have a look at the corresponding baseline hazard:

EasyReg automatically scales the plot between the minimum and maximum function value. To change that, click "Display options":

Now set the bottom value at zero, and the top value slightly larger than the maximum function value, and redisplay the plot:

This is only an approximation of the true baseline hazard, though. The function values are the values of the $\alpha$'s corresponding to the brackets; that is, the plotted hazard equals $\alpha_i$ on the interval $(b_{i-1}, b_i]$.
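If the baseline hazard is the step function $\alpha_i$ on $(b_{i-1}, b_i]$, then the integrated hazard is piecewise linear. Its value at each bracket is the running sum $\Lambda(b_i \mid \alpha) = \sum_{j \le i} \alpha_j (b_j - b_{j-1})$. The sketch below assumes this parametrization, which is inferred from the output at the end of this tour rather than taken from EasyReg's source. With the reported $\alpha$ estimates it reproduces the "Integrated hazard" column (0.315630, 0.627730, 1.111497, 1.447022). It also shows two properties used in the text: setting all $\alpha_i = 1$ gives $\Lambda(t \mid \alpha) = t$, and $\Lambda$ is homogeneous of degree 1 in $\alpha$.

```python
def piecewise_linear_integrated_hazard(t, brackets, alphas):
    """Lambda(t | alpha) for 0 <= t <= b_M, with slope alpha_i on (b_{i-1}, b_i].

    Assumed parametrization (inferred from the reported output): the baseline
    hazard equals alpha_i on the i-th interval.
    """
    if not 0.0 <= t <= brackets[-1]:
        raise ValueError("t must lie in [0, b_M]")
    total = 0.0
    for i, alpha in enumerate(alphas, start=1):
        lower, upper = brackets[i - 1], brackets[i]
        if t <= lower:
            break
        total += alpha * (min(t, upper) - lower)
    return total

# Brackets and alpha estimates from the piecewise linear output listing.
brackets = [0.0, 0.5, 1.0, 2.0, 3.0]
alphas = [0.631260, 0.624200, 0.483767, 0.335525]
for b in brackets[1:]:
    print(b, round(piecewise_linear_integrated_hazard(b, brackets, alphas), 6))
```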

We are now done with this model. Thus, click "Done":

However, this is not the end of the guided tour! The question we need to address is: What have we learned from this exercise?

### Weibull baseline hazard

We have seen that the piecewise linear baseline hazard is decreasing. This suggests that a standard Weibull specification may be appropriate. To check this, I have re-estimated the model under the standard Weibull option for the baseline hazard:

Following the same procedures as before, we get the output involved:

So, which specification of the hazard is better? The standard Weibull specification is the most parsimonious one, but the piecewise linear specification is the most general one. To determine whether the additional parameters in the piecewise linear case pay off, we should compare the information criteria of the two models. The Hannan-Quinn and Schwarz criteria are the most relevant here because they are consistent: the number of parameters corresponding to the lowest value equals the correct number of parameters with probability converging to one as the sample size converges to infinity. The Akaike information criterion is not consistent in this sense.
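The three criteria can be computed from the log-likelihood $\ln L$, the number of parameters $k$ and the sample size $n$. The per-observation forms below, $(-2\ln L + 2k)/n$ for Akaike, $(-2\ln L + 2k\ln\ln n)/n$ for Hannan-Quinn and $(-2\ln L + k\ln n)/n$ for Schwarz, are the standard criteria divided by $n$. They reproduce the values in the piecewise linear output listing at the end of this tour. The smaller penalty $2k$ in Akaike, compared with $2k \ln\ln n$ and $k \ln n$, is what makes it inconsistent.

```python
import math

def information_criteria(loglik, k, n):
    """Per-observation Akaike, Hannan-Quinn and Schwarz criteria (lower is better)."""
    return {
        "Akaike": (-2.0 * loglik + 2.0 * k) / n,
        "Hannan-Quinn": (-2.0 * loglik + 2.0 * k * math.log(math.log(n))) / n,
        "Schwarz": (-2.0 * loglik + k * math.log(n)) / n,
    }

# Log-likelihood, number of parameters and sample size from the piecewise linear output.
ic = information_criteria(-2686.518534, k=9, n=1984)
for name, value in ic.items():
    print(f"{name:13s} {value:.6f}")
```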

The values of the Akaike, Hannan-Quinn and Schwarz criteria in the Weibull case are slightly higher than in the piecewise linear case, so we may conclude that the Weibull specification is not appropriate for this data set.

### Output for the piecewise linear integrated baseline hazard case

```
Proportional hazard model:

Dependent variables and their brackets:
Y(1) = Dummy MISDEMEANOR RECIDIVISM (years) in (0,0.5] Brackets:  0, .5
Y(2) = Dummy MISDEMEANOR RECIDIVISM (years) in (0.5,1] Brackets:  .5, 1
Y(3) = Dummy MISDEMEANOR RECIDIVISM (years) in (1,2] Brackets:  1, 2
Y(4) = Dummy MISDEMEANOR RECIDIVISM (years) in (2,3] Brackets:  2, 3

Covariates:
X(1) = MALE
X(2) = BLACK
X(3) = RELEASE
X(4) = AGE (DAYS/1000)
X(5) = SENT (DAYS/1000)

Chosen (sub-)sample: 1->1985
Effective (sub-)sample size: 1984

Hazard function option:
Piecewise linear integrated hazard

The log-likelihood function has been maximized using the simplex method
of Nelder and Mead. The algorithm involved is a Visual Basic translation
of the Fortran algorithm involved in:
W.H.Press, B.P.Flannery, S.A.Teukolsky and W.T.Vetterling, 'Numerical
Recipes', Cambridge University Press, 1986, pp. 292-293

Estimation results:
Parameters ML estimate t-value p-value Covariates
beta(1)       0.286449   3.155 0.00161 MALE
beta(2)       0.065508   1.000 0.31712 BLACK
beta(3)      -0.302968  -2.948 0.00320 RELEASE
beta(4)      -0.063995  -5.452 0.00000 AGE (DAYS/1000)
beta(5)      -0.111339  -2.981 0.00287 SENT (DAYS/1000)
alpha(1)      0.631260   5.346 0.00000
alpha(2)      0.624200   5.249 0.00000
alpha(3)      0.483767   5.267 0.00000
alpha(4)      0.335525   5.068 0.00000

The two-sided p-values are based on the normal approximation

Log-likelihood:            -2686.518534
Number of parameters:      9
Effective sample size (n): 1984
Information criteria:
Akaike:               2.717257
Hannan-Quinn:         2.726576
Schwarz:              2.742627

Bracket point Integrated hazard
0.5                    0.315630
1                      0.627730
2                      1.111497
3                      1.447022
```
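The output states that the two-sided p-values are based on the normal approximation, that is, $p = 2\left(1 - \Phi(|t|)\right) = \operatorname{erfc}\left(|t|/\sqrt{2}\right)$. The sketch below recomputes them from the printed t-values. Because those t-values are rounded to three decimals, small differences in the last digits are expected; for example, $t = 1.000$ gives about 0.31731 rather than the reported 0.31712.

```python
import math

def two_sided_normal_p_value(t_value):
    # 2 * (1 - Phi(|t|)) written with the complementary error function.
    return math.erfc(abs(t_value) / math.sqrt(2.0))

# t-values as printed in the output listing above.
for name, t in (("MALE", 3.155), ("BLACK", 1.000), ("RELEASE", -2.948),
                ("AGE", -5.452), ("SENT", -2.981)):
    print(f"{name:8s} t = {t:7.3f}  p = {two_sided_normal_p_value(t):.5f}")
```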

The interpretation of the estimation results is similar to the right-censored case. See the guided tour on right-censored proportional hazard models.
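As a brief illustration that follows directly from the model formula, the hazard is $\exp(\beta'X)\lambda(t \mid \alpha)$. A one-unit increase in covariate $j$ therefore multiplies the hazard by $\exp(\beta_j)$. Predicted survival at the bracket points is $S(b_i) = \exp(-\exp(\beta'X)\Lambda(b_i \mid \alpha))$, using the reported integrated hazard values. The covariate profile below (a male, not black, released on parole or probation, aged 10 thousand days, roughly 27 years, with a last sentence of 1000 days) is hypothetical.

```python
import math

# ML estimates and integrated hazard values from the piecewise linear output above.
beta = {"MALE": 0.286449, "BLACK": 0.065508, "RELEASE": -0.302968,
        "AGE (DAYS/1000)": -0.063995, "SENT (DAYS/1000)": -0.111339}
integrated_hazard = {0.5: 0.315630, 1: 0.627730, 2: 1.111497, 3: 1.447022}

# A one-unit increase in covariate j multiplies the hazard by exp(beta_j).
hazard_ratios = {name: math.exp(b) for name, b in beta.items()}

def survival_curve(x):
    """S(b_i | alpha, beta'X) at each bracket point for covariate values x."""
    xb = sum(beta[name] * value for name, value in x.items())
    return {t: math.exp(-math.exp(xb) * lam) for t, lam in integrated_hazard.items()}

# Hypothetical covariate profile, for illustration only.
profile = {"MALE": 1, "BLACK": 0, "RELEASE": 0,
           "AGE (DAYS/1000)": 10.0, "SENT (DAYS/1000)": 1.0}
for t, s in survival_curve(profile).items():
    print(f"P(no misdemeanor arrest within {t} years) = {s:.3f}")
```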