Guided tour on right-censored proportional hazard models

This guided tour contains mathematical formulas and Greek symbols. The original web page renders them with the "Symbol" font, which some web browsers (in particular Firefox) may not display. In that case the Roman letter "b" appears where the Greek letter beta is meant, and likewise "a" for alpha, "l" for lambda and "L" for capital Lambda. Internet Explorer, Opera and Google Chrome display the symbols correctly.

The proportional hazard model

The proportional hazard model is a popular model for durations, for example unemployment spells. Let T be the duration and let X be a vector of explanatory variables (called covariates). The proportional hazard model assumes that for t > 0,

P[ T > t | X ] = exp( -exp(β'X) Λ(t | α) ) = S(t | α, β'X),

say, where Λ(t | α) is the integrated baseline hazard function, depending on a parameter (vector) α, and exp(β'X) is the systematic hazard. The function S(t | α, β'X) is called the survival function.
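As a minimal sketch of the formula above, the survival function can be evaluated for any integrated baseline hazard passed in as a function. The function names, the coefficient values and the covariate values below are hypothetical and chosen only for illustration; the example uses the simplest integrated baseline hazard, equal to t.

```python
import math
from typing import Callable, Sequence


def survival(t: float, x: Sequence[float], beta: Sequence[float],
             integrated_hazard: Callable[[float], float]) -> float:
    """P[T > t | X = x] = exp(-exp(beta'x) * Lambda(t))."""
    index = sum(b * xi for b, xi in zip(beta, x))  # the linear index beta'x
    return math.exp(-math.exp(index) * integrated_hazard(t))


def unit_integrated_hazard(t: float) -> float:
    """Simplest integrated baseline hazard: Lambda(t) = t."""
    return t


# Hypothetical coefficients and covariates, for illustration only.
beta = [0.5, -0.2]
x = [1.0, 2.0]

s_half = survival(0.5, x, beta, unit_integrated_hazard)
s_one = survival(1.0, x, beta, unit_integrated_hazard)
print(s_half, s_one)  # survival probabilities fall as t grows
```

A larger value of the index beta'x scales the integrated hazard up and therefore lowers the survival probability at every t.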

EasyReg assumes that the duration T is right-censored: T is observed only up to a maximum Tmax, which may vary across observations but is treated as exogenous.
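The censoring rule can be sketched as follows. The helper name `censor` is hypothetical, and the coding of the dummy (1 = censored, 0 = not) follows the convention of the censoring dummy in the data set used later in this tour.

```python
from typing import Tuple


def censor(duration: float, t_max: float) -> Tuple[float, int]:
    """Return (observed duration, censoring dummy) for one observation.

    If the true duration exceeds t_max, only t_max is observed and the
    dummy is 1; otherwise the duration itself is observed and the dummy is 0.
    """
    if duration > t_max:
        return t_max, 1
    return duration, 0


# Hypothetical durations with a common maximum of 1.0, for illustration only.
observed = [censor(d, 1.0) for d in (0.4, 2.5)]
print(observed)
```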

Options for the baseline hazard function

EasyReg provides four options for the integrated baseline hazard function Λ(t | α): the standard Weibull baseline hazard and the three variants below.

• The generalized Weibull baseline hazard

The difference from the standard Weibull case is that in the standard case the baseline hazard λ(t | α) approaches infinity if α2 < 1 and t ↓ 0, whereas in the generalized Weibull case λ(t | α) stays bounded.

• The unimodal baseline hazard

Both Weibull hazards are monotonic. This hazard function, by contrast, takes a maximum at t = α2. Note that in this case λ(0 | α) = 0.

• The generalized unimodal baseline hazard

The difference from the previous case is that now λ(0 | α) > 0, whereas λ(t | α) is still maximal at t = α2.

The parameter α1 in these four cases plays the role of a scale parameter. Consequently, a constant may not be included in X, because α1 indirectly plays that role: exp(β'X)α1 = exp(ln(α1) + β'X).
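To spell out why a constant is redundant, the short derivation below assumes, in line with alpha(1) being a scale parameter, that the integrated baseline hazard factors as alpha(1) times a function of t and the remaining parameters. The symbols \Lambda_0 (that function) and \beta_0 (a hypothetical intercept) are notation introduced here for illustration only.

```latex
\begin{align*}
\exp(\beta_0 + \beta'X)\,\Lambda(t \mid \alpha)
  &= \exp(\beta_0 + \beta'X)\,\alpha_1\,\Lambda_0(t) \\
  &= \exp(\beta'X)\,\bigl(\alpha_1 e^{\beta_0}\bigr)\,\Lambda_0(t).
\end{align*}
```

Only the product of alpha(1) and exp(beta_0) enters the survival function, so the two cannot be estimated separately.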

For further information on the right-censored proportional hazard model, open SURVIVAL1.PDF.

How to estimate right-censored proportional hazard models via EasyReg

The data and selection of variables

The data for this demonstration is available as an Excel file in CSV format: SURVIVAL1.CSV. This is a subsample of released ex-convicts in Texas, from the larger data set used in:

• Bierens, H.J., and J.R. Carvalho (2007), "Semi-Nonparametric Competing Risk Analysis of Recidivism", forthcoming in the Journal of Applied Econometrics

If you import this data file into EasyReg and then open "Menu > Single equation models > right-censored proportional hazard models", the following window appears.

The information window will not be shown if you have downloaded this guided tour.

Click "Clear" and select all the variables in the model:

The variable MISDEMEANOR RECIDIVISM (DAYS/1000) is the time, in units of 1000 days, between the release from prison or jail and the first arrest after release. The dummy variable DUMMY RIGHT-CENSORING indicates whether the duration MISDEMEANOR RECIDIVISM (DAYS/1000) is right-censored (=1) or not (=0). If it is, the value of MISDEMEANOR RECIDIVISM (DAYS/1000) is the upper bound Tmax of the censoring period. The variable AGE (DAYS/1000) is the age of the ex-convict, in units of 1000 days, and the variable SENT (DAYS/1000) is the duration of the last sentence, also in units of 1000 days. The rescaling to units of 1000 days is done for numerical reasons. The dummy variables MALE and BLACK do not need an explanation. The dummy variable RELEASE is equal to 1 if the ex-convict was unconditionally released, and is equal to 0 if the release was on parole or probation. In this subsample, every ex-convict whose duration is not censored was arrested for a misdemeanor only. Therefore, this subsample is far from representative of recidivism in Texas, because it is inconceivable that felony recidivism does not occur in Texas.

Click "Selection OK":

In this example I will not choose a subsample: Click "No" and "Continue":

We have to select the duration first. Double-click it and click "O.K.":

Next, double-click the right-censoring dummy variable and click "O.K.":

EasyReg selects automatically the other variables as the covariates. Click "Continue":

As said before, there are four options for the baseline hazard. I will select the standard Weibull baseline hazard. Thus, click "Choose this option":

EasyReg maximizes the log-likelihood function in two steps. In the first step the parameters αi are fixed to 1, which corresponds to the Weibull integrated hazard Λ(t | α) = t. The initial values of the β's are set to zero. At this stage you are allowed to change the lower and upper bounds and initial values of the β's.
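The objective being maximized can be sketched as below. This is not EasyReg's code: it assumes the common Weibull parameterization in which the integrated baseline hazard is alpha(1) times t to the power alpha(2), which reduces to t when both alphas equal 1, as in the first step. The function name and the toy data are hypothetical.

```python
import math
from typing import Sequence


def log_likelihood(beta: Sequence[float], alpha1: float, alpha2: float,
                   durations: Sequence[float], censored: Sequence[int],
                   covariates: Sequence[Sequence[float]]) -> float:
    """Right-censored proportional hazard log-likelihood (Weibull baseline).

    Assumed forms: Lambda(t) = alpha1 * t**alpha2 and
    lambda(t) = alpha1 * alpha2 * t**(alpha2 - 1), for t > 0.
    """
    total = 0.0
    for t, c, x in zip(durations, censored, covariates):
        index = sum(b * xi for b, xi in zip(beta, x))  # beta'x
        # Every observation contributes log S(t) = -exp(beta'x) * Lambda(t).
        total -= math.exp(index) * alpha1 * t ** alpha2
        if c == 0:
            # Uncensored observations add the log hazard:
            # beta'x + log(lambda(t)).
            total += index + math.log(alpha1 * alpha2) + (alpha2 - 1.0) * math.log(t)
    return total


# Hypothetical toy data: three durations, the second one censored.
durations = [0.5, 1.0, 2.0]
censored = [0, 1, 0]
covariates = [[1.0], [0.0], [1.0]]

# Starting point of the first step: alphas fixed at 1, beta equal to zero.
start_value = log_likelihood([0.0], 1.0, 1.0, durations, censored, covariates)
print(start_value)
```

At this starting point the log hazard terms vanish, so the value is simply minus the sum of the observed durations.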

The log-likelihood function will now be maximized using the simplex method of Nelder and Mead. See for example

• W.H.Press, B.P.Flannery, S.A.Teukolsky and W.T.Vetterling (1986), Numerical Recipes, Cambridge University Press, pp. 292-293.

Click "Start SIMPLEX iteration":

Leave the "Auto restart" box checked. Then EasyReg will automatically restart the simplex iteration from the last solution until the log-likelihood and/or the parameters do not change anymore. The option "Batch mode" is only useful if your data set is very large, so that you have to run this module overnight. Thus, click "Start":

We are done with the first stage. Thus, click "Done".

These are the start values for the second and final step. You can no longer change the parameter bounds and start values. Thus, click "Restart SIMPLEX iteration" and then "Start".

You may restart the iteration, but it is unlikely that you will get a further improvement. Thus, click "Done":

Click "Continue" to compute the scores of the log-likelihood function and the asymptotic variance matrix of the ML estimates. Then the "What to do next?" module will be activated.

In this window I have clicked the "Option" button, which opens the options menu. The only option I will demonstrate is the option to plot the baseline hazard:

Output for the Weibull baseline hazard case

```
Selected variables:

MISDEMEANOR RECIDIVISM (DAYS/1000)
DUMMY RIGHT-CENSORING
MALE
BLACK
RELEASE
AGE (DAYS/1000)
SENT (DAYS/1000)

First available observation: t = 1
Last available observation: t = 1985
Number of missing variables in between:  1
= chosen

Duration:
T = MISDEMEANOR RECIDIVISM (DAYS/1000)

Censoring dummy variable:
C = DUMMY RIGHT-CENSORING (1=>censored, 0=>not)

Covariates:
X(1) = MALE
X(2) = BLACK
X(3) = RELEASE
X(4) = AGE (DAYS/1000)
X(5) = SENT (DAYS/1000)

Effective sample size (n) = 1984

Hazard function option:
Weibull baseline hazard

The log-likelihood function has been maximized using the simplex method
of Nelder and Mead. The algorithm involved is a Visual Basic translation
of the Fortran algorithm involved in:
W.H.Press, B.P.Flannery, S.A.Teukolsky and W.T.Vetterling, 'Numerical
Recipes', Cambridge University Press, 1986, pp. 292-293

Estimation results:
Parameters   ML estimate  t-value  p-value  Covariates
beta(1)         0.215900    2.604  0.00922  MALE
beta(2)         0.050520    0.836  0.40325  BLACK
beta(3)        -0.431479   -4.767  0.00000  RELEASE
beta(4)        -0.087980   -7.882  0.00000  AGE (DAYS/1000)
beta(5)        -0.106984   -3.127  0.00177  SENT (DAYS/1000)
alpha(1)        1.657058    6.037  0.00000
alpha(2)        0.722742   25.168  0.00000

The two-sided p-values are based on the normal approximation

Log-likelihood:            -1663.377020
Number of parameters:      7
Effective sample size (n): 1984
Information criteria:
Akaike:               1.683848
Hannan-Quinn:         1.691096
Schwarz:              1.703581
```
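The summary statistics in this output can be checked by hand. The sketch below assumes that the information criteria are the usual penalized values of minus twice the log-likelihood, divided by the sample size n, and that the p-values come from the standard normal distribution; under these assumptions it reproduces the reported criteria to the printed precision. The function names are hypothetical.

```python
import math
from typing import Dict


def information_criteria(log_lik: float, k: int, n: int) -> Dict[str, float]:
    """Akaike, Hannan-Quinn and Schwarz criteria, each divided by n."""
    base = -2.0 * log_lik
    return {
        "Akaike": (base + 2.0 * k) / n,
        "Hannan-Quinn": (base + 2.0 * k * math.log(math.log(n))) / n,
        "Schwarz": (base + k * math.log(n)) / n,
    }


def two_sided_p_value(t_value: float) -> float:
    """Two-sided p-value based on the standard normal approximation."""
    return math.erfc(abs(t_value) / math.sqrt(2.0))


# Values taken from the output above.
criteria = information_criteria(-1663.377020, 7, 1984)
p_male = two_sided_p_value(2.604)

for name, value in criteria.items():
    print(f"{name}: {value:.6f}")
print(f"p-value for MALE: {p_male:.5f}")
```

Because the printed t-values are rounded to three decimals, p-values recomputed from them can differ from the reported ones in the last digit.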

Conclusions

Although this is only a demonstration of how to use the module under review, some interesting but tentative conclusions can be drawn from these results. First, race does not have a significant effect on recidivism. Both covariates AGE (DAYS/1000) and SENT (DAYS/1000) have significant negative coefficients, which implies that the survival function of recidivism time T = MISDEMEANOR RECIDIVISM (DAYS/1000)

P[ T > t | X ] = exp( -exp(β'X) Λ(t | α) ) = S(t | α, β'X),

is increasing in the components AGE (DAYS/1000) and SENT (DAYS/1000) of X. This means that recidivism is negatively related to age and sentence time. Males have a higher risk of recidivism than females, and ex-convicts released on probation or parole have a higher risk of recidivism than those released unconditionally. The latter seems counter-intuitive, but this may be due to the fact that parole violations are classified as misdemeanor offences.
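One way to read the size of these effects: in a proportional hazard model the hazard is multiplied by the exponential of a coefficient when the corresponding covariate increases by one unit, all else equal. The sketch below applies this to the reported estimates; the variable names are hypothetical.

```python
import math

# ML estimates of beta(1)..beta(5) from the output above.
estimates = {
    "MALE": 0.215900,
    "BLACK": 0.050520,
    "RELEASE": -0.431479,
    "AGE (DAYS/1000)": -0.087980,
    "SENT (DAYS/1000)": -0.106984,
}

# Multiplicative effect on the hazard of a one-unit increase in each covariate.
hazard_ratios = {name: math.exp(b) for name, b in estimates.items()}

for name, ratio in hazard_ratios.items():
    print(f"{name:18s} {ratio:.3f}")
```

A ratio above 1 (MALE) means a higher hazard of arrest, and a ratio below 1 (RELEASE, AGE, SENT) a lower one. For AGE and SENT one unit is 1000 days.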

Moreover, the hazard of recidivism is decreasing over time, as indicated by the estimate of α2 (0.72) being below 1. This means that, conditional on the covariates and on the event that at time t the ex-convict has not yet been arrested, the probability that he/she will be arrested in a short (fixed-length) period [t, t + δ) thereafter decreases over time t.
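The shape of the estimated baseline hazard can be illustrated numerically. The sketch below assumes the common Weibull parameterization, in which the baseline hazard equals alpha(1) times alpha(2) times t to the power alpha(2) minus 1; EasyReg's exact parameterization is documented in SURVIVAL1.PDF and may differ. The function name and the evaluation times are hypothetical.

```python
# ML estimates of alpha(1) and alpha(2) from the output above.
ALPHA1 = 1.657058
ALPHA2 = 0.722742


def baseline_hazard(t: float) -> float:
    """Assumed Weibull baseline hazard: alpha1 * alpha2 * t**(alpha2 - 1)."""
    return ALPHA1 * ALPHA2 * t ** (ALPHA2 - 1.0)


# Evaluation times in units of 1000 days (0.1 corresponds to 100 days).
times = [0.1, 0.5, 1.0, 2.0]
hazards = [baseline_hazard(t) for t in times]

for t, h in zip(times, hazards):
    print(f"t = {t:.1f}: baseline hazard = {h:.3f}")
```

With alpha(2) below 1 the exponent of t is negative, so the hazard falls as t increases.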