Guided tour on right-censored proportional hazard models

This guided tour contains mathematical formulas and/or Greek symbols and is therefore best viewed with Internet Explorer, Opera or Google Chrome, because other web browsers (in particular Firefox) may not display the "Symbol" fonts involved. For example, "b" should be displayed as the Greek letter "beta" rather than the Roman "b". If it is not, reload this guided tour in Internet Explorer, Opera or Google Chrome, or make one of them your default web browser.

The proportional hazard model

The proportional hazard model is a popular model for durations, for example unemployment spells. Let T be the duration and let X be a vector of explanatory variables (called covariates). The proportional hazard model assumes that for t > 0,

P[ T > t | X ] = exp( -exp(β'X) Λ(t | α) ) = S(t | α, β'X),

say, where Λ(t | α) is the integrated baseline hazard function, depending on a parameter (vector) α, and exp(β'X) is the systematic hazard. The function S(t | α, β'X) is called the survival function.
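As a minimal sketch of the formula above, the survival function can be evaluated numerically as follows. The function name `survival`, the coefficient values and the choice of an integrated baseline hazard equal to t are illustrative assumptions, not part of EasyReg.

```python
import math

def survival(t, beta, x, integrated_baseline):
    """Return P[T > t | X = x] = exp(-exp(beta'x) * Lambda(t))."""
    index = sum(b * xi for b, xi in zip(beta, x))  # beta'x
    return math.exp(-math.exp(index) * integrated_baseline(t))

# Illustrative values only (assumptions, not estimates from this tour).
beta = [0.5, -0.2]
x = [1.0, 2.0]
unit_baseline = lambda t: t  # assumed integrated baseline hazard Lambda(t) = t

for t in (0.0, 0.5, 1.0, 2.0):
    print(t, survival(t, beta, x, unit_baseline))
```

The survival probability starts at 1 for t = 0 and falls as t grows, and a larger value of beta'x makes it fall faster.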

EasyReg assumes that the duration T is right-censored: T is observed only up to a maximum Tmax, which may vary across observations but is treated as exogenous.
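The following sketch shows how right-censoring usually enters a log-likelihood: a censored observation contributes the log of the survival probability at Tmax, and an uncensored one contributes the log of the density at the observed duration. This is the textbook form under exogenous censoring. It is assumed, not confirmed, that EasyReg's likelihood has this shape, and the function and argument names are hypothetical.

```python
import math

def log_likelihood(durations, censored, covariates, beta, Lambda, lam):
    """Log-likelihood of a right-censored proportional hazard model.

    Lambda(t) is the integrated baseline hazard and lam(t) its derivative.
    censored[i] == 1 means durations[i] is the censoring bound Tmax.
    """
    total = 0.0
    for t, c, x in zip(durations, censored, covariates):
        index = sum(b * xi for b, xi in zip(beta, x))   # beta'x
        log_surv = -math.exp(index) * Lambda(t)          # ln S(t)
        if c == 1:
            total += log_surv                            # only T > Tmax is known
        else:
            total += index + math.log(lam(t)) + log_surv # ln hazard + ln S(t)
    return total

# Toy data (assumption): two observations, one covariate, Lambda(t) = t.
value = log_likelihood(
    durations=[1.0, 2.0], censored=[0, 1], covariates=[[0.0], [0.0]],
    beta=[0.0], Lambda=lambda t: t, lam=lambda t: 1.0,
)
print(value)
```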

Options for the baseline hazard function

EasyReg provides four options for the integrated baseline hazard function Λ(t | α).

The parameter α₁ in these four cases plays the role of a scale parameter. Consequently, X may not include a constant, because α₁ indirectly plays that role: exp(β'X)α₁ = exp(ln(α₁) + β'X).
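The identity above means that a constant in X and the scale parameter cannot be told apart. The sketch below illustrates this numerically. It assumes that the integrated baseline hazard factors as the scale parameter times some function g(t); the function g and all numbers are hypothetical.

```python
import math

def survival(t, alpha1, beta, x, g):
    # Assumption: Lambda(t | alpha) = alpha1 * g(t) for some shape function g.
    index = sum(b * xi for b, xi in zip(beta, x))
    return math.exp(-math.exp(index) * alpha1 * g(t))

g = lambda t: t ** 0.8       # hypothetical shape function
x_with_const = [1.0, 0.5]    # first entry is a constant term

# Parameter set A: intercept 0.4 and scale 2.0.
s_a = survival(1.3, 2.0, [0.4, -0.3], x_with_const, g)
# Parameter set B: intercept 0 and scale 2.0 * exp(0.4).
s_b = survival(1.3, 2.0 * math.exp(0.4), [0.0, -0.3], x_with_const, g)

print(s_a, s_b)  # the two parameter sets give the same survival probability
```

Because both parameter sets imply the same survival function, the data cannot distinguish them, which is why the constant has to be left out.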

For further information on the right-censored proportional hazard model, open SURVIVAL1.PDF.

How to estimate right-censored proportional hazard models via EasyReg

The data and selection of variables

The data for this demonstration is available as an Excel file in CSV format: SURVIVAL1.CSV. This is a subsample of released ex-convicts in Texas, from the larger data set used in:

If you import this data file in EasyReg and then open "Menu > Single equation models > right-censored proportional hazard models", the following window appears.

SURVIVAL window

The information window will not be shown if you have downloaded this guided tour.

Click "Clear" and select all the variables in the model:

SURVIVAL window

The variable MISDEMEANOR RECIDIVISM (DAYS/1000) is the time, in units of 1000 days, between release from prison or jail and the first arrest after release. The dummy variable DUMMY RIGHT-CENSORING indicates whether the duration MISDEMEANOR RECIDIVISM (DAYS/1000) is right-censored (=1) or not (=0). If it is, the value of MISDEMEANOR RECIDIVISM (DAYS/1000) is the upper bound Tmax of the censoring period. The variable AGE (DAYS/1000) is the age of the ex-convict, in units of 1000 days, and the variable SENT (DAYS/1000) is the duration of the last sentence, also in units of 1000 days. The rescaling to units of 1000 days is done for numerical reasons. The dummy variables MALE and BLACK need no explanation. The dummy variable RELEASE is equal to 1 if the ex-convict was released unconditionally, and equal to 0 if the release was on parole or probation. All ex-convicts in this subsample who are not censored were arrested for a misdemeanor only. Therefore, this subsample is far from representative of recidivism in Texas, because it is inconceivable that felony recidivism does not occur in Texas.

Click "Selection OK":

SURVIVAL window

In this example I will not choose a subsample: Click "No" and "Continue":

SURVIVAL window

We have to select the duration first. Double-click it and click "O.K.":

SURVIVAL window

Next, double-click the right-censoring dummy variable and click "O.K.":

SURVIVAL window

EasyReg automatically selects the remaining variables as the covariates. Click "Continue":

SURVIVAL window

As said before, there are four options for the baseline hazard. I will select the standard Weibull baseline hazard. Thus, click "Choose this option":

SURVIVAL window

EasyReg maximizes the log-likelihood function in two steps. In the first step the parameters αi are fixed at 1, which corresponds to the Weibull integrated hazard Λ(t | α) = t. The initial values of the β's are set to zero. At this stage you are allowed to change the lower and upper bounds and initial values of the β's.

The log-likelihood function will now be maximized using the simplex method of Nelder and Mead. See, for example, the Numerical Recipes reference cited in the output further below.

Click "Start SIMPLEX iteration":

SURVIVAL window

Leave the "Auto restart" box checked. EasyReg will then automatically restart the simplex iteration from the last solution until the log-likelihood and/or the parameters no longer change. The option "Batch mode" is only useful if your data set is so large that you have to run this module overnight. Thus, click "Start":
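The auto-restart idea can be sketched as follows: run a simplex search, restart it from the solution found, and stop once neither the objective nor the parameters change. The code uses a simplified Nelder-Mead variant written for illustration. It is not EasyReg's Visual Basic routine, and the function names, tolerances and toy objective are all assumptions.

```python
def nelder_mead(f, x0, step=0.1, max_iter=500, tol=1e-12):
    """Simplified Nelder-Mead minimizer (reflect, expand, contract, shrink)."""
    n = len(x0)
    simplex = [list(x0)]
    for i in range(n):                       # x0 plus one offset point per axis
        p = list(x0)
        p[i] += step
        simplex.append(p)
    for _ in range(max_iter):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        if abs(f(worst) - f(best)) < tol:
            break
        # Centroid of all points except the worst one.
        c = [sum(p[i] for p in simplex[:-1]) / n for i in range(n)]
        refl = [c[i] + (c[i] - worst[i]) for i in range(n)]
        if f(refl) < f(best):
            expd = [c[i] + 2.0 * (c[i] - worst[i]) for i in range(n)]
            simplex[-1] = expd if f(expd) < f(refl) else refl
        elif f(refl) < f(simplex[-2]):
            simplex[-1] = refl
        else:
            con = [c[i] + 0.5 * (worst[i] - c[i]) for i in range(n)]
            if f(con) < f(worst):
                simplex[-1] = con
            else:                            # shrink all points toward the best
                simplex = [best] + [
                    [best[i] + 0.5 * (p[i] - best[i]) for i in range(n)]
                    for p in simplex[1:]
                ]
    simplex.sort(key=f)
    return simplex[0]

def maximize_with_restarts(loglik, x0, tol=1e-6, max_restarts=50):
    """Restart the simplex search from the last solution until nothing changes."""
    negative = lambda p: -loglik(p)
    x, value = list(x0), loglik(x0)
    for _ in range(max_restarts):
        x_new = nelder_mead(negative, x)
        value_new = loglik(x_new)
        moved = max(abs(a - b) for a, b in zip(x, x_new))
        gained = value_new - value
        x, value = x_new, value_new
        if gained < tol and moved < tol:
            break
    return x, value

# Toy concave objective standing in for a log-likelihood (assumption).
toy_loglik = lambda p: -(p[0] - 1.0) ** 2 - (p[1] + 2.0) ** 2
estimate, value = maximize_with_restarts(toy_loglik, [0.0, 0.0])
print(estimate, value)
```

Restarting rebuilds a fresh simplex around the current solution, which helps when the previous simplex has collapsed before reaching the optimum.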

SURVIVAL window

We are done with the first stage. Thus, click "Done".

SURVIVAL window

These are the start values for the second and final step. You can no longer change the parameter bounds and start values. Thus, click "Restart SIMPLEX iteration" and then "Start".

SURVIVAL window

You may restart the iteration, but it is unlikely that you will get a further improvement. Thus, click "Done":

SURVIVAL window

Click "Continue" to compute the scores of the log-likelihood function and the asymptotic variance matrix of the ML estimates. The "What to do next?" module will then be activated.

SURVIVAL window

In this window I have clicked the "Option" button, which opens the options menu. The only option I will demonstrate is the option to plot the baseline hazard:

SURVIVAL window

Output for the Weibull baseline hazard case

Selected variables:

MISDEMEANOR RECIDIVISM (DAYS/1000)
DUMMY RIGHT-CENSORING
MALE
BLACK
RELEASE
AGE (DAYS/1000)
SENT (DAYS/1000)

First available observation: t = 1 
Last available observation: t = 1985 
Number of missing variables in between:  1
= chosen

Duration:
T = MISDEMEANOR RECIDIVISM (DAYS/1000)

Censoring dummy variable:
C = DUMMY RIGHT-CENSORING (1=>censored, 0=>not)


Covariates:
X(1) = MALE
X(2) = BLACK
X(3) = RELEASE
X(4) = AGE (DAYS/1000)
X(5) = SENT (DAYS/1000)

Effective sample size (n) = 1984

Hazard function option: 
Weibull baseline hazard


The log-likelihood function has been maximized using the simplex method
of Nelder and Mead. The algorithm involved is a Visual Basic translation
of the Fortran algorithm involved in:
W.H.Press, B.P.Flannery, S.A.Teukolsky and W.T.Vetterling, 'Numerical
Recipes', Cambridge University Press, 1986, pp. 292-293

Estimation results:
Parameters   ML estimate  t-value  p-value  Covariates
beta(1)         0.215900    2.604  0.00922  MALE
beta(2)         0.050520    0.836  0.40325  BLACK
beta(3)        -0.431479   -4.767  0.00000  RELEASE
beta(4)        -0.087980   -7.882  0.00000  AGE (DAYS/1000)
beta(5)        -0.106984   -3.127  0.00177  SENT (DAYS/1000)
alpha(1)        1.657058    6.037  0.00000
alpha(2)        0.722742   25.168  0.00000

The two-sided p-values are based on the normal approximation

Log-likelihood:            -1663.377020
Number of parameters:      7
Effective sample size (n): 1984
Information criteria:      
     Akaike:               1.683848
     Hannan-Quinn:         1.691096
     Schwarz:              1.703581
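The reported information criteria appear to be the usual penalized log-likelihood measures divided by the sample size. The sketch below recomputes them from the log-likelihood, the number of parameters and n given in the output. The formulas are an inference from those numbers, not taken from EasyReg's documentation.

```python
import math

log_lik = -1663.377020   # log-likelihood from the output
k = 7                    # number of parameters
n = 1984                 # effective sample size

# Assumed definitions: (-2 ln L + penalty) / n.
akaike = (-2.0 * log_lik + 2.0 * k) / n
hannan_quinn = (-2.0 * log_lik + 2.0 * k * math.log(math.log(n))) / n
schwarz = (-2.0 * log_lik + k * math.log(n)) / n

print(f"Akaike:       {akaike:.6f}")
print(f"Hannan-Quinn: {hannan_quinn:.6f}")
print(f"Schwarz:      {schwarz:.6f}")
```

Under these assumed definitions the three values agree with the figures printed in the output to the reported precision.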

Conclusions

Although this is only a demonstration of how to use the module under review, some interesting but tentative conclusions can be drawn from these results. First, race does not have a significant effect on recidivism. Both covariates AGE (DAYS/1000) and SENT (DAYS/1000) have significant negative coefficients, which implies that the survival function of recidivism time T = MISDEMEANOR RECIDIVISM (DAYS/1000)

P[ T > t | X ] = exp( -exp(β'X) Λ(t | α) ) = S(t | α, β'X),

is increasing in the components AGE (DAYS/1000) and SENT (DAYS/1000) of X. This means that recidivism is negatively related to age and sentence time. Similarly, males have a higher risk of recidivism than females, and the ex-convicts released on probation or parole have a higher risk of recidivism than those released unconditionally. The latter seems counter-intuitive, but this may be due to the fact that parole violations are classified as misdemeanor offences.
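One way to read the coefficients is as hazard ratios. In a proportional hazard model the hazard is proportional to the exponential of the linear index, so a one-unit increase in a covariate multiplies the hazard by the exponential of its coefficient. The sketch below applies this to the estimates in the output; the dictionary layout is only for illustration.

```python
import math

# ML estimates copied from the estimation output.
estimates = {
    "MALE": 0.215900,
    "BLACK": 0.050520,
    "RELEASE": -0.431479,
    "AGE (DAYS/1000)": -0.087980,
    "SENT (DAYS/1000)": -0.106984,
}

# Hazard ratio for a one-unit increase in each covariate.
hazard_ratios = {name: math.exp(b) for name, b in estimates.items()}

for name, ratio in hazard_ratios.items():
    print(f"{name:18s} {ratio:.3f}")
```

A ratio above 1 means a higher hazard of re-arrest and a ratio below 1 a lower one. For AGE and SENT, one unit is 1000 days.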

Moreover, the hazard of recidivism is decreasing over time. This means that, conditional on the covariates and the event that at time t the ex-convict has not yet been arrested, the probability that he/she will be arrested in a short (fixed-length) period [t, t + δ) thereafter decreases over time t.
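The decreasing hazard can be illustrated with a short sketch. It assumes the common Weibull parameterization in which the integrated baseline hazard is alpha(1) times t raised to the power alpha(2), so that alpha(2) is the shape parameter. This tour does not spell out EasyReg's exact form, so treat the formula as an assumption.

```python
import math

# Estimates copied from the output; their roles (scale, shape) are assumed.
alpha1, alpha2 = 1.657058, 0.722742

def integrated_hazard(t):
    # Assumed Weibull form: Lambda(t | alpha) = alpha1 * t ** alpha2.
    return alpha1 * t ** alpha2

def hazard(t):
    # Derivative of the assumed integrated hazard with respect to t.
    return alpha1 * alpha2 * t ** (alpha2 - 1.0)

for t in (0.25, 0.5, 1.0, 2.0):   # time in units of 1000 days
    print(f"t = {t:4.2f}  baseline hazard = {hazard(t):.4f}")
```

Under this assumed form, a shape estimate below 1 gives a baseline hazard that falls as t increases.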

This is the end of the guided tour on right-censored proportional hazard models