This guided tour contains mathematical formulas and Greek symbols and is therefore best viewed with Internet Explorer, Opera or Google Chrome, because other web browsers (in particular Firefox) may not display the "Symbol" fonts involved. If a Greek letter is not displayed correctly (for example, if "beta" appears as the Roman "b"), reload this guided tour in Internet Explorer, Opera or Google Chrome, or make one of them your default web browser.
The proportional hazard model is a popular model for durations, for example unemployment spells. Let T be the duration and let X be a vector of explanatory variables (called covariates). The proportional hazard model assumes that for t > 0,
say, where Λ(t | α) is the integrated baseline hazard function, depending on a parameter (vector) α, and exp(β'X) is the systematic hazard. The function
EasyReg assumes that the duration T is right-censored: T is observed only up to a maximum T_{max}, which may vary per observation but is considered to be exogenous.
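To make the role of right-censoring concrete, the following minimal sketch computes the log-likelihood of a right-censored sample. It assumes the textbook form of the model, P(T > t | X) = exp(-exp(β'X) Λ(t | α)), and, purely for illustration, a Weibull integrated baseline hazard Λ(t | α) = α_{1} t^{α_{2}}. The function names and the toy data are hypothetical and are not part of EasyReg.

```python
import math

def integrated_baseline_hazard(t, a1, a2):
    # Assumed Weibull form a1 * t**a2 (illustrative, not taken from EasyReg).
    return a1 * t ** a2

def baseline_hazard(t, a1, a2):
    # Derivative of the assumed integrated baseline hazard with respect to t.
    return a1 * a2 * t ** (a2 - 1.0)

def log_likelihood(durations, censored, covariates, beta, a1, a2):
    """Right-censored proportional hazard log-likelihood.

    durations[i] is the observed duration, or the censoring bound T_max
    when censored[i] == 1; covariates[i] is the covariate vector X.
    """
    total = 0.0
    for y, c, x in zip(durations, censored, covariates):
        index = sum(b * xj for b, xj in zip(beta, x))
        # Log survival probability: every spell lasted at least up to y.
        total -= math.exp(index) * integrated_baseline_hazard(y, a1, a2)
        if c == 0:
            # Uncensored spell: add the log hazard at the observed duration.
            total += index + math.log(baseline_hazard(y, a1, a2))
    return total

# Hypothetical toy data: three spells, the last one censored at T_max = 2.0.
durations = [0.4, 1.3, 2.0]
censored = [0, 0, 1]
covariates = [[1.0], [0.0], [1.0]]
print(log_likelihood(durations, censored, covariates, beta=[0.2], a1=1.5, a2=0.7))
```

A censored observation contributes only its survival probability up to T_{max}, whereas an uncensored observation also contributes the hazard at the observed duration.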
EasyReg provides four options for the integrated baseline hazard function Λ(t | α).
The difference with the standard Weibull case is that in the standard case the baseline hazard
Both Weibull hazards are monotonic.
This hazard function takes a maximum at t = α_{2}.
Note that in this case
The difference with the previous case is that now
The parameter α_{1} in these four cases plays the role of a scale parameter. Consequently, it is not allowed to include a constant in X, because α_{1} indirectly plays that role:
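The identification problem behind this restriction can be checked numerically. The sketch below assumes the textbook survival function exp(-exp(β'X) Λ(t | α)) and a hypothetical baseline of the form Λ(t | α) = α_{1} t^{α_{2}}. It shows that adding a constant c to β'X gives the same survival probability as multiplying α_{1} by exp(c), so the two cannot be estimated separately. All numbers are made up.

```python
import math

def survival(t, x, beta, a1, a2, constant=0.0):
    """Assumed survival function exp(-exp(constant + beta'x) * a1 * t**a2)."""
    index = constant + sum(b * xj for b, xj in zip(beta, x))
    return math.exp(-math.exp(index) * a1 * t ** a2)

x = [1.0, 2.3]        # hypothetical covariate values
beta = [0.2, -0.1]    # hypothetical slope coefficients
c = 0.4               # hypothetical constant term

with_constant = survival(0.5, x, beta, a1=1.5, a2=0.7, constant=c)
rescaled = survival(0.5, x, beta, a1=1.5 * math.exp(c), a2=0.7)

# Both parameterizations describe exactly the same model.
print(math.isclose(with_constant, rescaled, rel_tol=1e-12))
```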
For further information on the right-censored proportional hazard model, open SURVIVAL1.PDF.
The data for this demonstration is available as an Excel file in CSV format: SURVIVAL1.CSV. This is a subsample of released ex-convicts in Texas, from the larger data set used in:
The information window will not be shown if you have downloaded this guided tour.
Click "Clear" and select all the variables in the model:
The variable MISDEMEANOR RECIDIVISM (DAYS/1000) is the time, in units of 1000 days, between the release from prison or jail and the first arrest after release. The dummy variable DUMMY RIGHT-CENSORING indicates whether the duration MISDEMEANOR RECIDIVISM (DAYS/1000) is right-censored (=1) or not (=0). If it is right-censored, the value of MISDEMEANOR RECIDIVISM (DAYS/1000) is the upper bound T_{max} of the censoring period. The variable AGE (DAYS/1000) is the age of the ex-convict, in units of 1000 days, and the variable SENT (DAYS/1000) is the duration of the last sentence, also in units of 1000 days. The rescaling in units of 1000 days is done for numerical reasons. The dummy variables MALE and BLACK do not need an explanation. The dummy variable RELEASE is equal to 1 if the ex-convict was unconditionally released, and is equal to 0 if the release was on parole or probation. All Texas ex-convicts in this subsample who are not censored were arrested for a misdemeanor only. Therefore, this subsample is far from representative of recidivism in Texas, because it is inconceivable that felony recidivism does not occur in Texas.
Click "Selection OK":
In this example I will not choose a subsample: Click "No" and "Continue":
We have to select the duration first. Double-click it and click "O.K.":
Next, double-click the right-censoring dummy variable and click "O.K.":
EasyReg selects automatically the other variables as the covariates. Click "Continue":
As said before, there are four options for the baseline hazard. I will select the standard Weibull baseline hazard. Thus, click "Choose this option":
EasyReg maximizes the log-likelihood function in two steps. In the first step the parameters α_{i} are fixed to 1, which corresponds to the Weibull integrated hazard
The log-likelihood function will now be maximized using the simplex method of Nelder and Mead. See for example
Leave the "Auto restart" box checked. Then EasyReg will automatically restart the simplex iteration from the last solution until the log-likelihood and/or the parameters do not change anymore. The option "Batch mode" is only useful if your data set is very large, so that you have to run this module overnight. Thus, click "Start":
We are done with the first stage. Thus, click "Done".
These are the start values for the second and final step. You can no longer change the parameter bounds and start values. Thus, click "Restart SIMPLEX iteration" and then "Start".
You may restart the iteration, but it is unlikely that you will get a further improvement. Thus, click "Done":
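The two-step maximization with automatic restarts can be sketched outside EasyReg as follows. This is a minimal illustration, not EasyReg's Visual Basic code: the simplex routine is a simplified Nelder-Mead variant, the Weibull form Λ(t | α) = α_{1} t^{α_{2}} is assumed, the α's are kept positive by optimizing over their logarithms (my own choice), and the eight observations with a single dummy covariate are made up.

```python
import math

def log_likelihood(params, durations, censored, x):
    """Right-censored log-likelihood; params = (beta, log a1, log a2)."""
    beta, log_a1, log_a2 = params
    a2 = math.exp(min(log_a2, 50.0))  # cap guards against overflow
    total = 0.0
    for y, c, xi in zip(durations, censored, x):
        # Log of exp(beta*x) * a1 * y**a2 under the assumed Weibull form.
        log_cum = beta * xi + log_a1 + a2 * math.log(y)
        total -= math.exp(min(log_cum, 700.0))  # cap guards against overflow
        if c == 0:
            total += log_cum + log_a2 - math.log(y)  # log hazard if uncensored
    return total

def nelder_mead(f, start, step=0.5, max_iter=5000, tol=1e-12):
    """Simplified Nelder-Mead minimizer; returns (best_point, best_value)."""
    n = len(start)
    simplex = [list(start)]
    for i in range(n):
        vertex = list(start)
        vertex[i] += step
        simplex.append(vertex)
    values = [f(v) for v in simplex]
    for _ in range(max_iter):
        order = sorted(range(n + 1), key=lambda k: values[k])
        simplex = [simplex[k] for k in order]
        values = [values[k] for k in order]
        if values[-1] - values[0] < tol:
            break
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]

        def point(coef):
            # Returns centroid + coef * (worst - centroid).
            return [m + coef * (w - m) for m, w in zip(centroid, worst)]

        reflected = point(-1.0)
        f_reflected = f(reflected)
        if f_reflected < values[0]:
            expanded = point(-2.0)
            f_expanded = f(expanded)
            if f_expanded < f_reflected:
                simplex[-1], values[-1] = expanded, f_expanded
            else:
                simplex[-1], values[-1] = reflected, f_reflected
        elif f_reflected < values[-2]:
            simplex[-1], values[-1] = reflected, f_reflected
        else:
            contracted = point(0.5)
            f_contracted = f(contracted)
            if f_contracted < values[-1]:
                simplex[-1], values[-1] = contracted, f_contracted
            else:
                # Shrink all vertices halfway toward the best one.
                best = simplex[0]
                simplex = [best] + [[(b + q) / 2.0 for b, q in zip(best, v)]
                                    for v in simplex[1:]]
                values = [values[0]] + [f(v) for v in simplex[1:]]
    k = min(range(n + 1), key=lambda j: values[j])
    return simplex[k], values[k]

def maximize(loglik, start, max_restarts=50, tol=1e-10):
    """Restart the simplex from the last solution until the value stops improving."""
    objective = lambda p: -loglik(p)
    point, value = nelder_mead(objective, start)
    for _ in range(max_restarts):
        new_point, new_value = nelder_mead(objective, point)
        improvement = value - new_value
        point, value = new_point, new_value
        if improvement < tol:
            break
    return point, -value

# Made-up data: durations, censoring dummies (1 = censored), one dummy covariate.
durations = [0.2, 0.5, 0.9, 1.4, 0.3, 1.1, 2.0, 0.7]
censored = [0, 0, 1, 0, 0, 1, 1, 0]
x = [1, 0, 0, 1, 1, 0, 1, 0]

# Step 1: alpha_1 = alpha_2 = 1 (log values 0), maximize over beta only.
point1, ll1 = maximize(lambda p: log_likelihood([p[0], 0.0, 0.0], durations, censored, x), [0.0])
# Step 2: maximize over all parameters, starting from the step 1 solution.
point2, ll2 = maximize(lambda p: log_likelihood(p, durations, censored, x), [point1[0], 0.0, 0.0])
print(ll1, ll2)
```

Because the second step starts from the first-step solution, its log-likelihood can never be lower than that of the first step.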
Click "Continue" to compute the scores of the log-likelihood function and the asymptotic variance matrix of the ML estimates. Then the "What to do next?" module will be activated.
In this window I have clicked the "Option" button, which opens the options menu. The only option I will demonstrate is the option to plot the baseline hazard:
Selected variables:
MISDEMEANOR RECIDIVISM (DAYS/1000)
DUMMY RIGHT-CENSORING
MALE
BLACK
RELEASE
AGE (DAYS/1000)
SENT (DAYS/1000)

First available observation: t = 1
Last available observation: t = 1985
Number of missing variables in between: 1 = chosen

Duration: T = MISDEMEANOR RECIDIVISM (DAYS/1000)
Censoring dummy variable: C = DUMMY RIGHT-CENSORING (1=>censored, 0=>not)
Covariates:
X(1) = MALE
X(2) = BLACK
X(3) = RELEASE
X(4) = AGE (DAYS/1000)
X(5) = SENT (DAYS/1000)
Effective sample size (n) = 1984
Hazard function option: Weibull baseline hazard

The log-likelihood function has been maximized using the simplex method of Nelder and Mead. The algorithm involved is a Visual Basic translation of the Fortran algorithm involved in: W.H.Press, B.P.Flannery, S.A.Teukolsky and W.T.Vetterling, 'Numerical Recipes', Cambridge University Press, 1986, pp. 292-293

Estimation results:
Parameters   ML estimate   t-value   p-value   Covariates
beta(1)         0.215900     2.604   0.00922   MALE
beta(2)         0.050520     0.836   0.40325   BLACK
beta(3)        -0.431479    -4.767   0.00000   RELEASE
beta(4)        -0.087980    -7.882   0.00000   AGE (DAYS/1000)
beta(5)        -0.106984    -3.127   0.00177   SENT (DAYS/1000)
alpha(1)        1.657058     6.037   0.00000
alpha(2)        0.722742    25.168   0.00000
The two-sided p-values are based on the normal approximation

Log-likelihood: -1663.377020
Number of parameters: 7
Effective sample size (n): 1984
Information criteria:
Akaike: 1.683848
Hannan-Quinn: 1.691096
Schwarz: 1.703581
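The p-values and information criteria in this output can be reproduced by hand. The sketch below uses the usual two-sided normal p-value and the Akaike, Hannan-Quinn and Schwarz criteria divided by n. These formulas match the reported numbers, but that EasyReg computes them in exactly this way is my assumption. Because the reported t-values are rounded to three decimals, the last digits of a recomputed p-value may differ slightly.

```python
import math

def two_sided_p_value(t_value):
    """Two-sided p-value under the standard normal approximation."""
    return math.erfc(abs(t_value) / math.sqrt(2.0))

def information_criteria(log_likelihood, k, n):
    """Akaike, Hannan-Quinn and Schwarz criteria, each divided by n."""
    base = -2.0 * log_likelihood
    return {
        "Akaike": (base + 2.0 * k) / n,
        "Hannan-Quinn": (base + 2.0 * k * math.log(math.log(n))) / n,
        "Schwarz": (base + k * math.log(n)) / n,
    }

# Numbers copied from the estimation output.
for name, t in [("MALE", 2.604), ("BLACK", 0.836), ("SENT (DAYS/1000)", -3.127)]:
    print(name, round(two_sided_p_value(t), 5))

criteria = information_criteria(-1663.377020, k=7, n=1984)
for name, value in criteria.items():
    print(name, round(value, 6))
```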
Although this is only a demonstration of how to use the module under review, some interesting but tentative conclusions can be drawn from these results. First, race does not have a significant effect on recidivism. Both covariates AGE (DAYS/1000) and SENT (DAYS/1000) have significant negative coefficients, which implies that the survival function of recidivism time T = MISDEMEANOR RECIDIVISM (DAYS/1000)
is increasing in the components AGE (DAYS/1000) and SENT (DAYS/1000) of X. This means that recidivism is negatively related to age and sentence time. Similarly, males have a higher risk of recidivism than females, and the ex-convicts released on probation or parole have a higher risk of recidivism. The latter seems counter-intuitive, but this may be due to the fact that parole violations are classified as misdemeanor offences.
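The size of these effects can be read off as hazard ratios: under the proportional hazard model, a one-unit increase in a covariate multiplies the hazard by exp(β), holding the other covariates fixed. A minimal sketch using the reported ML estimates (the dictionary layout is only for illustration):

```python
import math

# ML estimates of beta copied from the estimation output.
estimates = {
    "MALE": 0.215900,
    "BLACK": 0.050520,
    "RELEASE": -0.431479,
    "AGE (DAYS/1000)": -0.087980,
    "SENT (DAYS/1000)": -0.106984,
}

# exp(beta) is the factor by which the hazard changes per unit of the covariate.
hazard_ratios = {name: math.exp(b) for name, b in estimates.items()}

for name, ratio in hazard_ratios.items():
    print(f"{name}: {ratio:.3f}")
```

A ratio above 1 (MALE) means a higher hazard of recidivism. A ratio below 1 (RELEASE, AGE, SENT) means a lower hazard, for example for unconditionally released ex-convicts relative to those released on parole or probation.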
Moreover, the hazard of recidivism is decreasing over time. This means that, conditional on the covariates and the event that at time t the ex-convict has not yet been arrested, the probability that he/she will be arrested in a short (fixed-length) period