This guided tour contains mathematical formulas and Greek symbols and is therefore best viewed with Internet Explorer, Opera or Google Chrome, because other web browsers (in particular Firefox) may not display the "Symbol" fonts involved. For example, "b" should be displayed as the Greek letter "beta" rather than the Roman "b". If it is not, reload this guided tour in Internet Explorer, Opera or Google Chrome, or make one of them your default web browser.
The proportional hazard model is a popular model for durations, for example unemployment spells. Let T be the duration and let X be a vector of explanatory variables (called covariates). The proportional hazard model assumes that for t > 0,
say, where Λ(t | α) is the integrated baseline hazard function, depending on a parameter (vector) α, and exp(β'X) is the systematic hazard. The function
EasyReg assumes that the duration T is interval-censored: Given the interval endpoints
0 = b_{0} < b_{1} < b_{2} < ... < b_{M},
the dependent variables are M dummy variables:
I(T ∈ (b_{i-1}, b_{i}]), i = 1,2,...,M,
where I(.) is the indicator function: I(true) = 1, I(false) = 0. Then
EasyReg provides five options for the integrated baseline hazard function Λ(t | α).
This is the most general option, because only the values of Λ(t | α) at t = b_{i} matter for the conditional probabilities
The difference with the standard Weibull case is that in the standard case
the baseline hazard
Both Weibull hazards are monotonic. This hazard function takes a maximum
at t = α_{2}.
Note that in this case
The difference with the previous case is that now
The parameter α_{1} in the last four cases plays the role of a scale parameter. Consequently, a constant may not be included in X, because α_{1} indirectly plays that role:
For further information on the proportional hazard model, open SURVIVAL2.PDF.
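As a rough illustration of the interval-censored setup described above, the following Python sketch computes the conditional probabilities of the M intervals for one observation. It assumes the standard proportional hazard survival function P(T > t | X) = exp(-exp(β'X) Λ(t | α)). The function names, the example values, and the treatment of an all-zero dummy vector as "T exceeds b_{M}" are assumptions made for this sketch, not EasyReg's actual code.

```python
import math


def interval_probabilities(cum_hazard, xb):
    """Return (interval probabilities, tail probability) for one observation.

    cum_hazard: values of the integrated baseline hazard at b_1, ..., b_M
    xb:         the linear index beta'X
    Assumes P(T > t | X) = exp(-exp(beta'X) * Lambda(t)).
    """
    scale = math.exp(xb)
    # Survival at b_0 = 0 is 1, then at each bracket point b_1, ..., b_M.
    survival = [1.0] + [math.exp(-scale * h) for h in cum_hazard]
    probs = [survival[i] - survival[i + 1] for i in range(len(cum_hazard))]
    return probs, survival[-1]


def log_likelihood_contribution(dummies, cum_hazard, xb):
    """Log-likelihood of one observation given its M interval dummies.

    An all-zero dummy vector is treated as T > b_M (assumption of this sketch).
    """
    probs, tail = interval_probabilities(cum_hazard, xb)
    if sum(dummies) == 0:
        return math.log(tail)
    return math.log(probs[dummies.index(1)])


# Hypothetical values, for illustration only.
example_hazard = [0.3, 0.6, 1.1, 1.4]
example_probs, example_tail = interval_probabilities(example_hazard, -0.5)
example_loglik = log_likelihood_contribution([0, 1, 0, 0], example_hazard, -0.5)
```

Only the values of the integrated baseline hazard at the bracket points enter these probabilities, which is why the behaviour of Λ(t | α) between bracket points cannot be identified from interval-censored data.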
The data for this demonstration is available as an Excel file in CSV format: SURVIVAL2.CSV. This is the same subsample of released ex-convicts in Texas, from the larger data set used in:
The information window will not be shown if you have downloaded this guided tour.
Click "Clear" and select all the variables in the model:
The original variable MISDEMEANOR RECIDIVISM (years) is the time in years between the release from prison or jail and the first arrest after release. This variable has been converted to four dummy variables, each indicating that the ex-convict involved was arrested for the first time after release in the time period involved. The variable AGE (DAYS/1000) is the age of the ex-convict, in units of 1000 days, and the variable SENT (DAYS/1000) is the duration of the last sentence, also in units of 1000 days. The rescaling in units of 1000 days is done for numerical reasons. The dummy variables MALE and BLACK do not need an explanation. The dummy variable RELEASE is equal to 1 if the ex-convict was unconditionally released, and is equal to 0 if the release was on parole or probation.
Click "Selection OK":
In this example I will not choose a subsample: Click "No" and "Continue":
The dummy variables involved are the dependent variables. Select them and click "Continue":
This window is for information only. Click "Continue":
Click "Check data validity". Then EasyReg will check whether the Y variables are dummy variables, and the covariates are not multicollinear or constant. If the variables are OK, the text on the button "Check data validity" changes to "Get brackets". Click it.
EasyReg reads the brackets from the dummy variable names if present, otherwise you have to type them in.
It is strongly recommended to keep the bracket values small, because EasyReg will run into numerical problems if you choose the bracket
values too large. For example, initially I had chosen the number of days as basis for the intervals, so that the intervals were
After reading the last pair of brackets, the following window appears.
The Help button gives access to SURVIVAL2.PDF. Click "Continue":
Now you can choose one of the options for the (integrated) hazard function. I will first choose the default option:
EasyReg maximizes the log-likelihood function in two steps. In the first step the parameters α_{i} are fixed at 1, which corresponds to the Weibull integrated hazard
The log-likelihood function will now be maximized using the simplex method of Nelder and Mead. See for example
Leave the "Auto restart" box checked. Then EasyReg will automatically restart the simplex iteration from the last solution until the log-likelihood and/or the parameters do not change anymore. The option "Batch mode" is only useful if your data set is very large, so that you have to run this module overnight. Thus, click "Start":
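The idea behind "Auto restart" can be sketched in plain Python: run a simplex search, restart it from the solution found, and stop once neither the objective nor the parameters change noticeably. The code below is a basic Nelder-Mead variant written for this illustration, not EasyReg's Visual Basic routine. The function names, tolerances, and the quadratic stand-in for a negative log-likelihood are all assumptions.

```python
def nelder_mead(f, x0, step=0.5, tol=1e-10, max_iter=2000):
    """Minimize f from x0 with a basic Nelder-Mead simplex search."""
    n = len(x0)
    # Initial simplex: x0 plus one vertex displaced along each coordinate.
    simplex = [list(x0)]
    for i in range(n):
        vertex = list(x0)
        vertex[i] += step
        simplex.append(vertex)
    values = [f(v) for v in simplex]

    for _ in range(max_iter):
        # Sort vertices from best (lowest f) to worst.
        order = sorted(range(n + 1), key=lambda i: values[i])
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        if values[-1] - values[0] < tol:
            break
        worst = simplex[-1]
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]

        def along(t):
            # Point centroid + t * (centroid - worst).
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(1.0)
        f_reflected = f(reflected)
        if f_reflected < values[0]:
            # Reflection is the new best: try to expand further.
            expanded = along(2.0)
            f_expanded = f(expanded)
            if f_expanded < f_reflected:
                simplex[-1], values[-1] = expanded, f_expanded
            else:
                simplex[-1], values[-1] = reflected, f_reflected
        elif f_reflected < values[-2]:
            simplex[-1], values[-1] = reflected, f_reflected
        else:
            # Contract the worst vertex toward the centroid.
            contracted = along(-0.5)
            f_contracted = f(contracted)
            if f_contracted < values[-1]:
                simplex[-1], values[-1] = contracted, f_contracted
            else:
                # Shrink every vertex halfway toward the best one.
                best = simplex[0]
                simplex = [best] + [
                    [b + 0.5 * (x - b) for b, x in zip(best, v)]
                    for v in simplex[1:]
                ]
                values = [values[0]] + [f(v) for v in simplex[1:]]

    best_index = min(range(n + 1), key=lambda i: values[i])
    return simplex[best_index], values[best_index]


def minimize_with_restarts(f, x0, f_tol=1e-8, x_tol=1e-4, max_restarts=20):
    """Restart the simplex search from the last solution until it settles."""
    x, fx = nelder_mead(f, x0)
    for _ in range(max_restarts):
        x_new, f_new = nelder_mead(f, x)
        f_change = abs(fx - f_new)
        x_change = max(abs(a - b) for a, b in zip(x, x_new))
        x, fx = x_new, f_new
        if f_change < f_tol and x_change < x_tol:
            break
    return x, fx


def neg_loglik(theta):
    # Hypothetical stand-in objective with its minimum at (1, -2).
    return (theta[0] - 1.0) ** 2 + 10.0 * (theta[1] + 2.0) ** 2


estimate, objective = minimize_with_restarts(neg_loglik, [0.0, 0.0])
```

Each restart builds a fresh simplex around the previous solution, which guards against the search stalling on a simplex that has collapsed before reaching the optimum.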
You may restart the iteration, but it is unlikely that you will get a further improvement. Thus, click "Done":
These are the start values for the second and final step. You can no longer change the parameter bounds. Thus, click "Restart SIMPLEX iteration":
We are now done with the simplex iteration. Thus, click "Done":
Click "Continue" to compute the scores of the log-likelihood function and the asymptotic variance matrix of the ML estimates. Then the "What to do next?" module will be activated:
Let us have a look at the integrated baseline hazard:
Next, let us have a look at the corresponding baseline hazard:
EasyReg automatically scales the plot between the minimum and maximum function value. To change that, click "Display options":
This is only an approximation of the true baseline hazard, though. The function values are the values of the α's corresponding to the brackets.
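The link between the α's and the integrated baseline hazard can be checked with a short Python sketch. It assumes that the piecewise linear integrated hazard has slope α_{i} on the interval (b_{i-1}, b_{i}], so that its value at a bracket point is the cumulative sum of α_{i} times the interval length. This reading is inferred from the reported numbers, and the function name is hypothetical. The α values are the estimates from the estimation output of this tour.

```python
def integrated_hazard_at_brackets(brackets, alphas):
    """Cumulative sums of alpha_i * (b_i - b_{i-1}), starting from b_0 = 0."""
    total, previous, result = 0.0, 0.0, []
    for b, a in zip(brackets, alphas):
        total += a * (b - previous)
        result.append(total)
        previous = b
    return result


# Bracket points (years) and alpha estimates from the estimation output.
brackets = [0.5, 1.0, 2.0, 3.0]
alphas = [0.631260, 0.624200, 0.483767, 0.335525]

cumulative = integrated_hazard_at_brackets(brackets, alphas)
for b, h in zip(brackets, cumulative):
    print(f"{b:g}  {h:.6f}")
```

The printed values agree with the "Bracket point / Integrated hazard" table in the estimation output, which supports reading the α's as the (constant) baseline hazard levels within each bracket.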
We are now done with this model. Thus, click "Done":
However, this is not the end of the guided tour! The question we need to address is: What have we learned from this exercise?
We have seen that the piecewise linear baseline hazard is decreasing. This suggests that a standard Weibull specification may be appropriate. To check this, I have re-estimated the model under the standard Weibull option for the baseline hazard:
Following the same procedures as before, we get the output involved:
So, which specification of the hazard is better? The standard Weibull specification is the most parsimonious one, but the piecewise linear specification is the most general one. To determine whether the additional parameters in the piecewise linear case pay off, we should compare the information criteria in the two cases, in particular the Hannan-Quinn and Schwarz criteria, because they are consistent: the number of parameters corresponding to the lowest value is equal to the correct number of parameters with probability converging to one as the sample size converges to infinity. The Akaike information criterion is not consistent in this sense.
The values of the Akaike, Hannan-Quinn and Schwarz criteria in the Weibull case are slightly higher than in the general piecewise linear case, so that we may conclude that the Weibull specification is not appropriate for this data set.
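For readers who want to recompute the information criteria, the Python sketch below uses per-observation normalizations that reproduce the values reported for the piecewise linear model (log-likelihood -2686.518534, 9 parameters, n = 1984). The formulas are an assumption inferred from that numerical match rather than taken from the EasyReg documentation, and the function name is hypothetical.

```python
import math


def information_criteria(log_likelihood, n_params, n_obs):
    """Akaike, Hannan-Quinn and Schwarz criteria, each divided by n_obs."""
    deviance = -2.0 * log_likelihood
    return {
        "Akaike": (deviance + 2.0 * n_params) / n_obs,
        "Hannan-Quinn": (
            deviance + 2.0 * n_params * math.log(math.log(n_obs))
        ) / n_obs,
        "Schwarz": (deviance + n_params * math.log(n_obs)) / n_obs,
    }


# Figures from the piecewise linear estimation output of this tour.
criteria = information_criteria(-2686.518534, 9, 1984)
for name, value in criteria.items():
    print(f"{name}: {value:.6f}")
```

The penalty per parameter grows from 2 (Akaike) to 2 ln(ln n) (Hannan-Quinn) to ln n (Schwarz), so the last two criteria penalize extra parameters more heavily in large samples.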
Proportional hazard model:

Dependent variables and their brackets:
Y(1) = Dummy MISDEMEANOR RECIDIVISM (years) in (0,0.5]    Brackets: 0, .5
Y(2) = Dummy MISDEMEANOR RECIDIVISM (years) in (0.5,1]    Brackets: .5, 1
Y(3) = Dummy MISDEMEANOR RECIDIVISM (years) in (1,2]      Brackets: 1, 2
Y(4) = Dummy MISDEMEANOR RECIDIVISM (years) in (2,3]      Brackets: 2, 3

Covariates:
X(1) = MALE
X(2) = BLACK
X(3) = RELEASE
X(4) = AGE (DAYS/1000)
X(5) = SENT (DAYS/1000)

Chosen (sub-)sample: 1->1985
Effective (sub-)sample size: 1984

Hazard function option: Piecewise linear integrated hazard

The log-likelihood function has been maximized using the simplex method of Nelder and Mead. The algorithm involved is a Visual Basic translation of the Fortran algorithm involved in:
W.H.Press, B.P.Flannery, S.A.Teukolsky and W.T.Vetterling, 'Numerical Recipes', Cambridge University Press, 1986, pp. 292-293

Estimation results:
Parameters   ML estimate   t-value   p-value   Covariates
beta(1)         0.286449     3.155   0.00161   MALE
beta(2)         0.065508     1.000   0.31712   BLACK
beta(3)        -0.302968    -2.948   0.00320   RELEASE
beta(4)        -0.063995    -5.452   0.00000   AGE (DAYS/1000)
beta(5)        -0.111339    -2.981   0.00287   SENT (DAYS/1000)
alpha(1)        0.631260     5.346   0.00000
alpha(2)        0.624200     5.249   0.00000
alpha(3)        0.483767     5.267   0.00000
alpha(4)        0.335525     5.068   0.00000
The two-sided p-values are based on the normal approximation

Log-likelihood: -2686.518534
Number of parameters: 9
Effective sample size (n): 1984

Information criteria:
Akaike:        2.717257
Hannan-Quinn:  2.726576
Schwarz:       2.742627

Bracket point   Integrated hazard
0.5             0.315630
1               0.627730
2               1.111497
3               1.447022
The interpretation of the estimation results is similar to the right-censored case. See the guided tour on right-censored proportional hazard models.