Guided tour on nonparametric SMINK regression estimation


About SMINK density and regression estimators

SMINK stands for Sample Moments Integrating Normal Kernel. The SMINK density estimator is a variant of the normal kernel density estimator with the following additional properties:

• The first and second moments of the SMINK density estimator are equal to the first and second sample moments;
• The maximum likelihood estimator of the (multivariate) normal density is a special case of a SMINK density estimator.
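To make these two properties concrete, here is a minimal Python sketch of one way a normal kernel density estimator can be forced to reproduce the first and second sample moments. The parametrization is an assumption for illustration, not necessarily the exact formula of Bierens (1983): each observation is shrunk toward the sample mean by the factor $\sqrt{1-\gamma^2}$, and each carries a normal kernel with covariance $\gamma^2 V$, where $V$ is the sample covariance matrix. The resulting mixture has mean $\bar X$ and covariance $V$ for every $\gamma \in (0,1]$, and $\gamma = 1$ collapses it to the fitted (maximum likelihood) normal density. The function name `smink_like_density` is hypothetical.

```python
import numpy as np

def smink_like_density(x, data, gamma):
    """Moment-matching normal kernel density estimate evaluated at the points x.

    ASSUMED parametrization (illustration only): observations are shrunk toward
    the sample mean by sqrt(1 - gamma**2), and each carries a normal kernel with
    covariance gamma**2 * V, where V is the sample covariance matrix (divisor n).
    """
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must lie in (0, 1]")
    data = np.asarray(data, dtype=float)
    if data.ndim == 1:
        data = data[:, None]                      # treat a 1-D sample as n x 1
    n, k = data.shape
    x = np.asarray(x, dtype=float).reshape(-1, k)
    mean = data.mean(axis=0)
    V = np.cov(data, rowvar=False, bias=True).reshape(k, k)
    centers = mean + np.sqrt(1.0 - gamma**2) * (data - mean)  # shrunk observations
    S = gamma**2 * V                              # common kernel covariance
    S_inv = np.linalg.inv(S)
    const = 1.0 / np.sqrt((2.0 * np.pi) ** k * np.linalg.det(S))
    diff = x[:, None, :] - centers[None, :, :]   # shape (m, n, k)
    quad = np.einsum("mnk,kl,mnl->mn", diff, S_inv, diff)
    return const * np.exp(-0.5 * quad).mean(axis=1)
```

Because the shrunk centers have covariance $(1-\gamma^2)V$ and the kernels add $\gamma^2 V$, the total is $V$ regardless of $\gamma$; the window width only controls how much of the sample's shape survives.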

The SMINK regression function estimator is derived from the SMINK estimator of the joint density of the dependent and independent variables, together with the SMINK estimator of the marginal density of the independent variables, in the same way as the true regression function is derived from the true densities: $E[Y|X=x] = \int y f(y,x)\,dy / f(x)$.
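Under an assumed moment-matching construction (each joint observation $(Y_j, X_j)$ shrunk toward the sample mean by $\sqrt{1-\gamma^2}$, normal kernels with covariance $\gamma^2 V$), the ratio $\int y \hat f(y,x)\,dy / \hat f(x)$ has a closed form: each kernel component contributes its own normal conditional mean $\tilde y_j + \beta'(x - \tilde x_j)$ with $\beta = V_{xx}^{-1} V_{xy}$, weighted by its kernel density at $x$. The sketch below implements that ratio; the name `smink_like_regression` and the parametrization are assumptions, not the module's code. A side effect of this parametrization is that $\gamma = 1$ reproduces the ordinary least squares fit, which is easy to check numerically.

```python
import numpy as np

def smink_like_regression(x, X, y, gamma):
    """Estimate E[y | X = x] as  integral of y f(y, x) dy  divided by  f(x).

    ASSUMED construction: both densities are moment-matching normal kernel
    mixtures with the same window width gamma (illustration only).
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float).reshape(-1, X.shape[1])
    Z = np.column_stack([y, X])                   # joint sample (y, X)
    zbar = Z.mean(axis=0)
    V = np.cov(Z, rowvar=False, bias=True)        # joint covariance, divisor n
    Vxx, Vxy = V[1:, 1:], V[1:, 0]
    beta = np.linalg.solve(Vxx, Vxy)              # slope of each component's conditional mean
    centers = zbar + np.sqrt(1.0 - gamma**2) * (Z - zbar)
    cy, cx = centers[:, 0], centers[:, 1:]
    diff = x[:, None, :] - cx[None, :, :]         # shape (m, n, k)
    quad = np.einsum("mnk,kl,mnl->mn", diff, np.linalg.inv(gamma**2 * Vxx), diff)
    logw = -0.5 * quad
    w = np.exp(logw - logw.max(axis=1, keepdims=True))  # numerically stabilised weights
    w /= w.sum(axis=1, keepdims=True)
    local_mean = cy[None, :] + diff @ beta        # component conditional means, shape (m, n)
    return np.sum(w * local_mean, axis=1)
```

The normalizing constants of the kernels cancel in the ratio because all components share one covariance matrix, so only the relative weights matter.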

The basic properties of SMINK density and regression estimators are summarized here. This summary is also provided by the SMINK regression estimation module (SMINKREG) itself. Before you use the SMINK regression module, please read at least this summary; I also strongly recommend reading the original paper:

Bierens, H.J. (1983): "Sample Moments Integrating Normal Kernel Estimators of Density and Regression Functions", Sankhya 45, Series B, 160-192.

SMINK regression: How it works

In order to demonstrate how SMINK regression works, I have generated n = 500 independent draws of each of the independent standard normally distributed random variables X1, X2 and U, and combined them into the dependent variable $Y = X_1^2 + X_2^2 + U$, so that $E[Y|X_1] = 1 + X_1^2$ and $E[Y|X_1,X_2] = X_1^2 + X_2^2$.
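The design is easy to replicate. The sketch below draws a comparable artificial sample with NumPy (the seed is arbitrary, so these draws differ from the ones used in this tour) and checks the conditional-mean claim: since $X_2$ and $U$ are independent of $X_1$, $E[Y|X_1] = X_1^2 + E[X_2^2] + E[U] = X_1^2 + 1$, so $Y - X_1^2$ should average about 1.

```python
import numpy as np

rng = np.random.default_rng(2024)          # arbitrary seed, not the tour's data
n = 500
X1, X2, U = rng.standard_normal((3, n))    # three independent N(0, 1) samples of size n
Y = X1**2 + X2**2 + U

# E[Y | X1] = X1**2 + 1, so the mean of Y - X1**2 estimates the constant 1.
resid_mean = float((Y - X1**2).mean())
print(round(resid_mean, 2))
```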

This module does not allow you to select more than two explanatory variables, because only univariate and bivariate regression functions can be plotted.

I will demonstrate the bivariate case first.

SMINK regression with two explanatory variables

Open "Menu > Single equation models > Bierens' nonparametric SMINK regression", and select Y, X1 and X2 as the data in the usual way, with Y the dependent variable, and X1 and X2 the independent variables. Then the first SMINK regression window is:

The SMINK regression procedure requires the specification of two window width parameters: a window width $\gamma_{2,n}$ for the SMINK estimator of the joint density of (X1,X2), and a window width $\gamma_{1,n}$ for the SMINK regression estimator. Both have to be contained in the interval $[\xi_n, 1]$, where

$\xi_n = (\sqrt{n})^{-\alpha/k}$,

with k the number of X variables (k = 2 in our case), and $\alpha \in (0,1)$. The default value is $\alpha = 0.5$, which I have chosen.
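A minimal sketch of the lower bound of the window-width interval follows, with `xi` a hypothetical function name. For n = 500 and $\alpha = 0.5$ it gives about 0.4599 with two X variables and about 0.2115 with one, so the univariate case searches a wider interval.

```python
import math

def xi(n, k, alpha=0.5):
    """Lower bound xi_n = (sqrt(n))**(-alpha / k) of the window-width interval [xi_n, 1]."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    return math.sqrt(n) ** (-alpha / k)

print(f"{xi(500, 2):.4f}")  # two X variables  -> 0.4599
print(f"{xi(500, 1):.4f}")  # one X variable   -> 0.2115
```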

Click "'alpha' OK". Then the window changes to:

If you leave the option "Optimize gamma" checked and click "Continue", then $\gamma_{2,n}$ and $\gamma_{1,n}$ will be optimized by grid search over the interval $[\xi_n,1]$. If you uncheck this box first, then $\gamma_{2,n}$ and $\gamma_{1,n}$ are set equal to $\xi_n$. I will leave this box checked:

Choose the number of grid points, and click "Grid OK". Then after a few minutes the following window appears.
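The tour does not state which criterion the grid search minimizes, so the sketch below keeps it abstract: `criterion(gamma1, gamma2)` is a hypothetical user-supplied function (a cross-validated prediction error would be one common choice), and the search simply evaluates it on an equally spaced grid over $[\xi_n, 1]^2$ with the chosen number of points per window width. The function name `grid_search_gammas` is also hypothetical.

```python
import itertools
import numpy as np

def grid_search_gammas(criterion, xi_n, num_points=10):
    """Minimize criterion(gamma1, gamma2) over an equally spaced grid on [xi_n, 1] x [xi_n, 1].

    `criterion` is a placeholder; the criterion used by the SMINKREG module is
    not documented here.
    """
    grid = np.linspace(xi_n, 1.0, num_points)
    best = min(itertools.product(grid, grid), key=lambda g: criterion(*g))
    return tuple(float(v) for v in best)

# Toy criterion with a known minimizer, only to show the call pattern.
best = grid_search_gammas(lambda g1, g2: (g1 - 0.7) ** 2 + (g2 - 1.0) ** 2, xi_n=0.46)
```

With m points per parameter the criterion is evaluated $m^2$ times, which is why the search can take a few minutes when each evaluation involves a full nonparametric fit.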

Since X1 and X2 are normally distributed, the optimal SMINK density estimator is the maximum likelihood estimator, which corresponds to $\gamma_{2,n} = 1$. But the regression function is nonlinear, so $\gamma_{1,n}$ should be less than 1.

If a nonparametric regression estimator is computed for values of the X variables for which the density is close to zero, the estimate will be unreliable. Therefore, the plot range of the X variables should not be too wide. Since X1 and X2 are normally distributed, I have chosen the plot range [-2,2] for both X1 and X2.

The number of grid points here is the number of points of the 3-dimensional plot along the axis of each X variable. The default value 29 usually gives the best picture.
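How the plot data are computed internally is not described here; a plausible sketch is to evaluate the estimated regression function on a regular grid with the chosen number of points per axis. Below, `plot_grid` is a hypothetical helper; any vectorized function of two arrays (for instance a fitted regression surface) can be passed in, and each axis can have its own range.

```python
import numpy as np

def plot_grid(f, range1=(-2.0, 2.0), range2=(-2.0, 2.0), num=29):
    """Evaluate f(x1, x2) on a num x num grid over range1 x range2 (hypothetical helper)."""
    axis1 = np.linspace(*range1, num)
    axis2 = np.linspace(*range2, num)
    X1, X2 = np.meshgrid(axis1, axis2)   # X1 varies across columns, X2 down rows
    return X1, X2, f(X1, X2)

# Example: the true regression surface E[Y | X1, X2] = X1**2 + X2**2 of this design.
X1, X2, Z = plot_grid(lambda a, b: a**2 + b**2)
```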

Once the plot range and grid points have been specified, the plot data are computed, which takes a few minutes in this case. When this is done, the module PLOT3DIM is activated. This module opens with a blank picture window; once you click the "Start" button, the picture is displayed:

Note that at the corners of the plot area $[-2,2]\times[-2,2]$ the surface curls down, due to the lack of observations in these areas, but the parabolic shape of the true regression function is nevertheless clearly visible in the center of $[-2,2]\times[-2,2]$.
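A quick calculation shows how thin the data are near the corners. With n = 500 draws of two independent standard normal variables, the expected number of observations in the corner cell $[1.5,2]\times[1.5,2]$ is about 1, against about 19.5 in a cell of the same size around the origin. The helper names below are hypothetical; only the standard library's `math.erf` is used.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_count(n, a, b):
    """Expected number of n bivariate standard normal draws (independent coordinates) in [a, b] x [a, b]."""
    p = normal_cdf(b) - normal_cdf(a)
    return n * p * p

corner = expected_count(500, 1.5, 2.0)     # a corner cell of [-2, 2] x [-2, 2]
centre = expected_count(500, -0.25, 0.25)  # a cell of the same size at the origin
print(round(corner, 2), round(centre, 1))
```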

In this example the plot area can easily be determined from the design, but in general you do not know the actual distribution of the X variables. In that case I recommend opening "Menu > Data analysis > Summary statistics", selecting the X variables involved, and then using the 10% and 90% quantile values as lower and upper bounds of the plot range. In our case we have

```
10% quantile X1 = -1.36670
90% quantile X1 =  1.30098
10% quantile X2 = -1.17452
90% quantile X2 =  1.32714
```
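The quantile-based plot range can also be computed outside the module with NumPy. Note that `numpy.quantile` interpolates linearly between order statistics by default, which may differ slightly from the quantile definition behind the module's summary statistics, so the bounds need not match the values above to the last digit. `quantile_plot_range` is a hypothetical helper name.

```python
import numpy as np

def quantile_plot_range(x, lower=0.10, upper=0.90):
    """Return the (lower, upper) sample quantiles of x for use as a plot range."""
    lo, hi = np.quantile(np.asarray(x, dtype=float), [lower, upper])
    return float(lo), float(hi)

# Usage on a toy sample 0, 1, ..., 10: the 10% and 90% quantiles are 1 and 9.
print(quantile_plot_range(np.arange(11)))
```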

If we choose these quantiles as the plot range, the result indeed looks much better:

Warning

Finally, as a warning, let me show what happens in the latter case if you do not adjust the plot range but simply accept the sample minimum and maximum values as its bounds:

Close to the borders of the plot area there are hardly any data to support the SMINK regression function estimator, which yields spurious results. This is not specific to SMINK regression but applies to nonparametric kernel regression in general.

SMINK regression with one explanatory variable

Select Y and X1 as the data in the usual way, with Y the dependent variable and X1 the independent variable. Now proceed in the same way as before, i.e., choose $\alpha = 0.5$, optimize the two window width parameters $\gamma_{1,n}$ and $\gamma_{2,n}$ by grid search using 10 grid points for each, and choose the plot range [-1.36670,1.30098]:

You now have the option to compare the SMINK regression curve with the linear regression line, but I will not use it here. The resulting plot is:
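For the linear comparison line, ordinary least squares is the natural choice. In this design the population slope of $Y$ on $X_1$ is $\mathrm{Cov}(Y,X_1)/\mathrm{Var}(X_1) = E[X_1^3] = 0$ and the intercept is $E[Y] = 2$, so the fitted line is roughly flat and cannot follow the curvature of $1 + X_1^2$. The sketch below redraws an artificial sample with an arbitrary seed (it does not reproduce the tour's data) and fits the line with `numpy.polyfit`.

```python
import numpy as np

rng = np.random.default_rng(7)           # arbitrary seed, not the tour's data
n = 500
X1, X2, U = rng.standard_normal((3, n))
Y = X1**2 + X2**2 + U

# Least-squares line Y ~ intercept + slope * X1; polyfit returns the highest degree first.
slope, intercept = np.polyfit(X1, Y, deg=1)
print(round(float(slope), 2), round(float(intercept), 2))
```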

Recall that the true regression function is $E[Y|X_1] = 1 + X_1^2$, which the plot of the SMINK estimate closely resembles.