Guided tour on nonparametric kernel regression

The nonparametric kernel regression estimator involved is briefly described in the PDF file KERNELREG.PDF. Please read this PDF file before using the kernel regression module (KERNELREG).

Note that nonparametric kernel regression estimation is an advanced feature of EasyReg International. If you are a novice econometrician you should not use it.

Nonparametric kernel regression: How it works

In order to demonstrate how kernel regression works, I have generated n = 500 independent standard normally distributed random variables X1, X2, and U, and combined them into a dependent variable Y = X1² + X2² + U, so that E[Y|X1] = 1 + X1² and E[Y|X1,X2] = X1² + X2².
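
This data-generating process is easy to replicate outside EasyReg. A minimal numpy sketch (the seed, and hence the exact draws, are of course not those used in this tour):

    import numpy as np

    rng = np.random.default_rng(0)  # arbitrary seed
    n = 500

    # Three independent standard normal samples, as in the text.
    x1 = rng.standard_normal(n)
    x2 = rng.standard_normal(n)
    u = rng.standard_normal(n)

    # Dependent variable: Y = X1^2 + X2^2 + U, so that
    # E[Y|X1] = 1 + X1^2 and E[Y|X1,X2] = X1^2 + X2^2.
    y = x1**2 + x2**2 + u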

This module does not allow you to select more than two explanatory variables, because only univariate and bivariate regression functions can be plotted.
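
KERNELREG.PDF gives the exact form of the estimator. As a rough sketch only, a Nadaraya-Watson-type estimator with a Gaussian product kernel (a common choice, not necessarily EasyReg's exact formula) looks as follows:

    import numpy as np

    def nw_estimate(x0, X, y, h):
        # Nadaraya-Watson estimate of E[Y|X = x0], with X an (n, k) regressor
        # matrix, y the (n,) vector of responses, and h the bandwidth.
        z = (x0 - X) / h                         # (n, k) scaled differences
        w = np.exp(-0.5 * np.sum(z**2, axis=1))  # Gaussian product kernel weights
        return np.sum(w * y) / np.sum(w)         # kernel-weighted average of the y's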

I will demonstrate the bivariate case first.

Nonparametric kernel regression with two explanatory variables

Open "Menu > Single equation models > Nonparametric kernel regression", and select Y, X1 and X2 as the data in the usual way, with Y the dependent variable, and X1 and X2 the independent variables. Then the first kernel regression window is:

Kernel regression Window 1


As noted in KERNELREG.PDF, the constant c of the bandwidth can be determined by cross-validation. Here I have specified the lower bound of c as 0.5 and the upper bound as 2:

Kernel regression Window 2


Click "Bound OK". Then the following window appears.

Kernel regression Window 3



I have chosen 4 grid points. Click "Grid OK". Then the following window appears.

Kernel regression Window 4



Click "Continue". Then the mean square error will be minimized over these grid points. The optimal c is 1:

Kernel regression Window 5
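
What happens here can be sketched as a leave-one-out cross-validation grid search. With bounds 0.5 and 2 and 4 (presumably equally spaced) grid points, the grid is 0.5, 1.0, 1.5, 2.0, which is why the optimal c comes out as exactly 1. The bandwidth form h = c·n^(-1/(k+4)) below is an assumption; KERNELREG.PDF has EasyReg's exact criterion and bandwidth:

    import numpy as np

    def loo_cv_mse(c, X, y, k):
        # Leave-one-out mean square error of the (sketched) Nadaraya-Watson
        # estimator for bandwidth constant c.
        n = len(y)
        h = c * n ** (-1.0 / (k + 4))   # assumed bandwidth form
        mse = 0.0
        for i in range(n):
            mask = np.arange(n) != i    # leave observation i out
            z = (X[i] - X[mask]) / h
            w = np.exp(-0.5 * np.sum(z**2, axis=1))
            mse += (y[i] - np.sum(w * y[mask]) / np.sum(w)) ** 2
        return mse / n

    # 4 grid points between the bounds 0.5 and 2: 0.5, 1.0, 1.5, 2.0
    grid = np.linspace(0.5, 2.0, 4)
    X = np.column_stack([x1, x2])       # x1, x2, y from the simulation sketch above
    c_opt = min(grid, key=lambda c: loo_cv_mse(c, X, y, k=2))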


Click "Make plot data".

Kernel regression Window 6


If a nonparametric regression estimator is computed for values of the X variables for which the density is close to zero, the estimate will be unreliable. Therefore, the plot range of the X variables should not be too wide. Since X1 and X2 are standard normally distributed, I have chosen the plot range [-2,2] for both X1 and X2.
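
The reason, in terms of the Nadaraya-Watson sketch above, is that the fitted value is a ratio whose denominator is a sum of kernel weights, and where the density of the X variables is near zero that denominator is near zero as well, so the estimate is a ratio of two tiny numbers. A quick check:

    import numpy as np

    def kernel_weight_sum(x0, X, h):
        # Sum of Gaussian kernel weights at x0: the denominator of the
        # kernel regression ratio in the sketch above.
        z = (x0 - X) / h
        return np.sum(np.exp(-0.5 * np.sum(z**2, axis=1)))

    h = 1.0 * n ** (-1.0 / 6)  # c = 1, k = 2, assumed bandwidth form
    print(kernel_weight_sum(np.array([0.0, 0.0]), X, h))  # large: plenty of nearby data
    print(kernel_weight_sum(np.array([3.5, 3.5]), X, h))  # near zero: unreliable estimate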

Kernel regression Window 7


The grid points here are the plot points of the 3-dimensional plot in the directions of the X variables involved. The default value of 29 usually gives the best picture.
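
Computing the plot data then amounts to evaluating the estimator on a 29 × 29 grid over the chosen plot range; a sketch, reusing the hypothetical nw_estimate from above:

    import numpy as np

    h = 1.0 * n ** (-1.0 / 6)    # c = 1, k = 2, assumed bandwidth form
    g1 = np.linspace(-2, 2, 29)  # 29 grid points in the X1 direction
    g2 = np.linspace(-2, 2, 29)  # 29 grid points in the X2 direction
    surface = np.array([[nw_estimate(np.array([a, b]), X, y, h) for b in g2]
                        for a in g1])  # 29 x 29 fitted regression surface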

Once the plot range and the grid points have been specified, the plot data is computed, which takes a few minutes in this case. When done, the module PLOT3DIM is activated, which opens with a blank picture window.

Kernel regression Window 8


Once you click the "Start" button, the picture is displayed:

Kernel regression Window 9



Note that at the edges of the plot area [-2,2] × [-2,2] the shape of the kernel regression function looks somewhat ragged, due to the lack of observations in these areas, but nevertheless the parabolic shape of the true regression function is clearly visible in the center of [-2,2] × [-2,2].

In this example the plot area can easily be determined from the design, but in general you do not know the actual distribution of the X variables. In that case I recommend opening "Menu > Data analysis > Summary statistics", selecting the X variables involved, and then using the 10% and 90% quantile values as the lower and upper bounds of the plot range. In our case we have


10% quantile X1 = -1.36670
90% quantile X1 =  1.30098
10% quantile X2 = -1.17452
90% quantile X2 =  1.32714
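
These quantiles are also easy to compute outside the Summary statistics module; a numpy sketch, one line per variable:

    import numpy as np

    lo1, hi1 = np.quantile(x1, [0.10, 0.90])  # plot range for X1
    lo2, hi2 = np.quantile(x2, [0.10, 0.90])  # plot range for X2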

If we choose these quantiles as the plot range, the result indeed looks much better:

Kernel regression Window 10



Warning

Finally, just as a warning, let me show you what happens if you do not adjust the plot ranges, but just accept the minimum and maximum values:

Kernel regression Window 11



Close to the borders of the plot area there is hardly any data to support the kernel regression estimator, which yields spurious results.

Kernel regression with one explanatory variable

Select Y and X1 as the data in the usual way, with Y the dependent variable and X1 the independent variable, and proceed in the same way as before. The cross-validated c is again 1. Choose the plot range [-1.36670, 1.30098], the 10% and 90% quantiles of X1:

Kernel regression Window 12



You now have the option to compare the kernel regression curve with the linear regression line, but I will not choose this option. Then the plot result is:

Kernel regression Window 13



Recall that the true regression function is E[Y|X1=x] = 1 + x², and the plot of the kernel estimate indeed resembles it.
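
To see how close the fit is, one can overlay the true curve on the kernel estimate; a matplotlib sketch, again using the hypothetical nw_estimate and the assumed bandwidth form:

    import numpy as np
    import matplotlib.pyplot as plt

    h = 1.0 * n ** (-1.0 / 5)  # c = 1, k = 1, assumed bandwidth form
    xs = np.linspace(-1.36670, 1.30098, 29)
    fit = [nw_estimate(np.array([a]), x1.reshape(-1, 1), y, h) for a in xs]

    plt.plot(xs, fit, label="kernel estimate")
    plt.plot(xs, 1 + xs**2, "--", label="true E[Y|X1=x] = 1 + x^2")
    plt.legend()
    plt.show()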

This is the end of the guided tour on nonparametric kernel regression.