## Guided tour on SMINK density estimation


SMINK stands for: Sample Moments Integrating Normal Kernel. The SMINK density estimator is a variant of the normal kernel density estimator with the following additional properties:

• The first and second moments of the SMINK density estimator are equal to the first and second sample moments;
• The maximum likelihood estimator of the (multivariate) normal density is a special case of a SMINK density estimator.
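The two properties can be made concrete with a minimal univariate Python sketch (Python 3.8 or later). It shows one way to build a normal-kernel estimator that has both of them. Take a window-width parameter γ in (0, 1]. Shrink each observation towards the sample mean by the factor $\sqrt{1-\gamma^2}$, and place a normal kernel with variance $\gamma^2 s^2$ on each shrunken point, where $s^2$ is the sample variance with divisor n. The resulting mixture has mean $\bar{x}$ and variance $(1-\gamma^2)s^2 + \gamma^2 s^2 = s^2$, so its first two moments equal the sample moments. For γ = 1 it collapses to the normal density $N(\bar{x}, s^2)$. This construction and the function names are illustrative assumptions, not necessarily the exact formula in Bierens (1983).

```python
import math
import statistics


def smink_like_density(x, data, gamma):
    """Hypothetical moment-matching normal-kernel density estimate at x.

    Illustrative sketch only, not necessarily Bierens' exact SMINK formula.
    gamma in (0, 1] plays the role of the window width.
    """
    mean = statistics.fmean(data)
    var = statistics.pvariance(data, mu=mean)  # divisor n, so raw second moments match
    shrink = math.sqrt(1.0 - gamma ** 2)  # pull observations towards the mean
    sd = gamma * math.sqrt(var)  # kernel standard deviation
    norm = sd * math.sqrt(2.0 * math.pi)
    total = 0.0
    for xj in data:
        center = mean + shrink * (xj - mean)
        total += math.exp(-0.5 * ((x - center) / sd) ** 2) / norm
    return total / len(data)


def mixture_moments(data, gamma):
    """Mean and variance of the mixture above, computed analytically."""
    mean = statistics.fmean(data)
    var = statistics.pvariance(data, mu=mean)
    shrink = math.sqrt(1.0 - gamma ** 2)
    centers = [mean + shrink * (xj - mean) for xj in data]
    c_mean = statistics.fmean(centers)
    # variance of a normal mixture = spread of the centers + kernel variance
    return c_mean, statistics.pvariance(centers, mu=c_mean) + gamma ** 2 * var


data = [1.0, 2.0, 4.0, 7.0]
print(mixture_moments(data, 0.4))  # mean 3.5, variance 5.25 (up to rounding)
# with gamma = 1 the estimate coincides with the fitted normal density
print(smink_like_density(3.0, data, 1.0), statistics.NormalDist(3.5, math.sqrt(5.25)).pdf(3.0))
```

Smaller values of γ put more weight on the individual observations. As γ approaches 1, the estimate approaches the normal fit.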

The basic properties of SMINK density estimators are summarized here. This summary is also provided by the SMINK density estimation module (SMINKDEN) itself. Before you use the SMINK density module, please read at least this summary, but I strongly recommend reading the original paper as well:

Bierens, H.J. (1983): "Sample Moments Integrating Normal Kernel Estimators of Density and Regression Functions", Sankhya 45, Series B, 160-192.

### SMINK density estimation: How it works

In order to demonstrate how SMINK density estimation works, I have generated n = 500 independent draws of two independent standard normal random variables $Z_1$ and $Z_2$. I then combined them into two dependent random variables, $X_1 = Z_1^2 + Z_2$ and $X_2 = Z_2^2 + Z_1$.
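Data of this kind can be simulated along the following lines. This is a sketch using Python's standard `random` module. The function name `simulate` and the seed are arbitrary assumptions, so the draws will not match the ones used in this tour.

```python
import random


def simulate(n=500, seed=12345):
    """Draw n pairs (X1, X2) with X1 = Z1**2 + Z2 and X2 = Z2**2 + Z1."""
    rng = random.Random(seed)  # arbitrary seed, chosen for reproducibility only
    x1, x2 = [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)  # independent standard normal draws
        z2 = rng.gauss(0.0, 1.0)
        x1.append(z1 ** 2 + z2)
        x2.append(z2 ** 2 + z1)
    return x1, x2


x1, x2 = simulate()
print(len(x1), len(x2))  # 500 500
```

X1 and X2 are uncorrelated, because every cross-covariance involves an odd moment of a standard normal variable, and those moments vanish. They are nevertheless not independent. A bivariate normal fit would therefore not capture their dependence.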

The SMINK density module does not allow you to select more than two variables, because only univariate and bivariate density functions can be plotted.

I will demonstrate the bivariate case first.

#### Bivariate SMINK density estimation

Open "Menu > Data analysis > Bierens' SMINK density estimation", and select X1 and X2 as the data in the usual way. The first SMINK density window then opens.

The first thing you have to do is select the plot range. It is advisable to base the plot range on the 10% and 90% quantiles of the two variables, which you can obtain via "Menu > Data analysis > Summary statistics". See the guided tour on SMINK regression estimation for the reasons. In this case the 10% quantile is about -0.8 and the 90% quantile is about 3, for both variables.
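The quantiles can also be checked outside the program. The following is a minimal sketch with Python's `statistics.quantiles`. Called with `n=10`, it returns the nine deciles: the first is the 10% quantile and the last is the 90% quantile. The helper name `plot_range` is hypothetical. The default "exclusive" interpolation method may differ slightly from the one used by the summary statistics module.

```python
import random
import statistics


def plot_range(values):
    """Return (10% quantile, 90% quantile) as a suggested plot range."""
    deciles = statistics.quantiles(values, n=10)  # nine cut points
    return deciles[0], deciles[-1]


# simulated X1 = Z1**2 + Z2 with an arbitrary seed
rng = random.Random(1)
x1 = [rng.gauss(0, 1) ** 2 + rng.gauss(0, 1) for _ in range(500)]
print(plot_range(x1))
```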

The number of grid points is the number of points at which the three-dimensional plot is evaluated along the axis of each X variable. The default value, 29, usually gives the best picture.
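Here is a sketch of the evaluation grid. It assumes that the grid points are equally spaced over the chosen plot range, endpoints included; the tour does not say how SMINKDEN places them. The function name `plot_grid` is hypothetical. With 29 points per axis, a bivariate plot is evaluated at 29 × 29 = 841 points.

```python
def plot_grid(lower, upper, num_points=29):
    """Equally spaced grid over [lower, upper], endpoints included (assumed layout)."""
    step = (upper - lower) / (num_points - 1)
    return [lower + i * step for i in range(num_points)]


axis = plot_grid(-0.8, 3.0)
grid_2d = [(u, v) for u in axis for v in axis]  # bivariate evaluation points
print(len(axis), len(grid_2d))  # 29 841
```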

Click the "SMINK density estimation options" button.

The SMINK density estimation procedure requires the specification of a window width parameter $\gamma_n$, which has to be contained in the interval $[\xi_n, 1]$, where

$$\xi_n = (\sqrt{n})^{-\alpha/k},$$

with $k$ the number of X variables ($k = 2$ in our case) and $\alpha \in (0,1)$. The default value is $\alpha = 0.5$, which I have chosen.
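Plugging in this tour's values shows how wide the admissible window-width interval is. The lower bound is $\xi_n = (\sqrt{n})^{-\alpha/k}$. With n = 500 and α = 0.5, this gives $\xi_n = 500^{-1/8} \approx 0.46$ in the bivariate case (k = 2) and $500^{-1/4} \approx 0.21$ in the univariate case (k = 1). The following is a small sketch; the function name `xi_lower_bound` is hypothetical.

```python
import math


def xi_lower_bound(n, alpha, k):
    """Lower end xi_n = (sqrt(n))**(-alpha/k) of the window-width interval [xi_n, 1]."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    return math.sqrt(n) ** (-alpha / k)


print(xi_lower_bound(500, 0.5, 2))  # bivariate case, about 0.46
print(xi_lower_bound(500, 0.5, 1))  # univariate case, about 0.21
```

Because $\xi_n \to 0$ as n grows, larger samples admit smaller window widths.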

Click "'alpha' OK". The window then changes.

If you leave the option "Optimize gamma" checked and click "Continue", then $\gamma_n$ will be optimized by grid search over the interval $[\xi_n, 1]$. If you uncheck this box first, then $\gamma_n$ will be set equal to $\xi_n$. I will leave this box checked.

Choose the number of grid points, and click "Grid OK".
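The tour does not say which criterion SMINKDEN optimizes over the grid, so the following is only a sketch of the grid-search mechanics. It spaces the chosen number of candidate values of $\gamma_n$ evenly over $[\xi_n, 1]$, which is an assumption. As a stand-in criterion, it picks the value with the highest leave-one-out log-likelihood of a moment-matching normal-kernel mixture: kernels of variance $\gamma^2 s^2$ centered at observations shrunk towards the sample mean by $\sqrt{1-\gamma^2}$. Leave-one-out likelihood cross-validation is a common bandwidth-selection technique, not necessarily the one in Bierens (1983). All function names are hypothetical.

```python
import math
import random
import statistics


def loo_log_likelihood(data, gamma):
    """Leave-one-out log-likelihood of a hypothetical moment-matching normal mixture.

    For simplicity the sample mean and variance are computed once from the full sample.
    """
    n = len(data)
    mean = statistics.fmean(data)
    var = statistics.pvariance(data, mu=mean)
    shrink = math.sqrt(max(0.0, 1.0 - gamma ** 2))  # guard against rounding above 1
    sd = gamma * math.sqrt(var)
    norm = sd * math.sqrt(2.0 * math.pi)
    centers = [mean + shrink * (x - mean) for x in data]
    total = 0.0
    for i, x in enumerate(data):
        s = sum(math.exp(-0.5 * ((x - c) / sd) ** 2)
                for j, c in enumerate(centers) if j != i)
        if s == 0.0:
            return float("-inf")  # a point with no support from the other kernels
        total += math.log(s / ((n - 1) * norm))
    return total


def grid_search_gamma(data, alpha=0.5, k=1, num_points=10):
    """Return the best gamma and the candidate grid over [xi_n, 1] (assumed even spacing)."""
    xi = math.sqrt(len(data)) ** (-alpha / k)
    grid = [xi + (1.0 - xi) * i / (num_points - 1) for i in range(num_points)]
    best = max(grid, key=lambda g: loo_log_likelihood(data, g))
    return best, grid


# simulated X1 = Z1**2 + Z2 with an arbitrary seed
rng = random.Random(0)
x1 = [rng.gauss(0, 1) ** 2 + rng.gauss(0, 1) for _ in range(200)]
best, grid = grid_search_gamma(x1)
print(round(best, 3))
```

More grid points give a finer search but multiply the cost, since every candidate requires a full pass over all pairs of observations. This is consistent with the tour's remark that the computation can take a few minutes.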

Next, click "Options OK". The plot data are then computed, which takes a few minutes in this case. When this is done, the module PLOT3DIM is activated. This module opens with a blank picture window; once you click the "Start" button, the picture is displayed.

This is not the best viewpoint. Turn the picture using the horizontal arrow buttons.

#### Univariate SMINK density estimation

Now select only X1 as the data in the usual way, and proceed as before: choose $\alpha = 0.5$, optimize the window width parameter $\gamma_n$ by grid search using 10 grid points, and choose the plot range [-0.8, 3]. You now also have the option to compare the SMINK density estimator with the corresponding normal density (i.e., the normal density whose mean is the sample mean and whose variance is the sample variance) or with the standard normal density, but I have not chosen these options.
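The two reference curves offered in the univariate case can be reproduced with a minimal sketch based on `statistics.NormalDist`. `NormalDist.from_samples` estimates the variance with divisor n − 1; the tour does not state which divisor SMINKDEN uses, and for n = 500 the difference is negligible. The function name `reference_curves` and the even spacing of the evaluation points are assumptions.

```python
import random
import statistics


def reference_curves(data, lower=-0.8, upper=3.0, num_points=29):
    """Evaluate the fitted normal and the standard normal density on an even grid."""
    fitted = statistics.NormalDist.from_samples(data)  # sample mean and stdev (n - 1)
    standard = statistics.NormalDist()  # mean 0, standard deviation 1
    step = (upper - lower) / (num_points - 1)
    xs = [lower + i * step for i in range(num_points)]
    return [(x, fitted.pdf(x), standard.pdf(x)) for x in xs]


# simulated X1 = Z1**2 + Z2 with an arbitrary seed
rng = random.Random(2)
x1 = [rng.gauss(0, 1) ** 2 + rng.gauss(0, 1) for _ in range(500)]
for x, f_fit, f_std in reference_curves(x1)[::7]:
    print(f"{x:6.2f}  {f_fit:.4f}  {f_std:.4f}")
```

X1 is skewed to the right (it contains a squared normal term), so neither reference curve can reproduce its shape. That is the kind of departure a SMINK estimate with $\gamma_n < 1$ can pick up.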