We propose a novel approach for online estimation of probability density functions. Our approach is based on the kernel density estimation (KDE) and produces models that enable online adaptation, which at the same time maintain a low (or bounded) complexity that scales sublinearly with the observed samples. As a result we get an Online Gaussian Mixture Model.
Key idea 1: maintain a non-parametric model of the data itself in a form of a sample distribution ps(x) and use this model to calculate the kernel density estimate pkde(x) of the target distribution when required.
Key idea 2: each new observation as a distribution in a form of a Dirac-delta function and we model the sample distribution by a mixture of Gaussian and Dirac-delta functions.
Key idea 3: each component carries with it its detailed model in a form of a two-component mixture. This is used during online operation to detect whether a component is oversmoothing the distribution. In that case, the component is replaced by its detailed model.
Update in a nutshell: (1) we update the sample model with the observed data-point. (2) the updated model is used to recalculate the optimal bandwidth for the KDE. (3) the sample distribution is refined and compressed.
Estimation of N-D distributions:
3D distributions can be fully visualized, or via sets of 2D projections. The image below shows a 3D reference distributions from which we drew 10k samples (left). The KDE estimated from these and the respective projections are also shown (right).
Mode detection:
The oKDE toolbox contains some potprocessing tools for analysis of the estimated density. Below we show an example of mode detection in the oKDE and reclustering of the components that form each of the detected modes. The reclustering is performed either by moment matching (matching first two moments of a Gaussian to the subset of components in GMM) or by L2 distance minimization between (minimizing the l2 distance between a Gaussian and the subset of components in the GMM).
Density estimation examples
A 1D nonstationary density estimation:
A 3D and 2D statinary density estimation:
Online resources
The Java implementation of the oKDE (implemented by Jonas Luthke) is available from the following GIT hub repository:
- Github project: https://github.com/joluet/okde-java
- library - https://github.com/joluet/okde-java
- example - https://github.com/joluet/okde-java-example
The C++ implementation of a faster and stable oKDE by Fereira, de Matos nad Riberio is available from the following repository:
- Link to source and installation: link
Matlab code:
- A toolbox for online KDE: link
- Fast batch multivariate bandwidth (kernel size) estimation: link or this link
- 2D batch bandwidth estimation code: link
- Unscented Hellinger distance between Gaussian mixture models: link