KNN and NPRED

Sharma, A. & Mehrotra, R. (2014). “An information theoretic alternative to model a natural system using observational information alone.” Water Resources Research, 50(1), 650-660. DOI:10.1002/2013WR013845

How to define a system? This is a problem faced routinely in science and engineering, with solutions ranging from those developed from our understanding of the inherent processes to those that assess the underlying structure based on observational evidence alone. In general, system specification involves identifying a few meaningful predictors (from a large enough set that is plausibly related to the response) and formulating a relation between them and the system response being modeled. For systems where physical relationships are less apparent, and sufficient observational records exist, a range of statistical alternatives have been investigated as a possible way of specifying the underlying form. Here we introduce partial information (PI) as a new means for specifying the system, its key advantage being the relative lack of major assumptions about the processes being modeled in order to characterize the complete system. In addition to PI, which offers a means of identifying the system predictors of interest, we also introduce the concept of partial weights (PWs), which use the identified predictors to formulate a predictive model that acknowledges the relative contributions that predictor variables make to the prediction of the response. We assess the utility of the PI-PW framework using synthetically generated data sets from known linear, nonlinear, and high-dimensional dynamic yet chaotic systems and demonstrate the efficacy of the procedure in ascertaining the underlying true system with varying extents of observational evidence available. We highlight how this framework can be invaluable in formulating prediction models for natural systems which are modeled using empirical or semiempirical alternatives and discuss current limitations that still need to be overcome.

Sharma, A., Mehrotra, R., Li, J., Jha, S. (2016). “A programming tool for nonparametric system prediction using Partial Informational Correlation and Partial Weights.” Environmental Modelling & Software, 83: 271-275. DOI: http://dx.doi.org/10.1016/j.envsoft.2016.05.021

Identification of system predictors forms the first step towards formulating a predictive model. Approaches for identifying such predictors are often limited by the need to assume a relationship between the predictor and response. To address this limitation, Sharma and Mehrotra (2014) presented a nonparametric predictive model using Partial Informational Correlation (PIC) and Partial Weights (PW). This study describes the open-source Nonparametric Prediction (NPRED) R package. NPRED identifies system predictors using the PIC logic, and predicts the response using a k-nearest-neighbor regression formulation based on a PW-based weighted Euclidean distance. The capabilities of the package are demonstrated using synthetic examples and a real application of predicting seasonal rainfall in the Warragamba dam near Sydney, Australia. The results show clear improvements in predictability compared with linear predictive alternatives, as well as with nonparametric alternatives that use an unweighted Euclidean distance.
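
To illustrate the general idea of k-nearest-neighbor regression with a weighted Euclidean distance, here is a minimal Python sketch. It is not the NPRED implementation and does not use the package's API: the function names, the toy data, and the weight values are all hypothetical, and the prediction is a plain average of the k nearest responses, which is an assumption made for simplicity. In NPRED the weights come from the PW procedure; here they are simply supplied by hand.

```python
import math


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance: each coordinate difference is scaled by its weight."""
    return math.sqrt(sum((w * (ai - bi)) ** 2 for ai, bi, w in zip(a, b, weights)))


def knn_predict(X, y, query, weights, k):
    """Average the responses of the k training points closest to `query`.

    Assumption: a simple unweighted mean over the neighbors; other
    neighbor-averaging schemes are possible.
    """
    if not 1 <= k <= len(X):
        raise ValueError("k must be between 1 and the number of training points")
    # Rank training points by their weighted distance to the query.
    order = sorted(range(len(X)), key=lambda i: weighted_distance(X[i], query, weights))
    nearest = order[:k]
    return sum(y[i] for i in nearest) / k


# Hypothetical toy data: the response follows the first predictor only;
# the second predictor is irrelevant noise.
X = [[0.0, 5.0], [1.0, 0.0], [2.0, 9.0], [3.0, 1.0]]
y = [0.0, 1.0, 2.0, 3.0]
query = [1.1, 9.0]

# Equal weights: the irrelevant second predictor dominates the neighbor search.
unweighted = knn_predict(X, y, query, weights=[1.0, 1.0], k=1)

# Hand-picked weights that switch the second predictor off.
weighted = knn_predict(X, y, query, weights=[1.0, 0.0], k=1)

print(unweighted, weighted)
```

In this toy case the equal-weight search picks the neighbor that matches the noisy second predictor, while down-weighting that predictor selects the neighbor closest in the informative one. This is the kind of effect the abstract attributes to using a weighted rather than an unweighted distance.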

The software described in the above-mentioned paper can be obtained by clicking here.