A central idea in distance-based machine learning algorithms such as k-nearest neighbors and manifold learning is to choose a set of references, or a neighborhood, using a distance function, so that the neighborhood represents the local structure around a query point, and then to build models on these local structures. Local Partial Least Squares (local PLS), which applies this neighborhood-based idea to Partial Least Squares (PLS), has been shown to perform very well in regression on small-sample, multicollinear data, but it is seldom used in high-dimensional classification. Furthermore, the difference between PLS and local PLS with respect to their optimal intrinsic dimensions is unclear. In this paper we combine local PLS with non-Euclidean distances to determine which distance measures are better suited to high-dimensional classification. Experimental results on 8 UCI and spectroscopy datasets show that the Euclidean distance is not a good distance function for local PLS classification, especially in high-dimensional cases; the Manhattan distance and fractional distances are preferred instead. The results further show that the optimal intrinsic dimension of local PLS is smaller than that of standard PLS.
- High dimensionality classification
- Distance function
- Fractional distance
- Local Partial Least Squares
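
The neighborhood selection step described in the abstract can be illustrated with a minimal sketch. It assumes the Manhattan, Euclidean, and fractional distances are all instances of the Minkowski form (sum_i |x_i - q_i|^p)^(1/p), with p = 1, p = 2, and 0 < p < 1 respectively. The function names `minkowski_distances` and `select_neighborhood`, and the parameters `k` and `p`, are hypothetical and chosen for illustration only; the abstract does not specify the authors' exact procedure.

```python
import numpy as np

def minkowski_distances(X, q, p):
    """Distance from query q to every row of X: (sum_i |x_i - q_i|^p)^(1/p).

    p = 2 gives the Euclidean distance, p = 1 the Manhattan distance,
    and 0 < p < 1 a fractional distance (assumed Minkowski form).
    """
    diff = np.abs(np.asarray(X, dtype=float) - np.asarray(q, dtype=float))
    return np.sum(diff ** p, axis=1) ** (1.0 / p)

def select_neighborhood(X, q, k, p):
    """Indices of the k training samples closest to q under the chosen p."""
    d = minkowski_distances(X, q, p)
    # Stable sort so that ties are broken by original sample order.
    return np.argsort(d, kind="stable")[:k]

# Toy data: three training samples and a query at the origin.
X_train = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]])
query = np.array([0.0, 0.0])

for p in (2.0, 1.0, 0.5):
    print(p, minkowski_distances(X_train, query, p))
```

In a local PLS classifier, a PLS model would then be fitted only on the samples returned by `select_neighborhood` for each query point; that fitting step is omitted here because its details are not given in the abstract.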