Vertical set square distance: A fast and scalable technique to compute total variation in large datasets
DOI :
Date : 2005
In this paper, we introduce the vertical set square distance (VSSD) technique, which is designed to measure the total variation of a set about a fixed point in large datasets efficiently and scalably. The set can be any projected subspace of any vector space, including oblique subspaces (not just dimensional subspaces). VSSD can determine how close a point is to a set of points in a dataset, which is useful for classification, clustering, and outlier detection tasks. The technique employs a vertical data structure called the Predicate-tree (P-tree)(1). Performance evaluations on both synthetic and real-world datasets show that VSSD is fast, accurate, and scales well to very large datasets compared with similar techniques that use horizontal, record-based data structures.
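To make the idea of "total variation of a set about a fixed point" concrete, the following is a minimal sketch rather than the paper's implementation. It assumes the total variation of a set X about a point a is the sum of squared Euclidean distances, which expands to sum(x·x) - 2a·sum(x) + N(a·a). It also assumes that each attribute is stored as vertical bit slices, so both sums can be obtained from counts of set bits. Plain Python integers stand in for the bit slices here; they are uncompressed bitmaps, not P-trees, and every function name is hypothetical.

```python
from typing import List, Sequence


def bit_slices(column: Sequence[int], width: int) -> List[int]:
    """Vertical layout: slices[j] is a bitmask whose bit r is bit j of column[r]."""
    slices = [0] * width
    for row, value in enumerate(column):
        for j in range(width):
            if (value >> j) & 1:
                slices[j] |= 1 << row
    return slices


def popcount(mask: int) -> int:
    """Number of set bits, i.e. the count of rows selected by the mask."""
    return bin(mask).count("1")


def vertical_total_variation(slices_per_dim: Sequence[Sequence[int]],
                             set_mask: int,
                             a: Sequence[int]) -> int:
    """Sum of squared distances from a to the rows selected by set_mask.

    Uses only AND operations and bit counts on the vertical slices:
      sum(x)   = sum_j 2^j * count(S_j & mask)
      sum(x^2) = sum_j sum_k 2^(j+k) * count(S_j & S_k & mask)
    """
    n = popcount(set_mask)
    total = 0
    for slices, a_i in zip(slices_per_dim, a):
        sum_x = sum((1 << j) * popcount(s & set_mask)
                    for j, s in enumerate(slices))
        sum_x2 = sum((1 << (j + k)) * popcount(sj & sk & set_mask)
                     for j, sj in enumerate(slices)
                     for k, sk in enumerate(slices))
        total += sum_x2 - 2 * a_i * sum_x + n * a_i * a_i
    return total


def horizontal_total_variation(rows: Sequence[Sequence[int]],
                               a: Sequence[int]) -> int:
    """Record-by-record baseline used only to check the vertical result."""
    return sum(sum((x - a_i) ** 2 for x, a_i in zip(row, a)) for row in rows)


# Illustrative data (not from the paper): four 2-D points with 3-bit attributes.
rows = [(3, 5), (7, 1), (2, 2), (6, 4)]
a = (4, 3)
slices_per_dim = [bit_slices([r[d] for r in rows], width=3) for d in range(2)]
all_rows = (1 << len(rows)) - 1  # mask selecting every row

vertical = vertical_total_variation(slices_per_dim, all_rows, a)
horizontal = horizontal_total_variation(rows, a)
```

In this sketch the sums over the set are computed once per mask and do not depend on a, so evaluating the total variation about a different point only repeats the cheap final arithmetic. Whether the paper organizes the computation in exactly this way is not stated in the abstract.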