Vertical set square distance: A fast and scalable technique to compute total variation in large datasets

Publication Name : PROCEEDINGS OF THE ISCA 20TH INTERNATIONAL CONFERENCE ON COMPUTERS AND THEIR APPLICATIONS

Date : 2005


In this paper, we introduce the vertical set square distance (VSSD) technique, which is designed to measure the total variation of a set about a fixed point in large datasets efficiently and scalably. The set can be any projected subspace of any vector space, including oblique subspaces (not just dimensional subspaces). VSSD can determine how close a point is to a set of points in a dataset, which is useful for classification, clustering, and outlier detection tasks. The technique employs a vertical data structure called the Predicate-tree (P-tree) (1). Performance evaluations on both synthetic and real-world datasets show that VSSD is fast, accurate, and scales well to very large datasets compared with similar techniques that use horizontal, record-based data structures.
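
The abstract does not state the formula, so the following is only an illustrative sketch under stated assumptions. It takes "total variation of a set X about a fixed point a" to mean the sum of squared Euclidean distances from each point in X to a, and it stands in for P-trees with plain, uncompressed bit vectors stored as Python integers (one vector per bit position of each attribute). All function names are hypothetical and are not taken from the paper. The sketch compares a row-by-row (horizontal) computation with one that uses only bitwise ANDs and bit counts on the vertical slices.

```python
from typing import List, Sequence


def total_variation_horizontal(points: Sequence[Sequence[int]],
                               a: Sequence[int]) -> int:
    """Sum of squared distances to `a`, scanning the records one by one."""
    return sum(sum((x - c) ** 2 for x, c in zip(p, a)) for p in points)


def bit_slices(points: Sequence[Sequence[int]], n_bits: int) -> List[List[int]]:
    """Build vertical bit slices (assumed stand-in for P-trees).

    slices[i][j] is an integer bitmask whose r-th bit equals bit j of
    attribute i in row r. Attribute values must be non-negative integers
    that fit in n_bits bits.
    """
    dims = len(points[0])
    slices = [[0] * n_bits for _ in range(dims)]
    for r, p in enumerate(points):
        for i, v in enumerate(p):
            for j in range(n_bits):
                if (v >> j) & 1:
                    slices[i][j] |= 1 << r
    return slices


def popcount(mask: int) -> int:
    """Number of 1-bits, playing the role of a root count."""
    return bin(mask).count("1")


def total_variation_vertical(slices: List[List[int]], set_mask: int,
                             a: Sequence[int]) -> int:
    """Sum of squared distances to `a` over the rows selected by set_mask.

    Uses the expansion sum((x - a)^2) = sum(x^2) - 2*a*sum(x) + n*a^2 per
    attribute, where sum(x) and sum(x^2) are derived from bit counts only.
    """
    n = popcount(set_mask)
    total = 0
    for i, attr in enumerate(slices):
        # sum of x_i over the set: weight each bit slice by 2^j
        s1 = sum((1 << j) * popcount(set_mask & bj)
                 for j, bj in enumerate(attr))
        # sum of x_i^2 over the set: pairwise ANDs of bit slices
        s2 = sum((1 << (j + k)) * popcount(set_mask & bj & bk)
                 for j, bj in enumerate(attr)
                 for k, bk in enumerate(attr))
        total += s2 - 2 * a[i] * s1 + n * a[i] ** 2
    return total


if __name__ == "__main__":
    pts = [(1, 2), (3, 4), (5, 6)]
    a = (2, 3)
    slices = bit_slices(pts, n_bits=3)
    print(total_variation_horizontal(pts, a))
    print(total_variation_vertical(slices, 0b111, a))
```

In this sketch the bit counts depend only on the set and not on the point a, so they could be computed once and reused for many candidate points. Whether and how the paper exploits this is not stated in the abstract.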

Type : Book

Page : 60 - 65