TY - JOUR
T1 - Toward an accurate analysis of range queries on spatial data
AU - An, Ning
AU - Jin, Ji
AU - Sivasubramaniam, Anand
N1 - Funding Information:
The authors would like to thank Swarup Acharya of Bell Laboratories for his efforts to educate us on the details of the MinSkew algorithm/implementation. This is a substantially extended version of a preliminary paper [15] that appeared in the Proceedings of the IEEE International Conference on Data Engineering, March 2000. The extensions include a more detailed performance study as well as experimental comparisons with several previous estimation techniques. This research has been supported in part by the US National Science Foundation Career Award MIPS-9701475, US NSF grant CCR-9900701, EPA grant R825195-01-0, and US NSF equipment grants CDA-9617315 and EIA-9818327.
PY - 2003
Y1 - 2003
N2 - Analysis of range queries on spatial (multidimensional) data is both important and challenging. Most previous analysis attempts have made certain simplifying assumptions about the data sets and/or queries to keep the analysis tractable. As a result, they may not be universally applicable. This paper proposes a set of five analysis techniques to estimate the selectivity and number of index nodes accessed in serving a range query. The underlying philosophy behind these techniques is to maintain an auxiliary data structure, called a density file, whose creation is a one-time cost, which can be quickly consulted when the query is given. The schemes differ in what information is kept in the density file, how it is maintained, and how this information Is looked up. It is shown that one of the proposed schemes, called Cumulative Density (CD), gives very accurate results (usually less than 5 percent error) using a diverse suite of point and rectangular data sets, that are uniform or skewed, and a wide range of query window parameters. The estimation takes a constant amount of time, which is typically lower than 1 percent of the time that it would take to execute the query, regardless of data set or query window parameters.
AB - Analysis of range queries on spatial (multidimensional) data is both important and challenging. Most previous analysis attempts have made certain simplifying assumptions about the data sets and/or queries to keep the analysis tractable. As a result, they may not be universally applicable. This paper proposes a set of five analysis techniques to estimate the selectivity and number of index nodes accessed in serving a range query. The underlying philosophy behind these techniques is to maintain an auxiliary data structure, called a density file, whose creation is a one-time cost, which can be quickly consulted when the query is given. The schemes differ in what information is kept in the density file, how it is maintained, and how this information Is looked up. It is shown that one of the proposed schemes, called Cumulative Density (CD), gives very accurate results (usually less than 5 percent error) using a diverse suite of point and rectangular data sets, that are uniform or skewed, and a wide range of query window parameters. The estimation takes a constant amount of time, which is typically lower than 1 percent of the time that it would take to execute the query, regardless of data set or query window parameters.
UR - http://www.scopus.com/inward/record.url?scp=0037341265&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0037341265&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2003.1185836
DO - 10.1109/TKDE.2003.1185836
M3 - Article
AN - SCOPUS:0037341265
SN - 1041-4347
VL - 15
SP - 305
EP - 323
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 2
ER -