TY - GEN
T1 - QR2
T2 - 34th IEEE International Conference on Data Engineering, ICDE 2018
AU - Durairaj Gunasekaran, Yeshwanth
AU - Asudeh, Abolfazl
AU - Hasani, Sona
AU - Zhang, Nan
AU - Jaoua, Ali
AU - Das, Gautam
N1 - Funding Information:
In order to study the performance of the different algorithms, we consider different combinations of (i) both web databases, (ii) 1D and MD algorithms, (iii) filtering conditions, and, more importantly, (iv) ranking functions that are independent of, positively correlated with, and negatively correlated with the web database’s system ranking function. 1D: The reranking, in this case, is on a single attribute. For both Zillow and Blue Nile, and for queries with different filtering predicates, we will choose different attributes for ranking. Also, to construct rankings with different correlations with the system ranking function, we will test the performance of the algorithms in both ascending and descending order. MD: The MD reranking is on more than one attribute, where the user-specified ranking function is the dot product of the slider values with the ranking attributes. In order to construct queries with different correlations with the system ranking function, we test different combinations of positive and negative slider values on different numbers of attributes. In particular, we choose Blue Nile for constructing ranking functions with more than two ranking attributes. Fig. 3(b) shows an example of such a ranking function (price - 0.1 carat - 0.5 depth). On-the-fly indexing: Indexing the dense regions for future use is the main technique used in 1D-RERANK and MD-RERANK to resolve the performance issues of both (1D/MD)-BASELINE and (1D/MD)-BINARY. Showing the effectiveness of this technique is part of the demonstration plan. To do so, after issuing multiple queries, we will track the performance of (1D/MD)-RERANK in terms of both processing time and the number of queries submitted to the web database. Best vs. worst cases: Finally, we will demonstrate some of the best- and worst-case scenarios to show the efficiency and limitations of the system.
For example, we will show that when a large number of tuples have the same value V on an attribute Ai, the performance of the system may drop significantly. That is because, in order to identify the next top tuple, the system may first need to crawl all tuples where t[Ai] = V. On the other hand, when the attribute values follow a uniform distribution over the domain space, even the binary search strategy performs well. Here are two such functions: • The function price + LengthWidthRatio is inefficient to run on Blue Nile. While processing this query, QR2 needs to crawl all the tuples with t[LengthWidthRatio] = 1. In Blue Nile, at the time of writing, around 20% of the tuples satisfy this predicate. The system, therefore, needs to crawl all these tuples before returning the results. Note that, thanks to on-the-fly indexing, (1D/MD)-RERANK will still have a low amortized cost in these cases. • The function price + squarefeet runs fast on Zillow. The goal of this function is to find houses with a low price and small square footage. The positive correlation between the attributes price and squarefeet, as well as the positive correlation of this query with Zillow’s system ranking function, lets the algorithms finish quickly. IV. SUMMARY We proposed to demonstrate QR2, a third-party service that enables the on-the-fly processing of queries with any user-defined ranking function over a web database. Our system uses nothing but the public search interface of the web database and addresses a wide range of user preferences in ranking the results, even if they are not supported by the database. V. ACKNOWLEDGEMENT This contribution was made possible by NPRP grant No. 07-794-1-145 from the Qatar National Research Fund (a member of Qatar Foundation). Any findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the sponsors listed above.
Publisher Copyright:
© 2018 IEEE.
PY - 2018/10/24
Y1 - 2018/10/24
N2 - The ranked retrieval model has rapidly become the de facto way of processing search queries in web databases. Despite extensive efforts to design better ranking mechanisms, in practice many such databases fail to address the diverse and sometimes contradictory preferences of users. In this paper, we present QR2, a third-party service that uses nothing but the public search interface of a web database and enables the on-the-fly processing of queries with any user-specified ranking function, regardless of whether the ranking function is supported by the database.
AB - The ranked retrieval model has rapidly become the de facto way of processing search queries in web databases. Despite extensive efforts to design better ranking mechanisms, in practice many such databases fail to address the diverse and sometimes contradictory preferences of users. In this paper, we present QR2, a third-party service that uses nothing but the public search interface of a web database and enables the on-the-fly processing of queries with any user-specified ranking function, regardless of whether the ranking function is supported by the database.
UR - http://www.scopus.com/inward/record.url?scp=85057128785&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85057128785&partnerID=8YFLogxK
U2 - 10.1109/ICDE.2018.00199
DO - 10.1109/ICDE.2018.00199
M3 - Conference contribution
AN - SCOPUS:85057128785
T3 - Proceedings - IEEE 34th International Conference on Data Engineering, ICDE 2018
SP - 1653
EP - 1656
BT - Proceedings - IEEE 34th International Conference on Data Engineering, ICDE 2018
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 16 April 2018 through 19 April 2018
ER -