Project Details
Description
The area of sublinear algorithms aims to establish algorithmic foundations for processing big data. What global properties of the data can we understand while reading only a small portion of it? What if the data is noisy? Can we quickly restore a corrupted data point while ensuring some global property of the data? This project focuses specifically on sublinear algorithms for real-valued numeric data and on real-world challenges associated with it.
Effectively exploiting big data can provide significant societal benefits. This project has the potential to change how real-valued data is processed and analyzed, and how privacy of sensitive datasets is handled. In addition, this project includes educational activities designed to ensure the work's broader impact and to improve workforce training. They include advising and mentoring graduate students; continuing to build a strong theoretical foundations group at Penn State; including the results of the proposed research in the PI's graduate course on sublinear algorithms; widely disseminating the results of this research and, more broadly, algorithmic and computational ideas via talks and publications; and fostering diversity by providing mentoring and educational activities targeted at women in computer science and mathematics.
While there are established and successful lines of research in sublinear algorithms and property testing on Boolean functions, codes (and, more generally, algebraic properties), graphs, and discrete distributions, several recent applications of sublinear algorithms require working with real data. These include the study of Lipschitz functions with applications to data privacy and the study of submodular functions with applications to economics. To achieve their full potential, sublinear algorithms need to be able to handle real-valued data. The aim of this project is to lay the foundations for this area of study. This will require the development of new tools and will open up new connections and new areas of application. The research activities are grouped into three parts: 1. testing and local reconstruction of Lipschitz functions with applications to data privacy; 2. new measures for accuracy guarantees, with the focus on L_p metrics; and 3. new models for data access.
Status | Finished |
---|---|
Effective start/end date | 6/1/14 → 4/30/18 |
Funding
- National Science Foundation: $450,000.00
Fingerprint
Explore the research topics touched on by this project. These labels are generated based on the underlying awards/grants. Together they form a unique fingerprint.