CAREER: Optimization Based Methods for Robust Pattern Recognition in Time-Series Data

Project: Research project

Project Details

Description

This research proposes new mathematical and algorithmic tools for identifying patterns in, and effectively mining, time-series data, i.e., sequences of data points typically measured at successive, uniformly spaced time instants. A large variety of real-world data sources, such as speech and audio, biomedical signals, health care records, network traffic, and stock market data, manifest as time series, and their analysis is of significant interest to both government and industry. The growth of such data sources has been accelerated by the digital revolution: the abundance of audio and video streams on the internet, the storage of large amounts of chronological health care records in electronic databases, and the continuous generation of new time-series data from advances in sensing. Automated software tools that can find patterns in a large time-series sequence, support fast and scalable retrieval, and categorize large time-series collections are hence highly desirable. The proposed research develops such software (algorithmic) tools with a particular focus on robustness and scalability. Robustness refers to the fact that time series that have the "same appeal" to a human consumer, e.g., different versions of the same song or video, are not necessarily digitally identical. Hence, robust techniques are needed that can withstand distortions which do not change the essence of the time-series content. Scalability requires that the pattern-matching techniques be fast and easy to implement, so that the solutions can be deployed to mine large collections. Further, to prepare the next generation of engineers in electrical engineering and computer science, the project includes a strong educational component. At the heart of this educational component is an edutainment game in which human players, i.e., students with varying levels of academic preparation (high school, undergraduate, and graduate), compete against a computer algorithm in a video piracy challenge. The game is aimed at making the learning process more interactive, particularly for undergraduate students.
A serious practical challenge in mining time-series data for emerging applications is the ability to withstand distortions; that is, instances of the "same underlying" time series are often observed under noise, amplitude and/or time scaling, and other miscellaneous operations. Many existing techniques for time-series comparison are not robust to distortion, and the ones that are often come at a substantial computational cost. Further, existing algorithmic techniques enable control of key properties of time-series features, such as robustness and uniqueness, only at an intuitive, often heuristic level. The proposed research advocates judicious selection of time-series extrema and aims to break the classical trade-off between computational efficiency in time-series feature extraction and comparison on the one hand and robustness to distortions on the other. Unlike existing methods, which employ pre-processing time-series filters inspired by intuition, the proposal is to optimize the filter explicitly with respect to cost functions that capture key feature attributes such as robustness and uniqueness of the extracted extrema. Optimal extrema extraction will be investigated in two different setups: (a) a deterministic framework where example training time series are used in the optimization, and (b) a statistical framework where stochastic models of time series are used. A variety of related sub-problems also emerge, namely: (a) connections to edge detection problems in image processing and vision, (b) encoding and comparison of time-series extrema, and (c) extensions to finding robust extrema under non-linear operations on the time series.
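As a rough illustration of the "filter, then extract extrema" idea described above, the following Python sketch smooths a signal and records where its local maxima and minima occur. It is not the project's method: the moving-average kernel is only a stand-in for the optimized filter the abstract proposes, and the function names (`smooth`, `local_extrema`, `extrema_after_filter`) and the `width` parameter are hypothetical choices made for this example.

```python
import numpy as np


def smooth(x, width=9):
    """Low-pass filter x with a moving-average kernel.

    Assumption: this kernel is a simple placeholder, not an optimized filter.
    """
    kernel = np.ones(width) / width
    return np.convolve(x, kernel, mode="same")


def local_extrema(x):
    """Return (maxima, minima) index arrays of strict local extrema of a 1-D array."""
    d = np.diff(x)
    # A sign change in the first difference marks an extremum.
    maxima = np.where((d[:-1] > 0) & (d[1:] < 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] > 0))[0] + 1
    return maxima, minima


def extrema_after_filter(x, width=9):
    """Smooth first, then extract extrema from the smoothed signal."""
    return local_extrema(smooth(x, width))


# Small demonstration on a synthetic signal (three periods of a sine wave).
t = np.linspace(0.0, 1.0, 500, endpoint=False)
clean = np.sin(2 * np.pi * 3 * t)

# Distorted versions of the "same underlying" series.
rng = np.random.default_rng(0)
scaled = 2.0 * clean
noisy = clean + 0.05 * rng.standard_normal(t.size)

print("maxima, clean :", extrema_after_filter(clean)[0])
print("maxima, scaled:", extrema_after_filter(scaled)[0])
print("raw maxima count, noisy     :", len(local_extrema(noisy)[0]))
print("filtered maxima count, noisy:", len(extrema_after_filter(noisy)[0]))
```

Extrema locations depend only on the sign of the first difference, so positive amplitude scaling leaves them unchanged, while the pre-filter suppresses many of the spurious extrema that additive noise creates. Choosing the filter by optimizing an explicit cost function, as the abstract proposes, would replace the fixed kernel used here.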
The research plan is to develop the algorithmic tools alongside two real-world applications: (1) multimedia fingerprinting, and (2) biomedical time-series analysis. Additionally, software tools, namely edutainment games, will be developed based on these applications and will play a crucial role in enhancing the PI's research and classroom teaching. Research results will be disseminated via articles in leading journals and conferences, and via online MATLAB software toolboxes.
Status: Finished
Effective start/end date: 5/1/15 to 4/30/20

Funding

  • National Science Foundation: $500,000.00
