Abstract
One of the major challenges of the 'Big Data' era is mining unstructured data. The problem arises from storing high-dimensional data that has no standard schema. Although knowledge discovery in databases (KDD) algorithms were designed for data extraction, they are best suited to structured data stores. At the storage level, NoSQL databases have been deployed to accommodate unstructured data. However, the over-reliance of NoSQL stores on multiple, differing APIs hampers efficient data extraction across them. Moreover, few tools can perform KDD tasks on NoSQL data stores. In this work, we examine trends in unstructured data mining and outline future directions and challenges. Then, focusing on the extraction of topics and terms from NoSQL databases, we propose a tool called TouchR2, which relies algorithmically on Bloom filtering and parallelization. Using the CouchDB data store as a test case, our evaluation shows that TouchR2 extracts and organizes terms with high accuracy and with much reduced processing time.
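The abstract does not describe how TouchR2 combines Bloom filtering with parallelization. The following is a minimal sketch of the general idea, not the authors' implementation. It assumes three things:

- A Bloom filter is used to skip terms that have already been seen while scanning JSON documents, such as those stored in CouchDB.
- Documents are split into partitions that are processed concurrently.
- A final pass with one global filter removes terms repeated across partitions.

The names `BloomFilter`, `extract_terms` and `parallel_extract`, the filter size, the hash count and the tokenizer are all hypothetical choices made for illustration.

```python
import hashlib
import re
from concurrent.futures import ThreadPoolExecutor


class BloomFilter:
    """Fixed-size Bloom filter over strings, using k salted SHA-256 hashes (illustrative)."""

    def __init__(self, num_bits=8192, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int serves as the bit array

    def _positions(self, item):
        # Derive k bit positions by salting the item with the hash index.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # No false negatives; false positives are possible but rare at low fill.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


TOKEN = re.compile(r"[a-z]+")  # hypothetical tokenizer: lowercase alphabetic runs


def extract_terms(docs):
    """Return the first occurrence of each term in one partition of JSON-like documents."""
    seen = BloomFilter()
    terms = []
    for doc in docs:
        for key, value in doc.items():
            # Skip CouchDB's reserved underscore fields (_id, _rev) and non-text values.
            if key.startswith("_") or not isinstance(value, str):
                continue
            for term in TOKEN.findall(value.lower()):
                if term not in seen:  # a false positive would silently drop a new term
                    seen.add(term)
                    terms.append(term)
    return terms


def parallel_extract(docs, workers=4):
    """Extract terms from document partitions concurrently, then deduplicate globally."""
    partitions = [docs[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_lists = list(pool.map(extract_terms, partitions))
    # A single global filter removes terms that occur in more than one partition.
    global_seen = BloomFilter()
    vocabulary = []
    for terms in partial_lists:
        for term in terms:
            if term not in global_seen:
                global_seen.add(term)
                vocabulary.append(term)
    return vocabulary


docs = [
    {"_id": "a", "title": "NoSQL data mining", "body": "Mining unstructured data"},
    {"_id": "b", "title": "CouchDB documents", "year": 2013},
]
print(parallel_extract(docs, workers=2))
```

A Bloom filter trades a small, tunable false-positive rate for constant memory, which is why it suits deduplicating large vocabularies. Threads keep the sketch short. Because of CPython's global interpreter lock, CPU-bound tokenization would more plausibly use `ProcessPoolExecutor` or a distributed framework.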
Original language | English (US) |
---|---|
Pages | 854-861 |
Number of pages | 8 |
State | Published - 2013 |
Event | 2013 16th IEEE International Conference on Computational Science and Engineering, CSE 2013 - Sydney, NSW, Australia Duration: Dec 3 2013 → Dec 5 2013 |
Other
Other | 2013 16th IEEE International Conference on Computational Science and Engineering, CSE 2013 |
---|---|
Country/Territory | Australia |
City | Sydney, NSW |
Period | 12/3/13 → 12/5/13 |
All Science Journal Classification (ASJC) codes
- Computer Science (miscellaneous)