Topics and terms mining in unstructured data stores

Richard K. Lomotey, Ralph Deters

Research output: Contribution to conference › Paper › peer-review

12 Scopus citations

Abstract

One of the major challenges of the 'Big Data' epoch is unstructured data mining. The problem arises from storing high-dimensional data that has no standard schema. Although knowledge discovery in databases (KDD) algorithms were designed for data extraction, they are best suited to structured data stores. At the storage level, NoSQL databases have been deployed to accommodate unstructured data. However, the over-reliance of NoSQL stores on multiple APIs hampers efficient data extraction across different NoSQL stores. In addition, few tools are available that can perform KDD tasks on NoSQL data stores. In this work, we explore trends in unstructured data mining and detail future directions and challenges. Then, focusing on topic and term extraction from NoSQL databases, we propose a tool called TouchR2, which relies algorithmically on Bloom filtering and parallelization. With the CouchDB data store as the test case, the evaluation of TouchR2 shows high accuracy for term extraction and organization within a much-optimized running time.
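As a rough illustration of the Bloom filtering idea mentioned in the abstract, the following minimal Python sketch tests whether tokens from a document belong to a set of search terms. It is not TouchR2's implementation: the class `BloomFilter`, the helper `matching_terms`, the bit-array size, and the hash scheme are all assumptions made for this example, and the parallelization and CouchDB access described in the paper are not shown.

```python
import hashlib


class BloomFilter:
    """Hypothetical fixed-size Bloom filter (illustrative only).

    Membership tests never give false negatives, but may give
    false positives at a rate set by num_bits and num_hashes.
    """

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, term):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{term}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, term):
        # Set every bit position derived from the term.
        for pos in self._positions(term):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, term):
        # The term is "possibly present" only if all its bits are set.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(term)
        )


def matching_terms(text, term_filter):
    """Return lowercase whitespace tokens of text that the filter may contain."""
    return [tok for tok in text.lower().split() if tok in term_filter]
```

In a sketch like this, the search terms would be added to the filter once, and each stored document's text would then be passed through `matching_terms`. Because a Bloom filter can report false positives, any hit would still need to be confirmed against the exact term list.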

Original language: English (US)
Pages: 854-861
Number of pages: 8
State: Published - 2013
Event: 2013 16th IEEE International Conference on Computational Science and Engineering, CSE 2013 - Sydney, NSW, Australia
Duration: Dec 3 2013 → Dec 5 2013

All Science Journal Classification (ASJC) codes

  • Computer Science (miscellaneous)
