Terms extraction from unstructured data silos

Richard K. Lomotey, Ralph Deters

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

10 Scopus citations

Abstract

The major challenge that the big data era brings to the services computing landscape is the debris of unstructured data. This high-dimensional data comes in heterogeneous formats, is schemaless, and in some cases requires multiple storage APIs. This situation makes it almost impractical to apply existing data mining techniques, which are designed for schema-based data sources, in a knowledge discovery in databases (KDD) process. In this paper, we propose a tool called TouchR that relies algorithmically on the Hidden Markov Model (HMM) to extract terms from data silos, specifically distributed NoSQL databases, which we model as a network graph. Our use-case graph consists of storage nodes such as CouchDB, Neo4J, and DynamoDB. The evaluation of TouchR shows high accuracy for term extraction and organization.
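The abstract does not specify TouchR's hidden states, observation symbols, or probabilities, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a hypothetical two-state HMM (`TERM` vs. `OTHER`) with hand-picked toy probabilities, decodes each token sequence with the Viterbi algorithm, groups consecutive `TERM` tokens into terms, and walks a small graph of storage nodes breadth-first to collect terms per node. The functions `observe`, `viterbi`, `extract_terms`, `crawl_silos`, the stopword list, the probability tables, and the sample documents are all hypothetical illustrations.

```python
import math
from collections import deque

# Hypothetical two-state tagger: each token is either part of a TERM or OTHER.
STATES = ("TERM", "OTHER")
STOPWORDS = {"a", "an", "and", "of", "the", "to", "in", "for", "with"}

# Toy parameters chosen by hand for illustration (assumption); a real system
# would estimate them from labeled data.
START = {"TERM": 0.4, "OTHER": 0.6}
TRANS = {
    "TERM": {"TERM": 0.6, "OTHER": 0.4},
    "OTHER": {"TERM": 0.3, "OTHER": 0.7},
}
EMIT = {
    "TERM": {"word": 0.85, "num": 0.05, "stop": 0.10},
    "OTHER": {"word": 0.30, "num": 0.20, "stop": 0.50},
}


def observe(token: str) -> str:
    """Map a raw token to a coarse observation symbol (hypothetical feature)."""
    if token.lower() in STOPWORDS:
        return "stop"
    if token.isdigit():
        return "num"
    return "word"


def viterbi(tokens: list[str]) -> list[str]:
    """Return the most likely hidden-state sequence, computed in log space."""
    if not tokens:
        return []
    obs = [observe(t) for t in tokens]
    # score[s]: best log-probability of any path ending in state s at this step
    score = {s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}
    backptrs: list[dict[str, str]] = []
    for o in obs[1:]:
        new_score, ptr = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][o])
            ptr[s] = prev
        score = new_score
        backptrs.append(ptr)
    # Backtrack from the best final state to recover the full path.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def extract_terms(text: str) -> list[str]:
    """Group maximal runs of TERM-tagged tokens into (possibly multi-word) terms."""
    tokens = text.split()
    terms, current = [], []
    for token, state in zip(tokens, viterbi(tokens)):
        if state == "TERM":
            current.append(token)
        elif current:
            terms.append(" ".join(current))
            current = []
    if current:
        terms.append(" ".join(current))
    return terms


def crawl_silos(graph: dict[str, list[str]], start: str,
                docs: dict[str, list[str]]) -> dict[str, list[str]]:
    """Breadth-first walk over storage nodes, extracting terms from each node's documents."""
    seen, queue, found = {start}, deque([start]), {}
    while queue:
        node = queue.popleft()
        found[node] = [term for doc in docs.get(node, []) for term in extract_terms(doc)]
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return found


# Hypothetical storage graph and documents, for illustration only.
graph = {"CouchDB": ["Neo4J", "DynamoDB"], "Neo4J": ["DynamoDB"], "DynamoDB": []}
docs = {
    "CouchDB": ["the patient record"],
    "Neo4J": ["name of the patient"],
    "DynamoDB": ["order 42"],
}
print(crawl_silos(graph, "CouchDB", docs))
# → {'CouchDB': ['patient record'], 'Neo4J': ['name', 'patient'], 'DynamoDB': ['order']}
```

Decoding in log space avoids numerical underflow on long token sequences; the breadth-first walk is one simple way to visit every node of a graph-modeled set of silos exactly once, and any traversal order would work for this sketch.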

Original language: English (US)
Title of host publication: Proceedings of 2013 8th International Conference on System of Systems Engineering
Subtitle of host publication: SoSE in Cloud Computing and Emerging Information Technology Applications, SoSE 2013
Pages: 19-24
Number of pages: 6
State: Published - 2013
Event: 2013 8th International Conference on System of Systems Engineering: SoSE in Cloud Computing and Emerging Information Technology Applications, SoSE 2013 - Maui, HI, United States
Duration: Jun 2 2013 → Jun 6 2013

Publication series

Name: Proceedings of 2013 8th International Conference on System of Systems Engineering: SoSE in Cloud Computing and Emerging Information Technology Applications, SoSE 2013


All Science Journal Classification (ASJC) codes

  • Control and Systems Engineering
