Abstract
Today's high-dimensional data, most of which is unstructured, makes the discovery of data patterns (a.k.a. data mining) difficult for services engineers. Unstructured data mining departs from previously proposed information extraction methodologies because modern data is created and stored without a standard schema and is heterogeneous. While the topic has recently received significant attention from both industry and academia, this work focuses on term association mining across distributed unstructured data stores. To this end, we propose an analytics-as-a-service (AaaS) framework that relies theoretically on the Bernoulli algorithm to determine associations between terms accurately. Specifically, the tool is applied to document-oriented data stores, with CouchDB used for testing. A pilot evaluation of the proposed AaaS framework on mining medical terms shows high accuracy and reliability of the resulting association maps.
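The abstract does not spell out how the Bernoulli model is applied, so the following is only a minimal sketch of one plausible reading, not the paper's implementation. It assumes each document is a single Bernoulli trial in which a term is either present or absent, and it scores term pairs by lift (the joint presence probability divided by the product of the individual ones). The function names, the `text` field, the `min_support` threshold, and the sample records are all hypothetical. Plain in-memory dictionaries stand in for documents that would otherwise be fetched from a CouchDB database.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase a text and return its set of distinct alphabetic terms."""
    return set(re.findall(r"[a-z]+", text.lower()))


def term_associations(docs, min_support=2):
    """Score term pairs by lift under a per-document Bernoulli presence model.

    Assumption: each document is one trial, and a term either occurs in it
    (success) or does not (failure). Probabilities are plain relative
    frequencies over the documents supplied.
    """
    n = len(docs)
    term_counts = Counter()
    pair_counts = Counter()
    for doc in docs:
        terms = tokenize(doc["text"])  # "text" is a hypothetical field name
        term_counts.update(terms)
        # Sort so that each unordered pair always maps to the same key.
        pair_counts.update(combinations(sorted(terms), 2))

    scores = {}
    for (a, b), n_ab in pair_counts.items():
        if n_ab < min_support:
            continue  # skip pairs seen together too rarely to be meaningful
        p_a = term_counts[a] / n
        p_b = term_counts[b] / n
        p_ab = n_ab / n
        scores[(a, b)] = p_ab / (p_a * p_b)  # lift > 1 suggests association
    return scores


# Hypothetical records shaped like JSON documents in a document store.
docs = [
    {"_id": "1", "text": "insulin diabetes"},
    {"_id": "2", "text": "insulin diabetes glucose"},
    {"_id": "3", "text": "aspirin headache"},
    {"_id": "4", "text": "aspirin fever"},
]
scores = term_associations(docs)
```

On this toy input only the pair `("diabetes", "insulin")` co-occurs in at least two documents, so it is the only pair that receives a score. A lift above 1 means the two terms appear together more often than independent presence would predict.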
Original language | English (US) |
---|---|
Pages (from-to) | 49-61 |
Number of pages | 13 |
Journal | International Journal of Business Process Integration and Management |
Volume | 7 |
Issue number | 1 |
State | Published - 2014 |
All Science Journal Classification (ASJC) codes
- Business and International Management
- Strategy and Management
- Management Science and Operations Research