TY - GEN
T1 - RSenter
T2 - 2013 IEEE International Congress on Big Data, BigData 2013
AU - Lomotey, Richard K.
AU - Deters, Ralph
PY - 2013
Y1 - 2013
AB - There is an enormous volume of user-generated content (data) today in open source repositories, online social networks, and so on that enterprises can draw on to enhance product and service delivery. Apart from open source data, enterprises are also generating a lot of data in-house, since modern business requirements are shifting from paper-based to digital records. The major setback, however, is that the data is unstructured: it comes in heterogeneous formats (different file types, including multimedia files), it is schemaless, and it is scattered across multiple sources. This condition makes knowledge discovery (a.k.a. data mining) very challenging. Previous studies have proposed the hierarchical clustering methodology, since it enhances human readability and provides a clear dependency structure through topic, term, and document organization. However, the methodology can be resource-intensive and time-consuming. Our work investigates the methodology and proposes a tool called RSenter that searches based on parallelization, random walk (or linear search), pessimistic search, and optimistic search in order to generate the hierarchical structure in real time within a search space. Currently, RSenter can search through NoSQL databases and HTML documents, traversing all the links connected to an HTML document to the nth depth and extracting all of the user-specified elements (topics and terms). Further, the tool can search through an entire repository and organize the files in a hierarchical structure regardless of file format.
UR - http://www.scopus.com/inward/record.url?scp=84886082270&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84886082270&partnerID=8YFLogxK
U2 - 10.1109/BigData.Congress.2013.59
DO - 10.1109/BigData.Congress.2013.59
M3 - Conference contribution
AN - SCOPUS:84886082270
SN - 9780768550060
T3 - Proceedings - 2013 IEEE International Congress on Big Data, BigData 2013
SP - 395
EP - 402
BT - Proceedings - 2013 IEEE International Congress on Big Data, BigData 2013
Y2 - 27 June 2013 through 2 July 2013
ER -