TY - GEN
T1 - Automatic tag recommendation for metadata annotation using probabilistic topic modeling
AU - Tuarob, Suppawong
AU - Pouchard, Line C.
AU - Giles, C. Lee
PY - 2013
Y1 - 2013
AB - The increasing complexity and advancement of ecological and environmental sciences encourage scientists across the world to collect data from multiple places, times, and thematic scales to verify their hypotheses. Accumulated over time, such data increase not only in volume but also in the diversity of their sources, which are spread around the world. This poses a major challenge for scientists who must search for information manually. To alleviate this problem, ONEMercury has recently been implemented as part of the DataONE project to serve as a portal for accessing environmental and observational data across the globe. ONEMercury harvests metadata from the data hosted by multiple repositories and makes it searchable. However, harvested metadata records are sometimes poorly annotated or lack meaningful keywords, which can hinder effective retrieval. Here, we develop algorithms for automatic annotation of metadata. We transform the problem into a tag recommendation problem with a controlled tag library and propose two variants of an algorithm for recommending tags. Our experiments on four datasets of environmental science metadata records not only show great promise for the performance of our method but also shed light on the differing natures of the datasets.
UR - http://www.scopus.com/inward/record.url?scp=84882251627&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84882251627&partnerID=8YFLogxK
U2 - 10.1145/2467696.2467706
DO - 10.1145/2467696.2467706
M3 - Conference contribution
AN - SCOPUS:84882251627
SN - 9781450320764
T3 - Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
SP - 239
EP - 248
BT - JCDL 2013 - Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries
T2 - 13th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL 2013
Y2 - 22 July 2013 through 26 July 2013
ER -