TY - GEN
T1 - Toward bridging the annotation-retrieval gap in image search by a generative modeling approach
AU - Datta, Ritendra
AU - Ge, Weina
AU - Li, Jia
AU - Wang, James Z.
PY - 2006
Y1 - 2006
N2 - While automatic image annotation remains an actively pursued research topic, enhancement of image search through its use has not been extensively explored. We propose an annotation-driven image retrieval approach and argue that under a number of different scenarios, this is very effective for semantically meaningful image search. In particular, our system is demonstrated to effectively handle cases of partially tagged and completely untagged image databases, multiple-keyword queries, and example-based queries with or without tags, all in near real time. Because our approach utilizes extra knowledge from a training dataset, it outperforms state-of-the-art visual-similarity-based retrieval techniques. For this purpose, a novel structure-composition model constructed from Beta distributions is developed to capture the spatial relationships among segmented regions of images. This model, combined with a Gaussian mixture model, produces scalable categorization of generic images. The categorization results are found to surpass previously reported results in speed and accuracy. Our novel annotation framework utilizes the categorization results to select tags based on term frequency, term saliency, and a WordNet-based measure of congruity, boosting salient tags while penalizing potentially unrelated ones. A bag-of-words distance measure based on WordNet is used to compute semantic similarity. The effectiveness of our approach is shown through extensive experiments.
UR - http://www.scopus.com/inward/record.url?scp=34547151997&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34547151997&partnerID=8YFLogxK
U2 - 10.1145/1180639.1180856
DO - 10.1145/1180639.1180856
M3 - Conference contribution
AN - SCOPUS:34547151997
SN - 1595934472
SN - 9781595934475
T3 - Proceedings of the 14th Annual ACM International Conference on Multimedia, MM 2006
SP - 977
EP - 986
BT - Proceedings of the 14th Annual ACM International Conference on Multimedia, MM 2006
T2 - 14th Annual ACM International Conference on Multimedia, MM 2006
Y2 - 23 October 2006 through 27 October 2006
ER -