Bridging the Ultimate Semantic Gap: A Semantic Search Engine for Internet Videos

Semantic search in video is a novel and challenging problem in information and multimedia retrieval. Existing solutions are mainly limited to text matching, in which the query words are matched against user-generated textual metadata. This paper presents a state-of-the-art system for event search that requires neither textual metadata nor example videos. The system relies on substantial video content understanding and allows for semantic search over a large collection of videos. Its novelty and practicality are demonstrated by the evaluation in NIST TRECVID 2014, where the proposed system achieves the best performance. We share our observations and lessons from building such a state-of-the-art system, which may be instrumental in guiding the design of future systems for semantic search in video.

