Similarity preserving snippet-based visualization of web search results

IEEE Trans Vis Comput Graph. 2014 Mar;20(3):457-70. doi: 10.1109/TVCG.2013.242.

Abstract

Internet users are very familiar with the results of a search query displayed as a ranked list of snippets. Each textual snippet shows a content summary of the referred document (or webpage) and a link to it. This display has many advantages, for example, it affords easy navigation and is straightforward to interpret. Nonetheless, any user of search engines could possibly report some experience of disappointment with this metaphor. Indeed, it has limitations in particular situations, as it fails to provide an overview of the document collection retrieved. Moreover, depending on the nature of the query--for example, it may be too general, or ambiguous, or ill expressed--the desired information may be poorly ranked, or results may contemplate varied topics. Several search tasks would be easier if users were shown an overview of the returned documents, organized so as to reflect how related they are, content wise. We propose a visualization technique to display the results of web queries aimed at overcoming such limitations. It combines the neighborhood preservation capability of multidimensional projections with the familiar snippet-based representation by employing a multidimensional projection to derive two-dimensional layouts of the query search results that preserve text similarity relations, or neighborhoods. Similarity is computed by applying the cosine similarity over a "bag-of-words" vector representation of collection built from the snippets. If the snippets are displayed directly according to the derived layout, they will overlap considerably, producing a poor visualization. We overcome this problem by defining an energy functional that considers both the overlapping among snippets and the preservation of the neighborhood structure as given in the projected layout. Minimizing this energy functional provides a neighborhood preserving two-dimensional arrangement of the textual snippets with minimum overlap. The resulting visualization conveys both a global view of the query results and visual groupings that reflect related results, as illustrated in several examples shown.

Publication types

  • Research Support, Non-U.S. Gov't