A Visualization Technique to Support Searching and Filtering

This paper discusses the need for an integrative literature review on information visualization for exploratory search, particularly in handling data overload. The paper analyses many applications and web sites across disciplines. Certain search engines incorporate visualization to give a better understanding of the information while reducing information overload. Current search engines use the query-and-response (lookup) process, whereas exploratory search allows for open-ended search. Visual representation is one feature of exploratory search that can be used to improve the overall search. The main contribution of this paper is a review of previous work on exploratory search, the features it uses, its existing applications, and visualization as a mechanism for developing filters that narrow down search results. Many studies have shown that replacing traditional search engines with exploratory search, by using the features of exploratory search, can reduce data overload.


Introduction
The WWW has become a global warehouse of human knowledge and civilization. Hundreds of millions of Internet users have generated millions of documents that make up the most massive repository of human knowledge in history, so searching for information on the web has become quite challenging and often requires users to key queries into a search box [1]. Search engines (SEs) are among the most popular tools in the world and were created to help users find useful data and information, which may include text, pictures, or videos [2]. Users can type a few words into the search box, and the engine returns a group of documents related to these words. With the introduction of search engines, information access on the WWW became easier and quicker. Exploratory search (ES) is search in which the user is looking for information in a field with which they are relatively unfamiliar, is unclear about how to accomplish their search objectives, or both. Helping users to reach their search task targets when working on ES objectives is not easy because of the level of uncertainty in the search process and the amount of user-specific information that is required [3,4]. This is mainly because an ambiguous query can lead to a misunderstanding of the information need, which can misguide the search engine and lead the user to abandon the originally submitted query. In addition, present search systems expect information seekers to make possibly ambiguous information needs explicit in textual search queries; seekers are then expected to sift through a huge volume of individual search results [5]. Current search aids, which suggest individual queries, related searches, and query auto-completions, may not be sufficient because of the exploratory nature of the task.
Specifically, because ES normally involves trying several queries, sources, and pieces of information to meet a target, overall searcher behaviour should be investigated by analysing the underlying search process and by providing meaningful search directions to explore [6]. Knowledge visualization is a relatively new discipline that concentrates on the collaborative use of interactive graphics to create, integrate, and apply knowledge, especially in the management context. Visual representation is the focus of this review. In this paper, we explain how visualizations are used to present the volume of resources available, and we describe the effect of ES on learning and understanding as used in many applications [7].

Related Work
Information retrieval (IR) is the process of searching within a data collection for specific required information, which is expressed as a query [8]. It consists of finding materials (usually documents) of an unstructured nature (usually text) that satisfy an information need from within a large collection (usually stored on computers). IR typically seeks documents in a given collection that are about a given query or that satisfy a given information need. The topic or information need is expressed by a query created by the searcher [9]. Items that satisfy the given query in the judgment of the user are said to be relevant. Items that are not about the given topic are said to be non-relevant.
In this paper, we address the shortcomings of present information-seeking approaches for the Web. Information overload occurs when the search results drawn from the whole Internet are numerous and are displayed to users in an unsystematic way. In particular, information retrieval from the Web does not sufficiently embrace the Web as an information space and as a set of software applications [10]. Web-based information retrieval is largely bound to search and browse tasks, which are normally low-level and laborious and are thus insufficient for an information space as massive as the WWW. IR is more concentrated on ranking schemes, index creation, and retrieval mechanisms than on the searcher, who is normally engaged in other rather tedious tasks [11]. Information visualization is valuable here because people are more at ease when the central component of interaction is visual information. For specific limitations of Web-based information retrieval, visualization may provide a means to bridge the gap between a person's needs and the system's limitations. In this review, visual search denotes the IR process that involves transforming search results, such as features of retrieved documents or data, into visual representations [12].

Exploratory Search
Exploratory search (ES) may be used to describe an information-seeking problem context that is open-ended, multi-faceted, and persistent, and to describe information-seeking processes that are opportunistic, iterative, and multi-tactical [13]. In the first sense, ES is generally used in scientific discovery, learning, and decision-making contexts. In the second sense, ES tactics are used in all manner of information seeking and reflect searcher choice and experience as much as the target [14]. Marchionini [14] identifies a number of search activities that differentiate ES from lookup search. Figure 1 shows that ES is especially pertinent to the learn and investigate activities. When users' information needs are well defined, lookup is sufficient for them to locate information. However, when users' needs are not well defined, lookup search is necessary but not sufficient for users to seek information for learning and investigation. The activities are shown as overlapping clouds because there is generally interplay between them, and some activities may be embedded in others. Exploratory search makes a shift from the analytic process of query-document matching towards guidance at all stages of the information-seeking process. Marchionini [14] categorized search activities into three major kinds, as shown in Table 1. In ES, users mostly key in a tentative keyword to get the most relevant documents, then explore the environment to better understand how to achieve their goal, selectively seeking and passively obtaining cues about where their next steps lie [15]. ES can be considered a specialization of information exploration, a broader class of activities in which new information is sought in a defined conceptual area; exploratory data analysis is another example of an information exploration activity [16]. This suggests that a different concept for processing data, other than retrieving a set of relevant materials through a search engine, should be used.
The main disadvantage of the classical approach is that it forces the user to browse through long lists. Such a method is ineffective when searchers cannot define precisely what they are looking for. However, the goal becomes clearer as they learn more about the topic. In this case it seems more effective to use so-called exploratory search, defined by Marchionini as follows: exploratory search has the potential to give a more complete overview of a topic based on less specific queries. It also allows the user to discover previously unknown facts and to identify relationships within a topic of interest. To obtain a new understanding of data, allowing for multiple interaction modes is necessary. According to White and Roth [17], exploratory search should increase user responsibility and control. This should include letting the user select how the data is visualized depending on the task of interest. In this section, we introduce several visualization techniques for multivariate applications.

Representing Query Terms
In a typical search scenario, the user inputs a set of query terms and obtains a set of matching documents. Usually the query terms remain in the search box. To reformulate the query, the user has to click in the search box and manually add or remove query terms. A different approach consists of allowing users to interact with query terms more directly. These terms are usually depicted as tags with actions such as toggling, removing, or clearing. The user can thus manipulate the query more quickly and easily, thereby obtaining narrower or broader search results. Another visual play on the query consists of providing relevant suggestions. Query suggestions are the product of extensive research in IR on query expansion [18]. The idea behind query suggestion is to offer the user additional keywords to consider, which could guide the search towards relevant documents. In its simplest usage, the suggested query terms simply act as shortcuts to previously typed queries [19]. However, suggestions may also help the user discover a set of query terms that lead to new documents of interest. Query suggestion has most commonly been implemented within large commercial search engines by using substantial search logs [20]. Quintura is another visual SE, alongside Kartoo, Mooter, and WebBrain. Quintura takes the user's keywords, runs a search, and then renders the results as a tag cloud on the results page. Searchers can then look at the results (powered by Google and Yahoo) listed under the tag cloud and click on a link as usual, or they can select words shown in the semantic map to focus the query more precisely. For example, at the time of this writing, if a user were to type the query "The Hunger Games", the input of other users might lead the engine to suggest "movie". The user may have been unaware of the movie "The Hunger Games".
If the user had been searching for "Suzanne Collins" instead, then the search engine might more simply have suggested the query term "book". The suggestions are most often depicted in the form of a list. However, other search companies have tried more appealing visualizations with more or less success (see Figure 2).
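The log-based suggestion idea described above can be illustrated with a minimal sketch. It assumes a tiny, invented query log and a simple co-occurrence count; the name `suggest_terms` and the log entries are hypothetical, and commercial engines use far richer signals than this.

```python
from collections import Counter

# Hypothetical query log: each entry is one query typed by an earlier user.
QUERY_LOG = [
    "the hunger games movie",
    "the hunger games movie trailer",
    "the hunger games book",
    "suzanne collins book",
    "suzanne collins book list",
]


def suggest_terms(query, log, k=3):
    """Return up to k terms that most often accompany `query` in logged queries."""
    query_terms = query.lower().split()
    counts = Counter()
    for logged in log:
        logged_terms = logged.lower().split()
        # Only logged queries that contain every term of the current query count.
        if all(term in logged_terms for term in query_terms):
            counts.update(t for t in logged_terms if t not in query_terms)
    return [term for term, _ in counts.most_common(k)]


print(suggest_terms("the hunger games", QUERY_LOG))
print(suggest_terms("suzanne collins", QUERY_LOG))
```

With this invented log, the most frequent companion term for "the hunger games" is "movie" and for "suzanne collins" it is "book", mirroring the example in the text. Ranking by raw frequency also shows why suggestions tend to favour conventional pathways.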

Fig 2. Quintura Represents Suggested Query Terms
A drawback of query suggestion is that the method may steer users towards the most conventional pathways and consequently reduce exploration. This problem is usually referred to as "query drifting" [21]. In this respect, most people would be presented with a much narrower subset of the entire web. The results that Google retrieves may just as well be coming from its cache. One way to address this issue is to provide greater feedback between the query and the retrieved results. This leads to another matter for discussion, namely the tight coupling between query terms and search results in the form of dynamic queries [22].

Dynamic Queries
Dynamic queries produce results that are continuously updated as additional terms are entered. The FilmFinder interface is an early example of the use of dynamic queries [23]. In the FilmFinder interface, movies are represented as dots on an x-y plot, as shown in Figure 3. The y-axis measures the popularity of the film, while the x-axis indicates the year of release. By manipulating sliders, and thereby specifying a query, the user is given immediate feedback on the retrieved results. This feature allows the user to reformulate the query immediately to explore the collection further. For example, a user might be interested in 1994 movies with actors whose last names start with A to C. As the results are retrieved, the user may widen the year slider and immediately see more movies falling within the grid space. Invented in 1994, FilmFinder remains an early example of the usefulness of dynamic query search that encourages an exploratory type of user behavior. Dynamic queries are very good at quickly manipulating data along a number of different dimensions. This feature is critically important for hypothesis generation and provides immediate feedback on the entered hypothesis. The user can thus assess the validity of the hypothesis or reformulate it. This approach, known as the trace tactic, fosters information-need development [25]. That is, when the user is not initially certain of the needed information, the results that appear may lead to a clearer understanding of the actual requirement. From there, the user can manipulate the query terms to retrieve new documents of interest. As previously noted in this paper, users spend most of their time on the top retrieved documents. Paging is rarely used; instead, users prefer to reformulate their queries until their information need is satisfied [26]. Dynamic queries let users see the results instantaneously, which in turn encourages them to explore the collection even further.
However, if we could lay out more results on a page than a simple list of documents, we might help the user discover new ones. The next section discusses the representation of search results.
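The slider-driven filtering described for dynamic queries can be sketched as a pure function that is re-run every time a control changes. This is only an illustration under stated assumptions: the `Film` record, the sample data, and the `dynamic_query` function are invented here and are not taken from FilmFinder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Film:
    title: str
    year: int
    popularity: float
    lead_actor_surname: str


# Hypothetical records, invented for illustration only.
FILMS = [
    Film("Film A", 1992, 6.1, "Baker"),
    Film("Film B", 1994, 7.8, "Adams"),
    Film("Film C", 1994, 5.2, "Young"),
    Film("Film D", 1996, 8.3, "Clark"),
]


def dynamic_query(films, year_range, surname_initials):
    """Return the films matching the current slider positions (inclusive bounds)."""
    first_year, last_year = year_range
    first_letter, last_letter = surname_initials
    return [
        film
        for film in films
        if first_year <= film.year <= last_year
        and first_letter <= film.lead_actor_surname[0].upper() <= last_letter
    ]


# Initial slider positions: 1994 only, surnames starting with A to C.
print([f.title for f in dynamic_query(FILMS, (1994, 1994), ("A", "C"))])
# The user drags the upper year slider to 1996; the result set is recomputed at once.
print([f.title for f in dynamic_query(FILMS, (1994, 1996), ("A", "C"))])
```

In an interactive interface this function would be called on every slider event and the plot redrawn from its output, which is what gives the user immediate feedback on each reformulation.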

Representing Search Results
We now turn our attention to the visualization of search results. Shneiderman [27] stated that an exploratory interface should allow a user to select a display depending on the data type and the task at hand. When applied to search results, the user should be able to select among different views depending on their information need. For example, one view could provide an overview of the entire collection, while another, perhaps using a graph, could show intricate relationships between documents. This paradigm is useful for gaining greater insight from the data. The underlying idea remains the same, which is to improve the cognitive ability of users by applying the principles of information visualization.

Principles and Motivation
Card et al. [28] define six basic principles by which information visualization can improve the cognitive ability of the user, as shown in Table 2.

Table 2. Basic Principles of Information Visualization
When applied to search results, these principles could provide a guideline for making patterns emerge and therefore help humans make better decisions. Several companies have made interesting contributions to information visualization in interfaces. Companies, such as Saracevic [29] and Palantir [30], take a massive amount of business intelligence data and create interfaces that allow analysts to make better business decisions. IBM has a project called "Many Eyes" [31], which is a large collection of visualizations on which people can collaborate and discuss the data collected. The project serves as a catalyst for discussion and collective insights. How interesting it might have been to have had these tools when the notorious Enron emails were released. Would the collective insights of better informed analysts have led to more accurate conclusions earlier? How might the outcome have been different (or not)? These kinds of visual tools might at least have led to more scrutiny and reduced the loss and suffering of many. To illustrate our point, let us cover more examples of the application of different visualizations to search results.

Visual Search Results in Applications
Volkswagen [32] offers an interesting faceted search solution for browsing through its product line of cars (Figure 4). Instead of showing the results as a list of links or text, the system simply renders a visual depiction of a car, allowing the user to immediately see the details of the vehicle of interest. Also of interest are the facets, which make use of pictograms and are discussed further in a later section. The Volkswagen interface is remarkably similar to those employed within modern exploratory search systems. However, the system remains unable to change search views or provide recommendations.

Visualization on the Facets
We now cover how different visualizations can be applied to the facets themselves. Section 1 showed that ES activities provide a seamless integration between browsing and searching. The user searches for keywords, obtains results, and potentially continues browsing the corpus through the different facet values [33]. Facets also provide interesting summaries of search results with respect to facet classification. More precisely, the facets can reveal patterns of distribution and occurrence at an aggregate level. However, for those patterns to emerge, the data must be represented appropriately [34].

Step No.: The Principles
First: Consists of presenting the results in a manner that expands human memory. While this goal may be obvious, implementation is a challenge given that data presentation should favor recollection. For example, geospatial data could simply be displayed on a map instead of as a list of coordinates. The products on a shopping site should be represented by a visual depiction of the product rather than by text data.
Second: Consists of presenting only relevant information to the user to shorten the search process. For example, the results returned by a search engine should show only the pieces of data that are important for determining overall relevance with respect to the query. Hovering the mouse over a specific result would then show more details. The next principles are all related and we state them here for completeness.
Third: Aims to present information in a manner that lets the user identify and recognize patterns in the data.
Fourth: Aims to present information so that the user can easily infer relationships from the data that would otherwise be more difficult to induce.
Fifth: Consists of enabling the user to monitor a large number of events at once.
Sixth: Consists of letting the user interact directly with the data through a space of parameter values, as opposed to accessing a static diagram.

Visualizing Frequency
Much of the success of faceted search is due to the use of query previews. Query previews [35,36] give users a hint of what to expect before they select a link or issue a query. In a standard faceted search system, the query preview is a simple numerical count. Some systems have attempted to represent this count more graphically. The Relation Browser (RB) is an early example. In RB, a bar indicates the relative frequency of the facet terms (Figure 5) [37].

Fig 5. The Relation Browser Visual Depiction of the Count [38]
The darker portion of the bar shows the count if the facet term is selected within the current search space, while the lighter and longer portion of the bar shows the overall count of the facet term within the entire collection. RB also has the ability to switch views between search results and facets [39]. Facets are presented in a cloud view similar to a tag cloud. RB also features dynamic queries with an excellent response to feedback. However, the system is client-based, which limits its scalability [39]. Another system worth mentioning is FacetLens [40]. The facets on FacetLens use most of the real estate of the interface. The facet values are ordered by frequency and depicted as large circles (Figure 6). These circles depict the actual search results of interest. According to the authors, the interface helps users identify and compare trends. Furthermore, it offers pivot operations, which allow the user to navigate the dataset using relationships between items.
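The two counts behind such a preview bar (the count within the current search space and the count within the entire collection) can be computed with a simple aggregation. The sketch below is an assumption-laden illustration, not the Relation Browser's implementation: documents are plain dictionaries, the names `facet_counts` and `preview_bars` are hypothetical, and the bar is drawn as text, with `#` for the darker portion and `-` for the lighter one.

```python
from collections import Counter


def facet_counts(documents, facet):
    """Count how many documents carry each value of the given facet."""
    return Counter(doc[facet] for doc in documents if facet in doc)


def preview_bars(collection, current_results, facet, width=20):
    """Build a text bar per facet value: '#' = current count, '-' = rest of overall count.

    Assumes `current_results` is a subset of `collection`.
    """
    overall = facet_counts(collection, facet)
    current = facet_counts(current_results, facet)
    longest = max(overall.values(), default=0)
    bars = {}
    for value, total in overall.items():
        total_len = round(width * total / longest)
        current_len = round(width * current.get(value, 0) / longest)
        bars[value] = "#" * current_len + "-" * (total_len - current_len)
    return bars


# Hypothetical collection, invented for illustration only.
DOCS = [
    {"format": "pdf", "topic": "health"},
    {"format": "pdf", "topic": "energy"},
    {"format": "html", "topic": "health"},
    {"format": "pdf", "topic": "health"},
]
health_only = [doc for doc in DOCS if doc["topic"] == "health"]
for value, bar in preview_bars(DOCS, health_only, "format", width=6).items():
    print(f"{value:5} {bar}")
```

Scaling every bar against the largest overall count keeps the bars comparable across facet values, which is what lets distribution patterns emerge at a glance.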

Fig 6. FacetLens Depicted [42]
Visualizing metrics, such as frequency, within facets could be interesting in the discovery experience. The correct visualization can shift the focus from searching to more exploratory tasks, such as data analysis [41]. However, facets originate in metadata of many different types. For example, dates can be represented textually or more graphically, as a timeline. Locations can be better served by points or regions on a map, rather than by a list of coordinates. Therefore, a broader set of visualizations than those limited to depicting frequencies is possible. We are now going to review some of the visualizations possible with respect to the "type" of facet at play.

Fitting the Data Type
A very simple visualization is that of check-boxes for multi-select disjunctive facets, examples of which can be found on many websites. The new search interface of eBay is composed solely of disjunctive facets represented by lists of check-boxes. In Figure 7, the user filters by a couple of brands (combined with OR) and the results are visualized. The selection is clearly shown on the facet. The user can also search for more terms to add within the facet itself. In most of the previous examples, facet values are essentially categorical: the data is qualitative and can be organized on a nominal or ordinal scale. However, facets must often display quantitative data, such as product dimensions or price ranges. There are numerous other visualizations for other types of data. For example, when refining by color, a color-picker facet may be a good choice, and there are various ways to implement it. The website ArtistRising [44] features a full palette of colors (Figure 8). However, such a display allows the user to select illegal values, which can negate some benefits of faceted search. Another approach could consist of a list of colored labels together with their respective counts. Many more displays for facets are possible. One can imagine a map to represent geographically based metadata, as we have seen for search results. It is important to understand that facets can go far beyond textual representation. First, the count can be represented in a more interesting way to provide a landscape of the search space. Second, the depiction of the facet and its interaction depend on the underlying data type. If the data is qualitative or categorical in nature, a conventional list of check-boxes can be used. If the data is quantitative, range sliders may be preferred. More exotic facet displays, otherwise used in software, may also fit well within the interface [45]. The point is that there are probably as many facet views as there are ways of visualizing data.
The next Section will review many more visualizations, which may be adapted to visualize search results and/or facet values.
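The filtering logic behind check-box facets and range sliders can be sketched as follows. The sketch assumes the common convention that values selected within one facet are combined with OR while different facets are combined with AND; the function name `apply_facets` and the product records are hypothetical.

```python
def apply_facets(items, selected, ranges=None):
    """Filter items: OR within a categorical facet, AND across facets.

    `selected` maps a facet name to the set of ticked check-box values
    (an empty set means the facet is not restricting anything).
    `ranges` maps a quantitative facet name to an inclusive (low, high) pair,
    as a range slider would supply.
    """
    ranges = ranges or {}
    kept = []
    for item in items:
        in_selected = all(
            not values or item.get(facet) in values
            for facet, values in selected.items()
        )
        in_ranges = all(
            facet in item and low <= item[facet] <= high
            for facet, (low, high) in ranges.items()
        )
        if in_selected and in_ranges:
            kept.append(item)
    return kept


# Hypothetical products, invented for illustration only.
PRODUCTS = [
    {"name": "Camera 1", "brand": "Alpha", "price": 120},
    {"name": "Camera 2", "brand": "Beta", "price": 340},
    {"name": "Camera 3", "brand": "Gamma", "price": 90},
    {"name": "Camera 4", "brand": "Beta", "price": 80},
]

two_brands = apply_facets(PRODUCTS, {"brand": {"Alpha", "Beta"}})
print([p["name"] for p in two_brands])
in_budget = apply_facets(PRODUCTS, {"brand": {"Alpha", "Beta"}}, {"price": (100, 400)})
print([p["name"] for p in in_budget])
```

Keeping the categorical and quantitative cases in separate arguments reflects the point made above: the interaction offered for a facet follows from its underlying data type.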

More Visualizations
We now cover more visualizations that can be adapted to a faceted search user interface. Keep in mind that we cannot possibly be exhaustive in this review. We are merely selecting visualizations that can be immediately adapted to a faceted search system.

Pictograms
We Feel Fine [46] is a web service to visualize and make sense of a database of over 12 million human "feelings." The database was built over a period of 3 years by crawling blogs and searching for phrases such as "I feel" or "I'm feeling." Of particular interest are the facets, which are represented with respect to their meanings (Figure 9). The gender facet is represented as a pictogram of a woman or man. The weather is depicted using familiar meteorological icons. After selection, the "feelings" are shown on a beautifully colored interface.

Tag Cloud Visualizations
Tag clouds were quite popular during the Web 2.0 era. They first appeared in 2005 on high-profile websites, such as the photo sharing site [47] or the shared bookmarking site https://del.icio.us/ [48]. They provide a visual representation, using different font sizes and colors, of term occurrences within a document. They have been shown to be effective as a signaler of social activity [49]. Developers have since created more visually pleasing tag-cloud-like visualizations, such as Wordle [50] (Figure 10). These cloud visualizations can easily be adapted as facets [51]. Imagine being able to select multiple values within a Wordle. In this case, the invalid choices would be grayed out. Another adaptation could consist of constructing a new word cloud for each subsequent refinement, in which case the selected terms would be removed from the list and a new list would be generated from the returned search results.
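The mapping from term occurrences to font sizes that underlies a tag cloud can be sketched in a few lines. The linear scaling between a minimum and maximum point size is an assumption made for this illustration (logarithmic scaling is another common choice), the function name `tag_cloud_sizes` is hypothetical, and no stop-word filtering is applied.

```python
import re
from collections import Counter


def tag_cloud_sizes(text, min_pt=10, max_pt=36, top_n=5):
    """Map the top_n most frequent words of `text` to font sizes in points."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words).most_common(top_n)
    if not counts:
        return {}
    highest = counts[0][1]
    lowest = counts[-1][1]
    span = highest - lowest
    sizes = {}
    for word, count in counts:
        # Linear interpolation; if all counts are equal, use the largest size.
        scale = (count - lowest) / span if span else 1.0
        sizes[word] = round(min_pt + scale * (max_pt - min_pt))
    return sizes


sample = "search search search visual visual facet"
print(tag_cloud_sizes(sample))
```

Used as a facet, the same function could be re-run on the text of the returned search results after each refinement, which corresponds to the second adaptation described above.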

Quantifying Data with Bubbles
Another visualization example comes from the ManyEyes project [31]. As previously discussed, users upload data to ManyEyes, choose a visualization, and then share it with others for discussion. Figure 11 presents a depiction of the human world population by language speakers. This visualization quickly clarifies population size differences among nations. It allows users to compare bubbles to easily quantify the data in play. Many other uses can be imagined, such as comparing geographic sizes of nations, income levels, health, or education levels. This visualization can easily be used as a facet or as an alternative search result view. Numerous other visualizations can be employed. Although not necessarily immediately applicable to faceted search, the classic books by Tufte [53] and by Tufte and Graves-Morris [54] provide numerous interesting methods of visualizing information. However, the real challenge consists of building a system that can integrate many of these visualizations. In the next Section, we provide a means to quickly create such a system in which every component is re-usable for a community of users.
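For bubbles to be compared fairly, a common convention is to make the area of each bubble, rather than its radius, proportional to the value it represents. The sketch below illustrates that convention only; it is not taken from ManyEyes, the function name `bubble_radii` and the sample values are hypothetical, and all values are assumed to be positive.

```python
import math


def bubble_radii(values, max_radius=50.0):
    """Return a radius per label so that bubble AREA is proportional to the value.

    Area grows with the square of the radius, so the radius is scaled by the
    square root of the value relative to the largest value.
    """
    if not values:
        return {}
    largest = max(values.values())
    return {
        label: max_radius * math.sqrt(value / largest)
        for label, value in values.items()
    }


# Hypothetical speaker counts (in millions), invented for illustration only.
speakers = {"Language A": 400, "Language B": 100}
print(bubble_radii(speakers))
```

Here a value four times larger yields a radius only twice as large, so the drawn areas stay in the same 4:1 ratio as the data. Scaling the radius directly would exaggerate the difference.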

FINDINGS
Improving the relevancy of Web search results has been of growing concern recently. Relevant results for user queries can lie anywhere in the list of hits provided by the SE. The huge number of matching documents and the textual list format make it difficult for users to find such results.
The nature of the WWW implies heterogeneity, a huge amount of information, and varied structures. Hence, discovering results that better suit the needs of each search is very demanding. It is therefore evident that present search systems need greater capability to handle a huge number of results while presenting the relevant attributes of each web page, guiding the searcher with the help of interactive visualization and graphical techniques. In addition, the SE usually controls query reformulation and reconstruction.

CONCLUSIONS
This paper reviews and summarizes a selection of the literature on visualization and compares existing visualization systems. Although a variety of systems have been applied in many domains, unresolved problems remain that merit future investigation. Information visualization takes advantage of the innate abilities of users for perceiving, identifying, exploring, and understanding large amounts of data. We also provided numerous examples of visualizations that could be adapted within a faceted search system. Visualization is of interest to research in Web information retrieval as a way to improve users' interactions with Web search engines. We argue that utilizing features of search results, such as document URL, title, and summary, augmented with visualization and clustering, may improve the effectiveness of users in searching the Web with a smart search engine.