A tag cloud (word cloud, or weighted list in visual design) is a visual representation for text data, typically used to depict keyword metadata (tags) on websites, or to visualize free form text. 'Tags' are usually single words, and the importance of each tag is shown with font size or color. This format is useful for quickly perceiving the most prominent terms and for locating a term alphabetically to determine its relative prominence. When used as website navigation aids, the terms are hyperlinked to items associated with the tag.
In the language of visual design, a tag cloud (or word cloud) is one kind of "weighted list", as commonly used on geographic maps to represent the relative size of cities in terms of relative typeface size. An early printed example of a weighted list of English keywords was the "subconscious files" in Douglas Coupland's Microserfs (1995). A German appearance occurred in 1992.
The specific visual form and common use of the term "tag cloud" rose to prominence in the first decade of the 21st century as a widespread feature of early Web 2.0 websites and blogs, used primarily to visualize the frequency distribution of keyword metadata that describe website content, and as a navigation aid.
The first tag clouds on a high-profile website were on the photo sharing site Flickr, created by Flickr co-founder and interaction designer Stewart Butterfield in 2004. That implementation was based on Jim Flanagan's Search Referral Zeitgeist, a visualization of Web site referrers. Tag clouds were also popularized around the same time by Del.icio.us and Technorati, among others.
Over-saturation of the tag cloud method and ambivalence about its utility as a web-navigation tool led to a noted decline of usage among these early adopters. (Flickr would later "apologize" to the web-development community in their five-word acceptance speech for the 2006 "Best Practices" Webby Award, where they simply stated "sorry about the tag clouds.")
A second generation of software development discovered a wider diversity of uses for tag clouds as a basic visualization method for text data. Most notably, the method was adapted for visualizing word frequency in free-form natural language texts, first by TagCrowd, created by Stanford University researcher and designer Daniel Steinbock in 2006, and further popularized by Wordle, created by IBM researcher Jonathan Feinberg in 2008.
A data cloud showing the population of each of the world's countries. Created in R with wordcloud package. Data fromCountry population. Note that the proportional sizes of China and India were reduced in half.
There are three main types of tag cloud applications in social software, distinguished by their meaning rather than appearance. In the first type, there is a tag for the frequency of each item, whereas in the second type, there are global tag clouds where the frequencies are aggregated over all items and users. In the third type, the cloud contains categories, with size indicating number of subcategories.
In the first type, size represents the number of times that tag has been applied to a single item. This is useful as a means of displaying metadata about an item that has been democratically 'voted' on and where precise results are not desired. Examples of such use include Last.fm (to indicate genres attributed to bands) and LibraryThing (to indicate tags attributed to a book).
In the third type, tags are used as a categorization method for content items. Tags are represented in a cloud where larger tags represent the quantity of content items in that category.
There are some approaches to construct tag clusters instead of tag clouds, e.g. by applying tag co-occurrences in documents.
More generally, the same visual technique can be used to display non-tag data, as in a word cloud or a data cloud.
The term keyword cloud is sometimes used as a search engine marketing (SEM) term that refers to a group of keywords that are relevant to a specific website. In recent years tag clouds have gained popularity because of their role in search engine optimization of Web pages as well as supporting the user in navigating the content in an information system efficiently. Tag clouds as a navigational tool make the resources of a website more connected, when crawled by a search engine spider, which may improve the site's search engine rank. From a user interface perspective they are often used to summarize search results to support the user in finding content in a particular information system faster.
A data cloud showing stock price movement. Color indicates positive or negative change, font size indicates percentage change.
Tag clouds are typically represented using inline HTML elements. The tags can appear in alphabetical order, in a random order, they can be sorted by weight, and so on. Sometimes, further visual properties are manipulated in addition to font size, such as the font color, intensity, or weight.Most popular is a rectangular tag arrangement with alphabetical sorting in a sequential line-by-line layout. The decision for an optimal layout should be driven by the expected user goals. Some prefer to cluster the tags semantically so that similar tags will appear near each other. Heuristics can be used to reduce the size of the tag cloud whether or not the purpose is to cluster the tags.
A data cloud or cloud data is a data display which uses font size and/or color to indicate numerical values It is similar to a tag cloud but instead of word count, displays data such as population or stock market prices.
A text cloud or word cloud is a visualization of word frequency in a given text as a weighted list. The technique has recently been popularly used to visualize the topical content of political speeches.
Extending the principles of a text cloud, a collocate cloud provides a more focused view of a document or corpus. Instead of summarising an entire document, the collocate cloud examines the usage of a particular word. The resulting cloud contains the words which are often used in conjunction with the search word. These collocates are formatted to show frequency (as size) as well as collocational strength (as brightness). This provides interactive ways to browse and explore language.
Perception of tag clouds
Tag clouds have been subject of investigation in several usability studies. The following summary is based on an overview of research results given by Lohmann et al.:
Tag size: Large tags attract more user attention than small tags (effect influenced by further properties, e.g., number of characters, position, neighboring tags).
Scanning: Users scan rather than read tag clouds.
Centering: Tags in the middle of the cloud attract more user attention than tags near the borders (effect influenced by layout).
Position: The upper left quadrant receives more user attention than the others (Western reading habits).
Exploration: Tag clouds provide suboptimal support when searching for specific tags (if these do not have a very large font size).
Creation of a tag cloud
In principle, the font size of a tag in a tag cloud is determined by its incidence. For a word cloud of categories like weblogs, the frequency of use for example, corresponds to the number of weblog entries that are assigned to a category. For small frequencies it's sufficient to indicate directly for any number from one to a maximum font size. For larger values, a scaling should be made. In a linear normalization, the weight of a descriptor is mapped to a size scale of 1 through f, where and are specifying the range of available weights.
for ; else
: display fontsize
: max. fontsize
: min. count
: max. count
Since the number of indexed items per descriptor is usually distributed according to a power law, for larger ranges of values, a logarithmic representation makes sense.
Implementations of tag clouds also include text parsing and filtering out unhelpful tags such as common words, numbers, and punctuation.
There are also websites creating artificially or randomly weighted tag clouds, for advertising, or for humorous results.
^Gilles Deleuze, Felix Guattari (1992). Tausend Plateaus. Kapitalismus und Schizophrenie. ISBN3-88396-094-2.
^A copy of Jim Flanagan's Search Referral Zeitgeist was available at archive.orgbut has since been blocked. In the comments of a blog entry, a user identified as Steve Minutillo attribute the idea to Jim Flanagan, stating that Flanagan's site had such displays in 2002.
^Knautz, K., Soubusta, S., & Stock, W.G. (2010). Tag clusters as information retrieval interfaces. Proceedings of the 43rd Annual Hawaii International Conference on System Sciences (HICSS-43), January 5–8, 2010. IEEE Computer Society Press (10 pages).