Co-occurrence analysis

Term set 1:
Term set 2:

Advanced Options





Note: If you run many searches, use your own API key. For Bing, only the API key is needed: see Bing Key Form. For Google, you may also need a Custom Search Engine (CSE) ID, for example if your API key is restricted. To obtain these, use the Google API Form; for a walkthrough, see How-to Video.


Help

This interface derives term co-occurrence metrics from web search queries, using Google or Bing. Each term in the first set is compared against each term in the second. For each pair of terms T1 and T2, four queries are issued to obtain the number of web pages that contain both terms (T1T2), the second term but not the first (~T1T2), the first term but not the second (T1~T2), and neither term (~T1~T2). Phrasal terms can be used by enclosing them in double quotes.

The optional context terms indicate the domain for the terms: they are added to every search. They can be phrasal and can also include operators specific to the search engine (Google or Bing). For Google, see special operators. For Bing, see advanced operators. In both cases, you can use '-term' to omit pages containing a specific term and 'site:name' to restrict results to a particular web site.
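As a rough illustration of the four queries and the context terms, here is a minimal Python sketch. It is not this tool's actual code: the function names build_queries and quote_term are hypothetical, and it assumes that a leading '-' excludes a term or phrase, as described above.

```python
def quote_term(term: str) -> str:
    """Wrap a multi-word term in double quotes so it is searched as a phrase."""
    term = term.strip()
    already_quoted = term.startswith('"') and term.endswith('"')
    if " " in term and not already_quoted:
        return f'"{term}"'
    return term


def build_queries(t1: str, t2: str, context: str = "") -> dict:
    """Return the four query strings for one term pair (hypothetical helper).

    Assumption: a leading '-' excludes a term or phrase from the results.
    """
    a, b = quote_term(t1), quote_term(t2)
    queries = {
        "T1T2": f"{a} {b}",      # pages with both terms
        "~T1T2": f"-{a} {b}",    # pages with T2 but not T1
        "T1~T2": f"{a} -{b}",    # pages with T1 but not T2
        "~T1~T2": f"-{a} -{b}",  # pages with neither term
    }
    # Context terms (including operators such as site:name) go on every query.
    suffix = f" {context.strip()}" if context.strip() else ""
    return {cell: query + suffix for cell, query in queries.items()}


if __name__ == "__main__":
    for cell, query in build_queries("jazz", "minor chord", "site:example.org").items():
        print(cell, "=>", query)
```

The site name example.org is only a placeholder. In practice the neither-term query is only meaningful when context terms narrow the search, since it otherwise matches nearly every indexed page.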

The following figure explains the derivation of the metrics:
[Figure: Derivation of co-occurrence metrics]
See O'Hara et al. (2012) for details, along with an interesting application to chord/word associations. See Dunning (1993) for an in-depth evaluation of the G^2 metric and why it is often preferred for term associations. Also see the Wikipedia G-test article.
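For readers who want to experiment with the G^2 metric, the sketch below computes the standard G-test statistic, G^2 = 2 * sum(O * ln(O / E)), from the four page counts. It assumes the counts form a 2x2 contingency table and that expected values come from the row and column totals; the exact derivation used by this interface is the one in the figure and may differ in detail. The function name g_squared and its parameter names are hypothetical.

```python
import math


def g_squared(n_both: int, n_only_t2: int, n_only_t1: int, n_neither: int) -> float:
    """G^2 (log-likelihood ratio) for a 2x2 table of page counts.

    Arguments correspond to the T1T2, ~T1T2, T1~T2, and ~T1~T2 counts.
    """
    # Rows: T1 present / absent; columns: T2 present / absent.
    observed = [[n_both, n_only_t1], [n_only_t2, n_neither]]
    total = n_both + n_only_t1 + n_only_t2 + n_neither
    if total == 0:
        return 0.0
    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute nothing to the sum
                expected = row_totals[i] * col_totals[j] / total
                g2 += o * math.log(o / expected)
    return 2.0 * g2


if __name__ == "__main__":
    print(g_squared(10, 10, 10, 10))  # counts consistent with independence
    print(g_squared(50, 5, 5, 50))    # counts with a strong association
```

A value near zero means the counts are consistent with the two terms occurring independently; larger values indicate a stronger association, whether positive or negative.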
