Use Word Frequency queries to list the most frequently occurring words in your sources. You can choose to search only in the textual content of your sources, in the annotations or both.
You can limit the search to specific content by selecting sources, nodes, sets, folders or search folders.
Before you run a Word Frequency query, make sure the text search language is set to the language of your source materials—refer to Set the text search language and stop words for more information.
You could use a Word Frequency query to:
Identify possible themes, particularly in the early stages of a project
Analyze the most frequently used words in a particular demographic. For example, to analyze the most common words used by farmers when discussing climate change, you could run a coding query to gather all content coded at climate change and at nodes with the attribute farmer, and then select the result node as the criteria for the Word Frequency query.
You can look for exact words, or broaden your search to find the most frequently occurring concepts. For example, if you look for the most frequent words in a dataset survey, you might find that water, health, and harmful are the most frequently occurring words. However, if you adjust the Finding matches slider to return similar words, you might find that the concept of pollution (including pollutants, pollution, polluted, and pollutes) occurs most frequently.
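The difference between exact matching and grouping similar words can be sketched in a few lines of Python. This is only an illustration: the `crude_stem` helper and its suffix list are hypothetical stand-ins, not the matching algorithm NVivo uses, and the sample word list is invented.

```python
from collections import Counter, defaultdict

# Hypothetical suffix list, checked in order; NOT NVivo's stemming rules.
SUFFIXES = ("ants", "ion", "ed", "es", "s")

def crude_stem(word):
    """Strip one known suffix, keeping at least four leading letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word

def exact_counts(words):
    """Count each distinct word on its own (exact matching)."""
    return Counter(words)

def grouped_counts(words):
    """Group words by stem; label each group with its most frequent member."""
    groups = defaultdict(Counter)
    for word in words:
        groups[crude_stem(word)][word] += 1
    result = {}
    for members in groups.values():
        label = members.most_common(1)[0][0]
        result[label] = (sum(members.values()), sorted(members))
    return result

# Invented sample data for illustration only.
SAMPLE = [
    "water", "water", "water", "health", "health", "harmful", "harmful",
    "pollution", "pollution", "pollutants", "polluted", "pollutes",
]
```

With exact matching, water is the most frequent word in the sample. Once the four pollution-related words are grouped, their combined count exceeds that of water.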
NOTE You cannot use Word Frequency queries to search for words in framework matrix summaries.
On the Explore tab, in the Queries group, click New Query, and then click Word Frequency.
The Word Frequency Query dialog box opens.
Adjust the Finding matches slider, if you want to find concepts rather than words. By default, the slider is set to find exact words only—refer to Understanding text match settings for more information.
In the Search in box, select whether you want to search in Text, Annotations or both.
To change the scope of the query:
In the Of box, select which project items you want to include in the search. Use the Select button to choose specific project items.
In the Where box, choose to include only project items created or modified by selected users—use the Select button to select the users.
Under Display Words, you can choose:
All to include all words found in the selected project items.
<number> most frequent to include a specific number of words—for example, you could display the 100 most frequently occurring words.
(Optional) Enter a With minimum length to exclude short words from the results—for example, enter 7 to display only words with seven or more letters.
Click Run.
NOTE
To save the Word Frequency query, click the Add to Project check box, and enter the name in the General tab.
You cannot save the result of a Word Frequency query as a node but you can create nodes from individual words in the result list—refer to Make nodes from the results of a Word Frequency query.
You can export the results of a Word Frequency query as a list from the Detail View. On the External Data tab, in the Export group, click Export, and then click Export List. Refer to Export query results for more information.
PDFs created by scanning paper documents may contain only images—each page is a single image. You cannot query text within images. If you want to use Word Frequency queries to explore the text in these PDFs, then you should consider using optical character recognition (OCR) during the scanning process to convert the scanned images to text (before you import the PDF files into NVivo).
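The Display Words and With minimum length options described in the steps above act as filters on the list of counted words. A minimal Python sketch follows, assuming the minimum length is applied before the most frequent words are selected (the order NVivo uses internally is not stated here). The function name and parameters are hypothetical.

```python
from collections import Counter

def display_words(counts, most_frequent=None, min_length=1):
    """Return (word, count) pairs, most frequent first.

    most_frequent=None keeps all words (the "All" option);
    min_length drops words shorter than the given number of letters.
    """
    kept = Counter({w: c for w, c in counts.items() if len(w) >= min_length})
    return kept.most_common(most_frequent)

# Invented counts for illustration only.
counts = Counter({"water": 3, "pollution": 5, "health": 2, "harmful": 2})
top_two = display_words(counts, most_frequent=2)
long_words = display_words(counts, min_length=7)
```

Here `top_two` holds the two most frequent words, and `long_words` holds only the words with seven or more letters.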
When you run a Word Frequency query, the results are displayed in Detail View. There are four tabs on the right: Summary, Tag Cloud, Tree Map and Cluster Analysis. You can change which tab is displayed by default. Refer to the display options in Set application options for more information.
1 The most frequently occurring words excluding any stop words. If you adjusted the Finding matches slider to return similar words, the most frequently occurring word from the group is displayed in this column.
2 Length—the number of letters or characters in the word.
3 Count—the number of times that the word occurs within the project items searched. If you adjusted the Finding matches slider to include similar words, this count is the total for all the similar words.
4 Weighted Percentage—the frequency of the word relative to the total words counted. If you adjusted the Finding matches slider to include similar words, a word may be part of more than one group of similar words. The weighted percentage assigns a portion of the word's frequency to each group so that the overall total does not exceed 100%.
5 Similar Words—other words that have been included as a result of adjusting the Finding matches slider—for example, if you include words with the same stem, then pollutants, pollution, and polluted would be grouped together. This column is not available if you use 'Exact match only'.
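The relationship between Count and Weighted Percentage can be illustrated with a short Python sketch. It assumes that a word belonging to several groups has its frequency split equally between them. The exact apportioning NVivo applies is not described in this topic, so the equal split, the function name and the sample groups are all assumptions.

```python
from collections import Counter

def summary_rows(groups, word_counts):
    """Build Summary-style rows from groups of similar words.

    groups maps a group label to its member words; word_counts maps
    each word to its number of occurrences. Every counted word is
    assumed to belong to at least one group.
    """
    # How many groups each word belongs to.
    membership = Counter(w for members in groups.values() for w in members)
    total = sum(word_counts.values())
    rows = []
    for label, members in groups.items():
        count = sum(word_counts[w] for w in members)
        # Assumption: a shared word contributes an equal share to each group.
        weighted = sum(word_counts[w] / membership[w] for w in members)
        rows.append({
            "word": label,
            "length": len(label),
            "count": count,
            "weighted_pct": 100 * weighted / total,
            "similar_words": list(members),
        })
    return rows

# Invented example: "speak" belongs to both groups.
word_counts = {"talk": 6, "speak": 2, "say": 2}
groups = {"talk": ["talk", "speak"], "say": ["say", "speak"]}
rows = summary_rows(groups, word_counts)
```

In this example the two Count values add up to 12, although only 10 words were counted, because speak is counted in both groups. The weighted percentages still total 100.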
The Tag Cloud tab displays up to 100 words alphabetically in varying font sizes, where frequently occurring words are in larger fonts.
The Tree Map tab displays up to 100 words as a series of rectangles, where frequently occurring words are in larger rectangles.
The Cluster Analysis tab displays up to 100 words as a horizontal dendrogram, where words that co-occur are clustered together.
When you click on the cluster analysis diagram, the Cluster Analysis tab (on the ribbon) becomes available. You can use the commands on this ribbon tab to:
Change the diagram type—you can show the data as a horizontal or vertical dendrogram, a circle graph, or a 2D or 3D cluster map
In 2D or 3D cluster maps, select the Word Frequency check box if you want to use word frequency to determine the size of the bubbles in the cluster map.
For more information, refer to Change the appearance or content of a cluster analysis diagram.
When you run a Word Frequency query, a preview node is created for each word—this lets you see all references to the word. To open a preview node:
In the Summary, Tag Cloud, Tree Map or Cluster Analysis tab, double-click the word you want to explore.
In the preview node, you see each occurrence of the selected keyword in context:
The context (the text around the word) is displayed in grey—by default it is a 'narrow' context. To expand the context for a selected reference, on the View tab, in the Detail View group, click Node and choose the coding context.
You can also change the definition of 'narrow' to show more or fewer words on each side of the selected word. Refer to Narrow and broad and custom reach settings for more information.
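The idea of showing each occurrence of a word with a fixed number of surrounding words can be sketched in Python as a simple keyword-in-context listing. The function name and the `reach` parameter are hypothetical, and the default of five words is an arbitrary choice for the sketch, not NVivo's documented 'narrow' setting.

```python
def keyword_in_context(words, keyword, reach=5):
    """Return (before, match, after) for every occurrence of keyword.

    reach is the number of words shown on each side of the match.
    """
    hits = []
    for i, word in enumerate(words):
        if word.lower() == keyword.lower():
            before = words[max(0, i - reach): i]
            after = words[i + 1: i + 1 + reach]
            hits.append((" ".join(before), word, " ".join(after)))
    return hits

# Invented sentence for illustration only.
sentence = "farmers say water pollution is harmful to health".split()
narrow = keyword_in_context(sentence, "pollution", reach=2)
```

Increasing `reach` corresponds to widening the context shown around each reference.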
You can create a node that includes all the references to a word you select in the Word Frequency query results.
Select the word you want to use to create a node.
On the Create tab, in the Items group, click Create As and select Create As Node.
The Select Location dialog box opens.
Select a location and name the node.
Click OK.
NOTE You can also right-click on a word to see the shortcut menu, and click Create As Node.
You can run a Text Search query for a selected word in the Word Frequency query results.
On the Explore tab, in the Queries group, click Other Actions, and then click Run Text Search Query.
The Text Search Query dialog box opens.
(Optional) Change the Text Search Criteria or Query Options.
Click Run.
Refer to Run a Text Search query for more information.
When determining the frequency of words, NVivo applies the following rules:
Words containing punctuation (such as hyphens, periods and other symbols) are divided into separate words. For example, part-time will be counted as part and time.
Words containing apostrophes (such as can't and I'd) are treated as one word, but if a word ends with 's, the 's is not included. For example, Tom's would be counted as Tom.
In audio and video transcripts, only words in the Content field (column) are counted—any words in custom transcript fields are ignored.
In datasets, only words in codable fields (columns) are counted—any words in classifying fields are ignored.
When searching text in selected nodes, if a word is coded against multiple nodes, it is counted once for each node. Similarly, if a word has been coded by multiple users to the same node, it is counted once for each user.
Word Frequency queries do not include 'stop words'—refer to Exclude particular words when running Word Frequency queries for more information.
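The punctuation and apostrophe rules above can be approximated with a small Python tokenizer. This is a sketch of the rules as described, not NVivo's implementation. It handles only the straight apostrophe character, and the pattern and function name are hypothetical.

```python
import re

# A run of letters or digits, optionally joined by straight apostrophes.
# Hyphens, periods and other symbols are not matched, so they split words.
TOKEN = re.compile(r"[^\W_]+(?:'[^\W_]+)*")

def tokenize(text):
    """Split text into words following the counting rules sketched above."""
    tokens = []
    for match in TOKEN.finditer(text):
        word = match.group()
        if word.endswith("'s"):
            word = word[:-2]  # Tom's is counted as Tom
        tokens.append(word)
    return tokens

tokens = tokenize("Tom's part-time job can't wait.")
```

The hyphenated part-time is split into two words, can't stays whole, and the possessive ending is dropped from Tom's.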
Word Frequency queries do not include 'stop words'. By default, these are less significant words like conjunctions or prepositions. You can view and edit the list of stop words. Refer to Set the text search language and stop words for more information.
You can add a word displayed in your query results to the stop words list—select the word you want to exclude from the query results, then click Add to Stop Words List, in the Stop Words group on the Analyze tab. The words you add to the stop word list will be excluded the next time you run a Word Frequency or Text Search query.
NOTE In server projects, only Project Owners can add words to the stop word list—refer to About teamwork in a server project for more information.
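The effect of the stop words list on a word count can be sketched in Python as follows. The stop words shown are a tiny invented sample, not NVivo's default list, and the function name is hypothetical.

```python
from collections import Counter

def count_without_stop_words(words, stop_words):
    """Count words, skipping any that appear in the stop words list."""
    stop = {w.lower() for w in stop_words}
    return Counter(w for w in words if w.lower() not in stop)

# Invented data for illustration only.
words = ["the", "water", "and", "the", "pollution", "water"]
stop_words = {"the", "and"}
first_run = count_without_stop_words(words, stop_words)

# Adding a word to the list excludes it the next time the count is run.
stop_words.add("water")
second_run = count_without_stop_words(words, stop_words)
```

As in the procedure above, the result of the first run is unchanged by the addition. Only the next run excludes the added word.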