Approaches to analyzing datasets

This topic explains various ways that you might choose to analyze and explore a dataset, starting with the simplest approach and then moving to more complex methods.

As datasets commonly contain survey data, this topic describes how you might approach analyzing a set of survey responses.

In this topic

Explore your survey data in Detail View
Gather responses to each question
Gather responses of each survey respondent
Grouping demographic values into ranges
Gather responses based on demographic values

Explore your survey data in Detail View

When you open a dataset in Detail View, you can explore it visually. While working in Detail View, you can:

Hide columns to limit the amount of data you are looking at—for example, if you want to see the first column in your dataset next to the fifth column, you can hide the intervening columns.
Use the sort or filter functions to see patterns in your data. For example, if your dataset contains survey responses and includes a classifying field for sex, you can use the sort or filter functions to view the responses of the males or females.
Manually code survey responses at nodes representing the themes in your data—refer to Basic Coding in dataset sources for more information.

You can also run queries to find and code at themes in your data:

Run a Word Frequency query to identify common themes in the survey responses.
Run a Text Search query to find all instances of a particular word or phrase.
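
NVivo runs these queries through its user interface, so no code is required. If it helps to see the underlying idea, the following Python sketch counts words across the Question 1 responses from the example survey used in this topic, in the spirit of a Word Frequency query. It is an illustration only and does not call NVivo; the word_frequency function and the STOP_WORDS list are hypothetical.

```python
import re
from collections import Counter

# Question 1 responses copied from the example survey in this topic.
responses = [
    "I think there should be more car-free zones",
    "Pedestrians need to feel safe. There should be better lighting and more police",
    "Safety barriers at busy intersections",
    "Better education in schools about road safety",
]

# Hypothetical stop-word list for this sketch; NVivo maintains its own.
STOP_WORDS = {"there", "should", "and", "about", "more", "think", "need"}


def word_frequency(texts, min_length=3):
    """Count words of at least min_length characters, skipping stop words."""
    counts = Counter()
    for text in texts:
        # Keep hyphenated terms such as "car-free" together as one word.
        for word in re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower()):
            if len(word) >= min_length and word not in STOP_WORDS:
                counts[word] += 1
    return counts


frequencies = word_frequency(responses)
for word, count in frequencies.most_common(5):
    print(word, count)
```

Words that occur in several responses, such as "safety" here, can suggest themes worth coding.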

Gather responses to each question

Do you want to see how all respondents replied to a question? Gathering responses to each survey question at a node allows you to group the data into broad themes.

Using the example dataset below, you could create a node Question 1 and code the entire column at that node. You could create another node to contain all responses to Question 2.

Respondent	Age	Sex	Question 1	Question 2
Anna	29	Female	I think there should be more car-free zones	Electric buses and taxis would help reduce pollution in the inner city
Jack	31	Male	Pedestrians need to feel safe. There should be better lighting and more police	We should create more green spaces
Maria	52	Female	Safety barriers at busy intersections	I don't think they should tax car parks
Peter	47	Male	Better education in schools about road safety	More street trees

You can code the column manually or automatically:

You can select the entire column and manually code it at a new node called Question 1.
You can use the Auto Code Dataset Wizard, selecting Code at nodes for each column on the first step of the wizard. This is useful when you have many columns containing responses to different survey questions.

NOTE When each respondent is represented by multiple rows (a row per survey question), you can still use the Auto Code Dataset Wizard to gather the responses to a single question at a node—refer to Gather survey responses from multiple rows for more information.

Whichever method you use, you will create and code at the following nodes:

Question 1
Question 2
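
To picture what coding each column at a node produces, here is a minimal Python sketch that gathers the example dataset by column. It only mimics the result outside NVivo; the names HEADER, RECORDS, QUESTION_COLUMNS and question_nodes are assumptions made for the illustration.

```python
# The example dataset from this topic.
HEADER = ["Respondent", "Age", "Sex", "Question 1", "Question 2"]
RECORDS = [
    ["Anna", "29", "Female",
     "I think there should be more car-free zones",
     "Electric buses and taxis would help reduce pollution in the inner city"],
    ["Jack", "31", "Male",
     "Pedestrians need to feel safe. There should be better lighting and more police",
     "We should create more green spaces"],
    ["Maria", "52", "Female",
     "Safety barriers at busy intersections",
     "I don't think they should tax car parks"],
    ["Peter", "47", "Male",
     "Better education in schools about road safety",
     "More street trees"],
]
rows = [dict(zip(HEADER, record)) for record in RECORDS]

# Columns that hold open-ended answers rather than classifying values.
QUESTION_COLUMNS = ["Question 1", "Question 2"]

# One "node" per question column: node name -> list of (respondent, response).
question_nodes = {
    column: [(row["Respondent"], row[column]) for row in rows]
    for column in QUESTION_COLUMNS
}

for name, references in question_nodes.items():
    print(name, "holds", len(references), "responses")
```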

Once you have grouped all responses to a question at a single node, you can analyze them with NVivo's analysis tools. For example, you can:

Open the node and visually explore content coded at the node. From here you could 'code on' to more granular thematic groupings. For example, you could gather all answers which mentioned car-free zones.
Run a Word Frequency query (using the node in the scope of the query) to find common words or concepts in responses to Question 1.
Run a Text Search query looking for particular words or concepts, using the node in the scope of the query. For example, you could search for education and code all the results at a new node.
Generate a cluster analysis diagram. For example, you can explore the similarity between the responses to Question 1 and responses to other survey questions.
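
The idea of searching within a node and coding the results at a new node can be sketched in Python as a filter over the content already gathered at Question 1. This is a rough analogy rather than a description of how NVivo works internally, and the code_on helper is hypothetical.

```python
# Content gathered at the "Question 1" node: (respondent, response).
question_1 = [
    ("Anna", "I think there should be more car-free zones"),
    ("Jack", "Pedestrians need to feel safe. There should be better lighting and more police"),
    ("Maria", "Safety barriers at busy intersections"),
    ("Peter", "Better education in schools about road safety"),
]


def code_on(node, term):
    """Return a finer-grained node holding only the references that mention term."""
    return [(who, text) for who, text in node if term.lower() in text.lower()]


# Searching only the Question 1 node keeps answers to other questions out of scope.
education = code_on(question_1, "education")
car_free_zones = code_on(question_1, "car-free")

print(education)
print(car_free_zones)
```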

Gather responses of each survey respondent

If your data contains classifying fields that describe your survey participants—for example, the name, age and sex of the participant—you can use these fields to create nodes that represent your survey participants. You can code everything a participant said in response to survey questions at the node that represents them.

Using the data below, for each respondent you would create one node, of the classification 'person', with attributes for Age and Sex. Responses to both Question 1 and Question 2 would be coded at this node.

Respondent	Age	Sex	Question 1	Question 2
Anna	29	Female	I think there should be more car-free zones	Electric buses and taxis would help reduce pollution in the inner city
Jack	31	Male	Pedestrians need to feel safe. There should be better lighting and more police	We should create more green spaces
Maria	52	Female	Safety barriers at busy intersections	I don't think they should tax car parks
Peter	47	Male	Better education in schools about road safety	More street trees

Using NVivo's automated tools, you can do this in two steps:

Code content at nodes for each person

Use the Auto Code Dataset Wizard, selecting Code at nodes for each row on the first step of the wizard, to code the Question 1 and Question 2 responses at nodes representing the values from the 'Respondent' column. This creates the following nodes:

Anna
Jack
Maria
Peter

At this point, the nodes have coding, but are not classified, and have no attribute values.

Use the Classify Nodes from Dataset Wizard to add the demographic information (age and sex) of each participant to their node.
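
The two steps can be pictured with a small Python sketch: first build a node per row that holds the coded responses, then attach a classification and attribute values. The Node class below is a hypothetical stand-in used only for illustration; it is not part of NVivo.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for a node in this sketch."""
    name: str
    references: list = field(default_factory=list)  # content coded at the node
    classification: Optional[str] = None
    attributes: dict = field(default_factory=dict)


rows = [
    {"Respondent": "Anna", "Age": "29", "Sex": "Female",
     "Question 1": "I think there should be more car-free zones",
     "Question 2": "Electric buses and taxis would help reduce pollution in the inner city"},
    {"Respondent": "Jack", "Age": "31", "Sex": "Male",
     "Question 1": "Pedestrians need to feel safe. There should be better lighting and more police",
     "Question 2": "We should create more green spaces"},
    {"Respondent": "Maria", "Age": "52", "Sex": "Female",
     "Question 1": "Safety barriers at busy intersections",
     "Question 2": "I don't think they should tax car parks"},
    {"Respondent": "Peter", "Age": "47", "Sex": "Male",
     "Question 1": "Better education in schools about road safety",
     "Question 2": "More street trees"},
]

# Step 1: one node per row, named after the Respondent column, with both answers coded.
nodes = {
    row["Respondent"]: Node(row["Respondent"], [row["Question 1"], row["Question 2"]])
    for row in rows
}
# After step 1 the nodes have coding but no classification or attribute values.
unclassified_after_step_1 = all(
    node.classification is None and not node.attributes for node in nodes.values()
)

# Step 2: classify each node and set its attribute values from the same row.
for row in rows:
    node = nodes[row["Respondent"]]
    node.classification = "Person"
    node.attributes = {"Age": int(row["Age"]), "Sex": row["Sex"]}

print(nodes["Maria"])
```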

Once you have created and coded responses at nodes for each respondent, you can use analysis tools which compare their attribute values. You can:

Create charts to compare the demographic attributes of your respondents. Perhaps your respondents are mostly males under 30 years old?
Generate a cluster analysis diagram that compares the attribute values of your respondents. Are there clusters of respondents with similar characteristics? Are there any 'outliers', that is, respondents with demographic characteristics that are very different from the others?
Run a Word Frequency query (using the node in the scope of the query) to find common words or concepts in responses to Question 1. You could code the results at new nodes to further refine your analysis.
Run a Text Search query looking for particular words or concepts, using the node in the scope of the query. For example, you could search for education and code all the results at a new node.

When you have gathered responses both at question nodes (Question 1, Question 2) and at respondent nodes (Anna, Jack, Maria, Peter), you can analyze what respondents in different demographic groups are saying in response to particular questions:

Use a Coding query to view all the responses of males under 30 years to Question 1.
Use a Word Frequency query to find the most commonly occurring words or ideas that females mention when responding to Question 2.
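
Conceptually, a query like this intersects two things: the content gathered at a question node and the respondents whose attribute values match a condition. The Python sketch below shows that intersection on the example data. It is an analogy only, and the coding_query function is hypothetical.

```python
# Attribute values stored on each respondent node.
respondents = {
    "Anna": {"Age": 29, "Sex": "Female"},
    "Jack": {"Age": 31, "Sex": "Male"},
    "Maria": {"Age": 52, "Sex": "Female"},
    "Peter": {"Age": 47, "Sex": "Male"},
}

# Content gathered at each question node, keyed by respondent.
question_nodes = {
    "Question 1": {
        "Anna": "I think there should be more car-free zones",
        "Jack": "Pedestrians need to feel safe. There should be better lighting and more police",
        "Maria": "Safety barriers at busy intersections",
        "Peter": "Better education in schools about road safety",
    },
    "Question 2": {
        "Anna": "Electric buses and taxis would help reduce pollution in the inner city",
        "Jack": "We should create more green spaces",
        "Maria": "I don't think they should tax car parks",
        "Peter": "More street trees",
    },
}


def coding_query(question, matches):
    """Return responses to question from respondents whose attributes satisfy matches."""
    return {
        name: text
        for name, text in question_nodes[question].items()
        if matches(respondents[name])
    }


females_q2 = coding_query("Question 2", lambda a: a["Sex"] == "Female")
males_under_30_q1 = coding_query(
    "Question 1", lambda a: a["Sex"] == "Male" and a["Age"] < 30
)

print(females_q2)
print(males_under_30_q1)
```

In this small sample, no male respondent is under 30, so the second query returns nothing.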

NOTE

If each respondent is represented by multiple rows (a row per survey question), you can still use the Auto Code Dataset Wizard to gather each person's responses at a node—refer to Gather survey responses from multiple rows for more information.
If you have demographic information about your respondents stored separately from your survey data, you may need to set the attribute values by another method. For example, you can import node attribute values from a spreadsheet or by importing from another NVivo project—refer to Import (or export) classification sheets for more information.

Grouping demographic values into ranges

When you use demographic information in the dataset to set attribute values for nodes, you can optionally group values into ranges.

For example, if your dataset contains the age of your respondents, it may be more useful to know that an individual participant is within the 21-29 age range than to know their precise age.

The Classify Nodes from Dataset Wizard allows you to group the values. Refer to Classify nodes (set attribute values to record information) for more information.
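
Grouping into ranges amounts to mapping each precise value to the band that contains it. The following Python sketch shows the idea using the ages from the example survey. The band boundaries in AGE_RANGES and the age_range function are assumptions chosen for the illustration, not settings taken from NVivo.

```python
# Hypothetical age bands as (lower, upper) inclusive bounds.
AGE_RANGES = [(18, 29), (30, 39), (40, 49), (50, 59)]


def age_range(age):
    """Return the label of the band containing age, or 'Unassigned' if none does."""
    for lower, upper in AGE_RANGES:
        if lower <= age <= upper:
            return f"{lower}-{upper}"
    return "Unassigned"


# Ages from the example survey in this topic.
ages = {"Anna": 29, "Jack": 31, "Maria": 52, "Peter": 47}
grouped = {name: age_range(age) for name, age in ages.items()}

print(grouped)
```

Values that fall outside every band are labeled Unassigned in this sketch, so that a gap in the bands is visible rather than silently dropped.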

Gather responses based on demographic values

Most commonly, you will use the classifying fields in your dataset to set the attribute values on your respondent nodes. However, it is also possible to create a node structure that reflects the demographic characteristics of your respondents—this provides another way of looking at your data.

Respondent	Age	Sex	Question 1	Question 2
Anna	29	Female	I think there should be more car-free zones	Electric buses and taxis would help reduce pollution in the inner city
Jack	31	Male	Pedestrians need to feel safe. There should be better lighting and more police	We should create more green spaces
Maria	52	Female	Safety barriers at busy intersections	I don't think they should tax car parks
Peter	47	Male	Better education in schools about road safety	More street trees

Using the dataset above, you could use the Auto Code Dataset Wizard (select Code at nodes for each row on the first step of the wizard) to create nodes to represent the male and female respondents, and then child nodes for each age range (by grouping the age values into ranges)—responses to survey questions are coded at the appropriate age-range node.

Your resulting node structure (depending on how you group ages into ranges) might be:

Female
    18-29
    30-39
    40-49
    50-52
Male
    18-29
    30-39
    40-49
    50-52

This can be a quick way to gather responses by demographic groupings. You can see what males aged 30-39 are saying. If you want to see what all males are saying, turn on aggregation at the parent node (Male).
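
The difference between opening a child node and aggregating at the parent can be sketched in Python as a two-level grouping. The age ranges are the ones from the node structure above; the tree layout and the aggregate function are hypothetical and only imitate the behavior described here.

```python
from collections import defaultdict

# Age ranges taken from the node structure shown in this topic.
AGE_RANGES = [(18, 29), (30, 39), (40, 49), (50, 52)]

# (respondent, age, sex, [responses]) from the example survey.
rows = [
    ("Anna", 29, "Female", ["I think there should be more car-free zones",
                            "Electric buses and taxis would help reduce pollution in the inner city"]),
    ("Jack", 31, "Male", ["Pedestrians need to feel safe. There should be better lighting and more police",
                          "We should create more green spaces"]),
    ("Maria", 52, "Female", ["Safety barriers at busy intersections",
                             "I don't think they should tax car parks"]),
    ("Peter", 47, "Male", ["Better education in schools about road safety",
                           "More street trees"]),
]


def age_range(age):
    """Return the label of the range containing age."""
    for lower, upper in AGE_RANGES:
        if lower <= age <= upper:
            return f"{lower}-{upper}"
    raise ValueError(f"no range covers age {age}")


# Parent node (sex) -> child node (age range) -> responses coded at the child.
tree = defaultdict(lambda: defaultdict(list))
for name, age, sex, answers in rows:
    tree[sex][age_range(age)].extend(answers)


def aggregate(parent):
    """Roll up the content of every child node, as aggregation does for a parent."""
    return [text for child in tree[parent].values() for text in child]


males_30_39 = tree["Male"]["30-39"]
all_males = aggregate("Male")

print(len(males_30_39), "responses from males aged 30-39")
print(len(all_males), "responses from all males")
```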

NOTE

Because the demographic attributes are reflected in the node hierarchy rather than as attribute values on the node, this method is not appropriate when you want to use analysis tools that compare node attribute values, for example, if you want to generate a cluster analysis diagram to see the demographic spread of your respondents.
If you anticipate gathering more source materials, and particularly if these source materials are not datasets, this method may not be appropriate. Manually coding additional material within this structure may prove difficult—it might be preferable to store demographic information as attribute values on nodes representing individual participants, as it will be easier to add further coding.
