About cluster analysis


What is cluster analysis?

Cluster analysis is an exploratory technique that you can use to visualize patterns in your project by grouping sources or nodes that contain similar words, have similar attribute values, or are coded similarly by nodes. Cluster analysis diagrams provide a graphical representation of sources or nodes that makes similarities and differences easy to see. Sources or nodes that appear close together in the diagram are more similar than those that are far apart.

You can use cluster analysis diagrams to visualize:

In the image below, the cluster of three documents indicates a close similarity. The other three documents are shown further apart, which indicates that they are dissimilar.

When you create a cluster analysis diagram in NVivo, it displays in Detail View with two tabs:

Diagram tab (image)

Summary tab (image)

1  Items compared: each possible pair of selected items is listed as a row in the table.

2  Similarity Index: displays a value that indicates the degree of similarity for each pair of items, based on the similarity metric selected. A high similarity index (maximum = 1) indicates strong similarity, and such pairs are displayed closer together on the cluster analysis diagram.
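The pairwise table described above can be sketched in a few lines of Python. This is only an illustration, not NVivo's implementation: the item names, the attribute values, and the `matching_share` metric (the share of attributes with equal values) are hypothetical stand-ins for whichever similarity metric is selected.

```python
from itertools import combinations

def summary_rows(items, similarity):
    """Return (item_a, item_b, index) for every pair, most similar first."""
    rows = [
        (a, b, round(similarity(items[a], items[b]), 3))
        for a, b in combinations(sorted(items), 2)
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)

def matching_share(x, y):
    """Hypothetical stand-in metric: share of attributes with equal values."""
    keys = x.keys() | y.keys()
    same = sum(1 for k in keys if x.get(k) == y.get(k))
    return same / len(keys) if keys else 0.0

# Hypothetical items, each described by its attribute values.
items = {
    "Anna": {"age group": "20-29", "region": "North", "role": "Volunteer"},
    "Ben": {"age group": "20-29", "region": "North", "role": "Staff"},
    "Cara": {"age group": "50-59", "region": "South", "role": "Manager"},
}

for a, b, index in summary_rows(items, matching_share):
    print(f"{a} / {b}: {index}")
```

With n selected items the table has n(n-1)/2 rows, so three items produce three rows and six items produce fifteen.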


Types of cluster analysis diagrams

When you create a cluster analysis diagram, by default it is displayed as a horizontal dendrogram. You can select from a gallery of cluster analysis diagrams—experiment with the diagram types to find the one that best fits the project items you are exploring.

Type and description

2D Cluster Map

A two-dimensional diagram where similar items are clustered together and different items are further apart.

3D Cluster Map

A three-dimensional diagram where similar items are clustered together and different items are further apart. The diagram can be rotated in three dimensions.

Horizontal Dendrogram

A horizontal branching diagram where similar items are clustered together on the same branch and different items are further apart. Dendrograms can be useful for comparing pairs of items.

Vertical Dendrogram

A vertical branching diagram where similar items are clustered together on the same branch and different items are further apart. Dendrograms can be useful for comparing pairs of items.

Circle Graph

A circle where all the items are represented as points on the perimeter. Similarity between items is indicated by connecting lines of varying thickness and color. Blue lines indicate similarity, and thicker blue lines indicate stronger similarity. Red lines indicate dissimilarity, and thicker red lines indicate stronger dissimilarity.
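To make the branching structure of a dendrogram more concrete, the following Python sketch repeatedly joins the two most similar clusters and records each join, which corresponds to one branch point. It is a minimal illustration under stated assumptions: the document names and similarity values are hypothetical, and cluster-to-cluster similarity is taken as the lowest pairwise index (complete linkage), which is an assumption of this sketch rather than a description of how NVivo builds its diagrams.

```python
def build_dendrogram(names, sim):
    """Agglomerative clustering; returns the merges in the order they occur.

    `sim` maps frozenset({a, b}) to a similarity index between 0 and 1.
    Cluster-to-cluster similarity is the lowest pairwise index
    (complete linkage, an assumption made for this sketch).
    """
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                link = min(sim[frozenset((a, b))]
                           for a in clusters[i] for b in clusters[j])
                if best is None or link > best[0]:
                    best = (link, i, j)
        link, i, j = best
        merges.append((clusters[i], clusters[j], link))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

# Hypothetical similarity indexes for four documents.
names = ["Doc 1", "Doc 2", "Doc 3", "Doc 4"]
sim = {
    frozenset(("Doc 1", "Doc 2")): 0.9,
    frozenset(("Doc 1", "Doc 3")): 0.8,
    frozenset(("Doc 2", "Doc 3")): 0.7,
    frozenset(("Doc 1", "Doc 4")): 0.1,
    frozenset(("Doc 2", "Doc 4")): 0.2,
    frozenset(("Doc 3", "Doc 4")): 0.3,
}

merges = build_dendrogram(names, sim)
for left, right, link in merges:
    print(f"join {left} + {right} at similarity {link}")
```

Items joined early (at a high similarity) sit on the same low-level branch, while an item joined last, such as Doc 4 here, ends up on a branch of its own.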

 


Cluster by word, coding or attribute value similarity

The sources or nodes in a cluster analysis diagram can be clustered by word similarity, coding similarity or attribute value similarity.

Cluster by and description

Word similarity

The words contained in the selected sources or nodes are compared. Sources or nodes that have a higher degree of similarity based on the occurrence and frequency of words are shown clustered together. Sources or nodes that have a lower degree of similarity based on the occurrence and frequency of words are displayed further apart.

Stop words are excluded when using this measure of similarity. Refer to Set the text search language and stop words for more information.

Coding similarity

The coding at the selected sources or nodes is compared. Sources or nodes that have been coded similarly are clustered together on the cluster analysis diagram. Sources or nodes that have been coded differently are displayed further apart on the cluster analysis diagram.

Attribute value similarity

The attribute values of the selected sources or nodes are compared. Sources or nodes that have similar attribute values are clustered together on the cluster analysis diagram. Sources or nodes that have different attribute values are displayed further apart on the cluster analysis diagram.
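As a rough illustration of what comparing "the occurrence and frequency of words" can involve, the Python sketch below turns each source into a word-frequency vector over a shared vocabulary, with stop words left out. The stop word list, the tokenizing rule, and the sample texts are all hypothetical; NVivo's own text processing is not described here and may differ.

```python
import re
from collections import Counter

# Hypothetical stop word list; the real list is set per project language.
STOP_WORDS = {"the", "a", "is", "of", "and", "in"}

def word_counts(text):
    """Count words in a text, ignoring case and excluding stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)

def aligned_vectors(texts):
    """Turn each text into a frequency vector over the shared vocabulary."""
    counts = {name: word_counts(text) for name, text in texts.items()}
    vocabulary = sorted(set().union(*counts.values()))
    vectors = {name: [c[w] for w in vocabulary] for name, c in counts.items()}
    return vocabulary, vectors

# Hypothetical source texts.
texts = {
    "Source 1": "The cost of water is rising",
    "Source 2": "Water cost and water quality",
}

vocabulary, vectors = aligned_vectors(texts)
print(vocabulary)
for name, vector in vectors.items():
    print(name, vector)
```

Coding similarity and attribute value similarity can be pictured the same way: each source or node becomes a row of values (coded or not coded at each node, or one value per attribute), and the rows are then compared pair by pair.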


Selecting a similarity metric

A similarity metric is a statistical method used to calculate correlation between items. When you create a cluster analysis diagram using the Cluster Analysis Wizard, you can choose from the following similarity metrics:

For more information on how similarity is measured, refer to How are cluster analysis diagrams generated?
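For orientation, here is a minimal Python sketch of three widely used similarity measures: Jaccard's coefficient, Sørensen's coefficient, and Pearson's correlation coefficient. Treating these as the available choices is an assumption of this example, and the function names and sample data are hypothetical. The two set-based measures compare which features two items share, while Pearson's coefficient compares numeric profiles such as word-frequency vectors.

```python
from math import sqrt

def jaccard(a, b):
    """Shared features divided by all features present in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def sorensen(a, b):
    """Twice the shared features divided by the combined set sizes."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0

def pearson(x, y):
    """Linear correlation between two equal-length numeric vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))
    var_x = sum((p - mean_x) ** 2 for p in x)
    var_y = sum((q - mean_y) ** 2 for q in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / sqrt(var_x * var_y)

# Hypothetical data: nodes coded at two sources, and two frequency vectors.
coded_a = {"cost", "policy", "trust"}
coded_b = {"policy", "trust", "access"}
print(jaccard(coded_a, coded_b))
print(sorensen(coded_a, coded_b))
print(pearson([1, 0, 1, 1], [1, 1, 0, 2]))
```

Jaccard's and Sørensen's coefficients range from 0 to 1, whereas Pearson's coefficient ranges from -1 to 1, so the same pair of items can receive different index values depending on the metric chosen.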
