Friday, March 15, 2019

Cluster Analysis


Introduction
Cluster analysis is a statistical approach that subdivides objects or variables into groups based on their similarities (Romesburg, 2004). Items within a cluster share similar characteristics, while items in different clusters differ. It is a mathematical tool with diverse applications in research (Romesburg, 2004). For example, market researchers use cluster analysis to identify the product brands that target consumers consider similar. The study of this approach is particularly important because researchers must create and revise their goals consistently. This paper provides a brief overview of the key concepts an individual should be conversant with when applying cluster analysis.

Basic steps in clustering analysis
Cluster analysis applies the concept of classification: it places objects or variables with similar characteristics in the same group and separates those with different characteristics. The process entails formulating the problem, selecting a distance measure, choosing a clustering method, and deciding on the number of clusters. It also involves interpreting the cluster profiles and assessing the validity of the clustering (Romesburg, 2004). Several basic steps organize the material to achieve these aims. The first step is to create a data matrix that lists the objects and their attributes. The researcher arranges the objects in the columns and the attributes (characteristics) in the rows (Everitt, 2011). The second step, which is optional, is to standardize the data matrix. The third step is to compute the resemblance matrix using a resemblance coefficient, which measures the degree of similarity between each pair of objects. The resemblance coefficient can be either a dissimilarity or a similarity coefficient (Everitt, 2011). The fourth step is to execute the clustering method. Researchers use the resemblance matrix from the previous step to draw a tree diagram that shows the level of resemblance among the objects (Everitt, 2011). The fifth step is to rearrange the data and resemblance matrices so that the similarities among the objects in the tree become clear (Everitt, 2011). The final step is to calculate the cophenetic correlation coefficient, which measures how closely the tree agrees with the resemblance matrix (Everitt, 2011).
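The standardization and resemblance-matrix steps can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the sample values are hypothetical, Euclidean distance is assumed as the dissimilarity coefficient, and the data is stored with one object per row (the transpose of the layout described above) because that is more convenient in Python.

```python
import math

def standardize(data):
    """Rescale each attribute (column) to mean 0 and standard deviation 1."""
    n = len(data)
    columns = list(zip(*data))
    means = [sum(col) / n for col in columns]
    # Population standard deviation; a constant attribute falls back to 1.0
    stds = [math.sqrt(sum((x - m) ** 2 for x in col) / n) or 1.0
            for col, m in zip(columns, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in data]

def distance_matrix(data):
    """Euclidean distance between every pair of objects (a dissimilarity coefficient)."""
    return [[math.dist(a, b) for b in data] for a in data]

# Hypothetical data: one row per object, one column per attribute
objects = [[1.0, 200.0], [2.0, 220.0], [9.0, 900.0], [10.0, 880.0]]
z = standardize(objects)
d = distance_matrix(z)
for row in d:
    print([round(v, 2) for v in row])
```

Without standardization, the second attribute would dominate the distances simply because its values are larger, which is why the optional second step is often worth taking.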
Types of Cluster analysis
As stated earlier, cluster analysis is a mathematical tool that classifies individual observations into clusters based on their similarities. Several clustering methods exist; the two most commonly used are hierarchical and k-means cluster analysis.
Hierarchical cluster analysis (HCA)
Hierarchical clustering has two types of methods: agglomerative and divisive. Agglomerative hierarchical clustering starts by treating each variable or individual observation as a separate cluster (Aldenderfer & Blashfield, 1996). It then repeatedly combines the most similar clusters into larger ones until a single cluster contains every observation (Aldenderfer & Blashfield, 1996). Divisive clustering takes the opposite approach: it starts with a single cluster containing every observation and keeps dividing it until it reaches the simplest clusters, which have no internal dissimilarity (Aldenderfer & Blashfield, 1996).
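The agglomerative procedure can be illustrated with a short Python sketch. It assumes single linkage (the distance between two clusters is the distance between their closest members) and Euclidean distance; both are choices made here for illustration, and the function name and sample points are hypothetical.

```python
import math

def agglomerative(points, num_clusters=1):
    """Single-linkage agglomerative clustering.

    Returns the final clusters (as lists of point indices) and the merge history.
    """
    clusters = [[i] for i in range(len(points))]  # every observation starts alone
    merges = []
    while len(clusters) > num_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members
                dist = min(math.dist(points[i], points[j])
                           for i in clusters[a] for j in clusters[b])
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        dist, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), dist))
        clusters[a] = clusters[a] + clusters[b]  # combine the two most similar clusters
        del clusters[b]
    return clusters, merges

# Hypothetical observations forming two visible groups
points = [(1.0, 1.0), (1.5, 1.0), (8.0, 8.0), (8.0, 9.0)]
clusters, merges = agglomerative(points, num_clusters=2)
print(clusters)  # [[0, 1], [2, 3]]
for left, right, dist in merges:
    print(left, right, round(dist, 2))
```

Calling the function with `num_clusters=1` continues the merging until one cluster contains every observation, and the merge history then holds the information a tree diagram would display.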
K-means Clustering
The user specifies the number of clusters wanted in the solution, along with an initial cluster mean (centroid) for each, at the beginning of the clustering. The researcher compares each observation with every cluster mean and assigns it to the cluster whose mean is most similar. After a reassignment, the researcher recalculates the value of each affected centroid. The process continues until a complete pass through the dataset produces no reassignment. The procedure lets the researcher compare different cluster solutions based on how well they fit the observed data (Aldenderfer & Blashfield, 1996).
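A minimal Python sketch of the k-means loop follows. Note one deliberate simplification: it recomputes the centroids once per pass (the batch variant) rather than after every single reassignment as described above. The function name, starting centroids, and sample points are hypothetical.

```python
import math

def k_means(points, initial_centroids, max_passes=100):
    """Batch k-means: assign each point to its nearest centroid, recompute, repeat."""
    centroids = [tuple(c) for c in initial_centroids]  # copy so the input is untouched
    assignment = [None] * len(points)
    for _ in range(max_passes):
        new_assignment = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # a complete pass produced no reassignment
        assignment = new_assignment
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return assignment, centroids

# Hypothetical observations and user-chosen starting centroids (k = 2)
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.0, 9.5)]
assignment, centroids = k_means(points, [(0.0, 0.0), (10.0, 10.0)])
print(assignment)  # [0, 0, 0, 1, 1, 1]
print([tuple(round(v, 2) for v in c) for c in centroids])
```

Running the function with different values of k or different starting centroids gives alternative solutions that can then be compared against the observed data.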
Two-step clustering
Researchers use this type of clustering to handle large datasets, and it applies a two-stage approach (Romesburg, 2004). The first stage takes the entire set of observations and systematically breaks it down into cluster prototypes whose members are highly similar. The second stage applies a hierarchical agglomerative clustering procedure to combine these prototypes into homogeneous clusters (Romesburg, 2004). The procedure can handle categorical and continuous variables together and gives the researcher the flexibility to specify the number of clusters. It can also choose the number of clusters automatically based on statistical evaluation criteria (Romesburg, 2004).
Applications of cluster analysis
Research continues to develop, with more and more discoveries emerging around the globe. Advances in technology and the thirst for more information through research make the application of cluster analysis essential (Romesburg, 2004). As stated earlier, cluster analysis has many applications in practical situations, and researchers can apply its techniques in several ways.


Clustering for Understanding
People around the globe apply the concepts of classification to analyze and describe their environment and everything in it (Everitt, 2011). Classification is an inherent human characteristic. Many fields of study apply clustering methods in their analyses, including the life sciences, information retrieval, and the study of the world's physical features, among others. These fields use cluster analysis to gain more insight into the available information and to discover potential areas of research (Kaufman & Rousseeuw, 2009).
Clustering for utility
Cluster analysis serves not only as a platform for understanding but also as a tool for processing the data acquired from observations. These computations include summarization, compression, and evaluation, among others (Kaufman & Rousseeuw, 2009). Raw data often comes in complex and bulky forms, so clustering the information makes other mathematical operations easier to carry out. Rather than applying an algorithm to the entire dataset, researchers apply it to the cluster prototypes without significantly affecting the accuracy of the results (Kaufman & Rousseeuw, 2009). The approach also compresses the data by applying vector quantization (Romesburg, 2004). It places the data in tables of objects and attributes, which allows the information to fit into clusters. Some data is inevitably lost in the process; however, the resulting error is not significant, so the outcome of the analysis stays within the margin of error. Cluster analysis also lets the researcher find neighboring information efficiently (Romesburg, 2004).
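Compression by vector quantization can be sketched as follows: each observation is replaced by the index of its nearest cluster prototype, and the reconstruction error shows how much information is lost. This is an illustrative sketch only; the codebook of prototypes and the sample points are hypothetical, and Euclidean distance is assumed.

```python
import math

def encode(points, codebook):
    """Replace each point with the index of its nearest prototype."""
    return [min(range(len(codebook)), key=lambda k: math.dist(p, codebook[k]))
            for p in points]

def decode(codes, codebook):
    """Reconstruct an approximation of the data from the stored indices."""
    return [codebook[k] for k in codes]

def mean_squared_error(points, reconstructed):
    """Average squared distance between original and reconstructed points."""
    return sum(math.dist(p, r) ** 2 for p, r in zip(points, reconstructed)) / len(points)

# Hypothetical observations and cluster prototypes (the codebook)
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2), (9.0, 9.0), (9.2, 8.8), (8.8, 9.2)]
codebook = [(1.0, 1.0), (9.0, 9.0)]

codes = encode(points, codebook)   # one small integer per observation
approx = decode(codes, codebook)
error = mean_squared_error(points, approx)
print(codes)            # [0, 0, 0, 1, 1, 1]
print(round(error, 4))  # 0.0533
```

Six two-dimensional observations are stored as six indices plus two prototypes, at the cost of a small reconstruction error.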
Limitations in cluster analysis
Several limitations deter the application of this process. Hierarchical cluster analysis imposes a hierarchical structure on the data regardless of whether such a structure really exists. This makes the results difficult to judge and also limits correction of the data. Cluster analysis does not properly illustrate independent underlying controls because its methods rest on algorithms rather than on formal arithmetic (Everitt, 2011). The solutions are at times not conclusive because they are not unique: the outcome depends on the arrangement of the variables. Different clustering methods give varying results because they apply different criteria when forming the clusters. This makes the results of the process hard to validate (Everitt, 2011).
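The point that different methods give varying results can be seen on a small example. The sketch below runs the same agglomerative procedure on the same one-dimensional values with two linkage criteria, single (closest members) and complete (farthest members), and obtains two different two-cluster solutions. The function name and the values are hypothetical and chosen only to make the difference visible.

```python
def hierarchical(values, num_clusters, linkage):
    """Agglomerative clustering of 1-D values.

    `linkage` is the built-in min (single linkage) or max (complete linkage).
    """
    clusters = [[v] for v in values]
    while len(clusters) > num_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dist = linkage(abs(x - y) for x in clusters[a] for y in clusters[b])
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters

# Hypothetical values with slowly widening gaps
values = [0.0, 1.0, 2.1, 3.3, 4.6]
single = hierarchical(values, 2, linkage=min)
complete = hierarchical(values, 2, linkage=max)
print(single)    # [[0.0, 1.0, 2.1, 3.3], [4.6]]
print(complete)  # [[0.0, 1.0], [2.1, 3.3, 4.6]]
```

Neither answer is wrong by its own criterion, which is why results from a single method are hard to confirm without further validation.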


References
Aldenderfer, S., & Blashfield, K. (1996). Cluster analysis. Newbury Park, California: Sage Publishers.
Everitt, B. (2011). Cluster analysis. Chichester, West Sussex, United Kingdom: Wiley.
Kaufman, L., & Rousseeuw, P. J. (2009). Finding groups in data: An introduction to cluster analysis (Vol. 344). John Wiley & Sons.
Romesburg, C. (2004). Cluster analysis for researchers. Lulu Press.



Sherry Roberts is the author of this paper.
