Introduction
Cluster analysis refers to the statistical approach of subdividing objects or
variables into groups based on their similarities (Romesburg, 2004). Each
cluster contains items with similar characteristics, while the items in
different clusters have different characteristics. Cluster analysis is a
mathematical tool with diverse applications in the field of research
(Romesburg, 2004). For example, market researchers use cluster analysis to
identify the product brands that a target group considers similar. The study
of this approach is particularly important in the field of research because it
is critical for researchers to create and revise goals consistently. This paper
provides a brief overview of the key concepts an individual should be
conversant with when applying cluster analysis.
Basic steps in cluster analysis
The principles of cluster analysis apply the concepts of classification:
objects or variables with similar characteristics are placed in the same group,
and those with different characteristics are separated. The process of cluster
analysis entails formulating the problem, selecting a distance measure, picking
a clustering method and deciding the number of clusters. It also involves
interpreting and profiling the clusters and assessing the validity of the
clustering (Romesburg, 2004). Several basic steps organize the material to
achieve these aims. The first step is the creation of a data matrix that lays
out the objects and their attributes. The researcher should arrange the objects
in the columns and the attributes/characteristics in the rows (Everitt, 2011).
The second step, which is optional, is the standardization of the data matrix.
The third step entails computing the resemblance matrix using a resemblance
coefficient that measures the degree of similarity between each pair of
objects. The resemblance coefficient can be either a dissimilarity or a
similarity coefficient (Everitt, 2011). The fourth step involves executing the
clustering method. Researchers use the resemblance matrix from the previous
step to produce a tree diagram that shows the level of resemblance among the
objects (Everitt, 2011). The fifth step involves rearranging the data and
resemblance matrices to follow the order of the objects in the tree, which
makes the similarities among the objects easier to see (Everitt, 2011). The
final step entails calculating the cophenetic correlation coefficient, which
measures how closely the tree agrees with the resemblance matrix (Everitt,
2011).
Types of cluster analysis
As stated earlier, cluster analysis is a mathematical tool that statistically
classifies individual observations into clusters using their similarities.
There are several types of clustering methods; the two most commonly used are
hierarchical cluster analysis and k-means cluster analysis.
Hierarchical cluster analysis (HCA)
There are two types of hierarchical clustering methods: agglomerative and
divisive. Agglomerative hierarchical clustering starts with each variable or
individual observation as a separate cluster (Aldenderfer & Blashfield, 1996).
It then repeatedly combines the most similar clusters into larger ones until a
single cluster contains every observation (Aldenderfer & Blashfield, 1996). The
divisive clustering method takes the opposite approach: it starts with a single
cluster containing every observation and repeatedly divides it until each
remaining cluster contains no dissimilar observations (Aldenderfer &
Blashfield, 1996).
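The agglomerative procedure can be sketched in a few lines of Python. This is
an illustration under stated assumptions rather than the method of any
particular package: the function name `agglomerate`, the use of average linkage
to compare clusters and the small dissimilarity matrix are all hypothetical
choices.

```python
def agglomerate(dist):
    """Merge the two most similar clusters until one remains.

    dist is a symmetric matrix of dissimilarities between observations.
    Average linkage is an assumption made here for illustration.
    Returns the merge history as (left_cluster, right_cluster, height).
    """
    clusters = [[i] for i in range(len(dist))]  # each observation starts alone
    history = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [dist[i][j] for i in clusters[a] for j in clusters[b]]
                avg = sum(pairs) / len(pairs)
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        height, a, b = best
        history.append((clusters[a], clusters[b], height))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return history

# Hypothetical dissimilarities between four observations.
dist = [
    [0.0, 1.0, 6.0, 7.0],
    [1.0, 0.0, 5.0, 6.0],
    [6.0, 5.0, 0.0, 2.0],
    [7.0, 6.0, 2.0, 0.0],
]
history = agglomerate(dist)
for left, right, height in history:
    print(left, "+", right, "at height", height)
```

The merge history is the information a tree diagram displays: which clusters
joined and at what level of dissimilarity.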
K-means clustering
The user specifies the number of clusters wanted in the solution, and the
cluster means (centroids) for each, at the beginning of the clustering. Each
individual observation is compared with the value of each cluster mean and
assigned to the most similar cluster. The value of each affected centroid is
recalculated after a reassignment. The process continues until a complete pass
through the dataset produces no reassignment. The procedure affords the
researcher the chance to compare alternative cluster solutions based on how
well they fit the observed data (Aldenderfer & Blashfield, 1996).
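A minimal Python sketch of this loop follows. It assumes Euclidean distance and
recalculates the centroids once per pass rather than after every single
reassignment, which is a common simplification; the function name `k_means`,
the sample points and the starting centroids are invented for illustration.

```python
import math

def k_means(points, centroids, max_passes=100):
    """Assign each point to its nearest centroid, update the centroids, and
    repeat until a complete pass makes no reassignment."""
    centroids = [list(c) for c in centroids]
    labels = [None] * len(points)
    for _ in range(max_passes):
        changed = False
        for i, p in enumerate(points):
            nearest = min(range(len(centroids)),
                          key=lambda k: math.dist(p, centroids[k]))
            if nearest != labels[i]:
                labels[i] = nearest
                changed = True
        if not changed:
            break  # a full pass with no reassignment: the solution is stable
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(labels)
print(centroids)
```

Running the function with different numbers of starting centroids is one way
to compare alternative cluster solutions on the same data.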
Two-step clustering
Researchers use this type of clustering to handle large sets of data, and it
applies a two-stage approach (Romesburg, 2004). The first stage passes through
the entire data set and groups it systematically into cluster prototypes whose
members are highly similar. The second stage applies a hierarchical
agglomerative clustering procedure to combine these prototypes into the final,
homogeneous clusters (Romesburg, 2004). The procedure can handle categorical
and continuous variables concurrently and affords the researcher the
flexibility to specify the number of clusters. It can also choose the number of
clusters automatically based on a statistical evaluation criterion (Romesburg,
2004).
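The two-stage idea can be sketched in Python as follows. This is a deliberately
simplified illustration, not the procedure of any statistical package: the
distance threshold in the first stage, the centroid-based merging in the second
stage and all function names are assumptions, and the sketch handles continuous
variables only and does not choose the number of clusters automatically.

```python
import math

def centre(sub):
    """Centroid of a sub-cluster stored as [count, sum_vector]."""
    count, total = sub
    return [v / count for v in total]

def pre_cluster(points, threshold):
    """Stage 1: one pass over the data, building small, tight prototypes."""
    subs = []
    for p in points:
        best, best_d = None, threshold
        for s in subs:
            d = math.dist(p, centre(s))
            if d <= best_d:
                best, best_d = s, d
        if best is None:
            subs.append([1, list(p)])  # too far from everything: new prototype
        else:
            best[0] += 1
            best[1] = [a + b for a, b in zip(best[1], p)]
    return subs

def merge_sub_clusters(subs, n_clusters):
    """Stage 2: agglomerate the prototypes, closest centroids first."""
    subs = [[s[0], list(s[1])] for s in subs]
    while len(subs) > n_clusters:
        pairs = [(a, b) for a in range(len(subs))
                 for b in range(a + 1, len(subs))]
        a, b = min(pairs, key=lambda ab: math.dist(centre(subs[ab[0]]),
                                                   centre(subs[ab[1]])))
        merged = [subs[a][0] + subs[b][0],
                  [x + y for x, y in zip(subs[a][1], subs[b][1])]]
        subs = [s for k, s in enumerate(subs) if k not in (a, b)] + [merged]
    return subs

points = [(1.0, 1.0), (1.2, 0.9), (3.0, 3.0),
          (3.1, 2.9), (9.0, 9.0), (9.2, 9.1)]
prototypes = pre_cluster(points, threshold=1.0)
final = merge_sub_clusters(prototypes, n_clusters=2)
print(len(prototypes), "prototypes ->", [centre(s) for s in final])
```

Because the second stage works on a handful of prototypes instead of every
observation, the expensive pairwise comparisons stay small even when the
original data set is large.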
Applications of cluster analysis
Research continues to advance, with more and more discoveries emerging around
the globe today. Advances in technology and the thirst for more information
through research make cluster analysis an essential tool (Romesburg, 2004). As
stated earlier, cluster analysis has many applications in practical
situations. Researchers can apply cluster analysis techniques in several ways
today.
Clustering for understanding
People around the globe apply the concepts of classification to analyze and
describe their environment and everything in it (Everitt, 2011).
Classification is an inherent human characteristic. Many fields of study apply
clustering methods in their analysis, including the life sciences, information
retrieval and the study of the physical features of the world, among others.
These fields use cluster analysis to gain more insight into the available
information and to discover potential research areas (Kaufman & Rousseeuw,
2009).
Clustering for utility
Cluster analysis serves not only as a platform for understanding but also as a
starting point for further computation on the data acquired from observations.
These computations include summarization, compression and evaluation, among
others (Kaufman & Rousseeuw, 2009). Much raw data arrives in complex and bulky
forms, so clustering the information makes other mathematical operations
easier to carry out. Rather than being applied to the entire set of
information, algorithms are applied to the cluster prototypes without
significantly affecting the accuracy of the information (Kaufman & Rousseeuw,
2009). The approach also compresses the data by applying vector quantization
(Romesburg, 2004). It places the information about the data into tables of
objects and attributes, which allows the information to fit in clusters. It is
inevitable for some of the data to get lost in the process; however, the
resulting error is not significant, so the outcome of the analysis stays
within the margin of error. Cluster analysis also makes finding neighboring
data points more efficient (Romesburg, 2004).
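Compression by vector quantization can be illustrated with a short Python
sketch. It assumes that cluster prototypes are already available (for example,
cluster centroids) and that Euclidean distance is used; the function names and
the sample values are hypothetical.

```python
import math

def quantize(points, prototypes):
    """Replace every point by the index of its nearest cluster prototype."""
    return [min(range(len(prototypes)),
                key=lambda k: math.dist(p, prototypes[k]))
            for p in points]

def mean_squared_error(points, prototypes, codes):
    """Average squared distance between each point and its prototype."""
    total = sum(math.dist(p, prototypes[c]) ** 2
                for p, c in zip(points, codes))
    return total / len(points)

points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (9.0, 10.0)]
prototypes = [(1.0, 1.5), (9.0, 9.5)]  # assumed to come from a prior clustering
codes = quantize(points, prototypes)
error = mean_squared_error(points, prototypes, codes)
print(codes)   # one small integer per point instead of a full vector
print(error)   # the information lost by the compression
```

Storing only the prototype table and one code per object is what shrinks the
data, and the mean squared error quantifies how much detail the compression
discards.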
Limitations of cluster analysis
Several limitations deter the application of this process. Cluster analysis
imposes a hierarchical structure on data regardless of whether such a
structure is real. This makes the results difficult to judge and also limits
correction of the data. Cluster analysis does not properly account for
independent underlying factors because its methods rest on algorithms rather
than formal mathematical reasoning (Everitt, 2011). The solutions are at times
not conclusive because they are not unique: the outcome of the process depends
on the arrangement of the variables. Different methods of classification give
varying results because of the different criteria applied when forming the
clusters. This makes the entire process hard to validate (Everitt, 2011).
References
Aldenderfer, S., & Blashfield, K. (1996). Cluster analysis. Newbury Park,
California: Sage Publishers.
Everitt, B. (2011). Cluster analysis. Chichester, West Sussex, United Kingdom:
Wiley.
Kaufman, L., & Rousseeuw, P. J. (2009). Finding groups in data: An
introduction to cluster analysis (Vol. 344). John Wiley & Sons.
Romesburg, C. (2004). Cluster analysis for researchers. Lulu Press.
Sherry Roberts is the author of this paper and a senior editor at MeldaResearch.Com.