In data analysis, when data need to be divided into groups, we can use SPSS's K-means cluster analysis. K-means is a centroid-based clustering algorithm, and it helps organize a data set into clearly defined groups.

K-means cluster analysis finds k cluster centers and assigns each case to the nearest center. Each cluster is represented by its center vector, which is the mean of the cluster's members. The method works on continuous variables. It alternates between assigning cases to the nearest center and recomputing each center. In doing so, it seeks to minimize the sum of squared distances between each case and the center of its cluster.
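Written out, the quantity K-means reduces is the within-cluster sum of squares J below. This is the standard textbook formulation, not taken from SPSS documentation. Here C_j is the set of cases in cluster j and μ_j is its center. Neither the assignment step nor the center update can increase J, so the procedure converges. It reaches only a local minimum, though, and that minimum depends on the initial centers.

```latex
\[
J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
\]
```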

1. Figure 1 shows market data for a popular earring brand in the south. It lists the earring product types together with each product's length, unit profit, and sales volume. The brand wants to divide the earrings into three categories based on these attributes and then formulate marketing and promotion strategies for each category.


 

Figure 1: Earring product data


2. To classify the earring products, open the K-Means Cluster Analysis dialog (Analyze > Classify > K-Means Cluster). Set the number of clusters to 3 and choose the "Iterate and classify" method. Put the product type in the "Label Cases by" box. Then put the length, sales volume, and profit of each earring product into the Variables box.


Figure 2: Number of clusters and classification
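The two SPSS methods can be sketched in plain Python as follows. With "Classify only", cases are assigned to the initial centers, which never move. With "Iterate and classify", the centers are also updated to the mean of their members until they stop changing. This is an illustration under assumptions. The caller supplies the initial centers, whereas SPSS selects its own from the data. The function name `cluster_cases` and the toy values are hypothetical, and distances are Euclidean.

```python
from typing import Dict, List, Sequence, Tuple


def nearest(point: Sequence[float], centers: List[Tuple[float, ...]]) -> int:
    """Index of the center closest to point (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda j: sum((x - c) ** 2 for x, c in zip(point, centers[j])))


def cluster_cases(cases: Dict[str, Tuple[float, ...]],
                  initial_centers: List[Tuple[float, ...]],
                  iterate: bool = True,
                  max_iter: int = 20) -> Dict[str, int]:
    """Hypothetical helper returning {case label: cluster number (1-based)}.

    iterate=False mimics "Classify only": cases go to the fixed initial centers.
    iterate=True mimics "Iterate and classify": centers are recomputed as the
    mean of their members until they stop changing or max_iter is reached.
    """
    labels = list(cases)
    points = [cases[name] for name in labels]
    centers = list(initial_centers)
    rounds = max_iter if iterate else 0
    for _ in range(rounds):
        assignment = [nearest(p, centers) for p in points]
        new_centers = []
        for j, c in enumerate(centers):
            members = [p for p, a in zip(points, assignment) if a == j]
            # An empty cluster keeps its previous center.
            new_centers.append(tuple(sum(v) / len(members) for v in zip(*members))
                               if members else c)
        if new_centers == centers:
            break
        centers = new_centers
    # Final classification of every labeled case against the current centers.
    return {name: nearest(p, centers) + 1 for name, p in zip(labels, points)}
```

With the toy cases `{"A": (0.0,), "B": (3.0,), "C": (10.0,), "D": (11.0,)}` and initial centers `[(0.0,), (1.0,)]`, classifying only puts B with C and D. Iterating moves the centers to 1.5 and 10.5, which puts B with A instead.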

3. Next, open the Save dialog of the cluster analysis and select both "Cluster membership" and "Distance from cluster center". Each case's assigned cluster and its distance from that cluster's center are then saved as new variables.



Figure 3: Saving cluster membership
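The two saved variables can be sketched as follows: each case's cluster number and its Euclidean distance to that cluster's final center. This is a minimal sketch under assumptions. The final centers are taken as given, and the function name `membership_and_distance` and the sample values are hypothetical. SPSS computes both variables itself when the Save options are checked.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, ...]


def membership_and_distance(cases: Dict[str, Point],
                            centers: List[Point]) -> Dict[str, Tuple[int, float]]:
    """For each labeled case, return (cluster number, distance to that center).

    Cluster numbers are 1-based; distances are Euclidean (math.dist, Python 3.8+).
    """
    result = {}
    for name, p in cases.items():
        distances = [math.dist(p, c) for c in centers]
        best = min(range(len(centers)), key=distances.__getitem__)
        result[name] = (best + 1, distances[best])
    return result
```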


4. In the Options dialog, we keep the default missing-value setting, "Exclude cases listwise". Under this setting, any case with a missing value on a clustering variable is dropped from the analysis. Under Statistics, we select "Initial cluster centers", "ANOVA table", and "Cluster information for each case".


Figure 4: Statistical options for clustering information
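Listwise exclusion simply drops every case that lacks a value for any clustering variable before clustering starts. Below is a minimal sketch of that rule. Treating `None` or NaN as "missing" is an assumption, and the function and field names (`exclude_listwise`, `length`, `sales`, `profit`) are hypothetical.

```python
import math
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]


def exclude_listwise(records: List[Record], variables: List[str]) -> List[Record]:
    """Keep only cases with a non-missing value for every clustering variable.

    A value counts as missing if it is None, NaN, or the key is absent.
    """
    def is_missing(v: Optional[float]) -> bool:
        return v is None or (isinstance(v, float) and math.isnan(v))

    return [r for r in records
            if all(not is_missing(r.get(var)) for var in variables)]
```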

 

We open the K-means cluster dialog, choose the "Iterate and classify" method, set the case labels, and save cluster membership. SPSS then produces output tables that include the cluster centers, the cluster membership, and an ANOVA table, which we use to interpret the results. The Cluster Membership table shows the cluster each case belongs to and its distance from the cluster center. The ANOVA table shows which variables differ significantly between the clusters.

1. In the Iterate dialog, we set the maximum number of iterations to 20 and keep the default convergence criterion of 0. The Iteration History table shows how far each cluster center moves in each iteration. If these changes drop to zero after a few iterations, the centers have stabilized and the model has reached a stable solution.



Figure 5: Convergence criterion is 0
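The sketch below records, for every iteration, how far each center moved, which is roughly what an iteration history reports. It stops using an approximation of the rule SPSS describes for its convergence criterion. Iteration ends when no center moves more than `criterion` times the smallest distance between the initial centers, or when `max_iter` is reached. With a criterion of 0, iteration continues until the centers stop moving. The function name `iterate_with_history` is hypothetical, and at least two initial centers are assumed.

```python
import math
from itertools import combinations
from typing import List, Tuple

Point = Tuple[float, ...]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def iterate_with_history(points: List[Point], centers: List[Point],
                         max_iter: int = 20, criterion: float = 0.0):
    """Run k-means updates and record each center's shift per iteration.

    Returns (final centers, history), where history holds one list of
    per-center shifts for every completed iteration.
    """
    threshold = criterion * min(dist(a, b) for a, b in combinations(centers, 2))
    history = []
    for _ in range(max_iter):
        # Assignment step: nearest center for every point.
        labels = [min(range(len(centers)), key=lambda j: dist(p, centers[j]))
                  for p in points]
        # Update step: mean of members (empty clusters keep their center).
        new_centers = []
        for j, c in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(tuple(sum(v) / len(members) for v in zip(*members))
                               if members else c)
        shifts = [dist(old, new) for old, new in zip(centers, new_centers)]
        history.append(shifts)
        centers = new_centers
        if max(shifts) <= threshold:
            break
    return centers, history
```

On the toy points `(0, 0), (0, 1), (10, 10), (10, 11)` with initial centers `(0, 0)` and `(10, 10)`, the first iteration moves both centers by 0.5 and the second moves neither, so iteration stops.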


2. In the Cluster Membership table, six earrings belong to cluster 1: the butterfly flower, crystal blue water droplet, matte round stone, plush pink pendant, plush circular ring, and plush straight pendant. The copper multi-point line pendant belongs to cluster 2. The matte red heart pendant, silver wire straight pendant, copper straight pendant, and copper circular ring belong to cluster 3.

Figure 6: Earrings and pendants of different styles
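To summarize the membership results, the case-to-cluster mapping can be inverted into cluster-to-cases groups and counted. The output is similar to the number-of-cases-per-cluster table that SPSS also reports. The memberships below are the ones listed in step 2 of this interpretation. The English product names are translations used as hypothetical case labels.

```python
from collections import defaultdict
from typing import Dict, List

# Cluster membership as listed in step 2 of the interpretation.
membership: Dict[str, int] = {
    "butterfly flower": 1,
    "crystal blue water droplet": 1,
    "matte round stone": 1,
    "plush pink pendant": 1,
    "plush circular ring": 1,
    "plush straight pendant": 1,
    "copper multi-point line pendant": 2,
    "matte red heart pendant": 3,
    "silver wire straight pendant": 3,
    "copper straight pendant": 3,
    "copper circular ring": 3,
}


def group_by_cluster(membership: Dict[str, int]) -> Dict[int, List[str]]:
    """Invert {case: cluster} into {cluster: [cases]}, keeping input order."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for case, cluster in membership.items():
        groups[cluster].append(case)
    return dict(groups)


groups = group_by_cluster(membership)
sizes = {cluster: len(cases) for cluster, cases in sorted(groups.items())}
print(sizes)  # {1: 6, 2: 1, 3: 4}
```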


3. In the ANOVA table, the cluster mean square for earring length is 99.188, with a significance value below 0.001. The cluster mean square for earring sales volume is 1240205156, also with a significance value below 0.001. Length and sales volume therefore differ significantly between the clusters and contribute to separating the earrings into groups. Unit profit, by contrast, does not differ significantly between the clusters.


Figure 7: Unit profit does not differ significantly between clusters
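The cluster mean square, error mean square, and F value for one variable come from a standard one-way ANOVA with the clusters as groups. The sketch below computes these three quantities from toy values, which are hypothetical. It omits the significance level, which would come from the F distribution with (k - 1, n - k) degrees of freedom. SPSS notes under this table that the F tests are descriptive only. The clusters were chosen to maximize differences between them, so the significance values cannot be read as a formal test that the cluster means are equal.

```python
from typing import Dict, List, Tuple


def one_way_anova(groups: Dict[int, List[float]]) -> Tuple[float, float, float]:
    """Return (cluster mean square, error mean square, F) for one variable.

    groups maps cluster number -> that variable's values for the cluster's members.
    """
    values = [v for vs in groups.values() for v in vs]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    # Between-cluster sum of squares: size-weighted squared gaps of cluster means.
    ss_between = sum(len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2
                     for vs in groups.values())
    # Within-cluster (error) sum of squares.
    ss_within = sum((v - sum(vs) / len(vs)) ** 2
                    for vs in groups.values() for v in vs)
    ms_cluster = ss_between / (k - 1)
    ms_error = ss_within / (n - k)
    return ms_cluster, ms_error, ms_cluster / ms_error
```

For example, `one_way_anova({1: [1.0, 2.0, 3.0], 2: [7.0, 8.0, 9.0]})` gives a cluster mean square of 54.0, an error mean square of 1.0, and F = 54.0.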

 

The steps above show how to perform a K-means cluster analysis in SPSS and how to interpret its results. When a data set needs to be divided into clearly defined groups, SPSS's K-means cluster analysis is a practical method to use.