Many research projects collect data through questionnaire surveys, but practical problems during fieldwork or uncooperative respondents can leave some questionnaire data missing. Because the number of questionnaires is usually large, checking each one by hand for unanswered items is impractical, so it is good practice to check for missing values whenever you process data in SPSS. So how do you identify missing cases in SPSS, and do missing cases affect the results?

1、 How to identify missing cases in SPSS

Missing case data in SPSS means that some cases have blank values for certain variables. In a small data set, the blanks are easy to spot; in a large one, they are hard to find by eye, and we need SPSS analysis functions to check for them. Below, we introduce two methods: descriptive analysis and missing value analysis.

Method 1: Descriptive analysis. This is a basic data analysis function of SPSS that gives a quick overview of the data.



1. As shown in Figure 1, open the Analyze menu in SPSS and choose Descriptives under Descriptive Statistics.


Figure 1: Descriptive analysis


2. Taking a data set containing customer flow, sales revenue, and unit price as an example, add these variables to the analysis variable list so that each one is checked for missing values. The descriptive analysis setup is very simple: if you have no special requirements, click OK to produce the output.


Figure 2: Descriptive analysis settings


3. In the descriptive analysis results, check the "N" value, which is the number of valid (non-missing) cases for each variable. The total sample size in this example is 198, while the number of cases for sales and for unit price is 197, which indicates that both variables have missing cases.



Figure 3: Descriptive statistics results
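The logic behind reading the "N" column can be sketched outside SPSS. The following is a minimal Python illustration, not SPSS syntax: the four-row data set, the variable names, and the `valid_n` helper are all hypothetical, and `None` stands in for a blank cell.

```python
# Hypothetical mini data set; None marks a blank cell (a missing value).
rows = [
    {"customer_flow": 120, "sales": 3600.0, "unit_price": 30.0},
    {"customer_flow": 95, "sales": None, "unit_price": 28.5},
    {"customer_flow": 110, "sales": 3300.0, "unit_price": None},
    {"customer_flow": 130, "sales": 4030.0, "unit_price": 31.0},
]


def valid_n(rows):
    """Return {variable: number of non-missing cases}, like the N column."""
    counts = {}
    for row in rows:
        for name, value in row.items():
            # A bool adds as 0 or 1, so only non-missing values are counted.
            counts[name] = counts.get(name, 0) + (value is not None)
    return counts


total = len(rows)
n_by_variable = valid_n(rows)
for name, n in n_by_variable.items():
    # A variable has missing cases whenever its N is below the sample size.
    print(f"{name}: N = {n}, missing = {total - n}")
```

Any variable whose N is smaller than the total sample size has at least one missing case, which is the same comparison made against the sample size in step 3.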


Method 2: The SPSS missing value analysis function. Compared with descriptive analysis, this function is more practical because it can both check for missing values and estimate replacements for them.

1. Open the Analyze menu in SPSS and choose Missing Value Analysis at the bottom of the menu.



Figure 4: Missing Value Analysis


2. As with descriptive analysis, add all the variables to be checked to the "Quantitative Variables" box. The categorical variables and case labels boxes can be left empty.


Figure 5: Missing Value Analysis Settings


3. Then, as shown in Figure 6, check the "EM" and "Regression" options in the Estimation section. These options help us assess whether the values are missing at random or whether the missingness follows a systematic pattern.

This matters because non-random missingness usually has a human cause, such as low-income respondents deliberately skipping income questions, and that can distort the analysis results.



Figure 6: Estimation Settings


4. The missing value output is shown in Figure 7. The "Missing" column shows that sales has 3 missing values and unit price has 1.


Figure 7: Univariate statistics


5. So, are these missing values random? Look at the EM test results shown in Figure 8 (the Little's MCAR test that SPSS prints with the EM estimates). The significance (P) value is 0.177 > 0.05, so we fail to reject the null hypothesis that the values are missing completely at random. In other words, the test gives no evidence of systematic missingness, and it is reasonable to treat the missing data as random.


Figure 8: EM test results
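The decision rule for this test is easy to get backwards, because here the null hypothesis is the favourable outcome (the values are missing completely at random). Here is a minimal Python sketch of the rule, assuming the conventional 0.05 significance level; the function name `interpret_mcar_test` is hypothetical and is not part of SPSS.

```python
def interpret_mcar_test(p_value, alpha=0.05):
    """Interpret a test whose null hypothesis (H0) is
    'the values are missing completely at random'."""
    if p_value < alpha:
        # A small p-value is evidence against H0.
        return "reject H0: missingness appears systematic"
    # A large p-value does not prove randomness; it only fails to contradict it.
    return "fail to reject H0: no evidence against random missingness"


result = interpret_mcar_test(0.177)
print(result)
```

With the P-value of 0.177 from the example, the function returns the "fail to reject" branch, so the missing values can be treated as random.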


2、 Do missing cases in SPSS affect the results

Whether missing cases affect the results depends on how many values are missing and whether they are missing at random. If only a few values are missing and they are missing completely at random, the impact on the results is small. If many values are missing, or if the missingness is systematic rather than random, the results may be biased. For example, if some low-income respondents deliberately skip income questions, the average income calculated from the remaining answers will be too high.

If the values are missing at random, the Replace Missing Values function in SPSS can be used to fill them in. If the missingness is systematic, the missing data should be collected again wherever possible.
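The income example can be made concrete with a small calculation. The sketch below is plain Python with invented numbers: the eight incomes and the 3000 cut-off below which respondents are assumed to skip the question are purely hypothetical.

```python
# Hypothetical true incomes of eight respondents.
incomes = [1800, 2200, 2500, 4000, 5200, 6100, 7500, 9000]
# Assumption for illustration: anyone earning less than this skips the question.
threshold = 3000


def mean(values):
    return sum(values) / len(values)


# Only the answers that were actually given are available for analysis.
observed = [x for x in incomes if x >= threshold]

true_mean = mean(incomes)
observed_mean = mean(observed)
print(f"true mean = {true_mean}, mean of observed answers = {observed_mean}")
```

Because the missing answers all come from the low end, the mean of the observed answers is higher than the true mean. No amount of extra data from the same respondents fixes this, which is why systematic missingness is the more serious case.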



Figure 9: Replace Missing Values


In the Replace Missing Values function of SPSS, as shown in Figure 10, we can specify how the replacement values are calculated. The available methods include series mean, linear interpolation, and linear trend at point. If the data are fairly concentrated and have few extreme values, the series mean is sufficient.


Figure 10: Replace Missing Values settings
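To show what two of these methods compute, here is a minimal Python sketch of series mean and linear interpolation. It is an illustration under stated assumptions, not SPSS's implementation: the function names and the sample series are hypothetical, `None` marks a missing value, the series is assumed to contain at least one valid value, and gaps at the very start or end of the series are left unfilled by the interpolation.

```python
def fill_series_mean(series):
    """Replace each None with the mean of all non-missing values."""
    valid = [x for x in series if x is not None]
    mean = sum(valid) / len(valid)
    return [mean if x is None else x for x in series]


def fill_linear_interpolation(series):
    """Replace each None by interpolating between the nearest valid neighbours.

    Leading or trailing gaps have a neighbour on one side only,
    so they are left as None.
    """
    filled = list(series)
    valid_idx = [i for i, x in enumerate(series) if x is not None]
    # Walk over each pair of consecutive valid positions and fill the gap between them.
    for left, right in zip(valid_idx, valid_idx[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = series[left] + step * (i - left)
    return filled


unit_price = [30.0, None, 34.0, 35.0, None, None, 41.0]  # hypothetical series
by_mean = fill_series_mean(unit_price)
by_interpolation = fill_linear_interpolation(unit_price)
print(by_mean)           # [30.0, 35.0, 34.0, 35.0, 35.0, 35.0, 41.0]
print(by_interpolation)  # [30.0, 32.0, 34.0, 35.0, 37.0, 39.0, 41.0]
```

The series mean gives every gap the same value, which is reasonable when the data are concentrated. Linear interpolation follows the local trend instead, which suits ordered data such as a time series.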


After the Replace Missing Values function is applied, SPSS adds a new variable as the rightmost column of the data. The red circle in Figure 11 marks the position where the original variable had a missing value, which now holds the replacement value.


Figure 11: Missing values replaced