Putting DLI Data to Use: A Computing Exercise
The following exercise provides a hands-on computing experience that introduces some basic approaches to quantitative analysis. It was part of a workshop first presented at the 1996 Learned Societies Congress and sponsored by the Humanities and Social Sciences Federation of Canada. The computing tasks below use a data file consisting of a subset of variables and cases drawn from the national Survey of Literacy Skills Used in Daily Activities, 1989. Initially, this customized data file and the instructions accompanying the exercise were prepared for use with the NSDStat statistical system. Subsequently, these products were modified for use with the SPSS statistical package.
Becoming Familiar with the Data
Begin this exercise by familiarizing yourself with some of the variables in this customized data file. Using the accompanying data documentation, answer the following questions.
| Description | SPSS Variable Name |
|---|---|
| Which variable identifies province of residence? | . |
| Which variable captures the age of the respondents? | . |
| Which variable captures how often the respondents went to a public library? | . |
| Which variable contains the IRT Reading Ability score? | . |
| Which variable contains the four categorized reading levels? | . |
| Variable Name | Measurement Level: Categorical | Measurement Level: Analytic |
|---|---|---|
| PROV | . | . |
| AGECLPSD | . | . |
| Q11C | . | . |
| Q22A | . | . |
| SEX | . | . |
| Q41 | . | . |
| IRT Reading Ability | . | . |
| Reading Level | . | . |
The customized file for this exercise has been saved as both an SPSS portable file and an SPSS system file. The system file, which was produced by processing the raw data in a previous SPSS/Windows session, contains just the data for the subset of cases and variables described in the accompanying data documentation. The system file is not human-readable, but it includes the data and all of the information declared for each variable, such as labels and missing values. The SPSS portable file is a text version of this system file whose contents have been encoded to preserve the data and variable information in a format that is not machine dependent.
You must begin by retrieving a copy of either the portable or system file. Both are available at: ftp://datalib.library.ualberta.ca/pub. The SPSS system file is named dlilit89.sav while the portable version is named dlilit89.por. If you are using SPSS/Windows, the system file is the appropriate choice although the portable file also works. If you are using SPSS on any system other than Windows, retrieve only the portable file. Below is an example of retrieving a copy of the system file using ftp:
ftp datalib.library.ualberta.ca
anonymous
e-mail address
cd pub
binary
hash
get dlilit89.sav
quit
To load either the portable or system file in SPSS for Windows, begin the SPSS program and select the File option from the menu at the top of the SPSS window.
Next, select the Open option and specify the file type and location of the file on your machine.
In the example shown, the file dlilit89.por in the directory si has been identified and declared as an SPSS portable file. Clicking on OPEN will load the data from this file into the current Data Editor (as shown in the following figure.)
You are now ready to complete the data analysis described below.
Descriptive Statistics: Working with Categorical Variables
Analysis Objective. When pursuing background about a policy issue, one question often asked is, "How big will the impact be?" Similarly, when investigating a social problem, the question typically becomes, "How many people face this problem?" One approach to answering either of these questions is to compute population estimates of the focal group. An estimate will provide some sense of the scale of the social problem or the impact of a particular social policy.
Analysis Issue. One concern in the late 1980s was the estimated number of functional illiterates in Canada. Southam Press had conducted research in 1987 that suggested one in four adults in Canada was functionally illiterate. Statistics Canada conducted its own survey in 1989 to address this issue.
Using the summary variable for reading ability level (RDLEVELA) in the file loaded above, obtain population estimates for this variable. Initially, the frequencies for variables in this file are based on the sample size, that is, the number of respondents in this survey. Statistics Canada, however, has provided a variable that weights cases (1) to adjust for the sampling methodology employed in gathering the data and (2) to provide population estimates. Because not every case in the study had the same probability of being selected for this survey, a weight variable was added to correct for unequal probabilities. In addition, these corrections also rescale the frequencies to an estimate of the population from which the sample was drawn.
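The effect of weighting on a frequency table can be sketched in a few lines of Python. This is only an illustration of the idea, not the SPSS procedure itself: the records below are invented toy values standing in for RDLEVELA and WGHT10, and the function names are hypothetical.

```python
from collections import defaultdict

# HYPOTHETICAL toy records: (reading level, case weight). Real values would
# come from RDLEVELA and WGHT10 in the exercise file; these are invented.
cases = [
    (1, 1200.0),
    (2, 800.0),
    (3, 1500.0),
    (3, 500.0),
    (4, 1000.0),
]


def weighted_frequencies(records):
    """Sum the weights within each category instead of counting cases."""
    totals = defaultdict(float)
    for level, weight in records:
        totals[level] += weight
    return dict(totals)


def valid_percents(freqs):
    """Express each weighted frequency as a percentage of the weighted total."""
    total = sum(freqs.values())
    return {level: 100.0 * value / total for level, value in freqs.items()}


freqs = weighted_frequencies(cases)
percents = valid_percents(freqs)
for level in sorted(freqs):
    print(f"Level {level}: weighted N = {freqs[level]:.0f}, "
          f"valid % = {percents[level]:.1f}")
```

Each case contributes its weight rather than a count of one, so the category totals estimate population sizes instead of numbers of respondents.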
Exercise. To use the Statistics Canada weight variable in an analysis, SPSS must first be told to weight cases and which variable holds the weight values. In this instance, the name of the Statistics Canada weight variable is WGHT10. The script for making this assignment in SPSS is given below.
| Total Weighted N = |
| Number of Missing Cases = |
| RDLEVELA | Number or Frequency | Valid Percent |
|---|---|---|
| Level 1 | . | . |
| Level 2 | . | . |
| Level 3 | . | . |
| Level 4 | . | . |
| Step | Instruction | Answer |
|---|---|---|
| Step 1 | Record the weighted frequency for Level 1: | . |
| Step 2 | Round the figure in Step 1 to the nearest 1,000: | . |
| Step 3 | Using the Approximate Variance Table, look down the far left column for the number closest to the figure in Step 2 (the table lists values in '000). Follow the string of asterisks to the right until you encounter a number. Record the value from this table: | . |
| Step 4 | Using the Sampling Variability Guidelines, compare the figure recorded in Step 3 with the table in the guidelines. How should the population estimate for Level 1 be reported according to the guidelines? | . |
| Step | Instruction | Answer |
|---|---|---|
| Step 1 | Record the percentage of the population with a Level 3 reading ability level: | . |
| Step 2 | Convert the percentage into a proportion by dividing by 100, i.e., move the decimal point two places to the left and record the value: | . |
| Step 3 | Enter the weighted frequency for Level 3: | . |
| Step 4 | Round the figure in Step 3 to the nearest 1,000: | . |
| Step 5 | Using the Approximate Variance Table, look down the far left column for the number closest to the figure in Step 4 (remember the table lists values in '000). Follow the string of asterisks to the right until you encounter a number. Record the value from this table: | . |
| Step 6 | Convert the percentage in Step 5 to a proportion by dividing by 100, i.e., move the decimal point two places to the left and record the value: | . |
| Step 7 | A 95% confidence interval (CI95) brackets a parameter estimate and thus consists of two values: an interval minimum and an interval maximum. To calculate these values, complete the following equations: CI95 minimum = .222 - (2 * .222 * .033) and CI95 maximum = .222 + (2 * .222 * .033), where .222 comes from Step 2, the multiplier 2 is the approximate critical value for the 95% level (more precisely 1.96), and .033 is the coefficient of variation from Step 6. | . |
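The Step 7 arithmetic can be checked with a short Python sketch. The values .222 and .033 are the illustrative figures quoted in Step 7; the function name is hypothetical, and the multiplier 2 follows the approximation used in the exercise.

```python
def ci95_from_cv(proportion, cv):
    """Approximate 95% interval: estimate +/- 2 * estimate * CV.

    Both arguments are proportions (e.g. 0.222, not 22.2).
    """
    margin = 2 * proportion * cv
    return proportion - margin, proportion + margin


# Illustrative values taken from Step 7 of the exercise.
low, high = ci95_from_cv(0.222, 0.033)
print(f"CI95 minimum = {low:.3f}")  # 0.207
print(f"CI95 maximum = {high:.3f}")  # 0.237
```

The product of the estimate and its coefficient of variation is the approximate standard error, so the interval is the estimate plus or minus about two standard errors.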
The next exercise entails examining the distribution of the reading ability levels of just those respondents who have completed some secondary education. To approach an analysis in this way, one is looking at a special subpopulation and examining key dependent variables of this group. For example, what is the reading ability level of those who have some secondary education but who did not receive a secondary degree? This is what we will explore.
Turn to the data documentation and find the variable containing the respondent's highest level of education. Next, identify the code used to classify those who "completed some secondary education." Record your answers here.
| Variable Name of Respondent's Highest Level of Education | Code for the Category: Completed Some Secondary Education: |
|---|---|
| . | . |
Select Q22A and complete the equation as follows: Q22A = 3
Click Continue and then OK.
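The logic of this case selection can be sketched in Python. The records are invented toy values and the helper name is hypothetical; only the variable names and the condition Q22A = 3 come from the exercise.

```python
# HYPOTHETICAL toy records; real values come from the exercise data file.
cases = [
    {"Q22A": 3, "RDLEVELA": 2, "WGHT10": 900.0},
    {"Q22A": 3, "RDLEVELA": 3, "WGHT10": 1100.0},
    {"Q22A": 5, "RDLEVELA": 4, "WGHT10": 1300.0},
    {"Q22A": 3, "RDLEVELA": 2, "WGHT10": 600.0},
    {"Q22A": 1, "RDLEVELA": 1, "WGHT10": 700.0},
]


def select_cases(records, variable, code):
    """Keep only the records whose variable equals the given code."""
    return [record for record in records if record[variable] == code]


# Equivalent in spirit to the condition Q22A = 3.
subpopulation = select_cases(cases, "Q22A", 3)

# Weighted frequencies of reading level within the subpopulation only.
weighted = {}
for record in subpopulation:
    level = record["RDLEVELA"]
    weighted[level] = weighted.get(level, 0.0) + record["WGHT10"]

print(len(subpopulation), "cases selected")
print(weighted)
```

Percentages computed after the filter are percentages of the subpopulation, not of all respondents, which is why the tables that follow are recorded separately.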
| Total Weighted N = |
| Number of Missing Cases = |
| RDLEVELA | Number or Frequency | Valid Percent |
|---|---|---|
| Level 1 | . | . |
| Level 2 | . | . |
| Level 3 | . | . |
| Level 4 | . | . |
| Step | Instruction | Answer |
|---|---|---|
| Step 1 | Record the percentage of those with a Level 2 reading ability level: | . |
| Step 2 | Convert the percentage into a proportion by dividing by 100, i.e., move the decimal point two places to the left and record the value: | . |
| Step 3 | Enter the weighted frequency for Level 2: | . |
| Step 4 | Round the figure in Step 3 to the nearest 1,000: | . |
| Step 5 | Using the Approximate Variance Table, look down the far left column for the number closest to the figure in Step 4 (remember the table lists values in '000). Next, move across the columns at the top of the Table until you find a percentage close to the value in Step 1. The figure that intersects this row and column is the coefficient of variation to be used. Record the Table value: | . |
| Step 6 | Convert the percentage in Step 5 to a proportion by dividing by 100, i.e., move the decimal point two places to the left and record the value: | . |
| Step 7 | Calculate upper and lower 95% CI: CI95 min = a - ( 2 * a * b), where a = the proportion from Step 2 and b is the coefficient of variation from Step 6. CI95 max = a + ( 2 * a * b), where a = the proportion from Step 2 and b is the coefficient of variation from Step 6. | . |
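The table lookup in Step 5 (nearest row by weighted frequency, nearest column by percentage) can be sketched as follows. Every number in the table below is invented purely to show the lookup mechanics; it is not the Statistics Canada Approximate Variance Table, and all names are hypothetical.

```python
# HYPOTHETICAL table, invented for illustration only. Rows are estimates in
# thousands ('000), columns are percentages, cells are CVs in percent.
ROW_ESTIMATES_000 = [100, 500, 1000, 2000, 5000]
COLUMN_PERCENTS = [10.0, 20.0, 30.0, 40.0, 50.0]
CV_TABLE = [
    [9.5, 9.0, 8.4, 7.8, 7.1],
    [4.3, 4.0, 3.8, 3.5, 3.2],
    [3.0, 2.8, 2.7, 2.5, 2.2],
    [2.1, 2.0, 1.9, 1.7, 1.6],
    [1.3, 1.3, 1.2, 1.1, 1.0],
]


def nearest_index(values, target):
    """Index of the entry closest to target (first one wins on ties)."""
    return min(range(len(values)), key=lambda i: abs(values[i] - target))


def lookup_cv(weighted_n, percent):
    """Find the CV (in percent) at the nearest row and nearest column."""
    rounded_000 = round(weighted_n / 1000)  # Step 4: nearest 1,000, in '000
    row = nearest_index(ROW_ESTIMATES_000, rounded_000)
    col = nearest_index(COLUMN_PERCENTS, percent)
    return CV_TABLE[row][col]


cv_percent = lookup_cv(1_840_300, 27.4)  # invented inputs
cv_proportion = cv_percent / 100         # Step 6: percentage to proportion
print(cv_percent, cv_proportion)
```

With the real table, the same two nearest-value searches identify the cell whose value is recorded in Step 5.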
Re-select All Cases and Change Weight Variables
Next, turn off the filter that selected the subpopulation defined above so that subsequent analyses will use all of the cases. From the menu bar, select Data, Select Cases, then choose the button for All cases, and click OK.
The above weight variable not only corrected for the sampling methodology, but also produced population estimates. There are times when population estimates are not really required. Instead of wanting to know an estimate of the number of people in Canada with a certain attribute, the research focus is on the proportion or average of some property. Nevertheless, a weight variable is still required to adjust for the sampling methodology. Without this adjustment, the results cannot be generalized to the population. A second weight variable has been included with the Literacy work file that re-scales the weight variable back to the sample size, that is, to the size of the original sample rather than the estimated population size.
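One common way to build such a re-scaled weight is to multiply every population weight by the ratio of the sample size to the sum of the weights. Whether WT was derived exactly this way is an assumption; the sketch below uses invented weights and a hypothetical function name to show that proportions are unaffected by the re-scaling.

```python
def rescale_to_sample_size(weights):
    """Rescale weights so they sum to the number of cases (assumed method)."""
    n = len(weights)
    total = sum(weights)
    return [w * n / total for w in weights]


# HYPOTHETICAL population weights for five respondents.
population_weights = [1200.0, 800.0, 1500.0, 500.0, 1000.0]
sample_weights = rescale_to_sample_size(population_weights)

# The rescaled weights sum to the sample size rather than the population.
print(sum(population_weights), sum(sample_weights))

# The share carried by the first case is the same under either weight.
share_before = population_weights[0] / sum(population_weights)
share_after = sample_weights[0] / sum(sample_weights)
print(share_before, share_after)
```

Because every weight is multiplied by the same constant, percentages and means are unchanged while the weighted N returns to the number of respondents.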
Now change the weight variable to the re-scaled weight variable, which is named WT. From the menu bar, select Data, Weight Cases, replace WGHT10 with WT, and click OK.