logo

 

Part 2 Cleaning SurveyMonkey Data in Excel
(Advanced Spreadsheet Format)

For this section we will continue to use AdvancedSpreadsheet-numbers.xls. If you don't have the previous file or you started the tutorial at this point, you can download a copy. We will be making changes to this file through the lesson.

Action: Download file AdvancedSpreadsheet-numbers.xls (if necessary)

There are a number of variables that SurveyMonkey has added to our Excel file. We will remove these columns. The first column is the ResponderID. While it may not be useful in this exercise you should get in the habit of keeping user or row IDs. A discussion of IDs is beyond the scope of this tutorial, but we shall keep the ID column anyway.  We will leave column A, and delete columns B-J.

Action: Delete columns B-I

Our first task is to name our variables. As we saw in part 1, SurveyMonkey has exported the question as the variable names. This wont work because they are much too long. In this lesson we will change the long variable names to short and intutitive names.

We will open a second row in our Excel file and create the variable names in this row.

Action: Insert a new row in Excel after row 1

Now we will name our variables. We will start with column B where the variable corresponds to the question "I have a pet dinosaur or have had a pet dinosaur in the past." In the blank second row cell put the word PetDino. That will be our variable name.

A quick word about naming conventions.

I have examined data from a variety of sources and it seems that the people making these data files often forget that they may not be the only ones working with the data. There is nothing more frustrating then trying to work with a column of data with the variable name V6. This tells me nothing and when I work in SPSS's data view, I often have no idea which variables I am looking at. At the same time I have worked with files that have better variable names, yet quite often these names are impossible to decipher. Eth962 is a lousy name for a variable. It is important to choose a naming convention that allows you, and anyone else who may be using your data, to understand which variable are which when in dataview.

One option is to use capital and lowercase letters in a system that make reading the variable name easy. For example, the variable which is derived from "I have a pet dinosaur or have had a pet dinosaurs in the past." can be written as PetDino. Underscores are also helpful and allowed in SPSS (pet_dino).

Now we will go through the entire spreadsheet and name the rest of the variables. Here is the list. There are 52 variables and that is a lot of work. If you would like to cheat you can open file four. but I suggest doing a few to get the hang of it, as well as taking a look at the variable names to understand more about naming conventions.

Action: Name variables using the list below. If you would like to cheat and use a completed file, use file Advanced-Finished.xls

Column
Variable Name
Column
Variable Name
 

A

RespondentID

AA

TraitLooksOvir

 

B

PetDino

AB

TraitPerOvri

 

C

DinoPetBrach

AC

GoodPet

 

D

DinoPetOvir

AD

DinoOutlawed

 

E

DinoPetTric

AE

ImportComp

 

F

DinoPetTyr

AF

ImportLone

 

G

DinoPetVelo

AG

ImportGuard

 

H

DinoPetNone

AH

ImportSitter

 

I

ScaryBrach

AI

ImportNoisy

 

J

ScaryOvri

AJ

ImportHolid

 

K

ScaryTric

AK

PetEasy

 

L

ScaryTyran

AL

PetExp

 

M

ScaryVelo

AM

PetApart

 

N

TraitIntelBrach

AN

PetFarm

 

O

TraitLooksBrach

AO

PetYard

 

P

TraitPerBrach

AP

DinoBirth

 

Q

TraitIntelTric

AQ

DinoNum

 

R

TraitLooksTric

AR

OpenExper

 

S

TraitPerTric

AS

UserAge

 

T

TraitIntelTyran

AT

UserChildNum

 

U

TraitLooksTyran

AU

UserBoys0to7

 

V

TraitPerTyran

AV

UserBoys8-12

 

W

TraitIntelVelo

AW

UserBoys13-17

 

X

TraitLooksVelo

AX

UserGirls0to7

 

Y

TraitPerVelo

AY

UserGirls8to12

 

Z

TraitIntelOvir

AZ

UserGirls13to17

 

 

 

BA

OpenKnow

Now that we have named our variables we can delete the top row

Action: Select the top row in the Excel spreadsheet and delete it.

Now that we have cleaned up the top rows let's look at the data. A lot of this data can be left as is. But there is one type of question that has values that must be changed. These are the choose all that apply type of questions. Our survey has one of those types of questions and we discussed the output in in part one.

Now we will work with this data. As you may recall from part one, with this question the user had the option to select more then one type of dinosaur. If a dinosaur was selected, then a number was entered into the cell. If the user did not select the dinosaur then the cell remained blank. This is a problem because instead of a yes and no you have yes and missing. We will change that now.

I prefer to keep things consistent, so we will use a 1 for yes and a 2 for no throughout. This means we will need to change each number to a 1 and each empty cell to a 2. We will use the find and replace option in Excel. We will do each column one at a time. Select column C.

Use the find and replace option which is located on the Home ribbon in Excel. Leave the Find What textbox empty. In the Replace With box put a 2. Click the Find Next button and then the Replace button and the previously blank cell should have a 2. If this worked then click replace all. The entire column should have either 1 or 2 in the cells.

The next column is a bit trickier because the yes boxes have a 2. We will fix that first. Select column D then open the find and replace dialog box. In the Find What textbox put the number 2. In the Replace With box put the number one. We are switching the numbers around because we want yes to be 1.

Then choose column D again and leave the find what box blank and put a 2 in the replace with box.

Now we will make these changes to columns E through H.

  • E - change 3 to 1
  • F - change 4 to 1
  • G - change 5 to 1
  • H - change 6 to 1

Replace all blank cells in columns E through H to 2

Action: Close all files

 

We are ready for SPSS!

 

Back to part 1 of the Advanced Spreadsheet Format Continue to SPSS

 

© 2010 Laura Kazan