Applying Coding Basics

Coding Basics, Day 2

Author

Matthew Sutcliffe, Madeline Gillman, JP Flores

Objectives of Coding Basics: Class 2

  • Be able to apply the objectives covered in Coding Basics: Class 1 to a new dataset

  • Identify and fix a bug in a code example

Your datasets

In this class we will work with the mtcars dataset. The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).

The other dataset we will be working with is the Palmer Penguins dataset. This is not a built-in dataset, so you will need to install it. You will only need to install the package once.

# This code makes sure that the correct packages are installed when the project is rendered.
# Students, don't worry too much about this code. It is here to make sure that our curriculum
# book runs correctly, but if you are curious, feel free to ask teachers for more info.
if(!require("palmerpenguins")){
  install.packages("palmerpenguins",repos = 'http://cran.us.r-project.org')
}


install.packages("palmerpenguins")

Once it is installed, you will need to load the package into your R environment. You will need to do this anytime you want to use a package.

library(palmerpenguins)

You will also need to load the penguins dataset into your R environment:

data("penguins", package = "palmerpenguins")

Today’s class

Cars dataset

  1. The dataset is stored ‘under the hood’ in an object called mtcars. View the dataset. Use head() to view the first 5, 10, and 20 rows.
  2. Assign mtcars to a new variable of your choice.
  3. What is the data type of each column in the dataset?
  4. How many rows are in the dataset? How many columns? You may need to look up how to do this! Try searching “how to get number of rows in data frame in R” in Google.
  5. Run str(mtcars). What is this output telling you? How does it compare to what you found in #3 and #4?
  6. For each column, find the mean, range, and median values. Are you able to do this for all columns? Why or why not?
  7. What value is in the 6th row and 10th column?
  8. Print every row of the 4th column.
  9. Print every column of only rows 28 to 31.
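
The kinds of steps in the list above can be sketched on a small made-up data frame before trying them on mtcars. This is only an illustration under stated assumptions: the snacks data frame and its column names are hypothetical and are not part of the class datasets.

```r
# Hypothetical toy data, invented for this sketch (not one of the class datasets)
snacks <- data.frame(
  name     = c("apple", "chips", "yogurt", "granola"),
  calories = c(95, 160, 100, 190),
  is_fruit = c(TRUE, FALSE, FALSE, FALSE)
)

head(snacks, n = 2)        # first 2 rows; n controls how many rows are shown
nrow(snacks)               # number of rows: 4
ncol(snacks)               # number of columns: 3
class(snacks$calories)     # data type of one column

snacks[3, 2]               # a single value: row 3, column 2
snacks[, 2]                # every row of column 2
snacks[2:3, ]              # every column of rows 2 to 3 only

mean(snacks$calories)      # 136.25
median(snacks$calories)    # 130
range(snacks$calories)     # smallest and largest value: 95 190
```

The same square-bracket pattern, [rows, columns], applies to any data frame; leaving one side of the comma empty means "all of them".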

Penguins dataset

  1. The dataset is stored ‘under the hood’ in an object called penguins. View the dataset. Use head() to view the first 5, 10, and 20 rows.
  2. Assign penguins to a new variable of your choice.
  3. What is the data type of each column in the dataset?
  4. How many rows and columns?
  5. For each column, if possible, find the mean, range, and median values.
  6. For columns that you cannot find the mean/range/median of, try using the table() function, e.g. table(penguins$species). What is this telling you?
  7. Currently, the bill_length_mm and bill_depth_mm columns are in millimeters. Create new columns with those values converted to centimeters. (HINT: look at what you did at the end of the “Accessing parts of a list” section in Class 1)
  8. Add two new columns of your choice to the data frame.
  9. The penguins dataset is not perfect, because it has some missing values. Check the missing values in the sex column by running these two lines: is.na(penguins$sex) and sum(is.na(penguins$sex)).
    1. What is the difference between the two outputs?
    2. Compare to the result in #6.
    3. Use the help page for the table() function and see if you can get the output to include NAs.
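
A few of the ideas in the list above (making a new column from an existing one, counting categories, and summarizing a column that contains missing values) might look like this on a small made-up data frame. The birds data frame and its column names are hypothetical, chosen only for this sketch.

```r
# Hypothetical toy data, invented for this sketch (not the penguins dataset)
birds <- data.frame(
  species      = c("robin", "wren", "robin", "finch"),
  wing_span_mm = c(310, 150, NA, 260)
)

# Create a new column from an existing one: millimeters to centimeters
birds$wing_span_cm <- birds$wing_span_mm / 10

# Count how many times each value appears in a text column
table(birds$species)

# A single NA makes mean() return NA ...
mean(birds$wing_span_mm)

# ... unless missing values are dropped first with na.rm = TRUE
mean(birds$wing_span_mm, na.rm = TRUE)   # 240
```

If a summary comes back as NA, that is often a sign that the column contains missing values rather than a sign that the code is wrong.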

Code debugging

Your former lab mate Weird Barbie graduated a few years ago. Before she left, she was working on some interesting analyses of the frequencies of Kens.

photo credit: Warner Bros.

Here’s the data below, which you will not (and should not) need to change:

# The data -- DO NOT EDIT 
ken_data <- data.frame(
  "ken_name" = c("Ken1", "Ken2", "Ken3", "Ken4", "Ken5", "Ken6", "Ken7", "Allan"),
  "hair_color" = c("Blonde", "Brown", "Black", "Red", "Blonde", "Brown", "Black", "Black"),
  "cowboy_hats_owned" = c(2, 0, 1, 3, 0, 1, 2, 0),
  "favorite_outfit" = c("Casual", "Formal", "Sporty", "Beachwear", "Formal", "Casual", "Sporty", "Casual"),
  "age" = c(25, 27, 26, 28, 29, 30, 26, 27),
  "height_cm" = c(180, 175, 182, 178, 180, 183, 177, 175),
  "weight_kg" = c(75, 70, 80, 77, 76, 78, 79, 70),
  "favorite_hobby" = c("Surfing", "Reading", "Soccer", "Volleyball", "Painting", "Cooking", "Dancing", "Guitar"),
  "favorite_color" = c("Blue", "Green", "Red", "Yellow", "Purple", "Orange", "Pink", "Blue"),
  "shoe_size" = c(10, 9, 11, 10, NA, 11, 10, 9),
  "best_friend" = c("Barbie", "Barbie", "Barbie", "Barbie", "Barbie", "Barbie", "Barbie", NA),
  "is_ken" = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE)
)

However, as is typical of Weird Barbie, her code is…weird. In almost all other aspects of life, that’s OK! But when it comes to you, three years later, trying to figure out what she did…not ideal. Here’s her code below. As it’s written, there are many bugs (code errors that either return an error or return an unexpected/incorrect result), the style is inconsistent, and there is no documentation. Using what you have learned so far, fix Weird Barbie’s code: find the bugs, smash the bugs (get the code to run), change to a consistent style, and add helpful comments. You may need to consult the style guide mentioned in Class 0, help pages, and Google.

str(ken_data)
haed(ken_data)

mean(ken_data$cowboy_hats_owned)
hist(ken_data$cowboy_hats_owned)
ken_data$more.than.1_cowboyHat <- ken_data$cowboy_hats_owned > 1
print(paste(sum(ken_data$more.than.1_cowboyHat), "Kens have more than 1 cowboy hat"))

range(ken$age)
range(ken_data$shoe_size)


correlation <- cor(ken_data$height_cm, ken_data$weight_kg)
print(paste("The correlation between height and weight is", correlation))
plot(ken_data$height_cm, ken_data$weight_kg)

table(ken_data$best_friend)
# looks like everyone's bff is barbie!

# outfits
table(ken_data$favorte_outfit)


# no allan
range(ken_data[1:7,5])

noAllan <- ken_data[1:7,]
also_noAllan <- noAllan <- ken_data[ken_data$is_ken == TRUE,]
range(noAllan$shoe_size)

# Are the sporty Kens taller than the other Kens?
sporty_kens <- mean( ken_data [ken_data$favorite_outfit == "Sporty", "height_cm"])
other_kens_mean <- mean(ken_data[ken_data$favorite_outfit != "Sporty", "height_cm"] )

sporty_kens > other_kens_mean
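
While hunting through the code above, it may help to keep in mind the two kinds of bugs described earlier: ones that stop with an error, and ones that run but quietly return something unexpected. The sketch below shows one of each on a made-up data frame. The pets data frame and its columns are hypothetical, and tryCatch() is used here only so that the example keeps running after the error.

```r
# Hypothetical toy data, invented for this sketch (not ken_data)
pets <- data.frame(
  pet_name = c("Rex", "Mittens", "Bubbles"),
  legs     = c(4, 4, 0)
)

# Kind 1: a loud bug. R stops and reports an error message.
loud <- tryCatch(
  sum(pets$legs) + "one",                 # a number and text cannot be added
  error = function(e) conditionMessage(e) # capture the message instead of stopping
)
print(loud)

# Kind 2: a quiet bug. No error, but the result is not what was intended.
quiet <- pets$leg_count                   # this column does not exist, so the result is NULL
print(is.null(quiet))                     # TRUE

# Checking the real column names is a quick way to catch the quiet kind
print(colnames(pets))
```

Loud bugs are usually easier to fix because R points at the problem. Quiet bugs are easier to miss, so it is worth checking that each line's output looks sensible before moving on.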