Welcome to How to Learn to Code!

Introduction

This page will walk you through setting up access to UNC’s computing cluster and introduce you a bit to R and R Studio so we can hit the ground running in the first class. To ensure you have access to the UNC cluster (and thus able to participate in class), please review this document in full at least 24 hours in advance of the first class–Research IT will need time to approve your account request.

Class 0 Objectives

  • Request a Longleaf account

  • Launch an R Studio session on OnDemand

  • Know what each of the four panels in R Studio show

R vs. R Studio

In this class, you’ll hear these two terms a lot. They sound similar, but they are actually very different! R is the programming language we will be learning in this class. R Studio is a user-friendly interface (or IDE, integrated development environment) we will be using to write scripts in R and interact with R software.

Longleaf

“Longleaf” is the name for UNC’s computing system. Researchers in all departments across UNC use it to run analyses, store data, and use programs that require GPUs. Whenever someone says they are “on Longleaf” or “running code on Longleaf” it means their personal computer is connected to the cluster and they are either actively interacting with a program running on the cluster (we will be doing this with R Studio!) or writing code that tells the cluster to perform certain tasks whenever it has the memory availability.

Before the first class, you will need to request access to Longleaf. Follow the instructions on the Research IT website. In addition to your onyen and email address, you’ll need the following information:

  • Preferred shell: bash

  • Faculty sponsor name and onyen: You can put your PI here, or if you do not have a PI, leave blank.

  • Type of subscription: Longleaf

  • Description of work you will do on the cluster: How to Learn to Code R class

It may take ~24 hours before your account is approved.

OnDemand

OnDemand is a web portal that allows you to access Longleaf. We will be using OnDemand to launch R Studio and run R code. You will need to have your Longleaf account approved before accessing OnDemand.

To launch OnDemand, navigate to this site in a browser of your choice: https://ondemand.rc.unc.edu (you may want to bookmark this site, you’ll be accessing it for each class).

Once you’ve logged in, you’ll see a page like this. Click on the RStudio Server tile.

screenshot of the OnDemand home page with a red box around the RStudio Server icon.

This will take you to a page where you can fill out some parameters for your R Studio Server session. The only one you’ll need to adjust is “Number of hours” where you should put “2”.

Screenshot of the R Studio Server request page where a user inputs the R version, number of hours, and number of CPUs requested before launching the session.

Note

You can request up to 10 hours, but it’s good practice to only request the amount of time you’ll need (Longleaf is a shared resource!). Since each class is 90 minutes, you’ll likely only need to request 2 hours for each class. Under “Additional job submission arguments” you adjust the amount of memory requested. This won’t be needed for How to Learn to Code classes, but may be needed when you are running your own analyses on large datasets in the future.

After you’ve filled out the appropriate information, click Launch. This will take you to the “My Interactive Sessions” page. Your session request may be queued for a minute while space on the cluster is being allocated for your session. Once it’s ready, click “Connect to R Studio Server”. This will launch R Studio in a new tab.

Running code

Try running the below line of code using one of the four ways described above. First, copy the below line of code and paste it into the Source pane.

print("hello world!")

Before executing the code, your Source and Console panes will look like this:

After executing the line of code, your source pane will look like this:

Congrats! You just ran your first line of code. If you want to save your script (what’s written in the source pane) go to File > Save as and save your script with a helpful name in a location that makes sense (e.g., maybe in a folder called “H2L2C_class” and name the script “hello_world.R”).

Review the rest of the information on this page before Class 1, but don’t worry if it doesn’t make sense right away. We will be going over some of it in the first class and touching on it throughout the course.

Talking like an R user

Below is some jargon that you may hear during class. Don’t worry about memorizing it all before Class 1! Just know that it’s here so if you are ever wondering what a term means you will know where to look.

Running code/run this line/execute: Telling R to perform the command given in a console. If someone says “run this line of code” that means to send it to the console (either by copying/pasting or using one of the shortcuts mentioned previously).

Data types: Data types in R include numeric, logical, and character. There are a few more, but those are the main three. We will touch on these more in Classes 1 and 2.

Vector: A series of values of any data type. A vector is created using the c() function (c for combine/concatenate).

Factors: This is the R term for categorical data. Sometimes R will automatically treat data as categorical (especially if it is a character type), but not always. You can coerce other data types (like numeric) to factors using using the factor() function and specifying the order using the levels argument. For more information on factors, see this page in the R for Data Science book.

Data frame: The best way to think of data frames is a spreadsheet. Technically, they are composed of vectors. Typically the rows in a data frame will correspond to observations and the columns will correspond to variables describing those observations. Data in a data frame can be of different types–i.e. you can have one column be character (maybe describing hair color for each observation) and another be numeric (maybe describing height for each observation).

Matrix: A matrix in R is very similar to a data frame. Unlike a data frame, all elements must be of the same data type.

Functions: A function performs a given task. This task can be very simple (add two numbers) or more complex (create a large data frame, run a linear regression, save the output to a csv file). R has many built-in functions you will use. Many packages also have functions you can use.

Packages: Packages in R are extensions of what is called “base R.” Base R refers to using R without any add-ons (i.e., no packages). Packages can have data, functions, and/or compiled code. It is the responsibility of package developers to maintain their package–which means some undergo frequent updates and some haven’t been touched in years (and thus might not work anymore for whatever reason). It also means that some packages can have bugs or might not be appropriate for your data/analysis. To use a package, you will first need to install it using the function install.package("<package_name>"). You will only need to install the package once. Each time you want to use the package, you’ll need to load it into your environment: library(<package_name'). Once it is loaded into your environment, you will be able to use any functions or data in the package.

global vs. local: Global refers to something (usually a variable) that is accessible to the entire program/code. Local refers to something (usually a variable) that is accessible only relative to something else (such as within a specific code block, like a function).

# Global variable
x <- "airplane"
# Function that defines a local variable
my_function <- function() {
    y <- "car"
}
# Accessing the local variable outside the function returns an error
y
# But the global variable is accessible
x
# Accessing the local variable 
z <- my_function()
z

directory: A directory is another term for what you may refer to as a “folder” on your computer.

paths: Paths are the directions to files and folders on your system. Understanding paths is important for reading your data into your R environment, since you will need to tell R where the file is located. You can have global and local paths. Global paths are sort of like the full set of instructions starting from your home base. Local paths are instructions given a certain starting location. Here’s an example of a global path to an example file on Longleaf: /work/users/g/h/goheels/my_project/my_data.csv. Here’s an example of a local path, given the starting spot of the goheels directory: my_project/my_data.csv.

syntax/style: The visual appearance (spaces, indentations, capitalization) of your code greatly improves readability and makes it easier for someone else to quickly understand what it’s doing (or you six months later). In this class, we will follow the Tidyverse Style Guide and encourage you to reference it during class to ensure you are consistently naming variables and using appropriate syntax. The sections most relevant for now are Files, Syntax, and the “Comments” section of the Functions page. Later classes will touch on pipes and ggplot2.

Conditionals: A conditional is a line of code that will run only if a particular condition is met. You can recognize these by the use of “if” “else” or “while”. The best way to understand these is by actually reading the code out loud. If you were to read the below example out loud, you might say “if x equals 3, print ‘condition is met’, else print ‘condition is not met’”. What do you think will happen if x == 3? What if x == 4?

if(x == 3) {
  print("condition is met")
} else {
  print("condition is not met")
}

Understanding errors and warnings

You will get lots of errors during your How to Learn to Code Journey. “Warnings” indicate your code ran, but some non-fatal issue arose. Sometimes these are OK to ignore, sometimes they indicate an issue you need to look into further. Either way, they should always be investigated! “Errors” are fatal issues and may be the result of things like syntax errors, typos, and incorrect data types. A good starting point for investigating any error or warning is Google (chances are quite high someone has run into the same issue, especially when you’re just learning how to code). You can copy and paste the entire error/warning into Google and usually return a helpful result.

Use of AI tools

Using AI tools such as ChatGPT and Microsoft Copilot can be really helpful! But before turning to these tools for assistance, try figuring out the solution yourself. Part of learning how to code is learning how to think like a coder, and that requires doing things the hard way for a bit. Remember that you are responsible for understanding what your code is doing and why, and that the output is accurate. Additionally, depending on the type of work you are doing, you many need to use additional caution when copying and pasting code/data (any questions/concerns on this should be directed to your PI/department).

I’m stuck! Additional resources

Research IT page on how to use OnDemand: https://help.rc.unc.edu/ondemand

Getting started on Longleaf: https://help.rc.unc.edu/getting-started-on-longleaf

More stats please! https://odum.unc.edu/education/short-courses/#course1

Still haven’t found what you’re looking for? Post a message in the How to Learn to Code Teams!

Bonus: Installing R Studio on your personal computer

A lot of you may want to use R on your personal computer (i.e., not on Longleaf). There may be reasons why you want to stick with Longleaf though (e.g., data should not be downloaded on personal devices, data/analysis requires a lot of memory). If you are interested in installing R and R Studio on your personal computer, you can use the below resources for help. All classes will be taught assuming you are using Longleaf though, so class time won’t be dedicated to troubleshooting R install issues on personal computers.

If you just want to click 2 buttons and figure it out: https://posit.co/download/rstudio-desktop/

If you want a more detailed install walkthrough: https://rstudio-education.github.io/hopr/starting.html