functions.Rmd
So far we’ve used a lot of functions that already exist in R. Now you will learn how to write your own functions. User-written functions take the following structure:
myfunction <- function(arg1, arg2, ...) { do some stuff return(object) }
Make a simple function that just returns the arument that it is given.
first_function <- function (x) { return(x) }
Now if you pass a value to first_function
it should return that value:
first_function(9) ## [1] 9
A function that only returns the given value is not useful, but it does illustrate how you can add functions to your environment. Now make a function that squares a value and then adds one to it.
second_function <- function(x) { ans <- x^2 + 1 return(ans) } second_function(9) ## [1] 82
An environment is a place to store variables. As we discussed in class 1, when you make assignments in R, they are generally added as entries to the global environment.
Functions are evaluated in their own environments. When a function is called a new environment is created. This new environment is called the evaluation environment. Functions also have an enclosing environment, which is the environment where the function was defined. For a functions defined in the workspace the enclosing environment is the global envrionment.
When a function is evaluated, R looks through a series of environments for variables called. It first searches the evaluation environment and then the enclosing environment. This means that a global variable can be referenced inside a function. This principle is shown below by third_function
which sums the argument x
with the variable a
from the global environment.
a <- 9 third_function <- function(x) { x + a } third_function(11) ## [1] 20
The evaluation environment is populated with local variables as the evaluation of the function procedes. Once the function completes running, the evaluation environment is destroyed. Look at second_function
for example. Within the function the variable ans
is assigned. However, when you evaluate the function the ans
variable is not available the global environment because it was assigned in the evaluation environment, and the evaluation environment was destroyed after the function completed.
second_function(9) ## [1] 82 ls() ## [1] "a" "first_function" "second_function" "third_function"
As you can see, listing objects in the global environment only returns the global variable a
assigned previously and the three functions that you have defined so far.
The don’t repeat yourself (DRY) principle states that
every piece of knowledge must have a single, unambigous, authorative representation within a system - Dave Thomas and Andy Hunt.
There is a lot to think about when using the DRY principle, like not repeating variable names throughout your code. The DRY principle is particularly useful when thinking about functional programming. If you find yourself writing the same code more than once, or copying and pasting, consider writing a function. Think back to the primer list discussed in last class, and consider the following code to generate random primers:
bases <- c('A', 'T', 'C', 'G') l1 <- sample(15:35, 25, replace = TRUE) p1 <- character(length = length(l1)) for (i in 1:length(p1)) { p1[i] <- paste(sample(bases, l1[i], replace = TRUE), collapse = "") } l2 <- sample(15:35, 25, replace = TRUE) p2 <- character(length = length(l2)) for (i in 1:length(p2)) { p2[i] <- paste(sample(bases, l2[i], replace = TRUE), collapse = "") }
Now imagine you have to create 100 sets of primers. After you fininsh you realize you need to change the list of possible primer lengths. If you wrote code like above you would have to edit each line. Notice how similar the code is to generate the two sets of primers above. You may see how the code above could be functionalized. Consider what may be variable in the code above, and create a function parameter for each variable part of the code. You may wish to change the minimum primer length, the maximum primer length, the number of primers in each list, and even the possible bases.
# Create a funciton to primer lists. genPrimer <- function(minlen, maxlen, nprimers, bases = c('A', 'T', 'C', 'G')) { l <- sample(minlen:maxlen, nprimers, replace = TRUE) p <- character(length = length(l)) for (i in 1:length(p)) { p[i] <- paste(sample(bases, l[i], replace = TRUE), collapse = "") } return(list(primers = p, lengths = l)) } primers1 <- genPrimer(minlen = 15, maxlen = 35, nprimers = 25) primers2 <- genPrimer(minlen = 20, maxlen = 40, nprimers = 25)
Notice three things: (1) now any change needed in the primer generation only requires the code to change in one place – greatly reducing the amount of work, and more importantly, the chance for error; (2) notice how ‘bases’ is defined in the function with a default value. You can probably imagine how many functions benefit from default parameter values. For the genPrimer
function above you will most often only generate sequences with nucelotides found in DNA, so it makes sense to provide a default value for the bases. Almost all of the functions you will use in R run with default values that you may or may not be aware of; (3) notice an R function can only return one object. Here, we want to return both the primers and their lenghts. We can return multiple objects by storing them all in one list.
Similarly, you can functionalize the molecular weight and Tm calculators:
primer_analysis <- function(primers) { nprimer <- length(primers$primer) # Create results list res <- list (primer = character(length = nprimer), mw = numeric(length = nprimer), Tm = numeric(length = nprimer), length = numeric(length = nprimer)) # For-loop iterating through the 1:(number of primers) for (p in 1:nprimer) { sequence <- primers$primer[p] len <- primers$length[p] # Count bases nA <- 0 # number of A's nC <- 0 # number of C's nG <- 0 # number of G's nT <- 0 # number of T's for (j in 1:len) { l <- substr(sequence, j, j) # Get jth nucleic acid of pth primer if (l == "C") { nC <- nC + 1 m <- 467.2 } else if (l == "A") { nA <- nA + 1 m <- 491.2 } else if (l == "G") { nG <- nG + 1 m <- 507.2 } else if (l == "T") { nT <- nT + 1 m <- 482.2 } } # End of count base for-loop # Calculate mass pmass <- nC*467.2 + nA*491.2 + nG*507.2 + nT*482.2 # Calculate Tm if (len <= 13) { Tm <- (nA + nT)*2 + (nG + nC)*4 } else { Tm <- 64.9 +41*(nG+nC-16.4)/(nA+nT+nG+nC) } # Assign values to the results matrix res$primer[p] <- sequence res$mw[p] <- pmass res$Tm[p] <- Tm res$length[p] <- len } # End of primer for-loop return(res) }
Now you can use the primer_analysis
function to the primers1
and primers2
datasets you generated.
results1 <- primer_analysis(primers1) results2 <- primer_analysis(primers2)
So far we have only discussed one aspect of function control statments: the return
function. The return
function can actually go anywhere in the function, and a function can have multiple return
functions. For example consider a function that returns “big number” for numbers greater than or equal to 100 and “small number” for numbers under less than 100:
xmpl_func <- function(x) { if (x >= 100) { return("big number") } return("small number") } xmpl_func(1) ## [1] "small number" xmpl_func(1e10) ## [1] "big number" xmpl_func("see what happens") ## [1] "big number"
When a function encounters the return
function the returns the value and halts any execution. The other two functions worth knowing are: stop
and warning
. The stop
and warning
functions allow you to do tests within the function and exit the function (stop
) if necessary, or issue a warning (warning
).
In the xmpl_func("see what happens")
example, the function returned TRUE
. (Recall from class 1 that both would be converted to character in the if statement, which is why the code did not error.) However, you may wish to make sure the input to xmpl_func
is numeric and has a length of 1. Modify xmpl_func
to test the input.
xmpl_func <- function(x) { # Test the length of x if (length(x) > 1) stop("'x' must be of length 1.") # Test the class of x if (!is.numeric(x)) stop("'x' must be numeric.") if (x >= 100) { return("big number") } return("small number") } xmpl_func("see what happens") ## Error in xmpl_func("see what happens"): 'x' must be numeric. xmpl_func(1:10) ## Error in xmpl_func(1:10): 'x' must be of length 1.
Finally, you may wish to issue a warning if the input can be converted to numeric instead of an error:
xmpl_func <- function(x) { # Test the length of x if (length(x) > 1) stop("'x' must be of length 1.") if (is.logical(x)) { x <- as.numeric(x) warning("'x' was converted from logical to numeric.") } if (!is.numeric(x)) stop("'x' could not be converted to numeric.") if (x >= 100) { return("big number") } return("small number") } xmpl_func("see what happens") ## Error in xmpl_func("see what happens"): 'x' could not be converted to numeric. xmpl_func(TRUE) ## Warning in xmpl_func(TRUE): 'x' was converted from logical to numeric. ## [1] "small number"
Notice the function completed when the input was logical. The warning
function does not halt execution, but does print a warning message for the user.
These exercises are to help you solidify and expand on the information given above. We intentionally added some concepts that were not covered above, and hope that you will take a few minutes to think through what is happening and how R is interpreting the code.
Write a function that solves the quadratic formula. Recall, given: \[ax^2 + bx + c = 0\] The quadratic equation is: \[x = \frac{-b\pm\sqrt{b^2-4ac}}{2a}\] Use it to solve the following equations. \[x^2+x-4=0\] \[x^2-3x-4=0\] \[6x^2+11x-35=0\]
Add the necessary checks for input length and class to the quadratic formula function from question 1.