Week 11: Debugging and defensive programming

Hi everyone!

This week we learned about debugging in R. Debugging means finding and fixing problems in your code as they come up. RStudio has a built-in debugger: the Debug menu lets you toggle breakpoints. For a more code-based approach, we can use traceback() to see the chain of calls that led to an error, debug() to step through a function line by line, or trace() to insert debugging code into a function.
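Here is a small sketch of how these tools can be used. The functions inner() and outer() are made up for illustration and are not part of the assignment; traceback() is shown in comments because it only makes sense in an interactive session right after an error.

```r
# Hypothetical functions, used only to produce an error two calls deep.
inner <- function(x) {
  if (!is.numeric(x)) stop("x must be numeric")
  sqrt(x)
}
outer <- function(x) inner(x) + 1

# In an interactive session, trigger the error and then print the call stack:
# outer("a")
# traceback()

# debug() flags a function so that the next call steps through it line by line.
debug(outer)
was_flagged <- isdebugged(outer)  # TRUE while the flag is set
undebug(outer)                    # remove the flag again

# Outside an interactive session, tryCatch() captures the error object
# so its message and the failing call can be inspected.
err <- tryCatch(outer("a"), error = function(e) e)
err_message <- conditionMessage(err)
err_call <- deparse(conditionCall(err))
print(err_message)
print(err_call)
```

The tryCatch() part is not a replacement for traceback(), just a way to look at the same information from a script.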

The code below contains a 'deliberate' bug!

tukey_multiple <- function(x) {
   outliers <- array(TRUE,dim=dim(x))
   for (j in 1:ncol(x))
    {
    outliers[,j] <- outliers[,j] && tukey.outlier(x[,j])
    }
outlier.vec <- vector(length=nrow(x))
    for (i in 1:nrow(x))
    { outlier.vec[i] <- all(outliers[i,]) } return(outlier.vec) }

Find the bug and fix it!

The first step I took to find the bug was simply to run the code in R and read the error message.

This message says there is an unexpected symbol where return follows the closing brace of the for loop.

R needs a line break (or a semicolon) between the end of the for loop and return, so the return should go on its own indented line. This was my first change to the code. After moving return(outlier.vec) to a new line, that error disappeared. I then tried tukey_multiple() with the integer 3 and received an error saying the dimensions cannot be of length 0. This makes sense because a single number has no dimensions, so dim(3) is NULL.
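Both of these errors can be reproduced in a few lines. This is only a sketch: the function f is a made-up stand-in for tukey_multiple(), and parse(text = ...) is used so the syntax error can be caught instead of stopping the script.

```r
# Hypothetical one-line function with return on the same line as the loop's closing brace.
bad_code <- "f <- function(x) { for (i in 1:2) { x <- x + i } return(x) }"
# Same function with a line break before return.
good_code <- "f <- function(x) { for (i in 1:2) { x <- x + i }\n return(x) }"

# parse() only checks the syntax; it does not run the code.
bad_result <- tryCatch(parse(text = bad_code),
                       error = function(e) conditionMessage(e))
good_result <- parse(text = good_code)
print(bad_result)

# A single number has no dim attribute, so dim() returns NULL ...
scalar_dim <- dim(3)
# ... and array() cannot build an array from a NULL dim.
array_result <- tryCatch(array(TRUE, dim = dim(3)),
                         error = function(e) conditionMessage(e))
print(array_result)
```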

My next step was to debug the tukey_multiple() function to see whether the issue lies in the input or in the function itself. I passed the data frame mtcars, saved as mycars, into tukey_multiple() and received another error message.

From this error, I can tell that a value of length 32 (one per row of mycars) was supplied where a single logical value, logical(1), was expected. The && operator only works on single values, so it is the likely source of this message. So, I made a new data frame with only the first row, called firstCar, to test in the function. This time I received an error message that the function tukey.outlier() could not be found.
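The difference between & and && is easy to see with two small made-up vectors. This is a sketch, and the exact behaviour of && on longer vectors depends on the R version (recent versions raise an error, older ones warn or quietly use only the first element), so that call is wrapped in tryCatch().

```r
# Two hypothetical logical vectors of the same length.
left <- c(TRUE, TRUE, TRUE)
right <- c(FALSE, TRUE, FALSE)

# & compares element by element and returns a vector of the same length.
elementwise <- left & right
print(elementwise)

# && is meant for single TRUE/FALSE values, for example inside if ().
# With longer inputs the outcome depends on the R version, so catch
# whatever happens rather than assume a particular result.
scalar_attempt <- tryCatch(
  left && right,
  error = function(e) conditionMessage(e),
  warning = function(w) conditionMessage(w)
)
print(scalar_attempt)
```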

This tells me that tukey.outlier() has not been defined, which is a major bug in the tukey_multiple() function. After commenting out the call to that function, tukey_multiple() runs on mycars.

Based on the TRUE or FALSE output per cell, I think tukey.outlier() is supposed to return TRUE or FALSE depending on whether each value meets Tukey's criteria for an outlier. I am not sure how to define tukey.outlier(), but defining it should fix the issue.
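One possible way to fill this gap is sketched below. The definition of tukey.outlier() is an assumption, not something given in the assignment: it flags values that fall more than 1.5 times the interquartile range below the first quartile or above the third quartile, which is the usual Tukey rule. The version of tukey_multiple() here also uses & instead of &&, since && is meant for single values, and puts return on its own line.

```r
# Assumed definition (not from the assignment): flag values outside Tukey's fences.
# k = 1.5 is the conventional multiplier for the interquartile range.
tukey.outlier <- function(x, k = 1.5) {
  q <- unname(quantile(x, probs = c(0.25, 0.75), na.rm = TRUE))
  iqr <- q[2] - q[1]
  x < (q[1] - k * iqr) | x > (q[2] + k * iqr)
}

# tukey_multiple() with the element-wise & and return on its own line.
# It returns TRUE for a row only if that row is an outlier in every column.
tukey_multiple <- function(x) {
  outliers <- array(TRUE, dim = dim(x))
  for (j in seq_len(ncol(x))) {
    outliers[, j] <- outliers[, j] & tukey.outlier(x[, j])
  }
  outlier.vec <- vector(length = nrow(x))
  for (i in seq_len(nrow(x))) {
    outlier.vec[i] <- all(outliers[i, ])
  }
  return(outlier.vec)
}

# Small made-up vector: only the last value lies outside the fences.
small_check <- tukey.outlier(c(1, 2, 3, 4, 100))
print(small_check)

# One TRUE/FALSE per row of the built-in mtcars data frame.
mycars <- mtcars
result <- tukey_multiple(mycars)
print(result)
```

If the course intended a different outlier rule, only tukey.outlier() would need to change; tukey_multiple() just combines its output across columns.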

I found debugging in this scenario especially difficult because I did not understand the steps or purpose of the code very well. If I knew what tukey.outlier() and tukey_multiple() are meant to do, I might have had more options to fix the bug more completely. Debugging is an iterative process, but it is rewarding to find the issue.

Check this out in GitHub!

-Ramya's POV
