Week 8: Inputs and Outputs

Hi everyone!

This week we learned about inputs and outputs, string manipulation, and the plyr package in R. Among the functions we learned, read.table() and ddply() are particularly interesting. The read.table() function reads a delimited text file into R as a data frame. The ddply() function splits a data frame by a category such as gender, applies a function to each group, and combines the results into a new data frame ordered by that category. Paired with transform, it can also calculate a new column for each group.
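Here is a minimal sketch of read.table() in action. The file contents, the column names (Name, Age, Sex, Grade), and the comma separator are all assumptions made up for illustration; they are not the actual assignment data.

```r
# Write a tiny, made-up data set to a temporary file so the example is self-contained
tmp <- tempfile(fileext = ".txt")
writeLines(c("Name,Age,Sex,Grade",
             "Ana,21,Female,90",
             "Ben,23,Male,78"), tmp)

# header = TRUE uses the first line as column names; sep = "," sets the delimiter
students <- read.table(tmp, header = TRUE, sep = ",", stringsAsFactors = FALSE)

# Inspect the column types that read.table() inferred
str(students)
```

For the real assignment file, the path and the sep argument would need to match however that file is actually laid out.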

Please follow steps 1-3
Step # 1
  Import the Assignment 6 data set into R. Then run the command "mean" using Sex as the category (use the plyr package for this operation). Last command in this step: write the resulting output to a file.



In data frame y, you can see the students are separated into females first and males second with the new grade average column added.
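A rough sketch of how Step 1 could be done with plyr is below. The students data frame, its column names, the new column name Grade.Average, and the output file name are hypothetical stand-ins, so treat this as one possible approach rather than the exact code behind the screenshot.

```r
library(plyr)

# Made-up stand-in for the assignment data (columns are assumed)
students <- data.frame(
  Name  = c("Ana", "Ben", "Cleo", "Dmitri", "Iris"),
  Age   = c(21, 23, 22, 30, 20),
  Sex   = c("Female", "Male", "Female", "Male", "Female"),
  Grade = c(90, 78, 85, 70, 95),
  stringsAsFactors = FALSE
)

# Split by Sex, add each group's mean grade as a new column, then recombine
y <- ddply(students, "Sex", transform, Grade.Average = mean(Grade))
print(y)

# Write the result to a comma-separated text file (hypothetical file name)
out_file <- file.path(tempdir(), "grade_average_by_sex.txt")
write.table(y, out_file, sep = ",", row.names = FALSE)
```

Because ddply() returns the groups in order of the grouping variable, the Female rows come out ahead of the Male rows.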








Step # 2
  Convert the data set to a data frame of the students whose names contain the letter i, then create a new data set with those names. Write those names to a file separated by commas (CSV).
Step # 3
  Write the filtered data set to a file and convert it to a CSV file.

Here the students are filtered based on their names containing i.
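One way Steps 2 and 3 might look is sketched here with grepl() and write.csv(). The data, column names, and file names are again made up, and matching both "i" and "I" through ignore.case = TRUE is my assumption about what "contains the letter i" should mean.

```r
# Made-up stand-in for the assignment data (columns are assumed)
students <- data.frame(
  Name  = c("Ana", "Ben", "Cleo", "Dmitri", "Iris"),
  Age   = c(21, 23, 22, 30, 20),
  Sex   = c("Female", "Male", "Female", "Male", "Female"),
  Grade = c(90, 78, 85, 70, 95),
  stringsAsFactors = FALSE
)

# grepl() returns TRUE for every name that contains the letter i (either case)
i_students <- subset(students, grepl("i", Name, ignore.case = TRUE))

# Step 2: write just the names as one comma-separated line
names_file <- file.path(tempdir(), "i_names.csv")
writeLines(paste(i_students$Name, collapse = ","), names_file)

# Step 3: write the whole filtered data set as a CSV file
data_file <- file.path(tempdir(), "i_students.csv")
write.csv(i_students, data_file, row.names = FALSE)
```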









From this gender separation and the grade averages, we can tell there may be a connection between gender and grades. To explore this further, I first created two sub data frames by gender. Since age had not been explored yet, I created a box plot comparing the ages of the females and males in this dataset. From this box plot, we learn that the males span a wider range of ages than the females. Interestingly, the mean is very similar in both groups.
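The split and the age box plot could be sketched roughly like this. The data and column names are invented for illustration, so the plot it draws will not match the one from the assignment.

```r
# Made-up stand-in for the assignment data (columns are assumed)
students <- data.frame(
  Name  = c("Ana", "Ben", "Cleo", "Dmitri", "Iris"),
  Age   = c(21, 23, 22, 30, 20),
  Sex   = c("Female", "Male", "Female", "Male", "Female"),
  Grade = c(90, 78, 85, 70, 95),
  stringsAsFactors = FALSE
)

# Two sub data frames, one per gender
females <- subset(students, Sex == "Female")
males   <- subset(students, Sex == "Male")

# Side-by-side box plots of age for the two groups
boxplot(females$Age, males$Age,
        names = c("Female", "Male"),
        ylab = "Age", main = "Age by Sex")

# The box plot's center line is the median, so the means are computed separately
mean(females$Age)
mean(males$Age)
```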








Next, I looked at the grade distribution for each sex using another box plot. From this we can tell the mean grade for the females is much higher than that for the males. However, to understand how many students of each gender were doing better than their own group's average, I used the count function. From this I learned that more than half of the females were above average, while only half of the males were. It is interesting to see that the narrower age range of the females may be associated with better grades, while the smaller number of males, whose ages vary widely, does not show the same above-average grades. I find the ddply() function the most useful for gleaning results that can be explored further through visualizations such as box plots.
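The above-average count can be sketched with plyr's count() as below. This is only one possible way to do it, with invented data and column names, and I am assuming "above average" means strictly greater than the group's mean.

```r
library(plyr)

# Made-up stand-in for the assignment data (columns are assumed)
students <- data.frame(
  Name  = c("Ana", "Ben", "Cleo", "Dmitri", "Iris"),
  Age   = c(21, 23, 22, 30, 20),
  Sex   = c("Female", "Male", "Female", "Male", "Female"),
  Grade = c(90, 78, 85, 70, 95),
  stringsAsFactors = FALSE
)

# Attach each group's mean grade to every row in that group
y <- ddply(students, "Sex", transform, Grade.Average = mean(Grade))

# Keep only students strictly above their own group's mean, then count per Sex
above <- count(subset(y, Grade > Grade.Average), "Sex")
print(above)

# Group sizes, for comparing the counts against the totals
totals <- count(y, "Sex")
print(totals)
```

Dividing the freq column of above by the freq column of totals gives the share of each group that sits above its own average.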

Check this out in GitHub!

-Ramya's POV

