Assignment 5: Tidyverse, ggplot2

Due by 05:00 PM on Wednesday, September 20, 2023

To do yourself

To submit on Canvas

Create an R Markdown document with headers, text, and code that answers and visualizes the questions below. Submit both the Rmd file and the knitted PDF. Pay attention to code clarity, variable names, and comments.

  • Get the names of all packages installed on your computer (see the installed.packages() function). Split the package names into individual characters and calculate the frequency of each character, treating upper and lower case as distinct. Create a horizontal barplot with letters/characters on the Y-axis and frequency on the X-axis. Sort it so that the most frequently occurring letters (longest bars) are on top. Give each bar its own color. Do not show a legend.

  • Download the CDC US Birth dataset from https://github.com/fivethirtyeight/data/tree/master/births. Answer the following questions:

    • Make barplots, ordered from most to least popular, of a) day of week, b) day of month, c) month to give birth, irrespective of year. X-axis - date, Y-axis - number of births. Color by date.
    • Visually demonstrate the total birth trend over the years. X-axis - years, Y-axis - total number of births per year.
    • For each a) day of week, b) day of month, c) month, collect the number of births, irrespective of year. Plot them as a) scatterplots, b) boxplots, with the X-axis showing the corresponding date unit (day of week, day of month, month) and the Y-axis showing the number of births. For scatterplots, fit a smoothing line with the default nonlinear fit. Color by date.
    • Bonus question: Make a time series curve similar to the second plot in the accompanying article.
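
As a starting point for the tasks above, here is a minimal sketch of two techniques they rely on: counting characters for a sorted horizontal barplot, and summing births per group before plotting. It is not a reference solution. The vector pkg_names and the table births_toy are made-up toy data, and the column names day_of_week and births are assumptions about the downloaded file, so check them against the actual CSV.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for rownames(installed.packages()); hypothetical values.
pkg_names <- c("ggplot2", "dplyr", "MASS", "Matrix")

# Split every name into single characters and count each one.
# Counting is case sensitive, so "A" and "a" are separate rows.
letter_counts <- tibble(letter = unlist(strsplit(pkg_names, split = ""))) %>%
  count(letter, name = "frequency")

# reorder() sorts the letters by frequency; after coord_flip() the
# most frequent letter ends up at the top of the plot.
letter_plot <- ggplot(
  letter_counts,
  aes(x = reorder(letter, frequency), y = frequency, fill = letter)
) +
  geom_col(show.legend = FALSE) +
  coord_flip() +
  labs(x = "Letter", y = "Frequency")
print(letter_plot)

# Toy births table; column names are assumed, values are invented.
births_toy <- tibble(
  year        = c(2000, 2000, 2001, 2001),
  day_of_week = c(1, 2, 1, 2),
  births      = c(9000, 11000, 9500, 11500)
)

# Total births per day of week, irrespective of year.
births_by_weekday <- births_toy %>%
  group_by(day_of_week) %>%
  summarise(total_births = sum(births), .groups = "drop")

# Negative total_births in reorder() orders bars from most to least popular.
weekday_plot <- ggplot(
  births_by_weekday,
  aes(x = reorder(factor(day_of_week), -total_births),
      y = total_births, fill = factor(day_of_week))
) +
  geom_col(show.legend = FALSE) +
  labs(x = "Day of week", y = "Number of births")
print(weekday_plot)
```

The same group_by() and summarise() pattern should carry over to day of month, month, and year by swapping the grouping column.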