Data wrangling
This week continues building data wrangling skills. It focuses on three common data types that require special handling (strings, factors, and dates/times) and on iteration techniques that let you apply the same operation across multiple columns, files, or datasets efficiently.
The first session introduces tools for working with three data types that often require special handling: strings, factors, and dates/times. Using the stringr package, you will learn how to manipulate text data by combining, extracting, and parsing strings into structured columns. The session then covers the forcats package for working with categorical variables, including techniques for reordering factor levels to improve visualizations and recoding categories for clearer presentation. Finally, you will learn how to use lubridate to parse dates and times, extract their components, and perform calculations with them. By the end of the session, you will be able to clean messy text fields, prepare categorical variables for analysis and plotting, and handle temporal data in principled ways.
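To give a flavor of these three packages working together, here is a minimal sketch. The `visits` data frame and all of its column names are hypothetical, invented purely for illustration, and the code assumes R 4.1 or later (for the `|>` pipe) with dplyr, stringr, forcats, and lubridate installed. It is one possible workflow, not necessarily the one used in the session.

```r
library(dplyr)
library(stringr)
library(forcats)
library(lubridate)

# Hypothetical toy data, invented for illustration only
visits <- tibble(
  patient = c("smith, anna", "lee, ben", "ortiz, carla", "lee, dana"),
  clinic  = c("north", "south", "north", "north"),
  seen_on = c("2024-01-15", "2024-02-03", "2024-02-20", "2024-03-01")
)

cleaned <- visits |>
  mutate(
    # stringr: pull the pieces out of "last, first" and fix capitalization
    last      = str_to_title(str_extract(patient, "^[^,]+")),
    first     = str_to_title(str_extract(patient, "[^ ]+$")),
    full_name = str_c(first, last, sep = " "),
    # forcats: order levels by frequency, then give them clearer labels
    clinic = fct_infreq(factor(clinic)),
    clinic = fct_recode(clinic,
                        "North Campus" = "north",
                        "South Campus" = "south"),
    # lubridate: parse text into dates, extract a component, do date arithmetic
    seen_on          = ymd(seen_on),
    visit_month      = month(seen_on, label = TRUE),
    days_since_first = as.numeric(seen_on - min(seen_on), units = "days")
  )

cleaned
```

Because `clinic` is now a factor ordered by frequency, a bar chart of visits per clinic would show the busiest clinic first without any extra sorting.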
The second session introduces functional programming tools for iteration, which let you apply the same operation across multiple columns, files, or datasets without writing repetitive code. Using the purrr package and dplyr’s across() function, you will learn how to transform multiple columns simultaneously, read and combine batches of files, and save multiple outputs efficiently. The session will also demonstrate how to combine iteration with statistical modeling using the broom package, enabling you to run and summarize many models at once. By the end of the session, you will be able to replace repetitive copy-paste workflows with concise, readable, and maintainable code.
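As a rough preview of these iteration patterns, the sketch below uses R's built-in `mtcars` data. The choice of dataset, the `mpg ~ wt` model, and the temporary file names are all assumptions made for illustration, and the code assumes R 4.1 or later with dplyr, tidyr, purrr, and broom installed.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# across(): standardize several columns with one expression
scaled <- mtcars |>
  mutate(across(c(mpg, hp, wt), \(x) (x - mean(x)) / sd(x)))

# walk2(): save one CSV per group (written to a temporary folder here)
out_dir <- tempfile("by_cyl_")
dir.create(out_dir)
by_cyl <- split(mtcars, mtcars$cyl)
paths  <- file.path(out_dir, paste0("cyl_", names(by_cyl), ".csv"))
walk2(by_cyl, paths, \(d, p) write.csv(d, p, row.names = FALSE))

# map(): read the batch of files back in and combine them into one table
combined <- map(paths, read.csv) |> bind_rows()

# map() + broom::tidy(): fit one model per group and stack the summaries
models <- mtcars |>
  nest(data = -cyl) |>
  mutate(
    fit   = map(data, \(d) lm(mpg ~ wt, data = d)),
    coefs = map(fit, tidy)
  ) |>
  select(cyl, coefs) |>
  unnest(coefs)

models
```

Each step replaces what would otherwise be several near-identical copied lines: three `mutate()` calls, three `write.csv()` calls, three `read.csv()` calls, and three separate `lm()` fits.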
Groundhog Day Data!