GLHLTH 562, Duke University, Spring 2026

Eric Green, PhD (Instructor)
Jinyou Sheng (TA)
This course introduces core concepts and tools in data science and data visualization, with a particular emphasis on applications in health and global health. Using R as our primary analytic environment, we will focus on how to move from raw data to meaningful, well-communicated insights through a combination of data wrangling, visualization, modeling, and reproducible workflows.
We will begin with storytelling and visualization, treating plots not as decorative outputs but as tools for thinking, diagnosing data issues, and communicating claims. From there, the course develops the practical data wrangling skills that underlie all data science work, including data import, transformation, tidying, joining, and working with text, dates, and iterative workflows. Throughout, we emphasize that these steps are not merely technical, but substantive choices that shape what questions can be asked and answered.
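To make the wrangling steps above concrete, here is a minimal sketch of the kind of pipeline we will build, using dplyr from the tidyverse. The dataset and variable names are invented for illustration; real assignments will use health data.

```r
# Hypothetical sketch of an import -> transform -> summarize pipeline
library(dplyr)

# toy "raw" data standing in for an imported health dataset
raw <- tibble::tibble(
  country = c("Kenya", "Kenya", "Peru", "Peru"),
  year    = c(2020, 2021, 2020, 2021),
  cases   = c(120, 95, 80, 110)
)

# transform: keep the latest year, derive a new variable, and sort
summary_tbl <- raw %>%
  filter(year == 2021) %>%
  mutate(cases_per_100 = cases / 100) %>%
  arrange(desc(cases))

summary_tbl
```

Each verb in the chain makes one substantive choice (which rows to keep, which quantities to derive, how to order results), which is why we treat wrangling as analysis rather than mere preprocessing.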
Once you have a working foundation in R, the course explicitly incorporates generative AI as part of the data science workflow. Rather than treating AI as a shortcut or replacement for understanding, we will focus on how to use AI tools responsibly and effectively—to generate code, explain unfamiliar constructs, explore alternatives, and critique results—while retaining accountability for analytic decisions and outputs. Ethical considerations around data use, bias, and responsibility are woven into this discussion.
In the second half of the course, we introduce statistical modeling using the tidymodels framework, with an emphasis on prediction-oriented thinking, model evaluation, and interpretation. The goal is not exhaustive coverage of statistical methods, but a clear understanding of what models do, what assumptions they rely on, and how to assess their usefulness in applied health contexts.
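As a preview of what this unit looks like in code, here is a minimal, hypothetical sketch of the tidymodels pattern: declare a model specification, fit it, and generate predictions. The formula and data (base R's built-in mtcars) are illustrative only, not course datasets.

```r
# Minimal tidymodels workflow: specify -> fit -> predict
library(tidymodels)

spec <- linear_reg() %>%   # declare the model family
  set_engine("lm")         # choose the fitting engine

fitted <- fit(spec, mpg ~ wt + hp, data = mtcars)

# predict() returns a tibble with a .pred column, one row per observation
preds <- predict(fitted, new_data = mtcars)
head(preds)
```

Separating the model specification from the fitting engine is the core design idea: the same workflow extends to other model types and engines with minimal changes.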
The course concludes by shifting from analysis to application. You will learn how data analyses become data products—such as reproducible reports, interactive tools, or simple services—by exploring reproducible workflows, APIs, interactive products, and basic concepts of architecture and deployment. This unit emphasizes durability, usability, and trust: how to create analytic outputs that others can run, understand, and use responsibly.
By the end of the semester, you should be able to participate meaningfully in a team-based data science project: acquiring and wrangling data, exploring and visualizing patterns, building and evaluating models, using AI tools thoughtfully, and communicating results clearly and reproducibly.
| Week | Date | Unit | Topic |
|---|---|---|---|
| 0 | Jan 8 | - | Welcome |
| 1 | Jan 13 | Storytelling with data | Foundations |
| 1 | Jan 15 | Storytelling with data | Foundations |
| 2 | Jan 20 | Data visualization | Plotting in R |
| 2 | Jan 22 | Data visualization | Plotting in R |
| 3 | Jan 27 | Data wrangling | Data import and transformation |
| 3 | Jan 29 | Data wrangling | Tidy and relational data |
| 4 | Feb 3 | Data wrangling | Strings, factors, date/time |
| 4 | Feb 5 | Data wrangling | Iteration |
| 5 | Feb 10 | Data wrangling | Putting it all together |
| 5 | Feb 12 | Data wrangling | Skills assessment |
| 6 | Feb 17 | AI | Data science ethics |
| 6 | Feb 19 | AI | AI tools and workflows |
| 7 | Feb 24 | Models | Tidymodels, part 1 |
| 7 | Feb 26 | Models | Tidymodels, part 2 |
| 8 | Mar 3 | Models | Tidymodels, part 3 |
| 8 | Mar 5 | Models | Tidymodels, part 4 |
| 9 | Mar 17 | Building data products | Reproducible workflows |
| 9 | Mar 19 | Building data products | APIs |
| 10 | Mar 24 | Building data products | Interactive products |
| 10 | Mar 26 | Building data products | Architecture and deployment |
| 11 | Mar 31 | Projects | Work session |
| 11 | Apr 2 | Projects | Work session |
| 12 | Apr 7 | Projects | Presentations Day 1 |
| 12 | Apr 9 | Projects | Presentations Day 2 |
| 13 | Apr 14 | Projects | Presentations Day 3 |
Assignments will be posted the Saturday before each week begins.
There are no prerequisites for this course aside from a good dose of curiosity and an interest in learning about R, data visualization, and data science. All you need to participate is a computer with an internet connection. Our tech stack will include:
We’ll work our way through most of R for Data Science by Hadley Wickham and Garrett Grolemund (R4DS). You can use the free online version or order a physical copy from Amazon.
We’ll also read several chapters in Professor Kieran Healy’s book Data Visualization (SOCVIZ). You can use the free online version or order a physical copy from Amazon (it’s a beautiful book that should probably be on your bookshelf at some point).
It’s important that you read the chapters AND run the code as you go. If you read without coding, you are likely to trick yourself into thinking you’ve learned more than you have. This type of material requires most of us to be active learners: Read/Type/Run > Read/Copy/Paste/Run > Read only.
You’ll prepare for each class session by reading and practicing. During class, we’ll divide our time between discussing concepts and running code. I’ll often suggest opportunities for you to continue practicing after class.
You will examine a #tidytuesday entry on Twitter, study the author’s code, and describe what you learned from it. You will also work solo or in small teams on a final project. It’s possible you could bring your own data and question, but the bar will be very high; expect to work on a pre-approved topic. Details forthcoming.
There are no exams (including no final exam).
You will be evaluated on the basis of your weekly assignments (50%) and your final project (50%). Ranges for letter grades will be set at the end of the semester. Cumulative scores of at least 90, 80, and 70 will guarantee at least an A-, B-, and C-, respectively.
If you require any accommodations or arrangements (hearing, vision, English language comprehension, extenuating personal or family circumstances, etc.), please let me know as soon as possible and contact the Student Disability Access Office (SDAO) to ensure the accommodation can be implemented in a timely fashion.