Syllabus

GLHLTH 562, Duke University, Spring 2024

This course will introduce you to data science and data visualization in R. The core content of the course focuses on data acquisition and wrangling, exploratory data analysis, data visualization, modeling, and effective communication of results. My goal is to bring you from zero to being able to work in a team on a fully reproducible data science project analyzing a dataset of your choice and answering questions you care about.

Teaching Team

Eric Green, PhD

Topics

We’ll start with the good stuff—data visualization. You’ll learn how to break down a plot into its component parts, imagine the underlying data structure, and create effective visuals.

Next, we’ll dig into the core data wrangling skills required in every data science project. You’ll learn about data transformation, import, tidying, joining, strings, factors, dates, and iteration. Oh my.

By this point in the course you will have enough of a foundation in R to learn about modeling, and I’ll introduce you to the tidymodels framework. As we talk about machine learning, we’ll also spend time thinking about data science ethics and algorithmic bias.

Following modeling, we’ll zoom out and discuss the importance of having reproducible workflows. I’ll introduce you to literate programming and the world of R Markdown. We’ll spend time learning how to create two specific outputs: dashboards and slide decks.

We’ll finish the course with some topics of my choosing. Maybe we’ll explore maps and text mining. Or maybe we’ll take a beat and revisit topics that tripped us up along the way. The final three sessions will be devoted to your course projects.

Schedule

Week  Date    Unit                        Topic
0     Jan 11  -                           Welcome
1     Jan 16  Data visualization          The grammar of graphics and ggplot2 API
1     Jan 18  Data visualization          Designing effective plots
2     Jan 23  Data visualization          Common geoms and plot styles to know
2     Jan 25  Data visualization          Small multiples and patchwork
3     Jan 30  Data wrangling              Data import
3     Feb 1   Data wrangling              Data transformation
4     Feb 6   Data wrangling              Tidy data
4     Feb 8   Data wrangling              Relational data
5     Feb 13  Data wrangling              Strings
5     Feb 15  Data wrangling              Factors
6     Feb 20  Data wrangling              Dates and times
6     Feb 22  Data wrangling              Iteration
7     Feb 27  Data wrangling              Skills assessment
7     Feb 29  Models                      Data science ethics
8     Mar 5   Models                      Building models
8     Mar 7   Models                      Tidy models introduction, part 1
9     Mar 19  Models                      Tidy models introduction, part 2
9     Mar 21  Models                      Tidy models introduction, part 3
10    Mar 26  Communicating data science  Reproducible workflows
10    Mar 28  Communicating data science  Dashboards (asynchronous class)
11    Apr 2   Communicating data science  Slide decks
11    Apr 4   Chef’s choice               Maps / Text mining?
12    Apr 9   Projects                    Work session
12    Apr 11  Projects                    Presentations Day 1
13    Apr 16  Projects                    Presentations Day 2

Assignments will be posted the Saturday before each week begins.

Requirements

There are no prerequisites to take this course aside from a good dose of curiosity and interest in learning about R, data visualization, and data science. All you need to participate is a computer with an internet connection. Our tech stack will include:

Format

You’ll prepare for each class session by reading and practicing. During class, we’ll divide our time between discussing concepts and running code. I’ll often suggest opportunities for you to continue practicing after class.

Weekly Assignments

Final Project

You will also complete a final project alone or with partners using a dataset of your choosing. Details here.

Exams

There are no exams (including no final exam).

Grading

You will be evaluated on the basis of your weekly assignments (50%) and final project (50%). Ranges for letter grades will be set at the end of the semester. Cumulative scores of at least 90, 80, and 70 are guaranteed at least an A-, B-, and C-, respectively.
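
As a rough illustration of how the two weights and the guaranteed minimums combine, here is a minimal sketch in R. The function names and the example scores are hypothetical, and the sketch assumes each component is scored on a 0-100 scale. Actual letter-grade ranges will be set at the end of the semester and may be more generous than these floors.

```r
# Hypothetical helper: combine the two components using the 50/50 weights
# stated above. Assumes both scores are on a 0-100 scale.
cumulative_score <- function(weekly, project) {
  0.5 * weekly + 0.5 * project
}

# Hypothetical helper: the minimum letter grade guaranteed by the
# thresholds stated above (90, 80, 70). Final ranges are set later.
guaranteed_minimum <- function(score) {
  if (score >= 90) {
    "A-"
  } else if (score >= 80) {
    "B-"
  } else if (score >= 70) {
    "C-"
  } else {
    "no guarantee"
  }
}

# Example with made-up scores
score <- cumulative_score(weekly = 88, project = 94)
guaranteed_minimum(score)
```

In this made-up example the cumulative score is 91, so the guaranteed minimum is an A-.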

Policies

Equity and Inclusion at DGHI

In 2020 DGHI formed the Equity Task Force to identify and address structural inequities related to global power dynamics, race, ethnicity, gender, and all marginalized identities throughout the institute. See here for more information on this work.

If you require any accommodations or arrangements (hearing, vision, English language comprehension, extenuating personal or family circumstances, etc.), please let me know ASAP and, if relevant, please contact the Student Disability Access Office (SDAO) to ensure the accommodations or arrangements can be implemented in a timely fashion.