Syllabus

GLHLTH 562, Duke University, Spring 2026

Teaching Team

Eric Green, PhD
Jinyou Sheng, TA

Course Description

This course introduces core concepts and tools in data science and data visualization, with a particular emphasis on applications in health and global health. Using R as our primary analytic environment, we will focus on how to move from raw data to meaningful, well-communicated insights through a combination of data wrangling, visualization, modeling, and reproducible workflows.

We will begin with storytelling and visualization, treating plots not as decorative outputs but as tools for thinking, diagnosing data issues, and communicating claims. From there, the course develops the practical data wrangling skills that underlie all data science work, including data import, transformation, tidying, joining, and working with text, dates, and iterative workflows. Throughout, we emphasize that these steps are not merely technical, but substantive choices that shape what questions can be asked and answered.

Once you have a working foundation in R, the course explicitly incorporates generative AI as part of the data science workflow. Rather than treating AI as a shortcut or replacement for understanding, we will focus on how to use AI tools responsibly and effectively—to generate code, explain unfamiliar constructs, explore alternatives, and critique results—while retaining accountability for analytic decisions and outputs. Ethical considerations around data use, bias, and responsibility are woven into this discussion.

In the second half of the course, we introduce statistical modeling using the tidymodels framework, with an emphasis on prediction-oriented thinking, model evaluation, and interpretation. The goal is not exhaustive coverage of statistical methods, but a clear understanding of what models do, what assumptions they rely on, and how to assess their usefulness in applied health contexts.

The course concludes by shifting from analysis to application. You will learn how data analyses become data products—such as reproducible reports, interactive tools, or simple services—by exploring reproducible workflows, APIs, interactive products, and basic concepts of architecture and deployment. This unit emphasizes durability, usability, and trust: how to create analytic outputs that others can run, understand, and use responsibly.

By the end of the semester, you should be able to participate meaningfully in a team-based data science project: acquiring and wrangling data, exploring and visualizing patterns, building and evaluating models, using AI tools thoughtfully, and communicating results clearly and reproducibly.

Schedule

Week  Date    Unit                    Topic
0     Jan 8   -                       Welcome
1     Jan 13  Storytelling with data  Foundations
1     Jan 15  Storytelling with data  Foundations
2     Jan 20  Data visualization      Plotting in R
2     Jan 22  Data visualization      Plotting in R
3     Jan 27  Data wrangling          Data import and transformation
3     Jan 29  Data wrangling          Tidy and relational data
4     Feb 3   Data wrangling          Strings, factors, date/time
4     Feb 5   Data wrangling          Iteration
5     Feb 10  Data wrangling          Putting it all together
5     Feb 12  Data wrangling          Skills assessment
6     Feb 17  AI                      Data science ethics
6     Feb 19  AI                      AI tools and workflows
7     Feb 24  Models                  Tidy models, part 1
7     Feb 26  Models                  Tidy models, part 2
8     Mar 3   Models                  Tidy models, part 3
8     Mar 5   Models                  Tidy models, part 4
9     Mar 17  Building data products  Reproducible workflows
9     Mar 19  Building data products  APIs
10    Mar 24  Building data products  Interactive products
10    Mar 26  Building data products  Architecture and deployment
11    Mar 31  Projects                Work session
11    Apr 2   Projects                Work session
12    Apr 7   Projects                Presentations Day 1
12    Apr 9   Projects                Presentations Day 2
13    Apr 14  Projects                Presentations Day 3

Assignments will be posted the Saturday before each week begins.

Requirements

There are no prerequisites to take this course aside from a good dose of curiosity and interest in learning about R, data visualization, and data science. All you need to participate is a computer with an internet connection. Our tech stack will include:

We’ll work our way through most of R for Data Science by Hadley Wickham and Garrett Grolemund (R4DS). You can use the free online version or order a physical copy from Amazon.

We’ll also read several chapters in Professor Kieran Healy’s book Data Visualization (SOCVIZ). You can use the free online version or order a physical copy from Amazon (it’s a beautiful book that should probably be on your bookshelf at some point).

It’s important that you read the chapters AND run the code as you go. If you read without coding you are likely to trick yourself into thinking you’ve learned more than you have. This type of material requires most of us to be active learners. Read/Type/Run > Read/Copy/Paste/Run > Read only.

Format

You’ll prepare for each class session by reading and practicing. During class, we’ll divide our time between discussing concepts and running code. I’ll often suggest opportunities for you to continue practicing after class.

Weekly Assignments

Final Project

You will also work solo or in small teams on a final project. It’s possible you could bring your own data and question, but the bar will be very high. Expect to work on a pre-approved topic. Details forthcoming.

Exams

There are no exams (including no final exam).

Grading

You will be evaluated on the basis of your weekly assignments (50%) and final project (50%). Ranges for letter grades will be set at the end of the semester. Cumulative scores of at least 90, 80, and 70 will be guaranteed at least an A-, B-, and C-, respectively.

Policies

Accommodations

If you require any accommodations or arrangements (hearing, vision, English language comprehension, extenuating personal or family circumstances, etc.), please let me know as soon as possible and contact the Student Disability Access Office (SDAO) to ensure the accommodation can be implemented in a timely fashion.