GLHLTH 562, Duke University, Spring 2026

Eric Green, PhD (Instructor)
Jinyou Sheng (TA)
This course introduces core concepts and tools in data science and data visualization, with a particular emphasis on applications in health and global health. Using R as our primary analytic environment, we will focus on how to move from raw data to meaningful, well-communicated insights through a combination of data wrangling, visualization, modeling, and reproducible workflows.
We will begin with storytelling and visualization, treating plots not as decorative outputs but as tools for thinking, diagnosing data issues, and communicating claims. From there, the course develops the practical data wrangling skills that underlie all data science work, including data import, transformation, tidying, joining, and working with text, dates, and iterative workflows. Throughout, we emphasize that these steps are not merely technical, but substantive choices that shape what questions can be asked and answered.
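To make the wrangling steps above concrete, here is a minimal sketch of the kind of pipeline we will build, using dplyr from the tidyverse. The dataset and variable names are invented for illustration; real assignments will use health data.

```r
# Hypothetical sketch of an import -> transform -> summarize pipeline
library(dplyr)

# toy "raw" data standing in for an imported health dataset
raw <- tibble::tibble(
  country = c("Kenya", "Kenya", "Peru", "Peru"),
  year    = c(2020, 2021, 2020, 2021),
  cases   = c(120, 95, 80, 110)
)

# transform: keep the latest year, derive a new variable, and sort
summary_tbl <- raw %>%
  filter(year == 2021) %>%
  mutate(cases_per_100 = cases / 100) %>%
  arrange(desc(cases))

summary_tbl
```

Each verb in the chain makes one substantive choice (which rows to keep, which quantities to derive, how to order results), which is why we treat wrangling as analysis rather than mere preprocessing.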
Once you have a working foundation in R, the course explicitly incorporates generative AI as part of the data science workflow. Rather than treating AI as a shortcut or replacement for understanding, we will focus on how to use AI tools responsibly and effectively—to generate code, explain unfamiliar constructs, explore alternatives, and critique results—while retaining accountability for analytic decisions and outputs. Ethical considerations around data use, bias, and responsibility are woven into this discussion.
In the second half of the course, we introduce statistical modeling using the tidymodels framework, with an emphasis on prediction-oriented thinking, model evaluation, and interpretation. The goal is not exhaustive coverage of statistical methods, but a clear understanding of what models do, what assumptions they rely on, and how to assess their usefulness in applied health contexts.
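As a preview of what this unit looks like in code, here is a minimal, hypothetical sketch of the tidymodels pattern: declare a model specification, fit it, and generate predictions. The formula and data (base R's built-in mtcars) are illustrative only, not course datasets.

```r
# Minimal tidymodels workflow: specify -> fit -> predict
library(tidymodels)

spec <- linear_reg() %>%   # declare the model family
  set_engine("lm")         # choose the fitting engine

fitted <- fit(spec, mpg ~ wt + hp, data = mtcars)

# predict() returns a tibble with a .pred column, one row per observation
preds <- predict(fitted, new_data = mtcars)
head(preds)
```

Separating the model specification from the fitting engine is the core design idea: the same workflow extends to other model types and engines with minimal changes.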
The course concludes by shifting from analysis to application. You will learn how data analyses become data products—such as reproducible reports, interactive tools, or simple services—by exploring reproducible workflows, APIs, interactive products, and basic concepts of architecture and deployment. This unit emphasizes durability, usability, and trust: how to create analytic outputs that others can run, understand, and use responsibly.
By the end of the semester, you should be able to participate meaningfully in a team-based data science project: acquiring and wrangling data, exploring and visualizing patterns, building and evaluating models, using AI tools thoughtfully, and communicating results clearly and reproducibly.
| Week | Date | Unit | Topic |
|---|---|---|---|
| 0 | Jan 8 | - | Welcome |
| 1 | Jan 13 | Storytelling with data | Foundations |
| 1 | Jan 15 | Storytelling with data | Foundations |
| 2 | Jan 20 | Data visualization | Plotting in R |
| 2 | Jan 22 | Data visualization | Plotting in R |
| 3 | Jan 27 | Data wrangling | Data import and transformation |
| 3 | Jan 29 | Data wrangling | Tidy and relational data |
| 4 | Feb 3 | Data wrangling | Strings, factors, date/time |
| 4 | Feb 5 | Data wrangling | Iteration |
| 5 | Feb 10 | Data wrangling | Putting it all together |
| 5 | Feb 12 | Data wrangling | Skills assessment |
| 6 | Feb 17 | AI | Data science ethics |
| 6 | Feb 19 | AI | AI tools and workflows |
| 7 | Feb 24 | Models | Tidymodels, part 1 |
| 7 | Feb 26 | Models | Tidymodels, part 2 |
| 8 | Mar 3 | Models | Tidymodels, part 3 |
| 8 | Mar 5 | Models | Tidymodels, part 4 |
| 9 | Mar 17 | Building data products | Reproducible workflows |
| 9 | Mar 19 | Building data products | APIs |
| 10 | Mar 24 | Building data products | Interactive products |
| 10 | Mar 26 | Building data products | Architecture and deployment |
| 11 | Mar 31 | Projects | Work session |
| 11 | Apr 2 | Projects | Work session |
| 12 | Apr 7 | Projects | Presentations Day 1 |
| 12 | Apr 9 | Projects | Presentations Day 2 |
| 13 | Apr 14 | Projects | Presentations Day 3 |
Assignments will be posted the Saturday before each week begins.
There are no prerequisites for this course aside from a good dose of curiosity and an interest in learning about R, data visualization, and data science. All you need to participate is a computer with an internet connection. Our tech stack will include:
We’ll work our way through most of R for Data Science by Hadley Wickham and Garrett Grolemund (R4DS). You can use the free online version or order a physical copy from Amazon.
We’ll also read several chapters in Professor Kieran Healy’s book Data Visualization (SOCVIZ). You can use the free online version or order a physical copy from Amazon (it’s a beautiful book that should probably be on your bookshelf at some point).
It’s important that you read the chapters AND run the code as you go. If you read without coding, you are likely to trick yourself into thinking you’ve learned more than you have. This type of material requires most of us to be active learners: Read/Type/Run > Read/Copy/Paste/Run > Read only.
You’ll prepare for each class session by reading and practicing. During class, we’ll divide our time between discussing concepts and running code. I’ll often suggest opportunities for you to continue practicing after class.
You will examine a #tidytuesday entry on Twitter, study the author’s code, and describe what you learned from it. You will also work solo or in small teams on a final project. It’s possible you could bring your own data and question, but the bar will be very high; expect to work on a pre-approved topic. Details forthcoming.
There are no exams (including no final exam).
You will be evaluated on the basis of your weekly assignments (50%) and your final project (50%). Ranges for letter grades will be set at the end of the semester. Cumulative scores of at least 90, 80, and 70 will guarantee at least an A-, B-, and C-, respectively.
If you require any accommodations or arrangements (hearing, vision, English language comprehension, extenuating personal or family circumstances, etc.), please let me know as soon as possible and contact the Student Disability Access Office (SDAO) to ensure the accommodation can be implemented in a timely fashion.