Final Project

Overview

Your final project is to build a data product – not a traditional analysis and write-up.

A data product is something that takes data as input, processes it, and delivers value to an end user. It is not a static document with tables and figures. It is a working thing – something someone could run, interact with, or query.

You may work alone or in pairs. We prefer pairs.

What Makes It a Data Product

Your product must accept user input and include at least one of the following capabilities. These are what distinguish a data product from a traditional analysis.

User Input (Required)

Every data product must accept input from a user and return something tailored to that input. The user makes a choice, and the product responds. This could be a Shiny app with dropdown menus, a parameterized report where the user specifies a region, or an API endpoint where a user submits data and gets back a result.

If a user cannot interact with it, it is a report, not a product.
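As a rough illustration of "the user makes a choice, and the product responds," here is a minimal sketch in Python. The records, the field names, and the `summarize_region` function are all hypothetical; the same pattern applies to a Shiny input or a report parameter.

```python
from statistics import mean

# Hypothetical records; in a real product these would come from your pipeline.
RECORDS = [
    {"region": "North", "visits": 120},
    {"region": "North", "visits": 80},
    {"region": "South", "visits": 200},
]


def summarize_region(records, region):
    """Return a summary tailored to the region the user selected."""
    rows = [r for r in records if r["region"] == region]
    if not rows:
        # Tell the user plainly when their choice matches nothing.
        raise ValueError(f"No data for region: {region!r}")
    return {
        "region": region,
        "n": len(rows),
        "mean_visits": mean(r["visits"] for r in rows),
    }
```

The point is that the output depends on the `region` argument, which the user supplies at runtime rather than the author fixing it in advance.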

Plus At Least One Of:

API Integration

Your pipeline programmatically consumes or serves data through an API. The product does not start from a static CSV you downloaded by hand – it fetches data at runtime.
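One way to picture "fetches data at runtime" is the sketch below, written in Python with only the standard library. The URL is a placeholder, and the query parameter names and the `build_url` and `fetch_json` helpers are assumptions for illustration, not part of any particular API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder URL: substitute the real endpoint your product uses.
BASE_URL = "https://api.example.com/v1/observations"


def build_url(base_url, params):
    """Attach query parameters (for example, the user's choices) to the endpoint."""
    return f"{base_url}?{urlencode(params)}"


def fetch_json(url, opener=urlopen, timeout=10):
    """Fetch and decode a JSON response each time the product runs."""
    with opener(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

The `opener` argument exists so the function can be exercised with a stand-in during testing; in normal use it defaults to a real HTTP request.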

Examples:

GenAI Model in the Pipeline

A generative AI model operates on your data as a functional component of the pipeline. The LLM does real work – classifying, extracting, summarizing, or generating – not just helping you write code.
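A minimal sketch of what "a functional component of the pipeline" might look like, in Python. The `llm` argument is a hypothetical stand-in: any callable that takes a prompt string and returns a string. No specific model provider or client library is assumed, and the prompt wording is only an example.

```python
def classify_records(records, llm, labels):
    """Add a model-assigned label to each record.

    `llm` is any callable that takes a prompt string and returns a string;
    in a real product it would wrap your model provider's client.
    """
    labeled = []
    for record in records:
        prompt = (
            f"Classify the text into one of {', '.join(labels)}. "
            f"Reply with the label only.\n\nText: {record['text']}"
        )
        answer = llm(prompt).strip()
        # Model output is not guaranteed to be a valid label, so validate it.
        label = answer if answer in labels else "unknown"
        labeled.append({**record, "label": label})
    return labeled
```

Here the model's output becomes a new column that later steps can filter or aggregate on, which is what separates this from using an LLM only as a coding assistant.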

Examples:

Automation

Your pipeline runs on a schedule without human intervention. The product does its job autonomously – fetching fresh data, processing it, and producing output.
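Automation usually means the pipeline has a single entry point that needs no prompts or manual steps, so a scheduler can call it. The Python sketch below assumes a hypothetical `fetch` callable and an output folder of your choosing; the scheduler itself (cron, a CI workflow, or similar) is outside the code.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def run_pipeline(fetch, output_dir, now=None):
    """Run one unattended cycle: fetch, process, and write a dated output file.

    `fetch` is a hypothetical zero-argument callable that returns a list of
    records; a scheduler would invoke this function on each run.
    """
    now = now or datetime.now(timezone.utc)
    records = fetch()
    summary = {"run_at": now.isoformat(), "n_records": len(records)}
    out_path = Path(output_dir) / f"summary_{now:%Y-%m-%d}.json"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(summary, indent=2))
    return out_path
```

Writing one dated file per run keeps a history of outputs and makes it easy to see whether a scheduled run was skipped.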

Examples:

Combining Capabilities

You are encouraged to combine capabilities. A Shiny app (user input) that pulls from an API (API integration) and uses an LLM to summarize results (GenAI) is ambitious and excellent. But user input plus one capability done well is sufficient.

What Does NOT Count

A traditional analysis-and-write-up will not meet the requirements, regardless of quality. The following do not qualify on their own:

These are analyses. The difference is what happens at runtime. If the only thing that happens is rendering a document from a static dataset, it is not a data product.

Pipeline Documentation

Every project must include clear documentation of the data pipeline in your README.md. This is not a methods section. It is a technical document that describes:

  1. Where the data comes from – API endpoint, database, user input, file
  2. How the data is ingested – packages, authentication, format
  3. How the data is processed – cleaning, joining, transforming, modeling
  4. What the output is – dashboard, app, report, predictions, API responses
  5. How someone else could run it – dependencies, environment setup, API keys, deployment steps

Think of it as the document you would hand to a colleague who needs to maintain your product after you leave.

Data

You are free to use any data source. Some starting points:

If your data includes protected health information, you must use the synthpop package to create a synthetic version.

Deliverables

1. Proposal (due March 17)

A short document (2 pages) covering:

This is a checkpoint, not a contract. Your project can evolve. But this document needs to be thorough. It’s your best chance to kick the tires and make sure you’ve got a good plan.

2. Working Product

The thing itself. This could be:

We should be able to run it or access it. If your product requires API keys or credentials, include instructions for setup.

3. Pipeline Documentation

Your README.md covering the five pipeline components described above.

4. Presentation (April 7, 9, or 14)

5 minutes maximum. Demo the product. Show us what it does, not just what you found.

Your presentation should cover:

If you work with a partner, both of you must present. Prepare slides using Quarto. The presentation order will be randomized and posted in advance.

5. Peer Evaluation

A brief evaluation rating your partner’s contribution (if working in a pair) and providing feedback on other teams’ products.

Project Structure

Submit via a GitHub repository (preferred) or a zip file:

project/
├── README.md              # Pipeline documentation
├── app.R or index.qmd     # Your product (varies by type)
├── R/                     # Supporting scripts or functions
├── data/                  # Raw or cached data (if applicable)
├── deck/                  # Presentation slides (qmd + html)
└── .gitignore             # Exclude API keys, large files, etc.

Do not commit API keys or secrets to your repository. Use environment variables or a .env file that is listed in your .gitignore.
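A minimal sketch of reading a secret from the environment instead of hard-coding it, shown in Python (in R, `Sys.getenv()` plays the same role). The variable name `MY_API_KEY` and the `get_api_key` helper are placeholders.

```python
import os


def get_api_key(name="MY_API_KEY"):
    """Read a secret from the environment and fail clearly if it is missing.

    The variable name MY_API_KEY is a placeholder.
    """
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(
            f"Environment variable {name} is not set. "
            "See the README for setup instructions."
        )
    return key
```

Failing with an explicit message helps whoever runs your product next: they learn which variable to set rather than seeing an opaque authentication error.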

Grading

Component                 Points
Proposal                  10
Product                   40
Pipeline documentation    15
Presentation              20
Peer evaluation           15
Total                     100

Product (40 pts)

Pipeline Documentation (15 pts)

Presentation (20 pts)

Peer Evaluation (15 pts)

Tips

Late Work