Overview
Every year on February 2, Groundhog Day, the famous groundhog Punxsutawney Phil—and a growing cast of creatures (including stuffed animals, sock puppets, and mascots)—emerge from their burrows to predict the weather. Punxsutawney Phil, the prognosticator of prognosticators, has been predicting the weather since 1887. If he sees his shadow, his prediction is for six more weeks of winter. If not, he predicts an early spring.
This package brings together prediction data from Countdown to Groundhog Day and weather data from Open-Meteo to help you evaluate which prognosticators you can trust.
Installation
# Install from GitHub
devtools::install_github("ericpgreen/feb2")Usage
Currently this is a function-less data package designed to facilitate your analyses of Groundhog Day prediction data.
Datasets
There are three main datasets and several supporting datasets.
erDiagram
prognosticators {
string prognosticator_slug PK
string prognosticator_name
string prognosticator_city
float prognosticator_lat
float prognosticator_long
string prognosticator_type
string prognosticator_creature
string Status
}
predictions {
string prognosticator_slug FK
int year
string prediction
int predict_early_spring
}
class_def1 {
string prognosticator_city FK
int year
string class
}
class_def1_data {
string prognosticator_city FK
int year
int month
float tmax_monthly_mean_f
float tmax_monthly_mean_f_15y
string class
}
prognosticators ||--o{ predictions : "has"
prognosticators ||--o{ class_def1 : "location"
prognosticators ||--o{ class_def1_data : "location"
class_def1_data ||--|| class_def1 : "summarizes"
prognosticators
Michael Venos, the creator of Countdown to Groundhog Day, maintains the internet’s most comprehensive database about Groundhog Day predictions. He generously agreed to allow me to incorporate his data into this package. Currently he has data on a diverse collection of over 300 prognosticators.
Each prognosticator is uniquely identified by a prognosticator_slug derived from their URL on the Countdown to Groundhog Day website. This allows for accurate linking even when multiple prognosticators share the same name (e.g., there are multiple “Woody” prognosticators in different cities).
predictions
Michael has collected over 2,400 predictions going back to Punxsutawney Phil’s first prediction in 1887. Some years the prediction is uncertain or not recorded, accounting for the NAs.
Predictions are linked to prognosticators via prognosticator_slug, ensuring accurate attribution even for prognosticators with duplicate names.
class_def1
The biggest challenge for evaluating Groundhog Day predictions is defining what we mean by “early spring”. So far in this package I follow the general approach of earlier analyses by NOAA and 538. I define early spring for a prognosticator’s location as one month (February OR March) with an average high temperature above the historical average for that month.1 Unlike the previous analyses, however, I use local data for each prognosticator. NOAA used U.S. national temperatures, and 538 looked across nine U.S. regions.2 I think it’s just silly to expect a real or stuffed groundhog to be able to predict national or regional weather based on localized sunshine. I say let’s evaluate their powers of prognostication using local data.3
Weather Data
Weather data comes from Open-Meteo’s Historical Weather API, which provides ERA5 reanalysis data back to 1940. Each prognosticator’s city is geocoded to coordinates, and daily maximum temperatures are retrieved directly for those coordinates.
For Punxsutawney Phil’s predictions from 1887-1939 (before Open-Meteo coverage), weather data comes from nearby NOAA GHCND weather stations.
Classification Steps
- Geocode each prognosticator’s city to latitude/longitude coordinates
- Query Open-Meteo Historical API for daily maximum temperatures (February and March, 1940-present)
- Calculate the mean monthly high temperature for each location
- Calculate the 15-year rolling mean high monthly temperature
- Use the
def1definition to classify each year as “early spring” or “long winter”
The class_def1 dataset contains one row per city-year with the classification. The class_def1_data dataset contains the underlying monthly temperature data and rolling averages.
Updates
I plan to update the prognosticators, predictions, and classifications data after the month of March (classification definition def1 depends on March weather data).
Data Use
Open-Meteo weather data is available under CC BY 4.0. Data on prognosticators and their predictions come with permission from Countdown to Groundhog Day. You are welcome to use the data via this package for any purpose, but please do not post the raw data on any other public sites. Instead, give credit to Michael’s tremendous effort by pointing back to Countdown to Groundhog Day.
Hex Sticker
The groundhog pixel art is a DALL-E 2 creation.
Issues
Please submit an issue if you encounter any bugs or errors. This package comes with no warranty of any kind. Don’t rely on me or these rodents to get it right. Though my family did live in Punxsutawney when I was 4, and I have been to Gobbler’s Knob.

