Overview

Every year on February 2, Groundhog Day, the famous groundhog Punxsutawney Phil and a growing cast of creatures (including stuffed animals, sock puppets, and mascots) emerge from their burrows to predict the weather. Punxsutawney Phil, the prognosticator of prognosticators, has been predicting the weather since 1887. If he sees his shadow, his prediction is for six more weeks of winter. If not, he predicts an early spring.

This package brings together prediction data from Countdown to Groundhog Day and weather data from Open-Meteo to help you evaluate which prognosticators you can trust.

Installation

# Install from GitHub
devtools::install_github("ericpgreen/feb2")

Usage

Currently this is a function-less data package designed to facilitate your analyses of Groundhog Day prediction data.

Datasets

There are three main datasets and several supporting datasets.

erDiagram
    prognosticators {
        string prognosticator_slug PK
        string prognosticator_name
        string prognosticator_city
        float prognosticator_lat
        float prognosticator_long
        string prognosticator_type
        string prognosticator_creature
        string Status
    }
    predictions {
        string prognosticator_slug FK
        int year
        string prediction
        int predict_early_spring
    }
    class_def1 {
        string prognosticator_city FK
        int year
        string class
    }
    class_def1_data {
        string prognosticator_city FK
        int year
        int month
        float tmax_monthly_mean_f
        float tmax_monthly_mean_f_15y
        string class
    }
    prognosticators ||--o{ predictions : "has"
    prognosticators ||--o{ class_def1 : "location"
    prognosticators ||--o{ class_def1_data : "location"
    class_def1_data ||--|| class_def1 : "summarizes"

prognosticators

Michael Venos, the creator of Countdown to Groundhog Day, maintains the internet’s most comprehensive database about Groundhog Day predictions. He generously agreed to allow me to incorporate his data into this package. Currently he has data on a diverse collection of over 300 prognosticators.

Each prognosticator is uniquely identified by a prognosticator_slug derived from their URL on the Countdown to Groundhog Day website. This allows for accurate linking even when multiple prognosticators share the same name (e.g., there are multiple “Woody” prognosticators in different cities).

predictions

Michael has collected over 2,400 predictions going back to Punxsutawney Phil’s first prediction in 1887. In some years the prediction is uncertain or was not recorded; those years appear as NA.

Predictions are linked to prognosticators via prognosticator_slug, ensuring accurate attribution even for prognosticators with duplicate names.
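Here is a minimal sketch of how the slug-based link can be used to score predictions against observed outcomes. The three data frames are toy stand-ins that copy the column names from the diagram above; every value in them is invented, and I am assuming the class column stores the labels "early spring" and "long winter".

```r
# Toy stand-ins mirroring the columns in the diagram above.
# All values are invented for illustration; this is NOT real package data.
prognosticators <- data.frame(
  prognosticator_slug = c("woody-a", "woody-b"),
  prognosticator_name = c("Woody", "Woody"),
  prognosticator_city = c("City A", "City B"),
  stringsAsFactors = FALSE
)
predictions <- data.frame(
  prognosticator_slug = c("woody-a", "woody-a", "woody-b", "woody-b"),
  year = c(2020L, 2021L, 2020L, 2021L),
  predict_early_spring = c(1L, 0L, 1L, NA),
  stringsAsFactors = FALSE
)
class_def1 <- data.frame(
  prognosticator_city = c("City A", "City A", "City B", "City B"),
  year = c(2020L, 2021L, 2020L, 2021L),
  class = c("early spring", "early spring", "long winter", "long winter"),
  stringsAsFactors = FALSE
)

# Attach each prediction to its prognosticator by slug (names are not unique).
scored <- merge(predictions, prognosticators, by = "prognosticator_slug")

# Attach the observed outcome for that prognosticator's city and year.
scored <- merge(scored, class_def1, by = c("prognosticator_city", "year"))

# Assumption: class uses the labels "early spring" / "long winter".
scored$observed_early_spring <- as.integer(scored$class == "early spring")
scored$correct <- scored$predict_early_spring == scored$observed_early_spring

# Share of correct calls per slug; rows with an NA prediction are dropped
# by the formula interface's default na.action.
accuracy <- aggregate(correct ~ prognosticator_slug, data = scored, FUN = mean)
print(accuracy)
```

Joining on prognosticator_slug rather than prognosticator_name keeps the two toy "Woody" entries separate, which is the point of the slug.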

class_def1

The biggest challenge for evaluating Groundhog Day predictions is defining what we mean by “early spring”. So far in this package I follow the general approach of earlier analyses by NOAA and 538. I define early spring for a prognosticator’s location as one month (February OR March) with an average high temperature above the historical average for that month.1 Unlike the previous analyses, however, I use local data for each prognosticator. NOAA used U.S. national temperatures, and 538 looked across nine U.S. regions.2 I think it’s just silly to expect a real or stuffed groundhog to be able to predict national or regional weather based on localized sunshine. I say let’s evaluate their powers of prognostication using local data.3

Weather Data

Weather data comes from Open-Meteo’s Historical Weather API, which provides ERA5 reanalysis data back to 1940. Each prognosticator’s city is geocoded to coordinates, and daily maximum temperatures are retrieved directly for those coordinates.
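As a rough illustration of what such a request can look like, the sketch below builds (but does not send) a URL for Open-Meteo's archive endpoint. The helper name build_archive_url and the example coordinates are my own placeholders, and this is not necessarily how the package's data were retrieved.

```r
# Build a request URL for Open-Meteo's Historical Weather API.
# build_archive_url() is a hypothetical helper, not part of this package.
build_archive_url <- function(lat, long, start_date, end_date) {
  params <- c(
    latitude = lat,
    longitude = long,
    start_date = start_date,          # ISO dates, e.g. "1940-02-01"
    end_date = end_date,
    daily = "temperature_2m_max",     # daily maximum air temperature
    temperature_unit = "fahrenheit",
    timezone = "auto"
  )
  query <- paste0(names(params), "=", params, collapse = "&")
  paste0("https://archive-api.open-meteo.com/v1/archive?", query)
}

# Approximate coordinates, used only as an example location.
url <- build_archive_url(40.94, -78.97, "1940-02-01", "1940-03-31")
print(url)
```

The resulting URL returns JSON, so it could be read with whatever JSON parser you prefer.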

For Punxsutawney Phil’s predictions from 1887-1939 (before Open-Meteo coverage), weather data comes from nearby NOAA GHCND weather stations.

Classification Steps

  1. Geocode each prognosticator’s city to latitude/longitude coordinates
  2. Query Open-Meteo Historical API for daily maximum temperatures (February and March, 1940-present)
  3. Calculate the mean monthly high temperature for each location
  4. Calculate the 15-year rolling mean high monthly temperature
  5. Use the def1 definition to classify each year as “early spring” or “long winter”

The class_def1 dataset contains one row per city-year with the classification. The class_def1_data dataset contains the underlying monthly temperature data and rolling averages.
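Steps 3 through 5 can be sketched in base R as follows. This is an illustration on invented temperatures for a single location, not the package's actual code. Two choices here are my assumptions: the 15-year mean is taken over the 15 years before the year being classified, and a year counts as "early spring" when February or March (at least one of them) beats its rolling mean.

```r
# Invented daily highs for one location; 28 days per month keeps the toy small.
daily <- expand.grid(day = 1:28, month = c(2L, 3L), year = 2001:2017)
daily$tmax_f <- ifelse(daily$month == 2L, 35, 45)
daily$tmax_f[daily$year == 2016 & daily$month == 2L] <- 40                # warm February
daily$tmax_f[daily$year == 2017] <- daily$tmax_f[daily$year == 2017] - 5  # cold year

# Step 3: mean monthly high for each year and month.
monthly <- aggregate(tmax_f ~ year + month, data = daily, FUN = mean)
names(monthly)[names(monthly) == "tmax_f"] <- "tmax_monthly_mean_f"

# Step 4: mean of the prior `window` years (ASSUMPTION: trailing window that
# excludes the current year; NA until a full window is available).
trailing_mean <- function(values, years, window = 15L) {
  vapply(seq_along(years), function(i) {
    prior <- values[years < years[i] & years >= years[i] - window]
    if (length(prior) < window) NA_real_ else mean(prior)
  }, numeric(1))
}
monthly$tmax_monthly_mean_f_15y <- NA_real_
for (m in unique(monthly$month)) {
  rows <- monthly$month == m
  monthly$tmax_monthly_mean_f_15y[rows] <- trailing_mean(
    monthly$tmax_monthly_mean_f[rows], monthly$year[rows]
  )
}

# Step 5: classify each year (ASSUMPTION: either month above its rolling
# mean is enough). Years without a full 15-year window are dropped.
monthly$above <- monthly$tmax_monthly_mean_f > monthly$tmax_monthly_mean_f_15y
yearly <- aggregate(above ~ year, data = monthly, FUN = any)
yearly$class <- ifelse(yearly$above, "early spring", "long winter")
print(yearly)
```

In this toy series only 2016 and 2017 have a full window: 2016 is classified as "early spring" because of its warm February, and 2017 as "long winter".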

Updates

I plan to update the prognosticators, predictions, and classifications data after the month of March (classification definition def1 depends on March weather data).

Data Use

Open-Meteo weather data is available under CC BY 4.0. Data on prognosticators and their predictions are used with permission from Countdown to Groundhog Day. You are welcome to use the data via this package for any purpose, but please do not post the raw data on any other public sites. Instead, give credit to Michael’s tremendous effort by pointing back to Countdown to Groundhog Day.

Hex Sticker

The groundhog pixel art is a DALL-E 2 creation.

Issues

Please submit an issue if you encounter any bugs or errors. This package comes with no warranty of any kind. Don’t rely on me or these rodents to get it right. Though my family did live in Punxsutawney when I was 4, and I have been to Gobbler’s Knob.

Me visiting Gobbler’s Knob as a child.