Gold standard data

The data-truth folder contains the "gold standard" data that forecasts are eventually compared to.

Table of Contents

Data sources
Hospitalization data
Accessing gold standard data

Data sources

Influenza hospitalization data are taken from the HealthData.gov COVID-19 Reported Patient Impact and Hospital Capacity by State Timeseries.

Please note the following detail from the dataset description:

"The file will be updated regularly and provides the latest values reported by each facility within the last four days for all time. This allows for a more comprehensive picture of the hospital utilization within a state by ensuring a hospital is represented, even if they miss a single day of reporting."

This implies that some values may be repeated. Extra caution should be applied in these cases and in particular for interpreting data for the current day, as hospitals report hospital admissions for the previous day (further detail below).

Some of these data are also available programmatically through the EpiData API.

Hospitalization data

HealthData.gov Hospitalization Timeseries

The gold standard data that hospitalization forecasts (inc hosp targets) will be evaluated against are the HealthData.gov COVID-19 Reported Patient Impact and Hospital Capacity by State Timeseries. These data are released weekly.

Previously collected influenza data from the 2020-21 influenza season (Fields 33-38) are included in the COVID-19 Reported Patient Impact and Hospital Capacity by State Timeseries dataset. This dataset is updated regularly based on data reported through the day prior. Therefore, datasets updated on Monday will include data reported through the immediately preceding Sunday, and this dataset will capture influenza hospital admissions that occurred through Saturday (see the data processing section for more information).

Reporting of influenza fields 33-35 became mandatory in February 2022; additional details are provided in the current hospital reporting guidance and FAQs. The number of reporting hospitals increased after reporting became mandatory in early 2022 and has since stabilized at a high level of compliance. The number of hospitals reporting these data each day, by state, is available in the previous_day_admission_influenza_confirmed_coverage variable found in the COVID-19 Reported Patient Impact and Hospital Capacity by State Timeseries dataset.

These data are also available in a facility-level dataset; data values less than 4 are suppressed in the facility-level dataset. Additional historical influenza surveillance data from other surveillance systems are available at https://www.cdc.gov/flu/weekly/fluviewinteractive.htm. These data are updated every Friday at noon Eastern Time. The "cdcfluview" R package can be used to retrieve these data. Additional potential data sources are available in Carnegie Mellon University's Epidata API.

Data processing

The hospitalization truth data are computed from the previous_day_admission_influenza_confirmed field, which provides the new daily admissions with a confirmed diagnosis of influenza.

Since these admission data are listed as “previous day” admissions in the raw data, the truth data shift the values in the date column one day earlier so that inc hosp values align with the date the admissions occurred.

As an example, the following data from HealthData.gov

   date    | previous_day_admission_influenza_confirmed 
-----------|--------------------------------------------
2020-10-30 |                  5

would turn into the following observed data for daily incident hospitalizations

   date    | incident_hospitalizations
-----------|----------------------------
2020-10-29 |          5
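The one-day shift can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the hub's actual processing code; the function name `shift_to_admission_date` and the dict-based row layout are assumptions made for the example.

```python
from datetime import date, timedelta


def shift_to_admission_date(rows):
    """Move each report date back one day so it matches the admission date.

    `rows` is a list of dicts with the raw HealthData.gov column names.
    (Hypothetical helper, for illustration only.)
    """
    shifted = []
    for row in rows:
        report_date = date.fromisoformat(row["date"])
        admission_date = report_date - timedelta(days=1)
        shifted.append(
            {
                "date": admission_date.isoformat(),
                "incident_hospitalizations": row[
                    "previous_day_admission_influenza_confirmed"
                ],
            }
        )
    return shifted


# The row from the example table above.
raw = [{"date": "2020-10-30", "previous_day_admission_influenza_confirmed": 5}]
observed = shift_to_admission_date(raw)
print(observed)
```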

National (US) hospitalization data are constructed from these data by summing across all 50 states, Washington DC (DC), Puerto Rico (PR), and the US Virgin Islands (VI). The HHS data do not include admissions for additional territories.

Daily admission counts are then aggregated into epidemiological weeks.
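The national sum and the weekly aggregation could look roughly like the sketch below. It assumes daily rows that have already been shifted to the admission date, uses hypothetical `location` and `value` keys, and uses two-letter abbreviations and made-up counts purely for illustration. It does not attempt to handle missing or repeated values, which the real pipeline may treat differently.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_end_saturday(day):
    """Return the Saturday that ends the Sunday-to-Saturday week containing `day`."""
    # date.weekday(): Monday is 0, Saturday is 5, Sunday is 6.
    return day + timedelta(days=(5 - day.weekday()) % 7)


def weekly_totals(daily_rows):
    """Sum daily counts per (location, week-ending date) and add a 'US' total."""
    totals = defaultdict(int)
    for row in daily_rows:
        week_end = week_end_saturday(date.fromisoformat(row["date"])).isoformat()
        totals[(row["location"], week_end)] += row["value"]
        totals[("US", week_end)] += row["value"]  # national sum over all locations
    return dict(totals)


# Made-up daily counts, for illustration only.
daily = [
    {"date": "2022-11-27", "location": "MA", "value": 3},  # Sunday
    {"date": "2022-12-03", "location": "MA", "value": 4},  # Saturday, same week
    {"date": "2022-12-04", "location": "MA", "value": 1},  # Sunday, next week
    {"date": "2022-11-28", "location": "PR", "value": 2},
]
weekly = weekly_totals(daily)
```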

For week-ahead forecasts, we will use the specification of epidemiological weeks (EWs) defined by the US CDC, which run Sunday through Saturday. For example, a 1-week-ahead forecast made for the Forecast Due Date of Monday, November 28, 2022, would correspond to EW48, which ends on Saturday, December 3, 2022 (i.e., it has a target_end_date of 2022-12-03). A 2-week-ahead forecast made for that date would correspond to EW49 and have a target_end_date of Saturday, December 10, 2022. There are standard software packages to convert from dates to epidemic weeks and vice versa (e.g. MMWRweek for R and pymmwr and epiweeks for Python).
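As a rough check of the example dates, the sketch below derives a target_end_date and an epidemiological week number with the standard library alone. The function names are hypothetical, and the horizon rule is inferred only from the Monday due-date example in the text; for real work, the packages named above are the safer choice.

```python
from datetime import date, timedelta


def week_end_saturday(day):
    """Return the Saturday that ends the Sunday-to-Saturday week containing `day`."""
    return day + timedelta(days=(5 - day.weekday()) % 7)


def target_end_date(forecast_due_date, horizon_weeks):
    """Saturday ending the week `horizon_weeks` ahead (1 = week of the due date)."""
    return week_end_saturday(forecast_due_date) + timedelta(weeks=horizon_weeks - 1)


def epiweek(day):
    """Return (year, week) for the Sunday-to-Saturday week containing `day`.

    A week belongs to the year that holds at least four of its days, which is
    the year of its Wednesday; the week number is that Wednesday's rank among
    the Wednesdays of the year.
    """
    wednesday = week_end_saturday(day) - timedelta(days=3)
    week = (wednesday.timetuple().tm_yday - 1) // 7 + 1
    return wednesday.year, week


due = date(2022, 11, 28)  # Monday forecast due date from the example
one_week = target_end_date(due, 1)
two_week = target_end_date(due, 2)
print(one_week, epiweek(one_week))
print(two_week, epiweek(two_week))
```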

Additional resources

Here are a few additional resources that describe these hospitalization data:

data dictionary for the dataset
the official document describing the “guidance for hospital reporting”

Accessing gold standard data

While we make efforts to create accurate, verified, clean versions of the gold standard data, these should be seen as secondary sources to the original data at the HHS Protect site.

CSV files

A set of comma-separated plain text files is automatically updated every week with the latest observed values for incident hospitalizations. The corresponding CSV file is data-truth/truth-Incident Hospitalizations.csv.
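A file like this can be read with Python's built-in `csv` module. The sketch below assumes the columns are named `date`, `location`, `location_name`, and `value`; check the header of the actual file before relying on that. The sample rows are placeholders, not real observations, and `load_truth` is a hypothetical helper.

```python
import csv
import io


def load_truth(csv_file):
    """Read truth rows from an open CSV file, keeping date, location, and value.

    Column names are an assumption about the file layout. Empty values are
    returned as None rather than guessed.
    """
    rows = []
    for record in csv.DictReader(csv_file):
        raw_value = record["value"]
        rows.append(
            {
                "date": record["date"],
                "location": record["location"],
                "value": float(raw_value) if raw_value != "" else None,
            }
        )
    return rows


# Placeholder content standing in for the real file (values are made up).
sample = io.StringIO(
    "date,location,location_name,value\n"
    "2022-12-03,US,US,10\n"
    "2022-12-03,25,Massachusetts,\n"
)
rows = load_truth(sample)
```

To read the real file, pass an open file handle instead, for example `open("data-truth/truth-Incident Hospitalizations.csv", newline="")`; note that the file name contains a space.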

Resources for Accessing Hospitalization Data

Our collaborators at the Delphi Group at CMU have provided resources to make these data (as well as archived versions) available through their Delphi Epidata API. The current weekly timeseries of the hospitalization data as well as prior versions of the data are available under the "covidcast" endpoint of the API. In particular, under the "hhs" data source name, there are flu-related HHS signals:

Confirmed Influenza Admissions per day confirmed_admissions_influenza_1d
Confirmed Influenza Admissions (smoothed with a 7 day trailing average) confirmed_admissions_influenza_1d_7dav

Also under the "covidcast" endpoint, under the "chng" data source name, there are signals pertaining to confirmed influenza from outpatient visits:

Confirmed Influenza from Doctor's Visits smoothed_outpatient_flu
Confirmed Influenza from Doctor's Visits (with weekday adjustment) smoothed_adj_outpatient_flu

Other related and potentially helpful endpoints of the Epidata API include:

COVID-19 Hospitalization by State
COVID-19 Hospitalization by Facility
COVID-19 Hospitalization: Facility Lookup

To access these data, teams can use the COVIDcast R package or Python package.
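For teams that prefer plain HTTP over the client packages, a request to the "covidcast" endpoint can be assembled as in the sketch below. The base URL and query parameter names reflect our understanding of the Delphi Epidata API and should be verified against its documentation; `build_covidcast_query` is a hypothetical helper. The example only builds the URL and does not contact the API.

```python
from urllib.parse import urlencode

# Assumed base URL of the covidcast endpoint; verify against the Epidata docs.
EPIDATA_COVIDCAST_URL = "https://api.delphi.cmu.edu/epidata/covidcast/"


def build_covidcast_query(signal, geo_value, start_day, end_day):
    """Build a query URL for a daily, state-level signal from the "hhs" source.

    `start_day` and `end_day` are YYYYMMDD strings; `geo_value` is a lowercase
    state abbreviation or "*" for all states (assumed conventions).
    """
    params = {
        "data_source": "hhs",
        "signal": signal,
        "time_type": "day",
        "geo_type": "state",
        "time_values": f"{start_day}-{end_day}",
        "geo_value": geo_value,
    }
    return f"{EPIDATA_COVIDCAST_URL}?{urlencode(params)}"


url = build_covidcast_query(
    "confirmed_admissions_influenza_1d", "ma", "20221127", "20221203"
)
print(url)
```

The resulting URL can be fetched with any HTTP client, for example `urllib.request.urlopen(url)`, and the response parsed as JSON.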
