On April 30, 2021, more than 14 months into the global pandemic, the U.S. published the first-ever national dataset on the ages of people admitted to hospital with COVID-19. Prior releases included the daily totals of adult and child admissions, but lacked a decade-by-decade breakdown.

The most obvious shortcoming of this data is that, in some states, age is missing for a large proportion of cases. Nationally that share is less than 5%, but in some states, especially in earlier months, it can be higher. Hospitals in Washington state failed to report the age of approximately 40% of patients through March, although reporting has improved drastically since then. That’s around the time another problem in Washington was resolved.
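One way to check the share of admissions with missing age is to total the unknown-age admissions and all admissions per state and month, then divide. The following is a minimal base-R sketch, assuming a data frame with the `state`, `dateob`, `total_unk` and `total_admissions` columns that the analysis code in this document derives. The `unknown_share_by_month` helper is hypothetical, and the toy values are made up for illustration; they are not taken from the HHS file.

```r
# Hypothetical helper: percent of admissions with unknown age, by state and month.
# Note: aggregate()'s formula interface silently drops rows containing NA.
unknown_share_by_month <- function(df) {
  df$month <- format(df$dateob, "%Y-%m")
  sums <- aggregate(cbind(total_unk, total_admissions) ~ state + month,
                    data = df, FUN = sum)
  sums$unknown_percent <- 100 * sums$total_unk / sums$total_admissions
  sums[order(sums$state, sums$month), ]
}

# Toy data standing in for hospdf (values are invented).
toy <- data.frame(
  state = c("WA", "WA", "WA", "OR", "OR"),
  dateob = as.Date(c("2021-03-01", "2021-03-02", "2021-04-01",
                     "2021-03-01", "2021-03-02")),
  total_unk = c(3, 2, 1, 0, 1),
  total_admissions = c(10, 10, 20, 10, 10)
)

shares <- unknown_share_by_month(toy)
print(shares)
```

Summing before dividing weights each day by its admissions, which is steadier than averaging the daily percentages on low-volume days.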

This data was sought by The Lund Report in March. See the related GitHub issue (now closed) filed on Careset’s GitHub repo. This piece, published May 19, 2021, uses the data to describe issues with Oregon’s vaccine rollout.

Data Source

The raw HHS file “provides state-aggregated data for hospital utilization in a timeseries format dating back to January 1, 2020. These are derived from reports with facility-level granularity across three main sources: (1) HHS TeleTracking, (2) reporting provided directly to HHS Protect by state/territorial health departments on behalf of their healthcare facilities” (and a third collection method used prior to July 2020). The file can be accessed here. This analysis was updated with data as of May 22.

Data vetting

A much larger analysis doc that looks for detectable data errors is available here. Known errors from earlier work on this dataset include the daily counts of hospital admissions for suspected pediatric COVID in WA and OR. It also appears that a substantial number of Tennessee hospitals are missing many variables. Tennessee is generally not used much, although it is included in the national total.

# Libraries


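library(rio) # dependency of ggpubr, needs to be explicitly loaded?
library(dplyr) # provides the %>% pipe used in the plotting code
library(ggplot2) # provides ggplot(), geom_line() and facet_wrap()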

# setwd("/Users/jacob/github-whitelabel/covid-kids/juvenile_covid_analysis/hhs/")

#file = "0508_Timeseries.csv"
#file = "0515_Timeseries.csv"
file = "0522_Timeseries.csv"

hospdf <- read.csv(file, header=TRUE, sep=",")
hospdf$dateob = as.Date(hospdf$date)

# Sum total adult COVID: confirmed + suspected
hospdf$total_adult = hospdf$previous_day_admission_adult_covid_confirmed + hospdf$previous_day_admission_adult_covid_suspected

# Sum total pediatric COVID: confirmed + suspected
hospdf$total_ped = hospdf$previous_day_admission_pediatric_covid_confirmed + hospdf$previous_day_admission_pediatric_covid_suspected

hospdf$total_admissions <- hospdf$total_ped + hospdf$total_adult

# Sum the confirmed plus suspected into a single figure by age

hospdf$total_unk <- hospdf$previous_day_admission_adult_covid_suspected_unknown + hospdf$previous_day_admission_adult_covid_confirmed_unknown

hospdf$total_80 <- hospdf$previous_day_admission_adult_covid_confirmed_80. + hospdf$previous_day_admission_adult_covid_suspected_80.

hospdf$total_70 <- hospdf$previous_day_admission_adult_covid_confirmed_70.79 + hospdf$previous_day_admission_adult_covid_suspected_70.79

hospdf$total_60 <- hospdf$previous_day_admission_adult_covid_confirmed_60.69 + hospdf$previous_day_admission_adult_covid_suspected_60.69

hospdf$total_50 <- hospdf$previous_day_admission_adult_covid_confirmed_50.59 + hospdf$previous_day_admission_adult_covid_suspected_50.59

hospdf$total_40 <- hospdf$previous_day_admission_adult_covid_confirmed_40.49 + hospdf$previous_day_admission_adult_covid_suspected_40.49

hospdf$total_30 <- hospdf$previous_day_admission_adult_covid_confirmed_30.39 + hospdf$previous_day_admission_adult_covid_suspected_30.39

hospdf$total_20 <- hospdf$previous_day_admission_adult_covid_confirmed_20.29 + hospdf$previous_day_admission_adult_covid_suspected_20.29

hospdf$total_1819 <- hospdf$previous_day_admission_adult_covid_confirmed_18.19 + hospdf$previous_day_admission_adult_covid_suspected_18.19

# Percent of admissions with unknown age, reusing total_unk computed above
hospdf$unknown_percent <- 100*(hospdf$total_unk/hospdf$total_admissions)

hospdf$X80_percent <-100*(hospdf$total_80/hospdf$total_admissions)

hospdf$X70_percent <-100*(hospdf$total_70/hospdf$total_admissions)
hospdf$X60_percent <-100*(hospdf$total_60/hospdf$total_admissions)
hospdf$X50_percent <-100*(hospdf$total_50/hospdf$total_admissions)
hospdf$X40_percent <-100*(hospdf$total_40/hospdf$total_admissions)
hospdf$X30_percent <-100*(hospdf$total_30/hospdf$total_admissions)
hospdf$X20_percent <-100*(hospdf$total_20/hospdf$total_admissions)
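The per-band totals and percentages above all follow one pattern, so they could also be generated in a loop. This is a minimal base-R sketch, assuming the columns follow the `previous_day_admission_adult_covid_confirmed_<band>` and `previous_day_admission_adult_covid_suspected_<band>` naming seen above. The `add_age_band_totals` helper and the toy data frame are illustrative and not part of the original analysis.

```r
# Hypothetical helper: for each age band, add total_<label> (confirmed + suspected)
# and X<label>_percent (share of total_admissions) columns.
add_age_band_totals <- function(df, bands) {
  prefix <- "previous_day_admission_adult_covid_"
  for (label in names(bands)) {
    suffix <- bands[[label]]
    confirmed <- df[[paste0(prefix, "confirmed_", suffix)]]
    suspected <- df[[paste0(prefix, "suspected_", suffix)]]
    total <- confirmed + suspected
    df[[paste0("total_", label)]] <- total
    df[[paste0("X", label, "_percent")]] <- 100 * total / df$total_admissions
  }
  df
}

# Toy data standing in for hospdf (values are invented).
toy <- data.frame(
  total_admissions = c(10, 20),
  previous_day_admission_adult_covid_confirmed_80. = c(2, 4),
  previous_day_admission_adult_covid_suspected_80. = c(1, 1),
  previous_day_admission_adult_covid_confirmed_70.79 = c(3, 5),
  previous_day_admission_adult_covid_suspected_70.79 = c(0, 5)
)

# Names are the short labels; values are the column suffixes as read.csv renders them.
bands <- c("80" = "80.", "70" = "70.79")
toy <- add_age_band_totals(toy, bands)
print(toy[, c("total_80", "X80_percent", "total_70", "X70_percent")])
```

Extending `bands` with the remaining suffixes (`60.69` down to `18.19`) would cover the other age groups in the same way.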

Raw hospitalization fraction, 80+ by state

hospdf  %>% ggplot(aes(x=dateob,y=X80_percent)) + geom_line() + facet_wrap( ~ state, scales = "free" )  + xlab("") + xlim(as.Date('2020-07-01'), as.Date('2021-05-23'))
## Warning: Removed 143 row(s) containing missing values (geom_path).
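
The warning is consistent with two kinds of rows that ggplot2 treats as missing: rows whose date falls outside the `xlim()` window, and rows where the percentage is `NA` or `NaN` (for example, 0/0 on a day with no reported admissions). Below is a rough base-R sketch for counting such rows, assuming those are the only two sources. The `count_dropped` helper and the toy data are hypothetical, and the counts need not match the number in the warning, because `geom_line()` tallies removed rows in its own way.

```r
# Hypothetical helper: count rows that fall outside the plotting window
# or have a non-finite 80+ percentage.
count_dropped <- function(df, from, to) {
  out_of_window <- df$dateob < from | df$dateob > to
  not_finite <- !is.finite(df$X80_percent)
  c(out_of_window = sum(out_of_window),
    not_finite = sum(not_finite),
    either = sum(out_of_window | not_finite))
}

# Toy data standing in for hospdf (values are invented).
toy <- data.frame(
  dateob = as.Date(c("2020-06-30", "2020-07-01", "2020-07-02", "2021-05-24")),
  total_80 = c(1, 0, 2, 3),
  total_admissions = c(4, 0, 8, 6)
)
# The second row divides 0 by 0, which R evaluates to NaN.
toy$X80_percent <- 100 * toy$total_80 / toy$total_admissions

dropped <- count_dropped(toy, as.Date("2020-07-01"), as.Date("2021-05-23"))
print(dropped)
```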