On April 30, 2021, more than 14 months into the global pandemic, the U.S. published the first-ever national dataset on the ages of people admitted to hospital with COVID-19. Prior releases included the daily totals of adult and child admissions, but lacked a decade-by-decade breakdown.
The most obvious shortcoming of this data is that, in some states, age is missing for a large proportion of cases. Nationally that share is less than 5%, but in some states, especially in earlier months, it can be higher. Hospitals in Washington state failed to report the age of approximately 40% of patients through March, although reporting has improved drastically since then. That’s around the time another problem in Washington was resolved.
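As a rough illustration of how a per-state "age unknown" share like the ones quoted above can be computed, here is a minimal sketch. The data frame `toy` and its numbers are invented for the example and are not taken from the HHS file. The column names `total_unk` and `total_admissions` are borrowed from the analysis code later in this document.

```r
# Toy data: invented counts, NOT real HHS figures
toy <- data.frame(
  state = c("WA", "WA", "OR", "OR"),
  total_unk = c(40, 38, 2, 1),            # admissions with no age reported
  total_admissions = c(100, 95, 50, 40)   # all admissions
)

# Sum the counts per state first, then take the ratio, so that
# low-volume days do not dominate the state-level share
by_state <- aggregate(cbind(total_unk, total_admissions) ~ state,
                      data = toy, FUN = sum)
by_state$unknown_percent <- 100 * by_state$total_unk / by_state$total_admissions

print(by_state)
```

Summing before dividing is a design choice: averaging daily percentages instead would give a quiet day with two admissions the same weight as a busy day with two hundred.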
This data was sought by The Lund Report in March. See the related GitHub issue (now closed) filed on Careset’s GitHub repo. This piece, published May 19, 2021, uses the data to describe issues with Oregon’s vaccine rollout.
The raw HHS file “provides state-aggregated data for hospital utilization in a timeseries format dating back to January 1, 2020. These are derived from reports with facility-level granularity across three main sources: (1) HHS TeleTracking, (2) reporting provided directly to HHS Protect by state/territorial health departments on behalf of their healthcare facilities” (and a third collection method used prior to July 2020). The file can be accessed here. This analysis was updated with data as of May 22.
A much larger analysis doc that looks for detectable data errors is available here. Known errors from earlier work on this dataset include the daily counts of hospital admissions for suspected pediatric COVID in WA and OR. It also appears that a substantial number of Tennessee hospitals are missing many variables. Tennessee is generally not used much, although it is included in the national total.
```r
# Libraries
library(ggplot2)
library(tidyverse)
library(rio) # dependency of ggpubr, needs to be explicitly loaded?
library(ggpubr)

# setwd("/Users/jacob/github-whitelabel/covid-kids/juvenile_covid_analysis/hhs/")
#library(rmarkdown)
#file = "0508_Timeseries.csv"
#file = "0515_Timeseries.csv"
file = "0522_Timeseries.csv"
hospdf <- read.csv(file, header=TRUE, sep=",")
hospdf$dateob = as.Date(hospdf$date)

# Sum total adult COVID: confirmed + suspected
hospdf$total_adult = hospdf$previous_day_admission_adult_covid_confirmed +
  hospdf$previous_day_admission_adult_covid_suspected

# Sum total pediatric COVID: confirmed + suspected
hospdf$total_ped = hospdf$previous_day_admission_pediatric_covid_confirmed +
  hospdf$previous_day_admission_pediatric_covid_suspected
hospdf$total_admissions <- hospdf$total_ped + hospdf$total_adult

# Sum the confirmed plus suspected into a single figure by age
hospdf$total_unk <- hospdf$previous_day_admission_adult_covid_suspected_unknown +
  hospdf$previous_day_admission_adult_covid_confirmed_unknown
hospdf$total_80 <- hospdf$previous_day_admission_adult_covid_confirmed_80. +
  hospdf$previous_day_admission_adult_covid_suspected_80.
hospdf$total_70 <- hospdf$previous_day_admission_adult_covid_confirmed_70.79 +
  hospdf$previous_day_admission_adult_covid_suspected_70.79
hospdf$total_60 <- hospdf$previous_day_admission_adult_covid_confirmed_60.69 +
  hospdf$previous_day_admission_adult_covid_suspected_60.69
hospdf$total_50 <- hospdf$previous_day_admission_adult_covid_confirmed_50.59 +
  hospdf$previous_day_admission_adult_covid_suspected_50.59
hospdf$total_40 <- hospdf$previous_day_admission_adult_covid_confirmed_40.49 +
  hospdf$previous_day_admission_adult_covid_suspected_40.49
hospdf$total_30 <- hospdf$previous_day_admission_adult_covid_confirmed_30.39 +
  hospdf$previous_day_admission_adult_covid_suspected_30.39
hospdf$total_20 <- hospdf$previous_day_admission_adult_covid_confirmed_20.29 +
  hospdf$previous_day_admission_adult_covid_suspected_20.29
hospdf$total_1819 <- hospdf$previous_day_admission_adult_covid_confirmed_18.19 +
  hospdf$previous_day_admission_adult_covid_suspected_18.19

# Express each age band as a percentage of all admissions (adult + pediatric)
hospdf$unknown_percent <- 100*(
  (hospdf$previous_day_admission_adult_covid_suspected_unknown +
     hospdf$previous_day_admission_adult_covid_confirmed_unknown)/hospdf$total_admissions)
hospdf$X80_percent <- 100*(hospdf$total_80/hospdf$total_admissions)
hospdf$X70_percent <- 100*(hospdf$total_70/hospdf$total_admissions)
hospdf$X60_percent <- 100*(hospdf$total_60/hospdf$total_admissions)
hospdf$X50_percent <- 100*(hospdf$total_50/hospdf$total_admissions)
hospdf$X40_percent <- 100*(hospdf$total_40/hospdf$total_admissions)
hospdf$X30_percent <- 100*(hospdf$total_30/hospdf$total_admissions)
hospdf$X20_percent <- 100*(hospdf$total_20/hospdf$total_admissions)
```
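The confirmed-plus-suspected sums above repeat the same pattern for every age band. As an alternative, the pattern could be written as a loop over the column-name suffixes. This is only a sketch on a small invented data frame (`toy`), not the code used for this analysis. The resulting column names (for example `total_18.19`) are illustrative and differ slightly from the hand-written ones such as `total_1819`.

```r
# Toy data with the same column-name pattern as the HHS file after read.csv()
# (values are invented for illustration)
toy <- data.frame(
  previous_day_admission_adult_covid_confirmed_18.19 = c(1, 0),
  previous_day_admission_adult_covid_suspected_18.19 = c(2, 1),
  previous_day_admission_adult_covid_confirmed_80. = c(5, 4),
  previous_day_admission_adult_covid_suspected_80. = c(0, 3)
)

# Suffixes of the age bands to combine
bands <- c("18.19", "80.")

for (band in bands) {
  confirmed <- paste0("previous_day_admission_adult_covid_confirmed_", band)
  suspected <- paste0("previous_day_admission_adult_covid_suspected_", band)
  # New column: confirmed + suspected admissions for this age band
  toy[[paste0("total_", band)]] <- toy[[confirmed]] + toy[[suspected]]
}

print(toy[, c("total_18.19", "total_80.")])
```

The trade-off is that the loop is shorter and harder to mistype, while the explicit version is easier to scan and to search for a specific column.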
```r
hospdf %>%
  ggplot(aes(x=dateob, y=X80_percent)) +
  geom_line() +
  facet_wrap( ~ state, scales = "free" ) +
  xlab("") +
  xlim(as.Date('2020-07-01'), as.Date('2021-05-23'))
```
## Warning: Removed 143 row(s) containing missing values (geom_path).
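A warning like the one above means some points could not be drawn. One plausible contributor, offered here as an assumption rather than a diagnosis of this particular file, is that the percentage columns are missing wherever the denominator is zero or a field was not reported. The sketch below uses invented vectors to show how R produces `NaN` and `NA` in those two cases.

```r
# Invented example values, not taken from the HHS file
admissions <- c(10, 0, NA)   # total admissions: normal day, zero day, unreported
aged_80    <- c(2, 0, 1)     # admissions aged 80+

# Same formula shape as the X80_percent column
pct <- 100 * aged_80 / admissions

print(pct)         # second value is NaN (0/0), third is NA
print(is.na(pct))  # is.na() flags both NaN and NA

# Keep only the values that can actually be plotted
plottable <- pct[!is.na(pct)]
print(plottable)
```

Points that fall outside the range given to `xlim()` are also dropped by ggplot2 with the same kind of warning, so dates before July 2020 may account for part of the count.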