Using Databricks to Develop Statistical and Mathematical Models to Forecast the Monkeypox Outbreak in Washington State
We used contact-tracing data, standard line lists, and published parameters to train several forecasting models, including an ARIMA model, a Poisson regression, and an SEIR compartmental model. We also calculated the daily effective reproduction number (R-effective) as an additional output. The compartmental model fit the reported cases best when tested out of sample, but the statistical models were quicker and easier to deploy and helped inform initial decision-making. The R-effective estimate was particularly useful throughout the effort.
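As a rough illustration of what an SEIR compartmental model and a model-based R-effective look like, here is a minimal sketch. It is not the pipeline from the talk: the function name `simulate_seir`, the fixed-step Euler integration, and every parameter value are assumptions chosen for illustration, and the abstract does not say how R-effective was actually estimated. The sketch uses the textbook relation R-effective = (beta / gamma) * S / N.

```python
def simulate_seir(beta, sigma, gamma, population, initial_infected, days, steps_per_day=10):
    """Integrate a basic SEIR model with fixed-step Euler updates.

    beta:  transmission rate per day (hypothetical value supplied by caller)
    sigma: rate of leaving the exposed compartment (1 / incubation period)
    gamma: rate of leaving the infectious compartment (1 / infectious period)
    Returns one record per day with compartment sizes and R-effective.
    """
    dt = 1.0 / steps_per_day
    s = float(population - initial_infected)
    e, i, r = 0.0, float(initial_infected), 0.0
    history = []
    for day in range(days + 1):
        # Model-based R-effective: basic reproduction number scaled by
        # the fraction of the population still susceptible.
        r_eff = (beta / gamma) * s / population
        history.append({"day": day, "S": s, "E": e, "I": i, "R": r, "R_eff": r_eff})
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / population * dt
            new_infectious = sigma * e * dt
            new_recovered = gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
    return history


# Placeholder parameters for illustration only; they are not taken from
# the talk or from any published monkeypox estimate.
history = simulate_seir(
    beta=0.1, sigma=1 / 8, gamma=1 / 21,
    population=100_000, initial_infected=10, days=60,
)
for row in history[::15]:
    print(row["day"], round(row["I"], 1), round(row["R_eff"], 3))
```

In practice, R-effective is often estimated directly from reported case counts rather than read off a fitted model, and a production forecast would fit `beta` to the line-list data instead of fixing it by hand.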
Overall, these efforts highlighted the importance of rapidly deployable and scalable infectious disease modeling pipelines. Public health data science is still a nascent field, however, so practices that are standard in other industries are often novel in public health, and stable, generalizable pipelines are crucial. Using the Databricks platform has allowed us to scale and iteratively improve our modeling pipelines more quickly and to extend them to other infectious diseases, such as influenza and RSV. Further development of scalable, standardized approaches to disease forecasting at the state and local level is vital to better inform future public health response efforts.
Talk by: Matthew Doxey