EpiGro codes and data repository
EpiGro is a disease outbreak forecasting tool. It started (v.1.0) as a phenomenological model that described disease incidence as a quadratic function of the cumulative number of cases. Version 2.0 incorporated the exact definition of the ICC (Incidence - Cumulative Cases) curve for the SIR model, thereby transforming EpiGro into a mechanistic model. Version 3.0, developed for COVID-19 forecasting, combines the mechanistic approach of EpiGro v.2.0 with variational data assimilation techniques.
EpiGro v.1.0 was developed in response to the DARPA Chikungunya Challenge and is described in Lega & Brown (2016). The approach relies on the empirical observation that weekly incidence data for the 2014 outbreak of chikungunya in Guadeloupe, plotted as a function of the cumulative number of cases, can be fitted with a parabola. This simple fact means that the cumulative number of cases may in turn be approximated by a quantity that follows logistic growth, confirming previous observations reported in the literature for other diseases (Chowell et al., 2014).
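The step from a parabola to logistic growth can be made explicit. Assume, for illustration, that incidence is treated as the rate of change of $C$ and that the fitted parabola opens downward with real roots $C_- < C_+$ (the symbols $a$, $C_\pm$, $X$, $r$ and $K$ are introduced here and do not come from the original text):

$$ \frac{dC}{dt} = I \approx -a\,(C - C_-)(C - C_+), \qquad a > 0. $$

Setting $X = C - C_-$ gives the logistic equation

$$ \frac{dX}{dt} = r\,X\left(1 - \frac{X}{K}\right), \qquad K = C_+ - C_-, \quad r = aK, $$

whose solution is

$$ X(t) = \frac{K}{1 + \left(K/X_0 - 1\right)e^{-rt}}. $$

When the parabola passes through the origin ($C_- = 0$), $C$ itself follows logistic growth with final size $K$.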
EpiGro won the DARPA Challenge, and an analysis of the methods used by challenge participants revealed that simpler models generally performed better than complex ones (Del Valle et al., 2018). More details on our approach may be found on our chikungunya modeling challenge site.
The codes released for EpiGro v.1.0 consist of a MATLAB Graphical User Interface (GUI) that compares cumulative epidemiological data to logistic growth by fitting a parabola to incidence (growth rate) plotted as a function of the cumulative number of cases. Users may import their own epidemiological data or select one of the datasets provided. The GUI also allows users to model outbreaks that consist of two separate waves (via the two-parabola option).
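A minimal Python/NumPy sketch of the single-parabola fit follows, assuming incidence and cumulative counts are already available as arrays. It is not the EpiGro MATLAB GUI code, and the function names `fit_parabola` and `logistic_cumulative` are hypothetical. It fits a quadratic to incidence versus cumulative cases, then converts the parabola's roots $C_- < C_+$ into the logistic growth rate `r` and final size `K` of the shifted quantity $X = C - C_-$.

```python
import numpy as np

def fit_parabola(C, I):
    """Least-squares fit I ~ a2*C**2 + a1*C + a0, returned as the logistic
    parameters (r, K, c_minus) implied by the parabola's two real roots."""
    a2, a1, a0 = np.polyfit(C, I, 2)  # coefficients, highest power first
    if a2 >= 0:
        raise ValueError("parabola must open downward (a2 < 0)")
    if a1**2 - 4.0 * a2 * a0 < 0:
        raise ValueError("parabola has no real roots")
    c_minus, c_plus = np.sort(np.roots([a2, a1, a0]).real)
    K = c_plus - c_minus  # final size of X = C - c_minus
    r = -a2 * K           # logistic growth rate
    return r, K, c_minus

def logistic_cumulative(t, r, K, c_minus, C0):
    """Cumulative cases C(t) from the logistic solution for X = C - c_minus,
    starting from C(0) = C0 (requires C0 > c_minus)."""
    X0 = C0 - c_minus
    return c_minus + K / (1.0 + (K / X0 - 1.0) * np.exp(-r * t))
```

With noiseless synthetic data such as `I = 0.5 * C * (1 - C / 1000)`, the fit recovers `r` close to 0.5 and `K` close to 1000. Real weekly data are noisy and only approximate $dC/dt$, so the recovered values should be read as estimates.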
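EpiGro v.2.0 fits outbreak epidemiological data to the ICC curve of the SIR model. The exact formulation, derived in Lega (2020), is given by

$$ I = \beta\,(C + \kappa)\left(1 - \frac{C}{N}\right) + \frac{\beta}{R_0}\,(N - C)\,\ln\!\left(1 - \frac{C}{N}\right), $$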
where I is incidence, β is the contact rate of the disease, C is the cumulative number of cases, N is the size of the population, R0 is the basic reproductive number, and κ represents initial conditions.
The following results are also established in Lega (2020).
Due to its equivalence with the SIR model, EpiGro v.2.0 is a mechanistic approach that fits a SIR model to outbreak data.
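To connect v.2.0 with the parabola used in v.1.0, one can expand the ICC curve for small $C/N$. This expansion is offered as an illustration and assumes the ICC curve takes the form obtained by eliminating time from the SIR equations with $C = N - S$ and $I = -dS/dt$:

$$ I = \beta\,(C + \kappa)\left(1 - \frac{C}{N}\right) + \frac{\beta}{R_0}\,(N - C)\,\ln\!\left(1 - \frac{C}{N}\right). $$

Using $\ln(1 - x) = -x - x^2/2 + O(x^3)$ with $x = C/N$,

$$ I = \beta\kappa + \beta\left(1 - \frac{1}{R_0} - \frac{\kappa}{N}\right) C - \frac{\beta}{N}\left(1 - \frac{1}{2R_0}\right) C^2 + O\!\left(\frac{\beta\,C^3}{N^2}\right), $$

so while $C \ll N$ the ICC curve is, up to cubic corrections, a parabola in $C$ that opens downward whenever $R_0 > 1/2$.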
The MATLAB codes provided for version 2.0 of EpiGro find the ICC curve associated with user-provided epidemiological data, estimate ranges of suitable parameter values in the presence of reporting noise, and provide a method to find a range of values of N when the population size is unknown. Simple forecasting based on a fit of the ICC curve to the data is also discussed.
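The sketch below illustrates one way to fit the ICC curve and produce a simple forecast in Python/NumPy, assuming N is known. It is not the EpiGro v.2.0 MATLAB code: the grid search over $R_0$, the function names, and the RK4 integrator are choices made for this illustration. It relies on the fact that, for fixed N and $R_0$, the ICC curve $I = \beta(C + \kappa)(1 - C/N) + (\beta/R_0)(N - C)\ln(1 - C/N)$ is linear in $\beta$ and $\beta\kappa$, so each candidate $R_0$ needs only one linear least-squares solve. The forecast integrates $dC/dt = I(C)$ forward from the last observed cumulative count.

```python
import numpy as np

def icc(C, beta, kappa, N, R0):
    """SIR ICC curve: incidence as a function of cumulative cases C (C < N)."""
    s = 1.0 - C / N
    return beta * (C + kappa) * s + (beta / R0) * (N - C) * np.log(s)

def fit_icc(C, I, N, R0_grid):
    """Fit beta, kappa and R0 for a known population size N.

    For fixed R0 the curve is linear in (beta, beta*kappa), so each grid value
    of R0 requires one linear least-squares solve; the R0 with the smallest
    residual sum of squares is kept."""
    s = 1.0 - C / N
    best = None
    for R0 in R0_grid:
        # Columns multiplying beta and beta*kappa, respectively.
        A = np.column_stack([C * s + (N / R0) * s * np.log(s), s])
        coef, *_ = np.linalg.lstsq(A, I, rcond=None)
        rss = float(np.sum((A @ coef - I) ** 2))
        if best is None or rss < best[0]:
            best = (rss, coef[0], coef[1] / coef[0], R0)
    _, beta, kappa, R0 = best
    return beta, kappa, R0

def forecast(C0, days, beta, kappa, N, R0, steps_per_day=10):
    """Integrate dC/dt = ICC(C) with classical RK4; return C at each whole day."""
    h = 1.0 / steps_per_day

    def f(c):
        return icc(c, beta, kappa, N, R0)

    C, out = float(C0), [float(C0)]
    for _ in range(days):
        for _ in range(steps_per_day):
            k1 = f(C)
            k2 = f(C + 0.5 * h * k1)
            k3 = f(C + 0.5 * h * k2)
            k4 = f(C + h * k3)
            C += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        out.append(C)
    return np.array(out)
```

Because $dC/dt = I(C)$ has a stable equilibrium at the final outbreak size, where the ICC curve returns to zero, such forecasts level off rather than growing without bound. With noisy data, several grid values of $R_0$ may fit almost equally well, which is why ranges of suitable parameter values are more informative than a single best fit.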
EpiGro v.3.0, or EpiCovDA, combines variational data assimilation methods with the exact formulation of the SIR ICC curve to provide forecasts for ongoing outbreaks. Details will be provided in Biegel & Lega (2020). The model assumes that current interventions (such as social-distancing measures or stay-at-home orders) will remain in effect for at least four weeks after the forecasts are made.
Priors are found by processing the early stages of the outbreak data with EpiGro v.2.0. The data assimilation step identifies parameters by minimizing a cost function that combines the distance from prior values with the distance between the parametrized ICC curves and data points collected over the last 3 to 14 days. Forecasts are obtained by integrating ICC curves for parameter values in the posterior distribution, and the results are augmented by resampling with a normal distribution.
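The following Python/NumPy sketch illustrates the structure of such a variational step under explicit assumptions: parameters $\theta = (\beta, R_0, \kappa)$ with N known, independent Gaussian prior and observation errors (so the cost is a sum of squared standardized residuals), a basic Levenberg-Marquardt minimizer, a Laplace (Gaussian) approximation of the posterior, and forward-Euler integration of $dC/dt = I(C)$ for posterior draws. All names are hypothetical, and EpiCovDA's actual cost function, weights, optimizer and normal resampling step (omitted here) may differ.

```python
import numpy as np

def icc(C, beta, R0, kappa, N):
    """SIR ICC curve: incidence as a function of cumulative cases C (C < N)."""
    s = 1.0 - C / N
    return beta * (C + kappa) * s + (beta / R0) * (N - C) * np.log(s)

def residuals(theta, C_obs, I_obs, N, prior, prior_sd, obs_sd):
    """Standardized residuals whose squared norm is the cost: distance of
    theta = (beta, R0, kappa) from the prior plus misfit to recent data."""
    beta, R0, kappa = theta
    r_prior = (theta - prior) / prior_sd
    r_data = (I_obs - icc(C_obs, beta, R0, kappa, N)) / obs_sd
    return np.concatenate([r_prior, r_data])

def cost(theta, *args):
    r = residuals(theta, *args)
    return float(r @ r)

def jacobian(theta, args):
    """Residual vector and its forward-difference Jacobian at theta."""
    r0 = residuals(theta, *args)
    J = np.empty((r0.size, theta.size))
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = 1e-6 * max(1.0, abs(theta[j]))
        J[:, j] = (residuals(theta + step, *args) - r0) / step[j]
    return r0, J

def assimilate(prior, args, iters=50):
    """Minimize the cost with a basic Levenberg-Marquardt loop; return the
    minimizer and a Laplace approximation of the posterior covariance."""
    theta, lam = np.array(prior, dtype=float), 1e-3
    for _ in range(iters):
        r, J = jacobian(theta, args)
        A, g = J.T @ J, J.T @ r
        delta = np.linalg.solve(A + lam * np.diag(np.diag(A)), -g)
        if cost(theta + delta, *args) < float(r @ r):
            theta, lam = theta + delta, lam / 10.0  # accept, trust the model more
        else:
            lam *= 10.0                             # reject, damp the next step
    _, J = jacobian(theta, args)
    cov = np.linalg.inv(J.T @ J)
    return theta, 0.5 * (cov + cov.T)

def forecast_ensemble(theta, cov, C_last, N, days, n_samples=200, seed=0):
    """Integrate dC/dt = ICC(C) (forward Euler, quarter-day steps) for
    parameter draws from the Gaussian posterior; shape (n_samples, days + 1)."""
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(theta, cov, size=n_samples)
    h, steps = 0.25, 4
    C = np.full(n_samples, float(C_last))
    out = [C.copy()]
    for _ in range(days):
        for _ in range(steps):
            C = C + h * icc(C, draws[:, 0], draws[:, 1], draws[:, 2], N)
        out.append(C.copy())
    return np.array(out).T
```

Here `C_obs` and `I_obs` would hold the 3 to 14 most recent data points, and `prior` would come from an EpiGro v.2.0 fit to the early part of the outbreak; percentiles of the returned ensemble taken over samples (axis 0) give per-day forecast intervals.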
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
See LICENSE.txt in this repository for additional information.