Introduction: OVERSIMPLIFIED COVID19 CURVE FIT
In this Instructable, I will explain how you can build a very simple model to describe COVID19 cases. This is not a science project and is not suitable for planning purposes, but it will show how statistical models work. You can easily use this method to fit other nonlinear functions to data sets.
When you fit mathematical functions to a data set, like COVID19 cases, you should remember that your simple model will not take into account:
- Future changes in human behavior
- Stricter or looser government regulations
- Future availability of vaccines
You will need Microsoft Excel with the Solver add-in.
The following are valuable resources for time series data on COVID19 cases:
Step 1: The Logistic Function
The number of cases (tested infections), deaths and recoveries in pandemics like this can normally be modeled with a logistic function. There are a few other functions that can be used as well. Most of them have a sigmoid or S-shape.
The logistic function initially follows an exponential function closely. It then reaches a point in time where the number of additional cases per day is at its maximum (the inflection point). After this point, the curve starts to flatten and levels off toward its maximum value.
More about the logistic function is available at https://en.wikipedia.org/wiki/Logistic_function
The logistic function is described by
f(t) = L / (1 + e^(-k(t - i)))
Where
- L = the maximum number of cases
- e = Euler's number, the base of the natural logarithm (e is approximately 2.718281828459)
- k = the logistic growth rate, or steepness of the curve
- i = the value of t at which f(t) reaches the inflection point
Mathematically, the inflection point is the point where the rate of change of f(t) is at its maximum, in other words where f'(t) (the slope of f(t)) is at its maximum.
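To make the formula more concrete, here is a minimal Python sketch of the logistic function and its slope. The parameter values are only picked for illustration (they are not fitted to any real data), and the slope uses the standard identity f'(t) = k * f(t) * (1 - f(t)/L) for the logistic function.

```python
import math

def logistic(t, L, k, i):
    """Logistic function f(t) = L / (1 + e^(-k(t - i)))."""
    return L / (1.0 + math.exp(-k * (t - i)))

def logistic_slope(t, L, k, i):
    """Slope f'(t) = k * f(t) * (1 - f(t) / L)."""
    f = logistic(t, L, k, i)
    return k * f * (1.0 - f / L)

# Hypothetical parameters, chosen only for illustration.
L, k, i = 9000.0, 0.5, 20.0

# At t = i the curve is exactly halfway to its maximum L.
halfway = logistic(i, L, k, i)

# The slope is largest at the inflection point t = i.
steepest_day = max(range(41), key=lambda t: logistic_slope(t, L, k, i))

print("f(i) =", halfway)
print("day with the steepest slope =", steepest_day)
```

With these assumed values the curve reaches half of L at t = i, and the day with the steepest slope is the same day.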
Step 2: DATA OF SOUTH KOREA
In this example, I will use the data of South Korea.
It is also attached in this Excel sheet.
We are going to use the Method of Least Squares to fit a curve (the Logistic Function) to the South Korea time series data set.
Open the attached sheet.
In the upper left corner you will see the PARAMETERS of the Logistic Function (L, k, and I). Here I is the inflection point, written as i in Step 1.
You will also see the columns:
- t for time
- CASES is the reported COVID19 CASES
- f(t) estimated is the Logistic Curve calculated with the selected parameters for each date (value of t)
- SQUARE ERRORS (or squared residuals) are calculated as (f(t) estimated - CASES)^2
- In Cell E55 you will see the SSE (SUM OF SQUARE ERRORS) value. It is the sum of column E
How do we decide how the curve fit should be done?
Intuitively you may decide that a good fit should minimize the vertical distance between your curve and the actual CASES for each specific day. This distance can be calculated as f(t) estimated minus CASES. It is not practical to find a curve that minimizes the vertical distance between each point (t, CASES) and (t, f(t) estimated) individually.
So you may decide to sum up all these distances and try to minimize the sum. The problem is that negative values will cancel positive values, since the fit may be below or above the actual cases.
You will therefore square the distances. This is what is done in the SQUARE ERRORS column. You will then sum these values to get the SSE (Cell E55).
It is clear that f(t) estimated, SQUARE ERRORS and SSE all depend on your selection of L, k and I.
As a matter of fact, L, k and I should be selected to minimize the SSE (Cell E55).
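The same calculation can be sketched outside Excel. The Python below is only an illustration of the idea: the data is made up by evaluating the logistic function itself (it is NOT the South Korea data), and the function names are my own.

```python
import math

def logistic(t, L, k, i):
    """Logistic function f(t) = L / (1 + e^(-k(t - i)))."""
    return L / (1.0 + math.exp(-k * (t - i)))

def sse(L, k, i, days, cases):
    """Sum of (f(t) estimated - CASES)^2 over all days."""
    return sum((logistic(t, L, k, i) - c) ** 2 for t, c in zip(days, cases))

# Why square? Errors of +5 and -5 cancel when added, but not when squared.
errors = [5.0, -5.0]
plain_sum = sum(errors)                     # 0.0
squared_sum = sum(e ** 2 for e in errors)   # 50.0

# MADE-UP data for illustration: generated from the curve itself.
days = list(range(10))
cases = [logistic(t, 100.0, 1.0, 5.0) for t in days]

perfect_fit = sse(100.0, 1.0, 5.0, days, cases)   # same parameters, so 0.0
worse_fit = sse(120.0, 1.0, 5.0, days, cases)     # wrong L, so larger SSE

print(plain_sum, squared_sum, perfect_fit, worse_fit)
```

The closer the parameters are to the ones that produced the data, the smaller the SSE becomes, which is why it makes a useful target to minimize.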
Step 3: USING EXCEL SOLVER TO MINIMIZE SSE
We are now going to use the Excel Solver to select the appropriate values for L, k and I that minimize the SSE. Please check the appropriate video.
The initial parameters L=9000, k=0.5, I=20 do not provide a good fit. With these values the SSE = 75 996 559. The Excel Solver will optimize the values.
After the Solver has optimized the curve (minimized the SSE), the new SSE = 4 017 028
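If you are curious what "minimizing the SSE" looks like in code, here is a rough Python sketch. It does NOT reproduce the Excel Solver's own algorithm. Instead it uses a very simple pattern search that I chose for illustration: nudge one parameter at a time, keep the change only if the SSE drops, and halve the step sizes when nothing helps. The data is synthetic (generated from assumed parameters), not the South Korea data, and the step sizes are assumptions as well.

```python
import math

def logistic(t, L, k, i):
    """Logistic function f(t) = L / (1 + e^(-k(t - i)))."""
    return L / (1.0 + math.exp(-k * (t - i)))

def sse(params, days, cases):
    """Sum of squared errors between the curve and the cases."""
    L, k, i = params
    return sum((logistic(t, L, k, i) - c) ** 2 for t, c in zip(days, cases))

def pattern_search(objective, start, steps, min_step=1e-6, max_sweeps=100000):
    """Minimize objective by changing one parameter at a time.

    A move is kept only if it lowers the objective. When no move helps,
    all step sizes are halved. Stops when the first step is below min_step.
    """
    best = list(start)
    best_value = objective(best)
    steps = list(steps)
    for _ in range(max_sweeps):
        if steps[0] < min_step:
            break
        improved = False
        for j in range(len(best)):
            for direction in (1.0, -1.0):
                trial = list(best)
                trial[j] += direction * steps[j]
                value = objective(trial)
                if value < best_value:
                    best, best_value = trial, value
                    improved = True
                    break
        if not improved:
            steps = [s / 2.0 for s in steps]
    return best, best_value

# SYNTHETIC data from assumed "true" parameters; NOT real case counts.
true_params = (8000.0, 0.3, 25.0)
days = list(range(60))
cases = [logistic(t, *true_params) for t in days]

start = [9000.0, 0.5, 20.0]  # same starting guess as in the text
start_sse = sse(start, days, cases)
fitted, fitted_sse = pattern_search(
    lambda p: sse(p, days, cases), start, steps=[1000.0, 0.1, 5.0]
)

print("SSE before:", start_sse)
print("SSE after: ", fitted_sse)
print("L, k, i =", fitted)
```

Because the synthetic data has no noise, the search can bring the SSE down to almost zero. With real data, as in the Excel sheet, the SSE drops but stays well above zero.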
You can now use this model (technique) for your own country. You can also use other functions, like the Gompertz function, with this technique.
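As a starting point for trying another curve, here is a small Python sketch of the Gompertz function in one of its common forms, f(t) = L * e^(-b * e^(-k t)). The parameter names b and k and the values below are my own assumptions for illustration. The SSE helper takes the model as an argument, so the same least squares idea works for any curve.

```python
import math

def gompertz(t, L, b, k):
    """Gompertz curve f(t) = L * e^(-b * e^(-k t))."""
    return L * math.exp(-b * math.exp(-k * t))

def sse(model, params, days, cases):
    """Sum of squared errors for any model called as model(t, *params)."""
    return sum((model(t, *params) - c) ** 2 for t, c in zip(days, cases))

# Hypothetical parameters, chosen only for illustration.
L, b, k = 9000.0, 20.0, 0.2

# The Gompertz curve has its inflection point at t = ln(b) / k,
# where the value is L / e (not L / 2 as for the logistic function).
inflection_t = math.log(b) / k
value_at_inflection = gompertz(inflection_t, L, b, k)

# For large t the curve levels off at L.
late_value = gompertz(200.0, L, b, k)

# MADE-UP data generated from the curve itself, so the SSE is zero.
days = list(range(50))
cases = [gompertz(t, L, b, k) for t in days]
perfect_fit = sse(gompertz, (L, b, k), days, cases)

print(inflection_t, value_at_inflection, late_value, perfect_fit)
```

Unlike the logistic function, the Gompertz curve is not symmetric around its inflection point, so it may suit data that flattens more slowly than it rose.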