Table 1 Models for non-count data adapted for count data and their limitations

From: Forecasting emergency department arrivals using INGARCH models

| Model | Advantages | Disadvantages |
| --- | --- | --- |
| Normal linear regression: \(y=x\beta +\epsilon\), \(\epsilon \sim N(0,\sigma^{2})\) | Normal distribution approximates the Poisson distribution if the mean is higher than 20 | No possible inference on single outcomes. The model allows for a negative outcome. The prediction is not coherent, i.e., the forecast is not an integer-valued outcome |
| Log-linear model: \(\log(y)=x\beta +\epsilon\), \(\epsilon \sim N(0,\sigma^{2})\) | The variable y is modelled as a log-normal variable | The zeros in the data have to be deleted to estimate this model, which leads to endogenous sample selection problems. The prediction is not coherent, i.e., the forecast is not an integer-valued outcome. There is a restriction on the conditional variance, i.e., it must be quadratic in the conditional expectation.* |
| Log-linear model with constant c to deal with zeros: \(\log(y+c)=x\beta +\epsilon\), \(\epsilon \mid x\sim N(0,\sigma^{2})\) | The model can be estimated even if there are zero elements in the dataset | The log(y) is not linear in x, which introduces bias in the estimation of the model. The prediction is not coherent, i.e., the forecast is not an integer-valued outcome |
| Non-linear model: \(y=\exp(x\beta)+\epsilon\), \(\epsilon \sim N(0,\sigma^{2})\) | There is no problem in dealing with zero values | The model allows for a negative outcome. The prediction is not coherent, i.e., the forecast is not an integer-valued outcome |
| Ordered probit and logit. State equation: \(y^{*}=x\beta +\epsilon\). Observation equation: \(y=0\;\text{if}\;y^{*}<\alpha_{0}\); \(y=1\;\text{if}\;\alpha_{0}\le y^{*}<\alpha_{1}\); \(y=2\;\text{if}\;\alpha_{1}\le y^{*}<\alpha_{2}\); \(\vdots\) | The integer-valued structure of the data is considered. The prediction can be coherent, i.e., if we wanted to forecast the future median value, it would be an integer-valued outcome | The underlying count process is not reflected. The forecast is limited to values already observed in the data. Complexity is excessive when the number of counts is high |
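
The disadvantages listed for the normal linear and log-linear models can be seen on a small example. The sketch below is only an illustration: the toy data and the helper `ols_fit` are hypothetical and do not come from the article. It fits a straight line to low counts, which yields negative and non-integer forecasts, and then fits the log-linear model, which can only use the non-zero observations.

```python
import math

def ols_fit(x, y):
    """Closed-form simple least squares; returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical toy data: one covariate and low counts, including zeros.
x = [0, 1, 2, 3, 4, 5]
y = [0, 0, 1, 1, 3, 6]

# Normal linear regression fitted directly to the counts.
a, b = ols_fit(x, y)
linear_pred = [a + b * xi for xi in x]

# Log-linear model: log(0) is undefined, so zero counts must be dropped.
kept = [(xi, yi) for xi, yi in zip(x, y) if yi > 0]
a_log, b_log = ols_fit([xi for xi, _ in kept],
                       [math.log(yi) for _, yi in kept])
# Naive back-transform; it ignores the log-normal mean correction.
loglinear_pred = [math.exp(a_log + b_log * xi) for xi in x]

print("linear forecasts:    ", [round(p, 2) for p in linear_pred])
print("log-linear forecasts:", [round(p, 2) for p in loglinear_pred])
print("observations dropped by the log-linear fit:", len(x) - len(kept))
```

On this toy data the linear fit gives a negative forecast at the smallest covariate value, and neither model returns integer-valued forecasts. The log-linear forecasts are always positive, but they are estimated from a reduced sample.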

*If a variable y follows a log-normal distribution, the following identity holds: \(\mathrm{Var}\left(y|x\right)=\left({e}^{{\sigma }^{2}}-1\right){\left[E\left(y|x\right)\right]}^{2}\)
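
The footnote identity can be checked numerically. The following is a minimal Monte Carlo sketch, assuming hypothetical values for the linear predictor (`mu`, standing in for \(x\beta\)) and for `sigma`. It draws log-normal values with the standard-library function `random.lognormvariate` and compares the ratio of the sample variance to the squared sample mean with \(e^{\sigma^{2}}-1\).

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical values: mu plays the role of x*beta, sigma is the error s.d.
mu, sigma = 1.0, 0.5
n = 200_000

# y = exp(mu + eps) with eps ~ N(0, sigma^2), i.e. y is log-normal.
draws = [random.lognormvariate(mu, sigma) for _ in range(n)]

mean = sum(draws) / n
variance = sum((d - mean) ** 2 for d in draws) / n

empirical_ratio = variance / mean ** 2         # Var(y|x) / [E(y|x)]^2
theoretical_ratio = math.exp(sigma ** 2) - 1   # e^{sigma^2} - 1

print(f"empirical ratio:   {empirical_ratio:.4f}")
print(f"theoretical ratio: {theoretical_ratio:.4f}")
```

The two ratios should agree up to simulation noise, and the theoretical ratio does not depend on `mu`. This is the sense in which the conditional variance is restricted to be quadratic in the conditional expectation.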