
Even when using the same parameters (such as the values of p, d, q), the results can be drastically different in R and Python. #957

Closed
kestlermai opened this issue Apr 29, 2024 · 3 comments

kestlermai commented Apr 29, 2024

R 4.2.1; forecast 8.22.0:

library(forecast)  # loaded per the versions stated above; formats the summary below
fit <- arima(train_data, order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))
summary(fit)

Series: train_data
ARIMA(0,1,1)(0,1,1)[12]

Coefficients:
          ma1    sma1
       -0.193  -0.791
s.e.    0.091   0.084

sigma^2 = 181: log likelihood = 37.83
AIC=-69.66 AICc=-69.45 BIC=-61.32

Python 3.11; statsmodels 0.14.1:

from statsmodels.tsa.statespace.sarimax import SARIMAX

model = SARIMAX(train_data['incidence'], order=(0, 1, 1), seasonal_order=(0, 1, 1, 12))
result = model.fit()
print(result.summary())

                                     SARIMAX Results
==========================================================================================
Dep. Variable:                          incidence   No. Observations:                  132
Model:             SARIMAX(0, 1, 1)x(0, 1, 1, 12)   Log Likelihood                 -99.484
Date:                            Mon, 29 Apr 2024   AIC                            204.969
Time:                                    23:46:06   BIC                            213.306
Sample:                                         0   HQIC                           208.354
                                            - 132
Covariance Type:                              opg
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ma.L1         -0.6900      0.048    -14.322      0.000      -0.784      -0.596
ma.S.L12      -0.8250      0.102     -8.081      0.000      -1.025      -0.625
sigma2         0.2766      0.019     14.838      0.000       0.240       0.313
===================================================================================
Ljung-Box (L1) (Q):                   0.73   Jarque-Bera (JB):               438.41
Prob(Q):                              0.39   Prob(JB):                         0.00
Heteroskedasticity (H):               1.21   Skew:                            -0.82
Prob(H) (two-sided):                  0.56   Kurtosis:                        12.26

Using the same parameters in two different software packages gives drastically different reported model fits.
For example, in R: log likelihood = 37.83, AIC = -69.66; while in Python: log likelihood = -99.484, AIC = 204.969.
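
Both AIC values are internally consistent with the reported log-likelihoods under the standard formula AIC = 2k - 2 log L, with k = 3 estimated parameters here (ma1, sma1, sigma^2), so the gap comes from the likelihood values themselves, not from the AIC arithmetic. A quick check in Python:

def aic(loglik, k=3):
    # Standard definition: AIC = 2k - 2*logL.
    return 2 * k - 2 * loglik

print(aic(37.83))    # R:      -69.66
print(aic(-99.484))  # Python: 204.968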

Can you help me?

robjhyndman (Owner) commented

  1. I don't know what objective function is used by statsmodels, but even if the docs say it is maximum likelihood, there are many variations. R uses a state space representation with a diffuse prior, as explained in the documentation for stats::arima(): https://rdrr.io/r/stats/arima.html. Other objective functions may yield different results. See https://robjhyndman.com/hyndsight/estimation/
  2. Whatever objective function is used, it will contain local optima, and there is no guarantee that the software finds the global optimum. See https://rjournal.github.io/articles/RN-2002-007/
  3. The AIC/BIC/etc. depend on the likelihood, so different likelihood functions lead to different information criteria. Even with the same likelihood function, some software implementations omit the constant in the calculation. See https://robjhyndman.com/hyndsight/lm_aic.html
  4. The best Python implementation of ARIMA models that I know of is provided by StatsForecast: https://nixtlaverse.nixtla.io/statsforecast/src/core/models.html#arima (see the sketch below).
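
For concreteness, here is a minimal sketch of fitting the same ARIMA(0,1,1)(0,1,1)[12] with StatsForecast's model-level API; the synthetic series is only a placeholder for the original train_data['incidence']:

import numpy as np
from statsforecast.models import ARIMA

# Placeholder standing in for the 132 monthly observations in the question.
rng = np.random.default_rng(0)
y = rng.normal(size=132).cumsum() + 10.0

# Same orders as above: (p,d,q) = (0,1,1) and seasonal (0,1,1) with period 12.
model = ARIMA(order=(0, 1, 1), season_length=12, seasonal_order=(0, 1, 1))
model.fit(y=y)
print(model.predict(h=12)['mean'])  # 12-step-ahead point forecasts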

kestlermai commented May 2, 2024

Thank you very much for your reply.
When I tried using StatsForecast to build the ARIMA model, the results still differed significantly from those obtained in R.
With the same parameters {order=(0, 1, 1), season_length=12, seasonal_order=(0, 1, 1)}, the MAPE is 4.922 in R and 14.463 in Python.
Could this be attributed to differences in the software algorithms?
Anyway, thank you very much for your help.

robjhyndman (Owner) commented

A MAPE difference that large suggests something's gone wrong in the Python model.
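
One mismatch worth ruling out first is the metric itself: score both forecast vectors with a single MAPE implementation on the same hold-out sample. A minimal sketch, assuming both forecast vectors are at hand (the arrays below are placeholders, not the real data):

import numpy as np

def mape(actual, forecast):
    # Mean absolute percentage error, in percent.
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))

# Placeholder hold-out sample and the two packages' forecasts.
actual = np.array([10.0, 12.0, 11.0])
print(mape(actual, np.array([9.8, 12.3, 10.9])))  # forecasts from R
print(mape(actual, np.array([8.0, 14.0, 12.5])))  # forecasts from Python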
