The general steps to implement an ARIMA model are:
1. Load the data: The first step in model building is to load the dataset.
2. Preprocessing: The preprocessing steps depend on the dataset. They typically include creating timestamps, converting the dtype of the date/time column, and making the series univariate.
3. Make series stationary: ARIMA assumes a stationary series, so check the stationarity of the series and apply the required transformations (for example, differencing).
4. Determine d value: The number of times the difference operation was performed to make the series stationary is taken as the d value.
5. Create ACF and PACF plots: This is the most important step in ARIMA implementation. The ACF and PACF plots are used to determine the input parameters for our ARIMA model.
6. Determine the p and q values: Read the values of p and q from the plots in the previous step (typically p from the PACF plot and q from the ACF plot).
7. Fit ARIMA model: Using the processed data and the parameter values determined in the previous steps, fit the ARIMA model.
8. Predict values on validation set: Use the fitted model to forecast the period covered by the validation set.
9. Calculate RMSE: To check the performance of the model, compute the RMSE between the predictions and the actual values on the validation set.
Although ARIMA is a powerful model for forecasting time series data, the data preparation and parameter tuning are time-consuming. Before implementing ARIMA, you need to make the series stationary and determine the values of p and q using the plots we discussed above. Auto ARIMA simplifies this task because it eliminates steps 3 to 6 we saw in the previous section. Below are the steps you should follow for implementing auto ARIMA:
1. Load the data: This step is the same as before. Load the data into your notebook.
2. Preprocessing data: The input should be univariate, so drop the other columns.
3. Fit Auto ARIMA: Fit the model on the univariate series.
4. Predict values on validation set: Make predictions on the validation set.
5. Calculate RMSE: Check the performance of the model by comparing the predicted values against the actual values.