The Super AI Engineer 2023 (South) competition focuses on building an Artificial Intelligence (AI) system that forecasts stock prices. Participants are challenged to create a model that predicts stock prices and to determine which features contribute most to accurate predictions. The models are built from a combination of lag data, the mean of lag data, the standard deviation of lag data, and the residuals of ARIMA models. The end goal is a Long Short-Term Memory (LSTM) model that forecasts stock prices for the next 300 days.
To tackle the stock investment prediction challenge, participants develop an AI model that draws on several features derived from the price history. The main inputs are historical stock price data, lag data together with its mean and standard deviation, and the residuals obtained from fitting an ARIMA (AutoRegressive Integrated Moving Average) model.
Lag data refers to historical stock prices from previous time periods: for the price on day t, the lag-k value is the price on day t-k. By incorporating lag data into the AI model, participants can capture information about the stock's past performance, including patterns, trends, and potential indicators of future price movements.
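As a minimal sketch of how lag features might be built, the snippet below uses pandas `shift`. The function name `add_lag_features`, the column names, the chosen lags, and the toy prices are all assumptions for illustration, not part of the competition materials.

```python
import pandas as pd

def add_lag_features(prices: pd.Series, lags=(1, 2, 3)) -> pd.DataFrame:
    """Return a frame with the current price and one column per lag (hypothetical helper)."""
    frame = pd.DataFrame({"close": prices})
    for k in lags:
        # shift(k) moves values down by k rows, so row t holds the price from t - k.
        frame[f"lag_{k}"] = prices.shift(k)
    # The first max(lags) rows have no complete history and are dropped.
    return frame.dropna()

# Toy prices, invented for the example.
prices = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])
features = add_lag_features(prices)
print(features)
```

Each remaining row pairs a day's price with the prices from one, two, and three days earlier, which is the form a supervised model can train on.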
The mean of lag data is the average of the stock's prices over a set of previous time periods, for example the last several days. This statistical measure captures the central tendency of the stock's recent performance. By including the mean of lag data as a feature, the AI model can take the stock's average behavior into account when making predictions.
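One plausible way to compute this feature is a rolling mean over lagged prices. The sketch below assumes a window of 3 days and toy prices. The source does not specify a window length.

```python
import pandas as pd

# Toy prices and window length are assumptions for illustration.
prices = pd.Series([10.0, 12.0, 11.0, 13.0, 15.0, 14.0])
window = 3

# shift(1) excludes the current day, so each mean uses only past prices.
lag_mean = prices.shift(1).rolling(window).mean()
print(lag_mean)
```

Shifting before averaging keeps the current day's price out of its own feature, which avoids leaking the target into the inputs.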
The standard deviation of lag data quantifies the dispersion or volatility of the stock prices from previous time periods. By incorporating this feature, the AI model can capture the level of uncertainty or stability in the stock's historical performance. This information can be crucial in predicting potential price fluctuations and assessing risk.
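A rolling standard deviation over lagged prices is one way this feature could be computed. The window of 3 and the toy prices below are assumptions. Note that pandas uses the sample standard deviation (`ddof=1`) by default.

```python
import pandas as pd

# Toy prices and window length are assumptions for illustration.
prices = pd.Series([10.0, 12.0, 11.0, 13.0, 15.0, 14.0])
window = 3

# shift(1) keeps the current day out of its own volatility estimate.
# rolling(...).std() uses ddof=1 (sample standard deviation) by default.
lag_std = prices.shift(1).rolling(window).std()
print(lag_std)
```

A larger value indicates that the preceding days' prices were more spread out, which a model may read as higher recent volatility.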
ARIMA models are widely used in time series analysis to predict future values based on historical data patterns. In this competition, participants extract the residuals of ARIMA models as an additional feature for their AI models. The residuals represent the differences between the observed values and the predictions made by the ARIMA model. By including these residuals, the AI model can account for any remaining unexplained variability in the stock price data.
Once the essential features are obtained, participants are tasked with building an LSTM model, a type of recurrent neural network (RNN), for stock price forecasting. LSTM models are well suited to time series data because they can capture long-term dependencies and retain important patterns. By training the LSTM model on the provided dataset, participants aim to predict the next day's stock price and then extend the forecasting horizon to 300 days.
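The source does not say how the one-day prediction is extended to 300 days. A common approach is recursive forecasting, where each prediction is fed back in as input. The sketch below shows that loop, along with the windowing that shapes a series into the (samples, timesteps, features) layout LSTM layers typically expect. The names `make_windows`, `recursive_forecast`, and `naive_model` are hypothetical, and `naive_model` is only a stand-in for a trained LSTM's predict call.

```python
import numpy as np

def make_windows(series, window):
    """Slice a 1-D series into (samples, window, 1) inputs and next-step targets."""
    series = np.asarray(series, dtype=float)
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X[..., np.newaxis], y

def recursive_forecast(predict_next, history, window, horizon):
    """Forecast `horizon` steps by feeding each prediction back in as input."""
    buffer = list(np.asarray(history, dtype=float)[-window:])
    forecasts = []
    for _ in range(horizon):
        x = np.array(buffer[-window:]).reshape(1, window, 1)
        next_value = float(predict_next(x))
        forecasts.append(next_value)
        buffer.append(next_value)
    return np.array(forecasts)

def naive_model(x):
    """Hypothetical stand-in for a trained LSTM: predicts the window mean."""
    return x.mean()

# Toy linear prices, invented for the example.
prices = np.linspace(100.0, 120.0, 50)
X, y = make_windows(prices, window=10)
forecast = recursive_forecast(naive_model, prices, window=10, horizon=300)
print(X.shape, y.shape, forecast.shape)
```

Because every step after the first builds on earlier predictions, errors can compound over a 300-day horizon. Training a model to output several future steps directly is an alternative design.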