Machine Learning Solutions

Implementing the revised approach

This section covers the implementation in three parts:

  • Implementation
  • Testing the revised approach
  • Understanding the problem with the revised approach

Implementation

Here, we implement the following three steps:

  • Alignment
  • Smoothing
  • Logistic Regression

We have already discussed the approach and key concepts, so here we focus only on the code. You can find all the code at this GitHub link: https://github.com/jalajthanaki/stock_price_prediction/blob/master/Stock_Price_Prediction.ipynb.

Implementing alignment

The alignment is performed on the testing dataset. You can refer to the following code snippet:

Figure 2.30: Code snippet for alignment on the test dataset

As you can see in the preceding code snippet, we align the test data by computing an offset over a 10-day window: the difference between the average adj close price of the last 5 days and the average predicted price of the upcoming 5 days. Here, we also convert the date from a string into a date object. In this case, the difference in the test prediction price is 5096.99, which we add to our predicted adj close price values. We then generate the graph again to check that the alignment approach works as intended. You can refer to the following code snippet:

Figure 2.31: Code snippet of the graph for the alignment approach

As you can see in the preceding graph, our testing dataset prices and predicted prices are now aligned. With the aligned graph, we can state more precisely that RandomForestRegressor did not achieve high accuracy: its predictions were not accurate across all data records. The alignment graph gives us a clear picture of our previous iteration. So, when we train the logistic regression model, we will evaluate its predicted prices using alignment.
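The alignment step described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: the function name `alignment_offset`, the sign convention (actual minus predicted), and the sample prices are all assumptions made for this example.

```python
import numpy as np

def alignment_offset(actual, predicted, window=5):
    """Return the mean of the last `window` actual prices minus
    the mean of the first `window` predicted prices.

    Hypothetical helper: the name and sign convention are assumptions.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return actual[-window:].mean() - predicted[:window].mean()

# Made-up values for illustration only (not taken from the dataset).
recent_actual = [100.0, 102.0, 101.0, 103.0, 104.0]
predicted = np.array([90.0, 91.0, 92.0, 93.0, 94.0, 95.0])

# Shift every prediction by the same offset so both curves start level.
offset = alignment_offset(recent_actual, predicted)
aligned = predicted + offset
```

Because the same constant is added to every prediction, alignment changes the level of the predicted curve but not its shape, which is why it makes the remaining shape errors easier to see.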

Implementing smoothing

We use the pandas EWMA (exponentially weighted moving average) API with a 60-day span and the frequency set to D. In pandas, the frequency alias "D" stands for calendar-day frequency, which suits the date-indexed records in our dataset. You can see the implementation in the following code snippet:

Figure 2.32: Code snippet for EWMA smoothing
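For readers on a recent pandas version, the same kind of smoothing can be sketched with `Series.ewm`, since the older top-level `pd.ewma` function is no longer available in current pandas releases. The series below is synthetic and the variable names are assumptions; only the 60-day span and the daily frequency come from the text.

```python
import numpy as np
import pandas as pd

# Synthetic daily price series, for illustration only.
dates = pd.date_range("2016-01-01", periods=120, freq="D")
rng = np.random.default_rng(0)
prices = pd.Series(100 + rng.normal(0, 5, size=120).cumsum(), index=dates)

# Exponentially weighted moving average with a 60-day span.
# The data is already at daily frequency, so no resampling is done here.
smoothed = prices.ewm(span=60).mean()
```

A larger span gives older observations more weight and produces a smoother curve; a smaller span tracks the raw prices more closely.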

We also generate a graph that plots the predicted price, average predicted price, actual price, and average actual price. You can refer to the following code and graph:

Figure 2.33: Code snippet for generating the graph after smoothing

In this graph, you can see that after smoothing, the average predicted price curve follows the actual price trend. Although the accuracy is still not great, we are moving in the right direction. The smoothing technique will also be useful when we tune our algorithm. You can refer to the following graph for the average predicted price versus actual price:

Figure 2.34: Code snippet for the graph, indicating average_predicted_price versus actual_price

As the preceding graph suggests, we apply alignment and smoothing because they help us tune our ML model in the next iteration.

Implementing logistic regression

In this section, we will be implementing logistic regression. Take a look at the following screenshot:

Figure 2.35: Code snippet for logistic regression
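As a rough sketch of what training such a model can look like with scikit-learn: `LogisticRegression` is a classifier and expects discrete targets, so this example rounds prices to integer labels before fitting. Whether the notebook handles the target the same way is an assumption, and the features and prices below are synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Synthetic features and prices, for illustration only.
X = rng.normal(size=(60, 2))
prices = 100 + 10 * X[:, 0] + rng.normal(scale=0.5, size=60)

# Assumption: treat rounded prices as class labels, because
# LogisticRegression rejects continuous target values.
y = np.round(prices).astype(int)

# Chronological split: first 50 rows for training, last 10 for testing.
X_train, X_test = X[:50], X[50:]
y_train = y[:50]

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Each prediction is one of the integer price labels seen in training.
predicted_prices = model.predict(X_test)
```

Note that a classifier can only predict labels it saw during training, which is one reason this kind of model tends to struggle with prices outside the training range.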

Here, we have trained the model again, this time using the logistic regression ML algorithm. We have also applied alignment and smoothing to the test dataset. Now, let's evaluate the logistic regression model.

Testing the revised approach

We have tested the logistic regression model. The following graphs show that this revised approach is certainly better than RandomForestRegressor (without alignment and smoothing), but it is still not up to the mark:

Figure 2.36: Year-wise prediction graph

As you can see in the preceding graph, we have generated a year-wise prediction graph for logistic regression, and it shows a slight improvement with this model. We have also used alignment and smoothing, but they are not very effective.

Now, let's discuss the problems with this revised approach, and then we can implement the best approach.

Understanding the problem with the revised approach

In this section, we will discuss why our revised approach doesn't give us good results. The first reason is that the dataset is not normalized, which hurts the performance of the ML models. The second reason is that, even after alignment and smoothing, the RandomForestRegressor ML model faces an overfitting issue. For the best approach, we need to handle both normalization and overfitting. We can address these issues using a neural network-based ML algorithm. So, in our last iteration, we will develop a neural network that can give us better accuracy.
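To make the normalization point concrete, here is a minimal sketch of min-max scaling, one common way to normalize a feature into the range 0 to 1. The choice of min-max scaling, the helper name `min_max_scale`, and the sample values are assumptions for illustration; the text above does not say which normalization method the final approach uses.

```python
import numpy as np

def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1.

    Hypothetical helper for illustration.
    """
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    if hi == lo:
        # A constant column carries no scale information; map it to zeros.
        return np.zeros_like(values)
    return (values - lo) / (hi - lo)

# Made-up price values, for illustration only.
scaled = min_max_scale([17000.0, 18500.0, 20000.0])
```

In practice, the minimum and maximum should be computed on the training data only and then reused for the test data, so that no information from the test set leaks into training.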