This repository contains two Google Colab notebooks designed to analyze and predict the stock performance of Netflix (NFLX) and Disney (DIS) using historical stock data. The project addresses the following tasks:
- Prediction: A machine learning model to predict stock prices of Netflix and Disney.
- Regression Discontinuity (RD) Analysis: An analysis of the effect of certain events (e.g., stock splits) on stock prices using regression discontinuity methods.
The notebooks are organized to enable the analysis of stock price data, modeling, and causal inference.
- DIS_stock_price.csv: Contains historical stock prices for Disney, including variables such as Date, Open Price, Close Price, High Price, Low Price, and Volume.
- DIS_stock_split.csv: Contains information about Disney's stock splits, including dates and split ratios.
- NFLX_stock_price.csv: Contains historical stock prices for Netflix, similar to the DIS_stock_price.csv format.
- NFLX_stock_split.csv: Contains information about Netflix's stock splits, including dates and split ratios.
- Prediction_Model_Colab.ipynb: This notebook applies machine learning techniques to predict Netflix and Disney stock prices based on their historical data (Open Price, High Price, Low Price, Volume).
- RD_Analysis_Colab.ipynb: This notebook focuses on performing regression discontinuity analysis to understand the impact of stock splits on stock prices for both companies.
- The data preprocessing steps in the code clean and prepare the stock data for analysis and prediction, including handling missing values and feature engineering.
- A brief NLP analysis is performed to evaluate potential textual data (if available), though this may not be necessary for the current dataset.
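The preprocessing mentioned above (handling missing values and feature engineering) might look roughly like the following. This is a minimal sketch, not the notebooks' actual code: the column names (`Date`, `Open`, `High`, `Low`, `Close`, `Volume`), the inline sample data, and the two engineered features (`Return`, `Prev_Close`) are assumptions made for illustration.

```python
import io

import pandas as pd

# Tiny stand-in for NFLX_stock_price.csv; the column names are assumed.
csv_text = """Date,Open,High,Low,Close,Volume
2024-01-04,102.0,104.0,101.0,103.0,1200
2024-01-02,100.0,101.5,99.0,101.0,1000
2024-01-03,101.0,103.0,100.5,,1100
2024-01-05,103.0,105.0,102.0,104.5,1300
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

# Sort chronologically so that fills and lags only look backward in time.
df = df.sort_values("Date").reset_index(drop=True)

# Handle missing values: carry the last known value forward, then drop
# any rows that are still incomplete (for example, a gap in the first row).
df = df.ffill().dropna()

# Feature engineering (illustrative): daily return and previous day's close.
df["Return"] = df["Close"].pct_change()
df["Prev_Close"] = df["Close"].shift(1)

# The first row has no previous day, so its engineered features are NaN.
df = df.dropna().reset_index(drop=True)
print(df)
```

Forward-filling is only one option; dropping incomplete rows outright may be preferable when gaps are rare.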
The following software is required to run the code:
- Python: Version 3.9 or higher.
- pandas: Version 1.3.0 or higher.
- numpy: Version 1.21.0 or higher.
- matplotlib: Version 3.4.0 or higher.
- seaborn: Version 0.11.1 or higher.
- scikit-learn: Version 0.24.0 or higher.
- statsmodels: Version 0.12.2 or higher.
To install the required libraries, you can use the following command:
pip install pandas numpy matplotlib seaborn scikit-learn statsmodels
Alternatively, create a requirements.txt file that contains the following dependencies:
pandas>=1.3.0
numpy>=1.21.0
matplotlib>=3.4.0
seaborn>=0.11.1
scikit-learn>=0.24.0
statsmodels>=0.12.2
Then install the dependencies with:
pip install -r requirements.txt
- Upload the dataset files (DIS_stock_price.csv, DIS_stock_split.csv, NFLX_stock_price.csv, NFLX_stock_split.csv) to the Colab runtime so the notebooks can read them.
- Use the following code to load the datasets:
import pandas as pd
df_nflx = pd.read_csv('NFLX_stock_price.csv')
df_dis = pd.read_csv('DIS_stock_price.csv')
- This section is optional if no textual data is available. If you plan to perform textual analysis, you can start from the following import:
from sklearn.feature_extraction.text import CountVectorizer
# NLP steps such as topic modeling, sentiment analysis, etc.
- Training the Model: The code uses RandomForestRegressor and LinearRegression to predict stock prices based on features like Open Price, High Price, Low Price, and Volume.
- Evaluation: The models' performance is evaluated using Mean Absolute Error (MAE) and Mean Squared Error (MSE).
- The general code snippet for training is:
from sklearn.ensemble import RandomForestRegressor
# X_train and y_train are the feature matrix and target from the training split
model = RandomForestRegressor()
model.fit(X_train, y_train)
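The training and evaluation steps above can be sketched end to end as follows. This is an illustrative example under stated assumptions, not the notebook's exact code: the data is synthetic, the column names are assumed, and the chronological 80/20 split is a choice made here because the data is a time series.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Synthetic stand-in for a price file; the column names are assumed.
rng = np.random.default_rng(0)
n = 300
close = 100 + np.cumsum(rng.normal(0, 1, n))
df = pd.DataFrame({
    "Open": close + rng.normal(0, 0.5, n),
    "High": close + np.abs(rng.normal(0, 1, n)),
    "Low": close - np.abs(rng.normal(0, 1, n)),
    "Volume": rng.integers(1_000, 5_000, n),
    "Close": close,
})

features = ["Open", "High", "Low", "Volume"]
X, y = df[features], df["Close"]

# Chronological split: train on the first 80% of days, test on the rest.
split = int(len(df) * 0.8)
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]

models = {
    "LinearRegression": LinearRegression(),
    "RandomForestRegressor": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Fit each model and record MAE and MSE on the held-out period.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    scores[name] = {
        "MAE": mean_absolute_error(y_test, pred),
        "MSE": mean_squared_error(y_test, pred),
    }
    print(name, scores[name])

# Feature importances from the fitted Random Forest (they sum to 1).
importances = pd.Series(
    models["RandomForestRegressor"].feature_importances_, index=features
).sort_values(ascending=False)
print(importances)
```

A shuffled split would let the models see future prices during training, which is why the sketch splits by time. Note also that tree-based models cannot predict values outside the range seen in training, so a Random Forest may lag behind a linear model when prices trend to new highs or lows.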
- Perform RD Analysis: In the RD notebook, stock splits are analyzed to understand their causal effect on stock prices.
- Steps:
- Identify the cutoff (e.g., stock split date).
- Estimate the treatment effect using RD models.
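The two steps above can be illustrated with a local linear regression around the cutoff. This is a sketch under assumptions rather than the notebook's method: the data is synthetic with a known jump of 2.0, the running variable is days relative to a hypothetical split date, and the 30-day bandwidth and the variable names (`days`, `post`, `close`) are chosen for the example.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic prices around a hypothetical cutoff; the 2.0 jump is built in
# so the estimate can be checked. Real data would come from the CSV files.
rng = np.random.default_rng(42)
days = np.arange(-60, 60)        # running variable: days relative to the cutoff
post = (days >= 0).astype(int)   # treatment indicator: 1 on or after the cutoff
true_jump = 2.0
close = 50 + 0.05 * days + true_jump * post + rng.normal(0, 0.5, days.size)
df = pd.DataFrame({"days": days, "post": post, "close": close})

# Keep only observations within a bandwidth of the cutoff (assumed: 30 days).
bandwidth = 30
window = df[df["days"].abs() <= bandwidth]

# Local linear regression with separate slopes on each side of the cutoff.
# The coefficient on `post` is the estimated discontinuity at days == 0.
rd_model = smf.ols("close ~ days + post + days:post", data=window).fit()
estimate = rd_model.params["post"]
ci_low, ci_high = rd_model.conf_int().loc["post"]
print(f"Estimated jump: {estimate:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

With real data, `days` would be computed from the dates in the stock split files. If the price series is not split-adjusted, the raw price falls mechanically by the split ratio on the split date, so the estimated jump would mostly reflect that arithmetic rather than a market reaction.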
- Visualizations:
- Time series plots showing the actual vs predicted stock prices.
- Feature importance bar charts from the Random Forest model.
- Performance Metrics:
- MAE and MSE values for prediction models.
- Insights:
- How stock splits affect stock prices (from the RD analysis).
- How the predictive performance of the different models compares.