Rising-Stars-by-Sunshine/Khatanbuuvei_PS2

Khatanbuuvei_PS2

1. Overview:

This repository contains two Google Colab notebooks designed to analyze and predict the stock performance of Netflix (NFLX) and Disney (DIS) using historical stock data. The project addresses the following tasks:

  • Prediction: A machine learning model to predict stock prices of Netflix and Disney.
  • Regression Discontinuity (RD) Analysis: An analysis of the effect of certain events (e.g., stock splits) on stock prices using regression discontinuity methods.

The notebooks cover data preparation, predictive modeling, and causal inference on the stock price data.

2. Files Included:

1. Data Files:

  • DIS_stock_price.csv: Contains historical stock prices for Disney, including variables such as Date, Open Price, Close Price, High Price, Low Price, and Volume.
  • DIS_stock_split.csv: Contains information about Disney's stock splits, including dates and split ratios.
  • NFLX_stock_price.csv: Contains historical stock prices for Netflix, similar to the DIS_stock_price.csv format.
  • NFLX_stock_split.csv: Contains information about Netflix's stock splits, including dates and split ratios.

2. Code Notebooks:

  • Prediction_Model_Colab.ipynb: This notebook applies machine learning techniques to predict Netflix and Disney stock prices based on their historical data (Open Price, High Price, Low Price, Volume).
  • RD_Analysis_Colab.ipynb: This notebook focuses on performing regression discontinuity analysis to understand the impact of stock splits on stock prices for both companies.

3. Data Preprocessing and NLP Analysis:

  • The data preprocessing steps in the code clean and prepare the stock data for analysis and prediction, including handling missing values and feature engineering.
  • A brief NLP analysis is included for textual data, if any is available. The provided datasets contain only price and split data, so this step is likely unnecessary for them.
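A minimal preprocessing sketch follows. It assumes the price CSVs load into a DataFrame with columns named `Date`, `Open`, `High`, `Low`, `Close`, and `Volume` (the real column names may differ). The engineered features `Return`, `MA_5`, and `Range` and the function name `preprocess_prices` are hypothetical illustrations, not necessarily what the notebooks compute.

```python
import numpy as np
import pandas as pd


def preprocess_prices(df: pd.DataFrame) -> pd.DataFrame:
    """Clean a daily price table and add illustrative features.

    Assumes columns Date, Open, High, Low, Close, Volume (hypothetical names).
    """
    out = df.copy()
    out["Date"] = pd.to_datetime(out["Date"])
    # Sort chronologically and drop duplicate trading days
    out = out.sort_values("Date").drop_duplicates(subset="Date")
    # Forward-fill gaps with the previous day's value (no look-ahead), then drop leading NaNs
    out = out.ffill().dropna()
    # Hypothetical engineered features
    out["Return"] = out["Close"].pct_change()           # daily percentage change
    out["MA_5"] = out["Close"].rolling(window=5).mean()  # 5-day moving average
    out["Range"] = out["High"] - out["Low"]              # intraday price range
    # The first rows have undefined Return/MA_5 values, so drop them
    return out.dropna().reset_index(drop=True)


# Made-up example data with one missing Close value
base = np.arange(100.0, 110.0)
close = base.copy()
close[3] = np.nan
raw = pd.DataFrame({
    "Date": pd.date_range("2024-01-01", periods=10, freq="D").strftime("%Y-%m-%d"),
    "Open": base - 0.5,
    "High": base + 1.0,
    "Low": base - 1.0,
    "Close": close,
    "Volume": np.full(10, 1_000),
})
clean = preprocess_prices(raw)
print(clean.shape)  # (6, 9)
```

Forward-filling is used here because interpolation would borrow information from later days, which leaks future prices into the features.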

3. Prerequisites:

The following libraries are required to run the code:

  • Python: Version 3.9 or higher.
  • pandas: Version 1.3.0 or higher.
  • numpy: Version 1.21.0 or higher.
  • matplotlib: Version 3.4.0 or higher.
  • seaborn: Version 0.11.1 or higher.
  • scikit-learn: Version 0.24.0 or higher.
  • statsmodels: Version 0.12.2 or higher.

Installation Instructions:

To install the required libraries, you can use the following command:

pip install pandas numpy matplotlib seaborn scikit-learn statsmodels

Alternatively, you can create a requirements.txt file that specifies the same minimum versions:

pandas>=1.3.0
numpy>=1.21.0
matplotlib>=3.4.0
seaborn>=0.11.1
scikit-learn>=0.24.0
statsmodels>=0.12.2

Then install the dependencies with:

pip install -r requirements.txt

4. Usage Instructions:

Loading Datasets:

  1. Upload the dataset files (DIS_stock_price.csv, DIS_stock_split.csv, NFLX_stock_price.csv, NFLX_stock_split.csv) to the Colab session, for example through the Files panel in the left sidebar.
  2. Use the following code to load the datasets:
    import pandas as pd
    df_nflx = pd.read_csv('NFLX_stock_price.csv')
    df_dis = pd.read_csv('DIS_stock_price.csv')
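A slightly more robust loading pattern parses the `Date` column as datetimes and sorts rows chronologically, which time-series plots and chronological train/test splits rely on. This is a sketch assuming each file has a column named `Date`; the helper name `load_prices` is hypothetical, and a small in-memory CSV with made-up values stands in for the uploaded file.

```python
import io

import pandas as pd


def load_prices(source) -> pd.DataFrame:
    """Load a price CSV, parse the Date column, and index rows chronologically.

    `source` can be a path such as 'NFLX_stock_price.csv' or any file-like object.
    Assumes the file contains a column named 'Date' (hypothetical).
    """
    df = pd.read_csv(source, parse_dates=["Date"])
    return df.sort_values("Date").set_index("Date")


# In Colab you would call load_prices('NFLX_stock_price.csv');
# here an in-memory CSV with made-up, out-of-order rows stands in for it.
sample_csv = io.StringIO(
    "Date,Open,High,Low,Close,Volume\n"
    "2024-01-03,11.0,12.0,10.5,11.5,900\n"
    "2024-01-02,10.0,11.0,9.5,10.5,1000\n"
)
df_sample = load_prices(sample_csv)
print(df_sample.index.is_monotonic_increasing)  # True
```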

Running NLP or Social Network Analysis:

  • This section is optional and applies only if you have textual data (the provided datasets contain only price and split data). To start a text analysis, import a vectorizer such as:
    from sklearn.feature_extraction.text import CountVectorizer
    # Add NLP steps such as topic modeling or sentiment analysis here.
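If textual data is available, a bag-of-words count is a simple first step. The sketch below assumes a hypothetical list of news headlines (the repository's datasets contain no text) and uses `CountVectorizer` to find the most frequent terms; the `vocabulary_` attribute is used instead of newer helper methods so it works across scikit-learn versions.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical headlines; not part of the repository's data
headlines = [
    "Netflix announces stock split",
    "Disney stock rises after earnings",
    "Netflix subscriber growth beats estimates",
]

vectorizer = CountVectorizer(stop_words="english")  # lowercases text by default
counts = vectorizer.fit_transform(headlines)        # sparse document-term matrix

# Total occurrences of each term across all headlines
totals = np.asarray(counts.sum(axis=0)).ravel()
term_counts = {term: int(totals[idx]) for term, idx in vectorizer.vocabulary_.items()}

# Most frequent terms, ties broken alphabetically
top_terms = sorted(term_counts.items(), key=lambda kv: (-kv[1], kv[0]))[:3]
print(top_terms)
```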

Training and Evaluating Predictive Models:

  1. Training the Model: The code uses RandomForestRegressor and LinearRegression to predict stock prices from features such as Open Price, High Price, Low Price, and Volume.
  2. Evaluation: Model performance is evaluated using Mean Absolute Error (MAE) and Mean Squared Error (MSE). The general training pattern is:
    from sklearn.ensemble import RandomForestRegressor
    # X_train: training features (e.g., Open, High, Low, Volume); y_train: target prices
    model = RandomForestRegressor()
    model.fit(X_train, y_train)  # fit the ensemble of regression trees
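An end-to-end sketch of training both models and computing MAE and MSE is shown below. It uses synthetic data with hypothetical column names and assumes `Close` as the target; the 80/20 chronological split (no shuffling) is an assumption chosen so the models are always tested on days after the ones they were trained on.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Synthetic stand-in for a preprocessed price table (made-up data, hypothetical columns)
rng = np.random.default_rng(0)
n = 300
close = 100 + np.cumsum(rng.normal(0, 1, n))  # random-walk closing prices
df = pd.DataFrame({
    "Open": close + rng.normal(0, 0.5, n),
    "High": close + np.abs(rng.normal(1, 0.3, n)),
    "Low": close - np.abs(rng.normal(1, 0.3, n)),
    "Volume": rng.integers(1_000, 5_000, n),
    "Close": close,
})

features = ["Open", "High", "Low", "Volume"]
X, y = df[features], df["Close"]

# Chronological split: first 80% of days for training, remainder for testing
split = int(len(df) * 0.8)
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]

models = {
    "LinearRegression": LinearRegression(),
    "RandomForestRegressor": RandomForestRegressor(n_estimators=200, random_state=42),
}
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "MAE": mean_absolute_error(y_test, pred),
        "MSE": mean_squared_error(y_test, pred),
    }
    print(f"{name}: MAE={results[name]['MAE']:.3f}, MSE={results[name]['MSE']:.3f}")
```

One design point worth knowing: tree ensembles such as RandomForestRegressor cannot predict values outside the range of targets seen in training, so on trending price series the linear model may score better in the test period.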

Running Regression Discontinuity (RD) Analysis:

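  1. Perform RD Analysis: In the RD notebook, stock splits are analyzed to estimate their causal effect on stock prices.
  2. Steps:
    • Identify the cutoff (e.g., a stock split date from the split files).
    • Estimate the treatment effect with RD models, i.e., the jump in price at the cutoff when prices just before and just after it are compared.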
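The steps above can be sketched as a local linear RD in time. This is an illustration on synthetic prices with a known jump of -5 at a hypothetical split date, not the notebook's exact specification: the running variable is days from the cutoff, a separate slope is fitted on each side within a bandwidth (60 days here, an arbitrary assumption), and the coefficient on `treated` is the estimated jump.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic daily prices with a known jump of -5 at the cutoff (made-up data)
rng = np.random.default_rng(1)
dates = pd.bdate_range("2022-01-03", periods=200)
cutoff = dates[100]  # in practice, take the split date from the *_stock_split.csv file
t = np.asarray((dates - cutoff).days, dtype=float)  # running variable: days from cutoff
treated = (t >= 0).astype(int)                      # 1 on/after the cutoff
price = 100 + 0.05 * t - 5 * treated + rng.normal(0, 0.5, len(t))
df = pd.DataFrame({"price": price, "t": t, "treated": treated})


def rd_estimate(data: pd.DataFrame, bandwidth: float):
    """Local linear RD: separate intercepts and slopes on each side of the cutoff."""
    window = data[data["t"].abs() <= bandwidth]
    fit = smf.ols("price ~ treated + t + treated:t", data=window).fit(cov_type="HC1")
    return fit.params["treated"], fit.bse["treated"]


effect, se = rd_estimate(df, bandwidth=60)
print(f"Estimated jump at cutoff: {effect:.2f} (SE {se:.2f})")
```

With real prices, results can be sensitive to the bandwidth, so re-estimating with several bandwidths is a sensible check. The HC1 standard errors here do not account for serial correlation in daily prices.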

5. Expected Outputs:

  • Visualizations:
    • Time series plots showing the actual vs predicted stock prices.
    • Feature importance bar charts from the Random Forest model.
  • Performance Metrics:
    • MAE and MSE values for prediction models.
  • Insights:
    • How stock splits affect stock prices (from the RD analysis).
    • A comparison of the predictive performance of the different models.
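The two visualizations listed above could be produced along these lines. This sketch uses synthetic data with hypothetical column names and plots in-sample predictions for brevity (in practice, plot the test period). It reads importances from the fitted model's `feature_importances_` attribute.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; omit in Colab, where plots show inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Made-up training data with hypothetical column names
rng = np.random.default_rng(2)
n = 200
close = 50 + np.cumsum(rng.normal(0, 1, n))
X = pd.DataFrame({
    "Open": close + rng.normal(0, 0.5, n),
    "High": close + 1 + rng.normal(0, 0.3, n),
    "Low": close - 1 + rng.normal(0, 0.3, n),
    "Volume": rng.integers(1_000, 5_000, n),
})
y = pd.Series(close, name="Close")

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
pred = model.predict(X)  # in-sample predictions, for illustration only

# Impurity-based importances, one per feature, sorted for a readable bar chart
importances = pd.Series(model.feature_importances_, index=X.columns).sort_values()

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
ax1.plot(y.index, y.values, label="Actual")
ax1.plot(y.index, pred, label="Predicted", linestyle="--")
ax1.set_title("Actual vs. predicted Close")
ax1.set_xlabel("Trading day")
ax1.legend()
importances.plot.barh(ax=ax2)
ax2.set_title("Random Forest feature importance")
fig.tight_layout()
```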
