This project was submitted for STATS201 Machine Learning for Social Science, instructed by Prof. Luyao Zhang at Duke Kunshan University in Autumn 2025.
We sincerely thank:
- Prof. Luyao Zhang for her guidance and insightful feedback throughout the project.
- Our classmates for their discussions and constructive critiques.
- AIGC tools such as ChatGPT and Storm for assisting in literature synthesis and data visualization.
- The open-source communities behind Python, Matplotlib, and related projects for providing essential machine learning and network analysis libraries.
Through this project, I deepened my understanding of Social Network Analysis (SNA) and its applications in public health. I gained hands-on experience in data preprocessing, network construction, and machine learning techniques, which reinforced my ability to analyze large-scale health datasets. The project also enhanced my critical thinking and problem-solving skills, equipping me for future research in data science and healthcare analytics.
- Project Overview
- Authors
- Disclaimer
- Acknowledgments
- Statement of Intellectual and Professional Growth
- Embedded Media
- Repository Navigation
- The analysis code is located in the `code/` directory. It contains a Python script for network construction, centrality analysis, community detection, network visualization, and temporal visualization.
- Raw and preprocessed datasets are stored in `data/`.
- The `dataPreprocessing/` folder contains a Python script documenting the missing-data handling, data transformation, and temporal splitting steps.
- The `docs/` folder includes the final report, which explains the background and motivation, research question, application scenario, methodologies, results, intellectual merits, and practical impacts.
- The `requirements.txt` file in the `dependencies/` folder lists all dependencies required for replicating the project environment.
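
As a rough illustration of the kind of steps named above (temporal splitting, network construction, and centrality analysis), here is a minimal sketch using only the Python standard library. It is not taken from the repository: the toy records, node names, cutoff year, and function names are all hypothetical, and the project's own scripts may use different libraries and data.

```python
from collections import defaultdict


def split_by_year(records, cutoff_year):
    """Temporal split: edges observed before the cutoff vs. at or after it."""
    early = [(u, v) for u, v, year in records if year < cutoff_year]
    late = [(u, v) for u, v, year in records if year >= cutoff_year]
    return early, late


def build_graph(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    adjacency = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops in this sketch
        adjacency[u].add(v)
        adjacency[v].add(u)
    return dict(adjacency)


def degree_centrality(adjacency):
    """Degree centrality: a node's degree divided by (n - 1)."""
    n = len(adjacency)
    if n <= 1:
        return {node: 0.0 for node in adjacency}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adjacency.items()}


# Hypothetical toy records: (node, node, year of observation).
records = [
    ("A", "B", 2019),
    ("A", "C", 2019),
    ("B", "C", 2020),
    ("C", "D", 2021),
]

early, late = split_by_year(records, cutoff_year=2020)
centrality = degree_centrality(build_graph(early + late))
```

In this toy network, node `C` is linked to every other node, so it receives the highest degree centrality. The same functions can be applied to `early` and `late` separately to compare periods.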