Disclaimer: This is a personal project used for educational/didactic purposes only.
This project leverages Python's Scrapy library to scrape Mercado Livre, specifically collecting information about 5-string bass guitars.
If you'd like to scrape data for a different item, that is entirely possible!
Open extraction/spiders/mercadolivre.py and set the start URL to the item you wish to scrape. For example, to scrape prices for Acer notebooks, the URL would be
https://lista.mercadolivre.com.br/notebook-acer
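The search slug is simply the search term with its words joined by hyphens. A small hypothetical helper (not part of the repo) that builds such a listing URL might look like this:

```python
# Hypothetical helper: build a Mercado Livre listing URL from a search term.
# The slug format (lowercase words joined by hyphens) matches the example above.
def listing_url(search_term: str) -> str:
    slug = "-".join(search_term.lower().split())
    return f"https://lista.mercadolivre.com.br/{slug}"

print(listing_url("notebook acer"))
# https://lista.mercadolivre.com.br/notebook-acer
```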
Click the "Next Page" button and observe the new URL. It should look like
https://lista.mercadolivre.com.br/informatica/portateis-acessorios/notebooks/acer/notebook-acer_Desde_49_NoIndex_True
Set this URL as the next_page attribute in the MercadoLivreSpider class, but change 49 to {offset}.
This ensures the crawler advances through the subsequent result pages.
In the end, the next_page attribute should look like
next_page = f"https://lista.mercadolivre.com.br/instrumentos-musicais/instrumentos-corda/baixos/baixo-5-cordas_Desde_{offset}_NoIndex_True_STRINGS*NUMBER_5-5"
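To illustrate how the offsets line up with page numbers, here is a rough sketch (the function name and the assumption of 48 results per page are mine; the jump to Desde_49 on page two suggests that page size):

```python
# Sketch of the offset-based pagination behind the next_page URL.
# Assumes 48 results per listing page, as the Desde_49 offset on
# page two suggests; the real spider may compute this differently.
NEXT_PAGE_TEMPLATE = (
    "https://lista.mercadolivre.com.br/instrumentos-musicais/"
    "instrumentos-corda/baixos/baixo-5-cordas_Desde_{offset}"
    "_NoIndex_True_STRINGS*NUMBER_5-5"
)

def next_page_url(page_number: int, results_per_page: int = 48) -> str:
    """Return the listing URL for page_number (2, 3, ...)."""
    offset = results_per_page * (page_number - 1) + 1
    return NEXT_PAGE_TEMPLATE.format(offset=offset)

print(next_page_url(2))  # ..._Desde_49_... (second page of results)
print(next_page_url(3))  # ..._Desde_97_...
```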
The dashboard currently looks like this:
You can search through all items and apply filters.
I have refactored the project to offer Docker support.
You can build and run it with the following commands:
docker build -t mlscrape .
docker run -p 8501:8501 mlscrape
This maps port 8501 on your host to the port exposed in the Dockerfile.
You can then access the dashboard by navigating to localhost:8501.
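For reference, a minimal Dockerfile along these lines would support the commands above. This is only a sketch of what such a file might contain, based on the requirements.txt and dashboard entry point mentioned below; the actual Dockerfile in the repo may differ:

```dockerfile
# Hypothetical sketch; the repo's actual Dockerfile may differ.
FROM python:3.12-slim

WORKDIR /app

# Install dependencies first so Docker can cache this layer
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 8501
CMD ["streamlit", "run", "dashboard/dashboard.py", \
     "--server.port=8501", "--server.address=0.0.0.0"]
```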
It is my personal recommendation that you make a new virtual Python environment for every project you run. To do so, open your preferred terminal and run the commands:
git clone https://github.com/heitornolla/mercadolivre-scraping.git
cd mercadolivre-scraping
pyenv local 3.12.1
You can create the environment with venv:
python -m venv .venv
or use another environment manager, such as Conda.
If you opted for venv, activate the environment with
source .venv/Scripts/activate
on Windows, or
source .venv/bin/activate
on Linux/macOS.
Then, install the dependencies:
pip install -r requirements.txt
Run the crawl.py file to crawl Mercado Livre.
To generate the dashboard based on your data, run
streamlit run dashboard/dashboard.py
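The filtering the dashboard performs can be thought of as a simple Pandas step. This is a hypothetical sketch, not the repo's actual code; the real column names in the scraped data may differ:

```python
import pandas as pd

# Hypothetical filter step like the one the dashboard applies.
# Column names ("name", "price") are assumptions for illustration.
def filter_items(df, search="", max_price=None):
    out = df
    if search:
        # Case-insensitive substring match on the item name
        out = out[out["name"].str.contains(search, case=False, na=False)]
    if max_price is not None:
        out = out[out["price"] <= max_price]
    return out

items = pd.DataFrame(
    {"name": ["Bass A 5-string", "Bass B 4-string"], "price": [1500.0, 900.0]}
)
print(filter_items(items, search="5-string"))
```

In Streamlit, the `search` and `max_price` values would typically come from input widgets such as a text box and a slider.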
Python, Scrapy, Pandas, Streamlit and Docker