
Web Scraping Mercado Livre

Disclaimer: This is a personal project, for educational/didactic purposes only.

Overview

This project leverages Python's Scrapy library to scrape Mercado Livre, specifically collecting information about 5-string bass guitars. The scraped data is then cleaned and loaded into a SQL database that feeds a Streamlit dashboard. The pipeline can easily be adapted to other items.
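
At a high level, the clean-and-store step can be pictured like the minimal sketch below. This is an illustration only: the file, table, and column names (basses.json, mercadolivre.db, basses, name, price) are assumptions, not necessarily those used in the repository.

import sqlite3

import pandas as pd

# Minimal sketch: load the spider's output (assumed here to be a JSON export),
# do some basic cleaning, and store the result in a SQLite database.
df = pd.read_json("data/basses.json")

# Drop rows without a price and convert the Brazilian-formatted price
# (e.g. "1.234") to a number by removing the thousands separator
df = df.dropna(subset=["price"])
df["price"] = df["price"].astype(str).str.replace(".", "", regex=False).astype(float)

# Persist to SQL so the dashboard can query it later
conn = sqlite3.connect("data/mercadolivre.db")
df.to_sql("basses", conn, if_exists="replace", index=False)
conn.close()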

Adapting for other items

If you'd like to scrape data for a different item, it is totally possible!

1. Open the spider file located at

extraction/spiders/mercadolivre.py

2. Update the start URL

Set it to the listing page of the item you wish to scrape. For example, to scrape prices for Acer notebooks, the URL would be

https://lista.mercadolivre.com.br/notebook-acer

3. Update the parse function

Click the "Next Page" button and observe the new URL. It should look like

https://lista.mercadolivre.com.br/informatica/portateis-acessorios/notebooks/acer/notebook-acer_Desde_49_NoIndex_True

Set this URL as the next_page attribute in the MercadoLivreSpider class, but replace 49 with {offset}.

This ensures that the crawler moves on to the following pages; a fuller sketch of how this fits into the spider follows after these steps.

In the end, the code for the next_page attribute should look like

next_page = f"https://lista.mercadolivre.com.br/instrumentos-musicais/instrumentos-corda/baixos/baixo-5-cordas_Desde_{offset}_NoIndex_True_STRINGS*NUMBER_5-5"
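
For reference, here is a minimal sketch of how these pieces could fit together in the spider. The CSS selectors, item fields, offset step, and page limit below are illustrative assumptions, not the repository's exact code, and will likely need adjusting to Mercado Livre's current page layout.

import scrapy


class MercadoLivreSpider(scrapy.Spider):
    name = "mercadolivre"

    # Step 2: listing page of the item you want to scrape
    start_urls = ["https://lista.mercadolivre.com.br/notebook-acer"]

    # Pagination state: the first page shows results starting at 1,
    # and the example URL above suggests the next page starts at 49
    offset = 1
    max_offset = 480  # stop after roughly 10 pages; adjust as needed

    def parse(self, response):
        # Illustrative selectors; update them if Mercado Livre changes its layout
        for product in response.css("div.ui-search-result__content"):
            yield {
                "name": product.css("h2::text").get(),
                "price": product.css("span.andes-money-amount__fraction::text").get(),
            }

        # Step 3: build the next page URL from the {offset} placeholder
        self.offset += 48
        if self.offset <= self.max_offset:
            next_page = (
                "https://lista.mercadolivre.com.br/informatica/portateis-acessorios/"
                f"notebooks/acer/notebook-acer_Desde_{self.offset}_NoIndex_True"
            )
            yield scrapy.Request(next_page, callback=self.parse)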

Dashboard

The dashboard currently looks like this:

[Screenshot of the dashboard]

You can search through all items and apply filters.
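
As a rough idea of how a dashboard like this can be built with Streamlit, here is a minimal sketch. The database file, table name, and columns (mercadolivre.db, basses, name, price) are assumptions and not necessarily those used in dashboard/dashboard.py.

import sqlite3

import pandas as pd
import streamlit as st

# Minimal sketch: load the scraped items from SQLite and expose
# a text search plus a maximum price filter
conn = sqlite3.connect("data/mercadolivre.db")
df = pd.read_sql("SELECT * FROM basses", conn)
conn.close()

st.title("Mercado Livre - 5-string basses")

search = st.text_input("Search items")
max_price = st.slider("Maximum price", 0.0, float(df["price"].max()), float(df["price"].max()))

filtered = df[df["price"] <= max_price]
if search:
    filtered = filtered[filtered["name"].str.contains(search, case=False, na=False)]

st.dataframe(filtered)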

How to Install and Run the Project

With Docker

I have refactored the project to offer Docker support.

You can build and run it with the following commands:

docker build -t mlscrape .
docker run -p 8501:8501 mlscrape

This maps port 8501 on your machine to the port exposed in the Dockerfile.

You can then access the dashboard by navigating to localhost:8501.

With a local Python installation

I recommend creating a new Python virtual environment for every project you run. To set up the project, open your preferred terminal and run the following commands:

1. Clone the repository

git clone https://github.com/heitornolla/mercadolivre-scraping.git

2. Move to the project folder

cd mercadolivre-scraping

3. Define the local Python version (requires pyenv)

pyenv local 3.12.1

4. Create a new Python environment

You can do this with venv using the command

python -m venv .venv

or use other environment managers, such as Conda

If you opted for venv, activate the environment with

source .venv/Scripts/activate

on Windows (e.g. Git Bash), or

source .venv/bin/activate

on Linux/macOS.

5. Install the requirements

pip install -r requirements.txt

6. Run the project!

Run the crawl.py file to crawl Mercado Livre
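
Assuming crawl.py sits at the project root and can be run directly, that would be

python crawl.py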

To generate the dashboard based on your data, run

streamlit run dashboard/dashboard.py

Technologies Used

Python, Scrapy, Pandas, Streamlit and Docker
