MagicXML is a high-performance web application built with FastAPI that converts data between XML, CSV, Excel, JSON, PDF, and image formats. Designed for data analysts, developers, and e-commerce professionals, MagicXML handles complex structures with advanced parsing capabilities, asyncio-powered processing, and intelligent data classification.
- Convert CSV to XML
- Convert CSV to Excel
- Convert Excel to CSV
- Convert JSON to CSV
- Convert CSV to JSON
- Convert XML to JSON
- Convert between JPEG and PNG images
- Convert PDF to CSV
- Convert PDF to Excel
- Convert PDF to JSON
- Convert CSV to PDF
- Convert Excel to PDF
🔗 Live Demo: https://magic-xml.replit.app
- High-Performance Processing: Asynchronous architecture for efficient handling of large XML files
- Intelligent Data Extraction: Contextual parsing of complex nested XML structures
- Data Cleaning & Sanitization: Automatic cleaning of HTML tags and special characters
- Multilingual Support: Interface available in English, Russian, and other languages
- RESTful API: Programmatic access for seamless integration with your systems
- Callback Support: Optional webhook notifications when processing is complete
- Robust Error Handling: Comprehensive error management with detailed reporting
- Versatile Format Conversions: Convert between CSV, XML, Excel, JSON, PDF, and JPEG/PNG images
MagicXML is built on the following technologies:
- FastAPI Backend: High-performance asynchronous API framework
- Asyncio & Aiohttp: Non-blocking I/O operations for concurrent processing
- XML ElementTree: Efficient XML parsing and traversal
- BeautifulSoup: Intelligent HTML content cleaning
- Modern Frontend: Responsive design with custom CSS and JavaScript
- E-commerce Data Processing: Convert product feeds from XML to CSV
- Data Analysis: Transform XML datasets into analysis-ready CSV format
- System Integration: Bridge XML-based systems with CSV-compatible tools
- Catalog Management: Process large product catalogs efficiently
- Automated Workflows: Integrate with data pipelines via API
- Python 3.8+
- Git
# Clone the repository
git clone https://github.com/Solrikk/MagicXML.git
cd MagicXML
# Install dependencies
poetry install
# Run the application
poetry run uvicorn main:app --host 0.0.0.0 --port 8080 --reload
Alternatively, install dependencies with pip:
pip install -r requirements.txt
curl -X 'POST' \
  'https://magic-xml.replit.app/process_link' \
  -H 'Content-Type: application/json' \
  -d '{
    "link_url": "https://example.com/data.xml",
    "preset_id": "optional-tracking-id",
    "return_url": "https://your-callback-url.com/webhook"
  }'
{
  "file_url": "https://magic-xml.replit.app/download/data_files/example_com.csv",
  "preset_id": "optional-tracking-id",
  "status": "completed"
}
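The same request can be prepared from Python. The following is a minimal sketch using only the standard library; the helper names `build_process_link_request` and `submit` are hypothetical, and it assumes that `preset_id` and `return_url` may be omitted, as the placeholder values in the curl example suggest.

```python
import json
import urllib.request

BASE_URL = "https://magic-xml.replit.app"  # live demo host from this README


def build_process_link_request(link_url, preset_id=None, return_url=None):
    """Build (but do not send) a POST request for /process_link."""
    payload = {"link_url": link_url}
    # Assumption: both fields are optional, so they are only included
    # when the caller provides them.
    if preset_id is not None:
        payload["preset_id"] = preset_id
    if return_url is not None:
        payload["return_url"] = return_url
    return urllib.request.Request(
        f"{BASE_URL}/process_link",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit(request, timeout=60):
    """Send the request and decode the JSON reply (performs network I/O)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `submit(build_process_link_request("https://example.com/data.xml"))` would then return the decoded JSON reply as a dictionary. Building and sending are kept separate so the payload can be inspected without touching the network.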
curl -X 'GET' 'https://magic-xml.replit.app/status/{preset_id}'
curl -X 'GET' 'https://magic-xml.replit.app/download/data_files/{filename}'
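A client that submits a job and later fetches the result needs to fill in the `{preset_id}` and `{filename}` placeholders and poll the status endpoint. The sketch below shows one way to do that. The function names are hypothetical, `fetch_status` is a caller-supplied callable that performs the GET and returns the decoded JSON, and the reply is assumed to carry a `status` field like the `/process_link` response does; the README does not document the status payload.

```python
import time
from urllib.parse import quote

BASE_URL = "https://magic-xml.replit.app"  # live demo host from this README


def status_url(preset_id):
    """URL of the status endpoint for one tracking id."""
    return f"{BASE_URL}/status/{quote(preset_id, safe='')}"


def download_url(filename):
    """URL of a converted file in the data_files directory."""
    return f"{BASE_URL}/download/data_files/{quote(filename, safe='')}"


def wait_until_completed(fetch_status, preset_id, attempts=10, delay=2.0):
    """Poll the status endpoint until it reports "completed".

    Assumption: the reply is a dict with a "status" key.
    """
    for _ in range(attempts):
        reply = fetch_status(status_url(preset_id))
        if reply.get("status") == "completed":
            return reply
        time.sleep(delay)
    raise TimeoutError(f"{preset_id} did not complete after {attempts} attempts")
```

Percent-encoding the path segments keeps ids or filenames containing spaces or slashes from producing a malformed URL.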
MagicXML processes XML files asynchronously using Python's asyncio and aiohttp:
async def process_offers_chunk(offers_chunk, build_category_path, format_type):
    offers = []
    for offer_elem in offers_chunk:
        offer_data = await process_offer(offer_elem, build_category_path, format_type)
        offers.append(offer_data)
    return {"offers": offers}
Because each chunk is handled by its own coroutine, several chunks can be in flight at once, which shortens conversion time for large XML files.
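How the chunks are scheduled is not shown here. One common pattern, sketched below under stated assumptions, is to split the offers into chunks and run the chunk coroutines together with `asyncio.gather`. The `process_offer` body is a hypothetical stand-in, and `split_into_chunks` and `process_all` are illustrative names, not functions from the project.

```python
import asyncio


async def process_offer(offer_elem, build_category_path, format_type):
    # Hypothetical stand-in for the project's process_offer: it only
    # yields to the event loop and echoes its input.
    await asyncio.sleep(0)
    return {"id": offer_elem, "format": format_type}


async def process_offers_chunk(offers_chunk, build_category_path, format_type):
    # Same shape as the function above: offers inside one chunk are
    # awaited one after another.
    offers = []
    for offer_elem in offers_chunk:
        offers.append(await process_offer(offer_elem, build_category_path, format_type))
    return {"offers": offers}


def split_into_chunks(items, chunk_size):
    """Split a list into consecutive slices of at most chunk_size items."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


async def process_all(offer_elems, build_category_path, format_type, chunk_size=2):
    """Run all chunk coroutines concurrently and flatten the results."""
    chunks = split_into_chunks(offer_elems, chunk_size)
    # gather() schedules every chunk at once and returns results in the
    # same order as the inputs, so the offer order is preserved.
    results = await asyncio.gather(
        *(process_offers_chunk(c, build_category_path, format_type) for c in chunks)
    )
    return [offer for result in results for offer in result["offers"]]
```

For example, `asyncio.run(process_all(list_of_offers, None, "csv"))` returns one flat list of processed offers. The concurrency comes from running chunks side by side; it only pays off when `process_offer` actually awaits I/O rather than doing CPU-bound work.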
The application strips unwanted HTML from text fields to ensure data quality. Descriptions, for example, keep only paragraph and line-break tags:
from bs4 import BeautifulSoup  # the 'html5lib' parser must also be installed


def clean_description(description):
    if not description:
        return ''
    soup = BeautifulSoup(description, 'html5lib')
    allowed_tags = ['p', 'br']
    for tag in soup.find_all(True):
        if tag.name not in allowed_tags:
            # Remove the tag itself but keep its contents
            tag.unwrap()
    # Additional cleaning logic...
    return str(soup)
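To experiment with the allowlist idea without installing BeautifulSoup, the same effect can be approximated with the standard library's `html.parser`. This is a swapped-in technique, not the project's implementation: `clean_description_stdlib` and `_AllowlistCleaner` are hypothetical names, and unlike `unwrap()` this sketch also drops attributes on the tags it keeps.

```python
from html import escape
from html.parser import HTMLParser

ALLOWED_TAGS = {"p", "br"}


class _AllowlistCleaner(HTMLParser):
    """Re-emit text and allowed tags; drop every other tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            self.parts.append(f"<{tag}>")  # attributes are discarded

    def handle_endtag(self, tag):
        # <br> is a void element, so it never gets a closing tag.
        if tag in ALLOWED_TAGS and tag != "br":
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Character references were decoded by the parser; re-escape the
        # text so the output is still valid HTML.
        self.parts.append(escape(data, quote=False))


def clean_description_stdlib(description):
    if not description:
        return ""
    cleaner = _AllowlistCleaner()
    cleaner.feed(description)
    cleaner.close()  # flush any buffered trailing text
    return "".join(cleaner.parts)
```

The text inside removed tags is kept, matching the unwrap behavior; only the markup around it disappears.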