An intelligent RSS feed aggregator that transforms news into beautiful newspaper-style publications
Newsgator is an advanced RSS feed aggregator that collects news from various sources, analyzes the content for similarity, uses an LLM to translate and rewrite it in Italian, and publishes beautifully styled, newspaper-like HTML pages and an RSS feed.
- Demo
- Features
- Quick Start
- Prerequisites
- Installation
- Configuration
- Usage
- GitHub Pages Publishing
- Customization
- Project Structure
- Performance & Limitations
- Troubleshooting
- FAQ
- Contributing
- License
Newsgator transforms RSS feeds into beautiful, newspaper-style web pages:
[Screenshot: Example of generated newspaper-style HTML output with Italian news articles]
- 🔄 RSS Feed Collection: Fetches and parses RSS feeds from multiple news sources
- 🧠 Content Analysis: Groups similar articles using natural language processing techniques
- 🌐 LLM Translation & Rewriting: Translates content to Italian and rewrites it in a journalistic style
- Supports both OpenAI models and local LM Studio models (including phi-4-mini-instruct)
- 📰 Newspaper-Style HTML: Generates beautifully formatted HTML with a classic newspaper design
- 📡 RSS Feed Generation: Creates an RSS feed of translated and processed articles
- 🐳 Docker Support: Run in a container with a built-in web server to view the content
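The specific NLP technique Newsgator uses to group similar articles is not described here. As a hedged illustration only, the sketch below shows one simple approach: greedy grouping of headlines by Jaccard word overlap. The function names and the 0.5 threshold are hypothetical and are not taken from Newsgator's content_analysis package.

```python
import re
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lowercased word tokens; a crude stand-in for real NLP preprocessing."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_similar(titles: List[str], threshold: float = 0.5) -> List[List[int]]:
    """Put each title (by index) into the first group whose seed title is
    at least `threshold` similar; otherwise start a new group."""
    groups: List[List[int]] = []
    seeds: List[Set[str]] = []
    for i, title in enumerate(titles):
        words = tokens(title)
        for group, seed in zip(groups, seeds):
            if jaccard(words, seed) >= threshold:
                group.append(i)
                break
        else:
            # No existing group is close enough: this title seeds a new one
            groups.append([i])
            seeds.append(words)
    return groups
```

For example, `group_similar(["Rome hosts climate summit", "Climate summit opens in Rome", "Stock markets fall"])` returns `[[0, 1], [2]]`. Word overlap is cheap but brittle; TF-IDF vectors or sentence embeddings are the usual next step when headlines are paraphrased.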
Get Newsgator running in under 5 minutes:
# Clone the repository
git clone https://github.com/fabriziosalmi/newsgator.git
cd newsgator
# Option 1: Using Docker (recommended)
docker build -t newsgator .
docker run -p 8080:8080 newsgator
# Visit http://localhost:8080 to see your newspaper!
# Option 2: Install locally
pip install -e .
python main.py
# Open docs/index.html in your browser
Note: For LLM functionality, you'll need either OpenAI API access or a local LM Studio instance running.
- Python: 3.8 or higher
- Memory: At least 2GB RAM (4GB recommended for large feeds)
- Storage: 500MB free space for dependencies and generated content
- Network: Internet connection for RSS feeds and LLM API calls
Option A: OpenAI API (easier setup)
- OpenAI API key with sufficient credits
- Models supported: GPT-4, GPT-3.5-turbo, or newer
Option B: Local LM Studio (free, privacy-focused)
- LM Studio installed and running
- phi-4-mini-instruct model downloaded (or compatible model)
- At least 8GB RAM for model inference
- Docker: For containerized deployment (recommended)
- Git: For version control and GitHub Pages publishing
- Clone the repository:
git clone https://github.com/fabriziosalmi/newsgator.git
cd newsgator
- Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install the package in development mode:
pip install -e .
- Clone the repository:
git clone https://github.com/fabriziosalmi/newsgator.git
cd newsgator
- Build the Docker image:
docker build -t newsgator .
- Run the container (make sure LM Studio is running locally first):
docker run -p 8080:8080 newsgator
Or with OpenAI:
docker run -p 8080:8080 -e LLM_PROVIDER="openai" -e OPENAI_API_KEY="your-api-key-here" newsgator
Newsgator supports two options for the LLM provider:
- LM Studio (Default): Uses a local LM Studio instance with phi-4-mini-instruct
- Make sure LM Studio is running locally at http://localhost:1234
- The phi-4-mini-instruct model should be loaded in LM Studio
- Set environment variables if needed:
export LLM_PROVIDER="lmstudio"
export LMSTUDIO_BASE_URL="http://localhost:1234/v1"
export LMSTUDIO_MODEL="phi-4-mini-instruct"
- OpenAI: Uses OpenAI's models (requires an API key)
- Set environment variables:
export LLM_PROVIDER="openai"
export OPENAI_API_KEY="your-api-key-here"
export OPENAI_MODEL="gpt-4"  # or another model
Additional settings:
- Customize feeds and other settings in src/newsgator/config.py as needed.
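The environment variables above can be read with fallbacks roughly as follows. This is a minimal sketch that assumes the defaults listed in this section, with gpt-4 as the OpenAI fallback model; `LLMSettings` and `load_llm_settings` are hypothetical names and do not describe how Newsgator's config.py actually loads settings.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class LLMSettings:
    provider: str
    base_url: Optional[str]
    model: str
    api_key: Optional[str]


def load_llm_settings(env: Mapping[str, str] = os.environ) -> LLMSettings:
    """Resolve LLM settings from environment variables (hypothetical helper)."""
    provider = env.get("LLM_PROVIDER", "lmstudio").lower()
    if provider == "lmstudio":
        # Local LM Studio: no API key, OpenAI-compatible endpoint
        return LLMSettings(
            provider=provider,
            base_url=env.get("LMSTUDIO_BASE_URL", "http://localhost:1234/v1"),
            model=env.get("LMSTUDIO_MODEL", "phi-4-mini-instruct"),
            api_key=None,
        )
    if provider == "openai":
        api_key = env.get("OPENAI_API_KEY")
        if not api_key:
            raise ValueError("OPENAI_API_KEY must be set when LLM_PROVIDER=openai")
        return LLMSettings(provider, None, env.get("OPENAI_MODEL", "gpt-4"), api_key)
    raise ValueError(f"Unsupported LLM_PROVIDER: {provider!r}")
```

Calling `load_llm_settings()` with no argument reads the real process environment; passing a plain dict makes the lookup easy to test.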
Run the application to fetch feeds, process content, and generate HTML and RSS output:
python main.py
The generated content will be placed in the docs/ directory, which you can:
- View locally
- Push to GitHub Pages
- Serve with a web server
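For serving docs/ locally, Python's standard library is enough: `python -m http.server 8080 --directory docs` does it from the shell. The sketch below builds the same kind of server in code; `make_docs_server` is a hypothetical helper, not part of Newsgator's web_server module.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_docs_server(directory: str = "docs", port: int = 8080) -> ThreadingHTTPServer:
    """Create (but do not start) a static file server rooted at `directory`.

    Binds to 127.0.0.1 so the generated site is only reachable locally.
    Pass port=0 to let the OS pick a free port.
    """
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Typical use is `make_docs_server().serve_forever()`, then opening http://127.0.0.1:8080 in a browser.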
The Docker container supports different operation modes:
- Generate content once and serve it (with LM Studio):
# Make sure LM Studio is running locally first
docker run -p 8080:8080 newsgator
- Generate content once and serve it (with OpenAI):
docker run -p 8080:8080 -e LLM_PROVIDER="openai" -e OPENAI_API_KEY="your-api-key" newsgator
- Only generate content (without serving):
docker run -e LLM_PROVIDER="lmstudio" newsgator --mode generate
- Only serve existing content:
docker run -p 8080:8080 newsgator --mode serve
- Generate content periodically (every 24 hours) and serve it:
docker run -p 8080:8080 -e LLM_PROVIDER="lmstudio" newsgator --mode both --interval 24
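The `--mode` and `--interval` options above can be modeled with argparse. This is a hedged sketch, not the contents of docker-entrypoint.py: it assumes `both` is the default mode (a plain `docker run` generates once and serves) and that an interval of 0 means "generate once". `parse_entrypoint_args` and `plan` are hypothetical names.

```python
import argparse
from typing import List, Optional


def parse_entrypoint_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse --mode/--interval the way the commands above use them (hypothetical)."""
    parser = argparse.ArgumentParser(description="Sketch of Newsgator container options")
    parser.add_argument("--mode", choices=["generate", "serve", "both"], default="both",
                        help="generate content, serve docs/, or both")
    parser.add_argument("--interval", type=float, default=0.0,
                        help="hours between regenerations; 0 means generate once")
    return parser.parse_args(argv)


def plan(args: argparse.Namespace) -> dict:
    """Summarize what a run with these arguments would do."""
    return {
        "generate": args.mode in ("generate", "both"),
        "serve": args.mode in ("serve", "both"),
        # Convert hours to seconds for a sleep-based scheduler loop
        "repeat_every_seconds": args.interval * 3600 if args.interval > 0 else None,
    }
```

With `--mode both --interval 24`, `plan` reports that content is generated and served, and regenerated every 86400 seconds.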
When running in Docker, the container will try to connect to LM Studio at host.docker.internal:1234, which maps to your host machine's localhost. This should work automatically on Docker Desktop for Mac and Windows. On Linux, you may need to add --add-host=host.docker.internal:host-gateway to your docker run command.
After generating content in the docs/ directory, you can manually push it to GitHub:
- Commit the changes in the docs/ directory:
git add docs/
git commit -m "Update news content"
git push
- In your GitHub repository, go to Settings → Pages
- Set Source to "Deploy from a branch"
- Select the main branch and the /docs folder
- Click Save
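The commit step fails when nothing in docs/ has changed, so a scripted version of the steps above should check for staged changes first. A minimal sketch using `subprocess`; `publish_commands` and `publish_docs` are hypothetical helpers, not part of Newsgator.

```python
import subprocess
from typing import List


def publish_commands(message: str = "Update news content") -> List[List[str]]:
    """The git commands from the steps above, as argument lists."""
    return [
        ["git", "add", "docs/"],
        ["git", "commit", "-m", message],
        ["git", "push"],
    ]


def publish_docs(repo: str = ".", message: str = "Update news content") -> bool:
    """Stage docs/, commit and push; return False if nothing was staged."""
    add, commit, push = publish_commands(message)
    subprocess.run(add, cwd=repo, check=True)
    # `git diff --cached --quiet` exits 0 when nothing is staged, 1 otherwise
    if subprocess.run(["git", "diff", "--cached", "--quiet"], cwd=repo).returncode == 0:
        return False
    subprocess.run(commit, cwd=repo, check=True)
    subprocess.run(push, cwd=repo, check=True)
    return True
```

`check=True` makes a failed `git add`, `commit`, or `push` raise `subprocess.CalledProcessError` instead of continuing silently.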
- RSS Feeds: Edit the RSS_FEEDS list in config.py to add or remove sources
- HTML Design: Modify the templates in src/newsgator/templates/ or the CSS in docs/css/styles.css
- Content Analysis Settings: Adjust similarity thresholds and clustering parameters in config.py
- LLM Settings: Change target language, model, or temperature in config.py
newsgator/
├── docs/ # Output directory (HTML, RSS, CSS)
├── src/
│ └── newsgator/ # Main package
│ ├── feed_processing/ # RSS feed fetching and parsing
│ ├── content_analysis/ # Content similarity and clustering
│ ├── llm_integration/ # LLM translation and rewriting
│ ├── html_generation/ # HTML and RSS generation
│ ├── web_server/ # Simple HTTP server for Docker
│ └── templates/ # Jinja2 templates
├── docker-entrypoint.py # Docker entry point script
├── Dockerfile # Docker configuration
├── main.py # Entry point script
├── requirements.txt # Python dependencies
├── setup.py # Package setup file
└── README.md # This file
- Small feeds (5-10 articles): ~2-5 minutes
- Medium feeds (20-30 articles): ~5-15 minutes
- Large feeds (50+ articles): ~15-30 minutes
Processing time depends on LLM provider, article length, and system performance.
- Language: Currently optimized for Italian translation only
- Article limits: Maximum of 5 articles per category by default
- Feed sources: Pre-configured Italian news sources (customizable)
- LLM context: Limited by the model's maximum context length (with LM Studio, also by the context length set when the model is loaded)
- Rate limiting: Subject to OpenAI API rate limits when using OpenAI
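One common way to stay inside a model's context window is to trim long articles before sending them. The sketch below estimates tokens with a rough characters-per-token ratio (4 is a common rule of thumb for English; the real ratio depends on the tokenizer and the language, Italian included). `truncate_to_budget` is hypothetical, and Newsgator may handle long articles differently.

```python
def truncate_to_budget(text: str, max_tokens: int, chars_per_token: float = 4.0) -> str:
    """Trim `text` so its estimated token count fits in `max_tokens`.

    The estimate is len(text) / chars_per_token, a heuristic rather than
    a real tokenizer count, so leave headroom for the prompt and the reply.
    """
    max_chars = int(max_tokens * chars_per_token)
    if len(text) <= max_chars:
        return text
    cut = text[:max_chars]
    # Back off to the last space so the final word isn't split in half
    space = cut.rfind(" ")
    return cut[:space] if space > 0 else cut
```

For example, `truncate_to_budget("alpha beta gamma", 2)` allows 8 characters and returns `"alpha"`.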
- Memory: ~1-2GB during processing
- Storage: Generated content typically 10-50MB per run
- Network: Downloads RSS feeds and makes LLM API calls
"No articles fetched" error
# Check internet connection and RSS feed URLs
curl -I https://www.ansa.it/sito/notizie/topnews/topnews_rss.xml
# Verify config.yaml has valid RSS feeds
grep -A 20 rss_feeds config.yaml
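If a feed URL responds but still yields no articles, it helps to check whether the response actually contains RSS items. A minimal stdlib sketch; `rss_item_titles` is a hypothetical helper and is not the parser Newsgator uses.

```python
import xml.etree.ElementTree as ET
from typing import List, Union


def rss_item_titles(xml_data: Union[str, bytes]) -> List[str]:
    """Return the title of every <item> in an RSS 2.0 document.

    Atom feeds use <entry> instead of <item>, so they yield an empty list.
    """
    root = ET.fromstring(xml_data)
    return [(item.findtext("title") or "").strip() for item in root.iter("item")]
```

Pass it the raw bytes of a feed, e.g. `rss_item_titles(urllib.request.urlopen(url, timeout=10).read())`. An empty list for a reachable URL suggests the feed is empty, has moved, or is not RSS 2.0. The standard-library XML parsers are not hardened against maliciously constructed data, so use this only on feeds you trust.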
LM Studio connection failed
# Ensure LM Studio is running on correct port
curl http://localhost:1234/v1/models
# Check if phi-4-mini-instruct model is loaded
# Open LM Studio GUI and verify model status
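The same check can be scripted against LM Studio's OpenAI-compatible `/v1/models` endpoint. A minimal sketch; `parse_model_ids` and `lmstudio_models` are hypothetical helpers. Depending on the LM Studio version and settings, this endpoint may list downloaded models rather than only loaded ones, and the reported id may not match `phi-4-mini-instruct` exactly, so the GUI remains the authoritative check.

```python
import json
import urllib.request
from typing import List


def parse_model_ids(payload: str) -> List[str]:
    """Extract model ids from an OpenAI-style /v1/models JSON response."""
    return [model["id"] for model in json.loads(payload).get("data", [])]


def lmstudio_models(base_url: str = "http://localhost:1234/v1", timeout: float = 5.0) -> List[str]:
    """Query the /models endpoint.

    Raises urllib.error.URLError (or a timeout error) if LM Studio isn't reachable.
    """
    with urllib.request.urlopen(f"{base_url.rstrip('/')}/models", timeout=timeout) as resp:
        return parse_model_ids(resp.read().decode("utf-8"))
```

A forgiving check is `any("phi-4-mini-instruct" in m for m in lmstudio_models())`.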
Docker container networking issues
# On Linux, map host.docker.internal to the host machine
docker run --add-host=host.docker.internal:host-gateway -p 8080:8080 newsgator
# Alternative: Use host network mode
docker run --network host newsgator
Out of memory errors
- Reduce max_items_per_feed in config.yaml
- Use a smaller LLM model
- Increase system swap space
Permission denied writing to docs/
# Take ownership of docs/ first, then fix its permissions
sudo chown -R $USER:$USER docs/
chmod 755 docs/
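To confirm the fix, you can test whether a file can actually be created in docs/. A minimal sketch; `can_write` is a hypothetical helper, not part of Newsgator.

```python
import tempfile


def can_write(directory: str) -> bool:
    """Return True if a file can actually be created in `directory`.

    Creating (and auto-deleting) a temporary file is a direct test; it
    fails with OSError when the directory is missing or not writable.
    """
    try:
        with tempfile.NamedTemporaryFile(dir=directory):
            return True
    except OSError:
        return False
```

After the commands above, `can_write("docs")` should return True when run from the repository root.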
Enable debug logging for more detailed information:
python main.py --debug
Q: Can I use languages other than Italian? A: Currently, Newsgator is optimized for Italian translation. You can modify the LLM prompts in the source code to target other languages.
Q: How much does it cost to run with OpenAI? A: Costs vary based on article volume and model choice. Typical usage: $0.50-$2.00 per run with GPT-4, $0.10-$0.50 with GPT-3.5-turbo.
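The cost of a run is roughly the total input and output tokens multiplied by the model's per-token prices. The sketch below makes that arithmetic explicit; `estimate_run_cost` is hypothetical, and the token counts and prices in the example are placeholders, not current OpenAI pricing, so check OpenAI's pricing page before relying on an estimate.

```python
def estimate_run_cost(articles: int,
                      input_tokens_per_article: int,
                      output_tokens_per_article: int,
                      usd_per_1m_input: float,
                      usd_per_1m_output: float) -> float:
    """Estimated USD cost: token totals times per-million-token prices."""
    input_tokens = articles * input_tokens_per_article
    output_tokens = articles * output_tokens_per_article
    return (input_tokens * usd_per_1m_input + output_tokens * usd_per_1m_output) / 1_000_000
```

With placeholder values, `estimate_run_cost(30, 1500, 800, 2.0, 8.0)` returns about 0.28 (USD) for 30 articles.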
Q: Can I add my own RSS feeds?
A: Yes! Edit the rss_feeds section in config.yaml to add your preferred news sources.
Q: How often should I run Newsgator? A: For daily news: once per day. For real-time updates: every 2-4 hours. Consider API rate limits and costs.
Q: Can I customize the newspaper design?
A: Yes! Modify the templates in src/newsgator/templates/ and the CSS in docs/css/styles.css.
Q: Is my data private when using local LM Studio? A: Yes! With LM Studio, all processing happens locally. No data is sent to external services except for RSS feed fetching.
Q: Can I run this on a Raspberry Pi? A: Yes, but LM Studio requires significant resources. Consider using the OpenAI API instead for lightweight deployments.
MIT
Contributions are welcome! Please feel free to submit a Pull Request.