Newsgator

An intelligent RSS feed aggregator that transforms news into beautiful newspaper-style publications

Newsgator is an advanced RSS feed aggregator that collects news from various sources, analyzes content for similarity, uses LLM to translate and rewrite content in Italian, and publishes beautifully styled newspaper-like HTML and RSS feeds.

🎬 Demo

Newsgator transforms RSS feeds into beautiful, newspaper-style web pages:

Example of generated newspaper-style HTML output with Italian news articles

Features

🔄 RSS Feed Collection: Fetches and parses RSS feeds from multiple news sources
🧠 Content Analysis: Groups similar articles using natural language processing techniques
🌐 LLM Translation & Rewriting: Translates content to Italian and rewrites it in a journalistic style
- Supports both OpenAI models and local LM Studio models (including phi-4-mini-instruct)
📰 Newspaper-Style HTML: Generates beautifully formatted HTML with a classic newspaper design
📡 RSS Feed Generation: Creates an RSS feed of translated and processed articles
🐳 Docker Support: Run in a container with a built-in web server to view the content

🚀 Quick Start

Get Newsgator running in under 5 minutes:

# Clone the repository
git clone https://github.com/fabriziosalmi/newsgator.git
cd newsgator

# Option 1: Using Docker (recommended)
docker build -t newsgator .
docker run -p 8080:8080 newsgator
# Visit http://localhost:8080 to see your newspaper!

# Option 2: Install locally
pip install -e .
python main.py
# Open docs/index.html in your browser

Note: For LLM functionality, you'll need either OpenAI API access or a local LM Studio instance running.

📋 Prerequisites

System Requirements

Python: 3.8 or higher
Memory: At least 2GB RAM (4GB recommended for large feeds)
Storage: 500MB free space for dependencies and generated content
Network: Internet connection for RSS feeds and LLM API calls

LLM Requirements (choose one)

Option A: OpenAI API (easier setup)

OpenAI API key with sufficient credits
Models supported: GPT-4, GPT-3.5-turbo, or newer

Option B: Local LM Studio (free, privacy-focused)

LM Studio installed and running
phi-4-mini-instruct model downloaded (or compatible model)
At least 8GB RAM for model inference

Optional

Docker: For containerized deployment (recommended)
Git: For version control and GitHub Pages publishing

Installation

Option 1: Install from source

Clone the repository:

git clone https://github.com/fabriziosalmi/newsgator.git
cd newsgator

Create and activate a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install the package in development mode:
```
pip install -e .
```

Option 2: Using Docker

Clone the repository:

git clone https://github.com/fabriziosalmi/newsgator.git
cd newsgator

Build the Docker image:
```
docker build -t newsgator .
```

Run the container (make sure LM Studio is running locally first):

docker run -p 8080:8080 newsgator

Or with OpenAI:

docker run -p 8080:8080 -e LLM_PROVIDER="openai" -e OPENAI_API_KEY="your-api-key-here" newsgator

Configuration

LLM Options

Newsgator supports two options for the LLM provider:

LM Studio (Default): Uses a local LM Studio instance with phi-4-mini-instruct
- Make sure to have LM Studio running locally at http://localhost:1234
- The phi-4-mini-instruct model should be loaded in LM Studio
- Set environment variables if needed:
```
export LLM_PROVIDER="lmstudio"
export LMSTUDIO_BASE_URL="http://localhost:1234/v1"
export LMSTUDIO_MODEL="phi-4-mini-instruct"
```

OpenAI: Uses OpenAI's models (requires an API key)

Set environment variables:

export LLM_PROVIDER="openai"
export OPENAI_API_KEY="your-api-key-here"
export OPENAI_MODEL="gpt-4"  # or another model

Additional settings:

Customize feeds and other settings in src/newsgator/config.py as needed.

Usage

Basic Usage

Run the application to fetch feeds, process content, and generate HTML and RSS output:

python main.py

The generated content will be placed in the docs/ directory, which you can:

View locally
Push to GitHub Pages
Serve with a web server

Docker Usage

The Docker container supports different operation modes:

Generate content once and serve it (with LM Studio):

# Make sure LM Studio is running locally first
docker run -p 8080:8080 newsgator

Generate content once and serve it (with OpenAI):

docker run -p 8080:8080 -e LLM_PROVIDER="openai" -e OPENAI_API_KEY="your-api-key" newsgator

Only generate content (without serving):

docker run -e LLM_PROVIDER="lmstudio" newsgator --mode generate

Only serve existing content:

docker run -p 8080:8080 newsgator --mode serve

Generate content periodically (every 24 hours) and serve it:

docker run -p 8080:8080 -e LLM_PROVIDER="lmstudio" newsgator --mode both --interval 24

Docker Networking Note

When running in Docker, the container will try to connect to LM Studio at host.docker.internal:1234 which maps to your host machine's localhost. This should work automatically on Docker Desktop for Mac and Windows. For Linux, you may need to add --add-host=host.docker.internal:host-gateway to your docker run command.

GitHub Pages Publishing

After generating content in the docs/ directory, you can manually push to GitHub:

Commit the changes in the docs/ directory:

git add docs/
git commit -m "Update news content"
git push

In your GitHub repository, go to Settings → Pages
Set Source to "Deploy from a branch"
Select the main branch and /docs folder
Click Save

Customization

RSS Feeds: Edit the RSS_FEEDS list in config.py to add or remove sources
HTML Design: Modify the templates in src/newsgator/templates/ or the CSS in docs/css/styles.css
Content Analysis Settings: Adjust similarity thresholds and clustering parameters in config.py
LLM Settings: Change target language, model, or temperature in config.py

Project Structure

newsgator/
├── docs/                  # Output directory (HTML, RSS, CSS)
├── src/
│   └── newsgator/         # Main package
│       ├── feed_processing/      # RSS feed fetching and parsing
│       ├── content_analysis/     # Content similarity and clustering
│       ├── llm_integration/      # LLM translation and rewriting
│       ├── html_generation/      # HTML and RSS generation
│       ├── web_server/           # Simple HTTP server for Docker
│       └── templates/            # Jinja2 templates
├── docker-entrypoint.py   # Docker entry point script
├── Dockerfile             # Docker configuration
├── main.py                # Entry point script
├── requirements.txt       # Python dependencies
├── setup.py               # Package setup file
└── README.md              # This file

⚡ Performance & Limitations

Processing Time

Small feeds (5-10 articles): ~2-5 minutes
Medium feeds (20-30 articles): ~5-15 minutes
Large feeds (50+ articles): ~15-30 minutes

Processing time depends on LLM provider, article length, and system performance.

Current Limitations

Language: Currently optimized for Italian translation only
Article limits: Maximum of 5 articles per category by default
Feed sources: Pre-configured Italian news sources (customizable)
LLM context: Limited by model's maximum context length (32K tokens for phi-4)
Rate limiting: Subject to OpenAI API rate limits when using OpenAI

Resource Usage

Memory: ~1-2GB during processing
Storage: Generated content typically 10-50MB per run
Network: Downloads RSS feeds and makes LLM API calls

🔧 Troubleshooting

Common Issues

"No articles fetched" error

# Check internet connection and RSS feed URLs
curl -I https://www.ansa.it/sito/notizie/topnews/topnews_rss.xml

# Verify config.yaml has valid RSS feeds
cat config.yaml | grep -A 20 rss_feeds

LM Studio connection failed

# Ensure LM Studio is running on correct port
curl http://localhost:1234/v1/models

# Check if phi-4-mini-instruct model is loaded
# Open LM Studio GUI and verify model status

Docker container networking issues

# For Linux systems, add host networking
docker run --add-host=host.docker.internal:host-gateway -p 8080:8080 newsgator

# Alternative: Use host network mode
docker run --network host newsgator

Out of memory errors

Reduce max_items_per_feed in config.yaml
Use a smaller LLM model
Increase system swap space

Permission denied writing to docs/

# Fix permissions
chmod 755 docs/
sudo chown -R $USER:$USER docs/

Debug Mode

Enable debug logging for more detailed information:

python main.py --debug

❓ FAQ

Q: Can I use languages other than Italian? A: Currently, Newsgator is optimized for Italian translation. You can modify the LLM prompts in the source code to target other languages.

Q: How much does it cost to run with OpenAI? A: Costs vary based on article volume and model choice. Typical usage: $0.50-$2.00 per run with GPT-4, $0.10-$0.50 with GPT-3.5-turbo.

Q: Can I add my own RSS feeds? A: Yes! Edit the rss_feeds section in config.yaml to add your preferred news sources.

Q: How often should I run Newsgator? A: For daily news: once per day. For real-time updates: every 2-4 hours. Consider API rate limits and costs.

Q: Can I customize the newspaper design? A: Yes! Modify the templates in src/newsgator/templates/ and CSS in docs/css/styles.css.

Q: Is my data private when using local LM Studio? A: Yes! With LM Studio, all processing happens locally. No data is sent to external services except for RSS feed fetching.

Q: Can I run this on a Raspberry Pi? A: Yes, but LM Studio requires significant resources. Consider using OpenAI API instead for lightweight deployments.

License

MIT

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
.github		.github
docs		docs
src		src
.gitignore		.gitignore
Dockerfile		Dockerfile
README.md		README.md
config.yaml		config.yaml
docker-entrypoint.py		docker-entrypoint.py
main.py		main.py
requirements.txt		requirements.txt
setup.py		setup.py

Uh oh!

fabriziosalmi/newsgator

Folders and files

Latest commit

History

Repository files navigation

Newsgator

📖 Table of Contents

🎬 Demo

Features

🚀 Quick Start

📋 Prerequisites

System Requirements

LLM Requirements (choose one)

Optional

Installation

Option 1: Install from source

Option 2: Using Docker

Configuration

LLM Options

Usage

Basic Usage

Docker Usage

Docker Networking Note

GitHub Pages Publishing

Customization

Project Structure

⚡ Performance & Limitations

Processing Time

Current Limitations

Resource Usage

🔧 Troubleshooting

Common Issues

Debug Mode

❓ FAQ

License

Contributing

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases 2

Sponsor this project

Uh oh!

Uh oh!

Contributors 3

Uh oh!

Languages