This project is a text simplification application that takes eBooks from the Gutenberg library as input, processes them to simplify complex sentences and vocabulary, and outputs the simplified text.
The Text Simplification Application aims to make reading complex texts more accessible by simplifying sentence structure and vocabulary. The app fetches eBooks from the Gutenberg library, cleans the text, segments it, applies AI-based text simplification, and generates simplified output.
- Data Ingestion: Fetch and parse eBooks from the Gutenberg API.
- Preprocessing: Clean and segment raw text.
- Text Simplification: Apply AI models to simplify text.
- Post-Processing: Format the simplified text for readability.
- Output Generation: Export simplified eBooks in different formats.
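The preprocessing and simplification stages listed above can be illustrated with a minimal sketch. It is not the project's actual code: the function names (`clean_text`, `segment`, `simplify`, `run_pipeline`) are hypothetical, the sentence splitter is a naive regex, and `simplify` is a placeholder standing in for the AI model call. It assumes the plain-text Gutenberg format, where the book body sits between `*** START OF ... ***` and `*** END OF ... ***` marker lines.

```python
from __future__ import annotations

import re

# Marker lines that Project Gutenberg plain-text files place around the book body.
START_RE = re.compile(
    r"\*\*\* ?START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*?\*\*\*",
    re.IGNORECASE | re.DOTALL,
)
END_RE = re.compile(
    r"\*\*\* ?END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK",
    re.IGNORECASE,
)


def clean_text(raw: str) -> str:
    """Drop the Gutenberg header and footer, then collapse whitespace."""
    start = START_RE.search(raw)
    if start:
        raw = raw[start.end():]
    end = END_RE.search(raw)
    if end:
        raw = raw[:end.start()]
    return re.sub(r"\s+", " ", raw).strip()


def segment(text: str) -> list[str]:
    """Naive sentence split: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def simplify(sentence: str) -> str:
    """Hypothetical placeholder for the model call; returns the input unchanged."""
    return sentence


def run_pipeline(raw: str) -> str:
    """Clean, segment, simplify, and join the result one sentence per line."""
    return "\n".join(simplify(s) for s in segment(clean_text(raw)))
```

Keeping each stage as a separate function means the placeholder `simplify` can later be swapped for a real model call without touching the cleaning or segmentation code.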
- Clone the repository:
git clone https://gitlab.rz.hft-stuttgart.de/vector-software-project/text-simplification.git
- Set up a Python virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install the required dependencies:
pip install -r requirements.txt
- Run the Application:
To run the text simplification pipeline on a Gutenberg eBook:
python main.py
Once it is running, open http://127.0.0.1:5000 in your browser.
- Database Setup
-
Create a .env file in the same directory as main.py:
# .env file
# Database configuration
DB_USERNAME=<<YOUR_USERNAME>>
DB_PASSWORD=<<YOUR_PASSWORD>>
DB_HOST=localhost
DB_NAME=gutenberg_db

DB_USERNAME=root
DB_PASSWORD=root
DB_HOST='host.docker.internal' # 'docker.for.mac.host.internal' for mac
DB_NAME=test
-
Windows
- Visit https://dev.mysql.com/downloads/windows/installer/8.0.html
- Download the MySQL 8.0.40 .msi installer for Windows (306.4M)
- This is the community edition
- After setting up the database password, update the .env file you created in step (1)
-
Creating the local db
create database gutenberg_db;
show databases;
-
Docker commands for building the image and running the container
- Prerequisites: Docker must be installed and running (opening Docker starts the Docker daemon)
- docker build -t gpt-neo-flask-api . (you can use any <IMAGE_NAME> in place of gpt-neo-flask-api)
- docker run --add-host host.docker.internal:host-gateway -p 5000:5000 <IMAGE_NAME>
- Running Application using Docker Compose
You do NOT need to install Python or set up any databases on your local machine. Make sure that Docker has been installed and started.
-
Run the following command to start the application:
docker-compose up
This command builds the services and starts the containers.
-
To stop the application and remove the containers, run:
docker-compose down
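For reference, the DB_* settings from the .env file in the Database Setup step could be read in Python roughly as follows. This is a sketch under assumptions, not the project's own code: `database_url` is a hypothetical helper, the `mysql+pymysql` driver prefix is an assumed choice (a SQLAlchemy-style URL), and the variables are assumed to be present in the process environment already (for example, loaded from .env by a package such as python-dotenv).

```python
import os
from urllib.parse import quote_plus


def database_url(env=os.environ) -> str:
    """Build a SQLAlchemy-style MySQL URL from the DB_* variables.

    The 'mysql+pymysql' prefix is an assumption; use whichever driver
    the project actually installs.
    """
    user = env["DB_USERNAME"]
    # URL-encode the password so characters such as '@' or '/' stay valid.
    password = quote_plus(env["DB_PASSWORD"])
    host = env.get("DB_HOST", "localhost")
    name = env["DB_NAME"]
    return f"mysql+pymysql://{user}:{password}@{host}/{name}"
```

Passing the mapping as a parameter makes the helper easy to check without touching real environment variables, for example `database_url({"DB_USERNAME": "root", "DB_PASSWORD": "root", "DB_HOST": "localhost", "DB_NAME": "gutenberg_db"})`.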
We welcome contributions! Please follow these steps:
- Clone the repository.
- Create a new feature branch (git checkout -b feature-name).
- Commit your changes (git commit -m "Add feature").
- Push to the branch (git push origin feature-name).
- Open a Pull Request.