AutoCSV Profiler Suite

A comprehensive toolkit for automated CSV data analysis using multiple profiling engines. This suite provides statistical analysis, data quality assessment, and interactive reporting through conda-managed environments.

Project Structure

AutoCSV-Profiler-Suite/
├── README.md
├── LICENSE
├── CHANGELOG.md
├── SECURITY.md
├── .gitignore
├── run_analysis.bat
│
├── src/
│   ├── auto_csv_profiler.py
│   ├── profile_ydata_profiling_report.py
│   ├── profile_sweetviz_report.py
│   ├── profile_dataprep_report.py
│   └── recognize_delimiter.py
│
├── environments/
│   ├── environment-main.yml
│   ├── environment-profiling.yml
│   └── environment-dataprep.yml
│
├── scripts/
│   └── setup_environments.ps1
│
├── docs/
│   ├── installation.md
│   ├── usage.md
│   ├── environments.md
│   └── troubleshooting.md
│
└── examples/
    └── sample_data.csv

Distribution Methods

This project is available in two formats to suit different user needs:

🐍 PyPI Package (Simplified)

pip install autocsv-profiler
autocsv-profiler data.csv

Single environment with core analysis features
Command-line interface for quick analysis
Automatic dependency management

Documentation:

PyPI Package Docs

📦 Source Distribution (Full Suite)

git clone https://github.com/dhaneshbb/AutoCSV-Profiler-Suite.git
cd AutoCSV-Profiler-Suite
.\scripts\setup_environments.ps1
run_analysis.bat

Three specialized environments for different tools
Multiple profiling engines (YData, SweetViz, DataPrep)
Interactive tool selection interface

Documentation:

Source Suite Guide

Features

Multiple Profiling Engines: YData Profiling, SweetViz, and DataPrep (source distribution)
Comprehensive Analysis: Statistical summaries, outlier detection, missing value analysis
Interactive Reports: HTML reports with visualizations and data insights
Environment Management: Single or multiple conda environments
Flexible Installation: Choose between simple pip install or full toolkit setup
Platform Support: The source distribution runs on Windows, using PowerShell and batch scripts
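
To make the "Comprehensive Analysis" item concrete, here is a minimal sketch of missing-value counting and outlier detection for one column. It is not taken from the suite's source: the function name `profile_column`, the sample data, and the choice of the 1.5 x IQR rule are assumptions made for illustration, and it uses only the standard library rather than pandas.

```python
import csv
import io
import statistics


def profile_column(values):
    """Return the missing-cell count and IQR outliers for one column of raw strings."""
    missing = sum(1 for v in values if v.strip() == "")
    numbers = []
    for v in values:
        try:
            numbers.append(float(v))
        except ValueError:
            continue  # skip blank and non-numeric cells
    outliers = []
    if len(numbers) >= 4:
        # Quartiles via the "inclusive" method, then the usual 1.5 * IQR fences.
        q1, _, q3 = statistics.quantiles(numbers, n=4, method="inclusive")
        iqr = q3 - q1
        low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        outliers = [x for x in numbers if x < low or x > high]
    return {"missing": missing, "outliers": outliers}


# Hypothetical sample data, used only for this illustration.
sample = "age,city\n31,Pune\n29,Delhi\n,Mumbai\n30,Chennai\n32,Kochi\n250,Goa\n"
rows = list(csv.DictReader(io.StringIO(sample)))
age_report = profile_column([row["age"] for row in rows])
print(age_report)
```

The real analysis scripts may use different thresholds or methods; the sketch only shows the kind of check these reports summarize.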

Project Architecture

Source Distribution (Full Suite)

graph TB
    A[CSV Input File] --> B[run_analysis.bat]
    B --> C[Delimiter Detection]
    C --> D{User Selection}
    
    D --> E[csv-profiler-main<br/>Python 3.11.7]
    D --> F[csv-profiler-profiling<br/>Python 3.10.4]
    D --> G[csv-profiler-dataprep<br/>Python 3.10.4]
    
    E --> H[auto_csv_profiler.py<br/>Statistical Analysis]
    F --> I[profile_ydata_profiling_report.py<br/>YData Reports]
    F --> J[profile_sweetviz_report.py<br/>SweetViz Reports]
    G --> K[profile_dataprep_report.py<br/>DataPrep EDA]
    
    H --> L[Output Directory]
    I --> L
    J --> L
    K --> L
    
    L --> M[HTML Reports]
    L --> N[Statistical Summaries]
    L --> O[Visualizations]
    L --> P[Cleaned Data]
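
The "Delimiter Detection" step in the diagram above can be approximated with the standard library. The sketch below is an assumption about how such a step might work, not the contents of recognize_delimiter.py; the function name `detect_delimiter` and the candidate delimiter set are hypothetical.

```python
import csv


def detect_delimiter(sample_text, candidates=",;\t|"):
    """Guess the delimiter of a CSV sample, falling back to a comma."""
    try:
        dialect = csv.Sniffer().sniff(sample_text, delimiters=candidates)
        return dialect.delimiter
    except csv.Error:
        # Sniffer could not decide; a comma is the most common default.
        return ","


# Hypothetical semicolon-separated sample.
sample = "id;name;score\n1;Asha;9.5\n2;Ravi;8.0\n3;Meera;7.25\n"
detected = detect_delimiter(sample)
print(repr(detected))
```

In practice the sample would be the first few kilobytes of the input file, and the guess would still be shown to the user for confirmation, as the workflow describes.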

PyPI Package (Simplified)

graph TB
    A[CSV Input File] --> B[autocsv-profiler CLI]
    B --> C[Single Environment<br/>Python 3.11+]
    C --> D[Core Analysis Engine]
    D --> E[Output Directory]
    E --> F[Statistical Reports]
    E --> G[Visualizations]
    E --> H[Data Quality Reports]

Environment Structure

Source Distribution (Multiple Environments)

graph LR
    A[setup_environments.ps1] --> B[csv-profiler-main]
    A --> C[csv-profiler-profiling]
    A --> D[csv-profiler-dataprep]
    
    B --> E[pandas, numpy, scipy<br/>matplotlib, seaborn<br/>scikit-learn, statsmodels]
    C --> F[ydata-profiling<br/>sweetviz]
    D --> G[dataprep]
    
    style B fill:#e1f5fe
    style C fill:#f3e5f5
    style D fill:#e8f5e8

PyPI Package (Single Environment)

graph LR
    A[pip install] --> B[autocsv-profiler]
    B --> C[Unified Environment<br/>All core packages<br/>pandas, numpy, scipy<br/>matplotlib, seaborn, etc.]
    
    style B fill:#e1f5fe
    style C fill:#f3e5f5

Quick Start

Prerequisites

Windows OS with PowerShell (for source distribution)
Python 3.9+ (for PyPI package)
Anaconda or Miniconda (for source distribution)
Internet connection for package downloads
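
As a rough way to verify the prerequisites above before installing, the following sketch checks the running Python version and whether a `conda` executable is on the PATH. The helper names are hypothetical, and the 3.9 minimum is taken from the list above.

```python
import shutil
import sys


def meets_minimum(version, minimum=(3, 9)):
    """Return True if a (major, minor, ...) version satisfies the minimum."""
    return tuple(version[:2]) >= minimum


def check_prerequisites():
    """Report whether this interpreter and the local PATH satisfy the prerequisites."""
    return {
        "python_ok": meets_minimum(sys.version_info),
        "conda_on_path": shutil.which("conda") is not None,
    }


print(check_prerequisites())
```

Only the source distribution needs conda, so a missing `conda` is not a problem for the PyPI package.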

Installation Options

Option 1: PyPI Package (Recommended for Most Users)

# Simple installation
pip install autocsv-profiler

# Quick analysis
autocsv-profiler your_data.csv

Option 2: Source Distribution (Full Feature Set)

# Clone repository
git clone https://github.com/dhaneshbb/AutoCSV-Profiler-Suite.git
cd AutoCSV-Profiler-Suite

# Setup environments
.\scripts\setup_environments.ps1

# Run analysis
run_analysis.bat

Usage Comparison

| Feature | PyPI Package | Source Distribution |
|---|---|---|
| Installation | `pip install` | Download + conda setup |
| Setup Time | 30 seconds | 10 minutes |
| Environments | 1 (unified) | 3 (specialized) |
| Analysis Tools | Core statistical analysis | Core + YData + SweetViz + DataPrep |
| Interface | Command-line | Interactive batch menu |
| Updates | `pip install -U` | Git pull + environment update |
| Target Users | Developers, quick analysis | Data analysts, comprehensive reports |

Choose PyPI Package If:

You want quick, straightforward CSV analysis
You prefer command-line tools
You need core statistical features only
You want automatic dependency management

Choose Source Distribution If:

You need multiple profiling engines
You want specialized HTML reports
You prefer interactive tool selection
You need the full feature set

Usage Workflow

sequenceDiagram
    participant User
    participant Batch as run_analysis.bat
    participant Env as Environment Manager
    participant Scripts as Analysis Scripts
    participant Output as Results
    
    User->>Batch: Execute with CSV path
    Batch->>Env: Activate csv-profiler-main
    Env->>Scripts: Run delimiter detection
    Scripts->>User: Prompt for delimiter confirmation
    User->>Batch: Select analysis tools
    
    loop For each selected tool
        Batch->>Env: Activate specific environment
        Env->>Scripts: Run analysis script
        Scripts->>Output: Generate reports
    end
    
    Output->>User: HTML reports and statistics
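
The loop in the workflow above (activate an environment, then run a script) can be sketched in Python. This is an illustration, not how run_analysis.bat is implemented: it uses `conda run -n <env>` instead of activating environments in a batch session, and the tool keys and the script argument order (CSV path, then delimiter) are assumptions. The environment and script names come from the architecture diagram.

```python
# Tool -> (conda environment, script) mapping, taken from the architecture diagram.
TOOLS = {
    "main": ("csv-profiler-main", "src/auto_csv_profiler.py"),
    "ydata": ("csv-profiler-profiling", "src/profile_ydata_profiling_report.py"),
    "sweetviz": ("csv-profiler-profiling", "src/profile_sweetviz_report.py"),
    "dataprep": ("csv-profiler-dataprep", "src/profile_dataprep_report.py"),
}


def build_commands(selected, csv_path, delimiter):
    """Build one `conda run` command per selected tool (nothing is executed here)."""
    commands = []
    for tool in selected:
        env_name, script = TOOLS[tool]
        # Assumed argument order: CSV path, then delimiter.
        commands.append(
            ["conda", "run", "-n", env_name, "python", script, csv_path, delimiter]
        )
    return commands


commands = build_commands(["main", "sweetviz"], "examples/sample_data.csv", ",")
for command in commands:
    print(" ".join(command))
```

Each command list could be passed to `subprocess.run` to execute the tools one after another.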

Environment Management (Source Distribution)

The source distribution uses three specialized conda environments so that each profiling tool runs against the Python version and dependencies it is compatible with:

csv-profiler-main

Purpose: Core statistical analysis and data processing
Python Version: 3.11.7
Key Packages: pandas, numpy, scipy, matplotlib, seaborn, scikit-learn, statsmodels

csv-profiler-profiling

Purpose: YData Profiling and SweetViz report generation
Python Version: 3.10.4
Key Packages: ydata-profiling, sweetviz

csv-profiler-dataprep

Purpose: DataPrep EDA and data preparation tasks
Python Version: 3.10.4
Key Packages: dataprep
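
After running the setup script, one way to confirm that all three environments exist is to parse the output of `conda env list --json`. The sketch below assumes that output is already available as a string; the function name `missing_environments` and the sample paths are hypothetical.

```python
import json
import re

REQUIRED_ENVS = (
    "csv-profiler-main",
    "csv-profiler-profiling",
    "csv-profiler-dataprep",
)


def missing_environments(env_list_json, required=REQUIRED_ENVS):
    """Return the required environment names absent from `conda env list --json` output."""
    paths = json.loads(env_list_json).get("envs", [])
    # The environment name is the last path component (Windows or POSIX separators).
    names = {re.split(r"[\\/]", p.rstrip("\\/"))[-1] for p in paths}
    return [name for name in required if name not in names]


# Hypothetical output in which the profiling environment was never created.
sample_json = json.dumps({"envs": [
    r"C:\Users\me\miniconda3",
    r"C:\Users\me\miniconda3\envs\csv-profiler-main",
    r"C:\Users\me\miniconda3\envs\csv-profiler-dataprep",
]})
missing_envs = missing_environments(sample_json)
print(missing_envs)
```

An empty list means every environment named in this section is present.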

Output Structure

graph TD
    A[CSV Analysis Results] --> B[Statistical Reports]
    A --> C[HTML Reports]
    A --> D[Visualizations]
    A --> E[Data Quality]
    
    B --> F[summary_statistics_all.txt]
    B --> G[categorical_summary.txt]
    B --> H[outliers_summary.txt]
    
    C --> I[profiling_report.html]
    C --> J[sweetviz_report.html]
    C --> K[dataprep_report.html]
    
    D --> L[Box Plots]
    D --> M[Histograms]
    D --> N[Correlation Heatmaps]
    
    E --> O[missing_values_report.txt]
    E --> P[duplicated_rows.csv]
    E --> Q[imputed_data.csv]
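
To check a finished run against the file names in the diagram above, a short script can list which expected outputs are missing. This sketch assumes all files sit directly in one output directory, which the diagram does not confirm, and which files appear depends on the tools selected. The function name `missing_outputs` is hypothetical.

```python
import tempfile
from pathlib import Path

# File names taken from the output structure diagram.
EXPECTED_OUTPUTS = [
    "summary_statistics_all.txt",
    "categorical_summary.txt",
    "outliers_summary.txt",
    "profiling_report.html",
    "sweetviz_report.html",
    "dataprep_report.html",
    "missing_values_report.txt",
    "duplicated_rows.csv",
    "imputed_data.csv",
]


def missing_outputs(output_dir, expected=EXPECTED_OUTPUTS):
    """Return the expected file names not found directly inside output_dir."""
    output_dir = Path(output_dir)
    return [name for name in expected if not (output_dir / name).is_file()]


# Demonstration against a temporary directory holding a single report.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "summary_statistics_all.txt").write_text("demo")
    missing = missing_outputs(tmp)

print(missing)
```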

Documentation

For Source Distribution

Source Distribution - Full setup

For PyPI Package

Installation: pip install autocsv-profiler
Usage: autocsv-profiler --help
PyPI Page: https://pypi.org/project/autocsv-profiler/

Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests if applicable
Submit a pull request

License

This project is licensed under the MIT License. See the LICENSE file for details.

Support

Create an issue on the GitHub repository for bug reports
Check the troubleshooting guide (docs/troubleshooting.md) for common problems
Review CHANGELOG.md for recent updates

Version

Current version: 1.1.0

For version history and changes, see CHANGELOG.md.
