A Rust library and CLI tool for detecting data types from strings, starting with comprehensive date detection.
- Date Detection: Supports multiple date formats, including:
  - ISO 8601 (2023-12-25, 2023-12-25T10:30:00Z)
  - US formats (12/25/2023, 12-25-2023)
  - European formats (25.12.2023)
  - Unix timestamps (1703462400, 1703462400000)
  - Year-only (2023)
  - RFC formats (RFC 2822, RFC 3339)
  - Multi-language support:
    - English: "January 15 2023", "15 January 2023"
    - Spanish: "15 enero 2023", "25 de diciembre de 2023"
- High Confidence Scoring: Each detection includes a confidence score
- Extensive Testing: Unit tests, property-based tests, and fuzzing
- CLI Tool: Command-line interface for batch processing
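
Library usage:

use detectype::{detect_type, DataType};

fn main() {
    // An ISO 8601 date string is classified as a date.
    let result = detect_type("2023-12-25");
    assert_eq!(result, DataType::Date);

    // Text with no recognized format falls back to a plain string.
    let result = detect_type("hello world");
    assert_eq!(result, DataType::String);
}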

Command-line usage:

# Single input
detectype "2023-12-25"
# With detailed information (including language detection)
detectype --verbose "15 enero 2023"
# Output: Date (Date format: DAY_MONTH_YEAR_ES (confidence: 0.85, language: es))
# From file
detectype --file dates.txt
# From stdin with verbose output
echo "25 de diciembre de 2023" | detectype --stdin --verbose

Build a release binary:

cargo build --release

Run the tests:

# Run all tests
cargo test
# Run property-based tests
cargo test --test date_property_tests
# Run integration tests
cargo test --test integration_tests
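
The test files themselves are not shown here. As a rough illustration of the kind of property a date detector can be checked against, the sketch below uses a plain exhaustive loop from the standard library instead of a property-testing crate. `is_iso_date` is a hypothetical stand-in detector written for this example, not the library's implementation, and `check_round_trip` is likewise an assumed name.

```rust
fn is_leap(year: u32) -> bool {
    (year % 4 == 0 && year % 100 != 0) || year % 400 == 0
}

fn days_in_month(year: u32, month: u32) -> u32 {
    match month {
        4 | 6 | 9 | 11 => 30,
        2 if is_leap(year) => 29,
        2 => 28,
        _ => 31,
    }
}

/// Parses a byte range of `s` as an unsigned number made only of ASCII digits.
fn digits(s: &str, range: std::ops::Range<usize>) -> Option<u32> {
    let part = s.get(range)?;
    if part.bytes().all(|c| c.is_ascii_digit()) {
        part.parse().ok()
    } else {
        None
    }
}

/// Stand-in detector (hypothetical): accepts exactly YYYY-MM-DD with a real calendar date.
fn is_iso_date(s: &str) -> bool {
    let b = s.as_bytes();
    if b.len() != 10 || b[4] != b'-' || b[7] != b'-' {
        return false;
    }
    match (digits(s, 0..4), digits(s, 5..7), digits(s, 8..10)) {
        (Some(y), Some(m), Some(d)) => {
            (1..=12).contains(&m) && d >= 1 && d <= days_in_month(y, m)
        }
        _ => false,
    }
}

/// Checks two properties for every month of the given years and
/// returns how many valid dates were checked.
fn check_round_trip(years: std::ops::RangeInclusive<u32>) -> usize {
    let mut checked = 0;
    for y in years {
        for m in 1..=12 {
            let last = days_in_month(y, m);
            for d in 1..=last {
                // Property 1: every real date, formatted as ISO 8601, is detected.
                assert!(is_iso_date(&format!("{y:04}-{m:02}-{d:02}")));
                checked += 1;
            }
            // Property 2: the day after the last day of a month is rejected.
            assert!(!is_iso_date(&format!("{y:04}-{m:02}-{:02}", last + 1)));
        }
    }
    checked
}

fn main() {
    let checked = check_round_trip(2000..=2030);
    println!("checked {checked} dates");
}
```

A property-testing crate would generate random inputs and shrink failing cases. The loop above only covers the same idea for a small, fully enumerable input space.
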
Format | Example | Pattern |
---|---|---|
ISO 8601 Date | 2023-12-25 | YYYY-MM-DD |
ISO 8601 DateTime | 2023-12-25T10:30:00Z | YYYY-MM-DDTHH:MM:SSZ |
US Date (slash) | 12/25/2023 | MM/DD/YYYY |
US Date (dash) | 12-25-2023 | MM-DD-YYYY |
European Date | 25.12.2023 | DD.MM.YYYY |
Unix Timestamp | 1703462400 | 10 digits |
Unix Timestamp (ms) | 1703462400000 | 13 digits |
Year Only | 2023 | YYYY |
English Natural | January 15 2023 | Month Day Year |
English Natural | 15 January 2023 | Day Month Year |
Spanish Natural | 15 enero 2023 | Day Month Year |
Spanish Natural | 25 de diciembre de 2023 | Day de Month de Year |
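
The numeric rows of the table can be told apart by shape alone. The following is a minimal sketch of that idea, not the library's actual detection code: `classify` and its helper functions are hypothetical names, it uses hand-written field checks from the standard library, and it validates only layout and coarse field ranges (it would accept 02/31/2023). The datetime and natural-language rows are left out.

```rust
/// Parses `field` as a number if it has exactly `width` ASCII digits.
fn num(field: &str, width: usize) -> Option<u32> {
    if field.len() == width && field.bytes().all(|c| c.is_ascii_digit()) {
        field.parse().ok()
    } else {
        None
    }
}

/// Splits `s` on `sep` and expects three digit fields of the given widths.
fn three_fields(s: &str, sep: char, widths: [usize; 3]) -> Option<[u32; 3]> {
    let parts: Vec<&str> = s.split(sep).collect();
    if parts.len() != 3 {
        return None;
    }
    Some([
        num(parts[0], widths[0])?,
        num(parts[1], widths[1])?,
        num(parts[2], widths[2])?,
    ])
}

/// Coarse range check only; does not know month lengths.
fn valid_md(month: u32, day: u32) -> bool {
    (1..=12).contains(&month) && (1..=31).contains(&day)
}

/// Hypothetical classifier: returns the table's format name for `s`, if any.
fn classify(s: &str) -> Option<&'static str> {
    // Pure digit strings are told apart by length.
    if !s.is_empty() && s.bytes().all(|c| c.is_ascii_digit()) {
        return match s.len() {
            4 => Some("Year Only"),
            10 => Some("Unix Timestamp"),
            13 => Some("Unix Timestamp (ms)"),
            _ => None,
        };
    }
    // YYYY-MM-DD
    if let Some([_, m, d]) = three_fields(s, '-', [4, 2, 2]) {
        return valid_md(m, d).then_some("ISO 8601 Date");
    }
    // MM/DD/YYYY
    if let Some([m, d, _]) = three_fields(s, '/', [2, 2, 4]) {
        return valid_md(m, d).then_some("US Date (slash)");
    }
    // MM-DD-YYYY
    if let Some([m, d, _]) = three_fields(s, '-', [2, 2, 4]) {
        return valid_md(m, d).then_some("US Date (dash)");
    }
    // DD.MM.YYYY
    if let Some([d, m, _]) = three_fields(s, '.', [2, 2, 4]) {
        return valid_md(m, d).then_some("European Date");
    }
    None
}

fn main() {
    for input in ["2023-12-25", "12/25/2023", "25.12.2023", "1703462400", "hello world"] {
        println!("{input:?} -> {:?}", classify(input));
    }
}
```

In this sketch the separator decides between US and European order, which mirrors the table: slashes and dashes are read month-first, dots day-first.
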
- src/lib.rs: Main library interface
- src/detectors/date.rs: Date detection implementation
- src/error.rs: Error handling
- src/types.rs: Type definitions
- src/bin/main.rs: CLI tool
- tests/: Comprehensive test suite

The library currently supports date detection in:
- English: Full and abbreviated month names (January/Jan, February/Feb, etc.)
- Spanish: Full and abbreviated month names (enero/ene, febrero/feb, etc.)

Language handling:
- Automatic language detection based on month names
- Ambiguous abbreviations resolve to English when both languages match
- Spanish prepositions ("de") are accepted in date formats
- Matching is case-insensitive
- Confidence is scored per language
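
One way to picture month-name language detection is a table lookup that searches English first. The sketch below is an illustration under assumptions, not the library's code: `Lang`, `lookup_month`, and `detect_language` are hypothetical names, and abbreviations are assumed to be the first three letters of the full name.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Lang {
    En,
    Es,
}

// Full month names in calendar order, lowercase.
const EN: [&str; 12] = [
    "january", "february", "march", "april", "may", "june",
    "july", "august", "september", "october", "november", "december",
];
const ES: [&str; 12] = [
    "enero", "febrero", "marzo", "abril", "mayo", "junio",
    "julio", "agosto", "septiembre", "octubre", "noviembre", "diciembre",
];

/// Returns (month number, language) for a full or three-letter month name.
fn lookup_month(token: &str) -> Option<(u32, Lang)> {
    // Lowercasing makes the match case-insensitive.
    let t = token.to_lowercase();
    // English is searched first, so a shared abbreviation such as "feb"
    // resolves to English.
    for (lang, names) in [(Lang::En, &EN), (Lang::Es, &ES)] {
        for (i, name) in names.iter().enumerate() {
            if t == *name || (t.len() == 3 && name.starts_with(t.as_str())) {
                return Some((i as u32 + 1, lang));
            }
        }
    }
    None
}

/// Returns the language of the first token that is a known month name.
fn detect_language(text: &str) -> Option<Lang> {
    text.split_whitespace()
        .find_map(lookup_month)
        .map(|(_, lang)| lang)
}

fn main() {
    for text in ["15 enero 2023", "Feb 3 2024", "hello world"] {
        println!("{text:?} -> {:?}", detect_language(text));
    }
}
```

Because the search order encodes the priority, no separate conflict table is needed in this sketch. A per-language confidence score could be layered on top, for example by scoring shared abbreviations lower than unambiguous names.
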

The architecture is designed for easy language extension. To add a new language:
- Add month name mappings in src/detectors/date.rs
- Add regex patterns for language-specific formats
- Update the parsing logic in parse_natural_language_date
- Add comprehensive tests
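
The steps above can be pictured with a table-driven layout in which a new language is one more table entry. This is a sketch under stated assumptions, not the contents of src/detectors/date.rs: `LanguageSpec`, `LANGUAGES`, and `parse_day_month_year` are hypothetical names, French is added only as an example, and whitespace splitting stands in for the regex patterns the real code uses.

```rust
/// Hypothetical per-language description; the real data layout may differ.
struct LanguageSpec {
    code: &'static str,
    /// Lowercase month names in calendar order.
    months: [&'static str; 12],
    /// Filler words allowed between fields, e.g. Spanish "de".
    fillers: &'static [&'static str],
}

const LANGUAGES: &[LanguageSpec] = &[
    LanguageSpec {
        code: "en",
        months: [
            "january", "february", "march", "april", "may", "june",
            "july", "august", "september", "october", "november", "december",
        ],
        fillers: &[],
    },
    LanguageSpec {
        code: "es",
        months: [
            "enero", "febrero", "marzo", "abril", "mayo", "junio",
            "julio", "agosto", "septiembre", "octubre", "noviembre", "diciembre",
        ],
        fillers: &["de"],
    },
    // Adding a language in this sketch means adding one entry like this.
    LanguageSpec {
        code: "fr",
        months: [
            "janvier", "février", "mars", "avril", "mai", "juin",
            "juillet", "août", "septembre", "octobre", "novembre", "décembre",
        ],
        fillers: &[],
    },
];

/// Parses "Day [filler] Month [filler] Year".
/// Returns (year, month, day, language code) on success.
fn parse_day_month_year(text: &str) -> Option<(i32, u32, u32, &'static str)> {
    let lower = text.to_lowercase();
    for lang in LANGUAGES {
        // Drop this language's filler words, then expect exactly three fields.
        let fields: Vec<&str> = lower
            .split_whitespace()
            .filter(|w| !lang.fillers.contains(w))
            .collect();
        if let [day, month, year] = fields.as_slice() {
            if let Some(idx) = lang.months.iter().position(|m| m == month) {
                let day: u32 = day.parse().ok()?;
                let year: i32 = year.parse().ok()?;
                if (1..=31).contains(&day) {
                    return Some((year, idx as u32 + 1, day, lang.code));
                }
            }
        }
    }
    None
}

fn main() {
    for text in ["15 January 2023", "25 de diciembre de 2023", "25 Décembre 2023"] {
        println!("{text:?} -> {:?}", parse_day_month_year(text));
    }
}
```

The final step in the list, tests, would then cover each new entry: at least one full date per month name, plus inputs that must not match.
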

Planned additions:
- Integer detection
- Float detection
- Boolean detection
- Email detection
- URL detection
- Phone number detection
- Credit card detection
- Additional languages (French, German, Italian, etc.)