feat(core): implement extractor, pdf_parser, and utils with TDD

Wave 2 tasks complete:
- Task 4: ZUGFeRD extractor with profile detection (factur-x)
- Task 5: PDF text parser with regex patterns
- Task 6: Utils with unit code mapping and tolerance checks

Features:
- extract_zugferd() extracts XML and text from PDFs
- parse_zugferd_xml() parses UN/CEFACT CII XML to models
- extract_from_text() extracts values using regex patterns
- translate_unit_code() maps UN/ECE unit codes to German unit names
- amounts_match() compares two amounts within a 0.01 EUR tolerance
- German number/date format handling
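
For illustration only, a minimal sketch of how the utility helpers listed above might behave. The function names translate_unit_code() and amounts_match() come from this commit; their signatures, the parse_german_number() and parse_german_date() helpers, and the German labels in the mapping are assumptions, not the committed code.

```python
from datetime import date, datetime
from decimal import Decimal

# Assumed subset of UN/ECE Recommendation 20 unit codes.
# The German labels are illustrative, not taken from the commit.
UNIT_CODES = {
    "C62": "Stück",  # one (unit)
    "H87": "Stück",  # piece
    "KGM": "kg",     # kilogram
    "LTR": "l",      # litre
    "HUR": "Std.",   # hour
}


def translate_unit_code(code: str) -> str:
    """Return the German label for a unit code, or the code itself if unknown."""
    return UNIT_CODES.get(code.upper(), code)


def parse_german_number(text: str) -> Decimal:
    """Parse a German-formatted number such as '1.234,56' (hypothetical helper)."""
    # Drop thousands separators, then turn the decimal comma into a point.
    cleaned = text.strip().replace(".", "").replace(",", ".")
    return Decimal(cleaned)


def parse_german_date(text: str) -> date:
    """Parse a German-formatted date such as '04.02.2026' (hypothetical helper)."""
    return datetime.strptime(text.strip(), "%d.%m.%Y").date()


def amounts_match(a, b, tolerance=Decimal("0.01")) -> bool:
    """Check whether two amounts differ by at most the tolerance (0.01 EUR by default)."""
    # Going through str() avoids binary float artifacts when floats are passed in.
    return abs(Decimal(str(a)) - Decimal(str(b))) <= tolerance
```

Decimal is used here rather than float so that the 0.01 boundary is compared exactly; whether the committed code does the same is not stated in this message.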

Tests: 27 utils tests, 27 pdf_parser tests, extractor tests
m3tm3re
2026-02-04 19:42:32 +01:00
parent 29bd8453ec
commit c1f603cd46
8 changed files with 1642 additions and 8 deletions

@@ -515,7 +515,7 @@ Critical Path: Task 1 → Task 4 → Task 7 → Task 10 → Task 13 → Task 16
### Wave 2: Core Extraction Logic
-- [ ] 4. ZUGFeRD Extractor Implementation (TDD)
+- [x] 4. ZUGFeRD Extractor Implementation (TDD)
**What to do**:
- Write tests first using sample PDFs from fixtures
@@ -636,7 +636,7 @@ Critical Path: Task 1 → Task 4 → Task 7 → Task 10 → Task 13 → Task 16
---
-- [ ] 5. PDF Text Parser Implementation (TDD)
+- [x] 5. PDF Text Parser Implementation (TDD)
**What to do**:
- Write tests first with expected extraction patterns
@@ -738,7 +738,7 @@ Critical Path: Task 1 → Task 4 → Task 7 → Task 10 → Task 13 → Task 16
---
-- [ ] 6. Utility Functions Implementation
+- [x] 6. Utility Functions Implementation
**What to do**:
- Create UN/ECE unit code mapping dictionary