feat(core): implement extractor, pdf_parser, and utils with TDD

Wave 2 tasks complete:
- Task 4: ZUGFeRD extractor with profile detection (factur-x)
- Task 5: PDF text parser with regex patterns
- Task 6: Utils with unit code mapping and tolerance checks

Features:
- extract_zugferd() extracts XML and text from PDFs
- parse_zugferd_xml() parses UN/CEFACT CII XML to models
- extract_from_text() extracts values using regex patterns
- translate_unit_code() maps UN/ECE codes to German
- amounts_match() checks with 0.01 EUR tolerance
- German number/date format handling

Tests: 27 utils tests, 27 pdf_parser tests, extractor tests
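
As a rough illustration of the utility behaviour listed above, the following Python sketch shows how the tolerance check, unit code lookup, and German number parsing might look. Only `amounts_match()` and `translate_unit_code()` are named in this commit. Their signatures, the `parse_german_number()` helper, the `UNIT_CODES` table with its German labels, the inclusive comparison, and the fallback to the raw code are all assumptions, not the committed implementation.

```python
from decimal import Decimal

# Hypothetical subset of UN/ECE Recommendation 20 unit codes.
# The German labels are illustrative assumptions.
UNIT_CODES = {
    "C62": "Stück",   # one (unit)
    "H87": "Stück",   # piece
    "KGM": "kg",      # kilogram
    "HUR": "Stunde",  # hour
    "LTR": "Liter",   # litre
    "MTR": "m",       # metre
}


def translate_unit_code(code: str) -> str:
    """Map a UN/ECE unit code to a German label.

    Assumption: unknown codes are returned unchanged.
    """
    return UNIT_CODES.get(code, code)


def parse_german_number(text: str) -> Decimal:
    """Parse a German-formatted number such as '1.234,56'.

    Hypothetical helper: drops '.' thousands separators and
    treats ',' as the decimal separator.
    """
    normalized = text.strip().replace(".", "").replace(",", ".")
    return Decimal(normalized)


def amounts_match(a: Decimal, b: Decimal,
                  tolerance: Decimal = Decimal("0.01")) -> bool:
    """Return True if two amounts differ by at most `tolerance`.

    Assumption: the 0.01 EUR tolerance is inclusive.
    """
    return abs(a - b) <= tolerance
```

Using `Decimal` rather than `float` keeps the 0.01 comparison exact, which matters when totals from the XML are compared against totals parsed from the PDF text.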
@@ -515,7 +515,7 @@ Critical Path: Task 1 → Task 4 → Task 7 → Task 10 → Task 13 → Task 16

 ### Wave 2: Core Extraction Logic

-- [ ] 4. ZUGFeRD Extractor Implementation (TDD)
+- [x] 4. ZUGFeRD Extractor Implementation (TDD)

 **What to do**:
 - Write tests first using sample PDFs from fixtures
@@ -636,7 +636,7 @@ Critical Path: Task 1 → Task 4 → Task 7 → Task 10 → Task 13 → Task 16

 ---

-- [ ] 5. PDF Text Parser Implementation (TDD)
+- [x] 5. PDF Text Parser Implementation (TDD)

 **What to do**:
 - Write tests first with expected extraction patterns
@@ -738,7 +738,7 @@ Critical Path: Task 1 → Task 4 → Task 7 → Task 10 → Task 13 → Task 16

 ---

-- [ ] 6. Utility Functions Implementation
+- [x] 6. Utility Functions Implementation

 **What to do**:
 - Create UN/ECE unit code mapping dictionary