feat(core): implement extractor, pdf_parser, and utils with TDD

Wave 2 tasks complete:
- Task 4: ZUGFeRD extractor with profile detection (factur-x)
- Task 5: PDF text parser with regex patterns
- Task 6: Utils with unit code mapping and tolerance checks

Features:
- extract_zugferd() extracts the embedded XML and the text from PDFs
- parse_zugferd_xml() parses UN/CEFACT CII XML into models
- extract_from_text() extracts values from PDF text using regex patterns
- translate_unit_code() maps UN/ECE unit codes to German unit names
- amounts_match() compares amounts with a 0.01 EUR tolerance
- German number and date format handling

Tests: 27 utils tests, 27 pdf_parser tests, extractor tests
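The utils features listed above (German number/date handling, unit code translation, and the 0.01 EUR tolerance check) could look roughly like the Python sketch below. Only translate_unit_code() and amounts_match() are named in this commit. parse_german_number(), parse_german_date(), the UNIT_CODES_DE table with its German labels, and all function bodies are hypothetical illustrations, not the committed code.

```python
from datetime import date, datetime
from decimal import Decimal, InvalidOperation

# Hypothetical sketch: only translate_unit_code() and amounts_match() are named
# in the commit; every body and the German labels below are assumptions.

# A few UN/ECE Recommendation 20 unit codes mapped to assumed German labels.
UNIT_CODES_DE = {
    "C62": "Stück",      # "one" (unit)
    "H87": "Stück",      # piece
    "KGM": "Kilogramm",  # kilogram
    "MTR": "Meter",      # metre
    "LTR": "Liter",      # litre
    "HUR": "Stunde",     # hour
}

# Default tolerance, matching the 0.01 EUR stated in the commit message.
AMOUNT_TOLERANCE = Decimal("0.01")


def translate_unit_code(code: str) -> str:
    """Return the German label for a UN/ECE unit code, or the code itself if unknown."""
    return UNIT_CODES_DE.get(code.strip().upper(), code)


def parse_german_number(text: str) -> Decimal:
    """Parse a German-formatted amount such as '1.234,56 €' into Decimal('1234.56')."""
    cleaned = text.replace("€", "").replace("EUR", "").strip()
    # In German notation '.' separates thousands and ',' marks the decimals.
    cleaned = cleaned.replace(".", "").replace(",", ".")
    try:
        return Decimal(cleaned)
    except InvalidOperation as exc:
        raise ValueError(f"not a German-formatted number: {text!r}") from exc


def parse_german_date(text: str) -> date:
    """Parse a German date in DD.MM.YYYY form, e.g. '04.02.2026'."""
    return datetime.strptime(text.strip(), "%d.%m.%Y").date()


def amounts_match(a, b, tolerance: Decimal = AMOUNT_TOLERANCE) -> bool:
    """Return True if two amounts differ by at most `tolerance`."""
    # Convert via str() so a float like 100.01 becomes Decimal('100.01'),
    # not its binary floating-point approximation.
    return abs(Decimal(str(a)) - Decimal(str(b))) <= tolerance


print(parse_german_number("1.234,56 €"))  # 1234.56
print(amounts_match(100.00, 100.01))      # True
print(amounts_match(100.00, 100.02))      # False
```

Comparing through Decimal rather than float keeps cent values exact, so a difference of exactly 0.01 EUR is reliably treated as a match instead of depending on floating-point rounding.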
2026-02-04 19:42:32 +01:00
parent 29bd8453ec
commit c1f603cd46
8 changed files with 1642 additions and 8 deletions
--- a/.sisyphus/plans/zugferd-service.md
+++ b/.sisyphus/plans/zugferd-service.md
@@ -515,7 +515,7 @@ Critical Path: Task 1 → Task 4 → Task 7 → Task 10 → Task 13 → Task 16

 ### Wave 2: Core Extraction Logic

- [ ] 4. ZUGFeRD Extractor Implementation (TDD)
+- [x] 4. ZUGFeRD Extractor Implementation (TDD)

  **What to do**:
  - Write tests first using sample PDFs from fixtures
@@ -636,7 +636,7 @@ Critical Path: Task 1 → Task 4 → Task 7 → Task 10 → Task 13 → Task 16

 ---

- [ ] 5. PDF Text Parser Implementation (TDD)
+- [x] 5. PDF Text Parser Implementation (TDD)

  **What to do**:
  - Write tests first with expected extraction patterns
@@ -738,7 +738,7 @@ Critical Path: Task 1 → Task 4 → Task 7 → Task 10 → Task 13 → Task 16

 ---

- [ ] 6. Utility Functions Implementation
+- [x] 6. Utility Functions Implementation

  **What to do**:
  - Create UN/ECE unit code mapping dictionary