Files
zugferd-service/.sisyphus/drafts/zugferd-service.md

131 lines
4.4 KiB
Markdown

# Draft: ZUGFeRD-Service Implementation
## Requirements (confirmed)
### Core Functionality
- **Purpose**: Python REST API for ZUGFeRD/Factur-X invoice extraction and validation
- **Framework**: FastAPI (preferred by user)
- **Runtime**: Python 3.11+
- **Deployment**: Docker container on NixOS server + native Nix package
### API Endpoints
1. `GET /health` - Health check endpoint
2. `POST /extract` - PDF extraction (accepts base64-encoded PDF)
3. `POST /validate` - Invoice validation (pflichtfelder, betraege, ustid, pdf_abgleich)
### Key Dependencies
- `factur-x>=2.5` - ZUGFeRD/Factur-X extraction
- `pypdf>=4.0.0` - PDF text extraction
- `fastapi>=0.109.0` - API framework
- `uvicorn>=0.27.0` - ASGI server
- `pydantic>=2.5.0` - Data models
- `lxml>=5.0.0` - XML parsing
- `python-multipart>=0.0.6` - File uploads
### Project Structure (user-specified)
```
zugferd-service/
├── Dockerfile
├── requirements.txt
├── README.md
├── src/
│ ├── __init__.py
│ ├── main.py # FastAPI App + Endpoints
│ ├── extractor.py # ZUGFeRD/PDF Extraktion
│ ├── validator.py # Validierungslogik
│ ├── pdf_parser.py # PDF-Text-Parsing für Abgleich
│ ├── models.py # Pydantic Models
│ └── utils.py # Hilfsfunktionen
├── tests/
│ ├── __init__.py
│ ├── test_extractor.py
│ ├── test_validator.py
│ └── fixtures/
│ ├── sample_zugferd.pdf
│ └── sample_no_zugferd.pdf
└── docker-compose.yml
```
## Research Findings
### Nix Packaging (from librarian research)
- Use `buildPythonApplication` for standalone service
- `pyproject = true` with hatchling/setuptools
- `pythonRelaxDeps = true` for dependency flexibility
- mem0 example pattern: custom server script via `postInstall`
- Consider flake.nix for modern Nix workflow
### factur-x Library (from librarian research)
- `get_xml_from_pdf()` - Core extraction function
- `get_level()` / `get_flavor()` - Profile detection
- Namespaces: rsm, ram, udt for UN/CEFACT CII format
- Profile levels: minimum, basicwl, basic, en16931, extended
### UN/ECE Unit Codes
- C62 = Piece, KGM = Kilogram, H87 = Piece (alt)
- Need comprehensive mapping dictionary
## Technical Decisions
### Python Tooling
- **PENDING**: Use pyproject.toml (modern) or requirements.txt (legacy)?
- **PENDING**: Build system: setuptools, hatchling, or poetry-core?
### Nix Approach
- **PENDING**: Flake-based or traditional Nix expressions?
- **PENDING**: Include NixOS service module?
### Testing Strategy
- **PENDING**: TDD or tests-after?
- **PENDING**: Test framework: pytest (standard choice)
## Scope Boundaries
### INCLUDE
- All 3 API endpoints as specified
- All validation checks (pflichtfelder, betraege, ustid, pdf_abgleich)
- Docker multi-stage build
- Nix packaging
- Basic test suite
- README documentation
### EXCLUDE
- Online USt-ID validation (only format check)
- Database/persistence (stateless service)
- Authentication/authorization
- Rate limiting
- Metrics/tracing
## Open Questions (RESOLVED)
1.**Python project structure**: pyproject.toml with hatchling
2.**Build system**: hatchling (modern, Nix-friendly)
3.**Nix approach**: Flake-based
4.**Testing**: TDD (test-first) with pytest
5.**Sample PDFs**: Source from official ZUGFeRD repositories
## Metis Gap Analysis (Reviewed)
### Gaps Classified as MINOR (Auto-Resolved)
- **UN/ECE unit codes**: Start with common codes (C62, KGM, H87, MTR, LTR, etc.), expand as needed
- **Tolerance**: Hardcode 0.01 EUR as specified
- **Validation scope**: Check required fields exist for declared profile
- **Error codes**: Implement as specified in user's detailed spec
### Gaps Classified as DEFAULTS APPLIED
- **Authentication**: OPEN (no auth mentioned in spec → stateless public API)
- **ZUGFeRD profiles**: ALL profiles supported (MINIMUM, BASIC, BASIC WL, EN16931, EXTENDED)
- **Deployment**: Container-based on NixOS (as per NixOS config section in spec)
- **PDF text extraction**: REQUIRED for pdf_abgleich check (explicitly in spec)
- **File size limit**: Handle as error for >10MB (spec mentions this edge case)
### Guardrails (Must NOT Have)
- NO authentication middleware
- NO database/persistence
- NO caching layers
- NO rate limiting
- NO metrics endpoints (beyond /health)
- NO CLI interface
- NO web UI
- NO abstraction layers for "future extensibility"