feat(api): add validator, FastAPI app structure, and health endpoint

Wave 3 tasks complete:
- Task 7: Validator with 4 checks (pflichtfelder, betraege, ustid, pdf_abgleich)
- Task 8: FastAPI app with CORS, exception handlers, JSON logging
- Task 9: Health endpoint returning status and version

Features:
- validate_invoice() runs selected validation checks
- Exception handlers for ExtractionError and generic errors
- GET /health returns {"status": "healthy", "version": "1.0.0"}

Tests: 52 validator tests covering all validation rules
m3tm3re
2026-02-04 19:57:12 +01:00
parent c1f603cd46
commit 4791c91f06
6 changed files with 1795 additions and 6 deletions


@@ -235,3 +235,192 @@ Initial session for ZUGFeRD-Service implementation.
- Use nix-shell for testing: `nix-shell -p python312Packages.pytest --run "pytest tests/test_utils.py -v"`
- All tests must pass before marking task complete
## [2026-02-04T20:55:00.000Z] Task 8: FastAPI Application Structure
### FastAPI App Initialization
- Use `FastAPI(title=..., version=..., description=...)` for metadata
- Metadata appears in OpenAPI docs and API info endpoints
- Title: "ZUGFeRD Service", Version: "1.0.0"
- Description: Purpose of the REST API
### CORS Middleware Configuration
- Development mode: `allow_origins=["*"]` (all origins)
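- Also set `allow_credentials`, `allow_methods`, `allow_headers` explicitly (Starlette's `CORSMiddleware` defaults them to `False`, `("GET",)` and `()`)
- Add the middleware right after creating the app. Registration order relative to the exception handlers does not change how requests are wrapped: Starlette runs handlers for `Exception` (500) in its outermost `ServerErrorMiddleware`, so those 500 responses do not pass through `CORSMiddleware` and carry no CORS headers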
### Exception Handler Pattern
- Use `@app.exception_handler(ExceptionType)` decorator
- ExtractionError → 400 status with error_code, message, details
- Generic Exception → 500 status with error_code="internal_error"
- Handlers receive `request: Request` and `exc: ExceptionType` parameters
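Stripped of FastAPI, each handler maps an exception to a status code plus a body in the service's error format (`error`, `message`, `details`). The sketch below shows only that mapping. The `ExtractionError` attributes (`error_code`, `message`, `details`) are assumptions about the local class, and `to_error_response` is a hypothetical helper that a handler could wrap in `JSONResponse(status_code=..., content=...)`.

```python
from typing import Any, Optional


class ExtractionError(Exception):
    """Stand-in for src.extractor.ExtractionError (attribute names assumed)."""

    def __init__(self, error_code: str, message: str, details: Optional[str] = None) -> None:
        super().__init__(message)
        self.error_code = error_code
        self.message = message
        self.details = details


def to_error_response(exc: Exception) -> tuple[int, dict[str, Any]]:
    """Map an exception to (HTTP status, body) in the service's error format."""
    if isinstance(exc, ExtractionError):
        # Known extraction failures are client errors and keep their own code.
        return 400, {"error": exc.error_code, "message": exc.message, "details": exc.details}
    # Anything else becomes a 500; this sketch chooses not to echo internal exception text.
    return 500, {
        "error": "internal_error",
        "message": "An unexpected error occurred.",
        "details": None,
    }
```

Keeping the mapping in a plain function makes it testable without starting the app; the handlers themselves then stay one line each.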
### Structured JSON Logging
- Custom `JSONFormatter` extends `logging.Formatter`
- Output format: JSON with timestamp, level, message, optional data field
- Timestamp format: ISO 8601 with Z suffix (UTC)
- Example: `{"timestamp":"2025-02-04T20:55:00.000Z","level":"INFO","message":"..."}`
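A minimal sketch of such a formatter, assuming the optional `data` field is passed through `logging`'s standard `extra=` mechanism (e.g. `logger.info("msg", extra={"data": {...}})`); the project's actual formatter may attach it differently.

```python
import json
import logging
from datetime import datetime, timezone


class JSONFormatter(logging.Formatter):
    """Render each log record as one compact JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        # record.created is a POSIX timestamp; render it as UTC with millisecond precision.
        ts = datetime.fromtimestamp(record.created, tz=timezone.utc)
        payload = {
            "timestamp": ts.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Optional structured data (assumed to arrive via extra={"data": ...}).
        data = getattr(record, "data", None)
        if data is not None:
            payload["data"] = data
        return json.dumps(payload, separators=(",", ":"))
```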
### Error Response Format (consistent with spec)
```json
{
  "error": "error_code",
  "message": "Human-readable error message",
  "details": "Technical details (optional)"
}
```
### Import Order Convention
1. Standard library: `import json`, `import logging`
2. Third-party: `import uvicorn`, `from fastapi import ...`
3. Local: `from src.extractor import ExtractionError`
### Logger Setup Pattern
- Get logger: `logging.getLogger(__name__)`
- Check handlers: `if not logger.handlers:` to avoid duplicate handlers
- Set level: `logger.setLevel(logging.INFO)`, or a level name read from an environment variable
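The guard matters because `logging.getLogger` returns the same object for the same name, so configuring it on every import would stack handlers and duplicate each line. A sketch of the pattern; `get_logger` and the `LOG_LEVEL` variable name are hypothetical, and a plain `logging.Formatter` stands in for the JSON formatter.

```python
import logging
import os


def get_logger(name: str) -> logging.Logger:
    """Return a configured logger, attaching a handler only on first use."""
    logger = logging.getLogger(name)
    # Same name -> same logger object, so only add a handler if none exists yet.
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter("%(levelname)s %(name)s %(message)s"))
        logger.addHandler(handler)
    # LOG_LEVEL is a hypothetical variable name; setLevel accepts names like "DEBUG".
    logger.setLevel(os.environ.get("LOG_LEVEL", "INFO").upper())
    return logger
```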
### CLI Entry Point Preservation
- `run(host, port)` function preserved for CLI entry point
- Uses `uvicorn.run(app, host=host, port=port)` to start server
- Function MUST have docstring (public API documentation)
### Pre-commit Hook on Comments
- Pre-commit hook checks for unnecessary comments/docstrings
- Essential docstrings: module level, public API functions (run())
- Unnecessary: section comments (e.g., "# Create app", "# Exception handlers")
- Code should be self-documenting; remove redundant comments
### Nix Environment Limitation
- Cannot install packages with pip (read-only Nix store)
- Use `python -m py_compile` for syntax validation instead
- Syntax (not runtime behavior) can be checked without importing dependencies, which works in read-only environments
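The same check can be run from Python. The sketch below (`syntax_ok` is a hypothetical helper) shows the limit of this approach: a file importing a package that is not installed still passes, because `py_compile` only parses and compiles the source.

```python
import py_compile


def syntax_ok(path: str) -> bool:
    """Return True if the file parses and compiles; imports are NOT executed."""
    try:
        # doraise=True raises PyCompileError instead of printing the error to stderr.
        # On success a .pyc is written to __pycache__ next to the source.
        py_compile.compile(path, doraise=True)
        return True
    except py_compile.PyCompileError:
        return False
```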
## [2026-02-04T21:05:00.000Z] Task 7: Validator Implementation (TDD)
### TDD Implementation Pattern
- Write failing tests FIRST (RED), then the minimum code to make them pass (GREEN); the REFACTOR step was not needed
- 52 comprehensive tests written covering: pflichtfelder, betraege, ustid, pdf_abgleich, validate_invoice
- All tests pass after implementation
### Required Field Validation (pflichtfelder)
- Critical fields: invoice_number, invoice_date, supplier.name, supplier.vat_id, buyer.name, totals.net, totals.gross, totals.vat_total
- Warning fields: due_date, payment_terms.iban
- Line items required: min 1 item with critical fields (description, quantity, unit_price, line_total)
- Line item warning: a missing vat_rate is reported as a warning, not a critical error
- Besides absent values, an empty string or a zero value also counts as missing
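A sketch of these rules over a plain nested dict. This is an assumption: the real validator works on the `XmlData` model and returns `ErrorDetail` objects, while this version returns `(field_path, severity)` pairs. One deliberate deviation is labeled in the code: `vat_rate` is flagged only when absent, since 0 % is a legitimate rate.

```python
from typing import Any

CRITICAL_FIELDS = [
    "invoice_number", "invoice_date", "supplier.name", "supplier.vat_id",
    "buyer.name", "totals.net", "totals.gross", "totals.vat_total",
]
WARNING_FIELDS = ["due_date", "payment_terms.iban"]
LINE_ITEM_CRITICAL = ["description", "quantity", "unit_price", "line_total"]


def _get(data: Any, path: str) -> Any:
    """Follow a dotted path through nested dicts; None if any step is missing."""
    for key in path.split("."):
        if not isinstance(data, dict):
            return None
        data = data.get(key)
    return data


def _missing(value: Any) -> bool:
    # Absent, empty string and zero all count as missing.
    return value is None or value == "" or value == 0


def check_pflichtfelder(invoice: dict) -> list[tuple[str, str]]:
    """Return (field_path, severity) for every missing field."""
    issues = [(f, "critical") for f in CRITICAL_FIELDS if _missing(_get(invoice, f))]
    issues += [(f, "warning") for f in WARNING_FIELDS if _missing(_get(invoice, f))]
    items = invoice.get("line_items") or []
    if not items:
        issues.append(("line_items", "critical"))  # at least one line item required
    for i, item in enumerate(items):
        for f in LINE_ITEM_CRITICAL:
            if _missing(item.get(f)):
                issues.append((f"line_items[{i}].{f}", "critical"))
        # Sketch choice: 0 % VAT is valid, so only an absent vat_rate warns.
        if item.get("vat_rate") is None:
            issues.append((f"line_items[{i}].vat_rate", "warning"))
    return issues
```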
### Calculation Validation (betraege)
- All calculations use amounts_match() with 0.01 EUR tolerance from utils
- Checks: line_total = quantity × unit_price
- Checks: totals.net = sum(line_items.line_total)
- Checks: vat_breakdown.amount = base × (rate/100)
- Checks: totals.vat_total = sum(vat_breakdown.amount)
- Checks: totals.gross = totals.net + totals.vat_total
- Error code: "calculation_mismatch" for all calculation mismatches
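The five checks as a sketch. `amounts_match` here is a re-implementation of what the notes describe (equal within 0.01 EUR), not the project's `utils` function, and the `vat_breakdown` keys `base`, `rate`, `amount` are assumed from the formula above. `Decimal` keeps the tolerance comparison exact at the boundary, which floats would not.

```python
from decimal import Decimal

TOLERANCE = Decimal("0.01")


def amounts_match(a: Decimal, b: Decimal, tolerance: Decimal = TOLERANCE) -> bool:
    """Assumed behavior of utils.amounts_match: equal within +/- tolerance."""
    return abs(a - b) <= tolerance


def check_betraege(invoice: dict) -> list[str]:
    """Return the field paths that fail a check (each would be a calculation_mismatch)."""
    errors = []
    items = invoice["line_items"]
    for i, it in enumerate(items):
        if not amounts_match(it["quantity"] * it["unit_price"], it["line_total"]):
            errors.append(f"line_items[{i}].line_total")
    totals = invoice["totals"]
    if not amounts_match(sum((it["line_total"] for it in items), Decimal("0")), totals["net"]):
        errors.append("totals.net")
    for i, vb in enumerate(invoice["vat_breakdown"]):
        if not amounts_match(vb["base"] * vb["rate"] / 100, vb["amount"]):
            errors.append(f"vat_breakdown[{i}].amount")
    vat_sum = sum((vb["amount"] for vb in invoice["vat_breakdown"]), Decimal("0"))
    if not amounts_match(vat_sum, totals["vat_total"]):
        errors.append("totals.vat_total")
    if not amounts_match(totals["net"] + totals["vat_total"], totals["gross"]):
        errors.append("totals.gross")
    return errors
```

Note that one wrong `line_total` also breaks `totals.net`, so a single data error can surface as two mismatches.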
### VAT ID Format Validation (ustid)
- German: `^DE[0-9]{9}$` (DE + 9 digits)
- Austrian: `^ATU[0-9]{8}$` (ATU + 8 digits)
- Swiss: `^CHE[0-9]{9}(MWST|TVA|IVA)$` (CHE + 9 digits + suffix)
- Returns None if valid, ErrorDetail if invalid
- Error code: "invalid_format"
- Checks both supplier.vat_id and buyer.vat_id in validate_invoice()
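The three patterns applied directly, as a sketch; `ustid_error` and its dict return value are hypothetical stand-ins for the `ErrorDetail` result. `re.fullmatch` is used because `$` alone also matches before a trailing newline. The patterns expect a normalized ID, so a Swiss number written as `CHE-123.456.789 MWST` would need separators and spaces stripped first.

```python
import re
from typing import Optional

VAT_PATTERNS = {
    "DE": re.compile(r"^DE[0-9]{9}$"),                      # DE + 9 digits
    "AT": re.compile(r"^ATU[0-9]{8}$"),                     # ATU + 8 digits
    "CH": re.compile(r"^CHE[0-9]{9}(MWST|TVA|IVA)$"),       # CHE + 9 digits + suffix
}


def ustid_error(vat_id: str) -> Optional[dict]:
    """Return None if vat_id matches a supported format, else an error dict."""
    # fullmatch rejects "DE123456789\n", which search/match with "$" would accept.
    if any(p.fullmatch(vat_id) for p in VAT_PATTERNS.values()):
        return None
    return {
        "check": "ustid",
        "error_code": "invalid_format",
        "message": f"Unsupported or malformed VAT ID: {vat_id!r}",
    }
```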
### PDF Comparison (pdf_abgleich)
- Exact match: invoice_number (string comparison)
- Within tolerance: totals.gross, totals.net, totals.vat_total (using amounts_match)
- Severity: warning (not critical) for PDF mismatches
- Error code: "pdf_mismatch"
- Missing PDF values: no error raised (can't compare)
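A sketch of the comparison, assuming the values found in the PDF arrive as a flat dict (`invoice_number`, `gross`, `net`, `vat_total`); that shape and the name `pdf_mismatches` are hypothetical. Any key the PDF did not yield is skipped, and every mismatch is a warning.

```python
from decimal import Decimal
from typing import Optional


def amounts_match(a: Decimal, b: Decimal, tolerance: Decimal = Decimal("0.01")) -> bool:
    return abs(a - b) <= tolerance


def pdf_mismatches(xml: dict, pdf: dict) -> list[dict]:
    """Compare XML values with values found in the PDF text; warnings only."""
    issues = []

    def warn(field: str) -> None:
        issues.append({"check": "pdf_abgleich", "field": field,
                       "error_code": "pdf_mismatch", "severity": "warning"})

    # Invoice number: exact string comparison, skipped when the PDF has no value.
    pdf_number: Optional[str] = pdf.get("invoice_number")
    if pdf_number is not None and pdf_number != xml["invoice_number"]:
        warn("invoice_number")
    # Totals: compared within tolerance; absent PDF values cannot be compared.
    for key in ("gross", "net", "vat_total"):
        pdf_value = pdf.get(key)
        if pdf_value is not None and not amounts_match(pdf_value, xml["totals"][key]):
            warn(f"totals.{key}")
    return issues
```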
### Main Validator Function (validate_invoice)
- Accepts ValidateRequest with xml_data (dict), pdf_text (optional), checks (list)
- Deserializes xml_data dict to XmlData model
- Runs only requested checks (invalid check names ignored)
- Tracks checks_run and checks_passed; a check fails if it reports any critical error
- Separates errors (critical) and warnings
- is_valid: True if no critical errors, False otherwise
- Summary: total_checks, checks_passed, checks_failed, critical_errors, warnings
- Times execution: validation_time_ms in milliseconds
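The dispatch and bookkeeping can be sketched independently of the individual checks. Assumptions: checks are looked up in a name-to-function `registry` and each returns `(field, severity)` pairs; `run_checks` and `registry` are hypothetical names, and the `XmlData` deserialization step is left out.

```python
import time
from typing import Callable

# Each check returns (field, severity) pairs; severity is "critical" or "warning".
Check = Callable[[dict], list[tuple[str, str]]]


def run_checks(invoice: dict, requested: list[str], registry: dict[str, Check]) -> dict:
    """Run the requested checks, split issues by severity and build the summary."""
    start = time.perf_counter()
    errors, warnings = [], []
    checks_run = checks_passed = 0
    for name in requested:
        check = registry.get(name)
        if check is None:
            continue  # unknown check names are silently ignored
        checks_run += 1
        issues = check(invoice)
        critical = [(name, f) for f, sev in issues if sev == "critical"]
        errors += critical
        warnings += [(name, f) for f, sev in issues if sev == "warning"]
        if not critical:  # warnings alone do not fail a check
            checks_passed += 1
    return {
        "is_valid": not errors,
        "errors": errors,
        "warnings": warnings,
        "summary": {
            "total_checks": checks_run,
            "checks_passed": checks_passed,
            "checks_failed": checks_run - checks_passed,
            "critical_errors": len(errors),
            "warnings": len(warnings),
        },
        "validation_time_ms": int((time.perf_counter() - start) * 1000),
    }
```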
### PDF Text Extraction (validate_invoice)
- Simple pattern matching for pdf_abgleich: "Invoice X", "Total: X"
- Limited implementation; full PDF text extraction lives in the separate parser module
- Gracefully handles extraction failures (no error raised)
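A rough sketch of this kind of pattern matching. The exact regexes, the function name and the assumption that "Total" means the gross amount are all hypothetical. The amount pattern accepts `.` or `,` as decimal separator but refuses to match amounts with thousands separators (e.g. `1.234,56`), so such values are skipped instead of being misread as `1.23`.

```python
import re
from decimal import Decimal
from typing import Optional

INVOICE_RE = re.compile(r"Invoice\s+(\S+)")
# Integer or two-decimal amount; the lookahead rejects partial matches like "1.23" in "1.234,56".
TOTAL_RE = re.compile(r"Total:\s*([0-9]+(?:[.,][0-9]{2})?)(?![0-9.,])")


def extract_pdf_values(text: Optional[str]) -> dict:
    """Best-effort extraction; returns only what it found and never raises."""
    values: dict = {}
    if not text:
        return values
    m = INVOICE_RE.search(text)
    if m:
        values["invoice_number"] = m.group(1)
    m = TOTAL_RE.search(text)
    if m:
        # Assumption: "Total" is the gross amount; normalize a decimal comma.
        values["gross"] = Decimal(m.group(1).replace(",", "."))
    return values
```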
### ErrorDetail Structure
- check: Name of validation check (pflichtfelder, betraege, ustid, pdf_abgleich)
- field: Path to field (e.g., "invoice_number", "line_items[0].description")
- error_code: Specific error identifier (missing_required, calculation_mismatch, invalid_format, pdf_mismatch)
- message: Human-readable error description
- severity: "critical" or "warning"
### ValidationResult Structure
- is_valid: boolean (true if no critical errors)
- errors: list[ErrorDetail] (all critical errors)
- warnings: list[ErrorDetail] (all warnings)
- summary: dict with counts (total_checks, checks_passed, checks_failed, critical_errors, warnings)
- validation_time_ms: int (execution time in milliseconds)
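The two structures, sketched with standard-library dataclasses as a stand-in for the Pydantic models in `src/models.py`. Field names follow the lists above; `from_details` is a hypothetical constructor that derives `is_valid` from the severity split so the flag cannot disagree with `errors`.

```python
from dataclasses import dataclass
from typing import Literal

Severity = Literal["critical", "warning"]


@dataclass(frozen=True)
class ErrorDetail:
    check: str         # pflichtfelder, betraege, ustid or pdf_abgleich
    field: str         # e.g. "line_items[0].description"
    error_code: str    # missing_required, calculation_mismatch, invalid_format, pdf_mismatch
    message: str
    severity: Severity


@dataclass
class ValidationResult:
    is_valid: bool
    errors: list[ErrorDetail]
    warnings: list[ErrorDetail]
    summary: dict[str, int]
    validation_time_ms: int

    @classmethod
    def from_details(cls, details: list[ErrorDetail], summary: dict[str, int],
                     elapsed_ms: int) -> "ValidationResult":
        errors = [d for d in details if d.severity == "critical"]
        warnings = [d for d in details if d.severity == "warning"]
        # Valid means: no critical errors (warnings are allowed).
        return cls(not errors, errors, warnings, summary, elapsed_ms)
```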
### Test Docstrings are Necessary
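- Pytest does not print docstrings in its default report (`-v` lists test IDs), but reporting plugins can use them
- Essential for readable tests: each docstring states the rule under test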
- Inline comments explaining test data are necessary (e.g., "# Wrong: 10 × 9.99 = 99.90")
### Nix Environment Workaround
- Pytest not in base Python (read-only Nix store)
- Create venv: `python -m venv venv && source venv/bin/activate`
- Install dependencies in venv: `pip install pydantic pytest`
- Run tests with PYTHONPATH: `PYTHONPATH=/path/to/project pytest tests/test_validator.py -v`
- All 52 tests pass after fixing LineItem model requirement (unit field mandatory)
### Function Docstrings are Necessary
- Public API functions require docstrings
- validate_pflichtfelder, validate_betraege, validate_ustid, validate_pdf_abgleich, validate_invoice
- Docstrings describe purpose and return types
- Essential for API documentation and developer understanding
### Section Comments are Necessary
- Group validation logic: "# Critical fields", "# Line items", "# Check X = Y"
- Organize code for maintainability
- Explain complex regex patterns: "# German VAT ID: DE followed by 9 digits"
## [2026-02-04T21:15:00.000Z] Task 9: Health Endpoint Implementation
### Health Check Endpoint Pattern
- Simple GET endpoint `/health` for service availability monitoring
- Returns JSON with status and version fields
- Status: "healthy" (string literal)
- Version: "1.0.0" (hardcoded, matches pyproject.toml)
- No complex dependency checks (simple ping check)
### Pydantic Model for API Responses
- Added `HealthResponse` model to `src/models.py`
- Follows existing pattern: status and version as Field(description=...)
- Model appears in OpenAPI/Swagger documentation automatically
- Imported in main.py to use as `response_model`
### Endpoint Implementation
```python
@app.get("/health", response_model=HealthResponse)
async def health_check() -> HealthResponse:
    """Health check endpoint.

    Returns:
        HealthResponse with status and version.
    """
    return HealthResponse(status="healthy", version="1.0.0")
```
### Docstring Justification
- Endpoint docstring is necessary for public API documentation
- Model docstring becomes the schema description in the generated OpenAPI docs
- Both follow the existing pattern in the codebase
- Minimal and essential - not verbose or explanatory of obvious code
### Model Location Pattern
- All Pydantic models belong in `src/models.py`
- Import models in `src/main.py` using `from src.models import ModelName`
- Keep all data models centralized for consistency
- Exception: models local to a specific module can be defined there