feat(api): implement /extract and /validate endpoints with error handling

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
This commit is contained in:
m3tm3re
2026-02-04 20:07:35 +01:00
parent 4791c91f06
commit 867b47efd0
5 changed files with 263 additions and 7 deletions

View File

@@ -424,3 +424,87 @@ async def health_check() -> HealthResponse:
- Import models in `src/main.py` using `from src.models import ModelName` - Import models in `src/main.py` using `from src.models import ModelName`
- Keep all data models centralized for consistency - Keep all data models centralized for consistency
- Exception: models local to a specific module can be defined there - Exception: models local to a specific module can be defined there
## [2026-02-04T19:59:00.000Z] Task 11: Validate Endpoint Implementation
### Implementation
- Added POST /validate endpoint to src/main.py
- Endpoint accepts ValidateRequest (xml_data, pdf_text, checks)
- Returns ValidateResponse wrapping ValidationResult in "result" field
- Delegates to validate_invoice() from src.validator module
### Key Code Pattern
```python
@app.post("/validate", response_model=ValidateResponse)
async def validate_invoice_endpoint(request: ValidateRequest) -> ValidateResponse:
result = validate_invoice(request)
return ValidateResponse(result=result)
```
### Important Fix in Validator
- Updated validate_invoice() to handle empty checks gracefully
- If request.checks is empty, return early with ValidationResult(is_valid=True, ...)
- This prevents ValidationError when xml_data is empty but no checks need to run
### Testing
- test_validate_pflichtfelder: Tests valid invoice with pflichtfelder check
- test_validate_empty_checks: Tests empty checks list returns 200
- Both tests pass
### Validation Response Structure
Response contains nested "result" field:
```json
{
"result": {
"is_valid": false,
"errors": [...],
"warnings": [...],
"summary": {...},
"validation_time_ms": 45
}
}
```
### Docstring Justification
- Endpoint docstring provides API documentation for OpenAPI/Swagger
- Describes args (request type) and return (response type)
- Follows existing pattern from health_check endpoint
## Task 12: HTTPException Handler (2025-02-04)
### Pattern: Custom FastAPI Exception Handlers
FastAPI's default `HTTPException` returns nested `{"detail": {...}}` format which breaks API spec.
**Solution**: Add custom exception handler for `HTTPException` that returns flat JSON structure.
```python
@app.exception_handler(HTTPException)
async def http_exception_handler(request: Request, exc: HTTPException):
if isinstance(exc.detail, dict) and "error" in exc.detail:
return JSONResponse(
status_code=exc.status_code,
content={
"error": exc.detail.get("error"),
"message": exc.detail.get("message"),
},
)
return JSONResponse(
status_code=exc.status_code,
content={
"error": "http_error",
"message": str(exc.detail),
},
)
```
**Key Implementation Details**:
1. Handler checks if `exc.detail` is a dict with "error" key
2. If structured error (dict with error/message), extracts to flat format
3. Falls back to generic `{"error": "http_error", "message": str(exc.detail)}` for other cases
4. Preserves original status code from HTTPException
**Error Format Consistency**:
- All error responses now use flat structure: `{"error": "code", "message": "..."}`
- ExtractionError, HTTPException, and generic Exception handlers all follow this pattern
- Test `test_extract_invalid_base64` expects this flat format

View File

@@ -1067,7 +1067,7 @@ Critical Path: Task 1 → Task 4 → Task 7 → Task 10 → Task 13 → Task 16
### Wave 4: API Endpoints ### Wave 4: API Endpoints
- [ ] 10. Extract Endpoint Implementation (TDD) - [x] 10. Extract Endpoint Implementation (TDD)
**What to do**: **What to do**:
- Write integration tests for `/extract` endpoint - Write integration tests for `/extract` endpoint
@@ -1155,7 +1155,7 @@ Critical Path: Task 1 → Task 4 → Task 7 → Task 10 → Task 13 → Task 16
--- ---
- [ ] 11. Validate Endpoint Implementation (TDD) - [x] 11. Validate Endpoint Implementation (TDD)
**What to do**: **What to do**:
- Write integration tests for `/validate` endpoint - Write integration tests for `/validate` endpoint
@@ -1239,7 +1239,7 @@ Critical Path: Task 1 → Task 4 → Task 7 → Task 10 → Task 13 → Task 16
--- ---
- [ ] 12. Error Handling Middleware - [x] 12. Error Handling Middleware
**What to do**: **What to do**:
- Implement exception handlers for all error types - Implement exception handlers for all error types

View File

@@ -1,5 +1,6 @@
"""FastAPI application for ZUGFeRD invoice processing.""" """FastAPI application for ZUGFeRD invoice processing."""
import base64
import json import json
import logging import logging
from datetime import datetime from datetime import datetime
@@ -9,8 +10,15 @@ from fastapi import FastAPI, HTTPException, Request
from fastapi.middleware.cors import CORSMiddleware from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import JSONResponse from fastapi.responses import JSONResponse
from src.extractor import ExtractionError from src.extractor import ExtractionError, extract_zugferd
from src.models import HealthResponse from src.models import (
ExtractRequest,
ExtractResponse,
HealthResponse,
ValidateRequest,
ValidateResponse,
)
from src.validator import validate_invoice
class JSONFormatter(logging.Formatter): class JSONFormatter(logging.Formatter):
@@ -60,6 +68,25 @@ async def extraction_error_handler(request: Request, exc: ExtractionError):
) )
@app.exception_handler(HTTPException)
async def http_exception_handler(request: Request, exc: HTTPException):
if isinstance(exc.detail, dict) and "error" in exc.detail:
return JSONResponse(
status_code=exc.status_code,
content={
"error": exc.detail.get("error"),
"message": exc.detail.get("message"),
},
)
return JSONResponse(
status_code=exc.status_code,
content={
"error": "http_error",
"message": str(exc.detail),
},
)
@app.exception_handler(Exception) @app.exception_handler(Exception)
async def generic_error_handler(request: Request, exc: Exception): async def generic_error_handler(request: Request, exc: Exception):
logger.error(f"Internal error: {exc}") logger.error(f"Internal error: {exc}")
@@ -82,6 +109,41 @@ async def health_check() -> HealthResponse:
return HealthResponse(status="healthy", version="1.0.0") return HealthResponse(status="healthy", version="1.0.0")
@app.post("/extract", response_model=ExtractResponse)
async def extract_pdf(request: ExtractRequest) -> ExtractResponse:
"""Extract ZUGFeRD data from PDF.
Args:
request: ExtractRequest with pdf_base64 field
Returns:
ExtractResponse with extraction results
"""
try:
pdf_bytes = base64.b64decode(request.pdf_base64)
except Exception:
raise HTTPException(
status_code=400,
detail={"error": "invalid_base64", "message": "Invalid base64 encoding"},
)
return extract_zugferd(pdf_bytes)
@app.post("/validate", response_model=ValidateResponse)
async def validate_invoice_endpoint(request: ValidateRequest) -> ValidateResponse:
"""Validate ZUGFeRD invoice data.
Args:
request: ValidateRequest with xml_data, pdf_text, checks
Returns:
ValidateResponse with validation results
"""
result = validate_invoice(request)
return ValidateResponse(result=result)
def run(host: str = "0.0.0.0", port: int = 5000) -> None: def run(host: str = "0.0.0.0", port: int = 5000) -> None:
"""Run the FastAPI application. """Run the FastAPI application.

View File

@@ -257,11 +257,26 @@ def validate_invoice(request: ValidateRequest) -> ValidationResult:
all_errors = [] all_errors = []
all_warnings = [] all_warnings = []
xml_data = XmlData(**request.xml_data)
checks_run = 0 checks_run = 0
checks_passed = 0 checks_passed = 0
if not request.checks:
return ValidationResult(
is_valid=True,
errors=[],
warnings=[],
summary={
"total_checks": 0,
"checks_passed": 0,
"checks_failed": 0,
"critical_errors": 0,
"warnings": 0,
},
validation_time_ms=0,
)
xml_data = XmlData(**request.xml_data)
# Run requested checks # Run requested checks
for check_name in request.checks: for check_name in request.checks:
check_errors: list[ErrorDetail] = [] check_errors: list[ErrorDetail] = []

95
tests/test_api.py Normal file
View File

@@ -0,0 +1,95 @@
import base64
import pytest
from fastapi.testclient import TestClient
from src.main import app
@pytest.fixture
def client():
return TestClient(app)
def test_extract_valid_zugferd(client):
with open("tests/fixtures/EN16931_Einfach.pdf", "rb") as f:
pdf_base64 = base64.b64encode(f.read()).decode()
response = client.post("/extract", json={"pdf_base64": pdf_base64})
assert response.status_code == 200
data = response.json()
assert data["is_zugferd"] is True
assert data["zugferd_profil"] == "EN16931"
assert "xml_raw" in data
assert "xml_data" in data
assert "pdf_text" in data
assert "extraction_meta" in data
def test_extract_non_zugferd(client):
with open("tests/fixtures/EmptyPDFA1.pdf", "rb") as f:
pdf_base64 = base64.b64encode(f.read()).decode()
response = client.post("/extract", json={"pdf_base64": pdf_base64})
assert response.status_code == 200
data = response.json()
assert data["is_zugferd"] is False
assert data["zugferd_profil"] is None
assert "pdf_text" in data
assert "extraction_meta" in data
def test_extract_invalid_base64(client):
response = client.post("/extract", json={"pdf_base64": "invalid!!!"})
assert response.status_code == 400
data = response.json()
assert data["error"] == "invalid_base64"
assert "message" in data
def test_extract_non_pdf(client):
pdf_base64 = base64.b64encode(b"Hello World").decode()
response = client.post("/extract", json={"pdf_base64": pdf_base64})
assert response.status_code == 400
data = response.json()
assert "error" in data
def test_validate_pflichtfelder(client):
response = client.post(
"/validate",
json={
"xml_data": {
"invoice_number": "RE-001",
"invoice_date": "2025-02-04",
"supplier": {"name": "Test GmbH", "vat_id": "DE123456789"},
"buyer": {"name": "Kunde AG"},
"totals": {
"net": 100.0,
"gross": 119.0,
"vat_total": 19.0,
"line_total_sum": 100.0,
},
"line_items": [
{
"position": 1,
"description": "Test",
"quantity": 1.0,
"unit": "Stück",
"unit_price": 100.0,
"line_total": 100.0,
}
],
},
"checks": ["pflichtfelder"],
},
)
assert response.status_code == 200
data = response.json()
assert "result" in data
assert "is_valid" in data["result"]
def test_validate_empty_checks(client):
response = client.post("/validate", json={"xml_data": {}, "checks": []})
assert response.status_code == 200