AGENTS.md - Agent Development Guide
This document provides context and guidelines for coding agents working on the zugferd-service repository.
Project Overview
ZUGFeRD-Service is a REST API for extracting and validating ZUGFeRD/Factur-X invoice data from PDF files. It is built with FastAPI and requires Python 3.11+.
Tech Stack:
- FastAPI >= 0.109.0 (web framework)
- Uvicorn >= 0.27.0 (ASGI server)
- Pydantic >= 2.5.0 (data validation)
- factur-x >= 2.5 (ZUGFeRD/Factur-X library)
- pypdf >= 4.0.0 (PDF text extraction)
- lxml >= 5.0.0 (XML processing)
Commands
Development
# Install dependencies
pip install -e .
# Run the service (default: 0.0.0.0:5000)
python -m src.main
zugferd-service # entry point
# With environment variables
HOST=127.0.0.1 PORT=8000 LOG_LEVEL=DEBUG python -m src.main
Testing
# Run all tests
pytest
# Run specific test file
pytest tests/test_extractor.py
# Run specific test function
pytest tests/test_api.py::test_health_check
# Run with coverage
pytest --cov=src
# Run with verbose output
pytest -v
Building
# Docker build
docker build -t zugferd-service .
# Nix build
nix build .#zugferd-service
# Nix development shell
nix develop
Code Style Guidelines
Type Hints (Python 3.11+)
Use modern union syntax (|) instead of Optional or Union:
# Good
field: str | None
numbers: list[int] | None
# Avoid
from typing import Optional, Union
field: Optional[str]
numbers: Union[list[int], None]
All public functions must have type hints:
def extract_zugferd(pdf_bytes: bytes) -> ExtractResponse:
    """Extract ZUGFeRD data from PDF bytes.

    Args:
        pdf_bytes: Raw PDF file content

    Returns:
        ExtractResponse with extraction results
    """
Imports
- Group imports: standard library, third-party, local modules
- Use from typing import Any only when needed
- Avoid star imports (from module import *)
# Standard library
import io
import time
from typing import Any
# Third-party
from fastapi import FastAPI
from lxml import etree
from pydantic import BaseModel
# Local modules
from src.models import ExtractResponse
from src.utils import amounts_match
Naming Conventions
- Classes: PascalCase (e.g., ExtractionMeta, ValidateRequest)
- Functions/variables: snake_case (e.g., extract_text_from_pdf, pdf_bytes)
- Constants: SCREAMING_SNAKE_CASE (e.g., NAMESPACES, UNECE_UNIT_CODES)
- Private: _leading_underscore (e.g., _parse_internal)
Pydantic Models
All models defined in src/models.py using Pydantic v2:
from pydantic import BaseModel, Field
class Supplier(BaseModel):
    """Supplier/seller information."""

    name: str = Field(description="Supplier name")
    vat_id: str | None = Field(default=None, description="VAT ID")
- Use Field() with a description for every field
- Use default=None for optional fields; in Pydantic v2, None in the type hint alone does not make a field optional
- Use default_factory=list for mutable defaults
Error Handling
Custom exceptions for domain-specific errors:
class ExtractionError(Exception):
    """Error during PDF extraction."""

    def __init__(self, error_code: str, message: str, details: str = ""):
        self.error_code = error_code
        self.message = message
        self.details = details
        super().__init__(message)
FastAPI exception handlers defined in src/main.py:
- ExtractionError → 400 with error code/message
- HTTPException → preserves status_code
- Generic Exception → 500 internal error
Raise HTTPException for validation errors:
from fastapi import HTTPException
raise HTTPException(
    status_code=400,
    detail={"error": "invalid_base64", "message": "Invalid base64 encoding"},
)
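The check that precedes such a raise might look like the sketch below. It uses only the standard library so that it runs without FastAPI installed; PayloadError and decode_pdf_payload are hypothetical names that do not come from the repository, and an endpoint would translate the error into the HTTPException shown above.

```python
import base64
import binascii


class PayloadError(ValueError):
    """Hypothetical carrier for the fields of the HTTPException detail dict."""

    def __init__(self, error: str, message: str) -> None:
        self.error = error
        self.message = message
        super().__init__(message)

    def as_detail(self) -> dict[str, str]:
        """Return the payload in the same shape as the detail dict above."""
        return {"error": self.error, "message": self.message}


def decode_pdf_payload(pdf_base64: str) -> bytes:
    """Decode a base64 string strictly, rejecting malformed input."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        # instead of silently discarding them.
        return base64.b64decode(pdf_base64, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise PayloadError("invalid_base64", "Invalid base64 encoding") from exc
```

In an endpoint, the caller could catch PayloadError and raise HTTPException(status_code=400, detail=exc.as_detail()).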
Docstrings
Use Google-style docstrings:
def parse_supplier(xml_root: etree._Element) -> Supplier:
    """Parse supplier information from XML.

    Args:
        xml_root: XML root element

    Returns:
        Supplier model with parsed data
    """
XML Parsing
Use lxml.etree with namespace-aware XPath:
from lxml import etree
NAMESPACES = {
    "rsm": "urn:un:unece:uncefact:data:standard:CrossIndustryInvoice:100",
    "ram": "urn:un:unece:uncefact:data:standard:ReusableAggregateBusinessInformationEntity:100",
}
# Use namespaces in all XPath queries
name = xml_root.xpath(
    "//ram:ApplicableHeaderTradeAgreement/ram:SellerTradeParty/ram:Name/text()",
    namespaces=NAMESPACES,
)
Logging
Structured JSON logging via custom JSONFormatter in src/main.py:
logger = logging.getLogger(__name__)
logger.info("Extraction completed", extra={"data": {"profile": "EN16931"}})
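The formatter itself is not reproduced in this guide. As a rough illustration of how a value passed via extra={"data": ...} could end up in the output, here is a minimal sketch; the JSON key names (level, logger, message, data) are assumptions, and the actual JSONFormatter in src/main.py may differ.

```python
import json
import logging


class JSONFormatter(logging.Formatter):
    """Render each log record as one JSON object (illustrative sketch only)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Keys passed via extra={...} become attributes on the record,
        # so extra={"data": {...}} is available as record.data.
        data = getattr(record, "data", None)
        if data is not None:
            payload["data"] = data
        return json.dumps(payload, ensure_ascii=False)


def format_example() -> str:
    """Format a single record without touching the global logging config."""
    record = logging.makeLogRecord(
        {
            "name": "src.extractor",
            "levelname": "INFO",
            "msg": "Extraction completed",
            "data": {"profile": "EN16931"},
        }
    )
    return JSONFormatter().format(record)
```

To use a formatter like this, attach it to a handler with handler.setFormatter(JSONFormatter()) before adding the handler to the logger.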
Testing
- Use pytest with pytest-asyncio for async tests
- Use TestClient from fastapi.testclient for API tests
- Define fixtures in tests/conftest.py
- Test PDFs in tests/fixtures/
import pytest
from fastapi.testclient import TestClient
from src.main import app
@pytest.fixture
def client():
    return TestClient(app)

def test_health_check(client):
    response = client.get("/health")
    assert response.status_code == 200
    assert response.json()["status"] == "healthy"
File Structure
src/
├── __init__.py
├── main.py # FastAPI app, endpoints, exception handlers
├── models.py # Pydantic models (all requests/responses)
├── extractor.py # ZUGFeRD XML extraction logic
├── validator.py # Invoice validation logic
├── pdf_parser.py # PDF text extraction
└── utils.py # Utility functions (constants, helpers)
tests/
├── conftest.py # Pytest fixtures
├── test_api.py # API endpoint tests
├── test_extractor.py # Extraction logic tests
├── test_validator.py # Validation logic tests
└── fixtures/ # Test PDF files
Validation Checks
Four validation checks are supported:
- pflichtfelder - Required fields present and non-empty
- betraege - Amount calculations correct (tolerance: 0.01)
- ustid - VAT ID format (DE, AT, CH)
- pdf_abgleich - XML vs PDF text comparison
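The tolerance used by the betraege check can be illustrated with a small comparison helper. The name amounts_match appears in the import example earlier in this guide, but its real signature is not shown, so the signature below, the inclusive comparison, and the totals_consistent helper are all assumptions.

```python
from decimal import Decimal

# Tolerance stated for the betraege check.
TOLERANCE = Decimal("0.01")


def amounts_match(expected: Decimal, actual: Decimal, tolerance: Decimal = TOLERANCE) -> bool:
    """Return True when two amounts differ by at most the tolerance.

    Assumption: the comparison is inclusive (a difference of exactly 0.01 passes).
    """
    return abs(expected - actual) <= tolerance


def totals_consistent(net: Decimal, tax: Decimal, gross: Decimal) -> bool:
    """Hypothetical example check: net + tax should equal the stated gross."""
    return amounts_match(net + tax, gross)
```

Building the amounts from strings, as in Decimal("119.00"), avoids the binary floating-point error that would otherwise blur a 0.01 threshold.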
Environment Variables
- HOST (default: 0.0.0.0)
- PORT (default: 5000)
- LOG_LEVEL (default: INFO)
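One way to read these variables with their documented defaults is sketched below. The Settings dataclass and load_settings function are hypothetical names used for illustration; how src/main.py actually reads its configuration is not shown in this guide.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the three documented variables."""

    host: str
    port: int
    log_level: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read HOST, PORT and LOG_LEVEL, falling back to the documented defaults."""
    return Settings(
        host=env.get("HOST", "0.0.0.0"),
        # Environment values are strings, so PORT is converted explicitly.
        port=int(env.get("PORT", "5000")),
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```

Passing a plain dict instead of os.environ keeps the function easy to test, for example load_settings({"PORT": "8000"}).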
Additional Notes
- Python 3.11+ required
- No type suppression (# type: ignore) allowed
- File size limit: 10MB for PDF uploads
- Returns warnings (not errors) for non-critical issues
- Uses Decimal rounding with ROUND_HALF_UP for monetary values
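The rounding rule in the last note can be sketched as follows. The helper name round_money and the two-decimal precision are assumptions made for illustration; only the use of Decimal with ROUND_HALF_UP comes from this guide.

```python
from decimal import ROUND_HALF_UP, Decimal

# Assumption: monetary values are rounded to two decimal places.
CENT = Decimal("0.01")


def round_money(value: Decimal) -> Decimal:
    """Round to two decimal places, with exact ties rounded away from zero."""
    return value.quantize(CENT, rounding=ROUND_HALF_UP)


# ROUND_HALF_UP differs from the default context rounding (ROUND_HALF_EVEN)
# only on exact ties such as 0.125.
half_up = round_money(Decimal("0.125"))
half_even = Decimal("0.125").quantize(CENT)
```

Here half_up is Decimal("0.13") while half_even is Decimal("0.12"), which is why the rounding mode is passed explicitly rather than left to the context default.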