+ validation schema
256
AGENTS.md
Normal file
@@ -0,0 +1,256 @@
# AGENTS.md - Agent Development Guide

This document provides context and guidelines for coding agents working on the zugferd-service repository.

## Project Overview

ZUGFeRD-Service is a REST API for extracting and validating ZUGFeRD/Factur-X invoice data from PDF files. It is built with FastAPI and Python 3.11+.

**Tech Stack:**
- FastAPI >= 0.109.0 (web framework)
- Uvicorn >= 0.27.0 (ASGI server)
- Pydantic >= 2.5.0 (data validation)
- factur-x >= 2.5 (ZUGFeRD/Factur-X library)
- pypdf >= 4.0.0 (PDF text extraction)
- lxml >= 5.0.0 (XML processing)

## Commands

### Development
```bash
# Install dependencies
pip install -e .

# Run the service (default: 0.0.0.0:5000)
python -m src.main
zugferd-service  # alternatively, via the installed entry point

# With environment variables
HOST=127.0.0.1 PORT=8000 LOG_LEVEL=DEBUG python -m src.main
```

### Testing
```bash
# Run all tests
pytest

# Run specific test file
pytest tests/test_extractor.py

# Run specific test function
pytest tests/test_api.py::test_health_check

# Run with coverage (requires the pytest-cov plugin)
pytest --cov=src

# Run with verbose output
pytest -v
```

### Building
```bash
# Docker build
docker build -t zugferd-service .

# Nix build
nix build .#zugferd-service

# Nix development shell
nix develop
```

## Code Style Guidelines

### Type Hints (Python 3.11+)
Use modern union syntax (`|`) instead of `Optional` or `Union`:
```python
# Good
field: str | None
numbers: list[int] | None

# Avoid
from typing import Optional, Union
field: Optional[str]
numbers: Union[list[int], None]
```

All public functions must have type hints:
```python
def extract_zugferd(pdf_bytes: bytes) -> ExtractResponse:
    """Extract ZUGFeRD data from PDF bytes.

    Args:
        pdf_bytes: Raw PDF file content

    Returns:
        ExtractResponse with extraction results
    """
```

### Imports
- Group imports: standard library, third-party, local modules
- Use `from typing import Any` only when needed
- Avoid star imports (`from module import *`)

```python
# Standard library
import io
import time
from typing import Any

# Third-party
from fastapi import FastAPI
from lxml import etree
from pydantic import BaseModel

# Local modules
from src.models import ExtractResponse
from src.utils import amounts_match
```

### Naming Conventions
- **Classes**: `PascalCase` (e.g., `ExtractionMeta`, `ValidateRequest`)
- **Functions/variables**: `snake_case` (e.g., `extract_text_from_pdf`, `pdf_bytes`)
- **Constants**: `SCREAMING_SNAKE_CASE` (e.g., `NAMESPACES`, `UNECE_UNIT_CODES`)
- **Private**: `_leading_underscore` (e.g., `_parse_internal`)

### Pydantic Models
All models are defined in `src/models.py` using Pydantic v2:
```python
from pydantic import BaseModel, Field

class Supplier(BaseModel):
    """Supplier/seller information."""

    name: str = Field(description="Supplier name")
    vat_id: str | None = Field(default=None, description="VAT ID")
```
- Use `Field()` with a description for every field
- Set `default=None` explicitly for optional fields (in Pydantic v2, a `| None` type hint alone does not make a field optional)
- Use `default_factory=list` for mutable defaults

### Error Handling
**Custom exceptions** for domain-specific errors:
```python
class ExtractionError(Exception):
    """Error during PDF extraction."""

    def __init__(self, error_code: str, message: str, details: str = ""):
        self.error_code = error_code
        self.message = message
        self.details = details
        super().__init__(message)
```

**FastAPI exception handlers** defined in `src/main.py`:
- `ExtractionError` → 400 with error code/message
- `HTTPException` → preserves status_code
- Generic `Exception` → 500 internal error

**Raise HTTPException for validation errors:**
```python
from fastapi import HTTPException

raise HTTPException(
    status_code=400,
    detail={"error": "invalid_base64", "message": "Invalid base64 encoding"},
)
```
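
One way the `invalid_base64` case above might be detected before raising is sketched below, using only the standard library. The helper name `decode_pdf_base64` and the use of `ValueError` are assumptions for illustration and are not taken from the repository; in the service, the failure would presumably be translated into the `HTTPException` shown above.

```python
import base64
import binascii


def decode_pdf_base64(payload: str) -> bytes:
    """Decode a base64 string strictly (hypothetical helper, not from the repo).

    Args:
        payload: Base64-encoded PDF content

    Returns:
        The decoded bytes

    Raises:
        ValueError: If the payload is not valid base64
    """
    try:
        # validate=True rejects characters outside the base64 alphabet
        # instead of silently discarding them.
        return base64.b64decode(payload, validate=True)
    except binascii.Error as exc:
        raise ValueError("Invalid base64 encoding") from exc
```

Strict validation matters here because the default `b64decode` behaviour ignores stray characters, which could let a corrupted upload through to the PDF parser.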

### Docstrings
Use Google-style docstrings:
```python
from lxml import etree

def parse_supplier(xml_root: etree._Element) -> Supplier:
    """Parse supplier information from XML.

    Args:
        xml_root: XML root element

    Returns:
        Supplier model with parsed data
    """
```

### XML Parsing
Use `lxml.etree` with namespace-aware XPath:
```python
from lxml import etree

NAMESPACES = {
    "rsm": "urn:un:unece:uncefact:data:standard:CrossIndustryInvoice:100",
    "ram": "urn:un:unece:uncefact:data:standard:ReusableAggregateBusinessInformationEntity:100",
}

# Use namespaces in all XPath queries; xpath() returns a list of matches
names = xml_root.xpath(
    "//ram:ApplicableHeaderTradeAgreement/ram:SellerTradeParty/ram:Name/text()",
    namespaces=NAMESPACES,
)
```

### Logging
Structured JSON logging via custom `JSONFormatter` in `src/main.py`:
```python
import logging

logger = logging.getLogger(__name__)
logger.info("Extraction completed", extra={"data": {"profile": "EN16931"}})
```
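
The `JSONFormatter` in `src/main.py` is not reproduced in this guide. As a rough sketch of how such a formatter might pick up the `extra={"data": ...}` payload from the call above, consider the following. The output field names (`timestamp`, `level`, `logger`, `message`, `data`) and the logger name are assumptions, not the service's actual schema.

```python
import json
import logging
from datetime import datetime, timezone


class JSONFormatter(logging.Formatter):
    """Format log records as single-line JSON (illustrative sketch)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Keys passed via `extra=` become attributes on the record.
        data = getattr(record, "data", None)
        if data is not None:
            payload["data"] = data
        # default=str keeps non-JSON types such as Decimal serializable.
        return json.dumps(payload, default=str)


handler = logging.StreamHandler()
handler.setFormatter(JSONFormatter())
logger = logging.getLogger("zugferd.example")  # hypothetical logger name
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("Extraction completed", extra={"data": {"profile": "EN16931"}})
```

Nesting the structured fields under a single `data` key avoids collisions with the built-in `LogRecord` attributes, which `extra=` is not allowed to overwrite.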

### Testing
- Use `pytest` with `pytest-asyncio` for async tests
- Use `TestClient` from `fastapi.testclient` for API tests
- Define fixtures in `tests/conftest.py`
- Store test PDFs in `tests/fixtures/`

```python
import pytest
from fastapi.testclient import TestClient
from src.main import app

@pytest.fixture
def client():
    return TestClient(app)

def test_health_check(client):
    response = client.get("/health")
    assert response.status_code == 200
    assert response.json()["status"] == "healthy"
```

### File Structure
```
src/
├── __init__.py
├── main.py           # FastAPI app, endpoints, exception handlers
├── models.py         # Pydantic models (all requests/responses)
├── extractor.py      # ZUGFeRD XML extraction logic
├── validator.py      # Invoice validation logic
├── pdf_parser.py     # PDF text extraction
└── utils.py          # Utility functions (constants, helpers)

tests/
├── conftest.py       # Pytest fixtures
├── test_api.py       # API endpoint tests
├── test_extractor.py # Extraction logic tests
├── test_validator.py # Validation logic tests
└── fixtures/         # Test PDF files
```

## Validation Checks

Four validation checks are supported:
1. **pflichtfelder** - Required fields present and non-empty
2. **betraege** - Amount calculations correct (tolerance: 0.01)
3. **ustid** - VAT ID format (DE, AT, CH)
4. **pdf_abgleich** - XML vs PDF text comparison
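
To make the **betraege** and **ustid** checks more concrete, here is a minimal sketch of a tolerance comparison and a VAT ID format check. The signature of `amounts_match`, the inclusive tolerance, the helper `is_valid_vat_id`, and the regular expressions are all assumptions based on the commonly documented national layouts; the actual rules live in `src/validator.py` and `src/utils.py` and may differ.

```python
import re
from decimal import Decimal

# Tolerance for amount comparisons, taken from the "betraege" check above.
AMOUNT_TOLERANCE = Decimal("0.01")

# Assumed patterns (common national layouts), not the service's actual rules.
VAT_ID_PATTERNS = {
    "DE": re.compile(r"DE\d{9}"),
    "AT": re.compile(r"ATU\d{8}"),
    "CH": re.compile(r"CHE-?\d{3}\.?\d{3}\.?\d{3}( ?(MWST|TVA|IVA))?"),
}


def amounts_match(
    expected: Decimal, actual: Decimal, tolerance: Decimal = AMOUNT_TOLERANCE
) -> bool:
    """Return True if two amounts differ by at most the tolerance."""
    return abs(expected - actual) <= tolerance


def is_valid_vat_id(vat_id: str) -> bool:
    """Check a VAT ID against the assumed pattern for its country prefix."""
    normalized = vat_id.strip().upper()
    pattern = VAT_ID_PATTERNS.get(normalized[:2])
    # Unknown prefixes are treated as invalid in this sketch.
    return pattern is not None and pattern.fullmatch(normalized) is not None
```

For example, `amounts_match(Decimal("119.00"), Decimal("119.01"))` passes under the inclusive tolerance, while a difference of `0.02` does not. Using `Decimal` rather than `float` keeps the comparison exact at the cent level.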

## Environment Variables
- `HOST` (default: `0.0.0.0`)
- `PORT` (default: `5000`)
- `LOG_LEVEL` (default: `INFO`)

## Additional Notes
- Python 3.11+ required
- No type suppression (`# type: ignore`) allowed
- File size limit: 10MB for PDF uploads
- Returns warnings (not errors) for non-critical issues
- Uses Decimal rounding with ROUND_HALF_UP for monetary values