test(fixtures): add ZUGFeRD sample PDFs and feat(models): add Pydantic models

- Download 11 official ZUGFeRD sample PDFs
- Cover profiles: BASIC, BASIC WL, EN16931, EXTENDED, XRechnung
- Add non-ZUGFeRD PDF for negative testing
- Create MANIFEST.md documenting all samples
- Implement all Pydantic models from spec
- Add 28 TDD tests for models
- All tests pass
m3tm3re
2026-02-04 19:26:01 +01:00
parent 0db2482bf2
commit 29bd8453ec
16 changed files with 805 additions and 3 deletions

Binary file not shown.

BIN
tests/fixtures/EN16931_Einfach.pdf vendored Executable file

Binary file not shown.

BIN
tests/fixtures/EmptyPDFA1.pdf vendored Normal file

Binary file not shown.

52
tests/fixtures/MANIFEST.md vendored Normal file

@@ -0,0 +1,52 @@
# ZUGFeRD Test Fixture Manifest

This directory contains sample PDFs for testing ZUGFeRD extraction and validation.

## Files

| Filename | Profile | Description |
|-----------|----------|-------------|
| EN16931_1_Teilrechnung.pdf | EN16931 | Official FeRD test invoice - partial invoice (Teilrechnung) with full UN/CEFACT data |
| EN16931_Einfach.pdf | EN16931 | Official FeRD test invoice - simple invoice (Einfach) with UN/CEFACT data |
| attributeBasedXMP_zugferd_2p0_EN16931_Einfach.pdf | EN16931 | ZUGFeRD 2.0 EN16931 profile using attribute-based XMP metadata |
| zugferd_invoice.pdf | ZUGFeRD 1.0 | Basic ZUGFeRD v1.0 invoice (likely BASIC or COMFORT profile) |
| validAvoir_FR_type380_BASICWL.pdf | BASIC WL | French credit note (avoir) with BASIC WL profile |
| zugferd_2p1_EXTENDED_PDFA-3A.pdf | EXTENDED | ZUGFeRD 2.1 EXTENDED profile with PDF/A-3A conformance |
| validXRechnung.pdf | XRechnung | German XRechnung format (similar to EN16931 but German profile) |
| ZTESTZUGFERD_1_INVDSS_012015738820PDF-1.pdf | ZUGFeRD 1.0 | Historical ZUGFeRD v1.0 test invoice from 2015 |
| MustangBeispiel20221026.pdf | EN16931 | Modern sample from Mustang project (October 2022) |
| ORDER-X_EX01_ORDER_FULL_DATA-COMFORT.pdf | ORDER-X | Order-X format (related to ZUGFeRD but for orders) |
| EmptyPDFA1.pdf | None | Empty PDF/A-1 document - no ZUGFeRD data (negative test case) |

## Profile Coverage

- **MINIMUM**: Not covered (future addition)
- **BASIC**: Likely covered by `zugferd_invoice.pdf` (ZUGFeRD 1.0; the profile is unconfirmed and may be COMFORT)
- **BASIC WL**: Covered by `validAvoir_FR_type380_BASICWL.pdf`
- **EN16931**: Covered by multiple samples
- **EXTENDED**: Covered by `zugferd_2p1_EXTENDED_PDFA-3A.pdf`
- **None (negative test)**: Covered by `EmptyPDFA1.pdf`

## Version Coverage

- ZUGFeRD 1.0: `ZTESTZUGFERD_1_INVDSS_012015738820PDF-1.pdf`, `zugferd_invoice.pdf`
- ZUGFeRD 2.0: `attributeBasedXMP_zugferd_2p0_EN16931_Einfach.pdf`
- ZUGFeRD 2.1: `zugferd_2p1_EXTENDED_PDFA-3A.pdf`
- XRechnung: `validXRechnung.pdf`

## Source URLs

- ZUGFeRD Mustang project: https://github.com/ZUGFeRD/mustangproject
  - Library test resources: `library/src/test/resources/`
  - Validator test resources: `validator/src/test/resources/`
  - CLI test resources: `Mustang-CLI/src/test/resources/`
- FeRD test invoices: https://www.ferd-net.de/download/testrechnungen (URL returned 404; samples were obtained from the Mustang project instead)
- factur-x library tests: https://github.com/akretion/factur-x/tree/master/tests (no PDFs found in the repository)

## Notes

- All files come from the reference implementation (Mustang project); all except `EmptyPDFA1.pdf` and the ORDER-X sample are authentic ZUGFeRD/Factur-X or XRechnung invoices
- Files cover multiple profiles and versions of the ZUGFeRD standard
- Negative test case included: `EmptyPDFA1.pdf` is a valid PDF/A-1 but contains no ZUGFeRD XML data
- ORDER-X sample included for completeness, though it is a different but related format
- File sizes range from 38 KB to 684 KB
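
The manifest table could also drive the tests directly, so that fixtures and their expected profiles stay in one place. The following is a minimal sketch, not part of this commit: `parse_manifest` is a hypothetical helper, and it assumes the table keeps the three-column `Filename | Profile | Description` layout used above.

```python
# Hypothetical helper, not part of this commit. Assumes MANIFEST.md keeps the
# three-column "Filename | Profile | Description" table.
def parse_manifest(markdown: str) -> dict[str, str]:
    """Map each fixture filename to the profile declared in the manifest."""
    profiles: dict[str, str] = {}
    for line in markdown.splitlines():
        if not line.startswith("|"):
            continue  # only table rows are of interest
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) != 3:
            continue
        filename, profile, _description = cells
        # Skip the header row and the |---|---|---| separator row.
        if filename == "Filename" or set(filename) <= {"-"}:
            continue
        profiles[filename] = profile
    return profiles


# Two rows in the manifest's format, shortened for illustration.
SAMPLE = """\
| Filename | Profile | Description |
|-----------|----------|-------------|
| EN16931_Einfach.pdf | EN16931 | Official FeRD test invoice |
| EmptyPDFA1.pdf | None | Empty PDF/A-1 document |
"""

profiles = parse_manifest(SAMPLE)
# Fixtures whose profile is "None" serve as negative test cases.
negative_cases = [name for name, profile in profiles.items() if profile == "None"]
```

The resulting mapping could then be passed to `pytest.mark.parametrize`, so that adding a fixture only requires a new manifest row.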


BIN
tests/fixtures/validXRechnung.pdf vendored Normal file

Binary file not shown.


BIN
tests/fixtures/zugferd_invoice.pdf vendored Normal file

Binary file not shown.

512
tests/test_models.py Normal file

@@ -0,0 +1,512 @@
"""Tests for Pydantic models."""

import pytest

# These tests will fail initially, then pass after models are implemented


class TestExtractionMeta:
    """Test ExtractionMeta model."""

    def test_minimal_extraction_meta(self):
        """Test ExtractionMeta with minimal required fields."""
        from src.models import ExtractionMeta

        meta = ExtractionMeta(pages=1, extraction_time_ms=234)
        assert meta.pages == 1
        assert meta.extraction_time_ms == 234
        assert meta.xml_attachment_name is None

    def test_full_extraction_meta(self):
        """Test ExtractionMeta with all fields."""
        from src.models import ExtractionMeta

        meta = ExtractionMeta(
            pages=2, xml_attachment_name="factur-x.xml", extraction_time_ms=456
        )
        assert meta.pages == 2
        assert meta.xml_attachment_name == "factur-x.xml"
        assert meta.extraction_time_ms == 456


class TestSupplier:
    """Test Supplier model."""

    def test_minimal_supplier(self):
        """Test Supplier with minimal required fields."""
        from src.models import Supplier

        supplier = Supplier(name="ACME GmbH")
        assert supplier.name == "ACME GmbH"
        assert supplier.street is None
        assert supplier.vat_id is None

    def test_full_supplier(self):
        """Test Supplier with all fields."""
        from src.models import Supplier

        supplier = Supplier(
            name="ACME GmbH",
            street="Musterstraße 42",
            postal_code="12345",
            city="Musterstadt",
            country="DE",
            vat_id="DE123456789",
            email="info@acme.de",
        )
        assert supplier.name == "ACME GmbH"
        assert supplier.street == "Musterstraße 42"
        assert supplier.postal_code == "12345"
        assert supplier.city == "Musterstadt"
        assert supplier.country == "DE"
        assert supplier.vat_id == "DE123456789"
        assert supplier.email == "info@acme.de"


class TestBuyer:
    """Test Buyer model."""

    def test_minimal_buyer(self):
        """Test Buyer with minimal required fields."""
        from src.models import Buyer

        buyer = Buyer(name="Customer AG")
        assert buyer.name == "Customer AG"
        assert buyer.street is None

    def test_full_buyer(self):
        """Test Buyer with all fields."""
        from src.models import Buyer

        buyer = Buyer(
            name="Customer AG",
            street="Kundenweg 7",
            postal_code="54321",
            city="Kundenstadt",
            country="DE",
            vat_id="DE987654321",
        )
        assert buyer.name == "Customer AG"
        assert buyer.street == "Kundenweg 7"
        assert buyer.vat_id == "DE987654321"


class TestVatBreakdown:
    """Test VatBreakdown model."""

    def test_vat_breakdown(self):
        """Test VatBreakdown model."""
        from src.models import VatBreakdown

        vat = VatBreakdown(rate=19.0, base=99.90, amount=18.98)
        assert vat.rate == 19.0
        assert vat.base == 99.90
        assert vat.amount == 18.98


class TestPaymentTerms:
    """Test PaymentTerms model."""

    def test_minimal_payment_terms(self):
        """Test PaymentTerms with minimal fields."""
        from src.models import PaymentTerms

        terms = PaymentTerms()
        assert terms.iban is None
        assert terms.bic is None
        assert terms.account_holder is None

    def test_full_payment_terms(self):
        """Test PaymentTerms with all fields."""
        from src.models import PaymentTerms

        terms = PaymentTerms(
            iban="DE89370400440532013000", bic="DEUTDEFF", account_holder="ACME GmbH"
        )
        assert terms.iban == "DE89370400440532013000"
        assert terms.bic == "DEUTDEFF"
        assert terms.account_holder == "ACME GmbH"


class TestTotals:
    """Test Totals model."""

    def test_totals_minimal(self):
        """Test Totals with required fields only."""
        from src.models import Totals

        totals = Totals(line_total_sum=99.90, net=99.90, vat_total=18.98, gross=118.88)
        assert totals.line_total_sum == 99.90
        assert totals.net == 99.90
        assert totals.vat_total == 18.98
        assert totals.gross == 118.88
        assert totals.vat_breakdown == []

    def test_totals_with_vat_breakdown(self):
        """Test Totals with VAT breakdown."""
        from src.models import Totals, VatBreakdown

        totals = Totals(
            line_total_sum=99.90,
            net=99.90,
            vat_total=18.98,
            gross=118.88,
            vat_breakdown=[VatBreakdown(rate=19.0, base=99.90, amount=18.98)],
        )
        assert len(totals.vat_breakdown) == 1
        assert totals.vat_breakdown[0].rate == 19.0


class TestLineItem:
    """Test LineItem model."""

    def test_minimal_line_item(self):
        """Test LineItem with minimal required fields."""
        from src.models import LineItem

        item = LineItem(
            position=1,
            description="Widget",
            quantity=10.0,
            unit="Stück",
            unit_price=9.99,
            line_total=99.90,
        )
        assert item.position == 1
        assert item.description == "Widget"
        assert item.quantity == 10.0
        assert item.unit == "Stück"
        assert item.unit_price == 9.99
        assert item.line_total == 99.90
        assert item.article_number is None
        assert item.vat_rate is None

    def test_full_line_item(self):
        """Test LineItem with all fields."""
        from src.models import LineItem

        item = LineItem(
            position=1,
            article_number="ART-001",
            article_number_buyer="KUN-001",
            description="Premium Widget",
            quantity=5.0,
            unit="Stück",
            unit_price=19.99,
            line_total=99.95,
            vat_rate=19.0,
            vat_amount=18.99,
        )
        assert item.article_number == "ART-001"
        assert item.article_number_buyer == "KUN-001"
        assert item.vat_rate == 19.0
        assert item.vat_amount == 18.99


class TestXmlData:
    """Test XmlData model."""

    def test_minimal_xml_data(self):
        """Test XmlData with minimal required fields."""
        from src.models import XmlData, Supplier, Buyer, Totals

        data = XmlData(
            invoice_number="RE-2025-001234",
            invoice_date="2025-02-04",
            supplier={"name": "ACME GmbH"},
            buyer={"name": "Customer AG"},
            line_items=[],
            totals={"line_total_sum": 0.0, "net": 0.0, "vat_total": 0.0, "gross": 0.0},
        )
        assert data.invoice_number == "RE-2025-001234"
        assert data.invoice_date == "2025-02-04"
        assert data.due_date is None
        assert data.notes is None

    def test_full_xml_data(self):
        """Test XmlData with all fields."""
        from src.models import XmlData, Supplier, Buyer, LineItem, Totals, VatBreakdown

        data = XmlData(
            invoice_number="RE-2025-001234",
            invoice_date="2025-02-04",
            due_date="2025-03-04",
            supplier=Supplier(name="ACME GmbH", vat_id="DE123456789"),
            buyer=Buyer(name="Customer AG", vat_id="DE987654321"),
            line_items=[
                LineItem(
                    position=1,
                    description="Widget",
                    quantity=10.0,
                    unit="Stück",
                    unit_price=9.99,
                    line_total=99.90,
                )
            ],
            totals=Totals(
                line_total_sum=99.90,
                net=99.90,
                vat_breakdown=[VatBreakdown(rate=19.0, base=99.90, amount=18.98)],
                vat_total=18.98,
                gross=118.88,
            ),
            currency="EUR",
            notes="Payment due within 30 days",
        )
        assert data.invoice_number == "RE-2025-001234"
        assert data.due_date == "2025-03-04"
        assert data.currency == "EUR"
        assert data.notes == "Payment due within 30 days"
        assert len(data.line_items) == 1


class TestExtractResponse:
    """Test ExtractResponse model."""

    def test_extract_response_zugferd(self):
        """Test ExtractResponse with ZUGFeRD data."""
        from src.models import (
            ExtractResponse,
            XmlData,
            ExtractionMeta,
            Supplier,
            Buyer,
            LineItem,
            Totals,
            VatBreakdown,
        )

        response = ExtractResponse(
            is_zugferd=True,
            zugferd_profil="EN16931",
            xml_raw="<?xml version='1.0'?>...",
            xml_data=XmlData(
                invoice_number="RE-2025-001234",
                invoice_date="2025-02-04",
                supplier=Supplier(name="ACME GmbH"),
                buyer=Buyer(name="Customer AG"),
                line_items=[
                    LineItem(
                        position=1,
                        description="Widget",
                        quantity=10.0,
                        unit="Stück",
                        unit_price=9.99,
                        line_total=99.90,
                    )
                ],
                totals=Totals(
                    line_total_sum=99.90,
                    net=99.90,
                    vat_breakdown=[VatBreakdown(rate=19.0, base=99.90, amount=18.98)],
                    vat_total=18.98,
                    gross=118.88,
                ),
            ),
            pdf_text="Rechnung\n...",
            extraction_meta=ExtractionMeta(
                pages=1, xml_attachment_name="factur-x.xml", extraction_time_ms=234
            ),
        )
        assert response.is_zugferd is True
        assert response.zugferd_profil == "EN16931"
        assert response.xml_raw is not None
        assert response.xml_data is not None
        assert response.pdf_text is not None
        assert response.extraction_meta.pages == 1

    def test_extract_response_non_zugferd(self):
        """Test ExtractResponse for non-ZUGFeRD PDF."""
        from src.models import ExtractResponse, ExtractionMeta

        response = ExtractResponse(
            is_zugferd=False,
            pdf_text="Invoice text from PDF...",
            extraction_meta=ExtractionMeta(pages=1, extraction_time_ms=50),
        )
        assert response.is_zugferd is False
        assert response.zugferd_profil is None
        assert response.xml_raw is None
        assert response.xml_data is None
        assert response.pdf_text is not None


class TestErrorDetail:
    """Test ErrorDetail model."""

    def test_error_detail_critical(self):
        """Test ErrorDetail with critical severity."""
        from src.models import ErrorDetail

        error = ErrorDetail(
            check="pflichtfelder",
            field="invoice_number",
            error_code="missing_required_field",
            message="Invoice number is required",
            severity="critical",
        )
        assert error.check == "pflichtfelder"
        assert error.field == "invoice_number"
        assert error.error_code == "missing_required_field"
        assert error.message == "Invoice number is required"
        assert error.severity == "critical"

    def test_error_detail_warning(self):
        """Test ErrorDetail with warning severity."""
        from src.models import ErrorDetail

        error = ErrorDetail(
            check="pdf_abgleich",
            field=None,
            error_code="value_mismatch",
            message="Amounts differ slightly",
            severity="warning",
        )
        assert error.check == "pdf_abgleich"
        assert error.field is None
        assert error.severity == "warning"


class TestValidationResult:
    """Test ValidationResult model."""

    def test_valid_result(self):
        """Test ValidationResult with no errors."""
        from src.models import ValidationResult

        result = ValidationResult(
            is_valid=True, errors=[], warnings=[], summary=None, validation_time_ms=100
        )
        assert result.is_valid is True
        assert result.errors == []
        assert result.warnings == []
        assert result.summary is None
        assert result.validation_time_ms == 100

    def test_invalid_result_with_errors(self):
        """Test ValidationResult with errors."""
        from src.models import ValidationResult, ErrorDetail

        result = ValidationResult(
            is_valid=False,
            errors=[
                ErrorDetail(
                    check="pflichtfelder",
                    field="invoice_number",
                    error_code="missing_required_field",
                    message="Invoice number is required",
                    severity="critical",
                ),
                ErrorDetail(
                    check="betraege",
                    field="totals.gross",
                    error_code="calculation_mismatch",
                    message="Gross total mismatch: expected 118.88, got 118.90",
                    severity="critical",
                ),
            ],
            warnings=[
                ErrorDetail(
                    check="pflichtfelder",
                    field="due_date",
                    error_code="missing_optional_field",
                    message="Due date not provided",
                    severity="warning",
                )
            ],
            summary={"total_errors": 2, "total_warnings": 1, "critical_errors": 2},
            validation_time_ms=150,
        )
        assert result.is_valid is False
        assert len(result.errors) == 2
        assert len(result.warnings) == 1
        assert result.summary["total_errors"] == 2
        assert result.validation_time_ms == 150


class TestExtractRequest:
    """Test ExtractRequest model."""

    def test_extract_request(self):
        """Test ExtractRequest model."""
        from src.models import ExtractRequest

        request = ExtractRequest(pdf_base64="JVBERi0xLjQK...")
        assert request.pdf_base64 == "JVBERi0xLjQK..."


class TestValidateRequest:
    """Test ValidateRequest model."""

    def test_validate_request_minimal(self):
        """Test ValidateRequest with minimal fields."""
        from src.models import ValidateRequest

        request = ValidateRequest(xml_data={}, checks=["pflichtfelder"])
        assert request.xml_data == {}
        assert request.checks == ["pflichtfelder"]
        assert request.pdf_text is None

    def test_validate_request_full(self):
        """Test ValidateRequest with all fields."""
        from src.models import ValidateRequest

        request = ValidateRequest(
            xml_data={"invoice_number": "RE-001", "totals": {"gross": 118.88}},
            pdf_text="Invoice text...",
            checks=["pflichtfelder", "betraege", "ustid", "pdf_abgleich"],
        )
        assert request.xml_data["invoice_number"] == "RE-001"
        assert request.pdf_text is not None
        assert len(request.checks) == 4


class TestErrorResponse:
    """Test ErrorResponse model."""

    def test_error_response(self):
        """Test ErrorResponse model."""
        from src.models import ErrorResponse

        response = ErrorResponse(
            error="invalid_pdf", message="The provided file is not a valid PDF"
        )
        assert response.error == "invalid_pdf"
        assert response.message == "The provided file is not a valid PDF"


class TestModelsSerializeToJSON:
    """Test JSON serialization of all models."""

    def test_extract_response_serializes(self):
        """Test ExtractResponse serializes to valid JSON."""
        from src.models import ExtractResponse, ExtractionMeta

        response = ExtractResponse(
            is_zugferd=False,
            pdf_text="Invoice text...",
            extraction_meta=ExtractionMeta(pages=1, extraction_time_ms=50),
        )
        json_str = response.model_dump_json()
        assert '"is_zugferd":false' in json_str
        assert '"pdf_text"' in json_str

    def test_validation_result_serializes(self):
        """Test ValidationResult serializes to valid JSON."""
        from src.models import ValidationResult

        result = ValidationResult(
            is_valid=True, errors=[], warnings=[], summary=None, validation_time_ms=100
        )
        json_str = result.model_dump_json()
        assert '"is_valid":true' in json_str

    def test_models_roundtrip(self):
        """Test models survive JSON roundtrip."""
        from src.models import Supplier

        supplier = Supplier(name="ACME GmbH", vat_id="DE123456789")
        json_str = supplier.model_dump_json()
        supplier2 = Supplier.model_validate_json(json_str)
        assert supplier2.name == supplier.name
        assert supplier2.vat_id == supplier.vat_id
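
The amounts used throughout these tests (net 99.90, 19 % VAT of 18.98, gross 118.88) are arithmetically consistent, and the `betraege` error message in `TestValidationResult` suggests a check that recomputes them. The validator itself is not part of this commit, so the following is only a sketch of how such a check might look: `check_amounts` and `round_cents` are hypothetical names, half-up rounding to cents is an assumption, and `Decimal` is used because binary floats may not represent these amounts exactly.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def round_cents(value: Decimal) -> Decimal:
    """Round to two decimal places, half up (assumed rounding rule)."""
    return value.quantize(CENT, rounding=ROUND_HALF_UP)


def check_amounts(net: str, vat_rate: str, vat_total: str, gross: str) -> list[str]:
    """Hypothetical 'betraege' check: return one message per mismatch found."""
    net_d, vat_d, gross_d = Decimal(net), Decimal(vat_total), Decimal(gross)
    problems: list[str] = []

    # VAT should equal net * rate / 100, rounded to cents.
    expected_vat = round_cents(net_d * Decimal(vat_rate) / 100)
    if expected_vat != vat_d:
        problems.append(f"VAT mismatch: expected {expected_vat}, got {vat_d}")

    # Gross should equal net plus the stated VAT total.
    expected_gross = net_d + vat_d
    if expected_gross != gross_d:
        problems.append(
            f"Gross total mismatch: expected {expected_gross}, got {gross_d}"
        )
    return problems


# Values from the test fixtures above: 99.90 * 19 % = 18.981, rounded to 18.98.
ok = check_amounts("99.90", "19", "18.98", "118.88")
# A gross of 118.90 reproduces the mismatch message used in the tests.
bad = check_amounts("99.90", "19", "18.98", "118.90")
```

Passing the amounts as strings keeps them exact; converting the float fields of the models to `Decimal` would first require a decision on how to round them.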