# Architecture
Yggdrasil is built around a single conversion registry that every engine plugs into.
## The cast registry
Source: `python/src/yggdrasil/data/cast/registry.py`.
Register converters with `@register_converter(from_hint, to_hint)`; dispatch them via `convert(value, target)`. Dispatch order:

- Exact match — a registered `(from, to)` pair.
- Identity — the value already matches the target type.
- `Any` wildcards — fall back to converters registered with `Any`.
- MRO fallback — walk the source type's MRO to find a registered ancestor.
- One-hop composition — `from → mid → to` if a single intermediate exists.
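The dispatch order above can be illustrated with a toy registry. This is a simplified sketch for intuition only, not the actual implementation in `registry.py` (the real one also threads `CastOptions` through):

```python
from typing import Any

_CONVERTERS = {}  # (from_type, to_type) -> converter function

def register_converter(src, dst):
    """Register a converter for the (src, dst) pair."""
    def deco(fn):
        _CONVERTERS[(src, dst)] = fn
        return fn
    return deco

def convert(value, target):
    src = type(value)
    # 1. Exact match: a registered (from, to) pair.
    if (src, target) in _CONVERTERS:
        return _CONVERTERS[(src, target)](value)
    # 2. Identity: the value already matches the target type.
    if isinstance(value, target):
        return value
    # 3. Any wildcard: converters registered with Any as the source.
    if (Any, target) in _CONVERTERS:
        return _CONVERTERS[(Any, target)](value)
    # 4. MRO fallback: walk the source type's ancestors.
    for ancestor in src.__mro__[1:]:
        if (ancestor, target) in _CONVERTERS:
            return _CONVERTERS[(ancestor, target)](value)
    # 5. One-hop composition: from -> mid -> to via a single intermediate.
    for (s, mid), first in list(_CONVERTERS.items()):
        if s is src and (mid, target) in _CONVERTERS:
            return _CONVERTERS[(mid, target)](first(value))
    raise TypeError(f"no converter from {src.__name__} to {target}")

@register_converter(int, str)
def _int_to_str(value):
    return str(value)

@register_converter(str, float)
def _str_to_float(value):
    return float(value)
```

With these two converters, `convert(7, float)` succeeds via one-hop composition (`int → str → float`) even though no direct `int → float` converter is registered.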
Engine modules register their converters on import:
```python
import yggdrasil.arrow.cast   # noqa: F401
import yggdrasil.polars.cast  # noqa: F401 (needs polars installed)
import yggdrasil.pandas.cast  # noqa: F401 (needs pandas installed)
import yggdrasil.spark.cast   # noqa: F401 (needs pyspark installed)
```
If a conversion you expect isn't firing, check whether the engine module has actually been imported.
## Register your own
```python
from decimal import Decimal

from yggdrasil.data.cast.registry import convert, register_converter

@register_converter(str, Decimal)
def _str_to_decimal(value: str, options=None) -> Decimal:
    return Decimal(value.replace(",", "."))

convert("19,95", Decimal)  # Decimal('19.95')
```
## CastOptions
Source: `python/src/yggdrasil/data/cast/options.py`.

`CastOptions` is the single normalized options carrier. It threads through every cast helper and holds source hints, target field/schema, safety/memory/nullability behavior, and strictness flags.
```python
import yggdrasil.arrow as pa

from yggdrasil.data.cast.options import CastOptions

opts = CastOptions(
    target_field=pa.schema([pa.field("id", pa.int64(), nullable=False)]),
    strict_match_names=True,
)
```
In your own helpers, normalize input through `CastOptions.check`:
```python
def normalize_options(options=None, *, target_field=None) -> CastOptions:
    return CastOptions.check(options, target_field=target_field, strict_match_names=True)
```
Don't invent parallel per-call option objects — extend `CastOptions` or pass it through.
## `yggdrasil.data` is the canonical surface
Reach for `yggdrasil.data` before raw engine APIs:

- `Field`/`Schema` for describing columns (names, nullability, metadata, nested structure).
- `DataType`/`DataTypeId` for type hints (don't hand-roll `pa.int64()` / `pl.Int64` / `"bigint"` strings).
- `DataTable`/`StatementResult` for "execute a query, then move rows somewhere".
- `convert(value, target, options=...)` for value conversion.
- `yggdrasil.data.enums` for normalized currency / geozone / timezone values.
Only drop down to polars / pandas / pyspark / pyarrow when you actually need something the abstraction doesn't cover. When you do, register the new behavior back into yggdrasil.data so the next caller gets it for free.
## Optional dependencies — the `lib.py` pattern
Subsystems that depend on optional packages expose a `lib.py` guard that does the import once and raises a helpful "install extra X" error on failure.

The same applies to `yggdrasil.pandas.lib`, `yggdrasil.spark.lib`, and Databricks-related modules.
The only hard runtime deps are `pyarrow>=20`, `polars>=1.3`, and `yggrs`. Base installs must keep working without anything else.
## Rust fast path, Python canonical
`yggdrasil/rs.py` is the only place that imports from `yggdrasil.rust.*`. It exposes `HAS_RS` plus the fallback-capable entry points (e.g. `utf8_len`).
Rules:
- Python behavior is the source of truth; Rust must match it, not diverge.
- The pure-Python fallback must stay correct on its own — tests pass with and without `yggrs` installed.
- Add Rust only to a path that is actually hot and semantically stable.
```python
from yggdrasil.rs import HAS_RS, utf8_len

print(HAS_RS)               # True if yggrs is installed
print(utf8_len(["héllo"]))  # native if HAS_RS, else pure Python
```
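Internally, a fallback-capable entry point can look like the sketch below. The exact import path and the semantics of `utf8_len` (assumed here to be UTF-8 byte length per string) are assumptions, not the real `rs.py`:

```python
try:
    from yggdrasil.rust import utf8_len as _rs_utf8_len  # hypothetical native path
    HAS_RS = True
except ImportError:
    _rs_utf8_len = None
    HAS_RS = False

def utf8_len(values):
    """UTF-8 byte length per string: native when available, else pure Python."""
    if HAS_RS:
        return _rs_utf8_len(values)
    # Pure-Python fallback: must stay correct on its own.
    return [len(v.encode("utf-8")) for v in values]
```

The key property is that the Python branch is the canonical definition; the native branch is only an accelerator for the same result.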
## Schema intent across boundaries
Names, order, nullability, metadata, nested structure, precision/scale, and timezone intent are part of the user contract. Don't drop them unless the API documents the loss. The cast registry preserves them by default; engine bridges round-trip through Arrow rather than each engine's native parser to avoid silent drift.