Casting

Every cast in Yggdrasil — scalar, dataclass, Arrow, dataframe engine — runs through the same registry. This page shows the patterns you'll actually use.

Scalar conversion

from yggdrasil.data.cast.registry import convert

convert("10", int)              # 10
convert("false", bool)          # False
convert("3.14", float)          # 3.14
convert("2024-06-01", "date")   # datetime.date(2024, 6, 1)
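Each of these scalar casts is an explicit converter rather than a naive Python coercion: `bool("false")` is `True` for any non-empty string, and `float`/`date` constructors don't share one parsing story. A minimal stdlib-only sketch of the two trickier cases above (an illustration, not Yggdrasil's actual converters):

```python
from datetime import date

def str_to_bool(value: str) -> bool:
    # Map common textual forms explicitly instead of relying on
    # truthiness -- bool("false") would otherwise return True.
    v = value.strip().lower()
    if v in {"true", "1", "yes", "y"}:
        return True
    if v in {"false", "0", "no", "n"}:
        return False
    raise ValueError(f"cannot cast {value!r} to bool")

def str_to_date(value: str) -> date:
    # ISO-8601 parsing, mirroring the "2024-06-01" example above.
    return date.fromisoformat(value)
```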

Dict → dataclass

from dataclasses import dataclass
from yggdrasil.data.cast.registry import convert

@dataclass
class User:
    id: int
    email: str
    active: bool = True

convert({"id": "1", "email": "ada@example.com", "active": "yes"}, User)
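Conceptually, the dict cast walks the dataclass fields and recasts each raw value to its annotated type, falling back to field defaults for missing keys. A hypothetical stdlib-only sketch of that idea (the real cast runs through the converter registry and also handles nested dataclasses):

```python
from dataclasses import dataclass, fields

@dataclass
class User:
    id: int
    email: str
    active: bool = True

def dict_to_dataclass(data: dict, cls):
    kwargs = {}
    for f in fields(cls):
        if f.name not in data:
            continue  # let the dataclass default apply
        raw = data[f.name]
        if f.type is bool:
            # Textual booleans need explicit handling: bool("yes")
            # would be True for any non-empty string.
            kwargs[f.name] = str(raw).strip().lower() in {"true", "1", "yes"}
        elif isinstance(raw, f.type):
            kwargs[f.name] = raw
        else:
            kwargs[f.name] = f.type(raw)
    return cls(**kwargs)

user = dict_to_dataclass({"id": "1", "email": "ada@example.com", "active": "yes"}, User)
```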

Register a custom converter

from decimal import Decimal
from yggdrasil.data.cast.registry import register_converter, convert

@register_converter(str, Decimal)
def _str_to_decimal(value: str, options=None) -> Decimal:
    return Decimal(value.replace(",", "."))

convert("19,95", Decimal)   # Decimal('19.95')

Schema-aware tabular casting (Arrow)

import yggdrasil.arrow as pa
from yggdrasil.data.cast.options import CastOptions
from yggdrasil.arrow.cast import cast_arrow_tabular

source = pa.table({"id": ["1"], "price": ["9.99"]})
target = pa.schema([
    pa.field("id",    pa.int64(),   nullable=False),
    pa.field("price", pa.float64(), nullable=False),
])
out = cast_arrow_tabular(source, CastOptions(target_field=target, strict_match_names=True))

Streaming readers:

from yggdrasil.arrow.cast import cast_arrow_record_batch_reader

# reader: pyarrow.RecordBatchReader, opts: CastOptions
for batch in cast_arrow_record_batch_reader(reader, opts):
    process(batch)

Dataclass → Arrow field

from dataclasses import dataclass
from yggdrasil.dataclasses import dataclass_to_arrow_field

@dataclass
class Position:
    symbol: str
    quantity: float

print(dataclass_to_arrow_field(Position))
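The helper's core job is a type mapping from Python annotations to Arrow types. A toy sketch of that mapping, using Arrow type names as plain strings so it stays pyarrow-free (the real helper returns an actual pyarrow field):

```python
from dataclasses import dataclass, fields

# Illustrative subset of the Python -> Arrow type mapping.
PY_TO_ARROW = {int: "int64", float: "float64", str: "string", bool: "bool"}

@dataclass
class Position:
    symbol: str
    quantity: float

def describe_arrow_field(cls) -> dict:
    # One Arrow type name per annotated dataclass field.
    return {f.name: PY_TO_ARROW[f.type] for f in fields(cls)}
```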

Engine bridges

Helper                                              Module
--------------------------------------------------  ---------------------
cast_arrow_tabular, cast_arrow_record_batch_reader  yggdrasil.arrow.cast
cast_pandas_dataframe                               yggdrasil.pandas.cast
cast_polars_dataframe, cast_polars_lazyframe        yggdrasil.polars.cast
cast_spark_dataframe                                yggdrasil.spark.cast

Each module registers its converters on import. Always reach the optional engines via their lib.py guard so base installs stay functional:

from yggdrasil.polars.lib import polars
from yggdrasil.pandas.lib import pandas

Polars

import yggdrasil.arrow as pa
from yggdrasil.data.cast.options import CastOptions
from yggdrasil.polars.cast import cast_polars_dataframe
from yggdrasil.polars.lib import polars

df = polars.DataFrame({"id": ["1"], "score": ["4.5"]})
target = pa.schema([pa.field("id", pa.int64()), pa.field("score", pa.float64())])
out = cast_polars_dataframe(df, CastOptions(target_field=target))

Arrow ↔ Polars round-trip

from yggdrasil.polars.cast import (
    arrow_table_to_polars_dataframe,
    polars_dataframe_to_arrow_table,
)

pl_df = arrow_table_to_polars_dataframe(arrow_table)
roundtrip = polars_dataframe_to_arrow_table(pl_df)

pandas / Spark

yggdrasil.pandas.cast and yggdrasil.spark.cast follow the same pattern:

from yggdrasil.pandas.cast import cast_pandas_dataframe
from yggdrasil.spark.cast import cast_spark_dataframe

Reusing CastOptions in custom helpers

from yggdrasil.data.cast.options import CastOptions

def normalize_options(options=None, *, target_field=None) -> CastOptions:
    return CastOptions.check(options, target_field=target_field, strict_match_names=True)

When the cast doesn't fire

  1. Confirm the engine cast module is imported (yggdrasil.polars.cast, etc.). Engines register on import.
  2. Check that CastOptions.target_field is set: cast_arrow_tabular and friends need the target schema.
  3. Inspect the dispatch order in Architecture. Most "missing converter" cases are an MRO miss; register a converter or add an Any-wildcard fallback.
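The MRO point can be seen in a toy registry (a sketch, not Yggdrasil's code): lookup walks the source value's MRO, so a converter registered for a base class catches subclasses, and an Any-style wildcard catches everything else.

```python
# Toy converter registry showing MRO-based dispatch and a wildcard fallback.
_CONVERTERS = {}
ANY = object()  # stand-in for an Any-wildcard source type

def register(src, dst, fn):
    _CONVERTERS[(src, dst)] = fn

def convert(value, dst):
    # Walk the value's MRO first, then fall back to the wildcard.
    for src in type(value).__mro__:
        fn = _CONVERTERS.get((src, dst))
        if fn is not None:
            return fn(value)
    fn = _CONVERTERS.get((ANY, dst))
    if fn is not None:
        return fn(value)
    raise TypeError(f"no converter {type(value).__name__} -> {dst.__name__}")

register(str, int, int)   # exact source-type match
register(ANY, str, str)   # wildcard fallback for any source type

class Token(str):
    # A str subclass still reaches the str -> int converter via the MRO.
    pass
```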