agglovar.merge.base

Base class for callset intersects.

Attributes

CallsetDefType

Alias for acceptable types.

Classes

MergeBase

Base class for callset intersects.

Module Contents

class agglovar.merge.base.MergeBase

Bases: abc.ABC

Base class for callset intersects.

abstractmethod __call__(callsets: collections.abc.Iterable[CallsetDefType], retain_index: bool = False, pre_filter: collections.abc.Iterable[polars.Expr] | None = None, temp_dir: bool | str | pathlib.Path = False) polars.LazyFrame

Intersect callsets.

Parameters:
  • callsets – Callsets to intersect.

  • retain_index – If True, keep an existing “_index” column instead of dropping it.

  • pre_filter – If set, filter each table with these expressions. The filter is applied last (after “_index” is set).

  • temp_dir – How the underlying pairwise intersect materialises prepared tables before its chunked loop. False (default) keeps tables in memory; True writes them to the system temp directory as parquet; a str/Path writes them to that directory. Temp files are always removed on exit.

Returns:

A merged callset table.
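
The three temp_dir modes described above can be sketched with the standard library alone. This is a minimal illustration, not agglovar's implementation: resolve_temp_dir and the "merge_" prefix are hypothetical, and creating a scratch subdirectory inside the chosen location is an assumption made so that cleanup on exit is simple.

```python
import contextlib
import tempfile
from pathlib import Path
from typing import Iterator, Optional, Union


@contextlib.contextmanager
def resolve_temp_dir(
    temp_dir: Union[bool, str, Path] = False,
) -> Iterator[Optional[Path]]:
    """Yield a scratch directory for parquet files, or None to stay in memory.

    Hypothetical helper for illustration; not part of agglovar.
    """
    if temp_dir is False:
        # Default: keep tables in memory, nothing is written to disk.
        yield None
        return

    # True -> system temp directory; str/Path -> the given directory.
    parent = None if temp_dir is True else Path(temp_dir)
    if parent is not None:
        parent.mkdir(parents=True, exist_ok=True)

    # TemporaryDirectory deletes the scratch directory and everything in it
    # when the block exits, even if the body raises.
    with tempfile.TemporaryDirectory(
        dir=None if parent is None else str(parent), prefix="merge_"
    ) as scratch:
        yield Path(scratch)
```

A caller would wrap the chunked loop in `with resolve_temp_dir(temp_dir) as scratch:` and write prepared tables under `scratch` only when it is not None.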

static get_intersect_tuples(callsets: collections.abc.Iterable[CallsetDefType], retain_index: bool = False, pre_filter: collections.abc.Iterable[polars.Expr] | polars.Expr | None = None) list[tuple[polars.LazyFrame, str, int]]

Transform input arguments to a list of tuples with a fixed set of fields.

Each returned tuple has three fields:

  1. A lazy frame

  2. A name for the source

  3. An index for the source (0 for the first, increments by 1)

If a source name is given, it is used. Otherwise, a default name is generated from the source index.

Lazy frames are transformed to add an index (“_index” column) and to ensure that a variant ID is present (“id” column) with any null values filled in.

Parameters:
  • callsets – Callsets parameter. May be an iterable of DataFrames, LazyFrames, or tuples of (DataFrame, name).

  • retain_index – If True, keep an existing “_index” column instead of dropping it.

  • pre_filter – If set, filter each table with these expressions. The filter is applied last (after “_index” is set).

Returns:

A list of tuples where each tuple represents one input source.
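
The naming and indexing rules can be illustrated with a small stand-alone sketch. It treats tables as opaque objects and covers only the (table, name, index) shape; it is not the library's code. The function name to_intersect_tuples and the default-name format "source_<index>" are assumptions, since the actual default format is not stated here.

```python
from typing import Any, Iterable, List, Tuple


def to_intersect_tuples(callsets: Iterable[Any]) -> List[Tuple[Any, str, int]]:
    """Normalise inputs to (table, name, index) tuples.

    Hypothetical stand-in for illustration; tables are opaque objects here.
    """
    tuples = []
    for index, item in enumerate(callsets):
        if isinstance(item, (tuple, list)):
            # (table, name) pair: keep the given name.
            table, name = item
        else:
            # Bare table: no name was given.
            table, name = item, None
        if name is None:
            # Assumed default-name format, derived from the source index.
            name = f"source_{index}"
        tuples.append((table, name, index))
    return tuples
```

For example, passing one bare table followed by a (table, "sample_b") pair gives the first source the generated name for index 0 and the second source the name "sample_b" with index 1.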

type agglovar.merge.base.CallsetDefType = pl.DataFrame | pl.LazyFrame | Iterable[pl.DataFrame | pl.LazyFrame | str]

Alias for acceptable types.