agglovar.merge.base
Base class for callset intersects.
Attributes
Alias for acceptable types.
Classes
Base class for callset intersects.
Module Contents
- class agglovar.merge.base.MergeBase
Bases: abc.ABC
Base class for callset intersects.
- abstractmethod __call__(callsets: collections.abc.Iterable[CallsetDefType], retain_index: bool = False, pre_filter: collections.abc.Iterable[polars.Expr] | None = None, temp_dir: bool | str | pathlib.Path = False) polars.LazyFrame
Intersect callsets.
- Parameters:
callsets – Callsets to intersect.
retain_index – If True, keep an existing “_index” column instead of dropping it.
pre_filter – If set, filter each table with these expressions. Filter is applied last (after “_index” is set).
temp_dir – How the underlying pairwise intersect materialises prepared tables before its chunked loop.
False (default) keeps tables in memory; True writes them to the system temp directory as parquet; a str/Path writes them to that directory. Temp files are always removed on exit.
- Returns:
A merged callset table.
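The three temp_dir modes can be sketched with the standard library alone. This is a minimal illustration, not agglovar's implementation: resolve_temp_dir is a hypothetical helper, writing the parquet files is omitted, and creating a unique scratch subdirectory inside a user-supplied directory is an assumption made here so that cleanup never touches the caller's own files.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def resolve_temp_dir(temp_dir=False):
    """Yield a scratch directory as a Path, or None for in-memory mode.

    Hypothetical helper for illustration; not part of agglovar.
    """
    if temp_dir is False:
        # Default: tables stay in memory, so there is nothing to clean up.
        yield None
        return

    # True -> system temp directory; str/Path -> inside that directory.
    parent = None if temp_dir is True else Path(temp_dir)

    # TemporaryDirectory deletes the scratch directory and its contents
    # when the block exits, including when the body raises.
    with tempfile.TemporaryDirectory(dir=parent) as scratch:
        yield Path(scratch)


with resolve_temp_dir(True) as scratch_dir:
    # Prepared tables would be written under scratch_dir here.
    existed_inside = scratch_dir.is_dir()

removed_after = not scratch_dir.exists()
```

Using a context manager ties the lifetime of the temporary files to the merge call, which matches the "always removed on exit" behaviour described above.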
- static get_intersect_tuples(callsets: collections.abc.Iterable[CallsetDefType], retain_index: bool = False, pre_filter: collections.abc.Iterable[polars.Expr] | polars.Expr | None = None) list[tuple[polars.LazyFrame, str, int]]
Transform input arguments to a list of tuples with set fields.
Each returned tuple has three fields:
A lazy frame
A name for the source
An index for the source (0 for the first, increments by 1)
If a source name is given, the given name is used. If it is not, then a default name is generated using the source index.
Lazy frames are transformed to add an index (“_index” column) and to ensure a variant ID is present (“id” column) with any null values filled in.
- Parameters:
callsets – Callsets parameter. May be an iterable of DataFrames, LazyFrames, or tuples of (DataFrame, name).
retain_index – If True, keep an existing “_index” column instead of dropping it.
pre_filter – If set, filter each table with these expressions. Filter is applied last (after “_index” is set).
- Returns:
A list of tuples where each tuple represents one input source.
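The normalisation described for get_intersect_tuples can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: lists of dicts stand in for polars frames, prepare_rows and to_source_tuples are hypothetical names, the "source_{index}" default name format is assumed, filters are plain callables rather than polars expressions, and the ID-filling step is left out.

```python
from typing import Any, Callable, Iterable

Row = dict[str, Any]
RowFilter = Callable[[Row], bool]


def prepare_rows(
    rows: Iterable[Row],
    retain_index: bool = False,
    pre_filter: Iterable[RowFilter] = (),
) -> list[Row]:
    """Add "_index" to each row, then apply the filters (stand-in logic)."""
    prepared = []
    for position, row in enumerate(rows):
        row = dict(row)  # copy so the caller's rows are not modified
        if not (retain_index and "_index" in row):
            # Any existing "_index" is replaced unless retain_index is set.
            row["_index"] = position
        prepared.append(row)

    # Filters run last, so "_index" still refers to the unfiltered position.
    for keep in pre_filter:
        prepared = [row for row in prepared if keep(row)]
    return prepared


def to_source_tuples(
    callsets: Iterable[Any],
    retain_index: bool = False,
    pre_filter: Iterable[RowFilter] = (),
) -> list[tuple[list[Row], str, int]]:
    """Normalise inputs to (rows, name, index) tuples."""
    filters = tuple(pre_filter)  # reused for every source
    sources = []
    for index, item in enumerate(callsets):
        rows, name = item if isinstance(item, tuple) else (item, None)
        if name is None:
            # Assumed default name format; the real format is not documented here.
            name = f"source_{index}"
        sources.append((prepare_rows(rows, retain_index, filters), name, index))
    return sources


sources = to_source_tuples(
    [
        [{"id": "a", "svlen": 10}, {"id": "b", "svlen": 500}],
        ([{"id": "c", "svlen": 900}], "caller_b"),
    ],
    pre_filter=[lambda row: row["svlen"] >= 50],
)
```

In this sketch the surviving row of the first source keeps "_index" 1, because the index is assigned before the filter removes the first row.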