agglovar.merge.cumulative
A merging strategy that adds callsets cumulatively to the merge.
This strategy uses a table of variants that accumulates as callsets are added (the cumulative table). The cumulative table is initially empty. As callsets are added, variants intersecting the cumulative table are added to existing entries, and variants not intersecting the cumulative table are appended as new variants. This process is repeated for each callset added.
After all callsets are processed, the cumulative table represents a nonredundant callset where each entry is one variant that was found in one or more of the original callsets. Columns track the sources and the variant within each source.
This strategy is fast and uses minimal memory, but is not necessarily optimal. The order in which callsets are added to the merge may alter the merged results in nontrivial ways, especially in loci where multiple join choices are possible.
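The accumulation described above, and its sensitivity to input order, can be illustrated with a small toy sketch. This is not the agglovar implementation: the real strategy delegates matching to a PairwiseJoin over polars tables, whereas this sketch assumes variants are bare integer positions matched by a hypothetical distance threshold. The names MergedEntry, toy_match, and merge_cumulative_toy are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MergedEntry:
    """One row of the toy cumulative table (hypothetical structure)."""
    pos: int  # position of the first variant that created this entry
    sources: list = field(default_factory=list)  # (callset name, index in callset)


def toy_match(a_pos: int, b_pos: int, max_dist: int) -> bool:
    """Stand-in for a pairwise join: match when positions are close enough."""
    return abs(a_pos - b_pos) <= max_dist


def merge_cumulative_toy(callsets, max_dist: int = 10):
    """Add callsets one at a time to an initially empty cumulative table."""
    table = []
    for name, positions in callsets:
        # Only entries present before this callset are candidates, and each
        # entry accepts at most one variant from the current callset.
        n_existing = len(table)
        claimed = set()
        for idx, pos in enumerate(positions):
            hit = next(
                (i for i in range(n_existing)
                 if i not in claimed and toy_match(table[i].pos, pos, max_dist)),
                None,
            )
            if hit is None:
                # No intersecting entry: append as a new variant.
                table.append(MergedEntry(pos, [(name, idx)]))
            else:
                # Intersecting entry: record this source on the existing row.
                claimed.add(hit)
                table[hit].sources.append((name, idx))
    return table


# Same three callsets, two input orders.
order_abc = merge_cumulative_toy([("A", [100]), ("B", [108]), ("C", [116])])
order_bac = merge_cumulative_toy([("B", [108]), ("A", [100]), ("C", [116])])

print(len(order_abc))  # 2: C at 116 is too far from the entry at 100
print(len(order_bac))  # 1: A and C are both within range of the entry at 108
```

In this toy, the entry keeps the position of whichever variant arrived first, so changing the callset order changes which later variants can still join it. That is one simple way the order dependence noted above can arise.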
Classes
LeadStrategy | Strategy for choosing the lead variant.
MergeCumulative | Iterative intersection.
Module Contents
- class agglovar.merge.cumulative.LeadStrategy(*args, **kwds)
Bases: enum.Enum
Strategy for choosing the lead variant.
When variants from multiple sources join into one record, this strategy determines which variant is chosen as the lead variant. The lead variant represents the merged record in the merged callset.
- FIRST = 'first'
- LEFT = 'left'
- class agglovar.merge.cumulative.MergeCumulative(pairwise_join: agglovar.pairwise.base.PairwiseJoin, lead_strategy: LeadStrategy = LeadStrategy.LEFT)
Bases: agglovar.merge.base.MergeBase
Iterative intersection.
- Variables:
pairwise_join – Pairwise join strategy for intersects.
- __call__(callsets: collections.abc.Iterable[agglovar.merge.base.CallsetDefType], retain_index: bool = False, pre_filter: collections.abc.Iterable[polars.Expr] | polars.Expr | None = None, sort: bool = True, add_id: bool = True, temp_dir: bool | str | pathlib.Path = False) → polars.LazyFrame
Intersect callsets.
- Parameters:
callsets – Callsets to intersect.
retain_index – If True, do not drop an existing “_index” column from callset tables.
pre_filter – If set, filter each table with these expressions. Filter is applied last (after “_index” is set).
temp_dir – Forwarded to the pairwise intersect for each cumulative step. See agglovar.pairwise.base.PairwiseJoin.join_iter().
- Returns:
A merged callset table.
- lead_strategy
- pairwise_join