agglovar.pairwise.base
======================

.. py:module:: agglovar.pairwise.base

.. autoapi-nested-parse::

   Base class for pairwise intersect strategies.

   Defines an interface and implements common functionality for pairwise intersect strategies.


Classes
-------

.. autoapisummary::

   agglovar.pairwise.base.PairwiseJoin


Module Contents
---------------

.. py:class:: PairwiseJoin(weight_strategy: agglovar.pairwise.weights.WeightStrategy = DEFAULT_WEIGHT_STRATEGY)

   Bases: :py:obj:`abc.ABC`

   Base class for pairwise intersection classes.


   .. py:method:: check_required_cols(df: polars.LazyFrame | polars.DataFrame | collections.abc.Iterable[str], raise_exception: bool = False) -> set[str]

      Check if a table has the expected columns.

      :param df: Table to check.
      :param raise_exception: If True, raise an exception if any expected columns are missing.

      :returns: A set of missing columns.

      :raises ValueError: If any expected columns are missing and ``raise_exception`` is True.


   .. py:method:: check_reserved_cols(df: polars.LazyFrame | polars.DataFrame | collections.abc.Iterable[str], raise_exception: bool = False) -> set[str]

      Check if a table has reserved columns.

      :param df: Table to check.
      :param raise_exception: If True, raise an exception if any reserved columns are found.

      :returns: A set of reserved columns found in the table.

      :raises ValueError: If any reserved columns are found and ``raise_exception`` is True.


   .. py:method:: join(df_a: polars.DataFrame | polars.LazyFrame, df_b: polars.DataFrame | polars.LazyFrame, retain_index: bool = False, temp_dir: bool | str | pathlib.Path = False) -> polars.LazyFrame

      Find all pairs of variants in two sources that meet a set of criteria.

      This is a convenience method that calls :meth:`join_iter` and concatenates the results.

      :param df_a: Table A.
      :param df_b: Table B.
      :param retain_index: If True, do not drop an existing ``_index`` column from the callset
          tables if one is present.
      :param temp_dir: See :meth:`join_iter`.

      :returns: A join table.


   .. py:method:: join_iter(df_a: polars.DataFrame | polars.LazyFrame, df_b: polars.DataFrame | polars.LazyFrame, retain_index: bool = False, temp_dir: bool | str | pathlib.Path = False) -> collections.abc.Iterator[polars.LazyFrame]
      :abstractmethod:

      Find all pairs of variants in two sources that meet a set of criteria.

      :param df_a: Source dataframe.
      :param df_b: Target dataframe.
      :param retain_index: If True, do not drop an existing ``_index`` column from the callset
          tables if one is present.
      :param temp_dir: How to materialise the prepared tables before the chunked loop.
          ``False`` (default) collects both into memory; ``True`` writes them to the system temp
          directory as parquet files; a ``str``/``Path`` writes them to that directory. Temp files
          are always removed on exit.

      :yields: A LazyFrame for each chunk.


   .. py:property:: required_cols
      :type: set[str]
      :abstractmethod:

      The minimum set of columns that must be present in input tables.


   .. py:property:: reserved_cols
      :type: set[str]

      A set of columns that are reserved for internal use and must not be present in input tables.


   .. py:property:: weight_strategy
      :type: agglovar.pairwise.weights.WeightStrategy

      Weight strategy to use for this join.
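
The interface described above pairs an abstract, chunked ``join_iter`` with a concrete ``join`` that concatenates the chunks, plus a column check that returns the missing names or raises ``ValueError``. The sketch below illustrates that pattern using only the standard library. It is not agglovar code: ``ToyPairwiseJoin``, ``ToyExactJoin``, the ``chrom``/``pos`` columns, the ``index_a``/``index_b`` output keys, and the use of lists of dicts in place of polars tables are all assumptions made for illustration.

.. code-block:: python

   from abc import ABC, abstractmethod
   from collections.abc import Iterable, Iterator


   class ToyPairwiseJoin(ABC):
       """Hypothetical stand-in for the documented interface (not agglovar code)."""

       @property
       @abstractmethod
       def required_cols(self) -> set[str]:
           """Columns every input table must provide."""

       def check_required_cols(
           self, cols: Iterable[str], raise_exception: bool = False
       ) -> set[str]:
           """Return the required columns that are missing from ``cols``."""
           missing = self.required_cols - set(cols)
           if missing and raise_exception:
               raise ValueError(f"Missing required columns: {sorted(missing)}")
           return missing

       @abstractmethod
       def join_iter(
           self, rows_a: list[dict], rows_b: list[dict]
       ) -> Iterator[list[dict]]:
           """Yield one chunk of joined pairs at a time."""

       def join(self, rows_a: list[dict], rows_b: list[dict]) -> list[dict]:
           """Convenience wrapper: concatenate every chunk from join_iter()."""
           return [pair for chunk in self.join_iter(rows_a, rows_b) for pair in chunk]


   class ToyExactJoin(ToyPairwiseJoin):
       """Pairs rows whose "chrom" and "pos" match exactly (assumed criteria)."""

       @property
       def required_cols(self) -> set[str]:
           return {"chrom", "pos"}

       def join_iter(self, rows_a, rows_b):
           # Emit one chunk per row of A; each chunk lists the matching rows of B.
           for i, a in enumerate(rows_a):
               yield [
                   {"index_a": i, "index_b": j}
                   for j, b in enumerate(rows_b)
                   if (a["chrom"], a["pos"]) == (b["chrom"], b["pos"])
               ]


   rows_a = [{"chrom": "chr1", "pos": 100}, {"chrom": "chr2", "pos": 5}]
   rows_b = [{"chrom": "chr2", "pos": 5}, {"chrom": "chr1", "pos": 100}]

   joiner = ToyExactJoin()
   missing = joiner.check_required_cols(["chrom", "id"])  # "pos" is absent
   pairs = joiner.join(rows_a, rows_b)

Keeping the chunked iterator abstract and the concatenating wrapper concrete means a subclass only has to implement the per-chunk logic. In the documented class the chunks are ``polars.LazyFrame`` objects rather than lists.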