agglovar.bed.intersect

Interval table intersections.

Functions

as_bool(→ polars.LazyFrame)

Add a boolean column to df_a indicating whether each record intersects with df_b.

as_proportion(→ polars.LazyFrame)

Compute the proportion of each interval in df_a covered by intervals in df_b.

Module Contents

agglovar.bed.intersect.as_bool(df_a: polars.LazyFrame | polars.DataFrame, df_b: polars.LazyFrame | polars.DataFrame, name: str, distance: int = 0, negate: bool = False, col_names_a: agglovar.bed.col.CoordCol | Iterable[str] | str | None = None, col_names_b: agglovar.bed.col.CoordCol | Iterable[str] | str | None = None, temp_dir: bool | str | pathlib.Path = False) → polars.LazyFrame

Add a boolean column to df_a indicating whether each record intersects with df_b.

Parameters:
  • df_a – Table a.

  • df_b – Table b.

  • name – Name of the column to add.

  • distance – Maximum distance between two records for them to count as intersecting. May be negative to require overlap.

  • negate – If True, negate the boolean column to annotate misses instead of hits.

  • col_names_a – Coordinate columns in df_a (chromosome or query ID, pos, end).

  • col_names_b – Coordinate columns in df_b (chromosome or query ID, pos, end).

  • temp_dir – How to materialise the prepared tables before iterating. See agglovar.bed.join.pairwise_join().

Returns:

A LazyFrame with two columns: _index and the column given by name.
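The following pure-Python sketch illustrates the per-record rule behind as_bool without polars. It assumes BED-style half-open coordinates and treats two records as intersecting when they share a chromosome and the gap max(pos) - min(end) is at most distance, so book-ended records count at distance=0 and a negative distance demands at least that many overlapping bases. The Interval type and intersects_bool function are hypothetical, and the library's exact boundary rule may differ.

```python
from typing import List, NamedTuple, Sequence


class Interval(NamedTuple):
    """Hypothetical record type: BED-style half-open [pos, end)."""
    chrom: str
    pos: int
    end: int


def intersects_bool(
    a: Sequence[Interval],
    b: Sequence[Interval],
    distance: int = 0,
    negate: bool = False,
) -> List[bool]:
    """Return one flag per record in a (sketch, not agglovar's code)."""
    flags = []
    for rec_a in a:
        # Gap between two intervals; a negative gap means they overlap by -gap bases.
        # Assumed rule: a hit when the gap is at most `distance`.
        hit = any(
            rec_b.chrom == rec_a.chrom
            and max(rec_a.pos, rec_b.pos) - min(rec_a.end, rec_b.end) <= distance
            for rec_b in b
        )
        # XOR with negate turns hits into misses.
        flags.append(hit != negate)
    return flags


a = [Interval("chr1", 100, 200), Interval("chr1", 300, 400), Interval("chr2", 100, 200)]
b = [Interval("chr1", 150, 250), Interval("chr1", 410, 500)]

print(intersects_bool(a, b))               # [True, False, False]
print(intersects_bool(a, b, distance=10))  # [True, True, False]
print(intersects_bool(a, b, negate=True))  # [False, True, True]
```

The nested loop is O(len(a) * len(b)) and only meant to show the rule. The temp_dir parameter suggests the real function is built on agglovar.bed.join.pairwise_join(), and it returns a LazyFrame keyed by _index rather than a list.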

agglovar.bed.intersect.as_proportion(df_a: polars.LazyFrame | polars.DataFrame, df_b: polars.LazyFrame | polars.DataFrame, name: str, col_names_a: agglovar.bed.col.CoordCol | Iterable[str] | str | None = None, col_names_b: agglovar.bed.col.CoordCol | Iterable[str] | str | None = None, temp_dir: bool | str | pathlib.Path = False) → polars.LazyFrame

Compute the proportion of each interval in df_a covered by intervals in df_b.

Rows in df_a with null pos or end are preserved in the output with a null proportion. Zero-length intervals (pos == end) produce NaN (0 / 0).

Parameters:
  • df_a – Table a.

  • df_b – Table b.

  • name – Name of the column to add.

  • col_names_a – Coordinate columns in df_a (chromosome or query ID, pos, end).

  • col_names_b – Coordinate columns in df_b (chromosome or query ID, pos, end).

  • temp_dir – How to materialise the prepared tables before iterating. See agglovar.bed.join.pairwise_join().

Returns:

A LazyFrame with two columns: _index and the column given by name.
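As a rough illustration, the pure-Python sketch below computes the covered proportion for a single df_a interval, including the null and zero-length cases described above. It assumes half-open coordinates and that overlapping df_b intervals are merged before measuring coverage, so the result never exceeds 1; whether as_proportion merges overlaps exactly this way is an assumption. The covered_proportion function and BRecord alias are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Hypothetical df_b record layout: (chrom, pos, end), half-open [pos, end).
BRecord = Tuple[str, int, int]


def covered_proportion(
    chrom: str, pos: Optional[int], end: Optional[int], b: Sequence[BRecord]
) -> Optional[float]:
    """Fraction of [pos, end) covered by b (sketch, not agglovar's code)."""
    if pos is None or end is None:
        return None  # null coordinates -> null proportion
    # Clip overlapping b intervals on the same chromosome to [pos, end).
    pieces = sorted(
        (max(pos, b_pos), min(end, b_end))
        for b_chrom, b_pos, b_end in b
        if b_chrom == chrom and b_pos < end and b_end > pos
    )
    # Merge the clipped pieces so overlapping b intervals are counted once.
    merged: List[List[int]] = []
    for start, stop in pieces:
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], stop)
        else:
            merged.append([start, stop])
    covered = sum(stop - start for start, stop in merged)
    length = end - pos
    if length == 0:
        return math.nan  # 0 / 0; plain Python division would raise ZeroDivisionError
    return covered / length


b = [("chr1", 100, 150), ("chr1", 125, 175), ("chr1", 190, 250)]

print(covered_proportion("chr1", 100, 200, b))   # 0.85
print(covered_proportion("chr1", None, 200, b))  # None
print(covered_proportion("chr1", 120, 120, b))   # nan
print(covered_proportion("chr2", 100, 200, b))   # 0.0
```

Merging matters here: summing the raw overlaps (50 + 50 + 10) would report 1.1 for the first interval even though only 85 of its 100 bases are covered.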