agglovar.fa

Reference and FASTA processing utilities.

Functions

fa_info(ref_fa) → polars.DataFrame

Get a table of reference information from a FASTA file.

read_fai(fai_file_name, cols, name) → polars.DataFrame

Read an FAI file.

Module Contents

agglovar.fa.fa_info(ref_fa: str | pathlib.Path) → polars.DataFrame

Get a table of reference information from a FASTA file.

The FASTA file must have an accompanying “.fai” index file.

Table columns:

  • chrom: Chromosome name.

  • md5: MD5 of the sequence.

  • order: Order of the sequence in the FASTA file.

  • pos: Start position of the sequence in the FASTA file.

  • line_bp: Number of base pairs per line.

  • line_bytes: Number of bytes per line.

Parameters:

ref_fa – Reference FASTA.

Returns:

A table with sequence information.
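To make the columns concrete, the following is a minimal sketch that derives the same fields by scanning a small uncompressed FASTA file directly. It is not agglovar's implementation: `sketch_fa_info` is a hypothetical stand-in that returns a list of dicts instead of a polars.DataFrame, and it makes several assumptions. It does not read the “.fai” index, it numbers `order` from 0, and it hashes the sequence bytes as written with line breaks removed. The source does not state how `fa_info` handles any of these details.

```python
import hashlib
import tempfile
from pathlib import Path


def sketch_fa_info(ref_fa):
    """Illustrative stand-in, NOT agglovar's implementation.

    Scan an uncompressed FASTA and return one dict per record using the
    column names documented for fa_info.
    """
    records = []
    current = None
    offset = 0  # Byte offset of the line currently being read

    with open(ref_fa, "rb") as in_file:
        for line in in_file:
            if line.startswith(b">"):
                # Header line: the name is the first whitespace-delimited token
                header = line[1:].split()
                current = {
                    "chrom": header[0].decode() if header else "",
                    "md5": hashlib.md5(),
                    "order": len(records),  # Assumption: 0-based order
                    "pos": offset + len(line),  # First sequence byte follows the header
                    "line_bp": None,
                    "line_bytes": None,
                }
                records.append(current)
            elif current is not None:
                seq = line.rstrip(b"\r\n")
                if current["line_bp"] is None:
                    # The first sequence line defines the line width
                    current["line_bp"] = len(seq)
                    current["line_bytes"] = len(line)
                # Assumption: hash the bases as written, without case normalization
                current["md5"].update(seq)
            offset += len(line)

    # Replace the running hash objects with their hex digests
    for record in records:
        record["md5"] = record["md5"].hexdigest()

    return records


# Demonstration on a temporary two-record FASTA file
with tempfile.TemporaryDirectory() as tmp_dir:
    fa_path = Path(tmp_dir) / "example.fa"
    fa_path.write_bytes(b">chr1 test\nACGT\nAC\n>chr2\nGG\n")
    info = sketch_fa_info(fa_path)

for row in info:
    print(row)
```

In this sketch, `pos`, `line_bp`, and `line_bytes` follow the usual FAI meaning (byte offset of the first base, bases per line, and bytes per line including the line terminator), which is why they can be recomputed from the FASTA alone.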

agglovar.fa.read_fai(fai_file_name: str | pathlib.Path, cols: Iterable[str] | None = ('chrom', 'len'), name: str = None) → polars.DataFrame

Read an FAI file.

By default, return a table of chromosome (or contig) names and their lengths, matching the default cols value of ('chrom', 'len').

Available columns are: chrom, len, pos, line_bp, line_bytes

Parameters:
  • fai_file_name – File to read.

  • cols – Select these columns. None to retain all columns.

  • name – Name of the chromosome column (typically “qry_id” to match query alignment tables).

Returns:

A table of the FAI file contents.
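The column selection and renaming described above can be sketched with the standard library alone. This is a hedged illustration, not agglovar's code: `sketch_read_fai` is a hypothetical stand-in that returns a list of dicts rather than a polars.DataFrame, and it assumes that `name` simply replaces the `chrom` column label when that column is selected.

```python
import tempfile
from pathlib import Path

# Column names documented for read_fai, in FAI file order
FAI_COLUMNS = ("chrom", "len", "pos", "line_bp", "line_bytes")


def sketch_read_fai(fai_file_name, cols=("chrom", "len"), name=None):
    """Illustrative stand-in, NOT agglovar's implementation."""
    # None keeps every column, mirroring the documented behavior of cols
    keep = FAI_COLUMNS if cols is None else tuple(cols)

    unknown = [col for col in keep if col not in FAI_COLUMNS]
    if unknown:
        raise ValueError(f"Unknown FAI columns: {unknown}")

    rows = []
    with open(fai_file_name) as in_file:
        for line in in_file:
            line = line.rstrip("\r\n")
            if not line:
                continue  # Skip blank lines

            # FAI files are tab-delimited with one record per sequence
            record = dict(zip(FAI_COLUMNS, line.split("\t")))

            row = {}
            for col in keep:
                # Every column except the sequence name is an integer
                value = record[col] if col == "chrom" else int(record[col])
                # Assumption: name relabels the chrom column
                out_col = name if (col == "chrom" and name is not None) else col
                row[out_col] = value
            rows.append(row)

    return rows


# Demonstration on a temporary two-record FAI file
with tempfile.TemporaryDirectory() as tmp_dir:
    fai_path = Path(tmp_dir) / "example.fa.fai"
    fai_path.write_text("chr1\t6\t11\t4\t5\nchr2\t2\t25\t2\t3\n")

    default_rows = sketch_read_fai(fai_path)
    query_rows = sketch_read_fai(fai_path, cols=None, name="qry_id")

print(default_rows)
print(query_rows[0])
```

The second call shows the case mentioned for the name parameter: all columns are kept and the sequence name column is labeled “qry_id” so it lines up with a query alignment table.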