agglovar.kmer.util

Basic k-mer manipulation utilities.

Attributes

BASE_TO_INT

Maps base string to 2-bit integer.

BYTE_SIZE_TO_NUMPY_UINT

Maps k-mer byte size to the smallest numpy integer type that can store it.

INT_TO_BASE

Maps 2-bit integer to base string.

NP_MAX_BIT_SIZE

Maximum k-mer bit size for numpy arrays.

NP_MAX_BYTE_SIZE

Maximum k-mer byte size for numpy arrays.

NP_MAX_KMER_SIZE

Maximum k-mer size for numpy arrays.

Classes

KmerUtil

Manages basic k-mer functions, such as converting formats, appending bases, and reverse-complementing.

Functions

stream(→ Iterator[int])

Get an iterator for k-mers in a sequence.

stream_index(→ Iterator[tuple[int, int]])

Get an iterator for k-mers in a sequence with their index.

Module Contents

class agglovar.kmer.util.KmerUtil(k_size: int, k_min_size: int = 0, k_min_mask: int = 0)

Manages basic k-mer functions, such as converting formats, appending bases, and reverse-complementing.

Contains a set of constants specific to the k-mer size and minimizer (if defined).

Variables:
  • k_size – K-mer size.

  • k_bit_size – Number of bits in a k-mer. Minimum size of unsigned integers storing k-mers.

  • k_byte_size – Size of k-mers in bytes.

  • k_min_size – Minimizer size or 0 if a minimizer is not used.

  • k_min_mask – Minimizer mask if set and k_min_size is not 0.

  • k_mask – Mask for k-mer part of integer. Masks out unused bits if an integer has more bits than k_bit_size.

  • min_kmer_util – K-mer util for minimizer. None if a minimizer is not defined.

  • minimizer_mask – Mask for extracting minimizers from k-mers. 0 if a minimizer is not defined.

  • sub_per_kmer – Number of sub-k-mers per k-mer. None if a minimizer is not defined.

  • np_int_type – Smallest numpy unsigned integer type that can store k-mers of k_size. None if a k-mer of k_size does not fit in the largest numpy unsigned integer type (8 bytes, np.uint64).
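The size-related variables above can plausibly be derived from k_size alone. The following is a minimal sketch, assuming a 2-bit-per-base encoding (suggested by BASE_TO_INT and INT_TO_BASE); the helper name kmer_constants is hypothetical and is not part of agglovar.

```python
def kmer_constants(k_size: int) -> dict[str, int]:
    """Derive size constants for a k-mer, assuming two bits per base."""
    k_bit_size = 2 * k_size               # assumed: 2 bits per base
    k_byte_size = (k_bit_size + 7) // 8   # round up to whole bytes
    k_mask = (1 << k_bit_size) - 1        # low k_bit_size bits set
    return {
        "k_bit_size": k_bit_size,
        "k_byte_size": k_byte_size,
        "k_mask": k_mask,
    }


consts = kmer_constants(5)
print(consts)
```

Under these assumptions, a 5-mer occupies 10 bits, needs 2 bytes, and has a mask of 1023.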

append(kmer: int, base: str) int

Shift k-mer one base and append a new base.

Parameters:
  • kmer – Old k-mer.

  • base – Base to be appended.

Returns:

New k-mer with appended base.
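To illustrate what "shift one base and append" could look like on an integer k-mer, here is a hedged sketch. It assumes the encoding A=0, C=1, G=2, T=3 (the order given by INT_TO_BASE) and that new bases enter at the low-order bits; append_base is a hypothetical stand-alone function, not the KmerUtil method itself.

```python
BASE_TO_INT = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed, mirrors INT_TO_BASE


def append_base(kmer: int, base: str, k_size: int) -> int:
    """Drop the oldest base and append a new one at the low-order end."""
    k_mask = (1 << (2 * k_size)) - 1           # keep only 2 * k_size bits
    return ((kmer << 2) | BASE_TO_INT[base]) & k_mask


acg = 0b000110                 # "ACG" as a 3-mer
cgt = append_base(acg, "T", 3)  # should encode "CGT"
print(bin(cgt))
```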

canonical_complement(kmer: int) int

Get the canonical k-mer of a k-mer.

The canonical k-mer is the lesser of the k-mer and its reverse-complement.

Parameters:

kmer – K-mer.

Returns:

kmer if it is less than the reverse-complement, and the reverse-complement otherwise.
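The rule above can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the library's code: it assumes 2 bits per base with A=0, C=1, G=2, T=3, so that the complement of a base value b is 3 - b. Both function names are hypothetical stand-ins that take k_size explicitly.

```python
def rev_complement(kmer: int, k_size: int) -> int:
    """Reverse-complement an integer k-mer, two bits per base."""
    rc = 0
    for _ in range(k_size):
        rc = (rc << 2) | (3 - (kmer & 3))  # complement the lowest base
        kmer >>= 2                         # move to the next base
    return rc


def canonical(kmer: int, k_size: int) -> int:
    """Return the lesser of a k-mer and its reverse-complement."""
    return min(kmer, rev_complement(kmer, k_size))


print(canonical(0b000110, 3))  # "ACG" versus its reverse-complement "CGT"
print(canonical(0b111111, 3))  # "TTT" versus its reverse-complement "AAA"
```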

minimizer(kmer: int) int

Get the minimizer of a k-mer.

The minimizer of a k-mer is the smallest of all its sub-k-mers and their reverse-complements. This function can only be used if a minimizer size is defined.

Parameters:

kmer – K-mer.

Returns:

Minimizer.
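One way to read the definition above is as a sliding window over the k-mer. The sketch below assumes that sub-k-mers are all contiguous windows of k_min_size bases and that a 2-bit encoding (A=0, C=1, G=2, T=3) is used. It does not model k_min_mask, and the function names are hypothetical.

```python
def rev_complement(kmer: int, k_size: int) -> int:
    """Reverse-complement an integer k-mer, two bits per base."""
    rc = 0
    for _ in range(k_size):
        rc = (rc << 2) | (3 - (kmer & 3))
        kmer >>= 2
    return rc


def minimizer(kmer: int, k_size: int, k_min_size: int) -> int:
    """Smallest sub-k-mer or reverse-complemented sub-k-mer (assumed rule)."""
    sub_mask = (1 << (2 * k_min_size)) - 1
    candidates = []
    # Slide a k_min_size window across the k-mer, one base (2 bits) at a time.
    for shift in range(0, 2 * (k_size - k_min_size) + 1, 2):
        sub = (kmer >> shift) & sub_mask
        candidates.append(min(sub, rev_complement(sub, k_min_size)))
    return min(candidates)


ttacg = 0b1111000110            # "TTACG" as a 5-mer
print(minimizer(ttacg, 5, 3))   # windows: TTA, TAC, ACG
```

In this example the windows and their reverse-complements are TTA/TAA, TAC/GTA, and ACG/CGT; the smallest encoded value is ACG.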

rev_complement(kmer: int) int

Reverse-complement k-mer.

Parameters:

kmer – K-mer.

Returns:

Reverse-complement of kmer.

rev_complement_array(kmer_arr: numpy.ndarray) numpy.ndarray

Reverse-complement k-mers in an array.

Parameters:

kmer_arr – K-mer array.

Returns:

Reverse-complemented k-mers.
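A vectorized reverse-complement can be sketched with numpy bit operations. This is an assumption-laden illustration: it assumes 2 bits per base (A=0, C=1, G=2, T=3), works on np.uint64 regardless of k_size, and takes k_size as an explicit argument, unlike the method documented above.

```python
import numpy as np


def rev_complement_array(kmer_arr: np.ndarray, k_size: int) -> np.ndarray:
    """Reverse-complement every k-mer in an array (element-wise)."""
    remaining = kmer_arr.astype(np.uint64)   # astype copies, input is untouched
    rc = np.zeros_like(remaining)
    two, three = np.uint64(2), np.uint64(3)
    for _ in range(k_size):
        # Complement the lowest base of each k-mer and push it onto rc.
        rc = (rc << two) | (three - (remaining & three))
        remaining = remaining >> two
    return rc


kmers = np.array([0b000110, 0b111111, 0], dtype=np.uint64)  # ACG, TTT, AAA
rc_kmers = rev_complement_array(kmers, 3)
print(rc_kmers)
```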

to_kmer(k_str: str) int

Convert a string to a k-mer.

Parameters:

k_str – K-mer string.

Returns:

K-mer integer.

to_string(kmer: int) str

Translate integer k-mer to a string.

Parameters:

kmer – Integer k-mer.

Returns:

String representation of kmer.
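The two conversions are inverses of each other. A minimal round-trip sketch follows, assuming the first base of the string lands in the most significant bits and that only uppercase A, C, G, and T occur; these stand-alone functions are hypothetical, and to_string takes k_size explicitly here because leading A bases are encoded as zero bits.

```python
INT_TO_BASE = ["A", "C", "G", "T"]
BASE_TO_INT = {base: i for i, base in enumerate(INT_TO_BASE)}


def to_kmer(k_str: str) -> int:
    """Pack a string of bases into an integer, two bits per base."""
    kmer = 0
    for base in k_str:
        kmer = (kmer << 2) | BASE_TO_INT[base]
    return kmer


def to_string(kmer: int, k_size: int) -> str:
    """Unpack an integer k-mer into its string of bases."""
    bases = []
    for _ in range(k_size):
        bases.append(INT_TO_BASE[kmer & 3])  # lowest bits hold the last base
        kmer >>= 2
    return "".join(reversed(bases))


encoded = to_kmer("ACGT")
decoded = to_string(encoded, 4)
print(encoded, decoded)
```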

k_bit_size: int
k_byte_size: int
k_mask: int
k_min_mask: int
k_min_size: int
k_size: int
min_kmer_util: Self | None
minimizer_mask: int
np_int_type: numpy.integer | None
sub_per_kmer: int | None

agglovar.kmer.util.stream(seq: str, kutil: KmerUtil) Iterator[int]

Get an iterator for k-mers in a sequence.

Parameters:
  • seq – String sequence of bases.

  • kutil – K-mer util describing parameters (e.g. k-mer size).

Yields:

K-mers.
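A rolling generator is a common way to produce k-mers like this. The sketch below is an assumption about the general technique, not the library's implementation: it takes k_size directly instead of a KmerUtil, assumes the sequence contains only uppercase A, C, G, and T, and uses the hypothetical name stream_kmers.

```python
from collections.abc import Iterator

BASE_TO_INT = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed encoding


def stream_kmers(seq: str, k_size: int) -> Iterator[int]:
    """Yield each k-mer in seq as an integer, reusing the previous k-mer."""
    k_mask = (1 << (2 * k_size)) - 1
    kmer = 0
    for i, base in enumerate(seq):
        kmer = ((kmer << 2) | BASE_TO_INT[base]) & k_mask
        if i >= k_size - 1:      # the first full k-mer ends at index k_size - 1
            yield kmer


kmers = list(stream_kmers("ACGTA", 3))  # ACG, CGT, GTA
print(kmers)
```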

agglovar.kmer.util.stream_index(seq: str, kutil: KmerUtil) Iterator[tuple[int, int]]

Get an iterator for k-mers in a sequence with their index.

Parameters:
  • seq – String sequence of bases.

  • kutil – K-mer util describing parameters (e.g. k-mer size).

Returns:

Iterator of tuples (kmer, index). Index starts at 0 and increments for each k-mer.
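The (kmer, index) pairing can be sketched by numbering a rolling k-mer generator. As an illustration only, the code below assumes a 2-bit encoding, takes k_size instead of a KmerUtil, and uses the hypothetical name stream_kmers_index; it then groups positions by k-mer, a typical use of the index.

```python
from collections import defaultdict
from collections.abc import Iterator

BASE_TO_INT = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed encoding


def stream_kmers_index(seq: str, k_size: int) -> Iterator[tuple[int, int]]:
    """Yield (kmer, index) pairs; index counts k-mers from 0."""
    k_mask = (1 << (2 * k_size)) - 1
    kmer = 0
    for i, base in enumerate(seq):
        kmer = ((kmer << 2) | BASE_TO_INT[base]) & k_mask
        if i >= k_size - 1:
            yield kmer, i - k_size + 1   # index of this k-mer in the stream


pairs = list(stream_kmers_index("ACACA", 2))  # AC, CA, AC, CA

# Group the indices at which each k-mer occurs.
positions: dict[int, list[int]] = defaultdict(list)
for kmer, index in pairs:
    positions[kmer].append(index)
print(dict(positions))
```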

agglovar.kmer.util.BASE_TO_INT: Mapping[str, int]

Maps base string to 2-bit integer.

agglovar.kmer.util.BYTE_SIZE_TO_NUMPY_UINT: Mapping[int, numpy.integer]

Maps k-mer byte size to the smallest numpy integer type that can store it.

Only defined for sizes up to the maximum numpy unsigned integer size (i.e. 8: np.uint64).
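A mapping of this kind could be built as follows. This is a sketch of the idea and not the module's actual table: the helper smallest_uint and the resulting dictionary are hypothetical, and the choice to cover every byte size from 1 to 8 is an assumption.

```python
import numpy as np


def smallest_uint(byte_size: int):
    """Return the smallest numpy unsigned type holding byte_size bytes."""
    for dtype in (np.uint8, np.uint16, np.uint32, np.uint64):
        if np.dtype(dtype).itemsize >= byte_size:
            return dtype
    return None  # larger than np.uint64


# Assumed shape of the mapping: one entry per byte size from 1 to 8.
byte_size_to_uint = {n: smallest_uint(n) for n in range(1, 9)}
print(byte_size_to_uint[3])
```

Byte sizes that fall between numpy widths round up, so a 3-byte k-mer would be stored in np.uint32 under this scheme.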

agglovar.kmer.util.INT_TO_BASE: list[str] = ['A', 'C', 'G', 'T']

Maps 2-bit integer to base string.

agglovar.kmer.util.NP_MAX_BIT_SIZE

Maximum k-mer bit size for numpy arrays.

agglovar.kmer.util.NP_MAX_BYTE_SIZE

Maximum k-mer byte size for numpy arrays.

agglovar.kmer.util.NP_MAX_KMER_SIZE

Maximum k-mer size for numpy arrays.
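The three maxima are presumably related by simple arithmetic. The values below are assumptions inferred from the np.uint64 limit mentioned above and a 2-bit-per-base encoding; they are not read from the module.

```python
# Assumed values, derived from np.uint64 being the largest unsigned type.
NP_MAX_BYTE_SIZE = 8                        # bytes in np.uint64
NP_MAX_BIT_SIZE = NP_MAX_BYTE_SIZE * 8      # bits available per k-mer
NP_MAX_KMER_SIZE = NP_MAX_BIT_SIZE // 2     # bases, at two bits per base

print(NP_MAX_BYTE_SIZE, NP_MAX_BIT_SIZE, NP_MAX_KMER_SIZE)
```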