agglovar.kmer.util
Basic k-mer manipulation utilities.
Attributes
Maps base string to 2-bit integer.
BYTE_SIZE_TO_NUMPY_UINT – Maps k-mer byte size to the smallest numpy integer type that can store it.
INT_TO_BASE – Maps 2-bit integer to base string.
NP_MAX_BIT_SIZE – Maximum k-mer bit size for numpy arrays.
NP_MAX_BYTE_SIZE – Maximum k-mer byte size for numpy arrays.
NP_MAX_KMER_SIZE – Maximum k-mer size for numpy arrays.
Classes
KmerUtil – Manages basic k-mer functions, such as converting formats, appending bases, and reverse-complementing.
Functions
stream – Get an iterator for k-mers in a sequence.
stream_index – Get an iterator for k-mers in a sequence with their index.
Module Contents
- class agglovar.kmer.util.KmerUtil(k_size: int, k_min_size: int = 0, k_min_mask: int = 0)
Manages basic k-mer functions, such as converting formats, appending bases, and reverse-complementing.
Contains a set of constants specific to the k-mer size and minimizer (if defined).
- Variables:
k_size – K-mer size.
k_bit_size – Number of bits in a k-mer. Minimum size of unsigned integers storing k-mers.
k_byte_size – Size of k-mers in bytes.
k_min_size – Minimizer size, or 0 if a minimizer is not used.
k_min_mask – Minimizer mask, if set and k_min_size is not 0.
k_mask – Mask for k-mer part of integer. Masks out unused bits if an integer has more bits than k_bit_size.
min_kmer_util – K-mer util for minimizer. None if a minimizer is not defined.
minimizer_mask – Mask for extracting minimizers from k-mers. 0 if a minimizer is not defined.
sub_per_kmer – Number of sub-k-mers per k-mer. None if a minimizer is not defined.
np_int_type – Smallest numpy unsigned integer type that can store k-mers of k_size. None if k-mers of k_size do not fit in the largest numpy unsigned integer type (8 bytes, np.uint64).
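As a rough illustration of how the size-related variables above could relate to k_size, here is a minimal sketch. It assumes 2 bits per base (as the 2-bit base mappings in this module suggest) and that the byte size is the bit size rounded up to whole bytes. The function name `kmer_constants` is hypothetical and not part of agglovar.

```python
def kmer_constants(k_size: int) -> dict[str, int]:
    """Derive size constants for a k-mer packed at 2 bits per base (assumed)."""
    k_bit_size = 2 * k_size              # two bits per base
    k_byte_size = (k_bit_size + 7) // 8  # assumption: round up to whole bytes
    k_mask = (1 << k_bit_size) - 1       # keeps only the low k_bit_size bits
    return {
        "k_bit_size": k_bit_size,
        "k_byte_size": k_byte_size,
        "k_mask": k_mask,
    }


consts = kmer_constants(31)
print(consts["k_bit_size"], consts["k_byte_size"], hex(consts["k_mask"]))
```

Under these assumptions, a 31-mer needs 62 bits and therefore fits in an 8-byte unsigned integer.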
- append(kmer: int, base: str) int
Shift k-mer one base and append a new base.
- Parameters:
kmer – Old k-mer.
base – Base to be appended.
- Returns:
New k-mer with appended base.
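The shift-and-append step can be sketched in plain Python. This is not the library's code: it assumes the encoding A=0, C=1, G=2, T=3 (the inverse of INT_TO_BASE) with the oldest base in the most significant bits, and the names `BASE_TO_INT_SKETCH` and `append_base` are hypothetical.

```python
# Hypothetical stand-in for the module's base-to-integer mapping.
BASE_TO_INT_SKETCH = {"A": 0, "C": 1, "G": 2, "T": 3}


def append_base(kmer: int, base: str, k_size: int) -> int:
    """Shift the k-mer left by one base, add the new base, drop the oldest."""
    k_mask = (1 << (2 * k_size)) - 1  # keeps exactly k_size bases
    return ((kmer << 2) | BASE_TO_INT_SKETCH[base]) & k_mask


acg = 0b000110                      # "ACG" with k_size = 3
cgt = append_base(acg, "T", k_size=3)
print(bin(cgt))                     # 0b11011, i.e. "CGT"
```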
- canonical_complement(kmer: int) int
Get the canonical k-mer of a k-mer.
The canonical k-mer is the lesser of the k-mer and its reverse-complement.
- Parameters:
kmer – K-mer.
- Returns:
kmer if it is less than the reverse-complement, and the reverse-complement otherwise.
- minimizer(kmer: int) int
Get the minimizer of a k-mer.
The minimizer of a k-mer is the smallest value among all of its sub-k-mers and their reverse-complements. This function can only be used if a minimizer size is defined.
- Parameters:
kmer – K-mer.
- Returns:
Minimizer.
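The definition above can be sketched as a scan over every window of the minimizer size inside the k-mer. This is an illustrative version only: it assumes the A=0, C=1, G=2, T=3 encoding with 2 bits per base, ignores k_min_mask entirely, and uses hypothetical function names rather than the agglovar implementation.

```python
def rev_complement_sketch(kmer: int, k_size: int) -> int:
    """Reverse-complement a 2-bit packed k-mer (complement of code c is 3 - c)."""
    rc = 0
    for _ in range(k_size):
        rc = (rc << 2) | (3 - (kmer & 3))
        kmer >>= 2
    return rc


def minimizer_sketch(kmer: int, k_size: int, k_min_size: int) -> int:
    """Smallest sub-k-mer (or its reverse-complement) of length k_min_size."""
    sub_mask = (1 << (2 * k_min_size)) - 1
    best = None
    # Slide a k_min_size window across the k-mer, one base (2 bits) at a time.
    for shift in range(0, 2 * (k_size - k_min_size) + 1, 2):
        sub = (kmer >> shift) & sub_mask
        candidate = min(sub, rev_complement_sketch(sub, k_min_size))
        if best is None or candidate < best:
            best = candidate
    return best


acgt = 0b00011011                   # "ACGT" with k_size = 4
print(minimizer_sketch(acgt, k_size=4, k_min_size=2))  # 1, i.e. "AC"
```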
- rev_complement(kmer: int) int
Reverse-complement k-mer.
- Parameters:
kmer – K-mer.
- Returns:
Reverse-complement of kmer.
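For readers unfamiliar with reverse-complementing packed k-mers, a minimal sketch follows. It assumes 2 bits per base in the INT_TO_BASE order (A, C, G, T), so the complement of a base code c is 3 - c. The helper names are hypothetical, and the library may use a faster bit-parallel approach.

```python
INT_TO_BASE_SKETCH = ["A", "C", "G", "T"]  # same ordering as INT_TO_BASE


def rev_complement_sketch(kmer: int, k_size: int) -> int:
    """Read bases from the low end, complement each, and rebuild in reverse."""
    rc = 0
    for _ in range(k_size):
        rc = (rc << 2) | (3 - (kmer & 3))
        kmer >>= 2
    return rc


def kmer_to_string(kmer: int, k_size: int) -> str:
    """Decode a packed k-mer back to a base string, oldest base first."""
    shifts = range(2 * (k_size - 1), -1, -2)
    return "".join(INT_TO_BASE_SKETCH[(kmer >> s) & 3] for s in shifts)


acg = 0b000110                                  # "ACG"
rc = rev_complement_sketch(acg, k_size=3)
print(kmer_to_string(rc, 3))                    # CGT
print(min(acg, rc) == acg)                      # True: "ACG" is the canonical form
```

Taking the minimum of a k-mer and its reverse-complement, as in the last line, corresponds to the canonical k-mer described for canonical_complement.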
- rev_complement_array(kmer_arr: numpy.ndarray) numpy.ndarray
Reverse-complement k-mers in an array.
- Parameters:
kmer_arr – K-mer array.
- Returns:
Reverse-complemented k-mers.
- to_kmer(k_str: str) int
Convert a string to a k-mer.
- Parameters:
k_str – K-mer string.
- Returns:
K-mer integer.
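A string-to-integer conversion of this kind might look like the sketch below. It assumes the A=0, C=1, G=2, T=3 encoding with the first base in the most significant bits and upper-case input containing only those four bases. `encode_kmer` and `BASE_TO_INT_SKETCH` are hypothetical names, not the library's API.

```python
# Hypothetical stand-in for the module's base-to-integer mapping.
BASE_TO_INT_SKETCH = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_kmer(k_str: str) -> int:
    """Pack a base string into an integer, 2 bits per base."""
    kmer = 0
    for base in k_str:
        kmer = (kmer << 2) | BASE_TO_INT_SKETCH[base]
    return kmer


print(encode_kmer("ACGT"))  # 27 (binary 00 01 10 11)
```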
- agglovar.kmer.util.stream(seq: str, kutil: KmerUtil) Iterator[int]
Get an iterator for k-mers in a sequence.
- Parameters:
seq – String sequence of bases.
kutil – K-mer util describing parameters (e.g. k-mer size).
- Yields:
K-mers.
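One common way to produce such an iterator is a rolling update that shifts in one base at a time. The sketch below shows that technique under stated assumptions: 2 bits per base with A=0, C=1, G=2, T=3, input restricted to those four upper-case bases, and a plain k_size argument in place of a KmerUtil object. How agglovar handles other characters is not covered here, and `stream_kmers` is a hypothetical name.

```python
from typing import Iterator

# Hypothetical stand-in for the module's base-to-integer mapping.
BASE_TO_INT_SKETCH = {"A": 0, "C": 1, "G": 2, "T": 3}


def stream_kmers(seq: str, k_size: int) -> Iterator[int]:
    """Yield each k-mer in seq as a packed integer using a rolling update."""
    k_mask = (1 << (2 * k_size)) - 1
    kmer = 0
    loaded = 0  # number of bases shifted in so far
    for base in seq:
        kmer = ((kmer << 2) | BASE_TO_INT_SKETCH[base]) & k_mask
        loaded += 1
        if loaded >= k_size:  # the first k-mer is complete after k_size bases
            yield kmer


kmers = list(stream_kmers("ACGTA", k_size=3))
print(kmers)  # [6, 27, 44] for "ACG", "CGT", "GTA"
```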
- agglovar.kmer.util.stream_index(seq: str, kutil: KmerUtil) Iterator[tuple[int, int]]
Get an iterator for k-mers in a sequence with their index.
- Parameters:
seq – String sequence of bases.
kutil – K-mer util describing parameters (e.g. k-mer size).
- Returns:
Iterator of tuples (kmer, index). Index starts at 0 and increments for each k-mer.
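The (kmer, index) pairing can be illustrated with a simple sketch that re-encodes each window from scratch, which is easier to follow than a rolling update though slower. It assumes the A=0, C=1, G=2, T=3 encoding and takes a plain k_size instead of a KmerUtil. `stream_kmers_indexed` is a hypothetical name.

```python
from typing import Iterator

# Hypothetical stand-in for the module's base-to-integer mapping.
BASE_TO_INT_SKETCH = {"A": 0, "C": 1, "G": 2, "T": 3}


def stream_kmers_indexed(seq: str, k_size: int) -> Iterator[tuple[int, int]]:
    """Yield (kmer, index) pairs; index counts k-mers starting at 0."""
    for index in range(len(seq) - k_size + 1):
        kmer = 0
        for base in seq[index:index + k_size]:
            kmer = (kmer << 2) | BASE_TO_INT_SKETCH[base]
        yield kmer, index


# Example use: record the first position at which each k-mer occurs.
first_seen: dict[int, int] = {}
for kmer, index in stream_kmers_indexed("ACGACG", k_size=3):
    first_seen.setdefault(kmer, index)
print(first_seen)
```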
- agglovar.kmer.util.BYTE_SIZE_TO_NUMPY_UINT: Mapping[int, numpy.integer]
Maps k-mer byte size to the smallest numpy integer type that can store it.
Only defined for sizes up to the maximum numpy unsigned integer size (8 bytes, np.uint64).
- agglovar.kmer.util.INT_TO_BASE: list[str] = ['A', 'C', 'G', 'T']
Maps 2-bit integer to base string.
- agglovar.kmer.util.NP_MAX_BIT_SIZE
Maximum k-mer bit size for numpy arrays.
- agglovar.kmer.util.NP_MAX_BYTE_SIZE
Maximum k-mer byte size for numpy arrays.
- agglovar.kmer.util.NP_MAX_KMER_SIZE
Maximum k-mer size for numpy arrays.