agglovar.seqmatch
Fast Smith-Waterman implementation.
Classes
MatchScoreModel | An engine for scoring alignments.
Module Contents
- class agglovar.seqmatch.MatchScoreModel
An engine for scoring alignments.
Intended for determining sequence similarity through an alignment where the alignment itself is not important.
- Parameters:
match – Base match score (> 0.0).
mismatch – Base mismatch score (< 0.0).
gap_open – Gap open cost (<= 0.0).
gap_extend – Gap extend cost (<= 0.0).
map_limit – Maximum sequence size before falling back to Jaccard index in match_prop().
jaccard_kmer – Jaccard k-mer size for comparisons falling back to Jaccard index in match_prop().
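The Jaccard fallback controlled by map_limit and jaccard_kmer can be illustrated with a minimal sketch. This is not the library's code: the function name jaccard_kmer_index is hypothetical, and it assumes a set-based Jaccard index over all overlapping k-mers, which the documentation above does not confirm.

```python
def jaccard_kmer_index(seq_a: str, seq_b: str, k: int) -> float:
    """Set-based Jaccard index over k-mers (hypothetical helper, not agglovar API)."""
    if not seq_a or not seq_b:
        return 0.0

    # Collect every overlapping k-mer in each sequence.
    kmers_a = {seq_a[i:i + k] for i in range(len(seq_a) - k + 1)}
    kmers_b = {seq_b[i:i + k] for i in range(len(seq_b) - k + 1)}

    union = kmers_a | kmers_b

    # Both sequences are shorter than k: nothing to compare.
    if not union:
        return 0.0

    # Shared k-mers divided by all distinct k-mers.
    return len(kmers_a & kmers_b) / len(union)
```

For example, jaccard_kmer_index("ACGT", "ACGA", k=2) compares {AC, CG, GT} with {AC, CG, GA}: two shared k-mers out of four distinct ones. Unlike an alignment, the cost grows roughly linearly with sequence length, which is presumably why it is used for large sequences.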
- match_prop(seq_a: str, seq_b: str) float
Get the alignment score proportion over the max possible score between two sequences.
To allow tandem duplications to map correctly, seq_b is duplicated head-to-tail (seq_b + seq_b) and seq_a is aligned to it.
The max possible score is achieved if seq_a and seq_b are the same size and seq_a aligns to seq_b + seq_b with all seq_a bases matching (function returns 1.0).
The numerator is the alignment score, capped at the size of the smaller sequence times the match score, and the denominator is the size of the larger sequence times the match score:
min( score_align(seq_a, seq_b + seq_b), min_len(seq_a, seq_b) * match ) / ( max_len(seq_a, seq_b) * match )
- Parameters:
seq_a – Subject sequence.
seq_b – Query sequence.
- Returns:
Alignment proportion with seq_b duplicated head-to-tail. If either sequence is None or empty, returns 0.0.
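The proportion formula for match_prop() can be sketched in a few lines. This is an illustration under stated assumptions, not the library's implementation: match_prop_sketch and exact_substring_score are hypothetical names, the default match=2.0 is an arbitrary value, the scorer is passed in as a callable, and the map_limit fallback to the Jaccard index is left out.

```python
from typing import Callable, Optional


def match_prop_sketch(
    seq_a: Optional[str],
    seq_b: Optional[str],
    score_align: Callable[[str, str], float],
    match: float = 2.0,  # assumed match score, for illustration only
) -> float:
    """Alignment score proportion with seq_b duplicated head-to-tail."""
    # None or empty sequences yield 0.0, as documented above.
    if not seq_a or not seq_b:
        return 0.0

    min_len, max_len = sorted((len(seq_a), len(seq_b)))

    # Align seq_a against seq_b + seq_b so rotated tandem copies still match.
    score = score_align(seq_a, seq_b + seq_b)

    # Cap the numerator, then normalize by the larger sequence.
    return min(score, min_len * match) / (max_len * match)


def exact_substring_score(seq_a: str, seq_b: str, match: float = 2.0) -> float:
    """Toy scorer (hypothetical): full score only if seq_a occurs in seq_b."""
    return match * len(seq_a) if seq_a in seq_b else 0.0
```

With the toy scorer, match_prop_sketch("ACGT", "GTAC", exact_substring_score) evaluates to 1.0 because "ACGT" occurs inside "GTACGTAC", which is the rotation case the head-to-tail duplication is meant to handle. A real scorer would also give partial credit to inexact matches.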
- score_align(seq_a: str, seq_b: str) float
Get max score aligning two sequences.
Only returns the score, not the alignment.
This is a legacy native Python implementation used by SV-Pop. It is accurate, but extremely slow.
- Parameters:
seq_a – Subject sequence.
seq_b – Query sequence.
- Returns:
Maximum alignment score.
- score_align_native(seq_a: str, seq_b: str) float
Get max score aligning two sequences.
Only returns the score, not the alignment.
This is a legacy native Python implementation used by SV-Pop. It is validated, but extremely slow.
- Parameters:
seq_a – Subject sequence.
seq_b – Query sequence.
- Returns:
Maximum alignment score.
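A score-only local alignment in plain Python can be sketched as follows. This is not the code behind score_align_native(); it is a generic Smith-Waterman recurrence with affine gaps (Gotoh's formulation) that keeps only two rows in memory. The name local_affine_score and the default scores are hypothetical, and the sketch assumes a gap of length k costs gap_open + k * gap_extend, which may differ from the library's convention.

```python
def local_affine_score(
    seq_a: str,
    seq_b: str,
    match: float = 2.0,       # assumed defaults, for illustration only
    mismatch: float = -1.0,
    gap_open: float = -5.0,
    gap_extend: float = -0.5,
) -> float:
    """Best local alignment score with affine gaps; the alignment is not kept."""
    if not seq_a or not seq_b:
        return 0.0

    neg_inf = float("-inf")
    n = len(seq_b)

    h_prev = [0.0] * (n + 1)      # best score ending at (i - 1, j)
    f_prev = [neg_inf] * (n + 1)  # same, but ending in a gap that skips seq_a bases
    best = 0.0

    for base_a in seq_a:
        h_curr = [0.0] * (n + 1)
        f_curr = [neg_inf] * (n + 1)
        e = neg_inf  # best score ending in a gap that skips seq_b bases

        for j in range(1, n + 1):
            # Open a new gap or extend an existing one, in each direction.
            e = max(h_curr[j - 1] + gap_open + gap_extend, e + gap_extend)
            f_curr[j] = max(h_prev[j] + gap_open + gap_extend, f_prev[j] + gap_extend)

            # Align the two bases against each other.
            diag = h_prev[j - 1] + (match if base_a == seq_b[j - 1] else mismatch)

            # Local alignment: a cell never drops below zero.
            h_curr[j] = max(0.0, diag, e, f_curr[j])
            best = max(best, h_curr[j])

        h_prev, f_prev = h_curr, f_curr

    return best
```

The nested loops visit every pair of positions, so runtime grows with len(seq_a) * len(seq_b). That is consistent with the note above that a native Python implementation is extremely slow on long sequences.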
- score_model: agglovar.align.score.AffineScoreModel = None