agglovar.seqmatch

Fast Smith-Waterman implementation.

Classes

MatchScoreModel

An engine for scoring alignments.

Module Contents

class agglovar.seqmatch.MatchScoreModel

An engine for scoring alignments.

Intended for determining sequence similarity through an alignment where the alignment itself is not important.

Parameters:
  • match – Base match score (> 0.0).

  • mismatch – Base mismatch score (< 0.0).

  • gap_open – Gap open cost (<= 0.0).

  • gap_extend – Gap extend cost (<= 0.0).

  • map_limit – Maximum sequence size before falling back to Jaccard index in match_prop().

  • jaccard_kmer – Jaccard k-mer size for comparisons falling back to Jaccard index in match_prop().
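The map_limit and jaccard_kmer parameters describe a fallback from alignment to a k-mer Jaccard index for large sequences. Below is a minimal sketch of that kind of comparison. It assumes that plain k-mer sets are compared with no reverse complement and no case folding. It also assumes that the hypothetical uses_jaccard_fallback() check triggers when the longer sequence exceeds map_limit, with None disabling the fallback. agglovar's exact rule is not stated here and may differ.

```python
from __future__ import annotations


def kmer_set(seq: str, k: int) -> set[str]:
    """Return the set of all k-mers (length-k substrings) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_index(seq_a: str, seq_b: str, k: int = 9) -> float:
    """Jaccard index |A & B| / |A | B| over the k-mer sets of two sequences.

    Assumption: sequences are compared as given (no reverse complement,
    no case folding). Returns 0.0 if neither sequence contains a k-mer.
    """
    kmers_a = kmer_set(seq_a, k)
    kmers_b = kmer_set(seq_b, k)
    union = kmers_a | kmers_b
    if not union:
        return 0.0
    return len(kmers_a & kmers_b) / len(union)


def uses_jaccard_fallback(seq_a: str, seq_b: str, map_limit: int | None = 5000) -> bool:
    """Hypothetical check: fall back when the longer sequence exceeds map_limit."""
    return map_limit is not None and max(len(seq_a), len(seq_b)) > map_limit
```

For example, with k = 2, "ACGT" has k-mers {AC, CG, GT} and "ACGA" has {AC, CG, GA}, so the index is 2 / 4 = 0.5. Set-based comparison runs in roughly linear time, which is why it is a practical substitute when a quadratic alignment would be too slow.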

__post_init__() → None

Post-initialization.

match_prop(seq_a: str, seq_b: str) → float

Get the ratio of the alignment score to the maximum possible score between two sequences.

To allow tandem duplications to map correctly, seq_b is duplicated head-to-tail (seq_b + seq_b) and seq_a is aligned to it.

The maximum possible score is reached when seq_a and seq_b have the same length and seq_a aligns to seq_b + seq_b with every base of seq_a matching, in which case the function returns 1.0.
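Why the head-to-tail duplication helps: a tandem duplication of a repeat unit such as CAG may be reported as CAGCAG by one caller and AGCAGC by another, depending on where the breakpoint is placed. Every cyclic rotation of seq_b is a substring of seq_b + seq_b, so a rotated copy can still align end to end. The helper names below are illustrative and are not part of agglovar.

```python
def rotations(seq: str) -> list[str]:
    """All cyclic rotations of seq (seq[i:] + seq[:i])."""
    return [seq[i:] + seq[:i] for i in range(len(seq))]


def is_rotation(seq_a: str, seq_b: str) -> bool:
    """True if seq_a is a cyclic rotation of seq_b.

    Any rotation of seq_b appears inside seq_b + seq_b, which is why aligning
    seq_a against the doubled sequence can reach a perfect score.
    """
    return len(seq_a) == len(seq_b) and seq_a in seq_b + seq_b
```

For instance, is_rotation("AGCAGC", "CAGCAG") is True because "AGCAGC" occurs at offset 1 of "CAGCAGCAGCAG".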

The numerator is the alignment score, capped at the length of the shorter sequence times the match score, and the denominator is the length of the longer sequence times the match score:

min(
    score_align(seq_a, seq_b + seq_b),
    min(len(seq_a), len(seq_b)) * match
) / (
    max(len(seq_a), len(seq_b)) * match
)

Parameters:
  • seq_a – Subject sequence.

  • seq_b – Query sequence.

Returns:

Alignment proportion with seq_b duplicated head-to-tail. Returns 0.0 if either sequence is None or empty.
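The formula above can be sketched as a standalone function. In this sketch, score_align is passed in as any callable that returns a best local alignment score, such as a Smith-Waterman scorer. The map_limit/Jaccard fallback is omitted. match_prop_sketch is a hypothetical name, not the library method.

```python
from __future__ import annotations

from collections.abc import Callable


def match_prop_sketch(
    seq_a: str | None,
    seq_b: str | None,
    score_align: Callable[[str, str], float],
    match: float,
) -> float:
    """Score proportion following the formula documented for match_prop().

    score_align: assumed function returning the best local alignment score
    of its two arguments. The Jaccard fallback is not modeled here.
    """
    if not seq_a or not seq_b:
        return 0.0  # None or empty input

    # Align seq_a to seq_b duplicated head-to-tail so rotated copies can match.
    score = score_align(seq_a, seq_b + seq_b)

    # Cap the numerator at a perfect match over the shorter sequence.
    numerator = min(score, min(len(seq_a), len(seq_b)) * match)
    denominator = max(len(seq_a), len(seq_b)) * match
    return numerator / denominator
```

With match = 2.0, a 4 bp seq_a that matches perfectly inside an 8 bp seq_b scores 8 / 16 = 0.5. The cap prevents the proportion from exceeding 1.0 when seq_a is longer than seq_b and aligns across both copies.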

score_align(seq_a: str, seq_b: str) → float

Get the maximum score from aligning two sequences.

Only returns the score, not the alignment.

This is a legacy native Python implementation used by SV-Pop. It is accurate, but extremely slow.

Parameters:
  • seq_a – Subject sequence.

  • seq_b – Query sequence.

Returns:

Maximum alignment score.
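The method returns only the maximum alignment score, not the alignment. The sketch below computes that kind of score with Smith-Waterman local alignment and affine gaps, using Gotoh's three-state recurrences. It assumes a gap of length L costs gap_open + L * gap_extend. agglovar's exact gap convention is not stated here, and the default values are illustrative rather than the library's. Because no traceback is needed, only two rows of each matrix are kept, so memory is O(len(seq_b)).

```python
from __future__ import annotations

NEG_INF = float("-inf")


def sw_affine_score(
    seq_a: str,
    seq_b: str,
    match: float = 2.0,
    mismatch: float = -1.0,
    gap_open: float = -5.0,
    gap_extend: float = -0.5,
) -> float:
    """Best local (Smith-Waterman) alignment score with affine gaps (Gotoh).

    Assumed convention: a gap of length L costs gap_open + L * gap_extend.
    Only the score is computed; no traceback is stored.
    """
    n_b = len(seq_b)
    h_prev = [0.0] * (n_b + 1)      # H, previous row: best score ending at (i-1, j)
    f_prev = [NEG_INF] * (n_b + 1)  # F, previous row: ending with a gap in seq_b (vertical move)
    best = 0.0

    for i in range(1, len(seq_a) + 1):
        h_cur = [0.0] * (n_b + 1)
        f_cur = [NEG_INF] * (n_b + 1)
        e = NEG_INF  # E: ending with a gap in seq_a (horizontal move), carried along the row
        for j in range(1, n_b + 1):
            # Open a new gap from H or extend an existing one.
            e = max(h_cur[j - 1] + gap_open + gap_extend, e + gap_extend)
            f_cur[j] = max(h_prev[j] + gap_open + gap_extend, f_prev[j] + gap_extend)
            diag = h_prev[j - 1] + (match if seq_a[i - 1] == seq_b[j - 1] else mismatch)
            # Local alignment: scores never drop below 0 (start a new alignment).
            h_cur[j] = max(0.0, diag, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur

    return best
```

For example, aligning "AC" to "AGC" with gap_open = -1.0 and gap_extend = -0.5 gives 2 + (-1.5) + 2 = 2.5 through a one-base gap. With the default gap_open of -5.0, the gap costs more than it gains, and the best score is a single match, 2.0.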

score_align_native(seq_a: str, seq_b: str) → float

Get the maximum score from aligning two sequences.

Only returns the score, not the alignment.

This is a legacy native Python implementation used by SV-Pop. It is validated, but extremely slow.

Parameters:
  • seq_a – Subject sequence.

  • seq_b – Query sequence.

Returns:

Maximum alignment score.

affine_gap: float
jaccard_kmer: int = 9
map_limit: int | None = 5000
match: float
mismatch: float
rotate_min: int = 3
score_model: agglovar.align.score.AffineScoreModel = None