Detailed explanation of the different measures of sequence conservation used in this database


The oligonucleotides were ranked using three measures of sequence conservation, which are combined into a single ranking score:

- Percentage of identical sites (PIS)
The PIS is the number of identical positions in the alignment region covered by the oligonucleotide, divided by the oligonucleotide length and expressed as a percentage.

- Percentage of identical sites in the last five nucleotides at the 3' end of the oligonucleotide (3'PIS)
The 3'PIS is calculated in the same way as the PIS, but considering only the last five nucleotides of the oligonucleotide.

- Percentage of pairwise identity (PPI)
The PPI is calculated by counting the pairwise matches between sequences at each alignment position covered by the oligonucleotide, summing them over those positions, and dividing the sum by the total number of pairwise comparisons.

- EbolaID score
The ranking score ('EbolaID score') is the mean of the three measures (PIS, 3'PIS and PPI).
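
The definitions above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used to build the database: the function names and the toy alignment are hypothetical, gaps and ambiguous bases are treated as ordinary characters (the text above does not say how they are handled), and the alignment region is assumed to be written in the 5' to 3' orientation of the oligonucleotide so that the last five columns correspond to its 3' end.

```python
from itertools import combinations


def pis(columns):
    """Percentage of alignment columns in which all sequences are identical."""
    identical = sum(1 for col in columns if len(set(col)) == 1)
    return 100.0 * identical / len(columns)


def three_prime_pis(columns, tail=5):
    """PIS restricted to the last `tail` columns (assumed to be the 3' end)."""
    return pis(columns[-tail:])


def ppi(columns):
    """Pairwise matches summed over all columns, divided by all comparisons."""
    matches = 0
    total = 0
    for col in columns:
        for a, b in combinations(col, 2):  # every unordered pair of sequences
            total += 1
            matches += (a == b)
    return 100.0 * matches / total


def ebolaid_score(columns):
    """Mean of PIS, 3'PIS and PPI."""
    return (pis(columns) + three_prime_pis(columns) + ppi(columns)) / 3.0


# Hypothetical toy alignment: 4 sequences, 6 positions under the oligonucleotide.
sequences = ["ACGTAC", "ACGTAC", "ACGTTC", "ACCTAC"]
columns = ["".join(col) for col in zip(*sequences)]  # one string per position

print(round(pis(columns), 2))              # 4 of 6 columns are identical
print(round(three_prime_pis(columns), 2))  # 3 of the last 5 columns are identical
print(round(ppi(columns), 2))              # 30 matches out of 36 comparisons
print(round(ebolaid_score(columns), 2))
```

Working on columns rather than on whole sequences keeps the three measures independent of how the alignment is stored: any slice of the alignment that corresponds to the oligonucleotide can be passed in.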



Example

The three measures and the EbolaID score described above are calculated as follows for the EBOLAID001 oligonucleotide (25 nucleotides) using the EbolaIDalig3 alignment (124 sequences):


PIS: (14 identical sites/20 sites)*100 = 70.0%


3'PIS: (4 identical sites/5 sites)*100 = 80.0%


PPI: (144384 pairwise matches/152520 total pairwise comparisons)*100 = 94.6%


EbolaID score: (70.0 + 80.0 + 94.6)/3 = 81.53
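
The arithmetic of this example can be checked with a short Python sketch. It assumes, as an inference from the numbers rather than something stated on this page, that the total number of pairwise comparisons is the number of unordered sequence pairs (124 choose 2) multiplied by the 20 sites used in the PIS line. The variable names are hypothetical.

```python
from math import comb

n_sequences = 124   # sequences in the alignment, as stated in the example
n_sites = 20        # sites used in the PIS line of the example (assumed to apply to PPI too)

pairs_per_site = comb(n_sequences, 2)          # unordered pairs of sequences: 7626
total_comparisons = pairs_per_site * n_sites   # 152520, as quoted in the example

pis = 100 * 14 / 20                            # identical sites / sites
three_prime_pis = 100 * 4 / 5                  # identical sites in the last five / 5
ppi = 100 * 144384 / total_comparisons         # pairwise matches / comparisons

# The score as written in the example, using the PPI value reported there (94.6).
reported_score = (70.0 + 80.0 + 94.6) / 3

print(pairs_per_site, total_comparisons)
print(pis, three_prime_pis, ppi)
print(round(reported_score, 2))
```

The unrounded PPI from these counts is slightly above the 94.6 quoted in the example, so a score recomputed from the raw counts may differ from the reported one in the second decimal place.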