Detailed explanation of the different measures of sequence conservation used in this database
The oligonucleotides were ranked using
three main measures of sequence conservation and a combined score derived from them:
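- Percentage of identical sites (PIS): The PIS is calculated by dividing the number of identical positions in the alignment
region covered by an oligonucleotide (positions at which all aligned sequences carry the same nucleotide) by the
oligonucleotide's length, and expressing the result as a percentage.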
- Percentage of identical sites in the last five nucleotides at the 3' end of the oligonucleotide (3'PIS):
The 3'PIS is calculated in the same way as the PIS, but considering only the last five nucleotides of the oligonucleotide.
- Percentage of pairwise identity (PPI): The PPI is calculated by counting the pairwise matches between sequences across the
positions of the alignment where the oligonucleotide is located. This count is then divided by the total number of pairwise
comparisons and expressed as a percentage.
- EbolaID score: The ranking score ('EbolaID score') is based on the mean value of the three measures (PIS, 3'PIS and PPI).
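The definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used to build the database: the function names (pis, three_prime_pis, ppi, ebolaid_score) are hypothetical, the input is assumed to be the slice of the alignment covered by the oligonucleotide (one equal-length string per sequence, written 5' to 3' in the orientation of the oligonucleotide), gap characters are treated like any other character, and the score is taken to be the plain mean of the three measures.

```python
from collections import Counter


def pis(region):
    """Percentage of alignment columns in which all sequences are identical."""
    columns = list(zip(*region))
    identical = sum(1 for column in columns if len(set(column)) == 1)
    return 100 * identical / len(columns)


def three_prime_pis(region, window=5):
    """PIS restricted to the last `window` columns (the 3' end)."""
    return pis([sequence[-window:] for sequence in region])


def ppi(region):
    """Matching sequence pairs over all columns, divided by all pairwise comparisons."""
    columns = list(zip(*region))
    n = len(region)
    comparisons = len(columns) * n * (n - 1) // 2
    matches = 0
    for column in columns:
        # k copies of the same character give k*(k-1)/2 matching pairs.
        matches += sum(k * (k - 1) // 2 for k in Counter(column).values())
    return 100 * matches / comparisons


def ebolaid_score(region):
    """Assumption: the score is the unweighted mean of PIS, 3'PIS and PPI."""
    return (pis(region) + three_prime_pis(region) + ppi(region)) / 3


# Toy alignment region (made-up data): 4 sequences, 8 positions.
toy_region = ["ACGTACGT", "ACGTACGT", "ACGTACGA", "ACCTACGT"]
print(pis(toy_region), three_prime_pis(toy_region), ppi(toy_region))
# prints: 75.0 80.0 87.5
```

In the toy data, 6 of 8 columns are fully identical (PIS 75%), 4 of the last 5 are (3'PIS 80%), and 42 of the 48 pairwise comparisons match (PPI 87.5%).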
Example
The three measures described above are calculated as follows for the EBOLAID001 oligonucleotide (25 nucleotides) using the EbolaIDalig3 alignment (124 sequences):
PIS: (14 identical sites/20 sites)*100 = 70.0%
3'PIS: (4 identical sites/5 sites)*100 = 80.0%
PPI: (144384 pairwise matches/152520 total pairwise comparisons)*100 = 94.7%
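The arithmetic of this example can be reproduced with a short Python check. It is a sketch that uses only the counts quoted above. The variable names are illustrative, and the final mean is an assumption about how the EbolaID score combines the three measures, since the example itself does not report a score. Note that 124 sequences give 7626 sequence pairs per position, so 152520 comparisons correspond to 20 alignment positions.

```python
from math import comb

n_sequences = 124
pairs_per_position = comb(n_sequences, 2)   # 7626 sequence pairs
positions = 152520 // pairs_per_position    # 20 positions compared

pis = 100 * 14 / 20                 # 70.0
three_prime_pis = 100 * 4 / 5       # 80.0
ppi = 100 * 144384 / 152520         # about 94.67

# Assumption: the score is the unweighted mean of the three measures.
score = (pis + three_prime_pis + ppi) / 3

print(pairs_per_position, positions)
print(round(pis, 1), round(three_prime_pis, 1), round(ppi, 2), round(score, 2))
```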