A Simple Implementation of Nussinov Algorithm

The Nussinov algorithm is a dynamic programming-based nucleic acid structure prediction algorithm used in computational biology to predict the folding of an RNA molecule. It calculates the best structure for small subsequences, and works its way outwards to larger and larger subsequences. The Nussinov calculation is recursive. The key idea of the recursive calculation is that there are only four possible ways of getting the best structure for i, j from the best structures of the smaller subsequences.

And here is an easy solution to this type of problem: Consider recursion first, look for duplicate calculations, then eliminate repeated computations using extra memory and other techniques. Determine whether recursion may be eliminated by changing the order of computation to compute directly. If possible, it is preferable to complete the task.

import numpy as np

# seq = "GUACGUGUGCGU"
seq = "AAACUUUCCCAGGG"
seq = "UUCCUUUGGGCCCUGUUGGGGCCCAAAGGGGUUCUUCAAAACGCGCUGCCCUUCUUCUGGGCAGCGCGUUUU"

scores = {'A': {'A': 0, 'G': 0, 'C': 0, 'U': 1},
          'G': {'A': 0, 'G': 0, 'C': 1, 'U': 1},
          'C': {'A': 0, 'G': 1, 'C': 0, 'U': 0},
          'U': {'A': 1, 'G': 1, 'C': 0, 'U': 0}}

N = len(seq)
DP = np.zeros((N, N))
minimum_loop_size = 3

for n in range(minimum_loop_size + 1, N):
    for i in range(0, N - n):
        j = i + n       # diag pos
        DP[i][j] = max(DP[i][j - 1], DP[i + 1][j], DP[i + 1][j - 1] + scores[seq[i]][seq[j]])

        for k in range(i + 1 + minimum_loop_size, j - 1 - minimum_loop_size):
            DP[i][j] = max(DP[i][j], DP[i][k] + DP[k + 1][j])

print(DP)

ans = []
result = ["."] * N
def backtrack(i, j):
    if i >= j:
        return
    if DP[i][j] == DP[i][j - 1]:
        backtrack(i, j - 1)
    elif DP[i][j] == DP[i + 1][j]:
        backtrack(i + 1, j)
    elif DP[i][j] == DP[i + 1][j - 1] + scores[seq[i]][seq[j]]:
        ans.append((i, j))
        result[i] = "("
        result[j] = ")"
        backtrack(i + 1, j - 1)
    else:
        for k in range(i + 1 + minimum_loop_size, j - 1 - minimum_loop_size):
            if DP[i][j] == DP[i][k] + DP[k + 1][j]:
                ans.append((i, k))
                result[i] = "("
                result[j] = ")"

                backtrack(i, k)
                backtrack(k + 1, j)

backtrack(0, N - 1)
print(ans)
print(''.join(result))

 

 

Add a Comment

Your email address will not be published. Required fields are marked *