An introductory exercise to aligning amino acid sequences is the Manhattan tourist problem. Suppose a tourist starts his route in the top left corner of the map and wants to visit as many attractions on the way as possible, and finish in the down right corner. There is a restriction however - he can only move either to the right or down.

A Simplified Map of Manhattan

To describe the problem in numbers and data structures, the map is converted to the directed grid shown below. A tourist can move along the edges of the grid and the score of 1 is added if he passes an attraction, which are assigned to the bold edges. Of all possible paths we are interested in the one with the highest score.

Map of Manhattan as a Graph

The solution to this problem is equivalent to a more general problem. This time every "street", or the edge of the graph, is assigned a score. The goal is to find a path which gains the highest score.

Generic Manhattan Problem

The problem can be brute forced, of course, by calculating the score for all possible paths. This is not practical for the larger graphs of course. A better approach is to calculate the highest possible score at every intersection, starting from the top left corner. Here is how the logic goes: If the tourist goes only towards the left in the grid, or only down, there are no choices for him. So it is easy to calculate the highest scores for the top and left row of intersections. Now we have these scores calculated:

Generic Manhattan Problem - First Row and Column Solved

Now when that's done, we can proceed to the next row. How can the tourist get to intersection (1, 1)? He can either go to (0, 1) first, and gain 3 + 0 = 3 score, or go to (1, 0) and gain 1 + 3 = 4 score. Therefore, the maximum score he can gain at (1, 1) is 4.

Generic Manhattan Problem - Cell (1,1) Solved

The generic formula to use is quite intuitive: To reach an intersection, you have to arrive from one of two possible previous locations, and traverse a corresponding edge. You choose the path where the sum of the score of a previous location, plus the score of the edge, is higher.

Generic Manhattan Formula

We continue calculations along the column all the way down. After that, we can move on to the next row.

Generic Manhattan Problem - Second Column Solved

Following this logic, it is easy to compute the maximum score for each intersection and find out the maximum score that can be gained while traversing Manhattan.

Generic Manhattan Problem - Fully Solved

Along the way the algorithm also becomes clear: the maximum score at any intersection is the maximum of the following two values:

MANHATTANTOURIST(n, m, down, right) s0, 0 ← 0 for i ← 1 to n si, 0 ← si-1, 0 + downi, 0 for j ← 1 to m s0, j ← s0, j−1 + right0, j for i ← 1 to n for j ← 1 to m si, j ← max{si - 1, j + downi, j, si, j - 1 + righti, j} return sn, m

The following C# function implements the algorithm and returns the highest possible score, assuming we input all the scores for the edges towards the right and all the edges pointing down in the form of two-dimension arrays.

public static int ManhattanProblem(int[,] RightMatrix, int[,] DownMatrix) { int n = RightMatrix.GetLength(0) + 1; int m = DownMatrix.GetLength(1) + 1; int[,] ManhattanMatrix = new int[n, m]; ManhattanMatrix[0, 0] = 0; for (int i = 1; i <= n; i++) { ManhattanMatrix[i, 0] = ManhattanMatrix[i - 1, 0] + DownMatrix[i - 1, 0]; } for (int j = 1; j <= m; j++) { ManhattanMatrix[0, j] = ManhattanMatrix[0, j - 1] + RightMatrix[0, j - 1]; } for (int i = 1; i <= n; i++) { for (int j = 1; j <= m; j++) { ManhattanMatrix[i, j] = Math.Max(ManhattanMatrix[i - 1, j] + DownMatrix[i - 1, j], ManhattanMatrix[i, j - 1] + RightMatrix[i, j - 1]); } } return ManhattanMatrix.Cast<int>().Max(); }

Understanding this problem turns out to be important to understanding sequence alignment in bioinformatics.

by Evgeny. Also posted on my website
## No comments:

Post a Comment