Thursday, January 29, 2009

Homology correction of co-occurrence and text-mining scores (updated)

[Update 29/01/2009: The homology correction is also applied to the text-mining channel starting with STRING 8.]

To prevent gene duplications from leading to spurious functional associations, homologous proteins are down-weighted in the co-occurrence and text-mining channels. You will notice this on the score summary page of a link, and in our SQL dumps if you use them.

Here's an example: The co-occurrence view looks fine for this pair of proteins.
However, the total score of 0.204 is less than the co-occurrence score:

The reason for this is that the proteins have some sequence similarity and are therefore down-weighted according to this formula:

effective co-occurrence score = co-occurrence score * (1 - homology score)

(The homology score is calculated from the bit score of the alignment.) In this case:

0.204 = 0.478 * ( 1 - 0.572 )
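
For readers working with the SQL dumps, the correction can be reproduced with a few lines of Python. This is only a minimal sketch of the formula above: the function name `effective_score` and the range check are assumptions made for illustration, not part of STRING, and the derivation of the homology score from the alignment bit score is not shown.

```python
def effective_score(raw_score: float, homology_score: float) -> float:
    """Down-weight a channel score by the homology score of the protein pair.

    Hypothetical helper: implements
    effective score = raw score * (1 - homology score).
    Assumes both inputs are already on a 0..1 scale.
    """
    if not (0.0 <= raw_score <= 1.0 and 0.0 <= homology_score <= 1.0):
        raise ValueError("scores are expected to lie in [0, 1]")
    return raw_score * (1.0 - homology_score)


# The pair from the example: co-occurrence 0.478, homology 0.572.
example = effective_score(0.478, 0.572)
print(f"{example:.4f}")
```

With the numbers from the example, the result is close to the 0.204 shown on the score summary page. A homology score of 0 leaves the channel score unchanged, and a homology score of 1 removes it entirely.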
