Thursday, January 29, 2009

Homology correction of co-occurrence and text-mining scores (updated)

[Update 29/01/2009: The homology correction is also applied to the text-mining channel starting with STRING 8.]

To prevent gene duplications from leading to spurious functional associations, homologous proteins are down-weighted in the co-occurrence and text-mining channels. You will notice this on the score summary page of a link and in our SQL dumps.

Here's an example: The co-occurrence view looks fine for this pair of proteins.
However, the total score of 0.204 is less than the co-occurrence score:

The reason for this is that the proteins have some sequence similarity and are therefore down-weighted according to this formula:

effective co-occurrence score = co-occurrence score * (1 - homology score)

(The homology score is calculated from the bit score of the alignment.) In this case:

0.204 = 0.478 * ( 1 - 0.572 )
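As a minimal sketch, the correction above can be written as a small Python function. The function name and the range check are assumptions made for illustration; only the formula itself comes from this post, and the way the homology score is derived from the bit score is not shown here.

```python
def effective_score(raw_score: float, homology_score: float) -> float:
    """Down-weight a channel score by the homology score of the protein pair.

    Implements: effective score = raw score * (1 - homology score).
    Both inputs are assumed to lie between 0 and 1 (an assumption of this sketch).
    """
    if not 0.0 <= raw_score <= 1.0:
        raise ValueError("raw_score must be between 0 and 1")
    if not 0.0 <= homology_score <= 1.0:
        raise ValueError("homology_score must be between 0 and 1")
    return raw_score * (1.0 - homology_score)


if __name__ == "__main__":
    # The example from this post: co-occurrence 0.478, homology 0.572.
    print(effective_score(0.478, 0.572))
```

For the pair discussed above, this gives a value of roughly 0.2046, which corresponds to the 0.204 shown on the score summary page.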


Thursday, January 15, 2009

Linking to individual networks (updated)

If you want to link to STRING or STITCH from your website, you can use the following URLs for simple queries:

For STITCH, you can use names of chemicals:

http://stitch.embl.de/interactions/aspirin?species=9606

You can also use identifiers, e.g. SwissProt or ATC codes:

http://string.embl.de/interactions/DRD1_HUMAN
http://stitch.embl.de/interactions/A01AD05?species=9606

(The species parameter takes an NCBI taxonomy identifier; 9606 specifies that you want human interactions.)

Update: You can also link to networks with multiple items. As STRING saves the user's preference for proteins/COG mode, it's better to specify the target mode.

http://string.embl.de/interactionsList/zgc:73075%0Dzgc:136854?targetmode=proteins
http://string.embl.de/interactionsList/zgc:73075%0Dzgc:136854?targetmode=cogs
http://string.embl.de/interactionsList/KOG0044%0DKOG3656?targetmode=cogs

http://stitch.embl.de/interactionsList/DRD1_HUMAN%0Dpergolide?species=9606

You construct the URL by joining the names of the items (proteins, COGs or chemicals) with "%0D" or "%0A" (an encoded carriage return or line feed character).
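If you generate such links programmatically, the URL patterns shown above could be assembled roughly as follows. This is only a sketch: the helper names are hypothetical, and leaving ":" unescaped is a choice made here to reproduce the example URLs in this post, not a documented requirement of the server.

```python
from urllib.parse import quote, urlencode


def interaction_url(base: str, item: str, **params) -> str:
    """Link to the network of a single item (hypothetical helper)."""
    # Keep ":" as-is so names like "zgc:73075" look like the examples above.
    url = f"{base}/interactions/{quote(item, safe=':')}"
    return f"{url}?{urlencode(params)}" if params else url


def interaction_list_url(base: str, items, separator: str = "%0D", **params) -> str:
    """Link to a network with multiple items (hypothetical helper).

    The items are joined with an encoded carriage return ("%0D") by default;
    "%0A" can be passed as the separator instead.
    """
    joined = separator.join(quote(item, safe=":") for item in items)
    url = f"{base}/interactionsList/{joined}"
    return f"{url}?{urlencode(params)}" if params else url


if __name__ == "__main__":
    print(interaction_url("http://stitch.embl.de", "aspirin", species=9606))
    print(interaction_list_url(
        "http://string.embl.de",
        ["zgc:73075", "zgc:136854"],
        targetmode="proteins",
    ))
```

Passing the query parameters as keyword arguments keeps the species and targetmode options from the examples optional, so the same helpers cover both STRING and STITCH links.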

Wednesday, January 14, 2009

New Year - New Major Release

Looks like 2009 will bring a lot of changes for both STRING and STITCH, and to lead the way, STRING has now been upgraded to version 8.0!

This has been a major upgrade, and it has been some time in the making. We have almost doubled the number of organisms (again), and re-imported all the various pathways, protein complexes and text collections. We've also worked a lot behind the scenes, solidifying the API, further automating our data import and updating the way we display orthologous groups, to name just a few examples. All of this was really only possible thanks to our new sponsor, the Swiss Institute of Bioinformatics (SIB). Thanks, guys!

More info about this new release is also available from here.