Wednesday, July 1, 2009

100 API accesses per minute: not a good idea

STRING was unreachable for some time this morning while it was busy processing ~10,000 API requests in about 1.5 hours. 100 API accesses per minute is not a good idea, as this overloads our server, making STRING (and STITCH) unavailable for everyone. Don't run scripts that access the API in parallel.
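If you do need to script a moderate number of queries, the safer pattern is to send them one at a time with a pause between requests. The sketch below shows one way to do that. The post does not name a specific acceptable rate, so the 1-second `min_interval` is an assumption. `fetch` is a hypothetical placeholder for whatever function performs a single API request.

```python
import time

def fetch_sequentially(queries, fetch, min_interval=1.0):
    """Call fetch(query) for each query, one at a time (never in parallel),
    starting at most one request per `min_interval` seconds.

    `fetch` is a hypothetical stand-in for a single API call; the default
    interval is an assumed polite rate, not an official STRING limit.
    """
    results = []
    last_start = None
    for query in queries:
        if last_start is not None:
            # Wait until min_interval has passed since the previous request started.
            wait = min_interval - (time.monotonic() - last_start)
            if wait > 0:
                time.sleep(wait)
        last_start = time.monotonic()
        results.append(fetch(query))
    return results
```

Running requests strictly in sequence also means that a slow server response automatically slows your script down, instead of piling up more concurrent requests.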

If you want to do large-scale analysis on STRING data, you can make life much easier for both you and us by signing an academic license agreement and downloading the full dataset.

Wednesday, June 24, 2009

New Release of STRING: Version 8.1

We are happy to announce that STRING 8.1 has just been released. We have updated the interaction data, fixed a number of bugs and greatly improved the web interface. As always, we keep older versions around to guarantee reproducibility of earlier work (see here).

The interactive network viewer has been re-implemented (it's now based on Adobe Flash version 10), and it now lets users "play around" with the network while keeping the same look and feel as before.

The proteins in the network can now also be clustered 'live', using either of two methods: k-means or Markov chain clustering. The topology can be relaxed after clustering, or in real time by turning on "relaxation" and "cooling". The graph layout is computed by force-directed placement (see here), and cooling works by gradually lowering the relaxation variable that determines the strength of node movements to zero. Of course, this is merely a first version of the interactive viewer, and future versions may well have additional features (in particular if our users have specific requests).
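To make "relaxation" and "cooling" more concrete, here is a minimal sketch of a generic force-directed layout in the Fruchterman-Reingold style. This is not the viewer's actual (Flash) code: all nodes repel each other, linked nodes attract, and a "temperature" caps how far any node may move per iteration. Cooling means this temperature is lowered step by step toward zero, so the layout settles. The function name, the linear cooling schedule, and the constants are assumptions for illustration.

```python
import math
import random

def force_directed_layout(nodes, edges, iterations=100, width=1.0, height=1.0, seed=0):
    """Illustrative Fruchterman-Reingold-style layout with linear cooling.

    Returns a dict mapping each node to an (x, y) position.
    """
    rng = random.Random(seed)
    pos = {n: [rng.uniform(0, width), rng.uniform(0, height)] for n in nodes}
    k = math.sqrt(width * height / max(len(nodes), 1))  # ideal distance between nodes
    t0 = width / 10.0  # initial temperature: maximum step size per iteration
    for i in range(iterations):
        temperature = t0 * (1 - i / iterations)  # cooling: step size shrinks toward zero
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsive force between every pair of nodes.
        for a in nodes:
            for b in nodes:
                if a == b:
                    continue
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[a][0] += dx / d * f
                disp[a][1] += dy / d * f
        # Attractive force along each edge (pulls both endpoints together).
        for a, b in edges:
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = d * d / k
            disp[a][0] -= dx / d * f
            disp[a][1] -= dy / d * f
            disp[b][0] += dx / d * f
            disp[b][1] += dy / d * f
        # Move each node along its net force, but no further than the temperature.
        for n in nodes:
            dx, dy = disp[n]
            length = max(math.hypot(dx, dy), 1e-9)
            step = min(length, temperature)
            pos[n][0] += dx / length * step
            pos[n][1] += dy / length * step
    return {n: tuple(p) for n, p in pos.items()}
```

For example, `force_directed_layout(["A", "B", "C"], [("A", "B"), ("B", "C")])` lays out a small three-node chain; the node names here are hypothetical.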

We have also extended the protein structure previews. These are now based not only on PDB, but also on homology models from the SWISS-MODEL Repository. And structure previews are finally shown in the proper context of the protein's domain architecture.

Lastly, we fixed several 'known issues' (i.e., bugs) - such as missing minor chromosomes for some species, and viewer problems with the textmining predictions.

To start playing with the new release, visit STRING 8.1 - and please don't hesitate to send us any feedback, criticism or suggestions...

Have Fun !

The STRING team

Thursday, February 19, 2009

Known Issues in STRING version 8.0

For each STRING version so far, only when we released it to the users did we find the last remaining bugs. Users often email us with their problems, and sometimes we are indeed to blame because there is an error. This is good (we think), because each bug found is a bug fixed - albeit only in the next release, usually.

So far, this is what we have found in release 8.0:

a) Some of our text-mining links do not show up in the corresponding evidence viewer. They are still correct, but the underlying text cannot be recovered and shown, for technical reasons. This happens because we developed a new feature that recognizes generic 'family' names for gene groups (like 'WNTs' for the various, homologous Wnt proteins). Within reasonable limits, such ambiguous names are now expanded to the individual protein members. However, we forgot to update the code of the text-viewer to reflect this ... we will do so in the next version.

b) Unfortunately, some of the prokaryotic genomes in this release are incomplete - in 43 cases we're missing a second (or third) minor chromosome. This was caused by a misunderstanding when parsing files from the RefSeq database: RefSeq provides an overview file that only lists one chromosome for each prokaryote, and we mistook that file for the full listing. Again, this will be fixed in the next release of STRING (on which we are already working). Obviously, we're now writing a new entry in our test suite that will prevent this type of error in the future - we will be checking the final gene counts of all organisms for consistency and also compare these counts against an external reference. Below is a list of affected organisms in the current release; if you're working with any of these, we recommend you continue using version 7.1 of STRING for now.

Luckily, no major model organisms are affected !!

Agrobacterium tumefaciens str. C58
Brucella abortus biovar 1 str. 9-941
Brucella melitensis 16M
Brucella melitensis biovar Abortus 2308
Brucella ovis ATCC 25840
Brucella suis 1330
Burkholderia ambifaria AMMD
Burkholderia cenocepacia AU 1054
Burkholderia cenocepacia HI2424
Burkholderia mallei ATCC 23344
Burkholderia mallei NCTC 10229
Burkholderia mallei NCTC 10247
Burkholderia mallei SAVP1
Burkholderia pseudomallei 1106a
Burkholderia pseudomallei 1710b
Burkholderia pseudomallei 668
Burkholderia pseudomallei K96243
Burkholderia sp. 383
Burkholderia thailandensis E264
Burkholderia vietnamiensis G4
Burkholderia xenovorans LB400
Deinococcus radiodurans R1
Haloarcula marismortui ATCC 43049
Leptospira borgpetersenii serovar Hardjo-bovis JB197
Leptospira borgpetersenii serovar Hardjo-bovis L550
Leptospira interrogans serovar Copenhageni str. Fiocruz L1-130
Leptospira interrogans serovar Lai str. 56601
Ochrobactrum anthropi ATCC 49188
Paracoccus denitrificans PD1222
Photobacterium profundum SS9
Pseudoalteromonas haloplanktis TAC125
Ralstonia eutropha H16
Ralstonia eutropha JMP134
Ralstonia metallidurans CH34
Rhodobacter sphaeroides 2.4.1
Rhodobacter sphaeroides ATCC 17029
Vibrio cholerae O1 biovar eltor str. N16961
Vibrio cholerae O395
Vibrio fischeri ES114
Vibrio harveyi ATCC BAA-1116
Vibrio parahaemolyticus RIMD 2210633
Vibrio vulnificus CMCP6
Vibrio vulnificus YJ016
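The new test described in item b) checks final gene counts for consistency and compares them against an external reference. A check of that kind could look roughly like the sketch below. The function name, the input format (organism name mapped to gene count), and the 2% tolerance are assumptions; the post does not describe the actual test.

```python
def find_inconsistent_gene_counts(our_counts, reference_counts, tolerance=0.02):
    """Compare per-organism gene counts with an external reference.

    Returns (organism, our_count, reference_count) for every organism that is
    missing from either side, or whose count deviates from the reference by
    more than `tolerance` (a fraction of the reference count). The tolerance
    value is an illustrative assumption.
    """
    problems = []
    for organism in sorted(set(our_counts) | set(reference_counts)):
        ours = our_counts.get(organism)
        ref = reference_counts.get(organism)
        if ours is None or ref is None:
            problems.append((organism, ours, ref))
        elif abs(ours - ref) > tolerance * ref:
            problems.append((organism, ours, ref))
    return problems
```

An organism missing a second chromosome would typically show up as a count well below its reference value.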

That's it for known issues so far. But, do keep those emails coming - the feedback is very valuable !!

Thursday, January 29, 2009

Homology correction of co-occurrence and text-mining scores (updated)

[Update 29/01/2009: The homology correction is also applied to the text-mining channel starting with STRING 8.]

To prevent gene duplications from producing spurious functional associations, homologous proteins are down-weighted in the co-occurrence and text-mining channels. You will notice this on the score summary page of a link, and in our SQL dumps if you have them.

Here's an example: the co-occurrence view looks fine for this pair of proteins.
However, the total score of 0.204 is less than the co-occurrence score of 0.478.

The reason for this is that the proteins have some sequence similarity and are therefore down-weighted according to this formula:

effective co-occurrence score = co-occurrence score * (1 - homology score)

(The homology score is calculated from the bit score of the alignment.) In this case:

0.204 = 0.478 * ( 1 - 0.572 )
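The correction itself is a single multiplication. The sketch below simply restates the formula above as a function. The post does not say how the bit score is converted into a homology score, so the function takes the homology score (between 0 and 1) as a given input; the function name is hypothetical.

```python
def effective_score(channel_score, homology_score):
    """Down-weight a co-occurrence (or text-mining) score by the homology score:
    effective score = channel score * (1 - homology score).

    The bit-score-to-homology-score mapping is not shown here; homology_score
    is assumed to already lie between 0 and 1.
    """
    if not 0.0 <= homology_score <= 1.0:
        raise ValueError("homology_score must be between 0 and 1")
    return channel_score * (1.0 - homology_score)
```

With the numbers from the example, `effective_score(0.478, 0.572)` gives about 0.2046, matching the 0.204 above to the precision shown. A homology score of 0 leaves the score unchanged, and a homology score of 1 removes it completely.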


Thursday, January 15, 2009

Linking to individual networks (updated)

If you want to link to STRING or STITCH from your website, you can use the following URLs for simple queries:

For STITCH, you can use names of chemicals:

http://stitch.embl.de/interactions/aspirin?species=9606

You can also use identifiers, e.g. SwissProt or ATC codes:

http://string.embl.de/interactions/DRD1_HUMAN
http://stitch.embl.de/interactions/A01AD05?species=9606

(The 9606 specifies that you want human interactions, see NCBI taxonomy.)
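If you generate such links programmatically, a small helper can percent-encode the identifier and append the optional `species` parameter. This is a sketch using Python's standard `urllib.parse`. The helper name is hypothetical, and the hostnames are simply the ones given in this post.

```python
from urllib.parse import quote, urlencode

def interactions_url(base, identifier, species=None):
    """Build a single-item link such as <base>/interactions/<identifier>?species=<taxid>.

    `species` is an NCBI taxonomy ID (e.g. 9606 for human); it is omitted if None.
    """
    url = f"{base}/interactions/{quote(identifier)}"
    if species is not None:
        url += "?" + urlencode({"species": species})
    return url
```

For example, `interactions_url("http://stitch.embl.de", "aspirin", 9606)` reproduces the first STITCH link above.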

Update: You can also link to networks with multiple items. Because STRING remembers each user's preference for protein or COG mode, it's best to specify the target mode explicitly.

http://string.embl.de/interactionsList/zgc:73075%0Dzgc:136854?targetmode=proteins
http://string.embl.de/interactionsList/zgc:73075%0Dzgc:136854?targetmode=cogs
http://string.embl.de/interactionsList/KOG0044%0DKOG3656?targetmode=cogs

http://stitch.embl.de/interactionsList/DRD1_HUMAN%0Dpergolide?species=9606

You construct the URL by joining the item names with "%0D" or "%0A" (an encoded carriage return or newline character).
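As a sketch of that rule: join the items with a carriage return and let `urllib.parse.quote` encode it as `%0D`. Passing `safe=":"` keeps colons unescaped so that identifiers like `zgc:73075` look exactly as in the examples above; that choice, and the helper name, are assumptions for illustration.

```python
from urllib.parse import quote, urlencode

def interactions_list_url(base, identifiers, **params):
    """Build a multi-item link: <base>/interactionsList/<id1>%0D<id2>...?<params>.

    Extra keyword arguments (e.g. targetmode="proteins", species=9606)
    become query parameters.
    """
    # "\r" is percent-encoded as %0D; ":" is left as-is to match the example URLs.
    path = quote("\r".join(identifiers), safe=":")
    url = f"{base}/interactionsList/{path}"
    if params:
        url += "?" + urlencode(params)
    return url
```

For example, `interactions_list_url("http://string.embl.de", ["zgc:73075", "zgc:136854"], targetmode="proteins")` reproduces the first link in the update.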

Wednesday, January 14, 2009

New Year - New Major Release

Looks like 2009 will bring a lot of changes for both STRING and STITCH - and to lead the way, STRING has now been upgraded to version 8.0 !

This has been a major upgrade, and it has been some time in the making. We have almost doubled the number of organisms (again), and re-imported all the various pathways, protein complexes and text collections. We've also done a lot of work behind the scenes: solidifying the API, further automating our data import and updating the way we display orthologous groups, to name just a few examples. All this has really only been possible because of our new sponsor - the Swiss Institute of Bioinformatics (SIB). Thanks guys !

More info about this new release is also available from here.

Monday, July 21, 2008

High-resolution images

We recently implemented a way to export high-resolution images (300 dpi). This feature will go public with STRING 8 / STITCH 2, but if you're currently using STRING or STITCH and want to prepare an image for publication, please get in touch with us (mkuhn embl de) and we can send you the image.