Monday, November 9, 2009

STRING 8.2 and STITCH 2.0 released

We have released new versions of STRING and STITCH!

Here are some of the changes:

  • protein-protein actions like binding or post-translational modifications are now included
  • most download files are now under Creative Commons licenses, so academics usually don't need to fill out license agreements anymore
  • bugfix: version 8.1 had accidentally omitted homology transfer in the textmining channel; this is now fixed.
  • the Flash interactive network viewer has further matured; a simple, lean version is now enabled by default. The full-featured version can be activated by clicking on the "advanced" button below the network
  • STITCH 2 uses the protein universe of STRING 8
  • the number of interactions has further increased

Tuesday, September 1, 2009

API for STRING 8.0 still accessible

After the release of STRING 8.1, the previous version, 8.0, can still be accessed: Under this URL, the API continues to work, e.g.:

This pattern will continue for upcoming releases (you could also use "string81" as host name now to stay on this version even if we update to a new version).

Thursday, July 23, 2009

Quick poll for next STITCH version concerning chemical names

I'm conducting a small poll over at FriendFeed about abbreviations of chemical names. If you have an opinion about which abbreviation is better in the following cases, please leave your feedback (either on FriendFeed or in the comments here):

  • 10-formyltetrahydrofolate: "10-formyltetra." or "10-formyl-THF"?
  • acetyl-L-carnitine: "acetyl-L-carni." or "ALCAR"?
  • dihydrotestosterone: "androstanolone" or "dihydrotestost."?

Tuesday, July 21, 2009

STRING Cytoscape plugin

Users of Cytoscape can now natively retrieve interaction networks from STRING !

During a recent workshop at the EBI, a common web service API to query interaction databases (called PSICQUIC) was finalized. Once all interaction databases have implemented this interface, it will be possible to use a single client (a Cytoscape plugin for example) to interact with all of them. We are committed to this initiative, and look forward to the implementations.

In the interim, we have decided to also release a small, custom-made plugin for Cytoscape called StringWSClient, which interacts only with the STRING database.

This allows us to offer users the full range of features that the STRING API allows (e.g. to show all available species, or to resolve ambiguous inputs). Version 1.0 (1.1) supports only the import of interaction networks; upcoming versions will be able to extend existing networks, filter them using STRING specific criteria, etc. The 1.0 version works only with Cytoscape 2.6.1, and 1.1 was released to support the whole 2.6.x branch.

To install it, start Cytoscape, open the Plugins/Manage Plugins dialog and pick StringWSClient v1.0 from the "Network and Attribute I/O" section. You may have to restart Cytoscape to load the plugin. See the Cytoscape documentation for details.

Once you have the plugin installed, open the File/Import/Network from web services... dialog and pick the String plugin.

The plugin resembles STRING's web user interface: a field to type queries and the organisms selector. In the background, the query is sent to the STRING database and the resulting interaction network is fetched and displayed.

We're looking forward to your feedback !
Milan Simonovic and the STRING team.

Wednesday, July 1, 2009

100 API accesses per minute: not a good idea

STRING was unreachable for some time this morning while it was busy processing ~10,000 API requests in about 1.5 hours. 100 API accesses per minute is not a good idea, as this overloads our server, making STRING (and STITCH) unavailable for everyone. Don't run scripts that access the API in parallel.
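For scripts that really do need the API, here is a minimal sketch of a sequential, throttled loop. The `fetch` callable stands in for whatever function you use to issue one API request, and the two-second default interval is an arbitrary, conservative choice for illustration, not an official limit.

```python
import time
from typing import Callable, Iterable, List


def fetch_sequentially(
    queries: Iterable[str],
    fetch: Callable[[str], str],
    min_interval: float = 2.0,  # assumed spacing in seconds, not an official limit
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Run fetch(query) one request at a time, never in parallel.

    Consecutive requests are started at least min_interval seconds apart.
    """
    results = []
    last_start = None
    for query in queries:
        if last_start is not None:
            # Wait out whatever remains of the interval since the last request.
            wait = min_interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        results.append(fetch(query))
    return results
```

The `sleep` and `clock` parameters exist only so the pacing can be checked without actually waiting; in normal use you would leave them at their defaults.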

If you want to do large-scale analysis on STRING data, you can make your life and ours much easier by signing an academic license agreement and downloading the full dataset.

Wednesday, June 24, 2009

New Release of STRING: Version 8.1

We are happy to announce that STRING 8.1 has just been released. We have updated the interaction data, fixed a number of bugs and greatly improved the web interface. As always, we keep older versions around to guarantee reproducibility of earlier work (see here).

The interactive network viewer has been re-implemented (it's now based on Adobe Flash version 10), and it gives users the new opportunity to "play around" with the network while keeping the same look and feel as previously.

The proteins in the network can now also be clustered, 'live', via two different methods: k-means and Markov chain clustering. The topology can be relaxed after clustering, or in real time by turning on "relaxation" and "cooling". The graph layout is computed by force-directed placement (see here), and the cooling is done by gradually lowering to zero the relaxation variable that determines the strength of node movements. Of course, this is merely a first version of the interactive viewer and future versions may well have additional features (in particular in case our users have specific requests).
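To give a feel for how relaxation and cooling interact, here is a rough sketch in Python. It is not the viewer's code: the specific forces (Fruchterman-Reingold-style repulsion and attraction), the linear cooling schedule and all names are assumptions chosen for illustration.

```python
import math
import random


def layout(nodes, edges, steps=200, relaxation=0.1, seed=0):
    """Force-directed placement with a linearly cooled movement strength.

    Assumed forces: every pair of nodes repels with k*k/d, and every edge
    attracts with d*d/k, where k is the preferred edge length.
    """
    rng = random.Random(seed)
    pos = {n: [rng.random(), rng.random()] for n in nodes}
    k = 1.0 / math.sqrt(len(nodes))
    for step in range(steps):
        # Cooling: the maximum movement per step shrinks linearly toward zero.
        strength = relaxation * (1.0 - step / steps)
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                force = k * k / dist
                disp[a][0] += dx / dist * force
                disp[a][1] += dy / dist * force
                disp[b][0] -= dx / dist * force
                disp[b][1] -= dy / dist * force
        # Attraction along each edge.
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            force = dist * dist / k
            disp[a][0] -= dx / dist * force
            disp[a][1] -= dy / dist * force
            disp[b][0] += dx / dist * force
            disp[b][1] += dy / dist * force
        # Move each node along its net force, capped by the current strength.
        for n in nodes:
            dx, dy = disp[n]
            length = math.hypot(dx, dy)
            if length > 0:
                scale = min(length, strength) / length
                pos[n][0] += dx * scale
                pos[n][1] += dy * scale
    return pos
```

Because the cap on each move shrinks at every step, early iterations rearrange the network freely while later ones only make small adjustments, which is why the picture settles instead of jittering.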

We have also extended the protein structure previews. These are now not only based on PDB, but we also incorporate homology models from the SWISS-MODEL Repository. And, structure previews are finally shown in the proper context of the protein's domain architecture.

Lastly, we fixed several 'known issues' (i.e., bugs) - such as missing minor chromosomes for some species, and viewer problems with the textmining predictions.

To start playing with the new release, visit STRING 8.1 - and please don't hesitate to send us any feedback, criticism or suggestions...

Have Fun !

The STRING team

Thursday, February 19, 2009

Known Issues in STRING version 8.0

For each STRING version so far, only when we released it to the users did we find the last remaining bugs. Users often email us with their problems, and sometimes we are indeed to blame because there is an error. This is good (we think), because each bug found is a bug fixed - albeit only in the next release, usually.

So far, this is what we have found in release 8.0:

a) Some of our text-mining links do not show up in the corresponding evidence viewer. They are still correct, but the underlying text cannot be recovered and shown, for technical reasons. This happens because we developed a new feature that recognizes generic 'family' names for gene groups (like 'WNTs' for the various, homologous Wnt proteins). Within reasonable limits, such ambiguous names are now expanded to the individual protein members. However, we forgot to update the code of the text-viewer to reflect this ... we will do so in the next version.

b) Unfortunately, some of the prokaryotic genomes in this release are incomplete - in 43 cases we're missing a second (or third) minor chromosome. This was caused by a misunderstanding when parsing files from the RefSeq database: RefSeq provides an overview file that only lists one chromosome for each prokaryote, and we mistook that file for the full listing. Again, this will be fixed in the next release of STRING (on which we are already working). Obviously, we're now writing a new entry in our test suite that will prevent this type of error in the future - we will be checking the final gene counts of all organisms for consistency and also compare these counts against an external reference. Below is a list of affected organisms in the current release; if you're working with any of these, we recommend you continue using version 7.1 of STRING for now.

Luckily, no major model organisms are affected !!

Agrobacterium tumefaciens str. C58
Brucella abortus biovar 1 str. 9-941
Brucella melitensis 16M
Brucella melitensis biovar Abortus 2308
Brucella ovis ATCC 25840
Brucella suis 1330
Burkholderia ambifaria AMMD
Burkholderia cenocepacia AU 1054
Burkholderia cenocepacia HI2424
Burkholderia mallei ATCC 23344
Burkholderia mallei NCTC 10229
Burkholderia mallei NCTC 10247
Burkholderia mallei SAVP1
Burkholderia pseudomallei 1106a
Burkholderia pseudomallei 1710b
Burkholderia pseudomallei 668
Burkholderia pseudomallei K96243
Burkholderia sp. 383
Burkholderia thailandensis E264
Burkholderia vietnamiensis G4
Burkholderia xenovorans LB400
Deinococcus radiodurans R1
Haloarcula marismortui ATCC 43049
Leptospira borgpetersenii serovar Hardjo-bovis JB197
Leptospira borgpetersenii serovar Hardjo-bovis L550
Leptospira interrogans serovar Copenhageni str. Fiocruz L1-130
Leptospira interrogans serovar Lai str. 56601
Ochrobactrum anthropi ATCC 49188
Paracoccus denitrificans PD1222
Photobacterium profundum SS9
Pseudoalteromonas haloplanktis TAC125
Ralstonia eutropha H16
Ralstonia eutropha JMP134
Ralstonia metallidurans CH34
Rhodobacter sphaeroides 2.4.1
Rhodobacter sphaeroides ATCC 17029
Vibrio cholerae O1 biovar eltor str. N16961
Vibrio cholerae O395
Vibrio fischeri ES114
Vibrio harveyi ATCC BAA-1116
Vibrio parahaemolyticus RIMD 2210633
Vibrio vulnificus CMCP6
Vibrio vulnificus YJ016

That's it for known issues so far. But, do keep those emails coming - the feedback is very valuable !!

Thursday, January 29, 2009

Homology correction of co-occurrence and text-mining scores (updated)

[Update 29/01/2009: The homology correction is also applied to the text-mining channel starting with STRING 8.]

To prevent gene duplications from producing spurious functional associations, homologous proteins are down-weighted in the co-occurrence and text-mining channels. You will notice this on the score summary page of a link and in our SQL dumps, if you have them.

Here's an example: The co-occurrence view looks fine for this pair of proteins.
However, the total score of 0.204 is less than the co-occurrence score:

The reason for this is that the proteins have some sequence similarity and are therefore down-weighted according to this formula:

effective co-occurrence score = co-occurrence score * (1 - homology score)

(The homology score is calculated from the bit score of the alignment.) In this case:

0.204 = 0.478 * ( 1 - 0.572 )
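If you work with the SQL dumps, the same correction can be sketched as a small Python function. The function name and the check that both scores lie between 0 and 1 are assumptions added for illustration.

```python
def effective_score(score: float, homology: float) -> float:
    """Down-weight a channel score by the homology score of the protein pair."""
    # Assumption: both scores are expressed on a 0..1 scale, as in the example.
    if not (0.0 <= score <= 1.0 and 0.0 <= homology <= 1.0):
        raise ValueError("scores are expected to lie between 0 and 1")
    return score * (1.0 - homology)


# The example pair from this post: 0.478 * (1 - 0.572) = 0.204584,
# which is shown as 0.204 on the score summary page.
example = effective_score(0.478, 0.572)
```

A homology score of 0 leaves the channel score untouched, and a homology score of 1 removes it entirely.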

Thursday, January 15, 2009

Linking to individual networks (updated)

If you want to link to STRING or STITCH from your website, you can use the following URLs for simple queries:

For STITCH, you can use names of chemicals:

You can also use identifiers, e.g. SwissProt or ATC codes:

(The 9606 specifies that you want human interactions, see NCBI taxonomy.)

Update: You can also link to networks with multiple items. As STRING saves the user's preference for proteins/COG mode, it's better to specify the target mode.

You construct the URL by concatenating the protein names with "%0D" or "%0A" (an encoded carriage return / newline character).
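As a small illustration, the identifier part of such a link could be built like this. Only the joined identifier string is produced; the rest of the URL is deliberately left out, and the function name is hypothetical.

```python
from urllib.parse import quote


def join_identifiers(names, separator="%0D"):
    """Join protein or chemical names with an encoded line break."""
    if separator not in ("%0D", "%0A"):
        raise ValueError("separator must be %0D or %0A")
    # Percent-encode each name on its own so that spaces or other special
    # characters inside a name cannot break the URL.
    return separator.join(quote(name, safe="") for name in names)


# Two names joined with an encoded carriage return.
identifiers = join_identifiers(["trpA", "trpB"])
```

The resulting string is then appended to the link wherever the list of identifiers belongs.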

Wednesday, January 14, 2009

New Year - New Major Release

Looks like 2009 will bring a lot of changes for both STRING and STITCH - and to lead the way, STRING has now been upgraded to version 8.0 !

This has been a major upgrade, and it has been some time in the making. We have almost doubled the number of organisms (again), and re-imported all the various pathways, protein-complexes and text-collections. We've also worked a lot behind the scenes, solidifying the API, further automating our data import and updating the way we display orthologous groups, to name just a few examples. All this has been possible only, really, because of our new sponsor - the Swiss Institute of Bioinformatics (SIB). Thanks guys !

More info about this new release is also available from here.