Thursday, February 19, 2009

Known Issues in STRING version 8.0

With every STRING version so far, we have found the last remaining bugs only after releasing it to users. Users often email us with their problems, and sometimes the fault is indeed ours. This is good (we think), because each bug found is a bug fixed - albeit usually only in the next release.

So far, this is what we have found in release 8.0:

a) Some of our text-mining links do not show up in the corresponding evidence viewer. The links themselves are still correct, but for technical reasons the underlying text cannot be recovered and shown. This happens because we developed a new feature that recognizes generic 'family' names for gene groups (like 'WNTs' for the various homologous Wnt proteins). Within reasonable limits, such ambiguous names are now expanded to the individual protein members. However, we forgot to update the code of the text-viewer to reflect this; we will do so in the next version.
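
To illustrate the idea of expanding a family name to its members, here is a minimal sketch in Python. It is not the actual STRING text-mining code: the family table, the size limit, and the function name `expand_mention` are all assumptions made up for this example.

```python
# Hypothetical family table; the real STRING dictionaries are not shown in this post.
FAMILY_MEMBERS = {
    "WNTs": ["WNT1", "WNT2", "WNT3A"],
}

# Assumed stand-in for the "reasonable limits" mentioned above.
MAX_FAMILY_SIZE = 20


def expand_mention(name, families=FAMILY_MEMBERS, max_size=MAX_FAMILY_SIZE):
    """Map a name found in text to the list of proteins it refers to.

    A plain protein name maps to itself. A family name maps to its members,
    unless the family is larger than max_size, in which case it is dropped.
    """
    members = families.get(name)
    if members is None:
        return [name]      # not a family name: keep as-is
    if len(members) > max_size:
        return []          # too ambiguous to expand
    return list(members)   # copy, so callers cannot modify the table
```

With this table, `expand_mention("WNTs")` yields the three listed members, while an ordinary name such as `"ACT1"` is returned unchanged. A viewer that searches the text for the member names alone would not find the original mention 'WNTs', which is the kind of mismatch described above.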

b) Unfortunately, some of the prokaryotic genomes in this release are incomplete - in 43 cases we're missing a second (or third) minor chromosome. This was caused by a misunderstanding when parsing files from the RefSeq database: RefSeq provides an overview file that lists only one chromosome for each prokaryote, and we mistook that file for the full listing. Again, this will be fixed in the next release of STRING (on which we are already working). We're now writing a new entry in our test suite to prevent this type of error in the future - we will check the final gene counts of all organisms for consistency and also compare these counts against an external reference. Below is a list of affected organisms in the current release; if you're working with any of these, we recommend you continue using version 7.1 of STRING for now.

Luckily, no major model organisms are affected !!

Agrobacterium tumefaciens str. C58
Brucella abortus biovar 1 str. 9-941
Brucella melitensis 16M
Brucella melitensis biovar Abortus 2308
Brucella ovis ATCC 25840
Brucella suis 1330
Burkholderia ambifaria AMMD
Burkholderia cenocepacia AU 1054
Burkholderia cenocepacia HI2424
Burkholderia mallei ATCC 23344
Burkholderia mallei NCTC 10229
Burkholderia mallei NCTC 10247
Burkholderia mallei SAVP1
Burkholderia pseudomallei 1106a
Burkholderia pseudomallei 1710b
Burkholderia pseudomallei 668
Burkholderia pseudomallei K96243
Burkholderia sp. 383
Burkholderia thailandensis E264
Burkholderia vietnamiensis G4
Burkholderia xenovorans LB400
Deinococcus radiodurans R1
Haloarcula marismortui ATCC 43049
Leptospira borgpetersenii serovar Hardjo-bovis JB197
Leptospira borgpetersenii serovar Hardjo-bovis L550
Leptospira interrogans serovar Copenhageni str. Fiocruz L1-130
Leptospira interrogans serovar Lai str. 56601
Ochrobactrum anthropi ATCC 49188
Paracoccus denitrificans PD1222
Photobacterium profundum SS9
Pseudoalteromonas haloplanktis TAC125
Ralstonia eutropha H16
Ralstonia eutropha JMP134
Ralstonia metallidurans CH34
Rhodobacter sphaeroides 2.4.1
Rhodobacter sphaeroides ATCC 17029
Vibrio cholerae O1 biovar eltor str. N16961
Vibrio cholerae O395
Vibrio fischeri ES114
Vibrio harveyi ATCC BAA-1116
Vibrio parahaemolyticus RIMD 2210633
Vibrio vulnificus CMCP6
Vibrio vulnificus YJ016
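
The gene-count check mentioned under (b) could look roughly like the following Python sketch. This is only an illustration, not our actual test suite: the function name, the 2% tolerance, and the dictionary inputs are assumptions, and the numbers in the usage note are invented.

```python
def find_incomplete_genomes(imported_counts, reference_counts, tolerance=0.02):
    """Return organisms whose imported gene count falls short of the reference.

    imported_counts:  dict mapping organism name -> genes found after import
    reference_counts: dict mapping organism name -> genes expected according
                      to an external reference
    tolerance:        allowed relative shortfall (an assumed value)
    """
    flagged = []
    for organism, expected in sorted(reference_counts.items()):
        # An organism missing from the import counts as having zero genes.
        found = imported_counts.get(organism, 0)
        if found < expected * (1 - tolerance):
            flagged.append((organism, found, expected))
    return flagged
```

For example, with invented counts `{"organism_A": 2700, "organism_B": 4000}` as the import and `{"organism_A": 3800, "organism_B": 4010}` as the reference, only `organism_A` would be flagged, since a missing chromosome shows up as a large shortfall in genes.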

That's it for known issues so far. But do keep those emails coming - the feedback is very valuable!

2 comments:

  1. Hello,

    I've just found the STRING site and find it very useful, but I've noticed one bug. I don't see any email address on the webpage (maybe I've missed it), so I am leaving the info here.

    If you view S. cerevisiae gene GRX4 or GRX3, you will get, among others, an interaction with ACT1 (actin) based on two aspects, experiment and coexpression. The coexpression, however, states:

    "Co-Expression: none, but putative homologs are coexpressed in 0 species (score 0.056)."

    And clicking on show gives "- none - " for data.

    This is obviously a bug.

    One other thing that I've noticed is that it would be nice if the names could be changed, since I know some of these proteins under different names (for example, I'd prefer ARP1, as in SGD, over the systematic name YHR129C) and would like to create pictures with names that I can immediately recognize.

    But overall the site is great, and the integration of so many genomes in one framework is especially useful.

  2. Hi Paul,

    we have a filtering step in place that removes low-probability interactions from the database (because there are so many of them). I guess in this case the score of 0.056 should have been suppressed.

    It would be pretty hard for us to implement the display of different names. We try hard to come up with a good and unique name. However, sometimes there's another protein by the same name, and we thus can't use the "good" name. In the case of ARP1, there is another protein (ARL3) that is also called ARP1 (GeneDB), so we fall back on the systematic name.

    For making a figure for a paper, you could download and edit the SVG file.
