Friday, February 22, 2008

Example Taverna workflow

Taverna is a tool that lets you chain together web services from different providers. I've implemented a simple example workflow (.xml): you enter a human protein or chemical of interest, and STITCH identifies the matching item(s), generates an interaction network and retrieves the 10 most relevant abstracts. The list of PubMed IDs is then passed on to Whatizit, which highlights disease terms in the abstracts. Finally, a simple Python script counts the diseases.

Try it out; try other things; tell me what you think.
Note: The Python script is called via the Soaplab interface. I tried to do this step directly in Taverna but gave up: Taverna didn't like my XPath (because of the XML produced by Whatizit), and writing a Beanshell script seemed more cumbersome.
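For illustration, the disease-counting step might look roughly like the sketch below. It is written from scratch and is not the script used in the workflow. It assumes, hypothetically, that each disease mention in the annotated abstracts is an XML element with a tag such as `disease` whose text is the term; the real Whatizit markup may differ, which is why the tag name is a parameter.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_diseases(xml_text: str, tag: str = "disease") -> Counter:
    """Count disease mentions in annotated XML.

    Assumption: every mention is an element named `tag` (the default
    "disease" is hypothetical) whose text content is the disease term.
    """
    root = ET.fromstring(xml_text)
    counts = Counter()
    # iter() walks the whole tree, so the nesting depth does not matter
    for elem in root.iter(tag):
        term = (elem.text or "").strip().lower()
        if term:
            counts[term] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "<abstracts>"
        "<abstract>Linked to <disease>Parkinson disease</disease>.</abstract>"
        "<abstract><disease>Schizophrenia</disease> and "
        "<disease>Parkinson disease</disease></abstract>"
        "</abstracts>"
    )
    # Print the most frequent diseases first
    for term, n in count_diseases(sample).most_common():
        print(n, term)
```

Lower-casing the terms means that "Asthma" and "asthma" are counted as the same disease; drop `.lower()` if case matters.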

Tuesday, February 19, 2008

We have an API!

I went to the BioHackathon 2008 in Tokyo and worked on an API for STRING and STITCH. If you are considering using STRING or STITCH through an API and are missing features, please get in touch with us via the comments or by e-mail (e.g. mkuhn//embl.de).

Here's what we have to offer so far:

REST interface

The URL patterns are:

http://stitch.embl.de/api/[format]/[request]?[parameters]
http://string.embl.de/api/[format]/[request]?[parameters]
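To make the pattern concrete, here is a small Python sketch that assembles a request URL from its three parts. The function name `api_url` and the `HOSTS` table are hypothetical; the code only does string formatting and does not check whether a given format/request combination is actually supported.

```python
from urllib.parse import quote, urlencode

# Hosts taken from the two URL patterns above
HOSTS = {"stitch": "stitch.embl.de", "string": "string.embl.de"}


def api_url(service: str, fmt: str, request: str, params: dict) -> str:
    """Build http://<host>/api/[format]/[request]?[parameters]."""
    host = HOSTS[service]
    # quote_via=quote encodes spaces as %20 (as in the examples) instead of '+'
    query = urlencode(params, quote_via=quote)
    return f"http://{host}/api/{fmt}/{request}?{query}"


print(api_url("stitch", "tsv", "resolve",
              {"identifier": "dopamine receptor", "species": 9606}))
```

Using `urlencode` rather than plain string concatenation means that spaces and other special characters in protein or chemical names are escaped consistently.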

Possible formats:

  • tsv: tab-separated values, with a header line
  • tsv-no-header: as above, but no header
  • json: JSON format, either as a list of hashes/dictionaries or as a plain list (if only one value is returned per record)
  • psi-mi: the interaction network is available in PSI-MI 2.5 XML format
  • psi-mi-tab: there is also a tab-delimited form, modeled after the IntAct specification. This is easier to parse, but contains less information than the XML format.
  • url: return the URL of the network image
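A "tsv" response (with header) or a "tsv-no-header" response could be read with Python's standard csv module, roughly as sketched below. The function name `parse_tsv` is hypothetical, and the column names in any given response depend on the request, so nothing about them is assumed here.

```python
import csv
import io


def parse_tsv(text: str, header: bool = True) -> list:
    """Parse a 'tsv' (header=True) or 'tsv-no-header' (header=False) response."""
    stream = io.StringIO(text)
    # QUOTE_NONE: treat quote characters as ordinary data, since the
    # format is plain tab-separated values
    if header:
        # The first line holds the column names; each row becomes a dict
        return list(csv.DictReader(stream, delimiter="\t", quoting=csv.QUOTE_NONE))
    # No header: each row is a plain list of fields; blank lines are skipped
    return [row for row in csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE) if row]
```

With a header, rows can be accessed by column name; without it, you rely on the column order.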
Possible requests:
  • abstracts: return a list of abstracts that contain the query item
  • abstractsList: return a list of abstracts that contain any of the query items
  • interactions: return an interaction network in PSI-MI 2.5 format (PSI-MI, in its XML or tab-delimited form, is currently the only format for interactions. I don't know what a JSON form should look like.)
  • interactionsList: same as above, but for a list of identifiers
  • interactors: return a list of interaction partners for the query item
  • interactorsList: return a list of interaction partners for any of the query items
  • resolve: return the list of items that match (in name or identifier) the query item
  • network / networkList: in conjunction with the "url" format, return the URL of the network image
For a full list of possible parameters, please refer to our STRING Soaplab 2 interface, where we'll describe the set of possible parameters with the help of Soaplab 2 / Gowlab. (This doesn't work right now. :-/ )

Examples

To find out which proteins match the description "dopamine receptor" in human, you can use this query:

http://stitch.embl.de/api/tsv/resolve?identifier=dopamine%20receptor&species=9606
http://string.embl.de/api/tsv/resolve?identifier=dopamine%20receptor&species=9606

This gives you a lot of additional information. If you only want the list of STRING identifiers, you can alter the query a bit:

http://stitch.embl.de/api/tsv-no-header/resolve?identifier=dopamine%20receptor&species=9606&format=only-ids
http://string.embl.de/api/tsv-no-header/resolve?identifier=dopamine%20receptor&species=9606&format=only-ids

Now, you'll only receive a bare list of ids that you could pipe into other STRING API functions.
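In Python, turning such a bare list into identifiers for the next call could look like the following sketch. The helper names `read_ids` and `fetch_ids` are hypothetical; `fetch_ids` simply downloads the text with `urllib.request.urlopen` and only works if the endpoint is reachable.

```python
from urllib.request import urlopen


def read_ids(text: str) -> list:
    """Turn a bare, newline-separated id list into a Python list.

    Blank lines are skipped and duplicates dropped, keeping first-seen order.
    """
    ids = []
    for line in text.splitlines():  # handles \n, \r\n and \r line endings
        ident = line.strip()
        if ident and ident not in ids:
            ids.append(ident)
    return ids


def fetch_ids(url: str) -> list:
    """Download a tsv-no-header response (e.g. resolve with format=only-ids)."""
    with urlopen(url) as response:
        return read_ids(response.read().decode("utf-8"))
```

The resulting list can then be joined with newlines and passed as the `identifiers` parameter of one of the "List" requests.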

To illustrate the difference between normal and "list" queries:

http://stitch.embl.de/api/tsv/interactors?identifier=DRD1_HUMAN
http://stitch.embl.de/api/tsv/interactorsList?identifiers=DRD1_HUMAN%0DDRD2_HUMAN

http://string.embl.de/api/tsv/interactors?identifier=DRD1_HUMAN
http://string.embl.de/api/tsv/interactorsList?identifiers=DRD1_HUMAN%0DDRD2_HUMAN


In the second case, the identifiers parameter contains a list of items separated by line-break characters (URL-encoded as %0A for a line feed or %0D for a carriage return).
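A list query like the ones above can be built by joining the identifiers and URL-encoding the result. This is a minimal sketch; the function name `list_query_url` is hypothetical, and it uses a carriage return as the separator only because that reproduces the %0D in the example URLs.

```python
from urllib.parse import quote


def list_query_url(host: str, fmt: str, request: str, identifiers: list) -> str:
    """Build a "...List" request whose identifiers parameter is line-break separated."""
    # quote() turns "\r" into %0D, matching the example URLs; joining with
    # "\n" (giving %0A) should work equally well according to the text
    value = quote("\r".join(identifiers), safe="")
    return f"http://{host}/api/{fmt}/{request}?identifiers={value}"


print(list_query_url("stitch.embl.de", "tsv", "interactorsList",
                     ["DRD1_HUMAN", "DRD2_HUMAN"]))
```

Passing `safe=""` makes sure that any "/" or "&" inside an identifier is escaped as well and cannot break the query string.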

SOAP / Taverna

In a separate post, I've described an example Taverna workflow. As for SOAP integration, I hope that the Soaplab interface works...

Obligatory beta notice

Like all good things these days, this is still in beta (internally, everything in fact runs on our beta server; I'm just making it accessible via the normal STITCH domain to expose it to the web). Therefore, the API might change, be down, ... until STITCH 2 / STRING 8 comes out.

Updates

03.03.2008: Added clarification – PSI-MI is currently the only interactions format.
04.03.2008: Fixed typo – it's "interactorsList"
12.03.2008: Added psi-mi-tab format
19.05.2008: Added STRING API (with the same specification)
08.07.2008: Added API for generating network images
16.03.2009: Enabled interactionsList

Wednesday, February 6, 2008

Mea culpa: missing links from PDB

I intended to extract protein–chemical links from the PDB (and we wrote this in the paper), but I didn't quite finish the import scripts before we finalized STITCH 1.0. I am sorry about this and apologize to anyone who is missing interactions as a result.


We are currently preparing the next versions of STRING and STITCH (versions 8 and 2, respectively), and I will import the remediated PDB for the new version. I expect STITCH 2 to come out in spring 2008.

(Special thanks to Florian Raible for testing his pet molecule and discovering what was missing.)