Co-expression

Web Services

For us, Web services are Web-accessible tools that deliver data in easy-to-parse, machine-readable formats suitable for import into third-party visualization software and data analysis environments, or even custom programs. We want to deliver data in formats that will make it easier to try out new types of data exploration and data mining techniques, and we think that Web services are a good strategy to achieve this goal.

We are using two different Web services strategies for delivering data in machine-readable formats: a ReST-style strategy, and a BioMoby-based strategy. To find out how to use these different types of services, read on:

ReST: For information about retrieving expression values via URLs (ReST), see "The ReST Style" below.

BioMoby: For information about accessing data using BioMoby Web services, see "BioMoby" below.

The ReST Style

We provide access to "raw" expression values and KS quality control test statistics for ATH1 microarray data in our database via a simple, easy-to-use, ReST-style Web service.

The acronym ReST stands for Representational State Transfer, and it refers to an architectural style for organizing and indexing information on the Web. In a nutshell, a ReST approach treats data sets as having unique addresses, typically using identifiers or names as part of URLs to make it easier to access the data. Typically, a ReST service would return data in an XML (Extensible Markup Language) format, but since many programs you might want to use can't easily import XML, our ReST-style service outputs simple comma-separated plain text. This is why we call our service "ReST-style" instead of just ReST: it doesn't fully conform to the ReST way of doing things.

To access the data via this Web service, you just need to know how to formulate URL queries.

The syntax is:

http://www.cressexpress.org/cgi-bin/getExpVals.py?pss=[ps1,ps2,..,psN]&version=[v]

where [ps1,ps2,..,psN] is a comma-separated list of probe set ids on the ATH1 array, and version [v] corresponds to one of the data releases listed here. Legal values for version include 2_0, 3_0, 3_1, and 3_2.

For example, the following URL retrieves expression data for two redundant probe sets that measure transcript variants from Arabidopsis locus AT5G35980, which encodes a putative protein kinase with two alternative three prime ends.

http://www.cressexpress.org/cgi-bin/getExpVals.py?pss=249678_at,249679_at&version=2_0
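If you would rather assemble query URLs in a program than by hand, the syntax above can be sketched in a few lines of Python. This is only an illustration: the function name build_query_url and the constants are our own choices for this sketch and are not part of the service, and the version check simply mirrors the legal values listed above.

```python
from urllib.parse import urlencode

# Base address and legal release values are taken from the syntax described above.
BASE_URL = "http://www.cressexpress.org/cgi-bin/getExpVals.py"
LEGAL_VERSIONS = {"2_0", "3_0", "3_1", "3_2"}


def build_query_url(probe_sets, version):
    """Return a getExpVals.py query URL for a list of ATH1 probe set ids.

    build_query_url is a hypothetical helper written for this example.
    """
    if not probe_sets:
        raise ValueError("at least one probe set id is required")
    if version not in LEGAL_VERSIONS:
        raise ValueError(f"unknown data release: {version!r}")
    # safe="," keeps the commas literal so the URL matches the documented form.
    query = urlencode({"pss": ",".join(probe_sets), "version": version}, safe=",")
    return f"{BASE_URL}?{query}"


print(build_query_url(["249678_at", "249679_at"], "2_0"))
```

Called with the two probe sets for AT5G35980 and release 2_0, the function should produce the same URL as the example above.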

When you click the above URL, the Web service will run and send the output to your browser window. The results are returned as a comma-separated, plain text table of data. You can then save what appears in your browser as a plain text file and open it later in Excel, R, TableView, or any other program that can handle this simple, table-based format.

Each row in the returned data represents a single 'CEL' file or slide. Columns (and headings) include:

Column header   Description
cel             CEL file name (from AffyWatch DVDs/CDs)
ps1             Expression values for the first probe set in your query
ps2             Expression values for the second probe set in your query
....            and so on
exp             Experiment id (a number) assigned by NASC
slide           The name of the slide as it existed when data were harvested from NASC and an AffyWatch release. Note that sometimes these names do not match the slide names listed on the NASC Web site. It appears that NASCArrays changes the names of individual slides from time to time; we do our best to track these, but we can't guarantee that all slide names reported by the Web service will match slide names reported on NASC Experiment-level pages.
ks              Kolmogorov-Smirnov (K-S) goodness-of-fit test value computed for a slide.
url             URL referencing a Web page at NASC that describes the experiment
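For readers who want to load this table in their own programs, here is a minimal Python sketch of one way to fetch and parse the output. Several things in it are assumptions rather than facts about the service: that the first line of the output is a header row, that the probe set columns are whichever columns are not named cel, exp, slide, ks, or url, and that the text decodes as UTF-8. The helper names and the SAMPLE text are invented for illustration; the numbers in SAMPLE are placeholders, not real expression values.

```python
import csv
import io
from urllib.request import urlopen

# Column headers that do not hold probe set expression values (see the table above).
FIXED_COLUMNS = {"cel", "exp", "slide", "ks", "url"}

# Made-up placeholder output, used only to show the expected layout.
SAMPLE = (
    "cel,249678_at,249679_at,exp,slide,ks,url\n"
    "example_1.CEL,7.5,6.25,1,example slide 1,0.05,http://example.org/exp1\n"
    "example_2.CEL,8.0,6.5,1,example slide 2,0.2,http://example.org/exp1\n"
)


def fetch_text(url):
    """Download the service output as text (assumes UTF-8-compatible output)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def parse_expression_table(text):
    """Return (probe set column names, list of per-slide row dicts)."""
    reader = csv.DictReader(io.StringIO(text))
    # Any column that is not one of the fixed headers is treated as a probe set.
    probe_columns = [c for c in (reader.fieldnames or []) if c not in FIXED_COLUMNS]
    rows = []
    for record in reader:
        row = dict(record)
        for column in probe_columns:
            row[column] = float(row[column])  # expression values become numbers
        row["ks"] = float(row["ks"])          # K-S statistic becomes a number
        rows.append(row)
    return probe_columns, rows


probe_columns, rows = parse_expression_table(SAMPLE)
for row in rows:
    print(row["cel"], [row[c] for c in probe_columns], row["ks"])
```

To work with live data instead of the placeholder text, pass the result of fetch_text(query_url) to parse_expression_table, where query_url is a URL in the syntax described earlier.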
 

In the R statistical analysis environment, you can access and plot these data through the ReST-style Web service by passing a query URL to a table-reading function such as read.csv.



BioMoby

BioMoby Web services are mainly for bioinformatics developers to use in building software. If the ReST-style services are for "power users" who want to import our data into their own data analysis environments, then the BioMoby services are for "extra-power-users", people who want to build their own software that can use our data as inputs. To read more about BioMoby and how it works, visit the BioMoby Web site.

BioMoby offers a way for programmers to access data over the internet using syntax that hides the details of connecting to the Web, accessing URLs, and retrieving the data. BioMoby uses libraries and concepts from SOAP (Simple Object Access Protocol), but it adds to SOAP by creating new biology-specific data types. Another way BioMoby adds to the SOAP concept is by creating and maintaining a centralized registry of data types and Web services, which allows programmers to build tools that automatically find out what types of services are available to operate on specific data types.

Some examples: One of the niftiest (we think) applications for this idea is the Taverna workflow program, which allows users to chain together Web services into relatively complex data processing pipelines. LitRep is another example of what the BioMoby system makes possible; LitRep combines results from several different services into a single resource. We find that it is particularly useful for researching the functions of individual Arabidopsis genes.

Our contributions thus far: we attended an NSF-funded workshop on Web services at the J. Craig Venter Institute in May 2007. During the workshop, we set up four new prototype BioMoby Web services that offer programmer-friendly access to data in the co-expression tool database. We hope to do more in the future.

In keeping with the BioMoby style, these Web services are offered as (relatively) simple functions that programmers can incorporate into their own software once they've installed the requisite BioMoby and SOAP third-party libraries.

Co-expression Tool BioMoby services include:

Service (function) name:

  • getAGILocusCodesForProbesetId

    Consumes data type: Primary Input: Object
    Emits data types: Output: BioMoby::String[] (array)
    Clients to consume this service: jMobyClient, Dashboard
    Description: Consumes an Affymetrix ATH1 probe set id and emits a list of AGI locus codes, using probe set-to-AGI code mappings provided by the Arabidopsis Information Resource.


  • getProbesetIdsForAGILocusCodeService

    Consumes data type: Primary Input: Object
    Emits data types: Output: BioMoby::String[] (array)
    Clients to consume this service: jMobyClient, Dashboard
    Description: Consumes an AGI locus code and emits a list of Affymetrix probe set ids, using probe set-to-AGI code mappings provided by the Arabidopsis Information Resource.


  • getGeneExpressionDataForNASCExperimentID

    Consumes data type: Primary Input: NASCArraysReferenceNumber; Secondary Input: image_processing_algorithm [RMA,MAS5,GCRMA]
    Emits data types: Output: Collection of multi_slide_expression_values
    Clients to consume this service: jMobyClient, Dashboard
    Description: Consumes an NASCArrays experiment id (numeric) and returns a matrix of Data Release 3.0 expression values for each array (slide) in the experiment.


  • getKolmogrovQualityControlStatisticForNASCExperimentId

    Consumes data type: Primary Input: NASCArraysReferenceNumber; Secondary Input: image_processing_algorithm [RMA,MAS5,GCRMA]
    Emits data types: Output: Collection of simple_key_value_pair
    Clients to consume this service: jMobyClient, Dashboard
    Description: Consumes an NASCArrays experiment id (numeric) and returns CEL file names and their corresponding KS-D statistics, for Data Release 3.0.


As always, your comments and feedback are welcomed!