Webservices

Using the UniChem web service API, users can retrieve data from the UniChem database programmatically. This page documents the currently supported functionality and defines the expected inputs and outputs of each method, together with examples for each one (which may be tested most simply by pasting into a browser, or by using an open source tool such as RESTClient from WizTools).

Please note that although small data sets are used in the examples wherever possible, occasionally the only examples possible are with larger sets, which may take some time to retrieve (such as the auxiliary mapping method). To help you get started, we have provided some example cURL requests (below). Note that for small, ad hoc queries, users should consider simply using the Home / Search page.

For whole source mapping, users are strongly advised to download the pre-generated mappings as gzip files from the downloads site, rather than use the web services mapping method described below.

Constructing Queries.

All RESTful queries are constructed using the following 'stem' or 'base' url:
https://www.ebi.ac.uk/unichem/rest/

Specific query urls are all constructed by adding a method name to this base url, followed by input data, as shown in the API methods section below.

Input data may be of three types:

  • src_compound_id
  • src_id
  • InChIKey

src_ids are integers that represent the various sources in UniChem. A list of valid src_ids can be found either on the sources page or by using the 'Get all src_ids' method below.

For some methods, a 'to_src_id' may be required. This is simply a normal src_id that specifies the source to which the output is mapped.

src_compound_ids are the individual compound identifiers provided by each of the sources. Since src_compound_ids may be ambiguous across different sources, most API methods requiring a src_compound_id as an input also require the specification of a corresponding src_id. However, a few methods expect an 'orphan src_compound_id', which is simply a normal src_compound_id for which the src_id is not specified. For example, '1234' and 'CHEMBL121' are both termed 'orphan src_compound_ids' if their corresponding src_ids are not specified. Some methods accepting 'orphan src_compound_ids' serve to identify possible src_ids for the orphan src_compound_id (eg: '4, 22, 31' and '1', respectively, using the previous example). Other methods first identify possible src_ids for the orphan src_compound_id, and then generate all possible mappings to other src_compound_id/src_id pairs for each of those possible src_ids (eg: method 'Get mappings for orphan src_compound_id').

Note also that src_compound_ids are treated in a case-sensitive manner throughout UniChem (see here for formatting information on search terms).
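The url construction described in this section can be sketched in Python as follows. This is a minimal illustration, not part of the UniChem service: the helper name `build_url` is hypothetical, and percent-encoding each parameter is an assumption made here so that an identifier containing reserved characters remains a single path segment.

```python
from urllib.parse import quote

# Base url quoted from the documentation above.
BASE_URL = "https://www.ebi.ac.uk/unichem/rest/"


def build_url(method, *params):
    """Join the base url, a method name and its input data into a query url.

    'build_url' is a hypothetical helper, not part of any UniChem library.
    """
    # Assumption: each parameter is percent-encoded so it stays one path segment.
    # Case is preserved, since src_compound_ids are case-sensitive.
    segments = [method] + [quote(str(p), safe="") for p in params]
    return BASE_URL + "/".join(segments)


print(build_url("structure", "CHEMBL12", 1))
print(build_url("src_compound_id", "CHEMBL12", 1, 2))
```

Passing the src_id as an integer or a string gives the same url, because every parameter is converted to text before it is joined.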

Content Negotiation.

The default data serialization format is JSON. However, alternative content types are available (eg: XML). These may be selected by specifying the content type in the HTTP Accept header (eg: 'text/xml' for XML). JSONP may be returned without specifying an Accept header, simply by appending '?callback=xyz' to the request url (where 'xyz' is defined by the user). In this case, the returned object will have the form xyz(serializedJSON). See examples at the foot of this page.
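As a rough sketch of selecting a content type from Python, the snippet below builds a request object with an Accept header using only the standard library. The helper name `make_request` is hypothetical, and the request is only constructed here, not sent.

```python
from urllib.request import Request


def make_request(url, content_type=None):
    """Build (but do not send) a GET request, optionally naming a content type.

    'make_request' is a hypothetical helper. With no content_type, no Accept
    header is set and the service default (JSON) applies.
    """
    headers = {"Accept": content_type} if content_type else {}
    return Request(url, headers=headers)


req = make_request("https://www.ebi.ac.uk/unichem/rest/structure/CHEMBL12/1", "text/xml")
print(req.get_method(), req.full_url)
print(req.get_header("Accept"))
```

The request can then be sent with `urllib.request.urlopen(req)`, which needs network access and is therefore left out of the sketch.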

Error Codes.

| HTTP Response Code | Summary | Description |
| --- | --- | --- |
| 200 | OK | The request to the web service completed successfully. This includes valid requests that happen to return empty data sets. |
| 400 | Bad Request | The parameters passed to the API endpoint were deemed invalid. This response will be returned for: 1. invalid API method names, 2. valid method names with invalid numbers of parameters, or 3. InChIKeys which do not match the pattern of a Standard InChIKey version 1. |
| 404 | Not Found | The resource corresponding to the supplied parameters does not exist. This response will be returned if the inputted data type (src_compound_id, src_id, or InChIKey matching the pattern of a Standard InChIKey version 1) does not exist in UniChem. |
| 500 | Service Unavailable | An internal problem prevented us from fulfilling your request. |
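A client might translate these response codes into errors along the following lines. This is only a sketch: the exception class `UniChemError` and the function `check_status` are hypothetical names, and the hint texts paraphrase the descriptions above.

```python
class UniChemError(Exception):
    """Hypothetical exception type for non-200 responses."""


# Hints paraphrased from the error code descriptions above.
HINTS = {
    400: "Bad Request: check the method name, the number of parameters and the InChIKey format",
    404: "Not Found: the supplied identifier does not exist in UniChem",
    500: "An internal problem prevented the request from being fulfilled",
}


def check_status(status_code):
    """Return None for 200; raise UniChemError with a hint for any other code."""
    if status_code == 200:
        # Note: a 200 response may still carry an empty data set.
        return None
    raise UniChemError(HINTS.get(status_code, f"Unexpected HTTP status {status_code}"))


for code in (200, 400, 404):
    try:
        check_status(code)
        print(code, "ok")
    except UniChemError as err:
        print(code, err)
```

Because a 200 response can legitimately be empty, callers should test the returned data for emptiness separately rather than relying on the status code alone.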

Methods

Get src_compound_ids from src_compound_id

Description:

  • Obtain a list of all src_compound_ids from all sources which are CURRENTLY assigned to the same structure as a currently assigned query src_compound_id.
  • The output will include query src_compound_id if it is a valid src_compound_id with a current assignment.
  • Adding an additional (optional) argument (a valid src_id) restricts the results to only the source specified by this argument.

Extension of base url: src_compound_id
Number of required input parameters: 2 or 3
Input: /src_compound_id/src_id(/to_src_id)
Output: list of two element arrays, containing 'src_compound_id' and 'src_id', or (if optional 'to_src_id' is specified) list of 'src_compound_id's.
Example: https://www.ebi.ac.uk/unichem/rest/src_compound_id/CHEMBL12/1
Example: https://www.ebi.ac.uk/unichem/rest/src_compound_id/CHEMBL12/1/2
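The sketch below shows one way the returned list might be grouped by source once it has been decoded. It assumes that each element arrives as a JSON object with the keys 'src_compound_id' and 'src_id', by analogy with the keyed objects in the cURL examples on this page. The sample body is made up for illustration (only CHEMBL12 / src_id 1 comes from this page), and `group_by_source` is a hypothetical helper.

```python
import json
from collections import defaultdict

# Made-up response body for illustration; 'EXAMPLE-ID' is not a real identifier.
# Assumption: elements are JSON objects keyed 'src_compound_id' and 'src_id'.
sample_body = (
    '[{"src_compound_id": "CHEMBL12", "src_id": "1"},'
    ' {"src_compound_id": "EXAMPLE-ID", "src_id": "2"}]'
)


def group_by_source(records):
    """Map each src_id to the list of src_compound_ids reported for it."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["src_id"]].append(record["src_compound_id"])
    return dict(grouped)


by_source = group_by_source(json.loads(sample_body))
for src_id, compound_ids in by_source.items():
    print(src_id, compound_ids)
```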

Get src_compound_ids all from src_compound_id

Description:

  • Obtain a list of all src_compound_ids from all sources, including BOTH current AND obsolete assignments, which are assigned to the same structure as a currently assigned query src_compound_id.
  • The output will include query src_compound_id if it is a valid src_compound_id with a current assignment.
  • Adding an additional (optional) argument (a valid src_id) restricts the results to only the source specified by this argument.

Extension of base url: src_compound_id_all
Number of required input parameters: 2 or 3
Input: /src_compound_id/src_id(/to_src_id)
Output: list of three element arrays, containing 'src_compound_id', 'src_id' and 'Assignment', or (if optional 'to_src_id' is specified) list of two element arrays, containing 'src_compound_id' and 'Assignment'.
Example: https://www.ebi.ac.uk/unichem/rest/src_compound_id_all/CHEMBL12/1
Example: https://www.ebi.ac.uk/unichem/rest/src_compound_id_all/CHEMBL12/1/2
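Because this method mixes current and obsolete assignments, callers will usually want to separate them. The sketch below rests on two assumptions that are not stated on this page: that the 'Assignment' element is exposed under the key 'assignment', and that the value "1" marks a current assignment. Both are parameters of the hypothetical helper `split_by_assignment`, so they can be changed to match the actual response. The sample records are made up.

```python
# Made-up records for illustration only.
sample_records = [
    {"src_compound_id": "CHEMBL12", "src_id": "1", "assignment": "1"},
    {"src_compound_id": "EXAMPLE-OLD-ID", "src_id": "2", "assignment": "0"},
]


def split_by_assignment(records, key="assignment", current_value="1"):
    """Split records into (current, obsolete) lists.

    Assumption: 'key' names the Assignment field and 'current_value' marks
    a current assignment; any other value is treated as obsolete.
    """
    current, obsolete = [], []
    for record in records:
        if str(record.get(key)) == current_value:
            current.append(record)
        else:
            obsolete.append(record)
    return current, obsolete


current, obsolete = split_by_assignment(sample_records)
print(len(current), "current;", len(obsolete), "obsolete")
```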

Get mapping

Description: Obtain a full mapping between two sources. Uses only currently assigned src_compound_ids from both sources.
Extension of base url: mapping
Number of required input parameters: 2
Input: /src_id/to_src_id
Output: list of two element arrays, each containing a pair of mapped 'src_compound_id's, one from each of the two sources.
Example: https://www.ebi.ac.uk/unichem/rest/mapping/4/1

Get src_compound_ids from InChIKey

Description: Obtain a list of all src_compound_ids (from all sources) which are CURRENTLY assigned to a query InChIKey
Extension of base url: inchikey
Number of required input parameters: 1
Input: /InChIKey
Output: list of two element arrays, containing 'src_compound_id' and 'src_id'.
Example: https://www.ebi.ac.uk/unichem/rest/inchikey/AAOVKJBEBIDNHE-UHFFFAOYSA-N
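Since a malformed InChIKey results in a 400 response, it can be worth checking the format locally before sending a query. The sketch below uses the general layout of a Standard InChIKey version 1 (14 upper-case letters, a hyphen, 8 upper-case letters followed by 'S' and 'A', a hyphen, and one final upper-case letter). Whether the service applies exactly this pattern is an assumption, and `looks_like_standard_inchikey` is a hypothetical helper.

```python
import re

# Layout of a Standard InChIKey version 1: the 'S' flags a standard InChI
# and the following 'A' is the version character.
STANDARD_INCHIKEY = re.compile(r"[A-Z]{14}-[A-Z]{8}SA-[A-Z]")


def looks_like_standard_inchikey(key):
    """Return True if 'key' has the shape of a Standard InChIKey version 1."""
    return STANDARD_INCHIKEY.fullmatch(key) is not None


for candidate in ("AAOVKJBEBIDNHE-UHFFFAOYSA-N", "aaovkjbebidnhe-uhfffaoysa-n", "CHEMBL12"):
    print(candidate, looks_like_standard_inchikey(candidate))
```

The check is purely syntactic: a key that passes may still be absent from UniChem, in which case the service is documented to answer with 404.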

Get src_compound_ids all from InChIKey

Description: Obtain a list of all src_compound_ids (from all sources) which have current AND obsolete assignments to a query InChIKey
Extension of base url: inchikey_all
Number of required input parameters: 1
Input: /InChIKey
Output: list of three element arrays, containing 'src_compound_id', 'src_id' and 'Assignment'.
Example: https://www.ebi.ac.uk/unichem/rest/inchikey_all/AAOVKJBEBIDNHE-UHFFFAOYSA-N

Get all src_ids

Description: Obtain all src_ids currently in UniChem
Extension of base url: src_ids
Number of required input parameters: 0
Input: - none -
Output: list of 'src_id's.
Example: https://www.ebi.ac.uk/unichem/rest/src_ids/

Get source information

Description: Obtain all information on a source by querying with a source id (src_id).
Extension of base url: sources
Number of required input parameters: 1
Input: /src_id
Output: list containing:

  • src_id (the src_id for this source).
  • src_url (the main home page of the source).
  • name (the unique name for the source in UniChem, always lower case).
  • name_long (the full name of the source, as defined by the source).
  • name_label (A name for the source suitable for use as a 'label' for the source within a web-page. Correct case setting for source, and always less than 30 characters).
  • description (a description of the content of the source).
  • base_id_url_available (a flag indicating whether this source provides a valid base_id_url for creating cpd-specific links [1=yes, 0=no]).
  • base_id_url (the base url for constructing hyperlinks to this source [append an identifier from this source to the end of this url to create a valid url to a specific page for this cpd], unless aux_for_url=1).
  • aux_for_url (a flag to indicate whether the aux_src field should be used to create hyperlinks instead of the src_compound_id [1=yes, 0=no]).

Example: https://www.ebi.ac.uk/unichem/rest/sources/1
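The way the base_id_url, base_id_url_available and aux_for_url fields combine into a compound-specific link can be sketched as below. The source record is abridged from the ChEMBL response shown in Example 1 at the foot of this page, and `compound_link` is a hypothetical helper. Returning None when aux_for_url=1 is a simplification made here, since in that case the link has to be built from auxiliary data instead.

```python
# Abridged from the 'sources/1' response shown in Example 1 on this page.
chembl_source = {
    "name": "chembl",
    "src_id": "1",
    "base_id_url": "https://www.ebi.ac.uk/chembldb/compound/inspect/",
    "base_id_url_available": "1",
    "aux_for_url": "0",
}


def compound_link(source, src_compound_id):
    """Build a link to a compound page, or return None if one cannot be built.

    Flags are compared as text because the example response quotes them.
    """
    if str(source.get("base_id_url_available")) != "1":
        return None  # the source offers no cpd-specific links
    if str(source.get("aux_for_url")) == "1":
        return None  # simplification: auxiliary data would be needed here
    return source["base_id_url"] + src_compound_id


print(compound_link(chembl_source, "CHEMBL12"))
```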

Get structure

Description: Obtain structure(s) CURRENTLY assigned to a query src_compound_id.
Extension of base url: structure
Number of required input parameters: 2
Input: /src_compound_id/src_id
Output: list of two element arrays, containing 'Standard InChI', and 'Standard InChIKey'
Example: https://www.ebi.ac.uk/unichem/rest/structure/CHEMBL12/1

Get structure all

Description: Obtain structure(s) with current AND obsolete assignments to a query src_compound_id.
Extension of base url: structure_all
Number of required input parameters: 2
Input: /src_compound_id/src_id
Output: list of three element arrays, containing 'Standard InChI', 'Standard InChIKey', and 'Assignment'
Example: https://www.ebi.ac.uk/unichem/rest/structure_all/CHEMBL12/1

Get URL for src_compound_ids from src_compound_id

Description:

  • Obtain a list of URLs for all src_compound_ids, from a specified source (the 'to_src_id'), which are CURRENTLY assigned to the same structure as a currently assigned query src_compound_id.
  • Method only applicable for sources which support direct URLs to src_compound_id pages.
  • Method also applicable for 'to_src_id's where the hyperlink is constructed from auxiliary data [and not from the src_compound_id] as per example 2 below.

Extension of base url: src_compound_id_url
Number of required input parameters: 3
Input: /src_compound_id/src_id/to_src_id
Output: list of URLs.
Example: https://www.ebi.ac.uk/unichem/rest/src_compound_id_url/CHEMBL12/1/2
Example: https://www.ebi.ac.uk/unichem/rest/src_compound_id_url/CHEMBL490/1/15

Get src_compound_ids all from obsolete src_compound_id

Description:

  • Obtain a list of all src_compound_ids from all sources with BOTH current AND obsolete assignments to the same structure that has an obsolete assignment to the query src_compound_id.
  • The output will include query src_compound_id if it is a valid src_compound_id with an obsolete assignment.
  • Adding an additional (optional) argument (a valid src_id) restricts the results to only the source specified by this argument.

Extension of base url: src_compound_id_all_obsolete
Number of required input parameters: 2 or 3
Input: /src_compound_id/src_id(/to_src_id)
Output: list of four element arrays, containing 'src_compound_id', 'src_id', 'Assignment' and 'InChIKey', or (if optional 'to_src_id' is specified) list of three element arrays, containing 'src_compound_id', 'Assignment' and 'UCI'.
Example: https://www.ebi.ac.uk/unichem/rest/src_compound_id_all_obsolete/DB07699/2
Example: https://www.ebi.ac.uk/unichem/rest/src_compound_id_all_obsolete/DB07699/2/1

Get verbose src_compound_ids from InChIKey

Description:

  • Obtain all src_compound_ids (from all sources) which are CURRENTLY assigned to a query InChIKey.
  • However, these are returned as part of the following data structure:
    • A list of sources containing these src_compound_ids, including source description, base_id_url, etc.
    • One element in this list is a list of the src_compound_ids currently assigned to the query InChIKey.

Extension of base url: verbose_inchikey
Number of required input parameters: 1
Input: /InChIKey
Output: list containing:

  • src_id (the src_id for this source).
  • src_url (the main home page of the source).
  • name (the unique name for the source in UniChem, always lower case).
  • name_long (the full name of the source, as defined by the source).
  • name_label (a name for the source suitable for use as a 'label' for the source within a web page. Correct case setting for source, and always less than 30 characters).
  • description (a description of the content of the source).
  • base_id_url_available (a flag indicating whether this source provides a valid base_id_url for creating cpd-specific links [1=yes, 0=no]).
  • base_id_url (the base url for constructing hyperlinks to this source [append an identifier from this source to the end of this url to create a valid url to a specific page for this cpd], unless aux_for_url=1).
  • aux_for_url (a flag to indicate whether the aux_src field should be used to create hyperlinks instead of the src_compound_id [1=yes, 0=no]).
  • src_compound_id (a list of src_compound_ids from this source which are currently assigned to the query InChIKey).
  • aux_src (a list of src_compound_id keys mapping to corresponding auxiliary data (url_id:value), for creating links if aux_for_url=1. Only shown if aux_for_url=1).

Example: https://www.ebi.ac.uk/unichem/rest/verbose_inchikey/GZUITABIAKMVPG-UHFFFAOYSA-N

Get auxiliary mappings

Description:

  • For a single source, obtain a mapping between all current src_compound_ids to their corresponding auxiliary data.
  • See FAQ for an explanation of 'auxiliary data'. Please note that the only examples available (below) for this method may be very large data sets, and may therefore take a very long time to retrieve.
  • A much faster method of retrieving this same data set is to download the pre-cached, gzipped mapping file for the source of interest from the Auxiliary Data Mapping page.

Extension of base url: mappingaux
Number of required input parameters: 1
Input: /src_id
Output: list of two element arrays, containing 'src_compound_id' and 'auxiliary data'.
Example: https://www.ebi.ac.uk/unichem/rest/mappingaux/20

Get Connectivity data from InChIKey

Description:

  • One of two 'Connectivity-Based' Searching methods. Because of the complexity of these methods, details are shown elsewhere.
  • Follow link in 'Details' below.
  • This method replaces the now deprecated 'Get verbose src_compound_ids from InChIKey' method above.

Extension of base url: key_search
Number of required input parameters: 1 and optional criteria A-H
Input: /InChIKey/A/B/C/D/E/F/G/H
Detail: https://www.ebi.ac.uk/unichem/info/widesearchInfo

Get Connectivity data from src_compound_id

Description: One of two 'Connectivity-Based' Searching methods. Because of the complexity of these methods, details are shown elsewhere. Follow link in 'Details' below.
Extension of base url: cpd_search
Number of required input parameters: 2 and optional criteria A-H
Input: /src_compound_id/src_id/A/B/C/D/E/F/G/H
Detail: https://www.ebi.ac.uk/unichem/info/widesearchInfo

Get InChI from InChIKey

Description: Obtain InChI for InChIKey
Extension of base url: inchi
Number of required input parameters: 1
Input: /InChIKey
Output:
Example: https://www.ebi.ac.uk/unichem/rest/inchi/AAOVKJBEBIDNHE-UHFFFAOYSA-N

Get Current Release Information

Description: Obtain Information about the Current Release
Extension of base url: release
Number of required input parameters: 0
Input: - none -
Output: list containing:

  • release (the current release number [aka UDRI (UniChem Data Release ID)]).
  • date (the date of the current release).
  • source_count (The number of sources in the current release).
  • structure_count (The number of structures in the current release).
  • xref_count (The number of XREF records [aka assignments] in the current release).
  • xref_count_current (The number of Current XREF records [aka current assignments] in the current release).
  • xref_count_obsolete (The number of Obsolete XREF records [aka obsolete assignments] in the current release).

Example: https://www.ebi.ac.uk/unichem/rest/release/

Get src_ids for orphan src_compound_id

Description: Obtain a non-redundant list of all src_ids which have current assignments for an 'orphan src_compound_id' (see definition in 'Constructing Queries', above).
Extension of base url: orphanIdSource
Number of required input parameters: 1
Input: /src_compound_id
Output: list of possible 'src_id's for the query orphan src_compound_id
Example: https://www.ebi.ac.uk/unichem/rest/orphanIdSource/1234
Example: https://www.ebi.ac.uk/unichem/rest/orphanIdSource/AIN

Get mappings for orphan src_compound_id

Description:

  • Obtain mappings between the query 'orphan src_compound_id' (see definition in 'Constructing Queries', above) and all other src_compound_ids from all other sources.
  • A separate set of mappings is supplied for each possible src_id identified for the orphan src_compound_id.
  • Only current mappings are shown (ie: mappings between src_compound_ids with current assignments).
  • Optionally, returned data may be filtered to give only mappings TO a particular src_id of interest by adding a 'to_src_id' as an additional argument. Thus appending the additional argument '1' to a query will return only mappings TO ChEMBL (see 3rd example below).
  • UCI (UniChem Identifiers) are also included for each mapping in case a query orphan src_compound_id were to be mapped to multiple structures in a single source (unlikely). In this case the UCI can be used to distinguish the mappings for the different structures.

Extension of base url: orphanIdMap
Number of required input parameters: 1 or 2
Input: /src_compound_id(/to_src_id)
Output: list of possible 'src_id's for the query orphan src_compound_id, each one a key to a mapping between this 'orphan src_compound_id / src_id' combination and other src_compound_ids and src_ids.
Example: https://www.ebi.ac.uk/unichem/rest/orphanIdMap/1234
Example: https://www.ebi.ac.uk/unichem/rest/orphanIdMap/AIN
Example: https://www.ebi.ac.uk/unichem/rest/orphanIdMap/1234/1

Example cURL requests

The example cURL requests illustrate how the web services may be queried programmatically. In each case, the cURL request is shown, then the retrieved data.

Example 1: Retrieve Source information

Retrieve information relating to the source 'ChEMBL' (src_id = 1). Note that JSON is returned (the default serialization method).

curl https://www.ebi.ac.uk/unichem/rest/sources/1
[{"name":"chembl","description":"A database of bioactive drug-like small molecules and associated bioactivities abstracted from the scientific literature","name_long":"ChEMBL","base_id_url":"https://www.ebi.ac.uk/chembldb/compound/inspect/","src_id":"1","aux_for_url":"0","base_id_url_available":"1","name_label":"ChEMBL","src_url":"https://www.ebi.ac.uk/chembl/"}]

Example 2: Retrieve Structure Information

Retrieve the structure currently assigned to CHEMBL12 (from src_id = 1). Note that JSON is returned (the default serialization method).

curl https://www.ebi.ac.uk/unichem/rest/structure/CHEMBL12/1
[{"standardinchikey":"AAOVKJBEBIDNHE-UHFFFAOYSA-N","standardinchi":"InChI=1S/C16H13ClN2O/c1-19-14-8-7-12(17)9-13(14)16(18-10-15(19)20)11-5-3-2-4-6-11/h2-9H,10H2,1H3"}]

Example 3: Retrieve JSONP

Retrieve the structure currently assigned to CHEMBL12 (from src_id = 1), as per example 2, but append a callback to return JSONP.

curl 'https://www.ebi.ac.uk/unichem/rest/structure/CHEMBL12/1?callback=xyz'
xyz([{"standardinchikey":"AAOVKJBEBIDNHE-UHFFFAOYSA-N","standardinchi":"InChI=1S/C16H13ClN2O/c1-19-14-8-7-12(17)9-13(14)16(18-10-15(19)20)11-5-3-2-4-6-11/h2-9H,10H2,1H3"}]);
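Outside a browser, a JSONP response has to be unwrapped before it can be parsed as JSON. The following sketch strips the callback wrapper from the response shown above; `unwrap_jsonp` is a hypothetical helper, and it assumes the response has exactly the form callback(serializedJSON) with an optional trailing semicolon.

```python
import json

# Response text copied from the JSONP example above.
jsonp_text = (
    'xyz([{"standardinchikey":"AAOVKJBEBIDNHE-UHFFFAOYSA-N",'
    '"standardinchi":"InChI=1S/C16H13ClN2O/c1-19-14-8-7-12(17)9-13(14)'
    '16(18-10-15(19)20)11-5-3-2-4-6-11/h2-9H,10H2,1H3"}]);'
)


def unwrap_jsonp(text, callback):
    """Strip 'callback(' and the closing ')' (plus optional ';') and parse the JSON."""
    text = text.strip()
    prefix = callback + "("
    if not text.startswith(prefix):
        raise ValueError("response does not start with the expected callback")
    body = text[len(prefix):].rstrip(";").rstrip()
    if not body.endswith(")"):
        raise ValueError("response does not end with a closing parenthesis")
    return json.loads(body[:-1])


structures = unwrap_jsonp(jsonp_text, "xyz")
print(structures[0]["standardinchikey"])
```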

Example 4: Retrieve XML

Retrieve the structure currently assigned to CHEMBL12 (from src_id = 1), as per example 2, but set the 'Accept Header' to return XML.

curl -H "Accept: text/xml" -H "Content-Type: text/xml" https://www.ebi.ac.uk/unichem/rest/structure/CHEMBL12/1
<opt>
  <data standardinchi="InChI=1S/C16H13ClN2O/c1-19-14-8-7-12(17)9-13(14)16(18-10-15(19)20)11-5-3-2-4-6-11/h2-9H,10H2,1H3" standardinchikey="AAOVKJBEBIDNHE-UHFFFAOYSA-N" />
</opt>
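An XML response of this kind can be read with Python's standard library, as in the sketch below. It assumes the response is well-formed XML with an 'opt' root element that wraps one 'data' element per structure, with the values held in attributes. `parse_structures` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# XML shaped like the Example 4 response, assuming a well-formed 'opt' root.
xml_text = """<opt>
  <data standardinchi="InChI=1S/C16H13ClN2O/c1-19-14-8-7-12(17)9-13(14)16(18-10-15(19)20)11-5-3-2-4-6-11/h2-9H,10H2,1H3" standardinchikey="AAOVKJBEBIDNHE-UHFFFAOYSA-N" />
</opt>"""


def parse_structures(text):
    """Return one dict of attributes for each 'data' element in the response."""
    root = ET.fromstring(text)
    return [dict(element.attrib) for element in root.iter("data")]


parsed = parse_structures(xml_text)
print(parsed[0]["standardinchikey"])
```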