BioSamples (https://www.ebi.ac.uk/biosamples/) stores and supplies descriptions and metadata about biological samples used in research and development by academia and industry. Samples are either 'reference' samples (e.g. from 1000 Genomes, HipSci, FAANG) or have been used in an assay database such as the European Nucleotide Archive (ENA) or ArrayExpress.
BioSamples also synchronizes data with the NCBI BioSample database and imports data from ENA.
This document explains how to install the BioSamples database locally and how to set up a development environment for it.
- Run this in a terminal to install the required software.
sudo apt-get update
sudo apt-get install openjdk-8-jdk maven git docker
- Please make sure the software versions are correct.
mvn -v
# Output:
# Apache Maven 3.6.0
docker -v
# Output:
# Docker version 18.06.1-ce
java -version
# Output:
# openjdk version "1.8.0_222"
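The Java part of this check can also be scripted. The following is a minimal sketch and not part of the BioSamples repository: the helper names `quoted_version` and `is_java8` are hypothetical, and the sketch assumes that Java 8 is the version you want, as suggested by the `openjdk-8-jdk` package and the sample output above.

```shell
#!/bin/sh
# Hypothetical helper: print the first double-quoted version string
# found on standard input, e.g. 1.8.0_222 from `java -version` output.
quoted_version() {
    sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Hypothetical helper: succeed if the given version string is a Java 8
# version (Java 8 reports itself as 1.8.x).
is_java8() {
    case "$1" in
        1.8.*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Usage (java -version writes to stderr, hence 2>&1):
#   is_java8 "$(java -version 2>&1 | quoted_version)" || echo "Java 8 not found"
```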
- Install BioSamples on your computer.
This process builds a locally compiled version of all BioSamples tools. It requires a large download of Spring dependencies and uses up to two threads per core of your machine (the -T 2C option). The installation might take several minutes.
git clone https://github.com/EBIBioSamples/biosamples-v4.git
cd biosamples-v4
mvn -T 2C package
- Start BioSamples on your own machine.
docker-compose up
If it returns
ERROR: Couldn't connect to Docker daemon - you might need to run docker-machine start default
then run the following instead:
sudo docker-compose up
- You can now access the public interface at http://localhost:8081/biosamples/. At this point the local instance does not contain any data.
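The containers can take a while to start, so it may help to poll the public interface before using it. The following is a minimal sketch: the helper name `wait_for_http_ok` is hypothetical, and it assumes that the front page answers with HTTP 200 once the web application is ready.

```shell
#!/bin/sh
# Hypothetical helper: poll a URL until it answers with HTTP 200,
# giving up after a number of attempts (default 30, two seconds apart).
wait_for_http_ok() {
    url=$1
    attempts=${2:-30}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -s: silent, -o /dev/null: discard the body, -w: print only the status code
        code=$(curl -s -o /dev/null -w '%{http_code}' "$url")
        if [ "$code" = "200" ]; then
            return 0
        fi
        i=$((i + 1))
        sleep 2
    done
    return 1
}

# Usage:
#   wait_for_http_ok http://localhost:8081/biosamples/ && echo "BioSamples is up"
```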
- Create an AAP account for API authentication and data upload.
An AAP account is required to upload data through the API. You can register an account at https://explore.aai.ebi.ac.uk/registerUser. Detailed instructions on user accounts and authentication are available at https://www.ebi.ac.uk/biosamples/docs/guides/authentication.
To use the local BioSamples API, replace every occurrence of 'https://aai.ebi.ac.uk' in the authentication guide with 'https://explore.aai.ebi.ac.uk'.
- Upload the first test data.
TOKEN=$(curl -u Username https://explore.api.aai.ebi.ac.uk/auth)
curl 'http://localhost:8081/biosamples/samples' -i -X POST \
  -H "Content-Type: application/json;charset=UTF-8" \
  -H "Accept: application/hal+json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "name" : "FakeSample",
    "update" : "2019-07-16T09:47:20.003Z",
    "release" : "2019-07-16T09:47:20.003Z",
    "domain" : "self.ExampleDomain"
  }'
An example of the JSON format that can be sent by POST to http://localhost:8081/biosamples/beta/samples is at https://github.com/EBIBioSamples/biosamples-v4/blob/master/models/core/src/test/resources/TEST1.json
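If you submit several test samples, it can be convenient to generate the minimal payload instead of editing it by hand. The following is a minimal sketch: the helper names `sample_json` and `now_utc` and the file name `sample.json` are hypothetical, the four fields are the ones used in the curl command above, and values are inserted as-is, without JSON escaping.

```shell
#!/bin/sh
# Hypothetical helper: print a minimal sample JSON document.
# Arguments: sample name, AAP domain, ISO 8601 timestamp
# (used for both "update" and "release"). Values are not escaped.
sample_json() {
    printf '{ "name" : "%s", "update" : "%s", "release" : "%s", "domain" : "%s" }\n' \
        "$1" "$3" "$3" "$2"
}

# Hypothetical helper: current UTC time in the format used above,
# with the milliseconds fixed at .000.
now_utc() {
    date -u +%Y-%m-%dT%H:%M:%S.000Z
}

# Usage:
#   sample_json FakeSample self.ExampleDomain "$(now_utc)" > sample.json
#   curl 'http://localhost:8081/biosamples/samples' -i -X POST \
#     -H "Content-Type: application/json;charset=UTF-8" \
#     -H "Accept: application/hal+json" \
#     -H "Authorization: Bearer $TOKEN" \
#     -d @sample.json
```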
Download the XML dump (~400 MB) to the current directory.
Run the pipeline to send the data to the submission API via REST:
docker-compose up biosamples-pipelines-ncbi
Note: You will need to mount the location that the XML dump was downloaded to within the docker container. A docker-compose.override.yml file is the easiest way to do that.
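One way to create such an override file is sketched below. This is an illustration under stated assumptions, not the project's own configuration: the helper name `override_yaml` is hypothetical, both directory arguments are placeholders (the path the pipeline expects inside the container is not stated here), and the `version: '2'` line assumes that the project's docker-compose.yml uses that file format, so adjust it to match.

```shell
#!/bin/sh
# Hypothetical helper: print a docker-compose.override.yml that mounts a
# host directory into the biosamples-pipelines-ncbi container.
# Arguments: host directory, directory inside the container (both placeholders).
override_yaml() {
    cat <<EOF
version: '2'
services:
  biosamples-pipelines-ncbi:
    volumes:
      - ${1}:${2}
EOF
}

# Usage (the container path /data is an assumption, not a documented location):
#   override_yaml "$PWD" /data > docker-compose.override.yml
```

docker-compose reads docker-compose.override.yml automatically when it sits next to docker-compose.yml, so no extra command-line option is needed.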
You can run pipelines-ena to import ENA samples. To do that, you will need to add some security settings to Maven to get access to Oracle's private driver repository.
You can read more about this at https://blogs.oracle.com/dev2dev/get-oracle-jdbc-drivers-and-ucp-from-oracle-maven-repository-without-ides
A cross-platform, easy-to-use MongoDB management tool is available at http://www.mongoclient.com
Docker can be run from within a virtual machine, e.g. VirtualBox. This is useful if Docker causes problems on your machine or if your OS is not supported.
You might want to share a directory between the virtual machine and the host, so you can work in a standard IDE outside of the VM. VirtualBox supports this.
If you are using a virtual machine, you might also want to configure docker-compose to start by default.
As you make changes to the code, you can recompile it via Maven with:
mvn -T 2C package
And to get the new packages into the docker containers you will need to rebuild containers with:
docker-compose build
If needed, you can rebuild just a single container by specifying its name e.g.
docker-compose build biosamples-pipelines
Starting a service with docker-compose also starts any dependent services it requires. For example,
docker-compose up biosamples-webapp-api
also starts solr, neo4j, mongo, and rabbitmq.
To run an executable in a Docker container and start its dependencies first, use something like:
docker-compose run --service-ports biosamples-pipelines
If you want to add command-line arguments, note that they entirely replace the command defined in the docker-compose.yml file, so you need to repeat the full command, for example:
docker-compose run --service-ports biosamples-pipelines java -jar pipelines-4.0.0-SNAPSHOT.jar --debug
If you want to connect debugging tools to the java applications running inside docker containers, see instructions at http://www.jamasoftware.com/blog/monitoring-java-applications/
Note that you can combine Maven and Docker in a single command line, like:
mvn -T 2C package && docker-compose build && docker-compose run --service-ports biosamples-pipelines
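If you repeat this cycle often, the three steps can be wrapped in a small shell function. The following is a minimal sketch: the function name `rebuild_and_run` and the `MVN` and `COMPOSE` override variables are hypothetical conveniences, and unlike the one-liner above it rebuilds only the container for the service being run.

```shell
#!/bin/sh
# Hypothetical wrapper: recompile, rebuild one container, then run it.
# Argument: service name (defaults to biosamples-pipelines).
# MVN and COMPOSE can be set to use different executables.
rebuild_and_run() {
    service=${1:-biosamples-pipelines}
    # Stop at the first step that fails, like the && chain above
    ${MVN:-mvn} -T 2C package || return 1
    ${COMPOSE:-docker-compose} build "$service" || return 1
    ${COMPOSE:-docker-compose} run --service-ports "$service"
}

# Usage:
#   rebuild_and_run biosamples-pipelines
```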
Beware: Docker tars and copies all the files from the directory containing the docker-compose file downwards. If you have data files there (e.g. downloads from NCBI, Docker volumes, logs), that process can take so long that it makes using Docker impractical.
As docker-compose creates new volumes each time, you may fill the disk docker is working on. To delete all docker volumes use:
docker volume ls -q | xargs -r docker volume rm
To delete all docker images use:
docker images -q | xargs -r docker rmi
Note: this will remove everything, not just things for this project.
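A narrower cleanup is possible by filtering on the volume name. The following is a minimal sketch: the helper name `remove_volumes_matching` is hypothetical, and the pattern to pass depends on how docker-compose named the volumes on your machine (usually a prefix derived from the project directory), so list them with `docker volume ls` first.

```shell
#!/bin/sh
# Hypothetical helper: remove only the volumes whose name contains
# the given string, one at a time.
remove_volumes_matching() {
    pattern=$1
    docker volume ls -q --filter "name=$pattern" |
    while read -r volume; do
        docker volume rm "$volume"
    done
}

# Usage (the pattern "biosamples" is an assumption about your volume names):
#   remove_volumes_matching biosamples
```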
There is a spring client, and a spring-boot starter module, for use with BioSamples. To use these in a maven project, add the following to the appropriate sections:
<dependencies>
  <dependency>
    <groupId>uk.ac.ebi.biosamples</groupId>
    <artifactId>biosamples-spring-boot-starter</artifactId>
    <version>4.0.4</version>
  </dependency>
</dependencies>
<repositories>
  <repository>
    <id>spotnexus</id>
    <url>https://www.ebi.ac.uk/spot/nexus/repository/maven-releases/</url>
  </repository>
</repositories>
The client can then be configured through several Spring application.properties settings, including biosamples.client.uri, which specifies the base URI of the BioSamples instance to use.
The project originally used spring-data-rest to expose a REST API for the repositories. However, this caused a number of problems (see below), so it was scrapped in favor of custom HATEOAS-compliant endpoints.
Content type negotiation is not possible with spring-data-rest, as its URLs can't overlap with those of the Thymeleaf controllers, and it can't serve XML even with the appropriate converters supplied.
Because the JSON is a list of items with optional components, the optional parts can become mixed up on repeated submissions if the list ordering changes. Maybe this can be remedied by using a map of attribute types instead?
Solr has a limit on the field size (technically the term vector). Therefore, attribute values over 255 characters are not indexed in Solr.
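To see in advance which attribute values would be affected by this limit, a simple length check is enough. The following is a minimal sketch: the helper name `values_over_limit` and the input format (one attribute value per line) are assumptions for illustration, and awk's `length` may count bytes rather than characters depending on the implementation and locale.

```shell
#!/bin/sh
# Hypothetical helper: read one attribute value per line on standard input
# and print the values longer than a limit (default 255), prefixed with
# their length.
values_over_limit() {
    awk -v limit="${1:-255}" 'length($0) > limit { print length($0) ": " $0 }'
}

# Usage (values.txt is a hypothetical file with one value per line):
#   values_over_limit < values.txt
```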