Update docker compose and online documentation for Multi-tenant registry #359

Open
wants to merge 4 commits into
base: main
4 changes: 2 additions & 2 deletions docker/.env
Original file line number Diff line number Diff line change
@@ -32,12 +32,12 @@ REG_API_APP_PROPERTIES_FILE=./default-config/application.properties
# --------------------------------------------------------------------

# Docker image of the Registry Loader
REG_LOADER_IMAGE=nasapds/registry-loader:0.4.1
REG_LOADER_IMAGE=nasapds/registry-loader:latest

# --------------------------------------------------------------------
# Registry Sweepers
# --------------------------------------------------------------------
REG_SWEEPERS_IMAGE=nasapds/registry-sweepers:1.2.1
REG_SWEEPERS_IMAGE=nasapds/registry-sweepers:latest
PROV_CREDENTIALS={"admin":"admin"}

# --------------------------------------------------------------------
26 changes: 12 additions & 14 deletions docker/README.md
@@ -5,20 +5,18 @@ please refer to [https://docs.docker.com/compose/](https://docs.docker.com/compo

The docker-compose.yml file contains the following profiles; each profile starts the components shown in the table below.

| Components\\Profiles | dev-api | pds-core-registry | int-registry-batch-loader | int-registry-service-loader | pds-loader-services | pds-batch-loader | pds-service-loader | int-test |
|:-----------------------------------------------------|:-------:|:-----------------:|:-------------------------:|:---------------------------:|:-------------------:|:----------------:|:------------------:|:--------:|
| Elasticsearch | ✓ | ✓ | ✓ | ✓ | | | | |
| Elasticsearch init | ✓ | ✓ | ✓ | ✓ | | | | |
| Registry API | | ✓ | ✓ | ✓ | | | | |
| Registry loader test init | ✓ | | ✓ | | | | | |
| Registry loader | | | | | | ✓ | | |
| Registry API integration tests (postman collection?) | | | ✓ | ✓ | | | | ✓ |
| Rabbitmq | | | | ✓ | ✓ | | | |
| Registry harvest service | | | | ✓ | ✓ | | | |
| Registry crawler service | | | | ✓ | ✓ | | | |
| Registry harvest cli test init | | | | ✓ | | | | |
| Registry harvest cli | | | | | | | ✓ | |
| TLS termination | | ✓ | ✓ | ✓ | | | | |
| Components\\Profiles | dev-api | pds-core-registry | int-registry-batch-loader | pds-batch-loader | int-test |
|:-----------------------------------------------------|:-------:|:-----------------:|:-------------------------:|:-----------------:|:--------:|
| Elasticsearch | ✓ | ✓ | ✓ | | |
| Elasticsearch init | ✓ | ✓ | ✓ | | |
| Registry API | | ✓ | ✓ | | |
| Nginx | ✓ | ✓ | ✓ | | ✓ |
| Registry loader test init | ✓ | | ✓ | | |
| Registry loader | | | | ✓ | |
| Registry API integration tests (postman collection?) | | | ✓ | | ✓ |
| TLS termination | | ✓ | ✓ | | |
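
As an illustration of how the profiles in the updated table might be selected, here is a minimal Python sketch that builds the `docker compose` command line for one profile. The helper name `compose_up_command` and the `KNOWN_PROFILES` set are hypothetical and not part of this repository; the sketch assumes the Docker Compose v2 CLI (`docker compose`) and only constructs the argument list, it does not run anything.

```python
# Profiles taken from the updated table above; treating any other name as
# an error is an assumption of this sketch.
KNOWN_PROFILES = {
    "dev-api",
    "pds-core-registry",
    "int-registry-batch-loader",
    "pds-batch-loader",
    "int-test",
}


def compose_up_command(profile: str, detached: bool = True) -> list[str]:
    """Build the argv that starts every service tagged with `profile`."""
    if profile not in KNOWN_PROFILES:
        raise ValueError(f"unknown profile: {profile!r}")
    argv = ["docker", "compose", "--profile", profile, "up"]
    if detached:
        # -d runs the containers in the background.
        argv.append("-d")
    return argv
```

The returned list can be passed to `subprocess.run` from the `docker` directory. Keeping the command as a list rather than a single string avoids shell quoting issues.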



With the above profiles, docker compose can start components individually
or as a group, as follows. The `-d` option at the end of the commands is used to
2 changes: 1 addition & 1 deletion docker/default-config/application.properties
@@ -29,7 +29,7 @@ openSearch.registryRefIndex=registry-refs
openSearch.timeOutSeconds=60
# , separated list of the prefixes used in the opensearch indices,
# if none, keep this configuration empty.
openSearch.disciplineNodes=
openSearch.disciplineNodes=geo
openSearch.username=admin
openSearch.password=admin
openSearch.ssl=true
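
To make the `openSearch.disciplineNodes` setting more concrete, the following Python sketch shows one way a comma-separated prefix list could be turned into index names. The function names are hypothetical, and the `<prefix>-<index>` naming scheme and the base index name `registry` are assumptions inferred from the `geo-registry` index used elsewhere in this change; this is not the Registry API's actual implementation.

```python
def parse_discipline_nodes(value: str) -> list[str]:
    """Split the comma-separated property value into clean prefixes."""
    return [part.strip() for part in value.split(",") if part.strip()]


def tenant_index_names(prefixes: list[str], base_index: str = "registry") -> list[str]:
    """Return one index name per prefix, or the bare index when there is none.

    ASSUMPTION: names follow "<prefix>-<base_index>", as in "geo-registry".
    """
    if not prefixes:
        # An empty property means no prefix is used.
        return [base_index]
    return [f"{prefix}-{base_index}" for prefix in prefixes]
```

Under these assumptions, `tenant_index_names(parse_discipline_nodes("geo"))` gives `["geo-registry"]`, while an empty property falls back to the unprefixed index.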
1 change: 0 additions & 1 deletion docker/default-config/es-auth.cfg
@@ -1,4 +1,3 @@
trust.self-signed = true
# TODO Warning: Use the default username and password only for testing purposes in local setup
user = admin
password = admin
4 changes: 4 additions & 0 deletions docker/default-config/local_registry.xml
@@ -0,0 +1,4 @@
<?xml version="1.0" encoding="UTF-8"?>
<registry_connection index="geo-registry">
<server_url trust_self_signed="false">https://elasticsearch:9200</server_url>
</registry_connection>
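
For readers who want to inspect such a connection file, here is a small Python sketch that reads the three values it carries. It uses only the standard library; the function name `read_registry_connection` and the returned dictionary layout are hypothetical, and the sketch does not claim to mirror how Registry Manager parses the file.

```python
import xml.etree.ElementTree as ET

# Same content as the file added above, embedded so the sketch is self-contained.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<registry_connection index="geo-registry">
  <server_url trust_self_signed="false">https://elasticsearch:9200</server_url>
</registry_connection>
"""


def read_registry_connection(xml_bytes: bytes) -> dict:
    """Extract the index name, server URL and self-signed flag."""
    root = ET.fromstring(xml_bytes)
    server = root.find("server_url")
    if root.tag != "registry_connection" or server is None:
        raise ValueError("not a registry_connection document")
    return {
        "index": root.get("index"),
        "server_url": (server.text or "").strip(),
        # Treating a missing attribute as "false" is an assumption of this sketch.
        "trust_self_signed": server.get("trust_self_signed", "false").lower() == "true",
    }
```

Calling `read_registry_connection(SAMPLE)` returns the index `geo-registry`, the URL `https://elasticsearch:9200`, and `trust_self_signed` set to `False`.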
2 changes: 1 addition & 1 deletion docker/default-config/opensearch.yml
Original file line number Diff line number Diff line change
@@ -33,7 +33,7 @@ plugins.security.system_indices.enabled: true
plugins.security.system_indices.indices: [".opendistro-alerting-config", ".opendistro-alerting-alert*", ".opendistro-anomaly-results*", ".opendistro-anomaly-detector*", ".opendistro-anomaly-checkpoints", ".opendistro-anomaly-detection-state", ".opendistro-reports-*", ".opendistro-notifications-*", ".opendistro-notebooks", ".opendistro-asynchronous-search-response*", ".replication-metadata-store"]
node.max_local_storage_nodes: 3

#logger.level: DEBUG
logger.level: DEBUG


######## End OpenSearch Security Demo Configuration ########
81 changes: 8 additions & 73 deletions docker/docker-compose.yml
@@ -42,6 +42,7 @@ services:
- ${HARVEST_JOB_CONFIG_FILE}:/cfg/harvest-config.xml
- data-volume:${CONTAINER_HARVEST_DATA_DIR}
- ./default-config/es-auth.cfg:/etc/es-auth.cfg
- ./default-config/local_registry.xml:/etc/local-registry.xml
networks:
- pds

@@ -58,6 +59,7 @@ services:
- data-volume:${CONTAINER_HARVEST_DATA_DIR}
- ./scripts/registry-loader-waits-for-elasticsearch.sh:/usr/local/bin/registry-loader-waits-for-elasticsearch.sh
- ./default-config/es-auth.cfg:/etc/es-auth.cfg
- ./custom-datasets.tar.gz:/etc/custom-datasets.tar.gz
networks:
- pds
entrypoint: /usr/local/bin/registry-loader-waits-for-elasticsearch.sh
@@ -71,19 +73,17 @@ services:
- PROV_CREDENTIALS=${PROV_CREDENTIALS}
- LOGLEVEL=DEBUG
- DEV_MODE=1
- MULTITENANCY_NODE_ID=geo
command: ["/usr/local/bin/sweepers_driver.py", "--legacy-sync"]
networks:
- pds
depends_on:
registry-loader-test-init:
condition: service_completed_successfully




# Starts Elasticsearch
elasticsearch:
profiles: ["dev-api", "dev-loader", "pds-core-registry", "int-registry-batch-loader", "int-registry-service-loader"]
profiles: ["dev-api", "dev-loader", "pds-core-registry", "int-registry-batch-loader", "os"]
image: ${ES_IMAGE}
environment:
- discovery.type=${ES_DISCOVERY_TYPE}
@@ -102,33 +102,21 @@ services:

# Initializes Elasticsearch by creating registry and data dictionary indices by utilizing the Registry Loader
elasticsearch-init:
profiles: ["dev-api", "pds-core-registry", "int-registry-batch-loader", "int-registry-service-loader"]
profiles: ["dev-api", "pds-core-registry", "int-registry-batch-loader"]
image: ${REG_LOADER_IMAGE}
environment:
- ES_URL=${ES_URL}
volumes:
- ./scripts/elasticsearch-init.sh:/usr/local/bin/elasticsearch-init.sh
- ./default-config/es-auth.cfg:/etc/es-auth.cfg
- ./default-config/local_registry.xml:/etc/local_registry.xml
networks:
- pds
entrypoint: ["bash", "/usr/local/bin/elasticsearch-init.sh"]

# Starts RabbitMQ
rabbit-mq:
profiles: ["dev-loader", "int-registry-service-loader", "pds-loader-services"]
image: rabbitmq:3.9-management
ports:
- "15672:15672"
- "5672:5672"
volumes:
- ./default-config/rabbitmq.conf:/etc/rabbitmq/rabbitmq.conf:ro
- ./default-config/rabbitmq-definitions.json:/etc/rabbitmq/definitions.json:ro
networks:
- pds

# Starts the Registry API
registry-api:
profiles: ["dev-loader", "pds-core-registry", "int-registry-batch-loader", "int-registry-service-loader"]
profiles: ["pds-core-registry", "int-registry-batch-loader"]
image: ${REG_API_IMAGE}
environment:
- ES_URL=${ES_URL}
@@ -142,62 +130,9 @@ services:
- pds
command: /usr/local/registry-api-service/registry-api-waits-for-elasticsearch.sh

# Starts the Registry Harvest Service
registry-harvest-service:
profiles: ["int-registry-service-loader", "pds-loader-services"]
image: ${REGISTRY_HARVEST_SERVICE_IMAGE}
environment:
- ES_URL=${ES_URL}
ports:
- "8005:8005"
volumes:
- ${HARVEST_SERVER_CONFIG_FILE}:/cfg/harvest-server.cfg
- data-volume:${CONTAINER_HARVEST_DATA_DIR}
- ./default-config/es-auth.cfg:/etc/es-auth.cfg
networks:
- pds

# Starts the Registry Crawler Service
registry-crawler-service:
profiles: ["int-registry-service-loader", "pds-loader-services"]
image: ${REGISTRY_CRAWLER_SERVICE_IMAGE}
ports:
- "8001:8001"
volumes:
- $CRAWLER_SERVER_CONFIG_FILE:/cfg/crawler-server.cfg
- data-volume:${CONTAINER_HARVEST_DATA_DIR}
networks:
- pds

# Executes the Registry Harvest CLI
registry-harvest-cli:
profiles: ["pds-service-loader"]
image: ${REGISTRY_HARVEST_CLI_IMAGE}
volumes:
- ${HARVEST_JOB_CONFIG_FILE}:/cfg/harvest-job-config.xml
- data-volume:${CONTAINER_HARVEST_DATA_DIR}
- ${HARVEST_CLIENT_CONFIG_FILE}:/cfg/harvest-client.cfg
networks:
- pds

# Executes the Registry Harvest CLI with test data
registry-harvest-cli-test-init:
profiles: ["int-registry-service-loader"]
image: ${REGISTRY_HARVEST_CLI_IMAGE}
environment:
- RUN_TESTS=true
- TEST_DATA_URL=${TEST_DATA_URL}
- TEST_DATA_LIDVID=${TEST_DATA_LIDVID}
volumes:
- ${HARVEST_JOB_CONFIG_FILE}:/cfg/harvest-job-config.xml
- data-volume:${CONTAINER_HARVEST_DATA_DIR}
- ${HARVEST_CLIENT_CONFIG_FILE}:/cfg/harvest-client.cfg
networks:
- pds

# Executes an nginx service to expose the PDS4 archive
registry-web-archive:
profiles: ["dev-loader", "dev-api", "int-registry-batch-loader", "int-registry-service-loader"]
profiles: ["dev-loader", "dev-api", "int-registry-batch-loader"]
image: nginx
ports:
- "81:80"
6 changes: 3 additions & 3 deletions docker/postman/postman_collection.json
@@ -513,7 +513,7 @@
"method": "GET",
"header": [],
"url": {
"raw": "{{baseUrl}}/products?q=(product_class EQ \"Product_Bundle\")",
"raw": "{{baseUrl}}/products?q=(product_class eq \"Product_Bundle\")",
"host": [
"{{baseUrl}}"
],
@@ -523,7 +523,7 @@
"query": [
{
"key": "q",
"value": "(product_class EQ \"Product_Bundle\")"
"value": "(product_class eq \"Product_Bundle\")"
}
]
}
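
The `q` parameter in this request contains spaces, parentheses and double quotes, so it needs percent-encoding when the URL is built outside Postman. The Python sketch below shows one way to do that with the standard library. The function name `products_query_url` is hypothetical, and `http://localhost:8080` is only a placeholder for `{{baseUrl}}`, not a value taken from this repository.

```python
from urllib.parse import quote, urlencode


def products_query_url(base_url: str, q: str) -> str:
    """Return the /products URL with the query expression percent-encoded."""
    # quote_via=quote encodes spaces as %20 instead of "+".
    query = urlencode({"q": q}, quote_via=quote)
    return f"{base_url.rstrip('/')}/products?{query}"


# Placeholder base URL; replace it with the real API endpoint.
url = products_query_url(
    "http://localhost:8080",
    '(product_class eq "Product_Bundle")',
)
```

Decoding the resulting query string with `urllib.parse.parse_qs` gives back the original expression, which is a quick way to check that nothing was lost in encoding.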
@@ -3272,7 +3272,7 @@
"",
"pm.test(\"C2488844 Response contains same number of properties\", () => {",
" const responseJson = pm.response.json();",
" pm.expect(responseJson.length).to.be.eql(143);",
" pm.expect(responseJson.length).to.be.eql(131);",
"});",
"",
"pm.test(\"C2488844 Response property objects follow expected schema\", () => {",
2 changes: 1 addition & 1 deletion docker/scripts/elasticsearch-init.sh
@@ -47,4 +47,4 @@ while ! curl --output /dev/null --silent --head --fail "$ES_URL" -u 'admin:admin
done

echo "Creating registry and data dictionary indices..." 1>&2
registry-manager create-registry -es "$ES_URL" -auth /etc/es-auth.cfg
registry-manager create-registry -es file:///etc/local_registry.xml -auth /etc/es-auth.cfg
Binary file modified docs/source/_static/images/registry_service.png
54 changes: 27 additions & 27 deletions docs/source/about.rst
@@ -14,8 +14,6 @@ The high level architecture of the PDS Registry Service is shown below:

.. image:: _static/images/registry_service.png

Details on the deployment on AWS are given in the :doc:`cloud architecture page</cloud/architecture>`.

The PDS Registry Application is the software which implements the components of the PDS Registry Service.


@@ -24,56 +22,58 @@ The PDS Registry Application

The core functionality for the PDS Registry Application is satisfied by `OpenSearch <https://opensearch.org/>`_.

The high level architecture of PDS Registry Application and its main components is shown below.

.. image:: _static/images/registry-arc.png


API
----

Provides read-only REST APIs to search and access PDS data. You can call REST APIs directly or
use Python or Java clients. More information about PDS API clients is available
`here <https://nasa-pds.github.io/pds-api-client/>`_.
use Python or Java clients.

The most popular client library is `peppi <https://nasa-pds.github.io/peppi>`_.

For direct access, the API is documented `here <https://nasa-pds.github.io/pds-api/guides/search.html>`_.




OpenSearch
-----------

`OpenSearch <https://opensearch.org/>`_ is a NoSQL database based on the Apache Lucene project,
optimized for text search. All metadata extracted from PDS4 labels is stored in OpenSearch database.
optimized for text search. All metadata extracted from PDS4 labels is stored in the OpenSearch database provided by AWS as `OpenSearch Serverless Managed Service <https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless.html>`_.


Authentication/Authorization
-----------------------------

Access to the OpenSearch service is restricted using Cognito usernames and passwords provided by the Engineering Node to the other PDS nodes.
With their login, discipline nodes can write to their own OpenSearch indexes and read from all other indexes.


Harvest
--------

Harvest is a software to crawl and extract metadata from PDS4 labels and to load
extracted information into OpenSearch. There are two versions of Harvest:
extracted information into OpenSearch.

* Standalone command-line tool.
* Scalable Harvest.
This command-line tool doesn't require complex installation and configuration.

**Standalone Harvest**

A command-line tool which doesn't require complex installation and configuration.
This tool is recommended for small data sets of up to 5,000-10,000 of PDS4 labels.
Registry Manager
-----------------

**Scalable Harvest**
A command-line tool to perform admin tasks on a Registry, such as:

Scalable Harvest consists of several server components: RabbitMQ message broker, Crawler server, and Harvest server.
These components can be deployed in the cloud or on-prem. Also there is a Harvest Client command-line tool to submit jobs
to server components asynchronously.
This setup is recommended if you want to process big data sets in parallel.
* Update product archive status.
* Delete products.
* Create or delete registry indices in OpenSearch (by Engineering Node administrators).
* Manage registry data dictionary (by Engineering Node administrators).

.. image:: _static/images/scalable-harvest.png


Registry Manager
-----------------
Registry Client
----------------

A command-line tool to perform admin tasks on a Registry, such as:
A command-line tool which provides full access to the OpenSearch API to handle operations not supported by the previous tools.
The application takes care of the authentication of the user and signs the queries as required by the AWS OpenSearch Serverless Managed Service.

* Create or delete registry indices in OpenSearch.
* Manage registry data dictionary.
* Update product archive status.