Add FileSystem (FS) cloudProvider option to importer #1

Draft
wants to merge 11 commits into
base: main

@3Nigma 3Nigma commented Jan 14, 2022

Story

As a Hedera developer, I would like the hedera-mirror-node to ingest live data generated by a local hedera-services instance, without having to push/pull it through an S3-compatible file service.

Architecture

To make loading rcd files from the local file system possible, a new hedera.mirror.importer.downloader.cloudProvider value called FS (short for FileSystem) has been defined. Using it will most likely also require setting hedera.mirror.importer.network to LOCAL, so that the mirror-node runtime picks up local file-system changes. The implicit root file path is /opt/hedera/services/data, which is the default hedera-services Docker path when spinning up a local Hedera network. This root path can be customized via the hedera.mirror.importer.downloader.bucketName property.

The following diagram illustrates the proposed architecture for this feature:
[Image: hedera-mirror-node-local-cloudprovider-architecture]

The core contribution refactors the Importer > Downloader > S3AsyncClient dependency into a hierarchy based on common functionality. It starts from a FileClient interface that provides simple download and listing capabilities. On top of that, an abstract MultiFileClient class allows bulk downloads of StreamFilenames that match a provided filter predicate. Currently, MultiFileClient exposes downloadSignatureFiles.
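To make the layering concrete, here is a minimal sketch of how FileClient and MultiFileClient could relate. Only the type names and downloadSignatureFiles come from this PR; the method signatures, the downloadMatching helper, the InMemoryFileClient stand-in, the use of plain String names instead of StreamFilename, and the ".rcd_sig" suffix check are all assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Demo entry point; everything in this file is an illustrative sketch.
class FileClientHierarchyDemo {
    public static void main(String[] args) {
        Map<String, byte[]> files = new TreeMap<>();
        files.put("record0.0.3/2022-01-14T15_22_00Z.rcd", new byte[] {1});
        files.put("record0.0.3/2022-01-14T15_22_00Z.rcd_sig", new byte[] {2});
        MultiFileClient client = new InMemoryFileClient(files);
        System.out.println(client.downloadSignatureFiles("record0.0.3/").keySet());
    }
}

// Assumed shape: simple list and download capabilities.
interface FileClient {
    List<String> list(String prefix);   // names of the files under a prefix
    byte[] download(String filename);   // contents of a single file
}

// Bulk downloads built only on top of list() and download().
abstract class MultiFileClient implements FileClient {
    // Hypothetical helper: download every listed file accepted by the filter.
    Map<String, byte[]> downloadMatching(String prefix, Predicate<String> filter) {
        Map<String, byte[]> result = new TreeMap<>();
        for (String name : list(prefix)) {
            if (filter.test(name)) {
                result.put(name, download(name));
            }
        }
        return result;
    }

    // Assumes signature files can be recognized by a ".rcd_sig" suffix.
    Map<String, byte[]> downloadSignatureFiles(String prefix) {
        return downloadMatching(prefix, name -> name.endsWith(".rcd_sig"));
    }
}

// Hypothetical in-memory implementation, used only to exercise the sketch.
final class InMemoryFileClient extends MultiFileClient {
    private final Map<String, byte[]> files;

    InMemoryFileClient(Map<String, byte[]> files) {
        this.files = new TreeMap<>(files);
    }

    @Override
    public List<String> list(String prefix) {
        return files.keySet().stream()
                .filter(name -> name.startsWith(prefix))
                .collect(Collectors.toList());
    }

    @Override
    public byte[] download(String filename) {
        return files.get(filename);
    }
}
```

Because the bulk operation depends only on list and download, an S3-backed and a file-system-backed client can share it unchanged.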

On top of MultiFileClient we add a ParameterizedFileClient, which is basically a MultiFileClient that is aware of the Spring-injected configuration and exposes common properties such as rootPath or pathPrefixFor (to get the node-dependent location prefix).

ParameterizedFileClient branches into an S3FileClient and a LocalFileClient, which do the actual file retrieval work.
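As a rough illustration of the local branch, the sketch below lists and reads files with java.nio. It is not the PR's LocalFileClient: the class name LocalFileClientSketch, the recordStreams/record<nodeAccountId> directory layout, and the startAfter parameter (which mimics how an S3 listing resumes after a known key) are assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Demo entry point: builds a throwaway directory tree and lists it.
class LocalFileClientDemo {
    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("hedera-data");
        LocalFileClientSketch client = new LocalFileClientSketch(root);
        Path nodeDir = Files.createDirectories(client.pathPrefixFor("0.0.3"));
        Files.write(nodeDir.resolve("a.rcd"), new byte[] {1});
        Files.write(nodeDir.resolve("b.rcd"), new byte[] {2});
        System.out.println(client.list("0.0.3", "a.rcd"));
    }
}

final class LocalFileClientSketch {
    private final Path rootPath; // plays the role of the configured root path

    LocalFileClientSketch(Path rootPath) {
        this.rootPath = rootPath;
    }

    // Assumed layout: <root>/recordStreams/record<nodeAccountId>
    Path pathPrefixFor(String nodeAccountId) {
        return rootPath.resolve("recordStreams").resolve("record" + nodeAccountId);
    }

    // Sorted names of regular files that order strictly after startAfter.
    List<String> list(String nodeAccountId, String startAfter) {
        Path dir = pathPrefixFor(nodeAccountId);
        if (!Files.isDirectory(dir)) {
            return List.of(); // the node has not produced any files yet
        }
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.filter(Files::isRegularFile)
                    .map(path -> path.getFileName().toString())
                    .filter(name -> name.compareTo(startAfter) > 0)
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the whole file into memory.
    byte[] download(String nodeAccountId, String filename) {
        try {
            return Files.readAllBytes(pathPrefixFor(nodeAccountId).resolve(filename));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Sorting the names keeps the listing deterministic, since Files.list makes no ordering guarantee.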

On the consuming side, downloading through a FileClient now returns a generic PendingDownload instance which, through a new DownloadResult interface, abstracts away both the object responses returned by the S3 SDK (S3PendingDownload) and local-file downloads (see PendingDownload.SimpleResultForwarder).
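The idea can be sketched as follows: callers only see a pending download that eventually yields bytes, regardless of where those bytes came from. The names PendingDownloadSketch and FakeSdkResponse, the bytes() method, and the use of CompletableFuture are assumptions for illustration and are not taken from the PR's code.

```java
import java.util.concurrent.CompletableFuture;

// Demo entry point: one download backed by local bytes, one by an SDK-like response.
class PendingDownloadDemo {
    public static void main(String[] args) {
        byte[] local = {1, 2, 3};
        PendingDownloadSketch fromDisk = new PendingDownloadSketch("a.rcd",
                CompletableFuture.<DownloadResult>completedFuture(() -> local));
        PendingDownloadSketch fromSdk = new PendingDownloadSketch("b.rcd",
                CompletableFuture.supplyAsync(() -> new FakeSdkResponse(new byte[] {4, 5}))
                        .<DownloadResult>thenApply(response -> response::asByteArray));
        System.out.println(fromDisk.filename() + ": " + fromDisk.join().length + " bytes");
        System.out.println(fromSdk.filename() + ": " + fromSdk.join().length + " bytes");
    }
}

// Assumed shape of the abstraction: anything that can hand over the file bytes.
interface DownloadResult {
    byte[] bytes();
}

// Hypothetical stand-in for an SDK response object; not a real AWS SDK type.
final class FakeSdkResponse {
    private final byte[] payload;

    FakeSdkResponse(byte[] payload) {
        this.payload = payload;
    }

    byte[] asByteArray() {
        return payload;
    }
}

// A download in flight; callers never learn which backend produced it.
final class PendingDownloadSketch {
    private final String filename;
    private final CompletableFuture<DownloadResult> future;

    PendingDownloadSketch(String filename, CompletableFuture<DownloadResult> future) {
        this.filename = filename;
        this.future = future;
    }

    String filename() {
        return filename;
    }

    // Blocks until the download completes, then returns its bytes.
    byte[] join() {
        return future.join().bytes();
    }
}
```

The local case simply forwards bytes that are already available, which is presumably the role of a forwarder such as SimpleResultForwarder, while the SDK case adapts a backend-specific response to the same interface.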

Running it

Configure the application.yml in your project root to use the FS (FileSystem) cloud provider targeting a LOCAL network:

hedera:
  mirror:
    importer:
      downloader:
        cloudProvider: FS
      network: LOCAL

You can provide a different path for the importer to operate on via the hedera.mirror.importer.downloader.bucketName property, which defaults to /opt/hedera/services/data.
You might also want to make the downloader check for new record files less often (via the hedera.mirror.importer.downloader.record.frequency property) so that it doesn't put too much pressure on the host operating system.

@3Nigma 3Nigma marked this pull request as draft January 14, 2022 15:22
@3Nigma 3Nigma requested a review from victorholo January 14, 2022 15:22
@3Nigma 3Nigma changed the title Added FileSystem (FS) cloudProvider option to importer Add FileSystem (FS) cloudProvider option to importer Jan 14, 2022