
Add support for Hadoop-Compatible File System when using HDFS as exchange type #24627

Open
wants to merge 1 commit into master from feature/support_exchage_with_hadoop_compatible

Conversation

hqbhoho

@hqbhoho hqbhoho commented Jan 3, 2025

Description

Add exchange.hdfs.skip-directory-scheme-validation to enable use of Hadoop-Compatible File Systems with the HDFS client.
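For illustration, a minimal sketch of how such a flag could be declared in ExchangeHdfsConfig using the airlift @Config pattern; the field, getter, and setter names here are assumptions, not the actual diff:

import io.airlift.configuration.Config;
import io.airlift.configuration.ConfigDescription;

public class ExchangeHdfsConfig
{
    private boolean skipDirectorySchemeValidation;

    public boolean isSkipDirectorySchemeValidation()
    {
        return skipDirectorySchemeValidation;
    }

    @Config("exchange.hdfs.skip-directory-scheme-validation")
    @ConfigDescription("Allow base directory URIs with non-hdfs schemes so Hadoop-compatible file systems can be used")
    public ExchangeHdfsConfig setSkipDirectorySchemeValidation(boolean skipDirectorySchemeValidation)
    {
        this.skipDirectorySchemeValidation = skipDirectorySchemeValidation;
        return this;
    }
}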

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

## Section
* Add support for Hadoop-Compatible File Systems when using HDFS as the exchange type.


cla-bot bot commented Jan 3, 2025

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@raunaqmorarka raunaqmorarka requested a review from losipiuk January 7, 2025 05:50

boolean hdfsCompatibleFsEnabled = buildConfigObject(ExchangeHdfsConfig.class).isHdfsCompatibleFsEnabled();

if (scheme.equalsIgnoreCase("hdfs") || hdfsCompatibleFsEnabled) {
Member

Can't you just use hdfs as the scheme in the URI you pass in the base directories config?
Why do we need this extra property?

Author
@hqbhoho hqbhoho Jan 8, 2025

Yes, I use Alluxio as the backend now, so the base directories must be alluxio://xxxx/xxx. The HDFS client can access Alluxio with the alluxio scheme and a core-site.xml like this:

<property>
  <name>fs.alluxio.impl</name>
  <value>alluxio.hadoop.FileSystem</value>
  <description>The Alluxio FileSystem (Hadoop 1.x and 2.x)</description>
</property>
<property>
  <name>fs.AbstractFileSystem.alluxio.impl</name>
  <value>alluxio.hadoop.AlluxioFileSystem</value>
  <description>The Alluxio AbstractFileSystem (Hadoop 2.x)</description>
</property>

https://docs.alluxio.io/os/user/stable/en/compute/Hadoop-MapReduce.html
I think adding this config lets us use the HDFS client with the oss or alluxio scheme to access OSS, Alluxio, and so on. There are many Hadoop-Compatible File Systems, so this increases configuration flexibility.
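As a minimal sketch of what that enables (host, port, and paths below are placeholders), the Hadoop FileSystem API can then resolve alluxio:// URIs directly:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AlluxioAccessExample
{
    public static void main(String[] args) throws Exception
    {
        // Picks up core-site.xml (with the fs.alluxio.impl mapping above) from the classpath
        Configuration conf = new Configuration();
        // The alluxio:// scheme resolves to alluxio.hadoop.FileSystem via that mapping
        FileSystem fs = FileSystem.get(URI.create("alluxio://master-host:19998/exchange"), conf);
        fs.mkdirs(new Path("/exchange/spooling"));
    }
}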

Member

Can we just change the code so `alluxio://` URLs use the HDFS exchange impl? Not sure we need an extra config for that.

Author
@hqbhoho hqbhoho Jan 9, 2025

Thank you for the feedback. Since the HDFS client supports accessing other Hadoop-compatible file systems, I believe adding this config gives users more options. Restricting it to hdfs:// or alluxio:// here feels a bit inflexible: if we can support Alluxio, why can't we support other Hadoop-compatible file systems? This could be any Hadoop-compatible file system that the HDFS client can access. By modifying core-site.xml and adding the relevant client JARs, users can freely choose their file system.

Author

If you insist, we can change it to only support Alluxio, but I still believe that adding a configuration property increases flexibility.

if (scheme.equalsIgnoreCase("hdfs") || scheme.equalsIgnoreCase("alluxio")) {
    binder.bind(FileSystemExchangeStorage.class).to(HadoopFileSystemExchangeStorage.class).in(Scopes.SINGLETON);
    configBinder(binder).bindConfig(ExchangeHdfsConfig.class);
}

Member

I do not feel strongly. You may keep the config property, but maybe call it exchange.hdfs.skip-directory-scheme-validation.
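For comparison, the binding check from the diff could then read something like the following; the getter name is an assumption derived from the proposed property name, not code from this PR:

boolean skipSchemeValidation = buildConfigObject(ExchangeHdfsConfig.class).isSkipDirectorySchemeValidation();

if (scheme.equalsIgnoreCase("hdfs") || skipSchemeValidation) {
    binder.bind(FileSystemExchangeStorage.class).to(HadoopFileSystemExchangeStorage.class).in(Scopes.SINGLETON);
    configBinder(binder).bindConfig(ExchangeHdfsConfig.class);
}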

Author

it sounds good!

@hqbhoho hqbhoho force-pushed the feature/support_exchage_with_hadoop_compatible branch from e0cd6a2 to 8bb1ece Compare January 9, 2025 12:58

cla-bot bot commented Jan 9, 2025 with the same CLA reminder as above.

@hqbhoho hqbhoho force-pushed the feature/support_exchage_with_hadoop_compatible branch from 8bb1ece to 9cbd814 Compare January 9, 2025 13:02

cla-bot bot commented Jan 9, 2025 with the same CLA reminder as above.

@hqbhoho hqbhoho requested a review from losipiuk January 9, 2025 23:45