Add support for Hadoop-Compatible File Systems when using HDFS as exchange type #24627
Conversation
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla
boolean hdfsCompatibleFsEnabled = buildConfigObject(ExchangeHdfsConfig.class).isHdfsCompatibleFsEnabled();
if (scheme.equalsIgnoreCase("hdfs") || hdfsCompatibleFsEnabled) {
Can't you just use hdfs as the scheme in the URI you pass in the base directories config? Why do we need this extra property?
Yes. I am using Alluxio as the backend now, so the base directories must be alluxio://xxxx/xxx. The HDFS client can access Alluxio with the alluxio scheme and a core-site.xml like this:
<property>
<name>fs.alluxio.impl</name>
<value>alluxio.hadoop.FileSystem</value>
<description>The Alluxio FileSystem (Hadoop 1.x and 2.x)</description>
</property>
<property>
<name>fs.AbstractFileSystem.alluxio.impl</name>
<value>alluxio.hadoop.AlluxioFileSystem</value>
<description>The Alluxio AbstractFileSystem (Hadoop 2.x)</description>
</property>
https://docs.alluxio.io/os/user/stable/en/compute/Hadoop-MapReduce.html
I think adding this config lets us use the HDFS client with the oss or alluxio scheme to access OSS, Alluxio, and so on. There are many Hadoop-compatible file systems, so this increases configuration flexibility.
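Concretely, the exchange base directories would then point at the Alluxio scheme. A minimal sketch, assuming the existing exchange.base-directories property of the filesystem exchange manager and a placeholder Alluxio master address:
exchange.base-directories=alluxio://alluxio-master:19998/trino-exchange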
Can we just change the code so `alluxio://` URLs use the HDFS exchange impl? Not sure we need an extra config for that.
Thank you for the feedback. Since the HDFS client supports accessing other Hadoop-compatible file systems, I believe adding this config gives users more options. Restricting it to hdfs:// or alluxio:// here feels a bit inflexible: if we can support Alluxio, why not other Hadoop-compatible file systems? This could be any Hadoop-compatible file system that the HDFS client can access. By modifying core-site.xml and adding the relevant client JARs, users can freely choose their file system.
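For example, a core-site.xml entry along these lines (illustrative; the implementation class ships in the hadoop-aliyun client JAR) would let the same HDFS client talk to Aliyun OSS:
<property>
<name>fs.oss.impl</name>
<value>org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem</value>
<description>The Aliyun OSS FileSystem from the hadoop-aliyun module</description>
</property>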
If you insist, we can change it to only support Alluxio, but I still believe that adding a configuration property increases flexibility.
if (scheme.equalsIgnoreCase("hdfs") || scheme.equalsIgnoreCase("alluxio")) {
    binder.bind(FileSystemExchangeStorage.class).to(HadoopFileSystemExchangeStorage.class).in(Scopes.SINGLETON);
    configBinder(binder).bindConfig(ExchangeHdfsConfig.class);
}
I do not feel strongly. You may keep the config property, but maybe call it exchange.hdfs.skip-directory-scheme-validation.
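With that rename, the property in ExchangeHdfsConfig could look roughly like the following. This is only a sketch using Airlift's @Config binding; the field and method names are assumptions, and the class's other existing properties are omitted:

import io.airlift.configuration.Config;
import io.airlift.configuration.ConfigDescription;

public class ExchangeHdfsConfig
{
    // Assumed field name; defaults to false so hdfs:// remains the only accepted scheme out of the box
    private boolean skipDirectorySchemeValidation;

    public boolean isSkipDirectorySchemeValidation()
    {
        return skipDirectorySchemeValidation;
    }

    @Config("exchange.hdfs.skip-directory-scheme-validation")
    @ConfigDescription("Allow exchange base directories with any Hadoop-compatible file system scheme, not only hdfs://")
    public ExchangeHdfsConfig setSkipDirectorySchemeValidation(boolean skipDirectorySchemeValidation)
    {
        this.skipDirectorySchemeValidation = skipDirectorySchemeValidation;
        return this;
    }
}

The module's scheme check would then read if (scheme.equalsIgnoreCase("hdfs") || skipDirectorySchemeValidation) rather than enumerating each supported scheme.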
That sounds good!
Description
Add exchange.hdfs.skip-directory-scheme-validation to enable using Hadoop-compatible file systems with the HDFS client.
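A sketch of the resulting exchange manager configuration, assuming the existing exchange-manager.name and exchange.base-directories properties of the filesystem exchange manager and a placeholder OSS bucket:
exchange-manager.name=filesystem
exchange.base-directories=oss://exchange-bucket/trino-exchange
exchange.hdfs.skip-directory-scheme-validation=true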
Additional context and related issues
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text: