Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NSFS | versioning | calculate multipart upload version by creation time #8644

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

nadavMiz
Copy link
Contributor

@nadavMiz nadavMiz commented Jan 2, 2025

Explain the changes

according to aws specifications multipart upload version creation time rather than completion time. this effects version ordering as will no have the latest version if another object was uploaded after multipart upload creation but before completion

  1. add fs_napi function utimensat to change access time of files. see https://man7.org/linux/man-pages/man3/futimens.3p.html
  2. change the mtime of the upload file after mp completion to have the creation time, for correct version id time
  3. on move_to_dest_versioning add case: when the version is not latest move directly to .version and break since we don't need to handle latest version (it remains the same in this case)
  4. add tests for both new fs function and multipart upload order

Issues: Fixed #8411

Testing Instructions:

  1. utimensat fs function - sudo node ./node_modules/mocha/bin/mocha src/test/unit_tests/test_nb_native_fs.js
  2. multipart upload versioning - sudo NC_CORETEST=true node ./node_modules/mocha/bin/mocha src/test/unit_tests/test_bucketspace_versioning.js
  • Doc added/updated
  • Tests added

@nadavMiz nadavMiz requested review from guymguym and romayalon January 2, 2025 09:31
@nadavMiz nadavMiz force-pushed the multipart-version-ordering branch 6 times, most recently from 12445b6 to 2c10bc0 Compare January 5, 2025 05:25
@nadavMiz nadavMiz self-assigned this Jan 8, 2025
@nadavMiz nadavMiz force-pushed the multipart-version-ordering branch from 2c10bc0 to b3b9a42 Compare January 8, 2025 12:29
src/native/fs/fs_napi.cpp Show resolved Hide resolved
src/native/fs/fs_napi.cpp Show resolved Hide resolved
src/test/unit_tests/test_nb_native_fs.js Show resolved Hide resolved
src/sdk/namespace_fs.js Outdated Show resolved Hide resolved
src/sdk/namespace_fs.js Show resolved Hide resolved
src/sdk/namespace_fs.js Outdated Show resolved Hide resolved
src/test/unit_tests/test_bucketspace_versioning.js Outdated Show resolved Hide resolved
assert.ok(exist);
const comp_res = await compare_version_ids(full_path, mpu_key1, res_put_object.VersionId, res.VersionId);
assert.ok(comp_res);
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add an assert so we can see that res_put_object version_id is the latest explicitly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compare_version_ids does exactly that

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Shira it's better to call headObject and check that the latest version is the single put version and not the MPU

src/test/unit_tests/test_bucketspace_versioning.js Outdated Show resolved Hide resolved
@shirady
Copy link
Contributor

shirady commented Jan 13, 2025

@nadavMiz I'm adding a TODO here so we will not forget - run the test on MAC as well.

@nadavMiz nadavMiz force-pushed the multipart-version-ordering branch from b3b9a42 to cc844df Compare January 14, 2025 09:20
@nadavMiz nadavMiz force-pushed the multipart-version-ordering branch from cc844df to 7d5bf69 Compare January 14, 2025 09:33
src/sdk/namespace_fs.js Show resolved Hide resolved
const create_object_upload_stat = await nb_native().fs.stat(fs_context, create_object_upload_path);
//according to aws, multipart upload version time should be calculated based on creation rather then completion time.
//change the files mtime to match the creation time
await nb_native().fs.utimensat(fs_context, upload_path, {modtime: create_object_upload_stat.mtimeNsBigint});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you think we should fail the whole upload in case we were not able to set the mtime?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we can use the existing time in case it fails

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@guymguym WDYT? I'm not sure, since it might create a wrong .versions/ stack order, but on the other hand we probably don't want to fail on that

src/sdk/namespace_fs.js Show resolved Hide resolved
src/sdk/namespace_fs.js Show resolved Hide resolved
src/sdk/namespace_fs.js Show resolved Hide resolved
@@ -407,7 +407,54 @@ mocha.describe('bucketspace namespace_fs - versioning', function() {
const exist = await version_file_exists(full_path, mpu_key1, '', prev_version_id);
assert.ok(exist);
});

mocha.it('mpu object - versioning enabled - put object before running mpu complete. mpu version should move to .versions dir and not latest', async function() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

worth adding a test for null version

@@ -953,6 +953,10 @@ interface NativeFS {
checkAccess(fs_context: NativeFSContext, path: string): Promise<void>;
getsinglexattr(fs_context: NativeFSContext, path: string, key: string): Promise<string>;
getpwname(fs_context: NativeFSContext, user: string): Promise<NativeFSUserObject>;
utimensat(fs_context: NativeFSContext, path: string, times: {
modtime?: bigint,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not use the conventional mtime and atime?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

modtime and actime are the names used by utime. https://man7.org/linux/man-pages/man2/utime.2.html. utimensat uses an array instead of a struct so it doesn't give them a name, but I decided to use the same naming. I can change them to fit with the names used in our code

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so you can choose either, mtime and atime used in our code because of the node.js terms

SYSCALL_OR_RETURN(utimensat(AT_FDCWD, _path.c_str(), _times, 0));
}

void get_time_from_info(const Napi::Object& napi_times, const std::string& time_type, struct timespec& timeval, std::string& log_str) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls move helper functions to not be in the middle of the workers

Copy link
Contributor Author

@nadavMiz nadavMiz Jan 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_time_from_info is only relevant to utimensat, so I added it as a member function to the class. where would you propose to move it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw helper functions both at the top and at the bottom of this file, so check what you think is the best location

timeval.tv_nsec = time % int64_t(1e9);
log_str = std::to_string(time);
} else {
timeval.tv_nsec = UTIME_OMIT;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when will this happen? was this flow tested?

Copy link
Contributor Author

@nadavMiz nadavMiz Jan 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will happen when one of the times is undefined. it means the time will not be changed. it is tested in Utimensat - change only mtime

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but in namespaceFS we always take a defined mtime, I mean, I don't see how we can reach this code from our flow

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function is used both by mtime and atime. currently we always get here for atime (I can also see this in the logs). also I think its good to leave an option to just change the atime and not the mtime, in case we want to use this function elsewhere

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got you, so for atime this flag will just keep it as it is since we pass only mtime

}
virtual void Work()
{
SYSCALL_OR_RETURN(utimensat(AT_FDCWD, _path.c_str(), _times, 0));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does mac have utimensat() function and same behavior?

Copy link
Contributor Author

@nadavMiz nadavMiz Jan 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its hard to know for sure as I don't have mac. but from what I found its available from osx 10.13. not sure what version of mac we start to support from.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please always make sure you are not breaking native build for the developers on mac, I sent you on slack the man utimensat

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, than it looks like it works on mac. also I should have mentioned it before: utimensat is defined in POSIX, so it should work on mac the same. even though you can never know. We will test this on mac regardless before pushing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants