Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement S3 storage class for Streaming and External basic transfer adapters #81

Merged
merged 25 commits into from
Mar 15, 2021

Conversation

tomeksabala
Copy link
Contributor

@tomeksabala tomeksabala commented Mar 12, 2021

Implements a feature described in issue #3

I have created S3 storage with class giftless.storage.aws_s3.AwsS3Storage implementing two interfaces: StreamingStorage and ExternalStorage.

I have decided to support AWS authentication in a native way to boto3. In my opinion AWS users are used to the most common ways to provide credentials as I've explained in docs/source/storage-backends.md:127

One outstanding issue is the requirement.txt file in this PR. I needed to add boto3 and botocore requirements to requirements.in. After running make requirements.txt I couldn't keep the old formatting in place. I can't fix it once I'm told how to do this properly.

@tomeksabala tomeksabala changed the title #3 Implement S3 storage class for Streaming and External basic transfer adapters Implement S3 storage class for Streaming and External basic transfer adapters Mar 12, 2021
@tomeksabala tomeksabala marked this pull request as ready for review March 12, 2021 13:15
Copy link
Contributor

@shevron shevron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is great! I have a few comments, I don't think any of them is huge, but would be good to fix or get your feedback on them. I think the requirements.txt formatting is a result of an older version of pip-tools - try to upgrade to the latest and see what happens.

README.md Outdated Show resolved Hide resolved
docs/source/storage-backends.md Show resolved Hide resolved
giftless/storage/aws_s3.py Outdated Show resolved Hide resolved
giftless/storage/aws_s3.py Outdated Show resolved Hide resolved
giftless/storage/aws_s3.py Outdated Show resolved Hide resolved
giftless/storage/aws_s3.py Outdated Show resolved Hide resolved
giftless/storage/aws_s3.py Outdated Show resolved Hide resolved
giftless/storage/aws_s3.py Outdated Show resolved Hide resolved
giftless/storage/aws_s3.py Outdated Show resolved Hide resolved
requirements.txt Outdated Show resolved Hide resolved
@tomeksabala tomeksabala requested a review from shevron March 15, 2021 12:48
@tomeksabala
Copy link
Contributor Author

Thanks @shevron . I've fixed/updated all listed issues. I have left two conversations unresolved to get your feedback on:

  1. ### Running updated yaml config with uWSGI section in the readme
  2. Type hints around safe_filename call.

Copy link
Contributor

@shevron shevron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting very close now 👍

docs/source/storage-backends.md Outdated Show resolved Hide resolved
docs/source/storage-backends.md Show resolved Hide resolved
Comment on lines 78 to 85
raw_filename = extra.get('filename') if extra else oid
assert raw_filename
filename = safe_filename(raw_filename)
params = {
'Bucket': self.bucket_name,
'Key': self._get_blob_path(prefix, oid),
'ResponseContentDisposition': f"attachment;filename={filename}"
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be nicer to not set content-disposition unless we need to (that is unless the user specified a custom name).

Something like:

Suggested change
raw_filename = extra.get('filename') if extra else oid
assert raw_filename
filename = safe_filename(raw_filename)
params = {
'Bucket': self.bucket_name,
'Key': self._get_blob_path(prefix, oid),
'ResponseContentDisposition': f"attachment;filename={filename}"
}
params = {'Bucket': self.bucket_name, 'Key': self._get_blob_path(prefix, oid)}
if extra and 'filename' in extra:
filename = safe_filename(extra['filename'])
params['ResponseContentDisposition'] = f'attachment; filename="{filename}"'

I think this will also pass type checking.
I've also added double quotes around the filename in the header, it is not strictly required unless the filename contains a space or a semicolon IIRC. While right now safe_filename prevents that, in the future I'd like to make it less strict and then it will be required.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's helpful, thank you. I have manually verified and actually the 'ResponseContentDisposition' can only be provided if we want a custom filename for downloaded file. Otherwise it defaults to object it oid.
Type checks also passed this change. Thank you for fixing the syntax of the header too.
I have changed only the params dict init so it aligns better with the rest of the dicts in the class.

@tomeksabala
Copy link
Contributor Author

This is great feedback. Thanks as I've learned something new because of it :)

@tomeksabala tomeksabala requested a review from shevron March 15, 2021 17:10
Copy link
Contributor

@shevron shevron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very good, thanks.

@shevron
Copy link
Contributor

shevron commented Mar 15, 2021

@tomeksabala this looks great now, I'm merging this to master, and will issue a new release with this included in the next few days - I want to ensure the docs are up to date, and run the full test suite with a live AWS account before I make an official "stable" release.

@shevron shevron merged commit c5e391a into datopian:master Mar 15, 2021
@shevron
Copy link
Contributor

shevron commented Mar 15, 2021

/cc @rufuspollock

@rufuspollock
Copy link
Member

@tomeksabala @shevron 👏 👏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants