Implement S3 storage class for Streaming and External basic transfer adapters #81

Merged: 25 commits, merged on Mar 15, 2021

Commits
2450f46
Init of S3 Storage work.
tomeksabala Mar 4, 2021
d11f616
Updated requirements.in
tomeksabala Mar 8, 2021
8a0764a
Removed unused import.
tomeksabala Mar 8, 2021
c9d3c85
Put objects in corrects directory path.
tomeksabala Mar 8, 2021
13c9b94
Disabling vcr caching for live tests for dev.
tomeksabala Mar 8, 2021
8de9205
S3 bucket clean on test teardown.
tomeksabala Mar 8, 2021
0ee1772
Object exists implementation.
tomeksabala Mar 8, 2021
25588ba
Get size implementation.
tomeksabala Mar 8, 2021
d38b1b8
Fixup put object implementation.
tomeksabala Mar 8, 2021
e938639
Get object implementation.
tomeksabala Mar 8, 2021
d75f545
Get download and upload url implementations.
tomeksabala Mar 9, 2021
23955ce
path_prefix as optional param.
tomeksabala Mar 9, 2021
f82a29b
Updated put method to use multipart, unseekable stream friendly boto3…
tomeksabala Mar 9, 2021
57819b7
Fixed get_upload_action url gen and put method returning # bytes uplo…
tomeksabala Mar 10, 2021
faa9882
Removing multipart storage interface.
tomeksabala Mar 10, 2021
cc2e586
Sort out boto3 auth.
tomeksabala Mar 11, 2021
e8b8527
Removed type hints for boto3.
tomeksabala Mar 12, 2021
babf4c2
VCR cassettes for S3 Storage tests.
tomeksabala Mar 12, 2021
9469339
Regenerating requirements.txt
tomeksabala Mar 12, 2021
885fa2d
Fixing codestyle and formatting issues.
tomeksabala Mar 12, 2021
4dbd122
Implementing PR recommended changes.
tomeksabala Mar 15, 2021
294bebf
Switching the dependency between `get_size` and `exists`.
tomeksabala Mar 15, 2021
65f4c5f
Updating vcr test cassettes.
tomeksabala Mar 15, 2021
1550a31
Fixup; Rename AWS S3 to Amazon S3 in the docs
tomeksabala Mar 15, 2021
5fffc60
ResponseContentDisposition only with custom filenames.
tomeksabala Mar 15, 2021
1 change: 1 addition & 0 deletions README.md
@@ -16,6 +16,7 @@ storage backends:
* [Google Cloud Storage](https://cloud.google.com/storage)
* [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/)
with direct-to-cloud or streamed transfers
* [Amazon S3 Storage](https://aws.amazon.com/s3/)

In addition, Giftless implements a custom transfer mode called `multipart-basic`,
which is designed to take advantage of many vendors' multipart upload
27 changes: 27 additions & 0 deletions docs/source/storage-backends.md
@@ -106,6 +106,33 @@ TRANSFER_ADAPTERS:
bucket_name: git-lfs
account_key_base64: S0m3B4se64RandomStuff.....ThatI5Redac7edHeReF0rRead4b1lity==
```
### Amazon S3 Storage

#### `giftless.storage.amazon_s3:AmazonS3Storage`
Modify your `giftless.yaml` file according to the following config:

```yaml
# giftless.yaml
TRANSFER_ADAPTERS:
  basic:
    factory: giftless.transfer.basic_external:factory
    options:
      storage_class: giftless.storage.amazon_s3:AmazonS3Storage
      storage_options:
        bucket_name: bucket-name
        path_prefix: optional_prefix
```
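Since `AmazonS3Storage` implements both the external and streaming storage interfaces, it should also work with the streaming transfer adapter. A sketch of such a configuration (the `basic_streaming` factory path is assumed from Giftless's other backends, not confirmed by this PR):

```yaml
# giftless.yaml -- streaming variant (sketch; factory path assumed)
TRANSFER_ADAPTERS:
  basic:
    factory: giftless.transfer.basic_streaming:factory
    options:
      storage_class: giftless.storage.amazon_s3:AmazonS3Storage
      storage_options:
        bucket_name: bucket-name
```

With this variant, object data is proxied through the Giftless server rather than transferred directly between the client and S3.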

#### boto3 authentication
`AmazonS3Storage` supports four authentication methods, described in more detail in the
[boto3 docs](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html):
1. Environment variables
2. Shared credentials file (`~/.aws/credentials`)
3. AWS config file (`~/.aws/config`)
4. Instance metadata service on an Amazon EC2 instance that has an IAM role configured (usually used in production)
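For example, method 1 sets credentials via boto3's standard environment variables. The values below are placeholders; substitute your own credentials:

```shell
# Placeholder values -- replace with your real AWS credentials.
export AWS_ACCESS_KEY_ID="my-access-key-id"
export AWS_SECRET_ACCESS_KEY="my-secret-access-key"
export AWS_DEFAULT_REGION="us-east-1"
```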

### Running the updated YAML config with uWSGI

After configuring your `giftless.yaml` file, export it:

111 changes: 111 additions & 0 deletions giftless/storage/amazon_s3.py
@@ -0,0 +1,111 @@
import os
from typing import Any, BinaryIO, Dict, Iterable, Optional

import boto3  # type: ignore
import botocore  # type: ignore

from giftless.storage import ExternalStorage, StreamingStorage
from giftless.storage.exc import ObjectNotFound
from giftless.util import safe_filename


class AmazonS3Storage(StreamingStorage, ExternalStorage):
    """Amazon S3 storage backend.
    """

    def __init__(self, bucket_name: str, path_prefix: Optional[str] = None, **_):
        self.bucket_name = bucket_name
        self.path_prefix = path_prefix
        self.s3 = boto3.resource('s3')
        self.s3_client = boto3.client('s3')

    def get(self, prefix: str, oid: str) -> Iterable[bytes]:
        if not self.exists(prefix, oid):
            raise ObjectNotFound()
        result: Iterable[bytes] = self._s3_object(prefix, oid).get()['Body']
        return result

    def put(self, prefix: str, oid: str, data_stream: BinaryIO) -> int:
        completed = []

        def upload_callback(size):
            completed.append(size)

        bucket = self.s3.Bucket(self.bucket_name)
        bucket.upload_fileobj(data_stream, self._get_blob_path(prefix, oid),
                              Callback=upload_callback)
        return sum(completed)

    def exists(self, prefix: str, oid: str) -> bool:
        try:
            self.get_size(prefix, oid)
        except ObjectNotFound:
            return False
        return True

    def get_size(self, prefix: str, oid: str) -> int:
        try:
            result: int = self._s3_object(prefix, oid).content_length
        except botocore.exceptions.ClientError as e:
            if e.response['Error']['Code'] == "404":
                raise ObjectNotFound()
            else:
                raise e
        return result

    def get_upload_action(self, prefix: str, oid: str, size: int, expires_in: int,
                          extra: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        params = {
            'Bucket': self.bucket_name,
            'Key': self._get_blob_path(prefix, oid)
        }
        response = self.s3_client.generate_presigned_url('put_object',
                                                         Params=params,
                                                         ExpiresIn=expires_in)
        return {
            "actions": {
                "upload": {
                    "href": response,
                    "header": {},
                    "expires_in": expires_in
                }
            }
        }

    def get_download_action(self, prefix: str, oid: str, size: int, expires_in: int,
                            extra: Optional[Dict[str, str]] = None) -> Dict[str, Any]:
        params = {
            'Bucket': self.bucket_name,
            'Key': self._get_blob_path(prefix, oid)
        }
        if extra and 'filename' in extra:
            filename = safe_filename(extra['filename'])
            params['ResponseContentDisposition'] = f'attachment; filename="{filename}"'
        response = self.s3_client.generate_presigned_url('get_object',
                                                         Params=params,
                                                         ExpiresIn=expires_in)
        return {
            "actions": {
                "download": {
                    "href": response,
                    "header": {},
                    "expires_in": expires_in
                }
            }
        }

    def _get_blob_path(self, prefix: str, oid: str) -> str:
        """Get the path to a blob in storage
        """
        if not self.path_prefix:
            storage_prefix = ''
        elif self.path_prefix[0] == '/':
            storage_prefix = self.path_prefix[1:]
        else:
            storage_prefix = self.path_prefix
        return os.path.join(storage_prefix, prefix, oid)

    def _s3_object(self, prefix, oid):
        return self.s3.Object(self.bucket_name, self._get_blob_path(prefix, oid))
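The prefix handling in `_get_blob_path` can be exercised on its own; a minimal standalone sketch mirroring the logic above (the `blob_path` helper is hypothetical, extracted here purely for illustration):

```python
import os

def blob_path(path_prefix, prefix, oid):
    # Mirrors AmazonS3Storage._get_blob_path: drop a single leading
    # slash from the configured prefix, then join prefix and oid under it.
    if not path_prefix:
        storage_prefix = ''
    elif path_prefix[0] == '/':
        storage_prefix = path_prefix[1:]
    else:
        storage_prefix = path_prefix
    return os.path.join(storage_prefix, prefix, oid)

print(blob_path(None, 'myorg/myrepo', 'abc123'))    # myorg/myrepo/abc123
print(blob_path('/lfs', 'myorg/myrepo', 'abc123'))  # lfs/myorg/myrepo/abc123
```

Note that only one leading slash is stripped, so a `path_prefix` of `//lfs` would still produce a key starting with `/`.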
1 change: 1 addition & 0 deletions requirements.in
@@ -20,3 +20,4 @@ git+https://github.com/teracyhq/flask-classful.git@3bbab31#egg=flask-classful
# TODO: Split these out so users don't have to install all of them
azure-storage-blob==12.2.*
google-cloud-storage==1.28.*
boto3==1.17.*
20 changes: 18 additions & 2 deletions requirements.txt
@@ -8,6 +8,12 @@ azure-core==1.8.2
# via azure-storage-blob
azure-storage-blob==12.2.0
# via -r requirements.in
boto3==1.17.26
# via -r requirements.in
botocore==1.20.26
# via
# boto3
# s3transfer
cachetools==4.1.1
# via google-auth
certifi==2020.6.20
@@ -57,6 +63,10 @@ itsdangerous==1.1.0
# via flask
jinja2==2.11.2
# via flask
jmespath==0.10.0
# via
# boto3
# botocore
markupsafe==1.1.1
# via jinja2
marshmallow-enum==1.5.1
@@ -85,7 +95,9 @@ pycparser==2.20
pyjwt==1.7.1
# via -r requirements.in
python-dateutil==2.8.1
# via -r requirements.in
# via
# -r requirements.in
# botocore
python-dotenv==0.13.0
# via -r requirements.in
pytz==2020.4
@@ -102,6 +114,8 @@ requests==2.24.0
# requests-oauthlib
rsa==4.6
# via google-auth
s3transfer==0.3.4
# via boto3
six==1.15.0
# via
# azure-core
@@ -116,7 +130,9 @@
typing-extensions==3.7.4.3
# via -r requirements.in
urllib3==1.25.11
# via requests
# via
# botocore
# requests
webargs==5.5.3
# via -r requirements.in
werkzeug==1.0.1