Implement S3 storage class for Streaming and External basic transfer adapters #81

Merged: 25 commits, merged on Mar 15, 2021. The changes shown below are from 20 of the 25 commits.

Commits
2450f46
Init of S3 Storage work.
tomeksabala Mar 4, 2021
d11f616
Updated requirements.in
tomeksabala Mar 8, 2021
8a0764a
Removed unused import.
tomeksabala Mar 8, 2021
c9d3c85
Put objects in corrects directory path.
tomeksabala Mar 8, 2021
13c9b94
Disabling vcr caching for live tests for dev.
tomeksabala Mar 8, 2021
8de9205
S3 bucket clean on test teardown.
tomeksabala Mar 8, 2021
0ee1772
Object exists implementation.
tomeksabala Mar 8, 2021
25588ba
Get size implementation.
tomeksabala Mar 8, 2021
d38b1b8
Fixup put object implementation.
tomeksabala Mar 8, 2021
e938639
Get object implementation.
tomeksabala Mar 8, 2021
d75f545
Get download and upload url implementations.
tomeksabala Mar 9, 2021
23955ce
path_prefix as optional param.
tomeksabala Mar 9, 2021
f82a29b
Updated put method to use multipart, unseekable stream friendly boto3…
tomeksabala Mar 9, 2021
57819b7
Fixed get_upload_action url gen and put method returning # bytes uplo…
tomeksabala Mar 10, 2021
faa9882
Removing multipart storage interface.
tomeksabala Mar 10, 2021
cc2e586
Sort out boto3 auth.
tomeksabala Mar 11, 2021
e8b8527
Removed type hints for boto3.
tomeksabala Mar 12, 2021
babf4c2
VCR cassettes for S3 Storage tests.
tomeksabala Mar 12, 2021
9469339
Regenerating requirements.txt
tomeksabala Mar 12, 2021
885fa2d
Fixing codestyle and formatting issues.
tomeksabala Mar 12, 2021
4dbd122
Implementing PR recommended changes.
tomeksabala Mar 15, 2021
294bebf
Switching the dependency between `get_size` and `exists`.
tomeksabala Mar 15, 2021
65f4c5f
Updating vcr test cassettes.
tomeksabala Mar 15, 2021
1550a31
Fixup; Rename AWS S3 to Amazon S3 in the docs
tomeksabala Mar 15, 2021
5fffc60
ResponseContentDisposition only with custom filenames.
tomeksabala Mar 15, 2021
README.md: 1 change (1 addition, 0 deletions)
@@ -16,6 +16,7 @@ storage backends:
* [Google Cloud Storage](https://cloud.google.com/storage)
* [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/)
with direct-to-cloud or streamed transfers
* [AWS S3 Storage](https://aws.amazon.com/s3/)

In addition, Giftless implements a custom transfer mode called `multipart-basic`,
which is designed to take advantage of many vendors' multipart upload
docs/source/storage-backends.md: 42 changes (42 additions, 0 deletions)
@@ -106,6 +106,48 @@ TRANSFER_ADAPTERS:
        bucket_name: git-lfs
        account_key_base64: S0m3B4se64RandomStuff.....ThatI5Redac7edHeReF0rRead4b1lity==
```
### AWS S3 Storage

#### `giftless.storage.aws_s3:AwsS3Storage`
Configure your `giftless.yaml` file as follows:

```yaml
TRANSFER_ADAPTERS:
  basic:
    factory: giftless.transfer.basic_external:factory
    options:
      storage_class: giftless.storage.aws_s3:AwsS3Storage
      storage_options:
        aws_s3_bucket_name: bucket-name
        path_prefix: optional_prefix
```

#### boto3 authentication
`AwsS3Storage` relies on boto3's standard credential resolution, described in more detail in the
[boto3 credentials docs](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html).
The most common sources are (see the snippet below for a quick way to check which one was picked up):
1. Environment variables
2. Shared credentials file (`~/.aws/credentials`)
3. AWS config file (`~/.aws/config`)
4. Instance metadata service on an Amazon EC2 instance that has an IAM role configured (typically used in production)
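
As a quick sanity check, the following sketch (not part of Giftless itself; it only assumes boto3 is installed and uses the same default session the storage class relies on) asks boto3 which credential source it resolved:

```python
import boto3

# Build a default session; boto3 resolves credentials from the sources listed above
session = boto3.session.Session()
credentials = session.get_credentials()

if credentials is None:
    print("boto3 could not find any AWS credentials")
else:
    # `method` reports the source, e.g. "env", "shared-credentials-file" or "iam-role"
    print(f"Credentials resolved via: {credentials.method}")
```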

### Running the updated YAML config with uWSGI
After configuring your `giftless.yaml` file, point Giftless at it by exporting the `GIFTLESS_CONFIG_FILE` environment variable:
```bash
$ export GIFTLESS_CONFIG_FILE=giftless.yaml
```

You will need uWSGI to serve Giftless. Install it with your preferred package manager; here is an example of how to run it:

```bash
# Run uWSGI in HTTP mode on port 8080
$ uwsgi -M -T --threads 2 -p 2 --manage-script-name \
--module giftless.wsgi_entrypoint --callable app --http 127.0.0.1:8080
```

See `giftless/config.py` for some default configuration options.

giftless/storage/aws_s3.py: 110 changes (110 additions, 0 deletions)
@@ -0,0 +1,110 @@
import os
from typing import Any, BinaryIO, Dict, Iterable, Optional

import boto3 # type: ignore
import botocore # type: ignore

from giftless.storage import ExternalStorage, StreamingStorage
from giftless.storage.exc import ObjectNotFound


class AwsS3Storage(StreamingStorage, ExternalStorage):
"""AWS S3 Blob Storage backend.
"""

    def __init__(self, aws_s3_bucket_name: str, path_prefix: Optional[str] = None, **_):
        self.aws_s3_bucket_name = aws_s3_bucket_name
        self.path_prefix = path_prefix
        self.s3 = boto3.resource('s3')
        self.s3_client = boto3.client('s3')

    def get(self, prefix: str, oid: str) -> Iterable[bytes]:
        if not self.exists(prefix, oid):
            raise ObjectNotFound()
        result: Iterable[bytes] = self._s3_object(prefix, oid).get()['Body']
        return result

    def put(self, prefix: str, oid: str, data_stream: BinaryIO) -> int:
        completed = []

        def upload_callback(size):
            completed.append(size)

        bucket = self.s3.Bucket(self.aws_s3_bucket_name)
        # upload_fileobj streams the object using boto3's managed transfer (multipart
        # when needed) and reports the number of bytes sent through the callback
        bucket.upload_fileobj(data_stream, self._get_blob_path(prefix, oid), Callback=upload_callback)
        return sum(completed)

    def exists(self, prefix: str, oid: str) -> bool:
        s3_object = self._s3_object(prefix, oid)
        try:
            # load() issues a HEAD request; a 404 error means the object does not exist
            s3_object.load()
        except botocore.exceptions.ClientError as e:
            if e.response['Error']['Code'] == "404":
                return False
            else:
                raise RuntimeError(e)
        return True

    def get_size(self, prefix: str, oid: str) -> int:
        if self.exists(prefix, oid):
            result: int = self._s3_object(prefix, oid).content_length
            return result
        else:
            raise ObjectNotFound()

    def get_upload_action(self, prefix: str, oid: str, size: int, expires_in: int,
                          extra: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        params_ = {
            'Bucket': self.aws_s3_bucket_name,
            'Key': self._get_blob_path(prefix, oid)
        }
        # Generate a pre-signed PUT URL so the client can upload directly to S3
        response = self.s3_client.generate_presigned_url('put_object',
                                                         Params=params_,
                                                         ExpiresIn=expires_in
                                                         )
        return {
            "actions": {
                "upload": {
                    "href": response,
                    "header": {},
                    "expires_in": expires_in
                }
            }
        }

    def get_download_action(self, prefix: str, oid: str, size: int, expires_in: int,
                            extra: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:

        filename = extra.get('filename') if extra else oid
        params_ = {
            'Bucket': self.aws_s3_bucket_name,
            'Key': self._get_blob_path(prefix, oid),
            'ResponseContentDisposition': f"attachment; filename={filename}"
        }
        # Generate a pre-signed GET URL; ResponseContentDisposition sets the
        # filename suggested to the downloading client
        response = self.s3_client.generate_presigned_url('get_object',
                                                         Params=params_,
                                                         ExpiresIn=expires_in
                                                         )
        return {
            "actions": {
                "download": {
                    "href": response,
                    "header": {},
                    "expires_in": expires_in
                }
            }
        }

    def _get_blob_path(self, prefix: str, oid: str) -> str:
        """Get the path to a blob in storage
        """
        if not self.path_prefix:
            storage_prefix = ''
        elif self.path_prefix[0] == '/':
            storage_prefix = self.path_prefix[1:]
        else:
            storage_prefix = self.path_prefix
        return os.path.join(storage_prefix, prefix, oid)

    def _s3_object(self, prefix, oid):
        return self.s3.Object(self.aws_s3_bucket_name, self._get_blob_path(prefix, oid))
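
For illustration only (not part of this PR's diff), here is a minimal sketch of how the new storage class could be exercised directly. The bucket name, prefix, and object ID below are hypothetical, and AWS credentials are assumed to be available to boto3:

```python
from giftless.storage.aws_s3 import AwsS3Storage

# Hypothetical bucket, prefix and object ID, for illustration only
storage = AwsS3Storage(aws_s3_bucket_name="my-lfs-bucket", path_prefix="lfs-objects")

with open("example.bin", "rb") as f:
    uploaded = storage.put("my-org/my-repo", "0123abcd", f)
    print(f"uploaded {uploaded} bytes")

# Generate a pre-signed download URL valid for 15 minutes
action = storage.get_download_action("my-org/my-repo", "0123abcd", size=uploaded, expires_in=900)
print(action["actions"]["download"]["href"])
```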
requirements.in: 1 change (1 addition, 0 deletions)
@@ -20,3 +20,4 @@ git+https://github.com/teracyhq/flask-classful.git@3bbab31#egg=flask-classful
# TODO: Split these out so users don't have to install all of them
azure-storage-blob==12.2.*
google-cloud-storage==1.28.*
boto3==1.17.*
requirements.txt: 165 changes (48 additions, 117 deletions)
@@ -4,123 +4,54 @@
#
# pip-compile --no-index --output-file=requirements.txt requirements.in
#
azure-core==1.8.2
# via azure-storage-blob
azure-storage-blob==12.2.0
# via -r requirements.in
cachetools==4.1.1
# via google-auth
certifi==2020.6.20
# via
# msrest
# requests
cffi==1.14.3
# via cryptography
chardet==3.0.4
# via requests
click==7.1.2
# via flask
cryptography==3.3.2
# via
# -r requirements.in
# azure-storage-blob
figcan==0.0.4
# via -r requirements.in
git+https://github.com/teracyhq/flask-classful.git@3bbab31#egg=flask-classful
# via -r requirements.in
flask-marshmallow==0.11.0
# via -r requirements.in
flask==1.1.2
# via
# -r requirements.in
# flask-classful
# flask-marshmallow
google-api-core==1.23.0
# via google-cloud-core
google-auth==1.23.0
# via
# google-api-core
# google-cloud-storage
google-cloud-core==1.4.3
# via google-cloud-storage
google-cloud-storage==1.28.1
# via -r requirements.in
google-resumable-media==0.5.1
# via google-cloud-storage
googleapis-common-protos==1.52.0
# via google-api-core
idna==2.10
# via requests
isodate==0.6.0
# via msrest
itsdangerous==1.1.0
# via flask
jinja2==2.11.2
# via flask
markupsafe==1.1.1
# via jinja2
marshmallow-enum==1.5.1
# via -r requirements.in
marshmallow==3.9.0
# via
# flask-marshmallow
# marshmallow-enum
# webargs
msrest==0.6.19
# via azure-storage-blob
oauthlib==3.1.0
# via requests-oauthlib
protobuf==3.13.0
# via
# google-api-core
# googleapis-common-protos
pyasn1-modules==0.2.8
# via google-auth
pyasn1==0.4.8
# via
# pyasn1-modules
# rsa
pycparser==2.20
# via cffi
pyjwt==1.7.1
# via -r requirements.in
python-dateutil==2.8.1
# via -r requirements.in
python-dotenv==0.13.0
# via -r requirements.in
pytz==2020.4
# via google-api-core
pyyaml==5.3.1
# via -r requirements.in
requests-oauthlib==1.3.0
# via msrest
requests==2.24.0
# via
# azure-core
# google-api-core
# msrest
# requests-oauthlib
rsa==4.6
# via google-auth
six==1.15.0
# via
# azure-core
# cryptography
# flask-marshmallow
# google-api-core
# google-auth
# google-resumable-media
# isodate
# protobuf
# python-dateutil
typing-extensions==3.7.4.3
# via -r requirements.in
urllib3==1.25.11
# via requests
webargs==5.5.3
# via -r requirements.in
werkzeug==1.0.1
# via flask
azure-core==1.8.2 # via azure-storage-blob
azure-storage-blob==12.2.0 # via -r requirements.in
boto3==1.17.26 # via -r requirements.in
botocore==1.20.26 # via boto3, s3transfer
cachetools==4.1.1 # via google-auth
certifi==2020.6.20 # via msrest, requests
cffi==1.14.3 # via cryptography
chardet==3.0.4 # via requests
click==7.1.2 # via flask
cryptography==3.3.2 # via -r requirements.in, azure-storage-blob
figcan==0.0.4 # via -r requirements.in
git+https://github.com/teracyhq/flask-classful.git@3bbab31#egg=flask-classful # via -r requirements.in
flask-marshmallow==0.11.0 # via -r requirements.in
flask==1.1.2 # via -r requirements.in, flask-classful, flask-marshmallow
google-api-core==1.23.0 # via google-cloud-core
google-auth==1.23.0 # via google-api-core, google-cloud-storage
google-cloud-core==1.4.3 # via google-cloud-storage
google-cloud-storage==1.28.1 # via -r requirements.in
google-resumable-media==0.5.1 # via google-cloud-storage
googleapis-common-protos==1.52.0 # via google-api-core
idna==2.10 # via requests
isodate==0.6.0 # via msrest
itsdangerous==1.1.0 # via flask
jinja2==2.11.2 # via flask
jmespath==0.10.0 # via boto3, botocore
markupsafe==1.1.1 # via jinja2
marshmallow-enum==1.5.1 # via -r requirements.in
marshmallow==3.9.0 # via flask-marshmallow, marshmallow-enum, webargs
msrest==0.6.19 # via azure-storage-blob
oauthlib==3.1.0 # via requests-oauthlib
protobuf==3.13.0 # via google-api-core, googleapis-common-protos
pyasn1-modules==0.2.8 # via google-auth
pyasn1==0.4.8 # via pyasn1-modules, rsa
pycparser==2.20 # via cffi
pyjwt==1.7.1 # via -r requirements.in
python-dateutil==2.8.1 # via -r requirements.in, botocore
python-dotenv==0.13.0 # via -r requirements.in
pytz==2020.4 # via google-api-core
pyyaml==5.3.1 # via -r requirements.in
requests-oauthlib==1.3.0 # via msrest
requests==2.24.0 # via azure-core, google-api-core, msrest, requests-oauthlib
rsa==4.6 # via google-auth
s3transfer==0.3.4 # via boto3
six==1.15.0 # via azure-core, cryptography, flask-marshmallow, google-api-core, google-auth, google-resumable-media, isodate, protobuf, python-dateutil
typing-extensions==3.7.4.3 # via -r requirements.in
urllib3==1.25.11 # via botocore, requests
webargs==5.5.3 # via -r requirements.in
werkzeug==1.0.1 # via flask

# The following packages are considered to be unsafe in a requirements file:
# setuptools