
Feature request: Add support for S3 Batch Operations event #3563

Closed
2 tasks done
sbailliez opened this issue Dec 27, 2023 · 7 comments · Fixed by #3572
Assignees: leandrodamascena
Labels: event_sources (Event Source Data Class utility), feature-request (feature request)

Comments

@sbailliez
Contributor

Use case

Writing an invocation handler for S3 Batch Operations requires a lot of boilerplate code.

There are currently some examples for invocation schema 1.0, but none for 2.0, and very little documentation on 2.0. Users have to resort to trial and error to learn the behavior, and implementing reliable code takes significant effort.

Solution/User Experience

Provide a data class that handles the S3 Batch Operations event for invocation schema versions 1.0 and 2.0.

If there is interest, I can submit a PR for this.
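
To give a feel for the intended user experience, here is a rough, self-contained sketch of what such a data class could look like and how a handler would use it. Every class and property name below is a placeholder for discussion, not a proposed final API:

from typing import Any, Dict, List


class S3BatchOperationTask:
    # Placeholder wrapper for one entry of the "tasks" array
    def __init__(self, data: Dict[str, Any]):
        self._data = data

    @property
    def task_id(self) -> str:
        return self._data["taskId"]

    @property
    def s3_key(self) -> str:
        return self._data["s3Key"]


class S3BatchOperationEvent:
    # Placeholder wrapper for the invocation payload (schema 1.0 or 2.0)
    def __init__(self, data: Dict[str, Any]):
        self._data = data

    @property
    def invocation_id(self) -> str:
        return self._data["invocationId"]

    @property
    def invocation_schema_version(self) -> str:
        return self._data["invocationSchemaVersion"]

    @property
    def tasks(self) -> List[S3BatchOperationTask]:
        return [S3BatchOperationTask(t) for t in self._data["tasks"]]


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    # With a data class, the handler shrinks to iterating tasks and
    # building the result entries S3 Batch Operations expects back.
    batch_event = S3BatchOperationEvent(event)
    results = [
        {"taskId": task.task_id, "resultCode": "Succeeded", "resultString": task.s3_key}
        for task in batch_event.tasks
    ]
    return {
        "invocationSchemaVersion": batch_event.invocation_schema_version,
        "treatMissingKeysAs": "PermanentFailure",
        "invocationId": batch_event.invocation_id,
        "results": results,
    }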

Alternative solutions

- Write all the boilerplate code as suggested in the documentation for the 1.0 schema
- Run batch operations with 2.0 schema and log events to discover the exact structure

Acknowledgment

@sbailliez sbailliez added the feature-request (feature request) and triage (Pending triage from maintainers) labels Dec 27, 2023

boring-cyborg bot commented Dec 27, 2023

Thanks for opening your first issue here! We'll come back to you as soon as we can.
In the meantime, check out the #python channel on our Powertools for AWS Lambda Discord: Invite link

@leandrodamascena
Contributor

leandrodamascena commented Dec 27, 2023

Hello @sbailliez! Thanks for opening this issue to add support for S3 Batch Operations. I think it makes a lot of sense for us to add support for this, not just for the invocation payload but also for the response payload. Let me add a few points to help us decide the next steps:

1 - As you said, there are 2 payloads when S3 Batch Operations invokes a Lambda, but the only difference between them is the userArguments key, so we can create a single DataClass and make this field optional (see the rough sketch after the two payloads below). Does that make sense?

Payload v1:

{
   "invocationId":"aaaaa....",
   "job":{
      "id":"d01f2142-feed-4bd7-871d-456e5937529c"
   },
   "tasks":[
      {
         "taskId":"AAAAAAAAAA....",
         "s3BucketArn":"arn:aws:s3:::aaaa",
         "s3Key":"aaaa.pdf",
         "s3VersionId":"None"
      }
   ],
   "invocationSchemaVersion":"1.0"
}

Payload v2:

{
  "invocationId": "AAAA....",
  "job": {
    "id": "389b0812-e6ae-42a4-81af-2f3d2ab58eb5",
    "userArguments": {
      "leandro": "powertools"
    }
  },
  "tasks": [
    {
      "taskId": "AAAAAAAAAAA...",
      "s3Bucket": "aaaa",
      "s3Key": "aaaa.pdf",
      "s3VersionId": "None"
    }
  ],
  "invocationSchemaVersion": "2.0"
}
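
To make the optional-field idea concrete, here is a rough sketch (placeholder names, not the final implementation) of how a single job wrapper could cover both versions:

from typing import Any, Dict, Optional


class S3BatchOperationJob:
    # Illustrative wrapper around the "job" object from either schema version
    def __init__(self, data: Dict[str, Any]):
        self._data = data

    @property
    def id(self) -> str:
        return self._data["id"]

    @property
    def user_arguments(self) -> Optional[Dict[str, str]]:
        # Only present in invocation schema 2.0 payloads; None for 1.0
        return self._data.get("userArguments")


# user_arguments is None for a 1.0 payload and a dict for a 2.0 payload
assert S3BatchOperationJob({"id": "d01f2142"}).user_arguments is None
assert S3BatchOperationJob({"id": "389b0812", "userArguments": {"leandro": "powertools"}}).user_arguments == {"leandro": "powertools"}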

2 - When S3 Batch Operations invokes a Lambda, it expects a specific response payload, and we can create a class to help customers build that response easily, as we did for Kinesis Firehose Data Transformation (a rough builder sketch follows the example payload below).
The treatMissingKeysAs field in the response payload accepts 3 possible values: Succeeded, TemporaryFailure, and PermanentFailure.

Response payload

{
  "invocationSchemaVersion": "1.0",
  "treatMissingKeysAs" : "PermanentFailure",
  "invocationId" : "YXNkbGZqYWRmaiBhc2RmdW9hZHNmZGpmaGFzbGtkaGZza2RmaAo",
  "results": [
    {
      "taskId": "dGFza2lkZ29lc2hlcmUK",
      "resultCode": "Succeeded",
      "resultString": "[\"aaa", \"Jaaas\"]"
    }
  ]
}
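
For discussion, a small helper along these lines could build that response. This is only a sketch of the idea, not the shape that eventually shipped:

from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class S3BatchOperationResult:
    # One entry of the "results" array; result_code must be one of
    # Succeeded, TemporaryFailure, or PermanentFailure.
    task_id: str
    result_code: str
    result_string: str = ""

    def asdict(self) -> Dict[str, str]:
        return {
            "taskId": self.task_id,
            "resultCode": self.result_code,
            "resultString": self.result_string,
        }


@dataclass
class S3BatchOperationResponse:
    # Builds the response payload that S3 Batch Operations expects from the Lambda
    invocation_schema_version: str
    invocation_id: str
    treat_missing_keys_as: str = "PermanentFailure"
    results: List[S3BatchOperationResult] = field(default_factory=list)

    def asdict(self) -> Dict[str, Any]:
        return {
            "invocationSchemaVersion": self.invocation_schema_version,
            "treatMissingKeysAs": self.treat_missing_keys_as,
            "invocationId": self.invocation_id,
            "results": [r.asdict() for r in self.results],
        }


# Usage sketch:
#   response = S3BatchOperationResponse("1.0", invocation_id)
#   response.results.append(S3BatchOperationResult(task_id, "Succeeded", "ok"))
#   return response.asdict()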

Do you want to send a PR to add this support? If you can't, I can do it and tag you during the code review. What do you think?
Thanks 🚀

@leandrodamascena leandrodamascena added the event_sources (Event Source Data Class utility) label and removed the triage (Pending triage from maintainers) label Dec 27, 2023
@leandrodamascena leandrodamascena self-assigned this Dec 27, 2023
@leandrodamascena
Contributor

Looking again at the payloads, there is also a difference in the s3BucketArn and s3Bucket keys. It makes more sense to create 2 classes for different versions of this event.

@sbailliez
Contributor Author

@leandrodamascena That totally makes sense for the response. I was trying to find examples of where something similar had been done before posting this issue, but wanted to focus on the event first. Thanks for the link.

And for the event, I think at this time one event class will do the trick and can support both schema versions. Having an identical event data class actually helps a lot with the transition from invocation schema 1.0 to 2.0. Generally you end up parsing the ARN to extract the S3 bucket (which you always need in your code anyway). To give a snippet, in the task structure I would do something like:

@property
def s3_bucket_arn(self) -> Optional[str]:
    return self.get("s3BucketArn")

@property
def s3_bucket(self) -> str:
    if self.s3_bucket_arn:
        return self.s3_bucket_arn.split(":::")[-1]
    return self["s3Bucket"]

This makes it very convenient in code because you can use task.s3_bucket from the get-go.
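
A quick self-contained illustration of that behavior; the Task class below is a throwaway stand-in for the eventual data class, kept only to show the property pair in action:

from typing import Optional


class Task(dict):
    # Throwaway dict-backed stand-in, only to demonstrate the two properties above
    @property
    def s3_bucket_arn(self) -> Optional[str]:
        return self.get("s3BucketArn")

    @property
    def s3_bucket(self) -> str:
        if self.s3_bucket_arn:
            return self.s3_bucket_arn.split(":::")[-1]
        return self["s3Bucket"]


# Schema 1.0 tasks carry s3BucketArn, schema 2.0 tasks carry s3Bucket;
# both resolve to the same bucket name through the same property.
assert Task({"s3BucketArn": "arn:aws:s3:::my-bucket"}).s3_bucket == "my-bucket"
assert Task({"s3Bucket": "my-bucket"}).s3_bucket == "my-bucket"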

I can work on all of this including the response.

@leandrodamascena
Contributor

Hello @sbailliez! Please go ahead with this implementation! I liked the idea of how we can use the s3_bucket_arn and s3_bucket properties and make the transition between versions easy/transparent.

Feel free to ping me if you have any questions or are stuck on anything.

Thanks!

@sbailliez
Contributor Author

Submitted initial draft via PR #3572

Contributor

⚠️COMMENT VISIBILITY WARNING⚠️

This issue is now closed. Please be mindful that future comments are hard for our team to see.

If you need more assistance, please either tag a team member or open a new issue that references this one.

If you wish to keep having a conversation with other community members under this issue feel free to do so.

@leandrodamascena leandrodamascena moved this from Coming soon to Shipped in Powertools for AWS Lambda (Python) Mar 18, 2024