Here is an interesting use case that I worked on recently: I had to process about 4 TB of 360° panoramic images stored in AWS S3 and generate tile images for them.


I had a Lambda function that listened for any s3:ObjectCreated event on an S3 bucket and, in turn, processed the image and generated the tiles. So all I had to do was copy the existing images from their original bucket to a temporary bucket and make the Lambda function listen for the S3 events on that bucket.
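To make the setup concrete, here is a minimal sketch of what such a handler could look like in Python. It is not my actual function: generate_tiles is a hypothetical placeholder for the real tile generation, and the only thing the sketch really shows is how the bucket name and object key are pulled out of the S3 event notification.

```python
from urllib.parse import unquote_plus


def extract_objects(event):
    """Return (bucket, key) pairs for the created objects in an S3 event."""
    objects = []
    for record in event.get("Records", []):
        # Skip anything that is not an object-creation event
        # (e.g. ObjectCreated:Put, ObjectCreated:Copy).
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Keys arrive URL-encoded in the notification, so decode them first.
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects


def generate_tiles(bucket, key):
    """Hypothetical placeholder: the real tile generation would go here."""
    return f"tiles for s3://{bucket}/{key}"


def handler(event, context):
    """Lambda entry point: process every created object in the event."""
    return [generate_tiles(bucket, key) for bucket, key in extract_objects(event)]
```

Every object copied into the bucket triggers one such invocation, which is why the copy rate ends up mattering.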


Pretty simple .. eh?


Well, here is the catch. I needed to control the rate at which the objects were copied so that the invocation count did not go through the roof and make the Lambda function throttle. By default, an AWS account has a limit on how many Lambda invocations can run in parallel: only 1,000 per account, per Region.
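A rough way to reason about this limit: the number of concurrent executions is approximately the number of invocations per second multiplied by the average duration of one invocation. The sketch below turns that into a quick estimate. The function names, the 30-second duration and the headroom factor are illustrative assumptions, not measurements from my workload.

```python
def estimated_concurrency(objects_per_second, avg_duration_seconds):
    """Approximate concurrent executions: invocation rate times average duration."""
    return objects_per_second * avg_duration_seconds


def max_copy_rate(concurrency_limit, avg_duration_seconds, headroom=0.8):
    """Objects per second that keep concurrency under a fraction of the limit.

    headroom is an assumed safety margin (0.8 means aim for 80% of the limit).
    """
    return concurrency_limit * headroom / avg_duration_seconds


# Assumed example: each image takes about 30 seconds to tile.
print(estimated_concurrency(50, 30))  # concurrency needed at 50 objects/second
print(max_copy_rate(1000, 30))        # copy rate that stays under the limit
```

With these assumed numbers, copying 50 objects per second would already need more concurrency than the default limit allows, so the copy has to be slowed down.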


I was using the AWS CLI to copy the images from one bucket to another, and pretty soon I hit that limit and the Lambda functions started to throttle.

Solution.

Luckily, the AWS CLI's S3 commands have some configuration settings for concurrency, which I could easily tweak to suit my needs.


I set max_concurrent_requests in the AWS CLI config file (~/.aws/config), under the profile being used:

s3 =
  max_concurrent_requests = 500
  max_queue_size = 10000
  use_accelerate_endpoint = true


I was able to specify the max_concurrent_requests value, and after some trial and error and monitoring the results, I was able to control the number of objects transferred per second and keep it within limits.
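Since finding the right value takes a few attempts, it can be handy to change the settings from a script instead of editing the file by hand. The AWS CLI supports this through aws configure set. Below is a small sketch that builds those commands for the default profile. The helper names are my own, and the value 20 is only an example, not the value I ended up with.

```python
import subprocess


def configure_commands(settings):
    """Build one `aws configure set` command per S3 setting (default profile)."""
    return [
        ["aws", "configure", "set", f"default.s3.{name}", str(value)]
        for name, value in settings.items()
    ]


def apply_settings(settings):
    """Run the commands; requires the AWS CLI to be installed and on PATH."""
    for command in configure_commands(settings):
        subprocess.run(command, check=True)


# Example (not executed here): try a lower concurrency before the next copy run.
# apply_settings({"max_concurrent_requests": 20})
```

After each change, rerun the copy and watch the Lambda throttle metrics before adjusting the value again.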

Notes.

While in my case I wanted to throttle the number of objects that were copied, tweaking the same configuration would also let you copy objects much faster for a different use case. If your machine has the resources to spawn more threads, you can increase the value of max_concurrent_requests and have the objects copied much faster.

References:
https://docs.aws.amazon.com/cli/latest/topic/s3-config.html#configuration-values

Hope it's helpful for someone out there.