Here is an interesting use case I worked on recently: I had to process about 4 TB of 360° panoramic images stored in AWS S3 and generate tile images for them.
I had a Lambda function listening for s3:ObjectCreated events on an S3 bucket, which would in turn process each image and generate the tiles. So all I had to do was copy the existing images from the original bucket to a temporary bucket and let the Lambda function pick up the S3 events.
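The tiling function itself isn't the subject of this post, but the event wiring looks roughly like this. A minimal sketch, assuming an s3:ObjectCreated notification; `generate_tiles` is a hypothetical helper, not the actual implementation:

```python
# Illustrative sketch of a Lambda handler wired to s3:ObjectCreated events.
# Each event record carries the bucket and key of the newly copied object.
import urllib.parse

def handler(event, context):
    processed = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        # generate_tiles(bucket, key)  # hypothetical: the real tiling work
        processed.append((bucket, key))
    return processed
```

One handler invocation runs per event delivery, which is exactly why the rate of object copies translates directly into concurrent Lambda executions.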
Pretty simple, eh?
Well, here is the catch. I needed to control the rate at which the objects were copied, to make sure I didn't shoot it through the roof and throttle the Lambda function. By default, an AWS account has a limit on how many Lambda invocations can run in parallel: only 1,000 concurrent executions per account per region.
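To see why the copy rate matters: by Little's law, concurrent executions ≈ (objects copied per second) × (average function duration). A back-of-the-envelope check, with an assumed 5-second tiling time (the actual duration isn't stated in this post):

```python
# Back-of-the-envelope: concurrency ~= arrival_rate * avg_duration,
# so the safe copy rate is the concurrency limit divided by duration.
def max_safe_copy_rate(concurrency_limit, avg_duration_s):
    """Objects per second that keeps concurrency under the limit."""
    return concurrency_limit / avg_duration_s

# With the default 1,000-execution limit and an assumed 5 s tiling job:
rate = max_safe_copy_rate(1000, 5.0)  # 200 objects/second
```

Anything much above that rate, and invocations start queuing past the concurrency limit.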
I was using the AWS CLI to copy the images from one bucket to another, and pretty soon I hit that bottleneck and the Lambda functions started to throttle.
Luckily, the AWS CLI exposes S3-specific configuration options for tuning concurrency, which I could adjust to fit my needs.
Set max_concurrent_requests in your AWS config file (typically ~/.aws/config):

s3 =
  max_concurrent_requests = 500
  max_queue_size = 10000
  use_accelerate_endpoint = true
All I had to do was tune the max_concurrent_requests value, and after a bit of trial and error while monitoring the results, I was able to control the number of objects transferred per second and keep it within limits.
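For the monitoring part, CloudWatch publishes a Throttles metric under the AWS/Lambda namespace. A sketch of the query parameters you would pass to boto3's get_metric_statistics (or the equivalent CLI call); "tile-generator" is a placeholder function name, not the one from this project:

```python
# Sketch: build parameters for a CloudWatch query on Lambda throttles.
# "AWS/Lambda" and "Throttles" are the real namespace and metric names;
# the function name is a placeholder. Pass the dict to
# boto3.client("cloudwatch").get_metric_statistics(**params).
from datetime import datetime, timedelta, timezone

def throttle_query_params(function_name, minutes=30):
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Lambda",
        "MetricName": "Throttles",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 60,            # one datapoint per minute
        "Statistics": ["Sum"],
    }

params = throttle_query_params("tile-generator")
```

A non-zero Throttles sum means the copy rate is still too high and max_concurrent_requests should come down.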
While in my case I wanted to throttle the number of objects being copied, tweaking the same configuration can also make the copy much faster for the opposite use case. If your machine has the resources to spawn more threads, you can increase the value of max_concurrent_requests and have the objects copied much faster.
Hope it's helpful for someone out there.