How to upload large files to S3 using multipart upload and the 'aws-sdk' rubygem


Category : Rails

Hello guys, after a long time I am back, this time with a solution to the hassle of uploading large files to S3. We often run into problems uploading large files, and the bigger the file, the more difficult the upload becomes. Here is a step-by-step demonstration of how we can upload large files to S3:

This post is based on version 1.67.0 of the 'aws-sdk' rubygem.

What is multipart upload?

Before we go ahead with multipart upload, we should understand what it is. Instead of sending a file in a single PUT request, a multipart upload splits it into parts that are uploaded separately and then assembled by S3. The aws-sdk gem defines a multipart_threshold with a default value of 16777216 bytes (16 MB); when the file being uploaded exceeds this threshold, the gem performs a multipart upload.
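
If you want to change that threshold yourself, here is a minimal sketch. The :s3_multipart_threshold option name and the ENV credential variables are assumptions based on the aws-sdk v1 configuration style, not something this post depends on:

require 'aws-sdk'   # aws-sdk v1 (this post uses 1.67.0)

# A sketch of adjusting the threshold. Files larger than this many
# bytes are written with a multipart upload; 16777216 is the default.
AWS.config(
  access_key_id:     ENV['AWS_ACCESS_KEY_ID'],
  secret_access_key: ENV['AWS_SECRET_ACCESS_KEY'],
  s3_multipart_threshold: 16 * 1024 * 1024   # assumed v1 option name
)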

Override File method

First, reopen Ruby's File class and add a method that yields the file in chunks of a given size. Here is the code:

PART_SIZE = 16777216
# You can use any part size of at least 5 MB (5242880 bytes); S3 rejects
# smaller parts (except the last one) with a "part too small" error.
# 16777216 bytes matches the gem's multipart_threshold default.

class File
  # Yields the file contents in chunks of part_size bytes until EOF is reached.
  def each_part(part_size = PART_SIZE)
    yield read(part_size) until eof?
  end
end
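
For a quick sanity check, here is how the helper behaves ('sample.bin' is a hypothetical file used only for illustration):

File.open('sample.bin', 'rb') do |file|
  file.each_part do |part|
    # Each chunk is at most PART_SIZE bytes; only the final chunk may be smaller.
    puts "read a chunk of #{part.bytesize} bytes"
  end
end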

Upload file

You can create a new class or just a method to perform the upload. In this example I am using a single method:

def upload_large_file_to_s3(file_name_with_path, object, key, content_type, acl = :public_read)
  # file_name_with_path is the file name along with its path, e.g. some/path/filename.ext
  # object is the S3 object; you can get one with
  #   object = AWS::S3.new.buckets['your_bucket_name'].objects[key]
  # key is an identifier for the file to be uploaded.
  # content_type is the content type of the file to be uploaded.
  # acl is the permission; I have made it public by default.

  File.open(file_name_with_path, 'rb') do |file|
    if file.size > PART_SIZE
      object.multipart_upload(acl: acl) do |upload|
        Rails.logger.info "File size over #{PART_SIZE} bytes, using multipart upload..."
        total_parts = (file.size.to_f / PART_SIZE).ceil
        current_part = 1
        Rails.logger.info "total parts = #{total_parts}"
        _upload_id = upload.id
        file.each_part do |part|
          Rails.logger.info "uploading part ##{current_part} with upload_id #{_upload_id}"
          upload.add_part(part, part_number: current_part, upload_id: _upload_id)
          current_part += 1
        end
      end
    else
      object.write(key: key, file: File.open(file_name_with_path, 'rb'), acl: acl, content_type: content_type)
    end
  end
  (object.present? && object.key.present?) ? object.public_url : nil
end
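
Calling it looks something like this; the bucket name, key, and local path below are placeholders, not values from this post:

bucket = 'your_bucket_name'              # placeholder bucket name
key    = 'uploads/big_video.mp4'         # placeholder key
path   = '/tmp/big_video.mp4'            # placeholder local path

object = AWS::S3.new.buckets[bucket].objects[key]
url = upload_large_file_to_s3(path, object, key, 'video/mp4')
Rails.logger.info "uploaded to #{url}" if url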
 

The above method checks the file size: if it exceeds the multipart threshold, a multipart upload is performed; otherwise a normal write is performed. upload_large_file_to_s3 returns the public URL of the uploaded file once the upload completes successfully.

My recommendation is to use Sidekiq or any other background processing library to handle large file uploads, so that the web request does not time out.
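
For example, a minimal Sidekiq worker sketch; the worker name, bucket, and arguments here are illustrative assumptions, and upload_large_file_to_s3 is assumed to be accessible from the worker (e.g. mixed in from wherever you defined it):

class LargeFileUploadWorker
  include Sidekiq::Worker

  # file_name_with_path, key and content_type are passed in from the caller.
  def perform(file_name_with_path, key, content_type)
    object = AWS::S3.new.buckets['your_bucket_name'].objects[key]
    url = upload_large_file_to_s3(file_name_with_path, object, key, content_type)
    Rails.logger.info "background upload finished: #{url}"
  end
end

# Enqueue it from a controller or model:
# LargeFileUploadWorker.perform_async('/tmp/big_video.mp4', 'uploads/big_video.mp4', 'video/mp4')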

I hope this solves your problem. If you are still facing issues, or you are using a different version of the gem, you can follow this link for more details.



About Ram Laxman Yadav

Senior Software Engineering Professional | Tech Enthusiast | Mentor | Payments | Hospitality | E-Commerce, based in NCR, India

Email : info@ramlaxman.co.in

Website : https://ramlaxman.co.in