[WIP] S3 Multipart upload/download update #3275

Draft
wants to merge 18 commits into
base: version-3
from

Conversation

jterapin
Contributor

TBD


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

  1. To make sure we include your contribution in the release notes, please add a description entry for your changes in the "unreleased changes" section of the CHANGELOG.md file (for the corresponding gem). The entry should fit on one line and start with Feature or Issue in the correct format.

  2. For generated code changes, please check out the instructions below first:
    https://github.com/aws/aws-sdk-ruby/blob/version-3/CONTRIBUTING.md

Thank you for your contribution!

Detected 1 possible performance regressions:

  • aws-sdk-s3.get_object_small_ms - z-score regression: 10.43 -> 15.13. Z-score: 20.03


module Aws
module S3
# TODO - move to another location
# Error raised when file download operations fail
Contributor

You can probably make this similar to MultipartUploadError and make it a MultipartDownloadError and define it at the same scope. This is the only time this is raised, right? What errors are raised if the file cannot be written to, or it's a single request download?
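
A minimal sketch of what that suggestion could look like, assuming the new error mirrors the usual shape of a multipart error (a message plus a list of underlying failures). The `Sketch` namespace and the `errors` attribute are assumptions made for illustration, not the PR's actual code:

```ruby
module Sketch
  # Hypothetical error for failed multipart downloads. It carries the
  # per-part failures so callers can inspect what went wrong.
  class MultipartDownloadError < StandardError
    # @param [String] message
    # @param [Array<StandardError>] errors failures collected from part downloads
    def initialize(message, errors = [])
      @errors = errors
      super(message)
    end

    # @return [Array<StandardError>] the errors that caused the download to fail
    attr_reader :errors
  end
end

begin
  raise Sketch::MultipartDownloadError.new(
    'multipart download failed', [IOError.new('disk full')]
  )
rescue Sketch::MultipartDownloadError => e
  puts "#{e.message} (#{e.errors.size} underlying error)"
end
```

Whether single-request downloads and file write failures should be wrapped in the same error is left open here, as in the question above.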

key: options[:key]
}
@params[:version_id] = options[:version_id] if options[:version_id]
@params = set_params(options)
Contributor

I think we weren't doing this correctly. The download_file method says:

      # @param [Hash] options
      #   Additional options for {Client#get_object} and #{Client#head_object}
      #   may be provided.

However, it looks like we only cherry-pick these three. Instead, we probably want to delete the downloader-specific options (mode, thread count, chunk size, etc.) so that everything remaining is treated as an API option.
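
One way to sketch that approach, assuming the downloader-specific keys are `:mode`, `:thread_count`, and `:chunk_size` (the constant and the `split_options` helper are hypothetical names, not part of the SDK):

```ruby
# Hypothetical list of options consumed by the downloader itself.
DOWNLOADER_OPTIONS = %i[mode thread_count chunk_size].freeze

# Splits caller options into downloader settings and the remaining
# parameters, which would be passed through to the API calls as-is.
def split_options(options)
  downloader_opts = options.slice(*DOWNLOADER_OPTIONS)
  api_params = options.reject { |key, _| DOWNLOADER_OPTIONS.include?(key) }
  [downloader_opts, api_params]
end

downloader_opts, api_params = split_options(
  mode: 'auto', thread_count: 5, bucket: 'b', key: 'k', version_id: 'v'
)
p downloader_opts
p api_params
```

With this shape, any new API parameter flows through without the downloader needing to know about it.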

Comment on lines +147 to +151
if part.params[:range]
range = resp.content_range.split(' ').last.split('/').first
expected_range = part.params[:range].split('=').last
raise FileDownloadError, 'file download integrity check failed' unless expected_range == range
end
Contributor

What is being parsed here, and what is being compared? Pulling this out into a method could better document what is happening.
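
As a rough illustration of that extraction, the snippet below names the two values being compared: the byte range sent in the request's `Range` header (for example `bytes=0-4`) and the byte range echoed back in `Content-Range` (for example `bytes 0-4/10`). The helper names are hypothetical, and `FileDownloadError` is redefined locally only so the sketch runs on its own:

```ruby
# Stand-in for the error class used in the PR.
class FileDownloadError < StandardError; end

# 'bytes=0-4' -> '0-4'
def requested_range(range_header)
  range_header.split('=').last
end

# 'bytes 0-4/10' -> '0-4'
def returned_range(content_range)
  content_range.split(' ').last.split('/').first
end

# Raises unless the service returned exactly the byte range that was requested.
def validate_range!(range_header, content_range)
  expected = requested_range(range_header)
  actual = returned_range(content_range)
  return if expected == actual

  raise FileDownloadError, "expected range #{expected} but received #{actual}"
end

validate_range!('bytes=0-4', 'bytes 0-4/10')
```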

write(resp)
if @on_checksum_validated && resp.checksum_validated
@on_checksum_validated.call(resp.checksum_validated, resp)
end
mutex.synchronize { total_requests += 1 }
Contributor

I think the mutex may not be necessary here, because at no point are you retrieving total_requests until after all the threads have completed.
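
If dropping the mutex around a shared counter feels risky, one alternative is to avoid the shared counter entirely: each thread keeps a local count and the totals are summed after the threads finish. This is only a sketch under that assumption; `count_requests` and its arguments are hypothetical, and the actual request is replaced by popping an item from a queue:

```ruby
# Each worker counts the items it handled locally; Thread#value joins
# the thread and returns the block's result, so no shared state is mutated.
def count_requests(items, thread_count)
  queue = Queue.new
  items.each { |item| queue << item }

  threads = Array.new(thread_count) do
    Thread.new do
      handled = 0
      loop do
        begin
          queue.pop(true) # non-blocking pop; raises ThreadError when empty
        rescue ThreadError
          break
        end
        handled += 1 # stand-in for "one request completed"
      end
      handled
    end
  end

  threads.sum(&:value)
end

puts count_requests((1..20).to_a, 4)
```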

@client.complete_multipart_upload(
**complete_opts(options).merge(
upload_id: upload_id,
- multipart_upload: { parts: parts }
+ multipart_upload: { parts: parts },
+ mpu_object_size: File.size(source)
Contributor

For my understanding, why is this needed?

hash
UPLOAD_PART_OPTIONS.each_with_object({}) do |key, hash|
# don't pass through checksum calculations
hash[key] = options[key] if options.key?(key) && !checksum_key?(key)
Contributor

I remember adding this check, but is it still necessary?
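
For context, here is a self-contained sketch of what the filter does. The option list is a small hypothetical subset, and `checksum_key?` is approximated by a name-prefix test, which is an assumption rather than the SDK's actual definition:

```ruby
# Hypothetical subset of options accepted by an upload-part call.
UPLOAD_PART_OPTIONS = %i[
  sse_customer_key request_payer checksum_algorithm checksum_crc32
].freeze

# Assumed rule: any key starting with "checksum_" is checksum-related.
def checksum_key?(key)
  key.to_s.start_with?('checksum_')
end

# Copies only the permitted, non-checksum options into a new hash.
def upload_part_opts(options)
  UPLOAD_PART_OPTIONS.each_with_object({}) do |key, hash|
    # don't pass through checksum calculations
    hash[key] = options[key] if options.key?(key) && !checksum_key?(key)
  end
end

p upload_part_opts(sse_customer_key: 'k', checksum_crc32: 'abc', acl: 'private')
```

In this sketch, removing the `!checksum_key?(key)` guard would let a caller-supplied full-object checksum be sent with every part, which is the behavior the check appears intended to prevent.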

@@ -184,8 +184,13 @@ module S3
part_number: 1
}).exactly(1).times

- client.stub_responses(:get_object, -> (_ctx) {
- { body: 'body', content_range: 'bytes 0-4/4' }
+ client.stub_responses(:get_object, -> (context) {
Contributor

You are missing validation cases - I'm assuming you will add these.
