[WIP] S3 Multipart upload/download update #3275
base: version-3
Detected 1 possible performance regression:
module Aws
  module S3
    # TODO - move to another location
    # Error raised when file download operations fail
You can probably make this similar to MultipartUploadError: call it MultipartDownloadError and define it at the same scope. This is the only place it's raised, right? What errors are raised if the file cannot be written to, or if it's a single-request download?
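A minimal sketch of what the suggested class might look like, assuming it mirrors the shape of MultipartUploadError (a message plus the list of underlying errors collected from worker threads). MultipartDownloadError is the name proposed in this review, not a class that exists in the PR.

```ruby
module Aws
  module S3
    # Hypothetical: modeled on MultipartUploadError, so callers can inspect
    # every error raised by the part-download threads, not just the first.
    class MultipartDownloadError < StandardError
      def initialize(message, errors = [])
        @errors = errors
        super(message)
      end

      # @return [Array<StandardError>] errors encountered while downloading parts
      attr_reader :errors
    end
  end
end

err = Aws::S3::MultipartDownloadError.new('download failed', [IOError.new('disk full')])
err.errors.size # => 1
```

Defining it next to MultipartUploadError keeps both transfer errors under Aws::S3, so a caller can rescue either one with the same pattern.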
  key: options[:key]
}
@params[:version_id] = options[:version_id] if options[:version_id]
@params = set_params(options)
I think we weren't doing this correctly. The download_file method says:
# @param [Hash] options
# Additional options for {Client#get_object} and #{Client#head_object}
# may be provided.
However, it looks like we only cherry-pick these three. Instead, I think we probably want to delete the non-API options (mode, thread count, chunk size, etc.) so that the remaining options are all API options.
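A minimal sketch of that approach, assuming a hypothetical DOWNLOADER_OPTIONS list of the keys consumed by the downloader itself; the real list would have to match whatever download_file actually accepts. Everything not on the list is passed through to Client#get_object / Client#head_object.

```ruby
# Hypothetical list: options consumed by the downloader, not by the S3 API.
DOWNLOADER_OPTIONS = %i[
  mode
  thread_count
  chunk_size
  on_checksum_validated
  progress_callback
].freeze

# Returns a new hash containing only the API options; the caller's hash
# is left untouched.
def api_params(options)
  options.reject { |key, _value| DOWNLOADER_OPTIONS.include?(key) }
end

params = api_params({
  bucket: 'my-bucket',
  key: 'my-key',
  version_id: 'v1',
  mode: 'auto',
  thread_count: 10
})
# params => { bucket: 'my-bucket', key: 'my-key', version_id: 'v1' }
```

Using reject rather than delete avoids mutating the options hash the caller passed in, which matters if the same hash is reused for another call.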
if part.params[:range]
  range = resp.content_range.split(' ').last.split('/').first
  expected_range = part.params[:range].split('=').last
  raise FileDownloadError, 'file download integrity check failed' unless expected_range == range
end
What is being parsed here, and what is being compared? Pulling this out into a method could better document what is happening.
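As far as the snippet shows, it compares the byte range requested in the part's Range header (bytes=start-end) with the range echoed back in the response's Content-Range header (bytes start-end/total). A minimal sketch of how named helpers could document that; the method names and RangeMismatchError are hypothetical, not taken from the PR.

```ruby
# Hypothetical error class for this sketch.
class RangeMismatchError < StandardError; end

# Request Range header: "bytes=0-4" -> "0-4"
def requested_range(range_header)
  range_header.split('=').last
end

# Response Content-Range header: "bytes 0-4/10" -> "0-4"
def returned_range(content_range)
  content_range.split(' ').last.split('/').first
end

# Raises unless the part S3 returned covers exactly the bytes we asked for.
def validate_part_range!(range_header, content_range)
  expected = requested_range(range_header)
  actual = returned_range(content_range)
  return if expected == actual

  raise RangeMismatchError, "requested bytes #{expected}, received bytes #{actual}"
end

validate_part_range!('bytes=0-4', 'bytes 0-4/10') # => nil
```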
write(resp)
if @on_checksum_validated && resp.checksum_validated
  @on_checksum_validated.call(resp.checksum_validated, resp)
end
mutex.synchronize { total_requests += 1 }
I think the mutex may not be necessary here, because total_requests is never read until after all the threads have completed.
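If the total is only needed once every thread has finished, one way to avoid sharing a counter at all is to have each worker return its own count and sum the results after joining. A minimal sketch under that assumption; count_requests and the queue-of-parts setup are hypothetical, not the PR's code.

```ruby
# Hypothetical: each worker counts the parts it handles and returns that
# count; totals are combined only after every thread has finished.
def count_requests(parts, thread_count)
  queue = Queue.new
  parts.each { |part| queue << part }

  workers = Array.new(thread_count) do
    Thread.new do
      handled = 0
      loop do
        begin
          queue.pop(true) # non-blocking; raises ThreadError once the queue is empty
        rescue ThreadError
          break
        end
        # ... issue the get_object request for this part here ...
        handled += 1
      end
      handled
    end
  end

  # Thread#value joins each thread and returns the value of its block.
  workers.sum(&:value)
end

count_requests((1..10).to_a, 3) # => 10
```

Because no thread writes to a variable another thread reads, there is nothing to synchronize; the join in Thread#value is the only coordination point.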
@client.complete_multipart_upload(
  **complete_opts(options).merge(
    upload_id: upload_id,
    multipart_upload: { parts: parts }
    multipart_upload: { parts: parts },
    mpu_object_size: File.size(source)
For my understanding, why is this needed?
hash
UPLOAD_PART_OPTIONS.each_with_object({}) do |key, hash|
  # don't pass through checksum calculations
  hash[key] = options[key] if options.key?(key) && !checksum_key?(key)
I remember adding this check, but is it still necessary?
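For reference, a minimal sketch of what the filter does, assuming checksum_key? matches the precomputed checksum value parameters (checksum_crc32, checksum_sha256, and so on). Both constant lists below are hypothetical subsets, not the SDK's actual definitions. A plausible reason for the check: a checksum value supplied by the caller describes the whole object, so forwarding it to each upload_part call would not match that part's bytes.

```ruby
require 'set'

# Hypothetical subsets, for illustration only.
UPLOAD_PART_OPTIONS = %i[
  checksum_algorithm checksum_crc32 checksum_sha256 request_payer sse_customer_key
].freeze
CHECKSUM_KEYS = Set[:checksum_crc32, :checksum_crc32c, :checksum_sha1, :checksum_sha256].freeze

def checksum_key?(key)
  CHECKSUM_KEYS.include?(key)
end

def upload_part_opts(options)
  UPLOAD_PART_OPTIONS.each_with_object({}) do |key, hash|
    # keep the algorithm choice, but drop precomputed whole-object checksum values
    hash[key] = options[key] if options.key?(key) && !checksum_key?(key)
  end
end

upload_part_opts({ checksum_algorithm: 'CRC32', checksum_crc32: 'AAAAAA==',
                   request_payer: 'requester', bucket: 'b' })
# => { checksum_algorithm: 'CRC32', request_payer: 'requester' }
```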
@@ -184,8 +184,13 @@ module S3
  part_number: 1
}).exactly(1).times

client.stub_responses(:get_object, -> (_ctx) {
  { body: 'body', content_range: 'bytes 0-4/4' }
client.stub_responses(:get_object, -> (context) {
You are missing validation cases - I'm assuming you will add these.
TBD
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
To make sure we include your contribution in the release notes, please add a description entry for your changes in the "unreleased changes" section of the CHANGELOG.md file (in the corresponding gem). The description entry should fit on one line and start with Feature or Issue in the correct format.
For generated code changes, please check out the instructions below first:
https://github.com/aws/aws-sdk-ruby/blob/version-3/CONTRIBUTING.md
Thank you for your contribution!