Skip to content

HADOOP-19604_branch-3.4. ABFS: BlockId generation based on blockCount along with full blob md5 computation change #7819

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: branch-3.4
Choose a base branch
from

Conversation

anmolanmol1234
Copy link
Contributor

Jira :- https://issues.apache.org/jira/browse/HADOOP-19604

BlockId computation to be consistent across clients for PutBlock and PutBlockList so made use of blockCount instead of offset.
Block IDs were previously derived from the data offset, which could lead to inconsistency across different clients. The change now uses blockCount (i.e., the index of the block) to compute the Block ID, ensuring deterministic and consistent ID generation for both PutBlock and PutBlockList operations across clients.

Restrict URL encoding of certain JSON metadata during setXAttr calls.
When setting extended attributes (xAttrs), the JSON metadata (hdi_permission) was previously URL-encoded, which could cause unnecessary escaping or compatibility issues. This change ensures that only required metadata are encoded.

Maintain the MD5 hash of the whole block to validate data integrity during flush.
During flush operations, the MD5 hash of the entire block's data is computed and stored. This hash is later used to validate that the block correctly persisted, ensuring data integrity and helping detect corruption or transmission errors.

… full blob md5 computation change (apache#7777)

Contributed by Anmol Asrani
@anmolanmol1234 anmolanmol1234 changed the title HADOOP-19604. ABFS: BlockId generation based on blockCount along with full blob md5 computation change HADOOP-19604_branch-3.4. ABFS: BlockId generation based on blockCount along with full blob md5 computation change Jul 23, 2025
@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 18m 30s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+0 🆗 xmllint 0m 1s xmllint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 12 new or modified test files.
_ branch-3.4 Compile Tests _
+1 💚 mvninstall 35m 56s branch-3.4 passed
+1 💚 compile 0m 40s branch-3.4 passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 compile 0m 38s branch-3.4 passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
+1 💚 checkstyle 0m 34s branch-3.4 passed
+1 💚 mvnsite 0m 43s branch-3.4 passed
+1 💚 javadoc 0m 42s branch-3.4 passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 36s branch-3.4 passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
+1 💚 spotbugs 1m 8s branch-3.4 passed
+1 💚 shadedclient 32m 4s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 31s the patch passed
+1 💚 compile 0m 33s the patch passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 33s the patch passed
+1 💚 compile 0m 29s the patch passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
+1 💚 javac 0m 29s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 21s hadoop-tools/hadoop-azure: The patch generated 0 new + 14 unchanged - 1 fixed = 14 total (was 15)
+1 💚 mvnsite 0m 33s the patch passed
+1 💚 javadoc 0m 27s the patch passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 26s the patch passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
+1 💚 spotbugs 1m 7s the patch passed
+1 💚 shadedclient 36m 37s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 32s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 38s The patch does not generate ASF License warnings.
137m 31s
Subsystem Report/Notes
Docker ClientAPI=1.51 ServerAPI=1.51 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7819/1/artifact/out/Dockerfile
GITHUB PR #7819
Optional Tests dupname asflicense codespell detsecrets xmllint compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle
uname Linux 6a669777139a 5.15.0-143-generic #153-Ubuntu SMP Fri Jun 13 19:10:45 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision branch-3.4 / d086393
Default Java Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7819/1/testReport/
Max. process+thread count 553 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7819/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants