Skip to content

Conversation

adam900710
Copy link
Collaborator

There is a github issue that expressed confusion about the "Compatibility" section of "ch-compression.rst".

The words in the man page is indeed confusing, and some points are no longer correct either.

Considering how complex the direct IO and compression thing is, I didn't come up with a correct answer until reading the code.

So update that section to provide a more straightforward result.

Issue: #1015

SidongYang and others added 11 commits August 13, 2025 20:05
Use SUBVOL_SYNC_WAIT ioctl for 'btrfs subvolume sync' command before
checking periodically and add an option to not use sync wait ioctl call
and force to check periodically. This patch calls a new function
wait_for_subvolume_sync() that calls BTRFS_IOC_SUBVOL_SYNC_WAIT for each
subvol.

Issue: kdave#953
Pull-request: kdave#989
Signed-off-by: Sidong Yang <sidong.yang@furiosa.ai>
Signed-off-by: David Sterba <dsterba@suse.com>
In the upstream equivalent of btrfs_remove_block_group(), the
function remove_block_group_free_space() is called to delete free
spaces associated with the block group being freed. However, this
function is defined in btrfs-progs but not called anywhere.

To address this issue, I added a call to
remove_block_group_free_space() in btrfs_remove_block_group(). This
ensures that the free spaces are properly deleted when a block group
is removed.

I also added a check to remove_block_group_free_space to make sure that
free-space-tree is enabled.

Signed-off-by: Leo Martins <loemra.dev@gmail.com>
check_free_space_tree() is intended to check that the free-space tree
doesn't contain any entries which don't correspond to a block group.
Unfortunately it calls btrfs_lookup_first_block_group(), which only
returns NULL if the search offset is larger than the start of any block
group.

Fix check_free_space_tree() to use btrfs_lookup_block_group() as was
obviously intended, and fix fsck-tests/054-orphan-directory to get rid
of its spurious free-space entries that this now diagnoses.

Signed-off-by: Mark Harmstone <mark@harmstone.com>
There is a note about a bug that mount a fs RO first, then remount it
RW, will make btrfs to skip the orphan roots cleanup.

However it's no longer the case after kernel commit 44c0ca211a4d
("btrfs: lift read-write mount setup from mount and remount"), as that
commit unify the pre-RW mount checks, and will always do the orphan
roots cleanup.

Just update the note so that it won't cause any confusion.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Qu Wenruo <wqu@suse.com>
[BEHAVIOR CHANGE]
In the incoming v6.17 kernel release, due to the changes in commit
40426dd147ff ("btrfs: use the super_block as holder when mounting file
systems"), we can no longer mount a seed device through both sprouted
fs and the seed device.

E.g.

 # mkfs.btrfs -f /dev/test/scratch1
 # mount /dev/test/scratch1 /mnt/btrfs
 # xfs_io -f -c "pwrite 0 16m" /mnt/btrfs/foobar
 # umount /mnt/btrfs
 # btrfstune -S1 /dev/test/scratch1
 # mount /dev/test/scratch1 /mnt/btrfs
 # btrfs device add /dev/test/scratch2 /mnt/btrfs

 Now the sprouted fs is mount, but if one wants to mount the seed
 device, it will fail:

 # mount /dev/test/scratch1  /mnt/btrfs/
 mount: /mnt/btrfs: /dev/mapper/test-scratch1 already mounted or mount point busy.
        dmesg(1) may have more information after failed mount system call.

 The only new dmesg is:

 BTRFS error: failed to open device for path /dev/mapper/test-scratch1 with flags 0x23: -16

[CAUSE]
After that kernel commit, each block device will have its own unique
holder (super block).

This super block block device holder is critical to pass device events
like freeze/thaw/missing to each filesystem.

And after the seed device is sprouted, the holder for the seed device is
the super block of the sprouted fs.

But if some one else tries to mount the seed device again, it will be a
new super block as the holder passed into bdev_file_open_by_path().

Since the seed device already has a different holder, this new
bdev_file_open_by_path() will fail with -EBUSY.

[ENHANCEMENT]
This is a kernel behavior change, but considering the benefit (proper
bdev events passing into the fs) I'd say the old behavior is more like a
hack or a coincidence, other than a properly designed behavior.

So update the documentation to make it more explicit.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Qu Wenruo <wqu@suse.com>
…erent filesystems

Since kernel commit 40426dd147ff ("btrfs: use the super_block as holder
when mounting file systems"), the kernel will not allow a block device
belonging to two different filesystems.

This means a seed device can only be mounted through either the sprouted
fs, or the seed device, not both at the same time.

Although a seed device can still be shared between different sprouted
fs, only one of those fs can be mounted.

Considering the extra benefit (extra protection, better device events
handling), it's worthy to do the kernel behavior change.

Update the test case to follow the new limits.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Currently we do the validation of subvolume parameters inside
mkfs/main.c, but considering all things like structure rootdir_subvol
are all inside rootdir.[ch], it's better to move the validation part
into rootdir.[ch].

Extract the validation part into a helper,
btrfs_mkfs_validate_subvols(), into rootdir.[ch].

Furthermore since we're here, also slightly enhance the duplicated
subvolume check, so that the runtime is halved.

Signed-off-by: Qu Wenruo <wqu@suse.com>
helper

Currently we do the validation of inode flag parameters inside
mkfs/main.c, but considering all things like structure
rootdir_inode_flags_entry are all inside rootdir.[ch], it's better to
move the validation part into rootdir.[ch].

Extract the validation part into a helper,
btrfs_mkfs_validate_inode_flags(), into rootdir.[ch].

Signed-off-by: Qu Wenruo <wqu@suse.com>
Currently for --subvol parameter, we save the full path of the host fs
into rootdir_subvol sturcture, then compare each inode we hit with that
saved full path.

This string comparison can be time consuming (up to PATH_MAX
characters), and we're doing it for every directory inode.

On the other hand, nftw() also provides a stat structure of the current
inode, stat::st_dev and stat::st_ino pair can always uniquely locate an
inode in the host filesystem.

With that said, we can just save the st_dev/st_ino pair when validating
the subvol parameters, and use st_dev/st_ino to check if we hit the
target subvolume.

This has two benefits:

- Reduce the memory usage of each rootdir_subvol
  Now we need no full_path member, replacing it with two u64.
  This saves (4K - 16) bytes per rootdir_subvol.

- Reduce the runtime of direct inode comparison
  Instead of doing strcmp() for up to 4K bytes, it's just two u64
  comparison.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Currently for --inode-flags parameter, we save the full path of the host
fs into rootdir_inode_flags_entry sturcture, then compare each inode we hit
with that saved full path.

This string comparison can be time consuming (up to PATH_MAX
characters), and we're doing it for every inode.

On the other hand, nftw() also provides a stat structure of the current
inode, stat::st_dev and stat::st_ino pair can always uniquely locate an
inode in the host filesystem.

With that said, we can just save the st_dev/st_ino pair when validating
the inode flag parameters, and use st_dev/st_ino to check if we hit the
target inode.

This has two benefits:

- Reduce the memory usage of each rootdir_inode_flags_entry
  Now we need no full_path member, replacing it with two u64.
  This saves (4K - 16) bytes per rootdir_inode_flags_entry.

- Reduce the runtime of inode comparison
  Instead of doing strcmp() for up to 4K bytes, it's just two u64
  comparison.

Signed-off-by: Qu Wenruo <wqu@suse.com>
With the recent enhancements to --subvol and --inode-flags options, the
memory usage for them are just 4K+, and even for the older 8K+ memory
usage, it's really hard to consider them as heavy memory usage.

E.g. even if the end user specified over 1000 entries of
--subvol/--inode-flags, it only takes a little over 8MiB for the older
code or over 4MiB for the newer code.

Since we're in the user space and the year is 2025 not 1995, such memory
usage is far from heavy.

Just remove the paranoid note from the man page.

Signed-off-by: Qu Wenruo <wqu@suse.com>
@UserlandAlchemist
Copy link

It looks like there’s a minor typo:

Compression requires both data checksum and COW, so either *nodatasum* or
*nodatasum* mount option/inode flag will result no compression.

I believe the second nodatasum should be nodatacow.

There is a github issue that expressed confusion about the
"Compatibility" section of "ch-compression.rst".

The words in the man page is indeed confusing, and some points are no
longer correct either.

Considering how complex the direct IO and compression thing is, I didn't
come up with a correct answer until reading the code.

So update that section to provide a more straightforward result.

Issue: kdave#1015
Signed-off-by: Qu Wenruo <wqu@suse.com>
@adam900710
Copy link
Collaborator Author

Thanks a lot for noticing the typo. Now the branch updated.

@kdave kdave force-pushed the devel branch 2 times, most recently from 05e416f to 539b0cb Compare August 27, 2025 02:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants