Skip to content

Fix set semantics for Common and Inherited #21

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

dcsommer
Copy link
Contributor

Description: On reviewing the source code, it appears that the iterator implementation for ScriptExtension only ever yields a single element. I did not see this documented, and assuming this is a bug, I have written a test case to clarify the expected subset, union, and iterator behavior. If maintainers believe this behavior as shown in the proposed test is correct, I can go ahead and implement the fixes in the rest of the library.

@Manishearth
Copy link
Member

Yeah, I think that that behavior is a bug.

@dcsommer
Copy link
Contributor Author

Thanks. If you think the test looks right, I'll go ahead and make it pass and update the PR.

Copy link
Member

@Manishearth Manishearth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test looks good

…ited/Common

After finding the iterator after a union with Common only yielded a single
element, I overhauled the representation and semantics of
`ScriptExtension`. This is a breaking change for most APIs.

Summary of improvements to `ScriptExtension`:

* Improved  representation to be able to track multiple scripts as well as Inherited/Common
* "Inherited" and "Common" no longer intersect with everything and have no subset/superset relationship between them.
* `for_str` is a union, not intersection, of all chars
* Added `is_subset_or_equal()` for easier comparison of unions and intersections
* Changed `Debug` impl to a vanilla derive to allow comparing hex bits
* Fixed `Display` impl to properly show each script, separated by pluses
* New test for iterator
@dcsommer dcsommer changed the title Implement test to clarify and fix iterator and subset relationships Fix set semantics for Common and Inherited Aug 13, 2025
@dcsommer
Copy link
Contributor Author

@Manishearth it ended up being a breaking change for several APIs, but I think the behavior is much easier to reason about now and more useful overall. LMK what you think when you have a moment.

Copy link
Member

@Manishearth Manishearth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is incorrect: Common and Inherited are not allowed to coexist with other scx values

That behavior is deliberate

https://www.unicode.org/reports/tr24/#Scx_Bad_Example_Table

I misunderstood the point of the test, sorry.

@dcsommer
Copy link
Contributor Author

Got it, will close this PR then.

@dcsommer dcsommer closed this Aug 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants