Skip to content

Fix fields with \n being ignored when searching all fields w/o regex #3943

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 24, 2025

Conversation

iamllama
Copy link
Contributor

Reported in https://forums.ankiweb.net/t/potential-bug-newline-character-in-fields-html-disrupts-search-for-wildcard-fields/59617

*:*somestring* gets rewritten to (?i)^.*somestring.*$. But this doesn't match against a field with

<div>
  somestring
</div>

Perhaps surprisingly, the default behavior for . is not to match any character, but rather, to match any character except for the line terminator (which is \n by default). When this mode is enabled, the behavior changes such that . truly matches any character.
https://docs.rs/regex/latest/regex/struct.RegexBuilder.html#method.dot_matches_new_line

The fix proposed is to add the (?s) flag that enables the abovementioned mode

@iamllama iamllama changed the title Fix fields with \n potentially being ignored when searching all fields w/o regex Fix fields with \n being ignored when searching all fields w/o regex Apr 24, 2025
@dae
Copy link
Member

dae commented Apr 24, 2025

Should we perhaps be making the same change to write_regex and write_single_field_regexp as well?

@iamllama
Copy link
Contributor Author

iamllama commented Apr 24, 2025

This problem only happens because of the anchors ^ and $ that write_all_fields tacks on. Without them, newlines don't matter here as the match position isn't constrained anymore

write_regex and write_single_field_regexp do not add them automatically, so they aren't affected by this, unless a user explicitly uses anchors in a search. But adding them would perhaps make things more consistent too. Thoughts?

EDIT: it is already possible to specify additional regex flags when searching, i.e. "*:re:(?s)foo.*bar" will match foo\nbar

@dae
Copy link
Member

dae commented Apr 24, 2025

Good point about them not being anchored. Not so concerned about consistency as regex vs non-regex are sufficiently different. But a case could still be made for changing the other references, as forgetting-that-HTML-contains-newlines is a footgun we've tripped over multiple times before, and we could try save others from it. On the other hand, we've had zero complaints about it so far, and it could surprise someone who's used to the old behaviour. So maybe it's best if we just focus on the original fix for now. Thank you as always for these fixes :-)

@dae dae merged commit 1e6c8b2 into ankitects:main Apr 24, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants