- Status changed from Feedback to Open
thyresias (Thierry Lambert) wrote in #note-2:
Does this behavior cause any problems in your application?
Yes:
search_text = "foo"
s_search = Regexp.escape(search_text)
re_prefix = /\p{In_Arabic}.+ /
s_search.prepend re_prefix.source
_re = /^#{s_search}|(?<=– |: )#{s_search}/ #=> encoding mismatch in dynamic regexp : US-ASCII and UTF-8 (RegexpError)
Thank you for providing an example. This seems more like an issue with the literal Regexp support in general than with Regexp.escape. You can trigger the issue without Regexp.escape:
re = /#{"\\p{In_Arabic}".encode("US-ASCII")}\u1234/
# encoding mismatch in dynamic regexp : US-ASCII and UTF-8
Triggering it seems to require a Unicode property (such as \p{In_Arabic}) inside an interpolated string that isn't encoded in UTF-8.
You get a different error without that Unicode character at the end:
re = /#{"\\p{In_Arabic}".encode("US-ASCII")}/
# invalid character property name {In_Arabic}: /\p{In_Arabic}/
Using Regexp.new instead of a literal Regexp may work around the issue:
search_text = "foo"
s_search = Regexp.escape(search_text)
re_prefix = /\p{In_Arabic}.+ /
s_search.prepend re_prefix.source
_re = Regexp.new("^#{s_search}|(?<=– |: )#{s_search}")
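If you would rather keep the literal Regexp, another possible workaround is to transcode the interpolated fragment to UTF-8 before interpolating it, so that every piece of the dynamic regexp agrees on the encoding. The following is only a minimal sketch under that assumption: `build_search_regexp` is a hypothetical helper name (not from the report), and the en dash is written as `\u2013` so the snippet does not depend on the source file's encoding.

```ruby
# Hypothetical helper (an assumption for illustration, not part of the report):
# builds the same pattern with a literal Regexp, after transcoding the
# interpolated fragment to UTF-8.
def build_search_regexp(search_text)
  re_prefix = /\p{In_Arabic}.+ /
  # The strings returned by Regexp#source and Regexp.escape may be US-ASCII
  # for ASCII-only input, so transcode the combined fragment explicitly.
  fragment = (re_prefix.source + Regexp.escape(search_text)).encode("UTF-8")
  # \u2013 is the en dash used in the original pattern.
  /^#{fragment}|(?<=\u2013 |: )#{fragment}/
end

re = build_search_regexp("foo")
puts re.encoding
# An Arabic letter, at least one more character, a space, then the search text.
puts re.match?("\u0627\u0628 foo")
```

This keeps the pattern readable as a literal, at the cost of an explicit `encode` call on anything interpolated into it; `Regexp.new` as shown above avoids the issue without that extra step.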