Bug #19728
Closed
Automate (checking of) Regexp character property documentation
Description
This came up in a discussion at https://github.com/ruby/ruby/pull/7923.
The documentation at doc/regexp.rdoc currently contains a list of character properties that can be used in regular expressions. But there is no guarantee that this list is updated when the Unicode version is updated.
One idea is to create a Ruby equivalent of https://github.com/k-takata/Onigmo/blob/master/tool/update-doc.py. Another idea is to just write a test that checks enc/unicode/$UNICODE_VERSION/name2ctype.h against the relevant part of the documentation file. A test would leave the documentation free to be rewritten by hand while still guaranteeing that no properties get forgotten.
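To make the second idea concrete, here is a minimal sketch of the comparison such a test could do. It assumes the property names have already been extracted from name2ctype.h into an array (that extraction step is not shown), and that the doc mentions each property as `\p{Name}`. The helper names `normalize_property` and `undocumented_properties` are hypothetical, and the loose matching (ignoring case, spaces, underscores and hyphens) is an assumption about how property names are looked up, not something taken from the actual test suite.

```ruby
# Hypothetical helper: normalize a property name for loose comparison.
# Assumption: case, whitespace, underscores and hyphens are not significant.
def normalize_property(name)
  name.downcase.gsub(/[\s_-]/, "")
end

# Hypothetical helper: return the property names that are never mentioned
# as \p{...} (or \p{^...}) in the given documentation text.
def undocumented_properties(property_names, doc_text)
  documented = doc_text.scan(/\\p\{\^?([^}]+)\}/).flatten
  documented = documented.map { |name| normalize_property(name) }.uniq
  property_names.reject { |name| documented.include?(normalize_property(name)) }
end

# Made-up doc excerpt, only for illustration.
doc = <<~'DOC'
  - <tt>/\p{Alnum}/</tt>: Alphabetic and numeric character.
  - <tt>/\p{Lowercase_Letter}/</tt>, <tt>/\p{Ll}/</tt>
DOC

p undocumented_properties(%w[alnum lowercaseletter ll emoji], doc)
# prints ["emoji"]
```

A real test would assert that the returned array is empty and print the missing names on failure.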
Updated by janosch-x (Janosch Müller) over 1 year ago
How about doing it in enc-unicode.rb?
On the one hand, this script is a bit convoluted as it is, and does not need another responsibility.
On the other hand, it already passes a (quote) "human-friendly name for the group" to its #make_const method for every property that it creates, and the sections of the document could be based on that. It also has the abbreviations (e.g. LL for lowercase letter) available in its aliases variable. Generating the doc here would ensure an exact match of docs and code, whereas a test would probably not ensure e.g. that properties are in the correct section of the doc.
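As a rough illustration of what generating the doc could look like, here is a sketch that renders an rdoc-style list from a mapping of section titles to property names plus a mapping of names to abbreviations. The method name `render_property_docs`, the shape of both hashes, and the output layout are all assumptions made for this example; they are not how enc-unicode.rb actually stores its data.

```ruby
# Hypothetical sketch: render one rdoc section per group of properties.
# sections: { "Section title" => ["Property_Name", ...] }   (assumed shape)
# aliases:  { "Property_Name" => ["Abbreviation", ...] }    (assumed shape)
def render_property_docs(sections, aliases)
  sections.map do |title, names|
    lines = names.sort.map do |name|
      # List the long name first, followed by any abbreviations.
      forms = [name, *aliases.fetch(name, [])]
      "- " + forms.map { |form| "<tt>\\p{#{form}}</tt>" }.join(", ")
    end
    (["=== #{title}", ""] + lines).join("\n")
  end.join("\n\n")
end

sections = { "Letter" => ["Uppercase_Letter", "Lowercase_Letter"] }
aliases  = { "Lowercase_Letter" => ["Ll"], "Uppercase_Letter" => ["Lu"] }

puts render_property_docs(sections, aliases)
```

Because each property is emitted under the section it was registered in, the section placement would follow from the data rather than being maintained by hand.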
Updated by janosch-x (Janosch Müller) over 1 year ago
I found that enc-unicode.rb deals with some inconsistent Unicode data (i.e. some data which uses short property names and some data which uses long names), so it doesn't provide much useful context. I've made a PR to create documentation from the result instead: https://github.com/ruby/ruby/pull/7944
Updated by janosch-x (Janosch Müller) over 1 year ago
- Status changed from Open to Closed
Applied in changeset git|08b3fb11524e6cde453476f24ac80fd60457dfef.
[Bug #19728] Auto-generate unicode property docs