Bug #19728: Automate (checking of) Regexp character property documentation - Ruby - Ruby Issue Tracking System

Actions

Copy link

Bug #19728

closed

Automate (checking of) Regexp character property documentation

Bug #19728: Automate (checking of) Regexp character property documentation

Added by duerst (Martin Dürst) about 3 years ago. Updated about 3 years ago.

Status:

Closed

Assignee:

duerst (Martin Dürst)

Target version:

ruby -v:

Backport:

3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN

[ruby-core:113895]

Description

This came up in a discussion at https://github.com/ruby/ruby/pull/7923.

The documentation at doc/regexp.rdoc currently contains a list of character properties that can be used in regular expressions. But there is no guarantee that this list is updated when the Unicode version is updated.

One idea is to create a ruby equivalent of https://github.com/k-takata/Onigmo/blob/master/tool/update-doc.py. Another idea is to just write a test that checks enc/unicode/$UNICODE_VERSION/name2ctype.h against the relevant part of the documentation file. This might make it easier for the documentation to be rewritten while guaranteeing that no properties get forgotten.

Updated by janosch-x (Janosch Müller) about 3 years ago Actions
Copy link
#1 [ruby-core:113901]

How about doing it in enc-unicode.rb?

On the one hand, this script is a bit convoluted as it is, and does not need another responsibility.

On the other hand, it already passes a (quote) "human-friendly name for the group" to its #make_const method for every property that it creates, and the sections of the document could be based on that. It also has the abbreviations (e.g. LL for lowercase letter) available in its aliases variable. Generating the doc here would ensure an exact match of docs and code, whereas a test would probably not ensure e.g. that properties are in the correct section of the doc.

Updated by janosch-x (Janosch Müller) about 3 years ago Actions
Copy link
#2 [ruby-core:113907]

I found that enc-unicode.rb deals with some inconsistent unicode data (i.e. some data which uses short property names and some data which uses long names), so it doesn't provide much useful context. I've made a PR to create documentation from the result instead: https://github.com/ruby/ruby/pull/7944

Updated by janosch-x (Janosch Müller) about 3 years ago Actions
Copy link
#3

Status changed from Open to Closed

Applied in changeset git|08b3fb11524e6cde453476f24ac80fd60457dfef.

[Bug #19728] Auto-generate unicode property docs

https://bugs.ruby-lang.org/issues/19728

Actions

Copy link

Also available in: PDF Atom

Project

General

Profile

Ruby

Custom queries

Bug #19728

Automate (checking of) Regexp character property documentation

Updated by janosch-x (Janosch Müller) about 3 years ago Actions
Copy link
#1 [ruby-core:113901]

Updated by janosch-x (Janosch Müller) about 3 years ago Actions
Copy link
#2 [ruby-core:113907]

Updated by janosch-x (Janosch Müller) about 3 years ago Actions
Copy link
#3

Project

General

Profile

Ruby

Custom queries

Bug #19728

Automate (checking of) Regexp character property documentation

Updated by janosch-x (Janosch Müller) about 3 years ago ActionsCopy link #1 [ruby-core:113901]

Updated by janosch-x (Janosch Müller) about 3 years ago ActionsCopy link #2 [ruby-core:113907]

Updated by janosch-x (Janosch Müller) about 3 years ago ActionsCopy link #3

Updated by janosch-x (Janosch Müller) about 3 years ago Actions
Copy link
#1 [ruby-core:113901]

Updated by janosch-x (Janosch Müller) about 3 years ago Actions
Copy link
#2 [ruby-core:113907]

Updated by janosch-x (Janosch Müller) about 3 years ago Actions
Copy link
#3