Bug #11706

Clean up files enc/unicode/name2ctype.{h.blt,kwd,src}

Added by duerst (Martin Dürst) over 5 years ago. Updated about 5 years ago.

The files name2ctype.{h.blt,kwd,src} in enc/unicode are intermediate products that are not needed in the repository, and they have not been committed consistently. I propose removing them.

[I'm not sure whether this is a bug or a feature, but since it doesn't add any new functionality, "feature" doesn't seem right.]

[I've assigned this to Nobu for feedback; I can execute it once we agree on a way forward.]

On 2015/11/17 15:39, Nobuyoshi Nakada wrote:

Please update name2ctype.{h.blt,kwd,src} files too.

Thanks for the reminder. I had a look at these files. Before any further commits, maybe we can try to simplify things a bit and/or ignore what is irrelevant.

Sorry for the long message. Looking at the three files you mentioned, I noticed the following:

enc/unicode/name2ctype.h.kwd was produced on the Onigmo side when I worked on the update (see also …). However, it is not part of the Onigmo distribution.
It was last committed by Yui Naruse in r36070, on 2012/06/14, long before the update to Unicode 7.0.0 in r46831.

On 2011/11/20, K. Takata introduced …, which is used as "… name2ctype.kwd > name2ctype.h"
to convert directly from name2ctype.kwd to name2ctype.h (although it produces a few numbered intermediate files, which are removed in the last step).

enc/unicode/name2ctype.h.blt was last committed by you in r49292, on 2015/01/17. Your log message mentions r46831, but it is unclear why you updated .h.blt and not .kwd or .src. Before that, it was last committed in r36070, the same revision as name2ctype.h.kwd.

enc/unicode/name2ctype.src was also last committed in r36070.

Looking at …, it contains instructions to create enc/unicode/name2ctype.h from enc/unicode/name2ctype.kwd at …. There, .h.blt and .src are mentioned, but my knowledge of shell syntax isn't good enough to understand exactly what is supposed to happen.

My conclusions so far would be:

  • name2ctype.{h.blt,kwd,src} are all intermediate files that are not used directly for building Ruby.
  • In the last few years, these three files have been committed only rarely and seemingly by accident, with no visible connection to actual bug fixes or feature additions.
  • Onigmo no longer uses name2ctype.h.blt and .src, and does not commit .kwd.
  • The build process on the Onigmo side, although I did it manually, was well documented and painless. On the Ruby side, it may be possible to build enc/unicode/name2ctype.h (the file that is finally used for compilation), but I haven't found out how.
  • For a process that needs to be done about once a year, this amount of manual work seems perfectly fine (at least for me, and I volunteer to do it again next year).
  • Therefore, I suggest that we stop worrying about committing name2ctype.{h.blt,kwd,src}. If you want me to commit enc/unicode/name2ctype.h.kwd, I can do so, because I have the new version. In fact, it might be better to remove these three files altogether; they only make checkouts heavier.
  • If we want to simplify the production process, my preference would be to update it based on …, or to integrate the conversion directly into tool/enc-unicode.rb (why use sed and friends when we already use Ruby?).

Related issues

Related to Ruby master - Feature #11563: Update Onigmo regular expression engine to Unicode Version 8.0.0 (Closed, duerst (Martin Dürst))
