Bug #11706

closed

Clean up files enc/unicode/name2ctype.{h.blt,kwd,src}

Added by duerst (Martin Dürst) over 8 years ago. Updated over 2 years ago.

Status: Closed
Target version: -
[ruby-core:71542]

Description

The files name2ctype.{h.blt,kwd,src} in enc/unicode are intermediate products that are not needed in the repository and haven't been committed consistently. I propose to remove them.

[I'm not sure this is a bug or a feature, but it doesn't provide any new functionality, so feature doesn't seem right.]

[I've assigned this to Nobu for feedback; I can execute it once we agree on a way forward.]

On 2015/11/17 15:39, Nobuyoshi Nakada wrote:

> Please update name2ctype.{h.blt,kwd,src} files too.

Thanks for the reminder. I had a look at these files. Maybe before further commits, we can try to simplify things a bit, and/or to ignore irrelevant stuff.

Sorry this message is long. Looking at the three files you mentioned, I noticed the following:

enc/unicode/name2ctype.kwd was also produced on the Onigmo side when I worked on the update (see also https://github.com/k-takata/Onigmo/pull/58). However, it is not part of the Onigmo distribution.
In Ruby, it was last committed by Yui Naruse at r36070, on 2012/06/14. This is well before the update to Unicode 7.0.0 with r46831.

On 2011/11/20, K. Takata introduced https://github.com/k-takata/Onigmo/blob/master/tool/convert-name2ctype.sh, which is used as:

    convert-name2ctype.sh name2ctype.kwd > name2ctype.h

to directly convert from name2ctype.kwd to name2ctype.h (although it produces a few numbered intermediary files which are removed in the last step).

enc/unicode/name2ctype.h.blt was last committed by yourself in r49292 on 2015/01/17. Your log message mentions r46831, but it is unclear why you updated .h.blt and not .kwd and .src. The last commit before this was r36070, same as for name2ctype.kwd.

enc/unicode/name2ctype.src also was last committed in r36070.

Makefile.in contains instructions to create enc/unicode/name2ctype.h from enc/unicode/name2ctype.kwd at http://svn.ruby-lang.org/cgi-bin/viewvc.cgi/trunk/Makefile.in?view=markup#l340. There, .h.blt and .src are mentioned, but my knowledge of shell syntax isn't good enough to understand what exactly is supposed to happen.

My conclusions so far would be:

  • name2ctype.{h.blt,kwd,src} are all intermediary files that are not
    actually used directly for building Ruby.
  • In the last few years, these three files have been committed only
    rarely and accidentally, not in any visible sync with actual bug fixes
    or feature additions.
  • Onigmo no longer uses name2ctype.h.blt and .src, and does not commit
    .kwd.
  • The build process on the Onigmo side, although I did it manually, was
    well documented and painless; on the Ruby side, it may be possible to
    build enc/unicode/name2ctype.h (the file that's finally used for
    compilation), but I haven't found how to do so.
  • For a process that needs to be done about once a year, this amount of
    manual work seems perfectly fine (at least for me, and I volunteer to
    do it again next year).
  • Therefore, I suggest that we stop worrying about committing
    name2ctype.{h.blt,kwd,src}. If you want me to commit
    enc/unicode/name2ctype.kwd, I can do it (because I have the new
    version). Indeed, it might be better to remove these three files;
    they only make checkouts heavier.
  • If we want to simplify the production process, my preference would be
    to update Makefile.in based on convert-name2ctype.sh, or to directly
    integrate convert-name2ctype.sh into tool/enc-unicode.rb
    (why would one want to use sed and friends if we already use Ruby?).

Related issues: 1 (0 open, 1 closed)

Related to Ruby master - Feature #11563: Update Onigmo regular expression engine to Unicode Version 8.0.0 (Closed, duerst (Martin Dürst))

Updated by duerst (Martin Dürst) over 8 years ago

  • Related to Feature #11563: Update Onigmo regular expression engine to Unicode Version 8.0.0 added

Updated by chrisseaton (Chris Seaton) about 8 years ago

I've been dealing with an issue related to this. When Ruby updated to MRI 7.0 the name2ctype.h was updated but not the name2ctype.src, so they're now inconsistent (look at CR_Blank for example).

I found this problem when I tried to update JCodings (part of JRuby) which generated its tables from these files. It uses the name2ctype.src, so got the wrong values.

I'll update JCodings to read from name2ctype.h instead.
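The kind of mismatch described above (one CR_* table differing between two files) could be checked with a small script. What follows is only a rough sketch: `cr_table` is a hypothetical helper, the table layout is assumed to be a C array of the form `CR_Name[] = { ... }`, and the two sample strings are made-up stand-ins, not the real contents of name2ctype.h or name2ctype.src.

```ruby
# Hypothetical helper: return the numbers listed in one CR_* table,
# assuming the table is written as a C array "NAME[] = { ... }".
def cr_table(source, name)
  # Drop C comments first so digits inside them are not picked up.
  stripped = source.gsub(%r{/\*.*?\*/}m, "")
  m = stripped.match(/\b#{Regexp.escape(name)}\[\]\s*=\s*\{(.*?)\}/m)
  return nil unless m
  # Accept hexadecimal (0x...) and plain decimal tokens.
  m[1].scan(/0x\h+|\d+/).map { |tok| tok.start_with?("0x") ? tok.hex : tok.to_i }
end

# Made-up sample data for illustration only (not the real file contents).
SAMPLE_H = <<~C
  static const OnigCodePoint CR_Blank[] = {
    2,
    0x0009, 0x0009,
    0x0020, 0x0020,
  }; /* CR_Blank */
C

SAMPLE_SRC = <<~C
  static const OnigCodePoint CR_Blank[] = {
    1,
    0x0009, 0x0009,
  }; /* CR_Blank */
C

if cr_table(SAMPLE_H, "CR_Blank") != cr_table(SAMPLE_SRC, "CR_Blank")
  puts "CR_Blank differs between the two inputs"
end
```

With the real files, the two strings would come from `File.read` on each path; whether the real layout matches the assumed one would need checking first.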

You've listed name2ctype.h as an intermediate that should be deleted. I'm not sure that's right - it's actually the original source now isn't it? It's the only file in https://github.com/k-takata/Onigmo/tree/master/enc/unicode. I don't think that one can be deleted.

https://github.com/jruby/jcodings/issues/13

Updated by duerst (Martin Dürst) about 8 years ago

Chris Seaton wrote:

> I've been dealing with an issue related to this. When Ruby updated to MRI 7.0

Do you mean Unicode 7.0?

> the name2ctype.h was updated but not the name2ctype.src, so they're now inconsistent (look at CR_Blank for example).

What do you mean by "now"? What's your current revision/Ruby version? As for inconsistencies, I indeed mentioned that.

> I found this problem when I tried to update JCodings (part of JRuby)

Can you tell me where in the JRuby source tree these files are?

> which generated its tables from these files. It uses the name2ctype.src, so got the wrong values.
>
> I'll update JCodings to read from name2ctype.h instead.
>
> You've listed name2ctype.h as an intermediate that should be deleted. I'm not sure that's right - it's actually the original source now isn't it?

But I haven't listed it as an intermediary; I only listed name2ctype.h.blt, which isn't the same file.

> It's the only file in https://github.com/k-takata/Onigmo/tree/master/enc/unicode. I don't think that one can be deleted.

I didn't propose to delete it, but it could be deleted because it's an intermediate file in the sense that the original source of the data is the Unicode database itself.

> https://github.com/jruby/jcodings/issues/13

I'll add a pointer from that issue to here.

Updated by chrisseaton (Chris Seaton) about 8 years ago

Yes, sorry, I meant Unicode 7.0.

The JRuby code is at https://github.com/jruby/jcodings/tree/master/scripts.

Ah sorry I misread name2ctype.{h.blt,kwd,src} as name2ctype.{h,blt,kwd,src}, so I see you aren't proposing removing the .h.
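The difference between the two brace patterns is easy to miss, so here is a small illustration. `expand_braces` is a hypothetical helper written for this note; it handles only a single, non-nested brace group and is not a full implementation of shell brace expansion.

```ruby
# Hypothetical helper: expand one shell-style brace group,
# e.g. "a.{x,y}" becomes ["a.x", "a.y"]. Nested groups are not handled.
def expand_braces(pattern)
  m = pattern.match(/\{([^{}]*)\}/)
  return [pattern] unless m
  # The -1 limit keeps empty alternatives such as the one in "{a,}".
  m[1].split(",", -1).map { |alt| m.pre_match + alt + m.post_match }
end

# The pattern used in this issue: three files, none of them name2ctype.h.
p expand_braces("name2ctype.{h.blt,kwd,src}")

# The misread pattern: four files, including name2ctype.h.
p expand_braces("name2ctype.{h,blt,kwd,src}")
```

Only the second pattern includes name2ctype.h, which is the file Onigmo still ships.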

Updated by jeremyevans0 (Jeremy Evans) over 2 years ago

  • Status changed from Open to Closed

The name2ctype.{h.blt,kwd,src} files were removed in 2f87f9e63b9f88b6fe1f26c315a64d41f8adc0a5.
