Bug #19848

Ripper BOM behavior

Added by kddnewton (Kevin Newton) 8 months ago. Updated 8 months ago.

Status:
Closed
Assignee:
-
Target version:
-
[ruby-core:114495]

Description

When a file starts with a UTF-8 byte-order mark (BOM), Ripper usually reports the first token in the file as beginning at column -3. For example:

require "ripper"
Ripper.lex("\xEF\xBB\xBF[]")
# => [[[1, -3], :on_lbracket, "[", BEG|LABEL], [[1, 1], :on_rbracket, "]", END]]

The rest of the tokens are positioned as if the byte-order mark never existed. This is consistent except when the file starts with a global variable, an instance variable, or a class variable. In those cases the first token begins at column 0. For example:

Ripper.lex("\xEF\xBB\xBF@foo")
# => [[[1, 0], :on_ivar, "@foo", END]]

Ripper.lex("\xEF\xBB\xBF@@foo")
# => [[[1, 0], :on_cvar, "@@foo", END]]

Ripper.lex("\xEF\xBB\xBF$foo")
# => [[[1, 0], :on_gvar, "$foo", END]]
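To compare these cases side by side on a given Ruby build, a small script along the following lines may help. It is only a sketch: `first_token_columns` and the `BOM` constant are hypothetical names introduced here, and no expected output is shown because the reported columns depend on the Ruby version in use.

```ruby
require "ripper"

# UTF-8 byte-order mark, prepended to every snippet below.
BOM = "\xEF\xBB\xBF"

# Hypothetical helper: lexes each snippet with a leading BOM and returns
# a hash mapping the snippet to the column Ripper reports for its first token.
def first_token_columns(snippets)
  snippets.to_h do |code|
    (_line, column), _event, _value, _state = Ripper.lex(BOM + code).first
    [code, column]
  end
end

first_token_columns(%w([] @foo @@foo $foo)).each do |code, column|
  puts "#{code.ljust(6)} first token at column #{column}"
end
```

If every line prints the same column, the build treats the four cases consistently; differing columns correspond to the inconsistency described above.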

Additionally, the byte-order mark usually does not appear as part of the first token's value, unless that token is a magic encoding comment. In that case the BOM is part of the value:

Ripper.lex("\xEF\xBB\xBF# encoding: us-ascii")
# => [[[1, -3], :on_comment, "\xEF\xBB\xBF# encoding: us-ascii", BEG]]

As for solutions: when there is a byte-order mark, I think the column information should either always start at 0 or always start at -3. For the encoding comment, the BOM should probably either not show up as part of the value, or show up for all comments.
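Until the column information is consistent, one possible workaround on the caller's side is to strip the BOM before lexing, so that the first token always starts at column 0. The sketch below assumes that dropping the BOM is acceptable for the tool consuming the tokens; `lex_without_bom` and `UTF8_BOM` are hypothetical names, not part of Ripper.

```ruby
require "ripper"

# Raw bytes of the UTF-8 byte-order mark.
UTF8_BOM = "\xEF\xBB\xBF".b

# Hypothetical helper: drops a leading UTF-8 BOM, if present, and then
# lexes the remaining source with Ripper.lex.
def lex_without_bom(source)
  bytes = source.b # binary copy, so the caller's string is left untouched
  bytes = bytes.byteslice(UTF8_BOM.bytesize..) if bytes.start_with?(UTF8_BOM)
  Ripper.lex(bytes.force_encoding(source.encoding))
end

tokens = lex_without_bom("\xEF\xBB\xBF@foo")
tokens.first[0] # => [1, 0]
```

Source without a BOM passes through unchanged, so the helper can be applied to every file.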

Updated by kddnewton (Kevin Newton) 8 months ago

Apologies, I think I was wrong about the last part: the BOM is part of the string, but it doesn't show up in the inspect output. So this is just about the column information.

Actions #3

Updated by nobu (Nobuyoshi Nakada) 8 months ago

  • Backport changed from 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN to 3.0: REQUIRED, 3.1: REQUIRED, 3.2: REQUIRED
Actions #4

Updated by nobu (Nobuyoshi Nakada) 8 months ago

  • Status changed from Open to Closed

Applied in changeset git|1f76e42b85be4031bdedcc3e457e8fa949195304.


[Bug #19848] Flush BOM

The token just after BOM needs to position at column 0, so that the
indentation matches closing line.
