Bug #19485
Closed
Unexpected behavior in squiggly heredocs
Description
Based on the squiggly heredoc documentation, I found the following to be unexpected behavior. Specifically, the documentation states, "The indentation of the least-indented line will be removed from each line of the content."
After running:
File.write("test.rb", "p <<~EOF\n\ta\n b\nEOF\n")
and then ruby test.rb, I get the following output:
"\ta\nb\n"
The least-indented line above is the one containing b (a single leading space); however, no leading whitespace is removed from the line containing \ta.
For another example:
File.write("test.rb", "p <<~EOF\n\tA\n \tB\nEOF\n")
Running ruby test.rb then gives:
"A\nB\n"
In this case, the \t was removed from the line containing A, but more whitespace than that (" \t", a space followed by a tab) was removed from the line containing B.
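For anyone reproducing this, both examples can also be checked in one script by evaluating the same heredoc sources with Kernel#eval instead of writing test.rb. This is only a convenience sketch; the expected values in the comments are the outputs reported above.

```ruby
# Evaluate the same heredoc sources used in the description, without test.rb.
first  = eval("<<~EOF\n\ta\n b\nEOF\n")
second = eval("<<~EOF\n\tA\n \tB\nEOF\n")

p first   # => "\ta\nb\n" (as reported in the description)
p second  # => "A\nB\n"   (as reported in the description)
```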
After seeing the first example, I assumed that the documentation was out of date and that I should update it to say that \t is never converted into space characters when removing leading whitespace. But the second example suggests that this is a bug in how leading whitespace is removed.
Can someone please explain what the rules for squiggly heredocs should be? I can implement a fix to adhere to the rules or update the documentation; I am just unsure what the rules should be, because the two examples above show unexpected behavior in two distinct ways.
Updated by Dan0042 (Daniel DeLorme) over 1 year ago
I think what's happening here is that tabs are not converted directly to 8 spaces, but instead "move ahead to the next multiple of 8 chars". So in that sense "\t" and " \t" are equivalent. It's the same behavior as 10.times{ |i| print " "*i,"\t",i,"\n" }
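To make the "next multiple of 8" idea concrete, here is a minimal Ruby sketch of how the indentation width of a line could be measured, assuming tab stops every 8 columns. The names indent_width and TAB_WIDTH are hypothetical and not part of Ruby's API.

```ruby
TAB_WIDTH = 8 # assumed tab stop interval

# Width in columns of the leading spaces/tabs of +line+. A space advances one
# column; a tab advances to the next multiple of TAB_WIDTH.
def indent_width(line)
  col = 0
  line.each_char do |ch|
    case ch
    when " "  then col += 1
    when "\t" then col = (col / TAB_WIDTH + 1) * TAB_WIDTH
    else break # first non-indentation character
    end
  end
  col
end

p indent_width("\ta")  # => 8
p indent_width(" \tB") # => 8 (the space is absorbed by the tab)
p indent_width(" b")   # => 1
```

Under this model, "\ta" and " \tB" both sit at column 8, which is why the second example strips both lines completely.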
Updated by nobu (Nobuyoshi Nakada) over 1 year ago
- Status changed from Open to Assigned
- Assignee set to core
- Backport changed from 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN to 2.7: UNKNOWN, 3.0: REQUIRED, 3.1: REQUIRED, 3.2: REQUIRED
My draft is:
Note that the "indentation" is counted like as each horizontal tabs are
expanded to spaces up to the next tab stop column (per 8 columns), and each
indentation to be removed is the longest tabs and spaces sequence where the
next column does not exceed the least-indentation.
Does this make sense?
Updated by sawa (Tsuyoshi Sawada) over 1 year ago
nobu (Nobuyoshi Nakada) wrote in #note-2:
My draft is:
Note that the "indentation" is counted like as each horizontal tabs are
expanded to spaces up to the next tab stop column (per 8 columns), and each
indentation to be removed is the longest tabs and spaces sequence where the
next column does not exceed the least-indentation.
I find the sentence too long and a little too difficult to parse/understand. What about something like this:
For the purpose of measuring an indentation, a horizontal tab is regarded as a sequence of one to eight spaces such that the column position corresponding to its end is a multiple of eight. The amount to be removed is counted in terms of the number of spaces. If the boundary appears in the middle of a tab, that tab is not removed.
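Read as an algorithm, the rule above can be sketched in Ruby roughly as follows. This is a hypothetical helper, not Ruby's parser code: squiggly_dedent, indent_width and remove_indent are illustrative names. It assumes 8-column tab stops and that whitespace-only lines are ignored when finding the least indentation, as the heredoc documentation describes.

```ruby
TAB_WIDTH = 8 # assumed tab stop interval

# Column reached after the leading spaces/tabs of +line+.
def indent_width(line)
  col = 0
  line.each_char do |ch|
    case ch
    when " "  then col += 1
    when "\t" then col = (col / TAB_WIDTH + 1) * TAB_WIDTH
    else break
    end
  end
  col
end

# Drops leading spaces/tabs from +line+ until +width+ columns are consumed.
# A tab whose end would pass +width+ is kept, as is everything after it.
def remove_indent(line, width)
  col = 0
  i = 0
  line.each_char do |ch|
    if ch == " "
      next_col = col + 1
    elsif ch == "\t"
      next_col = (col / TAB_WIDTH + 1) * TAB_WIDTH
    else
      break # first non-indentation character
    end
    break if next_col > width # boundary falls inside this tab: keep it
    col = next_col
    i += 1
  end
  line[i..]
end

# Find the least indentation among non-blank lines, then remove that many
# columns from every line.
def squiggly_dedent(body)
  lines = body.lines
  widths = lines.reject { |l| l.strip.empty? }.map { |l| indent_width(l) }
  least = widths.min || 0
  lines.map { |l| remove_indent(l, least) }.join
end

p squiggly_dedent("\ta\n b\n")   # => "\ta\nb\n"
p squiggly_dedent("\tA\n \tB\n") # => "A\nB\n"
```

With the two bodies from the description, this sketch produces the same results that were reported, which supports the reading that the behavior follows the tab-stop rule rather than being a bug.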
Updated by ioquatix (Samuel Williams) over 1 year ago
I don't think it's a good idea to assume a tab is 8 spaces.
Regarding indentation, it might be a nice simplification to only consider the first line in the squiggly heredoc. That's what I've done in the past - it's predictable and easy to explain.
i.e.
x = <<~FOO
1
2
3
FOO
At most 4 spaces are removed from each line; the first line determines this. Anyway, maybe it's irrelevant to this discussion, but that's how I've implemented it in my own language/interpreter in the past.
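For comparison, the first-line rule could look roughly like the sketch below. It is hypothetical and is not how Ruby behaves. It assumes only spaces count as indentation. Because the indentation of the original example was lost, the sample input here is made up.

```ruby
# Hypothetical first-line rule (not Ruby's behaviour): the number of leading
# spaces on the first line is the most that is removed from any line.
def first_line_dedent(body)
  lines = body.lines
  limit = lines.first.to_s[/\A */].size
  lines.map do |line|
    n = [line[/\A */].size, limit].min # never remove more than the line has
    line[n..]
  end.join
end

p first_line_dedent("    1\n  2\n      3\n") # => "1\n2\n  3\n"
```

The appeal is predictability: the result depends only on the first line. The trade-off is that a less-indented later line is not treated as the reference.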
Python also has the idea of consistent indentation.
That means mixed spaces and tabs are not considered the same. If someone indents two lines with "SSSSTT" and "TT", the indentation is considered invalid and/or not removed, since you can't determine the equivalence of "S" (space) and "T" (tab) characters. Assuming there is a mapping from tabs to spaces is incorrect IMHO.
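The consistent-indentation idea amounts to removing only the leading whitespace that every line shares character for character. This is similar in spirit to Python's textwrap.dedent, which treats tabs and spaces as unequal. Below is a hypothetical Ruby sketch; the method names are illustrative.

```ruby
# Longest common prefix of two strings.
def common_prefix(a, b)
  i = 0
  i += 1 while i < a.size && i < b.size && a[i] == b[i]
  a[0, i]
end

# Hypothetical "consistent indentation" rule: remove only the leading
# whitespace shared character for character by all non-blank lines, so a
# space is never treated as equivalent to a tab.
def common_prefix_dedent(body)
  lines = body.lines
  indents = lines.reject { |l| l.strip.empty? }.map { |l| l[/\A[ \t]*/] }
  shared = indents.reduce { |a, b| common_prefix(a, b) } || ""
  lines.map { |l| l.delete_prefix(shared) }.join
end

p common_prefix_dedent("    \t\tx\n\t\ty\n") # => "    \t\tx\n\t\ty\n" (nothing shared)
p common_prefix_dedent("\tA\n \tB\n")        # => "\tA\n \tB\n"
p common_prefix_dedent("  a\n  b\n")         # => "a\nb\n"
```

Under this rule, the "SSSSTT"/"TT" case and the second example from the description are left untouched rather than being partially dedented.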
Updated by Eregon (Benoit Daloze) over 1 year ago
Another condition could be to only accept tabs in a squiggly heredoc if they prefix all lines of the heredoc (otherwise a SyntaxError, including for the 2 cases in the description)?
(I wish tabs would just not be accepted as indentation for Ruby, but well that's probably a pointless discussion, even though it seems 99% of the community agrees there)
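The following is a hypothetical check for one possible reading of this proposal; Ruby does not implement it. If any non-blank line's indentation contains a tab, every non-blank line must start with a tab. Otherwise the body is rejected with a SyntaxError. Under this reading, both bodies from the description are rejected. The method name check_tab_indentation! is illustrative.

```ruby
# Hypothetical check for one reading of the proposal (not Ruby's behaviour):
# if any non-blank line is indented with a tab, every non-blank line must
# begin with a tab; otherwise the heredoc body is rejected.
def check_tab_indentation!(body)
  lines = body.lines.reject { |l| l.strip.empty? }
  uses_tabs = lines.any? { |l| l[/\A[ \t]*/].include?("\t") }
  if uses_tabs && !lines.all? { |l| l.start_with?("\t") }
    raise SyntaxError, "tab indentation must prefix every line of a squiggly heredoc"
  end
  body
end

["\ta\n b\n", "\tA\n \tB\n"].each do |body|
  begin
    check_tab_indentation!(body)
  rescue SyntaxError => e
    puts "rejected: #{e.message}"
  end
end
```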
Updated by jemmai (Jemma Issroff) over 1 year ago
sawa (Tsuyoshi Sawada) wrote in #note-3:
For the purpose of measuring an indentation, a horizontal tab is regarded as a sequence of one to eight spaces such that the column position corresponding to its end is a multiple of eight. The amount to be removed is counted in terms of the number of spaces. If the boundary appears in the middle of a tab, that tab is not removed.
This documentation is very clear to me, and explains both cases I've mentioned in a way that is easy to understand.
Updated by nobu (Nobuyoshi Nakada) over 1 year ago
- Status changed from Assigned to Closed
Applied in changeset git|e7342e76dfd26237c604e42f9a59a1eaa578c94e.
[Bug #19485] [DOC] Mention tabs in indentation of heredoc identifier
Co-Authored-By: sawa (Tsuyoshi Sawada) sawadatsuyoshi@gmail.com
Updated by naruse (Yui NARUSE) over 1 year ago
- Backport changed from 2.7: UNKNOWN, 3.0: REQUIRED, 3.1: REQUIRED, 3.2: REQUIRED to 2.7: UNKNOWN, 3.0: REQUIRED, 3.1: REQUIRED, 3.2: DONE
ruby_3_2 b93e2223300bc54dfa387ffb9fa3d48ecbe670f0 merged revision(s) e7342e76dfd26237c604e42f9a59a1eaa578c94e.
Updated by nagachika (Tomoyuki Chikanaga) over 1 year ago
- Backport changed from 2.7: UNKNOWN, 3.0: REQUIRED, 3.1: REQUIRED, 3.2: DONE to 2.7: UNKNOWN, 3.0: REQUIRED, 3.1: DONE, 3.2: DONE
ruby_3_1 19af12ff195aba64bdca7a83f564f2c0e46061c0 merged revision(s) e7342e76dfd26237c604e42f9a59a1eaa578c94e.