Bug #18768
closedInconsistent behavior of IO, StringIO and String each_line methods when return paragraph and chomp: true passed
Description
In IO, StringIO and String methods that return lines (e.g. each
, each_line
, gets
, readline
, readlines
) behave in a different way when the following parameters are passed
- separator is
""
and -
chomp
istrue
.
They truncate the new line characters between paragraphs and the trailing one differently:
"a\n\nb\n\nc\n".each_line("", chomp: true).to_a
#=> ["a\n", "b\n", "c\n"]
StringIO.new("a\n\nb\n\nc\n").each_line("", chomp: true).to_a
#=> ["a\n", "b\n", "c"]
File.open('chomp.txt').each_line("", chomp: true).to_a
#=> ["a", "b", "c\n"]
The text file content is the same:
File.read('chomp.txt')
#=> "a\n\nb\n\nc\n"
Expected behavior - they return the same result.
Updated by andrykonchin (Andrew Konchin) over 2 years ago
- Description updated (diff)
Updated by jeremyevans0 (Jeremy Evans) over 2 years ago
- Related to Bug #18770: Inconsistent behavior of IO/StringIO's each methods when called with nil as a separator, limit and chomp: true added
Updated by jeremyevans0 (Jeremy Evans) over 2 years ago
I agree that the behavior for all of these methods should be the same. I think the IO behavior makes the most sense, and String and StringIO should be changed to match it.
I submitted pull requests to Ruby (https://github.com/ruby/ruby/pull/5960) and StringIO (https://github.com/ruby/stringio/pull/29) to fix this.
Updated by jeremyevans (Jeremy Evans) over 2 years ago
- Status changed from Open to Closed
Applied in changeset git|609d73e8925f807786686caf635178bb1de74256.
[ruby/stringio] Fix handling of chomp with paragraph separator
Try to mirror IO behavior, where chomp takes out the entire paragraph
separators between entries, but does not chomp a single line separator
at the end of the string.
Partially Fixes [Bug #18768]
Updated by jeremyevans0 (Jeremy Evans) over 2 years ago
- Status changed from Closed to Open
Updated by mame (Yusuke Endoh) over 2 years ago
I have no strong opinion about this issue and am never against Jeremy's change, but I'm just curious. I thought that each_line(..., chomp: true) {|s| ... }
was equal to each_line(...) {|s| s = s.chomp; ... }
. In the above examples, StringIO's result (["a\n", "b\n", "c"]
) was the simplest to me. @jeremyevans0 (Jeremy Evans) How did you think the IO behavior made the most sense?
Updated by jeremyevans0 (Jeremy Evans) over 2 years ago
mame (Yusuke Endoh) wrote in #note-6:
I have no strong opinion about this issue and am never against Jeremy's change, but I'm just curious. I thought that
each_line(..., chomp: true) {|s| ... }
was equal toeach_line(...) {|s| s = s.chomp; ... }
. In the above examples, StringIO's result (["a\n", "b\n", "c"]
) was the simplest to me. @jeremyevans0 (Jeremy Evans) How did you think the IO behavior made the most sense?
My thought here is that chomp: true
, does not mean chomp
with no arguments, it means chomp
with the separator used for the each_line
. So when you call each_line with ''
(paragraph separator), it only chomps the paragraph separator. That's what the IO behavior already was, and I thought it best to make String/StringIO consistent with that.
Updated by mame (Yusuke Endoh) over 2 years ago
@jeremyevans0 (Jeremy Evans) Thank you for your explanation, I understand. The keyword name "chomp" might be a bit confusing.
Updated by jeremyevans0 (Jeremy Evans) over 2 years ago
mame (Yusuke Endoh) wrote in #note-8:
@jeremyevans0 (Jeremy Evans) Thank you for your explanation, I understand. The keyword name "chomp" might be a bit confusing.
I agree, the behavior is not obvious from the name chomp
. However, it is obvious that chomp
is supposed to remove the separator and not the generic newline. You can see this when providing an explicit line ending character:
'abc'.each_line('b', chomp: true).to_a
# => ["a", "c"]
require 'stringio'
StringIO.new('abc').each_line('b', chomp: true).to_a
# => ["a", "c"]
r, w = IO.pipe
w.write('abc')
w.close
r.each_line('b', chomp: true).to_a
# => ["a", "c"]
Updated by mame (Yusuke Endoh) over 2 years ago
We discussed this at the dev meeting, and @matz (Yukihiro Matsumoto) agreed with @jeremyevans0 (Jeremy Evans): chomp: true
should remove the separator. Could you please merge the change?
Updated by jeremyevans (Jeremy Evans) over 2 years ago
- Status changed from Open to Closed
Applied in changeset git|423b41cba77719b4f62aa530593ad36a990f7c74.
Make String#each_line work correctly with paragraph separator and chomp
Previously, it was including one newline when chomp was used,
which is inconsistent with IO#each_line behavior. This makes
behavior consistent with IO#each_line, chomping all paragraph
separators (multiple consecutive newlines), but not single
newlines.
Partially Fixes [Bug #18768]