Feature #15588
String#each_chunk and #chunks
Description
String#each_chunk iterates over chunks of a specified size in a String.
String#chunks is shorthand for str.each_chunk(n).to_a.
present:
str = <<EOS
20190101 20190102
20190103 20190104
EOS
str.scan(/.{1,9}/m) do |chunk|
  p chunk #=> "20190101 "
end
str.scan(/.{1,9}/m) do |chunk|
  chunk.strip!
  p chunk #=> "20190101"
end
str.scan(/.{1,9}/m) #=> ["20190101 ", "20190102\n", "20190103 ", "20190104\n"]
str.scan(/.{1,9}/m).map(&:strip) #=> ["20190101", "20190102", "20190103", "20190104"]
proposal:
str = <<EOS
20190101 20190102
20190103 20190104
EOS
str.each_chunk(9) do |chunk|
  p chunk #=> "20190101 "
end
str.each_chunk(9, strip: true) do |chunk|
  p chunk #=> "20190101"
end
str.chunks(9) #=> ["20190101 ", "20190102\n", "20190103 ", "20190104\n"]
str.chunks(9, strip: true) #=> ["20190101", "20190102", "20190103", "20190104"]
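The proposed behaviour can be approximated today with a small helper. The following is only a sketch: the module name ChunkSketch and its method signatures are hypothetical and not part of Ruby core, and it assumes the size is counted in characters (the proposal does not say).

```ruby
# Hypothetical helpers, not part of Ruby core: a sketch of the proposed behaviour.
module ChunkSketch
  # Returns an array of substrings of at most `size` characters.
  # Assumption: `size` counts characters, not bytes.
  def self.chunks(str, size, strip: false)
    raise ArgumentError, "size must be positive" unless size.positive?

    pieces = str.each_char.each_slice(size).map(&:join)
    strip ? pieces.map(&:strip) : pieces
  end

  # Yields each chunk to the block; returns an Enumerator without a block.
  # This sketch builds the whole array first, so it is not lazy.
  def self.each_chunk(str, size, strip: false, &block)
    chunks(str, size, strip: strip).each(&block)
  end
end

str = "20190101 20190102\n20190103 20190104\n"
p ChunkSketch.chunks(str, 9)
#=> ["20190101 ", "20190102\n", "20190103 ", "20190104\n"]
p ChunkSketch.chunks(str, 9, strip: true)
#=> ["20190101", "20190102", "20190103", "20190104"]
```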
Updated by shyouhei (Shyouhei Urabe) over 5 years ago
Why is the String#scan example you showed not suitable for you? Tell us what would make you happy with the proposal.
Updated by mame (Yusuke Endoh) over 5 years ago
I like the proposal itself. I don't think that chunks is a good name, though.
To take every n characters, I often write str.scan(/.{1,#{ n }}/m), but it looks a bit cryptic. In this case str.chunks(n) is simpler.
I dislike strip: true. It is too ad-hoc. Does it also support lstrip: true, rstrip: true, chop: true, chomp: true, etc.? In principle, one method should do one thing, IMO.
Updated by sawa (Tsuyoshi Sawada) over 5 years ago
I am also not so sure whether this feature is needed. But if I wanted such a feature, I would ask to let String#scan take arguments similar to those of String#[]. That is, let the first argument point to the starting position, and let an optional second argument be the length. Since we want to capture multiple matches, unlike with [], passing a single index as the first argument does not make much sense, but now we have Enumerator::ArithmeticSequence. So we should be able to do:
str.scan((0..).step(9)) #=> ["20190101 ", "20190102\n", "20190103 ", "20190104\n"]
str.scan((0..).step(9), 8) #=> ["20190101", "20190102", "20190103", "20190104"]
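String#scan does not accept these arguments today, so the idea can only be emulated. A minimal sketch follows, using a hypothetical helper named scan_at; it assumes the start positions are non-negative and ascending, and that a missing length defaults to the step of the sequence.

```ruby
# Hypothetical helper emulating the suggested scan(sequence, length) form.
# Assumes `starts` yields non-negative, ascending character offsets.
def scan_at(str, starts, length = starts.step)
  result = []
  starts.each do |i|
    break if i >= str.length # stop once the (possibly endless) sequence runs past the string

    result << str[i, length]
  end
  result
end

str = "20190101 20190102\n20190103 20190104\n"
p scan_at(str, (0..).step(9))
#=> ["20190101 ", "20190102\n", "20190103 ", "20190104\n"]
p scan_at(str, (0..).step(9), 8)
#=> ["20190101", "20190102", "20190103", "20190104"]
```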
Updated by naruse (Yui NARUSE) over 5 years ago
This requires a more concrete real-world example.
Updated by ioquatix (Samuel Williams) over 5 years ago
Here is a use case.
Because I didn't know /....../ should be /....../m, I wasted at least 2 hours of debugging.
I wish for both each_chunk (or each_slice) and/or each_unpack.
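The pitfall with the missing /m flag can be reproduced on a small string. The data below is illustrative and not taken from the thread. Without /m, `.` does not match a newline, so scan silently skips it and the later chunks shift.

```ruby
str = "ab\ncdefg"

# Without /m, "." stops at the newline, which is then skipped entirely.
without_m = str.scan(/.{1,4}/)
p without_m #=> ["ab", "cdef", "g"]

# With /m, "." also matches "\n", so every chunk is exactly 4 characters.
with_m = str.scan(/.{1,4}/m)
p with_m #=> ["ab\nc", "defg"]

# Only the /m version preserves the whole input.
p without_m.join == str #=> false
p with_m.join == str    #=> true
```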
Updated by ioquatix (Samuel Williams) over 5 years ago
I wonder if we should have consistency with slice and each_slice from Array. But honestly, I don't care, as long as it's available.
Updated by ioquatix (Samuel Williams) over 5 years ago
Is size in characters or bytes?
Updated by Glass_saga (Masaki Matsushita) over 5 years ago
> I wonder if we should have consistency with slice and each_slice from Array. But honestly, I don't care, just if it's available.
I like String#each_slice and #slices.
> Is size in characters or bytes?
Considering consistency with #slice, it is better to have size as characters.
Updated by Eregon (Benoit Daloze) over 5 years ago
I think String#each_slice(n_chars) would make sense, since it's like str.chars.each_slice(9) { |a| a.join }
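The difference between counting characters and counting bytes shows up with multibyte text. The sketch below uses an illustrative UTF-8 string and existing core methods only. Note that it uses map(&:join) to collect the joined strings, since each_slice with a block discards the block's return values.

```ruby
str = "日本語テキスト" # 7 characters, 21 bytes in UTF-8

# Character-based slicing: every chunk is a valid string.
by_chars = str.each_char.each_slice(3).map(&:join)
p by_chars #=> ["日本語", "テキス", "ト"]

# Byte-based slicing: a chunk boundary can fall inside a character.
by_bytes = str.bytes.each_slice(4).map do |bytes|
  bytes.pack("C*").force_encoding(Encoding::UTF_8)
end
p by_bytes.first.valid_encoding? #=> false
```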
Updated by shevegen (Robert A. Heiler) over 5 years ago
#each_slice and #slices seem fine to me as well; I think it is also a better name than chunks.
Updated by osyo (manga osyo) over 5 years ago
I also wanted something like #each_slice.
For example, it is useful when you want to fix the width of the output.
puts "abcdefghijklmnopqrstuvwxyz".each_slice(5).map { |s| "#{s}<br>" }
# output:
# abcde<br>
# fghij<br>
# klmno<br>
# pqrst<br>
# uvwxy<br>
# z<br>
> Is size in characters or bytes?
> Considering consistency with #slice, it is better to have size as characters.
I think that there may be multiple String#each_slice_xxx methods, like String#each_xxx (e.g. String#each_slice_byte, String#each_slice_char, and more).
Also, I think that String#each_slice may be equivalent to String#each_slice_char.
Updated by matz (Yukihiro Matsumoto) about 5 years ago
As @shyouhei (Shyouhei Urabe) mentioned, we'd like to hear the real-world use-case. Extracting fixed-width records may be the purpose. I'm curious about the OP's opinion.
Matz.
Updated by usa (Usaku NAKAMURA) about 5 years ago
Just an idea: this method may be useful for handling data in a fixed-length record format if it accepts multiple column lengths, such as:
records = []
fixed_length_records_data.each_slice(7, 10, 20) do |zip, tel, name|
  records.push({zip: zip, tel: tel, name: name})
end
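Until something like this exists, the multi-column idea can be emulated with a helper. This is a sketch only: the name fixed_width_records, the sample data, and the choice to count widths in characters are all assumptions made for illustration.

```ruby
# Hypothetical helper: splits `data` into records of widths.sum characters,
# then splits each record into fields of the given widths.
def fixed_width_records(data, *widths)
  data.each_char.each_slice(widths.sum).map do |chars|
    pos = 0
    widths.map do |width|
      field = chars[pos, width].to_a.join # to_a guards against a short final record
      pos += width
      field
    end
  end
end

# Illustrative data: zip (7), tel (10), name (20, space-padded).
data = "1000001" + "0312345678" + "Taro Yamada".ljust(20) +
       "5300001" + "0612345678" + "Hanako Suzuki".ljust(20)

records = fixed_width_records(data, 7, 10, 20).map do |zip, tel, name|
  { zip: zip, tel: tel, name: name.rstrip }
end
p records.first #=> {:zip=>"1000001", :tel=>"0312345678", :name=>"Taro Yamada"} (Ruby 3.4+ prints {zip: "1000001", ...})
```

For byte-oriented data, String#unpack with a format such as "A7A10A20" handles a single record in a similar way.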