Feature #15588

String#each_chunk and #chunks

Added by Glass_saga (Masaki Matsushita) 10 months ago. Updated 4 months ago.

Status: Open
Priority: Normal
Assignee: -
Target version:

[ruby-core:91414]

Description

String#each_chunk iterates over chunks of a specified size in a String.
String#chunks is shorthand for str.each_chunk(n).to_a.

present:

str = <<EOS
20190101 20190102
20190103 20190104
EOS

str.scan(/.{1,9}/m) do |chunk|
  p chunk #=> "20190101 "
end

str.scan(/.{1,9}/m) do |chunk|
  chunk.strip!
  p chunk #=> "20190101"
end

str.scan(/.{1,9}/m) #=> ["20190101 ", "20190102\n", "20190103 ", "20190104\n"]
str.scan(/.{1,9}/m).map(&:strip) #=> ["20190101", "20190102", "20190103", "20190104"]

proposal:

str = <<EOS
20190101 20190102
20190103 20190104
EOS

str.each_chunk(9) do |chunk|
  p chunk #=> "20190101 "
end

str.each_chunk(9, strip: true) do |chunk|
  p chunk #=> "20190101"
end

str.chunks(9) #=> ["20190101 ", "20190102\n", "20190103 ", "20190104\n"]
str.chunks(9, strip: true) #=> ["20190101", "20190102", "20190103", "20190104"]
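
For readers who want to try the proposed behavior without the patch, here is a minimal pure-Ruby sketch. It is not the implementation from patch.diff: the module name ChunkSketch and its method signatures are hypothetical, and it assumes sizes are counted in characters.

```ruby
# Hypothetical helper module; NOT the implementation from patch.diff.
# Assumption: sizes are counted in characters, not bytes.
module ChunkSketch
  module_function

  # Returns an array of substrings of at most `size` characters.
  def chunks(str, size, strip: false)
    raise ArgumentError, "size must be positive" unless size.positive?

    # Range#step returns an Enumerator::ArithmeticSequence on Ruby 2.6+.
    result = (0...str.length).step(size).map { |offset| str[offset, size] }
    strip ? result.map(&:strip) : result
  end

  # Yields each chunk in turn. This sketch builds the whole array first,
  # which is simple but not lazy.
  def each_chunk(str, size, strip: false, &block)
    chunks(str, size, strip: strip).each(&block)
  end
end

str = "20190101 20190102\n20190103 20190104\n"
p ChunkSketch.chunks(str, 9)
#=> ["20190101 ", "20190102\n", "20190103 ", "20190104\n"]
p ChunkSketch.chunks(str, 9, strip: true)
#=> ["20190101", "20190102", "20190103", "20190104"]
```

Keeping the helpers in a separate module avoids monkey-patching String while the method names are still under discussion.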

Files

patch.diff (6.56 KB), Glass_saga (Masaki Matsushita), 02/06/2019 01:35 AM

History

Updated by shyouhei (Shyouhei Urabe) 10 months ago

Why is the String#scan example you showed not suitable for you? Tell us what makes you happy with the proposal.

Updated by mame (Yusuke Endoh) 10 months ago

I like the proposal itself. I don't think that chunks is a good name, though.

To take every n characters, I often write str.scan(/.{1,#{ n }}/m), but it looks a bit cryptic. In this case str.chunks(n) is simpler.

I dislike strip: true. It is too ad-hoc. Does it also support lstrip: true, rstrip: true, chop: true, chomp: true, etc? In principle, one method should do one thing, IMO.

Updated by sawa (Tsuyoshi Sawada) 10 months ago

I am also not so sure if this feature is needed. But if I wanted such a feature, I would ask to let String#scan take arguments similar to those of String#[]. That is, let the first argument point to the starting position, and an optional second argument be the length. Since we want to capture multiple matches, unlike with [], passing a single index for the first argument does not make much sense, but now we have Enumerator::ArithmeticSequence. So we should be able to do:

str.scan((0..).step(9)) #=> ["20190101 ", "20190102\n", "20190103 ", "20190104\n"]
str.scan((0..).step(9), 8) #=> ["20190101", "20190102", "20190103", "20190104"]
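
String#scan does not accept an arithmetic sequence today, so the two calls above are hypothetical. As a rough sketch of the same idea with methods that already exist (Ruby 2.6 or later), the start positions can be generated with Range#step and fed to String#[]:

```ruby
str = "20190101 20190102\n20190103 20190104\n"

# Start offsets 0, 9, 18, 27. The range is bounded by the string length,
# because an endless (0..) sequence would never terminate under map.
starts = (0...str.length).step(9)

with_separator = starts.map { |i| str[i, 9] }
fields_only    = starts.map { |i| str[i, 8] }

p with_separator #=> ["20190101 ", "20190102\n", "20190103 ", "20190104\n"]
p fields_only    #=> ["20190101", "20190102", "20190103", "20190104"]
```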

Updated by naruse (Yui NARUSE) 10 months ago

This requires a more concrete real-world example.

Updated by ioquatix (Samuel Williams) 6 months ago

Here is a use case:

https://github.com/socketry/protocol-http2/blob/12875a97e0f82315682191e3bbbaba8b59cb3432/lib/protocol/http2/settings_frame.rb#L236

Because I didn't know that /....../ should be /....../m, I wasted at least 2 hours debugging.

I would like each_chunk or each_slice, and/or each_unpack.
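
To illustrate the pitfall with a small made-up string (not the code from the linked file): without the /m flag, `.` does not match a newline, so String#scan silently skips it and every later chunk boundary shifts.

```ruby
str = "abc\ndefgh"

without_m = str.scan(/.{1,3}/)   # "." does not match "\n"
with_m    = str.scan(/.{1,3}/m)  # "." matches any character, including "\n"

p without_m #=> ["abc", "def", "gh"]
p with_m    #=> ["abc", "\nde", "fgh"]

# Only the /m version preserves every character of the input.
p without_m.join == str #=> false
p with_m.join == str    #=> true
```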

Updated by ioquatix (Samuel Williams) 6 months ago

I wonder if we should have consistency with slice and each_slice from Array. But honestly, I don't care, as long as it's available.

Updated by ioquatix (Samuel Williams) 6 months ago

Is size in characters or bytes?

Updated by Glass_saga (Masaki Matsushita) 5 months ago

> I wonder if we should have consistency with slice and each_slice from Array. But honestly, I don't care, as long as it's available.

I like String#each_slice and #slices.

> Is size in characters or bytes?

Considering consistency with #slice, it is better to have size in characters.
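
The difference between the two units shows up with multibyte text. The following sketch uses only existing methods (String#[] for characters, String#byteslice for bytes) on a made-up sample string; it is meant to show why the choice matters, not how the proposed method would be implemented.

```ruby
# "\u65E5\u672C\u8A9E" is the three-character string "日本語" (3 bytes each in UTF-8).
str = "\u65E5\u672C\u8A9Eabc"

p str.length   #=> 6
p str.bytesize #=> 12

# Character-based slices of size 2 always hold whole characters.
by_chars = (0...str.length).step(2).map { |i| str[i, 2] }
p by_chars.map(&:valid_encoding?) #=> [true, true, true]

# Byte-based slices of size 4 cut through the multibyte characters here.
by_bytes = (0...str.bytesize).step(4).map { |i| str.byteslice(i, 4) }
p by_bytes.map(&:valid_encoding?) #=> [false, false, false]
```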

Updated by Eregon (Benoit Daloze) 5 months ago

I think String#each_slice(n_chars) would make sense, since it's like str.chars.each_slice(9) { |a| a.join }
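
As a quick check of that equivalence with existing methods: the block form of each_slice discards the block's return values, so this sketch uses map to collect the joined strings and compares them with the scan idiom from the description.

```ruby
str = "20190101 20190102\n20190103 20190104\n"

# each_char avoids building the intermediate array that chars would create.
via_each_slice = str.each_char.each_slice(9).map(&:join)
via_scan       = str.scan(/.{1,9}/m)

p via_each_slice #=> ["20190101 ", "20190102\n", "20190103 ", "20190104\n"]
p via_each_slice == via_scan #=> true
```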

Updated by shevegen (Robert A. Heiler) 5 months ago

#each_slice and #slices seem fine to me as well; I think it is also a better
name than chunks.

Updated by osyo (manga osyo) 5 months ago

I also wanted something like #each_slice.
For example, it is useful when you want to fix the width of the output.

puts "abcdefghijklmnopqrstuvwxyz".each_slice(5).map { |s| "#{s}<br>" }
# output:
# abcde<br>
# fghij<br>
# klmno<br>
# pqrst<br>
# uvwxy<br>
# z<br>

> Is size in characters or bytes?

Considering consistency with #slice, it is better to have size in characters.

I think there could be multiple String#each_slice_xxx methods, like String#each_xxx
(e.g. String#each_slice_byte, String#each_slice_char, and more).
Also, I think String#each_slice could be equivalent to String#each_slice_char.

Updated by matz (Yukihiro Matsumoto) 4 months ago

As shyouhei (Shyouhei Urabe) mentioned, we'd like to hear the real-world use-case. Extracting fixed-width records may be the purpose. I'm curious about the OP's opinion.

Matz.

Updated by usa (Usaku NAKAMURA) 4 months ago

Just an idea: this method may be useful for handling data in a fixed-length record format if it accepts multiple column lengths, such as

records = []
fixed_length_records_data.each_slice(7, 10, 20) do |zip, tel, name|
  records.push({zip: zip, tel: tel, name: name})
end
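
The multi-width form above is only an idea and does not exist in Ruby. A rough approximation with current methods might look like the following, where split_records is a hypothetical helper, the sample data is made up, and widths are counted in characters.

```ruby
# Hypothetical helper: splits `data` into records whose fields have the given
# character widths. Assumes data.length is a multiple of the total record
# width; a trailing partial record would yield nil or short fields.
def split_records(data, *widths)
  record_size = widths.sum
  (0...data.length).step(record_size).map do |start|
    offset = start
    widths.map do |width|
      field = data[offset, width]
      offset += width
      field
    end
  end
end

# Made-up sample: zip (7 chars), tel (10 chars), name (20 chars, space padded).
rows = [["1000001", "0312345678", "Taro"], ["5300001", "0698765432", "Hanako"]]
fixed_length_records_data = rows.map { |zip, tel, name| zip + tel + name.ljust(20) }.join

records = split_records(fixed_length_records_data, 7, 10, 20).map do |zip, tel, name|
  { zip: zip, tel: tel, name: name.rstrip }
end

p records
#=> [{:zip=>"1000001", :tel=>"0312345678", :name=>"Taro"},
#    {:zip=>"5300001", :tel=>"0698765432", :name=>"Hanako"}]
# (Ruby 3.4+ prints the hashes as {zip: "1000001", ...}.)
```

For single-byte data, String#unpack with a format such as "A7A10A20" does a similar job for one record, but its counts are in bytes rather than characters.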
