Feature #15899

String#before and String#after

Added by kke (Kimmo Lehto) 4 months ago. Updated 3 months ago.

Status: Open
Priority: Normal
Assignee: -
Target version: -
[ruby-core:92972]

Description

There seem to be no methods for getting the substring before or after a marker.

Too often I see and have to resort to variations of:

str[/(.+?);/, 1]
str.split(';').first
substr, _ = str.split(';', 2)
str.sub(/.*;/, '')
str[0...str.index(';')]

These create intermediate objects and/or are ugly.
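As a rough illustration of why these variations are awkward, the sketch below runs them against the example header plus two inputs that are assumptions made here for illustration (`plain` without a marker, `multi` with several). The idioms do not agree with each other once the marker is missing or repeated:

```ruby
str = 'application/json; charset=utf-8'

# With exactly one marker, these three idioms agree.
str[/(.+?);/, 1]          # => "application/json"
str.split(';').first      # => "application/json"
str[0...str.index(';')]   # => "application/json"

# Marker absent: the regexp form returns nil, split returns the whole string.
plain = 'text/plain'
plain[/(.+?);/, 1]        # => nil
plain.split(';').first    # => "text/plain"

# Several markers: the greedy .* strips up to the LAST marker,
# while split with a limit of 2 cuts at the FIRST one.
multi = 'a;b;c'
multi.sub(/.*;/, '')      # => "c"
multi.split(';', 2)       # => ["a", "b;c"]
```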

String#delete_suffix and String#delete_prefix do not accept regexps, so they can only be used if you first figure out the full prefix or suffix.
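A small sketch of that limitation, reusing the Content-Type example. The regexp passed to `sub` is only one possible workaround, not something proposed in this ticket:

```ruby
str = 'application/json; charset=utf-8'

# Works, but only because the exact suffix is already known.
known = str.delete_suffix('; charset=utf-8')   # => "application/json"

# A Regexp argument is rejected with a TypeError.
error_class =
  begin
    str.delete_suffix(/;.*/)
    nil
  rescue TypeError => e
    e.class
  end

# Regexp-based workaround with an anchored pattern.
stripped = str.sub(/;.*\z/, '')                # => "application/json"
```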

For this reason, I suggest something like:

> str = 'application/json; charset=utf-8'
> str.before(';')
=> "application/json"
> str.after(';')
=> " charset=utf-8"

What should happen if the marker isn't found? In my opinion, before should return the full string and after an empty string.
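A minimal sketch of these semantics, written as plain helper functions rather than as the proposed String methods. The module name `StringMarkers` and the `(str, marker)` signatures are hypothetical, and this is not the implementation from the attached patch:

```ruby
# Hypothetical helpers illustrating the proposed behaviour.
module StringMarkers
  module_function

  # Substring before the first occurrence of marker (String or Regexp).
  # Returns a copy of the full string when the marker is not found.
  def before(str, marker)
    if marker.is_a?(Regexp)
      m = str.match(marker)
      m ? m.pre_match : str.dup
    else
      i = str.index(marker)
      i ? str[0, i] : str.dup
    end
  end

  # Substring after the first occurrence of marker (String or Regexp).
  # Returns an empty string when the marker is not found.
  def after(str, marker)
    if marker.is_a?(Regexp)
      m = str.match(marker)
      m ? m.post_match : ''
    else
      i = str.index(marker)
      i ? str[(i + marker.length)..-1] : ''
    end
  end
end

str = 'application/json; charset=utf-8'
StringMarkers.before(str, ';')    # => "application/json"
StringMarkers.after(str, ';')     # => " charset=utf-8"
StringMarkers.after(str, /;\s*/)  # => "charset=utf-8"
StringMarkers.before(str, '|')    # => "application/json; charset=utf-8"
StringMarkers.after(str, '|')     # => ""
```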


Files

test.rb (712 Bytes), edd314159 (Edd Morgan), 07/09/2019 06:33 PM
test_mem.rb (326 Bytes), edd314159 (Edd Morgan), 07/09/2019 06:33 PM
2269.diff (3.77 KB), edd314159 (Edd Morgan), 07/09/2019 06:33 PM

History

Updated by sawa (Tsuyoshi Sawada) 4 months ago

Since you mention as a weak point that String#delete_suffix and String#delete_prefix do not accept regexps, you should use regexps in the examples illustrating your proposal.

Updated by sawa (Tsuyoshi Sawada) 4 months ago

Using partition looks reasonable, and it can accept regexes.

str = 'application/json; charset=utf-8'
before, _, after = str.partition(/; /)
before # => "application/json"
after # => "charset=utf-8"

Updated by shevegen (Robert A. Heiler) 4 months ago

I can see where it may be useful, since it could shorten code like this:

first_part = "hello world!".split(' ').first

To:

first_part = "hello world!".before(' ')

It is not a huge improvement in my opinion, though. (My comment here has
not yet addressed the other part about using regexes - see a bit later for
that.)

I am not a big fan of the names, though. I somehow associate #before and #after
more with time-based operations; and rack/sinatra middleware (route) filters.

I do not have a better or alternative suggestion, although since we already have
delete_prefix, perhaps we could have some methods that return the desired prefix
instead (or suffix).

As for lack of regex support, I think sawa already pointed out that it may be
better to reason for changing delete_prefix and delete_suffix instead. That way
your demonstrated use case could be simplified as well.

Updated by kke (Kimmo Lehto) 4 months ago

sawa (Tsuyoshi Sawada) wrote: "Using partition looks reasonable, and it can accept regexes."

It also has the problem of creating extra objects that you need to discard with _ or assign and just leave unused.

shevegen (Robert A. Heiler) wrote: "I am not a big fan of the names, though. I somehow associate #before and #after
more with time-based operations; and rack/sinatra middleware (route) filters."

How about str.preceding(';') and str.following(';')?

Perhaps str.prior_to(';') and str.behind(';')?

The possibility of an opposite reading direction can make these problematic.

str.left_from(';'), str.right_from(';')? Sounds a bit clunky.

Head and tail could be the unixy choice and more versatile for other use cases.

class String
  def head(count = 10, separator = "\n")
    ...
  end

  def tail(count = 10, separator = "\n")
    ...
  end
end

For my example use case, it would become:

str = "application/json; charset=utf-8"
mime = str.head(1, ';')
labels = str.tail(1, ';')

And to emulate something like $ curl xttp://x.example.com | head you would use response.body.head
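One way the elided method bodies could look, as a sketch only. It is written as helper functions in a hypothetical `HeadTail` module that take the string as their first argument, and it assumes "count" means the number of separator-delimited segments:

```ruby
# Hypothetical helpers; names and semantics are assumptions, not part of any patch.
module HeadTail
  module_function

  # First `count` segments of str, re-joined with the separator.
  def head(str, count = 10, separator = "\n")
    str.split(separator).first(count).join(separator)
  end

  # Last `count` segments of str, re-joined with the separator.
  def tail(str, count = 10, separator = "\n")
    str.split(separator).last(count).join(separator)
  end
end

str = 'application/json; charset=utf-8'
HeadTail.head(str, 1, ';')        # => "application/json"
HeadTail.tail(str, 1, ';')        # => " charset=utf-8"
HeadTail.head("a\nb\nc\n", 2)     # => "a\nb"
```

Note that this version goes through `split`, so it allocates an intermediate array and does not address the allocation concern raised above.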

Updated by kke (Kimmo Lehto) 4 months ago

How about first and last?

'hello world'.first(2)
 => 'he'
'hello world'.last(2)
 => 'ld'
'hello world'.first
 => 'h'
'hello world'.last
 => 'd'
'hello world'.first(1, ' ')
 => 'hello'
'hello world'.last(1, ' ')
 => 'world'
'application/json; charset=utf-8'.first(1, ';')
 => 'application/json'

Updated by marcandre (Marc-Andre Lafortune) 4 months ago

sawa is right. Just use partition and rpartition.
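For reference, a short sketch of how partition and rpartition behave, including the case where the separator is missing. The inputs `'a;b;c'` and `'text/plain'` are assumptions added here for illustration:

```ruby
str = 'application/json; charset=utf-8'

# partition splits at the FIRST match and returns [before, match, after].
str.partition(';')             # => ["application/json", ";", " charset=utf-8"]
str.partition(/;\s*/)          # => ["application/json", "; ", "charset=utf-8"]

# rpartition splits at the LAST match.
'a;b;c'.rpartition(';')        # => ["a;b", ";", "c"]

# Separator missing: partition keeps the string in the first slot,
# rpartition keeps it in the last slot.
'text/plain'.partition(';')    # => ["text/plain", "", ""]
'text/plain'.rpartition(';')   # => ["", "", "text/plain"]
```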

Updated by edd314159 (Edd Morgan) 3 months ago

I'd like to add my +1 to this idea. Splitting a string by a substring (and only caring about the first result) is a use case I run into all the time. In fact, the example given by kke (Kimmo Lehto) of splitting a Content-Type HTTP header by the semicolon is the one I needed it for most recently.

It's true, partition and rpartition can absolutely achieve the same thing. But they have the side effect of returning (and, of course, allocating) extra String objects that are frequently discarded. This not only negatively impacts performance, but results in less readable code: we have to resort to the convention of prefixing the throwaway variable name with an underscore. This underscore is a convention agreed upon, informally, by humans to indicate the irrelevance of the variable, and I'm sure many Ruby programmers are unaware of the convention, or simply forget about it.

I have suggested an implementation in PR #2269 on GitHub: https://github.com/ruby/ruby/pull/2269

I also attach the following benchmark to show that when these new methods are used for this use case, performance is ~30% improved for splitting by a String (and more so when splitting by Regex):

eddmorgan@eddbook ~/Projects/rubydev/build  make run

../ruby/revision.h unchanged
./miniruby -I../ruby/lib -I. -I.ext/common   ../ruby/test.rb
                       user     system      total        real
String#before      0.182367   0.000587   0.182954 (  0.183625)
String#partition   0.303105   0.000877   0.303982 (  0.304961)
                       user     system      total        real
String#after       0.199295   0.000672   0.199967 (  0.200794)
String#partition   0.302300   0.001409   0.303709 (  0.305278)
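A comparison of this kind could be set up roughly as follows. This is not the attached test.rb; the helper `before_via_index` is a hypothetical pure-Ruby stand-in for the patched String#before, and the iteration count is arbitrary, so absolute timings will differ from the figures above:

```ruby
require 'benchmark'

# Hypothetical stand-in for String#before: one index lookup, one substring.
def before_via_index(str, marker)
  i = str.index(marker)
  i ? str[0, i] : str.dup
end

str = 'application/json; charset=utf-8'
iterations = 200_000

Benchmark.bm(20) do |x|
  # Allocates only the result string per call.
  x.report('index-based before') { iterations.times { before_via_index(str, ';') } }
  # Allocates a three-element array plus three strings per call.
  x.report('String#partition')   { iterations.times { str.partition(';').first } }
end
```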
