Revision 4e90dcc9

Added by normal about 7 years ago

string.c (str_uminus): deduplicate strings

This exposes the rb_fstring internal function through String#-@
to return a deduped and frozen string when a non-frozen string
is given. This is useful for all sorts of record processing
where arbitrary keys and values may be stored, but certain keys
and values are often duplicated at a high frequency, so the
memory savings can be noticeable.
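
As a minimal sketch of the behavior described above (assuming a
Ruby that includes this change, i.e. 2.5 or later; the variable
names are only illustrative), two distinct unfrozen strings with
the same contents should collapse to one frozen object:

```ruby
# Two distinct, unfrozen String objects with identical contents.
a = String.new("Message-ID")
b = String.new("Message-ID")

# String#-@ returns a frozen, deduplicated string for each of them.
da = -a
db = -b

puts a.equal?(b)    # false: the originals are separate objects
puts da.frozen?     # true:  the result is frozen
puts da.equal?(db)  # true:  both results are the same object
```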

Use cases are many:

  • email/NNTP header processing

    There are some standard header keys everybody uses
    (From/To/Cc/Date/Subject/Received/Message-ID/References/In-Reply-To),
    as well as common ones specific to certain lists
    (ruby-core has X-Redmine-* headers).
    It is also useful to dedupe values, as most inboxes have
    multiple messages from the same sender, or MUA.

  • package management systems -
    things like RubyGems store identical strings for licenses,
    dependency names, author names/emails, etc.

  • HTTP headers/trailers -
    standard headers (Host/Accept/Accept-Encoding/User-Agent/...)
    are common, but there are also uncommon ones.
    Values may be deduped, as well, as it is likely a user
    agent will make multiple/parallel requests to the same
    server.

  • version control systems -
    this can be useful for deduplicating names of frequent
    committers (like "nobu" :)

    In linux.git and git.git, there are also common
    trailers such as Signed-off-by/Acked-by/Reviewed-by/Fixes/...
    as well as less common ones.

  • audio metadata -

    There are commonly used tags (Artist/Album/Title/Tracknumber),
    but Vorbis comments allow arbitrary key-value pairs to be stored.
    Music collections contain songs by the same artist or multiple
    songs from the same album, so deduplicating values will be
    helpful there, too.

  • JSON, YAML, XML, HTML processing

    Certain fields, tags and attributes are commonly used
    within a single document and across multiple documents.
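
The header-processing use cases above could look roughly like
the following sketch. parse_headers is a hypothetical helper
written for illustration only; it is not part of this change,
and it ignores folded/continuation lines for brevity:

```ruby
# Hypothetical helper: parse "Key: value" lines into a Hash,
# deduplicating both keys and values with String#-@.
def parse_headers(raw)
  raw.each_line.each_with_object({}) do |line, headers|
    key, value = line.chomp.split(/:\s*/, 2)
    next if value.nil? # skip lines that are not "Key: value"
    headers[-key] = -value
  end
end

msg1 = parse_headers("From: alice@example.com\nSubject: hello\n")
msg2 = parse_headers("From: alice@example.com\nSubject: goodbye\n")

# The "From" key and its value are each stored once and shared
# between both hashes instead of being allocated per message.
same_key   = msg1.keys.first.equal?(msg2.keys.first)
same_value = msg1["From"].equal?(msg2["From"])
puts same_key, same_value
```

Values that differ between records (the Subject lines here) are
still interned individually, so this only pays off for fields
that actually repeat.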

This does not create a DoS vector through immortal strings.
The fstring table is not a GC root and is not walked during the
mark phase, so unreferenced fstrings can still be collected.
GC-able dynamic symbols since Ruby 2.2 are handled in the same
manner, and that implementation also relies on the
non-immortality of fstrings.

[Feature #13077] [ruby-core:79663]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57698 b2dd03c8-39d4-4d8f-98ff-823fe69b080e