Feature #18033
openTime.new to parse a string
Description
Make Time.new
parse Time#inspect
and ISO-8601 like strings.
-
Time.iso8601
andTime.parse
need an extension library,date
. -
Time.iso8601
can't parseTime#inspect
string. -
Time.parse
often results in unintentional/surprising results. -
Time.new
also about 1.9 times faster thanTime.iso8601
.$ ./ruby -rtime -rbenchmark -e ' n = 1000 s = Time.now.iso8601 Benchmark.bm(12) do |x| x.report("Time.iso8601") {n.times{Time.iso8601(s)}} x.report("Time.parse") {n.times{Time.parse(s)}} x.report("Time.new") {n.times{Time.new(s)}} end' user system total real Time.iso8601 0.006919 0.000185 0.007104 ( 0.007091) Time.parse 0.018338 0.000207 0.018545 ( 0.018590) Time.new 0.003671 0.000069 0.003740 ( 0.003741)
Updated by ioquatix (Samuel Williams) 11 months ago
It looks like good improvement.
But does any other #new
implementation work this way? Can we fix Time#parse
? I wonder if user will be confused, should they use Time.parse
or Time.new
?
Why is Time.parse
so slow?
This seems like a good performance improvement. But rather than fixing the existing #parse
, we introduce new one. Now we move the complexity to the user.
Updated by nobu (Nobuyoshi Nakada) 11 months ago
ioquatix (Samuel Williams) wrote in #note-2:
But does any other
#new
implementation work this way? Can we fixTime#parse
? I wonder if user will be confused, should they useTime.parse
orTime.new
?
The reason not to "fix Time#parse
" is
Time.parse
often results in unintentional/surprising results.
Updated by ioquatix (Samuel Williams) 11 months ago
Time.parse often results in unintentional/surprising results.
Can we change it so that it doesn't return unintentional/surprising results? I realise this might be impossible without extending the interface... e.g. Time.parse(..., surprising: false)
:p
I agree clean room implementation is nice. But if Time.parse
can do unexpected things, a new interface does not fix existing usage so the problem still exists, and now we have two interfaces for doing largely the same thing - one that "works" and one that does "unintentional/surprising" things.
Updated by nobu (Nobuyoshi Nakada) 11 months ago
ioquatix (Samuel Williams) wrote in #note-4:
Time.parse often results in unintentional/surprising results.
Can we change it so that it doesn't return unintentional/surprising results? I realise this might be impossible without extending the interface... e.g.
Time.parse(..., surprising: false)
:p
Yes, it seems by design, and changing the behavior will just break something.
I thought that the same name but different behavior method would be more confusing.
I agree clean room implementation is nice. But if
Time.parse
can do unexpected things, a new interface does not fix existing usage so the problem still exists, and now we have two interfaces for doing largely the same thing - one that "works" and one that does "unintentional/surprising" things.
A wrapper version based on the new Time.new
would be possible, I think.
Updated by byroot (Jean Boussier) 11 months ago
user system total real
Time.iso8601 0.006919 0.000185 0.007104 ( 0.007091)
Time.parse 0.018338 0.000207 0.018545 ( 0.018590)
Time.new 0.003671 0.000069 0.003740 ( 0.003741)
This new interface being faster than Time.iso8601
at parsing iso8601
seems weird to me. When I have IS0-8601 dates (pretty much the only format I really use), I'd expect that by using the dedicated method to parse them, I get the best possible performance.
Can we optimize Time.iso8601
in the same way? Otherwise I can already see people moving to the new API and losing strictness by doing so.
Updated by mame (Yusuke Endoh) 10 months ago
- Related to Feature #16005: A variation of Time.iso8601 that can parse yyyy-MM-dd HH:mm:ss added
Updated by ioquatix (Samuel Williams) 10 months ago
Yes, it seems by design, and changing the behavior will just break something.
I think there is value in the following interface:
Time.parse(..., strict: true)
where we have some very clearly documented interpretation of strict
.
If we want to support multiple formatS:
Time.parse(..., format: :loose/:strict)
or Time.parse(...,format: :iso8601)
etc.
Updated by ioquatix (Samuel Williams) 7 months ago
I checked the latest update.
I'm still not sure about Time.new(string)
parsing a string.
What about Time.parse_strict
or Time.parse_iso8601
etc?
Updated by Eregon (Benoit Daloze) 6 months ago
Like @ioquatix (Samuel Williams) and @byroot (Jean Boussier) I think Time.new
taking a iso-like String is not a good idea.
Also it's technically incompatible (the first argument is always the year for Time.new
and it even accepts strings):
> Time.new "2234-01-01T00:00:00+07:00"
=> 2234-01-01 00:00:00 +0100
I understand Time.iso8601 and Time.parse need an extension library, date.
is annoying and not intuitive.
How about simply making Time.iso8601
a core method?
Time.iso8601 can't parse Time#inspect string.
One shouldn't parse inspect
so this should be no problem in practice.
We could add Time#iso8601
though (in core), I think that's much better than time.to_datetime.iso8601
.
Updated by byroot (Jean Boussier) 6 months ago
If I might add a nitpick, the actual format is RFC 3339, which is pretty much a subset of ISO 8601.
But yes, +1 to making Time.iso8601
a core method.
Updated by timcraft (Tim Craft) 6 months ago
Nice performance improvement!
In terms of the interface I think it would be confusing to make Time.new
parse a string:
- Using a
.parse
method for parsing is more conventional / less surprising / more intention revealing - Having both
.new
and.parse
would be confusing because it's not clear when to use one or the other, what the trade-offs are etc
It feels very unintiutive and not "least surprise". Similarly Time.parse(...,format: :iso8601)
and Time.parse_iso8601
feel a bit clunky.
Is this awkwardness stemming from trying to fit much functionality into the Time class? It would be more work, but if there was a dedicated ISO8601 module we could shift some responsibility away from the Time class. For example:
ISO8601::Time.parse # strictly ISO8601 parsing, returns a Time object
This could be extended to encapsulate ISO8601 parsing for other classes:
ISO8601::Date.parse
ISO8601::Duration.parse
ISO8601::TimeInterval.parse
The same general pattern could be followed for other formats:
RFC3339::Time.parse
RFC2822::Time.parse
SQL92::Time.parse
Each of those methods would have the same signature, so they could be easily interchanged to get varying degrees of strictness or performance.
From a naming and readability perspective the names are less surprising, more intention revealing, and very greppable.
The higher level Time.parse
method could potentially delegate to these methods to support parsing a wider range of formats.
Each module could also be extended to encapsulate string formatting as an alternative to adding instance methods like #iso8601, #rfc3339 etc.
An alternative to using the formats as the top level modules would be to invert and nest the modules under the Time classes, for example:
Time::ISO8601.parse # instead of ISO8601::Time.parse
Same benefits—I'm not sure which would be considered more Ruby-ish?
Updated by nobu (Nobuyoshi Nakada) 6 months ago
- Related to Feature #18331: Kernel.#Time added
Updated by nobu (Nobuyoshi Nakada) 6 months ago
Updated by nobu (Nobuyoshi Nakada) 6 months ago
I remember #18331 won't work with subclasses of Time
.
Does Time.at("2021-12-04 02:00:00")
make sense?
Updated by sawa (Tsuyoshi Sawada) 6 months ago
nobu (Nobuyoshi Nakada) wrote in #note-15:
I remember #18331 won't work with subclasses of
Time
.
Can you elaborate on that? Is your concern about certain frameworks using time-like classes other than Time
? I don't see any problem with that. If you create a Time
object using Kernel.#Time
proposed in #18331, that can easily be converted (often without noticing) to such class in case of Ruby on Rails' ActiveSupport::TimeWithZone
.
Updated by Eregon (Benoit Daloze) 6 months ago
Yes, Kernel#Time(str)
seems better than Time.new(str)
.
However I think being precise for parsing is key for times.
Hence, why not make Time.iso8601
fast and a core method?
Then code already using the right method would just become faster, and there is no need migrate code to use a new API.
Updated by nobu (Nobuyoshi Nakada) 6 months ago
sawa (Tsuyoshi Sawada) wrote in #note-16:
nobu (Nobuyoshi Nakada) wrote in #note-15:
I remember #18331 won't work with subclasses of
Time
.Can you elaborate on that? Is your concern about certain frameworks using time-like classes other than
Time
? I don't see any problem with that. If you create aTime
object usingKernel.#Time
proposed in #18331, that can easily be converted (often without noticing) to such class in case of Ruby on Rails'ActiveSupport::TimeWithZone
.
The timezone name will be converted by find_timezone
method on the class of a Time
instance, so that subclasses can override it.
Updated by nobu (Nobuyoshi Nakada) 6 months ago
Eregon (Benoit Daloze) wrote in #note-17:
Hence, why not make
Time.iso8601
fast and a core method?
Then code already using the right method would just become faster, and there is no need migrate code to use a new API.
First, the primary target is the result of Time#inspect
, and it is not fully compliant with ISO-8601.
Second, ISO-8601 allows many variants, but I'm not going to implement them all.
Updated by nobu (Nobuyoshi Nakada) 6 months ago
Eregon (Benoit Daloze) wrote in #note-10:
Also it's technically incompatible (the first argument is always the year for
Time.new
and it even accepts strings):> Time.new "2234-01-01T00:00:00+07:00" => 2234-01-01 00:00:00 +0100
Actually this code raises an ArgumentError
in the master, since https://bugs.ruby-lang.org/issues/17485#change-89871.
Updated by Eregon (Benoit Daloze) 6 months ago
nobu (Nobuyoshi Nakada) wrote in #note-19:
First, the primary target is the result of
Time#inspect
, and it is not fully compliant with ISO-8601.
Thanks for making that clear, I wasn't sure why not just improve Time.iso8601
.
Right, they differ:
irb(main):006:0> t.inspect
=> "2021-12-06 14:10:15.334531885 +0100"
irb(main):007:0> t.iso8601
=> "2021-12-06T14:10:15+01:00"
Why we do we need to parse Time#inspect
output?
It seems in general bad to rely on inspect
for serialization, since it only works for some classes.
Maybe to preserve subseconds?
#inspect
does not feel like a proper way to serialize a Time instance, so if we want to add that I think we should have a new method for it too (maybe #dump
?).
In any case, Time.new
accepts individual time components, and I think making it parse strings is just confusing.
A new method Time.try_convert
(or Time.undump
) feels more appropriate for such parsing.
Second, ISO-8601 allows many variants, but I'm not going to implement them all.
Is that what makes it faster?
The PR still uses a Regexp so I guess the main difference is the Regexp is a little bit simpler?
They look fairly similar:
Updated by nobu (Nobuyoshi Nakada) 6 months ago
Eregon (Benoit Daloze) wrote in #note-21:
Why we do we need to parse
Time#inspect
output?
It seems in general bad to rely oninspect
for serialization, since it only works for some classes.
Maybe to preserve subseconds?
I wrote Time#inspect
, but the "ISO 8601-like" format is not only used by it, e.g., --date=iso
of git.
#inspect
does not feel like a proper way to serialize a Time instance, so if we want to add that I think we should have a new method for it too (maybe#dump
?).
In any case,
Time.new
accepts individual time components, and I think making it parse strings is just confusing.
How about Time.at
?
A new method
Time.try_convert
(orTime.undump
) feels more appropriate for such parsing.
Time.try_convert
feels considerable, but passing the timezone option may not fit.
Second, ISO-8601 allows many variants, but I'm not going to implement them all.
Is that what makes it faster?
Not to be faster.
ISO-8601 is a large specification and allows many omissions, which makes parsing very complex and sometimes conflicts with the spec of Time
.
The PR still uses a Regexp so I guess the main difference is the Regexp is a little bit simpler?
They look fairly similar:
That PR is abandoned, please see Time.new-string or Time.at-string.
Updated by Eregon (Benoit Daloze) 6 months ago
nobu (Nobuyoshi Nakada) wrote in #note-22:
I wrote
Time#inspect
, but the "ISO 8601-like" format is not only used by it, e.g.,--date=iso
of git.
Interesting that git
has this too:
--date=iso (or --date=iso8601) shows timestamps in a ISO 8601-like format. The differences to the strict ISO 8601 format are:
• a space instead of the T date/time delimiter
• a space between time and time zone
• no colon between hours and minutes of the time zone
--date=iso-strict (or --date=iso8601-strict) shows timestamps in strict ISO 8601 format.
Maybe we really need a name for this. Like Time#iso
and Time.iso
.
Or it could be a keyword argument: time.iso8601(format: :git)
and Time.iso8601(str, format: :git)
.
:git
is just an example, could be textual
, relaxed
or anything.
Another idea would be to actually use the strict variant of iso8601 if that is well defined, like what git log --date=iso-strict
shows.
Then we'd have Time#iso8601_strict
/Time.iso8601_strict
, or time.iso8601(format: :strict)
and Time.iso8601(str, format: :strict)
.
That feels proper to me because then we have a clear way to serialize and deserialize times and finding the dumping/parsing method is trivial (they have the same name, no need to guess).
I feel relying on #inspect
for efficient serializing is in general bad, because that is not the main purpose of #inspect
(rather it's to show a clear/debug-like representation of an object).
How about
Time.at
?
To me Time.at
sounds like "give me a Time at epoch", so I think it's unexpected it does String parsing.
Time.try_convert
feels considerable, but passing the timezone option may not fit.
I think an extra optional timezone argument would be fine.
Updated by nobu (Nobuyoshi Nakada) 6 months ago
Eregon (Benoit Daloze) wrote in #note-23:
Time.try_convert
feels considerable, but passing the timezone option may not fit.I think an extra optional timezone argument would be fine.
Ah, my bad.
The try_convert
methods are the implicit conversions for duck-typing, i.e, to_int
to Integer
, to_str
to String
and so on.
As parsing like Kernel#Integer
is not an implicit conversion, Time.try_convert
can't be valid.