Feature #20396
ObjectSpace.dump_all(string_value: false): skip dumping the String contents
Description
ObjectSpace.dump_all is a very useful method to debug memory leaks and such, hence it is frequently needed in production. But since the content of all 7bit strings is included in the dump, it incurs the risk of leaking personal data or secrets.
Also, in many cases the string content isn't that helpful and just makes the dump much bigger for no good reason. And only pure-ASCII strings are dumped this way, which means all the tools that process these dumps should already be compatible with a dump without any string content.
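The behavior described above can be observed on a single object with ObjectSpace.dump, which emits the same per-object JSON format as dump_all. This is a minimal sketch assuming CRuby with the objspace extension; the sample strings are made up for illustration.

```ruby
require "objspace"
require "json"

# Made-up sample values, purely for illustration.
ascii_secret = "api-key-12345"      # 7bit (pure ASCII) string
non_ascii    = "caf\u00e9 au lait"  # contains one non-ASCII character

ascii_entry     = JSON.parse(ObjectSpace.dump(ascii_secret))
non_ascii_entry = JSON.parse(ObjectSpace.dump(non_ascii))

# The ASCII string carries its full content in the "value" field.
puts ascii_entry["value"].inspect

# The non-ASCII string is described (type, size, encoding) without a "value".
puts non_ascii_entry.key?("value")
```

A tool that already copes with the second kind of entry should, in principle, cope with a dump where no entry has a "value" field.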
Feature
I propose to add another optional parameter to dump_all: string_value: false. When passed, no String content is ever dumped regardless of its coderange.
Implementation: https://github.com/ruby/ruby/pull/10382
Updated by ko1 (Koichi Sasada) 10 months ago
false on default is safer?
Updated by byroot (Jean Boussier) 10 months ago
false on default is safer?
Agreed. Safer and faster. I only set it to true by default to not change the current behavior, but wouldn't mind flipping it to false by default.
Updated by jhawthorn (John Hawthorn) 10 months ago
This is a great addition! I've often used a post-processing script to remove the string data, so having it built in would be very helpful.
I think false would be a good default (but either way is fine by me).
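A post-processing pass of the kind mentioned above could look roughly like the following. This is only a sketch: strip_string_values is a hypothetical helper, not part of Ruby or of any script referenced in this thread, and the sample entries are hand-written stand-ins for real dump lines. It assumes the dump holds one JSON object per line.

```ruby
require "json"
require "stringio"

# Hypothetical helper: copies a heap dump from `input` to `output`,
# dropping the "value" field of every STRING entry. Other entry types
# are written back unchanged.
def strip_string_values(input, output)
  input.each_line do |line|
    next if line.strip.empty?
    entry = JSON.parse(line)
    entry.delete("value") if entry["type"] == "STRING"
    output.puts(JSON.generate(entry))
  end
end

# Hand-written sample lines standing in for a real dump file.
sample = StringIO.new(<<~DUMP)
  {"address":"0x1","type":"STRING","bytesize":6,"value":"secret"}
  {"address":"0x2","type":"HASH","size":1}
DUMP

scrubbed = StringIO.new
strip_string_values(sample, scrubbed)
puts scrubbed.string
```

With real files, the same helper can be given two File objects instead of StringIO instances. Note that the unscrubbed dump still has to be written to disk first, which is the exposure the proposed option avoids.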
Updated by shyouhei (Shyouhei Urabe) 9 months ago
Why not just stop dumping string values? I'm proposing this because I see no reason to keep them. It is practically proven unnecessary; all non-ASCII bits are already silently dropped and no one complains. I prefer simple API for ObjectSpace.dump_all. We could add options later, if we find any use cases.
Updated by byroot (Jean Boussier) 9 months ago
I see no reason to keep them. It is practically proven unnecessary
I disagree. Just to give one example among many, it's very useful when tracking memory leaks. For instance, when you notice a Hash growing, being able to see the content of its keys in the dump often allows you to map that object to actual code.
I also use it very frequently to find opportunities for string deduplication via heap-profiler, e.g. https://github.com/rmosolgo/graphql-ruby/pull/4897
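To illustrate why string values matter for this use case, here is a rough sketch of how duplicated strings might be spotted in a dump. It is not how heap-profiler is implemented; duplicated_strings is a hypothetical helper and the sample entries are hand-written. It only works when the "value" field is present in the dump.

```ruby
require "json"
require "stringio"

# Hypothetical helper: counts how often each string value appears in a
# dump (one JSON object per line) and returns [value, count] pairs for
# values seen at least `min_count` times, most frequent first.
def duplicated_strings(input, min_count: 2)
  counts = Hash.new(0)
  input.each_line do |line|
    next if line.strip.empty?
    entry = JSON.parse(line)
    next unless entry["type"] == "STRING" && entry.key?("value")
    counts[entry["value"]] += 1
  end
  counts.select { |_, n| n >= min_count }.sort_by { |_, n| -n }
end

# Hand-written sample lines standing in for a real dump file.
SAMPLE_DUMP = <<~DUMP
  {"address":"0x1","type":"STRING","value":"user_id"}
  {"address":"0x2","type":"STRING","value":"email"}
  {"address":"0x3","type":"STRING","value":"user_id"}
  {"address":"0x4","type":"STRING","value":"name"}
  {"address":"0x5","type":"STRING","value":"email"}
  {"address":"0x6","type":"STRING","value":"user_id"}
  {"address":"0x7","type":"HASH","size":3}
DUMP

duplicated_strings(StringIO.new(SAMPLE_DUMP)).each do |value, count|
  puts "#{count}x #{value.inspect}"
end
```

Values that show up many times are candidates for deduplication, for example by freezing the literal that produces them.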
I'm totally fine with making it opt-in, but I'd like to keep the capability.
Updated by shyouhei (Shyouhei Urabe) 9 months ago
I'm not sure if I'm in favor of this request then. ObjectSpace.dump_all is very much analogous to a coredump. Both are very handy on occasions. I don't doubt your experience of finding memory leaks is real. But... People normally don't try to cruft a coredump. One does often include sensitive info, but being able to access a coredump is a big threat already. We normally strictly restrict access to them. The same thing can go for ObjectSpace.dump_all output.
I wrote "I prefer simple API for ObjectSpace.dump_all" because I'm pretty sure this is not the last thing you wanted for the output. People need to filter out some object fields, order by something, group by something, have a histogram, ... and I'm pretty sure we would end up needing an entire SQL engine. My preference is that this method should remain as simple as possible, and let jq(1) etc. handle that business.
Updated by byroot (Jean Boussier) 9 months ago
I'm not sure reasoning by analogy with core dumps is sound here. If there was a way to be sure a core dump is stripped of all personally identifiable information I'd definitely use it to share core dumps when it's useful.
because, I'm pretty sure this is not the last thing you wanted for the output. ... and pretty sure we would end up needing an entire SQL engine.
I think this is a bit of an unfair argument. Yes, I requested multiple additions to this API over the last few years, but in my opinion there is a very long way to go before it can be considered a complex API, especially for an API that is intended for very advanced debugging. And it's not like I have a long list of feature requests I'm drip feeding.
Also, I don't even need that capability myself. I suggested it because I was trying to help @zzak (zzak _) fix a memory leak at his company, and the dumps containing string values made it hard for him to get approval to generate heap dumps from production because of security concerns. I thought this new option could be useful for the community.