
Bug #15968

Updated by nobu (Nobuyoshi Nakada) almost 5 years ago

While working on a Rails app, I noticed some odd behavior where after marshalling and demarshalling an array of ActiveRecord objects, some elements were replaced with symbols and empty hashes ([original Rails bug report](https://github.com/rails/rails/issues/36522)). 

It appears some of Rails' custom marshalling methods allow an object's unset instance variables to be set during marshalling. However, Marshal writes an object's instance-variable count before dumping the variables themselves, so variables added partway through aren't counted, and upon demarshalling they overflow into subsequent array elements.

 Here is a test case (written in plain Ruby) demonstrating this behavior: 

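```ruby
require 'test/unit'

class Foo
  attr_accessor :bar, :baz

  def initialize
    self.bar = Bar.new(self) # @baz is left unset
  end
end

class Bar
  attr_accessor :foo

  def initialize(foo)
    self.foo = foo
  end

  # Called while Foo's instance variables are being dumped. Assigning
  # foo.baz here adds @baz to Foo after Marshal has already written
  # Foo's instance-variable count (1).
  def marshal_dump
    self.foo.baz = :problem
    {foo: self.foo}
  end

  def marshal_load(data)
    self.foo = data[:foo]
  end
end

class BugTest < Test::Unit::TestCase
  def test_marshalization
    foo = Foo.new
    array = [foo, nil]
    marshalled_array = Marshal.dump(array)
    demarshalled_array = Marshal.load(marshalled_array)

    # Fails with the behavior described above: the extra @baz entry is
    # read back as the next array element instead of nil.
    assert_nil demarshalled_array[1]
  end
end
```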

I'm not positive this qualifies as a bug: if a programmer writes custom `marshal_dump` and `marshal_load` methods, perhaps it's their responsibility to avoid unintended side effects like those demonstrated in my test case.
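For comparison, here is a minimal sketch of what the programmer-side fix could look like: the same `Foo`/`Bar` pair, but with a `Bar#marshal_dump` that only reads state and never assigns to `foo`'s instance variables. This is one possible workaround under that assumption, not the change Rails actually made.

```ruby
class Foo
  attr_accessor :bar, :baz

  def initialize
    self.bar = Bar.new(self)
  end
end

class Bar
  attr_accessor :foo

  def initialize(foo)
    self.foo = foo
  end

  # Side-effect-free: reads foo but never sets its ivars, so Foo's
  # instance-variable count stays accurate while Foo is being dumped.
  def marshal_dump
    { foo: foo }
  end

  def marshal_load(data)
    self.foo = data[:foo]
  end
end

loaded = Marshal.load(Marshal.dump([Foo.new, nil]))
p loaded[1] # => nil
```

If `@baz` genuinely needs a value, setting it before `Marshal.dump` starts (for example in `Foo#initialize`) keeps the count and the dumped variables in agreement.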

However, I think this issue might be avoided altogether by adding a reserved delimiter character to Ruby's core marshalling functionality (in marshal.c) to represent the "end" of a serialized object. For instance, in the above test case, `marshalled_array` comes out to:

 ``` 
 \x04\b[\ao:\bFoo\x06:\t@barU:\bBar{\x06:\bfoo@\x06:\t@baz:\fproblem0 
 ``` 
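To read the dump above: Marshal writes a small integer n (1 to 122) as the single byte n + 5, so the `\a` after `[` is the array length 2, and the `\x06` after `o:\bFoo` is Foo's instance-variable count, 1, even though two variables (`@bar` and `@baz`) follow it. The sketch below checks that encoding on a hypothetical one-ivar `Point` class; it only inspects Marshal's output.

```ruby
# Point is a hypothetical class used only to inspect Marshal's byte format.
class Point
  def initialize
    @x = 1
  end
end

dump = Marshal.dump(Point.new)
p dump # => "\x04\bo:\nPoint\x06:\a@xi\x06"

# Marshal writes a small integer n (1..122) as the single byte n + 5.
# The byte right after the class name is the instance-variable count,
# emitted before any of the variables themselves.
count_offset = dump.index("Point") + "Point".bytesize
puts dump.getbyte(count_offset) - 5 # prints 1

# The same rule gives the array length: "\a" is byte 7, i.e. 2 elements.
p Marshal.dump([nil, nil]) # => "\x04\b[\a00"
```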

Suppose Ruby used a `z` character to represent the end of a serialized object. In this case, `marshalled_array` would come out to something like:

 ``` 
 \x04\b[\ao:\bFoo\x06:\t@barU:\bBar{\x06:\bfoo@\x06:\t@baz:\fproblemz0 
 ``` 

(Note the second-to-last character, `z`.)

This way, when demarshalling an object, even if additional instance variables had somehow snuck in during the marshalling process, the `z` character would mark the end of the serialized object, ensuring that the extra instance variables don't overflow into the next segment of serialized data.
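The difference between the two framings can be shown with a toy model. The sketch below is not Ruby's real Marshal format: it decodes plain arrays of tokens, where symbols stand in for encoded bytes and `:z` is the hypothetical end-of-object marker proposed above. The method names `decode_counted` and `decode_terminated` are made up for illustration.

```ruby
# Toy token-stream model (hypothetical format, not Ruby's Marshal).

# Count-prefixed framing, as marshal.c does: the decoder trusts the count.
def decode_counted(tokens)
  tokens = tokens.dup
  count = tokens.shift
  ivars = {}
  count.times { ivars[tokens.shift] = tokens.shift }
  [ivars, tokens] # whatever is left is read as the next array element
end

# Terminator framing, as proposed: read name/value pairs until :z.
def decode_terminated(tokens)
  tokens = tokens.dup
  ivars = {}
  loop do
    name = tokens.shift
    raise ArgumentError, "missing :z terminator" if name.nil?
    break if name == :z
    ivars[name] = tokens.shift
  end
  [ivars, tokens]
end

# Foo's ivars as written in the test case: the count (1) was emitted
# before Bar#marshal_dump added @baz, so two name/value pairs follow it.
counted    = [1, :@bar, :bar_data, :@baz, :problem, nil]
terminated = [:@bar, :bar_data, :@baz, :problem, :z, nil]

_, rest = decode_counted(counted)
p rest.first # => :@baz  (leaks into the next array element)

_, rest = decode_terminated(terminated)
p rest.first # => nil
```

With the count prefix, the decoder stops after `@bar` and the leftover `:@baz` becomes the next element; with a terminator, both variables stay attached to the object and the following `nil` survives.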

I don't write much C, and I haven't fully grokked Ruby's marshal.c, so there may be dozens of reasons why this won't work. But I think a serialization strategy along those lines may help avoid unexpected behavior.
