Backport #3329

Segfault using nokogiri

Added by jdrowell (John Rowell) about 9 years ago. Updated 5 days ago.

Status: Closed
Priority: Normal
Assignee: -
[ruby-core:30342]

Description

=begin
I'm using the Nokogiri gem to parse HTML and XML, and to apply XSLT. Tests went fine in irb, but when I run the method from a script I get the following:

jdrowell@falcon:~/work/GEDi$ ruby crawler.rb
crawler.rb:18: [BUG] Segmentation fault
ruby 1.9.1p378 (2010-01-10 revision 26273) [i686-linux]

-- control frame ----------

c:0010 p:---- s:0033 b:0033 l:000032 d:000032 CFUNC :transform
c:0009 p:0066 s:0029 b:0029 l:000028 d:000028 METHOD crawler.rb:18
c:0008 p:0037 s:0022 b:0022 l:000013 d:000021 BLOCK crawler.rb:26
c:0007 p:---- s:0019 b:0019 l:000018 d:000018 FINISH
c:0006 p:---- s:0017 b:0017 l:000016 d:000016 CFUNC :each
c:0005 p:0032 s:0014 b:0014 l:000013 d:000013 METHOD crawler.rb:25
c:0004 p:0011 s:0010 b:0010 l:000009 d:000009 METHOD crawler.rb:31
c:0003 p:0063 s:0007 b:0007 l:00057c d:002338 EVAL crawler.rb:36
c:0002 p:---- s:0004 b:0004 l:000003 d:000003 FINISH
c:0001 p:0000 s:0002 b:0002 l:00057c d:00057c TOP


-- Ruby level backtrace information-----------------------------------------
crawler.rb:18:in `transform'
crawler.rb:18:in `prefeitura_noticia'
crawler.rb:26:in `block in prefeitura_noticias'
crawler.rb:25:in `each'
crawler.rb:25:in `prefeitura_noticias'
crawler.rb:31:in `run'
crawler.rb:36:in `<main>'

-- C level backtrace information -------------------------------------------
0x81239f7 ruby(rb_vm_bugreport+0x47) [0x81239f7]
0x8150363 ruby() [0x8150363]
0x81503d8 ruby(rb_bug+0x28) [0x81503d8]
0x80d33c8 ruby() [0x80d33c8]
0x276410 [0x276410]
0x2ea1ff /usr/lib/libxslt.so.1(xsltApplyStylesheet+0x2f) [0x2ea1ff]
0xc3cac3 /home/jdrowell/.rvm/gems/ruby-1.9.1-p378/gems/nokogiri-1.4.1/lib/nokogiri/nokogiri.so(+0xaac3) [0xc3cac3]
0x811345d ruby() [0x811345d]
0x8113790 ruby() [0x8113790]
0x811e8ed ruby() [0x811e8ed]
0x811856d ruby() [0x811856d]
0x811b3c6 ruby() [0x811b3c6]
0x812077a ruby(rb_yield+0x1aa) [0x812077a]
0x812e191 ruby(rb_ary_each+0x41) [0x812e191]
0x8113790 ruby() [0x8113790]
0x811e8ed ruby() [0x811e8ed]
0x811856d ruby() [0x811856d]
0x811b3c6 ruby() [0x811b3c6]
0x811b5f9 ruby(rb_iseq_eval_main+0x99) [0x811b5f9]
0x805d64f ruby(ruby_exec_node+0x9f) [0x805d64f]
0x805e9e6 ruby(ruby_run_node+0x46) [0x805e9e6]
0x805c09a ruby(main+0x5a) [0x805c09a]
0x126bd6 /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0x126bd6]
0x805bfa1 ruby() [0x805bfa1]

Please advise if any additional information would be useful. I can provide both the HTML and the XSLT file that caused the segfault. I'll continue to work on the issue and will leave more comments later.
=end

History

#1

Updated by jdrowell (John Rowell) about 9 years ago

=begin
This may be caused by character encodings. The call

 res = xslt.transform(page.search('//body'))

(where 'page' is a Mechanize instance) causes a segfault, while

 res = xslt.transform(Nokogiri::HTML(page.content, nil, page.encoding))

does not. The original page is encoded in ISO-8859-1, and Mechanize doesn't always convert text to UTF-8 (#text is converted, #content is not). Perhaps libxslt only accepts UTF-8, and Nokogiri is not converting the encoding properly before passing the text along.
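
To illustrate the encoding mismatch suspected here, the following is a minimal sketch using only core Ruby. It does not touch Nokogiri, Mechanize, or libxslt and does not reproduce the segfault; it only shows how ISO-8859-1 bytes look when mislabeled as UTF-8 versus properly transcoded. The helper `to_utf8` and the sample bytes are hypothetical, introduced for illustration.

```ruby
# Hypothetical helper (not part of Mechanize or Nokogiri): transcode raw
# bytes from a known source encoding into UTF-8.
def to_utf8(raw, source_encoding)
  raw.dup.force_encoding(source_encoding).encode(Encoding::UTF_8)
end

# Hypothetical sample: "notícia" as ISO-8859-1 bytes (0xED is "í" in Latin-1).
LATIN1_SAMPLE = [0x6E, 0x6F, 0x74, 0xED, 0x63, 0x69, 0x61].pack('C*')

# Relabeling the bytes as UTF-8 without transcoding leaves an invalid sequence.
mislabeled = LATIN1_SAMPLE.dup.force_encoding(Encoding::UTF_8)
puts mislabeled.valid_encoding?   # false

# Transcoding from the real source encoding yields valid UTF-8.
converted = to_utf8(LATIN1_SAMPLE, Encoding::ISO_8859_1)
puts converted.valid_encoding?    # true
```

If unconverted bytes like these are what reaches libxslt, that would fit the theory above, but this is an assumption rather than something the sketch demonstrates.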
=end

#2

Updated by tenderlovemaking (Aaron Patterson) about 9 years ago

=begin
On Fri, May 21, 2010 at 09:50:54AM +0900, John Rowell wrote:

> Issue #3329 has been updated by John Rowell.
>
> This may be happening due to character encodings. A
>
> res = xslt.transform(page.search('//body'))
>
> (where 'page' is a Mechanize instance) causes a segfault, while a
>
> res = xslt.transform(Nokogiri::HTML(page.content, nil, page.encoding))
>
> does not. The original page is encoded with ISO-8859-1, and Mechanize doesn't always convert text to UTF-8 (#text is converted, #content is not). Maybe libxslt only accepts UTF-8 and Nokogiri is not properly converting the encodings before sending the text.

This sounds like it may be a bug in Nokogiri and not Ruby. Can you
please add a ticket to our tracker here:

http://github.com/tenderlove/nokogiri/issues

Also, if you provide the output of nokogiri -v and a script to
reproduce the problem, that would be extremely helpful. Thanks!

--
Aaron Patterson
http://tenderlovemaking.com/

=end

#3

Updated by jeremyevans0 (Jeremy Evans) 5 days ago

  • Status changed from Open to Closed
  • Description updated (diff)
