Bug #3686


Error in parsing musicbrainz.org with rexml

Added by vinc-mai (Vincent Carmona) almost 12 years ago. Updated about 11 years ago.

Status:
Closed
Priority:
Normal
Target version:
ruby -v:
ruby 1.9.1p378 (2010-01-10 revision 26273) [i486-linux]
Backport:
[ruby-core:31693]

Description

=begin
REXML (Ruby 1.9.1) fails to parse the page returned by this URL: http://musicbrainz.org/show/puid/?puid=c6a6717f-6d88-4d0e-4c57-d6b949118072

require 'net/http'
require 'rexml/document'

url='http://musicbrainz.org/show/puid/?puid=c6a6717f-6d88-4d0e-4c57-d6b949118072'
res=Net::HTTP.get_response(URI.parse(url))
doc=REXML::Document.new(res.body)

/usr/lib/ruby/1.9.1/rexml/parsers/treeparser.rb:95:in `rescue in parse': #<RuntimeError: Undeclared entity '&raquo;' in raw string "Skip to main content &raquo;"> (REXML::ParseException)
/usr/lib/ruby/1.9.1/rexml/text.rb:165:in `block in check'
/usr/lib/ruby/1.9.1/rexml/text.rb:153:in `scan'
/usr/lib/ruby/1.9.1/rexml/text.rb:153:in `check'
/usr/lib/ruby/1.9.1/rexml/text.rb:125:in `parent='
/usr/lib/ruby/1.9.1/rexml/parent.rb:19:in `add'
/usr/lib/ruby/1.9.1/rexml/parsers/treeparser.rb:45:in `parse'
/usr/lib/ruby/1.9.1/rexml/document.rb:228:in `build'
/usr/lib/ruby/1.9.1/rexml/document.rb:43:in `initialize'

$ ruby1.9.1 --version
ruby 1.9.1p378 (2010-01-10 revision 26273) [i486-linux]

Ruby 1.8.7 parses the same data without raising an error.
=end
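
The exception names `&raquo;`, which is an HTML named entity. XML itself predefines only `&amp;`, `&lt;`, `&gt;`, `&quot;` and `&apos;`, so a strict XML parser can reject any other named entity unless the document declares it. One possible workaround, not taken from this report, is to rewrite such entities as numeric character references before handing the body to REXML. The sketch below assumes a hypothetical helper `replace_html_entities` and a deliberately small entity table; neither is part of REXML, and a real page may use entities that are not listed here.

```ruby
require 'rexml/document'

# Hypothetical lookup table (an assumption for illustration, not part of
# REXML): a few HTML named entities and their Unicode code points.
HTML_ENTITIES = {
  'raquo' => 187,
  'laquo' => 171,
  'nbsp'  => 160,
  'copy'  => 169
}.freeze

# Hypothetical helper: replaces known HTML named entities with numeric
# character references. Names missing from the table, including the five
# entities predefined by XML, are left untouched.
def replace_html_entities(source)
  source.gsub(/&([A-Za-z][A-Za-z0-9]*);/) do |match|
    code = HTML_ENTITIES[Regexp.last_match(1)]
    code ? format('&#%d;', code) : match
  end
end

# Minimal stand-in for the fragment quoted in the exception message.
html = '<p>Skip to main content &raquo;</p>'
doc = REXML::Document.new(replace_html_entities(html))
puts doc.root.text
```

In the script from the description, the same helper would wrap `res.body` before the call to `REXML::Document.new`. This only addresses undeclared entities; a page that is not well-formed XML in other ways would still fail to parse.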
