Bug 6329 - Implement XML::LibXML Structured Errors
Status: RESOLVED FIXED
Product: Validator
Component: check
Version: HEAD
Hardware: All All
Importance: P2 normal
Target Milestone: ---
Assigned To: Olivier Thereaux
QA Contact: qa-dev tracking
URL: http://deeden.co.uk/misc/quantum.html
Reported: 2008-12-22 14:14 UTC by Steve Rushe
Modified: 2009-03-13 14:59 UTC

Description Steve Rushe 2008-12-22 14:14:40 UTC
I've noticed that some pages are reported as valid at one point in time and as
invalid later, even though nothing in the page has changed.

This occurs with 3 or 4 pages I'm testing, all of which are presumably invalid
because of a common error (no space before a class=). I've reduced this to a
test case which reproduces the bug I'm seeing
(http://deeden.co.uk/misc/quantum.html). There is no space before class="hello"
on line 10, and this is the error the validator reports when it flags the page
as invalid.

As I write this, the page is being reported as invalid; however, after a while
(whether minutes or hours) it will be reported as valid. If I retry the page a
few times it will keep saying it is valid, until it eventually starts reporting
it as invalid again, just as consistently.

I've checked that the headers sent for the page are the same during both the
valid and invalid periods, so the headers are not the cause.

I see this behaviour both through the web interface and through the
WebService::Validator::HTML::W3C Perl module.
Comment 1 Olivier Thereaux 2008-12-22 14:26:39 UTC
(In reply to comment #0)
> I've noticed that some pages are reporting valid at one point in time and
> invalid later despite nothing in the page having changed.

I find this surprising. One way to debug this would be to check 
Comment 2 Steve Rushe 2008-12-22 14:54:01 UTC
(In reply to comment #1)
> I find this surprising. One way to debug this would be to check 
Comment 3 Olivier Thereaux 2008-12-22 15:08:37 UTC
(In reply to comment #2)
> It was the same here until just now when I checked. At the moment, it's
> reporting as valid for me.

Ack. It's not one validator being inconsistent, it's two servers acting
differently:
http://128.30.52.13/check?uri=http%3A%2F%2Fdeeden.co.uk%2Fmisc%2Fquantum.html&debug

http://128.30.52.49/check?uri=http%3A%2F%2Fdeeden.co.uk%2Fmisc%2Fquantum.html&debug

It looks like these two servers are using different versions of the XML
libraries, but the differences in results are disturbing. Will look into that.
Comment 4 Steve Rushe 2008-12-22 15:38:16 UTC
(In reply to comment #3)
> 
> Ack. It's not one validator being inconsistent, it's two servers acting
> differently:
> http://128.30.52.13/check?uri=http%3A%2F%2Fdeeden.co.uk%2Fmisc%2Fquantum.html&debug
> 
> http://128.30.52.49/check?uri=http%3A%2F%2Fdeeden.co.uk%2Fmisc%2Fquantum.html&debug
> 
> It looks like these two servers are using different versions of the XML
> libraries, but the differences in results are disturbing. Will look into that.

Cheers for that Olivier. I'm relieved that it's not something I did.
Comment 5 Olivier Thereaux 2008-12-22 16:39:43 UTC
I am finding incompatibilities between libxml2 and XML::LibXML, two lower-level
libraries used by the validator, but only for certain versions. This is very
puzzling, to say the least.

Below is the script I used to test on various machines, followed by a number of
results. The results saying "attributes construct error" are the proper,
expected ones.

I will try to contact the maintainer(s) of XML::LibXML, in the hope that they
can help.


#!/usr/bin/perl
use 5.008; use strict; use warnings; use utf8; use XML::LibXML qw();
my $dotted=XML::LibXML::LIBXML_DOTTED_VERSION;
print "XML::LibXML Version: $XML::LibXML::VERSION\nlibxml2 Version: $dotted\n\n";
XML::LibXML->new()->parse_string('<foo attr1="value1"attr2="value2" />'); # no space between attributes: not well-formed, so this dies



XML::LibXML Version: 1.66
libxml2 Version: 2.6.16

:1: parser error : attributes construct error
<foo attr1="value1"attr2="value2" />
                   ^
:1: parser error : Couldn't find end of Start Tag foo line 1
<foo attr1="value1"attr2="value2" />
                   ^
:1: parser error : Extra content at the end of the document
<foo attr1="value1"attr2="value2" />
                   ^ at testlibxml.pl line 5


**************************************************************************

XML::LibXML Version: 1.68
libxml2 Version: 2.6.16

:1: parser error : attributes construct error
<foo attr1="value1"attr2="value2" />
                   ^
:1: parser error : Couldn't find end of Start Tag foo line 1
<foo attr1="value1"attr2="value2" />
                   ^
:1: parser error : Extra content at the end of the document
<foo attr1="value1"attr2="value2" />
                   ^ at testlibxml.pl line 5



**************************************************************************

XML::LibXML Version: 1.69
libxml2 Version: 2.6.16

:1: parser error : attributes construct error
<foo attr1="value1"attr2="value2" />
                   ^
:1: parser error : Couldn't find end of Start Tag foo line 1
<foo attr1="value1"attr2="value2" />
                   ^
:1: parser error : Extra content at the end of the document
<foo attr1="value1"attr2="value2" />
                   ^ at testlibxml.pl line 5



**************************************************************************

XML::LibXML Version: 1.66
libxml2 Version: 2.6.32

:1: parser error : attributes construct error
<foo attr1="value1"attr2="value2" />
                   ^
:1: parser error : Couldn't find end of Start Tag foo line 1
<foo attr1="value1"attr2="value2" />
                   ^
:1: parser error : Extra content at the end of the document
<foo attr1="value1"attr2="value2" />
                   ^ at testlibxml.pl line 5


**************************************************************************



XML::LibXML Version: 1.69
libxml2 Version: 2.6.32

:1: parser error : Extra content at the end of the document


**************************************************************************

XML::LibXML Version: 1.63
libxml2 Version: 2.6.29

:1: parser error : attributes construct error
<foo attr1="value1"attr2="value2" />
                   ^
:1: parser error : Couldn't find end of Start Tag foo line 1
<foo attr1="value1"attr2="value2" />
                   ^
:1: parser error : Extra content at the end of the document
<foo attr1="value1"attr2="value2" />
                   ^ at testlibxml.pl line 5


**************************************************************************

XML::LibXML Version: 1.68
libxml2 Version: 2.6.27

:1: parser error : Extra content at the end of the document



**************************************************************************
Comment 6 Olivier Thereaux 2008-12-22 16:40:14 UTC
(In reply to comment #4)
> Cheers for that Olivier. I'm relieved that it's not something I did.

No problem Steve, and many thanks for the report.
Comment 7 Olivier Thereaux 2008-12-22 16:52:08 UTC
For the time being I have downgraded XML::LibXML (now using 1.66, which seems
to work better), and the two validator.w3.org servers are producing the proper
(and consistent) output.

Will keep this bug open until we have a satisfying resolution of the library
problem, not just this workaround.
Comment 8 Olivier Thereaux 2008-12-22 19:37:30 UTC
I think I found the culprit in the Changelog for XML::LibXML. In recent
versions, there is a new module to use the Structured Errors API (great!) but
it's not quite backward compatible.

Will have to add code to handle the Structured Errors. 
http://search.cpan.org/~pajas/XML-LibXML-1.69/lib/XML/LibXML/Error.pod
Comment 9 Olivier Thereaux 2008-12-31 20:36:49 UTC
(In reply to comment #8)
> I think I found the culprit in the Changelog for XML::LibXML. In recent
> versions, there is a new module to use the Structured Errors API (great!) but
> it's not quite backward compatible.
> 
> Will have to add code to handle the Structured Errors. 
> http://search.cpan.org/~pajas/XML-LibXML-1.69/lib/XML/LibXML/Error.pod

The frustrating part so far is that the new structured errors code only gives
you the last parsing error (when, if anything, I would be happy enough showing
only the first!)

Using this code, from the perl module documentation:

use 5.008; use strict; use warnings; use utf8; use XML::LibXML qw();
my $dotted=XML::LibXML::LIBXML_DOTTED_VERSION;
print "XML::LibXML Version: $XML::LibXML::VERSION\nlibxml2 Version: $dotted\n\n";

eval {XML::LibXML->new()->parse_string('<foo attr1="value1"attr2="value2" />')};
if (ref($@)) {
  # handle a structured error (XML::LibXML::Error object)
  print $@->dump();
} elsif ($@) {
  # error, but not an XML::LibXML::Error object
  print $@;
} else {
  # no error
}


ot@qa:~$ perl testlibxml.pl 
XML::LibXML Version: 1.66
libxml2 Version: 2.6.32

:1: parser error : attributes construct error
<foo attr1="value1"attr2="value2" />
                   ^
:1: parser error : Couldn't find end of Start Tag foo line 1
<foo attr1="value1"attr2="value2" />
                   ^
:1: parser error : Extra content at the end of the document
<foo attr1="value1"attr2="value2" />
                   ^ at testlibxml.pl line 6


ot@qa:~$ perl testlibxml.pl 
XML::LibXML Version: 1.69
libxml2 Version: 2.6.32

$error = bless( {
                  'num1' => 0,
                  'file' => '',
                  'message' => 'Extra content at the end of the document
',
                  'domain' => 1,
                  'level' => 3,
                  'str2' => undef,
                  '_prev' => undef,
                  'str1' => undef,
                  'str3' => undef,
                  'num2' => 11,
                  'code' => 5,
                  'line' => 1
                }, 'XML::LibXML::Error' );

Maybe I'm doing it wrong? The documentation is scarce; I might need to contact
the developer(s) to get clearer answers.
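As a sketch of reporting every error instead of only the last one, the helpers
below walk the '_prev' links of the error object, the same internal field that
appears in the dump above. This relies on an internal field rather than a
documented method, and assumes each error's '_prev' points at the error that
was reported just before it; the names collect_errors and format_error are
hypothetical. With 1.69, as the dump shows, '_prev' is undef, so only one error
would come back.

```perl
use strict;
use warnings;

# Hypothetical helper: follow the internal '_prev' links of an
# XML::LibXML::Error chain and return the errors oldest-first, so the
# first well-formedness error comes first. Assumes '_prev' links to the
# previously reported error (undef for the first one).
sub collect_errors {
    my ($err) = @_;
    my @chain;
    while (ref $err) {
        push @chain, $err;       # newest error is visited first
        $err = $err->{_prev};
    }
    return reverse @chain;       # call in list context
}

# Hypothetical helper: render one error as "line N: message". libxml2
# messages end with a newline (see 'message' in the dump), so chomp it.
sub format_error {
    my ($err) = @_;
    my $msg  = defined $err->{message} ? $err->{message} : '';
    my $line = defined $err->{line}    ? $err->{line}    : '?';
    chomp $msg;
    return "line $line: $msg";
}

# Possible use, once an XML::LibXML with working error chaining is installed:
#   eval { XML::LibXML->new()->parse_string($xml) };
#   if (ref $@) {
#       print format_error($_), "\n" for collect_errors($@);
#   }
```

Returning the chain oldest-first matches the wish above to show the first
error rather than the last.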
Comment 10 Olivier Thereaux 2009-01-07 22:06:45 UTC
Followup archived here:
http://lists.w3.org/Archives/Public/public-qa-dev/2009Jan/0003.html

Hoping Petr will have time to respond.
Comment 11 Olivier Thereaux 2009-01-22 14:34:57 UTC
Got a prompt reply from Petr:
http://lists.w3.org/Archives/Public/public-qa-dev/2009Jan/0004.html
and now waiting for the release of version 1.70 of XML::LibXML
http://search.cpan.org/dist/XML-LibXML/

Things are going to be tricky then, because any system with a version of
XML::LibXML between 1.67 and 1.69 (inclusive) will have slightly wrong error
reporting for XML well-formedness (xml-wf) issues. I'm wondering whether this
would be acceptable, or whether to require >= 1.70, which may then be a burden.

To be continued...
Comment 12 Olivier Thereaux 2009-02-02 23:20:42 UTC
*** Bug 4420 has been marked as a duplicate of this bug. ***
Comment 13 Olivier Thereaux 2009-02-05 17:02:47 UTC
Ah-ha! 

XML-LibXML-1.69_1 is out as a developer release. It should be enough to start
work on implementing the structured errors in the validator.
http://cpansearch.perl.org/src/PAJAS/XML-LibXML-1.69_1/Changes
Comment 14 Olivier Thereaux 2009-02-13 15:05:39 UTC
Implementation done, ready for next release:
http://lists.w3.org/Archives/Public/www-validator-cvs/2009Feb/0092.html
and
http://lists.w3.org/Archives/Public/www-validator-cvs/2009Feb/0136.html

Note that the code above relies on a developer version of XML::LibXML, which
might cause a bit of trouble for people running their own instance. But with
time, we'll be fine.
Comment 15 Olivier Thereaux 2009-03-13 14:59:08 UTC
Here is a one-liner to test which version of XML::LibXML your system has:

perl -MXML::LibXML -e 'print "XML::LibXML Version: $XML::LibXML::VERSION\n";'


If you have a version > 1.66 and < 1.70, I suggest heading to CPAN and
installing the latest version:
http://search.cpan.org/dist/XML-LibXML/
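The same rule (newer than 1.66 and older than 1.70 means upgrade) can also be
checked from inside a script. A minimal sketch, assuming a plain decimal
version string such as "1.69"; needs_upgrade is a hypothetical helper, not part
of XML::LibXML, and it simply drops any developer suffix such as "_1" before
comparing.

```perl
use strict;
use warnings;

# Hypothetical helper: true when an XML::LibXML version string falls in
# the range flagged in this bug (> 1.66 and < 1.70). Any developer
# suffix ("_1") is stripped and the base number is compared numerically.
sub needs_upgrade {
    my ($version) = @_;
    (my $base = $version) =~ s/_.*//;
    return $base > 1.66 && $base < 1.70;
}

# Possible use on a machine with XML::LibXML installed:
#   require XML::LibXML;
#   print "Please upgrade XML::LibXML from CPAN\n"
#       if needs_upgrade($XML::LibXML::VERSION);
```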