Bug 5992 - Validator ignores HTML5 encoding declaration
Status: REOPENED
Product: Validator
Component: check
Version: HEAD
Platform: All All
Importance: P2 normal with 3 votes
Target Milestone: 0.8.6
Assigned To: This bug has no owner yet - up for the taking
QA Contact: qa-dev tracking
URL: http://htmlex.met.cz/
Reported: 2008-08-26 06:32 UTC by Martin Hassman
Modified: 2010-06-14 06:51 UTC (History)
Description Martin Hassman 2008-08-26 06:32:00 UTC
Seems validator ignores short version of encoding declaration:
<meta charset="utf-8">

Validation of page http://htmlex.met.cz/ gives me 1 warning "No Character
Encoding Found! Falling back to UTF-8." Validation with
http://html5.validator.nu/ tool gives no warning.
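
For reference, the two declaration forms at issue (the short HTML5
<meta charset> and the older http-equiv form) can be told apart with a small
parser. This is a minimal Python sketch, not the validator's actual Perl code:
the names MetaCharsetFinder and declared_charset are hypothetical, and it
assumes the markup has already been decoded to text.

```python
from html.parser import HTMLParser
import re

# Pulls the charset parameter out of a Content-Type style value.
CONTENT_CHARSET = re.compile(r"""charset\s*=\s*["']?([\w.:-]+)""", re.IGNORECASE)


class MetaCharsetFinder(HTMLParser):
    """Hypothetical helper: records the first encoding declared in a <meta>."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        # HTMLParser lowercases attribute names; values may be None.
        a = {name: (value or "") for name, value in attrs}
        if a.get("charset"):
            # Short HTML5 form: <meta charset="utf-8">
            self.charset = a["charset"].strip().lower()
        elif a.get("http-equiv", "").lower() == "content-type":
            # Legacy form: <meta http-equiv="Content-Type" content="...; charset=...">
            m = CONTENT_CHARSET.search(a.get("content", ""))
            if m:
                self.charset = m.group(1).lower()


def declared_charset(markup):
    """Return the declared encoding name, or None if no <meta> declares one."""
    finder = MetaCharsetFinder()
    finder.feed(markup)
    return finder.charset
```

A detector that handles only the second branch would report no encoding for a
page that uses just the short form, which is consistent with the warning
quoted above.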

It looks like the problem occurs only with "Validate by URI" and "Validate by
File Upload". "Validate by Direct Input" produces no warning.
Comment 1 Patrick Bielen 2009-03-19 13:55:12 UTC
(In reply to comment #0)
> Seems validator ignores short version of encoding declaration:
> <meta charset="utf-8">

Indeed, agreed: something is not right in the validator;
I get the same problem.

Best Regards,

Patrick
Comment 2 Ville Skyttä 2009-03-19 23:07:31 UTC
The problem is in the HTML::Encoding perl module used by the validator. 
There's a bug report open about it at
https://rt.cpan.org/Ticket/Display.html?id=42497
Comment 3 Dean Edridge 2009-03-20 12:53:43 UTC
(In reply to comment #2)
> The problem is in the HTML::Encoding perl module used by the validator. 
> There's a bug report open about it at
> https://rt.cpan.org/Ticket/Display.html?id=42497
> 

I can't see how that can be the problem. There may well be a problem with the
HTML::Encoding module, but that shouldn't affect (X)HTML5 validation. AFAICT,
the W3C's part of the markup validator shouldn't even see the meta charset
(<meta charset="utf-8">) part of the webpage. As soon as the validator sees the
new HTML doctype (introduced in HTML5 (<!DOCTYPE html>)) it should pass the
whole document over to the validator.nu part of the validator for validation,
and then validator.nu, not the main W3C validator, should decide whether the
charset is correct.
Comment 4 Olivier Thereaux 2009-03-20 14:33:42 UTC
(In reply to comment #3)
> (In reply to comment #2)
> > The problem is in the HTML::Encoding perl module used by the validator. 

> I can't see how that can be the problem. 
[snip]
> as soon as the validator sees the
> new HTML doctype (introduced in HTML5 (<!DOCTYPE html>)) it should pass the
> whole document over to the validator.nu 

The validator 1) needs to know the encoding before it can preparse the document
and detect that doctype and 2) needs to know and decode the bytes before it can
pass the document to the validator.nu engine. It is not “just” a redirection. 
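
The ordering described above (determine the encoding from the raw bytes,
decode, and only then look for the doctype) can be sketched as follows. This is
an illustrative Python sketch under stated assumptions, not the validator's
Perl implementation: sniff_encoding and preparse are hypothetical names, the
1024-byte scan window and the UTF-8 default are assumptions, and the byte-level
scan only works for ASCII-compatible encodings (UTF-16 is handled only via its
byte order mark).

```python
import codecs
import re

# Byte order marks checked before any markup is inspected.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Matches both <meta charset="..."> and the charset= parameter inside
# <meta http-equiv="Content-Type" content="...; charset=...">.
META_CHARSET = re.compile(
    rb"""<meta[^>]*?charset\s*=\s*["']?([A-Za-z0-9._:-]+)""", re.IGNORECASE
)
HTML5_DOCTYPE = re.compile(r"\s*<!DOCTYPE\s+html\s*>", re.IGNORECASE)


def sniff_encoding(raw, default="utf-8"):
    """Guess the encoding from undecoded bytes (BOM first, then <meta>)."""
    for bom, name in BOMS:
        if raw.startswith(bom):
            return name
    m = META_CHARSET.search(raw[:1024])  # assumed scan window
    if m:
        name = m.group(1).decode("ascii").lower()
        try:
            codecs.lookup(name)  # ignore labels Python cannot decode
            return name
        except LookupError:
            pass
    return default


def preparse(raw):
    """Step 1: find the encoding and decode. Step 2: check for the HTML5 doctype."""
    encoding = sniff_encoding(raw)
    text = raw.decode(encoding, errors="replace").lstrip("\ufeff")
    is_html5 = HTML5_DOCTYPE.match(text) is not None
    return encoding, text, is_html5
```

Only after preparse has returned decoded text is there anything that could be
handed to an HTML5 backend, which is why an encoding detector that misses
<meta charset> affects HTML5 documents too.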
Comment 5 Dean Edridge 2009-03-22 09:35:24 UTC
(In reply to comment #4)
> (In reply to comment #3)
> > (In reply to comment #2)
> > > The problem is in the HTML::Encoding perl module used by the validator. 
> 
> > I can't see how that can be the problem. 
> [snip]
> > as soon as the validator sees the
> > new HTML doctype (introduced in HTML5 (<!DOCTYPE html>)) it should pass the
> > whole document over to the validator.nu 
> 
> The validator 1) needs to know the encoding before it can preparse the document
> and detect that doctype and 2) needs to know and decode the bytes before it can
> pass the document to the validator.nu engine. It is not “just” a
> redirection. 
> 

I think problems like this are going to be never-ending, therefore I think the
W3C should use validator.nu as the "front end" of its validation
service. Has this been considered before?
Comment 6 Olivier Thereaux 2009-03-22 22:15:21 UTC
(In reply to comment #5)
> I think problems like this are going to be never-ending, therefore I think the
> W3C should use validator.nu as the "front end" of its validation
> service. Has this been considered before?

This is getting a little OT and would probably be best on the validator list,
but yes, this has been considered. 

The validator.nu engine is a wonderful piece of software, in many ways superior
to the other engines which validator.w3.org uses. However, IMHO validator.nu is
neither stable enough (see e.g.
http://lists.w3.org/Archives/Public/www-validator/2009Mar/0037.html ) nor
flexible enough (limited number of profiles, no DTD support for legacy HTML,
etc.) nor usable enough (bare-bones UI and limited message explanations, no file
upload, no direct input, etc.) to simply "be" the sole, front-facing engine on
validator.w3.org. 

I am quite certain that at this point, having validator.w3.org be a frontend
for multiple engines, including OpenSP for DTD and validator.nu for html5 and
other applications, is the most desirable architecture.
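
As a rough illustration of that front-end architecture, the dispatch step
might look like the Python sketch below. It is not taken from the validator
source: choose_engine is a hypothetical name, the engine labels are plain
strings standing in for real backends, and sending documents without a
recognised doctype to the DTD engine is an assumption made for the example.

```python
import re

HTML5_DOCTYPE = re.compile(r"\s*<!DOCTYPE\s+html\s*>", re.IGNORECASE)
DTD_DOCTYPE = re.compile(r"\s*<!DOCTYPE\s+\w+\s+(PUBLIC|SYSTEM)\b", re.IGNORECASE)


def choose_engine(text):
    """Pick a backend label for an already-decoded document (hypothetical)."""
    if HTML5_DOCTYPE.match(text):
        return "validator.nu"  # HTML5 documents
    if DTD_DOCTYPE.match(text):
        return "OpenSP"  # DTD-based validation of legacy HTML
    # Assumption for this sketch: no recognisable doctype falls back to
    # the DTD engine rather than being rejected.
    return "OpenSP"
```

The function takes decoded text, so it can only run after the encoding has
been determined.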
Comment 7 Oli Studholme 2009-05-07 03:13:21 UTC
For what it
Comment 8 Ville Skyttä 2009-09-21 18:56:01 UTC
*** Bug 7135 has been marked as a duplicate of this bug. ***
Comment 9 Jill Ramonsky 2009-10-16 08:22:38 UTC
This one is biting me too. Nothing to add, except I'd like to see it fixed
soon.
Comment 10 Thomas Traub 2009-12-05 23:55:59 UTC
I encountered the same issue for http://usesthis.com/
Comment 11 Michael[tm] Smith 2009-12-08 09:16:48 UTC
Ville has a new Validator release queued up to deploy, and I think it may
contain a fix for this issue. I'll check with him and see.
Comment 12 Ville Skyttä 2009-12-10 19:01:40 UTC
There is no fix for this issue yet.  I have some local prototype level code for
this which I'll revisit soon, but it has some showstopper problems (for example
it might in some cases affect validation of non-HTML5 HTML documents).  Due to
how the validator works at the moment, the fix is not trivial.
Comment 13 Ville Skyttä 2009-12-11 19:13:06 UTC
A fix is now in CVS and available for testing at
http://qa-dev.w3.org/wmvs/HEAD/ .

Something weird happens when that (and my local instance) of validator tries to
access the HTML5 validator installed locally on
http://qa-dev.w3.org:8888/html5/ when validating http://htmlex.met.cz/ .  The
error is "Insecure dependency in connect while running with -T switch" and what
makes it strange is that interfacing the very same HTML5 validator when
checking some other documents (such as the ones from comment 7 and comment 10)
works just fine.  It also works when the validator is configured to use
http://validator.nu/ as its HTML5 validator.  I have no idea how the document
to be validated could cause this (it has already been fetched locally, and is
about to be POSTed to the same HTML5 instance which works fine for other docs),
but I'll try to find out.
Comment 14 Ville Skyttä 2009-12-12 12:51:48 UTC
(In reply to comment #13)
> Something weird happens when that (and my local instance) of validator tries to
> access the HTML5 validator installed locally on
> http://qa-dev.w3.org:8888/html5/ when validating http://htmlex.met.cz/ .

A workaround (but not the underlying cause) has been found and applied; more details at
http://rt.cpan.org/Public/Bug/Display.html?id=52707
Comment 15 Ville Skyttä 2010-01-08 21:42:52 UTC
*** Bug 8678 has been marked as a duplicate of this bug. ***
Comment 16 Thomas Traub 2010-01-08 22:01:05 UTC
(In reply to comment #13)
> A fix is now in CVS and available for testing at
> http://qa-dev.w3.org/wmvs/HEAD/ .
> 
This fix works for me, thanks.
Comment 17 Ville Skyttä 2010-03-02 19:52:25 UTC
Code fixes are included in 0.8.6 but unfortunately the required
HTML::HeadParser >= 3.60 module is not installed on the production
validator.w3.org boxes yet.
Comment 18 Ted Guild 2010-03-03 04:03:24 UTC
(In reply to comment #17)
> Code fixes are included in 0.8.6 but unfortunately the required
> HTML::HeadParser >= 3.60 module is not installed on the production
> validator.w3.org boxes yet.

Installed now, sorry for the inconvenience.
Comment 19 Ville Skyttä 2010-03-03 17:17:17 UTC
Thanks, closing.
Comment 20 Sasha Vodnik 2010-06-03 23:46:58 UTC
I just ran into this bug on the production site:
http://validator.w3.org/#validate_by_upload
The validator didn't see my file's <!DOCTYPE html>.
I verified that my code validates at 
http://qa-dev.w3.org/wmvs/HEAD/#validate_by_upload
Is it possible that this bug is fixed for the URI case, but not for uploads?

(In reply to comment #18)
> (In reply to comment #17)
> > Code fixes are included in 0.8.6 but unfortunately the required
> > HTML::HeadParser >= 3.60 module is not installed on the production
> > validator.w3.org boxes yet.
> 
> Installed now, sorry for the inconvenience.
Comment 21 Michael[tm] Smith 2010-06-14 06:51:26 UTC
I changed the category on this because this is not a bug in the validator.nu
HTML5-checking backend but instead relates to the Perl code