Technology...

WWW2006 Podcast 3

September 2006 PAC


www2006 Edinburgh Conference

Podcast 3 of 4: Technical and W3C tracks


Opening Montage

STING

INSERT: i Shawn Henry

"With web accessibility we wanted to take advantage of all the new technologies and developing technologies and help make those improve accessibility."

STING

INSERT: ii Harald Weinreich

"We were also surprised how quickly they interacted with the browser, that they stayed for less than 12 seconds on more than 50% of the pages."

STING

INSERT: iii Ziv Bar-Yossef

"I felt that in the audience there were some people who really knew what was going on inside.  They maybe felt that these things have been known for a while, but they were not published, because these were trade secrets."


So, what did the audience know? What sort of web pages kept people interested for a whole 12 seconds? And what about accessibility and new technologies? Research and standards from the 15th International World Wide Web Conference in Edinburgh.

SIGNATURE TUNE

Hello. 

STING

I'm Peter Croasdale. And welcome to this series of conference podcasts.

In the last episode we looked at some of the work that's going on to support researchers across the globe. In this one, we're actually going to look at the work that scientists, engineers and other researchers have been doing on the workings of the Internet itself.

Research Papers



The conference is, in reality, an academic one. And as we heard in the first podcast, there were over 700 papers submitted - of which just about 10% were eventually accepted.

Also in that podcast I chatted to Carol Goble, from Manchester University - who was the co-chair of the technical track. She outlined the broad themes of the research going on - what she described as the "People powered web" - and - "People using the web".

Delving into more details of the technical track, she touched on both.


INSERT: 1 Carol Goble

"In the Semantic Web track there's been a strong emphasis on practicality: of really embracing this whole collective intelligence of the community.  So we've seen work on semantic Wikipedias.  How can we actually use semantic technologies in order to be able to ramp up our added value to these group or community activities.  And the same with social tagging.  How can we exploit people allocating random keywords to their Flickr libraries of photographs in order to be able to really get some value out of that for other applications.  So I think those are very interesting papers.  And I think that they are really what might have been perceived as the scruffy end of the semantic web, but I believe that they're really important.  Today the link spam detection session was completely packed out.  You couldn't get in the door even if you wanted to, because people are really...  Their whole information lives - their personal information activities, their business information activities, research and academic information activities - are so wrapped up in search engines that anything that begins to muck about with the effectiveness of search engines, like link spamming, really screws you up."


Now, Ziv Bar-Yossef, from Technion - Israel's Institute of Technology - was the chair for that packed search session. 

There were three main speakers: two academics and one from industry. Two of them looked at the ability to detect and remove spam links and link alliances from search engines, and the third looked at how to spot spam web pages themselves. But the research wasn't the only thing that Ziv found interesting.


INSERT: 2 Ziv Bar-Yossef

"What I found interesting about the whole session was that there was some hidden interaction between the audience and the speakers.  The speakers are people who are not in the search engines themselves; they are outsiders, they are in academia, or industrial labs.  They are not really the ones, the engineers, who designed and work on search engines.  So they gave their own interpretation of how you can fight spam.  I felt that within the audience there were some people who really knew what was going on inside.  They maybe felt that these things have been known for a while but they have not been published because they're trade secrets.  So, there was some kind of tension, I felt, especially in the question session.

Croasdale: And why do you think that this is an area that attracted such a large audience?  Why is it so important? 

Bar-Yossef: This is probably one of the most burning issues today in search.  This is just a matter of the money.  People know that if they are ranked highly, if they are one of the top 10 results in various queries, they have the potential to attract a lot of traffic to their web sites.  So, there is a lot of incentive for companies to increase their rankings in the major search engines.  It has been around forever, since the web basically began.  But spammers became very sophisticated in the last two or three years.  They really understand how search engines work.  And they do use that in order to create more and more sophisticated spam strategies.  So really it is one of the main issues facing search engines and the research community today."
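The link alliances that session was tackling are groups of sites that densely interlink purely to inflate each other's rank. As a purely illustrative sketch - not the method from any of the session's papers - one crude signal is reciprocity: a page whose out-links are almost all returned is behaving more like a link farm than an honest site.

```python
# Crude link-alliance signal: the fraction of a page's out-links that are
# reciprocated. Tight link farms return nearly every link they receive.
# Illustrative only -- real spam detection uses far richer graph features.

def reciprocity(graph, page):
    """Fraction of `page`'s out-links whose target links back to it."""
    out = graph.get(page, set())
    if not out:
        return 0.0
    mutual = sum(1 for target in out if page in graph.get(target, set()))
    return mutual / len(out)

def flag_alliances(graph, threshold=0.8):
    """Pages whose out-links are almost all reciprocated."""
    return {p for p in graph if reciprocity(graph, p) >= threshold}

# A three-page farm linking both ways in a triangle, plus honest pages.
farm = {
    "spam1": {"spam2", "spam3"},
    "spam2": {"spam1", "spam3"},
    "spam3": {"spam1", "spam2"},
    "honest": {"spam1", "news"},
    "news": set(),
}
```

On this toy graph, `flag_alliances(farm)` picks out exactly the three farm pages; the honest page links out without being linked back, so its reciprocity is zero.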

Search also appeared in the awards for the best papers - and Carol also played a role in awarding those prizes. But it was a tricky job sorting out the eventual winners - especially when the titles were so gripping. But - hey - these are academic papers...!

INSERT: 3 Carol Goble

""Random sampling from a search engine's index" - now I can tell that you are excited by that already.  Which is, how to sample random pages from a search engine's index using only the search engine's public interface.  And you think, "oooh, that's boring!".  But it is actually for creating objective benchmarks for search engines.  And this is an open problem.  And this paper actually takes the standard work in the area, as it were, which has two flaws in it, and addresses those flaws, and fixes them."

And the chap who did that fixing was none other than Ziv Bar-Yossef.

So, the problem is this: if search engines don't want to show you their entire index, how can you tell how good any one search engine is over another? Well, if you could randomly sample pages from that index - just like a poll or a survey of people - you could then extrapolate from those sample pages what you needed to know.

The trouble is working out a way of getting mathematically random pages. I thought just typing in 'Random' might do the trick - but you actually need a few more words to query the database than that. In fact - listen carefully to exactly how many extra queries you need to do.


INSERT: 4 Ziv Bar-Yossef

"So the idea we had is the following: we collected a large pool of possible queries - we collected a pool of half a billion phrases of length five - and then we picked one of these phrases at random and submitted it to each one of the search engines, and we got back the results.  Now if you pick one of the results randomly, this is more or less a random document from the index.  There are certain statistical procedures that you have to apply over these samples in order to make them really random.  The idea has been around for several years, and our innovation is in these statistical procedures to make the sample truly random.  Which is important in order to get accurate estimates, accurate evaluations of the search engines.

Croasdale: So I suppose the critical question next, is which search engine is best?

Bar-Yossef: So, I don't have a straight answer, and I don't want to be part of these wars among search engines.  The only thing I can tell you is that we checked relative sizes among search engines.  Which index is bigger?  And it turned out that Yahoo is slightly bigger than Google, and Google is slightly bigger than MSN Search today.  But I should say that these things are changing all the time.  So one day Yahoo is bigger, and another day Google is bigger.  It is a very dynamic area.

Croasdale: Do you think there is an opportunity for your work to lead into some site that says, almost like a weather forecast, the best search engine at the moment is Google; tomorrow it might be Yahoo.  Or, if you're doing scientific work I would choose this one; if you're looking for financial data I'd use that one.

Bar-Yossef: That is the vision.  We want this to be a starting point of a new methodology, an objective methodology for evaluating and comparing search engines."
Ziv Bar-Yossef
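The "statistical procedures" Ziv alludes to can be illustrated with rejection sampling - a simplified sketch of the idea, not the paper's full method, and every name below is invented for the illustration. A document that matches many pool phrases, or sits in a short result list, gets reached too often; accepting a pick with a compensating probability undoes that bias.

```python
import random

def sample_document(pool, search, num_matches, max_results, rng=random):
    """One near-uniform draw from a search index seen only via queries.

    Simplified rejection-sampling sketch (illustrative names throughout):
      pool           -- list of candidate query phrases
      search(q)      -- the engine's public interface: query -> result list
      num_matches(d) -- how many pool queries document d matches
      max_results    -- an upper bound on result-list length
    """
    while True:
        query = rng.choice(pool)           # random phrase from the pool
        results = search(query)
        if not results:
            continue
        doc = rng.choice(results)          # random hit for that phrase
        # Undo the bias: a doc reached via a short result list, or one that
        # matches many pool queries, is over-represented; thin it out so
        # every document's overall acceptance probability is equal.
        accept = len(results) / (num_matches(doc) * max_results)
        if rng.random() < accept:
            return doc

# Toy index with a known bias: "d2" matches two pool phrases, so a naive
# pick would return it about half the time instead of a third.
index = {"a": ["d1", "d2"], "b": ["d2"], "c": ["d3"]}
matches = {"d1": 1, "d2": 2, "d3": 1}

rng = random.Random(0)
draws = [sample_document(list(index), index.__getitem__,
                         matches.__getitem__, max_results=2, rng=rng)
         for _ in range(4000)]
```

Over the 4,000 draws each of the three documents comes back close to a third of the time, which is exactly the "truly random" property the benchmark needs.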

STING

So, Ziv thought up half a billion random queries - but struggled for an exciting title for his paper. Now a paper that did catch my eye because of its title was "XML Screamer": which I thought might be a teen-hacker-horror-rom-com type thing.

It was all about guts - but - the guts of your systems.

Basically, because we're slapping XML on everything - certainly in the commercial world - there are now racks of servers with as much get up and go as your average zombie. Eric Perkins, from IBM, decided that enough was enough - it was time to sharpen that XML parser.

INSERT: 5 Eric Perkins

"A focus of our effort to get it moving at a sort of normal pace was to view the task in a very performance-orientated, very classical approach to network protocols and that sort of thing.  Where we were really watching every kind of cycle, as it were.  We wanted to account for every piece of the performance loss that we might see.  In the rush to add features we often do a very sloppy job of accounting for the amount of performance gain or loss that we see in a particular technology.  So you chalk it up to "oh, well, it's text-based so of course it's going to be a little bit slower" and then that bleeds into "of course it's going to be a lot slower" and then it just falls all over the floor.  And we wanted to be very exacting about it, and that's where we go with that.  The particular technology that we used was a compilation technology.  So we used schema definitions to drive the parser, and actually use advanced schema knowledge about what sort of documents or messages you're parsing to speed up the parse.

Croasdale: So, having put this parser on a diet and made it think very carefully about what it was doing before it did anything, what sort of improvement in performance did you start to see as a consequence of that?

Perkins: In sort of the most conservative cases, I think, you would see performance improvements on the order of twofold.  But as we expanded our viewpoint, in terms of not just looking at one thin layer at the bottom of the stack but moving up into a more useful chunk of the work that was going on between getting XML bytes off the wire and getting them into usable form - when we looked at that sort of larger chunk we were looking at improvements more on the order of 5 to 10 fold.

Croasdale: Wow! That's seriously significant.

Perkins: Right.  Like I said, it's because this infrastructure, this XML infrastructure, is so new.  And we're adding new specifications every three months.  And so in the rush to do this - all these specifications are made in layers.  You just add XML on top of Unicode; you add namespaces on top of XML; you add schema on top of that. And so you've got all these layers, and that makes sense from a specification standpoint. You can't specify a whole stack all at once.  But, from an implementation standpoint it is not going to get you performance to implement them that way.  You know, incrementally - add a new piece of glue on top of an old piece of glue.  So I think the take-home message is: push your vendors to start looking at performance and get that performance from them, because you are the customer and you can make them do it.  And we've proven that it's possible to get that performance out."
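Eric's point about schema knowledge can be shown in miniature. XML Screamer's real compiler is far more sophisticated, but the principle - when the schema guarantees the exact message shape, the parser can skip generic tokenization and slice the known landmarks directly - looks something like this toy sketch (the message format is invented for illustration):

```python
# Toy contrast between generic and schema-specialized XML parsing.
# The schema is assumed to guarantee this exact shape:
#   <order><id>...</id><qty>...</qty></order>
import xml.etree.ElementTree as ET

def parse_generic(message):
    """Generic route: tokenize, build a tree, then look fields up by tag."""
    root = ET.fromstring(message)
    return root.findtext("id"), int(root.findtext("qty"))

def parse_specialized(message):
    """Schema-aware route: element order is fixed, so scan the bytes once
    and slice the two fields out between their known delimiters."""
    i = message.index("<id>") + 4
    j = message.index("</id>", i)
    k = message.index("<qty>", j) + 5
    l = message.index("</qty>", k)
    return message[i:j], int(message[k:l])
```

Both functions return the same fields for a conforming message; the specialized one simply does far less work per byte, which is the kind of saving that, compounded across a whole stack, produced the 5-to-10-fold figures Eric describes.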

STING

The browser track had a relatively small number of papers but - thought Carol Goble - was very important: a major contributor to her "People using the web" theme.

INSERT: 6 Carol Goble

"There's a return to thinking about how people are really using the Web as it is today.  There was a lot of study and a lot of work done on browsing and browsers 10 years ago, because it was all very novel and exciting and interesting. And at the end of the 1990s people were interested in how one designed sites and how one designed browsers in order to cope with sites.  And then people became rather complacent and thought that's the way it always is - that's what a browser looks like and that's the way it is, and this is the way that people design websites - because that's the way it is. And there was a paper today, for example, in the browsing track, which was addressing: well, actually, that isn't the way it is - maybe we should go back and have a look."


And that paper - called "Off the beaten tracks: Exploring three aspects of web navigation" - was awarded the Best Student Paper of the conference.

Over the past ten years or so there has been little published work on the long-term browsing habits of a group of normal users. So the researchers decided to take 25 people and, over 4 months, record every click of their mouse on every site that they visited.

Harald Weinreich from Hamburg University was one of the authors. So, over the past 10 years or so - has much changed?

INSERT: 7 Harald Weinreich 

"The importance of certain elements of a Web browser has changed; for instance, the back button is not as frequently used any more as in former times.  And then we wanted to find out why it is like that.  So, in former times about 35% of all actions to go to an already-visited page used the back button, and now it is only about 14%.  We wanted to see why this has happened.  And so there are certain aspects, like online applications - if you have an online application you cannot go back, but you want to complete a certain task, and that has to do with several web pages. In fact they're not really documents, they are a kind of application.  And also users now very often use many windows, or these tabs - tabbed browsing - and this also is a new kind of navigation strategy that wasn't there about 10 years ago.

Croasdale: Were there any other surprises in the results that you found?

Weinreich: Yes, we were also surprised how quickly they interact with the browser - that they stay for less than 12 seconds on more than 50% of the pages.  And if you see the whole distribution it is clear that the time of attention that they spend on a page is often even much lower.

Croasdale: Do we need to increase things like selective bolding and the use of bulleted lists and those sorts of things that break up the content on the page which would allow users to quickly work out what are the key factors on this page?

Weinreich: Yes, in fact there are already several design guidelines, and experts say pages on the web have to be designed for scanning, for quickly detecting important parts - and this supports that opinion of the experts.  So we think that's quite good, because now we know that some experts are right and we should rely on what they say.

Croasdale: And where next?  What do you think the consequences of this work are going to be?  Have you provided the next benchmark that is going to last another decade?

Weinreich: We would like to motivate other people to do similar work.  And we would really like to see more work on the human interaction with the Web.  There have been a lot of technical improvements, but if we look at the Web browser - and if we look, for instance, at X Mosaic, the first very popular browser - many things have not changed very much.  The interface of the browser looks very similar.  We still have a home button that is hardly used by anyone.  The links look the same. Very often you don't know where to go - I don't really see a lot of improvement there."

Harald Weinreich.
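Findings like the 12-second figure come out of exactly the kind of click log the study recorded. As a minimal sketch - the log format here is invented for illustration, and the study's instrumentation was far richer - the dwell time on a page is simply the gap until the next navigation event:

```python
# Sketch of dwell-time analysis over a click log. Each event is an
# (timestamp_seconds, url) pair; the dwell time on a page is the gap
# until the user's next navigation. Illustrative log format only.

def dwell_times(events):
    """Seconds spent on each page except the last (it has no next click)."""
    ordered = sorted(events)
    return [(url, later - t)
            for (t, url), (later, _) in zip(ordered, ordered[1:])]

def share_under(events, threshold):
    """Fraction of page visits shorter than `threshold` seconds."""
    dwells = dwell_times(events)
    if not dwells:
        return 0.0
    return sum(1 for _, d in dwells if d < threshold) / len(dwells)

# A tiny session: three quick hops, one long read, one final quick hop.
log = [(0, "home"), (5, "news"), (8, "article"), (40, "search"), (43, "result")]
```

On this toy session, three of the four measurable visits are under 12 seconds - the same kind of share-of-short-visits statistic the study reports at scale.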

STING

W3C Track


 
A critical part of the conference is the W3C track: a forum for discussing the future direction of the standards and guidelines that all go towards a World Wide Web that we can all use. We've already touched, in the previous podcasts, on some of the key advances that are being worked on - from the Semantic Web work through to the Mobile Web Initiative.

I thought it was time to check up on one of those simple, little ideas that took time to get going, but now has a real head of steam. Bert Bos is the Style Activity lead for the W3C - and king of the Cascading Style Sheet. After 10 years, what percentage of HTML pages now use style sheets?
 

INSERT: 8 Bert Bos

"I did a little - first a quantitative answer - I did a little test last week: I did a simple crawl of the web and tested how many sites actually use it.  And I came to about 70% of all the HTML pages using style sheets now.  So I think we have shown that people like to use this.  It indeed makes making web pages easier.  You can concentrate on the content and the style separately.  You can even hire somebody else to make your stylesheets who doesn't have to know the content.  You can also apply the same style to multiple pages at the same time.  It is more efficient in bandwidth because you reuse the same styles.  People like the system and it's easy to understand, easy to learn.  And that's the way we designed it, of course.  We designed it for everybody to use, and I think we've shown that it actually works that way.

Croasdale: And where are we going with style sheets? Now we've got the browsers to start to understand the importance of rendering the pages properly, where are you taking style sheets next?

Bos: Right, we see CSS as divided into levels.  The first level that came out, the simplest level, is now almost 10 years old.  The second level - that's what my book is about, level 2 - that is where we are now.  That is what browsers implement now.  We're in the process of revising the level 2 specification.  We're working on level 3, which will add more features.  For example, vertical text for Japanese and Chinese, multiple columns like you find in newspapers - snaking columns.  We will do more features for borders, for example.  You can now make straight lines as a border, but maybe you want rounded corners, or you want images in the corners.  We will try to make a new system for layout where you can put things on the page more or less independently, but still align things to each other.  Like you would do in a table, but without being constrained by having the same order.  For example, on a mobile phone you might want to have the content first and the menu afterwards, while on a desktop screen, where you have a lot more space, you want the menu at the top so you can see the content at the same time without scrolling.  There's also more interest in using CSS for styling user interfaces.  All the new applications on the Web use HTML and CSS and scripting to create something more like an application - a program with a graphical user interface instead of a document.  And for laying out user interfaces you need different types of controls.  But this will be a long-term process: some parts of CSS level 3 are well advanced and other parts will take much longer."


And I'm convinced it will be well worth waiting for.
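Bert didn't describe how his quick census actually worked, but the per-page check behind a figure like "70% of HTML pages use style sheets" can be sketched - with the detection rule below entirely my assumption - as looking for any of the three ways a page can pick up CSS:

```python
# Sketch of a per-page stylesheet check for a CSS-usage census.
# Assumption: a page "uses CSS" if it links an external stylesheet,
# embeds a <style> block, or carries an inline style attribute.
import re

STYLE_SIGNS = [
    re.compile(r"<link[^>]+rel=[\"']?stylesheet", re.I),  # external sheet
    re.compile(r"<style[\s>]", re.I),                     # embedded block
    re.compile(r"\sstyle=[\"']", re.I),                   # inline attribute
]

def uses_css(html):
    """True if the page shows any of the three CSS usage signs."""
    return any(pattern.search(html) for pattern in STYLE_SIGNS)

def css_share(pages):
    """Fraction of crawled pages that use CSS in any of the three ways."""
    return sum(uses_css(page) for page in pages) / len(pages)
```

Run `css_share` over a crawl's fetched HTML and you get exactly the kind of percentage Bert quotes - though a real census would also have to cope with malformed markup, redirects, and scripts that inject styles.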

Now, that separation of content and presentation that Bert was talking about leads on to one of the most important and far-reaching aspects of the web - that of providing accessible content for everyone.

Shawn Henry, also with the W3C, chaired the web accessibility session. The initial focus of that meeting was the accessibility of authoring tools. As more and more user-generated content finds its way online, it's essential that these online tools themselves allow access to people with assistive technologies. But I asked her about the impact of web 2.0, and the fact that the web is changing into a far more dynamic, time-based medium.

INSERT: 9 Shawn Henry

"Well, the exciting thing is it can also solve problems.  So, with Web accessibility we want to take advantage of all the new technologies and developing technologies and help make those improve accessibility.  So, it's really important that accessibility is considered early on in developing new technologies.  In the past a lot of the assistive technologies couldn't deal with JavaScript at all.  And now a lot of them can.  So that's really exciting news, because now there is more that you can do with JavaScript that the screen readers can deal with just fine.  But there are additional issues that there weren't easy answers to before, and we're working on those and have some implementations to demonstrate.  It needs more work, but it's an exciting step forward.

Croasdale: And we're also in an environment where the web's moving off the computer.  It's got legs, it's walking to your mobile, it's on your PDA.  I suspect that's throwing up a few issues as well?

Henry: It's actually a really good demonstration case for us. Because, in fact, we've been talking for a long time about situational limitations.  Some people have disabilities because of functional limitations; some people have disabilities caused by the situation.  And in fact, when you're using your browser on a mobile phone or another handheld device, you have situational limitations.  And the exciting thing is that a lot of the techniques and approaches that we've developed for accessibility help and inform how to make your Web work well on a mobile device.  So at W3C we've worked together with the Mobile Web Initiative, and looking at their best practices you'll see a lot of commonalities between those.  So it's really exciting.  Several years ago I had developed a site - back then we weren't even thinking about the mobile web.  It wasn't even on the radar.  But we said it has to be accessible - period.  It's going to be a top-notch example of accessibility.  We developed it - great - accessible.  And the president of the company then got a mobile phone that could browse the Web.  And of course, the first thing she did was go to our site - and it worked great.  And it was "of course! we planned that - phew!"  But it's just an example of the overlap."



Shawn Henry, with a great example of how designing-in accessibility from the start can help everyone - including your own promotional prospects! And I'm sure she'll be pleased to know that full transcripts of these programmes exist on the technology section of the Bright Indigo .com website.

One more podcast to go. And in that final one we'll hear more on the commercial aspects of the emerging web technologies.

And if you'd like to subscribe to that one - or grab a podcast you missed - then go to - www2006.org - and look for the link to the podcasts in the main navigation.

This programme was produced for International World Wide Web Conference Committee by Bright Indigo. 

And if you'd like to email me - then - podcasts @ bright indigo .com - will do the trick!

But, until the final podcast – from me Peter Croasdale -

STING

Goodbye.

SIGNATURE TUNE