Google Webmaster Central Blog - Official news on crawling and indexing sites for the Google index

Crawl Errors: The Next Generation

Monday, March 12, 2012 at 3:00 PM

Webmaster level: All

Crawl errors is one of the most popular features in Webmaster Tools, and today we’re rolling out some very significant enhancements that will make it even more useful.

We now detect and report many new types of errors. To help make sense of the new data, we’ve split the errors into two parts: site errors and URL errors.

Site errors

Site errors are errors that aren’t specific to a particular URL—they affect your entire site. These include DNS resolution failures, connectivity issues with your web server, and problems fetching your robots.txt file. We used to report these errors by URL, but that didn’t make a lot of sense because they aren’t specific to individual URLs—in fact, they prevent Googlebot from even requesting a URL! Instead, we now keep track of the failure rates for each type of site-wide error. We’ll also try to send you alerts when these errors become frequent enough that they warrant attention.
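
To make those three site-level conditions concrete, here is a minimal sketch (Python, standard library only) of what checking them for your own site might look like. The hostname is a placeholder, and this is only an illustration of the idea, not how Googlebot itself performs these checks.

```python
# A minimal sketch of the three site-level checks described above: DNS resolution,
# server connectivity, and robots.txt availability. "www.example.com" is a
# placeholder hostname; this illustrates the idea, it is not how Googlebot works.
import socket
import urllib.error
import urllib.request

HOST = "www.example.com"  # placeholder site

def dns_resolves(host):
    """DNS: can the hostname be resolved to an IP address?"""
    try:
        socket.gethostbyname(host)
        return True
    except socket.gaierror:
        return False

def server_reachable(host, port=80, timeout=10):
    """Connectivity: can a TCP connection be opened to the web server?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def robots_txt_ok(host, timeout=10):
    """robots.txt: does the file respond without a server error?"""
    try:
        resp = urllib.request.urlopen("http://%s/robots.txt" % host, timeout=timeout)
        return resp.status < 500
    except urllib.error.HTTPError as e:
        return e.code < 500  # a missing (404) robots.txt is fine; a 5xx is a problem
    except (urllib.error.URLError, OSError):
        return False

if __name__ == "__main__":
    print("DNS:", dns_resolves(HOST))
    print("Server connectivity:", server_reachable(HOST))
    print("robots.txt fetch:", robots_txt_ok(HOST))
```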

View site error rate and counts over time

Furthermore, if you don’t have (and haven’t recently had) any problems in these areas, as is the case for many sites, we won’t bother you with this section. Instead, we’ll just show you some friendly check marks to let you know everything is hunky-dory.

A site with no recent site-level errors

URL errors

URL errors are errors that are specific to a particular page. This means that when Googlebot tried to crawl the URL, it was able to resolve your DNS, connect to your server, fetch and read your robots.txt file, and then request this URL, but something went wrong after that. We break the URL errors down into various categories based on what caused the error. If your site serves up Google News or mobile (CHTML/XHTML) data, we’ll show separate categories for those errors.
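
As a rough illustration of how URL-level problems break down by HTTP response, here is a small Python sketch that buckets a few URLs by status code. The URLs are placeholders and the categories are simplified; they are not the exact categories used in Webmaster Tools.

```python
# A rough sketch of bucketing URL-level problems by HTTP response when auditing
# a handful of URLs yourself. The URLs below are placeholders and the categories
# are simplified approximations, not the exact Webmaster Tools categories.
import urllib.error
import urllib.request

def classify(url, timeout=10):
    try:
        resp = urllib.request.urlopen(url, timeout=timeout)
        return "ok (%d)" % resp.status
    except urllib.error.HTTPError as e:
        if e.code == 404:
            return "not found (404)"
        if e.code in (401, 403):
            return "access denied (%d)" % e.code
        if 500 <= e.code < 600:
            return "server error (%d)" % e.code
        return "other (%d)" % e.code
    except urllib.error.URLError:
        return "unreachable"

for url in ("http://www.example.com/", "http://www.example.com/missing-page"):
    print(url, "->", classify(url))
```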

URL errors by type with full current and historical counts

Less is more

We used to show you at most 100,000 errors of each type. Trying to consume all this information was like drinking from a firehose, and you had no way of knowing which of those errors were important (your homepage is down) or less important (someone’s personal site made a typo in a link to your site). There was no realistic way to view all 100,000 errors—no way to sort, search, or mark your progress. In the new version of this feature, we’ve focused on trying to give you only the most important errors up front. For each category, we’ll give you what we think are the 1000 most important and actionable errors.  You can sort and filter these top 1000 errors, let us know when you think you’ve fixed them, and view details about them.

Instantly filter and sort errors on any column

Some sites have more than 1000 errors of a given type, so you’ll still be able to see the total number of errors you have of each type, as well as a graph showing historical data going back 90 days. For those who worry that 1000 error details plus a total aggregate count will not be enough, we’re considering adding programmatic access (an API) to allow you to download every last error you have, so please give us feedback if you need more.

We've also removed the list of pages blocked by robots.txt, because while these can sometimes be useful for diagnosing a problem with your robots.txt file, they are frequently pages you intentionally blocked. We really wanted to focus on errors, so look for information about roboted URLs to show up soon in the "Crawler access" feature under "Site configuration".

Dive into the details

Clicking on an individual error URL from the main list brings up a detail pane with additional information, including when we last tried to crawl the URL, when we first noticed a problem, and a brief explanation of the error.

Details for each URL error

From the details pane you can click on the link for the URL that caused the error to see for yourself what happens when you try to visit it. You can also mark the error as “fixed” (more on that later!), view help content for the error type, list Sitemaps that contain the URL, see other pages that link to this URL, and even have Googlebot fetch the URL right now, either for more information or to double-check that your fix worked.

View pages which link to this URL

Take action!

One thing we’re really excited about in this new version of the Crawl errors feature is that you can really focus on fixing what’s most important first. We’ve ranked the errors so that those at the top of the priority list will be ones where there’s something you can do, whether that’s fixing broken links on your own site, fixing bugs in your server software, updating your Sitemaps to prune dead URLs, or adding a 301 redirect to get users to the “real” page. We determine this based on a multitude of factors, including whether or not you included the URL in a Sitemap, how many places it’s linked from (and if any of those are also on your site), and whether the URL has gotten any traffic recently from search.
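
For the "adding a 301 redirect" fix mentioned above, here is a minimal Python sketch using the standard library's wsgiref module. The old and new paths are hypothetical, and in practice redirects are more commonly configured in the web server itself.

```python
# A minimal sketch of the "add a 301 redirect" fix, using only the standard
# library's wsgiref module. The old and new paths are hypothetical; in practice
# redirects are usually configured in the web server configuration instead.
from wsgiref.simple_server import make_server

REDIRECTS = {"/old-product": "/new-product"}  # hypothetical old -> new mapping

def app(environ, start_response):
    path = environ.get("PATH_INFO", "/")
    if path in REDIRECTS:
        # A permanent redirect tells both users and crawlers where the "real" page lives.
        start_response("301 Moved Permanently", [("Location", REDIRECTS[path])])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello from the real page"]

if __name__ == "__main__":
    make_server("", 8000, app).serve_forever()
```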

Once you think you’ve fixed the issue (you can test your fix by fetching the URL as Googlebot), you can let us know by marking the error as “fixed” if you are a user with full access permissions. This will remove the error from your list.  In the future, the errors you’ve marked as fixed won’t be included in the top errors list, unless we’ve encountered the same error when trying to re-crawl a URL.
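
If you want a quick sanity check from your own machine before marking an error as fixed, a simple request like the following Python sketch can confirm the new status code. This is not the Fetch as Googlebot feature; the URL and user-agent string below are placeholders.

```python
# A quick sanity check to run from your own machine before marking an error as
# fixed: request the URL and look at the status code. This is not the Fetch as
# Googlebot feature; the URL and user-agent string are placeholders.
import urllib.error
import urllib.request

def status_after_fix(url, timeout=10):
    req = urllib.request.Request(
        url, headers={"User-Agent": "Mozilla/5.0 (compatible; example-check/1.0)"}
    )
    try:
        return urllib.request.urlopen(req, timeout=timeout).status  # e.g. 200
    except urllib.error.HTTPError as e:
        return e.code  # e.g. 404 means the error is still there
    except urllib.error.URLError:
        return None  # could not reach the server at all

print(status_after_fix("http://www.example.com/fixed-page"))  # placeholder URL
```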

Select errors and mark them as fixed

We’ve put a lot of work into the new Crawl errors feature, so we hope that it will be very useful to you. Let us know what you think and if you have any suggestions, please visit our forum!


72 comments:

Peter said...

Just a heads up: the link to "more info" when you click on a URL in the Soft 404 section points to a non-existent URL.

It redirects a few times and then eventually shows "the information you requested can't be found".

Andrea Pernici said...

Great improvements.

Only one tedious thing: why are links inside onclick handlers, like onclick="_gaq.push(['_trackPageview', '/outgoing/etcetc']);", crawled and reported as 404s?

And the same for DFP JS tags, like googletag.defineSlot('/6711113/Forum_Leaderboard_Alto_728x90', [728, 90], 'div-gpt-ad-1330620494683-0').addService(googletag.pubads());?

Rajesh Narayanan said...

Very nice improvements, much awaited.

Sean said...

Right on, Kurt and team. Nice work!

fibers said...

If a page was removed and is now a 404, should I mark it as fixed, or does marking it as fixed mean it is no longer a 404?

If there are thousands of 404s for pages that have been removed and will never exist again, should I mark them as fixed or ignore them?

Manuel Lemos said...

The "mark as fixed" option is useful, but having to mark each page with crawl errors as fixed one at a time is a pain when you have hundreds of pages to mark.

Can you please add an option to mark all filtered results on the current listing page at once?

Search Engine Optimizer said...

This is a very nice improvement to Google Webmaster Tools for site admins dealing with broken links and URL errors such as 400s, 500s, etc.

Don Dekalp said...

Hi there, same request as Manuel Lemos, but for the "remove URL from index" feature: mass delete from the index, if you will.
Cheers.

Hiren Modi said...

I have one suggestion about fixing errors. You have removed the section that showed crawl errors in the XML sitemap, so now I have to open each and every URL to see the error type and where it was detected.

If a broken link is detected in the XML sitemap, I just need to update the sitemap rather than set up a custom 404 page or a 301 redirect. What do you think about that?

Igor said...

"Crawl errors" in the dashboard has a bug with localisation. It should be in English, but last night it was in French, now it's in German (I think).

Ralk said...

Really great update! Love it!

Two things are missing:

A "mark all" checkbox is really needed.

And, like Andrea P. said, the trackPageview 404s are kind of annoying.

run-IT-direct said...

I get a foreign language in "Crawl errors"! Search queries is correct, but Crawl errors doesn't match my language setting. The foreign language even differs from site to site!

Peter Líška said...

I removed all pages with 404s from the index in January via 'Crawler access' -> 'Create new removal request', and Google still shows me crawl errors.

Please add a small, quick button for 404s that really removes them from the index.

Matt A said...

Same thing with languages (just for the crawl errors) on the dashboard. If you keep refreshing the page you get a different language every time.

Gus said...

Hi, nice feature!
Just to fix a confusing translation:
the 'Mark as fixed' column should be 'Marcar como arregladas' in Spanish, instead of 'Marcar como fijas'.
Thanks
Gustavo

Gus said...

...or 'Marcar como solucionado', which is used in the individual error help screen, is also a good translation

alicemortie said...

If I add a subdomain to my ten-year-old blog, what will be the effect of that?

Yana said...

Hi Webmasters, I think these are very good improvements by Google Webmaster Tools. But I have some questions:

I removed some pages long ago, but they are still displayed as 404 (not found) URLs, so I have marked those URLs as fixed.

1. What happens after I mark them as fixed?

2. Does it mean those links will no longer be 404s?

3. I have removed the 404 URLs, so why do they still show as 404?

Best regards!

Steve Johnston said...

You must, must, must reinstate the crawl error download. The analysis of enterprise sites when the errors may be in the 100s of thousands cannot be done efficiently without the ability to export large numbers of errors and analyse them in spreadsheets for patterns. Please let us have it back!

seoer said...

Hey guys,

instead of limiting the information to 1000 errors, which to me sounds like limiting the possibility of taking action, why don't you make them available in separate panes, or at least make them classifiable by date and downloadable?

Let's say I want to download all the errors that occurred in Dec 2012: a date picker would allow me to choose the dates and download only the information for that time frame.

Easy peasy, uh?

Peter Lauge said...

Is it out for all users in all countries?

Nail and Beauty Studio, Marlow said...

Looks good on the surface - I just went to Webmaster Tools, and all my crawl errors are in Spanish?!

JustDoItEasy said...

Looks nice! Yep, indeed I have it in Spanish, German, French :) but only 4 errors :)
Grtz

John Mueller said...

Hi everyone, we're aware of the language mix-up regarding the labels there, sorry for the confusion! We hope to have that resolved shortly. In other news, this is your chance to learn what various crawl errors are called in other languages :)

Iwein Dekoninck said...

Please give us back the ability to download all crawl errors - it's invaluable to find and fix the many problems often found on large sites. 1,000 errors simply isn't enough...

Matt Staton said...

Why is some of this in Spanish? http://awesomescreenshot.com/00f1hrzc5

anniecushing said...

The API access will be critical for agencies to provide the same level of service to Google's end users. Limiting us to 1000 results of each error is a significant reduction in data. In the interim, at least we have tools like Screaming Frog to make up for the lack.

seohippus said...

It still reports made-up URLs taken from JavaScript etc. These are not necessarily site errors - they are Google crawling errors. Perhaps something to indicate the source was an actual href link would help? Or, as well as "Mark as fixed", perhaps you could introduce a "Mark as not my problem - Googlebot's problem" button. :-)

I think the split of site and URL errors and prioritisation are good ideas. However, only providing 1,000 errors is a step back. An option or API to allow the download of all errors would definitely be useful and appreciated, as would more transparency on why certain errors have been prioritised over others. Some of the URLs I'm seeing as priority have never been pages on the site, linked to from anywhere or seen any traffic.

In short, thanks for trying to make things easy, but I'll always prefer to investigate for myself before deciding what needs fixing.

Jesse said...

There are some translation errors in the error overview on the dashboard. My language setting is Dutch, but the error labels are in different languages...

techseo said...

I was previously able to download the entire list of errors including where they originated from into an Excel file. Is this still possible? It was more useful than clicking through each error individually.

Thanks!

John Wheatcroft said...

Why are my Google crawl errors suddenly showing up in German? Refreshing makes them French, refreshing again gives Spanish, and refreshing once more gives Italian. Has Google buggered something up today? It's been fine for years and now it isn't. Put it back the way it was, please, because it worked and now it doesn't. I don't care what "enhancements" you have created to justify your jobs, put it back the way it worked!!!

Thanks
Computer Solutions

www.kenilworthcomputerrepairs.com

Tricky's Boutique said...

Just to let you all know: if you click the 'mark as fixed' box on the bottom row, then scroll to the top, hold down SHIFT and click the top one, it will select them all, and you can mark up to 500 as 'fixed' this way :)

Pittbug said...

I'm very interested in API access to download all crawl errors. I work with a large site that can easily have over 100,000 errors with similar-looking URLs. Having access to them will help us narrow down the various root causes and fix many URLs at a time.

a-kleinschmidt said...

Excellent work! Very helpful indeed!

Except maybe for the fact that the Dashboard now shows the crawl errors in French instead of German ;)

Thanks a lot!

Kamal Gir said...

I have noticed that Google is displaying soft 404 errors for pages that are 301 redirected. Why are 301-redirected pages being reported as soft 404s?

How do I fix soft 404 errors?

Steve said...

Please reinstate the ability to download all errors to CSV. I don't need an API, just the ability to download a CSV, like we've always had.

IT Service said...

Great work! Really well done.

But I am missing one piece of information: the number of links pointing to a broken URL is no longer shown at a glance. Now I have to click on every URL to see whether it is linked from other pages.

Peter Liska said...

And what about HTML suggestions? Will you improve them?

1.)
Delete pages removed from the index ('Crawler access' -> 'Create new removal request').

I have deleted these pages from the index, so why do you still suggest improvements for them?

2.)
Delete pages with disabled URL parameters ('URL parameters' -> 'No URLs').

I have disabled these URL parameters, so why do you still have them in the index?

3.)
Please add a button: 'Page removed to' + an input text field for the new URL.

The page was removed, and you show both the new and old URL; they have the same title.

Peter Liska said...

I meant to write 'moved' instead of 'removed', sorry:

3.)
Please add a button: 'Page moved to' + an input text field for the new URL.

The page has moved, and you show both the new and old URL; they have the same title.

Stefan said...

Why do you crawl virtual URLs from Google Analytics? These now show up as 404 errors in Google Webmaster Tools.

Thanks for the new features for looking into server issues. They help a lot.

problogger said...

Same here, my crawl errors are showing in different languages, sometimes German, sometimes Dutch! What is wrong?

J-MP3 said...

I just want to know what we can do if there is a site error with a DNS error. It is affecting the entire site in Google search. Any suggestions will be appreciated. Thanks, and I hope to hear from Google staff soon.

ApOG said...

I've been trying to speed up my website. I followed almost all of the "page speed" advice, and the site is rated 92/100, which I think is pretty good. I've also run tests showing that the total loading time is almost always about 1 second. However, in my Google Webmaster Tools, instead of showing better times, the graph gets worse every day... It would be good to get some feedback from people who really know about this (DNS, caching, I don't know), but I don't know any.

Thanks!

DT said...

Very good job. I have a question: I have some URL errors and I don't know where those URLs come from. They are not linked anywhere on my site, nor are they in my sitemap. The "in sitemap" and "linked from" tabs are empty, so I don't know how to find the source of those errors.

Hope someone can help.

Regards,
D.

Jessica said...

I don't think having access to what Google believes are the top 1000 Not Found URLs is enough. I would love the ability to download all the URLs so that I can check whether they need to be redirected or not. Thanks!

allonline said...

Please stop trying to index or crawl pages that do not exist anymore and have not existed for over a year. If I delete a section of my site, or remove a malware infection that generated loads of spam pages, I still see these reported as "page not found" errors, sometimes for pages that have not existed for more than a year.

An option to stop indexing URLs that point to dead pages, or to stop crawling pages not found after a certain time period, would remove a lot of error messages and also clean up your index. Sometimes I delete a whole site's content to rebuild it, say from a static site to a dynamic one, but Google still looks for all the old pages many years later. If a page has not been found for a year, I would say the link pointing to it from somewhere else is obsolete and shouldn't be followed in the first place.

It would make more sense to alert the site owner whose site contains links pointing to pages that may no longer exist, since they have not been found for a certain time period. That way, if I have dead links I'm not aware of pointing to someone else's site, instead of them getting crawl errors, I would get the error pointing out the link and could either remove it or fix it. In the meantime, Google could just de-index the target unless it is found again from a new link.

Trevor said...

I agree with the following users:

--Steve Johnston
--seoer
--Iwein Dekoninck
--anniecushing
--techseo
--Pittbug
--Steve
--Jessica

I'd like to have back the ability to see all the errors. 1000 is not enough. In the case of my site, I've got hundreds of thousands of errors for URLs that are now working and I want to mark them all as fixed. I have to clear them one-by-one over days and weeks and months? Even an API or something would be fine. I just want to mark them as fixed.

Charlie said...

Like Andrea P., I have 404 crawl errors for all of my DFP ad units:

googletag.defineSlot(...

Vijay Padiyar said...

I have a question about what exactly "Mark as fixed" does.

On my site, Google reports a lot of 404s for pages that no longer exist, and even the pages that link to them don't exist anymore.

For example, Google reports a 404 for mysite.com/page/a, and "linked from" shows mysite.com/page/b, when neither a nor b exists anymore.

So should I mark these URLs as fixed, even though there's nothing to fix at my end, really?

Also, why is Google trying to follow URLs from cached pages of my site that were deleted over a year ago? This is the root of the problem.

Thanks

Vijay Padiyar

Mareforzanove said...

We can no longer download a CSV file with the broken links and the pages they appear on. This was one of the features we used most to fix broken links; please bring this tool back!

David Harnadek said...

Yes Please. Very important for large sites.

".. we’re considering adding programmatic access (an API) to allow you to download every last error you have, so please give us feedback if you need more .."

realestate said...

I am facing a similar problem and am not getting an answer. I am seeing sharp increases and decreases in my indexed pages: one day it was 9840 pages, and the very next day it fell to 3450. My website is also going through many changes these days; within two months my company has relaunched the site for the third time after revamping it completely. We also see many crawler issues with the new and old indexed pages.
Kindly help me with this technical issue, as it is affecting my indexed pages with sharp fluctuations.

Mr Ed said...

Yes - us too!! Have I missed something fundamental - there's a download button but ironically the page doesn't exist!!

Dealing with the errors efficiently and effectively is the point of the page, please help.

WebWizard said...

We recently changed web servers and have about 45k 404 errors. At 1,000 per day, it will take 45 days to "Mark As Fixed".

I wish you could pull more than 1,000 per day.

chrisw said...

I hope Google guys are still reading these comments! I thought I would try out this feature for a while and then try to post some sensible feedback.

Okay, while I appreciate the effort that has gone into this feature, to me it is almost exactly the opposite of what I would actually like.

I get hundreds of errors showing up on my site, 99.9999% of which are caused by bad-search-engine type sites, crawlers, spiders, and inept users who can't make links to the right pages. I cannot fix any of these links, and even if I did, more would immediately reappear. A huge waste of time. Half these sites are of such poor quality that I couldn't even contact them if I wanted to. I could set up 301 redirects for some or all of these, but a) it would take forever, b) it would slow down my server to have umpteen hundred redirects, and c) it still wouldn't stop other people making careless links. The bottom line is that I don't care if crap sites are making inept, broken links to my site. Even if I go through all these links, check them, and mark them as fixed (which actually means "I cannot fix them even if I want to"), they reappear again a few days or weeks later.

A tiny few of the crawl errors are my mistakes: bad links from one page of my site to a page that doesn't exist on my site. These are things I can fix and I'm hugely grateful to have them pointed out. Excellent stuff. The trouble is, I have to go through all the crap links to find them.

What I would really like to know about are the broken links out from my site to other sites - the dead links that produce 404s on other sites. These are things I *can* and want to fix, but they're very hard to find on a large site.

In summary, what I'd really value is:
a) An ability to mark errors as "not fixable by me" (an error caused by some other site that I cannot fix myself) - and please don't tell me about this error again.
b) An easy way to find the crawl errors only within my own site without going through all the crawl errors. (My mistaken attempts to link to non-existent pages on my own site.)
c) A list of broken outgoing links on my own site, which I would love to be in a position to scan for and repair regularly.

At the moment, it seems to me you're telling webmasters about lots of errors they can't fix - and not telling them about lots of errors they can!

But, as always, I *greatly* value the effort you put into Webmaster tools and I thank you for listening. All the best....
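
For what it's worth, here is a rough sketch (Python, standard library only) of the kind of outgoing-link check I mean in point (c); the page URL is just a placeholder for one of my own pages.

```python
# Rough sketch: fetch one of my own pages, collect its outgoing links, and check
# each one's HTTP status so dead outbound links can be found and repaired.
# "http://www.example.com/" is a placeholder, not a real page of mine.
import urllib.error
import urllib.request
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value and value.startswith("http"):
                    self.links.append(value)

def status_of(url, timeout=10):
    try:
        return urllib.request.urlopen(url, timeout=timeout).status
    except urllib.error.HTTPError as e:
        return e.code
    except urllib.error.URLError:
        return None  # unreachable

page = "http://www.example.com/"  # placeholder for a page on my own site
html = urllib.request.urlopen(page, timeout=10).read().decode("utf-8", "replace")
collector = LinkCollector()
collector.feed(html)
for link in collector.links:
    code = status_of(link)
    if code is None or code >= 400:
        print("broken outgoing link:", link, code)
```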

Admin said...

Like Charlie and Andrea P., I have 404 crawl errors for all of my DFP ad units:

googletag.defineSlot(...

Charlie said...

@Admin, I recently added the DFP "links" to the robots.txt Disallow list. I've marked them as fixed in Webmaster Tools, and they have not popped back up.

Tino said...

@admin

Please let us have the ability to fix more than 1000 URLs at a time.

The Barcode Spot said...

Please allow more than 1000 errors to be downloaded at one time. Also, please respond. If there is an API that is accessible, please make it evident.

Hiren Modi said...

This is such a fabulous feature in Google Webmaster Tools. But I have a big concern with de-indexing of pages.

I'm working on an eCommerce website. [Vista Stores] I see 404 pages when I remove any products from the website. I'm handling them with the 301 redirect method, but Google takes too much time to de-index those specific pages from web search. Can you add a facility to Google Webmaster Tools where we can notify Google via CSV or some other bulk data format?

Internet Services said...

When adding 301 redirects to multiple pages, Google is very slow to realize that a proper redirect is in place and continues to report them as 404 (not found) pages.

Viktor Szépe said...

Can you help me figure out what the problem could be? (DNS error)
http://snag.gy/0xZPw.jpg

Marcus Kool said...

There is still a big issue with 404 crawl errors: there are spam/bogus sites that refer to my website with a wrong URL and Google marks this as a crawl error. I think it is a must-have feature to report back to Google that the referer is wrong and not my flawless website.
E.g. http://klaipedarealestate.com/index.php?v=vpn-banking-software links to http://www.urlfilterdb.com/en/support/faq/https.html30, which is wrong (a link without the trailing "30" works).

Sumanth said...

Hello sir, I have seen 26 URL errors reported for my site www.andhraworld.in, and after fixing them it is still showing those 26 errors. When will these errors be cleared?

R. Richard Hobbs said...

Dealing with hundreds of 404s for URLs that never existed on my site to begin with is confusing and annoying. I have this nightmare that I am going to end up having to inspect every line of code on my site, about 1000 unique pages, because Googlebot can't tell the difference between JS, a URL parameter, a line of code, or a URL. 'A' for effort, but... jus' sayin'...

plastic-mart.com said...

I would love any input on my issue.

On my eCommerce site, Webmaster Tools shows crawl errors for product pages, but the URLs it gives match products from another site on the same server. How is this possible?

thank you

Xavi said...

Ours is saying that there are 2000+ pages forbidden by robots.txt when robots.txt allows everything.
See... http://productforums.google.com/forum/#!mydiscussions/webmasters/HiYAZLXd3XA

camel7 said...

I have 382 URL errors, which are correct errors (the products no longer exist in my webshop).

What should I do to remove them from the Google index?

Mark them as fixed? Or leave them as they are?

Please advise.

VINOD YEOLE said...

If I redirect the crawl-error links to new links, will my indexed links become non-indexed? I need those links in the future too. Please help me out. I am going to fix the crawl links in Webmaster Tools. Is that OK???