After a couple of hours, I’ve tracked down and fixed a bug I was having with some of our WordPress plugins. I believe there are a few people out there having the same problem. There may be another solution online, but this is one of those issues that is difficult to pare down to a good search query.
Anyway, here is a solution for “404 issues with plugin pages”, or “lynx shows a 404, but the page still loads”, or “Google Webtools says there is a 404, but I can get to the page”, or “setting status to 200:OK still results in 404”, or “I get a 404 in IE, but refreshing the page brings it up”, or “I get random 404 errors in IE”, or “I’m getting an HTTP/1.1 404 Not Found error but the page still loads”.
You may skip ahead to the Final Solution code, but it is probably a good idea to read everything below to make sure that you are indeed having the same issue I had… and that this will actually fix your problem.
The Context
I have some WordPress plugins (Stranger Products, Stranger Events) that generate pages outside of the core WordPress system (i.e. they are not “WordPress pages” in the WP DB; they are web pages generated by our plugin script). To serve these pages, I add a bunch of rules to the .htaccess file to redirect stuff like /products/1/ to a product info page.
Some gallery plugins or other plugins that generate new pages may have a similar setup/issue.
The Problem
While the mod_rewrite rules work fine, and the page loads fine, WordPress doesn’t find a WP page or post for the query string and so sends an “HTTP/1.1 404 Not Found” status in the header. Most web browsers will ignore this and show the content that comes after the header. It seems that IE will sometimes choke on this status, and other times show the page. Funny IE!
Google’s crawler, however, will not crawl that page and will let you know in a web toolkit report. I also noticed that the lynx command line browser for Linux would show the 404 error and then load the page.
The big issue here is that Google is not going to crawl our page.
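To confirm that you are seeing the same symptom, it helps to look at the raw status line instead of the rendered page. Here is a minimal sketch of one way to do that. It is not part of the plugins above: the helper name ssp_status_code_from_headers() and the example URL are made up for illustration, and it assumes the header list has the shape returned by PHP's get_headers().

```php
<?php
// Hypothetical helper (not part of WordPress or the plugins above).
// Extracts the numeric status code from a list of raw response headers,
// e.g. the array returned by PHP's get_headers().
function ssp_status_code_from_headers(array $headers) {
    $code = null;
    foreach ($headers as $line) {
        // After a redirect there can be several status lines; keep the last one.
        if (is_string($line) && preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code; // null if no status line was found
}

// Usage (assumed URL, needs network access):
// $headers = get_headers('http://example.com/products/1/');
// echo ssp_status_code_from_headers($headers);
```

If this reports 404 for a page that displays fine in your browser, you are most likely hitting the issue described here.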
The Fix
I spent a lot of time tracking down where in the WordPress code the 404 status is set. Ideally, there would be a plugin “hook” near this that we could use to prevent the 404 status from reaching the browser.
The function that makes the 404 decision is handle_404(), which can be found in the /wp-includes/classes.php file. Here is the code (for WordPress 2.8.4, similar for previous versions I looked at too):
    function handle_404() {
        global $wp_query;

        if ( (0 == count($wp_query->posts)) && !is_404() && !is_search() && ( $this->did_permalink || (!empty($_SERVER['QUERY_STRING']) && (false === strpos($_SERVER['REQUEST_URI'], '?'))) ) ) {
            // Don't 404 for these queries if they matched an object.
            if ( ( is_tag() || is_category() || is_author() ) && $wp_query->get_queried_object() ) {
                if ( !is_404() )
                    status_header( 200 );
                return;
            }
            $wp_query->set_404();
            status_header( 404 );
            nocache_headers();
        } elseif ( !is_404() ) {
            status_header( 200 );
        }
    }
This would be the ideal place to say, “Hey, don’t 404 this page”, but there is no hook in here. I tried setting the $wp->did_permalink flag to FALSE, which worked sometimes, but sometimes WordPress would write it back to TRUE after I reset it. I’m also not sure what else that flag is used for, so playing with it might cause bugs elsewhere.
The next place to check is the status_header() function called by handle_404. This function is found in the /wp-includes/functions.php file. Here is the code:
    function status_header( $header ) {
        $text = get_status_header_desc( $header );

        if ( empty( $text ) )
            return false;

        $protocol = $_SERVER["SERVER_PROTOCOL"];
        if ( 'HTTP/1.1' != $protocol && 'HTTP/1.0' != $protocol )
            $protocol = 'HTTP/1.0';
        $status_header = "$protocol $header $text";
        if ( function_exists( 'apply_filters' ) )
            $status_header = apply_filters( 'status_header', $status_header, $header, $text, $protocol );

        return @header( $status_header, true, $header );
    }
Tada! This function uses the “status_header” hook/filter before updating the header. So we can create a function in our plugin to check the status for a 404 and then return false/NULL if we know that there really is a page to load. Here’s how I did it.
The Final Solution
In the PHP code for my pages, I created a global variable called $isapage and set it to true. So at the very top of any page that is giving the 404 errors, add this code:
    global $isapage;
    $isapage = true;
Now I add the following function and filter to my plugin code:
    // This function checks whether we have set the $isapage variable and, if so, prevents WP from sending a 404.
    function ssp_status_filter($s) {
        global $isapage;
        // strpos() returns a position (or false), so compare explicitly against false.
        if ($isapage && strpos($s, "404") !== false)
            return false; // don't send the 404
        else
            return $s;
    }
    add_filter('status_header', 'ssp_status_filter');
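The string match in that filter works, but the status_header filter also receives the numeric status code as its second argument (see the apply_filters() call in status_header() quoted earlier), so the check can be made on the code itself. Here is a sketch of that variant, meant as a replacement for the filter above and not an addition to it. The function name ssp_status_filter_by_code is made up for this example, and the function_exists() guard is only there so the snippet can also be loaded outside WordPress for testing.

```php
<?php
// Variant of ssp_status_filter() that checks the numeric status code
// instead of searching the header string. The function name is hypothetical.
function ssp_status_filter_by_code($status_header, $code = 0) {
    global $isapage;
    // Suppress only 404s, and only on pages that set $isapage = true.
    if (!empty($isapage) && 404 === (int) $code) {
        return false; // as in the original filter: don't send the 404
    }
    return $status_header;
}

// Priority 10 is the default; the 2 asks WordPress to pass the status code as well.
if (function_exists('add_filter')) {
    add_filter('status_header', 'ssp_status_filter_by_code', 10, 2);
}
```

Checking the code avoids accidental matches on a header string that merely contains the characters “404” somewhere else.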
I hope this helps some people out there.
Feel free to critique this solution. Let me know if I missed something or if there are better ways to do this.
Feel free to post related issues. I may have found solutions to those along the way… or maybe a commenter can help you out.