And still more fixing drupal

Sometimes you find intermediate solutions that seem to fix the problem but actually don't. My previous two posts, on optimizing mysql tables and modifying drupal's path handling, turned out not to address the real problem.

No, of course it wasn't that easy. I'm working on another site that uses the latest version of drupal, the newly released and much improved drupal 4.7. While working on that site, I noticed that link handling was different. In drupal 4.6, all links are resolved relative to the base of the site, so a relative link behaves the same as an absolute one. In drupal 4.7, this isn't the case, and relative URLs are actually relative to the current path.

Odd... I went looking for why this change was made, and came across this lengthy discussion on the problems of relative linking in drupal. As it turns out, the base href that drupal 4.6 uses (which tells browsers to resolve all links relative to the base of the site) is often ignored by search engine crawlers, even though it's been part of the HTML standard for about 10 years now. The upshot is that if you use relative links in drupal 4.6 that reference the base of the site (in other words, you use relative links instead of absolute links because they pretty much work the same), a search engine crawler may ignore the base href and resolve each link relative to the current page instead.

What happens is that if you are on a page '/somewhere/apage' and you have a link to 'nowhere/anotherpage', a browser will see the base href and resolve 'nowhere/anotherpage' relative to the base of the site. A search engine, however, will ignore the base href and instead resolve the link relative to the current page, so it would link to '/somewhere/nowhere/anotherpage'. As the crawler follows these links, the paths nest deeper and deeper and flood your site with a practically endless stream of search engine requests.
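To make the difference concrete, here is a rough Python sketch of the two ways of resolving the same link. It uses the standard library's urljoin; the example.com host and the function names are made up for illustration, and the crawl chain assumes the site keeps serving a page containing the same relative link at each nested path.

```python
from urllib.parse import urljoin

# Hypothetical values: example.com stands in for the real site.
SITE_BASE = "http://example.com/"                 # what the base href points at
PAGE_URL = "http://example.com/somewhere/apage"   # the page containing the link
LINK = "nowhere/anotherpage"                      # the relative link on that page


def resolve_like_browser(link):
    """Honor the base href: resolve against the base of the site."""
    return urljoin(SITE_BASE, link)


def resolve_like_crawler(page_url, link):
    """Ignore the base href: resolve against the current page."""
    return urljoin(page_url, link)


def crawl_chain(start_url, link, hops):
    """Follow the same relative link repeatedly, the way a crawler
    that ignores the base href would, and collect the URLs it requests."""
    urls = []
    current = start_url
    for _ in range(hops):
        current = resolve_like_crawler(current, link)
        urls.append(current)
    return urls


if __name__ == "__main__":
    print(resolve_like_browser(LINK))
    for url in crawl_chain(PAGE_URL, LINK, 3):
        print(url)
```

Each hop in the chain adds another 'nowhere/' segment to the path, which is the runaway nesting described above.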

Of course, this has been fixed in drupal 4.7, but that's a major update that will take time to prep, so I backported the changes to our installation of 4.6, and it seems to work well. The only downside is that I have to find all the relative links that take advantage of how drupal 4.6 works and fix them, but at least we can be indexed again.
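For the link hunt, something like the following Python sketch could help. This is not what I actually ran, just one way to approach it: it scans a chunk of HTML for anchors whose href is neither absolute nor root-relative. The class and function names are made up, and the rewrite helper assumes the site lives at the web root and the links are plain paths like 'nowhere/anotherpage'.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


def is_base_relative(href):
    """True for links like 'nowhere/anotherpage' that only work
    because of the base href."""
    parts = urlsplit(href)
    if parts.scheme or parts.netloc:
        return False  # http://..., mailto:..., //host/...
    if href.startswith(("/", "#", "?")):
        return False  # root-relative, fragment-only, or query-only
    return True


class RelativeLinkFinder(HTMLParser):
    """Collects the href of every <a> tag that is base-relative."""

    def __init__(self):
        super().__init__()
        self.relative_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and is_base_relative(value):
                self.relative_links.append(value)


def find_relative_links(html):
    finder = RelativeLinkFinder()
    finder.feed(html)
    return finder.relative_links


def make_root_relative(href):
    # Assumption: the site is installed at the web root, so prefixing
    # a slash gives the path the base href used to produce.
    return "/" + href
```

Running find_relative_links over the body of each post gives a list of links to review by hand before rewriting anything.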
