posted by dave on Monday, April 11, 2005 at 11:19 PM in category technology, website

I'm having some trouble with 'bots again.

This time, however, it's my fault.

Basically, the 'bots are indexing the pages that contain dynamic 'blog entries.

For example, Google might index my index2.shtml page and note that it contains the word "Freeze" - but once I type a few more entries, the one about Polly's Freeze is no longer displayed on index2.shtml because it's not one of the ten newest entries anymore.

This means that somebody can Google the words "double poo-poo" and get led to my main page, but when they get there those words are nowhere on that page. That's just too much disappointment for me to want to take responsibility for. I mean, when you want to read about double poo-poo you just shouldn't have to wait.

What I need to do is have the 'bots follow the links on pages like index2.shtml, but not index those pages themselves.

That way the links to the single entries, like this one, are followed, and only those (static) single-entry pages are indexed.

So here's what I've done:

1. I put this line into the code for my non-static pages:

<meta name="robots" content="noarchive,noindex,follow">

This tells the 'bots that honor this type of line to follow any links found on the page, but not to index the page itself or keep a cached copy of it (that's the "noarchive" part).

2. I put this line on all of my single-entry pages:

<meta name="robots" content="index,nofollow">

This does the exact opposite - it tells the 'bots that it's okay to index the page but not to follow any further links. These pages are a dead end, in other words.
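
For anybody who wants to see how a well-behaved 'bot might read those two lines, here's a rough sketch in Python. It's only an illustration: the names RobotsMetaParser, robots_policy, LISTING_PAGE and ENTRY_PAGE are made up for this example, the two sample pages are stripped down to nothing but the meta tag, and real crawlers handle more directives (like "none" or "all") than this does.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            # Directives are comma-separated and case-insensitive.
            for directive in content.split(","):
                directive = directive.strip().lower()
                if directive:
                    self.directives.add(directive)


def robots_policy(html):
    """Return what a polite 'bot would be allowed to do with this page.

    Simplifying assumption: anything not explicitly forbidden is allowed.
    """
    parser = RobotsMetaParser()
    parser.feed(html)
    found = parser.directives
    return {
        "index": "noindex" not in found,
        "follow": "nofollow" not in found,
        "archive": "noarchive" not in found,
    }


# Hypothetical, stripped-down stand-ins for the two kinds of pages.
LISTING_PAGE = (
    '<html><head>'
    '<meta name="robots" content="noarchive,noindex,follow">'
    '</head><body></body></html>'
)
ENTRY_PAGE = (
    '<html><head>'
    '<meta name="robots" content="index,nofollow">'
    '</head><body></body></html>'
)

if __name__ == "__main__":
    print("listing page:", robots_policy(LISTING_PAGE))
    print("entry page:  ", robots_policy(ENTRY_PAGE))
```

The listing page comes out as "follow the links, but don't index or cache me", and the single-entry page comes out as "index me, but stop here", which is the split described above.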

Of course these modifications only work if the 'bots are well-behaved. The ones that aren't I try to take care of with my robots.txt and .htaccess files as described in this old entry.
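
To give a feel for the robots.txt side of this, here's a small sketch using Python's standard urllib.robotparser module. The rules below are NOT the contents of my actual robots.txt; "ExampleBadBot" and the allowed() helper are invented purely for illustration. Keep in mind that robots.txt is only a request, so a 'bot that ignores it still has to be blocked some other way, which is where .htaccess comes in.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only: shut one named 'bot out
# completely and let everybody else crawl everything.
RULES = """\
User-agent: ExampleBadBot
Disallow: /

User-agent: *
Disallow:
"""


def allowed(user_agent, url, rules=RULES):
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "ExampleBadBot/1.0"):
        print(agent, "may fetch /index2.shtml:", allowed(agent, "/index2.shtml"))
```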

The whole thing would make Rube Goldberg proud.

I really need to simplify my 'blog configurations when I do my next site redesign. Until then I'll probably just do some minor tweaks like the one I made tonight.

This work is licensed under a Creative Commons License.