Search Engines and Readers fields, un-publishing web docs

I want to be able to un-publish documents from my web server. Am I correct to assume that if I run an agent that removes Anonymous from all Readers fields in certain documents, web search engines will not be able to index the unpublished documents (even though there are keyword meta tags in the documents)? Or is there a better or easier way to un-publish documents so that search engines can’t pick them up? I don’t really want to add meta robots tags. Thanks.

Subject: Search Engines and Readers fields, un-publishing web docs

Requiring a login to access the documents would prevent search engines from picking up any new documents. There are two problems, though.

First, “removes Anonymous from Readers fields” doesn’t give enough information for me to determine whether Anonymous users would still have access to the documents; it depends on whether the Readers fields are now blank or still contain other entries. Frankly, writing Anonymous into a Readers field is kind of weird. If you want to give everyone read access, use “*”, leave the field blank, or just don’t have a Readers field. Otherwise you could run into a situation where users who log in can see fewer documents than users who don’t (because their username is no longer “Anonymous”).
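To make the Readers field point concrete, here is a rough sketch in Python of the check as I described it. This is only a toy model, not how the server actually implements it: the function name can_read is made up, and it ignores the database ACL, groups, roles, and Authors fields, all of which also matter in practice.

```python
def can_read(user_name, readers):
    """Toy model of the Readers-field check (hypothetical helper).

    Assumptions: a blank or missing Readers field restricts nobody,
    "*" admits everyone, and otherwise the user's name must be listed.
    ACL levels, groups, roles, and Authors fields are ignored.
    """
    entries = [r.strip() for r in readers if r.strip()]
    if not entries:
        return True   # blank Readers field: no restriction
    if "*" in entries:
        return True   # wildcard entry: everyone may read
    return user_name.lower() in (e.lower() for e in entries)


# A document whose Readers field lists only "Anonymous":
print(can_read("Anonymous", ["Anonymous"]))  # not logged in
print(can_read("Jane Doe", ["Anonymous"]))   # logged in under her own name
```

Under this model the anonymous visitor can read the document and the logged-in user cannot, which is the odd situation I mean. If your agent strips "Anonymous" and that leaves the field blank, the model says everyone can read the document again, so check what is left in the field after the agent runs.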

The second problem is that some search engines – Google for instance – cache everything they index. Therefore, even after you remove access to the original document, web users can see the content it used to have. I don’t believe there’s any way you can force the search service to get rid of that cache – though I suppose you could write and ask them to do it.

Alternatively, you could leave those documents in place with read access for all, but delete their contents. When the search service next indexes the page, it’ll replace the old information in its cache with the new, blank page.

You can also put in meta tags – I forget which ones they are – that tell indexing engines not to index these documents. This leaves them available without login, but a web search won’t find them – assuming the indexing robot respects that setting. I also don’t know what effect this will have on already cached pages.
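For reference, the usual form of that tag is <meta name="robots" content="noindex"> in the page head. If you do go that route, a small script can confirm that the HTML the server sends really carries the directive. The following is a minimal sketch using only the Python standard library; the class and function names are made up for illustration, and how you fetch the page is left to you.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def blocks_indexing(html):
    """Return True if the page asks robots not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # "none" is shorthand for "noindex, nofollow".
    return bool(parser.directives & {"noindex", "none"})


page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(blocks_indexing(page))
```

This only tells you the tag is present; whether a given search engine honors it is up to that engine.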