Random notes

of Naveen Agnihotri

General notes on Apache

Apache sucks. That's my bottom-line observation after having used Apache for two years, then AOLserver for four years, and now coming back to Apache.

AOLserver has a much more intuitive API and much better documentation. The user community is much smaller, but the people involved are active and it doesn't take too long to get questions answered. The only reason I'm not using AOLserver right now is that it doesn't run on my machine, which runs Debian on an HPPA processor.

When I started using Apache, I was impressed with how comprehensive the documentation looked, until I tried to do something that wasn't in one of the examples. Here's how it went:

Apache is made up of a bunch of different modules, which are all documented separately. If you're trying to make them do something together, it may or may not work right out of the box, especially if what you're doing is not directly in the examples. For example, see my experience with the mod_perl script for sending mail below.

A mod_perl script for sending mail

I wanted to write a simple script to send me mail. It turned out to be a lot harder than I expected:

Hiding my email address from web crawlers

My email address has been on the web since 1994. I receive between 30 and 80 junk emails every day. In August 2003, I finally decided to hide my email address from web crawlers, which is how junk mailers get the addresses on their lists. Here is my strategy: in the HTML for the email address, some characters are replaced by their numeric ISO character codes, so that it looks the same to a human being looking at the page in a browser, but a web crawler (or a human being looking at the source of the web page) cannot easily read it. And the email address, instead of being the usual mailto: link, is a link to a web form where you can send me email (see above).

For example, instead of the usual HTML that would read:
<a href=""><em></em></a>
I would like to generate HTML that reads:
<a href="/cgi-bin/"><em>me&#64;h&#101;re&#46;c&#111;m</em></a>
The HTML of the email address in the second line reads the same in a web browser as the first line, but cannot be parsed by a web crawler (unless it is enormously sophisticated, in which case we'll have to think of something else).
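
To make the encoding step concrete, here is a minimal sketch in Python. It illustrates the idea only and is not the implementation described on this page. The function names obfuscate and mail_link, the rule of encoding every other character, and the form URL passed in the usage note are all assumptions made for the example.

```python
import html


def obfuscate(address: str) -> str:
    """Replace some characters of an address with numeric character references."""
    out = []
    for i, ch in enumerate(address):
        # Assumed rule: always encode punctuation such as '@' and '.',
        # and encode every other remaining character, so the plain
        # address never appears verbatim in the page source.
        if not ch.isalnum() or i % 2 == 1:
            out.append(f"&#{ord(ch)};")
        else:
            out.append(ch)
    return "".join(out)


def mail_link(address: str, form_url: str) -> str:
    """Build a link to a web form (form_url is a hypothetical path) instead of mailto:."""
    href = html.escape(form_url, quote=True)
    return f'<a href="{href}"><em>{obfuscate(address)}</em></a>'
```

Calling mail_link("me@here.com", "/cgi-bin/form") with a made-up form path yields a link whose visible text renders as me@here.com in a browser, while the page source contains no literal "@". Any crawler that decodes character references will still recover the address, so this only raises the bar a little.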

Here's how I've implemented this scheme: