eMail obfuscation from several directions
It is beneficial to have your eMail address on your homepage when you want to give people the opportunity to contact you. In my case, I actually want to give people that opportunity. However, we all know that there are villains out there who harvest eMail addresses from websites in order to sell them to spammers. Hence, you want to obfuscate your eMail address in such a way that it cannot easily be read from the HTML source code of the website. It goes without saying that naive obfuscation techniques like mymail(at)domain(dot)com are beaten more easily by machines than by people: the more creative you get with your substitutions, the harder you make it for your customers, while a machine is hardly slowed down. Images are the end of that line of thought, the point where a user has no choice but to copy your eMail address letter by letter from an image, which could still theoretically be OCR'd by a decent program.
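To illustrate how little the naive substitutions buy you, here is a minimal sketch of what a harvester might do. Everything in it is an assumption for illustration: the function name `harvest`, the substitution table and the deliberately simple address pattern are made up, and real harvesters are presumably more thorough.

```python
import re

# Hypothetical substitution table: spelled-out forms a harvester might try.
SUBSTITUTIONS = [
    (re.compile(r"\s*[\(\[\{]\s*at\s*[\)\]\}]\s*", re.IGNORECASE), "@"),
    (re.compile(r"\s*[\(\[\{]\s*dot\s*[\)\]\}]\s*", re.IGNORECASE), "."),
]

# Deliberately simple address pattern, good enough for the illustration.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

def harvest(text):
    """Undo the naive substitutions, then collect anything address-shaped."""
    for pattern, replacement in SUBSTITUTIONS:
        text = pattern.sub(replacement, text)
    return EMAIL.findall(text)

print(harvest("Write to mymail(at)domain(dot)com or info [at] example [dot] org."))
# prints ['mymail@domain.com', 'info@example.org']
```

Every new spelling costs the harvester one more entry in the table, whereas it costs each human reader a moment of puzzling.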
This blog post is one among many advertising the capability of Unicode to display letters from right to left. However, seriously, how difficult do you think it is for an email harvester to check for reversed email addresses as well? I would suppose it's more or less just a matter of checking for the reversed regular expression, too. Easy prey, if you ask me. But I had an idea!
What if we switched between right-to-left and left-to-right mode a couple of times?
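import random
import re

RTL = '&#x202E;'  # right-to-left override
LTR = '&#x202D;'  # left-to-right override
e_mail = 'usr@dom.com'  # the address to obfuscate

# Reverse the address so that pop() yields its first remaining character and
# pop(0) its last; everything non-alphanumeric becomes a numeric HTML entity.
e_mail = [ '&#x%04X;' % ord(letter) if not re.match("[A-Za-z0-9]",letter)
           else letter for letter in reversed(e_mail) ]
mode = random.getrandbits(len(e_mail)-1)  # one random bit per possible switch
default, opposite = LTR, RTL
r = ""
while e_mail:
    # Writing from the left consumes the front of the address,
    # writing from the right consumes its back.
    r += e_mail.pop() if default == LTR else e_mail.pop(0)
    if mode & 1 == 1:
        r += opposite
        default, opposite = opposite, default
    mode >>= 1
if default == RTL: r += LTR
print(r)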
What does this do? It uses the Unicode characters for right-to-left override and left-to-right override to switch between directions. Imagine we have the string usr@dom.com. We first write the string usr from the left. Then, we write moc. from the right, meaning we write the letter m from the right, then the letter o from the right, etc. The current string is usr.com. Next, we write the string @do from the left, which turns it into usr@do.com. Finally, we write the letter m from the right and we get usr@dom.com. Here is what the code did; it has a couple more changes in direction, but essentially it's the same concept:
Oh yeah, and I replaced non-alphanumeric characters such as @ by their Unicode HTML entities as well, just for kicks. Go ahead, paste that into an HTML document. You can mark the address with the mouse (although it's a bit awkward). For example, this is my email address:
Unfortunately, I realized that while you can copy and paste the above, you will actually copy the mangled version with all the unicode direction-changing characters in it. In the end, I think a user would now actually have to copy it letter by letter as well.
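To make the difference between what is displayed and what ends up on the clipboard concrete, here is a small sketch. It is not the script from above: `obfuscate` takes explicit switch positions instead of random bits so the result is reproducible, and `visual_order` is a heavily simplified stand-in for a browser that understands only nested override characters and nothing else of the Unicode bidirectional algorithm. Both function names and the address usr@dom.com are made up for illustration.

```python
LRO, RLO = "\u202d", "\u202e"  # left-to-right / right-to-left override

def obfuscate(address, switch_after):
    """Emit the address in stored order, flipping direction after the
    given numbers of written characters."""
    chars = list(address)
    rtl = False
    out = []
    for i in range(len(address)):
        # From the left we take the front, from the right we take the back.
        out.append(chars.pop() if rtl else chars.pop(0))
        if i + 1 in switch_after and chars:
            rtl = not rtl
            out.append(RLO if rtl else LRO)
    return "".join(out)

def visual_order(logical):
    """Simplified rendering: every override opens a nested run that lasts
    until the end of the string (overrides are never closed here)."""
    def render(pos, rtl):
        run = []
        while pos < len(logical):
            ch = logical[pos]
            if ch in (LRO, RLO):
                run.append(render(pos + 1, ch == RLO))  # consumes the rest
                break
            run.append(ch)
            pos += 1
        if rtl:
            run.reverse()
        return "".join(run)
    return render(0, False)

logical = obfuscate("usr@dom.com", {3, 7, 10})
print(logical.replace(LRO, "").replace(RLO, ""))  # prints usrmoc.@dom
print(visual_order(logical))                       # prints usr@dom.com
```

The first line printed is roughly what a copy and paste leaves you with once the invisible characters are ignored; the second is what the reader sees on the page.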
I think this is at least marginally harder for a spam crawler to process. The method also has its problems, at least in its current form. You had better make sure that this eMail address does not have anything following it on the same line. However, I just mean this to be a proof of concept; someone can probably use CSS instead of those crazy Unicode characters in order to make it work as an inline element. Maybe, some time, I will even do that myself. In fact, expect an update to this post.
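Until then, here is a rough sketch of how the CSS variant might look. It is an untested assumption on my part that browsers render it as intended: instead of override characters it nests `<span>` elements styled with the CSS properties `unicode-bidi: bidi-override` and `direction`, and closes them all at the end so that text following the address should no longer be affected. The function name `css_obfuscate` and the explicit switch positions are made up for illustration.

```python
import html

SPAN = '<span style="unicode-bidi:bidi-override;direction:%s">'

def css_obfuscate(address, switch_after):
    """Write the address from alternating directions, opening a nested
    <span> at each change of direction and closing them all at the end."""
    chars = list(address)
    rtl = False
    depth = 0
    out = []
    for i in range(len(address)):
        # From the left we take the front, from the right we take the back.
        ch = chars.pop() if rtl else chars.pop(0)
        out.append(html.escape(ch))
        if i + 1 in switch_after and chars:
            rtl = not rtl
            out.append(SPAN % ("rtl" if rtl else "ltr"))
            depth += 1
    return "".join(out) + "</span>" * depth

print(css_obfuscate("usr@dom.com", {3, 7, 10}))
```

Whether a crawler finds nested spans any harder to undo than the override characters is a separate question; the sketch only addresses the problem with text following the address.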
- That does not go without saying: in some cases it is very important to discourage people from contacting you directly. For instance, it could be that you are usually not the right person to ask, but you get flooded with requests that you have to redirect elsewhere manually. It would then be better if people were discouraged from contacting you while finding it very easy to contact the person who can actually help them.
- Famous encodings are ROT13, Base64, etcetera.