eMail obfuscation from several directions



It is useful to have your eMail address on your homepage when you want to give people the opportunity to contact you. In my case, I actually want to give people that opportunity. (That does not go without saying: in some cases it is very important to discourage people from contacting you directly. For instance, you may usually not be the right person to ask, yet get flooded with requests that you have to redirect elsewhere by hand. It would then be better if people were discouraged from contacting you and found it very easy to contact the person who can actually help them.) However, we all know that there are villains out there who harvest eMail addresses from websites in order to sell them to spammers. Hence, you want to obfuscate your eMail address so that it cannot easily be read from the HTML source code of the website. It goes without saying that naive obfuscation techniques like mymail(at)domain(dot)com are undone more easily by machines than by people, and the more creative you get with your substitutions, the harder you make it for your visitors, while a harvester only needs one more pattern. Images are the end of that line of thought: the user has no choice but to copy your eMail address letter by letter from an image, which a decent OCR program could still read. Therefore, JavaScript methods that encode the eMail address and decode it on the fly (famous encodings are ROT13, Base64, etcetera) seemed like the best choice to me for a long while. For the eMail harvester, running a JavaScript interpreter is harder than applying simple regular expressions, but to a user there is no difference: the browser already comes equipped with JavaScript. At this point, some people get cranky that everything should also work without JavaScript, and I can understand that.
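As a rough illustration of the encoding half of such a scheme (the decoding would happen in the browser and is not shown), here is a minimal Python sketch. The function names are made up for this example; only `codecs.encode` with the `rot_13` codec and `base64.b64encode` are standard library calls.

```python
import base64
import codecs


def encode_rot13(address):
    # ROT13 shifts every ASCII letter by 13 places; applying it twice
    # gives back the original text. Digits and punctuation are untouched.
    return codecs.encode(address, "rot_13")


def encode_base64(address):
    # Base64 works on bytes, so encode the text first and turn the
    # result back into a plain ASCII string for embedding in a page.
    return base64.b64encode(address.encode("utf-8")).decode("ascii")


if __name__ == "__main__":
    print(encode_rot13("usr@dom.com"))   # hfe@qbz.pbz
    print(encode_base64("usr@dom.com"))
```

Note that ROT13 only touches letters, so the encoded text still contains the @ and the dot. To a regular expression it still looks like an address, just a wrong one.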
However, the main reason for me to abandon the JavaScript method was simple necessity: I have to publish my eMail address on the homepage of a university, which is managed from within a complex CMS, and the system does not allow me to use JavaScript. This makes a lot of sense: you don't want every user of a large web portal to be able to stick whatever JavaScript they like into the page, because that stuff can make everything go to hell. This blog post is one among many advertising the capability of Unicode to display letters from right to left. But seriously, how difficult do you think it is for an eMail harvester to check for reversed eMail addresses as well? I suppose it is more or less a matter of also checking for the reversed regular expression. Easy prey, if you ask me. But I had an idea! What if we switched between right-to-left and left-to-right mode a couple of times?
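# Python 3
import random
import re

# Unicode bidirectional override characters, written as escapes because
# the raw characters are invisible in most editors.
RTL = '\u202e'  # RIGHT-TO-LEFT OVERRIDE
LTR = '\u202d'  # LEFT-TO-RIGHT OVERRIDE

def distort(e_mail):
    # Replace every non-alphanumeric character by a numeric HTML entity and
    # reverse the list, so that pop() yields the first character.
    e_mail = [ letter if re.match("[A-Za-z0-9]", letter)
            else '&#x%04X;' % ord(letter) for letter in reversed(e_mail) ]
    # One random bit per gap between characters: 1 means "switch direction".
    # Addresses shorter than two characters have no gap to switch at.
    mode = random.getrandbits(len(e_mail) - 1) if len(e_mail) > 1 else 0
    default, opposite = LTR, RTL
    r = ""
    while e_mail:
        # Take the next character from the end we are currently writing at.
        r += e_mail.pop()
        if mode & 1 == 1:
            # Switch direction and continue from the other end of the rest.
            r += opposite
            default, opposite = opposite, default
            e_mail.reverse()
        mode >>= 1
    # Make sure the string ends in left-to-right mode.
    if default == RTL: r += LTR
    return r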
What does this do? It uses the Unicode characters for right-to-left override (U+202E) and left-to-right override (U+202D) to switch between writing directions. Imagine we have the string usr@dom.com. We first write the string usr from the left. Then we write moc. from the right, meaning we write the letter m from the right, then the letter o from the right, and so on. The current string is usr.com. Next, we write the string @do from the left, which gives usr@do.com. Finally, we write the letter m from the right and get usr@dom.com. Here is what the code actually did; it has a couple more changes of direction, but essentially it is the same concept:
>>> distort("usr@dom.com")
'u‮mo‭sr‮c.m‭@‮od‭'
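To check that such a string really reads as the original address, the nesting of the override characters can be simulated. The following is a minimal sketch under a simplifying assumption: it only models the two override characters plus U+202C (pop directional formatting) and a left-to-right base direction, not the full Unicode bidirectional algorithm. The function name `render` is made up for this example.

```python
LRO = "\u202d"  # LEFT-TO-RIGHT OVERRIDE
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def render(text):
    """Return the characters of `text` in the order they appear on screen."""
    # Each stack entry is (direction, items). Items are single characters
    # or already rendered nested runs, which must stay intact as one unit.
    stack = [("ltr", [])]

    def close_level():
        direction, items = stack.pop()
        if direction == "rtl":
            items.reverse()
        stack[-1][1].append("".join(items))

    for ch in text:
        if ch == LRO:
            stack.append(("ltr", []))
        elif ch == RLO:
            stack.append(("rtl", []))
        elif ch == PDF:
            if len(stack) > 1:
                close_level()
        else:
            stack[-1][1].append(ch)

    # Overrides that were never popped stay open until the end of the text.
    while len(stack) > 1:
        close_level()
    return "".join(stack[0][1])


if __name__ == "__main__":
    sample = "u\u202emo\u202dsr\u202ec.m\u202d@\u202eod\u202d"
    print(render(sample))  # usr@dom.com
```

A harvester that wanted to undo the trick would need something along these lines, which is more work than a single regular expression.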
Oh yeah, and I replaced non-alphanumeric characters such as @ by their numeric HTML entities (&#x0040; in the case of @) as well, just for kicks. Go ahead, paste that into an HTML document. You can select the address with the mouse (although it is a bit awkward). For example, this is my eMail address:
r‮e‭a‮d‭tt‮.i‭l‮er‭e‮f‭@nu‮re‭ll‮li‭t‮e‭
Unfortunately, I realized that while you can copy and paste the above, you actually copy the mangled version with all the Unicode direction-changing characters in it. In the end, I think a user now has to copy it letter by letter as well. I do think this is at least marginally harder for a spam crawler to process. The method also has its problems, at least in its current form: you had better make sure that nothing follows the eMail address on the same line. However, I only mean this as a proof of concept; someone can probably use CSS instead of those crazy Unicode characters to make it work as an inline element. Maybe, some time, I will even do that myself. In fact, expect an update to this post.
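For the problem of text following the address on the same line, one option besides CSS might be to close every open override explicitly. Each override character opens a new nesting level that stays open until a U+202C (pop directional formatting) character closes it. What follows is a minimal sketch, assuming the string contains only override characters like the ones produced by distort. The helper name `close_overrides` is made up for this example, and how browsers render the result has not been checked here.

```python
LRO = "\u202d"  # LEFT-TO-RIGHT OVERRIDE
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def close_overrides(text):
    # Count how many override levels were opened and how many are
    # already closed, then append one PDF for each level still open.
    opened = sum(1 for ch in text if ch in (LRO, RLO))
    closed = text.count(PDF)
    return text + PDF * max(opened - closed, 0)


if __name__ == "__main__":
    sample = "u\u202emo\u202dsr\u202ec.m\u202d@\u202eod\u202d"
    closed_sample = close_overrides(sample)
    # Six overrides were opened in the sample, so six PDFs are appended.
    print(closed_sample.count(PDF))  # 6
```

Text appended after the closed string is then back at the base direction instead of ending up inside the innermost override.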

3 Replies to “eMail obfuscation from several directions”

  1. Unfortunately, I just realized that while you can copy and paste the eMail address, you will actually copy and paste the mangled version with all the Unicode direction-changing characters in it. You can *not* paste that into an eMail client. So, basically, I haven't really improved much over image-based methods, and there are probably some CSS-based methods that are much better. Ah well, at least I wrote some funky Python code.
