Obfuscating my e-mail address for a PDF

Like 99% of the planet I hate junk email (a.k.a. spam). In the past I had to give up on one e-mail account as I received almost 1,000 junk e-mails per day. I’m very protective of my e-mail address and I refuse to put it out on the Internet. I’d like to recount a tale of woe and PDFs and an experiment I’m about to undertake in obfuscating e-mail addresses.

I was very proud that my work e-mail address was never put out onto the Internet. At every turn I pressured colleagues in the web team for a forms-based submission system with CAPTCHA to keep work e-mail addresses out of sight of e-mail address harvesters. Unfortunately many people’s addresses were already out there and it was simply too late for them. My address, though, was safe and sound … or so I thought.

The History

Several months ago I started receiving junk mail. At first I thought I must have accidentally signed up for something without thinking, but after a week or two the amount of junk mail increased and I decided to investigate. A quick Google search later and I tracked down the culprit. Someone had included my e-mail address in a PDF they had posted up to our corporate web site. I was not amused, but the damage was done and I resigned myself to mail filter tweaking and using the delete key even more frequently in my mailbox.

The Problem

I now find myself in a position where I actually need to put a PDF on the Internet and include a personal e-mail address. The address is associated with an account that has little to no junk mail protection and I don’t have the resources to put in place effective junk mail filtering. For now I have to rely on ‘security by obscurity’ - the least effective form of protection, but currently my only viable option.

At first I considered including a link in the PDF to a contact form and simply following my own sensible advice. However, I need to make the e-mail address as obvious and accessible as possible. I need something that makes the e-mail address readable and yet not harvestable.

Option 1 - Images

The first solution that came to mind was to create an image of the text of my e-mail address and insert this into the Word document I was using to create the PDF. The image would then be readable but inaccessible to any harvester without OCR capabilities. This I decided wasn’t what I wanted. I hadn’t planned to make the address a clickable mailto link, but I wanted it to display nicely at any zoom level and ideally be accessible to screen readers.

Option 2 - Spamblock

The second solution I came up with was human-readable corruption of the address, for example something like user[at]example.com or _no_spam_user@example.com. Both of these, however, rely upon the reader understanding what they have to do to recover the real address, and that’s something that can’t be guaranteed. Again this wasn’t something I thought would work for my situation.

Option 3 - Split cells

Finally I hit upon an idea as I tried to think through how I would write a harvester. An e-mail harvester is in essence a pattern matcher looking for anything on a web resource that matches an e-mail address format (e.g. “\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b”, applied case-insensitively). So ideally I want to disrupt the pattern matching while preserving the visual appearance of the e-mail address. My idea is to use table cells to split the e-mail address.
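To make the pattern-matching idea concrete, here is a minimal Python sketch of what a naive harvester might do. The function name find_addresses and the sample strings are my own illustrations rather than anything taken from a real harvesting tool, and compiling the pattern case-insensitively is an assumption on my part.

```python
import re

# The address pattern quoted above, compiled case-insensitively (an assumption).
EMAIL_PATTERN = re.compile(
    r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b",
    re.IGNORECASE,
)


def find_addresses(text):
    """Return every substring of text that looks like an e-mail address."""
    return EMAIL_PATTERN.findall(text)


if __name__ == "__main__":
    # An intact address is picked up by the pattern.
    print(find_addresses("Contact user@example.com for details."))
    # Without an @ symbol there is nothing for the pattern to match.
    print(find_addresses("Contact user[at]example.com for details."))
```

A matcher like this needs an unbroken run of permitted characters on both sides of the @ symbol, so anything that interrupts that run should stop it from matching.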

By creating a table with one row and two columns (i.e. two consecutive cells), we can put half of the e-mail address into the first cell and half of the e-mail address into the second cell. I’ve chosen to put everything up to and including the @ symbol into the first cell and the domain into the second cell. I then set the text alignment in the first cell to right alignment and the second cell to left alignment (the default). Finally to make it visually better I removed the borders of the table and within the options for the table’s properties I set the cell margins to all be zero.

When saved to PDF the text appears seamless, but if you copy and paste it into a rich text editor (like Word), you can see the cells of the table are still in place and splitting the address in two.
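Whether the split actually defeats a harvester depends on what ends up between the two cells when the text is extracted from the PDF, and I haven’t verified that for any particular tool. The sketch below simply assumes a few possible separators (a tab, a newline, a non-breaking space, or nothing at all) and checks each against the pattern from earlier. The function name is_harvestable and the sample address are hypothetical.

```python
import re

# Same address pattern as quoted earlier, compiled case-insensitively (an assumption).
EMAIL_PATTERN = re.compile(
    r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b",
    re.IGNORECASE,
)


def is_harvestable(first_cell, second_cell, separator):
    """Return True if the two cells, joined by separator, still match as an address."""
    extracted = first_cell + separator + second_cell
    return EMAIL_PATTERN.search(extracted) is not None


if __name__ == "__main__":
    # Hypothetical separators that a text extractor might place between cells.
    separators = {
        "tab": "\t",
        "newline": "\n",
        "non-breaking space": "\u00a0",
        "nothing": "",
    }
    for name, sep in separators.items():
        print(name, is_harvestable("user@", "example.com", sep))
```

If the extraction joins the cells with nothing in between, the address is whole again and the pattern matches, which is exactly the case I’m hoping to avoid.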

Conclusion

In my particular PDFs, to aid with formatting, I’m embedding my e-mail address table within a cell of another table, and there’s some other text in-line with the address. I’m also using a special space character (non-breaking space / EN space / EM space) to try to confuse the harvesters further.

The idea isn’t proven to work at this point, but I’m going to give it a try and see how things pan out. My main hope is that Google’s indexing isn’t clever enough to strip the table cells out when caching the page - a harvester could presumably then use the cached copy to scrape the e-mail address. So wish me well with my experiment, and if you have any other creative ways to obfuscate an e-mail address in a PDF or in another file format, leave me a comment below.
Author: Stephen Millard
Tags: | e-mail |
