Just over 6 years ago, I consolidated all my personal email accounts onto Google Apps, back when the accounts were still free. It wasn’t long after that, however, that I switched up to a paid Google Apps for Business account – there were a number of useful benefits as part of the package, not just the removal of the advertising that plagued the interface. My view has always been that while free services are great, unless there is a sustainable business model behind them, don’t expect the service to always be around – especially important if you become heavily dependent on that service.
You might think that someone like Google is unlikely to drop free services? Take Google Reader as a prime example: an excellent online RSS reader, but dropped by the wayside in 2013, even though it was arguably the best service available at the time. Only in more recent years have comparable services appeared, but you’ll notice the best ones are not 100% free. If you’re not paying for the product, you are the product.
Earlier this year I decided to move back into the Microsoft eco-system – the package of services within Microsoft Office 365 is hard to compete against. Even on a personal use level, for about £4 per month you get fully hosted email, calendar and contacts (Exchange), unlimited online file storage (OneDrive for Business; aka SharePoint) and a real-time communication tool (Lync/Skype for Business, which also provides a WebEx-like service). Believe it or not, take-up of Office 365 has increased massively over the past 12 months, now exceeding Google Apps deployments, which many had thought would be the disrupter in this space and become the online office-suite ‘SaaS’ king, pushing Microsoft and its traditional software licensing model out of the market. However, Microsoft has adapted and is now unifying its approach across all platforms, not just providing software and services as first class citizens on Windows.
Once I was all set up, the experience has been very good – the web interface is excellent; on my MacBook, a new version of Outlook for Mac has been released which links into the same web interface; and with the acquisition of Acompli late last year, the rebranded Outlook Mobile is now probably one of the best mobile email clients available. It hasn’t all been smooth sailing, however, as migrating all my historic data has brought its own challenges, in particular the past 10+ years of email.
Migrating Email via IMAP
Within Office 365 there is a Connected Accounts option which allows you to link to an existing (in this case, old) account, but for whatever reason it doesn’t support Google’s IMAP servers and so is effectively useless. Another option is to use the migration options available within Exchange’s administration interface, but again the CSV file I created detailing my old account credentials was not accepted in any shape or form.
Googling (the irony) exposed many commercial software packages and chargeable services available to carry out a migration, but with some persistence I did come across a more hands-on approach, providing an ideal minimal cost option. Rick Sanders’ IMAP Tools are a set of Perl scripts, one of which (imapcopy.pl) can be used to copy mail between two different IMAP servers. Rather than try to get them up and running on my MacBook, I decided to spin up an instance on DigitalOcean – this would take advantage of the high speed connectivity from the DigitalOcean data centre, and would not require my own MacBook to remain online and connected throughout the migration process.
There are a couple of steps that need to be completed beforehand – first, within Google Apps, make sure that IMAP has been enabled, and if two-factor authentication has been enabled, an app password will also need to be generated. Similarly, if two-factor authentication is enabled on Office 365, an app password needs to be generated there as well.
Next, create an account at DigitalOcean – use this link to get $10 free credit. Create a Droplet, assign a name (this can be anything), set to the smallest size ($5 per month) in the nearest region (in my case, London), and set the distribution to CentOS 7 x64 – no other options are required. It takes around 60 seconds to complete, at which point click on the ‘Console Access’ button. Type the user name as ‘root’, then check your email for a message from DigitalOcean, which contains the root password. When prompted, change the password, then you’ll have completed the login process.
There are a couple of dependencies to install, so type:
yum install -y perl openssl perl-IO-Socket-SSL screen
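With openssl installed, one optional sanity check before going further is to confirm that both IMAP servers answer over SSL on port 993 from the Droplet. The sketch below is not part of the original walkthrough: the function name imap_greeting is made up for illustration, and it only shows that the server is reachable, not that your account has IMAP enabled or that your credentials work.

```shell
# Hypothetical helper (not from the walkthrough): print the first line an
# IMAP server sends back over SSL on port 993, i.e. its greeting.
# Needs network access; prints nothing if the connection cannot be made.
imap_greeting() {
    # Send a LOGOUT so the server closes the session after greeting us,
    # silence the certificate chatter, and keep only the first line.
    printf 'a1 LOGOUT\r\n' |
        openssl s_client -connect "$1:993" -quiet 2>/dev/null |
        head -n 1
}
```

Running imap_greeting imap.gmail.com and then imap_greeting outlook.office365.com would normally print a greeting line from each server; an empty result suggests a connectivity problem worth fixing before starting the migration.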
The IMAP Toolkit is no longer free, however the last free version of imapcopy.pl is still available on Google Code. This can be downloaded and prep’d with the following commands:
wget https://imaputils.googlecode.com/svn-history/r5/trunk/imapcopy.pl
chmod 755 imapcopy.pl
Next, to be able to convert between Google and Office 365’s folder structure, a mapping file needs to be created using the following command – I wanted to move all mail into a single folder on Office 365 called ‘migrated’, but you can add additional mappings as required:
echo "[Gmail]/All Mail:migrated" >> map
Finally, to start the migration process use the following command – this uses the screen command so that if you lose your connection to the DigitalOcean host, the migration will still carry on in the background uninterrupted. Be sure to substitute your Google Apps & Office 365 email addresses and passwords for the placeholders below – if you did change the mapping file above, you’ll need to add the additional folders to be migrated, comma separated, to the -m option:
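If you want more than one mapping, it can be easier to write the whole file in one go. This is a minimal sketch that assumes the file takes one source:destination pair per line, as the single-mapping example suggests; the second mapping (a Gmail label called Projects going to a folder called migrated-projects) is purely hypothetical. Note that it overwrites any existing map file rather than appending to it.

```shell
# Write the folder mapping file, one "source:destination" pair per line.
# The first line is the mapping used in this walkthrough; the second is a
# hypothetical extra mapping for a Gmail label named "Projects".
cat > map <<'EOF'
[Gmail]/All Mail:migrated
Projects:migrated-projects
EOF

# Print the file back so typos are easy to spot before the migration starts.
cat map
```

With that hypothetical extra mapping in place, the folder list passed to -m would become "[Gmail]/All Mail,Projects".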
screen time ./imapcopy.pl -S imap.gmail.com:993/<google email address>/<google apps password> -D outlook.office365.com:993/<office 365 email address>/<office 365 password> -v -U -d -I -z -M map -m "[Gmail]/All Mail" -L migration.log
Depending on the size of your mailbox, this may take a few hours to run. Once complete, the time command will display how long the migration took. The logfile ‘migration.log’ will have full details of every action taken, including any errors encountered – the migration command can be re-run exactly as-is and it will pick up where it last left off. If you do get disconnected at any point, reconnect to your Droplet and log in as root as before, then type the following to reattach to the migration script:
screen -D -r
On a few occasions I found that the copy hit a problematic email, which prevented the migration from proceeding – I was able to resolve this by opening up the log file (migration.log) in a text editor, then pressing CTRL-W to initiate a search and typing ‘BAD FETCH’ (CTRL-W is the search shortcut in nano). This would jump to the place in the log where the issue first occurred. Scrolling up a few lines would reveal the message ID (it looks like an email address) of the problematic email, which I then searched for in Google Apps with the search phrase ‘rfc822msgid:<message id>’. Once found, I deleted the email then restarted the migration script – fortunately, none of the problematic emails were important.
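As an alternative to searching inside an editor, grep can pull out every ‘BAD FETCH’ line along with the few lines above it, which is where the message ID shows up. This is a rough sketch rather than what I actually ran: the function name find_bad_fetch and the default of 5 lines of context are my own arbitrary choices.

```shell
# Hypothetical helper: show each 'BAD FETCH' line in a log with its line
# number, plus the lines just before it (where the message ID appears).
# $1 = log file, $2 = lines of leading context (default 5, chosen arbitrarily).
find_bad_fetch() {
    grep -n -B "${2:-5}" 'BAD FETCH' "$1"
}

# Only run against the migration log if it exists in the current directory.
if [ -f migration.log ]; then
    find_bad_fetch migration.log
fi
```

Matching lines are shown as number:text and the context lines above them as number-text, so the message ID can be copied straight into the ‘rfc822msgid:’ search.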
Checking for Duplicates
After the migration had completed, I noticed that a few disconnects/reconnects had taken place, so I decided to check that no duplicates had been created, just in case the migration had re-started from scratch on each reconnect. Another Google search later produced the answer with Quentin Stafford-Fraser’s IMAPdedup hosted on GitHub. This can be downloaded and prep’d as follows:
wget https://raw.githubusercontent.com/quentinsf/IMAPdedup/master/imapdedup.py
chmod 755 imapdedup.py
This works in a similar way to the migration script, however this time it is executed only against the folder copied over to Office 365 (i.e. ‘migrated’). I prefer to do this rather than run it against the original source on Google Apps, because if there are any issues (lost mail etc.), I can roll back by re-copying the mail from Google Apps. As an extra precaution, it’s initially run with -n, which carries out a dry run without actually deleting any mail, and with -c and -m, which check-sum the To, From, Subject, Date, Cc, Bcc and Message-ID fields as the safest method to check for duplicates.
./imapdedup.py -s outlook.office365.com -u <office 365 email address> -x -c -m -n migrated
A final report details how many messages would have been deleted as duplicates, if any. To proceed with deletion of the duplicates, re-run the command without the -n:
./imapdedup.py -s outlook.office365.com -u <office 365 email address> -x -c -m migrated
Once you’re happy that all your email has successfully migrated over, log back into the DigitalOcean control panel, and destroy the Droplet – the account will only be charged the hourly usage while the Droplet was running, deducted from the free $10 credit if you used the sign up link previously mentioned.