Archive

Author Archive

Archiving emails… the hard way

January 22, 2014 30 comments

Email storage can be a problem. Many email providers limit the storage size for a given user. The wrong way to handle email storage is to limit how long an email can be kept.

Unfortunately, this is something I have to deal with. For whatever reason, the powers that be decided people don’t need to keep emails longer than three months, so emails older than three months are automatically deleted. I find the whole situation comical, but that’s an entirely different conversation.

Since I use Outlook/Exchange for these emails, a normal person would recommend archiving my emails to a PST file on my local machine. Unfortunately, a group policy was pushed out that added the “PSTDisableGrow” registry key for Outlook. This prevents Outlook from adding emails to PST files, even ones stored locally.

So now I’m stuck in a position where I can’t automatically archive my emails without paying for a third party product. I need a way to automatically save all my emails as either MSG or EML files to my hard drive, so at least I have a copy.

There are a couple options that I’m exploring. Be warned that the solutions I’m going to talk about are TERRIBLE. They are very much hacks and something that I would completely avoid if I had a chance. I’m open to any suggestions and/or free products.

The first thing I tried was to create a new rule in Outlook for all incoming emails. The rule would run a custom VBA script that saves a copy of each email to my machine. Unfortunately, I couldn’t get it to work. I fumbled around with it for about an hour before I gave up.

The second option was to utilize Exchange Web Services (EWS). Newer versions of Exchange expose a SOAP web service that any client can use. Most of the time, the web service can be found at https://webmail.example.com/ews/exchange.asmx, where “example.com” is your domain. Microsoft provides a managed interface called the Exchange Web Services Managed API that simplifies access. I was quite surprised at how easy it was to develop a simple solution.

var service = new ExchangeService(ExchangeVersion.Exchange2010_SP1)
{
	Credentials = new WebCredentials("user", "password"),
	Url = new Uri("https://webmail.example.com/ews/exchange.asmx")
};

// Bind to the inbox and request every item in a single FindItems call.
// Note: server-side throttling may cap how many items one call returns, so a huge mailbox may need paging.
Folder folder = Folder.Bind(service, WellKnownFolderName.Inbox);
FindItemsResults<Item> emails = folder.FindItems(new ItemView(Int32.MaxValue));

// FindItems only returns summary properties, so batch-load the MIME content separately.
service.LoadPropertiesForItems(emails, new PropertySet(ItemSchema.MimeContent));

string archiveDirectory = Path.Combine(@"D:\EmailArchive", DateTime.Now.ToString("yyyy-MM"));

if (!Directory.Exists(archiveDirectory))
	Directory.CreateDirectory(archiveDirectory);

foreach (Item email in emails)
{
	string path = Path.Combine(archiveDirectory, email.StoreEntryId + ".eml");

	if (!File.Exists(path))
		File.WriteAllBytes(path, email.MimeContent.Content);
}

This code snippet basically downloads my entire inbox and saves it locally. It doesn’t get any easier than that. EWS also supports streaming, push, and pull notifications. This allows me to monitor any incoming/outgoing emails and immediately archive them. I could fall back to iterating over the entire inbox every few days to catch any emails I missed.
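
For example, a streaming subscription can watch the Inbox and Sent Items folders and hand each new message to the same save logic as above. This is only a rough, untested sketch; SaveMessage is a hypothetical helper that wraps the File.WriteAllBytes call from the earlier snippet.

StreamingSubscription subscription = service.SubscribeToStreamingNotifications(
	new[] { new FolderId(WellKnownFolderName.Inbox), new FolderId(WellKnownFolderName.SentItems) },
	EventType.NewMail,    // incoming messages
	EventType.Created);   // should cover messages saved to Sent Items, which don't raise NewMail

// Streaming connections live for at most 30 minutes, so the connection is re-opened when it lapses.
var connection = new StreamingSubscriptionConnection(service, 30);
connection.AddSubscription(subscription);

connection.OnNotificationEvent += (sender, args) =>
{
	foreach (NotificationEvent notification in args.Events)
	{
		var itemEvent = notification as ItemEvent;

		if (itemEvent == null)
			continue;

		// Bind to the new item with its MIME content and reuse the same save logic as the loop above.
		EmailMessage email = EmailMessage.Bind(service, itemEvent.ItemId, new PropertySet(ItemSchema.MimeContent));
		SaveMessage(email); // hypothetical helper that writes the .eml file
	}
};

connection.OnDisconnect += (sender, args) => connection.Open();
connection.Open();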

As much as I like how simple this solution is, I can’t depend on it. Unfortunately, EWS can be disabled by an Exchange administrator. Knowing how people have reacted before, this feature of Exchange will probably be disabled once they realize someone is using it.

The final option I’m currently exploring is to create an Outlook add-in using VSTO. Unfortunately, the Outlook object model uses COM objects for nearly everything. I have very little experience with COM, so I ran into several issues.

Folder inbox = (Folder) Application.Session.GetDefaultFolder(OlDefaultFolders.olFolderInbox);
string archiveDirectory = Path.Combine(@"D:\EmailArchive", DateTime.Now.ToString("yyyy-MM"));

if (!Directory.Exists(archiveDirectory))
	Directory.CreateDirectory(archiveDirectory);

foreach (object item in inbox.Items)
{
	var email = item as MailItem;

	if (email == null)
		continue;

	string path = Path.Combine(archiveDirectory, email.EntryID + ".msg");

	if (!File.Exists(path))
		email.SaveAs(path);
}

There are several things wrong with the code above. Since nearly everything is a COM object, I need to release each one after I’m done with it. The first time I ran this, it worked for the first few hundred emails; at around the 300 mark, I received the exception “Your server administrator has limited the number of items you can open simultaneously.” Each iteration of the loop references a new COM object, and after a couple hundred iterations it failed because I never released any of them.

This article mentions a pretty good guideline.

1 dot good, 2 dots bad

This means I need to pay special attention to property chaining. For example:

Folder inbox = (Folder) Application.Session.GetDefaultFolder(OlDefaultFolders.olFolderInbox);

// Bad: inbox.Items creates an Items COM object I never hold a reference to,
// so it can never be released (and the garbage collector may collect it, silently dropping the subscription)
inbox.Items.ItemAdd += OnItemAdd;

// Good: keep a reference so the Items object stays alive and can be released later
Items inboxItems = inbox.Items;
inboxItems.ItemAdd += OnItemAdd;

Since I need to release the COM objects in the opposite order of creation, I used a stack to keep track of all my references.

Stack<object> comObjects = new Stack<object>();

Folder inbox = (Folder) Application.Session.GetDefaultFolder(OlDefaultFolders.olFolderInbox);
comObjects.Push(inbox);

Items inboxItems = inbox.Items;
comObjects.Push(inboxItems);

Folder sent = (Folder) Application.Session.GetDefaultFolder(OlDefaultFolders.olFolderSentMail);
comObjects.Push(sent);

Items sentItems = sent.Items;
comObjects.Push(sentItems);

// 
// Do something
//

while (comObjects.Count != 0)
{
	object obj = comObjects.Pop();

	if (obj != null)
		Marshal.ReleaseComObject(obj);
}

While iterating through the Items collection with a for loop, I immediately received an exception saying the index was out of range. Like any C# developer, I started iterating at index 0. However, the Items collection starts at index 1. MSDN documents it here.
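
A corrected sketch of the loop, releasing each item as it goes, looks something like this:

Items inboxItems = inbox.Items;

// The Outlook Items collection is 1-based; starting the loop at 0 fails immediately.
for (int i = 1; i <= inboxItems.Count; i++)
{
	object item = inboxItems[i];

	// ... archive the item the same way as before ...

	Marshal.ReleaseComObject(item);
}

Marshal.ReleaseComObject(inboxItems);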

The Items collection also exposes an ItemAdd event that fires for each new item added to the folder. I can hook this event on both the Inbox and Sent Items folders to archive new emails immediately. There is a caveat to using this event, mentioned in this article: when 16 or more items are added at the same time, the event does not fire. I don’t need to worry about this limitation most of the time, but I would still need to fall back to iterating over the entire inbox every once in a while to make sure every email has been saved.
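
A rough sketch of wiring this up inside the add-in, with only minimal COM cleanup, might look something like this (HookFolders is just an illustrative name):

private Items inboxItems;
private Items sentItems;

private void HookFolders()
{
	// Keep the Items references in fields so the subscriptions (and the underlying COM objects) stay alive.
	Folder inbox = (Folder) Application.Session.GetDefaultFolder(OlDefaultFolders.olFolderInbox);
	Folder sent = (Folder) Application.Session.GetDefaultFolder(OlDefaultFolders.olFolderSentMail);

	inboxItems = inbox.Items;
	sentItems = sent.Items;

	inboxItems.ItemAdd += OnItemAdd;
	sentItems.ItemAdd += OnItemAdd;

	Marshal.ReleaseComObject(inbox);
	Marshal.ReleaseComObject(sent);
}

private void OnItemAdd(object item)
{
	var email = item as MailItem;

	if (email == null)
		return;

	string archiveDirectory = Path.Combine(@"D:\EmailArchive", DateTime.Now.ToString("yyyy-MM"));
	Directory.CreateDirectory(archiveDirectory);

	string path = Path.Combine(archiveDirectory, email.EntryID + ".msg");

	if (!File.Exists(path))
		email.SaveAs(path);
}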

Creating an Outlook add-in isn’t as simple as using the EWS Managed API, but with the restrictions in place, it seems to be the only option I currently have. Even after I save all the emails to my local machine, I still need to create a separate service to parse and index them.

A few people have started migrating all their old emails into OneNote by highlighting their entire inbox and pressing one button. If things get too complicated, I might have to fall back to this.

This is a lot of work for a simple problem. The wrong way to handle email storage is to limit how long an email can be kept.

Levenshtein distance

December 27, 2013 15 comments

Imagine a scenario where a single script is deployed to several hundred different locations. Due to various constraints, this script cannot be centralized, so making a change means I’ll need to deploy it to several hundred locations.

But it gets worse. Some of these scripts are customized and include special logic, so I cannot blindly copy the updated script to all locations. In addition to that, most of the existing scripts contain comments such as:

#
# This script was created on 1/1/1970 by John Doe.
#

If these scripts didn’t include their own unique comments, I could have compared file sizes or generated a SHA1 hash of each script to see which ones were identical and which contained special logic. Since each script has its own unique comments, every script produces a different hash.

Instead of reviewing each script individually, I can use the Levenshtein distance to determine how similar each target script is to my updated script.

According to Wikipedia, the Levenshtein distance is:

… a string metric for measuring the difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-character edits (insertion, deletion, substitution) required to change one word into the other.
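
For reference, a minimal sketch of the textbook dynamic-programming (Wagner-Fischer) implementation in C#:

static int LevenshteinDistance(string source, string target)
{
	// d[i, j] = edits needed to turn the first i characters of source into the first j characters of target
	int[,] d = new int[source.Length + 1, target.Length + 1];

	for (int i = 0; i <= source.Length; i++)
		d[i, 0] = i; // delete everything

	for (int j = 0; j <= target.Length; j++)
		d[0, j] = j; // insert everything

	for (int i = 1; i <= source.Length; i++)
	{
		for (int j = 1; j <= target.Length; j++)
		{
			int cost = source[i - 1] == target[j - 1] ? 0 : 1;

			d[i, j] = Math.Min(
				Math.Min(d[i - 1, j] + 1,   // deletion
				         d[i, j - 1] + 1),  // insertion
				d[i - 1, j - 1] + cost);    // substitution
		}
	}

	return d[source.Length, target.Length];
}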

Sorting the scripts by their Levenshtein distance gives me a good indication of which ones I can safely copy over and which I need to review manually. Overwriting scripts with a Levenshtein distance close to zero gives me reasonable assurance that I won’t break anything. While it’s not a bulletproof solution, it’s better than reviewing hundreds of scripts manually.

Loading animations using pure CSS

December 26, 2013 3 comments

Loading animations have traditionally been done with an animated GIF, but with the advent of CSS animations, it’s quite easy to create one using just CSS. All it takes is a single div element and a few lines of CSS:

#loading-image
{
	width: 25px;
	height: 25px;
	border-width: 8px;
	border-style: solid;
	border-color: #000;
	border-right-color: transparent;
	border-radius: 50%;
	animation-name: loading;
	animation-duration: 1s;
	animation-timing-function: linear;
	animation-iteration-count: infinite;
}

@keyframes loading
{
	0% { transform: rotate(0deg); }
	100%   { transform: rotate(360deg); }
}

Here is the result in JSFiddle.

I recently participated in a code review for a website where, instead of using animated images, a developer decided to use CSS animations. While this is neat, I believe it’s a mistake to use this on a customer-facing website. Perhaps my opinion will change in five years, but there are still too many people using older browsers that don’t support CSS animations.

Using pure CSS does have some merit. For example, a page might load a tiny bit faster because there is one less image to download, which reduces both the number of HTTP requests and the size of the page. You can also use LESS to dynamically change the animation color to match customer-defined themes, background colors, etc.

While there are some reasons to use CSS animations, there are more reasons not to. The most important reason at this time is avoiding unnecessary complexity: if you decide to use CSS animations on a customer-facing website, you still need to include a fallback for browsers that don’t support them. I don’t see any reason to complicate things when animated GIFs work perfectly fine.

Unless a website is highly dynamic with ever changing colors, I don’t see a reason to use CSS animations for loading images. Again, my opinion might change in five years when more browsers support CSS animations.

Enterprise Development with NServiceBus

December 12, 2013 15 comments

I’m in Minneapolis, Minnesota this week to attend a course on NServiceBus. Over the next couple of weeks, I intend to post about my experiences and impressions as I start developing a system from the ground up. Most of the posts won’t be interesting to veteran NServiceBus developers since they’ll be “hello world” type posts for various features. These posts will mainly be notes for myself to remember what I’ve done and what features are available. I’m actually quite eager to start using what I’ve learned this week.

Rebuilding my computer

September 25, 2013 4 comments

There was a storm last Thursday that knocked out the power to my house, and unfortunately, my computer was a casualty. The computer and most of its peripherals were plugged into the surge protector (UPS), but the coaxial cable connected to my cable modem was not. The cable modem, router, and part of the motherboard were fried. I was able to boot up the computer, but it did not recognize several of the connected devices. The computer was nearly five years old, and I was already thinking about replacing it.

I’ve previously ordered all my components from Newegg. However, I could pick up the new Core i7 processor at a local Microcenter for $60 less than what Amazon/Newegg were selling it for. In addition, I was surprised that the other components were competitively priced or cheaper than Amazon/Newegg. This was my first time shopping at Microcenter, and I’m very impressed. Everything I wanted was in stock at the store. Here’s the list of components I picked up this past weekend:

All the components were purchased at Microcenter. The total came out to be $50 more than what I would have paid if I had ordered from Amazon/Newegg. For a video card, I just pulled the MSI GTX 660 Ti from my old computer.

The Corsair Obsidian 650D is one of the best cases I’ve worked with. There are several cutouts for cable routing and about an inch of space for cable management. There are also two removable dust filters, one for the front fan intake and one for the power supply intake below the case. I also use the opening at the top of the case as an intake, but it doesn’t come with a dust filter. The only real negative is that the case doesn’t include an internal USB 3.0 header for the front ports. Instead, it comes with a USB extension cable that has to be routed to the back of the case, which uses up two of the rear ports.

The case also has removable hard drive cages. Since I only have two hard drives, I removed the top cage to provide more airflow from the front 200mm fan. The hard drive trays include screw holes for 2.5in drives, so I didn’t need to purchase an additional adapter for the SSD.

The CPU cooler I purchased is a closed-loop water cooler. This type of cooler is great when I need to move my computer, since there isn’t two pounds of metal straining the motherboard the way there would be with a large air cooler. The cooler uses two 120mm fans attached to a radiator. Corsair recommends positioning the fans as intake for the radiator so they pull cooler air from outside the case.

This is also the first motherboard I’ve purchased that utilizes UEFI instead of BIOS. It’s much nicer to look at and easier to navigate with a mouse.

From what I’ve read online, this processor does tend to run on the hotter side. At stock speeds, it idles at 30C and peaks at 60C under load. Without increasing the voltage, I raised the CPU multiplier to 40x for 4.0 GHz; with that slight overclock, the temperatures didn’t change. I then tried increasing the voltage to 1.2v and the multiplier to 45x for 4.5 GHz, but the temperature soared to 80C under load. I’m not comfortable running at those temperatures, so I may experiment some more this weekend. I used the stock TIM that was already applied to the CPU cooler, so I may remove it, apply some Arctic Silver, and re-seat the cooler to see if it makes a difference.

Creating new application domains while running a unit test

July 30, 2013 3 comments

While using MSTest to test some code that creates new application domains, I kept running into an exception under both Visual Studio’s test runner and ReSharper’s test runner. The code works under normal circumstances, but it fails during the execution of a unit test.

[TestMethod]
public void Test1()
{
    AppDomain domain = AppDomain.CreateDomain("Test");
    domain.DoCallBack(() => Console.WriteLine("Hello world"));
}

The call to DoCallBack would throw the following exception:

Test method TestProject.DemoTest.Test1 threw exception: 
System.IO.FileNotFoundException: Could not load file or assembly 'TestProject, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null' or one of its dependencies. The system cannot find the file specified.

After some trial and error, I discovered the newly created application domain uses a different application base.

AppDomain domain = AppDomain.CreateDomain("Test");

Console.WriteLine(AppDomain.CurrentDomain.SetupInformation.ApplicationBase);
Console.WriteLine(domain.SetupInformation.ApplicationBase);

Running the code above in a normal application (i.e., a console application) prints the same location twice. In my case, it would be "D:\projects\DemoCode\bin\Debug". But when the code is executed in the context of a test runner, the newly created application domain prints "C:\Program Files (x86)\JetBrains\ReSharper\v6.1\Bin\" under ReSharper’s test runner or "C:\Program Files (x86)\Microsoft Visual Studio 10.0\Common7\IDE\" under Visual Studio’s test runner.

To make sure both application domains use the same application base, I had to modify how I created new application domains.

AppDomain domain = AppDomain.CreateDomain(
	"Test",
	new Evidence(AppDomain.CurrentDomain.Evidence),
	new AppDomainSetup
	{
		ApplicationBase = AppDomain.CurrentDomain.SetupInformation.ApplicationBase,
	});

Syncing TFS with a local directory

June 19, 2013 42 comments

I started learning PowerShell in order to automate some of the more tedious activities in TFS. For example, I needed to treat a local directory as the source of truth instead of the version in source control. I came across this blog post that provides a script that seemed to do exactly what I needed. The first time I executed this script on a directory with over 10,000 sub-directories and files, it took 10 minutes to finish.

Here is phase 1 of the script:

# Phase 1: add all local files into TFS which aren't under source control yet
# ($pendingAdds is assumed to be built earlier in the full script from the workspace's existing pending changes)
$items = Get-ChildItem -Recurse

foreach($item in $items) {
	$localItem = $item.FullName
	$serverItem = Get-TfsChildItem -Item "$localItem"
	
	if (!$serverItem -and !($pendingAdds -contains $localItem)) {
		# if there's no server item AND there's no pending Add
		write "No such item as '$localItem' on the server, adding"
		Add-TfsPendingChange -Add "$localItem"
	}
}

With over 10,000 files in the local directory, calling Get-TfsChildItem for each item wasn’t time efficient. Running phase 1 took approximately 3 minutes.

Looking at the help page for Get-TfsChildItem, the -Item parameter accepts an array (QualifiedItemSpec[]) instead of a single item. Instead of calling Get-TfsChildItem thousands of times, we can call it just once.

# Phase 1: add all local files into TFS which aren't under source control yet
$items = Get-ChildItem -Recurse | % { $_.FullName }

# One call with the whole array instead of one call per file
# (this assumes the results come back in the same order as the input)
$serverItems = Get-TfsChildItem -Item $items

for ($i = 0; $i -lt $items.Count; $i++) {
   $localItem = $items[$i]

   if (!$serverItems[$i] -and !($pendingAdds -contains $localItem)) {
      write "No such item as '$localItem' on the server"
   }
}

Running the updated script still took approximately 2 minutes. It isn’t much of an improvement, but it is better nonetheless. Phase 2 of the script has the same problem.

# Phase 2: delete all subfolder/files in TFS if there's no local subfolder/file for them anymore, and check out other items
$items = Get-TfsChildItem -Recurse

foreach($item in $items) {
   $serverItem = $item.ServerItem
   $localItem = Get-TfsItemProperty -Item $serverItem 
   
   # Do other stuff...
}

For the entire collection of files found in TFS, the script will iterate through and call Get-TfsItemProperty for each file. Running phase 2 took approximately 6 minutes. Again, we can update the script to call Get-TfsItemProperty once by passing it an array.

# Phase 2: delete all subfolder/files in TFS if there's no local subfolder/file for them anymore, and check out other items
$items = Get-TfsChildItem -Recurse
$itemProperties = Get-TfsItemProperty -Item $items

foreach($item in $itemProperties) {
   $serverItem = $item.SourceServerItem
   $localItem = $item.LocalItem
   
   # Do other stuff...
}

Running the updated script still took approximately 3 minutes. Better, but still slow. After reading the help page for each cmdlet more carefully, I finally noticed that most of these cmdlets offer the -Recurse switch. Instead of iterating through each of the files in the local workspace and server, we can use the top directory along with the -Recurse switch.

$localItems = Get-ChildItem -Recurse | % { $_.FullName }
$serverItemProperties = Get-TfsItemProperty -Item . -Recurse
$serverItems = $serverItemProperties | % { $_.LocalItem }

# Phase 1: add all local files into TFS which aren't under source control yet
foreach ($item in $localItems) {
   if (!($serverItems -contains $item) -and !($pendingAdds -contains $item)) {
      write "No such item as '$item' on the server, adding"
      Add-TfsPendingChange -Add "$item"
   }  
}

# Phase 2: delete all subfolder/files in TFS if there's no local subfolder/file for them anymore, and check out
foreach($item in $serverItemProperties) {  
	$serverItem = $item.SourceServerItem
	$localItem = $item.LocalItem
	
	# Do other stuff...
}

The entire script runs in just 16 seconds instead of the original 10 minutes.