Building a Data Import Tool with Azure WebJobs

Recently we built a new site for a client and hosted it on Microsoft Azure. I’ve been very happy with the ease of deployment and scalability that Azure provides. One critial piece to this new application was creating an automated tool that would import data from their primary mainframe system into our new web application. Normally I would build a simple command line tool and install a windows service to handle such a task. However, I didn’t want the overhead and upkeep of running a server in the cloud. Fortunately, thanks to a new Azure feature called Azure WebJobs, I was able to easily create a solution to handle this requirement.


Azure WebJobs provides exactly what you would need in the scenario I described above.

The Windows Azure WebJobs SDK is a framework that simplifies the task of adding background processing to Windows Azure Web Sites.

Simply put, you create a simple console based app, zip and publish it, then schedule how frequently you want it to run. You can also have it continuously running, which is helpful if you are monitoring a folder or waiting for an event to occur (as in our case). As an added bonus, the WebJobs SDK provides a set of data annotations which serve as hooks into other Azure services, such as container storage. Previously you had to do a bit of grunt work to setup listeners for these resouces, now they are one line entries, including output.


After installing the Azure WebJobs SDK in your project (see this link/a> on how to get started), you can create a new console application and get started. The core of my import program looks like this:

Let’s break this code down into it’s core components:

File System Watcher (line 23) – Here we setup a file system watcher to check a specific location(s) for a file to be uploaded. This file system watcher is checking a directory on our website (more on this in a minute).

Upload Method (line 92) – This is an example of a data annotation. We read the file specified into a stream and simply copy it to our output stream, which is hooked up to our Azure storage blob. It couldn’t be simpler!

Process Import File (line 50) – Here is another data annotation that serves as a listener in our Azure container. By setting up an input and output parameter with a machine name parameter, Azure will check for new files in our input location that don’t have a matching file in our output location (preventing duplicate processing of the same file if the process restarts itself). When this occurs it calls the method in question. We pass our output TextWriter (for logging) into our ImportData method that does all the heavy lifting of parsing the data into the database. Once a new file is detected in the blob the process fires off again, allowing us to process multiple files simultaneously if needed. That’s all there is to it! No complex configurations necessary.

Installing Your WebJob

When you’re ready to install and run your WebJob, compile your application and zip up the output binary folder (bin/Debug or bin/Release depending on your configuration). Log in to your Azure Control panel, select your Website, and select the “WebJobs” tab. Click the “Add” button at the bottom of the page and enter the details specified. WebJobAddScreen Click the check box and shortly your job will be up and running! There will also be a link so you can see the output from the WebJob itself.

Checking Output

What happens when you want to track the result of an import? You can always log in to the Azure control panel, navigate to your containers and view files that way, but that can be cumbersome at times. Since I’m outputting my import process into a simple text file, I can build an import viewer into my web application that reads the content of a blob item in a few lines of code:

That’s all you need to do! You can pass the log text to your view, or wherever you need it to view it’s contents. What you choose to log is up to you. For me, I did a line by line record of a unique identifier, a success/fail message, and the basic contents of the exception of the record failed to import. That way I had most of the pieces I needed to debug the issue. If that wasn’t enough I have access to the import file itself and can look further.

Why a File System Watcher?

You may have noticed that there seems to be an “extra step” to this import tool. Why have a file system watcher on a folder in the website when I could upload directly to the container and let the WebJob immediately process the file? There were a few mitigating factors in this:

Technology Support – In order to upload to the blob container directly, I would have needed to create some kind of tool on the mainframe computer and would need the .Net libraries to do so, which wasn’t available.

Security / Complexity – By default, Azure containers are not publically accessible. You need to setup additional authorization to access the content in the container, which some apps (like CloudBerry Explorer) already use. Our mainframe source server couldn’t leverage these tools. In addition, there are some WorkerRoles you can install via Azure that create an FTP server for your storage container, but it added “yet another service” to the system and I didn’t want too many parts in place to provide points of failure for this system. Since Azure websites allow you provide standard FTP access, I found the easiest, and simplest, solution was to setup a special folder in our website for import uploads, create a file system watcher for that location, and simply copy the files to storage container. If the website is recycled for any particular reason, all of our import files are already in the storage container for historical tracking. Similarly, if the webjob needs to recycle itself for any particular reason, any new files in the import folder will immediately be grabbed upon restart.


Azure WebJobs has made a potentially difficult and complex feature an easy and simple one to implement. It’s stable, scalable, and easy to modify as our needs change. If you haven’t checked out Azure WebJobs yet, you should, it can be used for a wide variety of needs.

Enhanced by Zemanta
About the Author
Sean Patterson is a Senior Software Engineer at Fresh Consulting. By day he develops applications at Fresh with the motto "If you can dream it, I can build it". By night he's a java overlord (of the coffee persuasion), CCG junkie, spider killer for his wife, and silly daddy to his twin daughters.