
Building a Data Import Tool with Azure WebJobs

June 10, 2014

Recently we built a new site for a client and hosted it on Microsoft Azure. I’ve been very happy with the ease of deployment and scalability that Azure provides. One critical piece of this new application was an automated tool that imports data from the client’s primary mainframe system into our new web application. Normally I would build a simple command line tool and install a Windows service to handle such a task. However, I didn’t want the overhead and upkeep of running a server in the cloud. Fortunately, thanks to a new Azure feature called Azure WebJobs, I was able to easily create a solution to handle this requirement.


Azure WebJobs provides exactly what you would need in the scenario I described above.

The Windows Azure WebJobs SDK is a framework that simplifies the task of adding background processing to Windows Azure Web Sites.

Simply put, you create a simple console-based app, zip and publish it, then schedule how frequently you want it to run. You can also have it run continuously, which is helpful if you are monitoring a folder or waiting for an event to occur (as in our case). As an added bonus, the WebJobs SDK provides a set of data annotations that serve as hooks into other Azure services, such as storage containers. Previously you had to do a bit of grunt work to set up listeners for these resources; now they are one-line entries, including output.


After installing the Azure WebJobs SDK in your project (see the getting started guide for details), you can create a new console application. The core of my import program is built around three pieces: a file system watcher, an upload method, and a blob-bound processing method.
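
A simplified sketch of that structure is below, written against the prerelease Microsoft.WindowsAzure.Jobs SDK that was current at the time; the folder, container, and method names are illustrative rather than the exact ones from our project:

using System;
using System.IO;
using System.Threading;
using Microsoft.WindowsAzure.Jobs;

class Program
{
    private static JobHost _host;

    static void Main()
    {
        // Watch the upload folder on the website for new import files.
        // (The path is illustrative; on Azure Web Sites the site content lives under D:\home\site\wwwroot.)
        var watcher = new FileSystemWatcher(@"D:\home\site\wwwroot\imports", "*.txt");
        watcher.Created += OnFileCreated;
        watcher.EnableRaisingEvents = true;

        // Run continuously so the file watcher and blob listeners stay alive.
        _host = new JobHost();
        _host.RunAndBlock();
    }

    private static void OnFileCreated(object sender, FileSystemEventArgs e)
    {
        // Wait until the upload has finished before touching the file.
        while (!IsFileReady(e.FullPath))
        {
            Thread.Sleep(1000);
        }

        // Invoke the annotated upload method; the {name} token in the
        // BlobOutput attribute is assumed to bind from the "name" argument.
        _host.Call(typeof(Program).GetMethod("UploadImportFile"),
            new { path = e.FullPath, name = Path.GetFileName(e.FullPath) });
    }

    private static bool IsFileReady(string path)
    {
        try
        {
            // If we can open the file exclusively and it has content, the upload is complete.
            using (var fileStream = File.Open(path, FileMode.Open, FileAccess.Read, FileShare.None))
            {
                return fileStream.Length > 0;
            }
        }
        catch (IOException)
        {
            return false;
        }
    }

    // Upload method: copies the uploaded file into the "import" container.
    public static void UploadImportFile(string path, string name,
        [BlobOutput("import/{name}")] Stream output)
    {
        using (var file = File.OpenRead(path))
        {
            file.CopyTo(output);
        }
    }

    // Process import file: fires when a blob appears in "import" with no
    // matching blob in "import-log", and writes a processing log as its output.
    public static void ProcessImportFile(
        [BlobInput("import/{name}")] Stream input,
        [BlobOutput("import-log/{name}")] TextWriter log)
    {
        ImportData(input, log);
    }

    // Heavy lifting: parse each record, write it to the database, and log one line per record.
    private static void ImportData(Stream input, TextWriter log)
    {
        // (Database-specific parsing and persistence go here.)
    }
}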

Let’s break this code down into its core components:

File System Watcher – Here we set up a file system watcher to check a specific location for files to be uploaded. This file system watcher is checking a directory on our website (more on this in a minute).

Upload Method – This is an example of a data annotation. We read the specified file into a stream and simply copy it to our output stream, which is hooked up to our Azure storage blob. It couldn’t be simpler!

Process Import File – Here is another data annotation that serves as a listener on our Azure container. By setting up an input and an output parameter that share a matching name parameter, Azure will check for new files in our input location that don’t have a matching file in our output location (preventing duplicate processing of the same file if the process restarts itself). When this occurs, it calls the method in question. We pass our output TextWriter (for logging) into our ImportData method, which does all the heavy lifting of parsing the data into the database. Once a new file is detected in the blob container, the process fires off again, allowing us to process multiple files simultaneously if needed. That’s all there is to it! No complex configuration necessary.

Installing Your WebJob

When you’re ready to install and run your WebJob, compile your application and zip up the output binary folder (bin/Debug or bin/Release, depending on your configuration). Log in to your Azure control panel, select your website, and select the “WebJobs” tab. Click the “Add” button at the bottom of the page and enter the details specified. Click the check mark and shortly your job will be up and running! There will also be a link so you can see the output from the WebJob itself.

Checking Output

What happens when you want to track the result of an import? You can always log in to the Azure control panel, navigate to your containers, and view files that way, but that can be cumbersome at times. Since I’m writing the output of my import process to a simple text file, I can build an import viewer into my web application that reads the contents of a blob item in a few lines of code:
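
Something along these lines does the trick (the container name and connection string key are illustrative):

using System.Configuration;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

public class ImportLogReader
{
    // Reads the text log that the WebJob wrote for a given import file.
    public string GetImportLog(string fileName)
    {
        var account = CloudStorageAccount.Parse(
            ConfigurationManager.ConnectionStrings["StorageConnection"].ConnectionString);

        var container = account.CreateCloudBlobClient().GetContainerReference("import-log");
        var blob = container.GetBlockBlobReference(fileName);

        // DownloadText pulls the whole blob back as a string, ready to hand to a view.
        return blob.DownloadText();
    }
}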

That’s all you need to do! You can pass the log text to your view, or wherever you need to display its contents. What you choose to log is up to you. In my case, I logged a line for each record with a unique identifier, a success/fail message, and the basic contents of the exception if the record failed to import. That way I had most of the pieces I needed to debug an issue, and if that wasn’t enough, I had access to the import file itself and could dig further.
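
As a rough illustration (the delimited record format and the SaveRecord call are placeholders, not our actual schema), the logging loop inside ImportData might look like this:

using System;
using System.IO;

static class ImportLogging
{
    // Writes one log line per record: identifier, status, and exception text on failure.
    public static void ImportData(Stream input, TextWriter log)
    {
        using (var reader = new StreamReader(input))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                var id = line.Split('|')[0];   // assumes a delimited record with a leading identifier

                try
                {
                    SaveRecord(line);          // placeholder for the real database work
                    log.WriteLine("{0}: SUCCESS", id);
                }
                catch (Exception ex)
                {
                    log.WriteLine("{0}: FAILED - {1}", id, ex.Message);
                }
            }
        }
    }

    private static void SaveRecord(string line)
    {
        // Database insert/update goes here.
    }
}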

Why a File System Watcher?

You may have noticed that there seems to be an “extra step” in this import tool. Why have a file system watcher on a folder in the website when I could upload directly to the container and let the WebJob immediately process the file? There were a few factors behind this decision:

Technology Support – To upload to the blob container directly, I would have needed to create some kind of tool on the mainframe computer, and the .NET libraries to do so weren’t available there.

Security / Complexity – By default, Azure containers are not publicly accessible. You need to set up additional authorization to access the content in the container, which some apps (like CloudBerry Explorer) already use, but our mainframe source server couldn’t leverage those tools. There are also WorkerRoles you can install via Azure that create an FTP server for your storage container, but that added “yet another service” to the system, and I didn’t want too many moving parts providing points of failure. Since Azure websites already provide standard FTP access, I found the easiest and simplest solution was to set up a special folder in our website for import uploads, create a file system watcher for that location, and simply copy the files to the storage container. If the website is recycled for any reason, all of our import files are already in the storage container for historical tracking. Similarly, if the WebJob needs to recycle itself for any reason, any new files in the import folder will be grabbed immediately upon restart.


Azure WebJobs has made a potentially difficult and complex feature easy and simple to implement. It’s stable, scalable, and easy to modify as our needs change. If you haven’t checked out Azure WebJobs yet, you should; it can be used for a wide variety of needs.


Sean Patterson

Senior Software Developer

Sean Patterson is a Senior Software Developer at Fresh Consulting. By day he develops applications at Fresh with the motto "If you can dream it, I can build it". By night he's a java overlord (of the coffee persuasion), aspiring running junkie, spider killer for his wife, and silly daddy to his twin daughters.

  • Sean – I assume that IsFileReady(…) exists to avoid trying to grab a file that isn’t yet finished being written. Does checking fileStream.Length risk giving you a false positive in cases where the file is partially written to disk (not fully buffered)? In other words, the file could be incomplete on disk and this function would indicate it as complete. UNLESS trying to open a file stream on a partially written file fails (because it can’t get an exclusive lock) and hence returns false in the catch block.

    • Greetings Matthew, sorry for the delay. Yes, the IsFileReady() method does help verify that the file has finished uploading before processing.

      I eliminate false positives in two ways, both of which you’ve outlined above. First, I try to open the file exclusively for reading (via the FileShare.None parameter). If the file is being used elsewhere (such as still being uploaded), the open will fail (throwing an exception) and the file isn’t ready. If for some reason we do open the file and the size is 0, then we assume the file isn’t ready (a bad upload could cause this too).

      Your mileage may vary based on the way your FTP server handles uploads, but in the case above (uploading to an Azure server) this has worked without any problems.

      • Hey Sean no problem! Thanks for your reply, I appreciate it.

  • Please fill out the survey on Azure WebJobs: Thank you!!

  • agilbert201

    It’s a little misleading to state you would need .NET libraries on the mainframe to write directly to an Azure storage container (blob) there. I’m interested in what the toolset limitation was. With a SAS token, it is a pretty simple cURL-like call to put content.
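
    For example, with a write-capable SAS appended to the blob URL, the upload is a single HTTP PUT; a rough C# equivalent of that call (the account, container, token, and file path are placeholders):

    using System.IO;
    using System.Net.Http;

    class SasUpload
    {
        static void Main()
        {
            // Blob URL with a write-capable SAS token appended (placeholder values).
            var blobUrl = "https://myaccount.blob.core.windows.net/import/data.txt?sv=...&sig=...";

            using (var http = new HttpClient())
            using (var content = new StreamContent(File.OpenRead(@"C:\exports\data.txt")))
            {
                var request = new HttpRequestMessage(HttpMethod.Put, blobUrl) { Content = content };
                // The Put Blob REST operation requires the blob type header.
                request.Headers.Add("x-ms-blob-type", "BlockBlob");

                http.SendAsync(request).Result.EnsureSuccessStatusCode();
            }
        }
    }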

    • Greetings agilbert201. You’re correct, a SAS token does make things really easy, and we had looked into that, but we were working with an AS/400 instance that had some restrictions in place that didn’t allow this to happen. If given the chance, it would definitely be worth taking a deeper look to make the direct integration happen.

  • Juan Carlos Soto Cruz

    Hello, it seems the InputData function and the Microsoft.WindowsAzure.Jobs namespace no longer work.

  • Mathew Charles [MSFT]

    Sean, I’m on the WebJobs team at Microsoft, and we’re adding support for file triggers targeting this scenario. What we’ll enable is shown in the following code:
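
    Roughly along these lines (a sketch with illustrative folder, filter, and method names):

    using System.IO;
    using Microsoft.Azure.WebJobs;

    public class Functions
    {
        // Fires when a matching file shows up under the configured root path;
        // autoDelete removes the file once it has been processed successfully.
        public static void ImportFile(
            [FileTrigger(@"import\{name}", "*.txt", autoDelete: true)] Stream file,
            string name,
            TextWriter log)
        {
            log.WriteLine("Processing {0}", name);
            // parse the file and write the records to the database here
        }
    }

    // The host opts in to the extension with config.UseFiles() before RunAndBlock().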

    I’d be curious to hear what you think about the approach. The idea is to make file triggers first class, so rather than what you had to build manually, you can instead use FileTrigger and a single job function, with auto delete, etc. handled automatically.

    • Greetings Matthew! I would love to see something like this. About 3 or 4 months ago, we went back to this integration job because we were running into instances where the RunAndBlock process would arbitrarily hang, and no new files were being caught and processed. In addition, if multiple files were uploaded simultaneously (a rarity, but it happened), only one of the files would get caught and processed.

      As a result, I wound up changing the approach to a scheduled job, running every 5 minutes, that looks for files in our target folder and processes them accordingly. This approach has run much more smoothly, but doesn’t quite have the “on demand” behavior that would be nice. Sometimes files don’t show up for 20 minutes, and removing all those run cycles that do nothing would be ideal.

      I see the filtering and the auto-delete method in the example you provided. I love that, especially since our file name changes every time.

      I’ll definitely be keeping an eye out for when that extension gets merged into the main Azure release. Thank you for letting me know about it!

      • Mathew Charles [MSFT]

        Yes, this extension will correctly handle concurrency and multi-instance scale out, allowing multiple jobs to be running and processing files concurrently. In addition to using a file watcher for immediate processing, the extension will scan for missed files on startup, periodically look for and process missed/error files, etc.

        Note: the plan is for these Extensions to live in this separate Extensions nuget package, not necessarily to be merged into the core SDK.

        • Excellent! Nice to have that as an easy-to-install NuGet extension. Is this extension officially released or still going through development?

          • Mathew Charles [MSFT]

            Still early in development. Basically we’re looking at doing a new v1.1.0 release of the SDK soonish (a couple of months perhaps) that will include a new extensibility model enabling these extensions. Currently the extensions are targeting a prerelease version of the SDK (on a feed). So the SDK and the Extensions will evolve together until the v1.1.0 release, at which time both will be released on NuGet 🙂
