Article

Treat Your Database Like Code

Sean Patterson

March 18, 2015 ...

Oftentimes, when working on a project, we focus solely on code and take the underlying database for granted. With code, we make sure that it’s properly organized, maintaining it in a source control repository where we keep our commits atomic.

Our database doesn’t get that much respect. We may use a tool to import a CSV file to get our database started. As changes arise, we pull up our favorite management tool, change our structure, and move on. When it is time to release the project, we do a simple “sql dump” and import that into the new system. Of course, this approach doesn’t work well as soon as things get complicated.

For example, if you’re working on a particular feature in the code that requires a database change, and your co-worker isn’t, how do you handle the disparity in structure, or merge it when the time comes? Those “sql dump” database scripts can get rather large, and constantly restoring and backing them up can become an issue. More importantly, what happens when a feature that was recently pushed to production contains a bug and needs to be rolled back? It may not be so simple to roll back to the database backup that the DBA made before the new rollout, depending on when the snapshot was made and what data has been involved. Fortunately, there are tools and techniques that can help with this.

Now that ORMs (object relational managers) are commonplace, most (if not all) of the major frameworks have enabled “database migrations” in their core to work with these issues. The concept is to provide a “codified” way of altering the structure of your database. You create a file that contains the code to create your table. More importantly, your migration file contains both an “up” and a “down” method. When you’re adding a new feature, you run the “up” method. When you want to reverse the change, you run the “down” method. What happens when you need to add a new column to an existing table or change the name/type of an existing column? You create a new one with that specific change instead of modifying your existing migration file. Doing so helps provide a snapshot, or change order, to the database over time, and it can also correlate with new features being added in the code itself.

Each framework has its own methods of structuring the migration file order to establish a timeline, or order, for running the migrations. That way, Coder A can work with the database in a state that has the new update, while Coder B can continue working with the database in its original state, with no conflict of interest (or code). Similarly, if you have to roll back the live database in conjunction with the new feature that is buggy, you can target the specific “migration state” you want and the script will apply the necessary changes to the database.

What about the data itself? What if you have a set of lookup tables and need them to house default data? Migration “seeds” exist for this purpose. Seeds allow us to put data in when we need it. It also allows us to generate data manipulation scripts, if we need to copy data from one column to another, or move it to another table. Seeds can also run through a standard command prompt or web page script, which is helpful when we don’t have direct access to the database itself.

Equipping yourself with these tools will make your database a lot more agile within your development environment. It will also allow you to automate a lot of your deployments/rollbacks through scripts. Keep the following best practices in mind when building your environment:

Keep Your Changes “Atomic”

Except for your “initial migration,” which will typically have a few core tables for your application, make sure that each additional migration pertains to one feature or core change to the database. It’s okay if your migration script modifies multiple columns if that is what is required for a new feature, but each database change should correlate with one feature or bug fix, similar to how you maintain your codebase.

Keep Structure and Data Separate

While it may seem natural to modify your data and its structure in the same migration, it is important to keep them separate. If the table structure changes, you may need a different set of “seeds” for each structure. Mixing in the data change (seed) at the same time as the structure change (migration) can cause problems. Put your structure changes in the appropriate migrations, and keep your data insertions (or updates) in the appropriate seeds. Doing this helps to avoid conflicts that can occur when a column type changes and you already have data in it.

Test Your Migrations from “Ground Zero”

The best way you can verify that your migrations are set up properly is to drop your entire database, create a new one, run your migrations, and verify that your application is still running without a problem. This is where testing your database seeds can come in handy. To really test your setup, you should be able have another developer pull down your code (and migrations), install any necessary basic libraries, run the migrations/seeds, and have a fresh instance of the application running on their system.

I recently finished a project where we built a new system from scratch, but also had to migrate the data from its predecessor app into the new system. By leveraging a combination of migrations and seeds (in both the old system and the new), I was able to save hours (potentially even days) of manually importing and manipulating a large set of data from one system to the other. I could easily share the task with a coworker, when necessary. We could also coordinate the migration with our most recent code changes, and use older sample data.

Give your database the same kind of attention that you give your code base using migrations and seeds! You can jump in right away in .Net, Laravel (PHP), CodeIgniter (PHP), Ruby on Rails, and other frameworks too!