Tutorial, part 3: Data Migrations
(You may want to read Part 1: The Basics, or Part 2: Changing Models)
Changing the database schema is one side of the equation, but often a migration involves changing data as well; for example, you might be changing from a single 'password' field to separate 'salt' and 'hash' fields.
This is what is called a 'data migration'; rather than changing the schema of the database, you're changing the way the data is laid out inside it (the implicit schema, if you like).
South, by default, 'freezes' the app's models onto the bottom of every migration - it examines their declarations, and writes out a large dict at the bottom of the migration representing their state at the time the migration was created. As well as being useful for autodetection, this state is also used to allow you to access the models in that historical state to make changes to them.
(If you're wondering why you can't just use models directly from the models.py file, consider what happens when you run this migration in 3 months' time, when you've completely changed the layout of the Adopter model - the current model won't match what the migration expects, nor the schema the database has while the migration is running.)
South will pass a 'fake ORM' to you as the first argument to the forwards() method; to access a model from your own app, simply refer to it as orm.ModelName - for example, orm.Adopter. If you've frozen models from other apps (more on that later), you must look them up by their full label instead, for example orm['auth.User'].
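To make the two lookup styles concrete, here is a rough sketch of an object that supports both. This is not South's implementation; FakeORM, its constructor arguments and the two empty model classes are all hypothetical, made up purely for illustration.

```python
class FakeORM(object):
    """Hypothetical stand-in for the object passed to forwards()."""

    def __init__(self, default_app, models):
        # models maps "app_label.ModelName" to a model class.
        self._default_app = default_app
        self._models = dict((key.lower(), value) for key, value in models.items())

    def __getitem__(self, label):
        # orm['auth.User'] style: look up by the full "app.Model" label.
        return self._models[label.lower()]

    def __getattr__(self, name):
        # orm.Adopter style: only models from the default app are reachable.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._models[("%s.%s" % (self._default_app, name)).lower()]
        except KeyError:
            raise AttributeError(name)


class Adopter(object):
    pass


class User(object):
    pass


orm = FakeORM("southdemo", {"southdemo.Adopter": Adopter, "auth.User": User})
own_model = orm.Adopter          # attribute form, our own app
other_model = orm["auth.User"]   # dictionary form, another app
```

In this sketch orm.User raises AttributeError, because User belongs to a different app and is only reachable through the dictionary form.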
So, if I wanted all adopters named Bernie, I could run:
    orm.Adopter.objects.filter(name="Bernie")
As you can see, the ORM is exactly like the Django one (because it is the Django one, just loaded differently).
However, one word of warning:
Don't mix data migrations and schema-editing migrations.
It is bad practice. Migrations are supposed to be atomic units, and things get in a muddle if you start mixing and matching - triggers won't have been fired, the ORM will only work at the end of a migration, and more.
A data migration is usually accompanied by schema migrations; for example, in the password example in the first paragraph, we will be adding 'salt' and 'hash' columns, and removing the old 'password' one. What you should do is write three migrations:
- One to add the new columns (a schema migration)
- One to copy and split the password values from the old column into the new ones (a data migration)
- One to delete the old 'password' column (a schema migration)
I'll show a worked example, assuming you have the models.py file from the previous part. Remember, before our models looked like this:
    from django.db import models

    class Lizard(models.Model):
        age = models.IntegerField(null=True, blank=True)
        length = models.FloatField()
        name = models.CharField(max_length=30)

    class Adopter(models.Model):
        lizard = models.ForeignKey(Lizard)
        name = models.CharField(max_length=50)
Let's split that name field on Adopter into first_name and last_name - we'd like the first name so we can informally address them on our site.
Note that at this point you should add some example names into the database; I recommend adding the following admin.py to your southdemo app and using the administration interface:
    from django.contrib import admin
    from southdemo.models import *

    admin.site.register(Lizard)
    admin.site.register(Adopter)
Firstly, we edit the models:
    from django.db import models

    class Lizard(models.Model):
        age = models.IntegerField(null=True, blank=True)
        length = models.FloatField()
        name = models.CharField(max_length=30)

    class Adopter(models.Model):
        lizard = models.ForeignKey(Lizard)
        first_name = models.CharField(max_length=50)
        last_name = models.CharField(max_length=50)
You might be tempted to just do this:
    $ ./manage.py startmigration southdemo split_name --auto
     + Added field 'southdemo.adopter.last_name'
     + Added field 'southdemo.adopter.first_name'
     - Deleted field 'southdemo.adopter.name'
    Created 0003_split_name.py.
This, as I have previously explained, is wrong (delete the migration if you ran the command). We'd have to add our name-copying code into the forwards method like this:
    def forwards(self, orm):

        # Adding field 'Adopter.last_name'
        db.add_column('southdemo_adopter', 'last_name', models.CharField(max_length=50))

        # Adding field 'Adopter.first_name'
        db.add_column('southdemo_adopter', 'first_name', models.CharField(max_length=50))

        for adopter in orm.Adopter.objects.all():
            pass # ... copy names here ...

        # Deleting field 'Adopter.name'
        db.delete_column('southdemo_adopter', 'name')
This is bad practice. Instead, as I described above, you should be making three migrations. Starting with your models.py file in the original state, we should:
- Add the new first_name and last_name fields.
- Run ./manage.py startmigration southdemo add_firstlast_name --auto - this should make a 0003_add_firstlast_name.py migration.
- Run ./manage.py startmigration southdemo migrate_names - this should create 0004_migrate_names.py (which will just be a blank template, since we gave startmigration no options)
- Remove the name field from models.py.
- Run ./manage.py startmigration southdemo remove_name --auto - this will make 0005_remove_name.py.
This process can be extended to any kind of migration - do any additions of fields, then create a blank migration in the middle to add your data migration to, then do another removal of fields. This way, in the middle migration, you have access to both the fields you are moving data from and the fields you're moving it to.
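The add / copy / remove ordering can be sketched without a database at all. Below, a list of plain dicts stands in for the table rows; the three function names simply mirror the migration names above, and none of this is South code. It is only meant to show why the middle step needs both the old and the new fields to exist.

```python
def add_firstlast_name(rows):
    # Step 1 (schema): add the new fields, leaving the old one in place.
    for row in rows:
        row["first_name"] = ""
        row["last_name"] = ""


def migrate_names(rows):
    # Step 2 (data): old and new fields both exist, so we can copy across.
    for row in rows:
        first, _, last = row["name"].partition(" ")
        row["first_name"], row["last_name"] = first, last


def remove_name(rows):
    # Step 3 (schema): only now is it safe to drop the old field.
    for row in rows:
        del row["name"]


rows = [{"name": "Bernie Smith"}, {"name": "Cher"}]
for step in (add_firstlast_name, migrate_names, remove_name):
    step(rows)
```

Running remove_name before migrate_names would raise a KeyError in the copy step, which is the dict equivalent of trying to read a column that has already been dropped.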
You can have a look at the 0003 and 0005 migrations if you wish, but they're reasonably standard schema migrations. We're interested in 0004_migrate_names.py - open it up, and you should see this:
    from south.db import db
    from django.db import models
    from southdemo.models import *

    class Migration:

        def forwards(self, orm):
            "Write your forwards migration here"

        def backwards(self, orm):
            "Write your backwards migration here"

        models = {
            ... cut for clarity ...
        }

        complete_apps = ['southdemo']
When you run ./manage.py startmigration with only an app name and a migration name, South will create a skeleton template of a migration for you to fill in - it's much easier than writing that boilerplate yourself.
Here, we're interested in data migrations, and so we'll be using the ORM. One thing you should note is that the ORM will not work during a dry run - if you're using MySQL, all migrations are dry-run before they are applied, and even if you're on another database, you should write your migrations so they stay portable. You have two ways of making ORM code not run during a dry run:
- Wrapping the ORM code in an if not db.dry_run: block
- Adding the no_dry_run = True statement to the Migration class
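The first option might look roughly like the sketch below. So that it runs without South installed, db here is a hypothetical stub with nothing but a dry_run flag (a real migration would import db from south.db), and the orm argument is just a list standing in for orm.Adopter.objects.all().

```python
class StubDB(object):
    """Hypothetical stand-in for south.db.db; only dry_run is modelled."""
    dry_run = False


db = StubDB()
touched = []  # records which adopters the guarded code processed


class Migration(object):

    def forwards(self, orm):
        # Skip all ORM work while South is only rehearsing the migration.
        if not db.dry_run:
            for adopter in orm:  # stand-in for orm.Adopter.objects.all()
                touched.append(adopter)


# During a dry run nothing is touched...
db.dry_run = True
Migration().forwards(["Bernie Smith"])
touched_in_dry_run = list(touched)

# ...but the real run processes every adopter.
db.dry_run = False
Migration().forwards(["Bernie Smith"])
```

The guard is handy when only part of a migration uses the ORM; when the whole thing does, the class-level flag is less noisy.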
We'll do the latter, since our entire migration will be using the ORM, and in general you should be making migrations this way anyway. Adding it to the class, we get this:
    from south.db import db
    from django.db import models
    from southdemo.models import *

    class Migration:

        no_dry_run = True

        def forwards(self, orm):
            "Write your forwards migration here"

        ...
Now, we can start writing our migration. You use the fake ORM just like the normal ORM, but prefix every model name with "orm." (so Adopter becomes orm.Adopter). Thus, our loop to copy the names across might look something like this:
    def forwards(self, orm):
        for adopter in orm.Adopter.objects.all():
            try:
                adopter.first_name, adopter.last_name = adopter.name.split(" ", 1)
            except ValueError:
                adopter.first_name, adopter.last_name = adopter.name, ""
            adopter.save()
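If you want to convince yourself of what the split does before running it against real rows, the same logic can be pulled out into a plain function and tried on a few names. split_name is a hypothetical helper for this sketch, not part of South or of the migration itself.

```python
def split_name(name):
    # Split on the first space only; everything after it is the last name.
    try:
        first_name, last_name = name.split(" ", 1)
    except ValueError:
        # No space at all: treat the whole thing as a first name.
        first_name, last_name = name, ""
    return first_name, last_name


examples = ["Bernie Smith", "Mary Jo Smith", "Cher"]
results = [split_name(name) for name in examples]
# results == [("Bernie", "Smith"), ("Mary", "Jo Smith"), ("Cher", "")]
```

Note the ValueError branch: unpacking a one-element list into two names is what raises it, so single-word names end up with an empty last name.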
At this point, save the migration.
Now, if you followed my advice before and created some names to test this all with, when you run ./manage.py migrate southdemo, it will fail. If you didn't add any names, it will succeed. But why?
The answer lies in the fact that our two new columns - first_name and last_name - are declared NOT NULL (that is, you didn't add null=True), and quite rightly so, since we want them to have to be specified. The only problem is that, when you're adding a new column to a table that already has some rows in it, what do you put in the new column as a value for each current row? You can't use NULL, since we said the column was NOT NULL, so most databases will quit and whinge at this point.
What we need to do is provide a default value for the new columns. South will pick up on any default="foo" declarations you have in your fields, and use those, but again, we don't want to have that on our model - we want the code to have to provide both fields, otherwise we could have mysterious bugs where we forget a field.
The solution is to edit the 0003_add_firstlast_name.py migration, and add the default declarations only there, so it never touches our models.py. Even better, we can use the keep_default=False option of db.add_column, which doesn't even leave that default value in the database, so any non-Django apps will also error when trying to add new rows without those two columns specified.
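You can watch the underlying database behaviour with a few lines of Python. This sketch uses the standard library's sqlite3 module as a stand-in database and runs the SQL by hand rather than through South; the table is a cut-down version of ours. It does not model keep_default=False, so the default stays in the schema here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE southdemo_adopter "
    "(id INTEGER PRIMARY KEY, name VARCHAR(50) NOT NULL)"
)
conn.execute("INSERT INTO southdemo_adopter (name) VALUES ('Bernie Smith')")

# Adding a NOT NULL column with no default: the database refuses.
try:
    conn.execute(
        "ALTER TABLE southdemo_adopter "
        "ADD COLUMN last_name VARCHAR(50) NOT NULL"
    )
    add_without_default_failed = False
except sqlite3.Error:
    add_without_default_failed = True

# With a default, the existing row gets the default value and all is well.
conn.execute(
    "ALTER TABLE southdemo_adopter "
    "ADD COLUMN last_name VARCHAR(50) NOT NULL DEFAULT ''"
)
existing_rows = conn.execute(
    "SELECT name, last_name FROM southdemo_adopter"
).fetchall()
```

Other databases word the error differently, but the cause is the same: there is no legal value to put in the existing rows.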
Change the forwards() method in 0003 to look like this, noting what we've changed:
    def forwards(self, orm):

        # Adding field 'Adopter.last_name'
        db.add_column('southdemo_adopter', 'last_name', models.CharField(max_length=50, default=""), keep_default=False)

        # Adding field 'Adopter.first_name'
        db.add_column('southdemo_adopter', 'first_name', models.CharField(max_length=50, default=""), keep_default=False)
You'll also need to edit the backwards() method in 0005 in the same way: it re-adds the NOT NULL 'name' column, so its add_column call needs a default too.
Now, save that, and we can finally run our migrations through:
    $ ./manage.py migrate southdemo
    Running migrations for southdemo:
     - Migrating forwards to 0005_remove_name.
     > southdemo: 0003_add_firstlast_name
       = ALTER TABLE "southdemo_adopter" ADD COLUMN "last_name" varchar(50) NOT NULL DEFAULT ''; []
       = ALTER TABLE "southdemo_adopter" ADD COLUMN "first_name" varchar(50) NOT NULL DEFAULT ''; []
     > southdemo: 0004_migrate_names
     > southdemo: 0005_remove_name
       = ALTER TABLE "southdemo_adopter" DROP COLUMN "name" CASCADE; []
     - Loading initial data for southdemo.
If you added some names via the admin at an earlier step, you can now open the admin up and see that they've been split (on the first space) into a first and last name. For name-splitting in particular, that's not entirely correct, but it handles around 99% of the names you'll encounter if you're in an average, English-speaking country (like I happen to be).
If you didn't add any names, and want to see the effects, roll back:
    $ ./manage.py migrate southdemo 0002
     - Soft matched migration 0002 to 0002_extend_lizard.
    Running migrations for southdemo:
     - Migrating backwards to just after 0002_extend_lizard.
     < southdemo: 0005_remove_name
       = ALTER TABLE "southdemo_adopter" ADD COLUMN "name" varchar(50) NOT NULL DEFAULT ''; []
     < southdemo: 0004_migrate_names
     < southdemo: 0003_add_firstlast_name
       = ALTER TABLE "southdemo_adopter" DROP COLUMN "last_name" CASCADE; []
       = ALTER TABLE "southdemo_adopter" DROP COLUMN "first_name" CASCADE; []
(If it fails with an IntegrityError here, you forgot to follow my advice to add a default to the add_column in 0005's backwards().)
Then, edit your models.py to reflect the current schema (a name field and no first_name or last_name fields), add in my admin.py from above, use the admin to add a few names, migrate forwards again, and change models.py back to the version with first_name, last_name, and no name.
For completeness, you may also want to add the backwards() method of 0004_migrate_names.py:
    def backwards(self, orm):
        for adopter in orm.Adopter.objects.all():
            # strip() drops the trailing space left behind when last_name is empty
            adopter.name = (adopter.first_name + " " + adopter.last_name).strip()
            adopter.save()
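The joining rule is also easy to check outside a migration. In this sketch join_name is a hypothetical helper, and it strips the result so that an adopter with an empty last name gets their original single-word name back; plain concatenation would leave a trailing space ("Cher " rather than "Cher").

```python
def join_name(first_name, last_name):
    # Rebuild the single name field; strip() handles an empty last name.
    return (first_name + " " + last_name).strip()


pairs = [("Bernie", "Smith"), ("Mary", "Jo Smith"), ("Cher", "")]
joined = [join_name(first, last) for first, last in pairs]
# joined == ["Bernie Smith", "Mary Jo Smith", "Cher"]
```

The catch is that strip() would also trim whitespace that was genuinely part of a stored name, which seems an acceptable trade for a name field.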
That's all for this initial walk through data migrations. You can do most things with the historical ORM you get passed in migrations; as well as accessing models on your own application with orm.ModelName, you can also access ones on other apps with orm['auth.User'], as long as they've been frozen with --freeze auth, for example.
The next part of the tutorial covers working in teams with South, and recommended workflows to ensure it's a very light overhead to keep migrations around and up-to-date.
