Ticket #393 (reopened defect)

Opened 5 years ago

Last modified 4 years ago

South executes unnecessary SQL when adding a field

Reported by: jacob@… Owned by: andrew
Priority: minor Milestone: 0.7
Component: migrations Version: 0.7-rc1
Keywords: Cc:

Description

South (appears to?) execute unnecessary SQL when adding new fields: it issues an ALTER COLUMN ... TYPE command that isn't needed.

This is under PostgreSQL/psycopg2.

To reproduce, start with this model:

class Person(models.Model):
    pass

Create an apply an initial migration, then add a field:

class Person(models.Model):
    name = models.CharField(max_length=200, default="")

Create a migration for the name field, then run migrate -v2:

 - Soft matched migration 0002 to 0002_add_person_name_field.
Running migrations for people:
 - Migrating forwards to 0002_add_person_name_field.
 > people:0002_add_person_name_field
   = ALTER TABLE "people_person" ADD COLUMN "name" varchar(200) NOT NULL DEFAULT ''; []
   = ALTER TABLE "people_person" ALTER COLUMN "name" TYPE varchar(200), ALTER COLUMN "name" DROP DEFAULT, ALTER COLUMN "name" SET NOT NULL; []

Note the second ALTER COLUMN ... TYPE command.

I doubt that this is dangerous, but it is unnecessary, and on databases under load it could make migrations take longer (since ALTER TYPE requires an exclusive lock).

Change History

comment:1 Changed 5 years ago by andrew

  • Status changed from new to closed
  • Resolution set to wontfix
  • Milestone set to 0.7

Yes, this is currently by design; when South adds the field, it has to do with with a DEFAULT clause (to fill in existing rows), but to match Django's syncdb schema, it will then modify the column to not have a default, using alter_column.

Since alter_column doesn't look at the database before executing, it will always execute the same alter statements, since doing them repeatedly is harmless. I agree that it means more database locks, but to be fair, if you're betting on the migration finishing three seconds faster than normal, or are trying to do it on a very busy live site, you're probably either doing it wrong, or have other issues.

Basically, I'd rather keep this around and have one set of codepaths than unnecessarily complicate things. If you have convincing arguments otherwise, I'm happy to re-open the ticket and fix it.

comment:2 Changed 4 years ago by Jason Culverhouse <jason+south@…>

  • Status changed from closed to reopened
  • Resolution wontfix deleted

I'd like to reopen this ticket

Let's pretend that the table modification is

class Person(models.Model):
    name = models.TextField(blank=True, null=True)

The issued statements are

ALTER TABLE "people_person" ADD COLUMN "name" text NULL;
ALTER TABLE "people_person" ALTER COLUMN "name" TYPE text, 
   ALTER COLUMN "name" DROP NOT NULL, 
   ALTER COLUMN "description" DROP DEFAULT;

All of these are unnecessary , however this statement in isolation will cause a table rewrite on postgres, i.e. the entire table will need to be read and rewritten to disk causing a lot of disk IO and potentially requiring 2x the size of the table on disk

ALTER TABLE "people_person" ALTER COLUMN "name" TYPE text

Note that using the --db-dry-run --verbosity=2 settings show that only the

ALTER TABLE "people_person" ADD COLUMN "name" text NULL;

Will be executed, when in-fact the second alter table statement is also issued.

comment:3 Changed 4 years ago by Jason Culverhouse <jason+south@…>

One more thing...

I find that if I modify the keep_default=False to keep_default=True in the migration file it will suppress the second DDL statement

db.add_column('people_person', 'description', self.gf('django.db.models.fields.TextField')(null=True, blank=True), keep_default=True)

I'm not sure of this is a "safe" way to use the api see: wiki:db.add_column

comment:4 Changed 4 years ago by andrew

The problem here is that South doesn't know that the column is already the right type; we do virtually no database introspection, as that keeps the code a lot more sane.

It is inconvenient that the statement causes Postgres to rewrite the table - that sounds like something they could fix - but without a major rewrite of South's code, I just don't see what we can do here.

Note: See TracTickets for help on using tickets.