Migration and Co-Existence

This section covers design decisions that are driven by how the existing PHP application is structured. All of those should be considered technical debt and eliminated when possible or when the original application is abandoned.

Database

The database is the main integration point. The current model structure was generated by inspecting the original data, so do not pay attention to its design. A list of considerations follows.

Database encoding

The original data is misencoded: although it is stored in fields declared as ISO-8859-2 (latin2), the data is in fact encoded as Windows-1250 (win1250/cp1250).

This is transparently handled by ddcz.models.magic; for original tables, MisencodedTextField or MisencodedCharField must be used.

New models respect the connection setting and store data as latin2. Once the old application is shut down, everything should be recoded to UTF-8, the way data is stored in the 21st century.

Warning

While encoding is handled transparently on the model level, this is not the case for database lookups.

All string lookups on a MisencodedCharField (or MisencodedTextField) have to use the Model.objects.get(field=value.encode("cp1250").decode("latin2")) syntax.
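The re-encoding step required for such lookups can be wrapped in a small helper. The following is a minimal sketch in plain Python; the function names to_misencoded and from_misencoded are hypothetical and are not part of ddcz. Only the encode/decode chain comes from the rule above.

```python
def to_misencoded(value: str) -> str:
    """Return the text as it looks when its cp1250 bytes are read as latin2.

    Hypothetical helper: the name is an assumption, the encode/decode
    chain is the one required for lookups on misencoded fields.
    """
    return value.encode("cp1250").decode("latin2")


def from_misencoded(value: str) -> str:
    """Reverse the transformation: latin2 view back to the real text."""
    return value.encode("latin2").decode("cp1250")


# "š" is byte 0x9A in cp1250, which latin2 reads as a control character,
# so the lookup value differs from the text a user typed.
lookup_value = to_misencoded("Příliš žluťoučký kůň")
```

A lookup would then read Model.objects.get(field=to_misencoded(value)), where Model and field are placeholders for a legacy model and one of its misencoded fields.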

Django’s (database model) migration strategy

Django provides a reasonable migration framework, which our application uses. The initial structure was generated using inspectdb, which automatically creates unmanaged models; these have been placed into ddcz.models.legacy.

When a model/table is incorporated into the application with all the bells and whistles required for it to actually run and be both readable and writable, it is moved into ddcz.models.used.

There is one problem: tables for unmanaged models are not created during the normal setup, hence tests fail and the application is unusable for anyone without access to a backup of the database structure. To work around this, there is a hack:

  • In the initial migration, the default value of managed is set depending on SETTINGS.IS_DATABASE_SEEDED. This setting has to reflect whether the database was restored from the original data
  • This means that the migration from an unmanaged to a managed model works correctly with a seeded database and is a “noop” migration for a non-seeded database
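The hack can be pictured with the following sketch. It is not the project's actual migration: the helper legacy_model_options is hypothetical, and it assumes that a seeded database already contains the legacy tables (so they stay unmanaged there), while an empty database needs Django to create them.

```python
def legacy_model_options(db_table: str, is_database_seeded: bool) -> dict:
    """Build the options for a legacy model in the initial migration.

    Assumption: on a seeded database the tables already exist, so Django
    must not create them (managed=False). On an empty database Django
    has to create them itself (managed=True).
    """
    return {"db_table": db_table, "managed": not is_database_seeded}


# In an initial migration, the result would be passed as the `options`
# argument of migrations.CreateModel, with the flag read from the
# IS_DATABASE_SEEDED setting.
seeded = legacy_model_options("uzivatele", is_database_seeded=True)
fresh = legacy_model_options("uzivatele", is_database_seeded=False)
```

With this shape, a fresh checkout gets all tables from the migrations alone, which is what makes the test suite runnable without the original data.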

User Model

In order to leverage Django’s authentication framework (meaning reasonable, forward-compatible safety), some tricks are needed.

The original data is stored in the uzivatele table. For usability, this is exposed as the UserProfile model, and an appropriate relation is used.

Warning

Always use ddcz.users.create_user for creating users, instead of django.contrib.auth.models.User.objects.create_user.

Warning

To avoid the need for a complete database migration, django.contrib.auth.models.User is not prepopulated; as long as the old application exists, the migration is handled transparently on user login.

Hence, when displaying user data (e.g. in user stats), the application must use only the UserProfile model.

During the initial setup, the arbitrary value of 20000 was selected as the auto_increment value of django.contrib.auth.models.User, both to distinguish pre-migration users from post-migration users and to allow old users to retain their IDs.
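As an illustration of how this boundary can be used, here is a minimal sketch. The constant and function names are hypothetical, and the check assumes that every ID carried over from the old application is lower than 20000.

```python
# Boundary taken from the text above; the constant name is hypothetical.
FIRST_POST_MIGRATION_USER_ID = 20000


def is_legacy_user_id(user_id: int) -> bool:
    """True for IDs assumed to be carried over from the old application.

    Assumption: all legacy IDs are below the auto_increment start value,
    so any ID at or above it belongs to a post-migration user.
    """
    return user_id < FIRST_POST_MIGRATION_USER_ID
```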

Author Model

In order to resolve the confusion between the article source (zdroj, zdrojmail) and the author of the writing (autor, autmail), we are creating a new model, Author. It contains foreign keys where they could be discovered and allows future normalization of the data.

Random discoveries in the legacy data model

  • The autor and autmail attributes for authors are denormalized. The author’s email in autmail is never updated when a user changes their email in settings