Database Seeding
As described in Do's and Don'ts, databases in Uffizzi should be deployed as containers to your ephemeral environments. Because each ephemeral environment starts from a clean slate, the database must be seeded with data before the application can use it. This matters most for applications that cannot function without a populated database.
Two Strategies
Two primary strategies exist for seeding databases in ephemeral environments:
- Loading an SQL dump file when the container is initiated
- Leveraging language- or framework-specific migration tools
The choice between these methods depends on factors such as the application's needs and how often the data changes. In either case, the seed data is stored in the repository or in object storage.
Storage Considerations for Seed Data
Git Repository Storage
Smaller datasets that seldom change may be stored in a Git repository, but Git hosts limit file sizes. GitHub, for instance, warns on files larger than 50 MB and blocks files larger than 100 MB, so bigger files require Git Large File Storage (Git LFS), which can be cumbersome if the seed data is frequently updated.
Object Storage
For larger or more dynamic datasets, the recommended storage solution is an object storage service like Amazon S3 or Google Cloud Storage, which offers scalability and simplicity.
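As a rough illustration of pulling seed data from object storage, the sketch below downloads a dump over HTTP(S) into a local directory using only the Python standard library. It assumes the object is reachable by URL, for example a public object or a pre-signed URL. The function name fetch_seed and any URL passed to it are hypothetical and not part of Uffizzi.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def fetch_seed(url: str, dest_dir: str) -> Path:
    """Download a seed file from `url` into `dest_dir` and return its local path."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Name the local file after the last path segment of the URL.
    filename = Path(urlparse(url).path).name
    target = dest / filename
    # Stream the response to disk so large dumps are not held in memory.
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

A pipeline step could call fetch_seed(seed_url, "./initdb") before the database container starts, where seed_url is supplied by your own configuration. Private buckets would typically need the provider's SDK or a signed URL instead of a plain request.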
Strategy 1: SQL Dump File Loading
The first seeding strategy involves loading an SQL dump file during the initialization of a container. Official images for databases such as Postgres, MySQL, and MariaDB on Docker Hub are configured to automatically populate the database at startup using files in a specific directory. These database containers execute the files in /docker-entrypoint-initdb.d when starting up, so you can store initialization files (*.sql, *.sql.gz, or *.sh) in the /docker-entrypoint-initdb.d directory, mounted as a Docker volume.
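A minimal sketch of preparing such an initialization file: the helper below writes SQL statements to a gzip-compressed *.sql.gz file in a local directory that you would then mount at /docker-entrypoint-initdb.d. The function name write_seed_file, the directory, and the file name are assumptions made for illustration. The numeric prefix suggested in the usage note relies on the official images running init files in sorted name order.

```python
import gzip
from pathlib import Path


def write_seed_file(initdb_dir: str, name: str, statements: list[str]) -> Path:
    """Write SQL statements to a gzip-compressed `<name>.sql.gz` file in `initdb_dir`."""
    directory = Path(initdb_dir)
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / f"{name}.sql.gz"
    # Put one statement per line, each terminated by exactly one semicolon.
    sql = "".join(f"{stmt.rstrip(';')};\n" for stmt in statements)
    with gzip.open(target, "wt", encoding="utf-8") as handle:
        handle.write(sql)
    return target
```

For example, write_seed_file("./initdb", "01-seed", [...]) would produce ./initdb/01-seed.sql.gz. A prefix such as 01- keeps the load order predictable when several files are present.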
Pros
- Applicable across various SQL databases (e.g., Postgres, MySQL, MariaDB)
- Relies on database containers to self-seed upon initialization
Cons
- The dump file must be updated whenever the database schema changes
Strategy 2: Migration Tools
The second strategy recommends using a migration tool specific to the application's language or framework, like Django's ORM or Rails' Active Record. This is particularly useful for applications that frequently update their database schemas, as the tool can seamlessly perform migrations. In both Django and Rails, for instance, there are specific commands and files (such as db/seeds.rb for Rails) that facilitate the seeding process. However, this approach may require additional logic in the application or the continuous integration pipeline.
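To make the migrate-then-seed pattern concrete, here is a small stand-in written with Python's built-in sqlite3 module. It is not how Django or Rails implement migrations. It only sketches the general idea of recording applied schema versions and then inserting seed rows idempotently. The table names, migrations, and seed rows are all hypothetical.

```python
import sqlite3

# Hypothetical migrations, applied in order; each entry is (version, SQL).
MIGRATIONS = [
    (1, "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL)"),
    (2, "ALTER TABLE users ADD COLUMN display_name TEXT"),
]

# Hypothetical seed rows: (email, display_name).
SEED_USERS = [("demo@example.com", "Demo User")]


def migrate(conn: sqlite3.Connection) -> int:
    """Apply migrations not yet recorded and return the latest schema version."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations (version INTEGER PRIMARY KEY)"
    )
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_migrations")}
    for version, sql in MIGRATIONS:
        if version not in applied:
            conn.execute(sql)
            conn.execute(
                "INSERT INTO schema_migrations (version) VALUES (?)", (version,)
            )
    conn.commit()
    return max(version for version, _ in MIGRATIONS)


def seed(conn: sqlite3.Connection) -> None:
    """Insert seed rows; re-running does not create duplicates."""
    conn.executemany(
        "INSERT OR IGNORE INTO users (email, display_name) VALUES (?, ?)",
        SEED_USERS,
    )
    conn.commit()
```

Calling migrate(conn) followed by seed(conn) at container start, or from a CI step, brings a fresh database to the current schema and loads the seed rows. Both functions are safe to run again against an already-initialized database.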
Pros
- Facilitates both database seeding and schema migration
- Often integrated into the application language/framework
Cons
- May not be available for all languages/frameworks
- Might require additional application or CI pipeline logic
Examples of Migration Tools
- Django: Uses the manage.py loaddata command
- Go: Utilizes Atlas CLI for declarative migrations
- Rails: Leverages built-in migrations and db/seeds.rb for seeding
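For the Django case, manage.py loaddata reads fixture files, which in their JSON form are lists of records with model, pk, and fields keys. The sketch below builds such a file from plain dictionaries using only the standard library. The model label accounts.user, the row contents, and the helper names are assumptions for illustration.

```python
import json
from pathlib import Path


def build_fixture(model: str, rows: list[dict]) -> list[dict]:
    """Convert rows (each with an "id" key) into Django-style fixture records."""
    return [
        {
            "model": model,
            "pk": row["id"],
            # Every key except "id" becomes a model field value.
            "fields": {key: value for key, value in row.items() if key != "id"},
        }
        for row in rows
    ]


def write_fixture(path: str, records: list[dict]) -> None:
    """Serialize fixture records to a JSON file."""
    Path(path).write_text(json.dumps(records, indent=2), encoding="utf-8")
```

After writing the records to a file such as seed.json inside an app's fixtures directory, running python manage.py loaddata seed.json would load them, provided the model label and field names match your own models.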