Introduction
Recently at work, I had to decide which database to use for parts of a new AI agentic platform. The options were PostgreSQL and SQLite. After careful consideration, I chose SQLite for its simplicity and ease of setup. However, I soon realized that SQLite is much more powerful than I initially thought. In this blog post, I will share my experience and insights on how to harness its full potential.
TL;DR
Use `STRICT` tables to enforce data type constraints on columns. Without `STRICT`, SQLite only applies type affinity: it attempts to convert values to the column's declared affinity, and you may end up with the unexpected situation where what gets stored depends on what you insert, not on the declared type.
```sql
CREATE TABLE IF NOT EXISTS users (
  id INTEGER PRIMARY KEY,
  name TEXT NOT NULL,
  age INTEGER CHECK(age >= 0)
) STRICT;
```
Update the default pragmas to enforce foreign key constraints and improve performance. You can find the full list of pragmas in the SQLite documentation but here are some that I commonly use:
```sql
PRAGMA journal_mode = WAL;
PRAGMA journal_size_limit = 6144000;
PRAGMA synchronous = NORMAL;
PRAGMA temp_store = FILE;
PRAGMA foreign_keys = ON;
PRAGMA busy_timeout = 15000;
PRAGMA cache_size = -1000000; -- ~1GB (negative values are in KiB)
```
Use `WITHOUT ROWID` tables for tables that don't need a `ROWID` column, which can improve performance and reduce storage space. In the example below, the `sessions` table uses `session_id` as the primary key instead of the default `ROWID` that is implicitly added to every table.
```sql
CREATE TABLE IF NOT EXISTS sessions (
  session_id TEXT PRIMARY KEY,
  user_id INTEGER NOT NULL,
  created_at TEXT DEFAULT CURRENT_TIMESTAMP, -- STRICT tables only allow INT, INTEGER, REAL, TEXT, BLOB, ANY
  FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
) STRICT, WITHOUT ROWID;
```
Use `BEGIN IMMEDIATE` for transactions to ensure the transaction acquires the write lock immediately, avoiding potential locking issues. Roll back the transaction if any error occurs.
```sql
BEGIN IMMEDIATE;
-- SQL operations here
COMMIT;
```
Use `UPSERT` to handle conflicts during insert operations. This lets you update existing records instead of failing the operation.
```sql
INSERT INTO users (id, name, age) VALUES (1, 'Daniel', 25)
ON CONFLICT(id) DO UPDATE SET name = excluded.name, age = excluded.age;
```
I really like the idea of creating an additional context table per main table, which I have come to call meta tables. A meta table, for me, is one that stores relevant statistics that are queried often, because as your main tables grow, querying them directly becomes more complex and slower. The meta table is usually kept up to date by triggers on the main table.
```sql
CREATE TABLE IF NOT EXISTS users_meta (
  id INTEGER PRIMARY KEY CHECK(id = 1),
  total_users INTEGER DEFAULT 0,
  attributes TEXT DEFAULT '{}', -- SQLite has no JSON type, so JSON strings go in a TEXT column
  last_updated TEXT DEFAULT (datetime('now')) -- no native TIMESTAMP type either, so TEXT with datetime()
) STRICT, WITHOUT ROWID;
```
Every now and then, I like to check the health of the database. The method below performs either a quick check using `PRAGMA quick_check` or a thorough one using `PRAGMA integrity_check`.
```typescript
public healthCheck(quick: boolean): SqliteDBResult {
  try {
    if (quick) {
      this.db.exec("PRAGMA quick_check");
    } else {
      this.db.exec("PRAGMA integrity_check");
    }
    return { success: true };
  } catch (error: any) {
    this.logger.error("Database health check failed:", error);
    return {
      success: false,
      message: error?.message || "Error checking database health",
    };
  }
}
```
Why SQLite?
Well, I chose SQLite for several reasons, namely:

- **Simplicity**: SQLite is a serverless database, which means it doesn't require a separate server process like PostgreSQL. This makes it easy to set up and use, especially for small to medium-sized applications.
- **Lightweight**: SQLite is lightweight and easily embedded into applications, which makes it ideal for parts of the platform that need a small footprint, such as log retention and analytics, dynamic data schemas for clients, and so on.
- **Isolation**: I wanted to isolate certain parts of the platform from the main database. SQLite lets me create separate databases for different parts of the application. In this instance, each agent's context is kept isolated, as one SQLite database is created for each system on the platform.
Taming the dreaded SQLITE_BUSY error
One of the challenges I faced when using SQLite was a seemingly random `SQLITE_BUSY` error. I found out that this error occurs when the database is locked by one connection and cannot be accessed by another.
To mitigate this, I set `PRAGMA busy_timeout` to a higher value such as 15 seconds and also implemented a retry mechanism in my application's database access layer. This allows the application to retry the operation after a short delay if it encounters a `SQLITE_BUSY` error.
```typescript
public insert(
  tableRefName: string,
  data: SchemaInsertData[],
  retries: number = 3
): SqliteDBResult {
  let attempt = 0;

  while (attempt < retries) {
    try {
      if (!this.isWrite || !this.writeDB) {
        return {
          success: false,
          message: "Cannot insert in read-only mode or no write database connection",
        };
      }

      this.writeDB.exec("BEGIN IMMEDIATE");

      /**
       * Rest of the insert logic
       */

      this.writeDB.exec("COMMIT");

      return { success: true, message: "Data inserted successfully", data: { /* ... */ } };
    } catch (error: any) {
      this.logger.error(`Error inserting data (attempt ${attempt + 1}): `, error);

      try {
        this.writeDB.exec("ROLLBACK");
      } catch (err) {
        this.logger.error("Error rolling back transaction: ", err);
      }

      const isBusy =
        error.code === "SQLITE_BUSY" ||
        error.message?.includes("database is locked") ||
        error.message?.includes("SQLITE_BUSY");

      if (isBusy && attempt < retries - 1) {
        attempt++;
        const waitMs = 100 * Math.pow(2, attempt);
        this.logger.debug(`Retrying insert in ${waitMs}ms...`);
        // Blocking wait: better-sqlite3 is synchronous, so we deliberately
        // block this thread for the backoff interval before retrying
        const start = Date.now();
        while (Date.now() - start < waitMs) {
          // busy-wait
        }
        continue;
      }

      if (isBusy) {
        return {
          success: false,
          message: `Database is busy after ${retries} attempts. Please try again later.`,
        };
      }

      if (error.message?.includes("UNIQUE constraint failed")) {
        return { success: false, message: "Unique constraint violation" };
      }

      return { success: false, message: error.message };
    }
  }

  return { success: false, message: "Unexpected error: maximum retries exceeded" };
}
```
In this snippet, the retry mechanism attempts the insert operation up to 3 times if it encounters a `SQLITE_BUSY` error. The delay between retries increases exponentially to give the database more time to become available.
It works in this case because of the isolation: every AI agent has its own SQLite database file for various tasks, so contention is minimized. However, if multiple processes or threads access the same database file, you may still encounter `SQLITE_BUSY` errors, but this is a good start as it solved my problem.
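For async code paths, the blocking busy-wait above can be replaced with a sleep-based backoff. This is a generic sketch with illustrative names (`withRetry`, `isRetryable`), not part of any library:

```typescript
// A generic async retry helper with exponential backoff.
// The delay schedule (100ms, 200ms, 400ms, ...) mirrors the insert() example.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(
  op: () => T,
  isRetryable: (err: unknown) => boolean,
  retries = 3,
  baseDelayMs = 100
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return op();
    } catch (err) {
      // Give up on non-busy errors or once the retry budget is spent
      if (!isRetryable(err) || attempt >= retries - 1) throw err;
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}
```

The operation itself stays synchronous (as better-sqlite3 calls are), but the waiting between attempts no longer blocks the event loop.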
Again, to further eliminate `SQLITE_BUSY` errors, I used a single database connection for all write operations, protected by a mutex or a similar synchronization mechanism that queues writes at the application layer. This reduces reliance on retry mechanisms and `busy_timeout`.
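A minimal sketch of such an application-level write mutex, built as a promise chain (the `WriteQueue` name is my own, not a library type):

```typescript
// Serializes writes at the application layer: each task starts only after
// the previous one has settled, so at most one write runs at a time.
class WriteQueue {
  private tail: Promise<unknown> = Promise.resolve();

  run<T>(task: () => Promise<T> | T): Promise<T> {
    const next = this.tail.then(() => task());
    // Keep the chain alive even if a task rejects
    this.tail = next.catch(() => {});
    return next;
  }
}
```

All writes then go through `queue.run(() => ...)`, so only one write ever touches the single write connection at a time.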
I implemented this using two connection pools: a read pool scaled to the number of available CPU cores, and a write pool limited to a single connection. This ensures all write operations are serialized, preventing the concurrent writes that could lead to database locks and `SQLITE_BUSY` errors.
```typescript
export class SqliteDB {
  private writeDB: BetterSqlite3Database | null;
  private readDB: BetterSqlite3Database;
  private dbPath: string;
  private isWrite: boolean;

  constructor(options: SqliteDBOptions) {
    this.dbPath = options.dbPath;
    this.isWrite = options.connectionType !== "read";

    if (this.isWrite) {
      this.writeDB = new Database(this.dbPath, {
        readonly: false,
        fileMustExist: false,
        timeout: options.pragmas?.busyTimeout ?? 5000,
      } as BetterSqlite3Options);
    } else {
      this.writeDB = null;
    }

    this.readDB = new Database(this.dbPath, {
      readonly: options.readOnly ?? false,
      fileMustExist: false,
      timeout: options.pragmas?.busyTimeout ?? 5000,
    } as BetterSqlite3Options);

    if (this.writeDB) {
      this.writeDB.configure("maxConnections", 1);
    }
    const maxReadConns = options.maxReadConnections ?? Math.max(4, cpus().length);
    this.readDB.configure("maxConnections", maxReadConns);

    if (options.pragmas) {
      this.setPragmas(options.pragmas);
    } else {
      this.setDefaultPragmas();
    }
  }

  // Rest of the class methods...
}
```
Transactions in SQLite
One of SQLite’s biggest pitfalls is its default transaction behavior. For example, consider the following incorrect code snippet:
```sql
-- Incorrect: default DEFERRED transaction (prone to SQLITE_BUSY)
BEGIN;
-- Read operations
SELECT * FROM users WHERE id = 1;
-- Write operation triggers a lock upgrade, may cause immediate SQLITE_BUSY
UPDATE users SET name = 'Ope' WHERE id = 1;
COMMIT;
```
The snippet is prone to the dreaded `SQLITE_BUSY` error because it uses the default DEFERRED transaction mode. In this mode, the transaction starts as a read-only transaction and only upgrades to a write transaction when a write operation is executed. If another connection holds a lock on the database at that moment, SQLite immediately returns a `SQLITE_BUSY` error, ignoring any busy timeout settings.
To avoid this, I always start transactions with `BEGIN IMMEDIATE` instead of just `BEGIN`. This ensures SQLite respects the `busy_timeout` setting if the database is locked at the transaction's start. The correct code snippet would look like this:
```sql
-- Correct: IMMEDIATE transaction (respects busy_timeout)
BEGIN IMMEDIATE;
-- Read operations
SELECT * FROM users WHERE id = 1;
-- Write operation, lock acquired at transaction start
UPDATE users SET name = 'Ope' WHERE id = 1;
COMMIT;
```
Tip (Transaction Best Practices)
Always batch multiple write operations into a single transaction to minimize lock contention and improve performance.
Optimizing SQLite for deployment
When deploying SQLite in any environment, there are several optimizations you can make to improve performance and reliability. Here are some tips I found useful:
- **Use WAL mode**: Write-Ahead Logging (WAL) mode can improve performance by allowing reads to proceed concurrently with a writer. You can enable it by setting the `journal_mode` pragma once you connect to the database:

```sql
PRAGMA journal_mode = WAL;
```
What the WAL mode does:

> The WAL journaling mode uses a write-ahead log instead of a rollback journal to implement transactions. The WAL journaling mode is persistent; after being set it stays in effect across multiple database connections and after closing and reopening the database. A database in WAL journaling mode can only be accessed by SQLite version 3.7.0 (2010-07-21) or later.
- **Set synchronous mode**: Setting the `synchronous` pragma to `NORMAL` or `OFF` can improve performance by reducing the number of disk syncs. However, this may increase the risk of losing recent writes in the event of a power failure or crash. I usually set it to `NORMAL` as a good compromise between performance and reliability.

```sql
PRAGMA synchronous = NORMAL;
```
- **Increase cache size**: Increasing the cache size can improve performance by reducing the number of disk reads. You can set the `cache_size` pragma to a higher value based on your available memory. I usually set it to 1GB for my use case, but you can adjust it as needed. Note that a negative value means the size is given in KiB rather than in pages, and the setting is not persistent: it only applies to the current connection and reverts to the default when the database is closed.

```sql
PRAGMA cache_size = -1000000; -- ~1GB cache (negative = KiB), adjust as needed
```
- **Enforce foreign key constraints**: Enforcing foreign key constraints helps maintain data integrity and prevents orphaned records. Enable them by setting the `foreign_keys` pragma to `ON`, as SQLite does not enforce foreign key constraints by default.

```sql
PRAGMA foreign_keys = ON;
```
- **Use prepared statements**: Prepared statements improve performance by reducing the overhead of parsing and compiling SQL statements. This is especially useful for frequently executed queries.

```typescript
const userInsertStmt = db.prepare("INSERT INTO users (name, age) VALUES (?, ?)");
```
Backups for SQLite
When using SQLite with WAL mode, regular backups are essential for data durability and portability. I use `VACUUM INTO` to create a consistent backup file without altering the main database. This command is compatible with WAL mode and produces a new database file with all data, optimized and free of fragmentation.
Here’s an example of creating a backup:
```sql
VACUUM INTO 'backup.db'; -- creates a backup of the current database in 'backup.db'
```
This command copies the entire database to `backup.db`, ensuring a consistent snapshot even with active connections in WAL mode. The resulting file can then be uploaded to an external storage service like AWS S3, Azure Blob Storage, Google Cloud Storage, or any storage provider you have.
Make sure you schedule your backups during low-traffic periods to minimize performance impact, and verify backups periodically by restoring them and checking integrity with `PRAGMA integrity_check`.
Note (Backup Best Practices)
You might come across the `sqlite3` command-line tool's `.dump` command, an alternative that produces a SQL-script backup. I would say it's less efficient for large databases, since restoring requires re-parsing all that SQL. Just use `VACUUM INTO` for binary backups in WAL mode.
Conclusion
SQLite is not the toy database many people think it is. It is a powerful, lightweight, and easy-to-use database that fits a wide range of applications. By following the tips and practices outlined in this blog post, I hope you can harness the full potential of SQLite and build robust and efficient applications.
See you next time, Daniel out!