Why Would You Want a Circular Log?
Say you want to log messages, but you don't need to keep old ones. If you were logging to files, you could use a log rotation program. But what if you're logging to a database?
Couldn't you just regularly truncate the table? Well, that's what we tried at first. But when someone wanted to see a message from 22:00 the night before, and the truncation had run at midnight, they were out of luck. What we wanted was a way to keep at least 24 hours worth of entries at all times.
Features of the Circular Log
- Each log entry requires only a single SQL statement.
- The maximum number of rows in the table can be strictly controlled (and resized).
- It's fast.
- There's no maintenance required.
Rolling Your Own
First, create the log table.
CREATE TABLE circular_log_table (
log_id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
row_id INTEGER UNSIGNED NOT NULL UNIQUE KEY,
timestamp TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
Next, decide on the number of rows you'd like to retain. We'll call that number MAX_CIRCULAR_LOG_ROWS.
Finally, to add new rows:
REPLACE INTO circular_log_table
SET row_id = (SELECT COALESCE(MAX(log_id), 0) % MAX_CIRCULAR_LOG_ROWS + 1
FROM circular_log_table AS t),
payload = 'I like turtles.'
The payload column is here as an example. Any number of additional columns of any type should work, as long as they're all set in the REPLACE statement.
How Does it Work?
If you've used Linux, you're probably familiar with one circular log: the kernel's ring buffer, accessed and controlled through dmesg. The buffer has a fixed size. Once it fills up, it loops back on itself and starts overwriting old messages with new ones. That's essentially what happens with the MySQL log table as well.
Carrying the analogy dangerously far: the modulo of the log_id and the buffer size acts as a pointer to the address (row_id) in the table to write to.
Watching it In Action
Let's say that MAX_CIRCULAR_LOG_ROWS was set to 100. When there's no rows in the table, the subselect will give us 1 for the row_id (
COALESCE(MAX(log_id), 0) % 100 + 1=
COALESCE(NULL,0) % 100 + 1=
0 % 100 + 1=
1). This means that the first row inserted will have log_id = 1, row_id = 1. So far so good.
When it's time to insert the second row,
MAX(log_id)will evaluate to 1 (since we haven't yet inserted the second row) and so the row_id will be 2, which again matches the log_id of the row upon insert (log_id = 2, row_id = 2).
This proceeds as expected up until 100 rows have been inserted into the table (log_id = 100, row_id = 100).
On insertion of the 101th row, row_id rotates back to 1. (
COALESCE(MAX(log_id), 0) % 100 + 1=
100 % 100 + 1=
1) Now, when the row is inserted it, due to the unique constraint on row_id, it will replace the row with row_id = 1 and the new row will have log_id = 101, row_id = 1.
The process continues to repeat itself now thanks to the modulo. At log_id 201 we'll be back to row_id 1, and again at 301, ad infinitum.
Resizing the Log Table
To grow the table, just increase MAX_CIRCULAR_LOG_ROWS. There will be a lag until the row_id reaches the old MAX_CIRCULAR_LOG_ROWS and then it will grow to the new limit.
To shrink the table, decrease MAX_CIRCULAR_LOG_ROWS and then DELETE all rows with
log_id < MAX_CIRCULAR_LOG_ROWS. Again, there will be a lag until all entries are continously ordered without gaps. And keep in mind that the DELETE could lock the table and take a while.
Is It Stable?
We've been using this technique for almost 2 years now on a 2,000,000-row table with a dozen columns and multiple composite indexes. The log_id is up to 615,069,600 at the time I write this. The table has accumulated some overhead, but the overhead is still a fraction of either the table's data or index size.
Eventually the log_id column will be exhausted, but even at 10,000 inserts per second it'll take 3.5 billion years.