What tables were touched within given range of wal LSN?

We had a case recently where one of our DBs failed over to a new primary. To this server (old primary, and then new primary) we had connection from some kind of CDC tool (Debezium, I think).

The thing is that while there was failover, this tool read (using logical decoding) changes on old primary to location 1F1F/4ADC3348, and on new db, it connected from location 1F1F/4ADC4038. Which means that there was some WAL that was not decoded and sent to Debezium.

This is fixable, but we need to know what tables were touched in the WAL stream, that Debezium didn't see, if any. Perhaps we could also get ids/location of the rows that were there?

First things first – let's see how much data we're talking about. Luckily in Pg it's rather simple:

=$ SELECT '1F1F/4ADC4038'::pg_lsn - '1F1F/4ADC3348'::pg_lsn;
 ?COLUMN? 
----------
     3312
(1 ROW)

This tells us that there is ~ 3kB worth of data in WAL that we should look at. Not much. We can definitely live with it.

To get the list of tables we will need to get WAL files that contained it. We can get the wal file names using this query (ran on new primary):

=$ SELECT pg_walfile_name('1F1F/4ADC3348'), pg_walfile_name('1F1F/4ADC4038');
     pg_walfile_name      │     pg_walfile_name
══════════════════════════╪══════════════════════════
 0000000200001F1F0000004A │ 0000000200001F1F0000004A
(1 ROW)

So, we see that all wal that we need is in this single wal file. Which is awesome, as it will make my life a bit easier.

But, for completeness sake, let's assume that starting lsn is 1F1F/48FC3348.

In such case, we will need to get more wal, and also we're dealing with much more data:

=$ SELECT
    pg_walfile_name('1F1F/48FC3348'),
    pg_walfile_name('1F1F/4ADC4038'),
    '1F1F/4ADC4038'::pg_lsn - '1F1F/48FC3348'::pg_lsn;
     pg_walfile_name      |     pg_walfile_name      | ?COLUMN? 
--------------------------+--------------------------+----------
 0000000200001F1F00000048 | 0000000200001F1F0000004A | 31460592
(1 ROW)

Knowing how wal files are named in this example I would need to get files:

  • 0000000200001F1F00000048
  • 0000000200001F1F00000049
  • 0000000200001F1F0000004A

(note: if this was “around" failover, it's possible that first, or second, wal file would not have timeline (prefix: 00000002) 2, but rather “1" – which might be important when restoring the file from walarchive)

But for now, let's do the thing on small range: 1F1F/4ADC4038 – 1F1F/4ADC3348, with single wal file: 0000000200001F1F0000004A

We need this file somewhere, ideally on the server that is currently primary.

It might be in pg_wal/ directory inside $PGDATA, but it is also possible that it isn't there. In such case we will need to recover this file from wal archive. How to do it depends on how you archive, and it's definitely out of scope for this post.

So, let's assume I'm in directory ~postgres/wal-tables, and the wal is here:

postgres :: ~/wal-TABLES =$ ls -l
total 16384
-rw------- 1 postgres postgres 16777216 Jul 19 13:22 0000000200001F1F0000004A

Next thing, we need to get pg_waldump program. It's usually installed, though not always available just by typing pg_waldump. For example, on debian/ubuntu, with Pg packages from PGDG, this program is in /usr/lib/postgresql/PG_VERSION/bin directory. So, in my case it will be:

postgres :: ~/wal-TABLES =$ /usr/lib/postgresql/14/bin/pg_waldump --help
pg_waldump decodes AND displays PostgreSQL write-ahead logs FOR debugging.

Now, to see what is in WAL, within given range of lsns, I would use this command:

postgres :: ~/wal-TABLES =$ /usr/lib/postgresql/14/bin/pg_waldump --start=1F1F/4ADC3348 --end=1F1F/4ADC4038 0000000200001F1F0000004A 0000000200001F1F0000004A
rmgr: XLOG        len (rec/tot):     51/   944, tx:          0, lsn: 1F1F/4ADC3348, prev 1F1F/4ADC3318, DESC: FPI_FOR_HINT , blkref #0: rel 17997/16669/42718133 blk 3409 FPW
rmgr: Heap        len (rec/tot):     54/    54, tx: 3056068701, lsn: 1F1F/4ADC36F8, prev 1F1F/4ADC3348, DESC: LOCK off 14: xid 3056068701: flags 0x00 LOCK_ONLY EXCL_LOCK KEYS_UPDATED , blkref #0: rel 17997/16669/42718133 blk 3409
rmgr: TRANSACTION len (rec/tot):     46/    46, tx: 3056068701, lsn: 1F1F/4ADC3730, prev 1F1F/4ADC36F8, DESC: COMMIT 2024-07-14 16:05:14.585662 UTC
rmgr: Heap2       len (rec/tot):     56/    56, tx:          0, lsn: 1F1F/4ADC3760, prev 1F1F/4ADC3730, DESC: PRUNE latestRemovedXid 3056060590 nredirected 0 ndead 1, blkref #0: rel 17997/16669/42719825 blk 3448
rmgr: Heap2       len (rec/tot):     58/    58, tx:          0, lsn: 1F1F/4ADC3798, prev 1F1F/4ADC3760, DESC: PRUNE latestRemovedXid 3056066930 nredirected 0 ndead 1, blkref #0: rel 17997/16669/42719825 blk 839
rmgr: Heap        len (rec/tot):    266/   266, tx: 3056068702, lsn: 1F1F/4ADC37D8, prev 1F1F/4ADC3798, DESC: HOT_UPDATE off 24 xmax 3056068702 flags 0x10 ; NEW off 25 xmax 0, blkref #0: rel 17997/16669/42718028 blk 118384
rmgr: TRANSACTION len (rec/tot):     46/    46, tx: 3056068702, lsn: 1F1F/4ADC38E8, prev 1F1F/4ADC37D8, DESC: COMMIT 2024-07-14 16:05:14.597812 UTC
rmgr: XLOG        len (rec/tot):     42/    42, tx:          0, lsn: 1F1F/4ADC3918, prev 1F1F/4ADC38E8, DESC: END_OF_RECOVERY tli 2; prev tli 1; TIME 2024-07-14 16:08:14.677533 UTC
rmgr: XLOG        len (rec/tot):     51/  1633, tx:          0, lsn: 1F1F/4ADC3948, prev 1F1F/4ADC3918, DESC: FPI_FOR_HINT , blkref #0: rel 1663/13726/1259 blk 8 FPW
rmgr: Standby     len (rec/tot):     50/    50, tx:          0, lsn: 1F1F/4ADC3FB0, prev 1F1F/4ADC3948, DESC: RUNNING_XACTS nextXid 3056068703 latestCompletedXid 3056068702 oldestRunningXid 3056068703
rmgr: Standby     len (rec/tot):     50/    50, tx:          0, lsn: 1F1F/4ADC3FE8, prev 1F1F/4ADC3FB0, DESC: RUNNING_XACTS nextXid 3056068703 latestCompletedXid 3056068702 oldestRunningXid 3056068703

The two final arguments are start wal file, and end wal file. In my case, it's the same wal file (so I could just write it once), but in case of larger range, you have to provide both start and end.

Now, this is a lot of text, and most of it isn't really interesting (for us, for now).

Since I need to know about tables that were modified, I just need to look at Heap/Heap2 lines in the output:

postgres :: ~/wal-TABLES =$ /usr/lib/postgresql/14/bin/pg_waldump --start=1F1F/4ADC3348 --end=1F1F/4ADC4038 0000000200001F1F0000004A 0000000200001F1F0000004A | grep Heap
rmgr: Heap        len (rec/tot):     54/    54, tx: 3056068701, lsn: 1F1F/4ADC36F8, prev 1F1F/4ADC3348, DESC: LOCK off 14: xid 3056068701: flags 0x00 LOCK_ONLY EXCL_LOCK KEYS_UPDATED , blkref #0: rel 17997/16669/42718133 blk 3409
rmgr: Heap2       len (rec/tot):     56/    56, tx:          0, lsn: 1F1F/4ADC3760, prev 1F1F/4ADC3730, DESC: PRUNE latestRemovedXid 3056060590 nredirected 0 ndead 1, blkref #0: rel 17997/16669/42719825 blk 3448
rmgr: Heap2       len (rec/tot):     58/    58, tx:          0, lsn: 1F1F/4ADC3798, prev 1F1F/4ADC3760, DESC: PRUNE latestRemovedXid 3056066930 nredirected 0 ndead 1, blkref #0: rel 17997/16669/42719825 blk 839
rmgr: Heap        len (rec/tot):    266/   266, tx: 3056068702, lsn: 1F1F/4ADC37D8, prev 1F1F/4ADC3798, DESC: HOT_UPDATE off 24 xmax 3056068702 flags 0x10 ; NEW off 25 xmax 0, blkref #0: rel 17997/16669/42718028 blk 118384

Now, we're getting closer. We see there, after desc: description of what has happened. Generally, I would look for things that relate to normal write operations in SQL. Quick grep in sample wal showed that first word of description, in my case, is one of:

  • DELETE
  • HEAP_CONFIRM
  • HOT_UPDATE
  • INPLACE
  • INSERT
  • INSERT+INIT
  • LOCK
  • MULTI_INSERT
  • NEW_CID
  • PRUNE
  • UPDATE
  • UPDATE+INIT
  • VACUUM
  • VISIBLE

Based on this, and some thinking, I assume we can limit our reading to records that do: DELETE, HOT_UPDATE, INSERT, INSERT+INIT, MULTI_INSERT, UPDATE, and UPDATE+INIT.

In my case I have just one such record:

rmgr: Heap        len (rec/tot):    266/   266, tx: 3056068702, lsn: 1F1F/4ADC37D8, prev 1F1F/4ADC3798, desc: HOT_UPDATE off 24 xmax 3056068702 flags 0x10 ; new off 25 xmax 0, blkref #0: rel 17997/16669/42718028 blk 118384

Now, which table it touched?

This info is closer to the end, as set of three numbers separated by / after rel: 17997/16669/42718028.

First number is oid of tablespace, so we can ignore it, for now.

Second number is oid of database, and we can find out which database it was by doing:

=# SELECT datname FROM pg_database  WHERE oid = 16669;
 datname
─────────
 realdb
(1 ROW)

This query can be ran in any database, as pg_database is global table.

The third number can be mapped to table name using this query, but it has to be ran in the database we got in previous step:

=# SELECT relnamespace::regnamespace, relname FROM pg_class WHERE relfilenode = 42718028;
     relnamespace     │   relname
──────────────────────┼────────────
 schema_of_this_table │ TABLE_NAME
(1 ROW)

(note: the number doesn't have to be table. It's relation, so it could be, just as well, index. But if the prefix to wal record says rmgr: Heap – then it's a table. For indexes it would be something like Btree, BRIN, Gin, or Gist.)

And as simply as that we know that this table was touched.

What exactly has happened – well, it was some HOT update.

Now, the final bit of information in the line is at the end: blk 118384 – it's block (8kB page) in the table file (numbers start from 0).

With all this, what else can we do? Getting data, like primary key, might be complicated. The thing is that the row could have been updated since. You can try, of course, to get the info using pageinspect extension, but that could be problematic.

There is also one slight issue. If the table was rewritten (cluster, vacuum full, some forms of alter table) since the lsn range, it's relfilenode will change. And thus finding what table it was will be more complicated.

But, if you're checking relatively soon after the switch, there shouldn't be any problems related to this.

Hope you'll find it useful.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.