We had a case recently where one of our DBs failed over to a new primary. To this server (old primary, and then new primary) we had connection from some kind of CDC tool (Debezium, I think).
The thing is that while there was failover, this tool read (using logical decoding) changes on old primary to location 1F1F/4ADC3348, and on new db, it connected from location 1F1F/4ADC4038. Which means that there was some WAL that was not decoded and sent to Debezium.
This is fixable, but we need to know what tables were touched in the WAL stream, that Debezium didn't see, if any. Perhaps we could also get ids/location of the rows that were there?
First things first – let's see how much data we're talking about. Luckily in Pg it's rather simple:
=$ SELECT '1F1F/4ADC4038'::pg_lsn - '1F1F/4ADC3348'::pg_lsn; ?COLUMN? ---------- 3312 (1 ROW)
This tells us that there is ~ 3kB worth of data in WAL that we should look at. Not much. We can definitely live with it.
To get the list of tables we will need to get WAL files that contained it. We can get the wal file names using this query (ran on new primary):
=$ SELECT pg_walfile_name('1F1F/4ADC3348'), pg_walfile_name('1F1F/4ADC4038'); pg_walfile_name │ pg_walfile_name ══════════════════════════╪══════════════════════════ 0000000200001F1F0000004A │ 0000000200001F1F0000004A (1 ROW)
So, we see that all wal that we need is in this single wal file. Which is awesome, as it will make my life a bit easier.
But, for completeness sake, let's assume that starting lsn is 1F1F/48FC3348.
In such case, we will need to get more wal, and also we're dealing with much more data:
=$ SELECT pg_walfile_name('1F1F/48FC3348'), pg_walfile_name('1F1F/4ADC4038'), '1F1F/4ADC4038'::pg_lsn - '1F1F/48FC3348'::pg_lsn; pg_walfile_name | pg_walfile_name | ?COLUMN? --------------------------+--------------------------+---------- 0000000200001F1F00000048 | 0000000200001F1F0000004A | 31460592 (1 ROW)
Knowing how wal files are named in this example I would need to get files:
- 0000000200001F1F00000048
- 0000000200001F1F00000049
- 0000000200001F1F0000004A
(note: if this was “around" failover, it's possible that first, or second, wal file would not have timeline (prefix: 00000002) 2, but rather “1" – which might be important when restoring the file from walarchive)
But for now, let's do the thing on small range: 1F1F/4ADC4038 – 1F1F/4ADC3348, with single wal file: 0000000200001F1F0000004A
We need this file somewhere, ideally on the server that is currently primary.
It might be in pg_wal/ directory inside $PGDATA, but it is also possible that it isn't there. In such case we will need to recover this file from wal archive. How to do it depends on how you archive, and it's definitely out of scope for this post.
So, let's assume I'm in directory ~postgres/wal-tables, and the wal is here:
postgres :: ~/wal-TABLES =$ ls -l total 16384 -rw------- 1 postgres postgres 16777216 Jul 19 13:22 0000000200001F1F0000004A
Next thing, we need to get pg_waldump program. It's usually installed, though not always available just by typing pg_waldump. For example, on debian/ubuntu, with Pg packages from PGDG, this program is in /usr/lib/postgresql/PG_VERSION/bin directory. So, in my case it will be:
postgres :: ~/wal-TABLES =$ /usr/lib/postgresql/14/bin/pg_waldump --help pg_waldump decodes AND displays PostgreSQL write-ahead logs FOR debugging. …
Now, to see what is in WAL, within given range of lsns, I would use this command:
postgres :: ~/wal-TABLES =$ /usr/lib/postgresql/14/bin/pg_waldump --start=1F1F/4ADC3348 --end=1F1F/4ADC4038 0000000200001F1F0000004A 0000000200001F1F0000004A rmgr: XLOG len (rec/tot): 51/ 944, tx: 0, lsn: 1F1F/4ADC3348, prev 1F1F/4ADC3318, DESC: FPI_FOR_HINT , blkref #0: rel 17997/16669/42718133 blk 3409 FPW rmgr: Heap len (rec/tot): 54/ 54, tx: 3056068701, lsn: 1F1F/4ADC36F8, prev 1F1F/4ADC3348, DESC: LOCK off 14: xid 3056068701: flags 0x00 LOCK_ONLY EXCL_LOCK KEYS_UPDATED , blkref #0: rel 17997/16669/42718133 blk 3409 rmgr: TRANSACTION len (rec/tot): 46/ 46, tx: 3056068701, lsn: 1F1F/4ADC3730, prev 1F1F/4ADC36F8, DESC: COMMIT 2024-07-14 16:05:14.585662 UTC rmgr: Heap2 len (rec/tot): 56/ 56, tx: 0, lsn: 1F1F/4ADC3760, prev 1F1F/4ADC3730, DESC: PRUNE latestRemovedXid 3056060590 nredirected 0 ndead 1, blkref #0: rel 17997/16669/42719825 blk 3448 rmgr: Heap2 len (rec/tot): 58/ 58, tx: 0, lsn: 1F1F/4ADC3798, prev 1F1F/4ADC3760, DESC: PRUNE latestRemovedXid 3056066930 nredirected 0 ndead 1, blkref #0: rel 17997/16669/42719825 blk 839 rmgr: Heap len (rec/tot): 266/ 266, tx: 3056068702, lsn: 1F1F/4ADC37D8, prev 1F1F/4ADC3798, DESC: HOT_UPDATE off 24 xmax 3056068702 flags 0x10 ; NEW off 25 xmax 0, blkref #0: rel 17997/16669/42718028 blk 118384 rmgr: TRANSACTION len (rec/tot): 46/ 46, tx: 3056068702, lsn: 1F1F/4ADC38E8, prev 1F1F/4ADC37D8, DESC: COMMIT 2024-07-14 16:05:14.597812 UTC rmgr: XLOG len (rec/tot): 42/ 42, tx: 0, lsn: 1F1F/4ADC3918, prev 1F1F/4ADC38E8, DESC: END_OF_RECOVERY tli 2; prev tli 1; TIME 2024-07-14 16:08:14.677533 UTC rmgr: XLOG len (rec/tot): 51/ 1633, tx: 0, lsn: 1F1F/4ADC3948, prev 1F1F/4ADC3918, DESC: FPI_FOR_HINT , blkref #0: rel 1663/13726/1259 blk 8 FPW rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 1F1F/4ADC3FB0, prev 1F1F/4ADC3948, DESC: RUNNING_XACTS nextXid 3056068703 latestCompletedXid 3056068702 oldestRunningXid 3056068703 rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 1F1F/4ADC3FE8, prev 1F1F/4ADC3FB0, DESC: RUNNING_XACTS nextXid 3056068703 latestCompletedXid 3056068702 oldestRunningXid 3056068703
The two final arguments are start wal file, and end wal file. In my case, it's the same wal file (so I could just write it once), but in case of larger range, you have to provide both start and end.
Now, this is a lot of text, and most of it isn't really interesting (for us, for now).
Since I need to know about tables that were modified, I just need to look at Heap/Heap2 lines in the output:
postgres :: ~/wal-TABLES =$ /usr/lib/postgresql/14/bin/pg_waldump --start=1F1F/4ADC3348 --end=1F1F/4ADC4038 0000000200001F1F0000004A 0000000200001F1F0000004A | grep Heap rmgr: Heap len (rec/tot): 54/ 54, tx: 3056068701, lsn: 1F1F/4ADC36F8, prev 1F1F/4ADC3348, DESC: LOCK off 14: xid 3056068701: flags 0x00 LOCK_ONLY EXCL_LOCK KEYS_UPDATED , blkref #0: rel 17997/16669/42718133 blk 3409 rmgr: Heap2 len (rec/tot): 56/ 56, tx: 0, lsn: 1F1F/4ADC3760, prev 1F1F/4ADC3730, DESC: PRUNE latestRemovedXid 3056060590 nredirected 0 ndead 1, blkref #0: rel 17997/16669/42719825 blk 3448 rmgr: Heap2 len (rec/tot): 58/ 58, tx: 0, lsn: 1F1F/4ADC3798, prev 1F1F/4ADC3760, DESC: PRUNE latestRemovedXid 3056066930 nredirected 0 ndead 1, blkref #0: rel 17997/16669/42719825 blk 839 rmgr: Heap len (rec/tot): 266/ 266, tx: 3056068702, lsn: 1F1F/4ADC37D8, prev 1F1F/4ADC3798, DESC: HOT_UPDATE off 24 xmax 3056068702 flags 0x10 ; NEW off 25 xmax 0, blkref #0: rel 17997/16669/42718028 blk 118384
Now, we're getting closer. We see there, after desc: description of what has happened. Generally, I would look for things that relate to normal write operations in SQL. Quick grep in sample wal showed that first word of description, in my case, is one of:
- DELETE
- HEAP_CONFIRM
- HOT_UPDATE
- INPLACE
- INSERT
- INSERT+INIT
- LOCK
- MULTI_INSERT
- NEW_CID
- PRUNE
- UPDATE
- UPDATE+INIT
- VACUUM
- VISIBLE
Based on this, and some thinking, I assume we can limit our reading to records that do: DELETE, HOT_UPDATE, INSERT, INSERT+INIT, MULTI_INSERT, UPDATE, and UPDATE+INIT.
In my case I have just one such record:
rmgr: Heap len (rec/tot): 266/ 266, tx: 3056068702, lsn: 1F1F/4ADC37D8, prev 1F1F/4ADC3798, desc: HOT_UPDATE off 24 xmax 3056068702 flags 0x10 ; new off 25 xmax 0, blkref #0: rel 17997/16669/42718028 blk 118384
Now, which table it touched?
This info is closer to the end, as set of three numbers separated by / after rel: 17997/16669/42718028.
First number is oid of tablespace, so we can ignore it, for now.
Second number is oid of database, and we can find out which database it was by doing:
=# SELECT datname FROM pg_database WHERE oid = 16669; datname ───────── realdb (1 ROW)
This query can be ran in any database, as pg_database is global table.
The third number can be mapped to table name using this query, but it has to be ran in the database we got in previous step:
=# SELECT relnamespace::regnamespace, relname FROM pg_class WHERE relfilenode = 42718028; relnamespace │ relname ──────────────────────┼──────────── schema_of_this_table │ TABLE_NAME (1 ROW)
(note: the number doesn't have to be table. It's relation, so it could be, just as well, index. But if the prefix to wal record says rmgr: Heap – then it's a table. For indexes it would be something like Btree, BRIN, Gin, or Gist.)
And as simply as that we know that this table was touched.
What exactly has happened – well, it was some HOT update.
Now, the final bit of information in the line is at the end: blk 118384 – it's block (8kB page) in the table file (numbers start from 0).
With all this, what else can we do? Getting data, like primary key, might be complicated. The thing is that the row could have been updated since. You can try, of course, to get the info using pageinspect extension, but that could be problematic.
There is also one slight issue. If the table was rewritten (cluster, vacuum full, some forms of alter table) since the lsn range, it's relfilenode will change. And thus finding what table it was will be more complicated.
But, if you're checking relatively soon after the switch, there shouldn't be any problems related to this.
Hope you'll find it useful.