On 16th of January 2024, Alexander Korotkov committed patch:
Add new COPY option SAVE_ERROR_TO Currently, when source data contains unexpected data regarding data type or range, the entire COPY fails. However, in some cases, such data can be ignored and just copying normal data is preferable. This commit adds a new option SAVE_ERROR_TO, which specifies where to save the error information. When this option is specified, COPY skips soft errors and continues copying. Currently, SAVE_ERROR_TO only supports "none". This indicates error information is not saved and COPY just skips the unexpected data and continues running. Later works are expected to add more choices, such as 'log' and 'table'. Author: Damir Belyalov, Atsushi Torikoshi, Alex Shulgin, Jian He Discussion: https://postgr.es/m/87k31ftoe0.fsf_-_%40commandprompt.com Reviewed-by: Pavel Stehule, Andres Freund, Tom Lane, Daniel Gustafsson, Reviewed-by: Alena Rybakina, Andy Fan, Andrei Lepikhov, Masahiko Sawada Reviewed-by: Vignesh C, Atsushi Torikoshi
and then, three days later, he changed the syntax in next patch:
Rename COPY option from SAVE_ERROR_TO to ON_ERROR The option names now are "stop" (default) and "ignore". The future options could be "file 'filename.log'" and "table 'tablename'". Discussion: https://postgr.es/m/20240117.164859.2242646601795501168.horikyota.ntt%40gmail.com Author: Jian He Reviewed-by: Atsushi Torikoshi
The ability to make COPY ignore errors is, in my memory, in top five most commonly requested missing features.
Let's see how it behaved, and how it behaves now.
For test I will need faulty input file, which I can produce using:
$ printf 'a\t1\nb\nc\t\nd\t4\n' > copy.data $ cat copy.data a 1 b c d 4
Now, simple table:
=$ CREATE TABLE copytest (FIRST text, SECOND int4); CREATE TABLE
Loading in previous versions was failing like this:
=$ copy copytest FROM '/tmp/copy.data'; ERROR: missing DATA FOR COLUMN "second" CONTEXT: COPY copytest, line 2: "b" =$ SELECT * FROM copytest ; FIRST │ SECOND ───────┼──────── (0 ROWS)
And while testing in Pg17, I got:
=$ copy copytest FROM '/tmp/copy.data' ( on_error IGNORE ); ERROR: missing DATA FOR COLUMN "second" CONTEXT: COPY copytest, line 2: "b"
WHAT ?!
After reading some more, and asking people smarter than me, I understood. In docs it says: “error converting a column's input value into its data type". Converting. The value has to be there, but can be invalid. Let's retry:
$ printf 'a\t1\nb\tz\nc\t\nd\t4\n' | tee /tmp/copy.data a 1 b z c d 4
and, now, let's retry:
=$ copy copytest FROM '/tmp/copy.data' ( on_error IGNORE ); NOTICE: 2 ROWS were skipped due TO DATA TYPE incompatibility COPY 2 =$ SELECT * FROM copytest ; FIRST │ SECOND ───────┼──────── a │ 1 d │ 4 (2 ROWS)
Nice. It would be even nicer if copy could also ignore malformed lines, but I guess it's first step.
Anyway – it's a great thing to add. Thanks to everyone involved.
I made another patch, save error to a table: https://www.postgresql.org/message-id/CACJufxH_OJpVra%3D0c4ow8fbxHj7heMcVaTNEPa5vAurSeNA-6Q%40mail.gmail.com
@jian:
that’s great. it hasn’t been submitted yet, though, as far as I can see? or did I miss something?
it’s here:
https://commitfest.postgresql.org/47/4817/