Friend of mine found something that he thought looked like a bug in Pg.
Very simple query:
SELECT * FROM TABLE WHERE id NOT IN (SELECT FIELD FROM other_table)
was not returning any rows, despite the fact that there definitely are some “ids" that are not in other_table.field. Why is that?
First, let's build a test case:
CREATE TABLE objects (id INT4 PRIMARY KEY); CREATE TABLE secondary (object_id INT4); INSERT INTO objects (id) VALUES (1), (2), (3); INSERT INTO secondary (object_id) VALUES (NULL), (2), (4);
Data:
# SELECT * FROM objects ; id ---- 1 2 3 (3 ROWS) # SELECT * FROM secondary ; object_id ----------- [NULL] 2 4 (3 ROWS)
OK. So, first simple check:
# SELECT * FROM objects WHERE id IN (SELECT object_id FROM secondary); id ---- 2 (1 ROW)
OK. Given the fact that we have 3 ids of objects (1,2,3), and “IN" is only “2", then “NOT IN" should yield 1 and 3. Right? Wrong:
# SELECT * FROM objects WHERE id NOT IN (SELECT object_id FROM secondary); id ---- (0 ROWS)
Whoa. Why?
Let's check what is the result of “id NOT IN …" for every id:
# SELECT id, id NOT IN (SELECT object_id FROM secondary) FROM objects; id | ?COLUMN? ----+---------- 1 | [NULL] 2 | f 3 | [NULL] (3 ROWS)
Hmm, but why?
Let's consider case where id = 2.
- When checking against value “2" in secondary table – id (2) is equal to 2, so we know for sure that “NOT IN" returns false. We can skip going further
- When checking against value “4" in secondary table – id (2) is not equal to 4, so, it looks like “not in" matches. But we have to continue searching.
- When checking against value NULL in secondary table – comparison between id and NULL yields NULL, because NULL is unknown. It might be equal, or it might not. And we still have to search further.
This means that for id = 2, we got for sure “true" (for NOT IN) operator, because there is row in secondary table with object_id = 2.
But what happens when we deal with id = 3?
- When checking against value “2" in secondary table – id (2) is not equal to 2, so, it looks like “not in" matches. But we have to continue searching.
- When checking against value “4" in secondary table – id (2) is not equal to 4, so, it looks like “not in" matches. But we have to continue searching.
- When checking against value NULL in secondary table – comparison between id and NULL yields NULL, because NULL is unknown. It might be equal, or it might not. And we still have to search further.
For 2 rows we got information “NOT IN", but for one – we're not sure. NULL is unknown, so it might as well be 3. We don't know. And since we don't know, NOT IN has to return “NULL" as well.
If you'll look closely you'll see that it is technically impossible to get “true" in this case – as long as there are NULL values in secondary table!
What should we do then?
There is very simple solution:
# SELECT id, id NOT IN (SELECT object_id FROM secondary WHERE object_id IS NOT NULL) FROM objects; id | ?COLUMN? ----+---------- 1 | t 2 | f 3 | t (3 ROWS)
Which will return rows with id = 1 and id = 3.
This seems like really silly behavior. I can’t think of any use case where NULL should be considered unknown. NULL means empty as in nothing there. Therefore, the object.id that do not equal NULLs in secondary.object_id should be considered NOT IN.
Stated differently, if something is NULL and something has a value (string, int, numeric, etc…) they are clearly not equal by no stretch of any imagination or conjecture so they should be considered NOT IN-cluded in the comparisons.
@Eric:
well, perhaps it’s silly. but it’s the standard. and when you remember to see nulls like *unknown* – everything makes sense.
Honestly, any developer worth their salt will understand the properties of NULL. NULL is treated the same way in every language and every RDBMS that I’ve encountered in my career. When testing equality, anything tested against NULL returns NULL.
You did good write-up, and I’m not criticizing you, I’m just saying that there is no bug, there is no issue. This works as expected.
And if PG changed how this works, THAT would be a problem.
@Shane:
Unfortunately not all RDBMSes work this way.
I’m sure at least some versions of oracle (i haven’t used it in years) made NULL = ”.
I’m also sure, that not only Oracle did something with NULLs that is against standard.
Oracle never make NULL = ”,
they make
”=” –> NULL
” IS NULL –> True
I’ve already stumbled upon this few times so I knew what it’ll be about after title itself, I quite agree with shane, when you take a look at it is very reasonable.
But there is simialar thing when using ALL (at least in Oracle didn’t tested other dbs), that
lately confused me.
using the following to selects yields different results because ‘tbl’ contain NULL values
SELECT name FROM tbl WHERE value > ALL (SELECT value FROM tbl WHERE fltr=’Blah’);
SELECT name FROM tbl WHERE value > (SELECT max(value) from tbl WHERE fltr=’Blah’);
Don’t forget, you can often rewrite IN queries to use the EXISTS clause, which might perform a little more to your liking.
pagila=# SELECT * FROM objects WHERE exists (SELECT object_id FROM secondary where object_id = id);
id
—-
2
(1 row)
pagila=# SELECT * FROM objects WHERE not exists (SELECT object_id FROM secondary where object_id = id);
id
—-
1
3
(2 rows)
I have found that, in all cases, the best method is to perform an outer join and use IS NULL on the “not in” table. It seems to optimize much better, though it takes a bit more time and though to write the query in an ad hoc fashion. This also ensures you’re not running into any gotchas like this.
@despez:
In all recent versions of Oracle (maybe every version) NULL becomes ” when Oracle thinks it needs to and ” becomes NULL always.
In Oracle:
select count(*) from dual where ” is null;
…returns 1 (dual is an Oracle-standard one row table)
select ‘abc’||null||’123’ from dual;
…returns ‘abc123’
select length(”) from dual;
…returns NULL
If Oracle followed the standard, the first query would return 0 no matter how many rows dual had, the second would return null, and the third would return 0. These non-standard behaviors are so well entrenched in the Oracle world that I doubt they will change anytime soon.
@depesz:
“when you remember to see nulls like *unknown* – everything makes sense”
That’s not true. See my slides from this talk:
http://www.pgcon.org/2008/schedule/events/83.en.html
Sometimes NULL is treated like unknown (operators, functions, [NOT] IN), sometimes it’s treated like no value at all (aggregates and outer joins), sometimes it’s treated like false (WHERE clause), and sometimes it’s treated like true (constraints).
These inconsistencies can lead to very unintuitive results.
Also, I wrote up something about NULLs on my blog as well, just before depesz:
http://people.planetpostgresql.org/jdavis/index.php?/archives/9-None,-nil,-Nothing,-undef,-NA,-and-SQL-NULL.html
@Jeff Davis:
I saw the slides, but I didn’t found there any examples.
If you could write examples on:
1. null as no value in case of outer join
2. null as false in where
3. null as true in constraints
I’m not arguing with your points – I would just like to see some justification for the points, because I can’t find any examples in my memory, and the only example left (no value in aggregates) is easily explainable by the fact that aggregates should skip null values.
As far as I recall, this “skip nulls in aggregates” comes from standard, but I might be wrong.
@Jeff Davis:
sorry, bad wording. not “didn’t found there any examples”, but “any examples regarding these cases”.
btw. i swear that wordpress once had the ability to edit comments, but it seems to be gone now.
@depesz:
1. null as no value:
If you select the SUM of a table (or group or subselect) and there are no input tuples, it returns NULL. Clearly this is not unknown, because your source data contains no NULLs. If you know everything, how can the results of a query be unknown?
Also, with outer join, how can outer join produce “unknown” when there are no NULLs in your source data?
And as a final observation: if you SUM() a column with NULLs in it in addition to integers, the NULLs will be ignored, and it will produce a value (if NULLs are unknown, clearly it’s wrong to count them as zero). Thus, as I point out in my slides, SUM() of NULL and 1 is 1, but NULL + 1 is NULL.
2. NULL is clearly false in a WHERE clause:
SELECT 1;
SELECT 1 WHERE NULL; — same as “WHERE FALSE”
3. Add a “CHECK (NULL)” as a constraint. It will not reject anything, and is therefore equivalent to “CHECK (TRUE)”.
@Jeff Davis:
Hmm, I’m not sure if I can support the idea that it is not logical.
ad. 1: perhaps it is badly named, but there is nowhere claim that “+” should work the same as sum() which is aggregate. the moment you know that aggregates ignore nulls, there are no surprises.
ad. 2: no, it is not false. sql logic is based on not 2, but 3 values. something that isn’t true doesn’t have to be false. “where” need condition that evaluates to “true”. anything else is rejected. it doesn’t mean in any way that null = false, as this: “select 1 where not null” still doesn’t return any rows, and if it was the case you described – it would.
ad. 3: basically the same thing as with “where”, but checks are checked for *false* value. again. something that isn’t true doesn’t mean it’s false.
and again – 3 value logic works quite well when you’ll assume null to be “unknown”.
where – requires “true”, not something that we don’t know or something that we know is false.
check – it stops entering “false” values, but if it is not definitely false – it can be inserted. it might be “true”. it might be simply “unknown”. but it’s not “false”.
Maybe it’s better to think of NULL as *undefined*. The aggregate sum() returns NULL if the relation doesn’t have any tuples, and it’s correct IMO. Still, developers are drawing conclusions from this undefined return value, like in this case: sum() returns NULL -> relation is empty.
As far as the example is concerned:
SELECT * FROM objects WHERE id NOT IN (SELECT object_id FROM secondary)
Why not write it as
SELECT * FROM objects WHERE id NOT IN (SELECT object_id FROM secondary WHERE object_id IS NOT NULL)
This avoids the problem, doesn’t it?
Featuring Oracle 10gR2 ..
SQL> SELECT ‘yes’ AS hehe FROM DUAL WHERE ” IS NULL;
HEHE
——————————–
yes
SQL>
maybe it’s best to think of NULL as NULL in a three-value logic and not try to bend it into two-value logic because that’s what causes all the alleged anomalies 🙂
Very helpful. I spent long time to get rid of this case, and found your advice.
Really thanks a lot for your trouble to find and explain all things.
Thanks.
Thanks very much for your post. It was exactly the problem I was having.