There exists an extension to PostgreSQL, which lets you use hypothetical indexes.
What are there? That's simple – these are indexes that don't really exist. So what good are they?
Let's see.
First, I will need some test data. That will be relatively simple:
$ CREATE TABLE test_data ( id serial PRIMARY KEY, some_timestamp timestamptz, some_integer int4, other_integer int4, another_integer int4 ); $ INSERT INTO test_data (some_timestamp, some_integer, other_integer, another_integer) SELECT now() - random() * '10 years'::INTERVAL, random() * 10, random() * 100, random() * 1000 FROM generate_series(1,100000000);
100 million records, but the table is relatively small:
\dt+ test_data List OF relations Schema | Name | TYPE | Owner | SIZE | Description --------+-----------+-------+--------+---------+------------- public | test_data | TABLE | depesz | 5746 MB | (1 ROW)
Now, let's assume we're considering adding index on (some_integer, other_integer), but we'd like to know if it would be used for queries that look for other_integer only.
Normally, I'd have to, on some side table/database:
$ CREATE INDEX test_index ON test_data (some_integer, other_integer); CREATE INDEX TIME: 243084.367 ms $ EXPLAIN SELECT * FROM test_data WHERE other_integer = 50; QUERY PLAN --------------------------------------------------------------------- Seq Scan ON test_data (cost=0.00..1985295.00 ROWS=933401 width=24) FILTER: (other_integer = 50) (2 ROWS) TIME: 19.342 ms
The answer is “no, it will not use such index" – but the index creation was long, and pretty expensive in terms of IO and CPU.
Thanks to hypopg, I should be able to get the same answer in fraction of the time. So let's see how it works (I of course dropped this test index).
To install, I need to:
- sudo apt-get install pgxnclient
- pgxn install –unstable hypopg
First step install pgxn tool, which is then used to install actual utility.
To have it working, I have to have certain packages installed (basically pg headers), but this should be trivial to do by appropriate “apt-get install" or “yum install".
Installation took couple of seconds. Afterwards, I can:
$ CREATE extension hypopg; CREATE EXTENSION
Which loads the code to my test database.
Now, sanity check that test_data table is without this additional index:
\d test_data TABLE "public.test_data" COLUMN | TYPE | Modifiers -----------------+--------------------------+-------------------------------------------------------- id | INTEGER | NOT NULL DEFAULT NEXTVAL('test_data_id_seq'::regclass) some_timestamp | TIMESTAMP WITH TIME zone | some_integer | INTEGER | other_integer | INTEGER | another_integer | INTEGER | Indexes: "test_data_pkey" PRIMARY KEY, btree (id)
OK. With this in place, I can create the hypothetical index. This is done using function call:
$ SELECT hypopg_create_index('create index on test_data (some_integer, other_integer)'); hypopg_create_index ----------------------------------------------------------- (16719,<16719>btree_test_data_some_integer_other_integer) (1 ROW)
This took ~ 0.5s.
Now, I can try the explain again:
$ EXPLAIN SELECT * FROM test_data WHERE other_integer = 50; QUERY PLAN ---------------------------------------------------------------------- Seq Scan ON test_data (cost=0.00..1985295.10 ROWS=1210000 width=24) FILTER: (other_integer = 50) (2 ROWS)
Index is not used. But perhaps I did something wrong? let's try to make index what will definitely be used:
$ SELECT hypopg_create_index('create index on test_data (some_integer, other_integer, another_integer)'); hypopg_create_index ------------------------------------------------------------------------- (16721,<16721>btree_test_data_some_integer_other_integer_another_integ) (1 ROW) $ EXPLAIN SELECT * FROM test_data WHERE some_integer = 5 AND other_integer = 50 AND another_integer = 500; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------- INDEX Scan USING <16721>btree_test_data_some_integer_other_integer_another_integ ON test_data (cost=0.07..464.07 ROWS=115 width=24) INDEX Cond: ((some_integer = 5) AND (other_integer = 50) AND (another_integer = 500)) (2 ROWS)
Hypopg works (from what I can tell) by modifying system catalog (information about indexes), but only for current backend. So whatever you're testing in your connection – it will not influence other, “normal" connections.
Also, as soon as you close your connection – the hypothetical indexes you created are gone. Of course you can also drop them manually (by calling hypopg_drop_index() function), but I would say that generally there is no point in doing it – since it all will be cleaned as soon as you'll disconnect.
All in all, it looks great, and while in a lot of cases it's trivial to tell if the index will be used, for all the not-so-obvious cases, hypopg will be great tool.
Nice 🙂 How does PG react when you actually run a query that tries to use an hypothetical index ?
It would be nice if it actually works when i run the query. I tried it with the both types of index and the results were not acceptable.
hypo- index:
explain ANALYZE VERBOSE select * from test_data where some_integer = 5 and other_integer = 50 and another_integer = 500;
+—————————————————————————————————————————-+
| QUERY PLAN |
+—————————————————————————————————————————-+
| Seq Scan on pg_temp_2.test_data (cost=0.00..2485295.00 rows=13 width=24) (actual time=710.865..10830.540 rows=91 loops=1) |
| Output: id, some_timestamp, some_integer, other_integer, another_integer |
| Filter: ((test_data.some_integer = 5) AND (test_data.other_integer = 50) AND (test_data.another_integer = 500)) |
| Rows Removed by Filter: 99999909 |
| Planning time: 0.070 ms |
| Execution time: 10830.618 ms |
+—————————————————————————————————————————-+
Standard index
EXPLAIN ANALYZE VERBOSE select * from test_data where some_integer = 5 and other_integer = 50 and another_integer = 500;
+————————————————————————————————————————————+
| QUERY PLAN |
+————————————————————————————————————————————+
| Index Scan using test_index on pg_temp_2.test_data (cost=0.57..56.86 rows=13 width=24) (actual time=1.961..8.133 rows=91 loops=1) |
| Output: id, some_timestamp, some_integer, other_integer, another_integer |
| Index Cond: ((test_data.some_integer = 5) AND (test_data.other_integer = 50) AND (test_data.another_integer = 500)) |
| Planning time: 0.091 ms |
| Execution time: 8.172 ms |
+————————————————————————————————————————————+
I’m not sure how to respond to your questions. Pg cannot run queries using hypothetical indexes because they don’t exist. So when you actually run a query, for example by using explain analyze, it’s being run as if the index didn’t exist. Because it doesn’t.
Yes, the hypothetical indexes only exists when a simple EXPLAIN is called. If you use ANALYZE, the hypothetical indexes are simply skipped.
In fact, the hypothetical are not directly stored in the catalog but in memory. A hook on explain adds them to the planner’s list of indexes when called with a simple EXPLAIN. See the hypo_get_relation_info_hook function for more details.
Ah that’s neat, explain statements and normal statements get a different view of the list of indexes, so there’s no error, thanks.