On 1st of February 2021, Peter Eisentraut committed patch:
SEARCH and CYCLE clauses This adds the SQL standard feature that adds the SEARCH and CYCLE clauses to recursive queries to be able to do produce breadth- or depth-first search orders and detect cycles. These clauses can be rewritten into queries using existing syntax, and that is what this patch does in the rewriter. Reviewed-by: Vik Fearing <vik@postgresfriends.org> Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/db80ceee-6f97-9b4a-8ee8-3ba0c58e5be2@2ndquadrant.com
Just as message suggests this change is related to recursive queries.
Which means, that to test it, we need some data for recursive queries. This time I took directory listing of my /etc, and loaded it to db:
$ CREATE TABLE dirs ( id INT NOT NULL, parent_id INT REFERENCES dirs (id), dirname text, PRIMARY KEY (id) ); CREATE TABLE $ copy dirs FROM '/tmp/dirs.data'; COPY 434
Great, we can now see full paths to each dir:
$ WITH RECURSIVE c AS ( SELECT id, dirname AS path FROM dirs WHERE parent_id IS NULL UNION ALL SELECT dirs.id, c.path || '/' || dirs.dirname AS path FROM dirs JOIN c ON dirs.parent_id = c.id ) SELECT id, path FROM c ORDER BY path LIMIT 15; id | path ----+----------------------------- 1 | etc 42 | etc/acpi 43 | etc/acpi/events 44 | etc/alsa 45 | etc/alsa/conf.d 46 | etc/alternatives 47 | etc/apache2 48 | etc/apache2/conf-available 49 | etc/apm 50 | etc/apm/resume.d 51 | etc/apm/scripts.d 52 | etc/apm/suspend.d 53 | etc/apparmor 54 | etc/apparmor.d 55 | etc/apparmor.d/abstractions (15 ROWS)
Cool. Now let's see what these new changes get us.
First thing is:
$ WITH RECURSIVE c AS ( SELECT id, dirname AS path FROM dirs WHERE parent_id IS NULL UNION ALL SELECT dirs.id, c.path || '/' || dirs.dirname AS path FROM dirs JOIN c ON dirs.parent_id = c.id ) SEARCH DEPTH FIRST BY id SET ordercol SELECT id, path FROM c ORDER BY ordercol LIMIT 15; id | path -----+-------------------------------------------- 1 | etc 2 | etc/.java 434 | etc/.java/.systemPrefs 3 | etc/GNUstep 4 | etc/ImageMagick-6 5 | etc/Muttrc.d 6 | etc/NetworkManager 7 | etc/NetworkManager/conf.d 8 | etc/NetworkManager/dispatcher.d 9 | etc/NetworkManager/dispatcher.d/no-wait.d 10 | etc/NetworkManager/dispatcher.d/pre-down.d 11 | etc/NetworkManager/dispatcher.d/pre-up.d 12 | etc/NetworkManager/dnsmasq.d 13 | etc/NetworkManager/dnsmasq-shared.d 14 | etc/NetworkManager/system-connections (15 ROWS)
OK. So this returns the dirs, depth first, but ordered by id.
Let's try to get the previous output, ascii-betical order, using this new approach, but also, let's see what ordercol really is:
$ WITH RECURSIVE c AS ( SELECT id, dirname AS path FROM dirs WHERE parent_id IS NULL UNION ALL SELECT dirs.id, c.path || '/' || dirs.dirname AS path FROM dirs JOIN c ON dirs.parent_id = c.id ) SEARCH DEPTH FIRST BY path SET ordercol SELECT id, path, ordercol FROM c ORDER BY ordercol LIMIT 15; id | path | ordercol ----+----------------------------------------------+------------------------------------------------------------------------------------------- 1 | etc | {(etc)} 42 | etc/acpi | {(etc),(etc/acpi)} 43 | etc/acpi/events | {(etc),(etc/acpi),(etc/acpi/events)} 44 | etc/alsa | {(etc),(etc/alsa)} 45 | etc/alsa/conf.d | {(etc),(etc/alsa),(etc/alsa/conf.d)} 46 | etc/alternatives | {(etc),(etc/alternatives)} 47 | etc/apache2 | {(etc),(etc/apache2)} 48 | etc/apache2/conf-available | {(etc),(etc/apache2),(etc/apache2/conf-available)} 49 | etc/apm | {(etc),(etc/apm)} 50 | etc/apm/resume.d | {(etc),(etc/apm),(etc/apm/resume.d)} 51 | etc/apm/scripts.d | {(etc),(etc/apm),(etc/apm/scripts.d)} 52 | etc/apm/suspend.d | {(etc),(etc/apm),(etc/apm/suspend.d)} 53 | etc/apparmor | {(etc),(etc/apparmor)} 67 | etc/apparmor/init | {(etc),(etc/apparmor),(etc/apparmor/init)} 68 | etc/apparmor/init/network-interface-security | {(etc),(etc/apparmor),(etc/apparmor/init),(etc/apparmor/init/network-interface-security)} (15 ROWS)
So, I can see now, that in this sort – etc/apparmor/X is actually before /etc/apparmor.d/X. Which kinda makes sense. Let's compare the two orders on their own:
Old approach:
$ WITH RECURSIVE c AS ( SELECT id, dirname AS path FROM dirs WHERE id IN (53, 54) UNION ALL SELECT dirs.id, c.path || '/' || dirs.dirname AS path FROM dirs JOIN c ON dirs.parent_id = c.id ) SELECT id, path FROM c ORDER BY path; id | path ----+------------------------------------------- 53 | apparmor 54 | apparmor.d 55 | apparmor.d/abstractions 56 | apparmor.d/abstractions/apparmor_api 57 | apparmor.d/abstractions/ubuntu-browsers.d 58 | apparmor.d/disable 59 | apparmor.d/force-complain 60 | apparmor.d/libvirt 61 | apparmor.d/LOCAL 62 | apparmor.d/LOCAL/abstractions 63 | apparmor.d/tunables 64 | apparmor.d/tunables/home.d 65 | apparmor.d/tunables/multiarch.d 66 | apparmor.d/tunables/xdg-user-dirs.d 67 | apparmor/init 68 | apparmor/init/network-interface-security (16 ROWS)
and the new one:
$ WITH RECURSIVE c AS ( SELECT id, dirname AS path FROM dirs WHERE id IN (53, 54) UNION ALL SELECT dirs.id, c.path || '/' || dirs.dirname AS path FROM dirs JOIN c ON dirs.parent_id = c.id ) SEARCH DEPTH FIRST BY path SET ordercol SELECT id, path FROM c ORDER BY ordercol; id | path ----+------------------------------------------- 53 | apparmor 67 | apparmor/init 68 | apparmor/init/network-interface-security 54 | apparmor.d 55 | apparmor.d/abstractions 56 | apparmor.d/abstractions/apparmor_api 57 | apparmor.d/abstractions/ubuntu-browsers.d 58 | apparmor.d/disable 59 | apparmor.d/force-complain 60 | apparmor.d/libvirt 61 | apparmor.d/LOCAL 62 | apparmor.d/LOCAL/abstractions 63 | apparmor.d/tunables 64 | apparmor.d/tunables/home.d 65 | apparmor.d/tunables/multiarch.d 66 | apparmor.d/tunables/xdg-user-dirs.d (16 ROWS)
Please note that apparmor/init and apparmor/init/network-interface-security are in much better place now – immediately after apparmor/, and not apparmor.d. Nice!
What's cool, we can also now also have order by breadth first:
$ WITH RECURSIVE c AS ( SELECT id, dirname AS path FROM dirs WHERE parent_id IS NULL UNION ALL SELECT dirs.id, c.path || '/' || dirs.dirname AS path FROM dirs JOIN c ON dirs.parent_id = c.id ) SEARCH BREADTH FIRST BY id SET ordercol SELECT id, path FROM c ORDER BY ordercol LIMIT 15; id | path ----+--------------------- 1 | etc 2 | etc/.java 3 | etc/GNUstep 4 | etc/ImageMagick-6 5 | etc/Muttrc.d 6 | etc/NetworkManager 15 | etc/ODBCDataSources 16 | etc/OpenCL 18 | etc/PackageKit 19 | etc/UPower 20 | etc/X11 42 | etc/acpi 44 | etc/alsa 46 | etc/alternatives 47 | etc/apache2 (15 ROWS)
yeah – There are many dirs in /etc, so to see some dirs deeper I will have to offset quite a lot:
$ WITH RECURSIVE c AS ( SELECT id, dirname AS path FROM dirs WHERE parent_id IS NULL UNION ALL SELECT dirs.id, c.path || '/' || dirs.dirname AS path FROM dirs JOIN c ON dirs.parent_id = c.id ) SEARCH BREADTH FIRST BY id SET ordercol SELECT id, path FROM c ORDER BY ordercol LIMIT 15 offset 170; id | path -----+--------------------------------------- 420 | etc/vulkan 424 | etc/wpa_supplicant 425 | etc/xdg 433 | etc/xml 7 | etc/NetworkManager/conf.d 8 | etc/NetworkManager/dispatcher.d 12 | etc/NetworkManager/dnsmasq.d 13 | etc/NetworkManager/dnsmasq-shared.d 14 | etc/NetworkManager/system-connections 17 | etc/OpenCL/vendors 21 | etc/X11/Xreset.d 22 | etc/X11/Xresources 23 | etc/X11/Xsession.d 24 | etc/X11/app-defaults 25 | etc/X11/cursors (15 ROWS)
This is awesome.
For the second part of the change – cycles, I will need different dataset.
So I took a small bit of Ticket to Ride map:
and converted it to table:
$ CREATE TABLE conns ( c_from text, c_to text, PRIMARY KEY (c_from, c_to) ); CREATE TABLE $ INSERT INTO conns (c_from, c_to) VALUES ( 'Chicago', 'Duluth' ), ( 'Duluth', 'Chicago' ), ( 'Chicago', 'Toronto' ), ( 'Toronto', 'Chicago' ), ( 'Chicago', 'Pittsburg' ), ( 'Pittsburg', 'Chicago' ), ( 'Pittsburg', 'Toronto' ), ( 'Toronto', 'Pittsburg' ), ( 'Duluth', 'Toronto' ), ( 'Toronto', 'Duluth' ), ( 'Chicago', 'Omaha' ), ( 'Omaha', 'Chicago' ); INSERT 0 12
Now, let's print all possible routes we can go through, starting from Chicago. Simplistic approach:
$ WITH RECURSIVE c AS ( SELECT c_from, c_to, array[ c_from, c_to ] AS track FROM conns WHERE c_from = 'Chicago' UNION ALL SELECT n.c_from, n.c_to, c.track || n.c_to FROM conns n JOIN c ON n.c_from = c.c_to ) SELECT * FROM c;
will, of course, never end, because there are cycles in here.
But now, we can have cycle detection! So let's try it:
$ WITH RECURSIVE c AS ( SELECT NULL AS c_from, 'Chicago' AS c_to, ARRAY[ 'Chicago' ] AS track UNION ALL SELECT n.c_from, n.c_to, c.track || n.c_to FROM conns n JOIN c ON n.c_from = c.c_to ) CYCLE c_to SET is_cycle TO TRUE DEFAULT FALSE USING cycle_path SELECT * FROM c; c_from | c_to | track | is_cycle | cycle_path -----------+-----------+--------------------------------------------+----------+------------------------------------------------------ | Chicago | {Chicago} | f | {(Chicago)} Chicago | Duluth | {Chicago,Duluth} | f | {(Chicago),(Duluth)} Chicago | Toronto | {Chicago,Toronto} | f | {(Chicago),(Toronto)} Chicago | Pittsburg | {Chicago,Pittsburg} | f | {(Chicago),(Pittsburg)} Chicago | Omaha | {Chicago,Omaha} | f | {(Chicago),(Omaha)} Duluth | Chicago | {Chicago,Duluth,Chicago} | t | {(Chicago),(Duluth),(Chicago)} Toronto | Chicago | {Chicago,Toronto,Chicago} | t | {(Chicago),(Toronto),(Chicago)} Pittsburg | Chicago | {Chicago,Pittsburg,Chicago} | t | {(Chicago),(Pittsburg),(Chicago)} Pittsburg | Toronto | {Chicago,Pittsburg,Toronto} | f | {(Chicago),(Pittsburg),(Toronto)} Toronto | Pittsburg | {Chicago,Toronto,Pittsburg} | f | {(Chicago),(Toronto),(Pittsburg)} Duluth | Toronto | {Chicago,Duluth,Toronto} | f | {(Chicago),(Duluth),(Toronto)} Toronto | Duluth | {Chicago,Toronto,Duluth} | f | {(Chicago),(Toronto),(Duluth)} Omaha | Chicago | {Chicago,Omaha,Chicago} | t | {(Chicago),(Omaha),(Chicago)} Duluth | Chicago | {Chicago,Toronto,Duluth,Chicago} | t | {(Chicago),(Toronto),(Duluth),(Chicago)} Toronto | Chicago | {Chicago,Duluth,Toronto,Chicago} | t | {(Chicago),(Duluth),(Toronto),(Chicago)} Toronto | Chicago | {Chicago,Pittsburg,Toronto,Chicago} | t | {(Chicago),(Pittsburg),(Toronto),(Chicago)} Pittsburg | Chicago | {Chicago,Toronto,Pittsburg,Chicago} | t | {(Chicago),(Toronto),(Pittsburg),(Chicago)} Pittsburg | Toronto | {Chicago,Toronto,Pittsburg,Toronto} | t | {(Chicago),(Toronto),(Pittsburg),(Toronto)} Toronto | Pittsburg | {Chicago,Duluth,Toronto,Pittsburg} | f | {(Chicago),(Duluth),(Toronto),(Pittsburg)} Toronto | Pittsburg | {Chicago,Pittsburg,Toronto,Pittsburg} | t | {(Chicago),(Pittsburg),(Toronto),(Pittsburg)} Duluth | Toronto | {Chicago,Toronto,Duluth,Toronto} | t | {(Chicago),(Toronto),(Duluth),(Toronto)} Toronto | Duluth | {Chicago,Duluth,Toronto,Duluth} | t | {(Chicago),(Duluth),(Toronto),(Duluth)} Toronto | Duluth | {Chicago,Pittsburg,Toronto,Duluth} | f | {(Chicago),(Pittsburg),(Toronto),(Duluth)} Duluth | Chicago | {Chicago,Pittsburg,Toronto,Duluth,Chicago} | t | {(Chicago),(Pittsburg),(Toronto),(Duluth),(Chicago)} Pittsburg | Chicago | {Chicago,Duluth,Toronto,Pittsburg,Chicago} | t | {(Chicago),(Duluth),(Toronto),(Pittsburg),(Chicago)} Pittsburg | Toronto | {Chicago,Duluth,Toronto,Pittsburg,Toronto} | t | {(Chicago),(Duluth),(Toronto),(Pittsburg),(Toronto)} Duluth | Toronto | {Chicago,Pittsburg,Toronto,Duluth,Toronto} | t | {(Chicago),(Pittsburg),(Toronto),(Duluth),(Toronto)} (27 ROWS)
This is as really cool. What's more – I didn't even had to add any condition on the is_cycle, it was sufficient that it's calculated, and processing stopped. And I can still add “WHERE is_cycle = false" to weed out the connections that cycle back to city that was already visited:
$ WITH RECURSIVE c AS ( SELECT NULL AS c_from, 'Chicago' AS c_to, ARRAY[ 'Chicago' ] AS track UNION ALL SELECT n.c_from, n.c_to, c.track || n.c_to FROM conns n JOIN c ON n.c_from = c.c_to ) CYCLE c_to SET is_cycle TO TRUE DEFAULT FALSE USING cycle_path SELECT * FROM c WHERE is_cycle = FALSE; c_from | c_to | track | is_cycle | cycle_path -----------+-----------+------------------------------------+----------+-------------------------------------------- | Chicago | {Chicago} | f | {(Chicago)} Chicago | Duluth | {Chicago,Duluth} | f | {(Chicago),(Duluth)} Chicago | Toronto | {Chicago,Toronto} | f | {(Chicago),(Toronto)} Chicago | Pittsburg | {Chicago,Pittsburg} | f | {(Chicago),(Pittsburg)} Chicago | Omaha | {Chicago,Omaha} | f | {(Chicago),(Omaha)} Pittsburg | Toronto | {Chicago,Pittsburg,Toronto} | f | {(Chicago),(Pittsburg),(Toronto)} Toronto | Pittsburg | {Chicago,Toronto,Pittsburg} | f | {(Chicago),(Toronto),(Pittsburg)} Duluth | Toronto | {Chicago,Duluth,Toronto} | f | {(Chicago),(Duluth),(Toronto)} Toronto | Duluth | {Chicago,Toronto,Duluth} | f | {(Chicago),(Toronto),(Duluth)} Toronto | Pittsburg | {Chicago,Duluth,Toronto,Pittsburg} | f | {(Chicago),(Duluth),(Toronto),(Pittsburg)} Toronto | Duluth | {Chicago,Pittsburg,Toronto,Duluth} | f | {(Chicago),(Pittsburg),(Toronto),(Duluth)} (11 ROWS)
All in all – both these things look amazing, and will greatly help writing queries for hierarchical data.
Huge thanks to all involved 🙂
Pittsburgh is spelled with an “h” at the end
Awesome feat., what would be the impact on the cache-ability of such queries?