Today, on Stack Overflow there was interesting question.
Generally, given table that looks like this:
room | people | price | hotel 1 | 1 | 200 | A 2 | 2 | 99 | A 3 | 3 | 95 | A 4 | 1 | 90 | B 5 | 6 | 300 | B
Find cheapest combination of rooms that would accomodate given number of guests.
For example – For 1 guest it should be:
hotel | price A | 200 B | 90
for 2 guests:
hotel | price A | 99
and for 6 guests:
hotel | price A | 394 B | 300
So far so good.
Of course, the problem in general is not possible to solve, because it is NP-Complete.
But, what about actually doing it for small sets of data? Well, I know that for
1000 hotels, and 100000 rooms it will take ages, but what about these 2 hotels
and 5 rooms?
Decided to try.
Since for some number of people we will need only one room, and for some many rooms, we need to start (and this is the complicated part) with generating list of all sets of rooms.
For example – having our table of rooms above, we should get following sets of rooms:
- hotel A: 1, 2, 3, 1 + 2, 1 + 3, 2 + 3, 1 + 2 + 3
- hotel B: 4, 5, 4 + 5
This naturally (well, for me, you might have different ideas) screams for using recursive CTE.
Let's start – every set starts with single room. So, I write:
WITH room_sets AS ( SELECT r.hotel, ARRAY[ r.room ] AS rooms, r.price, r.people FROM room_price r ) SELECT * FROM room_sets; hotel | rooms | price | people -------+-------+-------+-------- A | {1} | 200 | 1 A | {2} | 99 | 2 A | {3} | 95 | 3 B | {4} | 90 | 1 B | {5} | 300 | 6 (5 ROWS)
So far nothing magical, I just changed datatype from int4 to int4[] – i.e. array.
Array will be useful, so that we can have all room numbers in a set in single column.
Now, we need to recursively make combined room sets, with additional rooms. New query:
WITH RECURSIVE room_sets AS ( SELECT r.hotel, ARRAY[ r.room ] AS rooms, r.price, r.people FROM room_price r UNION ALL SELECT rs.hotel, rs.rooms || n.room, rs.price + n.price AS price, rs.people + n.people AS people FROM room_sets rs JOIN room_price n USING (hotel) WHERE NOT ( n.room = ANY ( rs.rooms ) ) ) SELECT * FROM room_sets; hotel | rooms | price | people -------+---------+-------+-------- A | {1} | 200 | 1 A | {2} | 99 | 2 A | {3} | 95 | 3 B | {4} | 90 | 1 B | {5} | 300 | 6 A | {1,3} | 295 | 4 A | {1,2} | 299 | 3 A | {2,3} | 194 | 5 A | {2,1} | 299 | 3 A | {3,2} | 194 | 5 A | {3,1} | 295 | 4 B | {4,5} | 390 | 7 B | {5,4} | 390 | 7 A | {1,3,2} | 394 | 6 A | {1,2,3} | 394 | 6 A | {2,3,1} | 394 | 6 A | {2,1,3} | 394 | 6 A | {3,2,1} | 394 | 6 A | {3,1,2} | 394 | 6 (19 ROWS)
Whoa. That's nice. We have all possible combinations of rooms. But it is
actually pretty problematic. For example – last 6 rows all describe the same
set of rows, just ordering of rows in set is different. But we don't care about
order in set. So, what can we do?
It's actually pretty simple. As you see we have condition:
NOT ( n.room = ANY ( rs.rooms ) )
Which makes sure that given room can be in set only once.
But we can change this, by forcing that we can add only rooms that are “after" (numerically) latest room in set. Which gives us:
WITH RECURSIVE room_sets AS ( SELECT r.hotel, ARRAY[ r.room ] AS rooms, r.price, r.people FROM room_price r UNION ALL SELECT rs.hotel, rs.rooms || n.room, rs.price + n.price AS price, rs.people + n.people AS people FROM room_sets rs JOIN room_price n USING (hotel) WHERE n.room > rs.rooms[ array_upper( rs.rooms, 1 ) ] ) SELECT * FROM room_sets; hotel | rooms | price | people -------+---------+-------+-------- A | {1} | 200 | 1 A | {2} | 99 | 2 A | {3} | 95 | 3 B | {4} | 90 | 1 B | {5} | 300 | 6 A | {1,3} | 295 | 4 A | {1,2} | 299 | 3 A | {2,3} | 194 | 5 B | {4,5} | 390 | 7 A | {1,2,3} | 394 | 6 (10 ROWS)
Which is pretty cool. Each set is only once, we have summarized price, and people count.
One thing though. We know that we are looking for rooms for n people. Based on this we probably should limit what the query returns, and not generate sets that are guaranteed to be too big (we still need to generate sets smaller, because larger sets are based on smaller).
Since we will be using this number (n-people) many times, let's put it in single place, and give it a name:
WITH RECURSIVE setup AS ( SELECT 3::INT4 AS people ), room_sets AS ( SELECT r.hotel, ARRAY[ r.room ] AS rooms, r.price, r.people FROM setup s, room_price r WHERE r.people <= s.people UNION ALL SELECT rs.hotel, rs.rooms || n.room, rs.price + n.price AS price, rs.people + n.people AS people FROM setup s, room_sets rs JOIN room_price n USING (hotel) WHERE n.room > rs.rooms[ array_upper( rs.rooms, 1 ) ] AND rs.people + n.people <= s.people ) SELECT * FROM room_sets; hotel | rooms | price | people -------+-------+-------+-------- A | {1} | 200 | 1 A | {2} | 99 | 2 A | {3} | 95 | 3 B | {4} | 90 | 1 A | {1,2} | 299 | 3 (5 ROWS)
Now, we no longer have sets of rooms that are larger than requested number of people.
With this we can finally filter it to room sets that accomodate requested number of people:
WITH RECURSIVE setup AS ( SELECT 3::INT4 AS people ), room_sets AS ( SELECT r.hotel, ARRAY[ r.room ] AS rooms, r.price, r.people FROM setup s, room_price r WHERE r.people <= s.people UNION ALL SELECT rs.hotel, rs.rooms || n.room, rs.price + n.price AS price, rs.people + n.people AS people FROM setup s, room_sets rs JOIN room_price n USING (hotel) WHERE n.room > rs.rooms[ array_upper( rs.rooms, 1 ) ] AND rs.people + n.people <= s.people ) SELECT rs.* FROM setup s, room_sets rs WHERE s.people = rs.people ; hotel | rooms | price | people -------+-------+-------+-------- A | {3} | 95 | 3 A | {1,2} | 299 | 3 (2 ROWS)
Which is all great, but we should show only 1 set per hotel. To do this, we can use DISTINCT ON clause:
WITH RECURSIVE setup AS ( SELECT 3::INT4 AS people ), room_sets AS ( SELECT r.hotel, ARRAY[ r.room ] AS rooms, r.price, r.people FROM setup s, room_price r WHERE r.people <= s.people UNION ALL SELECT rs.hotel, rs.rooms || n.room, rs.price + n.price AS price, rs.people + n.people AS people FROM setup s, room_sets rs JOIN room_price n USING (hotel) WHERE n.room > rs.rooms[ array_upper( rs.rooms, 1 ) ] AND rs.people + n.people <= s.people ) SELECT DISTINCT ON (rs.hotel) rs.* FROM setup s, room_sets rs WHERE s.people = rs.people ORDER BY rs.hotel, rs.price ; hotel | rooms | price | people -------+-------+-------+-------- A | {3} | 95 | 3 (1 ROW)
Which works great, and returns expected results for other cases too.
One final problem which we might have is that when search returns data from different hotels, final result set is sorted by hotel. And we might want to have it sorted by price.
For example, for 1 person the query returns:
hotel | rooms | price | people -------+-------+-------+-------- A | {1} | 200 | 1 B | {4} | 90 | 1 (2 rows)
Solution is simple – we can't change ORDER BY clause, as it's needed for DISTINCT ON, but we can reorder the data:
WITH RECURSIVE setup AS ( SELECT 1::INT4 AS people ), room_sets AS ( SELECT r.hotel, ARRAY[ r.room ] AS rooms, r.price, r.people FROM setup s, room_price r WHERE r.people <= s.people UNION ALL SELECT rs.hotel, rs.rooms || n.room, rs.price + n.price AS price, rs.people + n.people AS people FROM setup s, room_sets rs JOIN room_price n USING (hotel) WHERE n.room > rs.rooms[ array_upper( rs.rooms, 1 ) ] AND rs.people + n.people <= s.people ), results AS ( SELECT DISTINCT ON (rs.hotel) rs.* FROM setup s, room_sets rs WHERE s.people = rs.people ORDER BY rs.hotel, rs.price ) SELECT * FROM results ORDER BY price ; hotel | rooms | price | people -------+-------+-------+-------- B | {4} | 90 | 1 A | {1} | 200 | 1 (2 ROWS)
Final note – this will not be fast for large data sets. But it might be fast enough for the dataset that you have.
Minor nit: NP-Complete doesn’t mean unsolvable. It just means that the computational complexity grows at a rate that’s intractable beyond a certain number.
@David:
well, sure. It was a kind of mental shortcut.
typo – at line 270 you > instead of >
i think. thanks for your posts, always very cool.
@Seb: thanks, fixed.
Boskie 😀
What’s the advantage to using a CTE for a constant vs JOINing to a VALUES?
For instance, we could skip setup, and rewrite results as (https://gist.github.com/946329 in case formatting is wrong):
SELECT
DISTINCT ON (rs.hotel)
rs.*
FROM
room_sets rs, VALUES(3) AS setup(people)
WHERE
s.people = rs.people
ORDER BY
rs.hotel, rs.price
@François:
1. we already have to use CTE
2. I am using the setup cte 3 times. Thanks to this, changing the query to return combinations for another number of people means that I have to change only 1 place.
If i’d use values() I would have to use the values in every place, thus – every change would require changes in 3 places, and it would be additional risk of making the change badly (different values in different places).
Nice, but generating all sets is probably least efficient approach.
BTW some boundaries should be noted (and similar problems appear, depending on boundaries):
1. Do all guests have to stay in the same hotel?
2. If the price is the same, should we choose solution with lowest average people in room?
3. Is it allowed to left unused places in rooms (4 people; hotel A – 2 for 100 and 3 for 150; hotel B – two 2 for 230 each)?
@Rozie:
can you show any other solution?
As for your questions:
1. yes – that’s the point of room sets *per hotel*
2. if the price is the same, I don’t care.
3. no – that was stated pretty clearly in the original question on stack overflow.
I was having trouble understanding where the 394 came from (Rooms A1+A2+A3). I think it would be worth noting that the problem seems to have a condition where a room may only be used once. Or, the table represents the availability of one room only. So for 6 people it would not be $190 (Room 3A+3A).
@Vol7tron:
Text states:
…
“Which makes sure that given room can be in set only once.”