Converting list of integers into list of ranges

Yesterday someone on irc asked:

i've a query that returns sequential numbers with gaps (generate_series + join) and my question is: can is somehow construct ranges out of the returned values? sort of range_agg or something?

There was no further discussion, aside from me saying

sure you can. not trivial task, but possible.
you'd need window functions.

but it got me thinking …

First, it would be good to have test data. To avoid retyping it for every query let's make a table with some integers:

=$ create table test (i int4);
CREATE TABLE
 
=$ insert into test (i) values (1), (2), (5), (6), (7), (8), (9), (10), (20), (30);
INSERT 0 10

This gives us nice table input data:

=$ select i from test order by i;
 i
----
  1
  2
  5
  6
  7
  8
  9
 10
 20
 30
(10 rows)

Based on data above, we'd like to get ranges:

1 to 2
5 to 10
20
30

First, we need a way to tell if given row starts new range. This can be trivially done by checking whether previous value of i smaller by 1:

=$ select
    i,
    i - lag(i) over (order by i) as diff
from test
order by i;
 i  |  diff
----+--------
  1 | [null]
  2 |      1
  5 |      3
  6 |      1
  7 |      1
  8 |      1
  9 |      1
 10 |      1
 20 |     10
 30 |     10
(10 rows)

Nice. If diff is null or > 1 then current i starts new range.

Now, let's make it so that all rows in given range will have the same group id. To do so, I'll assign group_id (being just copy of value from i) for first rows in range, and in other cases, I'll make it null:

=$ with with_group_id as ( select i, case when i - lag(i) over (order by 1) < 2 then null else i end as group_id from test ) select * from with_group_id order by i; i | group_id ----+---------- 1 | 1 2 | [null] 5 | 5 6 | [null] 7 | [null] 8 | [null] 9 | [null] 10 | [null] 20 | 20 30 | 30 (10 rows)

Great. Now, let's fill group_id for non-first rows:

=$ with with_group_id as ( select i, case when i - lag(i) over (order by 1) < 2 then null else i end as group_id from test ) select i, max(group_id) over (order by i) as group_id from with_group_id order by i; i | group_id ----+---------- 1 | 1 2 | 1 5 | 5 6 | 5 7 | 5 8 | 5 9 | 5 10 | 5 20 | 20 30 | 30 (10 rows)

Great. Now, all I need is to get ranges using normal grouping:

=$ with with_group_id as ( select i, case when i - lag(i) over (order by 1) < 2 then null else i end as group_id from test ), full_group_id as ( select i, max(group_id) over (order by i) as group_id from with_group_id ) select min(i) as range_from, max(i) as range_to from full_group_id group by group_id order by group_id; range_from | range_to ------------+---------- 1 | 2 5 | 10 20 | 20 30 | 30 (4 rows)

This can be also used to generate proper PostgreSQL ranges:

=$ with with_group_id as ( select i, case when i - lag(i) over (order by 1) < 2 then null else i end as group_id from test ), full_group_id as ( select i, max(group_id) over (order by i) as group_id from with_group_id ) select int4range( min(i), max(i), '[]' ) as range from full_group_id group by group_id order by group_id; range --------- [1,3) [5,11) [20,21) [30,31) (4 rows)

Pretty cool, isn't it?

5 thoughts on “Converting list of integers into list of ranges”

Long time no write.

I had to solve exactly this problem in a work related task. At the time, the approach I took was direct SQL, no CTEs at all. I must admit the CTEs definitely make it easier to explain the solution here.

The approach I took was (using your table):

SELECT
  int4range(
    i,
    CASE WHEN last THEN i ELSE j END,
    '[]'
  ) AS range
FROM (
  SELECT
    i,
    first,
    last,
    LEAD(i) OVER (ORDER BY i) AS j
  FROM (
    SELECT
      i,
      COALESCE(i - LAG(i) OVER(ORDER BY i), 0)  1 AS first,
      COALESCE(LEAD(i) OVER (ORDER BY i) - i, 0)  1 AS last
    FROM test
  ) a
  WHERE first OR last
) b
WHERE first
ORDER BY i;

This has the nice side effect of reducing the amount of work to be done when there are large contiguous ranges.

I did a similar thing a while back, but for dates. The var `dates` was a `date[]` for this code:

select  daterange(min(d), max(d)+1) as date_ranges
from    (   select d, sum(adj_num) over(order by d) as range_group
            from (  select  e.d,
                            case
                                when lag(d) over(order by d) = d-1 then 0
                                else 1
                            end as adj_num
                    from (  select  distinct d.d
                            from    unnest(dates) as d(d)
                            ) e
                    ) f
            ) g
group by g.range_group;

I believe that this solution is simplest (sorry, no idea where you are getting your nice formatting):

SELECT MIN(i) AS range_from, MAX(i) AS range_to
FROM
(SELECT i, i-row_number() over (order by i) AS group_num FROM test ORDER BY i) AS t
GROUP BY group_num
ORDER BY group_num

@Dan:

This is beautiful. Abuses certain facts about data, but works, and is much simpler. Thanks.

Thanks. 🙂
As I look at it again, it appears that I really could have dropped the second “ORDER BY i” inside the subselect. After all, the aggregate functions don’t care what order the rows of t appear in. I probably put it there to aid with verifying the approach, as I was working on it. So, if I am correct, it could be made slightly simpler, still.

Comments are closed.