Converting list of integers into list of ranges

Yesterday someone on irc asked:

i've a query that returns sequential numbers with gaps (generate_series + join) and my question is: can is somehow construct ranges out of the returned values? sort of range_agg or something?

There was no further discussion, aside from me saying

sure you can. not trivial task, but possible.
you'd need window functions.

but it got me thinking …

First, it would be good to have test data. To avoid retyping it for every query let's make a table with some integers:

=$ create table test (i int4);
CREATE TABLE
 
=$ insert into test (i) values (1), (2), (5), (6), (7), (8), (9), (10), (20), (30);
INSERT 0 10

This gives us nice table input data:

=$ select i from test order by i;
 i
----
  1
  2
  5
  6
  7
  8
  9
 10
 20
 30
(10 rows)

Based on data above, we'd like to get ranges:

  • 1 to 2
  • 5 to 10
  • 20
  • 30

First, we need a way to tell if given row starts new range. This can be trivially done by checking whether previous value of i smaller by 1:

=$ select
    i,
    i - lag(i) over (order by i) as diff
from test
order by i;
 i  |  diff
----+--------
  1 | [null]
  2 |      1
  5 |      3
  6 |      1
  7 |      1
  8 |      1
  9 |      1
 10 |      1
 20 |     10
 30 |     10
(10 rows)

Nice. If diff is null or > 1 then current i starts new range.

Now, let's make it so that all rows in given range will have the same group id. To do so, I'll assign group_id (being just copy of value from i) for first rows in range, and in other cases, I'll make it null:

=$ with with_group_id as (
    select
        i,
        case
            when i - lag(i) over (order by 1) < 2 then null
            else i
        end as group_id
    from test
)
select * from with_group_id
order by i;
 i  | group_id
----+----------
  1 |        1
  2 |   [null]
  5 |        5
  6 |   [null]
  7 |   [null]
  8 |   [null]
  9 |   [null]
 10 |   [null]
 20 |       20
 30 |       30
(10 rows)

Great. Now, let's fill group_id for non-first rows:

=$ with with_group_id as (
    select
        i,
        case
            when i - lag(i) over (order by 1) < 2 then null
            else i
        end as group_id
    from test
)
select
     i,
     max(group_id) over (order by i) as group_id
from with_group_id
order by i;
 i  | group_id
----+----------
  1 |        1
  2 |        1
  5 |        5
  6 |        5
  7 |        5
  8 |        5
  9 |        5
 10 |        5
 20 |       20
 30 |       30
(10 rows)

Great. Now, all I need is to get ranges using normal grouping:

=$ with with_group_id as (
    select
        i,
        case
            when i - lag(i) over (order by 1) < 2 then null
            else i
        end as group_id
    from test
), full_group_id as (
    select
        i,
        max(group_id) over (order by i) as group_id
    from with_group_id
)
select
    min(i) as range_from,
    max(i) as range_to
from
    full_group_id
group by group_id
order by group_id;
 range_from | range_to
------------+----------
          1 |        2
          5 |       10
         20 |       20
         30 |       30
(4 rows)

This can be also used to generate proper PostgreSQL ranges:

=$ with with_group_id as (
    select
        i,
        case
            when i - lag(i) over (order by 1) < 2 then null
            else i
        end as group_id
    from test
), full_group_id as (
    select
        i,
        max(group_id) over (order by i) as group_id
    from with_group_id
)
select
    int4range(
        min(i),
        max(i),
        '[]'
    ) as range
from
    full_group_id
group by group_id
order by group_id;
  range
---------
 [1,3)
 [5,11)
 [20,21)
 [30,31)
(4 rows)

Pretty cool, isn't it?

5 thoughts on “Converting list of integers into list of ranges”

  1. Long time no write.

    I had to solve exactly this problem in a work related task. At the time, the approach I took was direct SQL, no CTEs at all. I must admit the CTEs definitely make it easier to explain the solution here.

    The approach I took was (using your table):

    SELECT
      int4range(
        i,
        CASE WHEN last THEN i ELSE j END,
        '[]'
      ) AS range
    FROM (
      SELECT
        i,
        first,
        last,
        LEAD(i) OVER (ORDER BY i) AS j
      FROM (
        SELECT
          i,
          COALESCE(i - LAG(i) OVER(ORDER BY i), 0)  1 AS first,
          COALESCE(LEAD(i) OVER (ORDER BY i) - i, 0)  1 AS last
        FROM test
      ) a
      WHERE first OR last
    ) b
    WHERE first
    ORDER BY i;

    This has the nice side effect of reducing the amount of work to be done when there are large contiguous ranges.

  2. I did a similar thing a while back, but for dates. The var `dates` was a `date[]` for this code:

    select  daterange(min(d), max(d)+1) as date_ranges
    from    (   select d, sum(adj_num) over(order by d) as range_group
                from (  select  e.d,
                                case
                                    when lag(d) over(order by d) = d-1 then 0
                                    else 1
                                end as adj_num
                        from (  select  distinct d.d
                                from    unnest(dates) as d(d)
                                ) e
                        ) f
                ) g
    group by g.range_group;
  3. I believe that this solution is simplest (sorry, no idea where you are getting your nice formatting):

    SELECT MIN(i) AS range_from, MAX(i) AS range_to
    FROM
    (SELECT i, i-row_number() over (order by i) AS group_num FROM test ORDER BY i) AS t
    GROUP BY group_num
    ORDER BY group_num
  4. @Dan:

    This is beautiful. Abuses certain facts about data, but works, and is much simpler. Thanks.

  5. Thanks. 🙂
    As I look at it again, it appears that I really could have dropped the second “ORDER BY i” inside the subselect. After all, the aggregate functions don’t care what order the rows of t appear in. I probably put it there to aid with verifying the approach, as I was working on it. So, if I am correct, it could be made slightly simpler, still.

Comments are closed.