Removing elements from arrays – select * from depesz;

Cezio wrote post about removing elements from arrays in PostgreSQL.

Unfortunately his blog engine requires registration before comment, which I don't like, so I decided to comment using my own blogspace.

The approach Cezio showed is pretty cool – having more functions to handle arrays is definitely cool, but it might be better to use sql functions instead of pl/pgsql.

I did a quick test with 2 functions. Pl/PgSQL was taken from Cezio post – I just changed the name, and I wrote my own version in SQL. Sources:

create or replace function array_remove_plpgsql(arr int8[], other_arr int8[]) returns int8[] as $$
declare
out_arr int8[];
el_idx int;
begin
    if arr is null or other_arr is null then
        return arr;
    end if;
    for el_idx in array_lower(arr, 1)..array_upper(arr, 1) loop
    if not arr[el_idx] =any(other_arr) then
        out_arr = array_append(out_arr, arr[el_idx]);
    end if;
    end loop;
    return out_arr;
end;
$$ language plpgsql immutable;

create or replace function array_remove_sql(int8[], int8[]) returns int8[] as $$
SELECT array(select $1[i] from generate_series(array_lower($1,1), array_upper($1,1)) i WHERE $1[i] <> ALL($2));
$$ language sql immutable;

Then I ran it many times. 100000 times to be exact:

explain analyze select array_remove_plpgsql(array[1,2,3,4,5,6,7,8,9,10]::int8[],array[9,7,5,3,1]::int8[]) from generate_series(1,100000);
explain analyze select array_remove_sql(array[1,2,3,4,5,6,7,8,9,10]::int8[],array[9,7,5,3,1]::int8[]) from generate_series(1,100000);

Results:

PL/PgSQL version:

# explain analyze select array_remove_plpgsql(array[1,2,3,4,5,6,7,8,9,10]::int8[],array[9,7,5,3,1]::int8[]) from generate_series(1,100000);
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Function Scan on generate_series  (cost=0.00..12.50 rows=1000 width=0) (actual time=36.095..176.331 rows=100000 loops=1)
 Total runtime: 308.723 ms
(2 rows)
# explain analyze select array_remove_plpgsql(array[1,2,3,4,5,6,7,8,9,10]::int8[],array[9,7,5,3,1]::int8[]) from generate_series(1,100000);
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Function Scan on generate_series  (cost=0.00..12.50 rows=1000 width=0) (actual time=35.962..173.675 rows=100000 loops=1)
 Total runtime: 304.880 ms
(2 rows)
# explain analyze select array_remove_plpgsql(array[1,2,3,4,5,6,7,8,9,10]::int8[],array[9,7,5,3,1]::int8[]) from generate_series(1,100000);
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Function Scan on generate_series  (cost=0.00..12.50 rows=1000 width=0) (actual time=36.279..175.507 rows=100000 loops=1)
 Total runtime: 308.333 ms
(2 rows)

Pretty stable result, around 305ms.

SQL version:

# explain analyze select array_remove_sql(array[1,2,3,4,5,6,7,8,9,10]::int8[],array[9,7,5,3,1]::int8[]) from generate_series(1,100000);
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Function Scan on generate_series  (cost=0.00..12.50 rows=1000 width=0) (actual time=13.459..149.143 rows=100000 loops=1)
 Total runtime: 277.871 ms
(2 rows)
# explain analyze select array_remove_sql(array[1,2,3,4,5,6,7,8,9,10]::int8[],array[9,7,5,3,1]::int8[]) from generate_series(1,100000);
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Function Scan on generate_series  (cost=0.00..12.50 rows=1000 width=0) (actual time=13.429..144.672 rows=100000 loops=1)
 Total runtime: 270.746 ms
(2 rows)
# explain analyze select array_remove_sql(array[1,2,3,4,5,6,7,8,9,10]::int8[],array[9,7,5,3,1]::int8[]) from generate_series(1,100000);
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Function Scan on generate_series  (cost=0.00..12.50 rows=1000 width=0) (actual time=13.511..148.526 rows=100000 loops=1)
 Total runtime: 274.809 ms
(2 rows)

Not bad. PlPgSQL is only 10% slower, while being more readable. Is the trade-off worth it – you have to decide for yourself.

Additionally, one factual correction. Cezio said “Jeśli używacie PostgreSQL'a w wersji >=8.3, to w contribie jest pakiet intarray" which means “If you use PostgreSQL 8.3 or newer, there is contrib module intarray".

Actually – intarray has been in contrib since at least 7.3. But Cezio is 100% right – it's definitely worth checking, as it is simply awesome for handling int4[].

3 thoughts on “Removing elements from arrays”

pavel stehule says:

2008-08-07 at 07:41

Difference between plpgsql and sql will be bigger with larger arrays (array update is expensive operation). for n=10000 is plpgsql very bad choice.
Valentine Gogichashvili says:

2008-08-07 at 15:22

On my machine with Postgres 8.2 the plpgsql version of your example runs 25% faster.
flexpadawan says:

2009-04-22 at 06:56

Great Post! Been trying to figure out how to do this for the last 3 days!

I did some testing and tweaking.

The Test on postgres 8.3:
306.370ms – plpgsql (Winner) AMD Quad-Core??
1849.881ms – sql

The Tweek:
I re-wrote this to allow for any data type.

CREATE OR REPLACE FUNCTION array_difference(anyarray, anyarray)
RETURNS anyarray AS
$BODY$
declare
out_arr $1%TYPE;
el_idx int;
begin
if $1 is null or $2 is null then
return $1;
end if;
for el_idx in array_lower($1, 1)..array_upper($1, 1) loop
if not $1[el_idx] =any($2) then
out_arr = array_append(out_arr, $1[el_idx]);
end if;
end loop;
return out_arr;
end;
$BODY$
LANGUAGE ‘plpgsql’ IMMUTABLE
COST 100;
ALTER FUNCTION array_difference(anyarray, anyarray) OWNER TO postgres;

Comments are closed.