i was informed that jan wieck wrote another function which does cumulative sum – using cursors and srf's to get the same result.
i checked it, and here are the results.
i modified jan's code to look like this:
CREATE OR REPLACE FUNCTION f1 (REFCURSOR) RETURNS SETOF RECORD AS ' DECLARE c1 ALIAS FOR $1; s INTEGER := 0; r RECORD; last_code TEXT; BEGIN FETCH c1 INTO r; WHILE FOUND LOOP IF last_code IS DISTINCT FROM r.code THEN s := 0; last_code := r.code; END IF; s := s + r.val; r.sum := s; RETURN NEXT r; FETCH c1 INTO r; END LOOP; RETURN; END; ' LANGUAGE plpgsql;
modification was necessary to allow many “groups" within one recordset.
it has to be noted that this method has different semantics than mine. i.e. it has to be called within transaction using 2 separate commands:
DECLARE curs cursor FOR SELECT id, user_id::text AS code, VALUE AS val, 0 AS SUM FROM test ORDER BY user_id, id; EXPLAIN analyze SELECT * FROM f1('curs'::refcursor) AS (id INTEGER, code text, val INTEGER, SUM INT);
is it any good? yes!
using test case from my previous post (10,000 records, 10 users) i got following explain timings:
my approach:
# EXPLAIN ANALYZE SELECT *, cumulative_sum(user_id::text, VALUE) FROM (SELECT * FROM test ORDER BY user_id, id ) x; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------- Subquery Scan x (cost=0.00..3144.12 ROWS=10000 width=12) (actual TIME=0.114..239.199 ROWS=10000 loops=1) -> INDEX Scan USING q ON test (cost=0.00..469.12 ROWS=10000 width=12) (actual TIME=0.026..26.369 ROWS=10000 loops=1) Total runtime: 259.162 ms (3 ROWS)
jan code:
# BEGIN; DECLARE curs cursor FOR SELECT id, user_id::text AS code, VALUE AS val, 0 AS SUM FROM test ORDER BY user_id, id; *# EXPLAIN analyze SELECT * FROM f1('curs'::refcursor) AS (id INTEGER, code text, val INTEGER, SUM INT); QUERY PLAN --------------------------------------------------------------------------------------------------------------- FUNCTION Scan ON f1 (cost=0.00..260.00 ROWS=1000 width=44) (actual TIME=131.372..150.723 ROWS=10000 loops=1) Total runtime: 170.102 ms (2 ROWS)
nice! next runs of both approaches show more or less the same timings (+/- 1-3%)
now – i don't feel 100% comfortable with this solution. the main reason is that it requires me to issue 2 commands, but i have to note these facts:
- it is the fastest (know to me at the moment) way to achieve the results
- it doesn't have the issue with “counters from previous runs" that i described in previous post
- it shows that “return next;" is not that slow – i was thinking it will be much slower.
as for drawbacks:
- requires 2 commands to get data
- calling function is quite cumbersome due to the fast that you have to specify columns.
One thought on “cumulative sum in sql – howto part2”
Comments are closed.