Some time ago, guys from Instagram shared their approach to generating unique ids on multiple hosts in a way that guarantees (to reasonable extend) uniqueness, and doesn't require any centralized service.
Earlier this month, the build benchmarked their solution vs. UUIDs, and vs. bigserial.
I thought – whether C based code for nextid would be faster.
My knowledge of C is very limited, so what I did is probably not that nice as it could be, but anyway. Original function from Instagram:
CREATE OR REPLACE FUNCTION next_id(OUT RESULT BIGINT) AS $$ DECLARE our_epoch BIGINT := 1314220021721; seq_id BIGINT; now_millis BIGINT; shard_id INT := 5; BEGIN SELECT NEXTVAL('table_id_seq') % 1024 INTO seq_id; SELECT FLOOR(EXTRACT(EPOCH FROM clock_timestamp()) * 1000) INTO now_millis; RESULT := (now_millis - our_epoch) << 23; RESULT := RESULT | (shard_id << 10); RESULT := RESULT | (seq_id); END; $$ LANGUAGE PLPGSQL;
My function, in C:
#include <sys/time.h> #include "postgres.h" #include "funcapi.h" PG_MODULE_MAGIC; PG_FUNCTION_INFO_V1(nextid_c); Datum nextid_c(PG_FUNCTION_ARGS) { int32 shard_id = PG_GETARG_INT32(0); int64 seq_no = PG_GETARG_INT64(1); int64 our_epoch = 1314220021721; int64 now_millis; int64 result; struct timeval tp; gettimeofday(&tp, NULL); now_millis = tp.tv_sec * 1000 + tp.tv_usec / 1000 - our_epoch; result = now_millis << 23; result = result | (shard_id << 10); result = result | (seq_no % 1024); PG_RETURN_INT64(result); }
This got compiled to nextid_c.so, copied do libdir for PostgreSQL, and function was created using:
CREATE FUNCTION next_id_c(INT4, INT8) RETURNS int8 AS '$libdir/nextid_c', 'nextid_c' LANGUAGE C STRICT;
it's important to note that this function takes 2 arguments:
- Shard number (I didn't want to hardcode it in source)
- Next id from sequence – since I didn't want to write db-access from C
Old function was called:
SELECT next_id();
while new function:
SELECT next_id( 5, NEXTVAL('table_id_seq') );
More complex, but I figured it's good enough, and it made it possible not to hardcode name of the sequence.
To test only id generation time, I skipped table creation, and simply did:
SELECT COUNT(*) FROM ( SELECT NEXTVAL('table_id_seq' ) FROM generate_series(1, 1000000)) AS x; SELECT COUNT(*) FROM ( SELECT uuid_generate_v4() FROM generate_series(1, 1000000)) AS x; SELECT COUNT(*) FROM ( SELECT next_id() FROM generate_series(1, 1000000)) AS x; SELECT COUNT(*) FROM ( SELECT next_id_c( 5, NEXTVAL('table_id_seq' )) FROM generate_series(1, 1000000)) AS x;
This showed me time for generating 1 million ids using given approach.
Each approach was ran 10 times. Results:
Method | Time (in ms.) | ||
---|---|---|---|
Best | Average | Worst | |
next_id | 6810.946 | 7001.082 | 7117.601 |
next_id_c | 505.491 | 540.303 | 580.531 |
nextval | 489.318 | 534.481 | 569.947 |
uuid | 3364.151 | 3409.031 | 3447.849 |
Which shows us that nextval is (obviously) fastest, and slowdown for other methods is:
- +3% of time for next_id_c
- +588% (or, almost 7 times as slow) for uuid
- +1292% (or, almost 14 times as slow) for next_id
Results from the build are different, but I think it's because they measured writes to table – which I totally ignored, and just checked how fast is the function in id generation.
Long story short – I would say that given normal usage (write to table) – even plpgsql function will be fast enough, but you might want to move to C based function if you need the most speed out of it.
the plpgsql function can be maybe 5x faster if you reduce number of expressions there. result := ; result := result | xxx; result := is expensive when you use it in pretty short repeated functions.
5 times faster was too optimistic – I did tests, and the speedup is is about 3x
The LANGUAGE SQL version of this is only +74% slower than nextval().
CREATE OR REPLACE FUNCTION public.next_id_sql() RETURNS bigint LANGUAGE sql AS $$
select ((((floor(extract(epoch from clock_timestamp()) * 1000)::bigint) – 1314220021721) << 23) | (5 << 10)) | (nextval('table_id_seq') % 1024)$$;
Thanks for the test Depesz. This line’s up with our results as well. We contemplated using Instagram-style IDs, and ended up going for a better alternative. We hard-coded the shard ID into a normal sequence, and just used nextval. It is still all wrapped under a function call, so it is transparent to any client.