in postgresql 8.2, in contrib, is great new datatype called hstore.
if you're not familiar with it – check the docs.
in short – this is indexable associative-array (hash) in one field.
i was given a task to convert some text field to hstore field.
text is very simple, contains set of “words", separated by spaces, where each “word" contains:
- textual identifier, containing only lower case letters
- “:" – separator
- numerical (0 or positive integer) value
example value:
abc:123 def:1 bg:3 foo:99
luckily, the format is really simple.
now, i have to convert it to hstore column. adding a column is a no-brainer, but converting is apparently not so simple.
first, i tried making “replace" calls to change format of data to format acceptable for hstore:
# SELECT REPLACE(REPLACE('abc:123 def:1 bg:3 foo:99', ' ', ', '), ':', '=>'); REPLACE ---------------------------------- abc=>123, def=>1, bg=>3, foo=>99 (1 ROW)
but, while this works:
# SELECT 'abc=>123, def=>1, bg=>3, foo=>99'::hstore; hstore -------------------------------------------------- "bg"=>"3", "abc"=>"123", "def"=>"1", "foo"=>"99" (1 ROW)
this doesn't:
# SELECT REPLACE(REPLACE('abc:123 def:1 bg:3 foo:99', ' ', ', '), ':', '=>')::hstore; ERROR: cannot CAST TYPE text TO hstore LINE 1: ...'abc:123 def:1 bg:3 foo:99', ' ', ', '), ':', '=>')::hstore;
oops. got to find better way. luckily, it's not very hard:
text2hstore converting functions seems to be quite simple:
CREATE OR REPLACE FUNCTION text_to_hstore(INPUT TEXT) RETURNS hstore AS $BODY$ DECLARE input_parts TEXT[]; output hstore := ''; item TEXT[]; i INT; BEGIN IF INPUT IS NULL OR INPUT = '' THEN RETURN output; END IF; IF NOT INPUT ~ '^[a-z]+:[0-9]( [a-z]+:[0-9]+)*$' THEN raise exception 'input doesn''t match validating regexp [%]', INPUT; END IF; input_parts := string_to_array(INPUT, ' '); FOR i IN array_lower( input_parts, 1 ) .. array_upper( input_parts, 1 ) loop item := string_to_array( input_parts[i], ':' ); output := output || tconvert( item[1], item[2] ); END loop; RETURN output; END; $BODY$ LANGUAGE plpgsql;
what it does is:
- if given string (input) is null, or is empty string – return empty hstore
- if given string doesn't match validation regexp – raise exception (i want to be sure the conversion is ok)
- split input string to array of strings, where each item in array is “word" (code:value)
- iterate over array of strings, for each item do:
- split item into code and value (stored as item array)
- add new key to output hstore
- after finishing all “words", return generated hstore.
so, how fast does it work?
i did test on some random test table, with following dataset:
# SELECT COUNT(*), COUNT(text_field), MIN(LENGTH(text_field)), MAX(LENGTH(text_field)), SUM(LENGTH(text_field)) FROM test ; COUNT | COUNT | MIN | MAX | SUM ---------+---------+-----+-----+---------- 1502922 | 1349308 | 0 | 238 | 11262277 (1 ROW)
creating new table with hstore value took:
# CREATE TABLE t2 AS SELECT text_to_hstore(text_field) FROM test ; SELECT TIME: 47898.277 ms
not bad. of course the function is very task-specific, but perhaps somebody will use it for similar case.
Depesz, most of the types in postgresql have 2 functions for representation. you can usually use one of those (for example the representation is used for psql to display data struct). So instead of using a custom func you could use:
test=# select hstore_in(‘hello=>depesz’);
hstore_in
——————
“hello
“=>”depesz”
(1 row)
and the hstore_out of course does again convert the hstore type to the text presentation.
it doesn’t really do *text* to hstore conversion:
# select hstore_in(‘a=>b’::text);
ERROR: function hstore_in(text) does not exist
LINE 1: select hstore_in(‘a=>b’::text);
^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
it works only with un-typed values.
Pretty interesting post.
I know I am 4 years late, but I was just trying to solve similar issue.
This works:
begin
return hstore_in(_hstore_as_text::cstring);
end;