text to hstore migration

in postgresql 8.2, in contrib, is great new datatype called hstore.

if you're not familiar with it – check the docs.

in short – this is indexable associative-array (hash) in one field.

i was given a task to convert some text field to hstore field.

text is very simple, contains set of “words", separated by spaces, where each “word" contains:

  • textual identifier, containing only lower case letters
  • “:" – separator
  • numerical (0 or positive integer) value

example value:

abc:123 def:1 bg:3 foo:99

luckily, the format is really simple.

now, i have to convert it to hstore column. adding a column is a no-brainer, but converting is apparently not so simple.

first, i tried making “replace" calls to change format of data to format acceptable for hstore:

# SELECT REPLACE(REPLACE('abc:123 def:1 bg:3 foo:99', ' ', ', '), ':', '=>');
             REPLACE
----------------------------------
 abc=>123, def=>1, bg=>3, foo=>99
(1 ROW)

but, while this works:

# SELECT 'abc=>123, def=>1, bg=>3, foo=>99'::hstore;
                      hstore
--------------------------------------------------
 "bg"=>"3", "abc"=>"123", "def"=>"1", "foo"=>"99"
(1 ROW)

this doesn't:

# SELECT REPLACE(REPLACE('abc:123 def:1 bg:3 foo:99', ' ', ', '), ':', '=>')::hstore;
ERROR:  cannot CAST TYPE text TO hstore
LINE 1: ...'abc:123 def:1 bg:3 foo:99', ' ', ', '), ':', '=>')::hstore;

oops. got to find better way. luckily, it's not very hard:

text2hstore converting functions seems to be quite simple:

CREATE OR REPLACE FUNCTION text_to_hstore(INPUT TEXT) RETURNS hstore AS $BODY$
DECLARE
    input_parts TEXT[];
    output hstore := '';
    item TEXT[];
    i INT;
BEGIN
    IF INPUT IS NULL OR INPUT = '' THEN
        RETURN output;
    END IF;
    IF NOT INPUT ~ '^[a-z]+:[0-9]( [a-z]+:[0-9]+)*$' THEN
        raise exception 'input doesn''t match validating regexp [%]', INPUT;
    END IF;
    input_parts := string_to_array(INPUT, ' ');
    FOR i IN array_lower( input_parts, 1 ) .. array_upper( input_parts, 1 ) loop
        item := string_to_array( input_parts[i], ':' );
        output := output || tconvert( item[1], item[2] );
    END loop;
    RETURN output;
END;
$BODY$ LANGUAGE plpgsql;

what it does is:

  1. if given string (input) is null, or is empty string – return empty hstore
  2. if given string doesn't match validation regexp – raise exception (i want to be sure the conversion is ok)
  3. split input string to array of strings, where each item in array is “word" (code:value)
  4. iterate over array of strings, for each item do:
  5. split item into code and value (stored as item array)
  6. add new key to output hstore
  7. after finishing all “words", return generated hstore.

so, how fast does it work?

i did test on some random test table, with following dataset:

# SELECT COUNT(*), COUNT(text_field), MIN(LENGTH(text_field)), MAX(LENGTH(text_field)), SUM(LENGTH(text_field)) FROM test ;
  COUNT  |  COUNT  | MIN | MAX |   SUM
---------+---------+-----+-----+----------
 1502922 | 1349308 |   0 | 238 | 11262277
(1 ROW)

creating new table with hstore value took:

# CREATE TABLE t2 AS SELECT text_to_hstore(text_field) FROM test ;
SELECT
TIME: 47898.277 ms

not bad. of course the function is very task-specific, but perhaps somebody will use it for similar case.

5 thoughts on “text to hstore migration”

  1. Depesz, most of the types in postgresql have 2 functions for representation. you can usually use one of those (for example the representation is used for psql to display data struct). So instead of using a custom func you could use:

    test=# select hstore_in(‘hello=>depesz’);
    hstore_in
    ——————
    “hello
    “=>”depesz”
    (1 row)

  2. and the hstore_out of course does again convert the hstore type to the text presentation.

  3. it doesn’t really do *text* to hstore conversion:

    # select hstore_in(‘a=>b’::text);
    ERROR: function hstore_in(text) does not exist
    LINE 1: select hstore_in(‘a=>b’::text);
    ^
    HINT: No function matches the given name and argument types. You might need to add explicit type casts.

    it works only with un-typed values.

  4. I know I am 4 years late, but I was just trying to solve similar issue.

    This works:

    begin
    return hstore_in(_hstore_as_text::cstring);
    end;

Comments are closed.