What is index overhead on writes?

One of things people learn is that adding indexes isn't free. All write operations (insert, update, delete) will be slower – well, they have to update index.

But realistically – how much slower?

Full tests should involve lots of operations, on realistic data, but I just wanted to see some basic info. So I figured I'll just look at speed of inserting data (well, COPYing data), and will try to extract some knowledge from it…

Continue reading What is index overhead on writes?

Do you really need tsvector column?

When using tsearch one usually, often, creates a tsvector column to put data in, and then create index on it.

But, do you really need the index? I wrote once already that you don't have to, but then a person talked with me on IRC, and pointed this section of docs:

One advantage of the separate-column approach over an expression index … Another advantage is that searches will be faster, since it will not be necessary to redo the to_tsvector calls to verify index matches.

So, let's see how big of a problem it really is.

Continue reading Do you really need tsvector column?

Waiting for PostgreSQL 18 – Allow parallel CREATE INDEX for GIN indexes

On 3rd of March 2025, Tomas Vondra committed patch:

Allow parallel CREATE INDEX for GIN indexes
 
Allow using parallel workers to build a GIN index, similarly to BTREE
and BRIN. For large tables this may result in significant speedup when
the build is CPU-bound.
 
The work is divided so that each worker builds index entries on a subset
of the table, determined by the regular parallel scan used to read the
data. Each worker uses a local tuplesort to sort and merge the entries
for the same key. The TID lists do not overlap (for a given key), which
means the merge sort simply concatenates the two lists. The merged
entries are written into a shared tuplesort for the leader.
 
The leader needs to merge the sorted entries again, before writing them
into the index. But this way a significant part of the work happens in
the workers, and the leader is left with merging fewer large entries,
which is more efficient.
 
Most of the parallelism infrastructure is a simplified copy of the code
used by BTREE indexes, omitting the parts irrelevant for GIN indexes
(e.g. uniqueness checks).
 
Original patch by me, with reviews and substantial improvements by
Matthias van de Meent, certainly enough to make him a co-author.
 
Author: Tomas Vondra, Matthias van de Meent
Reviewed-by: Matthias van de Meent, Andy Fan, Kirill Reshke
Discussion: https://postgr.es/m/6ab4003f-a8b8-4d75-a67f-f25ad98582dc%40enterprisedb.com

Continue reading Waiting for PostgreSQL 18 – Allow parallel CREATE INDEX for GIN indexes

Waiting for 9.6 – Phrase full text search.

On 7th of April, Teodor Sigaev committed patch:

Phrase full text search.
 
Patch introduces new text search operator (<-> or <DISTANCE>) into tsquery.
On-disk and binary in/out format of tsquery are backward compatible.
It has two side effect:
- change order for tsquery, so, users, who has a btree index over tsquery,
  should reindex it
- less number of parenthesis in tsquery output, and tsquery becomes more
  readable
 
Authors: Teodor Sigaev, Oleg Bartunov, Dmitry Ivanov
Reviewers: Alexander Korotkov, Artur Zakirov

Continue reading Waiting for 9.6 – Phrase full text search.

Waiting for 9.6 – Allow usage of huge maintenance_work_mem for GIN build.

On 2nd of September, Teodor Sigaev committed patch:

Allow usage of huge maintenance_work_mem for GIN build.
 
Currently, in-memory posting list during GIN build process is limited 1GB
because of using repalloc. The patch replaces call of repalloc to repalloc_huge.
It increases limit of posting list from 180 millions
(1GB / sizeof(ItemPointerData)) to 4 billions limited by maxcount/count fields
in GinEntryAccumulator and subsequent calls. Check added.
 
Also, fix accounting of allocatedMemory during build to prevent integer
overflow with maintenance_work_mem > 4GB.
 
Robert Abraham <robert.abraham86@googlemail.com> with additions by me

Continue reading Waiting for 9.6 – Allow usage of huge maintenance_work_mem for GIN build.

Joining BTree and GIN/GiST indexes

Today, I'd like to show you how you can use the same index for two different types of conditions. One that is using normal BTree indexing ( equal, less than, greater than ), and one that is using GIN/GiST index, for full text searching.

Continue reading Joining BTree and GIN/GiST indexes

Waiting for 9.4 – Introduce jsonb, a structured format for storing json.

Portuguese Brazil Version

On 23rd of March, Andrew Dunstan committed patch:

Introduce jsonb, a structured format for storing json.
 
The new format accepts exactly the same data as the json type. However, it is
stored in a format that does not require reparsing the orgiginal text in order
to process it, making it much more suitable for indexing and other operations.
Insignificant whitespace is discarded, and the order of object keys is not
preserved. Neither are duplicate object keys kept - the later value for a given
key is the only one stored.
 
The new type has all the functions and operators that the json type has,
with the exception of the json generation functions (to_json, json_agg etc.)
and with identical semantics. In addition, there are operator classes for
hash and btree indexing, and two classes for GIN indexing, that have no
equivalent in the json type.
 
This feature grew out of previous work by Oleg Bartunov and Teodor Sigaev, which
was intended to provide similar facilities to a nested hstore type, but which
in the end proved to have some significant compatibility issues.
 
Authors: Oleg Bartunov,  Teodor Sigaev, Peter Geoghegan and Andrew Dunstan.
Review: Andres Freund

Continue reading Waiting for 9.4 – Introduce jsonb, a structured format for storing json.

Understanding postgresql.conf : work_mem

For todays post in Understanding postgresql.conf series, I chose work_mem parameter.

Continue reading Understanding postgresql.conf : work_mem

Waiting for 9.1 – Faster LIKE/ILIKE

On 1st of February, Tom Lane committed patch:

Support LIKE and ILIKE index searches via contrib/pg_trgm indexes.
 
Unlike Btree-based LIKE optimization, this works for non-left-anchored
search patterns.  The effectiveness of the search depends on how many
trigrams can be extracted from the pattern.  (The worst case, with no      
trigrams, degrades to a full-table scan, so this isn't a panacea.  But   
it can be very useful.)                                                 
 
Alexander Korotkov, reviewed by Jan Urbanski

Continue reading Waiting for 9.1 – Faster LIKE/ILIKE

Waiting for 8.4 – partial-match support in GIN, and sequence restart

Today we have two interesting patches:

patch by Teodor Sigaev and Oleg Bartunov, and committed by Tom Lane, which adds interesting capability to GIN indexes
patch by Zoltan Boszormenyi, also committed by Tom, which adds “RESTART" option to ALTER SEQUENCE. With some interesting consequences

Continue reading Waiting for 8.4 – partial-match support in GIN, and sequence restart

=$
|

Tag: gin