i just “love” locale issues.

nice machine with 2 gb of ram, 800 megabytes in 2 logfiles. single word as search phrase. polish utf-8 locale (pl_PL.UTF-8), gnu grep 2.5.1. results?

=> time grep -in reloading postgresql-2007-10-22_000000.log postgresql-2007-10-22_120909.log
postgresql-2007-10-22_000000.log:40001:2007-10-22 10:50:13.528 CEST @ 24681  LOG:  received SIGHUP, reloading configuration files
postgresql-2007-10-22_120909.log:1215696:2007-10-22 12:15:21.769 CEST @ 24681  LOG:  received SIGHUP, reloading configuration files
real    1m21.212s
user    1m20.909s
sys     0m0.284s

same check, without -i:

=> time grep -n reloading postgresql-2007-10-22_000000.log postgresql-2007-10-22_120909.log
postgresql-2007-10-22_000000.log:40001:2007-10-22 10:50:13.528 CEST @ 24681  LOG:  received SIGHUP, reloading configuration files
postgresql-2007-10-22_120909.log:1215696:2007-10-22 12:15:21.769 CEST @ 24681  LOG:  received SIGHUP, reloading configuration files
real    0m1.147s
user    0m0.868s
sys     0m0.268s

after setting locale to C:

=> time grep -in reloading postgresql-2007-10-22_000000.log postgresql-2007-10-22_120909.log
postgresql-2007-10-22_000000.log:40001:2007-10-22 10:50:13.528 CEST @ 24681  LOG:  received SIGHUP, reloading configuration files
postgresql-2007-10-22_120909.log:1215696:2007-10-22 12:15:21.769 CEST @ 24681  LOG:  received SIGHUP, reloading configuration files
real    0m1.209s
user    0m0.896s
sys     0m0.316s
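
for anyone who wants to try this without changing the locale of the whole session, here is a minimal sketch. it assumes a posix-like shell and relies on the standard LC_ALL environment variable, set for a single command only. the helper name grep_ci_c and the tiny stand-in log file are made up for illustration, they are not from the test above.

```shell
# hypothetical helper: case-insensitive grep with the C locale forced
# for this one command only; the shell's own locale is left untouched.
grep_ci_c() {
    LC_ALL=C grep -in "$@"
}

# small stand-in log file (invented for this example)
tmp_log=$(mktemp)
printf '%s\n' \
    'LOG:  checkpoint starting' \
    'LOG:  received SIGHUP, reloading configuration files' > "$tmp_log"

# the search word is upper-cased on purpose, so that -i has work to do
result=$(grep_ci_c RELOADING "$tmp_log")
echo "$result"

rm -f "$tmp_log"
```

one caveat: in the C locale, -i folds case only for ascii letters, so a search phrase with polish characters (ł vs. Ł, for example) will most likely not match case-insensitively. for a plain ascii word like "reloading" that doesn't matter.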

all tests were repeated many times, to get all the data into memory and to check for extreme values.

does anybody need more proof that the locale “thing” is broken? of course it might be that only the locale handling in grep is bad, but anyway, it's still a locale issue.

4 thoughts on “i just “love” locale issues.”

  1. Just a note for anyone still reading this: this bug was fixed in GNU grep 2.7.
