Grouping values in shell

I assume that everyone reading my blog understands the GROUP BY clause in SQL.

Lately I've been doing some maintenance work, and found myself in a position where I could really use a similar thing in shell.

Let's look at a simple example: the /etc/passwd file. It contains 7 fields, separated by :. For example:

=$ head -n 3 /etc/passwd
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin

For my example, I want to know which accounts are using which shell, but I want just one line of output per shell.

Normally, I'd have to do something like this:

=$ cut -d: -f7 /etc/passwd | \
    sort -uV | \
    while read -r shell
    do
        printf "%s\t%s\n" "${shell}" "$( awk -v"s=${shell}" -F: '$7==s {print $1}' /etc/passwd | paste -s -d, )"
    done
/bin/bash       root,depesz,postgres,pgdba
/bin/false      systemd-timesync,systemd-network,systemd-resolve,systemd-bus-proxy,_apt,rtkit,dnsmasq,messagebus,usbmux,festival,speech-dispatcher,pulse,sddm,avahi,colord,saned,hplip,Debian-exim,debian-tor,uuidd
/bin/sync       sync
/usr/sbin/nologin       daemon,bin,sys,games,man,lp,mail,news,uucp,proxy,www-data,backup,list,irc,gnats,nobody,sshd

Or something similar. This is not trivial to customize, and becomes tedious if you need to do it many times over different sets of data.

So I wrote the group_by script.

With it, I can simply run:

=$ group_by -s: -g7 -d1 < /etc/passwd
/bin/bash       root,depesz,postgres,pgdba
/usr/sbin/nologin       daemon,bin,sys,games,man,lp,mail,news,uucp,proxy,www-data,backup,list,irc,gnats,nobody,sshd
/bin/sync       sync
/bin/false      systemd-timesync,systemd-network,systemd-resolve,systemd-bus-proxy,_apt,rtkit,dnsmasq,messagebus,usbmux,festival,speech-dispatcher,pulse,sddm,avahi,colord,saned,hplip,Debian-exim,debian-tor,uuidd

Output is ordered the same way as the input: the first group that appears in the input will be the first in the output. The same goes for detail values.
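To illustrate the ordering rule, here is a minimal sketch of the same idea in awk. This is not the actual group_by script: the function name group_by_sketch and its positional arguments (separator, group field number, detail field number) are made up for this example, and it assumes a single-character separator and a comma as the detail joiner.

```shell
# Hypothetical helper for illustration only, not the group_by script itself.
# Usage: group_by_sketch SEPARATOR GROUP_FIELD DETAIL_FIELD < input
group_by_sketch() {
    awk -F"$1" -v g="$2" -v d="$3" '
        {
            key = $g
            if (!(key in details)) {
                # First time this group is seen: remember its position.
                order[++n] = key
                details[key] = $d
            } else {
                # Group already known: append detail, keeping input order.
                details[key] = details[key] "," $d
            }
        }
        END {
            # Print groups in the order they first appeared in the input.
            for (i = 1; i <= n; i++)
                printf "%s\t%s\n", order[i], details[order[i]]
        }
    '
}
```

With that function defined, `group_by_sketch : 7 1 < /etc/passwd` should produce one line per shell, in first-seen order. Keeping a separate order array is what preserves input order, since awk does not guarantee any particular iteration order over array keys.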

If you're interested, it's licensed under the 3-clause BSD license.

One thought on “Grouping values in shell”

Orondo Rodriguez says:

2019-02-22 at 08:28

There is a tool called “q-text-as-data” that makes it possible to use SQL statements to parse data in a text file.

It can be used with CSV files or any file with structured data, by specifying a separator.
