I assume that everyone reading my blog understands the GROUP BY clause in SQL.
Lately I've been doing some maintenance work, and found myself in a position where I could really use a similar thing in shell.
Let's look at a simplistic example: the /etc/passwd file. It contains 7 fields, separated by :. For example:
=$ head -n 3 /etc/passwd
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
For my example, I want to know which accounts are using which shell, but I want just one line of output per shell.
Normally, I'd have to do something like this:
=$ cut -d: -f7 /etc/passwd | \
    sort -uV | \
    while read -r shell
    do
        printf "%s\t%s\n" "${shell}" "$( awk -v"s=${shell}" -F: '$7==s {print $1}' /etc/passwd | paste -s -d, )"
    done
/bin/bash root,depesz,postgres,pgdba
/bin/false systemd-timesync,systemd-network,systemd-resolve,systemd-bus-proxy,_apt,rtkit,dnsmasq,messagebus,usbmux,festival,speech-dispatcher,pulse,sddm,avahi,colord,saned,hplip,Debian-exim,debian-tor,uuidd
/bin/sync sync
/usr/sbin/nologin daemon,bin,sys,games,man,lp,mail,news,uucp,proxy,www-data,backup,list,irc,gnats,nobody,sshd
or, something similar. This is not trivial to customize, and becomes tedious if you need to do it many times over different sets of data.
So, I wrote a group_by script.
With this I can simply:
=$ group_by -s: -g7 -d1 < /etc/passwd
/bin/bash root,depesz,postgres,pgdba
/usr/sbin/nologin daemon,bin,sys,games,man,lp,mail,news,uucp,proxy,www-data,backup,list,irc,gnats,nobody,sshd
/bin/sync sync
/bin/false systemd-timesync,systemd-network,systemd-resolve,systemd-bus-proxy,_apt,rtkit,dnsmasq,messagebus,usbmux,festival,speech-dispatcher,pulse,sddm,avahi,colord,saned,hplip,Debian-exim,debian-tor,uuidd
Output is ordered the same way as the input: the first group that shows up in the input will be the first in the output. The same goes for detail values within each group.
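
To illustrate the ordering rule, here is a minimal sketch of the same idea in bash and awk. It is not the actual group_by script, just one way the first-seen ordering could be done. The function name group_by_sketch and its positional arguments (separator, group field number, detail field number) are made up for this example, and it always joins detail values with commas and separates the two output columns with a tab.

```shell
#!/usr/bin/env bash
# group_by_sketch SEP GROUP_FIELD DETAIL_FIELD
# Hypothetical helper (NOT the real group_by script). Reads stdin and prints
# one line per distinct GROUP_FIELD value: the group value, a tab, and the
# comma-joined DETAIL_FIELD values, all in order of first appearance.
group_by_sketch() {
    awk -F"$1" -v g="$2" -v d="$3" '
        {
            key = $g
            if (!(key in details)) {
                # First time this group is seen: remember its position.
                order[++n] = key
                details[key] = $d
            } else {
                # Group already known: append the detail value.
                details[key] = details[key] "," $d
            }
        }
        END {
            # Walk groups in first-seen order, not in awk hash order.
            for (i = 1; i <= n; i++)
                printf "%s\t%s\n", order[i], details[order[i]]
        }
    '
}

# Example call, mirroring the invocation above:
#   group_by_sketch : 7 1 < /etc/passwd
```

The separate order array is what keeps the groups in input order. Iterating over the details array directly with for (key in details) would give whatever order awk happens to use internally.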
If you're interested, it's licensed under the 3-clause BSD license.
There is a tool called “q-text-as-data” which makes it possible to use SQL statements to parse data in a text file.
It can be used with CSV files, or with any file containing structured data, by specifying a separator.