Advanced Example of Command-Line Usage (sed, gawk, etc.)

This section develops a fairly advanced example: performing a simple data analytics task using only Linux command-line tools. It shows the power of these tools and of the ability to "pipe" the output of one tool into the next.

Task: Calculate the cumulative population of the 50 largest cities/towns in Scotland.

Separate input data into columns

In the first step, we separate the data in the input file by column, using the gawk command.

Example 12. Separate input data into columns


  # very basic: just drop the header line and print column by column; NB: use ; to separate commands in gawk
  cat ScotCities.txt | sed -e '1d' | gawk -e '{ print "{"; print "rank: "; print $1; print ","; print "name: " ; print $2; print ","; print "pop: " ; print $3; print ","; print "status: " ; print $4; print ","; print "council: "; print $5 $6 ; print "}" }' | less
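
The command above emits every label and value on its own line. As a minimal sketch of a more compact variant, the snippet below uses a single printf so that each record lands on one line, and lets awk skip the header itself with NR > 1 instead of calling sed. The sample rows and the helper name format_records are hypothetical, made up for illustration; the layout (a header line, then rank, name, population, status and a two-word council name, separated by whitespace) is an assumption inferred from the fields used above, not taken from ScotCities.txt. Plain awk is used for portability; gawk should behave the same way here.

```shell
# Hypothetical sample in the layout the command above appears to assume:
# header line, then rank, name, population, status, two-word council name.
sample='rank name pop status council
1 Alphatown 1000 City Alpha Council
2 Betaville 500 Town Beta Council'

# format_records is an illustrative helper name, not part of the original example.
# NR > 1 skips the header line (the job done by sed -e '1d' above);
# printf writes one formatted record per input line.
format_records() {
  awk 'NR > 1 { printf "{ rank: %s, name: %s, pop: %s, status: %s, council: %s %s }\n", $1, $2, $3, $4, $5, $6 }'
}

printf '%s\n' "$sample" | format_records
```

To apply the same idea to the real file, one could run format_records < ScotCities.txt | less, provided the file really follows the assumed layout. Note that this sketch puts a space between $5 and $6, whereas print $5 $6 in the original concatenates the two fields without one.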