Learning Linux Commands: awk

In the case of this article, the title Learning Linux Commands: awk might be a little misleading, because awk is more than a command: it's a programming language in its own right. You can write awk scripts for complex operations, or you can use awk straight from the command line. The name stands for Aho, Weinberger and Kernighan (yes, that Brian Kernighan), the authors of the language, which dates back to 1977 and therefore shares the same Unix spirit as the other classic *nix utilities.

If you are learning C or already know it, you will recognize some familiar concepts in awk, especially since the 'k' in awk stands for the same person as the 'k' in K&R, the classic C programming book. You will need some Linux command-line knowledge and, ideally, some scripting basics, but that last part is optional, as we will try to offer something for everybody. Many thanks to Arnold Robbins for all his work on awk.

In this tutorial you will learn:

  • What does awk do? How does it work?
  • awk basic concepts
  • Learn to use awk through command line examples

Learning about the awk command through various command line examples on Linux

Software Requirements and Linux Command Line Conventions

Category      Requirements, Conventions or Software Version Used
System        Any Linux distro
Software      awk
Other         Privileged access to your Linux system as root or via the sudo command.
Conventions   # – requires given Linux commands to be executed with root privileges, either directly as the root user or by use of the sudo command
              $ – requires given Linux commands to be executed as a regular non-privileged user

What is it that awk does?



awk is a utility/language designed for data extraction. If the word "extraction" rings a bell, it should, because awk was once Larry Wall's inspiration when he created Perl. awk is often used together with sed to perform useful and practical text manipulation chores, and whether you should use awk or Perl depends on the task, but also on personal preference. Just like sed, awk reads its input one line at a time, performs some action depending on the condition you give it, and outputs the result.
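To make that one-line-at-a-time model concrete before we move on, here is a tiny, throwaway sketch (the sample input and the /t/ pattern are made up purely for illustration): awk tests each line against the pattern and runs the action only on the lines that match.

$ printf 'one\ntwo\nthree\n' | awk '/t/ { print "matched:", $0 }'
matched: two
matched: three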

One of the simplest and most popular uses of awk is selecting a column from a text file or from another command's output. One thing I used to do with awk, whenever I installed Debian on my second workstation, was to get a list of the installed software from my primary box and then feed it to aptitude. For that, I did something like this:

$ dpkg -l | awk ' {print $2} ' > installed

Most package managers today offer this facility, for example rpm's -qa option, but the output is more than I want. The second column of dpkg -l's output contains the names of the installed packages, which is why I used $2 with awk: to extract only the second column.
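As an aside, if you are on an RPM-based system and really do want just the package names, a rough equivalent (assuming your rpm build supports the --queryformat option, which recent versions do) would be something like:

$ rpm -qa --queryformat '%{NAME}\n' > installed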

Basic concepts

As you may have noticed, the action to be performed by awk is enclosed in braces, and the whole command is quoted. The general syntax is awk ' condition { action }'. In our example we had no condition, but if we wanted to, say, check only for installed vim-related packages (yes, there is grep, but this is an example, and besides, why use two utilities when one will do?), we would have done this:

$ dpkg -l | awk ' /'vim'/ {print $2} '

This command prints all installed packages that have "vim" in their names. One thing about awk is that it's fast: if you replace "vim" with "lib", on my system that yields 1300 packages. There will be situations where the data you have to work with is much bigger, and that's where awk shines.
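If you would rather have that output counted than scroll through it, you can pipe the result through wc -l; this is just a convenience sketch, and the number will of course differ from system to system:

$ dpkg -l | awk '/lib/ {print $2}' | wc -l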

Anyway, let's start with the examples, and we will explain some concepts as we go. Before that, it's good to know that there are several awk dialects and implementations; the examples presented here deal with GNU awk (gawk), both as implementation and dialect. Also, because of various quoting issues, we assume you're using bash, ksh or another Bourne-style shell; (t)csh is not covered.

awk command examples

See some of the examples below to gain an understanding of awk and how you can apply it in situations on your own system. Feel free to follow along and use some of these commands in your terminal to see the output you get back.

  1. Print only columns one and three using stdin.
    awk ' {print $1,$3} '
    
  2. Print all columns using stdin.
    awk ' {print $0} '
    
  3. Print only elements from column 2 that match pattern using stdin.
    awk ' /'pattern'/ {print $2} '
    
  4. Just like make or sed, awk uses -f to get its instructions from a file, which is useful when there is a lot to be done and using the terminal would be impractical.
    awk -f script.awk inputfile
    
  5. Execute program using data from inputfile.
    awk ' program ' inputfile
    
  6. Classic “Hello, world” in awk.
    awk "BEGIN { print \"Hello, world!!\" }"
    
  7. Print what’s entered on the command line until EOF (^D).
    awk '{ print }'
    
  8. awk script for the classic “Hello, world!” (make it executable with chmod and run it as is).
    #! /bin/awk -f
    BEGIN { print "Hello, world!" }
    
  9. Comments in awk scripts.
    # This is a program that prints "Hello, world!"
    # and exits
    
  10. Define FS (the field separator) as the null string, so that each character becomes a separate field, as opposed to whitespace, the default.
    awk -F "" 'program' files
    
  11. FS can also be a regular expression.
    awk -F "regex" 'program' files
    
  12. Will print <‘>. Here’s why we prefer Bourne shells. 🙂


    awk 'BEGIN { print "Here is a single quote <'\''>" }'
    
  13. Print the length of the longest line.
    awk '{ if (length($0) > max) max = length($0) }
    END { print max }' inputfile
    
  14. Print all lines longer than 80 characters.
    awk 'length($0) > 80' inputfile
    
  15. Print every line that has at least one field (NF stands for Number of Fields).
    awk 'NF > 0' data
    
  16. Print seven random numbers from 0 to 100.
    awk 'BEGIN { for (i = 1; i <= 7; i++)
    print int(101 * rand()) }'
    
  17. Print the total number of bytes used by files in the current directory.
    ls -l . | awk '{ x += $5 } ; END { print "total bytes: " x }'
    total bytes: 7449362
    
  18. Print the total number of kilobytes used by files in the current directory.
    ls -l . | awk '{ x += $5 } ; END { print "total kilobytes: " (x + 1023)/1024 }'
    total kilobytes: 7275.85
    
  19. Print sorted list of login names.
    awk -F: '{ print $1 }' /etc/passwd | sort
    
  20. Print the number of lines in a file (NR stands for Number of Records).
    awk 'END { print NR }' inputfile
    
  21. Print the even-numbered lines in a file. How would you print the odd-numbered lines?
    awk 'NR % 2 == 0' data
    
  22. Prints the total number of bytes of files that were last modified in November.
    ls -l | awk '$6 == "Nov" { sum += $5 }
    END { print sum }'
    
  23. Regular expression matching all records whose first field contains a capital "J" (note the ~ match operator).
    awk '$1 ~ /J/' inputfile
    
  24. Regular expression matching all records whose first field does not contain a capital "J".
    awk '$1 !~ /J/' inputfile
    
  25. Escaping double quotes in awk.
    awk 'BEGIN { print "He said \"hi!\" to her." }'
    
  26. Prints “<A>bcd”
    echo aaaabcd | awk '{ sub(/a+/, "<A>"); print }'
    


  27. Attribution example; try it 🙂
    ls -lh | awk '{ owner = $3 ; $3 = $3 " 0wnz"; print $3 }' | uniq
    
  28. Modify inventory and print it, with the value of the second field decreased by 10.
    awk '{ $2 = $2 - 10; print $0 }' inventory
    
  29. Even though field six doesn’t exist in inventory, you can create it and assign values to it, then display it.
    awk '{ $6 = ($5 + $4 + $3 + $2); print $6 }' inventory
    
  30. OFS is the Output Field Separator and the command will output “a::c:d” and “4” because although field two is nullified, it still exists so it gets counted.
    echo a b c d | awk '{ OFS = ":"; $2 = ""
    > print $0; print NF }'
    
  31. Another example of field creation; as you can see, the field between $4 (existing) and $6 (to be created) gets created as well (as $5 with an empty value), so the output will be “a::c:d::new” “6”.
    echo a b c d | awk '{ OFS = ":"; $2 = ""; $6 = "new"
    > print $0; print NF }'
    
  32. Throwing away three fields (last ones) by changing the number of fields.
    echo a b c d e f | awk '{ print "NF =", NF;
    > NF = 3; print $0 }'
    
  33. This regular expression sets the field separator to exactly one space and nothing else, so leading whitespace is not skipped and runs of spaces are not collapsed, unlike the default behavior.
    FS = "[ ]"
    
  34. This will print only "a": the leading space produces an empty $1, so "a" ends up in $2.
    echo ' a b c d ' | awk 'BEGIN { FS = "[ \t\n]+" }
    > { print $2 }'
    
  35. Print only the first match of RE (regular expression).
    awk '/RE/ { print; exit }' file.txt
    
  36. Sets FS to a literal backslash: the shell strips one level of quoting, then awk processes the remaining \\ as an escape sequence, leaving a single \ as the separator.
    awk -F\\\\ '...' inputfiles ...
    
  37. If we have a record like:
    John Doe
    1234 Unknown Ave.
    Doeville, MA
    this script sets the record separator to the empty string (paragraph mode) and the field separator to newline, so every blank-line-separated block becomes one record and each of its lines becomes a field (a usage sketch appears right after this list).
    BEGIN { RS = "" ; FS = "\n" }
    {
      print "Name is:", $1
      print "Address is:", $2
      print "City and State are:", $3
      print ""
    }
    
  38. With a two-field file, the records will be printed like this:
    "field1;field2

    field3;field4

    …;…"
    because ORS, the Output Record Separator, is set to two newlines and OFS is ";".

    awk 'BEGIN { OFS = ";"; ORS = "\n\n" }
    > { print $1, $2 }' inputfile
    
  39. This will print 17 and 18, because the Output ForMaT is set to round floating point values to the closest integer value.
    awk 'BEGIN {
    > OFMT = "%.0f" # print numbers as integers (rounds)
    > print 17.23, 17.54 }'
    


  40. You can use printf much like you would in C.
    awk 'BEGIN {
    > msg = "Dont Panic!"
    > printf "%s\n", msg
    >} '
    
  41. Prints the first field as a 10-character string, left-justified, and $2 normally, next to it.
    awk '{ printf "%-10s %s\n", $1, $2 }' inputfile
    
  42. Making things prettier.
    awk 'BEGIN { print "Name  Number"
                 print "----  ------" }
         { printf "%-10s %s\n", $1, $2 }' inputfile
    
  43. Simple data extraction example, where the second field is written to a file named “phone-list”.
    awk '{ print $2 > "phone-list" }' inputfile
    
  44. Write the names contained in $1 to a file, then sort and output the result to another file (you can also append with >>, like you would in a shell).
    awk '{ print $1 > "names.unsorted"
           command = "sort -r > names.sorted"
           print $1 | command }' inputfile
    
  45. Will print 9, 11, 17.
    awk 'BEGIN { printf "%d, %d, %d\n", 011, 11, 0x11 }'
    
  46. Simple search for foo or bar.
    if (/foo/ || /bar/)
       print "Found!"
    
  47. Simple arithmetic operations (most operators resemble C a lot).
    awk '{ sum = $2 + $3 + $4 ; avg = sum / 3
    > print $1, avg }' grades
    
  48. Simple, extensible calculator.
    awk '{ print "The square root of", $1, "is", sqrt($1) }'
    2
    The square root of 2 is 1.41421
    7
    The square root of 7 is 2.64575
    


  49. Prints every record between start and stop.
    awk '$1 == "start", $1 == "stop"' inputfile
    
  50. BEGIN and END rules are executed exactly once, before and after any record processing.
    awk '
    > BEGIN { print "Analysis of \"foo\"" }
    > /foo/ { ++n }
    > END { print "\"foo\" appears", n, "times." }' inputfile
    
  51. Search using shell.
    echo -n "Enter search pattern: "
    read pattern
    awk "/$pattern/ "'{ nmatches++ }
    END { print nmatches, "found" }' inputfile
    
  52. Simple conditional. awk, like C, also supports the ?: operator.
    if (x % 2 == 0)
      print "x is even"
    else
      print "x is odd"
    
  53. Prints the first three fields of each record, one per line.
    awk '{ i = 1
      while (i <= 3) {
        print $i
        i++
      }
    }' inputfile
    
  54. Prints the first three fields of each record, one per line.
    awk '{ for (i = 1; i <= 3; i++)
      print $i
    }'
    
  55. Exiting with an error code different from 0 means something’s not quite right. Here’s an example.
    BEGIN {
      if (("date" | getline date_now) <= 0) {
        print "Can't get system date" > "/dev/stderr"
        exit 1
      }
      print "current date is", date_now
      close("date")
    }
    


  56. Prints "awk", "file1" and "file2", since ARGV holds the program name followed by its command-line arguments.
    awk 'BEGIN {
    > for (i = 0; i < ARGC; i++)
    > print ARGV[i]
    > }' file1 file2
    
  57. Delete elements in an array.
    for (i in frequencies)
      delete frequencies[i]
    
  58. Check for array elements.
    foo[4] = ""
    if (4 in foo)
      print "This is printed, even though foo[4] is empty"
    
  59. An awk variant of ctime() in C. This is how you define your own functions in awk.
    function ctime(ts, format)
    {
      format = "%a %b %d %H:%M:%S %Z %Y"
      if (ts == 0)
        ts = systime()    # use current time as default
      return strftime(format, ts)
    }
    
  60. A Cliff random number generator.
    BEGIN { _cliff_seed = 0.1 }
    function cliff_rand()
    {
      _cliff_seed = (100 * log(_cliff_seed)) % 1
      if (_cliff_seed < 0)
        _cliff_seed = - _cliff_seed
      return _cliff_seed
    }
    
  61. Anonymize an Apache log (IPs are randomized).
    cat apache-anon-noadmin.log | awk '
    function ri(n) { return int(n * rand()) }
    BEGIN { srand() }
    {
      if (!($1 in randip))
        randip[$1] = sprintf("%d.%d.%d.%d", ri(255), ri(255), ri(255), ri(255))
      $1 = randip[$1]
      print $0
    }'
    
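As promised in example 37, here is a short usage sketch of that paragraph-mode script. The file names addresses.awk and addresses are hypothetical; the script is the one from example 37 and the data file contains blank-line-separated records like the one shown there:

$ awk -f addresses.awk addresses
Name is: John Doe
Address is: 1234 Unknown Ave.
City and State are: Doeville, MA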


Conclusion

As you can see, with awk you can do lots of text processing and other nifty stuff. We didn't get into more advanced topics, like the full range of awk's built-in functions, but we showed you enough (we hope) to start thinking of it as a powerful tool.


