Text Processing with grep, sed, and awk

Text processing is where the command line truly shines. Three tools — grep, sed, and awk — form a powerful toolkit for searching, transforming, and analyzing text data. Together with cat and less for viewing files, they handle the vast majority of text manipulation tasks you’ll encounter.

Viewing files with cat and less

Before processing text, you need to see it. cat prints entire files, while less lets you page through them:

cat file.txt                  # Print entire file
cat -n file.txt               # Show line numbers
less file.txt                 # Page through (Space/b to navigate, q to quit)
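As a quick sanity check, the commands above can be tried on a throwaway file (the /tmp path and sample contents are made up for illustration):

```shell
# Create a small sample file, then print it with line numbers.
printf 'alpha\nbeta\ngamma\n' > /tmp/sample.txt
cat -n /tmp/sample.txt    # each line prefixed with its number
```

For anything longer than a screen, swap cat for less and page with Space and b.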

Searching with grep

grep searches text for patterns. It’s the tool you’ll reach for most often — to find error messages in logs, locate function definitions in code, or filter command output:

grep "pattern" file.txt       # Search for pattern
grep -r "pattern" .           # Recursive search in directory
grep -i "pattern" file.txt    # Case-insensitive
grep -n "pattern" file.txt    # Show line numbers
grep -c "pattern" file.txt    # Count matches
grep -v "pattern" file.txt    # Invert — show NON-matching lines

grep -r is one of the most frequently used commands in software development. Before reaching for a more complex tool, try grep -rn "function_name" . to find where something is defined or used.
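A minimal sketch of that workflow, using a made-up directory and function name:

```shell
# Set up a tiny source tree to search (paths and contents are hypothetical).
mkdir -p /tmp/demo_src
echo 'def parse_config(): pass' > /tmp/demo_src/config.py

# Recursive search with file names and line numbers in the output.
grep -rn "parse_config" /tmp/demo_src
```

Each match is printed as file:line:text, which editors and scripts can parse directly.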

Transforming with sed

sed (stream editor) transforms text as it flows through — find-and-replace, delete lines, extract ranges. The s/old/new/ syntax handles most use cases:

sed 's/old/new/' file.txt     # Replace first occurrence per line
sed 's/old/new/g' file.txt    # Replace ALL occurrences
sed -i 's/old/new/g' file.txt # Edit file in-place
sed -n '5,10p' file.txt       # Print only lines 5-10
sed '/pattern/d' file.txt     # Delete lines matching pattern

sed doesn’t modify the original file unless you use the -i flag. By default, it outputs the transformed text to stdout, which makes it safe for experimentation and perfect for pipelines.
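That workflow can be sketched on a throwaway file: preview the substitution on stdout, then apply it in place. Passing a suffix to -i (here .bak) keeps a backup copy and, as a bonus, works on both GNU and BSD/macOS sed:

```shell
# Sample file (path and contents are made up for illustration).
printf 'old value\nold data\n' > /tmp/sed_demo.txt

sed 's/old/new/g' /tmp/sed_demo.txt         # preview only; file unchanged
sed -i.bak 's/old/new/g' /tmp/sed_demo.txt  # edit in place, keep .bak backup
cat /tmp/sed_demo.txt                       # now contains the replacements
```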

Analyzing with awk

awk processes structured text — files with columns, CSV data, log files with consistent formats. It splits each line into fields and lets you operate on them:

awk '{print $1}' file.txt           # Print first column
awk -F: '{print $1}' /etc/passwd    # Custom delimiter
awk '{sum += $1} END {print sum}'   # Sum a column
awk 'NR >= 5 && NR <= 10' file.txt  # Print lines 5-10
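The column-sum idiom is easy to verify on a few sample numbers piped in directly:

```shell
# Sum the first field of every line; END runs after all input is read.
printf '10\n20\n30\n' | awk '{sum += $1} END {print sum}'   # → 60
```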

The real power comes from combining these tools with pipes. grep "ERROR" app.log | awk '{print $3}' | sort | uniq -c | sort -rn filters the error lines, extracts the third field, counts how often each value appears, and ranks the counts from highest to lowest — all in one line.
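That pipeline can be tried end to end on a few fabricated log lines (the /tmp path and the three-field LEVEL date message format are assumed for illustration):

```shell
# Fake log: two timeouts and one unrelated INFO line.
printf 'ERROR 2024-01-01 timeout\nINFO 2024-01-01 ok\nERROR 2024-01-02 timeout\n' > /tmp/app.log

# Filter errors, take field 3, count duplicates, rank by count.
grep "ERROR" /tmp/app.log | awk '{print $3}' | sort | uniq -c | sort -rn
```

The sort before uniq -c matters: uniq only collapses adjacent duplicate lines.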

The pipeline philosophy

These tools are designed to work together. Each takes input, processes it, and produces output that the next tool can consume. This composability is the core of the Unix philosophy and the reason these decades-old tools remain essential.


Ready to practice? Explore the project repository for the full text processing reference and interactive exercises.
