Text Processing: sed, awk, and grep

Text processing is where BSD and GNU tools diverge most significantly. Scripts that work perfectly on Linux often fail on macOS due to subtle differences in sed, awk, and grep. This chapter provides equivalent commands for both platforms and explains when each syntax is required.

sed: Stream Editor

In-Place Editing

The most common gotcha when moving from Linux to macOS:

# GNU sed (Linux)
$ sed -i 's/old/new/g' file.txt

# BSD sed (macOS) - requires backup extension argument
$ sed -i '' 's/old/new/g' file.txt    # No backup created
$ sed -i '.bak' 's/old/new/g' file.txt  # Creates file.txt.bak

# The error you'll see using GNU syntax on BSD:
$ sed -i 's/old/new/g' file.txt
sed: 1: "file.txt": invalid command code f
# BSD interpreted 's/old/new/g' as the backup extension!

Newlines and Special Characters

# Insert newline - GNU sed
$ echo "hello" | sed 's/$/\n/'
hello
                    # (blank line)

# Insert newline - BSD sed (\n doesn't work)
# Method 1: Use $'...' quoting
$ echo "hello" | sed $'s/$/\\\n/'

# Method 2: Literal newline in the command
$ echo "hello" | sed 's/$/\
/'

# Method 3: Use a variable
$ NL=$'\n'
$ echo "hello" | sed "s/$/$NL/"

Tab characters:

# GNU sed - \t works
$ echo "a b" | sed 's/ /\t/'
a	b

# BSD sed - \t doesn't work in replacement
$ echo "a b" | sed $'s/ /\t/'
a	b

# Or use a literal tab
$ echo "a b" | sed 's/ /	/'    # Actual tab character

Extended Regular Expressions

# Basic regex (both platforms)
$ echo "hello" | sed 's/l\+/L/'
heLlo

# Extended regex - GNU
$ echo "hello" | sed -E 's/l+/L/'
heLo

# Extended regex - BSD (same -E flag, compatible!)
$ echo "hello" | sed -E 's/l+/L/'
heLo

# Note: GNU also accepts -r for extended regex
$ echo "hello" | sed -r 's/l+/L/'    # GNU only

Case-Insensitive Matching

# GNU sed supports 'i' flag
$ echo "Hello HELLO hello" | sed 's/hello/hi/gi'
hi hi hi

# BSD sed doesn't support 'i' flag
$ echo "Hello HELLO hello" | sed 's/hello/hi/gi'
sed: 1: "s/hello/hi/gi": bad flag in substitute command: 'i'

# BSD workaround: use character classes
$ echo "Hello HELLO hello" | sed 's/[Hh][Ee][Ll][Ll][Oo]/hi/g'
hi hi hi

# Or use tr for simple case conversion, then sed
$ echo "Hello HELLO hello" | tr '[:upper:]' '[:lower:]' | sed 's/hello/hi/g'

Multiple Commands

# Both support -e for multiple expressions
$ sed -e 's/a/A/' -e 's/b/B/' file.txt

# Both support semicolons
$ sed 's/a/A/; s/b/B/' file.txt

# Both support newlines in the script
$ sed '
s/a/A/
s/b/B/
' file.txt

Address Ranges

# Line numbers (compatible)
$ sed '10,20d' file.txt        # Delete lines 10-20
$ sed '5q' file.txt            # Print first 5 lines and quit

# Pattern addresses (compatible)
$ sed '/start/,/end/d' file.txt

# Last line (compatible)
$ sed '$d' file.txt            # Delete last line

# First match only - GNU sed
$ sed '0,/pattern/s/pattern/replace/' file.txt

# First match only - BSD sed (0 address not supported)
$ sed '1,/pattern/s/pattern/replace/' file.txt
# Note: BSD behavior differs if pattern is on line 1

Portable sed Script

#!/bin/bash
# Works on both BSD and GNU sed

# Detect sed type
if sed --version 2>&1 | grep -q GNU; then
    SED="sed"
    SED_I="sed -i"
else
    SED="sed"
    SED_I="sed -i ''"
fi

# Use with eval for in-place editing
eval $SED_I "'s/old/new/g'" file.txt

# Or use a temp file (most portable)
sed 's/old/new/g' file.txt > file.tmp && mv file.tmp file.txt

awk: Pattern Processing

awk on macOS is actually nawk (new awk), which is largely compatible with GNU awk. However, differences exist.

Basic Usage (Compatible)

# Print columns (compatible)
$ echo "a b c" | awk '{print $2}'
b

# Field separator (compatible)
$ echo "a:b:c" | awk -F: '{print $2}'
b

# Patterns (compatible)
$ awk '/error/ {print}' log.txt

# Variables (compatible)
$ awk -v name="John" 'BEGIN {print "Hello", name}'
Hello John

GNU awk Extensions

# Case-insensitive matching - GNU awk (gawk)
$ echo -e "Hello\nhello\nHELLO" | gawk 'BEGIN{IGNORECASE=1} /hello/'
Hello
hello
HELLO

# BSD awk doesn't have IGNORECASE
$ echo -e "Hello\nhello\nHELLO" | awk '/[Hh][Ee][Ll][Ll][Oo]/'

# Length function as array length - gawk
$ gawk 'BEGIN {a[1]=1; a[2]=2; print length(a)}'
2

# BSD awk - length(array) may not work
# Use a loop to count
$ awk 'BEGIN {a[1]=1; a[2]=2; for(i in a) n++; print n}'
2

Regular Expression Differences

# Word boundaries - GNU awk
$ echo "foo foobar" | gawk '{gsub(/\bfoo\b/, "bar"); print}'
bar foobar

# BSD awk doesn't support \b
$ echo "foo foobar" | awk '{gsub(/foo[^a-z]|foo$/, "bar "); print}'
bar foobar

# Interval expressions {n,m} - both support with --posix or -r
$ echo "aaa" | gawk '{print gsub(/a{2,3}/, "X")}'
1

$ echo "aaa" | awk '{print gsub(/a{2,3}/, "X")}'
# May or may not work depending on macOS version

In-Place Editing with awk

# GNU awk 4.1+ has -i inplace
$ gawk -i inplace '{gsub(/old/, "new")}1' file.txt

# BSD awk has no in-place option
# Use a temp file
$ awk '{gsub(/old/, "new")}1' file.txt > tmp && mv tmp file.txt

Recommended: Install gawk

For consistent behavior, install GNU awk:

$ brew install gawk

# Use as gawk or add to PATH
$ gawk --version
GNU Awk 5.2.2, API 3.2, PMA Avon 8-g1

grep: Pattern Matching

Basic Usage (Compatible)

# Simple pattern (compatible)
$ grep "error" log.txt

# Case insensitive (compatible)
$ grep -i "error" log.txt

# Line numbers (compatible)
$ grep -n "error" log.txt

# Count matches (compatible)
$ grep -c "error" log.txt

# Invert match (compatible)
$ grep -v "debug" log.txt

# Recursive search (compatible)
$ grep -r "TODO" src/

# Only filenames (compatible)
$ grep -l "error" *.log

Extended Regular Expressions

# Extended regex (compatible with -E)
$ grep -E "error|warning" log.txt
$ grep -E "[0-9]{3}-[0-9]{4}" contacts.txt

# Equivalent using egrep (both platforms)
$ egrep "error|warning" log.txt

Perl-Compatible Regex

The biggest difference - GNU grep’s -P flag:

# GNU grep - Perl regex
$ grep -P '\d{3}-\d{4}' contacts.txt
$ grep -P '(?<=@)\w+(?=\.com)' emails.txt    # Lookbehind/ahead

# BSD grep - no -P flag
$ grep -P '\d+'
grep: invalid option -- P

# Alternatives on BSD:

# 1. Use extended regex equivalents
$ grep -E '[0-9]{3}-[0-9]{4}' contacts.txt

# 2. Use perl directly
$ perl -ne 'print if /\d{3}-\d{4}/' contacts.txt

# 3. Install GNU grep
$ brew install grep
$ ggrep -P '\d{3}-\d{4}' contacts.txt

Common Perl Regex Features and BSD Alternatives

# Digits: \d (Perl) vs [0-9] (POSIX)
$ ggrep -P '\d+'          # GNU
$ grep -E '[0-9]+'        # BSD

# Word characters: \w vs [a-zA-Z0-9_]
$ ggrep -P '\w+'          # GNU
$ grep -E '[a-zA-Z0-9_]+' # BSD

# Word boundaries: \b vs [[:<:]] and [[:>:]]
$ ggrep -P '\bword\b'     # GNU
$ grep -E '[[:<:]]word[[:>:]]'  # BSD

# Non-greedy matching: .*?
$ ggrep -oP '<.*?>'       # GNU - minimal match
# BSD has no equivalent; use different approach
$ grep -oE '<[^>]*>'      # Matches <...> without > inside

# Lookahead/lookbehind
$ ggrep -P '(?<=\$)\d+'   # GNU - digits after $
# BSD has no equivalent

Context Lines

# Lines before match (compatible)
$ grep -B 3 "error" log.txt

# Lines after match (compatible)
$ grep -A 3 "error" log.txt

# Lines before and after (compatible)
$ grep -C 3 "error" log.txt

Line-Buffered Output

# Both support --line-buffered for live log watching
$ tail -f log.txt | grep --line-buffered "error"

sort: Sorting Text

Basic Sorting (Compatible)

# Alphabetical sort (compatible)
$ sort file.txt

# Reverse sort (compatible)
$ sort -r file.txt

# Numeric sort (compatible)
$ sort -n numbers.txt

# Sort by field (compatible)
$ sort -t: -k2 /etc/passwd

GNU-Only Features

# Version sort - GNU only
$ printf "1.10\n1.2\n1.1" | sort -V
1.1
1.2
1.10

$ printf "1.10\n1.2\n1.1" | sort -V    # BSD
sort: invalid option -- V

# BSD workaround (complex, not exact equivalent)
$ printf "1.10\n1.2\n1.1" | sort -t. -k1,1n -k2,2n
1.1
1.2
1.10

# Human-readable sizes - GNU only
$ du -h | sort -h
1K    small/
10M   medium/
2G    large/

$ du -h | sort -h    # BSD
sort: invalid option -- h

Stable Sort

# Both support stable sort
$ sort -s file.txt

# Random sort - both support
$ sort -R file.txt

cut: Extract Fields

Basic Usage (Compatible)

# Cut by character position (compatible)
$ echo "hello" | cut -c1-3
hel

# Cut by field (compatible)
$ echo "a:b:c" | cut -d: -f2
b

# Multiple fields (compatible)
$ echo "a:b:c" | cut -d: -f1,3
a:c

# Range of fields (compatible)
$ echo "a:b:c:d:e" | cut -d: -f2-4
b:c:d

GNU Extensions

# Complement - GNU only
$ echo "a:b:c" | cut -d: --complement -f2
a:c

$ echo "a:b:c" | cut -d: --complement -f2    # BSD
cut: illegal option -- -

# BSD alternative using awk
$ echo "a:b:c" | awk -F: '{print $1":"$3}'
a:c

# Output delimiter - GNU only
$ echo "a:b:c" | cut -d: -f1,3 --output-delimiter=' '
a c

# BSD alternative
$ echo "a:b:c" | cut -d: -f1,3 | tr ':' ' '
a c

tr: Translate Characters

tr is mostly compatible between BSD and GNU:

# Character substitution (compatible)
$ echo "hello" | tr 'a-z' 'A-Z'
HELLO

# Delete characters (compatible)
$ echo "hello 123" | tr -d '0-9'
hello

# Squeeze repeated characters (compatible)
$ echo "heeello" | tr -s 'e'
hello

# Character classes (compatible)
$ echo "Hello123" | tr -d '[:digit:]'
Hello

# Complement (compatible)
$ echo "hello 123" | tr -dc '0-9\n'
123

uniq: Filter Duplicates

Mostly compatible:

# Remove adjacent duplicates (compatible)
$ sort file.txt | uniq

# Count occurrences (compatible)
$ sort file.txt | uniq -c

# Only duplicates (compatible)
$ sort file.txt | uniq -d

# Only unique (compatible)
$ sort file.txt | uniq -u

# Case insensitive (compatible)
$ sort file.txt | uniq -i

Portable Text Processing Tips

1. Use temp files instead of in-place editing

# Most portable
sed 's/old/new/g' file.txt > file.tmp && mv file.tmp file.txt

2. Avoid GNU-specific regex

# Instead of \d, use [0-9]
# Instead of \w, use [a-zA-Z0-9_]
# Instead of \s, use [[:space:]]
# Instead of \b, use [[:<:]] and [[:>:]] (BSD) or word boundary logic

3. Create wrapper functions

# Add to ~/.zshrc or script
portable_sed_i() {
    if sed --version 2>&1 | grep -q GNU; then
        sed -i "$@"
    else
        sed -i '' "$@"
    fi
}

4. Use awk for complex transformations

awk is more consistent across platforms than sed for complex operations:

# Instead of complex sed
$ awk '{gsub(/old/, "new"); print}' file.txt > tmp && mv tmp file.txt

5. Test on both platforms

Before distributing scripts, test on both macOS and Linux:

# Check for GNU-specific features
$ shellcheck myscript.sh    # Static analysis

# Test in Docker for Linux
$ docker run --rm -v "$PWD:/work" -w /work alpine:latest sh myscript.sh

Summary: BSD vs GNU Text Tools

Feature	GNU	BSD	Portable Alternative
sed -i	`sed -i`	`sed -i ''`	temp file
sed \n	Works	Doesn’t work	$‘\n’ or literal
grep -P	Works	Doesn’t exist	Use -E with POSIX
sort -V	Works	Doesn’t exist	Custom solution
sort -h	Works	Doesn’t exist	Sort raw bytes
cut –complement	Works	Doesn’t exist	Use awk
awk IGNORECASE	Works	Doesn’t exist	Character classes

When in doubt, install GNU tools via Homebrew and use them explicitly, or write POSIX-compliant commands that work everywhere.

Keyboard shortcuts

macOS and Unix: The Complete Guide