What is Regex

Regular expressions (regex) are patterns used to match and manipulate text. Think of them as a super-powered search function that lets you find specific patterns in strings—like “all email addresses,” “phone numbers,” or “words starting with ‘cat’"—instead of just exact matches. They’re everywhere: text editors, command-line tools, programming languages, and even your IDE’s search-and-replace feature.

Why Should You Care?

Regex saves you hours of manual text processing. Whether you’re parsing log files, validating user input, extracting data from documents, or refactoring code, regex turns 100 lines of string manipulation into a single elegant pattern. It’s like having a Swiss Army knife for text—once you learn it, you’ll wonder how you ever lived without it.

The Building Blocks

Literals and Character Classes

The simplest regex is just plain text: cat matches “cat”. But the magic begins with character classes:

  • [abc] matches any single character: a, b, or c
  • [a-z] matches any lowercase letter
  • [0-9] matches any digit
  • [^abc] matches anything EXCEPT a, b, or c (the ^ inside brackets means “not”)

The Dot and Escapes

  • . matches any single character except newline
  • \. matches a literal period (backslash escapes special characters)
  • \d matches any digit (shorthand for [0-9])
  • \w matches any word character (letters, digits, underscore)
  • \s matches any whitespace (spaces, tabs, newlines)

Quantifiers (The Power Multipliers)

Quantifiers tell you how many times a pattern should repeat:

  • * means zero or more times (e.g., a* matches “”, “a”, “aa”, “aaa”)
  • + means one or more times (e.g., a+ matches “a”, “aa”, but not “”)
  • ? means zero or one time (makes something optional)
  • {3} means exactly 3 times
  • {3,} means 3 or more times
  • {3,6} means between 3 and 6 times

Anchors

Anchors don’t match characters—they match positions:

  • ^ matches the start of a line
  • $ matches the end of a line
  • \b matches a word boundary (the edge between a word and non-word character)

Groups and Alternation

  • (pattern) creates a capture group
  • | means “or” (e.g., cat|dog matches “cat” or “dog”)

Quick Examples

  • ^\d{3}-\d{4}$ matches phone numbers like “123-4567”
  • \w+@\w+\.\w+ matches simple email addresses
  • ^[A-Z] matches lines starting with an uppercase letter
  • colou?r matches both “color” and “colour”
  • \d{2,4} matches 2 to 4 consecutive digits

Practice Time: Meet rg (ripgrep)

Time to get your hands dirty! rg (ripgrep) is a blazing-fast command-line search tool that uses regex. Let’s install it and practice.

Installing ripgrep

Linux (Ubuntu/Debian):

sudo apt update
sudo apt install ripgrep

MacOS:

brew install ripgrep

Windows:

Using Scoop (no admin required):

# Install Scoop first

Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
Invoke-RestMethod -Uri https://get.scoop.sh | Invoke-Expression

# Then install ripgrep

scoop install ripgrep

Verify installation:

rg --version

The Challenge

Create a sample file to practice on:

cat > sample.txt << 'EOF'
My email is john@example.com
Call me at 321-555-1234 or 543-555-5678
The year 2024 was great!
Colors: color, colour, favorite, favourite
Error: Failed to connect on 2024-11-08
192.168.1.1 is the router IP
IMPORTANT: Read this carefully
The quick brown fox jumps
http://example.com and https://secure.com
EOF

Now try these exercises. First, try writing the regex yourself, then test it:

Exercise 1: Find all lines containing email addresses

HintThink about the pattern: word characters, @, word characters, dot, word characters
Solution
rg '\w+@\w+\.\w+' sample.txt

Exercise 2: Find all phone numbers (format: 321-555-1234)

HintUse \d for digits and remember to escape the hyphen or not (it's literal outside brackets)
Solution
rg '\d{3}-\d{3}-\d{4}' sample.txt

Exercise 3: Find lines starting with uppercase words

HintUse ^ for line start and character classes for uppercase letters
Solution
rg '^[A-Z]' sample.txt

Exercise 4: Match both “color” and “colour” (and their variants)

HintMake the 'u' optional using ?
Solution
rg 'colou?r' sample.txt

Exercise 5: Find IP addresses (simple version)

HintPattern: digits, dot, digits, dot, digits, dot, digits
Solution
rg '\d+\.\d+\.\d+\.\d+' sample.txt

Exercise 6: Find URLs (starting with http or https)

HintUse alternation (|) for http or https, and .+ for the rest
Solution
rg 'https?://\S+' sample.txt

Bonus Tips for rg

  • rg -i pattern for case-insensitive search
  • rg -w pattern to match whole words only
  • rg -n pattern to show line numbers
  • rg -c pattern to count matches per file
  • rg -A 2 pattern to show 2 lines after each match

You’re Now a Regex Practitioner

With these fundamentals, you can handle 95% of everyday regex tasks. The key to mastery? Practice! Start using regex in your daily workflow—in your code editor’s search, command-line tools, or scripts. You’ll be pattern-matching like a pro in no time.

Remember: regex can get complex fast, but you don’t need to memorize everything. Keep this guide handy, and don’t be afraid to test your patterns incrementally. Happy pattern matching!