Mastering Regular Expressions (Regex) in Bash
Regular Expressions (regex) are extremely powerful tools for pattern matching in Bash. They allow you to search, manipulate, and validate text efficiently. Regex is commonly used with commands like:
grep
sed
awk
[[ string =~ regex ]] (Bash built-in)
find
A regular expression is essentially a pattern used to match text. In Bash, regex is primarily used in two ways:
Inside the [[ ... ]] test construct
Through external utilities like grep, sed, and awk
1. The Two Main Flavors of Regex
BRE (Basic Regular Expressions)
Default for grep and sed.
Metacharacters such as {, }, (, and ) are literal unless escaped with a backslash (\); GNU grep and sed also accept \+ and \? as extensions.
Example:
grep 'a\+' file.txt matches one or more occurrences of a (GNU grep).
ERE (Extended Regular Expressions)
Used by grep -E (or egrep) and Bash’s [[ =~ ]] operator.
Most symbols work "out of the box" without backslashes.
Example:
grep -E 'a+b' file.txt
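To make the difference between the two flavors concrete, here is a small sketch that runs the same search in both. The sample words and the function names (sample_words, count_bre, count_ere, count_literal_plus) are made up for illustration. The BRE version uses the POSIX interval \{1,\} rather than \+, since \+ is a GNU extension.

```shell
#!/usr/bin/env bash
# Hypothetical sample data: one word per line.
sample_words() {
  printf '%s\n' 'ab' 'aaab' 'b' 'cab' 'a+b'
}

# BRE: the interval braces must be escaped to act as a quantifier.
count_bre() {
  sample_words | grep -c 'a\{1,\}b'
}

# ERE: + is a quantifier without any backslash.
count_ere() {
  sample_words | grep -E -c 'a+b'
}

# In BRE an unescaped + is just a literal plus sign.
count_literal_plus() {
  sample_words | grep -c 'a+b'
}

echo "BRE matches: $(count_bre)"
echo "ERE matches: $(count_ere)"
echo "literal a+b matches: $(count_literal_plus)"
```

The first two counts should agree, while the third only counts the line that contains the literal text a+b.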
Tip: For even more advanced regex (lookaheads, lookbehinds), grep -P enables Perl-Compatible Regex.
2. Core Syntax Cheat Sheet
Anchors & Boundaries
^: Matches the start of a line
$: Matches the end of a line
.: Matches any single character except newline
Quantifiers (How many times?)
*: 0 or more of the preceding character
+: 1 or more (ERE)
?: 0 or 1 (ERE)
{n,m}: Between n and m occurrences
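The quantifiers are easy to try out with Bash's [[ =~ ]] operator, which uses ERE. The sketch below is only an illustration: matches is a hypothetical helper, and the test words are invented.

```shell
#!/usr/bin/env bash
# matches STRING REGEX: hypothetical helper that prints yes or no.
matches() {
  local text=$1 re=$2
  if [[ $text =~ $re ]]; then
    echo yes
  else
    echo no
  fi
}

matches 'color'  '^colou?r$'   # ? allows zero or one "u"
matches 'colour' '^colou?r$'
matches 'gooal'  '^go+al$'     # + needs at least one "o"
matches 'gal'    '^go+al$'
matches 'gal'    '^go*al$'     # * also accepts zero "o"
matches 'aaa'    '^a{2,3}$'    # two or three "a" characters
matches 'aaaa'   '^a{2,3}$'
```

Each pattern is anchored with ^ and $ so that the quantifier has to account for the whole string rather than just a part of it.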
Character Classes
[abc]: Matches any one of a, b, or c
[^abc]: Matches any character except a, b, or c
[0-9]: Matches any digit
[a-z]: Matches any lowercase letter
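Character classes are often combined with anchors and quantifiers to validate input. The following is a minimal sketch with two hypothetical helpers, is_identifier and has_no_digits. Note that ranges such as [a-z] can behave differently depending on the locale, so treat this as an illustration rather than a robust validator.

```shell
#!/usr/bin/env bash
# is_identifier NAME: hypothetical check for a lowercase identifier.
# First character: lowercase letter or underscore.
# Remaining characters: lowercase letters, digits, or underscores.
is_identifier() {
  local re='^[a-z_][a-z0-9_]*$'
  [[ $1 =~ $re ]]
}

# has_no_digits TEXT: succeeds when every character is a non-digit.
has_no_digits() {
  local re='^[^0-9]*$'
  [[ $1 =~ $re ]]
}

for name in my_var1 1abc my-var; do
  if is_identifier "$name"; then
    echo "$name: valid"
  else
    echo "$name: invalid"
  fi
done
```

Both helpers return the exit status of [[ ... ]] directly, so they can be used in if statements or with && and ||.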
3. Basic Examples in Bash
Example 1: Matching Names
[root@oel01db ~]# cat re_simple.sh
re='^(dave|joe)'
input=$1
if [[ $input =~ $re ]]; then
    echo match
else
    echo no match
fi
[root@oel01db ~]# bash re_simple.sh dave
match
[root@oel01db ~]# bash re_simple.sh davejohn
match
Here, the string must start with dave or joe. Because the pattern has no end anchor, davejohn also matches.
Example 2: Exact Match at Start and End
[root@oel01db ~]# cat re_simple.sh
re='^(dave|joe)$'
input=$1
if [[ $input =~ $re ]]; then
    echo match
else
    echo no match
fi
[root@oel01db ~]# bash re_simple.sh davejohn
no match
[root@oel01db ~]# bash re_simple.sh dave
match
Adding $ ensures that the entire string matches the regex.
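The two variants can also be compared side by side. In this sketch the function names starts_with_name and is_exact_name, and the extra input mydave, are assumptions added for illustration.

```shell
#!/usr/bin/env bash
# starts_with_name: succeeds when the input begins with dave or joe.
starts_with_name() {
  local re='^(dave|joe)'
  [[ $1 =~ $re ]]
}

# is_exact_name: succeeds only when the whole input is dave or joe.
is_exact_name() {
  local re='^(dave|joe)$'
  [[ $1 =~ $re ]]
}

for input in dave davejohn mydave; do
  starts_with_name "$input" && prefix=yes || prefix=no
  is_exact_name "$input" && exact=yes || exact=no
  echo "$input: prefix=$prefix exact=$exact"
done
```

Only dave passes both checks; davejohn passes the prefix check alone, and mydave fails both because of the ^ anchor.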
4. Using ${BASH_REMATCH}
Bash automatically populates a special array variable called BASH_REMATCH whenever [[ string =~ regex ]] matches.
${BASH_REMATCH[0]} → The entire matched string
${BASH_REMATCH[1]} → The first capture group
${BASH_REMATCH[2]} → The second capture group, and so on
[root@oel01db ~]# cat re_simple.sh
re='^(dave|joe)$'
input=$1
if [[ $input =~ $re ]]; then
    echo match
    printf '%s\n' "${BASH_REMATCH[@]}"
else
    echo no match
fi
[root@oel01db ~]# bash re_simple.sh joe
match
joe
joe
Why does “joe” appear twice?
BASH_REMATCH[0] → Entire match = joe
BASH_REMATCH[1] → Captured group (dave|joe) = joe
Parentheses () in regex create capture groups.
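With more than one capture group, each group gets its own index, counted by opening parenthesis from left to right. Here is a minimal sketch that splits a date string into its parts. The helper name parse_date and the YYYY-MM-DD input format are assumptions for the example, and the regex checks only the shape of the string, not whether the date is valid.

```shell
#!/usr/bin/env bash
# parse_date STRING: hypothetical helper that splits YYYY-MM-DD into parts.
parse_date() {
  local re='^([0-9]{4})-([0-9]{2})-([0-9]{2})$'
  if [[ $1 =~ $re ]]; then
    # Index 0 is the whole match; 1..3 are the groups, left to right.
    echo "year=${BASH_REMATCH[1]} month=${BASH_REMATCH[2]} day=${BASH_REMATCH[3]}"
  else
    echo "not a date: $1"
    return 1
  fi
}

parse_date '2024-05-17'
parse_date 'May 17' || true
```

BASH_REMATCH is overwritten by every new match, so copy the values you need before running another [[ =~ ]] test.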
5. More Complex Example
[root@oel01db ~]# cat re_simple.sh
re='^(d|j).*$'
input=$1
if [[ $input =~ $re ]]; then
    echo match
    printf '%s\n' "${BASH_REMATCH[@]}"
else
    echo no match
fi
[root@oel01db ~]# bash re_simple.sh john
match
john
j
Here:
The string must start with d or j, followed by anything (.*)
Capture group (d|j) captures only the first character
Valid Examples:
john → matches
jack → matches
dave → matches
dog → matches
j → matches
d → matches
6. Important Additional Points
Escaping in BRE vs ERE
BRE: {, }, (, and ) need a backslash to act as metacharacters (\{, \(); \+ and \? are GNU extensions
ERE: No escape needed for +, ?, {}, ()
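The same escaping rules apply to sed. The sketch below swaps the two sides of a key=value pair, once in BRE and once in ERE. The function names swap_bre and swap_ere and the sample input are made up, and sed -E is assumed to be available, as it is in current GNU and BSD sed.

```shell
#!/usr/bin/env bash
# Swap "key=value" into "value=key" using capture groups in sed.

# BRE: grouping parentheses are written \( and \).
swap_bre() {
  printf '%s\n' "$1" | sed 's/^\([a-z]*\)=\([a-z]*\)$/\2=\1/'
}

# ERE (sed -E): plain parentheses group; backreferences stay \1 and \2.
swap_ere() {
  printf '%s\n' "$1" | sed -E 's/^([a-z]+)=([a-z]+)$/\2=\1/'
}

swap_bre 'color=red'
swap_ere 'color=red'
```

Only the pattern side changes between the two flavors; the replacement side uses \1 and \2 in both.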
Testing for Regex in Bash
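Store the regex in a variable and leave it unquoted on the right-hand side of =~. Since Bash 3.2, quoting the pattern forces a literal string match, so regex metacharacters lose their special meaning:
[[ $input =~ $re ]]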
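The effect of quoting the right-hand side of =~ is worth checking directly: in Bash 3.2 and later, a quoted pattern is compared as a literal string rather than as a regex. The sketch below runs the same test both ways; the pattern and input are arbitrary examples.

```shell
#!/usr/bin/env bash
re='^a.c$'
input='abc'

# Unquoted: $re is treated as a regex, so . matches any character.
if [[ $input =~ $re ]]; then
  unquoted=match
else
  unquoted='no match'
fi

# Quoted: the pattern is matched as a literal string, so "abc" would
# have to contain the text ^a.c$ itself.
if [[ $input =~ "$re" ]]; then
  quoted=match
else
  quoted='no match'
fi

echo "unquoted: $unquoted"
echo "quoted: $quoted"
```

Keeping the regex in a single-quoted variable protects it from shell expansion, while the unquoted $re on the right-hand side keeps it working as a regex.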
Capture Groups vs Non-Capturing Groups
Capturing: (abc) → stored in ${BASH_REMATCH[n]}
Non-capturing (not supported in Bash [[ =~ ]]): (?:abc)
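Since every group in [[ =~ ]] captures, one common workaround is to use an ordinary group and simply ignore its index. The following sketch assumes a made-up ticket ID format such as bug-123 and a hypothetical helper named parse_ticket.

```shell
#!/usr/bin/env bash
# parse_ticket ID: hypothetical parser for IDs such as "bug-123".
parse_ticket() {
  # Group 1 exists only to scope the alternation; group 2 is the number.
  local re='^(bug|task)-([0-9]+)$'
  if [[ $1 =~ $re ]]; then
    # Skip index 1 and read the group we actually care about.
    echo "${BASH_REMATCH[2]}"
  else
    return 1
  fi
}

parse_ticket 'bug-123'
parse_ticket 'task-7'
```

The cost is only that the indexes shift: the number lands in BASH_REMATCH[2] instead of BASH_REMATCH[1].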
Perl-Compatible Regex
grep -P allows lookahead/lookbehind and more advanced patterns:
grep -P '(?<=foo)bar' file.txt
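To see what the lookbehind actually selects, it helps to add -o so that grep prints only the matched text. This is a sketch under the assumption that grep is GNU grep built with PCRE support; the helper name after_foo is hypothetical, and it prints a notice instead of failing when -P is unavailable.

```shell
#!/usr/bin/env bash
# after_foo TEXT: print each "bar" that directly follows "foo".
# Assumes GNU grep built with PCRE support; prints a notice otherwise.
after_foo() {
  if ! printf 'x\n' | grep -P 'x' >/dev/null 2>&1; then
    echo 'grep -P not available'
    return 0
  fi
  # -o prints only the matched part; the lookbehind (?<=foo) is required
  # before "bar" but is not part of the match itself.
  printf '%s\n' "$1" | grep -oP '(?<=foo)bar'
}

after_foo 'foobar'
after_foo 'bazbar' || echo 'no match for bazbar'
```

The function returns grep's exit status, so a non-matching input can be handled with || as in the last line.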
Always Test Your Regex
Tools like regex101.com help you visualize capture groups and matches before you use a pattern in Bash.