Mastering File Parsing in Bash: Cut vs Regex
When working with files in Bash, you often need to extract parts of filenames like names and dates. There are multiple ways to do this—using external commands like cut or using regular expressions (regex) built into Bash. In this post, we’ll explore both approaches and see why regex can be a faster and more elegant solution.
Example Scenario
Suppose you have the following files:
[root@oel01db images]# ls -lrt
total 0
-rw-r--r-- 1 root root 0 Mar 6 05:29 learn shell script - 2026-03-11.jpg
-rw-r--r-- 1 root root 0 Mar 6 05:30 my_first_regex - 2026-03-10.sh
-rw-r--r-- 1 root root 0 Mar 6 05:30 my_family_photo - 2026-03-10.jpg
-rw-r--r-- 1 root root 0 Mar 6 05:31 mysql_dump - 2026-03-01.log
[root@oel01db images]#
We want to format them like this:
2026-03-11: learn shell script
2026-03-10: my_family_photo
2026-03-10: my_first_regex
2026-03-01: mysql_dump
Using cut and External Commands
Here’s one approach using cut and xargs:
[root@oel01db ~]# cat 01-without-regex.sh
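#!/usr/bin/env bash
for f in ./images/*; do
    bname=$(basename "$f")                    # strip the ./images/ prefix
    name=$(echo "$bname" | cut -d - -f 1)     # text before the first dash
    # text after the first dash, minus the extension, with whitespace trimmed
    date=$(echo "$bname" | cut -d - -f 2- | cut -d . -f 1 | xargs echo)
    echo "$date: $name"
done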
Explanation
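- `cut -d - -f 1` selects the first field, i.e. the text before the first dash.
- `cut -d - -f 2-` selects everything from the second field to the end of the line.
- `cut -d . -f 1` keeps the text before the first dot, which drops the file extension.
- `xargs echo` trims the whitespace left over from the " - " separator.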
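
To see what each stage contributes, here is a minimal sketch that runs the same commands on a single filename and prints every intermediate value in brackets. The helper name `show_cut_stages` is made up for illustration and is not part of the original script.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the original script): print each intermediate
# value of the cut pipeline for one filename.
show_cut_stages() {
    local bname=$1
    local name rest date_raw trimmed

    name=$(echo "$bname" | cut -d - -f 1)        # text before the first dash
    rest=$(echo "$bname" | cut -d - -f 2-)       # everything after the first dash
    date_raw=$(echo "$rest" | cut -d . -f 1)     # drop the extension
    trimmed=$(echo "$date_raw" | xargs echo)     # drop surrounding whitespace

    printf 'name=[%s]\n' "$name"
    printf 'rest=[%s]\n' "$rest"
    printf 'date_raw=[%s]\n' "$date_raw"
    printf 'date=[%s]\n' "$trimmed"
}

show_cut_stages "mysql_dump - 2026-03-01.log"
```

The brackets make two details visible: the name still carries the trailing space from the separator, and the split happens at the first dash, so a name that itself contains a dash would be cut short.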
Output
[root@oel01db ~]# bash 01-without-regex.sh
2026-03-11: learn shell script
2026-03-10: my_family_photo
2026-03-10: my_first_regex
2026-03-01: mysql_dump
[root@oel01db ~]#
✅ Works fine, but notice: every iteration forks several processes (basename, three cut calls, and xargs, plus the subshells behind each command substitution). With many filenames, this adds noticeable overhead.
Using Bash Regular Expressions
Regex allows us to do everything natively in Bash, without spawning external tools:
[root@oel01db ~]# cat 02-with-regex.sh
#!/usr/bin/env bash
regex="^.*/(.*) - ([0-9]{4}-[0-9]{2}-[0-9]{2})\..*$"
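for f in ./images/*; do
    # Report (on stderr) and skip anything that isn't "name - YYYY-MM-DD.ext"
    if ! [[ $f =~ $regex ]]; then
        echo "$f didn't match pattern" >&2
        continue
    fi
    name=${BASH_REMATCH[1]}   # capture group 1
    date=${BASH_REMATCH[2]}   # capture group 2
    echo "$date: $name"
done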
Output
[root@oel01db ~]# bash 02-with-regex.sh
2026-03-11: learn shell script
2026-03-10: my_family_photo
2026-03-10: my_first_regex
2026-03-01: mysql_dump
[root@oel01db ~]#
Breaking Down the Regex
regex="^.*/(.*) - ([0-9]{4}-[0-9]{2}-[0-9]{2})\..*$"
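| Segment | Meaning |
|---|---|
| `^` | Start of string |
| `.*` | Greedy match: any characters, zero or more times (here, the directory part of the path) |
| `/` | Literal forward slash; since the preceding `.*` is greedy, this is the last slash in the path |
| `(.*)` | Capture group 1: the name, i.e. everything before the " - " separator |
| `" - "` | Literal space, dash, space separator (quotes shown only to make the spaces visible) |
| `([0-9]{4}-[0-9]{2}-[0-9]{2})` | Capture group 2: date in YYYY-MM-DD format |
| `\.` | Literal dot |
| `.*` | Matches the rest of the string (file extension) |
| `$` | End of string |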
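
As a quick check of the table, the sketch below prints what each `BASH_REMATCH` slot holds for a given path. The helper `show_groups` and the second sample filename are invented for this example; the regex is the one from the script.

```shell
#!/usr/bin/env bash
regex="^.*/(.*) - ([0-9]{4}-[0-9]{2}-[0-9]{2})\..*$"

# Hypothetical helper: show the whole match and both capture groups.
show_groups() {
    local path=$1
    if [[ $path =~ $regex ]]; then
        printf 'whole=[%s]\n' "${BASH_REMATCH[0]}"
        printf 'name=[%s]\n'  "${BASH_REMATCH[1]}"
        printf 'date=[%s]\n'  "${BASH_REMATCH[2]}"
    else
        printf 'no match: %s\n' "$path"
        return 1
    fi
}

show_groups "./images/mysql_dump - 2026-03-01.log"
# Made-up filename whose name part contains " - " itself: the greedy (.*)
# runs up to the last " - " that is directly followed by a date.
show_groups "./images/backup - old - 2026-03-01.tar.gz"
```

Index 0 always holds the full match, and the groups start at index 1. In the second call the name comes out as "backup - old", a case where splitting on the first dash with cut would have given a different result.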
Date Capture Breakdown:
- `[0-9]{4}` → year
- `-` → dash
- `[0-9]{2}` → month
- `-` → dash
- `[0-9]{2}` → day
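
One thing worth keeping in mind is that this part of the pattern checks the shape of the date, not whether it is a real calendar date. A small sketch, assuming a standalone pattern `date_re` (with `^` and `$` anchors added here) and a hypothetical helper `has_date_shape`:

```shell
#!/usr/bin/env bash
# Same date pattern as in the script, anchored so it can be tested on its own.
date_re='^[0-9]{4}-[0-9]{2}-[0-9]{2}$'

# Hypothetical helper: succeeds when the argument looks like YYYY-MM-DD.
has_date_shape() {
    [[ $1 =~ $date_re ]]
}

for candidate in 2026-03-11 2026-3-11 2026-99-99 26-03-11; do
    if has_date_shape "$candidate"; then
        echo "$candidate: matches"
    else
        echo "$candidate: no match"
    fi
done
```

Here 2026-99-99 matches because only digit counts are checked, so rejecting impossible dates would need a separate validation step.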
Performance Comparison
Regex (Bash built-in)
[root@oel01db ~]# time ./02-with-regex.sh
real 0m0.008s
user 0m0.006s
sys 0m0.000s
[root@oel01db ~]#
Using cut (external commands)
[root@oel01db ~]# time ./01-without-regex.sh
real 0m0.107s
user 0m0.058s
sys 0m0.022s
[root@oel01db ~]#
Observation: In this single run over four files, the regex version is roughly 13x faster (0.008s vs 0.107s real time) because it doesn’t fork external processes for every file.
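
Four files is a very small sample, so a larger run gives a steadier picture. The following is only a sketch: it wraps the two approaches in functions (`with_cut` and `with_regex`, names chosen here), creates 100 empty files with invented names in a temporary directory, and times each function. Actual numbers will vary by machine.

```shell
#!/usr/bin/env bash
regex="^.*/(.*) - ([0-9]{4}-[0-9]{2}-[0-9]{2})\..*$"

# The cut-based loop from the first script, parameterised on a directory.
with_cut() {
    local dir=$1 f bname name date
    for f in "$dir"/*; do
        bname=$(basename "$f")
        name=$(echo "$bname" | cut -d - -f 1)
        date=$(echo "$bname" | cut -d - -f 2- | cut -d . -f 1 | xargs echo)
        echo "$date: $name"
    done
}

# The regex-based loop from the second script, parameterised on a directory.
with_regex() {
    local dir=$1 f
    for f in "$dir"/*; do
        [[ $f =~ $regex ]] || continue
        echo "${BASH_REMATCH[2]}: ${BASH_REMATCH[1]}"
    done
}

# Assumption: 100 empty test files with made-up names in a temp directory.
workdir=$(mktemp -d)
for ((i = 1; i <= 100; i++)); do
    : > "$workdir/file_$i - 2026-03-01.log"
done

time with_cut "$workdir" > /dev/null
time with_regex "$workdir" > /dev/null

rm -rf "$workdir"
```

Because `time` is a Bash keyword, it can time a shell function directly, which keeps both variants in one file and removes script startup from the comparison.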
Key Takeaways
- Regex is powerful for complex text extraction and pattern matching.
- Avoid unnecessary external commands like `cut`, `awk`, and `sed` if Bash regex can handle the task; it’s faster and cleaner.
- Use capture groups to extract multiple pieces of data in one pass.
- Performance matters in loops or when processing thousands of files, where Bash regex can dramatically reduce runtime.
- Keep your regex readable: comment your patterns for maintainability.
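
One possible way to keep the pattern readable is to build it from small, commented pieces. This is just a sketch; the variable names are illustrative, and the assembled string is identical to the regex used in the script.

```shell
#!/usr/bin/env bash
# Build the pattern from named, commented parts (names are illustrative).
dir_part='^.*/'                              # directory, up to the last slash
name_part='(.*)'                             # group 1: descriptive name
sep_part=' - '                               # space, dash, space
date_part='([0-9]{4}-[0-9]{2}-[0-9]{2})'     # group 2: date as YYYY-MM-DD
ext_part='\..*$'                             # dot plus extension, to end of string

regex="${dir_part}${name_part}${sep_part}${date_part}${ext_part}"

if [[ "./images/my_first_regex - 2026-03-10.sh" =~ $regex ]]; then
    echo "${BASH_REMATCH[2]}: ${BASH_REMATCH[1]}"
fi
```

Single quotes keep the backslash and the dollar sign literal in each part, and the unquoted `$regex` on the right-hand side of `=~` makes Bash treat the value as a pattern rather than a plain string.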