Tuesday, 10 March 2026

Mastering File Parsing in Bash: Cut vs Regex

 


When working with files in Bash, you often need to extract parts of filenames, such as a name and a date. There are multiple ways to do this: using external commands like cut, or using the regular expression (regex) matching built into Bash. In this post, we’ll explore both approaches and see why regex can be the faster and more elegant solution.


Example Scenario

Suppose you have the following files:

[root@oel01db images]# ls -lrt
total 0
-rw-r--r-- 1 root root 0 Mar 6 05:29 learn shell script - 2026-03-11.jpg
-rw-r--r-- 1 root root 0 Mar 6 05:30 my_first_regex - 2026-03-10.sh
-rw-r--r-- 1 root root 0 Mar 6 05:30 my_family_photo - 2026-03-10.jpg
-rw-r--r-- 1 root root 0 Mar 6 05:31 mysql_dump - 2026-03-01.log
[root@oel01db images]#
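
To follow along, the listing above can be recreated with empty files. This is a minimal sketch that assumes you run it from the directory that should contain images, which is what the ./images/* glob in the scripts further down suggests. Timestamps and ownership will differ from the listing.

```shell
#!/usr/bin/env bash
# Recreate the four sample files (empty, like the 0-byte files in the listing).
images_dir="./images"   # assumption: matches the ./images/* glob used by the scripts
mkdir -p "$images_dir"

touch "$images_dir/learn shell script - 2026-03-11.jpg" \
      "$images_dir/my_first_regex - 2026-03-10.sh" \
      "$images_dir/my_family_photo - 2026-03-10.jpg" \
      "$images_dir/mysql_dump - 2026-03-01.log"
```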

We want to format them like this:

2026-03-11: learn shell script
2026-03-10: my_family_photo
2026-03-10: my_first_regex
2026-03-01: mysql_dump

Using cut and External Commands

Here’s one approach using cut and xargs:

[root@oel01db ~]# cat 01-without-regex.sh
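for f in ./images/*; do
    bname=$(basename "$f")

    # Field 1: everything before the first dash (keeps the trailing space)
    name=$(echo "$bname" | cut -d - -f 1)
    # Fields 2 onward: date plus extension; drop the extension, then trim
    date=$(echo "$bname" | cut -d - -f 2- | cut -d . -f 1 | xargs echo)

    echo "$date: $name"
done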

Explanation

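  • cut -d - -f 1 selects the first field, i.e. everything before the first dash (including the space in front of it).

  • cut -d - -f 2- selects from the second field to the end of the line, which leaves the date and the extension; cut -d . -f 1 then drops the extension.

  • xargs echo trims surrounding whitespace (here, the leading space left over from the separator).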
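
To see what each stage contributes, the pipeline can be run one step at a time on a single name. The sketch below uses one filename from the listing; the step1 and step2 variables are introduced only for this illustration, and the square brackets are there to make leading and trailing spaces visible.

```shell
#!/usr/bin/env bash
bname="learn shell script - 2026-03-11.jpg"

name=$(echo "$bname" | cut -d - -f 1)        # text before the first dash
step1=$(echo "$bname" | cut -d - -f 2-)      # everything after the first dash
step2=$(echo "$step1" | cut -d . -f 1)       # drop the extension
date=$(echo "$step2" | xargs echo)           # trim the leading space

printf '[%s]\n' "$name" "$step1" "$step2" "$date"
# [learn shell script ]
# [ 2026-03-11.jpg]
# [ 2026-03-11]
# [2026-03-11]
```

The first line shows that name keeps the space before the dash, and a name that itself contains a dash would be cut short at that first dash.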

Output

[root@oel01db ~]# bash 01-without-regex.sh
2026-03-11: learn shell script
2026-03-10: my_family_photo
2026-03-10: my_first_regex
2026-03-01: mysql_dump
[root@oel01db ~]#

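✅ Works fine, but notice that each loop iteration starts several external processes: basename, three cut invocations, and xargs (which in turn runs echo), plus the subshells for the command substitutions. The size of the files doesn't matter here, since only the names are parsed, but with many filenames this adds noticeable overhead.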


Using Bash Regular Expressions

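Bash's =~ operator inside [[ ]] matches a string against a regex and stores the capture groups in the BASH_REMATCH array, so we can do everything natively in Bash, without spawning external tools: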

[root@oel01db ~]# cat 02-with-regex.sh
#!/usr/bin/env bash

# Group 1 = name, group 2 = date (YYYY-MM-DD)
regex="^.*/(.*) - ([0-9]{4}-[0-9]{2}-[0-9]{2})\..*$"

for f in ./images/*; do
    # Skip anything that doesn't follow the "name - date.ext" layout
    if ! [[ $f =~ $regex ]]; then
        echo "$f didn't match pattern"
        continue
    fi

    name=${BASH_REMATCH[1]}
    date=${BASH_REMATCH[2]}

    echo "$date: $name"
done

Output

[root@oel01db ~]# bash 02-with-regex.sh
2026-03-11: learn shell script
2026-03-10: my_family_photo
2026-03-10: my_first_regex
2026-03-01: mysql_dump
[root@oel01db ~]#

Breaking Down the Regex

regex="^.*/(.*) - ([0-9]{4}-[0-9]{2}-[0-9]{2})\..*$"

Segment                          Meaning
^                                Start of string
.*                               Greedy match: any characters, 0 or more times (the directory path)
/                                Literal forward slash (the last one in the path)
(.*)                             Capture group 1: name before the " - " separator
" - "                            Literal space, dash, space separator (quotes shown only to mark the spaces)
([0-9]{4}-[0-9]{2}-[0-9]{2})     Capture group 2: date in YYYY-MM-DD format
\.                               Literal dot
.*                               Matches the rest of the string (file extension)
$                                End of string

Date Capture Breakdown:

  • [0-9]{4} → year

  • - → dash

  • [0-9]{2} → month

  • - → dash

  • [0-9]{2} → day
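
The pattern can be checked against a single path before it goes into a loop. The sketch below is an illustration only: parse_path is a hypothetical helper name, and the second sample path is made up to show a name without a date.

```shell
#!/usr/bin/env bash
regex="^.*/(.*) - ([0-9]{4}-[0-9]{2}-[0-9]{2})\..*$"

# parse_path (hypothetical helper): prints "date: name", or returns 1 on no match.
parse_path() {
    if [[ $1 =~ $regex ]]; then
        # BASH_REMATCH[0] is the whole match; [1] and [2] are the capture groups
        echo "${BASH_REMATCH[2]}: ${BASH_REMATCH[1]}"
    else
        return 1
    fi
}

parse_path "./images/mysql_dump - 2026-03-01.log"      # prints "2026-03-01: mysql_dump"
parse_path "./images/notes.txt" || echo "no match"     # prints "no match"
```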


Performance Comparison

Regex (Bash built-in)

[root@oel01db ~]# time ./02-with-regex.sh
real 0m0.008s
user 0m0.006s
sys 0m0.000s
[root@oel01db ~]#

Using cut (external commands)

[root@oel01db ~]# time ./01-without-regex.sh
real 0m0.107s
user 0m0.058s
sys 0m0.022s
[root@oel01db ~]#

Observation: In this run, with only four files, the regex version is roughly 13x faster (0.008s vs 0.107s real time) because it doesn’t fork external processes for every file.
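
Four files are a small sample, so the gap is easier to judge with more input. The following is a rough sketch, not a rigorous benchmark: it creates a configurable number of empty files in a temporary directory and times both loops with Bash's time keyword. The file count, the naming scheme, and the function names with_cut and with_regex are assumptions made for illustration, and the numbers will vary by machine.

```shell
#!/usr/bin/env bash
count=${1:-200}                 # assumption: number of test files to create
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT       # clean up the temporary directory on exit

for ((i = 1; i <= count; i++)); do
    touch "$dir/file_$i - 2026-03-10.log"
done

# Same logic as 01-without-regex.sh, wrapped in a function
with_cut() {
    local f bname name date
    for f in "$dir"/*; do
        bname=$(basename "$f")
        name=$(echo "$bname" | cut -d - -f 1)
        date=$(echo "$bname" | cut -d - -f 2- | cut -d . -f 1 | xargs echo)
        echo "$date: $name"
    done
}

# Same logic as 02-with-regex.sh, wrapped in a function
with_regex() {
    local f regex="^.*/(.*) - ([0-9]{4}-[0-9]{2}-[0-9]{2})\..*$"
    for f in "$dir"/*; do
        [[ $f =~ $regex ]] || continue
        echo "${BASH_REMATCH[2]}: ${BASH_REMATCH[1]}"
    done
}

echo "cut:";   time with_cut   > /dev/null
echo "regex:"; time with_regex > /dev/null
```

Pass a larger count as the first argument to see how the difference grows with the number of files.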


Key Takeaways

  1. Regex is powerful for complex text extraction and pattern matching.

  2. Avoid unnecessary external commands like cut, awk, sed if Bash regex can handle the task—it’s faster and cleaner.

  3. Use capture groups to extract multiple pieces of data in one pass.

  4. Performance matters in loops or when processing thousands of files—Bash regex can dramatically reduce runtime.

  5. Keep your regex readable—comment your patterns for maintainability.
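
One way to keep a pattern readable is to assemble it from named, commented pieces. This is a sketch of that idea using the pattern from this post; the variable names are made up for illustration.

```shell
#!/usr/bin/env bash
dir_part='^.*/'                             # leading directory path, up to the last slash
name_part='(.*)'                            # group 1: name before " - "
date_part='([0-9]{4}-[0-9]{2}-[0-9]{2})'    # group 2: date as YYYY-MM-DD
ext_part='\..*$'                            # literal dot plus the rest (extension)

# Joined, this is the same pattern used in 02-with-regex.sh
regex="${dir_part}${name_part} - ${date_part}${ext_part}"

f="./images/my_first_regex - 2026-03-10.sh"
if [[ $f =~ $regex ]]; then
    echo "${BASH_REMATCH[2]}: ${BASH_REMATCH[1]}"   # prints "2026-03-10: my_first_regex"
fi
```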
