Comprehensive Text File Processing in the Command Line
A comprehensive guide to processing text files using command-line tools for viewing, searching, editing, and analyzing content.
Before processing text files, you’ll need to examine their contents using various viewing commands:
- `cat filename.txt`: Displays the entire file contents at once
- `less filename.txt`: Provides scrollable viewing with search capabilities (press `/` to search)
- `more filename.txt`: Similar to less but with fewer features
- `head -n 10 filename.txt`: Shows the first 10 lines
- `tail -n 10 filename.txt`: Shows the last 10 lines
- `tail -f filename.txt`: Follows the file for real-time updates (useful for logs)
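A quick way to try these is on a throwaway file (the filename `sample.txt` here is just for illustration):

```shell
# Generate a 12-line sample file to view
printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 11 12 > sample.txt

head -n 3 sample.txt   # prints the first three lines
tail -n 2 sample.txt   # prints the last two lines
```

`less` and `tail -f` are interactive, so they are best explored directly in a terminal rather than in a script.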
The grep command is your primary tool for searching text patterns:
Basic searches:
- `grep "search_string" filename.txt`: Find lines containing the search term
- `grep -i "search_string" filename.txt`: Case-insensitive search
- `grep -n "search_string" filename.txt`: Show line numbers with matches
- `grep -v "search_string" filename.txt`: Show lines that DON'T match
Advanced searches:
- `grep -r "search_string" /path/`: Recursively search all files in a directory
- `grep -E "pattern1|pattern2" filename.txt`: Search for multiple patterns
- `grep -c "search_string" filename.txt`: Count matching lines
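The flags above can be combined. A short demonstration on a made-up log file (`app.log` is an illustrative name):

```shell
printf 'ERROR disk full\nINFO startup ok\nerror retrying\n' > app.log

grep -i "error" app.log    # case-insensitive: matches both ERROR and error
grep -c "INFO" app.log     # prints the number of matching lines
grep -vn "INFO" app.log    # numbered lines that do NOT contain INFO
```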
Use sed for powerful find-and-replace operations:
Basic substitution:
- `sed 's/old/new/g' filename.txt`: Replace all occurrences of 'old' with 'new'
- `sed 's/old/new/' filename.txt`: Replace only the first occurrence per line
- `sed -i 's/old/new/g' filename.txt`: Edit the file in-place
Advanced sed operations:
- `sed '/pattern/d' filename.txt`: Delete lines matching pattern
- `sed -n '1,10p' filename.txt`: Print only lines 1-10
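The substitution, deletion, and range-printing forms can be sketched on a small test file (`demo.txt` is an illustrative name):

```shell
printf 'foo bar\nfoo foo\nbaz\n' > demo.txt

sed 's/foo/qux/g' demo.txt   # every foo becomes qux
sed '/baz/d' demo.txt        # drops the line containing baz
sed -n '1,2p' demo.txt       # prints only lines 1-2
```

Note that without `-i`, sed writes the result to standard output and leaves the file unchanged.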
Sort content using various criteria:
- `sort filename.txt`: Alphabetical sort
- `sort -r filename.txt`: Reverse alphabetical sort
- `sort -n filename.txt`: Numerical sort
- `sort -k2 filename.txt`: Sort by the second column
- `sort -u filename.txt`: Sort and remove duplicates
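The difference between alphabetical and numerical sorting is easy to see with multi-digit numbers (`nums.txt` is an illustrative name):

```shell
printf '10\n2\n33\n2\n' > nums.txt

sort nums.txt      # lexicographic: "10" sorts before "2"
sort -n nums.txt   # numeric: 2, 2, 10, 33
sort -nu nums.txt  # numeric with duplicates removed
```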
Use uniq to work with duplicate content:
- `sort filename.txt | uniq`: Remove adjacent duplicates (requires sorted input)
- `sort filename.txt | uniq -c`: Count occurrences of each line
- `sort filename.txt | uniq -u`: Show only truly unique lines (no duplicates)
- `sort filename.txt | uniq -d`: Show only lines that appear multiple times
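Because `uniq` only compares adjacent lines, sorting first is essential. A sketch with a small sample (`fruit.txt` is an illustrative name):

```shell
printf 'apple\nbanana\napple\ncherry\nbanana\napple\n' > fruit.txt

sort fruit.txt | uniq -c   # occurrence count next to each line
sort fruit.txt | uniq -d   # lines that appear more than once
sort fruit.txt | uniq -u   # lines that appear exactly once
```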
Get file statistics with the wc command:
- `wc filename.txt`: Shows line count, word count, and character count
- `wc -l filename.txt`: Count lines only
- `wc -w filename.txt`: Count words only
- `wc -c filename.txt`: Count characters only
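One detail worth knowing: when `wc` is given a filename it prints the name alongside the count; reading from standard input yields just the number (`words.txt` is an illustrative name):

```shell
printf 'one two\nthree\n' > words.txt

wc -l words.txt      # prints the count followed by the filename
wc -w < words.txt    # prints only the word count
```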
Use cut and awk for column-based data extraction:
Using cut:
- `cut -d',' -f1 filename.csv`: Extract first column from CSV file
- `cut -d',' -f1,3 filename.csv`: Extract first and third columns
- `cut -c1-10 filename.txt`: Extract first 10 characters from each line
Using awk (more powerful):
- `awk '{print $1}' filename.txt`: Print first field (whitespace-separated)
- `awk -F',' '{print $2}' filename.csv`: Print second field from CSV
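Both tools can be tried on a tiny comma-separated file (`people.csv` and its contents are made up for illustration):

```shell
printf 'alice,30,NYC\nbob,25,LA\n' > people.csv

cut -d',' -f1,3 people.csv          # name and city columns
awk -F',' '{print $2}' people.csv   # the age field from each row
```

`cut` is simpler for fixed column extraction; `awk` additionally supports conditions and arithmetic on fields.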
Combine multiple files:
- `cat file1.txt file2.txt > combined.txt`: Concatenate files
- `paste file1.txt file2.txt`: Merge files side by side
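The difference is direction: `cat` stacks files vertically, while `paste` joins them horizontally with a tab between columns (`f1.txt`/`f2.txt` are illustrative names):

```shell
printf 'a\nb\n' > f1.txt
printf '1\n2\n' > f2.txt

cat f1.txt f2.txt     # four lines: a, b, 1, 2
paste f1.txt f2.txt   # two lines: a<TAB>1 and b<TAB>2
```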
Split large files:
- `split -l 1000 largefile.txt chunk_`: Split by line count
- `split -b 1M largefile.txt chunk_`: Split by file size
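`split` names its output files by appending suffixes (`aa`, `ab`, ...) to the given prefix. A small-scale sketch (`big.txt` and the `part_` prefix are illustrative):

```shell
printf 'line %s\n' 1 2 3 4 5 > big.txt

split -l 2 big.txt part_   # creates part_aa, part_ab, part_ac (2+2+1 lines)
wc -l < part_aa            # the first chunk holds 2 lines
```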
Use tr for character-level transformations:
- `tr '[:lower:]' '[:upper:]' < filename.txt`: Convert to uppercase
- `tr -s ' ' < filename.txt`: Squeeze multiple spaces into one
- `tr -d '[:digit:]' < filename.txt`: Delete all numbers
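Note that `tr` reads only from standard input, which is why each command above uses `<` redirection. A quick demonstration (`msg.txt` is an illustrative name):

```shell
printf 'hello   world 123\n' > msg.txt

tr '[:lower:]' '[:upper:]' < msg.txt   # HELLO   WORLD 123
tr -s ' ' < msg.txt                    # hello world 123
tr -d '[:digit:]' < msg.txt            # digits removed
```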
Understanding redirection is crucial for text processing:
- `command > file.txt`: Redirect output to file (overwrite)
- `command >> file.txt`: Append output to file
- `command1 | command2`: Pipe output of command1 to command2
- `command 2> error.log`: Redirect error messages to file
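These four forms can be combined into a short sketch (`out.txt` and `err.log` are illustrative names):

```shell
echo "first" > out.txt             # overwrite (or create) the file
echo "second" >> out.txt           # append a second line
wc -l < out.txt                    # the file now has 2 lines

# A failing command's message goes to stderr, which 2> captures;
# "|| true" keeps the script running despite the nonzero exit status.
ls missing_file 2> err.log || true
```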