Comprehensive Text File Processing in the Command Line

This guide covers common command-line tools for viewing, searching, editing, and analyzing text files.

1. Viewing Text Files

Before processing a text file, inspect its contents with one of these viewing commands:

  • cat filename.txt: Displays the entire file contents at once
  • less filename.txt: Provides scrollable viewing with search capabilities (press / to search)
  • more filename.txt: Similar to less but with fewer features
  • head -n 10 filename.txt: Shows the first 10 lines
  • tail -n 10 filename.txt: Shows the last 10 lines
  • tail -f filename.txt: Follows the file for real-time updates (useful for logs)
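
The viewing commands above can be combined to pull out a specific range of lines. The following is a minimal sketch, assuming mktemp and awk are available; the 20-line sample file and the variable names are made up for illustration.

```sh
# Sample data: a hypothetical 20-line file created just for this sketch.
sample=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 20; i++) print "line " i }' > "$sample"

first_three=$(head -n 3 "$sample")          # lines 1-3
last_two=$(tail -n 2 "$sample")             # lines 19-20
middle=$(head -n 7 "$sample" | tail -n 3)   # lines 5-7: first 7, then the last 3 of those

printf '%s\n' "$middle"
rm -f "$sample"
```

Piping head into tail like this is a quick way to look at the middle of a file without opening a pager.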

2. Searching Within Text Files

The grep command is your primary tool for searching text patterns:

Basic searches:

  • grep "search_string" filename.txt: Find lines containing the search term
  • grep -i "search_string" filename.txt: Case-insensitive search
  • grep -n "search_string" filename.txt: Show line numbers with matches
  • grep -v "search_string" filename.txt: Show lines that DON’T match

Advanced searches:

  • grep -r "search_string" /path/: Recursively search all files in a directory
  • grep -E "pattern1|pattern2" filename.txt: Match lines containing either pattern (extended regular expressions)
  • grep -c "search_string" filename.txt: Count matching lines
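
To see how these grep options behave on real input, here is a small sketch run against a made-up log file. The file contents and variable names are assumptions for illustration only.

```sh
# Hypothetical log file used only for this example.
log=$(mktemp)
cat > "$log" <<'EOF'
INFO service started
ERROR disk full
warning low memory
error connection lost
INFO shutdown
EOF

exact=$(grep -c "ERROR" "$log")               # case-sensitive count
any_case=$(grep -c -i "error" "$log")         # case-insensitive count
numbered=$(grep -n -i "error" "$log")         # matches prefixed with line numbers
either=$(grep -c -E "ERROR|warning" "$log")   # lines matching either pattern
others=$(grep -c -v -i "error" "$log")        # lines that do NOT match

printf '%s\n' "$numbered"
rm -f "$log"
```

Options can be combined freely, as with -c -v -i above, which counts the lines that contain no form of "error".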

3. Stream Editing with sed

Use sed for powerful find-and-replace operations:

Basic substitution:

  • sed 's/old/new/g' filename.txt: Replace all occurrences of ‘old’ with ‘new’
  • sed 's/old/new/' filename.txt: Replace only the first occurrence per line
  • sed -i 's/old/new/g' filename.txt: Edit the file in place (GNU sed syntax; BSD/macOS sed needs -i '' instead)

Advanced sed operations:

  • sed '/pattern/d' filename.txt: Delete lines matching pattern
  • sed -n '1,10p' filename.txt: Print only lines 1-10
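
The sed commands above can be tried safely on a scratch file. This sketch uses a hypothetical three-line config file and avoids -i altogether by writing to a temporary file and moving it into place, which is one portable alternative when the same script has to run under both GNU and BSD sed.

```sh
# Hypothetical config file used only for this example.
conf=$(mktemp)
printf '%s\n' 'color=red red' '# comment' 'size=10' > "$conf"

first_only=$(sed 's/red/blue/' "$conf" | head -n 1)   # first match on the line only
all=$(sed 's/red/blue/g' "$conf" | head -n 1)         # every match on the line
no_comments=$(sed '/^#/d' "$conf")                    # drop lines starting with #
first_two=$(sed -n '1,2p' "$conf")                    # print lines 1-2 only

# Portable in-place edit: write the result to a temp file, then replace the original.
sed 's/red/blue/g' "$conf" > "$conf.tmp" && mv "$conf.tmp" "$conf"
edited=$(cat "$conf")

printf '%s\n' "$edited"
rm -f "$conf"
```

Running a substitution without -i first, as done here, is a cheap way to preview the result before changing the file.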

4. Sorting and Organizing Data

Sort content using various criteria:

  • sort filename.txt: Alphabetical sort
  • sort -r filename.txt: Reverse alphabetical sort
  • sort -n filename.txt: Numerical sort
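  • sort -k2 filename.txt: Sort by a key that starts at the second whitespace-separated field and runs to the end of the line (use -k2,2 to compare the second field only)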
  • sort -u filename.txt: Sort and remove duplicates
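
Sort options are often combined, for example restricting the key to one field and comparing it numerically. The sketch below uses a made-up scores file; LC_ALL=C is set only to make the ordering independent of the local locale.

```sh
# Hypothetical "name score" data used only for this example.
scores=$(mktemp)
printf '%s\n' 'carol 9' 'alice 10' 'bob 2' 'alice 10' > "$scores"

by_name=$(LC_ALL=C sort -u "$scores")                     # alphabetical, duplicates removed
by_score=$(LC_ALL=C sort -k2,2n "$scores")                # numeric sort on field 2 only
top=$(LC_ALL=C sort -k2,2nr "$scores" | head -n 1)        # highest score first

printf '%s\n' "$by_score"
rm -f "$scores"
```

Without the n modifier the second field would be compared as text, so 10 would sort before 2.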

5. Finding and Filtering Unique Lines

Use uniq to work with duplicate content:

  • sort filename.txt | uniq: Remove duplicate lines (uniq only collapses adjacent duplicates, so the input is sorted first)
  • sort filename.txt | uniq -c: Count occurrences of each line
  • sort filename.txt | uniq -u: Show only lines that appear exactly once
  • sort filename.txt | uniq -d: Show only lines that appear multiple times
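
A common use of these options is a frequency table. The sketch below runs on a hypothetical list of page visits; the awk step at the end only normalizes the padding that uniq -c puts in front of each count, which differs between implementations.

```sh
# Hypothetical list of page visits used only for this example.
visits=$(mktemp)
printf '%s\n' 'home' 'about' 'home' 'contact' 'home' 'about' > "$visits"

# Without sorting, no duplicates are adjacent, so uniq removes nothing.
unsorted_lines=$(uniq "$visits" | awk 'END { print NR }')

# Frequency table, most frequent first.
counts=$(sort "$visits" | uniq -c | sort -rn | awk '{ print $1, $2 }')

once=$(sort "$visits" | uniq -u)       # lines that appear exactly once
repeated=$(sort "$visits" | uniq -d)   # lines that appear more than once

printf '%s\n' "$counts"
rm -f "$visits"
```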

6. Counting and Statistics

Get file statistics with the wc command:

  • wc filename.txt: Shows line count, word count, and byte count
  • wc -l filename.txt: Count lines only
  • wc -w filename.txt: Count words only
  • wc -c filename.txt: Count bytes only (use wc -m to count characters)
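
The sketch below shows the counts on a tiny made-up file. Input is read from standard input so that wc prints only the number, and tr strips the padding some implementations add. The last line feeds wc a single "é" encoded as two UTF-8 bytes (written as octal escapes) to show that -c counts bytes; what -m reports for the same input depends on the locale, so it is not shown.

```sh
# Hypothetical two-line file used only for this example.
f=$(mktemp)
printf 'one two\nthree\n' > "$f"

lines=$(wc -l < "$f" | tr -d '[:space:]')
words=$(wc -w < "$f" | tr -d '[:space:]')
bytes=$(wc -c < "$f" | tr -d '[:space:]')

# One accented character (2 bytes in UTF-8) plus a newline.
utf8_bytes=$(printf '\303\251\n' | wc -c | tr -d '[:space:]')

echo "lines=$lines words=$words bytes=$bytes utf8_bytes=$utf8_bytes"
rm -f "$f"
```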

7. Extracting Columns and Fields

Use cut and awk for column-based data extraction:

Using cut:

  • cut -d',' -f1 filename.csv: Extract the first column from a simple CSV file (cut does not understand quoted fields that contain commas)
  • cut -d',' -f1,3 filename.csv: Extract first and third columns
  • cut -c1-10 filename.txt: Extract first 10 characters from each line

Using awk (more powerful):

  • awk '{print $1}' filename.txt: Print first field (whitespace-separated)
  • awk -F',' '{print $2}' filename.csv: Print second field from CSV
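
To compare the two tools, the sketch below extracts columns with cut and then uses awk to filter and total a column. The CSV contents and variable names are invented for illustration, and the data deliberately contains no quoted fields, since splitting on a bare comma cannot handle them.

```sh
# Hypothetical CSV file used only for this example.
csv=$(mktemp)
printf '%s\n' 'name,dept,salary' 'ana,eng,100' 'ben,ops,80' 'cy,eng,120' > "$csv"

names=$(cut -d',' -f1 "$csv")                         # first column, header included
name_salary=$(cut -d',' -f1,3 "$csv" | head -n 2)     # columns 1 and 3, first two rows

# Skip the header (NR > 1), keep rows whose second field is "eng", sum the third field.
eng_total=$(awk -F',' 'NR > 1 && $2 == "eng" { sum += $3 } END { print sum }' "$csv")

echo "eng_total=$eng_total"
rm -f "$csv"
```

cut is enough for plain extraction; awk becomes worthwhile once rows need to be filtered or values computed.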

8. Combining and Splitting Files

Combine multiple files:

  • cat file1.txt file2.txt > combined.txt: Concatenate files
  • paste file1.txt file2.txt: Merge files side by side

Split large files:

  • split -l 1000 largefile.txt chunk_: Split by line count
  • split -b 1M largefile.txt chunk_: Split into pieces of at most 1 MiB each
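
The following sketch checks that splitting and re-concatenating a file gives back the original, and shows paste with an explicit delimiter. All file names live in a scratch directory created for the example and are hypothetical.

```sh
# Scratch directory and sample files used only for this example.
workdir=$(mktemp -d)
awk 'BEGIN { for (i = 1; i <= 25; i++) print "row " i }' > "$workdir/big.txt"

# 25 lines in chunks of 10 gives chunk_aa, chunk_ab and chunk_ac.
split -l 10 "$workdir/big.txt" "$workdir/chunk_"
chunk_count=$(ls "$workdir"/chunk_* | wc -l | tr -d '[:space:]')

# The shell expands chunk_* in alphabetical order, so cat restores the original order.
cat "$workdir"/chunk_* > "$workdir/rejoined.txt"
roundtrip=no
cmp -s "$workdir/big.txt" "$workdir/rejoined.txt" && roundtrip=ok

# paste joins corresponding lines; -d sets the delimiter (default is a tab).
printf '%s\n' 'a' 'b' > "$workdir/left.txt"
printf '%s\n' '1' '2' > "$workdir/right.txt"
pasted=$(paste -d',' "$workdir/left.txt" "$workdir/right.txt")

echo "chunks=$chunk_count roundtrip=$roundtrip"
rm -rf "$workdir"
```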

9. Text Transformation

Use tr for character-level transformations:

  • tr '[:lower:]' '[:upper:]' < filename.txt: Convert to uppercase
  • tr -s ' ' < filename.txt: Squeeze multiple spaces into one
  • tr -d '[:digit:]' < filename.txt: Delete all digits
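
Because tr reads only from standard input, it is easy to try on short strings. A minimal sketch with made-up input strings:

```sh
upper=$(printf 'hello world\n' | tr '[:lower:]' '[:upper:]')   # lowercase to uppercase
squeezed=$(printf 'a   b    c\n' | tr -s ' ')                  # runs of spaces become one
no_digits=$(printf 'abc123def\n' | tr -d '[:digit:]')          # digits removed

printf '%s\n' "$upper" "$squeezed" "$no_digits"
```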

10. Output Redirection and Pipelines

Understanding redirection is crucial for text processing:

  • command > file.txt: Redirect output to file (overwrite)
  • command >> file.txt: Append output to file
  • command1 | command2: Pipe output of command1 to command2
  • command 2> error.log: Redirect error messages to file
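
The sketch below puts the four forms together. The report function is hypothetical: it stands in for any command that writes to both standard output and standard error, so that the effect of each redirection can be seen.

```sh
# Hypothetical command that writes one line to stdout and one to stderr.
report() {
  echo "result"
  echo "problem" >&2
}

out=$(mktemp)
err=$(mktemp)

report > "$out" 2> "$err"        # overwrite: stdout to one file, stderr to another
report >> "$out" 2> /dev/null    # append stdout, discard stderr

out_lines=$(grep -c "result" "$out")
err_text=$(cat "$err")

# Only stdout travels through the pipe; stderr is discarded here.
piped=$(report 2> /dev/null | tr '[:lower:]' '[:upper:]')

echo "out_lines=$out_lines err_text=$err_text piped=$piped"
rm -f "$out" "$err"
```

Keeping errors in a separate file, as with 2> above, stops them from being mixed into data that a later pipeline stage will parse.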