Processing text files in the CLI
A short article on how to process text files from the command line.
Before processing text files, you might need to view their contents. You can use commands like cat, less, more, and tail:
- cat filename.txt: Displays the entire contents of the file.
- less filename.txt: Opens the file in a pager so you can scroll through its contents.
- more filename.txt: Similar to less, but with fewer features.
- tail -n 10 filename.txt: Shows the last 10 lines of the file.
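As a minimal sketch of the non-interactive commands above, the following builds a small throwaway file and reads it back with cat and tail. The file name sample.txt and its contents are made up for illustration; less and more are interactive pagers, so they are left out.

```shell
# Create a five-line sample file in a temporary directory (hypothetical data).
workdir=$(mktemp -d)
sample="$workdir/sample.txt"
printf 'line %s\n' 1 2 3 4 5 > "$sample"

# cat prints the whole file; tail -n 2 prints only the last two lines.
all_lines=$(cat "$sample")
last_two=$(tail -n 2 "$sample")

printf 'Whole file:\n%s\n' "$all_lines"
printf 'Last two lines:\n%s\n' "$last_two"

# Clean up the temporary directory.
rm -r "$workdir"
```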
To search within a text file, grep is incredibly useful:
- grep "search_string" filename.txt: Prints lines containing the search string.
- grep -i "search_string" filename.txt: Case-insensitive search.
- grep -r "search_string" /path/: Recursively searches all files under the specified directory.
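The three grep forms above can be compared side by side on a small example. This is only a sketch: the directory layout, the file names app.log and other.log, and their contents are assumptions made up for the demonstration.

```shell
# Set up a temporary directory with two small log files (hypothetical data).
workdir=$(mktemp -d)
mkdir "$workdir/logs"
printf 'Error: disk full\nall good\nerror: retrying\n' > "$workdir/logs/app.log"
printf 'nothing to see\n' > "$workdir/logs/other.log"

# Case-sensitive: only the lowercase "error" line matches.
sensitive=$(grep "error" "$workdir/logs/app.log")

# Case-insensitive: both "Error" and "error" lines match.
insensitive=$(grep -i "error" "$workdir/logs/app.log")

# Recursive: each match is prefixed with the path of the file it came from.
recursive=$(grep -r "error" "$workdir/logs")

printf 'grep:\n%s\n\n' "$sensitive"
printf 'grep -i:\n%s\n\n' "$insensitive"
printf 'grep -r:\n%s\n' "$recursive"

rm -r "$workdir"
```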
The shell itself is not a text editor, but you can use sed to edit text as a stream:
- sed 's/old/new/g' filename.txt: Prints the file with every occurrence of 'old' replaced by 'new'. The file itself is not changed.
- To save the result, redirect the output to a new file: sed 's/old/new/g' filename.txt > modified_filename.txt. Do not redirect to the input file itself, because the shell truncates it before sed reads it. GNU sed also accepts -i to edit a file in place.
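A short sketch of the substitution and the redirect-to-a-new-file pattern described above. The file names notes.txt and notes_modified.txt are hypothetical, and the version without the g flag is included only to show the difference.

```shell
# Sample input in a temporary directory (hypothetical data).
workdir=$(mktemp -d)
original="$workdir/notes.txt"
modified="$workdir/notes_modified.txt"
printf 'old car, old house\nnothing here\n' > "$original"

# Without the g flag, only the first match on each line is replaced.
first_only=$(sed 's/old/new/' "$original")

# With the g flag, every match is replaced; the result goes to a new file.
sed 's/old/new/g' "$original" > "$modified"
all_matches=$(cat "$modified")

# The original file is left exactly as it was.
untouched=$(cat "$original")

printf 'First match only:\n%s\n\n' "$first_only"
printf 'All matches:\n%s\n\n' "$all_matches"
printf 'Original file:\n%s\n' "$untouched"

rm -r "$workdir"
```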
Sorting content is another common requirement:
- sort filename.txt: Sorts lines alphabetically.
- sort -r filename.txt: Sorts lines in reverse order.
- sort -n filename.txt: Sorts lines numerically.
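The difference between alphabetical and numeric sorting is easiest to see on numbers of different lengths. The sketch below uses a made-up file, numbers.txt, containing only digits.

```shell
# Four numbers, one per line, in no particular order (hypothetical data).
workdir=$(mktemp -d)
numbers="$workdir/numbers.txt"
printf '10\n9\n100\n2\n' > "$numbers"

# Default sort compares character by character, so "10" comes before "2".
alphabetical=$(sort "$numbers")

# -n compares by numeric value instead.
numeric=$(sort -n "$numbers")

# -r reverses whichever ordering is in effect (here, the default one).
reversed=$(sort -r "$numbers")

printf 'sort:\n%s\n\n' "$alphabetical"
printf 'sort -n:\n%s\n\n' "$numeric"
printf 'sort -r:\n%s\n' "$reversed"

rm -r "$workdir"
```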
To find or filter unique lines, use uniq:
- sort filename.txt | uniq: Removes duplicate lines. uniq only collapses adjacent duplicates, so the input is sorted first.
- sort filename.txt | uniq -u: Displays only the lines that occur exactly once.
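To illustrate why the input is sorted before it reaches uniq, this sketch runs uniq on the same data with and without sorting. The file fruits.txt and its contents are invented for the example.

```shell
# The duplicates of "apple" are deliberately not adjacent (hypothetical data).
workdir=$(mktemp -d)
fruits="$workdir/fruits.txt"
printf 'apple\nbanana\napple\ncherry\napple\n' > "$fruits"

# Without sorting, no two equal lines are adjacent, so nothing is removed.
unsorted=$(uniq "$fruits")

# After sorting, the duplicates sit next to each other and are collapsed.
deduplicated=$(sort "$fruits" | uniq)

# -u keeps only the lines that appear exactly once.
only_once=$(sort "$fruits" | uniq -u)

printf 'uniq alone:\n%s\n\n' "$unsorted"
printf 'sort | uniq:\n%s\n\n' "$deduplicated"
printf 'sort | uniq -u:\n%s\n' "$only_once"

rm -r "$workdir"
```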
The wc command is useful for getting basic statistics:
- wc filename.txt: Displays the line, word, and byte counts (use -m to count characters instead of bytes).
- wc -l filename.txt: Counts the number of lines in the file.
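A small sketch of using wc in a script. Two details here go beyond the text above and are choices made for this example: the file is fed on standard input so that wc prints only the number (not the file name), and tr strips the padding that some wc implementations add. The -w flag (word count) is a standard wc option not mentioned above.

```shell
# A two-line sample file (hypothetical data).
workdir=$(mktemp -d)
sample="$workdir/sample.txt"
printf 'one two\nthree\n' > "$sample"

# Reading from stdin keeps the file name out of the output;
# tr removes any leading spaces some implementations print.
line_count=$(wc -l < "$sample" | tr -d '[:space:]')
word_count=$(wc -w < "$sample" | tr -d '[:space:]')

printf 'Lines: %s\n' "$line_count"
printf 'Words: %s\n' "$word_count"

rm -r "$workdir"
```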
The cut command is handy when dealing with delimited data:
- cut -d',' -f1 filename.txt: Extracts the first column from a CSV (comma-separated values) file.
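As a rough illustration, the sketch below extracts one column and then two columns from a made-up file, people.csv. It assumes simple CSV data: cut splits on every comma, so it does not handle quoted fields that contain commas.

```shell
# A tiny CSV file with a header row (hypothetical data).
workdir=$(mktemp -d)
people="$workdir/people.csv"
printf 'name,age,city\nAda,36,London\nLinus,54,Portland\n' > "$people"

# -d sets the delimiter, -f selects fields by position.
first_column=$(cut -d',' -f1 "$people")

# A comma-separated field list selects several columns at once.
name_and_city=$(cut -d',' -f1,3 "$people")

printf 'First column:\n%s\n\n' "$first_column"
printf 'Columns 1 and 3:\n%s\n' "$name_and_city"

rm -r "$workdir"
```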
To combine the contents of multiple text files, use cat:
- cat file1.txt file2.txt > combined.txt: Concatenates file1.txt and file2.txt into combined.txt.
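A quick sketch of concatenation, using throwaway files with made-up contents. The files are joined in the order they are listed on the command line.

```shell
# Two one-line input files (hypothetical data).
workdir=$(mktemp -d)
printf 'alpha\n' > "$workdir/file1.txt"
printf 'beta\n' > "$workdir/file2.txt"

# cat writes file1.txt first, then file2.txt, into combined.txt.
cat "$workdir/file1.txt" "$workdir/file2.txt" > "$workdir/combined.txt"
combined=$(cat "$workdir/combined.txt")

printf 'combined.txt:\n%s\n' "$combined"

rm -r "$workdir"
```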
You can use tr to translate or delete characters:
- tr '[:lower:]' '[:upper:]' < filename.txt: Converts all lowercase letters to uppercase.
- tr -s ' ' < filename.txt: Squeezes each run of spaces into a single space.
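Both tr uses can be tried on a single line of input. In this sketch the file spaced.txt and its contents are hypothetical; tr reads only from standard input, which is why the file is supplied with <.

```shell
# One line with irregular spacing (hypothetical data).
workdir=$(mktemp -d)
spaced="$workdir/spaced.txt"
printf 'hello    big   world\n' > "$spaced"

# Translate every lowercase letter to its uppercase counterpart.
uppercased=$(tr '[:lower:]' '[:upper:]' < "$spaced")

# -s squeezes each run of repeated spaces down to a single space.
squeezed=$(tr -s ' ' < "$spaced")

printf 'Uppercase: %s\n' "$uppercased"
printf 'Squeezed:  %s\n' "$squeezed"

rm -r "$workdir"
```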
Understanding redirection is crucial:
- command > file.txt: Redirects the output of command to file.txt, overwriting any existing contents.
- command >> file.txt: Appends the output of command to file.txt.
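The difference between the two operators shows up when the same file is written more than once. This sketch uses echo as the command and a made-up file name, log.txt.

```shell
# Work in a temporary directory so no real file is overwritten.
workdir=$(mktemp -d)
log="$workdir/log.txt"

# > creates the file (or empties it) before writing.
echo "first" > "$log"

# >> keeps the existing contents and adds to the end.
echo "second" >> "$log"
after_append=$(cat "$log")

# A second > discards everything written so far.
echo "third" > "$log"
after_overwrite=$(cat "$log")

printf 'After > then >>:\n%s\n\n' "$after_append"
printf 'After another >:\n%s\n' "$after_overwrite"

rm -r "$workdir"
```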