Processing text files in the CLI

2024-04-11T00:00:00+00:00

A short article on how to process text files from the command line

1. Viewing Text Files

Before processing text files, you might need to view their contents. You can use commands like cat, less, more, and tail:

cat filename.txt: Displays the entire contents of the file.
less filename.txt: Allows for scrollable viewing of the file contents.
more filename.txt: Similar to less, but with less flexibility.
tail -n 10 filename.txt: Shows the last 10 lines of the file.
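As a quick illustration of the non-interactive viewers, here is a minimal sketch that builds a throwaway 20-line file and reads both ends of it. The file name sample.txt is made up for the example, and head (the counterpart of tail) is not covered in the list above.

```shell
# Hypothetical sample file in a temporary directory.
workdir=$(mktemp -d)
i=1
while [ "$i" -le 20 ]; do
  echo "line $i"
  i=$((i + 1))
done > "$workdir/sample.txt"

# tail reads from the end of the file, head from the start.
last_three=$(tail -n 3 "$workdir/sample.txt")
first_two=$(head -n 2 "$workdir/sample.txt")

printf '%s\n---\n%s\n' "$first_two" "$last_three"
```

less and more are interactive pagers, so they are best tried directly in a terminal.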

2. Searching Text Files

To search within a text file, grep is incredibly useful:

grep "search_string" filename.txt: Prints lines containing the search string.
grep -i "search_string" filename.txt: Case-insensitive search.
grep -r "search_string" /path/: Recursively searches all files under the specified directory.
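The three forms above can be tried on a throwaway directory. This is a minimal sketch: the file names and contents are invented for illustration, and the extra flags -c (count matching lines) and -l (list matching files) are used only to keep the output short.

```shell
# Hypothetical sample data in a temporary directory.
workdir=$(mktemp -d)
mkdir -p "$workdir/logs"
printf 'Error: disk full\nall good\nerror: retrying\n' > "$workdir/app.txt"
printf 'nothing here\nerror: timeout\n' > "$workdir/logs/old.txt"

# Case-sensitive: only the line with lowercase "error" matches.
exact=$(grep "error" "$workdir/app.txt")

# Case-insensitive: -c prints the number of matching lines.
any_case=$(grep -ic "error" "$workdir/app.txt")

# Recursive: -l prints the names of files that contain a match.
files=$(grep -rl "error" "$workdir" | sort)

printf '%s\n%s\n%s\n' "$exact" "$any_case" "$files"
```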

3. Editing Text Files

While Bash isn’t typically used for interactive editing, you can use sed for stream editing:

sed 's/old/new/g' filename.txt: Replaces all occurrences of 'old' with 'new' and prints the result; the file itself is not changed.
To keep the result, redirect the output to a new file: sed 's/old/new/g' filename.txt > modified_filename.txt. Do not redirect to the input file itself, because the shell truncates it before sed reads it.
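A minimal sketch of the edit-then-replace pattern, assuming a made-up file called notes.txt: the substitution goes to a second file first, and the original is swapped out only afterwards.

```shell
# Hypothetical sample file in a temporary directory.
workdir=$(mktemp -d)
printf 'old value\nkeep me\nold and old\n' > "$workdir/notes.txt"

# Write the edited stream to a new file; notes.txt is still untouched here.
sed 's/old/new/g' "$workdir/notes.txt" > "$workdir/notes.new.txt"

original_first=$(head -n 1 "$workdir/notes.txt")
edited_last=$(tail -n 1 "$workdir/notes.new.txt")

# Replace the original with the edited copy.
mv "$workdir/notes.new.txt" "$workdir/notes.txt"
cat "$workdir/notes.txt"
```

GNU and BSD sed also provide an -i option for in-place editing, but its syntax differs between the two, so the redirect-and-move form is the more portable choice.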

4. Sorting Data in Text Files

Sorting content is another common requirement:

sort filename.txt: Sorts lines alphabetically.
sort -r filename.txt: Sorts lines in reverse order.
sort -n filename.txt: Sorts lines numerically.
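The difference between alphabetical and numeric sorting is easiest to see on numbers of different lengths. A small sketch, using a made-up numbers.txt:

```shell
# Hypothetical sample file in a temporary directory.
workdir=$(mktemp -d)
printf '10\n9\n100\n' > "$workdir/numbers.txt"

# Alphabetical order compares character by character, so "10" and "100" come before "9".
lexical=$(sort "$workdir/numbers.txt" | tr '\n' ' ')

# Numeric order compares the values.
numeric=$(sort -n "$workdir/numbers.txt" | tr '\n' ' ')

# -n and -r can be combined: largest value first.
descending=$(sort -nr "$workdir/numbers.txt" | tr '\n' ' ')

printf '%s\n%s\n%s\n' "$lexical" "$numeric" "$descending"
```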

5. Unique Lines in Text Files

To find or filter unique lines, use uniq:

sort filename.txt | uniq: Removes duplicate lines (uniq only compares adjacent lines, so sort the input first).
sort filename.txt | uniq -u: Displays only the lines that occur exactly once.
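To make the difference between the two forms concrete, here is a short sketch on a made-up fruit.txt. It also shows uniq -c, which is not listed above and prefixes each line with its number of occurrences.

```shell
# Hypothetical sample file in a temporary directory.
workdir=$(mktemp -d)
printf 'pear\napple\npear\nfig\napple\n' > "$workdir/fruit.txt"

# One copy of every distinct line.
distinct=$(sort "$workdir/fruit.txt" | uniq | tr '\n' ' ')

# Only the lines that occur exactly once.
singles=$(sort "$workdir/fruit.txt" | uniq -u | tr '\n' ' ')

# Occurrence counts, most frequent first.
sort "$workdir/fruit.txt" | uniq -c | sort -nr

printf '%s\n%s\n' "$distinct" "$singles"
```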

6. Counting Words, Lines, and Characters

The wc command is useful for getting basic statistics:

wc filename.txt: Displays the line, word, and byte counts (use wc -m to count characters instead of bytes).
wc -l filename.txt: Counts the number of lines in the file.
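When the counts are needed in a script, it helps to feed the file on standard input so wc prints only the number. A minimal sketch with a made-up words.txt; the tr call is there because some wc implementations pad the number with spaces.

```shell
# Hypothetical sample file in a temporary directory.
workdir=$(mktemp -d)
printf 'one two\nthree\n' > "$workdir/words.txt"

# Reading from stdin keeps the file name out of the output.
lines=$(wc -l < "$workdir/words.txt" | tr -d ' ')
words=$(wc -w < "$workdir/words.txt" | tr -d ' ')
bytes=$(wc -c < "$workdir/words.txt" | tr -d ' ')

printf 'lines=%s words=%s bytes=%s\n' "$lines" "$words" "$bytes"
```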

7. Extracting Columns of Data

The cut command is handy when dealing with delimited data:

cut -d',' -f1 filename.txt: Extracts the first column from a CSV (comma-separated values) file.
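A small sketch of column extraction on a made-up people.csv. Besides a single field, -f also accepts a comma-separated list of field numbers.

```shell
# Hypothetical sample file in a temporary directory.
workdir=$(mktemp -d)
printf 'name,age,city\nAda,36,London\nLinus,54,Portland\n' > "$workdir/people.csv"

# First column only.
first_col=$(cut -d',' -f1 "$workdir/people.csv" | tr '\n' ' ')

# Fields 1 and 3 of the last row.
name_city=$(cut -d',' -f1,3 "$workdir/people.csv" | tail -n 1)

printf '%s\n%s\n' "$first_col" "$name_city"
```

cut splits on every delimiter it sees, so it does not handle CSV fields that contain quoted commas.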

8. Combining Multiple Files

To combine the contents of multiple text files, use cat:

cat file1.txt file2.txt > combined.txt: Concatenates file1.txt and file2.txt into combined.txt.
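A minimal sketch of concatenation, with file contents invented for the example. The files are written to the output in the order they appear on the command line.

```shell
# Hypothetical sample files in a temporary directory.
workdir=$(mktemp -d)
printf 'alpha\n' > "$workdir/file1.txt"
printf 'beta\n' > "$workdir/file2.txt"

# The output file should not be one of the input files.
cat "$workdir/file1.txt" "$workdir/file2.txt" > "$workdir/combined.txt"

combined=$(tr '\n' ' ' < "$workdir/combined.txt")
printf '%s\n' "$combined"
```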

9. Transforming Text

You can use tr to translate or delete characters:

tr '[:lower:]' '[:upper:]' < filename.txt: Converts all lowercase letters to uppercase.
tr -s ' ' < filename.txt: Squeezes multiple spaces into a single space.
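Since tr reads only from standard input, it is easy to try on a string. The sketch below covers translating, squeezing, and deleting; the sample strings are made up.

```shell
# Hypothetical sample text.
text='hello    big   world'

# Translate: lowercase to uppercase.
upper=$(printf '%s\n' "$text" | tr '[:lower:]' '[:upper:]')

# Squeeze: runs of spaces become one space.
squeezed=$(printf '%s\n' "$text" | tr -s ' ')

# Delete: remove every digit.
no_digits=$(printf 'a1b22c333\n' | tr -d '[:digit:]')

printf '%s\n%s\n%s\n' "$upper" "$squeezed" "$no_digits"
```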

10. Redirecting and Appending Output

Understanding redirection is crucial:

command > file.txt: Redirects the output of command to file.txt, overwriting it.
command >> file.txt: Appends the output of command to file.txt.
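The difference between overwriting and appending can be checked with three writes to the same file. A minimal sketch, using a made-up out.txt:

```shell
# Hypothetical output file in a temporary directory.
workdir=$(mktemp -d)
log="$workdir/out.txt"

echo "first" > "$log"    # creates the file
echo "second" > "$log"   # overwrites it, "first" is gone
echo "third" >> "$log"   # appends a new line

contents=$(tr '\n' ' ' < "$log")
printf '%s\n' "$contents"
```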

Text files cleanup

2022-01-30T00:00:00+00:00

Text files cleanup and content deduplication

Sort the lines of a file and remove duplicates:

sort file.txt | uniq

With uniq -u instead, only the lines that occur exactly once are kept.

Use awk to deduplicate file content while keeping the original line order:

awk '!seen[$0]++' file.txt
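The awk one-liner works because seen[$0]++ evaluates to 0 the first time a line appears, so the negated expression is true only for first occurrences. A short sketch on a made-up file.txt, compared with sort -u, which also deduplicates but reorders the lines:

```shell
# Hypothetical sample file in a temporary directory.
workdir=$(mktemp -d)
printf 'pear\napple\npear\nfig\napple\n' > "$workdir/file.txt"

# Keeps the first occurrence of each line, in input order.
ordered=$(awk '!seen[$0]++' "$workdir/file.txt" | tr '\n' ' ')

# Deduplicates as well, but the result is sorted.
sorted=$(sort -u "$workdir/file.txt" | tr '\n' ' ')

printf '%s\n%s\n' "$ordered" "$sorted"
```

The awk version keeps every distinct line in memory, which is worth keeping in mind for very large files.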

text on Decaf Blog