Essential Text Transformation Commands for File Processing
A comprehensive guide to text transformation commands for encoding conversion, file cleanup, and batch processing operations.
Convert text files between different character encodings using iconv:
Basic encoding conversion:
iconv -f CP1252 -t UTF-8 example.txt > example-utf-8.txt
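If the source encoding is unknown, file can usually guess it (the -i flag below is the Linux spelling; BSD/macOS file uses -I), and iconv -l lists every encoding name the converter accepts:
file -bi example.txt # prints the MIME type and charset guess
iconv -l | grep -i 1252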
Windows-1251 to UTF-8 with error handling:
iconv -f windows-1251 -t UTF-8//IGNORE example.txt > example-utf-8.txt
The //IGNORE option skips characters that cannot be converted rather than stopping with an error.
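When dropping characters is too lossy, glibc iconv also supports the //TRANSLIT suffix, which substitutes close approximations (for example, "é" becomes "e") where the target encoding lacks an exact match; a typical use is downconverting UTF-8 to plain ASCII:
iconv -f UTF-8 -t ASCII//TRANSLIT example.txt > example-ascii.txt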
Remove duplicates and empty lines in one command:
awk 'NF && !seen[$0]++' inputfile.txt > outputfile.txt
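Here NF is non-zero only for lines that contain at least one field, and seen[$0]++ evaluates to zero the first time a line appears, so each non-empty line is printed exactly once, in its original order.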
Alternative method using grep and sort (note that this reorders the lines; plain uniq on its own would only remove adjacent duplicates):
grep -v "^[[:space:]]*$" input.txt | sort -u > output.txt
Count the number of lines in a file:
wc -l file.txt
Get comprehensive file statistics:
wc file.txt # Shows lines, words, and bytes
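Because one UTF-8 character can span several bytes, use the -m flag for a true character count under the current locale:
wc -m file.txt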
Split large files by line count:
split -l 1000 largefile.txt chunk_
This creates files named chunk_aa, chunk_ab, etc., each containing 1000 lines (the last file may contain fewer).
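With GNU coreutils split, the -d flag produces numeric suffixes (chunk_00, chunk_01, and so on) instead:
split -d -l 1000 largefile.txt chunk_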
Split files by size:
split -b 10M largefile.txt chunk_ # 10MB chunks
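Because the suffixes sort lexicographically, the original file can be reassembled with a simple glob (the output name here is arbitrary):
cat chunk_* > largefile_restored.txt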
Find and remove files by name (-type f restricts matches to regular files, so the recursive -r flag is unnecessary):
find . -name "Success.txt" -type f -exec rm -f {} \;
Alternative using xargs (more efficient for many files; the null delimiters keep filenames containing spaces intact):
find . -name "Success.txt" -type f -print0 | xargs -0 rm -f
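GNU find and modern BSD find can also delete matches directly, with no external rm at all:
find . -name "Success.txt" -type f -delete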
Find files by pattern and size:
find . -name "*.log" -size +10M -exec rm {} \; # Remove log files larger than 10MB
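Since these deletions are irreversible, preview the matches first by swapping the -exec action for -print:
find . -name "*.log" -size +10M -print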
Convert multiple files to UTF-8:
mkdir -p utf8 # the redirect fails if the output directory is missing
for file in *.txt; do
  iconv -f windows-1251 -t UTF-8//IGNORE "$file" > "utf8/${file}"
done
Clean multiple files:
mkdir -p cleaned
for file in *.txt; do
  awk 'NF && !seen[$0]++' "$file" > "cleaned/${file}"
done
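The two steps combine naturally into a single pass per file; a minimal sketch, assuming the same windows-1251 sources and a hypothetical processed/ output directory:
mkdir -p processed
for file in *.txt; do
  iconv -f windows-1251 -t UTF-8//IGNORE "$file" | awk 'NF && !seen[$0]++' > "processed/${file}"
done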