Essential Wget Commands for File Downloads and Web Mirroring

A comprehensive guide to wget commands for downloading files, mirroring websites, and automating web content retrieval.

Basic Download Operations

Simple file download:

wget https://example.com/file.zip

Download with custom filename:

wget -O custom_name.zip https://example.com/file.zip

Download to specific directory:

wget -P /path/to/downloads/ https://example.com/file.zip

Background and Quiet Downloads

Run download in background:

wget -b https://example.com/largefile.zip
# Progress logged to wget-log file

Quiet download (suppress output):

wget -q https://example.com/file.zip

Show only the progress bar, suppressing other output (wget 1.16+):

wget -q --show-progress https://example.com/file.zip

Recursive Downloads and Website Mirroring

Download entire website:

wget -r -p -E -k -K -np https://example.com/

Mirror with specific depth:

wget -r -l 2 -np -k https://example.com/

Download specific file types only:

wget -r -A "*.pdf,*.doc,*.txt" https://example.com/

Recursive Options Explained

  • -r, --recursive: Enable recursive downloading
  • -l, --level=NUMBER: Maximum recursion depth
  • -p, --page-requisites: Download CSS, images, etc. for HTML display
  • -E, --adjust-extension: Save HTML files with .html extension
  • -k, --convert-links: Convert links for local viewing
  • -K, --backup-converted: Keep original files when converting
  • -np, --no-parent: Don’t ascend to the parent directory when recursing

Download Control and Rate Limiting

Limit download speed:

wget --limit-rate=200k https://example.com/file.zip

Add delays between downloads:

wget --wait=2 -r https://example.com/

Random wait times:

wget --random-wait -r https://example.com/

Resume and Retry Options

Resume interrupted download:

wget -c https://example.com/largefile.zip

Set retry attempts:

wget -t 5 https://example.com/file.zip  # Try 5 times
wget -t 0 https://example.com/file.zip  # Infinite retries

Retry on connection refused:

wget --retry-connrefused -t 10 https://example.com/file.zip

Authentication and Headers

HTTP basic authentication (or use --ask-password to avoid exposing the password on the command line):

wget --http-user=username --http-password=password https://example.com/file.zip

Custom user agent:

wget --user-agent="Mozilla/5.0 (compatible; MyBot/1.0)" https://example.com/

Add custom headers:

wget --header="Accept: application/json" \
     --header="Authorization: Bearer token123" \
     https://api.example.com/data

Load cookies from file:

wget --load-cookies cookies.txt https://example.com/members/file.zip
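
A cookies.txt file usually has to be created first. A minimal sketch, assuming a hypothetical login form at /login with fields named user and pass (adjust both to match the real site):

wget --save-cookies cookies.txt --keep-session-cookies \
     --post-data="user=me&pass=secret" \
     -O /dev/null https://example.com/login
wget --load-cookies cookies.txt https://example.com/members/file.zip

--keep-session-cookies is needed because session cookies (those without an expiry) are otherwise discarded when wget exits.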

HTTPS and Certificate Handling

Skip SSL certificate verification (insecure; only for trusted hosts with self-signed certificates):

wget --no-check-certificate https://self-signed.example.com/file.zip

Specify CA certificate:

wget --ca-certificate=/path/to/ca-cert.pem https://example.com/file.zip

Proxy Configuration

Use an HTTP proxy for a single invocation:

wget -e use_proxy=yes -e http_proxy=http://proxy.example.com:8080 https://example.com/file.zip

Set proxy via environment:

export http_proxy=http://proxy.example.com:8080
export https_proxy=http://proxy.example.com:8080
wget https://example.com/file.zip

Advanced Use Cases

Download All PDFs from a Website

wget -r -A "*.pdf" -np -nd https://example.com/documents/
# -nd saves all files into one directory instead of recreating the site tree

Mirror Website for Offline Viewing

wget --mirror --convert-links --adjust-extension \
     --page-requisites --no-parent https://example.com/

Download with FTP (note: credentials in the URL are visible in shell history and process listings)

wget ftp://username:[email protected]/file.zip

Batch Download from URL List

wget -i urls.txt
# Where urls.txt contains one URL per line
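
The list file can be generated in the shell itself. A minimal sketch, assuming hypothetical release archives under a common base URL:

# Build a urls.txt suitable for wget -i (URLs are placeholder examples)
base="https://example.com/releases"
printf '%s\n' "$base/v1.0.zip" "$base/v1.1.zip" "$base/v1.2.zip" > urls.txt
wc -l < urls.txt   # each line is one download target for: wget -i urls.txt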

Download with Timestamping

wget -N https://example.com/file.zip
# Only download if server file is newer than local file

Useful Combinations

Complete website backup:

wget --recursive --no-clobber --page-requisites \
     --adjust-extension --convert-links --restrict-file-names=windows \
     --domains example.com --no-parent https://example.com/

Polite crawling:

wget --recursive --wait=1 --random-wait \
     --limit-rate=200k --user-agent="polite-crawler/1.0" \
     https://example.com/

Monitoring and Logging

Custom log file:

wget -o download.log https://example.com/file.zip

Append to log:

wget -a download.log https://example.com/file.zip

Debug information:

wget -d https://example.com/file.zip

Tips and Best Practices

  1. Use -e robots=off sparingly; wget honors robots.txt by default, and ignoring it can get you blocked
  2. Add delays when downloading multiple files to be server-friendly
  3. Use --no-parent to keep recursion from ascending into parent directories
  4. Test with --spider to check links without downloading
  5. Use --timeout for unreliable connections
  6. Monitor bandwidth usage with --limit-rate
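
Tips 4 and 5 combine naturally into a pre-flight link check. A sketch, using a placeholder URL; the result depends entirely on the server's response:

# Probe a link with --spider (no download) before committing to it.
url="https://example.com/file.zip"
if wget --spider -q --timeout=10 --tries=2 "$url"; then
  status="ok"
else
  status="unavailable"
fi
echo "link $status: $url"

Because --spider sends a HEAD-style probe, this is cheap enough to run over a whole urls.txt in a loop before a batch download.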