Essential Wget Commands for File Downloads and Web Mirroring
A comprehensive guide to wget commands for downloading files, mirroring websites, and automating web content retrieval.
Simple file download:
wget https://example.com/file.zip
Download with custom filename:
wget -O custom_name.zip https://example.com/file.zip
Download to specific directory:
wget -P /path/to/downloads/ https://example.com/file.zip
Run download in background:
wget -b https://example.com/largefile.zip
# Progress logged to wget-log file
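To tell when a backgrounded download has finished, you can watch wget-log for the final "saved" line. A minimal sketch; the sample log line below imitates wget's completion message, and the filename and sizes are made up:

```shell
#!/bin/sh
# Sketch: detect completion of a backgrounded wget (-b) by inspecting its log.
# The sample line mimics wget's final "saved" message; filename/size are placeholders.
printf "%s\n" "2024-01-01 12:00:00 (1.2 MB/s) - 'largefile.zip' saved [1048576/1048576]" > wget-log.sample
if grep -q "saved \[" wget-log.sample; then
  echo "download complete"
else
  echo "still in progress"
fi
```

Against a real background download, point the grep at wget-log instead of the sample file.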
Quiet download (suppress output):
wget -q https://example.com/file.zip
Show only the progress bar (suppress other output; requires wget 1.16+):
wget -q --show-progress https://example.com/file.zip
Download entire website:
wget -r -p -E -k -K -np https://example.com/
Mirror with specific depth:
wget -r -l 2 -np -k https://example.com/
Download specific file types only:
wget -r -A "*.pdf,*.doc,*.txt" https://example.com/
-r, --recursive: Enable recursive downloading
-l, --level=NUMBER: Maximum recursion depth
-p, --page-requisites: Download CSS, images, etc. for HTML display
-E, --adjust-extension: Save HTML files with .html extension
-k, --convert-links: Convert links for local viewing
-K, --backup-converted: Keep original files when converting
-np, --no-parent: Don't follow links outside the directory
Limit download speed:
wget --limit-rate=200k https://example.com/file.zip
Add delays between downloads:
wget --wait=2 -r https://example.com/
Random wait times:
wget --random-wait -r https://example.com/
Resume interrupted download:
wget -c https://example.com/largefile.zip
Set retry attempts:
wget -t 5 https://example.com/file.zip # Try 5 times
wget -t 0 https://example.com/file.zip # Infinite retries
Retry on connection refused:
wget --retry-connrefused -t 10 https://example.com/file.zip
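The resume and retry options above can also be wrapped in a small shell loop for flaky servers. A sketch, assuming a placeholder URL and attempt count; the real wget call is left commented out so the sketch runs without network access:

```shell
#!/bin/sh
# Sketch: resume-and-retry loop around wget -c; url and max_tries are placeholders.
url="https://example.com/largefile.zip"
max_tries=5
attempt=1
status=1
while [ "$attempt" -le "$max_tries" ]; do
  echo "attempt $attempt of $max_tries"
  # wget -c -t 1 --retry-connrefused "$url" && status=0 && break
  status=0; break   # placeholder success so the sketch runs offline
  attempt=$((attempt + 1))
done
[ "$status" -eq 0 ] && echo "done" || echo "gave up after $max_tries tries"
```

In practice wget's own -t and -c usually suffice; an outer loop is mainly useful when you want custom back-off or logging between attempts.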
HTTP authentication:
wget --http-user=username --http-password=password https://example.com/file.zip
Custom user agent:
wget --user-agent="Mozilla/5.0 (compatible; MyBot/1.0)" https://example.com/
Add custom headers:
wget --header="Accept: application/json" \
--header="Authorization: Bearer token123" \
https://api.example.com/data
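Header flags can also be assembled programmatically in the positional-parameter list, which avoids repeating long quoted strings. A sketch; the token value and API URL are placeholders, and the echo only previews the command:

```shell
#!/bin/sh
# Sketch: build wget --header flags via set --; token and URL are placeholders.
token="token123"
set -- --header="Accept: application/json" --header="Authorization: Bearer $token"
preview="wget $* https://api.example.com/data"
echo "$preview"
# To actually run it: wget "$@" https://api.example.com/data
```

Using "$@" preserves each flag as a separate argument even when header values contain spaces.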
Load cookies from file:
wget --load-cookies cookies.txt https://example.com/members/file.zip
Skip SSL certificate verification:
wget --no-check-certificate https://self-signed.example.com/file.zip
Specify CA certificate:
wget --ca-certificate=/path/to/ca-cert.pem https://example.com/file.zip
Use HTTP proxy:
wget -e use_proxy=yes -e http_proxy=proxy.example.com:8080 https://example.com/file.zip
Set proxy via environment:
export http_proxy=http://proxy.example.com:8080
export https_proxy=http://proxy.example.com:8080
wget https://example.com/file.zip
Download all PDFs from a directory into the current directory (no subdirectories):
wget -r -A "*.pdf" -np -nd https://example.com/documents/
Full mirror of a website for offline viewing:
wget --mirror --convert-links --adjust-extension \
--page-requisites --no-parent https://example.com/
Download from FTP with credentials:
wget ftp://username:[email protected]/file.zip
Download a list of URLs from a file:
wget -i urls.txt
# Where urls.txt contains one URL per line
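The URL list can be generated on the fly before handing it to wget -i. A sketch; the URLs are placeholders and the wget call is commented out so the sketch runs offline:

```shell
#!/bin/sh
# Sketch: build a URL list for wget -i; the URLs are placeholders.
cat > urls.txt <<'EOF'
https://example.com/file1.zip
https://example.com/file2.zip
https://example.com/file3.zip
EOF
echo "queued $(wc -l < urls.txt | tr -d ' ') URLs"
# wget -i urls.txt -P downloads/   # uncomment to actually download
```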
Download only if the remote file is newer (timestamping):
wget -N https://example.com/file.zip
# Only download if server file is newer than local file
Complete website backup:
wget --recursive --no-clobber --page-requisites \
--adjust-extension --convert-links --restrict-file-names=windows \
--domains example.com --no-parent https://example.com/
Polite crawling:
wget --recursive --wait=1 --random-wait \
--limit-rate=200k --user-agent="polite-crawler/1.0" \
https://example.com/
Custom log file:
wget -o download.log https://example.com/file.zip
Append to log:
wget -a download.log https://example.com/file.zip
Debug information:
wget -d https://example.com/file.zip
Best practices:
- Use -e robots=off carefully and respect robots.txt
- Add delays when downloading multiple files to be server-friendly
- Use --no-parent to avoid downloading unwanted parent directories
- Test with --spider to check links without downloading
- Use --timeout for unreliable connections
- Monitor bandwidth usage with --limit-rate