There are many ways to save web pages and entire web sites for offline viewing, and the tools below work on Linux, Windows, and/or Mac OS X. If you are looking for a way to take screenshots instead, try this page.
Spiderzilla was a great Firefox extension that downloaded entire web sites with an embedded version of HTTrack. It looks like you can still download Spiderzilla, but the extension might not be maintained anymore. Worth checking out.
HTTrack is a classic tool for downloading entire web sites, or parts of web sites. Think carefully before you use it on someone else's web site: mirroring a large site consumes a lot of their bandwidth, so don't do it without permission. Use the Scrapbook Firefox extension to download individual pages instead.
Lynx is a text-based Web browser. I previously wrote a Lynx tutorial that shows how to extract text from web pages. You can also use Lynx to capture just the text of multiple web pages. It's a bit messy though and I don't recommend it unless you have a specific purpose that needs text extraction from web pages in this manner. Here it is:
First make a test directory:
Navigate into that directory:
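The two steps above look like this in a terminal (the directory name lynx-crawl is just an example; use whatever you like):

```shell
# Make a throwaway directory for the crawl output and move into it.
mkdir lynx-crawl
cd lynx-crawl
```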
Start the crawl. Don't run this against other people's large web sites, because a full traversal can use up a lot of bandwidth.
lynx -crawl -traversal "http://www.[yoursite].com"
You will then end up with a directory full of text files with a .dat file extension.
Tip: You can change the .dat file extensions to .txt with the following command — make sure you are in the right directory first:
rename -v 's/\.dat$/\.txt/' *.dat
Or remove the file extensions altogether with the following command:
rename -v 's/\.dat$//' *.dat
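Note that the rename used above is the Perl version; some distributions ship the util-linux rename, which takes a different syntax. If you're not sure which you have, a plain shell loop does the same job everywhere:

```shell
# Portable fallback: rename every *.dat file to *.txt
for f in *.dat; do
    [ -e "$f" ] || continue          # skip if no .dat files match
    mv -- "$f" "${f%.dat}.txt"       # strip .dat, append .txt
done
```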
More about the rename command here.
Assuming that you are leaving the .dat file extensions for now, this is a list of files and what they contain:
If you want to combine all the pages of text into one file for searching with a visual text editor like gedit, SciTE, or Notepad, you can use the cat command like this:
cat *.dat > MyFile.txt
That will create a file called MyFile.txt containing the text of all the .dat files in the current directory. (Globbing *.dat instead of * keeps MyFile.txt itself out of the input if you run the command a second time.)
You can also grep (search) the files all at once with the grep command. Navigate to the directory with the files that you want to search and type something like:
grep -i "your search terms" *
The -i will make it a case-insensitive search. For more information on grep, type man grep in the terminal.
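Here is a quick sketch of what -i buys you, using a throwaway example file:

```shell
# Create a small sample file, then search it case-insensitively.
printf 'Lynx saves PAGES as text\n' > sample.txt
grep -i "pages" sample.txt    # matches "PAGES" despite the case difference
```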
Wget will be covered in a separate post.
For saving individual web pages, I recommend the Scrapbook Firefox extension. For downloading and saving entire web sites I recommend HTTrack (don't use it on large web sites though). Wget is great for selectively grabbing files from a Web page/site. If you know of other good tools for saving web pages for offline viewing, leave a comment below.