There are many ways to save web pages and web sites for offline viewing. The tools below save entire web pages and web sites, and work on Linux, Windows and/or Mac OS X. If you are looking for a way to take screenshots, try this page instead.
Firefox has an extension called Scrapbook. Scrapbook lets you edit the saved web pages so you can add notes, highlighting, inline annotations, and more. It is an excellent tool for research.
Spiderzilla was a great Firefox extension that downloaded entire web sites with an embedded version of HTTrack. It looks like you can still download Spiderzilla, but the extension might not be maintained anymore. Worth checking out.
HTTrack is a classic tool for downloading entire web sites, or parts of web sites. Think carefully before you point this program at someone else's web site: mirroring a large site uses up a lot of that site's bandwidth. To download individual pages, use the Scrapbook Firefox extension, described above, instead.
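If you do decide to mirror part of a site with HTTrack, one way to keep the bandwidth cost down is to limit the crawl depth, the number of connections, and the transfer rate. The following is only a sketch: the function name polite_mirror, the example.com URL, the output directory, and the specific limits are all made up for illustration. It prints the command by default so you can check it before running anything.

```shell
#!/bin/sh
# Hypothetical helper: build a bandwidth-friendly httrack command.
# Nothing is downloaded unless DRY_RUN is set to 0.
polite_mirror() {
    # $1 = start URL, $2 = directory to store the mirror in
    # -O       where to put the mirrored files
    # -r2      follow links at most 2 levels deep
    # -c2      use at most 2 simultaneous connections
    # -A25000  cap the transfer rate at 25000 bytes per second
    set -- httrack "$1" -O "$2" -r2 -c2 -A25000
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"      # dry run: just show the command
    else
        "$@"           # real run: requires httrack to be installed
    fi
}

# Example call with placeholder values (prints the command only).
polite_mirror "http://www.example.com/" "./example_mirror"
```

Run it once as shown to see the command, then set DRY_RUN=0 when you are happy with the limits. The numbers are arbitrary starting points, not recommendations from the HTTrack documentation.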
Lynx is a text-based Web browser. I previously wrote a Lynx tutorial that shows how to extract text from web pages. You can also use Lynx to capture just the text of multiple web pages. It's a bit messy though and I don't recommend it unless you have a specific purpose that needs text extraction from web pages in this manner. Here it is:
First make a test directory:
mkdir lynx_testing
Navigate into that directory:
cd ./lynx_testing
Start the crawl. Don't do this on other people's large web sites, because crawling a large site can use up a lot of bandwidth.
lynx -crawl -traversal "http://www.[yoursite].com"
You will then end up with a directory full of text files with a .dat file extension.
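Tip: You can change the .dat file extensions to .txt with the following command. Make sure you are in the right directory first. Note that this syntax is for the Perl-based rename; the rename that comes with util-linux takes different arguments.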
rename -v 's/\.dat$/\.txt/' *.dat
Or remove the file extensions altogether with the following command:
rename -v 's/\.dat$//' *.dat
More about the rename command here
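If your system does not have the Perl-based rename, a plain shell loop can do the same job as the rename commands above. This is a minimal sketch, and the function name dat_to_txt is just something I made up for the example:

```shell
#!/bin/sh
# Hypothetical helper: rename every .dat file in a directory to .txt
# using only standard shell features (no rename command needed).
dat_to_txt() {
    dir="${1:-.}"                      # default to the current directory
    for f in "$dir"/*.dat; do
        # If no .dat files exist, the pattern stays unexpanded; skip it.
        [ -e "$f" ] || continue
        # ${f%.dat} strips the trailing .dat from the file name.
        mv -- "$f" "${f%.dat}.txt"
    done
}
```

After pasting the function into your terminal, run dat_to_txt with no arguments to convert the current directory, or give it a directory name. To strip the extension altogether, drop the .txt from the mv line.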
Assuming that you are leaving the .dat file extensions for now, this is a list of files and what they contain:
If you want to combine all the pages of text into one file for searching with a visual text editor like gedit, SciTE, or Notepad, you can use the cat command like this:
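cat *.dat > MyFile.txt
That will create a file called MyFile.txt that contains all of the text from the .dat files in the current directory. Matching only *.dat keeps MyFile.txt itself out of the input if you run the command a second time.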
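One drawback of the cat command above is that the combined file does not tell you which page each piece of text came from. Here is a rough sketch of a loop that writes a header line with the file name before each page. The function name combine_pages and the header format are my own invention:

```shell
#!/bin/sh
# Hypothetical helper: combine .dat files into one file, with a header
# line before each page so you can tell where the text came from.
combine_pages() {
    # $1 = directory holding the .dat files, $2 = output file
    src_dir="$1"
    out_file="$2"
    : > "$out_file"                    # start with an empty output file
    for f in "$src_dir"/*.dat; do
        [ -e "$f" ] || continue        # skip if no .dat files exist
        printf '===== %s =====\n' "$(basename "$f")" >> "$out_file"
        cat "$f" >> "$out_file"
        printf '\n' >> "$out_file"     # blank line between pages
    done
}
```

For example, combine_pages . MyFile.txt combines the .dat files in the current directory. Give the output file a name that does not end in .dat, so it is not picked up as input.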
You can also search all of the files at once with the grep command. Navigate to the directory with the files that you want to search and type something like:
grep -i "your search terms" *
The -i will make it a case-insensitive search. For more information on grep, type man grep in the terminal.
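Two other grep options are handy with a directory full of crawled pages: -l lists only the names of the files that match, and -n shows the line number of each match. The sketch below wraps them in two small functions. The names files_matching and lines_matching are made up for this example:

```shell
#!/bin/sh
# Hypothetical helpers for searching a directory of crawled text files.

# Print only the names of files that contain the search terms.
files_matching() {
    # $1 = search terms, $2 = directory to search
    grep -il -- "$1" "$2"/*
}

# Print each matching line as file:line-number:text.
lines_matching() {
    # Adding /dev/null as an extra file makes grep print the file name
    # even when the directory holds only one file.
    grep -in -- "$1" "$2"/* /dev/null
}
```

For example, files_matching "your search terms" . lists the files in the current directory that mention your terms, and lines_matching shows you where in each file they appear.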
Wget information is coming soon and will be covered in another post.
For saving individual web pages, I recommend the Scrapbook Firefox extension. For downloading and saving entire web sites I recommend HTTrack (don't use it on large web sites though). Wget is great for selectively grabbing files from a Web page/site. If you know of other good tools for saving web pages for offline viewing, leave a comment below.
Comments
Saving webpages offline
Thanks sooooo much!
I have been trying to figure out an easier way to store webpages for offline viewing. This is the best info on this topic that I have found on the net. I didn't think that you should have to pay extra for this feature.
Thanks again for your help! =)
Ginny
free software
Thanks for the feedback. If you are interested in more free software, check out the Students' Guide to Free Software page.
Finally, there is a
Finally, there is a solution. The iCab browser saves web pages exactly as they are when viewed online.
Surfulater saves web content
Surfulater lets you permanently save anything you see on the Web from selected text and images to complete web pages. Then add notes, tags and links, edit content, organize information and quickly find it again. Surfulater removes the pain of having to find Web sites again, worry about whether they or the content you want still exists and provides access to information without having to be connected to the Internet.
Surfulater also saves content such as Word Documents, PDF files etc. keeping everything together in the one place. The information you accumulate becomes an invaluable resource for all sorts of research activities and makes knowledge reuse possible.
When you use Surfulater you will never lose important information you've captured or have to worry about Web sites or pages disappearing, never to be found again.