I just bought a new laptop (ThinkPad T500, which I'll review later) and was trying to copy the files from my old laptop to a portable hard drive. There was an error every time the computer tried to copy a file whose name contained a colon or a question mark.
Continuing in the series of Brazilian bikini Web development tutorials, here is an experiment with the Yahoo Search API, Ruby and Brazilian bikinis.
The script uses Ruby to convert the XML from the Yahoo Image Search API into XHTML Strict as shown in the image below:

Please download the attached Ruby file to follow along.
I've been busy with work lately and haven't had time to write much.
Here are some useful scripting links that have been sitting in my Firefox tabs for a week or so:
You can check the year that a domain was registered with the following command:
whois example.com | grep -i 'creat' | head -n1 | grep -Eo '[[:digit:]]{4}'
The above line does the following:
You can extract the exact day with the following command:
whois example.com | grep -i 'creat' | head -n1 | \
grep -Eo '[[:digit:]]{2}-[a-zA-Z0-9]{1,10}-[[:digit:]]{4}'
It works in a similar manner to the first example, but uses a longer regular expression to extract the full date.
You can also run this on a list of domains in a text file by reading each line of the file.
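Here is a minimal sketch of that loop. It assumes a file named domains.txt (a made-up name) with one domain per line. The creation_year function is also invented for this example: it wraps the grep steps from the first command so that they read whois output from standard input, and it adds a final head -n1 in case the line contains more than one four-digit number.

```shell
# creation_year: hypothetical helper. Reads whois output on stdin and prints
# the first four-digit number on the first line that mentions "creat".
creation_year() {
  grep -i 'creat' | head -n1 | grep -Eo '[[:digit:]]{4}' | head -n1
}

# domains.txt is an assumed filename: one domain per line.
if [ -r domains.txt ]; then
  while read -r domain
  do
    [ -n "$domain" ] || continue      # skip blank lines
    year="$(whois "$domain" | creation_year)"
    echo "$domain $year"
    sleep 2                           # pause between whois lookups
  done < domains.txt
fi
```

Whois output differs from registry to registry, so some domains will probably print an empty year.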
You can quickly scrape Web pages in a rough manner with the Lynx Browser and grep (and other tools that will be explained in the near future). Lynx is able to dump the contents of Web pages in two ways: only the text of the page, or the entire HTML source of the page.
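To make the two dump modes concrete, here is a rough sketch. lynx -dump prints the rendered text of a page and lynx -source prints its raw HTML. The two helper functions (their names are invented for this example) filter whatever is piped into them, and the URL in the usage comments is only a placeholder.

```shell
# grep_text: hypothetical helper. Prints the lines of page text (stdin)
# that contain the given word, ignoring case.
grep_text() {
  grep -i -- "$1"
}

# extract_links: hypothetical helper. Prints the value of every lowercase,
# double-quoted href attribute found in HTML source on stdin.
extract_links() {
  grep -Eo 'href="[^"]+"' | sed -e 's/^href="//' -e 's/"$//'
}

# Example usage (needs Lynx and a network connection):
#   lynx -dump   "http://example.com/" | grep_text 'domain'
#   lynx -source "http://example.com/" | extract_links
```

This is deliberately rough: links written with single quotes or an uppercase HREF are missed.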
Sometimes you will end up with a list of URLs whose HTTP response codes you would like to check. You might have 200 pages that are sending Google a 302 redirect header and you would like to check them all at once.
This very rough example reads a list of URLs from a file, fetches their HTTP response codes and redirect location, and prints them to the screen:
while read -r inputline
do
url="$inputline"
headers="$(lynx -dump -head "$url" | grep -e HTTP -e Location)"
echo "$url $headers"
sleep 2
done < filename.txt
It is a rough script because the Location field of the headers returned by Lynx sometimes spans two lines. (I'm going to fix that problem soon.)
The sleep command tells the script to pause for 2 seconds between requests. It is optional, but if I am requesting a lot of URLs from one site, I usually pause between requests so that it doesn't make the server do too much work at once.
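One possible way around the wrapped Location line is to fetch the headers with curl instead of Lynx, since curl -sI prints each header on its own line. This is only a sketch of that swapped-in approach, not the fix mentioned above. The summarize_headers function is a hypothetical name: it reads raw response headers on standard input and prints the status code followed by the Location value, if there is one.

```shell
# summarize_headers: hypothetical helper. Reads raw HTTP response headers on
# stdin and prints "<status code> <Location value>" for the first response.
summarize_headers() {
  tr -d '\r' | awk '
    /^HTTP\// && code == "" { code = $2 }                 # status line
    tolower($1) == "location:" && loc == "" { loc = $2 }  # redirect target
    END { print code, loc }
  '
}

# Example usage (needs curl and a network connection):
#   while read -r url
#   do
#     echo "$url $(curl -sI "$url" | summarize_headers)"
#     sleep 2
#   done < filename.txt
```

The function only looks at the first status line and the first Location header, so it reports the first hop of a redirect chain.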
The basic syntax for processing a file line-by-line in the shell is:
while read inputline
do
[some commands here]
done < [input filename]
This page describes some ways to use the GNU/Linux terminal to extract search engine hits from a Web site's log files.
To extract just the Googlebot hits on the site from a logfile, try this: