Basic GNU/Linux Commands

The following list of commands is not comprehensive. It is just meant to be a quick introduction that you can print out and paste to the wall next to your computer. To get documentation on any GNU/Linux command (most of these are usually GNU), just type man [command] in a terminal. To learn how to use the man command, type man man in a terminal.

An alternative to the man command is the info command. To learn how to use info, type info info in a terminal.

An introduction to some of the commands is below. To print out these pages as a reference, you can visit the print-friendly version.

Head and Tail

head

Use head to get the first 10 lines of a file. To change the number of lines use the -n option like this:

head -n 1000 access.log

That will output the first 1000 lines of a file.

tail

To get the last 10 lines of a file, use tail. To change the number of lines, use the -n option. The following command outputs the last 1000 lines of a file named access.log:

tail -n 1000 access.log
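
The two commands can also be combined to pull a range of lines out of the middle of a file. The following is a minimal sketch; the 20-line file numbers.txt is a made-up sample created only for the demonstration.

```shell
# Work in a throwaway directory; numbers.txt is a hypothetical sample file.
workdir=$(mktemp -d)
seq 1 20 > "$workdir/numbers.txt"

# First 3 lines and last 2 lines.
first_three=$(head -n 3 "$workdir/numbers.txt")
last_two=$(tail -n 2 "$workdir/numbers.txt")

# Lines 5 through 8: keep the first 8 lines, then the last 4 of those.
middle=$(head -n 8 "$workdir/numbers.txt" | tail -n 4)

printf '%s\n' "$middle"
rm -rf "$workdir"
```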

awk

awk Text-Processing Language

Awk is a text processing language. For more information about awk, type man awk in a terminal. Awk is a great tool that will be covered in more detail later.

For an introduction, try this Awk tutorial. If using the GNU version of Awk (gawk), you can download the manual. If you are using GNU/Linux you are probably using gawk.
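
As a small taste of what awk looks like, here is a sketch that splits each line into whitespace-separated fields and works with them. The names and numbers are invented sample data.

```shell
# Each input line is "name hits"; awk exposes the fields as $1 and $2.
data='alice 10
bob 20
carol 30'

# Print only the first field of every line.
names=$(printf '%s\n' "$data" | awk '{ print $1 }')

# Add up the second field and print the total after the last line.
total=$(printf '%s\n' "$data" | awk '{ sum += $2 } END { print sum }')

printf '%s\n' "$names" "$total"
```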

fg, Ctrl-z and jobs

Pausing Commands

Ctrl-z

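To pause a task in the terminal, press Ctrl-z. This suspends (stops) the task and returns you to the shell prompt, freeing up the terminal for other work. The task does not keep running while it is suspended; to let it continue in the background, type bg.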

fg

If you have suspended a program with Ctrl-z, you can bring it back to the foreground with the fg command. If you have more than one suspended or background job, put the number of the job after the command.

jobs

Typing jobs in the terminal will give you a list of all the programs that are suspended or running in the background of the current shell:

$ jobs
[1]-  Stopped                 vim linux_seo_tools.html
[2]+  Stopped                 man iwlist

To resume my instance of vim in the above example, I would type fg 1. To resume the man page, I would type fg 2.

Grep Tutorial

Using the grep Command

Grep is one of the most useful commands. You can use grep on any Unix-like operating system, and you can even get grep for Windows. When I'm stuck in Windows, I use grep inside of Cygwin.

The syntax for grep is:

grep [options] PATTERN [FILE...]

Commonly used options are:

-E
This turns on extended regular expressions, where operators such as |, +, ? and () work without backslashes. You can turn this on by using grep -E or egrep; they behave the same way. (Lowercase -e is a different option: it marks the argument that follows as the pattern.) An example that uses egrep appears in the tee section below.
-i
This makes the search case-insensitive.
-v
This means to find every line that doesn't match.
-o
This means to extract only the matching part of the line.
-m [number]
Stop after [number] of matching lines.
-r
Search recursively, descending into all subdirectories.
-n
Add line numbers to show where the match was in the file.

There are many more options than the common ones I've listed above. Type man grep in the terminal for a complete list.
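
Here is a short sketch of several of the options listed above, run against an invented four-line log file. The file name sample.log and its contents are assumptions made for the example.

```shell
# Build a tiny hypothetical log file in a throwaway directory.
workdir=$(mktemp -d)
cat > "$workdir/sample.log" <<'EOF'
GET /index.html Googlebot
GET /logo.png Googlebot
GET /about.html msnbot
GET /index.html googlebot
EOF

# -i ignores case, -n prefixes each match with its line number.
ci_matches=$(grep -in 'googlebot' "$workdir/sample.log")

# -v keeps the lines that do NOT match.
non_google=$(grep -iv 'googlebot' "$workdir/sample.log")

# -o prints only the matched text; -E allows alternation with |.
extensions=$(grep -oE '\.(html|png)' "$workdir/sample.log")

# -m 1 stops after the first matching line.
first_hit=$(grep -m 1 'Googlebot' "$workdir/sample.log")

printf '%s\n' "$ci_matches"
rm -rf "$workdir"
```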

Grep Example

To search a log file for every line that contains Googlebot and write to a file called google_access.log, you could use this:

grep Googlebot access.log > google_access.log

The following example of grep takes a logfile that only has hits from Googlebot and removes all of the requests for .png files.

grep -v '\.png' google_access.log > google_access_no_pngs.log
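
Patterns that contain a backslash are safest inside single quotes. Without the quotes, the shell removes the backslash before grep sees it, and grep then reads the period as "any character". A minimal sketch with made-up request paths shows the difference:

```shell
# Three hypothetical request paths.
paths='/logo.png
/pngs/index.html
/about.html'

# Quoted: grep receives \.png and matches only a literal ".png".
quoted=$(printf '%s\n' "$paths" | grep -v '\.png')

# Unquoted: the shell strips the backslash, grep receives .png,
# and the "/png" in "/pngs/index.html" matches as well.
unquoted=$(printf '%s\n' "$paths" | grep -v \.png)

printf 'quoted:\n%s\nunquoted:\n%s\n' "$quoted" "$unquoted"
```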

Lynx Browser

Lynx Browser

Google's Webmaster Guidelines say,

Use a text browser such as Lynx to examine your site, because most search engine spiders see your site much as Lynx would. If fancy features such as JavaScript, cookies, session IDs, frames, DHTML, or Flash keep you from seeing all of your site in a text browser, then search engine spiders may have trouble crawling your site.

I've already written a Lynx Browser tutorial as well as devoting an entire section of this Web site to Lynx, but here is a quick reference:

Get the text from a Web page as well as a list of links

lynx -dump "http://www.example.com/"

Get the source code from a Web page with Lynx

lynx -source "http://www.example.com/"

Get the response headers with Lynx

lynx -dump -head "http://www.example.com/"

pwd: Finding Your Location in the Filesystem

pwd

If you lose your place in the filesystem and would like to know your current location, type pwd. This is useful when connected to a Unix/Linux server over SSH where the terminal prompt does not give your location in the filesystem.

Sed - A Stream Editor

sed

Sed is a stream editor. It allows you to transform text when piping it through a series of commands. A common use for sed is to substitute characters. The syntax for substitution in sed is s/old/new/g, where new replaces old. The g means to replace every occurrence on the line. If you leave the g off, sed will only replace the first occurrence of old on each line. You can replace g with a number to replace only that occurrence. For example, to replace the 2nd occurrence of old with new on each line you could use s/old/new/2.
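
The three forms of the substitution flag can be compared side by side. This is a small sketch on an invented input line:

```shell
line='old old old'

# No flag: only the first occurrence on the line is replaced.
first=$(printf '%s\n' "$line" | sed 's/old/new/')

# g flag: every occurrence on the line is replaced.
all=$(printf '%s\n' "$line" | sed 's/old/new/g')

# Numeric flag: only the 2nd occurrence on the line is replaced.
second=$(printf '%s\n' "$line" | sed 's/old/new/2')

printf '%s\n' "$first" "$all" "$second"
```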

An example of sed would be to take a file with a list of URLs and remove the query strings from the URLs like this:

cat urls.txt | sed 's/?.*//'

In sed's default basic regular expressions, the question mark is an ordinary character, so it matches a literal question mark without a backslash. (In GNU sed, \? is a "zero or one" operator, and with sed -E it is the other way around: a literal question mark has to be written as \?.) The period matches any character, and the asterisk means "zero or more of the preceding character". When put together it means replace the first question mark on the line and any characters after it with nothing. Because .* runs to the end of the line, the g flag is not needed.
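
To see the effect, here is a sketch run on three invented URLs. It writes the question mark without a backslash, on the assumption that sed is using its default basic regular expressions, where ? is an ordinary character.

```shell
# Hypothetical URLs; the second has no query string at all.
urls='http://www.example.com/page?id=1&sort=asc
http://www.example.com/about
http://www.example.com/search?q=a?b'

# Delete the first question mark on each line and everything after it.
clean=$(printf '%s\n' "$urls" | sed 's/?.*//')

printf '%s\n' "$clean"
```

Lines without a question mark pass through unchanged.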

For an overview of what sed can do, type man sed in a terminal.

sort and uniq

sort

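The sort command sorts lines of text, alphabetically by default. Use the -n option to sort numerically and -r to reverse the order.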

uniq

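The uniq command removes repeated lines, but only when they are next to each other, so its input is usually sorted first. The -c option prefixes each line with the number of times it occurred.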

See the GNU/Linux command line tutorial for detailed instructions. Or, type man sort or man uniq in a terminal.
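
A common combination is to sort, collapse duplicates, and count. The sketch below uses an invented list of user-agent names in place of a real log file:

```shell
# Hypothetical list of user agents, one per line, in arrival order.
agents='msnbot
googlebot
msnbot
slurp
googlebot
msnbot'

# uniq alone only collapses neighbouring duplicates, so nothing is removed here.
unsorted=$(printf '%s\n' "$agents" | uniq)

# Sorting first brings duplicates together so uniq can drop them.
distinct=$(printf '%s\n' "$agents" | sort | uniq)

# uniq -c counts each distinct line; sort -rn puts the largest count first.
top=$(printf '%s\n' "$agents" | sort | uniq -c | sort -rn | head -n 1)

printf '%s\n' "$distinct" "$top"
```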

The cat Command

cat

The cat command concatenates (combines) multiple files into one. It is also used to output the contents of a file.

The following command will combine all files in the current directory with the extension .log and put them in a file called big_log_file.log:

cat *.log > big_log_file.log

The following command will output the contents of a logfile. This is useful for piping the contents into another command:

cat access.log

You can create a new file with cat like this:

cat > myfile.txt

Then hit enter and type in the text of your file. When you are finished, press Ctrl-d on a new line to signal the end of input and exit the cat command.
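
Here is a minimal sketch of concatenation with two invented log files. It writes the result to a .txt file, because an output file ending in .log would itself match *.log the next time the command is run.

```shell
# Two hypothetical log files in a throwaway directory.
workdir=$(mktemp -d)
printf 'first log\n' > "$workdir/a.log"
printf 'second log\n' > "$workdir/b.log"

# The shell expands *.log in sorted order, so a.log comes before b.log.
cat "$workdir"/*.log > "$workdir/combined.txt"

combined=$(cat "$workdir/combined.txt")
printf '%s\n' "$combined"
rm -rf "$workdir"
```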

The diff Command

diff

The diff command finds the difference between files. Type man diff for full details. Diff will be covered in more detail soon.
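
In the meantime, here is a small sketch of the default output. The two three-line files are invented for the example and differ only on line 2.

```shell
# Two hypothetical files that differ on their second line.
workdir=$(mktemp -d)
printf 'a\nb\nc\n' > "$workdir/old.txt"
printf 'a\nB\nc\n' > "$workdir/new.txt"

# diff exits with status 1 when the files differ, which is not an error here.
changes=$(diff "$workdir/old.txt" "$workdir/new.txt" || true)

printf '%s\n' "$changes"
rm -rf "$workdir"
```

In this output format, lines starting with < come from the first file and lines starting with > come from the second.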

The echo Command

echo

The echo command prints text. The following command prints Hello World to the screen:

echo "Hello World"

The following command prints Hello World to a file called myfile.txt:

echo "Hello World" > myfile.txt
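
Note that > replaces whatever the file already contained. A short sketch of the difference between > and >> (append), using a throwaway directory rather than a real file:

```shell
workdir=$(mktemp -d)

# > creates the file, or overwrites it if it already exists.
echo "Hello World" > "$workdir/myfile.txt"

# >> adds to the end of the file instead of replacing it.
echo "Second line" >> "$workdir/myfile.txt"

contents=$(cat "$workdir/myfile.txt")
printf '%s\n' "$contents"
rm -rf "$workdir"
```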

The tee Command

tee

The tee command writes the text that is passed to it to a file, and then passes it to standard output — generally the output will be piped to another command.

Here is an example of the tee command:

cat access.log | grep 'msnbot' | tee msn_access.log | egrep '(jpg|png|gif)' > msn_image_access.log

The above example does the following:

  1. cat is used to output the contents of a logfile.
  2. Then all of the lines containing msnbot are extracted with grep.
  3. The tee command writes the input that it receives to a file called msn_access.log, and then passes the same output to the next command.
  4. Egrep extracts lines from the msnbot lines that contain the text jpg, png, or gif (i.e., image files).
  5. The output is then written to a file called msn_image_access.log.

The result is two new log files — one containing hits from msnbot, and the other containing only msnbot's requests for images on the site.

(NOTE: the above example is not going to be highly precise, but it is simplified so that it doesn't get too confusing.)

You can also use tee to write output to a file and to the screen at the same time like this:

grep 'Googlebot' access.log | tee googlebot_access.log
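
The msnbot pipeline can be tried out on a small scale. The sketch below uses an invented four-line access.log, and grep -E in place of egrep, with a slightly stricter pattern that requires a period before the extension.

```shell
# Hypothetical access log in a throwaway directory.
workdir=$(mktemp -d)
cat > "$workdir/access.log" <<'EOF'
GET /index.html msnbot
GET /logo.png msnbot
GET /photo.jpg Googlebot
GET /banner.gif msnbot
EOF

# tee saves every msnbot line to one file and passes the same lines on,
# so the second grep can keep only the image requests.
grep 'msnbot' "$workdir/access.log" \
  | tee "$workdir/msn_access.log" \
  | grep -E '\.(jpg|png|gif)' > "$workdir/msn_image_access.log"

# Count the lines in each result (tr strips padding some wc versions add).
msn_hits=$(wc -l < "$workdir/msn_access.log" | tr -d ' ')
image_hits=$(wc -l < "$workdir/msn_image_access.log" | tr -d ' ')
images=$(cat "$workdir/msn_image_access.log")

printf 'msnbot hits: %s, image hits: %s\n' "$msn_hits" "$image_hits"
rm -rf "$workdir"
```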

The tr Command

tr

The tr command translates, deletes, or squeezes characters. The following example takes a logfile that has been converted into a CSV file and deletes all dashes. (Many log files put a dash in a column if there is no data available.)

cat access_log.csv | tr -d - > access_log_no_dashes.csv

(You probably wouldn't want to do this if the site has URLs with dashes in them.)
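
The three jobs of tr (translating, deleting, and squeezing) can each be shown in one line. The input strings below are invented for the example:

```shell
# Translate: map every lowercase letter to its uppercase counterpart.
upper=$(printf 'hello\n' | tr 'a-z' 'A-Z')

# Delete: remove every dash from a hypothetical CSV row.
no_dashes=$(printf '1.2.3.4,-,-,200\n' | tr -d '-')

# Squeeze: collapse runs of spaces into a single space.
squeezed=$(printf 'a    b   c\n' | tr -s ' ')

printf '%s\n' "$upper" "$no_dashes" "$squeezed"
```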

The wc Command: Counting Lines in a File

wc

The wc command is useful for counting lines in a file. In the case of log files, it can count hits. The -l option will print out the number of lines.

wc -l access.log

You can also use it on multiple files — in the following case, on all files with a .log extension:

wc -l *.log
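
When the count is needed inside a script, it helps to feed the file to wc on standard input so that only the number is printed, without the file name. A minimal sketch with an invented three-line log:

```shell
# Hypothetical log file with three hits.
workdir=$(mktemp -d)
printf 'hit 1\nhit 2\nhit 3\n' > "$workdir/access.log"

# Reading from standard input makes wc print only the count.
# tr strips the leading spaces that some versions of wc add.
hits=$(wc -l < "$workdir/access.log" | tr -d ' ')

echo "Total hits: $hits"
rm -rf "$workdir"
```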

unzip, tar and File Compression

unzip

Used to decompress files with a .zip extension. Basic usage: unzip [filename].

tar

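tar is used for bundling files into a single archive and for extracting them again. On its own it does not compress anything; with the -z option it also runs the archive through gzip. You might find yourself using it to uncompress archived log files with a tar.gz extension. Type man tar in a terminal for instructions on how to use it.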
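
As a starting point, here is a sketch that creates, lists, and extracts a small tar.gz archive. The directory and file names are invented for the example.

```shell
# Hypothetical logs directory inside a throwaway working directory.
workdir=$(mktemp -d)
mkdir "$workdir/logs" "$workdir/restored"
printf 'hit\n' > "$workdir/logs/access.log"

# -c creates an archive, -z compresses it with gzip, -f names the archive
# file, and -C changes into the given directory before adding "logs".
tar -czf "$workdir/logs.tar.gz" -C "$workdir" logs

# -t lists the contents without extracting anything.
listing=$(tar -tzf "$workdir/logs.tar.gz")

# -x extracts; here -C chooses the directory to extract into.
tar -xzf "$workdir/logs.tar.gz" -C "$workdir/restored"

restored=$(cat "$workdir/restored/logs/access.log")
printf '%s\n' "$listing"
rm -rf "$workdir"
```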

Variables

Variables in the Shell

When a variable is first used you assign it a value with an equals sign like this:

myvariable="Hello"

When you want to access the variable, put a dollar sign in front of it:

echo $myvariable
Hello
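
A few details trip people up when they first use variables. The sketch below shows them; the names greeting and logfile are invented for the example.

```shell
# No spaces are allowed around the equals sign in an assignment.
myvariable="Hello"

# Double quotes keep the value together as one word when it is expanded.
greeting="$myvariable World"

# Braces mark where the variable name ends when other text follows directly.
logfile="${myvariable}_access.log"

echo "$greeting"
echo "$logfile"
```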

There are also built-in variables like:

  • $HOME
  • $PATH
  • $USER or $USERNAME
  • $OSTYPE
  • $LINES
  • $SHELL
  • $COLUMNS

For example, to find your home directory, you can type echo $HOME, or use it in a script. The following line will download the source code from the URL http://www.example.com/ and save it to the Desktop of the current user:

lynx -source "http://www.example.com/" > "$HOME/Desktop/example.html"

whois, nslookup, ping

whois

The whois command will get domain registration information about a Web site. Usage:

whois example.com

nslookup

Used to query nameservers. You can find out a site's IP address with this command.

$ nslookup google.com
Server:         66.94.25.120
Address:        66.94.25.120#53

Non-authoritative answer:
Name:   google.com
Address: 64.233.187.99
Name:   google.com
Address: 64.233.167.99
Name:   google.com
Address: 72.14.207.99

ping

Ping a site to see if you get a response. It will also tell you the remote site's IP address.

$ ping google.com
PING google.com (64.233.187.99) 56(84) bytes of data.
64 bytes from jc-in-f99.google.com (64.233.187.99): icmp_seq=1 ttl=242 time=33.0 ms
64 bytes from jc-in-f99.google.com (64.233.187.99): icmp_seq=2 ttl=242 time=31.8 ms
64 bytes from jc-in-f99.google.com (64.233.187.99): icmp_seq=3 ttl=242 time=44.2 ms
64 bytes from jc-in-f99.google.com (64.233.187.99): icmp_seq=4 ttl=242 time=29.7 ms
64 bytes from jc-in-f99.google.com (64.233.187.99): icmp_seq=5 ttl=242 time=52.0 ms
64 bytes from jc-in-f99.google.com (64.233.187.99): icmp_seq=6 ttl=242 time=46.8 ms
64 bytes from jc-in-f99.google.com (64.233.187.99): icmp_seq=7 ttl=242 time=48.8 ms
64 bytes from jc-in-f99.google.com (64.233.187.99): icmp_seq=8 ttl=242 time=55.3 ms
64 bytes from jc-in-f99.google.com (64.233.187.99): icmp_seq=9 ttl=242 time=43.4 ms