Counting Outbound Links on a Web Page With Lynx

How to count the number of outbound links on a page with Lynx and GNU

Lynx's -dump option prints the rendered text of a Web page, followed by a numbered list of the links it contains, to the terminal. That output can then be piped into the grep command to extract the URLs or other information.

The following line will count the number of outgoing links on a Web page, including internal and external links:

lynx -dump "http://www.example.com/" | grep -o "http.*" | wc -l

See my other GNU/Linux Lynx tutorial for more details on how lynx and grep can work together to extract links. Here, grep -o prints only the part of each line that matches, and the pattern "http.*" matches both http and https URLs. The wc -l command then counts the lines. Each line is one link, so the line count is the number of links on the Web page.
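To see what the pipeline does without fetching a page, here is a minimal sketch that runs it on a small stand-in for lynx output. The helper names count_links and sample_dump are made up for this illustration, and the sample text is only an assumption about what a typical dump looks like.

```shell
# count_links: read lynx -dump output on stdin and print the number of URLs.
# (Hypothetical helper name; the pipeline inside is the one from this article.)
count_links() {
  grep -o "http.*" | wc -l
}

# sample_dump: made-up stand-in for real lynx -dump output, so this runs offline.
sample_dump() {
  printf '%s\n' \
    'Example page text' \
    '' \
    'References' \
    '' \
    '   1. http://www.example.com/about' \
    '   2. http://www.example.com/contact' \
    '   3. https://www.iana.org/domains/example'
}

# Only the three lines containing a URL survive grep -o, so wc -l reports 3.
total=$(sample_dump | count_links)
echo "Links found: $total"
```

With a real page, the same helper would be used as: lynx -dump "http://www.example.com/" | count_links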

How to count the number of links to external sites on a page

lynx -dump "http://www.example.com/" | grep -o "http.*" | grep -v "http://www.example.com" | wc -l

Using grep with the -v option tells it to give you all of the lines that don't match. In this case it will give you all of the links that don't include the domain name of the current Web page.
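The effect of grep -v can be checked on a few sample lines. This is only a sketch: count_external is a hypothetical helper name, and the three URLs are invented test data rather than output from a real page.

```shell
# count_external: read lynx -dump output on stdin and count the URLs that do
# NOT contain the site address passed as $1 (hypothetical helper name).
count_external() {
  grep -o "http.*" | grep -v "$1" | wc -l
}

# One internal link and two links to other sites (made-up sample data).
external=$(printf '%s\n' \
  '   1. http://www.example.com/about' \
  '   2. https://www.iana.org/domains/example' \
  '   3. http://www.example.org/' \
  | count_external "http://www.example.com")
echo "External links: $external"
```

One thing to watch for: the filter is a plain text match on the address you give it, so a link to https://www.example.com/ would be counted as external when you filter on "http://www.example.com". Filtering on "www.example.com" alone may suit sites that mix http and https.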

How to count the number of internal links on a page

lynx -dump "http://www.example.com/" | grep -o "http://www.example.com" | wc -l

This is the reverse of the previous example. Here grep -o prints each occurrence of the current site's address on its own line, so wc -l only counts URLs that do include the domain name of the current Web page.
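As a quick offline check, the same command can be run on a few sample lines. The helper name count_internal and the sample URLs are assumptions made for this sketch.

```shell
# count_internal: read lynx -dump output on stdin and count occurrences of the
# site address passed as $1 (hypothetical helper name).
count_internal() {
  grep -o "$1" | wc -l
}

# Two internal links and one link to another site (made-up sample data).
internal=$(printf '%s\n' \
  '   1. http://www.example.com/about' \
  '   2. http://www.example.com/contact' \
  '   3. https://www.iana.org/domains/example' \
  | count_internal "http://www.example.com")
echo "Internal links: $internal"
```

Because grep -o prints every match, a URL that contains the site address twice (for example inside a redirect parameter) would be counted twice, and so would any copy of the address that appears in the page text itself.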
