Introduction: How to Monitor Websites for Changes

I have run into the need to be more proactive than reactive when a website goes down.  I've built WordPress websites for many years, and this year I've really gotten into customizing them.  Because I am helping out others, this means using different web hosts.  I have been a customer of 1&1 for 10 years now, and once a website has been set up, I have never had an issue with it going down.  Sadly, this is not true of other hosts, and it doesn't mean squat if the people you are helping make DNS changes and break things.  Nothing like getting a call saying 'our website is down, please fix ASAP!' and realizing it's been down for hours!  Thus was born the website-check bash script.

Last year I wrote a simple bash script that used wget to grab a copy of a web page, compare it to a previous copy of that page, and notify me if there was a difference.  (This was for the job postings page of a company site.)

#!/usr/bin/env bash
# Usage: ./page_check.sh <URL>
# Keep the previous download around so there is something to compare against
if [[ -f new_page.html ]]; then
  mv new_page.html old_page.html
fi
# Grab a fresh copy of the page, quietly
wget "$1" -O new_page.html > /dev/null 2>&1
# diff exits non-zero if the files differ (or if old_page.html is missing on the first run)
diff ./new_page.html ./old_page.html > /dev/null 2>&1
if [[ $? -ne 0 ]]; then
  printf "%s\n" "The file has changed!!!"
  notify-send "The file has changed!!!"
  date
  exit 0
else
  printf "%s\n" "No change"
  notify-send "The file has NOT changed!!!"
  date
  exit 0
fi
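If you save that as something like page_check.sh (the name and paths here are just examples of mine), make it executable, and point it at a page, you can run it by hand or let cron run it on a schedule:

# Run the check by hand against the page you care about
chmod +x page_check.sh
./page_check.sh http://example.com/jobs

# Example crontab entry to run the check every hour (note that notify-send
# may need extra environment setup, such as DISPLAY, when run from cron)
0 * * * * /home/me/page_check.sh http://example.com/jobs >> /home/me/page_check.log 2>&1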


Step 1: Going Deeper Into the Matrix

This time around I wanted to check two things: whether the web server was accessible AND whether the page had changed.  So the first step was to ping the web server.  But why just ping when I can make use of the information I get back from that ping as well?

ping_status=$(ping -c 1 "$link" | grep "packet" | cut -d"%" -f1 | cut -d"," -f3 | cut -d" " -f2)

When you ping, you get back a bunch of information, one piece being 'packet loss'.  Typically, if a web server is up and running, it will respond to your ping.  Ping can be told to check multiple times in a row (or continuously... if this is done fast enough and by enough machines, you have a DDoS attack, but we are just doing a normal ping).  Each run reports, among other things, how many of those pings failed (I'm simplifying here, feel free to read up for more details), and that failure rate is given as a percentage, e.g. 4 pings sent, 4 replies, 0% packet loss.  That is what the grep/cut pipeline above does: it finds the line that mentions 'packet', pulls out the percentage, and stores it.  100% packet loss would be a bad thing.
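To show how that packet-loss value can drive the rest of the script, here is a minimal sketch of the check (the 100% test and the messages are my own illustration; the full script may handle it differently):

# With a single ping (-c 1), packet loss is either 0% or 100%;
# 100% means the one ping we sent got no reply at all
if [[ "$ping_status" == "100" ]]; then
  printf "%s\n" "$link appears to be DOWN (100% packet loss)"
  # ...this is where a notification would go instead of bothering with the page check...
else
  printf "%s\n" "$link responded ($ping_status% packet loss), moving on to the page check"
fi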

Step 2: Does It Blend?

OK, so now we have checked, and ideally verified, that the website is up and running.  Now we need to see if the page has changed.

Using wget you can grab a copy of a web page from any site.  What I do is grab a copy, give it a specific name of my choosing, then compare it to a previous copy.  If they are different, then something needs to be looked into.

wget -o ./website_check/tmp.log "$link" -O "./website_check/$folder/new_page.html"


Here I tell wget to write its details to a log file (-o), fetch the defined $link, and save the grabbed page under a specific name (-O).  I should mention that, to avoid conflicts each time I do this, I first do some filename shuffling:

if [ -f "./website_check/$folder/new_page.html" ]; then
  mv "./website_check/$folder/new_page.html" "./website_check/$folder/old_page.html"
fi

So the copy from the last time I checked gets renamed, and when I grab a new copy there is no conflict.  Then I compare the two for changes.
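A minimal sketch of that comparison step, using the same paths as above (the exact handling in the full script may differ):

# diff exits non-zero when the files differ (or when there is no old copy yet)
diff "./website_check/$folder/old_page.html" "./website_check/$folder/new_page.html" > /dev/null 2>&1
if [[ $? -ne 0 ]]; then
  printf "%s\n" "$link has changed since the last check"
  # ...fire off the notification here...
else
  printf "%s\n" "$link is unchanged"
fi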

Step 3: Now the Geeky Part

Now we can tell whether the site is up and whether the page has changed, but what if you aren't sitting in front of the computer?  I mean, you do have to go out to get groceries and run errands.  Well, luckily I found a way to solve that: enter Notify My Android!  This service lets you push notifications to your Android device.  (I am aware of other programs for other platforms; I only use Linux and Android.)

Basically, after I have gathered all my information about the website I am monitoring, I send it in a notification to my Galaxy Tab 8.9 (any recent Android device will work; check the site for details).

curl https://nma.usk.bz/publicapi/notify --silent --data-ascii "apikey=$APIKEY" --data-ascii "application=$1" --data-ascii "event=curl-event" --data-ascii "description=$2" --data-ascii "priority=1"

Enter curl:

A command line tool for getting or sending files using URL syntax.

To put it simply: you can send information to websites using curl.  In this case, I have an account with Notify My Android, and I send the results of my ping/comparison to that account.  In turn, that information gets pushed to the Notify My Android app running on my tablet.
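Assuming the curl call above lives in its own little helper script (I'll call it nma_notify.sh here, though that name is my own), the main script can hand it a subject and a message like this:

# $1 becomes the 'application' (subject) and $2 the 'description' (message body);
# the text shown is just example output
./nma_notify.sh "website_check" "example.com is DOWN - 100% packet loss"
./nma_notify.sh "website_check" "example.com page changed - see line number in message"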

What is really cool is that you can customize the 'subject' and 'message' to be whatever you want.  In the message body of the push notification, I use yet another piece of information from the page comparison: when I run the cmp command, one of the things it reports is the line number on which the first difference was found.  So I grab that from the output and put it into the message body!
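For reference, here is one way to pull that line number out of cmp's output (a sketch; it assumes cmp's usual message of the form 'old_page.html new_page.html differ: byte N, line M'):

# cmp reports where the first difference is, e.g. "... differ: byte 132, line 7"
cmp_result=$(cmp "./website_check/$folder/old_page.html" "./website_check/$folder/new_page.html" 2>/dev/null)

# Strip everything up to and including "line " to leave just the line number
# (empty if the files are identical)
changed_line=$(printf "%s" "$cmp_result" | sed 's/.*line //')

message="$link page changed - first difference on line $changed_line"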

Now we have a fully functional script that tells you if the server is down and, if it's up, tells you if the page has changed (i.e. maybe hacked? broken?), then sends you the notification wherever you are.  Pretty cool, huh?

Sure, there are commercial products that can do the same, but this was all done with nothing but an OS and a free account on a website.  Commercial-like results from free stuff!

There are some limitations, and I am working on ways to get around them.  For example, if a website runs some sort of updating script (e.g. a Twitter feed or a live stock market index feed), then the page will always show up as changed whenever those feeds update.  If you are interested in following this development, feel free to head on over and subscribe to my geeky blog at http://cosmopolitangeek.wordpress.com/.  Full versions of the script are available here: https://app.box.com/s/l2emmq9q9nhayvci25r5