Service Monitor Script for Linux Servers : 4 Steps

Keeping a system stable and always running can be a difficult task, even on Linux.

Due to the complexity of modern software packages and bad coding, some processes will inevitably crash from time to time. This can be a problem if you are running a server and people rely on these services.

Step 1: Using Methods Provided by Systemd

As you might already know, most modern Linux operating systems use systemd.

If you're not familiar with systemd, it is, according to Wikipedia:

"... an init system used in Linux distributions to bootstrap the user space and manage all processes subsequently, instead of the UNIX System V or Berkeley Software Distribution (BSD) init systems. ..."

Many people still argue about whether it was necessary to replace the good old init system with this more complicated process manager, but a good explanation can be found at the following link:

https://www.tecmint.com/systemd-replaces-init-in-l...

The most important improvement is that it can bring up the system faster than init, because it starts services concurrently at boot instead of following init's sequential approach.

Without going deep into systemd: to have systemd manage a process, you must create a service file. The syntax of such a file can range from very simple to very complicated, and we won't go into details. For a basic .service file, the following entries are sufficient:

[Unit]
Description=Description of application
Documentation=http://wikipedia.org/
After=local-fs.target network.target

[Service]
Type=simple
ExecStart=/usr/sbin/application
ExecReload=/usr/sbin/application reload
ExecStop=/usr/sbin/application stop
Restart=always

[Install]
WantedBy=multi-user.target

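Place these lines in a file named application.service in the /lib/systemd/system folder. (Unit files you write yourself are conventionally kept in /etc/systemd/system, which takes precedence over /lib/systemd/system.) After adding or editing the file, run sudo systemctl daemon-reload so that systemd picks up the change.

What each of these options does is explained at the following link: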
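As a rough sketch, the unit file could be written from a shell like this. The names write_unit and UNIT_DIR are made up for illustration, and by default the file lands in a scratch directory so the snippet can be tried without root.

```shell
#!/usr/bin/env bash
# Sketch only: write the example unit file into a target directory.
# UNIT_DIR and write_unit are illustrative names, not part of systemd.
UNIT_DIR="${UNIT_DIR:-$(mktemp -d)}"

write_unit() {
    # $1 = directory that receives application.service
    cat > "$1/application.service" <<'EOF'
[Unit]
Description=Description of application
Documentation=http://wikipedia.org/
After=local-fs.target network.target

[Service]
Type=simple
ExecStart=/usr/sbin/application
ExecReload=/usr/sbin/application reload
ExecStop=/usr/sbin/application stop
Restart=always

[Install]
WantedBy=multi-user.target
EOF
}

write_unit "$UNIT_DIR"
echo "wrote $UNIT_DIR/application.service"

# On a real system (as root), copy the file into the unit directory
# and make systemd re-read its unit files:
#   cp "$UNIT_DIR/application.service" /lib/systemd/system/
#   systemctl daemon-reload
```

Writing to a scratch directory first makes it easy to review the file before it is installed system-wide.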

https://access.redhat.com/documentation/en-US/Red_...

In order to start your application, issue the following command:

sudo systemctl start application.service

Note: the .service extension can be omitted.

To stop the application:

sudo systemctl stop application.service

If the configuration file has been changed and you would like to reload the settings:

sudo systemctl reload application.service

To restart the application:

sudo systemctl restart application.service

To enable automatic starting at boot:

sudo systemctl enable application.service

If this is enabled, systemd will start the application at boot according to the settings in the service file.

To disable it, use the same command as above, but with 'disable' instead of 'enable'.

If you place Restart=always in the service file, systemd will monitor the process and automatically try to restart it whenever it exits, whether it stopped cleanly or crashed.
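To see at a glance whether a unit is running and whether it will start at boot, the two systemctl queries is-active and is-enabled can be wrapped in a small helper. This is only a sketch: service_summary and the SYSTEMCTL override variable are illustrative names.

```shell
#!/usr/bin/env bash
# Sketch only: summarize whether a unit is running and enabled at boot.
# SYSTEMCTL can be pointed at a stub for testing; it defaults to systemctl.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

service_summary() {
    # $1 = unit name, with or without the .service suffix
    local active enabled
    active="$("$SYSTEMCTL" is-active "$1" 2>/dev/null)"
    enabled="$("$SYSTEMCTL" is-enabled "$1" 2>/dev/null)"
    echo "$1: active=${active:-unknown} enabled=${enabled:-unknown}"
}

# Usage:
#   service_summary application
```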

If you place

RestartSec=30

after the Restart directive, systemd will wait 30 seconds before trying to restart the process. This can be useful, because continuous restart attempts of a failing service/application can put a high load on the system (writing error logs, etc.).

As you can see, systemd already provides some means to monitor processes. However, in some cases this might not be sufficient. What if a process does not exit (so it is still in the process list) but stops responding? To make sure that such a process is really up and running, additional checks are needed.

This is where the scripts from this Instructable come in handy.
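To confirm which restart settings systemd has actually loaded for a unit, systemctl show can be asked for just the relevant properties. The helper below is a sketch with an illustrative name (show_restart_policy); note that systemd reports the RestartSec value under the property name RestartUSec.

```shell
#!/usr/bin/env bash
# Sketch only: print the restart policy systemd has loaded for a unit.
# SYSTEMCTL can be pointed at a stub for testing; it defaults to systemctl.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

show_restart_policy() {
    # $1 = unit name. RestartUSec is how systemd reports RestartSec.
    "$SYSTEMCTL" show "$1" --property=Restart,RestartUSec
}

# Usage:
#   show_restart_policy application.service
```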

Step 2: Configuring and Using the Service Checker Scripts

If you need more control over your running processes/services, these scripts will be helpful.

As the code is fairly long, it has been uploaded to GitHub and can be found in the following repository:

https://github.com/trex2000/Service-Monitor-Scripts/blob/master/checkService.sh

The heart of the whole package is the following script:

checkService.sh

Before using it, you must set the full path to the service folder, which is defined at the beginning of the script.

The script can monitor several processes and perform additional tasks, as described below:

It goes through each file in the /services subfolder that has a .serv or .check extension and checks whether there is an active process with the same name as the file (for application.serv, a process called 'application').

If an application has only an application.serv file and no .check file:

If the process is active, the script treats the service as running.

If the process is inactive and the .serv file is empty, the script restarts the service by issuing the following command:

systemctl restart application

If the process is inactive and the .serv file is not empty and has executable rights, the script tries to run the .serv file as a plain Bash script.

This is useful if something more has to be done than just restarting the service.
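The decision logic described above, for the case without a .check file, can be sketched roughly as follows. This is not the code from the repository: SERVICE_DIR, IS_RUNNING, handle_service and check_all are illustrative names, and pgrep -x is just one assumed way of testing for a process by name.

```shell
#!/usr/bin/env bash
# Sketch of the .serv handling described above; NOT the repository code.
# SERVICE_DIR, IS_RUNNING and SYSTEMCTL are illustrative and overridable.
SERVICE_DIR="${SERVICE_DIR:-/path/to/services}"
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

# Default liveness test: is there a process with exactly this name?
is_running() { pgrep -x "$1" >/dev/null 2>&1; }
IS_RUNNING="${IS_RUNNING:-is_running}"

handle_service() {
    # $1 = path to a .serv file, e.g. /path/to/services/application.serv
    local serv_file="$1"
    local name
    name="$(basename "$serv_file" .serv)"

    if "$IS_RUNNING" "$name"; then
        return 0                        # process found: treat as running
    fi

    if [ ! -s "$serv_file" ]; then
        "$SYSTEMCTL" restart "$name"    # empty .serv: plain restart
    elif [ -x "$serv_file" ]; then
        bash "$serv_file"               # non-empty, executable: run it
    fi
}

check_all() {
    local f
    for f in "$SERVICE_DIR"/*.serv; do
        [ -e "$f" ] || continue         # no .serv files present
        handle_service "$f"
    done
}

# Usage (for example as the last line of a script run from cron):
#   check_all
```

Keeping the liveness test and the systemctl command behind variables makes the logic easy to try out against a scratch directory before pointing it at real services.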

For example, in the spamd.serv file, from the repo above, in case the spamd service is dead, the spamassassin service needs to be restarted instead, which will also restart spamd. Restarting just spamd would not be sufficient.

You can edit the content of such a .serv file according to your needs.

Another example is the pcscd.serv file. In this case, several other processes are also restarted or killed.

If there is a .check file, then after checking whether the process is running, the script also runs the .check file to perform additional checks.

For example, for the oscam service, we've created a .check file which tries to connect to its web interface to see whether the connection succeeds. If it does not, then the service is not responding even though the process is active, and it needs to be restarted. The restart of the service must be performed by the .check file itself.
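A .check script of this kind could look roughly like the sketch below. It is not the oscam.check file from the repository: the URL is a placeholder for your own host and port, the function names are illustrative, and curl is an assumed choice of HTTP client.

```shell
#!/usr/bin/env bash
# Sketch of a .check-style script; NOT the oscam.check from the repository.
# CHECK_URL is a placeholder: substitute the address of your web interface.
CHECK_URL="${CHECK_URL:-http://127.0.0.1:8888/}"
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

web_ui_responds() {
    # Succeeds only if the server answers with a non-error HTTP status
    # within 10 seconds.
    curl --silent --fail --max-time 10 --output /dev/null "$1"
}

check_oscam() {
    if web_ui_responds "$CHECK_URL"; then
        return 0                    # interface answered: nothing to do
    fi
    # The process may be alive but hung, so the check restarts it itself.
    "$SYSTEMCTL" restart oscam
}

# Usage (as the last line of the .check file):
#   check_oscam
```

Because curl --fail treats HTTP error statuses as failures, a password-protected interface that answers with 401 would count as not responding here; in that case the probe needs credentials or a different test.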

Another example would be the mediatomb DLNA service.

This is a small server which provides video/audio content to DLNA clients and announces itself on the network. Sometimes the service hangs and is no longer discoverable, although the process is still active. To check whether the service is discoverable, the CLI utility gssdp-discover is used. All the code that checks the DLNA server is placed inside a mediatomb.check script.

These are just a few examples of how you can use the .serv and .check files.

To monitor a new service, you must create a .serv file and, if needed, a .check file, and write the corresponding script inside them.

If checking for the presence of the process is enough, an empty .serv file is sufficient. If additional checks must be performed, a .check file must be created with a small script that does the job.

Of course, the .sh script has to be run periodically, so a cron job must also be created for it:
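As a minimal sketch, registering a hypothetical service called myapp could look like this. SERVICE_DIR stands for the services folder configured in checkService.sh (a scratch directory is used by default here), and marking the .check file executable is an assumption, made by analogy with the .serv scripts.

```shell
#!/usr/bin/env bash
# Sketch only: register a hypothetical service "myapp" with the checker.
# SERVICE_DIR should point at the services folder used by checkService.sh;
# it defaults to a scratch directory so the snippet is safe to try.
SERVICE_DIR="${SERVICE_DIR:-$(mktemp -d)}"

# Empty .serv file: process-presence check with a plain restart.
: > "$SERVICE_DIR/myapp.serv"

# Optional .check file: extra health test that does its own restart.
cat > "$SERVICE_DIR/myapp.check" <<'EOF'
#!/bin/bash
# Placeholder: probe myapp here and restart it if the probe fails.
exit 0
EOF

# Assumption: the .check file is run like the .serv scripts,
# so it is marked executable as well.
chmod +x "$SERVICE_DIR/myapp.check"

echo "registered myapp in $SERVICE_DIR"
```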

#check running services every 5 minutes
*/5 * * * * /var/bin/ServiceCheck/checkService.sh >/dev/null

Step 3: Final Thoughts

I hope you will find this package useful, as it can greatly simplify the monitoring of Linux processes and will hopefully minimize the downtime of your services.

Feel free to upload additional scripts to GitHub if you create new ones. Just let me know and I will add you as a contributor.