Who's Minding the Store? When Central Monitoring and Automatic Notification are Essential
Companies rely on IT operations to run the background processing needed for departmental output and maintenance. Workload automation tools excel at automating these routine job streams, which reduces errors and shortens processing time. Automation also frees up staff for more complex tasks.
When managing multiple servers and applications, it becomes difficult to track job streams with cross-system dependencies and a variety of schedules. Operators need to know instantly which tasks have completed, which are running late, and which are failing.
Central monitoring and automatic notification are the keys to successful workload automation. Modern workload automation (WLA) tools provide these features in a single package that’s fast, easy to set up, and easy to use. Monitoring and notification also free System Administrators and Operators from having to perform manual checks for job statuses and resources, work overtime, or spend unnecessary time locating and fixing errors.
A full-featured workload automation solution with monitoring and notification is mission-critical for the following IT situations:
- Critical job streams that must start and complete on time without error.
- On-time services that require immediate notification.
- Failed jobs during non-business hours.
Critical Job Streams
Let’s say that your payroll process runs every other Thursday afternoon. The process might include a number of different jobs, such as summarizing the time entry information from each employee, calculating the correct hourly rate and totals, calculating any added personal time accrual, sending information to the IRS, and sending the check to the bank. Any number of these tasks, and more, would be part of the payroll process.
As a Data Center Manager, you are responsible for making sure that each of these steps runs, that they run in the correct order, and that they run on time. It’s inefficient and expensive to have someone such as an Operator or Help Desk Technician monitor the process as it runs.
Computers, on the other hand, work 24/7 and never take a vacation or sick day. WLA tools monitor critical, system-wide processes, so your operators don’t have to.
Your workload automation solution must include monitoring and automatic notification for the following events:
- Server goes down.
- Process terminates unexpectedly.
- Process is delayed.
- Process is looping or running much longer than normal.
- New file arrives in a directory.
- File or directory grows past a set threshold.
- File or directory size or timestamp changes.
- Daemon or service ends unexpectedly.
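The file-related events in this list can be illustrated with a small polling sketch. It is only an illustration, not how any particular WLA product works: the names `FileState`, `snapshot`, and `detect_events` and the event labels are hypothetical, and a real tool would likely use operating system notifications rather than comparing snapshots.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class FileState:
    """Size in bytes and modification time of one file (hypothetical type)."""
    size: int
    mtime: float


def snapshot(directory):
    """Map each regular file name in `directory` to its size and mtime."""
    states = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                info = entry.stat()
                states[entry.name] = FileState(info.st_size, info.st_mtime)
    return states


def detect_events(previous, current, max_bytes):
    """Compare two snapshots and return (event, file name) pairs."""
    events = []
    for name, state in sorted(current.items()):
        old = previous.get(name)
        if old is None:
            events.append(("arrived", name))
        elif state != old:
            events.append(("changed", name))
        # Report the threshold only when the file first crosses it.
        if state.size > max_bytes and (old is None or old.size <= max_bytes):
            events.append(("over_threshold", name))
    return events
```

A scheduler loop would call `snapshot` at a fixed interval and pass the previous and current results to `detect_events`.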
Think of the last time you were called in to the data center because of a delay in processing or because a job stream needed to be rerun. The root cause of the problem was probably one of the above events. Automatic notification is the key to a smoothly running shop.
On-Time Services
It’s imperative that the staff responsible for providing on-time service are notified as soon as an event occurs. For example, if a customer places an order on your website and that process creates a file or adds a record to a file on your web server, your WLA tool can monitor for new files or file changes and execute a process that moves the order to the next step. The sales rep for that account can also be notified automatically.
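The order example amounts to event-driven dispatch: one event triggers both the next processing step and a notification. The sketch below shows the idea under assumptions; `EventRouter`, the event name `order_file_arrived`, and both handlers are hypothetical placeholders rather than part of any product.

```python
class EventRouter:
    """Route named events to the handlers registered for them."""

    def __init__(self):
        self._handlers = {}

    def on(self, event, handler):
        """Register `handler` to be called when `event` is emitted."""
        self._handlers.setdefault(event, []).append(handler)

    def emit(self, event, payload):
        """Call every handler for `event` in registration order."""
        return [handler(payload) for handler in self._handlers.get(event, [])]


def process_order(path):
    # Placeholder for the real next step in the order workflow.
    return f"processing {path}"


def notify_sales_rep(path):
    # Placeholder for an email or text message to the account's sales rep.
    return f"notified rep about {path}"


router = EventRouter()
router.on("order_file_arrived", process_order)
router.on("order_file_arrived", notify_sales_rep)
```

Calling `router.emit("order_file_arrived", "orders/1001.xml")` runs both handlers in the order they were registered.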
WLA tools typically use SMTP to send email or text messages, and SNMP to send traps to, or receive traps from, the central server. Either method of automatic notification can be used at any point in a critical job process. Help Desk software often interfaces with the WLA tool so that a ticket is created automatically when an error occurs.
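As a rough illustration of SMTP notification, the following sketch builds and sends an alert with Python's standard `smtplib` and `email` modules. The function names, the subject format, the addresses, and the relay host and port are all assumptions made for the example.

```python
import smtplib
from email.message import EmailMessage


def build_alert(job, status, detail, sender, recipients):
    """Build an alert email for one job event (subject format is an assumption)."""
    msg = EmailMessage()
    msg["Subject"] = f"[WLA] {job}: {status}"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content(detail)
    return msg


def send_alert(msg, host="localhost", port=25):
    """Deliver the alert through an SMTP relay (host and port are assumptions)."""
    with smtplib.SMTP(host, port, timeout=10) as smtp:
        smtp.send_message(msg)
```

Keeping message construction separate from delivery lets the alert content be checked without a mail server.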
If you have service level agreements (SLAs) with your customers, monitor job start and end times to make sure you meet them every time.
Notification options should include setting a threshold for the length of time a process runs, setting a time that a task must start or end by, as well as notifying on errors.
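These three thresholds can be expressed as a single check. The sketch below is one possible formulation; the function name `check_sla` and the violation labels are hypothetical.

```python
from datetime import datetime, timedelta


def check_sla(now, started_at, ended_at, must_start_by, must_end_by, max_runtime):
    """Return the list of SLA violations for one job as of `now`."""
    violations = []
    if started_at is None:
        # The job has not started yet; it is late once the deadline passes.
        if now > must_start_by:
            violations.append("late_start")
        return violations
    if started_at > must_start_by:
        violations.append("late_start")
    # For a job that is still running, measure up to the current time.
    finished = ended_at if ended_at is not None else now
    if finished - started_at > max_runtime:
        violations.append("overrun")
    if finished > must_end_by:
        violations.append("late_end")
    return violations
```

A monitor would evaluate this periodically for each running or pending job and send a notification when the returned list is not empty.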
You can also monitor for a minimum processing time. If a job that normally takes several minutes finishes in a few seconds, it probably did not do what it was supposed to, even if the exit code reports a normal completion.
Along with automatic notification, you should be able to set up an automatic error recovery process that fixes predictable errors or, at the very least, stops the job stream so that you don’t have to re-run the entire process when an error occurs.
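A minimum-runtime check can be as simple as the sketch below. The function name, the status labels, and the convention that exit code 0 means success are assumptions for illustration.

```python
def classify_completion(exit_code, runtime_seconds, min_seconds):
    """Classify a finished job using its exit code and its runtime."""
    if exit_code != 0:
        return "failed"
    if runtime_seconds < min_seconds:
        # A clean exit that finished too quickly deserves a second look.
        return "suspect"
    return "ok"
```

A job marked `suspect` would trigger the same notification as a failure, so someone can confirm whether it actually processed its input.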
Let’s go back to the example of the payroll process that runs every other Thursday afternoon. If one of the steps fails, you want to make sure that the process stops and that whoever supports the payroll process is notified immediately. Proceeding to the next step would cause major issues with payroll and may require some type of restore, not to mention a re-run of the entire job stream. The failure should also be able to trigger an error recovery process, such as automatically restoring the database files affected by the error.
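The stop, notify, and recover behavior described here can be sketched as a small runner. This is a minimal illustration, assuming each step is a Python callable that raises an exception on failure; `run_job_stream`, `notify`, and `recover` are hypothetical names.

```python
def run_job_stream(steps, notify, recover):
    """Run (name, callable) steps in order and stop at the first failure.

    Returns the names of the steps that completed.
    """
    completed = []
    for name, step in steps:
        try:
            step()
        except Exception as error:
            # Tell support staff, start recovery, and skip the remaining steps.
            notify(f"{name} failed: {error}")
            recover(name)
            break
        completed.append(name)
    return completed
```

Because the runner returns the completed steps, a later restart can resume from the failed step instead of re-running the whole stream.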
Automation of error recovery steps, in addition to notification, should be a requirement in today’s workload automation applications. Examples of these automation functions include:
- Email the error logs to the support staff.
- Open a ticket in your Help Desk or customer application.
Modern customer support applications include some type of communication interface, such as SMTP or SNMP, so that tickets are created at the time of the problem, include error codes and logs, and are assigned to the correct staff member for resolution.
Failed Jobs During Non-Business Hours
Backups, invoicing, and other batch processes are usually scheduled to run during off hours when you don’t have hundreds of users on your systems. If a backup fails because it can’t allocate a file or directory, you want to be notified immediately so the correct action can be taken.
Failed backups may not seem like a huge issue when they occur. The problem comes when you need to restore data from that backup and the files are corrupted. As part of your disaster recovery plan, you need to make sure that those backups are running as scheduled and completing successfully. It’s not always a natural disaster that requires a restore of files; more commonly, some type of human error is the culprit. That’s another reason why automating as much as possible is good practice in all data centers: it lowers the odds of an error occurring.
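One basic way to confirm that backups are running as scheduled is to check that the latest backup file exists, is recent, and is not suspiciously small. The sketch below assumes a single backup file per run; the function name, the labels, and the thresholds are hypothetical, and the check does not prove the contents can be restored.

```python
import os
import time


def backup_problems(path, max_age_seconds, min_bytes, now=None):
    """Return reasons the backup at `path` looks unusable (empty list if none)."""
    now = time.time() if now is None else now
    if not os.path.isfile(path):
        return ["missing"]
    info = os.stat(path)
    problems = []
    if now - info.st_mtime > max_age_seconds:
        problems.append("stale")
    if info.st_size < min_bytes:
        problems.append("too_small")
    return problems
```

A periodic test restore remains the only reliable way to detect a corrupted backup.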
Enter Skybot Scheduler
Skybot Scheduler is the modern workload automation solution that seamlessly integrates business processes through event-driven scheduling across Windows, UNIX, and Linux servers.
More than just a job scheduler, Skybot Scheduler includes central monitoring and notification, robust analysis tools, built-in audit history, and file transfer management in a software package that installs and deploys in minutes.