rtexec - real time data acquisition system executive
rtexec [-cdfknstvx] [-q minutes] [-r minutes]
[-u user] [-w whyfile] [comment]
rtexec starts up, shuts down, and monitors the operation of the
real time data acquisition system.
Typically, it performs the following operations:
-
Reads its parameter file, getting the set up information
-
Sets up the environment for the tasks, clearing
the old environment entirely, and putting back only the
environment variables specified in the parameter file.
-
Sets up process resource limits according to the specs
in the parameter file.
-
Runs any specified startup tasks.
-
Starts the specified tasks.
-
Loads crontab entries if necessary.
-
Monitors the running tasks and the rtexec.pf parameter file.
-
If a SIGTERM signal is received, rtexec attempts
an orderly shutdown. It sends a SIGTERM to each process
and its children. It then waits some period, and
sends a SIGKILL ( -9 ) to any process still running.
It then runs any specified shutdown tasks.
If a task dies, it is restarted automatically.
If the rtexec.pf parameter file changes, it is reread.
If any task has been added or deleted, or its command
line changed, it is started, restarted, or killed as appropriate.
Any crontab changes are incorporated.
If the Defines, Env, or Limit arrays in rtexec.pf change,
all the tasks are restarted.
Any optional comment on the command line is recorded in the log.
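The reread behavior described above amounts to comparing the old and new task tables. The following Python sketch illustrates that comparison only; it is not rtexec's implementation. The function name diff_tasks and the representation of the task table as a dict (task name mapped to execution line) are assumptions made for illustration.

```python
def diff_tasks(old, new):
    """Compare two {task_name: execution_line} mappings.

    Returns (to_start, to_restart, to_kill) as sorted lists of task names.
    Hypothetical helper: rtexec's real bookkeeping is not described here.
    """
    # Tasks that appear only in the new table are started.
    to_start = sorted(name for name in new if name not in old)
    # Tasks that disappeared from the table are killed.
    to_kill = sorted(name for name in old if name not in new)
    # Tasks whose execution line changed are restarted.
    to_restart = sorted(
        name for name in new if name in old and old[name] != new[name]
    )
    return to_start, to_restart, to_kill
```

A change to the Defines, Env, or Limit arrays is a separate case: every task is restarted, regardless of what this comparison returns.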
-
-c
clear logs
-
-d
debug mode -- very verbose,
and rtexec does not turn itself into a daemon
-
-f
Don't ask about killing a previous rtexec, just do it if
necessary.
-
-k
kill running rtexec
-
-n
show what tasks would be run, but don't run anything else
-
-q minutes
Run for the specified number of minutes, then shut down:
primarily for testing.
-
-r minutes
Run for the specified number of minutes, then shut down:
repeat indefinitely. For testing purposes only.
-
-s
save old logs in a new subdirectory of logs named after the current time.
The old logs are compressed using compress(1).
-
-t
run tail -f logs/rtexec after changing to a daemon
-
-u user
specify user requesting shutdown (in conjunction with -k)
-
-v
more verbose
-
-w whyfile
Specify a file containing the explanation for a shutdown (in conjunction with -k)
-
-x
use xuserauth(1) to verify authorization, rather than userauth(1)
rtexec typically runs in a directory with a particular organization;
the typical directories are described below.
This directory and file structure is only the default organization.
The directories can be rearranged as desired by modifying the appropriate
parameter files.
-
pf
This directory contains the parameter files for the tasks run
by rtexec.
-
logs
This directory contains logs of error messages from rtexec and
the tasks it runs.
-
state
Some tasks save a state file in this directory
when they terminate; this state file allows them to restart at
the same location in the ring buffer.
-
orb
This directory contains the orb buffer files used by the orbserver(1).
-
archive
orb2db(1) writes waveform files and the real time wfdisc table in this
directory. orb2dbt(1) saves database records for the origin, arrival,
assoc and event tables in this directory.
-
dbmaster
The master lastid table, used by the real time database in archive and all
other data collection databases, is kept here. In addition,
static database tables like site, sitechan, sensor, and instrument are kept
in this directory.
-
rtsys database
rtexec saves some state information into a database, by default named
rtsys/rtsys. rtsys keeps track of the tasks currently running.
-
logs/rtexec.pid
The pid of the running rtexec is saved into this file, for
convenience when restarting the system. This file is also used as a
lock file, to prevent multiple rtexec processes from running at once.
Finally, the permissions on this file provide state information
(stopped, running, starting, stopping) to rtm(1).
-
state/.pid
This file is a symbolic link to the file (directory) in /proc for the
rtexec process. This is for the convenience of rtm(1), so that
it can quickly determine if rtexec is still running.
-
state/cron
The modification time on this file records the last time the rtexec
cron job queue was run.
The environment for rtexec and tasks rtexec runs is specified in
the parameter file.
The execution of the real time system is largely dictated
by rtexec's parameter file.
-
Defines
This array defines values for some set of symbolic names, usually including
ROOT, ANTELOPE, and ORB. The symbolic names can then be used in the environment
and the execution (command) lines in the form $name, and the corresponding value
from the Defines array is substituted. As described for the Env array
below, you may also use PRESERVE as a keyword here; then the corresponding
environment variable is used.
-
Env
This is an array of environment variables and the corresponding values
to which they should be set.
This specifies the environment for rtexec and tasks rtexec runs.
All other values (not listed in the Env array) are cleared from the
environment. This ensures a consistent and well defined environment for
running tasks.
You may use the keyword PRESERVE to use the original
environment value when rtexec was run. Keep in mind, however, that on a
running system, rtexec may be started at boot time out of
a script in /etc/rc3.d, so that some environment variables like DISPLAY
are not set, or may be set differently than you expect.
For a case where a variable may not be set (eg, DISPLAY), you can specify a
default value with a line like:
DISPLAY PRESERVE || :0
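The way Env entries are resolved can be sketched in Python as follows. This is an illustration, not rtexec's code: the function name build_environment is hypothetical, and the treatment of a PRESERVE entry that has no default and no original value (here, the variable is simply left unset) is an assumption.

```python
import re

# Matches "PRESERVE" alone or "PRESERVE || default".
_PRESERVE = re.compile(r"^PRESERVE\s*(?:\|\|\s*(.*))?$")


def build_environment(env_spec, original_env):
    """Build a fresh environment from an Env-style mapping.

    env_spec maps variable names to a literal value, "PRESERVE",
    or "PRESERVE || default". Variables not named in env_spec are
    dropped, mirroring the clearing of the old environment.
    """
    result = {}
    for name, spec in env_spec.items():
        match = _PRESERVE.match(spec.strip())
        if match is None:
            # Ordinary entry: use the value as written.
            result[name] = spec.strip()
            continue
        default = match.group(1)  # None when no "|| default" was given
        value = original_env.get(name, default)
        if value is not None:  # assumption: otherwise leave it unset
            result[name] = value
    return result
```

With this sketch, an entry of "PRESERVE || :0" for DISPLAY yields the original DISPLAY when one was set, and :0 otherwise.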
-
Limit
This array allows setting the process limits for tasks run by
rtexec. It's primarily to unlimit coredumpsize, so that failures
due to segmentation faults and bus errors
leave a core dump, and to unlimit descriptors, so that orb2db
can have many open files. However, if stacksize is set too large,
it can limit the orbserver buffer size.
-
Database
This is the name of the database in which rtexec keeps some
statistics on running tasks.
-
Processes
This is a list of tasks which may be run by rtexec;
they are started in the same order as they appear in this list.
Each item in the list has a name,
followed by the execution line. The execution
line is interpreted by a shell, so special shell characters
must be escaped or quoted to pass them on to the program.
-
Run
This is an array of flags indexed by task name. The corresponding
task from the Processes list is run only if the flag in Run
is non-zero.
-
startup_shutdown_email
When the system is stopped or started, mail is sent to these email addresses.
-
status_email
These addresses receive email when
1) rtexec declares a task failure as
described below under Failure_threshold and gives up on restarting a
task.
2) a task fails with a segmentation fault, bus error or other
hardware failure.
3) some limit on resources is exceeded; see the section below on Resource Limits.
-
Startup_tasks
These are execution lines for one-shot executions when rtexec is
started.
-
Shutdown_tasks
These are execution lines for one-shot executions as rtexec is shutting
down.
-
Shutdown_order
During shutdown, kill signals are sent to (running) tasks
in the order named
in this list. Note that the task name need not be related to the
program run. The names in Shutdown_order are either task names
(like orb2orb_from_UCSD), or
program names (like orb2orb). Each line can name multiple tasks, which
are killed concurrently during a shutdown.
All tasks which match entries on a particular line have either died
or been sent kill -9 signals before any tasks from a later line
are sent signals.
It's ok to have tasks listed which are not running.
Processes not listed in the Shutdown_order table are the last to be sent signals;
orbserver is often the last task to be killed.
Usually, the correct shutdown order is:
1) orb readers (like orb2db, orbassoc)
2) orb writers (like q3302orb, orb2orb, orbdetect, etc)
3) other programs
4) orbserver, diskserver
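The grouping of tasks into shutdown stages can be sketched in Python as below. This is a minimal illustration under stated assumptions, not rtexec's code: the function name shutdown_stages is hypothetical, as is the representation of the running tasks as a dict from task name to program name.

```python
def shutdown_stages(shutdown_order, running):
    """Group running tasks into successive shutdown stages.

    shutdown_order: list of lines, each a list of task or program names.
    running: {task_name: program_name} for tasks currently running.
    Returns a list of stages; each stage is a sorted list of task names
    signalled together. Tasks not listed anywhere form the last stage.
    """
    remaining = dict(running)
    stages = []
    for line in shutdown_order:
        wanted = set(line)
        # A line entry may match either the task name or the program name.
        stage = sorted(
            task for task, program in remaining.items()
            if task in wanted or program in wanted
        )
        for task in stage:
            del remaining[task]
        if stage:  # listed names that are not running are simply skipped
            stages.append(stage)
    if remaining:
        stages.append(sorted(remaining))
    return stages
```

Each stage would be signalled and allowed to die (or be sent kill -9) before the next stage is touched.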
-
Shutdown_when_task_dies
rtexec can be made to run a collection of programs until
one completes; when that task dies, rtexec waits a specified
number of seconds, and then shuts down the system.
This array names tasks which terminate rtexec when they die;
the corresponding value is the number of seconds
rtexec waits before shutting down.
-
Start_period
When first starting tasks, rtexec waits this specified number
of seconds before starting the next task. For tasks which are
dependent on one another (eg, orb2db is dependent on orbserver),
this allows time for the earlier task to get into execution.
-
Minimum_period_between_starts
To minimize problems with multiple restarts of tasks, rtexec enforces
a minimum time between restarts.
-
Pf_restart
Each line in this list begins with a task name, followed by parameter
files. When the modification time on one of these parameter files
is later than the start time of the corresponding task, rtexec restarts that
task.
Only parameter files in the subdirectory pf are inspected, not
parameter files elsewhere.
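The modification-time test can be sketched in Python as follows. This is only an illustration: the function name tasks_needing_restart is hypothetical, and it is an assumption that each Pf_restart entry is a file name relative to the pf subdirectory.

```python
import os


def tasks_needing_restart(pf_restart, start_times, pf_dir="pf"):
    """Return the tasks whose parameter files changed after they started.

    pf_restart: {task_name: [parameter file names]}
    start_times: {task_name: start time in epoch seconds}
    Only files under pf_dir are inspected.
    """
    stale = []
    for task, pf_names in pf_restart.items():
        started = start_times.get(task)
        if started is None:
            continue  # task is not running; nothing to restart
        for pf_name in pf_names:
            path = os.path.join(pf_dir, pf_name)
            try:
                modified = os.path.getmtime(path)
            except OSError:
                continue  # missing file: ignored in this sketch
            if modified > started:
                stale.append(task)
                break
    return stale
```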
-
Failure_threshold
When a task repeatedly dies in times less than the threshold,
the period between restarts is repeatedly doubled. This kind of
failure usually indicates a problem which requires
intervention by an operator, though there might be exceptions.
Backing off the period between restarts might allow some problems
to resolve themselves. Ultimately, though, the restart of chronically
failing tasks needs to be curtailed, to avoid overflowing logs, and
avoid burying the useful information in a sea of restart messages.
-
Failure_repetitions
-
Failure_retry_period
After the number of restart attempts specified by Failure_repetitions,
rtexec waits for Failure_retry_period before again attempting a restart.
In addition, mail is sent to anyone specified in
the startup_shutdown_email parameter.
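The interaction of Failure_threshold, Failure_repetitions, and Failure_retry_period can be sketched in Python as below. The exact arithmetic rtexec uses is not specified here, so this is one plausible reading rather than the implementation: the function names are hypothetical, and it is an assumption that the doubling starts from Minimum_period_between_starts.

```python
def count_quick_failures(lifetimes, threshold):
    """Count the most recent consecutive runs shorter than threshold.

    lifetimes: run durations in seconds, oldest first.
    threshold: Failure_threshold in seconds.
    """
    count = 0
    for lifetime in reversed(lifetimes):
        if lifetime >= threshold:
            break
        count += 1
    return count


def restart_delay(quick_failures, base_period, repetitions, retry_period):
    """Return the number of seconds to wait before the next restart.

    base_period: starting delay (assumed: Minimum_period_between_starts).
    repetitions: Failure_repetitions.
    retry_period: Failure_retry_period in seconds.
    """
    if quick_failures >= repetitions:
        # Give up on rapid restarts; wait the long retry period.
        return retry_period
    # Double the delay once for each consecutive quick failure.
    return base_period * (2 ** quick_failures)
```

Under these assumptions, a task that keeps dying quickly is retried at doubling intervals until Failure_repetitions is reached, after which only the longer Failure_retry_period applies.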
-
Time_to_die
When a Processes execution line changes, or one of the Run array values
is turned off, rtexec signals the corresponding task to stop, using
a SIGTERM signal. It then waits up to Time_to_die seconds for the task
to quit, before sending a kill -9 (SIGKILL) signal.
-
crontab
It may be desirable to run certain jobs on a regular basis, but not
continuously. This table is a list of jobs which are run by rtexec.
The format of each line is similar to crontab(1), but with
two additional, leading parameters: a job name, and a timezone code.
The log file for a cron job is named "cron:job_name".
rtexec will not start a new cron job while the previous
execution of the same name is still running.
The timezone code may be either UTC or LOCAL, and indicates whether
the following parameters specify a UTC time or a local timezone time.
crontab(1) jobs always use local time, but in a real time system,
it's often more convenient
to specify a UTC time.
See crontab(1) for a description of the remainder of the parameters.
These crontab-like jobs also differ from typical crontab lines in that they
are run by rtexec, with the
rtexec process environment, and from the same directory.
This may simplify the
problem of getting a crontab entry to work properly, as missing environment
variables are a frequent problem in jobs run from the system crontab.
Incidentally, an easy way to test rtexec crontab jobs from the command
line is to use the rtrun(1) command, eg:
rtrun 'my-crontab-job arg1 arg2 ..'
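The line layout described above (job name, timezone code, then the usual five crontab(1) time fields and the command) can be illustrated with a small Python parser. This is a sketch, not rtexec's parser: the function name parse_cron_line and the dict it returns are hypothetical, and the sample line in the usage note is invented for illustration.

```python
def parse_cron_line(line):
    """Split one rtexec crontab table line into its parts.

    Expected layout (whitespace separated):
        job_name  UTC|LOCAL  minute hour day-of-month month day-of-week  command...
    """
    fields = line.strip().split(None, 7)
    if len(fields) < 8:
        raise ValueError("expected name, timezone, 5 time fields and a command")
    name, timezone = fields[0], fields[1]
    if timezone not in ("UTC", "LOCAL"):
        raise ValueError("timezone code must be UTC or LOCAL")
    time_names = ("minute", "hour", "day_of_month", "month", "day_of_week")
    return {
        "name": name,
        "timezone": timezone,
        "schedule": dict(zip(time_names, fields[2:7])),
        "command": fields[7],
        # The log file for a cron job is named "cron:job_name".
        "log_name": "cron:" + name,
    }
```

For example, parse_cron_line("cleanup UTC 5 0 * * * my-crontab-job arg1 arg2") describes a job named cleanup that runs at 00:05 UTC each day.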
-
max_cron_gap
If rtexec is down
for less time than max_cron_gap, any jobs which would have run during
the down time are run when rtexec is restarted. If rtexec is down for longer,
then only cron jobs scheduled after the restart are run.
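The catch-up rule can be sketched in Python as follows. This is an illustration under assumptions, not rtexec's logic: the function name is hypothetical, all times (including max_cron_gap) are taken to be in seconds, the last cron run is taken from something like the modification time of state/cron, and a down time exactly equal to max_cron_gap is treated as too long.

```python
def jobs_to_run_at_restart(scheduled, last_cron_run, restart_time, max_cron_gap):
    """Return the names of missed jobs to run when rtexec restarts.

    scheduled: list of (epoch_seconds, job_name) for scheduled runs.
    last_cron_run: epoch seconds of the last cron queue run.
    restart_time: epoch seconds at which rtexec came back up.
    """
    downtime = restart_time - last_cron_run
    if downtime >= max_cron_gap:
        # Down too long: skip the backlog, run only future jobs.
        return []
    # Otherwise run everything that fell inside the gap, oldest first.
    return [
        job for when, job in sorted(scheduled)
        if last_cron_run < when <= restart_time
    ]
```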
-
Use_UTC
This parameter is now defunct, and relates to an earlier implementation
of rtexec. It should be removed from the parameter file.
-
email_incident_reports
When a program dies due to a segmentation violation
or bus error, an incident report is generated by rtincident(1).
A copy of this report is sent via email to the addresses specified in this list.
-
include_corefile
When this is "yes", any corefile generated is
sent via email also. Caution: corefiles are often too
large to send via email.
-
disks
This is a list of disk partitions also used by rtm(1).
Each line consists of a partition name,
a directory or file on that partition, the minimum number of Mbytes which should
be available on that partition, the minimum number of kilo inodes which
should be available, and finally a short description of the partition uses.
rtm(1) displays a bar for each distinct filesystem in this list; rtexec
checks each distinct filesystem regularly, and sends email
(to the startup_shutdown_email recipients) if the
free space on the disk falls below the specified minimum.
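The fields of a disks line and the corresponding check can be sketched in Python as below. This is an illustration only: the function names are hypothetical, the sample line in the usage note is invented, and checking free inodes in the same way as free space is an assumption. The measure function relies on os.statvfs, which is available on Unix systems.

```python
import os


def parse_disk_line(line):
    """Split a disks line: name, path, min Mbytes, min kilo inodes, description."""
    name, path, min_mbytes, min_kinodes, description = line.split(None, 4)
    return {
        "name": name,
        "path": path,
        "min_mbytes": float(min_mbytes),
        "min_kinodes": float(min_kinodes),
        "description": description.strip(),
    }


def measure(path):
    """Return (free Mbytes, free kilo inodes) for the filesystem holding path."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize / 2**20, st.f_favail / 1000.0


def shortfalls(entry, free_mbytes, free_kinodes):
    """List which minimums from a parsed disks entry are not met."""
    problems = []
    if free_mbytes < entry["min_mbytes"]:
        problems.append("free space")
    if free_kinodes < entry["min_kinodes"]:
        problems.append("free inodes")
    return problems
```

A caller might parse a line such as "archive /data/archive 5000 10 waveform files", pass the path to measure, and report whatever shortfalls returns.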
-
Min_vmfree
rtexec regularly checks available memory/swap space. If the available memory
falls below Min_vmfree Mbytes, rtexec attempts to send email to the recipients
specified in startup_shutdown_email.
-
Chatter_limit
Messages about resource shortfalls are sent no more often than
this limit allows, by default once every 7200 seconds (2 hours).
Some parameters are kept in rtexec.pf only because it's a convenient
central location, not because they're used by rtexec. Some parameters
are shared by rtexec and rtm. Following are the rtm parameters;
please refer to rtm(1) for a description:
-
Processes
-
Run
-
Parameter_files
-
disks
-
Buttons
-
Edit_files
-
orbtasks
-
title
In addition, the network code used by rtreport(1) and rtsys(1)
can be specified in rtexec.pf; the default is to use the net code from
the first entry in the network table.
# start a real time system, first saving the old log files
# to a sub-directory.
% rtexec -ts
# kill a running real time system
% rtexec -k
rtexec is generally pretty verbose, logging most changes to the
file logs/rtexec. Fatal errors include the following:
-
No rtlog directory
rtexec normally saves messages to various files in the logs directory;
it quits immediately if this directory does not exist.
-
no rtexec.pf
rtexec reads all of its startup and configuration information from
a parameter file, which it expects to find in the local directory.
-
rename $i to $new failed
-
failed to remove $i
During startup, rtexec has the option of removing files in the logs directory,
or moving them to a new subdirectory. If either is requested and the
deletion or move fails, rtexec quits.
-
Can't fork
Either memory is in short supply or the process table is full.
rtexec_setup(5)
rt(5)
rtsystem(5)
rtsetup(8)
pf(3), pfecho(1)
crontab(1)
rtincident(1)
rtsnapshot(1)
rtmail(1)
Presumably, many cron jobs may run soon after midnight UTC. However, local
times in the hour after switches to and from daylight saving time should be
avoided.
The names of tasks are completely arbitrary; however, it has proven
convenient to use this convention:
the task name is the executable name, potentially followed by an underscore
and a qualifier. For instance, the name of a single orb2orb task would
be orb2orb. If there were multiple orb2orb tasks, the names might be
distinguished by qualifiers, eg: orb2orb_AZ, orb2orb_cslb, etc.
Daniel Quinlan
Boulder Real Time Technologies, Inc.
Antelope Release 4.8 Darwin 8.7.0 2006-09-26
Boulder Real Time Technologies, Inc
For more information, contact support@brtt.com