NAME
rtexec - real time data acquisition system executive
SYNOPSIS
rtexec [-cCdDfkMnstvx] [-q minutes] [-r minutes]
[-u user] [-w whyfile] [comment]
DESCRIPTION
rtexec starts up, shuts down, and monitors the operation of the
real time data acquisition system.
Typically, it performs the following operations:
-
Reads its parameter file to obtain the setup information.
-
Sets up the environment for the tasks, clearing
the old environment entirely, and putting back only the
environment variables specified in the parameter file.
-
Sets up process resource limits according to the specifications
in the parameter file.
-
Runs any specified startup tasks.
-
Starts the specified tasks.
-
Loads crontab entries if necessary.
-
Monitors the running tasks and the rtexec.pf parameter file.
-
If a SIGTERM signal is received, rtexec attempts
an orderly shutdown. It sends a SIGTERM to each process
and its children. It then waits some period, and
sends a SIGKILL ( -9 ) to any process still running.
It then runs any specified shutdown tasks.
If a task dies, it is restarted automatically.
If the rtexec.pf parameter file changes, it is reread.
If any task has been added or deleted, or its command
line changed, it is started, restarted, or killed as appropriate.
Any crontab changes are incorporated.
If the Defines, Env, or Limit arrays in rtexec.pf change,
all the tasks are restarted.
OPTIONS
Any optional
comment on the command line is recorded in the log.
-
-c
clear logs
-
-C
certify mode: stick around until spawned rtexec quits, for certify tests.
-
-d
debug mode -- very verbose,
and rtexec does not turn itself into a daemon
-
-D
For Mac OS X launchd, don't fork, but still redirect all the output
files.
-
-f
Don't ask about killing a previous rtexec, just do it if
necessary. Without this option, there is a prompt to ask if it's ok to
kill off the previous rtexec.
-
-k
kill running rtexec
-
-M
This option is essentially an internal switch that rtexec uses to
bootstrap itself into daemonized execution. It is therefore also invoked
from within the boot scripts installed by install_boot_scripts(1).
-
-n
show what tasks would be run, but don't run anything else
-
-q minutes
Run for the specified number of minutes, then shut down;
primarily for testing.
-
-r minutes
Run for the specified number of minutes, then restart;
repeat indefinitely. For testing purposes only.
-
-s
save old logs in a new subdirectory of logs named after the current time.
The old logs are compressed using compress(1).
-
-t
run tail -f logs/rtexec after changing to a daemon
-
-u user
specify user requesting shutdown (in conjunction with -k)
-
-v
more verbose
-
-w whyfile
Specify a file containing the explanation for a shutdown (in conjunction with -k)
-
-x
use xuserauth(1) to verify authorization, rather than userauth(1)
FILES
rtexec typically runs in a directory with a particular organization;
the typical directories are described below.
This directory and file structure is only the default organization.
The directories can be rearranged as desired by modifying the appropriate
parameter files.
-
pf
This directory contains the parameter files for the tasks run
by rtexec.
-
logs
This directory contains logs of error messages from rtexec and
the tasks it runs.
-
state
Some tasks save a state file in this directory
when they terminate; this state file allows them to restart at
the same location in the ring buffer.
-
orb
This directory contains the orb buffer files used by the orbserver(1).
-
archive
orb2db(1) writes waveform files and the real time wfdisc table in this
directory. orb2dbt(1) saves database records for the origin, arrival,
assoc and event tables in this directory.
-
dbmaster
The master lastid table, used by the real time database in archive and all
other data collection databases, is kept here. In addition,
static database tables like site, sitechan, sensor, and instrument are kept
in this directory.
-
rtsys database
rtexec saves some state information into a database, by default named
rtsys/rtsys. rtsys keeps track of the tasks currently running.
-
logs/rtexec.pid
The pid of the running rtexec is saved into this file, for
convenience when restarting the system. This file is also used as a
lock file, to prevent multiple rtexec instances from running at once.
Finally, the permissions on this file provide state information
(stopped, running, starting, stopping) to rtm(1).
-
logs/rtexec.plist
When running after being executed by a non-root user on Mac systems running
OS X 10.10.3 and above, rtexec saves the plist file it automatically
constructs for the OS X launchctl program in logs/rtexec.plist. This file
is moved to logs/rtexec.old.plist upon real-time-system shutdown.
-
logs/rtexec.state
This file contains a short single line describing the state of the system, e.g.:
System starting up |
Running system startup tasks |
Starting system tasks |
System is up |
Shutting down |
Running Shutdown tasks |
System is Shutdown |
-
state/cron
The modification time on this file records the last time the rtexec
cron job queue was run.
ENVIRONMENT
The environment for rtexec and the tasks rtexec runs is specified in
the parameter file.
PARAMETER FILE
The execution of the real time system is largely dictated
by
rtexec's parameter file.
-
Defines
This array defines values for some set of symbolic names, usually including
ROOT, ANTELOPE, and ORB. The symbolic names can then be used in the environment
and the execution (command) lines in the form $name, and the corresponding value
from the Defines array is substituted. As described for the Env array
below, you may also use PRESERVE as a keyword here; then the corresponding
environment variable is used.
-
Env
This is an array of environment variables and the corresponding values
to which they should be set.
This specifies the environment for rtexec and tasks rtexec runs.
All other values (not listed in the Env array) are cleared from the
environment. This ensures a consistent and well defined environment for
running tasks.
You may use the keyword PRESERVE to use the original
environment value when rtexec was run. Keep in mind, however, that on a
running system, rtexec may be started at boot time out of
a script in /etc/rc3.d, so that some environment variables like DISPLAY
are not set, or may be set differently than you expect.
For a case where a variable may not be set (e.g., DISPLAY), you can specify a
default value with a line like:
DISPLAY PRESERVE || :0
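The Env rules above can be illustrated with a short Python sketch. This is
not rtexec's implementation; the function name resolve_env is hypothetical,
and the rule that a PRESERVE entry with neither an original value nor a
default is simply left unset is an assumption made for illustration.

```python
import re


def resolve_env(env_spec, defines, original_env):
    """Build a fresh environment from an Env-style mapping (illustrative only).

    env_spec     -- variable name -> value text as written in the Env array
    defines      -- symbolic name -> value, as in the Defines array
    original_env -- the environment rtexec was started with
    """
    new_env = {}
    for name, spec in env_spec.items():
        parts = [p.strip() for p in spec.split("||", 1)]
        if parts[0] == "PRESERVE":
            if name in original_env:
                new_env[name] = original_env[name]   # keep the original value
            elif len(parts) == 2:
                new_env[name] = parts[1]             # fall back to the default
            # Assumption: no original value and no default leaves it unset.
            continue
        # Literal value: substitute $name references from Defines.
        new_env[name] = re.sub(
            r"\$(\w+)",
            lambda m: defines.get(m.group(1), m.group(0)),
            spec.strip(),
        )
    # Anything not named in env_spec never reaches new_env, i.e. it is cleared.
    return new_env
```

With this sketch, a DISPLAY entry of "PRESERVE || :0" yields ":0" when
DISPLAY was not set at startup, and any variable absent from the Env array
is dropped.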
-
Limit
This array allows setting the process limits for tasks run by
rtexec. It's primarily to unlimit coredumpsize, so that failures
due to segmentation faults and bus errors
leave a core dump, and to unlimit descriptors, so that orb2db
can have many open files. However, if stacksize is set too large,
it can limit the orbserver buffer size.
-
Database
This is the name of the database in which rtexec keeps some
statistics on running tasks.
-
Processes
This is a list of tasks which may be run by rtexec;
they are started in the same order as they appear in this list.
Most items in the list have a name,
followed by the execution line. The execution
line is interpreted by a shell, so special shell characters
must be escaped or quoted to pass them on to the program.
However, special transient commands can be added to the
list of Processes. These commands have no name, are always
preceded with the at symbol @, and must finish executing
before the next command is started. Thus they can be used to
insert arbitrary delays or wait for some event:
see is_idle(1) for a program that waits for a file to be quiescent
as an example. A single number following the @ sign indicates
a time in seconds to wait.
The idea of transient tasks originated in rtmanage(1), where there
are some additional transient commands (kill and exit) which are not
available in rtexec. The return code from a transient task does
not affect execution, whereas in rtmanage, a return code of 9 terminates
the rtmanage session.
-
Run
This is an array of flags indexed by task name. The corresponding
task from the Processes list is run only if the flag in Run
is non-zero.
-
startup_shutdown_email
When the system is stopped or started, mail is sent to these email addresses.
-
status_email
These addresses receive email when
1) rtexec declares a task failure as
described below under Failure_threshold and gives up on restarting a
task.
2) a task fails with a segmentation fault, bus error or other
hardware failure.
3) some limit on resources is exceeded; see the section below on Resource Limits.
-
Startup_tasks
These are execution lines for one-shot executions when rtexec is
started.
-
Shutdown_tasks
These are execution lines for one-shot executions as rtexec is shutting
down.
-
Shutdown_order
During shutdown, kill signals are sent to (running) tasks
in the order named
in this list. Note that the task name need not be related to the
program run. The names in Shutdown_order are either task names
(like orb2orb_from_UCSD), or
program names (like orb2orb). Each line can name multiple tasks, which
are killed concurrently during a shutdown.
All tasks which match entries on a particular line have either died
or been sent kill -9 signals before any tasks from a later line
are sent signals.
It's ok to have tasks listed which are not running.
Processes not listed in the Shutdown_order table are the last to be sent signals;
orbserver is often the last task to be killed.
Usually, the correct shutdown order is:
1) orb readers (like orb2db, orbassoc)
2) orb writers (like q3302orb, orb2orb, orbdetect, etc)
3) other programs
4) orbserver, diskserver
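The grouping described for Shutdown_order can be sketched in Python as
follows. This is an illustration only, not rtexec's code: the function name
shutdown_waves is hypothetical, and the sketch assumes the program name of
each running task is already known (for example, the first word of its
execution line).

```python
def shutdown_waves(shutdown_order, running):
    """Group running tasks into successive shutdown waves (illustrative only).

    shutdown_order -- list of lines, each naming tasks or programs
    running        -- running task name -> program name
    """
    remaining = dict(running)
    waves = []
    for line in shutdown_order:
        names = set(line.split())
        # A line matches a task by task name or by program name.
        wave = [task for task, prog in remaining.items()
                if task in names or prog in names]
        for task in wave:
            del remaining[task]
        if wave:                      # listed tasks that are not running are fine
            waves.append(wave)
    if remaining:
        waves.append(list(remaining))  # unlisted tasks are signalled last
    return waves
```

Each inner list holds tasks that would be signalled concurrently; a later
list is only started once every task in the earlier one has died or been
sent kill -9.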
-
Shutdown_when_task_dies
rtexec can be made to run a collection of programs until
one completes; when that task dies, rtexec waits a specified
number of seconds, and then shuts down the system.
This array names tasks which terminate rtexec when they die;
the corresponding value is the number of seconds
rtexec waits before shutting down.
-
Start_period
When first starting tasks, rtexec waits this specified number
of seconds before starting the next task. For tasks which are
dependent on one another (e.g., orb2db is dependent on orbserver),
this allows time for the earlier task to get into execution.
-
Minimum_period_between_starts
To minimize problems with multiple restarts of tasks, rtexec enforces
a minimum time between restarts.
-
Pf_restart
Each line in this list begins with a task name, followed by parameter
files. When the modification time on one of these parameter files
is later than the start time of the corresponding task, rtexec restarts that
task.
Only parameter files in the subdirectory pf are inspected, not
parameter files elsewhere.
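The Pf_restart test amounts to a comparison of modification times, which
the following Python sketch illustrates. The function name needs_restart is
hypothetical, and accepting parameter file names both with and without a
.pf suffix is an assumption, since the text above does not say which form
Pf_restart expects.

```python
import os


def needs_restart(task_start_time, pf_names, pf_dir="pf"):
    """Return True if any listed parameter file is newer than the task start.

    Illustrative only. Only files under pf_dir are inspected; files that
    do not exist there are ignored.
    """
    for name in pf_names:
        filename = name if name.endswith(".pf") else name + ".pf"
        path = os.path.join(pf_dir, filename)
        try:
            if os.path.getmtime(path) > task_start_time:
                return True
        except OSError:
            continue   # missing or unreadable file: nothing to compare
    return False
```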
-
Failure_threshold
When a task repeatedly dies after running for less than this threshold,
the period between restarts is repeatedly doubled. This kind of
failure usually indicates a problem which requires
intervention by an operator, though there might be exceptions.
Backing off the period between restarts might allow some problems
to resolve themselves. Ultimately, though, the restart of chronically
failing tasks needs to be curtailed, to avoid overflowing logs, and
avoid burying the useful information in a sea of restart messages.
-
Failure_repetitions
-
Failure_retry_period
After the number of restart attempts specified by Failure_repetitions,
rtexec waits for Failure_retry_period before again attempting a restart.
In addition, mail is sent to anyone specified in
the startup_shutdown_email parameter.
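One possible reading of how Failure_threshold, Failure_repetitions, and
Failure_retry_period interact is sketched below in Python. The exact
bookkeeping inside rtexec is not described here, so this is an assumption
throughout: the function name restart_delays is hypothetical, the doubling
is assumed to start from Minimum_period_between_starts, and a run that
lasts at least Failure_threshold is assumed to reset the count.

```python
def restart_delays(run_times, minimum_period, failure_threshold,
                   failure_repetitions, failure_retry_period):
    """Return the wait before each restart, given how long each run lasted.

    Illustrative only; all arguments are in seconds except
    failure_repetitions, which is a count.
    """
    delays = []
    period = minimum_period
    quick_failures = 0
    for run_time in run_times:
        if run_time >= failure_threshold:
            # The task ran long enough: treat this as an ordinary restart.
            quick_failures = 0
            period = minimum_period
            delays.append(period)
            continue
        quick_failures += 1
        if quick_failures >= failure_repetitions:
            # Give up for a while; this is also when mail would be sent.
            delays.append(failure_retry_period)
            quick_failures = 0
            period = minimum_period
            continue
        if quick_failures > 1:
            period *= 2               # back off: double the restart period
        delays.append(period)
    return delays
```

For example, with a minimum period of 10 seconds, a threshold of 300
seconds, and 5 repetitions, five consecutive quick failures would wait 10,
20, 40, and 80 seconds, and then Failure_retry_period.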
-
Time_to_die
When a Processes execution line changes, or one of the Run array values
is turned off, rtexec signals the corresponding task to stop, using
a TERM signal. It then waits up to Time_to_die seconds for the task
to quit, before sending a kill -9 signal.
-
crontab
It may be desirable to run certain jobs on a regular basis, but not
continuously. This table is a list of jobs which are run by rtexec.
The format of each line is similar to crontab(1), but with
two additional, leading parameters: a job name, and a timezone code.
The log file for a cron job is named "cron-job_name".
rtexec will not start a new cron job while the previous
execution of the same name is still running.
The timezone code may be either UTC or LOCAL, and indicates whether
the following parameters specify a UTC time or a local timezone time.
crontab(1) jobs always use local time, but in a real time system,
it's often more convenient
to specify a UTC time.
See crontab(1) for a description of the remainder of the parameters.
These crontab-like jobs also differ from typical crontab lines in that they
are run by rtexec, with the
rtexec process environment, and from the same directory.
This may simplify the
problem of getting a crontab entry to work properly, as missing environment
variables are a frequent problem in jobs run from the system crontab.
Incidentally, an easy way to test rtexec crontab jobs from the command
line is to use the rtrun(1) command, e.g.:
rtrun 'my-crontab-job arg1 arg2 ..'
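The line layout described above (job name, timezone code, then the usual
crontab(1) fields and command) can be illustrated by a small Python parser.
This is a sketch, not rtexec's parser; the function name parse_cron_line
and the job name used in the usage note are hypothetical.

```python
def parse_cron_line(line):
    """Split an rtexec-style crontab line into its parts (illustrative only).

    Expected layout: job name, UTC or LOCAL, five crontab(1) time fields,
    then the command with its arguments.
    """
    fields = line.split(None, 7)
    if len(fields) < 8:
        raise ValueError("expected name, timezone, 5 time fields, command")
    name, timezone = fields[0], fields[1]
    if timezone not in ("UTC", "LOCAL"):
        raise ValueError("timezone code must be UTC or LOCAL")
    return {
        "name": name,
        "timezone": timezone,
        "schedule": fields[2:7],      # minute, hour, day, month, weekday
        "command": fields[7],
        "log": "cron-" + name,        # log file name, as described above
    }
```

For instance, the line "cleanup UTC 5 0 * * * find tmp -mtime +7 -delete"
would describe a job named cleanup that runs at 00:05 UTC and logs to
cron-cleanup.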
-
max_cron_gap
If rtexec is down
for less time than max_cron_gap, any jobs which would have run during
the down time are run when rtexec is restarted. If rtexec is down for longer,
then only cron jobs scheduled after the restart are run.
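The max_cron_gap rule can be sketched in Python as below. This is an
illustration under stated assumptions, not rtexec's code: the function name
jobs_to_catch_up is hypothetical, all times are taken to be epoch seconds,
max_cron_gap is assumed to be in seconds, and the down time is measured
from the last time the cron queue was run (compare state/cron under FILES).

```python
def jobs_to_catch_up(scheduled_times, last_cron_run, now, max_cron_gap):
    """Return the scheduled times that should still be run after a restart.

    Illustrative only. If the gap since the last cron run is max_cron_gap
    or longer, nothing is caught up and only future jobs run.
    """
    if now - last_cron_run >= max_cron_gap:
        return []
    # Short outage: run everything that fell inside the down time.
    return [t for t in scheduled_times if last_cron_run < t <= now]
```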
-
Use_UTC
This parameter is now defunct, and relates to an earlier implementation
of rtexec. It should be removed from the parameter file.
-
email_incident_reports
When a program dies due to a segmentation violation
or bus error, an incident report is generated by rtincident(1).
A copy of this report is sent via email to the addresses specified in this list.
-
include_corefile
When this is "yes", any corefile generated is
sent via email also. Caution: corefiles are often too
large to send via email.
Resource Limits
-
disks
This is a list of disk partitions also used by rtm(1).
Each line consists of a partition name,
a directory or file on that partition, the minimum number of Mbytes which should
be available on that partition, the minimum number of kilo inodes which
should be available, and finally a short description of the partition's uses.
rtm(1) displays a bar for each distinct filesystem in this list; rtexec
checks each distinct filesystem regularly, and sends email
(to the startup_shutdown_email recipients) if the
free space on the disk falls below the specified minimum.
-
Chatter_limit
Messages about resource shortfalls are sent no more often than once per
this limit, by default 7200 seconds (2 hours).
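The effect of Chatter_limit can be illustrated with a simple rate limiter
in Python. This is a sketch only; the class name Chatter is hypothetical,
and tracking the limit separately for each resource (rather than once for
all messages) is an assumption.

```python
class Chatter:
    """Suppress repeated shortfall messages (illustrative only)."""

    def __init__(self, limit=7200):
        self.limit = limit            # seconds between messages
        self.last_sent = {}           # resource name -> time of last message

    def should_send(self, resource, now):
        """Return True if a message about this resource may be sent now."""
        last = self.last_sent.get(resource)
        if last is not None and now - last < self.limit:
            return False              # too soon since the previous message
        self.last_sent[resource] = now
        return True
```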
Parameters used by rtm
Some parameters are kept in rtexec.pf only because it's a convenient
central location, not because they're used by rtexec. Some parameters
are shared by rtexec and rtm. Following are the rtm parameters;
please refer to rtm(1) for a description:
-
Processes
-
Run
-
Parameter_files
-
disks
-
Buttons
-
Edit_files
-
orbtasks
-
title
In addition, the network code used by rtreport(1) and rtsys(1)
can be specified in rtexec.pf; the default is to use the net code from
the first entry in the network table.
EXAMPLE
# start a real time system, first saving the old log files
# to a sub-directory.
% rtexec -ts
# kill a running real time system
% rtexec -k
DIAGNOSTICS
rtexec is generally pretty verbose, logging most changes to the
file logs/rtexec. Fatal errors include the following:
-
No rtlog directory
rtexec normally saves messages to various files in the logs directory;
it quits immediately if this directory does not exist.
-
no rtexec.pf
rtexec reads all of its startup and configuration information from
a parameter file, which it expects to find in the local directory.
-
rename $i to $new failed
-
failed to remove $i
During startup, rtexec has the options of removing files in the logs directory,
or moving them to a new subdirectory. If either is requested and the
deletion or move fails, rtexec quits.
-
Can't fork
Either memory is in short supply or the process table is full.
-
DUE TO APPLE SECURITY RESTRICTIONS, THIS RTEXEC INSTANCE WILL TERMINATE WHEN YOU LOG OUT. If this is not what you want, launch as root or via the install_boot_scripts(1) mechanism.
As of Mac OS X Yosemite 10.10.3, Apple prohibits any rtexec started as a non-root user from making the changes it needs to
survive user logout without loss of critical services. Therefore, if you start up rtexec from a non-privileged account, it will shut down and exit
when you log out. To prevent this, you must either start up rtexec as root (which in turn requires that root own the directory in which
rtexec is running, as well as the contents of that directory), or you must install a turnkey boot script, per the man page install_boot_scripts(1).
SEE ALSO
rtexec_setup(5)
rt(5)
rtsetup(8)
pf(3), pfecho(1)
crontab(1)
rtincident(1)
rtsnapshot(1)
rtmail(1)
rtmanage(1)
is_idle(1)
install_boot_scripts(1)
BUGS AND CAVEATS
As of Mac OS X Yosemite 10.10.3, Apple prohibits any rtexec started as a non-root user from making the changes it needs to
survive user logout without loss of critical services. Therefore, if you start up rtexec from a non-privileged account, it will shut down and exit
when you log out. To prevent this, you must either start up rtexec as root (which in turn requires that root own the directory in which
rtexec is running, as well as the contents of that directory), or you must install a turnkey boot script, per the man page install_boot_scripts(1).
Presumably, many cron jobs may run soon after midnight UTC. However, local
times in the hour after switches to and from daylight saving time should be
avoided.
rtexec attempts to save and report on incidents which create a core file.
(The stock rtexec.pf unlimits coredumpsize so that core files should be created).
It expects to find the core file in the same directory as rtexec.pf.
However, some systems are configured so that the core files are saved
in some alternate location. MacOS X drops
core files into /cores. In these cases, rtexec does not find the core file,
and hence can't generate stack traces or other useful information from the
core file. A few programs (e.g., orbxfer2) may also move to a different directory;
rtexec does not find these core files either.
The names of tasks are completely arbitrary; however, it has proven
convenient to use this convention:
the task name is the executable name, potentially followed by an underscore
and a qualifier. For instance, the name of a single orb2orb task would
be orb2orb. If there were multiple orb2orb tasks, the names might be
distinguished by qualifiers, e.g.: orb2orb_AZ, orb2orb_cslb, etc.
rtexec does not shut down running cron tasks at exit. This may or may not be the desired behavior, depending on the cron task in question.
If rtexec shuts down while a cron task is still running, the rtsys/rtsys.cron table will not be updated with the appropriate last_end value.
To clear the resulting cron-job-still-running indicator for rtm, go into the rtsys/rtsys.cron table with dbe or dbset and set any NULL
values of the last_end field to now().
AUTHOR
Daniel Quinlan
Boulder Real Time Technologies, Inc.