NAME
orb2wf - archive waveform data from an ORB into a database
SYNOPSIS
orb2wf [-c chanmatch] [-m srcmatch] [-r srcreject]
[-p pf-file] [-S state-file] [{-v|-vv|-vvv}]
[-dbm dbmaster] orb db [start-time [window]]
DESCRIPTION
orb2wf writes waveform data from an ORB ring buffer to a continuous database.
The output waveform data is written into fixed duration sample grids of waveform sample values.
As new waveform sample values are read from ORB input packets, each sample value is projected
into its appropriate waveform file and sample position. When a new sample value projects into
a non-existing waveform file, then the sample rate corresponding to the new sample value is used
to determine a regular grid of sample values over the fixed duration of the new waveform file.
Also created is a new wfdisc row. The new waveform file is pre-filled with special sample values
to indicate missing data, referred to as gap sample values. As new sample values are read for
that channel, they are projected into the appropriate sample value slots in the waveform files.
When new waveform files are created, the corresponding start time of the first sample is always
exactly on a second boundary and aligned with the day-hour-minute boundaries according to the
fixed waveform durations. When new waveform files are created, the sample rate used to determine the
sample grid is rounded off to the nearest 0.1 sps for sample rates of 0.1 sps or higher. Note that this
method of waveform archiving ensures that the time accuracy of each sample value is within +-0.5 of the
output waveform sample interval. This method of waveform archiving produces one wfdisc row and
waveform file per channel per output waveform duration interval regardless of the input waveform
time fragmentation or data time order.
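The projection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the orb2wf implementation: the function names (file_start, grid_rate, project) are hypothetical, times are taken to be epoch seconds, and the handling of a sample that rounds past the end of a file is a guess.

```python
import math

def file_start(t, file_range):
    """Start time of the waveform file containing epoch time t.

    Files are aligned to whole multiples of file_range seconds,
    e.g. 86400 gives day boundaries and 3600 gives hour boundaries.
    """
    return math.floor(t / file_range) * file_range

def grid_rate(samprate):
    """Round a sample rate to the nearest 0.1 sps (rates of 0.1 sps or higher)."""
    if samprate < 0.1:
        return samprate  # assumption: lower rates are left unrounded
    return round(samprate * 10.0) / 10.0

def project(t, samprate, file_range=86400):
    """Return (file start time, slot index, time error) for one sample at time t."""
    start = file_start(t, file_range)
    rate = grid_rate(samprate)
    index = int(math.floor((t - start) * rate + 0.5))  # nearest grid slot
    nslots = int(round(file_range * rate))
    if index >= nslots:
        # Assumption: a sample that rounds past the last slot belongs
        # to the first slot of the next file.
        start += file_range
        index = 0
    error = t - (start + index / rate)
    return start, index, error

start, index, error = project(3 * 86400 + 10.26, 39.98)
print(start, index, abs(error) <= 0.5 / grid_rate(39.98))
```

Because each sample is rounded to the nearest slot, the time error never exceeds half of the output sample interval, which is the accuracy bound stated above.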
This program is a replacement for the old cdorb2db(1) program originally written to archive
CD1.0 data produced by CTBTO systems. Although this program can be used to archive CD1.0 data, it has
been developed as a general alternative to orb2db(1). In most situations this program should be
used instead of orb2db(1) to archive waveform data. The reason for this is that this program
will produce well-healed waveform archive databases regardless of the time fragmentation of the
input ORB data. When archiving data that is out of time order or displays lots of time tears, orb2db(1)
will produce heavily fragmented archive databases with many wfdisc rows and potentially many short
archive waveform fragments. The downside to using orb2wf is that it cannot compress the output
waveform archive on-the-fly. The output waveform format will be a fixed-size-per-sample format, such
as 4-byte integer or floating point formats. Post processing compression of waveforms can be done
with db2msd(1) once a particular time range of data is no longer being updated with new data.
A big difference between orb2wf and cdorb2db(1) is the way in which data channels with
changing sample rate or changing metadata are handled. In cdorb2db(1), the sample rate of incoming
ORB packets is only used once, when a new waveform channel is first processed. The sample rate of the new waveform channel
is fixed forever according to the sample rate of the first ORB packet processed. Subsequent ORB packets for
each channel are projected to the nearest sample grid according to the start time for each ORB packet and
the sample rate of the original ORB packet that was used when the new channel was first created.
The rest of the samples after the start sample for each ORB packet are then laid down into the fixed sample grid
sequentially, ignoring the sample rate of the input ORB packet. This can result in unexpected behavior
when the sample rate for a channel of data is changed substantially. orb2wf will always treat each
waveform sample value individually by applying the ORB packet sample rate to determine the real sample time
and then project into the fixed output sample grid. Also in orb2wf is a sample rate threshold
tolerance which is applied to each ORB packet sample rate against the fixed sample rate of the output
waveform sample grid. If a new ORB packet for a given channel has a sample rate that exceeds this
tolerance, then a new version of the output waveform sample grid is created to accommodate the changed
channel sample rate. The same thing will happen if calib, calper or segtype change
for a given data channel. See orbwfproc_export(3O) for a more detailed explanation of how this works.
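The decision to start a new version of the output sample grid can be sketched as follows. This is a hedged illustration only: the thresholds are read as percentages, as described for samprate_thresh, calib_thresh and calper_thresh in the PARAMETER FILE section, but the dictionary layout, the function names, the strict "greater than" comparison and the equality test on segtype are all assumptions. orbwfproc_export(3O) remains the authoritative description.

```python
def exceeds(old, new, thresh):
    """True if the change from old to new is larger than thresh percent.

    A negative thresh disables the check, following the parameter
    descriptions. The strict comparison is an assumption.
    """
    if thresh < 0.0:
        return False
    if old == 0.0:
        return new != old  # assumption: any change from zero counts
    return abs(new - old) / abs(old) * 100.0 > thresh

def needs_new_grid(grid, pkt, samprate_thresh=10.0,
                   calib_thresh=1.0, calper_thresh=1.0):
    """Decide whether a packet requires a new wfdisc row and waveform file.

    grid and pkt are hypothetical dicts holding samprate, calib,
    calper and segtype for the existing grid and the incoming packet.
    """
    return (exceeds(grid["samprate"], pkt["samprate"], samprate_thresh)
            or exceeds(grid["calib"], pkt["calib"], calib_thresh)
            or exceeds(grid["calper"], pkt["calper"], calper_thresh)
            or grid["segtype"] != pkt["segtype"])

grid = {"samprate": 40.0, "calib": 1.0, "calper": 1.0, "segtype": "V"}
jitter = dict(grid, samprate=40.5)   # 1.25 percent change
changed = dict(grid, samprate=50.0)  # 25 percent change
print(needs_new_grid(grid, jitter), needs_new_grid(grid, changed))
```

With the example threshold of 10 percent, small sample rate jitter stays on the existing grid, while a genuine change of digitizer rate starts a new one.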
OPTIONS
-
-c chanmatch
This is a regular expression that is matched against incoming SEED net_sta_chan[_loc] values
to determine which channels are processed. If this option is not specified, all channels are processed.
-
-m srcmatch
An ORB select expression that is applied to the waveform packet ORB reads.
This argument is optional and if not specified, all packets are selected.
-
-r srcreject
An ORB reject expression that is applied to the waveform packet ORB reads.
This argument is optional and if not specified, no packets are rejected.
-
-p pf-file
Name of program parameter file.
The actual parameter file name is pfname.pf. If this argument
is not specified, then the default pfname is "orb2wf".
-
-S state-file
This defines a state file that contains state information used
across program restarts.
If the file exists, it is read at program start and used to position the ORB read pointer.
At periodic intervals during program execution, and at program exit, the last read ORB pktid and pkttime are saved in the state file.
-
-v|-vv|-vvv
Verbose log output.
-
-dbm dbmaster
This is the name of a master database that can be used to determine calib, calper and segtype
metadata information that can be married to data as it is read. This is optional and if not specified, then
the metadata will be taken directly from the input ORB packets.
-
orb
Name of the input ORB. This is a required argument.
-
db
Name of the output waveform archive database. This is a required argument.
-
start-time
A time to start processing the data.
This argument is optional and, if not specified, processing starts at the most recent data.
-
window
Where to end processing the data as a time duration relative to the start time.
This argument is optional and if not specified then the processing will continue indefinitely.
PARAMETER FILE
Following is an example of a parameter file.
# orb2wf parameter file
preferred_waveform_file_range 86400 # waveform files should cover this range of seconds:
# 86400 = 1 day
# 3600 = 1 hour
pause_timeout 3600 # maximum time (in seconds) to wait after pause for
# the matching continue. This is a failsafe, in case the
# program issuing the original pause fails to deliver
# a continue for some reason. The period should
# probably be less than the ringbuffer size, and
# definitely greater than the worst case performance
# of any backup or cleanup script.
too_old 31556925 # ~ 1 year # fill in with a time x to discard packets
# with time less than system time - x
too_new 604800 # ~ 1 week # fill in with a time x to discard packets
# with times greater than system time + x
decent_interval 60 # interval at which to save state file in seconds
max_open_files 500 # force opening and closing files after specified number
calibfromdb no # marry calib, calper, segtype from a master db?
heap_throttle 10M # heap memory based throttle on input
maxduration 300.0 # maximum time duration for buffering data before flushing to disk
realtime_thresh 60.0 # data latency threshold for reducing buffering
samprate_thresh 10.0 # percent change in sample rate that triggers new wfdisc rows
calib_thresh 1.0 # percent change in calib value that triggers new wfdisc rows
calper_thresh 1.0 # percent change in calper value that triggers new wfdisc rows
nullcalib no # Should output calib, calper and segtype all be set to NULL values?
translations &Tbl{ # SEED code translations for output
.* ONETSTA_OCHANLOC # noop translation
}
datatype input # output waveform data type
add_wfdisc_headers yes # should a copy of the wfdisc row be added as a waveform header?
gain 1.0 # gain value for converting floating data to integer data
pf_revision_time 1319822905
The orb2wf parameters are as follows.
-
preferred_waveform_file_range
This is the duration of output wfdisc waveform segments in seconds. The output segments are always aligned
exactly on day-hour-minute boundaries.
-
pause_timeout
This is a time duration in seconds used to override the wait time when writing
to an output database. The program orb2db_msg(1) can be used to pause
output database processing for periodic database maintenance, such as cleaning.
During this time the output databases are closed and the tasks that write
to databases are paused until the continue signal is sent by another
call to orb2db_msg(1). When the continue signal is received, the database
output tasks will reopen the output databases and continue processing.
The database output tasks will reopen the output databases and continue processing
automatically after pause_timeout seconds while paused even without
a signal from orb2db_msg(1).
-
too_old
A time duration in seconds that is used to determine if an incoming ORB packet is too old to be
processed. The packet is thrown away if the current system time minus its packet time is greater
than this parameter. A value of 0 disables this test. The default value is 0.
Used to filter out packets with bad time stamps.
-
too_new
A time duration in seconds that is used to determine if an incoming ORB packet is too new to be
processed. The packet is thrown away if its packet time minus the current system time is greater
than this parameter. A value of 0 disables this test. The default value is 0.
Used to filter out packets with bad time stamps.
-
decent_interval
This is a time duration in seconds that determines how often the program state
file is updated.
-
max_open_files
This specifies the maximum number of waveform files that are kept open at any single time.
As incoming data is buffered into the minimum
fragmentation waveform sample grids, this controls the maximum number of waveform files that are
open. When this number is exceeded, the waveform files with the oldest modification times are automatically
closed to keep the count below the maximum.
-
calibfromdb
If this is yes, then the calib, calper and segtype values are
obtained from the master database calibration table instead of the ORB input packets.
If this is no, then the calib, calper and segtype values
are obtained from the ORB input packets. This parameter
defaults to no. If this parameter is yes, then the -dbm command
line argument must be specified.
-
heap_throttle
This is a total heap memory allocation size in bytes that is used as a threshold for triggering throttling of the input data.
The FIFO-related heap memory for all FIFOs is continuously monitored and compared against this number. When the total FIFO
memory gets larger than this value, the input reading is suspended until the total FIFO heap memory decreases back
below the threshold. This number can be specified with an ending K character for thousands of bytes, or M for
millions of bytes, or G for billions of bytes. Specification of this parameter will tend to keep heap memory allocations
from becoming excessive in situations where importing data is much faster than processing and exporting the results.
Setting this to 0 disables the heap memory throttling.
-
maxduration
This is a data time duration in seconds that controls how often the waveform buffers accumulated by pktchannel2buffer(3)
are flushed out to disk.
-
realtime_thresh
This is a data latency threshold value in seconds that controls adaptive flushing of waveform buffers. This causes the maxduration
values to be decreased in a linear fashion according to the data latency. When the data latency is greater than realtime_thresh, maxduration
is used directly. As the data latency goes down from realtime_thresh to 0, the value
of maxduration is ramped down to 0, which causes the data to be immediately flushed out to the waveform files. In this manner data
latency is minimized when the data is caught up with real time but is buffered efficiently when data is lagging behind real time.
-
samprate_thresh
This is a percentage threshold that is used to determine if a sample rate change is large enough to trigger generation of new
wfdisc rows and waveform files. If this is less than 0.0, then sample rate changes will never trigger generation of new wfdisc rows and
waveform files.
-
calib_thresh
This is a floating percentage threshold that is used to determine if a calib value change is large enough
to trigger generation of new wfdisc rows and waveform files. If this is less than 0.0, then calib value
changes will never trigger generation of new wfdisc rows and waveform files.
-
calper_thresh
This is a floating percentage threshold that is used to determine if a calper value change is large enough
to trigger generation of new wfdisc rows and waveform files. If this is less than 0.0, then neither calib, calper nor segtype value
changes will trigger generation of new wfdisc rows and waveform files. If the calib, calper and segtype values are being
managed through post-processing, such as with rtdbclean(1), then calib_thresh should be set to a negative number.
-
nullcalib
This should be one of yes or no and specifies whether or not the calib, calper and segtype
values are set to NULL values before output.
-
translations
This is a
morph table of SEED net_sta_chan[_loc] expressions that specifies which input SEED channels
to archive to the output database and a mapping that translates the input SEED net_sta_chan[_loc]
codes to output SEED net_sta_chan[_loc] codes. See the morph(3) man page for a
description of morph table specifications. The input string is composed from
the input SEED net_sta_chan[_loc]. If an expression is found to match the input
string, the morph utility generates an output string according to the
morph substitution rules. The output strings should be specified in net_sta_chan
or net_sta_chan_loc form. The special string ONETSTA can be used in the output
string to mean substitution with the input net_sta code. Similarly, the special
string OCHANLOC can be used in the output string to mean substitution with the
chan or chan_loc code determined from the input SEED packets and the special
string OCHAN can be used in the output string to mean substitution with just the
chan code determined from the input SEED packets.
If an input net_sta_chan[_loc]
string does not match any of the morph table entries, then that channel is not
processed. Note that these translations are
automatically converted back to CSS3.0 sta and chan codes using
the foreign(3) routines before writing to the output database.
-
datatype
This specifies the output waveform data format.
The datatype parameter must be one of the following.
-
input
The data is kept in its native format. Note that most waveform data is represented internally as
floats and therefore this value will usually result in output waveform data in 4-byte floating format.
The byte order is the same as the host computer byte order. This is the default.
-
integer
The output data is converted to 4-byte integer format in the host computer byte order.
-
float
The output data is converted to 4-byte floating format in the host computer byte order.
-
s4
The output data is converted to 4-byte integer format in Motorola byte order.
-
i4
The output data is converted to 4-byte integer format in Intel x86 byte order.
-
t4
The output data is converted to 4-byte floating format in Motorola byte order.
-
f4
The output data is converted to 4-byte floating format in Intel x86 byte order.
-
add_wfdisc_headers
This controls whether or not each wfdisc row is added as a header to each of the waveform files.
Since the output waveform files are usually in 4-byte raw integer or floating format, this provides
a simple mechanism to retrieve lost wfdisc rows directly from the waveform files themselves
without having to resort to the file naming conventions, much as can be done now with miniSEED data.
When the wfdisc headers are added, a byte offset to the binary waveform data is put into each of the wfdisc rows.
-
gain
This is a floating gain factor applied to waveform sample data before it is converted to its
output format and written to the waveform files. When this value is not 1, the wfdisc calib value
is divided by it. This parameter
is provided for cases where input waveform data is in floating format and has a small dynamic range,
so that conversions to integer can be scaled appropriately.
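The effect of the gain parameter can be illustrated with a short Python sketch. It is not taken from the orb2wf source: the function name and the rounding to the nearest integer are assumptions. The sketch only shows why dividing calib by gain compensates for multiplying the samples by gain.

```python
def to_integer_samples(samples, calib, gain):
    """Scale floating samples by gain, convert to integers, and adjust calib.

    Hypothetical helper: rounding to the nearest integer is an assumption.
    """
    scaled = [int(round(s * gain)) for s in samples]
    # Dividing calib by gain compensates for the scaling of the samples.
    out_calib = calib / gain if gain != 1.0 else calib
    return scaled, out_calib

# Floating input with a small dynamic range: without a gain, conversion
# to integer would round every sample to 0.
samples = [0.00123, -0.00456]
counts, calib = to_integer_samples(samples, calib=2.0, gain=1.0e6)
print(counts)  # [1230, -4560]
# Calibrated amplitudes agree before and after the conversion.
print(abs(counts[0] * calib - samples[0] * 2.0) < 1e-9)
```

The product of sample value and calib is the same before and after the conversion, apart from the integer rounding error, which shrinks as gain grows.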
BUGS AND CAVEATS
orb2wf is a skin for
orbwfproc and is exactly the same executable as
orbwfproc.
The program name is used to determine which skin is being run, and arguments are then
parsed differently depending on whether the database or ORB version
is being run.
SEE ALSO
orbwfproc(1)
orbwfproc_import(3o)
orbwfproc_export(3o)
AUTHOR
Danny Harvey
Boulder Real Time Technologies, Inc.