NAME
orbwfproc_export - data export task for orbwfproc program
SYNOPSIS
As specified in
orbwfproc(1) parameter file:
wf_tasks &Tbl{
...
<task_name> export <parameters> <input> -
...
}
DESCRIPTION
export is a task class used by
orbwfproc(1) to export data. It can
export waveform and grid data to running ORBs, and can also export waveform data
to Datascope databases.
INPUT
Each
export task instance reads input data objects from one
orbwfproc(1) FIFO queue.
The input data objects may be of type
wf (see
orbwfproc_import(3o)),
wfgather
(see
orbwfproc_wfgather(3o)), or
wfstack
(see
orbwfproc_wfstack(3o)). Currently raw import waveforms, gathered waveforms, waveform stack
results and grid stack results are supported (grids are only supported for ORB output).
OUTPUT
The
export task class puts all of its output into ORB packets and
writes these packets to a running ORB, through a normal connection
to
orbserver(1), or it will write waveforms into an output Datascope database.
All of the ORB-related and Datascope-related configurations and communication are managed directly by the
export
task instances. Note that the
output specification in the
wf_tasks table in the
orbwfproc(1) parameter
file should be set to
- since
export task instances do not need to write output
to the internal
orbwfproc(1)
FIFO queues. For ORB output, the ORB name is
specified through the
orbwfproc(1) command line and the
export task parameter file entry
within the
orbwfproc(1) parameter file. For database output, the output database name is
specified through the
orbwfproc(1) command line
datatag dataname arguments or
the
dbwfproc(1) dbout command line argument, along with
the
export task parameter file entry
within the
orbwfproc(1) parameter file.
OPTIONS FROM ORBWFPROC COMMAND LINE
-
-state statefile
This defines a state file that contains export task state information used
across program restarts. This file must be in parameter file format with an
associative array entry corresponding to each export task name. If the file
and associative array entry exist, it is read at program start and used to trim
out time overlapping output ORB packets from a previous program run.
At program exit the last sample times for each output ORB waveform channel are saved in the state file.
-
-pause_timeout pause_timeout
This is a time duration in seconds used to override the wait time when writing
to an output database. The program orb2db_msg(1) can be used to pause
output database processing for periodic database maintenance, such as cleaning.
During this time the output databases are closed and the tasks that write
to databases are paused until the continue signal is sent by another
call to orb2db_msg(1). When the continue signal is received, the database
output tasks will reopen the output databases and continue processing.
If no continue signal arrives from orb2db_msg(1), the database output tasks
reopen the output databases and continue processing automatically after
pause_timeout seconds.
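The pause behavior described above amounts to a wait with a timeout. The following is a minimal Python sketch of that idea only. The function name wait_while_paused and the use of a threading.Event as a stand-in for the continue signal are assumptions made for illustration; the actual signalling between orb2db_msg(1) and the export tasks is not shown here.

```python
import threading


def wait_while_paused(continue_event, pause_timeout):
    """Block until a continue signal arrives or pause_timeout seconds elapse.

    continue_event is a hypothetical stand-in for the continue signal
    sent by orb2db_msg(1).
    """
    signalled = continue_event.wait(timeout=pause_timeout)
    # Either way the caller would reopen the output databases and resume.
    return "signal" if signalled else "timeout"
```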
-
{datatag orbname|datatag dbname}
These are required arguments and are used to define the ORB name for the output ORB, or a database name for the output database.
The datatag is matched against the value of the output parameter to determine
the orbname or dbname (see PARAMETER FILE ENTRY below).
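As an illustration of the datatag lookup, the Python sketch below pairs up the command line arguments and selects the name whose datatag equals the task's output parameter. The function name resolve_output and the example names are hypothetical and are not part of orbwfproc(1).

```python
def resolve_output(datatag_args, output):
    """Return the ORB or database name whose datatag equals output.

    datatag_args is the flat list of command line arguments,
    alternating datatag and name.
    """
    pairs = dict(zip(datatag_args[0::2], datatag_args[1::2]))
    if output not in pairs:
        raise KeyError("no datatag on the command line matches " + output)
    return pairs[output]
```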
OPTIONS FROM DBWFPROC COMMAND LINE
-
dbout
Name of an output database that contains the output waveform data.
Note that when dbwfproc(1) is being run, the output_type parameter
is automatically set to db.
PARAMETER FILE ENTRY
An example of an
export task parameter file entry within the
orbwfproc(1) parameter file is:
# This is the list of processing tasks to be run
wf_tasks &Tbl{
#task_name class_name parameters input output
...
myexport export export_params wfgather - # export wfgather data to a database
...
}
export_params &Arr{ # parameters for export task
output_type db # output to orb or db?
output dbout # output to dbout tag in command line
translations &Tbl{ # SEED code translations table - this just adds a 'g' loc code
.* ONETSTA_OCHAN_g
}
trim_overlaps yes # Trim time overlapping waveform data from a previous run?
repackage &Tbl{ # used to repackage output ORB packets
#nscl_expr twin suffix subcode
BR_Q113_HH. 10.0 MGENC M10S
}
fragment_wfdisc no # preserve wfdisc fragmentation?
wfduration 86400.0 # time duration of output wfdisc rows
maxduration 300.0 # maximum time duration for buffering data before flushing to disk
realtime_thresh 60.0 # data latency threshold for reducing buffering
datatype auto # output waveform data type
gain 1.0 # gain value for converting floating data to integer data
max_open_files 5000 # maximum number of open waveform files
addcheck no # perform wfdisc add check?
samprate_thresh 10.0 # percent change in sample rate that will trigger generation of a new wfdisc row
calib_thresh 1.0 # percent change in calib value that will trigger generation of a new wfdisc row
calper_thresh 1.0 # percent change in calper value that will trigger generation of a new wfdisc row
nullcalib no # Should output calib, calper and segtype all be set to NULL values?
add_wfdisc_headers yes # should a copy of the wfdisc row be added as a waveform header?
wfname_pattern %Y/%j/%{sta}.%{chan}.%Y.%j.%H.%M.%S # pattern for naming waveform files
maximum_flagged_gap_duration 30.0 # largest internally flagged gap duration
}
...
The export task parameters are as follows.
-
output_type
This specifies the type of the output, either orb for ORB output, or db for database output.
If this parameter is not specified, it defaults to orb. Note that when running dbwfproc(1)
this parameter is automatically set to db and need not be specified in the parameter file.
Also note that when running orbwfproc(1) it is possible to write output to a database by
setting this to db.
-
output
This must correspond to one of the datatag names in the orbwfproc(1) command line
and defines the output ORB, when output_type is orb,
or defines the output database name, when output_type is set to db.
This parameter is required when running orbwfproc(1).
When running dbwfproc(1) this is automatically set to the dbout command line argument and need
not be set in the dbwfproc(1) parameter file.
-
subcode
This is an optional subcode field that is added to all ORB packet srcnames
in the form source/suffix/subcode. This is only used when output_type is orb.
-
translations
This is a
morph table of SEED net_sta_chan[_loc] expressions that specifies which input SEED channels
to copy to the output ORB or database and a mapping that translates the input SEED net_sta_chan[_loc]
codes to output SEED net_sta_chan[_loc] codes. See the morph(3) man page for a
description of morph table specifications. The input string is composed from
the input SEED net_sta_chan[_loc]. If an expression is found to match the input
string, the morph utility generates an output string according to the
morph substitution rules. The output strings should be specified in net_sta_chan
or net_sta_chan_loc form. The special string ONETSTA can be used in the output
string to mean substitution with the input net_sta code. Similarly, the special
string OCHANLOC can be used in the output string to mean substitution with the
chan or chan_loc code determined from the input SEED packets and the special
string OCHAN can be used in the output string to mean substitution with just the
chan code determined from the input SEED packets.
If an input net_sta_chan[_loc]
string does not match any of the morph table entries, then that channel is not
copied to the output ORB or output database. Note that these translations are
automatically converted back to CSS3.0 sta and chan codes using
the foreign(3) routines when the output is to a database.
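The effect of the special strings can be sketched in Python as follows. This is a simplified illustration, not the morph(3) implementation: it assumes each table entry is a pattern that must match the whole input string, followed by an output template, and it ignores the richer substitution rules that morph(3) supports. The function name translate is hypothetical.

```python
import re


def translate(nscl, table):
    """Map an input net_sta_chan[_loc] string to an output code, or None.

    table is a list of (pattern, template) pairs; the first pattern that
    matches the whole input string wins (an assumption of this sketch).
    """
    parts = nscl.split("_")
    net, sta, chan = parts[0], parts[1], parts[2]
    loc = parts[3] if len(parts) > 3 else ""
    chanloc = chan + "_" + loc if loc else chan
    for pattern, template in table:
        if re.fullmatch(pattern, nscl):
            out = template.replace("ONETSTA", net + "_" + sta)
            # Replace OCHANLOC before OCHAN so the longer name is not clobbered.
            out = out.replace("OCHANLOC", chanloc)
            out = out.replace("OCHAN", chan)
            return out
    # No entry matched: the channel is not copied to the output.
    return None
```

With the table from the example parameter file, translate("BR_Q113_HHZ", [(".*", "ONETSTA_OCHAN_g")]) gives BR_Q113_HHZ_g.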
-
trim_overlaps
This is a boolean that controls whether or not to trim out waveform samples
that overlap in time with output samples from a previous program run. The last
sample times for each output channel are stored in the program state file.
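A minimal Python sketch of the trimming idea is given below. It assumes that a sample within half a sample interval of the saved last sample time counts as already written; that tolerance, and the function name trim_overlap, are assumptions of this sketch and not documented behavior.

```python
import math


def trim_overlap(t0, samprate, samples, last_time):
    """Drop leading samples at or before last_time.

    t0 is the time of the first sample, last_time is the last sample time
    saved in the state file (None when there is no saved state).
    Returns the new start time and the remaining samples.
    """
    if last_time is None:
        return t0, list(samples)
    dt = 1.0 / samprate
    # Number of leading samples that fall at or before last_time,
    # using a half-sample tolerance (an assumption).
    n_skip = math.floor((last_time - t0) / dt + 0.5) + 1
    n_skip = min(max(n_skip, 0), len(samples))
    return t0 + n_skip * dt, list(samples[n_skip:])
```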
-
repackage
This is a table that defines how output ORB packets should be formed. This parameter
is only used when output_type is orb. Each line should have four fields.
The first field is a regular expression that is matched against the channel SEED code,
net_sta_chan[_loc], after any translations. The first line that matches is used
to define output ORB packet repackaging parameters. The second field is the time duration of
the output ORB packets. The third field is the suffix component of the output ORB
packet srcname which defines the packet format.
The fourth field is the subcode component of the output ORB
packet srcname. Note that for multiplexed packet formats, such as suffix = MGENC,
all channels that match a particular line in the table will be multiplexed together.
This parameter is optional and if not specified, all channels will occupy single output
ORB packets.
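The first-match rule for the repackage table can be sketched in Python as shown below. The sketch assumes that the expression must match the whole translated SEED code and that fields are separated by white space; the function names parse_repackage and lookup are hypothetical.

```python
import re


def parse_repackage(lines):
    """Parse repackage table lines into (regex, twin, suffix, subcode) rules."""
    rules = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        expr, twin, suffix, subcode = line.split()
        rules.append((re.compile(expr), float(twin), suffix, subcode))
    return rules


def lookup(rules, nscl):
    """Return (twin, suffix, subcode) from the first matching rule, or None."""
    for expr, twin, suffix, subcode in rules:
        if expr.fullmatch(nscl):
            return twin, suffix, subcode
    # No rule matched: the channel keeps the default single-channel packets.
    return None
```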
-
nullcalib
This should be one of yes or no and specifies whether or not the calib, calper and segtype
values are set to NULL values before output.
-
fragment_wfdisc
This parameter and all of the following are only used when the output is to a database.
The fragment_wfdisc parameter is a fundamental switch that controls how the output waveform
data is to be processed. If this is set to no, then an attempt is made to minimize wfdisc
table fragmentation by writing sample values into pre-determined sample grids, where each new wfdisc
row always spans a single channel-wfduration and all of the waveform sample values are prefilled by
flagged gap values. In this mode the pktchannel2db(3) utility is used to buffer and accumulate the data
into the fewest wfdisc-day rows possible. In this mode data is always written as 4-byte integer
or floating samples into a nominal sampling grid and data gaps are represented as specially flagged
sample values instead of as separate wfdisc rows. This mode provides the highest performance
and results in a database with the minimum amount of waveform fragmentation. The default
value of fragment_wfdisc is no.
If fragment_wfdisc is set to yes, then each incoming FIFO packet will result in a
separate output wfdisc row. WARNING - SETTING fragment_wfdisc TO yes CAN HAVE
VERY NEGATIVE SIDE EFFECTS AND SHOULD ONLY BE DONE IF THE USER UNDERSTANDS THESE NEGATIVE SIDE EFFECTS.
In this mode no attempt is made to accumulate or buffer waveform data. Also, no attempt is made
to avoid overlapping or repeated waveforms. Generally, this mode should never be used for real-time
input data since it will make a new wfdisc row for every input ORB packet. The exception would
be for very specialized debugging cases. Generally, this should only be used when processing input
data from a database and, more specifically, when the time_slice_duration parameter
in the import tasks has been set to 0, which causes the import tasks to
preserve the original wfdisc table based waveform segmentation characteristics. In this situation
there would be a one-to-one correspondence between the input wfdisc rows and the output wfdisc rows,
which would be appropriate for certain batch processing of database waveform data, such as resampling.
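The minimum fragmentation mode (fragment_wfdisc set to no) can be pictured with the following Python sketch: a sample grid is prefilled with a gap flag and incoming packets overwrite the grid positions they cover. The names GAP, new_grid and write_packet are hypothetical, the actual flagged gap value is not specified here, and the real buffering is done by the utility named above.

```python
GAP = None  # hypothetical stand-in for the flagged gap sample value


def new_grid(samprate, wfduration):
    """Create a sample grid for one wfdisc row, prefilled with gap flags."""
    return [GAP] * int(round(samprate * wfduration))


def write_packet(grid, seg_start, samprate, pkt_time, pkt_samples):
    """Copy packet samples into their nominal positions in the grid."""
    first = int(round((pkt_time - seg_start) * samprate))
    for i, value in enumerate(pkt_samples):
        idx = first + i
        if 0 <= idx < len(grid):  # samples outside this segment are skipped
            grid[idx] = value
```

Positions that never receive data stay flagged, so a data gap appears as a run of flagged samples inside one wfdisc row and not as a separate row.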
-
wfduration
This is the duration of output wfdisc waveform segments in seconds. The output segments are always aligned
exactly on day-hour boundaries. This defaults to one day, 86400.0 seconds. This parameter is only
used when output_type is db and fragment_wfdisc is no.
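As a rough illustration of the boundary alignment, the Python sketch below computes the start time of the segment containing a given epoch time. It assumes that boundaries are counted in whole multiples of wfduration from epoch time 0; the function name segment_start is hypothetical.

```python
import math


def segment_start(t, wfduration=86400.0):
    """Return the start time of the wfduration-long segment containing t."""
    return math.floor(t / wfduration) * wfduration
```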
-
maxduration
This is a data time duration in seconds that controls how often the waveform buffers accumulated by pktchannel2buffer(3)
are flushed out to disk. This defaults to 300.0 seconds. This parameter is only
used when output_type is db and fragment_wfdisc is no.
-
realtime_thresh
This is a data latency threshold value in seconds that controls adaptive flushing of waveform buffers. This causes the maxduration
values to be decreased in a linear fashion according to the data latency. When the data latency is greater than realtime_thresh, maxduration
is used directly. As the data latency goes down from realtime_thresh to 0, the value
of maxduration is ramped down to 0, which causes the data to be immediately flushed out to the waveform files. In this manner data
latency is minimized when the data is caught up with real time but is buffered efficiently when data is lagging behind real time. This
defaults to 60.0.
This parameter is only
used when output_type is db and fragment_wfdisc is no.
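The linear ramp described above can be written as a short Python sketch. The function name effective_maxduration is hypothetical, and the handling of negative latency (treated as 0) is an assumption.

```python
def effective_maxduration(latency, maxduration=300.0, realtime_thresh=60.0):
    """Return the buffering duration used for a given data latency in seconds."""
    if realtime_thresh <= 0.0 or latency >= realtime_thresh:
        # Data is lagging behind real time: use maxduration directly.
        return maxduration
    # Ramp linearly from maxduration at realtime_thresh down to 0 at latency 0.
    return maxduration * max(latency, 0.0) / realtime_thresh
```

With the default values, a latency of 30 seconds gives a buffering duration of 150 seconds, and a latency of 0 causes an immediate flush.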
-
datatype
This specifies the output waveform data format and is only used when the output is to a database.
The datatype parameter must be one of the following.
-
auto
The data is kept in its native format. Note that most waveform data is represented internally as
floats and therefore this value will usually result in output waveform data in 4-byte floating format.
The byte order is the same as the host computer byte order. This is the default.
-
integer
The output data is converted to 4-byte integer format in the host computer byte order.
-
float
The output data is converted to 4-byte floating format in the host computer byte order.
-
s4
The output data is converted to 4-byte integer format in Motorola byte order.
-
i4
The output data is converted to 4-byte integer format in Intel x86 byte order.
-
t4
The output data is converted to 4-byte floating format in Motorola byte order.
-
f4
The output data is converted to 4-byte floating format in Intel x86 byte order.
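The byte layouts of the explicit datatype values can be illustrated with the Python struct module, where Motorola byte order is big-endian and Intel x86 byte order is little-endian. The names FORMATS and encode are hypothetical, the auto value is not covered, and rounding to the nearest integer before integer conversion is an assumption of this sketch.

```python
import struct

# datatype value -> struct byte order and type code
FORMATS = {
    "integer": "=i",  # 4-byte integer, host byte order
    "float": "=f",    # 4-byte float, host byte order
    "s4": ">i",       # 4-byte integer, Motorola (big-endian)
    "i4": "<i",       # 4-byte integer, Intel x86 (little-endian)
    "t4": ">f",       # 4-byte float, Motorola (big-endian)
    "f4": "<f",       # 4-byte float, Intel x86 (little-endian)
}


def encode(samples, datatype):
    """Pack samples into raw bytes for the given datatype value."""
    order, code = FORMATS[datatype]
    if code == "i":
        samples = [int(round(s)) for s in samples]  # rounding is an assumption
    return struct.pack(order + str(len(samples)) + code, *samples)
```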
-
gain
This is a floating gain factor applied to waveform sample data before it is converted to its
output format and written to the waveform files and is only used when the output is to a database. When this value is not 1 it is used
to modify the wfdisc calib value by dividing the calib value. This parameter
is provided for cases where input waveform data is filtered resulting in a diminished dynamic range
so that conversions to integer can be scaled appropriately. This parameter is ignored and not used
when fragment_wfdisc is set to yes.
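The relationship between gain and calib can be sketched in Python as follows. The sketch assumes that applying the gain means multiplying each sample by it, so that dividing calib by the same factor leaves the product of sample and calib unchanged. The function name apply_gain is hypothetical.

```python
def apply_gain(samples, calib, gain):
    """Scale samples by gain and compensate by dividing calib by gain."""
    if gain == 1.0:
        return list(samples), calib
    return [s * gain for s in samples], calib / gain
```

For example, a gain of 10.0 turns the samples 0.5 and 1.5 with calib 2.0 into 5.0 and 15.0 with calib 0.2, which preserves fractional detail when the output is converted to integer.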
-
max_open_files
This specifies the maximum number of waveform files that are kept open at any single time and is only used when the output is to a database.
This parameter is ignored and not used
when fragment_wfdisc is set to yes. As incoming data is buffered into the minimum
fragmentation waveform sample grids, this controls the maximum number of waveform files that are
open. When this number is exceeded, the waveform files with the oldest modification times are automatically
closed to keep the count below the maximum.
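A minimal Python sketch of this policy is shown below. It tracks open files in a dictionary keyed by path with the last modification time as the value; that bookkeeping and the function name enforce_limit are assumptions made for illustration.

```python
def enforce_limit(open_files, max_open_files):
    """Close the oldest files until at most max_open_files remain open.

    open_files maps a waveform file path to its last modification time
    and is modified in place. Returns the list of paths that were closed.
    """
    closed = []
    while len(open_files) > max_open_files:
        oldest = min(open_files, key=open_files.get)
        del open_files[oldest]  # a real implementation would close the file here
        closed.append(oldest)
    return closed
```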
-
addcheck
This controls the behavior when a new incoming waveform packet overlaps an existing wfdisc
row but its calib, calper, or samprate values are sufficiently different and is only used when the output is to a database.
This parameter is ignored and not used when fragment_wfdisc is set to yes.
If addcheck is yes, then new waveform packets are compared against existing wfdisc
rows for inconsistencies in calib or calper (and sometimes segtype in cases where
datatype is set to auto). The samprate value is also checked. If there are conflicts, then an error message is printed
and the new conflicting packets are ignored. If addcheck is set to no, then new waveform
packets that conflict with existing wfdisc rows (i.e. inconsistent calib, etc. values) will
still be added to the output data by making completely new wfdisc rows and new waveform files
which will overlap with existing wfdisc rows and waveform files. This is the only way two
waveforms from the same channel in the same day but with different calib values can be
represented using the minimum fragmentation approach. Note that when this happens, the waveform file
names will have an instance designator to maintain separate files named from the same set of parameters.
-
samprate_thresh
This is a percentage threshold that is used to determine if a sample rate change is large enough to trigger generation of new
wfdisc rows and waveform files. If this is less than 0.0, then sample rate changes will never trigger generation of new wfdisc rows and
waveform files.
This parameter is ignored and not used when fragment_wfdisc is set to yes.
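A percentage threshold test of this kind can be sketched in Python as follows. The sketch assumes that the change is measured relative to the previous value and that the comparison is strict; neither detail is stated above. The function name exceeds_thresh is hypothetical.

```python
def exceeds_thresh(old, new, thresh_percent):
    """Return True if the change from old to new should start a new wfdisc row."""
    if thresh_percent < 0.0:
        # A negative threshold disables the check entirely.
        return False
    if old == 0.0:
        return new != 0.0
    return abs(new - old) / abs(old) * 100.0 > thresh_percent
```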
-
calib_thresh
This is a floating percentage threshold that is used to determine if a calib value change is large enough
to trigger generation of new wfdisc rows and waveform files. If this is less than 0.0, then neither calib, calper nor segtype value
changes will trigger generation of new wfdisc rows and waveform files. If the calib, calper and segtype values are being
managed through post-processing, such as with rtdbclean(1), then calib_thresh should be set to a negative number.
This parameter is ignored and not used when fragment_wfdisc is set to yes.
-
calper_thresh
This is a floating percentage threshold that is used to determine if a calper value change is large enough
to trigger generation of new wfdisc rows and waveform files. If this is less than 0.0, then calper value
changes will never trigger generation of new wfdisc rows and waveform files.
This parameter is ignored and not used when fragment_wfdisc is set to yes.
-
add_wfdisc_headers
This controls whether or not each wfdisc row is added as a header to each of the waveform files and is only used when the output is to a database.
This parameter is ignored and not used when fragment_wfdisc is set to yes.
Since the output waveform files are usually in 4-byte raw integer or floating format, this provides
a simple mechanism to retrieve lost wfdisc rows directly from the waveform files themselves
without having to resort to the file naming conventions, much as can be done now with miniSEED data.
When the wfdisc headers are added, a byte offset to the binary waveform data is put into each of the wfdisc rows.
-
wfname_pattern
This specifies how waveform files should be named and is only used when the output is to a database.
This parameter is ignored and not used when fragment_wfdisc is set to no.
The waveform file naming follows the rules described in trwfname(3).
-
maximum_flagged_gap_duration
This specifies the maximum allowed time duration of consecutive internally flagged gaps in the output database and is only used when the output is to a database.
This parameter is ignored and not used when fragment_wfdisc is set to yes.
Extra wfdisc rows are made to eliminate internally flagged gaps that go over this limit. When maximum_flagged_gap_duration
is set to a negative number, then all internally flagged gaps are preserved.
When maximum_flagged_gap_duration
is set to 0, then all internally flagged gaps are removed, resulting in extra wfdisc fragmentation.
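The three cases (negative, zero and positive limit) can be illustrated with the Python sketch below, which splits a sample grid into segments wherever a run of flagged gaps lasts longer than the limit. The names GAP and split_on_long_gaps are hypothetical, the real flagged gap value is not specified here, and dropping leading and trailing gap runs is a choice made for this sketch.

```python
GAP = None  # hypothetical stand-in for an internally flagged gap sample


def split_on_long_gaps(samples, samprate, max_gap):
    """Return a list of (start_index, samples) segments, one per wfdisc row."""
    if max_gap < 0:
        # Negative limit: all flagged gaps are preserved in a single segment.
        return [(0, list(samples))]
    segments, start, current = [], None, []
    i, n = 0, len(samples)
    while i < n:
        if samples[i] is not GAP:
            if start is None:
                start = i
            current.append(samples[i])
            i += 1
            continue
        j = i
        while j < n and samples[j] is GAP:  # find the end of this gap run
            j += 1
        too_long = (j - i) / samprate > max_gap
        if too_long or start is None or j == n:
            # Close the current segment; the gap run itself is dropped.
            if current:
                segments.append((start, current))
            start, current = None, []
        else:
            # Short gap: keep it as flagged samples inside the segment.
            current.extend([GAP] * (j - i))
        i = j
    if current:
        segments.append((start, current))
    return segments
```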
SEE ALSO
orbwfproc(1)
dbwfproc(1)
orbwfproc_import(3o)
AUTHOR
Danny Harvey
Boulder Real Time Technologies, Inc.