NAME
cdorb2db - create continuous database from cd1 packets (deprecated)
SYNOPSIS
cdorb2db [-c chanmatch]
[-m srcmatch]
[-r srcreject]
[-p pf-file]
[-S state-file]
[-t time-tags]
[-w wfname]
[-CsvW]
orb db [start-time [window]]
SUPPORT
Deprecated: NO BRTT support -- please seek alternatives.
DESCRIPTION
You should use
orb2wf(1) rather than cdorb2db.
cdorb2db writes waveform data from the ring buffer orb to a continuous
database db. It writes data by taking the samples from the packet and filling
them into a waveform day buffer at the appropriate location. Because the waveform day
buffer slots are fixed in time -- starting at epoch time midnight and proceeding through
the day at exactly the nominal sample rate -- writing the waveform data to the file
may entail rounding the packet time tag slightly to fit in a particular slot.
For data loggers with clocks that wander significantly, this will occasionally mean
a sample is left at the null value or a sample is overwritten.
However, the alternative approach -- as implemented in orb2db -- still rounds the time
tags slightly, but can also generate multiple wfdisc records which are difficult to piece
together for processing. In extreme cases, certain packet streams can cause orb2db to
generate many, many wfdisc records for a single day. This adversely affects the performance
of most processing software. The cdorb2db approach is much more stable and robust,
generating only a single wfdisc record per day.
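The slot-filling scheme described above can be sketched as follows. This is a minimal illustration, not the cdorb2db source: the function names, the NULL stand-in, and the ten-slot buffer are assumptions made for the example.

```python
import math

SECONDS_PER_DAY = 86400
NULL = None  # stand-in for the null sample value; the real value is not given here


def day_slot(time_tag, samprate):
    """Return (day_start, slot) for a time tag at the nominal sample rate."""
    day_start = math.floor(time_tag / SECONDS_PER_DAY) * SECONDS_PER_DAY
    # Round to the nearest fixed slot: the tag moves by at most half a sample.
    slot = math.floor((time_tag - day_start) * samprate + 0.5)
    return day_start, slot


def write_packet(day_buffer, time_tag, samprate, samples):
    """Copy a packet's samples into the day buffer, overwriting what is there."""
    _, first = day_slot(time_tag, samprate)
    for i, value in enumerate(samples):
        if first + i < len(day_buffer):  # later samples belong to the next day
            day_buffer[first + i] = value


# A 1 sample/second stream whose clock wanders between packets.
day_buffer = [NULL] * 10
write_packet(day_buffer, 0.0, 1.0, [1, 2, 3])  # fills slots 0-2
write_packet(day_buffer, 3.6, 1.0, [4, 5, 6])  # rounds to slot 4; slot 3 stays null
write_packet(day_buffer, 6.4, 1.0, [7, 8, 9])  # rounds to slot 6; overwrites the 6
print(day_buffer)
```

The second packet leaves one slot at the null value and the third overwrites one sample, which are the two effects of a wandering clock noted above.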
cdorb2db writes waveforms only into uncompressed 4-byte integer format waveform files.
These take considerably more
space (about a factor of four) than miniseed. For this reason, it's generally a good
idea to compress the waveforms once a day is complete, using db2msd(1) in an rtexec
cron job.
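As a rough guide to the disk space involved, the size of one uncompressed day file follows directly from the sample rate. The sketch below is plain arithmetic; the function name and the 40 samples/second rate are assumptions chosen for illustration.

```python
def day_file_bytes(samprate, bytes_per_sample=4):
    """Size of one full-day waveform file of fixed-width integer samples."""
    samples_per_day = int(round(86400 * samprate))
    return samples_per_day * bytes_per_sample


# One channel at an assumed 40 samples/second:
print(day_file_bytes(40.0))
```

At 40 samples/second this comes to 13,824,000 bytes per channel per day, allocated whether or not data arrives for the whole day.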
Unless there is a state file or a specified start time,
cdorb2db starts at the oldest packet in the orb.
To avoid conflicts with orb2db, cdorb2db appends
an equal sign (=) to the default waveform pattern. This can be overridden
with the -w option, at your peril. If orb2db and cdorb2db ever end up trying
to write to the same file, a mess will result, and the waveform file will be unusable.
The initial creation of the whole day waveform files can potentially create a
surge in activity soon after midnight UTC. This can be avoided by using the -W option,
which creates the next day waveform files earlier, spread out over a longer time
period. Of course, these waveform files take up disk space earlier than expected.
cdorb2db creates new wfdisc records (usually
for the current day) as it runs.
As with any Datascope database, no other process should be
modifying the output wfdisc table while cdorb2db is running, although
different processes can be adding to the table, so long as they are not
internally using any of the same records.
Trimming the output
database periodically, by deleting rows representing older data,
improves performance of programs using the wfdisc table.
cdorb2db implements a simple mechanism for allowing the trimming
without shutting down and restarting cdorb2db. A special file is
created named after the output database: database.MSGFILE.
The first 8 bytes of this file contain a flag which is
set while a cleanup is performed, and an integer count of
cleanups performed.
cdorb2db tests the flag and the count at every packet;
when the flag is set, or the count has changed,
cdorb2db closes its open database, and
waits for the flag to be reset.
When the flag is reset, it
reopens the output database, and
finds the new record numbers for the records it is modifying.
The MSGFILE may be monitored or modified using the program orb2db_msg(1).
cdorb2db does not acknowledge that it has paused, so it is important
to allow sufficient time for cdorb2db to stop itself before beginning to
modify the database. orb2db_msg sleeps for some period of time before
returning, in an attempt to ensure this time window.
In addition, the time required to
clean up the database must be considerably less than the time allowed
by the ring buffer size, so that no data is lost.
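The pause and resume decision described above can be sketched as a small state function. The byte layout is an assumption: this page says only that the first 8 bytes hold a flag and a count, so the two native-order 32-bit integers used here, like the function and action names, are hypothetical.

```python
import struct

# ASSUMPTION: flag then count, each a native-order 32-bit integer.
# The real MSGFILE layout is not documented here.
MSG_LAYOUT = struct.Struct("=ii")


def read_msg(raw):
    """Decode (flag, count) from the first 8 bytes of the message file."""
    flag, count = MSG_LAYOUT.unpack(raw[:MSG_LAYOUT.size])
    return bool(flag), count


def next_action(raw, last_count, paused):
    """Return (action, last_count, paused) for one check of the message bytes."""
    flag, count = read_msg(raw)
    if not paused:
        if flag or count != last_count:
            return "pause", last_count, True  # close the open database
        return "write", last_count, False
    if flag:
        return "wait", last_count, True  # cleanup still in progress
    # Flag reset: reopen the database and look up the new record numbers.
    return "reopen", count, False


state = next_action(MSG_LAYOUT.pack(1, 3), last_count=3, paused=False)
print(state)
```

Comparing the count as well as the flag catches a cleanup that started and finished between two checks, which the flag alone would miss.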
cd1 data
cdorb2db was originally developed to address the special requirements
of cd1 data streams.
cd1 data packets
may arrive in the
orb out of time order. Duplicate packets may
be present, and some packets may be lost completely; this data
stream creates many problems for the
orb2db(1) program, but
is not an issue for
cdorb2db.
Because the
cdorb2db approach is much more stable, it has
become the recommended method for creating waveform files.
OPTIONS
-
-C
Check the waveform file for overlaps and duplicates before each write; this
can be expensive. See also the dynamic controls.
-
-s
Monitor calib, calper and segtype, creating new wfdisc records and corresponding
waveform files if these change. The default behavior is to ignore these parameters;
oftentimes they are not present in input packets, and they are set later by
running rtdbclean(1). If the -s option is used, there is a danger that
cdorb2db will create multiple waveform files and records when it attempts to
match the incoming packet calib with a calib set by rtdbclean.
-
-W
Attempt to produce the initialized waveform files for the next day,
based on waveform files in which packet data is written during the
current day. This should avoid the crunch around UTC midnight, when
otherwise a large number of files might be created, possibly slowing the
rate at which data is written to the waveform files.
-
-c chanmatch
As each packet is unstuffed, the complete net_sta_chan_loc
name for each channel in the packet is composed and
compared to this regular expression chanmatch.
Only channels which match the regular expression
are accumulated and written to the output database.
This option is for separating out certain channels from
multi-channel packets, and is of limited utility; generally,
you would use the -m and -r
options to select networks and stations of interest.
-
-m srcmatch
Only packets which match the regular expression srcmatch
are requested from the orb server.
-
-r srcreject
Packets which match the regular expression srcreject
are not collected from the orb server.
-
-p pf-file
Specify an alternative parameter file name pf-file
instead of cdorb2db.pf.
-
-S state-file
cdorb2db saves the pktid of the latest packet it processed
in this file when it quits abnormally.
-
-t time-tags
As a debugging tool, record each packet into the time tag file time-tags
as data from the packet is saved.
-
-v
Be more verbose; -vv shows every packet read; -vvv shows creation of
next day waveform files.
-
-w wfname
See trwfname(3); this option selects the output
file naming convention.
-
start-time
-
window
A time range may be specified. The first parameter is the time
in any of the formats accepted by epoch(1). The second may be
either an ending time, or a time-window. A time window is most
conveniently specified as hours and minutes like: hh:mm or
hh:mm:ss. The parameter file specifies a maximum acceptable
time window (default is 24 hours); longer time windows are usually
command line errors, but a longer time window can be specified using an
end time rather than a window.
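The check of a time window against the maximum acceptable window can be sketched as below. This is an illustration only: it handles just unsigned hh:mm and hh:mm:ss values, not the other formats accepted by epoch(1), and the function names are assumptions. Whether a window exactly equal to the maximum is accepted is not stated, so the sketch rejects it.

```python
def window_seconds(text):
    """Convert 'hh:mm' or 'hh:mm:ss' into seconds."""
    fields = [float(f) for f in text.split(":")]
    if len(fields) == 2:
        fields.append(0.0)  # no seconds field given
    if len(fields) != 3:
        raise ValueError(f"expected hh:mm or hh:mm:ss, got {text!r}")
    hours, minutes, seconds = fields
    return hours * 3600 + minutes * 60 + seconds


def check_window(text, max_window=86400.0):
    """Return the window in seconds, or raise if it is not below max_window."""
    seconds = window_seconds(text)
    if seconds >= max_window:
        raise ValueError(f"window {text!r} exceeds the maximum of {max_window} s")
    return seconds


print(check_window("0:10"))     # ten minutes
print(check_window("1:30:15"))
```

A request such as 48:00 raises an error under the default 24 hour maximum; giving an end time instead of a window avoids the check.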
DYNAMIC CONTROLS
The dynamic control byte flag
check_overlaps may be set to dynamically turn on
checking for overlapping and duplicate data, or reset to turn off the checking.
PARAMETER FILE
-
preferred_waveform_file_range
Waveforms for a single net/sta/chan are stored in a single
file, covering this range of seconds.
-
preferred_waveform_file_offset
Waveform volumes start and end at a range boundary + offset.
So, if you desire to have day volumes, but to make the day volumes
correspond to local time rather than UTC, you might specify offset
as the local offset in seconds from UTC.
-
decent_interval
The state file is refreshed at this interval.
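The way preferred_waveform_file_range and preferred_waveform_file_offset together fix the file boundaries can be sketched as follows. The formula is an interpretation of "range boundary + offset", and the function name is an assumption. The sign convention for a local-time offset is not spelled out above, so check the result against the boundaries you expect.

```python
import math


def volume_bounds(time_tag, file_range=86400.0, offset=0.0):
    """Return (start, end) of the waveform volume containing time_tag."""
    start = math.floor((time_tag - offset) / file_range) * file_range + offset
    return start, start + file_range


# Day volumes aligned to UTC midnight:
print(volume_bounds(100000.0))
# Day volumes shifted by 25200 seconds, so boundaries fall at 07:00 UTC:
print(volume_bounds(100000.0, offset=25200.0))
```

With no offset, epoch time 100000 falls in the volume running from 86400 to 172800. With the 25200 second offset it falls in the volume from 25200 to 111600.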
Some kinds of problems can be quietly saved into the database tables
changed, retransmit, ratechange and gap. The parameters below also allow
printing error messages at regular intervals beyond a particular
threshold error rate.
-
chatter_limit
Each type of error message is printed at most once per
chatter_limit seconds.
-
min_problem_count
-
min_problem_time
Error messages are output only when there are at least min_problem_count
problems within min_problem_time seconds.
-
max_window
If a time range (rather than an end time) is specified on the command line,
then that range must be less than this parameter; the default is 24 hours.
Longer time windows are usually
command line errors.
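One way to read the interaction of chatter_limit, min_problem_count and min_problem_time is sketched below. The class is hypothetical and is not how cdorb2db is known to be implemented; it assumes one reporter per type of error message and times in seconds.

```python
from collections import deque


class ProblemReporter:
    """Decide when a recurring problem is worth an error message."""

    def __init__(self, chatter_limit, min_problem_count, min_problem_time):
        self.chatter_limit = chatter_limit
        self.min_problem_count = min_problem_count
        self.min_problem_time = min_problem_time
        self.recent = deque()      # times of problems inside the window
        self.last_printed = None   # time of the last message, if any

    def note(self, now):
        """Record a problem at time `now`; return True if a message is due."""
        self.recent.append(now)
        while now - self.recent[0] > self.min_problem_time:
            self.recent.popleft()  # forget problems outside the window
        if len(self.recent) < self.min_problem_count:
            return False           # below the threshold error rate
        if self.last_printed is not None and now - self.last_printed < self.chatter_limit:
            return False           # printed too recently
        self.last_printed = now
        return True


reporter = ProblemReporter(chatter_limit=60, min_problem_count=3, min_problem_time=10)
decisions = [reporter.note(t) for t in (0, 1, 2, 3, 100, 101, 102)]
print(decisions)
```

In this run the third problem triggers a message, the fourth is suppressed by chatter_limit, and the isolated problem at time 100 stays quiet until two more arrive within min_problem_time.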
The following parameters affect whether more detailed information about packet anomalies
is saved into database tables. The tables make more detailed analysis possible, but when
transmissions are poor, or clocks wander a lot, the tables may grow unreasonably large
unreasonably quickly. The default is not to write into the tables.
-
record_changed
Write records to the changed table when the calib or segtype
specified in the packet changes.
-
pause_timeout
Maximum time (in seconds) to wait after pause for the matching
continue. This is a failsafe, in case the program issuing the
original pause fails to deliver a continue for some reason. The
period should probably be less than the ring buffer size, and
definitely greater than the worst case performance of any backup or
cleanup script. Having this number too short risks corrupting the
output database and causing an awful mess. Having this number too
long risks losing some data.
-
too_old
Some positive time difference between now and the packet time may
be specified here; if a packet is older than now() - too_old, the
packet is not saved to the database. However, it can optionally
be saved to the discards file/orb.
The time can be straight seconds, or
anything understood by str2epoch (e.g., "72:00" is 72 hours or 3 days).
The default is set to 1 year old.
-
too_new
Some positive time difference between the packet time and now may
be specified here; if a packet is newer than now() + too_new, (i.e., it's
from the future) the packet is not saved to the database.
However, it can optionally
be saved to the discards file/orb.
The time can be straight seconds, or
anything understood by str2epoch (e.g., "72:00" is 72 hours or 3 days).
The default is set to 1 week.
-
discards
A file (beginning with either ./ or /), or
an output orbserver may be specified here.
Packets that are too old or too new or fail to unstuff are saved in forb(5) format.
This can facilitate debugging. For data logger problems (implied by packets
from the distant past or far future), it might even allow recovery of the
improperly time tagged data.
-
max_open_files
If specified, force closing and reopening of files once this many files are open. On a Mac,
be sure that the default limits have been increased appropriately before
resorting to using this.
-
start_next_wffile_creation
-
stop_next_wffile_creation
cdorb2db accumulates names for next day waveform files until the time specified in start_next_wffile_creation.
For example, if start_next_wffile_creation is 15:30, then cdorb2db won't start creating the next
day's waveform files until epoch time 15:30 each day. Regardless of whether all files have been
created, cdorb2db stops creating waveform files after stop_next_wffile_creation.
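The too_old and too_new tests described above amount to a comparison against the current time. A minimal sketch, with a hypothetical function name and the one year default taken as 365 days:

```python
DAY = 86400.0


def classify_packet(packet_time, now, too_old=365 * DAY, too_new=7 * DAY):
    """Return 'ok', 'too_old' or 'too_new' for a packet time tag."""
    if packet_time < now - too_old:
        return "too_old"   # not saved; may go to the discards file or orb
    if packet_time > now + too_new:
        return "too_new"   # time tag is from the future
    return "ok"


now = 1_000_000_000.0
print(classify_packet(now - 10.0, now))         # recent packet
print(classify_packet(now - 400 * DAY, now))    # over a year old
print(classify_packet(now + 30 * DAY, now))     # a month in the future
```

Only packets classified as ok would be written to the database; the other two cases are candidates for the discards destination.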
FILES
cdorb2db creates waveform files, following the defaults dictated by
trwfname(3)
and
trdefaults.pf(5) or the command line argument.
ENVIRONMENT
See
antelopeenv(5)
EXAMPLE
Continuously collect data from the BHZ channel of station RDM from a
ring buffer on XYZ to a database rdm.
% cdorb2db -m '.*RDM.*' -c '.*_RDM_BHZ' XYZ rdm
Collect data for the last 10 minutes
from the BHZ channels of stations from a ring buffer on XYZ to a
database called cels.
% cdorb2db -c '.*_BHZ' XYZ cels '1996298 15:37' -0:10 0:10
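The effect of the -c patterns used in the examples above can be previewed with any regular expression engine. The sketch below uses Python's re module and assumes the pattern must match the whole channel name; the matching rules cdorb2db actually applies may differ, and the network and station names other than RDM are hypothetical.

```python
import re


def select_channels(names, chanmatch):
    """Keep the channel names that the whole pattern matches."""
    pattern = re.compile(chanmatch)
    return [name for name in names if pattern.fullmatch(name)]


# Hypothetical channel names as composed from a multi-channel packet.
names = ["AZ_RDM_BHZ", "AZ_RDM_BHN", "AZ_PFO_BHZ", "AZ_PFO_BHZ_00"]
print(select_channels(names, ".*_RDM_BHZ"))
print(select_channels(names, ".*_BHZ"))
```

Under the whole-name assumption, a name carrying a location code such as AZ_PFO_BHZ_00 is not selected by '.*_BHZ'; a pattern like '.*_BHZ(_.*)?' would be needed to include it.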
RETURN VALUES
Returns 0 on success, 1 if any failure occurs.
SEE ALSO
db2msd(1)
orbmonrtd(1)
orbstat(1)
orb2db_msg(1)
orb2db(1)
BUGS AND CAVEATS
cdorb2db has many fewer tests and checks than
orb2db(1). It presumes that the
datalogger clocks are correct throughout the recording period and that
every packet carries the right time. The latest
packet contents overwrite any previous data.
Be cautious about using both cdorb2db and orb2db. If you do, you must be careful
to have disjoint sets of packets going to cdorb2db and orb2db. If both orb2db and
cdorb2db get the same packets, they are likely to try to save them into the same
file in different formats, leaving behind an awful mess.
However, it should be possible to switch from orb2db to cdorb2db or vice versa,
once per day, without messing up the database.
Because cdorb2db keeps many files open, it's often necessary to raise the limit
on open files; try unlimit descriptors.
If the start and stop times are more granular than the packet size,
the actual start and stop times vary from the specification. Data
gaps can also cause start and stop times which vary somewhat from the
spec.
cdorb2db does not detect any problem
when given a source match which doesn't select any sources; it waits
forever for a source matching the regular expression, even if
there is an explicit ending time.
cdorb2db should be run so that its database and its waveform files
are on local, not nfs-mounted partitions. Having the waveforms
on an nfs-mounted partition is particularly troublesome when the partition fills
up.
cdorb2db writes big-endian integer (s4) datatype on big-endian machines (SPARC
and PowerPC),
and little-endian integer (i4) datatype on little-endian machines (Intel architectures).
The timing of waveforms written with cdorb2db is only good to 1/2 dt; this may be an issue
for very low frequency data -- e.g. 1 sample/second or less.
AUTHOR
Daniel Quinlan
Boulder Real Time Technologies, Inc.