cdorb2db - create continuous database from cd1 packets
cdorb2db [-a wffile] [-c chanmatch]
[-m srcmatch] [-r srcreject]
[-p pf-file] [-S state-file]
[-t time-tags] [-w wfname] [-v]
orb db [start-time [window]]
cdorb2db copies data from the ring buffer orb to a continuous
database db. A starting time may be specified, otherwise
cdorb2db attempts to coordinate with an existing database, starting
after the earliest of the maximum endtimes for each station/channel
in the database.
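The default starting point described above can be illustrated with a minimal sketch. It assumes the existing wfdisc rows are available as hypothetical (sta, chan, endtime) tuples; it does not use the Datascope API and is not the program's actual implementation.

```python
from collections import defaultdict

def resume_time(wfdisc_rows):
    """Return the earliest of the per-channel maximum endtimes, or None.

    wfdisc_rows is a hypothetical iterable of (sta, chan, endtime) tuples.
    """
    latest = defaultdict(lambda: float("-inf"))
    for sta, chan, endtime in wfdisc_rows:
        key = (sta, chan)
        latest[key] = max(latest[key], endtime)
    return min(latest.values()) if latest else None

rows = [
    ("RDM", "BHZ", 100.0),
    ("RDM", "BHZ", 250.0),
    ("RDM", "BHN", 180.0),
    ("PFO", "BHZ", 300.0),
]
print(resume_time(rows))  # 180.0
```

Starting from the earliest of the per-channel maxima means no channel is left with an unfilled stretch, at the cost of re-reading some packets for channels that were further along.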
cdorb2db reads cd1 packets from the ring buffer and attempts to fit them
into continuous uncompressed four byte integer waveform files,
typically on day boundaries. It requires
that packets fit together; i.e., the packet times and number of
samples should fit into a continuous waveform file with a single
sample rate. This presumption may not be true: data loggers often
have unstable clocks, packets may be lost, and connections may go up
and down. However, cdorb2db ignores any such problems and writes all
data into a single volume.
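As an illustration of what "fitting packets into a continuous file" implies, the sketch below places each packet at a sample offset computed from its time tag and a single nominal sample rate. The function names, the in-memory buffer, the big-endian layout and the zero fill for skipped samples are all assumptions made for the example, not the program's actual code.

```python
import struct

BYTES_PER_SAMPLE = 4  # uncompressed four-byte integers

def sample_index(file_start, samprate, pkt_time):
    """Index of the packet's first sample, assuming one constant rate."""
    return round((pkt_time - file_start) * samprate)

def write_packet(buf, file_start, samprate, pkt_time, samples):
    """Write samples into buf (a bytearray) at the offset implied by pkt_time.

    Later packets covering the same span simply overwrite earlier data.
    Skipped samples are zero-filled here; that fill value is an assumption.
    """
    first = sample_index(file_start, samprate, pkt_time)
    end = (first + len(samples)) * BYTES_PER_SAMPLE
    if len(buf) < end:
        buf.extend(b"\x00" * (end - len(buf)))
    struct.pack_into(">%di" % len(samples), buf,
                     first * BYTES_PER_SAMPLE, *samples)

buf = bytearray()
write_packet(buf, 0.0, 40.0, 0.5, [1, 2, 3])   # lands at sample 20
write_packet(buf, 0.0, 40.0, 0.5, [7, 8, 9])   # duplicate time: overwrites
```

Because the offset depends only on the time tag, packets may arrive in any order, but a wrong time tag puts data in the wrong place.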
The uncompressed waveform files may be compressed at a later date --
after all the data from the channel has been acquired -- using db2msd(1).
In addition to copying data, cdorb2db
accumulates some statistics about packet errors into
the gap, changed, retransmit and duplicate tables.
cdorb2db is continuously modifying a set of wfdisc records (usually
for the current day) as it runs.
As with any Datascope database, no other process should be
modifying the output database while cdorb2db is running. However,
in the real time system, it is useful to be able to trim this output
database by deleting rows representing older data. This trimming
improves performance of other programs running against the database.
cdorb2db implements a simple mechanism for accomplishing this trimming
without shutting down and restarting cdorb2db. A special file is
created named after the output database: database.MSGFILE.
The first 8 bytes of this file contain a flag which is
set while a cleanup is performed, and an integer count of
cleanups performed.
cdorb2db tests the flag and the count at every packet;
when the flag is set, or the count has changed,
cdorb2db closes its open database, and
waits for the flag to be reset.
When the flag is reset, it
reopens the output database, and
finds the new record numbers for the records it is modifying.
The MSGFILE may be monitored or modified using the program orb2db_msg(1).
Because cdorb2db does not acknowledge that it has paused,
it is important
to allow sufficient time for cdorb2db to stop itself before beginning to
modify the database. orb2db_msg sleeps for some period of time before
returning, in an attempt to ensure this time window.
In addition, the time required to
clean up the database must be considerably less than the time allowed
by the ring buffer size, so that no data is lost.
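The per-packet test of the MSGFILE can be sketched as follows. The text only says that the first 8 bytes hold a flag and an integer count; the layout used here (two 4-byte integers in native byte order, flag first) is an assumption, as are the helper names. Use orb2db_msg(1) rather than code like this against a real database.

```python
import os
import struct
import tempfile

MSG_FORMAT = "=ii"  # ASSUMPTION: flag, then count, each a 4-byte native int

def write_msg(path, flag, count):
    """Hypothetical helper: store the flag and cleanup count."""
    with open(path, "wb") as f:
        f.write(struct.pack(MSG_FORMAT, flag, count))

def read_msg(path):
    """Return (flag, count) from the first 8 bytes of the file."""
    with open(path, "rb") as f:
        data = f.read(struct.calcsize(MSG_FORMAT))
    return struct.unpack(MSG_FORMAT, data)

def must_pause(flag, count, last_count):
    """Pause when the flag is set or a cleanup happened since the last check."""
    return flag != 0 or count != last_count

# Demonstration on a scratch file standing in for database.MSGFILE.
path = os.path.join(tempfile.mkdtemp(), "database.MSGFILE")
write_msg(path, 1, 3)
flag, count = read_msg(path)
print(must_pause(flag, count, last_count=3))  # True: cleanup in progress
```

Checking the count as well as the flag catches a cleanup that started and finished between two packets, in which case the record numbers must still be looked up again.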
cd1 data packets
may arrive in the orb out of time order. Duplicate packets may
be present, and some packets may be lost completely.
cdorb2db simply writes packets into the waveform files as they
are read from the orb, backing up in the orb as necessary.
-
-c chanmatch
As each packet is unstuffed, the complete net_sta_chan_loc
name for each channel in the packet is composed and
compared to this regular expression chanmatch.
Only channels which match the regular expression
are accumulated and written to the output database.
This option is for separating out certain channels from
multi-channel packets, and is of limited utility; generally,
you would use the -m and -r
options to select networks and stations of interest.
-
-m srcmatch
Only packets which match the regular expression srcmatch
are requested from the orb server.
-
-r srcreject
Packets which match the regular expression srcreject
are not collected from the orb server.
-
-p pf-file
Specify an alternative parameter file name pf-file
instead of cdorb2db.pf.
-
-S state-file
cdorb2db saves the pktid of the latest packet it processed
in this file when it quits abnormally.
-
-t time-tags
As a debugging tool, record each packet into the time tag file
time-tags as data from the packet is saved.
-
-v
Be more verbose; -vv shows every packet read.
-
-w wfname
See trwfname(3); this allows choosing the output
file naming convention.
Note that delayed packets (srcname has "/@" appended) are
saved into delayed waveform files, which similarly have "@" appended.
They also have independent wfdisc records.
-
start-time
-
window
A time range may be specified. The first parameter is the time
in any of the formats accepted by epoch(1). The second may be
either an ending time, or a time-window. A time window is most
conveniently specified as hours and minutes like: hh:mm or
hh:mm:ss. The parameter file specifies a maximum acceptable
time window (default is 24 hours); longer time windows are usually
command line errors, but a longer time window can be specified using an
end time rather than a window.
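A minimal sketch of the window check described above, assuming a window given as hh:mm or hh:mm:ss. The function names are hypothetical, and whether a window exactly equal to the maximum is accepted is an assumption; real time parsing is handled by the formats described in epoch(1).

```python
def window_seconds(text):
    """Convert 'hh:mm' or 'hh:mm:ss' to seconds."""
    parts = [float(p) for p in text.split(":")]
    if len(parts) == 2:
        hours, minutes = parts
        seconds = 0.0
    elif len(parts) == 3:
        hours, minutes, seconds = parts
    else:
        raise ValueError("expected hh:mm or hh:mm:ss, got %r" % text)
    return hours * 3600.0 + minutes * 60.0 + seconds

def check_window(text, max_window=24 * 3600.0):
    """Reject windows longer than max_window (default 24 hours)."""
    secs = window_seconds(text)
    if secs > max_window:
        raise ValueError("window %s exceeds max_window" % text)
    return secs

print(check_window("0:10"))  # 600.0
```

A longer span is still possible by giving an explicit end time, which bypasses this check.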
-
preferred_waveform_file_range
Waveforms for a single net/sta/chan are stored in a single
file, covering this range of seconds.
-
preferred_waveform_file_offset
Waveform volumes start and end at a range boundary + offset.
So, if you desire to have day volumes, but to make the day volumes
correspond to local time rather than UTC, you might specify offset
as the local offset in seconds from UTC.
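The arithmetic implied by these two parameters can be sketched as below. The function name and the sign convention for the offset (a positive offset moves boundaries later) are assumptions for illustration.

```python
import math

def volume_bounds(t, file_range=86400.0, offset=0.0):
    """Return (start, end) of the waveform volume containing epoch time t.

    Boundaries fall at k * file_range + offset for integer k.
    """
    start = math.floor((t - offset) / file_range) * file_range + offset
    return start, start + file_range

# Day volumes on UTC boundaries:
print(volume_bounds(90000.0))                    # (86400.0, 172800.0)
# Day volumes shifted by one hour:
print(volume_bounds(90000.0, 86400.0, 3600.0))   # (90000.0, 176400.0)
```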
Some kinds of problems can be quietly saved into the database tables
changed, retransmit, ratechange and gap. The parameters below also
allow printing error messages at regular intervals once a particular
threshold error rate is exceeded.
-
chatter_limit
Each type of error message is printed at most once per
chatter_limit seconds.
-
min_problem_count
-
min_problem_time
Error messages are output only when there are at least min_problem_count
problems within min_problem_time.
-
max_window
If a time range (rather than an end time) is specified on the command line,
then that range must be less than this parameter; the default is 24 hours.
Longer time windows are usually
command line errors.
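The interaction of chatter_limit, min_problem_count and min_problem_time can be pictured with the sketch below: one reporter per error type, which prints only when enough problems fall inside the time span and the previous message is old enough. The class and its exact boundary behavior are assumptions, not the program's code.

```python
from collections import deque

class ProblemReporter:
    """Decide when a message for one error type should be printed."""

    def __init__(self, chatter_limit, min_problem_count, min_problem_time):
        self.chatter_limit = chatter_limit
        self.min_problem_count = min_problem_count
        self.min_problem_time = min_problem_time
        self.times = deque()      # times of recent problems
        self.last_print = None    # time of the last printed message

    def note(self, now):
        """Record a problem at time now; return True if it should be printed."""
        self.times.append(now)
        # Forget problems older than min_problem_time.
        while self.times and now - self.times[0] > self.min_problem_time:
            self.times.popleft()
        if len(self.times) < self.min_problem_count:
            return False
        if (self.last_print is not None
                and now - self.last_print < self.chatter_limit):
            return False
        self.last_print = now
        return True

r = ProblemReporter(chatter_limit=60, min_problem_count=3, min_problem_time=10)
print([r.note(t) for t in (0, 1, 2, 3)])  # [False, False, True, False]
```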
The following parameters affect whether more detailed information about packet anomalies
is saved into database tables. The tables make more detailed analysis possible, but when
transmissions are poor, or clocks wander a lot, the tables may grow unreasonably large
unreasonably quickly. The default is not to write into the tables.
-
record_changed
Add records to the changed table when the samprate, calib or tick registration
specified in the packet changes.
-
record_ratechange
Add records to the ratechange table when the observed sample rate changes. The sample
rate is calculated from the difference between packet time tags and the number of samples
in a packet.
-
pause_timeout
Maximum time (in seconds) to wait after pause for the matching
continue. This is a failsafe, in case the program issuing the
original pause fails to deliver a continue for some reason. The
period should probably be less than the ring buffer size, and
definitely greater than the worst case performance of any backup or
cleanup script. Having this number too short risks corrupting the
output database and causing an awful mess. Having this number too
long risks losing some data.
-
too_old
Some positive time difference between now and the packet time may
be specified here; if a packet is older than now() - too_old, the
packet is not saved to the database. However, it can optionally
be saved to the discards file.
The time can be straight seconds, or
anything understood by str2epoch (e.g., "72:00" is 72 hours or 3 days).
-
too_new
Some positive time difference between the packet time and now may
be specified here; if a packet is newer than now() + too_new (i.e., it's
from the future), the packet is not saved to the database.
However, it can optionally
be saved to the discards file.
The time can be straight seconds, or
anything understood by str2epoch (e.g., "72:00" is 72 hours or 3 days).
-
discards
A file may be specified here, relative to the output database directory,
where packets that are too old or too new are saved in forb(5) format.
This might facilitate debugging the data logger problem implied by packets
from the distant past or far future. It might even allow recovery of the
improperly time tagged data.
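The too_old and too_new tests reduce to two comparisons against the current time. The sketch below takes both limits in seconds; the function name and return values are hypothetical, and what happens to a rejected packet (dropped, or written to the discards file) is left to the caller.

```python
def classify(pkt_time, now, too_old, too_new):
    """Return 'too_old', 'too_new' or 'keep' for a packet time.

    too_old and too_new are positive differences in seconds.
    """
    if pkt_time < now - too_old:
        return "too_old"
    if pkt_time > now + too_new:
        return "too_new"
    return "keep"

now = 1000000.0
print(classify(now - 4 * 86400, now, too_old=72 * 3600, too_new=600))  # too_old
print(classify(now + 3600, now, too_old=72 * 3600, too_new=600))       # too_new
print(classify(now - 60, now, too_old=72 * 3600, too_new=600))         # keep
```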
cdorb2db adds records to the wfdisc table, and optionally to the gaps, retransmit, ratechange,
and changed tables. It also creates waveform files, following the defaults dictated by trwfname(3)
and trdefaults.pf(5) or the command line argument.
See antelopeenv(5).
Continuously collect data from the BHZ channel of station RDM from a
ring buffer on XYZ to a database rdm.
% cdorb2db -m '.*RDM.*' -c '.*_RDM_BHZ' XYZ rdm
Collect data for the last 10 minutes
from the BHZ channels of stations from a ring buffer on XYZ to a
database cels.
% cdorb2db -c '.*_BHZ' XYZ cels '1996298 15:37' -0:10 0:10
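To preview which channels a -c pattern such as those in the examples above would keep, the patterns can be tried against sample net_sta_chan_loc names. This sketch uses Python's re module and assumes the pattern must match the whole name; both the regular expression dialect and the anchoring may differ from what cdorb2db does, and the network code AZ and station PFO are made up for the example.

```python
import re

def select_channels(names, chanmatch):
    """Return the names matched in full by chanmatch (ASSUMED whole-name match)."""
    pattern = re.compile(chanmatch)
    return [name for name in names if pattern.fullmatch(name)]

names = ["AZ_RDM_BHZ", "AZ_RDM_BHN", "AZ_PFO_BHZ", "AZ_PFO_BHZ_00"]
print(select_channels(names, r".*_RDM_BHZ"))  # ['AZ_RDM_BHZ']
print(select_channels(names, r".*_BHZ"))      # ['AZ_RDM_BHZ', 'AZ_PFO_BHZ']
```

Under the whole-name assumption, a channel with a location code (the last name above) is not selected by '.*_BHZ'; a pattern such as '.*_BHZ(_.*)?' would be needed to include it.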
Generally, cdorb2db soldiers on in the face of errors; however,
it gives up when writes or database updates start to fail.
-
"Can't close 'wfdisc->path'."
-
"Couldn't save pktid #pktid database packet for 'srcname'"
-
"Can't create new wfdisc record"
-
"Can't create new waveform data file"
-
"Can't open wfdisc->path to write trace data."
-
"Can't open output database database"
-
"Can't open output table 'database'"
Something is amiss in the output database.
-
discarding data for xx_xxx from packet #nnn
Either calib changed, the segtype changed, or the apparent sampling comb changed.
-
"unknown return code unstuff from unstuffPkt for pktid=pktid at s=strtime(pkttime)"
unstuffPkt(3) was unable to unstuff the specified packet.
-
"Can't read parameter file"
-
"Can't open ring buffer 'orbname'"
The orbserver(1) may not be running, or may be rejecting connections
from this machine.
-
"orbselect 'match' failed"
-
"orbreject 'reject' failed"
Something is wrong with the regular expression match or reject, and
the calls to orbselect(3) or orbreject(3) failed.
-
"orbget to get current server time fails"
-
"orbafter to strtime(params.from) failed"
-
"couldn't compile pattern 're'"
Something is wrong with the regular expression re.
db2msd(1)
orbmonrtd(1)
orbstat(1)
orb2db_msg(1)
orb2db(1)
cdorb2db has many fewer tests and checks than orb2db(1). It presumes that the
datalogger clocks are correct and that every packet contains the right time, and
the clock must be perfectly correct over the recording period. The latest
packet contents overwrite any previous data.
Be cautious about using both cdorb2db and orb2db. If you do, you must be careful
to have disjoint sets of packets going to cdorb2db and orb2db. If both orb2db and
cdorb2db get the same packets, they are likely to try to save them into the same
file in different formats, leaving behind an awful mess.
However, it should be possible to switch from orb2db to cdorb2db or vice versa,
once per day, without messing up the database.
Because cdorb2db keeps many files open, it's often necessary to raise the limit
on open files; in csh, try unlimit descriptors.
If the start and stop times are specified more finely than the packet length,
the actual start and stop times vary from the specification. Data
gaps can also cause start and stop times which vary somewhat from the
spec.
cdorb2db does not detect any problem
when given a source match which doesn't select any sources; it waits
forever for a source matching the regular expression, even if
there is an explicit ending time.
cdorb2db should be run so that its database and its waveform files
are on local, not nfs-mounted partitions. Having the waveforms
on an nfs-mounted partition is particularly troublesome when the partition fills
up.
cdorb2db writes the big-endian integer (s4) datatype on big-endian machines (SPARC
and PowerPC),
and the little-endian integer (i4) datatype on little-endian machines (Intel architectures).
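The difference between the two datatypes is only the byte order of each four-byte sample, as the following sketch shows. The helper names are hypothetical; the s4/i4 meanings are taken from the paragraph above.

```python
import struct
import sys

def native_datatype():
    """Datatype code matching this machine's byte order."""
    return "s4" if sys.byteorder == "big" else "i4"

def pack_samples(samples, datatype):
    """Pack integer samples as s4 (big-endian) or i4 (little-endian)."""
    order = ">" if datatype == "s4" else "<"
    return struct.pack("%s%di" % (order, len(samples)), *samples)

print(pack_samples([1], "s4"))  # b'\x00\x00\x00\x01'
print(pack_samples([1], "i4"))  # b'\x01\x00\x00\x00'
```

A reader of the waveform files therefore needs the datatype recorded in the wfdisc row to decode samples correctly on a machine of the other byte order.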
Daniel Quinlan
Boulder Real Time Technologies, Inc.
Antelope Release 4.8 Darwin 8.6.0 2006-06-28
For more information, contact support@brtt.com