NAME

orb2db - create continuous database from orb ring buffer

SYNOPSIS


orb2db [-a wffile] [-c chanmatch]
       [-m srcmatch] [-r srcreject]
       [-p pf-file] [-S state-file] [-s datatype]
       [-t time-tags] [-T tolerance] [-w wfname] [-v]
           orb db [start-time [window]]

DESCRIPTION

orb2db copies data from the ring buffer orb to a continuous database db. A starting time may be specified, otherwise orb2db attempts to coordinate with an existing database, starting after the earliest of the maximum endtimes for each station/channel in the database.
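
The default starting rule can be illustrated with a short Python sketch. This is not orb2db's code; the (sta, chan, endtime) tuples are a hypothetical stand-in for wfdisc rows, and the function name is invented for the illustration.

```python
from collections import defaultdict

def default_start_time(rows):
    """Return the earliest of the per-channel maximum endtimes.

    rows: iterable of (sta, chan, endtime) tuples, a hypothetical
    stand-in for wfdisc rows; endtime is in epoch seconds.
    Returns None when there are no rows at all.
    """
    latest = defaultdict(lambda: float("-inf"))
    for sta, chan, endtime in rows:
        key = (sta, chan)
        latest[key] = max(latest[key], endtime)
    return min(latest.values()) if latest else None

# Made-up rows: RDM/BHN has the earliest "last endtime" (1500.0).
rows = [
    ("RDM", "BHZ", 1000.0),
    ("RDM", "BHZ", 1600.0),
    ("RDM", "BHN", 1500.0),
    ("PFO", "BHZ", 1700.0),
]
print(default_start_time(rows))
```

Starting at the earliest of these times means no channel is left with a hole, at the cost of rereading some packets for channels that were further along.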

orb2db reads packets from the ring buffer and attempts to fit them into continuous waveform files, typically on day boundaries. It expects packets to fit together; i.e., the packet times and number of samples should fit into a continuous waveform file with a single sample rate. This presumption is not always true: data loggers often have inconstant clocks, packets may be lost, connections may go up and down. For these and other reasons, there may be multiple wfdisc records for a channel in a single day, though in the best case there would be only one.

When starting against an existing database, orb2db attempts to begin saving data immediately after the last data for that channel for that day in the existing database. This avoids overlapping data, but doesn't require a complicated state file, as it's ok to start a little before the time where orb2db previously stopped.

In addition to copying data, orb2db accumulates some statistics about packet errors into the gap, changed, retransmit and ratechange tables.

orb2db expects the sample rate from the packet to be approximately correct, in the sense that the time tags of adjacent packets are within 1/2 sample period of the time calculated from the previous time tag and number of samples and the sample rate. It attempts to handle a clock (or data logger like the Quanterra) where the actual sample rate is slightly different from the rate specified in the packet, by calculating a rate based on the time tags and requiring that the time calculated with this rate and the number of samples match the actual time tag to within 1/2 a sample period. New wfdisc rows are generated when this criterion fails.
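
A minimal Python sketch of this criterion follows. It is a simplified illustration under stated assumptions, not the actual implementation: packets all carry the same number of samples, the rate is re-estimated as samples elapsed divided by time elapsed since the start of the current segment, and the name count_segments is hypothetical.

```python
def count_segments(time_tags, nsamp, nominal_rate, adapt):
    """Count the segments (wfdisc rows) a packet stream would need.

    time_tags: packet start times in epoch seconds, in arrival order.
    nsamp: samples per packet (constant here, for simplicity).
    adapt: if True, re-estimate the sample rate from the time tags.
    """
    half_period = 0.5 / nominal_rate
    segments = 1
    start, total, rate = time_tags[0], nsamp, nominal_rate
    for tag in time_tags[1:]:
        expected = start + total / rate
        if abs(tag - expected) > half_period:
            # Criterion failed: begin a new segment at this packet.
            segments += 1
            start, total, rate = tag, nsamp, nominal_rate
            continue
        if adapt:
            # Rate implied by the time tags since the segment start.
            rate = total / (tag - start)
        total += nsamp
    return segments

# A logger that claims 100 Hz but actually samples at 100.01 Hz.
true_rate = 100.01
tags = [k * 100 / true_rate for k in range(200)]
print(count_segments(tags, 100, 100.0, adapt=False))
print(count_segments(tags, 100, 100.0, adapt=True))
```

With these made-up numbers, trusting the nominal rate forces a new segment every time the accumulated drift passes half a sample period, while the rate estimated from the time tags keeps the whole stream in one segment.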

orb2db is continuously modifying a set of wfdisc records (usually for the current day) as it runs. As with any Datascope database, no other process should be modifying the output database while orb2db is running. However, in the real time system, it is useful to be able to trim this output database by deleting rows representing older data. This trimming improves performance of other programs running against the database.

orb2db implements a simple mechanism for accomplishing this trimming without shutting down and restarting orb2db. A special file is created named after the output database: database.MSGFILE. The first 8 bytes of this file contain a flag which is set while a cleanup is performed, and an integer count of cleanups performed. orb2db tests the flag and the count at every packet; when the flag is set, or the count has changed, orb2db closes its open database, and waits for the flag to be reset. When the flag is reset, it reopens the output database, and finds the new record numbers for the records it is modifying.
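
The test performed at each packet might look like the following Python sketch. The byte layout is an assumption made purely for illustration (two 32-bit integers, flag first, then count); this page does not specify how the 8 bytes are arranged, and the function names are hypothetical.

```python
import struct

# ASSUMPTION: the 8 bytes hold two 32-bit integers, flag then count.
# The real MSGFILE layout is not given in this manual page.
LAYOUT = "=ii"

def parse_msgfile(buf):
    """Decode the (hypothetical) flag and cleanup count from 8 bytes."""
    flag, count = struct.unpack(LAYOUT, buf[:8])
    return flag != 0, count

def must_pause(buf, last_count):
    """True when a cleanup is in progress, or one finished since last_count."""
    busy, count = parse_msgfile(buf)
    return busy or count != last_count

idle = struct.pack(LAYOUT, 0, 3)      # no cleanup, count unchanged
cleaning = struct.pack(LAYOUT, 1, 3)  # cleanup under way
finished = struct.pack(LAYOUT, 0, 4)  # cleanup completed between checks
print(must_pause(idle, 3), must_pause(cleaning, 3), must_pause(finished, 3))
```

Checking the count as well as the flag covers the case where an entire cleanup starts and ends between two packets, so that the record numbers are still refreshed.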

The MSGFILE may be monitored or modified using the program orb2db_msg(1). Because orb2db does not acknowledge that it has paused, it is important to allow sufficient time for orb2db to stop itself before beginning to modify the database. orb2db_msg sleeps for some period of time before returning, in an attempt to assure this time window. In addition, the time required to clean up the database must be considerably less than the time allowed by the ring buffer size, so that no data is lost.

Packet Reordering

For certain data loggers and transmission schemes, data packets may arrive in the orb out of time order. Duplicate packets may be present, and some packets may be lost completely. orb2db has an option to reorder these out of order packets and eliminate duplicates before writing waveforms to disk. This results in cleaner databases, with fewer wfdisc records and fewer overlaps.

When performing this reordering, orb2db keeps a stack of the data from multiple packets for each channel. The size of these stacks is dictated by the max_out_of_sequence parameter.

When max_out_of_sequence is zero, no reordering is performed, and almost every packet received is written to the disk. (The exception is at startup; at startup, orb2db tries to match the start time for each wfdisc record with the previous end time for that channel, eliminating any overlap).

When reordering packets, the max_out_of_sequence parameter dictates the size of each stack, and therefore the maximum number of packets that orb2db holds while waiting for a single late packet. (If there are multiple late packets, then it may wait longer, until the stack is full).
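
The idea of a bounded per-channel stack can be sketched in Python as follows. This is a deliberately simplified illustration, not orb2db's algorithm: it always holds back up to max_out_of_sequence packets rather than releasing contiguous packets early, it identifies duplicates by time tag alone, and the class name ReorderStack is hypothetical.

```python
import heapq

class ReorderStack:
    """Hold up to max_out_of_sequence packets and release them in time order."""

    def __init__(self, max_out_of_sequence):
        self.limit = max_out_of_sequence
        self.heap = []        # (time, data) pairs, smallest time first
        self.pending = set()  # time tags currently held in the stack

    def push(self, time, data):
        """Add a packet; return the packets that are now ready to write."""
        if time in self.pending:
            return []  # duplicate of a packet still in the stack: drop it
        heapq.heappush(self.heap, (time, data))
        self.pending.add(time)
        released = []
        while len(self.heap) > self.limit:
            oldest = heapq.heappop(self.heap)
            self.pending.discard(oldest[0])
            released.append(oldest)
        return released

    def flush(self):
        """Release everything still held, in time order."""
        released = [heapq.heappop(self.heap) for _ in range(len(self.heap))]
        self.pending.clear()
        return released

stack = ReorderStack(max_out_of_sequence=2)
written = []
for t in [0, 1, 3, 2, 2, 4, 5]:  # packet 2 arrives late, then is repeated
    written += stack.push(t, f"pkt{t}")
written += stack.flush()
print([t for t, _ in written])
```

With a limit of zero, every packet is released as soon as it is pushed, which corresponds to no reordering. A duplicate whose first copy has already been released is not caught by this sketch.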

Using the max_out_of_sequence parameter significantly increases the overhead and memory usage of orb2db. Each channel of data typically requires a 4096 byte buffer for writing the waveforms, plus other overhead. Adding a stack adds, for each packet held in the stack,

(4 bytes * #samples)  + some overhead

For example, for max_out_of_sequence=100 and packets with 100 samples each, that's about 500 bytes * 100 per stack, or 50 kbytes for each channel. Compare this with the original 5 kbytes per channel for standard processing.
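
The estimate can be written out as a small Python calculation. The per-packet overhead of 100 bytes is an assumption chosen to match the "about 500 bytes" figure for a 100 sample packet; the real overhead is not stated here, and the function name is hypothetical.

```python
def stack_bytes(max_out_of_sequence, samples_per_packet, overhead_per_packet=100):
    """Rough extra memory per channel for the reordering stack.

    overhead_per_packet is an ASSUMED figure, not a documented one.
    Each sample is taken to occupy 4 bytes.
    """
    per_packet = 4 * samples_per_packet + overhead_per_packet
    return max_out_of_sequence * per_packet

for depth in (50, 100):
    print(depth, stack_bytes(depth, 100))
```

Multiplying the result by the number of channels being recorded gives a rough idea of the additional memory the reordering option requires.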

There is also a startup cost. orb2db knows a packet is late when it sees a packet with a time later than it expects based on the previous packet. When there is no previous packet, it waits until the entire stack is full before concluding that no previous packet is coming. So, using the example above with a stack of 100 one-second packets, it would wait for 100 one-second packets before starting to write data.

In addition to arriving late, some packets may be duplicated. It's easy to detect a duplicate packet if its duplicate is still in the stack; oftentimes, however, the previous copy is already written to disk. orb2db does not keep old copies around. In addition, sometimes the data loggers' clocks misbehave, and the packet times may then jump around: sometimes by a second or two, sometimes by days, months or years.

orb2db differentiates the following possibilities:

Startup Recovery

During startup, orb2db uses the wfdisc table as a state file, and tries to seamlessly start recording new waveforms at the last point written previously. Because the previous shutdown may have been untidy, and because of compression and buffering, the endpoints may be ragged. orb2db attempts to detect various error conditions like waveform files which don't match the wfdisc entry, and various kinds of errors in the wfdisc like impossible nsamp or foff values, or unlikely time ranges. If the time range is greater than the specified preferred_waveform_file_range in the parameter file, or (if too_new is specified) endtime is in the future by at least the parameter too_new, then the wfdisc record is ignored, but endtime is set to time - 1/samprate.
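
The kind of screening described above might be sketched in Python like this. It is only an illustration under stated assumptions: the specific tests for "impossible" nsamp and foff values are guesses, the function name is hypothetical, and the adjustment of endtime for ignored records is not modeled. The names preferred_waveform_file_range and too_new follow the parameters mentioned above.

```python
def usable_as_state(time, endtime, nsamp, foff, now,
                    preferred_waveform_file_range, too_new=None):
    """Return True if a wfdisc record looks safe to resume from.

    All times are epoch seconds. The nsamp/foff tests are ASSUMED
    examples of "impossible" values, not the program's actual checks.
    """
    if nsamp <= 0 or foff < 0:
        return False  # impossible sample count or file offset
    if endtime < time:
        return False  # record ends before it starts
    if endtime - time > preferred_waveform_file_range:
        return False  # spans more than one waveform file should
    if too_new is not None and endtime - now >= too_new:
        return False  # ends too far in the future
    return True

day = 86400.0
now = 1_000_000.0
print(usable_as_state(now - 3600, now - 10, 360000, 0, now, day, too_new=600))
print(usable_as_state(now - 3600, now + 7200, 360000, 0, now, day, too_new=600))
```

The second call describes a record whose endtime lies two hours in the future, which the too_new test rejects.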

OPTIONS

PARAMETER FILE

Some kinds of problems can be quietly saved into the database tables changed, retransmit, ratechange and gap. The parameters below also allow printing error messages at regular intervals beyond a particular threshold error rate.

The following parameters affect whether more detailed information about packet anomalies is saved into database tables. The tables make more detailed analysis possible, but when transmissions are poor, or clocks wander a lot, the tables may grow unreasonably large unreasonably quickly. The default is not to write into the tables.

When a packet arrives out of time order, it is presumed to represent a retransmission of the packet. There are several categories of retransmission; recording each type may be individually suppressed.

Filesystem free space

The following parameters relate to a check for space available on the output waveform filesystem.

FILES

orb2db adds records to the wfdisc table, and optionally to the gap, retransmit, ratechange, and changed tables. It also creates waveform files, following the defaults dictated by trwfname(3) and trdefaults.pf(5) or the command line argument. Waveform files corresponding to delayed packets are saved into files with the same path, but with "@" appended.

ENVIRONMENT

see antelopeenv(5)

EXAMPLE

Continuously collect data from the BHZ channel of station RDM from a ring buffer on XYZ to a database rdm.


% orb2db -m '.*RDM.*' -c '.*_RDM_BHZ' XYZ rdm

Collect data for the last 10 minutes from the BHZ channels of stations from a ring buffer on XYZ to a database cels.


% orb2db -c '.*_BHZ' XYZ cels '1996298 15:37' -0:10 0:10

DIAGNOSTICS

Fatal Errors

Generally, orb2db soldiers on in the face of errors; however, it gives up when writes or database updates start to fail.

Non-Fatal Errors

Command Line errors and Initialization errors

SEE ALSO

orbmonrtd(1)
orbstat(1)
orb2db_msg(1)

BUGS AND CAVEATS

Because orb2db keeps many files open, it's often necessary to raise the limit on open files; try "unlimit descriptors".

If the start and stop times are specified more finely than the packet length, the actual start and stop times vary from the specification; data gaps can also cause start and stop times which vary somewhat from the spec.

orb2db does not detect any problem when given a source match which doesn't select any sources; it waits forever for a source matching the regular expression, even if there is an explicit ending time.

orb2db should be run so that its database and its waveform files are on local, not NFS-mounted, partitions. Having the waveforms on an NFS-mounted partition is particularly troublesome when the partition fills up.

AUTHOR

Daniel Quinlan
Boulder Real Time Technologies, Inc.

Antelope Release 4.10 Mac OS X 10.4.11 2008-05-02
Boulder Real Time Technologies, Inc For more information, contact support@brtt.com