NAME
trexcerpt - extract waveform segments from a database
SYNOPSIS
trexcerpt
[-AaDdEegOpvV]
[-c sta/chan/instrument-subset]
[-j origin/site_subset]
[-m {time|convert|event|arrival|explicit}]
[-o datatype]
[-s event-subset]
[-w wfname]
[-W waveform_database]
input-db output_db [start_time {end_time|duration}]
DESCRIPTION
It is often useful to create a small database containing a restricted
set of waveforms, typically extracted from a continuous database. The new database
may be smaller, more focused, use a special output format (e.g., sac),
or be easier to export to another site.
trexcerpt attempts to automate
this process in a flexible but convenient fashion.
Modes
trexcerpt has five basic modes of operation. The mode determines
how the waveform segments of interest are selected and how the
start_time and end_time command line parameters are interpreted:
-
time
In the default time mode, the time window must be specified explicitly
on the command line, as start_time followed by either end_time or duration.
-
convert
Convert mode is intended solely for converting waveforms from one
datatype to another. The output waveform
segmentation (i.e., how the continuous waveform is divided into segments)
is the same as the input segmentation.
-
event
In event mode, the waveform segments are selected relative to
events/origins. Typically the segments span some time window around the
predicted P arrival time.
-
arrival
In arrival mode, the picks in a database are used to select the
waveform segments. For instance, it might be useful to
prepare a database containing one minute of data around every
P pick.
-
explicit
In explicit mode, trexcerpt expects to read a
table containing sta, chan, time, and endtime. These values are
taken to specify the desired waveform segments, presumably calculated by
other means. Any suitable table may be used: wfdisc, wfmeas, wftar, and
sensor are all possibilities. This explicit table is often
in a different database from the waveforms, in which case the -W option
must be used to specify the database containing the waveforms.
Process
Conceptually, there are three views which trexcerpt reads or constructs:
-
1) selection view
This depends on the mode, and may optionally be read in rather
than constructed. In time, convert, and explicit modes,
the selection view is the same
as the channel selection view, but in event and arrival modes,
it is a view containing the origin or arrival table, respectively. If a
subset is specified with -s, it is applied to this view.
-
2) channel selection view
This view contains all the station/channel pairs which are of
interest in the output. Only station/channel pairs which occur
in this view appear in the output.
-
3) waveform view
This view includes the waveform table (usually wfdisc), and possibly
other tables which are useful when generating output formats
which embed parameter information in the waveform files.
The process of generating the output consists of the following steps:
-
1)
Read the selection view.
-
2)
Optionally, restrict the interesting time ranges by one of the
following methods:
-
time
specify the time window explicitly
-
event
choose particular events or origins
-
arrival
choose particular arrivals
-
view
use a view specified in the parameter file
-
3)
Based on steps 1 and 2, select the actual data from the available data
in the database.
-
4)
Optionally, attempt to create corresponding event and station tables
to accompany the wfdisc table output.
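Step 3 amounts to intersecting the requested time windows with the segments listed in the waveform table, restricted to the station/channel pairs in the channel selection view. The Python sketch below illustrates only that idea; the Segment class and select_data function are hypothetical names, not part of trexcerpt or Datascope, and the real program also deals with sample alignment, datatypes, and the output tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A station/channel time range: a wfdisc row or a requested window."""
    sta: str
    chan: str
    time: float     # start, epoch seconds
    endtime: float  # end, epoch seconds


def select_data(requests, wfdisc_rows, wanted_channels):
    """Clip available waveform segments to the requested windows (sketch).

    requests:        windows produced by steps 1 and 2
    wfdisc_rows:     waveform segments available in the database
    wanted_channels: set of (sta, chan) pairs from the channel selection view
    """
    out = []
    for req in requests:
        # Only pairs present in the channel selection view reach the output.
        if (req.sta, req.chan) not in wanted_channels:
            continue
        for wf in wfdisc_rows:
            if (wf.sta, wf.chan) != (req.sta, req.chan):
                continue
            start = max(req.time, wf.time)
            end = min(req.endtime, wf.endtime)
            if start < end:  # keep only the overlapping part
                out.append(Segment(wf.sta, wf.chan, start, end))
    return out
```

A request that spans two adjacent wfdisc rows yields two output pieces, one per input segment, in this sketch.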
OPTIONS
-
-A
autodrm(1)-style output: write cm6 output to a single file, using the input view
only to select the channels of interest.
-
-a
append to existing waveform files if present; this is primarily useful
when condensing many separate waveform files into a single
(larger) file, which can help with some mass storage systems.
-
-d
write a complete database, including the site, sensor, sitechan, and instrument tables
-
-D
use dumb mode, which requires only the wfdisc file and does
not look at other tables. This avoids problems of missing channels
which can be encountered with incomplete databases.
Using this option causes the disclaimer from trexcerpt.pf to be printed,
unless the -E "expert" option is also specified.
-
-e
include the correct subset of event, origin, origerr, arrival,
and assoc in the output database.
-
-E
expert mode: don't check the database for problems in site, sensor,
sitechan and instrument tables.
-
-g
eliminate marked data gaps, i.e., gaps in which the waveform has been filled in
with a specific unlikely or impossible value. This option has no effect in convert mode.
-
-j origin/site_subset
If specified, only waveforms from stations which satisfy the specified
condition on a join of origin and site are saved to the output database.
This is useful for restricting the output to stations within a certain
distance range of the event.
-
-m mode
Select the selection mode from those defined in the
parameter file. The default mode is time; other standard possibilities
include event, arrival, convert, and explicit.
-
-O
Don't attempt the speed optimization of copying waveform files directly with cp(1); instead, follow the usual path of
reading and uncompressing the waveforms, then recompressing them and writing out new waveform files.
-
-o datatype
Specify the output waveform datatype, e.g., sd, as, s4, sc, s2, or t4.
For a complete list, run dbhelp css3.0 and look up the datatype attribute of the wfdisc table.
-
-p
In arrival mode, save waveforms only for the channel on which the pick was made.
-
-s subset
restrict the selection view according to the Datascope expression subset
-
-c subset
restrict the channel selection view according to the Datascope expression subset
-
-w wfname
specify the naming convention for the output waveforms, according to
trwfname(3).
The default pattern %Y/%j/%{sta}.%{chan}.%Y.%j.%H.%M.%S
results in names of the form:
1996/296/GLAC.BHZ.1996.296.01.00.10
A useful special case is a pattern whose final character
is '%'. The entire default pattern above then replaces that
trailing percent sign, which makes it simple to specify a special top-level
directory under which the default hierarchy
resides.
In arrival mode, trexcerpt replaces the pattern '%{arid}' with
the arid of the arrival that selected that waveform.
In event mode, '%{evid}' and '%{orid}' are replaced with the
relevant evid and orid, respectively.
-
-W waveform-database
In event mode,
the origin or event table often comes from a different database
than the waveforms; this option specifies the database
containing the waveforms.
Similarly, in explicit mode, the table used to specify station, channel,
and time ranges might be in a different database.
(In explicit mode, any table containing sta, chan, time, and endtime
could be used to specify the required time ranges. wftar or wfmeas might be
good choices, since they are used less frequently.)
-
-v
Show details about what channels are excerpted.
-
-vv
Show the results from the evaluation of various views; this
can be useful when trying to understand why there is no output,
or the output doesn't contain certain stations or channels.
-
-vvv
Show more detail, particularly about use of cp to copy waveforms.
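The -w naming rules can be illustrated with a small Python sketch. It is an approximation written for this page, not trwfname(3) itself: it assumes %Y, %j, %H, %M, and %S behave as in strftime(3) (consistent with the 1996/296/GLAC.BHZ... example under -w), that %{name} is replaced with the named field value, and that a trailing '%' expands to the default pattern. The function name expand_wfname is hypothetical.

```python
from datetime import datetime, timezone

# Default pattern quoted from the -w description.
DEFAULT_PATTERN = "%Y/%j/%{sta}.%{chan}.%Y.%j.%H.%M.%S"


def expand_wfname(pattern, epoch, fields):
    """Approximate trwfname(3)-style expansion (hypothetical sketch).

    pattern: naming pattern, e.g. '%{orid}/%'
    epoch:   segment start time in epoch seconds (UTC)
    fields:  values for %{name} substitutions, e.g. {'sta': 'GLAC'}
    """
    # A trailing '%' is replaced by the entire default pattern.
    if pattern.endswith("%"):
        pattern = pattern[:-1] + DEFAULT_PATTERN
    # Substitute %{name} fields first, so strftime only sees time codes.
    for name, value in fields.items():
        pattern = pattern.replace("%{" + name + "}", str(value))
    # Assumed: %Y, %j, %H, %M, %S have their strftime(3) meanings.
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime(pattern)
```

For a GLAC BHZ segment starting at 1996-296 01:00:10 UTC, the default pattern gives 1996/296/GLAC.BHZ.1996.296.01.00.10, and the event-mode pattern '%{orid}/%' with orid 645 gives 645/1996/296/GLAC.BHZ.1996.296.01.00.10.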
PARAMETER FILE
The
trexcerpt parameter file is fairly complex and
should be edited with caution. It is primarily a means
of configuring the program, rather than of
specifying a single run.
Many entries specify a list of views, rather than a
single view, to accommodate databases of varying
completeness: the first view that yields at least one
row when evaluated is used.
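The "first non-empty view wins" rule can be sketched in Python as follows. The function name, the evaluate callback, and the choice to raise an error when every candidate is empty are assumptions made for illustration; the page does not say what trexcerpt does in that case. In the -vv examples in this page, lines such as "waveform_view : view #0" appear to report which candidate index was used.

```python
def first_nonempty_view(candidates, evaluate):
    """Return (index, rows) for the first candidate view that yields rows.

    candidates: view specifications, in parameter-file order
    evaluate:   callable that evaluates one specification to a list of rows
    """
    for index, spec in enumerate(candidates):
        rows = evaluate(spec)
        if len(rows) > 0:
            return index, rows
    # Assumption: treat "no usable view" as an error in this sketch.
    raise LookupError("no candidate view produced any rows")
```

Listing the most complete join first and a bare table last lets a complete database use the rich view, while an incomplete one falls back to the simpler view.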
-
modes
Each entry in this array may specify a few default values; these
values are overridden by command line arguments.
The wfname value specifies a default format for the output
waveform files.
The start and stop values are default expressions used when
no times are given on the command line.
For event and arrival modes, the view from which events or
arrivals are selected is specified by the view entry, and
the view from which the event-side output tables
are constructed is specified by the event_join entry.
For other modes, the event_database entry is used for this purpose.
-
waveform_view
trexcerpt loads waveforms from a view created by
one of the specifications in this list.
A join containing site,
sitechan, sensor and instrument facilitates propagating
fields from these tables into the output, and is required
for some output types like autodrm(1) and sac(1).
-
channel_selection
The candidate views from which the
desired station/channel pairs are selected. This view is
subset by the -c option, and also by the -s option in time and
convert modes.
-
station_database
The output station side of a database is constructed from
this view.
-
event_database
The output event side of a database is constructed from this
view for time and convert modes; in arrival and event modes,
the event_join views specified under modes are used instead.
-
per_wfdisc_row
Show these fields for each output waveform when the -vv option is set.
-
per_event
In event mode, show these fields for each event when the -vv option is set.
-
per_arrival
In arrival mode, show these fields for each arrival when the -vv option is set.
-
event_leadtime
As an optimization, the input event view is subset according to the time range of
input waveforms, with an extra event_leadtime seconds before the first waveform time.
-
default_output_table
Usually this is wfdisc, but some different table with the same fields
could be specified.
-
conflicts_database
Output database records (other than wfdisc records) which
conflict with the existing output database are saved to
the conflicts database. This might facilitate later editing by
hand.
-
state_of_health_channels
list of channels that have no sensor rows by design; listing them suppresses
complaints about the missing rows.
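The event_leadtime optimization can be pictured with the Python sketch below. It assumes the upper end of the window is the last waveform endtime and that events are matched on origin time; both are assumptions, as is the function name. The lead time presumably exists because an event occurring shortly before the first waveform can still have arrivals inside the data.

```python
def events_in_waveform_span(events, wf_segments, event_leadtime):
    """Keep events whose origin time falls in the waveform time span (sketch).

    events:         list of (evid, origin_time) pairs, epoch seconds
    wf_segments:    list of (time, endtime) pairs for the input waveforms
    event_leadtime: extra seconds allowed before the first waveform time
    """
    if not wf_segments:
        return []
    first = min(t for t, _ in wf_segments)
    last = max(e for _, e in wf_segments)  # assumed upper bound
    lo = first - event_leadtime
    return [(evid, t) for evid, t in events if lo <= t <= last]
```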
EXAMPLE
event mode
-
Get 10 seconds of data centered on the predicted P arrival for every
event in dbin, saving the waveforms into dbout.
% trexcerpt -m event -v -w '%{orid}/%' dbin dbout \
'parrival()-5' 10
-
Get 5 seconds of data starting at the predicted P arrival for every event
in the reb catalog, reading waveforms from the dbwf database and saving the results into out.
Get only the Z channels, and save the results as s4 data in directories
named after the origin id (with -a, each origin's waveforms are appended to the single file e).
% trexcerpt -m event -vvo s4 -c "chan=~/.*Z/" \
-aw '%{orid}/e' -W dbwf reb out 'parrival()' 5
-
Get Z-channel data for the origin with orid 645, and also extract the relevant
rows from the event side (event, origin, arrival, assoc, etc.) and
station side (site, sitechan, sensor, instrument) of the database.
% trexcerpt -m event -c 'chan=~/..Z/' -s 'orid==645' -de \
dbin dbout
-
Excerpt data for stations more than 150 degrees from the event:
% trexcerpt -m event -de \
-j 'distance(lat,lon,site.lat,site.lon)>150' in out
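The -j condition compares an epicentral distance in degrees. The Python sketch below reproduces that test with the spherical haversine formula. It is an approximation and assumes a spherical earth, which may not match Datascope's distance() to every decimal place. distance_deg, far_stations, and the (sta, lat, lon) tuples are hypothetical names used only here.

```python
import math


def distance_deg(lat1, lon1, lat2, lon2):
    """Great-circle distance in degrees on a sphere (haversine formula)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    # min() guards against rounding pushing a slightly above 1.
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, a))))


def far_stations(origin, sites, min_deg=150.0):
    """Return station codes more than min_deg degrees from origin=(lat, lon)."""
    olat, olon = origin
    return [sta for sta, lat, lon in sites
            if distance_deg(olat, olon, lat, lon) > min_deg]
```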
arrival mode
-
Excerpt data around each arrival. The second example shows
one way to specify the time range around the arrival;
modifying trexcerpt.pf is another.
% trexcerpt -m arrival -vv -w '%{arid}/%' in out
% trexcerpt -vv -m arrival in out time-1 time+2
4850 5/17/1992 21:55:46.350 P AAK BHZ
AAK BHE 5/17/1992 21:55:45.400 2.900 seconds ..
AAK BHZ 5/17/1992 21:55:45.400 2.900 seconds ..
AAK BHN 5/17/1992 21:55:45.400 2.900 seconds ..
4851 5/17/1992 21:55:51.350 P KBK BHZ
KBK BHZ 5/17/1992 21:55:50.400 2.900 seconds ..
KBK BHE 5/17/1992 21:55:50.400 2.900 seconds ..
KBK BHN 5/17/1992 21:55:50.400 2.900 seconds ..
-
excerpt data only for the pick channel:
% trexcerpt -m arrival -s 'chan!="BHZ"' -p -vv $db arrivals
modes{arrival{view}} : view #0
## dbopen arrival => 12 records
## dbsubset iphase!="del" => 10 records
waveform_view : view #0
## dbopen sensor => 39 records
## dbjoin sitechan => 39 records
## dbjoin site => 39 records
## dbjoin -o instrument => 39 records
## dbleftjoin wfdisc sta chan time::endtime => 18 records
channel_selection : view #0
## dbopen sensor => 39 records
## dbjoin sitechan => 39 records
## dbjoin site => 39 records
## dbjoin -o instrument => 39 records
4855 5/17/1992 21:56:15.340 S AAK BHN
AAK BHN 5/17/1992 21:56:10.350 14.950 seconds 300 20.0000000 0 sd
4856 5/17/1992 21:56:24.434 S KBK BHN
KBK BHN 5/17/1992 21:56:19.450 14.950 seconds 300 20.0000000 0 sd
4857 5/17/1992 21:56:27.289 S CHM BHN
CHM BHN 5/17/1992 21:56:22.300 14.950 seconds 300 20.0000000 0 sd
4858 5/17/1992 21:56:28.129 S USP BHE
USP BHE 5/17/1992 21:56:23.150 14.950 seconds 300 20.0000000 0 sd
-
excerpt data only for selected channels:
% trexcerpt -m arrival -s 'chan!="BHZ"' \
-c "chan=='BHZ'" -vv $db arrivals
modes{arrival{view}} : view #0
## dbopen arrival => 12 records
## dbsubset iphase!="del" => 10 records
waveform_view : view #0
## dbopen sensor => 39 records
## dbjoin sitechan => 39 records
## dbjoin site => 39 records
## dbjoin -o instrument => 39 records
## dbleftjoin wfdisc sta chan time::endtime => 18 records
channel_selection : view #0
## dbopen sensor => 39 records
## dbjoin sitechan => 39 records
## dbjoin site => 39 records
## dbjoin -o instrument => 39 records
4855 5/17/1992 21:56:15.340 S AAK BHN
AAK BHZ 5/17/1992 21:56:10.350 14.950 seconds 300 20.0000000 0 sd
4856 5/17/1992 21:56:24.434 S KBK BHN
KBK BHZ 5/17/1992 21:56:19.450 14.950 seconds 300 20.0000000 0 sd
4857 5/17/1992 21:56:27.289 S CHM BHN
CHM BHZ 5/17/1992 21:56:22.300 14.950 seconds 300 20.0000000 0 sd
4858 5/17/1992 21:56:28.129 S USP BHE
USP BHZ 5/17/1992 21:56:23.150 14.950 seconds 300 20.0000000 0 sd
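The window construction in arrival mode can be sketched in Python as follows. It mirrors the 'time-1 time+2' example (1 second before and 2 seconds after each pick) and the -p option (keep only the picked channel). The function, its parameters, and the channels mapping are hypothetical. The real output in the examples is also aligned to sample times (2.900 s rather than 3 s), which this sketch ignores.

```python
def arrival_windows(arrivals, channels, before=1.0, after=2.0,
                    pick_channel_only=False):
    """Build (arid, sta, chan, start, end) windows around each arrival.

    arrivals:          list of (arid, sta, chan, time) tuples
    channels:          mapping of sta -> channels to excerpt
    before, after:     offsets in seconds, like 'time-1 time+2'
    pick_channel_only: mimic -p by keeping only the picked channel
    """
    windows = []
    for arid, sta, chan, time in arrivals:
        chans = [chan] if pick_channel_only else channels.get(sta, [])
        for c in chans:
            windows.append((arid, sta, c, time - before, time + after))
    return windows
```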
convert mode
-
Excerpt all data for station AAK, channel BHZ, converting the waveforms
to sac format. Note that this attempts to fill in event and arrival
information if the input database contains it; however, the choice of
relevant events and arrivals is only a guess.
% trexcerpt -o sc -vde -m convert \
-c "sta=='AAK' && chan=='BHZ'" dbin dbout
-
Get 25 seconds of autodrm(1)-style waveforms starting at 1992-138 21:55:15 (May 17, 1992):
% trexcerpt -A demo db "1992-138 21:55:15" 0:00:25
explicit mode
-
Use the times from a temporarily constructed table chop.wfmeas to
excerpt explicit waveform segments from the dbwf database.
Each row in chop.wfmeas specifies a station, channel, and time range
to excerpt.
% trexcerpt -o as -m explicit -W dbwf -v -d chop.wfmeas out
RETURN VALUES
trexcerpt returns 0 on success and 2 if some errors occurred; fatal errors
should return 1.
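When trexcerpt is driven from a script, the documented exit statuses can be interpreted as in the Python sketch below. The helper names are hypothetical. The mapping of 1 to a fatal error follows the hedged wording above. Only subprocess.run from the standard library is used, and the trexcerpt arguments are whatever you would type at the shell.

```python
import subprocess

# Exit statuses as documented under RETURN VALUES.
EXIT_MEANINGS = {0: "success", 1: "fatal error", 2: "some errors occurred"}


def describe_exit(code):
    """Translate a trexcerpt exit status into a short description."""
    return EXIT_MEANINGS.get(code, f"unexpected exit status {code}")


def run_trexcerpt(args):
    """Run trexcerpt with a list of arguments; return (code, description)."""
    result = subprocess.run(["trexcerpt", *args])
    return result.returncode, describe_exit(result.returncode)
```

For example, run_trexcerpt(["-m", "event", "dbin", "dbout"]) returning status 2 suggests that some output was produced but should be checked.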
SEE ALSO
dbwfexcerpt_dep(1)
BUGS AND CAVEATS
trexcerpt requires a complete and correct input database to
figure out the stations and channels to copy to
the output. If the input database is missing stations or channels
in the site, sitechan or sensor table, then these channels
(somewhat mysteriously) do
not appear in the output, even if they are present in the wfdisc.
The css3.0 schema limits the number of samples that can be represented
in a single wfdisc record: the nsamp field is 8 characters wide, so the
maximum number of samples is 99,999,999. At 200 samples/sec, this is
about 5.8 days; waveform files typically hold 1 day of data.
The user must not request more than the maximum number of samples in an
output record. If the limit is exceeded, trexcerpt complains and does
not create the requested output.
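The nsamp limit can be checked before running trexcerpt with the Python arithmetic below. It assumes the css3.0 convention endtime = time + (nsamp - 1)/samprate, so a window includes both endpoints. The small epsilon only absorbs floating-point error. The function names are hypothetical.

```python
import math

MAX_NSAMP = 99_999_999  # largest value that fits the 8-character nsamp field


def nsamp_for_window(duration, samprate):
    """Samples in a window, counting both endpoints."""
    return math.floor(duration * samprate + 1e-9) + 1


def max_duration_seconds(samprate):
    """Longest window whose sample count still fits in one wfdisc row."""
    return (MAX_NSAMP - 1) / samprate


def exceeds_nsamp_limit(duration, samprate):
    """True if a single output record would need more than MAX_NSAMP samples."""
    return nsamp_for_window(duration, samprate) > MAX_NSAMP
```

At 200 samples/sec, max_duration_seconds(200) is 499,999.99 seconds, about 5.79 days. A one-day request fits, but a six-day request does not.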
When creating the station and event sides of the output database,
trexcerpt does not copy external files, specifically the response files.
When the -j option is used with the -d or -e options, the waveforms
are correctly subset, but the event and station sides of the database
are likely to be larger than required, because the -j subset is not
applied to the event and station views used to construct the output.
When creating miniseed files, the net and loc codes are filled in
using the routines seed_net(3) and seed_loc(3). See trdefaults.pf(5)
for more information about how these routines use the foreignkeys database.
When writing sac format data,
trexcerpt attempts to fill in event data by looking for an arrival
at the station in the waveform segment being written, and using that
arrival to look up origin information (through the assoc table).
Thus origin information is not saved into the waveforms when there
is no picked arrival in the data, even in event mode.
When writing sac data, if a file named verbose exists in the current
directory, trexcerpt prints additional information showing which arrival
matches were found for particular data segments.
For a plain waveform copy, where the input parameters (time, endtime, datatype)
match the output parameters, trexcerpt attempts to simply copy the file with cp(1)
instead of reading and uncompressing the waveform, then recompressing and writing
it. This improves speed dramatically, but bypasses the processing
that eliminates marked gaps, so the optimization is skipped when -g is specified
(it can also be disabled with -O).
The optimization can also copy a much larger file than
necessary if the waveform file is referenced by multiple wfdisc records.
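The conditions for the cp(1) shortcut described above can be summarized in a small Python predicate. This is a sketch of the stated rules, not trexcerpt's code: the copy is allowed only when time, endtime, and datatype match and neither -g nor -O was given. WfRequest and can_copy_with_cp are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WfRequest:
    """The parameters that must match for a plain copy."""
    time: float
    endtime: float
    datatype: str


def can_copy_with_cp(inp, out, eliminate_gaps=False, no_optimize=False):
    """Decide whether cp(1) could replace the read/rewrite path (sketch).

    eliminate_gaps: the -g option was given
    no_optimize:    the -O option was given
    """
    if eliminate_gaps or no_optimize:
        return False
    return (inp.time, inp.endtime, inp.datatype) == (out.time, out.endtime, out.datatype)
```

This predicate does not detect the multiply-referenced-file case noted above, which is one reason to use -O when output size matters.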
AUTHOR
Daniel Quinlan