Antelope Release 5.2-64 Mac OS X 10.6.8 2012-04-24

 

NAME

cssconversion - converting between css3.0 and css3.1 databases

DESCRIPTION

In the 5.2 version of Antelope we are offering the option of using a modified version of the css3.0 schema, css3.1. This guide will help you to understand why we are offering a modified version of the css3.0 schema, how css3.1 is different from css3.0 and the procedures you need to follow to use the new css3.1 schema.

BACKGROUND

From the very first days of Antelope we decided to use the css3.0 schema as Antelope's relational database representation of seismic data. At the time Antelope came into being, there were very few other relational database representations of seismic data being used in the world. Also, css3.0 had been in operational use for many years in the nuclear monitoring community. css3.0 had evolved through a series of schema versions, all driven by the needs of operational systems. This meant that css3.0 was the most mature and tested relational database schema for processing and archiving seismic data.

One of the challenges we faced was the need to interface the SEED representations of seismic data with the css3.0 representations of seismic data. Note that SEED is not, nor has it ever been, a relational database. It is possible to map a relational database schema to some of the SEED data structures, but generally this is difficult to do. Instead, SEED is an exchange format with defined binary data structures and documentation of how the fields in these data structures should be used. Probably the biggest challenge we faced in interfacing SEED with the css3.0 schema was in the basic naming conventions used to identify channels of waveform data. css3.0 uses a naming convention where a single unique channel of waveform data is named with a combination of a station code (sta) and a channel code (chan). In the css3.0 schema, the sta code is no longer than 6 characters and the chan code is no longer than 8 characters. SEED uses a naming convention where a single channel of waveform data is named with a combination of four codes: a 2 character network code (snet), a 5 character station code (ssta), a 3 character channel code (schan) and an optional 2 character "location" code (sloc).

STATION AND CHANNEL NAME MAPPINGS

We had to be able to uniquely and reversibly map the SEED snet, ssta, schan and sloc codes into css3.0 sta and chan codes. The channel part was relatively straightforward; we mapped css3.0 chan = SEED schan[_sloc] (the [_sloc] means this part is optional and only used if the SEED data had a non-NULL sloc code). For example, a SEED schan = BHZ with no sloc code results in a css3.0 chan code of BHZ. A SEED schan = LHZ with sloc = 00 results in a css3.0 chan code of LHZ_00. In the other direction, a css3.0 chan code of EHZ_01 transforms into a SEED schan code of EHZ and sloc code of 01. This is an unambiguous, unique and reversible way to transform SEED schan and sloc codes into css3.0 chan codes and vice versa. It works because the css3.0 chan code is large enough (8 characters) to accommodate any combination of SEED schan and sloc codes, since the schan code is not more than 3 characters and the sloc is not more than 2 characters.
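The schan[_sloc] rule can be sketched in a few lines of Python. This is only an illustration of the mapping as described above, not the Antelope implementation; the function names are hypothetical, and treating a blank or all-space sloc as NULL is an assumption.

```python
def seed_to_css_chan(schan, sloc=""):
    """Build a css chan code as schan[_sloc].

    Assumption: an empty or all-space sloc counts as a NULL location code.
    """
    sloc = sloc.strip()
    return f"{schan}_{sloc}" if sloc else schan


def css_chan_to_seed(chan):
    """Split a css chan code back into (schan, sloc); sloc is '' when absent."""
    schan, _, sloc = chan.partition("_")
    return schan, sloc


print(seed_to_css_chan("BHZ"))        # BHZ
print(seed_to_css_chan("LHZ", "00"))  # LHZ_00
print(css_chan_to_seed("EHZ_01"))     # ('EHZ', '01')
```

The longest possible result is 3 + 1 + 2 = 6 characters, which fits within the 8 character css3.0 chan field.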

Unfortunately, we could not follow this same approach for mapping the SEED snet and ssta codes into a single css3.0 sta code. The problem is that the css3.0 sta attribute can be no longer than 6 characters. Since the SEED snet code is 2 characters and the SEED ssta code can be up to 5 characters, we would need sta to be 8 characters long to map a 2 character snet code and 5 character ssta code in the same manner as we did with the SEED schan and sloc codes. At the time we first did this, since ssta codes were "generally" unique by themselves and since most network operators were used to using a single sta code to represent their station location, we just dropped the snet code in the mapping of SEED to css3.0, i.e. for SEED snet, ssta the css3.0 sta code = ssta. This did not provide either a unique or reversible transformation. However, it usually worked in situations where it was not necessary to transform from css3.0 back to SEED. In the early Antelope code base, these mappings were hardwired.
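A small Python sketch makes both problems concrete: the joined snet_ssta form does not fit in a 6 character field, and dropping snet collapses distinct stations onto one code. The network and station codes below are made up for illustration, and the function names are hypothetical.

```python
CSS30_STA_MAX = 6  # css3.0 sta field width, as stated above


def original_sta(snet, ssta):
    """Original mapping: drop snet and keep only ssta."""
    return ssta


def joined_sta(snet, ssta):
    """The snet_ssta form, analogous to the schan[_sloc] mapping."""
    return f"{snet}_{ssta}"


# Two made-up stations sharing a station code in different networks.
pairs = [("XA", "ABC"), ("XB", "ABC")]
codes = {original_sta(snet, ssta) for snet, ssta in pairs}
print(len(codes))  # 1: both stations collapse onto the same css3.0 sta

# A 2 character snet plus a 5 character ssta needs 8 characters.
widest = joined_sta("XA", "ABCDE")
print(len(widest), len(widest) <= CSS30_STA_MAX)  # 8 False
```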

As Antelope evolved it became apparent that we needed a more robust way to make the SEED to css3.0 mappings completely unique and reversible. We accomplished this by introducing two new tables into the css3.0 schema: snetsta, for mapping SEED snet and ssta codes into css3.0 sta codes, and schanloc, for mapping SEED schan and sloc codes into css3.0 chan codes. We preserved the schan, sloc to chan code mappings as before, but these mappings are recorded in the schanloc database table and can be changed by the user if desired. We also preserved the snet, ssta to sta code mappings, but these mappings are now recorded in the snetsta table so that the reverse mapping from css3.0 sta to SEED snet, ssta is supported. Also, with the css3.0 snetsta table, we could check to see if a new SEED ssta code matched a previous code but with a different snet. In those cases it is possible to make a new unique css3.0 sta alias to preserve the different snet-ssta combination. All of this was typically handled through a new software library, the so-called foreign keys library documented in foreign(3), that made up new aliases automatically according to a set of rules. Although this approach allowed for automatic mappings that were always unique and reversible within a single database, the aliasing of the css3.0 sta code did not always provide globally unique and reversible transformations across multiple databases. The best solution to this problem would be for the css3.0 sta code attribute to be long enough to accommodate the same kind of mapping as we used with the css3.0 chan code.

DEVELOPMENT OF THE CSS3.1 SCHEMA

A number of other shortcomings in the css3.0 schema have become apparent since Antelope came into being. As with the css3.0 sta code, many of these shortcomings fall within the category of css3.0 attributes that are too short or do not have enough precision. For instance, the css3.0 time attribute is precise to 10 microseconds whereas the double precision epoch time precision is 1 microsecond. Other examples are the lat and lon attributes in the css3.0 schema, which are only precise to about 11 meters. We identified a number of attributes, like these examples, for which a simple expansion of their sizes would improve the overall processing ability using the modified schema. This did not involve changing the naming of attributes or tables or the ways in which the tables would normally be used.
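The "about 11 meters" figure can be checked with simple arithmetic. The sketch below assumes that css3.0 stores lat and lon with four decimal places of a degree (an assumption that is consistent with the figure quoted above) and uses a mean Earth radius of 6371 km; the function name is hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, an approximation


def meters_per_step(decimal_places):
    """Ground distance along a great circle for one unit in the last
    stored decimal place of a coordinate given in degrees.

    For longitude this is the value at the equator; the distance shrinks
    toward the poles.
    """
    step_deg = 10.0 ** -decimal_places
    return math.radians(step_deg) * EARTH_RADIUS_M


# Assumed css3.0 resolution: 0.0001 degree.
print(round(meters_per_step(4), 1))  # 11.1
```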

In the 5.2 release of Antelope we have included a new css3.1 schema definition. You can run 'dbhelp css3.1' to get a detailed listing of the changes. The sta attribute in css3.1 is 14 characters, more than enough to accommodate the same kind of globally unique and reversible mapping we have been using for the chan attribute. The chan attribute in css3.1 is 15 characters long. This provides a means for concatenating processing identification strings, such as for beam forming or filtering, onto the chan code of the raw data to uniquely represent processed data. There are many other changes as well that all involve expanding the css3.0 attribute sizes. The one place where we made a name change in css3.1 was that we added a new prefmag attribute to the event table. We use the css3.1 schema in the original 5.2 version of the GSN demo.

Why would you want to use css3.1? One reason is to make use of an unambiguous and globally unique and reversible mapping of SEED snet, ssta to css3.1 sta codes. One possibility is to map SEED snet, ssta codes to css3.1 sta = SEED snet_ssta, in the same manner as the chan code mapping. This would result in css3.1 sta codes that look like II_PFO, for station PFO in network II, as an example. This makes it easy to form database queries that involve SEED network codes. For instance, using this mapping you could see all of the IU network stations with something like 'dbpick -sc IU_.* mydatabase'. This is a lot easier than joining the site table with the snetsta table and sifting on the snet code. Also, in display programs like dbpick you can see immediately which network a particular station belongs to. Other places where css3.1 would be useful are any situations where you need the more precise time stamps or locations and situations where you want to archive processed data that has been tagged by appending processing strings onto the raw css3.1 chan codes.
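As a rough illustration of why the snet_ssta form is convenient, the following Python sketch builds a few expanded sta codes and selects the IU stations with the same regular expression used in the dbpick example. The function names are hypothetical, the station list is only illustrative, and matching the expression against the whole sta code is an assumption about how such an expression would be applied.

```python
import re


def expanded_sta(snet, ssta):
    """Expanded mapping: css3.1 sta = snet_ssta."""
    return f"{snet}_{ssta}"


def split_expanded_sta(sta):
    """Reverse the expanded mapping into (snet, ssta)."""
    snet, _, ssta = sta.partition("_")
    return snet, ssta


stations = [
    expanded_sta("II", "PFO"),
    expanded_sta("IU", "ANMO"),
    expanded_sta("IU", "COLA"),
]

# Select one network by expression, with no join against snetsta.
iu_only = [sta for sta in stations if re.fullmatch(r"IU_.*", sta)]
print(iu_only)                       # ['IU_ANMO', 'IU_COLA']
print(split_expanded_sta("II_PFO"))  # ('II', 'PFO')
```

The longest expanded code is 2 + 1 + 5 = 8 characters, well within the 14 character css3.1 sta field.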

USE OF EXPANDED STA NAME MAPPING IN CSS3.1

However, a serious problem with using a new expanded sta name mapping, as we described, is compatibility with any existing css3.0 databases. Since it is not possible to use the snet_ssta style sta name mapping in css3.0 because of the short sta field length in css3.0, switching over to the expanded name mapping as part of a switch to css3.1 ensures that the css3.1 versions of sta codes will not match the css3.0 versions of sta codes. This means that any new css3.1 databases built using the expanded sta name mapping will not be compatible with your existing css3.0 databases. This can obviously cause a number of problems.

One simple way to preserve compatibility across css3.0 and css3.1 databases is to stick to the original sta name mapping definitions in your css3.1 databases. In order to preserve compatibility, the Antelope code base by default will use the old sta name mappings in css3.1 databases. If you really want to switch to the new expanded sta name mapping, then it is advisable to convert all of your existing css3.0 databases to css3.1 databases while simultaneously converting the sta name mapping. This can be done with cssconvert(1), as we will demonstrate.

PLANNING FOR USING CSS3.1

Hopefully this has given you an understanding of the original problem, why we developed the css3.1 schema and why you would want to use it. However, in order to switch over an existing css3.0 based system to css3.1, or if you want to start from scratch with css3.1, there are a number of decisions you will need to make and steps that must be followed in order for your endeavors to be successful. Let's get to that now.

USING THE EXPANDED STA NAME MAPPINGS WITH CSS3.1

Probably the single most important decision you must make is which SEED to css3.1 name mappings you will use. Although the expanded sta name mapping is desirable for the reasons we have given, this name mapping will ensure that your css3.1 databases will have sta codes that are incompatible with any css3.0 databases that you already have or plan to import. Note that generic conversion of a css3.0 database to css3.1, for instance with something like dbconvert(1), will not also convert the SEED to css3.1 name mappings. Therefore, if you use the expanded sta name mapping in your other css3.1 databases, a css3.0 database that has simply been converted to css3.1 will not be compatible with your other css3.1 databases. However, there is a special program designed for conversion between css3.0 and css3.1 databases, cssconvert(1), that will also do the name mapping conversion on-the-fly. You can convert a css3.0 database with the old sta name mapping to a css3.1 database with the expanded sta name mapping by specifying the -use_expanded_sta_mapping option on the command line. Note that cssconvert(1) will require that the input css3.0 database have a snetsta table with entries for each unique sta value in the database. This is the only way that cssconvert(1) can determine the SEED snet code for generating the output css3.1 expanded sta codes. You can see an example of doing this with the css3.0 version of the GSN demo dbmaster database in the cssconvert(1) man page.
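The reason a complete snetsta table is required can be seen in a small Python sketch of the lookup involved. This is not how cssconvert(1) is implemented; the function name is hypothetical, and representing snetsta rows as simple (snet, ssta, sta) tuples is a simplification made for the example.

```python
def expand_sta_codes(sta_codes, snetsta_rows):
    """Map old css3.0 sta codes to snet_ssta codes using snetsta-style rows.

    snetsta_rows is an iterable of (snet, ssta, sta) tuples. Every sta code
    must have a row, because the row is the only source of its snet code.
    """
    lookup = {sta: (snet, ssta) for snet, ssta, sta in snetsta_rows}
    missing = sorted(set(sta_codes) - set(lookup))
    if missing:
        raise KeyError(f"no snetsta entry for sta codes: {missing}")
    return {sta: "{}_{}".format(*lookup[sta]) for sta in sta_codes}


rows = [("II", "PFO", "PFO"), ("IU", "ANMO", "ANMO")]
print(expand_sta_codes(["PFO", "ANMO"], rows))
# {'PFO': 'II_PFO', 'ANMO': 'IU_ANMO'}
```

A sta code with no matching row raises KeyError in this sketch, since there is nothing from which to recover its network code.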

If you decide to use the expanded sta name mapping for your css3.1 databases, you will also need to consider how new SEED channels of data get included into your css3.1 databases. We define a new SEED channel as one that is not already in the database and therefore not referenced in the snetsta table. This can happen in a real time system when new data channels appear in one or more data streams. When a new SEED snet-ssta combination is seen, the Antelope foreign key routines will automatically generate a new entry in the snetsta table for the new data. The name mapping that Antelope will use is defined in the trdefaults.pf parameter file (see trdefaults(5)). We have preserved the default behavior to be that of the original old name mapping. Therefore, if you are acquiring new unseen data into your css3.1 system which already has sta codes generated using the expanded sta code mapping, and you have not changed trdefaults.pf from the default version, the new sta codes will use the old style sta code mapping. In order to avoid this behavior, you will need to modify the seed_net_sta table in trdefaults.pf to look like this:



seed_net_sta   &Tbl{
# pattern          substitution
# net_fsta        sta
# The following pattern maps SEED snet, ssta to css sta = snet_ssta
(.*)_(.*)           $1_$2
}

You should put this modified trdefaults.pf file in the pf directory of your rtexec instances. We do not recommend changing the trdefaults.pf file in $ANTELOPE/data/pf since that would make it difficult to acquire and write out new css3.0 databases throughout your system.
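To see what the pattern and substitution in the seed_net_sta table do, here is a rough Python analogue. It is only a sketch: forming the lookup key as snet_ssta (suggested by the net_fsta comment in the table), requiring the pattern to match the whole key, and taking the first matching rule are all assumptions, and the function name is hypothetical. See trdefaults(5) for the actual behavior.

```python
import re

# (pattern, substitution) pairs copied from the seed_net_sta table above,
# with $1 and $2 written in Python's \1 and \2 notation.
RULES = [(r"(.*)_(.*)", r"\1_\2")]


def map_net_sta(snet, ssta, rules=RULES):
    """Apply the first rule whose pattern matches the snet_ssta key."""
    key = f"{snet}_{ssta}"
    for pattern, substitution in rules:
        match = re.fullmatch(pattern, key)
        if match:
            return match.expand(substitution)
    return None  # no rule matched


print(map_net_sta("II", "PFO"))  # II_PFO
```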

USING THE OLD STA NAME MAPPINGS WITH CSS3.1

Although the expanded sta name mappings are desirable in terms of database clarity and utility, because of the inherent incompatibility between css3.0 and css3.1 when using the expanded sta name mapping, it may be simpler and less prone to mistakes if the original old sta name mapping is used in any new css3.1 database you make. If you stick with the old style sta name mapping, there should be no inherent incompatibilities between your css3.0 and css3.1 databases. You can easily convert a css3.0 database to css3.1 using the old sta name mapping by running either dbconvert(1) or cssconvert(1). If you want to stick to the old sta name mapping you will not need to modify trdefaults.pf.

SCHEMA INTEROPERABILITY

As we have stated repeatedly, if you use the expanded sta name mapping in css3.1, css3.0 will not be generally interoperable with css3.1, except in a few cases such as when running cssconvert(1). Other exceptions are cases where the sta field is not used, such as with external event databases that do not have station-dependent information like arrivals or station magnitudes. If your css3.1 databases use the old style sta name mappings, then for the most part your css3.0 and css3.1 databases will be interoperable.

Aside from the sta name mapping issues, many of the real-time Antelope software modules have been modified to be able to switch database schemas on-the-fly. For instance orb2dbt(1) can take css3.0 external origin rows, generated by dborigin2orb(1), and properly merge them into a css3.1 output database. In this example sta name mapping is not important because the sta field is not in the origin rows produced by dborigin2orb(1).

OTHER CONSIDERATIONS

If you decide to switch to the new expanded sta name mapping, all of your css3.1 sta codes will look different from what they were previously in your css3.0 databases. This means you will have to find every place where you had to specify a css sta code in the various parameter files in your real-time processing systems and modify these to reflect the new sta codes. An example is ttgrid.pf.

You should get into the habit of always using database descriptor files instead of depending on the default behavior of assuming databases are of schema css3.0 (we did not change this default behavior). Programs like seed2db(1) will write out css3.1 databases as long as you provide a database descriptor file for the output database. You can generate css3.1 metadatabases from dataless SEED data with either sta name mapping, depending on trdefaults.pf, by running seed2db(1) with a preexisting output database descriptor file that specifies the css3.1 schema.

If you are using the expanded sta name mapping, it is important that you use css3.1 as the schema for the temporary databases created by orbassoc(1). This is done with the schema parameter in orbassoc.pf.

SEE ALSO

cssconvert(1)
foreign(3)
trdefaults(5)

AUTHOR

Danny Harvey
Boulder Real Time Technologies, Inc.