Run rtsnapshot to collect a variety of general information. Then use your knowledge of the problem to collect more specific data. Try to save this information into a file that is an exact recording of what you try and what the result is. Look at script(1) for one way of accomplishing this; otherwise, direct output of various commands to files. See the section DEBUGGING RUNNING PROCESSES for some suggestions for looking at a problem process.
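One way to keep an exact record of what you try is sketched below. The helper name logrun and the default log path are hypothetical, not part of Antelope; the function simply appends each command line, its combined output, and its exit status to a file. Running script(1) with a file name (for example, script /tmp/session.txt) before you start is the simpler alternative when you are working interactively.

```shell
#!/bin/sh
# Hypothetical helper, not an Antelope tool: record a command, its
# combined stdout/stderr, and its exit status in a log file.
LOG=${LOG:-/tmp/troubleshoot.log}   # assumed default location

logrun() {
    # Timestamped copy of the command line, prefixed with "%" like a prompt.
    printf '%s %% %s\n' "$(date -u '+%Y-%m-%d %H:%M:%S')" "$*" >> "$LOG"
    status=0
    "$@" >> "$LOG" 2>&1 || status=$?
    printf '[exit status %d]\n' "$status" >> "$LOG"
    return "$status"
}
```

For example, logrun df -k followed by logrun ps -ef leaves both commands and their results in the log, in the order you ran them.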
However, if the problem is due to resources -- i.e., disks full or not enough memory -- you should shut down the whole system, fix the resource problem, and then restart. For disk full problems, this may mean cleaning up old data files, perhaps with rtdbclean, or excess log files, using truncate_log. If the problem is memory, you might try adding swap space. The better solution in this case is to add physical memory, however.
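Before shutting anything down, it can help to confirm which filesystem is actually full. The following is a minimal sketch, assuming the POSIX df -P output format; the function name full_filesystems and the 90 percent default are illustrative choices, and mount points containing spaces are not handled.

```shell
#!/bin/sh
# Hypothetical helper: read `df -P` output on stdin and print the mount
# point and capacity of every filesystem at or above a threshold
# (first argument, default 90 percent).
full_filesystems() {
    awk -v limit="${1:-90}" 'NR > 1 {
        pct = $5
        sub(/%/, "", pct)             # "96%" -> "96"
        if (pct + 0 >= limit + 0)
            print $6, $5
    }'
}
```

A typical invocation would be df -P -k | full_filesystems 95.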
If you can isolate the problem to a situation which is reproducible, you are well on the way to solving the problem. If the problem is in a BRTT program, you're probably ready to submit a bug report. See bugs(5).
It's essential to be able to reproduce the problem, preferably in much simpler circumstances. BRTT is very unlikely to be able to help if you can't get to this point. And whether the problem is in Antelope or some local programs or configuration, you can't be certain you've fixed the problem unless you have a way of reproducing it. Blind fixes are difficult to verify. Maybe other circumstances have caused the problem to disappear, or maybe you've just succeeded in making some problem less frequent.
Sometimes, processes do not manage their memory correctly, and their memory usage continuously grows. This is a big problem for programs which run all the time. If you notice that a program is always using more memory, you need to fix it (or get it fixed by BRTT if it's an Antelope program).
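To tell steady growth from normal fluctuation, you can sample a process's resident memory over time. This is only a sketch: watch_rss is a hypothetical name, and it assumes a ps that accepts -o rss= and -p, as the Linux, Mac OS X and Solaris versions do. The value reported is the resident set size in kilobytes.

```shell
#!/bin/sh
# Hypothetical helper: print "timestamp rss-in-kilobytes" for a process.
# Usage: watch_rss pid [interval-seconds [count]]
watch_rss() {
    pid=$1
    interval=${2:-60}
    count=${3:-10}
    i=0
    while [ "$i" -lt "$count" ]; do
        # Stop with an error if the process has gone away.
        rss=$(ps -o rss= -p "$pid") || return 1
        # The unquoted $rss drops the padding that ps adds.
        printf '%s %s\n' "$(date '+%Y-%m-%dT%H:%M:%S')" "$(echo $rss)"
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

Something like watch_rss 12345 300 288 > /tmp/rss.log would sample a process with the (made-up) pid 12345 every five minutes for a day; a second column that only ever climbs suggests a leak.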
Linux has something similar to truss named strace. ldd on the executable gives similar information to pldd. I'm not aware of anything comparable to pstack. You can inspect the /proc/<pid> filesystem for some information.
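On Linux, strace -p pid attaches to a running process and ldd /path/to/executable lists its shared libraries. The sketch below pulls a few basic facts out of /proc/<pid>; proc_summary is a hypothetical name, and the fields it greps for are the ones found in a typical Linux /proc/<pid>/status file.

```shell
#!/bin/sh
# Hypothetical helper (Linux only): summarize a process from /proc/<pid>.
proc_summary() {
    dir=/proc/$1
    if [ ! -d "$dir" ]; then
        echo "no $dir (not Linux, or no such process)" >&2
        return 2
    fi
    echo "exe:  $(readlink "$dir/exe")"       # executable being run
    echo "cwd:  $(readlink "$dir/cwd")"       # current working directory
    echo "fds:  $(ls "$dir/fd" | wc -l)"      # number of open descriptors
    grep -E '^(State|VmSize|VmRSS|Threads):' "$dir/status"
}
```

You generally need to own the process, or be root, to read its exe, cwd and fd entries.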
Mac OS X has none of the /proc tools, and nothing like truss/strace or pstack. (However, newer releases have the DTrace tools.)
On any of the architectures, you can use gdb (or some other debugger) to attach to a running program and step through it by hand. This is less likely to be useful, however.
You should be familiar with the interactive mode of orbstat. Using this tool, you can frequently drill down into the details of a problem -- at least for processes which use the orb. Run orbstat on the orbserver which your process connects to, select the packet(s) of interest with select and reject, move to the general time of the problem with after, and then use commands like + and - to inspect the packets. Configure the level of packet detail shown with the commands terse, peek, hdr, unstuff, or dump.
Once you find the right packets, you can save them to a file with save and reap. Here's an example:
    % orbstat -i :your-port
    > select TA_Q12A.*/M1
    1 sources selected
    > [
    #40277115 'TA_Q12A/MGENC/M1': 4/02/2007 (092) 7:32:44.000 : 259 bytes
    > ]
    #40274800 'TA_Q12A/MGENC/M1': 4/04/2007 (094) 22:23:40.000 : 259 bytes
    > after 4/3/2007 15:35:10
    seeking to 4/03/2007 (093) 15:35:10.000
    new pktid is 8430060
    > peek
    > .
    #8430060 'TA_Q12A/MGENC/M1': 4/03/2007 (093) 15:35:11.000 : 256 bytes
     0 : TA Q12A LHZ 1.000/s calib= 1.5894587 calper=-1.000 segtype=V 10 samps
         Tue 2007-093 Apr 03 15:35:11.00000 - 15:35:21.00000
         740 922 959 795 621 466 462 746 975 968
     1 : TA Q12A LHN 1.000/s calib= 1.5894587 calper=-1.000 segtype=V 10 samps
         Tue 2007-093 Apr 03 15:35:11.00000 - 15:35:21.00000
         -174 -152 -154 -91 -23 -14 -123 -276 -220 -68
     2 : TA Q12A LHE 1.000/s calib= 1.5894587 calper=-1.000 segtype=V 10 samps
         Tue 2007-093 Apr 03 15:35:11.00000 - 15:35:21.00000
         126 153 237 207 150 118 105 184 230 181
    > hdr
    > .
    #8430060 'TA_Q12A/MGENC/M1': 4/03/2007 (093) 15:35:11.000 : 256 bytes
     0 : TA Q12A LHZ 1.000/s calib= 1.5894587 calper=-1.000 segtype=V 10 samps
         Tue 2007-093 Apr 03 15:35:11.00000 - 15:35:21.00000
     1 : TA Q12A LHN 1.000/s calib= 1.5894587 calper=-1.000 segtype=V 10 samps
         Tue 2007-093 Apr 03 15:35:11.00000 - 15:35:21.00000
     2 : TA Q12A LHE 1.000/s calib= 1.5894587 calper=-1.000 segtype=V 10 samps
         Tue 2007-093 Apr 03 15:35:11.00000 - 15:35:21.00000
    > terse
    > save /tmp/packets
    saving packets to '/tmp/packets'
    > reap -n 10
    <Enter any character to stop>
    #8433186 'TA_Q12A/MGENC/M1': 4/03/2007 (093) 15:35:21.000 : 258 bytes
    #8436406 'TA_Q12A/MGENC/M1': 4/03/2007 (093) 15:35:31.000 : 259 bytes
    #8439331 'TA_Q12A/MGENC/M1': 4/03/2007 (093) 15:35:41.000 : 258 bytes
    #8442657 'TA_Q12A/MGENC/M1': 4/03/2007 (093) 15:35:51.000 : 259 bytes
    #8445693 'TA_Q12A/MGENC/M1': 4/03/2007 (093) 15:36:01.000 : 258 bytes
    #8448729 'TA_Q12A/MGENC/M1': 4/03/2007 (093) 15:36:11.000 : 256 bytes
    #8451706 'TA_Q12A/MGENC/M1': 4/03/2007 (093) 15:36:21.000 : 256 bytes
    #8455134 'TA_Q12A/MGENC/M1': 4/03/2007 (093) 15:36:31.000 : 259 bytes
    #8457885 'TA_Q12A/MGENC/M1': 4/03/2007 (093) 15:36:41.000 : 258 bytes
    #8461239 'TA_Q12A/MGENC/M1': 4/03/2007 (093) 15:36:51.000 : 256 bytes
    reaped 10 packets
    > <^D>
    %
Now you should be able to run the problem program against these packets, and perhaps reproduce the problem:
    % program /tmp/packets other-arguments
Read the man pages for more information on these parameter files: elog(3), trdefaults.pf(5), site.pf(5).
Sometimes a local parameter file or a parameter file that is seen because of your PFPATH environment variable can cause surprising behavior.
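A quick way to spot an unexpected parameter file is to list every copy that could be picked up. The sketch below assumes that PFPATH is a colon-separated list of directories and that parameter files are named name.pf; pf_copies is a hypothetical helper, not an Antelope command, and it does not try to reproduce the order in which Antelope applies the files, so check the parameter file man pages for that.

```shell
#!/bin/sh
# Hypothetical helper: print each NAME.pf found in the directories listed
# in PFPATH (assumed colon-separated; the current directory if unset).
# The body runs in a subshell so the IFS and globbing changes stay local.
pf_copies() (
    name=$1
    IFS=:
    set -f                      # do not expand wildcards in PFPATH entries
    for dir in ${PFPATH:-.}; do
        if [ -f "$dir/$name.pf" ]; then
            printf '%s\n' "$dir/$name.pf"
        fi
    done
)
```

For instance, pf_copies site prints each site.pf that exists along PFPATH; more than one line of output means a local copy may be changing the values you expect.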
The setup files setup.csh and setup.sh look for environment variables which may cause execution problems. If they complain, you should change your environment until they don't.
bugs(5)
reporting(5)
http://catb.org/~esr/faqs/smart-questions.html
rtsnapshot(1)
dbsnapshot(1)
rtincident(1)
orbstat(1)
Please bear in mind that a complete description of the problem, including an example of how to generate it, what you expected and what you actually saw, and any error messages generated is essential in order to diagnose and fix problems. See bugs(5) for further suggestions on how to report problems.
Do not send problem reports to individual email addresses at BRTT: they will not be answered. To receive a response, use only support@brtt.com. Requests sent to support@brtt.com are read by multiple people and will be responded to in a timely manner.