Introduction
Engineers working in software maintenance know what life looks like when a core file arrives from a customer site and the binary was built without debug symbols. A release-mode core file gives you just three pieces of information:
- Function name
- Function offset
- The coring instruction
Thankfully, on Solaris there is a way for the developer to find out which C / C++ statement triggered the core.
Background
In a customer escalation, the first thing every developer should do is find the root cause (RC) and tell the end customer that "we are working on it". To get to the root cause, you need to understand what happened and exactly where the core occurred in terms of the C / C++ source.
Well, at least in Solaris, we are not left alone.
Solaris provides a wonderful tool named er_src.
er_src - print source or disassembly with index lines and interleaved compiler commentary ...
More information can be obtained from the official man page.
Using the Code
Let's take an example to understand the power of er_src.
Consider the following code snippet:
#include <iostream>

using namespace std;

int getSquare(int *pnum);

int main()
{
    int num, *pnum = NULL;   // pnum is never pointed at num
    cout << "This function generates square" << endl;
    cout << "Enter the number which you want to find the square" << endl;
    cin >> num;
    cout << "The square is " << getSquare(pnum) << endl;
    return 0;
}
int getSquare(int *pnum)
{
    cout << "Entering getSquare" << endl;
    int num = *pnum;         // dereferences the NULL pointer passed by main
    return (num * num);
}
As you can probably guess, this program hits a segmentation fault at the statement int num = *pnum; (line 19 in the er_src listing shown later), because the pointer is still NULL. Assume the executable was built in release mode and has dumped core at a customer site. When you get the core from the site and open it, all you get is this:
$dbx test core
For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.6' in your .dbxrc
Reading test
core file header read successfully
Reading ld.so.1
Reading libCstd.so.1
Reading libCrun.so.1
Reading libm.so.2
Reading libc.so.1
Reading libCstd_isa.so.1
Reading libc_psr.so.1
WARNING!!
A loadobject was found with an unexpected checksum value.
See `help core mismatch' for details, and run `proc -map'
to see what checksum values were expected and found.
dbx: warning: Some symbolic information might be incorrect.
program terminated by signal SEGV (no mapping at the fault address)
0x000111ec: getSquare+0x002c: ld [%i0], %i5
At this point, it is hard to tell what this line means in terms of the source:
0x000111ec: getSquare+0x002c: ld [%i0], %i5
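To make the pieces of that line explicit, here is a minimal sketch (not part of dbx or er_src) that splits such a line into address, function name, offset and instruction. The names FaultLine, parseFaultLine and functionStart are made up for illustration, and the pattern assumes the line always has the shape shown above.

```cpp
#include <cstdint>
#include <regex>
#include <string>

// Pieces of a dbx fault line such as
//   0x000111ec: getSquare+0x002c: ld [%i0], %i5
// (hypothetical helper type, not a dbx data structure)
struct FaultLine {
    std::uint64_t address;    // absolute address of the faulting instruction
    std::string function;     // name of the enclosing function
    std::uint64_t offset;     // byte offset from the start of that function
    std::string instruction;  // disassembled instruction text
};

// Fills `out` and returns true when `line` has the assumed shape
// "0xADDR: name+0xOFFSET: instruction"; returns false otherwise.
bool parseFaultLine(const std::string& line, FaultLine& out) {
    static const std::regex pattern(
        R"(^\s*0x([0-9a-fA-F]+):\s*(.+)\+0x([0-9a-fA-F]+):\s*(.*\S)\s*$)");
    std::smatch m;
    if (!std::regex_match(line, m, pattern)) {
        return false;
    }
    out.address = std::stoull(m[1].str(), nullptr, 16);
    out.function = m[2].str();
    out.offset = std::stoull(m[3].str(), nullptr, 16);
    out.instruction = m[4].str();
    return true;
}

// Start address of the function in the binary that produced the core.
std::uint64_t functionStart(const FaultLine& f) {
    return f.address - f.offset;
}
```

For the line above, this gives the function getSquare, the offset 0x2c, and a function start of 0x111ec - 0x2c = 0x111c0 in the release binary.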
This is where er_src lends a helping hand. If you look closely, you can see that the instruction at offset 0x2c from the start of getSquare caused the process to core. Now build a debug executable of the same program and run er_src against it as follows (the special item and tag pair all -1 asks for every function in the object):
$er_src -disasm all -1 test > disasm.txt
Now, look at the resulting disasm.txt.
82 16. int getSquare(int *pnum)
83 <Function: getSquare(int*)>
84 [ 16] 11278: save %sp, -104, %sp
85 [ 16] 1127c: st %i0, [%fp + 68]
86 17. {
87 18. cout << "Entering getSquare" << endl;
88 [ 18] 11280: sethi %hi(0x21800), %l0
89 [ 18] 11284: bset 0, %l0 ! 0x21800
90 [ 18] 11288: sethi %hi(0x11400), %l1
91 [ 18] 1128c: bset 385, %l1 ! 0x11581
92 [ 18] 11290: or %l0, %g0, %o0
93 [ 18] 11294: call std::operator<<(std::basic_ostream<char,std::char_traits<char> >&,const char* ) ! 0x215ec
94 [ 18] 11298: or %l1, %g0, %o1
95 [ 18] 1129c: sethi %hi(0x11000), %l0
96 [ 18] 112a0: bset 840, %l0 ! 0x11348
97 [ 18] 112a4: call std::basic_ostream<char,std::char_traits<char> >::operator<<(std::basic_ostream<char,std::char_traits<char> >&(*)(std::basic_ostream<char,std::char_traits<char> >&)) ! 0x215f8
98 [ 18] 112a8: or %l0, %g0, %o1
99 19. int num = *pnum;
100 [ 19] 112ac: ld [%fp + 68], %l0
101 [ 19] 112b0: ld [%l0], %l0
102 [ 19] 112b4: st %l0, [%fp - 8]
103 20. return (num * num);
104 [ 20] 112b8: ld [%fp - 8], %l0
105 [ 20] 112bc: smul %l0, %l0, %l0
106 [ 20] 112c0: st %l0, [%fp - 4]
107 21. }
108 [ 21] 112c4: ld [%fp - 4], %l0
109 [ 21] 112c8: or %l0, %g0, %i0
110 [ 21] 112cc: ret
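Each instruction line in this listing carries the source line number in brackets, followed by the instruction address. If you want to process disasm.txt programmatically, a rough sketch like the following can pull those two fields out. DisasmLine and parseDisasmLine are hypothetical names, and the pattern is an assumption based only on the lines shown here (an optional index number, then "[ NN] addr: instruction").

```cpp
#include <cstdint>
#include <regex>
#include <string>

// One instruction line of the listing (illustrative type only).
struct DisasmLine {
    int sourceLine;          // number inside the [ NN] column
    std::uint64_t address;   // instruction address (hexadecimal in the listing)
    std::string text;        // instruction text after the colon
};

// Returns true and fills `out` for instruction lines; returns false for
// source lines, <Function: ...> markers and anything else.
bool parseDisasmLine(const std::string& line, DisasmLine& out) {
    static const std::regex pattern(
        R"(^\s*\d*\s*\[\s*(\d+)\]\s*([0-9a-fA-F]+):\s*(.*)$)");
    std::smatch m;
    if (!std::regex_match(line, m, pattern)) {
        return false;
    }
    out.sourceLine = std::stoi(m[1].str());
    out.address = std::stoull(m[2].str(), nullptr, 16);
    out.text = m[3].str();
    return true;
}
```

Source lines such as "99 19. int num = *pnum;" have no bracketed column, so they are simply skipped.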
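The disassembly for the function getSquare starts at address 11278, the save instruction listed right after the <Function: getSquare(int*)> marker.
And we know from the core that the coring instruction sits at offset 0x2c from the start of getSquare.
So 0x11278 + 0x2c = 0x112a4. Aha ... we now have the neighbourhood of the coring instruction, and with it the source.
Now let's apply our learning to the disassembly. Here we go.
[ 18] 112a4: call std::basic_ostream<char,std::char_traits<char> >::operator<<(std::basic_ostream<char,std::char_traits<char> >&(*)(std::basic_ostream<char,std::char_traits<char> >&)) ! 0x215f8
[ 18] 112a8: or %l0, %g0, %o1
19. int num = *pnum;
[ 19] 112ac: ld [%fp + 68], %l0
[ 19] 112b0: ld [%l0], %l0
Of course, it is not an exact match, because the debug build lays out its instructions differently from the release build. Nevertheless, it narrows the search to the end of line 18 and the start of line 19. The load through a pointer at 112b0 on line 19 has the same shape as the coring ld [%i0], %i5 (on SPARC, %i0 holds the function's first incoming argument, here pnum), which points at int num = *pnum; as the culprit.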
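The lookup done by hand above can be sketched in a few lines of C++. This is only an illustration: ListingEntry, sourceLineFor and getSquareListing are hypothetical names, and the table is typed in from the getSquare listing above rather than parsed from disasm.txt.

```cpp
#include <cstdint>
#include <vector>

// One (address, source line) pair taken from the er_src listing.
struct ListingEntry {
    std::uint64_t address;  // instruction address
    int sourceLine;         // number from the [ NN] column
};

// Returns the source line of the instruction at `target`, or of the closest
// instruction before it. Returns -1 if `target` precedes the first entry.
// The entries must be sorted by ascending address.
int sourceLineFor(const std::vector<ListingEntry>& listing,
                  std::uint64_t target) {
    int line = -1;
    for (const ListingEntry& e : listing) {
        if (e.address > target) {
            break;
        }
        line = e.sourceLine;
    }
    return line;
}

// Addresses and line numbers copied from the getSquare listing above.
std::vector<ListingEntry> getSquareListing() {
    return {
        {0x11278, 16}, {0x1127c, 16},
        {0x11280, 18}, {0x11284, 18}, {0x11288, 18}, {0x1128c, 18},
        {0x11290, 18}, {0x11294, 18}, {0x11298, 18}, {0x1129c, 18},
        {0x112a0, 18}, {0x112a4, 18}, {0x112a8, 18},
        {0x112ac, 19}, {0x112b0, 19}, {0x112b4, 19},
        {0x112b8, 20}, {0x112bc, 20}, {0x112c0, 20},
        {0x112c4, 21}, {0x112c8, 21}, {0x112cc, 21},
    };
}
```

With the first getSquare instruction at 0x11278, sourceLineFor(getSquareListing(), 0x11278 + 0x2c) gives 18, while the dereferencing load at 0x112b0 maps to line 19.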
Points of Interest
I hope you now see the power of er_src. It has personally helped me a lot in cracking many puzzles in Solaris cores, so I thought of sharing it with all of you.
Happy coding and debugging!
History
- 11th September, 2014: Version 1