Wednesday, February 27, 2013

Solaris core dump recovery

A crash dump is an image of kernel memory at the time of a problem. It can be generated intentionally by an administrator, or the kernel itself may decide to panic and write a dump.

A dump is essential for root cause analysis (RCA), to figure out what went wrong with the operating system.

The most important practical aspect is the size of the dump. A dump can be as large as physical memory (RAM), although most of the time it is smaller than that.
So you need enough free space on the filesystem that holds the savecore directory. If there is not enough space, savecore cannot save the dump after the panic.
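A rough pre-check of that worst case can be scripted. The sketch below is a hypothetical helper, not a Solaris tool: it assumes `prtconf` prints a line of the form "Memory size: N Megabytes" and that you pass in the free space (in MB) of the savecore directory yourself. Since compressed dumps are usually much smaller than RAM, treat a failure as a warning rather than a hard limit.

```shell
#!/bin/sh
# Hypothetical pre-check: is there room for a worst-case (RAM-sized) dump?
# Reads prtconf output on stdin ("Memory size: N Megabytes") and compares
# N with the free space in MB passed as $1.
# Returns 0 if it fits, 1 if it does not, 2 if the memory size is not found.
dump_space_ok() {
    ram_mb=$(sed -n 's/^Memory size: *\([0-9][0-9]*\) Megabytes.*/\1/p' | head -n 1)
    [ -n "$ram_mb" ] || return 2
    [ "$1" -ge "$ram_mb" ]
}
```

On Solaris you might call it as `avail=$(df -k /var/crash | awk 'NR==2 {print int($4/1024)}'); prtconf | dump_space_ok "$avail" || echo "savecore directory may be too small"`, assuming the `df -k` output for that directory fits on one line.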

Example of the message logged when the dump cannot be saved (no dump available):
Jan 20 01:37:41 <hostname> savecore: [ID 976488 auth.error] not enough space in /var/ak/core (0 MB avail, 10223 MB needed)
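The numbers in parentheses tell you how much space to free or add. The hypothetical `savecore_shortfall` function below is a sketch that assumes the message keeps exactly the "(N MB avail, M MB needed)" format shown above; it extracts both numbers and prints the missing megabytes.

```shell
#!/bin/sh
# Hypothetical helper: print how many MB must be freed (or added via quota)
# for savecore to succeed, parsed from a "not enough space" message.
# Prints nothing and returns 1 if the message does not match.
savecore_shortfall() {
    nums=$(printf '%s\n' "$1" |
        sed -n 's/.*(\([0-9][0-9]*\) MB avail, \([0-9][0-9]*\) MB needed).*/\1 \2/p')
    [ -n "$nums" ] || return 1
    set -- $nums            # $1 = MB available, $2 = MB needed
    if [ "$2" -gt "$1" ]; then
        echo $(( $2 - $1 ))
    else
        echo 0
    fi
}
```

For example, on a system that logs to /var/adm/messages: `savecore_shortfall "$(grep 'not enough space' /var/adm/messages | tail -1)"`. For the message above it prints 10223.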

If the cores are saved on a ZFS filesystem, you can free space in that dataset or increase its quota. First check where the dump device and the savecore directory are:

# dumpadm
 Dump content: kernel and current process pages
 Dump device: /dev/zvol/dsk/system/dump (dedicated) <------ dump device
 Savecore directory: /var/crash    <----- directory where core will be saved 
 Savecore enabled: yes
 Save compressed: on
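To script the space checks, it helps to pull the savecore directory out of the `dumpadm` output. A minimal sketch, assuming the "Savecore directory:" label shown above; `savecore_dir` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: read dumpadm output on stdin and print the path
# after "Savecore directory:", ignoring anything that follows the path.
savecore_dir() {
    sed -n 's/^ *Savecore directory: *\([^ ]*\).*/\1/p'
}
```

Usage: `df -h "$(dumpadm | savecore_dir)"` shows how much space is free where savecore will write.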


# zfs list |egrep "core|dump"
system/cores                                       48.0G      0  48.0G  legacy
system/dump                                        12.0G   264G  12.0G  -


We can see above that the cores dataset is already full: old dump files use 48.0G and 0 bytes are available. So I will delete the old dumps to free some space.
If the space is still insufficient, I will increase the quota of system/cores:
# zfs set quota=100GB system/cores


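You can recover the core dump image by doing the following as soon as the system reboots after the panic, before a later dump overwrites the one still held on the dump device:
1. Increase the filesystem size or quota, or find another location with enough free space to save the dump.
2. Run one of the following commands:
# savecore -v                    //save the dump to the savecore directory configured with "dumpadm"
OR
# savecore -dv <new_location>    //save the dump to <new_location>; -d makes savecore ignore the dump header flag that marks the dump as already saved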
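The two commands in step 2 can be wrapped in a small function. This is a hypothetical sketch: `recover_dump` and the `DRY_RUN` variable are illustrative names, not part of Solaris. With no argument it runs `savecore -v`; with a directory it checks that the directory exists and runs `savecore -dv <dir>`.

```shell
#!/bin/sh
# Hypothetical wrapper around the savecore commands from step 2.
# No argument: save to the directory configured with dumpadm.
# One argument: save to that alternate directory (it must already exist).
# Set DRY_RUN=1 to print the command instead of running it.
recover_dump() {
    if [ $# -eq 0 ]; then
        set -- savecore -v
    else
        if [ ! -d "$1" ]; then
            echo "recover_dump: no such directory: $1" >&2
            return 1
        fi
        set -- savecore -dv "$1"
    fi
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

For example, `DRY_RUN=1 recover_dump /export/crash` prints the command it would run, assuming /export/crash exists; drop `DRY_RUN=1` to actually save the dump.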

