Thursday 23 October 2014

Core file administration


These are the defaults:
     global core file pattern:
     global core file content: default
     init core file pattern: core
     init core file content: default
     global core dumps: disabled
     per-process core dumps: enabled
     global setid core dumps: disabled
     per-process setid core dumps: disabled
     global core dump logging: disabled

This is what I like to set it to:
     global core file pattern: /var/cores/core_%n_%f_%u_%g_%t_%p
     global core file content: default
     init core file pattern: core
     init core file content: default
     global core dumps: enabled
     per-process core dumps: disabled
     global setid core dumps: enabled
     per-process setid core dumps: disabled
     global core dump logging: enabled
Reasoning:
1. I don't like my core files all over the place
2. Easier to find and clean up.
3. I only enable per-process core dumps when a user needs them (in a zone)
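
If you want to preview the file name that global pattern would produce, a minimal sketch like the one below should do. It assumes the usual coreadm token meanings (%n node name, %f executable name, %u effective UID, %g effective GID, %t time in seconds, %p PID). The function name expand_core_pattern and its arguments are hypothetical, made up for illustration, and are not part of coreadm.

```shell
#!/bin/sh
# Hypothetical helper: preview the core file name a coreadm pattern yields.
# Only the six tokens used in the pattern above are handled, and the values
# must not contain "|" or "&" (they are pasted straight into sed).
expand_core_pattern() {
    pattern=$1 node=$2 exe=$3 uid=$4 gid=$5 when=$6 pid=$7
    printf '%s\n' "$pattern" | sed \
        -e "s|%n|$node|g" \
        -e "s|%f|$exe|g" \
        -e "s|%u|$uid|g" \
        -e "s|%g|$gid|g" \
        -e "s|%t|$when|g" \
        -e "s|%p|$pid|g"
}

# Example with made-up values: host web01, binary httpd, root, PID 4242
expand_core_pattern '/var/cores/core_%n_%f_%u_%g_%t_%p' web01 httpd 0 0 1414051200 4242
```

Having the host, binary, user and PID in the name is what makes a shared /var/cores directory easy to sort through later.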

VALUE | DESCRIPTION | HOW TO SET IT
global core file pattern | Specifies the name and location of global core files (a leading "/" specifies an absolute path) | coreadm -g /var/cores/core_%n_%f_%u_%g_%t_%p
global core file content | Defines the content of global core files | (left at default)
init core file pattern | Specifies the name and location of per-process core files (no leading "/" makes it relative to the working directory) | coreadm -i core
init core file content | Defines the content of per-process core files | (left at default)
global core dumps | Enables/disables global core dumps | coreadm -e global
per-process core dumps | Enables/disables per-process core dumps | coreadm -d process
global setid core dumps | Enables/disables global setid core dumps | coreadm -e global-setid
per-process setid core dumps | Enables/disables per-process setid core dumps | coreadm -d proc-setid
global core dump logging | Enables/disables logging to syslog | coreadm -e log
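
The commands above can be bundled into one small function. This is only a sketch: apply_core_settings and the RUN switch are hypothetical names of mine, and the mkdir step is my own assumption that the target directory has to exist before cores can land in it. By default it only prints the commands, so nothing changes until you ask it to.

```shell
#!/bin/sh
# Sketch: apply my preferred coreadm settings in one go.
# RUN is a hypothetical switch: unset means "echo" (dry run, print only);
# set it to an empty string to execute the commands for real (as root).
apply_core_settings() {
    core_dir=$1
    run=${RUN-echo}
    $run mkdir -p "$core_dir"          # assumption: directory must exist
    $run coreadm -g "$core_dir/core_%n_%f_%u_%g_%t_%p"
    $run coreadm -e global
    $run coreadm -d process
    $run coreadm -e global-setid
    $run coreadm -d proc-setid
    $run coreadm -e log
}
```

Call it as `apply_core_settings /var/cores` to review the commands, then as `RUN= apply_core_settings /var/cores` to apply them. Running coreadm with no arguments afterwards shows the resulting settings.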

Wednesday 2 July 2014

Installing ASR

Auto Service Request (ASR) is a secure, scalable, customer-installable software feature of Oracle Premier Support for Systems and Oracle/Sun Limited Warranty support that provides auto-case generation when specific hardware faults occur. ASR is designed to enable faster problem resolution by eliminating the need to initiate contact with Oracle for hardware failures, reducing both the number of phone calls needed and overall phone time required. ASR also simplifies support operations by utilizing electronic diagnostic data. If your qualified system is under warranty or covered by a service plan, you are entitled to ASR at no charge.
I've been using ASR on my SUN storage for years, and it works brilliantly. It's great to come into work in the morning to emails saying that a disk failed and that a replacement disk is already on its way. That ASR worked a little differently, though: the new implementation needs an ASR server installed. Here are the steps I followed to install ASR:
  1. Created a zone for ASR - I put everything in zones. Bitches love zones. I had a warm standby machine in my DMZ so I used that.
  2. Download the software by logging into support.oracle.com and searching for Oracle Document 1185493.1. To install ASR, you need to install OASM (Oracle Automated Service Manager) first. This doc will give you the links to both ASR and OASM.
  3. Install OASM
    • unzip Oracle_OASM_150_SOLARIS64.zip
    • pkgadd -d SUNWsasm-1.5.0-20130731121217.pkg
    • (I ran the next two steps because of error messages during the install, so it may be better to run them before the install.)
    • pkg install SUNWcar SUNWkvm
    • pkg install pkg:/system/management/service-tag@1.1.5-0.175.1.0.0.23.0
  4. Install ASR
    • unzip Oracle_ASR_481_SOLARIS64.zip
    • pkgadd -d SUNWswasr-4.8.1-20140415212449.pkg
    • export PATH=$PATH:/opt/SUNWswasr/bin:/opt/SUNWsasm/bin
    • put line above in .profile
    • Run: asr
    • asr> register
    • The options are all easy - just make sure your machine can get to transport.oracle.com
  5. Make sure your other machines can talk to the ASR server
    1. First way of doing this:
      • asr> enable_http_receiver -p 8234
    2. Second way of doing this:
      • Edit /var/opt/SUNWsasm/configuration/config.ini
        jetty.enable=true
        jetty.host=<ipaddress> (Don't use "localhost")
        jetty.http.port=8234
      • Restart sasm: svcadm restart sasm
  6. Edit /var/opt/SUNWsasm/configuration/config.ini and set java.exec property to point to your java binary. And then restart sasm with svcadm restart sasm.
  7. Check that it all works with:
    • asr> show_http_receiver
  8. To connect Solaris 11 machines to your new ASR server, run:
    • asradm register -e http://<ip_of_your_server>:8234/asr
And that's it! Next you'll want to log into support.oracle.com and make sure that all your servers' details are correct and what not.
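
The config.ini edits in steps 5 and 6 can also be scripted. Below is a minimal sketch that assumes the file uses plain key=value lines with no spaces around the "="; I haven't checked that against every OASM version. set_config_key is a hypothetical helper name of mine, and on older Solaris releases you may need nawk instead of awk for the -v option.

```shell
#!/bin/sh
# Sketch: set key=value in a properties-style file. An existing line for
# the key is replaced; if the key is absent, a new line is appended.
set_config_key() {
    file=$1 key=$2 value=$3
    tmp="$file.tmp.$$"
    awk -v k="$key" -v v="$value" '
        BEGIN { FS = "="; found = 0 }
        $1 == k { print k "=" v; found = 1; next }   # replace existing key
        { print }                                    # keep other lines
        END { if (!found) print k "=" v }            # append if missing
    ' "$file" > "$tmp" && mv "$tmp" "$file"
}
```

For example, `set_config_key /var/opt/SUNWsasm/configuration/config.ini jetty.http.port 8234`, followed by `svcadm restart sasm` as in the steps above. Keep a copy of the original file before letting a script touch it.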

Tuesday 25 February 2014

Rolling back a Solaris boot environment

Boot environments have been with us since Solaris 10. In my experience they started out pretty buggy, but in the last year or so they've been stable. They have really revolutionized patching for me in terms of cutting down risk, time and effort.

In Solaris 11, boot environments are pretty much built into how you do patching. ("Patching" is a bit of a misnomer in Solaris 11 - "updating/upgrading" are more appropriate terms.)

Recently I had to roll back to a previous boot environment two weeks after the implementation. Rolling back is pretty easy (activate the previous boot environment and reboot); however, you do lose any changes you've made to the OS in the meantime. You can mitigate this risk by mounting the other boot environment and comparing files between the two.

Areas you should check are:

  1. /etc/system and any other config files in /etc
  2. Crons
  3. Zone configurations
  4. Files in root's home directory and other home directories.
These are just the basics. Hopefully, if you've made any major changes since then, you've documented them well enough to figure out how to re-implement them if needed.
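
The comparison of the areas listed above can be sketched in a few lines of shell. compare_be is a hypothetical helper of mine; it assumes the other boot environment is already mounted somewhere (for example with `beadm mount <be-name> /mnt`), and the paths in the usage note are the standard Solaris locations as far as I know, so adjust them for your system.

```shell
#!/bin/sh
# Sketch: report which paths differ between two root directories.
# cur_root is the root of the running system ("" for the live /),
# other_root is where the other boot environment is mounted.
compare_be() {
    cur_root=$1 other_root=$2
    shift 2
    for path in "$@"; do
        # diff -r recurses into directories; a non-zero exit status means
        # the two sides differ or one of them is missing
        diff -r "$cur_root$path" "$other_root$path" >/dev/null 2>&1 ||
            echo "DIFFERS: $path"
    done
}
```

Something like `compare_be "" /mnt /etc/system /var/spool/cron/crontabs /etc/zones /root` gives a quick list of places to look at; run a plain diff on whatever it flags to see the actual changes.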




Thursday 20 February 2014

Solaris Crash Dumps and Basic Analysis

If your Solaris system panics and reboots, it'll probably create a crash dump in /var/crash. You can also force a crash dump either online (using "savecore -L") or as part of a reboot (using "reboot -d").

Normally this is where I stop and upload the /var/crash/vmdump.0 file to Oracle to find out what the problem was. However, you can do some basic investigation yourself using the following steps:
# savecore -f vmdump.0 /somedirectory
# cd /somedirectory
# mdb *0
mdb> ::status
mdb> ::panicinfo
mdb> ::stack
mdb> ::msgbuf
mdb> ::cpuinfo
mdb> ::ps
mdb> ::arc
mdb> ::memstat
(*If the vmdump file is called vmdump.1 then use 1 instead of 0 in the above steps)

Honestly, most of the output is Greek to me, but it's nice to know on the off chance something makes sense.
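
If you'd rather collect all of that output in one go (to attach to a service request, say), the mdb steps above can be fed in on standard input. This is a minimal sketch: crash_report is a hypothetical function name, and it assumes savecore -f has already expanded the dump into unix.N and vmcore.N in the current directory.

```shell
#!/bin/sh
# Sketch: run the basic dcmds from the steps above non-interactively.
# The argument is the dump number (0 for vmdump.0, 1 for vmdump.1, ...).
crash_report() {
    n=${1:-0}
    # mdb reads one dcmd per line from standard input
    printf '%s\n' ::status ::panicinfo ::stack ::msgbuf \
        ::cpuinfo ::ps ::arc ::memstat |
        mdb "unix.$n" "vmcore.$n"
}
```

Usage would be along the lines of `cd /somedirectory && crash_report 0 > crash_report.txt`.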