analytic

Tuesday 30 October 2012

Recovering Linux without getting into rescue mode

In large organizations, servers are mostly managed remotely through a console (iLO/DRAC/IMM). Large companies also change the root password on all servers periodically, usually through scripts.  If, for any reason, the root password was not updated on a server and that server crashes because of a file system problem, the system admin has to spend a lot of time on recovery: single user mode is not reachable, because the file system has issues and the root password is not known.  In these situations, the system admin has to boot the system from a Linux boot CD and get into rescue mode to recover the server or the root password. Large organizations usually do not keep a CD at the datacenter either, so system admins are expected to use boot.iso (which comes with the Linux distribution) to get into rescue mode.  Since the ISO image is attached from the admin's desktop, a high-end server can take a very long time to boot into rescue mode.

However, there is an easy way to recover the server quickly in these situations.  Linux allows system admins to pass kernel parameters at boot time (from the GRUB screen). The rc.sysinit script, which is run by init, is responsible for checking the file system(s) and mounting them in read/write mode.  If the system admin skips the init process, the file system check never runs and the kernel drops you straight to the # prompt.  Follow the steps below to keep init from being loaded during boot.

1. Reboot the server.
2. At the GRUB prompt, press "a" to provide additional kernel parameters.
3. Add "init=/bin/bash" at the end and press "Enter". We are simply telling the kernel to load /bin/bash in place of init.  The kernel starts /bin/bash instead of init and drops you to the # prompt. But since the rc.sysinit script has not been executed, the root file system is mounted read-only. Hence you need to remount the root file system in read/write mode before making any changes to the system configuration.
4. Mount the root file system in read/write mode using the following command:

bash-3.1# mount -o remount,rw /

5. Now you can do everything that you can do from single user mode: update any config file, scan file systems for errors, or even enable/disable services at boot time. Once you are done, type exit and hit Enter.  Because bash is running in place of init, exiting it produces the error message "Kernel panic - not syncing: Attempted to kill init!". Ignore the error and hard reboot the server.

Cheers,
Vaibhav

Saturday 27 October 2012

RPM Package Management a Quick How To

Quick How to on Red Hat Package Manager (RPM)

How to install a package together with all its dependency packages, if all the packages are available in a common repository?
Ans : rpm -ivh --aid packagename (the --aid option is only available in older rpm releases).

How to check where a particular package installed its configuration files?
Ans : rpm -qc packagename.

How to check the change log of the installed package.
Ans : rpm -q --changelog packagename.

How to check where a particular package installed its doc files?
Ans : rpm -qd packagename

How to check all the files installed by package?
Ans : rpm -q --filesbypkg packagename

How to check the version (and other summary information) of an installed package?
Ans : rpm -qi packagename

How to check the dependencies of a particular package, i.e. required libraries, packages etc.?
Ans : rpm -q -R packagename.

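How to upgrade packages which are already installed on the Linux box?
Ans : rpm -F [install-options] packagename, for example rpm -Fvh packagename. The -F (freshen) option upgrades a package only if an earlier version is already installed.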

What is the command to update only the rpm database.
Ans : rpm -i --justdb packagename

What is the command to check whether a particular package installation would succeed, without actually installing the package?
Ans : rpm -ivh --test packagename

How to check which package a particular file belongs to?
Ans : rpm -qf filename

How to list files in a package
Ans : rpm -ql packagename

How to verify whether the files installed by a package are intact or have been tampered with/corrupted?
Ans : rpm -V packagename (rpm -qs packagename only lists the state of each file in the package)

What is the command to create a new RPM Database
Ans : rpm --initdb

What is the command to rebuild the RPM Database
Ans : rpm --rebuilddb

Cheers!!
Vaibhav

Enabling HTTPS Support in JBOSS


Perform the below steps to enable HTTPS support in JBOSS :

1. Execute the following command to generate encryption key for HTTPS.

/usr/java/jdk1.5.0_14/bin/keytool -genkey -keyalg RSA -keystore jboss.keystore -validity 3650

2. The command will ask for some information such as name, company, country etc. Provide the information, along with a keystore password and a key password for mykey. The command creates a file named jboss.keystore in your current working directory.

3. Copy the keystore file to $JBOSS_HOME/server/all/conf/ folder.

4. Edit the $JBOSS_HOME/server/all/deploy/jboss-web.deployer/server.xml file and uncomment the SSL/TLS Connector section. Also update the keystore file location and the keystore password in server.xml as shown below (keystorePass must match the keystore password you entered in step 2).

<Connector port="8443" protocol="HTTP/1.1" SSLEnabled="true"
maxThreads="150" scheme="https" secure="true"
clientAuth="false"
keystoreFile="${jboss.server.home.dir}/conf/jboss.keystore"
keystorePass="server" sslProtocol = "TLS" />

5. Restart JBOSS.

Friday 26 October 2012

Reserved Blocks in Linux

Reserved blocks are disk blocks reserved by the kernel for processes owned by privileged users, to prevent the operating system from crashing when critical processes run out of storage space.   For example, imagine a 14 GB root file system that is 100% full: non-privileged user processes can no longer write data to the root file system, whereas processes owned by the privileged user (root by default) can still write to it. With the help of reserved blocks, the operating system keeps running for hours or sometimes days even though the root file system is 100% full.

The default reserved block percentage is 5% of the total size of the file system, and it can be increased or decreased as required.  However, reducing it below 5% is not recommended.  Reserved blocks are supported on ext2 and ext3 file systems.


How to check how many blocks are reserved :


To check the number of blocks reserved on a file system, execute the following command.

[root@VCSNode2 ~]# dumpe2fs -h /dev/VolGroup00/LogVol00  | grep -i block
dumpe2fs 1.39 (29-May-2006)
Block count: 3637248
Reserved block count:  181862
Free blocks: 2709898
First block:  0
Block size:  4096
Reserved GDT blocks:  887
Blocks per group:  32768
Inode blocks per group:   1024
Reserved blocks uid:  0 (user root)
Reserved blocks gid:  0 (group root)
Journal backup:   inode blocks
[root@VCSNode2 ~]#

To find the reserved block percentage, compare the "Block count" and "Reserved block count" values.  In the above example, Block count (total blocks) = 3637248 and Reserved block count = 181862, so the reserved block percentage is 181862/3637248*100, i.e. 5% (the default value).
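
The same comparison can be scripted. The following is only a minimal sketch: reserved_pct is a hypothetical helper name (it is not part of e2fsprogs), and it assumes that dumpe2fs -h output in the format shown above is piped into it.

```shell
# reserved_pct: hypothetical helper (not part of e2fsprogs).
# Reads "dumpe2fs -h" output on stdin and prints the reserved
# block percentage, rounded to two decimal places.
reserved_pct() {
    awk -F: '
        /^Block count:/          { total = $2 }
        /^Reserved block count:/ { reserved = $2 }
        END {
            if (total > 0) printf "%.2f\n", reserved / total * 100
        }'
}

# Sample values copied from the output above; prints 5.00
printf 'Block count: 3637248\nReserved block count:  181862\n' | reserved_pct
```

On a real system it could be fed with something like dumpe2fs -h /dev/VolGroup00/LogVol00 | reserved_pct.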

How to change/configure reserve block percentage value :

The reserved block percentage can be set when the file system is created, or changed afterwards.  To set it at creation time, pass the -m option followed by the desired value (in percent) to the mkfs command.

To set the reserved block percentage value after creating file system, use the following command.
[root@VCSNode2 ~]# tune2fs -m 3 /dev/VolGroup00/LogVol00
tune2fs 1.39 (29-May-2006)
Setting reserved blocks percentage to 3% (109117 blocks)
[root@VCSNode2 ~]#

The above command sets the reserved block percentage to 3% of the total block count.
If you do not want to provide a percentage and would rather set an exact number of reserved blocks, you can do that with the command below.

[root@VCSNode2 ~]# tune2fs -r 109117 /dev/VolGroup00/LogVol00
tune2fs 1.39 (29-May-2006)
Setting reserved blocks count to 109117
[root@VCSNode2 ~]#
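
The 109117 figure reported by tune2fs can be reproduced by hand. The sketch below only illustrates the arithmetic; reserved_blocks and blocks_to_mib are hypothetical helper names, and rounding down is an assumption that happens to match the output shown above.

```shell
# reserved_blocks: hypothetical helper showing the arithmetic only.
# $1 = total block count, $2 = reserved percentage (whole number).
# Integer division rounds down (assumed to match tune2fs here).
reserved_blocks() {
    echo $(( $1 * $2 / 100 ))
}

# blocks_to_mib: converts a block count to whole MiB.
# $1 = number of blocks, $2 = block size in bytes.
blocks_to_mib() {
    echo $(( $1 * $2 / 1048576 ))
}

reserved_blocks 3637248 3     # prints 109117
blocks_to_mib 109117 4096     # prints 426
```

With the 4096-byte block size reported by dumpe2fs, a 3% reservation on this file system is therefore roughly 426 MiB.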


Which user can access reserved blocks:


By default, only the root user is allowed to use reserved blocks, but this can be changed. To allow another user or group to use reserved blocks, execute the following commands.

[root@VCSNode2 ~]# tune2fs -u oracle /dev/VolGroup00/LogVol00
tune2fs 1.39 (29-May-2006)
Setting reserved blocks uid to 500
[root@VCSNode2 ~]#
[root@VCSNode2 ~]# tune2fs -g dba /dev/VolGroup00/LogVol00
tune2fs 1.39 (29-May-2006)
Setting reserved blocks gid to 500
[root@VCSNode2 ~]#

To check which user/group is allowed to access reserved blocks, execute the following command.

[root@VCSNode2 ~]# dumpe2fs -h /dev/VolGroup00/LogVol00 | grep -i reserved
dumpe2fs 1.39 (29-May-2006)
Reserved block count:  109117
Reserved GDT blocks:  887
Reserved blocks uid: 500 (user oracle) // oracle user is allowed to access reserved blocks
Reserved blocks gid: 500 (group dba) // dba group is allowed to access reserved blocks
[root@VCSNode2 ~]#

If the available storage is limited, it is a good idea to set this option for Oracle/MySQL database servers, where there is a risk of the operating system or database crashing because a file system is full. This option gives the system administrator/database administrator some extra time to fix space issues and prevents the operating system/database from crashing.

Note : Reducing the reserved block percentage or changing the reserved block uid/gid is never recommended on the root file system.

Cheers !!
Vaibhav

Thursday 25 October 2012

Scanning SCSI Bus and HBA Devices in Linux

How to scan HBAs to acquire newly added storage:

Loop Initialization Protocol (issue_lip): Loop Initialization Protocol is a method of performing a bus reset, which scans the SCSI bus and updates the SCSI layer to reflect all the SCSI devices. This is an asynchronous operation and may take a long time to complete. Usually, after executing the command that triggers issue_lip, you get the command prompt back immediately, but the bus scan itself may take longer to complete.

Scanning the HBAs with the Loop Initialization Protocol method can cause delays and I/O timeouts if the HBA/device is in use, and it can also remove devices unexpectedly. Hence this method is not recommended on any production server where SAN devices are already configured and in use. This type of scan is recommended on a newly built server to scan all the LUNs/devices.

To perform the bus scan using this method, execute the following command as root.

for host in /sys/class/fc_host/host*; do echo "1" > "$host/issue_lip"; done   # scans all the HBAs one by one; a single redirect to host*/issue_lip fails when several HBAs match

If you want to scan any specific HBA, then execute the command as below

echo "1" > /sys/class/fc_host/hostN/issue_lip   # replace N with the number of the HBA to scan

Scanning specific device/path :

If you have a running server on which you need to scan a specific device or a path to a device, the issue_lip method is not recommended; use the method below to scan for the updated device information.

To scan a specific device/lun, you need the following information at minimum.

1. WWNN Number of the HBA to scan

2. Lun number (which is assigned from SAN)

Once you have the above mentioned information, you can identify the HBA number, HBA Channel and SCSI Target ID information by executing the following command.

grep 1111111111111111 /sys/class/fc_transport/*/node_name   # replace 1111111111111111 with the actual WWNN

The command displays output like the following:

/sys/class/fc_transport/target4:0:1/node_name:0x5006016090203181

In this case, the HBA number (host) is 4, the HBA channel is 0 and the SCSI target ID is 1. If the LUN number is 11, execute the following command to scan this device.

echo "0 1 11" > /sys/class/scsi_host/host4/scan   # here 4 is the HBA number
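
The mapping from the fc_transport path to the scan command can be sketched as a small helper. This is only an illustration: scan_command is a hypothetical name, and it prints the command instead of running it so the result can be reviewed first.

```shell
# scan_command: hypothetical helper.
# $1 = a path such as /sys/class/fc_transport/target4:0:1/node_name
# $2 = the LUN number assigned from the SAN
# Prints (does not execute) the matching scsi_host scan command.
scan_command() {
    path=$1
    lun=$2
    # Extract "host channel target" from ".../targetH:C:T/..."
    hct=$(printf '%s\n' "$path" |
        sed -n 's|.*/target\([0-9]*\):\([0-9]*\):\([0-9]*\)/.*|\1 \2 \3|p')
    [ -n "$hct" ] || return 1
    # Split into positional parameters: $1=host $2=channel $3=target
    set -- $hct
    printf 'echo "%s %s %s" > /sys/class/scsi_host/host%s/scan\n' \
        "$2" "$3" "$lun" "$1"
}

scan_command /sys/class/fc_transport/target4:0:1/node_name 11
# prints: echo "0 1 11" > /sys/class/scsi_host/host4/scan
```

The three values written to the scan file are channel, target ID and LUN, in that order; the host number only selects which scsi_host entry receives them.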

If you do not have the HBA channel, SCSI target ID or LUN ID information and want to scan all devices on a specific HBA, execute the following command.

echo "- - -" > /sys/class/scsi_host/hostN/scan   # replace N with the HBA number

Cheers !!
Vaibhav

Wednesday 24 October 2012

Killing Multiple Processes

I was working on an issue today where passwordless ssh was not working for one user.  When I checked, I found more than 3000 defunct processes on the server owned by that user, and the system refused to start new processes because the user's nproc limit had been reached. The same user was also running hundreds of other processes which were not defunct.  The server was running multiple applications, so I didn't have the liberty to reboot it. Hence, the only option I had was to kill all the processes running under the ownership of that specific user.

In this scenario, killing the processes one by one is a very tedious task.  Also, since there were so many defunct processes on the server, it was very difficult to identify and kill the parent process of each defunct process.  I decided to write a long command using awk and grep.

Below is the single command I used to kill all the processes owned by that user (xyz), which saved me a lot of time.


linux-ydn1:~ # ps -ef | grep -i xyz| awk '{print "kill -9 ", $2}' | sh


Command explanation : 

1.  The first part of the command (ps -ef) lists all the processes running on the server.
2.  The second part (grep -i xyz) keeps only the lines containing the word xyz.  This can be customized based upon the specific requirement.
3.  The third part (awk) takes the pid from the second column of each line and prefixes it with the kill command.
4.  The fourth and last part (sh) executes each line printed by the first three parts of the command.
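
Note that grep -i xyz matches any line containing xyz, including the grep process itself and other users' commands that merely mention xyz. A stricter variant, sketched below, matches only on the user column. kill_cmds_for_user is a hypothetical helper name, and the sample process list is made up for illustration.

```shell
# kill_cmds_for_user: hypothetical helper.
# Reads "ps -ef" output on stdin and prints one kill command per
# process whose UID column (field 1) is exactly the given user.
kill_cmds_for_user() {
    awk -v user="$1" '$1 == user { print "kill -9", $2 }'
}

# Dry run on made-up sample data: only the two xyz-owned PIDs
# (4101 and 4103) are printed; the root-owned grep is skipped.
printf '%s\n' \
    'UID PID PPID C STIME TTY TIME CMD' \
    'xyz 4101 1 0 10:01 ? 00:00:00 app' \
    'root 4102 1 0 10:01 ? 00:00:00 grep -i xyz' \
    'xyz 4103 4101 0 10:02 ? 00:00:00 [app] <defunct>' |
    kill_cmds_for_user xyz
```

On a live system, ps -ef | kill_cmds_for_user xyz shows what would be killed, and appending | sh executes it. Where the procps tools are installed, pkill -9 -u xyz should achieve the same result in one step.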

Cheers !!
Vaibhav

Tuesday 23 October 2012

sort: open failed: +1: No such file or directory


Newer versions of POSIX are not fully compatible with older versions; as a result, the parameters/switches of some commands, such as sort, have changed.
For example, if the POSIX version is 200112 or above, the "sort +2" command throws the error message "sort: open failed: +2: No such file or directory", because the newer version 200112 (1003.1-2001) is not fully compatible with older POSIX versions such as 199209 (1003.2-1992).
[root@vcsnode1 ~]# ps -ef | sort +2
sort: open failed: +2: No such file or directory
[root@vcsnode1 ~]#
As a result, the sort command needs to be executed as below to sort the output starting at column three, which is equivalent to sort +2:
[root@vcsnode1 ~]# ps -ef | sort -k 3 | head
root 1 0 0 May18 ? 00:00:00 init [5]
root 2670 1 0 May18 ? 00:00:00 auditd
root 2929 1 0 May18 ? 00:00:00 automount
avahi 3400 1 0 May18 ? 00:00:00 avahi-daemon: running [vcsnode1.local]
root 3835 1 0 May18 ? 00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root 1990 1 0 May18 ? 00:00:00 /bin/sh - /usr/lib/vxvm/bin/vxconfigbackupd
root 1988 1 0 May18 ? 00:00:00 /bin/sh - /usr/lib/vxvm/bin/vxrelocd root
root 3281 1 0 May18 ? 00:00:00 crond
root 3034 1 0 May18 ? 00:00:00 cupsd
dbus 2812 1 0 May18 ? 00:00:00 dbus-daemon --system
[root@vcsnode1 ~]#
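
The old +N syntax counted fields from zero, while -k counts from one, which is why +2 becomes -k 3. A minimal sketch on made-up sample data follows; sort_from_third is a hypothetical wrapper name, and LC_ALL=C is set only so that the ordering does not depend on the locale.

```shell
# Sort on everything from the third field onward (the modern
# spelling of the obsolete "sort +2"). LC_ALL=C is set here only
# to make the ordering independent of the locale.
sort_from_third() {
    LC_ALL=C sort -k 3
}

printf 'b 2 z\na 1 y\nc 3 x\n' | sort_from_third
# prints:
# c 3 x
# a 1 y
# b 2 z
```

The lines come out ordered by their third column (x, y, z), regardless of what the first two columns contain.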
The current version of POSIX can be checked by executing the following command
[root@vcsnode1 ~]# getconf -a | grep ^POSIX2_VERSION
POSIX2_VERSION 200112
[root@vcsnode1 ~]#
If for any reason you still want to use the old syntax of the sort command, there is a workaround available.
If you set the _POSIX2_VERSION environment variable to an older POSIX version value, you can still run the sort command with the +2 switch.
[root@vcsnode1 ~]# ps -ef | sort +2
sort: open failed: +2: No such file or directory
[root@vcsnode1 ~]# export _POSIX2_VERSION=199209
[root@vcsnode1 ~]# ps -ef | sort +2 | head
root 1 0 0 May18 ? 00:00:00 init [5]
root 2670 1 0 May18 ? 00:00:00 auditd
root 2929 1 0 May18 ? 00:00:00 automount
avahi 3400 1 0 May18 ? 00:00:00 avahi-daemon: running [vcsnode1.local]
root 3835 1 0 May18 ? 00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root 1990 1 0 May18 ? 00:00:00 /bin/sh - /usr/lib/vxvm/bin/vxconfigbackupd
root 1988 1 0 May18 ? 00:00:00 /bin/sh - /usr/lib/vxvm/bin/vxrelocd root
root 3281 1 0 May18 ? 00:00:00 crond
root 3034 1 0 May18 ? 00:00:00 cupsd
dbus 2812 1 0 May18 ? 00:00:00 dbus-daemon --system
[root@vcsnode1 ~]#