On the node, run the lssrc command:
# lssrc -s ctrmc
In the output, the Status column for ctrmc should be active.
If the daemon is inoperative, examine err.out.
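The status check can be scripted. The following is a minimal sketch, not an official tool: the function name ctrmc_status is hypothetical, and it assumes the usual lssrc layout of a header line followed by one line per subsystem, with the subsystem name in the first field and the status in the last field.

```shell
# Print the status of the ctrmc subsystem from lssrc output read on stdin.
# Assumption: the first field is the subsystem name and the last field is
# the status (the PID column is empty when the subsystem is inoperative).
ctrmc_status() {
    awk '
        $1 == "ctrmc" { print $NF; found = 1 }
        END { if (!found) print "unknown" }
    '
}

# Usage on the node (not run here):
#   lssrc -s ctrmc | ctrmc_status
```

Reading from stdin keeps the parsing separate from the command itself, so the function can also be applied to output that was captured earlier.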
Diagnosis
During the first level of verification, you can diagnose the RMC connection issues in the following ways:
1. If an active partition has RMC Active:<0> in the lspartition command output, refer to the detailed
diagnostics to address common RMC connection issues.
2. If the lspartition command displays an RMC connection as Active:<1> but the lssyscfg command
displays none or inactive, the data behind these two commands is not in agreement. In this
case, perform the server rebuild operation on the server or restart the HMC. This operation brings the
connection status data back into agreement.
====================================================
Verifying server connection states
You can verify that all the managed servers on the HMC have good connections to the service processor on the
private service network by running the lssyscfg command.
hscroot@myMC:~> lssyscfg -r sys -F name,type_model,serial_num,state
9.3.206.220,9179-MHD,1003EFP,No Connection
9.3.206.223,9179-MHD,1038D0P,No Connection
The following states indicate good connections:
- Operating
- Standby
- Power Off
- Error
- Other transient states, for example, Powering On
The following states indicate problems:
- Incomplete
- No Connection
- Recovery
Note: In HMC Version 7.7.7.0 Service Pack 1, and earlier, existing connections are removed for a
server that is in the No Connection, Incomplete, or Error state, and such a server prevents connections for
newly activated partitions. This restriction does not apply to HMC Version 7.7.7.0 Service Pack 2, and
later.
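The two state lists above can be turned into a small filter. This is a sketch under stated assumptions: the function names classify_state and problem_servers are hypothetical, and the input is expected in the shorter form produced by lssyscfg -r sys -F name,state rather than the four-field form shown in the example.

```shell
# Classify one connection state according to the lists above.
# States that appear in neither list (for example, Powering On)
# are reported as "other".
classify_state() {
    case "$1" in
        Operating|Standby|"Power Off"|Error) echo good ;;
        Incomplete|"No Connection"|Recovery) echo problem ;;
        *) echo other ;;
    esac
}

# Read "name,state" lines on stdin and print the servers whose
# state indicates a connection problem.
problem_servers() {
    while IFS=, read -r name state; do
        if [ "$(classify_state "$state")" = problem ]; then
            printf '%s: %s\n' "$name" "$state"
        fi
    done
}

# Usage against captured HMC output (not run here):
#   lssyscfg -r sys -F name,state | problem_servers
```

The functions are meant for a workstation that has captured the HMC output, because the HMC restricted shell might not allow shell functions to be defined.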
Diagnosis
- If the server connection state is Incomplete, perform a server rebuild operation:
hscroot@trucMC:~> chsysstate -r sys -o rebuild -m CEC_name
- If the server connection state is No Connection, resolve or remove the connection. Common issues that
cause No Connection follow:
  – Improper firewall configuration on the network from the HMC to the Flexible Service Processor (FSP).
  – More than two HMCs are attempting to manage the server.
========================================================================================================
Verifying the IP addresses used for RMC connections
List the HMC IP addresses by using the lshmc HMC command. In this example, the HMC has two
network adapters that have IPv4 and IPv6 addresses:
hscroot@myMC:~> lshmc -n -F ipaddrlpar,ipaddr,ipv6addrlpar
9.53.202.86,9.53.202.86,9.53.202.87,fe80:0:0:0:20c:29ff:fedb:4816,fe80:0:0:0:20c:29ff:fedb:4817
The lshmc command output lists the IP addresses that partitions use to establish RMC communication
with the HMC. The ipaddrlpar parameter is the preferred IP address that is used to establish the
connection. If a connection is not established with this IP address, RMC attempts connections on the
other IP addresses in the listed order.
Diagnosis
If the IP addresses listed in this command are not correct, one or more of the HMC network interfaces is
configured incorrectly.
========================================================================================================
Verifying RMC port configuration
Verify that RMC is accepting requests on port 657 for both TCP and UDP by using the netstat HMC
command:
hscroot@truchmc:~> netstat -tulpn | grep 657
tcp 0 0 :::657 :::* LISTEN -
udp 0 0 :::657 :::* -
Diagnosis
If one of the entries is not listed, restart the HMC.
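The check for both entries can be automated. The sketch below uses a hypothetical function name, rmc_ports_ok, and assumes the column layout shown in the example, where the fourth field is the local address. Unlike a plain grep 657, it matches only a local address that ends in :657, so an unrelated port such as 6570 or a process ID that contains 657 does not count.

```shell
# Succeed only if the netstat output on stdin shows port 657 in the
# local-address column for both a TCP and a UDP socket.
rmc_ports_ok() {
    awk '
        $4 ~ /:657$/ && $1 ~ /^tcp/ { tcp = 1 }
        $4 ~ /:657$/ && $1 ~ /^udp/ { udp = 1 }
        END { exit !(tcp && udp) }
    '
}

# Usage against captured HMC output (not run here):
#   netstat -tulpn | rmc_ports_ok || echo "RMC port 657 is not fully open"
```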
========================================================================================================
Verifying the RMC port for each partition
Verify that the partition's firewall is open and authenticated for port 657 and that the partition is accessible from the
HMC. From the HMC, use the telnet or ssh command to establish a connection to the partition. A successful connection
verifies the network path and authenticates the firewall.
hscroot@truchmc:~> ssh lpar_hostname|IP
This verification must be repeated for each partition as necessary.
Diagnosis
From the HMC GUI, click HMC Management → Change Network Settings → LAN Adapter/Details →
Firewall Settings, and then select Allow RMC.
========================================================================================================
Verifying the HMC RMC port from each partition
Verify whether the HMC firewall is open and authenticated for port 657 and accessible from one or more
partitions.
From the partition, use the telnet command to verify whether the HMC port 657 is open for RMC's use.
# telnet HMC_hostname|IP 657
Diagnosis
The following problems can cause RMC port communication issues:
- The RMC port, specifically TCP 657, is not enabled in the HMC firewall.
Navigate to the HMC firewall settings as described earlier and enable the RMC port.
- RMC has a problem and is not communicating on TCP 657.
Restart the HMC to restart the RMC subsystem.
=======================================================================================================
Verifying partition file systems
Verify that the partition's /var and /tmp file systems are not full.
On each RS/6000® Platform Architecture (RPA) partition that does not have RMC connection to the HMC,
use the df command to display the file system usage:
# df
Filesystem ... Use% Mounted on
/dev/hda2 ... 44% /
/dev/hda3 ... 23% /var
...
Diagnosis
If the /var or /tmp file system is 100% full, remove unnecessary files or increase the file system sizes by
using smitty or the equivalent Linux commands.
After changes are complete to increase the space in /var file system, run the following commands to fix
the potentially corrupted files:
# rmrsrc -s "Hostname!='t'" IBM.ManagementServer
# /usr/sbin/rsct/bin/rmcctrl -z
# rm /var/ct/cfg/ct_has.thl
# rm /var/ct/cfg/ctrmc.acls
# /usr/sbin/rsct/bin/rmcctrl -A
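The file system check that precedes this repair can be scripted. The following sketch uses a hypothetical function name, full_filesystems, and assumes the input comes from df -P, the POSIX output format, in which the fifth field is the capacity and the sixth field is the mount point. The optional threshold argument is an addition for illustration; the text above only calls for action at 100%.

```shell
# Read "df -P" output on stdin and print the /var and /tmp entries whose
# usage is at or above a threshold (default: 100 percent).
full_filesystems() {
    awk -v limit="${1:-100}" '
        NR > 1 && ($6 == "/var" || $6 == "/tmp") {
            use = $5
            sub(/%$/, "", use)          # strip the trailing percent sign
            if (use + 0 >= limit + 0) print $6 " " $5
        }
    '
}

# Usage on the partition (not run here):
#   df -P | full_filesystems        # report only completely full file systems
#   df -P | full_filesystems 90     # report file systems at 90% or more
```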
========================================================================================================
Checking for reused IP addresses
Similar to the Duplicate NodeId state, reused or recycled IP addresses among partitions can cause an
HMC error if a new partition connection is established while the old (probably inactive) connection still
exists.
The lssyscfg -r lpar HMC command can be used to list all the IP addresses for all RMC connections.
When this list is sorted, duplicate RMC addresses are listed adjacent and can be identified.
lssyscfg -r lpar -m CEC_name -F rmc_ipaddr,lpar_id,name,state,rmc_state | sort
When you scan the list, you can identify the duplicate addresses as consecutive entries with the same
first parameter (RMC IP address).
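Scanning a long list by eye is error prone, so the duplicates can be extracted directly. This is a minimal sketch with a hypothetical function name, duplicate_rmc_addrs. It assumes the field order used in the command above, with the RMC IP address first, and it ignores partitions that report no RMC address.

```shell
# Read "rmc_ipaddr,lpar_id,name,state,rmc_state" lines on stdin and print
# each RMC IP address that occurs more than once.
duplicate_rmc_addrs() {
    awk -F, '$1 != "" { print $1 }' | sort | uniq -d
}

# Usage against captured HMC output (not run here):
#   lssyscfg -r lpar -m CEC_name -F rmc_ipaddr,lpar_id,name,state,rmc_state \
#       | duplicate_rmc_addrs
```

An empty result means that no RMC address is shared between partitions on that server.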
Diagnosis
If a duplicate address is identified, determine which IP address is valid or expected and which IP address
is invalid or stale. To correct the problem, complete the following steps:
1. On the HMC, unmanage the server corresponding to the stale RMC connection by running the
following command:
rmsysconn -o remove --ip CEC_IP
2. Wait for 6 minutes or more, then start managing the server again by running the following command:
mksysconn --ip CEC_IP
=======================================================================================================
Checking for MTU size mismatch
Most current versions of RMC require all parties to use the same maximum transmission unit
(MTU) size. The recommended MTU setting for RMC on both the HMC and the partitions is 1500. If jumbo
frames are required, all parties on that network must use jumbo frames.
You can use different MTU sizes on other network interfaces. For example, if different HMC network
adapters are used for the two networks, jumbo frames can be used on the HMC-to-server network (the
Flexible Service Processor (FSP) network) while regular frames (MTU size = 1500) are used for RMC communication.
Different MTU settings between the HMC and the partitions result in a No Connection condition and an
indefinite hang in the partition. This type of hang can be reproduced by running the VIOS lsmap -all command on a
large system, where the large output requires multiple packets to be transferred between the HMC
and VIOS.
To check MTU size on partitions, run the following command:
# ifconfig | fgrep MTU
UP BROADCAST RUNNING MULTICAST MTU:1500
To check whether jumbo frame is enabled on HMC, run the following command:
# lshmc -n
hostname=myhmc,...,jumboframe_eth0=off,lparcomm_eth0=off,..,jumboframe_eth1=on,lparcomm_eth1=on
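The setting that matters for RMC is the jumbo frame state of the interface that carries partition communication. The sketch below uses a hypothetical function name, lparcomm_jumbo, and assumes the key names shown in the example output (jumboframe_<interface> and lparcomm_<interface>). It splits the line on commas, which is a simplification: any fragment that does not start with one of those two keys is ignored.

```shell
# Read "lshmc -n" output on stdin and print each interface that has
# lparcomm_<interface>=on, together with its jumbo frame setting.
lparcomm_jumbo() {
    tr ',' '\n' | awk -F= '
        /^jumboframe_/ { name = $1; sub(/^jumboframe_/, "", name); jumbo[name] = $2 }
        /^lparcomm_/   { name = $1; sub(/^lparcomm_/, "", name);   comm[name] = $2 }
        END {
            for (name in comm)
                if (comm[name] == "on")
                    print name " jumboframe=" ((name in jumbo) ? jumbo[name] : "unknown")
        }
    ' | sort
}

# Usage against captured HMC output (not run here):
#   lshmc -n | lparcomm_jumbo
```

If an interface is reported with jumboframe=on while the partitions use an MTU of 1500, the sizes are mismatched in the way this section describes.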
Diagnosis
The issue can be addressed by either changing the incorrect MTU sizes or by changing the HMC network
interface that is used for RMC communication. To designate a different Ethernet adapter for partition
communication, you can use one of the following options:
- Run the chhmc HMC command.
- Use the HMC GUI (HMC Management → Change Network Settings).
======================================================================================================
Checking for duplicate node ID on the partitions
RMC uses a unique node ID to identify partitions. Having more than one partition with the same node
ID can cause an RMC error.
If a partition is cloned improperly, it can have the same node ID as the partition it was cloned from, causing
connections that are intermittently enabled and disabled between the partitions. The connections are also disabled for
all partitions that share the duplicate node ID.
To determine whether duplicate node IDs exist, consider the following options:
- For partitions with active RMC connections:
From the HMC, as root user, run the /usr/sbin/rsct/bin/rmcdomainstatus -s ctrmc command and
identify any duplicate entries. If the HMC is managing a large number of partitions, this can be a difficult
task.
- On partitions without an active RMC connection:
Compare the /etc/ct_node_id file manually in each partition.
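The manual comparison can be reduced to a duplicate search once the node IDs are collected. The following sketch uses a hypothetical function name, duplicate_node_ids, and expects one "partition node_id" pair per line. How the pairs are collected is left open; the ssh loop in the comment uses hypothetical host names and assumes that the first line of /etc/ct_node_id holds the node ID.

```shell
# Read "partition node_id" pairs on stdin, one per line, and print every
# node ID that is shared, followed by the partitions that share it.
duplicate_node_ids() {
    awk '
        NF >= 2 {
            count[$2]++
            hosts[$2] = (hosts[$2] == "" ? $1 : hosts[$2] " " $1)
        }
        END {
            for (id in count)
                if (count[id] > 1) print id ": " hosts[id]
        }
    ' | sort
}

# One way to build the input (hypothetical host names, not run here):
#   for h in lpar1 lpar2 lpar3; do
#       printf '%s %s\n' "$h" "$(ssh "$h" head -n 1 /etc/ct_node_id)"
#   done | duplicate_node_ids
```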
Diagnosis
To repair duplicate node IDs, complete the following steps on the partitions that have duplicate node IDs:
1. Remove the /etc/ct_node_id file, and then run the recfgct command to generate a new node ID.
Note: You must run the recfgct command only if you do not have any high availability clusters set
up on this node that use the IBM PowerHA SystemMirror or IBM Tivoli System Automation for
Multiplatforms (SAMP) products.
2. If the LPARs are running AIX 6 with 6100-07, or later, run the following commands. The odmdelete
command removes the cluster0 entry from the CuAt ODM:
odmdelete -o CuAt -q name=cluster0
/usr/sbin/rsct/install/bin/recfgct
3. If the LPARs are running AIX 6 with 6100-06, or earlier, run the following command:
/usr/sbin/rsct/install/bin/recfgct
=====================================================================================================
LINKS:
http://www-01.ibm.com/support/knowledgecenter/HW4M4/p8hat/p8hat_hatrmctroubleshoot.htm
http://www-01.ibm.com/support/docview.wss?uid=isg3T1020611
https://www.ibm.com/developerworks/community/blogs/talor/entry/restartrmc?lang=en
RMC CONNECTION ERROR
Reviewed by stefzeer on February 03, 2016