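This morning a long-standing customer reported that node 1 of their Oracle 11g RAC cluster could not start the database and that none of their applications could connect to it. We logged in to the customer's DB servers remotely and found that many RAC resources had failed to start.
Symptoms:
1. Node 1 of the Oracle 11g RAC cluster could not start the CRS service, and the database could not start either;
2. The Oracle 11g RAC database could not be reached; TNS access reported that the correct connect string could not be found.
Resolution:
Issue 1: we tried to start the services on RAC node 1 manually: crsctl start crs
The cluster log on node 1 showed a large number of disk access errors: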
Unable to discover any voting files, retrying discovery in 15 seconds
Unable to discover any voting files, retrying discovery in 15 seconds
Unable to discover any voting files, retrying discovery in 15 seconds
Unable to discover any voting files, retrying discovery in 15 seconds
Unable to discover any voting files, retrying discovery in 15 seconds
Unable to discover any voting files, retrying discovery in 15 seconds
Unable to discover any voting files, retrying discovery in 15 seconds
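After a period of retries, the cluster services on node 1 failed to start. In our experience this usually points to a connectivity fault between node 1 and the storage, so after discussing it with the customer we advised them to swap the link port or the interface media. Once the customer reached the machine room and made that change, the services on RAC node 1 started normally. The database was now up, but the VIP, SCAN IP and related resources were still OFFLINE.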
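Before sending someone to the machine room, it can help to quantify the symptom. The following is a minimal sketch, assuming the clusterware log is available as a plain text file; `count_votedisk_errors` is a hypothetical helper name, and the log path is left to the caller because it depends on the Grid home.

```shell
# Count voting-file discovery failures in a clusterware log.
# $1 = path to the log file (supplied by the caller; location varies per install).
count_votedisk_errors() {
  grep -c 'Unable to discover any voting files' "$1"
}

# Standard clusterware checks that are useful alongside the count:
#   crsctl check crs            # is the stack responding on this node?
#   crsctl query css votedisk   # voting files seen by CSS (needs CSS running)
```

A count that keeps growing while the healthy node sees its voting files is consistent with a storage path problem on the affected node only.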
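Resolving issue 2:
Starting the failed resources by hand had no effect. Talking to the customer, we learned that for business reasons they had readdressed the servers' network: they had changed the servers' IP addresses and the contents of /etc/hosts.
In the customer's updated hosts file, essentially every address except the private network had changed, including the VIPs and the SCAN IP. The customer assumed that editing this file was enough, but the RAC resources are bound to the network information supplied when RAC was first installed. To confirm this, we checked what the cluster currently had registered: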
[grid@xjlytnw1 ~]$ srvctl config vip -n xjlytnw1
VIP exists: /xjlytnw1_vip/10.109.254.15/10.204.91.0/255.255.255.0/eth0, hosting node xjlytnw1
[grid@xjlytnw1 ~]$ srvctl config vip -n xjlytnw2
VIP exists: /xjlytnw2_vip/10.109.254.16/10.204.91.0/255.255.255.0/eth0, hosting node xjlytnw2
[root@xjlytnw1 bin]# ./srvctl config scan
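SCAN name: xjlytnw-scan, Network: 1/10.204.91.0/255.255.255.0/eth0
SCAN VIP name: scan1, IP: /xjlytnw-scan/10.109.91.205
As shown, the VIPs are still registered on the old 10.204.91.0 subnet and the SCAN IP is also bound to its original address, so both have to be re-registered manually.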
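One way to spot a stale binding mechanically is to check whether each VIP address actually falls inside the subnet registered next to it. The sketch below does that for a single line of `srvctl config vip` output. `ip_to_int` and `check_vip_line` are illustrative helper names, not Oracle tooling, and the parsing assumes the slash-separated layout shown above (it splits on "/" so the message language does not matter).

```shell
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() (
  IFS=.
  set -- $1
  echo "$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))"
)

# $1 = one line of `srvctl config vip` output, e.g.
#   "VIP exists: /name/ip/subnet/netmask/iface, hosting node host"
# Prints OK or MISMATCH and returns non-zero on a mismatch.
check_vip_line() (
  IFS=/
  set -- $1
  # After splitting: $2 = VIP name, $3 = IP, $4 = subnet, $5 = netmask
  name=$2 ip=$3 subnet=$4 mask=$5
  if [ "$(( $(ip_to_int "$ip") & $(ip_to_int "$mask") ))" -eq "$(ip_to_int "$subnet")" ]; then
    echo "OK $name: $ip is in $subnet/$mask"
  else
    echo "MISMATCH $name: $ip is not in $subnet/$mask"
    return 1
  fi
)

# Usage (assumed invocation, run as the grid user):
#   srvctl config vip -n xjlytnw1 | while read -r line; do check_vip_line "$line"; done
```

Fed the first VIP line above, this reports a MISMATCH, because 10.109.254.15 masked with 255.255.255.0 gives 10.109.254.0 rather than the registered 10.204.91.0.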
Before changing anything, stop all VIP and listener resources:
[grid@xjlytnw1 ~]$ srvctl stop listener -n xjlytnw1
[grid@xjlytnw1 ~]$ srvctl stop listener -n xjlytnw2
[grid@xjlytnw1 ~]$ srvctl stop vip -n xjlytnw1
[grid@xjlytnw1 ~]$ srvctl stop vip -n xjlytnw2
Update the SCAN IP registration:
[root@xjlytnw1 bin]# ./srvctl modify scan -n xjlytnw-scan
Confirm:
[root@xjlytnw1 bin]# ./srvctl config scan
SCAN name: xjlytnw-scan, Network: 1/10.204.91.0/255.255.255.0/eth0
SCAN VIP name: scan1, IP: /xjlytnw-scan/10.109.254.17
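`srvctl modify scan -n` takes only the SCAN name and re-resolves it, so the registered address is only as good as name resolution on the node. As a rough cross-check, assuming the SCAN name is resolved through a hosts-format file as in this case, the hypothetical helper `hosts_ip_for` below prints the address a name maps to so it can be compared with the `srvctl config scan` output.

```shell
# Print the first IP address that a hosts-format file maps to a given name.
# $1 = host name to look up, $2 = hosts file (defaults to /etc/hosts).
hosts_ip_for() {
  awk -v n="$1" '
    { sub(/#.*/, "") }                       # drop comments
    { for (i = 2; i <= NF; i++)              # field 1 is the IP, the rest are names
        if ($i == n) { print $1; exit } }
  ' "${2:-/etc/hosts}"
}

# Usage (assumed): compare with the IP shown by `srvctl config scan`
#   hosts_ip_for xjlytnw-scan
```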
Update the VIP registration:
[root@xjlytnw1 bin]# srvctl modify nodeapps -n xjlytnw1 -A 10.109.254.15/255.255.255.0/eth0
[root@xjlytnw1 bin]# srvctl modify nodeapps -n xjlytnw2 -A 10.109.254.16/255.255.255.0/eth0
Confirm the VIP information:
[grid@xjlytnw1 ~]$ srvctl config vip -n xjlytnw1
VIP exists: /xjlytnw1_vip/10.109.254.15/10.109.254.0/255.255.255.0/eth0, hosting node xjlytnw1
[grid@xjlytnw1 ~]$ srvctl config vip -n xjlytnw2
VIP exists: /xjlytnw2_vip/10.109.254.16/10.109.254.0/255.255.255.0/eth0, hosting node xjlytnw2
Once everything checks out, start the VIP and listener resources (the SCAN IP starts automatically along with them):
[grid@xjlytnw1 ~]$ srvctl start vip -n xjlytnw1
[grid@xjlytnw1 ~]$ srvctl start vip -n xjlytnw2
[grid@xjlytnw1 ~]$ srvctl start listener -n xjlytnw1
[grid@xjlytnw1 ~]$ srvctl start listener -n xjlytnw2
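The order of operations matters here: stop listeners and VIPs, change the registrations, then start VIPs before listeners. The sketch below only prints that sequence for review and executes nothing. `print_vip_change_plan` and its argument layout are assumptions made for illustration; the `srvctl` commands it emits are the same ones used above.

```shell
# Print the re-registration commands in the order used in this case.
# $1 = netmask, $2 = public interface, $3 = SCAN name,
# remaining args = node=new_vip pairs. Nothing is executed.
print_vip_change_plan() (
  netmask=$1; iface=$2; scan=$3; shift 3
  for pair in "$@"; do echo "srvctl stop listener -n ${pair%%=*}"; done
  for pair in "$@"; do echo "srvctl stop vip -n ${pair%%=*}"; done
  echo "srvctl modify scan -n ${scan}"
  for pair in "$@"; do
    echo "srvctl modify nodeapps -n ${pair%%=*} -A ${pair#*=}/${netmask}/${iface}"
  done
  for pair in "$@"; do echo "srvctl start vip -n ${pair%%=*}"; done
  for pair in "$@"; do echo "srvctl start listener -n ${pair%%=*}"; done
)

print_vip_change_plan 255.255.255.0 eth0 xjlytnw-scan \
  xjlytnw1=10.109.254.15 xjlytnw2=10.109.254.16
```

Printing instead of executing keeps the change reviewable; the modify commands were run as root in this case, while the stop and start commands were run as grid.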
Finally, confirm the RAC cluster status:
[grid@xjlytnw1 ~]$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....ER.lsnr ora....er.type ONLINE    ONLINE    xjlytnw1
ora....N1.lsnr ora....er.type ONLINE    ONLINE    xjlytnw2
ora.LYTARCH.dg ora....up.type ONLINE    ONLINE    xjlytnw1
ora.LYTDATA.dg ora....up.type ONLINE    ONLINE    xjlytnw1
ora....VOTE.dg ora....up.type ONLINE    ONLINE    xjlytnw1
ora....ARCH.dg ora....up.type ONLINE    ONLINE    xjlytnw1
ora....DATA.dg ora....up.type ONLINE    ONLINE    xjlytnw1
ora....ARCH.dg ora....up.type ONLINE    ONLINE    xjlytnw1
ora....DATA.dg ora....up.type ONLINE    ONLINE    xjlytnw1
ora.asm        ora.asm.type   ONLINE    ONLINE    xjlytnw1
ora.cvu        ora.cvu.type   ONLINE    ONLINE    xjlytnw1
ora.gsd        ora.gsd.type   OFFLINE   OFFLINE
ora.lytdb.db   ora....se.type ONLINE    ONLINE    xjlytnw1
ora....network ora....rk.type ONLINE    ONLINE    xjlytnw1
ora.oc4j       ora.oc4j.type  ONLINE    ONLINE    xjlytnw2
ora.ons        ora.ons.type   ONLINE    ONLINE    xjlytnw1
ora....tslg.db ora....se.type ONLINE    ONLINE    xjlytnw1
ora....zhly.db ora....se.type ONLINE    ONLINE    xjlytnw2
ora....ry.acfs ora....fs.type ONLINE    ONLINE    xjlytnw1
ora.scan1.vip  ora....ip.type ONLINE    ONLINE    xjlytnw2
ora....SM1.asm application    ONLINE    ONLINE    xjlytnw1
ora....W1.lsnr application    ONLINE    ONLINE    xjlytnw1
ora....nw1.gsd application    OFFLINE   OFFLINE
ora....nw1.ons application    ONLINE    ONLINE    xjlytnw1
ora....nw1.vip ora....t1.type ONLINE    ONLINE    xjlytnw1
ora....SM2.asm application    ONLINE    ONLINE    xjlytnw2
ora....W2.lsnr application    ONLINE    ONLINE    xjlytnw2
ora....nw2.gsd application    OFFLINE   OFFLINE
ora....nw2.ons application    ONLINE    ONLINE    xjlytnw2
ora....nw2.vip ora....t1.type ONLINE    ONLINE    xjlytnw2
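Scanning a long listing like this by eye is error-prone. As a rough sketch, the hypothetical helper `list_down_resources` below reads the unabbreviated `NAME=` / `TARGET=` / `STATE=` format that `crs_stat` prints without `-t`, and lists only the resources that are supposed to be ONLINE but are not. Resources whose target is OFFLINE, such as ora.gsd, are deliberately skipped.

```shell
# Read NAME=/TYPE=/TARGET=/STATE= records on stdin and print the names of
# resources whose target is ONLINE but whose state is anything else.
list_down_resources() {
  awk -F= '
    $1 == "NAME"   { name = $2 }
    $1 == "TARGET" { target = $2 }
    $1 == "STATE"  { if (target ~ /^ONLINE/ && $2 !~ /^ONLINE/) print name }
  '
}

# Usage (assumed invocation, run as the grid user):
#   crs_stat | list_down_resources
```

An empty result means every resource that should be running is running. Note that `crs_stat` is deprecated in 11g Release 2 in favor of `crsctl status resource`, whose per-node output differs slightly, so the pattern may need adjusting there.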
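As shown, every resource with an ONLINE target is now running; the ora.gsd resources are OFFLINE by default in 11g Release 2 and are not needed here. The customer then tested their connections and everything worked. Problem solved!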