Oracle 10g RAC database log reports "LMS 0: 8069 GCS shadows trave"

Published: 2012/9/16 14:28:45

Today the alert log on node 1 of a 10g RAC database running on AIX reported the message "LMS 0: 8069 GCS shadows traversed, 4001 replayed". It was caused by node 2 rebooting.

I then looked up some material online: changing the system time can also produce this message and cause the machine to reboot.

Reposting an article by kamus from itpub:

Apart from Windows and Linux, does changing the operating system time cause an OS reboot on every RAC release from 10.2.0.2 onward?

While testing Oracle 10.2.0.3 RAC, I found that if the system time on a node is changed by more than 1.5 seconds, that node is rebooted automatically.

What a ruthless way to handle it ......

For the detailed mechanism, see Metalink Note 308051.1, which is marked Internal Only.

The OPROCD executable sets a signal handler for the SIGALRM handler and sets the interval timer based on the to-millisec parameter provided. The alarm handler gets the current time and checks it against the time that the alarm handler was last entered. If the difference exceeds (to-millisec + margin-millisec), it will fail; the production version will cause a node reboot.
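The check described in the note can be sketched in a few lines of Python. This is only an illustration of the comparison the note describes, not Oracle's code: the function names are made up, and the default values of 1000 ms and 500 ms are assumptions chosen so that the window adds up to the 1.5 seconds observed in the test. It shows why a jump in wall-clock time looks the same to the handler as not having been scheduled for too long.

```python
def oprocd_would_fail(last_entry_ms, now_ms, to_millisec, margin_millisec):
    """Return True if the time between two alarm-handler entries
    exceeds (to_millisec + margin_millisec), as the note describes."""
    elapsed = now_ms - last_entry_ms
    return elapsed > to_millisec + margin_millisec


def first_failing_entry(readings_ms, to_millisec=1000, margin_millisec=500):
    """Walk the wall-clock readings taken at successive handler entries.

    Returns the index of the first entry at which the check fails,
    or None if every gap stays inside the allowed window.
    The default interval and margin are assumed values for illustration.
    """
    for i in range(1, len(readings_ms)):
        if oprocd_would_fail(readings_ms[i - 1], readings_ms[i],
                             to_millisec, margin_millisec):
            return i
    return None


# Handler entered once per second, clock untouched: no failure.
steady = [0, 1000, 2000, 3000]
# Same schedule, but the system time was moved forward 2 seconds
# between the second and third entries.
jumped = [0, 1000, 4000, 5000]

print(first_failing_entry(steady))
print(first_failing_entry(jumped))
```

In the second sequence the handler itself ran on schedule; only the clock reading moved, yet the measured gap of 3000 ms is larger than the 1500 ms window.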

I tried changing the OPROCD configuration in /etc/init.cssd, setting DISABLE_OPROCD to TRUE, and then rebooted the system. The oprocd process no longer appeared among the system processes, yet the machine was still rebooted after I changed the system time.
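One way to confirm that oprocd is really gone after such a change is to scan the process list. The sketch below is a hypothetical helper, not part of any Oracle tooling. It assumes a `ps -ef` command is available and does a plain substring match on each line, so it can over-match if the name appears elsewhere on a line.

```python
import subprocess


def process_listed(ps_output, name):
    """Return True if any line after the header contains `name`.

    This is a plain substring match on the whole line, so it may
    over-match (for example on a user name containing `name`).
    """
    lines = ps_output.splitlines()[1:]  # skip the column header
    return any(name in line for line in lines)


def oprocd_running():
    """Run `ps -ef` and report whether an oprocd process is listed."""
    result = subprocess.run(["ps", "-ef"], capture_output=True,
                            text=True, check=True)
    return process_listed(result.stdout, "oprocd")


if __name__ == "__main__":
    print("oprocd running:", oprocd_running())
```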

Elsewhere the note says that if OPROCD is started in non fatal mode, it only writes a log entry instead of rebooting the machine. Note:265769.1 also describes how to switch to non fatal mode, but I did not try it.

In fatal mode, OPROCD will reboot the node if it detects excessive wait. In Non Fatal mode, it will write an error message out to the file <hostname>.oprocd.log in one of the following directories.

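What finally worked was disabling the cssd process entirely, which prevents the reboot triggered by changing the system time.

Lately I have found Oracle 10g CRS to be rather heavy-handed. In the previous test, unplugging the cable from the Private IP network card made the operating system reboot, and this time changing the system time causes a reboot as well. Do they really think these machines are Windows? Rebooting a UNIX server is a big deal, yet CRS treats it as casually as having a meal and reboots at the slightest provocation.

The following material describes the conditions under which the three Oracle CRS processes will reboot the machine.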

Oracle clusterware has the following three daemons which may be responsible for panicking the node. It is possible that some other external entity may have rebooted the node. In the context of this discussion, we will assume that the reboot/panic was done by an Oracle clusterware daemon.

* Oprocd - Cluster fencing module
* Cssd - Cluster synchronization module which manages node membership
* Oclsomon - Cssd monitor which will monitor for cssd hangs

OPROCD: This is a daemon that only gets activated when there is no vendor clusterware present on the OS. This daemon is also not activated to run on Windows/Linux. This daemon runs a tight loop and if it is not scheduled for 1.5 seconds, will reboot the node.
CSSD: This daemon pings the other members of the cluster over the private network and Voting disk. If this does not get a response for Misscount seconds and Disktimeout seconds respectively, it will reboot the node.
Oclsomon: This daemon monitors the CSSD to ensure that CSSD is scheduled by the OS, if it detects any problems it will reboot the node.
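The CSSD rule above pairs each heartbeat path with its own timeout. A minimal sketch of that pairing follows. The function name and the numbers in the usage lines are hypothetical, and treating either timeout on its own as enough to trigger a reboot is my reading of "respectively", not something the quoted text states outright.

```python
def cssd_reboot_reasons(secs_since_network_reply, secs_since_disk_reply,
                        misscount, disktimeout):
    """Return the list of timeouts that have been exceeded.

    Network silence is compared against misscount and voting disk
    silence against disktimeout, mirroring "respectively" in the text.
    An empty list means neither limit has been exceeded.
    """
    reasons = []
    if secs_since_network_reply > misscount:
        reasons.append("network heartbeat exceeded misscount")
    if secs_since_disk_reply > disktimeout:
        reasons.append("voting disk exceeded disktimeout")
    return reasons


# Hypothetical limits, for illustration only; real values depend on
# the platform and the clusterware configuration.
MISSCOUNT = 30
DISKTIMEOUT = 200

print(cssd_reboot_reasons(5, 5, MISSCOUNT, DISKTIMEOUT))
print(cssd_reboot_reasons(31, 5, MISSCOUNT, DISKTIMEOUT))
```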

I need to find a way to disable these reboot features. A reboot does not fix the underlying problem anyway, so why should CRS meddle?

Source: 億恩科技 [mszdt.com]
