Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GET_LOCK #1

Open
annProg opened this issue Jul 2, 2020 · 22 comments
Open

GET_LOCK #1

annProg opened this issue Jul 2, 2020 · 22 comments

Comments

@annProg
Copy link
Contributor

annProg commented Jul 2, 2020

From itop wiki:

Galera clusters with multiple masters are NOT supported by iTop, because such clusters do not properly implement the GET_LOCK MySQL function (for more information: Galera cluster known limitations).

什么是 GET_LOCK

GET_LOCK(str,timeout)

尝试使用 timeout 秒数超时来获取具有字符串str给定名称的锁. 负timeout值表示无限超时. 锁是排他的. 当一个会话举行时,其他会话无法获得相同名称的锁.

注意 timeout 参数是获取锁这个操作的超时时间,不是锁本身的超时时间。

@annProg
Copy link
Contributor Author

annProg commented Jul 2, 2020

手动备份警告信息:

2020-07-02 13:07:32 | Info    | mysqldump said: Warning: A partial dump from a server that has GTIDs will by default include the GTIDs of all transactions, even those that changed suppressed parts of the database. If you don't want to restore GTIDs, pass --set-gtid-purged=OFF. To make a complete dump, pass --all-databases --triggers --routines --events. | IssueLog

@annProg
Copy link
Contributor Author

annProg commented Jul 2, 2020

会有节点一直在维护状态页面

Update:此问题应是 rsync 同步导致,node1 上执行维护任务,进入维护状态,node2, node3 恰好开始同步,node1 维护完成之后删除 data/.maintenance,但是 node2, node3 没有删除(不能使用 rsync --delete 选项,因为每个节点有不同的cache等文件,此选项会删除这些不同的文件)。应将此文件排除在 rsync 同步列表之外。

image

Run:

rm -f data/.maintenance

Update: 会有2个节点(102,103)经常进入 maintenance 状态。

@annProg
Copy link
Contributor Author

annProg commented Jul 2, 2020

恢复备份失败,报错

@@GLOBAL.GTID_PURGED can only be set when @@GLOBAL.GTID_EXECUTED is empty.

@annProg
Copy link
Contributor Author

annProg commented Jul 2, 2020

解决备份还原报错问题

setup/backup.class.inc.php 增加 --set-gtid-purged=OFF 参数

                        // Note: opt implicitely sets lock-tables... which cancels the benefit of single-transaction!
                        //       skip-lock-tables compensates and allows for writes during a backup
                        $sCommand = "$sMySQLDump --defaults-extra-file=\"$sMySQLDumpCnfFile\" --opt --skip-lock-tables --default-character-set=".$sMysqldumpCharset." --add-drop-database --single-transaction --host=$sHost $sPortOption --user=$sUser $sTlsOptions --set-gtid-purged=OFF --result-file=$sTmpFileName $sDBName $sTables 2>&1";
                        $sCommandDisplay = "$sMySQLDump --defaults-extra-file=\"$sMySQLDumpCnfFile\" --opt --skip-lock-tables --default-character-set=".$sMysqldumpCharset." --add-drop-database --single-transaction --host=$sHost $sPortOption --user=xxxxx $sTlsOptions --set-gtid-purged=OFF --result-file=$sTmpFileName $sDBName $sTables";

参考:开启GTID的情况下导出导入库的注意事项

@annProg
Copy link
Contributor Author

annProg commented Jul 6, 2020

# find ./ -name "*.php" |xargs grep "GET_LOCK" --color
./core/mutex.class.inc.php:		// using GET_LOCK anytime on the same session will RELEASE the current and unique session lock (known issue)
./core/mutex.class.inc.php:				$res = $this->QueryToScalar("SELECT GET_LOCK('".$this->sName."', 3600)");
./core/mutex.class.inc.php:		$res = $this->QueryToScalar("SELECT GET_LOCK('".$this->sName."', 0)");
./core/mutex.class.inc.php:			$sMsg = 'GET_LOCK('.$this->sName.', 0) returned: '.var_export($res, true).'. Expected values are: 0, 1 or null';
./lib/symfony/http-foundation/Session/Storage/Handler/PdoSessionHandler.php:                $stmt = $this->pdo->prepare('SELECT GET_LOCK(:key, 50)');

@annProg
Copy link
Contributor Author

annProg commented Jul 6, 2020

core/mutex.class.inc.php 中关于 GET_LOCKRELEASE_LOCK 的函数

/**
 * Class iTopMutex
 * A class to serialize the execution of some code sections
 * Emulates the API of PECL Mutex class
 * Relies on MySQL locks because the API sem_get is not always present in the
 * installed PHP.    
 *
 * @copyright   Copyright (C) 2013-2018 Combodo SARL
 * @license     http://opensource.org/licenses/AGPL-3.0
 */
class iTopMutex
{
...
	/**
	 * Acquire the mutex. Uses a MySQL lock. <b>Warn</b> : can have an abnormal behavior on MySQL clusters (see R-016204)
	 *
	 * @see https://dev.mysql.com/doc/refman/5.7/en/miscellaneous-functions.html#function_get-lock
	 */	
	public function Lock()
	{
		if ($this->bLocked)
		{
			// Lock already acquired
			return;
		}
		if (self::$aAcquiredLocks[$this->sName] == 0)
		{
			do
			{
				$res = $this->QueryToScalar("SELECT GET_LOCK('".$this->sName."', 3600)");
				if (is_null($res))
				{
					throw new Exception("Failed to acquire the lock '".$this->sName."'");
				}
				// $res === '1' means I hold the lock
				// $res === '0' means it timed out
			}
			while ($res !== '1');
		}
		$this->bLocked = true;
		self::$aAcquiredLocks[$this->sName]++;
	}

	/**
	 *	Attempt to acquire the mutex
	 *	@returns bool True if the mutex is acquired, false if already locked elsewhere	 
	 */	
	public function TryLock()
	{
		if ($this->bLocked)
		{
			return true; // Already acquired
		}
		if (self::$aAcquiredLocks[$this->sName] > 0)
		{
			self::$aAcquiredLocks[$this->sName]++;
			$this->bLocked = true;
			return true;
		}
		
		$res = $this->QueryToScalar("SELECT GET_LOCK('".$this->sName."', 0)");
		if (is_null($res))
		{
			throw new Exception("Failed to acquire the lock '".$this->sName."'");
		}
		// $res === '1' means I hold the lock
		// $res === '0' means it timed out
		if ($res === '1')
		{
			$this->bLocked = true;
			self::$aAcquiredLocks[$this->sName]++;
		}
		if (($res !== '1') && ($res !== '0'))
		{
			$sMsg = 'GET_LOCK('.$this->sName.', 0) returned: '.var_export($res, true).'. Expected values are: 0, 1 or null';
			IssueLog::Error($sMsg);
			throw new Exception($sMsg);
		}
		return ($res !== '0');
	}

	/**
	 *	Release the mutex
	 */	
	public function Unlock()
	{
		if (!$this->bLocked)
		{
			// ??? the lock is not acquired, exit
	        return;	
		}
		if (self::$aAcquiredLocks[$this->sName] == 0)
		{
			return; // Safety net
		}
		
		if (self::$aAcquiredLocks[$this->sName] == 1)
		{
			$res = $this->QueryToScalar("SELECT RELEASE_LOCK('".$this->sName."')");
		}
		$this->bLocked = false;
		self::$aAcquiredLocks[$this->sName]--;
	}

@annProg
Copy link
Contributor Author

annProg commented Jul 6, 2020

cron

分别在 1,2 号节点部署 cron 任务,1号2号都可能有报错日志:

ERROR: 'Trying to commit transaction whereas none have been started !: mysql_errno = 3101, mysql_error = Plugin instructed the server to rollback the current transaction.'

Please note, that in general this error is an expected behavior of Group Replication plugin, when it finds conflicting transactions on different nodes, if Multi Primary mode is used. Application should be able to handle this type of errors based on application logic. The following sections can only applied in those cases, when this error is caused by multiple updates on the same node.

参考: https://support.oracle.com/knowledge/Oracle%20Database%20Products/2211042_1.html

可能最好只在一个节点上运行 cron

@annProg
Copy link
Contributor Author

annProg commented Jul 7, 2020

用到 Lock 的代码

./core/counter.class.inc.php:		$oiTopMutex->Lock();
./core/log.class.inc.php:			$oLock->Lock();
./core/ormlinkset.class.inc.php:		$oMtx->Lock();
./core/ownershiplock.class.inc.php:		$oMutex->Lock();
./core/ownershiplock.class.inc.php:		$oMutex->Lock();
./core/ownershiplock.class.inc.php:		$oMutex->Lock();
./core/ownershiplock.class.inc.php:		$oMutex->Lock();
./core/ownershiplock.class.inc.php:		$oMutex->Lock();
./datamodels/2.x/itop-backup/ajax.backup.php:				$oRestoreMutex->Lock();
./datamodels/2.x/itop-backup/main.itop-backup.php:		$oMutex->Lock();
./datamodels/2.x/itop-core-update/src/Service/CoreUpdater.php:		$oMutex->Lock();
./datamodels/2.x/itop-hub-connector/ajax.php:	$oMutex->Lock();
./synchro/synchro_import.php:		$oMutex->Lock();
./synchro/synchrodatasource.class.inc.php:			$oMutex->Lock();

@annProg
Copy link
Contributor Author

annProg commented Jul 7, 2020

core/counter.class.inc.php

# find ./ -name "*.php" |xargs grep --color "IncClass("
./core/counter.class.inc.php:	public static function IncClass($sLeafClass)
./core/metamodel.class.php:		return ItopCounter::IncClass($sClass);
./env-production/itop-tickets/model.itop-tickets.php:          $iNextId = ItopCounter::IncClass(get_class($this));

MetaModel::GetNextKey()

MetaModel::GetNextKey()
iTopCounter::IncClass()
iTopCounter::Inc()

弃用的函数,2.7 版本未使用此函数。但是搜不到 ItopCounter::incRootClass 函数,怀疑可能就是 ItopCounter::IncClass

    /**
     * @deprecated 2.7.0 N°1627, use ItopCounter::incRootClass($sClass) instead
     *
     * @param string $sClass
     *
     * @return int
     * @throws \CoreException
     */
    public static function GetNextKey($sClass)
    {
        return ItopCounter::IncClass($sClass);
    }

itop-tickets DBInsertNoReload()

DBInsertNoReload()
MakeTicketRef()
ItopCounter::IncClass()
iTopCounter::Inc()

ref 指工单编号,因此多主集群可能导致生成的 工单编号 错误。

    public function MakeTicketRef()
    {
          $iNextId = ItopCounter::IncClass(get_class($this));
          $sRef = $this->MakeTicketRef($iNextId);
          $this->SetIfNull('ref', $sRef);
          $iKey = parent::DBInsertNoReload();
          return $iKey;
    }

        protected function MakeTicketRef($iNextId)
        {
                return sprintf(static::GetTicketRefFormat(), $iNextId);
        }




        public static function GetTicketRefFormat()
        {
                return 'T-%06d';
        }

测试方案

分别使用 csv 导入 和 api 来写入 100 个工单,检查 ref 生成是否重复或者缺失

csv导入

实验组:mgr集群单实例操作
对照组:非mgr集群

结论:未发现异常,均生成连续不重复的 ref

实验组:mgr集群多实例同时操作
对照组:非mgr集群多浏览器窗口同时操作

实验组出现重复 ref,且最终导入了166个工单,对照组生成100个连续且不重复的 ref

      1 R-000261
      1 R-000262
      1 R-000263
      1 R-000264
      1 编号
      2 R-000127
      2 R-000154
      2 R-000157
      2 R-000189
      2 R-000243

image

API调用

脚本

#!/bin/bash

[ $# -lt 2 ] && echo "$0 password url" && exit 1
user=admin
password=$1
url=$2
json_data='{"operation":"core/create","comment":"test mgr","class":"UserRequest","output_fields":"id,friendlyname","fields":{"org_id":"SELECT Organization WHERE name = \"Demo\"","caller_id":{"name":"Christie", "first_name":"Agatha"},"title":"Test MGR From API", "description":"Test"}}'

curl -s "$url/webservices/rest.php?version=1.3" -d "auth_user=$user&auth_pwd=$password&json_data=$json_data"

同时调用3个节点的API

#!/bin/bash

for id in `seq 1 100`;do echo $id;done |parallel -j 3 ./ticket-api.sh admin http://192.168.10.101 &
for id in `seq 1 100`;do echo $id;done |parallel -j 3 ./ticket-api.sh admin http://192.168.10.102 &
for id in `seq 1 100`;do echo $id;done |parallel -j 3 ./ticket-api.sh admin http://192.168.10.103 &

结果:有重复编号

      1 R-000743
      1 R-000744
      1 R-000745
      1 R-000746
      1 R-000747
      1 编号
      2 R-000574
      2 R-000585
      2 R-000591
      2 R-000594
      2 R-000595
      2 R-000599
      2 R-000603
      2 R-000612
      2 R-000614
      2 R-000615
      2 R-000638
      2 R-000646
      2 R-000655
      2 R-000665
      2 R-000667
      2 R-000668
      2 R-000669
      2 R-000673
      2 R-000674
      2 R-000676
      2 R-000677
      2 R-000681
      2 R-000683
      2 R-000684
      2 R-000686
      3 R-000596
      3 R-000685
      4 R-000593
      4 R-000598
      5 R-000597

image

@annProg
Copy link
Contributor Author

annProg commented Jul 7, 2020

core/log.class.inc.php

	/**
	 * Rotate current log file
	 *
	 * @param DateTime $oLogFileLastModified date when the log file was last modified
	 *
	 * @uses \iTopMutex instead of flock as doing a rename on a file with a flock cause an error on PHP 5.6.40 Windows (ok on 7.3.15 though)
	 * @uses GetRotatedFileName to get rotated file name
	 */
	protected function RotateLogFile($oLogFileLastModified)
	{
		if (!$this->IsLogFileExists()) // extra check, but useful for cron also !
		{
			return;
		}

		$oLock = null;
		try
		{
			$oLock = new iTopMutex('log_rotation_'.$this->sLogFileFullPath);
			$oLock->Lock();
			if (!$this->IsLogFileExists()) // extra extra check if we were blocked and another process moved the file in the meantime
			{
				$oLock->Unlock();
				return;
			}
			$this->ResetLastModifiedDateForFile();
			$sNewLogFileName = $this->GetRotatedFileName($oLogFileLastModified);
			rename($this->sLogFileFullPath, $sNewLogFileName);
		}
		catch (Exception $e)
		{
			// nothing to do, cannot log... file will be renamed on the next call O:)
			return;
		}
		finally
		{
			if (!is_null($oLock)) { $oLock->Unlock();}
		}
	}

CheckAndRotateLogFile()
RotateLogFile()
iTopMutex::Lock()

暂无结论

@annProg
Copy link
Contributor Author

annProg commented Jul 7, 2020

core/ormlinkset.class.inc.php

cmdbAbstractObject::DBInsertNoReload and cmdbAbstractObject::DBUpdate
DBObject::DBInsertNoReload() and DBObject::DBUpdate()
DBObject::DBWriteLinks()
OrmLinkSet::DBWrite()
iTopMutex::Lock()

	public function DBWrite(DBObject $oHostObject)
	{
                 ...
		// Critical section : serialize any write access to these links
		//
		$oMtx = new iTopMutex('Write-'.$this->sClass);
		$oMtx->Lock();

DBObject:DBWriteLinks()

/**
 * used both by insert/update
 *
 * @internal
 *
 * @throws \CoreException
 */
private function DBWriteLinks()
{
    foreach(MetaModel::ListAttributeDefs(get_class($this)) as $sAttCode => $oAttDef)
    {
        if (!$oAttDef->IsLinkSet()) continue;
        if (!array_key_exists($sAttCode, $this->m_aTouchedAtt)) continue;
        if (array_key_exists($sAttCode, $this->m_aModifiedAtt) && ($this->m_aModifiedAtt[$sAttCode] == false)) continue;

        /** @var \ormLinkSet $oLinkSet */
        $oLinkSet = $this->m_aCurrValues[$sAttCode];
        $oLinkSet->DBWrite($this);
    }
}

影响范围很大。

测试思路:尝试写 xx_list 属性是否有异常

测试脚本

#!/bin/bash

[ $# -lt 2 ] && echo "$0 password url" && exit 1
user=admin
password=$1
url=$2
json_data='{"operation":"core/update","comment":"test mgr","class":"Person","key":"SELECT Person WHERE name=\"Xing\" AND first_name=\"Ming\"","output_fields":"id,team_list,friendlyname","fields":{"team_list":[{"team_id":{"name":"Helpdesk"}}]}}'

curl -s "$url/webservices/rest.php?version=1.3" -d "auth_user=$user&auth_pwd=$password&json_data=$json_data" |jq .

单主模式并发调用3个iTop实例

#!/bin/bash

for id in `seq 1 10`;do echo $id;done |parallel -j 3 ./ormlinkset.sh admin http://192.168.10.101 &
for id in `seq 1 10`;do echo $id;done |parallel -j 3 ./ormlinkset.sh admin http://192.168.10.102 &
for id in `seq 1 10`;do echo $id;done |parallel -j 3 ./ormlinkset.sh admin http://192.168.10.103 &

出现重复创建的 lnkPersonToTeam
image

单主模式并发调用1个iTop实例

for id in `seq 1 30`;do echo $id;done |parallel -j 20 ./ormlinkset.sh admin http://192.168.10.101

也出现创建 2个 lnk 的情况

非mgr并发调用

也出现创建 2 个 lnk 的情况。

mgr多主并发调用

也出现了 2 个 lnk 的情况

结论

可能是测试 case 不对?

@annProg
Copy link
Contributor Author

annProg commented Jul 7, 2020

core/ownershiplock.class.inc.php

/**
 * Mechanism to obtain an exclusive lock while editing an object
 *
 * @package     iTopORM
 */

/**
 * Persistent storage (in the database) for remembering that an object is locked 
 */
class iTopOwnershipToken extends DBObject
{
...
}

/**
 * Utility class to acquire/extend/release/kill an exclusive lock on a given persistent object,
 * for example to prevent concurrent edition of the same object.
 * Each lock has an expiration delay of 120 seconds (tunable via the configuration parameter 'concurrent_lock_expiration_delay')
 * A watchdog (called twice during this delay) is in charge of keeping the lock "alive" while an object is being edited.
 */
class iTopOwnershipLock
{

application/cmdbabstract.class.php 中

	public function DisplayModifyForm(WebPage $oPage, $aExtraParams = array())
	{
		$sOwnershipToken = null;
		$iKey = $this->GetKey();
		$sClass = get_class($this);
		$sMode = ($iKey > 0) ? static::ENUM_OBJECT_MODE_EDIT : static::ENUM_OBJECT_MODE_CREATE;

		if ($sMode === static::ENUM_OBJECT_MODE_EDIT)
		{
			// The concurrent access lock makes sense only for already existing objects
			$LockEnabled = MetaModel::GetConfig()->Get('concurrent_lock_enabled');
			if ($LockEnabled)
			{
				$sOwnershipToken = utils::ReadPostedParam('ownership_token', null, 'raw_data');
				if ($sOwnershipToken !== null)
				{
					// We're probably inside something like "apply_modify" where the validation failed and we must prompt the user again to edit the object
					// let's extend our lock
				}
				else
				{
					$aLockInfo = iTopOwnershipLock::AcquireLock($sClass, $iKey);
					if ($aLockInfo['success'])
					{
						$sOwnershipToken = $aLockInfo['token'];
					}
					else
					{
						// If the object is locked by the current user, it's worth trying again, since
						// the lock may be released by 'onunload' which is called AFTER loading the current page.
						//$bTryAgain = $oOwner->GetKey() == UserRights::GetUserId();
						self::ReloadAndDisplay($oPage, $this, array('operation' => 'modify'));

						return;
					}
				}
			}
		}

阻止并发修改

注意到代码中检查配置项 concurrent_lock_enabled

$LockEnabled = MetaModel::GetConfig()->Get('concurrent_lock_enabled');

因此,这似乎是一个可选项,而且默认是关闭的,没有应该也没啥关系。

存在的问题:
只有在同一节点上才能正常锁住对象,不同节点编辑同一对象,没有提示信息。也就是说多 master 的集群无法使用 Locking to prevent concurrent modifications 的功能
image

ref: https://www.itophub.io/wiki/page?id=2_2_0%3Aadmin%3Alocking

@annProg
Copy link
Contributor Author

annProg commented Jul 7, 2020

datamodels/2.x/itop-backup

class BackupExec extends AbstractWeeklyScheduledProcess
{
...
	/**
	 * @param int $iUnixTimeLimit
	 * @return string
	 * @throws Exception
	 */
	public function Process($iUnixTimeLimit)
	{
		$oMutex = new iTopMutex('backup.'.utils::GetCurrentEnvironment());
		$oMutex->Lock();

in ajax.backup.php,数据恢复时也需要 Lock

		/*
		 * We can't call \LoginWebPage::DoLogin because DBRestore will do a compile after restoring the DB
		 * Authentication is checked with a token file (see $sOperation='restore_get_token')
		 */
		case 'restore_exec':
			require_once(APPROOT."setup/runtimeenv.class.inc.php");
			require_once(APPROOT.'/application/utils.inc.php');
			require_once(APPROOT.'/setup/backup.class.inc.php');
			require_once(dirname(__FILE__).'/dbrestore.class.inc.php');

			IssueLog::Enable(APPROOT.'log/error.log');

			$oPage = new ajax_page("");
			$oPage->no_cache();
			$oPage->SetContentType('text/html');

			if (utils::GetConfig()->Get('demo_mode'))
			{
				$oPage->add("<div data-error-stimulus=\"Error\">Sorry, iTop is in <b>demonstration mode</b>: the feature is disabled.</div>");
			}
			else
			{
				$sEnvironment = utils::ReadParam('environment', 'production', false, 'raw_data');
				$oRestoreMutex = new iTopMutex('restore.'.$sEnvironment);
				IssueLog::Info("Backup Restore - Acquiring the LOCK 'restore.$sEnvironment'");
				$oRestoreMutex->Lock();
				IssueLog::Info('Backup Restore - LOCK acquired, executing...');
				try
				{

@annProg
Copy link
Contributor Author

annProg commented Jul 7, 2020

datamodels/2.x/itop-core-update

CoreUpdater::DoBackup

	/**
	 *
	 * @param string $sTargetFile
	 *
	 * @throws Exception
	 */
	private static function DoBackup($sTargetFile)
	{
		// Make sure the target directory exists
		$sBackupDir = dirname($sTargetFile);
		SetupUtils::builddir($sBackupDir);

		$oBackup = new DBBackup();
		$oBackup->SetMySQLBinDir(MetaModel::GetConfig()->GetModuleSetting('itop-backup', 'mysql_bindir', ''));

		$oMutex = new iTopMutex('backup.'.utils::GetCurrentEnvironment());
		$oMutex->Lock();
		try
		{
			$oBackup->CreateCompressedBackup($sTargetFile);
			SetupLog::Info('itop-core-update: Backup done: '.$sTargetFile);
		} catch (Exception $e)
		{
			$oMutex->Unlock();
			throw $e;
		}
		$oMutex->Unlock();
	}

@annProg
Copy link
Contributor Author

annProg commented Jul 7, 2020

datamodels/2.x/itop-hub-connector

备份相关

/**
 *
 * @param string $sTargetFile
 * @throws Exception
 * @return DBBackupWithErrorReporting
 */
function DoBackup($sTargetFile)
{
	// Make sure the target directory exists
	$sBackupDir = dirname($sTargetFile);
	SetupUtils::builddir($sBackupDir);
	
	$oBackup = new DBBackupWithErrorReporting();
	$oBackup->SetMySQLBinDir(MetaModel::GetConfig()->GetModuleSetting('itop-backup', 'mysql_bindir', ''));
	$sSourceConfigFile = APPCONF.utils::GetCurrentEnvironment().'/'.ITOP_CONFIG_FILE;
	
	$oMutex = new iTopMutex('backup.'.utils::GetCurrentEnvironment());
	$oMutex->Lock();
	try
	{
		$oBackup->CreateCompressedBackup($sTargetFile, $sSourceConfigFile);
	}
	catch (Exception $e)
	{
		$oMutex->Unlock();
		throw $e;
	}
	$oMutex->Unlock();
	return $oBackup;
}

@annProg
Copy link
Contributor Author

annProg commented Jul 7, 2020

synchro

synchro_import.php

		$iPreviousTimeLimit = ini_get('max_execution_time');
		$iLoopTimeLimit = MetaModel::GetConfig()->Get('max_execution_time_per_loop');
		$oMutex = new iTopMutex('synchro_import_'.$oDataSource->GetKey());
		$oMutex->Lock();
		set_time_limit($iLoopTimeLimit);
		foreach ($aData as $iRow => $aRow)
		{
			$sReconciliationCondition = '`primary_key` = '.CMDBSource::Quote($aRow[$iPrimaryKeyCol]);

synchrodatasource.class.inc.php

class SynchroExecution
{
...
	/**
	 * Perform a synchronization between the data stored in the replicas (synchro_data_xxx_xx table)
	 * and the iTop objects.
	 *
	 * @return SynchroLog
	 * @throws \CoreUnexpectedValue
	 */
	public function Process()
	{
		$this->PrepareLogs();

		self::$m_oCurrentTask = $this->m_oDataSource;

		$oMutex = new iTopMutex('synchro_process_'.$this->m_oDataSource->GetKey());
		try
		{
			$oMutex->Lock();
			$this->DoSynchronize();
			$oMutex->Unlock();

@annProg
Copy link
Contributor Author

annProg commented Jul 8, 2020

结论

最好使用 MGR Single-Primary 模式,用 msyql-shell 把集群切换到 SinglePrimary 模式

var c = dba.getCluster()
c.switchToSinglePrimaryMode()

初始化 mysqlrouter

# mysqlrouter --bootstrap root:[email protected]:3306 --user=mysqlrouter
# Bootstrapping system MySQL Router instance...

- Creating account(s) (only those that are needed, if any)
- Verifying account (using it to run SQL queries that would be run by Router)
- Storing account in keyring
- Adjusting permissions of generated files
- Creating configuration /etc/mysqlrouter/mysqlrouter.conf

Existing configuration backed up to '/etc/mysqlrouter/mysqlrouter.conf.bak'

# MySQL Router configured for the InnoDB Cluster 'itopMgr'

After this MySQL Router has been started with the generated configuration

    $ /etc/init.d/mysqlrouter restart
or
    $ systemctl start mysqlrouter
or
    $ mysqlrouter -c /etc/mysqlrouter/mysqlrouter.conf

the cluster 'itopMgr' can be reached by connecting to:

## MySQL Classic protocol

- Read/Write Connections: localhost:6446
- Read/Only Connections:  localhost:6447

## MySQL X protocol

- Read/Write Connections: localhost:64460
- Read/Only Connections:  localhost:64470

按照提示连接 mysqlrouter

附自动生成的配置文件 /etc/mysqlrouter/mysqlrouter.conf

# File automatically generated during MySQL Router bootstrap
[DEFAULT]
name=system
user=mysqlrouter
keyring_path=/var/lib/mysqlrouter/keyring
master_key_path=/etc/mysqlrouter/mysqlrouter.key
connect_timeout=15
read_timeout=30
dynamic_state=/var/lib/mysqlrouter/state.json

[logger]
level = INFO

[metadata_cache:itopMgr]
cluster_type=gr
router_id=1
user=mysql_router1_vf4ftydo7wb1
metadata_cluster=itopMgr
ttl=0.5
auth_cache_ttl=-1
auth_cache_refresh_interval=2
use_gr_notifications=0

[routing:itopMgr_rw]
bind_address=0.0.0.0
bind_port=6446
destinations=metadata-cache://itopMgr/?role=PRIMARY
routing_strategy=first-available
protocol=classic

[routing:itopMgr_ro]
bind_address=0.0.0.0
bind_port=6447
destinations=metadata-cache://itopMgr/?role=SECONDARY
routing_strategy=round-robin-with-fallback
protocol=classic

[routing:itopMgr_x_rw]
bind_address=0.0.0.0
bind_port=64460
destinations=metadata-cache://itopMgr/?role=PRIMARY
routing_strategy=first-available
protocol=x

[routing:itopMgr_x_ro]
bind_address=0.0.0.0
bind_port=64470
destinations=metadata-cache://itopMgr/?role=SECONDARY
routing_strategy=round-robin-with-fallback
protocol=x

将 bootstrap 节点上的 mysqlrouter 的配置文件及 key 复制到其他节点并在其他节点启动 mysqlrouter,然后将 iTop 数据库配置改为

'db_host' => '127.0.0.1:6446',

测试 case 回顾

使用 SinglePrimary 模式回顾以上测试 case

ticket-ref

正常生成 工单编号,无重复。

# cat /tmp/a |sort |uniq -c |sort -n |tail -n 5
      1 R-001910
      1 R-001911
      1 R-001912
      1 R-001913
      1 编号

备份和恢复报错

依然存在。单主模式也用了 gtid

阻止并发修改

未发现异常

@annProg
Copy link
Contributor Author

annProg commented Jul 8, 2020

单主模式容灾测试

关闭 Primary 节点,可以看到 102 节点变成了 Primary

RUN on 192.168.10.102
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------------------+--------------------------------------+----------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST    | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+----------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 94ef913e-c027-11ea-a1ee-5254004d77d3 | 192.168.10.102 |        3306 | ONLINE       | PRIMARY     | 8.0.20         |
| group_replication_applier | d56ba753-c027-11ea-a866-5254004d77d3 | 192.168.10.103 |        3306 | ONLINE       | SECONDARY   | 8.0.20         |
+---------------------------+--------------------------------------+----------------+-------------+--------------+-------------+----------------+

访问 101 上的 iTop 实例,依然可以访问。

启动 101 上的 mysql,重新加入集群,角色为 SECONDARY

RUN on 192.168.10.102
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------------------+--------------------------------------+----------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST    | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+----------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 51f47cd8-c027-11ea-958b-5254004d77d3 | 192.168.10.101 |        3306 | ONLINE       | SECONDARY   | 8.0.20         |
| group_replication_applier | 94ef913e-c027-11ea-a1ee-5254004d77d3 | 192.168.10.102 |        3306 | ONLINE       | PRIMARY     | 8.0.20         |
| group_replication_applier | d56ba753-c027-11ea-a866-5254004d77d3 | 192.168.10.103 |        3306 | ONLINE       | SECONDARY   | 8.0.20         |
+---------------------------+--------------------------------------+----------------+-------------+--------------+-------------+----------------+

指定 Primary

宕机Primary恢复后,指定其仍为 Primary

var c = dba.getCluster()
c.setPrimaryInstance('[email protected]')

结果

RUN on 192.168.10.102
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------------------+--------------------------------------+----------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST    | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+----------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 51f47cd8-c027-11ea-958b-5254004d77d3 | 192.168.10.101 |        3306 | ONLINE       | PRIMARY     | 8.0.20         |
| group_replication_applier | 94ef913e-c027-11ea-a1ee-5254004d77d3 | 192.168.10.102 |        3306 | ONLINE       | SECONDARY   | 8.0.20         |
| group_replication_applier | d56ba753-c027-11ea-a866-5254004d77d3 | 192.168.10.103 |        3306 | ONLINE       | SECONDARY   | 8.0.20         |
+---------------------------+--------------------------------------+----------------+-------------+--------------+-------------+----------------+

@annProg
Copy link
Contributor Author

annProg commented Jul 8, 2020

问题

https://dev.mysql.com/doc/refman/5.7/en/group-replication-limitations.html

文档中对 GET_LOCK 的限制并没有明确说只在 multi primary 模式存在。

实测,参考:https://blog.csdn.net/tangtong1/article/details/51792617

连接 PRIMARY 节点,创建 test库, test_lock 表,注意mgr要求每个表都有索引,需要设置 primary key

create database test;
create table test_lock(
	id int,
	name varchar(50),
	address varchar(50),
	primary key(id)
);

启动两个客户端连接 MySQL。
A1 上执行以下语句:

select get_lock('key_lock', 100);
update test_lock set name = 'tt2', address = 'aaaaaaaaaaaaaaaaaaaa' where id = 1; #只更新name列
select release_lock('key_lock');

B1 上执行以下语句

select get_lock('key_lock', 100);

现象:
当 A1 执行完 get_lock 之后,B1 执行 get_lock 会卡住,直到 A1 执行 release_lock 之后 B1 才能成功获取锁。以下结果可以看到 A1 立即获取了锁,B1 等待了 23.9 秒才获取到锁。

A1:
 MySQL  192.168.10.101:33060+ ssl  test  SQL > select release_lock('key_lock');
+--------------------------+
| release_lock('key_lock') |
+--------------------------+
|                        1 |
+--------------------------+
1 row in set (0.0004 sec)

B1:
MySQL  192.168.10.101:33060+ ssl  test  SQL > select get_lock('key_lock', 100);
+---------------------------+
| get_lock('key_lock', 100) |
+---------------------------+
|                         1 |
+---------------------------+
1 row in set (23.9473 sec)

结论,单主模式可以使用 GET_LOCK

接下来测试多主模式。

A1,B1 连接同一个 PRIMIARY 时,结果和单主模式相同,连接不同 PRIMARY 时,GET_LOCK 无效。

@annProg
Copy link
Contributor Author

annProg commented Jul 8, 2020

唯一性检查

唯一性检查是一个比较重要的功能,应测试是否受 MGR 影响。

测试思路:

默认模型中 Person 的唯一性检查规则规定员工号唯一,尝试创建具有相同员工号的 Person,试试会不会有问题。和单节点做对照。

使用以下脚本测试(unique.sh)

#!/bin/bash

[ $# -lt 2 ] && echo "$0 password url" && exit 1
user=admin
password=$1
url=$2
json_data='{"operation":"core/create","comment":"test mgr","class":"Person","output_fields":"id,employee_number,friendlyname","fields":{"org_id":"SELECT Organization WHERE name = \"Demo\"","name":"Xing","first_name":"Ming","employee_number":"2020"}}'

curl -s "$url/webservices/rest.php?version=1.3" -d "auth_user=$user&auth_pwd=$password&json_data=$json_data" |jq .

并发请求3节点

执行以下脚本,并发向3个节点创建Person

#!/bin/bash

for id in `seq 1 10`;do echo $id;done |parallel -j 3 ./unique.sh admin http://192.168.10.101 &
for id in `seq 1 10`;do echo $id;done |parallel -j 3 ./unique.sh admin http://192.168.10.102 &
for id in `seq 1 10`;do echo $id;done |parallel -j 3 ./unique.sh admin http://192.168.10.103 &

在 单主 和 多主 模式的MGR集群中均创建了5个有相同工号的联系人。
image

请求单一节点

for id in `seq 1 30`;do echo $id;done |parallel -j 30 ./unique.sh admin http://192.168.10.101

单主模式下 理论上也会出问题。非 mgr 下测试,创建了6个

db检查

System 下的 DB工具
image

代码分析

代码注释里已经写明了可能的问题。。

			// No iTopMutex so there might be concurrent access !
			// But the necessary lock would have a high performance cost :(

加锁性能代价太大。。

dbobject.class.php 中

	/**
     * @internal
     * 
	 * @throws \CoreException
	 * @throws \OQLException
     *
	 * @since 2.6.0 N°659 uniqueness constraint
	 * @api
	 */
	protected function DoCheckUniqueness()
	{
		$sCurrentClass = get_class($this);
		$aUniquenessRules = MetaModel::GetUniquenessRules($sCurrentClass);

		foreach ($aUniquenessRules as $sUniquenessRuleId => $aUniquenessRuleProperties)
		{
			if ($aUniquenessRuleProperties['disabled'] === true)
			{
				continue;
			}

			// No iTopMutex so there might be concurrent access !
			// But the necessary lock would have a high performance cost :(
			$bHasDuplicates = $this->HasObjectsInDbForUniquenessRule($sUniquenessRuleId, $aUniquenessRuleProperties);
			if ($bHasDuplicates)
			{
				$bIsBlockingRule = $aUniquenessRuleProperties['is_blocking'];
				if (is_null($bIsBlockingRule))
				{
					$bIsBlockingRule = true;
				}

				$sErrorMessage = $this->GetUniquenessRuleMessage($sUniquenessRuleId);

				if ($bIsBlockingRule)
				{
					$this->m_aCheckIssues[] = $sErrorMessage;
					continue;
				}
				$this->m_aCheckWarnings[] = $sErrorMessage;
				continue;
			}
		}
	}

# 然后在 DoCheckToWrite 函数中调用唯一性检查
	/**
	 * Check integrity rules (before inserting or updating the object)
	 *
     * **This method is not meant to be called directly, use DBObject::CheckToWrite()!**
	 * Errors should be inserted in $m_aCheckIssues and $m_aCheckWarnings arrays
     *
     * @overwritable-hook You can extend this method in order to provide your own logic.
     * @see CheckToWrite()
     * @see $m_aCheckIssues
     * @see $m_aCheckWarnings
     *
	 * @throws \ArchivedObjectException
	 * @throws \CoreException
	 * @throws \OQLException
	 *
	 */
	public function DoCheckToWrite()
	{
		$this->DoComputeValues();

		$this->DoCheckUniqueness();

		$aChanges = $this->ListChanges();

		foreach($aChanges as $sAttCode => $value)
		{
			$res = $this->CheckValue($sAttCode);
			if ($res !== true)
			{
				// $res contains the error description
				$this->m_aCheckIssues[] = "Unexpected value for attribute '$sAttCode': $res";
			}
		}
		if (count($this->m_aCheckIssues) > 0)
		{
			// No need to check consistency between attributes if any of them has
			// an unexpected value
			return;
		}
		$res = $this->CheckConsistency();
		if ($res !== true)
		{
			// $res contains the error description
			$this->m_aCheckIssues[] = "Consistency rules not followed: $res";
		}

		// Synchronization: are we attempting to modify an attribute for which an external source is master?
		//
		if ($this->m_bIsInDB && $this->InSyncScope() && (count($aChanges) > 0))
		{
			foreach($aChanges as $sAttCode => $value)
			{
				$iFlags = $this->GetSynchroReplicaFlags($sAttCode, $aReasons);
				if ($iFlags & OPT_ATT_SLAVE)
				{
					// Note: $aReasonInfo['name'] could be reported (the task owning the attribute)
					$oAttDef = MetaModel::GetAttributeDef(get_class($this), $sAttCode);
					$sAttLabel = $oAttDef->GetLabel();
					if (!empty($aReasons))
					{
						// Todo: associate the attribute code with the error
						$this->m_aCheckIssues[] = Dict::Format('UI:AttemptingToSetASlaveAttribute_Name', $sAttLabel);
					}
				}
			}
		}
	}

结论

实践中,两人同时创建相同对象的情况应该很少。基本可以忽略此问题。如果出现,可以用 DB工具 来发现并修复问题。

另外,由本节开头提到的代码注释可知,唯一性检查未使用 iTopMutex,即和 GET_LOCK 无关,和MGR也无关。

@annProg
Copy link
Contributor Author

annProg commented Jul 9, 2020

yum install -y https://dev.mysql.com/get/mysql80-community-release-el7-3.noarch.rpm
yum install -y mysql-community-server mysql-shell
cat > /tmp/init.sql <<EOF
ALTER USER 'root'@'localhost' IDENTIFIED BY '$MYSQL_ROOT' PASSWORD EXPIRE NEVER;
ALTER USER 'root'@'localhost' IDENTIFIED WITH mysql_native_password BY '$MYSQL_ROOT';
create user root@'%' identified WITH mysql_native_password BY '$MYSQL_ROOT';
grant all privileges on *.* to root@'%' with grant option;
flush privileges;
EOF
	rm -fr /var/lib/mysql
	mysqld --initialize-insecure --user=mysql
	systemctl start mysqld
	# (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2),重启生成 sock 文件
	grep "report_host" /etc/my.cnf || echo "report_host=$MYIP" >> /etc/my.cnf
	systemctl restart mysqld
	systemctl status mysqld
	mysql -uroot -e "show databases;"
 
	mysql -uroot < /tmp/init.sql
cat << EOF > /tmp/config_local_instance.js
dba.configureLocalInstance('root@$MYIP:3306', {'password': 'root', 'interactive': false})
EOF
 
	# 配置本地实例
	mysqlsh --js --file=/tmp/config_local_instance.js
	systemctl restart mysqld
cat << EOF > /tmp/check_local_instance.js
shell.connect('[email protected]:3306', 'root')
dba.checkInstanceConfiguration()
EOF
	# 检查本地实例
	mysqlsh --js --file=/tmp/check_local_instance.js
cat << EOF > /tmp/init_cluster.js
shell.connect('[email protected]:3306', 'root')
dba.createCluster('$CLUSTER_NAME', {'localAddress': '192.168.10.201','multiPrimary': true, 'force': true})
var cluster=dba.getCluster('$CLUSTER_NAME')
cluster.addInstance('[email protected]:3306', {'localAddress': '192.168.10.202', 'password': 'root', 'recoveryMethod':'clone'})
cluster.addInstance('[email protected]:3306', {'localAddress': '192.168.10.203', 'password': 'root','recoveryMethod':'clone'})
EOF
 
	# 等全部节点启动后执行,只执行一次,js脚本中只在 201 节点上执行
	if [ "$ID"x == "10203"x ];then
		mysqlsh --js --file=/tmp/init_cluster.js
JS > var cluster = dba.getCluster()
JS > var cluster = dba.getCluster("<Cluster_name>")
JS > var cluster = dba.createCluster('name')
JS > cluster.removeInstance('root@<IP_Address>:<Port_No>',{force: true})
JS > cluster.addInstance('root@<IP add>,:<port>')
JS > cluster.addInstance('root@<IP add>,:<port>')
JS > dba.getCluster()
JS > dba.getCluster().status() 
JS > dba.getCluster().checkInstanceState('root@<IP_Address>:<Port_No>')
JS > dba.getCluster().rejoinInstance('root@<IP_Address>:<Port_No>')
JS > dba.getCluster().describe()

@annProg
Copy link
Contributor Author

annProg commented Jul 9, 2020

digraph mgrsingle{   
    node[shape=box];
    a1 [label="iTop-1"];
    b1 [label="iTop-2"];
    c1 [label="iTop-3"];
    
    subgraph cluster_mgr_s{
        node[shape=cylinder];
        ms1 [label="PRIMARY"];
        ms2 [label="SECONDARY-1"];
        ms3 [label="SECONDARY-2"];
    }
    
    r1 [label="mysqlrouter1"];
    r2 [label="mysqlrouter2"];
    r3 [label="mysqlrouter3"];
    a1 -> r1;
    b1 -> r2;
    c1 -> r3;
    
    r1 -> {ms1,ms2,ms3;}
    r2 -> {ms1,ms2,ms3;}
    r3 -> {ms1,ms2,ms3;}
    label="MGR Single-Primary";
}
digraph itopmgr {
    node[shape=box];
    a [label="iTop-1"];
    b [label="iTop-2"];
    c [label="iTop-3"];
    
    subgraph cluster_mgr{
        node[shape=cylinder];
        m1 [label="PRIMARY-1"];
        m2 [label="PRIMARY-2"];
        m3 [label="PRIMARY-3"];
    }
    
    a -> m1;
    b -> m2;
    c -> m3;
    label="MGR Multi-Primary";
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant