This repository has been archived by the owner on Jan 22, 2019. It is now read-only.

improve seek() to return a generator object, and support multi-seek func.

fix a big bug for seek loop.
andares committed Oct 6, 2016
1 parent 1cd99e0 commit 8fa5e9e
Showing 4 changed files with 72 additions and 38 deletions.
31 changes: 22 additions & 9 deletions README.md
@@ -6,11 +6,12 @@

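I am not an algorithms expert, so the implementation leans toward an application-code style and relies heavily on the fact that PHP arrays behave like doubly linked lists. If the algorithm is wrong somewhere, or you have a better optimization, **please hit me with a merge request!**

## Development plan
## Updates and roadmap

|Planned feature|Target version|
|Feature|Implemented in|
|---|---|
|Add a seekMore function that matches all hit words in one pass| - |
|Add batch lookup that returns all hit words in one pass| 1.3 |
|Fix the pointer-offset bug when a lookup fails| 1.3 |

## Installation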

@@ -63,31 +64,43 @@ $dict->confirm();
Once the dictionary has been built it can be used for searching, for example:

```
$result = $dict->seek('get out! asshole!');
$result = $dict->seek('get out! asshole!')->current();
if ($result) {
throw new \LogicException('you could not say it');
}
```

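* seek() searches a string for any word in the dictionary. It returns an int, or 0 if nothing is found.
* seek() searches a string for any word in the dictionary. It returns a generator object, so call ```current()``` to get the first match.

In the example above, assuming ```asshole``` is in the dictionary, the returned ```$result``` is non-zero.
In the example above, assuming ```asshole``` is in the dictionary, ```current()``` returns a non-zero integer.

In fact, $result is the **leaf-node state** in Darts.
> In fact, $result is the **leaf-node state** in Darts.
**If the string contains nothing from the dictionary, ```current()``` returns null rather than 0, because that is how generators behave. Keep this in mind!**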
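
The null-versus-0 behaviour noted above can be checked in isolation. The following is a minimal sketch, not the library's code: `fakeSeek()` and its word-to-state table are hypothetical stand-ins for `seek()`, used only to show what `current()` gives back on a generator that never yields.

```php
<?php
// Hypothetical stand-in for seek(): yields the state of every word found.
// The word => state table is an assumption made for this sketch only.
function fakeSeek(string $sample): \Generator {
    $states = ['asshole' => 33];
    foreach ($states as $word => $state) {
        if (strpos($sample, $word) !== false) {
            yield $state;
        }
    }
}

$hit  = fakeSeek('get out! asshole!')->current();
$miss = fakeSeek('have a nice day')->current();

var_dump($hit);   // int(33)
var_dump($miss);  // NULL: the generator finished without yielding anything

// A strict comparison separates "no match" from any integer state.
if ($miss === null) {
    echo "nothing matched\n";
}
```

A loose `if ($result)` check still works, since null is falsy. Passing a null result on to `getWordsByState()` is presumably why this commit drops the `int` type hint from that method.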

## Getting the matched word from a state

A small improvement on the code above:

```
$result = $dict->seek('get out! asshole!');
$result = $dict->seek('get out! asshole!')->current();
if ($result) {
throw new \LogicException('you could not say ' . $dict->getWordsByState($result));
}
```

* getWordsByState() returns the matched word for a **leaf-node state**; barring surprises, the call above returns asshole

## Finding multiple hit words

```
foreach ($dict->seek('get out! asshole!') as $result) {
echo 'you could not say ' . $dict->getWordsByState($result);
}
```

Because seek() returns a generator, iterating over it with foreach yields every hit word.
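
As a rough illustration of collecting every hit instead of echoing it, the sketch below uses a hypothetical `fakeSeekAll()` generator and a made-up word-to-state table in place of the real dictionary. Unlike the real `seek()`, this stub yields in table order rather than in the order the words appear in the text.

```php
<?php
// Hypothetical stand-in for seek(): yields one state per dictionary word found.
function fakeSeekAll(string $sample, array $states): \Generator {
    foreach ($states as $word => $state) {
        if (strpos($sample, $word) !== false) {
            yield $state;
        }
    }
}

// Assumed word => state table, plus its inverse to play getWordsByState().
$states = ['asshole' => 33, 'get out' => 12];
$words  = array_flip($states);

// Option 1: walk the generator lazily and map each state back to its word.
$found = [];
foreach (fakeSeekAll('get out! asshole!', $states) as $state) {
    $found[] = $words[$state] ?? '';
}

// Option 2: drain the generator into a plain list of states in one call.
$all = iterator_to_array(fakeSeekAll('get out! asshole!', $states), false);

print_r($found);
print_r($all);
```

The lazy `foreach` form lets the caller stop at the first unacceptable word, while `iterator_to_array()` is convenient when the full list is needed anyway.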

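### About the match position

Because fail pointers are supported, state transitions are not linear. When a fail pointer jumps to a node of another entry, I have not yet found an efficient way to trace back to the starting node.
@@ -119,7 +132,7 @@ redis()->set('dict', $packed);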
$packed = redis()->get('dict');
$dict = unserialize($packed);
$result = $dict->seek($some_words);
$result = $dict->seek($some_words)->current();
// ...1000 lines of business code after the search
```

9 changes: 4 additions & 5 deletions src/Dictionary.php
@@ -33,18 +33,17 @@ public function __construct() {
/**
*
* @param string $sample
* @return int
* @return \Generator
*/
public function seek(string $sample): int {
public function seek(string $sample): \Generator {
// First build the encoded sequence used for searching
$haystack = [];
foreach ($this->splitWords($sample) as $char) {
$haystack[] = $this->index[$char] ?? 0;
}

$seeker = new Seeker($this->check, $this->base, $this->fail_states);
$found = $seeker($haystack);
return $found;
return $seeker($haystack);
}

/**
@@ -84,7 +83,7 @@ public function add(string $words): self {
* @param int $state
* @return string
*/
public function getWordsByState(int $state): string {
public function getWordsByState($state): string {
return $this->words_states[$state] ?? '';
}

31 changes: 20 additions & 11 deletions src/Seeker.php
@@ -39,10 +39,12 @@ public function forFail(array $haystack): int {
return $state;
}

public function __invoke(array $haystack): int {
// Always run in AC automaton mode
$acm_mode = true;
public function __invoke(array $haystack): \Generator {
$it = $this->process($haystack);
return $it;
}

private function process(array $haystack) {
// Current base
$base = $this->base[0];
// Start pointer
@@ -55,7 +57,13 @@ public function __invoke(array $haystack): int {
$pre_state = 0;

// Start searching
$count = 0;
while (isset($haystack[$cursor])) {
$count++;
// Leftover debug guard: aborts the whole script once the loop exceeds 1000 iterations
if ($count > 1000) {
die('vvv');
}

// Compute the state from the current base and the cursor position
// A character that is not in the index has no code, so state = -1
$state = isset($haystack[$cursor]) ?
@@ -75,7 +83,13 @@ public function __invoke(array $haystack): int {

} else {
// Reached a leaf node: match found
return $state;
yield $state;

// Reset the search position
$base = $this->base[0];
$start++;
$verify = 0;
$pre_state = 0;
}
} else {
// State check failed
@@ -87,12 +101,7 @@ public function __invoke(array $haystack): int {
} else {
// No fail pointer: reset base to root
$base = $this->base[0];
if ($acm_mode) {
// In AC automaton mode, do not roll back the match progress
$start += $verify - 1;
}
// The start pointer always advances
$start++;
$verify ? ($start += $verify) : $start++;
}
// Reset the verify offset and pre state
$verify = 0;
@@ -104,7 +113,7 @@ public function __invoke(array $haystack): int {
// du("$cursor = $start + $verify", '$cursor = $start + $verify');
}

// Not found
// Search finished
return 0;
}
}
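
One detail of the generator version of `process()` is worth illustrating: in PHP 7 a `return` value inside a generator is not delivered to `foreach` or `current()`, so the trailing `return 0;` never shows up as a match. The sketch below uses a hypothetical `scan()` function, not the Seeker itself, to show where that value ends up.

```php
<?php
// Hypothetical generator: yields every non-zero code, then returns 0,
// mirroring the "yield $state; ... return 0;" shape of process().
function scan(array $haystack): \Generator {
    foreach ($haystack as $code) {
        if ($code > 0) {
            yield $code;
        }
    }
    return 0;
}

$gen = scan([0, 5, 0, 7]);

$yielded = [];
foreach ($gen as $state) {
    $yielded[] = $state;   // only the yielded values arrive here
}

// The return value is readable only after the generator has finished.
$final = $gen->getReturn();

var_dump($yielded);  // the two yielded codes, 5 and 7
var_dump($final);    // int(0)
```

Callers that only iterate the result never see the 0, which matches the README note that a miss surfaces as null from `current()`.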
39 changes: 26 additions & 13 deletions tests/src/DictionaryTest.php
@@ -43,34 +43,47 @@ public function testAdd() {
du($this->object);

$packed = serialize($this->object);
var_dump($packed);
du(strlen($packed));
// var_dump($packed);
// du(strlen($packed));
$this->object = unserialize($packed);

// Test batch lookup
$result_list = [
'毛 abcfd' => 1,
'主 导' => 1,
'习boss' => 1,
];
foreach ($this->object->seek('abd毛 主毛 abcfd 毛 主 导习bossk') as $result) {
du($result);
$word = $this->object->getWordsByState($result);
$this->assertTrue(isset($result_list[$word]));
unset($result_list[$word]);
}

// Test deep backtracking
$result = $this->object->seek('123毛 abcfwr');
$result = $this->object->seek('123毛 abcfwr')->current();
du($result);
du($this->object->getWordsByState($result));
$this->assertEquals('', $this->object->getWordsByState($result));

// Test fail pointers
$result = $this->object->seek('abd毛 主d 毛 主 导k');
$result = $this->object->seek('abd毛 主d 毛 主 导k')->current();
du($result);
du($this->object->getWordsByState($result));
$this->assertEquals('主 导', $this->object->getWordsByState($result));

// Simplify
$packed = serialize($this->object->simplify());
var_dump($packed);
du(strlen($packed));
// var_dump($packed);
// du(strlen($packed));
$this->object = unserialize($packed);

// Test not found
$result = $this->object->seek('abd毛习');
$result = $this->object->seek('abd毛习')->current();
du($result);
du($this->object->getWordsByState($result));
$this->assertEquals('', $this->object->getWordsByState($result));

// Test found
$result = $this->object->seek('abd习bosseee');
du($result);
du($this->object->getWordsByState($result));
$result = $this->object->seek('abd习bosseee')->current();
$this->assertEquals(33, $result);

}
