Highlighting Search Results with Regular Expressions

Example:

<style>
	.word-0{ background-color: yellow; }
	.word-1{ border:1px solid red; }
</style>

<?php

header('Content-type:text/html;charset=utf-8');

/* Markup of the web page */
$body = '
<p>I like pickles and hrrring.</p>
<a href="pickle.php"><img width="200" src="pickle.png">A pickle pic</a>
I have herringbone-patterned toaster cozy.
<herring>Herring is not a real HTML element!</herring>
';

$words = array('pickle', 'herring');
$replacements = array();
foreach($words as $i => $word) {
	$replacements[] = "<span class='word-$i'>$word</span>";
}

// Split the page into chunks,
// delimited by anything that looks like an HTML element
$parts = preg_split("{(<(?:\"[^\"]*\"|'[^']*'|[^'\">])*>)}", $body, -1, PREG_SPLIT_DELIM_CAPTURE);
//var_dump($parts);
/*
array (size=15)
  0 => string '
' (length=2)
  1 => string '<p>' (length=3)
  2 => string 'I like pickles and hrrring.' (length=27)
  3 => string '</p>' (length=4)
  4 => string '
' (length=2)
  5 => string '<a href="pickle.php">' (length=21)
  6 => string '' (length=0)
  7 => string '<img width="200" src="pickle.png">' (length=34)
  8 => string 'A pickle pic' (length=12)
  9 => string '</a>' (length=4)
  10 => string '
I have herringbone-patterned toaster cozy.
' (length=46)
  11 => string '<herring>' (length=9)
  12 => string 'Herring is not a real HTML element!' (length=35)
  13 => string '</herring>' (length=10)
  14 => string '
' (length=2)
*/

foreach($parts as $i => $part) {
	// Skip this part if it is an HTML element
	if(isset($part[0]) && ($part[0] == '<')) { continue; }
	// Wrap the words in <span/>
	$parts[$i] = str_replace($words, $replacements, $part);
}

$body = implode('', $parts);

echo $body;
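
To see why the page is split into tag and non-tag chunks first, it may help to compare with a naive replacement over the whole string. The snippet below is only a sketch; the names $html and $naive are made up for this illustration and do not appear in the listing above.

```php
<?php
// Illustration only: $html and $naive are hypothetical names for this sketch.
$html = '<a href="pickle.php">A pickle pic</a>';

// Naive approach: replace everywhere, including inside the tag itself.
$naive = str_replace('pickle', "<span class='word-0'>pickle</span>", $html);

echo $naive, "\n";
// The href attribute now contains a <span>, so the link is broken:
// <a href="<span class='word-0'>pickle</span>.php">A <span class='word-0'>pickle</span> pic</a>
```

Skipping every chunk that starts with < avoids exactly this kind of damage to attributes and element names.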

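Explanation:

The regular expression passed to preg_split() matches HTML tags. Written as it appears inside the double-quoted PHP string (the backslashes only escape the double quotes for PHP), it is:

<(?:\"[^\"]*\"|'[^']*'|[^'\">])*>

It can be read as follows: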

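<                                // opening angle bracket
    (?:                          // any number of:
        \"[^\"]*\"               // a double-quoted string
        |                        // or
        '[^']*'                  // a single-quoted string
        |                        // or
        [^'\">]                  // any character other than a single quote, a double quote, or >
    )*
>                                // closing angle bracket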
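
The quoted-string branches matter when an attribute value itself contains a > character. The following is a minimal sketch of that case; $tagPattern, $sample, $m and $simple are names invented for this illustration, and the simpler pattern <[^>]*> is shown only for comparison.

```php
<?php
// Illustration only: all variable names here are hypothetical.
$tagPattern = "{<(?:\"[^\"]*\"|'[^']*'|[^'\">])*>}";

// The title attribute contains a ">" inside double quotes.
$sample = '<img title="a > b" src="x.png">after';

preg_match($tagPattern, $sample, $m);
echo $m[0], "\n"; // <img title="a > b" src="x.png">

// A simpler pattern without the quoted-string branches stops at the first ">".
preg_match('{<[^>]*>}', $sample, $simple);
echo $simple[0], "\n"; // <img title="a >
```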

 

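However, this approach cannot highlight the last "Herring", because its first letter is uppercase and str_replace() is case-sensitive. To make the replacement fully case-insensitive, replace str_replace() with preg_replace() and use patterns with the i modifier: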

<style>
	.word-0{ background-color: yellow; }
	.word-1{ border:1px solid red; }
</style>

<?php

header('Content-type:text/html;charset=utf-8');

/* Markup of the web page */
$body = '
<p>I like pickles and hrrring.</p>
<a href="pickle.php"><img width="200" src="pickle.png">A pickle pic</a>
I have herringbone-patterned toaster cozy.
<herring>Herring is not a real HTML element!</herring>
';

$words = array('pickle', 'herring');
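$patterns = array();
$replacements = array();
foreach($words as $i => $word) {
	// preg_quote() takes a string and puts a backslash in front of every character
	// that is part of regular expression syntax. Those special characters are:
	// . \ + * ? [ ^ ] $ ( ) { } = ! < > | : -
	// The second argument additionally escapes the "/" delimiter used here.
	$patterns[] = '/'.preg_quote($word, '/').'/i';
	$replacements[] = "<span class='word-$i'>\\0</span>";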
}

// Split the page into chunks,
// delimited by anything that looks like an HTML element
$parts = preg_split("{(<(?:\"[^\"]*\"|'[^']*'|[^'\">])*>)}", $body, -1, PREG_SPLIT_DELIM_CAPTURE);
//var_dump($parts);
/*
array (size=15)
  0 => string '
' (length=2)
  1 => string '<p>' (length=3)
  2 => string 'I like pickles and hrrring.' (length=27)
  3 => string '</p>' (length=4)
  4 => string '
' (length=2)
  5 => string '<a href="pickle.php">' (length=21)
  6 => string '' (length=0)
  7 => string '<img width="200" src="pickle.png">' (length=34)
  8 => string 'A pickle pic' (length=12)
  9 => string '</a>' (length=4)
  10 => string '
I have herringbone-patterned toaster cozy.
' (length=46)
  11 => string '<herring>' (length=9)
  12 => string 'Herring is not a real HTML element!' (length=35)
  13 => string '</herring>' (length=10)
  14 => string '
' (length=2)
*/

foreach($parts as $i => $part) {
	// Skip this part if it is an HTML element
	if(isset($part[0]) && ($part[0] == '<')) { continue; }
	// Wrap the words in <span/>
	$parts[$i] = preg_replace($patterns, $replacements, $part);
}

$body = implode('', $parts);

echo $body;
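
The same steps can be gathered into a reusable function. This is only a sketch under stated assumptions: the function name highlight_words() and the sample input are hypothetical and not part of the original listing. It passes '/' as the second argument to preg_quote() so that a search word containing the delimiter does not break the pattern.

```php
<?php
// Hypothetical helper for this sketch; highlight_words() is not from the original listing.
function highlight_words($html, array $words)
{
    $patterns = array();
    $replacements = array();
    foreach ($words as $i => $word) {
        // Pass the delimiter so that a "/" inside a search word is escaped too.
        $patterns[] = '/' . preg_quote($word, '/') . '/i';
        // \0 stands for the matched text, so the original capitalization is kept.
        $replacements[] = "<span class='word-$i'>\\0</span>";
    }

    // Split into tag and non-tag chunks, keeping the tags in the result.
    $parts = preg_split("{(<(?:\"[^\"]*\"|'[^']*'|[^'\">])*>)}", $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($parts as $i => $part) {
        // Leave anything that looks like a tag untouched.
        if (isset($part[0]) && $part[0] == '<') { continue; }
        $parts[$i] = preg_replace($patterns, $replacements, $part);
    }
    return implode('', $parts);
}

echo highlight_words('<herring>Herring and/or pickles</herring>', array('herring', 'and/or')), "\n";
// <herring><span class='word-0'>Herring</span> <span class='word-1'>and/or</span> pickles</herring>
```

One caveat of this approach: the patterns are applied one after another, so a later search word such as "span" or "class" could match inside markup inserted for an earlier word.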

  

 

References:

PHP Cookbook, 3rd Edition

Mastering Regular Expressions, 3rd Edition
