Replace text using regular expression
newStr = regexprep(str,expression,replace) replaces the text in str that matches expression with the text described by replace. The regexprep function returns the updated text in newStr.
If str is a single piece of text (either a character vector or a string scalar), then newStr is also a single piece of text of the same type. newStr is a single piece of text even when expression or replace is a cell array of character vectors or a string array. When expression is a cell array or a string array, regexprep applies the first expression to str, and then applies each subsequent expression to the preceding result.
If str is a cell array or a string array, then newStr is a cell array or string array with the same dimensions as str. For each element of str, the regexprep function applies each expression in sequence.
If there are no matches to expression, then newStr is equivalent to str.
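The sequencing rule described above can be sketched as follows. The input text and patterns are illustrative assumptions, not examples from the reference page. Because each expression is applied to the result of the previous one, the second replacement also rewrites characters that the first replacement produced.

```matlab
% Illustrative input; the patterns are assumptions chosen for this sketch.
str = 'abc';

% The first expression ('a' -> 'b') runs on str and yields 'bbc'.
% The second expression ('b' -> 'c') then runs on that intermediate result.
chained = regexprep(str, {'a','b'}, {'b','c'});

% When nothing matches, the input is returned unchanged.
unchanged = regexprep(str, 'x', 'y');
```

Here chained is 'ccc' rather than 'bcc', which shows that the expressions are not applied independently to the original text.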
Replace words that begin with M, end with y, and have at least one character between them.

    str = 'My flowers may bloom in May';
    expression = 'M(\w+)y';
    replace = 'April';
    newStr = regexprep(str,expression,replace)

    newStr = My flowers may bloom in April
Replace variations of the phrase 'walk up' by capturing the letters that follow 'walk' in a token.

    str = 'I walk up, they walked up, we are walking up.';
    expression = 'walk(\w*) up';
    replace = 'ascend$1';
    newStr = regexprep(str,expression,replace)

    newStr = I ascend, they ascended, we are ascending.
Replace lowercase letters at the beginning of sentences with their uppercase equivalents using the upper function.

    str = 'here are two sentences. neither is capitalized.';
    expression = '(^|\.)\s*.';
    replace = '${upper($0)}';
    newStr = regexprep(str,expression,replace)

    newStr = Here are two sentences. Neither is capitalized.

The regular expression matches single characters (.) that follow the beginning of the character vector (^) or a period (\.) and any whitespace (\s*). The replace expression calls the upper function for the currently matching character ($0).
Replace each occurrence of a double letter in a set of character vectors with the symbols '--'.

    str = { ...
        'Whose woods these are I think I know.' ; ...
        'His house is in the village though;'   ; ...
        'He will not see me stopping here'      ; ...
        'To watch his woods fill up with snow.'};
    expression = '(.)\1';
    replace = '--';
    newStr = regexprep(str,expression,replace)

    newStr = 4×1 cell array
        'Whose w--ds these are I think I know.'
        'His house is in the vi--age though;'
        'He wi-- not s-- me sto--ing here'
        'To watch his w--ds fi-- up with snow.'
Ignore letter case in the regular expression when finding matches, but mimic the letter case of the original text when updating.

    str = 'My flowers may bloom in May';
    expression = 'M(\w+)y';
    replace = 'April';
    newStr = regexprep(str,expression,replace,'preservecase')

    newStr = My flowers april bloom in April
Insert text at the beginning of a character vector using the '^' operator, which returns a zero-length match, and the 'emptymatch' keyword.

    str = 'abc';
    expression = '^';
    replace = '__';
    newStr = regexprep(str,expression,replace,'emptymatch')

    newStr = __abc
str — Text to update
Text to update, specified as a character vector, a cell array of character vectors, or a string array.
Data Types: char | cell | string
expression — Regular expression
Regular expression, specified as a character vector, a cell array of character vectors, or a string array. Each expression can contain characters, metacharacters, operators, tokens, and flags that specify patterns to match in str.
The following tables describe the elements of regular expressions.
Metacharacters represent letters, letter ranges, digits, and space characters. Use them to construct a generalized pattern of characters.
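Metacharacter | Description | Example |
---|---|---|
. | Any single character, including white space | '..ain' matches sequences of five consecutive characters that end with 'ain'. |
[c1c2c3] | Any character contained within the brackets. The following characters are treated literally: $ \| . * + ? and - when not used to indicate a range. | '[rp.]ain' matches 'rain' or 'pain' or '.ain'. |
[^c1c2c3] | Any character not contained within the brackets. The following characters are treated literally: $ \| . * + ? and - when not used to indicate a range. | '[^*rp]ain' matches all four-character sequences that end in 'ain', except 'rain', 'pain', and '*ain'. For example, it matches 'gain', 'lain', or 'vain'. |
[c1-c2] | Any character in the range of c1 through c2 | '[A-G]' matches a single character in the range of A through G. |
\w | Any alphabetic, numeric, or underscore character. For English character sets, \w is equivalent to [a-zA-Z_0-9] | '\w*' identifies a word comprised of any grouping of alphabetic, numeric, or underscore characters. |
\W | Any character that is not alphabetic, numeric, or underscore. For English character sets, \W is equivalent to [^a-zA-Z_0-9] | '\W*' identifies a run of characters that are not alphabetic, numeric, or underscore characters. |
\s | Any white-space character; equivalent to [ \f\n\r\t\v] | '\w*n\s' identifies words that end with the letter n, followed by a white-space character. |
\S | Any non-white-space character; equivalent to [^ \f\n\r\t\v] | '\d\S' identifies a numeric digit followed by any non-white-space character. |
\d | Any numeric digit; equivalent to [0-9] | '\d*' identifies any number of consecutive digits. |
\D | Any nondigit character; equivalent to [^0-9] | '\w*\D\>' identifies words that do not end with a numeric digit. |
\oN or \o{N} | Character of octal value N | '\o{40}' matches the space character, defined by octal 40. |
\xN or \x{N} | Character of hexadecimal value N | '\x2C' matches the comma character, defined by hex 2C. |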
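As a brief sketch of how metacharacters generalize a pattern, the following replaces digits and collapses runs of white space. The sample text is a made-up assumption for illustration.

```matlab
% Hypothetical sample text with three spaces after the comma.
str = 'Order 66,   aisle 7';

% \d matches any single digit, so every digit is replaced individually.
masked = regexprep(str, '\d', '#');

% \s+ matches one or more consecutive white-space characters,
% so each run collapses to a single space.
squeezed = regexprep(str, '\s+', ' ');
```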
Operator | Description |
---|---|
\a | Alarm (beep) |
\b | Backspace |
\f | Form feed |
\n | New line |
\r | Carriage return |
\t | Horizontal tab |
\v | Vertical tab |
\char | Any character with special meaning in regular expressions that you want to match literally (for example, use \\ to match a single backslash) |
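The effect of escaping a special character can be sketched as follows. The path and text values are hypothetical and only serve the illustration.

```matlab
% Hypothetical Windows-style path. In a MATLAB character vector the
% backslashes are ordinary characters; only the regular expression
% needs them escaped.
winPath = 'C:\temp\file.txt';
unixStyle = regexprep(winPath, '\\', '/');

% An escaped dot matches only a literal period ...
literalDot = regexprep('a.b.c', '\.', '-');

% ... whereas an unescaped dot matches every character.
anyChar = regexprep('a.b.c', '.', '-');
```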
Quantifiers specify the number of times a pattern must occur in the matching text.
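Quantifier | Matches the expression when it occurs... | Example |
---|---|---|
expr* | 0 or more times consecutively. | '\w*' matches a word of any length. |
expr? | 0 times or 1 time. | '\w*(\.m)?' matches words that optionally end with the extension .m. |
expr+ | 1 or more times consecutively. | '<img src="\w+\.gif">' matches an <img> HTML tag when the file name contains one or more characters. |
expr{m,n} | At least m times, but no more than n times consecutively. {0,1} is equivalent to ?. | '\S{4,8}' matches between four and eight non-white-space characters. |
expr{m,} | At least m times consecutively. {0,} and {1,} are equivalent to * and +, respectively. | '<a href="\w{1,}\.html">' matches an <a> HTML tag when the file name contains one or more characters. |
expr{n} | Exactly n times consecutively. Equivalent to {n,n}. | '\d{4}' matches four consecutive digits. |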
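A short sketch of the quantifiers in use, with made-up inputs chosen so that each pattern matches some parts of the text and not others:

```matlab
% ? makes the preceding character optional, so both spellings match.
spelling = regexprep('color colour', 'colou?r', 'X');

% {2,} requires at least two consecutive occurrences,
% so the single 'a' is left alone.
runs = regexprep('a aa aaa', 'a{2,}', 'X');

% {4} requires exactly four consecutive digits.
year = regexprep('2024-01-15', '\d{4}', 'YYYY');
```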
Quantifiers can appear in three modes, described in the following table. q represents any of the quantifiers in the previous table.
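Mode | Description | Example |
---|---|---|
exprq | Greedy expression: match as many characters as possible. | Given the text '<tr><td><p>text</p></td>', the expression '</?t.*>' matches all characters between <tr and /td>: '<tr><td><p>text</p></td>' |
exprq? | Lazy expression: match as few characters as necessary. | Given the text '<tr><td><p>text</p></td>', the expression '</?t.*?>' ends each match at the first occurrence of the closing angle bracket (>): '<tr>' '<td>' '</td>' |
exprq+ | Possessive expression: match as much as possible, but do not rescan any portions of the text. | Given the text '<tr><td><p>text</p></td>', the expression '</?t.*+>' does not return any matches, because the closing angle bracket is consumed by .* and is not rescanned. |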
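The three quantifier modes can be compared on the same HTML fragment used in the table. This is a sketch under the assumption that each match is replaced with empty text, so the result shows what each mode removed.

```matlab
str = '<tr><td><p>text</p></td>';

% Greedy: .* runs to the last '>', so the whole text is one match.
greedy = regexprep(str, '</?t.*>', '');

% Lazy: .*? stops at the first '>', so only the individual
% <tr>, <td>, and </td> tags are removed.
lazy = regexprep(str, '</?t.*?>', '');

% Possessive: .*+ consumes the final '>' and never gives it back,
% so nothing matches and the text is unchanged.
possessive = regexprep(str, '</?t.*+>', '');
```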
Grouping operators allow you to capture tokens, apply one operator to multiple elements, or disable backtracking in a specific group.
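Grouping Operator | Description | Example |
---|---|---|
(expr) | Group elements of the expression and capture tokens. | 'Joh?n\s(\w*)' captures a token that contains the last name of any person with the first name John or Jon. |
(?:expr) | Group, but do not capture tokens. | '(?:[aeiou][^aeiou]){2}' matches two consecutive patterns of a vowel followed by a nonvowel, such as 'anon'. Without grouping, '[aeiou][^aeiou]{2}' matches a vowel followed by two nonvowels. |
(?>expr) | Group atomically. Do not backtrack within the group to complete the match, and do not capture tokens. | 'A(?>.*)Z' does not match 'AtoZ', although 'A(?:.*)Z' does. Using the atomic group, Z is consumed by .* and is not rescanned. |
(expr1\|expr2) | Match expression expr1 or expression expr2. If there is a match with expr1, then expr2 is ignored. You can include ?: or ?> after the opening parenthesis to suppress tokens or group atomically. | '(let\|tel)\w+' matches words that contain, but do not end with, let or tel. |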
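To illustrate how grouping changes what a quantifier applies to, the following sketch runs the grouped and ungrouped vowel patterns from the table on the same word. The replacement text 'X' is an arbitrary choice.

```matlab
str = 'anon';

% With grouping, {2} repeats the whole vowel-plus-nonvowel pair,
% so 'an' followed by 'on' matches.
grouped = regexprep(str, '(?:[aeiou][^aeiou]){2}', 'X');

% Without grouping, {2} applies only to [^aeiou], which requires a vowel
% followed by two nonvowels. 'anon' contains no such sequence.
ungrouped = regexprep(str, '[aeiou][^aeiou]{2}', 'X');
```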
Anchors in the expression match the beginning or end of the input text or word.
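Anchor | Matches the... | Example |
---|---|---|
^expr | Beginning of the input text. | '^M\w*' matches a word starting with M at the beginning of the text. |
expr$ | End of the input text. | '\w*m$' matches words ending with m at the end of the text. |
\<expr | Beginning of a word. | '\<n\w*' matches any words starting with n. |
expr\> | End of a word. | '\w*e\>' matches any words ending with e. |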
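The difference between text anchors and word anchors can be sketched on a short made-up phrase:

```matlab
str = 'new nine inn';

% ^ anchors to the beginning of the whole text: only the first n changes.
textStart = regexprep(str, '^n', 'N');

% \< anchors to the beginning of each word: 'new' and 'nine' change,
% 'inn' does not because it starts with i.
wordStart = regexprep(str, '\<n', 'N');

% \> anchors to the end of a word: only 'inn' ends with n.
wordEnd = regexprep(str, 'n\>', 'N');
```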
Lookaround assertions look for patterns that immediately precede or follow the intended match, but are not part of the match.
The pointer remains at the current location, and characters that correspond to the test expression are not captured or discarded. Therefore, lookahead assertions can match overlapping character groups.
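Lookaround Assertion | Description | Example |
---|---|---|
expr(?=test) | Look ahead for characters that match test. | '\w*(?=ing)' matches terms that are followed by ing, such as 'Fly' and 'fall' in the input text 'Flying, not falling.' |
expr(?!test) | Look ahead for characters that do not match test. | 'i(?!ng)' matches instances of the letter i that are not followed by ng. |
(?<=test)expr | Look behind for characters that match test. | '(?<=re)\w*' matches terms that follow 're', such as 'new', 'use', and 'cycle' in the input text 'renew, reuse, recycle' |
(?<!test)expr | Look behind for characters that do not match test. | '(?<!\d)(\d)(?!\d)' matches single-digit numbers (digits that do not precede or follow other digits). |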
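Because lookaround text is tested but not consumed, it survives the replacement. A minimal sketch, using illustrative inputs:

```matlab
% Lookahead: replace the stem that precedes 'ing', keeping 'ing' itself.
stems = regexprep('Flying, not falling.', '\w+(?=ing)', 'X');

% Negative lookbehind plus negative lookahead: replace digits that
% have no digit on either side, leaving multi-digit numbers intact.
singles = regexprep('1 22 3 444', '(?<!\d)\d(?!\d)', '#');
```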
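If you specify a lookahead assertion before an expression, the operation is equivalent to a logical AND.
Operation | Description | Example |
---|---|---|
(?=test)expr | Match both test and expr. | '(?=[a-z])[^aeiou]' matches consonants. |
(?!test)expr | Match expr and do not match test. | '(?![aeiou])[a-z]' matches consonants. |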
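The AND behavior can be sketched by comparing the consonant pattern with and without its leading lookahead. The input text is an arbitrary example.

```matlab
str = 'matlab 7';

% Both conditions must hold: the character is a lowercase letter
% AND it is not a vowel. The space and the digit are left alone.
consonants = regexprep(str, '(?=[a-z])[^aeiou]', '*');

% Without the lookahead, [^aeiou] alone also matches the space and the 7.
nonVowels = regexprep(str, '[^aeiou]', '*');
```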
Logical and conditional operators allow you to test the state of a given condition, and then use the outcome to determine which pattern, if any, to match next. These operators support logical OR, and if or if/else conditions.
Conditions can be tokens, lookaround operators, or dynamic expressions of the form (?@cmd). Dynamic expressions must return a logical or numeric value.
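Conditional Operator | Description | Example |
---|---|---|
expr1\|expr2 | Match expression expr1 or expression expr2. If there is a match with expr1, then expr2 is ignored. | '(let\|tel)\w+' matches words that start with let or tel. |
(?(cond)expr) | If condition cond is true, then match expr. | '(?(?@ispc)[A-Z]:\\)' matches a drive name, such as C:\, when run on a Windows system. |
(?(cond)expr1\|expr2) | If condition cond is true, then match expr1. Otherwise, match expr2. | 'Mr(s?)\..*?(?(1)her\|his) \w*' matches text that includes her when the text begins with Mrs, or that includes his when the text begins with Mr. |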
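The following is a minimal sketch of the logical OR operator only; the if and if/else forms are not exercised here. The input text is a made-up example.

```matlab
str = 'cat bat rat';

% Top-level alternation: either whole word matches.
eitherWord = regexprep(str, 'cat|bat', 'pet');

% Alternation inside a group: only the first letter varies.
eitherLetter = regexprep(str, '(c|b)at', 'pet');
```

Both calls leave 'rat' untouched, because neither alternative matches it.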
Token Operators
Tokens are portions of the matched text that you define by enclosing part of the regular expression in parentheses. You can refer to a token by its sequence in the text (an ordinal token), or assign names to tokens for easier code maintenance and readable output.
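Ordinal Token Operator | Description | Example |
---|---|---|
(expr) | Capture in a token the characters that match the enclosed expression. | 'Joh?n\s(\w*)' captures a token that contains the last name of any person with the first name John or Jon. |
\N | Match the Nth token. | '<(\w+).*>.*</\1>' captures tokens for HTML tags, such as 'title' from the text '<title>Some text</title>'. |
(?(N)expr1\|expr2) | If the Nth token is found, then match expr1. Otherwise, match expr2. | 'Mr(s?)\..*?(?(1)her\|his) \w*' matches text that includes her when the text begins with Mrs, or that includes his when the text begins with Mr. |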
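Named Token Operator | Description | Example |
---|---|---|
(?<name>expr) | Capture in a named token the characters that match the enclosed expression. | '(?<month>\d+)-(?<day>\d+)-(?<yr>\d+)' creates named tokens for the month, day, and year in an input date of the form mm-dd-yy. |
\k<name> | Match the token referred to by name. | '<(?<tag>\w+).*>.*</\k<tag>>' captures tokens for HTML tags, such as 'title' from the text '<title>Some text</title>'. |
(?(name)expr1\|expr2) | If the named token is found, then match expr1. Otherwise, match expr2. | 'Mr(?<sex>s?)\..*?(?(sex)her\|his) \w*' matches text that includes her when the text begins with Mrs, or that includes his when the text begins with Mr. |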
Note: If an expression has nested parentheses, MATLAB® captures tokens that correspond to the outermost set of parentheses. For example, given the search pattern '(and(y|rew))', MATLAB creates a token for 'andrew' but not for 'y' or 'rew'.
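Tokens captured in the expression can be reused in the replacement text. The sketch below shows one ordinal and one named variant; the sample name and date are illustrative assumptions.

```matlab
% Ordinal tokens: $1 and $2 refer to the first and second
% parenthesized groups, so the two names swap places.
swapped = regexprep('Smith, John', '(\w+), (\w+)', '$2 $1');

% Named tokens: $<name> refers to the group declared with (?<name>...).
% Here an mm-dd-yy date is rewritten as dd.mm.yy.
expression = '(?<month>\d+)-(?<day>\d+)-(?<yr>\d+)';
reordered = regexprep('12-25-99', expression, '$<day>.$<month>.$<yr>');
```

Named tokens tend to keep the replacement readable when the expression has several groups, since the replacement does not depend on group order.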
Dynamic Regular Expressions
Dynamic expressions allow you to execute a MATLAB command or a regular expression to determine the text to match.
The parentheses that enclose dynamic expressions do not create a capturing group.
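Operator | Description | Example |
---|---|---|
(??expr) | Parse expr and include the resulting term in the match expression. When parsed, expr must correspond to a complete, valid regular expression. Dynamic expressions that use the backslash escape character (\) require two backslashes: one for the initial parsing of expr, and one for the complete match. | '^(\d+)((??\\w{$1}))' determines how many characters to match by reading a digit at the beginning of the match. The dynamic expression is enclosed in a second set of parentheses so that the resulting match is captured in a token. For instance, matching '5XXXXX' captures tokens for '5' and 'XXXXX'. |
(??@cmd) | Execute the MATLAB command represented by cmd, and include the output returned by the command in the match expression. | '(.{2,}).?(??@fliplr($1))' finds palindromes that are at least four characters long, such as 'abba'. |
(?@cmd) | Execute the MATLAB command represented by cmd, but discard any output the command returns. (Helpful for diagnosing regular expressions.) | '\w*?(\w)(?@disp($1))\1\w*' matches words that include double letters (such as pp), and displays intermediate results. |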
Within dynamic expressions, use the following operators to define replacement text.
Replacement Operator | Description |
---|---|
$& or $0 | Portion of the input text that is currently a match |
$` | Portion of the input text that precedes the current match |
$' | Portion of the input text that follows the current match (use $'' to represent $') |
$N | Nth token |
$<name> | Named token |
${cmd} | Output returned when MATLAB executes the command, cmd |
Comments
Characters | Description | Example |
---|---|---|
(?#comment) | Insert a comment in the regular expression. The comment text is ignored when matching the input. | '(?# Initial digit)\<\d\w+' includes a comment, and matches words that begin with a number. |
Search Flags
Search flags modify the behavior for matching expressions. An alternative to using a search flag within an expression is to pass an option input argument.
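Flag | Description |
---|---|
(?-i) | Match letter case (default for regexp and regexprep). |
(?i) | Do not match letter case (default for regexpi). |
(?s) | Match dot (.) in the pattern with any character (default). |
(?-s) | Match dot in the pattern with any character that is not a newline character. |
(?-m) | Match the ^ and $ metacharacters at the beginning and end of text (default). |
(?m) | Match the ^ and $ metacharacters at the beginning and end of a line. |
(?-x) | Include space characters and comments when matching (default). |
(?x) | Ignore space characters and comments when matching. Use '\ ' and '\#' to match space and # characters. |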
The expression that the flag modifies can appear either after the parentheses, such as (?i)\w*, or inside the parentheses and separated from the flag with a colon (:), such as (?i:\w*). The latter syntax allows you to change the behavior for part of a larger expression.
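The two flag placements, and the equivalent option argument, can be sketched as follows. The greeting text is an arbitrary example.

```matlab
% Flag before the expression: the whole pattern ignores case.
wholePattern = regexprep('Hello hello', '(?i)hello', 'bye');

% Flag scoped with a colon: only the h ignores case, while 'ello'
% must still be lowercase, so 'HELLO' is not matched.
scoped = regexprep('Hello HELLO', '(?i:h)ello', 'X');

% The same effect as the first call, using the option input instead.
viaOption = regexprep('Hello hello', 'hello', 'bye', 'ignorecase');
```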
Data Types: char | cell | string
replace — Replacement text
Replacement text, specified as a character vector, a cell array of character vectors, or a string array, as follows:
If replace is a single character vector and expression is a cell array of character vectors, then regexprep uses the same replacement text for each expression.
If replace is a cell array of N character vectors and expression is a single character vector, then regexprep attempts N matches and replacements.
If both replace and expression are cell arrays of character vectors, then they must contain the same number of elements. regexprep pairs each replace element with its matching element in expression.
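Two of the pairing rules above can be sketched as follows: one shared replacement, and one replacement per expression. The color words are made-up sample data, and the case with a cell array of replacements for a single expression is not shown.

```matlab
str = 'red green';

% One replacement, several expressions: the same text is used for each.
shared = regexprep(str, {'red','green'}, 'color');

% Equal-length cell arrays: each expression is paired with the
% replacement at the same position.
paired = regexprep(str, {'red','green'}, {'R','G'});
```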
The replacement text can include regular characters, special characters (such as tabs or new lines), or replacement operators, as shown in the following tables.
Replacement Operator | Description |
---|---|
$& or $0 | Portion of the input text that is currently a match |
$` | Portion of the input text that precedes the current match |
$' | Portion of the input text that follows the current match (use $'' to represent $') |
$N | Nth token |
$<name> | Named token |
${cmd} | Output returned when MATLAB executes the command, cmd |
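A short sketch of two replacement operators, using illustrative inputs. The command inside ${...} must return text, which is why the numeric result is converted back with num2str.

```matlab
% $0 inserts the current match, here wrapped in brackets.
bracketed = regexprep('abc', 'b', '[$0]');

% ${cmd} runs a MATLAB command for every match. Each number is
% converted to a double, doubled, and converted back to text.
doubled = regexprep('x=3, y=4', '\d+', '${num2str(str2double($0)*2)}');
```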
Operator | Description |
---|---|
\a | Alarm (beep) |
\b | Backspace |
\f | Form feed |
\n | New line |
\r | Carriage return |
\t | Horizontal tab |
\v | Vertical tab |
\char | Any character with special meaning in regular expressions that you want to match literally (for example, use \\ to match a single backslash) |
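Special characters in the replacement text are translated by regexprep itself, so no call to sprintf is needed on the replace argument. A minimal sketch with a made-up comma-separated input:

```matlab
csv = 'a,b,c';

% '\n' in the replacement becomes an actual newline character.
onePerLine = regexprep(csv, ',', '\n');

% '\t' in the replacement becomes an actual tab character.
tabbed = regexprep(csv, ',', '\t');
```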
Data Types: char | cell | string
option — Search or replacement option
'once' | N | 'warnings' | 'ignorecase' | 'preservecase' | 'emptymatch' | 'dotexceptnewline' | 'lineanchors' | ...
Search or replacement option, specified as a character vector or an integer value, as shown in the following table.
Options come in sets: one option that corresponds to the default behavior, and one or two options that allow you to override the default. Specify only one option from a set. Options can appear in any order.
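Default | Override | Description |
---|---|---|
'all' | 'once' | Match and replace the expression as many times as possible (default), or only once. |
'all' | N | Replace only the Nth occurrence of the match, where N is an integer value. |
'nowarnings' | 'warnings' | Suppress warnings (default), or display them. |
'matchcase' | 'ignorecase' | Match letter case (default), or ignore case while matching and replacing. |
'matchcase' | 'preservecase' | Ignore case while matching, but preserve the case of corresponding characters in the original text while replacing. |
'noemptymatch' | 'emptymatch' | Ignore zero length matches (default), or include them. |
'dotall' | 'dotexceptnewline' | Match dot with any character (default), or all except newline (\n). |
'stringanchors' | 'lineanchors' | Apply ^ and $ metacharacters to the beginning and end of a character vector (default), or to the beginning and end of a line. |
'literalspacing' | 'freespacing' | Include space characters and comments when matching (default), or ignore them. With freespacing, use '\ ' and '\#' to match space and # characters. |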
Data Types: char | string
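A few of the options can be sketched on small made-up inputs. Each call overrides a single default from the table.

```matlab
% Default ('all'): every match is replaced.
everyMatch = regexprep('aaa', 'a', 'b');

% 'once': only the first match is replaced.
firstOnly = regexprep('aaa', 'a', 'b', 'once');

% Integer N: only the Nth match is replaced (here the second).
secondOnly = regexprep('aaa', 'a', 'b', 2);

% 'ignorecase': 'Abc' and 'abc' both match the lowercase pattern.
anyCase = regexprep('Abc abc', 'abc', 'X', 'ignorecase');
```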
newStr — Updated text
Updated text, returned as a character vector, a cell array of character vectors, or a string array. The data type of newStr is the same as the data type of str.
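The type-preservation rule can be checked with a short sketch. The inputs are arbitrary; only the container type differs between the three calls.

```matlab
% Character vector in, character vector out.
fromChar = regexprep('abc', 'b', 'X');

% String scalar in, string scalar out.
fromString = regexprep("abc", 'b', 'X');

% Cell array in, cell array of the same dimensions out.
fromCell = regexprep({'abc'; 'xbz'}, 'b', 'X');

% Collect the type checks; all three are expected to be true.
typesPreserved = [ischar(fromChar), isstring(fromString), iscell(fromCell)];
```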
This function fully supports tall arrays. For more information, see Tall Arrays.