The power of regular expressions comes from the ability to include alternatives and repetitions in the pattern. These are encoded in the pattern by the use of meta-characters, which do not stand for themselves but instead are interpreted in some special way.
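
As a small illustration of alternatives and repetitions, here is a minimal sketch. The pattern and the sample subjects are made up for this example and are not part of the manual text.

```php
<?php
// Hypothetical pattern: one or more words, each either "cat" or "dog"
// with an optional plural "s", separated by exactly one space.
// "|" gives the alternatives, "?" and "*" give the repetitions.
$pattern = '/^(?:cat|dog)s?(?: (?:cat|dog)s?)*$/';

$results = [];
foreach (['cat', 'dogs cat', 'bird', 'cat  dog'] as $subject) {
    // preg_match() returns 1 on a match and 0 otherwise
    $results[$subject] = preg_match($pattern, $subject);
}

print_r($results);
```

Only the first two subjects should match: "bird" is not one of the alternatives, and the last subject contains two spaces between the words.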

There are two different sets of meta-characters: those that are recognized anywhere in the pattern except within square brackets, and those that are recognized in square brackets. Outside square brackets, the meta-characters are as follows:

Meta-characters outside square brackets
\    general escape character with several uses
^    assert start of subject (or line, in multiline mode)
$    assert end of subject or before a terminating newline (or end of line, in multiline mode)
.    match any character except newline (by default)
[    start character class definition
]    end character class definition
|    start of alternative branch
(    start subpattern
)    end subpattern
?    extends the meaning of (, also 0 or 1 quantifier, also makes greedy quantifiers lazy (see repetition)
*    0 or more quantifier
+    1 or more quantifier
{    start min/max quantifier
}    end min/max quantifier

Part of a pattern that is in square brackets is called a character class. In a character class the only meta-characters are:

Meta-characters inside square brackets (character classes)
\    general escape character
^    negate the class, but only if the first character
-    indicates character range

The following sections describe the use of each of the meta-characters.
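
To make the difference between the two sets of meta-characters concrete, the following sketch shows how a few characters behave inside a character class. The sample subjects and variable names are invented for illustration.

```php
<?php
// Inside square brackets, ".", "$" and "*" stand for themselves.
$literalCount = preg_match_all('/[.$*]/', 'a.b$c*d', $m);

// "^" negates the class only when it is the first character ...
$negated = preg_match('/^[^0-9]+$/', 'abc');

// ... anywhere else in the class it is an ordinary character.
$literalCaret = preg_match('/^[0-9^]+$/', '4^2');

// "-" between two characters denotes a range.
$inRange    = preg_match('/^[a-f]+$/', 'cafe');
$notInRange = preg_match('/^[a-f]+$/', 'cage'); // "g" is outside a-f

var_dump($literalCount, $negated, $literalCaret, $inRange, $notInRange);
```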


User Contributed Notes 3 notes

1 hour ago
A hint for those of you who are trying to work around the problem of matching a pattern correctly at the end ($) of every line in multiline mode (/m).
// Different operating systems use different line break characters:
// - Windows uses CR+LF (\r\n);
// - Linux and OS X use LF (\n);
// - classic Mac OS (before OS X) used CR (\r).
// That is why the single dollar assertion ($) sometimes appears to fail with the multiline modifier (/m): by default it only treats LF (\n) as a line break, so this is probably not a bug in PHP 5.3.8.
$str="ABC ABC\n\n123 123\r\ndef def\rnop nop\r\n890 890\nQRS QRS\r\r~-_ ~-_";
//          C          3                   p          0                   _
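$pat1='/\w$/mi';    // plain end-of-line assertion
$pat2='/\w\r?$/mi'; // allows an optional CR before the end of the line
$pat3='/\w\R?$/mi'; // allows an optional line break sequence before the end of the line
$n=preg_match_all($pat1, $str, $m1);
$o=preg_match_all($pat2, $str, $m2);
$p=preg_match_all($pat3, $str, $m3);
echo $str."\n1 !!! $pat1 ($n): ".print_r($m1[0], true)
    ."\n2 !!! $pat2 ($o): ".print_r($m2[0], true)
    ."\n3 !!! $pat3 ($p): ".print_r($m3[0], true);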
// Note the difference between the two very helpful escape sequences in $pat2 (\r) and in $pat3 (\R) - for some applications at least.

/* The code above results in the following output:

123 123
def def
nop nop
890 890

~-_ ~-_
1 !!! /\w$/mi (3): Array
    [0] => C
    [1] => 0
    [2] => _

2 !!! /\w\r?$/mi (5): Array
    [0] => C
    [1] => 3
    [2] => p
    [3] => 0
    [4] => _

3 !!! /\w\R?$/mi (5): Array
    [0] => C

    [1] => 3
    [2] => p
    [3] => 0
    [4] => _
*/
Unfortunately, I haven't got any access to a server with the latest PHP version - my local PHP is 5.3.8 and my public host's PHP is version 5.2.17.
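
An alternative that avoids special-casing the pattern is to normalize the line breaks first and then use the plain $ assertion. The sketch below reuses the test string from above. The normalization step and the variable names are my own assumption, not something the code above does.

```php
<?php
$str = "ABC ABC\n\n123 123\r\ndef def\rnop nop\r\n890 890\nQRS QRS\r\r~-_ ~-_";

// Turn CR+LF and lone CR into LF, so every line ends the way
// $ expects in multiline mode.
$normalized = preg_replace('/\r\n?/', "\n", $str);

// The plain pattern now sees the last word character of every non-empty line.
$count = preg_match_all('/\w$/m', $normalized, $m);

echo $count, "\n";
print_r($m[0]);
```

After normalization the lines ending in "def def" and "QRS QRS" are matched as well, which none of the three patterns above achieved.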
Kurt Wei
2 years ago
Confusing behaviour of "any character" with multi-line subjects...

By default, '.' matches any single character except the newline (\n),
even though \n is matched by other constructs (e.g. \s).
Oddly enough, the carriage return (\r) is matched by '.'.

You either have to write "(.|\\n)" instead of a single dot, which adds a capturing group and complicates the match results,

or simply use the "s" modifier to make the dot match the newline as well.


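$subject = "<tag>line one\nline two</tag>"; // hypothetical subject containing a newline
preg_match( '/<tag>[A-Za-z\\s]*<\\/tag>/' , $subject ); // 1 (match): \s covers the newline
preg_match( '/<tag>[^<]*<\\/tag>/' , $subject ); // 1 (match): the negated class covers the newline
preg_match( '/<tag>(.|\\n)*<\\/tag>/' , $subject ); // 1 (match)
preg_match( '/<tag>.*<\\/tag>/s' , $subject ); // 1 (match)
preg_match( '/<tag>.*<\\/tag>/' , $subject ); // ATTENTION! 0 (no match)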
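
The difference between \n and \r can be checked in isolation with a minimal sketch like the following. The two-character test subjects are invented for this example, and the results assume that PCRE uses its usual default of treating only LF as the newline.

```php
<?php
$withLf = "a\nb"; // "a", line feed, "b"
$withCr = "a\rb"; // "a", carriage return, "b"

// Without the "s" modifier the dot refuses the line feed ...
$dotLf = preg_match('/^a.b$/', $withLf);

// ... but accepts the carriage return.
$dotCr = preg_match('/^a.b$/', $withCr);

// With the "s" modifier the dot accepts the line feed too.
$dotAllLf = preg_match('/^a.b$/s', $withLf);

var_dump($dotLf, $dotCr, $dotAllLf);
```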
2 years ago
The meta character $ also matches just before one trailing newline character (\n) at the end of the subject, so a subject that ends in "\n" still matches.

(Take a moment to let this information sink in)

You might want to (r)trim() your input afterwards if you have a match because otherwise it (still) might not meet a length requirement or other strange stuff might happen when you store the input as-is.
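
A short sketch of the effect, together with two ways to avoid it besides trimming: the D (PCRE_DOLLAR_ENDONLY) modifier and the \z assertion. The sample input and variable names are only for illustration.

```php
<?php
$input = "12345\n"; // digits followed by one trailing newline

// Plain $ tolerates the trailing newline, so this still matches.
$plain = preg_match('/^\d+$/', $input);

// With the D modifier, $ matches only at the very end of the subject.
$dollarEndOnly = preg_match('/^\d+$/D', $input);

// \z always asserts the very end of the subject.
$absoluteEnd = preg_match('/^\d+\z/', $input);

// Trimming the newline first makes the strict pattern match again.
$trimmed = preg_match('/^\d+$/D', rtrim($input, "\n"));

var_dump($plain, $dollarEndOnly, $absoluteEnd, $trimmed);
```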