
14.3 Regexp

Regexp is short for Regular Expression. A regular expression is a standardized way to write patterns that match certain strings. In Pike you can often use the sscanf, range and index operators to match strings, but sometimes a regexp is both faster and easier.

A regular expression is written as a string, which is then compiled into an object. The string contains characters that make up a pattern for other strings to match. Normal characters, such as A through Z, only match themselves, but some characters have special meaning.

Pattern   Matches
.         any one character
[abc]     a, b or c
[a-z]     any character from a to z, inclusive
[^ac]     any character except a and c
(x)       x (x may be any regexp). If used with split, this also puts the string matching x into the result array.
x*        zero or more occurrences of x (x may be any regexp)
x+        one or more occurrences of x (x may be any regexp)
x|y       x or y (x and y may be any regexp)
xy        x followed by y (x and y may be any regexp)
^         the beginning of the string (matches no characters)
$         the end of the string (matches no characters)
\<        the beginning of a word (matches no characters)
\>        the end of a word (matches no characters)

Let's look at a few examples:

Regexp        Matches
[0-9]+        one or more digits
[^ \t\n]      exactly one non-whitespace character
(foo)|(bar)   either 'foo' or 'bar'
\.html$       any string ending in '.html'
^\.           any string starting with a period

Note that \ can be used to quote these special characters, in which case they match themselves and nothing else. Also note that when you quote something this way in a Pike string you need two \, because Pike also uses this character for quoting.
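
As a small illustration of the double quoting, here is a minimal sketch. The variable name html and the test strings are made up for this example; the pattern is the \.html$ entry from the table of examples.

```pike
int main()
{
  // The Pike string "\\.html$" contains the seven characters \.html$
  // Pike itself consumes one \ of each pair, the regexp gets the other.
  object html = Regexp("\\.html$");

  // The quoted period only matches a real '.', so the first string
  // should match and the second should not.
  write(sprintf("%d\n", html->match("index.html")));
  write(sprintf("%d\n", html->match("index_html")));
  return 0;
}
```

With a single \ the Pike compiler would interpret the escape itself, and the regexp would never see the backslash.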

To make regexps fast, they are compiled in much the same way as Pike programs are. A compiled regexp can then be used over and over again without being recompiled. To give the user full control over the compilation and use of regexps, an object-oriented interface is provided.

You might wonder what regexps are good for. Hopefully this will become clearer when you read about the following functions:

METHOD
Regexp.create - compile regexp

SYNTAX
void create();
void create(string regexp);
object(Regexp) Regexp();
object(Regexp) Regexp(string regexp);

DESCRIPTION
When create is called, the current regexp bound to this object is cleared. If a string is sent to create(), this string will be compiled to an internal representation of the regexp and bound to this object for later calls to match or split. Calling create() without an argument can be used to free up a little memory after the regexp has been used.
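
The following sketch shows the intended usage pattern: compile once, then reuse the object. The variable names and input strings are only illustrative.

```pike
int main()
{
  // Regexp(string) compiles the pattern and binds it to the new object.
  object digits = Regexp("[0-9]+");

  // The compiled object can be reused for any number of strings
  // without compiling the pattern again.
  string s;
  foreach (({ "room 101", "no number", "42" }), s)
    write(sprintf("%-10s %d\n", s, digits->match(s)));
  return 0;
}
```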

SEE ALSO
clone and Regexp->match

METHOD
Regexp.match - match a regexp

SYNTAX
int match(string s)

DESCRIPTION
Returns 1 if s matches the regexp bound to this object, zero otherwise.
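
As the '.html' example earlier in this section suggests, a pattern may match anywhere in the string unless it is anchored with ^ and $. The sketch below contrasts the two cases; the names anywhere and whole are hypothetical.

```pike
int main()
{
  object anywhere = Regexp("[0-9]+");    // digits somewhere in the string
  object whole    = Regexp("^[0-9]+$");  // the string is nothing but digits

  string s;
  foreach (({ "12345", "12a45", "abc" }), s)
    write(sprintf("%-6s anywhere=%d whole=%d\n",
                  s, anywhere->match(s), whole->match(s)));
  return 0;
}
```

Anchoring is the usual choice when validating input, since an unanchored pattern accepts any string that merely contains a match.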

SEE ALSO
Regexp->create and Regexp->split

METHOD
Regexp.split - split a string according to a pattern

SYNTAX
array(string) split(string s)

DESCRIPTION
Works like Regexp->match, but returns an array of the strings that matched the sub-regexps. Sub-regexps are the parts of the regexp enclosed in ( ). Sub-regexps that were not matched will contain zero. If the regexp as a whole didn't match, zero is returned.
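
A minimal sketch of split follows, assuming a made-up "name=number" input format. The variable names kv and parts are illustrative only.

```pike
int main()
{
  // Two sub-regexps: one for the name, one for the number.
  object kv = Regexp("^([a-z]+)=([0-9]+)$");

  array(string) parts = kv->split("width=80");
  if (parts)
    // parts[0] is what ([a-z]+) matched, parts[1] what ([0-9]+) matched.
    write(sprintf("%s -> %s\n", parts[0], parts[1]));
  else
    // split returns zero when the regexp as a whole does not match.
    write("no match\n");
  return 0;
}
```

Check the return value before indexing it, since indexing zero is an error.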

BUGS
You can only have 40 sub-regexps.

SEE ALSO
Regexp->create and Regexp->match

