Regexp is short for Regular Expression. A regular expression is
a standardized way to write patterns that match certain strings. In Pike
you can often use sscanf and the range and index operators to match strings,
but sometimes a regexp is both faster and easier.
A regular expression is written as a string, which is then compiled into an object.
The string contains characters that make up a pattern for other strings
to match. Normal characters, such as A through Z, match only themselves,
but some characters have special meaning.
The special characters are listed below, followed by a few example patterns:
| Pattern || Matches |
| . || any one character |
| [abc] || a, b or c |
| [a-z] || any character from a to z inclusive |
| [^ac] || any character except a and c |
| (x) || x (x may be any regexp). If used with split, this also puts the string matching x into the result array. |
| x* || zero or more occurrences of x (x may be any regexp) |
| x+ || one or more occurrences of x (x may be any regexp) |
| x|y || x or y (x and y may be any regexp) |
| xy || x followed by y (x and y may be any regexp) |
| ^ || the beginning of the string (matches no characters) |
| $ || the end of the string (matches no characters) |
| \< || the beginning of a word (matches no characters) |
| \> || the end of a word (matches no characters) |
| [0-9]+ || one or more digits |
| [^ \t\n] || exactly one non-whitespace character |
| (foo)|(bar) || either 'foo' or 'bar' |
| \.html$ || any string ending in '.html' |
| ^\. || any string starting with a period |
Note that \ can be used to quote these special characters, in which case
they match themselves and nothing else. Also note that when you write
such a quoted character in a Pike string you need two \, because Pike also
uses this character for quoting.
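The double-backslash rule can be illustrated with two of the patterns from the table. This is a minimal sketch; the variable names and sample strings are made up for illustration, and the Regexp(string) call is the constructor form given in the function reference.

```pike
int main()
{
  // Pattern text \.html$ : the backslash is doubled inside the Pike string.
  object html = Regexp("\\.html$");
  // Pattern text ^\. : a literal period at the start of the string.
  object dotfile = Regexp("^\\.");

  write(sprintf("%d\n", html->match("index.html")));   // prints 1
  write(sprintf("%d\n", html->match("index_html")));   // prints 0, the quoted . is literal
  write(sprintf("%d\n", dotfile->match(".profile")));  // prints 1
  return 0;
}
```

Writing "\.html$" with a single backslash would hand the backslash to the Pike string parser rather than to the regexp compiler, which is why the doubled form is needed.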
To make regexps fast, they are compiled in much the same way that Pike
programs are; a compiled regexp can then be used over and over again without
being recompiled. To give the user full control over the compilation and
use of regexps, an object-oriented interface is provided.
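As a rough sketch of the compile-once idea, the pattern below is compiled a single time and the resulting object is reused for every string in an array. The names digits and lines and the sample data are hypothetical.

```pike
int main()
{
  // Compile the pattern once...
  object digits = Regexp("[0-9]+");

  // ...then reuse the same compiled object for every string.
  array(string) lines = ({ "room 101", "no numbers here", "42" });
  foreach (lines, string line)
    if (digits->match(line))
      write(line + "\n");
  return 0;
}
```

Creating the Regexp object inside the loop would also work, but it would recompile the same pattern on every iteration.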
You might wonder what regexps are good for. Hopefully this will become clearer
when you read about the following functions:
- Regexp.create - compile regexp
void create(string regexp);
object(Regexp) Regexp(string regexp);
When create is called, the regexp currently bound to this object is
cleared. If a string is sent to create(), this string will be compiled
to an internal representation of the regexp and bound to this object
for later calls to match or split. Calling create() without an
argument can be used to free up a little memory after the regexp has
been used.
- SEE ALSO
- clone and Regexp->match
- Regexp.match - match a regexp
int match(string s);
Returns 1 if s matches the regexp bound to this object, zero otherwise.
- SEE ALSO
- Regexp->create and Regexp->split
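A short sketch of match follows. It assumes, as the ^ and $ entries in the pattern table suggest, that a pattern without those anchors may match anywhere in the string. The variable names and sample strings are made up for illustration.

```pike
int main()
{
  object anywhere = Regexp("[0-9]+");    // digits somewhere in the string
  object whole    = Regexp("^[0-9]+$");  // the entire string must be digits

  write(sprintf("%d\n", anywhere->match("abc123")));  // prints 1
  write(sprintf("%d\n", whole->match("abc123")));     // prints 0
  write(sprintf("%d\n", whole->match("123")));        // prints 1
  return 0;
}
```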
- Regexp.split - split a string according to a pattern
array(string) split(string s);
Works like Regexp->match, but returns an array of the strings that
matched the sub-regexps. Sub-regexps are the parts of the regexp enclosed
in ( ). The array entry for a sub-regexp that was not matched is zero.
If the regexp as a whole didn't match, zero is returned.
A regexp can contain at most 40 sub-regexps.
- SEE ALSO
- Regexp->create and Regexp->match
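To illustrate split, the sketch below uses a pattern with two sub-regexps to pull a name and a number out of a string. The pattern, the variable names and the sample strings are hypothetical.

```pike
int main()
{
  // Two sub-regexps: a lowercase name and a number.
  object setting = Regexp("^([a-z]+)=([0-9]+)$");

  array(string) parts = setting->split("width=80");
  if (parts)
    write(sprintf("name=%s value=%s\n", parts[0], parts[1]));

  // The regexp as a whole does not match here, so split returns zero.
  if (!setting->split("width: 80"))
    write("no match\n");
  return 0;
}
```

Checking the result before indexing it matters, because split returns zero rather than an empty array when the string does not match.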