4.1.3 string
A string can be seen as an array of values from 0 to 2³²-1.
Usually a string contains text such as a word, a sentence, a page or
even a whole book. But it can also contain parts of a binary file,
compressed data or other binary data. Strings in Pike are shared,
which means that identical strings share the same memory space. This
reduces memory usage considerably for most applications and also speeds
up string comparisons. We have already seen how to write a constant
string:
"hello world" // hello world
"he" "llo" // hello
"\116" // N (116 is the octal ASCII value for N)
"\t" // A tab character
"\n" // A newline character
"\r" // A carriage return character
"\b" // A backspace character
"\0" // A null character
"\"" // A double quote character
"\\" // A single backslash
"\x4e" // N (4e is the hexadecimal ASCII value for N)
"\d78" // N (78 is the decimal ASCII value for N)
"hello world\116\t\n\r\b\0\"\\" // All of the above
"\xff" // the character 255
"\xffff" // the character 65535
"\xffffff" // the character 16777215
"\116""3" // 'N' followed by a '3'
As you can see, any sequence of characters within double quotes is a string.
The backslash character is used to escape characters that are not allowed in
a string or are impossible to type. For example, \t is the sequence to produce
a tab character, \\ is used when you want one backslash and
\" is used when you want a double quote (") to be a part
of the string instead of ending it.
Also, \XXX where XXX is an
octal number from 0 to 37777777777, or \xXX where XX is a hexadecimal
number from 0 to ffffffff, lets you write any character you want in the
string, even null characters. From version 0.6.105, you may also use
\dXXX where XXX is a decimal number from 0 to 2³²-1. If you write two constant
strings after each other, they will be concatenated into one string.
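The escape rules above can be checked with a small program. This is only a sketch; each line prints 1 when the two notations denote the same string, and the last line shows why splitting a constant is useful when a digit follows an escape.

```pike
int main()
{
  // Octal, hexadecimal and decimal escapes all denote the character 'N'.
  write("%O\n", "\116" == "N");            // 1
  write("%O\n", "\x4e" == "N");            // 1
  write("%O\n", "\d78" == "N");            // 1

  // Adjacent constant strings are concatenated into one string.
  write("%O\n", "he" "llo" == "hello");    // 1

  // Ending the first constant stops the escape before the digit '3',
  // so the result has two characters: 'N' and '3'.
  write("%O\n", sizeof("\116" "3"));       // 2
  return 0;
}
```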
You might be surprised to see that individual characters can have values
up to 2³²-1 and wonder how much memory that uses. Do not worry: Pike
automatically decides the proper amount of memory for a string, so all
strings with character values in the range 0-255 will be stored with
one byte per character. You should also be aware that not all functions
can handle strings which are not stored as one byte per character, so
there are some limits to when this feature can be used.
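To see which storage width Pike has picked for a given string, String.width (described later in this section) can be used. The following is a minimal sketch; the example strings are chosen here only for illustration.

```pike
int main()
{
  // All characters fit in 0-255: one byte (8 bits) per character.
  write("%d\n", String.width("hello"));      // 8

  // A single character above 255 widens the whole string.
  write("%d\n", String.width("\xffff"));     // 16
  write("%d\n", String.width("\xffffff"));   // 32

  // The length is counted in characters, not in bytes.
  write("%d\n", sizeof("\xffffff"));         // 1
  return 0;
}
```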
Although strings are a form of arrays, they are immutable. This means that
there is no way to change an individual character within a string without
creating a new string. This may seem strange, but keep in mind that strings
are shared, so if you would change a character in the string "foo",
you would change *all* "foo" everywhere in the program.
However, the Pike compiler will allow you to write code as if you could
change characters within strings. The following code is valid and works:
string s="hello torld";
s[6]='w';
However, you should be aware that this does in fact create a new string and
it may need to copy the string s to do so. This means that the above
operation can be quite slow for large strings. You have been warned.
Most of the time, you can use replace, sscanf, `/
or some other high-level string operation to avoid having to use the above
construction too much.
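The sketch below illustrates that assigning to an index produces a new string rather than modifying the shared one. The second variable t is introduced here purely for illustration: it keeps referring to the original string after s has been changed.

```pike
int main()
{
  string s = "hello torld";
  string t = s;          // t refers to the same shared string as s

  s[6] = 'w';            // builds a new string and stores it in s

  write("%s\n", s);      // hello world
  write("%s\n", t);      // hello torld (the original is untouched)

  // The same correction expressed with a high-level operation.
  write("%s\n", replace(t, "torld", "world"));   // hello world
  return 0;
}
```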
All the comparison operators plus the operators listed here can be used on strings:
- Summation
- Adding strings together will simply concatenate them.
"foo"+"bar" becomes "foobar".
- Subtraction
- Subtracting one string from another will remove all occurrences
of the second string from the first one. So
"foobarfoogazonk" - "foo" results in "bargazonk".
- Indexing
- Indexing will let you get the integer value of any character in a string.
The first index is zero.
- Range
- The range operator will let you copy any part of the string into a
new string. Example: "foobar"[2..4] will return "oba".
- Division
- Division will let you divide a string at every occurrence of a word or
character. For instance if you do "foobargazonk" / "o" the
result would be ({"f","","bargaz","nk"}). It is also possible
to divide the string into strings of length N by dividing the string
by N. If N is converted to a float before dividing, the remainder of
the division will be included in the result.
- Multiplication
- The inverse of the division operator can be accomplished by multiplying
an array with a string. So if you evaluate
({"f","","bargaz","nk"}) * "o" the result would be
"foobargazonk".
- Modulo
- To complement the division operator, you can do string % int.
This operator will simply return the part of the string that was not
included in the array returned by string / int.
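The operators listed above can be tried out with a short program such as the following sketch. Array results are compared with equal() so that each of those lines prints 1 when the result matches the array described in the text; the segment length 5 is an arbitrary choice for illustration.

```pike
int main()
{
  write("%O\n", "foo" + "bar");              // "foobar"
  write("%O\n", "foobarfoogazonk" - "foo");  // "bargazonk"
  write("%d\n", "foobar"[0]);                // 102, the value of 'f'
  write("%O\n", "foobar"[2..4]);             // "oba"

  // Division by a string splits at every occurrence of that string.
  array(string) parts = "foobargazonk" / "o";
  write("%O\n", equal(parts, ({ "f", "", "bargaz", "nk" })));   // 1

  // Multiplying the array by the same string joins it again.
  write("%O\n", parts * "o");                // "foobargazonk"

  // Division by an int gives segments of that length and drops the rest.
  write("%O\n", equal("foobargazonk" / 5, ({ "fooba", "rgazo" })));          // 1
  // Modulo returns the part that the int division left out.
  write("%O\n", "foobargazonk" % 5);         // "nk"
  // Division by a float keeps the remainder as a final, shorter segment.
  write("%O\n", equal("foobargazonk" / 5.0, ({ "fooba", "rgazo", "nk" })));  // 1
  return 0;
}
```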
Also, these functions operate on strings:
- string String.capitalize(string s)
- Returns s with the first character converted to upper case.
- int String.count(string haystack, string needle)
- Returns the number of occurrences of needle in haystack.
Equivalent to sizeof(haystack/needle)-1.
- int String.width(string s)
- Returns the width of s in bits (8, 16 or 32).
- string lower_case(string s)
- Returns s with all the upper case characters converted to lower case.
- string replace(string s, string from, string to)
- This function replaces all occurrences of the string from in s with to and returns the new string.
- string reverse(string s)
- This function returns a copy of s with the last character from s
first, the second last in second place and so on.
- int search(string haystack, string needle)
- This function finds the first occurrence of needle in haystack and returns where it found it.
- int sizeof(string s)
- Same as strlen(s), returns the length of the string.
- int stringp(mixed s)
- This function returns 1 if s is a string, 0 otherwise.
- int strlen(string s)
- Returns the length of the string s.
- string upper_case(string s)
- This function returns s with all lower case characters converted
to upper case.
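As a quick illustration of the functions listed above, here is a small sketch that calls each of them on short example strings. The input strings are made up for this example.

```pike
int main()
{
  write("%s\n", String.capitalize("hello"));          // Hello
  write("%d\n", String.count("foobargazonk", "o"));   // 3
  write("%d\n", sizeof("foobargazonk" / "o") - 1);    // 3, the equivalent form
  write("%s\n", lower_case("Hello World"));           // hello world
  write("%s\n", upper_case("Hello World"));           // HELLO WORLD
  write("%s\n", replace("foobar", "foo", "gazonk"));  // gazonkbar
  write("%s\n", reverse("foobar"));                   // raboof
  write("%d\n", search("foobar", "bar"));             // 3, indices start at zero
  write("%d\n", sizeof("foobar"));                    // 6
  write("%d\n", strlen("foobar"));                    // 6
  write("%d\n", stringp("foobar"));                   // 1
  write("%d\n", stringp(17));                         // 0
  return 0;
}
```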