
4.1.3 string

A string can be seen as an array of values from 0 to 2³²-1. Usually a string contains text such as a word, a sentence, a page or even a whole book, but it can also contain parts of a binary file, compressed data or other binary data. Strings in Pike are shared, which means that identical strings share the same memory space. This reduces memory usage considerably for most applications and also speeds up string comparisons, since two equal strings are always the same shared string. We have already seen how to write a constant string:
"hello world" // hello world
"he" "llo" // hello
"\116" // N (116 is the octal ASCII value for N)
"\t" // A tab character
"\n" // A newline character
"\r" // A carriage return character
"\b" // A backspace character
"\0" // A null character
"\"" // A double quote character
"\\" // A single backslash
"\x4e" // N (4e is the hexadecimal ASCII value for N)
"\d78" // N (78 is the decimal ASCII value for N)
"hello world\116\t\n\r\b\0\"\\" // All of the above
"\xff" // the character 255
"\xffff" // the character 65535
"\xffffff" // the character 16777215
"\116""3" // 'N' followed by a '3'
As you can see, any sequence of characters within double quotes is a string. The backslash character is used to escape characters that are not allowed or are impossible to type: \t produces a tab character, \\ produces a single backslash, and \" makes a double quote (") part of the string instead of ending it. Also, \XXX where XXX is an octal number from 0 to 37777777777, or \xXX where XX is a hexadecimal number from 0 to ffffffff, lets you write any character you want in the string, even null characters. From version 0.6.105, you may also use \dXXX where XXX is a decimal number from 0 to 2³²-1. If you write two constant strings after each other, they will be concatenated into one string. This is also how you keep a digit from being read as part of a preceding numeric escape, as in "\116""3" above.
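
The escape rules above can be checked with a short program. This is a minimal sketch: the variable name s is only for illustration, and the \d escape assumes a Pike version that supports it (0.6.105 or later, according to the text above).

```pike
int main()
{
  // Octal, hexadecimal and decimal escapes all denote the same character, 'N'.
  write("%d %d %d\n", "\116"[0], "\x4e"[0], "\d78"[0]);

  // Two adjacent constant strings are concatenated into one string.
  string s = "he" "llo";
  write("%s has %d characters\n", s, sizeof(s));

  // Splitting the literal keeps the '3' out of the octal escape.
  write("%s\n", "\116" "3");

  // A null character is an ordinary character and counts towards the length.
  write("%d\n", sizeof("a\0b"));
  return 0;
}
```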

You might be surprised to see that individual characters can have values up to 2³²-1 and wonder how much memory that uses. Do not worry, Pike automatically decides the proper amount of memory for a string, so all strings with character values in the range 0-255 will be stored with one byte per character. Be aware, however, that not all functions can handle strings which are not stored as one byte per character, so there are some limits to when this feature can be used.
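
The storage width Pike picks can be inspected with String.width, which is described later in this section. The sketch below also uses string_to_utf8 to turn a wide string into a one-byte-per-character string; using it as a workaround for functions that only accept narrow strings is an assumption on our part, not something this section prescribes.

```pike
int main()
{
  // All characters fit in 0-255: one byte (8 bits) per character.
  write("%d\n", String.width("hello"));

  // One character above 255 forces 16 bits per character.
  write("%d\n", String.width("\x20ac"));

  // One character above 65535 forces 32 bits per character.
  write("%d\n", String.width("\x10000"));

  // UTF-8 encoding yields a narrow string again, at the cost of more characters.
  string narrow = string_to_utf8("\x20ac");
  write("%d %d\n", String.width(narrow), sizeof(narrow));
  return 0;
}
```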

Although strings are a form of arrays, they are immutable. This means that there is no way to change an individual character within a string without creating a new string. This may seem strange, but keep in mind that strings are shared: if you could change a character in the string "foo", you would change *all* "foo" everywhere in the program.

However, the Pike compiler will allow you to write code as if you could change characters within strings. The following code is valid and works:

string s="hello torld";
s[6]='w';
However, you should be aware that this does in fact create a new string and it may need to copy the string s to do so. This means that the above operation can be quite slow for large strings. You have been warned. Most of the time, you can use replace, sscanf, `/ or some other high-level string operation to avoid having to use the above construction too much.
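
To see that the assignment really creates a new string rather than modifying the shared one, consider the following sketch. The variable names a, b and c are only for illustration; replace is one of the high-level alternatives mentioned above.

```pike
int main()
{
  string a = "hello torld";
  string b = a;          // b refers to the same shared string as a

  a[6] = 'w';            // builds a new string and stores it in a

  write("%s\n", a);      // the corrected string
  write("%s\n", b);      // b still holds the original text

  // The same result without index assignment.
  string c = replace(b, "torld", "world");
  write("%d\n", c == a); // 1 when both hold the same text
  return 0;
}
```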

All the comparison operators plus the operators listed here can be used on strings:

Summation
Adding strings together will simply concatenate them. "foo"+"bar" becomes "foobar".
Subtraction
Subtracting one string from another will remove all occurrences of the second string from the first one. So "foobarfoogazonk" - "foo" results in "bargazonk".
Indexing
Indexing will let you get the value (for plain text, the ASCII value) of any character in a string. The first index is zero.
Range
The range operator will let you copy any part of the string into a new string. Example: "foobar"[2..4] will return "oba".
Division
Division will let you divide a string at every occurrence of a word or character. For instance if you do "foobargazonk" / "o" the result would be ({"f","","bargaz","nk"}). It is also possible to divide the string into strings of length N by dividing the string by N. If N is converted to a float before dividing, the remainder of the division will be included in the result.
Multiplication
The inverse of the division operator can be accomplished by multiplying an array with a string. So if you evaluate ({"f","","bargaz","nk"}) * "o" the result would be "foobargazonk".
Modulo
To complement the division operator, you can do string % int. This operator will simply return the part of the string that was not included in the array returned by string / int.
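
The operators above can be tried out together. This is a minimal sketch; the string s and the chunk length 5 are arbitrary choices for illustration, and equal is used to compare arrays by content.

```pike
int main()
{
  string s = "foobargazonk";

  write("%s\n", "foo" + "bar");               // summation
  write("%s\n", "foobarfoogazonk" - "foo");   // subtraction
  write("%d\n", s[0]);                        // indexing: the value of 'f'
  write("%s\n", "foobar"[2..4]);              // range

  // Division by a string, and multiplication as its inverse.
  array(string) parts = s / "o";
  write("%d\n", equal(parts, ({ "f", "", "bargaz", "nk" })));
  write("%d\n", parts * "o" == s);

  // Division by an int drops the leftover piece, modulo returns it,
  // and division by a float keeps it as the last element.
  write("%d\n", equal(s / 5, ({ "fooba", "rgazo" })));
  write("%s\n", s % 5);
  write("%d\n", equal(s / 5.0, ({ "fooba", "rgazo", "nk" })));
  return 0;
}
```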

Also, these functions operate on strings:

string String.capitalize(string s)
Returns s with the first character converted to upper case.
int String.count(string haystack, string needle)
Returns the number of occurrences of needle in haystack. Equivalent to sizeof(haystack/needle)-1.
int String.width(string s)
Returns the width of s in bits (8, 16 or 32).
string lower_case(string s)
Returns s with all the upper case characters converted to lower case.
string replace(string s, string from, string to)
This function replaces all occurrences of the string from in s with to and returns the new string.
string reverse(string s)
This function returns a copy of s with the last character from s first, the second last in second place and so on.
int search(string haystack, string needle)
This function finds the first occurrence of needle in haystack and returns the index where it was found, or -1 if needle does not occur in haystack.
int sizeof(string s)
Same as strlen(s), returns the length of the string.
int stringp(mixed s)
This function returns 1 if s is a string, 0 otherwise.
int strlen(string s)
Returns the length of the string s.
string upper_case(string s)
This function returns s with all lower case characters converted to upper case.
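
A short sketch exercising the functions listed above. The sample string s is arbitrary and only serves as an illustration.

```pike
int main()
{
  string s = "hello world";

  write("%s\n", String.capitalize(s));           // first character in upper case
  write("%d\n", String.count(s, "o"));           // number of "o" in s
  write("%d\n", sizeof(s / "o") - 1);            // the same count, by division
  write("%s\n", upper_case(s));
  write("%s\n", lower_case("HeLLo"));
  write("%s\n", replace(s, "world", "there"));
  write("%s\n", reverse("abc"));
  write("%d\n", search(s, "world"));             // index of the first match
  write("%d\n", search(s, "xyz"));               // -1: no match
  write("%d %d\n", sizeof(s), strlen(s));        // both give the length
  write("%d %d\n", stringp(s), stringp(17));     // 1 for a string, 0 otherwise
  return 0;
}
```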
