
Chapter 17, Pike internals - how to extend Pike

The rest of this book describes how Pike works and how to extend it with your own functions written in C or C++. Even if you are not interested in extending Pike, the information in this section can help you understand Pike better and thus make you a better Pike programmer. From this point on I will assume that the reader knows C or C++.

17.1 The master object

Pike is a very dynamic language. Sometimes that is not enough: sometimes you want to change the way Pike handles errors, loads modules or starts scripts. All this and much more can be changed by modifying the master object. The master object is a Pike object like any other object, but it is loaded before anything else and is expected to perform certain tasks for the Pike executable. The Pike executable cannot function without a master object to take care of these things. Here is a list of the methods needed in the master object:
program cast_to_program(string program_name, string current_file)
This function is called whenever someone performs a cast from a string to a program.
program handle_inherit(string program_name, string current_file)
This is called whenever the compiler encounters an inherit statement with a string argument in a Pike program. It is expected to return the program to inherit.
void handle_error(array trace)
This function is expected to write the error messages when a run-time error occurs. The argument is of the form ({"error_description", backtrace() }). If any error occurs in this routine, Pike will dump core.
object cast_to_object(string object_name, string current_file)
This function is called whenever someone performs a cast from a string to an object.
mixed resolv(string identifier, string current_file)
This function is called whenever the compiler finds an unknown identifier in a program. It is normally used for loading modules. It is supposed to return ([])[0] (that is, UNDEFINED) if the master doesn't know what the value should be, and the value in question otherwise.
void _main(array(string) argv, array(string) env)
This function is supposed to start a Pike script. It receives all the command line arguments in the first array and all environment variables, in the form "var=value", in the second. _main is called as soon as all modules are loaded and setup is done.
void compile_error(string file, int line, string err)
This function is called whenever a compile error is encountered. Normally it just writes a message to stderr.
string handle_include(string file, string current_file, int local_include)
This function is used to locate include files. file is the file name the user wants to include, and local_include is 1 if the user enclosed the file name in double quotes rather than in angle brackets (< and >). Otherwise it is zero.

Aside from the above functions, which the Pike binary expects, the master object is also expected to provide functions used by Pike scripts. The current master object adds the following global functions:

add_include_path, remove_include_path, add_module_path, remove_module_path, add_program_path, remove_program_path, master, describe_backtrace, mkmultiset, strlen, new, clone, UNDEFINED, write, getenv and putenv.

There are at least two ways to change the behavior of the master object, apart from editing it directly, which in most cases would cause other Pike scripts to stop working. The first is to copy the master object, modify it, and use the command line option -m to load your file instead of the default master object. However, since more functionality may be added to the master object in the future, I do not recommend this.

A better way is to write a class that inherits the master and then call replace_master with an instance of that class as argument. This should be far more future-safe, although I cannot guarantee that the interface between Pike and the master object will not change in the future, so be careful if you do this.

Let's look at an example:

#!/usr/local/bin/pike

class new_master {
    inherit "/master";

    void create()
    {
        /* You need to copy the values from the old master to the new */
        /* NOTE: At this point we are still using the old master */
        object old_master = master();
        /* Named self to avoid shadowing the class name new_master */
        object self = this_object();

        foreach(indices(old_master), string varname)
        {
            /* The catch is needed since we can't assign constants */
            catch { self[varname] = old_master[varname]; };
        }
    }

    void handle_error(array trace)
    {
        Stdio.write_file("error log",describe_backtrace(trace));
    }
};

int main(int argc, array(string) argv)
{
    replace_master(new_master());
    /* Run rest of program */
    exit(0);
}
This example installs a master object which logs run-time errors to a file instead of writing them to stderr.

17.2 Data types from the inside

This section describes the different data types used inside the Pike interpreter. It is necessary to have at least a basic understanding of these before you write Pike extensions.

17.2.1 Basic data types

First, let us get to know the basic data types Pike uses.
INT8, INT16, INT32, INT64
These are defines for integer types that are at least as many bits wide as the number suggests. If the platform has an integer type that is exactly 32 bits wide, INT32 is guaranteed to be exactly 32 bits.
INT_TYPE
This is the type Pike uses for integers. Usually 32 bits.
FLOAT_TYPE
This is the type Pike uses for floats. Usually defined as 'float'.
TYPE_FIELD
This is a bit field which can be any combination of the flags: BIT_INT, BIT_FLOAT, BIT_STRING, BIT_ARRAY, BIT_MAPPING, BIT_MULTISET, BIT_OBJECT, BIT_PROGRAM, BIT_FUNCTION. Please note that BIT_INT is defined as 1<<T_INT, BIT_MAPPING defined as 1<<T_MAPPING etc. Also, there are some special values defined for your convenience:

17.2.2 struct svalue

An svalue is the most central data structure in the Pike interpreter. It is used to hold values on the stack, local variables, items in arrays and mappings and a lot more. Any of the data types described in chapter 4 "Data types" can be stored in an svalue.

A struct svalue has three members:

short type;
This says what type of value is actually stored in the svalue. Valid values are T_INT, T_FLOAT, T_STRING, T_ARRAY, T_MAPPING, T_MULTISET, T_FUNCTION, T_PROGRAM, T_OBJECT. In certain situations, other values are used in the type field, but those are reserved for internal Pike use only.
short subtype;
union anything u;
This union contains the data. Depending on what the type member is, you can access one of the following union members:
type is:      member to use:                notes:
T_INT         INT_TYPE integer
T_FLOAT       FLOAT_TYPE float_number
T_STRING      struct pike_string *string
T_ARRAY       struct array *array
T_MAPPING     struct mapping *mapping
T_MULTISET    struct multiset *multiset
T_OBJECT      struct object *object
T_PROGRAM     struct program *program
T_FUNCTION    struct callable *efun         if subtype == FUNCTION_BUILTIN
T_FUNCTION    struct object *object         if subtype != FUNCTION_BUILTIN
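
To make the pairing between the type member and the union concrete, here is a minimal sketch of a tagged union in the same spirit as struct svalue. It is not Pike's definition: toy_svalue, TOY_T_INT, TOY_T_FLOAT and TOY_T_STRING are hypothetical names, only three types are modeled, and subtype is left unused. What it illustrates is that code must switch on the type member before reading the union.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical type codes for this sketch only; the real T_* values
   come from Pike's own headers and are not reproduced here. */
enum toy_type { TOY_T_INT, TOY_T_FLOAT, TOY_T_STRING };

/* A miniature tagged union in the spirit of struct svalue:
   'type' says which member of 'u' is valid. */
struct toy_svalue {
    short type;
    short subtype;            /* unused in this sketch */
    union {
        long integer;         /* valid when type == TOY_T_INT */
        double float_number;  /* valid when type == TOY_T_FLOAT */
        const char *string;   /* valid when type == TOY_T_STRING */
    } u;
};

/* Read only the union member that matches the type tag. */
void toy_describe(const struct toy_svalue *s, char *buf, size_t n)
{
    switch (s->type) {
    case TOY_T_INT:
        snprintf(buf, n, "int %ld", s->u.integer);
        break;
    case TOY_T_FLOAT:
        snprintf(buf, n, "float %g", s->u.float_number);
        break;
    case TOY_T_STRING:
        snprintf(buf, n, "string \"%s\"", s->u.string);
        break;
    default:
        snprintf(buf, n, "unknown type %d", s->type);
        break;
    }
}
```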

Of course there are a whole bunch of functions for operating on svalues:

FUNCTION
free_svalue - free the contents of an svalue

SYNTAX
void free_svalue(struct svalue *s);

DESCRIPTION
This function is actually a macro; it will free the contents of s. It does not, however, free s itself. After calling free_svalue, the contents of s are undefined, and you should not be surprised if your computer blows up if you try to access its contents. Also note that this doesn't necessarily free whatever the svalue is pointing to; it only frees one reference. If that reference is the last one, the object/array/mapping/whatever will indeed be freed.

NOTE
This function will *not* call Pike code or error().

FUNCTION
free_svalues - free many svalues

SYNTAX
void free_svalues(struct svalue *s, INT32 howmany, TYPE_FIELD type_hint);

DESCRIPTION
This function does the same as free_svalue but operates on several svalues. The type_hint is used for optimization and should be set to BIT_MIXED if you don't know exactly what types are being freed.

NOTE
This function will *not* call Pike code or error().

SEE ALSO
free_svalue and TYPE_FIELD
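
The sketch below, assuming a made-up type numbering (TOY_T_* and TOY_BIT_* are hypothetical, not Pike's constants), shows how a type field is built with one bit per type, following the BIT_x = 1 << T_x convention, and why a type hint helps: if none of the types that carry references can be present, a function like free_svalues has nothing to release.

```c
#include <assert.h>

/* Hypothetical type numbering for this sketch; in Pike the real T_*
   constants come from the interpreter headers. */
enum { TOY_T_INT, TOY_T_FLOAT, TOY_T_STRING, TOY_T_ARRAY, TOY_T_MAPPING,
       TOY_T_NTYPES };

typedef unsigned int toy_type_field;

/* Same convention as described above: the bit for type t is 1 << t. */
#define TOY_BIT(t) ((toy_type_field)1 << (t))
#define TOY_BIT_INT     TOY_BIT(TOY_T_INT)
#define TOY_BIT_FLOAT   TOY_BIT(TOY_T_FLOAT)
#define TOY_BIT_STRING  TOY_BIT(TOY_T_STRING)
#define TOY_BIT_ARRAY   TOY_BIT(TOY_T_ARRAY)
#define TOY_BIT_MAPPING TOY_BIT(TOY_T_MAPPING)
/* Every type bit set: "could be anything". */
#define TOY_BIT_MIXED   (TOY_BIT(TOY_T_NTYPES) - 1)

/* Types whose values hold a reference that must be released. */
#define TOY_BIT_REFCOUNTED (TOY_BIT_STRING | TOY_BIT_ARRAY | TOY_BIT_MAPPING)

/* Build a type field from a list of type codes. */
toy_type_field toy_type_field_of(const int *types, int n)
{
    toy_type_field f = 0;
    for (int i = 0; i < n; i++)
        f |= TOY_BIT(types[i]);
    return f;
}

/* The shortcut a type hint enables: if no refcounted type can be
   present, freeing the values needs no per-element work at all. */
int toy_free_needs_work(toy_type_field hint)
{
    return (hint & TOY_BIT_REFCOUNTED) != 0;
}
```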

FUNCTION
assign_svalue - copy an svalue to another svalue

SYNTAX
void assign_svalue(struct svalue *to, struct svalue *from);

DESCRIPTION
This function frees the contents of to and then copies the contents of from into to. If the value in from uses refcounts, they will be increased to reflect this copy.

NOTE
This function will *not* call Pike code or error().

SEE ALSO
free_svalue and assign_svalue_no_free

FUNCTION
assign_svalue_no_free - copy an svalue to another svalue

SYNTAX
void assign_svalue_no_free(struct svalue *to, struct svalue *from);

DESCRIPTION
This function does the same as assign_svalue() but does not free the contents of to before overwriting it. This should be used when to has not been initialized yet. If this function is used incorrectly, on an svalue that already holds a value, memory leaks will occur. On the other hand, if you call assign_svalue on an uninitialized svalue, a core dump or bus error will most likely occur.

NOTE
This function will *not* call Pike code or error().

SEE ALSO
assign_svalue and free_svalue
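
The difference between assign_svalue and assign_svalue_no_free comes down to who owns the old contents of to. This sketch models that contract with a hypothetical refcounted struct toy_thing; none of these names are Pike's. toy_assign takes the new reference before dropping the old one, which keeps self-assignment safe.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted payload standing in for an array, mapping, etc. */
struct toy_thing { int refs; };

/* A toy "svalue" that holds either nothing or one reference to a thing. */
struct toy_value { struct toy_thing *thing; };

int toy_freed = 0;  /* counts payloads actually freed, for checking */

struct toy_value toy_new_thing(void)
{
    struct toy_value v;
    v.thing = malloc(sizeof *v.thing);
    assert(v.thing != NULL);
    v.thing->refs = 1;  /* the new value owns one reference */
    return v;
}

/* Like free_svalue: drop one reference, free the payload on the last. */
void toy_free_value(struct toy_value *v)
{
    if (v->thing && --v->thing->refs == 0) {
        free(v->thing);
        toy_freed++;
    }
    v->thing = NULL;
}

/* Like assign_svalue_no_free: 'to' is assumed uninitialized, so its
   old contents are NOT released. */
void toy_assign_no_free(struct toy_value *to, const struct toy_value *from)
{
    to->thing = from->thing;
    if (to->thing)
        to->thing->refs++;
}

/* Like assign_svalue: replace the contents of 'to', releasing the old ones. */
void toy_assign(struct toy_value *to, const struct toy_value *from)
{
    struct toy_value old = *to;    /* remember what 'to' held */
    toy_assign_no_free(to, from);  /* take the new reference first */
    toy_free_value(&old);          /* then drop the old one */
}
```

In the sketch, skipping toy_free_value leaks a payload, and calling toy_assign on an uninitialized toy_value would read garbage, mirroring the warnings above.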

FUNCTION
IS_ZERO - check if an svalue is true or false

SYNTAX
int IS_ZERO(struct svalue *s);

DESCRIPTION
This macro returns 1 if s is false and 0 if s is true.

NOTE
This macro will evaluate s several times.
This macro may call Pike code and/or error().

SEE ALSO
is_eq
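
Because IS_ZERO may evaluate its argument several times, passing an expression with side effects is a bug. This hypothetical sketch (TOY_IS_ZERO, toy_val and toy_next are illustrative names, not Pike's) shows the effect with a macro that mentions its argument twice, and the usual fix of evaluating the argument into a temporary first.

```c
#include <assert.h>

/* Hypothetical two-field value, just enough to show the pitfall. */
struct toy_val { int type; long integer; };
enum { TOY_INT = 0, TOY_OTHER = 1 };

/* A macro that mentions its argument twice, so the argument
   expression is evaluated twice. */
#define TOY_IS_ZERO(s) ((s)->type == TOY_INT && (s)->integer == 0)

int toy_calls = 0;

/* An argument expression with a side effect: it advances *i. */
struct toy_val *toy_next(struct toy_val *arr, int *i)
{
    toy_calls++;
    return &arr[(*i)++];
}

/* Safe pattern: evaluate the expression once into a temporary. */
int toy_is_zero_once(struct toy_val *arr, int *i)
{
    struct toy_val *tmp = toy_next(arr, i);
    return TOY_IS_ZERO(tmp);
}
```

With TOY_IS_ZERO(toy_next(vals, &i)), the type test reads one element and the value test reads the next, so a zero element can be reported as non-zero.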

FUNCTION
is_eq - check if two svalues contain the same value

SYNTAX
int is_eq(struct svalue *a, struct svalue *b);

DESCRIPTION
This function returns 1 if a and b contain the same value. This is the same as the `== operator in Pike.

NOTE
This function may call Pike code and/or error().

SEE ALSO
IS_ZERO, is_lt, is_gt, is_le, is_ge and is_equal

FUNCTION
is_equal - check if two svalues are equal

SYNTAX
int is_equal(struct svalue *a, struct svalue *b);

DESCRIPTION
This function returns 1 if a and b contain equal values. This is the same as the function equal in Pike.

NOTE
This function may call Pike code and/or error().

SEE ALSO
equal and is_eq

FUNCTION
is_lt - compare the contents of two svalues

SYNTAX
int is_lt(struct svalue *a, struct svalue *b);
int is_le(struct svalue *a, struct svalue *b);
int is_gt(struct svalue *a, struct svalue *b);
int is_ge(struct svalue *a, struct svalue *b);

DESCRIPTION
These functions are equivalent to the Pike operators `<, `<=, `>, `>= respectively. For instance, is_lt will return 1 if the contents of a are less than the contents of b.

NOTE
These functions may call Pike code and/or error(). For instance, they will call error() if you try to compare values that cannot be compared, such as an integer and an array.

SEE ALSO
IS_ZERO and is_eq

17.2.3 struct pike_string

A struct pike_string is the internal representation of a string. Since Pike relies heavily on string manipulation, this data structure comes with quite a few features and quirks. The most important one is that strings are shared. This means that after a string has been entered into the shared string table it must never be modified. Since some other thread might be using the very same string, it is not even permitted to change a shared string temporarily and then change it back.

A struct pike_string has these members:

INT32 refs;
The number of references to this string.
INT32 length;
This is the length of the string.
unsigned INT32 hval;
This is the internal hash value for the string; you should not need to use this member for any reason.
struct pike_string *next;
This points to the next string in the hash table. Internal use only.
int size_shift;
This represents the size of the characters in the string. Currently size_shift has three valid values: 0, 1 and 2. These values mean that the characters in the string are 1, 2 and 4 bytes long respectively.
char str[1];
This is the actual data. Note that you should never use this member directly. Use STR0, STR1 and STR2 instead.

General string management

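Since Pike strings are shared, two shared strings with the same contents are always the same struct pike_string, so you can compare them for equality by comparing the pointers with ==. FIXME -- add more here.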

FUNCTION
STR0 - Get a pointer to a 'char'

SYNTAX
p_wchar0 *STR0(struct pike_string *s);
p_wchar1 *STR1(struct pike_string *s);
p_wchar2 *STR2(struct pike_string *s);

DESCRIPTION
These macros return raw C pointers to the data in the string s. Note that you may only use STR0 on strings where size_shift is 0, STR1 on strings where size_shift is 1, and STR2 on strings where size_shift is 2. When compiled with DEBUG, these macros will call fatal() if used on strings with the wrong size_shift.

NOTE
All Pike strings are zero-terminated for your convenience.
The zero-termination is not included in the length of the string.
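
As a rough model of what STR0, STR1 and STR2 do, this sketch stores characters that are 1 << size_shift bytes wide and picks the matching pointer type from size_shift. struct toy_string and the toy_wchar typedefs are hypothetical stand-ins for struct pike_string and p_wchar0/1/2, and the layout is simplified (the data lives in a separate buffer).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for 1-, 2- and 4-byte characters. */
typedef uint8_t  toy_wchar0;
typedef uint16_t toy_wchar1;
typedef uint32_t toy_wchar2;

/* Toy string header: each character is (1 << size_shift) bytes wide. */
struct toy_string {
    int length;
    int size_shift;  /* 0, 1 or 2 */
    void *data;
};

/* Allocate room for len characters plus a zero terminator that is
   not counted in length. calloc zeroes the terminator. */
struct toy_string toy_alloc(int len, int shift)
{
    struct toy_string s;
    s.length = len;
    s.size_shift = shift;
    s.data = calloc((size_t)len + 1, (size_t)1 << shift);
    assert(s.data != NULL);
    return s;
}

/* Read character i through the pointer type matching size_shift,
   the same choice STR0/STR1/STR2 force on the caller. */
uint32_t toy_index(const struct toy_string *s, int i)
{
    assert(i >= 0 && i < s->length);
    switch (s->size_shift) {
    case 0:  return ((const toy_wchar0 *)s->data)[i];
    case 1:  return ((const toy_wchar1 *)s->data)[i];
    default: return ((const toy_wchar2 *)s->data)[i];
    }
}
```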

FUNCTION
free_string - Free a reference to a pike_string

SYNTAX
void free_string(struct pike_string *s);

DESCRIPTION
This function frees one reference to a Pike string, and if that is the last reference, it will free the string itself. As with all refcounting functions, you should be careful about how you use it. If you forget to call this when you should, a memory leak will occur. If you call this function when you shouldn't, Pike will most likely crash.

FUNCTION
make_shared_string - Make a new shared string

SYNTAX
struct pike_string *make_shared_string(char *str);

DESCRIPTION
This function takes a null-terminated C string as argument and returns a pike_string with the same contents. It does not free or change str. The returned string will have a reference which will be up to you to free with free_string, unless you send the string to a function such as push_string, which eats the reference for you.

SEE ALSO
free_string, push_string, begin_shared_string, make_shared_binary_string, make_shared_string1 and make_shared_string2

FUNCTION
make_shared_binary_string - Make a new binary shared string

SYNTAX
struct pike_string *make_shared_binary_string(char *str, INT32 len);

DESCRIPTION
This function does essentially the same thing as make_shared_string, but you give it the length of the string str as a second argument. This allows for strings with zeros in them. It is also more efficient to call this routine if you already know the length of the string str.

SEE ALSO
free_string, push_string, begin_shared_string, make_shared_string, make_shared_binary_string1 and make_shared_binary_string2
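
Why the explicit length matters can be shown with a small hypothetical counted string (toy_bstring is not a Pike type): copying a known number of bytes keeps embedded zeros, while a strlen-based constructor, which is what a function taking only a null-terminated string has to do, stops at the first zero byte.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counted string: an explicit length plus an extra zero
   byte after the data that is not counted in the length. */
struct toy_bstring {
    size_t length;
    char *str;
};

/* Copy exactly len bytes, so embedded zeros are preserved. */
struct toy_bstring toy_make_binary(const char *src, size_t len)
{
    struct toy_bstring s;
    s.length = len;
    s.str = malloc(len + 1);
    assert(s.str != NULL);
    memcpy(s.str, src, len);
    s.str[len] = '\0';
    return s;
}

/* The zero-terminated variant must find the length with strlen,
   so it stops at the first zero byte. */
struct toy_bstring toy_make_cstring(const char *src)
{
    return toy_make_binary(src, strlen(src));
}
```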

FUNCTION
begin_shared_string - Start building a shared string

SYNTAX
struct pike_string *begin_shared_string(INT32 len);

DESCRIPTION
This function is used to allocate a new shared string with a specified length which has not been created yet. The returned string is not yet shared and should be initialized with data before calling end_shared_string on it.

If after calling this function you decide that you do not need this string after all, you can simply call free on the returned string to free it. It is also possible to call free_string(end_shared_string(s)) but that would be much less efficient.

EXAMPLE
// This is in effect equal to s=make_shared_string("test")
struct pike_string *s=begin_shared_string(4);
STR0(s)[0]='t';
STR0(s)[1]='e';
STR0(s)[2]='s';
STR0(s)[3]='t';
s=end_shared_string(s);

SEE ALSO
begin_wide_shared_string, free_string, push_string, make_shared_string and end_shared_string

FUNCTION
end_shared_string - Insert a pre-allocated string into the shared string table

SYNTAX
struct pike_string *end_shared_string(struct pike_string *s);

DESCRIPTION
This function is used to finish constructing a pike string previously allocated with begin_shared_string or begin_wide_shared_string. It will insert the string into the shared string table. If there already is such a string in the shared string table then s will be freed and that string will be returned instead. After calling this function, you may not modify the string any more. As with make_shared_string this function returns a string with a reference which it is your responsibility to free.

SEE ALSO
begin_shared_string and begin_wide_shared_string
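
The two-phase build and the deduplication done by end_shared_string can be modeled with a tiny table. In this hypothetical sketch, toy_begin, toy_end and the linear table are illustrative only (Pike uses a real hash table): toy_end either registers the new string or frees it and returns the existing copy with an extra reference, which is also why shared strings can be compared with ==.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy string with a flexible data member. */
struct toy_str {
    int refs;
    size_t len;
    char data[];  /* len bytes plus a terminating zero */
};

#define TOY_TABLE_MAX 64
static struct toy_str *toy_table[TOY_TABLE_MAX];
static int toy_table_n = 0;

/* Phase 1: allocate an unshared string for the caller to fill in. */
struct toy_str *toy_begin(size_t len)
{
    struct toy_str *s = malloc(sizeof *s + len + 1);
    assert(s != NULL);
    s->refs = 1;
    s->len = len;
    s->data[len] = '\0';
    return s;
}

/* Phase 2: share it. If an equal string already exists, free the new
   one and return the existing string with one more reference. */
struct toy_str *toy_end(struct toy_str *s)
{
    for (int i = 0; i < toy_table_n; i++) {
        struct toy_str *t = toy_table[i];
        if (t->len == s->len && memcmp(t->data, s->data, s->len) == 0) {
            free(s);
            t->refs++;
            return t;
        }
    }
    assert(toy_table_n < TOY_TABLE_MAX);
    toy_table[toy_table_n++] = s;
    return s;
}
```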

FUNCTION
begin_wide_shared_string - Start building a wide shared string

SYNTAX
struct pike_string *begin_wide_shared_string(INT32 len, int size_shift);

DESCRIPTION
This function is a more generic version of begin_shared_string. It allocates space for a string of length len where each character is 1 << size_shift bytes. As with begin_shared_string it is your responsibility to initialize the string and to call end_shared_string on it.

EXAMPLE
struct pike_string *s=begin_wide_shared_string(1,2);
STR2(s)[0]=4711;
s=end_shared_string(s);

SEE ALSO
begin_shared_string, end_shared_string, make_shared_string, make_shared_string1 and make_shared_string2

FUNCTION
make_shared_string1 - Make a wide shared string

SYNTAX
struct pike_string *make_shared_string1(p_wchar1 *str);
struct pike_string *make_shared_binary_string1(p_wchar1 *str,INT32 len);
struct pike_string *make_shared_string2(p_wchar2 *str);
struct pike_string *make_shared_binary_string2(p_wchar2 *str,INT32 len);

DESCRIPTION
These functions are the wide string equivalents of make_shared_string and make_shared_binary_string. The functions ending in 1 use 2-byte characters and the ones ending in 2 use 4-byte characters.

SEE ALSO
make_shared_string, make_shared_binary_string and begin_wide_shared_string

17.2.4 struct array

Internally Pike uses a struct array to represent the type array. As with strings, arrays are used in many different ways, so they have many supporting functions that make them easier to manipulate. Usually you will not have to construct array structures yourself, but it is often necessary to read data from arrays.

A struct array has these members:

INT32 refs;
The number of references to this array.
INT32 size;
The number of elements in the array.
INT32 malloced_size;
The number of elements there is room for in the array without re-allocating.
TYPE_FIELD type_field;
This bit field contains one bit for each type present in the array. Note that a bit may be set even if no element of that type is present, but every type that is present must have its bit set. See TYPE_FIELD for more information.
INT16 flags;
ARRAY_* flags, you may set one or more of:
struct svalue item[size];
This is a variable-size array of svalues which contains the actual values in this array.
Here is an example function which will print the type of each value in an array:
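#include "global.h"
#include "svalue.h"
#include "array.h"
#include <stdio.h>
/* Print the type code of each element in the array a. */
void prtypes(struct array *a)
{
    INT32 e;
    for(e=0;e<a->size;e++)
        printf("Element %d is of type %d\n",(int)e,(int)a->item[e].type);
}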
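
The rule for type_field (extra bits allowed, missing bits not) can be stated as a subset check. This sketch uses a hypothetical toy array that stores only the type code of each element; it is not struct array, but the check is the same idea.

```c
#include <assert.h>

/* Hypothetical type codes and bit convention for this sketch only. */
enum { TOY_T_INT, TOY_T_FLOAT, TOY_T_STRING };
#define TOY_BIT(t) (1u << (t))

struct toy_array {
    int size;
    unsigned type_field;  /* may over-approximate, never under-approximate */
    int item_type[8];     /* type code of each element */
};

/* The exact set of types actually present. */
unsigned toy_exact_types(const struct toy_array *a)
{
    unsigned f = 0;
    for (int e = 0; e < a->size; e++)
        f |= TOY_BIT(a->item_type[e]);
    return f;
}

/* Invariant: every type present has its bit set in type_field. */
int toy_type_field_ok(const struct toy_array *a)
{
    return (toy_exact_types(a) & ~a->type_field) == 0;
}
```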

FUNCTION
allocate_array

FUNCTION
free_array

FUNCTION
array_index

FUNCTION
array_index_no_free

FUNCTION
simple_array_index_no_free

FUNCTION
array_set_index

FUNCTION
push_array_items

FUNCTION
aggregate_array

FUNCTION
f_aggregate_array

FUNCTION
append_array

FUNCTION
explode

FUNCTION
slice_array

FUNCTION
add_arrays

FUNCTION
copy_array

17.2.5 struct mapping

struct mapping is used to represent a mapping. For the most part you should be able to write modules and understand Pike internals without actually touching the internals of a mapping. It also helps that mappings are very well abstracted, so you can almost always use the supporting functions instead of fiddling around with the contents of a struct mapping directly.

This is the contents of a struct mapping:

INT32 refs;
The number of references, as in all data types except int and float.
INT32 size;
The number of key-value pairs in the mapping.
INT32 hashsize;
The size of the mapping hash table. Normally between 1/2 and 1/4 of the size of the mapping.
INT16 flags;
May contain the flag MAPPING_FLAG_WEAK, which allows data to be garbage collected even while it is still in the mapping.
TYPE_FIELD ind_types, val_types;
These type fields tell what types may be present among the indices and values of the mapping. See TYPE_FIELD for more information about type fields.
struct keypair **hash;
This is the hash table.
struct keypair *free_list;
This is a linked list of free keypairs.
Mappings are allocated as two separate blocks of memory. One is the struct mapping which holds pointers into the second memory block. The second memory block contains the hash table and all key-value pairs. A key-value pair is represented as a struct keypair which has the following members:
struct keypair *next;
Pointer to the next key-value pair in this bucket. Also used to link free key-value pairs together.
struct svalue ind, val;
The key and value respectively.
Please note that the free list is separate for each mapping. Also, when there are no more free key-value pairs the whole memory block is re-allocated and the mapping is re-hashed with a larger hash table.
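
Here is a minimal sketch of the single-block layout with a per-mapping free list, assuming plain int keys and values instead of svalues; toy_mapping and toy_keypair are hypothetical, and placing the keypairs before the hash table is a choice made only for this sketch. When the free list runs out, the insert function reports failure at the point where a real mapping would re-allocate the block and re-hash.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical keypair with int key and value instead of svalues. */
struct toy_keypair {
    struct toy_keypair *next;  /* bucket chain, or free-list link */
    int ind, val;
};

struct toy_mapping {
    int size;
    int hashsize;
    struct toy_keypair **hash;      /* points into 'block' */
    struct toy_keypair *free_list;  /* unused keypairs in 'block' */
    void *block;                    /* one allocation: keypairs + hash table */
};

/* Allocate 'pairs' keypairs and the hash table in a single block and
   thread all keypairs onto this mapping's own free list. */
void toy_mapping_init(struct toy_mapping *m, int hashsize, int pairs)
{
    struct toy_keypair *kp;
    m->block = malloc(sizeof(struct toy_keypair) * (size_t)pairs
                      + sizeof(struct toy_keypair *) * (size_t)hashsize);
    assert(m->block != NULL);
    kp = m->block;
    m->hash = (struct toy_keypair **)(kp + pairs);  /* table after pairs */
    for (int h = 0; h < hashsize; h++)
        m->hash[h] = NULL;
    m->free_list = NULL;
    for (int i = pairs - 1; i >= 0; i--) {
        kp[i].next = m->free_list;
        m->free_list = &kp[i];
    }
    m->size = 0;
    m->hashsize = hashsize;
}

/* Insert a new key (no duplicate check here). Returns 0 when the free
   list is empty, where a real mapping would grow and re-hash. */
int toy_mapping_insert(struct toy_mapping *m, int ind, int val)
{
    struct toy_keypair *k = m->free_list;
    if (!k)
        return 0;
    m->free_list = k->next;
    k->ind = ind;
    k->val = val;
    int h = (int)((unsigned)ind % (unsigned)m->hashsize);
    k->next = m->hash[h];  /* push onto the bucket's chain */
    m->hash[h] = k;
    m->size++;
    return 1;
}
```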

Below is an illustration which shows an example of a small mapping with hash table, free list and key-value pairs.

As you can see, mappings use a linked list for each bucket in the hash table. Also, the current implementation moves a key-value pair to the top of its hash chain every time a match is found, which can greatly increase performance in some situations. However, because of this, the order of elements in a mapping can change every time you access it. Also, since a mapping can be re-allocated any time you add an element to it, you can never trust a pointer to a struct keypair, or a pointer to such a pointer, if any Pike code has a chance to execute.
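
The move-to-front behavior can be sketched as a lookup on one hash chain. This is a hypothetical illustration (toy_pair and toy_lookup_mtf are not Pike functions): on a hit, the pair is unlinked and pushed onto the front of the bucket, which is why a plain lookup can change the order of elements.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical chain node with int key and value. */
struct toy_pair {
    struct toy_pair *next;
    int ind, val;
};

/* Look up 'ind' in the chain starting at *bucket. On a hit, unlink the
   pair and move it to the head of the chain, so a repeated lookup of
   the same key finds it first. The read reorders the chain. */
struct toy_pair *toy_lookup_mtf(struct toy_pair **bucket, int ind)
{
    struct toy_pair **prev = bucket;
    for (struct toy_pair *p = *bucket; p; prev = &p->next, p = p->next) {
        if (p->ind == ind) {
            *prev = p->next;    /* unlink from current position */
            p->next = *bucket;  /* push onto the front */
            *bucket = p;
            return p;
        }
    }
    return NULL;
}
```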

FUNCTION
m_sizeof

FUNCTION
m_ind_types

FUNCTION
m_val_types

FUNCTION
MAPPING_LOOP

FUNCTION
free_mapping

FUNCTION
allocate_mapping

FUNCTION
mapping_insert

FUNCTION
mapping_get_item_ptr

FUNCTION
map_delete

FUNCTION
low_mapping_lookup

FUNCTION
low_mapping_string_lookup

FUNCTION
simple_mapping_string_lookup

FUNCTION
mapping_string_insert

FUNCTION
mapping_indices

FUNCTION
mapping_values

FUNCTION
mapping_to_array

FUNCTION
mapping_replace

FUNCTION
mkmapping

FUNCTION
copy_mapping

17.2.6 struct object

17.2.7 struct program

17.3 The interpreter

Functional overview

Overview of the Pike source

library files
compiler
language.yacc, las.c, program.c, docode.c, peep.c, peep.in
backend
backend.c
interpreter
interpret.c, interpret.h, opcodes.c, operators.c
supporting files
constants
docode
