Search Top Index

REF ARRAYS John Gibson Feb 1996 COPYRIGHT University of Sussex 1996. All Rights Reserved. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <<<<<<<<<<<<<<<<<<<<< >>>>>>>>>>>>>>>>>>>>>> <<<<<<<<<<<<<<<<<<<<< ARRAY PROCEDURES >>>>>>>>>>>>>>>>>>>>>> <<<<<<<<<<<<<<<<<<<<< >>>>>>>>>>>>>>>>>>>>>> <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< This REF file details the procedures which deal with the array data structure. It introduces array structures and how they are implemented in Poplog. There is also a section on sparse arrays, which can be used for storing associations between arbitrary objects (see also REF * PROPS). CONTENTS - (Use <ENTER> g to access required sections) 1 Introduction 2 Predicates On Arrays 3 Obtaining Array Parameters 4 Constructing New Arrays 5 Generic Data Structure Procedures on Arrays 6 Miscellaneous 6.1 Storing Arrays on Disk 7 Examples of Arrays 8 Sparse Arrays 8.1 Examples of Sparse Arrays 9 Related Documentation --------------- 1 Introduction --------------- Arrays are data structures which enable information to be stored and accessed on the basis of a set of integer subscripts, where each subscript corresponds to a separate 'dimension' of the array; thus an N-dimensional array requires N such subscripts to access or update each of its elements. A one dimensional array is analogous to a vector except that vectors can be indexed only from 1 to their length, whereas arrays can have bounds covering arbitrary ranges of integers. (For information on vectors also see REF * VECTORS, * INTVEC, * STRINGS, * DATA, and REF * KEYS.) Because computer memory cannot directly represent multiple dimensions, the elements of an array actually have to be stored in an underlying 1-dimensional structure (e.g. a vector), each position in which corresponds to a particular set of N subscript values. The task of the array accessing mechanism is therefore to convert a given set of N-dimensional subscripts into the appropriate 1-dimensional subscript for accessing this structure. In Poplog, arrays are implemented as special procedures constructed by newanyarray. Such a procedure for an N-dimensional array takes N arguments, and includes within it the arrayvector, that is, the underlying 1-dimensional structure in which the elements of the array are in reality stored (which, despite its name, need not actually be a vector although it usually is). When called, e.g. array(sub1, sub2, ..., subN) the array procedure performs the computation of the 1-dimensional subscript from the N values given, and supplies this to the appropriate access procedure for the arrayvector structure (which then returns or updates the specified element). An array may have any number of elements in a given dimension, subscripted by any suitable range of integers. That is, an array with say, 30 in a particular dimension is not restricted to using subscripts 1 to 30, but can use 0 to 29, 5 to 34, -10 to 19, etc. An array's dimensions are specified to newanyarray by supplying its boundslist, i.e. a list of the minimum and maximum subscripts in each dimension (of length 2 * N for an N-dimensional array). When called, array procedures mishap if any of their subscript arguments is not in the appropriate range. When using an array in general, it is not necessary to know how its elements are arranged in the arrayvector; however, this does need to be known when the latter needs to be processed separately from the array. While in principle many different schemes are possible, Poplog (like most systems) provides just two: arraying 'by row' or 'by column'. Since the terms 'row' and 'column' only make sense for 2-dimensional arrays, we shall not attempt to define them, but simply say that 'by row' means the elements are stored with subscripts increasing in significance from left to right, and 'by column' with subscripts increasing in significance from right to left. That is, letting array be a 3-dimensional array with subscripts 1 - 3 in each dimension, the order of the 3 * 3 * 3 = 27 elements in its arrayvector with the two different schemes would be as follows: Arrayvector Subscript By Row By Column --------------------- ------ --------- 1 array(1, 1, 1) array(1, 1, 1) 2 array(2, 1, 1) array(1, 1, 2) 3 array(3, 1, 1) array(1, 1, 3) 4 array(1, 2, 1) array(1, 2, 1) 5 array(2, 2, 1) array(1, 2, 2) 6 array(3, 2, 1) array(1, 2, 3) 7 array(1, 3, 1) array(1, 3, 1) 8 array(2, 3, 1) array(1, 3, 2) 9 array(3, 3, 1) array(1, 3, 3) 10 array(1, 1, 2) array(2, 1, 1) ... ... ... 26 array(2, 3, 3) array(3, 3, 2) 27 array(3, 3, 3) array(3, 3, 3) Note that (as mentioned above), the arrayvector of an array is not limited to being an actual vector-class structure. The arguments to newanyarray allow the specification of any 'virtual' vector, in terms of an 'vector init' procedure and a 'subscriptor' procedure; newanyarray then calls the 'init' procedure with an appropriate length argument to construct the arrayvector initially, and the array procedure then uses the 'subscriptor' to store and retrieve elements. Note also that two or more arrays can be made to share the same arrayvector structure, either in whole or in part (for example, one array can be a sub-array of another). While arrays are limited to storing associations between sets of integers and arbitrary values, Poplog also provides mechanisms for storing associations between arbitrary objects -- see Sparse Arrays below and REF * PROPS. See also REF * PROCEDURE for procedures applicable to arrays as procedures in general. ----------------------- 2 Predicates On Arrays ----------------------- As mentioned above, arrays are procedures: thus isprocedure is true for any array, and the datakey of an array will be the same as the datakey of any procedure. (See REF * KEYS.) isarray(item) -> array [procedure] Returns a true result if item is either an array procedure itself, or is a closure of one through any number of sub-closures (i.e. the pdpart of item is examined recursively until a non-closure is found, and then this is tested for being an array). If item is an array procedure, or one is found inside a nest of closures, then that is returned, otherwise false. (This enables closures of arrays to be recognised as arrays, e.g. if array is 3-dimensional array then isarray will return array when applied to the 2-dimensional array array(%2%), etc.) isarray_by_row(array) -> bool [procedure] Returns true if the array array is arrayed by row, false if arrayed by column. ----------------------------- 3 Obtaining Array Parameters ----------------------------- boundslist(array) -> bounds_list [procedure] Given an array array, returns its list of minimum and maximum subscripts in each dimension. arrayvector(array) -> arrayvec [procedure] arrayvec -> arrayvector(array) Returns/updates the 1-dimensional structure used to hold the elements of the array array. The updater checks that the new arrayvec is the same data type as the old. arrayvector_bounds(array) -> min_sub -> max_sub [procedure] Returns the minimum and maximum subscripts used by the array array on its arrayvector. (Generally, min_sub will be 1 and max_sub will be the number of elements in the array; this will not be the case, however, when array is actually a sub-array of the arrayvector.) array_subscrp(array) -> subscr_p [procedure] Returns the subscriptor procedure used to access elements in the arrayvector of the array array. -------------------------- 4 Constructing New Arrays -------------------------- newanyarray is the basic procedure for constructing arrays. For historical and other reasons, the arguments to this procedure are somewhat complicated, in that it can take a number of optional arguments in various combinations. Essentially though, the pieces of information it requires to construct a new array are quite straightforward, viz # A 2 * N list of minimum and maximum subscripts for each of the N dimensions (the boundslist). This is used to derive both the number of dimensions, and the size in each dimension (and thus the total number of elements in the array). # A specification for the arrayvector to hold the elements of the array, and a subscriptor procedure with which to access and update it. These two can be specified in a number of different ways, either as separate arguments or by a single argument which implies values for both together. Other optional pieces of information can specify whether the elements are to be arrayed by row or by column, an initialisation for each array element, and (when the new array is to be a sub-array of an existing one), the starting offset of the new array within the existing arrayvector. newanyarray(bounds_list, element_init, arrayvec_spec, [procedure] subscr_p, arrayvec_offset, by_row) -> array This procedure constructs and returns a new N-dimensional array procedure array (where N >= 0). Its arguments are as follows: bounds_list A list of 2 * N integers whose elements are alternately the minimum and maximum subscript in each dimension, i.e. [% min1, max1, min2, max2, ..., minN, maxN %] This argument is always required. An empty boundslist specifies a 0-dimensional array (an object which when applied to zero subscripts returns its single value). element_init This argument is optional, and specifies an initialisation for each array element: its interpretation depends on whether it is a procedure or not. If a procedure, it is assumed to take N subscript arguments and to return a (possibly different) initialising value for each position in the array. newanyarray will then initialise the array by applying the procedure in turn to every combination of subscripts, i.e. element_init(sub1, sub2, ..., subN) -> array(sub1, sub2, ..., subN) (Note that the order in which the combinations of subscripts are generated is UNDEFINED). Otherwise, if element_init is not a procedure, every element of the array is simply initialised to that value. (However, to distinguish it from bounds_list, this argument must not be a list -- if you wish to initialise the elements to some list, you must use a procedure returning the list, as above.) arrayvec_spec This argument is always required, and specifies the arrayvector structure to be used to store the elements of the array. It must supply either an existing structure, or a procedure to construct a new one: for the former the value may be either (a) an actual vector-class structure (e.g. full vector, string, intvec, shortvec, etc); (b) an array procedure whose arrayvector is to be used; while for the latter it may be (c) a 'vector init' procedure p (which will be called as p(size) -> arrayvec where size is the number of elements in the array); (d) a vector-class key whose class_init procedure is to be used (see REF * DEFSTRUCT, * KEYS). All cases except (c) also implicitly supply a subscriptor procedure for the structure, making the next argument (subscr_p) optional; for case (c), subscr_p must be present. If an existing structure is specified with (a) or (b), it must of course be large enough to accommodate the new array elements. Note that if element_init is omitted, the initial values of the array elements will depend upon the arrayvec_spec argument (i.e. for a new structure they will be whatever the init procedure initialises them to, and for an existing structure they will have their current values). subscr_p A subscriptor procedure for accessing and updating elements of the arrayvector, i.e. a procedure with updater of the form subscr_p(subscript, arrayvec) -> element element -> subscr_p(subscript, arrayvec) This argument may always be supplied, but is essential only when arrayvec_spec specifies a 'vector init' procedure; in all other cases, i.e. an existing array, an existing vector-class structure or vector-class key, the subscriptor procedure derived from that is used if subscr_p is omitted. arrayvec_offset An optional integer argument specifying the starting offset of the new array's elements within an existing array vector got from arrayvec_spec (i.e. this argument is illegal in all cases where a new arrayvector has to be constructed). Note that this is an offset, not a subscript, i.e. 0 means start at the first element (the default), 1 at the second, etc. The existing structure must be large enough to accommodate all the array elements starting at the given offset. by_row An optional argument specifying whether the array elements are to be arrayed by row or by column: if supplied, it must be a boolean, true meaning by row or false meaning by column. If omitted, the value of poparray_by_row is used instead. poparray_by_row -> bool [variable] bool -> poparray_by_row This boolean variable controls the order in which the elements of an array produced by newanyarray are stored in its underlying vector, and supplies the default value for the BY_ROW argument (q.v.). newarray(bounds_list, element_init) -> full_array [procedure] This procedure provides a simpler interface to newanyarray for constructing arrays of full items, and is just newanyarray(% vector_key %) i.e. use standard full vectors to store the array elements (see REF * VECTORS). newsarray(bounds_list, element_init) -> char_array [procedure] Same as newarray, but using strings rather than full vectors, i.e. newanyarray(% string_key %) (see REF * STRINGS). ---------------------------------------------- 5 Generic Data Structure Procedures on Arrays ---------------------------------------------- The generic data structure procedures described in REF * DATA (datalength, appdata, explode, fill, and others defined in terms of those) can all be applied to arrays: they treat an array as the set of its arrayvector elements between the minimum and maximum subscripts given by the arrayvector_bounds. Thus, for example, if arrayvector_bounds(array) -> min_sub -> max_sub then datalength(array) will be max_sub - min_sub + 1 Similarly, appdata(array, p) will apply the procedure p to the value of each arrayvector element from min_sub to max_sub inclusive, and so on. The procedure copy when applied to an array copies both the array procedure and its arrayvector (so that the copy is completely independent of the original). ---------------- 6 Miscellaneous ---------------- in_array [syntax] This * FOR_FORM allows iteration over data in arrays, much as other forms allow iteration over data in lists and vectors. Its format is: for var1, var2, ... with_index ivar in_array array1, array2, ... updating_last n in_region list of_dimension d do statement-sequence endfor See HELP * in_array for details. arrayscan(boundslist, p) [procedure] Given a boundlist of array minimum and maximum subscripts (as supplied to newanyarray, etc), applies the procedure p to every combination of subscripts in that range (ordered 'by column', i.e. with the last subscript varying fastest). For each combination, p is given a list of subscripts of length N, where boundslist is of length 2 * N. For example, the following arrayscan([3 5 1 3], spr); will produce [3 1] [3 2] [3 3] [4 1] [4 2] [4 3] [5 1] [5 2] [5 3] The for-forms * in_array and * in_region provide alternatives to this. 6.1 Storing Arrays on Disk --------------------------- The libraries LIB * ARRAYFILE and LIB * DATAFILE may be used to store arrays on disk, and be subsequently read back into Poplog. arrayfile is the more efficient and space-economical procedure; however, it can only be used on arrays whose arrayvector is a 'byte-accessible' data structure. datafile should be used to save arrays whose arrayvector is a full vector. arrayfile(filename) -> array [procedure] arrayfile(filename, array) -> array array -> arrayfile(filename) Stores the array array in the disk file named by filename, using a special (machine-specific) data format. The arrayvector of array must be a byte-accesible structure. The following information is recorded: # The number of dimensions, and bounds, of array # The dataword of the array's arrayvector # Whether array is ordered by row or column # The contents of the arrayvector of array In its non-updater form, arrayfile reads the information stored in filename, and returns a new array constructed from it. If the optional second argument array is specified, the data saved in filename is written into it; no new array is created. See HELP * ARRAYFILE for examples. The procedure datafile is documented in REF * datafile. --------------------- 7 Examples of Arrays --------------------- The following: constant procedure multab = newarray([1 12 1 12], nonop *); will create and assign to multab a 12 x 12 2-dimensional array each of whose elements can be a full Poplog item, and where each element subscripted by I, J is initialised to I * J (in other words, multab is a multiplication table): multab(4, 3) => ** 12 To turn an existing string of "noughts-and-crosses" into an array: constant procedure oxo = newanyarray([1 3 1 3], '0 X X00 X'); cucharout(oxo(1, 3)); 0 If, in the previous example, we make the array by row instead of by column, the mapping between subscripts and positions in the string is "reversed": constant procedure oxo2 = newanyarray([1 3 1 3], '0 X X00 X', false); cucharout(oxo2(1, 3)); X ---------------- 8 Sparse Arrays ---------------- It is sometimes necessary to use large arrays in which many of the elements will all have some default value, and only relatively few will have a 'significant' value, i.e. different from the default. Represented as ordinary arrays, such 'sparse' arrays are very wasteful of space, since the arrayvector must allow storage cells for each element corresponding to a given combination of subscripts, and yet many or most of these will just repeat the default value. Rather than using an vector-type structure to hold the elements, Poplog sparse arrays are therefore implemented as multi-dimensional properties, that is, they use a tree of properties to associate each set of subscripts to a value (see REF * PROPS for a description of properties). This means that only the 'significant' elements take up space, and moreover, the 'subscripts' are not restricted to being integers, but can be any Poplog items. (Although this approach saves space, it does however mean that accessing sparse array elements is slower than for ordinary arrays.) A sparse array can also have an 'active default' -- i.e. a procedure that is run to decide what the default value of an element should be when it has not been assigned a value explicitly; for certain applications this can save even more space. newanysparse(dimensions, default_item) -> sparse_array [procedure] newanysparse(dimensions, default_p, apply) -> sparse_array Constructs and returns a new N-dimensional sparse array procedure (where N >= 1). This can be called as sparse_array(sub1, sub2, ..., subN) -> element element -> sparse_array(sub1, sub2, ..., subN) to access or update the element associated with a given set of N 'subscripts' (which can be any items at all, but note that subscript equality is on the basis of ==, not =). The dimensions argument specifies the number of dimensions N, either directly as an integer or as an N-element list. In the latter case, the elements of the list are integers giving the table sizes to be used for the properties employed for each dimension; in the former case, when dimensions is simply an integer, the table size in each dimension defaults to 20. (The table size for a property affects how fast items are accessed in it -- see REF * PROPS. For maximum speed it should be roughly the 'size', i.e. number of different subscript values, in each dimension; also, subscripts are dealt with from right to left, so that putting 'larger' dimensions to the left of 'smaller' ones will increase efficiency.) The remaining argument(s) specify the default value for elements of the array: in the first form of the call, the item default_item is the fixed default value for every element. The second form specifies an 'active default' procedure default_p, and to distinguish this from the first form (in which default_item could be a procedure), the last argument must be the procedure apply. default_p is expected to be a procedure which takes N 'subscript' arguments and returns a default value, i.e. default_p(sub1, sub2, ..., subN) -> value (this is comparable to a procedure specified as the element_init argument to newanyarray). newsparse(dimensions) -> sparse_array [procedure] This is a simpler version of newanysparse which has the word "undef" as a fixed default_item, i.e. same as newanysparse(%"undef"%) 8.1 Examples of Sparse Arrays ------------------------------ The first example is an sparse array representing points in 3-D space, where each element in the array is a list of points to which that point is connected to; each point is defaults to being connected just to itself: define lconstant here(x, y, z); lvars x, y, z; [[^x ^y ^z]] enddefine; constant procedure connected = newanysparse(3, here, apply); In this array, each cell contains a list of the points it is directly connected to: connected(1, 2, 3) => ** [[1 2 3]] connected(8, 9, 10) => ** [[8 9 10]] Some explicit connectivity can then be added: [8 9 10] :: connected(1, 2, 3) -> connected(1, 2, 3); connected(1,2,3) => ** [[8 9 10] [1 2 3]] The second example creates a sparse array in which the 3 'dimensions' are English words, foreign languages, and modalities, and where each element is the translation of the English word into the corresponding language and modality (the default value being false for no translation): constant procedure dictionary = newanysparse(3, false); "bonjour" -> dictionary("hello", "french", "polite"); "ola" -> dictionary("hello", "spanish", "familiar"); dictionary("hello", "french", "polite") => ** bonjour dictionary("hello", "french", "vulgar") => ** <false> ------------------------ 9 Related Documentation ------------------------ REF * DATA Information on Poplog data types and how they are represented. REF * KEYS Describes the information associated with each data type and some general procedures for manipulating data and creating fast access procedures. REF * VECTORS Describes standard full vectors (vectors containing any Poplog items). REF * INTVEC Describes intvecs (vectors containing 32 bit signed integers) and shortvecs (vectors containing 16 bit signed integers). REF * STRINGS Describes byte vectors. REF * PROPS Information on properties (hash-tables, indexed by arbitrary objects). REF * DEFSTRUCT Describes syntax for creating new vector or record classes. HELP * ARRAYS An introduction to arrays in Pop-11. HELP * NEWMAPPING Describes a particular type of property for associating items with structures on the basis of structure equality. --- C.all/ref/arrays --- Copyright University of Sussex 1996. All rights reserved.