Arena Language Manual
(C) 2006, Pascal Schmidt <arena-language@ewetel.net>
Contents
1 Introduction
1.1 What's Arena?
1.2 Why another scripting language
1.3 Target audience
1.4 Versioning
1.5 Structure of this manual
1.6 License
2 Language
2.1 Basic tokens
2.1.1 Comments
2.1.2 Keywords
2.1.3 Operators
2.1.4 Identifiers
2.1.5 Integer literals
2.1.6 Float literals
2.1.7 String literals
2.1.8 Grouping symbols
2.2 Runtime type system
2.2.1 void
2.2.2 bool
2.2.3 int
2.2.4 float
2.2.5 string
2.2.6 array
2.2.7 struct
2.2.8 fn
2.2.9 resource
2.3 Scopes and namespaces
2.3.1 Top-level vs. function-level scope
2.3.2 Global vs. local namespace
2.4 Statements
2.4.1 Basic rules for statements
2.4.2 Include statement
2.4.3 Control flow statements
2.4.3.1 if statement
2.4.3.2 while loop statement
2.4.3.3 do loop statement
2.4.3.4 for loop statement
2.4.3.5 continue statement
2.4.3.6 break statement
2.4.3.7 switch statement
2.4.3.8 try statement
2.4.3.9 throw statement
2.4.4 User-defined functions
2.4.4.1 Function definition
2.4.4.2 return statement
2.4.5 Structure templates
2.4.5.1 Defining structure fields
2.4.5.2 Defining structure methods
2.4.5.3 Constructor method
2.5 Expressions
2.5.1 Basic rules for expression nesting
2.5.2 Constant expressions
2.5.3 Reference expressions
2.5.3.1 Static reference expressions
2.5.3.2 Indexing of elements
2.5.4 Cast expressions
2.5.4.1 Conversion to void
2.5.4.2 Conversion to bool
2.5.4.3 Conversion to int
2.5.4.4 Conversion to float
2.5.4.5 Conversion to string
2.5.4.6 Conversion to array
2.5.4.7 Conversion to struct
2.5.4.8 Conversion to fn
2.5.4.9 Conversion to resource
2.5.5 Assignment expressions
2.5.5.1 Indexing in assignments
2.5.5.2 Combining assignments and operators
2.5.6 Function calls
2.5.6.1 Passing arguments "by reference"
2.5.7 Basic rules for structure templates
2.5.8 Constructor calls
2.5.9 Method calls
2.5.9.1 Static method calls
2.5.9.2 Dynamic method calls
2.5.10 Operators
2.5.10.1 Math operators
2.5.10.2 Boolean operators
2.5.10.3 Equality operators
2.5.10.4 Order operators
2.5.10.5 Bitwise operators
2.5.10.6 Operator precedence
2.5.11 Conditional expression
2.5.12 Source file and line expressions
2.5.13 Anonymous functions
3 Library
3.1 Runtime system
3.1.1 FLT_RADIX
3.1.2 FLT_DIG
3.1.3 FLT_MANT_DIG
3.1.4 FLT_MAX_EXP
3.1.5 FLT_MIN_EXP
3.1.6 FLT_EPSILON
3.1.7 FLT_MAX
3.1.8 FLT_MIN
3.1.9 INT_MAX
3.1.10 INT_MIN
3.1.11 type_of
3.1.12 tmpl_of
3.1.13 is_void
3.1.14 is_bool
3.1.15 is_int
3.1.16 is_float
3.1.17 is_string
3.1.18 is_array
3.1.19 is_struct
3.1.20 is_fn
3.1.21 is_resource
3.1.22 is_a
3.1.23 is_function
3.1.24 is_var
3.1.25 is_tmpl
3.1.26 is_local
3.1.27 is_global
3.1.28 cast_to
3.1.29 set
3.1.30 get
3.1.31 get_static
3.1.32 unset
3.1.33 global
3.1.34 assert
3.1.35 versions
3.2 Math functions
3.2.1 exp
3.2.2 log
3.2.3 log10
3.2.4 sqrt
3.2.5 ceil
3.2.6 floor
3.2.7 fabs
3.2.8 sin
3.2.9 cos
3.2.10 tan
3.2.11 asin
3.2.12 acos
3.2.13 atan
3.2.14 sinh
3.2.15 cosh
3.2.16 tanh
3.2.17 abs
3.3 Printing functions
3.3.1 print
3.3.2 dump
3.3.3 sprintf
3.3.4 printf
3.4 String functions
3.4.1 strlen
3.4.2 strcat
3.4.3 strchr
3.4.4 strrchr
3.4.5 strstr
3.4.6 strspn
3.4.7 strcspn
3.4.8 strpbrk
3.4.9 strcoll
3.4.10 tolower
3.4.11 toupper
3.4.12 isalnum
3.4.13 isalpha
3.4.14 iscntrl
3.4.15 isdigit
3.4.16 isgraph
3.4.17 islower
3.4.18 isprint
3.4.19 ispunct
3.4.20 isspace
3.4.21 isupper
3.4.22 isxdigit
3.4.23 substr
3.4.24 left
3.4.25 right
3.4.26 ord
3.4.27 chr
3.4.28 explode
3.4.29 implode
3.4.30 ltrim
3.4.31 rtrim
3.4.32 trim
3.5 Array functions
3.5.1 mkarray
3.5.2 qsort
3.5.3 is_sorted
3.5.4 array_unset
3.5.5 array_compact
3.5.6 array_search
3.5.7 array_merge
3.5.8 array_reverse
3.6 List functions
3.6.1 nil
3.6.2 cons
3.6.3 length
3.6.4 null
3.6.5 elem
3.6.6 head
3.6.7 tail
3.6.8 last
3.6.9 init
3.6.10 take
3.6.11 drop
3.6.12 intersperse
3.6.13 replicate
3.7 Structure functions
3.7.1 mkstruct
3.7.2 struct_get
3.7.3 struct_set
3.7.4 struct_unset
3.7.5 struct_fields
3.7.6 struct_methods
3.7.7 is_field
3.7.8 is_method
3.7.9 struct_merge
3.8 Functions on functions
3.8.1 is_builtin
3.8.2 is_userdef
3.8.3 function_name
3.8.4 call
3.8.5 call_array
3.8.6 call_method
3.8.7 call_method_array
3.8.8 prototype
3.8.9 map
3.8.10 filter
3.8.11 foldl
3.8.12 foldr
3.8.13 take_while
3.8.14 drop_while
3.9 Random number functions
3.9.1 RAND_MAX
3.9.2 rand
3.9.3 srand
3.10 Environment functions
3.10.1 argc
3.10.2 argv
3.10.3 exit
3.10.4 getenv
3.10.5 system
3.11 File I/O functions
3.11.1 stdin
3.11.2 stdout
3.11.3 stderr
3.11.4 is_file_resource
3.11.5 fopen
3.11.6 fseek
3.11.7 ftell
3.11.8 fread
3.11.9 fgetc
3.11.10 fgets
3.11.11 fwrite
3.11.12 setbuf
3.11.13 fflush
3.11.14 feof
3.11.15 ferror
3.11.16 clearerr
3.11.17 fclose
3.11.18 remove
3.11.19 rename
3.11.20 errno
3.11.21 strerror
3.12 Date and time functions
3.12.1 Date and time structure
3.12.2 time
3.12.3 gmtime
3.12.4 localtime
3.12.5 mktime
3.12.6 asctime
3.12.7 ctime
3.12.8 strftime
3.13 Locale functions
3.13.1 getlocale
3.13.2 setlocale
3.13.3 localeconv
3.14 Dictionary functions
3.14.1 is_dict_resource
3.14.2 dopen
3.14.3 dread
3.14.4 dwrite
3.14.5 dremove
3.14.6 dexists
3.14.7 dclose
3.15 Memory management functions
3.15.1 is_mem_resource
3.15.2 malloc
3.15.3 calloc
3.15.4 realloc
3.15.5 free
3.15.6 cnull
3.15.7 is_null
3.15.8 cstring
3.15.9 mputchar
3.15.10 mputshort
3.15.11 mputint
3.15.12 mputfloat
3.15.13 mputdouble
3.15.14 mputstring
3.15.15 mputptr
3.15.16 mgetchar
3.15.17 mgetshort
3.15.18 mgetint
3.15.19 mgetfloat
3.15.20 mgetdouble
3.15.21 mgetstring
3.15.22 mgetptr
3.15.23 mstring
3.15.24 is_rw
3.15.25 msize
3.15.26 memcpy
3.15.27 memmove
3.15.28 memcmp
3.15.29 memchr
3.15.30 memset
3.16 Foreign function calls
3.16.1 dyn_supported
3.16.2 is_dyn_resource
3.16.3 dyn_open
3.16.4 dyn_close
3.16.5 dyn_fn_pointer
3.16.6 cfloat
3.16.7 dyn_call_void
3.16.8 dyn_call_int
3.16.9 dyn_call_float
3.16.10 dyn_call_ptr
3.17 PCRE functions
3.17.1 pcre_supported
3.17.2 PCRE_ANCHORED
3.17.3 PCRE_CASELESS
3.17.4 PCRE_DOLLAR_ENDONLY
3.17.5 PCRE_DOTALL
3.17.6 PCRE_EXTENDED
3.17.7 PCRE_MULTILINE
3.17.8 PCRE_UNGREEDY
3.17.9 PCRE_NOTBOL
3.17.10 PCRE_NOTEOL
3.17.11 PCRE_NOTEMPTY
3.17.12 is_pcre_resource
3.17.13 pcre_compile
3.17.14 pcre_match
3.17.15 pcre_exec
3.17.16 pcre_free
4 Changes
4.1 Language changes
4.1.1 Version 1.0 to 2.0
4.1.2 Version 2.0 to 2.1
4.1.3 Version 2.1 to 2.2
4.2 Library changes
4.2.1 Version 1.0 to 1.1
4.2.2 Version 1.1 to 2.0
4.2.3 Version 2.0 to 2.1
4.2.4 Version 2.1 to 2.2
4.2.5 Version 2.2 to 2.3
4.2.6 Version 2.3 to 2.4
4.2.7 Version 2.4 to 2.5
4.2.8 Version 2.5 to 2.6
4.2.9 Version 2.6 to 2.7
4.2.10 Version 2.7 to 3.0
1 Introduction
This manual describes the Arena scripting language. It is
meant to give a complete overview of the language. This
includes syntax, semantics, and standard library functions
provided by the language runtime environment.
1.1 What's Arena?
Arena is a scripting language. It is closely modelled on
the C programming language, but with some features removed
and added to create a language more suitable to ad-hoc
scripting. The following is a description of the main
differences between Arena and C.
Arena does automatic memory management. This means the
programmer does not have to reserve memory for strings
and arrays. Additionally, variables do not have to be
declared before they are used.
Arena uses dynamic typing. This means variables can be used
to store arbitrary values. A variable that holds an integer
at the beginning of a script may well be used to hold a
string at the end of the same script. The concept extends
to arrays -- arrays can have elements of different types.
Arena has anonymous functions. Sometimes you may want to
pass a function into another function (functions can accept
other functions as their arguments), and anonymous functions
provide a way of doing so without having to invent a function
name. This is especially useful if you need a particular
function just once and just for passing into another function.
Arena provides exception support. Exceptions can be used
for handling error situations in a script. They provide
out-of-band error signalling and handling.
Arena does not allow user-defined datatypes. This is a
restriction common to many scripting languages. It does, however,
have structure templates, which work a lot like classes in
object-oriented programming languages.
Arena does not provide a way to define constants -- that is,
values set by the programmer that cannot change during
the execution of a script. The rationale is that it is not
strictly necessary to have constants provided by the language.
One can simply use a global variable and write to it only
once at the beginning of a script.
Apart from the differences listed above, Arena tries to emulate
C as much as possible. The semantics of language constructs
are supposed to match C, and the standard library of functions
uses the same names as the C standard library where both
provide the same functionality.
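As a minimal sketch of the dynamic typing described earlier in this
section, the following lines reuse one variable for values of two
different types. The variable name is chosen purely for illustration.
No declaration is needed before the first assignment.

```arena
count = 5;        // count holds an int
count = "five";   // the same variable now holds a string
```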
1.2 Why another scripting language
There is no shortage of existing scripting languages, so why
design and write another one? Two reasons, mainly.
The first reason is that many people, especially in the
Unix community, know how to program in C, but having to do
your own memory management all the time is a pain for small
or quick projects. Arena provides a way to write "almost C"
code without having to think about memory management. Dynamic
typing was added because it is very convenient to have once
you have already abandoned the need to declare variables
before use (which you have to do in C so that the compiler
can set aside memory for variables).
The second reason for writing another language is that most
scripting languages of today are not really lightweight
anymore. Extensive function libraries often mean that a
scripting language interpreter is several megabytes in size.
For fans of more minimalist approaches, several megabytes
ain't it. Arena's standard library of functions is based on
that of ISO C for the very reason that it is very compact and
does not provide bells and whistles.
1.3 Target audience
This manual tries to describe the syntax and semantics
of the Arena language, but it does not go into every detail
and certainly is no guide on how to solve real problems
using Arena.
It is assumed the reader already knows how to program. Most
of the language constructs of Arena appear in other
languages, as well, so already knowing a different
programming language helps. Since Arena is modelled on C,
knowing C helps a lot. For structure templates, which
are not taken from C, knowledge of object-oriented
programming languages such as C++ or Java should help,
since structure templates are basically a low-level
version of classes.
1.4 Versioning
Both the language and standard library are versioned. This
manual describes version 2.2 of the language and version
3.0 of the standard library.
Incompatible changes to the language or library result
in a change of the major version number. An implementation
of the new version may be unable to run some scripts
written for a previous version of the language.
Thus, an implementation of version 2.0 of the language
will not run all possible version 1.0 scripts.
Compatible changes to the language or library result
in a change of the minor version number. An implementation
of such a new version must still be able to run all
scripts written for a version of the language with the
same major version number and a smaller minor version
number. Thus, an implementation for version 1.3 of the
language will still run all version 1.0, 1.1, and 1.2
scripts.
Minor version number changes for the language are only
possible if some new syntax is introduced, in such a way
that the new syntax would have been a syntax error in the
previous version. Changes to existing syntax require a
new major version.
Minor version number changes for the library are possible
as long as only new library functions are introduced by
the new version. Old scripts that already use the same
function names for user-defined functions will still work
as the user-defined functions will overwrite the library
functions.
1.5 Structure of this manual
The rest of the text is divided into two main chapters.
The first describes the syntax and intended semantics
of the language. The second describes the standard library
of functions that come with the language.
If some aspect of the behaviour of the language or
library is said to be "implementation-defined", this means
an implementation of the language can freely choose how to
behave for the described situation. However, the choice
must be consistent -- under the same circumstances, the
same behaviour must result.
If some aspect of the behaviour of the language
or library is said to be "undefined", this means an
implementation of the language can do anything for the
described situation, no matter how inconsistent. An
implementation may even crash if an undefined situation
arises during the execution of a script; or, as has
been observed about C, an implementation may make
demons fly out of your nose if you invoke undefined
behaviour.
1.6 License
You are free to copy, distribute, display, make derivative
works of, and/or make commercial use of this manual,
provided you follow these conditions:
You must keep any copyright notices and license terms
intact. You are free to add your own copyright notices to
parts of a derivative work that you wrote yourself.
If you make changes to the semantics of existing parts
of the text, those parts must carry prominent notice that
you changed them. This condition is made because this
manual describes the behaviour of a programming language,
and changes to the text can easily change the described
behaviour. This could lead to the changed text describing
another, slightly incompatible language.
2 Language
This section of the manual describes the syntax and semantics
of the Arena scripting language.
This version of the language manual describes version 2.2 of
the language.
2.1 Basic tokens
When a script is parsed by the Arena interpreter, it is
first split up into tokens. These tokens are then combined
to form statements and expressions. Since it is important
to know what kind of tokens (for example, variable and
function names) are accepted by the language, the different
token types are described next.
2.1.1 Comments
Comments can be part of a script. They are ignored by the
interpreter and can be used to annotate the script for
human readers. There are two forms of comments: one-line
comments and multi-line comments.
One-line comments start with the character "#" (hash) or
the characters "//" (double forward slash). They can be
placed anywhere on an input line and cause the rest of
the line to be treated as a comment. The following are
examples of one-line comments:
# this line is ignored
a = 5; // everything back here is ignored
Multi-line comments start with the characters "/*" (forward
slash followed by asterisk) and end with the characters
"*/" (asterisk followed by forward slash). Everything between
those two markings is ignored. Multi-line comments can be
nested -- you need a matching number of "/*" and "*/"
sequences to really end a comment. The following is an
example of a multi-line comment:
/* this is a lengthy explanation of what is happening,
but you can probably figure that out yourself */
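The nesting rule can be sketched as follows. The inner "*/" closes
only the inner comment, so the outer comment continues until the
second "*/". The variable name is illustrative only.

```arena
/* outer comment begins
   /* inner comment, opened and closed inside the outer one */
   this text is still part of the outer comment
*/
a = 5;   // first code after the comment has really ended
```

Nesting is convenient for temporarily disabling a region of code that
already contains multi-line comments.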
2.1.2 Keywords
Keywords are words reserved by the language. They are
used to make up statements and expressions. They cannot
be used as names for variables, functions, or templates.
Keywords are case-sensitive: "do" is a language keyword,
"Do" or "DO" are not.
The following is a list of all Arena keywords:
array break bool case catch continue
default do else extends false float
fn for forced if include int
mixed new resource return string struct
switch template throw true try void
while
2.1.3 Operators
Operators are special symbols reserved by the language.
They are used to combine expressions and generally
represent operations performed on pieces of data. For
example, the + operator denotes mathematical addition.
The following is a list of all Arena operator symbols:
:: == != <= >= <
> ++ -- && || **
+ - * / % &
| ^ << >> ! ~
= += -= *= /= &=
|= ^= <<= >>=
2.1.4 Identifiers
An identifier is a name used for a variable, a function,
or a structure template. It is used in a script to refer
to entities of the language by name. Identifiers are chosen
by the programmer. The language actually puts some
identifiers in place before a script starts (those for
the standard library of functions), but those are not
reserved in the same way that keywords are -- you can
reuse them for your own variables, functions, or
structure templates if you wish.
An identifier starts with an underscore character or
an upper-case or lower-case letter. A letter is one of
the 26 characters in the range A-Z or a-z (no umlauts or
accented letters allowed). For the rest of an identifier,
the same characters are allowed, with the addition
of decimal digits. Decimal digits are characters in the
range 0-9.
Keywords cannot be used as identifiers.
The following is a list of example identifiers:
foo
x2
my_funny_name
__something
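For contrast, the sketch below pairs two valid identifiers with some
names that the rules above rule out. The invalid lines are commented
out so that the fragment remains valid. All names are made up for
illustration.

```arena
count_2 = 1;     // valid: letters, an underscore, and a digit
_tmp = 2;        // valid: a leading underscore is allowed
// 2count = 3;   // invalid: an identifier cannot start with a digit
// my-name = 4;  // "-" is an operator, so this is not a single identifier
// while = 5;    // invalid: "while" is a keyword
```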
2.1.5 Integer literals
An integer literal is used to represent an integer number
in a script. An integer literal is made up of an optional
prefix and one or more digits. An integer literal with
no prefix is treated as a decimal number. Decimal digits
are characters in the range 0-9. An integer literal with
the prefix "0" (zero) is treated as an octal number. Octal
digits are characters in the range 0-7. An integer literal
with the prefix "0x" (zero x) is treated as a hexadecimal
number. Hexadecimal digits are characters in the ranges
0-9, a-f, and A-F.
The following are examples of integer literals:
0
123
0755
0xFF
0xbeef
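As a short worked example of the prefix rules, the following
assignments all store the same value, 255. The variable names are
illustrative only.

```arena
a = 255;    // decimal
b = 0377;   // octal: 3*64 + 7*8 + 7 = 255
c = 0xFF;   // hexadecimal: 15*16 + 15 = 255
d = 0xff;   // hexadecimal digits may be lower-case
```

Note that a leading zero changes the meaning of a literal: "010" is
the octal number 8, not the decimal number 10.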
2.1.6 Float literals
A float literal is used to represent a floating point number
in a script. A float literal is made up of zero or more
decimal digits, followed by a period, followed by one or
more decimal digits. A decimal digit is a character in the
range 0-9. Optionally, an exponent can be added to the end
of the literal. This is composed of the letter "e" or "E",
followed by either "+" or "-", followed by one or more
decimal digits. If present, the exponent is interpreted as a
power of 10 by which the rest of the number is multiplied. As
an example, "1E-2" is the same as 1 * 10^(-2) which is
0.01.
The following are examples of float literals:
1.0
.25
0.376568E-10
1E+30
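The effect of the exponent can be sketched with a few assignments.
The variable names are illustrative only.

```arena
half = .5;        // no digits are needed before the period
big = 2.5E+3;     // 2.5 * 10^3 = 2500.0
small = 2.5E-3;   // 2.5 * 10^(-3) = 0.0025
```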
2.1.7 String literals
A string literal is used to represent a string inside a
script. A string literal is made up of a single or double
quote character, followed by an arbitrary number of
characters, followed by a matching single or double quote.
If the string literal is enclosed in single quotes, it
cannot contain a single quote. The same applies to string
literals in double quotes; they cannot contain double
quotes.
To allow the representation of characters that cannot
directly appear inside a script or string, some escape
sequences are permitted. An escape sequence begins with
the character "\" (backslash). The following escape
sequences are defined:
\\ a literal backslash
\b backspace character
\e escape character
\f form feed character
\n newline character
\r carriage return character
\t tab character
\ccc character with octal character code ccc
\occc character with octal character code ccc
\dccc character with decimal character code ccc
\xcc character with hexadecimal char code cc
For character code escapes, fewer digits than given above can be
used if the character code needed is small enough. Note that if
any character not listed above follows the backslash, the escape
sequence results in that character. For example, the escape
sequence "\q" results in the character "q".
The following are examples of string literals:
"Hello"
'Greetings to you!\n'
"All your base are belong to us"
'Embedded \0 zero \0 characters'
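The quoting and escape rules can be sketched as follows. The variable
names are illustrative. The last four literals all use character
code 65, written in different number bases.

```arena
a = "It's here";     // a single quote inside double quotes
b = 'Say "hello"';   // double quotes inside single quotes
c = "A\tB\n";        // tab between A and B, newline at the end
d = "\x41";          // hexadecimal 41 is character code 65
e = "\d065";         // decimal 065 is character code 65
f = "\o101";         // octal 101 is character code 65
g = "\101";          // octal again, without the "o"
```

Choosing the quote style that does not occur in the text avoids the
restriction on embedded quote characters.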
2.1.8 Grouping symbols
Grouping symbols are used to make up larger entities from
statements and expressions or to change the order in which
script code is executed. The following is a list of the
grouping symbols used by the Arena language:
( ) { } [ ]
. ; ,
2.2 Runtime type system
Types are used to provide categories for different kinds of
values that a script deals with. Arena provides nine datatypes
for use by the programmer. No user-defined types are possible,
but a script can use structure templates to provide a sort of
sub-typing for the struct datatype.
Values of some types can be converted into values of other
types by use of a cast expression. More on that later in the
chapter about expressions.
2.2.1 void
The void type is used in places where no meaningful value
can be returned. The void type has only one value, which is
written "()" (two parentheses immediately following each other,
pronounced "void" or "unit"). All Arena functions must return
a value. If a function does not have a meaningful result (for
example, a function that outputs a message to the user), it
can return a void value instead of having to invent something
else.
2.2.2 bool
The bool type is used to represent truth values. It has two
values called "false" and "true". It is normally used to
hold the results of boolean computations or for representing
simple on-off switches.
2.2.3 int
The int type is used to hold signed integer values. The
precision is at least 32 bits. This means an int can generally
hold integer values between -2^31 and 2^31 - 1.
Arena does not provide unsigned integers. The rationale for
this is that the additional bit of precision that an unsigned
type provides for large positive integer values is not
enough of a benefit to warrant extra complexity for an
implementation.
2.2.4 float
The float type is used to represent signed floating point
numbers. The precision of a float is at least that of an IEEE
double precision floating point number.
Arena does not provide multiple floating point types with
different precisions, like C does. Like the omission of an
unsigned integer type, this was decided to keep implementation
complexity down to a minimum.
2.2.5 string
The string type is used to represent an arbitrary sequence
of bytes or characters. It is normally used to represent text.
Note that unlike strings (character pointers, really) in C,
an Arena string can contain bytes with the value 0 (zero). In
C such a byte would be considered the end of the string.
2.2.6 array
The array type is used to represent a numbered collection
of values. The types of the values stored in an array,
called the elements of the array, are not constrained. This
means each element can have a different type from the other
elements. An array can have other arrays as elements.
Arrays are indexed using integers, starting at 0. This means
the first element of an array has index 0, the second has
index 1, the third has index 2, and so on.
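The following sketch shows an array with elements of different types
and zero-based indexing. It assumes that the library function mkarray
(listed under the array functions) builds an array from its arguments
and that elements are read with square brackets. Both are assumptions
here, as neither is described in this section.

```arena
a = mkarray(10, "twenty", 30.0);   // assumed: one element per argument
first = a[0];                      // the int 10
second = a[1];                     // the string "twenty"
third = a[2];                      // the float 30.0
```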
2.2.7 struct
The struct (short for structure) type is used to represent
a collection of values. Unlike an array, in which the
elements are referenced by integer indices, the elements
of a struct have names. The order of elements in a struct
is not significant, which is another important difference
to the array type. Elements in a structure are called "fields"
or sometimes "methods" (if they are of type fn, see below).
The names of structure elements are identifier tokens, but
there are also library functions that use normal string values
as structure element names. In general, you can think of a
struct as being indexed by string values.
2.2.8 fn
The fn type is used to represent functions. This type allows
an Arena script to use functions like any other value. For
example, functions can be used as arguments to other functions
or can be returned as results from other functions. It is
also possible to create so-called anonymous functions on the
fly, by use of a special expression that results in an fn
value.
2.2.9 resource
The resource type is used to represent operating system
resources in use by a script. Examples are file handles or
manually allocated memory. The resource type has automatic
management that ensures that operating system resources are
freed when a resource value is no longer accessible by a
running script.
The contents of a resource value are opaque from the
viewpoint of a running Arena script.
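As a brief sketch, values of the five simplest types can be written
directly as literals. The variable names are illustrative only.

```arena
nothing = ();     // the single value of type void
flag = true;      // bool
count = 42;       // int
ratio = 0.5;      // float
name = "Arena";   // string
```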
2.3 Scopes and namespaces
A scope is defined as the area where a given portion of
source code appears in a script. A namespace defines a
limited area of visibility for variables, functions,
and structure templates. Both concepts are related and
determine what parts of a script can access other parts of
the same script.
2.3.1 Top-level vs. function-level scope
The scope of a piece of code is determined wholly by its
position in the source code. The scope of a given piece
of code cannot and does not change at runtime.
The scope active at the beginning of a script is the
top-level scope. At this scope, arbitrary statements
can be used, including function and structure template
definitions.
When a function definition begins, the source code scope
changes to function-level scope. At this scope, all
statements except other function definitions and structure
template definitions are allowed. This means function
definitions cannot be nested.
When a function definition ends, the statements that
appeared in the function-level scope become the function's
body. The function body is what gets executed when a
function later is called from other code. After leaving
a function definition, the top-level scope is active
again.
When a structure template definition begins, the scope
remains top-level scope, but the following definitions
up to the end of the structure template definition are
considered to be part of it. Structure template
definitions cannot be nested.
2.3.2 Global vs. local namespace
Namespaces are areas where variables, functions, and
structure templates are stored. All the named entities
of the language that are used in a script are part of a
namespace. A namespace associates identifiers with
the entities they name. Note that there are no
separate namespaces for variables, functions, and
structure templates. A given identifier can only
be used for one kind of entity at a time.
Namespaces can be visible or invisible to the currently
executing code. Code can only see variables, functions,
and structure templates stored in a visible namespace.
Entities stored in an invisible namespace cannot be accessed
or changed until that namespace becomes visible again.
There is one special namespace called the global
namespace. This namespace is always visible. Variables
and functions provided by the Arena standard library
are stored in the global namespace. Code running at
top-level scope has access to only one namespace, the
global namespace.
In addition to the global namespace, there are local
namespaces. A local namespace is created whenever
a function is called. The code inside the function
runs within a local namespace of its own. To this code,
both the global namespace and the local namespace of the
function are visible. The local namespace starts out
empty.
The visibility rules inside a local namespace are as
follows: the local namespace has priority. The global
namespace is consulted only if an identifier is not found
in the local namespace. When the namespace
is written to, the write always affects only the
local namespace. If a function attempts to change a
variable it has obtained from the global namespace,
a copy of the variable is created in the local
namespace.
When a function calls another function, another
local namespace is created. The previous local
namespace is invisible to the code inside the
called function. That namespace becomes visible again
only when the called function exits.
When a function exits, its local namespace is destroyed.
Everything that was stored in the local namespace is
no longer accessible. You can assume memory that was
used by the local namespace is freed at this point.
What the above boils down to is that functions have
their own set of local variables and can manipulate
them without affecting variables outside of the
function itself.
As a side note, the struct type works just like
a namespace of its own.
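The namespace rules can be sketched as follows. The function
definition syntax used here (return type, name, argument list, body)
is an assumption at this point, since function definitions are
described in a later section. All names are illustrative.

```arena
x = 1;            // top-level code: x lives in the global namespace

void change()
{
  y = x + 1;      // reads the global x, since no local x exists yet
  x = 10;         // the write creates a local copy; the global x stays 1
}

change();
// back at top level: x is still 1, and y no longer exists
```

Because writes never reach the global namespace, a function cannot
accidentally clobber a global variable by assigning to a name that
happens to match.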
2.4 Statements
Statements provide a way to sequence and structure code.
In other words, statements determine what gets executed
and under which conditions.
The following sections include code examples that make
use of expressions, which have not been described up
to now. Expressions will be explained in the next
chapter.
2.4.1 Basic rules for statements
Statements are executed in the order in which they appear in the
top-level scope. Individual statements end with
a ";" (semicolon) character. Expressions can be used
as statements by simply adding a semicolon at the end
of the expression. For example, if "expr" is a valid
expression, then the following is a valid statement:
expr;
Using an expression as a statement evaluates the
expression. Evaluation of an expression results in a
value in one of the types provided by the language.
When an expression is used as a statement, that value
is discarded.
Statements can be grouped together into one statement
by using curly braces. The whole block of statements
counts as one statement. When the block is executed, the
statements inside it are executed in the order they are
listed. For example:
{
  stmt1;
  stmt2;
  stmt3;
}
The above is a block consisting of three statements. Note
that there is no semicolon at the end of the block itself.
Blocks can be nested arbitrarily deep. Blocks are normally
used when you want to supply a list of statements to
execute in a place where only one statement is allowed by
the language.
A semicolon all by itself also constitutes a valid
statement that does nothing when executed. Blocks are
allowed to be empty. An empty block does nothing when
executed.
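The remaining rules (empty statement, empty block, nested blocks) can
be sketched together. The variable names are illustrative only.

```arena
;                 // empty statement: does nothing
{ }               // empty block: does nothing either
{
  a = 1;
  {               // a block nested inside another block
    b = 2;
    c = a + b;
  }
}
```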
2.4.2 Include statement
The include statement is made up of the keyword "include"
followed by a string in double quotes, followed by a
semicolon as usual for ending a statement. The string
is used as a filename. The contents of the file are
parsed as source code as if they were present after the line
with the include statement on it.
Note that the included code will be parsed at the
current scope. If the current scope is inside a function,
the included code cannot define functions or structure
templates. Normally include statements are only used at
top-level scope, for including files that contain libraries of
functions or structure template definitions.
An implementation of Arena may search for the named include
file in implementation-defined locations on the system
running the script. However, it is only guaranteed that
the current working directory will be searched.
Include files can be nested arbitrarily deep. It is the
responsibility of the programmer to prevent loops.
The following is an example of an include statement used
to read in a file called "library.inc":
include "library.inc";
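The kind of loop the programmer has to avoid can be sketched with two
hypothetical files that include each other. The filenames are made up
for illustration.

```arena
// file "a.inc" (hypothetical)
include "b.inc";

// file "b.inc" (hypothetical)
include "a.inc";   // pulls in a.inc again, which pulls in b.inc again
```

Including either file starts a cycle that never terminates on its
own, so include files should only ever refer to each other in one
direction.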
2.4.3 Control flow statements
Control flow statements influence the order in which
statements are executed, or whether they are executed at
all.
2.4.3.1 if statement
The if statement is used to execute code based on a
condition. It consists of the keyword "if", followed by
an expression in parentheses, followed by a statement
or block. The expression is called a guard expression.
When the if statement is executed, the guard expression
is evaluated. If the resulting value is not of type bool,
it is converted to bool (using the same rules as
for cast expressions, see below). If the result is the
bool value "true", the statement part of the if
statement is executed. If the result of the guard
expression is "false", the statement part is not executed.
The following is an example of an if statement:
if (x % 2 == 0)
  print("x is even!");
If you need to execute multiple statements, use a block
statement.
You can also give a statement to be executed when the
guard expression evaluates to "false". This is done
by following the first statement with the keyword "else"
and another statement. An example:
if (x % 2 == 0)
    print("x is even!");
else
    print("Sorry, x is odd!");
2.4.3.2 while loop statement
The while loop statement can be used to execute another
statement or block multiple times. It consists of the
keyword "while", followed by a guard expression in
parentheses, followed by a statement known as the loop
body.
When a while loop is executed, the guard expression
is evaluated, following the same rules as given for
the guard expression of an if statement. If the result
is "true", the loop body is executed. Execution of
the while loop then restarts at the beginning. If the
guard expression evaluates to "false", the loop
body is not executed and execution continues with the
statement following the while loop.
These rules mean that a while loop only executes as long
as the guard expression evaluates to "true". If the guard
expression evaluates to "false" the first time it is
considered, the loop body is never executed.
The code inside the while loop normally has side effects
that eventually change the result of the guard
expression to "false".
The following is an example of a while loop with a
block statement as its loop body:
while (x % 2 == 0) {
    print("x was even");
    x = rand(0,999);
}
2.4.3.3 do loop statement
The do loop statement is a close cousin of the while loop
statement; only the positions of the guard expression and
loop body are exchanged. A do loop consists of the keyword
"do", followed by a statement as the loop body, followed
by the keyword "while" and a guard expression in
parentheses.
When a do loop is executed, the loop body gets executed
first. Then the guard expression is evaluated using the
same rules as given for the guard expression of an if
statement. If the result is "true", the do loop is executed
again. If the result is "false", execution continues after
the loop.
The above rules mean that the body of a do loop is always
executed at least once. It is then executed again as long
as the guard expression evaluates to "true".
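To illustrate the difference from a while loop, the
following sketch uses a guard expression that is already
"false" when the loop is first reached. Under the rules
above, the body should still run exactly once, printing
"i = 10". The variable name "i" is illustrative.
i = 10;
do {
    print("i = ", i, "\n");
    ++i;
} while (i < 5);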
The following is an example of a do loop:
do {
    now = time();
} while (now - saved < 10);
2.4.3.4 for loop statement
The for loop statement offers a more versatile form of
looping compared to the while and do loops detailed
in the previous two sections. A for loop consists of
the keyword "for", followed by three semicolon-separated
expressions in parentheses, followed by a statement
that serves as the loop body. The first expression
is called an initialiser expression, the second a guard
expression, and the third a loop expression.
When a for loop executes, the initialiser expression
is evaluated. This happens only once, and the result
of the evaluation is discarded. Then the guard expression
is evaluated using the same rules as given for the guard
expression of an if statement. If the result is "true",
the loop body is executed. Following the loop body, the
loop expression is evaluated and its result discarded.
Execution of the for loop then restarts, omitting the
initialiser expression. If the guard expression
evaluates to "false", the loop body and loop expression
are not executed and execution resumes after the for
loop.
The above rules mean that a for loop executes as long
as its guard expression evaluates to "true". If it
does not evaluate to "true" on the first execution of a
for loop, the loop body is never executed.
Each of the three expressions in a for loop statement
can be left empty. In that case the (empty) expression is
replaced with the literal constant "true". This means
a for loop with all three expressions left off produces
an infinite loop.
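As an illustration, the following sketch leaves out the
initialiser and loop expressions and keeps only the guard
expression. The variable "i" is set up before the loop and
incremented inside the loop body, so the loop should print
the numbers 0 through 2, much as a while loop would.
i = 0;
for (; i < 3; ) {
    print(i, "\n");
    ++i;
}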
For loops are often used to execute a piece of code
a given number of times. For example, the following
loop prints the word "hello" ten times in a row:
for (i = 0; i < 10; i++) {
    print("hello");
}
2.4.3.5 continue statement
The continue statement can be used inside of do, while,
and for loops. It consists of the keyword "continue".
When a continue statement is executed inside of a loop
body, the statements following the continue statement
in the loop body are skipped. Processing continues as
normal for the loop statement in question. Normally this
means the loop's guard expression will be evaluated again.
When a continue statement is executed outside of a loop
body, it has the same effect as an empty statement.
The following is a (silly) example of counting the
number of odd integers between 0 and 99. A for loop
is used and the increment of a counter variable is skipped
by use of a continue statement if the number in
question is even.
odd = 0;
for (i = 0; i < 100; i++) {
    if (i % 2 == 0) continue;
    ++odd;
}
print(odd, " odd numbers found");
2.4.3.6 break statement
The break statement can be used inside of do, while,
and for loops (for the use of break in a switch
statement, see the next section). It consists of the
keyword "break".
When a break statement is executed inside of a loop
body, the execution of the rest of the loop body
is skipped. Execution then resumes with the next
statement following the loop statement that contains
the break statement. In effect, execution of that
loop is terminated by the break statement.
When a break statement is executed outside of a loop
body (or switch statement, see below), it has the same
effect as an empty statement.
The following is an example use of break which exits
from an infinite for loop as soon as a random number
between 0 and 99 equals zero.
for (;;) {
    number = rand(0, 99);
    print("my number: ", number, "\n");
    if (number == 0) break;
}
2.4.3.7 switch statement
The switch statement is used to execute one or more
of a number of statement groups depending on the
value of a guard expression. It consists of the
keyword "switch", followed by a guard expression
in parentheses, followed by statement groups enclosed
in curly braces.
Two different kinds of statement groups are possible.
There can be an arbitrary number of case groups and
one default group. A case group starts with the keyword
"case" followed by an expression, followed by a colon,
followed by an arbitrary number of statements. If the
last statement in the group is a break statement, this
has a special meaning described below. The default group
consists of the keyword "default" followed by a colon,
followed by an arbitrary number of statements. A break
statement at the end of a default group has no special
meaning relevant to the switch statement, but it still
has its normal effect on an enclosing loop statement.
When a switch statement is executed, its guard expression
is evaluated. The resulting value is then used to decide
which case group to execute. Case groups are considered
in the order that they appear in the switch statement.
When a case group is considered, its expression is
evaluated. If the resulting value is equal (in type and
value) to the value of the guard expression, the
statements inside the case group are executed. If the
last statement of the group is a break statement,
execution of the switch ends and the next statement
executed is the one following the switch statement.
If there is no break at the end of the case group, the
statements of the next group are also executed, without
evaluating the expression of that group. This is called
"fall through". This behaviour continues until either a
break statement at the end of a case group is encountered,
a default statement group is executed, or the switch
statement ends.
If a case group is considered and its value does
not match the value of the switch's guard expression,
the statements in the case group are not executed. The
next case group is considered instead and its
expression will be evaluated and checked. A default group,
if present, is not included in the case groups
considered for execution.
When all case groups have been considered and no
match was found, the behaviour of the switch statement
depends on the presence of a default group. If it is
present, the statements associated with it are executed.
If it is not present, the switch simply executes
nothing. Note that there is no fall through out of a
default group; execution of a switch always ends once
the last statement of the default group has been executed.
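Because a case expression must match the guard value in
both type and value, a case expression of a different type
never matches, even if a cast would make the two equal.
The following sketch, with literals picked purely for
illustration, should therefore skip the first two case
groups and print "int 1":
switch (1) {
    case 1.0:
        print("float 1.0\n");
        break;
    case "1":
        print("string 1\n");
        break;
    case 1:
        print("int 1\n");
        break;
    default:
        print("no match\n");
}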
The following example counts how many numbers between
0 and 99 are divisible by 3 or 6. It uses a switch that
evaluates the remainder of a division by 6. It employs fall
through since anything divisible by 6 is also divisible by 3.
It uses a default group to count how many numbers were not
divisible by 3 or 6.
three = six = none = 0;
for (i = 0; i < 100; i++) {
    switch (i % 6) {
        case 0:
            ++six;
        case 3:
            ++three;
            break;
        default:
            ++none;
    }
}
print(three, " numbers were divisible by 3\n");
print(six, " numbers were divisible by 6\n");
print(none, " numbers were not divisible by either\n");
2.4.3.8 try statement
The try statement is used to handle exceptions. It
consists of the keyword "try", followed by a statement,
followed by the keyword "catch", followed by an
identifier in parentheses, followed by another
statement.
When a try statement is executed, the statement
following the keyword "try" is executed. What gets
executed next depends on whether this statement
causes an exception (by use of a throw statement, see
below). If the enclosed statement does not cause
an exception, the next statement executed is
the statement directly following the try
statement; the statement in the catch part of the
try statement is not executed.
If the enclosed statement does throw an exception,
the value thrown is assigned to a variable with the
identifier given in the catch part of the try
statement. The statement given in the catch part is
then executed. Execution then continues after the
try statement. The variable with the exception value
remains visible to the code following the try
statement. Executing the catch part of a try
statement is often called "handling" the exception.
It is possible for try statements to be nested
arbitrarily deep. An exception is always handled by
the innermost try statement that encloses the code
that caused the exception.
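A small sketch of nesting, with exception values invented
for illustration: the first throw is enclosed by both try
statements and should be handled by the inner one, while
the second throw is enclosed only by the outer try
statement and should be handled there.
try {
    try {
        throw "inner problem";
    } catch (e) {
        print("inner handler got: ", e, "\n");
    }
    throw "outer problem";
} catch (e) {
    print("outer handler got: ", e, "\n");
}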
It is common for both statements in a try statement
to actually be block statements.
The following is an example of a try statement used
to encapsulate two function calls which may cause
exceptions. If an exception occurs, its value is
printed.
try {
    a = somefunc();
    b = someotherfunc();
} catch (e) {
    print("exception ", e, " occurred\n");
}
2.4.3.9 throw statement
The throw statement is used to cause an exception.
It consists of the keyword "throw" followed by an
expression.
When a throw statement is executed inside of a
try statement (either directly or because it occurs
inside a function called from within a try statement),
the throw expression is evaluated and the resulting
value becomes the exception value. Execution then
continues with the catch part of the innermost
enclosing try statement.
Note that the above means a throw statement executed
inside a loop body breaks out of the loop if the
handling try statement is outside of the loop.
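The following sketch illustrates this. The loop is written
to run ten times, but the throw statement is executed when
"i" reaches 3, so only the values 0 to 2 should be printed
before the catch part runs. The exception value is an
arbitrary string chosen for the example.
try {
    for (i = 0; i < 10; i++) {
        if (i == 3) throw "stopped early";
        print("i = ", i, "\n");
    }
} catch (e) {
    print("caught: ", e, "\n");
}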
When a throw statement is executed outside of a
try statement, this is considered a fatal error and
execution of the whole Arena script is terminated at
the point where the exception was thrown.
The following is an example of the use of a throw
statement to throw an exception with the string value
"oops" as the exception value:
throw "oops";
2.4.4 User-defined functions
User-defined functions provide a way to structure code
into separate, named entities. Each function accepts
input values, called function arguments, and computes
a value called the return value of the function when
called.
2.4.4.1 Function definition
A function definition declares a user-defined function
to the script interpreter. It consists of the function
return type, followed by an identifier naming the
function, followed by a list of argument types and
names in parentheses, followed by a statement to be
used as the function body. The individual argument types
and names are separated by commas. The list of arguments
can be left empty.
The return type can be given by using one of the keywords
"void", "bool", "int", "float", "string", "array", "struct",
"resource", or "fn". The intent is to specify that the function
returns a value of the given type when it is called. It is a
fatal error if the code of the function body does not return a
value that has the return type. The special keyword "forced"
can be prefixed to the return type. If it is, it is not
an error if the function attempts to return a value not
having the return type -- instead, the language automatically
casts (see cast expressions, below) the return value to the
appropriate type. The special keyword "mixed" can also be
used in place of a real type to indicate that the return
value of the function does not always have one and the same
type.
Function arguments are specified by using the optional
keyword "forced", followed by a type name (same as the return
type detailed above), followed by an identifier. The
identifier is used to name the argument. When a function
is called, the function's arguments are available to the
function body as variables with names as given in the function
definition. The argument type of an argument is checked when
a function is called. If the "forced" keyword was used, the
argument value is automatically cast to the given type. If not,
it is a fatal error to call the function with an argument value
not matching the given argument type.
The type of a function argument can be left out, in
which case the language behaves as if the type "mixed" had
been specified.
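The following sketch shows these variations. The function
names "whole_part" and "identity" are hypothetical, and the
"//" comments are annotations only. Following the cast
rules, a call such as whole_part(3.7) should return the
int value 3 instead of causing a fatal error.
// Forced return type: the float value of "x" is cast
// to int when it is returned.
forced int whole_part(float x)
{
    return x;
}
// No argument type given, so "x" is treated as "mixed";
// the return type depends on the argument passed in.
mixed identity(x)
{
    return x;
}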
The function body can be any statement. Most functions
contain more than one statement, thus most function
bodies will be block statements.
The return type, name, and argument types of a function
are called the prototype of the function.
When a function definition is executed, the new
function's existence is recorded in the current
namespace. Since function definitions can only occur
at top-level scope, this will always be the global
namespace. It is not an error to define a function
with the same name as an existing variable, function,
or structure template. The new function definition will
override any previous meaning of the same name.
The result value, or return value, of a function is
determined by using a return statement, described
below. A function body that does not use a return
statement will automatically be made to return a void
value by the language runtime system.
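As a brief sketch, the hypothetical function "greet" below
has no return statement. Under the rule above it returns a
void value, which matches its declared return type.
void greet(string name)
{
    print("hello, ", name, "\n");
}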
The following is an example of a function definition
for a function named "sum" that returns an int value
and accepts two int arguments named "x" and "y",
respectively. The example function body returns the
sum of both int values.
int sum(int x, int y)
{
    return x + y;
}
The function definition above will result in a fatal
error if passed float arguments, for example. To cause
the language to automatically convert both arguments
to int when the function is called, the definition
would have to be changed to:
int sum(forced int x, forced int y)
{
    return x + y;
}
2.4.4.2 return statement
The return statement is used to set the return value
of a function and terminate the execution of a function
body. It consists of the keyword "return" followed
by an optional expression.
When a return statement is executed inside a function
body, the return expression is evaluated and used
as the return value of the function. If no return
expression is present, a void value is substituted
instead. Statements following the return statement
in the function body are not executed. The effect
of the return statement is to always end the execution
of a function body.
The return value is passed back to the caller of the
function.
When a return statement is executed outside of a
function body, it behaves like an empty statement and
the return expression is not evaluated.
The following is an example of a return statement
used to return the bool value "true":
return true;
2.4.5 Structure templates
A structure template is a blueprint for constructing
values of type struct. Structure templates support
inheritance, meaning one structure template can build
upon another structure template defined earlier.
Structure templates can define fields and methods
that are to be created when a struct value is
constructed from the template.
A structure template consists of the keyword "template",
followed by an identifier to name the template, followed
by field and method definitions enclosed in curly
braces. Optionally, the name of the template can be
followed by the keyword "extends" and an identifier
naming another structure template that this template
builds upon.
When a structure template is executed, the new template
is stored in the current namespace and is available to
code following the structure template. Since structure
templates can only occur at top-level scope, they are
always stored in the global namespace. It is not an
error if the template name is already used by an existing
variable, function, or other template. The new structure
template overrides any previous definition of the same
name.
See the following sections for examples of structure
templates. See the section "Constructor calls" in the
chapter on expressions for information on how to create
struct values from structure templates.
2.4.5.1 Defining structure fields
Structure fields in structure templates are used to
define data fields that will appear in struct values
created from the template. The definition of a structure
field gives the identifier of the field. A value for the
field can also be given, but this is optional.
A structure field definition without value consists
of an identifier followed by a semicolon. When a struct
value is constructed from the template, the resulting
value will have an element named by the identifier that
contains a void value.
A structure field definition with value consists of
an identifier, followed by the assignment operator ("="),
followed by an expression, followed by a semicolon. When
a struct value is constructed from the template, the
resulting value will have an element named by the
identifier that contains the result of evaluating
the expression.
The following is an example of a structure template
that defines two structure fields. The first field is
named "i" and not given a value, the second is called
"foo" and given the constant int expression 42 as a
value.
template example
{
    i;
    foo = 42;
}
When a template extends another template, both may contain
fields of the same name. The values given by the
extending template have precedence. In the following
example, a struct value constructed from template "bar"
will contain a field called "i" with the int value 2.
template foo
{
    i = 1;
}
template bar extends foo
{
    i = 2;
}
2.4.5.2 Defining structure methods
A method is a function stored within a structure. This is
basically the same as a struct field with type fn. The
name "method" was chosen because that is how object-oriented
languages name a similar construct.
A structure method definition inside a structure template
is written exactly like a function definition (see above). The
only difference is that the function definition occurs within
the curly braces enclosing the structure template's definition.
When a struct value is constructed from the structure template,
it will contain an element with the function name from the
function definition. The element will contain a value of type
fn that corresponds to the given function prototype and body.
The following is an example of a structure template that
defines a method called "double", which is given as a function
that will double its int argument.
template foo
{
    int double(int x)
    {
        return 2 * x;
    }
}
For structure templates extending other structure templates,
the same rules as for structure fields apply: when both
templates define a method of the same name, the definition in
the extending template takes precedence. In the following
example, struct values constructed from the "bar" template
will contain a method called "fiddle" that quadruples its
argument, whereas struct values constructed from the "foo"
template will contain a method called "fiddle" that triples
its argument.
template foo
{
    int fiddle(int x)
    {
        return 3 * x;
    }
}
template bar extends foo
{
    int fiddle(int x)
    {
        return 4 * x;
    }
}
Note that field and method definitions in a structure template
can be intermixed in any order.
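For instance, the following hypothetical template "counter"
places a method between its field definitions. All names in
it are invented for illustration.
template counter
{
    start = 0;
    int twice(int x)
    {
        return 2 * x;
    }
    step = 1;
    label;
}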
2.4.5.3 Constructor method
A constructor method is a structure method definition with
a special name. A method is called the constructor method
if its identifier is the same as the identifier of the
structure template it is part of.
Constructor methods play a special role when a struct value
is constructed from a template, as described in the section
"Constructor calls" in the chapter on expressions. Apart
from that, a constructor method behaves identically to other
methods defined by a structure template.
The following is an example of a structure template "foo"
that contains a constructor method that will print out a
message whenever it is called.
template foo
{
    void foo()
    {
        print("constructor method foo called!\n");
    }
}
2.5 Expressions
Expressions are basically descriptions of how to compute a
value. Determining the value of an expression is called
evaluating the expression. The result of evaluating an
expression, called its value, is a value from one of the
nine built-in types of the Arena scripting language.
2.5.1 Basic rules for expression nesting
Expressions can be made up of other expressions by use
of several operators which are detailed in the sections
below. The exact meaning of compound expressions such
as "2 + 3 * 5" is determined by precedence and
associativity. For example, in the expression "2 + 3 * 5",
the multiplication is performed before the addition. To
override the order in which parts of an expression are
evaluated, it is possible to put parts of an expression
in parentheses. The sub-expression thus formed must be
a valid expression in itself and its value will be
evaluated before the rest of the original expression.
For example, to compute the addition before the
multiplication in the aforementioned example, the
expression would have to be changed to "(2 + 3) * 5".
The following sections list all possible types of
expressions supported by the Arena scripting language.
Precedence and associativity of all language operators
are given near the end of the chapter.
2.5.2 Constant expressions
A constant expression consists of a literal token.
There are literal tokens for the types void, bool, int,
float, and string.
When a literal token expression is evaluated, the
result is a value of the appropriate type. For
example, the literal expression "12" evaluates to the
int value 12.
The following are examples of constant expressions:
true
12.0
"I'm a string"
()
42
2.5.3 Reference expressions
A reference expression is used to refer to a variable
or function. It consists of an identifier.
When a reference expression is evaluated, the result is
the value of the named variable in the current
namespace. If the identifier refers to a function, the
result is a value of type fn. If the identifier is
unknown or names a structure template, the result is
a void value.
The following are examples of reference expressions:
a
foo
some_long_identifier
2.5.3.1 Static reference expressions
A static reference expression is used to refer to
elements of a structure template. It consists of
an identifier, followed by the operator symbol "::"
(double colon), followed by another identifier.
The first identifier is a template name that is
looked for in the current namespace. If it does
not denote an existing structure template, a fatal
error is generated. Otherwise, a separate namespace
is created. The field and method definitions of
the structure template are then executed inside the
new namespace. The second identifier is then used
like a normal reference expression inside the new
namespace. The new namespace is destroyed after
obtaining the value of the static reference, which
is the value of the whole static reference expression.
The following are examples of static references:
foo::bar
some_template::some_field
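As a sketch of how this plays out, assume a hypothetical
template "limits" with a single field "ceiling". Following
the steps above, the static reference in the print call
should evaluate to the int value 100 without any struct
value being constructed.
template limits
{
    ceiling = 100;
}
print(limits::ceiling, "\n");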
2.5.3.2 Indexing of elements
Indexing is used to refer to elements of array and
struct values. Indices can be placed directly
after reference expressions, static reference
expressions, and all kinds of function and method
calls.
An array index consists of one or more expressions,
each enclosed in square brackets. When an array index
is evaluated, the indexed expression and the expression(s)
used as the index are evaluated. If the result value of
the indexed expression is not an array, a void value is
returned. Otherwise, the result of the index expression
is cast to an integer (see below for type casting rules)
and used as an index into the array. If the resulting
integer index is valid for the array in question, the
element stored at that index is the result of the
indexing expression. Otherwise, the result is a void value.
The following is an example expression that assumes
"a" is the name of an array variable and references
the third element of the array:
a[2]
As a special case, negative indexing is allowed.
A negative index is taken to be an offset from the
end of the array. This way, the index -1 accesses
the last element of an array. -2 accesses the element
immediately preceding the last element, and so on.
If a negative index reaches beyond the beginning of
an array, the result is a void value.
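The following sketch illustrates negative indexing on a
three-element array. It assumes that "a" is not otherwise
in use and relies on index assignment (see the section on
indexing in assignments) to build the array. The "//"
comments give the results expected from the rules above.
a[0] = 10;
a[1] = 20;
a[2] = 30;
print(a[-1], "\n");   // last element: 30
print(a[-3], "\n");   // first element: 10
x = a[-4];            // beyond the beginning: void value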
Struct values contain values indexed by identifiers.
A reference to a struct field consists of the operator
symbol "." (period) followed by an identifier.
When a struct index is evaluated, the preceding
expression is evaluated. If the result is not a
struct value, the result is a void value. Otherwise,
the index identifier is used as an element name
for the struct value. If the struct has an element
of that name, the value stored under that name is
the result of the indexing expression. If the struct
value does not have an element with the given name,
a void value is used as the result.
The following is an example of an expression that
uses "a" as the name of a struct variable and
indexes a field "name" off the variable's value:
a.name
Array and struct indices can be freely mixed.
Multiple array and struct indices can follow
each other. Evaluation proceeds from left to
right. The following are examples of expressions
with multiple indices:
a[2].foo[3][7].value
str.data[100]
a[0][1][2]
foo.bar.foobar
a[1].bar.foo[2]
The last example above would be evaluated as follows:
first the variable reference "a" would be evaluated.
If the resulting value is an array, the second element
of the array is accessed. If the result is a struct, the
field named "bar" is accessed. If the result is again a
struct, the field named "foo" is accessed. If this results
in an array value, the third element of that array is
accessed and used as the value for the whole expression.
If any value produced along the way does not have the
expected type (array or struct, depending on the kind of
indexing used), the result of the whole expression is a
void value.
2.5.4 Cast expressions
Cast expressions are used to convert values from one
type to another. A cast expression consists of an
opening parenthesis, followed by a type name, followed
by a closing parenthesis, followed by an expression. No
whitespace is allowed between the parentheses and the type name.
The result of a cast expression is obtained by first
computing the value of the inner expression and then
converting it to the type named in the cast expression.
If the value produced by the inner expression already
has the right type, it is directly used as the result
of the cast expression. Otherwise, the type conversion
rules given in the following sections are applied.
This is an example of a cast expression casting the
integer constant "1" to float:
(float) 1
2.5.4.1 Conversion to void
Since the void type has only one value, all values of
all other types are converted to that one value.
2.5.4.2 Conversion to bool
Converting a void value to bool results in the bool
value "false".
Converting an int value to bool results in the bool
value "false" if the int value is 0 (zero). Otherwise,
the result is the bool value "true".
Converting a float value to bool results in the bool
value "false" if the float value is 0.0 (zero). Otherwise,
the result is the bool value "true".
Converting a string value to bool results in the
bool value "false" if the string is empty (that is,
contains no characters). Otherwise, the result is
the bool value "true".
Converting an array value to bool results in the
bool value "false" if the array is empty (that is,
contains no elements). Otherwise, the result is the
bool value "true".
Converting a struct value to bool results in the
bool value "false" if the struct is empty (that is,
contains no fields or methods). Otherwise, the result
is the bool value "true".
Converting an fn value to bool results in the bool
value "true".
Converting a resource value to bool results in the
bool value "true".
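A few illustrative casts, with the results expected from
the rules above given in "//" comments:
(bool) ()       // false
(bool) 0        // false
(bool) 0.0      // false
(bool) ""       // false
(bool) "0"      // true, the string is not empty
(bool) 7        // true
Since guard expressions are converted by the same rules,
a guard expression such as "0" counts as "true".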
2.5.4.3 Conversion to int
Converting a void value to int results in the int value
0 (zero).
Converting a bool value to int results in the int value
0 (zero) if the bool value is "false". If the bool
value is "true", the resulting int value is 1 (one).
Converting a float value to int results in an int value
that corresponds to the integral part of the float
value. If the integral part of the float value cannot
be represented as an int, the resulting value is
undefined.
Converting a string value to int attempts to interpret
the string as an integer literal. Only an initial part
of the string consisting solely of digits is considered
for conversion.
Converting an array value to int results in an int
value that gives the number of elements in the array.
Converting a struct value to int results in an int
value that gives the number of elements in the struct.
Converting an fn value to int results in the int value
1 (one).
Converting a resource value to int results in the int
value 1 (one).
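A few illustrative casts, with the results expected from
the rules above given in "//" comments:
(int) ()        // 0
(int) true      // 1
(int) 3.7       // 3
(int) "42abc"   // 42, only the leading digits are used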
2.5.4.4 Conversion to float
Converting a void value to float results in the float value
0.0 (zero).
Converting a bool value to float results in the float value
0.0 (zero) if the bool value is "false". If the bool
value is "true", the resulting float value is 1.0 (one).
Converting an int value to float results in a float
value with the same integral value as the original int
value and no fractional part.
Converting a string value to float attempts to interpret
the string as a float literal. Only an initial part
of the string consisting solely of characters that can
occur in a float literal is considered for conversion.
Converting an array value to float results in a float
value that gives the number of elements in the array.
Converting a struct value to float results in a float
value that gives the number of elements in the struct.
Converting an fn value to float results in the float value
1.0 (one).
Converting a resource value to float results in the float
value 1.0 (one).
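A few illustrative casts, with the results expected from
the rules above given in "//" comments:
(float) ()        // 0.0
(float) false     // 0.0
(float) 7         // 7.0
(float) "2.5kg"   // 2.5, the trailing "kg" is ignored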
2.5.4.5 Conversion to string
Converting a void value to string results in an empty
string value.
Converting a bool value to string results in an empty
string value if the bool value is "false" or in a string
value containing the single character "1" (digit one) if
the bool value is "true".
Converting an int value to string results in a string
value containing the integer literal for the original
int value.
Converting a float value to string results in a string
value containing the float literal for the original
float value.
Converting an array value to string results in a string
value containing the word "Array".
Converting a struct value to string results in a string
value containing the word "Struct".
Converting an fn value to string results in a string
value containing the word "Function".
Converting a resource value to string results in a string
value containing the word "Resource".
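A few illustrative casts, with the results expected from
the rules above given in "//" comments. The last line
assumes that "a" is a variable holding an array value.
(string) ()       // "" (empty string)
(string) false    // "" (empty string)
(string) true     // "1"
(string) 42       // "42"
(string) a        // "Array"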
2.5.4.6 Conversion to array
Converting a non-array value to an array results in
a one-element array that contains the original value
at index 0 (zero).
2.5.4.7 Conversion to struct
Converting a non-struct value to a struct results in
a struct with a single field named "value" that
contains the original value.
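The two wrapping conversions above can be sketched as
follows. The cast syntax is assumed to be the one from
the section on cast expressions, and the variable names
are only illustrative:
list = (array) "x";
box = (struct) 42;
After these assignments, "list[0]" should refer to the
string "x" and "box.value" to the int value 42.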
2.5.4.8 Conversion to fn
Attempting to convert a non-fn value to fn is a fatal
error.
2.5.4.9 Conversion to resource
Attempting to convert a non-resource value to resource is
a fatal error.
2.5.5 Assignment expressions
An assignment expression is used to assign a value
to a variable. It consists of an identifier, followed
by the assignment operator "=" (equals sign),
followed by an expression.
Evaluation of an assignment expression evaluates the
inner expression and stores the result in the current
namespace, in the form of a variable with the name
given by the identifier in the assignment expression.
Any previous meaning of the same identifier is lost.
The assignment expression itself has the same result
value as the inner expression.
The following is an example expression that assigns
the float value "12.5" to a variable named "val":
val = 12.5
Note that if an exception is thrown while evaluating
the right side of an assignment, the assignment does
not take place and the variable retains its previous
value.
Since an assignment expression has the assigned
value as its own value, and assignment associates to
the right, it is possible to assign a value to
multiple variables with an expression like this:
a = b = 0
2.5.5.1 Indexing in assignments
Array and struct indices can be used in an assignment
expression just like they can be used in combination
with reference expressions. For example, the
following expression will assign the bool value "true"
to the fifth element of an array stored in the variable
"map":
map[4] = true
There is a difference from using indices in references,
though. The above example will force "map" to be
a variable of type array. If it is not an array before
the assignment, an empty array will be created on the
fly, the fifth element will be set to "true", and the
resulting array will be assigned to the variable "map".
In the same way, when struct indexing is used on
something that is not a struct, an empty struct value
will be created on the fly and substituted for the
original non-struct value.
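A small sketch of the struct case, using a made-up
variable "cfg" that starts out as an int:
cfg = 0;
cfg.name = "demo";
After the second assignment, "cfg" no longer holds the
int value 0. It has been replaced by a struct whose
field "name" contains the string "demo".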
If an assignment uses a negative array index that does
not fall into the bounds of the array, the effect is
to assign to the first element of the array.
Consider the following example:
a.foo.data[3] = 12
No matter what the value of the variable "a" is before
the assignment, the following will be true after the
assignment expression has been evaluated: "a" will be a
struct with at least the field "foo". The field "foo"
will itself be a struct with at least the field "data".
The field "data" will itself contain an array with
at least four elements, the one at index 3 containing the
int value 12. Values that already had the correct
type for the assignment are not disturbed: for example,
if the "data" field above already existed as an array
of ten elements, it would still be an array of ten elements
after the assignment; just the element at index 3 would
have been overwritten with an int value of 12.
If both an index and the outer assignment have side
effects on the same structure or array, the side
effects of the index expression are discarded after the
value of the index has been computed. In the following
example, the value of "s.sp" is not changed after
evaluation of the whole assignment expression:
s.stack[s.sp++] = 42
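To make the consequence concrete, here is the same
expression with an explicit starting value (the
variable "s" is created on the fly as described
earlier in this section):
s.sp = 0;
s.stack[s.sp++] = 42;
Following the rule above, element 0 of "s.stack" is
set to 42, but "s.sp" is still 0 afterwards. A script
that wants the index to advance could, as one possible
approach, split the work into two expressions so that
the increment is no longer part of the index:
s.stack[s.sp] = 42;
s.sp++;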
2.5.5.2 Combining assignments and operators
Instead of the plain assignment operator, the following
operators can also be used:
+= -= *= /= &= |=
^= <<= >>=
These are all composed of a normal operator symbol of the
Arena language and the assignment operator symbol. The
meaning of a special assignment is best explained by an
example. Consider this expression using a special
assignment operator:
a += 2
This expression behaves exactly the same as another, longer
expression:
a = a + 2
In effect, using a special assignment operator is exactly
the same as first referencing the target of the assignment,
combining the result with the operator and inner expression
given, and assigning the result to the target of the
assignment.
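Two more illustrations of the same equivalence, using
made-up variable names:
flags = 0x01;
flags |= 0x40;
total = 10;
total -= 3;
The second line behaves like "flags = flags | 0x40"
and leaves the value 0x41 in "flags". The last line
behaves like "total = total - 3" and leaves the
value 7 in "total".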
2.5.6 Function calls
Function calls are used to call library functions or
user-defined functions. A function call consists
of an identifier naming the function, followed by
a comma-separated list of expressions (the function
call arguments) in parentheses. The argument list
is allowed to be empty.
When a function call expression is evaluated, the
existence of the function is checked. If the identifier
name is not found in the current namespace or does not
refer to a function or fn variable, a fatal error is
generated. If the function is found, the number of
argument expressions is checked against the number of
arguments given in the function's prototype. It is a
fatal error to pass fewer arguments than present in the
prototype. It is allowed to pass more arguments; extra
arguments will be made available to a function's body as
described below.
When it has been determined that a function call is
valid as described above, the argument expressions are
evaluated. Argument expressions are evaluated from left
to right. The types of the resulting values are checked
against the function's prototype, as described in the
section about function definition statements (above, in
the chapter about statements).
If the argument type check succeeds, a new local
namespace is created. The values of the function's
arguments are then added to the new namespace as if
they were local variables assigned inside the
function's body. For example, consider a function
with the following prototype:
int mult(int x, int y)
When this function is called with the arguments 42 and
12, the local namespace of the function will contain an
int variable named "x" with initial value 42 and another
int variable named "y" with initial value 12.
In addition to the named arguments, the local int
variable "argc" is defined and is assigned the number of
arguments actually passed to the function. The variable
"argv" is also defined and contains an array filled with
copies of all function arguments. The function's body
can use these two variables to gain access to extra
parameters given in a call of the function, beyond those
named in the function's prototype.
When these preparations are complete, the function's
body is executed inside its own local namespace. If the
function body executes a return statement, the value
used in the statement becomes the result of the
function call expression. If the function does not
explicitly return a value, a void value is automatically
generated. The local namespace of the function is then
destroyed, which frees all local variables, including
the values of the function arguments.
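The use of "argc" and "argv" described above can be
sketched with a small function that returns the last
argument it was given. The function name "last" is
hypothetical, and the form of the return statement is
assumed from the section on user-defined functions:
int last(int first)
{
return argv[argc - 1];
}
x = last(10, 20, 30);
Inside this call, "first" is 10, "argc" is 3, and
"argv" contains the three values 10, 20, and 30, so
"x" should end up with the value 30.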
The following are examples of function call expressions:
printf("Hello World!\n");
array_merge(a, b, c);
versions()
my_func(12, "foo", 42);
The above rules mean that function arguments are passed
to the function as copies. For example, consider the
function call:
foo(a, b)
When this function call is evaluated, the variables "a" and
"b" are referenced and copies of their current values
are passed to the body of the function "foo". No matter what
the function does with its argument values, the values of
the variables "a" and "b" as stored in the namespace outside
of the function's body are not changed.
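A minimal sketch of this behavior, with a hypothetical
function "reset" that overwrites its argument:
void reset(int n)
{
n = 0;
}
count = 5;
reset(count);
The assignment inside "reset" only changes the local
variable "n". After the call, "count" still holds the
value 5.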
2.5.6.1 Passing arguments "by reference"
As detailed in the last section, function arguments are
normally passed into function bodies as copies. Even if
the argument expressions are variable references, a
function body cannot manipulate the variables themselves.
However, there is a special syntax for passing argument
expressions to a function that makes it possible for the
function's body to influence the value of variables that
are used as arguments. It consists of placing an ampersand
before variable reference expressions or indexed variable
reference expressions that are used as function arguments.
This is called passing "by reference", though Arena does
not exactly use references for this construct (the method
that Arena uses is called "copy-retract" or "copy-in
copy-out").
When a function call using this syntax is evaluated, the
normal function call semantics as described in the last
section are in effect. However, when the function's body
finishes executing, the language tries to update the
values of all arguments that were passed "by reference".
This is best explained by an example. Consider the following
function definition:
void swap(mixed a, mixed b)
{
c = a;
a = b;
b = c;
}
For example, this function might be called like this:
swap(&x, &y);
During the function call, the values of the variables "x"
and "y" are available inside the function's body as local
variables "a" and "b" (copy-in). When the function's code
has been executed, the language checks whether the local
variable "a" is still defined. If yes, its value is copied
into the variable "x" outside the function. The same
happens for local variable "b" and "y" outside the
function (copy-out). The order given here is for explanatory
purposes. The language takes care that the copy-out actions
happen atomically with regard to each other -- from the
script's point of view, all copy-out actions look as if
they happen at exactly the same time. For example, the
above example function might be called like this:
swap(&i, &a[i])
In this case, the array index used for the update of the
second variable will always be the same one that was
used for the actual argument value passed into the
function, even if the function changes its first argument.
If the same variable is passed "by reference" more than
once in a single function call, the value of the variable
after the function call is implementation-defined.
Note that passing "by reference" only works for arguments
named in the called function's prototype. It does not
work for arguments accessed via the special "argv" array.
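The difference between the two passing styles can be
sketched with a hypothetical function "bump" and an
indexed variable reference as the argument:
void bump(int n)
{
n = n + 1;
}
stats.hits = 0;
bump(&stats.hits);
bump(stats.hits);
The first call passes "stats.hits" with an ampersand,
so the final value of the local variable "n" is copied
out and "stats.hits" becomes 1. The second call passes
a plain copy, so "stats.hits" stays at 1.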
2.5.7 Basic rules for structure templates
Structure templates are used to construct values of the
struct datatype. This process is called creating an
instance of the template. Another use of a template is
to use a static reference, which means accessing
something inside the structure template without actually
creating an instance.
In both cases, the language needs to create concrete
versions of the abstract definitions given in the
template. This happens as follows: a new local namespace
is created. Inside this namespace, the definitions
given in the template are executed. Field definitions
with values are executed like assignment expressions.
Field definitions without values are executed like
assignment expressions that assign a void value. Method
definitions are executed as normal. The result is a
local namespace that contains all fields and methods
from the template with their default values.
If a template extends another template, the process
above is used recursively, depth-first. This means
the chain of templates extending each other is
searched until a template that does not extend another
is found. The definitions from that template are
evaluated first, followed by those in the template
that extends the first one, and so on until the
definitions from the template that started the process
are evaluated. This means definitions in a template
can override all fields and methods from another
template that it extends.
If the process was used to create a struct value, the
completed local namespace is then used to populate the
new struct value. If the process was used for evaluating
a static reference, the referenced member is copied and
the namespace discarded.
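As an illustration of the override order, consider
these two hypothetical templates and a constructor
call on the extending one:
template shape
{
name = "shape";
visible = true;
}
template square extends shape
{
name = "square";
}
obj = new square();
The definitions from "shape" are evaluated first and
those from "square" second. The resulting struct value
should therefore have the field "name" set to "square",
while "visible" keeps the default value "true" taken
from "shape".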
2.5.8 Constructor calls
Constructor call expressions are used to create
struct values from structure templates. A constructor call
consists of the keyword "new", followed by an identifier
naming a template, followed by a comma-separated list of
argument expressions enclosed in parentheses. The argument
list is allowed to be empty.
When a constructor call expression is evaluated,
the identifier is used to look for a structure template
definition in the local and global namespace. It is
a fatal error if none is found. If the template is
found, the initial values of a new struct value are
computed as described under "Basic rules for structure
templates", above.
If a constructor method is defined in the template,
it is called using the argument expressions given
as arguments in the constructor call expression. If
the template itself does not define a constructor
method but a template it extends does, the constructor
of the parent template is called instead. Consider
this example:
template foo
{
void foo()
{
print("this is foo\n");
}
}
template bar extends foo
{
i = 12;
}
When a constructor call is evaluated for template "bar",
the constructor method defined in the "foo" template
will be called. Note that it is legal for there to be
no constructor method to call at all.
Normal argument type checks take place for constructor
methods. Using an incorrect number of arguments or
arguments of unsuitable types results in a fatal error.
Values returned from a constructor (by use of a return
statement) are discarded.
During execution of the constructor method, a special
local variable "this" is defined. It contains a copy
of the struct value that is being constructed. It behaves
like a function argument passed "by reference", meaning
the constructor method's body can use it to access and
change elements in the struct value that is the result
of the whole constructor call expression.
Note that the argument expressions given in the constructor
call expression are only evaluated when a constructor
method is actually called. If no constructor method is
defined, the argument expressions are not evaluated.
At the end of the evaluation of a constructor call expression,
an additional element called "__template" is added to the
new struct value. It contains a string value with the name
of the template used to create the struct value.
As an example, the following structure template contains
a constructor method that will set a field called "i"
to the value of the first argument used in the
constructor call expression:
template foo
{
void foo(int x)
{
this.i = x;
}
}
The above example can be used in a constructor call
expression like this:
new foo(12)
The result is a value of type struct. This value will have
three elements: a field called "i" with the int value 12,
a method called "foo", and a field called "__template" that
contains the string value "foo".
2.5.9 Method calls
A method call works like a normal function call, but
refers to a function defined by a structure template or
contained in a struct value.
The conventions for argument evaluation, type checks
and namespaces are the same as for function calls,
described above.
2.5.9.1 Static method calls
A static method call is used to call a function
defined in a structure template. It consists of
an identifier naming a template, followed by the
characters "::" (double colon), followed by another
identifier naming the method, followed by an argument
list of expressions in parentheses.
It is a fatal error if the template named by the
first identifier is not defined in the current
namespace. It is also a fatal error if the named
template does not contain, either directly or via
inheritance from an extended template, a method
with the name given by the second identifier.
The following are examples of static method calls:
foo::bar(1, 2, 3)
input::check("foo", false)
login::logout()
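For a complete picture, here is a sketch of a template
together with a static call of one of its methods. The
names "util" and "twice" are made up for this example:
template util
{
int twice(int x)
{
return x * 2;
}
}
n = util::twice(21);
No instance of "util" is created by the call. The
variable "n" should hold the int value 42 afterwards.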
2.5.9.2 Dynamic method calls
A dynamic method call is used to call a method contained
in a struct value. It consists of appending a single
period, followed by an identifier and an argument list
of expressions in parentheses, to some other expression
that results in a struct value.
If a method call is appended to a non-struct value or the
named method does not exist in the struct value, a fatal
error is generated.
If the method exists and the arguments are compatible with
its prototype, the method's body is called as described
for normal functions. A special local variable called "this"
is also defined and contains a copy of the struct that
contains the called method. This variable can be used to
access fields and methods stored in the same struct value.
Changes to the variable "this" will be copied into the
real struct variable (if any) when the method body is
finished executing.
The following are examples of dynamic method calls (the
last is a method call applied to the result of a
previous constructor call):
foo.bar()
registry[512].files.destroy(2)
new foo().something("foo", 42)
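The copy-back behavior of "this" can be sketched with
a hypothetical template "counter" whose method changes
a field of the struct it is called on:
template counter
{
n = 0;
void inc()
{
this.n = this.n + 1;
}
}
c = new counter();
c.inc();
c.inc();
Each call changes the local copy in "this", and the
change is copied into the variable "c" when the method
body finishes. After both calls, "c.n" should be 2.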
2.5.10 Operators
Operators work a lot like functions, but instead of
names and argument lists they consist of an operator
symbol applied to one or more other expressions. Which
other expressions are combined by the operator
depends on the kind of operator, as described next.
A prefix operator expression affects a single inner
expression and consists of the operator symbol prefixed
to another expression.
An infix operator expression affects two inner
expressions and consists of the operator symbol
written between the two other expressions.
A postfix operator expression affects a single inner
expression and consists of the operator symbol suffixed
to another expression.
Operators work on different types of expressions. All
operators automatically cast the values of their
argument expression to a type appropriate to the
operator, as described below for different kinds of
operators.
Not all operators evaluate all of their argument
expressions. The rules for evaluation are also described
below.
2.5.10.1 Math operators
Math operators are used to represent arithmetic
operations. They work with values of types int and
float.
A math operator always evaluates all its argument
expressions. If at least one of the argument
expressions results in a float value, both values are
cast to float before use. Otherwise both values are
cast to int.
There is only a single math prefix operator. It uses
the operator symbol "-" (minus sign) and denotes
negation of the value of the argument expression.
The following table lists the infix math operators and
their respective meanings.
+ addition
- subtraction
* multiplication
/ division
% remainder
** exponentiation
If the result of a math operator expression falls
outside of the domain of the type of its arguments
(after casting), the result is an undefined value
of the same type as the argument values.
The following are examples of math operator expressions:
-12
1 + 2
1.2 * 5
2 ** 10
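The casting rule for math operators can be seen in the
following sketch. The cast syntax in the second line is
assumed from the section on cast expressions:
a = 1 + 2.5;
b = (float) 7 / 2;
In the first line one argument is a float, so both are
cast to float and "a" receives the float value 3.5. In
the second line the cast applies to the 7 only, since
casts have higher precedence than any operator. The
division then sees one float argument, casts the 2 to
float as well, and "b" receives the float value 3.5.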
2.5.10.2 Boolean operators
Boolean operators are used to represent logic
computations on truth values. When a boolean operator
computes the value of one of its argument expressions,
the result is always cast to bool.
The prefix operator "!" (exclamation mark) denotes
logical negation. It always computes the value of its
argument expression.
The infix operator "||" (double vertical bar) denotes
logical disjunction ("or"). It always evaluates its
first, left argument expression. If the result is the
value "true", the result of the whole expression is also
"true" and the second argument expression is not
evaluated. Otherwise, the second argument expression is
evaluated and its bool value is the result of the whole
expression.
The infix operator "&&" (double ampersand) denotes
logical conjunction ("and"). It always evaluates its
first, left argument expression. If the result is the
value "false", the result of the whole expression is also
"false" and the second argument expression is not
evaluated. Otherwise, the second argument expression is
evaluated and its bool value is the result of the whole
expression.
The following are examples of boolean operator expressions:
!failed
x && y
(x || y) && !z
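A common use of the evaluation rule for "&&" is to
guard an expression that must not be evaluated in some
cases. A sketch with made-up int variables:
total = 50;
divisor = 0;
ok = (divisor != 0) && (total / divisor > 10);
Because the left argument is "false", the right
argument is never evaluated, so the division by zero
does not take place and "ok" is "false".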
2.5.10.3 Equality operators
Equality operators are used to compare values for
equality. The two equality operators always evaluate
both their argument values. No casting of the resulting
values takes place.
If both arguments to an equality operator are of type
array, struct, or resource, the result of the equality
operator expression is implementation-defined.
The operator "==" (double equals sign) denotes an
equality test. The value of the whole expression is "true"
if both argument values are of the same type and
represent the same value of that type. Otherwise the
value of the whole expression is "false".
For values of type fn, two values are considered equal
if and only if they refer to the same function body.
The operator "!=" (exclamation mark followed by equals
sign) denotes an inequality test. The value of the
whole expression is "true" if the argument values are
of different types or do not represent the same value
if they are of the same type. Otherwise the value of
the whole expression is "false".
The following are examples of equality operator
expressions:
1 != 2
x == "foo"
divisor != 0.0
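Because no casting takes place, values of different
types never compare as equal. The following sketch
shows the effect. The explicit cast in the last line
uses the syntax assumed from the section on cast
expressions:
a = (1 == 1);
b = (1 == 1.0);
c = ((float) 1 == 1.0);
Here "a" is "true", "b" is "false" because an int is
compared with a float, and "c" is "true" because both
arguments are float values after the cast.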
2.5.10.4 Order operators
Order operators are used to compare the ordering of
two values with respect to each other. An order
operator always evaluates both of its argument
expressions. If only one of the values is a literal
constant, the other value is cast to the same type.
Otherwise, the second value is cast to the type of
the first value (the first value is the one produced
by the argument expression on the left of the operator
symbol).
Possible result values of an order operator expression
are "true" and "false", depending on whether the
ordering the expression checks for is present for the
argument values.
Ordering of void values is always "false" by convention
since there is only one value in the datatype.
Ordering of bool values is such that the value "false"
is smaller than "true", but not equal.
Ordering of int values is the same as for whole numbers
in mathematics.
Ordering of float values is the same as for rational
numbers in mathematics.
Ordering of string values is such that the bytes
forming the string are compared from left to right,
interpreting them as numbers in the range 0-255. The
comparison stops as soon as one of the bytes is
smaller or larger than the other one. The string with
the larger byte is considered to be larger than the other.
If both bytes are the same, the comparison moves on to
the next byte in both strings. If this process reaches
the end of exactly one of the strings, that string is
considered to be the smaller of the two. If the process
reaches the end of both strings at the same time, the
strings are considered equal.
Ordering of array, struct, fn, and resource values is
implementation-defined.
The following table lists all order operators and
the condition that they check for.
<   left value smaller than right value
>   left value larger than right value
<=  left value smaller than or equal to right value
>=  left value larger than or equal to right value
The following are examples of order operator
expressions:
a < b
x >= 10
epsilon < 0.01
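The casting rules can lead to different results for
what looks like the same comparison. In this sketch,
it is assumed that the string "10" converts to the int
value 10, as described in the section on conversion
to int:
a = "10";
b = "9";
r1 = (a < b);
r2 = (a < 9);
In the third line both values are strings, so they
are compared byte by byte. The byte for "1" is smaller
than the byte for "9", which makes "r1" "true". In the
last line only the right value is a literal constant,
so "a" is cast to int and the comparison of 10 with 9
makes "r2" "false".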
2.5.10.5 Bitwise operators
Bitwise operators are used to manipulate bits in int
values. A bitwise operator always evaluates all of
its argument expressions and casts their values to int.
The prefix operator "~" (tilde) denotes bitwise
negation of its argument value.
The prefix operator "++" (double plus sign) returns
the value of its argument expression increased by one.
If the argument is a reference expression or indexed
reference expression, the increased value is also
stored in the namespace in the same place that the
original value was obtained from.
The prefix operator "--" (double minus sign) returns
the value of its argument expression decreased by one.
If the argument is a reference expression or indexed
reference expression, the decreased value is also
stored in the namespace in the same place that the
original value was obtained from.
The infix operator "|" (vertical bar) computes the
bitwise "or" of its argument values. This means bits
set in either of the argument values will be set in the
result value.
The infix operator "&" (ampersand) computes the bitwise
"and" of its argument values. This means only bits set
in both the argument values will be set in the result
value.
The infix operator "^" (caret) computes the bitwise
"exclusive or" of its argument values. This means only
bits set in exactly one of the argument values will be
set in the result value.
The postfix operator "++" (double plus sign) returns
the value of its argument expression. In addition, if
the argument expression is a reference or indexed
reference expression, the value stored in the namespace
is increased by one. The previous value is returned
as result of the whole expression.
The postfix operator "--" (double minus sign) returns
the value of its argument expression. In addition, if
the argument expression is a reference or indexed
reference expression, the value stored in the namespace
is decreased by one. The previous value is returned
as result of the whole expression.
The following are examples of bitwise operator
expressions:
i++
flags & 0x40
x ^ y
--refcount
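The difference between the prefix and postfix forms
can be sketched as follows:
i = 5;
a = i++;
b = ++i;
The second line stores the previous value 5 in "a" and
leaves 6 in "i". The third line increases "i" to 7 and
stores that increased value in "b".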
2.5.10.6 Operator precedence
If multiple operators occur in one expression, the order
in which they are evaluated depends on the relative
precedence of the operators. Operators with higher
precedence are evaluated first.
If the same operator occurs multiple times in an
expression, the order of evaluation depends on the
associativity of the operator. If the operator is
left-associative, it is evaluated so that applications
proceed from left to right. For a right-associative
operator, applications proceed from right to left.
To change the order of evaluation or to use
more than one instance of a non-associative operator
in a single expression, the programmer can enclose
subexpressions in parentheses. Expressions inside
parentheses are evaluated first, independent of
any operators outside the parentheses.
The following table lists all operator symbols. Operators
listed at the top have lower precedence than those listed
below them. Operators listed on the same line have the
same precedence. Associativity is given on the same line
as the operator symbols it applies to.
Associativity   Operators
right           = += -= *= /= |= &= ^= <<= >>=
none            ?
right           ||
right           &&
right           !
none            == != < <= > >=
left            & | ^
left            + - (infix)
left            * / %
right           **
left            << >>
left            ~ - (prefix)
left            ++ --
Casts have higher precedence than any operator and
associate to the right.
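A few expressions that show precedence, parentheses,
and associativity at work:
a = 1 + 2 * 3;
b = (1 + 2) * 3;
c = 2 ** 3 ** 2;
The multiplication in the first line has higher
precedence than the addition, so "a" is 7. The
parentheses in the second line force the addition to
be evaluated first, so "b" is 9. Exponentiation is
right-associative, so the third line is evaluated as
2 ** (3 ** 2) and "c" is 512.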
2.5.11 Conditional expression
A conditional expression is the expression equivalent
to an if-else statement. It consists of an expression,
followed by a "?" (question mark) character, followed
by another expression, followed by a ":" (colon)
character, followed by a third expression.
When a conditional expression is evaluated, the value
of the first argument expression is evaluated and its
result value is cast to bool. If the result is "true",
the value of the second expression is evaluated and
used as the value of the whole expression. The third
expression is not evaluated. If the value of the first
expression is "false", the third expression is evaluated
and its value used as the value of the whole expression.
The second expression is not evaluated in that case.
The following are examples of conditional expressions:
x % 2 == 0 ? "even" : "odd"
x ? false : true
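A conditional expression is often used on the right
side of an assignment. A sketch with made-up variable
names:
count = 1;
label = (count == 1) ? "item" : "items";
The first expression is "true", so only the second
expression is evaluated and "label" receives the
string "item". The third expression is not evaluated.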
2.5.12 Source file and line expressions
Source file and line expressions are used to refer to
the script they appear in. They are mostly useful for
printing error messages annotated with script source
code locations.
The expression "__FILE__" is evaluated to a string
value that contains the name of the script file that
the expression appears in.
The expression "__LINE__" is evaluated to an int value
that gives the line number that the expression appears
on, relative to the script file that it appears in.
2.5.13 Anonymous functions
An anonymous function is a function that does not have
a name. Such a function cannot be defined by use of a
function definition statement since that mandates an
identifier to be used as the function's name. Instead,
an anonymo