Last change on this file since 36657 was 35383, checked in by Kon Lovett, 2 years ago

in release now

== Programming for Performance

=== Introduction
Compiled CHICKEN code can be pretty fast: depending on programming
style and problem set, CHICKEN outperforms most Scheme systems,
provided the tools and options available to speed up your code are
used wisely.

First of all, a disclaimer: standard R5RS Scheme cannot be executed
efficiently. Period. Every implementation must bend the semantics
of the language in more or less subtle ways to do a minimum of
optimization, and there are several approaches to doing this:
* drop expensive operations from the language - this is what Bigloo does: it
only supports continuations when they are explicitly enabled, and doing so impacts overall performance substantially

* another expensive language feature is full support for optimized tail calls (Bigloo avoids this as well)

* generic arithmetic requires dispatching according to the types of the arguments of an operation - type-specific
(monomorphic) operations like {{fx+}} can reduce the overhead

* perform flow-analysis or type-inference - the latter is usually somewhat stricter and may not allow variables to change
their type during the execution of the code

* do not allow redefinition of standard procedures - this is the most crucial requirement, as all further special treatment
of primitive operations depends on the compiler being able to assume that {{+}} still means addition everywhere
That last point is important: nearly all Scheme implementations
provide some way to make sure primitives are known, either by using a
module system (CHICKEN, Racket, Bigloo), performing heavyweight
flow analysis (Stalin), or by using declarations (as is also possible in
CHICKEN). An interesting way of dealing with this problem is used by
Gambit: "speculative inlining" emits checks that make sure a variable
naming a standard procedure has not been redefined. This allows both
redefinitions and specialization of primitive calls, but may lead to
bloat of the executable and has a certain runtime cost.
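Conceptually, the check emitted by such a scheme looks roughly like this (a sketch only, not actual Gambit output; {{original-+}} and {{inline-fx+}} are hypothetical names standing for the saved binding and the specialized code path):

<enscript highlight=scheme>
;; Sketch: how a call like (+ a b) might be compiled under
;; speculative inlining (hypothetical names, not real Gambit code).
(if (eq? + original-+)      ; has + been redefined?
    (inline-fx+ a b)        ; fast, specialized path
    (+ a b))                ; generic call through the variable
</enscript>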
=== Optimization options
If you want to make sure your code runs fast, the first measure should
be to crank up the optimization level. There are 5 levels, where each
level enables an additional set of optimizations:

; Level 1 ("{{-O1}}") : optimize small, non-allocating procedures when generating C code

; Level 2 ("{{-O2}}") : perform general inlining of small known procedures

; Level 3 ("{{-O3}}") : assume definitions in the compilation unit are not modified from other compilation units - this allows inlining toplevel procedures, provided they are small enough; additionally, "specialization" is enabled (see below)

; Level 4 ("{{-O4}}") : compile in unsafe mode, so runtime checks are removed, which will result in undefined behaviour (crashes) if uncaught errors occur at runtime

; Level 5 ("{{-O5}}") : do everything to get the fastest code possible: disable interrupts (threads + signal handling), remove all debugging information and compile in "block" mode
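For example, assuming a source file named {{program.scm}} (a hypothetical name), typical invocations of the {{csc}} compiler driver might look like this:

  csc -O2 program.scm       # general inlining, safe code
  csc -O3 -d0 program.scm   # plus specialization, no trace code

Note that higher levels trade safety and debuggability for speed, so it is usually best to develop at a low level and raise it once the code is known to be correct.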
The compiler will by default always assume that standard procedures (and a
set of so-called "extended" procedures) will not be redefined; see the
[[/man/4/faq|FAQ]] for a list of these procedures. The compiler will
also do a good deal of copy-propagation, constant folding and
value-propagation, together with inlining of known procedures that are
only called at one location in the code. These optimizations never
change the semantics and so will not alter the behaviour of the
program or conflict with R5RS.
Side note: the default debugging settings will emit "trace" code that
keeps a call-trace during execution of the program to provide more
usable error messages. Compile with {{-d1}} or {{-d0}} to avoid this,
as it has a runtime cost.
The so-called "block" mode compiles in a "closed world" manner: by
compiling in this mode you declare that the toplevel definitions are
not visible from beyond the currently compiled file. This improves
inlining and allows unused code to be removed.
=== Compiler syntax
Using syntax to perform hand-inlining is a common way of speeding
up Scheme or Lisp code. CHICKEN provides {{define-inline}}, which is
slightly more convenient, and additionally supports "compiler syntax", similar to
the compiler macros available in Common Lisp. Compiler syntax means
you define one or more rewrite rules that should be applied
to (explicit) calls to certain functions. As an example, consider
the case of having two functions: one that packs up an
argument and a complementary function that does the opposite.
<enscript highlight=scheme>
  (define-compiler-syntax unpack
    (syntax-rules (pack)
      ((_ (pack x)) x)))

  (define-compiler-syntax pack
    (syntax-rules (unpack)
      ((_ (unpack x)) x)))
</enscript>
A call of the form

  (pack (unpack foo))

will be changed to simply

  foo
Note that when no rule matches, the call is compiled as it originally
appeared in the code.  Compiler syntax may be defined with a
procedural macro using {{er-macro-transformer}}; in this case,
returning the original form (being {{eq?}}) will trigger default
processing.  What can be done with this device is limited to
syntactically visible special cases, but macro-generated
code in particular may expose many regularities that can be easily handled in this
manner. Compiler syntax cannot be exported and is always local to the
current compilation unit.
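Another common use of this device (a sketch, not part of CHICKEN's library) is fusing nested {{map}} calls so the intermediate list is never built and the input is traversed only once:

<enscript highlight=scheme>
;; Sketch: rewrite (map f (map g lst)) into a single traversal.
;; Calls that do not match the pattern compile unchanged.
(define-compiler-syntax map
  (syntax-rules (map)
    ((_ f (map g lst))
     (map (lambda (x) (f (g x))) lst))))
</enscript>

As with {{pack}}/{{unpack}} above, the rewrite only fires on syntactically visible nested calls; passing {{map}} as a first-class value is unaffected.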
=== Type declarations and specialization
Scheme is a dynamically typed language: values carry their type
information and type-checks are usually done at run-time, which will
always result in executables that run slower than their counterparts
written in C, C++, ML or another statically typed
language. Monomorphic operations specific to a certain value type
still need to check that their arguments are of the correct type, as long
as type information is not available statically, at compile-time.
CHICKEN supports a basic form of "local" flow-analysis that attempts
to maintain type information for local bindings and subexpressions of
a procedure body. This analysis must naturally be conservative, as
global mutation and side effects may break all assumptions about
variable and expression types. Based on that type information, certain
type-related errors can be caught, like passing an argument of a wrong
type to a known library procedure. The current version of
CHICKEN uses this information to replace calls to known library
procedures with more efficient procedure calls, or performs the
equivalent code directly in-line, a process called "specialization".
The user can assist the compiler by giving type information via
declarations, e.g.:
<enscript highlight=scheme>
  (: mumble (float -> (list float string)))
  (define (mumble num) ...)
</enscript>
Here we declare {{mumble}} to be a procedure of one argument, a
floating-point number, returning a list of two elements (a
floating-point number and a string). The type syntax and declaration
form is similar to the one used by Typed [[Racket]], but is purely
optional and does not treat the code as a separate dialect or
sub-language. The type system is straightforward, with support for
basic types, sequence types, a "union" ({{or}}) type and qualified
types with type variables and optional constraints.
Here is an example, the internal declaration of the type of the
{{sort}} library procedure in the {{data-structures}} unit:

  (: sort (forall (e (s (or (vector-of e) (list-of e))))
            (s (e e -> *) -> s)))

{{sort}} takes a sequence, which must be either a vector or a list,
and a procedure taking two arguments of the element type of the
argument sequence. It will return a result of the same type. As
can be seen here, types can be pretty elaborate, so abbreviations
may be defined using {{define-type}}.
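Such abbreviations might look like this (hypothetical examples; {{point}} and {{distance}} are not predefined names):

<enscript highlight=scheme>
;; Abbreviate a compound type once, then use it in declarations:
(define-type point (pair float float))

(: distance (point point -> float))
(define (distance a b) ...)
</enscript>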
Type declarations can be exposed for use by other compilation
units. The compiler option {{-emit-type-file}} writes the locally
declared type information to a file (usually having the "{{.types}}"
extension). Loading this file explicitly with the {{-types}} option or
implicitly via the {{use}} or {{require-extension}} forms will make
the type information available to client code. This means you can
simply install an {{EXTENSIONNAME.types}} file along with your
extension, and type information will be available to provide more
opportunities to catch statically detectable type errors or to speed
up user code by taking advantage of known result and argument types.
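For example, with a hypothetical extension source file {{mylib.scm}} and a client {{client.scm}}, the flow might look like this:

  csc -emit-type-file mylib.types -s mylib.scm   # write declared types alongside the extension
  csc -types mylib.types client.scm              # load them when compiling the client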
It is even possible to hook into the specialization machinery by
using {{define-specialization}}, a feature similar to compiler syntax
but matching on type information, similar to C++ templates:
<enscript highlight=scheme>
  (define-specialization (mumble (x fixnum))
    (list x (number->string x)))
</enscript>
Here we declare that a call to {{mumble}}, when invoked with a fixnum
argument (this must be a direct, syntactically visible call), will be
replaced with the given expression (actually a call to a hidden global
procedure, so no code duplication will take place, unless the body
is small enough to be subject to inlining).
The flow-analysis algorithm takes care not to introduce "false
positives" and will usually infer less information than might at first
sight be available. Explicit assignments make the analysis more
difficult, as do variables that have different types during the
lifetime of the program. The compiler tries hard not to emit unsafe code
in case type declarations are violated, but in the end it may fail to do
so, and it is the responsibility of the programmer not to declare
variables incorrectly. Using the {{-strict-types}} compiler option
goes one step further and declares that variables have fixed types
and declarations will not be violated. The compiler will in this
case have a maximum of type information and can try to optimize
more optimistically.
For an example of record-type definitions enhanced with type
information, see [[/egg/typed-records|typed-records]], which replaces
the usual record definition forms like {{define-record}} with typed
variants that give much more type information to the compiler,
improving error reporting and runtime performance.
Compiler syntax and user-defined specializations allow manipulating
code heavily, up to the point of completely changing the meaning of the
original program. This can be used to make code completely inscrutable,
but also allows tuning code without touching it, something that may be
useful when you want to incorporate upstream code unchanged into a
project.
=== Functors
"Functors" can be used to define generic code for specialization
over a set of bindings. Say you have a general algorithm and
want to specialize it for various representations of data. A
functor defines a higher-order module: a module that can take
other modules as arguments. So, for example:
<enscript highlight=scheme>
(functor (dot-product (VS (vref vlen add multiply zero))) (dot-product)
  (import scheme chicken VS)
  (define (dot-product a b)
    (let ((alen (vlen a))
          (blen (vlen b)))
      (assert (= alen blen))
      (do ((i 0 (fx+ i 1))
           (r (zero) (add r (multiply (vref a i) (vref b i)))))
          ((fx>= i alen) r)))))

;; "instantiate" the functor with a given implementation
(module f64dp = dot-product
  (import scheme chicken)
  (use srfi-4)
  (define vlen f64vector-length)
  (define vref f64vector-ref)
  (define (zero) 0.0)
  (define multiply fp*)
  (define add fp+))
</enscript>
The {{dot-product}} functor takes a module that exports a number of
operations and implements the dot product of two vectors. The second
form above combines the definition of a module with the instantiation
of the functor (see the manual for more information). The final result
will be a module named {{f64dp}} that defines the dot product of
SRFI-4 f64vectors. Functors are somewhat like code-generating
factories. Note that nothing prevents the imports of the
{{dot-product}} functor from being syntax.
=== Programming style
The style of coding naturally has a great impact on what the compiler
is able to do with the code. As already described,
assignments make flow-analysis harder and should be
avoided. Monomorphic operators make type information more explicit and
also make the code easier to understand (because reading code means
mentally keeping track of value types, however abstract they may be -
using dynamically typed languages means doing the type-inference in
your head, a price you pay for more expressivity). Keep related
definitions together, preferably in the same compilation unit, to give
more opportunities for inlining. For the same reason, hide unexported
variables (for example by using modules), which means they cannot be
destructively modified from elsewhere. Non-tail procedure calls in
CHICKEN are slightly more expensive than tail calls (while the latter
is as costly as a raw C function call, the former needs allocation of
a closure record), so the more the compiler is able to find out from
the information you give it, the better.
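To illustrate the monomorphic-operator point (a sketch; {{sum}} and {{sum/fx}} are hypothetical names, and {{foldl}} is from the {{data-structures}} unit):

<enscript highlight=scheme>
;; Generic: + dispatches on the argument types at runtime.
(define (sum lst) (foldl + 0 lst))

;; Monomorphic: fx+ assumes fixnum arguments, avoiding the
;; dispatch entirely (and giving the flow analysis more to work with).
(define (sum/fx lst) (foldl fx+ 0 lst))
</enscript>

The fixnum variant is only correct if the list really contains fixnums; in unsafe mode a violated assumption goes undetected, so such operators belong where the types are known.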
=== Conclusion
To find out more ways of speeding up your code, check out
[[/egg/crunch|CRUNCH]], [[/egg/fast-generic|fast-generic]] or
[[/egg/record-variants|record-variants]]. Scheme is so flexible that
there are countless possibilities to extend the language with forms
that provide more opportunities for optimizing code, either manually or
by assisting the compiler. CHICKEN tries to expose much of the
internal machinery to the user, but this does not automagically make
your code run as fast as idiomatic C. There are a lot of implicit
assumptions that apply in Scheme code, assumptions that may not be
that obvious at first glance (genericity, global redefinitions,
destructive mutations, {{eval}}, continuations, etc.).  Experience
with the system and the language is very important, so: EXPERIMENT!
There are a lot of debugging options to peek at what the compiler is
making of your code inside the various stages that it goes
through. Use them, and don't be afraid of asking on the CHICKEN mailing
lists if you have a question.