source: project/wiki/programming-for-performance @ 35556

Last change on this file since 35556 was 35383, checked in by Kon Lovett, 23 months ago

in release now

File size: 13.9 KB
Line 
1== Programming for Performance
2
3
4=== Introduction
5
6Compiled CHICKEN code can be pretty fast - depending on programming
7style and problem set CHICKEN outperforms most Scheme systems,
8provided the tools and options available to speed up your code are
9used in a wise manner.
10
11First of all, a disclaimer: standard R5RS Scheme can not be executed
12efficiently. Period. Every implementation must bend the semantics
13of the language in more or less subtle ways to do a minimum of
14optimization and there are several approaches to do this:
15
16* drop expensive operations from the language - this is what [[http://www-sop.inria.fr/mimosa/fp/Bigloo/|Bigloo]] does: it
17only supports continuations when they are explicitly enabled and doing so will impact overall performance substantially
18
19* another expensive language feature is support for optimizing tail calls (Bigloo avoids this as well, as does
20[[http://en.wikipedia.org/wiki/Stalin_(Scheme_implementation)|Stalin]])
21
22* generic arithmetic requires dispatching according to the types of the arguments of an operation - type-specific
23(monomorphic) operations like {{fx+}} can reduce the overhead
24
25* perform flow-analysis or type-inference - the latter is usually somewhat stricter and may not allow variables to change
26their type during the execution of the code
27
28* do not allow redefinition of standard procedures - this is the most crucial requirement, as all further special treatment
29of primitive operations depends on the compiler being able to assume that {{+}} still means addition everywhere
30
31That last point is important: nearly all Scheme implementations
32provide some way to make sure primitives are known, either by using a
33module system (CHICKEN, [[http://racket-lang.org/|Racket]], Bigloo),
34performing heavyweight flow analysis (Stalin), or by using
35declarations
36([[http://dynamo.iro.umontreal.ca/~gambit/wiki/index.php/Main_Page|Gambit]],
37CHICKEN). An interesting way of dealing with this problem is used by
38Gambit: "speculative inlining" emits checks that make sure a variable
39naming a standard procedure has not been redefined. This allows both
40redefinitions and specialization of primitive calls but may lead to
41bloat of the executable and has a certain runtime cost.
42
43=== Optimization options
44
45If you want to make sure your code runs fast, the first measure should
46be to crank up the optimization level. There are 5 levels where each
47level enables an additional set of optimizations:
48
49; Level 1 ("{{-O1}}") : optimize small, non-allocating procedures when generating C code
50
51; Level 2 ("{{-O2}}") : perform general inlining of small known procedures
52
53; Level 3 ("{{-O3}}") : assume definitions in the compilation unit are not modified from other compilation units - this allows inlining toplevel procedures, provided they are small enough; additionally "specialization" is enabled (see below)
54
55; Level 4 ("{{-O4}}") : compile in unsafe mode, so runtime-checks are removed which will result in undefined behaviour (crashes) if uncaught errors occur at runtime
56
57; Level 5 ("{{-O5}}") : do everything to get the fastest code possible, disabling interrupts (threads + signal handling), remove all debugging information and compile in "block" mode
58
59The compiler will by default always assume standard procedures (and a
60set of so called "extended" procedures) will not be redefined, see the
61[[/man/4/faq|FAQ]] for a list of these procedures. The compiler will
62also do a good deal of copy-propagation, constant folding and
63value-propagation, together with inlining of known procedures that are
64only called at one location in the code. These optimizations never
65change the semantics and so will not alter the behaviour of the
66program or conflict with R5RS.
67
68Side note: the default debugging settings will emit "trace" code that
69keeps a call-trace during execution of the program to provide more
70usable error messages. Compile with "-d1" or "-d0" to avoid this as
71it has a runtime cost.
72
73The so called "block" mode compiles in a "closed world" manner: by
74compiling in this mode you declare that the toplevel definitions are
75not visible from beyond the currently compiled file. This improves
76inlining and allows unused code to be removed.
77
78=== Compiler syntax
79
80Using syntax to perform hand-inlining is a common way of speeding
81up Scheme or Lisp code. CHICKEN provides {{define-inline}} which is
82slightly more convenient and supports "compiler syntax", similar to
83the compiler macros available in Common Lisp. Compiler syntax means
84you are defining one or more rewrite rules that should be applied
85to (explicit) calls to certain functions. As an example, consider
86the case of having two functions: one that packs up an
87argument and a complementary function that does the opposite.
88Defining
89
90<enscript highlight=scheme>
91  (define-compiler-syntax unpack
92    (syntax-rules (pack)
93      ((_ (pack x)) x)))
94
95  (define-compiler-syntax pack
96    (syntax-rules (unpack)
97      ((_ (unpack x)) x)))
98</enscript>
99
100A call of the form
101
102  (pack (unpack foo))
103
104will be changed to simply
105
106  foo
107
108Note that when no rule matches, the call is compiled as it originally
109appeared in the code.  Compiler syntax may be defined with a
110procedural macro using {{er-macro-transformer}}, in this case
111returning the original form (being {{eq?}}) will assume default
112processing.  What can be done with this device is limited to
113syntactically visible special cases, but particularly macro-generated
114code may expose many regularities that can be easily handled in this
115manner. Compiler syntax can not be exported and is always local to the
116current compilation unit.
117
118=== Type declarations and specialization
119
120Scheme is a dynamically typed language: values carry their type
121information and type-checks are usually done at run-time which will
122always result in executables that run slower than their counterpart
123written in C, C++, ML or another statically typed
124language. Monomorphic operations specific to a certain value type
125still need to test their arguments of being of correct type, as long
126as type-information is not available statically, during compile-time.
127
128CHICKEN supports a basic form of "local" flow-analysis that attempts
129to maintain type-information for local bindings and subexpressions of
130a procedure body. This analysis must naturally be conservative as
131global mutation and side-effects may break all assumptions over
132variable and expression types. Based on that type information, certain
133type-related errors can be caught, like passing an argument of a wrong
134type to a known library procedure. The current version of
135CHICKEN uses this information to replace calls to known library
136procedures with more efficient procedure calls or performs the
137equivalent code directly in-line, a process called "specialization".
138
139The user can assists in giving the compiler type-information via
140declarations, e.g.:
141
142<enscript highlight=scheme>
143  (: mumble (float -> (list float string)))
144  (define (mumble num) ...)
145</enscript>
146
147Here we declared {{mumble}} to be a procedure of one argument, a
148floating-point number, and returning a list of two elements (a
149floating-point number and a string). The type-syntax and declaration
150form is similar to the one used by Typed [[Racket]], but is purely
151optional and does not treat the code as a separate dialect or
152sub-language. The type-system is straightforward with support for
153basic types, sequence types, a "union" ({{or}}) type and qualified
154types with type-variables and optional constraints.
155
156Here is an example, the internal declaration of the type of the
157{{sort}} library procedure in the {{data-structures}} unit:
158
159  (: sort (forall (e (s (or (vector-of e) (list-of e))))
160            (s (e e -> *) -> s)))
161
162{{sort}} takes a sequence which must be either a vector or a list,
163and a procedure taking two arguments of the element type of the
164argument sequence. It will return a result of the same type. As
165can be seen here, types can be pretty elaborate so abbreviations
166may be defined using {{define-type}}.
167
168Type-declarations can be exposed to be used by other compilation
169units. The compiler option {{-emit-type-file}} writes the locally
170declared type-information to a file (usually having the "{{.types}}"
171extension. Loading this file explicitly with the {{-types}} option or
172implicitly via the {{use}} or {{require-extension}} forms will make
173the type-information available to client code. This means you can
174simply install a {{EXTENSIONNAME.types}} file along with your
175extension and type-information will be available to provide more
176opportunitites to catch statically detectable type errors or to speed
177up user code by taking advantage of known result- and argument types.
178
179It is even possible to hook into the specialization machinery by
180using {{define-specialization}}, a feature similar to compiler-syntax
181but matching on type-information, similar to C++ templates:
182
183<enscript highlight=scheme>
184  (define-specialization (mumble (x fixnum))
185    (list x (number->string x)))
186</enscript>
187
188Here we declared that a call to {{mumble}}, when invoked with a fixnum
189argument (this must be a direct, syntactically visible call) will be
190replaced with the given expression (actually a call to a hidden global
191procedure, so no code-duplication will take place, unless the body
192is small enough to be subject to inlining).
193
194The flow-analysis algorithm takes care not to introduce "false
195positives" and will usually infer less information than might at first
196sight be available. Explicit assignments make the analysis more
197difficult as will variables that have different types during the
198lifetime of the program. The compiler tries hard not to emit unsafe code
199in case type-declarations are violated, but in the end may fail to do
200so and it is the reponsibility of the programmer to not declare
201variables incorrectly. Using the {{-strict-types}} compiler option
202will go one step further and declare that variables have fixed types
203and declarations will not be violated. The compiler will in this
204case have a maximum of type information and can try to optimize
205more optimistically.
206
207For an example of record-type definitions enhanced with type
208information, see [[/egg/typed-records|typed-records]] which replaces
209the usual record definition forms like {{define-record}} with typed
210variants that will give much more type-information to the compiler
211improving error reporting and runtime performance.
212
213Compiler-syntax and user-defined specializations allow manipulating
214code heavily up to the point of completely changing the meaning of the
215original program. This can be used to make code completely inscrutable
216but also allows tuning code without touching it, something that may be
217useful when you want to incorporate upstream code unchanged into a
218library.
219
220=== functors
221
222"Functors" can be used to define generic code for specialization
223over a set of bindings. Say you have a general algorithm and
224want to specialize it for various representations of data. A
225functor defines a higher order module: a module that can take
226other modules as arguments. So, for example:
227
228<enscript highlight=scheme>
229(functor (dot-product (VS (vref vlen add multiply zero))) (dot-product)
230  (import scheme chicken VS)
231  (define (dot-product a b)
232    (let ((alen (vlen a))
233          (blen (vlen b)))
234      (assert (= alen blen))
235      (do ((i 0 (fx+ i 1))
236           (r (zero) (add r (multiply (vref a i) (vref b i)))))
237          ((fx>= i alen) r)))))
238
239;; "instantiate" the functor with a given implementation
240(module f64dp = dot-product
241  (import scheme chicken)
242  (use srfi-4)
243  (define vlen f64vector-length)
244  (define vref f64vector-ref)
245  (define (zero) 0.0)
246  (define multiply fp*)
247  (define add fp+))
248</enscript>
249
250The {{dot-product}} functor takes a module that exports a number of
251operations and implements the scalar product of a vector. The second
252form above combines the definition of a module with the instantiation
253of the functor (see the manual for more information). The final result
254will be a module named {{f64dp}} that defines the scalar product of a
255SRFI-4 f64vector. Functors are somewhat like code generation
256factories. Note that nothing prevents the imports of the
257{{dot-product}} functor to be syntax.
258
259=== Programming style
260
261The style of coding naturally has a great impact on what the compiler
262is able to do with the code. As has been already described,
263assignments make flow-analysis harder and should be
264avoided. Monomorphic operators make type-information more explicit and
265also make the code easier to understand (because reading code means
266mentally keeping track of value types, however abstract they may be -
267using dynamically typed languages means doing the type-inference in
268your head, a price you pay for more expressivity). Keep related
269definitions together, preferrable in the same compilation unit to give
270more opportunities for inlining. For the same reason hide unexported
271variables (for example by using modules) which means they can not be
272destructively modified from elsewhere. Non-tail procedure calls in
273CHICKEN are slightly more expensive than tail calls (while the latter
274is as costly as a raw C function call, the former needs allocation of
275a closure record) so the more the compiler is able to find out from
276the information you give him, the better.
277
278=== Conclusion
279
280To find out more ways of speeding up your code, check out
281[[/egg/crunch|CRUNCH]], [[/egg/fast-generic|fast-generic]] or
282[[/egg/record-variants|record-variants]]. Scheme is so flexible that
283there are countless possibilities to extend the language with forms
284the provide more opportunities for optimizing code, either manually or
285by assisting the compiler. CHICKEN tries to expose much of the
286internal machinery to the user but this does not automagically make
287your code run as fast as idiomatic C. There are a lot of implicit
288assumptions that apply in Scheme code, assumptions that may not be
289that obvious at first glance (genericity, global redefinitions,
290destructive mutations, {{eval}}, continuations, etc.).  Experience
291with the system and the language is very important so: EXPERIMENT!
292There are a lot debugging options to peek at what the compiler is
293making of your code inside the various stages that it goes
294through. Use them and don't be afraid of asking on the CHICKEN mailing
295lists if you have a question.
Note: See TracBrowser for help on using the repository browser.