Opened 8 years ago

Closed 7 years ago

#1322 closed defect (wontfix)

Locale can influence how CHICKEN reads numbers

Reported by: sjamaan
Owned by:
Priority: major
Milestone: 4.12.0
Component: core libraries
Version: 4.11.0
Keywords: number parsing
Cc:
Estimated difficulty: hard

Description (last modified by sjamaan)

Because CHICKEN uses the libc strtol/strtoll and strtod functions when reading flonums and fixnums, locale settings may influence how CHICKEN reads numbers, especially in decode_literal.

Hugo Arregui provided the following simple test:

;; Compile this with the -embedded option, since it defines its own main()
(import chicken scheme foreign)

#>
#include <locale.h>

int main(int argc, char** argv) {
   setlocale(LC_NUMERIC, "es_AR.UTF-8");
   CHICKEN_run(C_toplevel);
   return 0;
}
<#

(return-to-host)

This fails because the runtime system contains several encoded floating-point literals, which are no longer read correctly once LC_NUMERIC has been changed. Also note that strtod might incorrectly "parse" a floating-point number like 1.002 if that string happens to be valid in the current locale, for instance as a number written with thousands separators.
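To make the failure mode concrete, here is a minimal sketch that reports how many characters strtod() consumes from a string under a given LC_NUMERIC locale. The helper name consumed_under_locale is hypothetical and not part of CHICKEN. It assumes nothing about which locales are installed and returns -1 when setlocale() rejects the name.

```c
#include <locale.h>
#include <stdlib.h>

/* Hypothetical helper (not part of CHICKEN): returns the number of
   characters strtod() consumes from `text` while LC_NUMERIC is set to
   `name`, or -1 if setlocale() does not accept that locale name. */
static long consumed_under_locale(const char *name, const char *text)
{
    char *end;

    if (setlocale(LC_NUMERIC, name) == NULL)
        return -1;

    (void)strtod(text, &end);

    /* Sketch only: reset to "C" instead of restoring the previous locale. */
    setlocale(LC_NUMERIC, "C");
    return (long)(end - text);
}
```

Under the "C" locale, consumed_under_locale("C", "1.002") covers the whole string. Under a locale whose decimal point is a comma, such as the es_AR.UTF-8 locale from the test above, strtod() would be expected to stop at the "." and consume less, although the exact behaviour depends on the libc in use.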

Parsing floating-point numbers ourselves in C is going to be pretty damn tricky, so we might just use setlocale() to switch the locale to "C" before parsing and restore the previous locale afterwards. I have no idea what the effects are of calling these functions often in the same program, or whether there's a performance impact: the implementation might load the strings or formatting rules for the locale on the fly every single time, since it will have been designed for "normal" programs in which setlocale() is called only a handful of times.
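A minimal sketch of the save/switch/restore idea, assuming a hypothetical wrapper named strtod_c_locale (nothing like this exists in runtime.c as far as this ticket states). It only touches LC_NUMERIC and copies the current locale name, because the string returned by setlocale() may be overwritten by a later call.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper: parse `s` with strtod() under the "C" locale,
   then restore whatever LC_NUMERIC locale was active before. */
static double strtod_c_locale(const char *s, char **end)
{
    const char *cur;
    char *saved = NULL;
    double result;

    assert(s != NULL);

    /* Passing NULL queries the current locale without changing it. */
    cur = setlocale(LC_NUMERIC, NULL);
    if (cur != NULL) {
        /* Copy the name: a later setlocale() call may overwrite it. */
        saved = malloc(strlen(cur) + 1);
        if (saved != NULL)
            strcpy(saved, cur);
    }

    setlocale(LC_NUMERIC, "C");
    result = strtod(s, end);

    /* Restore the previous locale if we managed to save it. */
    if (saved != NULL) {
        setlocale(LC_NUMERIC, saved);
        free(saved);
    }
    return result;
}
```

This does nothing to answer the performance question, and setlocale() changes process-wide state, so the sketch is not safe if other threads use locale-dependent functions at the same time.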

See also https://github.com/JuliaLang/julia/pull/5988 for an example.

Change History (4)

comment:1 Changed 8 years ago by sjamaan

Component: unknown → core libraries

comment:2 Changed 8 years ago by sjamaan

Description: modified (diff)

comment:3 Changed 8 years ago by sjamaan

Note that this particular situation is already fixed in CHICKEN 5; we simply encode flonums as a packed byte sequence, and "large" fixnums (> 30 bits) as bignums, which will be simplified to fixnums after reading. The bignum reader doesn't use strtod. There is still some compatibility code in runtime.c that triggers the old code path. This is to make it possible to compile CHICKEN 5 through a boot-chicken with CHICKEN 4.
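To illustrate why a packed byte encoding sidesteps the locale entirely, here is a minimal sketch that stores a double as eight big-endian bytes and reads it back. This is not CHICKEN 5's actual literal format; the function names and the byte order are assumptions, and the code assumes an 8-byte IEEE 754 double with the same endianness as uint64_t.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical encoder: write the bit pattern of `d` to `out`,
   most significant byte first. No text parsing is involved. */
static void pack_double(double d, unsigned char out[8])
{
    uint64_t bits;
    int i;

    memcpy(&bits, &d, sizeof bits);
    for (i = 0; i < 8; i++)
        out[i] = (unsigned char)(bits >> (56 - 8 * i));
}

/* Hypothetical decoder: rebuild the double from eight big-endian bytes. */
static double unpack_double(const unsigned char in[8])
{
    uint64_t bits = 0;
    double d;
    int i;

    for (i = 0; i < 8; i++)
        bits = (bits << 8) | in[i];
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

Because the round trip never goes through strtod() or any other locale-aware function, the value read back is bit-for-bit the value that was written, whatever LC_NUMERIC is set to.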

Regardless of this being fixed in CHICKEN 5, there could still be issues lurking due to locale mismatch; we should really try to figure out a way to catch these stupid bugs :(

comment:4 Changed 7 years ago by sjamaan

Resolution: wontfix
Status: new → closed

It's probably not worth fixing this in the 4 series.
