source: project/wiki/eggref/5/srfi-152 @ 39131

Last change on this file since 39131 was 39131, checked in by gnosis, 3 months ago

Initial revision of documentation for SRFI-152

== String Library (reduced)
=== Abstract
Scheme has an impoverished set of string-processing utilities, which is a problem for authors of portable code. This SRFI proposes a coherent and comprehensive set of string-processing procedures. It is a reduced version of SRFI 13 that has been aligned with SRFI 135, Immutable Texts. Unlike SRFI 13, it has been made consistent with the R5RS, R6RS, and R7RS-small string procedures.

For more information, see: [[https://srfi.schemers.org/srfi-152/|SRFI 152: String Library (reduced)]]
=== Procedure Index
Here is a list of the procedures provided by this SRFI:
==== Predicates
* string?
* string-null?
* string-every string-any
==== Constructors
* make-string string
* string-tabulate
* string-unfold string-unfold-right
==== Conversion
* string->vector string->list
* vector->string list->string
* reverse-list->string
==== Selection
* string-length
* string-ref
* substring
* string-copy
* string-take string-take-right
* string-drop string-drop-right
* string-pad string-pad-right
* string-trim string-trim-right string-trim-both
==== Replacement
* string-replace
==== Comparison
* string=? string-ci=?
* string<? string-ci<?
* string>? string-ci>?
* string<=? string-ci<=?
* string>=? string-ci>=?
==== Prefixes and suffixes
* string-prefix-length string-suffix-length
* string-prefix? string-suffix?
==== Searching
* string-index string-index-right
* string-skip string-skip-right
* string-contains string-contains-right
* string-take-while string-take-while-right
* string-drop-while string-drop-while-right
* string-break string-span
==== Concatenation
* string-append string-concatenate string-concatenate-reverse
* string-join
==== Fold and map and friends
* string-fold string-fold-right
* string-map string-for-each
* string-count
* string-filter string-remove
==== Replication and splitting
* string-replicate
* string-segment string-split
==== Input-output
* read-string write-string
==== Mutation
* string-set! string-fill! string-copy!
=== Rationale
This SRFI is based upon [[https://srfi.schemers.org/srfi-130/srfi-130.html|SRFI 130]], copying much of its structure and wording, but eliminating the concept of cursors. However, it is textually derived from [[https://srfi.schemers.org/srfi-135/srfi-135.html|SRFI 135]], in order to gain access to the editorial improvements made to the text of that SRFI, which was itself based on SRFI 130. Ultimately the origin of all these SRFIs is [[https://srfi.schemers.org/srfi-13/srfi-13.html|SRFI 13]].
==== This SRFI omits the following bells, whistles, and gongs of SRFI 13:
* the Knuth-Morris-Pratt string search algorithm (still used internally by the sample implementation but not exposed)
* case-insensitive operations, other than the ones in R7RS-small
* titlecase operations
* direct comparison of substrings
* mutation procedures, other than the ones in R7RS-small
* string reversal (even more pointless in a Unicode age)
* characters and SRFI 14 character sets as alternatives to predicates
==== In addition, this SRFI includes the string-segment and string-split procedures from other sources.
==== For completeness, string-take-while, string-drop-while, string-take-while-right, and string-drop-while-right are also provided.
==== There are no performance guarantees for any of the procedures in this SRFI.
==== The Scheme programming language does not expose the internal representation of strings.
Some implementations of Scheme use UTF-32 or a similar encoding, which makes string-length, string-ref, and string-set! run in O(1) time. Some implementations use UTF-16 or UTF-8, which save space at the expense of making string-ref take time proportional to the length of a string. Others allow only 256 characters, typically the Latin-1 repertoire.
==== Although Scheme's string data type allows portable code to use strings independently of their internal representation, the variation in performance between implementations has created a problem for programs that use long strings.
In some systems, long strings are inefficient with respect to space; in other systems, long strings are inefficient with respect to time. Consequently, this SRFI suggests that Scheme's mutable strings be used only for relatively short sequences of characters, while using the immutable texts defined by [[https://srfi.schemers.org/srfi-135/srfi-135.html|SRFI 135]] for long sequences of characters.
=== Specification
Procedures present in R5RS, R6RS, and R7RS-small are marked (R5RS). Procedures present in R5RS and R6RS but with additional arguments in R7RS-small are marked (R5RS+). Procedures present in R6RS and R7RS-small are marked (R6-R7RS). Procedures present in R6RS only are marked (R6RS). Procedures present in R7RS-small only are marked (R7RS-small).

Except as noted, the results returned from the procedures of this SRFI must be newly allocated strings. This is a change from the definition of SRFIs 13 and 130, though most Schemes do not support sharable strings in any case. However, the empty string need not be newly allocated.

The procedures of this SRFI follow a consistent naming scheme, and are consistent with the conventions developed in SRFI 1 and used in SRFI 13, SRFI 130, and SRFI 135. In particular, procedures that have left/right directional variants use no suffix to specify left-to-right operation, -right to specify right-to-left operation, and -both to specify both. One discrepancy between SRFI 1 and other SRFIs is in the tabulate procedure: SRFI 1's list-tabulate takes the length argument first, before the procedure, whereas all string SRFIs put the procedure first, in line with mapping and folding operations.

The order of common arguments is consistent across the different procedures. In particular, all procedures place the main string to be operated on first, with the exception of the mapping and folding procedures, which are consistent with R7RS-small and SRFI 1.

If a procedure's return value is said to be "unspecified," the procedure returns a single result whose value is unconstrained and might even vary from call to call.
==== Notation
===== In the following procedure specifications:
* A string argument is a string.
* A char argument is a character.
* An idx argument is an exact non-negative integer specifying a valid character index into a string.
The valid character indexes of a string {{string}} of length {{n}} are the exact integers idx satisfying {{0 <= idx < n}}.
* A {{k}} argument or result is a position:
an exact non-negative integer that is either a valid character index for one of the string arguments or is the length of a string argument.
* start and end arguments are positions specifying a half-open interval of indexes for a substring.
When omitted, start defaults to 0 and end to the length of the corresponding string argument. It is an error unless {{0 <= start <= end <= (string-length string)}}; the sample implementations detect that error and raise an exception.
* A {{len}} or {{nchars}} argument is an exact non-negative integer specifying some number of characters, usually the length of a string.
* A {{pred}} argument is a unary character predicate, taking a character as its one argument and returning a value that will be interpreted as true or false.
Unless noted otherwise, as with string-every and string-any, all predicates passed to procedures specified in this SRFI may be called in any order and any number of times. It is an error if {{pred}} has side effects or does not behave functionally (returning the same result whenever it is called with the same character); the sample implementation does not detect those errors.
* An obj argument may be any value at all.
===== It is an error to pass values that violate the specification above.
Arguments given in square brackets are optional. Unless otherwise noted in the text describing the procedure, any prefix of these optional arguments may be supplied, from zero arguments to the full list. When a procedure returns multiple values, this is shown by listing the return values in square brackets as well. So, for example, the procedure with signature

 halts? f [x init-store] → [boolean integer]

would take one {{(f)}}, two {{(f, x)}} or three {{(f, x, init-store)}} input arguments, and return two values, a boolean and an integer.

An argument followed by "..." means zero or more elements. So the procedure with the signature

 sum-squares x ...  → number

takes zero or more arguments {{(x ...)}}, while the procedure with signature

 spell-check doc dict[1] dict[2] ... → string-list

takes two required arguments {{(doc and dict[1])}} and zero or more optional arguments {{(dict[2] ...)}}.
==== Procedures
===== Predicates
<procedure>string? obj → boolean (R5RS)</procedure>
Is obj a string?

<procedure>string-null? string → boolean</procedure>
Is string the empty string?

<procedure>string-every pred string [start end] → value</procedure>

<procedure>string-any   pred string [start end] → value</procedure>

Checks to see if every/any character in string satisfies {{pred}}, proceeding from left {{(index start)}} to right {{(index end)}}. These procedures are short-circuiting: if {{pred}} returns false, {{string-every}} does not call {{pred}} on subsequent characters; if {{pred}} returns true, {{string-any}} does not call {{pred}} on subsequent characters. Both procedures are "witness-generating":
* If {{string-every}} is given an empty interval {{(with start = end)}}, it returns {{#t}}.
* If {{string-every}} returns true for a non-empty interval {{(with start < end)}}, the returned true value is the one returned by the final call to the predicate on {{(string-ref string (- end 1))}}.
* If {{string-any}} returns true, the returned true value is the one returned by the predicate.
* Note:
The names of these procedures do not end with a question mark. This indicates a general value is returned instead of a simple boolean (#t or #f).
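
The witness-generating behaviour is easiest to see with a predicate that returns something other than {{#t}}. The following is a minimal sketch, assuming the library can be imported as {{(srfi 152)}}; {{digit-or-false}} is a hypothetical helper, not part of the SRFI.

<enscript highlight="scheme">
(import (srfi 152))  ; assumption: the egg is importable under this name

;; Hypothetical predicate: returns the character itself as its true value.
(define (digit-or-false c) (and (char-numeric? c) c))

(string-any   digit-or-false "ab1c2")  ; => #\1  (first witness found)
(string-every digit-or-false "123")    ; => #\3  (value of the final call)
(string-every digit-or-false "12x")    ; => #f
(string-every char-numeric? "")        ; => #t   (empty interval)
(string-any   char-numeric? "")        ; => #f
</enscript>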
===== Constructors
<procedure>make-string len char → string (R5RS)</procedure>
Returns a string of the given length filled with the given character.

<procedure>string char ... → string (R5RS)</procedure>
Returns a string consisting of the given characters.

<procedure>string-tabulate proc len → string</procedure>
{{proc}} is a procedure that accepts an exact integer as its argument and returns a character. Constructs a string of size {{len}} by calling {{proc}} on each value from {{0}} (inclusive) to {{len}} (exclusive) to produce the corresponding element of the string. The order in which {{proc}} is called on those indexes is not specified.
* Rationale:
Although string-unfold is more general, string-tabulate is likely to run faster for the common special case it implements.

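As a small illustration of {{string-tabulate}}, the sketch below builds a string from an index-to-character procedure. It assumes the library can be imported as {{(srfi 152)}}; {{nth-letter}} is a hypothetical helper. Because the call order is unspecified, the helper depends only on its argument.

<enscript highlight="scheme">
(import (srfi 152))  ; assumption: the egg is importable under this name

;; Hypothetical helper: maps 0 to #\a, 1 to #\b, and so on.
(define (nth-letter i)
  (integer->char (+ i (char->integer #\a))))

(string-tabulate nth-letter 5)  ; => "abcde"
(string-tabulate nth-letter 0)  ; => ""
</enscript>
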
<procedure>string-unfold stop? mapper successor seed [base make-final] → string</procedure>
This is a fundamental constructor for strings.

<parameter>successor</parameter>
is used to generate a series of "seed" values from the initial seed:
 seed, (successor seed), (successor^2 seed), (successor^3 seed), ...

<parameter>stop?</parameter>
tells us when to stop: generation ends when it returns true for one of these seed values.

<parameter>mapper</parameter>
maps each seed value to the corresponding character(s) in the result string, which are assembled into that string in left-to-right order. It is an error for mapper to return anything other than a character or string.

<parameter>base</parameter>
is the optional initial/leftmost portion of the constructed string, which defaults to the empty string "". It is an error if base is anything other than a character or string.

<parameter>make-final</parameter>
is applied to the terminal seed value (on which {{stop?}} returns true) to produce the final/rightmost portion of the constructed string. It defaults to {{(lambda (x) "")}}. It is an error for make-final to return anything other than a character or string.
* {{string-unfold}} is a fairly powerful string constructor.
You can use it to convert a list to a string, read a port into a string, reverse a string, copy a string, and so forth.
======= Examples:
<enscript highlight="scheme">
(port->string p) = (string-unfold eof-object?
                                  values
                                  (lambda (x) (read-char p))
                                  (read-char p))

(list->string lis) = (string-unfold null? car cdr lis)

(string-tabulate f size) = (string-unfold (lambda (i) (= i size)) f add1 0)
</enscript>
======= To map f over a list lis, producing a string:
<enscript highlight="scheme">
(string-unfold null? (compose f car) cdr lis)
</enscript>

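The optional {{base}} and {{make-final}} arguments can be sketched as follows, assuming the library can be imported as {{(srfi 152)}}. The first call uses a character-returning mapper, the second a string-returning one.

<enscript highlight="scheme">
(import (srfi 152))  ; assumption: the egg is importable under this name

;; base "a" is the leftmost part; make-final supplies the rightmost part.
(string-unfold null? car cdr '(#\b #\c) "a" (lambda (seed) "d"))
    ; => "abcd"

;; The mapper may return whole strings instead of single characters.
(string-unfold (lambda (n) (> n 3))   ; stop? once the seed exceeds 3
               number->string         ; mapper
               (lambda (n) (+ n 1))   ; successor
               1                      ; initial seed
               "<"                    ; base
               (lambda (n) ">"))      ; make-final
    ; => "<123>"
</enscript>
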
Interested functional programmers may enjoy noting that string-fold-right and string-unfold are in some sense inverses. That is, given operations {{knull?}}, {{kar}}, {{kdr}}, and {{kons}}, and a value {{knil}} satisfying

<enscript highlight="scheme">
(kons (kar x) (kdr x)) = x  and  (knull? knil) = #t
</enscript>

then

<enscript highlight="scheme">
(string-fold-right kons knil (string-unfold knull? kar kdr x)) = x
</enscript>

and

<enscript highlight="scheme">
(string-unfold knull? kar kdr (string-fold-right kons knil string)) = string.
</enscript>

This combinator pattern is sometimes called an "anamorphism."
======= Note:
Implementations should not allow the size of strings created by string-unfold to be limited by limits on stack size.

<procedure>string-unfold-right stop? mapper successor seed [base make-final] → string</procedure>
This is a fundamental constructor for strings. It is the same as string-unfold except the results of mapper are assembled into the string in right-to-left order, base is the optional rightmost portion of the constructed string, and make-final produces the leftmost portion of the constructed string. If mapper returns a string, the string is prepended to the constructed string (without reversal).

<enscript highlight="scheme">
(string-unfold-right (lambda (n) (< n (char->integer #\A)))
                     (lambda (n) (char-downcase (integer->char n)))
                     (lambda (n) (- n 1))
                     (char->integer #\Z)
                     #\space
                     (lambda (n) " The English alphabet: "))
    => " The English alphabet: abcdefghijklmnopqrstuvwxyz "

(string-unfold-right null?
                     (lambda (x) (string  #\[ (car x) #\]))
                     cdr
                     '(#\a #\b #\c))
    => "[c][b][a]"
</enscript>
===== Conversion
<procedure>string->vector string [start end] → char-vector (R7RS-small)</procedure>

<procedure>string->list   string [start end] → char-list (R5RS+)</procedure>
These procedures return a newly allocated (unless empty) vector or list of the characters that make up the given substring.

<procedure>vector->string char-vector [start end] → string (R7RS-small)</procedure>

<procedure>list->string   char-list → string (R5RS)</procedure>
These procedures return a string containing the characters of the given (sub)vector or list. The behavior of the string will not be affected by subsequent mutation of the given vector or list.

<procedure>reverse-list->string char-list → string</procedure>
Semantically equivalent to (compose list->string reverse):

<enscript highlight="scheme">
(reverse-list->string '(#\a #\B #\c)) => "cBa"
</enscript>

This is a common idiom in the epilogue of string-processing loops that accumulate their result using a list in reverse order. (See also {{string-concatenate-reverse}} for the "chunked" variant.)
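
The accumulate-in-reverse idiom mentioned above might look like the following sketch. It assumes the library can be imported as {{(srfi 152)}}; {{keep-digits}} is a hypothetical helper written only for this illustration.

<enscript highlight="scheme">
(import (srfi 152))  ; assumption: the egg is importable under this name

;; Hypothetical helper: collect the digits of s, consing them onto an
;; accumulator (which therefore ends up in reverse order), then convert
;; the accumulator back to a string in a single step.
(define (keep-digits s)
  (let loop ((i 0) (acc '()))
    (cond ((= i (string-length s))
           (reverse-list->string acc))
          ((char-numeric? (string-ref s i))
           (loop (+ i 1) (cons (string-ref s i) acc)))
          (else
           (loop (+ i 1) acc)))))

(keep-digits "a1b22c333")  ; => "122333"
</enscript>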
===== Selection
<procedure>string-length string → len (R5RS)</procedure>
Returns the number of characters within the given string.

<procedure>string-ref string idx → char (R5RS)</procedure>
Returns character string[idx], using 0-origin indexing.

<procedure>substring    string start end → string (R5RS)</procedure>

<procedure>string-copy  string [start end] → string (R5RS+)</procedure>
These procedures return a string containing the characters of string beginning with index {{start}} (inclusive) and ending with index {{end}} (exclusive). The only difference is that {{substring}} requires all three arguments, whereas {{string-copy}} requires only one.

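A short sketch of the half-open {{[start,end)}} convention as used by {{substring}} and {{string-copy}}, assuming the library can be imported as {{(srfi 152)}}:

<enscript highlight="scheme">
(import (srfi 152))  ; assumption: the egg is importable under this name

(define s "hello world")

(substring s 0 5)        ; => "hello"  (indexes 0 through 4)
(string-copy s 6)        ; => "world"  (end defaults to the length)
(string-copy s 6 8)      ; => "wo"
(string-copy s)          ; => "hello world"
(eq? s (string-copy s))  ; => #f, the result is newly allocated
</enscript>
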
<procedure>string-take       string nchars → string</procedure>

<procedure>string-drop       string nchars → string</procedure>

<procedure>string-take-right string nchars → string</procedure>

<procedure>string-drop-right string nchars → string</procedure>
{{string-take}} returns a string containing the first {{nchars}} of string; {{string-drop}} returns a string containing all but the first {{nchars}} of string. {{string-take-right}} returns a string containing the last {{nchars}} of string; {{string-drop-right}} returns a string containing all but the last {{nchars}} of string.

<enscript highlight="scheme">
(string-take "Pete Szilagyi" 6) => "Pete S"
(string-drop "Pete Szilagyi" 6) => "zilagyi"

(string-take-right "Beta rules" 5) => "rules"
(string-drop-right "Beta rules" 5) => "Beta "
</enscript>

It is an error to take or drop more characters than are in the string:

<enscript highlight="scheme">
(string-take "foo" 37) => error
</enscript>

<procedure>string-pad       string len [char start end] → string</procedure>

<procedure>string-pad-right string len [char start end] → string</procedure>
Returns a string of length {{len}} composed of the characters drawn from the given subrange of string, padded on the left (right) by as many occurrences of the character char as needed. If string has more than {{len}} chars, it is truncated on the left (right) to length {{len}}. char defaults to {{#\space}}.

<enscript highlight="scheme">
(string-pad     "325" 5) => "  325"
(string-pad   "71325" 5) => "71325"
(string-pad "8871325" 5) => "71325"
</enscript>

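The examples above cover only {{string-pad}} with the default padding character. The following sketch, assuming the library can be imported as {{(srfi 152)}}, shows {{string-pad-right}} and an explicit {{char}} argument:

<enscript highlight="scheme">
(import (srfi 152))  ; assumption: the egg is importable under this name

(string-pad-right "325" 5)      ; => "325  "   (padded on the right)
(string-pad-right "8871325" 5)  ; => "88713"   (truncated on the right)
(string-pad "42" 5 #\0)         ; => "00042"   (explicit padding char)
(string-pad-right "abc" 6 #\.)  ; => "abc..."
</enscript>
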
<procedure>string-trim       string [pred start end] → string</procedure>

<procedure>string-trim-right string [pred start end] → string</procedure>

<procedure>string-trim-both  string [pred start end] → string</procedure>
Returns a string obtained from the given subrange of string by skipping over all characters on the left side / on the right side / on both sides that satisfy the second argument {{pred}}. {{pred}} defaults to {{char-whitespace?}}.

<enscript highlight="scheme">
(string-trim-both "  The outlook wasn't brilliant,  \n\r")
    => "The outlook wasn't brilliant,"
</enscript>
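
Trimming with a custom predicate can be sketched as follows, assuming the library can be imported as {{(srfi 152)}}; {{zero-char?}} is a hypothetical helper introduced for the illustration.

<enscript highlight="scheme">
(import (srfi 152))  ; assumption: the egg is importable under this name

;; Hypothetical predicate: true for the character #\0 only.
(define (zero-char? c) (char=? c #\0))

(string-trim       "000120" zero-char?)  ; => "120"
(string-trim-right "000120" zero-char?)  ; => "00012"
(string-trim-both  "000120" zero-char?)  ; => "12"
(string-trim "   padded  ")              ; => "padded  " (default pred)
</enscript>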
===== Replacement
<procedure>string-replace string1 string2 start1 end1 [start2 end2] → string</procedure>
Returns

<enscript highlight="scheme">
(string-append (substring string1 0 start1)
               (substring string2 start2 end2)
               (substring string1 end1 (string-length string1)))
</enscript>

That is, the segment of characters in {{string1}} from {{start1}} to {{end1}} is replaced by the segment of characters in {{string2}} from {{start2}} to {{end2}}. If {{start1=end1}}, this simply splices the characters drawn from {{string2}} into {{string1}} at that position.
======= Examples:
<enscript highlight="scheme">
(string-replace "The TCL programmer endured daily ridicule."
                "another miserable perl drone" 4 7 8 22)
    => "The miserable perl programmer endured daily ridicule."

(string-replace "It's easy to code it up in Scheme." "lots of fun" 5 9)
    => "It's lots of fun to code it up in Scheme."

(define (string-insert s i t) (string-replace s t i i))

(string-insert "It's easy to code it up in Scheme." 5 "really ")
    => "It's really easy to code it up in Scheme."

(define (string-set s i c) (string-replace s (string c) i (+ i 1)))

(string-set "String-ref runs in O(n) time." 21 #\1)
    => "String-ref runs in O(1) time."
</enscript>
===== Comparison
<procedure>string=? string1 string2 string3 ... → boolean (R5RS)</procedure>
Returns {{#t}} if all the strings have the same length and contain exactly the same characters in the same positions; otherwise returns {{#f}}.

<procedure>string<?  string1 string2 string3 ... → boolean (R5RS)</procedure>

<procedure>string>?  string1 string2 string3 ... → boolean (R5RS)</procedure>

<procedure>string<=? string1 string2 string3 ... → boolean (R5RS)</procedure>

<procedure>string>=? string1 string2 string3 ... → boolean (R5RS)</procedure>
These procedures return {{#t}} if their arguments are (respectively): monotonically increasing, monotonically decreasing, monotonically non-decreasing, or monotonically non-increasing.

These comparison predicates are required to be transitive.

These procedures compare strings in an implementation-defined way. One approach is to make them the lexicographic extensions to strings of the corresponding orderings on characters. In that case, {{string<?}} would be the lexicographic ordering on strings induced by the ordering {{char<?}} on characters, and if two strings differ in length but are the same up to the length of the shorter string, the shorter string would be considered to be lexicographically less than the longer string. However, implementations are also allowed to use more sophisticated locale-specific orderings.

In all cases, a pair of strings must satisfy exactly one of {{string<?}}, {{string=?}}, and {{string>?}}, must satisfy {{string<=?}} if and only if they do not satisfy {{string>?}}, and must satisfy {{string>=?}} if and only if they do not satisfy {{string<?}}.

<procedure>string-ci=? string1 string2 string3 ... → boolean (R5RS)</procedure>
Returns {{#t}} if, after calling {{string-foldcase}} on each of the arguments, all of the case-folded strings would have the same length and contain the same characters in the same positions; otherwise returns {{#f}}.

<procedure>string-ci<?  string1 string2 string3 ... → boolean (R5RS)</procedure>

<procedure>string-ci>?  string1 string2 string3 ... → boolean (R5RS)</procedure>

<procedure>string-ci<=? string1 string2 string3 ... → boolean (R5RS)</procedure>

<procedure>string-ci>=? string1 string2 string3 ... → boolean (R5RS)</procedure>
These procedures behave as though they had called {{string-foldcase}} on their arguments before applying the corresponding procedures without "-ci".
===== Prefixes and suffixes
<procedure>string-prefix-length string1 string2 [start1 end1 start2 end2] → integer</procedure>

<procedure>string-suffix-length string1 string2 [start1 end1 start2 end2] → integer</procedure>
Return the length of the longest common prefix/suffix of {{string1}} and {{string2}}. For prefixes, this is equivalent to their "mismatch index" (relative to the start indexes).

The optional start/end indexes restrict the comparison to the indicated substrings of {{string1}} and {{string2}}.

<procedure>string-prefix? string1 string2 [start1 end1 start2 end2] → boolean</procedure>

<procedure>string-suffix? string1 string2 [start1 end1 start2 end2] → boolean</procedure>
Is {{string1}} a prefix/suffix of {{string2}}?

The optional start/end indexes restrict the comparison to the indicated substrings of {{string1}} and {{string2}}.
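
A few illustrative calls, as a sketch that assumes the library can be imported as {{(srfi 152)}}:

<enscript highlight="scheme">
(import (srfi 152))  ; assumption: the egg is importable under this name

(string-prefix-length "prefix" "preface")   ; => 4  (common prefix "pref")
(string-suffix-length "walking" "talking")  ; => 6  (common suffix "alking")

(string-prefix? "pre" "prefix")      ; => #t
(string-suffix? "fix" "prefix")      ; => #t
(string-prefix? "prefix" "pre")      ; => #f
(string-prefix? "xxpre" "prefix" 2)  ; => #t, start1 = 2 selects "pre"
</enscript>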
===== Searching
<procedure>string-index       string pred [start end] → idx-or-false</procedure>

<procedure>string-index-right string pred [start end] → idx-or-false</procedure>

<procedure>string-skip        string pred [start end] → idx-or-false</procedure>

<procedure>string-skip-right  string pred [start end] → idx-or-false</procedure>
{{string-index}} searches through the given substring from the left, returning the index of the leftmost character satisfying the predicate {{pred}}. {{string-index-right}} searches from the right, returning the index of the rightmost character satisfying the predicate {{pred}}. If no match is found, these procedures return {{#f}}.

The start and end arguments specify the beginning and end of the search; the valid indexes relevant to the search include start but exclude end. Beware of "fencepost" errors: when searching right-to-left, the first index considered is {{(- end 1)}}, whereas when searching left-to-right, the first index considered is start. That is, the start/end indexes describe the same half-open interval {{[start,end)}} in these procedures that they do in all other procedures specified by this SRFI.

The skip functions are similar, but use the complement of the criterion: they search for the first char that doesn't satisfy {{pred}}. To skip over initial whitespace, for example, say

<enscript highlight="scheme">
(substring string
           (or (string-skip string char-whitespace?)
               (string-length string))
           (string-length string))
</enscript>

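The following sketch shows the four search procedures on concrete inputs, assuming the library can be imported as {{(srfi 152)}}; {{letter-o?}} is a hypothetical predicate used only here.

<enscript highlight="scheme">
(import (srfi 152))  ; assumption: the egg is importable under this name

;; Hypothetical predicate: true for the character #\o only.
(define (letter-o? c) (char=? c #\o))

(string-index       "hello world" char-whitespace?)  ; => 5
(string-index-right "hello world" letter-o?)         ; => 7
(string-index       "hello world" letter-o? 5)       ; => 7, search starts at 5
(string-skip        "   abc" char-whitespace?)       ; => 3
(string-skip-right  "abc   " char-whitespace?)       ; => 2
(string-index       "abc" char-numeric?)             ; => #f
</enscript>
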
<procedure>string-contains       string1 string2 [start1 end1 start2 end2] → idx-or-false</procedure>

<procedure>string-contains-right string1 string2 [start1 end1 start2 end2] → idx-or-false</procedure>
Does the substring of {{string1}} specified by {{start1}} and {{end1}} contain the sequence of characters given by the substring of {{string2}} specified by {{start2}} and {{end2}}?

Returns {{#f}} if there is no match. If {{start2 = end2}}, {{string-contains}} returns {{start1}} but {{string-contains-right}} returns {{end1}}. Otherwise returns the index in {{string1}} for the first character of the first/last match; that index lies within the half-open interval {{[start1,end1)}}, and the match lies entirely within the {{[start1,end1)}} range of {{string1}}.

<enscript highlight="scheme">
(string-contains "eek -- what a geek." "ee" 12 18) ; Searches "a geek"
    => 15
</enscript>
====== Note:
The names of these procedures do not end with a question mark. This indicates a useful value is returned when there is a match.

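The difference between the leftmost and rightmost match, and the empty-pattern case described above, can be sketched as follows (assuming the library can be imported as {{(srfi 152)}}):

<enscript highlight="scheme">
(import (srfi 152))  ; assumption: the egg is importable under this name

(string-contains       "banana" "ana")  ; => 1  (first match)
(string-contains-right "banana" "ana")  ; => 3  (last match)
(string-contains       "banana" "xyz")  ; => #f

;; With an empty pattern the result is start1 or end1 respectively.
(string-contains       "banana" "")     ; => 0
(string-contains-right "banana" "")     ; => 6
</enscript>
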
<procedure>string-take-while        string pred [start end] → string</procedure>

<procedure>string-take-while-right  string pred [start end] → string</procedure>
Returns the longest initial prefix/suffix of the substring of string specified by start and end whose elements all satisfy the predicate {{pred}}. (Not SRFI 13 procedures.)

<procedure>string-drop-while        string pred [start end] → string</procedure>

<procedure>string-drop-while-right  string pred [start end] → string</procedure>
Drops the longest initial prefix/suffix of the substring of string specified by start and end whose elements all satisfy the predicate {{pred}}, and returns the rest of the string.

These are the same as {{string-trim}} and {{string-trim-right}}, but with a different order of arguments. (Not SRFI 13 procedures.)

<procedure>string-span   string pred [start end] → [string string]</procedure>

<procedure>string-break  string pred [start end] → [string string]</procedure>
{{string-span}} splits the substring of string specified by {{start}} and {{end}} into the longest initial prefix whose elements all satisfy {{pred}}, and the remaining tail. {{string-break}} inverts the sense of the predicate: the tail commences with the first element of the input string that satisfies the predicate. (Not SRFI 13 procedures.)

In other words: {{string-span}} finds the initial span of elements satisfying {{pred}}, and {{string-break}} breaks the string at the first element satisfying {{pred}}.

{{string-span}} is equivalent to

<enscript highlight="scheme">
(values (string-take-while string pred)
        (string-drop-while string pred))
</enscript>
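
Since {{string-span}} and {{string-break}} return two values, a caller needs something like {{call-with-values}} to receive them. The sketch below assumes the library can be imported as {{(srfi 152)}}; {{equals-sign?}} is a hypothetical predicate.

<enscript highlight="scheme">
(import (srfi 152))  ; assumption: the egg is importable under this name

;; Hypothetical predicate: true for the character #\= only.
(define (equals-sign? c) (char=? c #\=))

(string-take-while       "123abc" char-numeric?)  ; => "123"
(string-drop-while       "123abc" char-numeric?)  ; => "abc"
(string-take-while-right "abc123" char-numeric?)  ; => "123"
(string-drop-while-right "abc123" char-numeric?)  ; => "abc"

;; Collect both return values into a list for inspection.
(call-with-values
  (lambda () (string-span "123abc" char-numeric?))
  list)                                           ; => ("123" "abc")

(call-with-values
  (lambda () (string-break "key=value" equals-sign?))
  list)                                           ; => ("key" "=value")
</enscript>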
===== Concatenation
<procedure>string-append string ... → string (R5RS)</procedure>
Returns a string whose sequence of characters is the concatenation of the sequences of characters in the given arguments.

<procedure>string-concatenate string-list → string</procedure>
Concatenates the elements of string-list together into a single string.
======= Rationale:
Some implementations of Scheme limit the number of arguments that may be passed to an n-ary procedure, so the
<enscript highlight="scheme">
(apply string-append string-list)
</enscript>
idiom, which is otherwise equivalent to using this procedure, is not as portable.

<procedure>string-concatenate-reverse string-list [final-string end] → string</procedure>
With no optional arguments, calling this procedure is equivalent to

<enscript highlight="scheme">
(string-concatenate (reverse string-list))
</enscript>

If the optional argument final-string is specified, it is effectively consed onto the beginning of string-list before performing the list-reverse and string-concatenate operations.

If the optional argument end is given, only the characters up to but not including end in final-string are added to the result, thus producing

<enscript highlight="scheme">
(string-concatenate
  (reverse (cons (substring final-string 0 end)
                 string-list)))
</enscript>
======= For example:
<enscript highlight="scheme">
(string-concatenate-reverse '(" must be" "Hello, I") " going.XXXX" 7)
  => "Hello, I must be going."
</enscript>
======= Rationale:
This procedure is useful when constructing procedures that accumulate character data into lists of string buffers, and wish to convert the accumulated data into a single string when done. The optional end argument accommodates that use case by allowing the final buffer to be only partially full without having to copy it a second time, as string-take would require.
======= Note:
Reversing a string simply reverses the sequence of code points it contains. Caution should be taken if a grapheme cluster is divided between two string arguments.

<procedure>string-join string-list [delimiter grammar] → string</procedure>
This procedure is a simple unparser; it pastes strings together using the delimiter string.

{{string-list}} is a list of strings. {{delimiter}} is a string. The {{grammar}} argument is a symbol that determines how the delimiter is used, and defaults to {{'infix}}.

It is an error for {{grammar}} to be any symbol other than these four:

<parameter>'infix</parameter>
means an infix or separator grammar: insert the delimiter between list elements. An empty list will produce an empty string.

<parameter>'strict-infix</parameter>
means the same as {{'infix}} if the string-list is non-empty, but will signal an error if given an empty list. (This avoids an ambiguity shown in the examples below.)

<parameter>'suffix</parameter>
means a suffix or terminator grammar: insert the delimiter after every list element.

<parameter>'prefix</parameter>
means a prefix grammar: insert the delimiter before every list element.

The delimiter is the string used to delimit elements; it defaults to a single space {{" "}}.
======= Examples
<enscript highlight="scheme">
(string-join '("foo" "bar" "baz"))
         => "foo bar baz"
(string-join '("foo" "bar" "baz") "")
         => "foobarbaz"
(string-join '("foo" "bar" "baz") ":")
         => "foo:bar:baz"
(string-join '("foo" "bar" "baz") ":" 'suffix)
         => "foo:bar:baz:"

;; Infix grammar is ambiguous wrt empty list vs. empty string:
(string-join '()   ":") => ""
(string-join '("") ":") => ""

;; Suffix and prefix grammars are not:
(string-join '()   ":" 'suffix) => ""
(string-join '("") ":" 'suffix) => ":"
</enscript>
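
The examples above do not exercise the {{'prefix}} grammar. The following is a small sketch of it, assuming the {{srfi-152}} module from this egg is installed and importable; the path components are made-up sample data.

<enscript highlight="scheme">
(import srfi-152)

;; 'prefix inserts the delimiter before every element,
;; which is convenient for building absolute paths.
(string-join '("usr" "local" "bin") "/" 'prefix)
         ; => "/usr/local/bin"

;; Like 'suffix, 'prefix distinguishes the empty list
;; from a list holding one empty string.
(string-join '()   "/" 'prefix)   ; => ""
(string-join '("") "/" 'prefix)   ; => "/"
</enscript>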
===== Fold and map and friends
<procedure>string-fold       kons knil string [start end] → value</procedure>

<procedure>string-fold-right kons knil string [start end] → value</procedure>
These are the fundamental iterators for strings.

The {{string-fold}} procedure maps the {{kons}} procedure across the given string from left to right:
<enscript highlight="scheme">
(... (kons string[2] (kons string[1] (kons string[0] knil))))
</enscript>
In other words, {{string-fold}} obeys the (tail) recursion
<enscript highlight="scheme">
  (string-fold kons knil string start end)
== (string-fold kons (kons string[start] knil) string start+1 end)
</enscript>
The {{string-fold-right}} procedure maps {{kons}} across the given string from right to left:
<enscript highlight="scheme">
(kons string[0]
      (... (kons string[end-3]
                 (kons string[end-2]
                       (kons string[end-1]
                             knil)))))
</enscript>
obeying the (tail) recursion
<enscript highlight="scheme">
  (string-fold-right kons knil string start end)
== (string-fold-right kons (kons string[end-1] knil) string start end-1)
</enscript>
======= Examples:
<enscript highlight="scheme">
;;; Convert a string to a list of chars.
(string-fold-right cons '() string)

;;; Count the number of lower-case characters in a string.
(string-fold (lambda (c count)
               (if (char-lower-case? c)
                   (+ count 1)
                   count))
             0
             string)
</enscript>
The {{string-fold-right}} combinator is sometimes called a "catamorphism."
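
To make the argument order of {{kons}} and the effect of the optional indices concrete, here is a brief sketch with literal strings. It assumes the {{srfi-152}} module from this egg is imported; the sample strings are arbitrary.

<enscript highlight="scheme">
(import srfi-152)

;; kons receives the current character first and the accumulator second.
;; Consing from left to right therefore builds the list in reverse order.
(list->string (string-fold cons '() "stressed"))
         ; => "desserts"

;; The optional start/end arguments restrict the fold to a substring:
;; here only indices 0 and 1 ("He") are visited.
(string-fold-right cons '() "Hello" 0 2)
         ; => (#\H #\e)
</enscript>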
<procedure>string-map proc string1 string2 ... → string (R7RS-small)</procedure>
It is an error if {{proc}} does not accept as many arguments as the number of string arguments passed to {{string-map}}, does not accept characters as arguments, or returns a value that is not a character or string.

The {{string-map}} procedure applies {{proc}} element-wise to the characters of the string arguments, converts each value returned by {{proc}} to a string, and returns the concatenation of those strings. If more than one string argument is given and not all have the same length, then {{string-map}} terminates when the shortest string argument runs out. The dynamic order in which {{proc}} is called on the characters of the string arguments is unspecified, as is the dynamic order in which the coercions are performed. If any strings returned by {{proc}} are mutated after they have been returned and before the call to {{string-map}} has returned, then {{string-map}} returns a string with unspecified contents; the {{string-map}} procedure itself does not mutate those strings.
======= Example:
<enscript highlight="scheme">
(string-map (lambda (c0 c1 c2)
              (case c0
                ((#\1) c1)
                ((#\2) (string c2))
                ((#\-) (string #\- c1))))
            "1222-1111-2222"
            "Hi There!"
            "Dear John")
     => "Hear-here!"
</enscript>
<procedure>string-for-each proc string1 string2 ... → unspecified (R7RS-small)</procedure>
It is an error if {{proc}} does not accept as many arguments as the number of string arguments passed to {{string-for-each}} or does not accept characters as arguments.

The {{string-for-each}} procedure applies {{proc}} element-wise to the characters of the string arguments, going from left to right. If more than one string argument is given and not all have the same length, then {{string-for-each}} terminates when the shortest string argument runs out.
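
The termination rule for arguments of unequal length can be illustrated with a short sketch. It assumes the {{srfi-152}} module from this egg is imported; the variable {{visited}} is a hypothetical name used only for this illustration.

<enscript highlight="scheme">
(import srfi-152)

;; Record each pair of characters that proc is called with.
(define visited '())

(string-for-each
  (lambda (a b) (set! visited (cons (list a b) visited)))
  "abc"
  "12345")

;; Only three calls were made, because "abc" is the shorter argument.
(reverse visited)
         ; => ((#\a #\1) (#\b #\2) (#\c #\3))
</enscript>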

<procedure>string-count string pred [start end] → integer</procedure>
Returns a count of the number of characters in the specified substring of {{string}} that satisfy the given predicate.
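
A short sketch of typical calls follows, assuming the {{srfi-152}} module from this egg is imported. Note that, unlike {{string-filter}} and {{string-remove}}, this procedure takes the string before the predicate.

<enscript highlight="scheme">
(import srfi-152)

;; Count over the whole string: #\H and #\W are upper case.
(string-count "Hello, World" char-upper-case?)
         ; => 2

;; Count only within indices 7 (inclusive) to 12 (exclusive), i.e. "World".
(string-count "Hello, World" char-lower-case? 7 12)
         ; => 4
</enscript>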

<procedure>string-filter pred string [start end] → string</procedure>

<procedure>string-remove pred string [start end] → string</procedure>
Filter the given substring of {{string}}: {{string-filter}} retains only those characters that satisfy {{pred}}, and {{string-remove}} retains only those that do not.

Compatibility note: In SRFI 13, {{string-remove}} is called {{string-delete}}. This is inconsistent with SRFI 1 and other SRFIs.
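
The two procedures are complementary, as this small sketch shows. It assumes the {{srfi-152}} module from this egg is imported; the input string is made-up sample data.

<enscript highlight="scheme">
(import srfi-152)

;; Keep only the characters that satisfy the predicate ...
(string-filter char-numeric? "tel: 555-0100")
         ; => "5550100"

;; ... or drop exactly those characters instead.
(string-remove char-numeric? "tel: 555-0100")
         ; => "tel: -"
</enscript>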
===== Replication and splitting
<procedure>string-replicate string from to [start end] → string</procedure>
This is an "extended substring" procedure that implements replicated copying of a substring.

<parameter>string</parameter>
is a string.

<parameter>start</parameter>

<parameter>end</parameter>
{{start}} and {{end}} are optional arguments that specify a substring of {{string}}, defaulting to 0 and the length of {{string}}.

This substring is conceptually replicated both up and down the index space, in both the positive and negative directions.

For example, if {{string}} is {{"abcdefg"}}, {{start}} is 3, and {{end}} is 6, then we have the conceptual bidirectionally-infinite string

 ...  d  e  f  d  e  f  d  e  f  d  e  f  d  e  f  d  e  f  d ...
     -9 -8 -7 -6 -5 -4 -3 -2 -1  0 +1 +2 +3 +4 +5 +6 +7 +8 +9

{{string-replicate}} returns the substring of this string beginning at index {{from}}, and ending at {{to}}. It is an error if {{from}} is greater than {{to}}.

You can use {{string-replicate}} to perform a variety of tasks:
* To rotate a string left: {{(string-replicate "abcdef" 2 8) => "cdefab"}}
* To rotate a string right: {{(string-replicate "abcdef" -2 4) => "efabcd"}}
* To replicate a string: {{(string-replicate "abc" 0 7) => "abcabca"}}

Note that:
* The {{from/to}} arguments give a half-open range containing the characters from index {{from}} up to, but not including, index {{to}}.
* The {{from/to}} indexes are not expressed in the index space of {{string}}. They refer instead to the replicated index space of the substring defined by {{string}}, {{start}}, and {{end}}.

It is an error if {{start}}={{end}}, unless {{from}}={{to}}, which is allowed as a special case.
======= Compatibility note:
In SRFI 13, this procedure is called {{xsubstring}}.
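
The replicated index space can be modelled with {{modulo}}, as in the sketch below. This is only an illustration of the index arithmetic, not the implementation shipped with this egg: {{replicate-sketch}} is a hypothetical helper that uses nothing beyond standard Scheme and requires all five arguments.

<enscript highlight="scheme">
;; Hypothetical helper, not part of SRFI 152: an index-by-index model of
;; string-replicate with every argument required.
(define (replicate-sketch str from to start end)
  (let ((len (- end start)))            ; length of the replicated substring
    (let loop ((i (- to 1)) (chars '()))
      (if (< i from)
          (list->string chars)
          ;; modulo yields a result in [0, len), even for negative i
          (loop (- i 1)
                (cons (string-ref str (+ start (modulo i len)))
                      chars))))))

(replicate-sketch "abcdefg" -3 4 3 6)   ; => "defdefd"
(replicate-sketch "abcdef"   2 8 0 6)   ; => "cdefab"
(replicate-sketch "abcdef"  -2 4 0 6)   ; => "efabcd"
</enscript>

When {{from}} equals {{to}} the loop body never runs, so the empty string is returned even if {{start}} equals {{end}}, which mirrors the special case described above.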

<procedure>string-segment string k → list</procedure>
Returns a list of strings representing the consecutive substrings of {{string}}, each of length {{k}}. The last string may be shorter than {{k}}. (Not a SRFI 13 procedure.)
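
For instance, assuming the {{srfi-152}} module from this egg is imported, segmenting a string whose length is not a multiple of {{k}} leaves a shorter final segment:

<enscript highlight="scheme">
(import srfi-152)

;; Eight characters in segments of three: the last one holds the remainder.
(string-segment "abcdefgh" 3)
         ; => ("abc" "def" "gh")

;; When k divides the length evenly, every segment has length k.
(string-segment "abcdef" 2)
         ; => ("ab" "cd" "ef")
</enscript>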

<procedure>string-split string delimiter [grammar limit start end] → list</procedure>
Returns a list of strings representing the words contained in the substring of {{string}} from {{start}} (inclusive) to {{end}} (exclusive). The {{delimiter}} is a string to be used as the word separator. This will often be a single character, but multiple characters are allowed for use cases such as splitting on {{"\r\n"}}. The returned list will have one more item than the number of non-overlapping occurrences of the delimiter in the string. If {{delimiter}} is an empty string, then the returned list consists of strings that each contain a single character. (Not a SRFI 13 procedure; replaces {{string-tokenize}}.)

The {{grammar}} is a symbol with the same meaning as in the {{string-join}} procedure. If it is {{infix}}, which is the default, processing is done as described above, except that an empty string produces the empty list; if {{grammar}} is {{strict-infix}}, then an empty string signals an error. The values {{prefix}} and {{suffix}} cause a leading/trailing empty string in the result to be suppressed.

If {{limit}} is a non-negative exact integer, at most that many splits occur, and the remainder of {{string}} is returned as the final element of the list (so the result will have at most {{limit}}+1 elements). If {{limit}} is not specified or is {{#f}}, then as many splits as possible are made. It is an error if {{limit}} is any other value.

To split on a regular expression, use SRFI 115's {{regexp-split}} procedure.
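
The following sketch shows what the description above implies for a few representative calls. It assumes the {{srfi-152}} module from this egg is imported and that the implementation follows the specification as written; the input strings are made-up sample data.

<enscript highlight="scheme">
(import srfi-152)

;; Default infix grammar: three delimiters give four items,
;; including the empty string between the adjacent commas.
(string-split "a,b,,c" ",")
         ; => ("a" "b" "" "c")

;; A multi-character delimiter.
(string-split "one\r\ntwo" "\r\n")
         ; => ("one" "two")

;; The suffix grammar suppresses the trailing empty string.
(string-split "a;b;" ";" 'suffix)
         ; => ("a" "b")

;; With a limit of 1, only one split occurs and the rest is kept whole.
(string-split "k=v=w" "=" 'infix 1)
         ; => ("k" "v=w")
</enscript>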
===== Input-output
<procedure>read-string k [port] → string (R7RS-small)</procedure>
Reads the next {{k}} characters, or as many as are available before the end of file, from the textual input port into a newly allocated string in left-to-right order and returns the string. If no characters are available before the end of file, an {{end-of-file}} object is returned. The default port is the value of {{(current-input-port)}}.

<procedure>write-string string [port start end] → unspecified (R7RS-small)</procedure>
Writes the characters of {{string}} from index {{start}} to index {{end}} onto textual output port {{port}}. The default port is the value of {{(current-output-port)}}.
===== Mutation
<procedure>string-set! string k char → unspecified (R5RS)</procedure>
The {{string-set!}} procedure stores {{char}} in element {{k}} of {{string}}.

<procedure>string-fill! string fill [start end] → unspecified (R5RS+)</procedure>
The {{string-fill!}} procedure stores {{fill}} in the elements of {{string}} from {{start}} (inclusive) to {{end}} (exclusive).

<procedure>string-copy! to at from [start end] → unspecified (R7RS-small)</procedure>
Copies the characters of string {{from}} between {{start}} and {{end}} to string {{to}}, starting at index {{at}}. The order in which characters are copied is unspecified, except that if the source and destination overlap, copying takes place as if the source is first copied into a temporary string and then into the destination. This can be achieved without allocating storage by making sure to copy in the correct direction in such circumstances.
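
The overlap rule is easiest to see when source and destination are the same string. The sketch below assumes the {{srfi-152}} module from this egg is imported; {{buf}} is a hypothetical variable name.

<enscript highlight="scheme">
(import srfi-152)

;; Work on a fresh copy, since string literals may be immutable.
(define buf (string-copy "abcdef"))

;; Overwrite indices 1 (inclusive) to 3 (exclusive).
(string-fill! buf #\- 1 3)
buf      ; => "a--def"

;; Overlapping copy within one string: the source range 0..4 ("a--d")
;; is written to position 2 as if it went through a temporary string.
(string-copy! buf 2 buf 0 4)
buf      ; => "a-a--d"
</enscript>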
=== Sample implementation
The [[https://srfi.schemers.org/srfi-152/srfi-152.tgz|sample implementations]] of this SRFI are in the SRFI repository. The main implementation is portable but inefficient; since efficiency is not a design goal (use texts for that!), it should be satisfactory.

There are two modules for Chicken. One works on Chicken's native 8-bit strings; the other leverages the {{utf8}} egg to provide a UTF-8 facade over those same strings. This means that there is no reliable way to tell by inspection whether a string is 8-bit or UTF-8, and one must take precautions to avoid mixing them.

The Chicken modules {{srfi-13}}, {{utf8}}, {{utf8-srfi-13}}, and {{utf8-case-map}} shouldn't be imported together into the same module or program with either {{srfi-152}} or {{utf8-srfi-152}}, as they are inherently incompatible. However, it is possible to import {{utf8-srfi-152}} and then cherry-pick non-conflicting identifiers from {{utf8}} with {{(import (only utf8 read-char write-char print ...))}}. There is no problem with the {{utf8-srfi-14}} and {{unicode-char-sets}} modules.

When importing any of the {{scheme}}, {{chicken}}, {{data-structures}}, or {{extras}} modules along with {{utf8-srfi-152}}, be sure to do it as follows to avoid conflicts:

<enscript highlight="scheme">
(import (except scheme
    make-string string string-length string-ref string-set! substring
    string->list list->string string-fill!))
(import (except chicken
    reverse-list->string))
(import (except data-structures
    string-split substring-index))
(import (except extras
    read-string write-string))
</enscript>

When using the {{srfi-152}} module instead, import the {{scheme}} module as follows:

<enscript highlight="scheme">
(import (except scheme
  string->list string-fill!))
</enscript>

The other modules, if imported, must be restricted in the same way as shown above.

The R7RS library assumes the presence of all R7RS-small procedures and does not require excluding any of them, as this SRFI is inherently compatible with R7RS-small.
=== Author
John Cowan, ported to Chicken 5 and packaged by Sergey Goldgaber.
=== Acknowledgements
I acknowledge the participants in the SRFI 152 mailing list, and everyone acknowledged in SRFI 135 (which acknowledges everyone acknowledged in SRFI 130 (which acknowledges everyone acknowledged in SRFI 13)). Particularly important are Olin Shivers, the author of SRFI 13, and Will Clinger, the author of SRFI 135.

As Olin said, we should not assume any of those individuals endorse this SRFI.
=== Copyright
Copyright (C) John Cowan (2017).

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
=== Version history
==== 0.1 - Packaged for Chicken Scheme 5.2.0