source: project/wiki/eggref/5/srfi-130 @ 39134

Last change on this file since 39134 was 39134, checked in by gnosis, 3 months ago

Initial revision of SRFI-130 documentation

File size: 43.6 KB
Line 
1== SRFI-130: Cursor-based string library
2=== Abstract
3[[https://srfi.schemers.org/srfi-130/srfi-130.html#R5RS|R5RS]] Scheme has an impoverished set of string-processing utilities, which is a problem for authors of portable code. Although [[https://srfi.schemers.org/srfi-130/srfi-130.html#R7RS|R7RS]] provides some extensions and improvements, it is still very incomplete. This SRFI proposes a coherent and comprehensive set of string-processing procedures; it is accompanied by a portable sample implementation of the spec.
4
5This SRFI is derived from SRFI 13. The biggest difference is that it allows subsequences of strings to be specified by cursors as well as the traditional string indexes. In addition, it omits the comparison, case-mapping, and mutation operations of SRFI 13, as well as all procedures already present in [[https://srfi.schemers.org/srfi-130/srfi-130.html#R7RS|R7RS]].
6
7For more information see: [[https://srfi.schemers.org/srfi-130/|SRFI-130: Cursor-based string library]]
8=== Procedure Index
9Here is a list of the procedures provided by this SRFI.
10==== Cursor operations
11* string-cursor?
12* string-cursor-start    string-cursor-end
13* string-cursor-next     string-cursor-prev
14* string-cursor-forward  string-cursor-back
15* string-cursor=?
16* string-cursor<?        string-cursor>?
17* string-cursor<=?       string-cursor>=?
18* string-cursor-diff
19* string-cursor->index   string-index->cursor
20==== Predicates
21* string-null?
22* string-every string-any
23==== Constructors
24* string-tabulate
25* string-unfold   string-unfold-right
26==== Conversion
27* string->list/cursors string->vector/cursors
28* reverse-list->string string-join
29==== Selection
30* string-ref/cursor
31* substring/cursors  string-copy/cursors
32* string-take        string-take-right
33* string-drop        string-drop-right
34* string-pad         string-pad-right
35* string-trim        string-trim-right string-trim-both
36==== Prefixes & suffixes
37* string-prefix-length    string-suffix-length
38* string-prefix?          string-suffix?
39==== Searching
40* string-index     string-index-right
41* string-skip      string-skip-right
42* string-contains  string-contains-right
43==== The whole string
44* string-reverse
45* string-concatenate  string-concatenate-reverse
46* string-fold         string-fold-right
47* string-for-each-cursor
48* string-replicate    string-count
49* string-replace      string-split
50* string-filter       string-remove
51=== Rationale
52This SRFI defines a rich set of operations for manipulating strings. These are frequently useful for scripting and other text-manipulation applications. The library's design was influenced by the string libraries found in MIT Scheme, Gambit, RScheme, MzScheme, SLIB, Common Lisp, Bigloo, Guile, Chez, APL, Java, and the SML standard basis. All functionality is available in substring and full-string forms.
53
54When SRFI 13 was defined in 1999, it was intended to provide efficient string operations on both whole strings and substrings. At that time, it was normal for strings to be sequences of 8-bit characters, or in a few cases, characters of other fixed lengths. Consequently, many of the SRFI 13 procedures accept optional exact integer arguments for each of the string arguments, indexing the beginning and the end of the substring(s) to be operated on.
55
56Unfortunately for this design, Unicode has become much more widely used, and it is now fairly common for implementations to store strings internally as UTF-8 or UTF-16 code unit sequences, which means that indexing operations are potentially O(n) rather than O(1). Using opaque cursors makes it possible to iterate much more efficiently through such strings compared to incrementing or decrementing indexes; however, for backward compatibility, the procedures defined in this SRFI accept either cursors or indexes. The results returned are always cursors: the use of indexes is preserved mainly for the sake of existing code and for implementer convenience.
57
58The operations provided here are entirely independent of the character repertoire supported by the implementation. In particular, this means that the comparison and case conversion procedures of SRFI 13 are excluded. There is also no provision for [[http://www.r6rs.org/final/html/r6rs-lib/r6rs-lib-Z-H-2.html#node_idx_54|R6RS normalization procedures]] or for a string->integer procedure that was proposed for SRFI 13 but not included. These may appear in future SRFIs. Furthermore, string mutation can be extremely expensive if the storage used for the string needs to be expanded, particularly if the implementation does not use an indirect pointer to it (as in Chicken), so this SRFI does not provide for it. The low-level procedures of SRFI 13 are specific to the sample implementation, and have been removed to make other implementations simpler and easier.
59
60Many SRFI 13 procedures accept either a predicate, a single character, or a [[https://srfi.schemers.org/srfi-14/srfi-14.html|SRFI 14]] character set. In this SRFI, only support for predicates is required, though implementations may also support the other two alternatives. In that case, a single character is interpreted as a predicate which returns true if its argument is the same (in the sense of eqv?) to that character; a character set is interpreted as a predicate which returns true if its argument belongs to that character set. In SRFI 13, character sets are inherently more efficient than predicates [[https://srfi.schemers.org/srfi-13/mail-archive/msg00052.html|because testing them is fast and free of side effects]], though how fast character sets actually are if they support full Unicode is very implementation-dependent. The only procedure that absolutely requires character set support, string-tokenize, has been replaced here by the more usual string-split procedure provided by Perl, Python, Java, JavaScript, and other languages.
61
62The search procedures in SRFI 13 return either an index or {{#f}} if the search fails. Their counterparts in this SRFI return cursors. Left-to-right searches return a cursor representing the leftmost matching character, or the post-end cursor if there is no match; right-to-left searches return a cursor representing the successor of the rightmost matching character, or the start cursor if there is no match. This convention was devised by Alan Watson and implemented in Chibi Scheme.
63
64In short, this SRFI is intended to help move the practice of Scheme programming away from mutable strings, string indexes, and SRFI 13, while largely maintaining backward compatibility. It does not require any particular run-time efficiencies from its procedures.
65=== Specification
66==== String cursors
67While indexes are exact integers ranging from 0 to the length of the string they refer to, cursors are opaque objects that point into strings. However, they are not required to belong to a disjoint type, as long as they are either disjoint from indexes or identical to indexes. For example, they may be negative exact integers representing indexes into a byte array underlying the string. It is also possible to implement cursors as a record type or an implementation-specific primitive type. Additionally, in implementations where no provision has been made for cursors, or there is no benefit in implementing them separately because strings are in fact arrays of fixed-length characters, it is useful to allow indexes and cursors to be the same thing. (Cursors must also be disjoint from {{#f}}.)
68
69It is an error to make any use of a cursor referring to a string after the string, or any string that shares storage with it, has been mutated by a procedure like {{string-set!}}, {{string-copy!}}, or {{string-fill!}}.
70
71Given a string of length {{n}}, there are {{n + 1}} valid cursors that refer to it: one for each character in the string, and one for the position just after the last character, known as the "post-end cursor". The cursor for the first (or zeroth) position in the string is known as the "start cursor". The post-end cursor is provided because when creating a string from cursors the second cursor argument is exclusive. It is an error if a cursor argument is not one of the valid cursors for the string argument. The index analogue of the post-end cursor is n.
72==== Calling predicates
73All predicates passed to procedures defined in this SRFI may be called in any order and any number of times, except as otherwise noted. This is not the case in SRFI 13.
74==== Shared storage
75Some Scheme implementations, e.g. Guile, provide ways to construct substrings that share storage with other strings. SRFI 130 provides only minimal support for such shared substrings. The following SRFI 130 procedures are allowed to return a result which shares storage with one or more of their string arguments:
76
77 substring/cursors
78 string-take string-take-right
79 string-drop string-drop-right
80 string-pad string-pad-right
81 string-trim string-trim-right string-trim-both
82 string-split string-filter string-remove
83
84In particular, if the result is the same (in the sense of {{string=?}}) as any of the arguments, any implementation of the above procedures may return the string argument without copying it. Other procedures such as {{string-copy/cursors}}, as well as all the [[https://srfi.schemers.org/srfi-130/srfi-130.html#R7RS|R7RS]] procedures, are not permitted to return shared results. If a shared value is returned, it may be mutable or immutable.
85==== Naming conventions
86The procedures of this SRFI follow a consistent naming scheme, and are consistent with the conventions developed in SRFI 1. The names are composed of smaller lexemes in a regular way that exposes the structure and relationships between the procedures. This should help the programmer to recall or reconstitute the name of the desired procedure. In particular, the order of common parameters is consistent across the different procedures.
87
88Procedures that have left/right directional variants use no suffix to specify left-to-right operation, -right to specify right-to-left operation, and -both to specify both. This is a general convention that has been established in other SRFIs; the value of a convention is proportional to the extent of its use.
89==== Notation
90===== In the following procedure specifications:
91* An {{s}} parameter is a string.
92* A {{char}} parameter is a character.
93* {{Start}} and {{end}} parameters are half-open string cursors or indexes specifying a substring within a string parameter, and typically restrict a procedure's action to the indicated substring.
94
95When omitted, they default to 0 and the length of the string, respectively; or from another point of view, they default to the start cursor and the post-end cursor, respectively. For indexes, it must be the case that 0 <= {{start}} <= {{end}} <= {{(string-length s)}}, for the corresponding parameter s when {{start}} and {{end}} are indexes, and the corresponding relationship must hold when they are cursors. It is an error unless {{start}} and {{end}} are both cursors or both indexes.
96* A {{pred}} parameter is a unary character predicate procedure, returning a true/false value when applied to a character.
97* It is an error if a {{pred}} is not pure and functional.
98* A cursor parameter is either a cursor or an exact non-negative integer specifying an index into a string.
99* {{Len}} and {{nchars}} parameters are exact non-negative integers specifying a length of a string or some number of characters.
100* An {{obj}} parameter may be any value at all.
101* Passing values to procedures with these parameters that do not satisfy these types is an error.
102* Parameters given in square brackets are optional.
103* Unless otherwise noted in the text describing the procedure, any prefix of these optional parameters may be supplied, from zero arguments to the full list.
104* When a procedure returns multiple values, this is shown by listing the return values in square brackets, as well.
105====== So, for example, the procedure with signature
106{{halts? f [x init-store] → [boolean integer]}}
107
108would take one {{(f)}}, two {{(f, x)}} or three {{(f, x, init-store)}} input parameters, and return two values, a boolean and an integer.
109===== A parameter followed by "..." means zero or more elements.
110====== So the procedure with the signature
111{{sum-squares x ...  → number}}
112
113takes zero or more arguments {{(x ...)}},
114====== while the procedure with signature
115{{spell-check doc dict[1] dict[2] ... → string-list}}
116
117takes two required parameters ({{doc}} and {{dict[1]}}) and zero or more optional parameters ({{dict[2]}} ...).
118===== If a procedure's return value is said to be "unspecified," this means that the procedure returns a single arbitrary value.
119Such a procedure is not even required to be consistent from call to call.
120==== Procedures
121===== Cursor operations
122These procedures are mostly taken from Chibi Scheme.
123
124<procedure>string-cursor? obj → boolean</procedure>
125
126Returns {{#t}} if {{obj}} can be a string cursor, and {{#f}} otherwise. In implementations where cursors and indexes are the same thing, {{#t}} is returned on any cursor or index; where they are disjoint, {{#t}} is returned on cursors, {{#f}} on indexes. If obj is neither a cursor nor an index, string-cursor? will always return {{#f}}.
127
128<procedure>string-cursor-start s → cursor</procedure>
129
130
131<procedure>string-cursor-end s → cursor</procedure>
132
133Returns the start/post-end cursor of s respectively.
134
135<procedure>string-cursor-next s cursor → cursor</procedure>
136
137
138<procedure>string-cursor-prev s cursor → cursor</procedure>
139
140Returns the cursor into s following/preceding cursor. If cursor is an index, returns one more/less than cursor. It is an error if cursor is the post-end/start cursor of s.
141
142<procedure>string-cursor-forward s cursor nchars → cursor</procedure>
143
144
145<procedure>string-cursor-back s cursor nchars → cursor</procedure>
146
147Returns the cursor into s which follows/precedes cursor by {{nchars}} characters. If cursor is an index, returns {{nchars}} more/less than cursor. It is an error if the result would be an invalid cursor or index.
148
149<procedure>string-cursor=? cursor[1] cursor[2] → boolean</procedure>
150
151
152<procedure>string-cursor<? cursor[1] cursor[2] → boolean</procedure>
153
154
155<procedure>string-cursor>? cursor[1] cursor[2] → boolean</procedure>
156
157
158<procedure>string-cursor<=? cursor[1] cursor[2] → boolean</procedure>
159
160
161<procedure>string-cursor>=? cursor[1] cursor[2] → boolean</procedure>
162
163Compares two cursors or two indexes pointing into the same string.
164
165<procedure>string-cursor-diff s start end → nchars</procedure>
166
167Returns the number of characters between {{start}} and {{end}} in string s. Note that the result is always non-negative if {{start}} and {{end}} are a valid {{start-end}} pair.
168
169<procedure>string-cursor->index s cursor → index</procedure>
170
171
172<procedure>string-index->cursor s index → cursor</procedure>
173
174Converts a cursor/index into s into the corresponding index/cursor. If the argument is already an index/cursor, it is returned unchanged.
175===== Predicates
176
177<procedure>string-null? s → boolean</procedure>
178
179Is s the empty string?
180
181<procedure>string-every pred s [start end] → value</procedure>
182
183
184<procedure>string-any pred s [start end] → value</procedure>
185
186Checks to see if every/any character in s satisfies {{pred}} proceeding from left (index {{start}}) to right (index {{end}}).
187======= The predicate is "witness-generating":
188* If {{string-any}} returns true, the returned true value is the one produced by the application of the predicate.
189* If {{string-every}} returns true, the returned true value is the one produced by the final application of the predicate to {{s[end-1]}}.
190If {{string-every}} is applied to an empty sequence of characters, it simply returns {{#t}}.
191======= The names of these procedures do not end with a question mark -- this is to indicate that they do not return a simple boolean ({{#t}} or {{#f}}), but a general value.
192===== Constructors
193
194<procedure>string-tabulate proc len → string</procedure>
195
196* {{Proc}} is an integer → char procedure.
197* Construct a string of size {{len}} by applying {{proc}} to each value from {{0}} (inclusive) to {{len}} (exclusive) to produce the corresponding string element.
198* The order in which {{proc}} is applied to the indexes is not specified.
199* Note that the order of arguments is not the same as SRFI 1's list-tabulate, but is the same as tabulation functions in other SRFIs.
200When this discrepancy was discovered in SRFI 13, it was too late to change SRFI 1.
201
202<procedure>string-unfold stop? mapper successor seed [base make-final] → string</procedure>
203
204* This is a fundamental constructor for strings.
205
206<parameter>Successor</parameter>
207is used to generate a series of "seed" values from the initial seed:
208
209 seed, (successor seed), (successor^2 seed), (successor^3 seed), ...
210
211<parameter>Stop?</parameter>
212tells us when to stop -- when it returns true when applied to one of these seed values.
213
214<parameter>Mapper</parameter>
215maps each seed value to the corresponding character in the result string. These chars are assembled into the string in a left-to-right order.
216
217<parameter>Base</parameter>
218is the optional initial/leftmost portion of the constructed string; it defaults to the empty string "".
219
220<parameter>Make-final</parameter>
221is applied to the terminal seed value (on which stop? returns true) to produce the final/rightmost portion of the constructed string. It defaults to {{(lambda (x) "")}}.
222
223* {{string-unfold}} is a fairly powerful string constructor -- you can use it to convert a list to a string, read a port into a string, reverse a string, copy a string, and so forth.
224====== Examples:
225<enscript highlight="scheme">
226(port->string p) = (string-unfold eof-object? values
227                                  (lambda (x) (read-char p))
228                                  (read-char p))
229
230(list->string lis) = (string-unfold null? car cdr lis)
231
232(string-tabulate f size) = (string-unfold (lambda (i) (= i size)) f add1 0)
233</enscript>
234======= To map f over a list lis, producing a string:
235<enscript highlight="scheme">
236(string-unfold null? (compose f car) cdr lis)
237</enscript>
238======= Interested functional programmers may enjoy noting that string-fold-right and string-unfold are in some sense inverses.
239======== That is, given operations knull?, kar, kdr, kons, and knil satisfying
240<enscript highlight="scheme">
241(kons (kar x) (kdr x)) = x  and (knull? knil) = {{#t}}
242</enscript>
243======== then
244<enscript highlight="scheme">
245(string-fold-right kons knil (string-unfold knull? kar kdr x)) = x
246</enscript>
247======== and
248<enscript highlight="scheme">
249(string-unfold knull? kar kdr (string-fold-right kons knil s)) = s.
250</enscript>
251
252The final string constructed does not share storage with either base or the value produced by make-final.
253
254This combinator sometimes is called an "anamorphism."
255======= Note: implementations should take care that runtime stack limits do not cause overflow when constructing large (e.g., megabyte) strings with string-unfold.
256
257<procedure>string-unfold-right stop? mapper successor seed [base make-final] → string</procedure>
258
259This is a fundamental constructor for strings.
260
261It is equivalent to string-unfold, except that the results of mapper are assembled into the string in a right-to-left order, base is the optional rightmost portion of the constructed string, and make-final produces the leftmost portion of the constructed string.
262===== Conversion
263
264<procedure>string->list/cursors s [start end] → char-list</procedure>
265
266
267<procedure>string->vector/cursors s [start end] → char-vector</procedure>
268
269{{string->list/cursors}} and {{string->vector/cursors}} return a newly allocated list or vector of the characters that make up the given string. They differ from the R7RS procedures {{string->list}} and {{string->vector}} by accepting either cursors or indexes.
270====== reverse-list->string char-list → string
271An efficient implementation of (compose {{list->string}} reverse):
272
273<enscript highlight="scheme">
274(reverse-list->string '(#\a #\B #\c)) → "cBa"
275</enscript>
276
277This is a common idiom in the epilog of string-processing loops that accumulate an answer in a reverse-order list. (See also {{string-concatenate-reverse}} for the "chunked" variant.)
278
279<procedure>string-join string-list [delimiter grammar] → string</procedure>
280
281This procedure is a simple unparser --- it pastes strings together using the delimiter string.
282
283The {{grammar}} argument is a symbol that determines how the delimiter is used, and defaults to {{'infix}}.
284* {{'infix}} means an infix or separator {{grammar:}} insert the delimiter between list elements.
285An empty list will produce an empty string -- note, however, that parsing an empty string with an infix or separator grammar is ambiguous.
286Is it an empty list, or a list of one element, the empty string?
287* {{'strict-infix}} means the same as 'infix, but will signal an error if given an empty list.
288* {{'suffix}} means a suffix or terminator {{grammar:}} insert the delimiter after every list element. This grammar has no ambiguities.
289* {{'prefix}} means a prefix {{grammar:}} insert the delimiter before every list element. This grammar has no ambiguities.
290* The delimiter is the string used to delimit elements; it defaults to a single space " ".
291* Examples
292<enscript highlight="scheme">
293(string-join '("foo" "bar" "baz") ":")         => "foo:bar:baz"
294(string-join '("foo" "bar" "baz") ":" 'suffix) => "foo:bar:baz:"
295
296;; Infix grammar is ambiguous wrt empty list vs. empty string,
297(string-join '()   ":") => ""
298(string-join '("") ":") => ""
299
300;; but suffix & prefix grammars are not.
301(string-join '()   ":" 'suffix) => ""
302(string-join '("") ":" 'suffix) => ":"
303</enscript>
304===== Selection
305====== string-ref/cursor s cursor → char
306Returns character {{s[i]}} using a valid cursor or index of {{s}}. It differs from the R7RS procedure {{string-ref}} by accepting either a cursor or an index.
307====== substring/cursors s start end → string
308====== string-copy/cursors s [start end] → string
309These procedures return a string whose contents are the characters of s beginning with index {{start}} (inclusive) and ending with index {{end}} (exclusive). If {{substring/cursors}} produces the entire string, it may return either {{s}} or a copy of {{s}}; in some implementations, proper substrings may share memory with {{s}}. However, string-copy /cursors always returns a newly allocated string. They differ from the R7RS procedures substring and string-copy by accepting either cursors or indexes.
310====== string-take s nchars → string
311====== string-drop s nchars → string
312====== string-take-right s nchars → string
313====== string-drop-right s nchars → string
314string-take returns the first {{nchars}} of {{s}}; string-drop returns all but the first {{nchars}} of {{s}}. string-take-right returns the last {{nchars}} of s; string-drop-right returns all but the last {{nchars}} of {{s}}.
315
316If these procedures produce the entire string, they may return either {{s}} or a copy of {{s}}; in some implementations, proper substrings may share memory with {{s}}.
317======= Examples
318<enscript highlight="scheme">
319(string-take "Pete Szilagyi" 6) => "Pete S"
320(string-drop "Pete Szilagyi" 6) => "zilagyi"
321
322(string-take-right "Beta rules" 5) => "rules"
323(string-drop-right "Beta rules" 5) => "Beta "
324</enscript>
325======= It is an error to take or drop more characters than are in the string:
326<enscript highlight="scheme">
327(string-take "foo" 37) => error
328</enscript>
329
330<procedure>string-pad s len [char start end] → string</procedure>
331
332
333<procedure>string-pad-right s len [char start end] → string</procedure>
334
335Build a string of length {{len}} comprised of s padded on the left (right) by as many occurrences of the character char as needed. If {{s}} has more than {{len}} chars, it is truncated on the left (right) to length {{len}}. Char defaults to {{#\space}}.
336
337If {{len <= end-start}}, the returned value is allowed to share storage with {{s}}, or be exactly {{s}} (if {{len = end-start}}).
338
339<enscript highlight="scheme">
340(string-pad     "325" 5) => "  325"
341(string-pad   "71325" 5) => "71325"
342(string-pad "8871325" 5) => "71325"
343</enscript>
344
345<procedure>string-trim       s [pred start end] → string</procedure>
346
347
348<procedure>string-trim-right s [pred start end] → string</procedure>
349
350
351<procedure>string-trim-both  s [pred start end] → string</procedure>
352
353Trim {{s}} by skipping over all characters on the left / on the right / on both sides that satisfy the second parameter {{pred}}: {{pred}} defaults to {{char-whitespace?}}.
354
355If no trimming occurs, these functions may return either {{s}} or a copy of {{s}}; in some implementations, proper substrings may share memory with {{s}}.
356
357<enscript highlight="scheme">
358(string-trim-both "  The outlook wasn't brilliant,  \n\r")
359    => "The outlook wasn't brilliant,"
360</enscript>
361===== Prefixes & suffixes
362
363<procedure>string-prefix-length    s1 s2 [start1 end1 start2 end2] → integer</procedure>
364
365
366<procedure>string-suffix-length    s1 s2 [start1 end1 start2 end2] → integer</procedure>
367
368Return the length of the longest common prefix/suffix of the two strings. For prefixes, this is equivalent to the "mismatch index" for the strings (modulo the {{start}} cursors).
369
370The optional {{start/end}} cursors or indexes restrict the comparison to the indicated substrings of {{s1}} and {{s2}}.
371
372
373<procedure>string-prefix?    s1 s2 [start1 end1 start2 end2] → boolean</procedure>
374
375
376<procedure>string-suffix?    s1 s2 [start1 end1 start2 end2] → boolean</procedure>
377
378Is {{s1}} a prefix/suffix of {{s2}}?
379* The optional {{start/end}} cursors or indexes restrict the comparison to the indicated substrings of {{s1}} and {{s2}}.
380===== Searching
381
382<procedure>string-index s pred [start end] → cursor</procedure>
383
384
385<procedure>string-index-right s pred [start end] → cursor</procedure>
386
387
388<procedure>string-skip s pred [start end] → cursor</procedure>
389
390
391<procedure>string-skip-right s pred [start end] → cursor</procedure>
392
393{{string-index}} searches through {{s}} from the left, returning the cursor of the first occurrence of a character which satisfies the predicate {{pred}}. If no match is found, it returns {{end}}. string-index-right searches through {{s}} from the right, returning the cursor of the successor of the first occurrence of a character which satisfies the predicate {{pred}}. If no match is found, it returns start.
394
395The {{start}} and {{end}} parameters specify the beginning and end cursors or indexes of the search; the search includes the {{start}}, but not the {{end}}. Be careful of "fencepost" considerations: when searching right-to-left, the first position considered is {{(string-cursor-prev end)}}, whereas when searching left-to-right, the first index considered is start. That is, the {{start/end}} indexes describe the same half-open interval {{[start,end)}} in these procedures that they do in all the other SRFI 130 procedures.
396
397The skip functions are similar, but use the complement of the criteria: they search for the first char that doesn't satisfy {{pred}}. E.g., to skip over initial whitespace, say
398
399<enscript highlight="scheme">
400(substring/cursors s (string-skip s char-whitespace?))
401</enscript>
402
403Note that the result is always a cursor, even when {{start}} and {{end}} are indexes. Use {{string-cursor->index}} to convert the result to an index. Therefore, these four functions are not entirely compatible with their SRFI 13 counterparts, which return {{#f}} on failure.
404
405These functions can be trivially composed with string-take and string-drop to produce take-while, drop-while, span, and break procedures without loss of efficiency.
406
407<procedure>string-contains s1 s2 [start1 end1 start2 end2] → cursor</procedure>
408
409
410<procedure>string-contains-right s1 s2 [start1 end1 start2 end2] → cursor</procedure>
411
412Does string {{s1}} contain string {{s2}}?
413
414Returns the cursor in {{s1}} referring to the first character of the first/last instance of {{s2}} as a substring, or {{#f}} if there is no match. The optional {{start/end}} indexes restrict the operation to the indicated substrings.
415
416The returned cursor is in the range {{[start1,end1)}}. A successful match must lie entirely in the {{[start1,end1)}} range of {{s1}}.
417
418Note that the result is always a cursor, even when {{start1}} and {{end1}} are indexes. Use {{string-cursor->index}} to convert a cursor result to an index.
419
420<enscript highlight="scheme">
421(string-contains "eek -- what a geek." "ee"
422                 12 18) ; Searches "a geek"
423    => {Cursor 15}
424</enscript>
425
426The name of this procedure does not end with a question mark -- this is to indicate that it does not return a simple boolean ({{#t}} or {{#f}}). Rather, it returns either false ({{#f}}) or a cursor.
427===== The whole string
428
429<procedure>string-reverse s [start end] -> string</procedure>
430
431Reverse the string.
432
433{{string-reverse}} returns the result string and does not alter its {{s}} parameter.
434
435<enscript highlight="scheme">
436(string-reverse "Able was I ere I saw elba.")
437    => ".able was I ere I saw elbA"
438(string-reverse "Who stole the spoons?" 14 20)
439    => "snoops"
440</enscript>
441
442Unicode note: Reversing a string simply reverses the sequence of code-points it contains. So a combining diacritic a coming after a base character {{b}} in string {{s}} would come out before {{b}} in the reversed result.
443
444<procedure>string-concatenate string-list → string</procedure>
445
446Append the elements of string-list together into a single string. Guaranteed to return a freshly allocated string.
447
448Note that the (apply string-append string-list) idiom is not robust for long lists of strings, as some Scheme implementations limit the number of arguments that may be passed to an n-ary procedure.
449
450<procedure>string-concatenate-reverse string-list [final-string end] → string</procedure>
451
452With no optional arguments, this function is equivalent to
453
454<enscript highlight="scheme">
455(string-concatenate (reverse string-list))
456</enscript>
457
458If the optional argument final-string is specified, it is consed onto the beginning of {{string-list}} before performing the {{list-reverse}} and {{string-concatenate}} operations.
459
460If the optional argument {{end}} is given, only the characters up to but not including {{end}} in final-string are added to the result, thus producing
461
462<enscript highlight="scheme">
463(string-concatenate
464  (reverse (cons (substring final-string
465                            (string-cursor-start final-string)
466                            end)
467                 string-list)))
468</enscript>
469
470E.g.
471
472<enscript highlight="scheme">
473(string-concatenate-reverse '(" must be" "Hello, I") " going.XXXX" 7)
474  => "Hello, I must be going."
475</enscript>
476
477This procedure is useful in the construction of procedures that accumulate character data into lists of string buffers, and wish to convert the accumulated data into a single string when done.
478
479<procedure>string-fold kons knil s [start end] → value</procedure>
480
481
482<procedure>string-fold-right kons knil s [start end] → value</procedure>
483
484These are the fundamental iterators for strings.
485
486The left-fold operator maps the {{kons}} procedure across the string from left to right
487
488<enscript highlight="scheme">
489(... (kons s[2] (kons s[1] (kons s[0] knil))))
490</enscript>
491
492In other words, {{string-fold}} obeys the (tail) recursion
493
494<enscript highlight="scheme">
495(string-fold kons knil s start end) =
496    (string-fold kons (kons s[start] knil) start+1 end)
497</enscript>
498
499The {{right-fold}} operator maps the {{kons}} procedure across the string from right to left
500
501<enscript highlight="scheme">
502(kons s[0] (... (kons s[end-3] (kons s[end-2] (kons s[end-1] knil)))))
503</enscript>
504
505obeying the (tail) recursion
506
507<enscript highlight="scheme">
508(string-fold-right kons knil s start end) =
509    (string-fold-right kons (kons s[end-1] knil) start end-1)
510</enscript>
511======= Examples:
512<enscript highlight="scheme">
513;;; Convert a string to a list of chars.
514(string-fold-right cons '() s)
515
516;;; Count the number of lower-case characters in a string.
517(string-fold (lambda (c count)
518               (if (char-lower-case? c)
519                   (+ count 1)
520                   count))
521             0
522             s)
523
524;;; Double every backslash character in S.
525(let* ((ans-len (string-fold (lambda (c sum)
526                               (+ sum (if (char=? c #\\) 2 1)))
527                             0 s))
528       (ans (make-string ans-len)))
529  (string-fold (lambda (c i)
530                 (let ((i (if (char=? c #\\)
531                              (begin (string-set! ans i #\\) (+ i 1))
532                              i)))
533                   (string-set! ans i c)
534                   (+ i 1)))
535               0 s)
536  ans)
537</enscript>
538======= The right-fold combinator is sometimes called a "catamorphism."
539
540<procedure>string-for-each-cursor proc s [start end] → unspecified</procedure>
541
542Apply {{proc}} to each cursor of s, in order, excluding the post-end cursor. The optional {{start/end}} pairs restrict the endpoints of the loop. This is simply a method of looping over a string that is guaranteed to be safe and correct.
543
544====== Example:
545<enscript highlight="scheme">
546(let ((s "abcde") (v '()))
547  (string-for-each-cursor
548    (lambda (cur) (set! v (cons (char->integer (string-ref/cursor s cur)) v)))
549    s)
550  v) => (101 100 99 98 97)
551</enscript>
552
553<procedure>string-replicate s from to [start end] → string</procedure>
554
555This is an "extended substring" procedure that implements replicated copying of a substring of some string.
556
557{{S}} is a string; {{start}} and {{end}} are optional arguments that demarcate a substring of s, defaulting to {{0}} and the length of {{s}} (i.e., the whole string). Replicate this substring up and down index space, in both the positive and negative directions. For example, if {{s = "abcdefg"}}, {{start=3}}, and {{end=6}}, then we have the conceptual bidirectionally-infinite string
558
559 ...  d  e  f  d  e  f  d  e  f d  e  f  d  e  f  d  e  f  d ...
560 ... -9 -8 -7 -6 -5 -4 -3 -2 -1 0 +1 +2 +3 +4 +5 +6 +7 +8 +9 ...
561
562string-replicate returns the substring of this string beginning at index {{from}}, and ending at {{to}}. Note that these arguments cannot be cursors. It is an error if {{from}} is greater than {{to}}.
563======= You can use string-replicate to perform a variety of tasks:
564* To rotate a string left:
565<enscript highlight="scheme">
566(string-replicate "abcdef" 2 8) => "cdefab
567</enscript>"
568* To rotate a string right:
569<enscript highlight="scheme">
570(string-replicate "abcdef" -2 4) => "efabcd
571</enscript>"
572* To replicate a string:
573<enscript highlight="scheme">
574(string-replicate "abc" 0 7) => "abcabca
575</enscript>"
576======= Note that
577* The from/to indexes give a half-open range -- the characters from index from up to, but not including, index to.
578* The from/to indexes are not in terms of the index space for string {{s}}. They are in terms of the replicated index space of the substring defined by {{s}}, {{start}}, and {{end}}.
579* It is an error if {{start=end}} -- although this is allowed by special dispensation when from=to.
580======= Compatibility note:
581{{string-replicate}} is identical to the {{xsubstring}} procedure of SRFI 13, except that the to argument is required.
582
583<procedure>string-count s pred [start end] → integer</procedure>
584
585Return a count of the number of characters in {{s}} that satisfy the {{pred}} argument.
586
587<procedure>string-replace s1 s2 start1 end1 [start2 end2] → string</procedure>
588
589Returns
590
591<enscript highlight="scheme">
592(string-append (substring/cursors s1 (string-cursor-start s1) start1)
593               (substring/cursors s2 start2 end2)
594               (substring/cursors s1 end1 (string-cursor-end s1)))
595</enscript>
596
597That is, the segment of characters in {{s1}} from {{start1}} to {{end1}} is replaced by the segment of characters in {{s2}} from {{start2}} to end2. If {{start1=end1}}, this simply splices the {{s2}} characters into {{s1}} at the specified index.
598======= Examples:
599<enscript highlight="scheme">
600(string-replace "The TCL programmer endured daily ridicule."
601                "another miserable perl drone" 4 7 8 22 ) =>
602    "The miserable perl programmer endured daily ridicule."
603
604(string-replace "It's easy to code it up in Scheme." "lots of fun" 5 9) =>
605    "It's lots of fun to code it up in Scheme."
606
607(define (string-insert s i t) (string-replace s t i i))
608
609(string-insert "It's easy to code it up in Scheme." 5 "really ") =>
610    "It's really easy to code it up in Scheme."
611</enscript>
612
613<procedure>string-split s delimiter [grammar limit start end] → list</procedure>
614
615Returns a list of the words contained in the substring of string from {{start}} (inclusive) to {{end}} (exclusive). Delimiter specifies a string that is to be used as the word separator. This will often be a single character, but multiple characters are allowed for cases like splitting on "\r\n". The returned list will then have one more item than the number of non-overlapping occurrences of the delimiter in the string. If delimiter is an empty string, then the returned list contains a list of strings, each of which contains a single character.
616
617Grammar is a symbol with the same meaning as in the {{string-join}} procedure. If it is infix, which is the default, processing is done as described above, except that an empty {{s}} produces the empty list; if it is strict-infix, an empty {{s}} signals an error. The values {{prefix}} and {{suffix}} cause a leading/trailing empty string in the result to be suppressed.
618
619If {{limit}} is a non-negative exact integer, at most that many splits occur, and the remainder of string is returned as the final element of the list (thus, the result will have at most {{limit+1}} elements). If {{limit}} is not specified or is {{#f}}, then as many splits as possible are made. It is an error if {{limit}} is any other value.
620
621Use SRFI 115's regexp-split to split on a regular expression rather than a simple string.
622
623<procedure>string-filter pred s [start end] → string</procedure>
624
625
626<procedure>string-remove pred s [start end] → string</procedure>
627
628Filter the string {{s}}, retaining only those characters that satisfy / do not satisfy {{pred}}.
629
630If the string is unaltered by the filtering operation, these functions may return either {{s}} or a copy of {{s}}.
631======= Compatibility note:
632{{string-remove}} is identical to the {{string-delete}} procedure of SRFI 13, but the name {{string-delete}} is inconsistent with the conventions of SRFI 1 and other SRFIs.
633===== Sample implementation
634This SRFI comes with a sample implementation, which can be found in the repository of this SRFI. It is a cut-down version of the sample implementation of SRFI 13, with the addition of the cursor operations procedures, the {{*/cursors}} procedures, {{string-contains-right}}, and {{string-split}}. Here are Olin's original implementation notes:
635
636''I have placed this source on the Net with an unencumbered, "open" copyright. The prefix/suffix and comparison routines in this code had (extremely distant) origins in MIT Scheme's string lib, and were substantially reworked by myself. Being derived from that code, they are covered by the MIT Scheme copyright, which is a generic BSD-style open-source copyright. See the source file for details.''
637
638''The KMP string-search code was influenced by implementations written by Stephen Bevan, Brian Denheyer and Will Fitzgerald. However, this version was written from scratch by myself.''
639
640''The remainder of the code was written by myself for scsh or for this SRFI; I have placed this code under the scsh copyright, which is also a generic BSD-style open-source copyright.''
641
642''The code is written for portability and should be straightforward to port to any Scheme. The source comments contains detailed notes describing the non-R5RS dependencies.''
643
644''The library is written for clarity and well-commented. Fast paths are provided for common cases. This is not to say that the implementation can't be tuned up for a specific Scheme implementation. There are notes in the comments addressing ways implementors can tune the reference implementation for performance.''
645
646''In short, I've written the reference implementation to make it as painless as possible for an implementor -- or a regular programmer -- to adopt this library and get good results with it.''
647
648''Another implementation, derived from Chibi Scheme's SRFI 130, is present in the foof subdirectory. This implementation is smaller but may be slower. It can be more easily adapted to Schemes that differentiate between indexes and cursors.''
649===== Acknowledgements
650Thanks to the members of the SRFI 130 mailing list who made this SRFI what it now is, including Per Bothner, Arthur Gleckler, Shiro Kawai, Jim Rees, and especially Alex Shinn, whose idea it was to make cursors and indexes disjoint, and who provided the foof implementation. The following acknowledgements by Olin Shivers are taken from SRFI 13:
651
652The design of this library benefited greatly from the feedback provided during the SRFI discussion phase. Among those contributing thoughtful commentary and suggestions, both on the mailing list and by private discussion, were Paolo Amoroso, Lars Arvestad, Alan Bawden, Jim Bender, Dan Bornstein, Per Bothner, Will Clinger, Brian Denheyer, Mikael Djurfeldt, Kent Dybvig, Sergei Egorov, Marc Feeley, Matthias Felleisen, Will Fitzgerald, Matthew Flatt, Arthur A. Gleckler, Ben Goetter, Sven Hartrumpf, Erik Hilsdale, Richard Kelsey, Oleg Kiselyov, Bengt Kleberg, Donovan Kolbly, Bruce Korb, Shriram Krishnamurthi, Bruce Lewis, Tom Lord, Brad Lucier, Dave Mason, David Rush, Klaus Schilling, Jonathan Sobel, Mike Sperber, Mikael Staldal, Vladimir Tsyshevsky, Donald Welsh, and Mike Wilson. I am grateful to them for their assistance.
653
654I am also grateful to the authors, implementors and documentors of all the systems mentioned in the introduction. Aubrey Jaffer and Kent Pitman should be noted for their work in producing Web-accessible versions of the R5RS and Common Lisp spec, which was a tremendous aid.
655
656This is not to imply that these individuals necessarily endorse the final results, of course.
657
658During this document's long development period, great patience was exhibited by Mike Sperber, who is the editor for the SRFI, and by Hillary Sullivan, who is not.
659===== References & links
660====== [CommonLisp]
661Common Lisp: the Language.
662Guy L. Steele Jr. (editor).
663Digital Press, Maynard, Mass., second edition 1990.
664Available at http://www.elwood.com/alu/table/references.htm#cltl2.
665
666The Common Lisp "HyperSpec," produced by Kent Pitman, is essentially the ANSI spec for Common Lisp:
667http://www.lispworks.com/documentation/HyperSpec/Front/index.htm.
668====== [MIT-Scheme]
669http://www.swiss.ai.mit.edu/projects/scheme/
670====== [R5RS]
671Revised^5 report on the algorithmic language Scheme.
672R. Kelsey, W. Clinger, J. Rees (editors).
673Higher-Order and Symbolic Computation, Vol. 11, No. 1, September, 1998.
674and ACM SIGPLAN Notices, Vol. 33, No. 9, October, 1998.
675Available at http://www.schemers.org/Documents/Standards/.
676====== [R7RS]
677Revised^7 report on the algorithmic language Scheme.
678A. Shinn, J. Cowan, A. Gleckler (editors).
679Available at http://r7rs.org.
680====== [SRFI]
681The SRFI web site.
682http://srfi.schemers.org/
683====== [SRFI-13]
684SRFI-13: String libraries.
685http://srfi.schemers.org/srfi-13/
686====== [SRFI-14]
687SRFI-14: Character-set library.
688http://srfi.schemers.org/srfi-14/
689The SRFI 14 char-set library defines a character-set data type, which is used by some procedures in this library.
690=== Author
691John Cowan
692Ported and packaged to Chicken 5 by Sergey Goldgaber
693=== Copyright
694Copyright (C) Olin Shivers (1998, 1999, 2000) and John Cowan (2016).
695
696Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
697
698The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
699
700THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Note: See TracBrowser for help on using the repository browser.