== SRFI-135: Immutable Texts

This SRFI specifies a new data type of immutable texts. The
operations of this new data type include analogues for all of the
non-mutating operations on strings specified by the R7RS and most of
those specified by SRFI 130, but the immutability of texts and
uniformity of character-based indexing simplify the specification of
those operations while avoiding several inefficiencies associated with
the mutability of Scheme's strings.

This egg provides the UTF-8 version of the SRFI 135 sample
implementation.

[[toc:]]

== Author

William D. Clinger

== SRFI Description

This page includes excerpts from the SRFI document, but is primarily
intended to document the forms exported by the egg. For a full
description of this SRFI, see the full
[[https://srfi.schemers.org/srfi-135/srfi-135.html|SRFI document]].

The
[[https://srfi.schemers.org/srfi-135/srfi-135.html#ExternalRepresentation|external representation]]
of texts specified by SRFI 135 is not yet supported.

== Conceptual model

Immutable texts are like strings except they can't be mutated.

Immutability makes it easier to use space-efficient representations
such as UTF-8 and UTF-16 without incurring the cost of scanning from
the beginning when character indexes are used (as with {{string-ref}}).

When mutation is not needed, immutable texts are likely to be more
efficient than strings with respect to space or time. In some
implementations, immutable texts may be more efficient than strings
with respect to both space and time.

== Subtypes

This SRFI defines two new types:

* ''text'' is a type consisting of the immutable texts for which {{text?}} returns true.

* ''textual'' is a union type consisting of the texts and strings for which {{textual?}} returns true.

The subtypes of the new ''textual'' type include the new ''text'' type and
Scheme's traditional ''string'' type, which consists of the values for
which {{string?}} returns true. The string type includes both mutable
strings and the (conceptually) immutable strings that are the values
of string literals and calls to {{symbol->string}}.

== Notation

In the following procedure specifications:

* A ''text'' argument is an immutable text.

* A ''textual'' argument is an immutable text or a string.

* A ''char'' argument is a character.

* An ''idx'' argument is an exact non-negative integer specifying a valid character index into a text or string. The valid character indexes of a text or string ''textual'' of length ''n'' are the exact integers ''idx'' satisfying 0 ≤ ''idx'' < ''n''.

* A ''k'' argument or result is a ''position'': an exact non-negative integer that is either a valid character index for one of the textual arguments or is the length of a textual argument.

* ''start'' and ''end'' arguments are positions specifying a half-open interval of indexes for a subtext or substring. When omitted, ''start'' defaults to 0 and ''end'' to the length of the corresponding ''textual'' argument. It is an error unless 0 ≤ ''start'' ≤ ''end'' ≤ {{(textual-length textual)}}.

* A ''len'' or ''nchars'' argument is an exact non-negative integer specifying some number of characters, usually the length of a text or string.

* A ''pred'' argument is a unary character predicate, taking a character as its one argument and returning a value that will be interpreted as true or false. Unless noted otherwise, as with {{textual-every}} and {{textual-any}}, all predicates passed to procedures specified in this SRFI may be called in any order and any number of times. It is an error if ''pred'' has side effects or does not behave functionally (returning the same result whenever it is called with the same character); the implementation does not detect those errors.

* An ''obj'' argument may be any value at all.

It is an error to pass values that violate the specification above.

Arguments given in square brackets are optional. Unless otherwise
noted in the text describing the procedure, any prefix of these
optional arguments may be supplied, from zero arguments to the full
list. When a procedure returns multiple values, this is shown by
listing the return values in square brackets, as well.

== Procedures

=== Predicates

<procedure>(text? obj) → boolean</procedure>

Is ''obj'' an immutable text? In particular, {{(text?}} ''obj''{{)}}
returns false if {{(string?}} ''obj''{{)}} returns true, which
implies {{string?}} returns false if {{text?}} returns true.

<procedure>(textual? obj) → boolean</procedure>

Returns true if and only if ''obj'' is an immutable text or a string.

<procedure>(textual-null? text) → boolean</procedure>

Is ''text'' the empty text?

<procedure>(textual-every pred textual [start end]) → value</procedure>
<procedure>(textual-any pred textual [start end]) → value</procedure>

Checks to see if every/any character in ''textual'' satisfies
''pred'', proceeding from left (index ''start'') to right (index
''end''). These procedures are short-circuiting: if ''pred'' returns
false, {{textual-every}} does not call ''pred'' on subsequent characters;
if ''pred'' returns true, {{textual-any}} does not call ''pred'' on
subsequent characters. Both procedures are "witness-generating":

* If {{textual-every}} is given an empty interval (with ''start'' = ''end''), it returns {{#t}}.

* If {{textual-every}} returns true for a non-empty interval (with ''start'' < ''end''), the returned true value is the one returned by the final call to the predicate on {{(text-ref (textual-copy}} ''textual''{{) (-}} ''end'' {{1))}}.

* If {{textual-any}} returns true, the returned true value is the one returned by the predicate.

Note: The names of these procedures do not end with a
question mark. This indicates a general value is returned
instead of a simple boolean ({{#t}} or {{#f}}).

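The witness-generating behaviour can be illustrated with a short sketch. It assumes the egg is installed and importable as {{srfi-135}}; {{digit-witness}} is a hypothetical helper written for this example, not part of the egg.

<enscript highlight="scheme">
(import srfi-135)  ; assumed module name for this egg

;; Plain predicates yield #t or #f.
(textual-every char-alphabetic? "abc")   ; ⇒ #t
(textual-any char-numeric? "abc")        ; ⇒ #f

;; An empty interval always satisfies textual-every.
(textual-every char-numeric? "abc" 1 1)  ; ⇒ #t

;; Hypothetical helper: returns the character itself as its true value.
(define (digit-witness c) (and (char-numeric? c) c))

(textual-any digit-witness "ab7c9")      ; ⇒ #\7, the first match
(textual-every digit-witness "123")      ; ⇒ #\3, result of the final call
</enscript>
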
=== Constructors

<procedure>(make-text len char) → text</procedure>

Returns a text of the given length filled with the given
character.

<procedure>(text char ...) → text</procedure>

Returns a text consisting of the given characters.

<procedure>(text-tabulate proc len) → text</procedure>

''proc'' is a procedure that accepts an exact integer as its
argument and returns a character. Constructs a text of
size ''len'' by calling ''proc'' on each value from 0 (inclusive)
to ''len'' (exclusive) to produce the corresponding element
of the text. The order in which ''proc'' is called on those
indexes is not specified.

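A brief sketch of the three simple constructors, assuming the egg is importable as {{srfi-135}}. Results are written in the «...» notation used elsewhere on this page.

<enscript highlight="scheme">
(import srfi-135)  ; assumed module name for this egg

(make-text 3 #\x)   ; ⇒ «xxx»
(text #\a #\b #\c)  ; ⇒ «abc»

;; Build the first five lowercase letters from their indexes 0..4.
(text-tabulate (lambda (i) (integer->char (+ i (char->integer #\a)))) 5)
;; ⇒ «abcde»
</enscript>
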
<procedure>(text-unfold stop? mapper successor seed [base make-final]) → text</procedure>

This is a fundamental constructor for texts.

* ''successor'' is used to generate a series of "seed" values from the initial seed: ''seed'', (''successor seed''), (''successor''^2 ''seed''), (''successor''^3 ''seed''), ...

* ''stop?'' tells us when to stop: generation ends when it returns true for one of these seed values.

* ''mapper'' maps each seed value to the corresponding character(s) in the result text, which are assembled into that text in left-to-right order. It is an error for ''mapper'' to return anything other than a character, string, or text.

* ''base'' is the optional initial/leftmost portion of the constructed text, which defaults to the empty text {{(text)}}. It is an error if ''base'' is anything other than a character, string, or text.

* ''make-final'' is applied to the terminal seed value (on which ''stop?'' returns true) to produce the final/rightmost portion of the constructed text. It defaults to {{(lambda (x) (text))}}. It is an error for ''make-final'' to return anything other than a character, string, or text.

{{text-unfold}} is a fairly powerful text constructor. You
can use it to convert a list to a text, read a port into
a text, reverse a text, copy a text, and so forth.
Examples:

<enscript highlight="scheme">
(port->text p) = (text-unfold eof-object?
                              values
                              (lambda (x) (read-char p))
                              (read-char p))

(list->text lis) = (text-unfold null? car cdr lis)

(text-tabulate f size) = (text-unfold (lambda (i) (= i size)) f add1 0)

;; To map f over a list lis, producing a text:
(text-unfold null? (compose f car) cdr lis)
</enscript>

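For a concrete call that uses every argument, consider the following sketch. It assumes the egg is importable as {{srfi-135}}; the seed values and delimiters are arbitrary choices for illustration.

<enscript highlight="scheme">
(import srfi-135)  ; assumed module name for this egg

;; Digits 1 through 3 wrapped in angle brackets:
;; base supplies the leftmost part, make-final the rightmost.
(text-unfold (lambda (n) (> n 3))                   ; stop?
             (lambda (n) (integer->char (+ n 48)))  ; mapper: 1 becomes #\1
             (lambda (n) (+ n 1))                   ; successor
             1                                      ; seed
             "<"                                    ; base
             (lambda (n) ">"))                      ; make-final, applied to 4
;; ⇒ «<123>»
</enscript>
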
<procedure>(text-unfold-right stop? mapper successor seed [base make-final]) → text</procedure>

This is a fundamental constructor for texts. It is the
same as {{text-unfold}} except the results of ''mapper'' are
assembled into the text in right-to-left order, ''base'' is
the optional rightmost portion of the constructed text,
and ''make-final'' produces the leftmost portion of the
constructed text.

<enscript highlight="scheme">
(text-unfold-right (lambda (n) (< n (char->integer #\A)))
                   (lambda (n) (char-downcase (integer->char n)))
                   (lambda (n) (- n 1))
                   (char->integer #\Z)
                   #\space
                   (lambda (n) " The English alphabet: "))
  ⇒ « The English alphabet: abcdefghijklmnopqrstuvwxyz »
</enscript>

=== Conversion

<procedure>(textual->text textual) → text</procedure>

When given a text, {{textual->text}} just returns that text.
When given a string, {{textual->text}} returns the result of
calling {{string->text}} on that string. Signals an error
when its argument is neither string nor text.

<procedure>(textual->string textual [start end]) → string</procedure>
<procedure>(textual->vector textual [start end]) → char-vector</procedure>
<procedure>(textual->list textual [start end]) → char-list</procedure>

{{textual->string}}, {{textual->vector}}, and {{textual->list}}
return a newly allocated (unless empty) mutable string,
vector, or list of the characters that make up the given
subtext or substring.

<procedure>(string->text string [start end]) → text</procedure>
<procedure>(vector->text char-vector [start end]) → text</procedure>
<procedure>(list->text char-list [start end]) → text</procedure>

These procedures return a text containing the characters
of the given substring, subvector, or sublist. The
behavior of the text will not be affected by subsequent
mutation of the given string, vector, or list.

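The independence of a text from its source string can be seen in a small sketch, assuming the egg is importable as {{srfi-135}}. The variable names are illustrative only.

<enscript highlight="scheme">
(import srfi-135)  ; assumed module name for this egg

(define s (string-copy "hello"))   ; a mutable string
(define txt (string->text s))      ; snapshot of its characters

(string-set! s 0 #\j)              ; mutate the original string

s                        ; ⇒ "jello"
(textual->string txt)    ; ⇒ "hello", the text is unaffected
(textual->list txt 1 3)  ; ⇒ (#\e #\l)
</enscript>
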
<procedure>(reverse-list->text char-list) → text</procedure>

An efficient implementation of {{(compose list->text reverse)}}:

<enscript highlight="scheme">
(reverse-list->text '(#\a #\B #\c)) ⇒ «cBa»
</enscript>

This is a common idiom in the epilogue of text-processing
loops that accumulate their result using a list in
reverse order. (See also {{textual-concatenate-reverse}} for
the "chunked" variant.)

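The accumulate-in-reverse idiom might look like the following sketch. It assumes the egg is importable as {{srfi-135}}; {{keep-digits}} is a hypothetical procedure written for this example.

<enscript highlight="scheme">
(import srfi-135)  ; assumed module name for this egg

;; Hypothetical example: keep only the numeric characters of a textual.
;; Matching characters are consed onto acc, so acc is in reverse order
;; and reverse-list->text restores the original order at the end.
(define (keep-digits textual)
  (let loop ((chars (textual->list textual))
             (acc '()))
    (cond ((null? chars) (reverse-list->text acc))
          ((char-numeric? (car chars))
           (loop (cdr chars) (cons (car chars) acc)))
          (else (loop (cdr chars) acc)))))

(keep-digits "a1b2c3")  ; ⇒ «123»
</enscript>
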
<procedure>(textual->utf8 textual [start end]) → bytevector</procedure>
<procedure>(textual->utf16 textual [start end]) → bytevector</procedure>
<procedure>(textual->utf16be textual [start end]) → bytevector</procedure>
<procedure>(textual->utf16le textual [start end]) → bytevector</procedure>

These procedures return a newly allocated (unless empty)
bytevector containing a UTF-8 or UTF-16 encoding of the
given subtext or substring.

The bytevectors returned by {{textual->utf8}},
{{textual->utf16be}}, and {{textual->utf16le}} do not contain a
byte-order mark (BOM). {{textual->utf16be}} returns a
big-endian encoding, while {{textual->utf16le}} returns a
little-endian encoding.

The bytevectors returned by {{textual->utf16}} begin with a
BOM that declares an implementation-dependent endianness,
and the bytevector elements following that BOM encode the
given subtext or substring using that endianness.

<procedure>(utf8->text bytevector [start end]) → text</procedure>
<procedure>(utf16->text bytevector [start end]) → text</procedure>
<procedure>(utf16be->text bytevector [start end]) → text</procedure>
<procedure>(utf16le->text bytevector [start end]) → text</procedure>

These procedures interpret their bytevector argument as a
UTF-8 or UTF-16 encoding of a sequence of characters, and
return a text containing that sequence.

The bytevector subrange given to {{utf16->text}} may begin
with a byte-order mark (BOM); if so, that BOM determines
whether the rest of the subrange is to be interpreted as
big-endian or little-endian; in either case, the BOM will
not become a character in the returned text. If the
subrange does not begin with a BOM, it is decoded using
the same implementation-dependent endianness used by
{{textual->utf16}}.

The {{utf16be->text}} and {{utf16le->text}} procedures interpret
their inputs as big-endian or little-endian,
respectively. If a BOM is present, it is treated as a
normal character and will become part of the result.

It is an error if the bytevector subrange given to
{{utf8->text}} contains invalid UTF-8 byte sequences. For the
other three procedures, it is an error if ''start'' or ''end''
are odd, or if the bytevector subrange contains invalid
UTF-16 byte sequences.

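The encoders and decoders pair up as round trips. The sketch below assumes the egg is importable as {{srfi-135}} and sticks to ASCII input so that it does not depend on how source-file strings are encoded.

<enscript highlight="scheme">
(import srfi-135)  ; assumed module name for this egg

;; UTF-8 round trip, optionally restricted to a subrange of the input.
(utf8->text (textual->utf8 "hello"))        ; ⇒ «hello»
(utf8->text (textual->utf8 "hello" 1 3))    ; ⇒ «el»

;; textual->utf16 emits a BOM, which utf16->text consumes.
(utf16->text (textual->utf16 "hello"))      ; ⇒ «hello»

;; The explicit-endianness variants neither emit nor expect a BOM.
(utf16be->text (textual->utf16be "hello"))  ; ⇒ «hello»
(utf16le->text (textual->utf16le "hello"))  ; ⇒ «hello»
</enscript>
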
=== Selection

<procedure>(text-length text) → len</procedure>

Returns the number of characters within the given text.

<procedure>(text-ref text idx) → char</procedure>

Returns character {{text[idx]}}, using 0-origin indexing.

<procedure>(textual-length textual) → len</procedure>
<procedure>(textual-ref textual idx) → char</procedure>

{{textual-length}} returns the number of characters in
''textual'', and {{textual-ref}} returns the character at
character index ''idx'', using 0-origin indexing. These
procedures are the generalizations of {{text-length}} and
{{text-ref}} to accept strings as well as texts. If ''textual''
is a text, they must execute in O(1) time, but there is
no such requirement if ''textual'' is a string.

<procedure>(subtext text start end) → text</procedure>
<procedure>(subtextual textual start end) → text</procedure>

These procedures return a text containing the characters
of ''text'' or ''textual'' beginning with index ''start'' (inclusive)
and ending with index ''end'' (exclusive).

If ''textual'' is a string, then that string does not share any
storage with the result, so subsequent mutation of that
string will not affect the text returned by {{subtextual}}.
When the first argument is a text, as is required by
{{subtext}}, the implementation returns a result that shares
storage with that text. These procedures just return their
first argument when that argument is a text, ''start'' is 0, and
''end'' is the length of that text.

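A short sketch of the selection procedures, assuming the egg is importable as {{srfi-135}}. The last expression relies on the statement above that selecting the whole of a text returns the text itself.

<enscript highlight="scheme">
(import srfi-135)  ; assumed module name for this egg

(define txt (string->text "hello world"))

(text-length txt)               ; ⇒ 11
(text-ref txt 4)                ; ⇒ #\o
(textual-ref "hello world" 4)   ; ⇒ #\o, strings are accepted too

(subtext txt 6 11)              ; ⇒ «world»
(subtextual "hello world" 0 5)  ; ⇒ «hello»

;; Selecting the full range hands back the original text.
(eq? txt (subtext txt 0 (text-length txt)))  ; ⇒ #t
</enscript>
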
<procedure>(textual-copy textual [start end]) → text</procedure>

Returns a text containing the characters of ''textual''
beginning with index ''start'' (inclusive) and ending with
index ''end'' (exclusive).

Unlike {{subtext}} and {{subtextual}}, the result of {{textual-copy}}
never shares substructures that would retain characters
or sequences of characters that are substructures of its
first argument or previously allocated objects.

If {{textual-copy}} returns an empty text, that empty text
may be {{eq?}} or {{eqv?}} to the text returned by {{(text)}}.
If the text returned by {{textual-copy}} is non-empty, then it
is not {{eqv?}} to any previously extant object.

<procedure>(textual-take textual nchars) → text</procedure>
<procedure>(textual-drop textual nchars) → text</procedure>
<procedure>(textual-take-right textual nchars) → text</procedure>
<procedure>(textual-drop-right textual nchars) → text</procedure>

{{textual-take}} returns a text containing the first ''nchars''
of ''textual''; {{textual-drop}} returns a text containing all
but the first ''nchars'' of ''textual''. {{textual-take-right}}
returns a text containing the last ''nchars'' of ''textual'';
{{textual-drop-right}} returns a text containing all but the
last ''nchars'' of ''textual''.

If ''textual'' is a string, then that string does not share
any storage with the result, so subsequent mutation of
that string will not affect the text returned by these
procedures. If ''textual'' is a text, the result shares storage with
that text.

<enscript highlight="scheme">
(textual-take "Pete Szilagyi" 6) ⇒ «Pete S»
(textual-drop "Pete Szilagyi" 6) ⇒ «zilagyi»

(textual-take-right "Beta rules" 5) ⇒ «rules»
(textual-drop-right "Beta rules" 5) ⇒ «Beta »

;; It is an error to take or drop more characters than are
;; in the text:
(textual-take "foo" 37) ⇒ error
</enscript>

<procedure>(textual-pad textual len [char start end]) → text</procedure>
<procedure>(textual-pad-right textual len [char start end]) → text</procedure>

Returns a text of length ''len'' composed of the characters
drawn from the given subrange of ''textual'', padded on the
left (right) by as many occurrences of the character ''char''
as needed. If ''textual'' has more than ''len'' characters, it is
truncated on the left (right) to length ''len''. ''char''
defaults to {{#\space}}.

If ''textual'' is a string, then that string does not share
any storage with the result, so subsequent mutation of
that string will not affect the text returned by these
procedures. If ''textual'' is a text, the result shares storage
with that text whenever sharing would be space-efficient.

<enscript highlight="scheme">
(textual-pad "325" 5) ⇒ «  325»
(textual-pad "71325" 5) ⇒ «71325»
(textual-pad "8871325" 5) ⇒ «71325»
</enscript>

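The right-hand variant and the optional ''char'' argument behave analogously. A minimal sketch, assuming the egg is importable as {{srfi-135}}:

<enscript highlight="scheme">
(import srfi-135)  ; assumed module name for this egg

;; Pad on the left with an explicit padding character.
(textual-pad "42" 5 #\0)           ; ⇒ «00042»

;; textual-pad-right pads, and truncates, on the right.
(textual-pad-right "325" 5)        ; ⇒ «325  »
(textual-pad-right "8871325" 5)    ; ⇒ «88713»
</enscript>
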
<procedure>(textual-trim textual [pred start end]) → text</procedure>
<procedure>(textual-trim-right textual [pred start end]) → text</procedure>
<procedure>(textual-trim-both textual [pred start end]) → text</procedure>

Returns a text obtained from the given subrange of
''textual'' by skipping over all characters on the left / on
the right / on both sides that satisfy the second
argument ''pred''. ''pred'' defaults to {{char-whitespace?}}.

If ''textual'' is a string, then that string does not share
any storage with the result, so subsequent mutation of
that string will not affect the text returned by these
procedures. If ''textual'' is a text, the result shares storage
with that text whenever sharing would be space-efficient.

<enscript highlight="scheme">
(textual-trim-both "  The outlook wasn't brilliant,  \n\r") ⇒ «The outlook wasn't brilliant,»
</enscript>

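The one-sided variants and a custom ''pred'' can be sketched as follows, assuming the egg is importable as {{srfi-135}}. The zero-stripping predicate is an illustrative choice.

<enscript highlight="scheme">
(import srfi-135)  ; assumed module name for this egg

(textual-trim "  hi  ")        ; ⇒ «hi  », left side only
(textual-trim-right "  hi  ")  ; ⇒ «  hi», right side only

;; Strip leading zeros by supplying our own predicate.
(textual-trim "0042" (lambda (c) (char=? c #\0)))  ; ⇒ «42»
</enscript>
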
=== Replacement

<procedure>(textual-replace textual1 textual2 start1 end1 [start2 end2]) → text</procedure>

Returns

<enscript highlight="scheme">
(textual-append (subtextual textual1 0 start1)
                (subtextual textual2 start2 end2)
                (subtextual textual1 end1 (textual-length textual1)))
</enscript>

That is, the segment of characters in ''textual1'' from
''start1'' to ''end1'' is replaced by the segment of characters
in ''textual2'' from ''start2'' to ''end2''. If ''start1'' = ''end1'',
this simply splices the characters drawn from ''textual2'' into
''textual1'' at that position.

Examples:

<enscript highlight="scheme">
(textual-replace "The TCL programmer endured daily ridicule."
                 "another miserable perl drone"
                 4
                 7
                 8
                 22)
⇒ «The miserable perl programmer endured daily ridicule.»

(textual-replace "It's easy to code it up in Scheme."
                 "lots of fun"
                 5
                 9)
⇒ «It's lots of fun to code it up in Scheme.»

(define (textual-insert s i t) (textual-replace s t i i))

(textual-insert "It's easy to code it up in Scheme." 5 "really ")
⇒ «It's really easy to code it up in Scheme.»

(define (textual-set s i c) (textual-replace s (text c) i (+ i 1)))

(textual-set "Text-ref runs in O(n) time." 19 #\1)
⇒ «Text-ref runs in O(1) time.»
</enscript>

=== Comparison

<procedure>(textual=? textual1 textual2 textual3 ...) → boolean</procedure>

Returns {{#t}} if all the texts have the same length and
contain exactly the same characters in the same
positions; otherwise returns {{#f}}.

<procedure>(textual<?  textual1 textual2 textual3 ...) → boolean</procedure>
<procedure>(textual>?  textual1 textual2 textual3 ...) → boolean</procedure>
<procedure>(textual<=? textual1 textual2 textual3 ...) → boolean</procedure>
<procedure>(textual>=? textual1 textual2 textual3 ...) → boolean</procedure>

These procedures compare their arguments lexicographically and
return {{#t}} if they are (respectively): monotonically increasing,
monotonically decreasing, monotonically non-decreasing, or
monotonically non-increasing.

These comparison predicates are transitive.

<procedure>(textual-ci=? textual1 textual2 textual3 ...) → boolean</procedure>

Returns {{#t}} if, after calling {{textual-foldcase}} on each of
the arguments, all of the case-folded texts would have
the same length and contain the same characters in the
same positions; otherwise returns {{#f}}.

<procedure>(textual-ci<?  textual1 textual2 textual3 ...) → boolean</procedure>
<procedure>(textual-ci>?  textual1 textual2 textual3 ...) → boolean</procedure>
<procedure>(textual-ci<=? textual1 textual2 textual3 ...) → boolean</procedure>
<procedure>(textual-ci>=? textual1 textual2 textual3 ...) → boolean</procedure>

These procedures behave as though they had called
{{textual-foldcase}} on their arguments before applying the
corresponding procedures without "{{-ci}}".

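The comparison predicates accept any mix of texts and strings. A minimal sketch, assuming the egg is importable as {{srfi-135}}:

<enscript highlight="scheme">
(import srfi-135)  ; assumed module name for this egg

;; Texts and strings compare by content, regardless of representation.
(textual=? "abc" (text #\a #\b #\c))       ; ⇒ #t

;; With more than two arguments, the whole chain must be ordered.
(textual<? "apple" "banana" "cherry")      ; ⇒ #t
(textual<? "apple" "cherry" "banana")      ; ⇒ #f
(textual<=? "apple" "apple" "banana")      ; ⇒ #t

;; Case-insensitive comparison folds case first.
(textual-ci=? "Hello" (string->text "hELLO"))  ; ⇒ #t
</enscript>
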
=== Prefixes & suffixes

<procedure>(textual-prefix-length textual1 textual2 [start1 end1 start2 end2]) → integer</procedure>
<procedure>(textual-suffix-length textual1 textual2 [start1 end1 start2 end2]) → integer</procedure>

Return the length of the longest common prefix/suffix of
''textual1'' and ''textual2''. For prefixes, this is equivalent
to their "mismatch index" (relative to the start
indexes).

The optional ''start''/''end'' indexes restrict the comparison to
the indicated subtexts of ''textual1'' and ''textual2''.

<procedure>(textual-prefix? textual1 textual2 [start1 end1 start2 end2]) → boolean</procedure>
<procedure>(textual-suffix? textual1 textual2 [start1 end1 start2 end2]) → boolean</procedure>

Is ''textual1'' a prefix/suffix of ''textual2''?

The optional ''start''/''end'' indexes restrict the comparison to
the indicated subtexts of ''textual1'' and ''textual2''.

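A few sample calls, assuming the egg is importable as {{srfi-135}}:

<enscript highlight="scheme">
(import srfi-135)  ; assumed module name for this egg

;; "test" is the longest common prefix; "ing" the longest common suffix.
(textual-prefix-length "testing" "tester")   ; ⇒ 4
(textual-suffix-length "reading" "writing")  ; ⇒ 3

(textual-prefix? "pre" "prefix")  ; ⇒ #t
(textual-suffix? "fix" "prefix")  ; ⇒ #t
(textual-prefix? "fix" "prefix")  ; ⇒ #f
</enscript>
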
=== Searching

<procedure>(textual-index textual pred [start end]) → idx-or-false</procedure>
<procedure>(textual-index-right textual pred [start end]) → idx-or-false</procedure>
<procedure>(textual-skip textual pred [start end]) → idx-or-false</procedure>
<procedure>(textual-skip-right textual pred [start end]) → idx-or-false</procedure>

{{textual-index}} searches through the given subtext or
substring from the left, returning the index of the
leftmost character satisfying the predicate ''pred''.
{{textual-index-right}} searches from the right, returning
the index of the rightmost character satisfying the
predicate ''pred''. If no match is found, these procedures
return {{#f}}.

The ''start'' and ''end'' arguments specify the beginning and end
of the search; the valid indexes relevant to the search
include ''start'' but exclude ''end''. Beware of "fencepost"
errors: when searching right-to-left, the first index
considered is {{(-}} ''end'' {{1)}}, whereas when searching
left-to-right, the first index considered is ''start''. That
is, the ''start''/''end'' indexes describe the same half-open
interval [''start'',''end'') in these procedures that they do in
all other procedures specified by this SRFI.

The skip procedures are similar, but use the complement of
the criterion: they search for the first character that
doesn't satisfy ''pred''. To skip over initial whitespace,
for example, say

<enscript highlight="scheme">
(subtextual text
            (or (textual-skip text char-whitespace?)
                (textual-length text))
            (textual-length text))
</enscript>

These procedures can be trivially composed with
{{textual-take}} and {{textual-drop}} to produce take-while,
drop-while, span, and break procedures without loss of
efficiency.

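One possible composition is sketched below, assuming the egg is importable as {{srfi-135}}. {{textual-take-while}} and {{textual-drop-while}} are hypothetical names defined here for illustration; the egg does not export them.

<enscript highlight="scheme">
(import srfi-135)  ; assumed module name for this egg

;; Hypothetical helper: index of the first character failing pred,
;; or the full length when every character satisfies it.
(define (prefix-end textual pred)
  (or (textual-skip textual pred)
      (textual-length textual)))

;; Hypothetical take-while / drop-while built on textual-skip.
(define (textual-take-while textual pred)
  (textual-take textual (prefix-end textual pred)))

(define (textual-drop-while textual pred)
  (textual-drop textual (prefix-end textual pred)))

(textual-index "123abc" char-alphabetic?)     ; ⇒ 3
(textual-take-while "123abc" char-numeric?)   ; ⇒ «123»
(textual-drop-while "123abc" char-numeric?)   ; ⇒ «abc»
(textual-take-while "123" char-numeric?)      ; ⇒ «123»
</enscript>
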
<procedure>(textual-contains textual1 textual2 [start1 end1 start2 end2]) → idx-or-false</procedure>
<procedure>(textual-contains-right textual1 textual2 [start1 end1 start2 end2]) → idx-or-false</procedure>

Does the subtext of ''textual1'' specified by ''start1'' and ''end1''
contain the sequence of characters given by the subtext
of ''textual2'' specified by ''start2'' and ''end2''?

Returns {{#f}} if there is no match. If ''start2'' = ''end2'',
{{textual-contains}} returns ''start1'' but
{{textual-contains-right}} returns ''end1''. Otherwise returns
the index in ''textual1'' for the first character of the
first/last match; that index lies within the half-open
interval [''start1'',''end1''), and the match lies entirely
within the [''start1'',''end1'') range of ''textual1''.

<enscript highlight="scheme">
;; Searches "a geek"
(textual-contains "eek -- what a geek." "ee" 12 18) ⇒ 15
</enscript>

Note: The names of these procedures do not end with a
question mark. This indicates a useful value is returned
when there is a match.

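The difference between the two search directions, and the empty-pattern case described above, can be sketched as follows (assuming the egg is importable as {{srfi-135}}):

<enscript highlight="scheme">
(import srfi-135)  ; assumed module name for this egg

;; "an" occurs at indexes 1 and 3 of "banana".
(textual-contains "banana" "an")        ; ⇒ 1, first match
(textual-contains-right "banana" "an")  ; ⇒ 3, last match
(textual-contains "banana" "xyz")       ; ⇒ #f

;; An empty pattern matches at start1 or end1 respectively.
(textual-contains "banana" "")          ; ⇒ 0
(textual-contains-right "banana" "")    ; ⇒ 6
</enscript>
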
=== Case conversion

<procedure>(textual-upcase textual) → text</procedure>
<procedure>(textual-downcase textual) → text</procedure>
<procedure>(textual-foldcase textual) → text</procedure>
<procedure>(textual-titlecase textual) → text</procedure>

These procedures return the text obtained by applying
Unicode's full uppercasing, lowercasing, case-folding, or
title-casing algorithms to their argument. In some cases,
the length of the result may be different from the length
of the argument. Note that language-sensitive mappings
and foldings are not used.

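A minimal sketch with ASCII input, assuming the egg is importable as {{srfi-135}}. ASCII is used so that the results do not depend on full Unicode case mappings.

<enscript highlight="scheme">
(import srfi-135)  ; assumed module name for this egg

(textual-upcase "Hello")     ; ⇒ «HELLO»
(textual-downcase "Hello")   ; ⇒ «hello»
(textual-foldcase "Hello")   ; ⇒ «hello»

;; The result is always a text, even when the argument is a string.
(text? (textual-upcase "Hello"))  ; ⇒ #t
</enscript>
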
=== Concatenation

<procedure>(textual-append textual ...) → text</procedure>

Returns a text whose sequence of characters is the
concatenation of the sequences of characters in the given
arguments.

<procedure>(textual-concatenate textual-list) → text</procedure>

Concatenates the elements of ''textual-list'' together into a
single text.

If any elements of ''textual-list'' are strings, then those
strings do not share any storage with the result, so
subsequent mutation of those strings will not affect the
text returned by this procedure. The result shares storage with
the texts in the list if that sharing would be space-efficient.

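Both procedures accept a mix of strings and texts, as in this sketch (assuming the egg is importable as {{srfi-135}}):

<enscript highlight="scheme">
(import srfi-135)  ; assumed module name for this egg

;; Variadic form: arguments may be strings or texts.
(textual-append "foo" (string->text "bar"))  ; ⇒ «foobar»
(textual-append)                             ; ⇒ «», the empty text

;; List form.
(textual-concatenate (list "foo" (text #\b #\a #\r) "baz"))
;; ⇒ «foobarbaz»
</enscript>
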
<procedure>(textual-concatenate-reverse textual-list [final-textual end]) → text</procedure>

With no optional arguments, calling this procedure is
equivalent to

<enscript highlight="scheme">
(textual-concatenate (reverse textual-list))
</enscript>

If the optional argument ''final-textual'' is specified, it
is effectively consed onto the beginning of ''textual-list''
before performing the list-reverse and
{{textual-concatenate}} operations.

If the optional argument ''end'' is given, only the
characters up to but not including ''end'' in ''final-textual''
are added to the result, thus producing

<enscript highlight="scheme">
(textual-concatenate
 (reverse (cons (subtextual final-textual 0 end)
                textual-list)))
</enscript>

For example:

<enscript highlight="scheme">
(textual-concatenate-reverse '(" must be" "Hello, I") " going.XXXX" 7)
 ⇒ «Hello, I must be going.»
</enscript>

<procedure>(textual-join textual-list [delimiter grammar]) → text</procedure>

This procedure is a simple unparser; it pastes texts
together using the delimiter text.

''textual-list'' is a list of texts and/or strings. ''delimiter''
is a text or a string. The ''grammar'' argument is a symbol
that determines how the delimiter is used, and defaults
to {{infix}}. It is an error for ''grammar'' to be any symbol
other than these four:

* {{infix}} means an infix or separator grammar: insert the delimiter between list elements. An empty list will produce an empty text.

* {{strict-infix}} means the same as {{infix}} if ''textual-list'' is non-empty, but will signal an error if given an empty list. (This avoids an ambiguity shown in the examples below.)

* {{suffix}} means a suffix or terminator grammar: insert the delimiter after every list element.

* {{prefix}} means a prefix grammar: insert the delimiter before every list element.

The delimiter is the text used to delimit elements; it
defaults to a single space {{" "}}.

<enscript highlight="scheme">
(textual-join '("foo" "bar" "baz")) ⇒ «foo bar baz»
(textual-join '("foo" "bar" "baz") "") ⇒ «foobarbaz»
(textual-join '("foo" "bar" "baz") «:») ⇒ «foo:bar:baz»
(textual-join '("foo" "bar" "baz") ":" 'suffix) ⇒ «foo:bar:baz:»

;; Infix grammar is ambiguous wrt empty list vs. empty text:
(textual-join '()   ":") ⇒ «»
(textual-join '("") ":") ⇒ «»

;; Suffix and prefix grammars are not:
(textual-join '()   ":" 'suffix) ⇒ «»
(textual-join '("") ":" 'suffix) ⇒ «:»
</enscript>

=== Fold, map & friends

<procedure>(textual-fold kons knil textual [start end]) → value</procedure>
<procedure>(textual-fold-right kons knil textual [start end]) → value</procedure>

These are the fundamental iterators for texts.

The {{textual-fold}} procedure maps the ''kons'' procedure across
the given text or string from left to right:

<enscript highlight="scheme">
(... (kons textual[2] (kons textual[1] (kons textual[0] knil))))
</enscript>

In other words, {{textual-fold}} obeys the (tail) recursion

<enscript highlight="scheme">
(textual-fold kons knil textual start end) = (textual-fold kons (kons textual[start] knil) textual start+1 end)
</enscript>

The {{textual-fold-right}} procedure maps ''kons'' across the
given text or string from right to left:

<enscript highlight="scheme">
(kons textual[0]
      (... (kons textual[end-3]
                 (kons textual[end-2]
                       (kons textual[end-1] knil)))))
</enscript>

obeying the (tail) recursion

<enscript highlight="scheme">
(textual-fold-right kons knil textual start end) = (textual-fold-right kons (kons textual[end-1] knil) textual start end-1)
</enscript>

Examples:

<enscript highlight="scheme">
;; Convert a text or string to a list of chars.
(textual-fold-right cons '() textual)

;; Count the number of lower-case characters in a text or string.
(textual-fold (lambda (c count)
                (if (char-lower-case? c) (+ count 1) count))
              0
              textual)
</enscript>

The {{textual-fold-right}} combinator is sometimes called a
"catamorphism."

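The direction of each fold determines the order of the accumulated result. A minimal sketch with concrete input, assuming the egg is importable as {{srfi-135}}:

<enscript highlight="scheme">
(import srfi-135)  ; assumed module name for this egg

;; Left fold conses characters in visiting order, so the list is reversed.
(textual-fold cons '() "abc")         ; ⇒ (#\c #\b #\a)

;; Right fold preserves the original order.
(textual-fold-right cons '() "abc")   ; ⇒ (#\a #\b #\c)

;; start and end restrict the fold to the half-open range [1,3).
(textual-fold cons '() "abcde" 1 3)   ; ⇒ (#\c #\b)

;; Count lower-case characters.
(textual-fold (lambda (c count)
                (if (char-lower-case? c) (+ count 1) count))
              0
              "Hello World")          ; ⇒ 8
</enscript>
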
727<procedure>(textual-map proc textual1 textual2 ...) → text</procedure>
728
729It is an error if ''proc'' does not accept as many arguments
730as the number of ''textual'' arguments passed to {{textual-map}},
731does not accept characters as arguments, or returns a
732value that is not a character, string, or text.
733
734The textual-map procedure applies ''proc'' element-wise to
735the characters of the ''textual'' arguments, converts each
736value returned by ''proc'' to a text, and returns the
737concatenation of those texts. If more than one ''textual''
738argument is given and not all have the same length, then
739{{textual-map}} terminates when the shortest ''textual'' argument
740runs out. The dynamic order in which ''proc'' is called on
741the characters of the ''textual'' arguments is unspecified,
742as is the dynamic order in which the coercions are
743performed. If any strings returned by ''proc'' are mutated
744after they have been returned and before the call to
745{{textual-map}} has returned, then {{textual-map}} returns a text
746with unspecified contents; the {{textual-map}} procedure
747itself does not mutate those strings.
748
749Example:
750
751<enscript highlight="scheme">
752(textual-map (lambda (c0 c1 c2)
753               (case c0
754                 ((#\1) c1)
755                 ((#\2) (string c2))
756                 ((#\-) (text #\- c1))))
757             (string->text "1222-1111-2222")
758             (string->text "Hi There!")
759             (string->text "Dear John"))
760  ⇒ «Hear-here!»
761</enscript>
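
Two simpler cases may help isolate the coercion and truncation
rules described above. This is only a sketch; the import form
{{(srfi 135)}} is an assumption about how the egg's module is named:

<enscript highlight="scheme">
(import (srfi 135))  ; assumed module name for this egg

;; A character result is coerced to a one-character text.
(textual-map char-upcase (string->text "hello"))         ; ⇒ «HELLO»

;; A string result contributes all of its characters; mapping
;; stops once the shorter argument ("abc") is exhausted.
(textual-map (lambda (a b) (string a b)) "abc" "12345")  ; ⇒ «a1b2c3»
</enscript>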

<procedure>(textual-for-each proc textual1 textual2 ...) → unspecified</procedure>

It is an error if ''proc'' does not accept as many arguments
as the number of ''textual'' arguments passed to {{textual-for-each}}
or does not accept characters as arguments.

The {{textual-for-each}} procedure applies ''proc'' element-wise
to the characters of the ''textual'' arguments, going from
left to right. If more than one ''textual'' argument is given
and not all have the same length, then {{textual-for-each}}
terminates when the shortest ''textual'' argument runs out.
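
A minimal sketch of the left-to-right order and of the
shortest-argument rule, assuming the module is imported as
{{(srfi 135)}}; the variable name {{pairs}} is purely illustrative:

<enscript highlight="scheme">
(import (srfi 135))  ; assumed module name for this egg

(let ((pairs '()))
  (textual-for-each
   (lambda (a b) (set! pairs (cons (cons a b) pairs)))
   (string->text "abc")
   "12")                ; shorter argument, so proc is called twice
  (reverse pairs))      ; ⇒ ((#\a . #\1) (#\b . #\2))
</enscript>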

<procedure>(textual-map-index proc textual [start end]) → text</procedure>

Calls ''proc'' on each valid index of the specified subtext
or substring, converts the results of those calls into
texts, and returns the concatenation of those texts. It
is an error for ''proc'' to return anything other than a
character, string, or text. The dynamic order in which
''proc'' is called on the indexes is unspecified, as is the
dynamic order in which the coercions are performed. If
any strings returned by ''proc'' are mutated after they have
been returned and before the call to {{textual-map-index}}
has returned, then {{textual-map-index}} returns a text with
unspecified contents; the {{textual-map-index}} procedure
itself does not mutate those strings.
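
Because ''proc'' receives an index rather than a character, it can
consult the position or look at other parts of the text. The
following sketch assumes the module is imported as {{(srfi 135)}};
{{txt}} is a hypothetical example text:

<enscript highlight="scheme">
(import (srfi 135))  ; assumed module name for this egg

(define txt (string->text "abcde"))  ; hypothetical example text

;; Upcase the characters at even indexes.
(textual-map-index
 (lambda (i)
   (let ((c (text-ref txt i)))
     (if (even? i) (char-upcase c) c)))
 txt)                                ; ⇒ «AbCdE»

;; Reverse the text by reading from the opposite end.
(textual-map-index
 (lambda (i) (text-ref txt (- (text-length txt) i 1)))
 txt)                                ; ⇒ «edcba»
</enscript>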

<procedure>(textual-for-each-index proc textual [start end]) → unspecified</procedure>

Calls ''proc'' on each valid index of the specified subtext
or substring, in increasing order, discarding the results
of those calls. This is simply a safe and correct way to
loop over a subtext or substring.

Example:

<enscript highlight="scheme">
(let ((txt (string->text "abcde"))
      (v '()))
  (textual-for-each-index
   (lambda (cur) (set! v (cons (char->integer (text-ref txt cur)) v)))
   txt)
  v) ⇒ (101 100 99 98 97)
</enscript>

<procedure>(textual-count textual pred [start end]) → integer</procedure>

Returns a count of the number of characters in the
specified subtext of ''textual'' that satisfy the given
predicate.
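
A short sketch, assuming the module is imported as {{(srfi 135)}}.
Note that ''textual'' comes before ''pred'' here, the reverse of the
argument order used by {{textual-filter}} and {{textual-remove}}:

<enscript highlight="scheme">
(import (srfi 135))  ; assumed module name for this egg

(textual-count "Hello World" char-upper-case?)  ; ⇒ 2

;; Count only within the subtext starting at index 2 ("nana").
(textual-count (string->text "banana")
               (lambda (c) (char=? c #\a))
               2)                               ; ⇒ 2
</enscript>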

<procedure>(textual-filter pred textual [start end]) → text</procedure>
<procedure>(textual-remove pred textual [start end]) → text</procedure>

{{textual-filter}} returns a text containing only those characters
of the given subtext of ''textual'' that satisfy ''pred'';
{{textual-remove}} returns a text containing only those that do
not satisfy ''pred''.

If ''textual'' is a string, then that string does not share
any storage with the result, so subsequent mutation of
that string will not affect the text returned by these
procedures. If ''textual'' is a text, the result shares storage
with that text whenever sharing would be space-efficient.
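
The two procedures are complements of each other, as this sketch
illustrates (the import form {{(srfi 135)}} is an assumption about
the egg's module name):

<enscript highlight="scheme">
(import (srfi 135))  ; assumed module name for this egg

(textual-filter char-alphabetic? "a1b2c3")  ; ⇒ «abc»
(textual-remove char-alphabetic? "a1b2c3")  ; ⇒ «123»

;; With start = 2, only the subtext "b2c3" is examined.
(textual-filter char-numeric? (string->text "a1b2c3") 2)  ; ⇒ «23»
</enscript>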

=== Replication & splitting

<procedure>(textual-replicate textual from to [start end]) → text</procedure>

This is an "extended subtext" procedure that implements
replicated copying of a subtext or substring.

''textual'' is a text or string; ''start'' and ''end'' are optional
arguments that specify a subtext of ''textual'', defaulting
to 0 and the length of ''textual''. This subtext is
conceptually replicated both up and down the index space,
in both the positive and negative directions. For
example, if ''textual'' is {{"abcdefg"}}, ''start'' is 3, and ''end''
is 6, then we have the conceptual bidirectionally-infinite
text

 ...  d  e  f  d  e  f  d  e  f  d  e  f  d  e  f  d  e  f  d ...
     -9 -8 -7 -6 -5 -4 -3 -2 -1  0 +1 +2 +3 +4 +5 +6 +7 +8 +9

{{textual-replicate}} returns the subtext of this text
beginning at index ''from'', and ending at ''to''. It is an error
if ''from'' is greater than ''to''.

You can use {{textual-replicate}} to perform a variety of
tasks:

* To rotate a text left: {{(textual-replicate "abcdef" 2 8)}} ⇒ {{«cdefab»}}

* To rotate a text right: {{(textual-replicate "abcdef" -2 4)}} ⇒ {{«efabcd»}}

* To replicate a text: {{(textual-replicate "abc" 0 7)}} ⇒ {{«abcabca»}}

Note that

* The ''from''/''to'' arguments give a half-open range containing the characters from index ''from'' up to, but not including, index ''to''.

* The ''from''/''to'' indexes are not expressed in the index space of ''textual''. They refer instead to the replicated index space of the subtext defined by ''textual'', ''start'', and ''end''.

It is an error if ''start'' = ''end'', unless ''from'' = ''to'', which is
allowed as a special case.
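
The {{"abcdefg"}} example with ''start'' = 3 and ''end'' = 6 can be
written out as follows. This is a sketch read off the index diagram
above, assuming the module is imported as {{(srfi 135)}}:

<enscript highlight="scheme">
(import (srfi 135))  ; assumed module name for this egg

;; The subtext selected by start = 3 and end = 6 is "def";
;; from and to index into its replicated copies.
(textual-replicate "abcdefg" 0 7 3 6)   ; ⇒ «defdefd»
(textual-replicate "abcdefg" -2 4 3 6)  ; ⇒ «efdefd»

;; from = to selects an empty range.
(textual-replicate "abcdefg" 2 2 3 6)   ; ⇒ «»
</enscript>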

<procedure>(textual-split textual delimiter [grammar limit start end]) → list</procedure>

Returns a list of texts representing the words contained
in the subtext of ''textual'' from ''start'' (inclusive) to ''end''
(exclusive). The ''delimiter'' is a text or string to be used
as the word separator. This will often be a single
character, but multiple characters are allowed for use
cases such as splitting on {{"\r\n"}}. The returned list will
have one more item than the number of non-overlapping
occurrences of the delimiter in the text. If ''delimiter'' is
an empty text, then the returned list consists of
texts that each contain a single character.

The ''grammar'' is a symbol with the same meaning as in the
{{textual-join}} procedure. If it is ''infix'', which is the
default, processing is done as described above, except an
empty ''textual'' produces the empty list; if ''grammar'' is
''strict-infix'', then an empty ''textual'' signals an error. The
values ''prefix'' and ''suffix'' cause a leading/trailing empty
text in the result to be suppressed.

If ''limit'' is a non-negative exact integer, at most that
many splits occur, and the remainder of ''textual'' is
returned as the final element of the list (so the result
will have at most ''limit''+1 elements). If ''limit'' is not
specified or is {{#f}}, then as many splits as possible are
made. It is an error if ''limit'' is any other value.
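
The effect of the ''grammar'' and ''limit'' arguments described above
can be sketched as follows. The module name {{(srfi 135)}} is an
assumption, and {{split->strings}} is a hypothetical helper that
converts the resulting texts to strings so the results are easy to
read:

<enscript highlight="scheme">
(import (srfi 135))  ; assumed module name for this egg

;; Hypothetical helper: split, then convert each text to a string.
(define (split->strings . args)
  (map textual->string (apply textual-split args)))

(split->strings "a:b:c" ":")           ; ⇒ ("a" "b" "c")
(split->strings "a:b:c" ":" 'infix 1)  ; ⇒ ("a" "b:c")

;; A trailing delimiter yields a trailing empty text, which the
;; suffix grammar suppresses.
(split->strings "a:b:" ":")            ; ⇒ ("a" "b" "")
(split->strings "a:b:" ":" 'suffix)    ; ⇒ ("a" "b")

;; With the default infix grammar, an empty input gives an empty list.
(split->strings "" ":")                ; ⇒ ()
</enscript>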

To split on a regular expression ''re'', use SRFI 115's
{{regexp-split}} procedure:

<enscript highlight="scheme">
(map string->text (regexp-split re (textual->string txt)))
</enscript>

== About This Egg

=== Dependencies

The [[https://wiki.call-cc.org/eggref/5/utf8|utf8]] and
[[https://wiki.call-cc.org/eggref/5/r7rs|r7rs]] eggs are required.

=== Maintainer

Wolfgang Corcoran-Mathe <wcm at sigwinch dot xyzzy without the zy>

=== Repository

[[https://github.com/Zipheir/srfi-135|GitHub]]

=== Version History

; 0.1 : (2020-11-12) Initial release.

== License

Copyright (C) William D Clinger (2016). All Rights Reserved.

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.