str/tokenize
Pure-Gleam tokenizer for grapheme cluster segmentation.
Handles base characters followed by combining marks, variation selectors, skin-tone modifiers, and simple ZWJ sequences. It is a pedagogical reference, not a complete implementation of UAX #29 (Unicode Text Segmentation).
Values
pub fn chars(text: String) -> List(String)
Splits the input string into a list of grapheme clusters using a pure-Gleam approximation of grapheme segmentation. Results may differ from full UAX #29 segmentation on complex sequences.
chars("café") -> ["c", "a", "f", "é"]
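To make the idea of an approximation concrete, here is a minimal sketch of one rule only: attaching combining marks to the preceding base character. It is not this module's actual source. The names `approx_chars` and `is_combining` are hypothetical, and the check covers only the Combining Diacritical Marks block (U+0300 to U+036F), ignoring variation selectors, skin-tone modifiers and ZWJ sequences.

```gleam
import gleam/list
import gleam/string

// Hypothetical helper: True only for code points in the Combining
// Diacritical Marks block (U+0300..U+036F). A real tokenizer would
// need to cover many more ranges.
fn is_combining(cp: UtfCodepoint) -> Bool {
  let n = string.utf_codepoint_to_int(cp)
  n >= 0x0300 && n <= 0x036F
}

// Hypothetical sketch: group each base code point with the combining
// marks that directly follow it.
pub fn approx_chars(text: String) -> List(String) {
  text
  |> string.to_utf_codepoints
  |> list.fold([], fn(acc, cp) {
    case acc, is_combining(cp) {
      // A combining mark joins the cluster currently being built.
      [current, ..rest], True -> [[cp, ..current], ..rest]
      // Anything else starts a new cluster.
      _, _ -> [[cp], ..acc]
    }
  })
  // Clusters and their code points were accumulated in reverse order.
  |> list.reverse
  |> list.map(fn(cluster) {
    cluster |> list.reverse |> string.from_utf_codepoints
  })
}
```

With this sketch, `approx_chars("cafe\u{0301}")` keeps the combining acute accent attached to the `e`, so the decomposed spelling of "café" yields four clusters rather than five code points.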
pub fn chars_stdlib(text: String) -> List(String)
Splits the input string into grapheme clusters using the BEAM standard library's grapheme segmentation, which is more accurate than chars.
chars_stdlib("café") -> ["c", "a", "f", "é"]
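For comparison, the Gleam standard library exposes runtime-backed grapheme segmentation through `string.to_graphemes`. The sketch below assumes a stdlib-backed splitter behaves like that function. Whether `chars_stdlib` is implemented this way is an assumption, and `stdlib_chars` and `grapheme_count` are hypothetical names used only for illustration.

```gleam
import gleam/list
import gleam/string

// Assumption: stands in for a stdlib-backed splitter such as
// chars_stdlib; the real function may be implemented differently.
pub fn stdlib_chars(text: String) -> List(String) {
  string.to_graphemes(text)
}

// Counts user-perceived characters rather than code points.
pub fn grapheme_count(text: String) -> Int {
  text |> stdlib_chars |> list.length
}
```

Running both splitters over the same inputs and comparing the resulting lists is one way to find where the pure-Gleam approximation diverges from the runtime's segmentation.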