str/tokenize

Pure-Gleam tokenizer for grapheme cluster segmentation.

Handles base characters followed by combining marks, variation selectors, skin-tone modifiers, and simple ZWJ sequences. This is a pedagogical reference, not a full implementation of UAX #29.

Values

pub fn chars(text: String) -> List(String)

Returns the grapheme clusters of the input string, in order, using a pure-Gleam approximation of grapheme segmentation.

chars("café") -> ["c", "a", "f", "é"]
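The approximation described for this module (a base character absorbing trailing combining marks, variation selectors, skin-tone modifiers, and ZWJ-joined codepoints) might be sketched as follows. This is a minimal illustration under stated assumptions, not the module's actual source: `approx_chars`, `group`, `is_extender`, and the exact codepoint ranges are hypothetical choices made for the example.

```gleam
import gleam/list
import gleam/string

// U+200D ZERO WIDTH JOINER
const zwj = 0x200D

// Hypothetical helper: True when a codepoint should attach to the
// preceding cluster. The ranges are an assumption for this sketch and
// cover only a small part of what UAX #29 treats as extending.
fn is_extender(cp: Int) -> Bool {
  let combining_mark = cp >= 0x0300 && cp <= 0x036F
  let variation_selector = cp >= 0xFE00 && cp <= 0xFE0F
  let skin_tone = cp >= 0x1F3FB && cp <= 0x1F3FF
  combining_mark || variation_selector || skin_tone
}

// Walks the codepoints once, building clusters in reverse order.
// `after_zwj` records whether the previous codepoint was a ZWJ, in
// which case the next codepoint joins the current cluster.
fn group(
  cps: List(UtfCodepoint),
  acc: List(List(UtfCodepoint)),
  after_zwj: Bool,
) -> List(List(UtfCodepoint)) {
  case cps, acc {
    [], _ -> acc
    [cp, ..rest], [] ->
      group(rest, [[cp]], string.utf_codepoint_to_int(cp) == zwj)
    [cp, ..rest], [current, ..done] -> {
      let n = string.utf_codepoint_to_int(cp)
      case after_zwj || n == zwj || is_extender(n) {
        True -> group(rest, [[cp, ..current], ..done], n == zwj)
        False -> group(rest, [[cp], current, ..done], False)
      }
    }
  }
}

// Hypothetical stand-in for `chars`.
pub fn approx_chars(text: String) -> List(String) {
  text
  |> string.to_utf_codepoints
  |> group([], False)
  |> list.reverse
  |> list.map(fn(cluster) {
    cluster |> list.reverse |> string.from_utf_codepoints
  })
}
```

With this sketch, a decomposed "e" followed by U+0301 stays together as one cluster, and codepoints joined by a ZWJ collapse into a single cluster. Regional-indicator pairs (flags) and Hangul syllable sequences are not handled, which is one way an approximation like this falls short of full UAX #29.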

pub fn chars_stdlib(text: String) -> List(String)

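Returns the grapheme clusters of the input string using the BEAM standard library's grapheme segmentation, which is more accurate than the pure-Gleam approximation.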

chars_stdlib("café") -> ["c", "a", "f", "é"]
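For comparison, standard-library segmentation is available in Gleam through `string.to_graphemes` from `gleam/string`. The sketch below assumes, without confirmation from this page, that `chars_stdlib` behaves like a thin wrapper over that function; `stdlib_graphemes` is a hypothetical name used only for illustration.

```gleam
import gleam/string

// Hypothetical stand-in for `chars_stdlib`; assumes it delegates to
// gleam/string's grapheme segmentation.
pub fn stdlib_graphemes(text: String) -> List(String) {
  string.to_graphemes(text)
}

pub fn main() {
  // "e" followed by U+0301 COMBINING ACUTE ACCENT is kept as one cluster.
  let assert ["c", "a", "f", "e\u{0301}"] = stdlib_graphemes("cafe\u{0301}")
}
```

Running both functions over the same inputs is a simple way to find the cases where the pure-Gleam approximation diverges from the standard library.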
