str/tokenize

Pure-Gleam tokenizer for grapheme cluster segmentation.

Handles base characters followed by combining marks, variation selectors, skin-tone modifiers, and simple ZWJ sequences. This is an experimental implementation that only approximates grapheme segmentation. For production use, prefer chars_stdlib/1, which uses the BEAM-provided string.to_graphemes and follows the platform's grapheme segmentation.

Values

pub fn chars(text: String) -> List(String)

Returns the grapheme clusters of the input string as a list. chars/1 is an experimental, pure-Gleam tokenizer that approximates grapheme cluster segmentation. It is useful when you want a tokenizer written entirely in Gleam (e.g. for study, or in environments where you prefer not to depend on the BEAM helper), but it may differ from the BEAM stdlib implementation in edge cases. Prefer chars_stdlib/1 in production code where accuracy and performance matter.

Example: chars("café") -> ["c", "a", "f", "é"]

pub fn chars_stdlib(text: String) -> List(String)

Returns a list of grapheme clusters using the BEAM stdlib grapheme segmentation, which is more accurate than chars/1 in edge cases.

Example: chars_stdlib("café") -> ["c", "a", "f", "é"]
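
Because chars/1 may differ from chars_stdlib/1 in edge cases, it can help to compare the two on the inputs you care about. The following is a minimal sketch, not part of the module: it assumes the module is imported as str/tokenize, and the sample strings are illustrative choices rather than documented test cases.

```gleam
import gleam/io
import gleam/list
import gleam/string
import str/tokenize

pub fn main() {
  // Illustrative inputs (assumptions, not taken from the module docs):
  // a precomposed accent, a base letter plus combining mark, and an
  // emoji followed by a skin-tone modifier.
  let samples = ["café", "e\u{0301}", "👍\u{1F3FD}"]

  list.each(samples, fn(text) {
    let pure = tokenize.chars(text)
    let stdlib = tokenize.chars_stdlib(text)

    // Lists of strings compare structurally, so == is enough here.
    case pure == stdlib {
      True -> io.println("match: " <> string.inspect(stdlib))
      False ->
        io.println(
          "differs: chars gave "
          <> string.inspect(pure)
          <> ", chars_stdlib gave "
          <> string.inspect(stdlib),
        )
    }
  })
}
```

Any line reported as "differs" marks an input where the pure-Gleam approximation diverges from the platform's segmentation, which is a reason to use chars_stdlib/1 for that data.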
