Skip to main content

Text.tokenize

tokenizepatterncase_sensitivity

Group: Conversions

Documentation

Takes an input string and and a pattern and returns all the matches as a Vector Text. If the pattern contains marked groups, the values are concatenated together; otherwise the whole match is returned.

Arguments

  • input: The text to tokenize.
  • case_sensitivity: Specifies if the text values should be compared case sensitively. The values are compared case sensitively by default.

Examples

Split to blocks of 3 characters.

     "ABCDEF" . tokenize  "..." == ["ABC","DEF"]

Split to blocks of 3 characters taking first and third letters.

     "ABCDEF" . tokenize "(.).(.)" == ["AC","DF"]

Split a text on any white space.

     'Hello Big\r\nWide\tWorld\nGoodbye!' . tokenize "(\S+)(?:\s+|$)"
== ["Hello","Big","Wide","World","Goodbye!"]