Text.words
words keep_whitespace
Group: Text
Aliases: get words
Documentation
Returns a vector containing all words in the given text.
Arguments
keep_whitespace: Whether the whitespace around the words should be preserved. If set to True, the whitespace will be included as a "word" in the output.
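The effect of keep_whitespace can be pictured as a filter applied after the text has been split at word boundaries. The sketch below is in Python and is not Enso's implementation. The helper `split_segments` is hypothetical: it uses a simple regular expression in place of a real Unicode Standard Annex #29 segmenter, so it only handles space-separated text like the English example and has no dictionary support for Thai or other languages. The `words` function then keeps or drops whitespace-only segments depending on `keep_whitespace`. Its default of `False` mirrors the examples, where no whitespace appears in the output.

```python
import re
from typing import List


def split_segments(text: str) -> List[str]:
    """Hypothetical stand-in for a UAX #29 word segmenter.

    Splits text into runs of word characters, runs of whitespace, and
    single other characters (punctuation). This is NOT UAX #29: it has no
    dictionary support for Chinese, Japanese, Thai, or Khmer, and it does
    not keep apostrophes inside words (e.g. "don't" becomes three segments).
    """
    return re.findall(r"\w+|\s+|[^\w\s]", text)


def words(text: str, keep_whitespace: bool = False) -> List[str]:
    """Sketch of the documented keep_whitespace behaviour of Text.words."""
    segments = split_segments(text)
    if keep_whitespace:
        # Whitespace runs stay in the output as their own "words".
        return segments
    # Otherwise drop every segment that consists only of whitespace.
    return [s for s in segments if not s.isspace()]


print(words("I have not one, but two cats."))
print(words("I have not one, but two cats.", keep_whitespace=True))
```

With this sketch, the first call yields the same items as the English example, and the second call additionally returns each `' '` between the words.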
Examples
Getting the words in the sentence "I have not one, but two cats."
"I have not one, but two cats.".words == ['I', 'have', 'not', 'one', ',', 'but', 'two', 'cats', '.']
Getting the words in the Thai sentence "แมวมีสี่ขา"
"แมวมีสี่ขา".words == ['แมว', 'มี', 'สี่', 'ขา']
Remarks
What is a Word?
A word is defined by the Word Boundaries rules in Unicode Standard Annex #29 (Unicode Text Segmentation), supplemented by language-specific dictionaries for Chinese, Japanese, Thai, and Khmer.