Text.words

words keep_whitespace

Group: Text
Aliases: get words

Documentation

Returns a vector containing all words in the given text.

Arguments

  • keep_whitespace: Whether the whitespace around the words should be preserved. If set to True, the whitespace is included in the output as separate "word" elements.

Examples

Getting the words in the sentence "I have not one, but two cats."

     "I have not one, but two cats.".words == ['I', 'have', 'not', 'one', ',', 'but', 'two', 'cats', '.']

Getting the words in the Thai sentence "แมวมีสี่ขา"

     "แมวมีสี่ขา".words == ['แมว', 'มี', 'สี่', 'ขา']
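
Illustrating the effect of keep_whitespace on the English sentence above

The following is a rough Python sketch, not Enso code and not the library's implementation. The helper `words` and the pattern `_TOKEN` are hypothetical names, and the regular expression is only a simple stand-in for real word-boundary detection. It is meant to show how the output might differ when whitespace is kept.

```python
import re

# Rough stand-in for word segmentation (an assumption for illustration):
# a run of word characters, a run of whitespace, or one other character.
# This is NOT the Unicode Annex #29 algorithm and uses no dictionaries.
_TOKEN = re.compile(r"\w+|\s+|[^\w\s]")


def words(text, keep_whitespace=False):
    """Hypothetical helper mirroring the keep_whitespace argument."""
    tokens = _TOKEN.findall(text)
    if keep_whitespace:
        return tokens
    # Drop the tokens that consist only of whitespace.
    return [t for t in tokens if not t.isspace()]


sentence = "I have not one, but two cats."
without_ws = words(sentence)
with_ws = words(sentence, keep_whitespace=True)
```

In this sketch, `without_ws` equals the vector shown in the first example, while `with_ws` also contains a `' '` element at each space, so joining its elements gives back the original sentence. The sketch cannot reproduce the Thai example, which has no spaces and depends on dictionary-based segmentation.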

Remarks

What is a Word?

A word is defined according to the Word Boundaries rules in Unicode Standard Annex #29 (Unicode Text Segmentation), supplemented by language-specific dictionaries for Chinese, Japanese, Thai, and Khmer.