Statements and Ops
TOKENIZE
TOKENIZE(<X>, **kwargs): This operation tokenizes long texts into token windows of a given length with a given stride. It is an element-wise column operation. X should be a VARCHAR column. Each row is treated and tokenized as an independent long text.
Alternatively, you can specify the exact number of splits to produce from each long text. The operation returns an array-valued column, which can be exploded into one row per array element with an unnesting operation.
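To illustrate what exploding an array-valued column means, here is a minimal Python sketch. It is not this system's unnesting operation: the `explode` helper and the `(row_id, windows)` row layout are hypothetical and only model the idea of turning one row holding an array into one row per element.

```python
from typing import List, Tuple


def explode(rows: List[Tuple[int, List[str]]]) -> List[Tuple[int, str]]:
    """Unnest (row_id, [windows]) rows into one (row_id, window) row per element.

    Hypothetical helper for illustration only; not part of the documented API.
    """
    return [(row_id, window) for row_id, windows in rows for window in windows]


# An array-valued column, as a tokenization step might produce it (made-up data).
tokenized = [(1, ["a b", "b c"]), (2, ["d e"])]
exploded = explode(tokenized)
```

Each input row contributes as many output rows as its array has elements, and the row identifier is repeated so that windows can still be traced back to their source text.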
Args
The length of the tokenization window.
The stride of the tokenization.
An alternative way to specify the number of splits the tokenizer produces.
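The two ways of controlling the output described above (a window length with a stride, or a fixed number of splits) can be sketched in plain Python. This is a sketch under stated assumptions, not the operation's implementation: tokens are approximated by whitespace-separated words, the names `window`, `stride`, and `num_splits` are hypothetical stand-ins for the arguments listed above, and the handling of the final partial window is a choice made here for illustration.

```python
from typing import List


def token_windows(text: str, window: int, stride: int) -> List[List[str]]:
    """Slide a window of `window` tokens over `text`, advancing `stride` tokens each step.

    Assumption: whitespace splitting stands in for a real tokenizer.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    tokens = text.split()
    if not tokens:
        return []
    windows = []
    for start in range(0, len(tokens), stride):
        windows.append(tokens[start:start + window])
        # Stop once a window has reached the end of the text.
        if start + window >= len(tokens):
            break
    return windows


def split_evenly(text: str, num_splits: int) -> List[List[str]]:
    """Split `text` into exactly `num_splits` contiguous, nearly equal token chunks."""
    if num_splits <= 0:
        raise ValueError("num_splits must be positive")
    tokens = text.split()
    base, extra = divmod(len(tokens), num_splits)
    chunks, start = [], 0
    for i in range(num_splits):
        # The first `extra` chunks take one additional token each.
        size = base + (1 if i < extra else 0)
        chunks.append(tokens[start:start + size])
        start += size
    return chunks


overlapping = token_windows("a b c d e", window=3, stride=2)
halves = split_evenly("a b c d e", num_splits=2)
```

With a stride smaller than the window, consecutive windows overlap, which preserves context across window boundaries. With a fixed number of splits, the chunk size follows from the text length instead.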