SIM(X, Y): Computes a symmetric semantic similarity score between <X> and <Y>. The operation is element-wise and column-valued: it returns one score per row.

Each argument may be a VARCHAR column or a string literal. If one argument is a string literal, it is broadcast to every row of the column argument. If both arguments are string literals, both are broadcast to the number of rows in the table.
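The broadcasting rules above can be modeled in a few lines of Python. This is a hypothetical sketch of the documented behavior, not the engine's implementation. Columns are modeled as Python lists of strings and literals as plain `str`. The names `broadcast_args` and `n_rows` are illustrative, and raising an error on a length mismatch is an assumption.

```python
from typing import List, Sequence, Tuple, Union

# A column is modeled as a list of strings; a literal as a plain str.
Arg = Union[str, Sequence[str]]


def broadcast_args(x: Arg, y: Arg, n_rows: int) -> List[Tuple[str, str]]:
    """Pair SIM's arguments row by row, repeating any literal once per row."""

    def expand(arg: Arg) -> List[str]:
        if isinstance(arg, str):
            # A string literal is broadcast to every row of the table.
            return [arg] * n_rows
        values = list(arg)
        if len(values) != n_rows:
            # Assumption: a column always has exactly one value per table row.
            raise ValueError("column length does not match the table's row count")
        return values

    return list(zip(expand(x), expand(y)))


pets = ["dog", "kitten", "parrot"]
print(broadcast_args("cat", pets, n_rows=len(pets)))
# [('cat', 'dog'), ('cat', 'kitten'), ('cat', 'parrot')]
print(broadcast_args("cat", "dog", n_rows=2))
# [('cat', 'dog'), ('cat', 'dog')]
```

Each pair is then scored independently. This is what makes SIM element-wise: the output contains one score per row.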

The semantic similarity score is an inner product of unit-length vectors, so it always lies between -1 and 1. A score above 0.3 indicates that two texts are similar, and a score above 0.4 indicates that they are highly similar. Nearly exact matches can score above 0.5. Scores below 0.1 indicate no particular similarity or semantic relationship between the texts.
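The bound and the score bands can be sketched as follows. The sketch assumes the inner product is taken between vectors scaled to unit length. That assumption is what guarantees the [-1, 1] bound, by the Cauchy-Schwarz inequality. The input vectors stand in for whatever embeddings the engine uses, which this page does not specify. `normalize`, `similarity`, and `interpret` are hypothetical helper names. The "unspecified" label for scores from 0.1 to 0.3 is a placeholder, because the documentation does not characterize that range.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale v to unit length (assumption: SIM compares unit-length vectors)."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [c / norm for c in v]


def similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of the normalized vectors; symmetric and within [-1, 1]."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))


def interpret(score: float) -> str:
    """Map a score onto the bands described in the documentation."""
    if score > 0.4:
        return "highly similar"  # nearly exact matches can exceed 0.5
    if score > 0.3:
        return "similar"
    if score < 0.1:
        return "no particular similarity"
    return "unspecified"  # hypothetical label: 0.1 to 0.3 is not characterized


print(similarity([1.0, 0.0], [1.0, 0.0]))   # 1.0
print(similarity([1.0, 0.0], [-1.0, 0.0]))  # -1.0
print(interpret(0.35))                       # similar
```

Swapping the arguments of `similarity` multiplies the same pairs of components in the same order, so the result is identical. This mirrors the symmetry of SIM.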

Syntax

SELECT SIM(<COLUMN> | <LITERAL>, <COLUMN> | <LITERAL>) FROM <TABLE>;

Args

X
VARCHAR or str (literal)

The first argument to the semantic similarity score. X must be a VARCHAR column or a string literal.

Y
VARCHAR or str (literal)

The second argument to the semantic similarity score. Y must be a VARCHAR column or a string literal.

Examples

SELECT SIM('cat', pets) FROM pets_table;