--add_tokenized

Switch

--add_tokenized

Description

Creates a tokenized version of the message table.

Argument and Default Value

None

Details

This creates a table called TABLE_tok (where TABLE is the table specified by -t) in the database specified by -d. The message column in the new table holds a list of tokens.

The switch uses WWBP's tokenizer to split each message into tokens, then stores the JSON-encoded token list as MySQL text.

If your message is:

"Mom said she's gonna think about getting a truck."

the message column of the same row in the tokenized table will look like this:

["mom", "said", "she's", "gonna", "think", "about", "getting", "a", "truck", "."]
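As a rough sketch of how a row like the one above could be produced: the snippet below lowercases a message, splits it with a simple regular expression, and serializes the token list with json.dumps. The pattern and the function name tokenize_simple are hypothetical stand-ins for illustration only. They are not WWBP's tokenizer, which likely handles many more cases, and the lowercasing is inferred from the example above.

```python
import json
import re

# Hypothetical stand-in pattern, NOT WWBP's tokenizer: matches words
# (keeping internal apostrophes) or single punctuation characters.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)*|[^\w\s]")


def tokenize_simple(message):
    """Lowercase the message and split it into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(message.lower())


message = "Mom said she's gonna think about getting a truck."
tokens = tokenize_simple(message)

# The JSON string is the kind of value stored in the message column of TABLE_tok.
stored = json.dumps(tokens)
print(stored)
```

For this particular message the stand-in yields the same token list as the example, but it should not be expected to match the real tokenizer on other input.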

To use the tokenized table in standalone Python scripts, parse each message with json.loads(message).
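A minimal sketch of reading a tokenized message in Python, assuming the message value has already been fetched from TABLE_tok as a string. The database query is omitted and the variable names are illustrative. Note that Python's json.loads parses a string, whereas json.load expects a file object.

```python
import json

# A message value as stored in the tokenized table (copied from the example above).
stored_message = """["mom", "said", "she's", "gonna", "think", "about", "getting", "a", "truck", "."]"""

# Decode the JSON text back into a Python list of tokens.
tokens = json.loads(stored_message)

print(len(tokens))  # 10
print(tokens[0])    # mom
```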

Other Switches

Required Switches: -d, -t, -c

Optional Switches:

Example Commands

# General form
# Creates the table: TABLE_tok
dlatkInterface.py -d DATABASE -t TABLE -c GROUP_BY_FIELD --add_tokenized

# Creates the table: primals_new_tok (in the database primals)
dlatkInterface.py -d primals -t primals_new -c message_id --add_tokenized