--add_tokenized

Switch

--add_tokenized

Description

Creates a tokenized version of the message table.

Argument and Default Value

None

Details

This creates a table called TABLE_tok (where TABLE is the table specified by -t) in the database specified by -d. The message column in the new table holds each message as a JSON-encoded list of tokens. Tokenization uses DLATK's built-in tokenizer, Happier Fun Tokenizer, which is an extension of Happy Fun Tokenizer.

If your message is:

"Mom said she's gonna think about getting a truck."

the corresponding row in the tokenized table will look like this:

["mom", "said", "she's", "gonna", "think", "about", "getting", "a", "truck", "."]

To use the tokenized table in standalone Python scripts, parse each message string with json.loads(message).
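As a minimal sketch of that step, assuming the message column holds a JSON-encoded list as in the example above: the snippet below parses one row's message string into a Python list. The variable names are illustrative, and the string is copied from the example rather than fetched from the database.

```python
import json

# One value of the `message` column from TABLE_tok, copied from the example above.
# In a real script this string would come from a database query (not shown here).
row_message = '["mom", "said", "she\'s", "gonna", "think", "about", "getting", "a", "truck", "."]'

# json.loads parses a JSON string; json.load expects a file-like object instead.
tokens = json.loads(row_message)

print(len(tokens))   # number of tokens in the message
print(tokens[:3])    # first few tokens
```

After parsing, tokens is an ordinary Python list of strings, so it can be iterated, counted, or filtered like any other list.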

Other Switches

Required Switches: -d, -t, -c

Example Commands

# Creates the table msgs_tok
dlatkInterface.py -d dla_tutorial -t msgs -c message_id --add_tokenized
mysql> select message from msgs_tok limit 1;
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| message                                                                                                                                                                                                                                                                                                                |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ["can", "you", "believe", "it", "?", "?", "my", "mom", "wouln't", "let", "me", "go", "out", "on", "my", "b'day", "...", "i", "was", "really", "really", "mad", "at", "her", ".", "still", "am", ".", "but", "i", "got", "more", "presents", "from", "my", "friends", "this", "year", ".", "so", "thats", "great", "."] |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+