--add_tweettok

Switch

--add_tweettok

Description

Use Carnegie Mellon University's TweetNLP tokenizer to create a tokenized version of the message table.

Argument and Default Value

None

Details

This creates a table named TABLE_tweettok (where TABLE is the message table specified by -t) in the database specified by -d. In the new table, the message column holds each message as a list of tokens rather than the raw text.

Example on one message

Original message:

"@antijokeapple: What do you call a Bee who is having a bad hair day? A Frisbee." Hahah.

Tokenized message:

["\"", "@antijokeapple", ":", "What", "do", "you", "call", "a", "Bee", "who", "is", "having", "a", "bad", "hair", "day", "?", "A", "Frisbee", ".", "\"", "Hahah", "."]
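
The tokenized message above has the shape of a JSON array of strings. Assuming the message column stores it as JSON text (an assumption based only on this example, not a documented guarantee), a minimal Python sketch for turning the stored value back into a list of tokens could look like this. The variable names are illustrative and not part of DLATK.

```python
import json

# The tokenized message from the example above, copied as raw text.
# Assumption: the column value is valid JSON, as the example suggests.
stored_message = r'["\"", "@antijokeapple", ":", "What", "do", "you", "call", "a", "Bee", "who", "is", "having", "a", "bad", "hair", "day", "?", "A", "Frisbee", ".", "\"", "Hahah", "."]'

# Decode the JSON text into a Python list of token strings.
tokens = json.loads(stored_message)

# Punctuation and the @-mention are kept as separate tokens.
mentions = [tok for tok in tokens if tok.startswith("@")]

print(len(tokens))
print(mentions)
```

Note that the quotation marks around the original tweet become tokens of their own, so downstream code that only wants words has to filter them out.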

Other Switches

Required Switches: -d, -t, -c

Example Commands

# creates the table msgs_tweettok
./dlatkInterface.py -d dla_tutorial -t msgs -c message_id --add_tweettok
mysql> select message from msgs_tweettok limit 1;
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| message                                                                                                                                                                                                                                                                                                            |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ["can", "you", "believe", "it", "??", "my", "mom", "wouln't", "let", "me", "go", "out", "on", "my", "b'day", "...", "i", "was", "really", "really", "mad", "at", "her", ".", "still", "am", ".", "but", "i", "got", "more", "presents", "from", "my", "friends", "this", "year", ".", "so", "thats", "great", "."] |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
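
Rows returned by a query like the one above can be post-processed outside of MySQL. The sketch below is hedged: it hard-codes the single row shown in the output instead of connecting to a database, assumes the message value is JSON text, and uses a hypothetical `rows` list shaped like the tuples a typical Python database cursor returns. It counts how often each token occurs.

```python
import json
from collections import Counter

# One row copied from the query output above. A real cursor would
# typically yield one tuple per row; here the data is hard-coded.
rows = [
    ('''["can", "you", "believe", "it", "??", "my", "mom", "wouln't", "let", "me", "go", "out", "on", "my", "b'day", "...", "i", "was", "really", "really", "mad", "at", "her", ".", "still", "am", ".", "but", "i", "got", "more", "presents", "from", "my", "friends", "this", "year", ".", "so", "thats", "great", "."]''',),
]

# Decode each message (assumed to be JSON text) and tally its tokens.
token_counts = Counter()
for (message,) in rows:
    token_counts.update(json.loads(message))

# Show the most frequent tokens in this tiny sample.
for token, count in token_counts.most_common(3):
    print(token, count)
```

The tokenizer leaves the original spelling untouched (for example "wouln't" and "thats"), so any normalization has to happen in a later step.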