Adds an LDA topic version of the message table.

Argument and Default Value

Argument: LDA_TABLE. There is no default value.


A tokenized message table TOK_TABLE is specified with the -t switch. --add_lda_messages then creates the table TOK_TABLE_lda$LDA_TABLE, which has the same structure as TOK_TABLE.
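The naming rule above can be sketched in a few lines of Python. The helper name lda_table_name is hypothetical and not part of DLATK; it only mirrors the TOK_TABLE_lda$LDA_TABLE pattern described here.

```python
def lda_table_name(tok_table: str, lda_table: str) -> str:
    """Build the output table name following the TOK_TABLE_lda$LDA_TABLE pattern.

    Hypothetical helper for illustration only; DLATK builds this name internally.
    """
    return f"{tok_table}_lda${lda_table}"


# Table and file names taken from the example command in this document.
print(lda_table_name("twt_20mil_tok", "twt_topics"))
```

Because the resulting name contains a $ character, it may need to be quoted when it is used in a shell command or an SQL statement.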

Other Switches

Required Switches:

Example Commands

# Creates the table twt_20mil_tok_lda$twt_topics
dlatkInterface.py -d twitterGH -t twt_20mil_tok --add_lda_messages twt_topics

The table twt_20mil_tok_lda$twt_topics has the same structure as twt_20mil_tok except for the message column. An example of how this column is changed:

  • Message in twt_20mil_tok: ["is", "a", "book"]
  • Message in twt_20mil_tok_lda$twt_topics: [{"index": "0", "term": "book", "doc": "1", "topic_id": "701", "term_id": "5", "message_id": "128679866827677696"}]
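As a rough illustration of how the new message column could be consumed, the sketch below parses the example value shown above with Python's standard json module. It assumes the column holds a JSON string in exactly that form; the function name topic_counts is hypothetical and not part of DLATK.

```python
import json
from collections import Counter

# Example message value copied from the text above.
# Assumption: the column is stored as a JSON string in this form.
message = (
    '[{"index": "0", "term": "book", "doc": "1", "topic_id": "701", '
    '"term_id": "5", "message_id": "128679866827677696"}]'
)


def topic_counts(lda_message: str) -> Counter:
    """Count how many terms in one LDA message are assigned to each topic."""
    entries = json.loads(lda_message)
    # All values are stored as strings in the example, so convert topic_id to int.
    return Counter(int(entry["topic_id"]) for entry in entries)


terms = [entry["term"] for entry in json.loads(message)]
print(terms)                  # terms that carry a topic assignment
print(topic_counts(message))  # number of terms per topic id
```

In this example only "book" carries a topic assignment, so the message contains a single entry.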

In the above command, twt_20mil_tok was created from twt_20mil using --add_tokenized, and the file twt_topics was created with addMessageID.py, as in the following two commands:

dlatkInterface.py -d twitterGH -t twt_20mil -c id --add_tokenized # this creates twt_20mil_tok
dlatk/addMessageID.py twt_20mil.txt twt_20mil_state.gz > twt_topics

Mallet LDA Interface