--clean_messages

Switch

--clean_messages

Description

When used alone, it replaces URLs with <URL> and @mentions with <USER>.

When used with:

  • --deduplicate: it replaces URLs with <URL> and @mentions with <USER>, and also removes duplicate tweets.

  • --language_filter: it removes URLs and @mentions before applying the language filter, but they are not removed from the resulting message table.
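
The replacement described above can be sketched in a few lines of Python. This is only an illustration, not DLATK's actual implementation: the function name clean_message and both regular expressions are assumptions made for this example, and the patterns DLATK uses internally may match a different set of strings.

```python
import re

# Hypothetical patterns for illustration only; DLATK's own patterns may differ.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
# The lookbehind keeps e-mail addresses such as bob@example.com untouched.
MENTION_RE = re.compile(r"(?<!\w)@\w+")


def clean_message(text):
    """Replace URLs with <URL> and @mentions with <USER>."""
    # URLs are replaced first so that an "@" inside a URL is not
    # mistaken for a mention.
    text = URL_RE.sub("<URL>", text)
    return MENTION_RE.sub("<USER>", text)


print(clean_message("@alice check https://example.com/page now"))
# prints: <USER> check <URL> now
```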

Argument and Default Value

None

Details

When used alone, it creates a new table whose name is the table name given to the -t flag with "_an" appended.
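
As a rough sketch of the naming rule, the output table name is the input table name plus a suffix. The helper output_table_name below is hypothetical and is not part of DLATK; the suffixes "en" and "dedup" are taken from the comments in the example commands on this page.

```python
def output_table_name(table, suffix="an"):
    """Return the input table name with "_<suffix>" appended.

    Hypothetical helper for illustration; not a DLATK function.
    """
    return f"{table}_{suffix}"


print(output_table_name("msgs"))           # --clean_messages alone
print(output_table_name("msgs", "en"))     # with --language_filter en
print(output_table_name("msgs", "dedup"))  # with --deduplicate
```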

Other Switches

Required Switches:

Optional Switches:

Example Commands

Clean URLs and @mentions:

# creates the table msgs_an
./dlatkInterface.py -d dla_tutorial -t msgs -c user_id --clean_messages

Clean URLs and @mentions while language filtering:

# creates the table msgs_en
./dlatkInterface.py -d dla_tutorial -t msgs -c user_id --language_filter en --clean_messages

Clean URLs and @mentions while deduplicating:

# creates the table msgs_dedup
./dlatkInterface.py -d dla_tutorial -t msgs -c user_id --deduplicate --clean_messages