--use_collocs

Switch

--use_collocs

Description

Use a set of collocations to extract n-grams.

Argument and Default Value

None. --use_collocs is a flag and takes no argument; the collocation table itself is named with --colloc_table.

Details

Use this option to extract features using a collocation table (--colloc_table), or to modify a feature table that was extracted using collocations. The collocation table lists the multigrams (multi-word expressions) that should be treated as single features. Any word that is not part of a listed collocation is counted as a 1-gram.
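The behaviour described above can be illustrated with a minimal sketch. It is not the toolkit's actual implementation: the function name count_with_collocs, the greedy longest-match strategy, and the sample data are all assumptions made for illustration. It only shows the general idea that listed collocations are counted as single features and every other word falls back to a 1-gram.

```python
from collections import Counter


def count_with_collocs(tokens, collocs):
    """Count listed collocations as units and all other words as 1-grams.

    Hypothetical helper for illustration; not the toolkit's own code.
    """
    # Store each collocation as a tuple of words for fast lookup.
    colloc_set = {tuple(c.split()) for c in collocs}
    max_len = max((len(c) for c in colloc_set), default=1)

    counts = Counter()
    i = 0
    while i < len(tokens):
        # Assumption: try the longest window first so longer collocations win.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            window = tuple(tokens[i:i + n])
            if window in colloc_set:
                counts[" ".join(window)] += 1
                i += n
                break
        else:
            # No collocation starts at this position: count a 1-gram.
            counts[tokens[i]] += 1
            i += 1
    return counts


# Made-up sample message and collocation list.
tokens = "i love new york city in the summer".split()
collocs = ["new york", "new york city", "in the"]
features = count_with_collocs(tokens, collocs)
```

In this sketch "new york city" and "in the" are each counted once as single features, while "i", "love" and "summer" are counted as 1-grams. How the real switch resolves overlapping collocations may differ.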

Note: The table given to --colloc_table is assumed to have a column named ‘feat’.

Note: The preferred collocation table as of June 2015 is ufeat$pmi$fb22_messagesEn$lnpmi0_15

Other Switches

Required Switches: None

Optional Switches:

--colloc_table <TABLENAME>

--include_sub_collocs

--feature_type_name <STRING>

Example Commands

Example outputs:

feat$colloc$msgsEn_r5k$user_id$16to16

feat$colloc$msgsEn_r5k$user_id$16to16$0_05

Off-label use: only extract 1-grams that appear in the lex table ANEW:

fwInterface.py -d fbtrust -t messagesEn -c user_id --add_ngrams --use_collocs --colloc_table ANEW --colloc_column term --feature_type_name ANEWterms