--test_classifiers

Switch

--test_classifiers

Description

Splits the data into a test set and a training set, trains a classifier on the training set, and uses it to predict outcomes for the test set.
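As a rough illustration of this procedure, the Python sketch below holds out 1/5 of the rows as a test set, fits a model on the remaining 4/5, and measures accuracy on the held-out rows only. It is not DLATK's implementation. The row format (a feature list paired with a class label), the fixed random seed, and the toy nearest-centroid classifier are all assumptions made for the example; in DLATK the model type is set with switches such as --model.

```python
import random
from collections import defaultdict


def holdout_split(rows, test_fraction=0.2, seed=42):
    """Shuffle rows and split them into (train, test).

    test_fraction=0.2 mirrors the 1/5 test, 4/5 training split.
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(round(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]


def fit_centroids(train):
    """Hypothetical toy classifier: the mean feature vector of each class."""
    sums = {}
    counts = defaultdict(int)
    for features, label in train:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + v for s, v in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids, features):
    """Predict the class whose centroid is nearest (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda label: sum((c - f) ** 2 for c, f in zip(centroids[label], features)),
    )


def holdout_accuracy(rows, test_fraction=0.2, seed=42):
    """Out-of-sample accuracy: fit on the training split, score on the test split."""
    train, test = holdout_split(rows, test_fraction, seed)
    if not test:
        raise ValueError("test split is empty; provide more rows")
    model = fit_centroids(train)  # the test rows are never seen during fitting
    correct = sum(predict(model, features) == label for features, label in test)
    return correct / len(test)


# Usage: two well-separated toy classes of 10 rows each.
rows = [([float(i), 0.0], "a") for i in range(10)] + [
    ([float(i) + 100, 0.0], "b") for i in range(10)
]
train, test = holdout_split(rows)
print(len(train), len(test))  # 16 4
print(holdout_accuracy(rows))  # 1.0
```

The key property is that the model is fitted without ever seeing the test rows, so the reported accuracy estimates performance on new data rather than on data the model has memorized.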

Argument and Default Value

None

Details

This switch splits the data into a test set (1/5 of the data) and a training set (4/5), then creates a classification model (a classifier) using the training data only. It then predicts the outcome class for each item in the test set and reports accuracy scores for the model. This technique is called out-of-sample prediction, and it is used to avoid overfitting. It is usually better either to use --nfold_test_classifiers, which does the same thing as --test_classifiers but multiple times, or to create a test/training split manually by splitting your data in MySQL. If you split the data manually, it is preferable to put "wordy" users (those with more text) in the training set to boost accuracy.
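--nfold_test_classifiers is described as doing the same thing as --test_classifiers multiple times. A common way to do that is k-fold cross-validation, sketched below in Python: the rows are dealt into k disjoint folds, and each fold takes one turn as the test set while the other folds are used for training. We assume, without having checked DLATK's source, that the switch works roughly like this; the fold count k=5, the seed, and the fit_means/predict_nearest toy model are illustrative choices only.

```python
import random


def kfold_indices(n, k=5, seed=42):
    """Shuffle row indices 0..n-1 and deal them into k disjoint folds."""
    if not 1 < k <= n:
        raise ValueError("need 1 < k <= number of rows")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def nfold_accuracies(rows, fit, predict, k=5, seed=42):
    """Repeat the train/test procedure k times, once per held-out fold.

    Every row lands in exactly one test fold, so every row is scored
    out-of-sample exactly once. Returns one accuracy per fold.
    """
    scores = []
    for test_idx in kfold_indices(len(rows), k, seed):
        held_out = set(test_idx)
        train = [row for i, row in enumerate(rows) if i not in held_out]
        test = [rows[i] for i in test_idx]
        model = fit(train)  # fit on the other k-1 folds only
        correct = sum(predict(model, x) == y for x, y in test)
        scores.append(correct / len(test))
    return scores


# Hypothetical toy model for 1-D features: the mean value of each class.
def fit_means(train):
    groups = {}
    for x, y in train:
        groups.setdefault(y, []).append(x)
    return {y: sum(xs) / len(xs) for y, xs in groups.items()}


def predict_nearest(means, x):
    return min(means, key=lambda y: abs(means[y] - x))


rows = [(float(i), "a") for i in range(10)] + [(float(i) + 100, "b") for i in range(10)]
scores = nfold_accuracies(rows, fit_means, predict_nearest, k=5)
print(len(scores), sum(scores) / len(scores))  # 5 1.0
```

Averaging the per-fold scores typically gives a less noisy estimate than a single 1/5 holdout, because every row contributes to the test accuracy once instead of only the rows that happened to land in one test set.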
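For the manual alternative, the advice is to split the data in MySQL and favor "wordy" users for training. The Python sketch below shows one way to decide the split before writing it to your tables: rank users by total word count and assign the top 4/5 to training. The word-count input, the strict ranking rule, and the function name wordy_split are assumptions for illustration; the document does not say how wordy a user must be to go into the training set.

```python
def wordy_split(word_counts, train_fraction=0.8):
    """Put the wordiest users in the training set and the rest in the test set.

    word_counts maps a user id to that user's total word count (hypothetical
    input; in practice it would come from your own tables).
    """
    # Sort users by word count, most words first (ties keep insertion order).
    ranked = sorted(word_counts, key=lambda user: word_counts[user], reverse=True)
    n_train = int(round(len(ranked) * train_fraction))
    return set(ranked[:n_train]), set(ranked[n_train:])


counts = {"u1": 500, "u2": 20, "u3": 3000, "u4": 150, "u5": 80}
train_users, test_users = wordy_split(counts)
print(sorted(train_users), sorted(test_users))
# ['u1', 'u3', 'u4', 'u5'] ['u2']
```

With a strict ranking like this, the test set consists only of the least wordy users, so the resulting accuracy describes performance on those users; a softer rule (for example, forcing only users above some word threshold into training) is a reasonable variation.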

Other Switches

Required Switches: -d, -g, -t, -f, --outcome_table, --outcomes

Optional Switches: --group_freq_thresh, --no_standardize, --model, --sparse, etc.