--test_regression

Switch

--test_regression

Description

Splits the data into test and training sets, trains a model on the training set, and predicts outcomes on the test set.

Argument and Default Value

None

Details

This switch splits the data into a test set (1/5) and a training set (4/5), then fits a regression model on the training data only. It then predicts the outcomes for the data in the test set and reports the accuracy of the model on those predictions. This technique is called out-of-sample prediction and is used to avoid overfitting. It is usually better to use --nfold_test_regression, which does the same thing as --test_regression but multiple times. Alternatively, you can manually create a test/training split by splitting your data in MySQL. If you do this, it is preferable to put "wordy" users in the training set to boost accuracy.
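The idea of out-of-sample prediction can be illustrated with a minimal, standalone Python sketch. This is not the code the switch runs: the helper names (train_test_split, fit_line, mean_squared_error), the single-feature least-squares model, the random shuffle, and the synthetic data are all assumptions made for illustration. Only the 1/5 test and 4/5 training proportions come from the description above.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle rows and hold out test_fraction of them as the test set."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def fit_line(train):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(model, rows):
    """Average squared prediction error of the model on rows."""
    slope, intercept = model
    return sum((y - (slope * x + intercept)) ** 2 for x, y in rows) / len(rows)


# Hypothetical data: one feature value and one outcome per group.
noise = random.Random(1)
data = [(x, 2.0 * x + 1.0 + noise.gauss(0, 0.5)) for x in range(50)]

# Fit on the training 4/5 only, then score on the held-out 1/5.
train, test = train_test_split(data)
model = fit_line(train)
test_mse = mean_squared_error(model, test)
print(len(train), len(test), round(test_mse, 3))
```

Because the model never sees the held-out rows while fitting, the error on the test set estimates how well it generalizes, which an error computed on the training rows would overstate.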

Other Switches

Required Switches: -d, -g, -t, -f, --outcome_table, --outcomes

Optional Switches: --group_freq_thresh, --no_standardize, --model, --sparse, etc.