--keep_low_variance_outcomes

Switch

--keep_low_variance_outcomes or --keep_low_variance

Description

Keep any outcomes, controls or interactions that have low variance.

Argument and Default Value

By default, DLATK calculates the variance of every outcome, control and interaction variable and removes any variable whose variance is less than the default threshold. Use this flag to turn that filtering off.

The default threshold is 0 and is set via a variable in dlaConstants.py:

DEF_LOW_VARIANCE_THRESHOLD = 0.0

or can be changed per instance via the low_variance_thresh argument of OutcomeGetter and OutcomeAnalyzer, where foo stands for the threshold you want:

OutcomeGetter(..., low_variance_thresh=foo, ...)
OutcomeAnalyzer(..., low_variance_thresh=foo, ...)
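The filtering described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not DLATK's implementation: drop_low_variance and its keep_low_variance parameter are hypothetical names, population variance is an arbitrary choice, and the sketch assumes a variable is dropped when its variance does not exceed the threshold, so that a constant column (variance 0) is removed at the default threshold of 0.

```python
from statistics import pvariance

# Same value as the constant quoted above from dlaConstants.py.
DEF_LOW_VARIANCE_THRESHOLD = 0.0


def drop_low_variance(columns, threshold=DEF_LOW_VARIANCE_THRESHOLD,
                      keep_low_variance=False):
    """Hypothetical helper (not part of DLATK).

    columns maps a variable name to its list of numeric values.
    Returns a new dict without the low-variance variables, unless
    keep_low_variance is True, in which case everything is kept.
    """
    if keep_low_variance:
        return dict(columns)
    # Assumption: "low variance" means variance <= threshold.
    return {name: values for name, values in columns.items()
            if pvariance(values) > threshold}


# Toy data: gender is constant (as after --where "gender = 0"), age is not.
outcomes = {"gender": [0, 0, 0, 0], "age": [21, 35, 40, 28]}
print(sorted(drop_low_variance(outcomes)))
print(sorted(drop_low_variance(outcomes, keep_low_variance=True)))
```

With the toy data, the first call drops the constant gender column and the second keeps it, which is the behavior the flag is meant to switch on.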

Other Switches

Required Switches:

Optional Switches:

Example Commands

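These are two toy examples that relate language features to gender while considering only males: the first correlates 1gram features with gender, and the second predicts gender from them. Because the --where clause restricts the data to a single gender value, the outcome has zero variance and would be removed without --keep_low_variance. You probably don't want to do this in practice.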

# run DLA over only males
dlatkInterface.py -d dla_tutorial -t msgs -c user_id --outcome_table blog_outcomes \
--outcomes gender -f 'feat$1gram$msgs$user_id$16to16' --correlate --where "gender = 0" --keep_low_variance
# use 1grams to predict the gender of only males via 10-fold cross validation
dlatkInterface.py -d dla_tutorial -t msgs -c user_id --outcome_table blog_outcomes \
--outcomes gender -f 'feat$1gram$msgs$user_id$16to16' --combo_test_regression \
--folds 10 --where "gender = 0" --keep_low_variance