--regression_to_lexicon

Switch

--regression_to_lexicon

Description

Extracts the coefficients from a regression model and turns them into a lexicon.

Argument and Default Value

Name of the lexicon to be created.

Details

Use this switch to create a lexicon from a regression model. The lexicon can be built either from a previously saved model (loaded with --load_model) or from a model trained in the same command (with --train_regression). The lexicon will be named dd_ARGUMENT, where the dd prefix indicates that the lexicon is data driven.

IMPORTANT: When creating the model, use --no_standardize; otherwise the extracted lexicon will not make sense.
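One way to see why standardization is a problem: a coefficient learned on a standardized feature is scaled by that feature's training mean and standard deviation, and a lexicon does not carry those statistics. The following is a minimal sketch with made-up numbers, not the toolkit's own code; ``mean``, ``std``, ``w_std`` and ``b_std`` are all hypothetical values chosen for illustration.

```python
# Hypothetical training statistics for one feature (a word's relative frequency).
mean, std = 0.02, 0.01
# Hypothetical coefficient and intercept learned on the *standardized* feature.
w_std, b_std = 5.0, 30.0


def predict_standardized(x):
    """What the model computes: it standardizes x before applying the weight."""
    return w_std * (x - mean) / std + b_std


def naive_lexicon_score(x):
    """What a lexicon would compute if the same weight were applied to raw x."""
    return w_std * x + b_std


x = 0.04  # raw relative frequency of the word in a new document
model_prediction = predict_standardized(x)
lexicon_score = naive_lexicon_score(x)
# The two values disagree, so the weight is only meaningful together with
# mean and std, which a plain word-to-weight lexicon does not store.
```

With unstandardized features the two functions would be identical, which is the situation --no_standardize is meant to guarantee.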

Only use regression algorithms that have linear coefficients (i.e., choose an appropriate --model), because the lexicon extraction equation does not make sense otherwise. This functionality has not been fully validated with advanced feature selection, so use it with caution. Also note that the coefficients are not a reliable way to identify which features best characterize the outcomes; use --correlate or other univariate techniques for that kind of insight.
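The idea behind turning linear coefficients into a lexicon can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the toolkit's implementation: the function names, the ``_intercept`` key, and the example weights are all hypothetical. It assumes the features are unstandardized relative frequencies, so that a document's lexicon score equals the linear model's prediction.

```python
from collections import Counter


def coefficients_to_lexicon(feature_names, coefficients, intercept):
    """Map each feature (word) to its linear coefficient, dropping zero weights.

    The intercept is stored under the hypothetical key "_intercept".
    """
    lexicon = {name: w for name, w in zip(feature_names, coefficients) if w != 0.0}
    lexicon["_intercept"] = intercept
    return lexicon


def apply_lexicon(lexicon, tokens):
    """Score a document: intercept + sum(weight * relative frequency)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    score = lexicon.get("_intercept", 0.0)
    if total == 0:
        return score
    for word, count in counts.items():
        # Words missing from the lexicon contribute nothing.
        score += lexicon.get(word, 0.0) * count / total
    return score


# Made-up coefficients from a hypothetical linear age model.
feature_names = ["homework", "mortgage", "lol"]
coefficients = [-20.0, 40.0, -8.0]
intercept = 30.0

lexicon = coefficients_to_lexicon(feature_names, coefficients, intercept)
score = apply_lexicon(lexicon, "lol homework homework today".split())
```

A non-linear model has no single per-feature weight to copy into such a mapping, which is why only models with linear coefficients are suitable.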

Other Switches

Required Switches:

* -d, -c, -t
* -f
* --outcome_table, --outcomes
* --no_standardize

Needs one of these two switches:

* --train_regression
* --load_model

Example Commands
================

.. code-block:: bash

   # Trains a regression model to predict age for users from 1grams, without standardizing
   # Will save the model to a picklefile called deleteMe.pickle, and create a lexicon called dd_testAgeLex
   ~/fwInterface.py -d fb20 -t messages_en -c user_id \
       -f 'feat$1gram$messages_en$user_id$16to16$0_01' \
       --outcome_table masterstats_andy_r10k --outcomes age \
       --train_regression --save_model --picklefile deleteMe.pickle \
       --no_standardize --regression_to_lexicon testAgeLex

   # Given a model that was previously made, this turns the model into a lexicon called dd_testAgeLex
   ~/fwInterface.py -d fb20 -t messages_en -c user_id \
       -f 'feat$1gram$messages_en$user_id$16to16$0_01' \
       --load_model --picklefile deleteMe.pickle \
       --regression_to_lexicon testAgeLex

References

Sap et al. (2014) - Developing Age and Gender Predictive Lexica over Social Media