==================
--outcome_controls
==================

Switch
======

--outcome_controls <Field1> <Field2> ... (alias: --controls)

Description
===========

Generates correlations between features and the --outcomes variables, as the default --correlate does, while controlling for other variables.

Argument and Default Value
==========================

A list of fields from the outcome table to use as controls in the linear regression.

Details
=======

These values are generated with least squares linear regression. For each feature/outcome pair, we normalize all variables (feature group norms, control variables, and outcome variables) by subtracting the mean and dividing by the standard deviation, so that each has a mean of 0 and a standard deviation of 1. We then fit a linear model that predicts the outcome value O from the feature group norm F and the control variables C:

.. math::

   \hat{O} = B_0 + B_1 F + B_2 C

From this model, :math:`B_1`, the coefficient on the feature, is the value that shows up in the r matrix.
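The procedure described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the tool's actual implementation: the helper names ``zscore`` and ``controlled_coefficient``, the synthetic ``income`` control, and the use of the population standard deviation are all hypothetical choices made for the example.

```python
import numpy as np


def zscore(x):
    """Standardize a 1-D array to mean 0 and standard deviation 1."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()


def controlled_coefficient(feature, outcome, controls):
    """Fit O ~ B0 + (coef on F)*F + (coefs on C)*C on standardized
    variables and return the fitted coefficient on the feature F.

    `controls` is a 1-D array (one control) or an (n, k) array.
    """
    f = zscore(feature)
    o = zscore(outcome)
    c = np.asarray(controls, dtype=float)
    if c.ndim == 1:
        c = c[:, None]
    c = (c - c.mean(axis=0)) / c.std(axis=0)

    # Design matrix columns: intercept, feature, then the controls.
    X = np.column_stack([np.ones_like(f), f, c])
    betas, *_ = np.linalg.lstsq(X, o, rcond=None)
    return betas[1]


# Synthetic data (an assumption for illustration): the outcome depends
# only on the control, and the feature is correlated with the control.
rng = np.random.default_rng(0)
n = 500
income = rng.normal(size=n)
feature = 0.8 * income + rng.normal(size=n)
outcome = 1.0 * income + rng.normal(size=n)

raw_r = np.corrcoef(feature, outcome)[0, 1]
b1 = controlled_coefficient(feature, outcome, income)
print("uncontrolled Pearson r:", raw_r)
print("coefficient on feature with control:", b1)
```

In this synthetic setup the uncontrolled correlation is clearly positive, while the controlled coefficient is close to zero, because the feature carries no information about the outcome beyond what the control already explains.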

Or in the words of Patrick...

The simplest way (i.e., to get a Pearson correlation between a continuous outcome and some language variable, like normed topic use) is to run a multiple linear regression:

.. math::

   y = b_0 + b_1 x

where :math:`y` is the outcome, :math:`b_0` is the intercept, and :math:`x` is the group_norm of the language feature.

When every variable is normalized (which it is here), the regression coefficient b1 is mathematically equivalent to a Pearson r between outcome and the language variable.
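The equivalence between b1 and Pearson r for normalized variables can be checked numerically. The snippet below is a minimal sketch on synthetic data (the variables ``x`` and ``y`` are made up for the example); it fits the one-predictor regression on standardized values and compares the slope with ``np.corrcoef``.

```python
import numpy as np

# Synthetic data, assumed purely for illustration.
rng = np.random.default_rng(1)
x = rng.normal(size=200)            # stands in for a feature group_norm
y = 0.5 * x + rng.normal(size=200)  # stands in for a continuous outcome

# Normalize both variables to mean 0 and standard deviation 1.
zx = (x - x.mean()) / x.std()
zy = (y - y.mean()) / y.std()

# Least squares fit of zy = b0 + b1 * zx.
X = np.column_stack([np.ones_like(zx), zx])
(b0, b1), *_ = np.linalg.lstsq(X, zy, rcond=None)

pearson_r = np.corrcoef(x, y)[0, 1]
print("b1:", b1)
print("Pearson r:", pearson_r)
```

The two printed values agree up to floating point error, and the intercept is effectively zero because both variables are centered. Note that this identity holds for the regression with a single predictor; once control variables are added, the coefficient on the feature is a standardized regression coefficient rather than a plain Pearson r.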

Other Switches
==============

Required Switches:

* ``--outcomes``
* ``--outcome_table``

Optional Switches:

* ``--group_freq_thresh``
* ``--outcome_interaction``

Example Commands
================

.. code-block:: bash

   # Correlates moral foundations lexicon features with county-level outcomes,
   # controlling for log mean income
   ./fwInterface.py -d twitterGH -t messages_en -g cty_id --group_freq_thresh 100 \
       -f 'feat$cat_moralFoundations$messages_en$cty_id$16to16' \
       --outcome_table countyVotingSM2 \
       --outcomes Median_Age Rpercent_2008 overall_LS population_density percent_white percent_bachelors \
       --outcome_controls log_mean_income \
       --output_name morals2demog_ctrinc --rmatrix --sort