--outcome_controls <Field1> <Field2> ... (alias: --controls)
Generate correlations for --outcomes like the default --correlate while controlling for other variables.
Argument and Default Value
List fields from the outcome table that should be used as controls in the linear regression.
These values are generated by least squares linear regression. For each feature/outcome pair, we normalize all variables (feature group norms, control variables and outcome variables) by subtracting the mean and dividing by the standard deviation, so that each has a mean of zero and a standard deviation of 1. We then fit a linear model that predicts the outcome value from the feature group norms and the control variables:
B0 + B1*F + B2*C = predicted O
where F is the feature group norm, C is a control variable and O is the outcome. From this model, B1 (the coefficient on the feature) is the value that shows up in the r matrix.
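The normalize-then-regress procedure described above can be sketched in a few lines of plain Python. This is an illustration only, not the tool's actual implementation: the helper names (`zscore`, `solve`, `controlled_coefficients`) and the toy data are made up for the example, and the use of the population standard deviation is an assumption (the standardized coefficients come out the same with the sample standard deviation).

```python
from statistics import fmean, pstdev

def zscore(values):
    """Subtract the mean and divide by the standard deviation (population SD assumed)."""
    mu, sigma = fmean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def solve(a, b):
    """Solve the square system a * x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def controlled_coefficients(feature, control, outcome):
    """Least squares fit of z(O) = B0 + B1*z(F) + B2*z(C); returns [B0, B1, B2]."""
    f, c, o = zscore(feature), zscore(control), zscore(outcome)
    columns = [[1.0] * len(o), f, c]  # intercept, feature, control
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(p * q for p, q in zip(ci, o)) for ci in columns]
    return solve(xtx, xty)

# Hypothetical toy data: one feature group norm, one control, one outcome per group.
feature = [0.10, 0.40, 0.35, 0.80, 0.60, 0.90]
control = [1.0, 2.0, 1.5, 3.0, 2.5, 2.0]
outcome = [2.0, 3.5, 3.0, 6.0, 4.5, 5.5]

b0, b1, b2 = controlled_coefficients(feature, control, outcome)
print(f"B1 (feature coefficient, controlling for C) = {b1:.3f}")
```

Because every variable is standardized, the intercept B0 is zero up to floating-point error, and B1 is the standardized coefficient of the feature with the control held fixed.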
Or in the words of Patrick...
The simplest way (i.e., to get a Pearson correlation between a continuous outcome and some language variable, like normed topic use) is to run a simple (single-predictor) linear regression:
y = b0 + b1*x
where y is the outcome, b0 is the intercept, and x is the group_norm of the language feature.
When every variable is normalized (which it is here), the regression coefficient b1 is mathematically equivalent to a Pearson r between outcome and the language variable.
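The equivalence can be checked numerically for the single-predictor case. The sketch below uses made-up data and hypothetical helper names (`zscore`, `pearson_r`, `ols_slope`); it is not taken from the tool's source. Once control variables are added to the model, the coefficient is a standardized partial regression coefficient and in general no longer equals the plain Pearson r.

```python
from math import sqrt
from statistics import fmean, pstdev

def zscore(values):
    """Subtract the mean and divide by the standard deviation."""
    mu, sigma = fmean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def pearson_r(x, y):
    """Pearson correlation computed directly from the raw values."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def ols_slope(x, y):
    """Slope b1 of the least squares line y = b0 + b1*x."""
    mx, my = fmean(x), fmean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

# Hypothetical toy data: a language feature's group_norm and a continuous outcome.
group_norm = [0.10, 0.40, 0.35, 0.80, 0.60, 0.90]
outcome = [2.0, 3.5, 3.0, 6.0, 4.5, 5.5]

b1 = ols_slope(zscore(group_norm), zscore(outcome))
r = pearson_r(group_norm, outcome)
print(f"b1 on standardized data = {b1:.6f}, Pearson r = {r:.6f}")
```

The two printed numbers agree up to floating-point error, which is the sense in which b1 is equivalent to r.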
# Correlates moral foundations category features with county-level outcomes, controlling for log_mean_income
./fwInterface.py -d twitterGH -t messages_en -c cty_id --group_freq_thresh 100 -f 'feat$cat_moralFoundations$messages_en$cty_id$16to16' --outcome_table countyVotingSM2 --outcomes Median_Age Rpercent_2008 overall_LS population_density percent_white percent_bachelors --outcome_controls log_mean_income --output_name morals2demog_ctrinc --rmatrix --sort