Correlates features with the given outcomes and prints the r values to standard output.
Argument and Default Value
This is one of the flags that trigger the correlation code (just like --rmatrix or --tagcloud). If neither --outcome_controls nor --outcome_interaction is specified, a Pearson correlation is computed between every feature's group_norm and each outcome. See the pages for those flags for the types of analyses performed when there are control or interaction variables.
Every p-value is Bonferroni-corrected by default, unless --no_correction is specified.
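As a rough illustration of what the correction does, here is a minimal sketch of the textbook Bonferroni adjustment: each p-value is multiplied by the number of tests and capped at 1. The function name bonferroni and the sample p-values are hypothetical, and how DLATK counts the number of tests internally is not stated on this page, so treat this as the standard definition rather than the tool's exact code.

```python
def bonferroni(p_values):
    """Textbook Bonferroni: multiply each p-value by the number of tests, cap at 1.0."""
    m = len(p_values)  # assumption: every p-value in the list counts as one test
    return [min(1.0, p * m) for p in p_values]


# Hypothetical uncorrected p-values for three features against one outcome.
raw_p = [0.001, 0.02, 0.5]
corrected = bonferroni(raw_p)
print(corrected)
```

With --no_correction, the values in raw_p would be reported as they are.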
NOTE - group columns must match in type between the message table, feature table and outcome table!
The computation is equivalent to the following pseudo-code:
    for feat in all_features:
        for outcome in outcomes:
            # x: column vector of group_norms for given feature
            # y: column vector of outcome values; aligned to x
            (r, p) = pearsonr(x, y)
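The loop above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the names pearson_r, group_norms and outcome_values and the toy numbers are hypothetical, and the sketch computes r only. A p-value would normally come from a statistics library such as scipy.stats.pearsonr, which is left out here to keep the example dependency-free.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data. Each list holds one value per group (e.g. per user),
# and all lists are aligned to the same group order.
group_norms = {
    "feat_a": [0.1, 0.2, 0.3, 0.4],
    "feat_b": [0.4, 0.1, 0.3, 0.2],
}
outcome_values = {
    "age": [20.0, 30.0, 40.0, 50.0],
}

results = {}
for feat, x in group_norms.items():
    for outcome, y in outcome_values.items():
        results[(feat, outcome)] = pearson_r(x, y)

for key, r in results.items():
    print(key, round(r, 3))
```

The alignment of x and y is the important part: both vectors must list the same groups in the same order, which is why the group columns have to match across tables.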
--correlate prints the following tuples to stdout:
("feature", (pearson-r, p-value, (confidence interval left, confidence interval right), number of groups/sample size, total count of "feature"))
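To make the nesting of that tuple concrete, the sketch below unpacks one record laid out as described above. The record itself is a placeholder with made-up numbers, not real DLATK output, and the variable names are hypothetical; the exact printed form may differ (for example, results may be grouped per outcome).

```python
# Placeholder record following the documented layout; all numbers are made up.
record = ("hypothetical_feature", (0.12, 0.003, (0.05, 0.19), 1000, 4200))

# Nested unpacking mirrors the structure:
# (feature, (r, p, (ci_left, ci_right), n_groups, feature_count))
feature, (r, p, (ci_left, ci_right), n_groups, feature_count) = record

print(f"{feature}: r={r}, p={p}, CI=[{ci_left}, {ci_right}], "
      f"N={n_groups}, count={feature_count}")
```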
# Correlates LIWC lexical features with age and gender for every user in masterstats_andy_r10k
dlatkInterface.py -d fb20 -t messages_en -c user_id --outcome_table masterstats_andy_r10k --outcomes age gender -f 'feat$cat_LIWC2007$messages_en$user_id$16to16' --correlate