*****************
Packaged Datasets
*****************

All datasets are available on our `github `_ page, the `World Well-Being Project `_ site, and via pip install.

Note: some lexica and datasets are distributed under more restrictive licenses than DLATK. Please review each license before use.

Language Data
=============

Blog Authorship Corpus
----------------------

A subset of blog posts from `this `_ dataset, collected by `J. Schler, M. Koppel, S. Argamon and J. Pennebaker `_. This subset contains all posts from a random sample of 1,000 users. Shared with permission from Moshe Koppel.

* `[.zip] `_
* MySQL: dla_tutorial.msgs, dla_tutorial.blog_outcomes

Lexica
======

Age and Gender Lexica
---------------------

Our data-driven age and gender lexica were generated from roughly 97,000 Facebook, Blogger and Twitter users.

* `[.zip] `_
* MySQL: permaLexicon.dd_emnlp14_ageGender
* `Link to publication `_

PERMA Lexicon
-------------

Our lexicon for predicting well-being as measured by the PERMA scales.

* `[.zip] `_
* MySQL: permaLexicon.dd_permaV3
* `Link to publication `_
* `[Usage license] `_

Spanish PERMA Lexicon
---------------------

Our lexicon for measuring PERMA in Spanish, derived from Spanish tweets annotated for PERMA.

* `[.zip] `_
* MySQL: permaLexicon.dd_sperma_v2
* `Link to publication `_

Other Lexica
------------

Prospection Lexicon (Temporal Orientation):

* `[.csv] `_
* MySQL: permaLexicon.dd_PaPreFut
* `Link to publication `_

Affect and Intensity Lexicon:

* `[.csv] `_
* MySQL: permaLexicon.dd_intAff
* `Link to publication `_

LDA Topics
==========

2000 Facebook Topics
--------------------

* Top 20 words per topic: `[.csv] `_ `[Excel file] `_
* MySQL: permaLexicon.met_a30_2000_cp and permaLexicon.met_a30_2000_freq_t50ll
* All words: `[.csv] `_
* Conditional probabilities: `[.csv] `_ (sparse matrix format)
* `Link to publication `_
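One common way to apply the weighted lexica and the topic conditional probabilities listed above is as a weighted word list. A category's score for a document is the sum, over words, of each word's weight times its relative frequency in that document. For the topic file, where the weight can be read as p(topic | word), this gives an estimate of p(topic | document). The following is a minimal Python sketch of that idea under stated assumptions, not DLATK's own implementation. The CSV column names ``term``, ``category`` and ``weight``, the helper names ``load_weighted_lexicon`` and ``score_text``, and the naive tokenizer are all hypothetical, so check the header of each downloaded file and prefer DLATK's own feature extraction for real analyses.

.. code-block:: python

    import csv
    import io
    import re
    from collections import Counter, defaultdict


    def load_weighted_lexicon(csv_file):
        """Read rows of (term, category, weight) into {category: {term: weight}}.

        Hypothetical layout: the column names 'term', 'category' and 'weight'
        are assumptions, not a documented file format.
        """
        lexicon = defaultdict(dict)
        for row in csv.DictReader(csv_file):
            lexicon[row["category"]][row["term"]] = float(row["weight"])
        return dict(lexicon)


    def tokenize(text):
        # Naive lowercase word tokenizer (assumption); DLATK uses its own
        # tokenizer, so scores from this sketch will not match DLATK exactly.
        return re.findall(r"[a-z']+", text.lower())


    def score_text(text, lexicon):
        """Score each category as sum_w weight(category, w) * count(w) / total words."""
        counts = Counter(tokenize(text))
        total = sum(counts.values())
        if total == 0:
            return {category: 0.0 for category in lexicon}
        return {
            category: sum(weights.get(w, 0.0) * n for w, n in counts.items()) / total
            for category, weights in lexicon.items()
        }


    # Usage with a tiny in-memory stand-in for a downloaded .csv file.
    sample = io.StringIO("term,category,weight\nhappy,T1,0.4\nsad,T2,0.5\ngood,T1,0.1\n")
    lex = load_weighted_lexicon(sample)
    print(score_text("Happy happy sad day", lex))

Normalizing by the total word count keeps scores comparable across documents of different lengths; words absent from a category simply contribute zero.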