Installation

DLATK is highly dependent on MySQL. If you do not have MySQL installed and running, skip Recommended Install and follow the steps in the Full Install section.

Please see Install FAQs for common install issues.

Full Install

Setup

Before installing DLATK, install the necessary system requirements, the most important of which is MySQL. The next steps walk you through this on a machine running Ubuntu or OSX.

Linux

WARNING: This will install MySQL on your computer.

Install the required Ubuntu libraries. The requirements.sys file can be found on the DLATK GitHub page. The r-base package can be difficult to install; you can remove it from requirements.sys if needed, although this disables some minor functionality.

wget https://raw.githubusercontent.com/dlatk/dlatk/public/install/requirements.sys
xargs sudo apt-get install < requirements.sys

DLATK has been tested on Ubuntu 14.04.

OSX (with brew)

WARNING: This will install MySQL on your computer.

Install dependencies with brew.

brew install python mysql

DLATK has been tested on OSX 10.11.

With the system requirements out of the way, you can now install the Python code via pip, Anaconda, or GitHub:

Install (pip)

Install the Python 3 version via pip:

pip install dlatk

To install the Python 2.7 version use:

pip install "dlatk < 1.0"

Install (Anaconda)

Run the following in a Python 3.5 conda env:

conda install -c wwbp dlatk

Install (GitHub)

Run the following:

git clone https://github.com/dlatk/dlatk.git
cd dlatk
python setup.py install

Install Other Dependencies

Load NLTK corpus

Load NLTK data from the command line:

python -c "import nltk; nltk.download('wordnet')"
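To confirm that the download worked, a small check along the following lines may help. It is a minimal sketch, not part of DLATK: the function name wordnet_available is hypothetical, and it assumes the corpus is stored under NLTK's usual corpora/wordnet resource name.

```python
def wordnet_available():
    """Return True if NLTK can locate the WordNet corpus, else False."""
    try:
        import nltk
    except ImportError:
        # NLTK itself is not installed, so the corpus cannot be loaded.
        return False
    try:
        # nltk.data.find raises LookupError when the resource is missing.
        nltk.data.find("corpora/wordnet")
        return True
    except LookupError:
        return False


if __name__ == "__main__":
    print("WordNet available:", wordnet_available())
```

If this reports False, rerun the nltk.download command above.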

Install Stanford Parser

  1. Download the zip file from http://nlp.stanford.edu/software/lex-parser.shtml.
  2. Extract into ../dlatk/Tools/StanfordParser/.
  3. Move ../dlatk/Tools/StanfordParser/oneline.sh into the folder you extracted: ../dlatk/Tools/StanfordParser/stanford-parser-full*/.

Install Tweet NLP v0.3 (ark-tweet-nlp-0.3)

  1. Download the tgz file (for version 0.3) from http://www.cs.cmu.edu/~ark/TweetNLP/.
  2. Extract this file into ../dlatk/Tools/TwitterTagger/.

Python Modules (optional)

You can install the optional Python dependencies with:

pip install image jsonrpclib-pelix langid rpy2 simplejson textstat wordcloud

Standard DLATK functions can be run without these modules.
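To see which optional modules are already present, something like the sketch below can be used. The helper missing_modules is hypothetical, and the import names are assumptions about how the pip packages listed above are imported (for example, jsonrpclib-pelix is assumed to import as jsonrpclib). The image package is left out because its import name is not stated here.

```python
import importlib.util

# Assumed import names for the optional pip packages; adjust as needed.
OPTIONAL_MODULES = [
    "jsonrpclib",
    "langid",
    "rpy2",
    "simplejson",
    "textstat",
    "wordcloud",
]


def missing_modules(names):
    """Return the top-level module names that Python cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(OPTIONAL_MODULES)
    print("Missing optional modules:", ", ".join(missing) or "none")
```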

Install the IBM Wordcloud jar file (optional)

The IBM wordcloud module is our default. To install it you must sign up for an IBM DeveloperWorks account and download ibm-word-cloud.jar. Place this file into ../dlatk/lib/.

If you are unable to install this jar then you can use the python wordcloud module:

  1. pip install wordcloud
  2. Change wordcloud_algorithm='ibm' in ../dlatk/lib/wordcloud.py to wordcloud_algorithm='amueller'.

Note: You must install either the IBM Wordcloud jar or the Python wordcloud module to print wordclouds.

Mallet (optional)

Mallet can be used with DLATK to create LDA topics (see the Mallet LDA Interface tutorial). Directions on downloading and installing can be found here.

Full List of Dependencies

Python

Python (optional)

Other (optional)

  • IBM Wordcloud (for wordcloud visualization)
  • Mallet (for creating LDA topics)

Python version support

DLATK is available for Python 2.7 and 3.5, with the 3.5 version being the official release. The 2.7 version is fully functional (as of v0.6.1) but will not be maintained, and it lacks some of the newer features available in v1.0.

To install the Python 2.7 version run:

pip install "dlatk < 1.0"
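The choice between the two pip commands can be sketched as a function of the running interpreter. This is only an illustration of the rule described above; pip_requirement is a hypothetical helper, not a DLATK function.

```python
import sys


def pip_requirement(version_info=sys.version_info):
    """Return the pip requirement string for the given Python version."""
    major = version_info[0]
    if major >= 3:
        # Python 3 gets the current official release.
        return "dlatk"
    # Python 2.7 must stay on the releases before v1.0.
    return "dlatk < 1.0"


if __name__ == "__main__":
    print('pip install "{}"'.format(pip_requirement()))
```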

Getting Started

Command Line Interface

DLATK is run using dlatkInterface.py, which is added to /usr/local/bin during the installation process.
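If the script does not seem to be found, the following sketch checks whether it is reachable through your PATH. The helper name find_interface is hypothetical; shutil.which is the standard library lookup and returns None when nothing matches.

```python
import shutil


def find_interface(script="dlatkInterface.py"):
    """Return the full path of the script if it is on PATH, else None."""
    return shutil.which(script)


if __name__ == "__main__":
    location = find_interface()
    if location is None:
        print("dlatkInterface.py was not found on PATH")
    else:
        print("dlatkInterface.py found at", location)
```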

MySQL Configuration

Every call to dlatkInterface.py opens a connection to MySQL. We assume that any table with text data has the following columns:

  • message: text data
  • message_id: unique numeric identifier for each message
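The column requirement above can be checked before running DLATK. The sketch below assumes you have already obtained the column names of your table (for instance from a DESCRIBE query); missing_columns is a hypothetical helper and does not connect to MySQL itself.

```python
# Columns that a text table is assumed to have, per the list above.
REQUIRED_COLUMNS = {"message", "message_id"}


def missing_columns(table_columns):
    """Return the required columns absent from table_columns, sorted."""
    return sorted(REQUIRED_COLUMNS - set(table_columns))


if __name__ == "__main__":
    # Example column names; replace with those of your own table.
    columns = ["message_id", "user_id", "created_time"]
    print("Missing columns:", missing_columns(columns))
```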

All lexicon tables are assumed to be in a database called permaLexicon (a sample database with this name is distributed with the release). To use a different database, edit the following line in fwConstants.py: DEF_LEXICON_DB = 'permaLexicon'

Sample Datasets

DLATK comes packaged with two sample databases: dla_tutorial and permaLexicon. See Packaged Datasets for more information on the databases. To install them use the following:

mysql -u username -p < /path/to/dlatk/data/dla_tutorial.sql
mysql -u username -p < /path/to/dlatk/data/permaLexicon.sql

The path to DLATK can be found using the following:

python -c "import dlatk; print(dlatk.__file__)"

WARNING: If these databases already exist, the above commands will add tables to them.
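The two steps above (finding the package path, then pointing mysql at the SQL files) can be combined. The sketch below assumes that the data directory sits next to the file reported by dlatk.__file__; sample_sql_paths is a hypothetical helper, and it only builds the paths without running mysql.

```python
import os

# Sample database dumps named in the commands above.
SAMPLE_FILES = ("dla_tutorial.sql", "permaLexicon.sql")


def sample_sql_paths(package_file):
    """Build the sample SQL paths from the value of dlatk.__file__."""
    # Assumption: the data directory lives inside the package directory.
    data_dir = os.path.join(os.path.dirname(package_file), "data")
    return [os.path.join(data_dir, name) for name in SAMPLE_FILES]


if __name__ == "__main__":
    try:
        import dlatk
    except ImportError:
        print("dlatk is not installed")
    else:
        for path in sample_sql_paths(dlatk.__file__):
            print("mysql -u username -p <", path)
```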

Next Steps

Try the DLATK Tutorial once you have everything running.

Install Issues

See Install FAQs for more info.