Using DLATK to view your SQL data

Since DLATK relies so heavily in MySQL we have created a few commands to help keep track of high level data. As with most dlatkInterface calls you will need the standard flags:

  • -d: the database we are using
  • -t: the table inside the database where our text lives (aka the message table)
  • -c: the table column we will be grouping the text by (aka group)

Viewing and Describing Tables

ls

View all feature tables for a given message table and / or grouping

Example 1: See user level features for the message table msgs:

dlatkInterface.py -d dla_tutorial -t msgs -c user_id --ls

SQL QUERY: SHOW TABLES FROM dla_tutorial LIKE 'feat$%$msgs$user_id$%'
Found 4 available feature tables
feat$1gram$msgs$user_id$16to1
feat$1gram$msgs$user_id$16to16
feat$cat_met_a30_2000_cp_w$msgs$user_id$16to16
feat$meta_1gram$msgs$user_id$16to1

Example 2: See features extracted at any level for the message table msgs:

dlatkInterface.py -d dla_tutorial -t msgs -c '%' --ls

SQL QUERY: SHOW TABLES FROM dla_tutorial LIKE 'feat$%$msgs$%$%'
Found 6 available feature tables
feat$1gram$msgs$message_id$16to1
feat$1gram$msgs$user_id$16to1
feat$1gram$msgs$user_id$16to16
feat$cat_met_a30_2000_cp_w$msgs$user_id$16to16
feat$meta_1gram$msgs$message_id$16to1
feat$meta_1gram$msgs$user_id$16to1

Show Tables

Use this flag to view non-feature tables:

dlatkInterface.py -d dla_tutorial --show_tables

Found 5 available tables
blog_outcomes
blog_outcomes_rand
dummy_table
msgs
msgs_rand

dlatkInterface.py -d dla_tutorial  --show_tables 'blog%'

Found 2 available tables
blog_outcomes
blog_outcomes_rand

Describe Tables

dlatkInterface.py -d dla_tutorial -t msgs --describe_tables
...
SQL QUERY: DESCRIBE msgs
                    Field                     Type      Null       Key   Default          Extra
               message_id                  int(11)        NO       PRI           auto_increment
                  user_id         int(10) unsigned       YES       MUL
                     date              varchar(64)       YES
             created_time                 datetime       YES       MUL
                  message                     text       YES
dlatkInterface.py -d dla_tutorial -t msgs --describe_tables blog_outcomes
...
SQL QUERY: DESCRIBE msgs
                    Field                     Type      Null       Key   Default          Extra
               message_id                  int(11)        NO       PRI           auto_increment
                  user_id         int(10) unsigned       YES       MUL
                     date              varchar(64)       YES
             created_time                 datetime       YES       MUL
                  message                     text       YES
SQL QUERY: DESCRIBE blog_outcomes
                    Field                     Type      Null       Key   Default          Extra
                  user_id                  int(11)        NO       PRI
                   gender                   int(2)       YES
                      age          int(3) unsigned       YES
                     occu              varchar(32)       YES
                     sign              varchar(16)       YES
                is_indunk                   int(1)       YES
               is_student                   int(1)       YES
             is_education                   int(1)       YES
            is_technology                   int(1)       YES

Creating tables

These commands allow you to create random samples of your data

Random Sample

Creates a new table with a random subset of rows from the table specified by -t.

Example 1: create the table msgs_rand that contains a random 10% of the rows in msgs:

dlatkInterface.py -d dla_tutorial -t msgs --create_random_sample .10
...
SQL QUERY: DROP TABLE IF EXISTS msgs_rand
SQL QUERY: CREATE TABLE msgs_rand LIKE msgs
SQL QUERY: ALTER TABLE msgs_rand DISABLE KEYS
SQL QUERY: INSERT INTO msgs_rand SELECT * FROM msgs where RAND(42) < 0.11000000000000001 LIMIT 3167
SQL QUERY: ALTER TABLE msgs_rand ENABLE KEYS

Example 2: create the table blog_outcomes_rand that contains a random 50% of the rows in blog_outcomes with the random seed 567:

dlatkInterface.py -d dla_tutorial -t blog_outcomes --create_random_sample .50 567
...
SQL QUERY: DROP TABLE IF EXISTS msgs_rand
SQL QUERY: CREATE TABLE msgs_rand LIKE msgs
SQL QUERY: ALTER TABLE msgs_rand DISABLE KEYS
SQL QUERY: INSERT INTO msgs_rand SELECT * FROM msgs where RAND(567) < 0.55 LIMIT 15837
SQL QUERY: ALTER TABLE msgs_rand ENABLE KEYS