Using DLATK to view your SQL data

Since DLATK relies so heavily in MySQL we have created a few commands to help keep track of high level data. As with most dlatkInterface calls you will need the standard flags:

  • -d: the database we are using

  • -t: the table inside the database where our text lives (aka the message table)

  • -g: the table column we will be grouping the text by (aka group)

Viewing and Describing Tables

Show Feature Tables

View all feature tables for a given message table and / or grouping

Example 1: See user level features for the message table msgs:

dlatkInterface.py -d dla_tutorial -t msgs -c user_id --show_feature_tables

SQL QUERY: SHOW TABLES FROM dla_tutorial LIKE 'feat$%$msgs$user_id$%'
Found 4 available feature tables
feat$1gram$msgs$user_id$16to1
feat$1gram$msgs$user_id$16to16
feat$cat_met_a30_2000_cp_w$msgs$user_id$16to16
feat$meta_1gram$msgs$user_id$16to1

Example 2: See features extracted at any level for the message table msgs:

dlatkInterface.py -d dla_tutorial -t msgs -c '%' --ls

SQL QUERY: SHOW TABLES FROM dla_tutorial LIKE 'feat$%$msgs$%$%'
Found 6 available feature tables
feat$1gram$msgs$message_id$16to1
feat$1gram$msgs$user_id$16to1
feat$1gram$msgs$user_id$16to16
feat$cat_met_a30_2000_cp_w$msgs$user_id$16to16
feat$meta_1gram$msgs$message_id$16to1
feat$meta_1gram$msgs$user_id$16to1

Show Tables

Use this flag to view non-feature tables:

dlatkInterface.py -d dla_tutorial --show_tables

Found 5 available tables
blog_outcomes
blog_outcomes_rand
dummy_table
msgs
msgs_rand

dlatkInterface.py -d dla_tutorial  --show_tables 'blog%'

Found 2 available tables
blog_outcomes
blog_outcomes_rand

Describe Tables

dlatkInterface.py -d dla_tutorial -t msgs --describe_tables
...
SQL QUERY: DESCRIBE msgs
                    Field                     Type      Null       Key   Default          Extra
               message_id                  int(11)        NO       PRI           auto_increment
                  user_id         int(10) unsigned       YES       MUL
                     date              varchar(64)       YES
             created_time                 datetime       YES       MUL
                  message                     text       YES
dlatkInterface.py -d dla_tutorial -t msgs --describe_tables blog_outcomes
...
SQL QUERY: DESCRIBE msgs
                    Field                     Type      Null       Key   Default          Extra
               message_id                  int(11)        NO       PRI           auto_increment
                  user_id         int(10) unsigned       YES       MUL
                     date              varchar(64)       YES
             created_time                 datetime       YES       MUL
                  message                     text       YES
SQL QUERY: DESCRIBE blog_outcomes
                    Field                     Type      Null       Key   Default          Extra
                  user_id                  int(11)        NO       PRI
                   gender                   int(2)       YES
                      age          int(3) unsigned       YES
                     occu              varchar(32)       YES
                     sign              varchar(16)       YES
                is_indunk                   int(1)       YES
               is_student                   int(1)       YES
             is_education                   int(1)       YES
            is_technology                   int(1)       YES

View Table Data

   dlatkInterface.py -d dla_tutorial -t msgs --view_tables
   ...
   SQL QUERY: select column_name from information_schema.columns
       where table_schema = 'dla_tutorial' and table_name='msgs'
   SQL QUERY: SELECT * FROM msgs LIMIT 5
message_id        user_id           date   created_time        message
         1        3991108   31,July,2004 2004-07-31 00:    can you bel
         2        3991108   25,July,2004 2004-07-25 00:    miss su  us
         3        3991108   24,July,2004 2004-07-24 00:    i'm lookin
         4        3991108   24,July,2004 2004-07-24 00:    what a time
         5        3991108 01,August,2004 2004-08-01 00:    i cannot be
   dlatkInterface.py -d dla_tutorial -t msgs --view_tables blog_outcomes
   ...
   SQL QUERY: select column_name from information_schema.columns
       where table_schema = 'dla_tutorial' and table_name='msgs'
   SQL QUERY: SELECT * FROM msgs LIMIT 5
message_id        user_id           date   created_time        message
         1        3991108   31,July,2004 2004-07-31 00:    can you bel
         2        3991108   25,July,2004 2004-07-25 00:    miss su  us
         3        3991108   24,July,2004 2004-07-24 00:    i'm lookin
         4        3991108   24,July,2004 2004-07-24 00:    what a time
         5        3991108 01,August,2004 2004-08-01 00:    i cannot be

   SQL QUERY: select column_name from information_schema.columns
               where table_schema = 'dla_tutorial' and table_name='blog_outcomes'
   SQL QUERY: SELECT * FROM blog_outcomes LIMIT 5
           user_id         gender     gender_cat            age           occu           sign      is_indunk     is_student   is_education  is_technology
           3991108              1         female             17         indUnk            Leo              1              0              0              0
           3417138              1         female             25 Communications         Taurus              0              0              0              0
           3673414              0           male             14        Student        Scorpio              0              1              0              0
           3361075              1         female             16        Student      Capricorn              0              1              0              0
           4115327              1         female             14         indUnk          Libra              1              0              0              0

Creating tables

These commands allow you to create random samples of your data

Random Sample

Creates a new table with a random subset of rows from the table specified by -t.

Example 1: create the table msgs_rand that contains a random 10% of the rows in msgs:

dlatkInterface.py -d dla_tutorial -t msgs --create_random_sample .10
...
SQL QUERY: DROP TABLE IF EXISTS msgs_rand
SQL QUERY: CREATE TABLE msgs_rand LIKE msgs
SQL QUERY: ALTER TABLE msgs_rand DISABLE KEYS
SQL QUERY: INSERT INTO msgs_rand SELECT * FROM msgs where RAND(42) < 0.11000000000000001 LIMIT 3167
SQL QUERY: ALTER TABLE msgs_rand ENABLE KEYS

Example 2: create the table blog_outcomes_rand that contains a random 50% of the rows in blog_outcomes with the random seed 567:

dlatkInterface.py -d dla_tutorial -t blog_outcomes --create_random_sample .50 567
...
SQL QUERY: DROP TABLE IF EXISTS msgs_rand
SQL QUERY: CREATE TABLE msgs_rand LIKE msgs
SQL QUERY: ALTER TABLE msgs_rand DISABLE KEYS
SQL QUERY: INSERT INTO msgs_rand SELECT * FROM msgs where RAND(567) < 0.55 LIMIT 15837
SQL QUERY: ALTER TABLE msgs_rand ENABLE KEYS