dlatk.LexicaInterface package

Submodules

dlatk.LexicaInterface.addMessageID module

dlatk.LexicaInterface.ldaExtractor module

class dlatk.LexicaInterface.ldaExtractor.LDAExtractor(corpdb='dla_tutorial', corptable='msgs', correl_field='user_id', mysql_host='localhost', message_field='message', messageid_field='message_id', encoding='utf8mb4', use_unicode=True, ldaMsgTable='messages_en_lda$msgs_en_tok_a30')[source]

Bases: dlatk.featureExtractor.FeatureExtractor

addLDAFeatTable(ldaMessageTable, tableName=None, valueFunc=<function LDAExtractor.<lambda>>)[source]

Creates a feature table of (correl_field, feature, value) tuples where the features are LDA topics

createDistributions(filename=None)[source]
getWordWhiteList(pocc=0.01)[source]
static printDistToCSV(dist, fileName)[source]
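
A minimal usage sketch, assuming only the constructor defaults and method signatures listed above; the table names shown are the documented defaults and will likely need to be changed for your data:

    from dlatk.LexicaInterface.ldaExtractor import LDAExtractor

    # Instantiate with the documented defaults; any keyword can be overridden.
    lda = LDAExtractor(corpdb='dla_tutorial', corptable='msgs',
                       correl_field='user_id',
                       ldaMsgTable='messages_en_lda$msgs_en_tok_a30')

    # Build a (correl_field, feature, value) feature table from the LDA-tagged messages.
    lda.addLDAFeatTable('messages_en_lda$msgs_en_tok_a30')

    # Write topic distributions to CSV (filename defaults to None per the signature).
    lda.createDistributions()
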
class dlatk.LexicaInterface.ldaExtractor.LDAExtractorParser(description='On Features Class.', prefix_chars='-+', formatter_class=<class 'argparse.ArgumentDefaultsHelpFormatter'>)[source]

Bases: argparse.ArgumentParser

drop(objNames)[source]
getParser()[source]

Just in case someone is confused by the inheritance

static load(savedir, savename)[source]
matchExtension = re.compile('^(.*)\\.pickle$')
printObjs()[source]
static printstates(savedir)[source]
processArgs(args='')[source]

Processes all arguments

processLDAExtractor(args)[source]

Main argument processing area

processLoad(args)[source]

Processes the state load arguments

processSave(args)[source]

Processes the save arguments

static removeStateFile(savedir, savename)[source]
static save(savedir, savename, objectTup)[source]
saveExtension = 'pickle'
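
A sketch of driving the parser programmatically, based only on the methods listed above; the argument string and save directory are placeholders, and the actual command-line flags are defined by the parser itself:

    from dlatk.LexicaInterface.ldaExtractor import LDAExtractorParser

    # Construct with the documented defaults.
    parser = LDAExtractorParser(description='On Features Class.')

    # Process an argument string (empty by default, per the signature above).
    parser.processArgs(args='')

    # Parser states are saved as pickle files (saveExtension = 'pickle');
    # list the states stored in a directory.
    LDAExtractorParser.printstates('/path/to/savedir')  # placeholder path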

dlatk.LexicaInterface.lexInterface module

Python script to interact with PERMA lexicon DBs

TODO: make sure printCSV covers all bases; add functionality for adding terms to the DB; make sure adding terms works with hashtags

class dlatk.LexicaInterface.lexInterface.Lexicon(lex=None, mysql_host='127.0.0.1')[source]

Bases: object

addTermsToCorpus(corpdb, corptable, termfield, messagefield, messageidfield, fulltext=False)[source]

Finds rows containing terms from the lexicon and inserts them back into the corpus as annotated rows

annotateSenses(currentName, newLexiconName)[source]
compare(otherLex)[source]

Compares two lexicons; behavior depends on their types

createLexiconFromCorpus(corpdb, corptable, messagefield, messageidfield, minwordfreq)[source]

Creates a lexicon (all in one category) by examining word frequencies in a corpus

createLexiconTable(tablename)[source]

Creates a lexicon table from the instance's lexicon variable

currentLexicon = None
dbConn = None
dbCursor = None
depolCategories()[source]
expand()[source]

Expands a lexicon to contain more words

getLexicon(temp='nothing')[source]
insertLexiconRows(tablename, lex=None)[source]

Adds rows, taken from the lexicon variable, to MySQL

intersect(otherLexicon)[source]

Intersects this lexicon with another and returns the result

lexiconDB = 'permaLexicon'
likeExamples(corpdb, corptable, messagefield, numForEach=60, onlyPrintIfMin=True, onlyPrintStartingAlpha=True)[source]
likeSamples(corpdb, corptable, messagefield, category, lexicon_name, number_of_messages)[source]
loadLexicon(tablename, where='')[source]

Loads a lexicon as currentLexicon

pprint()[source]

Uses pprint to print the current lexicon

printCSV()[source]

Prints a CSV-style output of the lexicon

randomize()[source]

Randomizes the categories of the current lexicon

setLexicon(lexicon)[source]
unGroupCategories()[source]
union(otherLexicon)[source]

Unions this lexicon with another and returns the result

static wordExpand(word, specificDepth=1, generalizeDepth=-1, totalLinks=2)[source]
wpRE = re.compile('^([^\\.#]+)\\.([a-z])', re.IGNORECASE)
wpsRE = re.compile('[nvar]\\.(\\d+|\\?)$')
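
A minimal sketch of working with Lexicon objects, assuming the named tables already exist in the lexicon database ('permaLexicon' by default); the table names here are placeholders:

    from dlatk.LexicaInterface.lexInterface import Lexicon

    lexA = Lexicon(mysql_host='127.0.0.1')
    lexA.loadLexicon('my_lexicon_table')        # placeholder table name

    lexB = Lexicon()
    lexB.loadLexicon('other_lexicon_table')     # placeholder table name

    # Set operations; per the descriptions above, each returns the combined result.
    merged = lexA.union(lexB)
    shared = lexA.intersect(lexB)

    # Inspect or export the currently loaded lexicon.
    lexA.pprint()
    lexA.printCSV()
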
class dlatk.LexicaInterface.lexInterface.WeightedLexicon(weightedLexicon=None, lex=None, mysql_host='127.0.0.1')[source]

Bases: dlatk.LexicaInterface.lexInterface.Lexicon

WeightedLexicons have an additional dictionary with weights for each term in the regular lexicon

compare(otherLex)[source]

Compares two lexicons; behavior depends on their types

static compareUnweightedToUnweighted(uLex1, uLex2, metric='jaccard')[source]

Compares two unweighted lexicons

static compareWeightedToUnweighted(wLex1, uLex2, comparisonMethod='weighted')[source]

Compares a weighted lexicon (self) to an unweighted lexicon

static compareWeightedToWeighted(wLex1, wLex2)[source]

Compares two weighted lexicons

createLexiconTable(tablename)[source]

Creates a lexicon table, checking whether the lexicon is weighted and responding accordingly

createWeightedLexiconTable(tablename)[source]

Creates a lexicon table from the instance's lexicon variable

getWeightedLexicon()[source]
insertWeightedLexiconRows(tablename, lex=None)[source]

Adds rows, taken from the lexicon variable, to MySQL

isSelfLexiconWeighted()[source]
isTableLexiconWeighted(tablename)[source]
loadLexicon(tablename, where='')[source]

Loads a lexicon, checking whether it is weighted and responding accordingly

loadWeightedLexicon(tablename, where='')[source]

Loads a lexicon as weightedLexicon

mapToSuperLexicon(superLexiconMapping)[source]

Creates a new lexicon based on mapping topic words to super topics

printCSV()[source]

Prints a CSV-style output of the lexicon

union(otherLexicon)[source]

Unions this lexicon with another and returns the result
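
A sketch for the weighted case, again with placeholder table names; loadLexicon detects whether a table stores weights, and the loaded lexicon can be written back out as a table or dumped as CSV:

    from dlatk.LexicaInterface.lexInterface import WeightedLexicon

    wlex = WeightedLexicon(mysql_host='127.0.0.1')
    wlex.loadLexicon('my_topic_lexicon')              # placeholder table name
    print(wlex.isSelfLexiconWeighted())

    # Materialize the loaded lexicon as a new table, or print it as CSV.
    wlex.createLexiconTable('my_topic_lexicon_copy')  # placeholder table name
    wlex.printCSV()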

dlatk.LexicaInterface.lexInterface.interactiveGetSenses(cat, word)[source]
dlatk.LexicaInterface.lexInterface.loadLexiconFeatMapFromCSV(filename)[source]

Loads a lexicon feature map from a CSV file

dlatk.LexicaInterface.lexInterface.loadLexiconFromDic(filename)[source]

Loads a lexicon from a .dic file such as LIWC2001_English.dic

dlatk.LexicaInterface.lexInterface.loadLexiconFromFile(filename)[source]

Loads the PERMA lexicon using standard formatting; returns a dictionary of frozensets

dlatk.LexicaInterface.lexInterface.loadLexiconFromGFile(filename, using_filter)[source]

Loads a lexicon in "Google format"; returns a dictionary of frozensets

dlatk.LexicaInterface.lexInterface.loadLexiconFromSparse(filename)[source]

Loads the PERMA lexicon from a sparse format (word[, word], category); returns a dictionary of frozensets

dlatk.LexicaInterface.lexInterface.loadLexiconFromTopicFile(filename)[source]

Loads a lexicon from a topic file; returns a dictionary of frozensets

dlatk.LexicaInterface.lexInterface.loadWeightedLexiconFromSparse(filename)[source]

Loads the PERMA lexicon from a sparse format (word[, word], category); returns a dictionary of frozensets

dlatk.LexicaInterface.lexInterface.loadWeightedLexiconFromTopicCSV(filename, threshold=None)[source]

Loads a weighted lexicon; returns a dictionary of dictionaries

dlatk.LexicaInterface.lexInterface.loadWeightedLexiconFromTopicFile(filename)[source]

Loads a weighted lexicon; returns a dictionary of dictionaries
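
A sketch combining a module-level loader with the WeightedLexicon class; the CSV path and table name are placeholders, and passing the loaded dictionary through the weightedLexicon constructor parameter is assumed from the signatures above:

    from dlatk.LexicaInterface import lexInterface

    # Load topic -> {word: weight} mappings from a topic CSV (placeholder path).
    topics = lexInterface.loadWeightedLexiconFromTopicCSV('my_topics.csv', threshold=0.001)

    # Wrap the result in a WeightedLexicon and write it out as a MySQL lexicon table.
    wlex = lexInterface.WeightedLexicon(weightedLexicon=topics)
    wlex.createWeightedLexiconTable('my_topic_lexicon')  # placeholder table name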

Module contents