--tf_idf

Switch

--tf_idf

Description

Creates a new feature table in which group_norm is the tf-idf score. Each group_id is treated as one document when calculating tf-idf.

Argument and Default Value

None

Details

The table passed with the -f flag should be an ngram table.

In the resulting table, value is carried over from the value column of the ngram table, and group_norm holds the tf-idf score.
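The exact weighting DLATK applies is not spelled out on this page, so the following is only a minimal sketch of the idea, assuming the textbook formulation: tf is a feature's count divided by the total count in its group, and idf is log(N / df), where N is the number of groups and df is the number of groups containing the feature. The function name tf_idf_rows and the toy rows are hypothetical and are not part of DLATK.

```python
import math
from collections import defaultdict


def tf_idf_rows(rows):
    """Score (group_id, feat, value) rows with a textbook tf-idf.

    ASSUMPTION: tf = value / total count in the group, idf = log(N / df).
    This illustrates the concept only; DLATK's own formula may differ.
    Returns a list of (group_id, feat, value, group_norm) tuples.
    """
    rows = list(rows)
    group_totals = defaultdict(int)  # group_id -> sum of all feature counts
    groups_with_feat = defaultdict(set)  # feat -> group_ids that contain it

    for group_id, feat, value in rows:
        group_totals[group_id] += value
        groups_with_feat[feat].add(group_id)

    n_groups = len(group_totals)  # each group_id counts as one document
    scored = []
    for group_id, feat, value in rows:
        tf = value / group_totals[group_id]
        idf = math.log(n_groups / len(groups_with_feat[feat]))
        scored.append((group_id, feat, value, tf * idf))
    return scored


# Hypothetical toy ngram rows: (group_id, feat, value)
toy_rows = [
    (1, "story", 2),
    (1, "the", 2),
    (2, "the", 3),
    (2, "crap", 1),
]
result = tf_idf_rows(toy_rows)
for row in result:
    print(row)
```

In this toy input, "the" appears in every group, so its idf is log(1) = 0 and its score is zero, while features confined to a single group keep a positive score. That is the general effect to expect in group_norm: features shared by most groups are pushed toward zero.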

Other Switches

Required:

Example Commands

./dlatkInterface.py -d dla_tutorial -t msgs -c user_id -f 'feat$1gram$msgs$user_id$16to16' --tf_idf
mysql> select * from feat$tf_idf_1gram$msgs$user_id order by rand() limit 5;
+---------+----------+-----------+-------+--------------------+
| id      | group_id | feat      | value | group_norm         |
+---------+----------+-----------+-------+--------------------+
|  307349 |  2033616 | delivered |     1 | 0.0000878334772103 |
|  278647 |  4144593 | crap      |     6 |  0.000998442620366 |
| 1043863 |  3482840 | story     |     2 |  0.000334689956064 |
| 1150911 |  2876677 | uh        |     2 |  0.000141436336165 |
|  283547 |  3711805 | crosses   |     2 |  0.000827587016091 |
+---------+----------+-----------+-------+--------------------+