.. _fwflag_train_classifiers: =================== --train_classifiers =================== Switch ====== --train_classifiers Description =========== Trains a classification model using the features given. Argument and Default Value ========================== None Details ======= This switch will cause the infrastructure to train a machine learning model to predict the outcome(s) (:doc:`fwflag_outcomes`) from the features in the feature tables :doc:`fwflag_f` (Note that you can put multiple feature tables in there). Features are loaded into memory, and are filtered/clustered using the feature selection (see below) and then standardized over the groups (unless :doc:`fwflag_no_standardize` is used), then fed into the classification model. It is usually useful to use this switch with :doc:`fwflag_save_model`, but put the order of the features into the name cause those aren't yet stored in the model. Feature Selection In order to avoid overfitting, we have a couple of feature selection steps that one can do. Most of our feature selection is done using the Scikit:doc:`fwflag_Learn` package. To use it, we have a couple of pre:doc:`fwflag_made` feature selections, so just (un)comment the lines below this line: # feature selection: featureSelectionString = None Every feature selector string will create an object if evaluated, and said object needs to have the following two functions: fit(X, y) transform(X) If putting a lot of features into the model, it's good to use the pipeline feature selection: featureSelectionString = 'Pipeline([("1_mean_value_filter", OccurrenceThreshold(threshold=(X.shape[0]/100.0))), ("2_univariate_select", SelectFwe(f_regression, alpha=70.0)), ("3_rpca", RandomizedPCA(n_components=.4/len(self.featureGetters), random_state=42, whiten=False, iterated_power=3, max_components=X.shape[0]/max(1.5, len(self.featureGetters))))])' If there aren't many features, you can choose not to use any feature selection. Talk to a CS PostDoc about this :) Model selection See below for choosing the model. Once the model is chosen, you should tweak the parameters by commenting in/out the appropriate line in classifyPredictor.py below # Model Parameters cvParams = {... You can choose your model using :doc:`fwflag_model`, and choose one of the following: svc (Support Vector Classification) linear:doc:`fwflag_svc` (Support Vector Classification with Linear Kernel) lr (Logistic Regression) etc (ExtraTrees Classification) rfc (RandomForrest Classification) pac (Passive Agressive Classification) lda (Linear Discriminant Analysis) Other Switches ============== Required Switches: :doc:`fwflag_d`, :doc:`fwflag_c`, :doc:`fwflag_t`, :doc:`fwflag_f`, :doc:`fwflag_outcome_table`, :doc:`fwflag_outcomes` Optional Switches: :doc:`fwflag_group_freq_thresh` :doc:`fwflag_model` :doc:`fwflag_save_model` :doc:`fwflag_picklefile` :doc:`fwflag_no_standardize` :doc:`fwflag_sparse` :doc:`fwflag_classification_to_lexicon` etc. Example Commands ================ .. code:doc:`fwflag_block`:: python # Trains a classifier to predict the gender (a binary variable) for users from 1grams # Will save the model to a picklefile called deleteMeGender.pickle ~/fwInterface.py :doc:`fwflag_d` fb20 :doc:`fwflag_t` messages_en :doc:`fwflag_c` user_id :doc:`fwflag_f` 'feat$1gram$messages_en$user_id$16to16$0_01' :doc:`fwflag_outcome_table` masterstats_andy_r10k :doc:`fwflag_outcomes` gender :doc:`fwflag_train_classifiers` :doc:`fwflag_save_model` :doc:`fwflag_picklefile` deleteMeGender.pickle