Reduces a feature space to clusters.
Argument and Default Value
If --n_components is not specified, the default number of clusters is 24 (when applicable).
Using --model one can specify the following clustering algorithms:
- NMF - (Non-Negative Matrix Factorization) Non-negative matrix factorization by Projected Gradient.
- PCA - (Principal component analysis) Linear dimensionality reduction using Singular Value Decomposition of the data and keeping only the most significant singular vectors to project the data to a lower dimensional space.
- SPARSEPCA - (Sparse Principal Components Analysis) Finds the set of sparse components that can optimally reconstruct the data. The amount of sparseness is controllable by the coefficient of the L1 penalty.
- LDA - (Linear Discriminant Analysis) A classifier with a linear decision boundary, generated by fitting class conditional densities to the data and using Bayes’ rule.
- KMEANS - K-Means clustering
- DBSCAN - (Density-Based Spatial Clustering of Applications with Noise) Finds core samples of high density and expands clusters from them. Good for data which contains clusters of similar density.
- SPECTRAL - (Spectral Clustering) Applies clustering to a projection of the normalized Laplacian. In practice, spectral clustering is very useful when the structure of the individual clusters is highly non-convex or, more generally, when a measure of the center and spread of the cluster is not a suitable description of the complete cluster, for instance when clusters are nested circles on the 2D plane.
- GMM - (Gaussian Mixture Model)
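The KMEANS entry above has no description, so the following is a minimal pure-Python sketch of the K-Means idea, given only to illustrate what grouping points into a fixed number of clusters means. It is not DLATK's implementation: the function name `kmeans`, the toy data, and the choice to seed centroids with the first `k` points are all assumptions made for this example, and `k` only loosely plays the role that --n_components plays on the command line.

```python
import math


def kmeans(points, k, iterations=20):
    """Cluster equal-length tuples of floats into k groups (illustrative sketch)."""
    # Assumption: seed centroids with the first k points so the run is deterministic.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Toy data: two well-separated groups in 2D.
points = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2), (4.9, 5.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

On this toy data the points near the origin share one label and the points near (5, 5) share the other. Production implementations typically use smarter initialization and multiple restarts, which this sketch leaves out for brevity.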
```
# General syntax
dlatkInterface.py -d <DATABASE> -t <TABLE> -c <> -f <FEATURE_TABLE> --fit_reducer --model <MODEL_NAME>

# Example command
dlatkInterface.py -d primals -t primals_new -c dp_id -f 'feat$1to3gram$primals_new$dp_id$16to1$0_0001' --fit_reducer --model spectral --group_freq_thresh 100
```