KMeans
- class pyspark.mllib.clustering.KMeans
- K-means clustering.
- New in version 0.9.0.
- Methods
- train(rdd, k[, maxIterations, ...]): Train a k-means clustering model.
- Methods Documentation
- classmethod train(rdd, k, maxIterations=100, initializationMode='k-means||', seed=None, initializationSteps=2, epsilon=0.0001, initialModel=None, distanceMeasure='euclidean')
- Train a k-means clustering model.
- New in version 0.9.0.
- Parameters
- rdd: pyspark.RDD
- Training points as an RDD of pyspark.mllib.linalg.Vector or convertible sequence types.
- k: int
- Number of clusters to create. 
- maxIterations: int, optional
- Maximum number of iterations allowed. (default: 100) 
- initializationMode: str, optional
- The initialization algorithm. This can be either “random” or “k-means||”. (default: “k-means||”) 
- seed: int, optional
- Random seed value for cluster initialization. Set as None to generate seed based on system time. (default: None) 
- initializationSteps: int, optional
- Number of steps for the k-means|| initialization mode. This is an advanced setting – the default of 2 is almost always enough. (default: 2) 
- epsilon: float, optional
- Distance threshold within which a center will be considered to have converged. If all centers move less than this Euclidean distance, iterations are stopped. (default: 1e-4) 
- initialModel: KMeansModel, optional
- Initial cluster centers can be provided as a KMeansModel object instead of being chosen by the “random” or “k-means||” initializationMode. (default: None)
- distanceMeasure: str, optional
- The distance measure used by the k-means algorithm. This can be either “euclidean” or “cosine”. (default: “euclidean”)
 
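
The stopping rule described for `epsilon` (iterations stop once every center has moved less than the threshold, measured as Euclidean distance) can be illustrated in plain Python. This is only a sketch of the rule as worded in the parameter description; the helper name `has_converged` is hypothetical, and Spark's internal implementation may differ in detail.

```python
import math


def has_converged(old_centers, new_centers, epsilon=1e-4):
    """Return True if every center moved less than epsilon (Euclidean distance)."""
    return all(
        math.dist(old, new) < epsilon
        for old, new in zip(old_centers, new_centers)
    )


old = [(0.0, 0.0), (5.0, 5.0)]

# First center moves by about 5e-5, second does not move: below the threshold.
small_move = [(0.00003, 0.00004), (5.0, 5.0)]

# Second center moves by 0.1: well above the threshold.
large_move = [(0.0, 0.0), (5.0, 5.1)]

converged_small = has_converged(old, small_move)
converged_large = has_converged(old, large_move)
```

A larger `epsilon` stops training earlier at the cost of less precise centers, while `maxIterations` still caps the number of passes if the threshold is never met.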