LinearRegressionWithSGD
- class pyspark.mllib.regression.LinearRegressionWithSGD
- Train a linear regression model with no regularization using Stochastic Gradient Descent.
- New in version 0.9.0.
- Deprecated since version 2.0.0: Use pyspark.ml.regression.LinearRegression.
- Methods
- train(data[, iterations, step, ...]): Train a linear regression model using Stochastic Gradient Descent (SGD).
- Methods Documentation
- classmethod train(data, iterations=100, step=1.0, miniBatchFraction=1.0, initialWeights=None, regParam=0.0, regType=None, intercept=False, validateData=True, convergenceTol=0.001)
- Train a linear regression model using Stochastic Gradient Descent (SGD). This solves the least squares regression formulation
- f(weights) = 1/(2n) ||A weights - y||^2
- which is half the mean squared error. Here the data matrix A has n rows, and the input RDD holds the rows of A, each with its corresponding right-hand-side label y. See also the documentation for the precise formulation.
- New in version 0.9.0.
- Parameters
- data : pyspark.RDD
- The training data, an RDD of LabeledPoint. 
- iterations : int, optional
- The number of iterations. (default: 100) 
- step : float, optional
- The step parameter used in SGD. (default: 1.0) 
- miniBatchFraction : float, optional
- Fraction of data to be used for each SGD iteration. (default: 1.0) 
- initialWeights : pyspark.mllib.linalg.Vector or convertible, optional
- The initial weights. (default: None) 
- regParam : float, optional
- The regularizer parameter. (default: 0.0) 
- regType : str, optional
- The type of regularizer used for training the model. Supported values:
- “l1” for using L1 regularization 
- “l2” for using L2 regularization 
- None for no regularization (default) 
 
- intercept : bool, optional
- Whether to use the augmented representation for the training data (i.e., whether bias features are activated). (default: False) 
- validateData : bool, optional
- Boolean parameter which indicates if the algorithm should validate data before training. (default: True) 
- convergenceTol : float, optional
- The convergence tolerance used to decide when to stop iterating. (default: 0.001) 
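
To make the objective and the roles of iterations, step, miniBatchFraction, initialWeights, and convergenceTol concrete, here is a minimal pure-Python sketch of mini-batch gradient descent on f(weights) = 1/(2n) ||A weights - y||^2. It is not Spark's implementation and does not use pyspark: the function names objective and sgd_sketch, the seed argument, the constant step size, and the particular stopping rule are all assumptions made for illustration, and regularization and the intercept are left out.

```python
import math
import random


def objective(rows, labels, weights):
    """Return f(weights) = 1/(2n) * ||A weights - y||^2 for rows of A and labels y."""
    n = len(rows)
    total = 0.0
    for x, y in zip(rows, labels):
        residual = sum(w * xi for w, xi in zip(weights, x)) - y
        total += residual * residual
    return total / (2.0 * n)


def sgd_sketch(rows, labels, iterations=100, step=1.0, mini_batch_fraction=1.0,
               initial_weights=None, convergence_tol=0.001, seed=0):
    """Hypothetical mini-batch SGD loop mirroring the parameters of train()."""
    rng = random.Random(seed)
    dim = len(rows[0])
    weights = list(initial_weights) if initial_weights is not None else [0.0] * dim
    for _ in range(iterations):
        # Keep each row with probability mini_batch_fraction (1.0 keeps every row).
        batch = [(x, y) for x, y in zip(rows, labels)
                 if rng.random() < mini_batch_fraction]
        if not batch:
            continue
        # Gradient of the least squares objective, averaged over the batch.
        grad = [0.0] * dim
        for x, y in batch:
            residual = sum(w * xi for w, xi in zip(weights, x)) - y
            for j in range(dim):
                grad[j] += residual * x[j] / len(batch)
        new_weights = [w - step * g for w, g in zip(weights, grad)]
        # Assumed stopping rule: stop once the update is small relative to the weights.
        change = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_weights, weights)))
        norm = math.sqrt(sum(a * a for a in new_weights))
        weights = new_weights
        if change < convergence_tol * max(norm, 1.0):
            break
    return weights


# Toy data generated from y = 2 * x, so the best weight is 2.0.
rows = [[1.0], [2.0], [3.0]]
labels = [2.0, 4.0, 6.0]
# step=0.1 keeps this unscaled toy problem stable; step=1.0 would diverge here.
fitted = sgd_sketch(rows, labels, step=0.1)
print(fitted, objective(rows, labels, fitted))
```

In this sketch a smaller convergence_tol lets the loop run closer to the full number of iterations, and a mini_batch_fraction below 1.0 trades gradient accuracy for less work per iteration.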
 
- data