CoordinateMatrix
- class pyspark.mllib.linalg.distributed.CoordinateMatrix(entries, numRows=0, numCols=0)
- Represents a matrix in coordinate format.
- Parameters
- entries : pyspark.RDD
- An RDD of MatrixEntry inputs or (int, int, float) tuples. 
- numRows : int, optional
- Number of rows in the matrix. A non-positive value means unknown, at which point the number of rows will be determined by the max row index plus one. 
- numCols : int, optional
- Number of columns in the matrix. A non-positive value means unknown, at which point the number of columns will be determined by the max column index plus one. 
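
The sizing rule for unknown dimensions can be modelled in plain Python. This is only a sketch of the documented behaviour, not Spark code: `infer_dims` is a hypothetical helper, a list of `(row, col, value)` tuples stands in for the RDD, and the column count is assumed to come from the largest column index, as the toBlockMatrix example on this page indicates.

```python
def infer_dims(entries, num_rows=0, num_cols=0):
    """Return (rows, cols) for a list of (row, col, value) tuples.

    Hypothetical helper that mirrors the documented rule; it is not
    part of PySpark.
    """
    # A positive explicit size is used as given.
    # A non-positive size means "unknown": fall back to max index + 1.
    if num_rows <= 0:
        num_rows = max(i for i, _, _ in entries) + 1
    if num_cols <= 0:
        num_cols = max(j for _, j, _ in entries) + 1
    return num_rows, num_cols


entries = [(0, 0, 1.2), (1, 0, 2.0), (2, 1, 3.7)]
print(infer_dims(entries))        # sizes inferred from the indices
print(infer_dims(entries, 7, 6))  # explicit sizes take precedence
```

With the three entries above, the inferred shape is 3 x 2, which agrees with the numRows() and numCols() examples in the Methods Documentation.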
 
- Methods
- numCols(): Get or compute the number of cols.
- numRows(): Get or compute the number of rows.
- toBlockMatrix([rowsPerBlock, colsPerBlock]): Convert this matrix to a BlockMatrix.
- toIndexedRowMatrix(): Convert this matrix to an IndexedRowMatrix.
- toRowMatrix(): Convert this matrix to a RowMatrix.
- transpose(): Transpose this CoordinateMatrix.
- Attributes
- entries: Entries of the CoordinateMatrix stored as an RDD of MatrixEntries.
- Methods Documentation
- numCols()
- Get or compute the number of cols.
- Examples
>>> # `sc` is assumed to be an existing SparkContext.
>>> from pyspark.mllib.linalg.distributed import CoordinateMatrix, MatrixEntry
>>> entries = sc.parallelize([MatrixEntry(0, 0, 1.2),
...                           MatrixEntry(1, 0, 2),
...                           MatrixEntry(2, 1, 3.7)])
>>> mat = CoordinateMatrix(entries)
>>> print(mat.numCols())
2
>>> mat = CoordinateMatrix(entries, 7, 6)
>>> print(mat.numCols())
6
- numRows()
- Get or compute the number of rows.
- Examples
>>> entries = sc.parallelize([MatrixEntry(0, 0, 1.2),
...                           MatrixEntry(1, 0, 2),
...                           MatrixEntry(2, 1, 3.7)])
>>> mat = CoordinateMatrix(entries)
>>> print(mat.numRows())
3
>>> mat = CoordinateMatrix(entries, 7, 6)
>>> print(mat.numRows())
7
- toBlockMatrix(rowsPerBlock=1024, colsPerBlock=1024)
- Convert this matrix to a BlockMatrix.
- Parameters
- rowsPerBlock : int, optional
- Number of rows that make up each block. The blocks forming the final rows are not required to have the given number of rows.
- colsPerBlock : int, optional
- Number of columns that make up each block. The blocks forming the final columns are not required to have the given number of columns.

- Examples
>>> entries = sc.parallelize([MatrixEntry(0, 0, 1.2),
...                           MatrixEntry(6, 4, 2.1)])
>>> mat = CoordinateMatrix(entries).toBlockMatrix()
>>> # This CoordinateMatrix will have 7 effective rows, due to
>>> # the highest row index being 6, and the ensuing
>>> # BlockMatrix will have 7 rows as well.
>>> print(mat.numRows())
7
>>> # This CoordinateMatrix will have 5 columns, due to the
>>> # highest column index being 4, and the ensuing
>>> # BlockMatrix will have 5 columns as well.
>>> print(mat.numCols())
5
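
To make the role of rowsPerBlock and colsPerBlock concrete, the following plain-Python sketch shows how a matrix could be tiled and why the final blocks may be smaller than requested. The helpers `block_layout` and `locate` are hypothetical names used for illustration; they model the block arithmetic only and do not call Spark.

```python
import math


def block_layout(num_rows, num_cols, rows_per_block=1024, cols_per_block=1024):
    """Return (block rows, block cols) needed to tile the matrix."""
    return (math.ceil(num_rows / rows_per_block),
            math.ceil(num_cols / cols_per_block))


def locate(i, j, rows_per_block=1024, cols_per_block=1024):
    """Map entry (i, j) to ((block row, block col), (local row, local col))."""
    return ((i // rows_per_block, j // cols_per_block),
            (i % rows_per_block, j % cols_per_block))


# A 7 x 5 matrix tiled with 4 x 2 blocks: the last block row holds
# only 3 rows and the last block column holds only 1 column.
print(block_layout(7, 5, 4, 2))
print(locate(6, 4, 4, 2))
```

With the default block size of 1024 x 1024, the 7 x 5 matrix from the example above fits into a single block.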
- toIndexedRowMatrix()
- Convert this matrix to an IndexedRowMatrix.
- Examples
>>> entries = sc.parallelize([MatrixEntry(0, 0, 1.2),
...                           MatrixEntry(6, 4, 2.1)])
>>> mat = CoordinateMatrix(entries).toIndexedRowMatrix()
>>> # This CoordinateMatrix will have 7 effective rows, due to
>>> # the highest row index being 6, and the ensuing
>>> # IndexedRowMatrix will have 7 rows as well.
>>> print(mat.numRows())
7
>>> # This CoordinateMatrix will have 5 columns, due to the
>>> # highest column index being 4, and the ensuing
>>> # IndexedRowMatrix will have 5 columns as well.
>>> print(mat.numCols())
5
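
The row count of 7 follows from the row indices being preserved. A rough plain-Python model of that idea is sketched below; `to_indexed_rows` is a hypothetical helper, and dense lists are used purely for readability (this says nothing about how Spark stores the rows internally).

```python
from collections import defaultdict


def to_indexed_rows(entries, num_cols):
    """Group (row, col, value) tuples into {row index: dense row}."""
    rows = defaultdict(lambda: [0.0] * num_cols)
    for i, j, value in entries:
        rows[i][j] = value
    return dict(rows)


entries = [(0, 0, 1.2), (6, 4, 2.1)]
indexed = to_indexed_rows(entries, num_cols=5)

# Only rows 0 and 6 are stored, but their indices are kept, so the
# logical row count is still the largest index plus one.
print(sorted(indexed))
print(max(indexed) + 1)
```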
- toRowMatrix()
- Convert this matrix to a RowMatrix.
- Examples
>>> entries = sc.parallelize([MatrixEntry(0, 0, 1.2),
...                           MatrixEntry(6, 4, 2.1)])
>>> mat = CoordinateMatrix(entries).toRowMatrix()
>>> # This CoordinateMatrix will have 7 effective rows, due to
>>> # the highest row index being 6, but the ensuing RowMatrix
>>> # will only have 2 rows since there are only entries on 2
>>> # unique rows.
>>> print(mat.numRows())
2
>>> # This CoordinateMatrix will have 5 columns, due to the
>>> # highest column index being 4, and the ensuing RowMatrix
>>> # will have 5 columns as well.
>>> print(mat.numCols())
5
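
The drop from 7 effective rows to 2 happens because a RowMatrix has no row indices, so rows without entries simply do not exist in it. The plain-Python sketch below models that loss of indices; `to_rows` is a hypothetical helper, and the sorted ordering is only there to make the output deterministic.

```python
def to_rows(entries, num_cols):
    """Collapse (row, col, value) tuples into a list of dense rows.

    Row indices are discarded, so only rows that contain at least
    one entry survive.
    """
    rows = {}
    for i, j, value in entries:
        rows.setdefault(i, [0.0] * num_cols)[j] = value
    return [rows[i] for i in sorted(rows)]


rows = to_rows([(0, 0, 1.2), (6, 4, 2.1)], num_cols=5)
print(len(rows))     # number of rows that had entries
print(len(rows[0]))  # column count is unchanged
```

If the original row positions matter downstream, toIndexedRowMatrix() is the conversion that keeps them.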
- transpose()
- Transpose this CoordinateMatrix.
- New in version 2.0.0.
- Examples
>>> entries = sc.parallelize([MatrixEntry(0, 0, 1.2),
...                           MatrixEntry(1, 0, 2),
...                           MatrixEntry(2, 1, 3.7)])
>>> mat = CoordinateMatrix(entries)
>>> mat_transposed = mat.transpose()
>>> print(mat_transposed.numRows())
2
>>> print(mat_transposed.numCols())
3
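
In coordinate format, transposing amounts to swapping the row and column index of every entry. The following plain-Python sketch illustrates this on the same three entries; `transpose_entries` is a hypothetical helper operating on tuples, not the Spark method itself.

```python
def transpose_entries(entries):
    """Swap the row and column index of each (row, col, value) tuple."""
    return [(j, i, value) for i, j, value in entries]


entries = [(0, 0, 1.2), (1, 0, 2.0), (2, 1, 3.7)]
transposed = transpose_entries(entries)

# Dimensions inferred as "largest index plus one" swap accordingly.
num_rows = max(i for i, _, _ in transposed) + 1
num_cols = max(j for _, j, _ in transposed) + 1
print(transposed)
print(num_rows, num_cols)
```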
- Attributes Documentation
- entries
- Entries of the CoordinateMatrix stored as an RDD of MatrixEntries.
- Examples
>>> mat = CoordinateMatrix(sc.parallelize([MatrixEntry(0, 0, 1.2),
...                                        MatrixEntry(6, 4, 2.1)]))
>>> entries = mat.entries
>>> entries.first()
MatrixEntry(0, 0, 1.2)