pyspark.SparkContext.parallelize
- SparkContext.parallelize(c, numSlices=None)
- Distribute a local Python collection to form an RDD. If the input represents a range, passing a range object is recommended for performance.
- New in version 0.7.0.
- Parameters
- c : collections.abc.Iterable
- iterable collection to distribute
- numSlices : int, optional
- the number of partitions of the new RDD
 
- Returns
- RDD
- RDD representing the distributed collection.
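The performance note above suggests preferring a range object over an equivalent list; a plausible reason is that a range can be split into per-partition sub-ranges without first materializing every element on the driver. A minimal sketch, assuming a running SparkContext bound to sc as in the examples below:

>>> sc.parallelize(range(1_000_000), 4).count()  # preferred: distribute the range object directly
1000000
>>> sc.parallelize(list(range(1_000_000)), 4).count()  # also works, but builds the full list on the driver first
1000000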
 
- Examples
>>> sc.parallelize([0, 2, 3, 4, 6], 5).glom().collect()
[[0], [2], [3], [4], [6]]
>>> sc.parallelize(range(0, 6, 2), 5).glom().collect()
[[], [0], [], [2], [4]]

Deal with a list of strings.

>>> strings = ["a", "b", "c"]
>>> sc.parallelize(strings, 2).glom().collect()
[['a'], ['b', 'c']]
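When numSlices is omitted, the partition count appears to fall back to the context's default parallelism. A short sketch of that behavior, assuming the same sc; the actual number of partitions depends on your configuration:

>>> rdd = sc.parallelize([1, 2, 3, 4])  # numSlices left as None
>>> rdd.getNumPartitions() == sc.defaultParallelism
True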