
.repartition(...) transformation
The repartition(n) transformation redistributes the RDD's data into n partitions by performing a full shuffle, which moves records across the network and spreads them roughly evenly over the new partitions. As noted in the preceding recipes, increasing the number of partitions can improve performance by allowing more tasks to run in parallel, although the shuffle itself has a cost. The following snippet does exactly that:
# The flights RDD originally generated has 2 partitions
flights.getNumPartitions()
# Output
2
# Let's repartition this into 8 partitions
flights2 = flights.repartition(8)
# Checking the number of partitions for the flights2 RDD
flights2.getNumPartitions()
# Output
8
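To build some intuition for why the resulting partitions end up roughly balanced, here is a minimal pure-Python sketch of the idea. It is a toy model, not Spark's implementation: the function name simulate_repartition, the seed parameter, and the sample data are all hypothetical, and it simply assumes that each input partition deals its records round-robin to the output partitions, starting from a random slot.

```python
import random


def simulate_repartition(partitions, n, seed=0):
    """Toy model (hypothetical helper): deal each input partition's
    items round-robin across n output partitions."""
    rng = random.Random(seed)
    out = [[] for _ in range(n)]
    for part in partitions:
        # Each input partition begins at a random output slot and then
        # cycles through the slots, so no single output is favored.
        start = rng.randrange(n)
        for offset, item in enumerate(part):
            out[(start + offset) % n].append(item)
    return out


# Two unevenly sized input partitions, standing in for the 2-partition RDD
two_parts = [list(range(0, 30)), list(range(30, 40))]

# Redistribute into 8 partitions, mirroring flights.repartition(8)
eight_parts = simulate_repartition(two_parts, 8)
sizes = [len(p) for p in eight_parts]
print(len(eight_parts), sizes)
```

In this model every record is kept and the partition sizes differ by only a small amount, even though the two inputs were badly skewed. On a real RDD, something like flights2.glom().map(len).collect() can be used to inspect the actual partition sizes.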