Apache Spark: Renaming Multiple Columns with withColumnRenamed

The answer accepted by zero323 works; most of the other answers should be avoided. Here is another efficient solution, which leverages the quinn library and is well suited to production codebases:

import quinn

# Assumes an active SparkSession bound to `spark`.
df = spark.createDataFrame([(1, 2), (3, 4)], ['x1', 'x2'])

def rename_col(s):
    mapping = {'x1': 'x3', 'x2': 'x4'}
    return mapping[s]

actual_df = df.transform(quinn.with_columns_renamed(rename_col))
actual_df.show()

Here is the resulting DataFrame:

+---+---+
| x3| x4|
+---+---+
|  1|  2|
|  3|  4|
+---+---+
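For comparison, the same rename can be done in plain PySpark with a single select and aliases. This is a minimal sketch, not necessarily the accepted answer's exact code; the mapping dict and the plain_df name are illustrative, and it assumes the same df as above:

from pyspark.sql import functions as F

# Rename every column in one select; columns missing from the mapping
# keep their original name via mapping.get(c, c).
mapping = {'x1': 'x3', 'x2': 'x4'}
plain_df = df.select([F.col(c).alias(mapping.get(c, c)) for c in df.columns])
plain_df.show()

Either way, all the renames collapse into a single Project node in the query plan.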

Let's look at the logical plans printed by actual_df.explain(True) and verify that they are efficient:

== Parsed Logical Plan ==
'Project ['x1 AS x3#52, 'x2 AS x4#53]
+- LogicalRDD [x1#48L, x2#49L], false

== Analyzed Logical Plan ==
x3: bigint, x4: bigint
Project [x1#48L AS x3#52L, x2#49L AS x4#53L]
+- LogicalRDD [x1#48L, x2#49L], false

== Optimized Logical Plan ==
Project [x1#48L AS x3#52L, x2#49L AS x4#53L]
+- LogicalRDD [x1#48L, x2#49L], false

== Physical Plan ==
*(1) Project [x1#48L AS x3#52L, x2#49L AS x4#53L]

The parsed logical plan and the physical plan are essentially identical, so Catalyst does not have to do any heavy lifting to optimize the plan. Calling withColumnRenamed multiple times should be avoided, because it creates an inefficient parsed plan that Catalyst then has to optimize. Let's look at the needlessly complex parsed plan such chaining produces, sketched below.
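The source is cut off at this point, so the chained example never appears; below is a minimal sketch of the anti-pattern it describes, assuming the same df as above (chained_df is an illustrative name):

# Anti-pattern: each withColumnRenamed call wraps the plan in another
# Project node, so the parsed plan grows with every rename.
chained_df = (
    df
    .withColumnRenamed('x1', 'x3')
    .withColumnRenamed('x2', 'x4')
)
chained_df.explain(True)

In the parsed plan, explain(True) shows one Project per rename, which Catalyst must then collapse into a single Project during optimization. The quinn helper and the single-select approach avoid that wasted work by emitting one Project up front.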


