Apache Spark: Renaming Multiple Columns with withColumnRenamed
The accepted answer by zero323 is efficient. Most of the other answers should be avoided.

Here is another efficient solution that uses the quinn library and is well suited to production codebases:

```python
import quinn

df = spark.createDataFrame([(1, 2), (3, 4)], ['x1', 'x2'])

def rename_col(s):
    mapping = {'x1': 'x3', 'x2': 'x4'}
    return mapping[s]

actual_df = df.transform(quinn.with_columns_renamed(rename_col))
actual_df.show()
```

Here is the resulting DataFrame:

```
+---+---+
| x3| x4|
+---+---+
|  1|  2|
|  3|  4|
+---+---+
```

Let's look at the logical plans printed by `actual_df.explain(True)` and verify that they are efficient:

```
== Parsed Logical Plan ==
'Project ['x1 AS x3#52, 'x2 AS x4#53]
+- LogicalRDD [x1#48L, x2#49L], false

== Analyzed Logical Plan ==
x3: bigint, x4: bigint
Project [x1#48L AS x3#52L, x2#49L AS x4#53L]
+- LogicalRDD [x1#48L, x2#49L], false

== Optimized Logical Plan ==
Project [x1#48L AS x3#52L, x2#49L AS x4#53L]
+- LogicalRDD [x1#48L, x2#49L], false

== Physical Plan ==
*(1) Project [x1#48L AS x3#52L, x2#49L AS x4#53L]
```

The parsed logical plan and the physical plan are essentially identical, so Catalyst does not have to do any heavy lifting to optimize the plan. Avoid calling withColumnRenamed multiple times, because it creates an inefficient parsed plan that then needs to be optimized.