Spark Cannot See Columns After Modifying Hive Table Fields


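Approach 1:

Prerequisite: you have access to the metastore database.

Reproducing the problem (example)

Whenever I change the storage format or add a column, I habitually use alter table add … to do it.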

[Figure: the original table]

ALTER TABLE test ADD COLUMNS (weight STRING) CASCADE;

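After adding the weight column (CASCADE is used here to sync the change to the Hive metastore), I used Spark to insert data containing a weight column into the table, and it threw this exception: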

Exception in thread "main" org.apache.spark.sql.AnalysisException: The column number of the existing table default.test(struct) doesn't match the data schema(struct);

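The message says the columns don't match, so I checked the metastore and found that it really had no weight column. Had CASCADE failed? Unlikely. Steps to check the metadata:

use hivemetastore;
-- find this table's TBL_ID
select * from TBLS;
-- then look up the table's schema in TABLE_PARAMS
select * from TABLE_PARAMS where TBL_ID = '691';

The schema had indeed not changed.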
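The schema that matters here is the JSON copy Spark keeps in the table's TABLE_PARAMS rows. A minimal Python sketch for checking it, assuming the PARAM_KEY/PARAM_VALUE pairs for the table's TBL_ID have already been fetched into a dict; the key names spark.sql.sources.schema.numParts and spark.sql.sources.schema.part.N are what recent Spark versions write, and the sample values are hypothetical:

```python
import json

PREFIX = "spark.sql.sources.schema"


def spark_schema_from_params(params):
    """Rebuild the schema JSON that Spark stored in TABLE_PARAMS.

    params: dict of PARAM_KEY -> PARAM_VALUE for one TBL_ID (fetched beforehand).
    Spark splits long schema strings into PREFIX.part.0, PREFIX.part.1, ...
    and records the count in PREFIX.numParts. A single PREFIX key is also
    accepted as a fallback for older layouts (assumption).
    """
    if f"{PREFIX}.numParts" in params:
        n = int(params[f"{PREFIX}.numParts"])
        raw = "".join(params[f"{PREFIX}.part.{i}"] for i in range(n))
    else:
        raw = params[PREFIX]
    return json.loads(raw)


def field_names(schema):
    """Column names in the order Spark will use them."""
    return [f["name"] for f in schema["fields"]]


# Hypothetical rows for the `test` table after Hive's ALTER TABLE ... ADD COLUMNS
params = {
    f"{PREFIX}.numParts": "1",
    f"{PREFIX}.part.0": json.dumps({
        "type": "struct",
        "fields": [
            {"name": "id", "type": "string", "nullable": True, "metadata": {}},
            {"name": "name", "type": "string", "nullable": True, "metadata": {}},
        ],
    }),
}

names = field_names(spark_schema_from_params(params))
print(names)              # ['id', 'name']
print("weight" in names)  # False
```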


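So I searched online and found the cause in Spark community issue 21841.

Cause and fix

The cause is that our storage format is Parquet.

With Parquet, Spark reads the schema directly from the metastore entry shown above. For this format, however, an ALTER issued in Hive does not modify that information, so the update never reaches it and Spark still sees the old schema.

Fix: the crude fix is to UPDATE this row in the metastore, but it is time-consuming. Alternatively, switch away from Parquet to a Hive-native format, which does stay in sync. We store our data in Parquet and cannot easily change that, so for now our only option is updating the metastore. Painful. If you know a better way, please leave a comment.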
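The UPDATE route boils down to rewriting those schema properties. A minimal Python sketch of the string manipulation only, under these assumptions: the table's current TABLE_PARAMS entries are already in a dict, Spark's default split length of 4000 characters (spark.sql.sources.schemaStringLengthThreshold) applies, and partition columns sit at the end of the stored schema, counted by spark.sql.sources.schema.numPartCols. Writing the result back (UPDATE for existing keys, INSERT for any new part.N key) and backing up the metastore first are left to you; the function name and sample table are hypothetical.

```python
import json

PREFIX = "spark.sql.sources.schema"


def add_column_to_spark_schema(params, name, data_type, threshold=4000):
    """Return replacement spark.sql.sources.schema.* entries with one new column.

    params: current PARAM_KEY -> PARAM_VALUE dict for the table (one TBL_ID).
    The new column goes before any partition columns, mirroring Hive's
    ADD COLUMNS. Only the schema keys are returned; write them back yourself.
    """
    if f"{PREFIX}.numParts" in params:
        n = int(params[f"{PREFIX}.numParts"])
        raw = "".join(params[f"{PREFIX}.part.{i}"] for i in range(n))
    else:
        raw = params[PREFIX]
    schema = json.loads(raw)

    # Partition columns are assumed to be the last numPartCols fields.
    num_part_cols = int(params.get(f"{PREFIX}.numPartCols", "0"))
    pos = len(schema["fields"]) - num_part_cols
    schema["fields"].insert(
        pos, {"name": name, "type": data_type, "nullable": True, "metadata": {}}
    )

    # Re-serialize compactly and split into chunks of at most `threshold` chars.
    new_raw = json.dumps(schema, separators=(",", ":"))
    chunks = [new_raw[i:i + threshold] for i in range(0, len(new_raw), threshold)]
    updated = {f"{PREFIX}.numParts": str(len(chunks))}
    for i, chunk in enumerate(chunks):
        updated[f"{PREFIX}.part.{i}"] = chunk
    return updated


def _field(name):
    return {"name": name, "type": "string", "nullable": True, "metadata": {}}


# Hypothetical table partitioned by `dt`: data columns id, name; partition column dt
params = {
    f"{PREFIX}.numParts": "1",
    f"{PREFIX}.part.0": json.dumps(
        {"type": "struct", "fields": [_field("id"), _field("name"), _field("dt")]}
    ),
    f"{PREFIX}.numPartCols": "1",
    f"{PREFIX}.partCol.0": "dt",
}

updated = add_column_to_spark_schema(params, "weight", "string")
schema = json.loads(updated[f"{PREFIX}.part.0"])
print([f["name"] for f in schema["fields"]])  # ['id', 'name', 'weight', 'dt']
```

Inserting before the partition columns rather than appending keeps the stored schema in the data-columns-then-partition-columns order that Hive's own ADD COLUMNS produces.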

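Approach 2:

Applies to external tables: dropping the table leaves the data in place.

1. Copy the original table's CREATE TABLE statement.

2. Change the column name in it.

3. Create the table with the new statement.

4. Re-add the partitions with a script: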
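Step 4 is needed because dropping the external table also dropped its partition metadata, even though the partition directories remain in storage. A minimal Python sketch of the same date loop, assuming the daily tel_date layout used in the shell script; it only generates the HiveQL, and the function name, database, table and location values are hypothetical placeholders:

```python
from datetime import date, timedelta


def add_partition_script(db, table, location_prefix, start, end):
    """Build one HiveQL script that re-adds a tel_date partition for every day in [start, end]."""
    lines = [f"use {db};"]
    day = start
    while day <= end:  # inclusive, like the shell loop that stops after end_date
        d = day.isoformat()  # YYYY-MM-DD, same as date +"%Y-%m-%d"
        lines.append(
            f"alter table {table} add partition(tel_date='{d}') "
            f"location '{location_prefix}/TEL_DATE={d}';"
        )
        day += timedelta(days=1)
    return "\n".join(lines)


# Hypothetical database, table and storage prefix
script = add_partition_script(
    "my_db", "my_table", "hdfs:///hypothetical/features/tel",
    date(2021, 1, 30), date(2021, 2, 1),
)
print(script)
```

Saving the result to a file and running it once with hive -f starts Hive a single time, instead of once per day as the hive -e loop does.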

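#!/bin/bash
# Usage: bash <this-script> START_DATE END_DATE   (dates as YYYY-MM-DD; requires GNU date)
start_date=$1
end_date=$2
# Loop from start_date through end_date inclusive, re-adding one partition per day
while [ "${start_date}" != "$(date -d "${end_date} +1 day" +"%Y-%m-%d")" ]
do
  hive -e "use XXX;alter table XXX add partition(tel_date='${start_date}') location 'wasbs://XXXX.blob.core.XXXX.cn/XXXX/features/tel/XXXX/TEL_DATE=${start_date}';"
  start_date=$(date -d "${start_date} +1 day" +"%Y-%m-%d")
done

Approach 3: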

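Applies to adding columns.

1. Add the column from the Hive CLI (/usr/bin/hive).

2. Add the column again under Hive on Spark, for example through beeline with this alias: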

alias shive='beeline -u '\''jdbc:hive2://hostname:10002/;transportMode=http'\'''

Approach 4:

This is also the most practical approach I have found so far:

alter table table_name add columns(location_id string) cascade|restrict;

alter table table_name change column col_name col_name string cascade|restrict;

alter table table_name replace columns(id string,amt string,name string,name2 string) cascade; -- this form must list every column

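An easy way to remember it: CASCADE means "cascading", i.e. the change updates not only the table structure (metadata) used for new partitions, but also the structure of existing partitions.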
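To make the difference concrete, here is a toy Python model of the metadata bookkeeping described above. It is not Hive's implementation; the class and partition names are hypothetical. Each partition keeps its own column list, copied from the table when the partition is created.

```python
from dataclasses import dataclass, field


@dataclass
class ToyHiveTable:
    """Toy model: table-level columns plus a separate column list per partition."""
    columns: list
    partitions: dict = field(default_factory=dict)  # partition spec -> column list

    def add_partition(self, spec):
        # A new partition takes a copy of the table's current columns.
        self.partitions[spec] = list(self.columns)

    def add_columns(self, new_cols, cascade=False):
        # RESTRICT (the default) only touches the table-level columns...
        self.columns = self.columns + list(new_cols)
        # ...CASCADE also rewrites every existing partition's columns.
        if cascade:
            for spec, cols in self.partitions.items():
                self.partitions[spec] = cols + list(new_cols)


restrict = ToyHiveTable(["id", "name"])
restrict.add_partition("dt=2021-01-01")
restrict.add_columns(["weight"])  # RESTRICT
restrict.add_partition("dt=2021-01-02")
print(restrict.partitions)
# {'dt=2021-01-01': ['id', 'name'], 'dt=2021-01-02': ['id', 'name', 'weight']}

cascade = ToyHiveTable(["id", "name"])
cascade.add_partition("dt=2021-01-01")
cascade.add_columns(["weight"], cascade=True)
print(cascade.partitions)
# {'dt=2021-01-01': ['id', 'name', 'weight']}
```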

The CASCADE|RESTRICT clause is available in Hive 1.1.0. ALTER TABLE CHANGE COLUMN with CASCADE command changes the columns of a table’s metadata, and cascades the same change to all the partition metadata. RESTRICT is the default, limiting column change only to table metadata.

ALTER TABLE CHANGE COLUMN CASCADE clause will override the table partition’s column metadata regardless of the table or partition’s protection mode. Use with discretion.

The column change command will only modify Hive’s metadata, and will not modify data. Users should make sure the actual data layout of the table/partition conforms with the metadata definition.

Appendix: official documentation

ADD COLUMNS lets you add new columns to the end of the existing columns but before the partition columns. This is supported for Avro backed tables as well, for Hive 0.14 and later.

REPLACE COLUMNS removes all existing columns and adds the new set of columns. This can be done only for tables with a native SerDe (DynamicSerDe, MetadataTypedColumnsetSerDe, LazySimpleSerDe and ColumnarSerDe). Refer to Hive SerDe for more information. REPLACE COLUMNS can also be used to drop columns. For example, "ALTER TABLE test_change REPLACE COLUMNS (a int, b int);" will remove column 'c' from test_change's schema.

The PARTITION clause is available in Hive 0.14.0 and later; see Upgrading Pre-Hive 0.13.0 Decimal Columns for usage.

The CASCADE|RESTRICT clause is available in Hive 1.1.0. ALTER TABLE ADD|REPLACE COLUMNS with CASCADE command changes the columns of a table's metadata, and cascades the same change to all the partition metadata. RESTRICT is the default, limiting column changes only to table metadata.

Add/Replace Columns

ALTER TABLE table_name
  [PARTITION partition_spec]                 -- (Note: Hive 0.14.0 and later)
  ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...)
  [CASCADE|RESTRICT]                         -- (Note: Hive 1.1.0 and later)
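Because REPLACE COLUMNS replaces the whole column list, a rename done this way has to repeat every other column, as the comment in Approach 4 warns. A minimal Python sketch that builds such a statement from a known column list; the function name and the new column name are hypothetical, and it assumes the list holds only the non-partition columns, in table order.

```python
def replace_columns_rename(table, columns, old, new, cascade=True):
    """Build a REPLACE COLUMNS statement that renames `old` to `new`.

    columns: (name, type) pairs for all non-partition columns, in table order.
    REPLACE COLUMNS drops whatever is not listed, so every column is repeated.
    """
    names = [name for name, _ in columns]
    if old not in names:
        raise ValueError(f"column {old!r} not found in {names}")
    cols = ", ".join(f"{new if name == old else name} {typ}" for name, typ in columns)
    clause = "cascade" if cascade else "restrict"
    return f"alter table {table} replace columns({cols}) {clause};"


# Column list from the example in Approach 4; the new name is hypothetical
cols = [("id", "string"), ("amt", "string"), ("name", "string"), ("name2", "string")]
print(replace_columns_rename("table_name", cols, "name2", "nickname"))
# alter table table_name replace columns(id string, amt string, name string, nickname string) cascade;
```

Raising on an unknown column name is deliberate: silently emitting the unchanged list would be harmless, but a typo in `old` is almost always a mistake worth surfacing before the statement reaches Hive.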
