Reading Hive data remotely from a Spark program in local IDEA


2024-01-07 03:23 | Source: compiled from the web

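Problem description

The data lives on a Linux server, while the Spark program is written in IDEA on a local Windows machine and needs to access the Hive data remotely. The steps that worked come first, followed by the problems encountered during configuration and how they were solved.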

Steps

Make sure the metastore service (hive --service metastore) is running on the Linux server and that hive-cli can read the data normally.
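Before touching any Spark configuration, it can help to confirm from the Windows machine that the metastore port is reachable at all. The sketch below is only a plain TCP check, not a Thrift handshake. The URI thrift://hbase:9083 is the one used in the hive-site.xml of this article, while the object and method names are hypothetical helpers introduced for illustration.

```scala
import java.net.{InetSocketAddress, Socket, URI}
import scala.util.Try

object MetastoreCheck {
  /** Splits a metastore URI such as thrift://hbase:9083 into (host, port). */
  def hostAndPort(uri: String): (String, Int) = {
    val parsed = new URI(uri)
    (parsed.getHost, parsed.getPort)
  }

  /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
  def isReachable(host: String, port: Int, timeoutMs: Int = 3000): Boolean =
    Try {
      val socket = new Socket()
      try socket.connect(new InetSocketAddress(host, port), timeoutMs)
      finally socket.close()
    }.isSuccess

  def main(args: Array[String]): Unit = {
    // Assumption: the host name "hbase" resolves from the local machine
    // (for example through an entry in the Windows hosts file).
    val (host, port) = hostAndPort("thrift://hbase:9083")
    println(s"metastore $host:$port reachable: ${isReachable(host, port)}")
  }
}
```

If this prints false, the likely causes are that the metastore service is not running, a firewall is blocking the port, or the host name does not resolve locally.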

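1 Download winutils

From the winutils collection on GitHub, which bundles many versions, download the one that matches the Hadoop version on your server.

Configure HADOOP_HOME

Point the HADOOP_HOME system environment variable at the downloaded files. Restarting the system after the change is recommended.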
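A quick way to see whether the environment variable took effect is to check it from the JVM that IDEA starts. The sketch below assumes the usual layout in which winutils.exe sits in a bin directory under the Hadoop home. It looks at the hadoop.home.dir system property first and then at HADOOP_HOME, which is the order Hadoop itself uses. The object and method names are hypothetical.

```scala
import java.io.File

object HadoopHomeCheck {
  /** Location where winutils.exe is expected under a given Hadoop home. */
  def winutilsFile(hadoopHome: String): File =
    new File(new File(hadoopHome, "bin"), "winutils.exe")

  /** Resolves the Hadoop home from hadoop.home.dir, then from HADOOP_HOME. */
  def resolveHadoopHome(): Option[String] =
    sys.props.get("hadoop.home.dir").orElse(sys.env.get("HADOOP_HOME"))

  def main(args: Array[String]): Unit =
    resolveHadoopHome() match {
      case Some(home) if winutilsFile(home).isFile =>
        println(s"winutils found at ${winutilsFile(home)}")
      case Some(home) =>
        println(s"Hadoop home is $home but bin/winutils.exe is missing")
      case None =>
        println("Neither hadoop.home.dir nor HADOOP_HOME is set")
    }
}
```

If a restart is inconvenient, calling System.setProperty("hadoop.home.dir", ...) with the download location before the SparkSession is created should have the same effect for that one program.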

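2 Add the hive-site.xml file

Download hive-site.xml from the server and add it to the src/main/resources directory.

hive-site.xml

There are several points to watch here: hive.metastore.uris, hive.metastore.warehouse.dir and hive.exec.scratchdir must be configured correctly, since several of the problems described later trace back to them.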
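Since these three properties cause most of the trouble, a small check that the file is really on the classpath and that the properties are present can save some debugging. The sketch below uses only the XML parser shipped with the JDK and assumes the standard Hadoop configuration layout (property elements containing name and value). The object and method names are hypothetical.

```scala
import java.io.InputStream
import javax.xml.parsers.DocumentBuilderFactory

object HiveSiteCheck {
  // The three properties singled out in the text above.
  val requiredKeys: Seq[String] =
    Seq("hive.metastore.uris", "hive.metastore.warehouse.dir", "hive.exec.scratchdir")

  /** Parses a Hadoop-style configuration XML into a name -> value map. */
  def parse(in: InputStream): Map[String, String] = {
    val doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in)
    val props = doc.getElementsByTagName("property")
    (0 until props.getLength).flatMap { i =>
      val p = props.item(i).asInstanceOf[org.w3c.dom.Element]
      val names = p.getElementsByTagName("name")
      val values = p.getElementsByTagName("value")
      if (names.getLength > 0 && values.getLength > 0)
        Some(names.item(0).getTextContent.trim -> values.item(0).getTextContent.trim)
      else None
    }.toMap
  }

  /** Returns the required keys that are absent or empty. */
  def missingKeys(conf: Map[String, String]): Seq[String] =
    requiredKeys.filterNot(k => conf.get(k).exists(_.nonEmpty))

  def main(args: Array[String]): Unit = {
    // src/main/resources ends up on the classpath, so the file is loaded from there.
    val in = getClass.getClassLoader.getResourceAsStream("hive-site.xml")
    if (in == null) println("hive-site.xml is not on the classpath")
    else {
      val conf = try parse(in) finally in.close()
      requiredKeys.foreach(k => println(s"$k = ${conf.getOrElse(k, "<missing>")}"))
    }
  }
}
```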

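    <configuration>
      <property><name>javax.jdo.option.ConnectionURL</name><value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false&amp;allowPublicKeyRetrieval=true&amp;serverTimezone=GMT%2B8</value></property>
      <property><name>javax.jdo.option.ConnectionDriverName</name><value>com.mysql.cj.jdbc.Driver</value></property>
      <property><name>javax.jdo.option.ConnectionUserName</name><value>root</value></property>
      <property><name>javax.jdo.option.ConnectionPassword</name><value>password</value></property>
      <property><name>hive.metastore.uris</name><value>thrift://hbase:9083</value></property>
      <property><name>hive.metastore.warehouse.dir</name><value>/user/hive/warehouse</value></property>
      <property><name>hive.exec.scratchdir</name><value>/tmp</value></property>
      <property><name>hive.server2.thrift.bind.host</name><value>hbase</value></property>
      <property><name>hive.server2.thrift.port</name><value>10000</value></property>
      <property><name>hive.server2.enable.doAs</name><value>false</value></property>
      <property><name>hive.metastore.schema.verification</name><value>false</value></property>
      <property><name>hive.metastore.event.db.notification.api.auth</name><value>false</value></property>
      <property><name>mapreduce.jobtracker.address</name><value>ignorethis</value></property>
      <property><name>hive.exec.show.job.failure.debug.info</name><value>false</value></property>
    </configuration>

3 Spark program config

    import org.apache.spark.sql.SparkSession

    def readHive(args: Array[String]*): Unit = {
      // Act as a user that has write permission on HDFS
      System.setProperty("HADOOP_USER_NAME", "root")
      val spark2: SparkSession = SparkSession.builder()
        .master("local[*]")
        .appName("sparkReadHive")
        // Enable reading Hive data
        .enableHiveSupport()
        .getOrCreate()
      spark2.sql("show databases")
      spark2.sql("use weblog")
      // spark2.sql("show tables").show()
      val frame = spark2.sql("select * from mlog limit 100")
      // Print the first rows of the result
      frame.show()
    }

Read result: [screenshot not included]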

Problems encountered and their fixes

Remaining issue: one WARN

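With the steps above the program can read the Hive data, but it still reports a warning that Windows cannot ... [the screenshot of the warning and the link to further details are missing from the source].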

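Issues with the temporary directory (tmp)

Set it in hive-site.xml. By default the temporary directory on HDFS is used.

    <property><name>hive.exec.scratchdir</name><value>/tmp</value></property>

Permissions

Operate as a user that has write permission by adding this line to the Spark program: System.setProperty("HADOOP_USER_NAME", "root")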

spark.sql.warehouse.dir

    INFO SharedState: spark.sql.warehouse.dir is not set, but hive.metastore.warehouse.dir is set. Setting spark.sql.warehouse.dir to the value of hive.metastore.warehouse.dir ('/user/hive/warehouse').
    21/11/21 11:28:04 INFO SharedState: Warehouse path is '/user/hive/warehouse'.

As long as hive.metastore.warehouse.dir is set in hive-site.xml, it is used in place of spark.sql.warehouse.dir. Alternatively, set spark.sql.warehouse.dir in code to use a local warehouse.
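Setting the warehouse directory in code could look roughly like the sketch below. It relies only on the standard SparkSession builder API. The local path is a hypothetical placeholder, and the object and method names are introduced for illustration. The spark-sql and spark-hive modules are assumed to be on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object WarehouseDirExample {
  // Hypothetical local path; any writable directory should do.
  val localWarehouse: String = "file:///C:/tmp/spark-warehouse"

  /** Builds a Hive-enabled session with an explicit warehouse directory. */
  def buildSession(warehouseDir: String): SparkSession =
    SparkSession.builder()
      .master("local[*]")
      .appName("sparkReadHive")
      // Takes precedence over hive.metastore.warehouse.dir from hive-site.xml
      .config("spark.sql.warehouse.dir", warehouseDir)
      .enableHiveSupport()
      .getOrCreate()

  def main(args: Array[String]): Unit = {
    val spark = buildSession(localWarehouse)
    println(spark.conf.get("spark.sql.warehouse.dir"))
    spark.stop()
  }
}
```

The warehouse directory only determines where newly created databases and managed tables are placed. Existing tables such as mlog keep the location recorded in the metastore.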

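JDBC approach

Another option is to access HiveServer2 over JDBC. In that case all queries run on the server side and the client only receives the query results.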
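A minimal sketch of the JDBC route, using only java.sql from the JDK, might look as follows. The host and port come from hive.server2.thrift.bind.host and hive.server2.thrift.port in the hive-site.xml of this article, and the database and table names from the Spark example. The empty password is an assumption, and the Hive JDBC driver (the hive-jdbc artifact) must be on the classpath at run time. The object and method names are hypothetical.

```scala
import java.sql.{Connection, DriverManager}

object HiveJdbcExample {
  /** Builds a HiveServer2 JDBC URL of the form jdbc:hive2://host:port/database. */
  def jdbcUrl(host: String, port: Int, database: String): String =
    s"jdbc:hive2://$host:$port/$database"

  def main(args: Array[String]): Unit = {
    val url = jdbcUrl("hbase", 10000, "weblog")
    // Assumption: user root with an empty password is accepted by the server.
    val conn: Connection = DriverManager.getConnection(url, "root", "")
    try {
      val stmt = conn.createStatement()
      // The query runs on the server; only the result rows travel to the client.
      val rs = stmt.executeQuery("select * from mlog limit 100")
      val columns = rs.getMetaData.getColumnCount
      while (rs.next()) {
        println((1 to columns).map(i => rs.getString(i)).mkString("\t"))
      }
    } finally conn.close()
  }
}
```

Unlike the metastore approach, this does not need winutils or hive-site.xml on the client, but HiveServer2 must be running on the server.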

