Spark Big Data Analysis in Action: HDFS File Operations
1. Installing Hadoop and Spark
The detailed installation steps are covered in my earlier posts:

- Linux基础环境搭建(CentOS7)- 安装Hadoop (Basic Linux environment setup on CentOS 7: installing Hadoop)
- Linux基础环境搭建(CentOS7)- 安装Scala和Spark (Basic Linux environment setup on CentOS 7: installing Scala and Spark)

2. Starting Hadoop and Spark

Check the running processes on each of the three nodes, starting with the master. (screenshot omitted)
Shell commands — create a working directory in HDFS:

```shell
[root@master ~]# hadoop fs -mkdir /user/hadoop
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hadoop/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hbase/hbase-1.2.4/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
```

The SLF4J "multiple bindings" warning is harmless here (Hadoop and HBase each ship their own slf4j-log4j12 jar); it is printed by every `hadoop fs` command in this environment and is elided from the transcripts below.

(screenshot omitted)

Shell commands — write a local test file, upload it to HDFS, and view it:

```shell
[root@master ~]# vim /usr/hadoop/test.txt
[root@master ~]# hadoop fs -put /usr/hadoop/test.txt /user/hadoop
[root@master ~]# hadoop fs -cat /user/hadoop/test.txt
I love hadoop and spark!
John Zhuang!
```

(screenshot omitted)
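The test file above is created in vim, so its contents only become visible at the `-cat` step. For reproducibility, here is a sketch of creating the same two-line file non-interactively and checking its line count up front; the `/tmp/test.txt` path is illustrative (the post itself uses `/usr/hadoop/test.txt`):

```shell
# Create the two-line sample file without an editor.
# /tmp/test.txt is an illustrative path; the post uses /usr/hadoop/test.txt.
printf 'I love hadoop and spark!\nJohn Zhuang!\n' > /tmp/test.txt
wc -l < /tmp/test.txt   # 2 — the same number Spark's count() reports later
# The upload step is then identical: hadoop fs -put /tmp/test.txt /user/hadoop
```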
Shell commands — download the file from HDFS to the local filesystem and check it:

```shell
[root@master ~]# mkdir /usr/hadoop/download
[root@master ~]# hadoop fs -get /user/hadoop/test.txt /usr/hadoop/download/
[root@master ~]# cat /usr/hadoop/test.txt
I love hadoop and spark!
John Zhuang!
```

(screenshot omitted)

Shell commands — `hadoop fs -text` also prints the file (decoding compressed or SequenceFile data if necessary):

```shell
[root@master ~]# hadoop fs -text /user/hadoop/test.txt
I love hadoop and spark!
John Zhuang!
```

(screenshot omitted)

Shell commands — create an `input` directory in HDFS and copy the file into it:

```shell
[root@master ~]# hadoop fs -mkdir /user/hadoop/input
[root@master ~]# hadoop fs -cp /user/hadoop/test.txt /user/hadoop/input
```
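As an aside, the `hadoop fs` subcommands used so far (`-mkdir`, `-put`/`-get`, `-cat`, `-cp`) deliberately mirror POSIX file commands, so the whole flow can be rehearsed on the local filesystem first. The `/tmp/hdfs-demo` paths below are illustrative:

```shell
# Local rehearsal of the HDFS flow above (illustrative paths).
mkdir -p /tmp/hdfs-demo/input                      # ~ hadoop fs -mkdir
printf 'I love hadoop and spark!\nJohn Zhuang!\n' > /tmp/hdfs-demo/test.txt
cp /tmp/hdfs-demo/test.txt /tmp/hdfs-demo/input/   # ~ hadoop fs -cp
cat /tmp/hdfs-demo/input/test.txt                  # ~ hadoop fs -cat
```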
(screenshot omitted)

Shell commands — delete the `input` directory recursively, then list what remains:

```shell
[root@master ~]# hadoop fs -rm -r /user/hadoop/input/
21/03/18 17:22:54 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /user/hadoop/input
[root@master ~]# hadoop fs -ls /user/hadoop/
Found 1 items
-rw-r--r--   2 root supergroup         38 2021-03-18 17:14 /user/hadoop/test.txt
```

A deletion interval of 0 minutes means the HDFS trash is disabled, so `-rm -r` removes the directory permanently.

(screenshot omitted)

Shell commands — start spark-shell against the standalone cluster, capping executor memory at 512 MB:

```shell
[root@master spark-2.4.0-bin-hadoop2.7]# spark-shell --master spark://master:7077 --executor-memory 512M
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
```
The shell comes up and reports its context:

```shell
Spark context Web UI available at http://master:4040
Spark context available as 'sc' (master = spark://master:7077, app id = app-20210318230638-0000).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.0
      /_/

Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_171)
Type in expressions to have them evaluated.
Type :help for more information.
```

Count the lines of the local file:

```scala
scala> var file = sc.textFile("file:///usr/hadoop/test.txt")
file: org.apache.spark.rdd.RDD[String] = file:///usr/hadoop/test.txt MapPartitionsRDD[1] at textFile at <console>:24

scala> val length = file.count()
length: Long = 2
```

(screenshot omitted)

Shell commands — start a fresh spark-shell to count the file stored in HDFS:

```shell
[root@master ~]# spark-shell --master spark://master:7077 --executor-memory 512M
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://master:4040
Spark context available as 'sc' (master = spark://master:7077, app id = app-20210318231630-0000).
Spark session available as 'spark'.
(banner as above)
Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_171)
Type in expressions to have them evaluated.
Type :help for more information.
```
A path with no scheme resolves against the default filesystem, which is HDFS here:

```scala
scala> var hdfsfile = sc.textFile("/user/hadoop/test.txt")
hdfsfile: org.apache.spark.rdd.RDD[String] = /user/hadoop/test.txt MapPartitionsRDD[1] at textFile at <console>:24

scala> val length = hdfsfile.count()
length: Long = 2
```

(screenshot omitted)

Tip: if you have not yet set up a Spark project in IDEA, see my post "IDEA使用Maven构建Spark项目": https://blog.csdn.net/weixin_47580081/article/details/115435536

Source code:

```scala
package com.John.Sparkstudy.SparkTest.Test01

import org.apache.spark.{SparkConf, SparkContext}

/**
 * @author John
 * @Date 2021/3/19 7:27
 */
object CountTest01 {
  def main(args: Array[String]): Unit = {
    // Create the SparkContext
    val conf = new SparkConf().setAppName("test_Count")
    val sc = new SparkContext(conf)
    // Load the file from HDFS
    val file = sc.textFile("hdfs:///user/hadoop/test.txt")
    // Count its lines
    val num = file.count()
    // Print the result
    println("File " + file.name + " has " + num + " lines")
  }
}
```

Shell commands — upload the packaged jar (here via `rz`) and submit it to the cluster:

```shell
[root@master jarFile]# rz -E
rz waiting to receive.
[root@master jarFile]# ll
total 8
-rw-r--r--. 1 root root 6394 Mar 19 07:53 original-SparkTest-1.0-SNAPSHOT.jar
[root@master jarFile]# cd /usr/spark/spark-2.4.0-bin-hadoop2.7/
[root@master spark-2.4.0-bin-hadoop2.7]# bin/spark-submit --class com.John.Sparkstudy.SparkTest.Test01.CountTest01 --master spark://master:7077 /usr/testFile/jarFile/original-SparkTest-1.0-SNAPSHOT.jar
```

(screenshot omitted)
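The full `spark-submit` invocation is easy to mistype. One small convenience sketch: assemble the exact command used above into a variable and echo it as a dry run before executing it on the cluster (e.g. with `eval "$SUBMIT"`); the class name and paths are the ones from this post:

```shell
# Assemble the spark-submit command used above and print it as a dry run.
# Execute it on the cluster with: eval "$SUBMIT"
SUBMIT="spark-submit --class com.John.Sparkstudy.SparkTest.Test01.CountTest01 --master spark://master:7077 /usr/testFile/jarFile/original-SparkTest-1.0-SNAPSHOT.jar"
echo "$SUBMIT"
```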