
Big Data Course, Day 4

Hadoop configuration files

core   # base/common configuration: 1. the NameNode entry point 2. the temp directory

hdfs   # HDFS-related configuration: 1. permissions 2. replication 3. HA (high availability)

mapred # MapReduce-related configuration

yarn   # YARN-related configuration

# The underlying config files; they hold the default values, which you override as needed

core-default.xml

hdfs-default.xml

mapred-default.xml

yarn-default.xml

# Site-specific config files, under HADOOP_HOME/etc/hadoop

core-site.xml

hdfs-site.xml

mapred-site.xml

yarn-site.xml

# In code, set key by key: poor maintainability, highest priority

Configuration configuration = new Configuration();

configuration.set("fs.default.name","hdfs://hadoop:8020");

configuration.set("key","value");

.....

FileSystem fileSystem = FileSystem.get(configuration);
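The override behavior above can be sketched with a plain-Java stand-in (an illustration only, not Hadoop's actual Configuration class): it is a key/value store where the most recent set() for a key wins, which is why code-level values override file-loaded defaults.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal stand-in for the idea behind Hadoop's Configuration:
// the last set() for a key wins, so code-level settings override
// values loaded earlier (e.g. from default or site files).
public class ConfigSketch {
    private final Map<String, String> props = new HashMap<>();

    public void set(String key, String value) {
        props.put(key, value);           // later writes overwrite earlier ones
    }

    public String get(String key, String defaultValue) {
        return props.getOrDefault(key, defaultValue);
    }

    public static void main(String[] args) {
        ConfigSketch conf = new ConfigSketch();
        conf.set("fs.default.name", "hdfs://old:8020");    // as if loaded from a file
        conf.set("fs.default.name", "hdfs://hadoop:8020"); // code-level override wins
        System.out.println(conf.get("fs.default.name", "")); // hdfs://hadoop:8020
    }
}
```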

# In code, loading resource files: good maintainability, lower priority

Configuration configuration = new Configuration();

configuration.addResource("core-site.xml");

configuration.addResource("hdfs-site.xml");

configuration.addResource("mapred-site.xml");

configuration.addResource("yarn-site.xml");

FileSystem fileSystem = FileSystem.get(configuration);

# Hadoop shell commands can also pass configuration directly

# Test

bin/hdfs dfs -D fs.defaultFS=xxxx -ls /    # generic options like -D go before the command arguments

MapReduce Programming

MapReduce is a computing platform and framework built on top of HDFS.

How MapReduce works:

Set up the YARN cluster. The NameNode and ResourceManager must not be placed on the same node  # make sure the resourcemanager and namenode are on different nodes; edit yarn-site.xml accordingly

# Start YARN; the start command must be run on the machine hosting the ResourceManager

sbin/start-yarn.sh

Homework: on top of the HA HDFS cluster, build an HA YARN cluster.

The five core steps of MapReduce: InputFormat (split), Map, Shuffle, Reduce, OutputFormat

The classic MR example, WordCount: analysis of the approach
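The WordCount data flow can be sketched in plain Java without Hadoop (the class and method names here are illustrative, not Hadoop APIs): the map phase splits each tab-separated line and emits (word, 1) pairs, the shuffle groups pairs by word, and the reduce phase sums each group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of WordCount: map -> shuffle (group by key) -> reduce.
public class WordCountSketch {

    // Map phase: split each line on tab and emit (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.split("\t")) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle + reduce: group the pairs by word, then sum the 1s in each group.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("aaa\tbbb", "aaa\tbbb");
        System.out.println(reduce(map(lines)));  // {aaa=2, bbb=2}
    }
}
```

In the real job the grouping and sorting in the middle is the shuffle, done by the framework; only the two ends (map and reduce) are user code.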

MapReduce code

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.5.2</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.5.2</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.5.2</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>2.5.2</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-yarn-common</artifactId>
    <version>2.5.2</version>
</dependency>

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class TestMapReduce {

    /**
     * k1 LongWritable  (byte offset of the line)
     * v1 Text          (the line itself)
     *
     * k2 Text          (a word)
     * v2 IntWritable   (the count 1)
     */
    public static class MyMap extends Mapper<LongWritable, Text, Text, IntWritable> {

        Text k2 = new Text();
        IntWritable v2 = new IntWritable();

        /**
         * k1 key   e.g. 0
         * v1 value e.g. "suns xiaohei"
         */
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            String[] words = line.split("\t");
            for (String word : words) {
                k2.set(word);
                v2.set(1);
                context.write(k2, v2);
            }
        }
    }

    public static class MyReduce extends Reducer<Text, IntWritable, Text, IntWritable> {

        Text k3 = new Text();
        IntWritable v3 = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int result = 0;
            for (IntWritable value : values) {
                result += value.get();
            }
            k3.set(key);
            v3.set(result);
            context.write(k3, v3);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance();
        job.setJarByClass(TestMapReduce.class);
        job.setJobName("first");

        // inputFormat
        TextInputFormat.addInputPath(job, new Path("/test"));

        // map
        job.setMapperClass(MyMap.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // shuffle happens automatically

        // reduce
        job.setReducerClass(MyReduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // outputFormat
        TextOutputFormat.setOutputPath(job, new Path("/dest1"));

        job.waitForCompletion(true);
    }
}

Deploying MapReduce

Note: run the yarn command from the Hadoop installation directory (i.e. as bin/yarn).

① The most direct method

Package with Maven directly, then scp the jar to the server.

bin/yarn jar hadoop-mapreduce.jar              # run the job

bin/hdfs dfs -text /dest1/part-r-00000         # view the result

Bytes Written=38
[root@hadoop hadoop-2.5.2]# bin/hdfs dfs -text /dest1/part-r-00000
19/01/24 09:40:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
aaa	2
bbb	2
jjj	1
kkkk	1
lhc	1
ssss	1

② One-click package and upload with Maven

In IDEA: File -> Settings -> Plugins, search for Maven Helper, install it, and restart IDEA.

Configure pom.xml as follows:

<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <maven.compiler.source>1.7</maven.compiler.source>
    <maven.compiler.target>1.7</maven.compiler.target>
    <baizhi-mainClass>com.baizhi.TestMapReduce</baizhi-mainClass>
    <target-host>192.168.194.147</target-host>
    <target-position>/opt/install/hadoop-2.5.2</target-position>
</properties>

...

<build>
    <extensions>
        <extension>
            <groupId>org.apache.maven.wagon</groupId>
            <artifactId>wagon-ssh</artifactId>
            <version>2.8</version>
        </extension>
    </extensions>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-jar-plugin</artifactId>
            <version>2.3.2</version>
            <configuration>
                <outputDirectory>${basedir}</outputDirectory>
                <archive>
                    <manifest>
                        <mainClass>${baizhi-mainClass}</mainClass>
                    </manifest>
                </archive>
            </configuration>
        </plugin>
        <plugin>
            <groupId>org.codehaus.mojo</groupId>
            <artifactId>wagon-maven-plugin</artifactId>
            <version>1.0</version>
            <configuration>
                <fromFile>${project.build.finalName}.jar</fromFile>
                <url>scp://root:123456@${target-host}${target-position}</url>
            </configuration>
        </plugin>
    </plugins>
</build>

With that in place, open the Maven panel: double-click jar:jar to package, then click wagon:upload to upload.

But how do we do both of those steps in one click?

This is where the Maven Helper plugin installed above comes in. Right-click pom.xml, then:

Run Maven -> New Goal, enter jar:jar wagon:upload, and click OK. Packaging and upload now run with one click.

③ One-click package, upload, and run with Maven

Building on ②, add commands (run commands) to the wagon plugin, as follows:

<plugin>
    <groupId>org.codehaus.mojo</groupId>
    <artifactId>wagon-maven-plugin</artifactId>
    <version>1.0</version>
    <configuration>
        <fromFile>${project.build.finalName}.jar</fromFile>
        <url>scp://root:123456@${target-host}${target-position}</url>
        <commands>
            <command>pkill -f ${project.build.finalName}.jar</command>
            <command>nohup /opt/install/hadoop-2.5.2/bin/yarn jar /opt/install/hadoop-2.5.2/${project.build.finalName}.jar > /root/nohup.out 2>&1 &</command>
        </commands>
        <displayCommandOutputs>true</displayCommandOutputs>
    </configuration>
</plugin>

Then add a new Goal in Maven Helper:

jar:jar wagon:upload-single wagon:sshexec

Before running, remember to compile first, so that the project's target directory holds the freshly compiled classes.

Check the nohup.out file on the resourcemanager node to confirm the job ran successfully.

ResourceManager High Availability (HA)

①. Add the following to yarn-site.xml

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>lhc</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>hadoop1</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>hadoop2</value>
    </property>
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>hadoop:2181,hadoop1:2181,hadoop2:2181</value>
    </property>
</configuration>

②. On hadoop1 and hadoop2, run sbin/start-yarn.sh from the Hadoop installation directory to start the ResourceManagers.

③. Run jps to check the processes; the ResourceManager started normally:

[root@hadoop1 hadoop-2.5.2]# jps

4552 NameNode

4762 DFSZKFailoverController

4610 DataNode

5822 ResourceManager

6251 Jps

4472 JournalNode

4426 QuorumPeerMain

④. Run bin/yarn rmadmin -getServiceState rm1 and bin/yarn rmadmin -getServiceState rm2

to check the ResourceManager state on each node: one is active, the other standby.

[root@hadoop1 hadoop-2.5.2]# bin/yarn rmadmin -getServiceState rm1

19/01/24 11:56:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

active

[root@hadoop1 hadoop-2.5.2]# bin/yarn rmadmin -getServiceState rm2

19/01/24 11:58:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

standby

⑤ Shut down the ResourceManager on rm1, then run bin/yarn rmadmin -getServiceState rm2 again.

Result: rm2 is now active, which demonstrates automatic ResourceManager failover.
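The failover behavior can be sketched as a toy active/standby election (a simulation only; the real mechanism is YARN's ZooKeeper-based leader election): whichever live RM registered first is active, and when it goes down the next standby is promoted.

```java
import java.util.ArrayList;
import java.util.List;

// Toy active/standby failover: the first live RM in the list is active,
// the rest are standby; stopping the active one promotes the next in line.
public class RmFailoverSketch {
    private final List<String> liveRms = new ArrayList<>();

    void start(String rmId) { liveRms.add(rmId); }

    void stop(String rmId) { liveRms.remove(rmId); }

    String getServiceState(String rmId) {
        if (!liveRms.contains(rmId)) return "down";
        return liveRms.get(0).equals(rmId) ? "active" : "standby";
    }

    public static void main(String[] args) {
        RmFailoverSketch cluster = new RmFailoverSketch();
        cluster.start("rm1");
        cluster.start("rm2");
        System.out.println(cluster.getServiceState("rm1")); // active
        System.out.println(cluster.getServiceState("rm2")); // standby
        cluster.stop("rm1");                                // simulate rm1 going down
        System.out.println(cluster.getServiceState("rm2")); // active (promoted)
    }
}
```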

For details, see this blog post: https://blog.csdn.net/skywalker_only/article/details/41726189
