1. Preparing the Hadoop Environment

        Hadoop's batch-processing framework, MapReduce, makes even WordCount somewhat laborious to implement, and in execution efficiency it still lags well behind Flink and Spark. Below, WordCount is implemented with MapReduce.
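
The original section does not spell out the environment setup. As a minimal sketch, assuming a Maven project, the single hadoop-client artifact pulls in everything the code below needs (the version shown is only an example; match it to your cluster):

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>3.3.6</version>
</dependency>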

 2. Writing the Map Class

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    // Reuse the output key/value objects across calls to avoid per-record allocation
    Text k = new Text();
    IntWritable v = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // 1. Read one line of input
        String line = value.toString();
        // 2. Split the line into words
        String[] words = line.split(" ");
        // 3. Emit (word, 1) for each word
        for (String word : words) {
            k.set(word);
            context.write(k, v);
        }
    }
}
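
One caveat: String.split(" ") yields empty tokens when words are separated by consecutive spaces, so blank "words" would be counted. A small sketch of the difference, using the whitespace regex \s+ instead:

String line = "hello  world";               // two spaces between the words
String[] bySpace = line.split(" ");         // ["hello", "", "world"] - note the empty token
String[] byWhitespace = line.split("\\s+"); // ["hello", "world"]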

 3. Writing the Reduce Class

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    int sum;
    IntWritable v = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // 1. Accumulate the counts for this word
        sum = 0;
        for (IntWritable count : values) {
            sum += count.get();
        }
        // 2. Emit (word, total)
        v.set(sum);
        context.write(key, v);
    }
}
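
Because this reduce function is just an associative, commutative sum, the same class can also serve as a combiner that pre-aggregates counts on the map side and cuts shuffle traffic. This is an optional tweak, not part of the original code; it would be one extra line in the driver shown in the next section:

job.setCombinerClass(WordCountReduce.class); // pre-aggregate (word, 1) pairs on the map side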

 4. Submitting the Job

    // Required imports for the driver (at the top of the enclosing class, e.g. HadoopUtils):
    // import org.apache.hadoop.conf.Configuration;
    // import org.apache.hadoop.fs.Path;
    // import org.apache.hadoop.mapreduce.Job;
    // import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    // import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    /**
     * Builds and submits the WordCount MapReduce job (runs locally or on YARN,
     * depending on the Configuration on the classpath).
     */
    public static void mapReduceDriver(String... args) throws IOException, ClassNotFoundException, InterruptedException {
        // 1. Load the configuration and create the job
        Configuration configuration = new Configuration();
        Job job = Job.getInstance(configuration);
        // 2. Locate the job jar via a class it contains
        job.setJarByClass(HadoopApp.class);
        // 3. Set the Mapper and Reducer classes
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReduce.class);
        // 4. Set the map output key/value types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // 5. Set the final output key/value types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // 6. Set the input and output paths
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // 7. Submit the job and wait for completion
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);
    }
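
The Javadoc above mentions YARN: once these classes are packaged into a jar, the job can be submitted to a cluster with the standard hadoop jar command. The jar name, main class, and HDFS paths below are placeholders, not values from the original post:

hadoop jar wordcount.jar com.example.HadoopApp /input/hadoop_word_count.txt /output/word_count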

        Test

        String inPath = "D:/workplace/java-item/res/file";
        String outPath = inPath + "/hadoop";
        try {
            // Note: the second argument is an output *directory* that MapReduce creates
            // (results land in files like part-r-00000 inside it); it must not already exist.
            HadoopUtils.mapReduceDriver(inPath + "/hadoop_word_count.txt", outPath + "/out_word_count.txt");
        } catch (IOException | ClassNotFoundException | InterruptedException e) {
            e.printStackTrace();
            System.out.println("Failed to run the MapReduce job");
        }
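
Re-running this test fails with a FileAlreadyExistsException, because MapReduce refuses to write into an existing output directory. A minimal sketch of clearing it first, using the org.apache.hadoop.fs.FileSystem API (outPath matches the variable in the test above):

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
Path out = new Path(outPath + "/out_word_count.txt");
if (fs.exists(out)) {
    fs.delete(out, true); // recursively delete the old output directory
}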
