
A Classic MapReduce Example

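Resource file math:
張三 99
李四 90
王五 90
趙六 60

Resource file china:
張三 79
李四 75
王五 80
趙六 90

Resource file english:
張三 89
李四 75
王五 70
趙六 90

Analysis: in the map phase, the student's name is emitted as the key and the score as the value. The reduce phase therefore receives data of the form
key: 張三 value: {99, 79, 89}
……
and the reducer computes the average of each student's scores.

Implementation: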

package com.bwzy.hadoop;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class AvgSorce extends Configured implements Tool {

    // Mapper: each input line holds "name score"; emit (name, score).
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokenizer = new StringTokenizer(value.toString());
            // Consume tokens in (name, score) pairs; a trailing unpaired token is ignored.
            while (tokenizer.countTokens() >= 2) {
                String strName = tokenizer.nextToken();
                String strSorce = tokenizer.nextToken();
                context.write(new Text(strName), new IntWritable(Integer.parseInt(strSorce)));
            }
        }
    }

    // Reducer: receives every score emitted for one name and writes their average.
    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            int num = 0;
            for (IntWritable sorce : values) {
                sum += sorce.get();
                num++;
            }
            // Integer division: the average is truncated to a whole number.
            context.write(key, new IntWritable(sum / num));
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        Job job = new Job(getConf());
        // Tell Hadoop which jar holds the job classes so the cluster nodes can load them.
        job.setJarByClass(AvgSorce.class);
        job.setJobName("AvgSorce");
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(Map.class);
        // job.setCombinerClass(Reduce.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        // args[0] is the HDFS input directory, args[1] the HDFS output directory.
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        boolean success = job.waitForCompletion(true);
        return success ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int ret = ToolRunner.run(new AvgSorce(), args);
        System.exit(ret);
    }
}
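A note on the commented-out job.setCombinerClass(Reduce.class) line in run: the source does not say why it is disabled, but one likely reason is that an averaging reducer is not safe to reuse as a combiner. A combiner may be applied to only part of a key's values, so the reducer would end up averaging partial averages. The following plain-Java sketch (no Hadoop involved; the class and method names are made up for illustration, and the split of 趙六's scores 60, 90, 90 into two groups is an assumption) shows the difference, along with the usual workaround of carrying sums and counts instead:

```java
import java.util.Arrays;
import java.util.List;

class CombinerSketch {
    // Same arithmetic as the Reduce class: integer average of the values.
    static int average(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum / values.size();
    }

    // What happens if the averaging step runs once per split (as a combiner)
    // and then again on the partial results (as the reducer).
    static int averageOfPartialAverages(List<Integer> splitA, List<Integer> splitB) {
        return average(Arrays.asList(average(splitA), average(splitB)));
    }

    // Combiner-safe alternative: add up sums and counts, divide only once at the end.
    static int averageFromSumsAndCounts(List<Integer> splitA, List<Integer> splitB) {
        int sum = 0;
        for (int v : splitA) {
            sum += v;
        }
        for (int v : splitB) {
            sum += v;
        }
        return sum / (splitA.size() + splitB.size());
    }

    public static void main(String[] args) {
        List<Integer> splitA = Arrays.asList(60, 90); // hypothetical first split
        List<Integer> splitB = Arrays.asList(90);     // hypothetical second split
        System.out.println(averageOfPartialAverages(splitA, splitB)); // 82
        System.out.println(averageFromSumsAndCounts(splitA, splitB)); // 80
    }
}
```

With this split the partial averages are 75 and 90, whose average is 82, while the true average of 60, 90, 90 is 80. In a real job the sum-and-count approach would need a map output value type that carries both numbers, which this example does not define.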

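Running:

1: Package the program: select the class to package --> right-click --> Export --> Java --> JAR file --> enter the destination path --> Finish.
2: Copy the jar into the hadoop directory (because the program uses hadoop's jars).
3: Upload the resource files to the HDFS directory you have chosen (/custom/custom below stands for a path of your own choosing).
Create the HDFS directory (hadoop must already be running):
hadoop fs -mkdir /custom/custom/input
Upload a local resource file to HDFS:
hadoop fs -put /home/user/Document/math /custom/custom/input
……
4: Run the MapReduce program:
hadoop jar /home/user/hadoop-1.0.4/AvgSorce.jar com.bwzy.hadoop.AvgSorce /custom/custom/input /custom/custom/output
Note: hadoop creates the /custom/custom/output directory automatically when the job runs. The directory will contain two files, one of which holds the result of the MapReduce run. To run the program again, first delete the /custom/custom/output directory; otherwise hadoop treats the result as already existing and refuses to run the job.
5: The result is:
張三 89
李四 80
王五 80
趙六 80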
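The averages listed above can be reproduced without a cluster. The following is a minimal plain-Java sketch, not the Hadoop job itself: it assumes each input line holds one name followed by one integer score, and the class and method names (AverageSketch, groupByName, average) are made up for illustration. groupByName stands in for the map and shuffle steps, and average for the reduce step.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AverageSketch {
    // Stand-in for map + shuffle: parse "name score" lines and collect scores per name.
    static Map<String, List<Integer>> groupByName(List<String> lines) {
        Map<String, List<Integer>> grouped = new LinkedHashMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length != 2) {
                continue; // skip blank or malformed lines
            }
            grouped.computeIfAbsent(parts[0], k -> new ArrayList<>())
                   .add(Integer.parseInt(parts[1]));
        }
        return grouped;
    }

    // Stand-in for reduce: integer average, truncated like the job's sum / num.
    static int average(List<Integer> scores) {
        int sum = 0;
        for (int s : scores) {
            sum += s;
        }
        return sum / scores.size();
    }

    // Usage: java AverageSketch math china english (local copies of the resource files)
    public static void main(String[] args) throws IOException {
        List<String> lines = new ArrayList<>();
        for (String file : args) {
            lines.addAll(Files.readAllLines(Paths.get(file), StandardCharsets.UTF_8));
        }
        for (Map.Entry<String, List<Integer>> e : groupByName(lines).entrySet()) {
            System.out.println(e.getKey() + "\t" + average(e.getValue()));
        }
    }
}
```

For 張三, the grouped scores are 99, 79 and 89, so the average is 267 / 3 = 89, matching the job output. Because the division is integer division, data that does not divide evenly would be truncated rather than rounded.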
