
Writing a Makefile for a Hadoop MapReduce Program

I recently needed to integrate a Hadoop MapReduce program into a large framework written in C/C++, so the MapReduce application has to be compiled and packaged automatically whenever make runs. This article uses the simple WordCount1 example to walk through the details. Note: the Hadoop version used here is 2.4.0.

The source consists of two files: WordCount1.java, which implements the actual word-counting logic, and CounterThread.java, which simply tallies and prints the number of input lines processed so far (both listings are given in Appendix 1). The key to writing the Makefile is getting all of the jars shipped with Hadoop onto the compile classpath. Many posts online roll their own script that gathers every .jar file under the Hadoop installation into one directory and compiles against that, which is far more work than necessary. Simpler recipes do exist, but most of them target fairly old Hadoop releases such as 0.20.
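
For reference, the project layout assumed in the rest of this article looks roughly like the following (reconstructed from the Makefile and the make session below; bin/ is created at build time):

WordCount1/
├── Makefile
└── src/
    └── mypackage/
        ├── WordCount1.java
        └── CounterThread.java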

In fact, Hadoop already provides a command, hadoop classpath, that prints a classpath containing all of the required jars. So it is enough to compile with javac -classpath "`hadoop classpath`" *.java and then package the resulting class files with jar -cvf.
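
In other words, the whole build boils down to two commands (a minimal sketch, using the mypackage package name and the bin output directory that the Makefile below also uses):

mkdir -p bin
# `hadoop classpath` expands to every jar shipped with the local Hadoop installation
javac -classpath "`hadoop classpath`" -d bin src/mypackage/*.java
# bundle the compiled classes into a runnable jar
jar -cvf WordCount.jar -C bin ./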


The Makefile itself looks like this:

SRC_DIR    = src/mypackage/*.java
CLASS_DIR  = bin
TARGET_JAR = WordCount

all: $(TARGET_JAR)

# Note: the recipe lines below must be indented with a real tab character.
$(TARGET_JAR): $(SRC_DIR)
	mkdir -p $(CLASS_DIR)
# Let `hadoop classpath` supply every jar needed for compilation.
	javac -classpath "`hadoop classpath`" $(SRC_DIR) -d $(CLASS_DIR) -Xlint
# Bundle the compiled classes into WordCount.jar.
	jar -cvf $(TARGET_JAR).jar -C $(CLASS_DIR) ./

clean:
	rm -rf $(CLASS_DIR) *.jar

Now run make:

lichao@ubuntu:WordCount1$ make
mkdir -p bin
javac -classpath "`hadoop classpath`" src/mypackage/*.java -d bin -Xlint
warning: [path] bad path element "/home/lichao/Software/hadoop/hadoop-src/hadoop-2.4.0-src/hadoop-dist/target/hadoop-2.4.0/share/hadoop/common/lib/jaxb-api.jar": no such file or directory
warning: [path] bad path element "/home/lichao/Software/hadoop/hadoop-src/hadoop-2.4.0-src/hadoop-dist/target/hadoop-2.4.0/share/hadoop/common/lib/activation.jar": no such file or directory
warning: [path] bad path element "/home/lichao/Software/hadoop/hadoop-src/hadoop-2.4.0-src/hadoop-dist/target/hadoop-2.4.0/share/hadoop/common/lib/jsr173_1.0_api.jar": no such file or directory
warning: [path] bad path element "/home/lichao/Software/hadoop/hadoop-src/hadoop-2.4.0-src/hadoop-dist/target/hadoop-2.4.0/share/hadoop/common/lib/jaxb1-impl.jar": no such file or directory
warning: [path] bad path element "/home/lichao/Software/hadoop/hadoop-src/hadoop-2.4.0-src/hadoop-dist/target/hadoop-2.4.0/share/hadoop/yarn/lib/jaxb-api.jar": no such file or directory
warning: [path] bad path element "/home/lichao/Software/hadoop/hadoop-src/hadoop-2.4.0-src/hadoop-dist/target/hadoop-2.4.0/share/hadoop/yarn/lib/activation.jar": no such file or directory
warning: [path] bad path element "/home/lichao/Software/hadoop/hadoop-src/hadoop-2.4.0-src/hadoop-dist/target/hadoop-2.4.0/share/hadoop/yarn/lib/jsr173_1.0_api.jar": no such file or directory
warning: [path] bad path element "/home/lichao/Software/hadoop/hadoop-src/hadoop-2.4.0-src/hadoop-dist/target/hadoop-2.4.0/share/hadoop/yarn/lib/jaxb1-impl.jar": no such file or directory
warning: [path] bad path element "/home/lichao/Software/hadoop/hadoop-src/hadoop-2.4.0-src/hadoop-dist/target/hadoop-2.4.0/contrib/capacity-scheduler/*.jar": no such file or directory
src/mypackage/WordCount1.java:61: warning: [deprecation] Job(Configuration,String) in Job has been deprecated
  Job job = new Job(conf, "WordCount1");                  // create a new Job
            ^
10 warnings
jar -cvf WordCount.jar -C bin ./
added manifest
adding: mypackage/(in = 0) (out= 0)(stored 0%)
adding: mypackage/WordCount1.class(in = 1970) (out= 1037)(deflated 47%)
adding: mypackage/CounterThread.class(in = 1760) (out= 914)(deflated 48%)
adding: mypackage/WordCount1$IntSumReducer.class(in = 1762) (out= 749)(deflated 57%)
adding: mypackage/WordCount1$TokenizerMapper.class(in = 1759) (out= 762)(deflated 56%)
adding: log4j.properties(in = 476) (out= 172)(deflated 63%)

There are warnings, but they do not affect the result: the [path] warnings just mean that a few jars listed by hadoop classpath are not actually present on disk, and the [deprecation] warning is because the Job(Configuration, String) constructor is deprecated in Hadoop 2.x (Job.getInstance(Configuration, String) is the current replacement). With the jar built, let's run a quick test.

First, generate some test data: while true; do seq 1 100000 >> tmpfile; done   (press Ctrl+C once the file is large enough).
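
If the /data directory does not exist on HDFS yet, it may need to be created first (an assumption about the cluster; adjust the path as needed):

hadoop fs -mkdir -p /data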
 
Then put the data onto HDFS: hadoop fs -put tmpfile /data/
 
Next, run the MapReduce job: hadoop jar WordCount.jar mypackage/WordCount1 /data/tmpfile /output2
 
The output looks like this:

14/07/15 13:26:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/07/15 13:26:03 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032
14/07/15 13:26:05 INFO input.FileInputFormat: Total input paths to process : 1
14/07/15 13:26:05 INFO mapreduce.JobSubmitter: number of splits:6
14/07/15 13:26:06 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1405397597558_0003
14/07/15 13:26:06 INFO impl.YarnClientImpl: Submitted application application_1405397597558_0003
14/07/15 13:26:06 INFO mapreduce.Job: The url to track the job: http://ubuntu:8088/proxy/application_1405397597558_0003/
14/07/15 13:26:06 INFO mapreduce.Job: Running job: job_1405397597558_0003
14/07/15 13:26:20 INFO mapreduce.Job: Job job_1405397597558_0003 running in uber mode : false
14/07/15 13:26:20 INFO mapreduce.Job:  map 0% reduce 0%
14/07/15 13:26:34 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
Input lines: 0
14/07/15 13:26:48 INFO mapreduce.Job:  map 2% reduce 0%
Input lines: 3138474
14/07/15 13:26:51 INFO mapreduce.Job:  map 5% reduce 0%
14/07/15 13:26:54 INFO mapreduce.Job:  map 6% reduce 0%
14/07/15 13:26:55 INFO mapreduce.Job:  map 8% reduce 0%
14/07/15 13:26:57 INFO mapreduce.Job:  map 9% reduce 0%
14/07/15 13:26:58 INFO mapreduce.Job:  map 11% reduce 0%
14/07/15 13:27:00 INFO mapreduce.Job:  map 12% reduce 0%
14/07/15 13:27:01 INFO mapreduce.Job:  map 13% reduce 0%
Input lines: 23383595
14/07/15 13:27:05 INFO mapreduce.Job:  map 14% reduce 0%
Input lines: 23383595
14/07/15 13:27:23 INFO mapreduce.Job:  map 15% reduce 0%
14/07/15 13:27:27 INFO mapreduce.Job:  map 16% reduce 0%
14/07/15 13:27:28 INFO mapreduce.Job:  map 18% reduce 0%
14/07/15 13:27:30 INFO mapreduce.Job:  map 19% reduce 0%
14/07/15 13:27:31 INFO mapreduce.Job:  map 21% reduce 0%
14/07/15 13:27:34 INFO mapreduce.Job:  map 24% reduce 0%
Input lines: 38430301
14/07/15 13:27:37 INFO mapreduce.Job:  map 25% reduce 0%
14/07/15 13:27:40 INFO mapreduce.Job:  map 26% reduce 0%
Input lines: 42826322
14/07/15 13:27:57 INFO mapreduce.Job:  map 27% reduce 0%
14/07/15 13:28:00 INFO mapreduce.Job:  map 29% reduce 0%
14/07/15 13:28:02 INFO mapreduce.Job:  map 30% reduce 0%
14/07/15 13:28:03 INFO mapreduce.Job:  map 32% reduce 0%
Input lines: 54513531
14/07/15 13:28:05 INFO mapreduce.Job:  map 33% reduce 0%
14/07/15 13:28:06 INFO mapreduce.Job:  map 34% reduce 0%
14/07/15 13:28:08 INFO mapreduce.Job:  map 35% reduce 0%
14/07/15 13:28:09 INFO mapreduce.Job:  map 36% reduce 0%
Input lines: 60959081
14/07/15 13:28:22 INFO mapreduce.Job:  map 42% reduce 0%
14/07/15 13:28:30 INFO mapreduce.Job:  map 43% reduce 0%
14/07/15 13:28:31 INFO mapreduce.Job:  map 44% reduce 0%
14/07/15 13:28:34 INFO mapreduce.Job:  map 45% reduce 0%
14/07/15 13:28:35 INFO mapreduce.Job:  map 46% reduce 0%
Input lines: 69936159
14/07/15 13:28:37 INFO mapreduce.Job:  map 47% reduce 0%
14/07/15 13:28:38 INFO mapreduce.Job:  map 48% reduce 0%
14/07/15 13:28:41 INFO mapreduce.Job:  map 49% reduce 0%
14/07/15 13:28:44 INFO mapreduce.Job:  map 50% reduce 0%
Input lines: 77160461
14/07/15 13:29:01 INFO mapreduce.Job:  map 51% reduce 0%
14/07/15 13:29:04 INFO mapreduce.Job:  map 52% reduce 0%
14/07/15 13:29:05 INFO mapreduce.Job:  map 53% reduce 0%
Input lines: 83000373
14/07/15 13:29:07 INFO mapreduce.Job:  map 54% reduce 0%
14/07/15 13:29:09 INFO mapreduce.Job:  map 55% reduce 0%
14/07/15 13:29:10 INFO mapreduce.Job:  map 56% reduce 0%
14/07/15 13:29:13 INFO mapreduce.Job:  map 57% reduce 0%
14/07/15 13:29:16 INFO mapreduce.Job:  map 58% reduce 0%
Input lines: 93361766
14/07/15 13:29:32 INFO mapreduce.Job:  map 59% reduce 0%
Input lines: 98194696
14/07/15 13:29:35 INFO mapreduce.Job:  map 60% reduce 0%
14/07/15 13:29:37 INFO mapreduce.Job:  map 61% reduce 0%
14/07/15 13:29:38 INFO mapreduce.Job:  map 62% reduce 0%
14/07/15 13:29:40 INFO mapreduce.Job:  map 63% reduce 0%
14/07/15 13:29:41 INFO mapreduce.Job:  map 64% reduce 0%
14/07/15 13:29:44 INFO mapreduce.Job:  map 65% reduce 0%
14/07/15 13:29:48 INFO mapreduce.Job:  map 66% reduce 0%
Input lines: 109562184
14/07/15 13:30:04 INFO mapreduce.Job:  map 67% reduce 0%
Input lines: 113362818
14/07/15 13:30:06 INFO mapreduce.Job:  map 68% reduce 0%
14/07/15 13:30:08 INFO mapreduce.Job:  map 69% reduce 0%
14/07/15 13:30:10 INFO mapreduce.Job:  map 70% reduce 0%
14/07/15 13:30:12 INFO mapreduce.Job:  map 71% reduce 0%
14/07/15 13:30:15 INFO mapreduce.Job:  map 72% reduce 0%
Input lines: 123074119
14/07/15 13:30:32 INFO mapreduce.Job:  map 76% reduce 0%
14/07/15 13:30:33 INFO mapreduce.Job:  map 80% reduce 0%
14/07/15 13:30:34 INFO mapreduce.Job:  map 83% reduce 0%
14/07/15 13:30:35 INFO mapreduce.Job:  map 84% reduce 0%
Input lines: 123074119
14/07/15 13:30:37 INFO mapreduce.Job:  map 89% reduce 0%
14/07/15 13:30:38 INFO mapreduce.Job:  map 92% reduce 0%
14/07/15 13:30:39 INFO mapreduce.Job:  map 95% reduce 0%
14/07/15 13:30:40 INFO mapreduce.Job:  map 100% reduce 0%
Input lines: 123074119
14/07/15 13:30:53 INFO mapreduce.Job:  map 100% reduce 100%
14/07/15 13:30:53 INFO mapreduce.Job: Job job_1405397597558_0003 completed successfully
14/07/15 13:30:53 INFO mapreduce.Job: Counters: 50
 File System Counters
  FILE: Number of bytes read=58256119
  FILE: Number of bytes written=66039749
  FILE: Number of read operations=0
  FILE: Number of large read operations=0
  FILE: Number of write operations=0
  HDFS: Number of bytes read=724520133
  HDFS: Number of bytes written=1088895
  HDFS: Number of read operations=21
  HDFS: Number of large read operations=0
  HDFS: Number of write operations=2
 Job Counters
  Killed map tasks=2
  Launched map tasks=8
  Launched reduce tasks=1
  Data-local map tasks=8
  Total time spent by all maps in occupied slots (ms)=1528715
  Total time spent by all reduces in occupied slots (ms)=17508
  Total time spent by all map tasks (ms)=1528715
  Total time spent by all reduce tasks (ms)=17508
  Total vcore-seconds taken by all map tasks=1528715
  Total vcore-seconds taken by all reduce tasks=17508
  Total megabyte-seconds taken by all map tasks=1565404160
  Total megabyte-seconds taken by all reduce tasks=17928192
 Map-Reduce Framework
  Map input records=123074119
  Map output records=123074119
  Map output bytes=1216795535
  Map output materialized bytes=7133406
  Input split bytes=594
  Combine input records=127374119
  Combine output records=4900000
  Reduce input groups=100000
  Reduce shuffle bytes=7133406
  Reduce input records=600000
  Reduce output records=100000
  Spilled Records=5500000
  Shuffled Maps =6
  Failed Shuffles=0
  Merged Map outputs=6
  GC time elapsed (ms)=39761
  CPU time spent (ms)=1397060
  Physical memory (bytes) snapshot=1797943296
  Virtual memory (bytes) snapshot=5082316800
  Total committed heap usage (bytes)=1398800384
 Shuffle Errors
  BAD_ID=0
  CONNECTION=0
  IO_ERROR=0
  WRONG_LENGTH=0
  WRONG_MAP=0
  WRONG_REDUCE=0
 File Input Format Counters
  Bytes Read=724519539
 File Output Format Counters
  Bytes Written=1088895
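
After the job completes, the word counts can be inspected directly on HDFS (a quick sanity check; part-r-00000 is the usual name of the single reducer's output file):

hadoop fs -ls /output2
hadoop fs -cat /output2/part-r-00000 | head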

For more details, continue reading on the next page: http://www.linuxidc.com/Linux/2014-07/104316p2.htm
