Write a hello-world-level UDF in Java, package it as udf.jar, and place it under /home/hadoop/lib. The code is as follows:
package com.luogankun.udf;

import org.apache.hadoop.hive.ql.exec.UDF;

public class HelloUDF extends UDF {
    public String evaluate(String str) {
        try {
            return "HelloWorld " + str;
        } catch (Exception e) {
            return null;
        }
    }
}
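One detail of the evaluate method above is easy to miss: string concatenation never throws, so the catch branch is not reached, and a NULL column value comes back as the text "HelloWorld null" rather than NULL. The following is a minimal sketch of that behavior in plain Java, without the Hive dependency. The class name HelloLogic and the method names original and nullSafe are hypothetical and exist only for this illustration; nullSafe is one possible variant, not the original author's code.

```java
// Hypothetical stand-in for HelloUDF.evaluate, written without the Hive
// dependency so that it can be compiled and run on its own.
class HelloLogic {

    // Same expression as the UDF body: concatenating null yields the text "null".
    static String original(String str) {
        return "HelloWorld " + str;
    }

    // Assumed variant: propagate null so that a NULL column stays NULL.
    static String nullSafe(String str) {
        return str == null ? null : "HelloWorld " + str;
    }

    public static void main(String[] args) {
        System.out.println(original("a.html")); // prints: HelloWorld a.html
        System.out.println(original(null));     // prints: HelloWorld null
        System.out.println(nullSafe(null));     // prints: null
    }
}
```

If NULL inputs should stay NULL in query results, the same null check could be placed at the top of HelloUDF.evaluate.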
Using the UDF in Hive
cd $HIVE_HOME/bin
hive
add jar /home/hadoop/lib/udf.jar;
CREATE TEMPORARY FUNCTION hello AS 'com.luogankun.udf.HelloUDF';
select hello(url) from page_views limit 1;
Using the UDF in Spark SQL
Method 1: pass the jar with --jars when starting spark-sql
cd $SPARK_HOME/bin
spark-sql --jars /home/hadoop/lib/udf.jar
CREATE TEMPORARY FUNCTION hello AS 'com.luogankun.udf.HelloUDF';
select hello(url) from page_views limit 1;
Method 2: start spark-sql first, then run add jar
cd $SPARK_HOME/bin
spark-sql
add jar /home/hadoop/lib/udf.jar;
CREATE TEMPORARY FUNCTION hello AS 'com.luogankun.udf.HelloUDF';
select hello(url) from page_views limit 1;
In testing, this approach turned out not to work: the query fails with java.lang.ClassNotFoundException: com.luogankun.udf.HelloUDF
How can this be fixed?
1) First add the path of udf.jar to SPARK_CLASSPATH in spark-env.sh, for example:
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/home/hadoop/software/mysql-connector-java-5.1.27-bin.jar:/home/hadoop/lib/udf.jar
2) Then start spark-sql and run CREATE TEMPORARY FUNCTION directly, without add jar:
cd $SPARK_HOME/bin
spark-sql
CREATE TEMPORARY FUNCTION hello AS 'com.luogankun.udf.HelloUDF';
select hello(url) from page_views limit 1;
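If the ClassNotFoundException still appears after SPARK_CLASSPATH has been set, it may be worth ruling out a packaging mistake, for example a jar built without the com/luogankun/udf directory structure. The sketch below checks whether a jar contains the entry for a given class. The class name JarCheck and its methods are hypothetical helpers, not part of Spark or Hive, and the default arguments simply reuse the path and class name from this article.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.jar.JarFile;

// Hypothetical diagnostic helper: looks for a class file inside a jar.
class JarCheck {

    // Maps a fully qualified class name to its entry path inside a jar.
    static String entryNameFor(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Returns false when the jar does not exist or lacks the class entry.
    static boolean jarContainsClass(Path jar, String className) {
        if (!Files.isRegularFile(jar)) {
            return false;
        }
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            return jarFile.getEntry(entryNameFor(className)) != null;
        } catch (IOException e) {
            // The file exists but could not be read as a jar.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Defaults are taken from the article; pass other values as arguments.
        Path jar = Paths.get(args.length > 0 ? args[0] : "/home/hadoop/lib/udf.jar");
        String className = args.length > 1 ? args[1] : "com.luogankun.udf.HelloUDF";
        String status = jarContainsClass(jar, className) ? "found" : "missing";
        System.out.println(status + ": " + entryNameFor(className) + " in " + jar);
    }
}
```

A "missing" result points at the jar itself; a "found" result suggests the problem lies in how the jar reaches the Spark classpath.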
Method 3: using the UDF through the Thrift JDBC Server
Run the following in the beeline command line:
add jar /home/hadoop/lib/udf.jar;
CREATE TEMPORARY FUNCTION hello AS 'com.luogankun.udf.HelloUDF';
select hello(url) from page_views limit 1;
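The same three statements can also be sent from a Java program, since beeline talks to the Thrift server over JDBC. The following is a rough sketch under several assumptions: the server listens at jdbc:hive2://localhost:10000 without authentication, the Hive JDBC driver is on the client classpath, and the class name ThriftUdfClient and its methods are hypothetical. Note that statements sent over JDBC carry no trailing semicolon.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Arrays;
import java.util.List;

// Hypothetical JDBC client that registers the UDF and runs one query.
class ThriftUdfClient {

    // Builds the two setup statements used in the beeline session.
    static List<String> setupStatements(String jarPath, String functionName, String className) {
        return Arrays.asList(
                "add jar " + jarPath,
                "CREATE TEMPORARY FUNCTION " + functionName + " AS '" + className + "'");
    }

    // Runs the setup statements, then returns the first column of the first row.
    static String firstResult(Connection conn, List<String> setup, String query) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            for (String sql : setup) {
                stmt.execute(sql);
            }
            try (ResultSet rs = stmt.executeQuery(query)) {
                return rs.next() ? rs.getString(1) : null;
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        // Assumed endpoint; replace with the actual Thrift server address.
        String url = args.length > 0 ? args[0] : "jdbc:hive2://localhost:10000";
        List<String> setup = setupStatements(
                "/home/hadoop/lib/udf.jar", "hello", "com.luogankun.udf.HelloUDF");
        try (Connection conn = DriverManager.getConnection(url)) {
            System.out.println(firstResult(conn, setup, "select hello(url) from page_views limit 1"));
        }
    }
}
```

The temporary function is registered on the same connection that runs the query, which mirrors the single beeline session shown above.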