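Using SparkSQL: Thrift JDBC Server

About the Thrift JDBC Server

The Thrift JDBC Server is based on the HiveServer2 implementation from Hive 0.12. You can interact with it using the beeline script shipped with either Spark or Hive 0.12. By default, the Thrift JDBC Server listens on port 10000.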

Before using the Thrift JDBC Server, note the following:

1. Copy the hive-site.xml configuration file into the $SPARK_HOME/conf directory.

2. Add the JDBC driver jar to SPARK_CLASSPATH in $SPARK_HOME/conf/spark-env.sh:

export SPARK_CLASSPATH=$SPARK_CLASSPATH:/home/Hadoop/software/mysql-connector-java-5.1.27-bin.jar
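
Step 2 above can be made a little more defensive. The following is only a sketch, not part of Spark: the helper name `append_jar` is hypothetical, and the jar path is the one used in the export line above. It appends the jar only if the file exists and avoids producing a leading `:` when SPARK_CLASSPATH is still empty (an empty classpath entry is usually interpreted as the current directory).

```shell
# Hypothetical helper (not a Spark command): append a jar to a classpath string.
# Prints the resulting classpath on stdout.
append_jar() {
  local classpath="$1" jar="$2"
  if [ ! -f "$jar" ]; then
    # Jar is missing: warn and return the classpath unchanged.
    echo "warning: jar not found: $jar" >&2
    printf '%s\n' "$classpath"
    return 1
  fi
  if [ -z "$classpath" ]; then
    # Empty classpath: do not emit a leading ':'.
    printf '%s\n' "$jar"
  else
    printf '%s\n' "$classpath:$jar"
  fi
}

# Possible usage inside spark-env.sh (same jar path as in the export line above):
# export SPARK_CLASSPATH="$(append_jar "$SPARK_CLASSPATH" /home/Hadoop/software/mysql-connector-java-5.1.27-bin.jar)"
```

If the jar is missing, the helper prints a warning to stderr and leaves the classpath as it was, which makes a mistyped path easier to notice than a silent classpath entry pointing nowhere.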

Command-line help for the Thrift JDBC Server:

cd $SPARK_HOME/sbin
start-thriftserver.sh --help

Usage: ./sbin/start-thriftserver [options] [thrift server options]
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Options:
  --master MASTER_URL          spark://host:port, mesos://host:port, yarn, or local.
  --deploy-mode DEPLOY_MODE    Whether to launch the driver program locally ("client") or
                               on one of the worker machines inside the cluster ("cluster")
                               (Default: client).
  --class CLASS_NAME           Your application's main class (for Java / Scala apps).
  --name NAME                  A name of your application.
  --jars JARS                  Comma-separated list of local jars to include on the driver
                               and executor classpaths.
  --py-files PY_FILES          Comma-separated list of .zip, .egg, or .py files to place
                               on the PYTHONPATH for Python apps.
  --files FILES                Comma-separated list of files to be placed in the working
                               directory of each executor.

  --conf PROP=VALUE            Arbitrary Spark configuration property.
  --properties-file FILE       Path to a file from which to load extra properties. If not
                               specified, this will look for conf/spark-defaults.conf.

  --driver-memory MEM          Memory for driver (e.g. 1000M, 2G) (Default: 512M).
  --driver-java-options        Extra Java options to pass to the driver.
  --driver-library-path        Extra library path entries to pass to the driver.
  --driver-class-path          Extra class path entries to pass to the driver. Note that
                               jars added with --jars are automatically included in the
                               classpath.

  --executor-memory MEM        Memory per executor (e.g. 1000M, 2G) (Default: 1G).

  --help, -h                   Show this help message and exit
  --verbose, -v                Print additional debug output

 Spark standalone with cluster deploy mode only:
  --driver-cores NUM           Cores for driver (Default: 1).
  --supervise                  If given, restarts the driver on failure.

 Spark standalone and Mesos only:
  --total-executor-cores NUM   Total cores for all executors.

 YARN-only:
  --executor-cores NUM         Number of cores per executor (Default: 1).
  --queue QUEUE_NAME           The YARN queue to submit to (Default: "default").
  --num-executors NUM          Number of executors to launch (Default: 2).
  --archives ARCHIVES          Comma separated list of archives to be extracted into the
                               working directory of each executor.

Thrift server options:
  --hiveconf <property=value>  Use value for given property

The --master option has the same meaning as in the Spark SQL CLI.

Command-line help for beeline:

cd $SPARK_HOME/bin
beeline --help

Usage: java org.apache.hive.cli.beeline.BeeLine
  -u <database url>                        the JDBC URL to connect to
  -n <username>                            the username to connect as
  -p <password>                            the password to connect as
  -d <driver class>                        the driver class to use
  -e <query>                               query that should be executed
  -f <file>                                script file that should be executed
  --color=[true/false]                     control whether color is used for display
  --showHeader=[true/false]                show column names in query results
  --headerInterval=ROWS;                   the interval between which heades are displayed
  --fastConnect=[true/false]               skip building table/column list for tab-completion
  --autoCommit=[true/false]                enable/disable automatic transaction commit
  --verbose=[true/false]                   show verbose error messages and debug info
  --showWarnings=[true/false]              display connection warnings
  --showNestedErrs=[true/false]            display nested errors
  --numberFormat=[pattern]                 format numbers using DecimalFormat pattern
  --force=[true/false]                     continue running script even after errors
  --maxWidth=MAXWIDTH                      the maximum width of the terminal
  --maxColumnWidth=MAXCOLWIDTH             the maximum width to use when displaying columns
  --silent=[true/false]                    be more silent
  --autosave=[true/false]                  automatically save preferences
  --outputformat=[table/vertical/csv/tsv]  format mode for result display
  --isolation=LEVEL                        set the transaction isolation level
  --help                                   display this message

Starting the Thrift JDBC Server and beeline

Start the Thrift JDBC Server (the default port is 10000):

cd $SPARK_HOME/sbin
start-thriftserver.sh

To change the Thrift JDBC Server's default listening port, pass the port with --hiveconf:

start-thriftserver.sh --hiveconf hive.server2.thrift.port=14000
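
When the port needs to vary between environments, the command above can be generated from a variable. This is a minimal sketch under assumptions: `thriftserver_cmd` and `DEFAULT_THRIFT_PORT` are hypothetical names introduced here for illustration; only `start-thriftserver.sh`, `--hiveconf`, and `hive.server2.thrift.port` come from the text above. The helper only prints the command, so it can be inspected before being run.

```shell
# Default listening port, as stated earlier in this article.
DEFAULT_THRIFT_PORT=10000

# Hypothetical helper: print the start command for a given port.
# Falls back to the default port when no argument is given.
thriftserver_cmd() {
  local port="${1:-$DEFAULT_THRIFT_PORT}"
  case "$port" in
    # Reject anything that is not purely digits.
    ''|*[!0-9]*) echo "error: port must be numeric: $port" >&2; return 1 ;;
  esac
  printf '%s\n' "start-thriftserver.sh --hiveconf hive.server2.thrift.port=$port"
}

# Show the command that would be run for port 14000.
thriftserver_cmd 14000
```

Remember that clients must then connect to the same port; the JDBC URL passed to beeline has to change accordingly.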

For details on HiveServer2 clients, see: https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients

Start beeline:

cd $SPARK_HOME/bin
beeline -u jdbc:hive2://hadoop000:10000/default -n hadoop
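
The URL passed to -u has the form jdbc:hive2://host:port/database. As an illustration, the sketch below builds that URL from its parts so that a non-default port is easy to plug in. The helper name `jdbc_url` is hypothetical; the host `hadoop000`, user `hadoop`, and database `default` are the values used in the command above.

```shell
# Hypothetical helper: build a HiveServer2 JDBC URL.
# Arguments: host, port (defaults to 10000), database (defaults to "default").
jdbc_url() {
  local host="$1" port="${2:-10000}" db="${3:-default}"
  printf 'jdbc:hive2://%s:%s/%s\n' "$host" "$port" "$db"
}

# Print the URL for the host used in this article with the default port.
jdbc_url hadoop000

# Possible usage with a server started on port 14000:
# beeline -u "$(jdbc_url hadoop000 14000)" -n hadoop
```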

Testing with SQL statements:

SELECT track_time, url, session_id, referer, ip, end_user_id, city_id FROM page_views WHERE city_id = -1000 limit 10;
SELECT session_id, count(*) c FROM page_views group by session_id order by c desc limit 10;
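
The same statements can also be run non-interactively with the -f option listed in the beeline help above. The following is a sketch under assumptions: the script file name and location are made up for illustration, and the connection details are the ones used earlier in this article. beeline is invoked only if it is found on the PATH.

```shell
# Write the two test queries to a script file (file name is an assumption).
sql_file="${TMPDIR:-/tmp}/page_views_test.sql"
cat > "$sql_file" <<'EOF'
SELECT track_time, url, session_id, referer, ip, end_user_id, city_id FROM page_views WHERE city_id = -1000 limit 10;
SELECT session_id, count(*) c FROM page_views group by session_id order by c desc limit 10;
EOF

# Run the script through beeline when it is available; otherwise just report it.
if command -v beeline >/dev/null 2>&1; then
  beeline -u jdbc:hive2://hadoop000:10000/default -n hadoop -f "$sql_file" \
    || echo "beeline exited with an error" >&2
else
  echo "beeline not found on PATH; queries written to $sql_file" >&2
fi
```

For a single short statement, the -e option from the same help output takes the query directly on the command line instead of a file.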
