
Using SparkSQL: the Spark SQL CLI

Spark SQL CLI overview

The Spark SQL CLI makes it more convenient to query Hive directly from SparkSQL through the Hive metastore. In the current version, the Spark SQL CLI cannot be used to interact with the Thrift server.

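Before using the Spark SQL CLI, note the following:

1. Copy the hive-site.xml configuration file into the $SPARK_HOME/conf directory;

2. Add the JDBC driver jar for the metastore database (MySQL in this example) to SPARK_CLASSPATH in $SPARK_HOME/conf/spark-env.sh: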

export SPARK_CLASSPATH=$SPARK_CLASSPATH:/home/Hadoop/software/mysql-connector-java-5.1.27-bin.jar
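Both prerequisites can be checked before launching the CLI. The sketch below is a hypothetical Python helper (`check_spark_sql_prereqs` is not part of Spark). It looks for conf/hive-site.xml under a given SPARK_HOME and for a mysql-connector-java jar on a colon-separated SPARK_CLASSPATH. It assumes the jar naming shown above and that spark-env.sh has been sourced, so the variables are present in the environment.

```python
import os


def check_spark_sql_prereqs(spark_home, spark_classpath):
    """Return a list of problems; an empty list means both checks passed.

    Hypothetical helper. spark_home stands for $SPARK_HOME and
    spark_classpath for $SPARK_CLASSPATH (which may be empty).
    """
    problems = []
    # Step 1: hive-site.xml must be present in $SPARK_HOME/conf.
    hive_site = os.path.join(spark_home, "conf", "hive-site.xml")
    if not os.path.isfile(hive_site):
        problems.append("missing " + hive_site)
    # Step 2: the colon-separated classpath should contain a MySQL JDBC
    # driver jar (assumed to be named mysql-connector-java*.jar) that exists.
    entries = [e for e in spark_classpath.split(":") if e]
    drivers = [e for e in entries
               if os.path.basename(e).startswith("mysql-connector-java")
               and e.endswith(".jar")]
    if not drivers:
        problems.append("no mysql-connector-java jar on SPARK_CLASSPATH")
    for jar in drivers:
        if not os.path.isfile(jar):
            problems.append("jar not found: " + jar)
    return problems


if __name__ == "__main__":
    home = os.environ.get("SPARK_HOME", "")
    cp = os.environ.get("SPARK_CLASSPATH", "")
    for p in check_spark_sql_prereqs(home, cp):
        print("WARNING:", p)
```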

Spark SQL CLI command-line options:

cd $SPARK_HOME/bin
spark-sql --help

Usage: ./bin/spark-sql [options] [cli option]
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Options:
  --master MASTER_URL         spark://host:port, mesos://host:port, yarn, or local.
  --deploy-mode DEPLOY_MODE   Whether to launch the driver program locally ("client") or
                              on one of the worker machines inside the cluster ("cluster")
                              (Default: client).
  --class CLASS_NAME          Your application's main class (for Java / Scala apps).
  --name NAME                 A name of your application.
  --jars JARS                 Comma-separated list of local jars to include on the driver
                              and executor classpaths.
  --py-files PY_FILES         Comma-separated list of .zip, .egg, or .py files to place
                              on the PYTHONPATH for Python apps.
  --files FILES               Comma-separated list of files to be placed in the working
                              directory of each executor.

  --conf PROP=VALUE           Arbitrary Spark configuration property.
  --properties-file FILE      Path to a file from which to load extra properties. If not
                              specified, this will look for conf/spark-defaults.conf.

  --driver-memory MEM         Memory for driver (e.g. 1000M, 2G) (Default: 512M).
  --driver-java-options       Extra Java options to pass to the driver.
  --driver-library-path       Extra library path entries to pass to the driver.
  --driver-class-path         Extra class path entries to pass to the driver. Note that
                              jars added with --jars are automatically included in the
                              classpath.

  --executor-memory MEM       Memory per executor (e.g. 1000M, 2G) (Default: 1G).

  --help, -h                  Show this help message and exit
  --verbose, -v               Print additional debug output

Spark standalone with cluster deploy mode only:
  --driver-cores NUM          Cores for driver (Default: 1).
  --supervise                 If given, restarts the driver on failure.

Spark standalone and Mesos only:
  --total-executor-cores NUM  Total cores for all executors.

YARN-only:
  --executor-cores NUM        Number of cores per executor (Default: 1).
  --queue QUEUE_NAME          The YARN queue to submit to (Default: "default").
  --num-executors NUM         Number of executors to launch (Default: 2).
  --archives ARCHIVES         Comma separated list of archives to be extracted into the
                              working directory of each executor.

CLI options:
 -d,--define <key=value>          Variable subsitution to apply to hive
                                  commands. e.g. -d A=B or --define A=B
    --database <databasename>     Specify the database to use
 -e <quoted-query-string>         SQL from command line
 -f <filename>                    SQL from files
 -h <hostname>                    connecting to Hive Server on remote host
    --hiveconf <property=value>   Use value for given property
    --hivevar <key=value>         Variable subsitution to apply to hive
                                  commands. e.g. --hivevar A=B
 -i <filename>                    Initialization SQL file
 -p <port>                        connecting to Hive Server on port number
 -S,--silent                      Silent mode in interactive shell
 -v,--verbose                     Verbose mode (echo executed SQL to the console)
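These options can be combined to run SQL non-interactively, for example from a script. Below is a minimal Python sketch that assumes spark-sql is on the PATH. `build_spark_sql_args` is a hypothetical helper that uses only flags listed in the help output above. Following the usage line, it places the spark-submit style options first and the CLI options after them.

```python
import shlex


def build_spark_sql_args(query=None, sql_file=None, master=None,
                         database=None, hiveconf=None, silent=False,
                         executable="spark-sql"):
    """Assemble an argv list for spark-sql (hypothetical helper)."""
    if (query is None) == (sql_file is None):
        raise ValueError("pass exactly one of query or sql_file")
    args = [executable]
    if master:                                    # --master MASTER_URL
        args += ["--master", master]
    if database:                                  # --database <databasename>
        args += ["--database", database]
    for key, value in (hiveconf or {}).items():   # --hiveconf <property=value>
        args += ["--hiveconf", f"{key}={value}"]
    if silent:                                    # -S,--silent
        args.append("-S")
    if query is not None:                         # -e <quoted-query-string>
        args += ["-e", query]
    else:                                         # -f <filename>
        args += ["-f", sql_file]
    return args


if __name__ == "__main__":
    argv = build_spark_sql_args(
        query="SELECT count(*) FROM page_views", master="yarn", silent=True)
    # Shell-quoted form, convenient for copying into a terminal.
    print(shlex.join(argv))
```

The returned list can be passed directly to subprocess.run(argv, check=True). Each element is handed to the process as-is, so the SQL string needs no extra shell quoting.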

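If no master is specified when starting spark-sql, it runs in local mode. The master can also be set to a standalone cluster address or to yarn.

When the master is set to yarn (spark-sql --master yarn), the execution of the whole job can be monitored in the YARN ResourceManager web UI at http://hadoop000:8088.

Note: if spark.master spark://hadoop000:7077 is configured in $SPARK_HOME/conf/spark-defaults.conf, spark-sql runs on the standalone cluster even when no master is specified at startup.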
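The resolution order described above can be modeled in a few lines. This is a simplified sketch, not Spark's own code. It considers only the --master flag and spark.master in spark-defaults.conf, which it parses as whitespace-separated key/value lines as in the example above. The local fallback value local[*] is an assumption about how spark-submit spells local mode.

```python
def parse_spark_defaults(text):
    """Parse spark-defaults.conf-style text: 'key value' per line, '#' comment lines."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)          # split on the first run of whitespace
        props[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return props


def effective_master(cli_master, defaults_text):
    """Simplified precedence: --master flag, then spark.master, then local mode."""
    if cli_master:
        return cli_master
    master = parse_spark_defaults(defaults_text).get("spark.master")
    return master if master else "local[*]"  # assumed local-mode default
```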

Using spark-sql

Start spark-sql and enter queries at the prompt. Since I have already configured spark.master spark://hadoop000:7077 in spark-defaults.conf, I did not specify a master when launching spark-sql:

cd $SPARK_HOME/bin
spark-sql

SELECT track_time, url, session_id, referer, ip, end_user_id, city_id FROM page_views WHERE city_id = -1000 limit 10;

SELECT session_id, count(*) c FROM page_views group by session_id order by c desc limit 10;
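To make the second query's semantics explicit, here is the same aggregation in plain Python over a list of session_id values. The helper is hypothetical and assumes the ids are non-NULL. Counter plays the role of GROUP BY session_id with count(*), and the sort implements ORDER BY c DESC LIMIT 10. SQL leaves the order of tied counts unspecified, so the sketch breaks ties by session_id only to make its output deterministic.

```python
from collections import Counter


def top_sessions(session_ids, n=10):
    """Python equivalent of:
    SELECT session_id, count(*) c FROM page_views
    GROUP BY session_id ORDER BY c DESC LIMIT n
    """
    counts = Counter(session_ids)            # GROUP BY session_id + count(*)
    # ORDER BY c DESC; ties broken by session_id for a deterministic result
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]                        # LIMIT n
```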

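The table used by the two SQL statements above already exists in my Hive instance. If it does not exist in yours, create it manually; the table creation and data loading scripts are as follows: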

create table page_views(
track_time string,
url string,
session_id string,
referer string,
ip string,
end_user_id string,
city_id string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

load data local inpath '/home/spark/software/data/page_views.dat' overwrite into table page_views;
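LOAD DATA only copies the file into the table's storage location without validating it. A malformed line in page_views.dat therefore surfaces later, at query time, as NULLs or shifted columns. The hypothetical helper below sketches how one line maps onto the seven columns under FIELDS TERMINATED BY '\t', which can be used to spot-check the file before loading. It is a simplified model: missing trailing fields become None (NULL) and extra fields are dropped, while escaping and the \N null marker are not handled.

```python
PAGE_VIEWS_COLUMNS = ["track_time", "url", "session_id", "referer",
                      "ip", "end_user_id", "city_id"]


def parse_page_view(line):
    """Split one line of page_views.dat into the table's columns.

    Simplified model of FIELDS TERMINATED BY '\\t': fields are split on tabs,
    missing trailing fields become None and extra fields are dropped.
    """
    fields = line.rstrip("\n").split("\t")
    # Pad short lines with None; for long lines this adds nothing and
    # zip() below truncates to the seven declared columns.
    fields += [None] * (len(PAGE_VIEWS_COLUMNS) - len(fields))
    return dict(zip(PAGE_VIEWS_COLUMNS, fields))
```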
