May 17, 2021 Spark Programming Guide
Spark SQL also supports interfaces for running SQL queries directly, without writing any code.
The Thrift JDBC/ODBC server implemented here corresponds to HiveServer2 in Hive 0.12. You can test the JDBC server with the beeline script that ships with Spark or Hive 0.12.
In the Spark directory, run the following command to start the JDBC/ODBC server:
./sbin/start-thriftserver.sh
This script accepts all bin/spark-submit command-line options, plus a --hiveconf option for specifying Hive properties. You can run ./sbin/start-thriftserver.sh --help to get a complete list of all available options. By default, the server listens on localhost:10000. You can override this with environment variables:
export HIVE_SERVER2_THRIFT_PORT=<listening-port>
export HIVE_SERVER2_THRIFT_BIND_HOST=<listening-host>
./sbin/start-thriftserver.sh \
--master <master-uri> \
...
Or override them with system properties:
./sbin/start-thriftserver.sh \
--hiveconf hive.server2.thrift.port=<listening-port> \
--hiveconf hive.server2.thrift.bind.host=<listening-host> \
--master <master-uri>
...
Now you can test the Thrift JDBC/ODBC server with beeline:
./bin/beeline
Here's how to connect to the Thrift JDBC/ODBC server:
beeline> !connect jdbc:hive2://localhost:10000
Beeline will ask for your username and password. In non-secure mode, simply enter your machine's username and a blank password. For secure mode, follow the instructions in the Beeline documentation.
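As a sketch, the same connection can also be made non-interactively using Beeline's standard command-line flags (-u for the JDBC URL, -n for the username, -p for the password, -e for a query to run); the SHOW TABLES query here is purely illustrative:

```shell
# Connect to the local Thrift server in non-secure mode with the current
# OS user and an empty password, run one query, and exit.
./bin/beeline -u jdbc:hive2://localhost:10000 -n "$USER" -p "" -e "SHOW TABLES;"
```

This assumes the Thrift JDBC/ODBC server started above is still running on localhost:10000.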
The Spark SQL CLI is a convenient tool that runs the Hive metastore service in local mode and executes queries entered on the command line. Note that the Spark SQL CLI cannot communicate with the Thrift JDBC server.
Run the following command in the Spark directory to start the Spark SQL CLI:
./bin/spark-sql
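For example, a single query can be executed non-interactively with the -e flag (a standard spark-sql option); the query shown is purely illustrative:

```shell
# Run one query and exit; the result is printed to stdout.
./bin/spark-sql -e "SELECT 1 + 1"
```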