May 17, 2021 Spark Programming Guide
Spark SQL also supports reading and writing data stored in Apache Hive. However, Hive has a large number of dependencies, so it is not included in the default Spark assembly. Spark must be rebuilt with the -Phive and -Phive-thriftserver build parameters to enable Hive support.
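For example, a rebuild using Spark's bundled Maven might look like the following (a sketch only; the exact build entry point and any extra profiles vary by Spark version and Hadoop distribution):

build/mvn -Phive -Phive-thriftserver -DskipTests clean package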
Note that this rebuilt assembly jar must be present on all worker nodes, because they need Hive's serialization and deserialization libraries (SerDes) to access data stored in Hive.
When working with Hive, developers need to create a HiveContext. HiveContext inherits from SQLContext and adds the ability to find tables in the metastore and to write queries in HiveQL, as the example below shows. Users who do not have an existing Hive deployment can still create a HiveContext.
When hive-site.xml is not configured, the context automatically creates the metastore_db database and the warehouse directory in the current working directory.
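If the warehouse should live somewhere other than the current directory, one option is to set the corresponding Hive property on the context before creating any tables. This is a minimal sketch; the path is illustrative, and whether it takes effect depends on the metastore not having been initialized yet:

// Point the Hive warehouse at an explicit location instead of ./warehouse.
// The path below is illustrative; replace it with a real directory.
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
hiveContext.setConf("hive.metastore.warehouse.dir", "/tmp/spark-warehouse")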
// sc is an existing SparkContext.
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
sqlContext.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src")
// Queries are expressed in HiveQL
sqlContext.sql("FROM src SELECT key, value").collect().foreach(println)