
Getting Started with Spark SQL


May 17, 2021 Spark Programming guide


The entry point into all of Spark SQL's functionality is the SQLContext class, or one of its subclasses. To create a basic SQLContext, all you need is a SparkContext.

val sc: SparkContext // An existing SparkContext.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

// createSchemaRDD is used to implicitly convert an RDD to a SchemaRDD.
import sqlContext.createSchemaRDD
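
With that implicit in scope, an RDD of case-class objects can be used directly as a SchemaRDD. The following sketch registers such an RDD as a temporary table and queries it; the Person case class and the people.txt path are only illustrative:

// A hypothetical case class describing the rows of the input file.
case class Person(name: String, age: Int)

// The imported implicit converts this RDD[Person] into a SchemaRDD.
val people = sc.textFile("examples/src/main/resources/people.txt")
  .map(_.split(","))
  .map(p => Person(p(0), p(1).trim.toInt))

// Register it as a table so it can be queried with SQL.
people.registerTempTable("people")
val teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
teenagers.map(t => "Name: " + t(0)).collect().foreach(println)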

In addition to the basic SQLContext, you can also create a HiveContext, which provides a superset of the functionality of the basic SQLContext. Its additional features include the ability to write queries with the more complete HiveQL parser, access to Hive UDFs, and the ability to read data from Hive tables. You do not need an existing Hive installation to use a HiveContext, and every data source available to SQLContext is also available to HiveContext. HiveContext is packaged separately so that the default Spark build does not pull in all of Hive's dependencies. If these dependencies are not a problem for your application, Spark 1.2 recommends using HiveContext. Future stable versions will focus on bringing SQLContext up to feature parity with HiveContext.
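
As a minimal sketch, a HiveContext is created from the same SparkContext; the table name src below is only illustrative:

// HiveContext accepts HiveQL statements through its sql method.
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)

hiveContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
hiveContext.sql("SELECT key, value FROM src").collect().foreach(println)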

The specific variant of SQL used to parse query statements can be selected with the spark.sql.dialect option. This parameter can be changed in two ways: programmatically with the setConf method, or with the SQL command SET key=value. For a SQLContext, the only available dialect is "sql", a simple SQL parser provided by Spark SQL. In a HiveContext, "sql" is also supported, but the default dialect is "hiveql", because the HiveQL parser is more complete; "hiveql" is recommended for most use cases.
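
For example, either of the following switches the dialect (a sketch assuming the sqlContext and hiveContext objects created above):

// Programmatically, via setConf.
sqlContext.setConf("spark.sql.dialect", "sql")

// Or through a SQL SET command; "hiveql" is only meaningful on a HiveContext.
hiveContext.sql("SET spark.sql.dialect=hiveql")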