
How it works...
To create a SparkSession, we will use the Builder class (accessed via the .builder property of the SparkSession class). You can specify some basic properties of the SparkSession here:
- The .master(...) method allows you to specify the master URL that the session connects to (in our preceding example, we would be running a local session with two cores)
- The .appName(...) method lets you specify a friendly name for your app
- The .config(...) method allows you to refine your session's behavior further; the most important SparkSession parameters are outlined in the following table
- The .getOrCreate() method returns a new SparkSession if one has not been created yet; otherwise, it returns a reference to the already existing SparkSession
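The builder calls listed above can be sketched as follows. This is a minimal illustration rather than the recipe's own code: the helper names (local_master, build_session), the application name, and the two configuration values are assumptions chosen for the example, and PySpark is imported inside the function so that the helpers can be loaded even where PySpark is not installed.

```python
def local_master(cores):
    """Return the master URL for a local session with `cores` worker threads."""
    return f"local[{cores}]"


# Illustrative settings only; the values are assumptions, not recommendations.
EXAMPLE_CONF = {
    "spark.executor.memory": "2g",
    "spark.sql.shuffle.partitions": "4",
}


def build_session(app_name="my-app", cores=2, conf=None):
    """Build (or fetch) a local SparkSession using the Builder class."""
    # Imported lazily: requires PySpark and a Java runtime when actually called.
    from pyspark.sql import SparkSession

    builder = (
        SparkSession.builder
        .master(local_master(cores))  # e.g. "local[2]" for two cores
        .appName(app_name)            # friendly name for the application
    )
    # Apply each configuration key/value pair through .config(...)
    for key, value in (EXAMPLE_CONF if conf is None else conf).items():
        builder = builder.config(key, value)

    # Returns the existing session if one is already active in this process
    return builder.getOrCreate()
```

Calling build_session() a second time in the same process should hand back the already existing session instead of starting another one, which is the behavior .getOrCreate() is described as providing.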
The following table gives an example list of the most useful configuration parameters for a local instance of Spark:
Some of these parameters are also applicable if you are working in a cluster environment with multiple worker nodes. In the next recipe, we will explain how to set up and administer a multi-node Spark cluster deployed over YARN.

Some environment variables also allow you to fine-tune your Spark environment, namely PYSPARK_DRIVER_PYTHON and PYSPARK_DRIVER_PYTHON_OPTS. We have already covered these in the Installing Spark from sources recipe.
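As a rough sketch of how these two variables might be set from Python before starting the pyspark launcher: the values used here (jupyter and notebook) are an assumption for illustration and are not taken from this section, and the helper names are hypothetical. The example also assumes that the pyspark script is available on your PATH.

```python
import os
import subprocess


def notebook_driver_env(base_env=None):
    """Return a copy of the environment with the PySpark driver variables set."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumed values: run the driver through Jupyter and open a notebook.
    env["PYSPARK_DRIVER_PYTHON"] = "jupyter"
    env["PYSPARK_DRIVER_PYTHON_OPTS"] = "notebook"
    return env


def launch_pyspark(env):
    """Start the pyspark launcher with the given environment."""
    # Assumes the `pyspark` script is on PATH; raises if it exits with an error.
    return subprocess.run(["pyspark"], env=env, check=True)
```

Building the environment as a copy leaves the variables of the current process untouched, so the settings only apply to the launched pyspark process.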