
Notes on getting Spark up and running on my Windows box.

Inspiration stolen/taken from: Running Spark Applications on Windows - Jacek Laskowski



Install/Configure WinUtils

Set paths:

Note: Set the paths in Control Panel so that they persist across sessions.

  1. Start: Control Panel
  2. Search: path
  3. Click: “Edit the system environment variables”
  4. Click: “Environment Variables”
  5. Under “User Variables for <user>”, Click: New
    1. Variable Name: HADOOP_HOME
    2. Variable Value: C:\APPLICATIONS\hadoop
  6. Under “User Variables for <user>”, Find: Path, Click: Edit, then add a new entry:
    1. Paste in: C:\APPLICATIONS\hadoop\bin
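
A quick way to sanity-check the two settings above is a small script. This is only a sketch and not part of the original setup: it assumes Python 3 happens to be installed on the box, and the function name check_hadoop_env is made up for illustration. It only checks that HADOOP_HOME is set and that its bin folder appears on the Path; it does not check that winutils.exe is actually in that folder.

```python
import os
from pathlib import PureWindowsPath


def check_hadoop_env(env):
    """Return a list of problems found in the given environment mapping.

    Hypothetical helper: `env` is a plain dict with "HADOOP_HOME" and "PATH".
    """
    problems = []
    hadoop_home = env.get("HADOOP_HOME")
    if not hadoop_home:
        problems.append("HADOOP_HOME is not set")
        return problems
    # PureWindowsPath compares case-insensitively, so c:\ and C:\ match.
    expected_bin = PureWindowsPath(hadoop_home) / "bin"
    entries = [PureWindowsPath(p) for p in env.get("PATH", "").split(";") if p]
    if expected_bin not in entries:
        problems.append(f"{expected_bin} is not on PATH")
    return problems


if __name__ == "__main__":
    # Open a NEW command prompt first: running prompts keep the old variables.
    env = {
        "HADOOP_HOME": os.environ.get("HADOOP_HOME"),
        "PATH": os.environ.get("PATH", ""),
    }
    for line in check_hadoop_env(env) or ["HADOOP_HOME and PATH look consistent"]:
        print(line)
```

An empty result from check_hadoop_env means both settings agree with each other.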

Configure & Test:

winutils.exe chmod -R 777 C:\tmp\hive

winutils.exe ls -F C:\tmp\hive

Install/Configure Spark

Run it

cd C:\APPLICATIONS\Spark\spark-2.2.0-bin-hadoop2.7\bin

Custom run (local master, 2 worker threads):

spark-shell2.cmd --master local[2]


Using Spark's default log4j profile: org/apache/spark/
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/11/27 13:21:18 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/11/27 13:21:21 WARN General: Plugin (Bundle) "org.datanucleus" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/APPLICATIONS/Spark/spark-2.2.0-bin-hadoop2.7/jars/datanucleus-core-3.2.10.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/APPLICATIONS/Spark/spark-2.2.0-bin-hadoop2.7/bin/../jars/datanucleus-core-3.2.10.jar."
17/11/27 13:21:21 WARN General: Plugin (Bundle) "" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/APPLICATIONS/Spark/spark-2.2.0-bin-hadoop2.7/bin/../jars/datanucleus-rdbms-3.2.9.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/APPLICATIONS/Spark/spark-2.2.0-bin-hadoop2.7/jars/datanucleus-rdbms-3.2.9.jar."
17/11/27 13:21:21 WARN General: Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/APPLICATIONS/Spark/spark-2.2.0-bin-hadoop2.7/bin/../jars/datanucleus-api-jdo-3.2.6.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/APPLICATIONS/Spark/spark-2.2.0-bin-hadoop2.7/jars/datanucleus-api-jdo-3.2.6.jar."
17/11/27 13:21:25 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at
Spark context available as 'sc' (master = local[*], app id = local-1511788879839).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_151)
Type in expressions to have them evaluated.
Type :help for more information.
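
On the --master value: in Spark, local means one worker thread, local[K] means K worker threads, and local[*] means one per logical core. The banner above reports master = local[*], so it was presumably captured from a run without the --master flag. The sketch below just illustrates how those strings map to a thread count. It is not Spark's own parser, the function name local_thread_count is made up, and it ignores other forms such as local[K,F].

```python
import os
import re


def local_thread_count(master):
    """Return the worker-thread count implied by a simple local master URL.

    Illustrative only; handles "local", "local[K]" and "local[*]".
    """
    if master == "local":
        return 1
    match = re.fullmatch(r"local\[(\*|\d+)\]", master)
    if match is None:
        raise ValueError(f"not a simple local master URL: {master!r}")
    value = match.group(1)
    # "*" stands for one thread per logical core on this machine.
    return (os.cpu_count() or 1) if value == "*" else int(value)


if __name__ == "__main__":
    for master in ("local", "local[2]", "local[*]"):
        print(master, "->", local_thread_count(master))
```

Inside a running spark-shell, sc.master shows which value was actually picked up.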

Test command:

spark.range(1).withColumn("status", lit("All seems fine. Congratulations!")).show(false)

