spark:windows_dev_environment [wiki.roman-halliday.com]

Notes on getting Spark up and running on my Windows box.

Inspiration stolen/taken from: [[https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-tips-and-tricks-running-spark-windows.html|Running Spark Applications on Windows - Jacek Laskowski]]

====== Download ======
Download:
  * Spark: [[https://spark.apache.org/downloads.html]]
  * WinUtils: 
    * Actual ''.exe'': [[https://github.com/steveloughran/winutils/blob/master/hadoop-2.7.1/bin/winutils.exe]]
    * All versions: [[https://github.com/steveloughran/winutils]]

====== Install/Configure WinUtils ======

  * Copy ''winutils.exe'' to: ''C:\APPLICATIONS\hadoop\bin''

Set paths:
<code>
set HADOOP_HOME=C:\APPLICATIONS\hadoop

set PATH=%HADOOP_HOME%\bin;%PATH%

echo %HADOOP_HOME%
</code>
**Note:** ''set'' only affects the current command prompt session. For persistence, set the variables in Control Panel:
  - Start: ''control panel''
  - Search: ''path''
  - Click: "Edit the system environment variables"
  - Click: "Environment Variables":
  - Under "User Variables for <user>", Click: ''New''
    - Variable Name: ''HADOOP_HOME''
    - Variable Value: ''C:\APPLICATIONS\hadoop''
  - Under "User Variables for <user>", Find: ''Path'', click ''Edit'', then:
    - Paste in: ''C:\APPLICATIONS\hadoop\bin''

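Configure & Test (Spark's Hive support needs a writable ''C:\tmp\hive'', and the directory must exist before ''chmod'' is run):
<code>
rem Create the Hive scratch directory if it is missing
if not exist C:\tmp\hive mkdir C:\tmp\hive

winutils.exe chmod -R 777 C:\tmp\hive

winutils.exe ls -F C:\tmp\hive
</code>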
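
Once a Scala REPL is available (''spark-shell'', set up in the next section, works), the WinUtils setup can also be checked from Scala. This is only a sketch: ''winutilsPresent'' is a made-up helper name, not part of Spark or Hadoop, and it only checks that the file exists, not that it runs.

<code scala>
import java.nio.file.{Files, Paths}

// Hypothetical helper, not part of Spark or Hadoop: true when
// <hadoopHome>\bin\winutils.exe exists as a regular file.
def winutilsPresent(hadoopHome: String): Boolean =
  Files.isRegularFile(Paths.get(hadoopHome, "bin", "winutils.exe"))

// Read HADOOP_HOME from the environment of the current process.
sys.env.get("HADOOP_HOME") match {
  case Some(home) => println(s"HADOOP_HOME=$home winutils.exe present: ${winutilsPresent(home)}")
  case None       => println("HADOOP_HOME is not set for this process")
}
</code>

If ''HADOOP_HOME'' shows as not set, the shell was probably started from a prompt opened before the variable was saved.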
====== Install/Configure Spark ======

  * Extract the downloaded Spark archive under: ''C:\APPLICATIONS\Spark''
  * I ended up with ''bin'' under: ''C:\APPLICATIONS\Spark\spark-2.2.0-bin-hadoop2.7''

===== Run it =====
<code>
cd C:\APPLICATIONS\Spark\spark-2.2.0-bin-hadoop2.7\bin
spark-shell2.cmd
</code>

Custom run (local master with 2 worker threads):
<code>
spark-shell2.cmd --master local[2]
</code>

Output (from the default run, hence ''master = local[*]''):
<code>
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/11/27 13:21:18 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/11/27 13:21:21 WARN General: Plugin (Bundle) "org.datanucleus" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/APPLICATIONS/Spark/spark-2.2.0-bin-hadoop2.7/jars/datanucleus-core-3.2.10.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/APPLICATIONS/Spark/spark-2.2.0-bin-hadoop2.7/bin/../jars/datanucleus-core-3.2.10.jar."
17/11/27 13:21:21 WARN General: Plugin (Bundle) "org.datanucleus.store.rdbms" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/APPLICATIONS/Spark/spark-2.2.0-bin-hadoop2.7/bin/../jars/datanucleus-rdbms-3.2.9.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/APPLICATIONS/Spark/spark-2.2.0-bin-hadoop2.7/jars/datanucleus-rdbms-3.2.9.jar."
17/11/27 13:21:21 WARN General: Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/APPLICATIONS/Spark/spark-2.2.0-bin-hadoop2.7/bin/../jars/datanucleus-api-jdo-3.2.6.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/APPLICATIONS/Spark/spark-2.2.0-bin-hadoop2.7/jars/datanucleus-api-jdo-3.2.6.jar."
17/11/27 13:21:25 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at http://172.18.51.7:4040
Spark context available as 'sc' (master = local[*], app id = local-1511788879839).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_151)
Type in expressions to have them evaluated.
Type :help for more information.
</code>
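
The banner above reports ''master = local[*]''. To confirm which master a session is actually using, for example after starting with ''--master local[2]'', something like the following can be pasted at the ''scala>'' prompt. It is a small sketch that only reads values from ''sc'', the SparkContext the shell creates. For ''local[N]'' the default parallelism is normally N, unless ''spark.default.parallelism'' has been set.

<code scala>
// Paste into a running spark-shell; `sc` is the SparkContext the shell creates.
println(s"master             = ${sc.master}")
println(s"defaultParallelism = ${sc.defaultParallelism}")
println(s"Spark version      = ${sc.version}")
</code>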

Test command:
<code>
spark.range(1).withColumn("status", lit("All seems fine. Congratulations!")).show(false)
</code>
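
A slightly bigger smoke test, as a sketch: it builds a small DataFrame, adds a derived column and aggregates it, which exercises a real job rather than a single literal row. The names ''df'', ''squared'' and ''total'' are just illustrative. The explicit import is there so the snippet does not depend on what the shell has already imported.

<code scala>
// Paste into a running spark-shell; `spark` is the SparkSession the shell creates.
import org.apache.spark.sql.functions.{col, sum}

// ids 1..5 in a column named "id", plus a derived column
val df = spark.range(1, 6).withColumn("squared", col("id") * col("id"))
df.show()

// Aggregate to a single value: 1 + 4 + 9 + 16 + 25 = 55
val total = df.agg(sum("squared")).first().getLong(0)
println(s"sum of squares = $total")
</code>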

Exit:
<code>
sys.exit
</code>

----
  * Web Interface (only available while the shell is running): [[http://localhost:4040/jobs/]]
    * Configuration: [[http://localhost:4040/environment/]]
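
The links above assume the default port 4040. If that port is already in use (for example by a second shell), Spark normally binds the next free port instead, so it can be safer to ask the running session for its address. A minimal sketch using ''sc.uiWebUrl'':

<code scala>
// Paste into a running spark-shell.
// `uiWebUrl` is an Option[String]: None when the UI is disabled.
sc.uiWebUrl match {
  case Some(url) => println(s"Web UI: $url")
  case None      => println("Web UI is disabled for this session")
}
</code>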