Notes on getting Spark up and running on my Windows box.
Inspiration stolen/taken from: [[https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-tips-and-tricks-running-spark-windows.html|Running Spark Applications on Windows - Jacek Laskowski]]
====== Download ======
Download:
  * Spark: [[https://spark.apache.org/downloads.html]]
  * WinUtils:
    * Actual ''.exe'': [[https://github.com/steveloughran/winutils/blob/master/hadoop-2.7.1/bin/winutils.exe]]
    * All versions: [[https://github.com/steveloughran/winutils]]
====== Install/Configure WinUtils ======
  * Copy ''winutils.exe'' to: ''C:\APPLICATIONS\hadoop\bin''
Set paths:
<code>
set HADOOP_HOME=c:\APPLICATIONS\hadoop
set PATH=%HADOOP_HOME%\bin;%PATH%
echo %HADOOP_HOME%
</code>
**Note:** The ''set'' commands above only last for the current session; for persistence, set the variables in the Control Panel:
  - Start: ''control panel''
  - Search: ''path''
  - Click: "Edit the system environment variables"
  - Click: "Environment Variables":
    - Under "User Variables for ", Click: ''New''
      - Variable Name: ''HADOOP_HOME''
      - Variable Value: ''c:\APPLICATIONS\hadoop''
    - Under "User Variables for ", Find: ''Path'', click ''Edit'', then:
      - Paste in: ''C:\APPLICATIONS\hadoop\bin''
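As a sanity check (my own addition, not part of the original guide), any JVM started after this point should see the variables; once you have a spark-shell up later, you can confirm from plain Scala:

<code scala>
// Check that HADOOP_HOME reached the JVM's environment.
println(sys.env.getOrElse("HADOOP_HOME", "HADOOP_HOME not set"))

// And that the hadoop bin directory made it onto PATH.
val hadoopPathEntries = sys.env("PATH").split(";").filter(_.toLowerCase.contains("hadoop"))
println(hadoopPathEntries.mkString(", "))
</code>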
Configure & test (create ''C:\tmp\hive'' first if it does not exist):
<code>
mkdir C:\tmp\hive
winutils.exe chmod -R 777 C:\tmp\hive
winutils.exe ls -F C:\tmp\hive
</code>
====== Install/Configure Spark ======
  * Unpack the downloaded archive under: ''C:\APPLICATIONS\Spark''
    * I ended up with ''bin'' under: ''C:\APPLICATIONS\Spark\spark-2.2.0-bin-hadoop2.7''
===== Run it =====
<code>
cd C:\APPLICATIONS\Spark\spark-2.2.0-bin-hadoop2.7\bin
spark-shell2.cmd
</code>
Custom run (local master, 2 worker threads):
<code>
spark-shell2.cmd --master local[2]
</code>
Output:
<code>
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/11/27 13:21:18 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/11/27 13:21:21 WARN General: Plugin (Bundle) "org.datanucleus" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/APPLICATIONS/Spark/spark-2.2.0-bin-hadoop2.7/jars/datanucleus-core-3.2.10.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/APPLICATIONS/Spark/spark-2.2.0-bin-hadoop2.7/bin/../jars/datanucleus-core-3.2.10.jar."
17/11/27 13:21:21 WARN General: Plugin (Bundle) "org.datanucleus.store.rdbms" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/APPLICATIONS/Spark/spark-2.2.0-bin-hadoop2.7/bin/../jars/datanucleus-rdbms-3.2.9.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/APPLICATIONS/Spark/spark-2.2.0-bin-hadoop2.7/jars/datanucleus-rdbms-3.2.9.jar."
17/11/27 13:21:21 WARN General: Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/APPLICATIONS/Spark/spark-2.2.0-bin-hadoop2.7/bin/../jars/datanucleus-api-jdo-3.2.6.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/APPLICATIONS/Spark/spark-2.2.0-bin-hadoop2.7/jars/datanucleus-api-jdo-3.2.6.jar."
17/11/27 13:21:25 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at http://172.18.51.7:4040
Spark context available as 'sc' (master = local[*], app id = local-1511788879839).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_151)
Type in expressions to have them evaluated.
Type :help for more information.
</code>
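Once the shell is up, you can confirm which master the context actually picked up (the log above shows ''local[*]'' for the default run; with ''--master local[2]'' it should report ''local[2]''):

<code scala>
// Print the master URL the running SparkContext was started with.
println(sc.master)

// And the app id, for cross-checking against the startup log.
println(sc.applicationId)
</code>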
Test command:
<code scala>
spark.range(1).withColumn("status", lit("All seems fine. Congratulations!")).show(false)
</code>
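A slightly fuller smoke test (my own addition, not from the original guide), exercising both the ''sc'' and ''spark'' entry points the shell pre-creates:

<code scala>
// RDD path: sum 1..100 across local partitions.
val total = sc.parallelize(1 to 100).reduce(_ + _)
println(s"sum 1..100 = $total")   // 5050

// DataFrame path: keep the even ids from a small range.
// ($"id" works because spark-shell auto-imports spark.implicits._)
val evens = spark.range(10).filter($"id" % 2 === 0)
evens.show()
println(s"even count = ${evens.count()}")   // 5
</code>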
Exit (or use '':quit''):
<code scala>
sys.exit
</code>
----
  * Web Interface: [[http://localhost:4040/jobs/]]
  * Configuration: [[http://localhost:4040/environment/]]