Notes on getting Spark up and running on my Windows box.
Inspiration stolen/taken from: Running Spark Applications on Windows - Jacek Laskowski
Download
- WinUtils (all versions): https://github.com/steveloughran/winutils
- Spark (pre-built for Hadoop 2.7): https://spark.apache.org/downloads.html
Install/Configure WinUtils
- Copy winutils.exe to: C:\APPLICATIONS\hadoop\bin
Set paths:
set HADOOP_HOME=c:\APPLICATIONS\hadoop
set PATH=%HADOOP_HOME%\bin;%PATH%
echo %HADOOP_HOME%
Note: set these in the Control Panel so they persist across sessions.
- Start: Control Panel
- Search: path
- Click: “Edit the system environment variables”
- Click: “Environment Variables”
- Under “User Variables for <user>”, click: New
  - Variable Name: HADOOP_HOME
  - Variable Value: c:\APPLICATIONS\hadoop
- Under “User Variables for <user>”, find Path, click Edit, then:
  - Paste in: C:\APPLICATIONS\hadoop\bin
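As an alternative to clicking through the GUI, the same persistent user variables can be set from a command prompt with setx. A sketch (note: setx only affects NEW console windows, and `%PATH%` here expands the combined system+user path, so this can duplicate entries; setx also truncates values longer than 1024 characters):

```shell
:: Persist HADOOP_HOME for the current user (takes effect in new consoles)
setx HADOOP_HOME c:\APPLICATIONS\hadoop

:: Append the winutils bin directory to the user PATH
setx PATH "%PATH%;C:\APPLICATIONS\hadoop\bin"
```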
Configure & Test:
winutils.exe chmod -R 777 C:\tmp\hive
winutils.exe ls -F C:\tmp\hive
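The chmod step assumes C:\tmp\hive already exists; on a fresh box it usually doesn't. A sketch of the full sequence, creating the Hive scratch directory first:

```shell
:: Create the Hive scratch directory if it does not exist yet
if not exist C:\tmp\hive mkdir C:\tmp\hive

:: Make it world-writable so Spark SQL / Hive can use it
winutils.exe chmod -R 777 C:\tmp\hive

:: Verify: the listing should show drwxrwxrwx on the directory
winutils.exe ls -F C:\tmp\hive
```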
Install/Configure Spark
- Extract the downloaded Spark archive under: C:\APPLICATIONS\Spark
- I ended up with bin under: C:\APPLICATIONS\Spark\spark-2.2.0-bin-hadoop2.7
Run it
cd C:\APPLICATIONS\Spark\spark-2.2.0-bin-hadoop2.7\bin
spark-shell2.cmd
Output:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/11/27 13:21:18 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/11/27 13:21:21 WARN General: Plugin (Bundle) "org.datanucleus" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/APPLICATIONS/Spark/spark-2.2.0-bin-hadoop2.7/jars/datanucleus-core-3.2.10.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/APPLICATIONS/Spark/spark-2.2.0-bin-hadoop2.7/bin/../jars/datanucleus-core-3.2.10.jar."
17/11/27 13:21:21 WARN General: Plugin (Bundle) "org.datanucleus.store.rdbms" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/APPLICATIONS/Spark/spark-2.2.0-bin-hadoop2.7/bin/../jars/datanucleus-rdbms-3.2.9.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/APPLICATIONS/Spark/spark-2.2.0-bin-hadoop2.7/jars/datanucleus-rdbms-3.2.9.jar."
17/11/27 13:21:21 WARN General: Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/APPLICATIONS/Spark/spark-2.2.0-bin-hadoop2.7/bin/../jars/datanucleus-api-jdo-3.2.6.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/APPLICATIONS/Spark/spark-2.2.0-bin-hadoop2.7/jars/datanucleus-api-jdo-3.2.6.jar."
17/11/27 13:21:25 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at http://172.18.51.7:4040
Spark context available as 'sc' (master = local[*], app id = local-1511788879839).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_151)
Type in expressions to have them evaluated.
Type :help for more information.
Test command:
spark.range(1).withColumn("status", lit("All seems fine. Congratulations!")).show(false)
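If that works, a slightly bigger smoke test can be pasted into the same spark-shell session to exercise a DataFrame aggregation through the local[*] master. A sketch (spark-shell already has spark and the implicits in scope, so toDF works as-is):

```scala
// Tiny DataFrame built from an in-memory Seq
val df = Seq(("spark", 1), ("hadoop", 2), ("spark", 3)).toDF("word", "n")

// A groupBy/sum forces a real shuffle job; watch it appear at localhost:4040
df.groupBy("word").sum("n").show()
```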
Exit:
sys.exit
- Web Interface: http://localhost:4040/jobs/
- Configuration: http://localhost:4040/environment/