SE6023 Lab1 Tutorial
Remote Hadoop Cluster Environment
- Ubuntu 18.04
- Hadoop 3.1.1. For more details, visit the Ambari Web UI through the links below.
- Java 8 (OpenJDK)
- You can access other users' home directories if you are in the same group.
- Every account has a storage quota of about 10 GB.
- If you can't connect to or work properly with the remote environment, email the TA or v72807647 at gmail dot com.
Handy URLs
- If the https:// prefix is missing from a URL and the page fails to load, add https:// back manually.
- To access logs or files on the hadoop-slave* machines, use lynx (a text-mode web browser) or scp/rsync from the remote machine.
Recommended Developing Environment
For Windows users, please search for “Installation of OpenSSH For Windows” or run a Linux distribution in a VM.
This guide uses Ubuntu Desktop unless another OS is mentioned.
- Recommended editor/IDE: IntelliJ (recommended) / vim / Eclipse
- Java SDK (OpenJDK/Oracle) >= 8
Must-have skills
- ssh/scp (OpenSSH/PuTTY/WinSCP/FileZilla on Windows)
- Java
Basic Linux Commands/VSCode-SFTP Integration
Create Your First Hadoop Program: WordCount
If you are not familiar with the working environment, try a plain text editor first. This section covers all the commands you need to run the example code successfully.
More experienced users may want to use an IDE with Maven to speed up development.
Using Plain Text Editor
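- If you created /user/apple/lab1/output before, delete it before the next execution with hadoop fs -rm -r /user/apple/lab1/output (the older hadoop fs -rmr form is deprecated). The job fails if the output directory already exists.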
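The cleanup step can be wrapped in a small helper so it is safe to run before every job. This is only a sketch: the function name clean_output is made up for illustration, and the default path is the apple account's lab1 output directory used in this guide. It relies on hadoop fs -test -d, which exits with 0 only when the path exists and is a directory.

```shell
# Hypothetical helper: remove a job's HDFS output directory if it exists.
# The default path is the one used in this guide; pass your own as $1.
clean_output() {
    output_dir="${1:-/user/apple/lab1/output}"
    # `hadoop fs -test -d` succeeds only for an existing directory,
    # so nothing is deleted (and no error is printed) on a first run.
    if hadoop fs -test -d "$output_dir"; then
        hadoop fs -rm -r "$output_dir"
    fi
}

# Usage on the remote machine:
#   clean_output /user/apple/lab1/output
```

Checking first avoids the "No such file or directory" message that a bare hadoop fs -rm -r prints when the directory was never created.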
For example, I logged in with the apple account and got the WordCount.java source code from here.
$ ssh apple@pdc19.csie.ncu.edu.tw
$ nano WordCount.java
$ hadoop com.sun.tools.javac.Main WordCount.java
$ jar cf wc.jar WordCount*.class
$ hadoop fs -mkdir -p /user/apple/lab1/input
$ echo "Hello World Bye World" > file1
$ echo "Hello Hadoop Goodbye Hadoop" > file2
$ hadoop fs -copyFromLocal ./file1 /user/apple/lab1/input/file1
$ hadoop fs -copyFromLocal ./file2 /user/apple/lab1/input/file2
$ hadoop jar wc.jar WordCount /user/apple/lab1/input /user/apple/lab1/output
$ hadoop fs -cat /user/apple/lab1/output/part-r-00000
Bye 1
Goodbye 1
Hadoop 2
Hello 2
World 2
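If you want to double-check the result without the cluster, the same counts can be reproduced locally with standard Unix tools. The sketch below is not part of the lab; the function name local_wordcount is made up, and it assumes words are separated by spaces or tabs, as in file1 and file2. It prints each word and its count separated by a tab, which is how Hadoop's default text output separates key and value.

```shell
# Hypothetical helper: count words in local files without Hadoop.
local_wordcount() {
    cat "$@" |
        tr -s ' \t' '\n' |           # one word per line
        sed '/^$/d' |                # drop empty lines
        LC_ALL=C sort |              # byte order, so equal words are adjacent
        uniq -c |                    # "<count> <word>"
        awk '{ print $2 "\t" $1 }'   # "<word><TAB><count>"
}

# Usage, in the directory that holds file1 and file2:
#   local_wordcount file1 file2
```

For the two sample files this gives the same five words and counts as the part-r-00000 listing above.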
Using IntelliJ with Maven
About Maven
…Maven dynamically downloads Java libraries and Maven plug-ins from one or more repositories such as the Maven 2 Central Repository, and stores them in a local cache…
Install maven and IntelliJ
Create a Maven Project
Feel free to fill in
Add the following code between <project>...</project> in pom.xml so that Maven automatically downloads the Hadoop-related dependencies.
<repositories>
    <repository>
        <id>apache</id>
        <url>http://maven.apache.org</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-core</artifactId>
        <version>1.2.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>3.1.0</version>
    </dependency>
    <dependency>
        <groupId>org.jetbrains</groupId>
        <artifactId>annotations</artifactId>
        <version>RELEASE</version>
        <scope>compile</scope>
    </dependency>
</dependencies>
Create a new class called WordCount
Paste the WordCount code and click the refresh button. You will see Maven downloading the dependencies.
Generate .class (Optional)
Before compiling, we recommend setting the bytecode version to 8 or higher.
Compile the code with Build > Build Project; WordCount.class will be generated under target/classes.
Generate .jar (Optional)
Create Artifacts first, then choose Build > Build Artifacts; WordCount.jar will be generated.
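The same build can also be run from a terminal instead of the IDE menus. The sketch below assumes Maven is installed and that you are in the directory containing pom.xml; the function name build_with_maven is made up. The maven.compiler.source and maven.compiler.target properties are the standard way to ask the Maven compiler plugin for Java 8, but they may be overridden if your pom.xml configures the compiler plugin explicitly.

```shell
# Hypothetical helper: build the project from the command line.
# Run it inside the directory that contains pom.xml.
build_with_maven() {
    # 1.8 requests Java 8 source level and Java 8 bytecode.
    mvn -Dmaven.compiler.source=1.8 -Dmaven.compiler.target=1.8 clean package
}

# After a successful build:
#   target/classes/   compiled .class files
#   target/*.jar      packaged JAR, named after artifactId and version in pom.xml
```

Note that the JAR produced this way is named by Maven, so it will not be called WordCount.jar unless your artifactId and version happen to produce that name.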
Troubleshooting
Using Eclipse
Google is your BFF.
Upload file to Remote Server (Optional)
Method 1: Using SCP (for Windows 10/Linux)
scp file1 file2 apple@pdc19.csie.ncu.edu.tw:./
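Uploading a JAR and starting the job can be combined into one step from your local machine. This is a rough sketch, not part of the lab: the function name upload_and_run is made up, the account and host are the ones used in this guide, and it assumes the hadoop command is on the PATH for non-interactive SSH sessions on the server (if it is not, log in with ssh and run hadoop jar by hand).

```shell
# Account and host from this guide; change the account to your own.
REMOTE="apple@pdc19.csie.ncu.edu.tw"

# Hypothetical helper: copy a JAR to the remote home directory, then run it.
# Arguments: <jar file> <main class> <HDFS input dir> <HDFS output dir>
upload_and_run() {
    jar_file="$1"; main_class="$2"; input_dir="$3"; output_dir="$4"
    # Only start the job if the upload succeeded.
    scp "$jar_file" "$REMOTE:./" &&
        ssh "$REMOTE" "hadoop jar $(basename "$jar_file") $main_class $input_dir $output_dir"
}

# Usage:
#   upload_and_run wc.jar WordCount /user/apple/lab1/input /user/apple/lab1/output
```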
Method 2: IntelliJ Ultimate
The Ultimate version of IntelliJ has a useful feature: it can upload the compiled JAR directly to an SSH (SFTP) server. You can follow the setup guides on Remote Server Configuration and Uploading and Downloading Files.
However, you need a student account to activate the Ultimate version. If you do not have one, follow this link and register with your NCU mail (or @g.ncu.edu.tw).
Method 3: Copy-and-paste in a text editor
Simple but always useful. Just launch nano/vim and paste with a right mouse click or Ctrl+Shift+V (in most Linux graphical terminals).
Handy URLs
pdc19.csie.ncu.edu.tw
https://pdc19.csie.ncu.edu.tw:18763
https://pdc19.csie.ncu.edu.tw:18764
https://pdc19.csie.ncu.edu.tw:18765
Install maven and IntelliJ
sudo apt install maven
Troubleshooting
File -> Project Structure -> Project Settings -> Modules -> "Your Module Name" -> Sources -> Language Level, select 8
Upload file to Remote Server (Optional)
Method 1: Using SCP (for Windows 10/Linux)
Use WinSCP to do this in a graphical way.
You can integrate scp with VSCode.
References for some useful tools
Vim
>>> Terminal and Vim Tutorial <<<
>>> Vim Tutorial <<<