SE6023 Lab3 HBase

Introducing HBase

Data Model and APIs

Each cell in an HBase table is addressed by four coordinates, which together map to a value:

row  timestamp  columnFamily  columnQualifier  value

For further details, please read the official HBase documentation.

Example of a table

row     timestamp  columnFamily  columnQualifier  value
warmpc  t1         CPU           BRAND            AMD
warmpc  t2         CPU           ITEM             TR2-2990WX
warmpc  t3         CPU           PRICE            57999
warmpc  t1         HD            BRAND            TOSHIBA
warmpc  t2         HD            ITEM             MG06ACA10TE-10TB
warmpc  t3         HD            PRICE            11800
warmpc  t1         RAM           BRAND            GSKILL
warmpc  t2         RAM           ITEM             F4-3200C14Q-32GTZRX
warmpc  t3         RAM           PRICE            15990

HBase Hands-on

Note

HBase interactive Shell

Although we will focus on using Java and MapReduce to work with HBase, you can use the HBase shell for debugging and testing.

$ hbase shell
hbase(main)> help
...
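For example, once you have created the hands-on table (a000000000 below; substitute your own table name), a few standard shell commands let you inspect and manipulate it directly:

hbase(main)> list                                             ## list all tables
hbase(main)> describe 'a000000000'                            ## show the table's column families
hbase(main)> put 'a000000000', 'testrow', 'CPU:BRAND', 'AMD'  ## write one cell
hbase(main)> get 'a000000000', 'testrow'                      ## read back one row
hbase(main)> scan 'a000000000'                                ## dump every cell in the table
hbase(main)> delete 'a000000000', 'testrow', 'CPU:BRAND'      ## remove the test cell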

Introduction to the Available Commands

HBase Cluster Status

You can check the cluster status from the HBase web UI at https://pdc19.csie.ncu.edu.tw:18766 (use your UNIX password to log in).

Create Table and Columns using Java

Upload the source (name it createHBaseTable.java) to your home directory:

/*** createHBaseTable.java ***/
import java.util.*;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class createHBaseTable {
	public static void main(String[] args) throws Exception {
		if (args.length < 2) {
			System.out.println("Arguments: [newTableName] [Family1] [Family2] ... [FamilyN]");
			System.out.println("Existing table with the same name will be deleted!");
			System.exit(1);
		}
		TableName tableName = TableName.valueOf(args[0]);
		String newColumnFamilies[] = Arrays.copyOfRange(args, 1, args.length);
		ArrayList<ColumnFamilyDescriptor> newColumnFamilyDescriptors = new ArrayList<>();
		TableDescriptorBuilder tableDescriptorBuilder = TableDescriptorBuilder.newBuilder(tableName);

		Configuration config = HBaseConfiguration.create();
		Connection connection = ConnectionFactory.createConnection(config);
		Admin admin = connection.getAdmin();

		/* Delete the table if it exists */
		if (admin.tableExists(tableName)) {
			System.out.println(tableName + " exists");
			System.out.println("disabling " + tableName + "...");
			admin.disableTable(tableName);
			System.out.println("deleting " + tableName + "...");
			admin.deleteTable(tableName);
		}

		/* Create column families */
		for (String newColumnFamily : newColumnFamilies) {
			newColumnFamilyDescriptors
					.add(ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes(newColumnFamily)).build());
		}
		tableDescriptorBuilder.setColumnFamilies(newColumnFamilyDescriptors);

		System.out.println("creating " + tableName + "...");
		TableDescriptor tableDescriptor = tableDescriptorBuilder.build();
		try {
			admin.createTable(tableDescriptor);
		} finally {
			admin.close();
			connection.close();
		}
	}
}

Build it, then execute it with HBase (the backtick ` performs command substitution: the shell replaces `hbase classpath` with that command's output, i.e. the full HBase classpath):

$ javac -cp `hbase classpath` createHBaseTable.java
## Expect createHBaseTable.class to appear in the same directory
$ hbase createHBaseTable a000000000 CPU RAM HD
... INFO  [main] client.HBaseAdmin: Operation: CREATE, Table Name: default:a000000000, .... completed

You will now be able to see the created table at https://pdc19.csie.ncu.edu.tw:18766/tablesDetailed.jsp (log in with your UNIX password!).

Create Table Data

Upload the source (name it putDataToHBaseTable.java) to your home directory:

/*** putDataToHBaseTable.java ***/
import java.util.*;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class putDataToHBaseTable {
	public static void main(String[] args) throws Exception {
		if (args.length != 5) {
			System.out.println("Arguments: [TableName] [Row] [Family] [Qualifier] [Value]");
			System.exit(1);
		}
		TableName tableName = TableName.valueOf(args[0]);
		byte[] rowKey = Bytes.toBytes(args[1]);
		byte[] family = Bytes.toBytes(args[2]);
		byte[] qualifier = Bytes.toBytes(args[3]);
		byte[] value = Bytes.toBytes(args[4]);

		Configuration config = HBaseConfiguration.create();
		Connection connection = ConnectionFactory.createConnection(config);
		Table table = connection.getTable(tableName);

		Put put = new Put(rowKey);
		put.addColumn(family, qualifier, value);
		try {
			table.put(put);
		} finally {
			table.close();
			connection.close();
		}
	}
}

Build it:

$ javac -cp `hbase classpath` putDataToHBaseTable.java

Using a text editor, write a simple shell script and save it as putToTable.sh to make the puts easier:

#!/bin/bash
table="a000000000"
hbase putDataToHBaseTable $table warmpc CPU BRAND AMD
hbase putDataToHBaseTable $table warmpc CPU ITEM TR2-2990WX
hbase putDataToHBaseTable $table warmpc CPU PRICE 57999
hbase putDataToHBaseTable $table warmpc HD BRAND TOSHIBA
hbase putDataToHBaseTable $table warmpc HD ITEM MG06ACA10TE-10TB
hbase putDataToHBaseTable $table warmpc HD PRICE 11800
hbase putDataToHBaseTable $table warmpc RAM BRAND GSKILL
hbase putDataToHBaseTable $table warmpc RAM ITEM F4-3200C14Q-32GTZRX
hbase putDataToHBaseTable $table warmpc RAM PRICE 15990

Put:

$ chmod +x ./putToTable.sh
$ ./putToTable.sh

Warmpc has nothing to do with the famous computer components seller in Taiwan.
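A side note on performance: every line of putToTable.sh launches a fresh JVM and opens a new HBase connection. Purely as a sketch (not required for the lab; the class putAllDataToHBaseTable is ours, not part of the handout), the same cells could be written through one connection with the batched Table.put(List<Put>) overload:

/*** putAllDataToHBaseTable.java — hypothetical batched variant of putDataToHBaseTable ***/
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class putAllDataToHBaseTable {
	public static void main(String[] args) throws Exception {
		String[][] records = {               // {row, family, qualifier, value}
				{ "warmpc", "CPU", "BRAND", "AMD" },
				{ "warmpc", "CPU", "ITEM", "TR2-2990WX" },
				{ "warmpc", "CPU", "PRICE", "57999" },
				// ... the remaining records from putToTable.sh ...
		};
		Configuration config = HBaseConfiguration.create();
		try (Connection connection = ConnectionFactory.createConnection(config);
				Table table = connection.getTable(TableName.valueOf("a000000000"))) {
			List<Put> puts = new ArrayList<>();
			for (String[] r : records) {
				Put put = new Put(Bytes.toBytes(r[0]));
				put.addColumn(Bytes.toBytes(r[1]), Bytes.toBytes(r[2]), Bytes.toBytes(r[3]));
				puts.add(put);
			}
			table.put(puts);                 // one batched request instead of one JVM per cell
		}
	}
}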

Get Value in Table

Upload the source (name it getHBaseTableValue.java) to your home directory:

/*** getHBaseTableValue.java ***/
import java.util.*;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class getHBaseTableValue {
	public static void main(String[] args) throws Exception {
		if (args.length != 4) {
			System.out.println("Arguments: [TableName] [Row] [Family] [Qualifier]");
			System.exit(1);
		}
		TableName tableName = TableName.valueOf(args[0]);
		byte[] rowKey = Bytes.toBytes(args[1]);
		byte[] family = Bytes.toBytes(args[2]);
		byte[] qualifier = Bytes.toBytes(args[3]);

		Configuration config = HBaseConfiguration.create();
		Connection connection = ConnectionFactory.createConnection(config);
		Table table = connection.getTable(tableName);

		Get get = new Get(rowKey);
		try {
			Result rowResult = table.get(get);
			byte[] value = rowResult.getValue(family, qualifier);
			System.out.println("Value: " + Bytes.toString(value));
		} finally {
			table.close();
			connection.close();
		}
	}
}

Build it, then execute it with HBase:

$ javac -cp `hbase mapredcp`:`hadoop classpath` getHBaseTableValue.java
$ hbase getHBaseTableValue a000000000 warmpc CPU ITEM
...
Value: TR2-2990WX
...
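Get fetches a single row; to walk every row of a table, the client API provides Scan and ResultScanner. Below is a minimal sketch (the class scanHBaseTable is ours, not part of the handout) that prints every cell of the table given as its first argument:

/*** scanHBaseTable.java — hypothetical companion to getHBaseTableValue ***/
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class scanHBaseTable {
	public static void main(String[] args) throws Exception {
		// Usage: hbase scanHBaseTable [TableName]
		Configuration config = HBaseConfiguration.create();
		try (Connection connection = ConnectionFactory.createConnection(config);
				Table table = connection.getTable(TableName.valueOf(args[0]));
				ResultScanner scanner = table.getScanner(new Scan())) {
			for (Result row : scanner) {            // one Result per row key
				for (Cell cell : row.rawCells()) {  // one Cell per stored column version
					System.out.println(Bytes.toString(CellUtil.cloneRow(cell)) + " "
							+ Bytes.toString(CellUtil.cloneFamily(cell)) + ":"
							+ Bytes.toString(CellUtil.cloneQualifier(cell)) + " = "
							+ Bytes.toString(CellUtil.cloneValue(cell)));
				}
			}
		}
	}
}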

Working with MapReduce

HBase can be accessed from Hadoop MapReduce: you can read data from HDFS and write it to HBase, and vice versa.

From HDFS to HBase

Upload the data below (name it hbase.txt; note that the file has one "row family qualifier value" record per line) to your Hadoop home directory on HDFS:

coldpc CPU BRAND AMD
coldpc CPU ITEM R7-2700X
coldpc CPU PRICE 10490
coldpc RAM BRAND KINGSTON
coldpc RAM ITEM HyperX-D43200-8Gx4
coldpc RAM PRICE 6988
coldpc HD BRAND WD
coldpc HD ITEM UltrastarDCHC520
coldpc HD PRICE 15890
## copy-paste using your favorite text editor, or upload the file with sftp
$ nano hbase.txt
$ hadoop fs -put hbase.txt /user/a000000000/hbase.txt

Upload the source (name it hbaseMapredInput.java) to your home directory:

Remember to change the username.

/*** hbaseMapredInput.java ***/
import java.io.IOException;
import java.util.*;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class hbaseMapredInput {

	/* Parses the input; expects exactly one four-token record
	 * ("row family qualifier value") per input line. */
	public static class hbaseMapredInputMapper extends Mapper<LongWritable, Text, Text, MapWritable> {
		public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
			StringTokenizer itr = new StringTokenizer(value.toString());
			int i = 0;
			MapWritable outputMap = new MapWritable();
			Text outputKey = new Text();
			while (itr.hasMoreTokens()) {
				switch (i) {
				case 0:
					outputKey.set(itr.nextToken());
					break;
				case 1:
					outputMap.put(new Text("family"), new Text(itr.nextToken()));
					break;
				case 2:
					outputMap.put(new Text("qualifier"), new Text(itr.nextToken()));
					break;
				case 3:
					outputMap.put(new Text("value"), new Text(itr.nextToken()));
					context.write(outputKey, outputMap);
					outputMap.clear();
					break;
				}
				i++;
			}
		}
	}

	/* Turns each parsed record into a Put against the output table. */
	public static class hbaseMapredInputReducer extends TableReducer<Text, MapWritable, NullWritable> {
		public void reduce(Text key, Iterable<MapWritable> values, Context context)
				throws IOException, InterruptedException {
			for (MapWritable valueObject : values) {
				String family = valueObject.get(new Text("family")).toString();
				String qualifier = valueObject.get(new Text("qualifier")).toString();
				String value = valueObject.get(new Text("value")).toString();
				Put put = new Put(Bytes.toBytes(key.toString()));
				put.addColumn(Bytes.toBytes(family), Bytes.toBytes(qualifier), Bytes.toBytes(value));
				context.write(NullWritable.get(), put);
			}
		}
	}

	public static void main(String[] args) throws Exception {
		if (args.length != 2) {
			System.out.println("please enter input, tablename");
			System.exit(0);
		}
		String input = args[0];
		String tablename = args[1];
		String username = "a000000000"; // TODO change me

		Configuration config = HBaseConfiguration.create();
		config.set(TableOutputFormat.OUTPUT_TABLE, tablename);
		config.set("hbase.zookeeper.quorum", "hadoop-slave1,hadoop-slave2,hadoop-slave3,hadoop-master");
		config.set("zookeeper.znode.parent", "/hbase-unsecure");

		Job job = Job.getInstance(config);
		job.setJobName("hbaseMapredInput_" + username);
		job.setJarByClass(hbaseMapredInput.class);
		job.setMapperClass(hbaseMapredInputMapper.class);
		job.setReducerClass(hbaseMapredInputReducer.class);
		job.setMapOutputKeyClass(Text.class);
		job.setMapOutputValueClass(MapWritable.class);
		job.setInputFormatClass(TextInputFormat.class);
		job.setOutputFormatClass(TableOutputFormat.class);
		FileInputFormat.addInputPath(job, new Path(input));
		job.waitForCompletion(true);
	}
}

Build it, pack it into a JAR, then run it with Hadoop:

Note
The program needs to be packaged as a portable JAR executable so that Hadoop can distribute it to the nodes, which is what makes MapReduce possible.

$ javac -cp `hbase classpath` hbaseMapredInput.java 
$ jar cf hbaseMapredInput.jar hbaseMapredInput*.class
## Change username to yours
$ hadoop jar hbaseMapredInput.jar hbaseMapredInput hbase.txt a000000000
...
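For reference, HBase also ships a helper class, org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil, that configures the table-output side of such a job in one call. Purely as a sketch (not required for the lab), the main() of hbaseMapredInput could be reduced to the following, assuming the same mapper/reducer classes and an added import for TableMapReduceUtil:

	// Hypothetical rewrite of hbaseMapredInput.main() using TableMapReduceUtil (HBase 2.x)
	public static void main(String[] args) throws Exception {
		String input = args[0];
		String tablename = args[1];
		String username = "a000000000"; // TODO change me

		Configuration config = HBaseConfiguration.create();
		config.set("hbase.zookeeper.quorum", "hadoop-slave1,hadoop-slave2,hadoop-slave3,hadoop-master");
		config.set("zookeeper.znode.parent", "/hbase-unsecure");

		Job job = Job.getInstance(config);
		job.setJobName("hbaseMapredInput_" + username);
		job.setJarByClass(hbaseMapredInput.class);
		job.setMapperClass(hbaseMapredInputMapper.class);
		job.setMapOutputKeyClass(Text.class);
		job.setMapOutputValueClass(MapWritable.class);
		job.setInputFormatClass(TextInputFormat.class);
		FileInputFormat.addInputPath(job, new Path(input));

		// One call sets the reducer class, TableOutputFormat, the output table name,
		// and adds the HBase jars to the job's classpath.
		TableMapReduceUtil.initTableReducerJob(tablename, hbaseMapredInputReducer.class, job);
		job.waitForCompletion(true);
	}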

From HBase to HDFS

Upload the source (name it hbaseMapredOutput.java) to your home directory:

/*** hbaseMapredOutput.java ***/
import java.io.IOException;
import java.util.*;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class hbaseMapredOutput {

	/* Emits one (row, {family, qualifier, value}) pair per cell in the table. */
	public static class hbaseMapredOutputMapper extends TableMapper<Text, MapWritable> {
		public void map(ImmutableBytesWritable key, Result value, Context context)
				throws IOException, InterruptedException {
			MapWritable outputMap = new MapWritable();
			for (Cell kv : value.rawCells()) {
				outputMap.put(new Text("family"), new Text(Arrays.copyOfRange(kv.getFamilyArray(),
						kv.getFamilyOffset(), kv.getFamilyOffset() + kv.getFamilyLength())));
				outputMap.put(new Text("qualifier"), new Text(Arrays.copyOfRange(kv.getQualifierArray(),
						kv.getQualifierOffset(), kv.getQualifierOffset() + kv.getQualifierLength())));
				outputMap.put(new Text("value"), new Text(Arrays.copyOfRange(kv.getValueArray(),
						kv.getValueOffset(), kv.getValueOffset() + kv.getValueLength())));
				context.write(new Text(Arrays.copyOfRange(kv.getRowArray(), kv.getRowOffset(),
						kv.getRowOffset() + kv.getRowLength())), outputMap);
				outputMap.clear();
			}
		}
	}

	/* Writes each cell as a "row <TAB> family qualifier value" text line. */
	public static class hbaseMapredOutputReducer extends Reducer<Text, MapWritable, Text, Text> {
		public void reduce(Text key, Iterable<MapWritable> values, Context context)
				throws IOException, InterruptedException {
			for (MapWritable valueObject : values) {
				String family = valueObject.get(new Text("family")).toString();
				String qualifier = valueObject.get(new Text("qualifier")).toString();
				String value = valueObject.get(new Text("value")).toString();
				context.write(key, new Text(family + " " + qualifier + " " + value));
			}
		}
	}

	public static void main(String[] args) throws Exception {
		if (args.length != 2) {
			System.out.println("please enter output, tablename");
			System.exit(0);
		}
		String output = args[0];
		String tablename = args[1];
		String username = "a000000000"; // TODO change this

		Configuration config = HBaseConfiguration.create();
		config.set(TableInputFormat.INPUT_TABLE, tablename);
		// These two lines are only needed when you run on the server
		config.set("hbase.zookeeper.quorum", "hadoop-slave1,hadoop-slave2,hadoop-slave3,hadoop-master");
		config.set("zookeeper.znode.parent", "/hbase-unsecure");

		Job job = Job.getInstance(config);
		job.setJobName("hbaseMapredOutput_" + username);
		job.setJarByClass(hbaseMapredOutput.class);
		job.setMapperClass(hbaseMapredOutputMapper.class);
		job.setReducerClass(hbaseMapredOutputReducer.class);
		job.setMapOutputKeyClass(Text.class);
		job.setMapOutputValueClass(MapWritable.class);
		job.setInputFormatClass(TableInputFormat.class);
		job.setOutputFormatClass(TextOutputFormat.class);
		FileOutputFormat.setOutputPath(job, new Path(output));
		job.waitForCompletion(true);
	}
}
## Run this line if you have already run the program once. Change the username!
$ hadoop fs -rm -r /user/a000000000/lab3-out
## It's normal to see this error if it is your first time executing the program:
rm: '/user/a000000000/lab3-out': No such file or directory

$ javac -cp `hbase classpath` hbaseMapredOutput.java
$ jar cf hbaseMapredOutput.jar hbaseMapredOutput*.class
$ hadoop jar hbaseMapredOutput.jar hbaseMapredOutput lab3-out a000000000
$ hadoop fs -cat /user/a000000000/lab3-out/part-r-00000
......
coldpc  RAM PRICE 6988
coldpc  RAM ITEM HyperX-D43200-8Gx4
coldpc  RAM BRAND KINGSTON
coldpc  HD PRICE 15890
coldpc  HD ITEM UltrastarDCHC520
coldpc  HD BRAND WD
coldpc  CPU PRICE 10490
coldpc  CPU ITEM R7-2700X
coldpc  CPU BRAND AMD
......
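The table-input side has a matching helper, TableMapReduceUtil.initTableMapperJob, which also accepts a Scan so the mappers only read the rows and columns you ask for. A sketch of hbaseMapredOutput.main() under the same assumptions as the input-side sketch above (imports for TableMapReduceUtil, Scan, and Bytes added):

	// Hypothetical rewrite of hbaseMapredOutput.main() using TableMapReduceUtil (HBase 2.x)
	public static void main(String[] args) throws Exception {
		String output = args[0];
		String tablename = args[1];
		String username = "a000000000"; // TODO change me

		Configuration config = HBaseConfiguration.create();
		config.set("hbase.zookeeper.quorum", "hadoop-slave1,hadoop-slave2,hadoop-slave3,hadoop-master");
		config.set("zookeeper.znode.parent", "/hbase-unsecure");

		Job job = Job.getInstance(config);
		job.setJobName("hbaseMapredOutput_" + username);
		job.setJarByClass(hbaseMapredOutput.class);
		job.setReducerClass(hbaseMapredOutputReducer.class);
		job.setOutputFormatClass(TextOutputFormat.class);
		FileOutputFormat.setOutputPath(job, new Path(output));

		Scan scan = new Scan();
		scan.addFamily(Bytes.toBytes("CPU")); // optional: restrict the scan to one column family
		// One call sets TableInputFormat, the input table, the mapper, and its output types.
		TableMapReduceUtil.initTableMapperJob(tablename, scan,
				hbaseMapredOutputMapper.class, Text.class, MapWritable.class, job);
		job.waitForCompletion(true);
	}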

Practices

Due date
2019/05/31 23:59 (GMT+8:00)

Requirements (Updated 2019/05/27)

Practice 1

Put the following data into the table you created during the hands-on, using the HBase Java APIs. The data may be hard-coded into your Java program (this earns a lower score, since it is easier) or provided through a shell script as we did during the hands-on (this earns the full score).

row columnFamily qualifier value
Yaya MB BRAND ASUS
Yaya MB ITEM RogZenithExtremeAlpha
Yaya MB PRICE 17990

How to hand in
Save your code as username_practice1.java (e.g. s107123456_practice1.java) in the lab3 folder under your home directory (~/lab3), and make sure it compiles successfully under that file name. (The grading script renames the source file after your public class name and then compiles it.)
If you use a shell script to provide the data, name it username_practice1.sh and put it in the same folder as the Java code.

How will we test the program

Tips

Practice 2

Using the MapReduce Java API, add the following data to the table you created during the hands-on. The data must be read from a text file on HDFS, and the program's arguments must be either [programName] [dataFilePathOnHDFS] [tableName] or [programName] [dataFilePathOnHDFS].

row columnFamily qualifier value
P100 MB BRAND ASUS
P100 MB ITEM X399
P100 MB PRICE 8787

How to hand in
Save your code as username_practice2.java in the lab3 folder under your home directory (~/lab3).

How will we test the program
We will modify the username in your code (only when necessary), compile it, run it with the test data (and a test table) as the program's arguments, and check the result.

Tips
To add a new column into the existing table, you may need to:

Tips

Working with IDEs: Maven Project

For IntelliJ/Eclipse/VS Code (and other IDE) users working with Maven, add the related repository and dependencies to pom.xml to get code-completion features:

<repositories>
	...
	<repository>
		<id>mvnrepository</id>
		<url>https://mvnrepository.com/repos/central</url>
	</repository>
	...
</repositories>
<dependencies>
	...
	<dependency>
		<groupId>org.apache.hbase</groupId>
		<artifactId>hbase-client</artifactId>
		<version>2.1.4</version>
	</dependency>
	<dependency>
		<groupId>org.apache.hbase</groupId>
		<artifactId>hbase-common</artifactId>
		<version>2.1.4</version>
	</dependency>
	...
</dependencies>
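The two artifacts above cover the client-only examples. If you also want the MapReduce examples (TableMapper, TableReducer, TableInputFormat/TableOutputFormat, ...) to resolve in your IDE, you will most likely need the hbase-mapreduce artifact as well, since HBase 2.x keeps those classes in a separate module (keep the version in sync with the ones above):

	<dependency>
		<groupId>org.apache.hbase</groupId>
		<artifactId>hbase-mapreduce</artifactId>
		<version>2.1.4</version>
	</dependency>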

LOG NEEDED?

You can see the logs of the slaves at https://pdc19.csie.ncu.edu.tw:18764/cluster/ . Click your programName_studentID entry for more details, then click "logs" on that page.
However, you will get an error/NXDOMAIN page because the slaves are behind a firewall. To work around this, copy the URL of the log page and open it with a text browser from the SSH shell:

## Replace the URL with your log URL
$ lynx http://hadoop-slave1:8042/node/containerlogs/container_e27_1555285318657_0022_01_000001/s107525003
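Alternatively, assuming log aggregation is enabled on the cluster (an assumption; check with the TA if it fails), the YARN CLI can fetch the same container logs without a browser. The application ID is embedded in the container name shown on the logs page, e.g. container_e27_1555285318657_0022_01_000001 belongs to application_1555285318657_0022:

## Replace the application ID with yours
$ yarn logs -applicationId application_1555285318657_0022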
