SE6023 Lab3 HBase
Introducing HBase
- HBase is the Hadoop database: a distributed, scalable, big data store.
- It can host very large tables (billions of rows × millions of columns) on clusters built from commodity hardware.
- It provides random, realtime read/write access to big data.
- It is a type of NoSQL database.
- However, it lacks many of the features found in a relational database management system (typed columns, secondary indexes, transactions, advanced query languages, etc.).
Data Model and APIs
- Table
An HBase table consists of multiple rows.
| row | timestamp | column family | column qualifier | value |
|-----|-----------|---------------|------------------|-------|
- Row
A row in HBase consists of a row key and one or more columns with values associated with them; the "row key" plays the role of a primary key in a traditional database.
Row keys have no data type and are always stored and accessed as byte arrays (byte[]).
- Column Family/Qualifier
Column families physically colocate a set of columns and their values.
A column qualifier is added to a column family to provide the index for a given piece of data. Given a column family content, a column qualifier might be content:html or content:pdf.
- Cell
A cell is a combination of row, column family, and column qualifier. It contains a value and a timestamp, which represents the value's version.
- Timestamp
A timestamp is written alongside each value and is the identifier for a given version of a value.

For further details, please read the documents.
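To make the model concrete, here is a minimal sketch (assuming the HBase 2.x client API and the a000000000 table built in the hands-on below; the cell coordinates are only examples) that fetches up to three timestamped versions of a single cell:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class readCellVersions {
    public static void main(String[] args) throws Exception {
        Configuration config = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(config);
             Table table = connection.getTable(TableName.valueOf("a000000000"))) {
            // Ask for up to 3 stored versions of the CPU:PRICE cell of row "warmpc".
            Get get = new Get(Bytes.toBytes("warmpc"))
                    .addColumn(Bytes.toBytes("CPU"), Bytes.toBytes("PRICE"))
                    .readVersions(3);
            Result result = table.get(get);
            // Each returned Cell carries its own timestamp (the version identifier).
            // Note: a family created with default settings keeps only 1 version,
            // so you may get a single cell back unless VERSIONS was raised.
            for (Cell cell : result.rawCells()) {
                System.out.println(cell.getTimestamp() + " -> "
                        + Bytes.toString(CellUtil.cloneValue(cell)));
            }
        }
    }
}
```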
Example of a table
| row | timestamp | columnFamily | columnQualifier | value |
|--------|----|-----|-------|---------------------|
| warmpc | t1 | CPU | BRAND | AMD |
| warmpc | t2 | CPU | ITEM | TR2-2990WX |
| warmpc | t3 | CPU | PRICE | 57999 |
| warmpc | t1 | HD | BRAND | TOSHIBA |
| warmpc | t2 | HD | ITEM | MG06ACA10TE-10TB |
| warmpc | t3 | HD | PRICE | 11800 |
| warmpc | t1 | RAM | BRAND | GSKILL |
| warmpc | t2 | RAM | ITEM | F4-3200C14Q-32GTZRX |
| warmpc | t3 | RAM | PRICE | 15990 |
HBase API Quick Links
- Admin, HBaseConfiguration, Connection
- Table
- TableDescriptor, TableDescriptorBuilder, ColumnFamilyDescriptor, ColumnFamilyDescriptorBuilder
- Put, Get
- Bytes
HBase Hands-on
Note
- Replace all occurrences of a000000000 with your username.
- There are API changes in newer HBase (>=3.0.0). If you have learned HBase before, you may need to read the docs again and check whether your code runs properly on our server. Make sure the documents or instructions you find online use the latest API.
HBase interactive Shell
Although we will focus on using Java/MapReduce to work with HBase, you can use the HBase shell for debugging/testing.
```
$ hbase shell
hbase(main)> help
...
```
Available Commands Introduction
HBase Cluster Status
https://pdc19.csie.ncu.edu.tw:18766 (use your UNIX password to log in)
Create Table and Columns using Java
Upload the source (name it createHBaseTable.java) to your home directory:
```java
import java.util.*;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class createHBaseTable {
    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("Arguments: [newTableName] [Family1] [Family2] ... [FamilyN]");
            System.out.println("Existing table with the same name will be deleted!");
            System.exit(1);
        }
        TableName tableName = TableName.valueOf(args[0]);
        String[] newColumnFamilies = Arrays.copyOfRange(args, 1, args.length);
        ArrayList<ColumnFamilyDescriptor> newColumnFamilyDescriptors = new ArrayList<>();
        TableDescriptorBuilder tableDescriptorBuilder = TableDescriptorBuilder.newBuilder(tableName);

        Configuration config = HBaseConfiguration.create();
        Connection connection = ConnectionFactory.createConnection(config);
        Admin admin = connection.getAdmin();

        // Drop the table first if it already exists (it must be disabled before deletion).
        if (admin.tableExists(tableName)) {
            System.out.println(tableName + " exists");
            System.out.println("disabling " + tableName + "...");
            admin.disableTable(tableName);
            System.out.println("deleting " + tableName + "...");
            admin.deleteTable(tableName);
        }

        // Build one descriptor per requested column family.
        for (String newColumnFamily : newColumnFamilies) {
            newColumnFamilyDescriptors
                    .add(ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes(newColumnFamily)).build());
        }
        tableDescriptorBuilder.setColumnFamilies(newColumnFamilyDescriptors);

        System.out.println("creating " + tableName + "...");
        TableDescriptor tableDescriptor = tableDescriptorBuilder.build();
        try {
            admin.createTable(tableDescriptor);
        } finally {
            admin.close();
            connection.close();
        }
    }
}
```
Build it, then execute with HBase (the backticks perform command substitution):

```
$ javac -cp `hbase classpath` createHBaseTable.java
## createHBaseTable.class is expected to be in the current directory
$ hbase createHBaseTable a000000000 CPU RAM HD
... INFO [main] client.HBaseAdmin: Operation: CREATE, Table Name: default:a000000000, .... completed
```
You will be able to see the created table at https://pdc19.csie.ncu.edu.tw:18766/tablesDetailed.jsp (login with your UNIX password!) now.
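If you would rather verify from code than from the web UI, a minimal sketch (assuming the same HBase 2.x Admin API used above; the table name is a placeholder) that prints the new table's column families:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.util.Bytes;

public class describeHBaseTable {
    public static void main(String[] args) throws Exception {
        Configuration config = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(config);
             Admin admin = connection.getAdmin()) {
            TableName tableName = TableName.valueOf("a000000000"); // replace with your username
            // Print every column family of the freshly created table.
            for (byte[] family : admin.getDescriptor(tableName).getColumnFamilyNames()) {
                System.out.println(Bytes.toString(family));
            }
        }
    }
}
```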
Create Table Data
Upload the source (name it putDataToHBaseTable.java) to your home directory:
```java
import java.util.*;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class putDataToHBaseTable {
    public static void main(String[] args) throws Exception {
        if (args.length != 5) {
            System.out.println("Arguments: [TableName] [Row] [Family] [Qualifier] [Value]");
            System.exit(1);
        }
        // HBase stores everything as byte arrays, so convert all arguments up front.
        TableName tableName = TableName.valueOf(args[0]);
        byte[] rowKey = Bytes.toBytes(args[1]);
        byte[] family = Bytes.toBytes(args[2]);
        byte[] qualifier = Bytes.toBytes(args[3]);
        byte[] value = Bytes.toBytes(args[4]);

        Configuration config = HBaseConfiguration.create();
        Connection connection = ConnectionFactory.createConnection(config);
        Table table = connection.getTable(tableName);

        // A Put targets one row; addColumn attaches a (family, qualifier, value) cell to it.
        Put put = new Put(rowKey);
        put.addColumn(family, qualifier, value);
        try {
            table.put(put);
        } finally {
            table.close();
            connection.close();
        }
    }
}
```
Build it:

```
$ javac -cp `hbase classpath` putDataToHBaseTable.java
```
Write a simple shell script using a text editor and save it as putToTable.sh for easy putting:

```bash
#!/bin/bash
table="a000000000"
hbase putDataToHBaseTable $table warmpc CPU BRAND AMD
hbase putDataToHBaseTable $table warmpc CPU ITEM TR2-2990WX
hbase putDataToHBaseTable $table warmpc CPU PRICE 57999
hbase putDataToHBaseTable $table warmpc HD BRAND TOSHIBA
hbase putDataToHBaseTable $table warmpc HD ITEM MG06ACA10TE-10TB
hbase putDataToHBaseTable $table warmpc HD PRICE 11800
hbase putDataToHBaseTable $table warmpc RAM BRAND GSKILL
hbase putDataToHBaseTable $table warmpc RAM ITEM F4-3200C14Q-32GTZRX
hbase putDataToHBaseTable $table warmpc RAM PRICE 15990
```
Put:

```
$ chmod +x ./putToTable.sh
$ ./putToTable.sh
```
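Running hbase once per cell spawns a new JVM for every put, which is slow. As a hedged alternative sketch (not required for the hands-on; assuming the HBase 2.x client API), Table.put also accepts a list of Puts, so several cells can be written in a single program run. The hard-coded rows below are just a sample of putToTable.sh's data:

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class putManyToHBaseTable {
    public static void main(String[] args) throws Exception {
        Configuration config = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(config);
             Table table = connection.getTable(TableName.valueOf("a000000000"))) {
            // Cells as {row, family, qualifier, value}; hard-coded here for brevity.
            String[][] cells = {
                {"warmpc", "CPU", "BRAND", "AMD"},
                {"warmpc", "CPU", "ITEM", "TR2-2990WX"},
                {"warmpc", "CPU", "PRICE", "57999"},
            };
            List<Put> puts = new ArrayList<>();
            for (String[] c : cells) {
                Put put = new Put(Bytes.toBytes(c[0]));
                put.addColumn(Bytes.toBytes(c[1]), Bytes.toBytes(c[2]), Bytes.toBytes(c[3]));
                puts.add(put);
            }
            table.put(puts); // one batched call instead of one JVM per cell
        }
    }
}
```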
Warmpc has nothing to do with the famous computer components seller in Taiwan.
Get Value in Table
Upload the source (name it getHBaseTableValue.java) to your home directory:
```java
import java.util.*;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class getHBaseTableValue {
    public static void main(String[] args) throws Exception {
        if (args.length != 4) {
            System.out.println("Arguments: [TableName] [Row] [Family] [Qualifier]");
            System.exit(1);
        }
        TableName tableName = TableName.valueOf(args[0]);
        byte[] rowKey = Bytes.toBytes(args[1]);
        byte[] family = Bytes.toBytes(args[2]);
        byte[] qualifier = Bytes.toBytes(args[3]);

        Configuration config = HBaseConfiguration.create();
        Connection connection = ConnectionFactory.createConnection(config);
        Table table = connection.getTable(tableName);

        // A Get fetches one row; getValue then picks a single cell out of the Result.
        Get get = new Get(rowKey);
        try {
            Result rowResult = table.get(get);
            byte[] value = rowResult.getValue(family, qualifier);
            System.out.println("Value: " + Bytes.toString(value));
        } finally {
            table.close();
            connection.close();
        }
    }
}
```
Build it, then execute with HBase:

```
$ javac -cp `hbase mapredcp`:`hadoop classpath` getHBaseTableValue.java
$ hbase getHBaseTableValue a000000000 warmpc CPU ITEM
...
Value: TR2-2990WX
...
```
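A Get returns a single row. To dump every cell in the table, a hedged sketch using Scan (again assuming the HBase 2.x client API; the table name is an example) looks like the following; the same rawCells() pattern reappears in the MapReduce mapper below:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class scanHBaseTable {
    public static void main(String[] args) throws Exception {
        Configuration config = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(config);
             Table table = connection.getTable(TableName.valueOf("a000000000"));
             ResultScanner scanner = table.getScanner(new Scan())) {
            for (Result row : scanner) {              // one Result per row key
                for (Cell cell : row.rawCells()) {    // one Cell per family:qualifier
                    System.out.println(Bytes.toString(CellUtil.cloneRow(cell)) + " "
                            + Bytes.toString(CellUtil.cloneFamily(cell)) + ":"
                            + Bytes.toString(CellUtil.cloneQualifier(cell)) + " = "
                            + Bytes.toString(CellUtil.cloneValue(cell)));
                }
            }
        }
    }
}
```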
Working with MapReduce
HBase can be accessed from Hadoop MapReduce: you can read from HDFS and write to HBase, and vice versa.
From HDFS to HBase
Upload the data (name it hbase.txt) to your Hadoop home directory (on HDFS):

```
coldpc CPU BRAND AMD
coldpc CPU ITEM R7-2700X
coldpc CPU PRICE 10490
coldpc RAM BRAND KINGSTON
coldpc RAM ITEM HyperX-D43200-8Gx4
coldpc RAM PRICE 6988
coldpc HD BRAND WD
coldpc HD ITEM UltrastarDCHC520
coldpc HD PRICE 15890
```

```
## copy-paste with your favorite text editor, or use sftp
$ nano hbase.txt
$ hadoop fs -put hbase.txt /user/a000000000/hbase.txt
```
Upload the source (name it hbaseMapredInput.java) to your home directory. Remember to change the username.
```java
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class hbaseMapredInput {
    // Parses "row family qualifier value" lines into (row, {family, qualifier, value}) pairs.
    public static class hbaseMapredInputMapper extends Mapper<LongWritable, Text, Text, MapWritable> {
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            int i = 0;
            MapWritable outputMap = new MapWritable();
            Text outputKey = new Text();
            while (itr.hasMoreTokens()) {
                switch (i) {
                case 0: // first token is the row key
                    outputKey.set(itr.nextToken());
                    break;
                case 1:
                    outputMap.put(new Text("family"), new Text(itr.nextToken()));
                    break;
                case 2:
                    outputMap.put(new Text("qualifier"), new Text(itr.nextToken()));
                    break;
                case 3: // last token: emit the completed cell
                    outputMap.put(new Text("value"), new Text(itr.nextToken()));
                    context.write(outputKey, outputMap);
                    outputMap.clear();
                    break;
                }
                i++;
            }
        }
    }

    // TableReducer emits Puts into the table configured via TableOutputFormat.OUTPUT_TABLE.
    public static class hbaseMapredInputReducer extends TableReducer<Text, MapWritable, NullWritable> {
        public void reduce(Text key, Iterable<MapWritable> values, Context context)
                throws IOException, InterruptedException {
            for (MapWritable valueObject : values) {
                String family = valueObject.get(new Text("family")).toString();
                String qualifier = valueObject.get(new Text("qualifier")).toString();
                String value = valueObject.get(new Text("value")).toString();
                Put put = new Put(Bytes.toBytes(key.toString()));
                put.addColumn(Bytes.toBytes(family), Bytes.toBytes(qualifier), Bytes.toBytes(value));
                context.write(NullWritable.get(), put);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.out.println("please enter input, tablename");
            System.exit(0);
        }
        String input = args[0];
        String tablename = args[1];
        String username = "a000000000";

        Configuration config = HBaseConfiguration.create();
        config.set(TableOutputFormat.OUTPUT_TABLE, tablename);
        // Tell the job where ZooKeeper (and therefore HBase) lives on our cluster.
        config.set("hbase.zookeeper.quorum", "hadoop-slave1,hadoop-slave2,hadoop-slave3,hadoop-master");
        config.set("zookeeper.znode.parent", "/hbase-unsecure");

        Job job = Job.getInstance(config);
        job.setJobName("hbaseMapredInput_" + username);
        job.setJarByClass(hbaseMapredInput.class);
        job.setMapperClass(hbaseMapredInputMapper.class);
        job.setReducerClass(hbaseMapredInputReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(MapWritable.class);
        job.setInputFormatClass(TextInputFormat.class);    // read text lines from HDFS
        job.setOutputFormatClass(TableOutputFormat.class); // write Puts into HBase
        FileInputFormat.addInputPath(job, new Path(input));
        job.waitForCompletion(true);
    }
}
```
Build it, pack it into a jar, then run with Hadoop:
Note
The program needs to be packaged as a JAR so Hadoop can distribute it to the nodes, making MapReduce possible.

```
$ javac -cp `hbase classpath` hbaseMapredInput.java
$ jar cf hbaseMapredInput.jar hbaseMapredInput*.class
## change the username to yours
$ hadoop jar hbaseMapredInput.jar hbaseMapredInput hbase.txt a000000000
...
```
From HBase to HDFS
Upload the source (name it hbaseMapredOutput.java) to your home directory:
```java
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class hbaseMapredOutput {
    // TableMapper scans the table; each map() call receives one row as a Result.
    public static class hbaseMapredOutputMapper extends TableMapper<Text, MapWritable> {
        public void map(ImmutableBytesWritable key, Result value, Context context)
                throws IOException, InterruptedException {
            MapWritable outputMap = new MapWritable();
            // rawCells() exposes the backing byte arrays, so copy out each slice.
            for (Cell kv : value.rawCells()) {
                outputMap.put(new Text("family"), new Text(Arrays.copyOfRange(kv.getFamilyArray(), kv.getFamilyOffset(),
                        kv.getFamilyOffset() + kv.getFamilyLength())));
                outputMap.put(new Text("qualifier"), new Text(Arrays.copyOfRange(kv.getQualifierArray(),
                        kv.getQualifierOffset(), kv.getQualifierOffset() + kv.getQualifierLength())));
                outputMap.put(new Text("value"), new Text(Arrays.copyOfRange(kv.getValueArray(), kv.getValueOffset(),
                        kv.getValueOffset() + kv.getValueLength())));
                context.write(new Text(
                        Arrays.copyOfRange(kv.getRowArray(), kv.getRowOffset(), kv.getRowOffset() + kv.getRowLength())),
                        outputMap);
                outputMap.clear();
            }
        }
    }

    // Formats each cell back into a "row family qualifier value" text line on HDFS.
    public static class hbaseMapredOutputReducer extends Reducer<Text, MapWritable, Text, Text> {
        public void reduce(Text key, Iterable<MapWritable> values, Context context)
                throws IOException, InterruptedException {
            for (MapWritable valueObject : values) {
                String family = valueObject.get(new Text("family")).toString();
                String qualifier = valueObject.get(new Text("qualifier")).toString();
                String value = valueObject.get(new Text("value")).toString();
                context.write(key, new Text(family + " " + qualifier + " " + value));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.out.println("please enter output, tablename");
            System.exit(0);
        }
        String output = args[0];
        String tablename = args[1];
        String username = "a000000000";

        Configuration config = HBaseConfiguration.create();
        config.set(TableInputFormat.INPUT_TABLE, tablename);
        config.set("hbase.zookeeper.quorum", "hadoop-slave1,hadoop-slave2,hadoop-slave3,hadoop-master");
        config.set("zookeeper.znode.parent", "/hbase-unsecure");

        Job job = Job.getInstance(config);
        job.setJobName("hbaseMapredOutput_" + username);
        job.setJarByClass(hbaseMapredOutput.class);
        job.setMapperClass(hbaseMapredOutputMapper.class);
        job.setReducerClass(hbaseMapredOutputReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(MapWritable.class);
        job.setInputFormatClass(TableInputFormat.class);  // read rows from HBase
        job.setOutputFormatClass(TextOutputFormat.class); // write text lines to HDFS
        FileOutputFormat.setOutputPath(job, new Path(output));
        job.waitForCompletion(true);
    }
}
```
```
## run this line if you have already run the program once; change the username!
$ hadoop fs -rm -r /user/a000000000/lab3-out
## it is normal to see this error the first time you execute the program
rm: '/user/a000000000/lab3-out': No such file or directory
$ javac -cp `hbase classpath` hbaseMapredOutput.java
$ jar cf hbaseMapredOutput.jar hbaseMapredOutput*.class
$ hadoop jar hbaseMapredOutput.jar hbaseMapredOutput lab3-out a000000000
$ hadoop fs -cat /user/a000000000/lab3-out/part-r-00000
......
coldpc RAM PRICE 6988
coldpc RAM ITEM HyperX-D43200-8Gx4
coldpc RAM BRAND KINGSTON
coldpc HD PRICE 15890
coldpc HD ITEM UltrastarDCHC520
coldpc HD BRAND WD
coldpc CPU PRICE 10490
coldpc CPU ITEM R7-2700X
coldpc CPU BRAND AMD
......
```
Practices
Due date
2019/05/31 23:59 (GMT+8:00)
Requirements Updated 2019/5/27
Practice 1
Put the following data into the table you created during the hands-on, using the HBase Java APIs. The data can be hard-coded into your Java program (this earns a lower score, since it is easier) or provided by a shell script as in the hands-on (this earns the full score).
| row | columnFamily | qualifier | value |
|------|----|-------|-----------------------|
| Yaya | MB | BRAND | ASUS |
| Yaya | MB | ITEM | RogZenithExtremeAlpha |
| Yaya | MB | PRICE | 17990 |
How to hand in
Save your code as username_practice1.java (e.g. s107123456_practice1.java) in your home directory/lab3 (~/lab3) and make sure it can be compiled under that filename. (We will rename the code file to your public static class name and then compile it.)
If you use a shell script to provide the data, name it username_practice1.sh and put it in the same folder as the Java code.
How will we test the program
- If only a Java file is provided, we will modify the username in your code, run it without any arguments, and check whether the data has been added to the table correctly.
- If a Java file and a shell script are provided, we will modify the username in your shell script, run the script, and check the results.
Tips
- You will need to modify the hands-on code to add a new column family to the existing table; see the sketch after this list. Merely copy-pasting the sample code will only earn the basic score.
- Watch the output messages when running shell scripts. You may need to set your terminal scrollback to Unlimited, or tee the output to a file, to check whether something went wrong (exceptions, etc.).
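As a starting point for the first tip, here is a hedged sketch of adding one column family to an existing table (assuming the HBase 2.x Admin API; the MB family comes from the practice data, and the table name is a placeholder):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.util.Bytes;

public class addColumnFamily {
    public static void main(String[] args) throws Exception {
        Configuration config = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(config);
             Admin admin = connection.getAdmin()) {
            TableName tableName = TableName.valueOf("a000000000"); // replace with your username
            // Unlike createTable, addColumnFamily alters the existing table in place.
            admin.addColumnFamily(tableName,
                    ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("MB")).build());
        }
    }
}
```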
Practice 2
Add the following data to the table you created during the hands-on, using the MapReduce Java API. You must read the data from a text file located on HDFS, and the arguments of the Java program must be [programName] [dataFilePathOnHDFS] [tableName] or [programName] [dataFilePathOnHDFS].
| row | columnFamily | qualifier | value |
|------|----|-------|------|
| P100 | MB | BRAND | ASUS |
| P100 | MB | ITEM | X399 |
| P100 | MB | PRICE | 8787 |
How to hand in
Save your code as username_practice2.java in your home directory/lab3 (~/lab3).
How will we test the program
We will modify the username in your code (only when necessary), compile it, run it with the test data (and a test table) as program arguments, and check the result.
Tips
To add a new column family to the existing table, you may need to:
- Connect to HBase again (with the zookeeper/config.set() lines from the hands-on) and get an Admin instance inside the reducer; see the sketch after this list. Since the reducer part of the code may execute on the data nodes, the reducer will not be able to access variables/objects stored in your original class (which live on the name node). Without the code that adds the new column family, you will get a lower score.
- Catching org.apache.hadoop.hbase.InvalidFamilyOperationException makes it easier to finish the task without checking whether the family already exists. If the family provided in the input already exists, just skip it!
- If you want to hard-code your username in the program, please let us know by adding a //TODO modify here comment. A hard-coded username won't cost points in this practice (3-2). However, TODO is a pretty liar; use it carefully.
- You can use the HBase shell to conveniently re-enable a disabled table when debugging. Try enable 'tableName' in the shell.
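A hedged sketch of the first two tips combined, written as a fragment to drop into your TableReducer subclass (assuming the HBase 2.x Admin API; the fragment additionally needs the Admin, ColumnFamilyDescriptorBuilder, TableName, InvalidFamilyOperationException, and TableOutputFormat imports, and the MB family name is illustrative):

```java
// Fragment: inside your TableReducer subclass. Runs once per reducer task, before reduce().
@Override
protected void setup(Context context) throws IOException {
    // The job Configuration already carries the zookeeper settings and the
    // output table name that main() set via TableOutputFormat.OUTPUT_TABLE.
    Configuration config = context.getConfiguration();
    String tablename = config.get(TableOutputFormat.OUTPUT_TABLE);
    try (Connection connection = ConnectionFactory.createConnection(config);
         Admin admin = connection.getAdmin()) {
        admin.addColumnFamily(TableName.valueOf(tablename),
                ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("MB")).build());
    } catch (InvalidFamilyOperationException e) {
        // The family already exists; nothing to add, just skip it.
    }
}
```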
Tips
Working with IDEs: Maven Project
For IntelliJ/Eclipse/VS Code (etc.) users with Maven, add the related repository and dependencies to pom.xml to get code-completion features:
```xml
<repositories>
  ...
  <repository>
    <id>mvnrepository</id>
    <url>https://mvnrepository.com/repos/central</url>
  </repository>
  ...
</repositories>
<dependencies>
  ...
  <dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>2.1.4</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-common</artifactId>
    <version>2.1.4</version>
  </dependency>
  ...
</dependencies>
```
LOG NEEDED?
You can see the slaves' logs at https://pdc19.csie.ncu.edu.tw:18764/cluster/ . Just click your programName_studentID for more details, then click logs on that page.
However, you will get an error/NXDOMAIN page, since the slaves are behind a firewall. To solve this, copy the URL of the log page and use a text browser from the SSH shell:

```
## replace the URL with your log URL
$ lynx http://hadoop-slave1:8042/node/containerlogs/container_e27_1555285318657_0022_01_000001/s107525003
```
- Use Page Down/Up to browse the page
- Use / to search for keywords
- Use q to quit