Assignment 3

(Due: 2:00 pm on Apr. 30, postponed to May 7; 100 marks)


Submission Requirements

1. Finding Association Rules on Transaction Databases (20 marks)

Take a look at the following table, where T1, T2, T3, T4, T5, and T6 are the transaction IDs and A, B, C, D, and E are the item IDs.
    Transaction ID    List of Item IDs
    T1                A, B, E
    T2                B, C, D
    T3                B, D, E
    T4                C, D, E
    T5                B, C, D, E
    T6                B, C, E

Let min_support = 20% and min_conf = 60%. This question considers the Apriori algorithm and two of its variations:

  1. General Apriori algorithm.
  2. Hash-based Apriori algorithm (Suppose order(A)=1, order(B)=2, order(C)=3, order(D)=4, order(E)=5. The hashing function used is hash(x,y) = (order(x) * 10 + order(y)) mod 7, e.g. hash(A, B) = 5).
  3. Partitioning-based Apriori algorithm (Suppose the above transaction database is divided into two partitions. Transactions T1, T2, and T3 are in one partition while transactions T4, T5, and T6 are in the other).
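The hash function in item 2 can be checked with a short sketch of the bucket-counting step of hash-based Apriori. It assumes that each 2-itemset is hashed with its items in ascending order (order(x) < order(y)), as the example hash(A, B) = 5 suggests, and that the minimum support count is 20% of 6 transactions rounded up to 2. The names `hash_pair`, `bucket_counts`, and `candidate_pairs` are hypothetical helpers, not part of the assignment.

```python
import math
from collections import Counter
from itertools import combinations

# Transaction database from the table above (T1..T6).
TRANSACTIONS = [
    {"A", "B", "E"},
    {"B", "C", "D"},
    {"B", "D", "E"},
    {"C", "D", "E"},
    {"B", "C", "D", "E"},
    {"B", "C", "E"},
]

# order(A)=1, ..., order(E)=5, as given in the question.
ORDER = {item: i for i, item in enumerate("ABCDE", start=1)}

# Assumption: min_support = 20% of 6 transactions, rounded up to a whole count (2).
MIN_COUNT = math.ceil(0.20 * len(TRANSACTIONS))


def hash_pair(x, y):
    """hash(x, y) = (order(x) * 10 + order(y)) mod 7."""
    return (ORDER[x] * 10 + ORDER[y]) % 7


def bucket_counts(transactions):
    """During the first scan, hash every 2-itemset of each transaction
    (items taken in ascending order) into one of 7 buckets and count hits."""
    counts = [0] * 7
    for items in transactions:
        for x, y in combinations(sorted(items, key=ORDER.get), 2):
            counts[hash_pair(x, y)] += 1
    return counts


def candidate_pairs(transactions, min_count):
    """C2: pairs of frequent items whose hash bucket count reaches min_count."""
    item_counts = Counter(item for t in transactions for item in t)
    frequent = sorted((i for i, c in item_counts.items() if c >= min_count), key=ORDER.get)
    buckets = bucket_counts(transactions)
    return [(x, y) for x, y in combinations(frequent, 2)
            if buckets[hash_pair(x, y)] >= min_count]


if __name__ == "__main__":
    print(hash_pair("A", "B"))  # 5, as in the question's example
    print(bucket_counts(TRANSACTIONS))
    print(candidate_pairs(TRANSACTIONS, MIN_COUNT))
```

A bucket whose count is below the minimum support count cannot contain a frequent 2-itemset, which is what lets the hash-based variant prune C2 before the second database scan.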
Use each of the above three algorithms (5 marks for each algorithm) to mine all the rules that match the following meta-rule template:
      buys(X, Y) => buys(X, "E") -- [s, c]
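The general and partitioning-based algorithms, plus the filter for the meta-rule template, can be sketched as follows. This is a minimal illustration, not a reference solution: the join step simply unions pairs of frequent (k-1)-itemsets instead of using the lexicographic prefix join, and the template is read as "a single item Y on the left-hand side, E on the right". The function names are hypothetical.

```python
from itertools import combinations

# Transaction database from the table above (T1..T6).
TRANSACTIONS = [
    {"A", "B", "E"},
    {"B", "C", "D"},
    {"B", "D", "E"},
    {"C", "D", "E"},
    {"B", "C", "D", "E"},
    {"B", "C", "E"},
]


def count_support(candidates, transactions):
    """One database scan: count the transactions that contain each candidate."""
    return {c: sum(1 for t in transactions if c <= t) for c in candidates}


def apriori(transactions, min_support):
    """Return {frozenset(itemset): count} for every itemset whose support
    (count / number of transactions) is at least min_support."""
    threshold = min_support * len(transactions)
    items = {frozenset([i]) for t in transactions for i in t}
    frequent = {s: n for s, n in count_support(items, transactions).items() if n >= threshold}
    result = dict(frequent)
    k = 2
    while frequent:
        # Join: union two frequent (k-1)-itemsets that together form a k-itemset.
        candidates = {a | b for a, b in combinations(frequent, 2) if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        counts = count_support(candidates, transactions)
        frequent = {s: n for s, n in counts.items() if n >= threshold}
        result.update(frequent)
        k += 1
    return result


def partition_apriori(partitions, min_support):
    """Phase 1: itemsets frequent in at least one partition become global candidates.
    Phase 2: one scan of the whole database keeps the globally frequent ones."""
    candidates = set()
    for part in partitions:
        candidates |= set(apriori(part, min_support))
    everything = [t for part in partitions for t in part]
    counts = count_support(candidates, everything)
    return {s: n for s, n in counts.items() if n >= min_support * len(everything)}


def rules_to_target(frequent, n_transactions, target, min_conf):
    """Rules matching buys(X, Y) => buys(X, target), with one item Y on the left."""
    rules = []
    for itemset, count in frequent.items():
        if len(itemset) == 2 and target in itemset:
            (y,) = itemset - {target}
            confidence = count / frequent[frozenset([y])]
            if confidence >= min_conf:
                rules.append((y, target, count / n_transactions, confidence))
    return sorted(rules)


if __name__ == "__main__":
    general = apriori(TRANSACTIONS, 0.20)
    # T1-T3 form one partition and T4-T6 the other, as in item 3.
    partitioned = partition_apriori([TRANSACTIONS[:3], TRANSACTIONS[3:]], 0.20)
    assert general == partitioned  # both variants find the same frequent itemsets
    for y, e, s, c in rules_to_target(general, len(TRANSACTIONS), "E", 0.60):
        print(f"buys(X, {y!r}) => buys(X, {e!r})  [s={s:.1%}, c={c:.1%}]")
```

The partitioning variant is correct because any itemset that is frequent in the whole database must be frequent in at least one partition, so phase 1 never misses a globally frequent itemset.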

2. Calculation Question (20 marks)

Suppose a data relation about a large set of students in a university database has been generalized to a relation R. You are required to derive a characteristic rule and a discriminant rule from this relation.

You can print out the data sheet containing the concept hierarchies and the relation R used in this question.

Let the attribute thresholds (denoted as T(attribute)) be: T(major) = 3, T(status) = 2, T(age) = 2, T(nationality) = 2, and T(gpa) = 3.
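The attribute thresholds control how far each attribute is generalized in attribute-oriented induction: an attribute keeps climbing its concept hierarchy while it has more distinct values than its threshold. The sketch below assumes each hierarchy is given as a child-to-parent dictionary, and it does not model attribute removal. The majors and counts are hypothetical toy data, not relation R; the real hierarchies are on the data sheet.

```python
from collections import Counter


def generalize_attribute(values, parent, threshold):
    """Replace each value by its parent in the concept hierarchy, one level
    at a time, until at most `threshold` distinct values remain or the
    hierarchy cannot be climbed any further."""
    current = list(values)
    while len(set(current)) > threshold:
        lifted = [parent.get(v, v) for v in current]
        if lifted == current:  # no value has a parent left; stop climbing
            break
        current = lifted
    return current


def generalize_relation(rows, parents, thresholds):
    """rows: list of (attribute-value dict, count) pairs. Each attribute named in
    `thresholds` is generalized independently, then identical generalized
    tuples are merged and their counts summed."""
    attrs = list(thresholds)
    columns = {a: generalize_attribute([r[a] for r, _ in rows], parents.get(a, {}), thresholds[a])
               for a in attrs}
    merged = Counter()
    for i, (_, count) in enumerate(rows):
        merged[tuple(columns[a][i] for a in attrs)] += count
    return merged


# Hypothetical toy hierarchy and tuples, for illustration only.
MAJOR_PARENT = {"physics": "science", "math": "science", "chemistry": "science",
                "history": "arts", "music": "arts"}
ROWS = [({"major": "physics"}, 10), ({"major": "math"}, 5), ({"major": "chemistry"}, 2),
        ({"major": "history"}, 4), ({"major": "music"}, 6)]

if __name__ == "__main__":
    # T(major) = 3: five distinct majors exceed the threshold, so major climbs one level.
    print(generalize_relation(ROWS, {"major": MAJOR_PARENT}, {"major": 3}))
```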

  1. Derive a characteristic rule for R.
  2. Using the same attribute thresholds, derive a discriminant rule that contrasts applied_science students with arts students.
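In the usual quantitative form of these rules (as in Han and Kamber's data mining textbook), each generalized tuple q_a of a characteristic rule carries a t-weight, t_weight = count(q_a) / sum_i count(q_i) over the tuples of the target class, and each tuple of a discriminant rule carries a d-weight, d_weight = count(q_a in C_j) / sum_i count(q_a in C_i) over all classes. The sketch below computes both from per-class counts; the class counts and attribute values are hypothetical, not taken from R.

```python
from fractions import Fraction


def t_weights(target_counts):
    """t-weight of generalized tuple q_a within the target class:
    count(q_a) / sum of count(q_i) over all generalized tuples q_i of the class."""
    total = sum(target_counts.values())
    return {q: Fraction(c, total) for q, c in target_counts.items()}


def d_weights(class_counts, target):
    """d-weight of generalized tuple q_a for the target class C_j:
    count(q_a in C_j) / sum over all classes C_i of count(q_a in C_i)."""
    all_tuples = set().union(*class_counts.values())
    weights = {}
    for q in all_tuples:
        total = sum(counts.get(q, 0) for counts in class_counts.values())
        weights[q] = Fraction(class_counts[target].get(q, 0), total)
    return weights


# Hypothetical counts of generalized (nationality, gpa) tuples per class.
CLASS_COUNTS = {
    "applied_science": {("domestic", "good"): 30, ("foreign", "excellent"): 10},
    "arts": {("domestic", "good"): 20, ("foreign", "fair"): 5},
}

if __name__ == "__main__":
    print(t_weights(CLASS_COUNTS["applied_science"]))
    print(d_weights(CLASS_COUNTS, "applied_science"))
```

Exact fractions are used so that the weights of a class's tuples sum to exactly 1, which makes hand-checking a derived rule easier.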

3. Chapter 5, Exercises 2 and 3 (40 marks)

4. Chapter 6, Exercise 7 (20 marks)