Data Mining Course Schedule 2002

Final score

SCHEDULE & GRADING

Week | Lecture | Assignment 30% | Oral Presentation 20%
week 2 | Introduction | |
week 3 | Data Warehouse and OLAP | Reading Assignment |
week 4 | OLAP II | Presentation Proposal | 1. Edu City / MS OLAP
week 5 | Data Preparation | | 2. TWBD
week 6 | DMQL & DBMiner | OLAP (3/25) | 3. Software Intro I (IM)
week 7 | Characterization and Discrimination | | 4. Software Intro II ()
week 8 | Association Rule I | | 5. Incremental mining (carry)
week 9 | Association Rule II | | 6. Text mining (nono)
week 10 | Sequential Pattern | DHP (4/15) | 7. Maximum itemset mining (want)
week 11 | Classification I | | 8. Association rule based classification (glendy)
week 12 | Exam 30% | Exam (5/6) |
week 13 | Clustering | | 9. Association rule based clustering (sting)
week 14 | Temporal Data Mining; Sequential Pattern Mining | | 10. Temporal data mining (windson); 11. TWBD (robcup); 12. IEPAD (bruce)
week 15 | Term Project 20% | I (5/27) |
week 16 | Presentation Order | II (6/3) |
week 17 | | II (6/10) |
week 18 | | |

Reading Assignment

S. Chaudhuri, U. Dayal, and V. Ganti, Database technology for decision support systems, IEEE Computer, Dec. 2001, pp. 48-55.

T. B. Pedersen and C. S. Jensen, Multidimensional database technology, IEEE Computer, Dec. 2001, pp. 40-46.

Oral Presentation (open from 3/11 to 5/6)

Send your slides to jahui@db.csie.ncu.edu.tw one week before your presentation.

TERM PROJECT

You have two options for the term project in this course: implementation-based and application-based. Choose either one according to your preference; both carry the same weight in your final grade.

Option I: Implementation-based Project

There are four typical kinds of knowledge in data mining. They can be described as:

1.     Concept Description

2.     Classification

3.     Association

4.     Clustering

You are required to choose one of these kinds of knowledge, implement a mining algorithm for it in C++ or C, and test your program on real data, which can be obtained from the University of California, Irvine: Machine Learning Database Repositories. Please spend some time browsing the data sets there and select the one most suitable for your testing. You may also test your program on several other data sets to show that it works.

You are required to prepare documentation for your project, including a description of the project, the algorithm, a design diagram, key data structures, source code, and the testing results (input/output). You need to explain your tests and test results, including any references that help readers understand the significance and interestingness of your work.
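As a rough illustration of the scale of a starting point for the Association option, the sketch below counts itemset support over a small in-memory transaction database. It is only a minimal sketch, not a required design: the names `Transaction`, `count_support`, and `frequent_items` are hypothetical, items are assumed to be encoded as integer IDs, and a real project would still need candidate generation for larger itemsets, rule generation, and loading of an actual data set.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <vector>

// A transaction is a set of integer item IDs (hypothetical encoding).
using Transaction = std::set<int>;

// Count how many transactions contain every item of `itemset`.
std::size_t count_support(const std::vector<Transaction>& db,
                          const std::set<int>& itemset) {
    std::size_t count = 0;
    for (const Transaction& t : db) {
        // std::includes needs sorted ranges; std::set iterates in sorted order.
        if (std::includes(t.begin(), t.end(), itemset.begin(), itemset.end())) {
            ++count;
        }
    }
    return count;
}

// Return all single items whose support is at least `min_support`,
// in ascending order of item ID.
std::vector<int> frequent_items(const std::vector<Transaction>& db,
                                std::size_t min_support) {
    // Collect every distinct item that appears in the database.
    std::set<int> all_items;
    for (const Transaction& t : db) {
        all_items.insert(t.begin(), t.end());
    }

    std::vector<int> result;
    for (int item : all_items) {
        if (count_support(db, {item}) >= min_support) {
            result.push_back(item);
        }
    }
    return result;
}
```

For example, with the four transactions {1, 2, 3}, {1, 2}, {2, 3}, and {1, 3, 4}, the itemset {1, 2} has support 2, and items 1, 2, and 3 each reach a minimum support of 3 while item 4 does not.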

Option II: Application-based Project

Choose one application domain and prepare documentation for your case study, covering:

1.     The application case.

2.     How you prepare your data.

3.     The mining type you choose.

4.     How you would explain your results.

5.     What problems you might encounter.

Note: The documentation should be printed in 12pt font with single line spacing, and should not exceed 15 pages. The recommended length of the essay, though not strictly required, is 10 to 15 pages; we pay more attention to the quality of your essay than to the number of pages. Please also prepare slides for a 30-minute presentation of your work. Test data can be downloaded from Datasets for Machine Learning, Knowledge Discovery and Data Mining or PKDD Cup.

1.         KDD Cup 1998

2.         KDD Cup 1999

3.         KDD Cup 2000

4.         KDD Cup 2001

5.         KDD Cup 2002

Special Topic

Incremental Mining

Maximal Frequent Itemset

Temporal Data Mining

Association Rule based Classifier

Scalable Classifier

Data Cleansing

Privacy Preserving Data Mining