Data Mining

Data Mining Course Schedule 2002

SCHEDULE & GRADING

Week	Lecture	Assignment 30%	Oral Presentation 20%
week 2	Introduction
week 3	Data Warehouse and OLAP	Reading Assignment
week 4	OLAP II	Presentation Proposal	1. Edu City / MS OLAP
week 5	Data Preparation		2. TWBD
week 6	DMQL & DBMiner	OLAP (3/25)	3. Software Intro I (IM)
week 7	Characterization and Discrimination		4. Software Intro II ()
week 8	Association Rule I		5. Incremental mining (carry)
week 9	Association Rule II		6. Text mining (nono)
week 10	Sequential Pattern	DHP (4/15)	7. Maximum itemset mining (want)
week 11	Classification I		8. Association rule based classification (glendy)
week 12	Exam 30%	Exam (5/6)
week 13	Clustering		9. Association rule based clustering (sting)
week 14	Temporal Data Mining / Sequential Pattern Mining		10. Temporal data mining (windson) 11. TWBD (robcup) 12. IEPAD (bruce)
week 15	Term Project 20%	I (5/27)
week 16	Presentation Order	II (6/3)
week 17		II (6/10)
week 18

Reading Assignment

S. Chaudhuri, U. Dayal, and V. Ganti, Database technology for decision support systems, IEEE Computer, Dec. 2001, pp. 48-55.

T. B. Pedersen and C. S. Jensen, Multidimensional database technology, IEEE Computer, Dec. 2001, pp. 40-46.

Oral Presentation (open from 3/11 to 5/6)

Send your slides to jahui@db.csie.ncu.edu.tw one week before your presentation.

TERM PROJECT

You have two options for the term project in this course: implementation-based or application-based. Choose either one according to your preference; both carry the same weight in your final grade.

Option I: Implementation-based Project

There are four typical kinds of knowledge in data mining. They can be described as:

1. Concept Description

2. Classification

3. Association

4. Clustering

You are required to choose one of these kinds of knowledge, implement an algorithm for it in C++ or C, and test your program on real data, which can be obtained from the University of California, Irvine Machine Learning Database Repositories. Please spend some time browsing the data sets there and select the most suitable one for your testing. You may also test on several other data sets to show that your program works.

You are required to prepare documentation for your project, including a description of the project, the algorithm, a design diagram, key data structures, source code, and the test results (input/output). Explain your tests and their results, and include any references that help readers understand the significance and interestingness of your work.

Option II: Application-based Project

Choose one application domain and prepare documentation for your case study, including:

1. The application case.

2. How do you prepare your data?

3. Which mining type do you choose?

4. How would you explain your results?

5. What problems might you encounter?

Note: The documentation should be printed in 12pt font with single line spacing and should not exceed 15 pages; a length of 10 to 15 pages is expected, though not strictly required. We pay more attention to the quality of your essay than to the number of pages. Please also prepare slides for a 30-minute presentation of your work. Test data can be downloaded from Datasets for Machine Learning, Knowledge Discovery and Data Mining or PKDD Cup.

Special Topic

Incremental Mining

David W. Cheung, S. D. Lee, Benjamin Kao, A General Incremental Technique for Maintaining Discovered Association Rules, Proceedings of the 5th International Conference on Database Systems for Advanced Applications, Melbourne, Australia, Apr. 1-4, 1997
Chang-Hung Lee, Cheng-Ru Lin, and Ming-Syan Chen, Sliding-Window Filtering: An Efficient Algorithm for Incremental Mining, ACM CIKM 2001
Zequn Zhou, A Low-Scan Incremental Association Rule Maintenance Method Based on the Apriori Property, AI 2001
A. Veloso, B. Possas, W. Meira Jr., M. B. de Carvalho, Knowledge Management in Association Rule Mining, ICDM 2001

Maximal Frequent Itemset

R. Agarwal, C. Aggarwal, and V. Prasad. Depth First Generation of Long Patterns. In ACM SIGKDD Conf., Aug. 2000.
R. Agrawal, et al. Fast discovery of association rules. In Advances in Knowledge Discovery and Data Mining, AAAI Press, 1996.
R. J. Bayardo. Efficiently mining long patterns from databases. In ACM SIGMOD Conf., June 1998.
D. Burdick, M. Calimlim, and J. Gehrke. MAFIA: A Maximal Frequent Itemset Algorithm for Transactional Databases. ICDE'01.
Karam Gouda and Mohammed J. Zaki. Efficiently Mining Maximal Frequent Itemsets. ICDM'01.
J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. In ACM SIGMOD Conf., May 2000.

Temporal Data Mining

B. Ozden, S. Ramaswamy, A. Silberschatz, Cyclic Association Rules, ICDE 1998: 412-421
S. Ramaswamy, S. Mahajan, A. Silberschatz: On the Discovery of Interesting Patterns in Association Rules. VLDB 1998: 368-379
J. Han, G. Dong, Y. Yin, Efficient Mining of Partial Periodic Patterns In Time Series Database, ICDE 1999: 106-115
C. P. Rainsford, Adding Temporal Semantics to Association Rules, PKDD 1999: 504-509
Y. Li, P. Ning, X. Sean Wang, S. Jajodia, Discovering Calendar-based Temporal Association Rules, TIME 2001: 111-118
Xiaodong Chen, Ilias Petrounias: Mining Temporal Features in Association Rules. PKDD 1999: 295-300

Association Rule based Classifier

B. Lent, A. Swami, and J. Widom. Clustering association rules. ICDE 1997: 220-231
B. Liu, W. Hsu, and Y. Ma. Integrating classification and association rule mining. KDD 1998: 80-86
G. Dong and J. Li. Efficient mining of emerging patterns: Discovering trends and differences. KDD 1999: 43-52
J. Li, G. Dong, and K. Ramamohanarao. Making use of the most expressive jumping emerging patterns for classification. PAKDD 2000: 220-232
D. Meretakis and B. Wuthrich. Extending naïve Bayes classifiers using long itemsets. KDD 1999: 165-174

Scalable Classifier

SLIQ (EDBT96 -- Mehta et al. 96)
SPRINT (VLDB96 -- J. Shafer et al. 96)
PUBLIC (VLDB98 -- Rastogi & Shim 98)
RainForest (VLDB98 -- Gehrke et al. 98)

Data Cleansing

Privacy Preserving Data Mining