Dr. Chia-Hui Chang - Rm E1-323, Ext: 4523
Homepage: http://www.csie.ncu.edu.tw/~chia/
Email: chia@csie.ncu.edu.tw
- Data Mining: Concepts and Techniques, J. Han and M. Kamber, Morgan Kaufmann, 2000
- Slides can be downloaded from http://www.cs.sfu.ca/~han/DM_Book.html
- Predictive Data Mining, S.M. Weiss and N. Indurkhya, Morgan Kaufmann, 1998
- Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Ian H. Witten and Eibe Frank, Morgan Kaufmann, 1999
- Advances in Knowledge Discovery and Data Mining, U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth and R. Uthurusamy, AAAI/MIT Press, 1996: A recent collection of research papers
- Mastering Data Mining: The Art and Science of Customer Relationship Management, Michael J. A. Berry and Gordon Linoff, John Wiley & Sons, 1999
Schedule | ||
Introduction | week 2 | |
Data Warehouse and OLAP | week 3 | |
Data Preparation | week 4 | |
Assignment 1 (MSSQL OLAP) -- Due Mar. 19 | week 4 | |
DMQL & DBMiner Presentation Proposal -- Due Mar. 26 | week 5 | |
Characterization and Discrimination | week 6 | |
Assignment 2 (DBMiner) -- Due Apr. 9 | week 7 | |
Association Rule | week 8 | |
Assignment 3 (Association Rule) -- Due Apr. 30 (postponed to May 7) | week 9 | |
Classification | week 10 | |
Clustering | week 11 | |
Case Study | week 12 | |
Exam, May. 14 -- 30% | week 13 | |
Term Project -- 30% | 3 weeks | |
Others -- 10%
- [Selected or Voluntary] Assignment presentation
- [Optional] Oral presentation topics
You have two options for the term project in this course: implementation-based and application-based. You may choose either one according to your preference. Both carry the same weight (30%) in your final grade.
Option I: Implementation-based Project
There are three typical kinds of knowledge in data mining. They can be described as:
- Classification
- Association
- Clustering
You are required to choose any one of these, implement an algorithm for it in C++ or C, and test your program on real data, which can be obtained from the University of California-Irvine Machine Learning Database Repositories. Please spend some time navigating through the different data sets there and select the most suitable one for your testing. You may also test on several other data sets to show that your program works.
You are required to prepare documentation for your project, including a description of the project, the algorithm, a design diagram, key data structures, source code, and the testing results (input/output). Explain your tests and test results, and include any references that help readers understand the significance and interestingness of your work.
Option II: Application-based Project
Students are asked to choose one application domain and prepare documentation for a case study, including:
- The application case.
- How do you prepare your data?
- Choose the mining type.
- How would you explain your result?
- What problems might you encounter?
Note: The documentation should be printed in 12pt font with single line spacing, and should not exceed 15 pages. Please also prepare slides for a 30-minute presentation of your work. The length of the essay, though not strictly required, should be between 10 and 15 pages. However, we pay more attention to the quality of your essay than to the number of pages. Test data can be downloaded from Datasets for Machine Learning, Knowledge Discovery and Data Mining.
- ACM KDD
http://informatik.uni-trier.de/~ley/db/conf/kdd/index.html
- IEEE ICDE
http://informatik.uni-trier.de/~ley/db/conf/icde/index.html
- IEEE ICDM
http://informatik.uni-trier.de/~ley/db/conf/icdm/index.html
- SIAM International Conference on Data Mining
SDM03, SDM02, SDM01
- MLDM
http://informatik.uni-trier.de/~ley/db/conf/mldm/index.html
- DaWaK
http://informatik.uni-trier.de/~ley/db/conf/dawak/index.html
- DMKD
http://informatik.uni-trier.de/~ley/db/conf/dmkd/index.html
- PKDD
http://informatik.uni-trier.de/~ley/db/conf/pkdd/index.html
- PAKDD
http://informatik.uni-trier.de/~ley/db/conf/pakdd/index.html
- Data Mining and Knowledge Discovery
http://www.wkap.nl/journalhome.htm/1384-5810
- KDNuggets Directory: Data Mining and Knowledge Discovery
http://www.kdnuggets.com/index.html
- Data Mining Software
http://www.cs.bham.ac.uk/~anp/software.html