Computer Science Mount Holyoke College
  CS341 Data Mining, Spring 2007


 

Instructor: Professor Xiaoyan Li
Class Meets: M, W 2:40 PM-3:55 PM
Classroom: Kendade Hall 107
Office Hours: T, Th 10:00-11:00 AM
Office: Clapp 227
Email: xli@mtholyoke.edu

 

 


Course Update Information 

  • January 29. First Day of Class. Welcome all!

 

Course Objectives

Data mining has become one of the most exciting and fastest-growing fields in computer science. It refers to a range of techniques for uncovering hidden information in a database. The data to be mined may be complex multimedia data, including text, graphics, video, audio, and bioinformatics data. Data mining has evolved from several areas, including databases, artificial intelligence, machine learning, pattern recognition, and multimedia information retrieval, and it can be applied to exploring hidden information in web, video, and bioinformatics data. This course is designed to give senior undergraduate students an introduction to data mining concepts and tools. In addition, related topics such as information retrieval, web mining, and bioinformatics will be covered.

Textbook: Data Mining: Introductory and Advanced Topics, by Margaret H. Dunham

 

Schedule

The following schedule is based on the Spring 2007 academic calendar:

 

Date | Planned Lecture Topics | Read/Assign/Exam/Lab
Jan 29 (M) | Introduction | Ch 1
Jan 31 (W) | Part I: Database Systems, Decision Support Systems and Warehousing | Ch 2
Feb 5 (M) | Part I: Information Retrieval, Question Answering and Web Search | Ch 2
Feb 7 (W) | Part I: Data Mining Techniques (I) | Ch 3
Feb 12 (M) | Part I: Data Mining Techniques (II) | Ch 3 (hw1)
Feb 14 (W) | “snow day” | Ch 4
Feb 19 (M) | Part II: Classification - Regression & Bayesian Classification | Ch 4
Feb 21 (W) | Part II: Classification - Distance-Based Algorithm | Ch 4 (hw2)
Feb 26 (M) | Part II: Classification - Decision Tree Algorithm | Ch 4
Feb 28 (W) | Part II: Classification - Rule-Based Algorithm | Ch 4 (hw3)
Mar 5 (M) | Review session |
Mar 7 (W) | First in-class exam |
Mar 12 (M) | Part II: Clustering - Similarity and Distance Measures | Ch 5
Mar 14 (W) | Part II: Clustering - Hierarchical Algorithm | Ch 5
Mar 17-25 | Mid-semester break |
Mar 26 (M) | Part II: Clustering - Partitional Algorithm | (hw4)
Mar 28 (W) | Part II: Association Rules | Ch 6
Apr 2 (M) | Part III: PERL (I) |
Apr 4 (W) | Part III: PERL (II) |
Apr 9 (M), Apr 11 (W) | Part III: Project Discussion | Final Project
Apr 16 (M), Apr 18 (W) | Part III: Project Discussion & Follow up |
Apr 23 (M), Apr 25 (W) | Part III: Project Discussion & Follow up |
Apr 30 (M), May 2 (W) | Part III: Project Discussion & Follow up |
May 7 (M) | In-class Presentation |
May 9-10 | Reading days |
May 11-15 | |


Assignments and Grading

See the schedule above for the tentative timetable. There will be about 4 assignments, which together count for 20% of your final grade. One midterm and one final project contribute 20% and 40% of your final grade, respectively. The remaining 20% goes to class participation. There will be no final examination.

Policies: Students may discuss ideas together. However, since each student gets credit for her own submissions, all solutions must be completed separately by each student and must not be shared.

Communications: I would like the course to run smoothly and enjoyably. Feel free to let me know what you find good and interesting about the course, and let me know as soon as possible about anything that is not. You may see me in my office during my office hours or send me an e-mail.


Copyright © Xiaoyan Li, Mount Holyoke College, Spring 2007.