Grade "A+" Accredited by NAAC with a CGPA of 3.46

Data Mining

Course ID: BHCS 17B
Level: Undergraduate
Program: B.Sc. CS (Hons.)
Semester: Sixth
Credits: 6.0
Paper Type: DSE - 3
Method: Lecture & Practical

Unique Paper Code: Update Awaited

This course introduces data mining techniques and enables students to apply them to real-life datasets. It focuses on three main data mining tasks: classification, clustering, and association rule mining.

Learning Outcomes:

At the end of the course, students should be able to:

  • Pre-process the data, and perform cleaning and transformation.
  • Apply a suitable classification algorithm to train a classifier and evaluate its performance.
  • Apply an appropriate clustering algorithm to cluster data and evaluate clustering quality.
  • Use association rule mining algorithms and generate frequent item-sets and association rules.

Course Contents

Unit 1

Introduction to Data Mining – Applications of data mining, data mining tasks, motivation and challenges, types of data attributes and measurements, data quality.
Data Pre-processing – aggregation, sampling, dimensionality reduction, Feature Subset Selection, Feature Creation, Discretization and Binarization, Variable Transformation.
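
As an illustration of two of the pre-processing topics above, discretization and binarization, here is a minimal Python sketch. The helper names `equal_width_bins` and `binarize` and the sample values are made up for this example; equal-width binning is only one of several discretization schemes the unit may cover.

```python
def equal_width_bins(values, k):
    """Assign each value to one of k equal-width intervals, numbered 0 .. k-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        # All values are identical, so everything falls in a single bin.
        return [0 for _ in values]
    # The maximum value would land in bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def binarize(labels):
    """One-hot encode categorical labels; columns follow sorted label order."""
    categories = sorted(set(labels))
    return [[1 if label == c else 0 for c in categories] for label in labels]


ages = [5, 17, 30, 42, 65]  # made-up values for illustration
bins = equal_width_bins(ages, 3)
encoded = binarize(["single", "married", "single"])

print(bins)     # [0, 0, 1, 1, 2]
print(encoded)  # [[0, 1], [1, 0], [0, 1]]
```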

Unit 2

Classification: Basic Concepts; Decision Tree Classifier (decision tree algorithm, attribute selection measures); Nearest Neighbour Classifier; Bayes' Theorem and Naive Bayes Classifier.
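
To make the nearest neighbour classifier concrete, the following is a minimal sketch in plain Python. It assumes Euclidean distance and a simple majority vote; the function name `knn_predict` and the toy training points are invented for illustration and are not part of the prescribed material.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict the majority label among the k training points nearest to query.

    train is a list of (feature_tuple, label) pairs.
    """
    # Sort training points by Euclidean distance to the query and keep k of them.
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Two well-separated, made-up groups of points.
train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]
pred_a = knn_predict(train, (1.1, 1.0))
pred_b = knn_predict(train, (5.0, 4.8))
```

With these points, the three nearest neighbours of each query all share one label, so the vote is unanimous. On real data the choice of k and of the distance measure usually needs tuning.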

Model Evaluation: Holdout Method, Random Sub Sampling, Cross-Validation, evaluation metrics, confusion matrix.
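
The evaluation topics above can be sketched as follows: a binary confusion matrix with the metrics derived from it, and an index split for k-fold cross-validation. The function names and the label lists are hypothetical, and the folds are consecutive and unshuffled purely to keep the example deterministic.

```python
def confusion_matrix(actual, predicted, positive):
    """Return (TP, FP, FN, TN) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def k_fold_indices(n, k):
    """Split indices 0 .. n-1 into k consecutive folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        # The first n % k folds take one extra element.
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


actual = ["yes", "yes", "no", "no", "yes", "no"]      # made-up labels
predicted = ["yes", "no", "no", "yes", "yes", "no"]   # made-up predictions
tp, fp, fn, tn = confusion_matrix(actual, predicted, "yes")
accuracy = (tp + tn) / len(actual)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
folds = k_fold_indices(7, 3)
```

In cross-validation each fold serves once as the test set while the remaining folds form the training set; the holdout method corresponds to using a single such split.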

Unit 3

Association rule mining: Transaction data-set, Frequent Itemset, Support measure, Apriori Principle, Apriori Algorithm, Computational Complexity, Rule Generation, Confidence of association rule.
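
A compact sketch of the ideas in this unit is given below: support counting, level-wise candidate generation with Apriori pruning, and rule confidence. It is a simplified teaching version rather than an optimized implementation, and the transactions and the threshold are invented for the example.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def apriori(transactions, min_support):
    """Return {itemset: support} for all frequent itemsets."""
    items = sorted({item for t in transactions for item in t})
    level = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while level:
        # Keep only the candidates that meet the support threshold.
        kept = {c: support(c, transactions) for c in level}
        kept = {c: s for c, s in kept.items() if s >= min_support}
        frequent.update(kept)
        k += 1
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in kept for b in kept if len(a | b) == k}
        # Prune step (Apriori principle): every (k-1)-subset must be frequent.
        level = [c for c in candidates
                 if all(frozenset(s) in kept for s in combinations(c, k - 1))]
    return frequent


def confidence(antecedent, consequent, transactions):
    """Confidence of antecedent -> consequent: support(A union C) / support(A)."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)


transactions = [  # made-up market-basket data
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
frequent = apriori(transactions, min_support=0.5)
conf = confidence(frozenset({"butter"}), frozenset({"bread"}), transactions)
```

Here {milk, butter} appears in only one of four transactions, so it is infrequent, and the Apriori principle prunes the candidate {bread, milk, butter} without counting its support.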

Unit 4

Cluster Analysis: Basic Concepts, Different Types of Clustering Methods, Different Types of Clusters; K-means (the basic K-means algorithm, strengths and weaknesses); Agglomerative Hierarchical Clustering (basic algorithm, proximity between clusters); DBSCAN (the DBSCAN algorithm, strengths and weaknesses).
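
The basic K-means algorithm alternates an assignment step and a centroid update step until the centroids stop moving. The sketch below assumes Euclidean distance and takes the initial centroids as an argument so that the result is reproducible; the points and starting centroids are made up.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Basic K-means: repeat assignment and update until centroids are stable."""
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centroids, clusters = kmeans(points, [(1.0, 1.0), (8.0, 8.0)])
```

One weakness listed in the unit shows up directly in this sketch: the outcome depends on the initial centroids, so a poor starting choice can lead to a different, worse clustering.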

Practicals

Lab List 1

Section 1: Preprocessing

  1. Create a file “people.txt” with the following data:


i) Read the data from the file “people.txt”.
ii) Create a ruleset E that contains rules to check the following conditions:

    1. The age should be in the range 0-150.
    2. The age should be greater than years married.
    3. The status should be married or single or widowed.
    4. If age is less than 18 the age group should be child, if age is between 18 and 65 the age group should be adult, if age is more than 65 the age group should be elderly.

iii) Check whether ruleset E is violated by the data in the file “people.txt”.

iv) Summarize the results obtained in part (iii).

v) Visualize the results obtained in part (iii).
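
One possible way to approach parts (ii) to (v) in Python is sketched below. The records, the field names (`age`, `agegroup`, `status`, `yearsmarried`) and all helper names are assumptions made for this example, since the contents of “people.txt” are not reproduced here; reading the file is therefore left out. Ages from 18 to 65 inclusive are treated as adult, which is one reading of rule 4.

```python
def age_group(age):
    """Expected age group under rule 4 (18 to 65 inclusive counts as adult)."""
    if age < 18:
        return "child"
    return "adult" if age <= 65 else "elderly"


# Ruleset E: each rule returns True when the record satisfies it.
RULES = {
    "age in 0-150": lambda r: 0 <= r["age"] <= 150,
    "age > years married": lambda r: r["age"] > r["yearsmarried"],
    "valid status": lambda r: r["status"] in {"married", "single", "widowed"},
    "age group matches age": lambda r: r["agegroup"] == age_group(r["age"]),
}


def check(records):
    """Part (iii): for each record, list the names of the rules it violates."""
    return [[name for name, rule in RULES.items() if not rule(r)]
            for r in records]


def summarize(violations):
    """Part (iv): count how many records violate each rule."""
    counts = {name: 0 for name in RULES}
    for violated in violations:
        for name in violated:
            counts[name] += 1
    return counts


def bar_chart(counts):
    """Part (v): a plain-text bar chart, one '#' per violating record."""
    return [f"{name:<24}{'#' * n} ({n})" for name, n in counts.items()]


records = [  # hypothetical rows, not the data from people.txt
    {"age": 21, "agegroup": "adult", "status": "single", "yearsmarried": 0},
    {"age": 18, "agegroup": "adult", "status": "married", "yearsmarried": 20},
    {"age": 221, "agegroup": "elderly", "status": "widowed", "yearsmarried": 2},
    {"age": 34, "agegroup": "child", "status": "divorced", "yearsmarried": 33},
]
violations = check(records)
summary = summarize(violations)
print("\n".join(bar_chart(summary)))
```

Keeping the rules in a dictionary separates the rule definitions from the checking logic, loosely mirroring how a ruleset object is kept apart from the data it validates.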

  2. Perform the following preprocessing tasks on the dirty_iris dataset.
      1. Calculate the number and percentage of observations that are complete.
      2. Replace all the special values in data with NA.
      3. Define these rules in a separate text file and read them.
      (Use the editfile function in R (package editrules), or a similar function in Python.)
      Print the resulting constraint object.
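
Tasks 1 and 2 could be sketched in plain Python as shown below. The rows are hypothetical stand-ins for dirty_iris, `None` plays the role of NA, and "special values" is taken to mean NaN, Inf and -Inf; all function names are invented for the example.

```python
import math


def is_special(x):
    """True for NaN, Inf and -Inf, the special floating-point values."""
    return isinstance(x, float) and (math.isnan(x) or math.isinf(x))


def replace_special(rows):
    """Return a copy of rows with special values replaced by None (NA)."""
    return [[None if is_special(x) else x for x in row] for row in rows]


def complete_stats(rows):
    """Return (count, percentage) of rows that contain no None."""
    complete = sum(1 for row in rows if all(x is not None for x in row))
    return complete, 100.0 * complete / len(rows)


rows = [  # made-up rows in the shape of the iris data
    [6.4, 3.2, 4.5, 1.5, "versicolor"],
    [6.3, 3.3, 6.0, 2.5, "virginica"],
    [6.2, None, 5.4, 2.3, "virginica"],
    [5.0, 3.4, 1.6, float("inf"), "setosa"],
    [float("-inf"), 2.6, 3.5, 1.0, "versicolor"],
]
before = complete_stats(rows)    # counts only None as missing
cleaned = replace_special(rows)
after = complete_stats(cleaned)  # special values now count as missing too
```

Comparing `before` and `after` shows how many observations looked complete only because they held a special value instead of an explicit NA.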

Additional Information

Text Books


Han, J., Kamber, M., & Pei, J. (2011). Data Mining: Concepts and Techniques. 3rd edition. Morgan Kaufmann.
Tan, P.-N., Steinbach, M., & Kumar, V. (2005). Introduction to Data Mining. 1st Edition. Pearson Education.

Additional Resources


Gupta, G. K. (2006). Introduction to Data Mining with Case Studies. Prentice-Hall of India.
Hand, D., Mannila, H., & Smyth, P. (2006). Principles of Data Mining. Prentice-Hall of India.
Pujari, A. (2008). Data Mining Techniques. 2nd edition. Universities Press.

Teaching Learning Process


Use of ICT tools in conjunction with traditional classroom teaching methods
Interactive sessions
Class discussions

Assessment Methods

Written tests, assignments, quizzes, and presentations, as announced by the instructor in class.

Keywords

Data mining, classifiers, data pre-processing, metrics.

Disclaimer: Details on this page are subject to change as per University of Delhi guidelines. For the latest updates, please refer to the University of Delhi website.