Data Mining

Course ID: BHCS 17B
Level: Undergraduate
Program: B.Sc. CS (Hons.)
Semester: Sixth
Credits: 6.0
Paper Type: DSE - 3
Method: Lecture & Practical

Unique Paper Code: Update Awaited

This course introduces data mining techniques and enables students to apply them to real-life datasets. It focuses on three main data mining tasks: classification, clustering, and association rule mining.

Learning Outcomes:

At the end of the course, students should be able to:

Pre-process data, including cleaning and transformation.
Apply a suitable classification algorithm to train a classifier and evaluate its performance.
Apply an appropriate clustering algorithm to cluster data and evaluate clustering quality.
Use association rule mining algorithms to generate frequent itemsets and association rules.

Course Contents

Unit 1

Introduction to Data Mining – Applications of data mining, data mining tasks, motivation and challenges, types of data attributes and measurements, data quality.
Data Pre-processing – aggregation, sampling, dimensionality reduction, feature subset selection, feature creation, discretization and binarization, variable transformation.

Unit 2

Classification: basic concepts; decision tree classifier (decision tree algorithm, attribute selection measures); nearest neighbour classifier; Bayes theorem and naive Bayes classifier.

Model Evaluation: Holdout Method, Random Sub Sampling, Cross-Validation, evaluation metrics, confusion matrix.
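As an illustration of the evaluation topics above, the following minimal sketch builds binary confusion-matrix counts, derives accuracy, precision and recall from them, and generates k-fold cross-validation index splits. The function names and the toy labels are assumptions made for this example, not part of the prescribed syllabus, and only the Python standard library is used.

```python
from collections import Counter

def confusion_counts(actual, predicted, positive):
    """Return (TP, FP, FN, TN) for a binary problem."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return counts["TP"], counts["FP"], counts["FN"], counts["TN"]

def evaluate(actual, predicted, positive):
    """Compute accuracy, precision and recall from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(actual, predicted, positive)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) so that each record is tested exactly once."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]

# Toy labels, invented for illustration only.
actual    = ["yes", "yes", "no", "no",  "yes", "no", "no", "yes"]
predicted = ["yes", "no",  "no", "yes", "yes", "no", "no", "yes"]
metrics = evaluate(actual, predicted, positive="yes")
print(confusion_counts(actual, predicted, "yes"), metrics)
```

In the holdout method a single train/test split is used; the k-fold generator above repeats the split k times so that every record appears in a test fold once.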

Unit 3

Association rule mining: Transaction data-set, Frequent Itemset, Support measure, Apriori Principle, Apriori Algorithm, Computational Complexity, Rule Generation, Confidence of association rule.
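The support measure, the Apriori principle and rule confidence listed above can be sketched as follows. This is a small teaching sketch rather than an optimized implementation; the market-basket transactions and the 0.6 support threshold are assumptions chosen for illustration.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def apriori(transactions, min_support):
    """Return {itemset: support} for all frequent itemsets."""
    items = {item for t in transactions for item in t}
    current = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while current:
        # Keep only the candidates that meet the support threshold.
        level = {c: support(c, transactions) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (Apriori principle): every (k-1)-subset must be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent

def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

# Toy transaction data set, invented for illustration only.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Cola"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Cola"},
]
frequent = apriori(transactions, min_support=0.6)
rule_conf = confidence(frozenset({"Diapers"}), frozenset({"Beer"}), transactions)
```

With these transactions, four single items and four pairs are frequent; the only surviving three-item candidate, {Bread, Milk, Diapers}, falls below the threshold.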

Unit 4

Cluster Analysis: basic concepts; different types of clustering methods; different types of clusters; K-means (the basic K-means algorithm, strengths and weaknesses); agglomerative hierarchical clustering (basic algorithm, proximity between clusters); DBSCAN (the DBSCAN algorithm, strengths and weaknesses).
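The basic K-means algorithm named above alternates between assigning points to the nearest centroid and recomputing centroids. The sketch below assumes Euclidean distance, hand-picked initial centroids and a tiny invented 2-D data set; the sum of squared errors (SSE) is included as one common measure of clustering quality.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Basic k-means: repeat assignment and update until centroids stop moving."""
    centroids = list(centroids)
    clusters = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster
            else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

def sse(centroids, clusters):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum(math.dist(p, c) ** 2
               for c, cluster in zip(centroids, clusters) for p in cluster)

# Toy points and initial centroids, invented for illustration only.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centroids, clusters = kmeans(points, [(1.0, 1.0), (8.0, 8.0)])
print(centroids, sse(centroids, clusters))
```

Different initial centroids can lead to different final clusters, which is one of the weaknesses of K-means covered in this unit.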

Practicals

Lab List 1

Section 1: Preprocessing

Create a file “people.txt” with the following data:

i) Read the data from the file “people.txt”.
ii) Create a ruleset E that contains rules to check for the following conditions:

1. The age should be in the range 0-150.
2. The age should be greater than years married.
3. The status should be married or single or widowed.
4. If age is less than 18, the age group should be child; if age is between 18 and 65, the age group should be adult; if age is more than 65, the age group should be elderly.

iii) Check whether ruleset E is violated by the data in the file people.txt.

iv) Summarize the results obtained in part (iii).

v) Visualize the results obtained in part (iii).
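One possible Python approach to parts (i) to (v) is sketched below using only the standard library. The data table for people.txt is not reproduced on this page, so the rows and the column names (age, agegroup, status, yearsmarried) are assumptions for illustration only; the text is parsed from an in-memory string in place of the file, and "between 18 and 65" is treated as inclusive.

```python
import csv
import io

# Hypothetical contents standing in for people.txt (assumed columns and rows).
RAW = """\
age,agegroup,status,yearsmarried
21,adult,single,0
2,child,married,0
18,adult,married,20
221,elderly,widowed,2
34,child,married,3
"""

# (i) Read the data; with a real file, pass open("people.txt") to DictReader.
people = [
    dict(row, age=int(row["age"]), yearsmarried=int(row["yearsmarried"]))
    for row in csv.DictReader(io.StringIO(RAW))
]

def expected_agegroup(age):
    """Age group implied by rule 4 (18 and 65 assumed to count as adult)."""
    if age < 18:
        return "child"
    if age <= 65:
        return "adult"
    return "elderly"

# (ii) Ruleset E: each rule returns True when the record satisfies it.
rules = {
    "age in 0-150": lambda r: 0 <= r["age"] <= 150,
    "age > yearsmarried": lambda r: r["age"] > r["yearsmarried"],
    "status valid": lambda r: r["status"] in {"married", "single", "widowed"},
    "agegroup matches age": lambda r: r["agegroup"] == expected_agegroup(r["age"]),
}

# (iii) violations[i][name] is True when record i breaks the named rule.
violations = [{name: not check(r) for name, check in rules.items()} for r in people]

# (iv) Summarize: number of violating records per rule.
summary = {name: sum(v[name] for v in violations) for name in rules}

# (v) Visualize: a simple text bar chart of the summary.
for name, count in summary.items():
    print(f"{name:<22} {'#' * count} ({count})")
```

A plotting library could replace the text bar chart in part (v); the plain-text version is used here only to keep the sketch self-contained.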

1. Perform the following preprocessing tasks on the dirty_iris dataset.
  1. Calculate the number and percentage of observations that are complete.
  2. Replace all the special values in the data with NA.
  3. Define these rules in a separate text file and read them.
  (Use the editfile function from the R package editrules, or a similar function in Python.)
  Print the resulting constraint object.
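Tasks 1 and 2 could be approached in Python roughly as follows. The rows below are hypothetical stand-ins for dirty_iris, and the column names are assumed to follow the standard iris layout. None plays the role of NA, and "special values" is taken to mean NaN and infinite values. Task 3 is not sketched because the rules themselves are not listed on this page.

```python
import math

# Hypothetical rows standing in for dirty_iris (values invented for illustration).
rows = [
    {"Sepal.Length": 6.4, "Sepal.Width": 3.2, "Petal.Length": 4.5,
     "Petal.Width": 1.5, "Species": "versicolor"},
    {"Sepal.Length": 6.3, "Sepal.Width": 3.3, "Petal.Length": 6.0,
     "Petal.Width": 2.5, "Species": "virginica"},
    {"Sepal.Length": 6.2, "Sepal.Width": None, "Petal.Length": 5.4,
     "Petal.Width": 2.3, "Species": "virginica"},
    {"Sepal.Length": 5.0, "Sepal.Width": 3.4, "Petal.Length": 1.6,
     "Petal.Width": float("inf"), "Species": "setosa"},
]

def is_special(value):
    """True for NaN and +/-Inf (assumed meaning of 'special values')."""
    return isinstance(value, float) and not math.isfinite(value)

def count_complete(records):
    """Number of records with no missing (None) field."""
    return sum(all(v is not None for v in r.values()) for r in records)

# Task 1: number and percentage of complete observations.
n_complete = count_complete(rows)
pct_complete = 100 * n_complete / len(rows)
print(f"{n_complete} of {len(rows)} complete ({pct_complete:.1f}%)")

# Task 2: replace special values with NA (None here).
cleaned = [{k: (None if is_special(v) else v) for k, v in r.items()} for r in rows]
n_complete_after = count_complete(cleaned)
```

After the replacement, a row that held an infinite value is no longer complete, so the count of complete observations can drop.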

Additional Information

Text Books

Han, J., Kamber, M., & Pei, J. (2011). Data Mining: Concepts and Techniques. 3rd edition. Morgan Kaufmann.
Tan, P.-N., Steinbach, M., & Kumar, V. (2005). Introduction to Data Mining. 1st Edition. Pearson Education.

Additional Resources

Gupta, G. K. (2006). Introduction to Data Mining with Case Studies. Prentice-Hall of India.
Hand, D., Mannila, H., & Smyth, P. (2006). Principles of Data Mining. Prentice-Hall of India.
Pujari, A. (2008). Data Mining Techniques. 2nd edition. Universities Press.

Teaching Learning Process

Use of ICT tools in conjunction with traditional classroom teaching methods
Interactive sessions
Class discussions

Assessment Methods

Written tests, assignments, quizzes, presentations as announced by the instructor in the class

Keywords

Data mining, classifiers, data pre-processing, metrics.

Disclaimer: Details on this page are subject to change as per University of Delhi guidelines. For the latest updates, please refer to the University of Delhi website.
