Grade "A+" Accredited by NAAC with a CGPA of 3.46

Data Analysis and Visualization

Course ID
BHCS 15A
Level
Undergraduate
Program
B.Sc. CS (Hons.)
Semester
Fifth
Credits
6.0
Paper Type
DSE - 1
Method
Lecture & Practical

Unique Paper Code: Update Awaited

This course introduces students to data analysis and visualization in the field of exploratory data science using Python.

Learning Outcomes:

At the end of the course, students should be able to:

  • Use data analysis tools in the pandas library.
  • Load, clean, transform, merge and reshape data.
  • Handle external files as well as exceptions.
  • Analyze and manipulate time series data.
  • Solve real world data analysis problems.

Course Contents

Unit 1

Introduction: Introduction to Data Science, Exploratory Data Analysis and Data Science Process. Motivation for using Python for Data Analysis, Introduction to the IPython shell and Jupyter Notebook.

Essential Python Libraries: NumPy, pandas, matplotlib, SciPy, scikit-learn, statsmodels.

 

Unit 2

Getting Started with pandas: Arrays and vectorized computation, Introduction to pandas Data Structures, Essential Functionality, Summarizing and Computing Descriptive Statistics.

Data Loading, Storage and File Formats: Reading and Writing Data in Text Format, Web Scraping, Binary Data Formats, Interacting with Web APIs, Interacting with Databases.

Data Cleaning and Preparation: Handling Missing Data, Data Transformation, String Manipulation.
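As an illustration of the loading and cleaning topics in this unit, the following is a minimal sketch, not prescribed course material. The CSV text, the column names and the choice to fill missing scores with the column mean are all assumptions made for the example.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice this would be a file on disk.
csv_text = "name,score\n  alice ,82\nBOB,75\nBOB,75\ncarol,\n"

# Reading data in text format: read_csv accepts any file-like object.
raw = pd.read_csv(io.StringIO(csv_text))

cleaned = raw.copy()
# String manipulation: trim whitespace and normalize capitalization.
cleaned["name"] = cleaned["name"].str.strip().str.title()
# Data transformation: drop exact duplicate rows.
cleaned = cleaned.drop_duplicates()
# Handling missing data: fill the missing score with the column mean
# (one possible strategy among several).
cleaned["score"] = cleaned["score"].fillna(cleaned["score"].mean())
cleaned = cleaned.reset_index(drop=True)

print(cleaned)
```

Whether to fill, drop or flag missing values depends on the data set; the mean is used here only to keep the sketch short.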

Unit 3

Data Wrangling: Hierarchical Indexing, Combining and Merging Data Sets, Reshaping and Pivoting.
Data Visualization with matplotlib: Basics of matplotlib, plotting with pandas and seaborn, other Python visualization tools.
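The data wrangling topics in this unit (combining, merging and reshaping) can be sketched as follows. The two tables and their column names are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical tables: one row per student, and one row per (student, subject).
students = pd.DataFrame({
    "student_id": [1, 2, 3],
    "name": ["Asha", "Ravi", "Meera"],
})
marks = pd.DataFrame({
    "student_id": [1, 1, 2, 2, 3],
    "subject": ["Maths", "Physics", "Maths", "Physics", "Maths"],
    "marks": [78, 85, 64, 70, 90],
})

# Combining and merging: a left join on the shared key column.
merged = students.merge(marks, on="student_id", how="left")

# Reshaping and pivoting: long format to one column per subject.
# Combinations with no record (Meera, Physics) become NaN.
wide = merged.pivot(index="name", columns="subject", values="marks")

print(wide)
```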

Unit 4

Data Aggregation and Group Operations: GroupBy Mechanics, Data Aggregation, General split-apply-combine, Pivot Tables and Cross-Tabulation.
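A minimal sketch of the aggregation topics above, using an invented sales table (the data and column names are assumptions for the example):

```python
import pandas as pd

# Hypothetical sales records.
sales = pd.DataFrame({
    "region": ["North", "North", "South", "South", "South"],
    "product": ["A", "B", "A", "A", "B"],
    "units": [10, 4, 7, 3, 8],
})

# Split-apply-combine: split by region, sum the units, combine into a Series.
totals = sales.groupby("region")["units"].sum()

# Several aggregations at once.
summary = sales.groupby("region")["units"].agg(["sum", "mean"])

# Pivot table: total units per region and product.
table = sales.pivot_table(index="region", columns="product",
                          values="units", aggfunc="sum")

# Cross-tabulation: number of records per region and product.
counts = pd.crosstab(sales["region"], sales["product"])

print(table)
print(counts)
```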

Time Series Data Analysis: Date and Time Data Types and Tools, Time Series Basics, Date Ranges, Frequencies and Shifting, Time Zone Handling, Periods and Period Arithmetic, Resampling and Frequency Conversion, Moving Window Functions.
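Several of the time series topics above (date ranges, shifting, resampling and moving windows) can be illustrated with a short sketch. The dates and values are made up for the example.

```python
import pandas as pd

# Date ranges: six consecutive daily timestamps (hypothetical data).
idx = pd.date_range("2024-01-01", periods=6, freq="D")
ts = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=idx)

# Shifting: move the values one step forward; the first value becomes NaN.
shifted = ts.shift(1)

# Resampling: downsample to two-day bins and sum within each bin.
two_day = ts.resample("2D").sum()

# Moving window function: three-day rolling mean
# (NaN until the window holds three observations).
rolling = ts.rolling(window=3).mean()

print(two_day)
print(rolling)
```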

Unit 5

Advanced Pandas: Categorical Data, Advanced GroupBy Use, Techniques for Method Chaining.
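As a rough sketch of two topics from this unit, the example below combines an ordered categorical column with method chaining. The grades, the category order and the pass mark of 60 are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical grade records.
df = pd.DataFrame({
    "grade": ["B", "A", "C", "A"],
    "marks": [72, 91, 58, 88],
})

# Method chaining: each step returns a new DataFrame, so no intermediate
# variables are needed.
result = (
    df
    # Categorical data: an ordered category so that C < B < A when sorting.
    .assign(grade=lambda d: pd.Categorical(
        d["grade"], categories=["C", "B", "A"], ordered=True))
    .sort_values(["grade", "marks"])
    # Assumed pass mark of 60, for illustration only.
    .assign(passed=lambda d: d["marks"] >= 60)
    .reset_index(drop=True)
)

print(result)
```

Passing callables to `assign` lets each step refer to the DataFrame produced by the previous step rather than to the original `df`.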

Practicals

Lab List 1

  1. Practicals based on NumPy ndarray
  2. Practicals based on Pandas Data Structures
  3. Practicals based on Data Loading, Storage and File Formats
  4. Practicals based on Interacting with Web APIs
  5. Practicals based on Data Cleaning and Preparation
  6. Practicals based on Data Wrangling
  7. Practicals based on Data Visualization using matplotlib
  8. Practicals based on Data Aggregation
  9. Practicals based on Time Series Data Analysis

Additional Information

Text Books


McKinney, W. (2017). Python for Data Analysis: Data Wrangling with Pandas, NumPy and IPython. 2nd edition. O’Reilly Media.
O’Neil, C., & Schutt, R. (2013). Doing Data Science: Straight Talk from the Frontline. O’Reilly Media.

Teaching Learning Process


Use of ICT tools in conjunction with traditional classroom teaching methods
Interactive sessions
Class discussions

Assessment Methods

Written tests, assignments, quizzes and presentations, as announced by the instructor in class.

Keywords

Data analysis, data wrangling, data visualization, data cleaning, data preparation.

Disclaimer: Details on this page are subject to change as per University of Delhi guidelines. For the latest updates, please refer to the University of Delhi website.