I am religiously following the Data Analysis with Python online course at cognitiveclass.ai. I have found the courses at cognitiveclass.ai interesting and informative. The courses are very simple but require your utmost attention. They have rightly stayed away from being over-interactive (yes, this is a very bad trend across most of the data science courses found online). My only concern is that there are no text or PDF files to brush up with. At least handouts would have been great. So I thought of documenting the important lectures for myself.

Python Packages for Data Science- Short Intro

In order to do data analysis in Python, it helps to first know the main packages relevant to it.

A Python library is a collection of functions and methods that allow you to perform lots of actions without writing the code yourself. Libraries usually contain built-in modules providing different functionalities, which you can use directly. There are also extensive libraries that offer a broad range of facilities.

Python data analysis libraries can be divided into three groups:

1. The first group is called “scientific computing libraries.”

a. Pandas

offers data structures and tools for effective data manipulation and analysis. It provides fast access to structured data. The primary instrument of Pandas is a two-dimensional table with column and row labels, which is called a DataFrame. It is designed to provide easy indexing functionality.
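To make the DataFrame and its indexing a bit more concrete, here is a minimal sketch. The car names and numbers are made up by me for illustration; they are not from the course.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration only.
df = pd.DataFrame(
    {"price": [13495, 16500, 13950], "horsepower": [111, 154, 102]},
    index=["car_a", "car_b", "car_c"],  # row labels
)

# Label-based indexing: pick one cell by row label and column label.
price_b = df.loc["car_b", "price"]

# Boolean indexing: keep only the rows whose horsepower exceeds 105.
strong = df[df["horsepower"] > 105]

print(price_b)
print(list(strong.index))
```

Here `df.loc` looks values up by label, while the boolean filter returns a smaller DataFrame with the same columns.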

b. The NumPy library

uses arrays for its inputs and outputs. It can be extended to objects for matrices, and with minor coding changes, developers can perform fast array processing.
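A small sketch of what array processing looks like in practice, assuming nothing beyond NumPy itself. The values are arbitrary examples of mine.

```python
import numpy as np

# Two one-dimensional arrays with arbitrary example values.
a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])

# Element-wise addition runs over the whole array without an explicit loop.
total = a + b

# A two-dimensional array can act as a matrix; @ is matrix multiplication.
m = np.array([[1.0, 2.0], [3.0, 4.0]])
product = m @ np.array([1.0, 1.0])

print(total)
print(product)
```

Both the inputs and the results are arrays, which is why NumPy arrays pass so easily between the other libraries listed here.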

c. SciPy

includes functions for some advanced math problems, such as integration, optimization, and solving differential equations, as well as a few helpers for visualizing data.
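As a rough sketch of the kind of math routine SciPy provides, the snippet below integrates a function numerically and finds the minimum of another. Both functions are toy examples I picked, not ones from the course.

```python
from scipy import integrate, optimize

# Numerically integrate x^2 from 0 to 1; the exact answer is 1/3.
# quad returns the estimate and an upper bound on the absolute error.
area, abs_error = integrate.quad(lambda x: x ** 2, 0.0, 1.0)

# Find the x that minimizes (x - 2)^2; the exact minimizer is x = 2.
result = optimize.minimize_scalar(lambda x: (x - 2.0) ** 2)

print(area)
print(result.x)
```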

2. Visualization Libraries:-

Data visualization is one of the best ways to communicate with others and show them meaningful results of an analysis. These libraries enable you to create graphs, charts and maps.

a. The Matplotlib package

is the most well-known library for data visualization. It is great for making graphs and plots. The graphs are also highly customizable.
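A minimal sketch of a customized plot, to show what "highly customizable" can mean. The data, the labels and the styling choices are all my own assumptions for the example.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Arbitrary example data: y = x squared.
x = [0, 1, 2, 3, 4]
y = [v ** 2 for v in x]

fig, ax = plt.subplots()

# Color, line style and marker are all adjustable per line.
(line,) = ax.plot(x, y, color="tab:red", linestyle="--", marker="o", label="y = x^2")
ax.set_title("A customized line plot")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()

# Render to an in-memory PNG; pass a filename instead to write to disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```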


b. Seaborn

is another high-level visualization library. It is based on Matplotlib. It makes it very easy to generate various plots such as heat maps, time series, and violin plots.
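For instance, a heat map takes a single call. This is only a sketch, and the small matrix below is invented for the example.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import numpy as np
import seaborn as sns

# A small made-up symmetric matrix, standing in for e.g. correlations.
data = np.array(
    [
        [1.0, 0.2, 0.5],
        [0.2, 1.0, 0.3],
        [0.5, 0.3, 1.0],
    ]
)

# heatmap draws on a Matplotlib Axes and returns it;
# annot=True writes each cell's value inside the cell.
ax = sns.heatmap(data, annot=True, cmap="viridis")
ax.set_title("Heat map of a small made-up matrix")
```

Because the returned object is an ordinary Matplotlib Axes, the usual Matplotlib customization still applies on top.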

3. Algorithmic Libraries:-

With Machine Learning algorithms, we’re able to develop a model using our dataset, and obtain predictions.
The algorithmic libraries tackle some machine learning tasks from basic to complex.
Here we introduce two packages:

a. The Scikit-learn library

contains tools for statistical modeling, including regression, classification, clustering and so on. This library is built on NumPy, SciPy and Matplotlib.
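A minimal regression sketch, to show the develop-a-model-then-predict flow. The data is a toy set of mine that follows y = 2x + 1 exactly, so the fitted line is easy to check by hand.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data following y = 2x + 1 exactly (made up for illustration).
# scikit-learn expects features as a 2-D array: one row per sample.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])

# Develop the model from the dataset.
model = LinearRegression()
model.fit(X, y)

# Obtain a prediction for an unseen input.
prediction = model.predict(np.array([[5.0]]))

print(model.coef_, model.intercept_)
print(prediction)
```

Classifiers and clustering models in scikit-learn follow the same `fit` pattern, which makes it easy to swap one model for another.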

b. StatsModels

is also a Python module that allows users to explore data, estimate statistical models, and perform statistical tests.