Python & MySQL: PYTHON PANDAS - 1 (SERIES OBJECTS)

INFORMATICS PRACTICES

PYTHON PANDAS – 1 (Part - 1)

SERIES OBJECTS

INTRODUCTION

Pandas is Python's library for data analysis. It derives its name from "panel data", an econometrics term for multi-dimensional structured data sets. Today, Pandas has become a popular choice for data analysis. The main author of Pandas is Wes McKinney.

Pandas is an open-source, BSD-licensed library built for the Python programming language. It offers high-performance, easy-to-use data structures and data analysis tools.

Pandas is one of the most popular libraries in the scientific Python ecosystem for data analysis. It is capable of many tasks, such as:

It can read and write data in many different formats, such as CSV files, Excel sheets and SQL tables, and it handles data types like integer and float.
It can perform calculations in all the ways data is organized, i.e., across rows and down columns.
It allows you to apply operations to independent groups within the data.
It supports reshaping of data into different forms.
It can easily select a subset of data from a bulky data set.
It can combine multiple datasets.
It supports advanced time-series functionality.
It has functionality to find and fill missing data in a dataset.
It supports visualization by integrating with libraries such as matplotlib and seaborn.

Pandas Data Structures

A data structure is a particular way of storing and organizing data in a computer to suit a specific purpose, so that it can be accessed and worked with in appropriate ways. Pandas offers two basic data structures: Series and DataFrame.

Series Vs. DataFrame Objects

Property	Series	DataFrame
Dimensions	1-dimensional	2-dimensional
Type of data	Homogeneous, i.e., all the elements must be of same data type in a Series object.	Heterogeneous, i.e. DataFrame object can have elements of different data types.
Mutability	Value-mutable, i.e., the values of its elements can change; size-immutable	Value-mutable and size-mutable
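The rows of the table above can be checked with a small sketch. The names marks and df and the sample data are made up for illustration; they are not part of the original notes.

```python
import pandas as pd

# A Series is one-dimensional and all its values share a single dtype.
marks = pd.Series([72, 85, 91], index=['Asha', 'Ravi', 'Meena'])
marks['Ravi'] = 88           # value-mutable: an existing element can be changed
print(marks.ndim)            # 1
print(marks.dtype)           # int64

# A DataFrame is two-dimensional and each column can have its own dtype.
df = pd.DataFrame({'name': ['Asha', 'Ravi'], 'marks': [72, 88]})
print(df.ndim)               # 2
print(df.dtypes)             # one dtype per column
```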

Series Data Structure

A Series is a pandas data structure that represents a one-dimensional, array-like object containing an array of data (of any NumPy data type) and an associated array of data labels, called its index.
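As a rough illustration of this definition, the sketch below builds one Series with the default index and one with custom labels, then looks at the data array and the index separately. The variable names and values are assumed for the example.

```python
import pandas as pd

# With no index given, pandas labels the values 0, 1, 2, ...
default_ser = pd.Series([10, 20, 30])
print(default_ser.index.tolist())    # [0, 1, 2]

# With an explicit index, each value gets its own label.
ser = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(ser.values)                    # the data array
print(ser.index)                     # the associated labels
print(ser['b'])                      # 20, looked up by label
```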

Example1: Write code to create a Series object using the Python sequence [1,3,5,7,9]. Assume that Pandas is imported with the alias pd.

Sol:

import pandas as pd

ser = pd.Series([1, 3, 5, 7, 9])

print("Series Object is:")

print(ser)

Output:

Series Object is:

0 1

1 3

2 5

3 7

4 9

dtype: int64

Example2: Write a program to create a Series object using the individual characters 'p', 'y', 't', 'h', 'o' and 'n'. Assume that pandas is imported with the alias pd.

Sol:

import pandas as pd

ser = pd.Series(['p', 'y', 't', 'h', 'o', 'n'])

print("Series Object is:")

print(ser)

Output:

Series Object is:

0 p

1 y

2 t

3 h

4 o

5 n

dtype: object

Example3: Write a program to create a Series object using five different words: 'I', 'am', 'student', 'of', 'Takshila'. Assume that pandas is imported with the alias pd.

Sol:

import pandas as pd

ser = pd.Series(['I', 'am', 'student', 'of', 'Takshila'])

print("Series Object is:")

print(ser)

Output:

Series Object is:

0 I

1 am

2 student

3 of

4 Takshila

dtype: object

Specify Data as an ndarray

The data argument can also be an ndarray.

Consider the following code:

import numpy as np

import pandas as pd

a = np.arange(5, 25, 4)   # array with the values 5, 9, 13, 17, 21

print(a)

ser = pd.Series(a)

print(ser)

Example4:

Write a program to create a Series object using an ndarray that has 6 elements in the range 5 to 55.

Sol:

import pandas as pd

import numpy as np

ser = pd.Series(np.linspace(5, 55, 6))

print("Series Object is:")

print(ser)

Output:

Series Object is:

0 5.0

1 15.0

2 25.0

3 35.0

4 45.0

5 55.0

dtype: float64

Example5:

Write a program to create a Series object using an ndarray that is created by tiling a list [7,8,9], twice.

Sol:

import pandas as pd

import numpy as np

ser = pd.Series(np.tile([7,8,9],2))

print("Series Object is:")

print(ser)

Output:

Series Object is:

0 7

1 8

2 9

3 7

4 8

5 9

dtype: int32

Data as a Python Dictionary (Series)

When a dictionary is passed as data, its keys become the index and its values become the data of the Series object.

Example6:

Write a program to create a Series object using a dictionary that stores the number of students in each section of class in your school.

Sol:

import pandas as pd

stu1 = {'A':35, 'B':38, 'C': 40, 'D':42, 'E':45}

ser = pd.Series(stu1)

print(ser)

Output:

A 35

B 38

C 40

D 42

E 45

dtype: int64

Data as a Scalar Value

If data is a scalar value, then the index argument to the Series() function should be provided. The scalar value is repeated to match the length of the index.

Example7:

Write a program to create a Series object that stores the initial budget allocated 45000/- each for the four quarters of the year : Qtr1, Qtr2, Qtr3 and Qtr4.

Sol:

import pandas as pd

ser = pd.Series(45000, index = ['Qtr1', 'Qtr2', 'Qtr3', 'Qtr4'])

print(ser)

Output:

Qtr1 45000

Qtr2 45000

Qtr3 45000

Qtr4 45000

dtype: int64

Adding NaN values in a Series Object

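Sometimes a value for a Series is missing or not yet known. In such cases, you can fill the missing data with NaN (Not a Number) values.

NaN is defined in the NumPy module, so you can use np.nan to specify a missing value, or use None, e.g.,

import numpy as np

import pandas as pd

# np.nan (lowercase) works in all NumPy versions; the np.NaN alias was removed in NumPy 2.0

ser = pd.Series([5.5, 6.8, np.nan, 7.6])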

print(ser)

Output:

0 5.5

1 6.8

2 NaN

3 7.6

dtype: float64
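The text above also mentions None as a way to mark a missing value. A minimal sketch of that case, assuming a Series of floats, is given here; isna() and hasnans are used only to confirm where the missing value ended up.

```python
import numpy as np
import pandas as pd

# None in numeric data is stored as NaN, and the dtype becomes float64.
ser = pd.Series([5.5, 6.8, None, 7.6])
print(ser)
print(ser.isna())      # True only at position 2
print(ser.hasnans)     # True, because one value is missing
```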

Example8:

A Python list named section stores the section names ('A', 'B', 'C', 'D', 'E') of class XII in your school. Another list, contribution, stores the contribution made by these sections to a charity fund endorsed by the school. Write code to create a Series object that stores the contribution amounts as the values and the section names as the index.

Sol:

import pandas as pd

section = ['A', 'B', 'C', 'D', 'E']

contribution = [5000, 6000, 6500, 7500, 8000]

ser = pd.Series(data = contribution, index = section)

print(ser)

Output:

A 5000

B 6000

C 6500

D 7500

E 8000

dtype: int64

Using Mathematical Function/Expression to Create Data Array in Series()

Series() allows you to pass a function or expression that calculates the values for the data sequence.
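One possible sketch of this idea is shown below. The array a and its range are assumed for illustration; the point is only that the data argument can be an expression such as a * 2 rather than a literal list.

```python
import numpy as np
import pandas as pd

a = np.arange(9, 13)                  # array with the values 9, 10, 11, 12

# The data is computed by an expression; the original array is used as the index.
ser = pd.Series(data=a * 2, index=a)
print(ser)                            # values 18, 20, 22, 24 labelled 9 to 12

# Any vectorized expression works the same way, for example squares.
squares = pd.Series(data=a ** 2, index=a)
print(squares)                        # values 81, 100, 121, 144
```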

Example9:

Given are two objects, a list lst1 and a Series ser1, both holding the same values, i.e., 1, 3, 5, 7, 9. Find the output of:

(i) print(lst1*2) (ii) print(ser1*2)

(i) import pandas as pd

lst1 = [ 1,3,5,7,9]

print(lst1*2)

Output:

[1, 3, 5, 7, 9, 1, 3, 5, 7, 9]

(ii) import pandas as pd

ser1 =pd.Series( [ 1,3,5,7,9])

print(ser1*2)

Output:

0 2

1 6

2 10

3 14

4 18

dtype: int64

Series Object Attributes

Some common attributes of a Series object are index, values, dtype, shape, nbytes, ndim, size, hasnans and empty. The itemsize attribute listed in older material has been removed from recent versions of pandas.

Example10:

Consider the Series object ser that stores the contribution of each section as shown:

A 6700

B 5600

C 5000

D 5200

E 8235

Write code to create the Series object and use all of these attributes.

import pandas as pd

import numpy as np

a = [6700, 5600, 5000, 5200, 8235]

ser = pd.Series(a, index = ['A', 'B', 'C', 'D', 'E'])

print(ser)

print(ser.index) # The index (axis labels) of the Series

print(ser.values) # Return the Series as an ndarray or ndarray-like, depending on the dtype

print(ser.dtype) # Return the dtype object of the underlying data

print(ser.shape) # Return a tuple of the shape of the underlying data

print(ser.nbytes) # Return the number of bytes in the underlying data

print(ser.ndim) # Return the number of dimensions of the underlying data

print(ser.size) # Return the number of elements in the underlying data

#print(ser.itemsize) # Size of one item in bytes; removed in pandas 1.0, so left commented out

print(ser.hasnans) # Return True if there are any NaN values, otherwise False

print(ser.empty) # Return True if the Series object is empty, otherwise False

Output:

A 6700

B 5600

C 5000

D 5200

E 8235

dtype: int64

Index(['A', 'B', 'C', 'D', 'E'], dtype='object')

[6700 5600 5000 5200 8235]

int64

(5,)

40

1

5

False

False

The head( ) and tail( ) Functions

The head() function fetches the first n rows from a pandas object, and the tail() function returns the last n rows. If n is not given, both return 5 rows.

Example11:

A Series object data1 consists of around 100 rows of data. Write a program to print the following details:

(i) First 8 rows of data

(ii) Last 4 rows of data

Solution:

import pandas as pd

import numpy as np

data1 = np.arange(1,100)

ser1 =pd.Series(data1)

print(ser1)

print(ser1.head(8))

print(ser1.tail(4))
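Since the output of this example is not shown, here is a compact sketch of what head() and tail() return for the same data. The names first8 and last4 are introduced only for this illustration.

```python
import numpy as np
import pandas as pd

ser1 = pd.Series(np.arange(1, 100))   # 99 values: 1 to 99

first8 = ser1.head(8)                 # first 8 rows
last4 = ser1.tail(4)                  # last 4 rows

print(first8.tolist())                # [1, 2, 3, 4, 5, 6, 7, 8]
print(last4.tolist())                 # [96, 97, 98, 99]
print(len(ser1.head()))               # 5, the default number of rows
```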

Vector Operations on Series Object

Vector operations mean that when you apply a function or expression to an object, it is applied individually to each item of the object. Since Series objects are built upon NumPy arrays (ndarrays), they support vectorized operations, just like ndarrays.

Example:

import pandas as pd

import numpy as np

data1 = np.arange(1,100,15)

ser1 =pd.Series(data1)

print(ser1)

print(ser1+2)

Output:

0 1

1 16

2 31

3 46

4 61

5 76

6 91

dtype: int32

0 3

1 18

2 33

3 48

4 63

5 78

6 93

dtype: int32

import pandas as pd

import numpy as np

data1 = np.arange(1,100,15)

ser1 =pd.Series(data1)

#print(ser1)

print(ser1*3)

Output:

0 3

1 48

2 93

3 138

4 183

5 228

6 273

dtype: int32

import pandas as pd

import numpy as np

data1 = np.arange(1,100,15)

ser1 =pd.Series(data1)

#print(ser1)

print(ser1**2)

Output:

0 1

1 256

2 961

3 2116

4 3721

5 5776

6 8281

dtype: int32

import pandas as pd

import numpy as np

data1 = np.arange(1,100,15)

ser1 =pd.Series(data1)

#print(ser1)

print(ser1>50)

Output:

0 False

1 False

2 False

3 False

4 True

5 True

6 True

dtype: bool
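The boolean Series produced by a comparison such as ser1 > 50 can also be used to pick out the matching items. This is a small sketch of that follow-up step; the names mask and big are assumed for the example.

```python
import numpy as np
import pandas as pd

ser1 = pd.Series(np.arange(1, 100, 15))   # 1, 16, 31, 46, 61, 76, 91

mask = ser1 > 50          # True where the value is greater than 50
big = ser1[mask]          # keep only the items where mask is True

print(big.tolist())       # [61, 76, 91]
print(big.index.tolist()) # [4, 5, 6], the original labels are kept
```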

Example 12:

The number of students in classes 11 and 12 in four streams ('Math', 'Biology', 'Commerce' and 'Humanities') is stored in two Series objects, TA11 and TA12. Write code to find the total number of students in classes 11 and 12, stream-wise.

Solution:

import pandas as pd

TA11 = pd.Series(data = [28, 32, 35, 38], index =['Math', 'Biology', 'Commerce', 'Humanities'])

TA12 = pd.Series(data = [27, 35, 37, 40], index =['Math', 'Biology', 'Commerce', 'Humanities'])

print('Total number of students in class 11 and 12 are:')

print(TA11+TA12)

Output:

Total number of students in class 11 and 12 are:

Math 55

Biology 67

Commerce 72

Humanities 78

dtype: int64
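The addition above works stream by stream because pandas matches items by index label, not by position. The sketch below uses made-up data in which the two Series do not share all labels, to show what happens in that case; the add() method with fill_value is one way to treat a missing stream as zero.

```python
import pandas as pd

# Assumed data: 'Commerce' is missing from TA12 and 'Humanities' from TA11.
TA11 = pd.Series([28, 32, 35], index=['Math', 'Biology', 'Commerce'])
TA12 = pd.Series([27, 35, 40], index=['Math', 'Biology', 'Humanities'])

total = TA11 + TA12
print(total)     # Math 55.0, Biology 67.0; Commerce and Humanities become NaN

# add() with fill_value=0 treats a label missing on one side as 0.
filled = TA11.add(TA12, fill_value=0)
print(filled)    # Commerce 35.0 and Humanities 40.0 are kept
```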

Sorting on the Values and Index

To sort a Series object by its values or by its index, use sort_values() and sort_index() respectively. Both return a new, sorted Series and sort in ascending order by default.

Example

import pandas as pd

data1 = [6000, 7000, 4000, 3000, 2500]

ser = pd.Series(data1, index = ['A', 'B', 'C','D','E'])

print(ser)

#For the Values

print(ser.sort_values())

print(ser.sort_values(ascending=False))

#For the indexes

print(ser.sort_index())

print(ser.sort_index(ascending = False))
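The output of the sorting example is not shown, so the following sketch prints the order of the labels after each sort for the same data. The names by_value and by_index_desc are introduced only here.

```python
import pandas as pd

ser = pd.Series([6000, 7000, 4000, 3000, 2500], index=['A', 'B', 'C', 'D', 'E'])

by_value = ser.sort_values()                    # ascending by value
print(by_value.index.tolist())                  # ['E', 'D', 'C', 'A', 'B']
print(by_value.tolist())                        # [2500, 3000, 4000, 6000, 7000]

by_index_desc = ser.sort_index(ascending=False) # descending by label
print(by_index_desc.index.tolist())             # ['E', 'D', 'C', 'B', 'A']

# The original Series is not changed by either call.
print(ser.index.tolist())                       # ['A', 'B', 'C', 'D', 'E']
```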


Thursday, April 29, 2021
