INFORMATICS PRACTICES
PYTHON PANDAS
– 1 (Part - 1)
SERIES OBJECTS
INTRODUCTION
Python Pandas is Python’s library for data analysis. Pandas derives its name from ‘Panel data system’, an econometric term for multi-dimensional structured data sets. Today, Pandas is a popular choice for data analysis. The main author of Pandas is Wes McKinney.
Pandas is an open-source, BSD-licensed library built for the Python programming language. It offers high-performance, easy-to-use data structures and data analysis tools.
Pandas is the most popular library in the scientific Python ecosystem for doing data analysis. Pandas is capable of many tasks, such as:
- It can perform calculations in all the ways the data is organized, for example across rows and down columns.
- It allows you to apply operations to independent groups within the data.
- It supports reshaping of data into different forms.
- It can easily select subsets of data from a bulky data set.
- It can combine multiple datasets.
- It supports advanced time-series functionality.
- It has functionality to find and fill missing data in a dataset.
- It supports visualization by integrating with libraries such as matplotlib and seaborn.
Pandas Data Structures
A data structure is a particular way of storing and organizing data in a computer to suit a specific purpose, so that it can be accessed and worked with in appropriate ways.
Series Vs. DataFrame Objects
Property | Series | DataFrame |
Dimensions | 1-dimensional | 2-dimensional |
Type of data | Homogeneous, i.e., all the elements of a Series object must be of the same data type. | Heterogeneous, i.e., a DataFrame object can have elements of different data types. |
Mutability | Value-mutable, i.e., the values of its elements can change; size-immutable | Value-mutable; size-mutable |
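The first two rows of the table can be checked with a minimal sketch like the one below. The variable names and sample data are illustrative assumptions, not part of the original notes.

```python
import pandas as pd

# A Series is 1-dimensional and reports a single dtype for all its elements.
marks = pd.Series([35, 38, 40])

# A DataFrame is 2-dimensional; each column can have its own dtype.
students = pd.DataFrame({"name": ["Asha", "Ravi", "Meena"],
                         "marks": [35, 38, 40]})

print(marks.ndim)       # number of dimensions of the Series
print(students.ndim)    # number of dimensions of the DataFrame
print(marks.dtype)      # one dtype for the whole Series
print(students.dtypes)  # one dtype per column

# Values are mutable: an existing element can be overwritten in place.
marks.iloc[0] = 36
print(marks)
```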
Series Data Structure
A Series is a pandas data structure that represents a one-dimensional, array-like object containing an array of data (of any NumPy data type) and an associated array of data labels, called its index.
Sol:
import pandas as pd
ser = pd.Series([1, 3, 5, 7, 9])
print("Series Object is:")
print(ser)
Output:
Series Object is:
0    1
1    3
2    5
3    7
4    9
dtype: int64
Sol:
import pandas as pd
ser = pd.Series(['p', 'y', 't', 'h', 'o', 'n'])
print("Series Object is:")
print(ser)
Output:
Series Object is:
0    p
1    y
2    t
3    h
4    o
5    n
dtype: object
Sol:
import pandas as pd
ser = pd.Series(['I', 'am', 'student', 'of', 'Takshila'])
print("Series Object is:")
print(ser)
Output:
Series Object is:
0           I
1          am
2     student
3          of
4    Takshila
dtype: object
Specify Data as an ndarray
The data argument can also be an ndarray.
Consider the following code:
import numpy as np
import pandas as pd
a = np.arange(5, 25, 4)
print(a)
ser = pd.Series(a)
print(ser)
Example4:
Write a program to create a Series object using an ndarray that has 6 elements in the range 5 to 55.
Sol:
import pandas as pd
import numpy as np
ser = pd.Series(np.linspace(5, 55, 6))
print("Series Object is:")
print(ser)
Output:
Series Object is:
0     5.0
1    15.0
2    25.0
3    35.0
4    45.0
5    55.0
dtype: float64
Example5:
Write a program to create a Series object using an ndarray that is created by tiling the list [7, 8, 9] twice.
Sol:
import pandas as pd
import numpy as np
ser = pd.Series(np.tile([7, 8, 9], 2))
print("Series Object is:")
print(ser)
Output:
Series Object is:
0    7
1    8
2    9
3    7
4    8
5    9
dtype: int32
(The integer dtype shown depends on the platform and NumPy version; on many systems it appears as int64.)
Data as a Python Dictionary (Series)
When the data is a dictionary, its keys become the index and its values become the data of the Series object.
Example6:
Write a program to create a Series object using a dictionary that stores the number of students in each section of a class in your school.
Sol:
import pandas as pd
stu1 = {'A': 35, 'B': 38, 'C': 40, 'D': 42, 'E': 45}
ser = pd.Series(stu1)
print(ser)
Output:
A    35
B    38
C    40
D    42
E    45
dtype: int64
Data as a Scalar Value
If the data is a scalar value, provide the index argument to the Series( ) function; the value is then repeated for every label in the index.
Example7:
Write a program to create a Series object that stores the initial budget of 45000/- allocated to each of the four quarters of the year: Qtr1, Qtr2, Qtr3 and Qtr4.
Sol:
import pandas as pd
ser = pd.Series(45000, index = ['Qtr1', 'Qtr2', 'Qtr3', 'Qtr4'])
print(ser)
Output:
Qtr1    45000
Qtr2    45000
Qtr3    45000
Qtr4    45000
dtype: int64
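Adding NaN values in a Series Object
Sometimes a value is not available for every position when a Series is created. In such cases, you can fill the missing data with NaN (Not a Number) values.
The empty value NaN is defined in the NumPy module, so you can use np.nan to specify a missing value (the older alias np.NaN was removed in NumPy 2.0), or use None, e.g.,
import numpy as np
import pandas as pd
ser = pd.Series([5.5, 6.8, np.nan, 7.6])
print(ser)
Output:
0    5.5
1    6.8
2    NaN
3    7.6
dtype: float64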
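As a minimal sketch of the "or use None" remark, and of the find-and-fill capability listed in the introduction, the following compares the two spellings and then locates and replaces the missing value. The variable names and the fill value 0.0 are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# None in numeric data is stored as NaN, so both spellings give the same Series.
ser_nan = pd.Series([5.5, 6.8, np.nan, 7.6])
ser_none = pd.Series([5.5, 6.8, None, 7.6])
print(ser_none)

# isna() marks the missing positions with True.
print(ser_none.isna())

# fillna() returns a new Series with the missing values replaced.
filled = ser_none.fillna(0.0)
print(filled)
```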
Example8:
A Python list named section stores the section names (‘A’, ‘B’, ‘C’, ‘D’, ‘E’) of class XII in your school. Another list, contribution, stores the contribution made by these students to a charity fund endorsed by the school. Write code to create a Series object that stores the contribution amounts as the values and the section names as the indexes.
Sol:
import pandas as pd
section = ['A', 'B', 'C', 'D', 'E']
contribution = [5000, 6000, 6500, 7500, 8000]
ser = pd.Series(data = contribution, index = section)
print(ser)
Output:
A    5000
B    6000
C    6500
D    7500
E    8000
dtype: int64
The Series( ) function also allows the data to be given as a function or expression that calculates the values of the data sequence.
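A minimal sketch of this idea, assuming a small NumPy array as the starting point (the array and variable names are illustrative):

```python
import numpy as np
import pandas as pd

a = np.arange(9, 13)   # the values 9, 10, 11, 12

# The data argument is an expression evaluated on the array;
# the original array is used as the index.
ser = pd.Series(data=a * 2, index=a)
print(ser)
```

Here each index label is paired with twice its own value.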
Example9:
Given are two objects, a list object named lst1 and a Series object named ser1, both holding the same values, i.e., 1, 3, 5, 7, 9. Find the output of:
(i) print(lst1*2) (ii) print(ser1*2)
Multiplying a list by 2 repeats the list, whereas multiplying a Series by 2 multiplies each element by 2.
(i)
import pandas as pd
lst1 = [1, 3, 5, 7, 9]
print(lst1*2)
Output:
[1, 3, 5, 7, 9, 1, 3, 5, 7, 9]
(ii)
import pandas as pd
ser1 = pd.Series([1, 3, 5, 7, 9])
print(ser1*2)
Output:
0     2
1     6
2    10
3    14
4    18
dtype: int64
Series Object Attributes
Some common attributes of a Series object are index, values, dtype, shape, nbytes, ndim, size, itemsize, hasnans, empty, etc.
Example10:
Consider the Series object ser that stores the contribution of each section as shown:
A 6700
B 5600
C 5000
D 5200
E 8235
Write code to create the Series object and use all the attributes.
import pandas as pd
import numpy as np
a = [6700, 5600, 5000, 5200, 8235]
ser = pd.Series(a, index = ['A', 'B', 'C', 'D', 'E'])
print(ser)
print(ser.index)    # The index (axis labels) of the Series.
print(ser.values)   # Return the Series as an ndarray or ndarray-like, depending on the dtype.
print(ser.dtype)    # Return the dtype object of the underlying data.
print(ser.shape)    # Return a tuple of the shape of the underlying data.
print(ser.nbytes)   # Return the number of bytes in the underlying data.
print(ser.ndim)     # Return the number of dimensions of the underlying data.
print(ser.size)     # Return the number of elements in the underlying data.
# print(ser.itemsize)  # Size of the dtype of each item; left commented out because it is not available on Series in recent pandas versions.
print(ser.hasnans)  # Return True if there are any NaN values, otherwise False.
print(ser.empty)    # Return True if the Series object is empty, otherwise False.
Output:
A    6700
B    5600
C    5000
D    5200
E    8235
dtype: int64
Index(['A', 'B', 'C', 'D', 'E'], dtype='object')
[6700 5600 5000 5200 8235]
int64
(5,)
40
1
5
False
False
The head( ) and tail( ) Functions
The head( ) function fetches the first n rows from a pandas object, and the tail( ) function returns the last n rows. If n is not given, both return 5 rows.
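A short sketch of calling head( ) and tail( ) with and without an argument, using an assumed 20-element Series (the data and variable names are illustrative):

```python
import numpy as np
import pandas as pd

ser = pd.Series(np.arange(1, 21))   # the values 1 to 20

first_five = ser.head()    # no argument: first 5 rows
last_five = ser.tail()     # no argument: last 5 rows
first_three = ser.head(3)  # explicit n: first 3 rows

print(first_five)
print(last_five)
print(first_three)
```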
Example11:
A Series object data1 consists of around 100 rows of
data. Write a program to print the following details:
(i) First 8 rows of data
(ii) Last 4 rows of data
Solution:
import pandas as pd
import numpy as np
data1 = np.arange(1,100)
ser1 =pd.Series(data1)
print(ser1)
print(ser1.head(8))
print(ser1.tail(4))
Vector Operations on Series Object
Vector operations mean that if you apply a function or
expression then it is individually applied on each item of the object. Since
Series objects are built upon Numpy arrays (ndarrays), they also support
vectorized operations, just like ndarrays.
Example:
import pandas as pd
import numpy as np
data1 = np.arange(1, 100, 15)
ser1 = pd.Series(data1)
print(ser1)
print(ser1+2)
Output:
0     1
1    16
2    31
3    46
4    61
5    76
6    91
dtype: int32
0     3
1    18
2    33
3    48
4    63
5    78
6    93
dtype: int32
(The integer dtype shown depends on the platform and NumPy version; on many systems it appears as int64.)
import pandas as pd
import numpy as np
data1 = np.arange(1, 100, 15)
ser1 = pd.Series(data1)
#print(ser1)
print(ser1*3)
Output:
0      3
1     48
2     93
3    138
4    183
5    228
6    273
dtype: int32
import pandas as pd
import numpy as np
data1 = np.arange(1, 100, 15)
ser1 = pd.Series(data1)
#print(ser1)
print(ser1**2)
Output:
0       1
1     256
2     961
3    2116
4    3721
5    5776
6    8281
dtype: int32
import pandas as pd
import numpy as np
data1 = np.arange(1, 100, 15)
ser1 = pd.Series(data1)
#print(ser1)
print(ser1>50)
Output:
0    False
1    False
2    False
3    False
4     True
5     True
6     True
dtype: bool
Example 12:
The numbers of students in class 11 and class 12 in four streams (‘Math’, ‘Biology’, ‘Commerce’ and ‘Humanities’) are stored in two Series objects, TA11 and TA12. Write code to find the total number of students in class 11 and 12, stream-wise.
Solution:
import pandas as pd
TA11 = pd.Series(data = [28, 32, 35, 38], index = ['Math', 'Biology', 'Commerce', 'Humanities'])
TA12 = pd.Series(data = [27, 35, 37, 40], index = ['Math', 'Biology', 'Commerce', 'Humanities'])
print('Total number of students in class 11 and 12 are:')
print(TA11+TA12)
Output:
Total number of students in class 11 and 12 are:
Math          55
Biology       67
Commerce      72
Humanities    78
dtype: int64
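Adding two Series matches elements by index label, not by position. The sketch below uses hypothetical data in which the two indexes differ, to show what happens to labels that appear in only one Series; the names c11, c12 and the 'Arts' stream are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical data: c12 has no 'Humanities' entry and has an extra 'Arts' entry.
c11 = pd.Series([28, 32, 35, 38], index=['Math', 'Biology', 'Commerce', 'Humanities'])
c12 = pd.Series([27, 35, 37, 20], index=['Math', 'Biology', 'Commerce', 'Arts'])

# Plain addition: a label present in only one Series gives NaN in the result.
plain = c11 + c12
print(plain)

# add() with fill_value treats the missing entry as 0 instead.
total = c11.add(c12, fill_value=0)
print(total)
```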
Sorting on the Values and Index
To sort a Series object on the basis of its values or its index, you may use sort_values() and sort_index() respectively.
Example
import pandas as pd
data1 = [6000, 7000, 4000, 3000, 2500]
ser = pd.Series(data1, index = ['A', 'B', 'C', 'D', 'E'])
print(ser)
#For the values
print(ser.sort_values())
print(ser.sort_values(ascending = False))
#For the indexes
print(ser.sort_index())
print(ser.sort_index(ascending = False))
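The resulting orderings can be checked with a short sketch on the same data. It also shows that both methods return a new Series and leave the original unchanged; the variable names by_value and by_index_desc are illustrative.

```python
import pandas as pd

ser = pd.Series([6000, 7000, 4000, 3000, 2500], index=['A', 'B', 'C', 'D', 'E'])

by_value = ser.sort_values()                     # ascending by value
by_index_desc = ser.sort_index(ascending=False)  # descending by index label

print(by_value.index.tolist())       # ['E', 'D', 'C', 'A', 'B']
print(by_index_desc.index.tolist())  # ['E', 'D', 'C', 'B', 'A']
print(ser.index.tolist())            # original order unchanged: ['A', 'B', 'C', 'D', 'E']
```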