Pandas: Getting Started with Data Analytics
Hello everyone! I want to discuss pandas, one of the most important and widely used Python libraries. Although there are a lot of stories available on it, the importance of this particular library got me to write this one. For many people, it is the library where the data science journey begins.
As more and more people get into data science and machine learning, it is very important for them to have the right tools.
Pandas is an open-source, BSD-licensed Python library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
It is generally imported together with another important Python library, NumPy:
import pandas as pd
import numpy as np
Reading and writing data
Using pandas, we can work with many different file formats.
Most of the time we use only two types of file: CSV (comma-separated values) files or Excel files.
Let’s see how to read and write these two file types.
read_csv is used to read CSV files, while to_csv is used to write them.
Similarly, read_excel is used to read Excel files, while to_excel is used to write them.
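To make the read/write pair concrete, here is a minimal sketch of a CSV round trip. The sample data, the column names, and the file name people.csv are made up for illustration, and a temporary folder is used so nothing is left behind on disk.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data, made up for this illustration.
df = pd.DataFrame({"name": ["Asha", "Ben", "Chen"], "age": [31, 25, 40]})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "people.csv")

    # index=False keeps the row index out of the written file.
    df.to_csv(csv_path, index=False)

    # Read the same file back into a new DataFrame.
    df_from_csv = pd.read_csv(csv_path)

print(df_from_csv)

# Excel follows the same pattern, but usually needs an extra
# engine package such as openpyxl to be installed:
# df.to_excel("people.xlsx", index=False)
# df_from_excel = pd.read_excel("people.xlsx")
```

Without index=False, to_csv also writes the row index, which then shows up as an extra unnamed column when the file is read back.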
One of the most important things in pandas is to understand its data structures. Historically, pandas offered three of them, distinguished by the dimensionality of the data they hold: Series, DataFrame, and Panel.
A Series is 1D, a DataFrame is 2D, and a Panel is 3D. Panel has been removed from recent pandas releases, so current versions provide only Series and DataFrame. Of these, the most frequently used data structure is the DataFrame. Let’s see how we can create a DataFrame.
pd.DataFrame( data, index, columns, dtype, copy)
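As a small sketch of how the constructor is typically called, the snippet below builds a Series and a DataFrame from plain Python objects. The names, values, and index labels are hypothetical and only serve the illustration.

```python
import pandas as pd

# A Series is one-dimensional: a column of values plus an index.
ages = pd.Series([31, 25, 40], index=["a", "b", "c"], name="age")

# A DataFrame is two-dimensional; each dict key becomes a column.
data = {"name": ["Asha", "Ben", "Chen"], "age": [31, 25, 40]}
df = pd.DataFrame(data, index=["a", "b", "c"], columns=["name", "age"])

print(df.shape)   # (rows, columns)
print(df["age"])  # selecting one column gives back a Series
```

Only data is needed in most cases; index and columns default to a simple integer range and the dict keys, and dtype is inferred from the values.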
Data cleaning is the process of fixing or removing:
- Null values
- Incorrect data
- Incorrectly formatted data
- Duplicate data
and so on. Pandas provides useful functions to tackle these problems. For example, to remove rows with empty cells or null values, we have the dropna function.
df = pd.read_csv('data.csv')
df_without_null = df.dropna()
Similarly, to remove duplicates:
df.drop_duplicates(inplace = True)
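The two calls above can be tried without a data.csv file. The following is a minimal sketch on a small in-memory DataFrame; the data is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value and one repeated row.
df = pd.DataFrame({
    "name": ["Asha", "Ben", "Ben", "Chen"],
    "age": [31, 25, 25, np.nan],
})

# Count missing values per column before cleaning.
print(df.isna().sum())

# dropna() returns a new DataFrame without the rows that contain nulls.
df_without_null = df.dropna()

# drop_duplicates() returns a new DataFrame without repeated rows.
df_clean = df_without_null.drop_duplicates()

print(df_clean)
```

Both methods return a new DataFrame by default and leave df untouched; passing inplace=True modifies the original instead and returns None.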
Filtering and Grouping data
We often need to group and filter data to get at the records we want. For this, pandas has very handy functions such as groupby() and filter().
Let’s understand this with an example:
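A minimal sketch of such an example follows. The data and the CITY and AGE column names are assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical data: one row per person.
data = pd.DataFrame({
    "CITY": ["Pune", "Pune", "Delhi", "Delhi", "Delhi"],
    "AGE": [8, 35, 12, 9, 42],
})

# Split the rows by CITY, then compute the mean AGE within each group.
mean_age = data.groupby("CITY")["AGE"].mean()

# size() counts the rows in each group.
counts = data.groupby("CITY").size()

# filter() keeps only the rows of groups that pass a test,
# here cities with at least three people.
big_cities = data.groupby("CITY").filter(lambda group: len(group) >= 3)

print(mean_age)
print(counts)
print(big_cities)
```

groupby() on its own only describes how to split the rows; the work happens when an aggregation such as mean() or size(), or a filter(), is applied to the groups.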
I hope it’s clear from the example what groupby() does.
Data can also be filtered as required. For example, here we keep only the rows with age less than 10:
filtered_data = data[data['AGE'] < 10]
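It helps to see the two steps separately: a comparison such as data['AGE'] < 10 produces a boolean Series (a mask), and indexing the DataFrame with that mask returns the matching rows. A minimal sketch, with made-up data:

```python
import pandas as pd

# Hypothetical data for illustration.
data = pd.DataFrame({
    "NAME": ["Asha", "Ben", "Chen", "Dia"],
    "AGE": [8, 35, 12, 9],
})

# Step 1: the comparison gives one True/False value per row.
mask = data["AGE"] < 10

# Step 2: indexing with the mask keeps only the True rows.
filtered_data = data[mask]

# Conditions can be combined with & (and) and | (or);
# each condition needs its own parentheses.
young_or_older = data[(data["AGE"] < 10) | (data["AGE"] > 30)]

print(mask.tolist())
print(filtered_data)
print(young_or_older)
```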
Merging Data Frames
Sometimes we need to merge two data sets into a single one. This is similar to a join in a relational database. Pandas takes care of this with a built-in function, merge. Let’s see how the merge function is defined:
DataFrame.merge(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)
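Of these parameters, on and how are the ones used most often. The sketch below shows an inner and a left merge on a shared key; the two tables and their column names are hypothetical.

```python
import pandas as pd

# Hypothetical tables that share a customer_id column.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": ["Asha", "Ben", "Chen"],
})
orders = pd.DataFrame({
    "customer_id": [1, 1, 3, 4],
    "amount": [250, 100, 75, 60],
})

# Inner merge: keep only the keys present in both tables.
inner = customers.merge(orders, on="customer_id", how="inner")

# Left merge: keep every customer; customers without orders get NaN.
# indicator=True adds a _merge column showing where each row came from.
left = customers.merge(orders, on="customer_id", how="left", indicator=True)

print(inner)
print(left)
```

Here the inner merge drops Ben (no orders) and the order for customer 4 (no matching customer), while the left merge keeps Ben with a missing amount.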
I have tried to give an overview of pandas, and I hope you all find it useful. If you like this story or find it useful, do let me know by clapping.
Please let me know in the comments if you find any mistakes or want me to add anything.