Python Pandas

Getting Started with Pandas

Let's learn about the building blocks of data science in Python.

Brief Introduction to Python Pandas
Pandas is one of the most important packages in the Python data science tool stack. It is an open source Python data analysis library that makes it easier to create and manipulate data structures that can hold anywhere from hundreds to millions of rows of data. In this tutorial we'll walk you through the following steps:
  1. Installing Pandas via Pip3
  2. Other alternatives for getting Pandas
  3. Importing Pandas in Jupyter Notebook
  4. Introducing the Pandas DataFrame
  5. Creating Simple DataFrames
  6. Summary
Installing Pandas via Pip3
If you're running Python 3 and have pip3 installed, installing Pandas is simple. Just execute the following command in your terminal.

pip3 install pandas 
Installing Pandas via pip3 is one of the easiest ways to get it, but it's worth mentioning that there are alternatives to that method.
  1. Anaconda: Anaconda is a Python distribution for data science that comes with Pandas preinstalled.
  2. Enthought Canopy: Similarly, Canopy is another tool that gives you Pandas.
*Note: Covering how to install these tools is out of the scope of this tutorial. Interested readers may search for them on Google and visit their respective sites.
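Whichever installation route you take, a quick check from Python can confirm that the package is visible to your interpreter. The following is a minimal sketch, not part of the original tutorial; the helper name `is_installed` is our own and purely illustrative.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the named top-level package can be found by this interpreter."""
    return importlib.util.find_spec(package_name) is not None


if is_installed("pandas"):
    import pandas as pd
    # __version__ reports the installed pandas release
    print("pandas version:", pd.__version__)
else:
    print("pandas is not installed; try: pip3 install pandas")
```

If the second message is printed, the interpreter you are running is probably not the one that pip3 installed into.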
Importing Pandas in Jupyter Notebook
We've used Jupyter Notebook throughout these tutorials to run our code examples, and we recommend the same to our readers, but it's not compulsory. You may prefer your own favorite editor or IDE; we leave that choice to you. In case you're interested in using Jupyter Notebook but don't know how to set it up, we have a tutorial for you here. To import Pandas, create a new notebook, enter the following command in a cell and hit Ctrl + Enter.

import pandas as pd 

Introducing DataFrame
In the previous section we imported Pandas into our workspace, so now we're ready to work with it. The DataFrame is a core data structure in Pandas. It is very similar to a table in an Excel sheet or a CSV file, as it consists of rows and columns. In future tutorials we'll see that we can load complete Excel or CSV files into a Pandas DataFrame. Here, let's create a simple DataFrame. One of the simplest ways to create one is from a Python dictionary, as follows:

import pandas as pd 

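# Each dictionary key becomes a column; each list holds that column's values.
df = pd.DataFrame({
    'name': ['Rebeca', 'Emily Blunt', 'Jessica'],
    'age': [32, 34, 54],
    'hours-of-code': [2010, 8000, 3400],
})

df  # in a notebook, a bare name on the last line of a cell displays the DataFrame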

When we execute the above code (using Ctrl + Enter in Jupyter Notebook), a new DataFrame is created. Check out the following screenshot from Jupyter Notebook.
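To make the "rows and columns" picture concrete, here is a small sketch that rebuilds the same DataFrame and inspects it. The variable names (`n_rows`, `column_names`, and so on) are our own; the calls used (`shape`, `columns`, column selection with `[]`, `mean`, `iloc`) are standard Pandas.

```python
import pandas as pd

# Same data as in the tutorial's example
df = pd.DataFrame({
    'name': ['Rebeca', 'Emily Blunt', 'Jessica'],
    'age': [32, 34, 54],
    'hours-of-code': [2010, 8000, 3400],
})

# shape is a (rows, columns) tuple
n_rows, n_cols = df.shape

# columns holds the column labels, taken from the dictionary keys
column_names = list(df.columns)

# Selecting one column returns a Series, which supports aggregates such as mean()
mean_age = df['age'].mean()

# iloc selects rows by position; position 0 is the first row
first_row = df.iloc[0]

print(n_rows, n_cols)
print(column_names)
print(mean_age)
print(first_row['name'])
```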

Congratulations! As promised, we just created our first DataFrame in Pandas using Jupyter Notebook. Note that if you're using your own IDE, you can run the program like any other Python script and it should give the same results. We've used Jupyter Notebook because it is very convenient for sharing results visually, and it makes coming back to the code later much easier.
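If you prefer a plain script over a notebook, one difference is worth knowing: a script does not display a value just because its name appears on a line, so the DataFrame has to be printed explicitly. The sketch below assumes you save it under a file name of your choice (say, the hypothetical `first_dataframe.py`); the function names `build_frame` and `main` are ours.

```python
import pandas as pd


def build_frame():
    """Return the same small DataFrame used in this tutorial."""
    return pd.DataFrame({
        'name': ['Rebeca', 'Emily Blunt', 'Jessica'],
        'age': [32, 34, 54],
        'hours-of-code': [2010, 8000, 3400],
    })


def main():
    df = build_frame()
    # Outside a notebook, print() is needed to see the table as text
    print(df)


if __name__ == "__main__":
    main()
```

Running it with `python3 first_dataframe.py` should print the same three rows as a text table.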
Summary
In this tutorial, we've tried to get you up and running with Python Pandas. We hope you were able to follow along and now have Pandas running. In the next set of tutorials, we'll start diving deeper into the capabilities of this powerful package.
We hope you enjoyed learning about Python Pandas. Don't worry if everything isn't clear yet; you'll become more comfortable as we go further in the coming parts and start writing more code.
