How to Select Data using loc from DataFrame using Pandas Library in Jupyter Notebook


This is another article whose main focus is data selection. It shows how to select specific data from a variable of type DataFrame, the data structure that comes with the Pandas library. The work is done in Jupyter Notebook, a web-based application where the script can run. First, retrieve the data from a source such as a CSV file or a database connection and store it in a DataFrame variable. After the data is stored successfully, select from it using loc. Before executing the script, start Jupyter Notebook. The following is the output from starting Jupyter Notebook:

(myenv) C:\python\data-science>jupyter notebook
[I 17:08:28.062 NotebookApp] Serving notebooks from local directory: C:\python\data-science
[I 17:08:28.062 NotebookApp] The Jupyter Notebook is running at:
[I 17:08:28.067 NotebookApp] http://localhost:8888/?token=4dd9801ef2aacad1d445955b0ae4621b4c669da84c617e7b
[I 17:08:28.068 NotebookApp]  or http://127.0.0.1:8888/?token=4dd9801ef2aacad1d445955b0ae4621b4c669da84c617e7b
[I 17:08:28.070 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 17:08:28.225 NotebookApp]

    To access the notebook, open this file in a browser:
        file:///C:/Users/Personal/AppData/Roaming/jupyter/runtime/nbserver-8796-open.html
    Or copy and paste one of these URLs:
        http://localhost:8888/?token=4dd9801ef2aacad1d445955b0ae4621b4c669da84c617e7b
     or http://127.0.0.1:8888/?token=4dd9801ef2aacad1d445955b0ae4621b4c669da84c617e7b
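Once the notebook is open, the loading step described earlier can be sketched in a cell. This is only a minimal sketch: the CSV content, the column names and the values are hypothetical stand-ins, and an in-memory string takes the place of a real file or database query.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file or query result.
csv_text = """id,name,city
1,Alice,Jakarta
2,Bob,Bandung
3,Carol,Surabaya
"""

# read_csv accepts a file path or any file-like object;
# StringIO keeps this sketch self-contained.
df = pd.read_csv(io.StringIO(csv_text))

print(type(df))   # the variable holds a pandas DataFrame
print(df.shape)   # (number of rows, number of columns)
```

With a real file, the StringIO object would simply be replaced by the path to that file.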

In this context, the data comes from a database connection, as described in the article How to Get Data From a PostgreSQL Database in Jupyter Notebook. After getting the data and storing it in a DataFrame variable, it is possible to select from it. The general pattern for selecting data with loc on a DataFrame variable is as follows:

df.loc[row_selection,column_selection]

The following is an example of the above data selection pattern.

[Screenshot: How to Select Data using loc from DataFrame using Pandas Library in Jupyter Notebook]
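As a minimal sketch of such a cell, assuming a small hypothetical DataFrame (the 'name' column and all values are made up for illustration; the article's actual data comes from the PostgreSQL query), the selection could look like this:

```python
import pandas as pd

# Hypothetical data; a default integer index (0, 1, 2) is created automatically.
df = pd.DataFrame({
    "id": [101, 102, 103],
    "name": ["Alice", "Bob", "Carol"],
})

# Row label 0 and column label 'id' give a single value.
value = df.loc[0, "id"]

# A list of column labels gives that row as a Series.
row = df.loc[0, ["id", "name"]]

# ':' selects every row, so this is the whole 'id' column.
ids = df.loc[:, "id"]

print(value)
print(row)
print(ids)
```

Both positions in the brackets take labels, so either one can be a single label, a list of labels or a slice.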

In the above example, loc selects a single row together with a specific column. Note that loc is an indexer used with square brackets rather than a function call. The first argument is 0, which selects the row whose index label is 0. The second argument is 'id', which selects the column with the label 'id'. So instead of passing an integer position for the column, as iloc requires, loc takes the label of the column directly. In this case, that label is 'id'.
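The difference between label-based and position-based selection is easier to see when the row index is not the default 0, 1, 2. The following is a minimal sketch with hypothetical data and a hypothetical string index ('a', 'b', 'c'), neither of which comes from the article's database:

```python
import pandas as pd

# Hypothetical data with string row labels instead of the default integers.
df = pd.DataFrame(
    {"id": [101, 102, 103], "name": ["Alice", "Bob", "Carol"]},
    index=["a", "b", "c"],
)

# loc uses labels: row labelled 'b', column labelled 'id'.
by_label = df.loc["b", "id"]

# iloc uses integer positions: second row, first column.
by_position = df.iloc[1, 0]

# Both expressions point at the same cell.
print(by_label == by_position)

# A loc slice includes both end labels; an iloc slice excludes the stop position.
label_slice = df.loc["a":"b", "id"]
position_slice = df.iloc[0:1, 0]

print(len(label_slice), len(position_slice))
```

With the default integer index, the label 0 and the position 0 happen to refer to the same row, which is why the two indexers can look interchangeable in simple examples.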

 
