How to Select a Column from a DataFrame using Pandas Library in Jupyter Notebook


This article shows how to select a column from a DataFrame. In Python, the pandas library provides a data structure named DataFrame. A variable of type DataFrame can hold data retrieved from many kinds of sources, including files and databases. After retrieving the data, it is also possible to select some of the columns from that variable, which is similar to executing a query against a table. The selection is done by running a script in Jupyter Notebook, an open source web application for running code and working with data interactively. So, run Jupyter Notebook first as follows:

(myenv) C:\python\data-science>jupyter notebook
[I 17:08:28.062 NotebookApp] Serving notebooks from local directory: C:\python\data-science
[I 17:08:28.062 NotebookApp] The Jupyter Notebook is running at:
[I 17:08:28.067 NotebookApp] http://localhost:8888/?token=4dd9801ef2aacad1d445955b0ae4621b4c669da84c617e7b
[I 17:08:28.068 NotebookApp]  or http://127.0.0.1:8888/?token=4dd9801ef2aacad1d445955b0ae4621b4c669da84c617e7b
[I 17:08:28.070 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 17:08:28.225 NotebookApp]

    To access the notebook, open this file in a browser:
        file:///C:/Users/Personal/AppData/Roaming/jupyter/runtime/nbserver-8796-open.html
    Or copy and paste one of these URLs:
        http://localhost:8888/?token=4dd9801ef2aacad1d445955b0ae4621b4c669da84c617e7b
     or http://127.0.0.1:8888/?token=4dd9801ef2aacad1d445955b0ae4621b4c669da84c617e7b

After starting Jupyter Notebook, run the script for selecting a column from a variable of type DataFrame. To start, list all the columns available in the variable. The example data is available in the article How to Get Data From a PostgreSQL Database in Jupyter Notebook. In that example, there are three columns. After storing the data in a DataFrame, just select the desired column. The script is in the following lines:

data.info()
data['employee_id']
data[['employee_id', 'name']]

The following is the output of the execution:

[Image: output of the script in Jupyter Notebook]

The above example selects one column and then two columns at once. First, it displays all of the available columns in the DataFrame. After that, it demonstrates how to select just one column using the label, or name, of the column, in this case 'employee_id'. Next, it demonstrates how to select two columns at the same time. Those columns are 'employee_id' and 'name'.
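
For trying the selection without a database connection, the following is a minimal sketch that builds a small stand-in DataFrame in memory. The row values and the third column name, 'department', are assumptions made up for illustration. Only 'employee_id' and 'name' come from the example above. It lists the columns, selects one column, and then selects two columns.

```python
import pandas as pd

# Stand-in data: the values and the "department" column are hypothetical.
# Only "employee_id" and "name" are column names taken from the article.
data = pd.DataFrame(
    {
        "employee_id": [1, 2, 3],
        "name": ["Alice", "Bob", "Carol"],
        "department": ["HR", "IT", "Finance"],
    }
)

# Display a summary of the DataFrame, including all available columns.
data.info()
print(list(data.columns))

# A single label inside single brackets returns a pandas Series.
employee_ids = data["employee_id"]
print(type(employee_ids).__name__)

# A list of labels (double brackets) returns a new DataFrame
# containing only those columns, in the order given.
data_subset = data[["employee_id", "name"]]
print(data_subset)
```

Selecting with a single label gives a Series, while selecting with a list of labels gives a DataFrame, even if the list holds only one label.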
