This article focuses on slicing data from a variable of the DataFrame type. The data in that variable can come from different kinds of sources, for example a CSV file or a database connection. The slicing process is demonstrated in Jupyter Notebook. Slicing data means retrieving the data within a desired range, whether that is an interval of rows or an interval of columns. Before executing the script to slice data, start Jupyter Notebook first. Jupyter Notebook is a web-based application for creating and sharing documents that contain live code, equations, visualizations and narrative text. Its uses include data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning and many others. It also fits the purpose here, which is manipulating data in a DataFrame by slicing it. Below is the command for running Jupyter Notebook:
(myenv) C:\python\data-science>jupyter notebook
[I 17:08:28.062 NotebookApp] Serving notebooks from local directory: C:\python\data-science
[I 17:08:28.062 NotebookApp] The Jupyter Notebook is running at:
[I 17:08:28.067 NotebookApp] http://localhost:8888/?token=4dd9801ef2aacad1d445955b0ae4621b4c669da84c617e7b
[I 17:08:28.068 NotebookApp] or http://127.0.0.1:8888/?token=4dd9801ef2aacad1d445955b0ae4621b4c669da84c617e7b
[I 17:08:28.070 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 17:08:28.225 NotebookApp] To access the notebook, open this file in a browser:
    file:///C:/Users/Personal/AppData/Roaming/jupyter/runtime/nbserver-8796-open.html
Or copy and paste one of these URLs:
    http://localhost:8888/?token=4dd9801ef2aacad1d445955b0ae4621b4c669da84c617e7b
    or http://127.0.0.1:8888/?token=4dd9801ef2aacad1d445955b0ae4621b4c669da84c617e7b
Slicing Data from DataFrame
The concept uses the same execution pattern as in the article How to Select Data with iloc function from DataFrame using Pandas Library in Jupyter Notebook. Basically, slicing data relies on the iloc function in the same way as selecting data from a DataFrame does. The pattern is as follows:
df.iloc[row selection,column selection]
The above pattern takes integers as its arguments. Both the row selection and the column selection are specific integer values that indicate the index of the row and the index of the column. So, if the row selection value is 0, it will select the row with index ‘0’, or the first row. Likewise, if the column selection value is 0, it will select the column with index ‘0’, or the first column. The same goes for other values. Selecting a specific range or interval, however, requires a slightly different form of argument.
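Before moving on to ranges, the single-integer form can be sketched as below. This is only a minimal illustration: the DataFrame contents and the column names (name, age, city) are invented here and do not come from the original article.

```python
import pandas as pd

# Hypothetical sample data; column names and values are assumptions for illustration.
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cid", "Dee"],
        "age": [31, 25, 47, 38],
        "city": ["Oslo", "Lima", "Rome", "Kyiv"],
    }
)

# Row position 0, all columns: returns the first row as a Series.
first_row = df.iloc[0, :]

# All rows, column position 0: returns the first column as a Series.
first_column = df.iloc[:, 0]

# Row position 0 and column position 0: returns a single value.
first_cell = df.iloc[0, 0]

print(first_row)
print(first_column)
print(first_cell)
```

Note that iloc works on positions, so 0 always means the first row or the first column regardless of how the index is labelled.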
The pattern for slicing data with the iloc method is as follows:
df.iloc[:last_row_selection, :last_column_selection]
Slicing with “:” alone selects everything along that axis. A number after the “:” limits the selection, and in iloc that stop position is exclusive. With a last_row_selection of 3, as in df.iloc[:3, :2], the row slice covers positions 0, 1 and 2, that is the first three rows. Likewise, with a last_column_selection of 2, the column slice covers positions 0 and 1, that is the first two columns.
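The slicing behaviour described above can be sketched as follows. The stop values 3 and 2 are taken from the text, while the sample DataFrame and its column names are hypothetical and serve only as an illustration.

```python
import pandas as pd

# Hypothetical sample data; the values and column names are assumptions.
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cid", "Dee", "Eve"],
        "age": [31, 25, 47, 38, 29],
        "city": ["Oslo", "Lima", "Rome", "Kyiv", "Bern"],
        "score": [88, 92, 75, 64, 81],
    }
)

# ":" alone selects everything on both axes.
everything = df.iloc[:, :]

# Stop at row position 3 and column position 2 (both exclusive):
# this keeps rows 0, 1, 2 and columns 0, 1.
subset = df.iloc[:3, :2]

print(subset.shape)          # (3, 2)
print(list(subset.columns))  # ['name', 'age']
```

Because the stop position is exclusive, the result has three rows and two columns, and the row at position 3 and the column at position 2 are left out.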