This article shows how to select only the columns of a specific data type from a pandas DataFrame; in this case, only the columns with a non-numeric data type. The process is demonstrated in Jupyter Notebook, a web-based application for creating and sharing documents that contain live code, equations, visualizations, and narrative text. It is widely used for data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more. Before demonstrating how to select the columns, start Jupyter Notebook by executing the following command:
(myenv) C:\python\data-science>jupyter notebook
[I 17:08:28.062 NotebookApp] Serving notebooks from local directory: C:\python\data-science
[I 17:08:28.062 NotebookApp] The Jupyter Notebook is running at:
[I 17:08:28.067 NotebookApp] http://localhost:8888/?token=4dd9801ef2aacad1d445955b0ae4621b4c669da84c617e7b
[I 17:08:28.068 NotebookApp]  or http://127.0.0.1:8888/?token=4dd9801ef2aacad1d445955b0ae4621b4c669da84c617e7b
[I 17:08:28.070 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 17:08:28.225 NotebookApp]

    To access the notebook, open this file in a browser:
        file:///C:/Users/Personal/AppData/Roaming/jupyter/runtime/nbserver-8796-open.html
    Or copy and paste one of these URLs:
        http://localhost:8888/?token=4dd9801ef2aacad1d445955b0ae4621b4c669da84c617e7b
     or http://127.0.0.1:8888/?token=4dd9801ef2aacad1d445955b0ae4621b4c669da84c617e7b
Once Jupyter Notebook is running, execute the appropriate script to load the data. The following is an example of running the script:
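The article's own data-loading script is not reproduced here. As a stand-in, the sketch below builds a small DataFrame by hand so the later steps have something to run against. Only the column names employee_id and name come from the article; the salary and age columns and every value are hypothetical, chosen purely for illustration.

```python
import pandas as pd

# Hypothetical sample data: employee_id and name mirror the article's string
# columns; salary, age, and all values are made up for illustration.
data = pd.DataFrame(
    {
        "employee_id": ["E001", "E002", "E003"],
        "name": ["Alice", "Bob", "Carol"],
        "salary": [5000.0, 6200.5, 5800.0],
        "age": [29, 35, 41],
    }
)

# Show the dtype pandas inferred for each column. The string columns appear
# as object (newer pandas releases may show a dedicated string dtype instead),
# while salary and age appear as float64 and int64.
print(data.dtypes)
```

Checking `data.dtypes` first is a quick way to confirm which columns pandas considers numeric before filtering on data type.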
Finally, to select only the columns with a specific data type from the DataFrame, execute the following command:
import numpy as np
# Keep only the columns whose data type is not numeric
data.select_dtypes(exclude=np.number)
The function for selecting columns by data type is `select_dtypes`, and the variable `data` holds a DataFrame. Passing the `exclude` argument with the value `np.number` selects only the columns whose data type is not numeric. In this example those are employee_id and name, which hold string values in the source table. As the output above shows, pandas stores these columns with the object data type, so only the object columns are returned. NumPy is imported only to supply `np.number`, the generic numeric type passed as the argument to `select_dtypes`.
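To see the selection end to end, here is a self-contained sketch run on a small hand-built DataFrame. The employee_id and name columns mirror the article; salary, age, and all values are hypothetical. It also shows two related forms documented for `select_dtypes`: the string alias `"number"`, which works without importing NumPy, and `include=np.number`, which keeps the numeric columns instead of dropping them.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame: employee_id and name mirror the article's string
# columns; salary and age are invented numeric columns.
data = pd.DataFrame(
    {
        "employee_id": ["E001", "E002", "E003"],
        "name": ["Alice", "Bob", "Carol"],
        "salary": [5000.0, 6200.5, 5800.0],
        "age": [29, 35, 41],
    }
)

# Drop every column whose dtype is a subtype of np.number (ints, floats, ...).
non_numeric = data.select_dtypes(exclude=np.number)

# pandas also accepts the string alias "number", so NumPy is not strictly needed.
non_numeric_alias = data.select_dtypes(exclude="number")

# The opposite selection: keep only the numeric columns.
numeric = data.select_dtypes(include=np.number)

print(list(non_numeric.columns))  # ['employee_id', 'name']
print(list(numeric.columns))      # ['salary', 'age']
```

Excluding numeric types, rather than including `"object"`, is the more robust choice here: it still returns the string columns on pandas versions that store text in a dedicated string dtype instead of object.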