Introduction
This article shows how to print a summary of a Pandas DataFrame. Before producing the summary, one requirement must be met: a variable holding a DataFrame must already exist. For reference, the article ‘How to Use Pandas’ in this link explains how to install the Pandas library, and the article ‘How to Use Pandas with DataFrame’ in this link covers using Pandas with a DataFrame. After importing the Pandas library and defining a DataFrame variable, it is possible to produce a summary of that DataFrame.
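As a quick check of the prerequisite described above, the following minimal sketch confirms that Pandas is importable and that a DataFrame variable can be defined. The column names and values are made up for illustration and do not come from the referenced articles.

```python
import pandas as pd

# Confirm the library is importable and report the installed version.
print(pd.__version__)

# Hypothetical data, used only to show that a DataFrame variable exists.
df = pd.DataFrame({"id": [1, 2], "name": ["Student A", "Student B"]})
print(type(df))
```

If the import fails, the Pandas library is most likely not installed in the Python environment in use.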
How to Display Summary of a DataFrame
This part focuses on summarizing a DataFrame variable. The DataFrame class in the Pandas library provides a method, info(), that prints a concise summary of a DataFrame. The summary includes the index dtype, the columns and their dtypes, the non-null counts and the memory usage. It is really handy during exploratory analysis of the data. The following steps are an example of getting a quick overview of a data set using the info() method:
-
First, open a command line interface. This example uses Command Prompt, since the commands run on a local device with the Microsoft Windows operating system. The Command Prompt appears as follows:
Microsoft Windows [Version 10.0.22000.856]
(c) Microsoft Corporation. All rights reserved.

C:\Users\Personal>
-
Then, execute ‘python’ to enter the Python interactive console:
Microsoft Windows [Version 10.0.22000.856]
(c) Microsoft Corporation. All rights reserved.

C:\Users\Personal>python
Python 3.10.5 (tags/v3.10.5:f377153, Jun 6 2022, 16:14:13) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
-
Next, inside the Python console, import the Pandas library as follows:
Microsoft Windows [Version 10.0.22000.856]
(c) Microsoft Corporation. All rights reserved.

C:\Users\Personal>python
Python 3.10.5 (tags/v3.10.5:f377153, Jun 6 2022, 16:14:13) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
-
Following the previous step, define a DataFrame variable. In this example, it is created by reading a CSV file named ‘student.csv’:
Microsoft Windows [Version 10.0.22000.856]
(c) Microsoft Corporation. All rights reserved.

C:\Users\Personal>python
Python 3.10.5 (tags/v3.10.5:f377153, Jun 6 2022, 16:14:13) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> df = pd.read_csv('student.csv')
-
Finally, call the info() method on the DataFrame variable to print its summary:
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 35 entries, 0 to 34
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   id      35 non-null     int64
 1   name    35 non-null     object
 2   class   35 non-null     object
 3   mark    35 non-null     int64
 4   gender  35 non-null     object
dtypes: int64(2), object(3)
memory usage: 1.5+ KB
>>>
As the output above shows, info() summarizes the DataFrame variable: it lists the available columns together with their non-null counts and dtypes, followed by the memory usage.
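The interactive steps above can also be sketched as a standalone script. Since the contents of ‘student.csv’ are not shown in this article, the sketch below assumes a small in-memory stand-in with the same five column names; the three rows are invented for illustration. It also shows the buf parameter of info(), which writes the summary to a buffer so it can be kept as a string.

```python
import io

import pandas as pd

# Hypothetical stand-in for student.csv: same column names as the output
# above, but the three rows are invented for illustration.
csv_text = """id,name,class,mark,gender
1,Student A,Four,75,female
2,Student B,Three,85,male
3,Student C,Four,60,male
"""

df = pd.read_csv(io.StringIO(csv_text))

# Print the summary to standard output, as in the interactive session.
df.info()

# info() returns None; to keep the text, pass a writable buffer via buf=.
buffer = io.StringIO()
df.info(buf=buffer, memory_usage="deep")
summary = buffer.getvalue()
```

Passing memory_usage="deep" asks Pandas to measure the actual memory held by the text columns, so the reported figure no longer carries the "+" suffix that marks it as a lower bound in the default output.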