This article explains how to read a CSV file and make its content available using the Pandas library, a library for the Python programming language. According to the information in this link, Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language. Wikipedia, available in this link, describes Pandas as a high-level data manipulation tool developed by Wes McKinney. It is built on the NumPy package and its key data structure is called the DataFrame. DataFrames allow you to store and manipulate tabular data in rows of observations and columns of variables.
Moreover, also according to Wikipedia, still in this link, the Pandas library offers data structures and operations for manipulating numerical tables and time series. The name is derived from the term “panel data”, an econometrics term for data sets that include observations over multiple time periods for the same individuals. The name is also a play on the phrase “Python data analysis” itself.
Without further explanation, let us start the process. In this article, the code runs in a Jupyter notebook. Start Jupyter Notebook from the command line, for example:
(myenv) C:\python\data-science>jupyter notebook
[I 17:08:28.062 NotebookApp] Serving notebooks from local directory: C:\python\data-science
[I 17:08:28.062 NotebookApp] The Jupyter Notebook is running at:
[I 17:08:28.067 NotebookApp] http://localhost:8888/?token=4dd9801ef2aacad1d445955b0ae4621b4c669da84c617e7b
[I 17:08:28.068 NotebookApp] or http://127.0.0.1:8888/?token=4dd9801ef2aacad1d445955b0ae4621b4c669da84c617e7b
[I 17:08:28.070 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 17:08:28.225 NotebookApp]
    To access the notebook, open this file in a browser:
        file:///C:/Users/Personal/AppData/Roaming/jupyter/runtime/nbserver-8796-open.html
    Or copy and paste one of these URLs:
        http://localhost:8888/?token=4dd9801ef2aacad1d445955b0ae4621b4c669da84c617e7b
     or http://127.0.0.1:8888/?token=4dd9801ef2aacad1d445955b0ae4621b4c669da84c617e7b
Reading CSV File using Pandas Library
The main purpose of using the Pandas library here is to get the data from a CSV file. After retrieving the data, Pandas stores it in its key data structure, the DataFrame. The following is the syntax to achieve it:
import pandas as pd
data = pd.read_csv("file_name.csv")
data
Executing the above commands displays the content of the CSV file named ‘file_name.csv’, which must be in the same directory as the script file. Here, the script file is a .ipynb file that runs in a Jupyter notebook, as in the following image:
The above is an image of a running Jupyter Notebook. It runs a .ipynb file named ‘read-file-transactions.ipynb’, which contains the following code snippet:
import pandas as pd
data = pd.read_csv("transactions1.csv")
data
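Since the screenshot is not reproduced as text, the following is a minimal sketch of how to inspect what read_csv actually loaded. It assumes a small, hypothetical semicolon-delimited sample held in memory instead of the real transactions1.csv; the column names and values are invented for illustration only.

```python
import io
import pandas as pd

# Hypothetical sample standing in for transactions1.csv (semicolon-delimited).
raw = "id;amount;city\n1;10.5;Jakarta\n2;7.25;Bandung\n"

# With the default separator (a comma), each whole line is read as one field.
data = pd.read_csv(io.StringIO(raw))

print(data.shape)          # (number of rows, number of columns)
print(list(data.columns))  # the header names pandas detected
```

Checking data.shape and data.columns right after loading is a quick way to notice that the whole header line ended up in a single column.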
Reading CSV File and Separating the Column Header using Pandas Library
As the above output shows, the result is just one single column with many rows. So, how to separate the elements in the column header? Since the elements in the column header are separated by the ‘;’ character, just modify the code snippet. Pass the separator character to read_csv as follows:
import pandas as pd
data = pd.read_csv("transactions1.csv", sep=";")
data
The following output will appear:
The above output includes a DtypeWarning message. The warning suggests specifying the dtype option on import or setting low_memory=False. So, re-execute the above script with the additional argument ‘low_memory=False’. The following is the image of the execution:
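As a text-only sketch of the two options the warning mentions, the following example reads a small, hypothetical semicolon-delimited sample from memory. The column names and values are assumptions made for illustration, and a sample this small would not trigger the warning by itself; the point is only to show where the sep, low_memory and dtype arguments go.

```python
import io
import pandas as pd

# Hypothetical sample; "code" mixes text-like and number-like values.
raw = "id;amount;code\n1;10.5;A1\n2;7.25;007\n"

# Option 1: read the file in one pass before inferring column types.
data = pd.read_csv(io.StringIO(raw), sep=";", low_memory=False)

# Option 2: declare the type of the ambiguous column explicitly.
typed = pd.read_csv(io.StringIO(raw), sep=";", dtype={"code": str})

print(data.dtypes)
print(typed["code"].tolist())
```

Declaring dtype for the affected columns is the more explicit choice, since it also keeps values such as leading zeros intact, while low_memory=False only changes how pandas infers the types.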