Sort Files by Size Linux Command Line

This article addresses the problem of locating the files and folders that take up the most disk space. When a server's storage is slowly running out, space can be reclaimed by finding unused or unimportant files and folders scattered across the server.

The method is to run a du and sort pipeline from the top-level directory, then run it again in whichever lower-level directory turns out to contain the largest folders or files.

The command used for the sorting process is shown below:

du -h / | sort -nr | grep [0-9]G | head -20


Description:
du : The command used to estimate disk space usage.
-h : A 'du' option that displays sizes in a human-readable format (for example 1.6G or 62M). The 'h' stands for human-readable.
/ : The location to examine, in this case the root directory.
| : The pipe sign. The output of the command before the pipe becomes the input of the command after it.
sort : The command used to sort lines of text.
-n : A 'sort' option that compares lines by their leading numeric value. The unit suffix (G, M, K) is ignored, so the order is only meaningful among lines that share the same unit, which is why grep is used to keep a single unit.
-r : A 'sort' option that reverses the result of the comparison, so the largest values come first.
grep : A command that prints only the lines matching the specified pattern.
[0-9]G : The pattern passed to grep. It matches any line containing a digit followed by the character 'G'. Quoting it as '[0-9]G' prevents the shell from expanding it as a file name pattern.
head : A command that outputs only the first part of its input.
-20 : A 'head' option that prints only the first 20 lines.
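
The pipeline described above depends on grep to keep a single unit, because sort -n does not understand the G or M suffix. As a minimal sketch, assuming GNU coreutils (where sort offers the -h option), the snippet below compares both sort modes on three sizes taken from the listings in this article and wraps a suffix-aware variant in a helper. The function name biggest and its two arguments are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Sample sizes (taken from the listings in this article) to compare both sort modes.
sizes='1018M
76G
1.6G'

# -n looks only at the leading number, so 1018M wrongly lands above 76G.
numeric_order=$(printf '%s\n' "$sizes" | sort -nr)

# -h (GNU sort) understands the K/M/G suffixes written by du -h.
human_order=$(printf '%s\n' "$sizes" | sort -hr)

printf 'sort -nr:\n%s\n\nsort -hr:\n%s\n' "$numeric_order" "$human_order"

# Hypothetical helper: the N largest entries below a directory, mixed units allowed.
# du error messages (for example vanished /proc entries) are discarded.
biggest() {
    dir=${1:-.}
    count=${2:-20}
    du -h "$dir" 2>/dev/null | sort -hr | head -n "$count"
}

# Example: biggest / 20
```

With this variant no grep step is needed, so gigabyte and megabyte entries appear in one correctly ordered list.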

Below is the first execution on the root directory (/) using the above command pattern:

[root@hostname ~]# du -h / | sort -nr | grep [0-9]G | head -20
du: cannot access ‘/proc/60569/task/60569/fd/4’: No such file or directory
du: cannot access ‘/proc/60569/task/60569/fdinfo/4’: No such file or directory
du: cannot access ‘/proc/60569/fd/4’: No such file or directory
du: cannot access ‘/proc/60569/fdinfo/4’: No such file or directory
76G /
26G /home
25G /opt/alfresco-community
25G /opt
23G /var/lib/mysql
23G /var/lib
23G /var
23G /opt/alfresco-community/tomcat
23G /home/user
22G /opt/alfresco-community/tomcat/logs
13G /var/lib/mysql/webapps1
10G /var/lib/mysql/webapps2
1.6G /home/dev
1.4G /home/guest
1.3G /usr
1.3G /home/apps/bin
0 /run/udev/links/\x2fdisk\x2fby-id\x2fdm-uuid-LVM-cgvkcZiNjYDj9VbmK3a8UFeZxpJvlSv0VWzCwli6RvphmtK67GRQ9q6KLRTuqQjo
[root@hostname ~]#

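As shown in the above output, the largest entry is the root directory (/) itself at 76G, which is simply the total of everything below it. The "cannot access" messages typically refer to /proc entries of processes that ended while du was running and can be ignored. The last line, with a size of 0, appears only because its path happens to contain a digit followed by 'G'. Reading further, one location stays consistently large even as the path goes deeper: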

25G /opt/alfresco-community
...
23G /opt/alfresco-community/tomcat
...
22G /opt/alfresco-community/tomcat/logs
...

In other words, most of the space under /opt/alfresco-community is in /opt/alfresco-community/tomcat, and more precisely in /opt/alfresco-community/tomcat/logs, which holds 22G in total.
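
Following such a chain through a long recursive listing can get tedious. As a rough sketch, assuming GNU du (for --max-depth and -x) and GNU sort (for -h), the helper below lists only the immediate children of one directory, largest first, so the drill-down can be done one level at a time. The name children_by_size is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: size of each immediate child of a directory, largest first.
# Assumes GNU du (--max-depth, -x) and GNU sort (-h).
# -x keeps du on one filesystem, which skips /proc when started from /,
# but also skips any directory that is mounted from a separate partition.
children_by_size() {
    dir=${1:-.}
    du -xh --max-depth=1 "$dir" 2>/dev/null | sort -hr
}

# Example: children_by_size /opt/alfresco-community
```

The first line of the output is the total for the directory itself; the lines after it show which child to inspect next.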

To search further, run the command again, this time inside /opt/alfresco-community/tomcat/logs:

[root@hostname logs]# du -h * | sort -nr | grep [0-9]G | head -20
5.1G localhost_access_log2016-07-19.txt
3.3G localhost_access_log2016-07-20.txt
1.9G localhost_access_log2016-08-02.txt
1.7G localhost_access_log2016-08-01.txt
1.6G localhost_access_log2016-07-31.txt
1.5G localhost_access_log2016-07-30.txt
1.3G localhost_access_log2016-07-29.txt
1.2G localhost_access_log2016-07-28.txt
[root@hostname logs]# 

Based on the above output, several files are larger than 1G. To also view files smaller than 1G, change the grep pattern to [0-9]M, where M selects sizes reported in megabytes:

du -h * | sort -nr | grep [0-9]M | head -20
[root@hostname logs]# du -h * | sort -nr | grep [0-9]M | head -20
1018M   localhost_access_log2016-07-27.txt
875M    localhost_access_log2016-07-26.txt
788M    localhost_access_log2016-08-03.txt
734M    localhost_access_log2016-07-25.txt
593M    localhost_access_log2016-07-24.txt
451M    localhost_access_log2016-07-23.txt
310M    localhost_access_log2016-07-22.txt
169M    localhost_access_log2016-07-21.txt
62M     catalina.out
[root@hostname logs]# 

As expected, although these files are under 1G, several of them are hundreds of megabytes in size, which makes them worth removing since they are only log files.
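
Instead of running the pipeline once per unit, the large files can also be selected by a size threshold. The following is only a sketch, assuming GNU find, du and sort; the function name files_over and its default threshold of 100M are hypothetical choices, not part of the original procedure.

```shell
#!/bin/sh
# Hypothetical helper: regular files directly inside a directory that are larger
# than a threshold, largest first. Assumes GNU find, du and sort.
files_over() {
    dir=${1:-.}
    min=${2:-100M}
    # -maxdepth 1 keeps the search in this directory only;
    # -size +N selects files larger than N (M = mebibytes, G = gibibytes).
    find "$dir" -maxdepth 1 -type f -size "+$min" -exec du -h {} + 2>/dev/null | sort -hr
}

# Example: files_over /opt/alfresco-community/tomcat/logs 100M
```

Because sort -h handles mixed units, gigabyte-sized and megabyte-sized files end up in a single ordered list.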

Remove the log files by executing the following command:

[root@hostname logs]# rm -rf localhost_access_log2016-0*
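
Since rm -rf asks for no confirmation, it may be worth checking what a pattern matches before deleting anything. The sketch below, assuming GNU find, prints the matching files by default and deletes them only when a third argument is given. The function name purge_logs and the --delete switch are hypothetical and exist only in this example.

```shell
#!/bin/sh
# Hypothetical helper: show what a cleanup would delete, then delete only on request.
# Usage: purge_logs DIR 'NAME-PATTERN' [--delete]
purge_logs() {
    dir=$1
    pattern=$2
    mode=${3:-}
    if [ "$mode" = "--delete" ]; then
        # Remove only regular files directly inside DIR that match the pattern.
        find "$dir" -maxdepth 1 -type f -name "$pattern" -exec rm -f {} +
    else
        # Dry run: print the matching files without touching them.
        find "$dir" -maxdepth 1 -type f -name "$pattern" -print
    fi
}

# Example (the pattern is quoted so that find, not the shell, expands it):
# purge_logs /opt/alfresco-community/tomcat/logs 'localhost_access_log2016-0*'
# purge_logs /opt/alfresco-community/tomcat/logs 'localhost_access_log2016-0*' --delete
```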

The above explanation described how to reclaim space by iteratively searching for large files and folders with this command combination. Hope it helps for further reference.
