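split is a small utility for splitting files. It breaks a file into smaller pieces of a given size, measured in lines or bytes; every piece has that size except the last, which holds whatever remains. When a file exceeds the maximum single-file size the filesystem supports, or is too large to send over the network, splitting it into fixed-size pieces solves both problems.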
split help
[test@server ~]$ split --help
Usage: split [OPTION]... [INPUT [PREFIX]]
Output fixed-size pieces of INPUT to PREFIXaa, PREFIXab, ...; default
size is 1000 lines, and default PREFIX is `x'. With no INPUT, or when INPUT
is -, read standard input.
Mandatory arguments to long options are mandatory for short options too.
-a, --suffix-length=N use suffixes of length N (default 2)
-b, --bytes=SIZE put SIZE bytes per output file
-C, --line-bytes=SIZE put at most SIZE bytes of lines per output file
-d, --numeric-suffixes use numeric suffixes instead of alphabetic
-l, --lines=NUMBER put NUMBER lines per output file
--verbose print a diagnostic just before each
output file is opened
--help display this help and exit
--version output version information and exit
SIZE may be (or may be an integer optionally followed by) one of following:
KB 1000, K 1024, MB 1000*1000, M 1024*1024, and so on for G, T, P, E, Z, Y.
Report split bugs to bug-coreutils@gnu.org
GNU coreutils home page: <http://www.gnu.org/software/coreutils/>
General help using GNU software: <http://www.gnu.org/gethelp/>
For complete documentation, run: info coreutils 'split invocation'
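The SIZE suffixes in the help text are easy to mix up: K means 1024 bytes, while KB means 1000 bytes. The following is a minimal sketch of the difference, assuming GNU split; the temporary directory and the 3000-byte sample file are made up for illustration and are not part of the original walkthrough.

```shell
# Sketch: compare the K (1024) and KB (1000) SIZE suffixes. Assumes GNU split.
workdir=$(mktemp -d)                               # scratch directory (hypothetical)
head -c 3000 /dev/zero > "$workdir/sample.bin"     # 3000-byte sample input (hypothetical)

split -b 1K  "$workdir/sample.bin" "$workdir/k."   # pieces of 1024 bytes
split -b 1KB "$workdir/sample.bin" "$workdir/kb."  # pieces of 1000 bytes

# Size of the first piece produced by each run.
k_first=$(wc -c < "$workdir/k.aa")
kb_first=$(wc -c < "$workdir/kb.aa")
echo "first 1K piece: $k_first bytes, first 1KB piece: $kb_first bytes"
```

Because no -d option is given here, the pieces get the default alphabetic suffixes (aa, ab, ...).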
Usage
Check the file size
[root@server ~]# ll -h
total 4.0K
-rw-r--r-- 1 root root 1.5K Mar 12 10:19 netstat.log.bz2
Split the file by bytes
[root@server ~]# split -d -b 1K netstat.log.bz2 netstat.log.bz2.
[root@server ~]# ll -h
total 12K
-rw-r--r-- 1 root root 1.5K Mar 12 10:19 netstat.log.bz2
-rw-r--r-- 1 root root 1.0K Mar 12 10:22 netstat.log.bz2.00
-rw-r--r-- 1 root root 500 Mar 12 10:22 netstat.log.bz2.01
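Test the integrity of the split pieces (each piece is only a fragment of the archive, so both tests are expected to fail)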
[root@server ~]# bzip2 -v -t netstat.log.bz2.00
netstat.log.bz2.00: file ends unexpectedly
You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.
[root@server ~]# bzip2 -v -t netstat.log.bz2.01
netstat.log.bz2.01: bad magic number (file not created by bzip2)
You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.
Merge the split pieces
[root@server ~]# cat netstat.log.bz2.0[0-1] > netstat.log.recover.bz2
Test the integrity of the merged file
[root@server ~]# bzip2 -v -t netstat.log.recover.bz2
netstat.log.recover.bz2: ok
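The split, merge and verify steps above can also be sketched as one small script. This version is only an illustration: it uses a generated text file instead of netstat.log.bz2, and it checks the result with cmp (a byte-for-byte comparison against the original) rather than bzip2 -t, so it does not depend on bzip2. The file names and the temporary directory are hypothetical.

```shell
# Sketch: split a file by bytes, merge the pieces, and verify the round trip.
workdir=$(mktemp -d)                        # scratch directory (hypothetical)
seq 1 2000 > "$workdir/data.txt"            # sample input (hypothetical)

# Split into 1 KiB pieces with numeric suffixes: data.txt.00, data.txt.01, ...
split -d -b 1K "$workdir/data.txt" "$workdir/data.txt."

# The shell expands the glob in sorted order, so the pieces are
# concatenated in the same order in which split wrote them.
cat "$workdir"/data.txt.[0-9][0-9] > "$workdir/data.recover.txt"

# cmp -s is silent and returns 0 only if both files are identical.
if cmp -s "$workdir/data.txt" "$workdir/data.recover.txt"; then
    roundtrip=ok
else
    roundtrip=mismatch
fi
echo "round trip: $roundtrip"
```

Relying on glob order is safe here because all suffixes have the same length; with mixed suffix lengths the sorted order may no longer match the order in which the pieces were written.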
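Notes
-a sets the suffix length. Together with the suffix type (digits or letters), it determines the maximum number of output files:
With numeric suffixes [0-9], at most 10 ** ${suffix_length} files;
With alphabetic suffixes [a-z], at most 26 ** ${suffix_length} files.
Before splitting a file, estimate the number of pieces from the original file size and the piece size, then choose a suitable suffix type and suffix length; otherwise split may run out of suffixes.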
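The estimate described above can be done with a little shell arithmetic. The two helper functions below, pieces_needed and min_suffix_length, are hypothetical names introduced for this sketch and are not part of split; the input of 1000000 lines split 500 lines per piece is likewise made up.

```shell
# Sketch: estimate the number of pieces and the suffix length they require.

# pieces_needed TOTAL PER_PIECE
# Number of output files, rounded up so a partial last piece is counted.
pieces_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# min_suffix_length PIECES BASE
# Smallest length N such that BASE ** N >= PIECES
# (BASE is 10 for numeric suffixes, 26 for alphabetic ones).
min_suffix_length() {
    wanted=$1
    base=$2
    length=1
    capacity=$base
    while [ "$capacity" -lt "$wanted" ]; do
        capacity=$(( capacity * base ))
        length=$(( length + 1 ))
    done
    echo "$length"
}

# Hypothetical input: 1000000 lines, 500 lines per piece.
count=$(pieces_needed 1000000 500)
echo "$count pieces: -d needs -a $(min_suffix_length "$count" 10), letters need -a $(min_suffix_length "$count" 26)"
```

For these made-up numbers the script reports 2000 pieces, which need -a 4 with numeric suffixes or -a 3 with alphabetic ones.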
Check the size of the file to split
[test@server ~]$ wc netstat.log
302 1918 23945 netstat.log
Split the file by lines
[test@server ~]$ split -a 2 -d -l 2 --verbose netstat.log netstat.log.
creating file `netstat.log.00'
... ...
creating file `netstat.log.99'
split: output file suffixes exhausted
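The file has 302 lines, so splitting it into 2-line pieces yields 151 (302/2) files. With numeric suffixes of length 2, however, split can create at most 10 ** 2 = 100 files, which is clearly not enough. A suffix length of 3 (-a 3) allows 10 ** 3 = 1000 files and is sufficient here.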
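The failure can be reproduced, and then avoided with a longer suffix, using a generated 302-line file. This is a sketch that assumes GNU split; the file sample.log, the prefixes short. and long., and the temporary directory are stand-ins invented for the example, not the original netstat.log.

```shell
# Sketch: reproduce the exhausted-suffix failure, then fix it with -a 3.
workdir=$(mktemp -d)                         # scratch directory (hypothetical)
seq 1 302 > "$workdir/sample.log"            # 302-line stand-in for netstat.log

# Two-digit numeric suffixes allow only 100 files, so split gives up
# partway through and returns a non-zero exit status.
if split -a 2 -d -l 2 "$workdir/sample.log" "$workdir/short." 2>/dev/null; then
    short_status=ok
else
    short_status=exhausted
fi

# Three-digit suffixes allow 1000 files, enough for the 151 pieces.
split -a 3 -d -l 2 "$workdir/sample.log" "$workdir/long."
long_count=$(ls "$workdir" | grep -c '^long\.')

echo "-a 2: $short_status; -a 3: $long_count files"
```

The first run still leaves the pieces it managed to write before running out of suffixes, so a failed split should be cleaned up before retrying with the same prefix.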