I recently got a request: decode the contents of a binary file.
/home/work/files # ll
total 312
-rw-------@ 1 honvid staff 30K 7 24 14:52 15158
-rw------- 1 honvid staff 46K 7 24 14:53 62770
-rw-------@ 1 honvid staff 73K 7 24 11:26 8686584
vi
Opening the file shows nothing but garbage:
^_<8b>^H^@^@^@^@^@^D^@í½^G`^\I<96>%&/mÊ{^?JõJ×àt¡^H<80>`^S$ …… …… ……
unzip
/home/work/files # unzip 15158
Archive: 15158
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of 15158 or
15158.zip, and cannot find 15158.ZIP, period.
gzip
/home/work/files # gzip -d 15158
gzip: 15158: unknown suffix -- ignored
tar
/home/work/files # tar -xzvf 15158
tar: Unrecognized archive format
tar: Error exit delayed from previous errors.
lzma
/home/work/files # lzma -d 15158
lzma: 15158: File format not recognized
xz
/home/work/files # xz -d 15158
xz: 15158: File format not recognized
jar
Some articles suggest that the jar command can be used for extraction.
/home/work/files # jar xvf 15158
java.util.zip.ZipException: zip END header not found
	at java.base/java.util.zip.ZipFile$Source.zerror(ZipFile.java:1470)
	at java.base/java.util.zip.ZipFile$Source.findEND(ZipFile.java:1371)
	at java.base/java.util.zip.ZipFile$Source.initCEN(ZipFile.java:1378)
	at java.base/java.util.zip.ZipFile$Source.<init>(ZipFile.java:1209)
	at java.base/java.util.zip.ZipFile$Source.get(ZipFile.java:1172)
	at java.base/java.util.zip.ZipFile$CleanableResource.<init>(ZipFile.java:719)
	at java.base/java.util.zip.ZipFile.<init>(ZipFile.java:239)
	at java.base/java.util.zip.ZipFile.<init>(ZipFile.java:169)
	at java.base/java.util.zip.ZipFile.<init>(ZipFile.java:140)
	at jdk.jartool/sun.tools.jar.Main.extract(Main.java:1389)
	at jdk.jartool/sun.tools.jar.Main.run(Main.java:410)
	at jdk.jartool/sun.tools.jar.Main.main(Main.java:1681)
7za
Next I decided to try an all-in-one extraction tool, and found that P7ZIP fits the bill.
I tested it on Alpine.
The installation steps are as follows:
/home/work/files # apk add p7zip
(1/1) Installing p7zip (16.02-r3)
Executing busybox-1.29.3-r10.trigger
OK: 84 MiB in 64 packages
# the package is not small
/home/work/files # 7za

7-Zip (a) [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=C.UTF-8,Utf16=on,HugeFiles=on,64 bits,4 CPUs Intel(R) Core(TM) i5-7360U CPU @ 2.30GHz (806E9),ASM,AES-NI)

Usage: 7za <command> [<switches>...] <archive_name> [<file_names>...] [<@listfiles...>]

<Commands>
  a : Add files to archive
  b : Benchmark
  d : Delete files from archive
  e : Extract files from archive (without using directory names)
  h : Calculate hash values for files
  i : Show information about supported formats
  l : List contents of archive
  rn : Rename files in archive
  t : Test integrity of archive
  u : Update files to archive
  x : eXtract files with full paths

<Switches>
  -- : Stop switches parsing
  -ai[r[-|0]]{@listfile|!wildcard} : Include archives
  -ax[r[-|0]]{@listfile|!wildcard} : eXclude archives
  -ao{a|s|t|u} : set Overwrite mode
  -an : disable archive_name field
  -bb[0-3] : set output log level
  -bd : disable progress indicator
  -bs{o|e|p}{0|1|2} : set output stream for output/error/progress line
  -bt : show execution time statistics
  -i[r[-|0]]{@listfile|!wildcard} : Include filenames
  -m{Parameters} : set compression Method
  -mmt[N] : set number of CPU threads
  -o{Directory} : set Output directory
  -p{Password} : set Password
  -r[-|0] : Recurse subdirectories
  -sa{a|e|s} : set Archive name mode
  -scc{UTF-8|WIN|DOS} : set charset for for console input/output
  -scs{UTF-8|UTF-16LE|UTF-16BE|WIN|DOS|{id}} : set charset for list files
  -scrc[CRC32|CRC64|SHA1|SHA256|*] : set hash function for x, e, h commands
  -sdel : delete files after compression
  -seml[.] : send archive by email
  -sfx[{name}] : Create SFX archive
  -si[{name}] : read data from stdin
  -slp : set Large Pages mode
  -slt : show technical information for l (List) command
  -snh : store hard links as links
  -snl : store symbolic links as links
  -sni : store NT security information
  -sns[-] : store NTFS alternate streams
  -so : write data to stdout
  -spd : disable wildcard matching for file names
  -spe : eliminate duplication of root folder for extract command
  -spf : use fully qualified file paths
  -ssc[-] : set sensitive case mode
  -ssw : compress shared files
  -stl : set archive timestamp from the most recently modified file
  -stm{HexMask} : set CPU thread affinity mask (hexadecimal number)
  -stx{Type} : exclude archive type
  -t{Type} : Set type of archive
  -u[-][p#][q#][r#][x#][y#][z#][!newArchiveName] : Update options
  -v{Size}[b|k|m|g] : Create volumes
  -w[{path}] : assign Work directory. Empty path means a temporary directory
  -x[r[-|0]]{@listfile|!wildcard} : eXclude filenames
  -y : assume Yes on all queries
Here comes the key part.
/home/work/files # 7za x 15158

7-Zip (a) [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=C.UTF-8,Utf16=on,HugeFiles=on,64 bits,4 CPUs Intel(R) Core(TM) i5-7360U CPU @ 2.30GHz (806E9),ASM,AES-NI)

Scanning the drive for archives:
1 file, 74830 bytes (74 KiB)

Extracting archive: 15158
WARNING:
15158
Can not open the file as [zip] archive
The file is open as [gzip] archive

--
Path = 15158
Open WARNING: Can not open the file as [zip] archive
Type = gzip
Headers Size = 10

Everything is Ok

Archives with Warnings: 1
Size: 149629
Compressed: 74830
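At first I thought the extraction had failed, because a WARNING is the first thing that catches the eye.
On a closer look, though, the output contains Type = gzip: this is a gzip file.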
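In hindsight, the garbage shown by vi already held the answer: `^_`, `<8b>` and `^H` are how vi renders the bytes 0x1f, 0x8b and 0x08. The pair 1f 8b is the gzip signature, and 08 marks the deflate method. A minimal sketch of checking this from the shell, assuming `od` and `gzip` are available; the scratch file `sample` is made up for the demo, so point the `head` command at the real file (for example 15158) instead:

```shell
# Hypothetical scratch file, created only so the demo is self-contained.
tmpdir=$(mktemp -d)
printf 'hello gzip\n' | gzip -c > "$tmpdir/sample"

# Dump the first three bytes as hex and strip the spacing od adds.
magic=$(head -c 3 "$tmpdir/sample" | od -An -tx1 | tr -d ' \n')
echo "$magic"   # prints 1f8b08

rm -rf "$tmpdir"
```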
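So why did the earlier attempt with the gzip command fail?
After renaming the file to add the .gz suffix, gzip -d extracted it successfully. When given a file name, gzip -d apparently rejects it on the suffix alone, before it ever examines the content: it needs a known suffix to strip in order to name the output file.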
PHP
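Reading the content from PHP works as well:
$filename = '/home/work/files/15158';
$file = file_get_contents($filename);
// Decode the file content, not the file name
echo gzdecode($file);
The file content is printed successfully.
Later I came across a utility class that detects file types. Its approach is to inspect the file header.
$filename = '/home/work/files/15158';
// This function opens a magic database and returns its resource.
$handle = finfo_open(FILEINFO_MIME_TYPE);
// Return information about a file
$fileInfo = finfo_file($handle, $filename);
finfo_close($handle);
var_dump($fileInfo);
// Output:
// string(18) "application/x-gzip"
Doing the same check by hand requires knowing the header signature of each file type, which can be looked up in a file-signature database.
// Wrapped in a function (the name is illustrative) so the return statements are valid.
function detectFileType(string $filename): string
{
    $file = @fopen($filename, "rb");
    if (!$file) {
        throw new \Exception("file refuse!");
    }
    // Read only the first 15 bytes; every file type has a different header.
    $bin = fread($file, 15);
    fclose($file);

    // Map of header signatures (hex) to file types
    $types = [
        ["FFD8FFE1", "jpg"],
        ["89504E47", "png"],
        ["255044462D312E", "pdf"],
        ["504B0304", "zip"],
        ["52617221", "rar"],
        ["1F8B08", "gzip"],
    ];
    foreach ($types as $type) {
        // Number of bytes in the signature (two hex digits per byte)
        $blen = intdiv(strlen($type[0]), 2);
        // The leading bytes of the file to compare against
        $tbin = substr($bin, 0, $blen);
        // bin2hex() avoids passing unpack()'s temporary result to array_shift() by reference
        if (strtolower($type[0]) === bin2hex($tbin)) {
            return $type[1];
        }
    }
    return "unknown";
}

echo detectFileType('/home/work/files/15158');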
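The same signature lookup can be run from the command line before reaching for any extraction tool. This is only a sketch that mirrors the table above; the function name `detect_type` and the demo file are made up for illustration, and `od` is assumed to be installed:

```shell
# detect_type FILE: print a type name based on the file's leading bytes.
detect_type() {
    # First 8 bytes as one uppercase hex string, e.g. 1F8B0800...
    hex=$(head -c 8 "$1" | od -An -tx1 | tr -d ' \n' | tr 'a-f' 'A-F')
    case "$hex" in
        FFD8FFE1*)   echo jpg ;;
        89504E47*)   echo png ;;
        255044462D*) echo pdf ;;
        504B0304*)   echo zip ;;
        52617221*)   echo rar ;;
        1F8B08*)     echo gzip ;;
        *)           echo unknown ;;
    esac
}

# Demo on a hypothetical scratch file.
demo_dir=$(mktemp -d)
printf 'hello gzip\n' | gzip -c > "$demo_dir/sample"
detect_type "$demo_dir/sample"
```

Run against the file from this post, `detect_type /home/work/files/15158` would be expected to report gzip, given the Type = gzip result from 7za.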