Opening a Binary File of Unknown Type

I recently got a task: extract the contents of a binary file.

/home/work/files # ll
total 312
-rw-------@ 1 honvid  staff    30K  7 24 14:52 15158
-rw-------  1 honvid  staff    46K  7 24 14:53 62770
-rw-------@ 1 honvid  staff    73K  7 24 11:26 8686584

vi

Opening it in vi shows a screenful of gibberish:

^_<8b>^H^@^@^@^@^@^D^@í½^G`^\I<96>%&/mÊ{^?JõJ×àt¡^H<80>`^S$
……
……
……
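The first bytes in that dump are not random. vi renders 0x1F as ^_, 0x8B as <8b> and 0x08 as ^H, and 1F 8B 08 is the gzip signature (magic bytes 1F 8B, then compression method 8 = deflate, per RFC 1952). The hypothetical helper `headerHex()` below is a minimal sketch that prints a string's leading bytes in hex, which makes such signatures easier to spot than vi's caret notation.

```php
<?php
// Return the first $n bytes of a string as space-separated hex pairs,
// similar to what a hex dump tool shows for a file's header.
function headerHex(string $data, int $n = 8): string
{
    $hex = bin2hex(substr($data, 0, $n));    // e.g. "1f8b0800..."
    return implode(' ', str_split($hex, 2)); // one pair per byte
}

// The bytes vi displayed as ^_ <8b> ^H followed by five ^@ (0x00).
echo headerHex("\x1F\x8B\x08\x00\x00\x00\x00\x00"), PHP_EOL; // 1f 8b 08 00 00 00 00 00
```

For the file in question, the call would be `echo headerHex(file_get_contents('/home/work/files/15158'), 10);`.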

unzip

/home/work/files # unzip 15158
Archive:  15158
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of 15158 or
        15158.zip, and cannot find 15158.ZIP, period.
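unzip gives up because a zip archive is read from its end: the End-of-Central-Directory record starts with the signature `PK\x05\x06` and occupies the last 22 bytes of the file, plus up to 65,535 bytes of optional archive comment. The hypothetical `hasZipEocd()` below sketches that check in PHP; it only searches for the signature and does not validate the rest of the record the way unzip does.

```php
<?php
// Look for the zip End-of-Central-Directory signature "PK\x05\x06".
// The record is 22 bytes plus a comment of at most 65535 bytes, so it
// must start within the last 22 + 65535 bytes of the data.
function hasZipEocd(string $data): bool
{
    $tail = substr($data, -(22 + 65535)); // whole string if it is shorter
    return strrpos($tail, "PK\x05\x06") !== false;
}

// A minimal empty zip archive consists of the EOCD record alone.
var_dump(hasZipEocd("PK\x05\x06" . str_repeat("\x00", 18))); // bool(true)
// A gzip header contains no EOCD signature.
var_dump(hasZipEocd("\x1F\x8B\x08\x00\x00\x00\x00\x00\x04\x00")); // bool(false)
```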

gzip

/home/work/files # gzip -d 15158
gzip: 15158: unknown suffix -- ignored

tar

/home/work/files # tar -xzvf 15158
tar: Unrecognized archive format
tar: Error exit delayed from previous errors.
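With -z, tar first runs the data through gzip, so the "Unrecognized archive format" error most likely means the decompressed payload is not a tar archive at all. A POSIX (ustar) tar header carries the magic string `ustar` at byte offset 257 of its first 512-byte block. The hypothetical `looksLikeUstar()` below is a minimal sketch of that check; it will miss old pre-POSIX (V7) archives, which have no magic.

```php
<?php
// A ustar tar archive is a sequence of 512-byte blocks; the first header
// block has "ustar" at byte offset 257 (GNU tar writes "ustar  \0", which
// also starts with "ustar").
function looksLikeUstar(string $data): bool
{
    return strlen($data) >= 512 && substr($data, 257, 5) === "ustar";
}

// A fake 512-byte header block with only the magic filled in.
$block = str_repeat("\0", 257) . "ustar" . str_repeat("\0", 512 - 262);
var_dump(looksLikeUstar($block));             // bool(true)
var_dump(looksLikeUstar("\x1F\x8B\x08\x00")); // bool(false)
```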

lzma

/home/work/files # lzma -d 15158
lzma: 15158: File format not recognized

xz

/home/work/files # xz -d 15158
xz: 15158: File format not recognized
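The two tools look for different things. An .xz file begins with the fixed 6-byte magic FD 37 7A 58 5A 00, whereas the legacy .lzma format has no magic number at all, so lzma can only reject headers whose fields look invalid. The sketch below (the function and constant names are illustrative) checks for the xz magic; a gzip file like this one fails it on the first byte.

```php
<?php
// The xz container begins with a fixed 6-byte magic: FD 37 7A 58 5A 00.
const XZ_MAGIC = "\xFD7zXZ\x00";

function isXz(string $data): bool
{
    // strncmp is binary-safe; compare only the length of the magic.
    return strncmp($data, XZ_MAGIC, strlen(XZ_MAGIC)) === 0;
}

var_dump(isXz("\xFD7zXZ\x00\x00\x04"));            // bool(true)
var_dump(isXz("\x1F\x8B\x08\x00\x00\x00\x00\x00")); // bool(false)
```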

jar

I read an article saying the jar command can be used to extract such files.

/home/work/files # jar xvf 15158
java.util.zip.ZipException: zip END header not found
    at java.base/java.util.zip.ZipFile$Source.zerror(ZipFile.java:1470)
    at java.base/java.util.zip.ZipFile$Source.findEND(ZipFile.java:1371)
    at java.base/java.util.zip.ZipFile$Source.initCEN(ZipFile.java:1378)
    at java.base/java.util.zip.ZipFile$Source.<init>(ZipFile.java:1209)
    at java.base/java.util.zip.ZipFile$Source.get(ZipFile.java:1172)
    at java.base/java.util.zip.ZipFile$CleanableResource.<init>(ZipFile.java:719)
    at java.base/java.util.zip.ZipFile.<init>(ZipFile.java:239)
    at java.base/java.util.zip.ZipFile.<init>(ZipFile.java:169)
    at java.base/java.util.zip.ZipFile.<init>(ZipFile.java:140)
    at jdk.jartool/sun.tools.jar.Main.extract(Main.java:1389)
    at jdk.jartool/sun.tools.jar.Main.run(Main.java:410)
    at jdk.jartool/sun.tools.jar.Main.main(Main.java:1681)

7za

Next I figured I would try a do-everything tool, and found that P7ZIP can handle it.

I ran these tests on Alpine Linux.

Installation:

/home/work/files # apk add p7zip
(1/1) Installing p7zip (16.02-r3)
Executing busybox-1.29.3-r10.trigger
OK: 84 MiB in 64 packages

# the package is fairly large

/home/work/files # 7za

7-Zip (a) [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=C.UTF-8,Utf16=on,HugeFiles=on,64 bits,4 CPUs Intel(R) Core(TM) i5-7360U CPU @ 2.30GHz (806E9),ASM,AES-NI)

Usage: 7za <command> [<switches>...] <archive_name> [<file_names>...]
       [<@listfiles...>]

<Commands>
  a : Add files to archive
  b : Benchmark
  d : Delete files from archive
  e : Extract files from archive (without using directory names)
  h : Calculate hash values for files
  i : Show information about supported formats
  l : List contents of archive
  rn : Rename files in archive
  t : Test integrity of archive
  u : Update files to archive
  x : eXtract files with full paths

<Switches>
  -- : Stop switches parsing
  -ai[r[-|0]]{@listfile|!wildcard} : Include archives
  -ax[r[-|0]]{@listfile|!wildcard} : eXclude archives
  -ao{a|s|t|u} : set Overwrite mode
  -an : disable archive_name field
  -bb[0-3] : set output log level
  -bd : disable progress indicator
  -bs{o|e|p}{0|1|2} : set output stream for output/error/progress line
  -bt : show execution time statistics
  -i[r[-|0]]{@listfile|!wildcard} : Include filenames
  -m{Parameters} : set compression Method
    -mmt[N] : set number of CPU threads
  -o{Directory} : set Output directory
  -p{Password} : set Password
  -r[-|0] : Recurse subdirectories
  -sa{a|e|s} : set Archive name mode
  -scc{UTF-8|WIN|DOS} : set charset for for console input/output
  -scs{UTF-8|UTF-16LE|UTF-16BE|WIN|DOS|{id}} : set charset for list files
  -scrc[CRC32|CRC64|SHA1|SHA256|*] : set hash function for x, e, h commands
  -sdel : delete files after compression
  -seml[.] : send archive by email
  -sfx[{name}] : Create SFX archive
  -si[{name}] : read data from stdin
  -slp : set Large Pages mode
  -slt : show technical information for l (List) command
  -snh : store hard links as links
  -snl : store symbolic links as links
  -sni : store NT security information
  -sns[-] : store NTFS alternate streams
  -so : write data to stdout
  -spd : disable wildcard matching for file names
  -spe : eliminate duplication of root folder for extract command
  -spf : use fully qualified file paths
  -ssc[-] : set sensitive case mode
  -ssw : compress shared files
  -stl : set archive timestamp from the most recently modified file
  -stm{HexMask} : set CPU thread affinity mask (hexadecimal number)
  -stx{Type} : exclude archive type
  -t{Type} : Set type of archive
  -u[-][p#][q#][r#][x#][y#][z#][!newArchiveName] : Update options
  -v{Size}[b|k|m|g] : Create volumes
  -w[{path}] : assign Work directory. Empty path means a temporary directory
  -x[r[-|0]]{@listfile|!wildcard} : eXclude filenames
  -y : assume Yes on all queries

Now for the key part:

/home/work/files # 7za x 15158

7-Zip (a) [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=C.UTF-8,Utf16=on,HugeFiles=on,64 bits,4 CPUs Intel(R) Core(TM) i5-7360U CPU @ 2.30GHz (806E9),ASM,AES-NI)

Scanning the drive for archives:
1 file, 74830 bytes (74 KiB)

Extracting archive: 15158
WARNING:
15158
Can not open the file as [zip] archive
The file is open as [gzip] archive

--
Path = 15158
Open WARNING: Can not open the file as [zip] archive
Type = gzip
Headers Size = 10

Everything is Ok

Archives with Warnings: 1
Size:       149629
Compressed: 74830
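7za's `Headers Size = 10` matches the fixed 10-byte gzip member header from RFC 1952 (ID1, ID2, CM, FLG, a 4-byte MTIME, XFL, OS) with no optional fields. A gzip member ends with an 8-byte trailer holding the CRC-32 and ISIZE, the uncompressed size modulo 2^32, which is likely where a figure like `Size: 149629` comes from. The hypothetical `gzipInfo()` below sketches reading both; it assumes a single-member file, since with concatenated members the trailer describes only the last one.

```php
<?php
// Parse the fixed 10-byte gzip header and the 8-byte trailer (RFC 1952).
// Assumes a single gzip member; returns null if the magic bytes are missing.
function gzipInfo(string $data): ?array
{
    if (strlen($data) < 18 || substr($data, 0, 2) !== "\x1F\x8B") {
        return null; // too short, or no ID1/ID2 magic
    }
    // C = unsigned byte, V = unsigned 32-bit little-endian
    $h = unpack('Cid1/Cid2/Ccm/Cflg/Vmtime/Cxfl/Cos', substr($data, 0, 10));
    // Trailer: CRC-32 of the uncompressed data, then ISIZE (size mod 2^32).
    $t = unpack('Vcrc32/Visize', substr($data, -8));
    return [
        'method'   => $h['cm'] === 8 ? 'deflate' : "unknown ({$h['cm']})",
        'flags'    => $h['flg'],
        'mtime'    => $h['mtime'],
        'os'       => $h['os'],
        'crc32'    => sprintf('%08x', $t['crc32']),
        'origSize' => $t['isize'],
    ];
}

echo gzipInfo(gzencode(str_repeat('a', 1000)))['origSize'], PHP_EOL; // 1000
```

Calling `gzipInfo(file_get_contents('/home/work/files/15158'))` should then report an `origSize` equal to the size 7za printed, if the trailer reading above is right.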

At first I thought the extraction had failed, because the first thing I noticed was a WARNING.

On a closer look, though, it reports Type = gzip. It is a gzip file.

So why did the gzip command fail earlier?

Verification

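  • After adding a .gz suffix to the file name, gzip -d decompresses it successfully.
    gzip -d apparently goes by the file name suffix rather than the contents: it needs a recognized suffix such as .gz to work out the name of the output file, and skips the file otherwise.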
  • Reading the contents with PHP also works:
$filename = '/home/work/files/15158';
$file = file_get_contents($filename); // raw gzip bytes
echo gzdecode($file);                 // gzdecode() takes the compressed data, not a path

This prints the decompressed contents.
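`gzdecode()` needs the whole file in memory. For larger files, a sketch using zlib's `gzopen()`/`gzread()` streams the output to disk instead, and again no file name suffix is involved. One caveat: `gzopen()` passes non-gzip input through unchanged, so it is worth checking the 1F 8B header first. The function name `gunzipFile()` is hypothetical.

```php
<?php
// Stream-decompress a gzip file to $dest in fixed-size chunks so the whole
// decompressed content never has to be held in memory at once.
// Only the bytes matter here; the source file name needs no .gz suffix.
function gunzipFile(string $src, string $dest, int $chunk = 8192): int
{
    $in = gzopen($src, 'rb');
    if ($in === false) {
        throw new RuntimeException("Cannot open $src");
    }
    $out = fopen($dest, 'wb');
    if ($out === false) {
        gzclose($in);
        throw new RuntimeException("Cannot create $dest");
    }
    $total = 0;
    while (!gzeof($in)) {
        $buf = gzread($in, $chunk);
        if ($buf === false || $buf === '') {
            break; // read error or nothing left
        }
        $total += fwrite($out, $buf);
    }
    gzclose($in);
    fclose($out);
    return $total; // number of decompressed bytes written
}

// Demo: a temp file holding gzip data, deliberately without a .gz suffix.
$src = tempnam(sys_get_temp_dir(), 'gz');
file_put_contents($src, gzencode("hello gzip\n"));
$dest = $src . '.out';
echo gunzipFile($src, $dest), PHP_EOL; // 11
echo file_get_contents($dest);         // hello gzip
unlink($src);
unlink($dest);
```

For the file here: `gunzipFile('/home/work/files/15158', '/home/work/files/15158.out');`.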

Going further

Later I came across a utility class that wraps file-type detection. Its approach is to inspect the file's header bytes.

Using the built-in functions

$filename = '/home/work/files/15158';
// Opens a magic database; returns a finfo instance (a resource before PHP 8.1)
$handle   = finfo_open(FILEINFO_MIME_TYPE);
// Return information about a file
$fileInfo = finfo_file($handle, $filename);
finfo_close($handle);
var_dump($fileInfo);

## Output
string(18) "application/x-gzip"
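When the bytes are already in memory (downloaded rather than on disk, say), the object-oriented `finfo` class can inspect a string with `finfo::buffer()` instead of `finfo_file()`. The wrapper `mimeOfBytes()` below is a hypothetical sketch that assumes the fileinfo extension is enabled. The MIME type reported for gzip depends on the libmagic version: older ones say `application/x-gzip`, as above, newer ones `application/gzip`.

```php
<?php
// Detect a MIME type from bytes already in memory via the fileinfo extension.
function mimeOfBytes(string $bytes): string
{
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $mime = $finfo->buffer($bytes); // false on failure
    return $mime === false ? 'unknown' : $mime;
}

// Demo: gzip-compress a short string in memory and ask libmagic what it is.
echo mimeOfBytes(gzencode('hello')), PHP_EOL;
```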

Using header bytes

This approach requires knowing the header signature of each file type, which you can look up in a file-signature database:

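function detectFileType(string $path): string
{
    $file = @fopen($path, "rb");
    if (!$file) throw new \Exception("Cannot open file: $path");
    $bin = fread($file, 15); // read only the first 15 bytes; each file type has its own header signature
    fclose($file);
    // map of header signatures (hex) to file types
    $types = [
        ["FFD8FF", "jpg"], // JPEG start marker; FFD8FFE1 would match only EXIF JPEGs
        ["89504E47", "png"],
        ["255044462D312E", "pdf"],
        ["504B0304", "zip"],
        ["52617221", "rar"],
        ["1F8B08", "gzip"]
    ];
    foreach ($types as $type) {
        $blen = strlen(pack("H*", $type[0])); // number of bytes in the signature
        $tbin = substr($bin, 0, $blen);       // the file's header bytes of the same length
        // unpack() returns [1 => "hex..."]; index it directly, because array_shift()
        // on a function result raises "Only variables should be passed by reference"
        if (strtolower($type[0]) === unpack("H*", $tbin)[1]) {
            return $type[1];
        }
    }
    return "unknown";
}

echo detectFileType('/home/work/files/15158'); // gzip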
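The same table-driven idea can also work on a string in memory, comparing binary literals with `str_starts_with()` (PHP 8.0+) instead of converting through `pack()`/`unpack()`. This is a sketch: the function name is hypothetical, the pdf entry is loosened to `%PDF-`, and the xz, 7z and bzip2 entries are additions that were not in the original table.

```php
<?php
// Map of magic-number prefixes (binary string literals) to file types.
const SIGNATURES = [
    "\x89PNG\r\n\x1A\n"  => 'png',
    "\xFD7zXZ\x00"       => 'xz',    // added: xz container
    "7z\xBC\xAF\x27\x1C" => '7z',    // added: 7-Zip archive
    '%PDF-'              => 'pdf',   // any PDF version
    "PK\x03\x04"         => 'zip',
    'Rar!'               => 'rar',
    "\xFF\xD8\xFF"       => 'jpg',
    "\x1F\x8B\x08"       => 'gzip',
    'BZh'                => 'bzip2', // added: bzip2
];

function detectTypeFromBytes(string $bytes): string
{
    foreach (SIGNATURES as $magic => $type) {
        if (str_starts_with($bytes, $magic)) { // binary-safe prefix check
            return $type;
        }
    }
    return 'unknown';
}

echo detectTypeFromBytes("\x1F\x8B\x08\x00\x00\x00\x00\x00\x04\x00"), PHP_EOL; // gzip
echo detectTypeFromBytes("PK\x03\x04\x14\x00"), PHP_EOL;                       // zip
```

Reading only the first few bytes of a file, for example with `file_get_contents($path, false, null, 0, 16)`, is enough to feed this function.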