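The previous post covered some ways to use osquery, namely querying system information with SQL statements. This post looks at how a table is defined and how the data in it is collected.
It uses two tables, uptime and processes, as examples. The osquery version discussed is 1.7.6.
The uptime table reports how long the system has been running since the last boot: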
    osquery> select * from uptime;
    +------+-------+---------+---------+---------------+
    | days | hours | minutes | seconds | total_seconds |
    +------+-------+---------+---------+---------------+
    | 1    | 23    | 19      | 53      | 170393        |
    +------+-------+---------+---------+---------------+
How is this row of the uptime table obtained?
Generally, the description of a table has two parts: the spec and the impl (implementation).
First, look at the spec, uptime.table:
    table_name("uptime")
    description("Track time passed since last boot.")
    schema([
        Column("days", INTEGER, "Days of uptime"),
        Column("hours", INTEGER, "Hours of uptime"),
        Column("minutes", INTEGER, "Minutes of uptime"),
        Column("seconds", INTEGER, "Seconds of uptime"),
        Column("total_seconds", BIGINT, "Total uptime seconds"),
    ])
    implementation("system/uptime@genUptime")
The uptime table has 5 columns: days, hours, minutes, seconds, and total_seconds. It is implemented by the genUptime function in system/uptime.
So let's go straight to the implementation, uptime.cpp.
    QueryData genUptime(QueryContext& context) {
      Row r;
      QueryData results;

      // Get the uptime in seconds (each platform obtains it in a different way).
      long uptime_in_seconds = getUptime();

      if (uptime_in_seconds >= 0) {
        r["days"] = INTEGER(uptime_in_seconds / 60 / 60 / 24);
        r["hours"] = INTEGER((uptime_in_seconds / 60 / 60) % 24);
        r["minutes"] = INTEGER((uptime_in_seconds / 60) % 60);
        r["seconds"] = INTEGER(uptime_in_seconds % 60);
        r["total_seconds"] = BIGINT(uptime_in_seconds);
        results.push_back(r);
      }

      return results;
    }
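Row r is one row of data. It corresponds to one row of the SQL query result and holds a value for each column of the table. QueryData results is the collection of all rows returned by the query and can hold any number of rows. The function first gets the uptime, then fills in the matching fields of Row r.
It then appends the row to the returned data with results.push_back(r); and finally returns the query results.
Because the uptime table only reports a single time value, it has just one row, so genUptime fills in and returns exactly one row (or none, if getUptime returns a negative value).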
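The row-building step can be tried outside osquery with a small sketch. The aliases Row and QueryData below are stand-ins defined only for this example (assumed here to be a string-to-string map and a vector of such maps), uptimeRows is a hypothetical helper, and std::to_string replaces the INTEGER and BIGINT macros. It is an illustration of the arithmetic, not osquery's own code.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Stand-in aliases for this sketch only (assumption: a row maps column
// names to string values, and a result set is a vector of rows).
using Row = std::map<std::string, std::string>;
using QueryData = std::vector<Row>;

// Hypothetical helper: the same arithmetic as genUptime, but the uptime is
// passed in as a parameter instead of being read from the system.
QueryData uptimeRows(long uptime_in_seconds) {
  QueryData results;
  if (uptime_in_seconds >= 0) {
    Row r;
    r["days"] = std::to_string(uptime_in_seconds / 60 / 60 / 24);
    r["hours"] = std::to_string((uptime_in_seconds / 60 / 60) % 24);
    r["minutes"] = std::to_string((uptime_in_seconds / 60) % 60);
    r["seconds"] = std::to_string(uptime_in_seconds % 60);
    r["total_seconds"] = std::to_string(uptime_in_seconds);
    results.push_back(r);
  }
  // A negative input produces an empty result set, i.e. zero rows.
  return results;
}
```

For the value in the query output above, uptimeRows(170393) yields one row with days 1, hours 23, minutes 19 and seconds 53, which matches what osquery printed.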
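uptime is a fairly simple table. Next, let's analyze a more complex one, processes.
The processes table is more involved; it provides information about the processes currently running.
First, look at processes.table. The table has many columns, which are not described one by one here.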
    table_name("processes")
    description("All running processes on the host system.")
    schema([
        Column("pid", BIGINT, "Process (or thread) ID", index=True),
        Column("name", TEXT, "The process path or shorthand argv[0]"),
        Column("path", TEXT, "Path to executed binary"),
        Column("cmdline", TEXT, "Complete argv"),
        Column("state", TEXT, "Process state"),
        Column("cwd", TEXT, "Process current working directory"),
        Column("root", TEXT, "Process virtual root directory"),
        Column("uid", BIGINT, "Unsigned user ID"),
        Column("gid", BIGINT, "Unsigned group ID"),
        Column("euid", BIGINT, "Unsigned effective user ID"),
        Column("egid", BIGINT, "Unsigned effective group ID"),
        Column("suid", BIGINT, "Unsigned saved user ID"),
        Column("sgid", BIGINT, "Unsigned saved group ID"),
        Column("on_disk", INTEGER, "The process path exists yes=1, no=0, unknown=-1"),
        Column("wired_size", BIGINT, "Bytes of unpagable memory used by process"),
        Column("resident_size", BIGINT, "Bytes of private memory used by process"),
        Column("phys_footprint", BIGINT, "Bytes of total physical memory used"),
        Column("user_time", BIGINT, "CPU time spent in user space"),
        Column("system_time", BIGINT, "CPU time spent in kernel space"),
        Column("start_time", BIGINT, "Process start in seconds since boot (non-sleeping)"),
        Column("parent", BIGINT, "Process parent's PID"),
        Column("pgroup", BIGINT, "Process group"),
        Column("nice", INTEGER, "Process nice level (-20 to 20, default 0)"),
    ])
    implementation("system/processes@genProcesses")
    examples([
      "select * from processes where pid = 1",
    ])
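Its implementation is the genProcesses function in system/processes.
genProcesses has a different implementation for each platform. This post analyzes the one in linux/processes.cpp.
Start with the implementing function, genProcesses: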
    QueryData genProcesses(QueryContext& context) {
      QueryData results;

      auto pidlist = getProcList(context);
      for (const auto& pid : pidlist) {
        genProcess(pid, results);
      }

      return results;
    }
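The function has two main parts: building the list of PIDs with getProcList, then generating a row for each PID with genProcess.
getProcList builds the PID list based on the query context.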
    std::set<std::string> getProcList(const QueryContext& context) {
      std::set<std::string> pidlist;
      if (context.constraints.count("pid") > 0 &&
          context.constraints.at("pid").exists(EQUALS)) {
        for (const auto& pid : context.constraints.at("pid").getAll(EQUALS)) {
          if (isDirectory("/proc/" + pid)) {
            pidlist.insert(pid);
          }
        }
      } else {
        osquery::procProcesses(pidlist);
      }

      return pidlist;
    }
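As the code shows, the list can be narrowed by the query's conditions. When the query contains where pid=xxxx, the condition
    if (context.constraints.count("pid") > 0 && context.constraints.at("pid").exists(EQUALS))
holds, so only that PID needs to be added to pidlist, and only if /proc/<pid> is a directory, that is, the process actually exists.
The benefit of this step is that with a where pid=xxxx condition there is no need to scan every PID; only the information for the requested PID has to be fetched.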
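The selection logic can be isolated from osquery's types to make the short-circuit visible. In the sketch below, selectPids, PidSet and the two callbacks are hypothetical names introduced for illustration: exists stands in for the isDirectory("/proc/" + pid) check and listAll for the full scan. It is a simplified model under those assumptions, not the QueryContext API.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>

using PidSet = std::set<std::string>;

// Hypothetical model of getProcList's branching.
// equalsConstraints: the PIDs named by "where pid = ..." (empty if none).
// exists:            stand-in for the "/proc/<pid> is a directory" check.
// listAll:           stand-in for the full scan of all processes.
PidSet selectPids(const PidSet& equalsConstraints,
                  const std::function<bool(const std::string&)>& exists,
                  const std::function<PidSet()>& listAll) {
  PidSet pidlist;
  if (!equalsConstraints.empty()) {
    // Constrained query: only check the requested PIDs, never scan.
    for (const auto& pid : equalsConstraints) {
      if (exists(pid)) {
        pidlist.insert(pid);
      }
    }
  } else {
    // Unconstrained query: fall back to enumerating everything.
    pidlist = listAll();
  }
  return pidlist;
}
```

Passing a counter-incrementing lambda as listAll is an easy way to confirm that a constrained query never triggers the full scan.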
Without such a constraint, all PIDs are collected. This is done by the procProcesses function:
    const std::string kLinuxProcPath = "/proc";

    Status procProcesses(std::set<std::string>& processes) {
      // Iterate over each process-like directory in proc.
      boost::filesystem::directory_iterator it(kLinuxProcPath), end;
      try {
        for (; it != end; ++it) {
          if (boost::filesystem::is_directory(it->status())) {
            // See #792: std::regex is incomplete until GCC 4.9
            if (std::atoll(it->path().leaf().string().c_str()) > 0) {
              processes.insert(it->path().leaf().string());
            }
          }
        }
      } catch (const boost::filesystem::filesystem_error& e) {
        VLOG(1) << "Exception iterating Linux processes " << e.what();
        return Status(1, e.what());
      }

      return Status(0, "OK");
    }
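So collecting all PIDs means iterating over every directory under /proc and checking whether its name is numeric. The code tests this with std::atoll(name) > 0, which accepts any name that starts with a positive number. If the check passes, the name is added to the processes set.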
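The name check is easy to experiment with in isolation. The sketch below applies the same std::atoll test to a plain list of entry names, leaving out the directory iteration. looksLikePid, isAllDigits and filterPidNames are hypothetical helpers written for this example; isAllDigits is a stricter alternative shown only for comparison and is not what the osquery code uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <set>
#include <string>
#include <vector>

// Same test as procProcesses: the name must parse to a positive number.
// std::atoll stops at the first non-digit, so "12abc" is accepted as 12.
bool looksLikePid(const std::string& name) {
  return std::atoll(name.c_str()) > 0;
}

// Stricter alternative, for comparison: every character must be a digit.
bool isAllDigits(const std::string& name) {
  return !name.empty() &&
         std::all_of(name.begin(), name.end(), [](unsigned char c) {
           return std::isdigit(c) != 0;
         });
}

// Keep only the entry names that pass the atoll-based check.
std::set<std::string> filterPidNames(const std::vector<std::string>& names) {
  std::set<std::string> pids;
  for (const auto& name : names) {
    if (looksLikePid(name)) {
      pids.insert(name);
    }
  }
  return pids;
}
```

On a real /proc the two predicates should agree in practice, since process directories are named with plain decimal PIDs and the other entries (self, cpuinfo, sys and so on) do not start with a digit.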
With the PID list in hand, the next step is to walk through it and gather the information for each PID.
    void genProcess(const std::string& pid, QueryData& results) {
      // Parse the process stat and status.
      auto proc_stat = getProcStat(pid);

      Row r;
      r["pid"] = pid;
      r["parent"] = proc_stat.parent;
      r["path"] = readProcLink("exe", pid);
      r["name"] = proc_stat.name;
      r["pgroup"] = proc_stat.group;
      r["state"] = proc_stat.state;
      r["nice"] = proc_stat.nice;

      // Read/parse cmdline arguments.
      r["cmdline"] = readProcCMDLine(pid);
      r["cwd"] = readProcLink("cwd", pid);
      r["root"] = readProcLink("root", pid);
      r["uid"] = proc_stat.real_uid;
      r["euid"] = proc_stat.effective_uid;
      r["suid"] = proc_stat.saved_uid;
      r["gid"] = proc_stat.real_gid;
      r["egid"] = proc_stat.effective_gid;
      r["sgid"] = proc_stat.saved_gid;

      // If the path of the executable that started the process is available and
      // the path exists on disk, set on_disk to 1. If the path is not
      // available, set on_disk to -1. If, and only if, the path of the
      // executable is available and the file does NOT exist on disk, set on_disk
      // to 0.
      r["on_disk"] = osquery::pathExists(r["path"]).toString();

      // size/memory information
      r["wired_size"] = "0"; // No support for unpagable counters in linux.
      r["resident_size"] = proc_stat.resident_size;
      r["phys_footprint"] = proc_stat.phys_footprint;

      // time information
      r["user_time"] = proc_stat.user_time;
      r["system_time"] = proc_stat.system_time;
      r["start_time"] = proc_stat.start_time;

      results.push_back(r);
    }
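The function first calls getProcStat to get the information for the PID.
getProcStat is not analyzed in detail here. It mainly reads the /proc/<pid>/stat file (and, per the comment in genProcess, the status file as well) and extracts the relevant fields.
genProcess then copies the information returned by getProcStat into the corresponding columns of row r, and finally appends r to the result set.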
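Since getProcStat is not shown, here is a rough idea of what extracting fields from a stat line involves. The sketch parses the first few fields of a /proc/<pid>/stat line passed in as a string: pid, the command name in parentheses, state, parent PID and process group. StatSketch and parseStatLine are hypothetical names, and this is not osquery's getProcStat. The one subtlety it handles is that the command name may itself contain spaces or parentheses, so it searches for the last closing parenthesis.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical holder for a few fields of /proc/<pid>/stat.
struct StatSketch {
  std::string pid;
  std::string name;
  std::string state;
  std::string parent;
  std::string group;
};

// Parse "pid (comm) state ppid pgrp ..." from a single stat line.
// Returns false if the line does not have the expected shape.
bool parseStatLine(const std::string& line, StatSketch& out) {
  const auto open = line.find('(');
  const auto close = line.rfind(')'); // last ')': comm may contain ')'
  if (open == std::string::npos || close == std::string::npos ||
      close < open) {
    return false;
  }

  // The PID is the token before the opening parenthesis.
  std::istringstream head(line.substr(0, open));
  if (!(head >> out.pid)) {
    return false;
  }

  // The command name is everything between the outermost parentheses.
  out.name = line.substr(open + 1, close - open - 1);

  // The remaining fields are whitespace-separated.
  std::istringstream rest(line.substr(close + 1));
  return static_cast<bool>(rest >> out.state >> out.parent >> out.group);
}
```

The same approach extends to later fields such as the CPU times and the nice value by reading further tokens from rest in the order given by the proc(5) manual page.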