Node.js Inter-Process Communication

Date: 2021-01-16

I. Scenarios
Node runs on a single thread, but that does not mean it cannot take advantage of multiple processes on multi-core or multi-machine setups.

In fact, Node was designed from the very beginning with distributed, networked scenarios in mind:

Node is a single-threaded, single-process system which enforces shared-nothing design with OS process boundaries. It has rather good libraries for networking. I believe this to be a basis for designing very large distributed programs. The "nodes" need to be organized: given a communication protocol, told how to connect to each other. In the next couple months we are working on libraries for Node that allow these networks.

P.S. For why Node is called Node, see Why is Node.js named Node.js?

II. Creating Processes
How processes communicate depends on how they are created, and Node offers 4 ways to create a process: spawn(), exec(), execFile() and fork().

spawn
const { spawn } = require('child_process');
const child = spawn('pwd');
// With arguments:
// const child = spawn('find', ['.', '-type', 'f']);

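spawn() returns a ChildProcess instance. ChildProcess is also built on the event mechanism (the EventEmitter API) and provides the following events:

exit: fires when the child process exits; the listener receives its exit status (code and signal)

disconnect: fires when the IPC channel is closed, i.e. after the parent calls child.disconnect() or the child calls process.disconnect()

error: fires when the process could not be spawned, could not be killed, or a message could not be sent to it

close: fires when the child's stdio streams (standard input/output) have been closed

message: fires when the child sends a message with process.send(); parent and child can communicate through this built-in messaging mechanism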

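The child's stdio streams are available as child.stdin, child.stdout and child.stderr; once these streams have been closed, the ChildProcess emits the close event.

P.S. The difference between close and exit matters mainly when several processes share the same stdio streams: one process exiting does not mean the stdio streams have been closed.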
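A small sketch of the exit/close ordering described above. To stay self-contained it runs the current node binary (process.execPath) with an inline -e script instead of an external command; that choice is an assumption of this example, not something the article prescribes.

```javascript
const { spawn } = require('child_process');

// Events are recorded in the order they fire.
const events = [];

// process.execPath is the currently running node binary, so no PATH lookup is needed.
// The child prints one line and sets a non-zero exit code.
const child = spawn(process.execPath, ['-e', 'console.log("hi"); process.exitCode = 3']);

let output = '';
child.stdout.setEncoding('utf8');
child.stdout.on('data', (chunk) => { output += chunk; });
child.stderr.resume(); // drain stderr too, so that stream can finish

child.on('error', (err) => events.push(`error: ${err.message}`));
child.on('exit', (code, signal) => events.push(`exit code=${code} signal=${signal}`));

// 'close' comes after 'exit', once all stdio streams have closed.
const done = new Promise((resolve) => {
  child.on('close', (code, signal) => {
    events.push(`close code=${code} signal=${signal}`);
    resolve({ events, output });
  });
});

done.then((result) => console.log(result));
```

Both listeners receive the same code and signal; the practical difference is that by the time close fires, all of the child's output has been read.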

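From the parent's point of view, child.stdout and child.stderr are Readable streams while child.stdin is Writable, the reverse of the current process's own process.stdin (Readable) and process.stdout/process.stderr (Writable):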

child.stdout.on('data', (data) => {
  console.log(`child stdout:\n${data}`);
});

child.stderr.on('data', (data) => {
  console.error(`child stderr:\n${data}`);
});

Using the pipe capability of stdio streams, you can do more complex things, for example:

const { spawn } = require('child_process');

const find = spawn('find', ['.', '-type', 'f']);
const wc = spawn('wc', ['-l']);

find.stdout.pipe(wc.stdin);

wc.stdout.on('data', (data) => {
  console.log(`Number of files ${data}`);
});

This is equivalent to find . -type f | wc -l, which recursively counts the files under the current directory.

IPC option
In addition, an IPC channel can be set up through the stdio option of spawn():

const { spawn } = require('child_process');

const child = spawn('node', ['./ipc-child.js'], { stdio: [null, null, null, 'ipc'] });
child.on('message', (m) => {
  console.log(m);
});
child.send('Here Here');

// ./ipc-child.js
process.on('message', (m) => {
  process.send(`< ${m}`);
  process.send('> 不要回答x3');
});

For details on the IPC option of spawn(), see options.stdio.

exec
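By default, spawn() does not create a shell to run the command (so it performs slightly better), whereas exec() does create one. Also, exec() is not stream-based: it buffers the command's output and passes it to the callback in one piece (up to the maxBuffer limit; beyond that the child is terminated and an error is reported).

The strength of exec() is full support for shell syntax: you can pass in any shell script directly, for example: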

const { exec } = require('child_process');

exec('find . -type f | wc -l', (err, stdout, stderr) => {
  if (err) {
    console.error(`exec error: ${err}`);
    return;
  }

  console.log(`Number of files ${stdout}`);
});

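For the same reason, exec() carries a risk of command injection, so be especially careful when the command includes dynamic content such as user input. exec() is therefore suited to cases where you want to use shell syntax directly and expect a small amount of output (no memory pressure).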
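As a hedged illustration of why execFile() (or spawn() without a shell) sidesteps the injection problem: each argument is handed to the program as a separate argv entry, so shell metacharacters in user input are never interpreted. The input string is hypothetical, and the "program" is the current node binary running a tiny -e script that prints its first argument.

```javascript
const { execFile } = require('child_process');

// Hypothetical untrusted input. Interpolated into exec(`echo ${userInput}`),
// the shell would run `echo injected` as a second command.
const userInput = 'hello; echo injected';

// execFile() starts the binary directly (no shell), and each array entry arrives
// as exactly one argv element, so the `;` has no special meaning.
const result = new Promise((resolve, reject) => {
  execFile(
    process.execPath,
    ['-e', 'console.log(process.argv[1])', userInput],
    (err, stdout) => (err ? reject(err) : resolve(stdout.trim()))
  );
});

result.then((out) => console.log(out)); // the whole string, printed once
```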

So is there a way that supports shell syntax and also keeps the advantages of stream IO?

Yes. Here is a way to get both:

const { spawn } = require('child_process');
const child = spawn('find . -type f | wc -l', {
  shell: true
});
child.stdout.pipe(process.stdout);

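This turns on the shell option of spawn() and uses pipe() to connect the child's standard output to the current process's standard output, so the command's result shows up on screen. If you only need to see the output, there is an even simpler way: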

const { spawn } = require('child_process');
const child = spawn('find . -type f | wc -l', {
  shell: true,
  stdio: 'inherit'
});

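stdio: 'inherit' makes the child inherit the current process's standard input/output (sharing stdin, stdout and stderr), so the child writes straight to the same terminal and its result appears without any piping. In this mode the output never passes through the parent's streams: child.stdout is null, and listening for data on process.stdout (a Writable stream) receives nothing, so use 'pipe' when the parent needs to read the output.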
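If the parent does need to read the output, stdio can also be given as an array with one entry per file descriptor (0, 1, 2), mixing 'pipe', 'inherit' and 'ignore'. A minimal sketch, using an inline node -e script as a stand-in for the real command:

```javascript
const { spawn } = require('child_process');

// stdio entries map to the child's fd 0, 1, 2:
//  - 'ignore':  the child gets no stdin
//  - 'pipe':    stdout is a pipe the parent reads via child.stdout
//  - 'inherit': stderr goes straight to the parent's stderr (child.stderr is null)
const child = spawn(
  process.execPath,
  ['-e', 'console.log("to stdout"); console.error("to stderr")'],
  { stdio: ['ignore', 'pipe', 'inherit'] }
);

let captured = '';
child.stdout.setEncoding('utf8');
child.stdout.on('data', (chunk) => { captured += chunk; });

const finished = new Promise((resolve) => {
  child.on('close', () => resolve(captured));
});

finished.then((out) => console.log('captured:', JSON.stringify(out)));
```

Only "to stdout" is captured; "to stderr" shows up directly in the terminal.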

Besides stdio and shell, spawn() supports some other options, such as:

const child = spawn('find . -type f | wc -l', {
  stdio: 'inherit',
  shell: true,
  // Replace the environment variables (defaults to process.env)
  env: { HOME: '/tmp/xxx' },
  // Change the current working directory
  cwd: '/tmp',
  // Run as an independent (detached) process
  detached: true
});

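Note that besides passing data to the child in the form of environment variables, the env option can also isolate environment variables sandbox-style. By default process.env is used as the child's environment, so the child can see every variable the current process can; if you pass a custom object, as in the example above, the child sees only the variables in that object.

So to add or remove environment variables, start from a copy of process.env: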

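const { spawn } = require('child_process');

// Start from a shallow copy of the current environment (all values are strings)
const spawnEnv = { ...process.env };

// Remove the variables the child should not see
delete spawnEnv.ATOM_SHELL_INTERNAL_RUN_AS_NODE;
delete spawnEnv.ELECTRON_RUN_AS_NODE;

// `command` and `cwd` are placeholders for your own program and working directory
const sp = spawn(command, ['.'], { cwd, env: spawnEnv });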
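The isolation effect of env can be checked directly. This sketch uses spawnSync() for brevity and an inline node -e script as the child; FOO is a made-up variable for the demo. It prints how the child sees PATH and FOO with the default environment and with a custom one:

```javascript
const { spawnSync } = require('child_process');

// The child prints how it sees PATH and FOO (FOO is hypothetical).
const script =
  'console.log(JSON.stringify({ path: process.env.PATH ?? null, foo: process.env.FOO ?? null }))';

// Default: the child receives a copy of process.env.
const inherited = spawnSync(process.execPath, ['-e', script], { encoding: 'utf8' });

// Custom env object: the child sees only these variables.
const isolated = spawnSync(process.execPath, ['-e', script], {
  encoding: 'utf8',
  env: { FOO: 'bar' },
});

console.log('inherited:', inherited.stdout.trim());
console.log('isolated: ', isolated.stdout.trim()); // {"path":null,"foo":"bar"}
```

With the custom object the child no longer sees PATH either, so a shell started that way may fail to find common commands; copying process.env and then adding or deleting keys avoids that.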

The detached option is more interesting:

const { spawn } = require('child_process');

const child = spawn('node', ['stuff.js'], {
  detached: true,
  stdio: 'ignore'
});

child.unref();

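How a process created this way behaves depends on the operating system. On Windows, the detached child gets its own console window; on Linux and other non-Windows platforms, the child becomes the leader of a new process group and session (which can be used to manage a family of child processes, similar to what tree-kill does).

unref() cuts the tie: the parent's event loop no longer waits for the child, so the "parent" can exit on its own without taking the child down with it. Note that the child's stdio must also be independent of the "parent" (hence stdio: 'ignore' above); otherwise the child is still affected after the "parent" exits.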
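To make the process-group remark concrete, here is a rough, POSIX-only sketch of stopping a detached child together with everything it started by signalling the whole group. It assumes `sh` and `sleep` are available; Windows handles process groups differently and this does not apply there.

```javascript
const { spawn } = require('child_process');

// POSIX-only sketch (Linux/macOS). With detached: true the child becomes the leader
// of a new process group whose id equals child.pid. The shell starts a grandchild
// (`sleep 30`) in the background, which belongs to that same group.
const child = spawn('sh', ['-c', 'sleep 30 & wait'], {
  detached: true,
  stdio: 'ignore',
});

const exited = new Promise((resolve) => {
  child.on('exit', (code, signal) => resolve({ code, signal }));
});

// A negative pid addresses the whole process group, so both `sh` and the
// `sleep` it started receive SIGTERM, similar in spirit to tree-kill.
setTimeout(() => process.kill(-child.pid, 'SIGTERM'), 500);

exited.then(({ code, signal }) => console.log(`group leader exited: code=${code} signal=${signal}`));
```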

execFile
const { execFile } = require('child_process');
const child = execFile('node', ['--version'], (error, stdout, stderr) => {
  if (error) {
    throw error;
  }
  console.log(stdout);
});

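Similar to exec(), but the command is not run through a shell (so it is slightly faster), which means you must pass an executable file. On Windows some files cannot be executed directly, such as .bat and .cmd files; these cannot be run with execFile() and need exec() or spawn() with the shell option enabled.

P.S. Like exec(), it is not stream-based either, so it carries the same risk with large outputs.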
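A quick way to see the output-size risk: both exec() and execFile() accept a maxBuffer option, and when stdout or stderr exceeds it the child is terminated and the callback receives an error. A minimal sketch with a deliberately tiny limit (the 10-byte value is arbitrary):

```javascript
const { execFile } = require('child_process');

// The child writes 100 bytes, but only 10 bytes of buffered stdout are allowed.
const outcome = new Promise((resolve) => {
  execFile(
    process.execPath,
    ['-e', "process.stdout.write('x'.repeat(100))"],
    { maxBuffer: 10 },
    // On overflow, err.code identifies the maxBuffer error
    (err) => resolve(err ? err.code : null)
  );
});

outcome.then((code) => console.log(code)); // ERR_CHILD_PROCESS_STDIO_MAXBUFFER
```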

xxxSync
spawn, exec and execFile each have a synchronous, blocking counterpart that waits until the child process exits:

const { 
  spawnSync, 
  execSync, 
  execFileSync,
} = require('child_process');

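The synchronous methods are meant to simplify scripting tasks, such as startup steps; avoid them at other times, since they block the event loop.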
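A brief sketch of the synchronous variants, again using the current node binary as the child so it runs anywhere Node does:

```javascript
const { spawnSync, execFileSync } = require('child_process');

// spawnSync() blocks until the child exits, then returns status, stdout and stderr at once.
const r = spawnSync(process.execPath, ['--version'], { encoding: 'utf8' });
console.log(r.status);        // exit code of the child
console.log(r.stdout.trim()); // the Node version string

// execFileSync() returns stdout directly and throws when the exit code is non-zero;
// the thrown error carries the exit code in err.status.
let exitStatus = null;
try {
  execFileSync(process.execPath, ['-e', 'process.exit(2)'], { stdio: 'pipe' });
} catch (err) {
  exitStatus = err.status;
}
console.log(exitStatus); // 2
```

execSync() behaves like execFileSync() but goes through a shell, so it shares the injection caveats of exec().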

fork
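fork() is a variant of spawn() for creating Node processes; its main feature is a built-in communication mechanism (an IPC channel) between parent and child: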

The child_process.fork() method is a special case of child_process.spawn() used specifically to spawn new Node.js processes. Like child_process.spawn(), a ChildProcess object is returned. The returned ChildProcess will have an additional communication channel built-in that allows messages to be passed back and forth between the parent and child. See subprocess.send() for details.

For example:

const { fork } = require('child_process');

const n = fork('./child.js');
n.on('message', function(m) {
  console.log('PARENT got message:', m);
});
n.send({ hello: 'world' });

// ./child.js
process.on('message', function(m) {
  console.log('CHILD got message:', m);
});
process.send({ foo: 'bar' });

Because of this built-in communication, fork() is especially well suited to splitting off time-consuming logic, for example:

const http = require('http');
const longComputation = () => {
  let sum = 0;
  for (let i = 0; i < 1e9; i++) {
    sum += i;
  }
  return sum;
};
const server = http.createServer();
server.on('request', (req, res) => {
  if (req.url === '/compute') {
    const sum = longComputation();
    return res.end(`Sum is ${sum}`);
  } else {
    res.end('Ok');
  }
});

server.listen(3000);

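The fatal problem here is that once someone visits /compute, subsequent requests cannot be handled in time, because the event loop stays blocked by longComputation until the computation finishes.

To keep time-consuming operations from blocking the main process's event loop, move longComputation() into a child process: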

// compute.js
const longComputation = () => {
  let sum = 0;
  for (let i = 0; i < 1e9; i++) {
    sum += i;
  }
  return sum;
};

// Acts as a trigger: start computing only when a message arrives
process.on('message', (msg) => {
  const sum = longComputation();
  process.send(sum);
});

The main process starts a child process to run longComputation:

const http = require('http');
const { fork } = require('child_process');

const server = http.createServer();

server.on('request', (req, res) => {
  if (req.url === '/compute') {
    const compute = fork('compute.js');
    compute.send('start');
    compute.once('message', (sum) => {
      res.end(`Sum is ${sum}`);
      // Without this, the open IPC channel keeps every forked child alive
      compute.kill();
    });
  } else {
    res.end('Ok');
  }
});

server.listen(3000);

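The main process's event loop is no longer blocked by the computation, but the number of processes still needs to be limited (here every request to /compute forks a new process); otherwise, once processes exhaust the machine's resources, the service suffers again.

P.S. In fact, the cluster module wraps up multi-process serving along lines similar to this simple example.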
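One hedged way to act on the process-limit point is to start a fixed number of workers once and reuse them, instead of forking a new process per request. The sketch below is a simplified illustration, not how cluster works internally: the worker code is inlined through node -e with the 'ipc' stdio option shown earlier (in a real server it would live in a file such as compute.js and be started with fork()), and error handling and worker restarts are left out. createPool is a hypothetical helper name.

```javascript
const { spawn } = require('child_process');

// Hypothetical worker source, run with `node -e`. It sums 0..n-1 for each job
// and replies with the job id so the parent can match answers to requests.
const workerSource = `
process.on('message', ({ id, n }) => {
  let sum = 0;
  for (let i = 0; i < n; i++) sum += i;
  process.send({ id, sum });
});
`;

// A fixed-size pool: at most `size` processes exist, however many jobs arrive.
function createPool(size) {
  const workers = [];
  const pending = new Map(); // job id -> resolve callback
  let nextId = 0;
  let nextWorker = 0;

  for (let i = 0; i < size; i++) {
    // 'ipc' as the 4th stdio entry gives the child process.send() and 'message'
    const w = spawn(process.execPath, ['-e', workerSource], {
      stdio: ['ignore', 'inherit', 'inherit', 'ipc'],
    });
    w.on('message', ({ id, sum }) => {
      pending.get(id)(sum);
      pending.delete(id);
    });
    workers.push(w);
  }

  return {
    // Round-robin dispatch; each job gets a unique id
    run(n) {
      const id = nextId++;
      const w = workers[nextWorker];
      nextWorker = (nextWorker + 1) % size;
      return new Promise((resolve) => {
        pending.set(id, resolve);
        w.send({ id, n });
      });
    },
    // Closing the IPC channels lets the idle workers exit on their own
    close() {
      workers.forEach((w) => w.disconnect());
    },
  };
}

const pool = createPool(2);
const results = Promise.all([10, 100, 1000].map((n) => pool.run(n))).then((sums) => {
  pool.close();
  return sums;
});

results.then((sums) => console.log(sums)); // [ 45, 4950, 499500 ]
```

Tagging each job with an id keeps replies matched to requests even when several jobs are in flight on the same worker.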

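III. Communication Methods
1. Passing JSON over stdin/stdout
stdin/stdout and a JSON payload

The most direct method: once you have the child process's handle you can access its stdio streams, agree on a message format, and start communicating: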

const { spawn } = require('child_process');

const child = spawn('node', ['./stdio-child.js']);
child.stdout.setEncoding('utf8');
// Parent: send
child.stdin.write(JSON.stringify({
  type: 'handshake',
  payload: 'Hi there'
}));
// Parent: receive
child.stdout.on('data', function (chunk) {
  // Simplification: assumes each chunk is exactly one complete JSON message
  const message = JSON.parse(chunk);
  console.log(`${message.type} ${message.payload}`);
});

The child side is similar:

// ./stdio-child.js
// Child: receive
process.stdin.on('data', (chunk) => {
  // Same simplification as in the parent: one chunk holds one JSON message
  const message = JSON.parse(chunk.toString());
  switch (message.type) {
    case 'handshake':
      // Child: send
      process.stdout.write(JSON.stringify({
        type: 'message',
        payload: message.payload + ' : hoho'
      }));
      break;
    default:
      break;
  }
});

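P.S. VS Code's inter-process communication uses this approach; see access electron API from vscode extension for details.

The obvious limitation is that you need the "child" process's handle, so two completely independent processes cannot communicate this way (e.g. across applications, let alone across machines).

P.S. For more about streams and pipe, see Streams in Node.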
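The stdin/stdout example above assumes each data chunk is exactly one message, but a stream chunk can hold half a message or several at once. A common remedy, sketched here as an assumption rather than the article's method, is newline-delimited JSON: JSON.stringify never emits a raw newline, so one message per line is unambiguous, and readline can split the stream. The child is inlined with node -e to keep the example in one file.

```javascript
const { spawn } = require('child_process');
const readline = require('readline');

// Hypothetical child, inlined with `node -e`: one JSON message per line in, one out.
const childSource = `
const rl = require('readline').createInterface({ input: process.stdin });
rl.on('line', (line) => {
  const message = JSON.parse(line);
  if (message.type === 'handshake') {
    const reply = { type: 'message', payload: message.payload + ' : hoho' };
    process.stdout.write(JSON.stringify(reply) + '\\n');
  }
});
`;

const child = spawn(process.execPath, ['-e', childSource], {
  stdio: ['pipe', 'pipe', 'inherit'],
});

// Each 'line' event carries exactly one complete JSON document,
// no matter how the bytes were split into chunks.
const replies = [];
readline.createInterface({ input: child.stdout }).on('line', (line) => {
  replies.push(JSON.parse(line));
});

// Parent: send one message per line, then close stdin so the child can exit.
child.stdin.write(JSON.stringify({ type: 'handshake', payload: 'hi there' }) + '\n');
child.stdin.end();

const done = new Promise((resolve) => child.on('close', () => resolve(replies)));
done.then((r) => console.log(r));
```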

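2. Native IPC support
As in the spawn() and fork() examples, processes can communicate through the built-in IPC mechanism.

Parent process:

child.on('message') to receive

child.send() to send

Child process:

process.on('message') to receive

process.send() to send

The limitation is the same as above: one side must hold the other's handle.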
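Unlike raw stdio, the built-in channel keeps message boundaries for you, but by default messages are serialized as JSON, which silently changes some values. A small sketch (the echo child is hypothetical and inlined with node -e; the 'ipc' stdio entry sets up the same channel fork() uses):

```javascript
const { spawn } = require('child_process');

// Hypothetical echo child: replies with the typeof of every field it received.
const childSource = `
process.on('message', (msg) => {
  const types = {};
  for (const [key, value] of Object.entries(msg)) types[key] = typeof value;
  process.send(types);
});
`;

const child = spawn(process.execPath, ['-e', childSource], {
  stdio: ['ignore', 'inherit', 'inherit', 'ipc'],
});

const reply = new Promise((resolve) => {
  child.once('message', (types) => {
    child.disconnect(); // close the channel so the child can exit
    resolve(types);
  });
});

// With the default JSON serialization the Date arrives as an ISO string
// and the undefined field is dropped entirely.
child.send({ n: 1, when: new Date(0), missing: undefined });

reply.then((types) => console.log(types)); // { n: 'number', when: 'string' }
```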

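3. Sockets
Using the network for inter-process communication works not only across processes but also across machines.

node-ipc takes this approach, for example: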

// server
const ipc=require('node-ipc');

ipc.config.id = 'world';
ipc.config.retry= 1500;
ipc.config.maxConnections=1;

ipc.serveNet(
    function(){
        ipc.server.on(
            'message',
            function(data,socket){
                ipc.log('got a message : ', data);
                ipc.server.emit(
                    socket,
                    'message',
                    data+' world!'
                );
            }
        );

        ipc.server.on(
            'socket.disconnected',
            function(data,socket){
                console.log('DISCONNECTED\n\n',arguments);
            }
        );
    }
);
ipc.server.on(
    'error',
    function(err){
        ipc.log('Got an ERROR!',err);
    }
);
ipc.server.start();

// client
const ipc=require('node-ipc');

ipc.config.id = 'hello';
ipc.config.retry= 1500;

ipc.connectToNet(
    'world',
    function(){
        ipc.of.world.on(
            'connect',
            function(){
                ipc.log('## connected to world ##', ipc.config.delay);
                ipc.of.world.emit(
                    'message',
                    'hello'
                );
            }
        );
        ipc.of.world.on(
            'disconnect',
            function(){
                ipc.log('disconnected from world');
            }
        );
        ipc.of.world.on(
            'message',
            function(data){
                ipc.log('got a message from world : ', data);
            }
        );
    }
);

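P.S. For more examples, see RIAEvangelist/node-ipc.

Of course, on a single machine, communicating over the network costs some performance, but network communication has the advantage of working across environments and opens the door to RPC scenarios.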
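The same idea can also be sketched with Node's built-in net module and no third-party library. For brevity, server and client live in one process here and talk over TCP on 127.0.0.1 with an OS-assigned port (port 0); in practice they would be separate processes or machines sharing a fixed port, or a Unix domain socket / Windows named pipe path on one host. Framing messages as newline-terminated lines is an assumption of this example.

```javascript
const net = require('net');

// Server: answers each newline-terminated message with `<msg> world!`
const server = net.createServer((socket) => {
  socket.setEncoding('utf8');
  let buffered = '';
  socket.on('data', (chunk) => {
    buffered += chunk;
    let idx;
    // A chunk may contain partial or multiple lines; handle complete ones only
    while ((idx = buffered.indexOf('\n')) !== -1) {
      const line = buffered.slice(0, idx);
      buffered = buffered.slice(idx + 1);
      socket.write(`${line} world!\n`);
    }
  });
});

const reply = new Promise((resolve, reject) => {
  server.listen(0, '127.0.0.1', () => {
    const { port } = server.address();
    // Client: connect, send one message, wait for one reply line
    const client = net.createConnection({ port, host: '127.0.0.1' }, () => {
      client.write('hello\n');
    });
    client.setEncoding('utf8');
    let received = '';
    client.on('data', (chunk) => {
      received += chunk;
      if (received.endsWith('\n')) {
        client.end();
        server.close();
        resolve(received.trim());
      }
    });
    client.on('error', reject);
  });
});

reply.then((r) => console.log(r)); // hello world!
```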

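4. Message queue
Both sides communicate through an external messaging system; how far this reaches across processes and machines depends on what the MQ supports.

That is, processes do not talk to each other directly but go through a middle layer (the MQ). Adding this control layer brings more flexibility and benefits:

Reliability: the messaging system provides strong delivery guarantees, such as acknowledged delivery (ACK receipts), retry on failure, and protection against duplicate delivery

Priority control: allows adjusting the order in which messages are handled

Offline capability: messages can be cached

Transactional message handling: related messages are grouped into a transaction, guaranteeing their delivery order and completeness

P.S. Hard to implement? Would wrapping it in one more layer help? If not, add two...

A popular choice is smrchy/rsmq, for example: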

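// init
const RedisSMQ = require("rsmq");
const rsmq = new RedisSMQ({ host: "127.0.0.1", port: 6379, ns: "rsmq" });

// The calls are nested so each step starts after the previous one has finished;
// fired side by side, nothing guarantees the queue exists when sendMessage runs.
// create queue
rsmq.createQueue({ qname: "myqueue" }, function (err, resp) {
  if (err) {
    console.error(err);
    return;
  }
  if (resp === 1) {
    console.log("queue created");
  }
  // send message
  rsmq.sendMessage({ qname: "myqueue", message: "Hello World" }, function (err, resp) {
    if (err) {
      console.error(err);
      return;
    }
    console.log("Message sent. ID:", resp);
    // receive message
    rsmq.receiveMessage({ qname: "myqueue" }, function (err, resp) {
      if (err) {
        console.error(err);
        return;
      }
      if (resp.id) {
        console.log("Message received.", resp);
      } else {
        console.log("No messages for me...");
      }
    });
  });
});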

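It needs a running Redis server (rsmq connects to one rather than starting it). The basic idea is:

Using a shared Redis server multiple Node.js processes can send / receive messages.

Receiving, sending, caching and persisting messages all rely on Redis, and a complete queue mechanism is built on top of that.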

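5. Redis
The basic idea is similar to a message queue:

Use Redis as a message bus/broker.

Redis has a built-in Pub/Sub mechanism (the publish-subscribe pattern), suitable for simple communication such as one-to-one or one-to-many where message reliability does not matter (a message published while nobody is subscribed is simply lost).

Redis also has a list structure that can serve as a message queue to improve reliability. The usual approach is for producers to LPUSH messages and consumers to BRPOP them. This suits simple scenarios that need reliable messages, but the messages carry no state and there is no ACK mechanism, so it cannot meet complex communication needs.

P.S. For a Redis Pub/Sub example, see What’s the most efficient node.js inter-process communication library/method?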

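IV. Summary
There are 4 ways for Node processes to communicate:

Passing JSON over stdin/stdout: the most direct way, suited to related processes where you can get the "child" process's handle; cannot work across machines

Node's native IPC support: the most native (idiomatic?) way, a bit more "formal" than the previous one, with the same limitations

Sockets: the most general way, works well across environments, at the cost of network overhead

A message queue (with Redis as a lightweight broker being one variant): the most powerful way; if you need to communicate and the scenario is complex, you might as well add a messaging middleware layer and solve all kinds of communication problems neatly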

References: Node.js Child Processes: Everything you need to know
