SSH hops between the servers in the cluster suddenly stopped working
# ssh 192.168.0.1
The authenticity of host '192.168.0.1 (192.168.0.1)' can't be established.
RSA1 key fingerprint is 07:e4:54:79:62:60:22:c2:72:23:21:00:54:a0:90:79.
Are you sure you want to continue connecting (yes/no)?
After entering yes, ssh asked for a password, even though passwordless login had been set up earlier. The files under ~/.ssh all looked normal:
# ls -l ~/.ssh
total 16
-rw------- 1 root root 2040 Jan 10 11:32 authorized_keys
-rwx------ 1 root root 1679 Jan 10 11:27 id_rsa
-rwx------ 1 root root 408 Jan 10 11:27 id_rsa.pub
-rwx------ 1 root root 2753 Jan 10 11:27 known_hosts
Next, check the ssh version:
# ssh -V
OpenSSH_7.4p1, SSH protocols 1.5/2.0, OpenSSL 0x100020bf
# yum list installed|grep openssh
openssh.x86_64 7.4p1-16.el7 @base
openssh-clients.x86_64 7.4p1-16.el7 @base
openssh-server.x86_64 7.4p1-16.el7 @base
# ls -l /usr/sbin/sshd
-rwxr-xr-x 1 root root 1288984 Feb 13 06:02 /usr/sbin/sshd
# ps aux|grep sshd
root 8698 0.0 0.0 25236 1236 ? Ss 06:02 0:00 sshd
It looks like sshd was upgraded to 7.4 at around 6 a.m. (the binary was updated and the process restarted at the same time). Checking the sshd status shows errors:
# service sshd status
Redirecting to /bin/systemctl status sshd.service
● sshd.service - OpenSSH server daemon
Loaded: loaded (/usr/lib/systemd/system/sshd.service; enabled; vendor preset: enabled)
Active: activating (auto-restart) (Result: exit-code) since Wed 2019-02-13 13:41:32 CST; 18s ago
Docs: man:sshd(8)
man:sshd_config(5)
Process: 11999 ExecStart=/usr/sbin/sshd -D $OPTIONS (code=exited, status=255)
Main PID: 11999 (code=exited, status=255)
Tasks: 0
Memory: 0B
CGroup: /system.slice/sshd.service
Feb 13 13:41:32 $server systemd[1]: sshd.service: main process exited, code=exited, status=255/n/a
Feb 13 13:41:32 $server sshd[11999]: This private key will be ignored.
Feb 13 13:41:32 $server sshd[11999]: bad permissions: ignore key: /etc/ssh/ssh_host_rsa_key
Feb 13 13:41:32 $server sshd[11999]: Could not load host key: /etc/ssh/ssh_host_rsa_key
Feb 13 13:41:32 $server sshd[11999]: Could not load host key: /etc/ssh/ssh_host_dsa_key
Feb 13 13:41:32 $server sshd[11999]: Disabling protocol version 2. Could not load host key
Feb 13 13:41:32 $server systemd[1]: Failed to start OpenSSH server daemon.
Feb 13 13:41:32 $server systemd[1]: Unit sshd.service entered failed state.
Feb 13 13:41:32 $server systemd[1]: sshd.service failed.
Try starting sshd manually to see the specific error:
# sshd -D -p 8822
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: UNPROTECTED PRIVATE KEY FILE! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Permissions 0640 for '/etc/ssh/ssh_host_rsa_key' are too open.
It is recommended that your private key files are NOT accessible by others.
This private key will be ignored.
bad permissions: ignore key: /etc/ssh/ssh_host_rsa_key
Could not load host key: /etc/ssh/ssh_host_rsa_key
Could not load host key: /etc/ssh/ssh_host_dsa_key
Disabling protocol version 2. Could not load host key
It looks like the key file permissions are too open, which causes SSH protocol 2 to be disabled:
# ls -l /etc/ssh
total 612
-rw-r--r-- 1 root root 581843 Apr 11 2018 moduli
-rw-r--r-- 1 root root 1144 Feb 13 06:02 ssh_config
-rw------- 1 root root 2450 Feb 13 06:02 sshd_config
-rw-r-----. 1 root ssh_keys 227 Jan 22 2018 ssh_host_ecdsa_key
-rw-r--r--. 1 root root 162 Jan 22 2018 ssh_host_ecdsa_key.pub
-rw-r-----. 1 root ssh_keys 387 Jan 22 2018 ssh_host_ed25519_key
-rw-r--r--. 1 root root 82 Jan 22 2018 ssh_host_ed25519_key.pub
-rw------- 1 root root 991 Feb 13 06:02 ssh_host_key
-rw-r--r-- 1 root root 656 Feb 13 06:02 ssh_host_key.pub
-rw-r-----. 1 root ssh_keys 1675 Jan 22 2018 ssh_host_rsa_key
-rw-r--r--. 1 root root 382 Jan 22 2018 ssh_host_rsa_key.pub
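The listing above can also be checked mechanically. The following is a minimal sketch, assuming GNU find; the function name list_open_host_keys is made up for illustration. It prints every private host key that has any group or other permission bit set, which is the condition the sshd warning above complains about.

```shell
# List private host keys in a directory whose mode grants any
# group/other permission bit (for example 0640 or 0644).
# The function name and the default directory are illustrative assumptions.
list_open_host_keys() {
    dir="${1:-/etc/ssh}"
    # -perm /077 (GNU find) matches files with at least one group/other bit set.
    # The name pattern matches private keys only; *.pub files do not end in "key".
    find "$dir" -maxdepth 1 -type f -name 'ssh_host_*key' -perm /077 | sort
}

# Usage (as root):
#   list_open_host_keys /etc/ssh
```

An empty result means no private host key is readable by group or others.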
Change the permissions of all files under /etc/ssh to 600:
# chmod 600 /etc/ssh/*
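Note that chmod 600 /etc/ssh/* also tightens the public keys and ssh_config, which are normally world-readable. A narrower variant might look like the sketch below. It is not what was run here, and the function name fix_host_key_perms is hypothetical. It sets 600 on the private host keys and 644 on their public counterparts, and leaves every other file alone.

```shell
# Restrict private host keys to 0600 and keep public keys at 0644.
# Function name and default directory are illustrative assumptions.
fix_host_key_perms() {
    dir="${1:-/etc/ssh}"
    for f in "$dir"/ssh_host_*key; do
        # The glob stays literal when nothing matches, hence the -f guard
        if [ -f "$f" ]; then
            chmod 600 "$f"
        fi
    done
    for f in "$dir"/ssh_host_*key.pub; do
        if [ -f "$f" ]; then
            chmod 644 "$f"
        fi
    done
    return 0
}

# Usage (as root):
#   fix_host_key_perms /etc/ssh
```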
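After that, connecting through the test sshd process on port 8822 worked fine, but the sshd service still kept failing to start. I suspected that the sshd process currently running had not been started through the service, so the service kept restarting but could not bind the port. After killing that sshd process, the sshd service finally started normally: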
# service sshd status
Redirecting to /bin/systemctl status sshd.service
● sshd.service - OpenSSH server daemon
Loaded: loaded (/usr/lib/systemd/system/sshd.service; enabled; vendor preset: enabled)
Active: activating (start) since Wed 2019-02-13 13:35:32 CST; 16s ago
Docs: man:sshd(8)
man:sshd_config(5)
Main PID: 4355 (sshd)
Tasks: 1
Memory: 416.0K
CGroup: /system.slice/sshd.service
└─4355 /usr/sbin/sshd -D
Feb 13 13:35:32 $server systemd[1]: Starting OpenSSH server daemon...
Feb 13 13:35:32 $server sshd[4355]: Could not load host key: /etc/ssh/ssh_host_dsa_key
Feb 13 13:35:32 $server sshd[4355]: Server listening on 0.0.0.0 port 22.
Feb 13 13:35:32 $server sshd[4355]: error: Bind to port 22 on :: failed: Address already in use.
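A stray sshd holding the port while the service keeps restarting can be spotted by comparing the PID that listens on port 22 with the Main PID systemd reports. The sketch below is a rough helper with a made-up name, listener_pids. It assumes the column layout that ss -ltnp prints on this kind of system (state first, local address:port in the fourth column).

```shell
# Read `ss -ltnp` style output on stdin and print the PIDs that are
# listening on the given TCP port. The function name is illustrative,
# and the column layout of ss is an assumption.
listener_pids() {
    port="$1"
    # Keep LISTEN rows whose local address (4th column) ends in ":<port>"
    awk -v p=":$port" '$1 == "LISTEN" && substr($4, length($4) - length(p) + 1) == p' |
        grep -o 'pid=[0-9]*' | cut -d= -f2 | sort -u
}

# Usage (as root, so that ss can show process information):
#   ss -ltnp | listener_pids 22
#   systemctl show -p MainPID sshd
```

If the two PIDs differ, the port is held by a process that the service did not start.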
However, one error remains: Bind to port 22 on :: failed: Address already in use. The relevant settings are in the configuration file:
$ vi /etc/ssh/sshd_config
#Port 22
#Protocol 2,1
#ListenAddress 0.0.0.0
#ListenAddress ::
With these defaults, sshd binds port 22 on both IPv4 and IPv6. Uncomment two of the lines so that it listens on IPv4 only:
Port 22
#Protocol 2,1
ListenAddress 0.0.0.0
#ListenAddress ::
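Because most lines in sshd_config are commented-out defaults, it helps to look only at the directives that are actually active. This is a small sketch with a hypothetical helper name, active_directives. It drops blank lines and comments from a config file.

```shell
# Print only the active (non-comment, non-blank) lines of a config file.
# The function name and default path are illustrative assumptions.
active_directives() {
    # Drop blank lines and lines whose first non-blank character is '#'
    grep -Ev '^[[:space:]]*(#|$)' "${1:-/etc/ssh/sshd_config}"
}

# Usage:
#   active_directives /etc/ssh/sshd_config | grep -Ei '^(Port|ListenAddress)'
```

Running sshd -t afterwards checks the configuration file for validity without starting the daemon.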
Now it starts normally:
Feb 13 15:30:04 $server systemd[1]: Starting OpenSSH server daemon...
Feb 13 15:30:04 $server sshd[4731]: Could not load host key: /etc/ssh/ssh_host_dsa_key
Feb 13 15:30:04 $server sshd[4731]: Server listening on 0.0.0.0 port 22.
But after a while the process disappears. Look at sshd.service:
# cat /usr/lib/systemd/system/sshd.service
[Unit]
Description=OpenSSH server daemon
Documentation=man:sshd(8) man:sshd_config(5)
After=network.target sshd-keygen.service
Wants=sshd-keygen.service

[Service]
Type=notify
EnvironmentFile=/etc/sysconfig/sshd
ExecStart=/usr/sbin/sshd -D $OPTIONS
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
RestartSec=42s

[Install]
WantedBy=multi-user.target
The logs from journalctl show:
Feb 13 16:08:09 $server systemd[1]: sshd.service start operation timed out. Terminating.
Feb 13 16:08:09 $server sshd[26701]: Received signal 15; terminating.
Feb 13 16:08:09 $server systemd[1]: sshd.service: main process exited, code=exited, status=255/n/a
Feb 13 16:08:09 $server systemd[1]: Unit sshd.service entered failed state.
Feb 13 16:08:09 $server systemd[1]: sshd.service failed.
Feb 13 16:08:51 $server systemd[1]: sshd.service holdoff time over, scheduling restart.
It looks like the process is being terminated because its startup keeps timing out.
# vi /usr/lib/systemd/system/sshd.service
ExecStart=/usr/sbin/sshd $OPTIONS
Remove -D, kill the previous sshd, and then restart sshd:
# systemctl daemon-reload
# systemctl start sshd
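Files under /usr/lib/systemd/system belong to the package and may be overwritten by a later update. The same ExecStart change can instead be kept in a systemd drop-in. The sketch below is an alternative to editing the unit file directly, not what was done here. The function name write_sshd_override and the file name override.conf are illustrative choices.

```shell
# Write a systemd drop-in that replaces ExecStart for sshd.service.
# Function name, directory default and file name are illustrative assumptions.
write_sshd_override() {
    dropin_dir="${1:-/etc/systemd/system/sshd.service.d}"
    mkdir -p "$dropin_dir" || return 1
    # Quoted heredoc delimiter: $OPTIONS must reach the file unexpanded
    cat > "$dropin_dir/override.conf" <<'EOF'
[Service]
# An empty ExecStart= clears the packaged command before the new one is set
ExecStart=
ExecStart=/usr/sbin/sshd $OPTIONS
EOF
}

# Usage (as root):
#   write_sshd_override
#   systemctl daemon-reload
#   systemctl restart sshd
```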
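With that, the sshd service was restored and ssh hops worked again. So when upgrading openssh to 7.4, be sure to fix the permissions of the files under /etc/ssh/. Otherwise SSH protocol 2 is disabled, which also invalidates the keys that were set up before. sshd.service has to be modified as well. But wait, ansible still reports an error: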
192.168.0.1 | FAILED | rc=-1 >>
failed to open a SFTP connection (Channel closed.)
Check sshd_config:
# vi /etc/ssh/sshd_config
Subsystem sftp /usr/libexec/sftp-server
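It turned out that the configured path /usr/libexec/sftp-server does not exist; the actual path is /usr/libexec/openssh/sftp-server. After correcting the path and restarting sshd, the problem above was fixed. But wait again, scp reports an error too: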
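A check like this can be scripted so that a wrong Subsystem path is caught before ansible or sftp trips over it. The sketch below uses made-up function names, sftp_subsystem_path and check_sftp_subsystem. It reads the sftp subsystem command from sshd_config and reports whether that binary exists and is executable.

```shell
# Print the command configured for the sftp subsystem in an sshd_config file.
# Both function names and the default path are illustrative assumptions.
sftp_subsystem_path() {
    awk 'tolower($1) == "subsystem" && $2 == "sftp" { print $3; exit }' \
        "${1:-/etc/ssh/sshd_config}"
}

# Report whether the configured sftp subsystem can actually be executed.
check_sftp_subsystem() {
    path=$(sftp_subsystem_path "$1")
    if [ -z "$path" ]; then
        echo "no sftp subsystem configured"
        return 1
    elif [ "$path" = "internal-sftp" ] || [ -x "$path" ]; then
        # internal-sftp is built into sshd and needs no external binary
        echo "ok: $path"
    else
        echo "missing: $path"
        return 1
    fi
}

# Usage:
#   check_sftp_subsystem /etc/ssh/sshd_config
```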
# scp 192.168.0.1:/file1 .
command-line: line 0: Bad configuration option: PermitLocalCommand
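This is caused by a version mismatch between scp, sftp and ssh. According to yum, these commands all belong to the openssh-clients package, yet their last-modified times differ. Reinstall openssh-clients: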
# yum reinstall openssh-clients
After that, the last-modified times of these commands were consistent and the problem disappeared.
References:
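The timestamp comparison described above can be done with a short helper. This is a minimal sketch, assuming GNU stat; the function name same_mtime is hypothetical. It succeeds only when all the given files share the same modification time.

```shell
# Return 0 if all given files have the same modification time, 1 if they
# differ, 2 if the first file cannot be read. Relies on GNU stat (-c %Y
# prints the mtime as seconds since the epoch). The name is illustrative.
same_mtime() {
    first=$(stat -c %Y "$1") || return 2
    for f in "$@"; do
        [ "$(stat -c %Y "$f")" = "$first" ] || return 1
    done
    return 0
}

# Usage:
#   same_mtime /usr/bin/ssh /usr/bin/scp /usr/bin/sftp || echo "timestamps differ"
```

Differing timestamps are only a hint; rpm -V openssh-clients compares the installed files against the package database more thoroughly.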
https://docs.ansible.com/ansible/latest/installation_guide/intro_installation.html#managed-node-requirements
https://unix.stackexchange.com/questions/390224/openssh-server-start-failed-with-result-timeout/400644#400644
http://www.webopius.com/content/350/solution-to-error-command-line-line-0-bad-configuration-option-permitlocalcommand