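Problem:
After a reboot, the following situation appeared:
Looking at the detailed parameters:
I checked the agents table in the neutron database and found that it has no alive column.
The actual state of these services is active: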
----1------
● neutron-l3-agent.service - OpenStack Neutron Layer 3 Agent
Loaded: loaded (/usr/lib/systemd/system/neutron-l3-agent.service; enabled; vendor preset: disabled)
Active: active (running) since Wed 2016-06-22 19:39:34 EDT; 1h 31min ago
Main PID: 2847 (neutron-l3-agen)
CGroup: /system.slice/neutron-l3-agent.service
-----2-----
● neutron-openvswitch-agent.service - OpenStack Neutron Open vSwitch Agent
Loaded: loaded (/usr/lib/systemd/system/neutron-openvswitch-agent.service; enabled; vendor preset: disabled)
Active: active (running) since Wed 2016-06-22 19:39:34 EDT; 1h 30min ago
Main PID: 2846 (neutron-openvsw)
CGroup: /system.slice/neutron-openvswitch-agent.service
----3------
● neutron-dhcp-agent.service - OpenStack Neutron DHCP Agent
Loaded: loaded (/usr/lib/systemd/system/neutron-dhcp-agent.service; enabled; vendor preset: disabled)
Active: active (running) since Wed 2016-06-22 19:39:34 EDT; 2h 12min ago
Main PID: 2848 (neutron-dhcp-ag)
CGroup: /system.slice/neutron-dhcp-agent.service
----4----
● neutron-metadata-agent.service - OpenStack Neutron Metadata Agent
Loaded: loaded (/usr/lib/systemd/system/neutron-metadata-agent.service; enabled; vendor preset: disabled)
Active: active (running) since Wed 2016-06-22 21:18:53 EDT; 34min ago
Main PID: 13505 (neutron-metadat)
CGroup: /system.slice/neutron-metadata-agent.service
---5-----
[root@compute1 ~]# systemctl status neutron-openvswitch-agent.service
● neutron-openvswitch-agent.service - OpenStack Neutron Open vSwitch Agent
Loaded: loaded (/usr/lib/systemd/system/neutron-openvswitch-agent.service; enabled; vendor preset: disabled)
Active: active (running) since Wed 2016-06-22 19:39:13 EDT; 2h 16min ago
Main PID: 1435 (neutron-openvsw)
CGroup: /system.slice/neutron-openvswitch-agent.service
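The status checks above can also be scripted. The following is a minimal sketch, assuming the `Active:` line format shown in the output above; `parse_active_state` is a hypothetical helper name and not part of any OpenStack or systemd tooling.

```python
import re

# Matches the "Active:" line of `systemctl status` output, e.g.
#   Active: active (running) since Wed 2016-06-22 19:39:34 EDT; 1h 31min ago
ACTIVE_RE = re.compile(r"^\s*Active:\s+(\S+)\s+\(([^)]+)\)", re.MULTILINE)


def parse_active_state(status_text):
    """Return (state, sub_state) from `systemctl status` text, or None."""
    match = ACTIVE_RE.search(status_text)
    if match is None:
        return None
    return match.group(1), match.group(2)


# Sample text copied from the status output above.
sample = """\
Loaded: loaded (/usr/lib/systemd/system/neutron-l3-agent.service; enabled; vendor preset: disabled)
Active: active (running) since Wed 2016-06-22 19:39:34 EDT; 1h 31min ago
Main PID: 2847 (neutron-l3-agen)
"""
print(parse_active_state(sample))
```

As this post shows, a unit being active only means the process is running; it does not prove that the agent's reports reach neutron-server.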
=================
On the compute node, I stopped a service, started it again, and checked the log.
The log contained these messages:
Agent out of sync with plugin!
Agent tunnel out of sync with plugin!
----------------------Stop the service, then start it again----------------
systemctl stop neutron-openvswitch-agent.service
[root@compute1 ~]# systemctl status neutron-openvswitch-agent.service
● neutron-openvswitch-agent.service - OpenStack Neutron Open vSwitch Agent
Loaded: loaded (/usr/lib/systemd/system/neutron-openvswitch-agent.service; enabled; vendor preset: disabled)
Active: inactive (dead) since Wed 2016-06-22 22:00:18 EDT; 1min 34s ago
[root@compute1 ~]# systemctl start neutron-openvswitch-agent.service
[root@compute1 ~]# systemctl status neutron-openvswitch-agent.service
● neutron-openvswitch-agent.service - OpenStack Neutron Open vSwitch Agent
Loaded: loaded (/usr/lib/systemd/system/neutron-openvswitch-agent.service; enabled; vendor preset: disabled)
Active: active (running) since Wed 2016-06-22 22:02:05 EDT; 38s ago
---------------log-------------------
[root@compute1 neutron]# tail -f openvswitch-agent.log
2016-06-22 22:02:05.933 17125 INFO neutron.common.config [-] Logging enabled!
2016-06-22 22:02:05.934 17125 INFO neutron.common.config [-] /usr/bin/neutron-openvswitch-agent version 2015.1.2
2016-06-22 22:02:05.943 17125 WARNING oslo_config.cfg [-] Option "lock_path" from group "DEFAULT" is deprecated. Use option "lock_path" from group "oslo_concurrency".
2016-06-22 22:02:06.952 17125 INFO oslo_messaging._drivers.impl_rabbit [-] Connecting to AMQP server on controller0:5672
2016-06-22 22:02:07.015 17125 INFO oslo_messaging._drivers.impl_rabbit [-] Connected to AMQP server on controller0:5672
2016-06-22 22:02:07.036 17125 INFO oslo_messaging._drivers.impl_rabbit [-] Connecting to AMQP server on controller0:5672
2016-06-22 22:02:07.075 17125 INFO oslo_messaging._drivers.impl_rabbit [-] Connected to AMQP server on controller0:5672
2016-06-22 22:02:07.705 17125 INFO oslo_messaging._drivers.impl_rabbit [req-a019d63c-ddf2-4df3-93a4-841fdda04cd7 ] Connecting to AMQP server on controller0:5672
2016-06-22 22:02:07.726 17125 INFO oslo_messaging._drivers.impl_rabbit [req-a019d63c-ddf2-4df3-93a4-841fdda04cd7 ] Connected to AMQP server on controller0:5672
2016-06-22 22:02:07.745 17125 INFO oslo_messaging._drivers.impl_rabbit [req-a019d63c-ddf2-4df3-93a4-841fdda04cd7 ] Connecting to AMQP server on controller0:5672
2016-06-22 22:02:07.763 17125 INFO oslo_messaging._drivers.impl_rabbit [req-a019d63c-ddf2-4df3-93a4-841fdda04cd7 ] Connected to AMQP server on controller0:5672
2016-06-22 22:02:07.778 17125 INFO oslo_messaging._drivers.impl_rabbit [req-a019d63c-ddf2-4df3-93a4-841fdda04cd7 ] Connecting to AMQP server on controller0:5672
2016-06-22 22:02:07.795 17125 INFO oslo_messaging._drivers.impl_rabbit [req-a019d63c-ddf2-4df3-93a4-841fdda04cd7 ] Connected to AMQP server on controller0:5672
2016-06-22 22:02:07.814 17125 INFO oslo_messaging._drivers.impl_rabbit [req-a019d63c-ddf2-4df3-93a4-841fdda04cd7 ] Connecting to AMQP server on controller0:5672
2016-06-22 22:02:07.835 17125 INFO oslo_messaging._drivers.impl_rabbit [req-a019d63c-ddf2-4df3-93a4-841fdda04cd7 ] Connected to AMQP server on controller0:5672
2016-06-22 22:02:07.852 17125 INFO oslo_messaging._drivers.impl_rabbit [req-a019d63c-ddf2-4df3-93a4-841fdda04cd7 ] Connecting to AMQP server on controller0:5672
2016-06-22 22:02:07.872 17125 INFO oslo_messaging._drivers.impl_rabbit [req-a019d63c-ddf2-4df3-93a4-841fdda04cd7 ] Connected to AMQP server on controller0:5672
2016-06-22 22:02:07.890 17125 INFO oslo_messaging._drivers.impl_rabbit [req-a019d63c-ddf2-4df3-93a4-841fdda04cd7 ] Connecting to AMQP server on controller0:5672
2016-06-22 22:02:07.907 17125 INFO oslo_messaging._drivers.impl_rabbit [req-a019d63c-ddf2-4df3-93a4-841fdda04cd7 ] Connected to AMQP server on controller0:5672
2016-06-22 22:02:07.925 17125 INFO oslo_messaging._drivers.impl_rabbit [req-a019d63c-ddf2-4df3-93a4-841fdda04cd7 ] Connecting to AMQP server on controller0:5672
2016-06-22 22:02:07.943 17125 INFO oslo_messaging._drivers.impl_rabbit [req-a019d63c-ddf2-4df3-93a4-841fdda04cd7 ] Connected to AMQP server on controller0:5672
2016-06-22 22:02:07.964 17125 INFO neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-a019d63c-ddf2-4df3-93a4-841fdda04cd7 ] Agent initialized successfully, now running...
2016-06-22 22:02:07.976 17125 INFO neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-a019d63c-ddf2-4df3-93a4-841fdda04cd7 ] Agent out of sync with plugin!
2016-06-22 22:02:08.106 17125 INFO neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-a019d63c-ddf2-4df3-93a4-841fdda04cd7 ] Agent tunnel out of sync with plugin!
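To pick these messages out of a longer log, a small filter can help. This is only a sketch: the line layout in the regular expression is inferred from the excerpt above, and `find_out_of_sync` is a hypothetical helper name.

```python
import re

# Assumed line layout, taken from the log excerpt above:
#   <date> <time> <pid> <LEVEL> <logger> [<request context>] <message>
LOG_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<pid>\d+) (?P<level>[A-Z]+) (?P<logger>\S+) "
    r"\[(?P<ctx>[^\]]*)\] ?(?P<msg>.*)$"
)


def find_out_of_sync(lines):
    """Yield (timestamp, message) for each 'out of sync with plugin' line."""
    for line in lines:
        match = LOG_RE.match(line.rstrip("\n"))
        if match and "out of sync with plugin" in match.group("msg"):
            yield match.group("ts"), match.group("msg")


# Sample lines copied from the log above.
sample = [
    "2016-06-22 22:02:07.964 17125 INFO neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-a019d63c-ddf2-4df3-93a4-841fdda04cd7 ] Agent initialized successfully, now running...",
    "2016-06-22 22:02:07.976 17125 INFO neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-a019d63c-ddf2-4df3-93a4-841fdda04cd7 ] Agent out of sync with plugin!",
    "2016-06-22 22:02:08.106 17125 INFO neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-a019d63c-ddf2-4df3-93a4-841fdda04cd7 ] Agent tunnel out of sync with plugin!",
]
hits = list(find_out_of_sync(sample))
for ts, msg in hits:
    print(ts, msg)
```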
------------neutron-dhcp-agent.log-------
This log shows that the state report was not sent; the agent logged a "Failed reporting state" error.
2016-06-21 22:55:35.950 13361 INFO neutron.agent.dhcp.agent [-] Synchronizing state complete
2016-06-21 22:55:35.952 13361 INFO oslo_messaging._drivers.impl_rabbit [-] Connecting to AMQP server on controller0:5672
2016-06-21 22:55:35.970 13361 INFO neutron.agent.dhcp.agent [req-12dfeee8-c542-46ad-b3f5-2557c2fcbd2f ] Synchronizing state
2016-06-21 22:55:35.995 13361 INFO oslo_messaging._drivers.impl_rabbit [-] Connected to AMQP server on controller0:5672
2016-06-21 22:55:36.057 13361 INFO neutron.agent.dhcp.agent [-] DHCP agent started
2016-06-21 22:55:36.074 13361 INFO neutron.agent.dhcp.agent [req-12dfeee8-c542-46ad-b3f5-2557c2fcbd2f ] Synchronizing state complete
2016-06-22 19:39:35.733 2848 INFO neutron.common.config [-] Logging enabled!
2016-06-22 19:39:35.756 2848 INFO neutron.common.config [-] /usr/bin/neutron-dhcp-agent version 2015.1.2
2016-06-22 19:39:35.869 2848 WARNING oslo_config.cfg [req-8833acf1-c0ca-4783-bbfc-12ccfe4717d6 ] Option "lock_path" from group "DEFAULT" is deprecated. Use option "lock_path" from group "oslo_concurrency".
2016-06-22 19:39:35.884 2848 INFO oslo_messaging._drivers.impl_rabbit [req-fba865de-cb4e-44ee-9ccd-25ca25b23be9 ] Connecting to AMQP server on controller0:5672
2016-06-22 19:39:35.908 2848 INFO neutron.agent.dhcp.agent [-] Synchronizing state
2016-06-22 19:39:35.945 2848 INFO oslo_messaging._drivers.impl_rabbit [req-fba865de-cb4e-44ee-9ccd-25ca25b23be9 ] Connected to AMQP server on controller0:5672
2016-06-22 19:39:35.957 2848 INFO oslo_messaging._drivers.impl_rabbit [req-fba865de-cb4e-44ee-9ccd-25ca25b23be9 ] Connecting to AMQP server on controller0:5672
2016-06-22 19:39:35.973 2848 INFO oslo_messaging._drivers.impl_rabbit [-] Connecting to AMQP server on controller0:5672
2016-06-22 19:39:36.258 2848 INFO oslo_messaging._drivers.impl_rabbit [-] Connected to AMQP server on controller0:5672
2016-06-22 19:39:36.269 2848 INFO oslo_messaging._drivers.impl_rabbit [req-fba865de-cb4e-44ee-9ccd-25ca25b23be9 ] Connected to AMQP server on controller0:5672
2016-06-22 19:40:36.281 2848 ERROR neutron.agent.dhcp.agent [-] Unable to sync network state.
2016-06-22 19:40:36.281 2848 TRACE neutron.agent.dhcp.agent Traceback (most recent call last):
2016-06-22 19:40:36.281 2848 TRACE neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/neutron/agent/dhcp/agent.py", line 157, in sync_state
2016-06-22 19:40:36.281 2848 TRACE neutron.agent.dhcp.agent active_networks = self.plugin_rpc.get_active_networks_info()
2016-06-22 19:40:36.281 2848 TRACE neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/neutron/agent/dhcp/agent.py", line 417, in get_active_networks_info
2016-06-22 19:40:36.281 2848 TRACE neutron.agent.dhcp.agent host=self.host)
2016-06-22 19:40:36.281 2848 TRACE neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 156, in call
2016-06-22 19:40:36.281 2848 TRACE neutron.agent.dhcp.agent retry=self.retry)
2016-06-22 19:40:36.281 2848 TRACE neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 90, in _send
2016-06-22 19:40:36.281 2848 TRACE neutron.agent.dhcp.agent timeout=timeout, retry=retry)
2016-06-22 19:40:36.281 2848 TRACE neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 350, in send
2016-06-22 19:40:36.281 2848 TRACE neutron.agent.dhcp.agent retry=retry)
2016-06-22 19:40:36.281 2848 TRACE neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 339, in _send
2016-06-22 19:40:36.281 2848 TRACE neutron.agent.dhcp.agent result = self._waiter.wait(msg_id, timeout)
2016-06-22 19:40:36.281 2848 TRACE neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 243, in wait
2016-06-22 19:40:36.281 2848 TRACE neutron.agent.dhcp.agent message = self.waiters.get(msg_id, timeout=timeout)
2016-06-22 19:40:36.281 2848 TRACE neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 149, in get
2016-06-22 19:40:36.281 2848 TRACE neutron.agent.dhcp.agent 'to message ID %s' % msg_id)
2016-06-22 19:40:36.281 2848 TRACE neutron.agent.dhcp.agent MessagingTimeout: Timed out waiting for a reply to message ID 699644629f1740e8b6013baba374bbc2
2016-06-22 19:40:36.281 2848 TRACE neutron.agent.dhcp.agent
2016-06-22 19:40:36.310 2848 INFO oslo_messaging._drivers.impl_rabbit [-] Connecting to AMQP server on controller0:5672
2016-06-22 19:40:36.320 2848 ERROR neutron.agent.dhcp.agent [req-fba865de-cb4e-44ee-9ccd-25ca25b23be9 ] Failed reporting state!
2016-06-22 19:40:36.320 2848 TRACE neutron.agent.dhcp.agent Traceback (most recent call last):
2016-06-22 19:40:36.320 2848 TRACE neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/neutron/agent/dhcp/agent.py", line 575, in _report_state
2016-06-22 19:40:36.320 2848 TRACE neutron.agent.dhcp.agent self.state_rpc.report_state(ctx, self.agent_state, self.use_call)
2016-06-22 19:40:36.320 2848 TRACE neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/neutron/agent/rpc.py", line 80, in report_state
2016-06-22 19:40:36.320 2848 TRACE neutron.agent.dhcp.agent return method(context, 'report_state', **kwargs)
2016-06-22 19:40:36.320 2848 TRACE neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 156, in call
2016-06-22 19:40:36.320 2848 TRACE neutron.agent.dhcp.agent retry=self.retry)
2016-06-22 19:40:36.320 2848 TRACE neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 90, in _send
2016-06-22 19:40:36.320 2848 TRACE neutron.agent.dhcp.agent timeout=timeout, retry=retry)
2016-06-22 19:40:36.320 2848 TRACE neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 350, in send
2016-06-22 19:40:36.320 2848 TRACE neutron.agent.dhcp.agent retry=retry)
2016-06-22 19:40:36.320 2848 TRACE neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 339, in _send
2016-06-22 19:40:36.320 2848 TRACE neutron.agent.dhcp.agent result = self._waiter.wait(msg_id, timeout)
2016-06-22 19:40:36.320 2848 TRACE neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 243, in wait
2016-06-22 19:40:36.320 2848 TRACE neutron.agent.dhcp.agent message = self.waiters.get(msg_id, timeout=timeout)
2016-06-22 19:40:36.320 2848 TRACE neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 149, in get
2016-06-22 19:40:36.320 2848 TRACE neutron.agent.dhcp.agent 'to message ID %s' % msg_id)
2016-06-22 19:40:36.320 2848 TRACE neutron.agent.dhcp.agent MessagingTimeout: Timed out waiting for a reply to message ID 27290842d8d241e1b71757fb33c57f65
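Both tracebacks above end in the same exception, which is easy to miss in the noise. The sketch below pulls out the final line of each traceback. It assumes the layout seen above (TRACE lines carrying the traceback text after the logger name), and `final_exceptions` is a hypothetical helper name.

```python
def final_exceptions(lines):
    """Return the last non-empty TRACE message of each traceback.

    Assumes each traceback is a run of TRACE lines, and that the text
    after the logger name is the traceback itself.
    """
    results = []
    last_trace = None
    for line in lines:
        # Fields: date, time, pid, level, logger, rest (rest may be missing).
        parts = line.rstrip("\n").split(" ", 5)
        if len(parts) >= 5 and parts[3] == "TRACE":
            text = parts[5].strip() if len(parts) == 6 else ""
            if text:
                last_trace = text
            continue
        # Any non-TRACE line closes the traceback collected so far.
        if last_trace is not None:
            results.append(last_trace)
            last_trace = None
    if last_trace is not None:
        results.append(last_trace)
    return results


# Shortened sample built from lines in the log above.
sample = [
    "2016-06-22 19:40:36.281 2848 ERROR neutron.agent.dhcp.agent [-] Unable to sync network state.",
    "2016-06-22 19:40:36.281 2848 TRACE neutron.agent.dhcp.agent Traceback (most recent call last):",
    "2016-06-22 19:40:36.281 2848 TRACE neutron.agent.dhcp.agent MessagingTimeout: Timed out waiting for a reply to message ID 699644629f1740e8b6013baba374bbc2",
    "2016-06-22 19:40:36.281 2848 TRACE neutron.agent.dhcp.agent",
    "2016-06-22 19:40:36.310 2848 INFO oslo_messaging._drivers.impl_rabbit [-] Connecting to AMQP server on controller0:5672",
]
print(final_exceptions(sample))
```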
----A quick Google search----
When the agents first boot up, they are out of sync, and that is normal behaviour. They then synchronize, but no message is written back to the logs.
So when an agent is restarted, the following messages are expected behaviour:
Agent tunnel out of sync with plugin!
Agent out of sync with plugin!
--------A possible lead found via Google-------
Agents report their own status to neutron-server periodically. If neutron-server does not receive a report from an agent within 75 seconds (the default, set by the agent_down_time option), the agent's alive value is shown as xxx. It changes back to :-) once a new status report arrives.
In other words: if this happens all the time and is causing scheduling issues, look at the load on the servers where the agents run, check the agents' logs, and see whether there are any issues with scheduled tasks.
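The rule described above can be illustrated with a toy model. This is not Neutron's code: `is_agent_alive` is a hypothetical function, and the idea that the compared timestamps come from clocks on different nodes is an assumption made here only to show how clock skew could make a healthy agent look dead.

```python
from datetime import datetime, timedelta

AGENT_DOWN_TIME = 75  # seconds, the threshold quoted above


def is_agent_alive(last_heartbeat, now, agent_down_time=AGENT_DOWN_TIME):
    """Simplified model: alive if the last report is recent enough."""
    return now - last_heartbeat <= timedelta(seconds=agent_down_time)


server_now = datetime(2016, 6, 22, 19, 40, 0)

# A report stamped 30 seconds ago: the agent would be shown as :-)
recent = server_now - timedelta(seconds=30)
print(is_agent_alive(recent, server_now))

# Assumption: the same report, stamped by a clock running 2 minutes behind.
# It now looks 150 seconds old, so the agent would be shown as xxx.
skewed = recent - timedelta(minutes=2)
print(is_agent_alive(skewed, server_now))
```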
----Found some problems in the log, apparently because the qdhcp namespace does not exist---
2016-06-23 15:21:39.505 2876 INFO oslo_messaging._drivers.impl_rabbit [-] Connecting to AMQP server on controller0:5672
2016-06-23 15:21:39.535 2876 INFO neutron.agent.dhcp.agent [req-572ca88c-99d5-4ae9-8c06-86b62ee54745 ] Synchronizing state
2016-06-23 15:21:39.540 2876 INFO oslo_messaging._drivers.impl_rabbit [-] Connected to AMQP server on controller0:5672
2016-06-23 15:21:39.586 2876 INFO neutron.agent.dhcp.agent [-] DHCP agent started
2016-06-23 15:21:40.584 2876 ERROR neutron.agent.linux.utils [req-572ca88c-99d5-4ae9-8c06-86b62ee54745 ]
Command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'delete', 'qdhcp-60cf7464-09c8-4c4a-ba8d-cd3004970bd6']
Exit code: 1
Stdin:
Stdout:
Stderr: Cannot remove namespace file "/var/run/netns/qdhcp-60cf7464-09c8-4c4a-ba8d-cd3004970bd6": No such file or directory
2016-06-23 15:21:40.584 2876 WARNING neutron.agent.linux.dhcp [req-572ca88c-99d5-4ae9-8c06-86b62ee54745 ] Failed trying to delete namespace: qdhcp-60cf7464-09c8-4c4a-ba8d-cd3004970bd6
2016-06-23 15:21:40.585 2876 INFO neutron.agent.dhcp.agent [req-572ca88c-99d5-4ae9-8c06-86b62ee54745 ] Synchronizing state complete
2016-06-23 15:21:45.586 2876 INFO neutron.agent.dhcp.agent [-] Synchronizing state
2016-06-23 15:21:45.693 2876 INFO neutron.agent.dhcp.agent [-] Synchronizing state complete
2016-06-23 16:50:22.062 2833 INFO neutron.common.config [-] Logging enabled!
2016-06-23 16:50:22.063 2833 INFO neutron.common.config [-] /usr/bin/neutron-dhcp-agent version 2015.1.2
2016-06-23 16:50:22.100 2833 WARNING oslo_config.cfg [req-606d77c7-5a86-4edc-b03e-2440cb627da1 ] Option "lock_path" from group "DEFAULT" is deprecated. Use option "lock_path" from group "oslo_concurrency".
2016-06-23 16:50:22.106 2833 INFO oslo_messaging._drivers.impl_rabbit [req-d6ceb7b0-dd30-45aa-bcb6-32e3d6f04be8 ] Connecting to AMQP server on controller0:5672
2016-06-23 16:50:22.114 2833 INFO neutron.agent.dhcp.agent [-] Synchronizing state
2016-06-23 16:50:22.135 2833 INFO oslo_messaging._drivers.impl_rabbit [req-d6ceb7b0-dd30-45aa-bcb6-32e3d6f04be8 ] Connected to AMQP server on controller0:5672
2016-06-23 16:50:22.155 2833 INFO oslo_messaging._drivers.impl_rabbit [req-d6ceb7b0-dd30-45aa-bcb6-32e3d6f04be8 ] Connecting to AMQP server on controller0:5672
2016-06-23 16:50:22.168 2833 INFO oslo_messaging._drivers.impl_rabbit [-] Connecting to AMQP server on controller0:5672
2016-06-23 16:50:22.208 2833 INFO oslo_messaging._drivers.impl_rabbit [req-d6ceb7b0-dd30-45aa-bcb6-32e3d6f04be8 ] Connected to AMQP server on controller0:5672
2016-06-23 16:50:22.210 2833 INFO oslo_messaging._drivers.impl_rabbit [-] Connected to AMQP server on controller0:5672
2016-06-23 16:50:22.370 2833 INFO neutron.agent.dhcp.agent [-] Synchronizing state complete
2016-06-23 16:50:22.402 2833 INFO oslo_messaging._drivers.impl_rabbit [-] Connecting to AMQP server on controller0:5672
2016-06-23 16:50:22.421 2833 INFO neutron.agent.dhcp.agent [req-d6ceb7b0-dd30-45aa-bcb6-32e3d6f04be8 ] Synchronizing state
2016-06-23 16:50:22.433 2833 INFO oslo_messaging._drivers.impl_rabbit [-] Connected to AMQP server on controller0:5672
2016-06-23 16:50:22.463 2833 INFO neutron.agent.dhcp.agent [-] DHCP agent started
2016-06-23 16:50:22.521 2833 INFO neutron.agent.dhcp.agent [req-d6ceb7b0-dd30-45aa-bcb6-32e3d6f04be8 ] Synchronizing state complete
2016-06-23 16:51:49.764 2833 INFO neutron.openstack.common.service [req-606d77c7-5a86-4edc-b03e-2440cb627da1 ] Caught SIGTERM, exiting
2016-06-23 16:51:49.819 2833 ERROR oslo_messaging._drivers.impl_rabbit [-] Failed to consume message from queue:
2016-06-23 16:51:50.416 3288 INFO neutron.common.config [-] Logging enabled!
2016-06-23 16:51:50.416 3288 INFO neutron.common.config [-] /usr/bin/neutron-dhcp-agent version 2015.1.2
2016-06-23 16:51:50.430 3288 WARNING oslo_config.cfg [req-98f04cfd-f5db-4ff7-957d-9b436ac4a7ad ] Option "lock_path" from group "DEFAULT" is deprecated. Use option "lock_path" from group "oslo_concurrency".
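The "No such file or directory" error earlier in this log refers to a file under /var/run/netns. A quick way to see which qdhcp namespaces actually exist on a node is to list that directory. This is a minimal sketch; the function names are hypothetical, and the directory path is taken from the error message in the log.

```python
import os


def list_qdhcp_namespaces(netns_dir="/var/run/netns"):
    """Return sorted qdhcp-* namespace names found under netns_dir.

    Returns an empty list if the directory does not exist.
    """
    try:
        entries = os.listdir(netns_dir)
    except FileNotFoundError:
        return []
    return sorted(name for name in entries if name.startswith("qdhcp-"))


def namespace_exists(network_id, netns_dir="/var/run/netns"):
    """True if the qdhcp namespace for network_id is present."""
    return "qdhcp-" + network_id in list_qdhcp_namespaces(netns_dir)


# On a node without any namespaces this prints an empty list.
print(list_qdhcp_namespaces())
```

In this case the missing namespace turned out not to be the root cause, so a check like this only helps rule the namespace in or out.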
===Problem finally solved===
Cause:
The clocks on the nodes were not synchronized, which prevented the agents from running normally.
[Solution]
Troubleshoot ntp or chrony so that the nodes' clocks are synchronized.
1. Set up a time server on controller0, and have the network and compute nodes sync to controller0
2. Restart the neutron agents on the network node
3. Restart the nova-compute service on the compute node
4. Check the neutron agent services on the network node
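To confirm that a node's clock is in sync after step 1, the offset reported by chrony can be checked. The sketch below parses text that would come from `chronyc tracking`. The "System time" line format is an assumption based on typical chronyc output, the function names are hypothetical, and the 1-second threshold is an arbitrary value chosen for illustration.

```python
import re

# Assumed format of the "System time" line printed by `chronyc tracking`:
#   System time     : 0.000006523 seconds slow of NTP time
SYSTEM_TIME_RE = re.compile(
    r"^System time\s*:\s*([0-9.]+) seconds (fast|slow) of NTP time",
    re.MULTILINE,
)


def system_time_offset(tracking_text):
    """Return the offset in seconds (positive = fast), or None if absent."""
    match = SYSTEM_TIME_RE.search(tracking_text)
    if match is None:
        return None
    offset = float(match.group(1))
    return offset if match.group(2) == "fast" else -offset


def is_in_sync(tracking_text, max_offset=1.0):
    """True if the reported offset is within max_offset seconds."""
    offset = system_time_offset(tracking_text)
    return offset is not None and abs(offset) <= max_offset


# Illustrative input, not captured from the nodes in this post.
sample = "System time     : 0.000006523 seconds slow of NTP time\n"
print(system_time_offset(sample))
print(is_in_sync(sample))
```

Running the same check on the controller, network, and compute nodes gives a quick view of whether any of them has drifted.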