Kubernetes Source Code Reading Notes: Controller Manager (Part 2)

In the previous article we walked through the basic run logic of the Controller Manager, but a few questions were left open; we will analyze them in this article.

1. ListAndWatch

Let's start with the Informer. As the previous article noted, starting an Informer essentially means calling the Run method of its controller's reflector. Let's step into Reflector's Run method:

k8s.io/client-go/tools/cache/reflector.go

func (r *Reflector) Run(stopCh <-chan struct{}) {
    glog.V(3).Infof("Starting reflector %v (%s) from %s", r.expectedType, r.resyncPeriod, r.name)
    wait.Until(func() {
        if err := r.ListAndWatch(stopCh); err != nil {
            utilruntime.HandleError(err)
        }
    }, r.period, stopCh)
}

As you can see, the method uses wait.Until to run ListAndWatch once every period.
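
For readers less familiar with wait.Until (from k8s.io/apimachinery/pkg/util/wait), here is a minimal, hedged sketch of the same pattern; the task body and timings are made up for illustration:

package main

import (
    "fmt"
    "time"

    "k8s.io/apimachinery/pkg/util/wait"
)

func main() {
    stopCh := make(chan struct{})
    go func() {
        time.Sleep(3 * time.Second)
        close(stopCh) // closing stopCh ends the loop, just like stopping a reflector
    }()

    // Re-run the task every second until stopCh is closed, which mirrors how
    // Reflector.Run keeps calling ListAndWatch every r.period.
    wait.Until(func() {
        fmt.Println("ListAndWatch would run here")
    }, time.Second, stopCh)
}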

Step into the ListAndWatch method:

k8s.io/client-go/tools/cache/reflector.go

// ListAndWatch first lists all items and get the resource version at the moment of call,
// and then use the resource version to watch.
// It returns error if ListAndWatch didn't even try to initialize watch.
func (r *Reflector) ListAndWatch(stopCh <-chan struct{}) error {
    glog.V(3).Infof("Listing and watching %v from %s", r.expectedType, r.name)
    var resourceVersion string

    // Explicitly set "0" as resource version - it's fine for the List()
    // to be served from cache and potentially be delayed relative to
    // etcd contents. Reflector framework will catch up via Watch() eventually.
    options := metav1.ListOptions{ResourceVersion: "0"}
    r.metrics.numberOfLists.Inc()
    start := r.clock.Now()
    list, err := r.listerWatcher.List(options)
    if err != nil {
        return fmt.Errorf("%s: Failed to list %v: %v", r.name, r.expectedType, err)
    }
    r.metrics.listDuration.Observe(time.Since(start).Seconds())
    listMetaInterface, err := meta.ListAccessor(list)
    if err != nil {
        return fmt.Errorf("%s: Unable to understand list result %#v: %v", r.name, list, err)
    }
    resourceVersion = listMetaInterface.GetResourceVersion()
    items, err := meta.ExtractList(list)
    if err != nil {
        return fmt.Errorf("%s: Unable to understand list result %#v (%v)", r.name, list, err)
    }
    r.metrics.numberOfItemsInList.Observe(float64(len(items)))
    if err := r.syncWith(items, resourceVersion); err != nil {
        return fmt.Errorf("%s: Unable to sync list result: %v", r.name, err)
    }
    r.setLastSyncResourceVersion(resourceVersion)

    resyncerrc := make(chan error, 1)
    cancelCh := make(chan struct{})
    defer close(cancelCh)
    go func() {
        resyncCh, cleanup := r.resyncChan()
        defer func() {
            cleanup() // Call the last one written into cleanup
        }()
        for {
            select {
            case <-resyncCh:
            case <-stopCh:
                return
            case <-cancelCh:
                return
            }
            if r.ShouldResync == nil || r.ShouldResync() {
                glog.V(4).Infof("%s: forcing resync", r.name)
                if err := r.store.Resync(); err != nil {
                    resyncerrc <- err
                    return
                }
            }
            cleanup()
            resyncCh, cleanup = r.resyncChan()
        }
    }()

    for {
        // give the stopCh a chance to stop the loop, even in case of continue statements further down on errors
        select {
        case <-stopCh:
            return nil
        default:
        }

        timeoutSeconds := int64(minWatchTimeout.Seconds() * (rand.Float64() + 1.0))
        options = metav1.ListOptions{
            ResourceVersion: resourceVersion,
            // We want to avoid situations of hanging watchers. Stop any wachers that do not
            // receive any events within the timeout window.
            TimeoutSeconds: &timeoutSeconds,
        }

        r.metrics.numberOfWatches.Inc()
        w, err := r.listerWatcher.Watch(options)
        if err != nil {
            switch err {
            case io.EOF:
                // watch closed normally
            case io.ErrUnexpectedEOF:
                glog.V(1).Infof("%s: Watch for %v closed with unexpected EOF: %v", r.name, r.expectedType, err)
            default:
                utilruntime.HandleError(fmt.Errorf("%s: Failed to watch %v: %v", r.name, r.expectedType, err))
            }
            // If this is "connection refused" error, it means that most likely apiserver is not responsive.
            // It doesn't make sense to re-list all objects because most likely we will be able to restart
            // watch where we ended.
            // If that's the case wait and resend watch request.
            if urlError, ok := err.(*url.Error); ok {
                if opError, ok := urlError.Err.(*net.OpError); ok {
                    if errno, ok := opError.Err.(syscall.Errno); ok && errno == syscall.ECONNREFUSED {
                        time.Sleep(time.Second)
                        continue
                    }
                }
            }
            return nil
        }

        if err := r.watchHandler(w, &resourceVersion, resyncerrc, stopCh); err != nil {
            if err != errorStopRequested {
                glog.Warningf("%s: watch of %v ended with: %v", r.name, r.expectedType, err)
            }
            return nil
        }
    }
}

The method's own comment already explains the flow. First, List is called with the default ResourceVersion "0" to list the resources, and ListAccessor is called to validate the result.

Next, GetResourceVersion obtains the actual ResourceVersion of the resources, and syncWith synchronizes the listed items into the store.

Then, the block launched with go func() handles the periodic resync of the reflector's cache.

Finally, inside a for loop, Watch is called to watch for changes to the resources, and watchHandler handles each change accordingly.

The Watch method essentially calls the ListWatch's WatchFunc. This field is wired into the informer when the concrete Controller object is created, which we will cover in detail later. In short, the watch is no different from a user connecting to the API Server with client-go.
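
To make that concrete, here is a hedged sketch of the equivalent client-go usage: a hand-built ListerWatcher for Pods whose List and Watch calls are exactly the kind the reflector issues. The kubeconfig path and namespace are placeholders, and error handling is omitted for brevity.

package main

import (
    "fmt"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/fields"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/cache"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    // Placeholder kubeconfig path; error handling is trimmed for brevity.
    config, _ := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
    client, _ := kubernetes.NewForConfig(config)

    // A ListerWatcher for Pods in the "default" namespace, wrapping the same
    // REST calls an ordinary client-go user would make.
    lw := cache.NewListWatchFromClient(
        client.CoreV1().RESTClient(), "pods", "default", fields.Everything())

    // List with ResourceVersion "0", then watch: the same two steps ListAndWatch performs.
    list, _ := lw.List(metav1.ListOptions{ResourceVersion: "0"})
    fmt.Printf("initial list type: %T\n", list)

    w, _ := lw.Watch(metav1.ListOptions{})
    defer w.Stop()
    for event := range w.ResultChan() {
        fmt.Printf("event: %s %T\n", event.Type, event.Object)
    }
}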

Step into the watchHandler method:

k8s.io/client-go/tools/cache/reflector.go

// watchHandler watches w and keeps *resourceVersion up to date.
func (r *Reflector) watchHandler(w watch.Interface, resourceVersion *string, errc chan error, stopCh <-chan struct{}) error {
    start := r.clock.Now()
    eventCount := 0

    defer w.Stop()
    // update metrics
    ...

loop:
    for {
        select {
        case <-stopCh:
            return errorStopRequested
        case err := <-errc:
            return err
        case event, ok := <-w.ResultChan():
            if !ok {
                break loop
            }
            if event.Type == watch.Error {
                return apierrs.FromObject(event.Object)
            }
            if e, a := r.expectedType, reflect.TypeOf(event.Object); e != nil && e != a {
                utilruntime.HandleError(fmt.Errorf("%s: expected type %v, but watch event object had type %v", r.name, e, a))
                continue
            }
            meta, err := meta.Accessor(event.Object)
            if err != nil {
                utilruntime.HandleError(fmt.Errorf("%s: unable to understand watch event %#v", r.name, event))
                continue
            }
            newResourceVersion := meta.GetResourceVersion()
            switch event.Type {
            case watch.Added:
                err := r.store.Add(event.Object)
                if err != nil {
                    utilruntime.HandleError(fmt.Errorf("%s: unable to add watch event object (%#v) to store: %v", r.name, event.Object, err))
                }
            case watch.Modified:
                err := r.store.Update(event.Object)
                if err != nil {
                    utilruntime.HandleError(fmt.Errorf("%s: unable to update watch event object (%#v) to store: %v", r.name, event.Object, err))
                }
            case watch.Deleted:
                // TODO: Will any consumers need access to the "last known
                // state", which is passed in event.Object? If so, may need
                // to change this.
                err := r.store.Delete(event.Object)
                if err != nil {
                    utilruntime.HandleError(fmt.Errorf("%s: unable to delete watch event object (%#v) from store: %v", r.name, event.Object, err))
                }
            default:
                utilruntime.HandleError(fmt.Errorf("%s: unable to understand watch event %#v", r.name, event))
            }
            *resourceVersion = newResourceVersion
            r.setLastSyncResourceVersion(newResourceVersion)
            eventCount++
        }
    }

    watchDuration := r.clock.Now().Sub(start)
    ...
    return nil
}

The most important part of this method is the switch in the middle. For every event received from the watch, depending on whether it is Added, Modified, or Deleted, the store's Add, Update, or Delete method is called to update the FIFO queue. This FIFO queue is the one created earlier when the informer's Run method was executed.

That is the overall logic of the Informer: through the list-watch mechanism it connects to the API Server, listens for changes to resources, and updates the cache accordingly.
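
The same pattern can be reproduced in user code with a Reflector and a Store. The sketch below is a simplification, assuming a placeholder kubeconfig path: it uses a plain Store where the informer machinery actually uses a DeltaFIFO, but the list-then-watch flow is the one described above.

package main

import (
    "fmt"
    "time"

    v1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/fields"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/cache"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    config, _ := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig") // placeholder path
    client, _ := kubernetes.NewForConfig(config)

    lw := cache.NewListWatchFromClient(
        client.CoreV1().RESTClient(), "pods", "default", fields.Everything())

    // The reflector lists once, then watches, keeping the local store in sync.
    store := cache.NewStore(cache.MetaNamespaceKeyFunc)
    reflector := cache.NewReflector(lw, &v1.Pod{}, store, 30*time.Second)

    stopCh := make(chan struct{})
    go reflector.Run(stopCh)

    // Consumers read from the local cache instead of hitting the API Server.
    for range time.Tick(5 * time.Second) {
        fmt.Printf("pods in local cache: %d\n", len(store.List()))
    }
}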

2. StartControllers

As mentioned in the previous article, the Controller Manager starts all of the controllers by calling their start functions inside the StartControllers method. Let's look at StartControllers first:

cmd/kube-controller-manager/app/controllermanager.go

func StartControllers(ctx ControllerContext, startSATokenController InitFunc, controllers map[string]InitFunc, unsecuredMux *mux.PathRecorderMux) error {
    ...

    for controllerName, initFn := range controllers {
        if !ctx.IsControllerEnabled(controllerName) {
            klog.Warningf("%q is disabled", controllerName)
            continue
        }

        time.Sleep(wait.Jitter(ctx.ComponentConfig.Generic.ControllerStartInterval.Duration, ControllerStartJitter))

        klog.V(1).Infof("Starting %q", controllerName)
        debugHandler, started, err := initFn(ctx)
        if err != nil {
            klog.Errorf("Error starting %q", controllerName)
            return err
        }
        if !started {
            klog.Warningf("Skipping %q", controllerName)
            continue
        }
        ...
    }

    return nil
}

We can see that the method runs each controller's start function by invoking the initFn of every entry in the controllers map. Since every controller is stored in the map as a name-to-function pair, starting one is simply a matter of calling its own function.
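
The registration pattern itself is simple. The following self-contained sketch (with a made-up controller name and a simplified InitFunc signature) shows the idea of keeping start functions in a map and invoking each one:

package main

import "fmt"

// A simplified stand-in for kube-controller-manager's ControllerContext.
type controllerContext struct {
    stop <-chan struct{}
}

// initFunc mirrors the shape of the real InitFunc: it reports whether the
// controller actually started and any error it hit.
type initFunc func(ctx controllerContext) (started bool, err error)

func startDemoController(ctx controllerContext) (bool, error) {
    go func() { <-ctx.stop }() // a real controller would run its sync loop here
    return true, nil
}

func main() {
    controllers := map[string]initFunc{
        "demo": startDemoController, // the real map holds "deployment", "replicaset", ...
    }

    ctx := controllerContext{stop: make(chan struct{})}
    for name, initFn := range controllers {
        started, err := initFn(ctx)
        if err != nil {
            fmt.Printf("error starting %q: %v\n", name, err)
            continue
        }
        if !started {
            fmt.Printf("skipping %q\n", name)
            continue
        }
        fmt.Printf("started %q\n", name)
    }
}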

Now let's step into one controller's start function, taking deployment as the example. In NewControllerInitializers we find controllers["deployment"] = startDeploymentController; stepping into startDeploymentController, the method lives in app/apps.go:

cmd/kube-controller-manager/app/apps.go

func startDeploymentController(ctx ControllerContext) (http.Handler, bool, error) {
    if !ctx.AvailableResources[schema.GroupVersionResource{Group: "apps", Version: "v1", Resource: "deployments"}] {
        return nil, false, nil
    }
    dc, err := deployment.NewDeploymentController(
        ctx.InformerFactory.Apps().V1().Deployments(),
        ctx.InformerFactory.Apps().V1().ReplicaSets(),
        ctx.InformerFactory.Core().V1().Pods(),
        ctx.ClientBuilder.ClientOrDie("deployment-controller"),
    )
    if err != nil {
        return nil, true, fmt.Errorf("error creating Deployment controller: %v", err)
    }
    go dc.Run(int(ctx.ComponentConfig.DeploymentController.ConcurrentDeploymentSyncs), ctx.Stop)
    return nil, true, nil
}

We can see that the method calls NewDeploymentController to create a DeploymentController and then runs it in a goroutine via its Run method.

Let's look at the NewDeploymentController method first.

3. NewDeploymentController

pkg/controller/deployment/deployment_controller.go

func NewDeploymentController(dInformer appsinformers.DeploymentInformer, rsInformer appsinformers.ReplicaSetInformer, podInformer coreinformers.PodInformer, client clientset.Interface) (*DeploymentController, error) {
    eventBroadcaster := record.NewBroadcaster()
    eventBroadcaster.StartLogging(klog.Infof)
    eventBroadcaster.StartRecordingToSink(&v1core.EventSinkImpl{Interface: client.CoreV1().Events("")})

    if client != nil && client.CoreV1().RESTClient().GetRateLimiter() != nil {
        if err := metrics.RegisterMetricAndTrackRateLimiterUsage("deployment_controller", client.CoreV1().RESTClient().GetRateLimiter()); err != nil {
            return nil, err
        }
    }
    dc := &DeploymentController{
        client:        client,
        eventRecorder: eventBroadcaster.NewRecorder(scheme.Scheme, v1.EventSource{Component: "deployment-controller"}),
        queue:         workqueue.NewNamedRateLimitingQueue(workqueue.DefaultControllerRateLimiter(), "deployment"),
    }
    dc.rsControl = controller.RealRSControl{
        KubeClient: client,
        Recorder:   dc.eventRecorder,
    }

    dInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
        AddFunc:    dc.addDeployment,
        UpdateFunc: dc.updateDeployment,
        // This will enter the sync loop and no-op, because the deployment has been deleted from the store.
        DeleteFunc: dc.deleteDeployment,
    })
    rsInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
        AddFunc:    dc.addReplicaSet,
        UpdateFunc: dc.updateReplicaSet,
        DeleteFunc: dc.deleteReplicaSet,
    })
    podInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
        DeleteFunc: dc.deletePod,
    })

    dc.syncHandler = dc.syncDeployment
    dc.enqueueDeployment = dc.enqueue

    dc.dLister = dInformer.Lister()
    dc.rsLister = rsInformer.Lister()
    dc.podLister = podInformer.Lister()
    dc.dListerSynced = dInformer.Informer().HasSynced
    dc.rsListerSynced = rsInformer.Informer().HasSynced
    dc.podListerSynced = podInformer.Informer().HasSynced
    return dc, nil
}

We can see that this method does roughly three things.

(1) It creates a Broadcaster, which is responsible for handling Kubernetes Event resources.

(2) It creates the DeploymentController.

As we know, a Deployment does not manipulate the pods in the cluster directly; it does so indirectly by managing ReplicaSets. Accordingly, the DeploymentController struct contains an rsControl field for operating on ReplicaSets. The queue field maintains a queue of deployments, into which deployments whose state needs to be reconciled are placed, as we will see later.
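
The queue here is a standard rate-limited workqueue from k8s.io/client-go/util/workqueue. As a hedged sketch of how such a queue is normally used (the key below is made up), the pattern is: handlers Add keys, a worker Gets a key, syncs it, then calls Done:

package main

import (
    "fmt"

    "k8s.io/client-go/util/workqueue"
)

func main() {
    // Same constructor NewDeploymentController uses for its "deployment" queue.
    queue := workqueue.NewNamedRateLimitingQueue(
        workqueue.DefaultControllerRateLimiter(), "deployment")
    defer queue.ShutDown()

    // Event handlers enqueue "namespace/name" keys; this one is illustrative.
    queue.Add("default/nginx-deployment")

    for {
        key, shutdown := queue.Get()
        if shutdown {
            return
        }
        // A real worker would call syncHandler(key) here and use
        // queue.AddRateLimited(key) to retry on failure.
        fmt.Printf("syncing %v\n", key)
        queue.Forget(key)
        queue.Done(key)
    }
}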

(3) It registers the callback functions.

The informers' AddEventHandler method registers callbacks on all of the deployment controller's informers so that adds, updates, and deletes of Deployments and ReplicaSets (and deletes of Pods) are handled accordingly; a sketch of the typical handler pattern follows below.
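
As a hedged sketch of what such callbacks typically do (the real addDeployment, updateDeployment, and deleteDeployment are more involved), a handler usually just turns the object into a "namespace/name" key and puts it on the controller's workqueue. The kubeconfig path below is a placeholder.

package main

import (
    "time"

    "k8s.io/client-go/informers"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/cache"
    "k8s.io/client-go/tools/clientcmd"
    "k8s.io/client-go/util/workqueue"
)

func main() {
    config, _ := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig") // placeholder
    client, _ := kubernetes.NewForConfig(config)

    factory := informers.NewSharedInformerFactory(client, 30*time.Second)
    queue := workqueue.NewNamedRateLimitingQueue(workqueue.DefaultControllerRateLimiter(), "demo")

    // Register callbacks the same way NewDeploymentController does; these ones
    // only extract a key and enqueue it.
    factory.Apps().V1().Deployments().Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
        AddFunc: func(obj interface{}) {
            if key, err := cache.MetaNamespaceKeyFunc(obj); err == nil {
                queue.Add(key)
            }
        },
        UpdateFunc: func(oldObj, newObj interface{}) {
            if key, err := cache.MetaNamespaceKeyFunc(newObj); err == nil {
                queue.Add(key)
            }
        },
        DeleteFunc: func(obj interface{}) {
            // DeletionHandlingMetaNamespaceKeyFunc also copes with tombstone objects.
            if key, err := cache.DeletionHandlingMetaNamespaceKeyFunc(obj); err == nil {
                queue.Add(key)
            }
        },
    })

    stopCh := make(chan struct{})
    factory.Start(stopCh)
    <-stopCh
}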

Let's analyze the callback registration in more detail, in three parts: first, how the informers themselves are created; second, the logic by which the callbacks are invoked; and third, the callbacks themselves.

4. Informer()

The Deployment Controller holds three informers, for deployments, replicasets, and pods, and each is created by its own Informer() method.

Take the dInformer.Informer() call as the example. The method essentially calls InformerFor:

k8s.io/client-go/informers/apps/v1/deployment.go

func (f *deploymentInformer) Informer() cache.SharedIndexInformer {
	return f.factory.InformerFor(&apps_v1.Deployment{}, f.defaultInformer)
}

InformerFor is fairly straightforward: it calls defaultInformer and stores the resulting informer in the factory's informers field.

The defaultInformer method in turn calls NewFilteredDeploymentInformer:

k8s.io/client-go/informers/apps/v1/deployment.go

func NewFilteredDeploymentInformer(client kubernetes.Interface, namespace string, resyncPeriod time.Duration, indexers cache.Indexers, tweakListOptions internalinterfaces.TweakListOptionsFunc) cache.SharedIndexInformer {
	return cache.NewSharedIndexInformer(
		&cache.ListWatch{
			ListFunc: func(options meta_v1.ListOptions) (runtime.Object, error) {
				if tweakListOptions != nil {
					tweakListOptions(&options)
				}
				return client.AppsV1().Deployments(namespace).List(options)
			},
			WatchFunc: func(options meta_v1.ListOptions) (watch.Interface, error) {
				if tweakListOptions != nil {
					tweakListOptions(&options)
				}
				return client.AppsV1().Deployments(namespace).Watch(options)
			},
		},
		&apps_v1.Deployment{},
		resyncPeriod,
		indexers,
	)
}

This method simply returns a SharedIndexInformer built by NewSharedIndexInformer. As you can see, the struct defines this Informer's List and Watch functions, which are no different from an ordinary user connecting to the API Server through client-go.
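
To make the equivalence explicit, here is a hedged sketch that builds essentially the same SharedIndexInformer by hand. It follows the older client-go signatures used throughout this post (newer versions of client-go add a context argument to List and Watch); the kubeconfig path and namespace are placeholders.

package main

import (
    "time"

    appsv1 "k8s.io/api/apps/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/runtime"
    "k8s.io/apimachinery/pkg/watch"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/cache"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    config, _ := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig") // placeholder
    client, _ := kubernetes.NewForConfig(config)

    // A hand-rolled equivalent of what NewFilteredDeploymentInformer builds:
    // the List/Watch functions are ordinary client-go calls.
    informer := cache.NewSharedIndexInformer(
        &cache.ListWatch{
            ListFunc: func(options metav1.ListOptions) (runtime.Object, error) {
                return client.AppsV1().Deployments("default").List(options)
            },
            WatchFunc: func(options metav1.ListOptions) (watch.Interface, error) {
                return client.AppsV1().Deployments("default").Watch(options)
            },
        },
        &appsv1.Deployment{},
        30*time.Second,
        cache.Indexers{},
    )

    stopCh := make(chan struct{})
    go informer.Run(stopCh)
    <-stopCh
}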

5. AddEventHandler

Let's look at the invocation logic first. Besides registering new callbacks on the informer, AddEventHandler also sets up how notifications are dispatched to these new callbacks so that they actually get invoked. AddEventHandler itself is a one-liner that calls AddEventHandlerWithResyncPeriod.

Step into the AddEventHandlerWithResyncPeriod method:

k8s.io/client-go/tools/cache/shared_informer.go

func (s *sharedIndexInformer) AddEventHandlerWithResyncPeriod(handler ResourceEventHandler, resyncPeriod time.Duration) {
    s.startedLock.Lock()
    defer s.startedLock.Unlock()

    if s.stopped {
        glog.V(2).Infof("Handler %v was not added to shared informer because it has stopped already", handler)
        return
    }

    if resyncPeriod > 0 {
          ...
    }

    listener := newProcessListener(handler, resyncPeriod, determineResyncPeriod(resyncPeriod, s.resyncCheckPeriod), s.clock.Now(), initialBufferSize)

    if !s.started {
        s.processor.addListener(listener)
        return
    }

    // in order to safely join, we have to
    // 1. stop sending add/update/delete notifications
    // 2. do a list against the store
    // 3. send synthetic "Add" events to the new handler
    // 4. unblock
    s.blockDeltas.Lock()
    defer s.blockDeltas.Unlock()

    s.processor.addListener(listener)
    for _, item := range s.indexer.List() {
        listener.add(addNotification{newObj: item})
    }
}

Here a new concept appears: the listener. A listener is the mechanism the informer uses to fetch items from the Thread Safe Store and process them. By calling addListener, the informer registers and runs the listener.

After that, the method iterates over the items already in the Thread Safe Store and sends all of them to the newly added EventHandler. The point is to let the new EventHandler process the existing items in the Thread Safe Store first; otherwise it would only ever receive new notifications and never get a chance to handle the old ones.

6. addListener

Both adding and running a listener go through the addListener method. Step into addListener:

k8s.io/client-go/tools/cache/shared_informer.go

func (p *sharedProcessor) addListener(listener *processorListener) {
	p.listenersLock.Lock()
	defer p.listenersLock.Unlock()

	p.addListenerLocked(listener)
	if p.listenersStarted {
		p.wg.Start(listener.run)
		p.wg.Start(listener.pop)
	}
}

addListener runs the listener by invoking its run and pop methods in goroutines.

Step into run and pop respectively:

k8s.io/client-go/tools/cache/shared_informer.go

func (p *processorListener) run() {
    stopCh := make(chan struct{})
    wait.Until(func() {
        // this gives us a few quick retries before a long pause and then a few more quick retries
        err := wait.ExponentialBackoff(retry.DefaultRetry, func() (bool, error) {
            for next := range p.nextCh {
                switch notification := next.(type) {
                case updateNotification:
                    p.handler.OnUpdate(notification.oldObj, notification.newObj)
                case addNotification:
                    p.handler.OnAdd(notification.newObj)
                case deleteNotification:
                    p.handler.OnDelete(notification.oldObj)
                default:
                    utilruntime.HandleError(fmt.Errorf("unrecognized notification: %#v", next))
                }
            }
            // the only way to get here is if the p.nextCh is empty and closed
            return true, nil
        })

        // the only way to get here is if the p.nextCh is empty and closed
        if err == nil {
            close(stopCh)
        }
    }, 1*time.Minute, stopCh)
}

 

k8s.io/client-go/tools/cache/shared_informer.go

func (p *processorListener) pop() {
    defer utilruntime.HandleCrash()
    defer close(p.nextCh) // Tell .run() to stop

    var nextCh chan<- interface{}
    var notification interface{}
    for {
        select {
        case nextCh <- notification:
            // Notification dispatched
            var ok bool
            notification, ok = p.pendingNotifications.ReadOne()
            if !ok { // Nothing to pop
                nextCh = nil // Disable this select case
            }
        case notificationToAdd, ok := <-p.addCh:
            if !ok {
                return
            }
            if notification == nil { // No notification to pop (and pendingNotifications is empty)
                // Optimize the case - skip adding to pendingNotifications
                notification = notificationToAdd
                nextCh = p.nextCh
            } else { // There is already a notification waiting to be dispatched
                p.pendingNotifications.WriteOne(notificationToAdd)
            }
        }
    }
}

The pop method juggles several channels, but in essence it takes a notification off a channel and hands it to run for processing. In run you can see the type switch over the notification, dispatching to OnUpdate, OnAdd, or OnDelete, and those three methods in turn call the callbacks passed in earlier through AddEventHandler, namely:

pkg/controller/deployment/deployment_controller.go

func NewDeploymentController(dInformer appsinformers.DeploymentInformer, rsInformer appsinformers.ReplicaSetInformer, podInformer coreinformers.PodInformer, client clientset.Interface) (*DeploymentController, error) {
	...
	dInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc:    dc.addDeployment,
		UpdateFunc: dc.updateDeployment,
		DeleteFunc: dc.deleteDeployment,
	})
	rsInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc:    dc.addReplicaSet,
		UpdateFunc: dc.updateReplicaSet,
		DeleteFunc: dc.deleteReplicaSet,
	})
	podInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
		DeleteFunc: dc.deletePod,
	})
	...
}

So where do these notifications come from? They are added through the listener's add method, that is, the last statement of the AddEventHandlerWithResyncPeriod method shown above:

k8s.io/client-go/tools/cache/shared_informer.go

func (s *sharedIndexInformer) AddEventHandlerWithResyncPeriod(handler ResourceEventHandler, resyncPeriod time.Duration) {
        ...
	for _, item := range s.indexer.List() {
		listener.add(addNotification{newObj: item})
	}
}

The add method is a one-liner that pushes the notification into the listener's channel. The items themselves come from the Thread Safe Store, obtained through the indexer's List method. In addition, as mentioned in the previous article, the informer keeps popping items off the FIFO queue, processes them, and sends notifications to the listeners. So from an EventHandler's point of view, it receives all of the old items from the Thread Safe Store in one batch at registration time, and afterwards receives new notifications one by one from the FIFO queue.

At this point the informer's logic is clear. First, ListAndWatch stores the watched changes in resource state into the FIFO queue. Then, once a controller is started, the factory produces the corresponding informers and the callbacks are registered through AddEventHandler. AddEventHandler attaches a listener to the informer, which takes the contents of the Thread Safe Store and invokes the appropriate callbacks according to each item's state. Meanwhile, the items in the FIFO queue are popped out one by one for the informer to consume.
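
Putting the pieces together, a minimal consumer of this machinery looks roughly like the sketch below (a simplified illustration, not the deployment controller itself): wait for the caches to sync, then have a worker drain the workqueue that handlers like those in the earlier sketch fill, reading the current object from the Lister. The kubeconfig path is a placeholder.

package main

import (
    "fmt"
    "time"

    "k8s.io/client-go/informers"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/cache"
    "k8s.io/client-go/tools/clientcmd"
    "k8s.io/client-go/util/workqueue"
)

func main() {
    config, _ := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig") // placeholder
    client, _ := kubernetes.NewForConfig(config)

    factory := informers.NewSharedInformerFactory(client, 30*time.Second)
    deployments := factory.Apps().V1().Deployments()
    queue := workqueue.NewNamedRateLimitingQueue(workqueue.DefaultControllerRateLimiter(), "demo")

    deployments.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
        AddFunc: func(obj interface{}) {
            if key, err := cache.MetaNamespaceKeyFunc(obj); err == nil {
                queue.Add(key)
            }
        },
    })

    stopCh := make(chan struct{})
    factory.Start(stopCh)

    // Block until the initial list has been delivered, as controllers do with
    // their *ListerSynced fields.
    if !cache.WaitForCacheSync(stopCh, deployments.Informer().HasSynced) {
        return
    }

    // A single worker: take a key off the queue, look the object up in the
    // local cache, and "sync" it.
    for {
        key, shutdown := queue.Get()
        if shutdown {
            return
        }
        ns, name, _ := cache.SplitMetaNamespaceKey(key.(string))
        if d, err := deployments.Lister().Deployments(ns).Get(name); err == nil {
            fmt.Printf("syncing deployment %s/%s (resourceVersion %s)\n", ns, name, d.ResourceVersion)
        }
        queue.Forget(key)
        queue.Done(key)
    }
}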

The remaining content will be analyzed in the next article: http://www.javashuo.com/article/p-vluyzvvi-p.html
