Original article: https://labs.meanpug.com/custom-application-metrics-with-django-prometheus-and-kubernetes/
Author: Bobby Steinbach
Translator: 馬若飛
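This article stresses the importance of custom application metrics. Using code examples, it shows how to design metrics and integrate Prometheus into a Django project, as a reference for developers who build applications with Django.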
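Although a lot has been written on this topic, the importance of custom application metrics cannot be overstated. Unlike the core service metrics you collect for a Django application (application and web server statistics, key database and cache operation metrics), custom metrics are data points specific to your business, with bounds and thresholds that only you know. That is actually the fun part.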
What makes a metric useful? Consider the following points:
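Beyond the obvious dependency (pip install Django), our pet project (translator's note: the demo) needs a few extra packages. Go ahead and run pip install django-prometheus. This gives us a Python Prometheus client (prometheus_client is installed as a dependency), along with some useful Django hooks, including middleware and a nifty DB wrapper. Next, we'll run the Django management commands to start the project, update our settings to use the Prometheus client, and add the Prometheus URLs to the URL configuration.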
Starting a new project and app
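For the purposes of this post, and fitting with the agency's branding, we'll build a dog-walking service. It won't actually do much, but it is enough to serve as a teaching example. Run the following commands: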
```shell
django-admin startproject demo
cd demo
python manage.py startapp walker
```
```python
# settings.py
INSTALLED_APPS = [
    # ...
    'walker',
    # ...
]
```
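Now let's add some basic models and views. For simplicity, I'll only implement the parts we are going to instrument. For the complete example, grab the source code of the demo app.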
```python
# walker/models.py
from django.db import models
from django_prometheus.models import ExportModelOperationsMixin


class Walker(ExportModelOperationsMixin('walker'), models.Model):
    name = models.CharField(max_length=127)
    email = models.CharField(max_length=127)

    def __str__(self):
        return f'{self.name} // {self.email} ({self.id})'


class Dog(ExportModelOperationsMixin('dog'), models.Model):
    SIZE_XS = 'xs'
    SIZE_SM = 'sm'
    SIZE_MD = 'md'
    SIZE_LG = 'lg'
    SIZE_XL = 'xl'
    DOG_SIZES = (
        (SIZE_XS, 'xsmall'),
        (SIZE_SM, 'small'),
        (SIZE_MD, 'medium'),
        (SIZE_LG, 'large'),
        (SIZE_XL, 'xlarge'),
    )

    size = models.CharField(max_length=31, choices=DOG_SIZES, default=SIZE_MD)
    name = models.CharField(max_length=127)
    age = models.IntegerField()

    def __str__(self):
        return f'{self.name} // {self.age}y ({self.size})'


class Walk(ExportModelOperationsMixin('walk'), models.Model):
    dog = models.ForeignKey(Dog, related_name='walks', on_delete=models.CASCADE)
    walker = models.ForeignKey(Walker, related_name='walks', on_delete=models.CASCADE)

    distance = models.IntegerField(default=0, help_text='walk distance (in meters)')

    start_time = models.DateTimeField(null=True, blank=True, default=None)
    end_time = models.DateTimeField(null=True, blank=True, default=None)

    @property
    def is_complete(self):
        return self.end_time is not None

    @classmethod
    def in_progress(cls):
        """ get the list of `Walk`s currently in progress """
        return cls.objects.filter(start_time__isnull=False, end_time__isnull=True)

    def __str__(self):
        return f'{self.walker.name} // {self.dog.name} @ {self.start_time} ({self.id})'
```
```python
# walker/views.py
from django.shortcuts import render, redirect
from django.views import View
from django.core.exceptions import ObjectDoesNotExist
from django.http import HttpResponseNotFound, JsonResponse, HttpResponseBadRequest, Http404
from django.urls import reverse
from django.utils.timezone import now

from walker import models, forms


class WalkDetailsView(View):
    def get_walk(self, walk_id=None):
        try:
            return models.Walk.objects.get(id=walk_id)
        except ObjectDoesNotExist:
            raise Http404(f'no walk with ID {walk_id} in progress')


class CheckWalkStatusView(WalkDetailsView):
    def get(self, request, walk_id=None, **kwargs):
        walk = self.get_walk(walk_id=walk_id)
        return JsonResponse({'complete': walk.is_complete})


class CompleteWalkView(WalkDetailsView):
    def get(self, request, walk_id=None, **kwargs):
        walk = self.get_walk(walk_id=walk_id)
        return render(request, 'index.html', context={'form': forms.CompleteWalkForm(instance=walk)})

    def post(self, request, walk_id=None, **kwargs):
        try:
            walk = models.Walk.objects.get(id=walk_id)
        except ObjectDoesNotExist:
            return HttpResponseNotFound(content=f'no walk with ID {walk_id} found')

        if walk.is_complete:
            return HttpResponseBadRequest(content=f'walk {walk.id} is already complete')

        form = forms.CompleteWalkForm(data=request.POST, instance=walk)

        if form.is_valid():
            updated_walk = form.save(commit=False)
            updated_walk.end_time = now()
            updated_walk.save()

            return redirect(f'{reverse("walk_start")}?walk={walk.id}')

        return HttpResponseBadRequest(content=f'form validation failed with errors {form.errors}')


class StartWalkView(View):
    def get(self, request):
        return render(request, 'index.html', context={'form': forms.StartWalkForm()})

    def post(self, request):
        form = forms.StartWalkForm(data=request.POST)

        if form.is_valid():
            walk = form.save(commit=False)
            walk.start_time = now()
            walk.save()

            return redirect(f'{reverse("walk_start")}?walk={walk.id}')

        return HttpResponseBadRequest(content=f'form validation failed with errors {form.errors}')
```
Updating app settings and adding the Prometheus URLs
Now that we have a Django project and app set up, we can add the configuration that django-prometheus needs. Add the following to settings.py:
```python
# settings.py
import os

INSTALLED_APPS = [
    # ...
    'django_prometheus',
    # ...
]

MIDDLEWARE = [
    'django_prometheus.middleware.PrometheusBeforeMiddleware',  # keep first
    # ...
    'django_prometheus.middleware.PrometheusAfterMiddleware',  # keep last
]

# we're assuming a Postgres DB here because, well, that's just the right choice :)
DATABASES = {
    'default': {
        'ENGINE': 'django_prometheus.db.backends.postgresql',
        'NAME': os.getenv('DB_NAME'),
        'USER': os.getenv('DB_USER'),
        'PASSWORD': os.getenv('DB_PASSWORD'),
        'HOST': os.getenv('DB_HOST'),
        'PORT': os.getenv('DB_PORT', '5432'),
    },
}
```
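The MIDDLEWARE order above matters. Django wraps the middleware list so that the first entry is the outermost layer (it sees the request first and the response last) and the last entry sits right next to the view. The toy below is a stdlib-only illustration of what that buys, not django-prometheus code; every name in it is hypothetical. A timer in the outermost position includes the cost of all other middleware, while one in the innermost position measures essentially just the view.

```python
import time
from typing import Callable, List, Tuple

Handler = Callable[[str], str]
Timings = List[Tuple[str, float]]


def timing_middleware(name: str, timings: Timings) -> Callable[[Handler], Handler]:
    """Hypothetical middleware factory that times everything wrapped inside it."""
    def factory(get_response: Handler) -> Handler:
        def middleware(request: str) -> str:
            start = time.perf_counter()
            response = get_response(request)
            timings.append((name, time.perf_counter() - start))
            return response
        return middleware
    return factory


def slow_middleware(get_response: Handler) -> Handler:
    """Stands in for any ordinary middleware that adds about 10 ms of work."""
    def middleware(request: str) -> str:
        time.sleep(0.01)
        return get_response(request)
    return middleware


def build_handler(factories: List[Callable[[Handler], Handler]], view: Handler) -> Handler:
    # Django wraps the MIDDLEWARE list from the last entry to the first,
    # so the first entry ends up as the outermost layer.
    handler = view
    for factory in reversed(factories):
        handler = factory(handler)
    return handler


timings: Timings = []
handler = build_handler(
    [
        timing_middleware("before", timings),  # listed first -> outermost
        slow_middleware,
        timing_middleware("after", timings),   # listed last -> innermost, next to the view
    ],
    view=lambda request: f"200 OK for {request}",
)
print(handler("/walk/start"))
print(timings)  # "after" is recorded first; "before" also includes the 10 ms middleware
```

The real middleware updates counters and histograms instead of appending to a list, but the wrapping order is the reason PrometheusBeforeMiddleware goes first and PrometheusAfterMiddleware goes last.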
Add the URL configuration to urls.py:
```python
# urls.py
from django.urls import include, path

urlpatterns = [
    # ...
    path('', include('django_prometheus.urls')),
]
```
We now have a basic app configured and ready to be instrumented.
Because django-prometheus works out of the box, we can immediately track some basic model operations, such as inserts and deletes. These show up at the /metrics endpoint:
Figure: default metrics provided by django-prometheus
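The /metrics endpoint returns Prometheus's text exposition format: `# HELP` and `# TYPE` comment lines followed by one `name{labels} value` sample per line. For checking it by hand, a small stdlib helper like the one below can be handy. It is a sketch: the dev-server URL is an assumption, and the parser skips comments, ignores optional timestamps, and does not handle label values containing spaces or escaped quotes.

```python
import urllib.request
from typing import Dict


def parse_exposition(text: str) -> Dict[str, float]:
    """Map each sample 'name{labels} value [timestamp]' to its value.

    Simplified: comment lines (# HELP / # TYPE) are skipped, a trailing
    timestamp is ignored, and label values containing spaces or escaped
    quotes are not supported.
    """
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, value = line.split()[:2]
        samples[name] = float(value)  # float() also accepts +Inf and NaN
    return samples


def fetch_metrics(url: str = "http://localhost:8000/metrics") -> Dict[str, float]:
    # Assumed URL: Django's development server (manage.py runserver) on its default port.
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_exposition(resp.read().decode("utf-8"))


# Illustrative input in the same format (values made up).
SAMPLE = """\
# HELP walks_started_total number of walks started
# TYPE walks_started_total counter
walks_started_total 3.0
"""
print(parse_exposition(SAMPLE))
```

Against a running dev server, `{k: v for k, v in fetch_metrics().items() if k.startswith("django_")}` narrows the output to the metrics django-prometheus exports.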
Let's make things a bit more interesting.
Add a walker/metrics.py file that defines some basic metrics to track.
```python
# walker/metrics.py
from prometheus_client import Counter, Histogram

walks_started = Counter('walks_started', 'number of walks started')
walks_completed = Counter('walks_completed', 'number of walks completed')
invalid_walks = Counter('invalid_walks', 'number of walks attempted to be started, but invalid')

walk_distance = Histogram('walk_distance', 'distribution of distance walked', buckets=[0, 50, 200, 400, 800, 1600, 3200])
```
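Pretty simple, right? The Prometheus documentation does a good job of explaining what each metric type is for. In short, we use counters for metrics that only ever increase over time, and histograms for metrics whose values we want to track as a distribution. Now let's instrument the application code.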
```python
# walker/views.py
...
from walker import metrics
...


class CompleteWalkView(WalkDetailsView):
    ...

    def post(self, request, walk_id=None, **kwargs):
        ...
        if form.is_valid():
            updated_walk = form.save(commit=False)
            updated_walk.end_time = now()
            updated_walk.save()

            metrics.walks_completed.inc()
            metrics.walk_distance.observe(updated_walk.distance)

            return redirect(f'{reverse("walk_start")}?walk={walk.id}')

        return HttpResponseBadRequest(content=f'form validation failed with errors {form.errors}')

...


class StartWalkView(View):
    ...

    def post(self, request):
        form = forms.StartWalkForm(data=request.POST)

        if form.is_valid():
            walk = form.save(commit=False)
            walk.start_time = now()
            walk.save()

            metrics.walks_started.inc()

            return redirect(f'{reverse("walk_start")}?walk={walk.id}')

        metrics.invalid_walks.inc()

        return HttpResponseBadRequest(content=f'form validation failed with errors {form.errors}')
```
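In CompleteWalkView, `metrics.walk_distance.observe(updated_walk.distance)` feeds the histogram defined in walker/metrics.py. To make its output easier to read, here is a stdlib-only toy (an illustration, not prometheus_client's implementation) that mimics Prometheus histogram semantics with the same bucket bounds: each `le` bucket counts observations less than or equal to its bound, exposed counts are cumulative, a final `+Inf` bucket catches everything, and `_sum`/`_count` track the totals. The three distances are made-up sample walks.

```python
import bisect
import math
from typing import Dict, List


class ToyHistogram:
    """Illustrative stand-in for a Prometheus histogram (not prometheus_client)."""

    def __init__(self, buckets: List[float]):
        # Prometheus always appends a final +Inf bucket; mimic that here.
        self.bounds = sorted(buckets) + [math.inf]
        self.counts = [0] * len(self.bounds)
        self.total = 0.0
        self.n = 0

    def observe(self, value: float) -> None:
        # First upper bound >= value ("le" means less than or equal).
        idx = bisect.bisect_left(self.bounds, value)
        self.counts[idx] += 1
        self.total += value
        self.n += 1

    def snapshot(self) -> Dict[str, float]:
        # Exposed bucket counts are cumulative: each "le" bucket also
        # includes every observation from the smaller buckets.
        out: Dict[str, float] = {}
        running = 0
        for bound, count in zip(self.bounds, self.counts):
            running += count
            label = "+Inf" if math.isinf(bound) else str(float(bound))
            out[f'walk_distance_bucket{{le="{label}"}}'] = running
        out["walk_distance_sum"] = self.total
        out["walk_distance_count"] = self.n
        return out


h = ToyHistogram([0, 50, 200, 400, 800, 1600, 3200])
for meters in (120, 900, 5000):  # made-up walk distances
    h.observe(meters)
for name, value in h.snapshot().items():
    print(name, value)
```

A 5,000 m walk lands only in the `+Inf` bucket, so if long walks turn out to be common, the upper bounds would be worth revisiting.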
Send a few sample requests and you'll see the new metrics appear.
Figure: metrics showing walk distance and walks started
Figure: the custom metrics can now be queried in Prometheus
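At this point we have added custom metrics to the code, instrumented the application to track them, and verified that they are updated and available at /metrics. Next, let's deploy the instrumented application to a Kubernetes cluster.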
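I'll only show the configuration related to tracking and exporting metrics; the complete Helm chart deployment and service configuration can be found in the demo app. As a starting point, here is the deployment and configmap configuration relevant to exporting metrics: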
```yaml
# helm/demo/templates/nginx-conf-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "demo.fullname" . }}-nginx-conf
  ...
data:
  demo.conf: |
    upstream app_server {
      server 127.0.0.1:8000 fail_timeout=0;
    }

    server {
      listen 80;
      client_max_body_size 4G;

      # set the correct host(s) for your site
      server_name{{ range .Values.ingress.hosts }} {{ . }}{{- end }};

      keepalive_timeout 5;

      root /code/static;

      location / {
        # checks for static file, if not found proxy to app
        try_files $uri @proxy_to_app;
      }

      location ^~ /metrics {
        auth_basic "Metrics";
        auth_basic_user_file /etc/nginx/secrets/.htpasswd;

        proxy_pass http://app_server;
      }

      location @proxy_to_app {
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header Host $http_host;
        # we don't want nginx trying to do something clever with
        # redirects, we set the Host: header above already.
        proxy_redirect off;
        proxy_pass http://app_server;
      }
    }
```
```yaml
# helm/demo/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
...
spec:
  ...
  template:
    metadata:
      labels:
        app.kubernetes.io/name: {{ include "demo.name" . }}
        app.kubernetes.io/instance: {{ .Release.Name }}
        app: {{ include "demo.name" . }}
    spec:
      volumes:
        ...
        - name: nginx-conf
          configMap:
            name: {{ include "demo.fullname" . }}-nginx-conf
        - name: prometheus-auth
          secret:
            secretName: prometheus-basic-auth
        ...
      containers:
        - name: {{ .Chart.Name }}-nginx
          image: "{{ .Values.nginx.image.repository }}:{{ .Values.nginx.image.tag }}"
          imagePullPolicy: IfNotPresent
          volumeMounts:
            ...
            - name: nginx-conf
              mountPath: /etc/nginx/conf.d/
            # A secret volume is mounted as a directory; this assumes the secret
            # stores the password file under the key ".htpasswd", so nginx
            # finds it at /etc/nginx/secrets/.htpasswd.
            - name: prometheus-auth
              mountPath: /etc/nginx/secrets/
          ports:
            - name: http
              containerPort: 80
              protocol: TCP
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          command: ["gunicorn", "--worker-class", "gthread", "--threads", "3", "--bind", "0.0.0.0:8000", "demo.wsgi:application"]
          env: {{ include "demo.env" . | nindent 12 }}
          ports:
            - name: gunicorn
              containerPort: 8000
              protocol: TCP
        ...
```
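nginx checks requests to /metrics against the file named by auth_basic_user_file, which the deployment supplies from the prometheus-basic-auth secret. If Apache's htpasswd tool isn't at hand, the sketch below produces an entry with Python's standard library using the salted SHA-1 (`{SSHA}`) scheme, one of the `{scheme}data` formats nginx's auth_basic module accepts, and also shows the `Authorization` header a client sends for the same credentials. The user name and password `prometheus`/`prometheus` are placeholders; they must match whatever credentials the Prometheus scrape job uses.

```python
import base64
import hashlib
import os


def htpasswd_ssha_line(user: str, password: str, salt: bytes = b"") -> str:
    """Return a 'user:{SSHA}...' line for an nginx auth_basic_user_file.

    {SSHA} = base64(SHA1(password + salt) + salt).
    """
    salt = salt or os.urandom(8)  # random salt unless one is supplied
    digest = hashlib.sha1(password.encode("utf-8") + salt).digest()
    return f"{user}:{{SSHA}}" + base64.b64encode(digest + salt).decode("ascii")


def check_ssha_line(line: str, password: str) -> bool:
    # Verify the way a server would: the salt is whatever follows the 20-byte SHA-1 digest.
    _, encoded = line.split(":{SSHA}", 1)
    raw = base64.b64decode(encoded)
    digest, salt = raw[:20], raw[20:]
    return hashlib.sha1(password.encode("utf-8") + salt).digest() == digest


def basic_auth_header(user: str, password: str) -> str:
    # The header a basic-auth client (Prometheus, curl -u, a browser) sends.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


line = htpasswd_ssha_line("prometheus", "prometheus")  # placeholder credentials
print(line)
print(basic_auth_header("prometheus", "prometheus"))
```

One way to get the line into the cluster is to save it to a file named `.htpasswd` and run `kubectl create secret generic prometheus-basic-auth --from-file=.htpasswd`; with `--from-file`, the file name becomes the secret key, and therefore the file name inside the mounted volume.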
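Nothing magical, just some YAML. Two points are worth highlighting:

1. /metrics sits behind authentication: the auth_basic directives are set on its location block.
2. You would probably want to run gunicorn behind a reverse proxy anyway, and doing so here brings the added benefit of protecting the metrics.

With the help of Helm, deploying Prometheus is straightforward and takes no extra work: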
```shell
helm upgrade --install prometheus stable/prometheus
```
After a few minutes, you should be able to port-forward into the Prometheus pod (the default container port is 9090).
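To script the port-forward rather than type it each time, a sketch like the one below builds the kubectl commands. The label selector `app=prometheus,component=server` is an assumption based on the older stable/prometheus chart's install notes; other chart versions may label the server pod differently, so check with `kubectl get pods --show-labels` first. The `default` namespace is also assumed.

```python
import subprocess
from typing import List

# Assumption: label selector of the older stable/prometheus chart's server pod.
DEFAULT_SELECTOR = "app=prometheus,component=server"


def find_pod_cmd(namespace: str = "default", selector: str = DEFAULT_SELECTOR) -> List[str]:
    # Print the name of the first pod matching the selector.
    return ["kubectl", "get", "pods", "--namespace", namespace, "-l", selector,
            "-o", "jsonpath={.items[0].metadata.name}"]


def port_forward_cmd(pod: str, namespace: str = "default", port: int = 9090) -> List[str]:
    # Forward localhost:<port> to the same port on the pod.
    return ["kubectl", "--namespace", namespace, "port-forward", pod, str(port)]


def start_port_forward(namespace: str = "default") -> subprocess.Popen:
    """Look up the Prometheus server pod and start a background port-forward."""
    pod = subprocess.run(find_pod_cmd(namespace), check=True,
                         capture_output=True, text=True).stdout.strip()
    # The forward keeps running until the returned process is terminated.
    return subprocess.Popen(port_forward_cmd(pod, namespace))
```

`start_port_forward()` returns a running process; call `.terminate()` on it when you are done.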
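The Prometheus Helm chart has plenty of customization options, but we only need to set extraScrapeConfigs. Create a values.yaml file with the following contents (or skip this step and use the demo app's file as a reference):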
```yaml
extraScrapeConfigs: |
  - job_name: demo
    scrape_interval: 5s
    metrics_path: /metrics
    basic_auth:
      username: prometheus
      password: prometheus
    tls_config:
      insecure_skip_verify: true
    kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names:
            - default
    relabel_configs:
      - source_labels: [__meta_kubernetes_service_label_app]
        regex: demo
        action: keep
      - source_labels: [__meta_kubernetes_endpoint_port_name]
        regex: http
        action: keep
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
      - source_labels: [__meta_kubernetes_service_name]
        target_label: service
      - source_labels: [__meta_kubernetes_service_name]
        target_label: job
      - target_label: endpoint
        replacement: http
```
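The relabel_configs are what narrow Kubernetes endpoint discovery down to the demo app. The toy below is stdlib only and implements just the `keep` and `replace` actions (Prometheus supports more); it runs two hypothetical discovered endpoints with made-up label values through the same rules. One has a service port named `http` (nginx), the other a port named `gunicorn`, assuming the service also exposed that port. Only the first survives, so Prometheus scrapes through nginx and its basic auth rather than hitting gunicorn directly.

```python
import re
from typing import Dict, List, Optional

Labels = Dict[str, str]


def relabel(labels: Labels, rules: List[dict]) -> Optional[Labels]:
    """Toy relabel_configs: only 'keep' and 'replace'. None means the target is dropped."""
    out = dict(labels)
    for rule in rules:
        action = rule.get("action", "replace")
        # Source label values are joined with ';' (the default separator).
        value = ";".join(out.get(name, "") for name in rule.get("source_labels", []))
        # Relabel regexes are fully anchored, hence fullmatch.
        match = re.fullmatch(rule.get("regex", "(.*)"), value)
        if action == "keep":
            if match is None:
                return None
        elif action == "replace" and match is not None:
            # Translate Prometheus-style $1 references into re's \1 syntax.
            template = re.sub(r"\$(\d+)", r"\\\1", rule.get("replacement", "$1"))
            result = match.expand(template)
            if result:
                out[rule["target_label"]] = result
            else:
                out.pop(rule["target_label"], None)
    # Discovery metadata labels are discarded once relabeling is done.
    return {k: v for k, v in out.items() if not k.startswith("__meta_")}


# The rules from values.yaml, transcribed as dicts.
RULES = [
    {"source_labels": ["__meta_kubernetes_service_label_app"], "regex": "demo", "action": "keep"},
    {"source_labels": ["__meta_kubernetes_endpoint_port_name"], "regex": "http", "action": "keep"},
    {"source_labels": ["__meta_kubernetes_namespace"], "target_label": "namespace"},
    {"source_labels": ["__meta_kubernetes_pod_name"], "target_label": "pod"},
    {"source_labels": ["__meta_kubernetes_service_name"], "target_label": "service"},
    {"source_labels": ["__meta_kubernetes_service_name"], "target_label": "job"},
    {"target_label": "endpoint", "replacement": "http"},
]

# Hypothetical discovered endpoints (pod name is made up).
nginx_port = {
    "__meta_kubernetes_service_label_app": "demo",
    "__meta_kubernetes_endpoint_port_name": "http",
    "__meta_kubernetes_namespace": "default",
    "__meta_kubernetes_pod_name": "demo-7d9f-abcde",
    "__meta_kubernetes_service_name": "demo",
}
gunicorn_port = dict(nginx_port, __meta_kubernetes_endpoint_port_name="gunicorn")

print(relabel(nginx_port, RULES))
print(relabel(gunicorn_port, RULES))  # dropped by the port-name keep rule
```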
Once values.yaml is in place, update the Prometheus release's configuration with:
```shell
helm upgrade --install prometheus stable/prometheus -f values.yaml
```
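To verify that everything is configured correctly, open http://localhost:9090/targets in your browser (assuming you have port-forwarded into the pod running Prometheus). If the demo app appears in the list of targets, everything is working.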
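The same check can be scripted against Prometheus's HTTP API: its `/api/v1/targets` endpoint lists active targets with their labels, scrape URL, health and last error. This sketch assumes the port-forward to `localhost:9090` is running. The `job` value to look for depends on your service name, because the relabel rules copy the service name into `job`; `"demo"` below is only a default.

```python
import json
import urllib.request
from typing import Dict, List


def demo_target_health(payload: dict, job: str = "demo") -> List[Dict[str, str]]:
    """Summarise the active targets of one job from a /api/v1/targets response."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return [
        {
            "url": t.get("scrapeUrl", ""),
            "health": t.get("health", ""),
            "error": t.get("lastError", ""),
        }
        for t in targets
        if t.get("labels", {}).get("job") == job
    ]


def fetch_targets(base_url: str = "http://localhost:9090") -> dict:
    # Assumes a port-forward to the Prometheus server pod on localhost:9090.
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)
```

For example, `print(demo_target_health(fetch_targets(), job="demo"))`. A health of `"up"` with an empty error means scrapes are succeeding; an HTTP 401 in the error usually points at mismatched basic-auth credentials between values.yaml and the `.htpasswd` file.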
I want to stress one point: capturing custom application metrics and setting up the corresponding reporting and monitoring is one of the most important tasks in software engineering. Fortunately, as this article has shown, integrating Prometheus metrics into a Django application is actually very simple. If you want to start instrumenting your own application, refer to the complete demo application or simply fork the repository. Have fun.