Kalman-Filter Multi-Object Tracking Against a Static Background
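I have recently been studying multi-object tracking and read through the MathWorks documentation on Motion-Based Multiple Object Tracking.
The program comes from MATLAB's CV toolbox, the Computer Vision System Toolbox. The method detects and tracks multiple objects against a static background.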
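The program can be divided into two parts: 1. detecting the moving objects in every frame;
2. associating, in real time, the detected regions that belong to the same object.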
The detection part uses background subtraction based on a Gaussian mixture model.
Reference link: http://blog.pluskid.org/?p=39
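The so-called single Gaussian model uses the probability density of a multivariate Gaussian distribution for pattern classification:
N(X; μ, Σ) = (2π)^(-d/2) |Σ|^(-1/2) exp(-(1/2) (X-μ)^T Σ^(-1) (X-μ))
where μ is replaced by the mean of the training samples, Σ by the sample covariance, and X is a d-dimensional sample vector. The Gaussian formula then gives the probability that a sample belongs to class C (the positive or the negative class).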
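As a minimal sketch of the single Gaussian model, the snippet below estimates μ and Σ from a handful of made-up 2-D training samples and evaluates the multivariate Gaussian density at one query point. The data and the variable names (`Xtrain`, `xq`, `delta`) are my own illustration and are not part of the MathWorks demo; only base MATLAB functions are used.

```matlab
% Sketch: fit a single d-dimensional Gaussian and score one sample.
% All names and numbers here are illustrative assumptions.
Xtrain = [1 2; 2 3; 3 5; 4 4; 5 6];   % five training samples, one per row (d = 2)
mu     = mean(Xtrain, 1);             % sample mean, 1-by-d
Sigma  = cov(Xtrain);                 % sample covariance, d-by-d
d      = numel(mu);

xq    = [4 5];                        % query sample
delta = xq - mu;                      % deviation from the mean

% Gaussian density: (2*pi)^(-d/2) * |Sigma|^(-1/2) * exp(-0.5 * delta * inv(Sigma) * delta')
p = exp(-0.5 * (delta / Sigma) * delta') / sqrt((2*pi)^d * det(Sigma));

% The density peaks at the mean, so p must be below this value.
pPeak = 1 / sqrt((2*pi)^d * det(Sigma));
```

For classification one would fit one such Gaussian per class and compare the resulting densities.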
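A Gaussian mixture model (GMM), by contrast, assumes that the data are generated from several Gaussian distributions. Each GMM is a linear superposition of K Gaussians.
P(x) = Σ_k p(k) * p(x|k), that is, a weighted sum of the individual Gaussians (the larger the weight p(k), the more likely a data point belongs to Gaussian k).
In practice we work the other way round: given the data, we estimate the parameters of the GMM, which here means training a suitable GMM for the image.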
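To make the weighted sum concrete, here is a small sketch that evaluates a 1-D mixture of K = 3 Gaussians at a single pixel value and also computes how much each component contributes. The weights, means and standard deviations are invented for illustration; this only evaluates a mixture whose parameters are already known and does not perform the parameter estimation itself.

```matlab
% Sketch: evaluate P(x) = sum_k p(k) * p(x|k) for a 1-D mixture, K = 3.
% Parameter values are illustrative assumptions, not taken from the demo.
w     = [0.5 0.3 0.2];     % mixing weights p(k), summing to 1
mu    = [100 150 200];     % component means (think of grey levels)
sigma = [10 15 20];        % component standard deviations

x  = 120;                  % pixel value to score
pk = exp(-0.5 * ((x - mu) ./ sigma).^2) ./ (sqrt(2*pi) * sigma);   % p(x|k) for each k
px = sum(w .* pk);         % mixture density P(x)

% Posterior share of each component for this x (sums to 1).
resp = (w .* pk) / px;
```

The vector `resp` shows which Gaussian best explains the pixel once both the weight and the fit are taken into account.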
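For foreground detection, we take a stretch of static background (about 50 frames; the demo below trains on the first 40) to estimate the GMM parameters and build the background model. The classification threshold is 0.7 in the official example, and empirically it can be tuned within 0.7-0.75. This step separates foreground from background and outputs a binary foreground mask.
Morphological operations are then applied, and a blob-analysis function returns the centroids and bboxes of the moving regions, which completes the foreground detection part.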
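The last step, turning a binary mask into a centroid and a bbox, can be sketched in base MATLAB for the simple case of a single blob. The toy mask is made up, and the [x y width height] layout is chosen to match the bounding boxes the demo works with. With several blobs in one mask, connected-component labelling is needed first, which is what the demo's `vision.BlobAnalysis` object takes care of.

```matlab
% Sketch: centroid and bounding box of ONE blob in a binary mask.
% The mask is a toy example; real masks come from the foreground detector.
mask = false(8, 10);
mask(3:5, 4:8) = true;                 % a 3-by-5 rectangular blob

[rows, cols] = find(mask);             % pixel coordinates of the foreground
centroid = [mean(cols), mean(rows)];   % [x y]
bbox = [min(cols), min(rows), ...      % [x y width height], x/y = top-left corner
        max(cols) - min(cols) + 1, ...
        max(rows) - min(rows) + 1];
```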
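The tracking part uses the Kalman filter, a linear estimation algorithm that can relate the bboxes in successive frames to one another.
Tracking distinguishes five cases: 1. a new object appears; 2. an object is matched; 3. an object is occluded; 4. objects split apart; 5. an object disappears.
I will not reproduce the theory of the Kalman filter here; there is plenty about it online.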
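State equation: X(k+1) = A(k+1,k) X(k) + w(k), where X(k) = [x(k), y(k), w(k), h(k), v(k)]; x, y, w and h are the bbox's horizontal and vertical coordinates, its length and its width.
Observation equation: Z(k) = H(k) X(k) + v(k), where the noise terms w(k) and v(k) are uncorrelated Gaussian white noise. Note that these two noise terms are distinct from the w and v entries of the state vector, even though they share the same letters.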
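One concrete choice of A and H, purely as an illustration: the demo below configures its filter with the 'ConstantVelocity' motion model on the centroid alone, so a natural reduced state is position plus velocity. The ordering of the state entries and the time step of one frame are my assumptions here, not necessarily what the toolbox uses internally.

```latex
X(k) = \begin{bmatrix} x(k) \\ y(k) \\ \dot{x}(k) \\ \dot{y}(k) \end{bmatrix},
\qquad
A = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix},
\qquad
H = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}

X(k+1) = A\,X(k) + w(k), \qquad Z(k) = H\,X(k) + v(k)
```

With this choice the prediction simply adds the velocity to the position, and the observation picks out the centroid coordinates.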
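With the observation and state equations defined, the Kalman filter can be used to track a moving object. The steps are as follows:
1) Compute the feature information of the moving object (its centroid and its bounding rectangle).
2) Initialise the Kalman filter with that feature information (at the start it can be initialised to 0).
3) Use the Kalman filter to predict the object's region in the next frame; when the next frame arrives, match the object within the predicted region.
4) If the match succeeds, update the Kalman filter.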
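The steps above can be sketched for a single object with a hand-written constant-velocity Kalman filter in base MATLAB. This is a sketch under stated assumptions, not the toolbox's `vision.KalmanFilter`: the state layout [x; y; vx; vy], the noise covariances `Q` and `R`, the initial covariance and the measurement sequence are all values I picked for illustration, and every detection is assumed to have been matched already.

```matlab
% Sketch: constant-velocity Kalman filter following one centroid.
% State layout, Q, R, P0 and the measurements are illustrative assumptions.
A = [1 0 1 0; 0 1 0 1; 0 0 1 0; 0 0 0 1];   % state transition, time step = 1 frame
H = [1 0 0 0; 0 1 0 0];                     % only the position is observed
Q = 0.01 * eye(4);                          % process noise covariance
R = 4 * eye(2);                             % measurement noise covariance

z = [10 20; 12 21; 14 22; 16 23; 18 24]';   % step 1: detected centroids, one per column

x = [z(:, 1); 0; 0];                        % step 2: initialise from the first detection
P = 100 * eye(4);                           % large initial uncertainty

for k = 2:size(z, 2)
    % step 3: predict where the object should be in this frame
    x = A * x;
    P = A * P * A' + Q;

    % step 4: the detection was matched, so correct the prediction with it
    K = P * H' / (H * P * H' + R);          % Kalman gain
    x = x + K * (z(:, k) - H * x);
    P = (eye(4) - K * H) * P;
end
```

After a few frames the estimated velocity in `x(3:4)` settles close to the true displacement of 2 and 1 pixels per frame, which is what makes the prediction in step 3 useful for matching.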
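Matching uses the Hungarian algorithm, which is introduced well here: http://blog.csdn.net/pi9nc/article/details/11848327
Here the Hungarian algorithm assigns the moving objects detected in the new frame to their corresponding tracks. It does so by minimising the sum of the Euclidean distances between the centroids predicted by the Kalman filter and the detected centroids.
It can be divided into two steps:
1. Compute the cost matrix, of size [M N], where M is the number of tracks and N is the number of detected moving objects.
2. Solve the assignment problem for the cost matrix.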
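The two steps can be illustrated on a tiny example. To keep the snippet short it does not implement the Hungarian algorithm: it builds the Euclidean cost matrix and then finds the minimum-cost assignment by brute force over all orderings, which gives the same optimum but is only feasible for a handful of objects. It also assumes M <= N and plain Euclidean cost, whereas the demo's cost comes from the filter's `distance` method and its solver additionally takes a cost of non-assignment. The coordinates and variable names are made up.

```matlab
% Sketch: cost matrix plus brute-force optimal assignment (NOT the Hungarian
% algorithm; same optimum, exponential cost). Assumes M <= N.
predicted  = [10 10; 50 50];              % M = 2 predicted track centroids
detections = [52 49; 11 12; 90 90];       % N = 3 detected centroids
M = size(predicted, 1);
N = size(detections, 1);

% Step 1: cost(i, j) = Euclidean distance between track i and detection j
cost = zeros(M, N);
for i = 1:M
    delta = bsxfun(@minus, detections, predicted(i, :));
    cost(i, :) = sqrt(sum(delta.^2, 2))';
end

% Step 2: try every way of giving each track a distinct detection
candidates = perms(1:N);
candidates = candidates(:, 1:M);          % track i takes detection candidates(r, i)
best = inf;
assignment = [];
for r = 1:size(candidates, 1)
    idx   = sub2ind([M N], 1:M, candidates(r, :));
    total = sum(cost(idx));
    if total < best
        best = total;
        assignment = candidates(r, :);
    end
end
unassignedDetections = setdiff(1:N, assignment);   % candidates for new tracks
```

Here track 1 takes detection 2 and track 2 takes detection 1, while detection 3 is left over and would start a new track.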
That is the main idea. Below is the MATLAB demo, which you can run yourself.
function multiObjectTracking()

% create system objects used for reading video, detecting moving objects,
% and displaying the results
obj = setupSystemObjects(); % initialisation function

tracks = initializeTracks(); % create an empty array of tracks (initialise the track objects)

nextId = 1; % ID of the next track

% detect moving objects, and track them across video frames
while ~isDone(obj.reader)
    frame = readFrame(); % read one frame
    [centroids, bboxes, mask] = detectObjects(frame); % foreground detection
    predictNewLocationsOfTracks(); % Kalman prediction from the previous positions
    [assignments, unassignedTracks, unassignedDetections] = ...
        detectionToTrackAssignment(); % matching with the Hungarian algorithm

    updateAssignedTracks(); % update the assigned tracks
    updateUnassignedTracks(); % update the unassigned tracks
    deleteLostTracks(); % delete lost tracks
    createNewTracks(); % create new tracks

    displayTrackingResults(); % display the results
end


%% Create System Objects
% Create System objects used for reading the video frames, detecting
% foreground objects, and displaying results.

    function obj = setupSystemObjects()
        % Initialize Video I/O
        % Create objects for reading a video from a file, drawing the tracked
        % objects in each frame, and playing the video.

        % create a video file reader
        obj.reader = vision.VideoFileReader('atrium.avi'); % read the video

        % create two video players, one to display the video,
        % and one to display the foreground mask
        obj.videoPlayer = vision.VideoPlayer('Position', [20, 400, 700, 400]); % create two windows
        obj.maskPlayer = vision.VideoPlayer('Position', [740, 400, 700, 400]);

        % Create system objects for foreground detection and blob analysis

        % The foreground detector is used to segment moving objects from
        % the background. It outputs a binary mask, where the pixel value
        % of 1 corresponds to the foreground and the value of 0 corresponds
        % to the background.

        obj.detector = vision.ForegroundDetector('NumGaussians', 3, ... % GMM foreground detection: 3 Gaussians, first 40 frames train the background, threshold 0.7
            'NumTrainingFrames', 40, 'MinimumBackgroundRatio', 0.7);

        % Connected groups of foreground pixels are likely to correspond to moving
        % objects. The blob analysis system object is used to find such groups
        % (called 'blobs' or 'connected components'), and compute their
        % characteristics, such as area, centroid, and the bounding box.

        obj.blobAnalyser = vision.BlobAnalysis('BoundingBoxOutputPort', true, ... % output centroids and bounding rectangles
            'AreaOutputPort', true, 'CentroidOutputPort', true, ...
            'MinimumBlobArea', 400);
    end

%% Initialize Tracks
% The |initializeTracks| function creates an array of tracks, where each
% track is a structure representing a moving object in the video. The
% purpose of the structure is to maintain the state of a tracked object.
% The state consists of information used for detection to track assignment,
% track termination, and display.
%
% The structure contains the following fields:
%
% * |id| :                  the integer ID of the track
% * |bbox| :                the current bounding box of the object; used
%                           for display
% * |kalmanFilter| :        a Kalman filter object used for motion-based
%                           tracking
% * |age| :                 the number of frames since the track was first
%                           detected
% * |totalVisibleCount| :   the total number of frames in which the track
%                           was detected (visible)
% * |consecutiveInvisibleCount| : the number of consecutive frames for
%                                  which the track was not detected (invisible).
%
% Noisy detections tend to result in short-lived tracks. For this reason,
% the example only displays an object after it was tracked for some number
% of frames. This happens when |totalVisibleCount| exceeds a specified
% threshold.
%
% When no detections are associated with a track for several consecutive
% frames, the example assumes that the object has left the field of view
% and deletes the track. This happens when |consecutiveInvisibleCount|
% exceeds a specified threshold. A track may also get deleted as noise if
% it was tracked for a short time, and marked invisible for most of the
% frames.

    function tracks = initializeTracks()
        % create an empty array of tracks
        tracks = struct(...
            'id', {}, ... % track ID
            'bbox', {}, ... % bounding rectangle
            'kalmanFilter', {}, ... % the track's Kalman filter
            'age', {}, ... % total number of frames since first detection
            'totalVisibleCount', {}, ... % number of frames in which it was visible
            'consecutiveInvisibleCount', {}); % number of consecutive invisible frames
    end

%% Read a Video Frame
% Read the next video frame from the video file.
    function frame = readFrame()
        frame = obj.reader.step(); % call the reader to get the next frame
    end

%% Detect Objects
% The |detectObjects| function returns the centroids and the bounding boxes
% of the detected objects. It also returns the binary mask, which has the
% same size as the input frame. Pixels with a value of 1 correspond to the
% foreground, and pixels with a value of 0 correspond to the background.
%
% The function performs motion segmentation using the foreground detector.
% It then performs morphological operations on the resulting binary mask to
% remove noisy pixels and to fill the holes in the remaining blobs.

    function [centroids, bboxes, mask] = detectObjects(frame)

        % detect foreground
        mask = obj.detector.step(frame);

        % apply morphological operations to remove noise and fill in holes
        mask = imopen(mask, strel('rectangle', [3,3])); % opening
        mask = imclose(mask, strel('rectangle', [15, 15])); % closing
        mask = imfill(mask, 'holes'); % fill holes

        % perform blob analysis to find connected components
        [~, centroids, bboxes] = obj.blobAnalyser.step(mask);
    end

%% Predict New Locations of Existing Tracks
% Use the Kalman filter to predict the centroid of each track in the
% current frame, and update its bounding box accordingly.

    function predictNewLocationsOfTracks()
        for i = 1:length(tracks)
            bbox = tracks(i).bbox;

            % predict the current location of the track
            predictedCentroid = predict(tracks(i).kalmanFilter); % predict the current position from the track so far

            % shift the bounding box so that its center is at
            % the predicted location
            predictedCentroid = int32(predictedCentroid) - bbox(3:4) / 2;
            tracks(i).bbox = [predictedCentroid, bbox(3:4)]; % bounding box moved to the predicted location
        end
    end

%% Assign Detections to Tracks
% Assigning object detections in the current frame to existing tracks is
% done by minimizing cost. The cost is defined as the negative
% log-likelihood of a detection corresponding to a track.
%
% The algorithm involves two steps:
%
% Step 1: Compute the cost of assigning every detection to each track using
% the |distance| method of the |vision.KalmanFilter| System object. The
% cost takes into account the Euclidean distance between the predicted
% centroid of the track and the centroid of the detection. It also includes
% the confidence of the prediction, which is maintained by the Kalman
% filter. The results are stored in an MxN matrix, where M is the number of
% tracks, and N is the number of detections.
%
% Step 2: Solve the assignment problem represented by the cost matrix using
% the |assignDetectionsToTracks| function. The function takes the cost
% matrix and the cost of not assigning any detections to a track.
%
% The value for the cost of not assigning a detection to a track depends on
% the range of values returned by the |distance| method of the
% |vision.KalmanFilter|. This value must be tuned experimentally. Setting
% it too low increases the likelihood of creating a new track, and may
% result in track fragmentation. Setting it too high may result in a single
% track corresponding to a series of separate moving objects.
%
% The |assignDetectionsToTracks| function uses the Munkres' version of the
% Hungarian algorithm to compute an assignment which minimizes the total
% cost. It returns an M x 2 matrix containing the corresponding indices of
% assigned tracks and detections in its two columns. It also returns the
% indices of tracks and detections that remained unassigned.

    function [assignments, unassignedTracks, unassignedDetections] = ...
            detectionToTrackAssignment()

        nTracks = length(tracks);
        nDetections = size(centroids, 1);

        % compute the cost of assigning each detection to each track
        cost = zeros(nTracks, nDetections);
        for i = 1:nTracks
            cost(i, :) = distance(tracks(i).kalmanFilter, centroids); % compute the cost matrix
        end

        % solve the assignment problem
        costOfNonAssignment = 20;
        [assignments, unassignedTracks, unassignedDetections] = ...
            assignDetectionsToTracks(cost, costOfNonAssignment); % matching with the Hungarian algorithm
    end

%% Update Assigned Tracks
% The |updateAssignedTracks| function updates each assigned track with the
% corresponding detection. It calls the |correct| method of
% |vision.KalmanFilter| to correct the location estimate. Next, it stores
% the new bounding box, and increases the age of the track and the total
% visible count by 1. Finally, the function sets the invisible count to 0.

    function updateAssignedTracks()
        numAssignedTracks = size(assignments, 1);
        for i = 1:numAssignedTracks
            trackIdx = assignments(i, 1);
            detectionIdx = assignments(i, 2);
            centroid = centroids(detectionIdx, :);
            bbox = bboxes(detectionIdx, :);

            % correct the estimate of the object's location
            % using the new detection
            correct(tracks(trackIdx).kalmanFilter, centroid);

            % replace predicted bounding box with detected
            % bounding box
            tracks(trackIdx).bbox = bbox;

            % update track's age
            tracks(trackIdx).age = tracks(trackIdx).age + 1;

            % update visibility
            tracks(trackIdx).totalVisibleCount = ...
                tracks(trackIdx).totalVisibleCount + 1;
            tracks(trackIdx).consecutiveInvisibleCount = 0;
        end
    end

%% Update Unassigned Tracks
% Mark each unassigned track as invisible, and increase its age by 1.

    function updateUnassignedTracks()
        for i = 1:length(unassignedTracks)
            ind = unassignedTracks(i);
            tracks(ind).age = tracks(ind).age + 1;
            tracks(ind).consecutiveInvisibleCount = ...
                tracks(ind).consecutiveInvisibleCount + 1;
        end
    end

%% Delete Lost Tracks
% The |deleteLostTracks| function deletes tracks that have been invisible
% for too many consecutive frames. It also deletes recently created tracks
% that have been invisible for too many frames overall.

    function deleteLostTracks()
        if isempty(tracks)
            return;
        end

        invisibleForTooLong = 10;
        ageThreshold = 8;

        % compute the fraction of the track's age for which it was visible
        ages = [tracks(:).age];
        totalVisibleCounts = [tracks(:).totalVisibleCount];
        visibility = totalVisibleCounts ./ ages;

        % find the indices of 'lost' tracks
        lostInds = (ages < ageThreshold & visibility < 0.6) | ...
            [tracks(:).consecutiveInvisibleCount] >= invisibleForTooLong;

        % delete lost tracks
        tracks = tracks(~lostInds);
    end

%% Create New Tracks
% Create new tracks from unassigned detections. Assume that any unassigned
% detection is a start of a new track. In practice, you can use other cues
% to eliminate noisy detections, such as size, location, or appearance.

    function createNewTracks()
        centroids = centroids(unassignedDetections, :);
        bboxes = bboxes(unassignedDetections, :);

        for i = 1:size(centroids, 1)

            centroid = centroids(i,:);
            bbox = bboxes(i, :);

            % create a Kalman filter object
            kalmanFilter = configureKalmanFilter('ConstantVelocity', ...
                centroid, [200, 50], [100, 25], 100);

            % create a new track
            newTrack = struct(...
                'id', nextId, ...
                'bbox', bbox, ...
                'kalmanFilter', kalmanFilter, ...
                'age', 1, ...
                'totalVisibleCount', 1, ...
                'consecutiveInvisibleCount', 0);

            % add it to the array of tracks
            tracks(end + 1) = newTrack;

            % increment the next id
            nextId = nextId + 1;
        end
    end

%% Display Tracking Results
% The |displayTrackingResults| function draws a bounding box and label ID
% for each track on the video frame and the foreground mask. It then
% displays the frame and the mask in their respective video players.

    function displayTrackingResults()
        % convert the frame and the mask to uint8 RGB
        frame = im2uint8(frame);
        mask = uint8(repmat(mask, [1, 1, 3])) .* 255;

        minVisibleCount = 8;
        if ~isempty(tracks)

            % noisy detections tend to result in short-lived tracks
            % only display tracks that have been visible for more than
            % a minimum number of frames.
            reliableTrackInds = ...
                [tracks(:).totalVisibleCount] > minVisibleCount;
            reliableTracks = tracks(reliableTrackInds);

            % display the objects. If an object has not been detected
            % in this frame, display its predicted bounding box.
            if ~isempty(reliableTracks)
                % get bounding boxes
                bboxes = cat(1, reliableTracks.bbox);

                % get ids
                ids = int32([reliableTracks(:).id]);

                % create labels for objects indicating the ones for
                % which we display the predicted rather than the actual
                % location
                labels = cellstr(int2str(ids'));
                predictedTrackInds = ...
                    [reliableTracks(:).consecutiveInvisibleCount] > 0;
                isPredicted = cell(size(labels));
                isPredicted(predictedTrackInds) = {' predicted'};
                labels = strcat(labels, isPredicted);

                % draw on the frame
                frame = insertObjectAnnotation(frame, 'rectangle', ...
                    bboxes, labels);

                % draw on the mask
                mask = insertObjectAnnotation(mask, 'rectangle', ...
                    bboxes, labels);
            end
        end

        % display the mask and the frame
        obj.maskPlayer.step(mask);
        obj.videoPlayer.step(frame);
    end

%% Summary
% This example created a motion-based system for detecting and
% tracking multiple moving objects. Try using a different video to see if
% you are able to detect and track objects. Try modifying the parameters
% for the detection, assignment, and deletion steps.
%
% The tracking in this example was solely based on motion with the
% assumption that all objects move in a straight line with constant speed.
% When the motion of an object significantly deviates from this model, the
% example may produce tracking errors. Notice the mistake in tracking the
% person labeled #12, when he is occluded by the tree.
%
% The likelihood of tracking errors can be reduced by using a more complex
% motion model, such as constant acceleration, or by using multiple Kalman
% filters for every object. Also, you can incorporate other cues for
% associating detections over time, such as size, shape, and color.

displayEndOfDemoMessage(mfilename)
end