Recently there was a short Q&A on the LVM IRC channel about pvmove, which you may find interesting:
7:39:51 AM - arteomp: Hello, is it normal that pvmove runned for ~48 hours (migration of 2Tb lv with ~15Mb/s) was used at the end ~3Gb of mem. Memory usage was growing all the time of pvmove was running
7:40:22 AM - arteomp: can it be memory leak or just normal behavior?
7:40:26 AM - agk: arteomp: what memory?
7:40:29 AM - agk: - what was using it?
7:40:33 AM - arteomp: rss
7:40:37 AM - agk: for what
7:41:02 AM - agk: a process called something?
7:41:21 AM - agk: - which options did you use? what version?
7:41:55 AM - agk: - you should use the polld version
7:41:59 AM - arteomp: root 19461 1.2 0.4 648388 528572 pts/13 S+ 17:50 4:17 \_ pvmove -i 5 -v -n b2eccd07-5615-4a28-8d20-da3593a6e33f /dev/mapper/26551727a4d686b41426776484b4f4f61 /dev/mapper/360425c510058702b606b4cec000000d3
7:42:03 AM - arteomp: it was something like this
7:42:12 AM - agk: so 5 second polling within the process itself
7:42:14 AM - arteomp: but was 3Gb, not 500mb as now
7:42:32 AM - agk: pvmove has lots of different modes of operation
7:42:55 AM - agk: you could try other modes, but really, what version is this
7:43:06 AM - agk: if it's recent, then file a bug
7:43:19 AM - agk: if it's old, don't bother unless you reproduce on a recent version
7:43:20 AM - arteomp: LVM version: 2.02.111(2)-RHEL6 (2014-09-01)
7:43:25 AM - arteomp: Driver version: 4.27.0
7:43:25 AM - agk: that's very old
7:43:59 AM - agk: if the memory growth is a problem, just kill the process (CTRL-c should do)
7:44:02 AM - agk: then restart it
7:44:24 AM - agk: read the man page and you'll see why this is safe
7:44:32 AM - agk: - the state is in kernel plus on disk
7:45:01 AM - agk: the process then just loops - in your case every 5 seconds - asking the kernel "have you finished the part I just asked you to do yet?"
7:45:09 AM - agk: you can kill that safely
7:45:25 AM - agk: and just restart it by running pvmove again
7:45:42 AM - agk: - it'll see one in progress and go straight into the polling loop again
7:46:01 AM - agk: when kernel has finished, userspace commits that bit to disk and moves to the next part and asks the kernel to do that
7:46:23 AM - agk: so yes, 3Gb must be a bug, but it's not important as you can just kill the process and start it again
7:46:42 AM - agk: it survives a reboot in a similar way
7:47:20 AM - agk: the process blocks CTRL-c while it is in the part of the loop that is updating the kernel/disk
7:47:46 AM - agk: so it only exits at a safe place
7:48:53 AM - arteomp: agk: yes, thk you, i know i can kill it safely, just was curious about the reason
7:49:04 AM - arteomp: maybe the reason is 5 sec polling
7:49:12 AM - agk: sounds like a leak in that oop
7:49:13 AM - arteomp: will test later other interval
7:49:14 AM - agk: loop
7:49:31 AM - agk: if you want to find out, recompile it with the valgrind options
7:49:37 AM - agk: and run under valgrind
7:49:53 AM - agk: (or maybe those got added after that version)
7:50:03 AM - agk: (- just options to make it more friendlt)
7:51:50 AM - coughlan has left the room (Quit: Ping timeout: 256 seconds).
7:52:23 AM - arteomp: btw, its possible to use it without polling
7:52:29 AM - arteomp: and monitor progress by:
7:52:32 AM - arteomp: dmsetup status|grep mirr
7:52:40 AM - arteomp: ps1-pvmove0: 0 2097152000 mirror 2 253:5 253:31 713411/2048000 1 AA 1 core
7:54:52 AM - arteomp: also sadly that pvmove doesnt support option to pass "handle_errors" to the dm-mirror
7:56:20 AM - arteomp: it makes you to check dmesg every time after pvmove ended (were or no i/o erros)
8:00:13 AM - arteomp: agk: from man page: "If pvmove is interrupted for any reason (e.g. the machine crashes) then run pvmove again without any PV arguments to restar any operations that were in progress from the last checkpoint"
8:00:51 AM - arteomp: agk: but how often does pvmove this checkpoints? is it correlated with cheking interval?
8:01:23 AM - arteomp: this moment isnt clear at the documentation/man page.
8:04:47 AM - arteomp: yes said: "when kernel has finished, userspace commits that bit to disk and moves to the next part and asks the kernel to do that"
8:05:42 AM - arteomp: but pvmove create dm-mirror with 1024 region_size, so kcopyd use 512K chunk for transfer
8:06:01 AM - arteomp: does userspace commit results every 512k?
8:12:39 AM - arteomp: i feel it must save result not every time "when kernel has finished" but every time when kernel has finished and entire PE was moved but need checking
8:26:13 AM - coughlan [~coughlan@pool-173-76-169-189.bstnma.fios.verizon.net] entered the room.
8:30:50 AM - agk: no, it does it in blocks of contiguous extents
8:31:09 AM - agk: dmsetup status shows the x/y for the bit it is doing
8:31:20 AM - agk: you'll see the numbers grow till they are equal
8:36:40 AM - arteomp: agk: but its status of the segment syncing
8:36:52 AM - arteomp: but what if i have one big segment
8:39:09 AM - arteomp: for example: ps1-pvmove0: 0 2097152000 mirror 2 253:5 253:31 794933/2048000 1 AA 1 core
8:39:23 AM - arteomp: 2097152000*512/1024/1024/1024 = 1000Gb
8:39:43 AM - arteomp: so its one segment with 1000Gb size
8:41:26 AM - arteomp: dm-mirror has type "core", so there are only in-memory log
8:41:46 AM - arteomp: if i will reboot host then the progress 794933/2048000 will not be saved
8:42:07 AM - arteomp: but from which poing will it be started?
8:43:38 AM - arteomp: s/poing/point
9:05:25 AM - agk: don't reboot
9:05:30 AM - agk: just kill the process
9:05:36 AM - agk: it's only polling
9:05:56 AM - agk: if it's too big you can use the :range syntax to move less at once
9:06:05 AM - agk: /dev/blah:0-10000 or whatever
9:08:07 AM - arteomp: agk: im not going to reboot im just about ur quote: "it survives a reboot in a similar way"
9:09:21 AM - arteomp: so i see right now, after reboot it will not be start from last 512K chunk was synced, but from last checkpoint
9:10:33 AM - arteomp: it will start from last checkpoint, is last check point the start of segment?
9:11:15 AM - arteomp: if yes, then as u said its more save to use "/dev/blah:0-10000 in loop"
9:11:25 AM - arteomp: s/save/safe
9:18:06 AM - agk: yes
9:18:27 AM - agk: you can define your own checkpoints by doing multiple pvmoves with the range notation
9:18:36 AM - agk: or send a patch to break it up:)
9:18:48 AM - agk: (would be quite a complicated patch though)
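If you want to check arteomp's arithmetic yourself, here is a minimal Python sketch that parses one line of dmsetup status output for a mirror target. It assumes the layout of the lines quoted in the log: start and length (in 512-byte sectors), the target type, the number of mirror legs, one device per leg, in-sync/total regions, a health-character count and the health characters, then the log status (here "1 core"). MirrorStatus and parse_mirror_status are my own hypothetical names, not part of LVM or device-mapper. Dividing the length by the region count gives the 1024-sector (512 KiB) region size he mentions, and the length itself works out to 1000 GiB.

```python
from dataclasses import dataclass

SECTOR_BYTES = 512  # device-mapper lengths are counted in 512-byte sectors


@dataclass
class MirrorStatus:
    name: str
    length_sectors: int
    devices: list[str]
    in_sync_regions: int
    total_regions: int
    health: str    # one character per mirror leg, "A" = alive
    log_type: str  # "core" means the region log is kept in memory only

    @property
    def size_gib(self) -> float:
        return self.length_sectors * SECTOR_BYTES / 1024**3

    @property
    def region_sectors(self) -> int:
        return self.length_sectors // self.total_regions

    @property
    def percent_in_sync(self) -> float:
        return 100.0 * self.in_sync_regions / self.total_regions


def parse_mirror_status(line: str) -> MirrorStatus:
    """Parse one 'dmsetup status' line for a mirror target (layout assumed, see text)."""
    name, rest = line.split(":", 1)  # split once: device numbers like 253:5 contain ':'
    f = rest.split()
    # f = [start, length, "mirror", n_legs, dev_1..dev_n, "in/total",
    #      n_health_chars, health, n_log_args, log_type, ...]
    if f[2] != "mirror":
        raise ValueError(f"not a mirror target: {f[2]!r}")
    n = int(f[3])
    devices = f[4:4 + n]
    i = 4 + n
    in_sync, total = (int(x) for x in f[i].split("/"))
    return MirrorStatus(
        name=name.strip(),
        length_sectors=int(f[1]),
        devices=devices,
        in_sync_regions=in_sync,
        total_regions=total,
        health=f[i + 2],
        log_type=f[i + 4],
    )


if __name__ == "__main__":
    # Sample line taken from the 8:39:09 message in the log above.
    s = parse_mirror_status(
        "ps1-pvmove0: 0 2097152000 mirror 2 253:5 253:31 794933/2048000 1 AA 1 core"
    )
    print(f"{s.name}: {s.size_gib:.0f} GiB, region {s.region_sectors} sectors "
          f"({s.region_sectors * SECTOR_BYTES // 1024} KiB), "
          f"{s.percent_in_sync:.1f}% in sync, log={s.log_type}")
```

For the 8:39:09 sample this reports 1000 GiB, 1024-sector (512 KiB) regions and about 38.8% in sync. Keep in mind that with a core log this counter only lives in memory, which is why a reboot falls back to the last committed checkpoint rather than the last synced region.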
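agk's suggestion of defining your own checkpoints with the range notation is easy to script. The sketch below only builds the command lines: it splits a physical-extent range into fixed-size chunks and produces one pvmove per chunk using the SourcePV:PE-PE syntax, so each finished run is committed before the next one starts. The device names, the 0-25599 extent range, the chunk size of 10000 and the helper names are all placeholders I made up; find out where your LV actually lives (for example with pvdisplay -m) before using anything like this. Actually running the commands is switched off by default via DRY_RUN.

```python
import shlex
import subprocess
from collections.abc import Iterator
from typing import Optional

DRY_RUN = True  # set to False only after checking the printed commands


def pe_ranges(first_pe: int, last_pe: int, chunk: int) -> Iterator[tuple[int, int]]:
    """Yield inclusive (start, end) extent ranges that cover first_pe..last_pe."""
    if chunk <= 0:
        raise ValueError("chunk must be positive")
    start = first_pe
    while start <= last_pe:
        end = min(start + chunk - 1, last_pe)
        yield start, end
        start = end + 1


def pvmove_commands(src_pv: str, dst_pv: str, first_pe: int, last_pe: int,
                    chunk: int, lv_name: Optional[str] = None) -> list[list[str]]:
    """Build one pvmove invocation per chunk of extents on the source PV."""
    cmds = []
    for start, end in pe_ranges(first_pe, last_pe, chunk):
        cmd = ["pvmove"]
        if lv_name:
            cmd += ["-n", lv_name]  # restrict the move to one LV, as in the log
        cmd += [f"{src_pv}:{start}-{end}", dst_pv]
        cmds.append(cmd)
    return cmds


if __name__ == "__main__":
    # Hypothetical devices and extent range; replace them with your own.
    for cmd in pvmove_commands("/dev/sdb1", "/dev/sdc1", 0, 25599, 10000):
        print(shlex.join(cmd))
        if not DRY_RUN:
            # Without -b, pvmove stays in the foreground, so this call
            # returns only once the chunk has been moved.
            subprocess.run(cmd, check=True)
```

The trade-off is simply granularity: smaller chunks mean less work to redo after a crash or reboot, at the cost of more separate pvmove runs.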
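As for the original question, a low-tech way to see whether a pvmove process keeps growing is to sample VmRSS from /proc/<pid>/status at intervals. That is the same RSS figure ps printed, in kB: the 528572 in the ps line is roughly 516 MiB, which matches arteomp's "500mb as now". This is a minimal, Linux-only Python sketch with hypothetical helper names; the demo samples its own process, so substitute pvmove's PID. It only shows that memory grows, not why; agk's valgrind suggestion is the way to pin down an actual leak.

```python
import os
import time
from pathlib import Path


def parse_vmrss_kib(status_text: str) -> int:
    """Return VmRSS (in kB, as the kernel reports it) from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Typical line: "VmRSS:\t  528572 kB"
            return int(line.split()[1])
    raise ValueError("no VmRSS line found (kernel thread or exited process?)")


def sample_rss(pid: int, interval_s: float, samples: int) -> list[int]:
    """Read the RSS of pid `samples` times, sleeping interval_s between reads."""
    readings = []
    for i in range(samples):
        if i:
            time.sleep(interval_s)
        text = Path(f"/proc/{pid}/status").read_text()
        readings.append(parse_vmrss_kib(text))
    return readings


if __name__ == "__main__":
    # Demo on this script's own PID; use pvmove's PID (19461 in the ps line above).
    if Path("/proc/self/status").exists():
        for kib in sample_rss(os.getpid(), interval_s=0.5, samples=3):
            print(f"{kib} kB (~{kib / 1024:.0f} MiB)")
```

A steadily rising series over hours of polling, as arteomp saw, is what points at a leak in the polling loop rather than normal working memory.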