#20875 closed defect (fixed)
pdmBlkCacheEvictPagesFrom leaking locks and causing VM deadlocks [FIXED IN SVN]
Reported by: | aaronk | Owned by: | |
---|---|---|---|
Component: | VMM | Version: | VirtualBox 6.1.32 |
Keywords: | deadlock cache | Cc: | |
Guest type: | Linux | Host type: | Linux |
Description
I was experiencing VM I/O deadlocks. I analyzed one instance and I could see there were three threads that looked to have deadlocked:
    Thread 20 (Thread 0x7f23cd9b1700 (LWP 13085)):
    ==============================================
    #0 0x00007f240286ba35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
    #1 0x00007f2401f79458 in RTSemEventWait () from /usr/lib/virtualbox/VBoxRT.so
    #2 0x00007f2401f1ddfa in RTCritSectEnter () from /usr/lib/virtualbox/VBoxRT.so
    #3 0x00007f23f4acdac6 in PDMR3BlkCacheWrite () from /usr/lib/virtualbox/components/VBoxVMM.so

    Thread 19 (Thread 0x7f23cd430700 (LWP 13086)):
    ==============================================
    #0 0x00007f240286b39e in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
    #1 0x00007f2401f739c1 in RTSemRWRequestWrite () from /usr/lib/virtualbox/VBoxRT.so
    #2 0x00007f23f4acbd9b in pdmBlkCacheEvictPagesFrom(PDMBLKCACHEGLOBAL*, unsigned long, PDMBLKLRULIST*, PDMBLKLRULIST*, bool, unsigned char**) [clone .isra.5] () from /usr/lib/virtualbox/components/VBoxVMM.so
    #3 0x00007f23f4acbfe5 in pdmBlkCacheReclaim(PDMBLKCACHEGLOBAL*, unsigned long, bool, unsigned char**) [clone .part.6] [clone .constprop.9] () from /usr/lib/virtualbox/components/VBoxVMM.so
    #4 0x00007f23f4acdbca in PDMR3BlkCacheWrite () from /usr/lib/virtualbox/components/VBoxVMM.so

    Thread 32 (Thread 0x7f23cda32700 (LWP 30856)):
    ==============================================
    #0 0x00007f240286b39e in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
    #1 0x00007f2401f739c1 in RTSemRWRequestWrite () from /usr/lib/virtualbox/VBoxRT.so
    #2 0x00007f23f4ace04e in PDMR3BlkCacheIoXferComplete () from /usr/lib/virtualbox/components/VBoxVMM.so
The first thread (Thread 20) was waiting on a lock on a pCache object, and the 2nd and 3rd threads (Threads 19 and 32) were waiting on pBlkCache->SemRWEntries.
It turns out that Thread 19 was holding the lock on pCache, which I think is why Thread 20 was blocked. The only explanation I could find for why the pBlkCache->SemRWEntries lock was being held was an imbalance in the pdmBlkCacheEvictPagesFrom function. This is the first time I’ve really looked at the VirtualBox source, so I’m not entirely sure, but it seems the imbalance in pdmBlkCacheEvictPagesFrom would only be hit under high contention over VirtualBox’s write cache.
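To illustrate the kind of imbalance described above, here is a minimal, portable C++ sketch. It is not the VirtualBox code: `TrackedLock`, `Entry`, `Cache` and `evictLeaky` are hypothetical names, and a `std::mutex` stands in for the write side of the read/write semaphore. The point is only that a function which takes a lock and releases it on one branch but not the other leaves the lock held whenever the rarely taken branch runs.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical lock wrapper that remembers whether it is held, so the leak
// can be observed without blocking. The flag is only meaningful in this
// single-threaded demonstration.
class TrackedLock {
public:
    void lock()   { m_.lock(); held_ = true; }
    void unlock() { held_ = false; m_.unlock(); }
    bool isHeld() const { return held_; }
private:
    std::mutex m_;
    bool held_ = false;
};

// Hypothetical cache entry: entries with I/O in progress must not be evicted.
struct Entry {
    bool ioInProgress;
};

// Hypothetical cache whose entry list is protected by entriesLock.
struct Cache {
    TrackedLock entriesLock;
    std::vector<Entry> entries;
};

// Leaky eviction: the lock is released only on the branch that evicts.
bool evictLeaky(Cache& cache, std::size_t idx) {
    cache.entriesLock.lock();
    if (!cache.entries[idx].ioInProgress) {
        cache.entries.erase(cache.entries.begin() +
                            static_cast<std::ptrdiff_t>(idx));
        cache.entriesLock.unlock();
        return true;
    }
    // BUG (deliberate): this path returns with entriesLock still held.
    return false;
}
```

In this sketch, calling `evictLeaky` on an entry with `ioInProgress` set returns `false` and leaves `entriesLock.isHeld()` true, so the next caller that tries to take the lock would block indefinitely. That matches the reported symptom of the leak only showing up when an entry becomes busy between the eviction decision and the lock acquisition.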
Here’s the fix I’ve tried and it appears to avoid the deadlocks:
    Index: src/VBox/VMM/VMMR3/PDMBlkCache.cpp
    ===================================================================
    --- src/VBox/VMM/VMMR3/PDMBlkCache.cpp (revision 93537)
    +++ src/VBox/VMM/VMMR3/PDMBlkCache.cpp (working copy)
    @@ -460,7 +460,10 @@
                 RTSemRWReleaseWrite(pBlkCache->SemRWEntries);
                 RTMemFree(pCurr);
             }
    -    }
    +    } else {
    +        LogRel(("Would have left without releasing pBlkCache->SemRWEntries"));
    +        RTSemRWReleaseWrite(pBlkCache->SemRWEntries);
    +    }
     }
     else
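The shape of this change can be sketched in portable C++ as well. Again, this is an illustration under assumptions and not the actual VirtualBox function: `TrackedLock`, `Entry`, `Cache` and `evictBalanced` are hypothetical, and a `std::mutex` stands in for the read/write semaphore. The sketch adds an explicit `else` branch so that every path that acquired the lock also releases it.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical lock wrapper that records whether it is currently held.
// The flag is only meaningful in this single-threaded demonstration.
class TrackedLock {
public:
    void lock()   { m_.lock(); held_ = true; }
    void unlock() { held_ = false; m_.unlock(); }
    bool isHeld() const { return held_; }
private:
    std::mutex m_;
    bool held_ = false;
};

// Hypothetical cache entry: entries with I/O in progress must not be evicted.
struct Entry {
    bool ioInProgress;
};

// Hypothetical cache whose entry list is protected by entriesLock.
struct Cache {
    TrackedLock entriesLock;
    std::vector<Entry> entries;
};

// Balanced eviction: both branches release the lock before returning.
bool evictBalanced(Cache& cache, std::size_t idx) {
    cache.entriesLock.lock();
    if (!cache.entries[idx].ioInProgress) {
        cache.entries.erase(cache.entries.begin() +
                            static_cast<std::ptrdiff_t>(idx));
        cache.entriesLock.unlock();
        return true;
    } else {
        // Counterpart of the added else branch: the entry cannot be
        // evicted, but the lock taken above must still be released.
        cache.entriesLock.unlock();
        return false;
    }
}
```

With this version, `entriesLock.isHeld()` is false after the call regardless of which branch ran. A scope-based guard such as `std::lock_guard` would give the same guarantee in standard C++; the explicit `else` is used here only because it is closer in shape to the patch.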
Change History (2)
comment:1 edited 3 years ago by
Summary: | pdmBlkCacheEvictPagesFrom leaking locks and causing VM deadlocks → pdmBlkCacheEvictPagesFrom leaking locks and causing VM deadlocks [FIXED IN SVN] |
---|
comment:2 edited 3 years ago by
Status: | new → closed |
---|---|
Resolution: | → fixed |
The issue should be fixed in VirtualBox 6.1.34. Closing.
Thanks for debugging and tracking this down. The semaphore handling is certainly out of balance there in one code path. I've committed a similar fix (logging + style differs). May take a little while before this becomes externally visible, though. It will be included in 6.1.36, but it might just have missed the boat for 6.1.34, we'll see...