VirtualBox

Opened 12 years ago

Closed 11 years ago

Last modified 11 years ago

#11610 closed defect (fixed)

BUG: unable to handle kernel paging request

Reported by: csreynolds Owned by:
Component: other Version: VirtualBox 4.2.10
Keywords: kernel Cc:
Guest type: all Host type: Linux

Description

The VM will start up, function for a random amount of time, and then freeze. It has to be killed from the command line.

Attachments (24)

messages (7.2 KB) - added by csreynolds 12 years ago
VBox.log (62.6 KB) - added by csreynolds 12 years ago
virtualbox-4.2.10-2-linux-3.8.5-1-oops.log (4.8 KB) - added by sl4mmy 12 years ago
VBox.log.1 (62.2 KB) - added by sl4mmy 12 years ago
This log is from when I couldn't boot a Windows virtual machine after upgrading my ArchLinux system to virtualbox-4.2.10-2 and linux-3.8.5-1.
VBox.2.log (56.3 KB) - added by sl4mmy 12 years ago
This log is from successfully booting the same Windows virtual machine after downgrading back to virtualbox-4.2.8-1 and linux-3.7.10-1.
fedora-18-oops.txt (2.8 KB) - added by Ronan SALMON 12 years ago
Kernel oops starting a VM on an up-to-date Fedora 18: Linux bureau 3.8.5-201.fc18.i686.PAE #1 SMP Thu Mar 28 21:50:08 UTC 2013 i686 i686 i386 GNU/Linux, VirtualBox-4.2-4.2.10_84104_fedora18-1.i686
fc-18-oops.txt (4.7 KB) - added by csreynolds 12 years ago
Still happening with kernel 3.8.6-203.fc18.x86_64 and VirtualBox 4.2.12. I have tried re-installing the header/devel RPMs and re-running vboxdrv setup to see if it cleared up like frank; same issue.
cpuinfo (13.8 KB) - added by timemaster 12 years ago
cpuinfo of the affected system
cpuinfo.2 (20.8 KB) - added by sl4mmy 12 years ago
Output of /proc/cpuinfo from my host machine (Archlinux, kernel v3.9.2, VirtualBox 4.2.12)
kernel_vboxissue.log (3.4 KB) - added by dboy 12 years ago
Kernel Oops, VirtualBox 4.2.12
virtualbox-4.2.12-3-linux-3.9.3-1-oops.log (5.3 KB) - added by sl4mmy 11 years ago
Here is another kernel log from my system running Linux 3.9.3 and VirtualBox 4.2.12.
VirtualBox-dies.log (84.1 KB) - added by rmflight 11 years ago
rmflight dmesg output
oops.txt (4.8 KB) - added by p5n 11 years ago
virtualbox-4.2.51-linux-3.9.3-oops.txt (90.8 KB) - added by sl4mmy 11 years ago
Complete dmesg of kernel oops produced using test build 4.2.51.
host_uname (111 bytes) - added by wenns 11 years ago
host_dmesg (93.5 KB) - added by wenns 11 years ago
host_cpuinfo (20.8 KB) - added by wenns 11 years ago
host_lsmod (2.2 KB) - added by wenns 11 years ago
host_meminfo (1.2 KB) - added by wenns 11 years ago
host_vb_version (1.9 KB) - added by wenns 11 years ago
vb_crash_dataset.tar.gz (29.9 KB) - added by wenns 11 years ago
virtualbox-4.2.51-linux-3.9.4-oops.txt (91.9 KB) - added by sl4mmy 11 years ago
Full system log of crash with VirtualBox-4.2.51-85953 and Linux 3.9.4
VBoxSVC.log (2.2 KB) - added by sl4mmy 11 years ago
VirtualBox service log from crash with VirtualBox 4.2.51-85953 and Linux 3.9.4
sl4mmy-virtualbox-4.2.51-linux-3.9.4-vbox.log (51.0 KB) - added by sl4mmy 11 years ago
VBox.log from crash with VirtualBox 4.2.51-85953 and Linux 3.9.4

Download all attachments as: .zip

Change History (85)

Changed 12 years ago by csreynolds

attachment messages added

Changed 12 years ago by csreynolds

attachment VBox.log added

comment:1 Changed 12 years ago by csreynolds

This problem started after I upgraded to kernel 3.8.x. 3.7.x functions properly.

comment:2 Changed 12 years ago by csreynolds

I can boot into an older kernel and I have no problems. Is there a way I can get more detailed info on why the crash is happening? I'd like to help resolve this issue if I can.

[creynolds@localhost trunk]$ uname -a
Linux localhost.localdomain 3.6.10-4.fc18.x86_64 #1 SMP Tue Dec 11 18:01:27 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

The listed kernel above shows no problems at all. 3.7 also worked before I upgraded to 3.8

comment:3 Changed 12 years ago by Stephen Rondeau

I had a similar experience with upgrading Fedora 17 from 3.6.10 to 3.8.3 -- VirtualBox 4.2.10 issued "unable to handle kernel paging" while writing to the virtual disk (was installing Fedora 17 Live CD to hard drive). So I tried VirtualBox 4.1.24 -- same problem. I changed to another computer with 3.8.3 kernel -- same problem. Reverted kernel to 3.6.10, and no problems were encountered.

Reported problem to Red Hat (Bug 929339), who said it was VirtualBox's problem.

comment:4 Changed 12 years ago by sl4mmy

I encountered the same problem today on ArchLinux. I was running virtualbox-4.2.10-2 and linux-3.8.5-1. I had to downgrade back to virtualbox-4.2.8-1 and linux-3.7.10-1 in order to use my virtual machines again. I will upload the relevant snippet from /var/log/messages.log.

Changed 12 years ago by sl4mmy

attachment VBox.log.1 added

This log is from when I couldn't boot a Windows virtual machine after upgrading my ArchLinux system to virtualbox-4.2.10-2 and linux-3.8.5-1.

Changed 12 years ago by sl4mmy

attachment VBox.2.log added

This log is from successfully booting the same Windows virtual machine after downgrading back to virtualbox-4.2.8-1 and linux-3.7.10-1

Changed 12 years ago by Ronan SALMON

attachment fedora-18-oops.txt added

Kernel oops starting a VM on an up-to-date Fedora 18: Linux bureau 3.8.5-201.fc18.i686.PAE #1 SMP Thu Mar 28 21:50:08 UTC 2013 i686 i686 i386 GNU/Linux, VirtualBox-4.2-4.2.10_84104_fedora18-1.i686

comment:5 Changed 12 years ago by Ronan SALMON

Actually, as I read the description of this ticket again, I realized that my issue is probably not the same as this one, since my VMs don't start: I get a kernel Oops when trying to start them.

in reply to: 5 comment:6 Changed 12 years ago by sl4mmy

Replying to rsalmon:

Actually, as I read the description of this ticket again, I realized that my issue is probably not the same as this one, since my VMs don't start: I get a kernel Oops when trying to start them.

Actually, the same is true in my case as well. The VM starts but the Oops happens at some random point while the guest is booting. I tried with both Windows XP and RHEL 6.3 guests. None ever booted into a usable state before the Oops occurred.

comment:7 Changed 12 years ago by Frank Mehnert

Trying to find a pattern. It seems that Fedora 18 hosts are affected with Linux 3.8. I have a 64-bit Fedora 18 system running with a Linux kernel 3.8.5-201 installed. I have no problems starting 64-bit guests (e.g. Debian 6.0) or 32-bit guests (e.g. Windows XP).

You don't run KVM in parallel by any chance?

in reply to: 7 comment:8 Changed 12 years ago by Stephen Rondeau

Replying to frank:

Trying to find a pattern. It seems that Fedora 18 hosts are affected with Linux 3.8. I have a 64-bit Fedora 18 system running with a Linux kernel 3.8.5-201 installed. I have no problems starting 64-bit guests (e.g. Debian 6.0) or 32-bit guests (e.g. Windows XP).

You don't run KVM in parallel by any chance?

In my case, no.

The host was running Fedora 17 x86_64 3.8.3-103 (10GB RAM).

I had created a Fedora VM with 2GB RAM, a 15GB virtual disk and configured networking to be bridged to em2. I connected the Fedora 17 Live CD (x86) .iso file, and clicked on "install to hard drive". It was during the last step -- the installation of packages to the virtual disk -- that I would encounter the kernel paging error. The point at which it was encountered varied -- one time it was fairly early in the package installation process, while at another time it was near the end.

I don't believe there were any other VMs active at the time.

When I changed the host's kernel back to 3.6.10, I didn't encounter any problems.

I have many other hosts running 3.8.3-103, but with an existing Windows 7 VM, and I haven't seen any problems running them. It seemed to be tied to writing a lot to the virtual disk.

in reply to: 7 comment:9 Changed 12 years ago by Ronan SALMON

Replying to frank:

Trying to find a pattern. It seems that Fedora 18 hosts are affected with Linux 3.8. I have a 64-bit Fedora 18 system running with a Linux kernel 3.8.5-201 installed. I have no problems starting 64-bit guests (e.g. Debian 6.0) or 32-bit guests (e.g. Windows XP).

You don't run KVM in parallel by any chance?

I don't run KVM. Now, I'm not sure what I've done, but I no longer get a kernel Oops when starting a VM. I forced a reinstall of the kernel and the kernel's header/devel files, then re-ran vboxdrv setup. Maybe I had a problem with the devel files.

The kernel is 3.8.5-201.fc18.i686.PAE, and I was able to start a 32-bit Debian guest.

Changed 12 years ago by csreynolds

attachment fc-18-oops.txt added

Still happening with kernel 3.8.6-203.fc18.x86_64 and VirtualBox 4.2.12. I have tried re-installing the header/devel RPMs and re-running vboxdrv setup to see if it cleared up like frank; same issue.

comment:10 Changed 12 years ago by sl4mmy

I noticed at least one difference in the /var/log/messages between the last version of VirtualBox that worked on my machine and all of the versions that failed: just before the kernel Oops message there is a line logging that a network device entered promiscuous mode.

In the working versions of VirtualBox the device is vboxnet0 or vboxnet1, but in the versions that don't work the device is eth0. You can see an example of this at line 1 in the log snippet I originally posted: https://www.alldomusa.eu.org/attachment/ticket/11610/virtualbox-4.2.10-2-linux-3.8.5-1-oops.log#L1

The same can be seen here in the original attachment posted by csreynolds: https://www.alldomusa.eu.org/attachment/ticket/11610/messages#L17

Unfortunately, the other snippets of /var/log/messages posted in this thread trimmed the "device XYZ entered promiscuous mode" lines.

I wonder if this is consistent for others experiencing this issue. Do the versions of VirtualBox that fail always log the name of the physical interface before the kernel Oops, and the versions of VirtualBox that work fine always log the name of one of the vboxnet interfaces?

Does anyone know of any changes in VirtualBox 4.2.10+ or Linux 3.8+ that would affect which device the vboxdrv, vboxnetadp or vboxnetflt kernel modules try to switch into promiscuous mode?
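One way to check sl4mmy's hypothesis on your own host is a small grep pipeline over the system log. This is a sketch, not part of the ticket: it assumes the syslog-style messages shown in the attachments, and the log path varies by distribution (/var/log/messages, /var/log/messages.log, or the output of dmesg).

```shell
# last_promisc_before_oops LOGFILE
# Print the last "entered promiscuous mode" line that appears shortly
# before the first paging-request oops in LOGFILE, so you can see whether
# it names a physical interface (eth0) or one of the vboxnet interfaces.
last_promisc_before_oops() {
    grep -B50 -m1 'BUG: unable to handle kernel paging request' "$1" \
        | grep 'entered promiscuous mode' \
        | tail -n 1
}
```

Usage would be something like `last_promisc_before_oops /var/log/messages`.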

comment:11 Changed 12 years ago by sergiomb

Hi, the kernel on the host or in the guest?

Changed 12 years ago by timemaster

attachment cpuinfo added

cpuinfo of the affected system

comment:12 Changed 12 years ago by timemaster

Hi all, I think I was able to nail down the problem. I played with different configurations for many virtual machines and found some working configurations.

It all comes down to three settings:

  • System settings, "Acceleration" tab: nested paging (AMD-V) or EPT (Intel VT-x)
  • System settings, "Processor" tab: Enable PAE/NX
  • System settings, "Acceleration" tab: hardware virtualization (AMD-V / Intel VT-x) (first checkbox)

Generally, the problem arises when hardware virtualization is enabled, or when the VM has multiple CPUs (which enables it automatically).

I did some tests with a System Rescue CD ISO file and an installed Chrome OS.

Working without problems: System Rescue CD and Chrome OS boot, run, and wait for user input at the prompt or interface.

  • PAE NX on
  • 1 processor
  • VT-d off
  • nested page off

System Rescue CD works as above; Chrome OS will not, because it needs a PAE kernel anyway.

  • PAE NX off
  • 1 processor
  • VT-d off
  • nested pages off

Not working: System Rescue CD shows a fully working boot menu, but when starting, the default kernel hangs after the third line, "Probing EDD (edd=off to disable)... ok"

chrome will fail silently

  • pae nx on
  • 1 processor
  • vt-d on
  • nested pages off

system rescue cd, same as above

chrome will fail silently

  • pae nx on
  • 1 processor
  • vt-d on
  • nested pages on

system rescue cd, same as above

chrome will fail silently

  • pae nx on
  • 1 processor
  • vt-d on
  • nested pages on

system rescue cd will boot but fail before reaching user login. Multiple runs crash at different places.

chrome will fail silently

  • pae nx off
  • 2 processor
  • vt-d on
  • nested pages off

system rescue cd will boot but fail before reaching user login. Multiple runs crash at different places.

chrome will fail silently

  • pae nx on
  • 2 processor
  • vt-d on
  • nested pages on

I could not see differences between NAT and bridged networking (bridged networking puts the interface into promiscuous mode).

Under Arch Linux (affected kernel: https://bugs.archlinux.org/task/34399) this also occurs with 3.9.2-1-ARCH and VirtualBox 4.2.12_OSE_r84980.

Attached is my cpuinfo; I have an Intel Westmere CPU.

Last edited 12 years ago by Frank Mehnert (previous) (diff)
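The GUI settings timemaster walks through above also map onto VBoxManage switches, which makes it easier to script the working and failing configurations for comparison. A minimal sketch, assuming a VM named "MyVM" (a placeholder) and the VBoxManage 4.2 CLI; this is not from the ticket:

```shell
# Sketch: timemaster's "working" configuration, set from the CLI rather
# than the GUI. "MyVM" is a placeholder VM name.
#   --pae          "Processor" tab, Enable PAE/NX
#   --cpus         number of virtual CPUs (more than 1 forces VT-x/AMD-V on)
#   --hwvirtex     "Acceleration" tab, hardware virtualization (VT-x/AMD-V)
#   --nestedpaging "Acceleration" tab, nested paging / EPT
if command -v VBoxManage >/dev/null 2>&1; then
    VBoxManage modifyvm "MyVM" --pae on --cpus 1 --hwvirtex off --nestedpaging off
fi
```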

in reply to: 11 ; comment:13 Changed 12 years ago by sl4mmy

Replying to sergiomb:

Hi, the kernel on the host or in the guest?

On the host, but I believe that is a red herring. timemaster's workaround works for me, too.

in reply to: 12 comment:14 Changed 12 years ago by sl4mmy

Replying to timemaster:

Generally, the problem arises when hardware virtualization is enabled, or when the VM has multiple CPUs (which enables it automatically).

I did some tests with a System Rescue CD ISO file and an installed Chrome OS.

Disabling VT-x/AMD-V worked around the problem with my 32-bit Windows XP virtual machine; unfortunately, that won't work for any 64-bit virtual machines. Disabling PAE/NX for a 64-bit VM seems to help it run a little longer before the kernel Oops occurs (for example, with PAE/NX enabled my 64-bit VM trips the Oops consistently while booting, but with PAE/NX disabled it boots fine and is usable), but the Oops does eventually happen every time for me.

This is with my host system running Linux 3.9.2 and VirtualBox 4.2.12. I'll post the output of /proc/cpuinfo as well.

Changed 12 years ago by sl4mmy

attachment cpuinfo.2 added

Output of /proc/cpuinfo from my host machine (Archlinux, kernel v3.9.2, VirtualBox 4.2.12)

in reply to: 13 ; comment:15 Changed 12 years ago by sergiomb

Replying to sl4mmy:

Replying to sergiomb:

Hi, the kernel on the host or in the guest?

On the host, but I believe that is a red herring. timemaster's workaround works for me, too.

What or where is timemaster's workaround?

comment:16 Changed 12 years ago by Frank Mehnert

We still cannot reproduce this problem. I would be interested to see more kernel logs after a VM crashes as described above.

Changed 12 years ago by dboy

attachment kernel_vboxissue.log added

Kernel Oops, VirtualBox 4.2.12

in reply to: 15 comment:17 Changed 11 years ago by sl4mmy

Replying to sergiomb:

What or where is timemaster's workaround?

https://www.alldomusa.eu.org/ticket/11610#comment:12

in reply to: 16 ; comment:18 Changed 11 years ago by sl4mmy

Hi, frank-

Replying to frank:

We still cannot reproduce this problem. I would be interested to see more kernel logs after a VM crashes as described above.

What kernel version are you running on the host? What kind of processor does the host have?

I exchanged some private emails with the maintainer of the Archlinux package. He says he can't reproduce the problem, either. I'm not sure what the common trigger is between all of the affected systems...

Changed 11 years ago by sl4mmy

Here is another kernel log from my system running Linux 3.9.3 and VirtualBox 4.2.12.

in reply to: 18 ; comment:19 Changed 11 years ago by Quickbooks Office

@sl4mmy: Care to post the whole dmesg log instead of just the oops portion? And the number of guests running simultaneously when you got the oops?

Also, you might want to see if you can reproduce this issue with the test build below (a major rewrite of the VT-x code, including many bug fixes and performance improvements):

http://www.alldomusa.eu.org/download/testcase/VirtualBox-4.2.51-85607-Linux_amd64.run

http://www.alldomusa.eu.org/download/testcase/Oracle_VM_VirtualBox_Extension_Pack-4.2.51-85607.vbox-extpack

comment:20 Changed 11 years ago by wenns

Hi all,

I'm experiencing the same behavior: VMs boot up and freeze later, which can easily be reproduced by writing large amounts of data to the virtual disk. In my case I do a big "svn co" and the hang usually happens after svn has written ~1 GB.

Interestingly enough, it happens only on my server hardware: an HP ProLiant with 64 GB of memory and 24 Intel Xeon cores. A quite similar setup (same VirtualBox version, same kernel, same VM) on my workstation (Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz) works fine.

I can provide more details if necessary.

Changed 11 years ago by rmflight

attachment VirtualBox-dies.log added

rmflight dmesg output

comment:21 Changed 11 years ago by wenns

I've done some testing and am beginning to see patterns. On our side, this issue arises under the following circumstances:

  • 64-bit guest (tested with Linux 3.8 and Windows 7)
  • 64-bit Linux host (Ubuntu Server in our case). Other host OSes not tested.
  • Intel Xeon hardware. Yes: it doesn't trigger on an Intel Core i7, host OS/guest OS/virtualizer all being equal.

Switching the following settings on and off doesn't matter:

  • PAE/NX
  • Nested Paging
  • VT-x/AMD-V (this one cannot be switched off for 64-bit guests, of course)

Also, the behavior under the posted test build (4.2.51) is still the same.

Hope that helps.

comment:22 Changed 11 years ago by wenns

Oops, the above is not quite right: I just experienced the same issue with a 32-bit guest (Windows 7), hardware acceleration enabled. Will disable and recheck now.

Changed 11 years ago by p5n

attachment oops.txt added

comment:23 Changed 11 years ago by p5n

One more oops, from kernel 3.9.3-1-ARCH and VirtualBox 4.2.12-3.

in reply to: 22 ; comment:24 Changed 11 years ago by sergiomb

Replying to wenns:

Oops, the above is not quite right: I just experienced the same issue with a 32-bit guest (Windows 7), hardware acceleration enabled. Will disable and recheck now.

Can you specify what you disabled? Graphics? FYI, on one Linux host I found that a Linux guest crashes and X won't start when it loads /usr/lib/modules/*/extra/VirtualBox/vboxvideo.ko; if I remove it before launching X, everything works. Just disable "seamless mode".

in reply to: 19 ; comment:25 Changed 11 years ago by sl4mmy

Hi, quickbooks-

Replying to quickbooks:

@sl4mmy: Care to post the whole dmesg log instead of just the oops portion? And the number of guests running simultaneously when you got the oops?

Also, you might want to see if you can reproduce this issue with the test build below (a major rewrite of the VT-x code, including many bug fixes and performance improvements).

I can reliably reproduce the issue with only a single guest running. Also, the test build you linked to still suffers from the same problem. I will post a complete dmesg from my test using that test build (4.2.51).

Changed 11 years ago by sl4mmy

Complete dmesg of kernel oops produced using test build 4.2.51.

in reply to: 20 comment:26 Changed 11 years ago by sl4mmy

Hi, wenns-

Replying to wenns:

Interestingly enough, it happens only on my server hardware: an HP ProLiant with 64 GB of memory and 24 Intel Xeon cores. A quite similar setup (same VirtualBox version, same kernel, same VM) on my workstation (Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz) works fine.

That is interesting. My desktop suffering from this problem has an Intel Xeon X5675 @ 3.07GHz.

in reply to: 22 comment:27 Changed 11 years ago by sl4mmy

Hi, wenns-

Replying to wenns:

Oops, the above is not quite right: I just experienced the same issue with a 32-bit guest (Windows 7), hardware acceleration enabled. Will disable and recheck now.

Did disabling hardware acceleration work around the problem with your 32-bit guests? With hardware acceleration enabled, my 32-bit guests reliably trigger the kernel oops while they are booting, but with hardware acceleration disabled my 32-bit guests are usable (well... they're noticeably slower ;)). Again, this is on my desktop machine with a Xeon X5675 @ 3.07GHz.

comment:28 Changed 11 years ago by timemaster

So, if I summarize the CPUs in use:

csreynolds' VBox.log shows that he uses a Xeon X5675 @ 3.07GHz
sl4mmy's VBox.log.1 and cpuinfo.2 show that he uses a Xeon X5675 @ 3.07GHz
timemaster (me): my cpuinfo shows I am using a Xeon E5620 @ 2.40GHz
wenns says that he is using a Xeon processor, and his Core i7 does not have this problem.
rmflight's VirtualBox-dies.log shows that he is using a Xeon X5650 @ 2.67GHz

p5n ?
wenns ? detail please

That's a lot of Xeon processors in the ?56?? range... plus wenns says that i7 processors are not affected. Something worth checking.

Last edited 11 years ago by timemaster (previous) (diff)

comment:29 Changed 11 years ago by Quickbooks Office

Not sure which guest caused this, as I had 3+ guests running: 1 Linux, 2 Windows. One was installing a new copy of Win 7 64-bit.

I have an Intel i3-3225.

May 24 19:09:19 localhost kernel: [11647.766537] EMT-0: page allocation failure: order:9, mode:0x344d2
May 24 19:09:19 localhost kernel: [11647.766541] Pid: 5366, comm: EMT-0 Tainted: PF        C O 3.9.3-201.fc18.x86_64 #1
May 24 19:09:19 localhost kernel: [11647.766542] Call Trace:
May 24 19:09:19 localhost kernel: [11647.766547]  [<ffffffff81139509>] warn_alloc_failed+0xe9/0x150
May 24 19:09:19 localhost kernel: [11647.766551]  [<ffffffff81658ae4>] ? __alloc_pages_direct_compact+0x182/0x194
May 24 19:09:19 localhost kernel: [11647.766553]  [<ffffffff8113d806>] __alloc_pages_nodemask+0x856/0xae0
May 24 19:09:19 localhost kernel: [11647.766557]  [<ffffffff8117c0c8>] alloc_pages_current+0xb8/0x190
May 24 19:09:19 localhost kernel: [11647.766570]  [<ffffffffa02bbd60>] rtR0MemObjLinuxAllocPages+0xc0/0x260 [vboxdrv]
May 24 19:09:19 localhost kernel: [11647.766577]  [<ffffffffa02bbf3a>] rtR0MemObjLinuxAllocPhysSub2+0x3a/0xe0 [vboxdrv]
May 24 19:09:19 localhost kernel: [11647.766583]  [<ffffffffa02bc0aa>] rtR0MemObjLinuxAllocPhysSub+0xca/0xd0 [vboxdrv]
May 24 19:09:19 localhost kernel: [11647.766589]  [<ffffffffa02bc479>] rtR0MemObjNativeAllocPhys+0x19/0x20 [vboxdrv]
May 24 19:09:19 localhost kernel: [11647.766595]  [<ffffffffa02ba314>] VBoxHost_RTR0MemObjAllocPhysExTag+0x64/0xb0 [vboxdrv]
May 24 19:09:19 localhost kernel: [11647.766608]  [<ffffffffa02bb89d>] ? rtR0MemAllocEx+0x17d/0x250 [vboxdrv]
May 24 19:09:19 localhost kernel: [11647.766613]  [<ffffffffa02bb89d>] ? rtR0MemAllocEx+0x17d/0x250 [vboxdrv]
May 24 19:09:19 localhost kernel: [11647.766618]  [<ffffffffa02b2db4>] ? supdrvIOCtl+0x1664/0x2be0 [vboxdrv]
May 24 19:09:19 localhost kernel: [11647.766623]  [<ffffffffa02bb89d>] ? rtR0MemAllocEx+0x17d/0x250 [vboxdrv]
May 24 19:09:19 localhost kernel: [11647.766628]  [<ffffffffa02ad47c>] ? VBoxDrvLinuxIOCtl_4_2_51+0x10c/0x1f0 [vboxdrv]
May 24 19:09:19 localhost kernel: [11647.766631]  [<ffffffff811b17e7>] ? do_vfs_ioctl+0x97/0x580
May 24 19:09:19 localhost kernel: [11647.766634]  [<ffffffff812a157a>] ? inode_has_perm.isra.32.constprop.62+0x2a/0x30
May 24 19:09:19 localhost kernel: [11647.766635]  [<ffffffff812a2c07>] ? file_has_perm+0x97/0xb0
May 24 19:09:19 localhost kernel: [11647.766637]  [<ffffffff811b1d61>] ? sys_ioctl+0x91/0xb0
May 24 19:09:19 localhost kernel: [11647.766640]  [<ffffffff81669f59>] ? system_call_fastpath+0x16/0x1b
May 24 19:09:19 localhost kernel: [11647.766641] Mem-Info:
May 24 19:09:19 localhost kernel: [11647.766642] Node 0 DMA per-cpu:
May 24 19:09:19 localhost kernel: [11647.766643] CPU    0: hi:    0, btch:   1 usd:   0
May 24 19:09:19 localhost kernel: [11647.766644] CPU    1: hi:    0, btch:   1 usd:   0
May 24 19:09:19 localhost kernel: [11647.766645] CPU    2: hi:    0, btch:   1 usd:   0
May 24 19:09:19 localhost kernel: [11647.766646] CPU    3: hi:    0, btch:   1 usd:   0
May 24 19:09:19 localhost kernel: [11647.766646] Node 0 DMA32 per-cpu:
May 24 19:09:19 localhost kernel: [11647.766648] CPU    0: hi:  186, btch:  31 usd:   0
May 24 19:09:19 localhost kernel: [11647.766648] CPU    1: hi:  186, btch:  31 usd:   0
May 24 19:09:19 localhost kernel: [11647.766649] CPU    2: hi:  186, btch:  31 usd:   0
May 24 19:09:19 localhost kernel: [11647.766650] CPU    3: hi:  186, btch:  31 usd:   0
May 24 19:09:19 localhost kernel: [11647.766650] Node 0 Normal per-cpu:
May 24 19:09:19 localhost kernel: [11647.766651] CPU    0: hi:  186, btch:  31 usd:   0
May 24 19:09:19 localhost kernel: [11647.766652] CPU    1: hi:  186, btch:  31 usd:   0
May 24 19:09:19 localhost kernel: [11647.766653] CPU    2: hi:  186, btch:  31 usd:   0
May 24 19:09:19 localhost kernel: [11647.766653] CPU    3: hi:  186, btch:  31 usd:   0
May 24 19:09:19 localhost kernel: [11647.766656] active_anon:194178 inactive_anon:4307 isolated_anon:0
May 24 19:09:19 localhost kernel: [11647.766656]  active_file:378144 inactive_file:835082 isolated_file:0
May 24 19:09:19 localhost kernel: [11647.766656]  unevictable:879 dirty:29 writeback:0 unstable:0
May 24 19:09:19 localhost kernel: [11647.766656]  free:58654 slab_reclaimable:32788 slab_unreclaimable:31894
May 24 19:09:19 localhost kernel: [11647.766656]  mapped:1056803 shmem:6914 pagetables:11294 bounce:0
May 24 19:09:19 localhost kernel: [11647.766656]  free_cma:0
May 24 19:09:19 localhost kernel: [11647.766658] Node 0 DMA free:15892kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15984kB managed:15900kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:8kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
May 24 19:09:19 localhost kernel: [11647.766660] lowmem_reserve[]: 0 3436 15947 15947
May 24 19:09:19 localhost kernel: [11647.766662] Node 0 DMA32 free:64824kB min:14548kB low:18184kB high:21820kB active_anon:7812kB inactive_anon:0kB active_file:116kB inactive_file:96kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3631648kB managed:3518864kB mlocked:0kB dirty:0kB writeback:0kB mapped:8356kB shmem:4kB slab_reclaimable:376kB slab_unreclaimable:3972kB kernel_stack:48kB pagetables:1200kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
May 24 19:09:19 localhost kernel: [11647.766665] lowmem_reserve[]: 0 0 12510 12510
May 24 19:09:19 localhost kernel: [11647.766667] Node 0 Normal free:153900kB min:52968kB low:66208kB high:79452kB active_anon:768900kB inactive_anon:17228kB active_file:1512460kB inactive_file:3340232kB unevictable:3516kB isolated(anon):0kB isolated(file):0kB present:13074432kB managed:12811044kB mlocked:3516kB dirty:116kB writeback:0kB mapped:4218856kB shmem:27652kB slab_reclaimable:130776kB slab_unreclaimable:123596kB kernel_stack:2992kB pagetables:43976kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
May 24 19:09:19 localhost kernel: [11647.766670] lowmem_reserve[]: 0 0 0 0
May 24 19:09:19 localhost kernel: [11647.766671] Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (R) 3*4096kB (M) = 15892kB
May 24 19:09:19 localhost kernel: [11647.766677] Node 0 DMA32: 109*4kB (UEM) 87*8kB (UEM) 131*16kB (UEM) 56*32kB (UEM) 83*64kB (UEM) 60*128kB (UM) 33*256kB (UM) 13*512kB (UM) 15*1024kB (UEM) 8*2048kB (UM) 0*4096kB = 64860kB
May 24 19:09:19 localhost kernel: [11647.766684] Node 0 Normal: 5990*4kB (UEM) 2991*8kB (UEM) 1510*16kB (UEM) 783*32kB (EM) 341*64kB (EM) 74*128kB (UM) 30*256kB (UEM) 15*512kB (UEM) 10*1024kB (UM) 0*2048kB 0*4096kB = 154000kB
May 24 19:09:19 localhost kernel: [11647.766691] 1220696 total pagecache pages
May 24 19:09:19 localhost kernel: [11647.766692] 0 pages in swap cache
May 24 19:09:19 localhost kernel: [11647.766693] Swap cache stats: add 0, delete 0, find 0/0
May 24 19:09:19 localhost kernel: [11647.766693] Free swap  = 0kB
May 24 19:09:19 localhost kernel: [11647.766694] Total swap = 0kB
May 24 19:09:19 localhost kernel: [11647.795919] 4186111 pages RAM
May 24 19:09:19 localhost kernel: [11647.795922] 2599506 pages reserved
May 24 19:09:19 localhost kernel: [11647.795923] 1370459 pages shared
May 24 19:09:19 localhost kernel: [11647.795923] 1307461 pages non-shared
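For context on the trace above: "order:9" means the vboxdrv module asked the kernel for 2^9 = 512 physically contiguous pages (2 MiB with 4 KiB pages), and the allocator could not find such a block, which typically indicates memory fragmentation rather than exhaustion. A hedged helper (not from the ticket) to count how many blocks of that size a host currently has free, fed with the /proc/buddyinfo format:

```shell
# count_big_blocks — read /proc/buddyinfo-formatted text on stdin and sum
# the free blocks of orders 9 and 10 (2 MiB and 4 MiB on 4 KiB pages).
# buddyinfo lines look like "Node 0, zone Normal N0 N1 ... N10", so the
# order-9 and order-10 counts are the 14th and 15th fields.
count_big_blocks() {
    awk '{ for (i = 14; i <= NF && i <= 15; i++) s += $i } END { print s + 0 }'
}
```

Usage would be `count_big_blocks < /proc/buddyinfo`; a value near 0 would be consistent with the allocation failure logged above.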

in reply to: 25 ; comment:30 Changed 11 years ago by Quickbooks Office

Replying to sl4mmy:

I can reliably reproduce the issue with only a single guest running. Also, the test build you linked to still suffers from the same problem. I will post a complete dmesg from my test using that test build (4.2.51).

Test Build (May 22)

Linux 64 Host: http://www.alldomusa.eu.org/download/testcase/VirtualBox-4.2.51-85953-Linux_amd64.run

Extension pack: http://www.alldomusa.eu.org/download/testcase/Oracle_VM_VirtualBox_Extension_Pack-4.2.51-85953.vbox-extpack

Can you upload a coredump of the guest + guest log file: https://www.alldomusa.eu.org/wiki/Core_dump

Last edited 11 years ago by Quickbooks Office (previous) (diff)

in reply to: 24 comment:31 Changed 11 years ago by wenns

Replying to sergiomb:

Replying to wenns:

Oops, the above is not quite right: I just experienced the same issue with a 32-bit guest (Windows 7), hardware acceleration enabled. Will disable and recheck now.

Can you specify what you disabled? Graphics?

I disabled VT-x/AMD-V, and now it works reliably. I'll post an overview in a couple of minutes.

FYI, on one Linux host I found that a Linux guest crashes and X won't start when it loads /usr/lib/modules/*/extra/VirtualBox/vboxvideo.ko; if I remove it before launching X, everything works. Just disable "seamless mode".

Changed 11 years ago by wenns

attachment host_uname added

Changed 11 years ago by wenns

attachment host_dmesg added

Changed 11 years ago by wenns

attachment host_cpuinfo added

Changed 11 years ago by wenns

attachment host_lsmod added

Changed 11 years ago by wenns

attachment host_meminfo added

Changed 11 years ago by wenns

attachment host_vb_version added

comment:32 Changed 11 years ago by wenns

I'm glad there are people caring about this issue and interested in details. So here they are. In short: I'm able to trigger this issue reliably under the following conditions:

  1. A guest (the OS doesn't seem to matter) with VT-x/AMD-V enabled is running on
  2. Intel Xeon [email protected], with 64-bit Ubuntu Server Linux on top.

I tried a couple of Linux systems and Windows 7 (64 and 32 bit) as guests; all behave the same. I *didn't* try another host OS.

See attachments for further details on the host platform.

in reply to: 30 ; comment:33 Changed 11 years ago by wenns

Replying to quickbooks:

Replying to sl4mmy:

I can reliably reproduce the issue with only a single guest running. Also, the test build you linked to still suffers from the same problem. I will post a complete dmesg from my test using that test build (4.2.51).

Test Build (May 22)

Linux 64 Host: http://www.alldomusa.eu.org/download/testcase/VirtualBox-4.2.51-85953-Linux_amd64.run

Extension pack: http://www.alldomusa.eu.org/download/testcase/Oracle_VM_VirtualBox_Extension_Pack-4.2.51-85953.vbox-extpack

Can you upload a coredump of the guest + guest log file: https://www.alldomusa.eu.org/wiki/Core_dump

I have a core dump now, but it's quite big (350 MB gzipped). How can I pass it to you? A file that big cannot be attached to this thread.

in reply to: 28 comment:34 Changed 11 years ago by p5n

Replying to timemaster:

p5n ?
wenns ? detail please

CPU: Dual Xeon E5506

MB: Intel S5500BC

OS: ArchLinux

in reply to: 33 comment:35 Changed 11 years ago by Frank Mehnert

Replying to wenns:

I have a core dump now, but it's quite big (350 MB gzipped). How can I pass it to you? A file that big cannot be attached to this thread.

Please look here for instructions on how and where to upload the core dump.

comment:36 Changed 11 years ago by Frank Mehnert

wenns and others, I'm also interested in another set of data: When this happens, please attach the VBox.log file from the VM session you are currently running together with the output of 'dmesg' from the host. I need both files from the same time for investigation. Thank you!

comment:37 Changed 11 years ago by p5n

Ugly workaround:

ls -1d /sys/devices/system/cpu/cpu?/online | while read a; do echo 0 >$a; done

Yes, it dramatically slows down your host. :)

comment:38 Changed 11 years ago by p5n

Actually, switching off one of the two CPUs helped me.

(I switched off all odd cores: 1,3,5,7)
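p5n's narrower workaround (taking only the odd-numbered cores offline via sysfs CPU hotplug) can be scripted. A sketch under stated assumptions, not part of the ticket: it needs root on a real host, cpu0 usually has no "online" file, and the directory parameter exists only to make the function testable against a fake sysfs tree.

```shell
# offline_odd_cpus [SYSFS_CPU_DIR]
# Write 0 to the "online" file of every odd-numbered CPU, mirroring p5n's
# "switch off cores 1,3,5,7" workaround. Defaults to the real sysfs path.
offline_odd_cpus() {
    dir="${1:-/sys/devices/system/cpu}"
    for f in "$dir"/cpu*[13579]/online; do
        # Skip when the glob matched nothing or the CPU is not hotpluggable.
        if [ -e "$f" ]; then
            echo 0 > "$f"
        fi
    done
    return 0
}
```

Running `offline_odd_cpus` with no argument (as root) would act on the live system; cores can be brought back by writing 1 to the same files.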

in reply to: 33 comment:39 Changed 11 years ago by Quickbooks Office

Replying to wenns:

Replying to quickbooks:

Replying to sl4mmy:

I can reliably reproduce the issue with only a single guest running. Also, the test build you linked to still suffers from the same problem. I will post a complete dmesg from my test using that test build (4.2.51).

Test Build (May 22)

Linux 64 Host: http://www.alldomusa.eu.org/download/testcase/VirtualBox-4.2.51-85953-Linux_amd64.run

Extension pack: http://www.alldomusa.eu.org/download/testcase/Oracle_VM_VirtualBox_Extension_Pack-4.2.51-85953.vbox-extpack

Can you upload a coredump of the guest + guest log file: https://www.alldomusa.eu.org/wiki/Core_dump

I have a core dump now but its quite big (350 M gzipped). How can I pass it to you? I file that big cannot be attached to this thread.

Upload it to ftp://ftp.oracle.com/appsdev/incoming together with the log file, and then just post the file name here.

That way only Oracle developers can take a look at the core dump, as core dumps sometimes contain sensitive information.

You will probably need FTP upload software like FileZilla or gFTP.

comment:40 Changed 11 years ago by sergiomb

Hi, I just checked and this is not my problem: I disabled all CPU acceleration and my laptop still hangs on resuming a VM; sometimes it seems my 6 GB of swap is not enough. If you know of other bug tickets that may address my problem, I would be grateful if you could point me to them.

Thanks,

in reply to: 36 comment:41 by wenns, 11 years ago

Replying to frank:

wenns and others, I'm also interested in another set of data: When this happens, please attach the VBox.log file from the VM session you are currently running together with the output of 'dmesg' from the host. I need both files from the same time for investigation. Thank you!

Here they are: see attachments, file vb_crash_dataset.tar.gz

by wenns, 11 years ago

comment:42 by Frank Mehnert, 11 years ago

Thanks wenns! We now see where it crashes but don't know yet why it crashes. Did you ever run an older kernel on your Xeon box with the same setup, so can you confirm that this is a Linux 3.8 regression? Or did you see the same crashes with older Linux kernels?

in reply to: 30 comment:43 by sl4mmy, 11 years ago

Hi, quickbooks-

Replying to quickbooks:

Test Build (May 22)

Linux 64 Host: http://www.alldomusa.eu.org/download/testcase/VirtualBox-4.2.51-85953-Linux_amd64.run

Extension pack: http://www.alldomusa.eu.org/download/testcase/Oracle_VM_VirtualBox_Extension_Pack-4.2.51-85953.vbox-extpack

Can you upload a coredump of the guest + guest log file: https://www.alldomusa.eu.org/wiki/Core_dump

I was able to reproduce the problem with the 85953 build. I uploaded a tarball with logs and coredumps named sl4mmy-virtualbox-4.2.51-linux-3.9.4-oops.tar.gz to the FTP site.

by sl4mmy, 11 years ago

Full system log of crash with VirtualBox-4.2.51-85953 and Linux 3.9.4

by sl4mmy, 11 years ago

Attachment: VBoxSVC.log added

VirtualBox service log from crash with VirtualBox 4.2.51-85953 and Linux 3.9.4

in reply to: 36 comment:44 by sl4mmy, 11 years ago

Hi, frank-

Replying to frank:

wenns and others, I'm also interested in another set of data: When this happens, please attach the VBox.log file from the VM session you are currently running together with the output of 'dmesg' from the host. I need both files from the same time for investigation. Thank you!

I uploaded a tarball named sl4mmy-virtualbox-4.2.51-linux-3.9.4-oops.tar.gz to the FTP site that includes both logs plus coredumps of VirtualBox, VBoxSVC and VBoxXPCOMIPCD. I also attached both log files separately to this ticket:

in reply to: 42 comment:45 by sl4mmy, 11 years ago

Hi, frank-

Replying to frank:

Thanks wenns! We now see where it crashes but don't know yet why it crashes. Did you ever run an older kernel on your Xeon box with the same setup, so can you confirm that this is a Linux 3.8 regression? Or did you see the same crashes with older Linux kernels?

I ran VirtualBox on this workstation without problems since October 2012. The last working version for me was VirtualBox 4.2.8 with Linux 3.7.10.

Unfortunately, the official VirtualBox 4.2.10+ packages for Arch require Linux 3.8+, so I can't easily test VirtualBox 4.2.12 with Linux 3.7.10. That also makes it difficult to identify the regression: was the problem introduced in VirtualBox 4.2.10 or in Linux 3.8?

comment:46 by Frank Mehnert, 11 years ago

sl4mmy, thanks for the logs. But one log is missing: the VBox.log file from the VM. You provided VBoxSVC.log, which is from the VBoxSVC server. The VBox.log file can be found either via the VM selector window (Machine / Show Log ...) or in the VM configuration directory under Logs.

by sl4mmy, 11 years ago

VBox.log from crash with VirtualBox 4.2.51-85953 and Linux 3.9.4

in reply to: 46 comment:47 by sl4mmy, 11 years ago

Hi, frank-

Replying to frank:

sl4mmy, thanks for the logs. But one log is missing: the VBox.log file from the VM. You provided VBoxSVC.log, which is from the VBoxSVC server. The VBox.log file can be found either via the VM selector window (Machine / Show Log ...) or in the VM configuration directory under Logs.

D'oh! Sorry... I've just attached the VBox.log from the same session yesterday as the other log files.

in reply to: 38 comment:48 by sl4mmy, 11 years ago

Hi, p5n-

Replying to p5n:

Actually switching off one of two CPUs helped me.

(I switched off all odd cores: 1,3,5,7)

Wow, that's a really interesting observation! I've been able to work around the issue on my machine by doing the same, thanks!

comment:49 by sl4mmy, 11 years ago

Howdy-

Thanks to p5n's observations (https://www.alldomusa.eu.org/ticket/11610#comment:37 and https://www.alldomusa.eu.org/ticket/11610#comment:38) I came up with a work-around that doesn't require disabling hardware virtualization acceleration:

$ numactl --cpunodebind=0 --localalloc -- /opt/VirtualBox/VirtualBox

First of all, here is what the numa topology of my workstation looks like:

$ numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 12 13 14 15 16 17
node 0 size: 6143 MB
node 0 free: 340 MB
node 1 cpus: 6 7 8 9 10 11 18 19 20 21 22 23
node 1 size: 6127 MB
node 1 free: 154 MB
node distances:
node   0   1 
  0:  10  20 
  1:  20  10

So with this workaround, VirtualBox can only run on the CPUs of node 0, and all of the memory used by VirtualBox should be allocated on the same node that runs the process.

By the way, this is with the VirtualBox 4.2.51-85953 test build frank and others linked to, and Linux 3.9.4.

Interestingly, when I first tried playing with numactl after reading p5n's comments, I bound to the CPUs on node 1, not node 0, but I encountered the same kernel oops. I tried a few more numactl options, to no avail. Before giving up, however, I decided to try binding to node 0 instead, and sure enough it appears to work!

What is it about node 0 that is special? Is it in any way related to the fact that node 0 is the initial boot node?

I tested with a 32-bit Windows guest with 2 CPUs and a 64-bit RHEL 6.3 guest with 2 CPUs. I even tested with both running simultaneously, watching YouTube videos in the Windows guest while running some builds in the RHEL guest. :) Zero kernel Oops so far...

Yay! Big thanks to p5n!
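
For reference, the node-0 CPU list above can also be turned into a plain taskset invocation. A sketch, with the topology line hard-coded from the output above so it is machine-independent (on a real host, pipe in numactl --hardware instead):

```shell
# Turn "node 0 cpus: ..." into a comma-separated CPU list for taskset.
sample='node 0 cpus: 0 1 2 3 4 5 12 13 14 15 16 17'
cpus=$(printf '%s\n' "$sample" | sed 's/^node 0 cpus: //; s/ /,/g')
echo "taskset -c $cpus /opt/VirtualBox/VirtualBox"
```

Note that taskset only pins CPUs and does not control memory placement, so numactl --cpunodebind=0 --localalloc remains the more complete workaround; this just shows the mechanics.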

comment:50 by Frank Mehnert, 11 years ago

Thanks sl4mmy, also for your additional log. This helps further...

in reply to: 50 comment:51 by sl4mmy, 11 years ago

Hi, frank-

Replying to frank:

Thanks sl4mmy, also for your additional log. This helps further...

Sure, no problem!

Also, I can confirm that the work-around also works with the official Arch packages for VirtualBox 4.2.12 (virtualbox-4.2.12-3 and virtualbox-host-modules-4.2.12-6) on Linux 3.9.4.

comment:52 by Frank Mehnert, 11 years ago

I think the reason for this problem is CONFIG_NUMA_BALANCING, which was introduced in Linux 3.8. I'm currently looking for a way to prevent migrating pages between NUMA nodes, probably by setting a VM area flag...

comment:53 by Romain Buquet, 11 years ago

Dummy comment, just to be notified when this ticket is modified.

comment:54 by Frank Mehnert, 11 years ago

Just an update: we know what's wrong but it will be difficult to fix. We are actually somewhat over-stretching the Linux kernel API. We plan a workaround for 4.2.x and a better fix for the next major release. As written above, this problem only affects people who have more than one NUMA node in their system (see the output of numactl --hardware).
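
A minimal sketch of that check: a host is affected only if numactl --hardware reports more than one node. The sample output is hard-coded (taken from comment:49) so the logic runs anywhere; substitute the real command on your own host:

```shell
# Count NUMA nodes from the "available:" line of numactl --hardware.
sample='available: 2 nodes (0-1)'
# On a real host: nodes=$(numactl --hardware | awk '/^available:/ {print $2}')
nodes=$(printf '%s\n' "$sample" | awk '/^available:/ {print $2}')
if [ "$nodes" -gt 1 ]; then
    echo "multi-node host: potentially affected"
else
    echo "single NUMA node: not affected"
fi
```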

comment:55 by Frank Mehnert, 11 years ago

The following patch will be included in the next maintenance release (expected very soon). To fix the problem, go to /usr/src/vboxhost-4.2.12/vboxdrv/r0drv/linux and apply the change below manually. Then make sure that all VMs are terminated, recompile the host kernel driver (/etc/init.d/vboxdrv setup) and that's it. Or just wait a bit for the release.

This is actually a workaround rather than a fundamental fix. The simple fix would require a Linux kernel change; the thorough fix would require many, many code changes in VBox, so it will have to wait.

--- memobj-r0drv-linux.c        (revision 86600)
+++ memobj-r0drv-linux.c        (revision 86601)
@@ -1527,6 +1527,21 @@
                 }
             }
 
+#ifdef CONFIG_NUMA_BALANCING
+            if (RT_SUCCESS(rc))
+            {
+                /** @todo Ugly hack! But right now we have no other means to disable
+                 *        automatic NUMA page balancing. */
+# ifdef RT_OS_X86
+                pTask->mm->numa_next_reset = jiffies + 0x7fffffffUL;
+                pTask->mm->numa_next_scan  = jiffies + 0x7fffffffUL;
+# else
+                pTask->mm->numa_next_reset = jiffies + 0x7fffffffffffffffUL;
+                pTask->mm->numa_next_scan  = jiffies + 0x7fffffffffffffffUL;
+# endif
+            }
+#endif
+
             up_write(&pTask->mm->mmap_sem);
 
             if (RT_SUCCESS(rc))
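
As a rough sanity check on the 32-bit constant above: pushing numa_next_scan out by 0x7fffffff jiffies postpones the next NUMA balancing scan far beyond any realistic VM lifetime. HZ=250 below is an assumed (common) tick rate, not something the patch specifies:

```shell
# Convert the 32-bit jiffies offset from the patch into days at HZ=250.
HZ=250
jiffies_offset=$(( 0x7fffffff ))
days=$(( jiffies_offset / HZ / 86400 ))
echo "next scan postponed by roughly $days days"   # ~99 days at HZ=250
```

The 64-bit constant used on non-x86 builds is larger still, effectively "never".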

in reply to: 55 comment:56 by Quickbooks Office, 11 years ago

Replying to frank:

This is actually a workaround but we cannot do a more fundamental fix. The simple fix will require a Linux kernel change, the difficult fix will require many many code changes in VBox so this will have to

Can you post a trunk build for 64-bit Linux, please? Thanks.

comment:57 by Frank Mehnert, 11 years ago

Sure, here it is.

comment:58 by Frank Mehnert, 11 years ago

The workaround is included in VBox 4.2.14.

comment:59 by sl4mmy, 11 years ago

Hi, Frank-

I can confirm that the problem no longer occurs on my host system with VirtualBox 4.2.16. Thanks!

comment:60 by Frank Mehnert, 11 years ago

Status: new → closed
Resolution: → fixed

Hi sl4mmy, thanks for the feedback and thanks again for helping debug this problem. I will close this ticket. A better fix is still required, but this one will do for the moment.

comment:61 by Frank Mehnert, 11 years ago

See also #11171 for page allocation warnings on Linux hosts.
