VirtualBox

source: vbox/trunk/src/VBox/VMM/VMMR0/GMMR0.cpp@82978

Last change on this file since 82978 was 82978, checked in by vboxsync, 5 years ago

VMM/GMMR0: Introduce a spinlock to protect the AVL tree and associated TLB. bugref:9627

  • Property svn:eol-style set to native
  • Property svn:keywords set to Id Revision
File size: 196.7 KB
 
1/* $Id: GMMR0.cpp 82978 2020-02-04 14:52:50Z vboxsync $ */
2/** @file
3 * GMM - Global Memory Manager.
4 */
5
6/*
7 * Copyright (C) 2007-2020 Oracle Corporation
8 *
9 * This file is part of VirtualBox Open Source Edition (OSE), as
10 * available from http://www.virtualbox.org. This file is free software;
11 * you can redistribute it and/or modify it under the terms of the GNU
12 * General Public License (GPL) as published by the Free Software
13 * Foundation, in version 2 as it comes in the "COPYING" file of the
14 * VirtualBox OSE distribution. VirtualBox OSE is distributed in the
15 * hope that it will be useful, but WITHOUT ANY WARRANTY of any kind.
16 */
17
18
19/** @page pg_gmm GMM - The Global Memory Manager
20 *
21 * As the name indicates, this component is responsible for global memory
22 * management. Currently only guest RAM is allocated from the GMM, but this
23 * may change to include shadow page tables and other bits later.
24 *
25 * Guest RAM is managed as individual pages, but allocated from the host OS
26 * in chunks for reasons of portability / efficiency. To minimize the memory
27 * footprint all tracking structure must be as small as possible without
28 * unnecessary performance penalties.
29 *
30 * The allocation chunks have a fixed size, defined at compile time
31 * by the #GMM_CHUNK_SIZE \#define.
32 *
33 * Each chunk is given a unique ID. Each page also has a unique ID. The
34 * relationship between the two IDs is:
35 * @code
36 * GMM_CHUNK_SHIFT = log2(GMM_CHUNK_SIZE / PAGE_SIZE);
37 * idPage = (idChunk << GMM_CHUNK_SHIFT) | iPage;
38 * @endcode
39 * Where iPage is the index of the page within the chunk. This ID scheme
40 * permits efficient chunk and page lookup, but it relies on the chunk size
41 * being set at compile time. The chunks are organized in an AVL tree with their
42 * IDs being the keys.
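 *
 * As a sketch (assuming a hypothetical gmmR0GetChunk() helper; the real code
 * resolves chunk IDs via the chunk TLB and the AVL tree), a page ID is taken
 * apart like this:
 * @code
 * uint32_t  idChunk = idPage >> GMM_CHUNK_SHIFT;
 * uint32_t  iPage   = idPage & (GMM_CHUNK_NUM_PAGES - 1);
 * PGMMCHUNK pChunk  = gmmR0GetChunk(pGMM, idChunk);
 * PGMMPAGE  pPage   = &pChunk->aPages[iPage];
 * @endcode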
43 *
44 * The physical address of each page in an allocation chunk is maintained by
45 * the #RTR0MEMOBJ and obtained using #RTR0MemObjGetPagePhysAddr. There is no
46 * need to duplicate this information (it would cost 8 bytes per page if we did).
47 *
48 * So what do we need to track per page? Most importantly we need to know
49 * which state the page is in:
50 * - Private - Allocated for (eventually) backing one particular VM page.
51 * - Shared - Readonly page that is used by one or more VMs and treated
52 * as COW by PGM.
53 * - Free - Not used by anyone.
54 *
55 * For the page replacement operations (sharing, defragmenting and freeing)
56 * to be somewhat efficient, private pages need to be associated with a
57 * particular page in a particular VM.
58 *
59 * Tracking the usage of shared pages is impractical and expensive, so we'll
60 * settle for a reference counting system instead.
61 *
62 * Free pages will be chained on LIFOs.
63 *
64 * On 64-bit systems we will use a 64-bit bitfield per page, while on 32-bit
65 * systems a 32-bit bitfield will have to suffice because of address space
66 * limitations. The #GMMPAGE structure shows the details.
67 *
68 *
69 * @section sec_gmm_alloc_strat Page Allocation Strategy
70 *
71 * The strategy for allocating pages has to take fragmentation and shared
72 * pages into account, or we may end up with 2000 chunks with only
73 * a few pages in each. Shared pages cannot easily be reallocated because
74 * of the inaccurate usage accounting (see above). Private pages can be
75 * reallocated by a defragmentation thread in the same manner that sharing
76 * is done.
77 *
78 * The first approach is to manage the free pages in two sets depending on
79 * whether they are mainly for the allocation of shared or private pages.
80 * In the initial implementation there will be almost no possibility for
81 * mixing shared and private pages in the same chunk (only if we're really
82 * stressed on memory), but when we implement forking of VMs and have to
83 * deal with lots of COW pages it'll start getting kind of interesting.
84 *
85 * The sets are lists of chunks with approximately the same number of
86 * free pages. Say the chunk size is 2MB, meaning 512 pages, and a set
87 * consists of 16 lists. So, the first list will contain the chunks with
88 * 1-31 free pages, the second covers 32-63, and so on. The chunks will be
89 * moved between the lists as pages are freed up or allocated.
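 *
 * A minimal sketch of the implied list selection (assuming the 32 pages per
 * list granularity above; the real code uses its own shift constant and puts
 * completely free chunks on a separate list):
 * @code
 * unsigned  iList = pChunk->cFree >> 5;    // 32 free pages per list
 * PGMMCHUNK pHead = pSet->apLists[iList];  // link pChunk in at the head
 * @endcode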
90 *
91 *
92 * @section sec_gmm_costs Costs
93 *
94 * The per-page cost in kernel space is 32 bits plus whatever RTR0MEMOBJ
95 * entails. In addition there is the chunk cost of approximately
96 * (sizeof(RTR0MEMOBJ) + sizeof(CHUNK)) / 2^CHUNK_SHIFT bytes per page.
97 *
98 * On Windows the per-page #RTR0MEMOBJ cost is 32 bits on 32-bit Windows
99 * and 64 bits on 64-bit Windows (a PFN_NUMBER in the MDL). So, 64 bits per page.
100 * The cost on Linux is identical, but here it's because of sizeof(struct page *).
101 *
102 *
103 * @section sec_gmm_legacy Legacy Mode for Non-Tier-1 Platforms
104 *
105 * In legacy mode the page source is locked user pages and not
106 * #RTR0MemObjAllocPhysNC, this means that a page can only be allocated
107 * by the VM that locked it. We will make no attempt at implementing
108 * page sharing on these systems, just do enough to make it all work.
109 *
110 * @note With 6.1 really dropping 32-bit support, the legacy mode is obsoleted
111 * under the assumption that there is sufficient kernel virtual address
112 * space to map all of the guest memory allocations. So, we'll be using
113 * #RTR0MemObjAllocPage on some platforms as an alternative to
114 * #RTR0MemObjAllocPhysNC.
115 *
116 *
117 * @subsection sub_gmm_locking Serializing
118 *
119 * One simple fast mutex will be employed in the initial implementation, not
120 * two as mentioned in @ref sec_pgmPhys_Serializing.
121 *
122 * @see @ref sec_pgmPhys_Serializing
123 *
124 *
125 * @section sec_gmm_overcommit Memory Over-Commitment Management
126 *
127 * The GVM will have to do the system-wide memory over-commitment
128 * management. My current ideas are:
129 * - Per VM oc policy that indicates how much to initially commit
130 * to it and what to do in an out-of-memory situation.
131 * - Prevent overtaxing the host.
132 *
133 * There are some challenges here, the main ones are configurability and
134 * security. Should we for instance permit anyone to request 100% memory
135 * commitment? Who should be allowed to do runtime adjustments of the
136 * config? And how do we prevent these settings from being lost when the last
137 * VM process exits? The solution is probably to have an optional root
138 * daemon that will keep VMMR0.r0 in memory and enable the security measures.
139 *
140 *
141 *
142 * @section sec_gmm_numa NUMA
143 *
144 * NUMA considerations will be designed and implemented a bit later.
145 *
146 * The preliminary guess is that we will have to try to allocate memory as
147 * close as possible to the CPUs the VM is executed on (EMT and additional CPU
148 * threads), which means it's mostly about allocation and sharing policies.
149 * Both the scheduler and the allocator interface will have to supply some NUMA info,
150 * and we'll need to have a way to calculate access costs.
151 *
152 */
153
154
155/*********************************************************************************************************************************
156* Header Files *
157*********************************************************************************************************************************/
158#define LOG_GROUP LOG_GROUP_GMM
159#include <VBox/rawpci.h>
160#include <VBox/vmm/gmm.h>
161#include "GMMR0Internal.h"
162#include <VBox/vmm/vmcc.h>
163#include <VBox/vmm/pgm.h>
164#include <VBox/log.h>
165#include <VBox/param.h>
166#include <VBox/err.h>
167#include <VBox/VMMDev.h>
168#include <iprt/asm.h>
169#include <iprt/avl.h>
170#ifdef VBOX_STRICT
171# include <iprt/crc.h>
172#endif
173#include <iprt/critsect.h>
174#include <iprt/list.h>
175#include <iprt/mem.h>
176#include <iprt/memobj.h>
177#include <iprt/mp.h>
178#include <iprt/semaphore.h>
179#include <iprt/spinlock.h>
180#include <iprt/string.h>
181#include <iprt/time.h>
182
183
184/*********************************************************************************************************************************
185* Defined Constants And Macros *
186*********************************************************************************************************************************/
187/** @def VBOX_USE_CRIT_SECT_FOR_GIANT
188 * Use a critical section instead of a fast mutex for the giant GMM lock.
189 *
190 * @remarks This is primarily a way of avoiding the deadlock checks in the
191 * windows driver verifier. */
192#if defined(RT_OS_WINDOWS) || defined(RT_OS_DARWIN) || defined(DOXYGEN_RUNNING)
193# define VBOX_USE_CRIT_SECT_FOR_GIANT
194#endif
195
196#if (!defined(VBOX_WITH_RAM_IN_KERNEL) || defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)) \
197 && !defined(RT_OS_DARWIN)
198/** Enable the legacy mode code (will be dropped soon). */
199# define GMM_WITH_LEGACY_MODE
200#endif
201
202
203/*********************************************************************************************************************************
204* Structures and Typedefs *
205*********************************************************************************************************************************/
206/** Pointer to set of free chunks. */
207typedef struct GMMCHUNKFREESET *PGMMCHUNKFREESET;
208
209/**
210 * The per-page tracking structure employed by the GMM.
211 *
212 * On 32-bit hosts some trickery is necessary to compress all
213 * the information into 32 bits. When the fSharedFree member is set,
214 * the 30th bit decides whether it's a free page or not.
215 *
216 * Because of the different layout on 32-bit and 64-bit hosts, macros
217 * are used to get and set some of the data.
218 */
219typedef union GMMPAGE
220{
221#if HC_ARCH_BITS == 64
222 /** Unsigned integer view. */
223 uint64_t u;
224
225 /** The common view. */
226 struct GMMPAGECOMMON
227 {
228 uint32_t uStuff1 : 32;
229 uint32_t uStuff2 : 30;
230 /** The page state. */
231 uint32_t u2State : 2;
232 } Common;
233
234 /** The view of a private page. */
235 struct GMMPAGEPRIVATE
236 {
237 /** The guest page frame number. (Max addressable: 2 ^ 44 - 16) */
238 uint32_t pfn;
239 /** The GVM handle. (64K VMs) */
240 uint32_t hGVM : 16;
241 /** Reserved. */
242 uint32_t u16Reserved : 14;
243 /** The page state. */
244 uint32_t u2State : 2;
245 } Private;
246
247 /** The view of a shared page. */
248 struct GMMPAGESHARED
249 {
250 /** The host page frame number. (Max addressable: 2 ^ 44 - 16) */
251 uint32_t pfn;
252 /** The reference count (64K VMs). */
253 uint32_t cRefs : 16;
254 /** Used for debug checksumming. */
255 uint32_t u14Checksum : 14;
256 /** The page state. */
257 uint32_t u2State : 2;
258 } Shared;
259
260 /** The view of a free page. */
261 struct GMMPAGEFREE
262 {
263 /** The index of the next page in the free list. UINT16_MAX is NIL. */
264 uint16_t iNext;
265 /** Reserved. Checksum or something? */
266 uint16_t u16Reserved0;
267 /** Reserved. Checksum or something? */
268 uint32_t u30Reserved1 : 30;
269 /** The page state. */
270 uint32_t u2State : 2;
271 } Free;
272
273#else /* 32-bit */
274 /** Unsigned integer view. */
275 uint32_t u;
276
277 /** The common view. */
278 struct GMMPAGECOMMON
279 {
280 uint32_t uStuff : 30;
281 /** The page state. */
282 uint32_t u2State : 2;
283 } Common;
284
285 /** The view of a private page. */
286 struct GMMPAGEPRIVATE
287 {
288 /** The guest page frame number. (Max addressable: 2 ^ 36) */
289 uint32_t pfn : 24;
290 /** The GVM handle. (127 VMs) */
291 uint32_t hGVM : 7;
292 /** The top page state bit, MBZ. */
293 uint32_t fZero : 1;
294 } Private;
295
296 /** The view of a shared page. */
297 struct GMMPAGESHARED
298 {
299 /** The reference count. */
300 uint32_t cRefs : 30;
301 /** The page state. */
302 uint32_t u2State : 2;
303 } Shared;
304
305 /** The view of a free page. */
306 struct GMMPAGEFREE
307 {
308 /** The index of the next page in the free list. UINT16_MAX is NIL. */
309 uint32_t iNext : 16;
310 /** Reserved. Checksum or something? */
311 uint32_t u14Reserved : 14;
312 /** The page state. */
313 uint32_t u2State : 2;
314 } Free;
315#endif
316} GMMPAGE;
317AssertCompileSize(GMMPAGE, sizeof(RTHCUINTPTR));
318/** Pointer to a GMMPAGE. */
319typedef GMMPAGE *PGMMPAGE;
320
321
322/** @name The Page States.
323 * @{ */
324/** A private page. */
325#define GMM_PAGE_STATE_PRIVATE 0
326/** A private page - alternative value used on the 32-bit implementation.
327 * This will never be used on 64-bit hosts. */
328#define GMM_PAGE_STATE_PRIVATE_32 1
329/** A shared page. */
330#define GMM_PAGE_STATE_SHARED 2
331/** A free page. */
332#define GMM_PAGE_STATE_FREE 3
333/** @} */
334
335
336/** @def GMM_PAGE_IS_PRIVATE
337 *
338 * @returns true if private, false if not.
339 * @param pPage The GMM page.
340 */
341#if HC_ARCH_BITS == 64
342# define GMM_PAGE_IS_PRIVATE(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_PRIVATE )
343#else
344# define GMM_PAGE_IS_PRIVATE(pPage) ( (pPage)->Private.fZero == 0 )
345#endif
346
347/** @def GMM_PAGE_IS_SHARED
348 *
349 * @returns true if shared, false if not.
350 * @param pPage The GMM page.
351 */
352#define GMM_PAGE_IS_SHARED(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_SHARED )
353
354/** @def GMM_PAGE_IS_FREE
355 *
356 * @returns true if free, false if not.
357 * @param pPage The GMM page.
358 */
359#define GMM_PAGE_IS_FREE(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_FREE )
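
/* Example of classifying a page with the predicates above (a sketch; the real
 * scanning code lives in gmmR0CleanupVMScanChunk further down):
 *      PGMMPAGE pPage = &pChunk->aPages[iPage];
 *      if (GMM_PAGE_IS_PRIVATE(pPage))
 *          cPrivate++;
 *      else if (GMM_PAGE_IS_FREE(pPage))
 *          cFree++;
 *      else
 *          cShared++;
 */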
360
361/** @def GMM_PAGE_PFN_LAST
362 * The last valid guest pfn.
363 * @remark Some of the values outside the range have special meanings,
364 * see GMM_PAGE_PFN_UNSHAREABLE.
365 */
366#if HC_ARCH_BITS == 64
367# define GMM_PAGE_PFN_LAST UINT32_C(0xfffffff0)
368#else
369# define GMM_PAGE_PFN_LAST UINT32_C(0x00fffff0)
370#endif
371AssertCompile(GMM_PAGE_PFN_LAST == (GMM_GCPHYS_LAST >> PAGE_SHIFT));
372
373/** @def GMM_PAGE_PFN_UNSHAREABLE
374 * Indicates that this page isn't used for normal guest memory and thus isn't shareable.
375 */
376#if HC_ARCH_BITS == 64
377# define GMM_PAGE_PFN_UNSHAREABLE UINT32_C(0xfffffff1)
378#else
379# define GMM_PAGE_PFN_UNSHAREABLE UINT32_C(0x00fffff1)
380#endif
381AssertCompile(GMM_PAGE_PFN_UNSHAREABLE == (GMM_GCPHYS_UNSHAREABLE >> PAGE_SHIFT));
382
383
384/**
385 * A GMM allocation chunk ring-3 mapping record.
386 *
387 * This should really be associated with a session and not a VM, but
388 * it's simpler to associate it with a VM and clean up when the VM object
389 * is destroyed.
390 */
391typedef struct GMMCHUNKMAP
392{
393 /** The mapping object. */
394 RTR0MEMOBJ hMapObj;
395 /** The VM owning the mapping. */
396 PGVM pGVM;
397} GMMCHUNKMAP;
398/** Pointer to a GMM allocation chunk mapping. */
399typedef struct GMMCHUNKMAP *PGMMCHUNKMAP;
400
401
402/**
403 * A GMM allocation chunk.
404 */
405typedef struct GMMCHUNK
406{
407 /** The AVL node core.
408 * The Key is the chunk ID. (Giant mtx.) */
409 AVLU32NODECORE Core;
410 /** The memory object.
411 * Either from RTR0MemObjAllocPhysNC or RTR0MemObjLockUser depending on
412 * what the host can dish up. (Chunk mtx protects mapping accesses
413 * and related frees.) */
414 RTR0MEMOBJ hMemObj;
415#if defined(VBOX_WITH_RAM_IN_KERNEL) && !defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)
416 /** Pointer to the kernel mapping. */
417 uint8_t *pbMapping;
418#endif
419 /** Pointer to the next chunk in the free list. (Giant mtx.) */
420 PGMMCHUNK pFreeNext;
421 /** Pointer to the previous chunk in the free list. (Giant mtx.) */
422 PGMMCHUNK pFreePrev;
423 /** Pointer to the free set this chunk belongs to. NULL for
424 * chunks with no free pages. (Giant mtx.) */
425 PGMMCHUNKFREESET pSet;
426 /** List node in the chunk list (GMM::ChunkList). (Giant mtx.) */
427 RTLISTNODE ListNode;
428 /** Pointer to an array of mappings. (Chunk mtx.) */
429 PGMMCHUNKMAP paMappingsX;
430 /** The number of mappings. (Chunk mtx.) */
431 uint16_t cMappingsX;
432 * The mapping lock this chunk is using. UINT8_MAX if nobody is
433 * mapping or freeing anything. (Giant mtx.) */
434 uint8_t volatile iChunkMtx;
435 /** GMM_CHUNK_FLAGS_XXX. (Giant mtx.) */
436 uint8_t fFlags;
437 /** The head of the list of free pages. UINT16_MAX is the NIL value.
438 * (Giant mtx.) */
439 uint16_t iFreeHead;
440 /** The number of free pages. (Giant mtx.) */
441 uint16_t cFree;
442 /** The GVM handle of the VM that first allocated pages from this chunk, this
443 * is used as a preference when there are several chunks to choose from.
444 * When in bound memory mode this isn't a preference any longer. (Giant
445 * mtx.) */
446 uint16_t hGVM;
447 /** The ID of the NUMA node the memory mostly resides on. (Reserved for
448 * future use.) (Giant mtx.) */
449 uint16_t idNumaNode;
450 /** The number of private pages. (Giant mtx.) */
451 uint16_t cPrivate;
452 /** The number of shared pages. (Giant mtx.) */
453 uint16_t cShared;
454 /** The pages. (Giant mtx.) */
455 GMMPAGE aPages[GMM_CHUNK_SIZE >> PAGE_SHIFT];
456} GMMCHUNK;
457
458/** Indicates that the NUMA properties of the memory are unknown. */
459#define GMM_CHUNK_NUMA_ID_UNKNOWN UINT16_C(0xfffe)
460
461/** @name GMM_CHUNK_FLAGS_XXX - chunk flags.
462 * @{ */
463/** Indicates that the chunk is a large page (2MB). */
464#define GMM_CHUNK_FLAGS_LARGE_PAGE UINT16_C(0x0001)
465#ifdef GMM_WITH_LEGACY_MODE
466/** Indicates that the chunk was locked rather than allocated directly. */
467# define GMM_CHUNK_FLAGS_SEEDED UINT16_C(0x0002)
468#endif
469/** @} */
470
471
472/**
473 * An allocation chunk TLB entry.
474 */
475typedef struct GMMCHUNKTLBE
476{
477 /** The chunk id. */
478 uint32_t idChunk;
479 /** Pointer to the chunk. */
480 PGMMCHUNK pChunk;
481} GMMCHUNKTLBE;
482/** Pointer to an allocation chunk TLB entry. */
483typedef GMMCHUNKTLBE *PGMMCHUNKTLBE;
484
485
486/** The number of entries in the allocation chunk TLB. */
487#define GMM_CHUNKTLB_ENTRIES 32
488/** Gets the TLB entry index for the given Chunk ID. */
489#define GMM_CHUNKTLB_IDX(idChunk) ( (idChunk) & (GMM_CHUNKTLB_ENTRIES - 1) )
490
491/**
492 * An allocation chunk TLB.
493 */
494typedef struct GMMCHUNKTLB
495{
496 /** The TLB entries. */
497 GMMCHUNKTLBE aEntries[GMM_CHUNKTLB_ENTRIES];
498} GMMCHUNKTLB;
499/** Pointer to an allocation chunk TLB. */
500typedef GMMCHUNKTLB *PGMMCHUNKTLB;
501
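/* Lookup sketch combining the TLB and the AVL tree (illustration only, not an
 * actual helper from this file; both structures are accessed while holding
 * the tree spinlock, see GMM::hSpinLockTree below):
 *      PGMMCHUNKTLBE pTlbe  = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(idChunk)];
 *      PGMMCHUNK     pChunk = pTlbe->idChunk == idChunk ? pTlbe->pChunk : NULL;
 *      if (!pChunk)
 *      {
 *          pChunk = (PGMMCHUNK)RTAvlU32Get(&pGMM->pChunks, idChunk);
 *          if (pChunk)
 *          {
 *              pTlbe->idChunk = idChunk;
 *              pTlbe->pChunk  = pChunk;
 *          }
 *      }
 */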
502
503/**
504 * The GMM instance data.
505 */
506typedef struct GMM
507{
508 /** Magic / eye catcher. GMM_MAGIC */
509 uint32_t u32Magic;
510 /** The number of threads waiting on the mutex. */
511 uint32_t cMtxContenders;
512#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
513 /** The critical section protecting the GMM.
514 * More fine grained locking can be implemented later if necessary. */
515 RTCRITSECT GiantCritSect;
516#else
517 /** The fast mutex protecting the GMM.
518 * More fine grained locking can be implemented later if necessary. */
519 RTSEMFASTMUTEX hMtx;
520#endif
521#ifdef VBOX_STRICT
522 /** The current mutex owner. */
523 RTNATIVETHREAD hMtxOwner;
524#endif
525 /** Spinlock protecting the AVL tree.
526 * @todo Make this a read-write spinlock as we should allow concurrent
527 * lookups. */
528 RTSPINLOCK hSpinLockTree;
529 /** The chunk tree.
530 * Protected by hSpinLockTree. */
531 PAVLU32NODECORE pChunks;
532 /** The chunk TLB.
533 * Protected by hSpinLockTree. */
534 GMMCHUNKTLB ChunkTLB;
535 /** The private free set. */
536 GMMCHUNKFREESET PrivateX;
537 /** The shared free set. */
538 GMMCHUNKFREESET Shared;
539
540 /** Shared module tree (global).
541 * @todo separate trees for distinctly different guest OSes. */
542 PAVLLU32NODECORE pGlobalSharedModuleTree;
543 /** Sharable modules (count of nodes in pGlobalSharedModuleTree). */
544 uint32_t cShareableModules;
545
546 /** The chunk list. For simplifying the cleanup process and avoiding tree
547 * traversal. */
548 RTLISTANCHOR ChunkList;
549
550 /** The maximum number of pages we're allowed to allocate.
551 * @gcfgm{GMM/MaxPages,64-bit, Direct.}
552 * @gcfgm{GMM/PctPages,32-bit, Relative to the number of host pages.} */
553 uint64_t cMaxPages;
554 /** The number of pages that have been reserved.
555 * The deal is that cReservedPages - cOverCommittedPages <= cMaxPages. */
556 uint64_t cReservedPages;
557 /** The number of pages that we have over-committed in reservations. */
558 uint64_t cOverCommittedPages;
559 /** The number of actually allocated (committed if you like) pages. */
560 uint64_t cAllocatedPages;
561 /** The number of pages that are shared. A subset of cAllocatedPages. */
562 uint64_t cSharedPages;
563 /** The number of pages that are actually shared between VMs. */
564 uint64_t cDuplicatePages;
565 /** The number of shared pages that have been left behind by
566 * VMs not doing proper cleanups. */
567 uint64_t cLeftBehindSharedPages;
568 /** The number of allocation chunks.
569 * (The number of pages we've allocated from the host can be derived from this.) */
570 uint32_t cChunks;
571 /** The number of current ballooned pages. */
572 uint64_t cBalloonedPages;
573
574#ifndef GMM_WITH_LEGACY_MODE
575# ifdef VBOX_WITH_LINEAR_HOST_PHYS_MEM
576 /** Whether #RTR0MemObjAllocPhysNC works. */
577 bool fHasWorkingAllocPhysNC;
578# else
579 bool fPadding;
580# endif
581#else
582 /** The legacy allocation mode indicator.
583 * This is determined at initialization time. */
584 bool fLegacyAllocationMode;
585#endif
586 /** The bound memory mode indicator.
587 * When set, the memory will be bound to a specific VM and never
588 * shared. This is always set if fLegacyAllocationMode is set.
589 * (Also determined at initialization time.) */
590 bool fBoundMemoryMode;
591 /** The number of registered VMs. */
592 uint16_t cRegisteredVMs;
593
594 /** The number of freed chunks ever. This is used as a list generation to
595 * avoid restarting the cleanup scanning when the list wasn't modified. */
596 uint32_t cFreedChunks;
597 /** The previously allocated Chunk ID.
598 * Used as a hint to avoid scanning the whole bitmap. */
599 uint32_t idChunkPrev;
600 /** Chunk ID allocation bitmap.
601 * Bits of allocated IDs are set, free ones are clear.
602 * The NIL id (0) is marked allocated. */
603 uint32_t bmChunkId[(GMM_CHUNKID_LAST + 1 + 31) / 32];
604
605 /** The index of the next mutex to use. */
606 uint32_t iNextChunkMtx;
607 /** Chunk locks for reducing lock contention without having to allocate
608 * one lock per chunk. */
609 struct
610 {
611 /** The mutex */
612 RTSEMFASTMUTEX hMtx;
613 /** The number of threads currently using this mutex. */
614 uint32_t volatile cUsers;
615 } aChunkMtx[64];
616} GMM;
617/** Pointer to the GMM instance. */
618typedef GMM *PGMM;
619
620/** The value of GMM::u32Magic (Katsuhiro Otomo). */
621#define GMM_MAGIC UINT32_C(0x19540414)
622
623
624/**
625 * GMM chunk mutex state.
626 *
627 * This is returned by gmmR0ChunkMutexAcquire and is used by the other
628 * gmmR0ChunkMutex* methods.
629 */
630typedef struct GMMR0CHUNKMTXSTATE
631{
632 PGMM pGMM;
633 /** The index of the chunk mutex. */
634 uint8_t iChunkMtx;
635 /** The relevant flags (GMMR0CHUNK_MTX_XXX). */
636 uint8_t fFlags;
637} GMMR0CHUNKMTXSTATE;
638/** Pointer to a chunk mutex state. */
639typedef GMMR0CHUNKMTXSTATE *PGMMR0CHUNKMTXSTATE;
640
641/** @name GMMR0CHUNK_MTX_XXX
642 * @{ */
643#define GMMR0CHUNK_MTX_INVALID UINT32_C(0)
644#define GMMR0CHUNK_MTX_KEEP_GIANT UINT32_C(1)
645#define GMMR0CHUNK_MTX_RETAKE_GIANT UINT32_C(2)
646#define GMMR0CHUNK_MTX_DROP_GIANT UINT32_C(3)
647#define GMMR0CHUNK_MTX_END UINT32_C(4)
648/** @} */
649
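/* Typical usage of the chunk mutex helpers defined further down (a sketch;
 * see gmmR0CleanupVMScanChunk for a real caller):
 *      GMMR0CHUNKMTXSTATE MtxState;
 *      gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
 *      // ... work on the chunk's mappings goes here ...
 *      gmmR0ChunkMutexRelease(&MtxState, pChunk);
 */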
650
651/** The maximum number of shared modules per-vm. */
652#define GMM_MAX_SHARED_PER_VM_MODULES 2048
653/** The maximum number of shared modules GMM is allowed to track. */
654#define GMM_MAX_SHARED_GLOBAL_MODULES 16834
655
656
657/**
658 * Argument packet for gmmR0SharedModuleCleanup.
659 */
660typedef struct GMMR0SHMODPERVMDTORARGS
661{
662 PGVM pGVM;
663 PGMM pGMM;
664} GMMR0SHMODPERVMDTORARGS;
665
666/**
667 * Argument packet for gmmR0CheckSharedModule.
668 */
669typedef struct GMMCHECKSHAREDMODULEINFO
670{
671 PGVM pGVM;
672 VMCPUID idCpu;
673} GMMCHECKSHAREDMODULEINFO;
674
675
676/*********************************************************************************************************************************
677* Global Variables *
678*********************************************************************************************************************************/
679/** Pointer to the GMM instance data. */
680static PGMM g_pGMM = NULL;
681
682/** Macro for obtaining and validating the g_pGMM pointer.
683 *
684 * On failure it will return from the invoking function with the specified
685 * return value.
686 *
687 * @param pGMM The name of the pGMM variable.
688 * @param rc The return value on failure. Use VERR_GMM_INSTANCE for VBox
689 * status codes.
690 */
691#define GMM_GET_VALID_INSTANCE(pGMM, rc) \
692 do { \
693 (pGMM) = g_pGMM; \
694 AssertPtrReturn((pGMM), (rc)); \
695 AssertMsgReturn((pGMM)->u32Magic == GMM_MAGIC, ("%p - %#x\n", (pGMM), (pGMM)->u32Magic), (rc)); \
696 } while (0)
697
698/** Macro for obtaining and validating the g_pGMM pointer, void function
699 * variant.
700 *
701 * On failure it will return from the invoking function.
702 *
703 * @param pGMM The name of the pGMM variable.
704 */
705#define GMM_GET_VALID_INSTANCE_VOID(pGMM) \
706 do { \
707 (pGMM) = g_pGMM; \
708 AssertPtrReturnVoid((pGMM)); \
709 AssertMsgReturnVoid((pGMM)->u32Magic == GMM_MAGIC, ("%p - %#x\n", (pGMM), (pGMM)->u32Magic)); \
710 } while (0)
711
712
713/** @def GMM_CHECK_SANITY_UPON_ENTERING
714 * Checks the sanity of the GMM instance data before making changes.
715 *
716 * This macro is a stub by default and must be enabled manually in the code.
717 *
718 * @returns true if sane, false if not.
719 * @param pGMM The name of the pGMM variable.
720 */
721#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
722# define GMM_CHECK_SANITY_UPON_ENTERING(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
723#else
724# define GMM_CHECK_SANITY_UPON_ENTERING(pGMM) (true)
725#endif
726
727/** @def GMM_CHECK_SANITY_UPON_LEAVING
728 * Checks the sanity of the GMM instance data after making changes.
729 *
730 * This macro is a stub by default and must be enabled manually in the code.
731 *
732 * @returns true if sane, false if not.
733 * @param pGMM The name of the pGMM variable.
734 */
735#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
736# define GMM_CHECK_SANITY_UPON_LEAVING(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
737#else
738# define GMM_CHECK_SANITY_UPON_LEAVING(pGMM) (true)
739#endif
740
741/** @def GMM_CHECK_SANITY_IN_LOOPS
742 * Checks the sanity of the GMM instance in the allocation loops.
743 *
744 * This macro is a stub by default and must be enabled manually in the code.
745 *
746 * @returns true if sane, false if not.
747 * @param pGMM The name of the pGMM variable.
748 */
749#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
750# define GMM_CHECK_SANITY_IN_LOOPS(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
751#else
752# define GMM_CHECK_SANITY_IN_LOOPS(pGMM) (true)
753#endif
754
755
756/*********************************************************************************************************************************
757* Internal Functions *
758*********************************************************************************************************************************/
759static DECLCALLBACK(int) gmmR0TermDestroyChunk(PAVLU32NODECORE pNode, void *pvGMM);
760static bool gmmR0CleanupVMScanChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
761DECLINLINE(void) gmmR0UnlinkChunk(PGMMCHUNK pChunk);
762DECLINLINE(void) gmmR0LinkChunk(PGMMCHUNK pChunk, PGMMCHUNKFREESET pSet);
763DECLINLINE(void) gmmR0SelectSetAndLinkChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
764#ifdef GMMR0_WITH_SANITY_CHECK
765static uint32_t gmmR0SanityCheck(PGMM pGMM, const char *pszFunction, unsigned uLineNo);
766#endif
767static bool gmmR0FreeChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem);
768DECLINLINE(void) gmmR0FreePrivatePage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage);
769DECLINLINE(void) gmmR0FreeSharedPage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage);
770static int gmmR0UnmapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
771#ifdef VBOX_WITH_PAGE_SHARING
772static void gmmR0SharedModuleCleanup(PGMM pGMM, PGVM pGVM);
773# ifdef VBOX_STRICT
774static uint32_t gmmR0StrictPageChecksum(PGMM pGMM, PGVM pGVM, uint32_t idPage);
775# endif
776#endif
777
778
779
780/**
781 * Initializes the GMM component.
782 *
783 * This is called when the VMMR0.r0 module is loaded and protected by the
784 * loader semaphore.
785 *
786 * @returns VBox status code.
787 */
788GMMR0DECL(int) GMMR0Init(void)
789{
790 LogFlow(("GMMInit:\n"));
791
792 /*
793 * Allocate the instance data and the locks.
794 */
795 PGMM pGMM = (PGMM)RTMemAllocZ(sizeof(*pGMM));
796 if (!pGMM)
797 return VERR_NO_MEMORY;
798
799 pGMM->u32Magic = GMM_MAGIC;
800 for (unsigned i = 0; i < RT_ELEMENTS(pGMM->ChunkTLB.aEntries); i++)
801 pGMM->ChunkTLB.aEntries[i].idChunk = NIL_GMM_CHUNKID;
802 RTListInit(&pGMM->ChunkList);
803 ASMBitSet(&pGMM->bmChunkId[0], NIL_GMM_CHUNKID);
804
805#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
806 int rc = RTCritSectInit(&pGMM->GiantCritSect);
807#else
808 int rc = RTSemFastMutexCreate(&pGMM->hMtx);
809#endif
810 if (RT_SUCCESS(rc))
811 {
812 unsigned iMtx;
813 for (iMtx = 0; iMtx < RT_ELEMENTS(pGMM->aChunkMtx); iMtx++)
814 {
815 rc = RTSemFastMutexCreate(&pGMM->aChunkMtx[iMtx].hMtx);
816 if (RT_FAILURE(rc))
817 break;
818 }
819 pGMM->hSpinLockTree = NIL_RTSPINLOCK;
820 if (RT_SUCCESS(rc))
821 rc = RTSpinlockCreate(&pGMM->hSpinLockTree, RTSPINLOCK_FLAGS_INTERRUPT_SAFE, "gmm-chunk-tree");
822 if (RT_SUCCESS(rc))
823 {
824#ifndef GMM_WITH_LEGACY_MODE
825 /*
826 * Figure out how we're going to allocate stuff (only applicable to
827 * host with linear physical memory mappings).
828 */
829 pGMM->fBoundMemoryMode = false;
830# ifdef VBOX_WITH_LINEAR_HOST_PHYS_MEM
831 pGMM->fHasWorkingAllocPhysNC = false;
832
833 RTR0MEMOBJ hMemObj;
834 rc = RTR0MemObjAllocPhysNC(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS);
835 if (RT_SUCCESS(rc))
836 {
837 rc = RTR0MemObjFree(hMemObj, true);
838 AssertRC(rc);
839 pGMM->fHasWorkingAllocPhysNC = true;
840 }
841 else if (rc != VERR_NOT_SUPPORTED)
842 SUPR0Printf("GMMR0Init: Warning! RTR0MemObjAllocPhysNC(, %u, NIL_RTHCPHYS) -> %d!\n", GMM_CHUNK_SIZE, rc);
843# endif
844#else /* GMM_WITH_LEGACY_MODE */
845 /*
846 * Check and see if RTR0MemObjAllocPhysNC works.
847 */
848# if 0 /* later, see @bugref{3170}. */
849 RTR0MEMOBJ MemObj;
850 rc = RTR0MemObjAllocPhysNC(&MemObj, _64K, NIL_RTHCPHYS);
851 if (RT_SUCCESS(rc))
852 {
853 rc = RTR0MemObjFree(MemObj, true);
854 AssertRC(rc);
855 }
856 else if (rc == VERR_NOT_SUPPORTED)
857 pGMM->fLegacyAllocationMode = pGMM->fBoundMemoryMode = true;
858 else
859 SUPR0Printf("GMMR0Init: RTR0MemObjAllocPhysNC(,64K,Any) -> %d!\n", rc);
860# else
861# if defined(RT_OS_WINDOWS) || (defined(RT_OS_SOLARIS) && ARCH_BITS == 64) || defined(RT_OS_LINUX) || defined(RT_OS_FREEBSD)
862 pGMM->fLegacyAllocationMode = false;
863# if ARCH_BITS == 32
864 /* Don't reuse possibly partial chunks because of the virtual
865 address space limitation. */
866 pGMM->fBoundMemoryMode = true;
867# else
868 pGMM->fBoundMemoryMode = false;
869# endif
870# else
871 pGMM->fLegacyAllocationMode = true;
872 pGMM->fBoundMemoryMode = true;
873# endif
874# endif
875#endif /* GMM_WITH_LEGACY_MODE */
876
877 /*
878 * Query system page count and guess a reasonable cMaxPages value.
879 */
880 pGMM->cMaxPages = UINT32_MAX; /** @todo IPRT function for query ram size and such. */
881
882 g_pGMM = pGMM;
883#ifdef GMM_WITH_LEGACY_MODE
884 LogFlow(("GMMInit: pGMM=%p fLegacyAllocationMode=%RTbool fBoundMemoryMode=%RTbool\n", pGMM, pGMM->fLegacyAllocationMode, pGMM->fBoundMemoryMode));
885#elif defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)
886 LogFlow(("GMMInit: pGMM=%p fBoundMemoryMode=%RTbool fHasWorkingAllocPhysNC=%RTbool\n", pGMM, pGMM->fBoundMemoryMode, pGMM->fHasWorkingAllocPhysNC));
887#else
888 LogFlow(("GMMInit: pGMM=%p fBoundMemoryMode=%RTbool\n", pGMM, pGMM->fBoundMemoryMode));
889#endif
890 return VINF_SUCCESS;
891 }
892
893 /*
894 * Bail out.
895 */
896 RTSpinlockDestroy(pGMM->hSpinLockTree);
897 while (iMtx-- > 0)
898 RTSemFastMutexDestroy(pGMM->aChunkMtx[iMtx].hMtx);
899#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
900 RTCritSectDelete(&pGMM->GiantCritSect);
901#else
902 RTSemFastMutexDestroy(pGMM->hMtx);
903#endif
904 }
905
906 pGMM->u32Magic = 0;
907 RTMemFree(pGMM);
908 SUPR0Printf("GMMR0Init: failed! rc=%d\n", rc);
909 return rc;
910}
911
912
913/**
914 * Terminates the GMM component.
915 */
916GMMR0DECL(void) GMMR0Term(void)
917{
918 LogFlow(("GMMTerm:\n"));
919
920 /*
921 * Take care / be paranoid...
922 */
923 PGMM pGMM = g_pGMM;
924 if (!VALID_PTR(pGMM))
925 return;
926 if (pGMM->u32Magic != GMM_MAGIC)
927 {
928 SUPR0Printf("GMMR0Term: u32Magic=%#x\n", pGMM->u32Magic);
929 return;
930 }
931
932 /*
933 * Undo what init did and free all the resources we've acquired.
934 */
935 /* Destroy the fundamentals. */
936 g_pGMM = NULL;
937 pGMM->u32Magic = ~GMM_MAGIC;
938#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
939 RTCritSectDelete(&pGMM->GiantCritSect);
940#else
941 RTSemFastMutexDestroy(pGMM->hMtx);
942 pGMM->hMtx = NIL_RTSEMFASTMUTEX;
943#endif
944 RTSpinlockDestroy(pGMM->hSpinLockTree);
945 pGMM->hSpinLockTree = NIL_RTSPINLOCK;
946
947 /* Free any chunks still hanging around. */
948 RTAvlU32Destroy(&pGMM->pChunks, gmmR0TermDestroyChunk, pGMM);
949
950 /* Destroy the chunk locks. */
951 for (unsigned iMtx = 0; iMtx < RT_ELEMENTS(pGMM->aChunkMtx); iMtx++)
952 {
953 Assert(pGMM->aChunkMtx[iMtx].cUsers == 0);
954 RTSemFastMutexDestroy(pGMM->aChunkMtx[iMtx].hMtx);
955 pGMM->aChunkMtx[iMtx].hMtx = NIL_RTSEMFASTMUTEX;
956 }
957
958 /* Finally the instance data itself. */
959 RTMemFree(pGMM);
960 LogFlow(("GMMTerm: done\n"));
961}
962
963
964/**
965 * RTAvlU32Destroy callback.
966 *
967 * @returns 0
968 * @param pNode The node to destroy.
969 * @param pvGMM The GMM handle.
970 */
971static DECLCALLBACK(int) gmmR0TermDestroyChunk(PAVLU32NODECORE pNode, void *pvGMM)
972{
973 PGMMCHUNK pChunk = (PGMMCHUNK)pNode;
974
975 if (pChunk->cFree != (GMM_CHUNK_SIZE >> PAGE_SHIFT))
976 SUPR0Printf("GMMR0Term: %RKv/%#x: cFree=%d cPrivate=%d cShared=%d cMappings=%d\n", pChunk,
977 pChunk->Core.Key, pChunk->cFree, pChunk->cPrivate, pChunk->cShared, pChunk->cMappingsX);
978
979 int rc = RTR0MemObjFree(pChunk->hMemObj, true /* fFreeMappings */);
980 if (RT_FAILURE(rc))
981 {
982 SUPR0Printf("GMMR0Term: %RKv/%#x: RTRMemObjFree(%RKv,true) -> %d (cMappings=%d)\n", pChunk,
983 pChunk->Core.Key, pChunk->hMemObj, rc, pChunk->cMappingsX);
984 AssertRC(rc);
985 }
986 pChunk->hMemObj = NIL_RTR0MEMOBJ;
987
988 RTMemFree(pChunk->paMappingsX);
989 pChunk->paMappingsX = NULL;
990
991 RTMemFree(pChunk);
992 NOREF(pvGMM);
993 return 0;
994}
995
996
997/**
998 * Initializes the per-VM data for the GMM.
999 *
1000 * This is called from within the GVMM lock (from GVMMR0CreateVM)
1001 * and should only initialize the data members so GMMR0CleanupVM
1002 * can deal with them. We reserve no memory or anything here,
1003 * that's done later in GMMR0InitVM.
1004 *
1005 * @param pGVM Pointer to the Global VM structure.
1006 */
1007GMMR0DECL(void) GMMR0InitPerVMData(PGVM pGVM)
1008{
1009 AssertCompile(RT_SIZEOFMEMB(GVM,gmm.s) <= RT_SIZEOFMEMB(GVM,gmm.padding));
1010
1011 pGVM->gmm.s.Stats.enmPolicy = GMMOCPOLICY_INVALID;
1012 pGVM->gmm.s.Stats.enmPriority = GMMPRIORITY_INVALID;
1013 pGVM->gmm.s.Stats.fMayAllocate = false;
1014}
1015
1016
1017/**
1018 * Acquires the GMM giant lock.
1019 *
1020 * @returns Assert status code from RTSemFastMutexRequest.
1021 * @param pGMM Pointer to the GMM instance.
1022 */
1023static int gmmR0MutexAcquire(PGMM pGMM)
1024{
1025 ASMAtomicIncU32(&pGMM->cMtxContenders);
1026#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1027 int rc = RTCritSectEnter(&pGMM->GiantCritSect);
1028#else
1029 int rc = RTSemFastMutexRequest(pGMM->hMtx);
1030#endif
1031 ASMAtomicDecU32(&pGMM->cMtxContenders);
1032 AssertRC(rc);
1033#ifdef VBOX_STRICT
1034 pGMM->hMtxOwner = RTThreadNativeSelf();
1035#endif
1036 return rc;
1037}
1038
1039
1040/**
1041 * Releases the GMM giant lock.
1042 *
1043 * @returns Assert status code from RTSemFastMutexRelease.
1044 * @param pGMM Pointer to the GMM instance.
1045 */
1046static int gmmR0MutexRelease(PGMM pGMM)
1047{
1048#ifdef VBOX_STRICT
1049 pGMM->hMtxOwner = NIL_RTNATIVETHREAD;
1050#endif
1051#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1052 int rc = RTCritSectLeave(&pGMM->GiantCritSect);
1053#else
1054 int rc = RTSemFastMutexRelease(pGMM->hMtx);
1055 AssertRC(rc);
1056#endif
1057 return rc;
1058}
1059
1060
1061/**
1062 * Yields the GMM giant lock if there is contention and a certain minimum time
1063 * has elapsed since we took it.
1064 *
1065 * @returns @c true if the mutex was yielded, @c false if not.
1066 * @param pGMM Pointer to the GMM instance.
1067 * @param puLockNanoTS Where the lock acquisition time stamp is kept
1068 * (in/out).
1069 */
1070static bool gmmR0MutexYield(PGMM pGMM, uint64_t *puLockNanoTS)
1071{
1072 /*
1073 * If nobody is contending the mutex, don't bother checking the time.
1074 */
1075 if (ASMAtomicReadU32(&pGMM->cMtxContenders) == 0)
1076 return false;
1077
1078 /*
1079 * Don't yield if we haven't executed for at least 2 milliseconds.
1080 */
1081 uint64_t uNanoNow = RTTimeSystemNanoTS();
1082 if (uNanoNow - *puLockNanoTS < UINT32_C(2000000))
1083 return false;
1084
1085 /*
1086 * Yield the mutex.
1087 */
1088#ifdef VBOX_STRICT
1089 pGMM->hMtxOwner = NIL_RTNATIVETHREAD;
1090#endif
1091 ASMAtomicIncU32(&pGMM->cMtxContenders);
1092#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1093 int rc1 = RTCritSectLeave(&pGMM->GiantCritSect); AssertRC(rc1);
1094#else
1095 int rc1 = RTSemFastMutexRelease(pGMM->hMtx); AssertRC(rc1);
1096#endif
1097
1098 RTThreadYield();
1099
1100#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1101 int rc2 = RTCritSectEnter(&pGMM->GiantCritSect); AssertRC(rc2);
1102#else
1103 int rc2 = RTSemFastMutexRequest(pGMM->hMtx); AssertRC(rc2);
1104#endif
1105 *puLockNanoTS = RTTimeSystemNanoTS();
1106 ASMAtomicDecU32(&pGMM->cMtxContenders);
1107#ifdef VBOX_STRICT
1108 pGMM->hMtxOwner = RTThreadNativeSelf();
1109#endif
1110
1111 return true;
1112}
1113
1114
1115/**
1116 * Acquires a chunk lock.
1117 *
1118 * The caller must own the giant lock.
1119 *
1120 * @returns Assert status code from RTSemFastMutexRequest.
1121 * @param pMtxState The chunk mutex state info. (Avoids
1122 * passing the same flags and stuff around
1123 * for subsequent release and drop-giant
1124 * calls.)
1125 * @param pGMM Pointer to the GMM instance.
1126 * @param pChunk Pointer to the chunk.
1127 * @param fFlags Flags regarding the giant lock, GMMR0CHUNK_MTX_XXX.
1128 */
1129static int gmmR0ChunkMutexAcquire(PGMMR0CHUNKMTXSTATE pMtxState, PGMM pGMM, PGMMCHUNK pChunk, uint32_t fFlags)
1130{
1131 Assert(fFlags > GMMR0CHUNK_MTX_INVALID && fFlags < GMMR0CHUNK_MTX_END);
1132 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
1133
1134 pMtxState->pGMM = pGMM;
1135 pMtxState->fFlags = (uint8_t)fFlags;
1136
1137 /*
1138 * Get the lock index and reference the lock.
1139 */
1140 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
1141 uint32_t iChunkMtx = pChunk->iChunkMtx;
1142 if (iChunkMtx == UINT8_MAX)
1143 {
1144 iChunkMtx = pGMM->iNextChunkMtx++;
1145 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1146
1147 /* Try get an unused one... */
1148 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1149 {
1150 iChunkMtx = pGMM->iNextChunkMtx++;
1151 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1152 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1153 {
1154 iChunkMtx = pGMM->iNextChunkMtx++;
1155 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1156 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1157 {
1158 iChunkMtx = pGMM->iNextChunkMtx++;
1159 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1160 }
1161 }
1162 }
1163
1164 pChunk->iChunkMtx = iChunkMtx;
1165 }
1166 AssertCompile(RT_ELEMENTS(pGMM->aChunkMtx) < UINT8_MAX);
1167 pMtxState->iChunkMtx = (uint8_t)iChunkMtx;
1168 ASMAtomicIncU32(&pGMM->aChunkMtx[iChunkMtx].cUsers);
1169
1170 /*
1171 * Drop the giant?
1172 */
1173 if (fFlags != GMMR0CHUNK_MTX_KEEP_GIANT)
1174 {
1175 /** @todo GMM life cycle cleanup (we may race someone
1176 * destroying and cleaning up GMM)? */
1177 gmmR0MutexRelease(pGMM);
1178 }
1179
1180 /*
1181 * Take the chunk mutex.
1182 */
1183 int rc = RTSemFastMutexRequest(pGMM->aChunkMtx[iChunkMtx].hMtx);
1184 AssertRC(rc);
1185 return rc;
1186}
1187
1188
1189/**
1190 * Releases the chunk mutex acquired by gmmR0ChunkMutexAcquire.
1191 *
1192 * @returns Assert status code from RTSemFastMutexRequest.
1193 * @param pMtxState Pointer to the chunk mutex state.
1194 * @param pChunk Pointer to the chunk if it's still
1195 * alive, NULL if it isn't. This is used to deassociate
1196 * the chunk from the mutex on the way out so a new one
1197 * can be selected next time, thus avoiding contended
1198 * mutexes.
1199 */
1200static int gmmR0ChunkMutexRelease(PGMMR0CHUNKMTXSTATE pMtxState, PGMMCHUNK pChunk)
1201{
1202 PGMM pGMM = pMtxState->pGMM;
1203
1204 /*
1205 * Release the chunk mutex and reacquire the giant if requested.
1206 */
1207 int rc = RTSemFastMutexRelease(pGMM->aChunkMtx[pMtxState->iChunkMtx].hMtx);
1208 AssertRC(rc);
1209 if (pMtxState->fFlags == GMMR0CHUNK_MTX_RETAKE_GIANT)
1210 rc = gmmR0MutexAcquire(pGMM);
1211 else
1212 Assert((pMtxState->fFlags != GMMR0CHUNK_MTX_DROP_GIANT) == (pGMM->hMtxOwner == RTThreadNativeSelf()));
1213
1214 /*
1215 * Drop the chunk mutex user reference and deassociate it from the chunk
1216 * when possible.
1217 */
1218 if ( ASMAtomicDecU32(&pGMM->aChunkMtx[pMtxState->iChunkMtx].cUsers) == 0
1219 && pChunk
1220 && RT_SUCCESS(rc) )
1221 {
1222 if (pMtxState->fFlags != GMMR0CHUNK_MTX_DROP_GIANT)
1223 pChunk->iChunkMtx = UINT8_MAX;
1224 else
1225 {
1226 rc = gmmR0MutexAcquire(pGMM);
1227 if (RT_SUCCESS(rc))
1228 {
1229 if (pGMM->aChunkMtx[pMtxState->iChunkMtx].cUsers == 0)
1230 pChunk->iChunkMtx = UINT8_MAX;
1231 rc = gmmR0MutexRelease(pGMM);
1232 }
1233 }
1234 }
1235
1236 pMtxState->pGMM = NULL;
1237 return rc;
1238}
1239
1240
1241/**
1242 * Drops the giant GMM lock we kept in gmmR0ChunkMutexAcquire while keeping the
1243 * chunk locked.
1244 *
1245 * This only works if gmmR0ChunkMutexAcquire was called with
1246 * GMMR0CHUNK_MTX_KEEP_GIANT. gmmR0ChunkMutexRelease will retake the giant
1247 * mutex, i.e. behave as if GMMR0CHUNK_MTX_RETAKE_GIANT was used.
1248 *
1249 * @returns VBox status code (assuming success is ok).
1250 * @param pMtxState Pointer to the chunk mutex state.
1251 */
1252static int gmmR0ChunkMutexDropGiant(PGMMR0CHUNKMTXSTATE pMtxState)
1253{
1254 AssertReturn(pMtxState->fFlags == GMMR0CHUNK_MTX_KEEP_GIANT, VERR_GMM_MTX_FLAGS);
1255 Assert(pMtxState->pGMM->hMtxOwner == RTThreadNativeSelf());
1256 pMtxState->fFlags = GMMR0CHUNK_MTX_RETAKE_GIANT;
1257 /** @todo GMM life cycle cleanup (we may race someone
1258 * destroying and cleaning up GMM)? */
1259 return gmmR0MutexRelease(pMtxState->pGMM);
1260}
1261
1262
1263/**
1264 * For experimenting with NUMA affinity and such.
1265 *
1266 * @returns The current NUMA Node ID.
1267 */
1268static uint16_t gmmR0GetCurrentNumaNodeId(void)
1269{
1270#if 1
1271 return GMM_CHUNK_NUMA_ID_UNKNOWN;
1272#else
1273 return RTMpCpuId() / 16;
1274#endif
1275}
1276
1277
1278
1279/**
1280 * Cleans up when a VM is terminating.
1281 *
1282 * @param pGVM Pointer to the Global VM structure.
1283 */
1284GMMR0DECL(void) GMMR0CleanupVM(PGVM pGVM)
1285{
1286 LogFlow(("GMMR0CleanupVM: pGVM=%p:{.hSelf=%#x}\n", pGVM, pGVM->hSelf));
1287
1288 PGMM pGMM;
1289 GMM_GET_VALID_INSTANCE_VOID(pGMM);
1290
1291#ifdef VBOX_WITH_PAGE_SHARING
1292 /*
1293 * Clean up all registered shared modules first.
1294 */
1295 gmmR0SharedModuleCleanup(pGMM, pGVM);
1296#endif
1297
1298 gmmR0MutexAcquire(pGMM);
1299 uint64_t uLockNanoTS = RTTimeSystemNanoTS();
1300 GMM_CHECK_SANITY_UPON_ENTERING(pGMM);
1301
1302 /*
1303 * The policy is 'INVALID' until the initial reservation
1304 * request has been serviced.
1305 */
1306 if ( pGVM->gmm.s.Stats.enmPolicy > GMMOCPOLICY_INVALID
1307 && pGVM->gmm.s.Stats.enmPolicy < GMMOCPOLICY_END)
1308 {
1309 /*
1310 * If it's the last VM around, we can skip walking all the chunks looking
1311 * for the pages owned by this VM and instead flush the whole shebang.
1312 *
1313 * This takes care of the eventuality that a VM has left shared page
1314 * references behind (shouldn't happen of course, but you never know).
1315 */
1316 Assert(pGMM->cRegisteredVMs);
1317 pGMM->cRegisteredVMs--;
1318
1319 /*
1320 * Walk the entire pool looking for pages that belong to this VM
1321 * and leftover mappings. (This'll only catch private pages,
1322 * shared pages will be 'left behind'.)
1323 */
1324 /** @todo r=bird: This scanning+freeing could be optimized in bound mode! */
1325 uint64_t cPrivatePages = pGVM->gmm.s.Stats.cPrivatePages; /* save */
1326
1327 unsigned iCountDown = 64;
1328 bool fRedoFromStart;
1329 PGMMCHUNK pChunk;
1330 do
1331 {
1332 fRedoFromStart = false;
1333 RTListForEachReverse(&pGMM->ChunkList, pChunk, GMMCHUNK, ListNode)
1334 {
1335 uint32_t const cFreeChunksOld = pGMM->cFreedChunks;
1336 if ( ( !pGMM->fBoundMemoryMode
1337 || pChunk->hGVM == pGVM->hSelf)
1338 && gmmR0CleanupVMScanChunk(pGMM, pGVM, pChunk))
1339 {
1340 /* We left the giant mutex, so reset the yield counters. */
1341 uLockNanoTS = RTTimeSystemNanoTS();
1342 iCountDown = 64;
1343 }
1344 else
1345 {
1346 /* Didn't leave it, so do normal yielding. */
1347 if (!iCountDown)
1348 gmmR0MutexYield(pGMM, &uLockNanoTS);
1349 else
1350 iCountDown--;
1351 }
1352 if (pGMM->cFreedChunks != cFreeChunksOld)
1353 {
1354 fRedoFromStart = true;
1355 break;
1356 }
1357 }
1358 } while (fRedoFromStart);
1359
1360 if (pGVM->gmm.s.Stats.cPrivatePages)
1361 SUPR0Printf("GMMR0CleanupVM: hGVM=%#x has %#x private pages that cannot be found!\n", pGVM->hSelf, pGVM->gmm.s.Stats.cPrivatePages);
1362
1363 pGMM->cAllocatedPages -= cPrivatePages;
1364
1365 /*
1366 * Free empty chunks.
1367 */
1368 PGMMCHUNKFREESET pPrivateSet = pGMM->fBoundMemoryMode ? &pGVM->gmm.s.Private : &pGMM->PrivateX;
1369 do
1370 {
1371 fRedoFromStart = false;
1372 iCountDown = 10240;
1373 pChunk = pPrivateSet->apLists[GMM_CHUNK_FREE_SET_UNUSED_LIST];
1374 while (pChunk)
1375 {
1376 PGMMCHUNK pNext = pChunk->pFreeNext;
1377 Assert(pChunk->cFree == GMM_CHUNK_NUM_PAGES);
1378 if ( !pGMM->fBoundMemoryMode
1379 || pChunk->hGVM == pGVM->hSelf)
1380 {
1381 uint64_t const idGenerationOld = pPrivateSet->idGeneration;
1382 if (gmmR0FreeChunk(pGMM, pGVM, pChunk, true /*fRelaxedSem*/))
1383 {
1384 /* We've left the giant mutex, restart? (+1 for our unlink) */
1385 fRedoFromStart = pPrivateSet->idGeneration != idGenerationOld + 1;
1386 if (fRedoFromStart)
1387 break;
1388 uLockNanoTS = RTTimeSystemNanoTS();
1389 iCountDown = 10240;
1390 }
1391 }
1392
1393 /* Advance and maybe yield the lock. */
1394 pChunk = pNext;
1395 if (--iCountDown == 0)
1396 {
1397 uint64_t const idGenerationOld = pPrivateSet->idGeneration;
1398 fRedoFromStart = gmmR0MutexYield(pGMM, &uLockNanoTS)
1399 && pPrivateSet->idGeneration != idGenerationOld;
1400 if (fRedoFromStart)
1401 break;
1402 iCountDown = 10240;
1403 }
1404 }
1405 } while (fRedoFromStart);
1406
1407 /*
1408 * Account for shared pages that weren't freed.
1409 */
1410 if (pGVM->gmm.s.Stats.cSharedPages)
1411 {
1412 Assert(pGMM->cSharedPages >= pGVM->gmm.s.Stats.cSharedPages);
1413 SUPR0Printf("GMMR0CleanupVM: hGVM=%#x left %#x shared pages behind!\n", pGVM->hSelf, pGVM->gmm.s.Stats.cSharedPages);
1414 pGMM->cLeftBehindSharedPages += pGVM->gmm.s.Stats.cSharedPages;
1415 }
1416
1417 /*
1418 * Clean up balloon statistics in case the VM process crashed.
1419 */
1420 Assert(pGMM->cBalloonedPages >= pGVM->gmm.s.Stats.cBalloonedPages);
1421 pGMM->cBalloonedPages -= pGVM->gmm.s.Stats.cBalloonedPages;
1422
1423 /*
1424 * Update the over-commitment management statistics.
1425 */
1426 pGMM->cReservedPages -= pGVM->gmm.s.Stats.Reserved.cBasePages
1427 + pGVM->gmm.s.Stats.Reserved.cFixedPages
1428 + pGVM->gmm.s.Stats.Reserved.cShadowPages;
1429 switch (pGVM->gmm.s.Stats.enmPolicy)
1430 {
1431 case GMMOCPOLICY_NO_OC:
1432 break;
1433 default:
1434 /** @todo Update GMM->cOverCommittedPages */
1435 break;
1436 }
1437 }
1438
1439 /* zap the GVM data. */
1440 pGVM->gmm.s.Stats.enmPolicy = GMMOCPOLICY_INVALID;
1441 pGVM->gmm.s.Stats.enmPriority = GMMPRIORITY_INVALID;
1442 pGVM->gmm.s.Stats.fMayAllocate = false;
1443
1444 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1445 gmmR0MutexRelease(pGMM);
1446
1447 LogFlow(("GMMR0CleanupVM: returns\n"));
1448}
1449
1450
1451/**
1452 * Scan one chunk for private pages belonging to the specified VM.
1453 *
1454 * @note This function may drop the giant mutex!
1455 *
1456 * @returns @c true if we've temporarily dropped the giant mutex, @c false if
1457 * we didn't.
1458 * @param pGMM Pointer to the GMM instance.
1459 * @param pGVM The global VM handle.
1460 * @param pChunk The chunk to scan.
1461 */
1462static bool gmmR0CleanupVMScanChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
1463{
1464 Assert(!pGMM->fBoundMemoryMode || pChunk->hGVM == pGVM->hSelf);
1465
1466 /*
1467 * Look for pages belonging to the VM.
1468 * (Perform some internal checks while we're scanning.)
1469 */
1470#ifndef VBOX_STRICT
1471 if (pChunk->cFree != (GMM_CHUNK_SIZE >> PAGE_SHIFT))
1472#endif
1473 {
1474 unsigned cPrivate = 0;
1475 unsigned cShared = 0;
1476 unsigned cFree = 0;
1477
1478 gmmR0UnlinkChunk(pChunk); /* avoiding cFreePages updates. */
1479
1480 uint16_t hGVM = pGVM->hSelf;
1481 unsigned iPage = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
1482 while (iPage-- > 0)
1483 if (GMM_PAGE_IS_PRIVATE(&pChunk->aPages[iPage]))
1484 {
1485 if (pChunk->aPages[iPage].Private.hGVM == hGVM)
1486 {
1487 /*
1488 * Free the page.
1489 *
1490 * The reason for not using gmmR0FreePrivatePage here is that we
1491 * must *not* cause the chunk to be freed from under us - we're in
1492 * an AVL tree walk here.
1493 */
1494 pChunk->aPages[iPage].u = 0;
1495 pChunk->aPages[iPage].Free.iNext = pChunk->iFreeHead;
1496 pChunk->aPages[iPage].Free.u2State = GMM_PAGE_STATE_FREE;
1497 pChunk->iFreeHead = iPage;
1498 pChunk->cPrivate--;
1499 pChunk->cFree++;
1500 pGVM->gmm.s.Stats.cPrivatePages--;
1501 cFree++;
1502 }
1503 else
1504 cPrivate++;
1505 }
1506 else if (GMM_PAGE_IS_FREE(&pChunk->aPages[iPage]))
1507 cFree++;
1508 else
1509 cShared++;
1510
1511 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
1512
1513 /*
1514 * Did it add up?
1515 */
1516 if (RT_UNLIKELY( pChunk->cFree != cFree
1517 || pChunk->cPrivate != cPrivate
1518 || pChunk->cShared != cShared))
1519 {
1520 SUPR0Printf("gmmR0CleanupVMScanChunk: Chunk %RKv/%#x has bogus stats - free=%d/%d private=%d/%d shared=%d/%d\n",
1521 pChunk, pChunk->Core.Key, pChunk->cFree, cFree, pChunk->cPrivate, cPrivate, pChunk->cShared, cShared);
1522 pChunk->cFree = cFree;
1523 pChunk->cPrivate = cPrivate;
1524 pChunk->cShared = cShared;
1525 }
1526 }
1527
1528 /*
1529 * If not in bound memory mode, we should reset the hGVM field
1530 * if it has our handle in it.
1531 */
1532 if (pChunk->hGVM == pGVM->hSelf)
1533 {
1534 if (!g_pGMM->fBoundMemoryMode)
1535 pChunk->hGVM = NIL_GVM_HANDLE;
1536 else if (pChunk->cFree != GMM_CHUNK_NUM_PAGES)
1537 {
1538 SUPR0Printf("gmmR0CleanupVMScanChunk: %RKv/%#x: cFree=%#x - it should be 0 in bound mode!\n",
1539 pChunk, pChunk->Core.Key, pChunk->cFree);
1540 AssertMsgFailed(("%p/%#x: cFree=%#x - it should be 0 in bound mode!\n", pChunk, pChunk->Core.Key, pChunk->cFree));
1541
1542 gmmR0UnlinkChunk(pChunk);
1543 pChunk->cFree = GMM_CHUNK_NUM_PAGES;
1544 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
1545 }
1546 }
1547
1548 /*
1549 * Look for a mapping belonging to the terminating VM.
1550 */
1551 GMMR0CHUNKMTXSTATE MtxState;
1552 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
1553 unsigned cMappings = pChunk->cMappingsX;
1554 for (unsigned i = 0; i < cMappings; i++)
1555 if (pChunk->paMappingsX[i].pGVM == pGVM)
1556 {
1557 gmmR0ChunkMutexDropGiant(&MtxState);
1558
1559 RTR0MEMOBJ hMemObj = pChunk->paMappingsX[i].hMapObj;
1560
1561 cMappings--;
1562 if (i < cMappings)
1563 pChunk->paMappingsX[i] = pChunk->paMappingsX[cMappings];
1564 pChunk->paMappingsX[cMappings].pGVM = NULL;
1565 pChunk->paMappingsX[cMappings].hMapObj = NIL_RTR0MEMOBJ;
1566 Assert(pChunk->cMappingsX - 1U == cMappings);
1567 pChunk->cMappingsX = cMappings;
1568
1569 int rc = RTR0MemObjFree(hMemObj, false /* fFreeMappings (NA) */);
1570 if (RT_FAILURE(rc))
1571 {
1572 SUPR0Printf("gmmR0CleanupVMScanChunk: %RKv/%#x: mapping #%x: RTRMemObjFree(%RKv,false) -> %d \n",
1573 pChunk, pChunk->Core.Key, i, hMemObj, rc);
1574 AssertRC(rc);
1575 }
1576
1577 gmmR0ChunkMutexRelease(&MtxState, pChunk);
1578 return true;
1579 }
1580
1581 gmmR0ChunkMutexRelease(&MtxState, pChunk);
1582 return false;
1583}
1584
1585
1586/**
1587 * The initial resource reservations.
1588 *
1589 * This will make memory reservations according to policy and priority. If there aren't
1590 * sufficient resources available to sustain the VM this function will fail and all
1591 * future allocations requests will fail as well.
1592 *
1593 * These are just the initial reservations made very very early during the VM creation
1594 * process and will be adjusted later in the GMMR0UpdateReservation call after the
1595 * ring-3 init has completed.
1596 *
1597 * @returns VBox status code.
1598 * @retval VERR_GMM_MEMORY_RESERVATION_DECLINED
1599 * @retval VERR_GMM_
1600 *
1601 * @param pGVM The global (ring-0) VM structure.
1602 * @param idCpu The VCPU id - must be zero.
1603 * @param cBasePages The number of pages that may be allocated for the base RAM and ROMs.
1604 * This does not include MMIO2 and similar.
1605 * @param cShadowPages The number of pages that may be allocated for shadow paging structures.
1606 * @param cFixedPages The number of pages that may be allocated for fixed objects like the
1607 * hyper heap, MMIO2 and similar.
1608 * @param enmPolicy The OC policy to use on this VM.
1609 * @param enmPriority The priority in an out-of-memory situation.
1610 *
1611 * @thread The creator thread / EMT(0).
1612 */
1613GMMR0DECL(int) GMMR0InitialReservation(PGVM pGVM, VMCPUID idCpu, uint64_t cBasePages, uint32_t cShadowPages,
1614 uint32_t cFixedPages, GMMOCPOLICY enmPolicy, GMMPRIORITY enmPriority)
1615{
1616 LogFlow(("GMMR0InitialReservation: pGVM=%p cBasePages=%#llx cShadowPages=%#x cFixedPages=%#x enmPolicy=%d enmPriority=%d\n",
1617 pGVM, cBasePages, cShadowPages, cFixedPages, enmPolicy, enmPriority));
1618
1619 /*
1620 * Validate, get basics and take the semaphore.
1621 */
1622 AssertReturn(idCpu == 0, VERR_INVALID_CPU_ID);
1623 PGMM pGMM;
1624 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
1625 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
1626 if (RT_FAILURE(rc))
1627 return rc;
1628
1629 AssertReturn(cBasePages, VERR_INVALID_PARAMETER);
1630 AssertReturn(cShadowPages, VERR_INVALID_PARAMETER);
1631 AssertReturn(cFixedPages, VERR_INVALID_PARAMETER);
1632 AssertReturn(enmPolicy > GMMOCPOLICY_INVALID && enmPolicy < GMMOCPOLICY_END, VERR_INVALID_PARAMETER);
1633 AssertReturn(enmPriority > GMMPRIORITY_INVALID && enmPriority < GMMPRIORITY_END, VERR_INVALID_PARAMETER);
1634
1635 gmmR0MutexAcquire(pGMM);
1636 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
1637 {
1638 if ( !pGVM->gmm.s.Stats.Reserved.cBasePages
1639 && !pGVM->gmm.s.Stats.Reserved.cFixedPages
1640 && !pGVM->gmm.s.Stats.Reserved.cShadowPages)
1641 {
1642 /*
1643 * Check if we can accommodate this.
1644 */
1645 /* ... later ... */
1646 if (RT_SUCCESS(rc))
1647 {
1648 /*
1649 * Update the records.
1650 */
1651 pGVM->gmm.s.Stats.Reserved.cBasePages = cBasePages;
1652 pGVM->gmm.s.Stats.Reserved.cFixedPages = cFixedPages;
1653 pGVM->gmm.s.Stats.Reserved.cShadowPages = cShadowPages;
1654 pGVM->gmm.s.Stats.enmPolicy = enmPolicy;
1655 pGVM->gmm.s.Stats.enmPriority = enmPriority;
1656 pGVM->gmm.s.Stats.fMayAllocate = true;
1657
1658 pGMM->cReservedPages += cBasePages + cFixedPages + cShadowPages;
1659 pGMM->cRegisteredVMs++;
1660 }
1661 }
1662 else
1663 rc = VERR_WRONG_ORDER;
1664 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1665 }
1666 else
1667 rc = VERR_GMM_IS_NOT_SANE;
1668 gmmR0MutexRelease(pGMM);
1669 LogFlow(("GMMR0InitialReservation: returns %Rrc\n", rc));
1670 return rc;
1671}
1672
1673
1674/**
1675 * VMMR0 request wrapper for GMMR0InitialReservation.
1676 *
1677 * @returns see GMMR0InitialReservation.
1678 * @param pGVM The global (ring-0) VM structure.
1679 * @param idCpu The VCPU id.
1680 * @param pReq Pointer to the request packet.
1681 */
1682GMMR0DECL(int) GMMR0InitialReservationReq(PGVM pGVM, VMCPUID idCpu, PGMMINITIALRESERVATIONREQ pReq)
1683{
1684 /*
1685 * Validate input and pass it on.
1686 */
1687 AssertPtrReturn(pGVM, VERR_INVALID_POINTER);
1688 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
1689 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
1690
1691 return GMMR0InitialReservation(pGVM, idCpu, pReq->cBasePages, pReq->cShadowPages,
1692 pReq->cFixedPages, pReq->enmPolicy, pReq->enmPriority);
1693}
1694
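/*
 * Illustration only (not part of the original file): a minimal sketch of the
 * request a hypothetical caller would prepare for a VM with 1 GiB of base
 * RAM, assuming 4 KiB pages.  The shadow/fixed counts and the policy and
 * priority values are made-up examples; a real caller dispatches the request
 * through the VMMR0 request path rather than calling the wrapper directly.
 *
 * @code
 *      GMMINITIALRESERVATIONREQ Req;
 *      Req.Hdr.cbReq    = sizeof(Req);
 *      Req.cBasePages   = _1G / PAGE_SIZE;     // 0x40000 pages for base RAM + ROMs
 *      Req.cShadowPages = 0x200;               // hypothetical
 *      Req.cFixedPages  = 0x400;               // hypothetical: hyper heap, MMIO2, ...
 *      Req.enmPolicy    = GMMOCPOLICY_NO_OC;   // assumed enum value
 *      Req.enmPriority  = GMMPRIORITY_NORMAL;  // assumed enum value
 *      int rc = GMMR0InitialReservationReq(pGVM, 0, &Req); // idCpu must be 0
 * @endcode
 */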
1695
1696/**
1697 * This updates the memory reservation with the additional MMIO2 and ROM pages.
1698 *
1699 * @returns VBox status code.
1700 * @retval VERR_GMM_MEMORY_RESERVATION_DECLINED
1701 *
1702 * @param pGVM The global (ring-0) VM structure.
1703 * @param idCpu The VCPU id.
1704 * @param cBasePages The number of pages that may be allocated for the base RAM and ROMs.
1705 * This does not include MMIO2 and similar.
1706 * @param cShadowPages The number of pages that may be allocated for shadow paging structures.
1707 * @param cFixedPages The number of pages that may be allocated for fixed objects like the
1708 * hyper heap, MMIO2 and similar.
1709 *
1710 * @thread EMT(idCpu)
1711 */
1712GMMR0DECL(int) GMMR0UpdateReservation(PGVM pGVM, VMCPUID idCpu, uint64_t cBasePages,
1713 uint32_t cShadowPages, uint32_t cFixedPages)
1714{
1715 LogFlow(("GMMR0UpdateReservation: pGVM=%p cBasePages=%#llx cShadowPages=%#x cFixedPages=%#x\n",
1716 pGVM, cBasePages, cShadowPages, cFixedPages));
1717
1718 /*
1719 * Validate, get basics and take the semaphore.
1720 */
1721 PGMM pGMM;
1722 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
1723 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
1724 if (RT_FAILURE(rc))
1725 return rc;
1726
1727 AssertReturn(cBasePages, VERR_INVALID_PARAMETER);
1728 AssertReturn(cShadowPages, VERR_INVALID_PARAMETER);
1729 AssertReturn(cFixedPages, VERR_INVALID_PARAMETER);
1730
1731 gmmR0MutexAcquire(pGMM);
1732 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
1733 {
1734 if ( pGVM->gmm.s.Stats.Reserved.cBasePages
1735 && pGVM->gmm.s.Stats.Reserved.cFixedPages
1736 && pGVM->gmm.s.Stats.Reserved.cShadowPages)
1737 {
1738 /*
1739 * Check if we can accommodate this.
1740 */
1741 /* ... later ... */
1742 if (RT_SUCCESS(rc))
1743 {
1744 /*
1745 * Update the records.
1746 */
1747 pGMM->cReservedPages -= pGVM->gmm.s.Stats.Reserved.cBasePages
1748 + pGVM->gmm.s.Stats.Reserved.cFixedPages
1749 + pGVM->gmm.s.Stats.Reserved.cShadowPages;
1750 pGMM->cReservedPages += cBasePages + cFixedPages + cShadowPages;
1751
1752 pGVM->gmm.s.Stats.Reserved.cBasePages = cBasePages;
1753 pGVM->gmm.s.Stats.Reserved.cFixedPages = cFixedPages;
1754 pGVM->gmm.s.Stats.Reserved.cShadowPages = cShadowPages;
1755 }
1756 }
1757 else
1758 rc = VERR_WRONG_ORDER;
1759 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1760 }
1761 else
1762 rc = VERR_GMM_IS_NOT_SANE;
1763 gmmR0MutexRelease(pGMM);
1764 LogFlow(("GMMR0UpdateReservation: returns %Rrc\n", rc));
1765 return rc;
1766}
1767
1768
1769/**
1770 * VMMR0 request wrapper for GMMR0UpdateReservation.
1771 *
1772 * @returns see GMMR0UpdateReservation.
1773 * @param pGVM The global (ring-0) VM structure.
1774 * @param idCpu The VCPU id.
1775 * @param pReq Pointer to the request packet.
1776 */
1777GMMR0DECL(int) GMMR0UpdateReservationReq(PGVM pGVM, VMCPUID idCpu, PGMMUPDATERESERVATIONREQ pReq)
1778{
1779 /*
1780 * Validate input and pass it on.
1781 */
1782 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
1783 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
1784
1785 return GMMR0UpdateReservation(pGVM, idCpu, pReq->cBasePages, pReq->cShadowPages, pReq->cFixedPages);
1786}
1787
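/*
 * Illustration only: how the global reservation counter moves when a VM
 * updates its reservation (hypothetical numbers).  GMMR0UpdateReservation
 * first subtracts the VM's old per-account totals and then adds the new
 * ones, e.g.:
 *
 *      old: cBasePages=0x40000, cShadowPages=0x200, cFixedPages=0x400  -> 0x40600 removed
 *      new: cBasePages=0x40000, cShadowPages=0x200, cFixedPages=0x800  -> 0x40a00 added
 *
 * so pGMM->cReservedPages grows by 0x400 pages in this example.
 */
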
1788#ifdef GMMR0_WITH_SANITY_CHECK
1789
1790/**
1791 * Performs sanity checks on a free set.
1792 *
1793 * @returns Error count.
1794 *
1795 * @param pGMM Pointer to the GMM instance.
1796 * @param pSet Pointer to the set.
1797 * @param pszSetName The set name.
1798 * @param pszFunction The function from which it was called.
1799 * @param uLineNo       The line number.
1800 */
1801static uint32_t gmmR0SanityCheckSet(PGMM pGMM, PGMMCHUNKFREESET pSet, const char *pszSetName,
1802 const char *pszFunction, unsigned uLineNo)
1803{
1804 uint32_t cErrors = 0;
1805
1806 /*
1807 * Count the free pages in all the chunks and match it against pSet->cFreePages.
1808 */
1809 uint32_t cPages = 0;
1810 for (unsigned i = 0; i < RT_ELEMENTS(pSet->apLists); i++)
1811 {
1812 for (PGMMCHUNK pCur = pSet->apLists[i]; pCur; pCur = pCur->pFreeNext)
1813 {
1814            /** @todo check that the chunk is hashed into the right set. */
1815 cPages += pCur->cFree;
1816 }
1817 }
1818 if (RT_UNLIKELY(cPages != pSet->cFreePages))
1819 {
1820 SUPR0Printf("GMM insanity: found %#x pages in the %s set, expected %#x. (%s, line %u)\n",
1821 cPages, pszSetName, pSet->cFreePages, pszFunction, uLineNo);
1822 cErrors++;
1823 }
1824
1825 return cErrors;
1826}
1827
1828
1829/**
1830 * Performs some sanity checks on the GMM while owning the lock.
1831 *
1832 * @returns Error count.
1833 *
1834 * @param pGMM Pointer to the GMM instance.
1835 * @param pszFunction The function from which it is called.
1836 * @param uLineNo The line number.
1837 */
1838static uint32_t gmmR0SanityCheck(PGMM pGMM, const char *pszFunction, unsigned uLineNo)
1839{
1840 uint32_t cErrors = 0;
1841
1842 cErrors += gmmR0SanityCheckSet(pGMM, &pGMM->PrivateX, "private", pszFunction, uLineNo);
1843 cErrors += gmmR0SanityCheckSet(pGMM, &pGMM->Shared, "shared", pszFunction, uLineNo);
1844 /** @todo add more sanity checks. */
1845
1846 return cErrors;
1847}
1848
1849#endif /* GMMR0_WITH_SANITY_CHECK */
1850
1851/**
1852 * Looks up a chunk in the tree and fills in the TLB entry for it.
1853 *
1854 * This is not expected to fail and will bitch if it does.
1855 *
1856 * @returns Pointer to the allocation chunk, NULL if not found.
1857 * @param pGMM Pointer to the GMM instance.
1858 * @param idChunk The ID of the chunk to find.
1859 * @param pTlbe Pointer to the TLB entry.
1860 *
1861 * @note Caller owns spinlock.
1862 */
1863static PGMMCHUNK gmmR0GetChunkSlow(PGMM pGMM, uint32_t idChunk, PGMMCHUNKTLBE pTlbe)
1864{
1865 PGMMCHUNK pChunk = (PGMMCHUNK)RTAvlU32Get(&pGMM->pChunks, idChunk);
1866 AssertMsgReturn(pChunk, ("Chunk %#x not found!\n", idChunk), NULL);
1867 pTlbe->idChunk = idChunk;
1868 pTlbe->pChunk = pChunk;
1869 return pChunk;
1870}
1871
1872
1873/**
1874 * Finds an allocation chunk, spin-locked.
1875 *
1876 * This is not expected to fail and will bitch if it does.
1877 *
1878 * @returns Pointer to the allocation chunk, NULL if not found.
1879 * @param pGMM Pointer to the GMM instance.
1880 * @param idChunk The ID of the chunk to find.
1881 */
1882DECLINLINE(PGMMCHUNK) gmmR0GetChunkLocked(PGMM pGMM, uint32_t idChunk)
1883{
1884 /*
1885 * Do a TLB lookup, branch if not in the TLB.
1886 */
1887 PGMMCHUNKTLBE pTlbe = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(idChunk)];
1888 PGMMCHUNK pChunk = pTlbe->pChunk;
1889 if ( pChunk == NULL
1890 || pTlbe->idChunk != idChunk)
1891 pChunk = gmmR0GetChunkSlow(pGMM, idChunk, pTlbe);
1892 return pChunk;
1893}
1894
1895
1896/**
1897 * Finds an allocation chunk.
1898 *
1899 * This is not expected to fail and will bitch if it does.
1900 *
1901 * @returns Pointer to the allocation chunk, NULL if not found.
1902 * @param pGMM Pointer to the GMM instance.
1903 * @param idChunk The ID of the chunk to find.
1904 */
1905DECLINLINE(PGMMCHUNK) gmmR0GetChunk(PGMM pGMM, uint32_t idChunk)
1906{
1907 RTSpinlockAcquire(pGMM->hSpinLockTree);
1908 PGMMCHUNK pChunk = gmmR0GetChunkLocked(pGMM, idChunk);
1909 RTSpinlockRelease(pGMM->hSpinLockTree);
1910 return pChunk;
1911}
1912
1913
1914/**
1915 * Finds a page.
1916 *
1917 * This is not expected to fail and will bitch if it does.
1918 *
1919 * @returns Pointer to the page, NULL if not found.
1920 * @param pGMM Pointer to the GMM instance.
1921 * @param idPage The ID of the page to find.
1922 */
1923DECLINLINE(PGMMPAGE) gmmR0GetPage(PGMM pGMM, uint32_t idPage)
1924{
1925 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
1926 if (RT_LIKELY(pChunk))
1927 return &pChunk->aPages[idPage & GMM_PAGEID_IDX_MASK];
1928 return NULL;
1929}
1930
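/*
 * Illustration only: how gmmR0GetPage above decomposes a page ID, assuming
 * the usual 2 MB chunks and 4 KB pages (i.e. GMM_CHUNKID_SHIFT == 9 and
 * GMM_PAGEID_IDX_MASK == 0x1ff; not re-verified here).  For a hypothetical
 * idPage = 0x12345:
 *
 *      idChunk = 0x12345 >> 9    = 0x91    (key for the chunk TLB / AVL tree)
 *      iPage   = 0x12345 & 0x1ff = 0x145   (index into pChunk->aPages)
 */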
1931
1932#if 0 /* unused */
1933/**
1934 * Gets the host physical address for a page given by its ID.
1935 *
1936 * @returns The host physical address or NIL_RTHCPHYS.
1937 * @param pGMM Pointer to the GMM instance.
1938 * @param idPage The ID of the page to find.
1939 */
1940DECLINLINE(RTHCPHYS) gmmR0GetPageHCPhys(PGMM pGMM, uint32_t idPage)
1941{
1942 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
1943 if (RT_LIKELY(pChunk))
1944 return RTR0MemObjGetPagePhysAddr(pChunk->hMemObj, idPage & GMM_PAGEID_IDX_MASK);
1945 return NIL_RTHCPHYS;
1946}
1947#endif /* unused */
1948
1949
1950/**
1951 * Selects the appropriate free list given the number of free pages.
1952 *
1953 * @returns Free list index.
1954 * @param cFree The number of free pages in the chunk.
1955 */
1956DECLINLINE(unsigned) gmmR0SelectFreeSetList(unsigned cFree)
1957{
1958 unsigned iList = cFree >> GMM_CHUNK_FREE_SET_SHIFT;
1959 AssertMsg(iList < RT_SIZEOFMEMB(GMMCHUNKFREESET, apLists) / RT_SIZEOFMEMB(GMMCHUNKFREESET, apLists[0]),
1960 ("%d (%u)\n", iList, cFree));
1961 return iList;
1962}
1963
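/*
 * Illustration only: the free lists bucket chunks by how many free pages
 * they have, dropping the low GMM_CHUNK_FREE_SET_SHIFT bits.  Assuming a
 * shift of 4 (an assumption, not verified here), a chunk with 37 free pages
 * would land in apLists[37 >> 4] = apLists[2], while a completely unused
 * chunk ends up in the last list, which is what the "empty chunk" scans
 * below (GMM_CHUNK_FREE_SET_UNUSED_LIST) rely on.
 */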
1964
1965/**
1966 * Unlinks the chunk from the free list it's currently on (if any).
1967 *
1968 * @param pChunk The allocation chunk.
1969 */
1970DECLINLINE(void) gmmR0UnlinkChunk(PGMMCHUNK pChunk)
1971{
1972 PGMMCHUNKFREESET pSet = pChunk->pSet;
1973 if (RT_LIKELY(pSet))
1974 {
1975 pSet->cFreePages -= pChunk->cFree;
1976 pSet->idGeneration++;
1977
1978 PGMMCHUNK pPrev = pChunk->pFreePrev;
1979 PGMMCHUNK pNext = pChunk->pFreeNext;
1980 if (pPrev)
1981 pPrev->pFreeNext = pNext;
1982 else
1983 pSet->apLists[gmmR0SelectFreeSetList(pChunk->cFree)] = pNext;
1984 if (pNext)
1985 pNext->pFreePrev = pPrev;
1986
1987 pChunk->pSet = NULL;
1988 pChunk->pFreeNext = NULL;
1989 pChunk->pFreePrev = NULL;
1990 }
1991 else
1992 {
1993 Assert(!pChunk->pFreeNext);
1994 Assert(!pChunk->pFreePrev);
1995 Assert(!pChunk->cFree);
1996 }
1997}
1998
1999
2000/**
2001 * Links the chunk onto the appropriate free list in the specified free set.
2002 *
2003 * If no free entries, it's not linked into any list.
2004 *
2005 * @param pChunk The allocation chunk.
2006 * @param pSet The free set.
2007 */
2008DECLINLINE(void) gmmR0LinkChunk(PGMMCHUNK pChunk, PGMMCHUNKFREESET pSet)
2009{
2010 Assert(!pChunk->pSet);
2011 Assert(!pChunk->pFreeNext);
2012 Assert(!pChunk->pFreePrev);
2013
2014 if (pChunk->cFree > 0)
2015 {
2016 pChunk->pSet = pSet;
2017 pChunk->pFreePrev = NULL;
2018 unsigned const iList = gmmR0SelectFreeSetList(pChunk->cFree);
2019 pChunk->pFreeNext = pSet->apLists[iList];
2020 if (pChunk->pFreeNext)
2021 pChunk->pFreeNext->pFreePrev = pChunk;
2022 pSet->apLists[iList] = pChunk;
2023
2024 pSet->cFreePages += pChunk->cFree;
2025 pSet->idGeneration++;
2026 }
2027}
2028
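/*
 * Illustration only: because the list index depends on pChunk->cFree, any
 * code that changes the free count must unlink the chunk first and relink
 * it afterwards so it lands on the right list.  A minimal sketch of the
 * idiom (this is exactly what gmmR0AllocatePagesFromChunk below does):
 *
 * @code
 *      PGMMCHUNKFREESET pSet = pChunk->pSet;
 *      gmmR0UnlinkChunk(pChunk);
 *      // ... allocate or free pages here, updating pChunk->cFree ...
 *      gmmR0LinkChunk(pChunk, pSet);
 * @endcode
 */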
2029
2030/**
2031 * Selects the appropriate free set for the chunk and links it onto the free list there.
2032 *
2033 * If the chunk has no free entries, it's not linked into any list.
2034 *
2035 * @param pGMM Pointer to the GMM instance.
2036 * @param pGVM      Pointer to the kernel-only VM instance data.
2037 * @param pChunk The allocation chunk.
2038 */
2039DECLINLINE(void) gmmR0SelectSetAndLinkChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
2040{
2041 PGMMCHUNKFREESET pSet;
2042 if (pGMM->fBoundMemoryMode)
2043 pSet = &pGVM->gmm.s.Private;
2044 else if (pChunk->cShared)
2045 pSet = &pGMM->Shared;
2046 else
2047 pSet = &pGMM->PrivateX;
2048 gmmR0LinkChunk(pChunk, pSet);
2049}
2050
2051
2052/**
2053 * Frees a Chunk ID.
2054 *
2055 * @param pGMM Pointer to the GMM instance.
2056 * @param idChunk The Chunk ID to free.
2057 */
2058static void gmmR0FreeChunkId(PGMM pGMM, uint32_t idChunk)
2059{
2060 AssertReturnVoid(idChunk != NIL_GMM_CHUNKID);
2061 AssertMsg(ASMBitTest(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk));
2062 ASMAtomicBitClear(&pGMM->bmChunkId[0], idChunk);
2063}
2064
2065
2066/**
2067 * Allocates a new Chunk ID.
2068 *
2069 * @returns The Chunk ID.
2070 * @param pGMM Pointer to the GMM instance.
2071 */
2072static uint32_t gmmR0AllocateChunkId(PGMM pGMM)
2073{
2074 AssertCompile(!((GMM_CHUNKID_LAST + 1) & 31)); /* must be a multiple of 32 */
2075 AssertCompile(NIL_GMM_CHUNKID == 0);
2076
2077 /*
2078 * Try the next sequential one.
2079 */
2080 int32_t idChunk = ++pGMM->idChunkPrev;
2081#if 0 /** @todo enable this code */
2082 if ( idChunk <= GMM_CHUNKID_LAST
2083 && idChunk > NIL_GMM_CHUNKID
2084        && !ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk))
2085 return idChunk;
2086#endif
2087
2088 /*
2089 * Scan sequentially from the last one.
2090 */
2091 if ( (uint32_t)idChunk < GMM_CHUNKID_LAST
2092 && idChunk > NIL_GMM_CHUNKID)
2093 {
2094 idChunk = ASMBitNextClear(&pGMM->bmChunkId[0], GMM_CHUNKID_LAST + 1, idChunk - 1);
2095 if (idChunk > NIL_GMM_CHUNKID)
2096 {
2097 AssertMsgReturn(!ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk), NIL_GMM_CHUNKID);
2098 return pGMM->idChunkPrev = idChunk;
2099 }
2100 }
2101
2102 /*
2103 * Ok, scan from the start.
2104 * We're not racing anyone, so there is no need to expect failures or have restart loops.
2105 */
2106 idChunk = ASMBitFirstClear(&pGMM->bmChunkId[0], GMM_CHUNKID_LAST + 1);
2107    AssertMsgReturn(idChunk > NIL_GMM_CHUNKID, ("%#x\n", idChunk), NIL_GMM_CHUNKID);
2108 AssertMsgReturn(!ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk), NIL_GMM_CHUNKID);
2109
2110 return pGMM->idChunkPrev = idChunk;
2111}
2112
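/*
 * Illustration only (hypothetical bitmap state): the ID allocator resumes
 * scanning right after the previously handed out ID.  If idChunkPrev is
 * 0x42 and bits 0x43..0x45 are already set in bmChunkId, the
 * ASMBitNextClear() pass above returns 0x46, which is then atomically set
 * and recorded as the new idChunkPrev.  Only when the tail of the bitmap
 * is exhausted does the allocator fall back to scanning from the start.
 */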
2113
2114/**
2115 * Allocates one private page.
2116 *
2117 * Worker for gmmR0AllocatePagesFromChunk and GMMR0AllocateLargePage.
2118 *
2119 * @param pChunk The chunk to allocate it from.
2120 * @param hGVM The GVM handle of the VM requesting memory.
2121 * @param pPageDesc The page descriptor.
2122 */
2123static void gmmR0AllocatePage(PGMMCHUNK pChunk, uint32_t hGVM, PGMMPAGEDESC pPageDesc)
2124{
2125 /* update the chunk stats. */
2126 if (pChunk->hGVM == NIL_GVM_HANDLE)
2127 pChunk->hGVM = hGVM;
2128 Assert(pChunk->cFree);
2129 pChunk->cFree--;
2130 pChunk->cPrivate++;
2131
2132 /* unlink the first free page. */
2133 const uint32_t iPage = pChunk->iFreeHead;
2134 AssertReleaseMsg(iPage < RT_ELEMENTS(pChunk->aPages), ("%d\n", iPage));
2135 PGMMPAGE pPage = &pChunk->aPages[iPage];
2136 Assert(GMM_PAGE_IS_FREE(pPage));
2137 pChunk->iFreeHead = pPage->Free.iNext;
2138 Log3(("A pPage=%p iPage=%#x/%#x u2State=%d iFreeHead=%#x iNext=%#x\n",
2139 pPage, iPage, (pChunk->Core.Key << GMM_CHUNKID_SHIFT) | iPage,
2140 pPage->Common.u2State, pChunk->iFreeHead, pPage->Free.iNext));
2141
2142 /* make the page private. */
2143 pPage->u = 0;
2144 AssertCompile(GMM_PAGE_STATE_PRIVATE == 0);
2145 pPage->Private.hGVM = hGVM;
2146 AssertCompile(NIL_RTHCPHYS >= GMM_GCPHYS_LAST);
2147 AssertCompile(GMM_GCPHYS_UNSHAREABLE >= GMM_GCPHYS_LAST);
2148 if (pPageDesc->HCPhysGCPhys <= GMM_GCPHYS_LAST)
2149 pPage->Private.pfn = pPageDesc->HCPhysGCPhys >> PAGE_SHIFT;
2150 else
2151 pPage->Private.pfn = GMM_PAGE_PFN_UNSHAREABLE; /* unshareable / unassigned - same thing. */
2152
2153 /* update the page descriptor. */
2154 pPageDesc->HCPhysGCPhys = RTR0MemObjGetPagePhysAddr(pChunk->hMemObj, iPage);
2155 Assert(pPageDesc->HCPhysGCPhys != NIL_RTHCPHYS);
2156 pPageDesc->idPage = (pChunk->Core.Key << GMM_CHUNKID_SHIFT) | iPage;
2157 pPageDesc->idSharedPage = NIL_GMM_PAGEID;
2158}
2159
2160
2161/**
2162 * Picks the free pages from a chunk.
2163 *
2164 * @returns The new page descriptor table index.
2165 * @param pChunk The chunk.
2166 * @param hGVM The affinity of the chunk. NIL_GVM_HANDLE for no
2167 * affinity.
2168 * @param iPage The current page descriptor table index.
2169 * @param cPages The total number of pages to allocate.
2170 * @param paPages       The page descriptor table (input + output).
2171 */
2172static uint32_t gmmR0AllocatePagesFromChunk(PGMMCHUNK pChunk, uint16_t const hGVM, uint32_t iPage, uint32_t cPages,
2173 PGMMPAGEDESC paPages)
2174{
2175 PGMMCHUNKFREESET pSet = pChunk->pSet; Assert(pSet);
2176 gmmR0UnlinkChunk(pChunk);
2177
2178 for (; pChunk->cFree && iPage < cPages; iPage++)
2179 gmmR0AllocatePage(pChunk, hGVM, &paPages[iPage]);
2180
2181 gmmR0LinkChunk(pChunk, pSet);
2182 return iPage;
2183}
2184
2185
2186/**
2187 * Registers a new chunk of memory.
2188 *
2189 * This is called by gmmR0AllocateChunkNew, GMMR0AllocateLargePage and GMMR0SeedChunk.
2190 *
2191 * @returns VBox status code.  On success, the giant GMM lock will be held and
2192 *          the caller must release it (ugly).
2193 * @param pGMM Pointer to the GMM instance.
2194 * @param pSet Pointer to the set.
2195 * @param hMemObj The memory object for the chunk.
2196 * @param hGVM The affinity of the chunk. NIL_GVM_HANDLE for no
2197 * affinity.
2198 * @param fChunkFlags The chunk flags, GMM_CHUNK_FLAGS_XXX.
2199 * @param ppChunk Chunk address (out). Optional.
2200 *
2201 * @remarks The caller must not own the giant GMM mutex.
2202 * The giant GMM mutex will be acquired and returned acquired in
2203 * the success path. On failure, no locks will be held.
2204 */
2205static int gmmR0RegisterChunk(PGMM pGMM, PGMMCHUNKFREESET pSet, RTR0MEMOBJ hMemObj, uint16_t hGVM, uint16_t fChunkFlags,
2206 PGMMCHUNK *ppChunk)
2207{
2208 Assert(pGMM->hMtxOwner != RTThreadNativeSelf());
2209 Assert(hGVM != NIL_GVM_HANDLE || pGMM->fBoundMemoryMode);
2210#ifdef GMM_WITH_LEGACY_MODE
2211 Assert(fChunkFlags == 0 || fChunkFlags == GMM_CHUNK_FLAGS_LARGE_PAGE || fChunkFlags == GMM_CHUNK_FLAGS_SEEDED);
2212#else
2213 Assert(fChunkFlags == 0 || fChunkFlags == GMM_CHUNK_FLAGS_LARGE_PAGE);
2214#endif
2215
2216#if defined(VBOX_WITH_RAM_IN_KERNEL) && !defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)
2217 /*
2218 * Get a ring-0 mapping of the object.
2219 */
2220# ifdef GMM_WITH_LEGACY_MODE
2221 uint8_t *pbMapping = !(fChunkFlags & GMM_CHUNK_FLAGS_SEEDED) ? (uint8_t *)RTR0MemObjAddress(hMemObj) : NULL;
2222# else
2223 uint8_t *pbMapping = (uint8_t *)RTR0MemObjAddress(hMemObj);
2224# endif
2225 if (!pbMapping)
2226 {
2227 RTR0MEMOBJ hMapObj;
2228 int rc = RTR0MemObjMapKernel(&hMapObj, hMemObj, (void *)-1, 0, RTMEM_PROT_READ | RTMEM_PROT_WRITE);
2229 if (RT_SUCCESS(rc))
2230 pbMapping = (uint8_t *)RTR0MemObjAddress(hMapObj);
2231 else
2232 return rc;
2233 AssertPtr(pbMapping);
2234 }
2235#endif
2236
2237 /*
2238 * Allocate a chunk.
2239 */
2240 int rc;
2241 PGMMCHUNK pChunk = (PGMMCHUNK)RTMemAllocZ(sizeof(*pChunk));
2242 if (pChunk)
2243 {
2244 /*
2245 * Initialize it.
2246 */
2247 pChunk->hMemObj = hMemObj;
2248#if defined(VBOX_WITH_RAM_IN_KERNEL) && !defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)
2249 pChunk->pbMapping = pbMapping;
2250#endif
2251 pChunk->cFree = GMM_CHUNK_NUM_PAGES;
2252 pChunk->hGVM = hGVM;
2253 /*pChunk->iFreeHead = 0;*/
2254 pChunk->idNumaNode = gmmR0GetCurrentNumaNodeId();
2255 pChunk->iChunkMtx = UINT8_MAX;
2256 pChunk->fFlags = fChunkFlags;
2257 for (unsigned iPage = 0; iPage < RT_ELEMENTS(pChunk->aPages) - 1; iPage++)
2258 {
2259 pChunk->aPages[iPage].Free.u2State = GMM_PAGE_STATE_FREE;
2260 pChunk->aPages[iPage].Free.iNext = iPage + 1;
2261 }
2262 pChunk->aPages[RT_ELEMENTS(pChunk->aPages) - 1].Free.u2State = GMM_PAGE_STATE_FREE;
2263 pChunk->aPages[RT_ELEMENTS(pChunk->aPages) - 1].Free.iNext = UINT16_MAX;
2264
2265 /*
2266 * Allocate a Chunk ID and insert it into the tree.
2267 * This has to be done behind the mutex of course.
2268 */
2269 rc = gmmR0MutexAcquire(pGMM);
2270 if (RT_SUCCESS(rc))
2271 {
2272 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2273 {
2274 pChunk->Core.Key = gmmR0AllocateChunkId(pGMM);
2275 if ( pChunk->Core.Key != NIL_GMM_CHUNKID
2276 && pChunk->Core.Key <= GMM_CHUNKID_LAST)
2277 {
2278 RTSpinlockAcquire(pGMM->hSpinLockTree);
2279 if (RTAvlU32Insert(&pGMM->pChunks, &pChunk->Core))
2280 {
2281 pGMM->cChunks++;
2282 RTListAppend(&pGMM->ChunkList, &pChunk->ListNode);
2283 RTSpinlockRelease(pGMM->hSpinLockTree);
2284
2285 gmmR0LinkChunk(pChunk, pSet);
2286
2287 LogFlow(("gmmR0RegisterChunk: pChunk=%p id=%#x cChunks=%d\n", pChunk, pChunk->Core.Key, pGMM->cChunks));
2288
2289 if (ppChunk)
2290 *ppChunk = pChunk;
2291 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
2292 return VINF_SUCCESS;
2293 }
2294 RTSpinlockRelease(pGMM->hSpinLockTree);
2295 }
2296
2297 /* bail out */
2298 rc = VERR_GMM_CHUNK_INSERT;
2299 }
2300 else
2301 rc = VERR_GMM_IS_NOT_SANE;
2302 gmmR0MutexRelease(pGMM);
2303 }
2304
2305 RTMemFree(pChunk);
2306 }
2307 else
2308 rc = VERR_NO_MEMORY;
2309 return rc;
2310}
2311
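/*
 * Illustration only: the calling pattern gmmR0RegisterChunk expects, as used
 * by gmmR0AllocateChunkNew and GMMR0AllocateLargePage below.  The giant mutex
 * is dropped around the (potentially slow) host allocation and is returned
 * held on success:
 *
 * @code
 *      gmmR0MutexRelease(pGMM);                // must not own the giant lock here
 *      rc = RTR0MemObjAllocPage(&hMemObj, GMM_CHUNK_SIZE, false);
 *      if (RT_SUCCESS(rc))
 *          rc = gmmR0RegisterChunk(pGMM, pSet, hMemObj, pGVM->hSelf, 0, &pChunk);
 *      // on success the giant lock is held again and the caller must release it
 * @endcode
 */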
2312
2313/**
2314 * Allocates a new chunk, immediately picks the requested pages from it, and
2315 * adds what's remaining to the specified free set.
2316 *
2317 * @note This will leave the giant mutex while allocating the new chunk!
2318 *
2319 * @returns VBox status code.
2320 * @param pGMM Pointer to the GMM instance data.
2321 * @param pGVM      Pointer to the kernel-only VM instance data.
2322 * @param pSet Pointer to the free set.
2323 * @param cPages The number of pages requested.
2324 * @param paPages The page descriptor table (input + output).
2325 * @param piPage The pointer to the page descriptor table index variable.
2326 * This will be updated.
2327 */
2328static int gmmR0AllocateChunkNew(PGMM pGMM, PGVM pGVM, PGMMCHUNKFREESET pSet, uint32_t cPages,
2329 PGMMPAGEDESC paPages, uint32_t *piPage)
2330{
2331 gmmR0MutexRelease(pGMM);
2332
2333 RTR0MEMOBJ hMemObj;
2334#ifndef GMM_WITH_LEGACY_MODE
2335 int rc;
2336# ifdef VBOX_WITH_LINEAR_HOST_PHYS_MEM
2337 if (pGMM->fHasWorkingAllocPhysNC)
2338 rc = RTR0MemObjAllocPhysNC(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS);
2339 else
2340# endif
2341 rc = RTR0MemObjAllocPage(&hMemObj, GMM_CHUNK_SIZE, false /*fExecutable*/);
2342#else
2343 int rc = RTR0MemObjAllocPhysNC(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS);
2344#endif
2345 if (RT_SUCCESS(rc))
2346 {
2347 /** @todo Duplicate gmmR0RegisterChunk here so we can avoid chaining up the
2348 * free pages first and then unchaining them right afterwards. Instead
2349 * do as much work as possible without holding the giant lock. */
2350 PGMMCHUNK pChunk;
2351 rc = gmmR0RegisterChunk(pGMM, pSet, hMemObj, pGVM->hSelf, 0 /*fChunkFlags*/, &pChunk);
2352 if (RT_SUCCESS(rc))
2353 {
2354 *piPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, *piPage, cPages, paPages);
2355 return VINF_SUCCESS;
2356 }
2357
2358 /* bail out */
2359 RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
2360 }
2361
2362 int rc2 = gmmR0MutexAcquire(pGMM);
2363 AssertRCReturn(rc2, RT_FAILURE(rc) ? rc : rc2);
2364 return rc;
2365
2366}
2367
2368
2369/**
2370 * As a last resort we'll pick any page we can get.
2371 *
2372 * @returns The new page descriptor table index.
2373 * @param pSet The set to pick from.
2374 * @param pGVM Pointer to the global VM structure.
2375 * @param iPage The current page descriptor table index.
2376 * @param cPages The total number of pages to allocate.
2377 * @param paPages   The page descriptor table (input + output).
2378 */
2379static uint32_t gmmR0AllocatePagesIndiscriminately(PGMMCHUNKFREESET pSet, PGVM pGVM,
2380 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2381{
2382 unsigned iList = RT_ELEMENTS(pSet->apLists);
2383 while (iList-- > 0)
2384 {
2385 PGMMCHUNK pChunk = pSet->apLists[iList];
2386 while (pChunk)
2387 {
2388 PGMMCHUNK pNext = pChunk->pFreeNext;
2389
2390 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2391 if (iPage >= cPages)
2392 return iPage;
2393
2394 pChunk = pNext;
2395 }
2396 }
2397 return iPage;
2398}
2399
2400
2401/**
2402 * Pick pages from empty chunks on the same NUMA node.
2403 *
2404 * @returns The new page descriptor table index.
2405 * @param pSet The set to pick from.
2406 * @param pGVM Pointer to the global VM structure.
2407 * @param iPage The current page descriptor table index.
2408 * @param cPages The total number of pages to allocate.
2409 * @param paPages   The page descriptor table (input + output).
2410 */
2411static uint32_t gmmR0AllocatePagesFromEmptyChunksOnSameNode(PGMMCHUNKFREESET pSet, PGVM pGVM,
2412 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2413{
2414 PGMMCHUNK pChunk = pSet->apLists[GMM_CHUNK_FREE_SET_UNUSED_LIST];
2415 if (pChunk)
2416 {
2417 uint16_t const idNumaNode = gmmR0GetCurrentNumaNodeId();
2418 while (pChunk)
2419 {
2420 PGMMCHUNK pNext = pChunk->pFreeNext;
2421
2422 if (pChunk->idNumaNode == idNumaNode)
2423 {
2424 pChunk->hGVM = pGVM->hSelf;
2425 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2426 if (iPage >= cPages)
2427 {
2428 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2429 return iPage;
2430 }
2431 }
2432
2433 pChunk = pNext;
2434 }
2435 }
2436 return iPage;
2437}
2438
2439
2440/**
2441 * Pick pages from non-empty chunks on the same NUMA node.
2442 *
2443 * @returns The new page descriptor table index.
2444 * @param pSet The set to pick from.
2445 * @param pGVM Pointer to the global VM structure.
2446 * @param iPage The current page descriptor table index.
2447 * @param cPages The total number of pages to allocate.
2448 * @param paPages   The page descriptor table (input + output).
2449 */
2450static uint32_t gmmR0AllocatePagesFromSameNode(PGMMCHUNKFREESET pSet, PGVM pGVM,
2451 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2452{
2453 /** @todo start by picking from chunks with about the right size first? */
2454 uint16_t const idNumaNode = gmmR0GetCurrentNumaNodeId();
2455 unsigned iList = GMM_CHUNK_FREE_SET_UNUSED_LIST;
2456 while (iList-- > 0)
2457 {
2458 PGMMCHUNK pChunk = pSet->apLists[iList];
2459 while (pChunk)
2460 {
2461 PGMMCHUNK pNext = pChunk->pFreeNext;
2462
2463 if (pChunk->idNumaNode == idNumaNode)
2464 {
2465 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2466 if (iPage >= cPages)
2467 {
2468 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2469 return iPage;
2470 }
2471 }
2472
2473 pChunk = pNext;
2474 }
2475 }
2476 return iPage;
2477}
2478
2479
2480/**
2481 * Pick pages that are in chunks already associated with the VM.
2482 *
2483 * @returns The new page descriptor table index.
2484 * @param pGMM Pointer to the GMM instance data.
2485 * @param pGVM Pointer to the global VM structure.
2486 * @param pSet The set to pick from.
2487 * @param iPage The current page descriptor table index.
2488 * @param cPages The total number of pages to allocate.
2489 * @param paPages   The page descriptor table (input + output).
2490 */
2491static uint32_t gmmR0AllocatePagesAssociatedWithVM(PGMM pGMM, PGVM pGVM, PGMMCHUNKFREESET pSet,
2492 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2493{
2494 uint16_t const hGVM = pGVM->hSelf;
2495
2496 /* Hint. */
2497 if (pGVM->gmm.s.idLastChunkHint != NIL_GMM_CHUNKID)
2498 {
2499 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pGVM->gmm.s.idLastChunkHint);
2500 if (pChunk && pChunk->cFree)
2501 {
2502 iPage = gmmR0AllocatePagesFromChunk(pChunk, hGVM, iPage, cPages, paPages);
2503 if (iPage >= cPages)
2504 return iPage;
2505 }
2506 }
2507
2508 /* Scan. */
2509 for (unsigned iList = 0; iList < RT_ELEMENTS(pSet->apLists); iList++)
2510 {
2511 PGMMCHUNK pChunk = pSet->apLists[iList];
2512 while (pChunk)
2513 {
2514 PGMMCHUNK pNext = pChunk->pFreeNext;
2515
2516 if (pChunk->hGVM == hGVM)
2517 {
2518 iPage = gmmR0AllocatePagesFromChunk(pChunk, hGVM, iPage, cPages, paPages);
2519 if (iPage >= cPages)
2520 {
2521 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2522 return iPage;
2523 }
2524 }
2525
2526 pChunk = pNext;
2527 }
2528 }
2529 return iPage;
2530}
2531
2532
2533
2534/**
2535 * Pick pages in bound memory mode.
2536 *
2537 * @returns The new page descriptor table index.
2538 * @param pGVM Pointer to the global VM structure.
2539 * @param iPage The current page descriptor table index.
2540 * @param cPages The total number of pages to allocate.
2541 * @param paPages   The page descriptor table (input + output).
2542 */
2543static uint32_t gmmR0AllocatePagesInBoundMode(PGVM pGVM, uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2544{
2545 for (unsigned iList = 0; iList < RT_ELEMENTS(pGVM->gmm.s.Private.apLists); iList++)
2546 {
2547 PGMMCHUNK pChunk = pGVM->gmm.s.Private.apLists[iList];
2548 while (pChunk)
2549 {
2550 Assert(pChunk->hGVM == pGVM->hSelf);
2551 PGMMCHUNK pNext = pChunk->pFreeNext;
2552 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2553 if (iPage >= cPages)
2554 return iPage;
2555 pChunk = pNext;
2556 }
2557 }
2558 return iPage;
2559}
2560
2561
2562/**
2563 * Checks if we should start picking pages from chunks of other VMs because
2564 * we're getting close to the system memory or reserved limit.
2565 *
2566 * @returns @c true if we should, @c false if we should first try allocate more
2567 * chunks.
2568 */
2569static bool gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLimits(PGVM pGVM)
2570{
2571 /*
2572     * Don't allocate a new chunk if we're about to hit the VM's reservation limit anyway.
2573     */
2574 uint64_t cPgReserved = pGVM->gmm.s.Stats.Reserved.cBasePages
2575 + pGVM->gmm.s.Stats.Reserved.cFixedPages
2576 - pGVM->gmm.s.Stats.cBalloonedPages
2577 /** @todo what about shared pages? */;
2578 uint64_t cPgAllocated = pGVM->gmm.s.Stats.Allocated.cBasePages
2579 + pGVM->gmm.s.Stats.Allocated.cFixedPages;
2580 uint64_t cPgDelta = cPgReserved - cPgAllocated;
2581 if (cPgDelta < GMM_CHUNK_NUM_PAGES * 4)
2582 return true;
2583 /** @todo make the threshold configurable, also test the code to see if
2584 * this ever kicks in (we might be reserving too much or smth). */
2585
2586 /*
2587     * Check how close we are to the max memory limit and how many fragments
2588     * there are...
2589 */
2590 /** @todo */
2591
2592 return false;
2593}
2594
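/*
 * Illustration only (hypothetical numbers): with 2 MB chunks and 4 KB pages
 * GMM_CHUNK_NUM_PAGES is 512, so the check above fires when fewer than
 * 4 * 512 = 2048 pages (8 MB) of the reservation remain unallocated.  E.g.
 * Reserved.cBasePages=0x40000, Reserved.cFixedPages=0x400, nothing ballooned
 * and Allocated totals of 0x40000 leave a delta of 0x400 (1024) pages, which
 * is below the threshold, so we prefer raiding other chunks over allocating
 * a new one.
 */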
2595
2596/**
2597 * Checks if we should start picking pages from chunks of other VMs because
2598 * there are a lot of free pages around.
2599 *
2600 * @returns @c true if we should, @c false if we should first try allocate more
2601 * chunks.
2602 */
2603static bool gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLotsFree(PGMM pGMM)
2604{
2605 /*
2606 * Setting the limit at 16 chunks (32 MB) at the moment.
2607 */
2608 if (pGMM->PrivateX.cFreePages >= GMM_CHUNK_NUM_PAGES * 16)
2609 return true;
2610 return false;
2611}
2612
2613
2614/**
2615 * Common worker for GMMR0AllocateHandyPages and GMMR0AllocatePages.
2616 *
2617 * @returns VBox status code:
2618 * @retval VINF_SUCCESS on success.
2619 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk or
2620 * gmmR0AllocateMoreChunks is necessary.
2621 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2622 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2623 * that is we're trying to allocate more than we've reserved.
2624 *
2625 * @param pGMM Pointer to the GMM instance data.
2626 * @param pGVM Pointer to the VM.
2627 * @param cPages The number of pages to allocate.
2628 * @param paPages Pointer to the page descriptors. See GMMPAGEDESC for
2629 * details on what is expected on input.
2630 * @param enmAccount The account to charge.
2631 *
2632 * @remarks Caller owns the giant GMM lock.
2633 */
2634static int gmmR0AllocatePagesNew(PGMM pGMM, PGVM pGVM, uint32_t cPages, PGMMPAGEDESC paPages, GMMACCOUNT enmAccount)
2635{
2636 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
2637
2638 /*
2639 * Check allocation limits.
2640 */
2641 if (RT_UNLIKELY(pGMM->cAllocatedPages + cPages > pGMM->cMaxPages))
2642 return VERR_GMM_HIT_GLOBAL_LIMIT;
2643
2644 switch (enmAccount)
2645 {
2646 case GMMACCOUNT_BASE:
2647 if (RT_UNLIKELY( pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cPages
2648 > pGVM->gmm.s.Stats.Reserved.cBasePages))
2649 {
2650 Log(("gmmR0AllocatePages:Base: Reserved=%#llx Allocated+Ballooned+Requested=%#llx+%#llx+%#x!\n",
2651 pGVM->gmm.s.Stats.Reserved.cBasePages, pGVM->gmm.s.Stats.Allocated.cBasePages,
2652 pGVM->gmm.s.Stats.cBalloonedPages, cPages));
2653 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2654 }
2655 break;
2656 case GMMACCOUNT_SHADOW:
2657 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cShadowPages + cPages > pGVM->gmm.s.Stats.Reserved.cShadowPages))
2658 {
2659 Log(("gmmR0AllocatePages:Shadow: Reserved=%#x Allocated+Requested=%#x+%#x!\n",
2660 pGVM->gmm.s.Stats.Reserved.cShadowPages, pGVM->gmm.s.Stats.Allocated.cShadowPages, cPages));
2661 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2662 }
2663 break;
2664 case GMMACCOUNT_FIXED:
2665 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cFixedPages + cPages > pGVM->gmm.s.Stats.Reserved.cFixedPages))
2666 {
2667 Log(("gmmR0AllocatePages:Fixed: Reserved=%#x Allocated+Requested=%#x+%#x!\n",
2668 pGVM->gmm.s.Stats.Reserved.cFixedPages, pGVM->gmm.s.Stats.Allocated.cFixedPages, cPages));
2669 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2670 }
2671 break;
2672 default:
2673 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2674 }
2675
2676#ifdef GMM_WITH_LEGACY_MODE
2677 /*
2678 * If we're in legacy memory mode, it's easy to figure if we have
2679 * sufficient number of pages up-front.
2680     * a sufficient number of pages up-front.
2681 if ( pGMM->fLegacyAllocationMode
2682 && pGVM->gmm.s.Private.cFreePages < cPages)
2683 {
2684 Assert(pGMM->fBoundMemoryMode);
2685 return VERR_GMM_SEED_ME;
2686 }
2687#endif
2688
2689 /*
2690 * Update the accounts before we proceed because we might be leaving the
2691 * protection of the global mutex and thus run the risk of permitting
2692 * too much memory to be allocated.
2693 */
2694 switch (enmAccount)
2695 {
2696 case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages += cPages; break;
2697 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages += cPages; break;
2698 case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages += cPages; break;
2699 default: AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2700 }
2701 pGVM->gmm.s.Stats.cPrivatePages += cPages;
2702 pGMM->cAllocatedPages += cPages;
2703
2704#ifdef GMM_WITH_LEGACY_MODE
2705 /*
2706 * Part two of it's-easy-in-legacy-memory-mode.
2707 */
2708 if (pGMM->fLegacyAllocationMode)
2709 {
2710 uint32_t iPage = gmmR0AllocatePagesInBoundMode(pGVM, 0, cPages, paPages);
2711 AssertReleaseReturn(iPage == cPages, VERR_GMM_ALLOC_PAGES_IPE);
2712 return VINF_SUCCESS;
2713 }
2714#endif
2715
2716 /*
2717 * Bound mode is also relatively straightforward.
2718 */
2719 uint32_t iPage = 0;
2720 int rc = VINF_SUCCESS;
2721 if (pGMM->fBoundMemoryMode)
2722 {
2723 iPage = gmmR0AllocatePagesInBoundMode(pGVM, iPage, cPages, paPages);
2724 if (iPage < cPages)
2725 do
2726 rc = gmmR0AllocateChunkNew(pGMM, pGVM, &pGVM->gmm.s.Private, cPages, paPages, &iPage);
2727 while (iPage < cPages && RT_SUCCESS(rc));
2728 }
2729 /*
2730     * Shared mode is trickier as we should try to achieve the same locality as
2731 * in bound mode, but smartly make use of non-full chunks allocated by
2732 * other VMs if we're low on memory.
2733 */
2734 else
2735 {
2736 /* Pick the most optimal pages first. */
2737 iPage = gmmR0AllocatePagesAssociatedWithVM(pGMM, pGVM, &pGMM->PrivateX, iPage, cPages, paPages);
2738 if (iPage < cPages)
2739 {
2740 /* Maybe we should try getting pages from chunks "belonging" to
2741 other VMs before allocating more chunks? */
2742 bool fTriedOnSameAlready = false;
2743 if (gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLimits(pGVM))
2744 {
2745 iPage = gmmR0AllocatePagesFromSameNode(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2746 fTriedOnSameAlready = true;
2747 }
2748
2749 /* Allocate memory from empty chunks. */
2750 if (iPage < cPages)
2751 iPage = gmmR0AllocatePagesFromEmptyChunksOnSameNode(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2752
2753 /* Grab empty shared chunks. */
2754 if (iPage < cPages)
2755 iPage = gmmR0AllocatePagesFromEmptyChunksOnSameNode(&pGMM->Shared, pGVM, iPage, cPages, paPages);
2756
2757            /* If there are a lot of free pages spread around, try not to waste
2758               system memory on more chunks. (Should trigger defragmentation.) */
2759 if ( !fTriedOnSameAlready
2760 && gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLotsFree(pGMM))
2761 {
2762 iPage = gmmR0AllocatePagesFromSameNode(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2763 if (iPage < cPages)
2764 iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2765 }
2766
2767 /*
2768              * Ok, try to allocate new chunks.
2769 */
2770 if (iPage < cPages)
2771 {
2772 do
2773 rc = gmmR0AllocateChunkNew(pGMM, pGVM, &pGMM->PrivateX, cPages, paPages, &iPage);
2774 while (iPage < cPages && RT_SUCCESS(rc));
2775
2776 /* If the host is out of memory, take whatever we can get. */
2777 if ( (rc == VERR_NO_MEMORY || rc == VERR_NO_PHYS_MEMORY)
2778 && pGMM->PrivateX.cFreePages + pGMM->Shared.cFreePages >= cPages - iPage)
2779 {
2780 iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2781 if (iPage < cPages)
2782 iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->Shared, pGVM, iPage, cPages, paPages);
2783 AssertRelease(iPage == cPages);
2784 rc = VINF_SUCCESS;
2785 }
2786 }
2787 }
2788 }
2789
2790 /*
2791 * Clean up on failure. Since this is bound to be a low-memory condition
2792 * we will give back any empty chunks that might be hanging around.
2793 */
2794 if (RT_FAILURE(rc))
2795 {
2796 /* Update the statistics. */
2797 pGVM->gmm.s.Stats.cPrivatePages -= cPages;
2798 pGMM->cAllocatedPages -= cPages - iPage;
2799 switch (enmAccount)
2800 {
2801 case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages -= cPages; break;
2802 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages -= cPages; break;
2803 case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages -= cPages; break;
2804 default: AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2805 }
2806
2807 /* Release the pages. */
2808 while (iPage-- > 0)
2809 {
2810 uint32_t idPage = paPages[iPage].idPage;
2811 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
2812 if (RT_LIKELY(pPage))
2813 {
2814 Assert(GMM_PAGE_IS_PRIVATE(pPage));
2815 Assert(pPage->Private.hGVM == pGVM->hSelf);
2816 gmmR0FreePrivatePage(pGMM, pGVM, idPage, pPage);
2817 }
2818 else
2819 AssertMsgFailed(("idPage=%#x\n", idPage));
2820
2821 paPages[iPage].idPage = NIL_GMM_PAGEID;
2822 paPages[iPage].idSharedPage = NIL_GMM_PAGEID;
2823 paPages[iPage].HCPhysGCPhys = NIL_RTHCPHYS;
2824 }
2825
2826 /* Free empty chunks. */
2827 /** @todo */
2828
2829 /* return the fail status on failure */
2830 return rc;
2831 }
2832 return VINF_SUCCESS;
2833}
2834
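/*
 * Readability note (summary of gmmR0AllocatePagesNew above; the code is
 * authoritative).  In shared mode the fallback order is roughly:
 *      1. chunks already associated with this VM (hint, then scan),
 *      2. if close to the reservation limit: chunks on the same NUMA node,
 *      3. empty chunks on the same NUMA node (private set, then shared set),
 *      4. if lots of pages are free anyway: same NUMA node, then anything,
 *      5. brand new chunks from the host,
 *      6. as a last resort when the host is out of memory: any free page.
 */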
2835
2836/**
2837 * Updates the previous allocations and allocates more pages.
2838 *
2839 * The handy pages are always taken from the 'base' memory account.
2840 * The allocated pages are not cleared and will contain random garbage.
2841 *
2842 * @returns VBox status code:
2843 * @retval VINF_SUCCESS on success.
2844 * @retval VERR_NOT_OWNER if the caller is not an EMT.
2845 * @retval VERR_GMM_PAGE_NOT_FOUND if one of the pages to update wasn't found.
2846 * @retval VERR_GMM_PAGE_NOT_PRIVATE if one of the pages to update wasn't a
2847 * private page.
2848 * @retval VERR_GMM_PAGE_NOT_SHARED if one of the pages to update wasn't a
2849 * shared page.
2850 * @retval VERR_GMM_NOT_PAGE_OWNER if one of the pages to be updated wasn't
2851 * owned by the VM.
2852 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is necessary.
2853 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2854 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2855 * that is we're trying to allocate more than we've reserved.
2856 *
2857 * @param pGVM The global (ring-0) VM structure.
2858 * @param idCpu The VCPU id.
2859 * @param cPagesToUpdate The number of pages to update (starting from the head).
2860 * @param cPagesToAlloc The number of pages to allocate (starting from the head).
2861 * @param paPages The array of page descriptors.
2862 * See GMMPAGEDESC for details on what is expected on input.
2863 * @thread EMT(idCpu)
2864 */
2865GMMR0DECL(int) GMMR0AllocateHandyPages(PGVM pGVM, VMCPUID idCpu, uint32_t cPagesToUpdate,
2866 uint32_t cPagesToAlloc, PGMMPAGEDESC paPages)
2867{
2868 LogFlow(("GMMR0AllocateHandyPages: pGVM=%p cPagesToUpdate=%#x cPagesToAlloc=%#x paPages=%p\n",
2869 pGVM, cPagesToUpdate, cPagesToAlloc, paPages));
2870
2871 /*
2872 * Validate, get basics and take the semaphore.
2873 * (This is a relatively busy path, so make predictions where possible.)
2874 */
2875 PGMM pGMM;
2876 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
2877 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
2878 if (RT_FAILURE(rc))
2879 return rc;
2880
2881 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
2882 AssertMsgReturn( (cPagesToUpdate && cPagesToUpdate < 1024)
2883 || (cPagesToAlloc && cPagesToAlloc < 1024),
2884 ("cPagesToUpdate=%#x cPagesToAlloc=%#x\n", cPagesToUpdate, cPagesToAlloc),
2885 VERR_INVALID_PARAMETER);
2886
2887 unsigned iPage = 0;
2888 for (; iPage < cPagesToUpdate; iPage++)
2889 {
2890 AssertMsgReturn( ( paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST
2891 && !(paPages[iPage].HCPhysGCPhys & PAGE_OFFSET_MASK))
2892 || paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS
2893 || paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE,
2894 ("#%#x: %RHp\n", iPage, paPages[iPage].HCPhysGCPhys),
2895 VERR_INVALID_PARAMETER);
2896 AssertMsgReturn( paPages[iPage].idPage <= GMM_PAGEID_LAST
2897 /*|| paPages[iPage].idPage == NIL_GMM_PAGEID*/,
2898 ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2899        AssertMsgReturn( paPages[iPage].idSharedPage <= GMM_PAGEID_LAST
2900 /*|| paPages[iPage].idSharedPage == NIL_GMM_PAGEID*/,
2901 ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2902 }
2903
2904 for (; iPage < cPagesToAlloc; iPage++)
2905 {
2906 AssertMsgReturn(paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS, ("#%#x: %RHp\n", iPage, paPages[iPage].HCPhysGCPhys), VERR_INVALID_PARAMETER);
2907 AssertMsgReturn(paPages[iPage].idPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2908 AssertMsgReturn(paPages[iPage].idSharedPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2909 }
2910
2911 gmmR0MutexAcquire(pGMM);
2912 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2913 {
2914 /* No allocations before the initial reservation has been made! */
2915 if (RT_LIKELY( pGVM->gmm.s.Stats.Reserved.cBasePages
2916 && pGVM->gmm.s.Stats.Reserved.cFixedPages
2917 && pGVM->gmm.s.Stats.Reserved.cShadowPages))
2918 {
2919 /*
2920 * Perform the updates.
2921 * Stop on the first error.
2922 */
2923 for (iPage = 0; iPage < cPagesToUpdate; iPage++)
2924 {
2925 if (paPages[iPage].idPage != NIL_GMM_PAGEID)
2926 {
2927 PGMMPAGE pPage = gmmR0GetPage(pGMM, paPages[iPage].idPage);
2928 if (RT_LIKELY(pPage))
2929 {
2930 if (RT_LIKELY(GMM_PAGE_IS_PRIVATE(pPage)))
2931 {
2932 if (RT_LIKELY(pPage->Private.hGVM == pGVM->hSelf))
2933 {
2934 AssertCompile(NIL_RTHCPHYS > GMM_GCPHYS_LAST && GMM_GCPHYS_UNSHAREABLE > GMM_GCPHYS_LAST);
2935 if (RT_LIKELY(paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST))
2936 pPage->Private.pfn = paPages[iPage].HCPhysGCPhys >> PAGE_SHIFT;
2937 else if (paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE)
2938 pPage->Private.pfn = GMM_PAGE_PFN_UNSHAREABLE;
2939 /* else: NIL_RTHCPHYS nothing */
2940
2941 paPages[iPage].idPage = NIL_GMM_PAGEID;
2942 paPages[iPage].HCPhysGCPhys = NIL_RTHCPHYS;
2943 }
2944 else
2945 {
2946 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not owner! hGVM=%#x hSelf=%#x\n",
2947 iPage, paPages[iPage].idPage, pPage->Private.hGVM, pGVM->hSelf));
2948 rc = VERR_GMM_NOT_PAGE_OWNER;
2949 break;
2950 }
2951 }
2952 else
2953 {
2954 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not private! %.*Rhxs (type %d)\n", iPage, paPages[iPage].idPage, sizeof(*pPage), pPage, pPage->Common.u2State));
2955 rc = VERR_GMM_PAGE_NOT_PRIVATE;
2956 break;
2957 }
2958 }
2959 else
2960 {
2961 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not found! (private)\n", iPage, paPages[iPage].idPage));
2962 rc = VERR_GMM_PAGE_NOT_FOUND;
2963 break;
2964 }
2965 }
2966
2967 if (paPages[iPage].idSharedPage != NIL_GMM_PAGEID)
2968 {
2969 PGMMPAGE pPage = gmmR0GetPage(pGMM, paPages[iPage].idSharedPage);
2970 if (RT_LIKELY(pPage))
2971 {
2972 if (RT_LIKELY(GMM_PAGE_IS_SHARED(pPage)))
2973 {
2974 AssertCompile(NIL_RTHCPHYS > GMM_GCPHYS_LAST && GMM_GCPHYS_UNSHAREABLE > GMM_GCPHYS_LAST);
2975 Assert(pPage->Shared.cRefs);
2976 Assert(pGVM->gmm.s.Stats.cSharedPages);
2977 Assert(pGVM->gmm.s.Stats.Allocated.cBasePages);
2978
2979 Log(("GMMR0AllocateHandyPages: free shared page %x cRefs=%d\n", paPages[iPage].idSharedPage, pPage->Shared.cRefs));
2980 pGVM->gmm.s.Stats.cSharedPages--;
2981 pGVM->gmm.s.Stats.Allocated.cBasePages--;
2982 if (!--pPage->Shared.cRefs)
2983 gmmR0FreeSharedPage(pGMM, pGVM, paPages[iPage].idSharedPage, pPage);
2984 else
2985 {
2986 Assert(pGMM->cDuplicatePages);
2987 pGMM->cDuplicatePages--;
2988 }
2989
2990 paPages[iPage].idSharedPage = NIL_GMM_PAGEID;
2991 }
2992 else
2993 {
2994 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not shared!\n", iPage, paPages[iPage].idSharedPage));
2995 rc = VERR_GMM_PAGE_NOT_SHARED;
2996 break;
2997 }
2998 }
2999 else
3000 {
3001 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not found! (shared)\n", iPage, paPages[iPage].idSharedPage));
3002 rc = VERR_GMM_PAGE_NOT_FOUND;
3003 break;
3004 }
3005 }
3006 } /* for each page to update */
3007
3008 if (RT_SUCCESS(rc) && cPagesToAlloc > 0)
3009 {
3010#if defined(VBOX_STRICT) && 0 /** @todo re-test this later. Appeared to be a PGM init bug. */
3011 for (iPage = 0; iPage < cPagesToAlloc; iPage++)
3012 {
3013 Assert(paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS);
3014 Assert(paPages[iPage].idPage == NIL_GMM_PAGEID);
3015 Assert(paPages[iPage].idSharedPage == NIL_GMM_PAGEID);
3016 }
3017#endif
3018
3019 /*
3020 * Join paths with GMMR0AllocatePages for the allocation.
3021                 * Note! gmmR0AllocateChunkNew may leave the protection of the mutex!
3022 */
3023 rc = gmmR0AllocatePagesNew(pGMM, pGVM, cPagesToAlloc, paPages, GMMACCOUNT_BASE);
3024 }
3025 }
3026 else
3027 rc = VERR_WRONG_ORDER;
3028 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3029 }
3030 else
3031 rc = VERR_GMM_IS_NOT_SANE;
3032 gmmR0MutexRelease(pGMM);
3033 LogFlow(("GMMR0AllocateHandyPages: returns %Rrc\n", rc));
3034 return rc;
3035}
3036
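/*
 * Illustration only: the per-descriptor conventions enforced by the input
 * validation in GMMR0AllocateHandyPages above (hypothetical values; ring-3
 * normally prepares the array).  An entry that is only to be allocated must
 * be fully NILed out:
 *
 * @code
 *      paPages[i].idPage       = NIL_GMM_PAGEID;
 *      paPages[i].idSharedPage = NIL_GMM_PAGEID;
 *      paPages[i].HCPhysGCPhys = NIL_RTHCPHYS;
 * @endcode
 *
 * An entry to be updated carries the ID of an existing private page in
 * idPage and the new guest physical address in HCPhysGCPhys (page aligned
 * and <= GMM_GCPHYS_LAST, or NIL_RTHCPHYS / GMM_GCPHYS_UNSHAREABLE).  On
 * return, allocated entries have idPage and HCPhysGCPhys filled in by
 * gmmR0AllocatePage.
 */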
3037
3038/**
3039 * Allocate one or more pages.
3040 *
3041 * This is typically used for ROMs and MMIO2 (VRAM) during VM creation.
3042 * The allocated pages are not cleared and will contain random garbage.
3043 *
3044 * @returns VBox status code:
3045 * @retval VINF_SUCCESS on success.
3046 * @retval VERR_NOT_OWNER if the caller is not an EMT.
3047 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is necessary.
3048 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
3049 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
3050 * that is we're trying to allocate more than we've reserved.
3051 *
3052 * @param pGVM The global (ring-0) VM structure.
3053 * @param idCpu The VCPU id.
3054 * @param cPages The number of pages to allocate.
3055 * @param paPages Pointer to the page descriptors.
3056 * See GMMPAGEDESC for details on what is expected on
3057 * input.
3058 * @param enmAccount The account to charge.
3059 *
3060 * @thread EMT.
3061 */
3062GMMR0DECL(int) GMMR0AllocatePages(PGVM pGVM, VMCPUID idCpu, uint32_t cPages, PGMMPAGEDESC paPages, GMMACCOUNT enmAccount)
3063{
3064 LogFlow(("GMMR0AllocatePages: pGVM=%p cPages=%#x paPages=%p enmAccount=%d\n", pGVM, cPages, paPages, enmAccount));
3065
3066 /*
3067 * Validate, get basics and take the semaphore.
3068 */
3069 PGMM pGMM;
3070 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3071 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3072 if (RT_FAILURE(rc))
3073 return rc;
3074
3075 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
3076 AssertMsgReturn(enmAccount > GMMACCOUNT_INVALID && enmAccount < GMMACCOUNT_END, ("%d\n", enmAccount), VERR_INVALID_PARAMETER);
3077 AssertMsgReturn(cPages > 0 && cPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cPages), VERR_INVALID_PARAMETER);
3078
3079 for (unsigned iPage = 0; iPage < cPages; iPage++)
3080 {
3081 AssertMsgReturn( paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS
3082 || paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE
3083 || ( enmAccount == GMMACCOUNT_BASE
3084 && paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST
3085 && !(paPages[iPage].HCPhysGCPhys & PAGE_OFFSET_MASK)),
3086 ("#%#x: %RHp enmAccount=%d\n", iPage, paPages[iPage].HCPhysGCPhys, enmAccount),
3087 VERR_INVALID_PARAMETER);
3088 AssertMsgReturn(paPages[iPage].idPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
3089 AssertMsgReturn(paPages[iPage].idSharedPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
3090 }
3091
3092 gmmR0MutexAcquire(pGMM);
3093 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3094 {
3095
3096 /* No allocations before the initial reservation has been made! */
3097 if (RT_LIKELY( pGVM->gmm.s.Stats.Reserved.cBasePages
3098 && pGVM->gmm.s.Stats.Reserved.cFixedPages
3099 && pGVM->gmm.s.Stats.Reserved.cShadowPages))
3100 rc = gmmR0AllocatePagesNew(pGMM, pGVM, cPages, paPages, enmAccount);
3101 else
3102 rc = VERR_WRONG_ORDER;
3103 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3104 }
3105 else
3106 rc = VERR_GMM_IS_NOT_SANE;
3107 gmmR0MutexRelease(pGMM);
3108 LogFlow(("GMMR0AllocatePages: returns %Rrc\n", rc));
3109 return rc;
3110}
3111
3112
3113/**
3114 * VMMR0 request wrapper for GMMR0AllocatePages.
3115 *
3116 * @returns see GMMR0AllocatePages.
3117 * @param pGVM The global (ring-0) VM structure.
3118 * @param idCpu The VCPU id.
3119 * @param pReq Pointer to the request packet.
3120 */
3121GMMR0DECL(int) GMMR0AllocatePagesReq(PGVM pGVM, VMCPUID idCpu, PGMMALLOCATEPAGESREQ pReq)
3122{
3123 /*
3124 * Validate input and pass it on.
3125 */
3126 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3127 AssertMsgReturn(pReq->Hdr.cbReq >= RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[0]),
3128 ("%#x < %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[0])),
3129 VERR_INVALID_PARAMETER);
3130 AssertMsgReturn(pReq->Hdr.cbReq == RT_UOFFSETOF_DYN(GMMALLOCATEPAGESREQ, aPages[pReq->cPages]),
3131 ("%#x != %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF_DYN(GMMALLOCATEPAGESREQ, aPages[pReq->cPages])),
3132 VERR_INVALID_PARAMETER);
3133
3134 return GMMR0AllocatePages(pGVM, idCpu, pReq->cPages, &pReq->aPages[0], pReq->enmAccount);
3135}
3136
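/*
 * Illustration only: GMMALLOCATEPAGESREQ is variable sized, so a hypothetical
 * caller sizes it for cPages descriptors as sketched below.  (A real ring-3
 * caller also initializes the remaining request header fields and dispatches
 * the request through the VMMR0 request path.)
 *
 * @code
 *      uint32_t const       cbReq = RT_UOFFSETOF_DYN(GMMALLOCATEPAGESREQ, aPages[cPages]);
 *      PGMMALLOCATEPAGESREQ pReq  = (PGMMALLOCATEPAGESREQ)RTMemAllocZ(cbReq);
 *      pReq->Hdr.cbReq  = cbReq;
 *      pReq->cPages     = cPages;
 *      pReq->enmAccount = GMMACCOUNT_BASE;
 *      for (uint32_t i = 0; i < cPages; i++)
 *      {
 *          pReq->aPages[i].HCPhysGCPhys = NIL_RTHCPHYS;
 *          pReq->aPages[i].idPage       = NIL_GMM_PAGEID;
 *          pReq->aPages[i].idSharedPage = NIL_GMM_PAGEID;
 *      }
 * @endcode
 */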
3137
3138/**
3139 * Allocate a large page to represent guest RAM.
3140 *
3141 * The allocated pages are not cleared and will contain random garbage.
3142 *
3143 * @returns VBox status code:
3144 * @retval VINF_SUCCESS on success.
3145 * @retval VERR_NOT_OWNER if the caller is not an EMT.
3146 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is necessary.
3147 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
3148 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
3149 * that is we're trying to allocate more than we've reserved.
3150 * @returns see GMMR0AllocatePages.
3151 *
3152 * @param pGVM The global (ring-0) VM structure.
3153 * @param idCpu The VCPU id.
3154 * @param cbPage Large page size.
3155 * @param pIdPage Where to return the GMM page ID of the page.
3156 * @param pHCPhys Where to return the host physical address of the page.
3157 */
3158GMMR0DECL(int) GMMR0AllocateLargePage(PGVM pGVM, VMCPUID idCpu, uint32_t cbPage, uint32_t *pIdPage, RTHCPHYS *pHCPhys)
3159{
3160 LogFlow(("GMMR0AllocateLargePage: pGVM=%p cbPage=%x\n", pGVM, cbPage));
3161
3162 AssertReturn(cbPage == GMM_CHUNK_SIZE, VERR_INVALID_PARAMETER);
3163 AssertPtrReturn(pIdPage, VERR_INVALID_PARAMETER);
3164 AssertPtrReturn(pHCPhys, VERR_INVALID_PARAMETER);
3165
3166 /*
3167 * Validate, get basics and take the semaphore.
3168 */
3169 PGMM pGMM;
3170 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3171 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3172 if (RT_FAILURE(rc))
3173 return rc;
3174
3175#ifdef GMM_WITH_LEGACY_MODE
3176 // /* Not supported in legacy mode where we allocate the memory in ring 3 and lock it in ring 0. */
3177 // if (pGMM->fLegacyAllocationMode)
3178 // return VERR_NOT_SUPPORTED;
3179#endif
3180
3181 *pHCPhys = NIL_RTHCPHYS;
3182 *pIdPage = NIL_GMM_PAGEID;
3183
3184 gmmR0MutexAcquire(pGMM);
3185 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3186 {
3187 const unsigned cPages = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
3188 if (RT_UNLIKELY( pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cPages
3189 > pGVM->gmm.s.Stats.Reserved.cBasePages))
3190 {
3191 Log(("GMMR0AllocateLargePage: Reserved=%#llx Allocated+Requested=%#llx+%#x!\n",
3192 pGVM->gmm.s.Stats.Reserved.cBasePages, pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3193 gmmR0MutexRelease(pGMM);
3194 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
3195 }
3196
3197 /*
3198 * Allocate a new large page chunk.
3199 *
3200 * Note! We leave the giant GMM lock temporarily as the allocation might
3201 * take a long time. gmmR0RegisterChunk will retake it (ugly).
3202 */
3203 AssertCompile(GMM_CHUNK_SIZE == _2M);
3204 gmmR0MutexRelease(pGMM);
3205
3206 RTR0MEMOBJ hMemObj;
3207 rc = RTR0MemObjAllocPhysEx(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS, GMM_CHUNK_SIZE);
3208 if (RT_SUCCESS(rc))
3209 {
3210 PGMMCHUNKFREESET pSet = pGMM->fBoundMemoryMode ? &pGVM->gmm.s.Private : &pGMM->PrivateX;
3211 PGMMCHUNK pChunk;
3212 rc = gmmR0RegisterChunk(pGMM, pSet, hMemObj, pGVM->hSelf, GMM_CHUNK_FLAGS_LARGE_PAGE, &pChunk);
3213 if (RT_SUCCESS(rc))
3214 {
3215 /*
3216 * Allocate all the pages in the chunk.
3217 */
3218 /* Unlink the new chunk from the free list. */
3219 gmmR0UnlinkChunk(pChunk);
3220
3221 /** @todo rewrite this to skip the looping. */
3222 /* Allocate all pages. */
3223 GMMPAGEDESC PageDesc;
3224 gmmR0AllocatePage(pChunk, pGVM->hSelf, &PageDesc);
3225
3226 /* Return the first page as we'll use the whole chunk as one big page. */
3227 *pIdPage = PageDesc.idPage;
3228 *pHCPhys = PageDesc.HCPhysGCPhys;
3229
3230 for (unsigned i = 1; i < cPages; i++)
3231 gmmR0AllocatePage(pChunk, pGVM->hSelf, &PageDesc);
3232
3233 /* Update accounting. */
3234 pGVM->gmm.s.Stats.Allocated.cBasePages += cPages;
3235 pGVM->gmm.s.Stats.cPrivatePages += cPages;
3236 pGMM->cAllocatedPages += cPages;
3237
3238 gmmR0LinkChunk(pChunk, pSet);
3239 gmmR0MutexRelease(pGMM);
3240 LogFlow(("GMMR0AllocateLargePage: returns VINF_SUCCESS\n"));
3241 return VINF_SUCCESS;
3242 }
3243 RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
3244 }
3245 }
3246 else
3247 {
3248 gmmR0MutexRelease(pGMM);
3249 rc = VERR_GMM_IS_NOT_SANE;
3250 }
3251
3252 LogFlow(("GMMR0AllocateLargePage: returns %Rrc\n", rc));
3253 return rc;
3254}
3255
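/*
 * Illustration only: with the 2 MB chunk size asserted above and 4 KB pages,
 * one large page is backed by a whole chunk and accounted as
 * GMM_CHUNK_SIZE >> PAGE_SHIFT = 512 base pages.  A successful
 * GMMR0AllocateLargePage therefore bumps Allocated.cBasePages, cPrivatePages
 * and pGMM->cAllocatedPages by 512 each, and GMMR0FreeLargePage below takes
 * the same 512 pages off again.
 */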
3256
3257/**
3258 * Free a large page.
3259 *
3260 * @returns VBox status code:
3261 * @param pGVM The global (ring-0) VM structure.
3262 * @param idCpu The VCPU id.
3263 * @param idPage The large page id.
3264 */
3265GMMR0DECL(int) GMMR0FreeLargePage(PGVM pGVM, VMCPUID idCpu, uint32_t idPage)
3266{
3267 LogFlow(("GMMR0FreeLargePage: pGVM=%p idPage=%x\n", pGVM, idPage));
3268
3269 /*
3270 * Validate, get basics and take the semaphore.
3271 */
3272 PGMM pGMM;
3273 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3274 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3275 if (RT_FAILURE(rc))
3276 return rc;
3277
3278#ifdef GMM_WITH_LEGACY_MODE
3279 // /* Not supported in legacy mode where we allocate the memory in ring 3 and lock it in ring 0. */
3280 // if (pGMM->fLegacyAllocationMode)
3281 // return VERR_NOT_SUPPORTED;
3282#endif
3283
3284 gmmR0MutexAcquire(pGMM);
3285 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3286 {
3287 const unsigned cPages = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
3288
3289 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages < cPages))
3290 {
3291 Log(("GMMR0FreeLargePage: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3292 gmmR0MutexRelease(pGMM);
3293 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3294 }
3295
3296 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
3297 if (RT_LIKELY( pPage
3298 && GMM_PAGE_IS_PRIVATE(pPage)))
3299 {
3300 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3301 Assert(pChunk);
3302 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3303 Assert(pChunk->cPrivate > 0);
3304
3305 /* Release the memory immediately. */
3306 gmmR0FreeChunk(pGMM, NULL, pChunk, false /*fRelaxedSem*/); /** @todo this can be relaxed too! */
3307
3308 /* Update accounting. */
3309 pGVM->gmm.s.Stats.Allocated.cBasePages -= cPages;
3310 pGVM->gmm.s.Stats.cPrivatePages -= cPages;
3311 pGMM->cAllocatedPages -= cPages;
3312 }
3313 else
3314 rc = VERR_GMM_PAGE_NOT_FOUND;
3315 }
3316 else
3317 rc = VERR_GMM_IS_NOT_SANE;
3318
3319 gmmR0MutexRelease(pGMM);
3320 LogFlow(("GMMR0FreeLargePage: returns %Rrc\n", rc));
3321 return rc;
3322}
3323
3324
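/*
 * Worked example of the accounting delta applied above, assuming the usual
 * 4 KiB host page size (PAGE_SHIFT == 12) and the 2 MB chunk size asserted
 * earlier in this file:
 * @code
 *      cPages = GMM_CHUNK_SIZE >> PAGE_SHIFT
 *             = _2M >> 12
 *             = 512
 * @endcode
 * So freeing one large page returns the whole chunk to the host and reduces
 * the VM's base page account by 512 pages in one go.
 */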
3325/**
3326 * VMMR0 request wrapper for GMMR0FreeLargePage.
3327 *
3328 * @returns see GMMR0FreeLargePage.
3329 * @param pGVM The global (ring-0) VM structure.
3330 * @param idCpu The VCPU id.
3331 * @param pReq Pointer to the request packet.
3332 */
3333GMMR0DECL(int) GMMR0FreeLargePageReq(PGVM pGVM, VMCPUID idCpu, PGMMFREELARGEPAGEREQ pReq)
3334{
3335 /*
3336 * Validate input and pass it on.
3337 */
3338 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3339 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMFREELARGEPAGEREQ),
3340 ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMFREELARGEPAGEREQ)),
3341 VERR_INVALID_PARAMETER);
3342
3343 return GMMR0FreeLargePage(pGVM, idCpu, pReq->idPage);
3344}
3345
3346
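/*
 * Minimal sketch of how the request packet validated above might be filled in;
 * only the fields this wrapper actually checks and uses are shown, idMyLargePage
 * is a hypothetical value from an earlier large page allocation, and the real
 * VMMR0 dispatch path sets up further header fields:
 * @code
 *      GMMFREELARGEPAGEREQ Req;
 *      Req.Hdr.cbReq = sizeof(Req);
 *      Req.idPage    = idMyLargePage;  // hypothetical
 *      int rc = GMMR0FreeLargePageReq(pGVM, idCpu, &Req);
 * @endcode
 */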
3347/**
3348 * Frees a chunk, giving it back to the host OS.
3349 *
3350 * @param pGMM Pointer to the GMM instance.
3351 * @param pGVM This is set when called from GMMR0CleanupVM so we can
3352 * unmap and free the chunk in one go.
3353 * @param pChunk The chunk to free.
3354 * @param fRelaxedSem Whether we can release the semaphore while doing the
3355 * freeing (@c true) or not.
3356 */
3357static bool gmmR0FreeChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem)
3358{
3359 Assert(pChunk->Core.Key != NIL_GMM_CHUNKID);
3360
3361 GMMR0CHUNKMTXSTATE MtxState;
3362 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
3363
3364 /*
3365 * Cleanup hack! Unmap the chunk from the caller's address space.
3366 * This shouldn't happen, so screw lock contention...
3367 */
3368 if ( pChunk->cMappingsX
3369#ifdef GMM_WITH_LEGACY_MODE
3370 && (!pGMM->fLegacyAllocationMode || (pChunk->fFlags & GMM_CHUNK_FLAGS_LARGE_PAGE))
3371#endif
3372 && pGVM)
3373 gmmR0UnmapChunkLocked(pGMM, pGVM, pChunk);
3374
3375 /*
3376 * If there are current mappings of the chunk, then request the
3377 * VMs to unmap them. Reposition the chunk in the free list so
3378 * it won't be a likely candidate for allocations.
3379 */
3380 if (pChunk->cMappingsX)
3381 {
3382 /** @todo R0 -> VM request */
3383 /* The chunk can be mapped by more than one VM if fBoundMemoryMode is false! */
3384 Log(("gmmR0FreeChunk: chunk still has %d mappings; don't free!\n", pChunk->cMappingsX));
3385 gmmR0ChunkMutexRelease(&MtxState, pChunk);
3386 return false;
3387 }
3388
3389
3390 /*
3391 * Save and trash the handle.
3392 */
3393 RTR0MEMOBJ const hMemObj = pChunk->hMemObj;
3394 pChunk->hMemObj = NIL_RTR0MEMOBJ;
3395
3396 /*
3397 * Unlink it from everywhere.
3398 */
3399 gmmR0UnlinkChunk(pChunk);
3400
3401 RTSpinlockAcquire(pGMM->hSpinLockTree);
3402
3403 RTListNodeRemove(&pChunk->ListNode);
3404
3405 PAVLU32NODECORE pCore = RTAvlU32Remove(&pGMM->pChunks, pChunk->Core.Key);
3406 Assert(pCore == &pChunk->Core); NOREF(pCore);
3407
3408 PGMMCHUNKTLBE pTlbe = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(pChunk->Core.Key)];
3409 if (pTlbe->pChunk == pChunk)
3410 {
3411 pTlbe->idChunk = NIL_GMM_CHUNKID;
3412 pTlbe->pChunk = NULL;
3413 }
3414
3415 Assert(pGMM->cChunks > 0);
3416 pGMM->cChunks--;
3417
3418 RTSpinlockRelease(pGMM->hSpinLockTree);
3419
3420 /*
3421 * Free the Chunk ID before dropping the locks and freeing the rest.
3422 */
3423 gmmR0FreeChunkId(pGMM, pChunk->Core.Key);
3424 pChunk->Core.Key = NIL_GMM_CHUNKID;
3425
3426 pGMM->cFreedChunks++;
3427
3428 gmmR0ChunkMutexRelease(&MtxState, NULL);
3429 if (fRelaxedSem)
3430 gmmR0MutexRelease(pGMM);
3431
3432 RTMemFree(pChunk->paMappingsX);
3433 pChunk->paMappingsX = NULL;
3434
3435 RTMemFree(pChunk);
3436
3437#if defined(VBOX_WITH_RAM_IN_KERNEL) && !defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)
3438 int rc = RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
3439#else
3440 int rc = RTR0MemObjFree(hMemObj, false /* fFreeMappings */);
3441#endif
3442 AssertLogRelRC(rc);
3443
3444 if (fRelaxedSem)
3445 gmmR0MutexAcquire(pGMM);
3446 return fRelaxedSem;
3447}
3448
3449
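/*
 * For illustration, a sketch (approximating the chunk lookup helpers used
 * elsewhere in this file) of how the TLB entry and AVL tree pruned above under
 * hSpinLockTree are consumed on the lookup side:
 * @code
 *      RTSpinlockAcquire(pGMM->hSpinLockTree);
 *      PGMMCHUNKTLBE pTlbe  = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(idChunk)];
 *      PGMMCHUNK     pChunk = pTlbe->idChunk == idChunk ? pTlbe->pChunk : NULL;
 *      if (!pChunk)  // TLB miss: fall back to the AVL tree keyed by the chunk ID.
 *      {
 *          PAVLU32NODECORE pNode = RTAvlU32Get(&pGMM->pChunks, idChunk);
 *          pChunk = pNode ? RT_FROM_MEMBER(pNode, GMMCHUNK, Core) : NULL;
 *      }
 *      RTSpinlockRelease(pGMM->hSpinLockTree);
 * @endcode
 * Clearing the TLB entry and removing the tree node under the same spinlock is
 * what keeps such lookups from returning a chunk that is being torn down.
 */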
3450/**
3451 * Free page worker.
3452 *
3453 * The caller does all the statistic decrementing, we do all the incrementing.
3454 *
3455 * @param pGMM Pointer to the GMM instance data.
3456 * @param pGVM Pointer to the GVM instance.
3457 * @param pChunk Pointer to the chunk this page belongs to.
3458 * @param idPage The Page ID.
3459 * @param pPage Pointer to the page.
3460 */
3461static void gmmR0FreePageWorker(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, uint32_t idPage, PGMMPAGE pPage)
3462{
3463 Log3(("F pPage=%p iPage=%#x/%#x u2State=%d iFreeHead=%#x\n",
3464 pPage, pPage - &pChunk->aPages[0], idPage, pPage->Common.u2State, pChunk->iFreeHead)); NOREF(idPage);
3465
3466 /*
3467 * Put the page on the free list.
3468 */
3469 pPage->u = 0;
3470 pPage->Free.u2State = GMM_PAGE_STATE_FREE;
3471 Assert(pChunk->iFreeHead < RT_ELEMENTS(pChunk->aPages) || pChunk->iFreeHead == UINT16_MAX);
3472 pPage->Free.iNext = pChunk->iFreeHead;
3473 pChunk->iFreeHead = pPage - &pChunk->aPages[0];
3474
3475 /*
3476 * Update statistics (the cShared/cPrivate stats are up to date already),
3477 * and relink the chunk if necessary.
3478 */
3479 unsigned const cFree = pChunk->cFree;
3480 if ( !cFree
3481 || gmmR0SelectFreeSetList(cFree) != gmmR0SelectFreeSetList(cFree + 1))
3482 {
3483 gmmR0UnlinkChunk(pChunk);
3484 pChunk->cFree++;
3485 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
3486 }
3487 else
3488 {
3489 pChunk->cFree = cFree + 1;
3490 pChunk->pSet->cFreePages++;
3491 }
3492
3493 /*
3494 * If the chunk becomes empty, consider giving memory back to the host OS.
3495 *
3496 * The current strategy is to try to give it back if there are other chunks
3497 * in this free list, meaning if there are at least 240 free pages in this
3498 * category. Note that since there are probably mappings of the chunk,
3499 * it won't be freed up instantly, which probably screws up this logic
3500 * a bit...
3501 */
3502 /** @todo Do this on the way out. */
3503 if (RT_LIKELY( pChunk->cFree != GMM_CHUNK_NUM_PAGES
3504 || pChunk->pFreeNext == NULL
3505 || pChunk->pFreePrev == NULL /** @todo this is probably misfiring, see reset... */))
3506 { /* likely */ }
3507#ifdef GMM_WITH_LEGACY_MODE
3508 else if (RT_LIKELY(pGMM->fLegacyAllocationMode && !(pChunk->fFlags & GMM_CHUNK_FLAGS_LARGE_PAGE)))
3509 { /* likely */ }
3510#endif
3511 else
3512 gmmR0FreeChunk(pGMM, NULL, pChunk, false);
3513
3514}
3515
3516
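/*
 * The worker above pushes the freed page onto the chunk's LIFO free list. For
 * illustration, the matching pop an allocation path would perform looks roughly
 * like this (a sketch, not the actual allocator code):
 * @code
 *      uint32_t const iPage = pChunk->iFreeHead;   // UINT16_MAX means no free pages
 *      if (iPage != UINT16_MAX)
 *      {
 *          PGMMPAGE pPage    = &pChunk->aPages[iPage];
 *          pChunk->iFreeHead = pPage->Free.iNext;  // unlink the head of the LIFO
 *          // ... mark pPage private/shared and hand it out ...
 *      }
 * @endcode
 */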
3517/**
3518 * Frees a shared page, the page is known to exist and be valid and such.
3519 *
3520 * @param pGMM Pointer to the GMM instance.
3521 * @param pGVM Pointer to the GVM instance.
3522 * @param idPage The page id.
3523 * @param pPage The page structure.
3524 */
3525DECLINLINE(void) gmmR0FreeSharedPage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage)
3526{
3527 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3528 Assert(pChunk);
3529 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3530 Assert(pChunk->cShared > 0);
3531 Assert(pGMM->cSharedPages > 0);
3532 Assert(pGMM->cAllocatedPages > 0);
3533 Assert(!pPage->Shared.cRefs);
3534
3535 pChunk->cShared--;
3536 pGMM->cAllocatedPages--;
3537 pGMM->cSharedPages--;
3538 gmmR0FreePageWorker(pGMM, pGVM, pChunk, idPage, pPage);
3539}
3540
3541
3542/**
3543 * Frees a private page, the page is known to exist and be valid and such.
3544 *
3545 * @param pGMM Pointer to the GMM instance.
3546 * @param pGVM Pointer to the GVM instance.
3547 * @param idPage The page id.
3548 * @param pPage The page structure.
3549 */
3550DECLINLINE(void) gmmR0FreePrivatePage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage)
3551{
3552 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3553 Assert(pChunk);
3554 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3555 Assert(pChunk->cPrivate > 0);
3556 Assert(pGMM->cAllocatedPages > 0);
3557
3558 pChunk->cPrivate--;
3559 pGMM->cAllocatedPages--;
3560 gmmR0FreePageWorker(pGMM, pGVM, pChunk, idPage, pPage);
3561}
3562
3563
3564/**
3565 * Common worker for GMMR0FreePages and GMMR0BalloonedPages.
3566 *
3567 * @returns VBox status code:
3568 * @retval xxx
3569 *
3570 * @param pGMM Pointer to the GMM instance data.
3571 * @param pGVM Pointer to the VM.
3572 * @param cPages The number of pages to free.
3573 * @param paPages Pointer to the page descriptors.
3574 * @param enmAccount The account this relates to.
3575 */
3576static int gmmR0FreePages(PGMM pGMM, PGVM pGVM, uint32_t cPages, PGMMFREEPAGEDESC paPages, GMMACCOUNT enmAccount)
3577{
3578 /*
3579 * Check that the request isn't impossible wrt to the account status.
3580 */
3581 switch (enmAccount)
3582 {
3583 case GMMACCOUNT_BASE:
3584 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages < cPages))
3585 {
3586 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3587 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3588 }
3589 break;
3590 case GMMACCOUNT_SHADOW:
3591 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cShadowPages < cPages))
3592 {
3593 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cShadowPages, cPages));
3594 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3595 }
3596 break;
3597 case GMMACCOUNT_FIXED:
3598 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cFixedPages < cPages))
3599 {
3600 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cFixedPages, cPages));
3601 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3602 }
3603 break;
3604 default:
3605 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
3606 }
3607
3608 /*
3609 * Walk the descriptors and free the pages.
3610 *
3611 * Statistics (except the account) are being updated as we go along,
3612 * unlike the alloc code. Also, stop on the first error.
3613 */
3614 int rc = VINF_SUCCESS;
3615 uint32_t iPage;
3616 for (iPage = 0; iPage < cPages; iPage++)
3617 {
3618 uint32_t idPage = paPages[iPage].idPage;
3619 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
3620 if (RT_LIKELY(pPage))
3621 {
3622 if (RT_LIKELY(GMM_PAGE_IS_PRIVATE(pPage)))
3623 {
3624 if (RT_LIKELY(pPage->Private.hGVM == pGVM->hSelf))
3625 {
3626 Assert(pGVM->gmm.s.Stats.cPrivatePages);
3627 pGVM->gmm.s.Stats.cPrivatePages--;
3628 gmmR0FreePrivatePage(pGMM, pGVM, idPage, pPage);
3629 }
3630 else
3631 {
3632 Log(("gmmR0FreePages: #%#x/%#x: not owner! hGVM=%#x hSelf=%#x\n", iPage, idPage,
3633 pPage->Private.hGVM, pGVM->hSelf));
3634 rc = VERR_GMM_NOT_PAGE_OWNER;
3635 break;
3636 }
3637 }
3638 else if (RT_LIKELY(GMM_PAGE_IS_SHARED(pPage)))
3639 {
3640 Assert(pGVM->gmm.s.Stats.cSharedPages);
3641 Assert(pPage->Shared.cRefs);
3642#if defined(VBOX_WITH_PAGE_SHARING) && defined(VBOX_STRICT) && HC_ARCH_BITS == 64
3643 if (pPage->Shared.u14Checksum)
3644 {
3645 uint32_t uChecksum = gmmR0StrictPageChecksum(pGMM, pGVM, idPage);
3646 uChecksum &= UINT32_C(0x00003fff);
3647 AssertMsg(!uChecksum || uChecksum == pPage->Shared.u14Checksum,
3648 ("%#x vs %#x - idPage=%#x\n", uChecksum, pPage->Shared.u14Checksum, idPage));
3649 }
3650#endif
3651 pGVM->gmm.s.Stats.cSharedPages--;
3652 if (!--pPage->Shared.cRefs)
3653 gmmR0FreeSharedPage(pGMM, pGVM, idPage, pPage);
3654 else
3655 {
3656 Assert(pGMM->cDuplicatePages);
3657 pGMM->cDuplicatePages--;
3658 }
3659 }
3660 else
3661 {
3662 Log(("gmmR0FreePages: #%#x/%#x: already free!\n", iPage, idPage));
3663 rc = VERR_GMM_PAGE_ALREADY_FREE;
3664 break;
3665 }
3666 }
3667 else
3668 {
3669 Log(("gmmR0FreePages: #%#x/%#x: not found!\n", iPage, idPage));
3670 rc = VERR_GMM_PAGE_NOT_FOUND;
3671 break;
3672 }
3673 paPages[iPage].idPage = NIL_GMM_PAGEID;
3674 }
3675
3676 /*
3677 * Update the account.
3678 */
3679 switch (enmAccount)
3680 {
3681 case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages -= iPage; break;
3682 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages -= iPage; break;
3683 case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages -= iPage; break;
3684 default:
3685 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
3686 }
3687
3688 /*
3689 * Any threshold stuff to be done here?
3690 */
3691
3692 return rc;
3693}
3694
3695
3696/**
3697 * Free one or more pages.
3698 *
3699 * This is typically used at reset time or power off.
3700 *
3701 * @returns VBox status code:
3702 * @retval xxx
3703 *
3704 * @param pGVM The global (ring-0) VM structure.
3705 * @param idCpu The VCPU id.
3706 * @param cPages The number of pages to free.
3707 * @param paPages Pointer to the page descriptors containing the page IDs
3708 * for each page.
3709 * @param enmAccount The account this relates to.
3710 * @thread EMT.
3711 */
3712GMMR0DECL(int) GMMR0FreePages(PGVM pGVM, VMCPUID idCpu, uint32_t cPages, PGMMFREEPAGEDESC paPages, GMMACCOUNT enmAccount)
3713{
3714 LogFlow(("GMMR0FreePages: pGVM=%p cPages=%#x paPages=%p enmAccount=%d\n", pGVM, cPages, paPages, enmAccount));
3715
3716 /*
3717 * Validate input and get the basics.
3718 */
3719 PGMM pGMM;
3720 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3721 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3722 if (RT_FAILURE(rc))
3723 return rc;
3724
3725 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
3726 AssertMsgReturn(enmAccount > GMMACCOUNT_INVALID && enmAccount < GMMACCOUNT_END, ("%d\n", enmAccount), VERR_INVALID_PARAMETER);
3727 AssertMsgReturn(cPages > 0 && cPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cPages), VERR_INVALID_PARAMETER);
3728
3729 for (unsigned iPage = 0; iPage < cPages; iPage++)
3730 AssertMsgReturn( paPages[iPage].idPage <= GMM_PAGEID_LAST
3731 /*|| paPages[iPage].idPage == NIL_GMM_PAGEID*/,
3732 ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
3733
3734 /*
3735 * Take the semaphore and call the worker function.
3736 */
3737 gmmR0MutexAcquire(pGMM);
3738 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3739 {
3740 rc = gmmR0FreePages(pGMM, pGVM, cPages, paPages, enmAccount);
3741 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3742 }
3743 else
3744 rc = VERR_GMM_IS_NOT_SANE;
3745 gmmR0MutexRelease(pGMM);
3746 LogFlow(("GMMR0FreePages: returns %Rrc\n", rc));
3747 return rc;
3748}
3749
3750
3751/**
3752 * VMMR0 request wrapper for GMMR0FreePages.
3753 *
3754 * @returns see GMMR0FreePages.
3755 * @param pGVM The global (ring-0) VM structure.
3756 * @param idCpu The VCPU id.
3757 * @param pReq Pointer to the request packet.
3758 */
3759GMMR0DECL(int) GMMR0FreePagesReq(PGVM pGVM, VMCPUID idCpu, PGMMFREEPAGESREQ pReq)
3760{
3761 /*
3762 * Validate input and pass it on.
3763 */
3764 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3765 AssertMsgReturn(pReq->Hdr.cbReq >= RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[0]),
3766 ("%#x < %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[0])),
3767 VERR_INVALID_PARAMETER);
3768 AssertMsgReturn(pReq->Hdr.cbReq == RT_UOFFSETOF_DYN(GMMFREEPAGESREQ, aPages[pReq->cPages]),
3769 ("%#x != %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF_DYN(GMMFREEPAGESREQ, aPages[pReq->cPages])),
3770 VERR_INVALID_PARAMETER);
3771
3772 return GMMR0FreePages(pGVM, idCpu, pReq->cPages, &pReq->aPages[0], pReq->enmAccount);
3773}
3774
3775
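/*
 * Sketch of how the variable-sized request validated above is laid out; the
 * RT_UOFFSETOF_DYN check enforces that exactly cPages descriptors trail the
 * fixed part (paMyPageIds is a hypothetical source of page IDs):
 * @code
 *      uint32_t const cbReq = RT_UOFFSETOF_DYN(GMMFREEPAGESREQ, aPages[cPages]);
 *      PGMMFREEPAGESREQ pReq = (PGMMFREEPAGESREQ)RTMemAllocZ(cbReq);  // sketch only
 *      pReq->Hdr.cbReq  = cbReq;
 *      pReq->cPages     = cPages;
 *      pReq->enmAccount = GMMACCOUNT_BASE;
 *      for (uint32_t i = 0; i < cPages; i++)
 *          pReq->aPages[i].idPage = paMyPageIds[i];
 * @endcode
 */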
3776/**
3777 * Report back on a memory ballooning request.
3778 *
3779 * The request may or may not have been initiated by the GMM. If it was initiated
3780 * by the GMM it is important that this function is called even if no pages were
3781 * ballooned.
3782 *
3783 * @returns VBox status code:
3784 * @retval VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH
3785 * @retval VERR_GMM_ATTEMPT_TO_DEFLATE_TOO_MUCH
3786 * @retval VERR_GMM_OVERCOMMITTED_TRY_AGAIN_IN_A_BIT - reset condition
3787 * indicating that we won't necessarily have sufficient RAM to boot
3788 * the VM again and that it should pause until this changes (we'll try
3789 * balloon some other VM). (For standard deflate we have little choice
3790 * but to hope the VM won't use the memory that was returned to it.)
3791 *
3792 * @param pGVM The global (ring-0) VM structure.
3793 * @param idCpu The VCPU id.
3794 * @param enmAction Inflate/deflate/reset.
3795 * @param cBalloonedPages The number of pages that were ballooned.
3796 *
3797 * @thread EMT(idCpu)
3798 */
3799GMMR0DECL(int) GMMR0BalloonedPages(PGVM pGVM, VMCPUID idCpu, GMMBALLOONACTION enmAction, uint32_t cBalloonedPages)
3800{
3801 LogFlow(("GMMR0BalloonedPages: pGVM=%p enmAction=%d cBalloonedPages=%#x\n",
3802 pGVM, enmAction, cBalloonedPages));
3803
3804 AssertMsgReturn(cBalloonedPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cBalloonedPages), VERR_INVALID_PARAMETER);
3805
3806 /*
3807 * Validate input and get the basics.
3808 */
3809 PGMM pGMM;
3810 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3811 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3812 if (RT_FAILURE(rc))
3813 return rc;
3814
3815 /*
3816 * Take the semaphore and do some more validations.
3817 */
3818 gmmR0MutexAcquire(pGMM);
3819 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3820 {
3821 switch (enmAction)
3822 {
3823 case GMMBALLOONACTION_INFLATE:
3824 {
3825 if (RT_LIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cBalloonedPages
3826 <= pGVM->gmm.s.Stats.Reserved.cBasePages))
3827 {
3828 /*
3829 * Record the ballooned memory.
3830 */
3831 pGMM->cBalloonedPages += cBalloonedPages;
3832 if (pGVM->gmm.s.Stats.cReqBalloonedPages)
3833 {
3834 /* Codepath never taken. Might be interesting in the future to request ballooned memory from guests in low-memory conditions. */
3835 AssertFailed();
3836
3837 pGVM->gmm.s.Stats.cBalloonedPages += cBalloonedPages;
3838 pGVM->gmm.s.Stats.cReqActuallyBalloonedPages += cBalloonedPages;
3839 Log(("GMMR0BalloonedPages: +%#x - Global=%#llx / VM: Total=%#llx Req=%#llx Actual=%#llx (pending)\n",
3840 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages,
3841 pGVM->gmm.s.Stats.cReqBalloonedPages, pGVM->gmm.s.Stats.cReqActuallyBalloonedPages));
3842 }
3843 else
3844 {
3845 pGVM->gmm.s.Stats.cBalloonedPages += cBalloonedPages;
3846 Log(("GMMR0BalloonedPages: +%#x - Global=%#llx / VM: Total=%#llx (user)\n",
3847 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages));
3848 }
3849 }
3850 else
3851 {
3852 Log(("GMMR0BalloonedPages: cBasePages=%#llx Total=%#llx cBalloonedPages=%#llx Reserved=%#llx\n",
3853 pGVM->gmm.s.Stats.Allocated.cBasePages, pGVM->gmm.s.Stats.cBalloonedPages, cBalloonedPages,
3854 pGVM->gmm.s.Stats.Reserved.cBasePages));
3855 rc = VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3856 }
3857 break;
3858 }
3859
3860 case GMMBALLOONACTION_DEFLATE:
3861 {
3862 /* Deflate. */
3863 if (pGVM->gmm.s.Stats.cBalloonedPages >= cBalloonedPages)
3864 {
3865 /*
3866 * Record the ballooned memory.
3867 */
3868 Assert(pGMM->cBalloonedPages >= cBalloonedPages);
3869 pGMM->cBalloonedPages -= cBalloonedPages;
3870 pGVM->gmm.s.Stats.cBalloonedPages -= cBalloonedPages;
3871 if (pGVM->gmm.s.Stats.cReqDeflatePages)
3872 {
3873 AssertFailed(); /* This path is for later. */
3874 Log(("GMMR0BalloonedPages: -%#x - Global=%#llx / VM: Total=%#llx Req=%#llx\n",
3875 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages, pGVM->gmm.s.Stats.cReqDeflatePages));
3876
3877 /*
3878 * Anything we need to do here now when the request has been completed?
3879 */
3880 pGVM->gmm.s.Stats.cReqDeflatePages = 0;
3881 }
3882 else
3883 Log(("GMMR0BalloonedPages: -%#x - Global=%#llx / VM: Total=%#llx (user)\n",
3884 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages));
3885 }
3886 else
3887 {
3888 Log(("GMMR0BalloonedPages: Total=%#llx cBalloonedPages=%#llx\n", pGVM->gmm.s.Stats.cBalloonedPages, cBalloonedPages));
3889 rc = VERR_GMM_ATTEMPT_TO_DEFLATE_TOO_MUCH;
3890 }
3891 break;
3892 }
3893
3894 case GMMBALLOONACTION_RESET:
3895 {
3896 /* Reset to an empty balloon. */
3897 Assert(pGMM->cBalloonedPages >= pGVM->gmm.s.Stats.cBalloonedPages);
3898
3899 pGMM->cBalloonedPages -= pGVM->gmm.s.Stats.cBalloonedPages;
3900 pGVM->gmm.s.Stats.cBalloonedPages = 0;
3901 break;
3902 }
3903
3904 default:
3905 rc = VERR_INVALID_PARAMETER;
3906 break;
3907 }
3908 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3909 }
3910 else
3911 rc = VERR_GMM_IS_NOT_SANE;
3912
3913 gmmR0MutexRelease(pGMM);
3914 LogFlow(("GMMR0BalloonedPages: returns %Rrc\n", rc));
3915 return rc;
3916}
3917
3918
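/*
 * The inflate path above enforces a simple reservation invariant; written out
 * as a standalone check (illustrative only):
 * @code
 *      bool const fFits =    pGVM->gmm.s.Stats.Allocated.cBasePages
 *                          + pGVM->gmm.s.Stats.cBalloonedPages
 *                          + cBalloonedPages
 *                         <= pGVM->gmm.s.Stats.Reserved.cBasePages;
 * @endcode
 * Deflate is the mirror image: it only requires that the VM actually has that
 * many pages ballooned, and the global pGMM->cBalloonedPages is kept in step
 * in both directions.
 */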
3919/**
3920 * VMMR0 request wrapper for GMMR0BalloonedPages.
3921 *
3922 * @returns see GMMR0BalloonedPages.
3923 * @param pGVM The global (ring-0) VM structure.
3924 * @param idCpu The VCPU id.
3925 * @param pReq Pointer to the request packet.
3926 */
3927GMMR0DECL(int) GMMR0BalloonedPagesReq(PGVM pGVM, VMCPUID idCpu, PGMMBALLOONEDPAGESREQ pReq)
3928{
3929 /*
3930 * Validate input and pass it on.
3931 */
3932 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3933 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMBALLOONEDPAGESREQ),
3934 ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMBALLOONEDPAGESREQ)),
3935 VERR_INVALID_PARAMETER);
3936
3937 return GMMR0BalloonedPages(pGVM, idCpu, pReq->enmAction, pReq->cBalloonedPages);
3938}
3939
3940
3941/**
3942 * Return memory statistics for the hypervisor
3943 *
3944 * @returns VBox status code.
3945 * @param pReq Pointer to the request packet.
3946 */
3947GMMR0DECL(int) GMMR0QueryHypervisorMemoryStatsReq(PGMMMEMSTATSREQ pReq)
3948{
3949 /*
3950 * Validate input and pass it on.
3951 */
3952 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3953 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMMEMSTATSREQ),
3954 ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMMEMSTATSREQ)),
3955 VERR_INVALID_PARAMETER);
3956
3957 /*
3958 * Validate input and get the basics.
3959 */
3960 PGMM pGMM;
3961 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3962 pReq->cAllocPages = pGMM->cAllocatedPages;
3963 pReq->cFreePages = (pGMM->cChunks << (GMM_CHUNK_SHIFT - PAGE_SHIFT)) - pGMM->cAllocatedPages;
3964 pReq->cBalloonedPages = pGMM->cBalloonedPages;
3965 pReq->cMaxPages = pGMM->cMaxPages;
3966 pReq->cSharedPages = pGMM->cDuplicatePages;
3967 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3968
3969 return VINF_SUCCESS;
3970}
3971
3972
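/*
 * The cFreePages value above is derived rather than tracked directly; a worked
 * example assuming 4 KiB host pages (PAGE_SHIFT == 12) and the 2 MB chunk size
 * asserted earlier in this file:
 * @code
 *      // Each chunk contributes 1 << (GMM_CHUNK_SHIFT - PAGE_SHIFT) = 512 pages.
 *      cFreePages = (cChunks << (GMM_CHUNK_SHIFT - PAGE_SHIFT)) - cAllocatedPages
 *                 =  cChunks * 512                              - cAllocatedPages
 * @endcode
 */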
3973/**
3974 * Return memory statistics for the VM
3975 *
3976 * @returns VBox status code.
3977 * @param pGVM The global (ring-0) VM structure.
3978 * @param idCpu Cpu id.
3979 * @param pReq Pointer to the request packet.
3980 *
3981 * @thread EMT(idCpu)
3982 */
3983GMMR0DECL(int) GMMR0QueryMemoryStatsReq(PGVM pGVM, VMCPUID idCpu, PGMMMEMSTATSREQ pReq)
3984{
3985 /*
3986 * Validate input and pass it on.
3987 */
3988 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3989 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMMEMSTATSREQ),
3990 ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMMEMSTATSREQ)),
3991 VERR_INVALID_PARAMETER);
3992
3993 /*
3994 * Validate input and get the basics.
3995 */
3996 PGMM pGMM;
3997 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3998 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3999 if (RT_FAILURE(rc))
4000 return rc;
4001
4002 /*
4003 * Take the semaphore and do some more validations.
4004 */
4005 gmmR0MutexAcquire(pGMM);
4006 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4007 {
4008 pReq->cAllocPages = pGVM->gmm.s.Stats.Allocated.cBasePages;
4009 pReq->cBalloonedPages = pGVM->gmm.s.Stats.cBalloonedPages;
4010 pReq->cMaxPages = pGVM->gmm.s.Stats.Reserved.cBasePages;
4011 pReq->cFreePages = pReq->cMaxPages - pReq->cAllocPages;
4012 }
4013 else
4014 rc = VERR_GMM_IS_NOT_SANE;
4015
4016 gmmR0MutexRelease(pGMM);
4017 LogFlow(("GMMR0QueryMemoryStatsReq: returns %Rrc\n", rc));
4018 return rc;
4019}
4020
4021
4022/**
4023 * Worker for gmmR0UnmapChunk and gmmr0FreeChunk.
4024 *
4025 * Don't call this in legacy allocation mode!
4026 *
4027 * @returns VBox status code.
4028 * @param pGMM Pointer to the GMM instance data.
4029 * @param pGVM Pointer to the Global VM structure.
4030 * @param pChunk Pointer to the chunk to be unmapped.
4031 */
4032static int gmmR0UnmapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
4033{
4034 RT_NOREF_PV(pGMM);
4035#ifdef GMM_WITH_LEGACY_MODE
4036 Assert(!pGMM->fLegacyAllocationMode || (pChunk->fFlags & GMM_CHUNK_FLAGS_LARGE_PAGE));
4037#endif
4038
4039 /*
4040 * Find the mapping and try unmapping it.
4041 */
4042 uint32_t cMappings = pChunk->cMappingsX;
4043 for (uint32_t i = 0; i < cMappings; i++)
4044 {
4045 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
4046 if (pChunk->paMappingsX[i].pGVM == pGVM)
4047 {
4048 /* unmap */
4049 int rc = RTR0MemObjFree(pChunk->paMappingsX[i].hMapObj, false /* fFreeMappings (NA) */);
4050 if (RT_SUCCESS(rc))
4051 {
4052 /* update the record. */
4053 cMappings--;
4054 if (i < cMappings)
4055 pChunk->paMappingsX[i] = pChunk->paMappingsX[cMappings];
4056 pChunk->paMappingsX[cMappings].hMapObj = NIL_RTR0MEMOBJ;
4057 pChunk->paMappingsX[cMappings].pGVM = NULL;
4058 Assert(pChunk->cMappingsX - 1U == cMappings);
4059 pChunk->cMappingsX = cMappings;
4060 }
4061
4062 return rc;
4063 }
4064 }
4065
4066 Log(("gmmR0UnmapChunk: Chunk %#x is not mapped into pGVM=%p/%#x\n", pChunk->Core.Key, pGVM, pGVM->hSelf));
4067 return VERR_GMM_CHUNK_NOT_MAPPED;
4068}
4069
4070
4071/**
4072 * Unmaps a chunk previously mapped into the address space of the current process.
4073 *
4074 * @returns VBox status code.
4075 * @param pGMM Pointer to the GMM instance data.
4076 * @param pGVM Pointer to the Global VM structure.
4077 * @param pChunk Pointer to the chunk to be unmapped.
4078 * @param fRelaxedSem Whether we can release the semaphore while doing the
4079 * unmapping (@c true) or not.
4080 */
4081static int gmmR0UnmapChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem)
4082{
4083#ifdef GMM_WITH_LEGACY_MODE
4084 if (!pGMM->fLegacyAllocationMode || (pChunk->fFlags & GMM_CHUNK_FLAGS_LARGE_PAGE))
4085 {
4086#endif
4087 /*
4088 * Lock the chunk and if possible leave the giant GMM lock.
4089 */
4090 GMMR0CHUNKMTXSTATE MtxState;
4091 int rc = gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk,
4092 fRelaxedSem ? GMMR0CHUNK_MTX_RETAKE_GIANT : GMMR0CHUNK_MTX_KEEP_GIANT);
4093 if (RT_SUCCESS(rc))
4094 {
4095 rc = gmmR0UnmapChunkLocked(pGMM, pGVM, pChunk);
4096 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4097 }
4098 return rc;
4099#ifdef GMM_WITH_LEGACY_MODE
4100 }
4101
4102 if (pChunk->hGVM == pGVM->hSelf)
4103 return VINF_SUCCESS;
4104
4105 Log(("gmmR0UnmapChunk: Chunk %#x is not mapped into pGVM=%p/%#x (legacy)\n", pChunk->Core.Key, pGVM, pGVM->hSelf));
4106 return VERR_GMM_CHUNK_NOT_MAPPED;
4107#endif
4108}
4109
4110
4111/**
4112 * Worker for gmmR0MapChunk.
4113 *
4114 * @returns VBox status code.
4115 * @param pGMM Pointer to the GMM instance data.
4116 * @param pGVM Pointer to the Global VM structure.
4117 * @param pChunk Pointer to the chunk to be mapped.
4118 * @param ppvR3 Where to store the ring-3 address of the mapping.
4119 * In the VERR_GMM_CHUNK_ALREADY_MAPPED case, this will
4120 * contain the address of the existing mapping.
4121 */
4122static int gmmR0MapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, PRTR3PTR ppvR3)
4123{
4124#ifdef GMM_WITH_LEGACY_MODE
4125 /*
4126 * If we're in legacy mode this is simple.
4127 */
4128 if (pGMM->fLegacyAllocationMode && !(pChunk->fFlags & GMM_CHUNK_FLAGS_LARGE_PAGE))
4129 {
4130 if (pChunk->hGVM != pGVM->hSelf)
4131 {
4132 Log(("gmmR0MapChunk: chunk %#x is already mapped at %p!\n", pChunk->Core.Key, *ppvR3));
4133 return VERR_GMM_CHUNK_NOT_FOUND;
4134 }
4135
4136 *ppvR3 = RTR0MemObjAddressR3(pChunk->hMemObj);
4137 return VINF_SUCCESS;
4138 }
4139#else
4140 RT_NOREF(pGMM);
4141#endif
4142
4143 /*
4144 * Check to see if the chunk is already mapped.
4145 */
4146 for (uint32_t i = 0; i < pChunk->cMappingsX; i++)
4147 {
4148 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
4149 if (pChunk->paMappingsX[i].pGVM == pGVM)
4150 {
4151 *ppvR3 = RTR0MemObjAddressR3(pChunk->paMappingsX[i].hMapObj);
4152 Log(("gmmR0MapChunk: chunk %#x is already mapped at %p!\n", pChunk->Core.Key, *ppvR3));
4153#ifdef VBOX_WITH_PAGE_SHARING
4154 /* The ring-3 chunk cache can be out of sync; don't fail. */
4155 return VINF_SUCCESS;
4156#else
4157 return VERR_GMM_CHUNK_ALREADY_MAPPED;
4158#endif
4159 }
4160 }
4161
4162 /*
4163 * Do the mapping.
4164 */
4165 RTR0MEMOBJ hMapObj;
4166 int rc = RTR0MemObjMapUser(&hMapObj, pChunk->hMemObj, (RTR3PTR)-1, 0, RTMEM_PROT_READ | RTMEM_PROT_WRITE, NIL_RTR0PROCESS);
4167 if (RT_SUCCESS(rc))
4168 {
4169 /* reallocate the array? assumes few users per chunk (usually one). */
4170 unsigned iMapping = pChunk->cMappingsX;
4171 if ( iMapping <= 3
4172 || (iMapping & 3) == 0)
4173 {
4174 unsigned cNewSize = iMapping <= 3
4175 ? iMapping + 1
4176 : iMapping + 4;
4177 Assert(cNewSize < 4 || RT_ALIGN_32(cNewSize, 4) == cNewSize);
4178 if (RT_UNLIKELY(cNewSize > UINT16_MAX))
4179 {
4180 rc = RTR0MemObjFree(hMapObj, false /* fFreeMappings (NA) */); AssertRC(rc);
4181 return VERR_GMM_TOO_MANY_CHUNK_MAPPINGS;
4182 }
4183
4184 void *pvMappings = RTMemRealloc(pChunk->paMappingsX, cNewSize * sizeof(pChunk->paMappingsX[0]));
4185 if (RT_UNLIKELY(!pvMappings))
4186 {
4187 rc = RTR0MemObjFree(hMapObj, false /* fFreeMappings (NA) */); AssertRC(rc);
4188 return VERR_NO_MEMORY;
4189 }
4190 pChunk->paMappingsX = (PGMMCHUNKMAP)pvMappings;
4191 }
4192
4193 /* insert new entry */
4194 pChunk->paMappingsX[iMapping].hMapObj = hMapObj;
4195 pChunk->paMappingsX[iMapping].pGVM = pGVM;
4196 Assert(pChunk->cMappingsX == iMapping);
4197 pChunk->cMappingsX = iMapping + 1;
4198
4199 *ppvR3 = RTR0MemObjAddressR3(hMapObj);
4200 }
4201
4202 return rc;
4203}
4204
4205
4206/**
4207 * Maps a chunk into the user address space of the current process.
4208 *
4209 * @returns VBox status code.
4210 * @param pGMM Pointer to the GMM instance data.
4211 * @param pGVM Pointer to the Global VM structure.
4212 * @param pChunk Pointer to the chunk to be mapped.
4213 * @param fRelaxedSem Whether we can release the semaphore while doing the
4214 * mapping (@c true) or not.
4215 * @param ppvR3 Where to store the ring-3 address of the mapping.
4216 * In the VERR_GMM_CHUNK_ALREADY_MAPPED case, this will
4217 * contain the address of the existing mapping.
4218 */
4219static int gmmR0MapChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem, PRTR3PTR ppvR3)
4220{
4221 /*
4222 * Take the chunk lock and leave the giant GMM lock when possible, then
4223 * call the worker function.
4224 */
4225 GMMR0CHUNKMTXSTATE MtxState;
4226 int rc = gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk,
4227 fRelaxedSem ? GMMR0CHUNK_MTX_RETAKE_GIANT : GMMR0CHUNK_MTX_KEEP_GIANT);
4228 if (RT_SUCCESS(rc))
4229 {
4230 rc = gmmR0MapChunkLocked(pGMM, pGVM, pChunk, ppvR3);
4231 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4232 }
4233
4234 return rc;
4235}
4236
4237
4238
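/*
 * The reallocation policy in gmmR0MapChunkLocked grows the mapping array one
 * entry at a time for the first few users and in groups of four afterwards,
 * matching the "few users per chunk (usually one)" assumption noted in the
 * code. Illustrative capacity progression as mappings are added (sketch):
 * @code
 *      iMapping before the insert:  0  1  2  3  4  8  12 ...
 *      cNewSize allocated:          1  2  3  4  8  12 16 ...
 * @endcode
 * i.e. a reallocation only happens when iMapping <= 3 or iMapping is a
 * multiple of four; in between, the previously allocated slack is used.
 */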
4239#if defined(VBOX_WITH_PAGE_SHARING) || (defined(VBOX_STRICT) && HC_ARCH_BITS == 64)
4240/**
4241 * Check if a chunk is mapped into the specified VM
4242 *
4243 * @returns mapped yes/no
4244 * @param pGMM Pointer to the GMM instance.
4245 * @param pGVM Pointer to the Global VM structure.
4246 * @param pChunk Pointer to the chunk to be mapped.
4247 * @param ppvR3 Where to store the ring-3 address of the mapping.
4248 */
4249static bool gmmR0IsChunkMapped(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, PRTR3PTR ppvR3)
4250{
4251 GMMR0CHUNKMTXSTATE MtxState;
4252 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
4253 for (uint32_t i = 0; i < pChunk->cMappingsX; i++)
4254 {
4255 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
4256 if (pChunk->paMappingsX[i].pGVM == pGVM)
4257 {
4258 *ppvR3 = RTR0MemObjAddressR3(pChunk->paMappingsX[i].hMapObj);
4259 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4260 return true;
4261 }
4262 }
4263 *ppvR3 = NULL;
4264 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4265 return false;
4266}
4267#endif /* VBOX_WITH_PAGE_SHARING || (VBOX_STRICT && 64-BIT) */
4268
4269
4270/**
4271 * Map a chunk and/or unmap another chunk.
4272 *
4273 * The mapping and unmapping applies to the current process.
4274 *
4275 * This API does two things because it saves a kernel call per mapping
4276 * when the ring-3 mapping cache is full.
4277 *
4278 * @returns VBox status code.
4279 * @param pGVM The global (ring-0) VM structure.
4280 * @param idChunkMap The chunk to map. NIL_GMM_CHUNKID if nothing to map.
4281 * @param idChunkUnmap The chunk to unmap. NIL_GMM_CHUNKID if nothing to unmap.
4282 * @param ppvR3 Where to store the address of the mapped chunk. NULL is ok if nothing to map.
4283 * @thread EMT ???
4284 */
4285GMMR0DECL(int) GMMR0MapUnmapChunk(PGVM pGVM, uint32_t idChunkMap, uint32_t idChunkUnmap, PRTR3PTR ppvR3)
4286{
4287 LogFlow(("GMMR0MapUnmapChunk: pGVM=%p idChunkMap=%#x idChunkUnmap=%#x ppvR3=%p\n",
4288 pGVM, idChunkMap, idChunkUnmap, ppvR3));
4289
4290 /*
4291 * Validate input and get the basics.
4292 */
4293 PGMM pGMM;
4294 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4295 int rc = GVMMR0ValidateGVM(pGVM);
4296 if (RT_FAILURE(rc))
4297 return rc;
4298
4299 AssertCompile(NIL_GMM_CHUNKID == 0);
4300 AssertMsgReturn(idChunkMap <= GMM_CHUNKID_LAST, ("%#x\n", idChunkMap), VERR_INVALID_PARAMETER);
4301 AssertMsgReturn(idChunkUnmap <= GMM_CHUNKID_LAST, ("%#x\n", idChunkUnmap), VERR_INVALID_PARAMETER);
4302
4303 if ( idChunkMap == NIL_GMM_CHUNKID
4304 && idChunkUnmap == NIL_GMM_CHUNKID)
4305 return VERR_INVALID_PARAMETER;
4306
4307 if (idChunkMap != NIL_GMM_CHUNKID)
4308 {
4309 AssertPtrReturn(ppvR3, VERR_INVALID_POINTER);
4310 *ppvR3 = NIL_RTR3PTR;
4311 }
4312
4313 /*
4314 * Take the semaphore and do the work.
4315 *
4316 * The unmapping is done last since it's easier to undo a mapping than
4317 * undoing an unmapping. The ring-3 mapping cache cannot be so big
4318 * that it pushes the user virtual address space to within a chunk of
4319 * its limits, so no problem here.
4320 */
4321 gmmR0MutexAcquire(pGMM);
4322 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4323 {
4324 PGMMCHUNK pMap = NULL;
4325 if (idChunkMap != NIL_GMM_CHUNKID)
4326 {
4327 pMap = gmmR0GetChunk(pGMM, idChunkMap);
4328 if (RT_LIKELY(pMap))
4329 rc = gmmR0MapChunk(pGMM, pGVM, pMap, true /*fRelaxedSem*/, ppvR3);
4330 else
4331 {
4332 Log(("GMMR0MapUnmapChunk: idChunkMap=%#x\n", idChunkMap));
4333 rc = VERR_GMM_CHUNK_NOT_FOUND;
4334 }
4335 }
4336/** @todo split this operation, the bail out might (theoretically) not be
4337 * entirely safe. */
4338
4339 if ( idChunkUnmap != NIL_GMM_CHUNKID
4340 && RT_SUCCESS(rc))
4341 {
4342 PGMMCHUNK pUnmap = gmmR0GetChunk(pGMM, idChunkUnmap);
4343 if (RT_LIKELY(pUnmap))
4344 rc = gmmR0UnmapChunk(pGMM, pGVM, pUnmap, true /*fRelaxedSem*/);
4345 else
4346 {
4347 Log(("GMMR0MapUnmapChunk: idChunkUnmap=%#x\n", idChunkUnmap));
4348 rc = VERR_GMM_CHUNK_NOT_FOUND;
4349 }
4350
4351 if (RT_FAILURE(rc) && pMap)
4352 gmmR0UnmapChunk(pGMM, pGVM, pMap, false /*fRelaxedSem*/);
4353 }
4354
4355 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4356 }
4357 else
4358 rc = VERR_GMM_IS_NOT_SANE;
4359 gmmR0MutexRelease(pGMM);
4360
4361 LogFlow(("GMMR0MapUnmapChunk: returns %Rrc\n", rc));
4362 return rc;
4363}
4364
4365
4366/**
4367 * VMMR0 request wrapper for GMMR0MapUnmapChunk.
4368 *
4369 * @returns see GMMR0MapUnmapChunk.
4370 * @param pGVM The global (ring-0) VM structure.
4371 * @param pReq Pointer to the request packet.
4372 */
4373GMMR0DECL(int) GMMR0MapUnmapChunkReq(PGVM pGVM, PGMMMAPUNMAPCHUNKREQ pReq)
4374{
4375 /*
4376 * Validate input and pass it on.
4377 */
4378 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4379 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4380
4381 return GMMR0MapUnmapChunk(pGVM, pReq->idChunkMap, pReq->idChunkUnmap, &pReq->pvR3);
4382}
4383
4384
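/*
 * Minimal sketch of a request for the wrapper above, mapping one chunk and
 * unmapping another in a single call; idChunkToMap/idChunkToEvict are
 * hypothetical IDs a ring-3 mapping cache would supply, and either side can be
 * skipped by passing NIL_GMM_CHUNKID:
 * @code
 *      GMMMAPUNMAPCHUNKREQ Req;
 *      Req.Hdr.cbReq    = sizeof(Req);
 *      Req.idChunkMap   = idChunkToMap;    // hypothetical
 *      Req.idChunkUnmap = idChunkToEvict;  // hypothetical, or NIL_GMM_CHUNKID
 *      Req.pvR3         = NIL_RTR3PTR;
 *      int rc = GMMR0MapUnmapChunkReq(pGVM, &Req);
 *      // On success Req.pvR3 holds the ring-3 address of the newly mapped chunk.
 * @endcode
 */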
4385/**
4386 * Legacy mode API for supplying pages.
4387 *
4388 * The specified user address points to an allocation chunk sized block that
4389 * will be locked down and used by the GMM when the GM asks for pages.
4390 *
4391 * @returns VBox status code.
4392 * @param pGVM The global (ring-0) VM structure.
4393 * @param idCpu The VCPU id.
4394 * @param pvR3 Pointer to the chunk size memory block to lock down.
4395 */
4396GMMR0DECL(int) GMMR0SeedChunk(PGVM pGVM, VMCPUID idCpu, RTR3PTR pvR3)
4397{
4398#ifdef GMM_WITH_LEGACY_MODE
4399 /*
4400 * Validate input and get the basics.
4401 */
4402 PGMM pGMM;
4403 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4404 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
4405 if (RT_FAILURE(rc))
4406 return rc;
4407
4408 AssertPtrReturn(pvR3, VERR_INVALID_POINTER);
4409 AssertReturn(!(PAGE_OFFSET_MASK & pvR3), VERR_INVALID_POINTER);
4410
4411 if (!pGMM->fLegacyAllocationMode)
4412 {
4413 Log(("GMMR0SeedChunk: not in legacy allocation mode!\n"));
4414 return VERR_NOT_SUPPORTED;
4415 }
4416
4417 /*
4418 * Lock the memory and add it as new chunk with our hGVM.
4419 * (The GMM locking is done inside gmmR0RegisterChunk.)
4420 */
4421 RTR0MEMOBJ hMemObj;
4422 rc = RTR0MemObjLockUser(&hMemObj, pvR3, GMM_CHUNK_SIZE, RTMEM_PROT_READ | RTMEM_PROT_WRITE, NIL_RTR0PROCESS);
4423 if (RT_SUCCESS(rc))
4424 {
4425 rc = gmmR0RegisterChunk(pGMM, &pGVM->gmm.s.Private, hMemObj, pGVM->hSelf, GMM_CHUNK_FLAGS_SEEDED, NULL);
4426 if (RT_SUCCESS(rc))
4427 gmmR0MutexRelease(pGMM);
4428 else
4429 RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
4430 }
4431
4432 LogFlow(("GMMR0SeedChunk: rc=%d (pvR3=%p)\n", rc, pvR3));
4433 return rc;
4434#else
4435 RT_NOREF(pGVM, idCpu, pvR3);
4436 return VERR_NOT_SUPPORTED;
4437#endif
4438}
4439
4440#if defined(VBOX_WITH_RAM_IN_KERNEL) && !defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)
4441
4442/**
4443 * Gets the ring-0 virtual address for the given page.
4444 *
4445 * This is used by PGM when IEM and such wants to access guest RAM from ring-0.
4446 * One of the ASSUMPTIONS here is that the @a idPage is used by the VM and the
4447 * corresponding chunk will remain valid beyond the call (at least till the EMT
4448 * returns to ring-3).
4449 *
4450 * @returns VBox status code.
4451 * @param pGVM Pointer to the kernel-only VM instance data.
4452 * @param idPage The page ID.
4453 * @param ppv Where to store the address.
4454 * @thread EMT
4455 */
4456GMMR0DECL(int) GMMR0PageIdToVirt(PGVM pGVM, uint32_t idPage, void **ppv)
4457{
4458 *ppv = NULL;
4459 PGMM pGMM;
4460 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4461
4462 RTSpinlockAcquire(pGMM->hSpinLockTree);
4463
4464 int rc;
4465 PGMMCHUNK pChunk = gmmR0GetChunkLocked(pGMM, idPage >> GMM_CHUNKID_SHIFT);
4466 if (RT_LIKELY(pChunk))
4467 {
4468 const GMMPAGE *pPage = &pChunk->aPages[idPage & GMM_PAGEID_IDX_MASK];
4469 if (RT_LIKELY( ( GMM_PAGE_IS_PRIVATE(pPage)
4470 && pPage->Private.hGVM == pGVM->hSelf)
4471 || GMM_PAGE_IS_SHARED(pPage)))
4472 {
4473 AssertPtr(pChunk->pbMapping);
4474 *ppv = &pChunk->pbMapping[(idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT];
4475 rc = VINF_SUCCESS;
4476 }
4477 else
4478 rc = VERR_GMM_NOT_PAGE_OWNER;
4479 }
4480 else
4481 rc = VERR_GMM_PAGE_NOT_FOUND;
4482
4483 RTSpinlockRelease(pGMM->hSpinLockTree);
4484 return rc;
4485}
4486
4487#endif
4488
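/*
 * Simple usage sketch for GMMR0PageIdToVirt above; the copy target pvDst is
 * hypothetical, and the returned pointer is only safe to use while the owning
 * chunk stays valid, i.e. until the EMT returns to ring-3 as noted in the
 * function docs:
 * @code
 *      void *pvPage;
 *      int rc = GMMR0PageIdToVirt(pGVM, idPage, &pvPage);
 *      if (RT_SUCCESS(rc))
 *          memcpy(pvDst, pvPage, PAGE_SIZE);  // hypothetical ring-0 read of the guest page
 * @endcode
 */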
4489#ifdef VBOX_WITH_PAGE_SHARING
4490
4491# ifdef VBOX_STRICT
4492/**
4493 * For checksumming shared pages in strict builds.
4494 *
4495 * The purpose is making sure that a page doesn't change.
4496 *
4497 * @returns Checksum, 0 on failure.
4498 * @param pGMM The GMM instance data.
4499 * @param pGVM Pointer to the kernel-only VM instance data.
4500 * @param idPage The page ID.
4501 */
4502static uint32_t gmmR0StrictPageChecksum(PGMM pGMM, PGVM pGVM, uint32_t idPage)
4503{
4504 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
4505 AssertMsgReturn(pChunk, ("idPage=%#x\n", idPage), 0);
4506
4507 uint8_t *pbChunk;
4508 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
4509 return 0;
4510 uint8_t const *pbPage = pbChunk + ((idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
4511
4512 return RTCrc32(pbPage, PAGE_SIZE);
4513}
4514# endif /* VBOX_STRICT */
4515
4516
4517/**
4518 * Calculates the module hash value.
4519 *
4520 * @returns Hash value.
4521 * @param pszModuleName The module name.
4522 * @param pszVersion The module version string.
4523 */
4524static uint32_t gmmR0ShModCalcHash(const char *pszModuleName, const char *pszVersion)
4525{
4526 return RTStrHash1ExN(3, pszModuleName, RTSTR_MAX, "::", (size_t)2, pszVersion, RTSTR_MAX);
4527}
4528
4529
4530/**
4531 * Finds a global module.
4532 *
4533 * @returns Pointer to the global module on success, NULL if not found.
4534 * @param pGMM The GMM instance data.
4535 * @param uHash The hash as calculated by gmmR0ShModCalcHash.
4536 * @param cbModule The module size.
4537 * @param enmGuestOS The guest OS type.
4538 * @param cRegions The number of regions.
4539 * @param pszModuleName The module name.
4540 * @param pszVersion The module version.
4541 * @param paRegions The region descriptions.
4542 */
4543static PGMMSHAREDMODULE gmmR0ShModFindGlobal(PGMM pGMM, uint32_t uHash, uint32_t cbModule, VBOXOSFAMILY enmGuestOS,
4544 uint32_t cRegions, const char *pszModuleName, const char *pszVersion,
4545 struct VMMDEVSHAREDREGIONDESC const *paRegions)
4546{
4547 for (PGMMSHAREDMODULE pGblMod = (PGMMSHAREDMODULE)RTAvllU32Get(&pGMM->pGlobalSharedModuleTree, uHash);
4548 pGblMod;
4549 pGblMod = (PGMMSHAREDMODULE)pGblMod->Core.pList)
4550 {
4551 if (pGblMod->cbModule != cbModule)
4552 continue;
4553 if (pGblMod->enmGuestOS != enmGuestOS)
4554 continue;
4555 if (pGblMod->cRegions != cRegions)
4556 continue;
4557 if (strcmp(pGblMod->szName, pszModuleName))
4558 continue;
4559 if (strcmp(pGblMod->szVersion, pszVersion))
4560 continue;
4561
4562 uint32_t i;
4563 for (i = 0; i < cRegions; i++)
4564 {
4565 uint32_t off = paRegions[i].GCRegionAddr & PAGE_OFFSET_MASK;
4566 if (pGblMod->aRegions[i].off != off)
4567 break;
4568
4569 uint32_t cb = RT_ALIGN_32(paRegions[i].cbRegion + off, PAGE_SIZE);
4570 if (pGblMod->aRegions[i].cb != cb)
4571 break;
4572 }
4573
4574 if (i == cRegions)
4575 return pGblMod;
4576 }
4577
4578 return NULL;
4579}
4580
4581
4582/**
4583 * Creates a new global module.
4584 *
4585 * @returns VBox status code.
4586 * @param pGMM The GMM instance data.
4587 * @param uHash The hash as calculated by gmmR0ShModCalcHash.
4588 * @param cbModule The module size.
4589 * @param enmGuestOS The guest OS type.
4590 * @param cRegions The number of regions.
4591 * @param pszModuleName The module name.
4592 * @param pszVersion The module version.
4593 * @param paRegions The region descriptions.
4594 * @param ppGblMod Where to return the new module on success.
4595 */
4596static int gmmR0ShModNewGlobal(PGMM pGMM, uint32_t uHash, uint32_t cbModule, VBOXOSFAMILY enmGuestOS,
4597 uint32_t cRegions, const char *pszModuleName, const char *pszVersion,
4598 struct VMMDEVSHAREDREGIONDESC const *paRegions, PGMMSHAREDMODULE *ppGblMod)
4599{
4600 Log(("gmmR0ShModNewGlobal: %s %s size %#x os %u rgn %u\n", pszModuleName, pszVersion, cbModule, enmGuestOS, cRegions));
4601 if (pGMM->cShareableModules >= GMM_MAX_SHARED_GLOBAL_MODULES)
4602 {
4603 Log(("gmmR0ShModNewGlobal: Too many modules\n"));
4604 return VERR_GMM_TOO_MANY_GLOBAL_MODULES;
4605 }
4606
4607 PGMMSHAREDMODULE pGblMod = (PGMMSHAREDMODULE)RTMemAllocZ(RT_UOFFSETOF_DYN(GMMSHAREDMODULE, aRegions[cRegions]));
4608 if (!pGblMod)
4609 {
4610 Log(("gmmR0ShModNewGlobal: No memory\n"));
4611 return VERR_NO_MEMORY;
4612 }
4613
4614 pGblMod->Core.Key = uHash;
4615 pGblMod->cbModule = cbModule;
4616 pGblMod->cRegions = cRegions;
4617 pGblMod->cUsers = 1;
4618 pGblMod->enmGuestOS = enmGuestOS;
4619 strcpy(pGblMod->szName, pszModuleName);
4620 strcpy(pGblMod->szVersion, pszVersion);
4621
4622 for (uint32_t i = 0; i < cRegions; i++)
4623 {
4624 Log(("gmmR0ShModNewGlobal: rgn[%u]=%RGvLB%#x\n", i, paRegions[i].GCRegionAddr, paRegions[i].cbRegion));
4625 pGblMod->aRegions[i].off = paRegions[i].GCRegionAddr & PAGE_OFFSET_MASK;
4626 pGblMod->aRegions[i].cb = paRegions[i].cbRegion + pGblMod->aRegions[i].off;
4627 pGblMod->aRegions[i].cb = RT_ALIGN_32(pGblMod->aRegions[i].cb, PAGE_SIZE);
4628 pGblMod->aRegions[i].paidPages = NULL; /* allocated when needed. */
4629 }
4630
4631 bool fInsert = RTAvllU32Insert(&pGMM->pGlobalSharedModuleTree, &pGblMod->Core);
4632 Assert(fInsert); NOREF(fInsert);
4633 pGMM->cShareableModules++;
4634
4635 *ppGblMod = pGblMod;
4636 return VINF_SUCCESS;
4637}
4638
4639
4640/**
4641 * Deletes a global module which is no longer referenced by anyone.
4642 *
4643 * @param pGMM The GMM instance data.
4644 * @param pGblMod The module to delete.
4645 */
4646static void gmmR0ShModDeleteGlobal(PGMM pGMM, PGMMSHAREDMODULE pGblMod)
4647{
4648 Assert(pGblMod->cUsers == 0);
4649 Assert(pGMM->cShareableModules > 0 && pGMM->cShareableModules <= GMM_MAX_SHARED_GLOBAL_MODULES);
4650
4651 void *pvTest = RTAvllU32RemoveNode(&pGMM->pGlobalSharedModuleTree, &pGblMod->Core);
4652 Assert(pvTest == pGblMod); NOREF(pvTest);
4653 pGMM->cShareableModules--;
4654
4655 uint32_t i = pGblMod->cRegions;
4656 while (i-- > 0)
4657 {
4658 if (pGblMod->aRegions[i].paidPages)
4659 {
4660 /* We don't do anything to the pages as they are handled by the
4661 copy-on-write mechanism in PGM. */
4662 RTMemFree(pGblMod->aRegions[i].paidPages);
4663 pGblMod->aRegions[i].paidPages = NULL;
4664 }
4665 }
4666 RTMemFree(pGblMod);
4667}
4668
4669
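/**
 * Creates the per-VM tracking record for a shared module.
 *
 * The caller is responsible for linking it to a global module afterwards.
 *
 * @returns VBox status code.
 * @param pGVM Pointer to the GVM instance.
 * @param GCBaseAddr The module base address, used as the per-VM key.
 * @param cRegions The number of regions.
 * @param paRegions The region descriptions.
 * @param ppRecVM Where to return the new per-VM record on success.
 */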
4670static int gmmR0ShModNewPerVM(PGVM pGVM, RTGCPTR GCBaseAddr, uint32_t cRegions, const VMMDEVSHAREDREGIONDESC *paRegions,
4671 PGMMSHAREDMODULEPERVM *ppRecVM)
4672{
4673 if (pGVM->gmm.s.Stats.cShareableModules >= GMM_MAX_SHARED_PER_VM_MODULES)
4674 return VERR_GMM_TOO_MANY_PER_VM_MODULES;
4675
4676 PGMMSHAREDMODULEPERVM pRecVM;
4677 pRecVM = (PGMMSHAREDMODULEPERVM)RTMemAllocZ(RT_UOFFSETOF_DYN(GMMSHAREDMODULEPERVM, aRegionsGCPtrs[cRegions]));
4678 if (!pRecVM)
4679 return VERR_NO_MEMORY;
4680
4681 pRecVM->Core.Key = GCBaseAddr;
4682 for (uint32_t i = 0; i < cRegions; i++)
4683 pRecVM->aRegionsGCPtrs[i] = paRegions[i].GCRegionAddr;
4684
4685 bool fInsert = RTAvlGCPtrInsert(&pGVM->gmm.s.pSharedModuleTree, &pRecVM->Core);
4686 Assert(fInsert); NOREF(fInsert);
4687 pGVM->gmm.s.Stats.cShareableModules++;
4688
4689 *ppRecVM = pRecVM;
4690 return VINF_SUCCESS;
4691}
4692
4693
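/**
 * Deletes a per-VM shared module record and releases the global module it
 * references, deleting that too when the last reference goes away.
 *
 * @param pGMM The GMM instance data.
 * @param pGVM Pointer to the GVM instance.
 * @param pRecVM The per-VM record to delete.
 * @param fRemove Whether to remove the record from the per-VM tree first.
 */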
4694static void gmmR0ShModDeletePerVM(PGMM pGMM, PGVM pGVM, PGMMSHAREDMODULEPERVM pRecVM, bool fRemove)
4695{
4696 /*
4697 * Free the per-VM module.
4698 */
4699 PGMMSHAREDMODULE pGblMod = pRecVM->pGlobalModule;
4700 pRecVM->pGlobalModule = NULL;
4701
4702 if (fRemove)
4703 {
4704 void *pvTest = RTAvlGCPtrRemove(&pGVM->gmm.s.pSharedModuleTree, pRecVM->Core.Key);
4705 Assert(pvTest == &pRecVM->Core); NOREF(pvTest);
4706 }
4707
4708 RTMemFree(pRecVM);
4709
4710 /*
4711 * Release the global module.
4712 * (In the registration bailout case, it might not be.)
4713 */
4714 if (pGblMod)
4715 {
4716 Assert(pGblMod->cUsers > 0);
4717 pGblMod->cUsers--;
4718 if (pGblMod->cUsers == 0)
4719 gmmR0ShModDeleteGlobal(pGMM, pGblMod);
4720 }
4721}
4722
4723#endif /* VBOX_WITH_PAGE_SHARING */
4724
4725/**
4726 * Registers a new shared module for the VM.
4727 *
4728 * @returns VBox status code.
4729 * @param pGVM The global (ring-0) VM structure.
4730 * @param idCpu The VCPU id.
4731 * @param enmGuestOS The guest OS type.
4732 * @param pszModuleName The module name.
4733 * @param pszVersion The module version.
4734 * @param GCPtrModBase The module base address.
4735 * @param cbModule The module size.
4736 * @param cRegions The number of shared region descriptors.
4737 * @param paRegions Pointer to an array of shared region(s).
4738 * @thread EMT(idCpu)
4739 */
4740GMMR0DECL(int) GMMR0RegisterSharedModule(PGVM pGVM, VMCPUID idCpu, VBOXOSFAMILY enmGuestOS, char *pszModuleName,
4741 char *pszVersion, RTGCPTR GCPtrModBase, uint32_t cbModule,
4742 uint32_t cRegions, struct VMMDEVSHAREDREGIONDESC const *paRegions)
4743{
4744#ifdef VBOX_WITH_PAGE_SHARING
4745 /*
4746 * Validate input and get the basics.
4747 *
4748 * Note! Turns out the module size does not necessarily match the size of the
4749 * regions. (iTunes on XP)
4750 */
4751 PGMM pGMM;
4752 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4753 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
4754 if (RT_FAILURE(rc))
4755 return rc;
4756
4757 if (RT_UNLIKELY(cRegions > VMMDEVSHAREDREGIONDESC_MAX))
4758 return VERR_GMM_TOO_MANY_REGIONS;
4759
4760 if (RT_UNLIKELY(cbModule == 0 || cbModule > _1G))
4761 return VERR_GMM_BAD_SHARED_MODULE_SIZE;
4762
4763 uint32_t cbTotal = 0;
4764 for (uint32_t i = 0; i < cRegions; i++)
4765 {
4766 if (RT_UNLIKELY(paRegions[i].cbRegion == 0 || paRegions[i].cbRegion > _1G))
4767 return VERR_GMM_SHARED_MODULE_BAD_REGIONS_SIZE;
4768
4769 cbTotal += paRegions[i].cbRegion;
4770 if (RT_UNLIKELY(cbTotal > _1G))
4771 return VERR_GMM_SHARED_MODULE_BAD_REGIONS_SIZE;
4772 }
4773
4774 AssertPtrReturn(pszModuleName, VERR_INVALID_POINTER);
4775 if (RT_UNLIKELY(!memchr(pszModuleName, '\0', GMM_SHARED_MODULE_MAX_NAME_STRING)))
4776 return VERR_GMM_MODULE_NAME_TOO_LONG;
4777
4778 AssertPtrReturn(pszVersion, VERR_INVALID_POINTER);
4779 if (RT_UNLIKELY(!memchr(pszVersion, '\0', GMM_SHARED_MODULE_MAX_VERSION_STRING)))
4780 return VERR_GMM_MODULE_NAME_TOO_LONG;
4781
4782 uint32_t const uHash = gmmR0ShModCalcHash(pszModuleName, pszVersion);
4783 Log(("GMMR0RegisterSharedModule %s %s base %RGv size %x hash %x\n", pszModuleName, pszVersion, GCPtrModBase, cbModule, uHash));
4784
4785 /*
4786 * Take the semaphore and do some more validations.
4787 */
4788 gmmR0MutexAcquire(pGMM);
4789 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4790 {
4791 /*
4792 * Check if this module is already locally registered and register
4793 * it if it isn't. The base address is a unique module identifier
4794 * locally.
4795 */
4796 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)RTAvlGCPtrGet(&pGVM->gmm.s.pSharedModuleTree, GCPtrModBase);
4797 bool fNewModule = pRecVM == NULL;
4798 if (fNewModule)
4799 {
4800 rc = gmmR0ShModNewPerVM(pGVM, GCPtrModBase, cRegions, paRegions, &pRecVM);
4801 if (RT_SUCCESS(rc))
4802 {
4803 /*
4804 * Find a matching global module, register a new one if needed.
4805 */
4806 PGMMSHAREDMODULE pGblMod = gmmR0ShModFindGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4807 pszModuleName, pszVersion, paRegions);
4808 if (!pGblMod)
4809 {
4810 Assert(fNewModule);
4811 rc = gmmR0ShModNewGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4812 pszModuleName, pszVersion, paRegions, &pGblMod);
4813 if (RT_SUCCESS(rc))
4814 {
4815 pRecVM->pGlobalModule = pGblMod; /* (One reference returned by gmmR0ShModNewGlobal.) */
4816 Log(("GMMR0RegisterSharedModule: new module %s %s\n", pszModuleName, pszVersion));
4817 }
4818 else
4819 gmmR0ShModDeletePerVM(pGMM, pGVM, pRecVM, true /*fRemove*/);
4820 }
4821 else
4822 {
4823 Assert(pGblMod->cUsers > 0 && pGblMod->cUsers < UINT32_MAX / 2);
4824 pGblMod->cUsers++;
4825 pRecVM->pGlobalModule = pGblMod;
4826
4827 Log(("GMMR0RegisterSharedModule: new per vm module %s %s, gbl users %d\n", pszModuleName, pszVersion, pGblMod->cUsers));
4828 }
4829 }
4830 }
4831 else
4832 {
4833 /*
4834 * Attempt to re-register an existing module.
4835 */
4836 PGMMSHAREDMODULE pGblMod = gmmR0ShModFindGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4837 pszModuleName, pszVersion, paRegions);
4838 if (pRecVM->pGlobalModule == pGblMod)
4839 {
4840 Log(("GMMR0RegisterSharedModule: already registered %s %s, gbl users %d\n", pszModuleName, pszVersion, pGblMod->cUsers));
4841 rc = VINF_GMM_SHARED_MODULE_ALREADY_REGISTERED;
4842 }
4843 else
4844 {
4845 /** @todo may have to unregister+register when this happens in case it's caused
4846 * by VBoxService crashing and being restarted... */
4847 Log(("GMMR0RegisterSharedModule: Address clash!\n"
4848 " incoming at %RGvLB%#x %s %s rgns %u\n"
4849 " existing at %RGvLB%#x %s %s rgns %u\n",
4850 GCPtrModBase, cbModule, pszModuleName, pszVersion, cRegions,
4851 pRecVM->Core.Key, pRecVM->pGlobalModule->cbModule, pRecVM->pGlobalModule->szName,
4852 pRecVM->pGlobalModule->szVersion, pRecVM->pGlobalModule->cRegions));
4853 rc = VERR_GMM_SHARED_MODULE_ADDRESS_CLASH;
4854 }
4855 }
4856 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4857 }
4858 else
4859 rc = VERR_GMM_IS_NOT_SANE;
4860
4861 gmmR0MutexRelease(pGMM);
4862 return rc;
4863#else
4864
4865 NOREF(pGVM); NOREF(idCpu); NOREF(enmGuestOS); NOREF(pszModuleName); NOREF(pszVersion);
4866 NOREF(GCPtrModBase); NOREF(cbModule); NOREF(cRegions); NOREF(paRegions);
4867 return VERR_NOT_IMPLEMENTED;
4868#endif
4869}
4870
4871
4872/**
4873 * VMMR0 request wrapper for GMMR0RegisterSharedModule.
4874 *
4875 * @returns see GMMR0RegisterSharedModule.
4876 * @param pGVM The global (ring-0) VM structure.
4877 * @param idCpu The VCPU id.
4878 * @param pReq Pointer to the request packet.
4879 */
4880GMMR0DECL(int) GMMR0RegisterSharedModuleReq(PGVM pGVM, VMCPUID idCpu, PGMMREGISTERSHAREDMODULEREQ pReq)
4881{
4882 /*
4883 * Validate input and pass it on.
4884 */
4885 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4886 AssertMsgReturn( pReq->Hdr.cbReq >= sizeof(*pReq)
4887 && pReq->Hdr.cbReq == RT_UOFFSETOF_DYN(GMMREGISTERSHAREDMODULEREQ, aRegions[pReq->cRegions]),
4888 ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4889
4890 /* Pass back return code in the request packet to preserve informational codes. (VMMR3CallR0 chokes on them) */
4891 pReq->rc = GMMR0RegisterSharedModule(pGVM, idCpu, pReq->enmGuestOS, pReq->szName, pReq->szVersion,
4892 pReq->GCBaseAddr, pReq->cbModule, pReq->cRegions, pReq->aRegions);
4893 return VINF_SUCCESS;
4894}
4895
4896
4897/**
4898 * Unregisters a shared module for the VM
4899 *
4900 * @returns VBox status code.
4901 * @param pGVM The global (ring-0) VM structure.
4902 * @param idCpu The VCPU id.
4903 * @param pszModuleName The module name.
4904 * @param pszVersion The module version.
4905 * @param GCPtrModBase The module base address.
4906 * @param cbModule The module size.
4907 */
4908GMMR0DECL(int) GMMR0UnregisterSharedModule(PGVM pGVM, VMCPUID idCpu, char *pszModuleName, char *pszVersion,
4909 RTGCPTR GCPtrModBase, uint32_t cbModule)
4910{
4911#ifdef VBOX_WITH_PAGE_SHARING
4912 /*
4913 * Validate input and get the basics.
4914 */
4915 PGMM pGMM;
4916 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4917 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
4918 if (RT_FAILURE(rc))
4919 return rc;
4920
4921 AssertPtrReturn(pszModuleName, VERR_INVALID_POINTER);
4922 AssertPtrReturn(pszVersion, VERR_INVALID_POINTER);
4923 if (RT_UNLIKELY(!memchr(pszModuleName, '\0', GMM_SHARED_MODULE_MAX_NAME_STRING)))
4924 return VERR_GMM_MODULE_NAME_TOO_LONG;
4925 if (RT_UNLIKELY(!memchr(pszVersion, '\0', GMM_SHARED_MODULE_MAX_VERSION_STRING)))
4926 return VERR_GMM_MODULE_NAME_TOO_LONG;
4927
4928 Log(("GMMR0UnregisterSharedModule %s %s base=%RGv size %x\n", pszModuleName, pszVersion, GCPtrModBase, cbModule));
4929
4930 /*
4931 * Take the semaphore and do some more validations.
4932 */
4933 gmmR0MutexAcquire(pGMM);
4934 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4935 {
4936 /*
4937 * Locate and remove the specified module.
4938 */
4939 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)RTAvlGCPtrGet(&pGVM->gmm.s.pSharedModuleTree, GCPtrModBase);
4940 if (pRecVM)
4941 {
4942 /** @todo Do we need to do more validations here, like that the
4943 * name + version + cbModule matches? */
4944 NOREF(cbModule);
4945 Assert(pRecVM->pGlobalModule);
4946 gmmR0ShModDeletePerVM(pGMM, pGVM, pRecVM, true /*fRemove*/);
4947 }
4948 else
4949 rc = VERR_GMM_SHARED_MODULE_NOT_FOUND;
4950
4951 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4952 }
4953 else
4954 rc = VERR_GMM_IS_NOT_SANE;
4955
4956 gmmR0MutexRelease(pGMM);
4957 return rc;
4958#else
4959
4960 NOREF(pGVM); NOREF(idCpu); NOREF(pszModuleName); NOREF(pszVersion); NOREF(GCPtrModBase); NOREF(cbModule);
4961 return VERR_NOT_IMPLEMENTED;
4962#endif
4963}
4964
4965
4966/**
4967 * VMMR0 request wrapper for GMMR0UnregisterSharedModule.
4968 *
4969 * @returns see GMMR0UnregisterSharedModule.
4970 * @param pGVM The global (ring-0) VM structure.
4971 * @param idCpu The VCPU id.
4972 * @param pReq Pointer to the request packet.
4973 */
4974GMMR0DECL(int) GMMR0UnregisterSharedModuleReq(PGVM pGVM, VMCPUID idCpu, PGMMUNREGISTERSHAREDMODULEREQ pReq)
4975{
4976 /*
4977 * Validate input and pass it on.
4978 */
4979 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4980 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4981
4982 return GMMR0UnregisterSharedModule(pGVM, idCpu, pReq->szName, pReq->szVersion, pReq->GCBaseAddr, pReq->cbModule);
4983}
4984
4985#ifdef VBOX_WITH_PAGE_SHARING
4986
4987/**
4988 * Increase the use count of a shared page, the page is known to exist and be valid and such.
4989 *
4990 * @param pGMM Pointer to the GMM instance.
4991 * @param pGVM Pointer to the GVM instance.
4992 * @param pPage The page structure.
4993 */
4994DECLINLINE(void) gmmR0UseSharedPage(PGMM pGMM, PGVM pGVM, PGMMPAGE pPage)
4995{
4996 Assert(pGMM->cSharedPages > 0);
4997 Assert(pGMM->cAllocatedPages > 0);
4998
4999 pGMM->cDuplicatePages++;
5000
5001 pPage->Shared.cRefs++;
5002 pGVM->gmm.s.Stats.cSharedPages++;
5003 pGVM->gmm.s.Stats.Allocated.cBasePages++;
5004}
5005
5006
5007/**
5008 * Converts a private page to a shared page. The page is known to exist and to be valid.
5009 *
5010 * @param pGMM Pointer to the GMM instance.
5011 * @param pGVM Pointer to the GVM instance.
5012 * @param HCPhys The host physical address.
5013 * @param idPage The page ID.
5014 * @param pPage The page structure.
5015 * @param pPageDesc The shared page descriptor.
5016 */
5017DECLINLINE(void) gmmR0ConvertToSharedPage(PGMM pGMM, PGVM pGVM, RTHCPHYS HCPhys, uint32_t idPage, PGMMPAGE pPage,
5018 PGMMSHAREDPAGEDESC pPageDesc)
5019{
5020 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
5021 Assert(pChunk);
5022 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
5023 Assert(GMM_PAGE_IS_PRIVATE(pPage));
5024
5025 pChunk->cPrivate--;
5026 pChunk->cShared++;
5027
5028 pGMM->cSharedPages++;
5029
5030 pGVM->gmm.s.Stats.cSharedPages++;
5031 pGVM->gmm.s.Stats.cPrivatePages--;
5032
5033 /* Modify the page structure. */
5034 pPage->Shared.pfn = (uint32_t)(uint64_t)(HCPhys >> PAGE_SHIFT);
5035 pPage->Shared.cRefs = 1;
5036#ifdef VBOX_STRICT
5037 pPageDesc->u32StrictChecksum = gmmR0StrictPageChecksum(pGMM, pGVM, idPage);
5038 pPage->Shared.u14Checksum = pPageDesc->u32StrictChecksum;
5039#else
5040 NOREF(pPageDesc);
5041 pPage->Shared.u14Checksum = 0;
5042#endif
5043 pPage->Shared.u2State = GMM_PAGE_STATE_SHARED;
5044}
5045
5046
5047static int gmmR0SharedModuleCheckPageFirstTime(PGMM pGMM, PGVM pGVM, PGMMSHAREDMODULE pModule,
5048 unsigned idxRegion, unsigned idxPage,
5049 PGMMSHAREDPAGEDESC pPageDesc, PGMMSHAREDREGIONDESC pGlobalRegion)
5050{
5051 NOREF(pModule);
5052
5053 /* Easy case: just change the internal page type. */
5054 PGMMPAGE pPage = gmmR0GetPage(pGMM, pPageDesc->idPage);
5055 AssertMsgReturn(pPage, ("idPage=%#x (GCPhys=%RGp HCPhys=%RHp idxRegion=%#x idxPage=%#x) #1\n",
5056 pPageDesc->idPage, pPageDesc->GCPhys, pPageDesc->HCPhys, idxRegion, idxPage),
5057 VERR_PGM_PHYS_INVALID_PAGE_ID);
5058 NOREF(idxRegion);
5059
5060 AssertMsg(pPageDesc->GCPhys == (pPage->Private.pfn << 12), ("desc %RGp gmm %RGp\n", pPageDesc->GCPhys, (pPage->Private.pfn << 12)));
5061
5062 gmmR0ConvertToSharedPage(pGMM, pGVM, pPageDesc->HCPhys, pPageDesc->idPage, pPage, pPageDesc);
5063
5064 /* Keep track of these references. */
5065 pGlobalRegion->paidPages[idxPage] = pPageDesc->idPage;
5066
5067 return VINF_SUCCESS;
5068}
5069
5070/**
5071 * Checks a page in the specified shared module region for changes.
5072 *
5073 * Performs the following tasks:
5074 * - If a shared page is new, then it changes the GMM page type to shared and
5075 * returns it in the pPageDesc descriptor.
5076 * - If a shared page already exists, then it checks if the VM page is
5077 * identical and, if so, frees the VM page and returns the shared page in
5078 * the pPageDesc descriptor.
5079 *
5080 * @remarks ASSUMES the caller has acquired the GMM semaphore!!
5081 *
5082 * @returns VBox status code.
5083 * @param pGVM Pointer to the GVM instance data.
5084 * @param pModule The module description.
5085 * @param idxRegion The region index.
5086 * @param idxPage The page index.
5087 * @param pPageDesc The page descriptor.
5088 */
5089GMMR0DECL(int) GMMR0SharedModuleCheckPage(PGVM pGVM, PGMMSHAREDMODULE pModule, uint32_t idxRegion, uint32_t idxPage,
5090 PGMMSHAREDPAGEDESC pPageDesc)
5091{
5092 int rc;
5093 PGMM pGMM;
5094 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5095 pPageDesc->u32StrictChecksum = 0;
5096
5097 AssertMsgReturn(idxRegion < pModule->cRegions,
5098 ("idxRegion=%#x cRegions=%#x %s %s\n", idxRegion, pModule->cRegions, pModule->szName, pModule->szVersion),
5099 VERR_INVALID_PARAMETER);
5100
5101 uint32_t const cPages = pModule->aRegions[idxRegion].cb >> PAGE_SHIFT;
5102 AssertMsgReturn(idxPage < cPages,
5103 ("idxRegion=%#x cRegions=%#x %s %s\n", idxRegion, pModule->cRegions, pModule->szName, pModule->szVersion),
5104 VERR_INVALID_PARAMETER);
5105
5106 LogFlow(("GMMR0SharedModuleCheckRange %s base %RGv region %d idxPage %d\n", pModule->szName, pModule->Core.Key, idxRegion, idxPage));
5107
5108 /*
5109 * First time; create a page descriptor array.
5110 */
5111 PGMMSHAREDREGIONDESC pGlobalRegion = &pModule->aRegions[idxRegion];
5112 if (!pGlobalRegion->paidPages)
5113 {
5114 Log(("Allocate page descriptor array for %d pages\n", cPages));
5115 pGlobalRegion->paidPages = (uint32_t *)RTMemAlloc(cPages * sizeof(pGlobalRegion->paidPages[0]));
5116 AssertReturn(pGlobalRegion->paidPages, VERR_NO_MEMORY);
5117
5118 /* Invalidate all descriptors. */
5119 uint32_t i = cPages;
5120 while (i-- > 0)
5121 pGlobalRegion->paidPages[i] = NIL_GMM_PAGEID;
5122 }
5123
5124 /*
5125 * Is this the first time we've seen this shared page?
5126 */
5127 if (pGlobalRegion->paidPages[idxPage] == NIL_GMM_PAGEID)
5128 {
5129 Log(("New shared page guest %RGp host %RHp\n", pPageDesc->GCPhys, pPageDesc->HCPhys));
5130 return gmmR0SharedModuleCheckPageFirstTime(pGMM, pGVM, pModule, idxRegion, idxPage, pPageDesc, pGlobalRegion);
5131 }
5132
5133 /*
5134 * We've seen it before...
5135 */
5136 Log(("Replace existing page guest %RGp host %RHp id %#x -> id %#x\n",
5137 pPageDesc->GCPhys, pPageDesc->HCPhys, pPageDesc->idPage, pGlobalRegion->paidPages[idxPage]));
5138 Assert(pPageDesc->idPage != pGlobalRegion->paidPages[idxPage]);
5139
5140 /*
5141 * Get the shared page source.
5142 */
5143 PGMMPAGE pPage = gmmR0GetPage(pGMM, pGlobalRegion->paidPages[idxPage]);
5144 AssertMsgReturn(pPage, ("idPage=%#x (idxRegion=%#x idxPage=%#x) #2\n", pPageDesc->idPage, idxRegion, idxPage),
5145 VERR_PGM_PHYS_INVALID_PAGE_ID);
5146
5147 if (pPage->Common.u2State != GMM_PAGE_STATE_SHARED)
5148 {
5149 /*
5150 * Page was freed at some point; invalidate this entry.
5151 */
5152 /** @todo this isn't really bullet proof. */
5153 Log(("Old shared page was freed -> create a new one\n"));
5154 pGlobalRegion->paidPages[idxPage] = NIL_GMM_PAGEID;
5155 return gmmR0SharedModuleCheckPageFirstTime(pGMM, pGVM, pModule, idxRegion, idxPage, pPageDesc, pGlobalRegion);
5156 }
5157
5158 Log(("Replace existing page guest host %RHp -> %RHp\n", pPageDesc->HCPhys, ((uint64_t)pPage->Shared.pfn) << PAGE_SHIFT));
5159
5160 /*
5161 * Calculate the virtual address of the local page.
5162 */
5163 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pPageDesc->idPage >> GMM_CHUNKID_SHIFT);
5164 AssertMsgReturn(pChunk, ("idPage=%#x (idxRegion=%#x idxPage=%#x) #4\n", pPageDesc->idPage, idxRegion, idxPage),
5165 VERR_PGM_PHYS_INVALID_PAGE_ID);
5166
5167 uint8_t *pbChunk;
5168 AssertMsgReturn(gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk),
5169 ("idPage=%#x (idxRegion=%#x idxPage=%#x) #3\n", pPageDesc->idPage, idxRegion, idxPage),
5170 VERR_PGM_PHYS_INVALID_PAGE_ID);
5171 uint8_t *pbLocalPage = pbChunk + ((pPageDesc->idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
5172
5173 /*
5174 * Calculate the virtual address of the shared page.
5175 */
5176 pChunk = gmmR0GetChunk(pGMM, pGlobalRegion->paidPages[idxPage] >> GMM_CHUNKID_SHIFT);
5177 Assert(pChunk); /* can't fail as gmmR0GetPage succeeded. */
5178
5179 /*
5180 * Get the virtual address of the physical page; map the chunk into the VM
5181 * process if not already done.
5182 */
5183 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5184 {
5185 Log(("Map chunk into process!\n"));
5186 rc = gmmR0MapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/, (PRTR3PTR)&pbChunk);
5187 AssertRCReturn(rc, rc);
5188 }
5189 uint8_t *pbSharedPage = pbChunk + ((pGlobalRegion->paidPages[idxPage] & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
5190
5191#ifdef VBOX_STRICT
5192 pPageDesc->u32StrictChecksum = RTCrc32(pbSharedPage, PAGE_SIZE);
5193 uint32_t uChecksum = pPageDesc->u32StrictChecksum & UINT32_C(0x00003fff);
5194 AssertMsg(!uChecksum || uChecksum == pPage->Shared.u14Checksum || !pPage->Shared.u14Checksum,
5195 ("%#x vs %#x - idPage=%#x - %s %s\n", uChecksum, pPage->Shared.u14Checksum,
5196 pGlobalRegion->paidPages[idxPage], pModule->szName, pModule->szVersion));
5197#endif
5198
5199 /** @todo write ASMMemComparePage. */
5200 if (memcmp(pbSharedPage, pbLocalPage, PAGE_SIZE))
5201 {
5202 Log(("Unexpected differences found between local and shared page; skip\n"));
5203 /* Signal to the caller that this one hasn't changed. */
5204 pPageDesc->idPage = NIL_GMM_PAGEID;
5205 return VINF_SUCCESS;
5206 }
5207
5208 /*
5209 * Free the old local page.
5210 */
5211 GMMFREEPAGEDESC PageDesc;
5212 PageDesc.idPage = pPageDesc->idPage;
5213 rc = gmmR0FreePages(pGMM, pGVM, 1, &PageDesc, GMMACCOUNT_BASE);
5214 AssertRCReturn(rc, rc);
5215
5216 gmmR0UseSharedPage(pGMM, pGVM, pPage);
5217
5218 /*
5219 * Pass along the new physical address & page id.
5220 */
5221 pPageDesc->HCPhys = ((uint64_t)pPage->Shared.pfn) << PAGE_SHIFT;
5222 pPageDesc->idPage = pGlobalRegion->paidPages[idxPage];
5223
5224 return VINF_SUCCESS;
5225}
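
/*
 * Illustrative sketch (not part of the original source) of the caller-side
 * contract, based on how the page descriptor is used above. The caller fills
 * in its current private page; on return the descriptor either carries
 * NIL_GMM_PAGEID (content differed, nothing changed) or the ID and host
 * address of the shared page that now backs the guest page. Field names match
 * the usage in this function; the surrounding PGM plumbing and the variables
 * holding the current page info are assumptions for the example.
 *
 * @code
 *      GMMSHAREDPAGEDESC PageDesc;
 *      PageDesc.idPage            = idMyPrivatePage;   // current private page ID (caller supplied)
 *      PageDesc.GCPhys            = GCPhysGuestPage;   // guest physical address of the page
 *      PageDesc.HCPhys            = HCPhysGuestPage;   // current host physical address
 *      PageDesc.u32StrictChecksum = 0;
 *      int rc = GMMR0SharedModuleCheckPage(pGVM, pModule, idxRegion, idxPage, &PageDesc);
 *      if (RT_SUCCESS(rc) && PageDesc.idPage != NIL_GMM_PAGEID)
 *      {
 *          // The private page was freed; remap the guest page to
 *          // PageDesc.HCPhys / PageDesc.idPage (now the shared page).
 *      }
 * @endcode
 */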
5226
5227
5228/**
5229 * RTAvlGCPtrDestroy callback.
5230 *
5231 * @returns VINF_SUCCESS.
5232 * @param pNode The node to destroy.
5233 * @param pvArgs Pointer to an argument packet.
5234 */
5235static DECLCALLBACK(int) gmmR0CleanupSharedModule(PAVLGCPTRNODECORE pNode, void *pvArgs)
5236{
5237 gmmR0ShModDeletePerVM(((GMMR0SHMODPERVMDTORARGS *)pvArgs)->pGMM,
5238 ((GMMR0SHMODPERVMDTORARGS *)pvArgs)->pGVM,
5239 (PGMMSHAREDMODULEPERVM)pNode,
5240 false /*fRemove*/);
5241 return VINF_SUCCESS;
5242}
5243
5244
5245/**
5246 * Used by GMMR0CleanupVM to clean up shared modules.
5247 *
5248 * This is called without taking the GMM lock so that it can be yielded as
5249 * needed here.
5250 *
5251 * @param pGMM The GMM handle.
5252 * @param pGVM The global VM handle.
5253 */
5254static void gmmR0SharedModuleCleanup(PGMM pGMM, PGVM pGVM)
5255{
5256 gmmR0MutexAcquire(pGMM);
5257 GMM_CHECK_SANITY_UPON_ENTERING(pGMM);
5258
5259 GMMR0SHMODPERVMDTORARGS Args;
5260 Args.pGVM = pGVM;
5261 Args.pGMM = pGMM;
5262 RTAvlGCPtrDestroy(&pGVM->gmm.s.pSharedModuleTree, gmmR0CleanupSharedModule, &Args);
5263
5264 AssertMsg(pGVM->gmm.s.Stats.cShareableModules == 0, ("%d\n", pGVM->gmm.s.Stats.cShareableModules));
5265 pGVM->gmm.s.Stats.cShareableModules = 0;
5266
5267 gmmR0MutexRelease(pGMM);
5268}
5269
5270#endif /* VBOX_WITH_PAGE_SHARING */
5271
5272/**
5273 * Removes all shared modules for the specified VM.
5274 *
5275 * @returns VBox status code.
5276 * @param pGVM The global (ring-0) VM structure.
5277 * @param idCpu The VCPU id.
5278 */
5279GMMR0DECL(int) GMMR0ResetSharedModules(PGVM pGVM, VMCPUID idCpu)
5280{
5281#ifdef VBOX_WITH_PAGE_SHARING
5282 /*
5283 * Validate input and get the basics.
5284 */
5285 PGMM pGMM;
5286 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5287 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
5288 if (RT_FAILURE(rc))
5289 return rc;
5290
5291 /*
5292 * Take the semaphore and do some more validations.
5293 */
5294 gmmR0MutexAcquire(pGMM);
5295 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5296 {
5297 Log(("GMMR0ResetSharedModules\n"));
5298 GMMR0SHMODPERVMDTORARGS Args;
5299 Args.pGVM = pGVM;
5300 Args.pGMM = pGMM;
5301 RTAvlGCPtrDestroy(&pGVM->gmm.s.pSharedModuleTree, gmmR0CleanupSharedModule, &Args);
5302 pGVM->gmm.s.Stats.cShareableModules = 0;
5303
5304 rc = VINF_SUCCESS;
5305 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
5306 }
5307 else
5308 rc = VERR_GMM_IS_NOT_SANE;
5309
5310 gmmR0MutexRelease(pGMM);
5311 return rc;
5312#else
5313 RT_NOREF(pGVM, idCpu);
5314 return VERR_NOT_IMPLEMENTED;
5315#endif
5316}
5317
5318#ifdef VBOX_WITH_PAGE_SHARING
5319
5320/**
5321 * Tree enumeration callback for checking a shared module.
5322 */
5323static DECLCALLBACK(int) gmmR0CheckSharedModule(PAVLGCPTRNODECORE pNode, void *pvUser)
5324{
5325 GMMCHECKSHAREDMODULEINFO *pArgs = (GMMCHECKSHAREDMODULEINFO*)pvUser;
5326 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)pNode;
5327 PGMMSHAREDMODULE pGblMod = pRecVM->pGlobalModule;
5328
5329 Log(("gmmR0CheckSharedModule: check %s %s base=%RGv size=%x\n",
5330 pGblMod->szName, pGblMod->szVersion, pGblMod->Core.Key, pGblMod->cbModule));
5331
5332 int rc = PGMR0SharedModuleCheck(pArgs->pGVM, pArgs->pGVM, pArgs->idCpu, pGblMod, pRecVM->aRegionsGCPtrs);
5333 if (RT_FAILURE(rc))
5334 return rc;
5335 return VINF_SUCCESS;
5336}
5337
5338#endif /* VBOX_WITH_PAGE_SHARING */
5339
5340/**
5341 * Check all shared modules for the specified VM.
5342 *
5343 * @returns VBox status code.
5344 * @param pGVM The global (ring-0) VM structure.
5345 * @param idCpu The calling EMT number.
5346 * @thread EMT(idCpu)
5347 */
5348GMMR0DECL(int) GMMR0CheckSharedModules(PGVM pGVM, VMCPUID idCpu)
5349{
5350#ifdef VBOX_WITH_PAGE_SHARING
5351 /*
5352 * Validate input and get the basics.
5353 */
5354 PGMM pGMM;
5355 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5356 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
5357 if (RT_FAILURE(rc))
5358 return rc;
5359
5360# ifndef DEBUG_sandervl
5361 /*
5362 * Take the semaphore and do some more validations.
5363 */
5364 gmmR0MutexAcquire(pGMM);
5365# endif
5366 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5367 {
5368 /*
5369 * Walk the tree, checking each module.
5370 */
5371 Log(("GMMR0CheckSharedModules\n"));
5372
5373 GMMCHECKSHAREDMODULEINFO Args;
5374 Args.pGVM = pGVM;
5375 Args.idCpu = idCpu;
5376 rc = RTAvlGCPtrDoWithAll(&pGVM->gmm.s.pSharedModuleTree, true /* fFromLeft */, gmmR0CheckSharedModule, &Args);
5377
5378 Log(("GMMR0CheckSharedModules done (rc=%Rrc)!\n", rc));
5379 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
5380 }
5381 else
5382 rc = VERR_GMM_IS_NOT_SANE;
5383
5384# ifndef DEBUG_sandervl
5385 gmmR0MutexRelease(pGMM);
5386# endif
5387 return rc;
5388#else
5389 RT_NOREF(pGVM, idCpu);
5390 return VERR_NOT_IMPLEMENTED;
5391#endif
5392}
5393
5394#if defined(VBOX_STRICT) && HC_ARCH_BITS == 64
5395
5396/**
5397 * Worker for GMMR0FindDuplicatePageReq.
5398 *
5399 * @returns true if duplicate, false if not.
5400 */
5401static bool gmmR0FindDupPageInChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, uint8_t const *pbSourcePage)
5402{
5403 bool fFoundDuplicate = false;
5404 /* Only take chunks not mapped into this VM process; not entirely correct. */
5405 uint8_t *pbChunk;
5406 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5407 {
5408 int rc = gmmR0MapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/, (PRTR3PTR)&pbChunk);
5409 if (RT_SUCCESS(rc))
5410 {
5411 /*
5412 * Look for duplicate pages
5413 */
5414 uintptr_t iPage = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
5415 while (iPage-- > 0)
5416 {
5417 if (GMM_PAGE_IS_PRIVATE(&pChunk->aPages[iPage]))
5418 {
5419 uint8_t *pbDestPage = pbChunk + (iPage << PAGE_SHIFT);
5420 if (!memcmp(pbSourcePage, pbDestPage, PAGE_SIZE))
5421 {
5422 fFoundDuplicate = true;
5423 break;
5424 }
5425 }
5426 }
5427 gmmR0UnmapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/);
5428 }
5429 }
5430 return fFoundDuplicate;
5431}
5432
5433
5434/**
5435 * Finds a duplicate of the specified page in other active VMs.
5436 *
5437 * @returns VBox status code.
5438 * @param pGVM The global (ring-0) VM structure.
5439 * @param pReq Pointer to the request packet.
5440 */
5441GMMR0DECL(int) GMMR0FindDuplicatePageReq(PGVM pGVM, PGMMFINDDUPLICATEPAGEREQ pReq)
5442{
5443 /*
5444 * Validate input and pass it on.
5445 */
5446 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5447 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5448
5449 PGMM pGMM;
5450 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5451
5452 int rc = GVMMR0ValidateGVM(pGVM);
5453 if (RT_FAILURE(rc))
5454 return rc;
5455
5456 /*
5457 * Take the semaphore and do some more validations.
5458 */
5459 rc = gmmR0MutexAcquire(pGMM);
5460 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5461 {
5462 uint8_t *pbChunk;
5463 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pReq->idPage >> GMM_CHUNKID_SHIFT);
5464 if (pChunk)
5465 {
5466 if (gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5467 {
5468 uint8_t *pbSourcePage = pbChunk + ((pReq->idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
5469 PGMMPAGE pPage = gmmR0GetPage(pGMM, pReq->idPage);
5470 if (pPage)
5471 {
5472 /*
5473 * Walk the chunks
5474 */
5475 pReq->fDuplicate = false;
5476 RTListForEach(&pGMM->ChunkList, pChunk, GMMCHUNK, ListNode)
5477 {
5478 if (gmmR0FindDupPageInChunk(pGMM, pGVM, pChunk, pbSourcePage))
5479 {
5480 pReq->fDuplicate = true;
5481 break;
5482 }
5483 }
5484 }
5485 else
5486 {
5487 AssertFailed();
5488 rc = VERR_PGM_PHYS_INVALID_PAGE_ID;
5489 }
5490 }
5491 else
5492 AssertFailed();
5493 }
5494 else
5495 AssertFailed();
5496 }
5497 else
5498 rc = VERR_GMM_IS_NOT_SANE;
5499
5500 gmmR0MutexRelease(pGMM);
5501 return rc;
5502}
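
/*
 * Illustrative sketch (not part of the original source): querying for a
 * duplicate page in a strict 64-bit build, which is the only configuration
 * where this request is compiled in. Only the fields validated or used above
 * (Hdr.cbReq, idPage, fDuplicate) are shown; idPage stands for whatever page
 * ID the caller wants to check.
 *
 * @code
 *      GMMFINDDUPLICATEPAGEREQ Req;
 *      RT_ZERO(Req);
 *      Req.Hdr.cbReq = sizeof(Req);
 *      Req.idPage    = idPage;                 // page to look for in other chunks
 *      int rc = GMMR0FindDuplicatePageReq(pGVM, &Req);
 *      if (RT_SUCCESS(rc) && Req.fDuplicate)
 *          Log(("Page %#x has an identical copy in some other chunk\n", idPage));
 * @endcode
 */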
5503
5504#endif /* VBOX_STRICT && HC_ARCH_BITS == 64 */
5505
5506
5507/**
5508 * Retrieves the GMM statistics visible to the caller.
5509 *
5510 * @returns VBox status code.
5511 *
5512 * @param pStats Where to put the statistics.
5513 * @param pSession The current session.
5514 * @param pGVM The GVM to obtain statistics for. Optional.
5515 */
5516GMMR0DECL(int) GMMR0QueryStatistics(PGMMSTATS pStats, PSUPDRVSESSION pSession, PGVM pGVM)
5517{
5518 LogFlow(("GVMMR0QueryStatistics: pStats=%p pSession=%p pGVM=%p\n", pStats, pSession, pGVM));
5519
5520 /*
5521 * Validate input.
5522 */
5523 AssertPtrReturn(pSession, VERR_INVALID_POINTER);
5524 AssertPtrReturn(pStats, VERR_INVALID_POINTER);
5525 pStats->cMaxPages = 0; /* (crash before taking the mutex...) */
5526
5527 PGMM pGMM;
5528 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5529
5530 /*
5531 * Validate the VM handle, if not NULL, and lock the GMM.
5532 */
5533 int rc;
5534 if (pGVM)
5535 {
5536 rc = GVMMR0ValidateGVM(pGVM);
5537 if (RT_FAILURE(rc))
5538 return rc;
5539 }
5540
5541 rc = gmmR0MutexAcquire(pGMM);
5542 if (RT_FAILURE(rc))
5543 return rc;
5544
5545 /*
5546 * Copy out the GMM statistics.
5547 */
5548 pStats->cMaxPages = pGMM->cMaxPages;
5549 pStats->cReservedPages = pGMM->cReservedPages;
5550 pStats->cOverCommittedPages = pGMM->cOverCommittedPages;
5551 pStats->cAllocatedPages = pGMM->cAllocatedPages;
5552 pStats->cSharedPages = pGMM->cSharedPages;
5553 pStats->cDuplicatePages = pGMM->cDuplicatePages;
5554 pStats->cLeftBehindSharedPages = pGMM->cLeftBehindSharedPages;
5555 pStats->cBalloonedPages = pGMM->cBalloonedPages;
5556 pStats->cChunks = pGMM->cChunks;
5557 pStats->cFreedChunks = pGMM->cFreedChunks;
5558 pStats->cShareableModules = pGMM->cShareableModules;
5559 RT_ZERO(pStats->au64Reserved);
5560
5561 /*
5562 * Copy out the VM statistics.
5563 */
5564 if (pGVM)
5565 pStats->VMStats = pGVM->gmm.s.Stats;
5566 else
5567 RT_ZERO(pStats->VMStats);
5568
5569 gmmR0MutexRelease(pGMM);
5570 return rc;
5571}
5572
5573
5574/**
5575 * VMMR0 request wrapper for GMMR0QueryStatistics.
5576 *
5577 * @returns see GMMR0QueryStatistics.
5578 * @param pGVM The global (ring-0) VM structure. Optional.
5579 * @param pReq Pointer to the request packet.
5580 */
5581GMMR0DECL(int) GMMR0QueryStatisticsReq(PGVM pGVM, PGMMQUERYSTATISTICSSREQ pReq)
5582{
5583 /*
5584 * Validate input and pass it on.
5585 */
5586 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5587 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5588
5589 return GMMR0QueryStatistics(&pReq->Stats, pReq->pSession, pGVM);
5590}
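
/*
 * Illustrative sketch (not part of the original source): fetching the GMM
 * statistics through the request wrapper above. Only the fields the wrapper
 * validates and forwards (Hdr.cbReq, pSession, Stats) are shown; pSession is
 * assumed to be the caller's valid support driver session, and the log format
 * specifiers assume the usual 64-bit page counters and 32-bit chunk count.
 *
 * @code
 *      GMMQUERYSTATISTICSSREQ Req;
 *      RT_ZERO(Req);
 *      Req.Hdr.cbReq = sizeof(Req);
 *      Req.pSession  = pSession;
 *      int rc = GMMR0QueryStatisticsReq(pGVM, &Req);
 *      if (RT_SUCCESS(rc))
 *          LogRel(("GMM: cAllocatedPages=%RX64 cSharedPages=%RX64 cChunks=%u\n",
 *                  Req.Stats.cAllocatedPages, Req.Stats.cSharedPages, Req.Stats.cChunks));
 * @endcode
 */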
5591
5592
5593/**
5594 * Resets the specified GMM statistics.
5595 *
5596 * @returns VBox status code.
5597 *
5598 * @param pStats Which statistics to reset; non-zero fields indicate
5599 * the ones to reset.
5600 * @param pSession The current session.
5601 * @param pGVM The GVM to reset statistics for. Optional.
5602 */
5603GMMR0DECL(int) GMMR0ResetStatistics(PCGMMSTATS pStats, PSUPDRVSESSION pSession, PGVM pGVM)
5604{
5605 NOREF(pStats); NOREF(pSession); NOREF(pGVM);
5606 /* Nothing to reset at the moment. */
5607 return VINF_SUCCESS;
5608}
5609
5610
5611/**
5612 * VMMR0 request wrapper for GMMR0ResetStatistics.
5613 *
5614 * @returns see GMMR0ResetStatistics.
5615 * @param pGVM The global (ring-0) VM structure. Optional.
5616 * @param pReq Pointer to the request packet.
5617 */
5618GMMR0DECL(int) GMMR0ResetStatisticsReq(PGVM pGVM, PGMMRESETSTATISTICSSREQ pReq)
5619{
5620 /*
5621 * Validate input and pass it on.
5622 */
5623 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5624 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5625
5626 return GMMR0ResetStatistics(&pReq->Stats, pReq->pSession, pGVM);
5627}
5628