VirtualBox

source: vbox/trunk/src/VBox/VMM/VMMR0/GMMR0.cpp@82976

Last change on this file since 82976 was 82976, checked in by vboxsync, 5 years ago

VMM/GMMR0: Use the chunk list rather than the AVL tree in GMMR0FindDuplicatePageReq to look for duplicate pages. This will restrict the AVL tree to lookups and make it simpler to protect. bugref:9627

  • Property svn:eol-style set to native
  • Property svn:keywords set to Id Revision
File size: 195.5 KB
 
1/* $Id: GMMR0.cpp 82976 2020-02-04 12:36:10Z vboxsync $ */
2/** @file
3 * GMM - Global Memory Manager.
4 */
5
6/*
7 * Copyright (C) 2007-2020 Oracle Corporation
8 *
9 * This file is part of VirtualBox Open Source Edition (OSE), as
10 * available from http://www.virtualbox.org. This file is free software;
11 * you can redistribute it and/or modify it under the terms of the GNU
12 * General Public License (GPL) as published by the Free Software
13 * Foundation, in version 2 as it comes in the "COPYING" file of the
14 * VirtualBox OSE distribution. VirtualBox OSE is distributed in the
15 * hope that it will be useful, but WITHOUT ANY WARRANTY of any kind.
16 */
17
18
19/** @page pg_gmm GMM - The Global Memory Manager
20 *
21 * As the name indicates, this component is responsible for global memory
22 * management. Currently only guest RAM is allocated from the GMM, but this
23 * may change to include shadow page tables and other bits later.
24 *
25 * Guest RAM is managed as individual pages, but allocated from the host OS
26 * in chunks for reasons of portability / efficiency. To minimize the memory
27 * footprint, all tracking structures must be as small as possible without
28 * unnecessary performance penalties.
29 *
30 * The allocation chunks have a fixed size, defined at compile time
31 * by the #GMM_CHUNK_SIZE \#define.
32 *
33 * Each chunk is given a unique ID. Each page also has a unique ID. The
34 * relationship between the two IDs is:
35 * @code
36 * GMM_CHUNK_SHIFT = log2(GMM_CHUNK_SIZE / PAGE_SIZE);
37 * idPage = (idChunk << GMM_CHUNK_SHIFT) | iPage;
38 * @endcode
39 * Where iPage is the index of the page within the chunk. This ID scheme
40 * permits efficient chunk and page lookup, but it relies on the chunk size
41 * to be set at compile time. The chunks are organized in an AVL tree with their
42 * IDs being the keys.
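 *
 * For illustration, the reverse lookup follows directly from the scheme
 * above (a sketch only; the local variable names are not from this file):
 * @code
 * idChunk = idPage >> GMM_CHUNK_SHIFT;
 * iPage   = idPage & ((1 << GMM_CHUNK_SHIFT) - 1);
 * @endcode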
43 *
44 * The physical address of each page in an allocation chunk is maintained by
45 * the #RTR0MEMOBJ and obtained using #RTR0MemObjGetPagePhysAddr. There is no
46 * need to duplicate this information (it'll cost 8-bytes per page if we did).
47 *
48 * So what do we need to track per page? Most importantly we need to know
49 * which state the page is in:
50 * - Private - Allocated for (eventually) backing one particular VM page.
51 * - Shared - Readonly page that is used by one or more VMs and treated
52 * as COW by PGM.
53 * - Free - Not used by anyone.
54 *
55 * For the page replacement operations (sharing, defragmenting and freeing)
56 * to be somewhat efficient, private pages need to be associated with a
57 * particular page in a particular VM.
58 *
59 * Tracking the usage of shared pages is impractical and expensive, so we'll
60 * settle for a reference counting system instead.
61 *
62 * Free pages will be chained on LIFOs
63 *
64 * On 64-bit systems we will use a 64-bit bitfield per page, while on 32-bit
65 * systems a 32-bit bitfield will have to suffice because of address space
66 * limitations. The #GMMPAGE structure shows the details.
67 *
68 *
69 * @section sec_gmm_alloc_strat Page Allocation Strategy
70 *
71 * The strategy for allocating pages has to take fragmentation and shared
72 * pages into account, or we may end up with 2000 chunks with only
73 * a few pages in each. Shared pages cannot easily be reallocated because
74 * of the inaccurate usage accounting (see above). Private pages can be
75 * reallocated by a defragmentation thread in the same manner that sharing
76 * is done.
77 *
78 * The first approach is to manage the free pages in two sets depending on
79 * whether they are mainly for the allocation of shared or private pages.
80 * In the initial implementation there will be almost no possibility for
81 * mixing shared and private pages in the same chunk (only if we're really
82 * stressed on memory), but when we implement forking of VMs and have to
83 * deal with lots of COW pages it'll start getting kind of interesting.
84 *
85 * The sets are lists of chunks with approximately the same number of
86 * free pages. Say the chunk size is 1MB, meaning 256 pages, and a set
87 * consists of 16 lists. So, the first list will contain the chunks with
88 * 1-7 free pages, the second covers 8-15, and so on. The chunks will be
89 * moved between the lists as pages are freed up or allocated.
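 *
 * With the 1-7 / 8-15 bucketing described above, picking the list for a
 * chunk is a simple division (illustrative only; the constants and helper
 * used by the actual code may differ):
 * @code
 * iList = pChunk->cFree / 8; // 1-7 -> list 0, 8-15 -> list 1, and so on
 * @endcode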
90 *
91 *
92 * @section sec_gmm_costs Costs
93 *
94 * The per page cost in kernel space is 32-bit plus whatever RTR0MEMOBJ
95 * entails. In addition there is the chunk cost of approximately
96 * (sizeof(RTR0MEMOBJ) + sizeof(CHUNK)) / 2^CHUNK_SHIFT bytes per page.
97 *
98 * On Windows the per page #RTR0MEMOBJ cost is 32-bit on 32-bit Windows
99 * and 64-bit on 64-bit Windows (a PFN_NUMBER in the MDL). So, 64-bit per page.
100 * The cost on Linux is identical, but here it's because of sizeof(struct page *).
101 *
102 *
103 * @section sec_gmm_legacy Legacy Mode for Non-Tier-1 Platforms
104 *
105 * In legacy mode the page source is locked user pages and not
106 * #RTR0MemObjAllocPhysNC, this means that a page can only be allocated
107 * by the VM that locked it. We will make no attempt at implementing
108 * page sharing on these systems, just do enough to make it all work.
109 *
110 * @note With 6.1 really dropping 32-bit support, the legacy mode is obsoleted
111 * under the assumption that there is sufficient kernel virtual address
112 * space to map all of the guest memory allocations. So, we'll be using
113 * #RTR0MemObjAllocPage on some platforms as an alternative to
114 * #RTR0MemObjAllocPhysNC.
115 *
116 *
117 * @subsection sub_gmm_locking Serializing
118 *
119 * One simple fast mutex will be employed in the initial implementation, not
120 * two as mentioned in @ref sec_pgmPhys_Serializing.
121 *
122 * @see @ref sec_pgmPhys_Serializing
123 *
124 *
125 * @section sec_gmm_overcommit Memory Over-Commitment Management
126 *
127 * The GVM will have to do the system wide memory over-commitment
128 * management. My current ideas are:
129 * - Per VM oc policy that indicates how much to initially commit
130 * to it and what to do in an out-of-memory situation.
131 * - Prevent overtaxing the host.
132 *
133 * There are some challenges here, the main ones are configurability and
134 * security. Should we for instance permit anyone to request 100% memory
135 * commitment? Who should be allowed to do runtime adjustments of the
136 * config? And how do we prevent these settings from being lost when the last
137 * VM process exits? The solution is probably to have an optional root
138 * daemon that will keep VMMR0.r0 in memory and enable the security measures.
139 *
140 *
141 *
142 * @section sec_gmm_numa NUMA
143 *
144 * NUMA considerations will be designed and implemented a bit later.
145 *
146 * The preliminary guess is that we will have to try to allocate memory as
147 * close as possible to the CPUs the VM is executed on (EMT and additional CPU
148 * threads). Which means it's mostly about allocation and sharing policies.
149 * Both the scheduler and allocator interface will have to supply some NUMA info
150 * and we'll need to have a way to calc access costs.
151 *
152 */
153
154
155/*********************************************************************************************************************************
156* Header Files *
157*********************************************************************************************************************************/
158#define LOG_GROUP LOG_GROUP_GMM
159#include <VBox/rawpci.h>
160#include <VBox/vmm/gmm.h>
161#include "GMMR0Internal.h"
162#include <VBox/vmm/vmcc.h>
163#include <VBox/vmm/pgm.h>
164#include <VBox/log.h>
165#include <VBox/param.h>
166#include <VBox/err.h>
167#include <VBox/VMMDev.h>
168#include <iprt/asm.h>
169#include <iprt/avl.h>
170#ifdef VBOX_STRICT
171# include <iprt/crc.h>
172#endif
173#include <iprt/critsect.h>
174#include <iprt/list.h>
175#include <iprt/mem.h>
176#include <iprt/memobj.h>
177#include <iprt/mp.h>
178#include <iprt/semaphore.h>
179#include <iprt/string.h>
180#include <iprt/time.h>
181
182
183/*********************************************************************************************************************************
184* Defined Constants And Macros *
185*********************************************************************************************************************************/
186/** @def VBOX_USE_CRIT_SECT_FOR_GIANT
187 * Use a critical section instead of a fast mutex for the giant GMM lock.
188 *
189 * @remarks This is primarily a way of avoiding the deadlock checks in the
190 * windows driver verifier. */
191#if defined(RT_OS_WINDOWS) || defined(RT_OS_DARWIN) || defined(DOXYGEN_RUNNING)
192# define VBOX_USE_CRIT_SECT_FOR_GIANT
193#endif
194
195#if (!defined(VBOX_WITH_RAM_IN_KERNEL) || defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)) \
196 && !defined(RT_OS_DARWIN)
197/** Enable the legacy mode code (will be dropped soon). */
198# define GMM_WITH_LEGACY_MODE
199#endif
200
201
202/*********************************************************************************************************************************
203* Structures and Typedefs *
204*********************************************************************************************************************************/
205/** Pointer to set of free chunks. */
206typedef struct GMMCHUNKFREESET *PGMMCHUNKFREESET;
207
208/**
209 * The per-page tracking structure employed by the GMM.
210 *
211 * On 32-bit hosts some trickery is necessary to compress all
212 * the information into 32 bits. When the fSharedFree member is set,
213 * the 30th bit decides whether it's a free page or not.
214 *
215 * Because of the different layout on 32-bit and 64-bit hosts, macros
216 * are used to get and set some of the data.
217 */
218typedef union GMMPAGE
219{
220#if HC_ARCH_BITS == 64
221 /** Unsigned integer view. */
222 uint64_t u;
223
224 /** The common view. */
225 struct GMMPAGECOMMON
226 {
227 uint32_t uStuff1 : 32;
228 uint32_t uStuff2 : 30;
229 /** The page state. */
230 uint32_t u2State : 2;
231 } Common;
232
233 /** The view of a private page. */
234 struct GMMPAGEPRIVATE
235 {
236 /** The guest page frame number. (Max addressable: 2 ^ 44 - 16) */
237 uint32_t pfn;
238 /** The GVM handle. (64K VMs) */
239 uint32_t hGVM : 16;
240 /** Reserved. */
241 uint32_t u16Reserved : 14;
242 /** The page state. */
243 uint32_t u2State : 2;
244 } Private;
245
246 /** The view of a shared page. */
247 struct GMMPAGESHARED
248 {
249 /** The host page frame number. (Max addressable: 2 ^ 44 - 16) */
250 uint32_t pfn;
251 /** The reference count (64K VMs). */
252 uint32_t cRefs : 16;
253 /** Used for debug checksumming. */
254 uint32_t u14Checksum : 14;
255 /** The page state. */
256 uint32_t u2State : 2;
257 } Shared;
258
259 /** The view of a free page. */
260 struct GMMPAGEFREE
261 {
262 /** The index of the next page in the free list. UINT16_MAX is NIL. */
263 uint16_t iNext;
264 /** Reserved. Checksum or something? */
265 uint16_t u16Reserved0;
266 /** Reserved. Checksum or something? */
267 uint32_t u30Reserved1 : 30;
268 /** The page state. */
269 uint32_t u2State : 2;
270 } Free;
271
272#else /* 32-bit */
273 /** Unsigned integer view. */
274 uint32_t u;
275
276 /** The common view. */
277 struct GMMPAGECOMMON
278 {
279 uint32_t uStuff : 30;
280 /** The page state. */
281 uint32_t u2State : 2;
282 } Common;
283
284 /** The view of a private page. */
285 struct GMMPAGEPRIVATE
286 {
287 /** The guest page frame number. (Max addressable: 2 ^ 36) */
288 uint32_t pfn : 24;
289 /** The GVM handle. (127 VMs) */
290 uint32_t hGVM : 7;
291 /** The top page state bit, MBZ. */
292 uint32_t fZero : 1;
293 } Private;
294
295 /** The view of a shared page. */
296 struct GMMPAGESHARED
297 {
298 /** The reference count. */
299 uint32_t cRefs : 30;
300 /** The page state. */
301 uint32_t u2State : 2;
302 } Shared;
303
304 /** The view of a free page. */
305 struct GMMPAGEFREE
306 {
307 /** The index of the next page in the free list. UINT16_MAX is NIL. */
308 uint32_t iNext : 16;
309 /** Reserved. Checksum or something? */
310 uint32_t u14Reserved : 14;
311 /** The page state. */
312 uint32_t u2State : 2;
313 } Free;
314#endif
315} GMMPAGE;
316AssertCompileSize(GMMPAGE, sizeof(RTHCUINTPTR));
317/** Pointer to a GMMPAGE. */
318typedef GMMPAGE *PGMMPAGE;
319
320
321/** @name The Page States.
322 * @{ */
323/** A private page. */
324#define GMM_PAGE_STATE_PRIVATE 0
325/** A private page - alternative value used on the 32-bit implementation.
326 * This will never be used on 64-bit hosts. */
327#define GMM_PAGE_STATE_PRIVATE_32 1
328/** A shared page. */
329#define GMM_PAGE_STATE_SHARED 2
330/** A free page. */
331#define GMM_PAGE_STATE_FREE 3
332/** @} */
333
334
335/** @def GMM_PAGE_IS_PRIVATE
336 *
337 * @returns true if private, false if not.
338 * @param pPage The GMM page.
339 */
340#if HC_ARCH_BITS == 64
341# define GMM_PAGE_IS_PRIVATE(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_PRIVATE )
342#else
343# define GMM_PAGE_IS_PRIVATE(pPage) ( (pPage)->Private.fZero == 0 )
344#endif
345
346/** @def GMM_PAGE_IS_SHARED
347 *
348 * @returns true if shared, false if not.
349 * @param pPage The GMM page.
350 */
351#define GMM_PAGE_IS_SHARED(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_SHARED )
352
353/** @def GMM_PAGE_IS_FREE
354 *
355 * @returns true if free, false if not.
356 * @param pPage The GMM page.
357 */
358#define GMM_PAGE_IS_FREE(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_FREE )
359
360/** @def GMM_PAGE_PFN_LAST
361 * The last valid guest pfn range.
362 * @remark Some of the values outside the range have special meaning,
363 * see GMM_PAGE_PFN_UNSHAREABLE.
364 */
365#if HC_ARCH_BITS == 64
366# define GMM_PAGE_PFN_LAST UINT32_C(0xfffffff0)
367#else
368# define GMM_PAGE_PFN_LAST UINT32_C(0x00fffff0)
369#endif
370AssertCompile(GMM_PAGE_PFN_LAST == (GMM_GCPHYS_LAST >> PAGE_SHIFT));
371
372/** @def GMM_PAGE_PFN_UNSHAREABLE
373 * Indicates that this page isn't used for normal guest memory and thus isn't shareable.
374 */
375#if HC_ARCH_BITS == 64
376# define GMM_PAGE_PFN_UNSHAREABLE UINT32_C(0xfffffff1)
377#else
378# define GMM_PAGE_PFN_UNSHAREABLE UINT32_C(0x00fffff1)
379#endif
380AssertCompile(GMM_PAGE_PFN_UNSHAREABLE == (GMM_GCPHYS_UNSHAREABLE >> PAGE_SHIFT));
381
382
383/**
384 * A GMM allocation chunk ring-3 mapping record.
385 *
386 * This should really be associated with a session and not a VM, but
387 * it's simpler to associate it with a VM and clean up when the VM object
388 * is destroyed.
389 */
390typedef struct GMMCHUNKMAP
391{
392 /** The mapping object. */
393 RTR0MEMOBJ hMapObj;
394 /** The VM owning the mapping. */
395 PGVM pGVM;
396} GMMCHUNKMAP;
397/** Pointer to a GMM allocation chunk mapping. */
398typedef struct GMMCHUNKMAP *PGMMCHUNKMAP;
399
400
401/**
402 * A GMM allocation chunk.
403 */
404typedef struct GMMCHUNK
405{
406 /** The AVL node core.
407 * The Key is the chunk ID. (Giant mtx.) */
408 AVLU32NODECORE Core;
409 /** The memory object.
410 * Either from RTR0MemObjAllocPhysNC or RTR0MemObjLockUser depending on
411 * what the host can dish up with. (Chunk mtx protects mapping accesses
412 * and related frees.) */
413 RTR0MEMOBJ hMemObj;
414#if defined(VBOX_WITH_RAM_IN_KERNEL) && !defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)
415 /** Pointer to the kernel mapping. */
416 uint8_t *pbMapping;
417#endif
418 /** Pointer to the next chunk in the free list. (Giant mtx.) */
419 PGMMCHUNK pFreeNext;
420 /** Pointer to the previous chunk in the free list. (Giant mtx.) */
421 PGMMCHUNK pFreePrev;
422 /** Pointer to the free set this chunk belongs to. NULL for
423 * chunks with no free pages. (Giant mtx.) */
424 PGMMCHUNKFREESET pSet;
425 /** List node in the chunk list (GMM::ChunkList). (Giant mtx.) */
426 RTLISTNODE ListNode;
427 /** Pointer to an array of mappings. (Chunk mtx.) */
428 PGMMCHUNKMAP paMappingsX;
429 /** The number of mappings. (Chunk mtx.) */
430 uint16_t cMappingsX;
431 /** The mapping lock this chunk is using. UINT8_MAX if nobody is
432 * mapping or freeing anything. (Giant mtx.) */
433 uint8_t volatile iChunkMtx;
434 /** GMM_CHUNK_FLAGS_XXX. (Giant mtx.) */
435 uint8_t fFlags;
436 /** The head of the list of free pages. UINT16_MAX is the NIL value.
437 * (Giant mtx.) */
438 uint16_t iFreeHead;
439 /** The number of free pages. (Giant mtx.) */
440 uint16_t cFree;
441 /** The GVM handle of the VM that first allocated pages from this chunk, this
442 * is used as a preference when there are several chunks to choose from.
443 * When in bound memory mode this isn't a preference any longer. (Giant
444 * mtx.) */
445 uint16_t hGVM;
446 /** The ID of the NUMA node the memory mostly resides on. (Reserved for
447 * future use.) (Giant mtx.) */
448 uint16_t idNumaNode;
449 /** The number of private pages. (Giant mtx.) */
450 uint16_t cPrivate;
451 /** The number of shared pages. (Giant mtx.) */
452 uint16_t cShared;
453 /** The pages. (Giant mtx.) */
454 GMMPAGE aPages[GMM_CHUNK_SIZE >> PAGE_SHIFT];
455} GMMCHUNK;
456
457/** Indicates that the NUMA properties of the memory are unknown. */
458#define GMM_CHUNK_NUMA_ID_UNKNOWN UINT16_C(0xfffe)
459
460/** @name GMM_CHUNK_FLAGS_XXX - chunk flags.
461 * @{ */
462/** Indicates that the chunk is a large page (2MB). */
463#define GMM_CHUNK_FLAGS_LARGE_PAGE UINT16_C(0x0001)
464#ifdef GMM_WITH_LEGACY_MODE
465/** Indicates that the chunk was locked rather than allocated directly. */
466# define GMM_CHUNK_FLAGS_SEEDED UINT16_C(0x0002)
467#endif
468/** @} */
469
470
471/**
472 * An allocation chunk TLB entry.
473 */
474typedef struct GMMCHUNKTLBE
475{
476 /** The chunk id. */
477 uint32_t idChunk;
478 /** Pointer to the chunk. */
479 PGMMCHUNK pChunk;
480} GMMCHUNKTLBE;
481/** Pointer to an allocation chunk TLB entry. */
482typedef GMMCHUNKTLBE *PGMMCHUNKTLBE;
483
484
485/** The number of entries in the allocation chunk TLB. */
486#define GMM_CHUNKTLB_ENTRIES 32
487/** Gets the TLB entry index for the given Chunk ID. */
488#define GMM_CHUNKTLB_IDX(idChunk) ( (idChunk) & (GMM_CHUNKTLB_ENTRIES - 1) )
489
490/**
491 * An allocation chunk TLB.
492 */
493typedef struct GMMCHUNKTLB
494{
495 /** The TLB entries. */
496 GMMCHUNKTLBE aEntries[GMM_CHUNKTLB_ENTRIES];
497} GMMCHUNKTLB;
498/** Pointer to an allocation chunk TLB. */
499typedef GMMCHUNKTLB *PGMMCHUNKTLB;
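
/*
 * Illustrative sketch (not part of the original file): how the direct-mapped
 * chunk TLB above is meant to be consulted before falling back to the AVL
 * tree of chunks.  The helper name is made up for this example and it takes
 * the TLB and tree root directly so the sketch stays self-contained; the
 * real lookup code in this file may differ in detail.
 */
#if 0 /* example only */
DECLINLINE(PGMMCHUNK) gmmR0ExampleChunkLookup(PGMMCHUNKTLB pTlb, PAVLU32NODECORE *ppChunkTree, uint32_t idChunk)
{
    /* Check the TLB entry the chunk ID hashes to. */
    PGMMCHUNKTLBE pTlbe = &pTlb->aEntries[GMM_CHUNKTLB_IDX(idChunk)];
    if (pTlbe->idChunk == idChunk)
        return pTlbe->pChunk;

    /* Miss: look the chunk up in the AVL tree (keyed by chunk ID) and refill the entry. */
    PGMMCHUNK pChunk = (PGMMCHUNK)RTAvlU32Get(ppChunkTree, idChunk);
    if (pChunk)
    {
        pTlbe->idChunk = idChunk;
        pTlbe->pChunk  = pChunk;
    }
    return pChunk;
}
#endif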
500
501
502/**
503 * The GMM instance data.
504 */
505typedef struct GMM
506{
507 /** Magic / eye catcher. GMM_MAGIC */
508 uint32_t u32Magic;
509 /** The number of threads waiting on the mutex. */
510 uint32_t cMtxContenders;
511#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
512 /** The critical section protecting the GMM.
513 * More fine grained locking can be implemented later if necessary. */
514 RTCRITSECT GiantCritSect;
515#else
516 /** The fast mutex protecting the GMM.
517 * More fine grained locking can be implemented later if necessary. */
518 RTSEMFASTMUTEX hMtx;
519#endif
520#ifdef VBOX_STRICT
521 /** The current mutex owner. */
522 RTNATIVETHREAD hMtxOwner;
523#endif
524 /** The chunk tree. */
525 PAVLU32NODECORE pChunks;
526 /** The chunk TLB. */
527 GMMCHUNKTLB ChunkTLB;
528 /** The private free set. */
529 GMMCHUNKFREESET PrivateX;
530 /** The shared free set. */
531 GMMCHUNKFREESET Shared;
532
533 /** Shared module tree (global).
534 * @todo separate trees for distinctly different guest OSes. */
535 PAVLLU32NODECORE pGlobalSharedModuleTree;
536 /** Sharable modules (count of nodes in pGlobalSharedModuleTree). */
537 uint32_t cShareableModules;
538
539 /** The chunk list. For simplifying the cleanup process and avoid tree
540 * traversal. */
541 RTLISTANCHOR ChunkList;
542
543 /** The maximum number of pages we're allowed to allocate.
544 * @gcfgm{GMM/MaxPages,64-bit, Direct.}
545 * @gcfgm{GMM/PctPages,32-bit, Relative to the number of host pages.} */
546 uint64_t cMaxPages;
547 /** The number of pages that have been reserved.
548 * The deal is that cReservedPages - cOverCommittedPages <= cMaxPages. */
549 uint64_t cReservedPages;
550 /** The number of pages that we have over-committed in reservations. */
551 uint64_t cOverCommittedPages;
552 /** The number of actually allocated (committed if you like) pages. */
553 uint64_t cAllocatedPages;
554 /** The number of pages that are shared. A subset of cAllocatedPages. */
555 uint64_t cSharedPages;
556 /** The number of pages that are actually shared between VMs. */
557 uint64_t cDuplicatePages;
558 /** The number of pages that are shared that have been left behind by
559 * VMs not doing proper cleanups. */
560 uint64_t cLeftBehindSharedPages;
561 /** The number of allocation chunks.
562 * (The number of pages we've allocated from the host can be derived from this.) */
563 uint32_t cChunks;
564 /** The number of current ballooned pages. */
565 uint64_t cBalloonedPages;
566
567#ifndef GMM_WITH_LEGACY_MODE
568# ifdef VBOX_WITH_LINEAR_HOST_PHYS_MEM
569 /** Whether #RTR0MemObjAllocPhysNC works. */
570 bool fHasWorkingAllocPhysNC;
571# else
572 bool fPadding;
573# endif
574#else
575 /** The legacy allocation mode indicator.
576 * This is determined at initialization time. */
577 bool fLegacyAllocationMode;
578#endif
579 /** The bound memory mode indicator.
580 * When set, the memory will be bound to a specific VM and never
581 * shared. This is always set if fLegacyAllocationMode is set.
582 * (Also determined at initialization time.) */
583 bool fBoundMemoryMode;
584 /** The number of registered VMs. */
585 uint16_t cRegisteredVMs;
586
587 /** The number of freed chunks ever. This is used as a list generation to
588 * avoid restarting the cleanup scanning when the list wasn't modified. */
589 uint32_t cFreedChunks;
590 /** The previous allocated Chunk ID.
591 * Used as a hint to avoid scanning the whole bitmap. */
592 uint32_t idChunkPrev;
593 /** Chunk ID allocation bitmap.
594 * Bits of allocated IDs are set, free ones are clear.
595 * The NIL id (0) is marked allocated. */
596 uint32_t bmChunkId[(GMM_CHUNKID_LAST + 1 + 31) / 32];
597
598 /** The index of the next mutex to use. */
599 uint32_t iNextChunkMtx;
600 /** Chunk locks for reducing lock contention without having to allocate
601 * one lock per chunk. */
602 struct
603 {
604 /** The mutex */
605 RTSEMFASTMUTEX hMtx;
606 /** The number of threads currently using this mutex. */
607 uint32_t volatile cUsers;
608 } aChunkMtx[64];
609} GMM;
610/** Pointer to the GMM instance. */
611typedef GMM *PGMM;
612
613/** The value of GMM::u32Magic (Katsuhiro Otomo). */
614#define GMM_MAGIC UINT32_C(0x19540414)
615
616
617/**
618 * GMM chunk mutex state.
619 *
620 * This is returned by gmmR0ChunkMutexAcquire and is used by the other
621 * gmmR0ChunkMutex* methods.
622 */
623typedef struct GMMR0CHUNKMTXSTATE
624{
625 PGMM pGMM;
626 /** The index of the chunk mutex. */
627 uint8_t iChunkMtx;
628 /** The relevant flags (GMMR0CHUNK_MTX_XXX). */
629 uint8_t fFlags;
630} GMMR0CHUNKMTXSTATE;
631/** Pointer to a chunk mutex state. */
632typedef GMMR0CHUNKMTXSTATE *PGMMR0CHUNKMTXSTATE;
633
634/** @name GMMR0CHUNK_MTX_XXX
635 * @{ */
636#define GMMR0CHUNK_MTX_INVALID UINT32_C(0)
637#define GMMR0CHUNK_MTX_KEEP_GIANT UINT32_C(1)
638#define GMMR0CHUNK_MTX_RETAKE_GIANT UINT32_C(2)
639#define GMMR0CHUNK_MTX_DROP_GIANT UINT32_C(3)
640#define GMMR0CHUNK_MTX_END UINT32_C(4)
641/** @} */
642
643
644/** The maximum number of shared modules per VM. */
645#define GMM_MAX_SHARED_PER_VM_MODULES 2048
646/** The maximum number of shared modules GMM is allowed to track. */
647#define GMM_MAX_SHARED_GLOBAL_MODULES 16834
648
649
650/**
651 * Argument packet for gmmR0SharedModuleCleanup.
652 */
653typedef struct GMMR0SHMODPERVMDTORARGS
654{
655 PGVM pGVM;
656 PGMM pGMM;
657} GMMR0SHMODPERVMDTORARGS;
658
659/**
660 * Argument packet for gmmR0CheckSharedModule.
661 */
662typedef struct GMMCHECKSHAREDMODULEINFO
663{
664 PGVM pGVM;
665 VMCPUID idCpu;
666} GMMCHECKSHAREDMODULEINFO;
667
668/**
669 * Argument packet for gmmR0FindDupPageInChunk by GMMR0FindDuplicatePage.
670 */
671typedef struct GMMFINDDUPPAGEINFO
672{
673 PGVM pGVM;
674 PGMM pGMM;
675 uint8_t *pSourcePage;
676 bool fFoundDuplicate;
677} GMMFINDDUPPAGEINFO;
678
679
680/*********************************************************************************************************************************
681* Global Variables *
682*********************************************************************************************************************************/
683/** Pointer to the GMM instance data. */
684static PGMM g_pGMM = NULL;
685
686/** Macro for obtaining and validating the g_pGMM pointer.
687 *
688 * On failure it will return from the invoking function with the specified
689 * return value.
690 *
691 * @param pGMM The name of the pGMM variable.
692 * @param rc The return value on failure. Use VERR_GMM_INSTANCE for VBox
693 * status codes.
694 */
695#define GMM_GET_VALID_INSTANCE(pGMM, rc) \
696 do { \
697 (pGMM) = g_pGMM; \
698 AssertPtrReturn((pGMM), (rc)); \
699 AssertMsgReturn((pGMM)->u32Magic == GMM_MAGIC, ("%p - %#x\n", (pGMM), (pGMM)->u32Magic), (rc)); \
700 } while (0)
701
702/** Macro for obtaining and validating the g_pGMM pointer, void function
703 * variant.
704 *
705 * On failure it will return from the invoking function.
706 *
707 * @param pGMM The name of the pGMM variable.
708 */
709#define GMM_GET_VALID_INSTANCE_VOID(pGMM) \
710 do { \
711 (pGMM) = g_pGMM; \
712 AssertPtrReturnVoid((pGMM)); \
713 AssertMsgReturnVoid((pGMM)->u32Magic == GMM_MAGIC, ("%p - %#x\n", (pGMM), (pGMM)->u32Magic)); \
714 } while (0)
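
/*
 * Usage sketch (not part of the original file): how a ring-0 entry point
 * typically starts out with the macros above.  The function name below is
 * made up for illustration.
 */
#if 0 /* example only */
GMMR0DECL(int) GMMR0ExampleOperation(PGVM pGVM)
{
    PGMM pGMM;
    GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE); /* returns from this function on a bad instance */
    NOREF(pGVM);

    gmmR0MutexAcquire(pGMM);
    /* ... the actual work, done under the giant GMM lock ... */
    gmmR0MutexRelease(pGMM);
    return VINF_SUCCESS;
}
#endif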
715
716
717/** @def GMM_CHECK_SANITY_UPON_ENTERING
718 * Checks the sanity of the GMM instance data before making changes.
719 *
720 * This macro is a stub by default and must be enabled manually in the code.
721 *
722 * @returns true if sane, false if not.
723 * @param pGMM The name of the pGMM variable.
724 */
725#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
726# define GMM_CHECK_SANITY_UPON_ENTERING(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
727#else
728# define GMM_CHECK_SANITY_UPON_ENTERING(pGMM) (true)
729#endif
730
731/** @def GMM_CHECK_SANITY_UPON_LEAVING
732 * Checks the sanity of the GMM instance data after making changes.
733 *
734 * This macro is a stub by default and must be enabled manually in the code.
735 *
736 * @returns true if sane, false if not.
737 * @param pGMM The name of the pGMM variable.
738 */
739#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
740# define GMM_CHECK_SANITY_UPON_LEAVING(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
741#else
742# define GMM_CHECK_SANITY_UPON_LEAVING(pGMM) (true)
743#endif
744
745/** @def GMM_CHECK_SANITY_IN_LOOPS
746 * Checks the sanity of the GMM instance in the allocation loops.
747 *
748 * This macro is a stub by default and must be enabled manually in the code.
749 *
750 * @returns true if sane, false if not.
751 * @param pGMM The name of the pGMM variable.
752 */
753#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
754# define GMM_CHECK_SANITY_IN_LOOPS(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
755#else
756# define GMM_CHECK_SANITY_IN_LOOPS(pGMM) (true)
757#endif
758
759
760/*********************************************************************************************************************************
761* Internal Functions *
762*********************************************************************************************************************************/
763static DECLCALLBACK(int) gmmR0TermDestroyChunk(PAVLU32NODECORE pNode, void *pvGMM);
764static bool gmmR0CleanupVMScanChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
765DECLINLINE(void) gmmR0UnlinkChunk(PGMMCHUNK pChunk);
766DECLINLINE(void) gmmR0LinkChunk(PGMMCHUNK pChunk, PGMMCHUNKFREESET pSet);
767DECLINLINE(void) gmmR0SelectSetAndLinkChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
768#ifdef GMMR0_WITH_SANITY_CHECK
769static uint32_t gmmR0SanityCheck(PGMM pGMM, const char *pszFunction, unsigned uLineNo);
770#endif
771static bool gmmR0FreeChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem);
772DECLINLINE(void) gmmR0FreePrivatePage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage);
773DECLINLINE(void) gmmR0FreeSharedPage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage);
774static int gmmR0UnmapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
775#ifdef VBOX_WITH_PAGE_SHARING
776static void gmmR0SharedModuleCleanup(PGMM pGMM, PGVM pGVM);
777# ifdef VBOX_STRICT
778static uint32_t gmmR0StrictPageChecksum(PGMM pGMM, PGVM pGVM, uint32_t idPage);
779# endif
780#endif
781
782
783
784/**
785 * Initializes the GMM component.
786 *
787 * This is called when the VMMR0.r0 module is loaded and protected by the
788 * loader semaphore.
789 *
790 * @returns VBox status code.
791 */
792GMMR0DECL(int) GMMR0Init(void)
793{
794 LogFlow(("GMMInit:\n"));
795
796 /*
797 * Allocate the instance data and the locks.
798 */
799 PGMM pGMM = (PGMM)RTMemAllocZ(sizeof(*pGMM));
800 if (!pGMM)
801 return VERR_NO_MEMORY;
802
803 pGMM->u32Magic = GMM_MAGIC;
804 for (unsigned i = 0; i < RT_ELEMENTS(pGMM->ChunkTLB.aEntries); i++)
805 pGMM->ChunkTLB.aEntries[i].idChunk = NIL_GMM_CHUNKID;
806 RTListInit(&pGMM->ChunkList);
807 ASMBitSet(&pGMM->bmChunkId[0], NIL_GMM_CHUNKID);
808
809#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
810 int rc = RTCritSectInit(&pGMM->GiantCritSect);
811#else
812 int rc = RTSemFastMutexCreate(&pGMM->hMtx);
813#endif
814 if (RT_SUCCESS(rc))
815 {
816 unsigned iMtx;
817 for (iMtx = 0; iMtx < RT_ELEMENTS(pGMM->aChunkMtx); iMtx++)
818 {
819 rc = RTSemFastMutexCreate(&pGMM->aChunkMtx[iMtx].hMtx);
820 if (RT_FAILURE(rc))
821 break;
822 }
823 if (RT_SUCCESS(rc))
824 {
825#ifndef GMM_WITH_LEGACY_MODE
826 /*
827 * Figure out how we're going to allocate stuff (only applicable to
828 * hosts with linear physical memory mappings).
829 */
830 pGMM->fBoundMemoryMode = false;
831# ifdef VBOX_WITH_LINEAR_HOST_PHYS_MEM
832 pGMM->fHasWorkingAllocPhysNC = false;
833
834 RTR0MEMOBJ hMemObj;
835 rc = RTR0MemObjAllocPhysNC(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS);
836 if (RT_SUCCESS(rc))
837 {
838 rc = RTR0MemObjFree(hMemObj, true);
839 AssertRC(rc);
840 pGMM->fHasWorkingAllocPhysNC = true;
841 }
842 else if (rc != VERR_NOT_SUPPORTED)
843 SUPR0Printf("GMMR0Init: Warning! RTR0MemObjAllocPhysNC(, %u, NIL_RTHCPHYS) -> %d!\n", GMM_CHUNK_SIZE, rc);
844# endif
845#else /* GMM_WITH_LEGACY_MODE */
846 /*
847 * Check and see if RTR0MemObjAllocPhysNC works.
848 */
849# if 0 /* later, see @bugref{3170}. */
850 RTR0MEMOBJ MemObj;
851 rc = RTR0MemObjAllocPhysNC(&MemObj, _64K, NIL_RTHCPHYS);
852 if (RT_SUCCESS(rc))
853 {
854 rc = RTR0MemObjFree(MemObj, true);
855 AssertRC(rc);
856 }
857 else if (rc == VERR_NOT_SUPPORTED)
858 pGMM->fLegacyAllocationMode = pGMM->fBoundMemoryMode = true;
859 else
860 SUPR0Printf("GMMR0Init: RTR0MemObjAllocPhysNC(,64K,Any) -> %d!\n", rc);
861# else
862# if defined(RT_OS_WINDOWS) || (defined(RT_OS_SOLARIS) && ARCH_BITS == 64) || defined(RT_OS_LINUX) || defined(RT_OS_FREEBSD)
863 pGMM->fLegacyAllocationMode = false;
864# if ARCH_BITS == 32
865 /* Don't reuse possibly partial chunks because of the virtual
866 address space limitation. */
867 pGMM->fBoundMemoryMode = true;
868# else
869 pGMM->fBoundMemoryMode = false;
870# endif
871# else
872 pGMM->fLegacyAllocationMode = true;
873 pGMM->fBoundMemoryMode = true;
874# endif
875# endif
876#endif /* GMM_WITH_LEGACY_MODE */
877
878 /*
879 * Query system page count and guess a reasonable cMaxPages value.
880 */
881 pGMM->cMaxPages = UINT32_MAX; /** @todo IPRT function for query ram size and such. */
882
883 g_pGMM = pGMM;
884#ifdef GMM_WITH_LEGACY_MODE
885 LogFlow(("GMMInit: pGMM=%p fLegacyAllocationMode=%RTbool fBoundMemoryMode=%RTbool\n", pGMM, pGMM->fLegacyAllocationMode, pGMM->fBoundMemoryMode));
886#elif defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)
887 LogFlow(("GMMInit: pGMM=%p fBoundMemoryMode=%RTbool fHasWorkingAllocPhysNC=%RTbool\n", pGMM, pGMM->fBoundMemoryMode, pGMM->fHasWorkingAllocPhysNC));
888#else
889 LogFlow(("GMMInit: pGMM=%p fBoundMemoryMode=%RTbool\n", pGMM, pGMM->fBoundMemoryMode));
890#endif
891 return VINF_SUCCESS;
892 }
893
894 /*
895 * Bail out.
896 */
897 while (iMtx-- > 0)
898 RTSemFastMutexDestroy(pGMM->aChunkMtx[iMtx].hMtx);
899#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
900 RTCritSectDelete(&pGMM->GiantCritSect);
901#else
902 RTSemFastMutexDestroy(pGMM->hMtx);
903#endif
904 }
905
906 pGMM->u32Magic = 0;
907 RTMemFree(pGMM);
908 SUPR0Printf("GMMR0Init: failed! rc=%d\n", rc);
909 return rc;
910}
911
912
913/**
914 * Terminates the GMM component.
915 */
916GMMR0DECL(void) GMMR0Term(void)
917{
918 LogFlow(("GMMTerm:\n"));
919
920 /*
921 * Take care / be paranoid...
922 */
923 PGMM pGMM = g_pGMM;
924 if (!VALID_PTR(pGMM))
925 return;
926 if (pGMM->u32Magic != GMM_MAGIC)
927 {
928 SUPR0Printf("GMMR0Term: u32Magic=%#x\n", pGMM->u32Magic);
929 return;
930 }
931
932 /*
933 * Undo what init did and free all the resources we've acquired.
934 */
935 /* Destroy the fundamentals. */
936 g_pGMM = NULL;
937 pGMM->u32Magic = ~GMM_MAGIC;
938#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
939 RTCritSectDelete(&pGMM->GiantCritSect);
940#else
941 RTSemFastMutexDestroy(pGMM->hMtx);
942 pGMM->hMtx = NIL_RTSEMFASTMUTEX;
943#endif
944
945 /* Free any chunks still hanging around. */
946 RTAvlU32Destroy(&pGMM->pChunks, gmmR0TermDestroyChunk, pGMM);
947
948 /* Destroy the chunk locks. */
949 for (unsigned iMtx = 0; iMtx < RT_ELEMENTS(pGMM->aChunkMtx); iMtx++)
950 {
951 Assert(pGMM->aChunkMtx[iMtx].cUsers == 0);
952 RTSemFastMutexDestroy(pGMM->aChunkMtx[iMtx].hMtx);
953 pGMM->aChunkMtx[iMtx].hMtx = NIL_RTSEMFASTMUTEX;
954 }
955
956 /* Finally the instance data itself. */
957 RTMemFree(pGMM);
958 LogFlow(("GMMTerm: done\n"));
959}
960
961
962/**
963 * RTAvlU32Destroy callback.
964 *
965 * @returns 0
966 * @param pNode The node to destroy.
967 * @param pvGMM The GMM handle.
968 */
969static DECLCALLBACK(int) gmmR0TermDestroyChunk(PAVLU32NODECORE pNode, void *pvGMM)
970{
971 PGMMCHUNK pChunk = (PGMMCHUNK)pNode;
972
973 if (pChunk->cFree != (GMM_CHUNK_SIZE >> PAGE_SHIFT))
974 SUPR0Printf("GMMR0Term: %RKv/%#x: cFree=%d cPrivate=%d cShared=%d cMappings=%d\n", pChunk,
975 pChunk->Core.Key, pChunk->cFree, pChunk->cPrivate, pChunk->cShared, pChunk->cMappingsX);
976
977 int rc = RTR0MemObjFree(pChunk->hMemObj, true /* fFreeMappings */);
978 if (RT_FAILURE(rc))
979 {
980 SUPR0Printf("GMMR0Term: %RKv/%#x: RTRMemObjFree(%RKv,true) -> %d (cMappings=%d)\n", pChunk,
981 pChunk->Core.Key, pChunk->hMemObj, rc, pChunk->cMappingsX);
982 AssertRC(rc);
983 }
984 pChunk->hMemObj = NIL_RTR0MEMOBJ;
985
986 RTMemFree(pChunk->paMappingsX);
987 pChunk->paMappingsX = NULL;
988
989 RTMemFree(pChunk);
990 NOREF(pvGMM);
991 return 0;
992}
993
994
995/**
996 * Initializes the per-VM data for the GMM.
997 *
998 * This is called from within the GVMM lock (from GVMMR0CreateVM)
999 * and should only initialize the data members so GMMR0CleanupVM
1000 * can deal with them. We reserve no memory or anything here,
1001 * that's done later in GMMR0InitVM.
1002 *
1003 * @param pGVM Pointer to the Global VM structure.
1004 */
1005GMMR0DECL(void) GMMR0InitPerVMData(PGVM pGVM)
1006{
1007 AssertCompile(RT_SIZEOFMEMB(GVM,gmm.s) <= RT_SIZEOFMEMB(GVM,gmm.padding));
1008
1009 pGVM->gmm.s.Stats.enmPolicy = GMMOCPOLICY_INVALID;
1010 pGVM->gmm.s.Stats.enmPriority = GMMPRIORITY_INVALID;
1011 pGVM->gmm.s.Stats.fMayAllocate = false;
1012}
1013
1014
1015/**
1016 * Acquires the GMM giant lock.
1017 *
1018 * @returns Assert status code from RTSemFastMutexRequest.
1019 * @param pGMM Pointer to the GMM instance.
1020 */
1021static int gmmR0MutexAcquire(PGMM pGMM)
1022{
1023 ASMAtomicIncU32(&pGMM->cMtxContenders);
1024#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1025 int rc = RTCritSectEnter(&pGMM->GiantCritSect);
1026#else
1027 int rc = RTSemFastMutexRequest(pGMM->hMtx);
1028#endif
1029 ASMAtomicDecU32(&pGMM->cMtxContenders);
1030 AssertRC(rc);
1031#ifdef VBOX_STRICT
1032 pGMM->hMtxOwner = RTThreadNativeSelf();
1033#endif
1034 return rc;
1035}
1036
1037
1038/**
1039 * Releases the GMM giant lock.
1040 *
1041 * @returns Assert status code from RTSemFastMutexRequest.
1042 * @param pGMM Pointer to the GMM instance.
1043 */
1044static int gmmR0MutexRelease(PGMM pGMM)
1045{
1046#ifdef VBOX_STRICT
1047 pGMM->hMtxOwner = NIL_RTNATIVETHREAD;
1048#endif
1049#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1050 int rc = RTCritSectLeave(&pGMM->GiantCritSect);
1051#else
1052 int rc = RTSemFastMutexRelease(pGMM->hMtx);
1053 AssertRC(rc);
1054#endif
1055 return rc;
1056}
1057
1058
1059/**
1060 * Yields the GMM giant lock if there is contention and a certain minimum time
1061 * has elapsed since we took it.
1062 *
1063 * @returns @c true if the mutex was yielded, @c false if not.
1064 * @param pGMM Pointer to the GMM instance.
1065 * @param puLockNanoTS Where the lock acquisition time stamp is kept
1066 * (in/out).
1067 */
1068static bool gmmR0MutexYield(PGMM pGMM, uint64_t *puLockNanoTS)
1069{
1070 /*
1071 * If nobody is contending the mutex, don't bother checking the time.
1072 */
1073 if (ASMAtomicReadU32(&pGMM->cMtxContenders) == 0)
1074 return false;
1075
1076 /*
1077 * Don't yield if we haven't executed for at least 2 milliseconds.
1078 */
1079 uint64_t uNanoNow = RTTimeSystemNanoTS();
1080 if (uNanoNow - *puLockNanoTS < UINT32_C(2000000))
1081 return false;
1082
1083 /*
1084 * Yield the mutex.
1085 */
1086#ifdef VBOX_STRICT
1087 pGMM->hMtxOwner = NIL_RTNATIVETHREAD;
1088#endif
1089 ASMAtomicIncU32(&pGMM->cMtxContenders);
1090#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1091 int rc1 = RTCritSectLeave(&pGMM->GiantCritSect); AssertRC(rc1);
1092#else
1093 int rc1 = RTSemFastMutexRelease(pGMM->hMtx); AssertRC(rc1);
1094#endif
1095
1096 RTThreadYield();
1097
1098#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1099 int rc2 = RTCritSectEnter(&pGMM->GiantCritSect); AssertRC(rc2);
1100#else
1101 int rc2 = RTSemFastMutexRequest(pGMM->hMtx); AssertRC(rc2);
1102#endif
1103 *puLockNanoTS = RTTimeSystemNanoTS();
1104 ASMAtomicDecU32(&pGMM->cMtxContenders);
1105#ifdef VBOX_STRICT
1106 pGMM->hMtxOwner = RTThreadNativeSelf();
1107#endif
1108
1109 return true;
1110}
1111
1112
1113/**
1114 * Acquires a chunk lock.
1115 *
1116 * The caller must own the giant lock.
1117 *
1118 * @returns Assert status code from RTSemFastMutexRequest.
1119 * @param pMtxState The chunk mutex state info. (Avoids
1120 * passing the same flags and stuff around
1121 * for subsequent release and drop-giant
1122 * calls.)
1123 * @param pGMM Pointer to the GMM instance.
1124 * @param pChunk Pointer to the chunk.
1125 * @param fFlags Flags regarding the giant lock, GMMR0CHUNK_MTX_XXX.
1126 */
1127static int gmmR0ChunkMutexAcquire(PGMMR0CHUNKMTXSTATE pMtxState, PGMM pGMM, PGMMCHUNK pChunk, uint32_t fFlags)
1128{
1129 Assert(fFlags > GMMR0CHUNK_MTX_INVALID && fFlags < GMMR0CHUNK_MTX_END);
1130 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
1131
1132 pMtxState->pGMM = pGMM;
1133 pMtxState->fFlags = (uint8_t)fFlags;
1134
1135 /*
1136 * Get the lock index and reference the lock.
1137 */
1138 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
1139 uint32_t iChunkMtx = pChunk->iChunkMtx;
1140 if (iChunkMtx == UINT8_MAX)
1141 {
1142 iChunkMtx = pGMM->iNextChunkMtx++;
1143 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1144
1145 /* Try get an unused one... */
1146 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1147 {
1148 iChunkMtx = pGMM->iNextChunkMtx++;
1149 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1150 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1151 {
1152 iChunkMtx = pGMM->iNextChunkMtx++;
1153 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1154 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1155 {
1156 iChunkMtx = pGMM->iNextChunkMtx++;
1157 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1158 }
1159 }
1160 }
1161
1162 pChunk->iChunkMtx = iChunkMtx;
1163 }
1164 AssertCompile(RT_ELEMENTS(pGMM->aChunkMtx) < UINT8_MAX);
1165 pMtxState->iChunkMtx = (uint8_t)iChunkMtx;
1166 ASMAtomicIncU32(&pGMM->aChunkMtx[iChunkMtx].cUsers);
1167
1168 /*
1169 * Drop the giant?
1170 */
1171 if (fFlags != GMMR0CHUNK_MTX_KEEP_GIANT)
1172 {
1173 /** @todo GMM life cycle cleanup (we may race someone
1174 * destroying and cleaning up GMM)? */
1175 gmmR0MutexRelease(pGMM);
1176 }
1177
1178 /*
1179 * Take the chunk mutex.
1180 */
1181 int rc = RTSemFastMutexRequest(pGMM->aChunkMtx[iChunkMtx].hMtx);
1182 AssertRC(rc);
1183 return rc;
1184}
1185
1186
1187/**
1188 * Releases a chunk mutex acquired by gmmR0ChunkMutexAcquire, retaking the giant lock if requested.
1189 *
1190 * @returns Assert status code from RTSemFastMutexRequest.
1191 * @param pMtxState Pointer to the chunk mutex state.
1192 * @param pChunk Pointer to the chunk if it's still
1193 * alive, NULL if it isn't. This is used to deassociate
1194 * the chunk from the mutex on the way out so a new one
1195 * can be selected next time, thus avoiding contended
1196 * mutexes.
1197 */
1198static int gmmR0ChunkMutexRelease(PGMMR0CHUNKMTXSTATE pMtxState, PGMMCHUNK pChunk)
1199{
1200 PGMM pGMM = pMtxState->pGMM;
1201
1202 /*
1203 * Release the chunk mutex and reacquire the giant if requested.
1204 */
1205 int rc = RTSemFastMutexRelease(pGMM->aChunkMtx[pMtxState->iChunkMtx].hMtx);
1206 AssertRC(rc);
1207 if (pMtxState->fFlags == GMMR0CHUNK_MTX_RETAKE_GIANT)
1208 rc = gmmR0MutexAcquire(pGMM);
1209 else
1210 Assert((pMtxState->fFlags != GMMR0CHUNK_MTX_DROP_GIANT) == (pGMM->hMtxOwner == RTThreadNativeSelf()));
1211
1212 /*
1213 * Drop the chunk mutex user reference and deassociate it from the chunk
1214 * when possible.
1215 */
1216 if ( ASMAtomicDecU32(&pGMM->aChunkMtx[pMtxState->iChunkMtx].cUsers) == 0
1217 && pChunk
1218 && RT_SUCCESS(rc) )
1219 {
1220 if (pMtxState->fFlags != GMMR0CHUNK_MTX_DROP_GIANT)
1221 pChunk->iChunkMtx = UINT8_MAX;
1222 else
1223 {
1224 rc = gmmR0MutexAcquire(pGMM);
1225 if (RT_SUCCESS(rc))
1226 {
1227 if (pGMM->aChunkMtx[pMtxState->iChunkMtx].cUsers == 0)
1228 pChunk->iChunkMtx = UINT8_MAX;
1229 rc = gmmR0MutexRelease(pGMM);
1230 }
1231 }
1232 }
1233
1234 pMtxState->pGMM = NULL;
1235 return rc;
1236}
1237
1238
1239/**
1240 * Drops the giant GMM lock we kept in gmmR0ChunkMutexAcquire while keeping the
1241 * chunk locked.
1242 *
1243 * This only works if gmmR0ChunkMutexAcquire was called with
1244 * GMMR0CHUNK_MTX_KEEP_GIANT. gmmR0ChunkMutexRelease will retake the giant
1245 * mutex, i.e. behave as if GMMR0CHUNK_MTX_RETAKE_GIANT was used.
1246 *
1247 * @returns VBox status code (assuming success is ok).
1248 * @param pMtxState Pointer to the chunk mutex state.
1249 */
1250static int gmmR0ChunkMutexDropGiant(PGMMR0CHUNKMTXSTATE pMtxState)
1251{
1252 AssertReturn(pMtxState->fFlags == GMMR0CHUNK_MTX_KEEP_GIANT, VERR_GMM_MTX_FLAGS);
1253 Assert(pMtxState->pGMM->hMtxOwner == RTThreadNativeSelf());
1254 pMtxState->fFlags = GMMR0CHUNK_MTX_RETAKE_GIANT;
1255 /** @todo GMM life cycle cleanup (we may race someone
1256 * destroying and cleaning up GMM)? */
1257 return gmmR0MutexRelease(pMtxState->pGMM);
1258}
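
/*
 * Usage sketch (not part of the original file): the typical pattern for the
 * chunk mutex helpers above when the caller wants to keep a chunk locked
 * while letting go of the giant lock.  The function name and parameters are
 * made up for illustration.
 */
#if 0 /* example only */
static void gmmR0ExampleChunkWork(PGMM pGMM, PGMMCHUNK pChunk)
{
    GMMR0CHUNKMTXSTATE MtxState;
    gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
    /* ... updates that need both the giant and the chunk lock ... */
    gmmR0ChunkMutexDropGiant(&MtxState);       /* the chunk stays locked, the giant is released */
    /* ... slower work that only needs the chunk lock, e.g. freeing a mapping ... */
    gmmR0ChunkMutexRelease(&MtxState, pChunk); /* retakes the giant (RETAKE semantics) before returning */
}
#endif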
1259
1260
1261/**
1262 * For experimenting with NUMA affinity and such.
1263 *
1264 * @returns The current NUMA Node ID.
1265 */
1266static uint16_t gmmR0GetCurrentNumaNodeId(void)
1267{
1268#if 1
1269 return GMM_CHUNK_NUMA_ID_UNKNOWN;
1270#else
1271 return RTMpCpuId() / 16;
1272#endif
1273}
1274
1275
1276
1277/**
1278 * Cleans up when a VM is terminating.
1279 *
1280 * @param pGVM Pointer to the Global VM structure.
1281 */
1282GMMR0DECL(void) GMMR0CleanupVM(PGVM pGVM)
1283{
1284 LogFlow(("GMMR0CleanupVM: pGVM=%p:{.hSelf=%#x}\n", pGVM, pGVM->hSelf));
1285
1286 PGMM pGMM;
1287 GMM_GET_VALID_INSTANCE_VOID(pGMM);
1288
1289#ifdef VBOX_WITH_PAGE_SHARING
1290 /*
1291 * Clean up all registered shared modules first.
1292 */
1293 gmmR0SharedModuleCleanup(pGMM, pGVM);
1294#endif
1295
1296 gmmR0MutexAcquire(pGMM);
1297 uint64_t uLockNanoTS = RTTimeSystemNanoTS();
1298 GMM_CHECK_SANITY_UPON_ENTERING(pGMM);
1299
1300 /*
1301 * The policy is 'INVALID' until the initial reservation
1302 * request has been serviced.
1303 */
1304 if ( pGVM->gmm.s.Stats.enmPolicy > GMMOCPOLICY_INVALID
1305 && pGVM->gmm.s.Stats.enmPolicy < GMMOCPOLICY_END)
1306 {
1307 /*
1308 * If it's the last VM around, we can skip walking all the chunks looking
1309 * for the pages owned by this VM and instead flush the whole shebang.
1310 *
1311 * This takes care of the eventuality that a VM has left shared page
1312 * references behind (shouldn't happen of course, but you never know).
1313 */
1314 Assert(pGMM->cRegisteredVMs);
1315 pGMM->cRegisteredVMs--;
1316
1317 /*
1318 * Walk the entire pool looking for pages that belong to this VM
1319 * and leftover mappings. (This'll only catch private pages,
1320 * shared pages will be 'left behind'.)
1321 */
1322 /** @todo r=bird: This scanning+freeing could be optimized in bound mode! */
1323 uint64_t cPrivatePages = pGVM->gmm.s.Stats.cPrivatePages; /* save */
1324
1325 unsigned iCountDown = 64;
1326 bool fRedoFromStart;
1327 PGMMCHUNK pChunk;
1328 do
1329 {
1330 fRedoFromStart = false;
1331 RTListForEachReverse(&pGMM->ChunkList, pChunk, GMMCHUNK, ListNode)
1332 {
1333 uint32_t const cFreeChunksOld = pGMM->cFreedChunks;
1334 if ( ( !pGMM->fBoundMemoryMode
1335 || pChunk->hGVM == pGVM->hSelf)
1336 && gmmR0CleanupVMScanChunk(pGMM, pGVM, pChunk))
1337 {
1338 /* We left the giant mutex, so reset the yield counters. */
1339 uLockNanoTS = RTTimeSystemNanoTS();
1340 iCountDown = 64;
1341 }
1342 else
1343 {
1344 /* Didn't leave it, so do normal yielding. */
1345 if (!iCountDown)
1346 gmmR0MutexYield(pGMM, &uLockNanoTS);
1347 else
1348 iCountDown--;
1349 }
1350 if (pGMM->cFreedChunks != cFreeChunksOld)
1351 {
1352 fRedoFromStart = true;
1353 break;
1354 }
1355 }
1356 } while (fRedoFromStart);
1357
1358 if (pGVM->gmm.s.Stats.cPrivatePages)
1359 SUPR0Printf("GMMR0CleanupVM: hGVM=%#x has %#x private pages that cannot be found!\n", pGVM->hSelf, pGVM->gmm.s.Stats.cPrivatePages);
1360
1361 pGMM->cAllocatedPages -= cPrivatePages;
1362
1363 /*
1364 * Free empty chunks.
1365 */
1366 PGMMCHUNKFREESET pPrivateSet = pGMM->fBoundMemoryMode ? &pGVM->gmm.s.Private : &pGMM->PrivateX;
1367 do
1368 {
1369 fRedoFromStart = false;
1370 iCountDown = 10240;
1371 pChunk = pPrivateSet->apLists[GMM_CHUNK_FREE_SET_UNUSED_LIST];
1372 while (pChunk)
1373 {
1374 PGMMCHUNK pNext = pChunk->pFreeNext;
1375 Assert(pChunk->cFree == GMM_CHUNK_NUM_PAGES);
1376 if ( !pGMM->fBoundMemoryMode
1377 || pChunk->hGVM == pGVM->hSelf)
1378 {
1379 uint64_t const idGenerationOld = pPrivateSet->idGeneration;
1380 if (gmmR0FreeChunk(pGMM, pGVM, pChunk, true /*fRelaxedSem*/))
1381 {
1382 /* We've left the giant mutex, restart? (+1 for our unlink) */
1383 fRedoFromStart = pPrivateSet->idGeneration != idGenerationOld + 1;
1384 if (fRedoFromStart)
1385 break;
1386 uLockNanoTS = RTTimeSystemNanoTS();
1387 iCountDown = 10240;
1388 }
1389 }
1390
1391 /* Advance and maybe yield the lock. */
1392 pChunk = pNext;
1393 if (--iCountDown == 0)
1394 {
1395 uint64_t const idGenerationOld = pPrivateSet->idGeneration;
1396 fRedoFromStart = gmmR0MutexYield(pGMM, &uLockNanoTS)
1397 && pPrivateSet->idGeneration != idGenerationOld;
1398 if (fRedoFromStart)
1399 break;
1400 iCountDown = 10240;
1401 }
1402 }
1403 } while (fRedoFromStart);
1404
1405 /*
1406 * Account for shared pages that weren't freed.
1407 */
1408 if (pGVM->gmm.s.Stats.cSharedPages)
1409 {
1410 Assert(pGMM->cSharedPages >= pGVM->gmm.s.Stats.cSharedPages);
1411 SUPR0Printf("GMMR0CleanupVM: hGVM=%#x left %#x shared pages behind!\n", pGVM->hSelf, pGVM->gmm.s.Stats.cSharedPages);
1412 pGMM->cLeftBehindSharedPages += pGVM->gmm.s.Stats.cSharedPages;
1413 }
1414
1415 /*
1416 * Clean up balloon statistics in case the VM process crashed.
1417 */
1418 Assert(pGMM->cBalloonedPages >= pGVM->gmm.s.Stats.cBalloonedPages);
1419 pGMM->cBalloonedPages -= pGVM->gmm.s.Stats.cBalloonedPages;
1420
1421 /*
1422 * Update the over-commitment management statistics.
1423 */
1424 pGMM->cReservedPages -= pGVM->gmm.s.Stats.Reserved.cBasePages
1425 + pGVM->gmm.s.Stats.Reserved.cFixedPages
1426 + pGVM->gmm.s.Stats.Reserved.cShadowPages;
1427 switch (pGVM->gmm.s.Stats.enmPolicy)
1428 {
1429 case GMMOCPOLICY_NO_OC:
1430 break;
1431 default:
1432 /** @todo Update GMM->cOverCommittedPages */
1433 break;
1434 }
1435 }
1436
1437 /* zap the GVM data. */
1438 pGVM->gmm.s.Stats.enmPolicy = GMMOCPOLICY_INVALID;
1439 pGVM->gmm.s.Stats.enmPriority = GMMPRIORITY_INVALID;
1440 pGVM->gmm.s.Stats.fMayAllocate = false;
1441
1442 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1443 gmmR0MutexRelease(pGMM);
1444
1445 LogFlow(("GMMR0CleanupVM: returns\n"));
1446}
1447
1448
1449/**
1450 * Scan one chunk for private pages belonging to the specified VM.
1451 *
1452 * @note This function may drop the giant mutex!
1453 *
1454 * @returns @c true if we've temporarily dropped the giant mutex, @c false if
1455 * we didn't.
1456 * @param pGMM Pointer to the GMM instance.
1457 * @param pGVM The global VM handle.
1458 * @param pChunk The chunk to scan.
1459 */
1460static bool gmmR0CleanupVMScanChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
1461{
1462 Assert(!pGMM->fBoundMemoryMode || pChunk->hGVM == pGVM->hSelf);
1463
1464 /*
1465 * Look for pages belonging to the VM.
1466 * (Perform some internal checks while we're scanning.)
1467 */
1468#ifndef VBOX_STRICT
1469 if (pChunk->cFree != (GMM_CHUNK_SIZE >> PAGE_SHIFT))
1470#endif
1471 {
1472 unsigned cPrivate = 0;
1473 unsigned cShared = 0;
1474 unsigned cFree = 0;
1475
1476 gmmR0UnlinkChunk(pChunk); /* avoiding cFreePages updates. */
1477
1478 uint16_t hGVM = pGVM->hSelf;
1479 unsigned iPage = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
1480 while (iPage-- > 0)
1481 if (GMM_PAGE_IS_PRIVATE(&pChunk->aPages[iPage]))
1482 {
1483 if (pChunk->aPages[iPage].Private.hGVM == hGVM)
1484 {
1485 /*
1486 * Free the page.
1487 *
1488 * The reason for not using gmmR0FreePrivatePage here is that we
1489 * must *not* cause the chunk to be freed from under us - we're in
1490 * an AVL tree walk here.
1491 */
1492 pChunk->aPages[iPage].u = 0;
1493 pChunk->aPages[iPage].Free.iNext = pChunk->iFreeHead;
1494 pChunk->aPages[iPage].Free.u2State = GMM_PAGE_STATE_FREE;
1495 pChunk->iFreeHead = iPage;
1496 pChunk->cPrivate--;
1497 pChunk->cFree++;
1498 pGVM->gmm.s.Stats.cPrivatePages--;
1499 cFree++;
1500 }
1501 else
1502 cPrivate++;
1503 }
1504 else if (GMM_PAGE_IS_FREE(&pChunk->aPages[iPage]))
1505 cFree++;
1506 else
1507 cShared++;
1508
1509 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
1510
1511 /*
1512 * Did it add up?
1513 */
1514 if (RT_UNLIKELY( pChunk->cFree != cFree
1515 || pChunk->cPrivate != cPrivate
1516 || pChunk->cShared != cShared))
1517 {
1518 SUPR0Printf("gmmR0CleanupVMScanChunk: Chunk %RKv/%#x has bogus stats - free=%d/%d private=%d/%d shared=%d/%d\n",
1519 pChunk, pChunk->Core.Key, pChunk->cFree, cFree, pChunk->cPrivate, cPrivate, pChunk->cShared, cShared);
1520 pChunk->cFree = cFree;
1521 pChunk->cPrivate = cPrivate;
1522 pChunk->cShared = cShared;
1523 }
1524 }
1525
1526 /*
1527 * If not in bound memory mode, we should reset the hGVM field
1528 * if it has our handle in it.
1529 */
1530 if (pChunk->hGVM == pGVM->hSelf)
1531 {
1532 if (!g_pGMM->fBoundMemoryMode)
1533 pChunk->hGVM = NIL_GVM_HANDLE;
1534 else if (pChunk->cFree != GMM_CHUNK_NUM_PAGES)
1535 {
1536 SUPR0Printf("gmmR0CleanupVMScanChunk: %RKv/%#x: cFree=%#x - it should be 0 in bound mode!\n",
1537 pChunk, pChunk->Core.Key, pChunk->cFree);
1538 AssertMsgFailed(("%p/%#x: cFree=%#x - it should be 0 in bound mode!\n", pChunk, pChunk->Core.Key, pChunk->cFree));
1539
1540 gmmR0UnlinkChunk(pChunk);
1541 pChunk->cFree = GMM_CHUNK_NUM_PAGES;
1542 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
1543 }
1544 }
1545
1546 /*
1547 * Look for a mapping belonging to the terminating VM.
1548 */
1549 GMMR0CHUNKMTXSTATE MtxState;
1550 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
1551 unsigned cMappings = pChunk->cMappingsX;
1552 for (unsigned i = 0; i < cMappings; i++)
1553 if (pChunk->paMappingsX[i].pGVM == pGVM)
1554 {
1555 gmmR0ChunkMutexDropGiant(&MtxState);
1556
1557 RTR0MEMOBJ hMemObj = pChunk->paMappingsX[i].hMapObj;
1558
1559 cMappings--;
1560 if (i < cMappings)
1561 pChunk->paMappingsX[i] = pChunk->paMappingsX[cMappings];
1562 pChunk->paMappingsX[cMappings].pGVM = NULL;
1563 pChunk->paMappingsX[cMappings].hMapObj = NIL_RTR0MEMOBJ;
1564 Assert(pChunk->cMappingsX - 1U == cMappings);
1565 pChunk->cMappingsX = cMappings;
1566
1567 int rc = RTR0MemObjFree(hMemObj, false /* fFreeMappings (NA) */);
1568 if (RT_FAILURE(rc))
1569 {
1570 SUPR0Printf("gmmR0CleanupVMScanChunk: %RKv/%#x: mapping #%x: RTRMemObjFree(%RKv,false) -> %d \n",
1571 pChunk, pChunk->Core.Key, i, hMemObj, rc);
1572 AssertRC(rc);
1573 }
1574
1575 gmmR0ChunkMutexRelease(&MtxState, pChunk);
1576 return true;
1577 }
1578
1579 gmmR0ChunkMutexRelease(&MtxState, pChunk);
1580 return false;
1581}
1582
1583
1584/**
1585 * The initial resource reservations.
1586 *
1587 * This will make memory reservations according to policy and priority. If there aren't
1588 * sufficient resources available to sustain the VM, this function will fail and all
1589 * future allocation requests will fail as well.
1590 *
1591 * These are just the initial reservations made very very early during the VM creation
1592 * process and will be adjusted later in the GMMR0UpdateReservation call after the
1593 * ring-3 init has completed.
1594 *
1595 * @returns VBox status code.
1596 * @retval VERR_GMM_MEMORY_RESERVATION_DECLINED
1597 * @retval VERR_GMM_
1598 *
1599 * @param pGVM The global (ring-0) VM structure.
1600 * @param idCpu The VCPU id - must be zero.
1601 * @param cBasePages The number of pages that may be allocated for the base RAM and ROMs.
1602 * This does not include MMIO2 and similar.
1603 * @param cShadowPages The number of pages that may be allocated for shadow paging structures.
1604 * @param cFixedPages The number of pages that may be allocated for fixed objects like the
1605 * hyper heap, MMIO2 and similar.
1606 * @param enmPolicy The OC policy to use on this VM.
1607 * @param enmPriority The priority in an out-of-memory situation.
1608 *
1609 * @thread The creator thread / EMT(0).
1610 */
1611GMMR0DECL(int) GMMR0InitialReservation(PGVM pGVM, VMCPUID idCpu, uint64_t cBasePages, uint32_t cShadowPages,
1612 uint32_t cFixedPages, GMMOCPOLICY enmPolicy, GMMPRIORITY enmPriority)
1613{
1614 LogFlow(("GMMR0InitialReservation: pGVM=%p cBasePages=%#llx cShadowPages=%#x cFixedPages=%#x enmPolicy=%d enmPriority=%d\n",
1615 pGVM, cBasePages, cShadowPages, cFixedPages, enmPolicy, enmPriority));
1616
1617 /*
1618 * Validate, get basics and take the semaphore.
1619 */
1620 AssertReturn(idCpu == 0, VERR_INVALID_CPU_ID);
1621 PGMM pGMM;
1622 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
1623 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
1624 if (RT_FAILURE(rc))
1625 return rc;
1626
1627 AssertReturn(cBasePages, VERR_INVALID_PARAMETER);
1628 AssertReturn(cShadowPages, VERR_INVALID_PARAMETER);
1629 AssertReturn(cFixedPages, VERR_INVALID_PARAMETER);
1630 AssertReturn(enmPolicy > GMMOCPOLICY_INVALID && enmPolicy < GMMOCPOLICY_END, VERR_INVALID_PARAMETER);
1631 AssertReturn(enmPriority > GMMPRIORITY_INVALID && enmPriority < GMMPRIORITY_END, VERR_INVALID_PARAMETER);
1632
1633 gmmR0MutexAcquire(pGMM);
1634 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
1635 {
1636 if ( !pGVM->gmm.s.Stats.Reserved.cBasePages
1637 && !pGVM->gmm.s.Stats.Reserved.cFixedPages
1638 && !pGVM->gmm.s.Stats.Reserved.cShadowPages)
1639 {
1640 /*
1641 * Check if we can accommodate this.
1642 */
1643 /* ... later ... */
1644 if (RT_SUCCESS(rc))
1645 {
1646 /*
1647 * Update the records.
1648 */
1649 pGVM->gmm.s.Stats.Reserved.cBasePages = cBasePages;
1650 pGVM->gmm.s.Stats.Reserved.cFixedPages = cFixedPages;
1651 pGVM->gmm.s.Stats.Reserved.cShadowPages = cShadowPages;
1652 pGVM->gmm.s.Stats.enmPolicy = enmPolicy;
1653 pGVM->gmm.s.Stats.enmPriority = enmPriority;
1654 pGVM->gmm.s.Stats.fMayAllocate = true;
1655
1656 pGMM->cReservedPages += cBasePages + cFixedPages + cShadowPages;
1657 pGMM->cRegisteredVMs++;
1658 }
1659 }
1660 else
1661 rc = VERR_WRONG_ORDER;
1662 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1663 }
1664 else
1665 rc = VERR_GMM_IS_NOT_SANE;
1666 gmmR0MutexRelease(pGMM);
1667 LogFlow(("GMMR0InitialReservation: returns %Rrc\n", rc));
1668 return rc;
1669}
1670
1671
1672/**
1673 * VMMR0 request wrapper for GMMR0InitialReservation.
1674 *
1675 * @returns see GMMR0InitialReservation.
1676 * @param pGVM The global (ring-0) VM structure.
1677 * @param idCpu The VCPU id.
1678 * @param pReq Pointer to the request packet.
1679 */
1680GMMR0DECL(int) GMMR0InitialReservationReq(PGVM pGVM, VMCPUID idCpu, PGMMINITIALRESERVATIONREQ pReq)
1681{
1682 /*
1683 * Validate input and pass it on.
1684 */
1685 AssertPtrReturn(pGVM, VERR_INVALID_POINTER);
1686 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
1687 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
1688
1689 return GMMR0InitialReservation(pGVM, idCpu, pReq->cBasePages, pReq->cShadowPages,
1690 pReq->cFixedPages, pReq->enmPolicy, pReq->enmPriority);
1691}
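/*
 * Illustrative sketch only (not part of the build): how the ring-3 side might
 * fill in the request handled by the wrapper above.  The cbGuestRam, cShadow
 * and cFixed inputs as well as the GMMOCPOLICY_NO_OC / GMMPRIORITY_NORMAL
 * values are assumptions made for the example, not taken from this file.
 *
 *      GMMINITIALRESERVATIONREQ Req;
 *      Req.Hdr.cbReq    = sizeof(Req);                 // checked by the wrapper
 *      Req.cBasePages   = cbGuestRam >> PAGE_SHIFT;    // base RAM + ROMs, no MMIO2
 *      Req.cShadowPages = cShadow;                     // shadow paging structures
 *      Req.cFixedPages  = cFixed;                      // hyper heap, MMIO2 and similar
 *      Req.enmPolicy    = GMMOCPOLICY_NO_OC;           // assumed policy value
 *      Req.enmPriority  = GMMPRIORITY_NORMAL;          // assumed priority value
 *      int rc = GMMR0InitialReservationReq(pGVM, 0 /*idCpu*/, &Req);
 */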
1692
1693
1694/**
1695 * This updates the memory reservation with the additional MMIO2 and ROM pages.
1696 *
1697 * @returns VBox status code.
1698 * @retval VERR_GMM_MEMORY_RESERVATION_DECLINED
1699 *
1700 * @param pGVM The global (ring-0) VM structure.
1701 * @param idCpu The VCPU id.
1702 * @param cBasePages The number of pages that may be allocated for the base RAM and ROMs.
1703 * This does not include MMIO2 and similar.
1704 * @param cShadowPages The number of pages that may be allocated for shadow paging structures.
1705 * @param cFixedPages The number of pages that may be allocated for fixed objects like the
1706 * hyper heap, MMIO2 and similar.
1707 *
1708 * @thread EMT(idCpu)
1709 */
1710GMMR0DECL(int) GMMR0UpdateReservation(PGVM pGVM, VMCPUID idCpu, uint64_t cBasePages,
1711 uint32_t cShadowPages, uint32_t cFixedPages)
1712{
1713 LogFlow(("GMMR0UpdateReservation: pGVM=%p cBasePages=%#llx cShadowPages=%#x cFixedPages=%#x\n",
1714 pGVM, cBasePages, cShadowPages, cFixedPages));
1715
1716 /*
1717 * Validate, get basics and take the semaphore.
1718 */
1719 PGMM pGMM;
1720 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
1721 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
1722 if (RT_FAILURE(rc))
1723 return rc;
1724
1725 AssertReturn(cBasePages, VERR_INVALID_PARAMETER);
1726 AssertReturn(cShadowPages, VERR_INVALID_PARAMETER);
1727 AssertReturn(cFixedPages, VERR_INVALID_PARAMETER);
1728
1729 gmmR0MutexAcquire(pGMM);
1730 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
1731 {
1732 if ( pGVM->gmm.s.Stats.Reserved.cBasePages
1733 && pGVM->gmm.s.Stats.Reserved.cFixedPages
1734 && pGVM->gmm.s.Stats.Reserved.cShadowPages)
1735 {
1736 /*
1737 * Check if we can accommodate this.
1738 */
1739 /* ... later ... */
1740 if (RT_SUCCESS(rc))
1741 {
1742 /*
1743 * Update the records.
1744 */
1745 pGMM->cReservedPages -= pGVM->gmm.s.Stats.Reserved.cBasePages
1746 + pGVM->gmm.s.Stats.Reserved.cFixedPages
1747 + pGVM->gmm.s.Stats.Reserved.cShadowPages;
1748 pGMM->cReservedPages += cBasePages + cFixedPages + cShadowPages;
1749
1750 pGVM->gmm.s.Stats.Reserved.cBasePages = cBasePages;
1751 pGVM->gmm.s.Stats.Reserved.cFixedPages = cFixedPages;
1752 pGVM->gmm.s.Stats.Reserved.cShadowPages = cShadowPages;
1753 }
1754 }
1755 else
1756 rc = VERR_WRONG_ORDER;
1757 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1758 }
1759 else
1760 rc = VERR_GMM_IS_NOT_SANE;
1761 gmmR0MutexRelease(pGMM);
1762 LogFlow(("GMMR0UpdateReservation: returns %Rrc\n", rc));
1763 return rc;
1764}
1765
1766
1767/**
1768 * VMMR0 request wrapper for GMMR0UpdateReservation.
1769 *
1770 * @returns see GMMR0UpdateReservation.
1771 * @param pGVM The global (ring-0) VM structure.
1772 * @param idCpu The VCPU id.
1773 * @param pReq Pointer to the request packet.
1774 */
1775GMMR0DECL(int) GMMR0UpdateReservationReq(PGVM pGVM, VMCPUID idCpu, PGMMUPDATERESERVATIONREQ pReq)
1776{
1777 /*
1778 * Validate input and pass it on.
1779 */
1780 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
1781 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
1782
1783 return GMMR0UpdateReservation(pGVM, idCpu, pReq->cBasePages, pReq->cShadowPages, pReq->cFixedPages);
1784}
1785
1786#ifdef GMMR0_WITH_SANITY_CHECK
1787
1788/**
1789 * Performs sanity checks on a free set.
1790 *
1791 * @returns Error count.
1792 *
1793 * @param pGMM Pointer to the GMM instance.
1794 * @param pSet Pointer to the set.
1795 * @param pszSetName The set name.
1796 * @param pszFunction The function from which it was called.
1797 * @param uLineNo The line number.
1798 */
1799static uint32_t gmmR0SanityCheckSet(PGMM pGMM, PGMMCHUNKFREESET pSet, const char *pszSetName,
1800 const char *pszFunction, unsigned uLineNo)
1801{
1802 uint32_t cErrors = 0;
1803
1804 /*
1805 * Count the free pages in all the chunks and match it against pSet->cFreePages.
1806 */
1807 uint32_t cPages = 0;
1808 for (unsigned i = 0; i < RT_ELEMENTS(pSet->apLists); i++)
1809 {
1810 for (PGMMCHUNK pCur = pSet->apLists[i]; pCur; pCur = pCur->pFreeNext)
1811 {
1812 /** @todo check that the chunk is hashed into the right set. */
1813 cPages += pCur->cFree;
1814 }
1815 }
1816 if (RT_UNLIKELY(cPages != pSet->cFreePages))
1817 {
1818 SUPR0Printf("GMM insanity: found %#x pages in the %s set, expected %#x. (%s, line %u)\n",
1819 cPages, pszSetName, pSet->cFreePages, pszFunction, uLineNo);
1820 cErrors++;
1821 }
1822
1823 return cErrors;
1824}
1825
1826
1827/**
1828 * Performs some sanity checks on the GMM while owning the lock.
1829 *
1830 * @returns Error count.
1831 *
1832 * @param pGMM Pointer to the GMM instance.
1833 * @param pszFunction The function from which it is called.
1834 * @param uLineNo The line number.
1835 */
1836static uint32_t gmmR0SanityCheck(PGMM pGMM, const char *pszFunction, unsigned uLineNo)
1837{
1838 uint32_t cErrors = 0;
1839
1840 cErrors += gmmR0SanityCheckSet(pGMM, &pGMM->PrivateX, "private", pszFunction, uLineNo);
1841 cErrors += gmmR0SanityCheckSet(pGMM, &pGMM->Shared, "shared", pszFunction, uLineNo);
1842 /** @todo add more sanity checks. */
1843
1844 return cErrors;
1845}
1846
1847#endif /* GMMR0_WITH_SANITY_CHECK */
1848
1849/**
1850 * Looks up a chunk in the tree and fills in the TLB entry for it.
1851 *
1852 * This is not expected to fail and will bitch if it does.
1853 *
1854 * @returns Pointer to the allocation chunk, NULL if not found.
1855 * @param pGMM Pointer to the GMM instance.
1856 * @param idChunk The ID of the chunk to find.
1857 * @param pTlbe Pointer to the TLB entry.
1858 */
1859static PGMMCHUNK gmmR0GetChunkSlow(PGMM pGMM, uint32_t idChunk, PGMMCHUNKTLBE pTlbe)
1860{
1861 PGMMCHUNK pChunk = (PGMMCHUNK)RTAvlU32Get(&pGMM->pChunks, idChunk);
1862 AssertMsgReturn(pChunk, ("Chunk %#x not found!\n", idChunk), NULL);
1863 pTlbe->idChunk = idChunk;
1864 pTlbe->pChunk = pChunk;
1865 return pChunk;
1866}
1867
1868
1869/**
1870 * Finds an allocation chunk.
1871 *
1872 * This is not expected to fail and will bitch if it does.
1873 *
1874 * @returns Pointer to the allocation chunk, NULL if not found.
1875 * @param pGMM Pointer to the GMM instance.
1876 * @param idChunk The ID of the chunk to find.
1877 */
1878DECLINLINE(PGMMCHUNK) gmmR0GetChunk(PGMM pGMM, uint32_t idChunk)
1879{
1880 /*
1881 * Do a TLB lookup, branch if not in the TLB.
1882 */
1883 PGMMCHUNKTLBE pTlbe = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(idChunk)];
1884 if ( pTlbe->idChunk != idChunk
1885 || !pTlbe->pChunk)
1886 return gmmR0GetChunkSlow(pGMM, idChunk, pTlbe);
1887 return pTlbe->pChunk;
1888}
1889
1890
1891/**
1892 * Finds a page.
1893 *
1894 * This is not expected to fail and will bitch if it does.
1895 *
1896 * @returns Pointer to the page, NULL if not found.
1897 * @param pGMM Pointer to the GMM instance.
1898 * @param idPage The ID of the page to find.
1899 */
1900DECLINLINE(PGMMPAGE) gmmR0GetPage(PGMM pGMM, uint32_t idPage)
1901{
1902 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
1903 if (RT_LIKELY(pChunk))
1904 return &pChunk->aPages[idPage & GMM_PAGEID_IDX_MASK];
1905 return NULL;
1906}
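/*
 * Illustrative note (not part of the build): the page ID decomposition the two
 * helpers above rely on, written out as plain expressions:
 *
 *      uint32_t const idChunk = idPage >> GMM_CHUNKID_SHIFT;       // chunk ID part
 *      uint32_t const iPage   = idPage &  GMM_PAGEID_IDX_MASK;     // page index within the chunk
 *      PGMMPAGE const pPage   = &gmmR0GetChunk(pGMM, idChunk)->aPages[iPage]; // assumes the chunk exists
 */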
1907
1908
1909#if 0 /* unused */
1910/**
1911 * Gets the host physical address for a page given by its ID.
1912 *
1913 * @returns The host physical address or NIL_RTHCPHYS.
1914 * @param pGMM Pointer to the GMM instance.
1915 * @param idPage The ID of the page to find.
1916 */
1917DECLINLINE(RTHCPHYS) gmmR0GetPageHCPhys(PGMM pGMM, uint32_t idPage)
1918{
1919 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
1920 if (RT_LIKELY(pChunk))
1921 return RTR0MemObjGetPagePhysAddr(pChunk->hMemObj, idPage & GMM_PAGEID_IDX_MASK);
1922 return NIL_RTHCPHYS;
1923}
1924#endif /* unused */
1925
1926
1927/**
1928 * Selects the appropriate free list given the number of free pages.
1929 *
1930 * @returns Free list index.
1931 * @param cFree The number of free pages in the chunk.
1932 */
1933DECLINLINE(unsigned) gmmR0SelectFreeSetList(unsigned cFree)
1934{
1935 unsigned iList = cFree >> GMM_CHUNK_FREE_SET_SHIFT;
1936 AssertMsg(iList < RT_SIZEOFMEMB(GMMCHUNKFREESET, apLists) / RT_SIZEOFMEMB(GMMCHUNKFREESET, apLists[0]),
1937 ("%d (%u)\n", iList, cFree));
1938 return iList;
1939}
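/*
 * Illustrative note (not part of the build): gmmR0SelectFreeSetList buckets
 * chunks by their free page count.  For example, if GMM_CHUNK_FREE_SET_SHIFT
 * were 4 (an assumed value, not taken from this file), chunks with 0..15 free
 * pages would land on list 0, 16..31 on list 1, and so on, with completely
 * unused chunks ending up on GMM_CHUNK_FREE_SET_UNUSED_LIST (see the empty
 * chunk allocator further down).
 */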
1940
1941
1942/**
1943 * Unlinks the chunk from the free list it's currently on (if any).
1944 *
1945 * @param pChunk The allocation chunk.
1946 */
1947DECLINLINE(void) gmmR0UnlinkChunk(PGMMCHUNK pChunk)
1948{
1949 PGMMCHUNKFREESET pSet = pChunk->pSet;
1950 if (RT_LIKELY(pSet))
1951 {
1952 pSet->cFreePages -= pChunk->cFree;
1953 pSet->idGeneration++;
1954
1955 PGMMCHUNK pPrev = pChunk->pFreePrev;
1956 PGMMCHUNK pNext = pChunk->pFreeNext;
1957 if (pPrev)
1958 pPrev->pFreeNext = pNext;
1959 else
1960 pSet->apLists[gmmR0SelectFreeSetList(pChunk->cFree)] = pNext;
1961 if (pNext)
1962 pNext->pFreePrev = pPrev;
1963
1964 pChunk->pSet = NULL;
1965 pChunk->pFreeNext = NULL;
1966 pChunk->pFreePrev = NULL;
1967 }
1968 else
1969 {
1970 Assert(!pChunk->pFreeNext);
1971 Assert(!pChunk->pFreePrev);
1972 Assert(!pChunk->cFree);
1973 }
1974}
1975
1976
1977/**
1978 * Links the chunk onto the appropriate free list in the specified free set.
1979 *
1980 * If no free entries, it's not linked into any list.
1981 *
1982 * @param pChunk The allocation chunk.
1983 * @param pSet The free set.
1984 */
1985DECLINLINE(void) gmmR0LinkChunk(PGMMCHUNK pChunk, PGMMCHUNKFREESET pSet)
1986{
1987 Assert(!pChunk->pSet);
1988 Assert(!pChunk->pFreeNext);
1989 Assert(!pChunk->pFreePrev);
1990
1991 if (pChunk->cFree > 0)
1992 {
1993 pChunk->pSet = pSet;
1994 pChunk->pFreePrev = NULL;
1995 unsigned const iList = gmmR0SelectFreeSetList(pChunk->cFree);
1996 pChunk->pFreeNext = pSet->apLists[iList];
1997 if (pChunk->pFreeNext)
1998 pChunk->pFreeNext->pFreePrev = pChunk;
1999 pSet->apLists[iList] = pChunk;
2000
2001 pSet->cFreePages += pChunk->cFree;
2002 pSet->idGeneration++;
2003 }
2004}
2005
2006
2007/**
2008 * Selects the appropriate free set for the chunk (the per-VM private set in
2009 * bound memory mode, otherwise the global shared or private set) and links
2010 * the chunk onto it. If the chunk has no free entries, it's not linked into
2011 * any list.
2012 * @param pGMM Pointer to the GMM instance.
2013 * @param pGVM Pointer to the kernel-only VM instance data.
2014 * @param pChunk The allocation chunk.
2015 */
2016DECLINLINE(void) gmmR0SelectSetAndLinkChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
2017{
2018 PGMMCHUNKFREESET pSet;
2019 if (pGMM->fBoundMemoryMode)
2020 pSet = &pGVM->gmm.s.Private;
2021 else if (pChunk->cShared)
2022 pSet = &pGMM->Shared;
2023 else
2024 pSet = &pGMM->PrivateX;
2025 gmmR0LinkChunk(pChunk, pSet);
2026}
2027
2028
2029/**
2030 * Frees a Chunk ID.
2031 *
2032 * @param pGMM Pointer to the GMM instance.
2033 * @param idChunk The Chunk ID to free.
2034 */
2035static void gmmR0FreeChunkId(PGMM pGMM, uint32_t idChunk)
2036{
2037 AssertReturnVoid(idChunk != NIL_GMM_CHUNKID);
2038 AssertMsg(ASMBitTest(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk));
2039 ASMAtomicBitClear(&pGMM->bmChunkId[0], idChunk);
2040}
2041
2042
2043/**
2044 * Allocates a new Chunk ID.
2045 *
2046 * @returns The Chunk ID.
2047 * @param pGMM Pointer to the GMM instance.
2048 */
2049static uint32_t gmmR0AllocateChunkId(PGMM pGMM)
2050{
2051 AssertCompile(!((GMM_CHUNKID_LAST + 1) & 31)); /* must be a multiple of 32 */
2052 AssertCompile(NIL_GMM_CHUNKID == 0);
2053
2054 /*
2055 * Try the next sequential one.
2056 */
2057 int32_t idChunk = ++pGMM->idChunkPrev;
2058#if 0 /** @todo enable this code */
2059 if ( idChunk <= GMM_CHUNKID_LAST
2060 && idChunk > NIL_GMM_CHUNKID
2061 && !ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk))
2062 return idChunk;
2063#endif
2064
2065 /*
2066 * Scan sequentially from the last one.
2067 */
2068 if ( (uint32_t)idChunk < GMM_CHUNKID_LAST
2069 && idChunk > NIL_GMM_CHUNKID)
2070 {
2071 idChunk = ASMBitNextClear(&pGMM->bmChunkId[0], GMM_CHUNKID_LAST + 1, idChunk - 1);
2072 if (idChunk > NIL_GMM_CHUNKID)
2073 {
2074 AssertMsgReturn(!ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk), NIL_GMM_CHUNKID);
2075 return pGMM->idChunkPrev = idChunk;
2076 }
2077 }
2078
2079 /*
2080 * Ok, scan from the start.
2081 * We're not racing anyone, so there is no need to expect failures or have restart loops.
2082 */
2083 idChunk = ASMBitFirstClear(&pGMM->bmChunkId[0], GMM_CHUNKID_LAST + 1);
2084 AssertMsgReturn(idChunk > NIL_GMM_CHUNKID, ("%#x\n", idChunk), NIL_GMM_CHUNKID);
2085 AssertMsgReturn(!ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk), NIL_GMM_CHUNKID);
2086
2087 return pGMM->idChunkPrev = idChunk;
2088}
2089
2090
2091/**
2092 * Allocates one private page.
2093 *
2094 * Worker for gmmR0AllocatePages.
2095 *
2096 * @param pChunk The chunk to allocate it from.
2097 * @param hGVM The GVM handle of the VM requesting memory.
2098 * @param pPageDesc The page descriptor.
2099 */
2100static void gmmR0AllocatePage(PGMMCHUNK pChunk, uint32_t hGVM, PGMMPAGEDESC pPageDesc)
2101{
2102 /* update the chunk stats. */
2103 if (pChunk->hGVM == NIL_GVM_HANDLE)
2104 pChunk->hGVM = hGVM;
2105 Assert(pChunk->cFree);
2106 pChunk->cFree--;
2107 pChunk->cPrivate++;
2108
2109 /* unlink the first free page. */
2110 const uint32_t iPage = pChunk->iFreeHead;
2111 AssertReleaseMsg(iPage < RT_ELEMENTS(pChunk->aPages), ("%d\n", iPage));
2112 PGMMPAGE pPage = &pChunk->aPages[iPage];
2113 Assert(GMM_PAGE_IS_FREE(pPage));
2114 pChunk->iFreeHead = pPage->Free.iNext;
2115 Log3(("A pPage=%p iPage=%#x/%#x u2State=%d iFreeHead=%#x iNext=%#x\n",
2116 pPage, iPage, (pChunk->Core.Key << GMM_CHUNKID_SHIFT) | iPage,
2117 pPage->Common.u2State, pChunk->iFreeHead, pPage->Free.iNext));
2118
2119 /* make the page private. */
2120 pPage->u = 0;
2121 AssertCompile(GMM_PAGE_STATE_PRIVATE == 0);
2122 pPage->Private.hGVM = hGVM;
2123 AssertCompile(NIL_RTHCPHYS >= GMM_GCPHYS_LAST);
2124 AssertCompile(GMM_GCPHYS_UNSHAREABLE >= GMM_GCPHYS_LAST);
2125 if (pPageDesc->HCPhysGCPhys <= GMM_GCPHYS_LAST)
2126 pPage->Private.pfn = pPageDesc->HCPhysGCPhys >> PAGE_SHIFT;
2127 else
2128 pPage->Private.pfn = GMM_PAGE_PFN_UNSHAREABLE; /* unshareable / unassigned - same thing. */
2129
2130 /* update the page descriptor. */
2131 pPageDesc->HCPhysGCPhys = RTR0MemObjGetPagePhysAddr(pChunk->hMemObj, iPage);
2132 Assert(pPageDesc->HCPhysGCPhys != NIL_RTHCPHYS);
2133 pPageDesc->idPage = (pChunk->Core.Key << GMM_CHUNKID_SHIFT) | iPage;
2134 pPageDesc->idSharedPage = NIL_GMM_PAGEID;
2135}
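/*
 * Illustrative note (not part of the build): the Private.pfn value stored above
 * is just the guest physical address shifted right, so it can be turned back
 * into an address like this (assuming it isn't GMM_PAGE_PFN_UNSHAREABLE):
 *
 *      RTGCPHYS const GCPhys = (RTGCPHYS)pPage->Private.pfn << PAGE_SHIFT;
 */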
2136
2137
2138/**
2139 * Picks the free pages from a chunk.
2140 *
2141 * @returns The new page descriptor table index.
2142 * @param pChunk The chunk.
2143 * @param hGVM The affinity of the chunk. NIL_GVM_HANDLE for no
2144 * affinity.
2145 * @param iPage The current page descriptor table index.
2146 * @param cPages The total number of pages to allocate.
2147 * @param paPages The page descriptor table (input + output).
2148 */
2149static uint32_t gmmR0AllocatePagesFromChunk(PGMMCHUNK pChunk, uint16_t const hGVM, uint32_t iPage, uint32_t cPages,
2150 PGMMPAGEDESC paPages)
2151{
2152 PGMMCHUNKFREESET pSet = pChunk->pSet; Assert(pSet);
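    /* Temporarily unlink the chunk here so the gmmR0LinkChunk call below
       re-files it on the free list matching its updated cFree count. */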
2153 gmmR0UnlinkChunk(pChunk);
2154
2155 for (; pChunk->cFree && iPage < cPages; iPage++)
2156 gmmR0AllocatePage(pChunk, hGVM, &paPages[iPage]);
2157
2158 gmmR0LinkChunk(pChunk, pSet);
2159 return iPage;
2160}
2161
2162
2163/**
2164 * Registers a new chunk of memory.
2165 *
2166 * This is called by gmmR0AllocateChunkNew, GMMR0AllocateLargePage and GMMR0SeedChunk.
2167 *
2168 * @returns VBox status code. On success, the giant GMM lock will be held, the
2169 * caller must release it (ugly).
2170 * @param pGMM Pointer to the GMM instance.
2171 * @param pSet Pointer to the set.
2172 * @param hMemObj The memory object for the chunk.
2173 * @param hGVM The affinity of the chunk. NIL_GVM_HANDLE for no
2174 * affinity.
2175 * @param fChunkFlags The chunk flags, GMM_CHUNK_FLAGS_XXX.
2176 * @param ppChunk Chunk address (out). Optional.
2177 *
2178 * @remarks The caller must not own the giant GMM mutex.
2179 * The giant GMM mutex will be acquired and returned acquired in
2180 * the success path. On failure, no locks will be held.
2181 */
2182static int gmmR0RegisterChunk(PGMM pGMM, PGMMCHUNKFREESET pSet, RTR0MEMOBJ hMemObj, uint16_t hGVM, uint16_t fChunkFlags,
2183 PGMMCHUNK *ppChunk)
2184{
2185 Assert(pGMM->hMtxOwner != RTThreadNativeSelf());
2186 Assert(hGVM != NIL_GVM_HANDLE || pGMM->fBoundMemoryMode);
2187#ifdef GMM_WITH_LEGACY_MODE
2188 Assert(fChunkFlags == 0 || fChunkFlags == GMM_CHUNK_FLAGS_LARGE_PAGE || fChunkFlags == GMM_CHUNK_FLAGS_SEEDED);
2189#else
2190 Assert(fChunkFlags == 0 || fChunkFlags == GMM_CHUNK_FLAGS_LARGE_PAGE);
2191#endif
2192
2193#if defined(VBOX_WITH_RAM_IN_KERNEL) && !defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)
2194 /*
2195 * Get a ring-0 mapping of the object.
2196 */
2197# ifdef GMM_WITH_LEGACY_MODE
2198 uint8_t *pbMapping = !(fChunkFlags & GMM_CHUNK_FLAGS_SEEDED) ? (uint8_t *)RTR0MemObjAddress(hMemObj) : NULL;
2199# else
2200 uint8_t *pbMapping = (uint8_t *)RTR0MemObjAddress(hMemObj);
2201# endif
2202 if (!pbMapping)
2203 {
2204 RTR0MEMOBJ hMapObj;
2205 int rc = RTR0MemObjMapKernel(&hMapObj, hMemObj, (void *)-1, 0, RTMEM_PROT_READ | RTMEM_PROT_WRITE);
2206 if (RT_SUCCESS(rc))
2207 pbMapping = (uint8_t *)RTR0MemObjAddress(hMapObj);
2208 else
2209 return rc;
2210 AssertPtr(pbMapping);
2211 }
2212#endif
2213
2214 /*
2215 * Allocate a chunk.
2216 */
2217 int rc;
2218 PGMMCHUNK pChunk = (PGMMCHUNK)RTMemAllocZ(sizeof(*pChunk));
2219 if (pChunk)
2220 {
2221 /*
2222 * Initialize it.
2223 */
2224 pChunk->hMemObj = hMemObj;
2225#if defined(VBOX_WITH_RAM_IN_KERNEL) && !defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)
2226 pChunk->pbMapping = pbMapping;
2227#endif
2228 pChunk->cFree = GMM_CHUNK_NUM_PAGES;
2229 pChunk->hGVM = hGVM;
2230 /*pChunk->iFreeHead = 0;*/
2231 pChunk->idNumaNode = gmmR0GetCurrentNumaNodeId();
2232 pChunk->iChunkMtx = UINT8_MAX;
2233 pChunk->fFlags = fChunkFlags;
2234 for (unsigned iPage = 0; iPage < RT_ELEMENTS(pChunk->aPages) - 1; iPage++)
2235 {
2236 pChunk->aPages[iPage].Free.u2State = GMM_PAGE_STATE_FREE;
2237 pChunk->aPages[iPage].Free.iNext = iPage + 1;
2238 }
2239 pChunk->aPages[RT_ELEMENTS(pChunk->aPages) - 1].Free.u2State = GMM_PAGE_STATE_FREE;
2240 pChunk->aPages[RT_ELEMENTS(pChunk->aPages) - 1].Free.iNext = UINT16_MAX;
2241
2242 /*
2243 * Allocate a Chunk ID and insert it into the tree.
2244 * This has to be done behind the mutex of course.
2245 */
2246 rc = gmmR0MutexAcquire(pGMM);
2247 if (RT_SUCCESS(rc))
2248 {
2249 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2250 {
2251 pChunk->Core.Key = gmmR0AllocateChunkId(pGMM);
2252 if ( pChunk->Core.Key != NIL_GMM_CHUNKID
2253 && pChunk->Core.Key <= GMM_CHUNKID_LAST
2254 && RTAvlU32Insert(&pGMM->pChunks, &pChunk->Core))
2255 {
2256 pGMM->cChunks++;
2257 RTListAppend(&pGMM->ChunkList, &pChunk->ListNode);
2258 gmmR0LinkChunk(pChunk, pSet);
2259 LogFlow(("gmmR0RegisterChunk: pChunk=%p id=%#x cChunks=%d\n", pChunk, pChunk->Core.Key, pGMM->cChunks));
2260
2261 if (ppChunk)
2262 *ppChunk = pChunk;
2263 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
2264 return VINF_SUCCESS;
2265 }
2266
2267 /* bail out */
2268 rc = VERR_GMM_CHUNK_INSERT;
2269 }
2270 else
2271 rc = VERR_GMM_IS_NOT_SANE;
2272 gmmR0MutexRelease(pGMM);
2273 }
2274
2275 RTMemFree(pChunk);
2276 }
2277 else
2278 rc = VERR_NO_MEMORY;
2279 return rc;
2280}
2281
2282
2283/**
2284 * Allocates a new chunk, immediately picks the requested pages from it, and adds
2285 * what's remaining to the specified free set.
2286 *
2287 * @note This will leave the giant mutex while allocating the new chunk!
2288 *
2289 * @returns VBox status code.
2290 * @param pGMM Pointer to the GMM instance data.
2291 * @param pGVM Pointer to the kernel-only VM instance data.
2292 * @param pSet Pointer to the free set.
2293 * @param cPages The number of pages requested.
2294 * @param paPages The page descriptor table (input + output).
2295 * @param piPage The pointer to the page descriptor table index variable.
2296 * This will be updated.
2297 */
2298static int gmmR0AllocateChunkNew(PGMM pGMM, PGVM pGVM, PGMMCHUNKFREESET pSet, uint32_t cPages,
2299 PGMMPAGEDESC paPages, uint32_t *piPage)
2300{
2301 gmmR0MutexRelease(pGMM);
2302
2303 RTR0MEMOBJ hMemObj;
2304#ifndef GMM_WITH_LEGACY_MODE
2305 int rc;
2306# ifdef VBOX_WITH_LINEAR_HOST_PHYS_MEM
2307 if (pGMM->fHasWorkingAllocPhysNC)
2308 rc = RTR0MemObjAllocPhysNC(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS);
2309 else
2310# endif
2311 rc = RTR0MemObjAllocPage(&hMemObj, GMM_CHUNK_SIZE, false /*fExecutable*/);
2312#else
2313 int rc = RTR0MemObjAllocPhysNC(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS);
2314#endif
2315 if (RT_SUCCESS(rc))
2316 {
2317 /** @todo Duplicate gmmR0RegisterChunk here so we can avoid chaining up the
2318 * free pages first and then unchaining them right afterwards. Instead
2319 * do as much work as possible without holding the giant lock. */
2320 PGMMCHUNK pChunk;
2321 rc = gmmR0RegisterChunk(pGMM, pSet, hMemObj, pGVM->hSelf, 0 /*fChunkFlags*/, &pChunk);
2322 if (RT_SUCCESS(rc))
2323 {
2324 *piPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, *piPage, cPages, paPages);
2325 return VINF_SUCCESS;
2326 }
2327
2328 /* bail out */
2329 RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
2330 }
2331
2332 int rc2 = gmmR0MutexAcquire(pGMM);
2333 AssertRCReturn(rc2, RT_FAILURE(rc) ? rc : rc2);
2334 return rc;
2335
2336}
2337
2338
2339/**
2340 * As a last resort we'll pick any page we can get.
2341 *
2342 * @returns The new page descriptor table index.
2343 * @param pSet The set to pick from.
2344 * @param pGVM Pointer to the global VM structure.
2345 * @param iPage The current page descriptor table index.
2346 * @param cPages The total number of pages to allocate.
2347 * @param paPages The page descriptor table (input + output).
2348 */
2349static uint32_t gmmR0AllocatePagesIndiscriminately(PGMMCHUNKFREESET pSet, PGVM pGVM,
2350 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2351{
2352 unsigned iList = RT_ELEMENTS(pSet->apLists);
2353 while (iList-- > 0)
2354 {
2355 PGMMCHUNK pChunk = pSet->apLists[iList];
2356 while (pChunk)
2357 {
2358 PGMMCHUNK pNext = pChunk->pFreeNext;
2359
2360 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2361 if (iPage >= cPages)
2362 return iPage;
2363
2364 pChunk = pNext;
2365 }
2366 }
2367 return iPage;
2368}
2369
2370
2371/**
2372 * Pick pages from empty chunks on the same NUMA node.
2373 *
2374 * @returns The new page descriptor table index.
2375 * @param pSet The set to pick from.
2376 * @param pGVM Pointer to the global VM structure.
2377 * @param iPage The current page descriptor table index.
2378 * @param cPages The total number of pages to allocate.
2379 * @param paPages The page descriptor table (input + output).
2380 */
2381static uint32_t gmmR0AllocatePagesFromEmptyChunksOnSameNode(PGMMCHUNKFREESET pSet, PGVM pGVM,
2382 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2383{
2384 PGMMCHUNK pChunk = pSet->apLists[GMM_CHUNK_FREE_SET_UNUSED_LIST];
2385 if (pChunk)
2386 {
2387 uint16_t const idNumaNode = gmmR0GetCurrentNumaNodeId();
2388 while (pChunk)
2389 {
2390 PGMMCHUNK pNext = pChunk->pFreeNext;
2391
2392 if (pChunk->idNumaNode == idNumaNode)
2393 {
2394 pChunk->hGVM = pGVM->hSelf;
2395 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2396 if (iPage >= cPages)
2397 {
2398 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2399 return iPage;
2400 }
2401 }
2402
2403 pChunk = pNext;
2404 }
2405 }
2406 return iPage;
2407}
2408
2409
2410/**
2411 * Pick pages from non-empty chunks on the same NUMA node.
2412 *
2413 * @returns The new page descriptor table index.
2414 * @param pSet The set to pick from.
2415 * @param pGVM Pointer to the global VM structure.
2416 * @param iPage The current page descriptor table index.
2417 * @param cPages The total number of pages to allocate.
2418 * @param paPages The page descriptor table (input + output).
2419 */
2420static uint32_t gmmR0AllocatePagesFromSameNode(PGMMCHUNKFREESET pSet, PGVM pGVM,
2421 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2422{
2423 /** @todo start by picking from chunks with about the right size first? */
2424 uint16_t const idNumaNode = gmmR0GetCurrentNumaNodeId();
2425 unsigned iList = GMM_CHUNK_FREE_SET_UNUSED_LIST;
2426 while (iList-- > 0)
2427 {
2428 PGMMCHUNK pChunk = pSet->apLists[iList];
2429 while (pChunk)
2430 {
2431 PGMMCHUNK pNext = pChunk->pFreeNext;
2432
2433 if (pChunk->idNumaNode == idNumaNode)
2434 {
2435 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2436 if (iPage >= cPages)
2437 {
2438 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2439 return iPage;
2440 }
2441 }
2442
2443 pChunk = pNext;
2444 }
2445 }
2446 return iPage;
2447}
2448
2449
2450/**
2451 * Pick pages that are in chunks already associated with the VM.
2452 *
2453 * @returns The new page descriptor table index.
2454 * @param pGMM Pointer to the GMM instance data.
2455 * @param pGVM Pointer to the global VM structure.
2456 * @param pSet The set to pick from.
2457 * @param iPage The current page descriptor table index.
2458 * @param cPages The total number of pages to allocate.
2459 * @param paPages The page descriptor table (input + output).
2460 */
2461static uint32_t gmmR0AllocatePagesAssociatedWithVM(PGMM pGMM, PGVM pGVM, PGMMCHUNKFREESET pSet,
2462 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2463{
2464 uint16_t const hGVM = pGVM->hSelf;
2465
2466 /* Hint. */
2467 if (pGVM->gmm.s.idLastChunkHint != NIL_GMM_CHUNKID)
2468 {
2469 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pGVM->gmm.s.idLastChunkHint);
2470 if (pChunk && pChunk->cFree)
2471 {
2472 iPage = gmmR0AllocatePagesFromChunk(pChunk, hGVM, iPage, cPages, paPages);
2473 if (iPage >= cPages)
2474 return iPage;
2475 }
2476 }
2477
2478 /* Scan. */
2479 for (unsigned iList = 0; iList < RT_ELEMENTS(pSet->apLists); iList++)
2480 {
2481 PGMMCHUNK pChunk = pSet->apLists[iList];
2482 while (pChunk)
2483 {
2484 PGMMCHUNK pNext = pChunk->pFreeNext;
2485
2486 if (pChunk->hGVM == hGVM)
2487 {
2488 iPage = gmmR0AllocatePagesFromChunk(pChunk, hGVM, iPage, cPages, paPages);
2489 if (iPage >= cPages)
2490 {
2491 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2492 return iPage;
2493 }
2494 }
2495
2496 pChunk = pNext;
2497 }
2498 }
2499 return iPage;
2500}
2501
2502
2503
2504/**
2505 * Pick pages in bound memory mode.
2506 *
2507 * @returns The new page descriptor table index.
2508 * @param pGVM Pointer to the global VM structure.
2509 * @param iPage The current page descriptor table index.
2510 * @param cPages The total number of pages to allocate.
2511 * @param paPages The page descriptor table (input + output).
2512 */
2513static uint32_t gmmR0AllocatePagesInBoundMode(PGVM pGVM, uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2514{
2515 for (unsigned iList = 0; iList < RT_ELEMENTS(pGVM->gmm.s.Private.apLists); iList++)
2516 {
2517 PGMMCHUNK pChunk = pGVM->gmm.s.Private.apLists[iList];
2518 while (pChunk)
2519 {
2520 Assert(pChunk->hGVM == pGVM->hSelf);
2521 PGMMCHUNK pNext = pChunk->pFreeNext;
2522 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2523 if (iPage >= cPages)
2524 return iPage;
2525 pChunk = pNext;
2526 }
2527 }
2528 return iPage;
2529}
2530
2531
2532/**
2533 * Checks if we should start picking pages from chunks of other VMs because
2534 * we're getting close to the system memory limit or our reservation limit.
2535 *
2536 * @returns @c true if we should, @c false if we should first try to allocate
2537 * more chunks.
2538 */
2539static bool gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLimits(PGVM pGVM)
2540{
2541 /*
2542 * Don't allocate a new chunk if we're close to exhausting the VM's reservation.
2543 */
2544 uint64_t cPgReserved = pGVM->gmm.s.Stats.Reserved.cBasePages
2545 + pGVM->gmm.s.Stats.Reserved.cFixedPages
2546 - pGVM->gmm.s.Stats.cBalloonedPages
2547 /** @todo what about shared pages? */;
2548 uint64_t cPgAllocated = pGVM->gmm.s.Stats.Allocated.cBasePages
2549 + pGVM->gmm.s.Stats.Allocated.cFixedPages;
2550 uint64_t cPgDelta = cPgReserved - cPgAllocated;
2551 if (cPgDelta < GMM_CHUNK_NUM_PAGES * 4)
2552 return true;
2553 /** @todo make the threshold configurable, also test the code to see if
2554 * this ever kicks in (we might be reserving too much or something). */
2555
2556 /*
2557 * Check how close we are to the max memory limit and how many fragments
2558 * there are...
2559 */
2560 /** @todo */
2561
2562 return false;
2563}
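/*
 * Illustrative arithmetic (not part of the build): with 2 MB chunks (see the
 * AssertCompile(GMM_CHUNK_SIZE == _2M) further down) and assuming 4 KiB pages,
 * GMM_CHUNK_NUM_PAGES is 512, so the check above returns true once fewer than
 * 2048 reserved-but-unallocated pages (roughly 8 MB of headroom) remain.
 */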
2564
2565
2566/**
2567 * Checks if we should start picking pages from chunks of other VMs because
2568 * there is a lot of free pages around.
2569 *
2570 * @returns @c true if we should, @c false if we should first try to allocate more
2571 * chunks.
2572 */
2573static bool gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLotsFree(PGMM pGMM)
2574{
2575 /*
2576 * Setting the limit at 16 chunks (32 MB) at the moment.
2577 */
2578 if (pGMM->PrivateX.cFreePages >= GMM_CHUNK_NUM_PAGES * 16)
2579 return true;
2580 return false;
2581}
2582
2583
2584/**
2585 * Common worker for GMMR0AllocateHandyPages and GMMR0AllocatePages.
2586 *
2587 * @returns VBox status code:
2588 * @retval VINF_SUCCESS on success.
2589 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk or
2590 * gmmR0AllocateMoreChunks is necessary.
2591 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2592 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2593 * that is we're trying to allocate more than we've reserved.
2594 *
2595 * @param pGMM Pointer to the GMM instance data.
2596 * @param pGVM Pointer to the VM.
2597 * @param cPages The number of pages to allocate.
2598 * @param paPages Pointer to the page descriptors. See GMMPAGEDESC for
2599 * details on what is expected on input.
2600 * @param enmAccount The account to charge.
2601 *
2602 * @remarks The caller must hold the giant GMM lock.
2603 */
2604static int gmmR0AllocatePagesNew(PGMM pGMM, PGVM pGVM, uint32_t cPages, PGMMPAGEDESC paPages, GMMACCOUNT enmAccount)
2605{
2606 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
2607
2608 /*
2609 * Check allocation limits.
2610 */
2611 if (RT_UNLIKELY(pGMM->cAllocatedPages + cPages > pGMM->cMaxPages))
2612 return VERR_GMM_HIT_GLOBAL_LIMIT;
2613
2614 switch (enmAccount)
2615 {
2616 case GMMACCOUNT_BASE:
2617 if (RT_UNLIKELY( pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cPages
2618 > pGVM->gmm.s.Stats.Reserved.cBasePages))
2619 {
2620 Log(("gmmR0AllocatePages:Base: Reserved=%#llx Allocated+Ballooned+Requested=%#llx+%#llx+%#x!\n",
2621 pGVM->gmm.s.Stats.Reserved.cBasePages, pGVM->gmm.s.Stats.Allocated.cBasePages,
2622 pGVM->gmm.s.Stats.cBalloonedPages, cPages));
2623 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2624 }
2625 break;
2626 case GMMACCOUNT_SHADOW:
2627 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cShadowPages + cPages > pGVM->gmm.s.Stats.Reserved.cShadowPages))
2628 {
2629 Log(("gmmR0AllocatePages:Shadow: Reserved=%#x Allocated+Requested=%#x+%#x!\n",
2630 pGVM->gmm.s.Stats.Reserved.cShadowPages, pGVM->gmm.s.Stats.Allocated.cShadowPages, cPages));
2631 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2632 }
2633 break;
2634 case GMMACCOUNT_FIXED:
2635 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cFixedPages + cPages > pGVM->gmm.s.Stats.Reserved.cFixedPages))
2636 {
2637 Log(("gmmR0AllocatePages:Fixed: Reserved=%#x Allocated+Requested=%#x+%#x!\n",
2638 pGVM->gmm.s.Stats.Reserved.cFixedPages, pGVM->gmm.s.Stats.Allocated.cFixedPages, cPages));
2639 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2640 }
2641 break;
2642 default:
2643 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2644 }
2645
2646#ifdef GMM_WITH_LEGACY_MODE
2647 /*
2648 * If we're in legacy memory mode, it's easy to figure out up front whether
2649 * we have a sufficient number of pages.
2650 */
2651 if ( pGMM->fLegacyAllocationMode
2652 && pGVM->gmm.s.Private.cFreePages < cPages)
2653 {
2654 Assert(pGMM->fBoundMemoryMode);
2655 return VERR_GMM_SEED_ME;
2656 }
2657#endif
2658
2659 /*
2660 * Update the accounts before we proceed because we might be leaving the
2661 * protection of the global mutex and thus run the risk of permitting
2662 * too much memory to be allocated.
2663 */
2664 switch (enmAccount)
2665 {
2666 case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages += cPages; break;
2667 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages += cPages; break;
2668 case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages += cPages; break;
2669 default: AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2670 }
2671 pGVM->gmm.s.Stats.cPrivatePages += cPages;
2672 pGMM->cAllocatedPages += cPages;
2673
2674#ifdef GMM_WITH_LEGACY_MODE
2675 /*
2676 * Part two of it's-easy-in-legacy-memory-mode.
2677 */
2678 if (pGMM->fLegacyAllocationMode)
2679 {
2680 uint32_t iPage = gmmR0AllocatePagesInBoundMode(pGVM, 0, cPages, paPages);
2681 AssertReleaseReturn(iPage == cPages, VERR_GMM_ALLOC_PAGES_IPE);
2682 return VINF_SUCCESS;
2683 }
2684#endif
2685
2686 /*
2687 * Bound mode is also relatively straightforward.
2688 */
2689 uint32_t iPage = 0;
2690 int rc = VINF_SUCCESS;
2691 if (pGMM->fBoundMemoryMode)
2692 {
2693 iPage = gmmR0AllocatePagesInBoundMode(pGVM, iPage, cPages, paPages);
2694 if (iPage < cPages)
2695 do
2696 rc = gmmR0AllocateChunkNew(pGMM, pGVM, &pGVM->gmm.s.Private, cPages, paPages, &iPage);
2697 while (iPage < cPages && RT_SUCCESS(rc));
2698 }
2699 /*
2700 * Shared mode is trickier as we should try to achieve the same locality as
2701 * in bound mode, but smartly make use of non-full chunks allocated by
2702 * other VMs if we're low on memory.
2703 */
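    /* Rough order of preference used below: 1) chunks already associated with
       this VM, 2) when close to the limits, other chunks on the same NUMA node,
       3) empty chunks on the same NUMA node (private set, then shared set),
       4) when plenty of pages are free anyway, whatever we can find, 5) brand
       new chunks, and 6) as a last resort when the host is out of memory, any
       page from any set. */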
2704 else
2705 {
2706 /* Pick the most optimal pages first. */
2707 iPage = gmmR0AllocatePagesAssociatedWithVM(pGMM, pGVM, &pGMM->PrivateX, iPage, cPages, paPages);
2708 if (iPage < cPages)
2709 {
2710 /* Maybe we should try getting pages from chunks "belonging" to
2711 other VMs before allocating more chunks? */
2712 bool fTriedOnSameAlready = false;
2713 if (gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLimits(pGVM))
2714 {
2715 iPage = gmmR0AllocatePagesFromSameNode(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2716 fTriedOnSameAlready = true;
2717 }
2718
2719 /* Allocate memory from empty chunks. */
2720 if (iPage < cPages)
2721 iPage = gmmR0AllocatePagesFromEmptyChunksOnSameNode(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2722
2723 /* Grab empty shared chunks. */
2724 if (iPage < cPages)
2725 iPage = gmmR0AllocatePagesFromEmptyChunksOnSameNode(&pGMM->Shared, pGVM, iPage, cPages, paPages);
2726
2727 /* If there are a lot of free pages spread around, try not to waste
2728 system memory on more chunks. (Should trigger defragmentation.) */
2729 if ( !fTriedOnSameAlready
2730 && gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLotsFree(pGMM))
2731 {
2732 iPage = gmmR0AllocatePagesFromSameNode(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2733 if (iPage < cPages)
2734 iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2735 }
2736
2737 /*
2738 * OK, try to allocate new chunks.
2739 */
2740 if (iPage < cPages)
2741 {
2742 do
2743 rc = gmmR0AllocateChunkNew(pGMM, pGVM, &pGMM->PrivateX, cPages, paPages, &iPage);
2744 while (iPage < cPages && RT_SUCCESS(rc));
2745
2746 /* If the host is out of memory, take whatever we can get. */
2747 if ( (rc == VERR_NO_MEMORY || rc == VERR_NO_PHYS_MEMORY)
2748 && pGMM->PrivateX.cFreePages + pGMM->Shared.cFreePages >= cPages - iPage)
2749 {
2750 iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2751 if (iPage < cPages)
2752 iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->Shared, pGVM, iPage, cPages, paPages);
2753 AssertRelease(iPage == cPages);
2754 rc = VINF_SUCCESS;
2755 }
2756 }
2757 }
2758 }
2759
2760 /*
2761 * Clean up on failure. Since this is bound to be a low-memory condition
2762 * we will give back any empty chunks that might be hanging around.
2763 */
2764 if (RT_FAILURE(rc))
2765 {
2766 /* Update the statistics. */
2767 pGVM->gmm.s.Stats.cPrivatePages -= cPages;
2768 pGMM->cAllocatedPages -= cPages - iPage;
2769 switch (enmAccount)
2770 {
2771 case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages -= cPages; break;
2772 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages -= cPages; break;
2773 case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages -= cPages; break;
2774 default: AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2775 }
2776
2777 /* Release the pages. */
2778 while (iPage-- > 0)
2779 {
2780 uint32_t idPage = paPages[iPage].idPage;
2781 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
2782 if (RT_LIKELY(pPage))
2783 {
2784 Assert(GMM_PAGE_IS_PRIVATE(pPage));
2785 Assert(pPage->Private.hGVM == pGVM->hSelf);
2786 gmmR0FreePrivatePage(pGMM, pGVM, idPage, pPage);
2787 }
2788 else
2789 AssertMsgFailed(("idPage=%#x\n", idPage));
2790
2791 paPages[iPage].idPage = NIL_GMM_PAGEID;
2792 paPages[iPage].idSharedPage = NIL_GMM_PAGEID;
2793 paPages[iPage].HCPhysGCPhys = NIL_RTHCPHYS;
2794 }
2795
2796 /* Free empty chunks. */
2797 /** @todo */
2798
2799 /* Return the failure status. */
2800 return rc;
2801 }
2802 return VINF_SUCCESS;
2803}
2804
2805
2806/**
2807 * Updates the previous allocations and allocates more pages.
2808 *
2809 * The handy pages are always taken from the 'base' memory account.
2810 * The allocated pages are not cleared and will contain random garbage.
2811 *
2812 * @returns VBox status code:
2813 * @retval VINF_SUCCESS on success.
2814 * @retval VERR_NOT_OWNER if the caller is not an EMT.
2815 * @retval VERR_GMM_PAGE_NOT_FOUND if one of the pages to update wasn't found.
2816 * @retval VERR_GMM_PAGE_NOT_PRIVATE if one of the pages to update wasn't a
2817 * private page.
2818 * @retval VERR_GMM_PAGE_NOT_SHARED if one of the pages to update wasn't a
2819 * shared page.
2820 * @retval VERR_GMM_NOT_PAGE_OWNER if one of the pages to be updated wasn't
2821 * owned by the VM.
2822 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is necessary.
2823 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2824 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2825 * that is we're trying to allocate more than we've reserved.
2826 *
2827 * @param pGVM The global (ring-0) VM structure.
2828 * @param idCpu The VCPU id.
2829 * @param cPagesToUpdate The number of pages to update (starting from the head).
2830 * @param cPagesToAlloc The number of pages to allocate (starting from the head).
2831 * @param paPages The array of page descriptors.
2832 * See GMMPAGEDESC for details on what is expected on input.
2833 * @thread EMT(idCpu)
2834 */
2835GMMR0DECL(int) GMMR0AllocateHandyPages(PGVM pGVM, VMCPUID idCpu, uint32_t cPagesToUpdate,
2836 uint32_t cPagesToAlloc, PGMMPAGEDESC paPages)
2837{
2838 LogFlow(("GMMR0AllocateHandyPages: pGVM=%p cPagesToUpdate=%#x cPagesToAlloc=%#x paPages=%p\n",
2839 pGVM, cPagesToUpdate, cPagesToAlloc, paPages));
2840
2841 /*
2842 * Validate, get basics and take the semaphore.
2843 * (This is a relatively busy path, so make predictions where possible.)
2844 */
2845 PGMM pGMM;
2846 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
2847 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
2848 if (RT_FAILURE(rc))
2849 return rc;
2850
2851 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
2852 AssertMsgReturn( (cPagesToUpdate && cPagesToUpdate < 1024)
2853 || (cPagesToAlloc && cPagesToAlloc < 1024),
2854 ("cPagesToUpdate=%#x cPagesToAlloc=%#x\n", cPagesToUpdate, cPagesToAlloc),
2855 VERR_INVALID_PARAMETER);
2856
2857 unsigned iPage = 0;
2858 for (; iPage < cPagesToUpdate; iPage++)
2859 {
2860 AssertMsgReturn( ( paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST
2861 && !(paPages[iPage].HCPhysGCPhys & PAGE_OFFSET_MASK))
2862 || paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS
2863 || paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE,
2864 ("#%#x: %RHp\n", iPage, paPages[iPage].HCPhysGCPhys),
2865 VERR_INVALID_PARAMETER);
2866 AssertMsgReturn( paPages[iPage].idPage <= GMM_PAGEID_LAST
2867 /*|| paPages[iPage].idPage == NIL_GMM_PAGEID*/,
2868 ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2869 AssertMsgReturn( paPages[iPage].idSharedPage <= GMM_PAGEID_LAST
2870 /*|| paPages[iPage].idSharedPage == NIL_GMM_PAGEID*/,
2871 ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2872 }
2873
2874 for (; iPage < cPagesToAlloc; iPage++)
2875 {
2876 AssertMsgReturn(paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS, ("#%#x: %RHp\n", iPage, paPages[iPage].HCPhysGCPhys), VERR_INVALID_PARAMETER);
2877 AssertMsgReturn(paPages[iPage].idPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2878 AssertMsgReturn(paPages[iPage].idSharedPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2879 }
2880
2881 gmmR0MutexAcquire(pGMM);
2882 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2883 {
2884 /* No allocations before the initial reservation has been made! */
2885 if (RT_LIKELY( pGVM->gmm.s.Stats.Reserved.cBasePages
2886 && pGVM->gmm.s.Stats.Reserved.cFixedPages
2887 && pGVM->gmm.s.Stats.Reserved.cShadowPages))
2888 {
2889 /*
2890 * Perform the updates.
2891 * Stop on the first error.
2892 */
2893 for (iPage = 0; iPage < cPagesToUpdate; iPage++)
2894 {
2895 if (paPages[iPage].idPage != NIL_GMM_PAGEID)
2896 {
2897 PGMMPAGE pPage = gmmR0GetPage(pGMM, paPages[iPage].idPage);
2898 if (RT_LIKELY(pPage))
2899 {
2900 if (RT_LIKELY(GMM_PAGE_IS_PRIVATE(pPage)))
2901 {
2902 if (RT_LIKELY(pPage->Private.hGVM == pGVM->hSelf))
2903 {
2904 AssertCompile(NIL_RTHCPHYS > GMM_GCPHYS_LAST && GMM_GCPHYS_UNSHAREABLE > GMM_GCPHYS_LAST);
2905 if (RT_LIKELY(paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST))
2906 pPage->Private.pfn = paPages[iPage].HCPhysGCPhys >> PAGE_SHIFT;
2907 else if (paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE)
2908 pPage->Private.pfn = GMM_PAGE_PFN_UNSHAREABLE;
2909 /* else: NIL_RTHCPHYS nothing */
2910
2911 paPages[iPage].idPage = NIL_GMM_PAGEID;
2912 paPages[iPage].HCPhysGCPhys = NIL_RTHCPHYS;
2913 }
2914 else
2915 {
2916 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not owner! hGVM=%#x hSelf=%#x\n",
2917 iPage, paPages[iPage].idPage, pPage->Private.hGVM, pGVM->hSelf));
2918 rc = VERR_GMM_NOT_PAGE_OWNER;
2919 break;
2920 }
2921 }
2922 else
2923 {
2924 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not private! %.*Rhxs (type %d)\n", iPage, paPages[iPage].idPage, sizeof(*pPage), pPage, pPage->Common.u2State));
2925 rc = VERR_GMM_PAGE_NOT_PRIVATE;
2926 break;
2927 }
2928 }
2929 else
2930 {
2931 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not found! (private)\n", iPage, paPages[iPage].idPage));
2932 rc = VERR_GMM_PAGE_NOT_FOUND;
2933 break;
2934 }
2935 }
2936
2937 if (paPages[iPage].idSharedPage != NIL_GMM_PAGEID)
2938 {
2939 PGMMPAGE pPage = gmmR0GetPage(pGMM, paPages[iPage].idSharedPage);
2940 if (RT_LIKELY(pPage))
2941 {
2942 if (RT_LIKELY(GMM_PAGE_IS_SHARED(pPage)))
2943 {
2944 AssertCompile(NIL_RTHCPHYS > GMM_GCPHYS_LAST && GMM_GCPHYS_UNSHAREABLE > GMM_GCPHYS_LAST);
2945 Assert(pPage->Shared.cRefs);
2946 Assert(pGVM->gmm.s.Stats.cSharedPages);
2947 Assert(pGVM->gmm.s.Stats.Allocated.cBasePages);
2948
2949 Log(("GMMR0AllocateHandyPages: free shared page %x cRefs=%d\n", paPages[iPage].idSharedPage, pPage->Shared.cRefs));
2950 pGVM->gmm.s.Stats.cSharedPages--;
2951 pGVM->gmm.s.Stats.Allocated.cBasePages--;
2952 if (!--pPage->Shared.cRefs)
2953 gmmR0FreeSharedPage(pGMM, pGVM, paPages[iPage].idSharedPage, pPage);
2954 else
2955 {
2956 Assert(pGMM->cDuplicatePages);
2957 pGMM->cDuplicatePages--;
2958 }
2959
2960 paPages[iPage].idSharedPage = NIL_GMM_PAGEID;
2961 }
2962 else
2963 {
2964 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not shared!\n", iPage, paPages[iPage].idSharedPage));
2965 rc = VERR_GMM_PAGE_NOT_SHARED;
2966 break;
2967 }
2968 }
2969 else
2970 {
2971 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not found! (shared)\n", iPage, paPages[iPage].idSharedPage));
2972 rc = VERR_GMM_PAGE_NOT_FOUND;
2973 break;
2974 }
2975 }
2976 } /* for each page to update */
2977
2978 if (RT_SUCCESS(rc) && cPagesToAlloc > 0)
2979 {
2980#if defined(VBOX_STRICT) && 0 /** @todo re-test this later. Appeared to be a PGM init bug. */
2981 for (iPage = 0; iPage < cPagesToAlloc; iPage++)
2982 {
2983 Assert(paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS);
2984 Assert(paPages[iPage].idPage == NIL_GMM_PAGEID);
2985 Assert(paPages[iPage].idSharedPage == NIL_GMM_PAGEID);
2986 }
2987#endif
2988
2989 /*
2990 * Join paths with GMMR0AllocatePages for the allocation.
2991 * Note! gmmR0AllocatePagesNew may leave the protection of the mutex while allocating new chunks!
2992 */
2993 rc = gmmR0AllocatePagesNew(pGMM, pGVM, cPagesToAlloc, paPages, GMMACCOUNT_BASE);
2994 }
2995 }
2996 else
2997 rc = VERR_WRONG_ORDER;
2998 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
2999 }
3000 else
3001 rc = VERR_GMM_IS_NOT_SANE;
3002 gmmR0MutexRelease(pGMM);
3003 LogFlow(("GMMR0AllocateHandyPages: returns %Rrc\n", rc));
3004 return rc;
3005}
3006
3007
3008/**
3009 * Allocate one or more pages.
3010 *
3011 * This is typically used for ROMs and MMIO2 (VRAM) during VM creation.
3012 * The allocated pages are not cleared and will contain random garbage.
3013 *
3014 * @returns VBox status code:
3015 * @retval VINF_SUCCESS on success.
3016 * @retval VERR_NOT_OWNER if the caller is not an EMT.
3017 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is necessary.
3018 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
3019 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
3020 * that is we're trying to allocate more than we've reserved.
3021 *
3022 * @param pGVM The global (ring-0) VM structure.
3023 * @param idCpu The VCPU id.
3024 * @param cPages The number of pages to allocate.
3025 * @param paPages Pointer to the page descriptors.
3026 * See GMMPAGEDESC for details on what is expected on
3027 * input.
3028 * @param enmAccount The account to charge.
3029 *
3030 * @thread EMT.
3031 */
3032GMMR0DECL(int) GMMR0AllocatePages(PGVM pGVM, VMCPUID idCpu, uint32_t cPages, PGMMPAGEDESC paPages, GMMACCOUNT enmAccount)
3033{
3034 LogFlow(("GMMR0AllocatePages: pGVM=%p cPages=%#x paPages=%p enmAccount=%d\n", pGVM, cPages, paPages, enmAccount));
3035
3036 /*
3037 * Validate, get basics and take the semaphore.
3038 */
3039 PGMM pGMM;
3040 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3041 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3042 if (RT_FAILURE(rc))
3043 return rc;
3044
3045 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
3046 AssertMsgReturn(enmAccount > GMMACCOUNT_INVALID && enmAccount < GMMACCOUNT_END, ("%d\n", enmAccount), VERR_INVALID_PARAMETER);
3047 AssertMsgReturn(cPages > 0 && cPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cPages), VERR_INVALID_PARAMETER);
3048
3049 for (unsigned iPage = 0; iPage < cPages; iPage++)
3050 {
3051 AssertMsgReturn( paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS
3052 || paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE
3053 || ( enmAccount == GMMACCOUNT_BASE
3054 && paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST
3055 && !(paPages[iPage].HCPhysGCPhys & PAGE_OFFSET_MASK)),
3056 ("#%#x: %RHp enmAccount=%d\n", iPage, paPages[iPage].HCPhysGCPhys, enmAccount),
3057 VERR_INVALID_PARAMETER);
3058 AssertMsgReturn(paPages[iPage].idPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
3059 AssertMsgReturn(paPages[iPage].idSharedPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
3060 }
3061
3062 gmmR0MutexAcquire(pGMM);
3063 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3064 {
3065
3066 /* No allocations before the initial reservation has been made! */
3067 if (RT_LIKELY( pGVM->gmm.s.Stats.Reserved.cBasePages
3068 && pGVM->gmm.s.Stats.Reserved.cFixedPages
3069 && pGVM->gmm.s.Stats.Reserved.cShadowPages))
3070 rc = gmmR0AllocatePagesNew(pGMM, pGVM, cPages, paPages, enmAccount);
3071 else
3072 rc = VERR_WRONG_ORDER;
3073 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3074 }
3075 else
3076 rc = VERR_GMM_IS_NOT_SANE;
3077 gmmR0MutexRelease(pGMM);
3078 LogFlow(("GMMR0AllocatePages: returns %Rrc\n", rc));
3079 return rc;
3080}
3081
3082
3083/**
3084 * VMMR0 request wrapper for GMMR0AllocatePages.
3085 *
3086 * @returns see GMMR0AllocatePages.
3087 * @param pGVM The global (ring-0) VM structure.
3088 * @param idCpu The VCPU id.
3089 * @param pReq Pointer to the request packet.
3090 */
3091GMMR0DECL(int) GMMR0AllocatePagesReq(PGVM pGVM, VMCPUID idCpu, PGMMALLOCATEPAGESREQ pReq)
3092{
3093 /*
3094 * Validate input and pass it on.
3095 */
3096 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3097 AssertMsgReturn(pReq->Hdr.cbReq >= RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[0]),
3098 ("%#x < %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[0])),
3099 VERR_INVALID_PARAMETER);
3100 AssertMsgReturn(pReq->Hdr.cbReq == RT_UOFFSETOF_DYN(GMMALLOCATEPAGESREQ, aPages[pReq->cPages]),
3101 ("%#x != %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF_DYN(GMMALLOCATEPAGESREQ, aPages[pReq->cPages])),
3102 VERR_INVALID_PARAMETER);
3103
3104 return GMMR0AllocatePages(pGVM, idCpu, pReq->cPages, &pReq->aPages[0], pReq->enmAccount);
3105}
3106
3107
3108/**
3109 * Allocate a large page to represent guest RAM.
3110 *
3111 * The allocated pages are not cleared and will contain random garbage.
3112 *
3113 * @returns VBox status code:
3114 * @retval VINF_SUCCESS on success.
3115 * @retval VERR_NOT_OWNER if the caller is not an EMT.
3116 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is necessary.
3117 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
3118 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
3119 * that is we're trying to allocate more than we've reserved.
3121 *
3122 * @param pGVM The global (ring-0) VM structure.
3123 * @param idCpu The VCPU id.
3124 * @param cbPage Large page size.
3125 * @param pIdPage Where to return the GMM page ID of the page.
3126 * @param pHCPhys Where to return the host physical address of the page.
3127 */
3128GMMR0DECL(int) GMMR0AllocateLargePage(PGVM pGVM, VMCPUID idCpu, uint32_t cbPage, uint32_t *pIdPage, RTHCPHYS *pHCPhys)
3129{
3130 LogFlow(("GMMR0AllocateLargePage: pGVM=%p cbPage=%x\n", pGVM, cbPage));
3131
3132 AssertReturn(cbPage == GMM_CHUNK_SIZE, VERR_INVALID_PARAMETER);
3133 AssertPtrReturn(pIdPage, VERR_INVALID_PARAMETER);
3134 AssertPtrReturn(pHCPhys, VERR_INVALID_PARAMETER);
3135
3136 /*
3137 * Validate, get basics and take the semaphore.
3138 */
3139 PGMM pGMM;
3140 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3141 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3142 if (RT_FAILURE(rc))
3143 return rc;
3144
3145#ifdef GMM_WITH_LEGACY_MODE
3146 // /* Not supported in legacy mode where we allocate the memory in ring 3 and lock it in ring 0. */
3147 // if (pGMM->fLegacyAllocationMode)
3148 // return VERR_NOT_SUPPORTED;
3149#endif
3150
3151 *pHCPhys = NIL_RTHCPHYS;
3152 *pIdPage = NIL_GMM_PAGEID;
3153
3154 gmmR0MutexAcquire(pGMM);
3155 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3156 {
3157 const unsigned cPages = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
3158 if (RT_UNLIKELY( pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cPages
3159 > pGVM->gmm.s.Stats.Reserved.cBasePages))
3160 {
3161 Log(("GMMR0AllocateLargePage: Reserved=%#llx Allocated+Requested=%#llx+%#x!\n",
3162 pGVM->gmm.s.Stats.Reserved.cBasePages, pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3163 gmmR0MutexRelease(pGMM);
3164 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
3165 }
3166
3167 /*
3168 * Allocate a new large page chunk.
3169 *
3170 * Note! We leave the giant GMM lock temporarily as the allocation might
3171 * take a long time. gmmR0RegisterChunk will retake it (ugly).
3172 */
3173 AssertCompile(GMM_CHUNK_SIZE == _2M);
3174 gmmR0MutexRelease(pGMM);
3175
3176 RTR0MEMOBJ hMemObj;
3177 rc = RTR0MemObjAllocPhysEx(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS, GMM_CHUNK_SIZE);
3178 if (RT_SUCCESS(rc))
3179 {
3180 PGMMCHUNKFREESET pSet = pGMM->fBoundMemoryMode ? &pGVM->gmm.s.Private : &pGMM->PrivateX;
3181 PGMMCHUNK pChunk;
3182 rc = gmmR0RegisterChunk(pGMM, pSet, hMemObj, pGVM->hSelf, GMM_CHUNK_FLAGS_LARGE_PAGE, &pChunk);
3183 if (RT_SUCCESS(rc))
3184 {
3185 /*
3186 * Allocate all the pages in the chunk.
3187 */
3188 /* Unlink the new chunk from the free list. */
3189 gmmR0UnlinkChunk(pChunk);
3190
3191 /** @todo rewrite this to skip the looping. */
3192 /* Allocate all pages. */
3193 GMMPAGEDESC PageDesc;
3194 gmmR0AllocatePage(pChunk, pGVM->hSelf, &PageDesc);
3195
3196 /* Return the first page as we'll use the whole chunk as one big page. */
3197 *pIdPage = PageDesc.idPage;
3198 *pHCPhys = PageDesc.HCPhysGCPhys;
3199
3200 for (unsigned i = 1; i < cPages; i++)
3201 gmmR0AllocatePage(pChunk, pGVM->hSelf, &PageDesc);
3202
3203 /* Update accounting. */
3204 pGVM->gmm.s.Stats.Allocated.cBasePages += cPages;
3205 pGVM->gmm.s.Stats.cPrivatePages += cPages;
3206 pGMM->cAllocatedPages += cPages;
3207
3208 gmmR0LinkChunk(pChunk, pSet);
3209 gmmR0MutexRelease(pGMM);
3210 LogFlow(("GMMR0AllocateLargePage: returns VINF_SUCCESS\n"));
3211 return VINF_SUCCESS;
3212 }
3213 RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
3214 }
3215 }
3216 else
3217 {
3218 gmmR0MutexRelease(pGMM);
3219 rc = VERR_GMM_IS_NOT_SANE;
3220 }
3221
3222 LogFlow(("GMMR0AllocateLargePage: returns %Rrc\n", rc));
3223 return rc;
3224}
3225
3226
3227/**
3228 * Free a large page.
3229 *
3230 * @returns VBox status code:
3231 * @param pGVM The global (ring-0) VM structure.
3232 * @param idCpu The VCPU id.
3233 * @param idPage The large page id.
3234 */
3235GMMR0DECL(int) GMMR0FreeLargePage(PGVM pGVM, VMCPUID idCpu, uint32_t idPage)
3236{
3237 LogFlow(("GMMR0FreeLargePage: pGVM=%p idPage=%x\n", pGVM, idPage));
3238
3239 /*
3240 * Validate, get basics and take the semaphore.
3241 */
3242 PGMM pGMM;
3243 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3244 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3245 if (RT_FAILURE(rc))
3246 return rc;
3247
3248#ifdef GMM_WITH_LEGACY_MODE
3249 // /* Not supported in legacy mode where we allocate the memory in ring 3 and lock it in ring 0. */
3250 // if (pGMM->fLegacyAllocationMode)
3251 // return VERR_NOT_SUPPORTED;
3252#endif
3253
3254 gmmR0MutexAcquire(pGMM);
3255 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3256 {
3257 const unsigned cPages = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
3258
3259 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages < cPages))
3260 {
3261 Log(("GMMR0FreeLargePage: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3262 gmmR0MutexRelease(pGMM);
3263 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3264 }
3265
3266 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
3267 if (RT_LIKELY( pPage
3268 && GMM_PAGE_IS_PRIVATE(pPage)))
3269 {
3270 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3271 Assert(pChunk);
3272 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3273 Assert(pChunk->cPrivate > 0);
3274
3275 /* Release the memory immediately. */
3276 gmmR0FreeChunk(pGMM, NULL, pChunk, false /*fRelaxedSem*/); /** @todo this can be relaxed too! */
3277
3278 /* Update accounting. */
3279 pGVM->gmm.s.Stats.Allocated.cBasePages -= cPages;
3280 pGVM->gmm.s.Stats.cPrivatePages -= cPages;
3281 pGMM->cAllocatedPages -= cPages;
3282 }
3283 else
3284 rc = VERR_GMM_PAGE_NOT_FOUND;
3285 }
3286 else
3287 rc = VERR_GMM_IS_NOT_SANE;
3288
3289 gmmR0MutexRelease(pGMM);
3290 LogFlow(("GMMR0FreeLargePage: returns %Rrc\n", rc));
3291 return rc;
3292}
3293
3294
3295/**
3296 * VMMR0 request wrapper for GMMR0FreeLargePage.
3297 *
3298 * @returns see GMMR0FreeLargePage.
3299 * @param pGVM The global (ring-0) VM structure.
3300 * @param idCpu The VCPU id.
3301 * @param pReq Pointer to the request packet.
3302 */
3303GMMR0DECL(int) GMMR0FreeLargePageReq(PGVM pGVM, VMCPUID idCpu, PGMMFREELARGEPAGEREQ pReq)
3304{
3305 /*
3306 * Validate input and pass it on.
3307 */
3308 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3309 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMFREELARGEPAGEREQ),
3310 ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMFREELARGEPAGEREQ)),
3311 VERR_INVALID_PARAMETER);
3312
3313 return GMMR0FreeLargePage(pGVM, idCpu, pReq->idPage);
3314}
3315
3316
3317/**
3318 * Frees a chunk, giving it back to the host OS.
3319 *
3320 * @param pGMM Pointer to the GMM instance.
3321 * @param pGVM This is set when called from GMMR0CleanupVM so we can
3322 * unmap and free the chunk in one go.
3323 * @param pChunk The chunk to free.
3324 * @param fRelaxedSem Whether we can release the semaphore while doing the
3325 * freeing (@c true) or not.
3326 */
3327static bool gmmR0FreeChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem)
3328{
3329 Assert(pChunk->Core.Key != NIL_GMM_CHUNKID);
3330
3331 GMMR0CHUNKMTXSTATE MtxState;
3332 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
3333
3334 /*
3335 * Cleanup hack! Unmap the chunk from the callers address space.
3336 * This shouldn't happen, so screw lock contention...
3337 */
3338 if ( pChunk->cMappingsX
3339#ifdef GMM_WITH_LEGACY_MODE
3340 && (!pGMM->fLegacyAllocationMode || (pChunk->fFlags & GMM_CHUNK_FLAGS_LARGE_PAGE))
3341#endif
3342 && pGVM)
3343 gmmR0UnmapChunkLocked(pGMM, pGVM, pChunk);
3344
3345 /*
3346 * If there are current mappings of the chunk, then request the
3347 * VMs to unmap them. Reposition the chunk in the free list so
3348 * it won't be a likely candidate for allocations.
3349 */
3350 if (pChunk->cMappingsX)
3351 {
3352 /** @todo R0 -> VM request */
3353 /* The chunk can be mapped by more than one VM if fBoundMemoryMode is false! */
3354 Log(("gmmR0FreeChunk: chunk still has %d mappings; don't free!\n", pChunk->cMappingsX));
3355 gmmR0ChunkMutexRelease(&MtxState, pChunk);
3356 return false;
3357 }
3358
3359
3360 /*
3361 * Save and trash the handle.
3362 */
3363 RTR0MEMOBJ const hMemObj = pChunk->hMemObj;
3364 pChunk->hMemObj = NIL_RTR0MEMOBJ;
3365
3366 /*
3367 * Unlink it from everywhere.
3368 */
3369 gmmR0UnlinkChunk(pChunk);
3370
3371 RTListNodeRemove(&pChunk->ListNode);
3372
3373 PAVLU32NODECORE pCore = RTAvlU32Remove(&pGMM->pChunks, pChunk->Core.Key);
3374 Assert(pCore == &pChunk->Core); NOREF(pCore);
3375
3376 PGMMCHUNKTLBE pTlbe = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(pChunk->Core.Key)];
3377 if (pTlbe->pChunk == pChunk)
3378 {
3379 pTlbe->idChunk = NIL_GMM_CHUNKID;
3380 pTlbe->pChunk = NULL;
3381 }
3382
3383 Assert(pGMM->cChunks > 0);
3384 pGMM->cChunks--;
3385
3386 /*
3387 * Free the Chunk ID before dropping the locks and freeing the rest.
3388 */
3389 gmmR0FreeChunkId(pGMM, pChunk->Core.Key);
3390 pChunk->Core.Key = NIL_GMM_CHUNKID;
3391
3392 pGMM->cFreedChunks++;
3393
3394 gmmR0ChunkMutexRelease(&MtxState, NULL);
3395 if (fRelaxedSem)
3396 gmmR0MutexRelease(pGMM);
3397
3398 RTMemFree(pChunk->paMappingsX);
3399 pChunk->paMappingsX = NULL;
3400
3401 RTMemFree(pChunk);
3402
3403#if defined(VBOX_WITH_RAM_IN_KERNEL) && !defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)
3404 int rc = RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
3405#else
3406 int rc = RTR0MemObjFree(hMemObj, false /* fFreeMappings */);
3407#endif
3408 AssertLogRelRC(rc);
3409
3410 if (fRelaxedSem)
3411 gmmR0MutexAcquire(pGMM);
3412 return fRelaxedSem;
3413}
3414
3415
3416/**
3417 * Free page worker.
3418 *
3419 * The caller does all the statistics decrementing; we do all the incrementing.
3420 *
3421 * @param pGMM Pointer to the GMM instance data.
3422 * @param pGVM Pointer to the GVM instance.
3423 * @param pChunk Pointer to the chunk this page belongs to.
3424 * @param idPage The Page ID.
3425 * @param pPage Pointer to the page.
3426 */
3427static void gmmR0FreePageWorker(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, uint32_t idPage, PGMMPAGE pPage)
3428{
3429 Log3(("F pPage=%p iPage=%#x/%#x u2State=%d iFreeHead=%#x\n",
3430 pPage, pPage - &pChunk->aPages[0], idPage, pPage->Common.u2State, pChunk->iFreeHead)); NOREF(idPage);
3431
3432 /*
3433 * Put the page on the free list.
3434 */
3435 pPage->u = 0;
3436 pPage->Free.u2State = GMM_PAGE_STATE_FREE;
3437 Assert(pChunk->iFreeHead < RT_ELEMENTS(pChunk->aPages) || pChunk->iFreeHead == UINT16_MAX);
3438 pPage->Free.iNext = pChunk->iFreeHead;
3439 pChunk->iFreeHead = pPage - &pChunk->aPages[0];
3440
3441 /*
3442 * Update statistics (the cShared/cPrivate stats are up to date already),
3443 * and relink the chunk if necessary.
3444 */
3445 unsigned const cFree = pChunk->cFree;
3446 if ( !cFree
3447 || gmmR0SelectFreeSetList(cFree) != gmmR0SelectFreeSetList(cFree + 1))
3448 {
3449 gmmR0UnlinkChunk(pChunk);
3450 pChunk->cFree++;
3451 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
3452 }
3453 else
3454 {
3455 pChunk->cFree = cFree + 1;
3456 pChunk->pSet->cFreePages++;
3457 }
3458
3459 /*
3460 * If the chunk becomes empty, consider giving memory back to the host OS.
3461 *
3462 * The current strategy is to try to give it back if there are other chunks
3463 * in this free list, meaning if there are at least 240 free pages in this
3464 * category. Note that since there are probably mappings of the chunk,
3465 * it won't be freed up instantly, which probably screws up this logic
3466 * a bit...
3467 */
3468 /** @todo Do this on the way out. */
3469 if (RT_LIKELY( pChunk->cFree != GMM_CHUNK_NUM_PAGES
3470 || pChunk->pFreeNext == NULL
3471 || pChunk->pFreePrev == NULL /** @todo this is probably misfiring, see reset... */))
3472 { /* likely */ }
3473#ifdef GMM_WITH_LEGACY_MODE
3474 else if (RT_LIKELY(pGMM->fLegacyAllocationMode && !(pChunk->fFlags & GMM_CHUNK_FLAGS_LARGE_PAGE)))
3475 { /* likely */ }
3476#endif
3477 else
3478 gmmR0FreeChunk(pGMM, NULL, pChunk, false);
3479
3480}
3481
3482
3483/**
3484 * Frees a shared page, the page is known to exist and be valid and such.
3485 *
3486 * @param pGMM Pointer to the GMM instance.
3487 * @param pGVM Pointer to the GVM instance.
3488 * @param idPage The page id.
3489 * @param pPage The page structure.
3490 */
3491DECLINLINE(void) gmmR0FreeSharedPage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage)
3492{
3493 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3494 Assert(pChunk);
3495 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3496 Assert(pChunk->cShared > 0);
3497 Assert(pGMM->cSharedPages > 0);
3498 Assert(pGMM->cAllocatedPages > 0);
3499 Assert(!pPage->Shared.cRefs);
3500
3501 pChunk->cShared--;
3502 pGMM->cAllocatedPages--;
3503 pGMM->cSharedPages--;
3504 gmmR0FreePageWorker(pGMM, pGVM, pChunk, idPage, pPage);
3505}
3506
3507
3508/**
3509 * Frees a private page, the page is known to exist and be valid and such.
3510 *
3511 * @param pGMM Pointer to the GMM instance.
3512 * @param pGVM Pointer to the GVM instance.
3513 * @param idPage The page id.
3514 * @param pPage The page structure.
3515 */
3516DECLINLINE(void) gmmR0FreePrivatePage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage)
3517{
3518 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3519 Assert(pChunk);
3520 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3521 Assert(pChunk->cPrivate > 0);
3522 Assert(pGMM->cAllocatedPages > 0);
3523
3524 pChunk->cPrivate--;
3525 pGMM->cAllocatedPages--;
3526 gmmR0FreePageWorker(pGMM, pGVM, pChunk, idPage, pPage);
3527}
3528
3529
3530/**
3531 * Common worker for GMMR0FreePages and GMMR0BalloonedPages.
3532 *
3533 * @returns VBox status code:
3534 * @retval xxx
3535 *
3536 * @param pGMM Pointer to the GMM instance data.
3537 * @param pGVM Pointer to the VM.
3538 * @param cPages The number of pages to free.
3539 * @param paPages Pointer to the page descriptors.
3540 * @param enmAccount The account this relates to.
3541 */
3542static int gmmR0FreePages(PGMM pGMM, PGVM pGVM, uint32_t cPages, PGMMFREEPAGEDESC paPages, GMMACCOUNT enmAccount)
3543{
3544 /*
3545 * Check that the request isn't impossible wrt the account status.
3546 */
3547 switch (enmAccount)
3548 {
3549 case GMMACCOUNT_BASE:
3550 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages < cPages))
3551 {
3552 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3553 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3554 }
3555 break;
3556 case GMMACCOUNT_SHADOW:
3557 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cShadowPages < cPages))
3558 {
3559 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cShadowPages, cPages));
3560 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3561 }
3562 break;
3563 case GMMACCOUNT_FIXED:
3564 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cFixedPages < cPages))
3565 {
3566 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cFixedPages, cPages));
3567 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3568 }
3569 break;
3570 default:
3571 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
3572 }
3573
3574 /*
3575 * Walk the descriptors and free the pages.
3576 *
3577 * Statistics (except the account) are being updated as we go along,
3578 * unlike the alloc code. Also, stop on the first error.
3579 */
3580 int rc = VINF_SUCCESS;
3581 uint32_t iPage;
3582 for (iPage = 0; iPage < cPages; iPage++)
3583 {
3584 uint32_t idPage = paPages[iPage].idPage;
3585 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
3586 if (RT_LIKELY(pPage))
3587 {
3588 if (RT_LIKELY(GMM_PAGE_IS_PRIVATE(pPage)))
3589 {
3590 if (RT_LIKELY(pPage->Private.hGVM == pGVM->hSelf))
3591 {
3592 Assert(pGVM->gmm.s.Stats.cPrivatePages);
3593 pGVM->gmm.s.Stats.cPrivatePages--;
3594 gmmR0FreePrivatePage(pGMM, pGVM, idPage, pPage);
3595 }
3596 else
3597 {
3598 Log(("gmmR0FreePages: #%#x/%#x: not owner! hGVM=%#x hSelf=%#x\n", iPage, idPage,
3599 pPage->Private.hGVM, pGVM->hSelf));
3600 rc = VERR_GMM_NOT_PAGE_OWNER;
3601 break;
3602 }
3603 }
3604 else if (RT_LIKELY(GMM_PAGE_IS_SHARED(pPage)))
3605 {
3606 Assert(pGVM->gmm.s.Stats.cSharedPages);
3607 Assert(pPage->Shared.cRefs);
3608#if defined(VBOX_WITH_PAGE_SHARING) && defined(VBOX_STRICT) && HC_ARCH_BITS == 64
3609 if (pPage->Shared.u14Checksum)
3610 {
3611 uint32_t uChecksum = gmmR0StrictPageChecksum(pGMM, pGVM, idPage);
3612 uChecksum &= UINT32_C(0x00003fff);
3613 AssertMsg(!uChecksum || uChecksum == pPage->Shared.u14Checksum,
3614 ("%#x vs %#x - idPage=%#x\n", uChecksum, pPage->Shared.u14Checksum, idPage));
3615 }
3616#endif
3617 pGVM->gmm.s.Stats.cSharedPages--;
3618 if (!--pPage->Shared.cRefs)
3619 gmmR0FreeSharedPage(pGMM, pGVM, idPage, pPage);
3620 else
3621 {
3622 Assert(pGMM->cDuplicatePages);
3623 pGMM->cDuplicatePages--;
3624 }
3625 }
3626 else
3627 {
3628 Log(("gmmR0FreePages: #%#x/%#x: already free!\n", iPage, idPage));
3629 rc = VERR_GMM_PAGE_ALREADY_FREE;
3630 break;
3631 }
3632 }
3633 else
3634 {
3635 Log(("gmmR0FreePages: #%#x/%#x: not found!\n", iPage, idPage));
3636 rc = VERR_GMM_PAGE_NOT_FOUND;
3637 break;
3638 }
3639 paPages[iPage].idPage = NIL_GMM_PAGEID;
3640 }
3641
3642 /*
3643 * Update the account.
3644 */
3645 switch (enmAccount)
3646 {
3647 case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages -= iPage; break;
3648 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages -= iPage; break;
3649 case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages -= iPage; break;
3650 default:
3651 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
3652 }
3653
3654 /*
3655 * Any threshold stuff to be done here?
3656 */
3657
3658 return rc;
3659}
3660
3661
3662/**
3663 * Free one or more pages.
3664 *
3665 * This is typically used at reset time or power off.
3666 *
3667 * @returns VBox status code:
3668 * @retval xxx
3669 *
3670 * @param pGVM The global (ring-0) VM structure.
3671 * @param idCpu The VCPU id.
3672 * @param cPages The number of pages to free.
3673 * @param paPages Pointer to the page descriptors containing the page IDs
3674 * for each page.
3675 * @param enmAccount The account this relates to.
3676 * @thread EMT.
3677 */
3678GMMR0DECL(int) GMMR0FreePages(PGVM pGVM, VMCPUID idCpu, uint32_t cPages, PGMMFREEPAGEDESC paPages, GMMACCOUNT enmAccount)
3679{
3680 LogFlow(("GMMR0FreePages: pGVM=%p cPages=%#x paPages=%p enmAccount=%d\n", pGVM, cPages, paPages, enmAccount));
3681
3682 /*
3683 * Validate input and get the basics.
3684 */
3685 PGMM pGMM;
3686 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3687 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3688 if (RT_FAILURE(rc))
3689 return rc;
3690
3691 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
3692 AssertMsgReturn(enmAccount > GMMACCOUNT_INVALID && enmAccount < GMMACCOUNT_END, ("%d\n", enmAccount), VERR_INVALID_PARAMETER);
3693 AssertMsgReturn(cPages > 0 && cPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cPages), VERR_INVALID_PARAMETER);
3694
3695 for (unsigned iPage = 0; iPage < cPages; iPage++)
3696 AssertMsgReturn( paPages[iPage].idPage <= GMM_PAGEID_LAST
3697 /*|| paPages[iPage].idPage == NIL_GMM_PAGEID*/,
3698 ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
3699
3700 /*
3701 * Take the semaphore and call the worker function.
3702 */
3703 gmmR0MutexAcquire(pGMM);
3704 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3705 {
3706 rc = gmmR0FreePages(pGMM, pGVM, cPages, paPages, enmAccount);
3707 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3708 }
3709 else
3710 rc = VERR_GMM_IS_NOT_SANE;
3711 gmmR0MutexRelease(pGMM);
3712 LogFlow(("GMMR0FreePages: returns %Rrc\n", rc));
3713 return rc;
3714}
3715
3716
3717/**
3718 * VMMR0 request wrapper for GMMR0FreePages.
3719 *
3720 * @returns see GMMR0FreePages.
3721 * @param pGVM The global (ring-0) VM structure.
3722 * @param idCpu The VCPU id.
3723 * @param pReq Pointer to the request packet.
3724 */
3725GMMR0DECL(int) GMMR0FreePagesReq(PGVM pGVM, VMCPUID idCpu, PGMMFREEPAGESREQ pReq)
3726{
3727 /*
3728 * Validate input and pass it on.
3729 */
3730 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3731 AssertMsgReturn(pReq->Hdr.cbReq >= RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[0]),
3732 ("%#x < %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[0])),
3733 VERR_INVALID_PARAMETER);
3734 AssertMsgReturn(pReq->Hdr.cbReq == RT_UOFFSETOF_DYN(GMMFREEPAGESREQ, aPages[pReq->cPages]),
3735 ("%#x != %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF_DYN(GMMFREEPAGESREQ, aPages[pReq->cPages])),
3736 VERR_INVALID_PARAMETER);
3737
3738 return GMMR0FreePages(pGVM, idCpu, pReq->cPages, &pReq->aPages[0], pReq->enmAccount);
3739}
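
/*
 * Editor's example (a hedged sketch, not part of the original sources): building
 * the variable sized GMMFREEPAGESREQ that the wrapper above validates.  The
 * allocation helper and the page ID variables are illustrative; only the fields
 * checked by GMMR0FreePagesReq/GMMR0FreePages are taken from this file.
 * @code
 *  uint32_t const cPages = 2;
 *  PGMMFREEPAGESREQ pReq = (PGMMFREEPAGESREQ)RTMemAllocZ(RT_UOFFSETOF_DYN(GMMFREEPAGESREQ, aPages[cPages]));
 *  if (pReq)
 *  {
 *      pReq->Hdr.u32Magic = SUPVMMR0REQHDR_MAGIC;
 *      pReq->Hdr.cbReq    = RT_UOFFSETOF_DYN(GMMFREEPAGESREQ, aPages[cPages]);
 *      pReq->enmAccount   = GMMACCOUNT_BASE;
 *      pReq->cPages       = cPages;
 *      pReq->aPages[0].idPage = idFirstPage;    // IDs previously returned by the allocator
 *      pReq->aPages[1].idPage = idSecondPage;
 *      int rc = GMMR0FreePagesReq(pGVM, idCpu, pReq);
 *      RTMemFree(pReq);
 *  }
 * @endcode
 */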
3740
3741
3742/**
3743 * Report back on a memory ballooning request.
3744 *
3745 * The request may or may not have been initiated by the GMM. If it was initiated
3746 * by the GMM it is important that this function is called even if no pages were
3747 * ballooned.
3748 *
3749 * @returns VBox status code:
3750 * @retval VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH
3751 * @retval VERR_GMM_ATTEMPT_TO_DEFLATE_TOO_MUCH
3752 * @retval VERR_GMM_OVERCOMMITTED_TRY_AGAIN_IN_A_BIT - reset condition
3753 * indicating that we won't necessarily have sufficient RAM to boot
3754 * the VM again and that it should pause until this changes (we'll try
3755 * balloon some other VM). (For standard deflate we have little choice
3756 * but to hope the VM won't use the memory that was returned to it.)
3757 *
3758 * @param pGVM The global (ring-0) VM structure.
3759 * @param idCpu The VCPU id.
3760 * @param enmAction Inflate/deflate/reset.
3761 * @param cBalloonedPages The number of pages that were ballooned.
3762 *
3763 * @thread EMT(idCpu)
3764 */
3765GMMR0DECL(int) GMMR0BalloonedPages(PGVM pGVM, VMCPUID idCpu, GMMBALLOONACTION enmAction, uint32_t cBalloonedPages)
3766{
3767 LogFlow(("GMMR0BalloonedPages: pGVM=%p enmAction=%d cBalloonedPages=%#x\n",
3768 pGVM, enmAction, cBalloonedPages));
3769
3770 AssertMsgReturn(cBalloonedPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cBalloonedPages), VERR_INVALID_PARAMETER);
3771
3772 /*
3773 * Validate input and get the basics.
3774 */
3775 PGMM pGMM;
3776 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3777 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3778 if (RT_FAILURE(rc))
3779 return rc;
3780
3781 /*
3782 * Take the semaphore and do some more validations.
3783 */
3784 gmmR0MutexAcquire(pGMM);
3785 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3786 {
3787 switch (enmAction)
3788 {
3789 case GMMBALLOONACTION_INFLATE:
3790 {
3791 if (RT_LIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cBalloonedPages
3792 <= pGVM->gmm.s.Stats.Reserved.cBasePages))
3793 {
3794 /*
3795 * Record the ballooned memory.
3796 */
3797 pGMM->cBalloonedPages += cBalloonedPages;
3798 if (pGVM->gmm.s.Stats.cReqBalloonedPages)
3799 {
3800 /* Codepath never taken. Might be interesting in the future to request ballooned memory from guests in low memory conditions. */
3801 AssertFailed();
3802
3803 pGVM->gmm.s.Stats.cBalloonedPages += cBalloonedPages;
3804 pGVM->gmm.s.Stats.cReqActuallyBalloonedPages += cBalloonedPages;
3805 Log(("GMMR0BalloonedPages: +%#x - Global=%#llx / VM: Total=%#llx Req=%#llx Actual=%#llx (pending)\n",
3806 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages,
3807 pGVM->gmm.s.Stats.cReqBalloonedPages, pGVM->gmm.s.Stats.cReqActuallyBalloonedPages));
3808 }
3809 else
3810 {
3811 pGVM->gmm.s.Stats.cBalloonedPages += cBalloonedPages;
3812 Log(("GMMR0BalloonedPages: +%#x - Global=%#llx / VM: Total=%#llx (user)\n",
3813 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages));
3814 }
3815 }
3816 else
3817 {
3818 Log(("GMMR0BalloonedPages: cBasePages=%#llx Total=%#llx cBalloonedPages=%#llx Reserved=%#llx\n",
3819 pGVM->gmm.s.Stats.Allocated.cBasePages, pGVM->gmm.s.Stats.cBalloonedPages, cBalloonedPages,
3820 pGVM->gmm.s.Stats.Reserved.cBasePages));
3821 rc = VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3822 }
3823 break;
3824 }
3825
3826 case GMMBALLOONACTION_DEFLATE:
3827 {
3828 /* Deflate. */
3829 if (pGVM->gmm.s.Stats.cBalloonedPages >= cBalloonedPages)
3830 {
3831 /*
3832 * Record the ballooned memory.
3833 */
3834 Assert(pGMM->cBalloonedPages >= cBalloonedPages);
3835 pGMM->cBalloonedPages -= cBalloonedPages;
3836 pGVM->gmm.s.Stats.cBalloonedPages -= cBalloonedPages;
3837 if (pGVM->gmm.s.Stats.cReqDeflatePages)
3838 {
3839 AssertFailed(); /* This path is for later. */
3840 Log(("GMMR0BalloonedPages: -%#x - Global=%#llx / VM: Total=%#llx Req=%#llx\n",
3841 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages, pGVM->gmm.s.Stats.cReqDeflatePages));
3842
3843 /*
3844 * Anything we need to do here now when the request has been completed?
3845 */
3846 pGVM->gmm.s.Stats.cReqDeflatePages = 0;
3847 }
3848 else
3849 Log(("GMMR0BalloonedPages: -%#x - Global=%#llx / VM: Total=%#llx (user)\n",
3850 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages));
3851 }
3852 else
3853 {
3854 Log(("GMMR0BalloonedPages: Total=%#llx cBalloonedPages=%#llx\n", pGVM->gmm.s.Stats.cBalloonedPages, cBalloonedPages));
3855 rc = VERR_GMM_ATTEMPT_TO_DEFLATE_TOO_MUCH;
3856 }
3857 break;
3858 }
3859
3860 case GMMBALLOONACTION_RESET:
3861 {
3862 /* Reset to an empty balloon. */
3863 Assert(pGMM->cBalloonedPages >= pGVM->gmm.s.Stats.cBalloonedPages);
3864
3865 pGMM->cBalloonedPages -= pGVM->gmm.s.Stats.cBalloonedPages;
3866 pGVM->gmm.s.Stats.cBalloonedPages = 0;
3867 break;
3868 }
3869
3870 default:
3871 rc = VERR_INVALID_PARAMETER;
3872 break;
3873 }
3874 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3875 }
3876 else
3877 rc = VERR_GMM_IS_NOT_SANE;
3878
3879 gmmR0MutexRelease(pGMM);
3880 LogFlow(("GMMR0BalloonedPages: returns %Rrc\n", rc));
3881 return rc;
3882}
3883
3884
3885/**
3886 * VMMR0 request wrapper for GMMR0BalloonedPages.
3887 *
3888 * @returns see GMMR0BalloonedPages.
3889 * @param pGVM The global (ring-0) VM structure.
3890 * @param idCpu The VCPU id.
3891 * @param pReq Pointer to the request packet.
3892 */
3893GMMR0DECL(int) GMMR0BalloonedPagesReq(PGVM pGVM, VMCPUID idCpu, PGMMBALLOONEDPAGESREQ pReq)
3894{
3895 /*
3896 * Validate input and pass it on.
3897 */
3898 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3899 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMBALLOONEDPAGESREQ),
3900 ("%#x < %#x\n", pReq->Hdr.cbReq, sizeof(GMMBALLOONEDPAGESREQ)),
3901 VERR_INVALID_PARAMETER);
3902
3903 return GMMR0BalloonedPages(pGVM, idCpu, pReq->enmAction, pReq->cBalloonedPages);
3904}
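
/*
 * Editor's example (a hedged sketch, not part of the original sources): an
 * inflate report as validated by the wrapper above.  The page count variable
 * is illustrative; the action and field names come from this file.
 * @code
 *  GMMBALLOONEDPAGESREQ Req;
 *  Req.Hdr.u32Magic    = SUPVMMR0REQHDR_MAGIC;
 *  Req.Hdr.cbReq       = sizeof(Req);
 *  Req.enmAction       = GMMBALLOONACTION_INFLATE;
 *  Req.cBalloonedPages = cPagesGivenUpByGuest;
 *  int rc = GMMR0BalloonedPagesReq(pGVM, idCpu, &Req);
 * @endcode
 */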
3905
3906
3907/**
3908 * Return memory statistics for the hypervisor.
3909 *
3910 * @returns VBox status code.
3911 * @param pReq Pointer to the request packet.
3912 */
3913GMMR0DECL(int) GMMR0QueryHypervisorMemoryStatsReq(PGMMMEMSTATSREQ pReq)
3914{
3915 /*
3916 * Validate input and pass it on.
3917 */
3918 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3919 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMMEMSTATSREQ),
3920 ("%#x < %#x\n", pReq->Hdr.cbReq, sizeof(GMMMEMSTATSREQ)),
3921 VERR_INVALID_PARAMETER);
3922
3923 /*
3924 * Validate input and get the basics.
3925 */
3926 PGMM pGMM;
3927 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3928 pReq->cAllocPages = pGMM->cAllocatedPages;
3929 pReq->cFreePages = (pGMM->cChunks << (GMM_CHUNK_SHIFT - PAGE_SHIFT)) - pGMM->cAllocatedPages;
3930 pReq->cBalloonedPages = pGMM->cBalloonedPages;
3931 pReq->cMaxPages = pGMM->cMaxPages;
3932 pReq->cSharedPages = pGMM->cDuplicatePages;
3933 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3934
3935 return VINF_SUCCESS;
3936}
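
/*
 * Editor's example (a hedged sketch): querying the global statistics.  Note
 * that cFreePages only counts free pages inside currently allocated chunks,
 * i.e. (cChunks << (GMM_CHUNK_SHIFT - PAGE_SHIFT)) - cAllocatedPages.
 * @code
 *  GMMMEMSTATSREQ Req;
 *  RT_ZERO(Req);
 *  Req.Hdr.u32Magic = SUPVMMR0REQHDR_MAGIC;
 *  Req.Hdr.cbReq    = sizeof(Req);
 *  int rc = GMMR0QueryHypervisorMemoryStatsReq(&Req);
 *  // On success Req.cAllocPages, Req.cFreePages, Req.cBalloonedPages,
 *  // Req.cMaxPages and Req.cSharedPages hold the current global counters.
 * @endcode
 */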
3937
3938
3939/**
3940 * Return memory statistics for the VM.
3941 *
3942 * @returns VBox status code.
3943 * @param pGVM The global (ring-0) VM structure.
3944 * @param idCpu Cpu id.
3945 * @param pReq Pointer to the request packet.
3946 *
3947 * @thread EMT(idCpu)
3948 */
3949GMMR0DECL(int) GMMR0QueryMemoryStatsReq(PGVM pGVM, VMCPUID idCpu, PGMMMEMSTATSREQ pReq)
3950{
3951 /*
3952 * Validate input and pass it on.
3953 */
3954 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3955 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMMEMSTATSREQ),
3956 ("%#x < %#x\n", pReq->Hdr.cbReq, sizeof(GMMMEMSTATSREQ)),
3957 VERR_INVALID_PARAMETER);
3958
3959 /*
3960 * Validate input and get the basics.
3961 */
3962 PGMM pGMM;
3963 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3964 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3965 if (RT_FAILURE(rc))
3966 return rc;
3967
3968 /*
3969 * Take the semaphore and do some more validations.
3970 */
3971 gmmR0MutexAcquire(pGMM);
3972 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3973 {
3974 pReq->cAllocPages = pGVM->gmm.s.Stats.Allocated.cBasePages;
3975 pReq->cBalloonedPages = pGVM->gmm.s.Stats.cBalloonedPages;
3976 pReq->cMaxPages = pGVM->gmm.s.Stats.Reserved.cBasePages;
3977 pReq->cFreePages = pReq->cMaxPages - pReq->cAllocPages;
3978 }
3979 else
3980 rc = VERR_GMM_IS_NOT_SANE;
3981
3982 gmmR0MutexRelease(pGMM);
3983 LogFlow(("GMMR3QueryVMMemoryStats: returns %Rrc\n", rc));
3984 return rc;
3985}
3986
3987
3988/**
3989 * Worker for gmmR0UnmapChunk and gmmR0FreeChunk.
3990 *
3991 * Don't call this in legacy allocation mode!
3992 *
3993 * @returns VBox status code.
3994 * @param pGMM Pointer to the GMM instance data.
3995 * @param pGVM Pointer to the Global VM structure.
3996 * @param pChunk Pointer to the chunk to be unmapped.
3997 */
3998static int gmmR0UnmapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
3999{
4000 RT_NOREF_PV(pGMM);
4001#ifdef GMM_WITH_LEGACY_MODE
4002 Assert(!pGMM->fLegacyAllocationMode || (pChunk->fFlags & GMM_CHUNK_FLAGS_LARGE_PAGE));
4003#endif
4004
4005 /*
4006 * Find the mapping and try unmapping it.
4007 */
4008 uint32_t cMappings = pChunk->cMappingsX;
4009 for (uint32_t i = 0; i < cMappings; i++)
4010 {
4011 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
4012 if (pChunk->paMappingsX[i].pGVM == pGVM)
4013 {
4014 /* unmap */
4015 int rc = RTR0MemObjFree(pChunk->paMappingsX[i].hMapObj, false /* fFreeMappings (NA) */);
4016 if (RT_SUCCESS(rc))
4017 {
4018 /* update the record. */
4019 cMappings--;
4020 if (i < cMappings)
4021 pChunk->paMappingsX[i] = pChunk->paMappingsX[cMappings];
4022 pChunk->paMappingsX[cMappings].hMapObj = NIL_RTR0MEMOBJ;
4023 pChunk->paMappingsX[cMappings].pGVM = NULL;
4024 Assert(pChunk->cMappingsX - 1U == cMappings);
4025 pChunk->cMappingsX = cMappings;
4026 }
4027
4028 return rc;
4029 }
4030 }
4031
4032 Log(("gmmR0UnmapChunk: Chunk %#x is not mapped into pGVM=%p/%#x\n", pChunk->Core.Key, pGVM, pGVM->hSelf));
4033 return VERR_GMM_CHUNK_NOT_MAPPED;
4034}
4035
4036
4037/**
4038 * Unmaps a chunk previously mapped into the address space of the current process.
4039 *
4040 * @returns VBox status code.
4041 * @param pGMM Pointer to the GMM instance data.
4042 * @param pGVM Pointer to the Global VM structure.
4043 * @param pChunk Pointer to the chunk to be unmapped.
4044 * @param fRelaxedSem Whether we can release the semaphore while doing the
4045 * mapping (@c true) or not.
4046 */
4047static int gmmR0UnmapChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem)
4048{
4049#ifdef GMM_WITH_LEGACY_MODE
4050 if (!pGMM->fLegacyAllocationMode || (pChunk->fFlags & GMM_CHUNK_FLAGS_LARGE_PAGE))
4051 {
4052#endif
4053 /*
4054 * Lock the chunk and if possible leave the giant GMM lock.
4055 */
4056 GMMR0CHUNKMTXSTATE MtxState;
4057 int rc = gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk,
4058 fRelaxedSem ? GMMR0CHUNK_MTX_RETAKE_GIANT : GMMR0CHUNK_MTX_KEEP_GIANT);
4059 if (RT_SUCCESS(rc))
4060 {
4061 rc = gmmR0UnmapChunkLocked(pGMM, pGVM, pChunk);
4062 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4063 }
4064 return rc;
4065#ifdef GMM_WITH_LEGACY_MODE
4066 }
4067
4068 if (pChunk->hGVM == pGVM->hSelf)
4069 return VINF_SUCCESS;
4070
4071 Log(("gmmR0UnmapChunk: Chunk %#x is not mapped into pGVM=%p/%#x (legacy)\n", pChunk->Core.Key, pGVM, pGVM->hSelf));
4072 return VERR_GMM_CHUNK_NOT_MAPPED;
4073#endif
4074}
4075
4076
4077/**
4078 * Worker for gmmR0MapChunk.
4079 *
4080 * @returns VBox status code.
4081 * @param pGMM Pointer to the GMM instance data.
4082 * @param pGVM Pointer to the Global VM structure.
4083 * @param pChunk Pointer to the chunk to be mapped.
4084 * @param ppvR3 Where to store the ring-3 address of the mapping.
4085 * In the VERR_GMM_CHUNK_ALREADY_MAPPED case, this will
4086 * contain the address of the existing mapping.
4087 */
4088static int gmmR0MapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, PRTR3PTR ppvR3)
4089{
4090#ifdef GMM_WITH_LEGACY_MODE
4091 /*
4092 * If we're in legacy mode this is simple.
4093 */
4094 if (pGMM->fLegacyAllocationMode && !(pChunk->fFlags & GMM_CHUNK_FLAGS_LARGE_PAGE))
4095 {
4096 if (pChunk->hGVM != pGVM->hSelf)
4097 {
4098 Log(("gmmR0MapChunk: chunk %#x is already mapped at %p!\n", pChunk->Core.Key, *ppvR3));
4099 return VERR_GMM_CHUNK_NOT_FOUND;
4100 }
4101
4102 *ppvR3 = RTR0MemObjAddressR3(pChunk->hMemObj);
4103 return VINF_SUCCESS;
4104 }
4105#else
4106 RT_NOREF(pGMM);
4107#endif
4108
4109 /*
4110 * Check to see if the chunk is already mapped.
4111 */
4112 for (uint32_t i = 0; i < pChunk->cMappingsX; i++)
4113 {
4114 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
4115 if (pChunk->paMappingsX[i].pGVM == pGVM)
4116 {
4117 *ppvR3 = RTR0MemObjAddressR3(pChunk->paMappingsX[i].hMapObj);
4118 Log(("gmmR0MapChunk: chunk %#x is already mapped at %p!\n", pChunk->Core.Key, *ppvR3));
4119#ifdef VBOX_WITH_PAGE_SHARING
4120 /* The ring-3 chunk cache can be out of sync; don't fail. */
4121 return VINF_SUCCESS;
4122#else
4123 return VERR_GMM_CHUNK_ALREADY_MAPPED;
4124#endif
4125 }
4126 }
4127
4128 /*
4129 * Do the mapping.
4130 */
4131 RTR0MEMOBJ hMapObj;
4132 int rc = RTR0MemObjMapUser(&hMapObj, pChunk->hMemObj, (RTR3PTR)-1, 0, RTMEM_PROT_READ | RTMEM_PROT_WRITE, NIL_RTR0PROCESS);
4133 if (RT_SUCCESS(rc))
4134 {
4135 /* reallocate the array? assumes few users per chunk (usually one). */
4136 unsigned iMapping = pChunk->cMappingsX;
4137 if ( iMapping <= 3
4138 || (iMapping & 3) == 0)
4139 {
4140 unsigned cNewSize = iMapping <= 3
4141 ? iMapping + 1
4142 : iMapping + 4;
4143 Assert(cNewSize < 4 || RT_ALIGN_32(cNewSize, 4) == cNewSize);
4144 if (RT_UNLIKELY(cNewSize > UINT16_MAX))
4145 {
4146 rc = RTR0MemObjFree(hMapObj, false /* fFreeMappings (NA) */); AssertRC(rc);
4147 return VERR_GMM_TOO_MANY_CHUNK_MAPPINGS;
4148 }
4149
4150 void *pvMappings = RTMemRealloc(pChunk->paMappingsX, cNewSize * sizeof(pChunk->paMappingsX[0]));
4151 if (RT_UNLIKELY(!pvMappings))
4152 {
4153 rc = RTR0MemObjFree(hMapObj, false /* fFreeMappings (NA) */); AssertRC(rc);
4154 return VERR_NO_MEMORY;
4155 }
4156 pChunk->paMappingsX = (PGMMCHUNKMAP)pvMappings;
4157 }
4158
4159 /* insert new entry */
4160 pChunk->paMappingsX[iMapping].hMapObj = hMapObj;
4161 pChunk->paMappingsX[iMapping].pGVM = pGVM;
4162 Assert(pChunk->cMappingsX == iMapping);
4163 pChunk->cMappingsX = iMapping + 1;
4164
4165 *ppvR3 = RTR0MemObjAddressR3(hMapObj);
4166 }
4167
4168 return rc;
4169}
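
/*
 * Editor's note on the mapping array growth above: the array is reallocated
 * while cMappingsX is 0..3 (one extra entry each time) and again whenever it
 * is a multiple of four (four extra entries), so the capacity sequence is
 * 1, 2, 3, 4, 8, 12, 16, ...  This keeps the common one-mapping case at a
 * single small allocation while bounding the number of reallocations for
 * chunks mapped into many VMs (only possible when fBoundMemoryMode is false).
 */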
4170
4171
4172/**
4173 * Maps a chunk into the user address space of the current process.
4174 *
4175 * @returns VBox status code.
4176 * @param pGMM Pointer to the GMM instance data.
4177 * @param pGVM Pointer to the Global VM structure.
4178 * @param pChunk Pointer to the chunk to be mapped.
4179 * @param fRelaxedSem Whether we can release the semaphore while doing the
4180 * mapping (@c true) or not.
4181 * @param ppvR3 Where to store the ring-3 address of the mapping.
4182 * In the VERR_GMM_CHUNK_ALREADY_MAPPED case, this will
4183 * contain the address of the existing mapping.
4184 */
4185static int gmmR0MapChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem, PRTR3PTR ppvR3)
4186{
4187 /*
4188 * Take the chunk lock and leave the giant GMM lock when possible, then
4189 * call the worker function.
4190 */
4191 GMMR0CHUNKMTXSTATE MtxState;
4192 int rc = gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk,
4193 fRelaxedSem ? GMMR0CHUNK_MTX_RETAKE_GIANT : GMMR0CHUNK_MTX_KEEP_GIANT);
4194 if (RT_SUCCESS(rc))
4195 {
4196 rc = gmmR0MapChunkLocked(pGMM, pGVM, pChunk, ppvR3);
4197 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4198 }
4199
4200 return rc;
4201}
4202
4203
4204
4205#if defined(VBOX_WITH_PAGE_SHARING) || (defined(VBOX_STRICT) && HC_ARCH_BITS == 64)
4206/**
4207 * Check if a chunk is mapped into the specified VM
4208 *
4209 * @returns mapped yes/no
4210 * @param pGMM Pointer to the GMM instance.
4211 * @param pGVM Pointer to the Global VM structure.
4212 * @param pChunk Pointer to the chunk to be mapped.
4213 * @param ppvR3 Where to store the ring-3 address of the mapping.
4214 */
4215static bool gmmR0IsChunkMapped(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, PRTR3PTR ppvR3)
4216{
4217 GMMR0CHUNKMTXSTATE MtxState;
4218 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
4219 for (uint32_t i = 0; i < pChunk->cMappingsX; i++)
4220 {
4221 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
4222 if (pChunk->paMappingsX[i].pGVM == pGVM)
4223 {
4224 *ppvR3 = RTR0MemObjAddressR3(pChunk->paMappingsX[i].hMapObj);
4225 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4226 return true;
4227 }
4228 }
4229 *ppvR3 = NULL;
4230 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4231 return false;
4232}
4233#endif /* VBOX_WITH_PAGE_SHARING || (VBOX_STRICT && 64-BIT) */
4234
4235
4236/**
4237 * Map a chunk and/or unmap another chunk.
4238 *
4239 * The mapping and unmapping applies to the current process.
4240 *
4241 * This API does two things because it saves a kernel call per mapping
4242 * when the ring-3 mapping cache is full.
4243 *
4244 * @returns VBox status code.
4245 * @param pGVM The global (ring-0) VM structure.
4246 * @param idChunkMap The chunk to map. NIL_GMM_CHUNKID if nothing to map.
4247 * @param idChunkUnmap The chunk to unmap. NIL_GMM_CHUNKID if nothing to unmap.
4248 * @param ppvR3 Where to store the address of the mapped chunk. NULL is ok if nothing to map.
4249 * @thread EMT ???
4250 */
4251GMMR0DECL(int) GMMR0MapUnmapChunk(PGVM pGVM, uint32_t idChunkMap, uint32_t idChunkUnmap, PRTR3PTR ppvR3)
4252{
4253 LogFlow(("GMMR0MapUnmapChunk: pGVM=%p idChunkMap=%#x idChunkUnmap=%#x ppvR3=%p\n",
4254 pGVM, idChunkMap, idChunkUnmap, ppvR3));
4255
4256 /*
4257 * Validate input and get the basics.
4258 */
4259 PGMM pGMM;
4260 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4261 int rc = GVMMR0ValidateGVM(pGVM);
4262 if (RT_FAILURE(rc))
4263 return rc;
4264
4265 AssertCompile(NIL_GMM_CHUNKID == 0);
4266 AssertMsgReturn(idChunkMap <= GMM_CHUNKID_LAST, ("%#x\n", idChunkMap), VERR_INVALID_PARAMETER);
4267 AssertMsgReturn(idChunkUnmap <= GMM_CHUNKID_LAST, ("%#x\n", idChunkUnmap), VERR_INVALID_PARAMETER);
4268
4269 if ( idChunkMap == NIL_GMM_CHUNKID
4270 && idChunkUnmap == NIL_GMM_CHUNKID)
4271 return VERR_INVALID_PARAMETER;
4272
4273 if (idChunkMap != NIL_GMM_CHUNKID)
4274 {
4275 AssertPtrReturn(ppvR3, VERR_INVALID_POINTER);
4276 *ppvR3 = NIL_RTR3PTR;
4277 }
4278
4279 /*
4280 * Take the semaphore and do the work.
4281 *
4282 * The unmapping is done last since it's easier to undo a mapping than
4283 * undoing an unmapping. The ring-3 mapping cache cannot be so big
4284 * that it pushes the user virtual address space to within a chunk of
4285 * its limits, so no problem here.
4286 */
4287 gmmR0MutexAcquire(pGMM);
4288 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4289 {
4290 PGMMCHUNK pMap = NULL;
4291 if (idChunkMap != NIL_GMM_CHUNKID)
4292 {
4293 pMap = gmmR0GetChunk(pGMM, idChunkMap);
4294 if (RT_LIKELY(pMap))
4295 rc = gmmR0MapChunk(pGMM, pGVM, pMap, true /*fRelaxedSem*/, ppvR3);
4296 else
4297 {
4298 Log(("GMMR0MapUnmapChunk: idChunkMap=%#x\n", idChunkMap));
4299 rc = VERR_GMM_CHUNK_NOT_FOUND;
4300 }
4301 }
4302/** @todo split this operation, the bail out might (theoretically) not be
4303 * entirely safe. */
4304
4305 if ( idChunkUnmap != NIL_GMM_CHUNKID
4306 && RT_SUCCESS(rc))
4307 {
4308 PGMMCHUNK pUnmap = gmmR0GetChunk(pGMM, idChunkUnmap);
4309 if (RT_LIKELY(pUnmap))
4310 rc = gmmR0UnmapChunk(pGMM, pGVM, pUnmap, true /*fRelaxedSem*/);
4311 else
4312 {
4313 Log(("GMMR0MapUnmapChunk: idChunkUnmap=%#x\n", idChunkUnmap));
4314 rc = VERR_GMM_CHUNK_NOT_FOUND;
4315 }
4316
4317 if (RT_FAILURE(rc) && pMap)
4318 gmmR0UnmapChunk(pGMM, pGVM, pMap, false /*fRelaxedSem*/);
4319 }
4320
4321 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4322 }
4323 else
4324 rc = VERR_GMM_IS_NOT_SANE;
4325 gmmR0MutexRelease(pGMM);
4326
4327 LogFlow(("GMMR0MapUnmapChunk: returns %Rrc\n", rc));
4328 return rc;
4329}
4330
4331
4332/**
4333 * VMMR0 request wrapper for GMMR0MapUnmapChunk.
4334 *
4335 * @returns see GMMR0MapUnmapChunk.
4336 * @param pGVM The global (ring-0) VM structure.
4337 * @param pReq Pointer to the request packet.
4338 */
4339GMMR0DECL(int) GMMR0MapUnmapChunkReq(PGVM pGVM, PGMMMAPUNMAPCHUNKREQ pReq)
4340{
4341 /*
4342 * Validate input and pass it on.
4343 */
4344 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4345 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4346
4347 return GMMR0MapUnmapChunk(pGVM, pReq->idChunkMap, pReq->idChunkUnmap, &pReq->pvR3);
4348}
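
/*
 * Editor's example (a hedged sketch, not part of the original sources): mapping
 * one chunk while evicting another in the same call, as the ring-3 mapping
 * cache would when it is full.  The chunk ID variables are illustrative.
 * @code
 *  GMMMAPUNMAPCHUNKREQ Req;
 *  Req.Hdr.u32Magic = SUPVMMR0REQHDR_MAGIC;
 *  Req.Hdr.cbReq    = sizeof(Req);
 *  Req.idChunkMap   = idChunkToMap;    // NIL_GMM_CHUNKID if only unmapping
 *  Req.idChunkUnmap = idChunkToEvict;  // NIL_GMM_CHUNKID if only mapping
 *  Req.pvR3         = NIL_RTR3PTR;
 *  int rc = GMMR0MapUnmapChunkReq(pGVM, &Req);
 *  // On success Req.pvR3 holds the ring-3 address of the newly mapped chunk.
 * @endcode
 */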
4349
4350
4351/**
4352 * Legacy mode API for supplying pages.
4353 *
4354 * The specified user address points to an allocation chunk sized block that
4355 * will be locked down and used by the GMM when the GM asks for pages.
4356 *
4357 * @returns VBox status code.
4358 * @param pGVM The global (ring-0) VM structure.
4359 * @param idCpu The VCPU id.
4360 * @param pvR3 Pointer to the chunk size memory block to lock down.
4361 */
4362GMMR0DECL(int) GMMR0SeedChunk(PGVM pGVM, VMCPUID idCpu, RTR3PTR pvR3)
4363{
4364#ifdef GMM_WITH_LEGACY_MODE
4365 /*
4366 * Validate input and get the basics.
4367 */
4368 PGMM pGMM;
4369 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4370 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
4371 if (RT_FAILURE(rc))
4372 return rc;
4373
4374 AssertPtrReturn(pvR3, VERR_INVALID_POINTER);
4375 AssertReturn(!(PAGE_OFFSET_MASK & pvR3), VERR_INVALID_POINTER);
4376
4377 if (!pGMM->fLegacyAllocationMode)
4378 {
4379 Log(("GMMR0SeedChunk: not in legacy allocation mode!\n"));
4380 return VERR_NOT_SUPPORTED;
4381 }
4382
4383 /*
4384 * Lock the memory and add it as new chunk with our hGVM.
4385 * (The GMM locking is done inside gmmR0RegisterChunk.)
4386 */
4387 RTR0MEMOBJ hMemObj;
4388 rc = RTR0MemObjLockUser(&hMemObj, pvR3, GMM_CHUNK_SIZE, RTMEM_PROT_READ | RTMEM_PROT_WRITE, NIL_RTR0PROCESS);
4389 if (RT_SUCCESS(rc))
4390 {
4391 rc = gmmR0RegisterChunk(pGMM, &pGVM->gmm.s.Private, hMemObj, pGVM->hSelf, GMM_CHUNK_FLAGS_SEEDED, NULL);
4392 if (RT_SUCCESS(rc))
4393 gmmR0MutexRelease(pGMM);
4394 else
4395 RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
4396 }
4397
4398 LogFlow(("GMMR0SeedChunk: rc=%d (pvR3=%p)\n", rc, pvR3));
4399 return rc;
4400#else
4401 RT_NOREF(pGVM, idCpu, pvR3);
4402 return VERR_NOT_SUPPORTED;
4403#endif
4404}
4405
4406#if defined(VBOX_WITH_RAM_IN_KERNEL) && !defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)
4407
4408/**
4409 * Gets the ring-0 virtual address for the given page.
4410 *
4411 * @returns VBox status code.
4412 * @param pGVM Pointer to the kernel-only VM instance data.
4413 * @param idPage The page ID.
4414 * @param ppv Where to store the address.
4415 * @thread EMT
4416 */
4417GMMR0DECL(int) GMMR0PageIdToVirt(PGVM pGVM, uint32_t idPage, void **ppv)
4418{
4419 *ppv = NULL;
4420 PGMM pGMM;
4421 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4422 gmmR0MutexAcquire(pGMM); /** @todo shared access */
4423
4424 int rc;
4425 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
4426 if (pChunk)
4427 {
4428 const GMMPAGE *pPage = &pChunk->aPages[idPage & GMM_PAGEID_IDX_MASK];
4429 if (RT_LIKELY( ( GMM_PAGE_IS_PRIVATE(pPage)
4430 && pPage->Private.hGVM == pGVM->hSelf)
4431 || GMM_PAGE_IS_SHARED(pPage)))
4432 {
4433 AssertPtr(pChunk->pbMapping);
4434 *ppv = &pChunk->pbMapping[(idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT];
4435 rc = VINF_SUCCESS;
4436 }
4437 else
4438 rc = VERR_GMM_NOT_PAGE_OWNER;
4439 }
4440 else
4441 rc = VERR_GMM_PAGE_NOT_FOUND;
4442
4443 gmmR0MutexRelease(pGMM);
4444 return rc;
4445}
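
/*
 * Editor's example (a hedged sketch): the lookup above simply decodes the page
 * ID into a chunk and a page index: chunk ID = idPage >> GMM_CHUNKID_SHIFT,
 * byte offset into the chunk mapping = (idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT.
 * @code
 *  void *pvPage = NULL;
 *  int rc = GMMR0PageIdToVirt(pGVM, idPage, &pvPage);
 *  if (RT_SUCCESS(rc))
 *  {
 *      // pvPage now points at the start of the guest page in ring-0.
 *  }
 * @endcode
 */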
4446
4447#endif
4448
4449#ifdef VBOX_WITH_PAGE_SHARING
4450
4451# ifdef VBOX_STRICT
4452/**
4453 * For checksumming shared pages in strict builds.
4454 *
4455 * The purpose is making sure that a page doesn't change.
4456 *
4457 * @returns Checksum, 0 on failure.
4458 * @param pGMM The GMM instance data.
4459 * @param pGVM Pointer to the kernel-only VM instance data.
4460 * @param idPage The page ID.
4461 */
4462static uint32_t gmmR0StrictPageChecksum(PGMM pGMM, PGVM pGVM, uint32_t idPage)
4463{
4464 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
4465 AssertMsgReturn(pChunk, ("idPage=%#x\n", idPage), 0);
4466
4467 uint8_t *pbChunk;
4468 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
4469 return 0;
4470 uint8_t const *pbPage = pbChunk + ((idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
4471
4472 return RTCrc32(pbPage, PAGE_SIZE);
4473}
4474# endif /* VBOX_STRICT */
4475
4476
4477/**
4478 * Calculates the module hash value.
4479 *
4480 * @returns Hash value.
4481 * @param pszModuleName The module name.
4482 * @param pszVersion The module version string.
4483 */
4484static uint32_t gmmR0ShModCalcHash(const char *pszModuleName, const char *pszVersion)
4485{
4486 return RTStrHash1ExN(3, pszModuleName, RTSTR_MAX, "::", (size_t)2, pszVersion, RTSTR_MAX);
4487}
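
/*
 * Editor's note (assumption about the IPRT semantics): RTStrHash1ExN hashes the
 * three parts as if they were one concatenated string, so the module key is
 * effectively the hash of "<name>::<version>".
 * @code
 *  // Illustrative only -- the actual module name/version depend on the guest:
 *  uint32_t uHash = gmmR0ShModCalcHash("ntdll.dll", "6.1.7601.17514");
 * @endcode
 */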
4488
4489
4490/**
4491 * Finds a global module.
4492 *
4493 * @returns Pointer to the global module on success, NULL if not found.
4494 * @param pGMM The GMM instance data.
4495 * @param uHash The hash as calculated by gmmR0ShModCalcHash.
4496 * @param cbModule The module size.
4497 * @param enmGuestOS The guest OS type.
4498 * @param cRegions The number of regions.
4499 * @param pszModuleName The module name.
4500 * @param pszVersion The module version.
4501 * @param paRegions The region descriptions.
4502 */
4503static PGMMSHAREDMODULE gmmR0ShModFindGlobal(PGMM pGMM, uint32_t uHash, uint32_t cbModule, VBOXOSFAMILY enmGuestOS,
4504 uint32_t cRegions, const char *pszModuleName, const char *pszVersion,
4505 struct VMMDEVSHAREDREGIONDESC const *paRegions)
4506{
4507 for (PGMMSHAREDMODULE pGblMod = (PGMMSHAREDMODULE)RTAvllU32Get(&pGMM->pGlobalSharedModuleTree, uHash);
4508 pGblMod;
4509 pGblMod = (PGMMSHAREDMODULE)pGblMod->Core.pList)
4510 {
4511 if (pGblMod->cbModule != cbModule)
4512 continue;
4513 if (pGblMod->enmGuestOS != enmGuestOS)
4514 continue;
4515 if (pGblMod->cRegions != cRegions)
4516 continue;
4517 if (strcmp(pGblMod->szName, pszModuleName))
4518 continue;
4519 if (strcmp(pGblMod->szVersion, pszVersion))
4520 continue;
4521
4522 uint32_t i;
4523 for (i = 0; i < cRegions; i++)
4524 {
4525 uint32_t off = paRegions[i].GCRegionAddr & PAGE_OFFSET_MASK;
4526 if (pGblMod->aRegions[i].off != off)
4527 break;
4528
4529 uint32_t cb = RT_ALIGN_32(paRegions[i].cbRegion + off, PAGE_SIZE);
4530 if (pGblMod->aRegions[i].cb != cb)
4531 break;
4532 }
4533
4534 if (i == cRegions)
4535 return pGblMod;
4536 }
4537
4538 return NULL;
4539}
4540
4541
4542/**
4543 * Creates a new global module.
4544 *
4545 * @returns VBox status code.
4546 * @param pGMM The GMM instance data.
4547 * @param uHash The hash as calculated by gmmR0ShModCalcHash.
4548 * @param cbModule The module size.
4549 * @param enmGuestOS The guest OS type.
4550 * @param cRegions The number of regions.
4551 * @param pszModuleName The module name.
4552 * @param pszVersion The module version.
4553 * @param paRegions The region descriptions.
4554 * @param ppGblMod Where to return the new module on success.
4555 */
4556static int gmmR0ShModNewGlobal(PGMM pGMM, uint32_t uHash, uint32_t cbModule, VBOXOSFAMILY enmGuestOS,
4557 uint32_t cRegions, const char *pszModuleName, const char *pszVersion,
4558 struct VMMDEVSHAREDREGIONDESC const *paRegions, PGMMSHAREDMODULE *ppGblMod)
4559{
4560 Log(("gmmR0ShModNewGlobal: %s %s size %#x os %u rgn %u\n", pszModuleName, pszVersion, cbModule, enmGuestOS, cRegions));
4561 if (pGMM->cShareableModules >= GMM_MAX_SHARED_GLOBAL_MODULES)
4562 {
4563 Log(("gmmR0ShModNewGlobal: Too many modules\n"));
4564 return VERR_GMM_TOO_MANY_GLOBAL_MODULES;
4565 }
4566
4567 PGMMSHAREDMODULE pGblMod = (PGMMSHAREDMODULE)RTMemAllocZ(RT_UOFFSETOF_DYN(GMMSHAREDMODULE, aRegions[cRegions]));
4568 if (!pGblMod)
4569 {
4570 Log(("gmmR0ShModNewGlobal: No memory\n"));
4571 return VERR_NO_MEMORY;
4572 }
4573
4574 pGblMod->Core.Key = uHash;
4575 pGblMod->cbModule = cbModule;
4576 pGblMod->cRegions = cRegions;
4577 pGblMod->cUsers = 1;
4578 pGblMod->enmGuestOS = enmGuestOS;
4579 strcpy(pGblMod->szName, pszModuleName);
4580 strcpy(pGblMod->szVersion, pszVersion);
4581
4582 for (uint32_t i = 0; i < cRegions; i++)
4583 {
4584 Log(("gmmR0ShModNewGlobal: rgn[%u]=%RGvLB%#x\n", i, paRegions[i].GCRegionAddr, paRegions[i].cbRegion));
4585 pGblMod->aRegions[i].off = paRegions[i].GCRegionAddr & PAGE_OFFSET_MASK;
4586 pGblMod->aRegions[i].cb = paRegions[i].cbRegion + pGblMod->aRegions[i].off;
4587 pGblMod->aRegions[i].cb = RT_ALIGN_32(pGblMod->aRegions[i].cb, PAGE_SIZE);
4588 pGblMod->aRegions[i].paidPages = NULL; /* allocated when needed. */
4589 }
4590
4591 bool fInsert = RTAvllU32Insert(&pGMM->pGlobalSharedModuleTree, &pGblMod->Core);
4592 Assert(fInsert); NOREF(fInsert);
4593 pGMM->cShareableModules++;
4594
4595 *ppGblMod = pGblMod;
4596 return VINF_SUCCESS;
4597}
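
/*
 * Editor's worked example of the region rounding above: a region reported at
 * GCRegionAddr=0x7ffe1234 with cbRegion=0x1000 is stored as off=0x234 and
 * cb=RT_ALIGN_32(0x1000 + 0x234, PAGE_SIZE)=0x2000, i.e. the region is widened
 * to cover every guest page it touches.
 */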
4598
4599
4600/**
4601 * Deletes a global module which is no longer referenced by anyone.
4602 *
4603 * @param pGMM The GMM instance data.
4604 * @param pGblMod The module to delete.
4605 */
4606static void gmmR0ShModDeleteGlobal(PGMM pGMM, PGMMSHAREDMODULE pGblMod)
4607{
4608 Assert(pGblMod->cUsers == 0);
4609 Assert(pGMM->cShareableModules > 0 && pGMM->cShareableModules <= GMM_MAX_SHARED_GLOBAL_MODULES);
4610
4611 void *pvTest = RTAvllU32RemoveNode(&pGMM->pGlobalSharedModuleTree, &pGblMod->Core);
4612 Assert(pvTest == pGblMod); NOREF(pvTest);
4613 pGMM->cShareableModules--;
4614
4615 uint32_t i = pGblMod->cRegions;
4616 while (i-- > 0)
4617 {
4618 if (pGblMod->aRegions[i].paidPages)
4619 {
4620 /* We don't do anything to the pages as they are handled by the
4621 copy-on-write mechanism in PGM. */
4622 RTMemFree(pGblMod->aRegions[i].paidPages);
4623 pGblMod->aRegions[i].paidPages = NULL;
4624 }
4625 }
4626 RTMemFree(pGblMod);
4627}
4628
4629
4630static int gmmR0ShModNewPerVM(PGVM pGVM, RTGCPTR GCBaseAddr, uint32_t cRegions, const VMMDEVSHAREDREGIONDESC *paRegions,
4631 PGMMSHAREDMODULEPERVM *ppRecVM)
4632{
4633 if (pGVM->gmm.s.Stats.cShareableModules >= GMM_MAX_SHARED_PER_VM_MODULES)
4634 return VERR_GMM_TOO_MANY_PER_VM_MODULES;
4635
4636 PGMMSHAREDMODULEPERVM pRecVM;
4637 pRecVM = (PGMMSHAREDMODULEPERVM)RTMemAllocZ(RT_UOFFSETOF_DYN(GMMSHAREDMODULEPERVM, aRegionsGCPtrs[cRegions]));
4638 if (!pRecVM)
4639 return VERR_NO_MEMORY;
4640
4641 pRecVM->Core.Key = GCBaseAddr;
4642 for (uint32_t i = 0; i < cRegions; i++)
4643 pRecVM->aRegionsGCPtrs[i] = paRegions[i].GCRegionAddr;
4644
4645 bool fInsert = RTAvlGCPtrInsert(&pGVM->gmm.s.pSharedModuleTree, &pRecVM->Core);
4646 Assert(fInsert); NOREF(fInsert);
4647 pGVM->gmm.s.Stats.cShareableModules++;
4648
4649 *ppRecVM = pRecVM;
4650 return VINF_SUCCESS;
4651}
4652
4653
4654static void gmmR0ShModDeletePerVM(PGMM pGMM, PGVM pGVM, PGMMSHAREDMODULEPERVM pRecVM, bool fRemove)
4655{
4656 /*
4657 * Free the per-VM module.
4658 */
4659 PGMMSHAREDMODULE pGblMod = pRecVM->pGlobalModule;
4660 pRecVM->pGlobalModule = NULL;
4661
4662 if (fRemove)
4663 {
4664 void *pvTest = RTAvlGCPtrRemove(&pGVM->gmm.s.pSharedModuleTree, pRecVM->Core.Key);
4665 Assert(pvTest == &pRecVM->Core); NOREF(pvTest);
4666 }
4667
4668 RTMemFree(pRecVM);
4669
4670 /*
4671 * Release the global module.
4672 * (In the registration bailout case, it might not be.)
4673 */
4674 if (pGblMod)
4675 {
4676 Assert(pGblMod->cUsers > 0);
4677 pGblMod->cUsers--;
4678 if (pGblMod->cUsers == 0)
4679 gmmR0ShModDeleteGlobal(pGMM, pGblMod);
4680 }
4681}
4682
4683#endif /* VBOX_WITH_PAGE_SHARING */
4684
4685/**
4686 * Registers a new shared module for the VM.
4687 *
4688 * @returns VBox status code.
4689 * @param pGVM The global (ring-0) VM structure.
4690 * @param idCpu The VCPU id.
4691 * @param enmGuestOS The guest OS type.
4692 * @param pszModuleName The module name.
4693 * @param pszVersion The module version.
4694 * @param GCPtrModBase The module base address.
4695 * @param cbModule The module size.
4696 * @param cRegions The number of shared region descriptors.
4697 * @param paRegions Pointer to an array of shared region(s).
4698 * @thread EMT(idCpu)
4699 */
4700GMMR0DECL(int) GMMR0RegisterSharedModule(PGVM pGVM, VMCPUID idCpu, VBOXOSFAMILY enmGuestOS, char *pszModuleName,
4701 char *pszVersion, RTGCPTR GCPtrModBase, uint32_t cbModule,
4702 uint32_t cRegions, struct VMMDEVSHAREDREGIONDESC const *paRegions)
4703{
4704#ifdef VBOX_WITH_PAGE_SHARING
4705 /*
4706 * Validate input and get the basics.
4707 *
4708 * Note! Turns out the module size does not necessarily match the size of the
4709 * regions. (iTunes on XP)
4710 */
4711 PGMM pGMM;
4712 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4713 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
4714 if (RT_FAILURE(rc))
4715 return rc;
4716
4717 if (RT_UNLIKELY(cRegions > VMMDEVSHAREDREGIONDESC_MAX))
4718 return VERR_GMM_TOO_MANY_REGIONS;
4719
4720 if (RT_UNLIKELY(cbModule == 0 || cbModule > _1G))
4721 return VERR_GMM_BAD_SHARED_MODULE_SIZE;
4722
4723 uint32_t cbTotal = 0;
4724 for (uint32_t i = 0; i < cRegions; i++)
4725 {
4726 if (RT_UNLIKELY(paRegions[i].cbRegion == 0 || paRegions[i].cbRegion > _1G))
4727 return VERR_GMM_SHARED_MODULE_BAD_REGIONS_SIZE;
4728
4729 cbTotal += paRegions[i].cbRegion;
4730 if (RT_UNLIKELY(cbTotal > _1G))
4731 return VERR_GMM_SHARED_MODULE_BAD_REGIONS_SIZE;
4732 }
4733
4734 AssertPtrReturn(pszModuleName, VERR_INVALID_POINTER);
4735 if (RT_UNLIKELY(!memchr(pszModuleName, '\0', GMM_SHARED_MODULE_MAX_NAME_STRING)))
4736 return VERR_GMM_MODULE_NAME_TOO_LONG;
4737
4738 AssertPtrReturn(pszVersion, VERR_INVALID_POINTER);
4739 if (RT_UNLIKELY(!memchr(pszVersion, '\0', GMM_SHARED_MODULE_MAX_VERSION_STRING)))
4740 return VERR_GMM_MODULE_NAME_TOO_LONG;
4741
4742 uint32_t const uHash = gmmR0ShModCalcHash(pszModuleName, pszVersion);
4743 Log(("GMMR0RegisterSharedModule %s %s base %RGv size %x hash %x\n", pszModuleName, pszVersion, GCPtrModBase, cbModule, uHash));
4744
4745 /*
4746 * Take the semaphore and do some more validations.
4747 */
4748 gmmR0MutexAcquire(pGMM);
4749 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4750 {
4751 /*
4752 * Check if this module is already locally registered and register
4753 * it if it isn't. The base address is a unique module identifier
4754 * locally.
4755 */
4756 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)RTAvlGCPtrGet(&pGVM->gmm.s.pSharedModuleTree, GCPtrModBase);
4757 bool fNewModule = pRecVM == NULL;
4758 if (fNewModule)
4759 {
4760 rc = gmmR0ShModNewPerVM(pGVM, GCPtrModBase, cRegions, paRegions, &pRecVM);
4761 if (RT_SUCCESS(rc))
4762 {
4763 /*
4764 * Find a matching global module, register a new one if needed.
4765 */
4766 PGMMSHAREDMODULE pGblMod = gmmR0ShModFindGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4767 pszModuleName, pszVersion, paRegions);
4768 if (!pGblMod)
4769 {
4770 Assert(fNewModule);
4771 rc = gmmR0ShModNewGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4772 pszModuleName, pszVersion, paRegions, &pGblMod);
4773 if (RT_SUCCESS(rc))
4774 {
4775 pRecVM->pGlobalModule = pGblMod; /* (One reference returned by gmmR0ShModNewGlobal.) */
4776 Log(("GMMR0RegisterSharedModule: new module %s %s\n", pszModuleName, pszVersion));
4777 }
4778 else
4779 gmmR0ShModDeletePerVM(pGMM, pGVM, pRecVM, true /*fRemove*/);
4780 }
4781 else
4782 {
4783 Assert(pGblMod->cUsers > 0 && pGblMod->cUsers < UINT32_MAX / 2);
4784 pGblMod->cUsers++;
4785 pRecVM->pGlobalModule = pGblMod;
4786
4787 Log(("GMMR0RegisterSharedModule: new per vm module %s %s, gbl users %d\n", pszModuleName, pszVersion, pGblMod->cUsers));
4788 }
4789 }
4790 }
4791 else
4792 {
4793 /*
4794 * Attempt to re-register an existing module.
4795 */
4796 PGMMSHAREDMODULE pGblMod = gmmR0ShModFindGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4797 pszModuleName, pszVersion, paRegions);
4798 if (pRecVM->pGlobalModule == pGblMod)
4799 {
4800 Log(("GMMR0RegisterSharedModule: already registered %s %s, gbl users %d\n", pszModuleName, pszVersion, pGblMod->cUsers));
4801 rc = VINF_GMM_SHARED_MODULE_ALREADY_REGISTERED;
4802 }
4803 else
4804 {
4805 /** @todo may have to unregister+register when this happens in case it's caused
4806 * by VBoxService crashing and being restarted... */
4807 Log(("GMMR0RegisterSharedModule: Address clash!\n"
4808 " incoming at %RGvLB%#x %s %s rgns %u\n"
4809 " existing at %RGvLB%#x %s %s rgns %u\n",
4810 GCPtrModBase, cbModule, pszModuleName, pszVersion, cRegions,
4811 pRecVM->Core.Key, pRecVM->pGlobalModule->cbModule, pRecVM->pGlobalModule->szName,
4812 pRecVM->pGlobalModule->szVersion, pRecVM->pGlobalModule->cRegions));
4813 rc = VERR_GMM_SHARED_MODULE_ADDRESS_CLASH;
4814 }
4815 }
4816 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4817 }
4818 else
4819 rc = VERR_GMM_IS_NOT_SANE;
4820
4821 gmmR0MutexRelease(pGMM);
4822 return rc;
4823#else
4824
4825 NOREF(pGVM); NOREF(idCpu); NOREF(enmGuestOS); NOREF(pszModuleName); NOREF(pszVersion);
4826 NOREF(GCPtrModBase); NOREF(cbModule); NOREF(cRegions); NOREF(paRegions);
4827 return VERR_NOT_IMPLEMENTED;
4828#endif
4829}
4830
4831
4832/**
4833 * VMMR0 request wrapper for GMMR0RegisterSharedModule.
4834 *
4835 * @returns see GMMR0RegisterSharedModule.
4836 * @param pGVM The global (ring-0) VM structure.
4837 * @param idCpu The VCPU id.
4838 * @param pReq Pointer to the request packet.
4839 */
4840GMMR0DECL(int) GMMR0RegisterSharedModuleReq(PGVM pGVM, VMCPUID idCpu, PGMMREGISTERSHAREDMODULEREQ pReq)
4841{
4842 /*
4843 * Validate input and pass it on.
4844 */
4845 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4846 AssertMsgReturn( pReq->Hdr.cbReq >= sizeof(*pReq)
4847 && pReq->Hdr.cbReq == RT_UOFFSETOF_DYN(GMMREGISTERSHAREDMODULEREQ, aRegions[pReq->cRegions]),
4848 ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4849
4850 /* Pass back return code in the request packet to preserve informational codes. (VMMR3CallR0 chokes on them) */
4851 pReq->rc = GMMR0RegisterSharedModule(pGVM, idCpu, pReq->enmGuestOS, pReq->szName, pReq->szVersion,
4852 pReq->GCBaseAddr, pReq->cbModule, pReq->cRegions, pReq->aRegions);
4853 return VINF_SUCCESS;
4854}
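
/*
 * Editor's example (a hedged sketch, not part of the original sources): a
 * single-region registration request as validated by the wrapper above.  The
 * module name, version, guest OS family value and region layout are purely
 * illustrative assumptions; the structure fields are the ones referenced by
 * GMMR0RegisterSharedModuleReq.
 * @code
 *  uint32_t const cRegions = 1;
 *  PGMMREGISTERSHAREDMODULEREQ pReq = (PGMMREGISTERSHAREDMODULEREQ)
 *      RTMemAllocZ(RT_UOFFSETOF_DYN(GMMREGISTERSHAREDMODULEREQ, aRegions[cRegions]));
 *  if (pReq)
 *  {
 *      pReq->Hdr.u32Magic = SUPVMMR0REQHDR_MAGIC;
 *      pReq->Hdr.cbReq    = RT_UOFFSETOF_DYN(GMMREGISTERSHAREDMODULEREQ, aRegions[cRegions]);
 *      pReq->enmGuestOS   = VBOXOSFAMILY_Windows64;   // assumed enum value
 *      pReq->GCBaseAddr   = GCPtrModBase;
 *      pReq->cbModule     = cbModule;
 *      pReq->cRegions     = cRegions;
 *      strcpy(pReq->szName, "ntdll.dll");
 *      strcpy(pReq->szVersion, "6.1.7601.17514");
 *      pReq->aRegions[0].GCRegionAddr = GCPtrModBase;
 *      pReq->aRegions[0].cbRegion     = cbModule;
 *      int rc = GMMR0RegisterSharedModuleReq(pGVM, idCpu, pReq);
 *      // The status of the registration itself is returned in pReq->rc so that
 *      // informational codes survive the VMMR0 dispatcher.
 *      RTMemFree(pReq);
 *  }
 * @endcode
 */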
4855
4856
4857/**
4858 * Unregisters a shared module for the VM
4859 *
4860 * @returns VBox status code.
4861 * @param pGVM The global (ring-0) VM structure.
4862 * @param idCpu The VCPU id.
4863 * @param pszModuleName The module name.
4864 * @param pszVersion The module version.
4865 * @param GCPtrModBase The module base address.
4866 * @param cbModule The module size.
4867 */
4868GMMR0DECL(int) GMMR0UnregisterSharedModule(PGVM pGVM, VMCPUID idCpu, char *pszModuleName, char *pszVersion,
4869 RTGCPTR GCPtrModBase, uint32_t cbModule)
4870{
4871#ifdef VBOX_WITH_PAGE_SHARING
4872 /*
4873 * Validate input and get the basics.
4874 */
4875 PGMM pGMM;
4876 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4877 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
4878 if (RT_FAILURE(rc))
4879 return rc;
4880
4881 AssertPtrReturn(pszModuleName, VERR_INVALID_POINTER);
4882 AssertPtrReturn(pszVersion, VERR_INVALID_POINTER);
4883 if (RT_UNLIKELY(!memchr(pszModuleName, '\0', GMM_SHARED_MODULE_MAX_NAME_STRING)))
4884 return VERR_GMM_MODULE_NAME_TOO_LONG;
4885 if (RT_UNLIKELY(!memchr(pszVersion, '\0', GMM_SHARED_MODULE_MAX_VERSION_STRING)))
4886 return VERR_GMM_MODULE_NAME_TOO_LONG;
4887
4888 Log(("GMMR0UnregisterSharedModule %s %s base=%RGv size %x\n", pszModuleName, pszVersion, GCPtrModBase, cbModule));
4889
4890 /*
4891 * Take the semaphore and do some more validations.
4892 */
4893 gmmR0MutexAcquire(pGMM);
4894 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4895 {
4896 /*
4897 * Locate and remove the specified module.
4898 */
4899 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)RTAvlGCPtrGet(&pGVM->gmm.s.pSharedModuleTree, GCPtrModBase);
4900 if (pRecVM)
4901 {
4902 /** @todo Do we need to do more validations here, like that the
4903 * name + version + cbModule matches? */
4904 NOREF(cbModule);
4905 Assert(pRecVM->pGlobalModule);
4906 gmmR0ShModDeletePerVM(pGMM, pGVM, pRecVM, true /*fRemove*/);
4907 }
4908 else
4909 rc = VERR_GMM_SHARED_MODULE_NOT_FOUND;
4910
4911 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4912 }
4913 else
4914 rc = VERR_GMM_IS_NOT_SANE;
4915
4916 gmmR0MutexRelease(pGMM);
4917 return rc;
4918#else
4919
4920 NOREF(pGVM); NOREF(idCpu); NOREF(pszModuleName); NOREF(pszVersion); NOREF(GCPtrModBase); NOREF(cbModule);
4921 return VERR_NOT_IMPLEMENTED;
4922#endif
4923}
4924
4925
4926/**
4927 * VMMR0 request wrapper for GMMR0UnregisterSharedModule.
4928 *
4929 * @returns see GMMR0UnregisterSharedModule.
4930 * @param pGVM The global (ring-0) VM structure.
4931 * @param idCpu The VCPU id.
4932 * @param pReq Pointer to the request packet.
4933 */
4934GMMR0DECL(int) GMMR0UnregisterSharedModuleReq(PGVM pGVM, VMCPUID idCpu, PGMMUNREGISTERSHAREDMODULEREQ pReq)
4935{
4936 /*
4937 * Validate input and pass it on.
4938 */
4939 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4940 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4941
4942 return GMMR0UnregisterSharedModule(pGVM, idCpu, pReq->szName, pReq->szVersion, pReq->GCBaseAddr, pReq->cbModule);
4943}
4944
4945#ifdef VBOX_WITH_PAGE_SHARING
4946
4947/**
4948 * Increase the use count of a shared page, the page is known to exist and be valid and such.
4949 *
4950 * @param pGMM Pointer to the GMM instance.
4951 * @param pGVM Pointer to the GVM instance.
4952 * @param pPage The page structure.
4953 */
4954DECLINLINE(void) gmmR0UseSharedPage(PGMM pGMM, PGVM pGVM, PGMMPAGE pPage)
4955{
4956 Assert(pGMM->cSharedPages > 0);
4957 Assert(pGMM->cAllocatedPages > 0);
4958
4959 pGMM->cDuplicatePages++;
4960
4961 pPage->Shared.cRefs++;
4962 pGVM->gmm.s.Stats.cSharedPages++;
4963 pGVM->gmm.s.Stats.Allocated.cBasePages++;
4964}
4965
4966
4967/**
4968 * Converts a private page to a shared page, the page is known to exist and be valid and such.
4969 *
4970 * @param pGMM Pointer to the GMM instance.
4971 * @param pGVM Pointer to the GVM instance.
4972 * @param HCPhys Host physical address
4973 * @param idPage The Page ID
4974 * @param pPage The page structure.
4975 * @param pPageDesc Shared page descriptor
4976 */
4977DECLINLINE(void) gmmR0ConvertToSharedPage(PGMM pGMM, PGVM pGVM, RTHCPHYS HCPhys, uint32_t idPage, PGMMPAGE pPage,
4978 PGMMSHAREDPAGEDESC pPageDesc)
4979{
4980 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
4981 Assert(pChunk);
4982 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
4983 Assert(GMM_PAGE_IS_PRIVATE(pPage));
4984
4985 pChunk->cPrivate--;
4986 pChunk->cShared++;
4987
4988 pGMM->cSharedPages++;
4989
4990 pGVM->gmm.s.Stats.cSharedPages++;
4991 pGVM->gmm.s.Stats.cPrivatePages--;
4992
4993 /* Modify the page structure. */
4994 pPage->Shared.pfn = (uint32_t)(uint64_t)(HCPhys >> PAGE_SHIFT);
4995 pPage->Shared.cRefs = 1;
4996#ifdef VBOX_STRICT
4997 pPageDesc->u32StrictChecksum = gmmR0StrictPageChecksum(pGMM, pGVM, idPage);
4998 pPage->Shared.u14Checksum = pPageDesc->u32StrictChecksum;
4999#else
5000 NOREF(pPageDesc);
5001 pPage->Shared.u14Checksum = 0;
5002#endif
5003 pPage->Shared.u2State = GMM_PAGE_STATE_SHARED;
5004}
5005
5006
5007static int gmmR0SharedModuleCheckPageFirstTime(PGMM pGMM, PGVM pGVM, PGMMSHAREDMODULE pModule,
5008 unsigned idxRegion, unsigned idxPage,
5009 PGMMSHAREDPAGEDESC pPageDesc, PGMMSHAREDREGIONDESC pGlobalRegion)
5010{
5011 NOREF(pModule);
5012
5013 /* Easy case: just change the internal page type. */
5014 PGMMPAGE pPage = gmmR0GetPage(pGMM, pPageDesc->idPage);
5015 AssertMsgReturn(pPage, ("idPage=%#x (GCPhys=%RGp HCPhys=%RHp idxRegion=%#x idxPage=%#x) #1\n",
5016 pPageDesc->idPage, pPageDesc->GCPhys, pPageDesc->HCPhys, idxRegion, idxPage),
5017 VERR_PGM_PHYS_INVALID_PAGE_ID);
5018 NOREF(idxRegion);
5019
5020 AssertMsg(pPageDesc->GCPhys == (pPage->Private.pfn << 12), ("desc %RGp gmm %RGp\n", pPageDesc->GCPhys, (pPage->Private.pfn << 12)));
5021
5022 gmmR0ConvertToSharedPage(pGMM, pGVM, pPageDesc->HCPhys, pPageDesc->idPage, pPage, pPageDesc);
5023
5024 /* Keep track of these references. */
5025 pGlobalRegion->paidPages[idxPage] = pPageDesc->idPage;
5026
5027 return VINF_SUCCESS;
5028}
5029
5030/**
5031 * Checks the specified shared module page for changes.
5032 *
5033 * Performs the following tasks:
5034 * - If a shared page is new, then it changes the GMM page type to shared and
5035 * returns it in the pPageDesc descriptor.
5036 * - If a shared page already exists, then it checks if the VM page is
5037 * identical and if so frees the VM page and returns the shared page in
5038 * the pPageDesc descriptor.
5039 *
5040 * @remarks ASSUMES the caller has acquired the GMM semaphore!!
5041 *
5042 * @returns VBox status code.
5043 * @param pGVM Pointer to the GVM instance data.
5044 * @param pModule Module description
5045 * @param idxRegion Region index
5046 * @param idxPage Page index
5047 * @param pPageDesc Page descriptor
5048 */
5049GMMR0DECL(int) GMMR0SharedModuleCheckPage(PGVM pGVM, PGMMSHAREDMODULE pModule, uint32_t idxRegion, uint32_t idxPage,
5050 PGMMSHAREDPAGEDESC pPageDesc)
5051{
5052 int rc;
5053 PGMM pGMM;
5054 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5055 pPageDesc->u32StrictChecksum = 0;
5056
5057 AssertMsgReturn(idxRegion < pModule->cRegions,
5058 ("idxRegion=%#x cRegions=%#x %s %s\n", idxRegion, pModule->cRegions, pModule->szName, pModule->szVersion),
5059 VERR_INVALID_PARAMETER);
5060
5061 uint32_t const cPages = pModule->aRegions[idxRegion].cb >> PAGE_SHIFT;
5062 AssertMsgReturn(idxPage < cPages,
5063 ("idxRegion=%#x cRegions=%#x %s %s\n", idxRegion, pModule->cRegions, pModule->szName, pModule->szVersion),
5064 VERR_INVALID_PARAMETER);
5065
5066 LogFlow(("GMMR0SharedModuleCheckPage %s base %RGv region %d idxPage %d\n", pModule->szName, pModule->Core.Key, idxRegion, idxPage));
5067
5068 /*
5069 * First time; create a page descriptor array.
5070 */
5071 PGMMSHAREDREGIONDESC pGlobalRegion = &pModule->aRegions[idxRegion];
5072 if (!pGlobalRegion->paidPages)
5073 {
5074 Log(("Allocate page descriptor array for %d pages\n", cPages));
5075 pGlobalRegion->paidPages = (uint32_t *)RTMemAlloc(cPages * sizeof(pGlobalRegion->paidPages[0]));
5076 AssertReturn(pGlobalRegion->paidPages, VERR_NO_MEMORY);
5077
5078 /* Invalidate all descriptors. */
5079 uint32_t i = cPages;
5080 while (i-- > 0)
5081 pGlobalRegion->paidPages[i] = NIL_GMM_PAGEID;
5082 }
5083
5084 /*
5085 * We've seen this shared page for the first time?
5086 */
5087 if (pGlobalRegion->paidPages[idxPage] == NIL_GMM_PAGEID)
5088 {
5089 Log(("New shared page guest %RGp host %RHp\n", pPageDesc->GCPhys, pPageDesc->HCPhys));
5090 return gmmR0SharedModuleCheckPageFirstTime(pGMM, pGVM, pModule, idxRegion, idxPage, pPageDesc, pGlobalRegion);
5091 }
5092
5093 /*
5094 * We've seen it before...
5095 */
5096 Log(("Replace existing page guest %RGp host %RHp id %#x -> id %#x\n",
5097 pPageDesc->GCPhys, pPageDesc->HCPhys, pPageDesc->idPage, pGlobalRegion->paidPages[idxPage]));
5098 Assert(pPageDesc->idPage != pGlobalRegion->paidPages[idxPage]);
5099
5100 /*
5101 * Get the shared page source.
5102 */
5103 PGMMPAGE pPage = gmmR0GetPage(pGMM, pGlobalRegion->paidPages[idxPage]);
5104 AssertMsgReturn(pPage, ("idPage=%#x (idxRegion=%#x idxPage=%#x) #2\n", pGlobalRegion->paidPages[idxPage], idxRegion, idxPage),
5105 VERR_PGM_PHYS_INVALID_PAGE_ID);
5106
5107 if (pPage->Common.u2State != GMM_PAGE_STATE_SHARED)
5108 {
5109 /*
5110 * Page was freed at some point; invalidate this entry.
5111 */
5112 /** @todo this isn't really bullet proof. */
5113 Log(("Old shared page was freed -> create a new one\n"));
5114 pGlobalRegion->paidPages[idxPage] = NIL_GMM_PAGEID;
5115 return gmmR0SharedModuleCheckPageFirstTime(pGMM, pGVM, pModule, idxRegion, idxPage, pPageDesc, pGlobalRegion);
5116 }
5117
5118 Log(("Replace existing page guest host %RHp -> %RHp\n", pPageDesc->HCPhys, ((uint64_t)pPage->Shared.pfn) << PAGE_SHIFT));
5119
5120 /*
5121 * Calculate the virtual address of the local page.
5122 */
5123 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pPageDesc->idPage >> GMM_CHUNKID_SHIFT);
5124 AssertMsgReturn(pChunk, ("idPage=%#x (idxRegion=%#x idxPage=%#x) #4\n", pPageDesc->idPage, idxRegion, idxPage),
5125 VERR_PGM_PHYS_INVALID_PAGE_ID);
5126
5127 uint8_t *pbChunk;
5128 AssertMsgReturn(gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk),
5129 ("idPage=%#x (idxRegion=%#x idxPage=%#x) #3\n", pPageDesc->idPage, idxRegion, idxPage),
5130 VERR_PGM_PHYS_INVALID_PAGE_ID);
5131 uint8_t *pbLocalPage = pbChunk + ((pPageDesc->idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
5132
5133 /*
5134 * Calculate the virtual address of the shared page.
5135 */
5136 pChunk = gmmR0GetChunk(pGMM, pGlobalRegion->paidPages[idxPage] >> GMM_CHUNKID_SHIFT);
5137 Assert(pChunk); /* can't fail as gmmR0GetPage succeeded. */
5138
5139 /*
5140 * Get the virtual address of the physical page; map the chunk into the VM
5141 * process if not already done.
5142 */
5143 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5144 {
5145 Log(("Map chunk into process!\n"));
5146 rc = gmmR0MapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/, (PRTR3PTR)&pbChunk);
5147 AssertRCReturn(rc, rc);
5148 }
5149 uint8_t *pbSharedPage = pbChunk + ((pGlobalRegion->paidPages[idxPage] & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
5150
5151#ifdef VBOX_STRICT
5152 pPageDesc->u32StrictChecksum = RTCrc32(pbSharedPage, PAGE_SIZE);
5153 uint32_t uChecksum = pPageDesc->u32StrictChecksum & UINT32_C(0x00003fff);
5154 AssertMsg(!uChecksum || uChecksum == pPage->Shared.u14Checksum || !pPage->Shared.u14Checksum,
5155 ("%#x vs %#x - idPage=%#x - %s %s\n", uChecksum, pPage->Shared.u14Checksum,
5156 pGlobalRegion->paidPages[idxPage], pModule->szName, pModule->szVersion));
5157#endif
5158
5159 /** @todo write ASMMemComparePage. */
5160 if (memcmp(pbSharedPage, pbLocalPage, PAGE_SIZE))
5161 {
5162 Log(("Unexpected differences found between local and shared page; skip\n"));
5163 /* Signal to the caller that this one hasn't changed. */
5164 pPageDesc->idPage = NIL_GMM_PAGEID;
5165 return VINF_SUCCESS;
5166 }
5167
5168 /*
5169 * Free the old local page.
5170 */
5171 GMMFREEPAGEDESC PageDesc;
5172 PageDesc.idPage = pPageDesc->idPage;
5173 rc = gmmR0FreePages(pGMM, pGVM, 1, &PageDesc, GMMACCOUNT_BASE);
5174 AssertRCReturn(rc, rc);
5175
5176 gmmR0UseSharedPage(pGMM, pGVM, pPage);
5177
5178 /*
5179 * Pass along the new physical address & page id.
5180 */
5181 pPageDesc->HCPhys = ((uint64_t)pPage->Shared.pfn) << PAGE_SHIFT;
5182 pPageDesc->idPage = pGlobalRegion->paidPages[idxPage];
5183
5184 return VINF_SUCCESS;
5185}
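
/*
 * Usage sketch (illustration only): how a caller like PGM's shared module scan
 * could drive GMMR0SharedModuleCheckPage for the pages of one region and
 * interpret the result.  The paGCPhysPages/paHCPhysPages/paidPages arrays are
 * hypothetical caller-side bookkeeping, not GMM structures; how those values
 * are obtained is PGM's business.
 *
 * @code
 *      for (uint32_t idxPage = 0; idxPage < cPagesInRegion; idxPage++)
 *      {
 *          GMMSHAREDPAGEDESC PageDesc;
 *          PageDesc.GCPhys = paGCPhysPages[idxPage];   // guest physical address of the page
 *          PageDesc.HCPhys = paHCPhysPages[idxPage];   // current host physical address
 *          PageDesc.idPage = paidPages[idxPage];       // current (private) GMM page id
 *
 *          int rc = GMMR0SharedModuleCheckPage(pGVM, pModule, idxRegion, idxPage, &PageDesc);
 *          if (RT_FAILURE(rc))
 *              break;
 *          if (PageDesc.idPage == NIL_GMM_PAGEID)
 *              continue;   // page differs from the shared copy and was left untouched
 *
 *          // PageDesc.idPage and PageDesc.HCPhys now identify the shared page;
 *          // the caller is expected to remap the guest page accordingly.
 *      }
 * @endcode
 */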
5186
5187
5188/**
5189 * RTAvlGCPtrDestroy callback.
5190 *
5191 * @returns VINF_SUCCESS.
5192 * @param pNode The node to destroy.
5193 * @param pvArgs Pointer to an argument packet.
5194 */
5195static DECLCALLBACK(int) gmmR0CleanupSharedModule(PAVLGCPTRNODECORE pNode, void *pvArgs)
5196{
5197 gmmR0ShModDeletePerVM(((GMMR0SHMODPERVMDTORARGS *)pvArgs)->pGMM,
5198 ((GMMR0SHMODPERVMDTORARGS *)pvArgs)->pGVM,
5199 (PGMMSHAREDMODULEPERVM)pNode,
5200 false /*fRemove*/);
5201 return VINF_SUCCESS;
5202}
5203
5204
5205/**
5206 * Used by GMMR0CleanupVM to clean up shared modules.
5207 *
5208 * This is called without the caller holding the GMM lock, so that the lock
5209 * can be acquired and yielded here as needed.
5210 *
5211 * @param pGMM The GMM handle.
5212 * @param pGVM The global VM handle.
5213 */
5214static void gmmR0SharedModuleCleanup(PGMM pGMM, PGVM pGVM)
5215{
5216 gmmR0MutexAcquire(pGMM);
5217 GMM_CHECK_SANITY_UPON_ENTERING(pGMM);
5218
5219 GMMR0SHMODPERVMDTORARGS Args;
5220 Args.pGVM = pGVM;
5221 Args.pGMM = pGMM;
5222 RTAvlGCPtrDestroy(&pGVM->gmm.s.pSharedModuleTree, gmmR0CleanupSharedModule, &Args);
5223
5224 AssertMsg(pGVM->gmm.s.Stats.cShareableModules == 0, ("%d\n", pGVM->gmm.s.Stats.cShareableModules));
5225 pGVM->gmm.s.Stats.cShareableModules = 0;
5226
5227 gmmR0MutexRelease(pGMM);
5228}
5229
5230#endif /* VBOX_WITH_PAGE_SHARING */
5231
5232/**
5233 * Removes all shared modules for the specified VM
5234 *
5235 * @returns VBox status code.
5236 * @param pGVM The global (ring-0) VM structure.
5237 * @param idCpu The VCPU id.
5238 */
5239GMMR0DECL(int) GMMR0ResetSharedModules(PGVM pGVM, VMCPUID idCpu)
5240{
5241#ifdef VBOX_WITH_PAGE_SHARING
5242 /*
5243 * Validate input and get the basics.
5244 */
5245 PGMM pGMM;
5246 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5247 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
5248 if (RT_FAILURE(rc))
5249 return rc;
5250
5251 /*
5252 * Take the semaphore and do some more validations.
5253 */
5254 gmmR0MutexAcquire(pGMM);
5255 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5256 {
5257 Log(("GMMR0ResetSharedModules\n"));
5258 GMMR0SHMODPERVMDTORARGS Args;
5259 Args.pGVM = pGVM;
5260 Args.pGMM = pGMM;
5261 RTAvlGCPtrDestroy(&pGVM->gmm.s.pSharedModuleTree, gmmR0CleanupSharedModule, &Args);
5262 pGVM->gmm.s.Stats.cShareableModules = 0;
5263
5264 rc = VINF_SUCCESS;
5265 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
5266 }
5267 else
5268 rc = VERR_GMM_IS_NOT_SANE;
5269
5270 gmmR0MutexRelease(pGMM);
5271 return rc;
5272#else
5273 RT_NOREF(pGVM, idCpu);
5274 return VERR_NOT_IMPLEMENTED;
5275#endif
5276}
5277
5278#ifdef VBOX_WITH_PAGE_SHARING
5279
5280/**
5281 * Tree enumeration callback for checking a shared module.
5282 */
5283static DECLCALLBACK(int) gmmR0CheckSharedModule(PAVLGCPTRNODECORE pNode, void *pvUser)
5284{
5285 GMMCHECKSHAREDMODULEINFO *pArgs = (GMMCHECKSHAREDMODULEINFO*)pvUser;
5286 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)pNode;
5287 PGMMSHAREDMODULE pGblMod = pRecVM->pGlobalModule;
5288
5289 Log(("gmmR0CheckSharedModule: check %s %s base=%RGv size=%x\n",
5290 pGblMod->szName, pGblMod->szVersion, pGblMod->Core.Key, pGblMod->cbModule));
5291
5292 int rc = PGMR0SharedModuleCheck(pArgs->pGVM, pArgs->pGVM, pArgs->idCpu, pGblMod, pRecVM->aRegionsGCPtrs);
5293 if (RT_FAILURE(rc))
5294 return rc;
5295 return VINF_SUCCESS;
5296}
5297
5298#endif /* VBOX_WITH_PAGE_SHARING */
5299
5300/**
5301 * Checks all shared modules for the specified VM.
5302 *
5303 * @returns VBox status code.
5304 * @param pGVM The global (ring-0) VM structure.
5305 * @param idCpu The calling EMT number.
5306 * @thread EMT(idCpu)
5307 */
5308GMMR0DECL(int) GMMR0CheckSharedModules(PGVM pGVM, VMCPUID idCpu)
5309{
5310#ifdef VBOX_WITH_PAGE_SHARING
5311 /*
5312 * Validate input and get the basics.
5313 */
5314 PGMM pGMM;
5315 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5316 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
5317 if (RT_FAILURE(rc))
5318 return rc;
5319
5320# ifndef DEBUG_sandervl
5321 /*
5322 * Take the semaphore and do some more validations.
5323 */
5324 gmmR0MutexAcquire(pGMM);
5325# endif
5326 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5327 {
5328 /*
5329 * Walk the tree, checking each module.
5330 */
5331 Log(("GMMR0CheckSharedModules\n"));
5332
5333 GMMCHECKSHAREDMODULEINFO Args;
5334 Args.pGVM = pGVM;
5335 Args.idCpu = idCpu;
5336 rc = RTAvlGCPtrDoWithAll(&pGVM->gmm.s.pSharedModuleTree, true /* fFromLeft */, gmmR0CheckSharedModule, &Args);
5337
5338 Log(("GMMR0CheckSharedModules done (rc=%Rrc)!\n", rc));
5339 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
5340 }
5341 else
5342 rc = VERR_GMM_IS_NOT_SANE;
5343
5344# ifndef DEBUG_sandervl
5345 gmmR0MutexRelease(pGMM);
5346# endif
5347 return rc;
5348#else
5349 RT_NOREF(pGVM, idCpu);
5350 return VERR_NOT_IMPLEMENTED;
5351#endif
5352}
5353
5354#if defined(VBOX_STRICT) && HC_ARCH_BITS == 64
5355
5356/**
5357 * Worker for GMMR0FindDuplicatePageReq.
5358 *
5359 * @returns true if duplicate, false if not.
5360 */
5361static bool gmmR0FindDupPageInChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, uint8_t const *pbSourcePage)
5362{
5363 bool fFoundDuplicate = false;
5364 /* Only take chunks not mapped into this VM process; not entirely correct. */
5365 uint8_t *pbChunk;
5366 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5367 {
5368 int rc = gmmR0MapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/, (PRTR3PTR)&pbChunk);
5369 if (RT_SUCCESS(rc))
5370 {
5371 /*
5372 * Look for duplicate pages
5373 */
5374 uintptr_t iPage = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
5375 while (iPage-- > 0)
5376 {
5377 if (GMM_PAGE_IS_PRIVATE(&pChunk->aPages[iPage]))
5378 {
5379 uint8_t *pbDestPage = pbChunk + (iPage << PAGE_SHIFT);
5380 if (!memcmp(pbSourcePage, pbDestPage, PAGE_SIZE))
5381 {
5382 fFoundDuplicate = true;
5383 break;
5384 }
5385 }
5386 }
5387 gmmR0UnmapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/);
5388 }
5389 }
5390 return fFoundDuplicate;
5391}
5392
5393
5394/**
5395 * Finds a duplicate of the specified page in other active VMs.
5396 *
5397 * @returns VBox status code.
5398 * @param pGVM The global (ring-0) VM structure.
5399 * @param pReq Pointer to the request packet.
5400 */
5401GMMR0DECL(int) GMMR0FindDuplicatePageReq(PGVM pGVM, PGMMFINDDUPLICATEPAGEREQ pReq)
5402{
5403 /*
5404 * Validate input and pass it on.
5405 */
5406 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5407 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5408
5409 PGMM pGMM;
5410 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5411
5412 int rc = GVMMR0ValidateGVM(pGVM);
5413 if (RT_FAILURE(rc))
5414 return rc;
5415
5416 /*
5417 * Take the semaphore and do some more validations.
5418 */
5419 rc = gmmR0MutexAcquire(pGMM);
5420 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5421 {
5422 uint8_t *pbChunk;
5423 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pReq->idPage >> GMM_CHUNKID_SHIFT);
5424 if (pChunk)
5425 {
5426 if (gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5427 {
5428 uint8_t *pbSourcePage = pbChunk + ((pReq->idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
5429 PGMMPAGE pPage = gmmR0GetPage(pGMM, pReq->idPage);
5430 if (pPage)
5431 {
5432 /*
5433 * Walk the chunks
5434 */
5441 PGMMCHUNK pChunk;
5442 pReq->fDuplicate = false;
5443 RTListForEach(&pGMM->ChunkList, pChunk, GMMCHUNK, ListNode)
5444 {
5445 if (gmmR0FindDupPageInChunk(pGMM, pGVM, pChunk, pbSourcePage))
5446 {
5447 pReq->fDuplicate = true;
5448 break;
5449 }
5450 }
5451 }
5452 else
5453 {
5454 AssertFailed();
5455 rc = VERR_PGM_PHYS_INVALID_PAGE_ID;
5456 }
5457 }
5458 else
5459 AssertFailed();
5460 }
5461 else
5462 AssertFailed();
5463 }
5464 else
5465 rc = VERR_GMM_IS_NOT_SANE;
5466
5467 gmmR0MutexRelease(pGMM);
5468 return rc;
5469}
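
/*
 * Usage sketch (illustration only): driving the strict-build duplicate page
 * finder above.  Only the request fields validated/used by
 * GMMR0FindDuplicatePageReq are shown; idPage stands for whatever page id the
 * caller wants to check.
 *
 * @code
 *      GMMFINDDUPLICATEPAGEREQ Req;
 *      RT_ZERO(Req);
 *      Req.Hdr.cbReq  = sizeof(Req);       // checked on entry
 *      Req.idPage     = idPage;            // page to search duplicates of
 *      Req.fDuplicate = false;
 *
 *      int rc = GMMR0FindDuplicatePageReq(pGVM, &Req);
 *      if (RT_SUCCESS(rc) && Req.fDuplicate)
 *          Log(("Page %#x has at least one identical private page in some chunk\n", idPage));
 * @endcode
 */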
5470
5471#endif /* VBOX_STRICT && HC_ARCH_BITS == 64 */
5472
5473
5474/**
5475 * Retrieves the GMM statistics visible to the caller.
5476 *
5477 * @returns VBox status code.
5478 *
5479 * @param pStats Where to put the statistics.
5480 * @param pSession The current session.
5481 * @param pGVM The GVM to obtain statistics for. Optional.
5482 */
5483GMMR0DECL(int) GMMR0QueryStatistics(PGMMSTATS pStats, PSUPDRVSESSION pSession, PGVM pGVM)
5484{
5485 LogFlow(("GMMR0QueryStatistics: pStats=%p pSession=%p pGVM=%p\n", pStats, pSession, pGVM));
5486
5487 /*
5488 * Validate input.
5489 */
5490 AssertPtrReturn(pSession, VERR_INVALID_POINTER);
5491 AssertPtrReturn(pStats, VERR_INVALID_POINTER);
5492 pStats->cMaxPages = 0; /* (Touch the buffer so a bad pointer crashes here, before we take the mutex.) */
5493
5494 PGMM pGMM;
5495 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5496
5497 /*
5498 * Validate the VM handle, if not NULL, and lock the GMM.
5499 */
5500 int rc;
5501 if (pGVM)
5502 {
5503 rc = GVMMR0ValidateGVM(pGVM);
5504 if (RT_FAILURE(rc))
5505 return rc;
5506 }
5507
5508 rc = gmmR0MutexAcquire(pGMM);
5509 if (RT_FAILURE(rc))
5510 return rc;
5511
5512 /*
5513 * Copy out the GMM statistics.
5514 */
5515 pStats->cMaxPages = pGMM->cMaxPages;
5516 pStats->cReservedPages = pGMM->cReservedPages;
5517 pStats->cOverCommittedPages = pGMM->cOverCommittedPages;
5518 pStats->cAllocatedPages = pGMM->cAllocatedPages;
5519 pStats->cSharedPages = pGMM->cSharedPages;
5520 pStats->cDuplicatePages = pGMM->cDuplicatePages;
5521 pStats->cLeftBehindSharedPages = pGMM->cLeftBehindSharedPages;
5522 pStats->cBalloonedPages = pGMM->cBalloonedPages;
5523 pStats->cChunks = pGMM->cChunks;
5524 pStats->cFreedChunks = pGMM->cFreedChunks;
5525 pStats->cShareableModules = pGMM->cShareableModules;
5526 RT_ZERO(pStats->au64Reserved);
5527
5528 /*
5529 * Copy out the VM statistics.
5530 */
5531 if (pGVM)
5532 pStats->VMStats = pGVM->gmm.s.Stats;
5533 else
5534 RT_ZERO(pStats->VMStats);
5535
5536 gmmR0MutexRelease(pGMM);
5537 return rc;
5538}
5539
5540
5541/**
5542 * VMMR0 request wrapper for GMMR0QueryStatistics.
5543 *
5544 * @returns see GMMR0QueryStatistics.
5545 * @param pGVM The global (ring-0) VM structure. Optional.
5546 * @param pReq Pointer to the request packet.
5547 */
5548GMMR0DECL(int) GMMR0QueryStatisticsReq(PGVM pGVM, PGMMQUERYSTATISTICSSREQ pReq)
5549{
5550 /*
5551 * Validate input and pass it on.
5552 */
5553 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5554 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5555
5556 return GMMR0QueryStatistics(&pReq->Stats, pReq->pSession, pGVM);
5557}
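
/*
 * Usage sketch (illustration only): querying the GMM statistics through the
 * request wrapper above.  The request fields are the ones validated/consumed
 * by GMMR0QueryStatisticsReq; the counters are printed on the assumption that
 * they are 64-bit, and pSession stands for the caller's support driver
 * session.
 *
 * @code
 *      GMMQUERYSTATISTICSSREQ Req;
 *      RT_ZERO(Req);
 *      Req.Hdr.cbReq = sizeof(Req);        // checked on entry
 *      Req.pSession  = pSession;
 *
 *      int rc = GMMR0QueryStatisticsReq(pGVM, &Req);
 *      if (RT_SUCCESS(rc))
 *          Log(("GMM: cAllocatedPages=%RU64 cSharedPages=%RU64 cBalloonedPages=%RU64\n",
 *               Req.Stats.cAllocatedPages, Req.Stats.cSharedPages, Req.Stats.cBalloonedPages));
 *      // Per-VM numbers, when pGVM was given, are in Req.Stats.VMStats.
 * @endcode
 */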
5558
5559
5560/**
5561 * Resets the specified GMM statistics.
5562 *
5563 * @returns VBox status code.
5564 *
5565 * @param pStats Which statistics to reset; a non-zero field indicates
5566 * that the corresponding statistic should be reset.
5567 * @param pSession The current session.
5568 * @param pGVM The GVM to reset statistics for. Optional.
5569 */
5570GMMR0DECL(int) GMMR0ResetStatistics(PCGMMSTATS pStats, PSUPDRVSESSION pSession, PGVM pGVM)
5571{
5572 NOREF(pStats); NOREF(pSession); NOREF(pGVM);
5573 /* Nothing to reset at the moment. */
5574 return VINF_SUCCESS;
5575}
5576
5577
5578/**
5579 * VMMR0 request wrapper for GMMR0ResetStatistics.
5580 *
5581 * @returns see GMMR0ResetStatistics.
5582 * @param pGVM The global (ring-0) VM structure. Optional.
5583 * @param pReq Pointer to the request packet.
5584 */
5585GMMR0DECL(int) GMMR0ResetStatisticsReq(PGVM pGVM, PGMMRESETSTATISTICSSREQ pReq)
5586{
5587 /*
5588 * Validate input and pass it on.
5589 */
5590 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5591 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5592
5593 return GMMR0ResetStatistics(&pReq->Stats, pReq->pSession, pGVM);
5594}
5595