VirtualBox

source: vbox/trunk/src/VBox/VMM/VMMR0/GMMR0.cpp@82977

Last change on this file since 82977 was 82977, checked in by vboxsync, 5 years ago

VMM/GMMR0: Use the chunk list rather than the AVL tree in GMMR0FindDuplicatePageReq to look for duplicate pages. This will restrict the AVL tree to lookups and make it simpler to protect. Fixes bugref:9627

  • Property svn:eol-style set to native
  • Property svn:keywords set to Id Revision
File size: 194.9 KB
 
1/* $Id: GMMR0.cpp 82977 2020-02-04 12:45:12Z vboxsync $ */
2/** @file
3 * GMM - Global Memory Manager.
4 */
5
6/*
7 * Copyright (C) 2007-2020 Oracle Corporation
8 *
9 * This file is part of VirtualBox Open Source Edition (OSE), as
10 * available from http://www.alldomusa.eu.org. This file is free software;
11 * you can redistribute it and/or modify it under the terms of the GNU
12 * General Public License (GPL) as published by the Free Software
13 * Foundation, in version 2 as it comes in the "COPYING" file of the
14 * VirtualBox OSE distribution. VirtualBox OSE is distributed in the
15 * hope that it will be useful, but WITHOUT ANY WARRANTY of any kind.
16 */
17
18
19/** @page pg_gmm GMM - The Global Memory Manager
20 *
21 * As the name indicates, this component is responsible for global memory
22 * management. Currently only guest RAM is allocated from the GMM, but this
23 * may change to include shadow page tables and other bits later.
24 *
25 * Guest RAM is managed as individual pages, but allocated from the host OS
26 * in chunks for reasons of portability / efficiency. To minimize the memory
27 * footprint all tracking structures must be as small as possible without
28 * unnecessary performance penalties.
29 *
30 * The allocation chunks have a fixed size, defined at compile time by the
31 * #GMM_CHUNK_SIZE \#define.
32 *
33 * Each chunk is given a unique ID. Each page also has a unique ID. The
34 * relationship between the two IDs is:
35 * @code
36 * GMM_CHUNK_SHIFT = log2(GMM_CHUNK_SIZE / PAGE_SIZE);
37 * idPage = (idChunk << GMM_CHUNK_SHIFT) | iPage;
38 * @endcode
39 * Where iPage is the index of the page within the chunk. This ID scheme
40 * permits efficient chunk and page lookup, but it relies on the chunk size
41 * to be set at compile time. The chunks are organized in an AVL tree with their
42 * IDs being the keys.
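 *
 * As a small added illustration (using only the symbols defined above), the
 * reverse mapping is equally cheap, which is what makes the scheme attractive:
 * @code
 * idChunk = idPage >> GMM_CHUNK_SHIFT;
 * iPage   = idPage & ((GMM_CHUNK_SIZE / PAGE_SIZE) - 1);
 * @endcode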
43 *
44 * The physical address of each page in an allocation chunk is maintained by
45 * the #RTR0MEMOBJ and obtained using #RTR0MemObjGetPagePhysAddr. There is no
46 * need to duplicate this information (it'll cost 8-bytes per page if we did).
47 *
48 * So what do we need to track per page? Most importantly we need to know
49 * which state the page is in:
50 * - Private - Allocated for (eventually) backing one particular VM page.
51 * - Shared - Readonly page that is used by one or more VMs and treated
52 * as COW by PGM.
53 * - Free - Not used by anyone.
54 *
55 * For the page replacement operations (sharing, defragmenting and freeing)
56 * to be somewhat efficient, private pages needs to be associated with a
57 * particular page in a particular VM.
58 *
59 * Tracking the usage of shared pages is impractical and expensive, so we'll
60 * settle for a reference counting system instead.
61 *
62 * Free pages will be chained on LIFOs.
63 *
64 * On 64-bit systems we will use a 64-bit bitfield per page, while on 32-bit
65 * systems a 32-bit bitfield will have to suffice because of address space
66 * limitations. The #GMMPAGE structure shows the details.
67 *
68 *
69 * @section sec_gmm_alloc_strat Page Allocation Strategy
70 *
71 * The strategy for allocating pages has to take fragmentation and shared
72 * pages into account, or we may end up with 2000 chunks with only
73 * a few pages in each. Shared pages cannot easily be reallocated because
74 * of the inaccurate usage accounting (see above). Private pages can be
75 * reallocated by a defragmentation thread in the same manner that sharing
76 * is done.
77 *
78 * The first approach is to manage the free pages in two sets depending on
79 * whether they are mainly for the allocation of shared or private pages.
80 * In the initial implementation there will be almost no possibility for
81 * mixing shared and private pages in the same chunk (only if we're really
82 * stressed on memory), but when we implement forking of VMs and have to
83 * deal with lots of COW pages it'll start getting kind of interesting.
84 *
85 * The sets are lists of chunks with approximately the same number of
86 * free pages. Say the chunk size is 1MB, meaning 256 pages, and a set
87 * consists of 16 lists. So, the first list will contain the chunks with
88 * 1-7 free pages, the second covers 8-15, and so on. The chunks will be
89 * moved between the lists as pages are freed up or allocated.
90 *
91 *
92 * @section sec_gmm_costs Costs
93 *
94 * The per page cost in kernel space is 32 bits plus whatever RTR0MEMOBJ
95 * entails. In addition there is the chunk cost of approximately
96 * (sizeof(RTR0MEMOBJ) + sizeof(CHUNK)) / 2^CHUNK_SHIFT bytes per page.
97 *
98 * On Windows the per page #RTR0MEMOBJ cost is 32 bits on 32-bit Windows
99 * and 64 bits on 64-bit Windows (a PFN_NUMBER in the MDL). So, 64 bits per page.
100 * The cost on Linux is identical, but here it's because of sizeof(struct page *).
101 *
102 *
103 * @section sec_gmm_legacy Legacy Mode for Non-Tier-1 Platforms
104 *
105 * In legacy mode the page source is locked user pages and not
106 * #RTR0MemObjAllocPhysNC, this means that a page can only be allocated
107 * by the VM that locked it. We will make no attempt at implementing
108 * page sharing on these systems, just do enough to make it all work.
109 *
110 * @note With 6.1 really dropping 32-bit support, the legacy mode is obsoleted
111 * under the assumption that there is sufficient kernel virtual address
112 * space to map all of the guest memory allocations. So, we'll be using
113 * #RTR0MemObjAllocPage on some platforms as an alternative to
114 * #RTR0MemObjAllocPhysNC.
115 *
116 *
117 * @subsection sub_gmm_locking Serializing
118 *
119 * One simple fast mutex will be employed in the initial implementation, not
120 * two as mentioned in @ref sec_pgmPhys_Serializing.
121 *
122 * @see @ref sec_pgmPhys_Serializing
123 *
124 *
125 * @section sec_gmm_overcommit Memory Over-Commitment Management
126 *
127 * The GVM will have to do the system wide memory over-commitment
128 * management. My current ideas are:
129 * - Per VM oc policy that indicates how much to initially commit
130 * to it and what to do in an out-of-memory situation.
131 * - Prevent overtaxing the host.
132 *
133 * There are some challenges here, the main ones are configurability and
134 * security. Should we for instance permit anyone to request 100% memory
135 * commitment? Who should be allowed to do runtime adjustments of the
136 * config? And how to prevent these settings from being lost when the last
137 * VM process exits? The solution is probably to have an optional root
138 * daemon that will keep VMMR0.r0 in memory and enable the security measures.
139 *
140 *
141 *
142 * @section sec_gmm_numa NUMA
143 *
144 * NUMA considerations will be designed and implemented a bit later.
145 *
146 * The preliminary guess is that we will have to try to allocate memory as
147 * close as possible to the CPUs the VM is executed on (EMT and additional CPU
148 * threads). Which means it's mostly about allocation and sharing policies.
149 * Both the scheduler and allocator interface will have to supply some NUMA info
150 * and we'll need to have a way to calc access costs.
151 *
152 */
153
154
155/*********************************************************************************************************************************
156* Header Files *
157*********************************************************************************************************************************/
158#define LOG_GROUP LOG_GROUP_GMM
159#include <VBox/rawpci.h>
160#include <VBox/vmm/gmm.h>
161#include "GMMR0Internal.h"
162#include <VBox/vmm/vmcc.h>
163#include <VBox/vmm/pgm.h>
164#include <VBox/log.h>
165#include <VBox/param.h>
166#include <VBox/err.h>
167#include <VBox/VMMDev.h>
168#include <iprt/asm.h>
169#include <iprt/avl.h>
170#ifdef VBOX_STRICT
171# include <iprt/crc.h>
172#endif
173#include <iprt/critsect.h>
174#include <iprt/list.h>
175#include <iprt/mem.h>
176#include <iprt/memobj.h>
177#include <iprt/mp.h>
178#include <iprt/semaphore.h>
179#include <iprt/string.h>
180#include <iprt/time.h>
181
182
183/*********************************************************************************************************************************
184* Defined Constants And Macros *
185*********************************************************************************************************************************/
186/** @def VBOX_USE_CRIT_SECT_FOR_GIANT
187 * Use a critical section instead of a fast mutex for the giant GMM lock.
188 *
189 * @remarks This is primarily a way of avoiding the deadlock checks in the
190 * Windows driver verifier. */
191#if defined(RT_OS_WINDOWS) || defined(RT_OS_DARWIN) || defined(DOXYGEN_RUNNING)
192# define VBOX_USE_CRIT_SECT_FOR_GIANT
193#endif
194
195#if (!defined(VBOX_WITH_RAM_IN_KERNEL) || defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)) \
196 && !defined(RT_OS_DARWIN)
197/** Enable the legacy mode code (will be dropped soon). */
198# define GMM_WITH_LEGACY_MODE
199#endif
200
201
202/*********************************************************************************************************************************
203* Structures and Typedefs *
204*********************************************************************************************************************************/
205/** Pointer to set of free chunks. */
206typedef struct GMMCHUNKFREESET *PGMMCHUNKFREESET;
207
208/**
209 * The per-page tracking structure employed by the GMM.
210 *
211 * On 32-bit hosts some trickery is necessary to compress all
212 * the information into 32 bits. When the fSharedFree member is set,
213 * the 30th bit decides whether it's a free page or not.
214 *
215 * Because of the different layout on 32-bit and 64-bit hosts, macros
216 * are used to get and set some of the data.
217 */
218typedef union GMMPAGE
219{
220#if HC_ARCH_BITS == 64
221 /** Unsigned integer view. */
222 uint64_t u;
223
224 /** The common view. */
225 struct GMMPAGECOMMON
226 {
227 uint32_t uStuff1 : 32;
228 uint32_t uStuff2 : 30;
229 /** The page state. */
230 uint32_t u2State : 2;
231 } Common;
232
233 /** The view of a private page. */
234 struct GMMPAGEPRIVATE
235 {
236 /** The guest page frame number. (Max addressable: 2 ^ 44 - 16) */
237 uint32_t pfn;
238 /** The GVM handle. (64K VMs) */
239 uint32_t hGVM : 16;
240 /** Reserved. */
241 uint32_t u16Reserved : 14;
242 /** The page state. */
243 uint32_t u2State : 2;
244 } Private;
245
246 /** The view of a shared page. */
247 struct GMMPAGESHARED
248 {
249 /** The host page frame number. (Max addressable: 2 ^ 44 - 16) */
250 uint32_t pfn;
251 /** The reference count (64K VMs). */
252 uint32_t cRefs : 16;
253 /** Used for debug checksumming. */
254 uint32_t u14Checksum : 14;
255 /** The page state. */
256 uint32_t u2State : 2;
257 } Shared;
258
259 /** The view of a free page. */
260 struct GMMPAGEFREE
261 {
262 /** The index of the next page in the free list. UINT16_MAX is NIL. */
263 uint16_t iNext;
264 /** Reserved. Checksum or something? */
265 uint16_t u16Reserved0;
266 /** Reserved. Checksum or something? */
267 uint32_t u30Reserved1 : 30;
268 /** The page state. */
269 uint32_t u2State : 2;
270 } Free;
271
272#else /* 32-bit */
273 /** Unsigned integer view. */
274 uint32_t u;
275
276 /** The common view. */
277 struct GMMPAGECOMMON
278 {
279 uint32_t uStuff : 30;
280 /** The page state. */
281 uint32_t u2State : 2;
282 } Common;
283
284 /** The view of a private page. */
285 struct GMMPAGEPRIVATE
286 {
287 /** The guest page frame number. (Max addressable: 2 ^ 36) */
288 uint32_t pfn : 24;
289 /** The GVM handle. (127 VMs) */
290 uint32_t hGVM : 7;
291 /** The top page state bit, MBZ. */
292 uint32_t fZero : 1;
293 } Private;
294
295 /** The view of a shared page. */
296 struct GMMPAGESHARED
297 {
298 /** The reference count. */
299 uint32_t cRefs : 30;
300 /** The page state. */
301 uint32_t u2State : 2;
302 } Shared;
303
304 /** The view of a free page. */
305 struct GMMPAGEFREE
306 {
307 /** The index of the next page in the free list. UINT16_MAX is NIL. */
308 uint32_t iNext : 16;
309 /** Reserved. Checksum or something? */
310 uint32_t u14Reserved : 14;
311 /** The page state. */
312 uint32_t u2State : 2;
313 } Free;
314#endif
315} GMMPAGE;
316AssertCompileSize(GMMPAGE, sizeof(RTHCUINTPTR));
317/** Pointer to a GMMPAGE. */
318typedef GMMPAGE *PGMMPAGE;
319
320
321/** @name The Page States.
322 * @{ */
323/** A private page. */
324#define GMM_PAGE_STATE_PRIVATE 0
325/** A private page - alternative value used on the 32-bit implementation.
326 * This will never be used on 64-bit hosts. */
327#define GMM_PAGE_STATE_PRIVATE_32 1
328/** A shared page. */
329#define GMM_PAGE_STATE_SHARED 2
330/** A free page. */
331#define GMM_PAGE_STATE_FREE 3
332/** @} */
333
334
335/** @def GMM_PAGE_IS_PRIVATE
336 *
337 * @returns true if private, false if not.
338 * @param pPage The GMM page.
339 */
340#if HC_ARCH_BITS == 64
341# define GMM_PAGE_IS_PRIVATE(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_PRIVATE )
342#else
343# define GMM_PAGE_IS_PRIVATE(pPage) ( (pPage)->Private.fZero == 0 )
344#endif
345
346/** @def GMM_PAGE_IS_SHARED
347 *
348 * @returns true if shared, false if not.
349 * @param pPage The GMM page.
350 */
351#define GMM_PAGE_IS_SHARED(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_SHARED )
352
353/** @def GMM_PAGE_IS_FREE
354 *
355 * @returns true if free, false if not.
356 * @param pPage The GMM page.
357 */
358#define GMM_PAGE_IS_FREE(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_FREE )
359
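/*
 * Illustrative sketch (helper name made up for this illustration) of how the
 * 64-bit GMMPAGE views and the state macros above are meant to be combined.
 */
#if HC_ARCH_BITS == 64
DECLINLINE(void) gmmExampleInitPrivatePage(PGMMPAGE pPage, uint32_t pfn, uint16_t hGVM)
{
    pPage->u               = 0;                         /* start from a clean slate */
    pPage->Private.pfn     = pfn;                       /* guest page frame number */
    pPage->Private.hGVM    = hGVM;                      /* the owning VM */
    pPage->Private.u2State = GMM_PAGE_STATE_PRIVATE;
    Assert(GMM_PAGE_IS_PRIVATE(pPage));
    Assert(!GMM_PAGE_IS_SHARED(pPage) && !GMM_PAGE_IS_FREE(pPage));
}
#endif
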
360/** @def GMM_PAGE_PFN_LAST
361 * The end of the valid guest pfn range.
362 * @remark Some of the values outside the range have special meaning,
363 * see GMM_PAGE_PFN_UNSHAREABLE.
364 */
365#if HC_ARCH_BITS == 64
366# define GMM_PAGE_PFN_LAST UINT32_C(0xfffffff0)
367#else
368# define GMM_PAGE_PFN_LAST UINT32_C(0x00fffff0)
369#endif
370AssertCompile(GMM_PAGE_PFN_LAST == (GMM_GCPHYS_LAST >> PAGE_SHIFT));
371
372/** @def GMM_PAGE_PFN_UNSHAREABLE
373 * Indicates that this page isn't used for normal guest memory and thus isn't shareable.
374 */
375#if HC_ARCH_BITS == 64
376# define GMM_PAGE_PFN_UNSHAREABLE UINT32_C(0xfffffff1)
377#else
378# define GMM_PAGE_PFN_UNSHAREABLE UINT32_C(0x00fffff1)
379#endif
380AssertCompile(GMM_PAGE_PFN_UNSHAREABLE == (GMM_GCPHYS_UNSHAREABLE >> PAGE_SHIFT));
381
382
383/**
384 * A GMM allocation chunk ring-3 mapping record.
385 *
386 * This should really be associated with a session and not a VM, but
387 * it's simpler to associate it with a VM and clean up when the VM object
388 * is destroyed.
389 */
390typedef struct GMMCHUNKMAP
391{
392 /** The mapping object. */
393 RTR0MEMOBJ hMapObj;
394 /** The VM owning the mapping. */
395 PGVM pGVM;
396} GMMCHUNKMAP;
397/** Pointer to a GMM allocation chunk mapping. */
398typedef struct GMMCHUNKMAP *PGMMCHUNKMAP;
399
400
401/**
402 * A GMM allocation chunk.
403 */
404typedef struct GMMCHUNK
405{
406 /** The AVL node core.
407 * The Key is the chunk ID. (Giant mtx.) */
408 AVLU32NODECORE Core;
409 /** The memory object.
410 * Either from RTR0MemObjAllocPhysNC or RTR0MemObjLockUser depending on
411 * what the host can dish up with. (Chunk mtx protects mapping accesses
412 * and related frees.) */
413 RTR0MEMOBJ hMemObj;
414#if defined(VBOX_WITH_RAM_IN_KERNEL) && !defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)
415 /** Pointer to the kernel mapping. */
416 uint8_t *pbMapping;
417#endif
418 /** Pointer to the next chunk in the free list. (Giant mtx.) */
419 PGMMCHUNK pFreeNext;
420 /** Pointer to the previous chunk in the free list. (Giant mtx.) */
421 PGMMCHUNK pFreePrev;
422 /** Pointer to the free set this chunk belongs to. NULL for
423 * chunks with no free pages. (Giant mtx.) */
424 PGMMCHUNKFREESET pSet;
425 /** List node in the chunk list (GMM::ChunkList). (Giant mtx.) */
426 RTLISTNODE ListNode;
427 /** Pointer to an array of mappings. (Chunk mtx.) */
428 PGMMCHUNKMAP paMappingsX;
429 /** The number of mappings. (Chunk mtx.) */
430 uint16_t cMappingsX;
431 * The mapping lock this chunk is using. UINT8_MAX if nobody is
432 * mapping or freeing anything. (Giant mtx.) */
433 uint8_t volatile iChunkMtx;
434 /** GMM_CHUNK_FLAGS_XXX. (Giant mtx.) */
435 uint8_t fFlags;
436 /** The head of the list of free pages. UINT16_MAX is the NIL value.
437 * (Giant mtx.) */
438 uint16_t iFreeHead;
439 /** The number of free pages. (Giant mtx.) */
440 uint16_t cFree;
441 /** The GVM handle of the VM that first allocated pages from this chunk, this
442 * is used as a preference when there are several chunks to choose from.
443 * When in bound memory mode this isn't a preference any longer. (Giant
444 * mtx.) */
445 uint16_t hGVM;
446 /** The ID of the NUMA node the memory mostly resides on. (Reserved for
447 * future use.) (Giant mtx.) */
448 uint16_t idNumaNode;
449 /** The number of private pages. (Giant mtx.) */
450 uint16_t cPrivate;
451 /** The number of shared pages. (Giant mtx.) */
452 uint16_t cShared;
453 /** The pages. (Giant mtx.) */
454 GMMPAGE aPages[GMM_CHUNK_SIZE >> PAGE_SHIFT];
455} GMMCHUNK;
456
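/*
 * Illustrative sketch (helper names made up here) of the per-chunk LIFO free
 * list formed by GMMCHUNK::iFreeHead and GMMPAGE::Free.iNext.  The free-set
 * relinking, ownership accounting and locking done by the real code are left out.
 */
DECLINLINE(void) gmmExamplePushFreePage(PGMMCHUNK pChunk, uint32_t iPage)
{
    pChunk->aPages[iPage].u            = 0;
    pChunk->aPages[iPage].Free.iNext   = pChunk->iFreeHead;   /* chain the old head */
    pChunk->aPages[iPage].Free.u2State = GMM_PAGE_STATE_FREE;
    pChunk->iFreeHead                  = (uint16_t)iPage;
    pChunk->cFree++;
}

DECLINLINE(uint32_t) gmmExamplePopFreePage(PGMMCHUNK pChunk)
{
    uint32_t const iPage = pChunk->iFreeHead;                 /* UINT16_MAX means empty */
    if (iPage != UINT16_MAX)
    {
        pChunk->iFreeHead = pChunk->aPages[iPage].Free.iNext;
        pChunk->cFree--;
    }
    return iPage;
}
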
457/** Indicates that the NUMA properties of the memory are unknown. */
458#define GMM_CHUNK_NUMA_ID_UNKNOWN UINT16_C(0xfffe)
459
460/** @name GMM_CHUNK_FLAGS_XXX - chunk flags.
461 * @{ */
462/** Indicates that the chunk is a large page (2MB). */
463#define GMM_CHUNK_FLAGS_LARGE_PAGE UINT16_C(0x0001)
464#ifdef GMM_WITH_LEGACY_MODE
465/** Indicates that the chunk was locked rather than allocated directly. */
466# define GMM_CHUNK_FLAGS_SEEDED UINT16_C(0x0002)
467#endif
468/** @} */
469
470
471/**
472 * An allocation chunk TLB entry.
473 */
474typedef struct GMMCHUNKTLBE
475{
476 /** The chunk id. */
477 uint32_t idChunk;
478 /** Pointer to the chunk. */
479 PGMMCHUNK pChunk;
480} GMMCHUNKTLBE;
481/** Pointer to an allocation chunk TLB entry. */
482typedef GMMCHUNKTLBE *PGMMCHUNKTLBE;
483
484
485/** The number of entries in the allocation chunk TLB. */
486#define GMM_CHUNKTLB_ENTRIES 32
487/** Gets the TLB entry index for the given Chunk ID. */
488#define GMM_CHUNKTLB_IDX(idChunk) ( (idChunk) & (GMM_CHUNKTLB_ENTRIES - 1) )
489
490/**
491 * An allocation chunk TLB.
492 */
493typedef struct GMMCHUNKTLB
494{
495 /** The TLB entries. */
496 GMMCHUNKTLBE aEntries[GMM_CHUNKTLB_ENTRIES];
497} GMMCHUNKTLB;
498/** Pointer to an allocation chunk TLB. */
499typedef GMMCHUNKTLB *PGMMCHUNKTLB;
500
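/*
 * Illustrative sketch of how the chunk TLB is meant to be used: a direct-mapped
 * lookup via GMM_CHUNKTLB_IDX with the AVL tree (keyed by chunk ID) as fallback.
 * The helper name and the explicit tree parameter are inventions of this sketch.
 */
DECLINLINE(PGMMCHUNK) gmmExampleChunkLookup(PGMMCHUNKTLB pTlb, PAVLU32NODECORE *ppTree, uint32_t idChunk)
{
    PGMMCHUNKTLBE pTlbe = &pTlb->aEntries[GMM_CHUNKTLB_IDX(idChunk)];
    if (RT_LIKELY(pTlbe->idChunk == idChunk))
        return pTlbe->pChunk;                                  /* TLB hit */

    /* TLB miss: consult the tree and cache the result for next time. */
    PGMMCHUNK pChunk = (PGMMCHUNK)RTAvlU32Get(ppTree, idChunk);
    if (pChunk)
    {
        pTlbe->idChunk = idChunk;
        pTlbe->pChunk  = pChunk;
    }
    return pChunk;
}
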
501
502/**
503 * The GMM instance data.
504 */
505typedef struct GMM
506{
507 /** Magic / eye catcher. GMM_MAGIC */
508 uint32_t u32Magic;
509 /** The number of threads waiting on the mutex. */
510 uint32_t cMtxContenders;
511#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
512 /** The critical section protecting the GMM.
513 * More fine grained locking can be implemented later if necessary. */
514 RTCRITSECT GiantCritSect;
515#else
516 /** The fast mutex protecting the GMM.
517 * More fine grained locking can be implemented later if necessary. */
518 RTSEMFASTMUTEX hMtx;
519#endif
520#ifdef VBOX_STRICT
521 /** The current mutex owner. */
522 RTNATIVETHREAD hMtxOwner;
523#endif
524 /** The chunk tree. */
525 PAVLU32NODECORE pChunks;
526 /** The chunk TLB. */
527 GMMCHUNKTLB ChunkTLB;
528 /** The private free set. */
529 GMMCHUNKFREESET PrivateX;
530 /** The shared free set. */
531 GMMCHUNKFREESET Shared;
532
533 /** Shared module tree (global).
534 * @todo separate trees for distinctly different guest OSes. */
535 PAVLLU32NODECORE pGlobalSharedModuleTree;
536 /** Sharable modules (count of nodes in pGlobalSharedModuleTree). */
537 uint32_t cShareableModules;
538
539 /** The chunk list. For simplifying the cleanup process and avoid tree
540 * traversal. */
541 RTLISTANCHOR ChunkList;
542
543 /** The maximum number of pages we're allowed to allocate.
544 * @gcfgm{GMM/MaxPages,64-bit, Direct.}
545 * @gcfgm{GMM/PctPages,32-bit, Relative to the number of host pages.} */
546 uint64_t cMaxPages;
547 /** The number of pages that have been reserved.
548 * The deal is that cReservedPages - cOverCommittedPages <= cMaxPages. */
549 uint64_t cReservedPages;
550 /** The number of pages that we have over-committed in reservations. */
551 uint64_t cOverCommittedPages;
552 /** The number of actually allocated (committed if you like) pages. */
553 uint64_t cAllocatedPages;
554 /** The number of pages that are shared. A subset of cAllocatedPages. */
555 uint64_t cSharedPages;
556 /** The number of pages that are actually shared between VMs. */
557 uint64_t cDuplicatePages;
558 /** The number of shared pages that have been left behind by
559 * VMs not doing proper cleanups. */
560 uint64_t cLeftBehindSharedPages;
561 /** The number of allocation chunks.
562 * (The number of pages we've allocated from the host can be derived from this.) */
563 uint32_t cChunks;
564 /** The number of current ballooned pages. */
565 uint64_t cBalloonedPages;
566
567#ifndef GMM_WITH_LEGACY_MODE
568# ifdef VBOX_WITH_LINEAR_HOST_PHYS_MEM
569 /** Whether #RTR0MemObjAllocPhysNC works. */
570 bool fHasWorkingAllocPhysNC;
571# else
572 bool fPadding;
573# endif
574#else
575 /** The legacy allocation mode indicator.
576 * This is determined at initialization time. */
577 bool fLegacyAllocationMode;
578#endif
579 /** The bound memory mode indicator.
580 * When set, the memory will be bound to a specific VM and never
581 * shared. This is always set if fLegacyAllocationMode is set.
582 * (Also determined at initialization time.) */
583 bool fBoundMemoryMode;
584 /** The number of registered VMs. */
585 uint16_t cRegisteredVMs;
586
587 /** The number of freed chunks ever. This is used as a list generation to
588 * avoid restarting the cleanup scanning when the list wasn't modified. */
589 uint32_t cFreedChunks;
590 /** The previously allocated Chunk ID.
591 * Used as a hint to avoid scanning the whole bitmap. */
592 uint32_t idChunkPrev;
593 /** Chunk ID allocation bitmap.
594 * Bits of allocated IDs are set, free ones are clear.
595 * The NIL id (0) is marked allocated. */
596 uint32_t bmChunkId[(GMM_CHUNKID_LAST + 1 + 31) / 32];
597
598 /** The index of the next mutex to use. */
599 uint32_t iNextChunkMtx;
600 /** Chunk locks for reducing lock contention without having to allocate
601 * one lock per chunk. */
602 struct
603 {
604 /** The mutex */
605 RTSEMFASTMUTEX hMtx;
606 /** The number of threads currently using this mutex. */
607 uint32_t volatile cUsers;
608 } aChunkMtx[64];
609} GMM;
610/** Pointer to the GMM instance. */
611typedef GMM *PGMM;
612
613/** The value of GMM::u32Magic (Katsuhiro Otomo). */
614#define GMM_MAGIC UINT32_C(0x19540414)
615
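/*
 * Illustrative sketch of how a chunk ID can be taken from GMM::bmChunkId as
 * described above: scan for a clear bit starting after the previous-allocation
 * hint, wrapping once.  The helper name and the simple linear scan are
 * inventions of this sketch; the real allocator also holds the giant lock.
 */
DECLINLINE(uint32_t) gmmExampleAllocChunkId(PGMM pGMM)
{
    uint32_t idChunk = pGMM->idChunkPrev;
    for (uint32_t cTries = GMM_CHUNKID_LAST; cTries > 0; cTries--)
    {
        idChunk = idChunk < GMM_CHUNKID_LAST ? idChunk + 1 : 1;   /* bit 0 is NIL and stays set */
        if (!ASMBitTest(&pGMM->bmChunkId[0], (int32_t)idChunk))
        {
            ASMBitSet(&pGMM->bmChunkId[0], (int32_t)idChunk);
            pGMM->idChunkPrev = idChunk;
            return idChunk;
        }
    }
    return NIL_GMM_CHUNKID;                                       /* all IDs are in use */
}
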
616
617/**
618 * GMM chunk mutex state.
619 *
620 * This is returned by gmmR0ChunkMutexAcquire and is used by the other
621 * gmmR0ChunkMutex* methods.
622 */
623typedef struct GMMR0CHUNKMTXSTATE
624{
625 PGMM pGMM;
626 /** The index of the chunk mutex. */
627 uint8_t iChunkMtx;
628 /** The relevant flags (GMMR0CHUNK_MTX_XXX). */
629 uint8_t fFlags;
630} GMMR0CHUNKMTXSTATE;
631/** Pointer to a chunk mutex state. */
632typedef GMMR0CHUNKMTXSTATE *PGMMR0CHUNKMTXSTATE;
633
634/** @name GMMR0CHUNK_MTX_XXX
635 * @{ */
636#define GMMR0CHUNK_MTX_INVALID UINT32_C(0)
637#define GMMR0CHUNK_MTX_KEEP_GIANT UINT32_C(1)
638#define GMMR0CHUNK_MTX_RETAKE_GIANT UINT32_C(2)
639#define GMMR0CHUNK_MTX_DROP_GIANT UINT32_C(3)
640#define GMMR0CHUNK_MTX_END UINT32_C(4)
641/** @} */
642
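/*
 * Typical usage of the GMMR0CHUNK_MTX_XXX flags (a summary sketch; see
 * gmmR0ChunkMutexAcquire, gmmR0ChunkMutexRelease and gmmR0ChunkMutexDropGiant
 * further down):
 *
 *      GMMR0CHUNKMTXSTATE MtxState;
 *      gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
 *      ... work on the chunk, optionally calling gmmR0ChunkMutexDropGiant(&MtxState) ...
 *      gmmR0ChunkMutexRelease(&MtxState, pChunk);
 *
 * KEEP_GIANT keeps the giant GMM lock held while the chunk mutex is held,
 * RETAKE_GIANT drops it when the chunk mutex is taken and re-takes it on
 * release, and DROP_GIANT drops it and leaves it dropped.
 */
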
643
644/** The maximum number of shared modules per-vm. */
645#define GMM_MAX_SHARED_PER_VM_MODULES 2048
646/** The maximum number of shared modules GMM is allowed to track. */
647#define GMM_MAX_SHARED_GLOBAL_MODULES 16834
648
649
650/**
651 * Argument packet for gmmR0SharedModuleCleanup.
652 */
653typedef struct GMMR0SHMODPERVMDTORARGS
654{
655 PGVM pGVM;
656 PGMM pGMM;
657} GMMR0SHMODPERVMDTORARGS;
658
659/**
660 * Argument packet for gmmR0CheckSharedModule.
661 */
662typedef struct GMMCHECKSHAREDMODULEINFO
663{
664 PGVM pGVM;
665 VMCPUID idCpu;
666} GMMCHECKSHAREDMODULEINFO;
667
668
669/*********************************************************************************************************************************
670* Global Variables *
671*********************************************************************************************************************************/
672/** Pointer to the GMM instance data. */
673static PGMM g_pGMM = NULL;
674
675/** Macro for obtaining and validating the g_pGMM pointer.
676 *
677 * On failure it will return from the invoking function with the specified
678 * return value.
679 *
680 * @param pGMM The name of the pGMM variable.
681 * @param rc The return value on failure. Use VERR_GMM_INSTANCE for VBox
682 * status codes.
683 */
684#define GMM_GET_VALID_INSTANCE(pGMM, rc) \
685 do { \
686 (pGMM) = g_pGMM; \
687 AssertPtrReturn((pGMM), (rc)); \
688 AssertMsgReturn((pGMM)->u32Magic == GMM_MAGIC, ("%p - %#x\n", (pGMM), (pGMM)->u32Magic), (rc)); \
689 } while (0)
690
691/** Macro for obtaining and validating the g_pGMM pointer, void function
692 * variant.
693 *
694 * On failure it will return from the invoking function.
695 *
696 * @param pGMM The name of the pGMM variable.
697 */
698#define GMM_GET_VALID_INSTANCE_VOID(pGMM) \
699 do { \
700 (pGMM) = g_pGMM; \
701 AssertPtrReturnVoid((pGMM)); \
702 AssertMsgReturnVoid((pGMM)->u32Magic == GMM_MAGIC, ("%p - %#x\n", (pGMM), (pGMM)->u32Magic)); \
703 } while (0)
704
705
706/** @def GMM_CHECK_SANITY_UPON_ENTERING
707 * Checks the sanity of the GMM instance data before making changes.
708 *
709 * This macro is a stub by default and must be enabled manually in the code.
710 *
711 * @returns true if sane, false if not.
712 * @param pGMM The name of the pGMM variable.
713 */
714#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
715# define GMM_CHECK_SANITY_UPON_ENTERING(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
716#else
717# define GMM_CHECK_SANITY_UPON_ENTERING(pGMM) (true)
718#endif
719
720/** @def GMM_CHECK_SANITY_UPON_LEAVING
721 * Checks the sanity of the GMM instance data after making changes.
722 *
723 * This macro is a stub by default and must be enabled manually in the code.
724 *
725 * @returns true if sane, false if not.
726 * @param pGMM The name of the pGMM variable.
727 */
728#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
729# define GMM_CHECK_SANITY_UPON_LEAVING(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
730#else
731# define GMM_CHECK_SANITY_UPON_LEAVING(pGMM) (true)
732#endif
733
734/** @def GMM_CHECK_SANITY_IN_LOOPS
735 * Checks the sanity of the GMM instance in the allocation loops.
736 *
737 * This macro is a stub by default and must be enabled manually in the code.
738 *
739 * @returns true if sane, false if not.
740 * @param pGMM The name of the pGMM variable.
741 */
742#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
743# define GMM_CHECK_SANITY_IN_LOOPS(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
744#else
745# define GMM_CHECK_SANITY_IN_LOOPS(pGMM) (true)
746#endif
747
748
749/*********************************************************************************************************************************
750* Internal Functions *
751*********************************************************************************************************************************/
752static DECLCALLBACK(int) gmmR0TermDestroyChunk(PAVLU32NODECORE pNode, void *pvGMM);
753static bool gmmR0CleanupVMScanChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
754DECLINLINE(void) gmmR0UnlinkChunk(PGMMCHUNK pChunk);
755DECLINLINE(void) gmmR0LinkChunk(PGMMCHUNK pChunk, PGMMCHUNKFREESET pSet);
756DECLINLINE(void) gmmR0SelectSetAndLinkChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
757#ifdef GMMR0_WITH_SANITY_CHECK
758static uint32_t gmmR0SanityCheck(PGMM pGMM, const char *pszFunction, unsigned uLineNo);
759#endif
760static bool gmmR0FreeChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem);
761DECLINLINE(void) gmmR0FreePrivatePage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage);
762DECLINLINE(void) gmmR0FreeSharedPage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage);
763static int gmmR0UnmapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
764#ifdef VBOX_WITH_PAGE_SHARING
765static void gmmR0SharedModuleCleanup(PGMM pGMM, PGVM pGVM);
766# ifdef VBOX_STRICT
767static uint32_t gmmR0StrictPageChecksum(PGMM pGMM, PGVM pGVM, uint32_t idPage);
768# endif
769#endif
770
771
772
773/**
774 * Initializes the GMM component.
775 *
776 * This is called when the VMMR0.r0 module is loaded and protected by the
777 * loader semaphore.
778 *
779 * @returns VBox status code.
780 */
781GMMR0DECL(int) GMMR0Init(void)
782{
783 LogFlow(("GMMInit:\n"));
784
785 /*
786 * Allocate the instance data and the locks.
787 */
788 PGMM pGMM = (PGMM)RTMemAllocZ(sizeof(*pGMM));
789 if (!pGMM)
790 return VERR_NO_MEMORY;
791
792 pGMM->u32Magic = GMM_MAGIC;
793 for (unsigned i = 0; i < RT_ELEMENTS(pGMM->ChunkTLB.aEntries); i++)
794 pGMM->ChunkTLB.aEntries[i].idChunk = NIL_GMM_CHUNKID;
795 RTListInit(&pGMM->ChunkList);
796 ASMBitSet(&pGMM->bmChunkId[0], NIL_GMM_CHUNKID);
797
798#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
799 int rc = RTCritSectInit(&pGMM->GiantCritSect);
800#else
801 int rc = RTSemFastMutexCreate(&pGMM->hMtx);
802#endif
803 if (RT_SUCCESS(rc))
804 {
805 unsigned iMtx;
806 for (iMtx = 0; iMtx < RT_ELEMENTS(pGMM->aChunkMtx); iMtx++)
807 {
808 rc = RTSemFastMutexCreate(&pGMM->aChunkMtx[iMtx].hMtx);
809 if (RT_FAILURE(rc))
810 break;
811 }
812 if (RT_SUCCESS(rc))
813 {
814#ifndef GMM_WITH_LEGACY_MODE
815 /*
816 * Figure out how we're going to allocate stuff (only applicable to
817 * host with linear physical memory mappings).
818 */
819 pGMM->fBoundMemoryMode = false;
820# ifdef VBOX_WITH_LINEAR_HOST_PHYS_MEM
821 pGMM->fHasWorkingAllocPhysNC = false;
822
823 RTR0MEMOBJ hMemObj;
824 rc = RTR0MemObjAllocPhysNC(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS);
825 if (RT_SUCCESS(rc))
826 {
827 rc = RTR0MemObjFree(hMemObj, true);
828 AssertRC(rc);
829 pGMM->fHasWorkingAllocPhysNC = true;
830 }
831 else if (rc != VERR_NOT_SUPPORTED)
832 SUPR0Printf("GMMR0Init: Warning! RTR0MemObjAllocPhysNC(, %u, NIL_RTHCPHYS) -> %d!\n", GMM_CHUNK_SIZE, rc);
833# endif
834#else /* GMM_WITH_LEGACY_MODE */
835 /*
836 * Check and see if RTR0MemObjAllocPhysNC works.
837 */
838# if 0 /* later, see @bugref{3170}. */
839 RTR0MEMOBJ MemObj;
840 rc = RTR0MemObjAllocPhysNC(&MemObj, _64K, NIL_RTHCPHYS);
841 if (RT_SUCCESS(rc))
842 {
843 rc = RTR0MemObjFree(MemObj, true);
844 AssertRC(rc);
845 }
846 else if (rc == VERR_NOT_SUPPORTED)
847 pGMM->fLegacyAllocationMode = pGMM->fBoundMemoryMode = true;
848 else
849 SUPR0Printf("GMMR0Init: RTR0MemObjAllocPhysNC(,64K,Any) -> %d!\n", rc);
850# else
851# if defined(RT_OS_WINDOWS) || (defined(RT_OS_SOLARIS) && ARCH_BITS == 64) || defined(RT_OS_LINUX) || defined(RT_OS_FREEBSD)
852 pGMM->fLegacyAllocationMode = false;
853# if ARCH_BITS == 32
854 /* Don't reuse possibly partial chunks because of the virtual
855 address space limitation. */
856 pGMM->fBoundMemoryMode = true;
857# else
858 pGMM->fBoundMemoryMode = false;
859# endif
860# else
861 pGMM->fLegacyAllocationMode = true;
862 pGMM->fBoundMemoryMode = true;
863# endif
864# endif
865#endif /* GMM_WITH_LEGACY_MODE */
866
867 /*
868 * Query system page count and guess a reasonable cMaxPages value.
869 */
870 pGMM->cMaxPages = UINT32_MAX; /** @todo IPRT function for query ram size and such. */
871
872 g_pGMM = pGMM;
873#ifdef GMM_WITH_LEGACY_MODE
874 LogFlow(("GMMInit: pGMM=%p fLegacyAllocationMode=%RTbool fBoundMemoryMode=%RTbool\n", pGMM, pGMM->fLegacyAllocationMode, pGMM->fBoundMemoryMode));
875#elif defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)
876 LogFlow(("GMMInit: pGMM=%p fBoundMemoryMode=%RTbool fHasWorkingAllocPhysNC=%RTbool\n", pGMM, pGMM->fBoundMemoryMode, pGMM->fHasWorkingAllocPhysNC));
877#else
878 LogFlow(("GMMInit: pGMM=%p fBoundMemoryMode=%RTbool\n", pGMM, pGMM->fBoundMemoryMode));
879#endif
880 return VINF_SUCCESS;
881 }
882
883 /*
884 * Bail out.
885 */
886 while (iMtx-- > 0)
887 RTSemFastMutexDestroy(pGMM->aChunkMtx[iMtx].hMtx);
888#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
889 RTCritSectDelete(&pGMM->GiantCritSect);
890#else
891 RTSemFastMutexDestroy(pGMM->hMtx);
892#endif
893 }
894
895 pGMM->u32Magic = 0;
896 RTMemFree(pGMM);
897 SUPR0Printf("GMMR0Init: failed! rc=%d\n", rc);
898 return rc;
899}
900
901
902/**
903 * Terminates the GMM component.
904 */
905GMMR0DECL(void) GMMR0Term(void)
906{
907 LogFlow(("GMMTerm:\n"));
908
909 /*
910 * Take care / be paranoid...
911 */
912 PGMM pGMM = g_pGMM;
913 if (!VALID_PTR(pGMM))
914 return;
915 if (pGMM->u32Magic != GMM_MAGIC)
916 {
917 SUPR0Printf("GMMR0Term: u32Magic=%#x\n", pGMM->u32Magic);
918 return;
919 }
920
921 /*
922 * Undo what init did and free all the resources we've acquired.
923 */
924 /* Destroy the fundamentals. */
925 g_pGMM = NULL;
926 pGMM->u32Magic = ~GMM_MAGIC;
927#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
928 RTCritSectDelete(&pGMM->GiantCritSect);
929#else
930 RTSemFastMutexDestroy(pGMM->hMtx);
931 pGMM->hMtx = NIL_RTSEMFASTMUTEX;
932#endif
933
934 /* Free any chunks still hanging around. */
935 RTAvlU32Destroy(&pGMM->pChunks, gmmR0TermDestroyChunk, pGMM);
936
937 /* Destroy the chunk locks. */
938 for (unsigned iMtx = 0; iMtx < RT_ELEMENTS(pGMM->aChunkMtx); iMtx++)
939 {
940 Assert(pGMM->aChunkMtx[iMtx].cUsers == 0);
941 RTSemFastMutexDestroy(pGMM->aChunkMtx[iMtx].hMtx);
942 pGMM->aChunkMtx[iMtx].hMtx = NIL_RTSEMFASTMUTEX;
943 }
944
945 /* Finally the instance data itself. */
946 RTMemFree(pGMM);
947 LogFlow(("GMMTerm: done\n"));
948}
949
950
951/**
952 * RTAvlU32Destroy callback.
953 *
954 * @returns 0
955 * @param pNode The node to destroy.
956 * @param pvGMM The GMM handle.
957 */
958static DECLCALLBACK(int) gmmR0TermDestroyChunk(PAVLU32NODECORE pNode, void *pvGMM)
959{
960 PGMMCHUNK pChunk = (PGMMCHUNK)pNode;
961
962 if (pChunk->cFree != (GMM_CHUNK_SIZE >> PAGE_SHIFT))
963 SUPR0Printf("GMMR0Term: %RKv/%#x: cFree=%d cPrivate=%d cShared=%d cMappings=%d\n", pChunk,
964 pChunk->Core.Key, pChunk->cFree, pChunk->cPrivate, pChunk->cShared, pChunk->cMappingsX);
965
966 int rc = RTR0MemObjFree(pChunk->hMemObj, true /* fFreeMappings */);
967 if (RT_FAILURE(rc))
968 {
969 SUPR0Printf("GMMR0Term: %RKv/%#x: RTRMemObjFree(%RKv,true) -> %d (cMappings=%d)\n", pChunk,
970 pChunk->Core.Key, pChunk->hMemObj, rc, pChunk->cMappingsX);
971 AssertRC(rc);
972 }
973 pChunk->hMemObj = NIL_RTR0MEMOBJ;
974
975 RTMemFree(pChunk->paMappingsX);
976 pChunk->paMappingsX = NULL;
977
978 RTMemFree(pChunk);
979 NOREF(pvGMM);
980 return 0;
981}
982
983
984/**
985 * Initializes the per-VM data for the GMM.
986 *
987 * This is called from within the GVMM lock (from GVMMR0CreateVM)
988 * and should only initialize the data members so GMMR0CleanupVM
989 * can deal with them. We reserve no memory or anything here,
990 * that's done later in GMMR0InitVM.
991 *
992 * @param pGVM Pointer to the Global VM structure.
993 */
994GMMR0DECL(void) GMMR0InitPerVMData(PGVM pGVM)
995{
996 AssertCompile(RT_SIZEOFMEMB(GVM,gmm.s) <= RT_SIZEOFMEMB(GVM,gmm.padding));
997
998 pGVM->gmm.s.Stats.enmPolicy = GMMOCPOLICY_INVALID;
999 pGVM->gmm.s.Stats.enmPriority = GMMPRIORITY_INVALID;
1000 pGVM->gmm.s.Stats.fMayAllocate = false;
1001}
1002
1003
1004/**
1005 * Acquires the GMM giant lock.
1006 *
1007 * @returns Assert status code from RTSemFastMutexRequest.
1008 * @param pGMM Pointer to the GMM instance.
1009 */
1010static int gmmR0MutexAcquire(PGMM pGMM)
1011{
1012 ASMAtomicIncU32(&pGMM->cMtxContenders);
1013#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1014 int rc = RTCritSectEnter(&pGMM->GiantCritSect);
1015#else
1016 int rc = RTSemFastMutexRequest(pGMM->hMtx);
1017#endif
1018 ASMAtomicDecU32(&pGMM->cMtxContenders);
1019 AssertRC(rc);
1020#ifdef VBOX_STRICT
1021 pGMM->hMtxOwner = RTThreadNativeSelf();
1022#endif
1023 return rc;
1024}
1025
1026
1027/**
1028 * Releases the GMM giant lock.
1029 *
1030 * @returns Assert status code from RTSemFastMutexRelease.
1031 * @param pGMM Pointer to the GMM instance.
1032 */
1033static int gmmR0MutexRelease(PGMM pGMM)
1034{
1035#ifdef VBOX_STRICT
1036 pGMM->hMtxOwner = NIL_RTNATIVETHREAD;
1037#endif
1038#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1039 int rc = RTCritSectLeave(&pGMM->GiantCritSect);
1040#else
1041 int rc = RTSemFastMutexRelease(pGMM->hMtx);
1042 AssertRC(rc);
1043#endif
1044 return rc;
1045}
1046
1047
1048/**
1049 * Yields the GMM giant lock if there is contention and a certain minimum time
1050 * has elapsed since we took it.
1051 *
1052 * @returns @c true if the mutex was yielded, @c false if not.
1053 * @param pGMM Pointer to the GMM instance.
1054 * @param puLockNanoTS Where the lock acquisition time stamp is kept
1055 * (in/out).
1056 */
1057static bool gmmR0MutexYield(PGMM pGMM, uint64_t *puLockNanoTS)
1058{
1059 /*
1060 * If nobody is contending the mutex, don't bother checking the time.
1061 */
1062 if (ASMAtomicReadU32(&pGMM->cMtxContenders) == 0)
1063 return false;
1064
1065 /*
1066 * Don't yield if we haven't executed for at least 2 milliseconds.
1067 */
1068 uint64_t uNanoNow = RTTimeSystemNanoTS();
1069 if (uNanoNow - *puLockNanoTS < UINT32_C(2000000))
1070 return false;
1071
1072 /*
1073 * Yield the mutex.
1074 */
1075#ifdef VBOX_STRICT
1076 pGMM->hMtxOwner = NIL_RTNATIVETHREAD;
1077#endif
1078 ASMAtomicIncU32(&pGMM->cMtxContenders);
1079#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1080 int rc1 = RTCritSectLeave(&pGMM->GiantCritSect); AssertRC(rc1);
1081#else
1082 int rc1 = RTSemFastMutexRelease(pGMM->hMtx); AssertRC(rc1);
1083#endif
1084
1085 RTThreadYield();
1086
1087#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1088 int rc2 = RTCritSectEnter(&pGMM->GiantCritSect); AssertRC(rc2);
1089#else
1090 int rc2 = RTSemFastMutexRequest(pGMM->hMtx); AssertRC(rc2);
1091#endif
1092 *puLockNanoTS = RTTimeSystemNanoTS();
1093 ASMAtomicDecU32(&pGMM->cMtxContenders);
1094#ifdef VBOX_STRICT
1095 pGMM->hMtxOwner = RTThreadNativeSelf();
1096#endif
1097
1098 return true;
1099}
1100
1101
1102/**
1103 * Acquires a chunk lock.
1104 *
1105 * The caller must own the giant lock.
1106 *
1107 * @returns Assert status code from RTSemFastMutexRequest.
1108 * @param pMtxState The chunk mutex state info. (Avoids
1109 * passing the same flags and stuff around
1110 * for subsequent release and drop-giant
1111 * calls.)
1112 * @param pGMM Pointer to the GMM instance.
1113 * @param pChunk Pointer to the chunk.
1114 * @param fFlags Flags regarding the giant lock, GMMR0CHUNK_MTX_XXX.
1115 */
1116static int gmmR0ChunkMutexAcquire(PGMMR0CHUNKMTXSTATE pMtxState, PGMM pGMM, PGMMCHUNK pChunk, uint32_t fFlags)
1117{
1118 Assert(fFlags > GMMR0CHUNK_MTX_INVALID && fFlags < GMMR0CHUNK_MTX_END);
1119 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
1120
1121 pMtxState->pGMM = pGMM;
1122 pMtxState->fFlags = (uint8_t)fFlags;
1123
1124 /*
1125 * Get the lock index and reference the lock.
1126 */
1127 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
1128 uint32_t iChunkMtx = pChunk->iChunkMtx;
1129 if (iChunkMtx == UINT8_MAX)
1130 {
1131 iChunkMtx = pGMM->iNextChunkMtx++;
1132 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1133
1134 /* Try get an unused one... */
1135 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1136 {
1137 iChunkMtx = pGMM->iNextChunkMtx++;
1138 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1139 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1140 {
1141 iChunkMtx = pGMM->iNextChunkMtx++;
1142 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1143 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1144 {
1145 iChunkMtx = pGMM->iNextChunkMtx++;
1146 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1147 }
1148 }
1149 }
1150
1151 pChunk->iChunkMtx = iChunkMtx;
1152 }
1153 AssertCompile(RT_ELEMENTS(pGMM->aChunkMtx) < UINT8_MAX);
1154 pMtxState->iChunkMtx = (uint8_t)iChunkMtx;
1155 ASMAtomicIncU32(&pGMM->aChunkMtx[iChunkMtx].cUsers);
1156
1157 /*
1158 * Drop the giant?
1159 */
1160 if (fFlags != GMMR0CHUNK_MTX_KEEP_GIANT)
1161 {
1162 /** @todo GMM life cycle cleanup (we may race someone
1163 * destroying and cleaning up GMM)? */
1164 gmmR0MutexRelease(pGMM);
1165 }
1166
1167 /*
1168 * Take the chunk mutex.
1169 */
1170 int rc = RTSemFastMutexRequest(pGMM->aChunkMtx[iChunkMtx].hMtx);
1171 AssertRC(rc);
1172 return rc;
1173}
1174
1175
1176/**
1177 * Releases the chunk mutex, re-taking the giant GMM lock if requested.
1178 *
1179 * @returns Assert status code from RTSemFastMutexRelease.
1180 * @param pMtxState Pointer to the chunk mutex state.
1181 * @param pChunk Pointer to the chunk if it's still
1182 * alive, NULL if it isn't. This is used to deassociate
1183 * the chunk from the mutex on the way out so a new one
1184 * can be selected next time, thus avoiding contended
1185 * mutexes.
1186 */
1187static int gmmR0ChunkMutexRelease(PGMMR0CHUNKMTXSTATE pMtxState, PGMMCHUNK pChunk)
1188{
1189 PGMM pGMM = pMtxState->pGMM;
1190
1191 /*
1192 * Release the chunk mutex and reacquire the giant if requested.
1193 */
1194 int rc = RTSemFastMutexRelease(pGMM->aChunkMtx[pMtxState->iChunkMtx].hMtx);
1195 AssertRC(rc);
1196 if (pMtxState->fFlags == GMMR0CHUNK_MTX_RETAKE_GIANT)
1197 rc = gmmR0MutexAcquire(pGMM);
1198 else
1199 Assert((pMtxState->fFlags != GMMR0CHUNK_MTX_DROP_GIANT) == (pGMM->hMtxOwner == RTThreadNativeSelf()));
1200
1201 /*
1202 * Drop the chunk mutex user reference and deassociate it from the chunk
1203 * when possible.
1204 */
1205 if ( ASMAtomicDecU32(&pGMM->aChunkMtx[pMtxState->iChunkMtx].cUsers) == 0
1206 && pChunk
1207 && RT_SUCCESS(rc) )
1208 {
1209 if (pMtxState->fFlags != GMMR0CHUNK_MTX_DROP_GIANT)
1210 pChunk->iChunkMtx = UINT8_MAX;
1211 else
1212 {
1213 rc = gmmR0MutexAcquire(pGMM);
1214 if (RT_SUCCESS(rc))
1215 {
1216 if (pGMM->aChunkMtx[pMtxState->iChunkMtx].cUsers == 0)
1217 pChunk->iChunkMtx = UINT8_MAX;
1218 rc = gmmR0MutexRelease(pGMM);
1219 }
1220 }
1221 }
1222
1223 pMtxState->pGMM = NULL;
1224 return rc;
1225}
1226
1227
1228/**
1229 * Drops the giant GMM lock we kept in gmmR0ChunkMutexAcquire while keeping the
1230 * chunk locked.
1231 *
1232 * This only works if gmmR0ChunkMutexAcquire was called with
1233 * GMMR0CHUNK_MTX_KEEP_GIANT. gmmR0ChunkMutexRelease will retake the giant
1234 * mutex, i.e. behave as if GMMR0CHUNK_MTX_RETAKE_GIANT was used.
1235 *
1236 * @returns VBox status code (assuming success is ok).
1237 * @param pMtxState Pointer to the chunk mutex state.
1238 */
1239static int gmmR0ChunkMutexDropGiant(PGMMR0CHUNKMTXSTATE pMtxState)
1240{
1241 AssertReturn(pMtxState->fFlags == GMMR0CHUNK_MTX_KEEP_GIANT, VERR_GMM_MTX_FLAGS);
1242 Assert(pMtxState->pGMM->hMtxOwner == RTThreadNativeSelf());
1243 pMtxState->fFlags = GMMR0CHUNK_MTX_RETAKE_GIANT;
1244 /** @todo GMM life cycle cleanup (we may race someone
1245 * destroying and cleaning up GMM)? */
1246 return gmmR0MutexRelease(pMtxState->pGMM);
1247}
1248
1249
1250/**
1251 * For experimenting with NUMA affinity and such.
1252 *
1253 * @returns The current NUMA Node ID.
1254 */
1255static uint16_t gmmR0GetCurrentNumaNodeId(void)
1256{
1257#if 1
1258 return GMM_CHUNK_NUMA_ID_UNKNOWN;
1259#else
1260 return RTMpCpuId() / 16;
1261#endif
1262}
1263
1264
1265
1266/**
1267 * Cleans up when a VM is terminating.
1268 *
1269 * @param pGVM Pointer to the Global VM structure.
1270 */
1271GMMR0DECL(void) GMMR0CleanupVM(PGVM pGVM)
1272{
1273 LogFlow(("GMMR0CleanupVM: pGVM=%p:{.hSelf=%#x}\n", pGVM, pGVM->hSelf));
1274
1275 PGMM pGMM;
1276 GMM_GET_VALID_INSTANCE_VOID(pGMM);
1277
1278#ifdef VBOX_WITH_PAGE_SHARING
1279 /*
1280 * Clean up all registered shared modules first.
1281 */
1282 gmmR0SharedModuleCleanup(pGMM, pGVM);
1283#endif
1284
1285 gmmR0MutexAcquire(pGMM);
1286 uint64_t uLockNanoTS = RTTimeSystemNanoTS();
1287 GMM_CHECK_SANITY_UPON_ENTERING(pGMM);
1288
1289 /*
1290 * The policy is 'INVALID' until the initial reservation
1291 * request has been serviced.
1292 */
1293 if ( pGVM->gmm.s.Stats.enmPolicy > GMMOCPOLICY_INVALID
1294 && pGVM->gmm.s.Stats.enmPolicy < GMMOCPOLICY_END)
1295 {
1296 /*
1297 * If it's the last VM around, we can skip walking all the chunks looking
1298 * for the pages owned by this VM and instead flush the whole shebang.
1299 *
1300 * This takes care of the eventuality that a VM has left shared page
1301 * references behind (shouldn't happen of course, but you never know).
1302 */
1303 Assert(pGMM->cRegisteredVMs);
1304 pGMM->cRegisteredVMs--;
1305
1306 /*
1307 * Walk the entire pool looking for pages that belong to this VM
1308 * and leftover mappings. (This'll only catch private pages,
1309 * shared pages will be 'left behind'.)
1310 */
1311 /** @todo r=bird: This scanning+freeing could be optimized in bound mode! */
1312 uint64_t cPrivatePages = pGVM->gmm.s.Stats.cPrivatePages; /* save */
1313
1314 unsigned iCountDown = 64;
1315 bool fRedoFromStart;
1316 PGMMCHUNK pChunk;
1317 do
1318 {
1319 fRedoFromStart = false;
1320 RTListForEachReverse(&pGMM->ChunkList, pChunk, GMMCHUNK, ListNode)
1321 {
1322 uint32_t const cFreeChunksOld = pGMM->cFreedChunks;
1323 if ( ( !pGMM->fBoundMemoryMode
1324 || pChunk->hGVM == pGVM->hSelf)
1325 && gmmR0CleanupVMScanChunk(pGMM, pGVM, pChunk))
1326 {
1327 /* We left the giant mutex, so reset the yield counters. */
1328 uLockNanoTS = RTTimeSystemNanoTS();
1329 iCountDown = 64;
1330 }
1331 else
1332 {
1333 /* Didn't leave it, so do normal yielding. */
1334 if (!iCountDown)
1335 gmmR0MutexYield(pGMM, &uLockNanoTS);
1336 else
1337 iCountDown--;
1338 }
1339 if (pGMM->cFreedChunks != cFreeChunksOld)
1340 {
1341 fRedoFromStart = true;
1342 break;
1343 }
1344 }
1345 } while (fRedoFromStart);
1346
1347 if (pGVM->gmm.s.Stats.cPrivatePages)
1348 SUPR0Printf("GMMR0CleanupVM: hGVM=%#x has %#x private pages that cannot be found!\n", pGVM->hSelf, pGVM->gmm.s.Stats.cPrivatePages);
1349
1350 pGMM->cAllocatedPages -= cPrivatePages;
1351
1352 /*
1353 * Free empty chunks.
1354 */
1355 PGMMCHUNKFREESET pPrivateSet = pGMM->fBoundMemoryMode ? &pGVM->gmm.s.Private : &pGMM->PrivateX;
1356 do
1357 {
1358 fRedoFromStart = false;
1359 iCountDown = 10240;
1360 pChunk = pPrivateSet->apLists[GMM_CHUNK_FREE_SET_UNUSED_LIST];
1361 while (pChunk)
1362 {
1363 PGMMCHUNK pNext = pChunk->pFreeNext;
1364 Assert(pChunk->cFree == GMM_CHUNK_NUM_PAGES);
1365 if ( !pGMM->fBoundMemoryMode
1366 || pChunk->hGVM == pGVM->hSelf)
1367 {
1368 uint64_t const idGenerationOld = pPrivateSet->idGeneration;
1369 if (gmmR0FreeChunk(pGMM, pGVM, pChunk, true /*fRelaxedSem*/))
1370 {
1371 /* We've left the giant mutex, restart? (+1 for our unlink) */
1372 fRedoFromStart = pPrivateSet->idGeneration != idGenerationOld + 1;
1373 if (fRedoFromStart)
1374 break;
1375 uLockNanoTS = RTTimeSystemNanoTS();
1376 iCountDown = 10240;
1377 }
1378 }
1379
1380 /* Advance and maybe yield the lock. */
1381 pChunk = pNext;
1382 if (--iCountDown == 0)
1383 {
1384 uint64_t const idGenerationOld = pPrivateSet->idGeneration;
1385 fRedoFromStart = gmmR0MutexYield(pGMM, &uLockNanoTS)
1386 && pPrivateSet->idGeneration != idGenerationOld;
1387 if (fRedoFromStart)
1388 break;
1389 iCountDown = 10240;
1390 }
1391 }
1392 } while (fRedoFromStart);
1393
1394 /*
1395 * Account for shared pages that weren't freed.
1396 */
1397 if (pGVM->gmm.s.Stats.cSharedPages)
1398 {
1399 Assert(pGMM->cSharedPages >= pGVM->gmm.s.Stats.cSharedPages);
1400 SUPR0Printf("GMMR0CleanupVM: hGVM=%#x left %#x shared pages behind!\n", pGVM->hSelf, pGVM->gmm.s.Stats.cSharedPages);
1401 pGMM->cLeftBehindSharedPages += pGVM->gmm.s.Stats.cSharedPages;
1402 }
1403
1404 /*
1405 * Clean up balloon statistics in case the VM process crashed.
1406 */
1407 Assert(pGMM->cBalloonedPages >= pGVM->gmm.s.Stats.cBalloonedPages);
1408 pGMM->cBalloonedPages -= pGVM->gmm.s.Stats.cBalloonedPages;
1409
1410 /*
1411 * Update the over-commitment management statistics.
1412 */
1413 pGMM->cReservedPages -= pGVM->gmm.s.Stats.Reserved.cBasePages
1414 + pGVM->gmm.s.Stats.Reserved.cFixedPages
1415 + pGVM->gmm.s.Stats.Reserved.cShadowPages;
1416 switch (pGVM->gmm.s.Stats.enmPolicy)
1417 {
1418 case GMMOCPOLICY_NO_OC:
1419 break;
1420 default:
1421 /** @todo Update GMM->cOverCommittedPages */
1422 break;
1423 }
1424 }
1425
1426 /* zap the GVM data. */
1427 pGVM->gmm.s.Stats.enmPolicy = GMMOCPOLICY_INVALID;
1428 pGVM->gmm.s.Stats.enmPriority = GMMPRIORITY_INVALID;
1429 pGVM->gmm.s.Stats.fMayAllocate = false;
1430
1431 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1432 gmmR0MutexRelease(pGMM);
1433
1434 LogFlow(("GMMR0CleanupVM: returns\n"));
1435}
1436
1437
1438/**
1439 * Scan one chunk for private pages belonging to the specified VM.
1440 *
1441 * @note This function may drop the giant mutex!
1442 *
1443 * @returns @c true if we've temporarily dropped the giant mutex, @c false if
1444 * we didn't.
1445 * @param pGMM Pointer to the GMM instance.
1446 * @param pGVM The global VM handle.
1447 * @param pChunk The chunk to scan.
1448 */
1449static bool gmmR0CleanupVMScanChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
1450{
1451 Assert(!pGMM->fBoundMemoryMode || pChunk->hGVM == pGVM->hSelf);
1452
1453 /*
1454 * Look for pages belonging to the VM.
1455 * (Perform some internal checks while we're scanning.)
1456 */
1457#ifndef VBOX_STRICT
1458 if (pChunk->cFree != (GMM_CHUNK_SIZE >> PAGE_SHIFT))
1459#endif
1460 {
1461 unsigned cPrivate = 0;
1462 unsigned cShared = 0;
1463 unsigned cFree = 0;
1464
1465 gmmR0UnlinkChunk(pChunk); /* avoiding cFreePages updates. */
1466
1467 uint16_t hGVM = pGVM->hSelf;
1468 unsigned iPage = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
1469 while (iPage-- > 0)
1470 if (GMM_PAGE_IS_PRIVATE(&pChunk->aPages[iPage]))
1471 {
1472 if (pChunk->aPages[iPage].Private.hGVM == hGVM)
1473 {
1474 /*
1475 * Free the page.
1476 *
1477 * The reason for not using gmmR0FreePrivatePage here is that we
1478 * must *not* cause the chunk to be freed from under us - we're in
1479 * an AVL tree walk here.
1480 */
1481 pChunk->aPages[iPage].u = 0;
1482 pChunk->aPages[iPage].Free.iNext = pChunk->iFreeHead;
1483 pChunk->aPages[iPage].Free.u2State = GMM_PAGE_STATE_FREE;
1484 pChunk->iFreeHead = iPage;
1485 pChunk->cPrivate--;
1486 pChunk->cFree++;
1487 pGVM->gmm.s.Stats.cPrivatePages--;
1488 cFree++;
1489 }
1490 else
1491 cPrivate++;
1492 }
1493 else if (GMM_PAGE_IS_FREE(&pChunk->aPages[iPage]))
1494 cFree++;
1495 else
1496 cShared++;
1497
1498 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
1499
1500 /*
1501 * Did it add up?
1502 */
1503 if (RT_UNLIKELY( pChunk->cFree != cFree
1504 || pChunk->cPrivate != cPrivate
1505 || pChunk->cShared != cShared))
1506 {
1507 SUPR0Printf("gmmR0CleanupVMScanChunk: Chunk %RKv/%#x has bogus stats - free=%d/%d private=%d/%d shared=%d/%d\n",
1508 pChunk, pChunk->Core.Key, pChunk->cFree, cFree, pChunk->cPrivate, cPrivate, pChunk->cShared, cShared);
1509 pChunk->cFree = cFree;
1510 pChunk->cPrivate = cPrivate;
1511 pChunk->cShared = cShared;
1512 }
1513 }
1514
1515 /*
1516 * If not in bound memory mode, we should reset the hGVM field
1517 * if it has our handle in it.
1518 */
1519 if (pChunk->hGVM == pGVM->hSelf)
1520 {
1521 if (!g_pGMM->fBoundMemoryMode)
1522 pChunk->hGVM = NIL_GVM_HANDLE;
1523 else if (pChunk->cFree != GMM_CHUNK_NUM_PAGES)
1524 {
1525 SUPR0Printf("gmmR0CleanupVMScanChunk: %RKv/%#x: cFree=%#x - it should be 0 in bound mode!\n",
1526 pChunk, pChunk->Core.Key, pChunk->cFree);
1527 AssertMsgFailed(("%p/%#x: cFree=%#x - it should be 0 in bound mode!\n", pChunk, pChunk->Core.Key, pChunk->cFree));
1528
1529 gmmR0UnlinkChunk(pChunk);
1530 pChunk->cFree = GMM_CHUNK_NUM_PAGES;
1531 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
1532 }
1533 }
1534
1535 /*
1536 * Look for a mapping belonging to the terminating VM.
1537 */
1538 GMMR0CHUNKMTXSTATE MtxState;
1539 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
1540 unsigned cMappings = pChunk->cMappingsX;
1541 for (unsigned i = 0; i < cMappings; i++)
1542 if (pChunk->paMappingsX[i].pGVM == pGVM)
1543 {
1544 gmmR0ChunkMutexDropGiant(&MtxState);
1545
1546 RTR0MEMOBJ hMemObj = pChunk->paMappingsX[i].hMapObj;
1547
1548 cMappings--;
1549 if (i < cMappings)
1550 pChunk->paMappingsX[i] = pChunk->paMappingsX[cMappings];
1551 pChunk->paMappingsX[cMappings].pGVM = NULL;
1552 pChunk->paMappingsX[cMappings].hMapObj = NIL_RTR0MEMOBJ;
1553 Assert(pChunk->cMappingsX - 1U == cMappings);
1554 pChunk->cMappingsX = cMappings;
1555
1556 int rc = RTR0MemObjFree(hMemObj, false /* fFreeMappings (NA) */);
1557 if (RT_FAILURE(rc))
1558 {
1559 SUPR0Printf("gmmR0CleanupVMScanChunk: %RKv/%#x: mapping #%x: RTRMemObjFree(%RKv,false) -> %d \n",
1560 pChunk, pChunk->Core.Key, i, hMemObj, rc);
1561 AssertRC(rc);
1562 }
1563
1564 gmmR0ChunkMutexRelease(&MtxState, pChunk);
1565 return true;
1566 }
1567
1568 gmmR0ChunkMutexRelease(&MtxState, pChunk);
1569 return false;
1570}
1571
1572
1573/**
1574 * The initial resource reservations.
1575 *
1576 * This will make memory reservations according to policy and priority. If there aren't
1577 * sufficient resources available to sustain the VM this function will fail and all
1578 * future allocation requests will fail as well.
1579 *
1580 * These are just the initial reservations made very very early during the VM creation
1581 * process and will be adjusted later in the GMMR0UpdateReservation call after the
1582 * ring-3 init has completed.
1583 *
1584 * @returns VBox status code.
1585 * @retval VERR_GMM_MEMORY_RESERVATION_DECLINED
1586 * @retval VERR_GMM_
1587 *
1588 * @param pGVM The global (ring-0) VM structure.
1589 * @param idCpu The VCPU id - must be zero.
1590 * @param cBasePages The number of pages that may be allocated for the base RAM and ROMs.
1591 * This does not include MMIO2 and similar.
1592 * @param cShadowPages The number of pages that may be allocated for shadow paging structures.
1593 * @param cFixedPages The number of pages that may be allocated for fixed objects like the
1594 * hyper heap, MMIO2 and similar.
1595 * @param enmPolicy The OC policy to use on this VM.
1596 * @param enmPriority The priority in an out-of-memory situation.
1597 *
1598 * @thread The creator thread / EMT(0).
1599 */
1600GMMR0DECL(int) GMMR0InitialReservation(PGVM pGVM, VMCPUID idCpu, uint64_t cBasePages, uint32_t cShadowPages,
1601 uint32_t cFixedPages, GMMOCPOLICY enmPolicy, GMMPRIORITY enmPriority)
1602{
1603 LogFlow(("GMMR0InitialReservation: pGVM=%p cBasePages=%#llx cShadowPages=%#x cFixedPages=%#x enmPolicy=%d enmPriority=%d\n",
1604 pGVM, cBasePages, cShadowPages, cFixedPages, enmPolicy, enmPriority));
1605
1606 /*
1607 * Validate, get basics and take the semaphore.
1608 */
1609 AssertReturn(idCpu == 0, VERR_INVALID_CPU_ID);
1610 PGMM pGMM;
1611 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
1612 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
1613 if (RT_FAILURE(rc))
1614 return rc;
1615
1616 AssertReturn(cBasePages, VERR_INVALID_PARAMETER);
1617 AssertReturn(cShadowPages, VERR_INVALID_PARAMETER);
1618 AssertReturn(cFixedPages, VERR_INVALID_PARAMETER);
1619 AssertReturn(enmPolicy > GMMOCPOLICY_INVALID && enmPolicy < GMMOCPOLICY_END, VERR_INVALID_PARAMETER);
1620 AssertReturn(enmPriority > GMMPRIORITY_INVALID && enmPriority < GMMPRIORITY_END, VERR_INVALID_PARAMETER);
1621
1622 gmmR0MutexAcquire(pGMM);
1623 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
1624 {
1625 if ( !pGVM->gmm.s.Stats.Reserved.cBasePages
1626 && !pGVM->gmm.s.Stats.Reserved.cFixedPages
1627 && !pGVM->gmm.s.Stats.Reserved.cShadowPages)
1628 {
1629 /*
1630 * Check if we can accommodate this.
1631 */
1632 /* ... later ... */
1633 if (RT_SUCCESS(rc))
1634 {
1635 /*
1636 * Update the records.
1637 */
1638 pGVM->gmm.s.Stats.Reserved.cBasePages = cBasePages;
1639 pGVM->gmm.s.Stats.Reserved.cFixedPages = cFixedPages;
1640 pGVM->gmm.s.Stats.Reserved.cShadowPages = cShadowPages;
1641 pGVM->gmm.s.Stats.enmPolicy = enmPolicy;
1642 pGVM->gmm.s.Stats.enmPriority = enmPriority;
1643 pGVM->gmm.s.Stats.fMayAllocate = true;
1644
1645 pGMM->cReservedPages += cBasePages + cFixedPages + cShadowPages;
1646 pGMM->cRegisteredVMs++;
1647 }
1648 }
1649 else
1650 rc = VERR_WRONG_ORDER;
1651 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1652 }
1653 else
1654 rc = VERR_GMM_IS_NOT_SANE;
1655 gmmR0MutexRelease(pGMM);
1656 LogFlow(("GMMR0InitialReservation: returns %Rrc\n", rc));
1657 return rc;
1658}
1659
1660
1661/**
1662 * VMMR0 request wrapper for GMMR0InitialReservation.
1663 *
1664 * @returns see GMMR0InitialReservation.
1665 * @param pGVM The global (ring-0) VM structure.
1666 * @param idCpu The VCPU id.
1667 * @param pReq Pointer to the request packet.
1668 */
1669GMMR0DECL(int) GMMR0InitialReservationReq(PGVM pGVM, VMCPUID idCpu, PGMMINITIALRESERVATIONREQ pReq)
1670{
1671 /*
1672 * Validate input and pass it on.
1673 */
1674 AssertPtrReturn(pGVM, VERR_INVALID_POINTER);
1675 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
1676 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
1677
1678 return GMMR0InitialReservation(pGVM, idCpu, pReq->cBasePages, pReq->cShadowPages,
1679 pReq->cFixedPages, pReq->enmPolicy, pReq->enmPriority);
1680}
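
/*
 * Editor's illustrative sketch (not part of the upstream file): how a ring-3
 * caller might fill in the GMMINITIALRESERVATIONREQ packet consumed by the
 * wrapper above.  The header and field names are the ones the wrapper
 * dereferences; the page counts are made up and the policy/priority enum
 * members are assumed typical defaults, not confirmed by this file.
 */
#if 0 /* example only */
    GMMINITIALRESERVATIONREQ Req;
    Req.Hdr.u32Magic = SUPVMMR0REQHDR_MAGIC;  /* standard ring-0 request header */
    Req.Hdr.cbReq    = sizeof(Req);           /* must equal sizeof(*pReq), see the check above */
    Req.cBasePages   = 0x40000;               /* example: 1 GiB of base RAM + ROMs, no MMIO2 */
    Req.cShadowPages = 0x400;                 /* example figure for shadow paging structures */
    Req.cFixedPages  = 0x2000;                /* example figure for hyper heap, MMIO2 and similar */
    Req.enmPolicy    = GMMOCPOLICY_NO_OC;     /* assumed enum member */
    Req.enmPriority  = GMMPRIORITY_NORMAL;    /* assumed enum member */
    /* The packet is then dispatched to ring-0 on EMT(0), ending up in
       GMMR0InitialReservationReq above. */
#endif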
1681
1682
1683/**
1684 * This updates the memory reservation with the additional MMIO2 and ROM pages.
1685 *
1686 * @returns VBox status code.
1687 * @retval VERR_GMM_MEMORY_RESERVATION_DECLINED
1688 *
1689 * @param pGVM The global (ring-0) VM structure.
1690 * @param idCpu The VCPU id.
1691 * @param cBasePages The number of pages that may be allocated for the base RAM and ROMs.
1692 * This does not include MMIO2 and similar.
1693 * @param cShadowPages The number of pages that may be allocated for shadow paging structures.
1694 * @param cFixedPages The number of pages that may be allocated for fixed objects like the
1695 * hyper heap, MMIO2 and similar.
1696 *
1697 * @thread EMT(idCpu)
1698 */
1699GMMR0DECL(int) GMMR0UpdateReservation(PGVM pGVM, VMCPUID idCpu, uint64_t cBasePages,
1700 uint32_t cShadowPages, uint32_t cFixedPages)
1701{
1702 LogFlow(("GMMR0UpdateReservation: pGVM=%p cBasePages=%#llx cShadowPages=%#x cFixedPages=%#x\n",
1703 pGVM, cBasePages, cShadowPages, cFixedPages));
1704
1705 /*
1706 * Validate, get basics and take the semaphore.
1707 */
1708 PGMM pGMM;
1709 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
1710 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
1711 if (RT_FAILURE(rc))
1712 return rc;
1713
1714 AssertReturn(cBasePages, VERR_INVALID_PARAMETER);
1715 AssertReturn(cShadowPages, VERR_INVALID_PARAMETER);
1716 AssertReturn(cFixedPages, VERR_INVALID_PARAMETER);
1717
1718 gmmR0MutexAcquire(pGMM);
1719 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
1720 {
1721 if ( pGVM->gmm.s.Stats.Reserved.cBasePages
1722 && pGVM->gmm.s.Stats.Reserved.cFixedPages
1723 && pGVM->gmm.s.Stats.Reserved.cShadowPages)
1724 {
1725 /*
1726 * Check if we can accommodate this.
1727 */
1728 /* ... later ... */
1729 if (RT_SUCCESS(rc))
1730 {
1731 /*
1732 * Update the records.
1733 */
1734 pGMM->cReservedPages -= pGVM->gmm.s.Stats.Reserved.cBasePages
1735 + pGVM->gmm.s.Stats.Reserved.cFixedPages
1736 + pGVM->gmm.s.Stats.Reserved.cShadowPages;
1737 pGMM->cReservedPages += cBasePages + cFixedPages + cShadowPages;
1738
1739 pGVM->gmm.s.Stats.Reserved.cBasePages = cBasePages;
1740 pGVM->gmm.s.Stats.Reserved.cFixedPages = cFixedPages;
1741 pGVM->gmm.s.Stats.Reserved.cShadowPages = cShadowPages;
1742 }
1743 }
1744 else
1745 rc = VERR_WRONG_ORDER;
1746 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1747 }
1748 else
1749 rc = VERR_GMM_IS_NOT_SANE;
1750 gmmR0MutexRelease(pGMM);
1751 LogFlow(("GMMR0UpdateReservation: returns %Rrc\n", rc));
1752 return rc;
1753}
1754
1755
1756/**
1757 * VMMR0 request wrapper for GMMR0UpdateReservation.
1758 *
1759 * @returns see GMMR0UpdateReservation.
1760 * @param pGVM The global (ring-0) VM structure.
1761 * @param idCpu The VCPU id.
1762 * @param pReq Pointer to the request packet.
1763 */
1764GMMR0DECL(int) GMMR0UpdateReservationReq(PGVM pGVM, VMCPUID idCpu, PGMMUPDATERESERVATIONREQ pReq)
1765{
1766 /*
1767 * Validate input and pass it on.
1768 */
1769 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
1770 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
1771
1772 return GMMR0UpdateReservation(pGVM, idCpu, pReq->cBasePages, pReq->cShadowPages, pReq->cFixedPages);
1773}
1774
1775#ifdef GMMR0_WITH_SANITY_CHECK
1776
1777/**
1778 * Performs sanity checks on a free set.
1779 *
1780 * @returns Error count.
1781 *
1782 * @param pGMM Pointer to the GMM instance.
1783 * @param pSet Pointer to the set.
1784 * @param pszSetName The set name.
1785 * @param pszFunction The function from which it was called.
1786 * @param uLineNo The line number.
1787 */
1788static uint32_t gmmR0SanityCheckSet(PGMM pGMM, PGMMCHUNKFREESET pSet, const char *pszSetName,
1789 const char *pszFunction, unsigned uLineNo)
1790{
1791 uint32_t cErrors = 0;
1792
1793 /*
1794 * Count the free pages in all the chunks and match it against pSet->cFreePages.
1795 */
1796 uint32_t cPages = 0;
1797 for (unsigned i = 0; i < RT_ELEMENTS(pSet->apLists); i++)
1798 {
1799 for (PGMMCHUNK pCur = pSet->apLists[i]; pCur; pCur = pCur->pFreeNext)
1800 {
1801 /** @todo check that the chunk is hashed into the right set. */
1802 cPages += pCur->cFree;
1803 }
1804 }
1805 if (RT_UNLIKELY(cPages != pSet->cFreePages))
1806 {
1807 SUPR0Printf("GMM insanity: found %#x pages in the %s set, expected %#x. (%s, line %u)\n",
1808 cPages, pszSetName, pSet->cFreePages, pszFunction, uLineNo);
1809 cErrors++;
1810 }
1811
1812 return cErrors;
1813}
1814
1815
1816/**
1817 * Performs some sanity checks on the GMM while owning lock.
1818 *
1819 * @returns Error count.
1820 *
1821 * @param pGMM Pointer to the GMM instance.
1822 * @param pszFunction The function from which it is called.
1823 * @param uLineNo The line number.
1824 */
1825static uint32_t gmmR0SanityCheck(PGMM pGMM, const char *pszFunction, unsigned uLineNo)
1826{
1827 uint32_t cErrors = 0;
1828
1829 cErrors += gmmR0SanityCheckSet(pGMM, &pGMM->PrivateX, "private", pszFunction, uLineNo);
1830 cErrors += gmmR0SanityCheckSet(pGMM, &pGMM->Shared, "shared", pszFunction, uLineNo);
1831 /** @todo add more sanity checks. */
1832
1833 return cErrors;
1834}
1835
1836#endif /* GMMR0_WITH_SANITY_CHECK */
1837
1838/**
1839 * Looks up a chunk in the tree and fills in the TLB entry for it.
1840 *
1841 * This is not expected to fail and will bitch if it does.
1842 *
1843 * @returns Pointer to the allocation chunk, NULL if not found.
1844 * @param pGMM Pointer to the GMM instance.
1845 * @param idChunk The ID of the chunk to find.
1846 * @param pTlbe Pointer to the TLB entry.
1847 */
1848static PGMMCHUNK gmmR0GetChunkSlow(PGMM pGMM, uint32_t idChunk, PGMMCHUNKTLBE pTlbe)
1849{
1850 PGMMCHUNK pChunk = (PGMMCHUNK)RTAvlU32Get(&pGMM->pChunks, idChunk);
1851 AssertMsgReturn(pChunk, ("Chunk %#x not found!\n", idChunk), NULL);
1852 pTlbe->idChunk = idChunk;
1853 pTlbe->pChunk = pChunk;
1854 return pChunk;
1855}
1856
1857
1858/**
1859 * Finds an allocation chunk.
1860 *
1861 * This is not expected to fail and will bitch if it does.
1862 *
1863 * @returns Pointer to the allocation chunk, NULL if not found.
1864 * @param pGMM Pointer to the GMM instance.
1865 * @param idChunk The ID of the chunk to find.
1866 */
1867DECLINLINE(PGMMCHUNK) gmmR0GetChunk(PGMM pGMM, uint32_t idChunk)
1868{
1869 /*
1870 * Do a TLB lookup, branch if not in the TLB.
1871 */
1872 PGMMCHUNKTLBE pTlbe = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(idChunk)];
1873 if ( pTlbe->idChunk != idChunk
1874 || !pTlbe->pChunk)
1875 return gmmR0GetChunkSlow(pGMM, idChunk, pTlbe);
1876 return pTlbe->pChunk;
1877}
1878
1879
1880/**
1881 * Finds a page.
1882 *
1883 * This is not expected to fail and will bitch if it does.
1884 *
1885 * @returns Pointer to the page, NULL if not found.
1886 * @param pGMM Pointer to the GMM instance.
1887 * @param idPage The ID of the page to find.
1888 */
1889DECLINLINE(PGMMPAGE) gmmR0GetPage(PGMM pGMM, uint32_t idPage)
1890{
1891 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
1892 if (RT_LIKELY(pChunk))
1893 return &pChunk->aPages[idPage & GMM_PAGEID_IDX_MASK];
1894 return NULL;
1895}
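
/*
 * Editor's illustrative sketch (not part of the upstream file): the page ID
 * decomposition gmmR0GetChunk/gmmR0GetPage rely on, using only the constants
 * referenced above.  The concrete IDs below are made up.
 */
#if 0 /* example only */
    uint32_t const idPage  = (3 << GMM_CHUNKID_SHIFT) | 5;     /* page 5 in chunk 3 (made-up IDs) */
    uint32_t const idChunk = idPage >> GMM_CHUNKID_SHIFT;      /* which chunk the page lives in */
    uint32_t const iPage   = idPage & GMM_PAGEID_IDX_MASK;     /* index of the page within that chunk */
    Assert(idChunk == 3 && iPage == 5);                        /* the mapping round-trips */
#endif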
1896
1897
1898#if 0 /* unused */
1899/**
1900 * Gets the host physical address for a page given by its ID.
1901 *
1902 * @returns The host physical address or NIL_RTHCPHYS.
1903 * @param pGMM Pointer to the GMM instance.
1904 * @param idPage The ID of the page to find.
1905 */
1906DECLINLINE(RTHCPHYS) gmmR0GetPageHCPhys(PGMM pGMM, uint32_t idPage)
1907{
1908 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
1909 if (RT_LIKELY(pChunk))
1910 return RTR0MemObjGetPagePhysAddr(pChunk->hMemObj, idPage & GMM_PAGEID_IDX_MASK);
1911 return NIL_RTHCPHYS;
1912}
1913#endif /* unused */
1914
1915
1916/**
1917 * Selects the appropriate free list given the number of free pages.
1918 *
1919 * @returns Free list index.
1920 * @param cFree The number of free pages in the chunk.
1921 */
1922DECLINLINE(unsigned) gmmR0SelectFreeSetList(unsigned cFree)
1923{
1924 unsigned iList = cFree >> GMM_CHUNK_FREE_SET_SHIFT;
1925 AssertMsg(iList < RT_SIZEOFMEMB(GMMCHUNKFREESET, apLists) / RT_SIZEOFMEMB(GMMCHUNKFREESET, apLists[0]),
1926 ("%d (%u)\n", iList, cFree));
1927 return iList;
1928}
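
/*
 * Editor's note (not part of the upstream file): the selection above buckets
 * chunks by their free page count, so chunks with a similar amount of free
 * space share a list.  A completely free chunk lands in the highest bucket,
 * which appears to be what GMM_CHUNK_FREE_SET_UNUSED_LIST (used by
 * gmmR0AllocatePagesFromEmptyChunksOnSameNode further down) refers to; the
 * exact bucket count depends on GMM_CHUNK_FREE_SET_SHIFT and is not restated
 * here.
 */
#if 0 /* example only */
    unsigned const iListNearlyFull = gmmR0SelectFreeSetList(1);                   /* lowest bucket */
    unsigned const iListEmptyChunk = gmmR0SelectFreeSetList(GMM_CHUNK_NUM_PAGES); /* highest bucket */
    Assert(iListNearlyFull <= iListEmptyChunk);
#endif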
1929
1930
1931/**
1932 * Unlinks the chunk from the free list it's currently on (if any).
1933 *
1934 * @param pChunk The allocation chunk.
1935 */
1936DECLINLINE(void) gmmR0UnlinkChunk(PGMMCHUNK pChunk)
1937{
1938 PGMMCHUNKFREESET pSet = pChunk->pSet;
1939 if (RT_LIKELY(pSet))
1940 {
1941 pSet->cFreePages -= pChunk->cFree;
1942 pSet->idGeneration++;
1943
1944 PGMMCHUNK pPrev = pChunk->pFreePrev;
1945 PGMMCHUNK pNext = pChunk->pFreeNext;
1946 if (pPrev)
1947 pPrev->pFreeNext = pNext;
1948 else
1949 pSet->apLists[gmmR0SelectFreeSetList(pChunk->cFree)] = pNext;
1950 if (pNext)
1951 pNext->pFreePrev = pPrev;
1952
1953 pChunk->pSet = NULL;
1954 pChunk->pFreeNext = NULL;
1955 pChunk->pFreePrev = NULL;
1956 }
1957 else
1958 {
1959 Assert(!pChunk->pFreeNext);
1960 Assert(!pChunk->pFreePrev);
1961 Assert(!pChunk->cFree);
1962 }
1963}
1964
1965
1966/**
1967 * Links the chunk onto the appropriate free list in the specified free set.
1968 *
1969 * If no free entries, it's not linked into any list.
1970 *
1971 * @param pChunk The allocation chunk.
1972 * @param pSet The free set.
1973 */
1974DECLINLINE(void) gmmR0LinkChunk(PGMMCHUNK pChunk, PGMMCHUNKFREESET pSet)
1975{
1976 Assert(!pChunk->pSet);
1977 Assert(!pChunk->pFreeNext);
1978 Assert(!pChunk->pFreePrev);
1979
1980 if (pChunk->cFree > 0)
1981 {
1982 pChunk->pSet = pSet;
1983 pChunk->pFreePrev = NULL;
1984 unsigned const iList = gmmR0SelectFreeSetList(pChunk->cFree);
1985 pChunk->pFreeNext = pSet->apLists[iList];
1986 if (pChunk->pFreeNext)
1987 pChunk->pFreeNext->pFreePrev = pChunk;
1988 pSet->apLists[iList] = pChunk;
1989
1990 pSet->cFreePages += pChunk->cFree;
1991 pSet->idGeneration++;
1992 }
1993}
1994
1995
1996/**
1997 * Selects the appropriate free set for the chunk and links it onto that set's free list.
1998 *
1999 * If no free entries, it's not linked into any list.
2000 *
2001 * @param pGMM Pointer to the GMM instance.
2002 * @param pGVM Pointer to the kernel-only VM instance data.
2003 * @param pChunk The allocation chunk.
2004 */
2005DECLINLINE(void) gmmR0SelectSetAndLinkChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
2006{
2007 PGMMCHUNKFREESET pSet;
2008 if (pGMM->fBoundMemoryMode)
2009 pSet = &pGVM->gmm.s.Private;
2010 else if (pChunk->cShared)
2011 pSet = &pGMM->Shared;
2012 else
2013 pSet = &pGMM->PrivateX;
2014 gmmR0LinkChunk(pChunk, pSet);
2015}
2016
2017
2018/**
2019 * Frees a Chunk ID.
2020 *
2021 * @param pGMM Pointer to the GMM instance.
2022 * @param idChunk The Chunk ID to free.
2023 */
2024static void gmmR0FreeChunkId(PGMM pGMM, uint32_t idChunk)
2025{
2026 AssertReturnVoid(idChunk != NIL_GMM_CHUNKID);
2027 AssertMsg(ASMBitTest(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk));
2028 ASMAtomicBitClear(&pGMM->bmChunkId[0], idChunk);
2029}
2030
2031
2032/**
2033 * Allocates a new Chunk ID.
2034 *
2035 * @returns The Chunk ID.
2036 * @param pGMM Pointer to the GMM instance.
2037 */
2038static uint32_t gmmR0AllocateChunkId(PGMM pGMM)
2039{
2040 AssertCompile(!((GMM_CHUNKID_LAST + 1) & 31)); /* must be a multiple of 32 */
2041 AssertCompile(NIL_GMM_CHUNKID == 0);
2042
2043 /*
2044 * Try the next sequential one.
2045 */
2046 int32_t idChunk = ++pGMM->idChunkPrev;
2047#if 0 /** @todo enable this code */
2048 if ( idChunk <= GMM_CHUNKID_LAST
2049 && idChunk > NIL_GMM_CHUNKID
2050 && !ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk))
2051 return idChunk;
2052#endif
2053
2054 /*
2055 * Scan sequentially from the last one.
2056 */
2057 if ( (uint32_t)idChunk < GMM_CHUNKID_LAST
2058 && idChunk > NIL_GMM_CHUNKID)
2059 {
2060 idChunk = ASMBitNextClear(&pGMM->bmChunkId[0], GMM_CHUNKID_LAST + 1, idChunk - 1);
2061 if (idChunk > NIL_GMM_CHUNKID)
2062 {
2063 AssertMsgReturn(!ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk), NIL_GMM_CHUNKID);
2064 return pGMM->idChunkPrev = idChunk;
2065 }
2066 }
2067
2068 /*
2069 * Ok, scan from the start.
2070 * We're not racing anyone, so there is no need to expect failures or have restart loops.
2071 */
2072 idChunk = ASMBitFirstClear(&pGMM->bmChunkId[0], GMM_CHUNKID_LAST + 1);
2073 AssertMsgReturn(idChunk > NIL_GMM_CHUNKID, ("%#x\n", idChunk), NIL_GMM_CHUNKID);
2074 AssertMsgReturn(!ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk), NIL_GMM_CHUNKID);
2075
2076 return pGMM->idChunkPrev = idChunk;
2077}
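
/*
 * Editor's illustrative sketch (not part of the upstream file): the bitmap
 * idiom used by gmmR0AllocateChunkId/gmmR0FreeChunkId, shown on a tiny local
 * bitmap.  A set bit means the ID is taken; bit 0 stays set so the NIL value
 * (0) is never handed out.  The IPRT helpers are the same ones used above.
 */
#if 0 /* example only */
    uint32_t bmIds[2] = { 0x1, 0 };                       /* room for 64 IDs, ID 0 reserved as NIL */
    int32_t  iBit     = ASMBitFirstClear(&bmIds[0], 64);  /* first free ID, -1 if the bitmap is full */
    if (iBit > 0 && !ASMAtomicBitTestAndSet(&bmIds[0], iBit))
    {
        /* iBit is now an allocated ID; freeing it is just clearing the bit again. */
        ASMAtomicBitClear(&bmIds[0], iBit);
    }
#endif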
2078
2079
2080/**
2081 * Allocates one private page.
2082 *
2083 * Worker for gmmR0AllocatePages.
2084 *
2085 * @param pChunk The chunk to allocate it from.
2086 * @param hGVM The GVM handle of the VM requesting memory.
2087 * @param pPageDesc The page descriptor.
2088 */
2089static void gmmR0AllocatePage(PGMMCHUNK pChunk, uint32_t hGVM, PGMMPAGEDESC pPageDesc)
2090{
2091 /* update the chunk stats. */
2092 if (pChunk->hGVM == NIL_GVM_HANDLE)
2093 pChunk->hGVM = hGVM;
2094 Assert(pChunk->cFree);
2095 pChunk->cFree--;
2096 pChunk->cPrivate++;
2097
2098 /* unlink the first free page. */
2099 const uint32_t iPage = pChunk->iFreeHead;
2100 AssertReleaseMsg(iPage < RT_ELEMENTS(pChunk->aPages), ("%d\n", iPage));
2101 PGMMPAGE pPage = &pChunk->aPages[iPage];
2102 Assert(GMM_PAGE_IS_FREE(pPage));
2103 pChunk->iFreeHead = pPage->Free.iNext;
2104 Log3(("A pPage=%p iPage=%#x/%#x u2State=%d iFreeHead=%#x iNext=%#x\n",
2105 pPage, iPage, (pChunk->Core.Key << GMM_CHUNKID_SHIFT) | iPage,
2106 pPage->Common.u2State, pChunk->iFreeHead, pPage->Free.iNext));
2107
2108 /* make the page private. */
2109 pPage->u = 0;
2110 AssertCompile(GMM_PAGE_STATE_PRIVATE == 0);
2111 pPage->Private.hGVM = hGVM;
2112 AssertCompile(NIL_RTHCPHYS >= GMM_GCPHYS_LAST);
2113 AssertCompile(GMM_GCPHYS_UNSHAREABLE >= GMM_GCPHYS_LAST);
2114 if (pPageDesc->HCPhysGCPhys <= GMM_GCPHYS_LAST)
2115 pPage->Private.pfn = pPageDesc->HCPhysGCPhys >> PAGE_SHIFT;
2116 else
2117 pPage->Private.pfn = GMM_PAGE_PFN_UNSHAREABLE; /* unshareable / unassigned - same thing. */
2118
2119 /* update the page descriptor. */
2120 pPageDesc->HCPhysGCPhys = RTR0MemObjGetPagePhysAddr(pChunk->hMemObj, iPage);
2121 Assert(pPageDesc->HCPhysGCPhys != NIL_RTHCPHYS);
2122 pPageDesc->idPage = (pChunk->Core.Key << GMM_CHUNKID_SHIFT) | iPage;
2123 pPageDesc->idSharedPage = NIL_GMM_PAGEID;
2124}
2125
2126
2127/**
2128 * Picks the free pages from a chunk.
2129 *
2130 * @returns The new page descriptor table index.
2131 * @param pChunk The chunk.
2132 * @param hGVM The affinity of the chunk. NIL_GVM_HANDLE for no
2133 * affinity.
2134 * @param iPage The current page descriptor table index.
2135 * @param cPages The total number of pages to allocate.
2136 * @param paPages The page descriptor table (input + output).
2137 */
2138static uint32_t gmmR0AllocatePagesFromChunk(PGMMCHUNK pChunk, uint16_t const hGVM, uint32_t iPage, uint32_t cPages,
2139 PGMMPAGEDESC paPages)
2140{
2141 PGMMCHUNKFREESET pSet = pChunk->pSet; Assert(pSet);
2142 gmmR0UnlinkChunk(pChunk);
2143
2144 for (; pChunk->cFree && iPage < cPages; iPage++)
2145 gmmR0AllocatePage(pChunk, hGVM, &paPages[iPage]);
2146
2147 gmmR0LinkChunk(pChunk, pSet);
2148 return iPage;
2149}
2150
2151
2152/**
2153 * Registers a new chunk of memory.
2154 *
2155 * This is called by gmmR0AllocateChunkNew, GMMR0AllocateLargePage and GMMR0SeedChunk.
2156 *
2157 * @returns VBox status code. On success, the giant GMM lock will be held, the
2158 * caller must release it (ugly).
2159 * @param pGMM Pointer to the GMM instance.
2160 * @param pSet Pointer to the set.
2161 * @param hMemObj The memory object for the chunk.
2162 * @param hGVM The affinity of the chunk. NIL_GVM_HANDLE for no
2163 * affinity.
2164 * @param fChunkFlags The chunk flags, GMM_CHUNK_FLAGS_XXX.
2165 * @param ppChunk Chunk address (out). Optional.
2166 *
2167 * @remarks The caller must not own the giant GMM mutex.
2168 * The giant GMM mutex will be acquired and returned acquired in
2169 * the success path. On failure, no locks will be held.
2170 */
2171static int gmmR0RegisterChunk(PGMM pGMM, PGMMCHUNKFREESET pSet, RTR0MEMOBJ hMemObj, uint16_t hGVM, uint16_t fChunkFlags,
2172 PGMMCHUNK *ppChunk)
2173{
2174 Assert(pGMM->hMtxOwner != RTThreadNativeSelf());
2175 Assert(hGVM != NIL_GVM_HANDLE || pGMM->fBoundMemoryMode);
2176#ifdef GMM_WITH_LEGACY_MODE
2177 Assert(fChunkFlags == 0 || fChunkFlags == GMM_CHUNK_FLAGS_LARGE_PAGE || fChunkFlags == GMM_CHUNK_FLAGS_SEEDED);
2178#else
2179 Assert(fChunkFlags == 0 || fChunkFlags == GMM_CHUNK_FLAGS_LARGE_PAGE);
2180#endif
2181
2182#if defined(VBOX_WITH_RAM_IN_KERNEL) && !defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)
2183 /*
2184 * Get a ring-0 mapping of the object.
2185 */
2186# ifdef GMM_WITH_LEGACY_MODE
2187 uint8_t *pbMapping = !(fChunkFlags & GMM_CHUNK_FLAGS_SEEDED) ? (uint8_t *)RTR0MemObjAddress(hMemObj) : NULL;
2188# else
2189 uint8_t *pbMapping = (uint8_t *)RTR0MemObjAddress(hMemObj);
2190# endif
2191 if (!pbMapping)
2192 {
2193 RTR0MEMOBJ hMapObj;
2194 int rc = RTR0MemObjMapKernel(&hMapObj, hMemObj, (void *)-1, 0, RTMEM_PROT_READ | RTMEM_PROT_WRITE);
2195 if (RT_SUCCESS(rc))
2196 pbMapping = (uint8_t *)RTR0MemObjAddress(hMapObj);
2197 else
2198 return rc;
2199 AssertPtr(pbMapping);
2200 }
2201#endif
2202
2203 /*
2204 * Allocate a chunk.
2205 */
2206 int rc;
2207 PGMMCHUNK pChunk = (PGMMCHUNK)RTMemAllocZ(sizeof(*pChunk));
2208 if (pChunk)
2209 {
2210 /*
2211 * Initialize it.
2212 */
2213 pChunk->hMemObj = hMemObj;
2214#if defined(VBOX_WITH_RAM_IN_KERNEL) && !defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)
2215 pChunk->pbMapping = pbMapping;
2216#endif
2217 pChunk->cFree = GMM_CHUNK_NUM_PAGES;
2218 pChunk->hGVM = hGVM;
2219 /*pChunk->iFreeHead = 0;*/
2220 pChunk->idNumaNode = gmmR0GetCurrentNumaNodeId();
2221 pChunk->iChunkMtx = UINT8_MAX;
2222 pChunk->fFlags = fChunkFlags;
2223 for (unsigned iPage = 0; iPage < RT_ELEMENTS(pChunk->aPages) - 1; iPage++)
2224 {
2225 pChunk->aPages[iPage].Free.u2State = GMM_PAGE_STATE_FREE;
2226 pChunk->aPages[iPage].Free.iNext = iPage + 1;
2227 }
2228 pChunk->aPages[RT_ELEMENTS(pChunk->aPages) - 1].Free.u2State = GMM_PAGE_STATE_FREE;
2229 pChunk->aPages[RT_ELEMENTS(pChunk->aPages) - 1].Free.iNext = UINT16_MAX;
2230
2231 /*
2232 * Allocate a Chunk ID and insert it into the tree.
2233 * This has to be done behind the mutex of course.
2234 */
2235 rc = gmmR0MutexAcquire(pGMM);
2236 if (RT_SUCCESS(rc))
2237 {
2238 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2239 {
2240 pChunk->Core.Key = gmmR0AllocateChunkId(pGMM);
2241 if ( pChunk->Core.Key != NIL_GMM_CHUNKID
2242 && pChunk->Core.Key <= GMM_CHUNKID_LAST
2243 && RTAvlU32Insert(&pGMM->pChunks, &pChunk->Core))
2244 {
2245 pGMM->cChunks++;
2246 RTListAppend(&pGMM->ChunkList, &pChunk->ListNode);
2247 gmmR0LinkChunk(pChunk, pSet);
2248 LogFlow(("gmmR0RegisterChunk: pChunk=%p id=%#x cChunks=%d\n", pChunk, pChunk->Core.Key, pGMM->cChunks));
2249
2250 if (ppChunk)
2251 *ppChunk = pChunk;
2252 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
2253 return VINF_SUCCESS;
2254 }
2255
2256 /* bail out */
2257 rc = VERR_GMM_CHUNK_INSERT;
2258 }
2259 else
2260 rc = VERR_GMM_IS_NOT_SANE;
2261 gmmR0MutexRelease(pGMM);
2262 }
2263
2264 RTMemFree(pChunk);
2265 }
2266 else
2267 rc = VERR_NO_MEMORY;
2268 return rc;
2269}
2270
2271
2272/**
2273 * Allocates a new chunk, immediately picks the requested pages from it, and adds
2274 * what's remaining to the specified free set.
2275 *
2276 * @note This will leave the giant mutex while allocating the new chunk!
2277 *
2278 * @returns VBox status code.
2279 * @param pGMM Pointer to the GMM instance data.
2280 * @param pGVM Pointer to the kernel-only VM instance data.
2281 * @param pSet Pointer to the free set.
2282 * @param cPages The number of pages requested.
2283 * @param paPages The page descriptor table (input + output).
2284 * @param piPage The pointer to the page descriptor table index variable.
2285 * This will be updated.
2286 */
2287static int gmmR0AllocateChunkNew(PGMM pGMM, PGVM pGVM, PGMMCHUNKFREESET pSet, uint32_t cPages,
2288 PGMMPAGEDESC paPages, uint32_t *piPage)
2289{
2290 gmmR0MutexRelease(pGMM);
2291
2292 RTR0MEMOBJ hMemObj;
2293#ifndef GMM_WITH_LEGACY_MODE
2294 int rc;
2295# ifdef VBOX_WITH_LINEAR_HOST_PHYS_MEM
2296 if (pGMM->fHasWorkingAllocPhysNC)
2297 rc = RTR0MemObjAllocPhysNC(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS);
2298 else
2299# endif
2300 rc = RTR0MemObjAllocPage(&hMemObj, GMM_CHUNK_SIZE, false /*fExecutable*/);
2301#else
2302 int rc = RTR0MemObjAllocPhysNC(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS);
2303#endif
2304 if (RT_SUCCESS(rc))
2305 {
2306 /** @todo Duplicate gmmR0RegisterChunk here so we can avoid chaining up the
2307 * free pages first and then unchaining them right afterwards. Instead
2308 * do as much work as possible without holding the giant lock. */
2309 PGMMCHUNK pChunk;
2310 rc = gmmR0RegisterChunk(pGMM, pSet, hMemObj, pGVM->hSelf, 0 /*fChunkFlags*/, &pChunk);
2311 if (RT_SUCCESS(rc))
2312 {
2313 *piPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, *piPage, cPages, paPages);
2314 return VINF_SUCCESS;
2315 }
2316
2317 /* bail out */
2318 RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
2319 }
2320
2321 int rc2 = gmmR0MutexAcquire(pGMM);
2322 AssertRCReturn(rc2, RT_FAILURE(rc) ? rc : rc2);
2323 return rc;
2324
2325}
2326
2327
2328/**
2329 * As a last resort we'll pick any page we can get.
2330 *
2331 * @returns The new page descriptor table index.
2332 * @param pSet The set to pick from.
2333 * @param pGVM Pointer to the global VM structure.
2334 * @param iPage The current page descriptor table index.
2335 * @param cPages The total number of pages to allocate.
2336 * @param paPages The page descriptor table (input + output).
2337 */
2338static uint32_t gmmR0AllocatePagesIndiscriminately(PGMMCHUNKFREESET pSet, PGVM pGVM,
2339 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2340{
2341 unsigned iList = RT_ELEMENTS(pSet->apLists);
2342 while (iList-- > 0)
2343 {
2344 PGMMCHUNK pChunk = pSet->apLists[iList];
2345 while (pChunk)
2346 {
2347 PGMMCHUNK pNext = pChunk->pFreeNext;
2348
2349 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2350 if (iPage >= cPages)
2351 return iPage;
2352
2353 pChunk = pNext;
2354 }
2355 }
2356 return iPage;
2357}
2358
2359
2360/**
2361 * Pick pages from empty chunks on the same NUMA node.
2362 *
2363 * @returns The new page descriptor table index.
2364 * @param pSet The set to pick from.
2365 * @param pGVM Pointer to the global VM structure.
2366 * @param iPage The current page descriptor table index.
2367 * @param cPages The total number of pages to allocate.
2368 * @param paPages The page descriptor table (input + output).
2369 */
2370static uint32_t gmmR0AllocatePagesFromEmptyChunksOnSameNode(PGMMCHUNKFREESET pSet, PGVM pGVM,
2371 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2372{
2373 PGMMCHUNK pChunk = pSet->apLists[GMM_CHUNK_FREE_SET_UNUSED_LIST];
2374 if (pChunk)
2375 {
2376 uint16_t const idNumaNode = gmmR0GetCurrentNumaNodeId();
2377 while (pChunk)
2378 {
2379 PGMMCHUNK pNext = pChunk->pFreeNext;
2380
2381 if (pChunk->idNumaNode == idNumaNode)
2382 {
2383 pChunk->hGVM = pGVM->hSelf;
2384 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2385 if (iPage >= cPages)
2386 {
2387 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2388 return iPage;
2389 }
2390 }
2391
2392 pChunk = pNext;
2393 }
2394 }
2395 return iPage;
2396}
2397
2398
2399/**
2400 * Pick pages from non-empty chunks on the same NUMA node.
2401 *
2402 * @returns The new page descriptor table index.
2403 * @param pSet The set to pick from.
2404 * @param pGVM Pointer to the global VM structure.
2405 * @param iPage The current page descriptor table index.
2406 * @param cPages The total number of pages to allocate.
2407 * @param paPages The page descriptor table (input + output).
2408 */
2409static uint32_t gmmR0AllocatePagesFromSameNode(PGMMCHUNKFREESET pSet, PGVM pGVM,
2410 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2411{
2412 /** @todo start by picking from chunks with about the right size first? */
2413 uint16_t const idNumaNode = gmmR0GetCurrentNumaNodeId();
2414 unsigned iList = GMM_CHUNK_FREE_SET_UNUSED_LIST;
2415 while (iList-- > 0)
2416 {
2417 PGMMCHUNK pChunk = pSet->apLists[iList];
2418 while (pChunk)
2419 {
2420 PGMMCHUNK pNext = pChunk->pFreeNext;
2421
2422 if (pChunk->idNumaNode == idNumaNode)
2423 {
2424 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2425 if (iPage >= cPages)
2426 {
2427 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2428 return iPage;
2429 }
2430 }
2431
2432 pChunk = pNext;
2433 }
2434 }
2435 return iPage;
2436}
2437
2438
2439/**
2440 * Pick pages that are in chunks already associated with the VM.
2441 *
2442 * @returns The new page descriptor table index.
2443 * @param pGMM Pointer to the GMM instance data.
2444 * @param pGVM Pointer to the global VM structure.
2445 * @param pSet The set to pick from.
2446 * @param iPage The current page descriptor table index.
2447 * @param cPages The total number of pages to allocate.
2448 * @param paPages The page descriptor table (input + output).
2449 */
2450static uint32_t gmmR0AllocatePagesAssociatedWithVM(PGMM pGMM, PGVM pGVM, PGMMCHUNKFREESET pSet,
2451 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2452{
2453 uint16_t const hGVM = pGVM->hSelf;
2454
2455 /* Hint. */
2456 if (pGVM->gmm.s.idLastChunkHint != NIL_GMM_CHUNKID)
2457 {
2458 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pGVM->gmm.s.idLastChunkHint);
2459 if (pChunk && pChunk->cFree)
2460 {
2461 iPage = gmmR0AllocatePagesFromChunk(pChunk, hGVM, iPage, cPages, paPages);
2462 if (iPage >= cPages)
2463 return iPage;
2464 }
2465 }
2466
2467 /* Scan. */
2468 for (unsigned iList = 0; iList < RT_ELEMENTS(pSet->apLists); iList++)
2469 {
2470 PGMMCHUNK pChunk = pSet->apLists[iList];
2471 while (pChunk)
2472 {
2473 PGMMCHUNK pNext = pChunk->pFreeNext;
2474
2475 if (pChunk->hGVM == hGVM)
2476 {
2477 iPage = gmmR0AllocatePagesFromChunk(pChunk, hGVM, iPage, cPages, paPages);
2478 if (iPage >= cPages)
2479 {
2480 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2481 return iPage;
2482 }
2483 }
2484
2485 pChunk = pNext;
2486 }
2487 }
2488 return iPage;
2489}
2490
2491
2492
2493/**
2494 * Pick pages in bound memory mode.
2495 *
2496 * @returns The new page descriptor table index.
2497 * @param pGVM Pointer to the global VM structure.
2498 * @param iPage The current page descriptor table index.
2499 * @param cPages The total number of pages to allocate.
2500 * @param paPages The page descriptor table (input + output).
2501 */
2502static uint32_t gmmR0AllocatePagesInBoundMode(PGVM pGVM, uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2503{
2504 for (unsigned iList = 0; iList < RT_ELEMENTS(pGVM->gmm.s.Private.apLists); iList++)
2505 {
2506 PGMMCHUNK pChunk = pGVM->gmm.s.Private.apLists[iList];
2507 while (pChunk)
2508 {
2509 Assert(pChunk->hGVM == pGVM->hSelf);
2510 PGMMCHUNK pNext = pChunk->pFreeNext;
2511 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2512 if (iPage >= cPages)
2513 return iPage;
2514 pChunk = pNext;
2515 }
2516 }
2517 return iPage;
2518}
2519
2520
2521/**
2522 * Checks if we should start picking pages from chunks of other VMs because
2523 * we're getting close to the system memory or reserved limit.
2524 *
2525 * @returns @c true if we should, @c false if we should first try allocating more
2526 * chunks.
2527 */
2528static bool gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLimits(PGVM pGVM)
2529{
2530 /*
2531 * Don't allocate a new chunk if we're within a few chunks' worth of pages of the reservation limit.
2532 */
2533 uint64_t cPgReserved = pGVM->gmm.s.Stats.Reserved.cBasePages
2534 + pGVM->gmm.s.Stats.Reserved.cFixedPages
2535 - pGVM->gmm.s.Stats.cBalloonedPages
2536 /** @todo what about shared pages? */;
2537 uint64_t cPgAllocated = pGVM->gmm.s.Stats.Allocated.cBasePages
2538 + pGVM->gmm.s.Stats.Allocated.cFixedPages;
2539 uint64_t cPgDelta = cPgReserved - cPgAllocated;
2540 if (cPgDelta < GMM_CHUNK_NUM_PAGES * 4)
2541 return true;
2542 /** @todo make the threshold configurable, also test the code to see if
2543 * this ever kicks in (we might be reserving too much or something). */
2544
2545 /*
2546 * Check how close we are to the max memory limit and how many fragments
2547 * there are?...
2548 */
2549 /** @todo */
2550
2551 return false;
2552}
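
/*
 * Editor's worked example (not part of the upstream file), assuming the 2 MB
 * chunk size this file asserts elsewhere, i.e. 512 pages per chunk: a VM with
 * 0x40000 base + 0x2000 fixed pages reserved, nothing ballooned, and 0x41900
 * pages already allocated has cPgDelta = 0x700 = 1792 pages of head room.
 * That is below the 4-chunk threshold (4 * 512 = 2048 pages), so the check
 * above returns true and the caller starts raiding other VMs' chunks instead
 * of allocating a fresh chunk for the last few pages.
 */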
2553
2554
2555/**
2556 * Checks if we should start picking pages from chunks of other VMs because
2557 * there are a lot of free pages around.
2558 *
2559 * @returns @c true if we should, @c false if we should first try allocating more
2560 * chunks.
2561 */
2562static bool gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLotsFree(PGMM pGMM)
2563{
2564 /*
2565 * Setting the limit at 16 chunks (32 MB) at the moment.
2566 */
2567 if (pGMM->PrivateX.cFreePages >= GMM_CHUNK_NUM_PAGES * 16)
2568 return true;
2569 return false;
2570}
2571
2572
2573/**
2574 * Common worker for GMMR0AllocateHandyPages and GMMR0AllocatePages.
2575 *
2576 * @returns VBox status code:
2577 * @retval VINF_SUCCESS on success.
2578 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is
2579 * necessary (only returned in legacy allocation mode).
2580 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2581 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2582 * that is we're trying to allocate more than we've reserved.
2583 *
2584 * @param pGMM Pointer to the GMM instance data.
2585 * @param pGVM Pointer to the VM.
2586 * @param cPages The number of pages to allocate.
2587 * @param paPages Pointer to the page descriptors. See GMMPAGEDESC for
2588 * details on what is expected on input.
2589 * @param enmAccount The account to charge.
2590 *
2591 * @remarks Caller must own the giant GMM lock.
2592 */
2593static int gmmR0AllocatePagesNew(PGMM pGMM, PGVM pGVM, uint32_t cPages, PGMMPAGEDESC paPages, GMMACCOUNT enmAccount)
2594{
2595 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
2596
2597 /*
2598 * Check allocation limits.
2599 */
2600 if (RT_UNLIKELY(pGMM->cAllocatedPages + cPages > pGMM->cMaxPages))
2601 return VERR_GMM_HIT_GLOBAL_LIMIT;
2602
2603 switch (enmAccount)
2604 {
2605 case GMMACCOUNT_BASE:
2606 if (RT_UNLIKELY( pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cPages
2607 > pGVM->gmm.s.Stats.Reserved.cBasePages))
2608 {
2609 Log(("gmmR0AllocatePages:Base: Reserved=%#llx Allocated+Ballooned+Requested=%#llx+%#llx+%#x!\n",
2610 pGVM->gmm.s.Stats.Reserved.cBasePages, pGVM->gmm.s.Stats.Allocated.cBasePages,
2611 pGVM->gmm.s.Stats.cBalloonedPages, cPages));
2612 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2613 }
2614 break;
2615 case GMMACCOUNT_SHADOW:
2616 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cShadowPages + cPages > pGVM->gmm.s.Stats.Reserved.cShadowPages))
2617 {
2618 Log(("gmmR0AllocatePages:Shadow: Reserved=%#x Allocated+Requested=%#x+%#x!\n",
2619 pGVM->gmm.s.Stats.Reserved.cShadowPages, pGVM->gmm.s.Stats.Allocated.cShadowPages, cPages));
2620 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2621 }
2622 break;
2623 case GMMACCOUNT_FIXED:
2624 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cFixedPages + cPages > pGVM->gmm.s.Stats.Reserved.cFixedPages))
2625 {
2626 Log(("gmmR0AllocatePages:Fixed: Reserved=%#x Allocated+Requested=%#x+%#x!\n",
2627 pGVM->gmm.s.Stats.Reserved.cFixedPages, pGVM->gmm.s.Stats.Allocated.cFixedPages, cPages));
2628 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2629 }
2630 break;
2631 default:
2632 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2633 }
2634
2635#ifdef GMM_WITH_LEGACY_MODE
2636 /*
2637 * If we're in legacy memory mode, it's easy to figure if we have
2638 * sufficient number of pages up-front.
2639 */
2640 if ( pGMM->fLegacyAllocationMode
2641 && pGVM->gmm.s.Private.cFreePages < cPages)
2642 {
2643 Assert(pGMM->fBoundMemoryMode);
2644 return VERR_GMM_SEED_ME;
2645 }
2646#endif
2647
2648 /*
2649 * Update the accounts before we proceed because we might be leaving the
2650 * protection of the global mutex and thus run the risk of permitting
2651 * too much memory to be allocated.
2652 */
2653 switch (enmAccount)
2654 {
2655 case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages += cPages; break;
2656 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages += cPages; break;
2657 case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages += cPages; break;
2658 default: AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2659 }
2660 pGVM->gmm.s.Stats.cPrivatePages += cPages;
2661 pGMM->cAllocatedPages += cPages;
2662
2663#ifdef GMM_WITH_LEGACY_MODE
2664 /*
2665 * Part two of it's-easy-in-legacy-memory-mode.
2666 */
2667 if (pGMM->fLegacyAllocationMode)
2668 {
2669 uint32_t iPage = gmmR0AllocatePagesInBoundMode(pGVM, 0, cPages, paPages);
2670 AssertReleaseReturn(iPage == cPages, VERR_GMM_ALLOC_PAGES_IPE);
2671 return VINF_SUCCESS;
2672 }
2673#endif
2674
2675 /*
2676 * Bound mode is also relatively straightforward.
2677 */
2678 uint32_t iPage = 0;
2679 int rc = VINF_SUCCESS;
2680 if (pGMM->fBoundMemoryMode)
2681 {
2682 iPage = gmmR0AllocatePagesInBoundMode(pGVM, iPage, cPages, paPages);
2683 if (iPage < cPages)
2684 do
2685 rc = gmmR0AllocateChunkNew(pGMM, pGVM, &pGVM->gmm.s.Private, cPages, paPages, &iPage);
2686 while (iPage < cPages && RT_SUCCESS(rc));
2687 }
2688 /*
2689 * Shared mode is trickier as we should try to achieve the same locality as
2690 * in bound mode, but smartly make use of non-full chunks allocated by
2691 * other VMs if we're low on memory.
2692 */
2693 else
2694 {
2695 /* Pick the most optimal pages first. */
2696 iPage = gmmR0AllocatePagesAssociatedWithVM(pGMM, pGVM, &pGMM->PrivateX, iPage, cPages, paPages);
2697 if (iPage < cPages)
2698 {
2699 /* Maybe we should try getting pages from chunks "belonging" to
2700 other VMs before allocating more chunks? */
2701 bool fTriedOnSameAlready = false;
2702 if (gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLimits(pGVM))
2703 {
2704 iPage = gmmR0AllocatePagesFromSameNode(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2705 fTriedOnSameAlready = true;
2706 }
2707
2708 /* Allocate memory from empty chunks. */
2709 if (iPage < cPages)
2710 iPage = gmmR0AllocatePagesFromEmptyChunksOnSameNode(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2711
2712 /* Grab empty shared chunks. */
2713 if (iPage < cPages)
2714 iPage = gmmR0AllocatePagesFromEmptyChunksOnSameNode(&pGMM->Shared, pGVM, iPage, cPages, paPages);
2715
2716 /* If there are a lot of free pages spread around, try not to waste
2717 system memory on more chunks. (Should trigger defragmentation.) */
2718 if ( !fTriedOnSameAlready
2719 && gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLotsFree(pGMM))
2720 {
2721 iPage = gmmR0AllocatePagesFromSameNode(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2722 if (iPage < cPages)
2723 iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2724 }
2725
2726 /*
2727 * Ok, try allocate new chunks.
2728 */
2729 if (iPage < cPages)
2730 {
2731 do
2732 rc = gmmR0AllocateChunkNew(pGMM, pGVM, &pGMM->PrivateX, cPages, paPages, &iPage);
2733 while (iPage < cPages && RT_SUCCESS(rc));
2734
2735 /* If the host is out of memory, take whatever we can get. */
2736 if ( (rc == VERR_NO_MEMORY || rc == VERR_NO_PHYS_MEMORY)
2737 && pGMM->PrivateX.cFreePages + pGMM->Shared.cFreePages >= cPages - iPage)
2738 {
2739 iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2740 if (iPage < cPages)
2741 iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->Shared, pGVM, iPage, cPages, paPages);
2742 AssertRelease(iPage == cPages);
2743 rc = VINF_SUCCESS;
2744 }
2745 }
2746 }
2747 }
2748
2749 /*
2750 * Clean up on failure. Since this is bound to be a low-memory condition
2751 * we will give back any empty chunks that might be hanging around.
2752 */
2753 if (RT_FAILURE(rc))
2754 {
2755 /* Update the statistics. */
2756 pGVM->gmm.s.Stats.cPrivatePages -= cPages;
2757 pGMM->cAllocatedPages -= cPages - iPage;
2758 switch (enmAccount)
2759 {
2760 case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages -= cPages; break;
2761 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages -= cPages; break;
2762 case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages -= cPages; break;
2763 default: AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2764 }
2765
2766 /* Release the pages. */
2767 while (iPage-- > 0)
2768 {
2769 uint32_t idPage = paPages[iPage].idPage;
2770 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
2771 if (RT_LIKELY(pPage))
2772 {
2773 Assert(GMM_PAGE_IS_PRIVATE(pPage));
2774 Assert(pPage->Private.hGVM == pGVM->hSelf);
2775 gmmR0FreePrivatePage(pGMM, pGVM, idPage, pPage);
2776 }
2777 else
2778 AssertMsgFailed(("idPage=%#x\n", idPage));
2779
2780 paPages[iPage].idPage = NIL_GMM_PAGEID;
2781 paPages[iPage].idSharedPage = NIL_GMM_PAGEID;
2782 paPages[iPage].HCPhysGCPhys = NIL_RTHCPHYS;
2783 }
2784
2785 /* Free empty chunks. */
2786 /** @todo */
2787
2788 /* return the fail status on failure */
2789 return rc;
2790 }
2791 return VINF_SUCCESS;
2792}
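
/*
 * Editor's summary (not part of the upstream file) of the fallback order the
 * shared-mode path in gmmR0AllocatePagesNew above works through:
 *   1. chunks already associated with this VM (last-chunk hint, then a scan),
 *   2. if we are close to the reservation limit, non-empty chunks of other
 *      VMs on the same NUMA node,
 *   3. empty private chunks on the same NUMA node, then empty shared chunks,
 *   4. if lots of pages are free overall, same-node and then indiscriminate
 *      picking rather than growing the set,
 *   5. otherwise brand new chunks, and
 *   6. as a last resort when the host is out of memory, any free page in the
 *      private or shared set.
 */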
2793
2794
2795/**
2796 * Updates the previous allocations and allocates more pages.
2797 *
2798 * The handy pages are always taken from the 'base' memory account.
2799 * The allocated pages are not cleared and will contain random garbage.
2800 *
2801 * @returns VBox status code:
2802 * @retval VINF_SUCCESS on success.
2803 * @retval VERR_NOT_OWNER if the caller is not an EMT.
2804 * @retval VERR_GMM_PAGE_NOT_FOUND if one of the pages to update wasn't found.
2805 * @retval VERR_GMM_PAGE_NOT_PRIVATE if one of the pages to update wasn't a
2806 * private page.
2807 * @retval VERR_GMM_PAGE_NOT_SHARED if one of the pages to update wasn't a
2808 * shared page.
2809 * @retval VERR_GMM_NOT_PAGE_OWNER if one of the pages to be updated wasn't
2810 * owned by the VM.
2811 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is necessary.
2812 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2813 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2814 * that is we're trying to allocate more than we've reserved.
2815 *
2816 * @param pGVM The global (ring-0) VM structure.
2817 * @param idCpu The VCPU id.
2818 * @param cPagesToUpdate The number of pages to update (starting from the head).
2819 * @param cPagesToAlloc The number of pages to allocate (starting from the head).
2820 * @param paPages The array of page descriptors.
2821 * See GMMPAGEDESC for details on what is expected on input.
2822 * @thread EMT(idCpu)
2823 */
2824GMMR0DECL(int) GMMR0AllocateHandyPages(PGVM pGVM, VMCPUID idCpu, uint32_t cPagesToUpdate,
2825 uint32_t cPagesToAlloc, PGMMPAGEDESC paPages)
2826{
2827 LogFlow(("GMMR0AllocateHandyPages: pGVM=%p cPagesToUpdate=%#x cPagesToAlloc=%#x paPages=%p\n",
2828 pGVM, cPagesToUpdate, cPagesToAlloc, paPages));
2829
2830 /*
2831 * Validate, get basics and take the semaphore.
2832 * (This is a relatively busy path, so make predictions where possible.)
2833 */
2834 PGMM pGMM;
2835 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
2836 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
2837 if (RT_FAILURE(rc))
2838 return rc;
2839
2840 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
2841 AssertMsgReturn( (cPagesToUpdate && cPagesToUpdate < 1024)
2842 || (cPagesToAlloc && cPagesToAlloc < 1024),
2843 ("cPagesToUpdate=%#x cPagesToAlloc=%#x\n", cPagesToUpdate, cPagesToAlloc),
2844 VERR_INVALID_PARAMETER);
2845
2846 unsigned iPage = 0;
2847 for (; iPage < cPagesToUpdate; iPage++)
2848 {
2849 AssertMsgReturn( ( paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST
2850 && !(paPages[iPage].HCPhysGCPhys & PAGE_OFFSET_MASK))
2851 || paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS
2852 || paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE,
2853 ("#%#x: %RHp\n", iPage, paPages[iPage].HCPhysGCPhys),
2854 VERR_INVALID_PARAMETER);
2855 AssertMsgReturn( paPages[iPage].idPage <= GMM_PAGEID_LAST
2856 /*|| paPages[iPage].idPage == NIL_GMM_PAGEID*/,
2857 ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2858 AssertMsgReturn( paPages[iPage].idSharedPage <= GMM_PAGEID_LAST
2859 /*|| paPages[iPage].idSharedPage == NIL_GMM_PAGEID*/,
2860 ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2861 }
2862
2863 for (; iPage < cPagesToAlloc; iPage++)
2864 {
2865 AssertMsgReturn(paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS, ("#%#x: %RHp\n", iPage, paPages[iPage].HCPhysGCPhys), VERR_INVALID_PARAMETER);
2866 AssertMsgReturn(paPages[iPage].idPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2867 AssertMsgReturn(paPages[iPage].idSharedPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2868 }
2869
2870 gmmR0MutexAcquire(pGMM);
2871 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2872 {
2873 /* No allocations before the initial reservation has been made! */
2874 if (RT_LIKELY( pGVM->gmm.s.Stats.Reserved.cBasePages
2875 && pGVM->gmm.s.Stats.Reserved.cFixedPages
2876 && pGVM->gmm.s.Stats.Reserved.cShadowPages))
2877 {
2878 /*
2879 * Perform the updates.
2880 * Stop on the first error.
2881 */
2882 for (iPage = 0; iPage < cPagesToUpdate; iPage++)
2883 {
2884 if (paPages[iPage].idPage != NIL_GMM_PAGEID)
2885 {
2886 PGMMPAGE pPage = gmmR0GetPage(pGMM, paPages[iPage].idPage);
2887 if (RT_LIKELY(pPage))
2888 {
2889 if (RT_LIKELY(GMM_PAGE_IS_PRIVATE(pPage)))
2890 {
2891 if (RT_LIKELY(pPage->Private.hGVM == pGVM->hSelf))
2892 {
2893 AssertCompile(NIL_RTHCPHYS > GMM_GCPHYS_LAST && GMM_GCPHYS_UNSHAREABLE > GMM_GCPHYS_LAST);
2894 if (RT_LIKELY(paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST))
2895 pPage->Private.pfn = paPages[iPage].HCPhysGCPhys >> PAGE_SHIFT;
2896 else if (paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE)
2897 pPage->Private.pfn = GMM_PAGE_PFN_UNSHAREABLE;
2898 /* else: NIL_RTHCPHYS nothing */
2899
2900 paPages[iPage].idPage = NIL_GMM_PAGEID;
2901 paPages[iPage].HCPhysGCPhys = NIL_RTHCPHYS;
2902 }
2903 else
2904 {
2905 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not owner! hGVM=%#x hSelf=%#x\n",
2906 iPage, paPages[iPage].idPage, pPage->Private.hGVM, pGVM->hSelf));
2907 rc = VERR_GMM_NOT_PAGE_OWNER;
2908 break;
2909 }
2910 }
2911 else
2912 {
2913 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not private! %.*Rhxs (type %d)\n", iPage, paPages[iPage].idPage, sizeof(*pPage), pPage, pPage->Common.u2State));
2914 rc = VERR_GMM_PAGE_NOT_PRIVATE;
2915 break;
2916 }
2917 }
2918 else
2919 {
2920 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not found! (private)\n", iPage, paPages[iPage].idPage));
2921 rc = VERR_GMM_PAGE_NOT_FOUND;
2922 break;
2923 }
2924 }
2925
2926 if (paPages[iPage].idSharedPage != NIL_GMM_PAGEID)
2927 {
2928 PGMMPAGE pPage = gmmR0GetPage(pGMM, paPages[iPage].idSharedPage);
2929 if (RT_LIKELY(pPage))
2930 {
2931 if (RT_LIKELY(GMM_PAGE_IS_SHARED(pPage)))
2932 {
2933 AssertCompile(NIL_RTHCPHYS > GMM_GCPHYS_LAST && GMM_GCPHYS_UNSHAREABLE > GMM_GCPHYS_LAST);
2934 Assert(pPage->Shared.cRefs);
2935 Assert(pGVM->gmm.s.Stats.cSharedPages);
2936 Assert(pGVM->gmm.s.Stats.Allocated.cBasePages);
2937
2938 Log(("GMMR0AllocateHandyPages: free shared page %x cRefs=%d\n", paPages[iPage].idSharedPage, pPage->Shared.cRefs));
2939 pGVM->gmm.s.Stats.cSharedPages--;
2940 pGVM->gmm.s.Stats.Allocated.cBasePages--;
2941 if (!--pPage->Shared.cRefs)
2942 gmmR0FreeSharedPage(pGMM, pGVM, paPages[iPage].idSharedPage, pPage);
2943 else
2944 {
2945 Assert(pGMM->cDuplicatePages);
2946 pGMM->cDuplicatePages--;
2947 }
2948
2949 paPages[iPage].idSharedPage = NIL_GMM_PAGEID;
2950 }
2951 else
2952 {
2953 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not shared!\n", iPage, paPages[iPage].idSharedPage));
2954 rc = VERR_GMM_PAGE_NOT_SHARED;
2955 break;
2956 }
2957 }
2958 else
2959 {
2960 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not found! (shared)\n", iPage, paPages[iPage].idSharedPage));
2961 rc = VERR_GMM_PAGE_NOT_FOUND;
2962 break;
2963 }
2964 }
2965 } /* for each page to update */
2966
2967 if (RT_SUCCESS(rc) && cPagesToAlloc > 0)
2968 {
2969#if defined(VBOX_STRICT) && 0 /** @todo re-test this later. Appeared to be a PGM init bug. */
2970 for (iPage = 0; iPage < cPagesToAlloc; iPage++)
2971 {
2972 Assert(paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS);
2973 Assert(paPages[iPage].idPage == NIL_GMM_PAGEID);
2974 Assert(paPages[iPage].idSharedPage == NIL_GMM_PAGEID);
2975 }
2976#endif
2977
2978 /*
2979 * Join paths with GMMR0AllocatePages for the allocation.
2980 * Note! gmmR0AllocateChunkNew may leave the protection of the mutex!
2981 */
2982 rc = gmmR0AllocatePagesNew(pGMM, pGVM, cPagesToAlloc, paPages, GMMACCOUNT_BASE);
2983 }
2984 }
2985 else
2986 rc = VERR_WRONG_ORDER;
2987 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
2988 }
2989 else
2990 rc = VERR_GMM_IS_NOT_SANE;
2991 gmmR0MutexRelease(pGMM);
2992 LogFlow(("GMMR0AllocateHandyPages: returns %Rrc\n", rc));
2993 return rc;
2994}
2995
2996
2997/**
2998 * Allocate one or more pages.
2999 *
3000 * This is typically used for ROMs and MMIO2 (VRAM) during VM creation.
3001 * The allocated pages are not cleared and will contain random garbage.
3002 *
3003 * @returns VBox status code:
3004 * @retval VINF_SUCCESS on success.
3005 * @retval VERR_NOT_OWNER if the caller is not an EMT.
3006 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is necessary.
3007 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
3008 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
3009 * that is we're trying to allocate more than we've reserved.
3010 *
3011 * @param pGVM The global (ring-0) VM structure.
3012 * @param idCpu The VCPU id.
3013 * @param cPages The number of pages to allocate.
3014 * @param paPages Pointer to the page descriptors.
3015 * See GMMPAGEDESC for details on what is expected on
3016 * input.
3017 * @param enmAccount The account to charge.
3018 *
3019 * @thread EMT.
3020 */
3021GMMR0DECL(int) GMMR0AllocatePages(PGVM pGVM, VMCPUID idCpu, uint32_t cPages, PGMMPAGEDESC paPages, GMMACCOUNT enmAccount)
3022{
3023 LogFlow(("GMMR0AllocatePages: pGVM=%p cPages=%#x paPages=%p enmAccount=%d\n", pGVM, cPages, paPages, enmAccount));
3024
3025 /*
3026 * Validate, get basics and take the semaphore.
3027 */
3028 PGMM pGMM;
3029 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3030 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3031 if (RT_FAILURE(rc))
3032 return rc;
3033
3034 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
3035 AssertMsgReturn(enmAccount > GMMACCOUNT_INVALID && enmAccount < GMMACCOUNT_END, ("%d\n", enmAccount), VERR_INVALID_PARAMETER);
3036 AssertMsgReturn(cPages > 0 && cPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cPages), VERR_INVALID_PARAMETER);
3037
3038 for (unsigned iPage = 0; iPage < cPages; iPage++)
3039 {
3040 AssertMsgReturn( paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS
3041 || paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE
3042 || ( enmAccount == GMMACCOUNT_BASE
3043 && paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST
3044 && !(paPages[iPage].HCPhysGCPhys & PAGE_OFFSET_MASK)),
3045 ("#%#x: %RHp enmAccount=%d\n", iPage, paPages[iPage].HCPhysGCPhys, enmAccount),
3046 VERR_INVALID_PARAMETER);
3047 AssertMsgReturn(paPages[iPage].idPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
3048 AssertMsgReturn(paPages[iPage].idSharedPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
3049 }
3050
3051 gmmR0MutexAcquire(pGMM);
3052 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3053 {
3054
3055 /* No allocations before the initial reservation has been made! */
3056 if (RT_LIKELY( pGVM->gmm.s.Stats.Reserved.cBasePages
3057 && pGVM->gmm.s.Stats.Reserved.cFixedPages
3058 && pGVM->gmm.s.Stats.Reserved.cShadowPages))
3059 rc = gmmR0AllocatePagesNew(pGMM, pGVM, cPages, paPages, enmAccount);
3060 else
3061 rc = VERR_WRONG_ORDER;
3062 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3063 }
3064 else
3065 rc = VERR_GMM_IS_NOT_SANE;
3066 gmmR0MutexRelease(pGMM);
3067 LogFlow(("GMMR0AllocatePages: returns %Rrc\n", rc));
3068 return rc;
3069}
3070
3071
3072/**
3073 * VMMR0 request wrapper for GMMR0AllocatePages.
3074 *
3075 * @returns see GMMR0AllocatePages.
3076 * @param pGVM The global (ring-0) VM structure.
3077 * @param idCpu The VCPU id.
3078 * @param pReq Pointer to the request packet.
3079 */
3080GMMR0DECL(int) GMMR0AllocatePagesReq(PGVM pGVM, VMCPUID idCpu, PGMMALLOCATEPAGESREQ pReq)
3081{
3082 /*
3083 * Validate input and pass it on.
3084 */
3085 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3086 AssertMsgReturn(pReq->Hdr.cbReq >= RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[0]),
3087 ("%#x < %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[0])),
3088 VERR_INVALID_PARAMETER);
3089 AssertMsgReturn(pReq->Hdr.cbReq == RT_UOFFSETOF_DYN(GMMALLOCATEPAGESREQ, aPages[pReq->cPages]),
3090 ("%#x != %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF_DYN(GMMALLOCATEPAGESREQ, aPages[pReq->cPages])),
3091 VERR_INVALID_PARAMETER);
3092
3093 return GMMR0AllocatePages(pGVM, idCpu, pReq->cPages, &pReq->aPages[0], pReq->enmAccount);
3094}
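
/*
 * Editor's illustrative sketch (not part of the upstream file): sizing and
 * filling the variable-length GMMALLOCATEPAGESREQ packet the wrapper above
 * validates.  Field names are the ones the wrapper dereferences; the header
 * magic and the ring-0 dispatch step are assumptions.
 */
#if 0 /* example only */
    uint32_t const        cPages = 32;
    uint32_t const        cbReq  = RT_UOFFSETOF_DYN(GMMALLOCATEPAGESREQ, aPages[cPages]);
    PGMMALLOCATEPAGESREQ  pReq   = (PGMMALLOCATEPAGESREQ)RTMemAllocZ(cbReq);
    if (pReq)
    {
        pReq->Hdr.u32Magic = SUPVMMR0REQHDR_MAGIC;  /* standard ring-0 request header */
        pReq->Hdr.cbReq    = cbReq;                 /* must match the RT_UOFFSETOF_DYN check above */
        pReq->cPages       = cPages;
        pReq->enmAccount   = GMMACCOUNT_BASE;
        for (uint32_t i = 0; i < cPages; i++)
        {
            pReq->aPages[i].HCPhysGCPhys = NIL_RTHCPHYS;   /* no guest address assigned yet */
            pReq->aPages[i].idPage       = NIL_GMM_PAGEID;
            pReq->aPages[i].idSharedPage = NIL_GMM_PAGEID;
        }
        /* ... handed to ring-0, ending up in GMMR0AllocatePagesReq; on success each
           entry comes back with idPage and HCPhysGCPhys filled in. */
    }
#endif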
3095
3096
3097/**
3098 * Allocate a large page to represent guest RAM.
3099 *
3100 * The allocated pages are not cleared and will contain random garbage.
3101 *
3102 * @returns VBox status code:
3103 * @retval VINF_SUCCESS on success.
3104 * @retval VERR_NOT_OWNER if the caller is not an EMT.
3105 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is necessary.
3106 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
3107 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
3108 * that is we're trying to allocate more than we've reserved.
3109 * @returns see GMMR0AllocatePages.
3110 *
3111 * @param pGVM The global (ring-0) VM structure.
3112 * @param idCpu The VCPU id.
3113 * @param cbPage Large page size.
3114 * @param pIdPage Where to return the GMM page ID of the page.
3115 * @param pHCPhys Where to return the host physical address of the page.
3116 */
3117GMMR0DECL(int) GMMR0AllocateLargePage(PGVM pGVM, VMCPUID idCpu, uint32_t cbPage, uint32_t *pIdPage, RTHCPHYS *pHCPhys)
3118{
3119 LogFlow(("GMMR0AllocateLargePage: pGVM=%p cbPage=%x\n", pGVM, cbPage));
3120
3121 AssertReturn(cbPage == GMM_CHUNK_SIZE, VERR_INVALID_PARAMETER);
3122 AssertPtrReturn(pIdPage, VERR_INVALID_PARAMETER);
3123 AssertPtrReturn(pHCPhys, VERR_INVALID_PARAMETER);
3124
3125 /*
3126 * Validate, get basics and take the semaphore.
3127 */
3128 PGMM pGMM;
3129 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3130 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3131 if (RT_FAILURE(rc))
3132 return rc;
3133
3134#ifdef GMM_WITH_LEGACY_MODE
3135 // /* Not supported in legacy mode where we allocate the memory in ring 3 and lock it in ring 0. */
3136 // if (pGMM->fLegacyAllocationMode)
3137 // return VERR_NOT_SUPPORTED;
3138#endif
3139
3140 *pHCPhys = NIL_RTHCPHYS;
3141 *pIdPage = NIL_GMM_PAGEID;
3142
3143 gmmR0MutexAcquire(pGMM);
3144 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3145 {
3146 const unsigned cPages = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
3147 if (RT_UNLIKELY( pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cPages
3148 > pGVM->gmm.s.Stats.Reserved.cBasePages))
3149 {
3150 Log(("GMMR0AllocateLargePage: Reserved=%#llx Allocated+Requested=%#llx+%#x!\n",
3151 pGVM->gmm.s.Stats.Reserved.cBasePages, pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3152 gmmR0MutexRelease(pGMM);
3153 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
3154 }
3155
3156 /*
3157 * Allocate a new large page chunk.
3158 *
3159 * Note! We leave the giant GMM lock temporarily as the allocation might
3160 * take a long time. gmmR0RegisterChunk will retake it (ugly).
3161 */
3162 AssertCompile(GMM_CHUNK_SIZE == _2M);
3163 gmmR0MutexRelease(pGMM);
3164
3165 RTR0MEMOBJ hMemObj;
3166 rc = RTR0MemObjAllocPhysEx(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS, GMM_CHUNK_SIZE);
3167 if (RT_SUCCESS(rc))
3168 {
3169 PGMMCHUNKFREESET pSet = pGMM->fBoundMemoryMode ? &pGVM->gmm.s.Private : &pGMM->PrivateX;
3170 PGMMCHUNK pChunk;
3171 rc = gmmR0RegisterChunk(pGMM, pSet, hMemObj, pGVM->hSelf, GMM_CHUNK_FLAGS_LARGE_PAGE, &pChunk);
3172 if (RT_SUCCESS(rc))
3173 {
3174 /*
3175 * Allocate all the pages in the chunk.
3176 */
3177 /* Unlink the new chunk from the free list. */
3178 gmmR0UnlinkChunk(pChunk);
3179
3180 /** @todo rewrite this to skip the looping. */
3181 /* Allocate all pages. */
3182 GMMPAGEDESC PageDesc;
3183 gmmR0AllocatePage(pChunk, pGVM->hSelf, &PageDesc);
3184
3185 /* Return the first page as we'll use the whole chunk as one big page. */
3186 *pIdPage = PageDesc.idPage;
3187 *pHCPhys = PageDesc.HCPhysGCPhys;
3188
3189 for (unsigned i = 1; i < cPages; i++)
3190 gmmR0AllocatePage(pChunk, pGVM->hSelf, &PageDesc);
3191
3192 /* Update accounting. */
3193 pGVM->gmm.s.Stats.Allocated.cBasePages += cPages;
3194 pGVM->gmm.s.Stats.cPrivatePages += cPages;
3195 pGMM->cAllocatedPages += cPages;
3196
3197 gmmR0LinkChunk(pChunk, pSet);
3198 gmmR0MutexRelease(pGMM);
3199 LogFlow(("GMMR0AllocateLargePage: returns VINF_SUCCESS\n"));
3200 return VINF_SUCCESS;
3201 }
3202 RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
3203 }
3204 }
3205 else
3206 {
3207 gmmR0MutexRelease(pGMM);
3208 rc = VERR_GMM_IS_NOT_SANE;
3209 }
3210
3211 LogFlow(("GMMR0AllocateLargePage: returns %Rrc\n", rc));
3212 return rc;
3213}
3214
3215
3216/**
3217 * Free a large page.
3218 *
3219 * @returns VBox status code:
3220 * @param pGVM The global (ring-0) VM structure.
3221 * @param idCpu The VCPU id.
3222 * @param idPage The large page id.
3223 */
3224GMMR0DECL(int) GMMR0FreeLargePage(PGVM pGVM, VMCPUID idCpu, uint32_t idPage)
3225{
3226 LogFlow(("GMMR0FreeLargePage: pGVM=%p idPage=%x\n", pGVM, idPage));
3227
3228 /*
3229 * Validate, get basics and take the semaphore.
3230 */
3231 PGMM pGMM;
3232 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3233 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3234 if (RT_FAILURE(rc))
3235 return rc;
3236
3237#ifdef GMM_WITH_LEGACY_MODE
3238 // /* Not supported in legacy mode where we allocate the memory in ring 3 and lock it in ring 0. */
3239 // if (pGMM->fLegacyAllocationMode)
3240 // return VERR_NOT_SUPPORTED;
3241#endif
3242
3243 gmmR0MutexAcquire(pGMM);
3244 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3245 {
3246 const unsigned cPages = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
3247
3248 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages < cPages))
3249 {
3250 Log(("GMMR0FreeLargePage: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3251 gmmR0MutexRelease(pGMM);
3252 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3253 }
3254
3255 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
3256 if (RT_LIKELY( pPage
3257 && GMM_PAGE_IS_PRIVATE(pPage)))
3258 {
3259 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3260 Assert(pChunk);
3261 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3262 Assert(pChunk->cPrivate > 0);
3263
3264 /* Release the memory immediately. */
3265 gmmR0FreeChunk(pGMM, NULL, pChunk, false /*fRelaxedSem*/); /** @todo this can be relaxed too! */
3266
3267 /* Update accounting. */
3268 pGVM->gmm.s.Stats.Allocated.cBasePages -= cPages;
3269 pGVM->gmm.s.Stats.cPrivatePages -= cPages;
3270 pGMM->cAllocatedPages -= cPages;
3271 }
3272 else
3273 rc = VERR_GMM_PAGE_NOT_FOUND;
3274 }
3275 else
3276 rc = VERR_GMM_IS_NOT_SANE;
3277
3278 gmmR0MutexRelease(pGMM);
3279 LogFlow(("GMMR0FreeLargePage: returns %Rrc\n", rc));
3280 return rc;
3281}
3282
3283
3284/**
3285 * VMMR0 request wrapper for GMMR0FreeLargePage.
3286 *
3287 * @returns see GMMR0FreeLargePage.
3288 * @param pGVM The global (ring-0) VM structure.
3289 * @param idCpu The VCPU id.
3290 * @param pReq Pointer to the request packet.
3291 */
3292GMMR0DECL(int) GMMR0FreeLargePageReq(PGVM pGVM, VMCPUID idCpu, PGMMFREELARGEPAGEREQ pReq)
3293{
3294 /*
3295 * Validate input and pass it on.
3296 */
3297 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3298 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMFREELARGEPAGEREQ),
3299 ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMFREELARGEPAGEREQ)),
3300 VERR_INVALID_PARAMETER);
3301
3302 return GMMR0FreeLargePage(pGVM, idCpu, pReq->idPage);
3303}
3304
3305
3306/**
3307 * Frees a chunk, giving it back to the host OS.
3308 *
3309 * @param pGMM Pointer to the GMM instance.
3310 * @param pGVM This is set when called from GMMR0CleanupVM so we can
3311 * unmap and free the chunk in one go.
3312 * @param pChunk The chunk to free.
3313 * @param fRelaxedSem Whether we can release the semaphore while doing the
3314 * freeing (@c true) or not.
3315 */
3316static bool gmmR0FreeChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem)
3317{
3318 Assert(pChunk->Core.Key != NIL_GMM_CHUNKID);
3319
3320 GMMR0CHUNKMTXSTATE MtxState;
3321 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
3322
3323 /*
3324     * Cleanup hack! Unmap the chunk from the caller's address space.
3325 * This shouldn't happen, so screw lock contention...
3326 */
3327 if ( pChunk->cMappingsX
3328#ifdef GMM_WITH_LEGACY_MODE
3329 && (!pGMM->fLegacyAllocationMode || (pChunk->fFlags & GMM_CHUNK_FLAGS_LARGE_PAGE))
3330#endif
3331 && pGVM)
3332 gmmR0UnmapChunkLocked(pGMM, pGVM, pChunk);
3333
3334 /*
3335 * If there are current mappings of the chunk, then request the
3336 * VMs to unmap them. Reposition the chunk in the free list so
3337 * it won't be a likely candidate for allocations.
3338 */
3339 if (pChunk->cMappingsX)
3340 {
3341 /** @todo R0 -> VM request */
3342 /* The chunk can be mapped by more than one VM if fBoundMemoryMode is false! */
3343 Log(("gmmR0FreeChunk: chunk still has %d mappings; don't free!\n", pChunk->cMappingsX));
3344 gmmR0ChunkMutexRelease(&MtxState, pChunk);
3345 return false;
3346 }
3347
3348
3349 /*
3350 * Save and trash the handle.
3351 */
3352 RTR0MEMOBJ const hMemObj = pChunk->hMemObj;
3353 pChunk->hMemObj = NIL_RTR0MEMOBJ;
3354
3355 /*
3356 * Unlink it from everywhere.
3357 */
3358 gmmR0UnlinkChunk(pChunk);
3359
3360 RTListNodeRemove(&pChunk->ListNode);
3361
3362 PAVLU32NODECORE pCore = RTAvlU32Remove(&pGMM->pChunks, pChunk->Core.Key);
3363 Assert(pCore == &pChunk->Core); NOREF(pCore);
3364
3365 PGMMCHUNKTLBE pTlbe = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(pChunk->Core.Key)];
3366 if (pTlbe->pChunk == pChunk)
3367 {
3368 pTlbe->idChunk = NIL_GMM_CHUNKID;
3369 pTlbe->pChunk = NULL;
3370 }
3371
3372 Assert(pGMM->cChunks > 0);
3373 pGMM->cChunks--;
3374
3375 /*
3376 * Free the Chunk ID before dropping the locks and freeing the rest.
3377 */
3378 gmmR0FreeChunkId(pGMM, pChunk->Core.Key);
3379 pChunk->Core.Key = NIL_GMM_CHUNKID;
3380
3381 pGMM->cFreedChunks++;
3382
3383 gmmR0ChunkMutexRelease(&MtxState, NULL);
3384 if (fRelaxedSem)
3385 gmmR0MutexRelease(pGMM);
3386
3387 RTMemFree(pChunk->paMappingsX);
3388 pChunk->paMappingsX = NULL;
3389
3390 RTMemFree(pChunk);
3391
3392#if defined(VBOX_WITH_RAM_IN_KERNEL) && !defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)
3393 int rc = RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
3394#else
3395 int rc = RTR0MemObjFree(hMemObj, false /* fFreeMappings */);
3396#endif
3397 AssertLogRelRC(rc);
3398
3399 if (fRelaxedSem)
3400 gmmR0MutexAcquire(pGMM);
3401 return fRelaxedSem;
3402}
3403
3404
3405/**
3406 * Free page worker.
3407 *
3408 * The caller does all the statistic decrementing, we do all the incrementing.
3409 *
3410 * @param pGMM Pointer to the GMM instance data.
3411 * @param pGVM Pointer to the GVM instance.
3412 * @param pChunk Pointer to the chunk this page belongs to.
3413 * @param idPage The Page ID.
3414 * @param pPage Pointer to the page.
3415 */
3416static void gmmR0FreePageWorker(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, uint32_t idPage, PGMMPAGE pPage)
3417{
3418 Log3(("F pPage=%p iPage=%#x/%#x u2State=%d iFreeHead=%#x\n",
3419 pPage, pPage - &pChunk->aPages[0], idPage, pPage->Common.u2State, pChunk->iFreeHead)); NOREF(idPage);
3420
3421 /*
3422 * Put the page on the free list.
3423 */
3424 pPage->u = 0;
3425 pPage->Free.u2State = GMM_PAGE_STATE_FREE;
3426 Assert(pChunk->iFreeHead < RT_ELEMENTS(pChunk->aPages) || pChunk->iFreeHead == UINT16_MAX);
3427 pPage->Free.iNext = pChunk->iFreeHead;
3428 pChunk->iFreeHead = pPage - &pChunk->aPages[0];
3429
3430 /*
3431 * Update statistics (the cShared/cPrivate stats are up to date already),
3432 * and relink the chunk if necessary.
3433 */
3434 unsigned const cFree = pChunk->cFree;
3435 if ( !cFree
3436 || gmmR0SelectFreeSetList(cFree) != gmmR0SelectFreeSetList(cFree + 1))
3437 {
3438 gmmR0UnlinkChunk(pChunk);
3439 pChunk->cFree++;
3440 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
3441 }
3442 else
3443 {
3444 pChunk->cFree = cFree + 1;
3445 pChunk->pSet->cFreePages++;
3446 }
3447
3448 /*
3449 * If the chunk becomes empty, consider giving memory back to the host OS.
3450 *
3451     * The current strategy is to try to give it back if there are other chunks
3452 * in this free list, meaning if there are at least 240 free pages in this
3453 * category. Note that since there are probably mappings of the chunk,
3454 * it won't be freed up instantly, which probably screws up this logic
3455 * a bit...
3456 */
3457 /** @todo Do this on the way out. */
3458 if (RT_LIKELY( pChunk->cFree != GMM_CHUNK_NUM_PAGES
3459 || pChunk->pFreeNext == NULL
3460 || pChunk->pFreePrev == NULL /** @todo this is probably misfiring, see reset... */))
3461 { /* likely */ }
3462#ifdef GMM_WITH_LEGACY_MODE
3463 else if (RT_LIKELY(pGMM->fLegacyAllocationMode && !(pChunk->fFlags & GMM_CHUNK_FLAGS_LARGE_PAGE)))
3464 { /* likely */ }
3465#endif
3466 else
3467 gmmR0FreeChunk(pGMM, NULL, pChunk, false);
3468
3469}
3470
3471
3472/**
3473 * Frees a shared page, the page is known to exist and be valid and such.
3474 *
3475 * @param pGMM Pointer to the GMM instance.
3476 * @param pGVM Pointer to the GVM instance.
3477 * @param idPage The page id.
3478 * @param pPage The page structure.
3479 */
3480DECLINLINE(void) gmmR0FreeSharedPage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage)
3481{
3482 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3483 Assert(pChunk);
3484 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3485 Assert(pChunk->cShared > 0);
3486 Assert(pGMM->cSharedPages > 0);
3487 Assert(pGMM->cAllocatedPages > 0);
3488 Assert(!pPage->Shared.cRefs);
3489
3490 pChunk->cShared--;
3491 pGMM->cAllocatedPages--;
3492 pGMM->cSharedPages--;
3493 gmmR0FreePageWorker(pGMM, pGVM, pChunk, idPage, pPage);
3494}
3495
3496
3497/**
3498 * Frees a private page, the page is known to exist and be valid and such.
3499 *
3500 * @param pGMM Pointer to the GMM instance.
3501 * @param pGVM Pointer to the GVM instance.
3502 * @param idPage The page id.
3503 * @param pPage The page structure.
3504 */
3505DECLINLINE(void) gmmR0FreePrivatePage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage)
3506{
3507 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3508 Assert(pChunk);
3509 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3510 Assert(pChunk->cPrivate > 0);
3511 Assert(pGMM->cAllocatedPages > 0);
3512
3513 pChunk->cPrivate--;
3514 pGMM->cAllocatedPages--;
3515 gmmR0FreePageWorker(pGMM, pGVM, pChunk, idPage, pPage);
3516}
3517
3518
3519/**
3520 * Common worker for GMMR0FreePages and GMMR0BalloonedPages.
3521 *
3522 * @returns VBox status code:
3523 * @retval xxx
3524 *
3525 * @param pGMM Pointer to the GMM instance data.
3526 * @param pGVM Pointer to the VM.
3527 * @param cPages The number of pages to free.
3528 * @param paPages Pointer to the page descriptors.
3529 * @param enmAccount The account this relates to.
3530 */
3531static int gmmR0FreePages(PGMM pGMM, PGVM pGVM, uint32_t cPages, PGMMFREEPAGEDESC paPages, GMMACCOUNT enmAccount)
3532{
3533 /*
3534 * Check that the request isn't impossible wrt to the account status.
3535 */
3536 switch (enmAccount)
3537 {
3538 case GMMACCOUNT_BASE:
3539 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages < cPages))
3540 {
3541 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3542 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3543 }
3544 break;
3545 case GMMACCOUNT_SHADOW:
3546 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cShadowPages < cPages))
3547 {
3548 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cShadowPages, cPages));
3549 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3550 }
3551 break;
3552 case GMMACCOUNT_FIXED:
3553 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cFixedPages < cPages))
3554 {
3555 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cFixedPages, cPages));
3556 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3557 }
3558 break;
3559 default:
3560 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
3561 }
3562
3563 /*
3564 * Walk the descriptors and free the pages.
3565 *
3566 * Statistics (except the account) are being updated as we go along,
3567 * unlike the alloc code. Also, stop on the first error.
3568 */
3569 int rc = VINF_SUCCESS;
3570 uint32_t iPage;
3571 for (iPage = 0; iPage < cPages; iPage++)
3572 {
3573 uint32_t idPage = paPages[iPage].idPage;
3574 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
3575 if (RT_LIKELY(pPage))
3576 {
3577 if (RT_LIKELY(GMM_PAGE_IS_PRIVATE(pPage)))
3578 {
3579 if (RT_LIKELY(pPage->Private.hGVM == pGVM->hSelf))
3580 {
3581 Assert(pGVM->gmm.s.Stats.cPrivatePages);
3582 pGVM->gmm.s.Stats.cPrivatePages--;
3583 gmmR0FreePrivatePage(pGMM, pGVM, idPage, pPage);
3584 }
3585 else
3586 {
3587                    Log(("gmmR0FreePages: #%#x/%#x: not owner! hGVM=%#x hSelf=%#x\n", iPage, idPage,
3588 pPage->Private.hGVM, pGVM->hSelf));
3589 rc = VERR_GMM_NOT_PAGE_OWNER;
3590 break;
3591 }
3592 }
3593 else if (RT_LIKELY(GMM_PAGE_IS_SHARED(pPage)))
3594 {
3595 Assert(pGVM->gmm.s.Stats.cSharedPages);
3596 Assert(pPage->Shared.cRefs);
3597#if defined(VBOX_WITH_PAGE_SHARING) && defined(VBOX_STRICT) && HC_ARCH_BITS == 64
3598 if (pPage->Shared.u14Checksum)
3599 {
3600 uint32_t uChecksum = gmmR0StrictPageChecksum(pGMM, pGVM, idPage);
3601 uChecksum &= UINT32_C(0x00003fff);
3602 AssertMsg(!uChecksum || uChecksum == pPage->Shared.u14Checksum,
3603 ("%#x vs %#x - idPage=%#x\n", uChecksum, pPage->Shared.u14Checksum, idPage));
3604 }
3605#endif
3606 pGVM->gmm.s.Stats.cSharedPages--;
3607 if (!--pPage->Shared.cRefs)
3608 gmmR0FreeSharedPage(pGMM, pGVM, idPage, pPage);
3609 else
3610 {
3611 Assert(pGMM->cDuplicatePages);
3612 pGMM->cDuplicatePages--;
3613 }
3614 }
3615 else
3616 {
3617                Log(("gmmR0FreePages: #%#x/%#x: already free!\n", iPage, idPage));
3618 rc = VERR_GMM_PAGE_ALREADY_FREE;
3619 break;
3620 }
3621 }
3622 else
3623 {
3624            Log(("gmmR0FreePages: #%#x/%#x: not found!\n", iPage, idPage));
3625 rc = VERR_GMM_PAGE_NOT_FOUND;
3626 break;
3627 }
3628 paPages[iPage].idPage = NIL_GMM_PAGEID;
3629 }
3630
3631 /*
3632 * Update the account.
3633 */
3634 switch (enmAccount)
3635 {
3636 case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages -= iPage; break;
3637 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages -= iPage; break;
3638 case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages -= iPage; break;
3639 default:
3640 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
3641 }
3642
3643 /*
3644 * Any threshold stuff to be done here?
3645 */
3646
3647 return rc;
3648}
3649
3650
3651/**
3652 * Free one or more pages.
3653 *
3654 * This is typically used at reset time or power off.
3655 *
3656 * @returns VBox status code:
3657 * @retval xxx
3658 *
3659 * @param pGVM The global (ring-0) VM structure.
3660 * @param idCpu The VCPU id.
3661 * @param   cPages      The number of pages to free.
3662 * @param paPages Pointer to the page descriptors containing the page IDs
3663 * for each page.
3664 * @param enmAccount The account this relates to.
3665 * @thread EMT.
3666 */
3667GMMR0DECL(int) GMMR0FreePages(PGVM pGVM, VMCPUID idCpu, uint32_t cPages, PGMMFREEPAGEDESC paPages, GMMACCOUNT enmAccount)
3668{
3669 LogFlow(("GMMR0FreePages: pGVM=%p cPages=%#x paPages=%p enmAccount=%d\n", pGVM, cPages, paPages, enmAccount));
3670
3671 /*
3672 * Validate input and get the basics.
3673 */
3674 PGMM pGMM;
3675 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3676 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3677 if (RT_FAILURE(rc))
3678 return rc;
3679
3680 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
3681 AssertMsgReturn(enmAccount > GMMACCOUNT_INVALID && enmAccount < GMMACCOUNT_END, ("%d\n", enmAccount), VERR_INVALID_PARAMETER);
3682 AssertMsgReturn(cPages > 0 && cPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cPages), VERR_INVALID_PARAMETER);
3683
3684 for (unsigned iPage = 0; iPage < cPages; iPage++)
3685 AssertMsgReturn( paPages[iPage].idPage <= GMM_PAGEID_LAST
3686 /*|| paPages[iPage].idPage == NIL_GMM_PAGEID*/,
3687 ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
3688
3689 /*
3690 * Take the semaphore and call the worker function.
3691 */
3692 gmmR0MutexAcquire(pGMM);
3693 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3694 {
3695 rc = gmmR0FreePages(pGMM, pGVM, cPages, paPages, enmAccount);
3696 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3697 }
3698 else
3699 rc = VERR_GMM_IS_NOT_SANE;
3700 gmmR0MutexRelease(pGMM);
3701 LogFlow(("GMMR0FreePages: returns %Rrc\n", rc));
3702 return rc;
3703}
3704
3705
3706/**
3707 * VMMR0 request wrapper for GMMR0FreePages.
3708 *
3709 * @returns see GMMR0FreePages.
3710 * @param pGVM The global (ring-0) VM structure.
3711 * @param idCpu The VCPU id.
3712 * @param pReq Pointer to the request packet.
3713 */
3714GMMR0DECL(int) GMMR0FreePagesReq(PGVM pGVM, VMCPUID idCpu, PGMMFREEPAGESREQ pReq)
3715{
3716 /*
3717 * Validate input and pass it on.
3718 */
3719 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3720 AssertMsgReturn(pReq->Hdr.cbReq >= RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[0]),
3721 ("%#x < %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[0])),
3722 VERR_INVALID_PARAMETER);
3723 AssertMsgReturn(pReq->Hdr.cbReq == RT_UOFFSETOF_DYN(GMMFREEPAGESREQ, aPages[pReq->cPages]),
3724 ("%#x != %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF_DYN(GMMFREEPAGESREQ, aPages[pReq->cPages])),
3725 VERR_INVALID_PARAMETER);
3726
3727 return GMMR0FreePages(pGVM, idCpu, pReq->cPages, &pReq->aPages[0], pReq->enmAccount);
3728}
3729
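/*
 * A minimal sketch of how a caller might build the variable sized request
 * packet for the wrapper above. The helper name and the two page IDs are
 * hypothetical; only the fields that GMMR0FreePagesReq and gmmR0FreePages
 * actually consume are filled in, and real callers typically reach this via
 * the VMMR0 request path rather than calling the wrapper directly.
 *
 * @code
 *  static int myFreeTwoPagesExample(PGVM pGVM, VMCPUID idCpu, uint32_t idPage1, uint32_t idPage2)
 *  {
 *      uint32_t const   cPages = 2;
 *      uint32_t const   cbReq  = RT_UOFFSETOF_DYN(GMMFREEPAGESREQ, aPages[cPages]);
 *      PGMMFREEPAGESREQ pReq   = (PGMMFREEPAGESREQ)RTMemAllocZ(cbReq);
 *      if (!pReq)
 *          return VERR_NO_MEMORY;
 *      pReq->Hdr.cbReq        = cbReq;             // must match the dynamic size check above
 *      pReq->enmAccount       = GMMACCOUNT_BASE;   // freeing guest RAM pages
 *      pReq->cPages           = cPages;
 *      pReq->aPages[0].idPage = idPage1;
 *      pReq->aPages[1].idPage = idPage2;
 *      int rc = GMMR0FreePagesReq(pGVM, idCpu, pReq);
 *      RTMemFree(pReq);
 *      return rc;
 *  }
 * @endcode
 */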
3730
3731/**
3732 * Report back on a memory ballooning request.
3733 *
3734 * The request may or may not have been initiated by the GMM. If it was initiated
3735 * by the GMM it is important that this function is called even if no pages were
3736 * ballooned.
3737 *
3738 * @returns VBox status code:
3739 * @retval VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH
3740 * @retval VERR_GMM_ATTEMPT_TO_DEFLATE_TOO_MUCH
3741 * @retval VERR_GMM_OVERCOMMITTED_TRY_AGAIN_IN_A_BIT - reset condition
3742 * indicating that we won't necessarily have sufficient RAM to boot
3743 * the VM again and that it should pause until this changes (we'll try
3744 * balloon some other VM). (For standard deflate we have little choice
3745 * but to hope the VM won't use the memory that was returned to it.)
3746 *
3747 * @param pGVM The global (ring-0) VM structure.
3748 * @param idCpu The VCPU id.
3749 * @param enmAction Inflate/deflate/reset.
3750 * @param cBalloonedPages The number of pages that was ballooned.
3751 *
3752 * @thread EMT(idCpu)
3753 */
3754GMMR0DECL(int) GMMR0BalloonedPages(PGVM pGVM, VMCPUID idCpu, GMMBALLOONACTION enmAction, uint32_t cBalloonedPages)
3755{
3756 LogFlow(("GMMR0BalloonedPages: pGVM=%p enmAction=%d cBalloonedPages=%#x\n",
3757 pGVM, enmAction, cBalloonedPages));
3758
3759 AssertMsgReturn(cBalloonedPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cBalloonedPages), VERR_INVALID_PARAMETER);
3760
3761 /*
3762 * Validate input and get the basics.
3763 */
3764 PGMM pGMM;
3765 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3766 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3767 if (RT_FAILURE(rc))
3768 return rc;
3769
3770 /*
3771 * Take the semaphore and do some more validations.
3772 */
3773 gmmR0MutexAcquire(pGMM);
3774 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3775 {
3776 switch (enmAction)
3777 {
3778 case GMMBALLOONACTION_INFLATE:
3779 {
3780 if (RT_LIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cBalloonedPages
3781 <= pGVM->gmm.s.Stats.Reserved.cBasePages))
3782 {
3783 /*
3784 * Record the ballooned memory.
3785 */
3786 pGMM->cBalloonedPages += cBalloonedPages;
3787 if (pGVM->gmm.s.Stats.cReqBalloonedPages)
3788 {
3789                        /* Codepath never taken. Might be interesting in the future to request ballooned memory from guests in low memory conditions. */
3790 AssertFailed();
3791
3792 pGVM->gmm.s.Stats.cBalloonedPages += cBalloonedPages;
3793 pGVM->gmm.s.Stats.cReqActuallyBalloonedPages += cBalloonedPages;
3794 Log(("GMMR0BalloonedPages: +%#x - Global=%#llx / VM: Total=%#llx Req=%#llx Actual=%#llx (pending)\n",
3795 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages,
3796 pGVM->gmm.s.Stats.cReqBalloonedPages, pGVM->gmm.s.Stats.cReqActuallyBalloonedPages));
3797 }
3798 else
3799 {
3800 pGVM->gmm.s.Stats.cBalloonedPages += cBalloonedPages;
3801 Log(("GMMR0BalloonedPages: +%#x - Global=%#llx / VM: Total=%#llx (user)\n",
3802 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages));
3803 }
3804 }
3805 else
3806 {
3807 Log(("GMMR0BalloonedPages: cBasePages=%#llx Total=%#llx cBalloonedPages=%#llx Reserved=%#llx\n",
3808 pGVM->gmm.s.Stats.Allocated.cBasePages, pGVM->gmm.s.Stats.cBalloonedPages, cBalloonedPages,
3809 pGVM->gmm.s.Stats.Reserved.cBasePages));
3810 rc = VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3811 }
3812 break;
3813 }
3814
3815 case GMMBALLOONACTION_DEFLATE:
3816 {
3817 /* Deflate. */
3818 if (pGVM->gmm.s.Stats.cBalloonedPages >= cBalloonedPages)
3819 {
3820 /*
3821 * Record the ballooned memory.
3822 */
3823 Assert(pGMM->cBalloonedPages >= cBalloonedPages);
3824 pGMM->cBalloonedPages -= cBalloonedPages;
3825 pGVM->gmm.s.Stats.cBalloonedPages -= cBalloonedPages;
3826 if (pGVM->gmm.s.Stats.cReqDeflatePages)
3827 {
3828                        AssertFailed(); /* This path is for later. */
3829 Log(("GMMR0BalloonedPages: -%#x - Global=%#llx / VM: Total=%#llx Req=%#llx\n",
3830 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages, pGVM->gmm.s.Stats.cReqDeflatePages));
3831
3832 /*
3833 * Anything we need to do here now when the request has been completed?
3834 */
3835 pGVM->gmm.s.Stats.cReqDeflatePages = 0;
3836 }
3837 else
3838 Log(("GMMR0BalloonedPages: -%#x - Global=%#llx / VM: Total=%#llx (user)\n",
3839 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages));
3840 }
3841 else
3842 {
3843 Log(("GMMR0BalloonedPages: Total=%#llx cBalloonedPages=%#llx\n", pGVM->gmm.s.Stats.cBalloonedPages, cBalloonedPages));
3844 rc = VERR_GMM_ATTEMPT_TO_DEFLATE_TOO_MUCH;
3845 }
3846 break;
3847 }
3848
3849 case GMMBALLOONACTION_RESET:
3850 {
3851 /* Reset to an empty balloon. */
3852 Assert(pGMM->cBalloonedPages >= pGVM->gmm.s.Stats.cBalloonedPages);
3853
3854 pGMM->cBalloonedPages -= pGVM->gmm.s.Stats.cBalloonedPages;
3855 pGVM->gmm.s.Stats.cBalloonedPages = 0;
3856 break;
3857 }
3858
3859 default:
3860 rc = VERR_INVALID_PARAMETER;
3861 break;
3862 }
3863 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3864 }
3865 else
3866 rc = VERR_GMM_IS_NOT_SANE;
3867
3868 gmmR0MutexRelease(pGMM);
3869 LogFlow(("GMMR0BalloonedPages: returns %Rrc\n", rc));
3870 return rc;
3871}
3872
3873
3874/**
3875 * VMMR0 request wrapper for GMMR0BalloonedPages.
3876 *
3877 * @returns see GMMR0BalloonedPages.
3878 * @param pGVM The global (ring-0) VM structure.
3879 * @param idCpu The VCPU id.
3880 * @param pReq Pointer to the request packet.
3881 */
3882GMMR0DECL(int) GMMR0BalloonedPagesReq(PGVM pGVM, VMCPUID idCpu, PGMMBALLOONEDPAGESREQ pReq)
3883{
3884 /*
3885 * Validate input and pass it on.
3886 */
3887 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3888 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMBALLOONEDPAGESREQ),
3889                    ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMBALLOONEDPAGESREQ)),
3890 VERR_INVALID_PARAMETER);
3891
3892 return GMMR0BalloonedPages(pGVM, idCpu, pReq->enmAction, pReq->cBalloonedPages);
3893}
3894
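/*
 * A minimal sketch of reporting an inflate through the wrapper above; the
 * helper name is hypothetical and only the fields this code path looks at are
 * filled in.
 *
 * @code
 *  static int myReportBalloonInflateExample(PGVM pGVM, VMCPUID idCpu, uint32_t cPagesInflated)
 *  {
 *      GMMBALLOONEDPAGESREQ Req;
 *      RT_ZERO(Req);
 *      Req.Hdr.cbReq       = sizeof(Req);              // checked by the wrapper
 *      Req.enmAction       = GMMBALLOONACTION_INFLATE; // or _DEFLATE / _RESET
 *      Req.cBalloonedPages = cPagesInflated;
 *      return GMMR0BalloonedPagesReq(pGVM, idCpu, &Req);
 *  }
 * @endcode
 */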
3895
3896/**
3897 * Return memory statistics for the hypervisor.
3898 *
3899 * @returns VBox status code.
3900 * @param pReq Pointer to the request packet.
3901 */
3902GMMR0DECL(int) GMMR0QueryHypervisorMemoryStatsReq(PGMMMEMSTATSREQ pReq)
3903{
3904 /*
3905 * Validate input and pass it on.
3906 */
3907 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3908 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMMEMSTATSREQ),
3909                    ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMMEMSTATSREQ)),
3910 VERR_INVALID_PARAMETER);
3911
3912 /*
3913 * Validate input and get the basics.
3914 */
3915 PGMM pGMM;
3916 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3917 pReq->cAllocPages = pGMM->cAllocatedPages;
3918    pReq->cFreePages     = (pGMM->cChunks << (GMM_CHUNK_SHIFT - PAGE_SHIFT)) - pGMM->cAllocatedPages;
3919 pReq->cBalloonedPages = pGMM->cBalloonedPages;
3920 pReq->cMaxPages = pGMM->cMaxPages;
3921 pReq->cSharedPages = pGMM->cDuplicatePages;
3922 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3923
3924 return VINF_SUCCESS;
3925}
3926
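/*
 * A minimal sketch of querying the global statistics via the function above;
 * the helper name is hypothetical. The per-VM variant that follows
 * (GMMR0QueryMemoryStatsReq) takes the same packet plus a VCPU id.
 *
 * @code
 *  static void myLogHypervisorMemStatsExample(void)
 *  {
 *      GMMMEMSTATSREQ Req;
 *      RT_ZERO(Req);
 *      Req.Hdr.cbReq = sizeof(Req);
 *      if (RT_SUCCESS(GMMR0QueryHypervisorMemoryStatsReq(&Req)))
 *          Log(("GMM stats: alloc=%RU64 free=%RU64 ballooned=%RU64 max=%RU64 dup=%RU64\n",
 *               (uint64_t)Req.cAllocPages, (uint64_t)Req.cFreePages, (uint64_t)Req.cBalloonedPages,
 *               (uint64_t)Req.cMaxPages, (uint64_t)Req.cSharedPages));
 *  }
 * @endcode
 */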
3927
3928/**
3929 * Return memory statistics for the VM.
3930 *
3931 * @returns VBox status code.
3932 * @param pGVM The global (ring-0) VM structure.
3933 * @param idCpu Cpu id.
3934 * @param pReq Pointer to the request packet.
3935 *
3936 * @thread EMT(idCpu)
3937 */
3938GMMR0DECL(int) GMMR0QueryMemoryStatsReq(PGVM pGVM, VMCPUID idCpu, PGMMMEMSTATSREQ pReq)
3939{
3940 /*
3941 * Validate input and pass it on.
3942 */
3943 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3944 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMMEMSTATSREQ),
3945                    ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMMEMSTATSREQ)),
3946 VERR_INVALID_PARAMETER);
3947
3948 /*
3949 * Validate input and get the basics.
3950 */
3951 PGMM pGMM;
3952 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3953 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3954 if (RT_FAILURE(rc))
3955 return rc;
3956
3957 /*
3958 * Take the semaphore and do some more validations.
3959 */
3960 gmmR0MutexAcquire(pGMM);
3961 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3962 {
3963 pReq->cAllocPages = pGVM->gmm.s.Stats.Allocated.cBasePages;
3964 pReq->cBalloonedPages = pGVM->gmm.s.Stats.cBalloonedPages;
3965 pReq->cMaxPages = pGVM->gmm.s.Stats.Reserved.cBasePages;
3966 pReq->cFreePages = pReq->cMaxPages - pReq->cAllocPages;
3967 }
3968 else
3969 rc = VERR_GMM_IS_NOT_SANE;
3970
3971 gmmR0MutexRelease(pGMM);
3972    LogFlow(("GMMR0QueryMemoryStatsReq: returns %Rrc\n", rc));
3973 return rc;
3974}
3975
3976
3977/**
3978 * Worker for gmmR0UnmapChunk and gmmR0FreeChunk.
3979 *
3980 * Don't call this in legacy allocation mode!
3981 *
3982 * @returns VBox status code.
3983 * @param pGMM Pointer to the GMM instance data.
3984 * @param pGVM Pointer to the Global VM structure.
3985 * @param pChunk Pointer to the chunk to be unmapped.
3986 */
3987static int gmmR0UnmapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
3988{
3989 RT_NOREF_PV(pGMM);
3990#ifdef GMM_WITH_LEGACY_MODE
3991 Assert(!pGMM->fLegacyAllocationMode || (pChunk->fFlags & GMM_CHUNK_FLAGS_LARGE_PAGE));
3992#endif
3993
3994 /*
3995 * Find the mapping and try unmapping it.
3996 */
3997 uint32_t cMappings = pChunk->cMappingsX;
3998 for (uint32_t i = 0; i < cMappings; i++)
3999 {
4000 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
4001 if (pChunk->paMappingsX[i].pGVM == pGVM)
4002 {
4003 /* unmap */
4004 int rc = RTR0MemObjFree(pChunk->paMappingsX[i].hMapObj, false /* fFreeMappings (NA) */);
4005 if (RT_SUCCESS(rc))
4006 {
4007 /* update the record. */
4008 cMappings--;
4009 if (i < cMappings)
4010 pChunk->paMappingsX[i] = pChunk->paMappingsX[cMappings];
4011 pChunk->paMappingsX[cMappings].hMapObj = NIL_RTR0MEMOBJ;
4012 pChunk->paMappingsX[cMappings].pGVM = NULL;
4013 Assert(pChunk->cMappingsX - 1U == cMappings);
4014 pChunk->cMappingsX = cMappings;
4015 }
4016
4017 return rc;
4018 }
4019 }
4020
4021 Log(("gmmR0UnmapChunk: Chunk %#x is not mapped into pGVM=%p/%#x\n", pChunk->Core.Key, pGVM, pGVM->hSelf));
4022 return VERR_GMM_CHUNK_NOT_MAPPED;
4023}
4024
4025
4026/**
4027 * Unmaps a chunk previously mapped into the address space of the current process.
4028 *
4029 * @returns VBox status code.
4030 * @param pGMM Pointer to the GMM instance data.
4031 * @param pGVM Pointer to the Global VM structure.
4032 * @param pChunk Pointer to the chunk to be unmapped.
4033 * @param fRelaxedSem Whether we can release the semaphore while doing the
4034 * mapping (@c true) or not.
4035 */
4036static int gmmR0UnmapChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem)
4037{
4038#ifdef GMM_WITH_LEGACY_MODE
4039 if (!pGMM->fLegacyAllocationMode || (pChunk->fFlags & GMM_CHUNK_FLAGS_LARGE_PAGE))
4040 {
4041#endif
4042 /*
4043 * Lock the chunk and if possible leave the giant GMM lock.
4044 */
4045 GMMR0CHUNKMTXSTATE MtxState;
4046 int rc = gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk,
4047 fRelaxedSem ? GMMR0CHUNK_MTX_RETAKE_GIANT : GMMR0CHUNK_MTX_KEEP_GIANT);
4048 if (RT_SUCCESS(rc))
4049 {
4050 rc = gmmR0UnmapChunkLocked(pGMM, pGVM, pChunk);
4051 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4052 }
4053 return rc;
4054#ifdef GMM_WITH_LEGACY_MODE
4055 }
4056
4057 if (pChunk->hGVM == pGVM->hSelf)
4058 return VINF_SUCCESS;
4059
4060 Log(("gmmR0UnmapChunk: Chunk %#x is not mapped into pGVM=%p/%#x (legacy)\n", pChunk->Core.Key, pGVM, pGVM->hSelf));
4061 return VERR_GMM_CHUNK_NOT_MAPPED;
4062#endif
4063}
4064
4065
4066/**
4067 * Worker for gmmR0MapChunk.
4068 *
4069 * @returns VBox status code.
4070 * @param pGMM Pointer to the GMM instance data.
4071 * @param pGVM Pointer to the Global VM structure.
4072 * @param pChunk Pointer to the chunk to be mapped.
4073 * @param ppvR3 Where to store the ring-3 address of the mapping.
4074 *                      In the VERR_GMM_CHUNK_ALREADY_MAPPED case, this will
4075 * contain the address of the existing mapping.
4076 */
4077static int gmmR0MapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, PRTR3PTR ppvR3)
4078{
4079#ifdef GMM_WITH_LEGACY_MODE
4080 /*
4081 * If we're in legacy mode this is simple.
4082 */
4083 if (pGMM->fLegacyAllocationMode && !(pChunk->fFlags & GMM_CHUNK_FLAGS_LARGE_PAGE))
4084 {
4085 if (pChunk->hGVM != pGVM->hSelf)
4086 {
4087 Log(("gmmR0MapChunk: chunk %#x is already mapped at %p!\n", pChunk->Core.Key, *ppvR3));
4088 return VERR_GMM_CHUNK_NOT_FOUND;
4089 }
4090
4091 *ppvR3 = RTR0MemObjAddressR3(pChunk->hMemObj);
4092 return VINF_SUCCESS;
4093 }
4094#else
4095 RT_NOREF(pGMM);
4096#endif
4097
4098 /*
4099 * Check to see if the chunk is already mapped.
4100 */
4101 for (uint32_t i = 0; i < pChunk->cMappingsX; i++)
4102 {
4103 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
4104 if (pChunk->paMappingsX[i].pGVM == pGVM)
4105 {
4106 *ppvR3 = RTR0MemObjAddressR3(pChunk->paMappingsX[i].hMapObj);
4107 Log(("gmmR0MapChunk: chunk %#x is already mapped at %p!\n", pChunk->Core.Key, *ppvR3));
4108#ifdef VBOX_WITH_PAGE_SHARING
4109 /* The ring-3 chunk cache can be out of sync; don't fail. */
4110 return VINF_SUCCESS;
4111#else
4112 return VERR_GMM_CHUNK_ALREADY_MAPPED;
4113#endif
4114 }
4115 }
4116
4117 /*
4118 * Do the mapping.
4119 */
4120 RTR0MEMOBJ hMapObj;
4121 int rc = RTR0MemObjMapUser(&hMapObj, pChunk->hMemObj, (RTR3PTR)-1, 0, RTMEM_PROT_READ | RTMEM_PROT_WRITE, NIL_RTR0PROCESS);
4122 if (RT_SUCCESS(rc))
4123 {
4124 /* reallocate the array? assumes few users per chunk (usually one). */
4125 unsigned iMapping = pChunk->cMappingsX;
4126 if ( iMapping <= 3
4127 || (iMapping & 3) == 0)
4128 {
4129 unsigned cNewSize = iMapping <= 3
4130 ? iMapping + 1
4131 : iMapping + 4;
4132 Assert(cNewSize < 4 || RT_ALIGN_32(cNewSize, 4) == cNewSize);
4133 if (RT_UNLIKELY(cNewSize > UINT16_MAX))
4134 {
4135 rc = RTR0MemObjFree(hMapObj, false /* fFreeMappings (NA) */); AssertRC(rc);
4136 return VERR_GMM_TOO_MANY_CHUNK_MAPPINGS;
4137 }
4138
4139 void *pvMappings = RTMemRealloc(pChunk->paMappingsX, cNewSize * sizeof(pChunk->paMappingsX[0]));
4140 if (RT_UNLIKELY(!pvMappings))
4141 {
4142 rc = RTR0MemObjFree(hMapObj, false /* fFreeMappings (NA) */); AssertRC(rc);
4143 return VERR_NO_MEMORY;
4144 }
4145 pChunk->paMappingsX = (PGMMCHUNKMAP)pvMappings;
4146 }
4147
4148 /* insert new entry */
4149 pChunk->paMappingsX[iMapping].hMapObj = hMapObj;
4150 pChunk->paMappingsX[iMapping].pGVM = pGVM;
4151 Assert(pChunk->cMappingsX == iMapping);
4152 pChunk->cMappingsX = iMapping + 1;
4153
4154 *ppvR3 = RTR0MemObjAddressR3(hMapObj);
4155 }
4156
4157 return rc;
4158}
4159
4160
4161/**
4162 * Maps a chunk into the user address space of the current process.
4163 *
4164 * @returns VBox status code.
4165 * @param pGMM Pointer to the GMM instance data.
4166 * @param pGVM Pointer to the Global VM structure.
4167 * @param pChunk Pointer to the chunk to be mapped.
4168 * @param fRelaxedSem Whether we can release the semaphore while doing the
4169 * mapping (@c true) or not.
4170 * @param ppvR3 Where to store the ring-3 address of the mapping.
4171 *                      In the VERR_GMM_CHUNK_ALREADY_MAPPED case, this will
4172 * contain the address of the existing mapping.
4173 */
4174static int gmmR0MapChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem, PRTR3PTR ppvR3)
4175{
4176 /*
4177 * Take the chunk lock and leave the giant GMM lock when possible, then
4178 * call the worker function.
4179 */
4180 GMMR0CHUNKMTXSTATE MtxState;
4181 int rc = gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk,
4182 fRelaxedSem ? GMMR0CHUNK_MTX_RETAKE_GIANT : GMMR0CHUNK_MTX_KEEP_GIANT);
4183 if (RT_SUCCESS(rc))
4184 {
4185 rc = gmmR0MapChunkLocked(pGMM, pGVM, pChunk, ppvR3);
4186 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4187 }
4188
4189 return rc;
4190}
4191
4192
4193
4194#if defined(VBOX_WITH_PAGE_SHARING) || (defined(VBOX_STRICT) && HC_ARCH_BITS == 64)
4195/**
4196 * Check if a chunk is mapped into the specified VM
4197 *
4198 * @returns mapped yes/no
4199 * @param pGMM Pointer to the GMM instance.
4200 * @param pGVM Pointer to the Global VM structure.
4201 * @param pChunk Pointer to the chunk to be mapped.
4202 * @param ppvR3 Where to store the ring-3 address of the mapping.
4203 */
4204static bool gmmR0IsChunkMapped(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, PRTR3PTR ppvR3)
4205{
4206 GMMR0CHUNKMTXSTATE MtxState;
4207 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
4208 for (uint32_t i = 0; i < pChunk->cMappingsX; i++)
4209 {
4210 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
4211 if (pChunk->paMappingsX[i].pGVM == pGVM)
4212 {
4213 *ppvR3 = RTR0MemObjAddressR3(pChunk->paMappingsX[i].hMapObj);
4214 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4215 return true;
4216 }
4217 }
4218 *ppvR3 = NULL;
4219 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4220 return false;
4221}
4222#endif /* VBOX_WITH_PAGE_SHARING || (VBOX_STRICT && 64-BIT) */
4223
4224
4225/**
4226 * Map a chunk and/or unmap another chunk.
4227 *
4228 * The mapping and unmapping applies to the current process.
4229 *
4230 * This API does two things because it saves a kernel call per mapping when
4231 * the ring-3 mapping cache is full.
4232 *
4233 * @returns VBox status code.
4234 * @param pGVM The global (ring-0) VM structure.
4235 * @param idChunkMap The chunk to map. NIL_GMM_CHUNKID if nothing to map.
4236 * @param idChunkUnmap The chunk to unmap. NIL_GMM_CHUNKID if nothing to unmap.
4237 * @param ppvR3 Where to store the address of the mapped chunk. NULL is ok if nothing to map.
4238 * @thread EMT ???
4239 */
4240GMMR0DECL(int) GMMR0MapUnmapChunk(PGVM pGVM, uint32_t idChunkMap, uint32_t idChunkUnmap, PRTR3PTR ppvR3)
4241{
4242 LogFlow(("GMMR0MapUnmapChunk: pGVM=%p idChunkMap=%#x idChunkUnmap=%#x ppvR3=%p\n",
4243 pGVM, idChunkMap, idChunkUnmap, ppvR3));
4244
4245 /*
4246 * Validate input and get the basics.
4247 */
4248 PGMM pGMM;
4249 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4250 int rc = GVMMR0ValidateGVM(pGVM);
4251 if (RT_FAILURE(rc))
4252 return rc;
4253
4254 AssertCompile(NIL_GMM_CHUNKID == 0);
4255 AssertMsgReturn(idChunkMap <= GMM_CHUNKID_LAST, ("%#x\n", idChunkMap), VERR_INVALID_PARAMETER);
4256 AssertMsgReturn(idChunkUnmap <= GMM_CHUNKID_LAST, ("%#x\n", idChunkUnmap), VERR_INVALID_PARAMETER);
4257
4258 if ( idChunkMap == NIL_GMM_CHUNKID
4259 && idChunkUnmap == NIL_GMM_CHUNKID)
4260 return VERR_INVALID_PARAMETER;
4261
4262 if (idChunkMap != NIL_GMM_CHUNKID)
4263 {
4264 AssertPtrReturn(ppvR3, VERR_INVALID_POINTER);
4265 *ppvR3 = NIL_RTR3PTR;
4266 }
4267
4268 /*
4269 * Take the semaphore and do the work.
4270 *
4271 * The unmapping is done last since it's easier to undo a mapping than
4272     * undoing an unmapping. The ring-3 mapping cache cannot be so big
4273     * that it pushes the user virtual address space to within a chunk of
4274     * its limits, so no problem here.
4275 */
4276 gmmR0MutexAcquire(pGMM);
4277 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4278 {
4279 PGMMCHUNK pMap = NULL;
4280 if (idChunkMap != NIL_GVM_HANDLE)
4281 {
4282 pMap = gmmR0GetChunk(pGMM, idChunkMap);
4283 if (RT_LIKELY(pMap))
4284 rc = gmmR0MapChunk(pGMM, pGVM, pMap, true /*fRelaxedSem*/, ppvR3);
4285 else
4286 {
4287 Log(("GMMR0MapUnmapChunk: idChunkMap=%#x\n", idChunkMap));
4288 rc = VERR_GMM_CHUNK_NOT_FOUND;
4289 }
4290 }
4291/** @todo split this operation, the bail out might (theoretically) not be
4292 * entirely safe. */
4293
4294 if ( idChunkUnmap != NIL_GMM_CHUNKID
4295 && RT_SUCCESS(rc))
4296 {
4297 PGMMCHUNK pUnmap = gmmR0GetChunk(pGMM, idChunkUnmap);
4298 if (RT_LIKELY(pUnmap))
4299 rc = gmmR0UnmapChunk(pGMM, pGVM, pUnmap, true /*fRelaxedSem*/);
4300 else
4301 {
4302 Log(("GMMR0MapUnmapChunk: idChunkUnmap=%#x\n", idChunkUnmap));
4303 rc = VERR_GMM_CHUNK_NOT_FOUND;
4304 }
4305
4306 if (RT_FAILURE(rc) && pMap)
4307 gmmR0UnmapChunk(pGMM, pGVM, pMap, false /*fRelaxedSem*/);
4308 }
4309
4310 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4311 }
4312 else
4313 rc = VERR_GMM_IS_NOT_SANE;
4314 gmmR0MutexRelease(pGMM);
4315
4316 LogFlow(("GMMR0MapUnmapChunk: returns %Rrc\n", rc));
4317 return rc;
4318}
4319
4320
4321/**
4322 * VMMR0 request wrapper for GMMR0MapUnmapChunk.
4323 *
4324 * @returns see GMMR0MapUnmapChunk.
4325 * @param pGVM The global (ring-0) VM structure.
4326 * @param pReq Pointer to the request packet.
4327 */
4328GMMR0DECL(int) GMMR0MapUnmapChunkReq(PGVM pGVM, PGMMMAPUNMAPCHUNKREQ pReq)
4329{
4330 /*
4331 * Validate input and pass it on.
4332 */
4333 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4334 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4335
4336 return GMMR0MapUnmapChunk(pGVM, pReq->idChunkMap, pReq->idChunkUnmap, &pReq->pvR3);
4337}
4338
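/*
 * A minimal sketch of mapping one chunk while unmapping another in a single
 * call, as the wrapper above allows; the helper name and chunk IDs are
 * hypothetical.
 *
 * @code
 *  static int myRemapChunkExample(PGVM pGVM, uint32_t idChunkNew, uint32_t idChunkOld, PRTR3PTR ppvR3)
 *  {
 *      GMMMAPUNMAPCHUNKREQ Req;
 *      RT_ZERO(Req);
 *      Req.Hdr.cbReq    = sizeof(Req);
 *      Req.idChunkMap   = idChunkNew;      // NIL_GMM_CHUNKID if there is nothing to map
 *      Req.idChunkUnmap = idChunkOld;      // NIL_GMM_CHUNKID if there is nothing to unmap
 *      Req.pvR3         = NIL_RTR3PTR;
 *      int rc = GMMR0MapUnmapChunkReq(pGVM, &Req);
 *      if (RT_SUCCESS(rc))
 *          *ppvR3 = Req.pvR3;              // ring-3 address of the newly mapped chunk
 *      return rc;
 *  }
 * @endcode
 */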
4339
4340/**
4341 * Legacy mode API for supplying pages.
4342 *
4343 * The specified user address points to an allocation chunk sized block that
4344 * will be locked down and used by the GMM when the GM asks for pages.
4345 *
4346 * @returns VBox status code.
4347 * @param pGVM The global (ring-0) VM structure.
4348 * @param idCpu The VCPU id.
4349 * @param pvR3 Pointer to the chunk size memory block to lock down.
4350 */
4351GMMR0DECL(int) GMMR0SeedChunk(PGVM pGVM, VMCPUID idCpu, RTR3PTR pvR3)
4352{
4353#ifdef GMM_WITH_LEGACY_MODE
4354 /*
4355 * Validate input and get the basics.
4356 */
4357 PGMM pGMM;
4358 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4359 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
4360 if (RT_FAILURE(rc))
4361 return rc;
4362
4363 AssertPtrReturn(pvR3, VERR_INVALID_POINTER);
4364 AssertReturn(!(PAGE_OFFSET_MASK & pvR3), VERR_INVALID_POINTER);
4365
4366 if (!pGMM->fLegacyAllocationMode)
4367 {
4368 Log(("GMMR0SeedChunk: not in legacy allocation mode!\n"));
4369 return VERR_NOT_SUPPORTED;
4370 }
4371
4372 /*
4373 * Lock the memory and add it as new chunk with our hGVM.
4374 * (The GMM locking is done inside gmmR0RegisterChunk.)
4375 */
4376 RTR0MEMOBJ hMemObj;
4377 rc = RTR0MemObjLockUser(&hMemObj, pvR3, GMM_CHUNK_SIZE, RTMEM_PROT_READ | RTMEM_PROT_WRITE, NIL_RTR0PROCESS);
4378 if (RT_SUCCESS(rc))
4379 {
4380 rc = gmmR0RegisterChunk(pGMM, &pGVM->gmm.s.Private, hMemObj, pGVM->hSelf, GMM_CHUNK_FLAGS_SEEDED, NULL);
4381 if (RT_SUCCESS(rc))
4382 gmmR0MutexRelease(pGMM);
4383 else
4384 RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
4385 }
4386
4387 LogFlow(("GMMR0SeedChunk: rc=%d (pvR3=%p)\n", rc, pvR3));
4388 return rc;
4389#else
4390 RT_NOREF(pGVM, idCpu, pvR3);
4391 return VERR_NOT_SUPPORTED;
4392#endif
4393}
4394
4395#if defined(VBOX_WITH_RAM_IN_KERNEL) && !defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)
4396
4397/**
4398 * Gets the ring-0 virtual address for the given page.
4399 *
4400 * @returns VBox status code.
4401 * @param   pGVM        Pointer to the kernel-only VM instance data.
4402 * @param idPage The page ID.
4403 * @param ppv Where to store the address.
4404 * @thread EMT
4405 */
4406GMMR0DECL(int) GMMR0PageIdToVirt(PGVM pGVM, uint32_t idPage, void **ppv)
4407{
4408 *ppv = NULL;
4409 PGMM pGMM;
4410 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4411 gmmR0MutexAcquire(pGMM); /** @todo shared access */
4412
4413 int rc;
4414 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
4415 if (pChunk)
4416 {
4417 const GMMPAGE *pPage = &pChunk->aPages[idPage & GMM_PAGEID_IDX_MASK];
4418 if (RT_LIKELY( ( GMM_PAGE_IS_PRIVATE(pPage)
4419 && pPage->Private.hGVM == pGVM->hSelf)
4420 || GMM_PAGE_IS_SHARED(pPage)))
4421 {
4422 AssertPtr(pChunk->pbMapping);
4423 *ppv = &pChunk->pbMapping[(idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT];
4424 rc = VINF_SUCCESS;
4425 }
4426 else
4427 rc = VERR_GMM_NOT_PAGE_OWNER;
4428 }
4429 else
4430 rc = VERR_GMM_PAGE_NOT_FOUND;
4431
4432 gmmR0MutexRelease(pGMM);
4433 return rc;
4434}
4435
4436#endif
4437
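/*
 * A minimal sketch of resolving a page ID to a ring-0 pointer using the
 * function above (only available in the configuration it is compiled under);
 * the helper name is hypothetical.
 *
 * @code
 *  static int myReadFirstPageByteExample(PGVM pGVM, uint32_t idPage, uint8_t *pbByte)
 *  {
 *      void *pv = NULL;
 *      int   rc = GMMR0PageIdToVirt(pGVM, idPage, &pv);
 *      if (RT_SUCCESS(rc))
 *          *pbByte = *(uint8_t const *)pv; // first byte of the page's ring-0 mapping
 *      return rc;
 *  }
 * @endcode
 */
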
4438#ifdef VBOX_WITH_PAGE_SHARING
4439
4440# ifdef VBOX_STRICT
4441/**
4442 * For checksumming shared pages in strict builds.
4443 *
4444 * The purpose is making sure that a page doesn't change.
4445 *
4446 * @returns Checksum, 0 on failure.
4447 * @param pGMM The GMM instance data.
4448 * @param   pGVM        Pointer to the kernel-only VM instance data.
4449 * @param idPage The page ID.
4450 */
4451static uint32_t gmmR0StrictPageChecksum(PGMM pGMM, PGVM pGVM, uint32_t idPage)
4452{
4453 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
4454 AssertMsgReturn(pChunk, ("idPage=%#x\n", idPage), 0);
4455
4456 uint8_t *pbChunk;
4457 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
4458 return 0;
4459 uint8_t const *pbPage = pbChunk + ((idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
4460
4461 return RTCrc32(pbPage, PAGE_SIZE);
4462}
4463# endif /* VBOX_STRICT */
4464
4465
4466/**
4467 * Calculates the module hash value.
4468 *
4469 * @returns Hash value.
4470 * @param pszModuleName The module name.
4471 * @param pszVersion The module version string.
4472 */
4473static uint32_t gmmR0ShModCalcHash(const char *pszModuleName, const char *pszVersion)
4474{
4475 return RTStrHash1ExN(3, pszModuleName, RTSTR_MAX, "::", (size_t)2, pszVersion, RTSTR_MAX);
4476}
4477
4478
4479/**
4480 * Finds a global module.
4481 *
4482 * @returns Pointer to the global module on success, NULL if not found.
4483 * @param pGMM The GMM instance data.
4484 * @param uHash The hash as calculated by gmmR0ShModCalcHash.
4485 * @param cbModule The module size.
4486 * @param enmGuestOS The guest OS type.
4487 * @param cRegions The number of regions.
4488 * @param pszModuleName The module name.
4489 * @param pszVersion The module version.
4490 * @param paRegions The region descriptions.
4491 */
4492static PGMMSHAREDMODULE gmmR0ShModFindGlobal(PGMM pGMM, uint32_t uHash, uint32_t cbModule, VBOXOSFAMILY enmGuestOS,
4493 uint32_t cRegions, const char *pszModuleName, const char *pszVersion,
4494 struct VMMDEVSHAREDREGIONDESC const *paRegions)
4495{
4496 for (PGMMSHAREDMODULE pGblMod = (PGMMSHAREDMODULE)RTAvllU32Get(&pGMM->pGlobalSharedModuleTree, uHash);
4497 pGblMod;
4498 pGblMod = (PGMMSHAREDMODULE)pGblMod->Core.pList)
4499 {
4500 if (pGblMod->cbModule != cbModule)
4501 continue;
4502 if (pGblMod->enmGuestOS != enmGuestOS)
4503 continue;
4504 if (pGblMod->cRegions != cRegions)
4505 continue;
4506 if (strcmp(pGblMod->szName, pszModuleName))
4507 continue;
4508 if (strcmp(pGblMod->szVersion, pszVersion))
4509 continue;
4510
4511 uint32_t i;
4512 for (i = 0; i < cRegions; i++)
4513 {
4514 uint32_t off = paRegions[i].GCRegionAddr & PAGE_OFFSET_MASK;
4515 if (pGblMod->aRegions[i].off != off)
4516 break;
4517
4518 uint32_t cb = RT_ALIGN_32(paRegions[i].cbRegion + off, PAGE_SIZE);
4519 if (pGblMod->aRegions[i].cb != cb)
4520 break;
4521 }
4522
4523 if (i == cRegions)
4524 return pGblMod;
4525 }
4526
4527 return NULL;
4528}
4529
4530
4531/**
4532 * Creates a new global module.
4533 *
4534 * @returns VBox status code.
4535 * @param pGMM The GMM instance data.
4536 * @param uHash The hash as calculated by gmmR0ShModCalcHash.
4537 * @param cbModule The module size.
4538 * @param enmGuestOS The guest OS type.
4539 * @param cRegions The number of regions.
4540 * @param pszModuleName The module name.
4541 * @param pszVersion The module version.
4542 * @param paRegions The region descriptions.
4543 * @param ppGblMod Where to return the new module on success.
4544 */
4545static int gmmR0ShModNewGlobal(PGMM pGMM, uint32_t uHash, uint32_t cbModule, VBOXOSFAMILY enmGuestOS,
4546 uint32_t cRegions, const char *pszModuleName, const char *pszVersion,
4547 struct VMMDEVSHAREDREGIONDESC const *paRegions, PGMMSHAREDMODULE *ppGblMod)
4548{
4549 Log(("gmmR0ShModNewGlobal: %s %s size %#x os %u rgn %u\n", pszModuleName, pszVersion, cbModule, enmGuestOS, cRegions));
4550 if (pGMM->cShareableModules >= GMM_MAX_SHARED_GLOBAL_MODULES)
4551 {
4552 Log(("gmmR0ShModNewGlobal: Too many modules\n"));
4553 return VERR_GMM_TOO_MANY_GLOBAL_MODULES;
4554 }
4555
4556 PGMMSHAREDMODULE pGblMod = (PGMMSHAREDMODULE)RTMemAllocZ(RT_UOFFSETOF_DYN(GMMSHAREDMODULE, aRegions[cRegions]));
4557 if (!pGblMod)
4558 {
4559 Log(("gmmR0ShModNewGlobal: No memory\n"));
4560 return VERR_NO_MEMORY;
4561 }
4562
4563 pGblMod->Core.Key = uHash;
4564 pGblMod->cbModule = cbModule;
4565 pGblMod->cRegions = cRegions;
4566 pGblMod->cUsers = 1;
4567 pGblMod->enmGuestOS = enmGuestOS;
4568 strcpy(pGblMod->szName, pszModuleName);
4569 strcpy(pGblMod->szVersion, pszVersion);
4570
4571 for (uint32_t i = 0; i < cRegions; i++)
4572 {
4573 Log(("gmmR0ShModNewGlobal: rgn[%u]=%RGvLB%#x\n", i, paRegions[i].GCRegionAddr, paRegions[i].cbRegion));
4574 pGblMod->aRegions[i].off = paRegions[i].GCRegionAddr & PAGE_OFFSET_MASK;
4575 pGblMod->aRegions[i].cb = paRegions[i].cbRegion + pGblMod->aRegions[i].off;
4576 pGblMod->aRegions[i].cb = RT_ALIGN_32(pGblMod->aRegions[i].cb, PAGE_SIZE);
4577 pGblMod->aRegions[i].paidPages = NULL; /* allocated when needed. */
4578 }
4579
4580 bool fInsert = RTAvllU32Insert(&pGMM->pGlobalSharedModuleTree, &pGblMod->Core);
4581 Assert(fInsert); NOREF(fInsert);
4582 pGMM->cShareableModules++;
4583
4584 *ppGblMod = pGblMod;
4585 return VINF_SUCCESS;
4586}
4587
4588
4589/**
4590 * Deletes a global module which is no longer referenced by anyone.
4591 *
4592 * @param pGMM The GMM instance data.
4593 * @param pGblMod The module to delete.
4594 */
4595static void gmmR0ShModDeleteGlobal(PGMM pGMM, PGMMSHAREDMODULE pGblMod)
4596{
4597 Assert(pGblMod->cUsers == 0);
4598 Assert(pGMM->cShareableModules > 0 && pGMM->cShareableModules <= GMM_MAX_SHARED_GLOBAL_MODULES);
4599
4600 void *pvTest = RTAvllU32RemoveNode(&pGMM->pGlobalSharedModuleTree, &pGblMod->Core);
4601 Assert(pvTest == pGblMod); NOREF(pvTest);
4602 pGMM->cShareableModules--;
4603
4604 uint32_t i = pGblMod->cRegions;
4605 while (i-- > 0)
4606 {
4607 if (pGblMod->aRegions[i].paidPages)
4608 {
4609            /* We don't do anything to the pages as they are handled by the
4610 copy-on-write mechanism in PGM. */
4611 RTMemFree(pGblMod->aRegions[i].paidPages);
4612 pGblMod->aRegions[i].paidPages = NULL;
4613 }
4614 }
4615 RTMemFree(pGblMod);
4616}
4617
4618
4619static int gmmR0ShModNewPerVM(PGVM pGVM, RTGCPTR GCBaseAddr, uint32_t cRegions, const VMMDEVSHAREDREGIONDESC *paRegions,
4620 PGMMSHAREDMODULEPERVM *ppRecVM)
4621{
4622 if (pGVM->gmm.s.Stats.cShareableModules >= GMM_MAX_SHARED_PER_VM_MODULES)
4623 return VERR_GMM_TOO_MANY_PER_VM_MODULES;
4624
4625 PGMMSHAREDMODULEPERVM pRecVM;
4626 pRecVM = (PGMMSHAREDMODULEPERVM)RTMemAllocZ(RT_UOFFSETOF_DYN(GMMSHAREDMODULEPERVM, aRegionsGCPtrs[cRegions]));
4627 if (!pRecVM)
4628 return VERR_NO_MEMORY;
4629
4630 pRecVM->Core.Key = GCBaseAddr;
4631 for (uint32_t i = 0; i < cRegions; i++)
4632 pRecVM->aRegionsGCPtrs[i] = paRegions[i].GCRegionAddr;
4633
4634 bool fInsert = RTAvlGCPtrInsert(&pGVM->gmm.s.pSharedModuleTree, &pRecVM->Core);
4635 Assert(fInsert); NOREF(fInsert);
4636 pGVM->gmm.s.Stats.cShareableModules++;
4637
4638 *ppRecVM = pRecVM;
4639 return VINF_SUCCESS;
4640}
4641
4642
4643static void gmmR0ShModDeletePerVM(PGMM pGMM, PGVM pGVM, PGMMSHAREDMODULEPERVM pRecVM, bool fRemove)
4644{
4645 /*
4646 * Free the per-VM module.
4647 */
4648 PGMMSHAREDMODULE pGblMod = pRecVM->pGlobalModule;
4649 pRecVM->pGlobalModule = NULL;
4650
4651 if (fRemove)
4652 {
4653 void *pvTest = RTAvlGCPtrRemove(&pGVM->gmm.s.pSharedModuleTree, pRecVM->Core.Key);
4654 Assert(pvTest == &pRecVM->Core); NOREF(pvTest);
4655 }
4656
4657 RTMemFree(pRecVM);
4658
4659 /*
4660 * Release the global module.
4661 * (In the registration bailout case, it might not be.)
4662 */
4663 if (pGblMod)
4664 {
4665 Assert(pGblMod->cUsers > 0);
4666 pGblMod->cUsers--;
4667 if (pGblMod->cUsers == 0)
4668 gmmR0ShModDeleteGlobal(pGMM, pGblMod);
4669 }
4670}
4671
4672#endif /* VBOX_WITH_PAGE_SHARING */
4673
4674/**
4675 * Registers a new shared module for the VM.
4676 *
4677 * @returns VBox status code.
4678 * @param pGVM The global (ring-0) VM structure.
4679 * @param idCpu The VCPU id.
4680 * @param enmGuestOS The guest OS type.
4681 * @param pszModuleName The module name.
4682 * @param pszVersion The module version.
4683 * @param GCPtrModBase The module base address.
4684 * @param cbModule The module size.
4685 * @param   cRegions        The number of shared region descriptors.
4686 * @param paRegions Pointer to an array of shared region(s).
4687 * @thread EMT(idCpu)
4688 */
4689GMMR0DECL(int) GMMR0RegisterSharedModule(PGVM pGVM, VMCPUID idCpu, VBOXOSFAMILY enmGuestOS, char *pszModuleName,
4690 char *pszVersion, RTGCPTR GCPtrModBase, uint32_t cbModule,
4691 uint32_t cRegions, struct VMMDEVSHAREDREGIONDESC const *paRegions)
4692{
4693#ifdef VBOX_WITH_PAGE_SHARING
4694 /*
4695 * Validate input and get the basics.
4696 *
4697     * Note! Turns out the module size does not necessarily match the size of the
4698 * regions. (iTunes on XP)
4699 */
4700 PGMM pGMM;
4701 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4702 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
4703 if (RT_FAILURE(rc))
4704 return rc;
4705
4706 if (RT_UNLIKELY(cRegions > VMMDEVSHAREDREGIONDESC_MAX))
4707 return VERR_GMM_TOO_MANY_REGIONS;
4708
4709 if (RT_UNLIKELY(cbModule == 0 || cbModule > _1G))
4710 return VERR_GMM_BAD_SHARED_MODULE_SIZE;
4711
4712 uint32_t cbTotal = 0;
4713 for (uint32_t i = 0; i < cRegions; i++)
4714 {
4715 if (RT_UNLIKELY(paRegions[i].cbRegion == 0 || paRegions[i].cbRegion > _1G))
4716 return VERR_GMM_SHARED_MODULE_BAD_REGIONS_SIZE;
4717
4718 cbTotal += paRegions[i].cbRegion;
4719 if (RT_UNLIKELY(cbTotal > _1G))
4720 return VERR_GMM_SHARED_MODULE_BAD_REGIONS_SIZE;
4721 }
4722
4723 AssertPtrReturn(pszModuleName, VERR_INVALID_POINTER);
4724 if (RT_UNLIKELY(!memchr(pszModuleName, '\0', GMM_SHARED_MODULE_MAX_NAME_STRING)))
4725 return VERR_GMM_MODULE_NAME_TOO_LONG;
4726
4727 AssertPtrReturn(pszVersion, VERR_INVALID_POINTER);
4728 if (RT_UNLIKELY(!memchr(pszVersion, '\0', GMM_SHARED_MODULE_MAX_VERSION_STRING)))
4729 return VERR_GMM_MODULE_NAME_TOO_LONG;
4730
4731 uint32_t const uHash = gmmR0ShModCalcHash(pszModuleName, pszVersion);
4732 Log(("GMMR0RegisterSharedModule %s %s base %RGv size %x hash %x\n", pszModuleName, pszVersion, GCPtrModBase, cbModule, uHash));
4733
4734 /*
4735 * Take the semaphore and do some more validations.
4736 */
4737 gmmR0MutexAcquire(pGMM);
4738 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4739 {
4740 /*
4741 * Check if this module is already locally registered and register
4742 * it if it isn't. The base address is a unique module identifier
4743 * locally.
4744 */
4745 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)RTAvlGCPtrGet(&pGVM->gmm.s.pSharedModuleTree, GCPtrModBase);
4746 bool fNewModule = pRecVM == NULL;
4747 if (fNewModule)
4748 {
4749 rc = gmmR0ShModNewPerVM(pGVM, GCPtrModBase, cRegions, paRegions, &pRecVM);
4750 if (RT_SUCCESS(rc))
4751 {
4752 /*
4753 * Find a matching global module, register a new one if needed.
4754 */
4755 PGMMSHAREDMODULE pGblMod = gmmR0ShModFindGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4756 pszModuleName, pszVersion, paRegions);
4757 if (!pGblMod)
4758 {
4759 Assert(fNewModule);
4760 rc = gmmR0ShModNewGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4761 pszModuleName, pszVersion, paRegions, &pGblMod);
4762 if (RT_SUCCESS(rc))
4763 {
4764                        pRecVM->pGlobalModule = pGblMod; /* (One reference returned by gmmR0ShModNewGlobal.) */
4765 Log(("GMMR0RegisterSharedModule: new module %s %s\n", pszModuleName, pszVersion));
4766 }
4767 else
4768 gmmR0ShModDeletePerVM(pGMM, pGVM, pRecVM, true /*fRemove*/);
4769 }
4770 else
4771 {
4772 Assert(pGblMod->cUsers > 0 && pGblMod->cUsers < UINT32_MAX / 2);
4773 pGblMod->cUsers++;
4774 pRecVM->pGlobalModule = pGblMod;
4775
4776 Log(("GMMR0RegisterSharedModule: new per vm module %s %s, gbl users %d\n", pszModuleName, pszVersion, pGblMod->cUsers));
4777 }
4778 }
4779 }
4780 else
4781 {
4782 /*
4783 * Attempt to re-register an existing module.
4784 */
4785 PGMMSHAREDMODULE pGblMod = gmmR0ShModFindGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4786 pszModuleName, pszVersion, paRegions);
4787 if (pRecVM->pGlobalModule == pGblMod)
4788 {
4789 Log(("GMMR0RegisterSharedModule: already registered %s %s, gbl users %d\n", pszModuleName, pszVersion, pGblMod->cUsers));
4790 rc = VINF_GMM_SHARED_MODULE_ALREADY_REGISTERED;
4791 }
4792 else
4793 {
4794 /** @todo may have to unregister+register when this happens in case it's caused
4795 * by VBoxService crashing and being restarted... */
4796 Log(("GMMR0RegisterSharedModule: Address clash!\n"
4797 " incoming at %RGvLB%#x %s %s rgns %u\n"
4798 " existing at %RGvLB%#x %s %s rgns %u\n",
4799 GCPtrModBase, cbModule, pszModuleName, pszVersion, cRegions,
4800 pRecVM->Core.Key, pRecVM->pGlobalModule->cbModule, pRecVM->pGlobalModule->szName,
4801 pRecVM->pGlobalModule->szVersion, pRecVM->pGlobalModule->cRegions));
4802 rc = VERR_GMM_SHARED_MODULE_ADDRESS_CLASH;
4803 }
4804 }
4805 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4806 }
4807 else
4808 rc = VERR_GMM_IS_NOT_SANE;
4809
4810 gmmR0MutexRelease(pGMM);
4811 return rc;
4812#else
4813
4814 NOREF(pGVM); NOREF(idCpu); NOREF(enmGuestOS); NOREF(pszModuleName); NOREF(pszVersion);
4815 NOREF(GCPtrModBase); NOREF(cbModule); NOREF(cRegions); NOREF(paRegions);
4816 return VERR_NOT_IMPLEMENTED;
4817#endif
4818}
4819
4820
4821/**
4822 * VMMR0 request wrapper for GMMR0RegisterSharedModule.
4823 *
4824 * @returns see GMMR0RegisterSharedModule.
4825 * @param pGVM The global (ring-0) VM structure.
4826 * @param idCpu The VCPU id.
4827 * @param pReq Pointer to the request packet.
4828 */
4829GMMR0DECL(int) GMMR0RegisterSharedModuleReq(PGVM pGVM, VMCPUID idCpu, PGMMREGISTERSHAREDMODULEREQ pReq)
4830{
4831 /*
4832 * Validate input and pass it on.
4833 */
4834 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4835 AssertMsgReturn( pReq->Hdr.cbReq >= sizeof(*pReq)
4836 && pReq->Hdr.cbReq == RT_UOFFSETOF_DYN(GMMREGISTERSHAREDMODULEREQ, aRegions[pReq->cRegions]),
4837 ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4838
4839 /* Pass back return code in the request packet to preserve informational codes. (VMMR3CallR0 chokes on them) */
4840 pReq->rc = GMMR0RegisterSharedModule(pGVM, idCpu, pReq->enmGuestOS, pReq->szName, pReq->szVersion,
4841 pReq->GCBaseAddr, pReq->cbModule, pReq->cRegions, pReq->aRegions);
4842 return VINF_SUCCESS;
4843}
4844
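/*
 * A minimal sketch of building the variable sized registration request for the
 * wrapper above, using a single region that covers the whole module. The
 * helper name and the "ntdll.dll" / "10.0" strings are purely illustrative.
 *
 * @code
 *  static int myRegisterOneRegionModuleExample(PGVM pGVM, VMCPUID idCpu, VBOXOSFAMILY enmGuestOS,
 *                                              RTGCPTR GCPtrBase, uint32_t cbModule)
 *  {
 *      uint32_t const              cRegions = 1;
 *      uint32_t const              cbReq    = RT_UOFFSETOF_DYN(GMMREGISTERSHAREDMODULEREQ, aRegions[cRegions]);
 *      PGMMREGISTERSHAREDMODULEREQ pReq     = (PGMMREGISTERSHAREDMODULEREQ)RTMemAllocZ(cbReq);
 *      if (!pReq)
 *          return VERR_NO_MEMORY;
 *      pReq->Hdr.cbReq  = cbReq;
 *      pReq->enmGuestOS = enmGuestOS;
 *      pReq->GCBaseAddr = GCPtrBase;
 *      pReq->cbModule   = cbModule;
 *      pReq->cRegions   = cRegions;
 *      pReq->aRegions[0].GCRegionAddr = GCPtrBase;
 *      pReq->aRegions[0].cbRegion     = cbModule;
 *      RTStrCopy(pReq->szName,    sizeof(pReq->szName),    "ntdll.dll");
 *      RTStrCopy(pReq->szVersion, sizeof(pReq->szVersion), "10.0");
 *      int rc = GMMR0RegisterSharedModuleReq(pGVM, idCpu, pReq);
 *      if (RT_SUCCESS(rc))
 *          rc = pReq->rc;                  // the real status comes back in the packet
 *      RTMemFree(pReq);
 *      return rc;
 *  }
 * @endcode
 */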
4845
4846/**
4847 * Unregisters a shared module for the VM.
4848 *
4849 * @returns VBox status code.
4850 * @param pGVM The global (ring-0) VM structure.
4851 * @param idCpu The VCPU id.
4852 * @param pszModuleName The module name.
4853 * @param pszVersion The module version.
4854 * @param GCPtrModBase The module base address.
4855 * @param cbModule The module size.
4856 */
4857GMMR0DECL(int) GMMR0UnregisterSharedModule(PGVM pGVM, VMCPUID idCpu, char *pszModuleName, char *pszVersion,
4858 RTGCPTR GCPtrModBase, uint32_t cbModule)
4859{
4860#ifdef VBOX_WITH_PAGE_SHARING
4861 /*
4862 * Validate input and get the basics.
4863 */
4864 PGMM pGMM;
4865 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4866 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
4867 if (RT_FAILURE(rc))
4868 return rc;
4869
4870 AssertPtrReturn(pszModuleName, VERR_INVALID_POINTER);
4871 AssertPtrReturn(pszVersion, VERR_INVALID_POINTER);
4872 if (RT_UNLIKELY(!memchr(pszModuleName, '\0', GMM_SHARED_MODULE_MAX_NAME_STRING)))
4873 return VERR_GMM_MODULE_NAME_TOO_LONG;
4874 if (RT_UNLIKELY(!memchr(pszVersion, '\0', GMM_SHARED_MODULE_MAX_VERSION_STRING)))
4875 return VERR_GMM_MODULE_NAME_TOO_LONG;
4876
4877 Log(("GMMR0UnregisterSharedModule %s %s base=%RGv size %x\n", pszModuleName, pszVersion, GCPtrModBase, cbModule));
4878
4879 /*
4880 * Take the semaphore and do some more validations.
4881 */
4882 gmmR0MutexAcquire(pGMM);
4883 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4884 {
4885 /*
4886 * Locate and remove the specified module.
4887 */
4888 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)RTAvlGCPtrGet(&pGVM->gmm.s.pSharedModuleTree, GCPtrModBase);
4889 if (pRecVM)
4890 {
4891 /** @todo Do we need to do more validations here, like that the
4892 * name + version + cbModule matches? */
4893 NOREF(cbModule);
4894 Assert(pRecVM->pGlobalModule);
4895 gmmR0ShModDeletePerVM(pGMM, pGVM, pRecVM, true /*fRemove*/);
4896 }
4897 else
4898 rc = VERR_GMM_SHARED_MODULE_NOT_FOUND;
4899
4900 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4901 }
4902 else
4903 rc = VERR_GMM_IS_NOT_SANE;
4904
4905 gmmR0MutexRelease(pGMM);
4906 return rc;
4907#else
4908
4909 NOREF(pGVM); NOREF(idCpu); NOREF(pszModuleName); NOREF(pszVersion); NOREF(GCPtrModBase); NOREF(cbModule);
4910 return VERR_NOT_IMPLEMENTED;
4911#endif
4912}
4913
4914
4915/**
4916 * VMMR0 request wrapper for GMMR0UnregisterSharedModule.
4917 *
4918 * @returns see GMMR0UnregisterSharedModule.
4919 * @param pGVM The global (ring-0) VM structure.
4920 * @param idCpu The VCPU id.
4921 * @param pReq Pointer to the request packet.
4922 */
4923GMMR0DECL(int) GMMR0UnregisterSharedModuleReq(PGVM pGVM, VMCPUID idCpu, PGMMUNREGISTERSHAREDMODULEREQ pReq)
4924{
4925 /*
4926 * Validate input and pass it on.
4927 */
4928 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4929 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4930
4931 return GMMR0UnregisterSharedModule(pGVM, idCpu, pReq->szName, pReq->szVersion, pReq->GCBaseAddr, pReq->cbModule);
4932}
4933
4934#ifdef VBOX_WITH_PAGE_SHARING
4935
4936/**
4937 * Increase the use count of a shared page, the page is known to exist and be valid and such.
4938 *
4939 * @param pGMM Pointer to the GMM instance.
4940 * @param pGVM Pointer to the GVM instance.
4941 * @param pPage The page structure.
4942 */
4943DECLINLINE(void) gmmR0UseSharedPage(PGMM pGMM, PGVM pGVM, PGMMPAGE pPage)
4944{
4945 Assert(pGMM->cSharedPages > 0);
4946 Assert(pGMM->cAllocatedPages > 0);
4947
4948 pGMM->cDuplicatePages++;
4949
4950 pPage->Shared.cRefs++;
4951 pGVM->gmm.s.Stats.cSharedPages++;
4952 pGVM->gmm.s.Stats.Allocated.cBasePages++;
4953}
4954
4955
4956/**
4957 * Converts a private page to a shared page, the page is known to exist and be valid and such.
4958 *
4959 * @param pGMM Pointer to the GMM instance.
4960 * @param pGVM Pointer to the GVM instance.
4961 * @param HCPhys Host physical address
4962 * @param idPage The Page ID
4963 * @param pPage The page structure.
4964 * @param pPageDesc Shared page descriptor
4965 */
4966DECLINLINE(void) gmmR0ConvertToSharedPage(PGMM pGMM, PGVM pGVM, RTHCPHYS HCPhys, uint32_t idPage, PGMMPAGE pPage,
4967 PGMMSHAREDPAGEDESC pPageDesc)
4968{
4969 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
4970 Assert(pChunk);
4971 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
4972 Assert(GMM_PAGE_IS_PRIVATE(pPage));
4973
4974 pChunk->cPrivate--;
4975 pChunk->cShared++;
4976
4977 pGMM->cSharedPages++;
4978
4979 pGVM->gmm.s.Stats.cSharedPages++;
4980 pGVM->gmm.s.Stats.cPrivatePages--;
4981
4982 /* Modify the page structure. */
4983 pPage->Shared.pfn = (uint32_t)(uint64_t)(HCPhys >> PAGE_SHIFT);
4984 pPage->Shared.cRefs = 1;
4985#ifdef VBOX_STRICT
4986 pPageDesc->u32StrictChecksum = gmmR0StrictPageChecksum(pGMM, pGVM, idPage);
4987 pPage->Shared.u14Checksum = pPageDesc->u32StrictChecksum;
4988#else
4989 NOREF(pPageDesc);
4990 pPage->Shared.u14Checksum = 0;
4991#endif
4992 pPage->Shared.u2State = GMM_PAGE_STATE_SHARED;
4993}
4994
4995
4996static int gmmR0SharedModuleCheckPageFirstTime(PGMM pGMM, PGVM pGVM, PGMMSHAREDMODULE pModule,
4997 unsigned idxRegion, unsigned idxPage,
4998 PGMMSHAREDPAGEDESC pPageDesc, PGMMSHAREDREGIONDESC pGlobalRegion)
4999{
5000 NOREF(pModule);
5001
5002 /* Easy case: just change the internal page type. */
5003 PGMMPAGE pPage = gmmR0GetPage(pGMM, pPageDesc->idPage);
5004 AssertMsgReturn(pPage, ("idPage=%#x (GCPhys=%RGp HCPhys=%RHp idxRegion=%#x idxPage=%#x) #1\n",
5005 pPageDesc->idPage, pPageDesc->GCPhys, pPageDesc->HCPhys, idxRegion, idxPage),
5006 VERR_PGM_PHYS_INVALID_PAGE_ID);
5007 NOREF(idxRegion);
5008
5009    AssertMsg(pPageDesc->GCPhys == (pPage->Private.pfn << 12), ("desc %RGp gmm %RGp\n", pPageDesc->GCPhys, (pPage->Private.pfn << 12)));
5010
5011 gmmR0ConvertToSharedPage(pGMM, pGVM, pPageDesc->HCPhys, pPageDesc->idPage, pPage, pPageDesc);
5012
5013 /* Keep track of these references. */
5014 pGlobalRegion->paidPages[idxPage] = pPageDesc->idPage;
5015
5016 return VINF_SUCCESS;
5017}
5018
5019/**
5020 * Checks the specified shared module range for changes.
5021 *
5022 * Performs the following tasks:
5023 * - If a shared page is new, then it changes the GMM page type to shared and
5024 * returns it in the pPageDesc descriptor.
5025 * - If a shared page already exists, then it checks whether the VM page is
5026 * identical and, if so, frees the VM page and returns the shared page in
5027 * the pPageDesc descriptor.
5028 *
5029 * @remarks ASSUMES the caller has acquired the GMM semaphore!!
5030 *
5031 * @returns VBox status code.
5032 * @param pGVM Pointer to the GVM instance data.
5033 * @param pModule The module description.
5034 * @param idxRegion The region index.
5035 * @param idxPage The page index.
5036 * @param pPageDesc The page descriptor.
5037 */
5038GMMR0DECL(int) GMMR0SharedModuleCheckPage(PGVM pGVM, PGMMSHAREDMODULE pModule, uint32_t idxRegion, uint32_t idxPage,
5039 PGMMSHAREDPAGEDESC pPageDesc)
5040{
5041 int rc;
5042 PGMM pGMM;
5043 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5044 pPageDesc->u32StrictChecksum = 0;
5045
5046 AssertMsgReturn(idxRegion < pModule->cRegions,
5047 ("idxRegion=%#x cRegions=%#x %s %s\n", idxRegion, pModule->cRegions, pModule->szName, pModule->szVersion),
5048 VERR_INVALID_PARAMETER);
5049
5050 uint32_t const cPages = pModule->aRegions[idxRegion].cb >> PAGE_SHIFT;
5051 AssertMsgReturn(idxPage < cPages,
5052 ("idxRegion=%#x cRegions=%#x %s %s\n", idxRegion, pModule->cRegions, pModule->szName, pModule->szVersion),
5053 VERR_INVALID_PARAMETER);
5054
5055 LogFlow(("GMMR0SharedModuleCheckPage %s base %RGv region %d idxPage %d\n", pModule->szName, pModule->Core.Key, idxRegion, idxPage));
5056
5057 /*
5058 * First time; create a page descriptor array.
5059 */
5060 PGMMSHAREDREGIONDESC pGlobalRegion = &pModule->aRegions[idxRegion];
5061 if (!pGlobalRegion->paidPages)
5062 {
5063 Log(("Allocate page descriptor array for %d pages\n", cPages));
5064 pGlobalRegion->paidPages = (uint32_t *)RTMemAlloc(cPages * sizeof(pGlobalRegion->paidPages[0]));
5065 AssertReturn(pGlobalRegion->paidPages, VERR_NO_MEMORY);
5066
5067 /* Invalidate all descriptors. */
5068 uint32_t i = cPages;
5069 while (i-- > 0)
5070 pGlobalRegion->paidPages[i] = NIL_GMM_PAGEID;
5071 }
5072
5073 /*
5074 * We've seen this shared page for the first time?
5075 */
5076 if (pGlobalRegion->paidPages[idxPage] == NIL_GMM_PAGEID)
5077 {
5078 Log(("New shared page guest %RGp host %RHp\n", pPageDesc->GCPhys, pPageDesc->HCPhys));
5079 return gmmR0SharedModuleCheckPageFirstTime(pGMM, pGVM, pModule, idxRegion, idxPage, pPageDesc, pGlobalRegion);
5080 }
5081
5082 /*
5083 * We've seen it before...
5084 */
5085 Log(("Replace existing page guest %RGp host %RHp id %#x -> id %#x\n",
5086 pPageDesc->GCPhys, pPageDesc->HCPhys, pPageDesc->idPage, pGlobalRegion->paidPages[idxPage]));
5087 Assert(pPageDesc->idPage != pGlobalRegion->paidPages[idxPage]);
5088
5089 /*
5090 * Get the shared page source.
5091 */
5092 PGMMPAGE pPage = gmmR0GetPage(pGMM, pGlobalRegion->paidPages[idxPage]);
5093 AssertMsgReturn(pPage, ("idPage=%#x (idxRegion=%#x idxPage=%#x) #2\n", pPageDesc->idPage, idxRegion, idxPage),
5094 VERR_PGM_PHYS_INVALID_PAGE_ID);
5095
5096 if (pPage->Common.u2State != GMM_PAGE_STATE_SHARED)
5097 {
5098 /*
5099 * Page was freed at some point; invalidate this entry.
5100 */
5101 /** @todo this isn't really bullet proof. */
5102 Log(("Old shared page was freed -> create a new one\n"));
5103 pGlobalRegion->paidPages[idxPage] = NIL_GMM_PAGEID;
5104 return gmmR0SharedModuleCheckPageFirstTime(pGMM, pGVM, pModule, idxRegion, idxPage, pPageDesc, pGlobalRegion);
5105 }
5106
5107 Log(("Replace existing page guest host %RHp -> %RHp\n", pPageDesc->HCPhys, ((uint64_t)pPage->Shared.pfn) << PAGE_SHIFT));
5108
5109 /*
5110 * Calculate the virtual address of the local page.
5111 */
5112 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pPageDesc->idPage >> GMM_CHUNKID_SHIFT);
5113 AssertMsgReturn(pChunk, ("idPage=%#x (idxRegion=%#x idxPage=%#x) #4\n", pPageDesc->idPage, idxRegion, idxPage),
5114 VERR_PGM_PHYS_INVALID_PAGE_ID);
5115
5116 uint8_t *pbChunk;
5117 AssertMsgReturn(gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk),
5118 ("idPage=%#x (idxRegion=%#x idxPage=%#x) #3\n", pPageDesc->idPage, idxRegion, idxPage),
5119 VERR_PGM_PHYS_INVALID_PAGE_ID);
5120 uint8_t *pbLocalPage = pbChunk + ((pPageDesc->idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
5121
5122 /*
5123 * Calculate the virtual address of the shared page.
5124 */
5125 pChunk = gmmR0GetChunk(pGMM, pGlobalRegion->paidPages[idxPage] >> GMM_CHUNKID_SHIFT);
5126 Assert(pChunk); /* can't fail as gmmR0GetPage succeeded. */
5127
5128 /*
5129 * Get the virtual address of the physical page; map the chunk into the VM
5130 * process if not already done.
5131 */
5132 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5133 {
5134 Log(("Map chunk into process!\n"));
5135 rc = gmmR0MapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/, (PRTR3PTR)&pbChunk);
5136 AssertRCReturn(rc, rc);
5137 }
5138 uint8_t *pbSharedPage = pbChunk + ((pGlobalRegion->paidPages[idxPage] & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
5139
5140#ifdef VBOX_STRICT
5141 pPageDesc->u32StrictChecksum = RTCrc32(pbSharedPage, PAGE_SIZE);
5142 uint32_t uChecksum = pPageDesc->u32StrictChecksum & UINT32_C(0x00003fff);
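    /* Only the low 14 bits of the CRC-32 fit in the u14Checksum field; a zero value,
       whether computed or stored, skips the comparison below. */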
5143 AssertMsg(!uChecksum || uChecksum == pPage->Shared.u14Checksum || !pPage->Shared.u14Checksum,
5144 ("%#x vs %#x - idPage=%#x - %s %s\n", uChecksum, pPage->Shared.u14Checksum,
5145 pGlobalRegion->paidPages[idxPage], pModule->szName, pModule->szVersion));
5146#endif
5147
5148 /** @todo write ASMMemComparePage. */
5149 if (memcmp(pbSharedPage, pbLocalPage, PAGE_SIZE))
5150 {
5151 Log(("Unexpected differences found between local and shared page; skip\n"));
5152 /* Signal to the caller that this one hasn't changed. */
5153 pPageDesc->idPage = NIL_GMM_PAGEID;
5154 return VINF_SUCCESS;
5155 }
5156
5157 /*
5158 * Free the old local page.
5159 */
5160 GMMFREEPAGEDESC PageDesc;
5161 PageDesc.idPage = pPageDesc->idPage;
5162 rc = gmmR0FreePages(pGMM, pGVM, 1, &PageDesc, GMMACCOUNT_BASE);
5163 AssertRCReturn(rc, rc);
5164
5165 gmmR0UseSharedPage(pGMM, pGVM, pPage);
5166
5167 /*
5168 * Pass along the new physical address & page id.
5169 */
5170 pPageDesc->HCPhys = ((uint64_t)pPage->Shared.pfn) << PAGE_SHIFT;
5171 pPageDesc->idPage = pGlobalRegion->paidPages[idxPage];
5172
5173 return VINF_SUCCESS;
5174}
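
/*
 * A simplified sketch of how a caller could consume the result of GMMR0SharedModuleCheckPage;
 * the descriptor values are placeholders and this is not the actual PGMR0SharedModuleCheck
 * logic. The GMM semaphore must already be held, as stated in the @remarks above:
 * @code
 *      GMMSHAREDPAGEDESC PageDesc;
 *      PageDesc.idPage = idMyPrivatePage;  // the VM's current page id (placeholder)
 *      PageDesc.GCPhys = GCPhysPage;       // guest physical address (placeholder)
 *      PageDesc.HCPhys = HCPhysPage;       // current host physical address (placeholder)
 *      int rc = GMMR0SharedModuleCheckPage(pGVM, pModule, idxRegion, idxPage, &PageDesc);
 *      if (RT_SUCCESS(rc) && PageDesc.idPage != NIL_GMM_PAGEID)
 *      {
 *          // The page is now shared: remap the guest page to PageDesc.HCPhys and
 *          // remember PageDesc.idPage as its new page id.
 *      }
 *      // PageDesc.idPage == NIL_GMM_PAGEID means the contents differed and nothing changed.
 * @endcode
 */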
5175
5176
5177/**
5178 * RTAvlGCPtrDestroy callback.
5179 *
5180 * @returns VINF_SUCCESS (0).
5181 * @param pNode The node to destroy.
5182 * @param pvArgs Pointer to an argument packet.
5183 */
5184static DECLCALLBACK(int) gmmR0CleanupSharedModule(PAVLGCPTRNODECORE pNode, void *pvArgs)
5185{
5186 gmmR0ShModDeletePerVM(((GMMR0SHMODPERVMDTORARGS *)pvArgs)->pGMM,
5187 ((GMMR0SHMODPERVMDTORARGS *)pvArgs)->pGVM,
5188 (PGMMSHAREDMODULEPERVM)pNode,
5189 false /*fRemove*/);
5190 return VINF_SUCCESS;
5191}
5192
5193
5194/**
5195 * Used by GMMR0CleanupVM to clean up shared modules.
5196 *
5197 * This is called without the GMM lock held so that the lock can be acquired
5198 * and yielded here as needed.
5199 *
5200 * @param pGMM The GMM handle.
5201 * @param pGVM The global VM handle.
5202 */
5203static void gmmR0SharedModuleCleanup(PGMM pGMM, PGVM pGVM)
5204{
5205 gmmR0MutexAcquire(pGMM);
5206 GMM_CHECK_SANITY_UPON_ENTERING(pGMM);
5207
5208 GMMR0SHMODPERVMDTORARGS Args;
5209 Args.pGVM = pGVM;
5210 Args.pGMM = pGMM;
5211 RTAvlGCPtrDestroy(&pGVM->gmm.s.pSharedModuleTree, gmmR0CleanupSharedModule, &Args);
5212
5213 AssertMsg(pGVM->gmm.s.Stats.cShareableModules == 0, ("%d\n", pGVM->gmm.s.Stats.cShareableModules));
5214 pGVM->gmm.s.Stats.cShareableModules = 0;
5215
5216 gmmR0MutexRelease(pGMM);
5217}
5218
5219#endif /* VBOX_WITH_PAGE_SHARING */
5220
5221/**
5222 * Removes all shared modules for the specified VM.
5223 *
5224 * @returns VBox status code.
5225 * @param pGVM The global (ring-0) VM structure.
5226 * @param idCpu The VCPU id.
5227 */
5228GMMR0DECL(int) GMMR0ResetSharedModules(PGVM pGVM, VMCPUID idCpu)
5229{
5230#ifdef VBOX_WITH_PAGE_SHARING
5231 /*
5232 * Validate input and get the basics.
5233 */
5234 PGMM pGMM;
5235 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5236 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
5237 if (RT_FAILURE(rc))
5238 return rc;
5239
5240 /*
5241 * Take the semaphore and do some more validations.
5242 */
5243 gmmR0MutexAcquire(pGMM);
5244 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5245 {
5246 Log(("GMMR0ResetSharedModules\n"));
5247 GMMR0SHMODPERVMDTORARGS Args;
5248 Args.pGVM = pGVM;
5249 Args.pGMM = pGMM;
5250 RTAvlGCPtrDestroy(&pGVM->gmm.s.pSharedModuleTree, gmmR0CleanupSharedModule, &Args);
5251 pGVM->gmm.s.Stats.cShareableModules = 0;
5252
5253 rc = VINF_SUCCESS;
5254 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
5255 }
5256 else
5257 rc = VERR_GMM_IS_NOT_SANE;
5258
5259 gmmR0MutexRelease(pGMM);
5260 return rc;
5261#else
5262 RT_NOREF(pGVM, idCpu);
5263 return VERR_NOT_IMPLEMENTED;
5264#endif
5265}
5266
5267#ifdef VBOX_WITH_PAGE_SHARING
5268
5269/**
5270 * Tree enumeration callback for checking a shared module.
5271 */
5272static DECLCALLBACK(int) gmmR0CheckSharedModule(PAVLGCPTRNODECORE pNode, void *pvUser)
5273{
5274 GMMCHECKSHAREDMODULEINFO *pArgs = (GMMCHECKSHAREDMODULEINFO*)pvUser;
5275 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)pNode;
5276 PGMMSHAREDMODULE pGblMod = pRecVM->pGlobalModule;
5277
5278 Log(("gmmR0CheckSharedModule: check %s %s base=%RGv size=%x\n",
5279 pGblMod->szName, pGblMod->szVersion, pGblMod->Core.Key, pGblMod->cbModule));
5280
5281 int rc = PGMR0SharedModuleCheck(pArgs->pGVM, pArgs->pGVM, pArgs->idCpu, pGblMod, pRecVM->aRegionsGCPtrs);
5282 if (RT_FAILURE(rc))
5283 return rc;
5284 return VINF_SUCCESS;
5285}
5286
5287#endif /* VBOX_WITH_PAGE_SHARING */
5288
5289/**
5290 * Checks all shared modules for the specified VM.
5291 *
5292 * @returns VBox status code.
5293 * @param pGVM The global (ring-0) VM structure.
5294 * @param idCpu The calling EMT number.
5295 * @thread EMT(idCpu)
5296 */
5297GMMR0DECL(int) GMMR0CheckSharedModules(PGVM pGVM, VMCPUID idCpu)
5298{
5299#ifdef VBOX_WITH_PAGE_SHARING
5300 /*
5301 * Validate input and get the basics.
5302 */
5303 PGMM pGMM;
5304 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5305 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
5306 if (RT_FAILURE(rc))
5307 return rc;
5308
5309# ifndef DEBUG_sandervl
5310 /*
5311 * Take the semaphore and do some more validations.
5312 */
5313 gmmR0MutexAcquire(pGMM);
5314# endif
5315 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5316 {
5317 /*
5318 * Walk the tree, checking each module.
5319 */
5320 Log(("GMMR0CheckSharedModules\n"));
5321
5322 GMMCHECKSHAREDMODULEINFO Args;
5323 Args.pGVM = pGVM;
5324 Args.idCpu = idCpu;
5325 rc = RTAvlGCPtrDoWithAll(&pGVM->gmm.s.pSharedModuleTree, true /* fFromLeft */, gmmR0CheckSharedModule, &Args);
5326
5327 Log(("GMMR0CheckSharedModules done (rc=%Rrc)!\n", rc));
5328 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
5329 }
5330 else
5331 rc = VERR_GMM_IS_NOT_SANE;
5332
5333# ifndef DEBUG_sandervl
5334 gmmR0MutexRelease(pGMM);
5335# endif
5336 return rc;
5337#else
5338 RT_NOREF(pGVM, idCpu);
5339 return VERR_NOT_IMPLEMENTED;
5340#endif
5341}
5342
5343#if defined(VBOX_STRICT) && HC_ARCH_BITS == 64
5344
5345/**
5346 * Worker for GMMR0FindDuplicatePageReq.
5347 *
5348 * @returns true if duplicate, false if not.
5349 */
5350static bool gmmR0FindDupPageInChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, uint8_t const *pbSourcePage)
5351{
5352 bool fFoundDuplicate = false;
5353 /* Only take chunks not mapped into this VM process; not entirely correct. */
5354 uint8_t *pbChunk;
5355 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5356 {
5357 int rc = gmmR0MapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/, (PRTR3PTR)&pbChunk);
5358 if (RT_SUCCESS(rc))
5359 {
5360 /*
5361 * Look for duplicate pages
5362 */
5363 uintptr_t iPage = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
5364 while (iPage-- > 0)
5365 {
5366 if (GMM_PAGE_IS_PRIVATE(&pChunk->aPages[iPage]))
5367 {
5368 uint8_t *pbDestPage = pbChunk + (iPage << PAGE_SHIFT);
5369 if (!memcmp(pbSourcePage, pbDestPage, PAGE_SIZE))
5370 {
5371 fFoundDuplicate = true;
5372 break;
5373 }
5374 }
5375 }
5376 gmmR0UnmapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/);
5377 }
5378 }
5379 return fFoundDuplicate;
5380}
5381
5382
5383/**
5384 * Finds a duplicate of the specified page in other active VMs.
5385 *
5386 * @returns VBox status code.
5387 * @param pGVM The global (ring-0) VM structure.
5388 * @param pReq Pointer to the request packet.
5389 */
5390GMMR0DECL(int) GMMR0FindDuplicatePageReq(PGVM pGVM, PGMMFINDDUPLICATEPAGEREQ pReq)
5391{
5392 /*
5393 * Validate input and pass it on.
5394 */
5395 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5396 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5397
5398 PGMM pGMM;
5399 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5400
5401 int rc = GVMMR0ValidateGVM(pGVM);
5402 if (RT_FAILURE(rc))
5403 return rc;
5404
5405 /*
5406 * Take the semaphore and do some more validations.
5407 */
5408 rc = gmmR0MutexAcquire(pGMM);
5409 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5410 {
5411 uint8_t *pbChunk;
5412 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pReq->idPage >> GMM_CHUNKID_SHIFT);
5413 if (pChunk)
5414 {
5415 if (gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5416 {
5417 uint8_t *pbSourcePage = pbChunk + ((pReq->idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
5418 PGMMPAGE pPage = gmmR0GetPage(pGMM, pReq->idPage);
5419 if (pPage)
5420 {
5421 /*
5422 * Walk the chunks
5423 */
5424 pReq->fDuplicate = false;
5425 RTListForEach(&pGMM->ChunkList, pChunk, GMMCHUNK, ListNode)
5426 {
5427 if (gmmR0FindDupPageInChunk(pGMM, pGVM, pChunk, pbSourcePage))
5428 {
5429 pReq->fDuplicate = true;
5430 break;
5431 }
5432 }
5433 }
5434 else
5435 {
5436 AssertFailed();
5437 rc = VERR_PGM_PHYS_INVALID_PAGE_ID;
5438 }
5439 }
5440 else
5441 AssertFailed();
5442 }
5443 else
5444 AssertFailed();
5445 }
5446 else
5447 rc = VERR_GMM_IS_NOT_SANE;
5448
5449 gmmR0MutexRelease(pGMM);
5450 return rc;
5451}
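
/*
 * A minimal sketch (strict 64-bit builds only) of driving this request wrapper directly;
 * only the fields the function itself validates are shown, and the normal VMMR0 dispatch
 * path is assumed to have initialized the rest of the request header:
 * @code
 *      GMMFINDDUPLICATEPAGEREQ Req;
 *      RT_ZERO(Req);
 *      Req.Hdr.cbReq = sizeof(Req);
 *      Req.idPage    = idPageToCheck;   // placeholder page id
 *      int rc = GMMR0FindDuplicatePageReq(pGVM, &Req);
 *      if (RT_SUCCESS(rc) && Req.fDuplicate)
 *          Log(("Found a private page identical to %#x\n", idPageToCheck));
 * @endcode
 */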
5452
5453#endif /* VBOX_STRICT && HC_ARCH_BITS == 64 */
5454
5455
5456/**
5457 * Retrieves the GMM statistics visible to the caller.
5458 *
5459 * @returns VBox status code.
5460 *
5461 * @param pStats Where to put the statistics.
5462 * @param pSession The current session.
5463 * @param pGVM The GVM to obtain statistics for. Optional.
5464 */
5465GMMR0DECL(int) GMMR0QueryStatistics(PGMMSTATS pStats, PSUPDRVSESSION pSession, PGVM pGVM)
5466{
5467    LogFlow(("GMMR0QueryStatistics: pStats=%p pSession=%p pGVM=%p\n", pStats, pSession, pGVM));
5468
5469 /*
5470 * Validate input.
5471 */
5472 AssertPtrReturn(pSession, VERR_INVALID_POINTER);
5473 AssertPtrReturn(pStats, VERR_INVALID_POINTER);
5474    pStats->cMaxPages = 0; /* (Touch the buffer now so an invalid pointer crashes us before we take the mutex.) */
5475
5476 PGMM pGMM;
5477 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5478
5479 /*
5480 * Validate the VM handle, if not NULL, and lock the GMM.
5481 */
5482 int rc;
5483 if (pGVM)
5484 {
5485 rc = GVMMR0ValidateGVM(pGVM);
5486 if (RT_FAILURE(rc))
5487 return rc;
5488 }
5489
5490 rc = gmmR0MutexAcquire(pGMM);
5491 if (RT_FAILURE(rc))
5492 return rc;
5493
5494 /*
5495 * Copy out the GMM statistics.
5496 */
5497 pStats->cMaxPages = pGMM->cMaxPages;
5498 pStats->cReservedPages = pGMM->cReservedPages;
5499 pStats->cOverCommittedPages = pGMM->cOverCommittedPages;
5500 pStats->cAllocatedPages = pGMM->cAllocatedPages;
5501 pStats->cSharedPages = pGMM->cSharedPages;
5502 pStats->cDuplicatePages = pGMM->cDuplicatePages;
5503 pStats->cLeftBehindSharedPages = pGMM->cLeftBehindSharedPages;
5504 pStats->cBalloonedPages = pGMM->cBalloonedPages;
5505 pStats->cChunks = pGMM->cChunks;
5506 pStats->cFreedChunks = pGMM->cFreedChunks;
5507 pStats->cShareableModules = pGMM->cShareableModules;
5508 RT_ZERO(pStats->au64Reserved);
5509
5510 /*
5511 * Copy out the VM statistics.
5512 */
5513 if (pGVM)
5514 pStats->VMStats = pGVM->gmm.s.Stats;
5515 else
5516 RT_ZERO(pStats->VMStats);
5517
5518 gmmR0MutexRelease(pGMM);
5519 return rc;
5520}
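
/*
 * A minimal usage sketch, assuming a valid support driver session and, optionally, a valid
 * pGVM (pass NULL to get only the global statistics):
 * @code
 *      GMMSTATS Stats;
 *      int rc = GMMR0QueryStatistics(&Stats, pSession, pGVM);
 *      if (RT_SUCCESS(rc))
 *          Log(("GMM: %RU64 of %RU64 pages allocated, %RU64 shared\n",
 *               Stats.cAllocatedPages, Stats.cMaxPages, Stats.cSharedPages));
 * @endcode
 */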
5521
5522
5523/**
5524 * VMMR0 request wrapper for GMMR0QueryStatistics.
5525 *
5526 * @returns see GMMR0QueryStatistics.
5527 * @param pGVM The global (ring-0) VM structure. Optional.
5528 * @param pReq Pointer to the request packet.
5529 */
5530GMMR0DECL(int) GMMR0QueryStatisticsReq(PGVM pGVM, PGMMQUERYSTATISTICSSREQ pReq)
5531{
5532 /*
5533 * Validate input and pass it on.
5534 */
5535 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5536 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5537
5538 return GMMR0QueryStatistics(&pReq->Stats, pReq->pSession, pGVM);
5539}
5540
5541
5542/**
5543 * Resets the specified GMM statistics.
5544 *
5545 * @returns VBox status code.
5546 *
5547 * @param pStats Which statistics to reset; non-zero fields indicate the
5548 * ones to reset.
5549 * @param pSession The current session.
5550 * @param pGVM The GVM to reset statistics for. Optional.
5551 */
5552GMMR0DECL(int) GMMR0ResetStatistics(PCGMMSTATS pStats, PSUPDRVSESSION pSession, PGVM pGVM)
5553{
5554 NOREF(pStats); NOREF(pSession); NOREF(pGVM);
5555    /* Nothing that can be reset at the moment. */
5556 return VINF_SUCCESS;
5557}
5558
5559
5560/**
5561 * VMMR0 request wrapper for GMMR0ResetStatistics.
5562 *
5563 * @returns see GMMR0ResetStatistics.
5564 * @param pGVM The global (ring-0) VM structure. Optional.
5565 * @param pReq Pointer to the request packet.
5566 */
5567GMMR0DECL(int) GMMR0ResetStatisticsReq(PGVM pGVM, PGMMRESETSTATISTICSSREQ pReq)
5568{
5569 /*
5570 * Validate input and pass it on.
5571 */
5572 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5573 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5574
5575 return GMMR0ResetStatistics(&pReq->Stats, pReq->pSession, pGVM);
5576}
5577