VirtualBox

source: vbox/trunk/src/VBox/VMM/VMMR0/GMMR0.cpp@82976

Last change on this file since 82976 was 82976, checked in by vboxsync, 5 years ago

VMM/GMMR0: Use the chunk list rather than the AVL tree in GMMR0FindDuplicatePageReq to look for duplicate pages. This will restrict the AVL tree to lookups and make it simpler to protect. bugref:9627

  • Property svn:eol-style set to native
  • Property svn:keywords set to Id Revision
File size: 195.5 KB
 
1/* $Id: GMMR0.cpp 82976 2020-02-04 12:36:10Z vboxsync $ */
2/** @file
3 * GMM - Global Memory Manager.
4 */
5
6/*
7 * Copyright (C) 2007-2020 Oracle Corporation
8 *
9 * This file is part of VirtualBox Open Source Edition (OSE), as
10 * available from http://www.virtualbox.org. This file is free software;
11 * you can redistribute it and/or modify it under the terms of the GNU
12 * General Public License (GPL) as published by the Free Software
13 * Foundation, in version 2 as it comes in the "COPYING" file of the
14 * VirtualBox OSE distribution. VirtualBox OSE is distributed in the
15 * hope that it will be useful, but WITHOUT ANY WARRANTY of any kind.
16 */
17
18
19/** @page pg_gmm GMM - The Global Memory Manager
20 *
21 * As the name indicates, this component is responsible for global memory
22 * management. Currently only guest RAM is allocated from the GMM, but this
23 * may change to include shadow page tables and other bits later.
24 *
25 * Guest RAM is managed as individual pages, but allocated from the host OS
26 * in chunks for reasons of portability / efficiency. To minimize the memory
27 * footprint, all tracking structures must be as small as possible without
28 * unnecessary performance penalties.
29 *
30 * The allocation chunks have a fixed size, defined at compile time
31 * by the #GMM_CHUNK_SIZE \#define.
32 *
33 * Each chunk is given a unique ID. Each page also has a unique ID. The
34 * relationship between the two IDs is:
35 * @code
36 * GMM_CHUNK_SHIFT = log2(GMM_CHUNK_SIZE / PAGE_SIZE);
37 * idPage = (idChunk << GMM_CHUNK_SHIFT) | iPage;
38 * @endcode
39 * Where iPage is the index of the page within the chunk. This ID scheme
40 * permits efficient chunk and page lookup, but it relies on the chunk size
41 * to be set at compile time. The chunks are organized in an AVL tree with their
42 * IDs being the keys.
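 *
 * For illustration, the reverse lookup follows directly from the scheme
 * above (a sketch only; the local variable names are not from this file):
 * @code
 * idChunk = idPage >> GMM_CHUNK_SHIFT;
 * iPage   = idPage & ((1 << GMM_CHUNK_SHIFT) - 1);
 * @endcode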
43 *
44 * The physical address of each page in an allocation chunk is maintained by
45 * the #RTR0MEMOBJ and obtained using #RTR0MemObjGetPagePhysAddr. There is no
46 * need to duplicate this information (it'll cost 8-bytes per page if we did).
47 *
48 * So what do we need to track per page? Most importantly we need to know
49 * which state the page is in:
50 * - Private - Allocated for (eventually) backing one particular VM page.
51 * - Shared - Readonly page that is used by one or more VMs and treated
52 * as COW by PGM.
53 * - Free - Not used by anyone.
54 *
55 * For the page replacement operations (sharing, defragmenting and freeing)
56 * to be somewhat efficient, private pages need to be associated with a
57 * particular page in a particular VM.
58 *
59 * Tracking the usage of shared pages is impractical and expensive, so we'll
60 * settle for a reference counting system instead.
61 *
62 * Free pages will be chained on LIFOs
63 *
64 * On 64-bit systems we will use a 64-bit bitfield per page, while on 32-bit
65 * systems a 32-bit bitfield will have to suffice because of address space
66 * limitations. The #GMMPAGE structure shows the details.
67 *
68 *
69 * @section sec_gmm_alloc_strat Page Allocation Strategy
70 *
71 * The strategy for allocating pages has to take fragmentation and shared
72 * pages into account, or we may end up with 2000 chunks with only
73 * a few pages in each. Shared pages cannot easily be reallocated because
74 * of the inaccurate usage accounting (see above). Private pages can be
75 * reallocated by a defragmentation thread in the same manner that sharing
76 * is done.
77 *
78 * The first approach is to manage the free pages in two sets depending on
79 * whether they are mainly for the allocation of shared or private pages.
80 * In the initial implementation there will be almost no possibility for
81 * mixing shared and private pages in the same chunk (only if we're really
82 * stressed on memory), but when we implement forking of VMs and have to
83 * deal with lots of COW pages it'll start getting kind of interesting.
84 *
85 * The sets are lists of chunks with approximately the same number of
86 * free pages. Say the chunk size is 1MB, meaning 256 pages, and a set
87 * consists of 16 lists. So, the first list will contain the chunks with
88 * 1-7 free pages, the second covers 8-15, and so on. The chunks will be
89 * moved between the lists as pages are freed up or allocated.
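 *
 * With the 1-7 / 8-15 bucketing described above, picking the list for a
 * chunk is a simple division (illustrative only; the constants and helper
 * used by the actual code may differ):
 * @code
 * iList = pChunk->cFree / 8; // 1-7 -> list 0, 8-15 -> list 1, and so on
 * @endcode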
90 *
91 *
92 * @section sec_gmm_costs Costs
93 *
94 * The per page cost in kernel space is 32-bit plus whatever RTR0MEMOBJ
95 * entails. In addition there is the chunk cost of approximately
96 * (sizeof(RTR0MEMOBJ) + sizeof(CHUNK)) / 2^CHUNK_SHIFT bytes per page.
97 *
98 * On Windows the per page #RTR0MEMOBJ cost is 32-bit on 32-bit Windows
99 * and 64-bit on 64-bit Windows (a PFN_NUMBER in the MDL). So, 64-bit per page.
100 * The cost on Linux is identical, but here it's because of sizeof(struct page *).
101 *
102 *
103 * @section sec_gmm_legacy Legacy Mode for Non-Tier-1 Platforms
104 *
105 * In legacy mode the page source is locked user pages and not
106 * #RTR0MemObjAllocPhysNC, this means that a page can only be allocated
107 * by the VM that locked it. We will make no attempt at implementing
108 * page sharing on these systems, just do enough to make it all work.
109 *
110 * @note With 6.1 really dropping 32-bit support, the legacy mode is obsoleted
111 * under the assumption that there is sufficient kernel virtual address
112 * space to map all of the guest memory allocations. So, we'll be using
113 * #RTR0MemObjAllocPage on some platforms as an alternative to
114 * #RTR0MemObjAllocPhysNC.
115 *
116 *
117 * @subsection sub_gmm_locking Serializing
118 *
119 * One simple fast mutex will be employed in the initial implementation, not
120 * two as mentioned in @ref sec_pgmPhys_Serializing.
121 *
122 * @see @ref sec_pgmPhys_Serializing
123 *
124 *
125 * @section sec_gmm_overcommit Memory Over-Commitment Management
126 *
127 * The GVM will have to do the system wide memory over-commitment
128 * management. My current ideas are:
129 * - Per VM oc policy that indicates how much to initially commit
130 * to it and what to do in an out-of-memory situation.
131 * - Prevent overtaxing the host.
132 *
133 * There are some challenges here, the main ones are configurability and
134 * security. Should we for instance permit anyone to request 100% memory
135 * commitment? Who should be allowed to do runtime adjustments of the
136 * config? And how do we prevent these settings from being lost when the last
137 * VM process exits? The solution is probably to have an optional root
138 * daemon that will keep VMMR0.r0 in memory and enable the security measures.
139 *
140 *
141 *
142 * @section sec_gmm_numa NUMA
143 *
144 * NUMA considerations will be designed and implemented a bit later.
145 *
146 * The preliminary guess is that we will have to try to allocate memory as
147 * close as possible to the CPUs the VM is executed on (EMT and additional CPU
148 * threads). Which means it's mostly about allocation and sharing policies.
149 * Both the scheduler and allocator interface will have to supply some NUMA info
150 * and we'll need to have a way to calc access costs.
151 *
152 */
153
154
155/*********************************************************************************************************************************
156* Header Files *
157*********************************************************************************************************************************/
158#define LOG_GROUP LOG_GROUP_GMM
159#include <VBox/rawpci.h>
160#include <VBox/vmm/gmm.h>
161#include "GMMR0Internal.h"
162#include <VBox/vmm/vmcc.h>
163#include <VBox/vmm/pgm.h>
164#include <VBox/log.h>
165#include <VBox/param.h>
166#include <VBox/err.h>
167#include <VBox/VMMDev.h>
168#include <iprt/asm.h>
169#include <iprt/avl.h>
170#ifdef VBOX_STRICT
171# include <iprt/crc.h>
172#endif
173#include <iprt/critsect.h>
174#include <iprt/list.h>
175#include <iprt/mem.h>
176#include <iprt/memobj.h>
177#include <iprt/mp.h>
178#include <iprt/semaphore.h>
179#include <iprt/string.h>
180#include <iprt/time.h>
181
182
183/*********************************************************************************************************************************
184* Defined Constants And Macros *
185*********************************************************************************************************************************/
186/** @def VBOX_USE_CRIT_SECT_FOR_GIANT
187 * Use a critical section instead of a fast mutex for the giant GMM lock.
188 *
189 * @remarks This is primarily a way of avoiding the deadlock checks in the
190 * windows driver verifier. */
191#if defined(RT_OS_WINDOWS) || defined(RT_OS_DARWIN) || defined(DOXYGEN_RUNNING)
192# define VBOX_USE_CRIT_SECT_FOR_GIANT
193#endif
194
195#if (!defined(VBOX_WITH_RAM_IN_KERNEL) || defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)) \
196 && !defined(RT_OS_DARWIN)
197/** Enable the legacy mode code (will be dropped soon). */
198# define GMM_WITH_LEGACY_MODE
199#endif
200
201
202/*********************************************************************************************************************************
203* Structures and Typedefs *
204*********************************************************************************************************************************/
205/** Pointer to set of free chunks. */
206typedef struct GMMCHUNKFREESET *PGMMCHUNKFREESET;
207
208/**
209 * The per-page tracking structure employed by the GMM.
210 *
211 * On 32-bit hosts some trickery is necessary to compress all
212 * the information into 32 bits. When the fSharedFree member is set,
213 * the 30th bit decides whether it's a free page or not.
214 *
215 * Because of the different layout on 32-bit and 64-bit hosts, macros
216 * are used to get and set some of the data.
217 */
218typedef union GMMPAGE
219{
220#if HC_ARCH_BITS == 64
221 /** Unsigned integer view. */
222 uint64_t u;
223
224 /** The common view. */
225 struct GMMPAGECOMMON
226 {
227 uint32_t uStuff1 : 32;
228 uint32_t uStuff2 : 30;
229 /** The page state. */
230 uint32_t u2State : 2;
231 } Common;
232
233 /** The view of a private page. */
234 struct GMMPAGEPRIVATE
235 {
236 /** The guest page frame number. (Max addressable: 2 ^ 44 - 16) */
237 uint32_t pfn;
238 /** The GVM handle. (64K VMs) */
239 uint32_t hGVM : 16;
240 /** Reserved. */
241 uint32_t u16Reserved : 14;
242 /** The page state. */
243 uint32_t u2State : 2;
244 } Private;
245
246 /** The view of a shared page. */
247 struct GMMPAGESHARED
248 {
249 /** The host page frame number. (Max addressable: 2 ^ 44 - 16) */
250 uint32_t pfn;
251 /** The reference count (64K VMs). */
252 uint32_t cRefs : 16;
253 /** Used for debug checksumming. */
254 uint32_t u14Checksum : 14;
255 /** The page state. */
256 uint32_t u2State : 2;
257 } Shared;
258
259 /** The view of a free page. */
260 struct GMMPAGEFREE
261 {
262 /** The index of the next page in the free list. UINT16_MAX is NIL. */
263 uint16_t iNext;
264 /** Reserved. Checksum or something? */
265 uint16_t u16Reserved0;
266 /** Reserved. Checksum or something? */
267 uint32_t u30Reserved1 : 30;
268 /** The page state. */
269 uint32_t u2State : 2;
270 } Free;
271
272#else /* 32-bit */
273 /** Unsigned integer view. */
274 uint32_t u;
275
276 /** The common view. */
277 struct GMMPAGECOMMON
278 {
279 uint32_t uStuff : 30;
280 /** The page state. */
281 uint32_t u2State : 2;
282 } Common;
283
284 /** The view of a private page. */
285 struct GMMPAGEPRIVATE
286 {
287 /** The guest page frame number. (Max addressable: 2 ^ 36) */
288 uint32_t pfn : 24;
289 /** The GVM handle. (127 VMs) */
290 uint32_t hGVM : 7;
291 /** The top page state bit, MBZ. */
292 uint32_t fZero : 1;
293 } Private;
294
295 /** The view of a shared page. */
296 struct GMMPAGESHARED
297 {
298 /** The reference count. */
299 uint32_t cRefs : 30;
300 /** The page state. */
301 uint32_t u2State : 2;
302 } Shared;
303
304 /** The view of a free page. */
305 struct GMMPAGEFREE
306 {
307 /** The index of the next page in the free list. UINT16_MAX is NIL. */
308 uint32_t iNext : 16;
309 /** Reserved. Checksum or something? */
310 uint32_t u14Reserved : 14;
311 /** The page state. */
312 uint32_t u2State : 2;
313 } Free;
314#endif
315} GMMPAGE;
316AssertCompileSize(GMMPAGE, sizeof(RTHCUINTPTR));
317/** Pointer to a GMMPAGE. */
318typedef GMMPAGE *PGMMPAGE;
319
320
321/** @name The Page States.
322 * @{ */
323/** A private page. */
324#define GMM_PAGE_STATE_PRIVATE 0
325/** A private page - alternative value used on the 32-bit implementation.
326 * This will never be used on 64-bit hosts. */
327#define GMM_PAGE_STATE_PRIVATE_32 1
328/** A shared page. */
329#define GMM_PAGE_STATE_SHARED 2
330/** A free page. */
331#define GMM_PAGE_STATE_FREE 3
332/** @} */
333
334
335/** @def GMM_PAGE_IS_PRIVATE
336 *
337 * @returns true if private, false if not.
338 * @param pPage The GMM page.
339 */
340#if HC_ARCH_BITS == 64
341# define GMM_PAGE_IS_PRIVATE(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_PRIVATE )
342#else
343# define GMM_PAGE_IS_PRIVATE(pPage) ( (pPage)->Private.fZero == 0 )
344#endif
345
346/** @def GMM_PAGE_IS_SHARED
347 *
348 * @returns true if shared, false if not.
349 * @param pPage The GMM page.
350 */
351#define GMM_PAGE_IS_SHARED(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_SHARED )
352
353/** @def GMM_PAGE_IS_FREE
354 *
355 * @returns true if free, false if not.
356 * @param pPage The GMM page.
357 */
358#define GMM_PAGE_IS_FREE(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_FREE )
359
360/** @def GMM_PAGE_PFN_LAST
361 * The last valid guest pfn range.
362 * @remark Some of the values outside the range have special meaning,
363 * see GMM_PAGE_PFN_UNSHAREABLE.
364 */
365#if HC_ARCH_BITS == 64
366# define GMM_PAGE_PFN_LAST UINT32_C(0xfffffff0)
367#else
368# define GMM_PAGE_PFN_LAST UINT32_C(0x00fffff0)
369#endif
370AssertCompile(GMM_PAGE_PFN_LAST == (GMM_GCPHYS_LAST >> PAGE_SHIFT));
371
372/** @def GMM_PAGE_PFN_UNSHAREABLE
373 * Indicates that this page isn't used for normal guest memory and thus isn't shareable.
374 */
375#if HC_ARCH_BITS == 64
376# define GMM_PAGE_PFN_UNSHAREABLE UINT32_C(0xfffffff1)
377#else
378# define GMM_PAGE_PFN_UNSHAREABLE UINT32_C(0x00fffff1)
379#endif
380AssertCompile(GMM_PAGE_PFN_UNSHAREABLE == (GMM_GCPHYS_UNSHAREABLE >> PAGE_SHIFT));
381
382
383/**
384 * A GMM allocation chunk ring-3 mapping record.
385 *
386 * This should really be associated with a session and not a VM, but
387 * it's simpler to associate it with a VM and clean up when the VM object
388 * is destroyed.
389 */
390typedef struct GMMCHUNKMAP
391{
392 /** The mapping object. */
393 RTR0MEMOBJ hMapObj;
394 /** The VM owning the mapping. */
395 PGVM pGVM;
396} GMMCHUNKMAP;
397/** Pointer to a GMM allocation chunk mapping. */
398typedef struct GMMCHUNKMAP *PGMMCHUNKMAP;
399
400
401/**
402 * A GMM allocation chunk.
403 */
404typedef struct GMMCHUNK
405{
406 /** The AVL node core.
407 * The Key is the chunk ID. (Giant mtx.) */
408 AVLU32NODECORE Core;
409 /** The memory object.
410 * Either from RTR0MemObjAllocPhysNC or RTR0MemObjLockUser depending on
411 * what the host can dish up with. (Chunk mtx protects mapping accesses
412 * and related frees.) */
413 RTR0MEMOBJ hMemObj;
414#if defined(VBOX_WITH_RAM_IN_KERNEL) && !defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)
415 /** Pointer to the kernel mapping. */
416 uint8_t *pbMapping;
417#endif
418 /** Pointer to the next chunk in the free list. (Giant mtx.) */
419 PGMMCHUNK pFreeNext;
420 /** Pointer to the previous chunk in the free list. (Giant mtx.) */
421 PGMMCHUNK pFreePrev;
422 /** Pointer to the free set this chunk belongs to. NULL for
423 * chunks with no free pages. (Giant mtx.) */
424 PGMMCHUNKFREESET pSet;
425 /** List node in the chunk list (GMM::ChunkList). (Giant mtx.) */
426 RTLISTNODE ListNode;
427 /** Pointer to an array of mappings. (Chunk mtx.) */
428 PGMMCHUNKMAP paMappingsX;
429 /** The number of mappings. (Chunk mtx.) */
430 uint16_t cMappingsX;
431 /** The mapping lock this chunk is using. UINT8_MAX if nobody is
432 * mapping or freeing anything. (Giant mtx.) */
433 uint8_t volatile iChunkMtx;
434 /** GMM_CHUNK_FLAGS_XXX. (Giant mtx.) */
435 uint8_t fFlags;
436 /** The head of the list of free pages. UINT16_MAX is the NIL value.
437 * (Giant mtx.) */
438 uint16_t iFreeHead;
439 /** The number of free pages. (Giant mtx.) */
440 uint16_t cFree;
441 /** The GVM handle of the VM that first allocated pages from this chunk, this
442 * is used as a preference when there are several chunks to choose from.
443 * When in bound memory mode this isn't a preference any longer. (Giant
444 * mtx.) */
445 uint16_t hGVM;
446 /** The ID of the NUMA node the memory mostly resides on. (Reserved for
447 * future use.) (Giant mtx.) */
448 uint16_t idNumaNode;
449 /** The number of private pages. (Giant mtx.) */
450 uint16_t cPrivate;
451 /** The number of shared pages. (Giant mtx.) */
452 uint16_t cShared;
453 /** The pages. (Giant mtx.) */
454 GMMPAGE aPages[GMM_CHUNK_SIZE >> PAGE_SHIFT];
455} GMMCHUNK;
456
457/** Indicates that the NUMA properties of the memory are unknown. */
458#define GMM_CHUNK_NUMA_ID_UNKNOWN UINT16_C(0xfffe)
459
460/** @name GMM_CHUNK_FLAGS_XXX - chunk flags.
461 * @{ */
462/** Indicates that the chunk is a large page (2MB). */
463#define GMM_CHUNK_FLAGS_LARGE_PAGE UINT16_C(0x0001)
464#ifdef GMM_WITH_LEGACY_MODE
465/** Indicates that the chunk was locked rather than allocated directly. */
466# define GMM_CHUNK_FLAGS_SEEDED UINT16_C(0x0002)
467#endif
468/** @} */
469
470
471/**
472 * An allocation chunk TLB entry.
473 */
474typedef struct GMMCHUNKTLBE
475{
476 /** The chunk id. */
477 uint32_t idChunk;
478 /** Pointer to the chunk. */
479 PGMMCHUNK pChunk;
480} GMMCHUNKTLBE;
481/** Pointer to an allocation chunk TLB entry. */
482typedef GMMCHUNKTLBE *PGMMCHUNKTLBE;
483
484
485/** The number of entries in the allocation chunk TLB. */
486#define GMM_CHUNKTLB_ENTRIES 32
487/** Gets the TLB entry index for the given Chunk ID. */
488#define GMM_CHUNKTLB_IDX(idChunk) ( (idChunk) & (GMM_CHUNKTLB_ENTRIES - 1) )
489
490/**
491 * An allocation chunk TLB.
492 */
493typedef struct GMMCHUNKTLB
494{
495 /** The TLB entries. */
496 GMMCHUNKTLBE aEntries[GMM_CHUNKTLB_ENTRIES];
497} GMMCHUNKTLB;
498/** Pointer to an allocation chunk TLB. */
499typedef GMMCHUNKTLB *PGMMCHUNKTLB;
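
/*
 * Illustrative sketch (not part of the original file): how the direct-mapped
 * chunk TLB above is meant to be consulted before falling back to the AVL
 * tree of chunks.  The helper name is made up for this example and it takes
 * the TLB and tree root directly so the sketch stays self-contained; the
 * real lookup code in this file may differ in detail.
 */
#if 0 /* example only */
DECLINLINE(PGMMCHUNK) gmmR0ExampleChunkLookup(PGMMCHUNKTLB pTlb, PAVLU32NODECORE *ppChunkTree, uint32_t idChunk)
{
    /* Check the TLB entry the chunk ID hashes to. */
    PGMMCHUNKTLBE pTlbe = &pTlb->aEntries[GMM_CHUNKTLB_IDX(idChunk)];
    if (pTlbe->idChunk == idChunk)
        return pTlbe->pChunk;

    /* Miss: look the chunk up in the AVL tree (keyed by chunk ID) and refill the entry. */
    PGMMCHUNK pChunk = (PGMMCHUNK)RTAvlU32Get(ppChunkTree, idChunk);
    if (pChunk)
    {
        pTlbe->idChunk = idChunk;
        pTlbe->pChunk  = pChunk;
    }
    return pChunk;
}
#endif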
500
501
502/**
503 * The GMM instance data.
504 */
505typedef struct GMM
506{
507 /** Magic / eye catcher. GMM_MAGIC */
508 uint32_t u32Magic;
509 /** The number of threads waiting on the mutex. */
510 uint32_t cMtxContenders;
511#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
512 /** The critical section protecting the GMM.
513 * More fine grained locking can be implemented later if necessary. */
514 RTCRITSECT GiantCritSect;
515#else
516 /** The fast mutex protecting the GMM.
517 * More fine grained locking can be implemented later if necessary. */
518 RTSEMFASTMUTEX hMtx;
519#endif
520#ifdef VBOX_STRICT
521 /** The current mutex owner. */
522 RTNATIVETHREAD hMtxOwner;
523#endif
524 /** The chunk tree. */
525 PAVLU32NODECORE pChunks;
526 /** The chunk TLB. */
527 GMMCHUNKTLB ChunkTLB;
528 /** The private free set. */
529 GMMCHUNKFREESET PrivateX;
530 /** The shared free set. */
531 GMMCHUNKFREESET Shared;
532
533 /** Shared module tree (global).
534 * @todo separate trees for distinctly different guest OSes. */
535 PAVLLU32NODECORE pGlobalSharedModuleTree;
536 /** Sharable modules (count of nodes in pGlobalSharedModuleTree). */
537 uint32_t cShareableModules;
538
539 /** The chunk list. For simplifying the cleanup process and avoid tree
540 * traversal. */
541 RTLISTANCHOR ChunkList;
542
543 /** The maximum number of pages we're allowed to allocate.
544 * @gcfgm{GMM/MaxPages,64-bit, Direct.}
545 * @gcfgm{GMM/PctPages,32-bit, Relative to the number of host pages.} */
546 uint64_t cMaxPages;
547 /** The number of pages that have been reserved.
548 * The deal is that cReservedPages - cOverCommittedPages <= cMaxPages. */
549 uint64_t cReservedPages;
550 /** The number of pages that we have over-committed in reservations. */
551 uint64_t cOverCommittedPages;
552 /** The number of actually allocated (committed if you like) pages. */
553 uint64_t cAllocatedPages;
554 /** The number of pages that are shared. A subset of cAllocatedPages. */
555 uint64_t cSharedPages;
556 /** The number of pages that are actually shared between VMs. */
557 uint64_t cDuplicatePages;
558 /** The number of pages that are shared that have been left behind by
559 * VMs not doing proper cleanups. */
560 uint64_t cLeftBehindSharedPages;
561 /** The number of allocation chunks.
562 * (The number of pages we've allocated from the host can be derived from this.) */
563 uint32_t cChunks;
564 /** The number of current ballooned pages. */
565 uint64_t cBalloonedPages;
566
567#ifndef GMM_WITH_LEGACY_MODE
568# ifdef VBOX_WITH_LINEAR_HOST_PHYS_MEM
569 /** Whether #RTR0MemObjAllocPhysNC works. */
570 bool fHasWorkingAllocPhysNC;
571# else
572 bool fPadding;
573# endif
574#else
575 /** The legacy allocation mode indicator.
576 * This is determined at initialization time. */
577 bool fLegacyAllocationMode;
578#endif
579 /** The bound memory mode indicator.
580 * When set, the memory will be bound to a specific VM and never
581 * shared. This is always set if fLegacyAllocationMode is set.
582 * (Also determined at initialization time.) */
583 bool fBoundMemoryMode;
584 /** The number of registered VMs. */
585 uint16_t cRegisteredVMs;
586
587 /** The number of freed chunks ever. This is used as a list generation to
588 * avoid restarting the cleanup scanning when the list wasn't modified. */
589 uint32_t cFreedChunks;
590 /** The previous allocated Chunk ID.
591 * Used as a hint to avoid scanning the whole bitmap. */
592 uint32_t idChunkPrev;
593 /** Chunk ID allocation bitmap.
594 * Bits of allocated IDs are set, free ones are clear.
595 * The NIL id (0) is marked allocated. */
596 uint32_t bmChunkId[(GMM_CHUNKID_LAST + 1 + 31) / 32];
597
598 /** The index of the next mutex to use. */
599 uint32_t iNextChunkMtx;
600 /** Chunk locks for reducing lock contention without having to allocate
601 * one lock per chunk. */
602 struct
603 {
604 /** The mutex */
605 RTSEMFASTMUTEX hMtx;
606 /** The number of threads currently using this mutex. */
607 uint32_t volatile cUsers;
608 } aChunkMtx[64];
609} GMM;
610/** Pointer to the GMM instance. */
611typedef GMM *PGMM;
612
613/** The value of GMM::u32Magic (Katsuhiro Otomo). */
614#define GMM_MAGIC UINT32_C(0x19540414)
615
616
617/**
618 * GMM chunk mutex state.
619 *
620 * This is returned by gmmR0ChunkMutexAcquire and is used by the other
621 * gmmR0ChunkMutex* methods.
622 */
623typedef struct GMMR0CHUNKMTXSTATE
624{
625 PGMM pGMM;
626 /** The index of the chunk mutex. */
627 uint8_t iChunkMtx;
628 /** The relevant flags (GMMR0CHUNK_MTX_XXX). */
629 uint8_t fFlags;
630} GMMR0CHUNKMTXSTATE;
631/** Pointer to a chunk mutex state. */
632typedef GMMR0CHUNKMTXSTATE *PGMMR0CHUNKMTXSTATE;
633
634/** @name GMMR0CHUNK_MTX_XXX
635 * @{ */
636#define GMMR0CHUNK_MTX_INVALID UINT32_C(0)
637#define GMMR0CHUNK_MTX_KEEP_GIANT UINT32_C(1)
638#define GMMR0CHUNK_MTX_RETAKE_GIANT UINT32_C(2)
639#define GMMR0CHUNK_MTX_DROP_GIANT UINT32_C(3)
640#define GMMR0CHUNK_MTX_END UINT32_C(4)
641/** @} */
642
643
644/** The maximum number of shared modules per VM. */
645#define GMM_MAX_SHARED_PER_VM_MODULES 2048
646/** The maximum number of shared modules GMM is allowed to track. */
647#define GMM_MAX_SHARED_GLOBAL_MODULES 16834
648
649
650/**
651 * Argument packet for gmmR0SharedModuleCleanup.
652 */
653typedef struct GMMR0SHMODPERVMDTORARGS
654{
655 PGVM pGVM;
656 PGMM pGMM;
657} GMMR0SHMODPERVMDTORARGS;
658
659/**
660 * Argument packet for gmmR0CheckSharedModule.
661 */
662typedef struct GMMCHECKSHAREDMODULEINFO
663{
664 PGVM pGVM;
665 VMCPUID idCpu;
666} GMMCHECKSHAREDMODULEINFO;
667
668/**
669 * Argument packet for gmmR0FindDupPageInChunk by GMMR0FindDuplicatePage.
670 */
671typedef struct GMMFINDDUPPAGEINFO
672{
673 PGVM pGVM;
674 PGMM pGMM;
675 uint8_t *pSourcePage;
676 bool fFoundDuplicate;
677} GMMFINDDUPPAGEINFO;
678
679
680/*********************************************************************************************************************************
681* Global Variables *
682*********************************************************************************************************************************/
683/** Pointer to the GMM instance data. */
684static PGMM g_pGMM = NULL;
685
686/** Macro for obtaining and validating the g_pGMM pointer.
687 *
688 * On failure it will return from the invoking function with the specified
689 * return value.
690 *
691 * @param pGMM The name of the pGMM variable.
692 * @param rc The return value on failure. Use VERR_GMM_INSTANCE for VBox
693 * status codes.
694 */
695#define GMM_GET_VALID_INSTANCE(pGMM, rc) \
696 do { \
697 (pGMM) = g_pGMM; \
698 AssertPtrReturn((pGMM), (rc)); \
699 AssertMsgReturn((pGMM)->u32Magic == GMM_MAGIC, ("%p - %#x\n", (pGMM), (pGMM)->u32Magic), (rc)); \
700 } while (0)
701
702/** Macro for obtaining and validating the g_pGMM pointer, void function
703 * variant.
704 *
705 * On failure it will return from the invoking function.
706 *
707 * @param pGMM The name of the pGMM variable.
708 */
709#define GMM_GET_VALID_INSTANCE_VOID(pGMM) \
710 do { \
711 (pGMM) = g_pGMM; \
712 AssertPtrReturnVoid((pGMM)); \
713 AssertMsgReturnVoid((pGMM)->u32Magic == GMM_MAGIC, ("%p - %#x\n", (pGMM), (pGMM)->u32Magic)); \
714 } while (0)
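
/*
 * Usage sketch (not part of the original file): how a ring-0 entry point
 * typically starts out with the macros above.  The function name below is
 * made up for illustration.
 */
#if 0 /* example only */
GMMR0DECL(int) GMMR0ExampleOperation(PGVM pGVM)
{
    PGMM pGMM;
    GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE); /* returns from this function on a bad instance */
    NOREF(pGVM);

    gmmR0MutexAcquire(pGMM);
    /* ... the actual work, done under the giant GMM lock ... */
    gmmR0MutexRelease(pGMM);
    return VINF_SUCCESS;
}
#endif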
715
716
717/** @def GMM_CHECK_SANITY_UPON_ENTERING
718 * Checks the sanity of the GMM instance data before making changes.
719 *
720 * This macro is a stub by default and must be enabled manually in the code.
721 *
722 * @returns true if sane, false if not.
723 * @param pGMM The name of the pGMM variable.
724 */
725#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
726# define GMM_CHECK_SANITY_UPON_ENTERING(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
727#else
728# define GMM_CHECK_SANITY_UPON_ENTERING(pGMM) (true)
729#endif
730
731/** @def GMM_CHECK_SANITY_UPON_LEAVING
732 * Checks the sanity of the GMM instance data after making changes.
733 *
734 * This macro is a stub by default and must be enabled manually in the code.
735 *
736 * @returns true if sane, false if not.
737 * @param pGMM The name of the pGMM variable.
738 */
739#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
740# define GMM_CHECK_SANITY_UPON_LEAVING(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
741#else
742# define GMM_CHECK_SANITY_UPON_LEAVING(pGMM) (true)
743#endif
744
745/** @def GMM_CHECK_SANITY_IN_LOOPS
746 * Checks the sanity of the GMM instance in the allocation loops.
747 *
748 * This macro is a stub by default and must be enabled manually in the code.
749 *
750 * @returns true if sane, false if not.
751 * @param pGMM The name of the pGMM variable.
752 */
753#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
754# define GMM_CHECK_SANITY_IN_LOOPS(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
755#else
756# define GMM_CHECK_SANITY_IN_LOOPS(pGMM) (true)
757#endif
758
759
760/*********************************************************************************************************************************
761* Internal Functions *
762*********************************************************************************************************************************/
763static DECLCALLBACK(int) gmmR0TermDestroyChunk(PAVLU32NODECORE pNode, void *pvGMM);
764static bool gmmR0CleanupVMScanChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
765DECLINLINE(void) gmmR0UnlinkChunk(PGMMCHUNK pChunk);
766DECLINLINE(void) gmmR0LinkChunk(PGMMCHUNK pChunk, PGMMCHUNKFREESET pSet);
767DECLINLINE(void) gmmR0SelectSetAndLinkChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
768#ifdef GMMR0_WITH_SANITY_CHECK
769static uint32_t gmmR0SanityCheck(PGMM pGMM, const char *pszFunction, unsigned uLineNo);
770#endif
771static bool gmmR0FreeChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem);
772DECLINLINE(void) gmmR0FreePrivatePage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage);
773DECLINLINE(void) gmmR0FreeSharedPage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage);
774static int gmmR0UnmapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
775#ifdef VBOX_WITH_PAGE_SHARING
776static void gmmR0SharedModuleCleanup(PGMM pGMM, PGVM pGVM);
777# ifdef VBOX_STRICT
778static uint32_t gmmR0StrictPageChecksum(PGMM pGMM, PGVM pGVM, uint32_t idPage);
779# endif
780#endif
781
782
783
784/**
785 * Initializes the GMM component.
786 *
787 * This is called when the VMMR0.r0 module is loaded and protected by the
788 * loader semaphore.
789 *
790 * @returns VBox status code.
791 */
792GMMR0DECL(int) GMMR0Init(void)
793{
794 LogFlow(("GMMInit:\n"));
795
796 /*
797 * Allocate the instance data and the locks.
798 */
799 PGMM pGMM = (PGMM)RTMemAllocZ(sizeof(*pGMM));
800 if (!pGMM)
801 return VERR_NO_MEMORY;
802
803 pGMM->u32Magic = GMM_MAGIC;
804 for (unsigned i = 0; i < RT_ELEMENTS(pGMM->ChunkTLB.aEntries); i++)
805 pGMM->ChunkTLB.aEntries[i].idChunk = NIL_GMM_CHUNKID;
806 RTListInit(&pGMM->ChunkList);
807 ASMBitSet(&pGMM->bmChunkId[0], NIL_GMM_CHUNKID);
808
809#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
810 int rc = RTCritSectInit(&pGMM->GiantCritSect);
811#else
812 int rc = RTSemFastMutexCreate(&pGMM->hMtx);
813#endif
814 if (RT_SUCCESS(rc))
815 {
816 unsigned iMtx;
817 for (iMtx = 0; iMtx < RT_ELEMENTS(pGMM->aChunkMtx); iMtx++)
818 {
819 rc = RTSemFastMutexCreate(&pGMM->aChunkMtx[iMtx].hMtx);
820 if (RT_FAILURE(rc))
821 break;
822 }
823 if (RT_SUCCESS(rc))
824 {
825#ifndef GMM_WITH_LEGACY_MODE
826 /*
827 * Figure out how we're going to allocate stuff (only applicable to
828 * hosts with linear physical memory mappings).
829 */
830 pGMM->fBoundMemoryMode = false;
831# ifdef VBOX_WITH_LINEAR_HOST_PHYS_MEM
832 pGMM->fHasWorkingAllocPhysNC = false;
833
834 RTR0MEMOBJ hMemObj;
835 rc = RTR0MemObjAllocPhysNC(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS);
836 if (RT_SUCCESS(rc))
837 {
838 rc = RTR0MemObjFree(hMemObj, true);
839 AssertRC(rc);
840 pGMM->fHasWorkingAllocPhysNC = true;
841 }
842 else if (rc != VERR_NOT_SUPPORTED)
843 SUPR0Printf("GMMR0Init: Warning! RTR0MemObjAllocPhysNC(, %u, NIL_RTHCPHYS) -> %d!\n", GMM_CHUNK_SIZE, rc);
844# endif
845#else /* GMM_WITH_LEGACY_MODE */
846 /*
847 * Check and see if RTR0MemObjAllocPhysNC works.
848 */
849# if 0 /* later, see @bugref{3170}. */
850 RTR0MEMOBJ MemObj;
851 rc = RTR0MemObjAllocPhysNC(&MemObj, _64K, NIL_RTHCPHYS);
852 if (RT_SUCCESS(rc))
853 {
854 rc = RTR0MemObjFree(MemObj, true);
855 AssertRC(rc);
856 }
857 else if (rc == VERR_NOT_SUPPORTED)
858 pGMM->fLegacyAllocationMode = pGMM->fBoundMemoryMode = true;
859 else
860 SUPR0Printf("GMMR0Init: RTR0MemObjAllocPhysNC(,64K,Any) -> %d!\n", rc);
861# else
862# if defined(RT_OS_WINDOWS) || (defined(RT_OS_SOLARIS) && ARCH_BITS == 64) || defined(RT_OS_LINUX) || defined(RT_OS_FREEBSD)
863 pGMM->fLegacyAllocationMode = false;
864# if ARCH_BITS == 32
865 /* Don't reuse possibly partial chunks because of the virtual
866 address space limitation. */
867 pGMM->fBoundMemoryMode = true;
868# else
869 pGMM->fBoundMemoryMode = false;
870# endif
871# else
872 pGMM->fLegacyAllocationMode = true;
873 pGMM->fBoundMemoryMode = true;
874# endif
875# endif
876#endif /* GMM_WITH_LEGACY_MODE */
877
878 /*
879 * Query system page count and guess a reasonable cMaxPages value.
880 */
881 pGMM->cMaxPages = UINT32_MAX; /** @todo IPRT function for query ram size and such. */
882
883 g_pGMM = pGMM;
884#ifdef GMM_WITH_LEGACY_MODE
885 LogFlow(("GMMInit: pGMM=%p fLegacyAllocationMode=%RTbool fBoundMemoryMode=%RTbool\n", pGMM, pGMM->fLegacyAllocationMode, pGMM->fBoundMemoryMode));
886#elif defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)
887 LogFlow(("GMMInit: pGMM=%p fBoundMemoryMode=%RTbool fHasWorkingAllocPhysNC=%RTbool\n", pGMM, pGMM->fBoundMemoryMode, pGMM->fHasWorkingAllocPhysNC));
888#else
889 LogFlow(("GMMInit: pGMM=%p fBoundMemoryMode=%RTbool\n", pGMM, pGMM->fBoundMemoryMode));
890#endif
891 return VINF_SUCCESS;
892 }
893
894 /*
895 * Bail out.
896 */
897 while (iMtx-- > 0)
898 RTSemFastMutexDestroy(pGMM->aChunkMtx[iMtx].hMtx);
899#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
900 RTCritSectDelete(&pGMM->GiantCritSect);
901#else
902 RTSemFastMutexDestroy(pGMM->hMtx);
903#endif
904 }
905
906 pGMM->u32Magic = 0;
907 RTMemFree(pGMM);
908 SUPR0Printf("GMMR0Init: failed! rc=%d\n", rc);
909 return rc;
910}
911
912
913/**
914 * Terminates the GMM component.
915 */
916GMMR0DECL(void) GMMR0Term(void)
917{
918 LogFlow(("GMMTerm:\n"));
919
920 /*
921 * Take care / be paranoid...
922 */
923 PGMM pGMM = g_pGMM;
924 if (!VALID_PTR(pGMM))
925 return;
926 if (pGMM->u32Magic != GMM_MAGIC)
927 {
928 SUPR0Printf("GMMR0Term: u32Magic=%#x\n", pGMM->u32Magic);
929 return;
930 }
931
932 /*
933 * Undo what init did and free all the resources we've acquired.
934 */
935 /* Destroy the fundamentals. */
936 g_pGMM = NULL;
937 pGMM->u32Magic = ~GMM_MAGIC;
938#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
939 RTCritSectDelete(&pGMM->GiantCritSect);
940#else
941 RTSemFastMutexDestroy(pGMM->hMtx);
942 pGMM->hMtx = NIL_RTSEMFASTMUTEX;
943#endif
944
945 /* Free any chunks still hanging around. */
946 RTAvlU32Destroy(&pGMM->pChunks, gmmR0TermDestroyChunk, pGMM);
947
948 /* Destroy the chunk locks. */
949 for (unsigned iMtx = 0; iMtx < RT_ELEMENTS(pGMM->aChunkMtx); iMtx++)
950 {
951 Assert(pGMM->aChunkMtx[iMtx].cUsers == 0);
952 RTSemFastMutexDestroy(pGMM->aChunkMtx[iMtx].hMtx);
953 pGMM->aChunkMtx[iMtx].hMtx = NIL_RTSEMFASTMUTEX;
954 }
955
956 /* Finally the instance data itself. */
957 RTMemFree(pGMM);
958 LogFlow(("GMMTerm: done\n"));
959}
960
961
962/**
963 * RTAvlU32Destroy callback.
964 *
965 * @returns 0
966 * @param pNode The node to destroy.
967 * @param pvGMM The GMM handle.
968 */
969static DECLCALLBACK(int) gmmR0TermDestroyChunk(PAVLU32NODECORE pNode, void *pvGMM)
970{
971 PGMMCHUNK pChunk = (PGMMCHUNK)pNode;
972
973 if (pChunk->cFree != (GMM_CHUNK_SIZE >> PAGE_SHIFT))
974 SUPR0Printf("GMMR0Term: %RKv/%#x: cFree=%d cPrivate=%d cShared=%d cMappings=%d\n", pChunk,
975 pChunk->Core.Key, pChunk->cFree, pChunk->cPrivate, pChunk->cShared, pChunk->cMappingsX);
976
977 int rc = RTR0MemObjFree(pChunk->hMemObj, true /* fFreeMappings */);
978 if (RT_FAILURE(rc))
979 {
980 SUPR0Printf("GMMR0Term: %RKv/%#x: RTRMemObjFree(%RKv,true) -> %d (cMappings=%d)\n", pChunk,
981 pChunk->Core.Key, pChunk->hMemObj, rc, pChunk->cMappingsX);
982 AssertRC(rc);
983 }
984 pChunk->hMemObj = NIL_RTR0MEMOBJ;
985
986 RTMemFree(pChunk->paMappingsX);
987 pChunk->paMappingsX = NULL;
988
989 RTMemFree(pChunk);
990 NOREF(pvGMM);
991 return 0;
992}
993
994
995/**
996 * Initializes the per-VM data for the GMM.
997 *
998 * This is called from within the GVMM lock (from GVMMR0CreateVM)
999 * and should only initialize the data members so GMMR0CleanupVM
1000 * can deal with them. We reserve no memory or anything here,
1001 * that's done later in GMMR0InitVM.
1002 *
1003 * @param pGVM Pointer to the Global VM structure.
1004 */
1005GMMR0DECL(void) GMMR0InitPerVMData(PGVM pGVM)
1006{
1007 AssertCompile(RT_SIZEOFMEMB(GVM,gmm.s) <= RT_SIZEOFMEMB(GVM,gmm.padding));
1008
1009 pGVM->gmm.s.Stats.enmPolicy = GMMOCPOLICY_INVALID;
1010 pGVM->gmm.s.Stats.enmPriority = GMMPRIORITY_INVALID;
1011 pGVM->gmm.s.Stats.fMayAllocate = false;
1012}
1013
1014
1015/**
1016 * Acquires the GMM giant lock.
1017 *
1018 * @returns Assert status code from RTSemFastMutexRequest.
1019 * @param pGMM Pointer to the GMM instance.
1020 */
1021static int gmmR0MutexAcquire(PGMM pGMM)
1022{
1023 ASMAtomicIncU32(&pGMM->cMtxContenders);
1024#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1025 int rc = RTCritSectEnter(&pGMM->GiantCritSect);
1026#else
1027 int rc = RTSemFastMutexRequest(pGMM->hMtx);
1028#endif
1029 ASMAtomicDecU32(&pGMM->cMtxContenders);
1030 AssertRC(rc);
1031#ifdef VBOX_STRICT
1032 pGMM->hMtxOwner = RTThreadNativeSelf();
1033#endif
1034 return rc;
1035}
1036
1037
1038/**
1039 * Releases the GMM giant lock.
1040 *
1041 * @returns Assert status code from RTSemFastMutexRequest.
1042 * @param pGMM Pointer to the GMM instance.
1043 */
1044static int gmmR0MutexRelease(PGMM pGMM)
1045{
1046#ifdef VBOX_STRICT
1047 pGMM->hMtxOwner = NIL_RTNATIVETHREAD;
1048#endif
1049#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1050 int rc = RTCritSectLeave(&pGMM->GiantCritSect);
1051#else
1052 int rc = RTSemFastMutexRelease(pGMM->hMtx);
1053 AssertRC(rc);
1054#endif
1055 return rc;
1056}
1057
1058
1059/**
1060 * Yields the GMM giant lock if there is contention and a certain minimum time
1061 * has elapsed since we took it.
1062 *
1063 * @returns @c true if the mutex was yielded, @c false if not.
1064 * @param pGMM Pointer to the GMM instance.
1065 * @param puLockNanoTS Where the lock acquisition time stamp is kept
1066 * (in/out).
1067 */
1068static bool gmmR0MutexYield(PGMM pGMM, uint64_t *puLockNanoTS)
1069{
1070 /*
1071 * If nobody is contending the mutex, don't bother checking the time.
1072 */
1073 if (ASMAtomicReadU32(&pGMM->cMtxContenders) == 0)
1074 return false;
1075
1076 /*
1077 * Don't yield if we haven't executed for at least 2 milliseconds.
1078 */
1079 uint64_t uNanoNow = RTTimeSystemNanoTS();
1080 if (uNanoNow - *puLockNanoTS < UINT32_C(2000000))
1081 return false;
1082
1083 /*
1084 * Yield the mutex.
1085 */
1086#ifdef VBOX_STRICT
1087 pGMM->hMtxOwner = NIL_RTNATIVETHREAD;
1088#endif
1089 ASMAtomicIncU32(&pGMM->cMtxContenders);
1090#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1091 int rc1 = RTCritSectLeave(&pGMM->GiantCritSect); AssertRC(rc1);
1092#else
1093 int rc1 = RTSemFastMutexRelease(pGMM->hMtx); AssertRC(rc1);
1094#endif
1095
1096 RTThreadYield();
1097
1098#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1099 int rc2 = RTCritSectEnter(&pGMM->GiantCritSect); AssertRC(rc2);
1100#else
1101 int rc2 = RTSemFastMutexRequest(pGMM->hMtx); AssertRC(rc2);
1102#endif
1103 *puLockNanoTS = RTTimeSystemNanoTS();
1104 ASMAtomicDecU32(&pGMM->cMtxContenders);
1105#ifdef VBOX_STRICT
1106 pGMM->hMtxOwner = RTThreadNativeSelf();
1107#endif
1108
1109 return true;
1110}
1111
1112
1113/**
1114 * Acquires a chunk lock.
1115 *
1116 * The caller must own the giant lock.
1117 *
1118 * @returns Assert status code from RTSemFastMutexRequest.
1119 * @param pMtxState The chunk mutex state info. (Avoids
1120 * passing the same flags and stuff around
1121 * for subsequent release and drop-giant
1122 * calls.)
1123 * @param pGMM Pointer to the GMM instance.
1124 * @param pChunk Pointer to the chunk.
1125 * @param fFlags Flags regarding the giant lock, GMMR0CHUNK_MTX_XXX.
1126 */
1127static int gmmR0ChunkMutexAcquire(PGMMR0CHUNKMTXSTATE pMtxState, PGMM pGMM, PGMMCHUNK pChunk, uint32_t fFlags)
1128{
1129 Assert(fFlags > GMMR0CHUNK_MTX_INVALID && fFlags < GMMR0CHUNK_MTX_END);
1130 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
1131
1132 pMtxState->pGMM = pGMM;
1133 pMtxState->fFlags = (uint8_t)fFlags;
1134
1135 /*
1136 * Get the lock index and reference the lock.
1137 */
1138 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
1139 uint32_t iChunkMtx = pChunk->iChunkMtx;
1140 if (iChunkMtx == UINT8_MAX)
1141 {
1142 iChunkMtx = pGMM->iNextChunkMtx++;
1143 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1144
1145 /* Try get an unused one... */
1146 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1147 {
1148 iChunkMtx = pGMM->iNextChunkMtx++;
1149 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1150 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1151 {
1152 iChunkMtx = pGMM->iNextChunkMtx++;
1153 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1154 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1155 {
1156 iChunkMtx = pGMM->iNextChunkMtx++;
1157 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1158 }
1159 }
1160 }
1161
1162 pChunk->iChunkMtx = iChunkMtx;
1163 }
1164 AssertCompile(RT_ELEMENTS(pGMM->aChunkMtx) < UINT8_MAX);
1165 pMtxState->iChunkMtx = (uint8_t)iChunkMtx;
1166 ASMAtomicIncU32(&pGMM->aChunkMtx[iChunkMtx].cUsers);
1167
1168 /*
1169 * Drop the giant?
1170 */
1171 if (fFlags != GMMR0CHUNK_MTX_KEEP_GIANT)
1172 {
1173 /** @todo GMM life cycle cleanup (we may race someone
1174 * destroying and cleaning up GMM)? */
1175 gmmR0MutexRelease(pGMM);
1176 }
1177
1178 /*
1179 * Take the chunk mutex.
1180 */
1181 int rc = RTSemFastMutexRequest(pGMM->aChunkMtx[iChunkMtx].hMtx);
1182 AssertRC(rc);
1183 return rc;
1184}
1185
1186
1187/**
1188 * Releases a chunk mutex acquired by gmmR0ChunkMutexAcquire, retaking the giant lock if requested.
1189 *
1190 * @returns Assert status code from RTSemFastMutexRequest.
1191 * @param pMtxState Pointer to the chunk mutex state.
1192 * @param pChunk Pointer to the chunk if it's still
1193 * alive, NULL if it isn't. This is used to deassociate
1194 * the chunk from the mutex on the way out so a new one
1195 * can be selected next time, thus avoiding contended
1196 * mutexes.
1197 */
1198static int gmmR0ChunkMutexRelease(PGMMR0CHUNKMTXSTATE pMtxState, PGMMCHUNK pChunk)
1199{
1200 PGMM pGMM = pMtxState->pGMM;
1201
1202 /*
1203 * Release the chunk mutex and reacquire the giant if requested.
1204 */
1205 int rc = RTSemFastMutexRelease(pGMM->aChunkMtx[pMtxState->iChunkMtx].hMtx);
1206 AssertRC(rc);
1207 if (pMtxState->fFlags == GMMR0CHUNK_MTX_RETAKE_GIANT)
1208 rc = gmmR0MutexAcquire(pGMM);
1209 else
1210 Assert((pMtxState->fFlags != GMMR0CHUNK_MTX_DROP_GIANT) == (pGMM->hMtxOwner == RTThreadNativeSelf()));
1211
1212 /*
1213 * Drop the chunk mutex user reference and deassociate it from the chunk
1214 * when possible.
1215 */
1216 if ( ASMAtomicDecU32(&pGMM->aChunkMtx[pMtxState->iChunkMtx].cUsers) == 0
1217 && pChunk
1218 && RT_SUCCESS(rc) )
1219 {
1220 if (pMtxState->fFlags != GMMR0CHUNK_MTX_DROP_GIANT)
1221 pChunk->iChunkMtx = UINT8_MAX;
1222 else
1223 {
1224 rc = gmmR0MutexAcquire(pGMM);
1225 if (RT_SUCCESS(rc))
1226 {
1227 if (pGMM->aChunkMtx[pMtxState->iChunkMtx].cUsers == 0)
1228 pChunk->iChunkMtx = UINT8_MAX;
1229 rc = gmmR0MutexRelease(pGMM);
1230 }
1231 }
1232 }
1233
1234 pMtxState->pGMM = NULL;
1235 return rc;
1236}
1237
1238
1239/**
1240 * Drops the giant GMM lock we kept in gmmR0ChunkMutexAcquire while keeping the
1241 * chunk locked.
1242 *
1243 * This only works if gmmR0ChunkMutexAcquire was called with
1244 * GMMR0CHUNK_MTX_KEEP_GIANT. gmmR0ChunkMutexRelease will retake the giant
1245 * mutex, i.e. behave as if GMMR0CHUNK_MTX_RETAKE_GIANT was used.
1246 *
1247 * @returns VBox status code (assuming success is ok).
1248 * @param pMtxState Pointer to the chunk mutex state.
1249 */
1250static int gmmR0ChunkMutexDropGiant(PGMMR0CHUNKMTXSTATE pMtxState)
1251{
1252 AssertReturn(pMtxState->fFlags == GMMR0CHUNK_MTX_KEEP_GIANT, VERR_GMM_MTX_FLAGS);
1253 Assert(pMtxState->pGMM->hMtxOwner == RTThreadNativeSelf());
1254 pMtxState->fFlags = GMMR0CHUNK_MTX_RETAKE_GIANT;
1255 /** @todo GMM life cycle cleanup (we may race someone
1256 * destroying and cleaning up GMM)? */
1257 return gmmR0MutexRelease(pMtxState->pGMM);
1258}
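
/*
 * Usage sketch (not part of the original file): the typical pattern for the
 * chunk mutex helpers above when the caller wants to keep a chunk locked
 * while letting go of the giant lock.  The function name and parameters are
 * made up for illustration.
 */
#if 0 /* example only */
static void gmmR0ExampleChunkWork(PGMM pGMM, PGMMCHUNK pChunk)
{
    GMMR0CHUNKMTXSTATE MtxState;
    gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
    /* ... updates that need both the giant and the chunk lock ... */
    gmmR0ChunkMutexDropGiant(&MtxState);       /* the chunk stays locked, the giant is released */
    /* ... slower work that only needs the chunk lock, e.g. freeing a mapping ... */
    gmmR0ChunkMutexRelease(&MtxState, pChunk); /* retakes the giant (RETAKE semantics) before returning */
}
#endif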
1259
1260
1261/**
1262 * For experimenting with NUMA affinity and such.
1263 *
1264 * @returns The current NUMA Node ID.
1265 */
1266static uint16_t gmmR0GetCurrentNumaNodeId(void)
1267{
1268#if 1
1269 return GMM_CHUNK_NUMA_ID_UNKNOWN;
1270#else
1271 return RTMpCpuId() / 16;
1272#endif
1273}
1274
1275
1276
1277/**
1278 * Cleans up when a VM is terminating.
1279 *
1280 * @param pGVM Pointer to the Global VM structure.
1281 */
1282GMMR0DECL(void) GMMR0CleanupVM(PGVM pGVM)
1283{
1284 LogFlow(("GMMR0CleanupVM: pGVM=%p:{.hSelf=%#x}\n", pGVM, pGVM->hSelf));
1285
1286 PGMM pGMM;
1287 GMM_GET_VALID_INSTANCE_VOID(pGMM);
1288
1289#ifdef VBOX_WITH_PAGE_SHARING
1290 /*
1291 * Clean up all registered shared modules first.
1292 */
1293 gmmR0SharedModuleCleanup(pGMM, pGVM);
1294#endif
1295
1296 gmmR0MutexAcquire(pGMM);
1297 uint64_t uLockNanoTS = RTTimeSystemNanoTS();
1298 GMM_CHECK_SANITY_UPON_ENTERING(pGMM);
1299
1300 /*
1301 * The policy is 'INVALID' until the initial reservation
1302 * request has been serviced.
1303 */
1304 if ( pGVM->gmm.s.Stats.enmPolicy > GMMOCPOLICY_INVALID
1305 && pGVM->gmm.s.Stats.enmPolicy < GMMOCPOLICY_END)
1306 {
1307 /*
1308 * If it's the last VM around, we can skip walking all the chunks looking
1309 * for the pages owned by this VM and instead flush the whole shebang.
1310 *
1311 * This takes care of the eventuality that a VM has left shared page
1312 * references behind (shouldn't happen of course, but you never know).
1313 */
1314 Assert(pGMM->cRegisteredVMs);
1315 pGMM->cRegisteredVMs--;
1316
1317 /*
1318 * Walk the entire pool looking for pages that belong to this VM
1319 * and leftover mappings. (This'll only catch private pages,
1320 * shared pages will be 'left behind'.)
1321 */
1322 /** @todo r=bird: This scanning+freeing could be optimized in bound mode! */
1323 uint64_t cPrivatePages = pGVM->gmm.s.Stats.cPrivatePages; /* save */
1324
1325 unsigned iCountDown = 64;
1326 bool fRedoFromStart;
1327 PGMMCHUNK pChunk;
1328 do
1329 {
1330 fRedoFromStart = false;
1331 RTListForEachReverse(&pGMM->ChunkList, pChunk, GMMCHUNK, ListNode)
1332 {
1333 uint32_t const cFreeChunksOld = pGMM->cFreedChunks;
1334 if ( ( !pGMM->fBoundMemoryMode
1335 || pChunk->hGVM == pGVM->hSelf)
1336 && gmmR0CleanupVMScanChunk(pGMM, pGVM, pChunk))
1337 {
1338 /* We left the giant mutex, so reset the yield counters. */
1339 uLockNanoTS = RTTimeSystemNanoTS();
1340 iCountDown = 64;
1341 }
1342 else
1343 {
1344 /* Didn't leave it, so do normal yielding. */
1345 if (!iCountDown)
1346 gmmR0MutexYield(pGMM, &uLockNanoTS);
1347 else
1348 iCountDown--;
1349 }
1350 if (pGMM->cFreedChunks != cFreeChunksOld)
1351 {
1352 fRedoFromStart = true;
1353 break;
1354 }
1355 }
1356 } while (fRedoFromStart);
1357
1358 if (pGVM->gmm.s.Stats.cPrivatePages)
1359 SUPR0Printf("GMMR0CleanupVM: hGVM=%#x has %#x private pages that cannot be found!\n", pGVM->hSelf, pGVM->gmm.s.Stats.cPrivatePages);
1360
1361 pGMM->cAllocatedPages -= cPrivatePages;
1362
1363 /*
1364 * Free empty chunks.
1365 */
1366 PGMMCHUNKFREESET pPrivateSet = pGMM->fBoundMemoryMode ? &pGVM->gmm.s.Private : &pGMM->PrivateX;
1367 do
1368 {
1369 fRedoFromStart = false;
1370 iCountDown = 10240;
1371 pChunk = pPrivateSet->apLists[GMM_CHUNK_FREE_SET_UNUSED_LIST];
1372 while (pChunk)
1373 {
1374 PGMMCHUNK pNext = pChunk->pFreeNext;
1375 Assert(pChunk->cFree == GMM_CHUNK_NUM_PAGES);
1376 if ( !pGMM->fBoundMemoryMode
1377 || pChunk->hGVM == pGVM->hSelf)
1378 {
1379 uint64_t const idGenerationOld = pPrivateSet->idGeneration;
1380 if (gmmR0FreeChunk(pGMM, pGVM, pChunk, true /*fRelaxedSem*/))
1381 {
1382 /* We've left the giant mutex, restart? (+1 for our unlink) */
1383 fRedoFromStart = pPrivateSet->idGeneration != idGenerationOld + 1;
1384 if (fRedoFromStart)
1385 break;
1386 uLockNanoTS = RTTimeSystemNanoTS();
1387 iCountDown = 10240;
1388 }
1389 }
1390
1391 /* Advance and maybe yield the lock. */
1392 pChunk = pNext;
1393 if (--iCountDown == 0)
1394 {
1395 uint64_t const idGenerationOld = pPrivateSet->idGeneration;
1396 fRedoFromStart = gmmR0MutexYield(pGMM, &uLockNanoTS)
1397 && pPrivateSet->idGeneration != idGenerationOld;
1398 if (fRedoFromStart)
1399 break;
1400 iCountDown = 10240;
1401 }
1402 }
1403 } while (fRedoFromStart);
1404
1405 /*
1406 * Account for shared pages that weren't freed.
1407 */
1408 if (pGVM->gmm.s.Stats.cSharedPages)
1409 {
1410 Assert(pGMM->cSharedPages >= pGVM->gmm.s.Stats.cSharedPages);
1411 SUPR0Printf("GMMR0CleanupVM: hGVM=%#x left %#x shared pages behind!\n", pGVM->hSelf, pGVM->gmm.s.Stats.cSharedPages);
1412 pGMM->cLeftBehindSharedPages += pGVM->gmm.s.Stats.cSharedPages;
1413 }
1414
1415 /*
1416 * Clean up balloon statistics in case the VM process crashed.
1417 */
1418 Assert(pGMM->cBalloonedPages >= pGVM->gmm.s.Stats.cBalloonedPages);
1419 pGMM->cBalloonedPages -= pGVM->gmm.s.Stats.cBalloonedPages;
1420
1421 /*
1422 * Update the over-commitment management statistics.
1423 */
1424 pGMM->cReservedPages -= pGVM->gmm.s.Stats.Reserved.cBasePages
1425 + pGVM->gmm.s.Stats.Reserved.cFixedPages
1426 + pGVM->gmm.s.Stats.Reserved.cShadowPages;
1427 switch (pGVM->gmm.s.Stats.enmPolicy)
1428 {
1429 case GMMOCPOLICY_NO_OC:
1430 break;
1431 default:
1432 /** @todo Update GMM->cOverCommittedPages */
1433 break;
1434 }
1435 }
1436
1437 /* zap the GVM data. */
1438 pGVM->gmm.s.Stats.enmPolicy = GMMOCPOLICY_INVALID;
1439 pGVM->gmm.s.Stats.enmPriority = GMMPRIORITY_INVALID;
1440 pGVM->gmm.s.Stats.fMayAllocate = false;
1441
1442 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1443 gmmR0MutexRelease(pGMM);
1444
1445 LogFlow(("GMMR0CleanupVM: returns\n"));
1446}
1447
1448
1449/**
1450 * Scan one chunk for private pages belonging to the specified VM.
1451 *
1452 * @note This function may drop the giant mutex!
1453 *
1454 * @returns @c true if we've temporarily dropped the giant mutex, @c false if
1455 * we didn't.
1456 * @param pGMM Pointer to the GMM instance.
1457 * @param pGVM The global VM handle.
1458 * @param pChunk The chunk to scan.
1459 */
1460static bool gmmR0CleanupVMScanChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
1461{
1462 Assert(!pGMM->fBoundMemoryMode || pChunk->hGVM == pGVM->hSelf);
1463
1464 /*
1465 * Look for pages belonging to the VM.
1466 * (Perform some internal checks while we're scanning.)
1467 */
1468#ifndef VBOX_STRICT
1469 if (pChunk->cFree != (GMM_CHUNK_SIZE >> PAGE_SHIFT))
1470#endif
1471 {
1472 unsigned cPrivate = 0;
1473 unsigned cShared = 0;
1474 unsigned cFree = 0;
1475
1476 gmmR0UnlinkChunk(pChunk); /* avoiding cFreePages updates. */
1477
1478 uint16_t hGVM = pGVM->hSelf;
1479 unsigned iPage = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
1480 while (iPage-- > 0)
1481 if (GMM_PAGE_IS_PRIVATE(&pChunk->aPages[iPage]))
1482 {
1483 if (pChunk->aPages[iPage].Private.hGVM == hGVM)
1484 {
1485 /*
1486 * Free the page.
1487 *
1488 * The reason for not using gmmR0FreePrivatePage here is that we
1489 * must *not* cause the chunk to be freed from under us - we're in
1490 * an AVL tree walk here.
1491 */
1492 pChunk->aPages[iPage].u = 0;
1493 pChunk->aPages[iPage].Free.iNext = pChunk->iFreeHead;
1494 pChunk->aPages[iPage].Free.u2State = GMM_PAGE_STATE_FREE;
1495 pChunk->iFreeHead = iPage;
1496 pChunk->cPrivate--;
1497 pChunk->cFree++;
1498 pGVM->gmm.s.Stats.cPrivatePages--;
1499 cFree++;
1500 }
1501 else
1502 cPrivate++;
1503 }
1504 else if (GMM_PAGE_IS_FREE(&pChunk->aPages[iPage]))
1505 cFree++;
1506 else
1507 cShared++;
1508
1509 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
1510
1511 /*
1512 * Did it add up?
1513 */
1514 if (RT_UNLIKELY( pChunk->cFree != cFree
1515 || pChunk->cPrivate != cPrivate
1516 || pChunk->cShared != cShared))
1517 {
1518 SUPR0Printf("gmmR0CleanupVMScanChunk: Chunk %RKv/%#x has bogus stats - free=%d/%d private=%d/%d shared=%d/%d\n",
1519 pChunk, pChunk->Core.Key, pChunk->cFree, cFree, pChunk->cPrivate, cPrivate, pChunk->cShared, cShared);
1520 pChunk->cFree = cFree;
1521 pChunk->cPrivate = cPrivate;
1522 pChunk->cShared = cShared;
1523 }
1524 }
1525
1526 /*
1527 * If not in bound memory mode, we should reset the hGVM field
1528 * if it has our handle in it.
1529 */
1530 if (pChunk->hGVM == pGVM->hSelf)
1531 {
1532 if (!g_pGMM->fBoundMemoryMode)
1533 pChunk->hGVM = NIL_GVM_HANDLE;
1534 else if (pChunk->cFree != GMM_CHUNK_NUM_PAGES)
1535 {
1536 SUPR0Printf("gmmR0CleanupVMScanChunk: %RKv/%#x: cFree=%#x - it should be 0 in bound mode!\n",
1537 pChunk, pChunk->Core.Key, pChunk->cFree);
1538 AssertMsgFailed(("%p/%#x: cFree=%#x - it should be 0 in bound mode!\n", pChunk, pChunk->Core.Key, pChunk->cFree));
1539
1540 gmmR0UnlinkChunk(pChunk);
1541 pChunk->cFree = GMM_CHUNK_NUM_PAGES;
1542 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
1543 }
1544 }
1545
1546 /*
1547 * Look for a mapping belonging to the terminating VM.
1548 */
1549 GMMR0CHUNKMTXSTATE MtxState;
1550 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
1551 unsigned cMappings = pChunk->cMappingsX;
1552 for (unsigned i = 0; i < cMappings; i++)
1553 if (pChunk->paMappingsX[i].pGVM == pGVM)
1554 {
1555 gmmR0ChunkMutexDropGiant(&MtxState);
1556
1557 RTR0MEMOBJ hMemObj = pChunk->paMappingsX[i].hMapObj;
1558
1559 cMappings--;
1560 if (i < cMappings)
1561 pChunk->paMappingsX[i] = pChunk->paMappingsX[cMappings];
1562 pChunk->paMappingsX[cMappings].pGVM = NULL;
1563 pChunk->paMappingsX[cMappings].hMapObj = NIL_RTR0MEMOBJ;
1564 Assert(pChunk->cMappingsX - 1U == cMappings);
1565 pChunk->cMappingsX = cMappings;
1566
1567 int rc = RTR0MemObjFree(hMemObj, false /* fFreeMappings (NA) */);
1568 if (RT_FAILURE(rc))
1569 {
1570 SUPR0Printf("gmmR0CleanupVMScanChunk: %RKv/%#x: mapping #%x: RTRMemObjFree(%RKv,false) -> %d \n",
1571 pChunk, pChunk->Core.Key, i, hMemObj, rc);
1572 AssertRC(rc);
1573 }
1574
1575 gmmR0ChunkMutexRelease(&MtxState, pChunk);
1576 return true;
1577 }
1578
1579 gmmR0ChunkMutexRelease(&MtxState, pChunk);
1580 return false;
1581}
1582
1583
1584/**
1585 * The initial resource reservations.
1586 *
1587 * This will make memory reservations according to policy and priority. If there aren't
1588 * sufficient resources available to sustain the VM, this function will fail and all
1589 * future allocation requests will fail as well.
1590 *
1591 * These are just the initial reservations made very very early during the VM creation
1592 * process and will be adjusted later in the GMMR0UpdateReservation call after the
1593 * ring-3 init has completed.
1594 *
1595 * @returns VBox status code.
1596 * @retval VERR_GMM_MEMORY_RESERVATION_DECLINED
1597 * @retval VERR_GMM_
1598 *
1599 * @param pGVM The global (ring-0) VM structure.
1600 * @param idCpu The VCPU id - must be zero.
1601 * @param cBasePages The number of pages that may be allocated for the base RAM and ROMs.
1602 * This does not include MMIO2 and similar.
1603 * @param cShadowPages The number of pages that may be allocated for shadow paging structures.
1604 * @param cFixedPages The number of pages that may be allocated for fixed objects like the
1605 * hyper heap, MMIO2 and similar.
1606 * @param enmPolicy The OC policy to use on this VM.
1607 * @param enmPriority The priority in an out-of-memory situation.
1608 *
1609 * @thread The creator thread / EMT(0).
1610 */
1611GMMR0DECL(int) GMMR0InitialReservation(PGVM pGVM, VMCPUID idCpu, uint64_t cBasePages, uint32_t cShadowPages,
1612 uint32_t cFixedPages, GMMOCPOLICY enmPolicy, GMMPRIORITY enmPriority)
1613{
1614 LogFlow(("GMMR0InitialReservation: pGVM=%p cBasePages=%#llx cShadowPages=%#x cFixedPages=%#x enmPolicy=%d enmPriority=%d\n",
1615 pGVM, cBasePages, cShadowPages, cFixedPages, enmPolicy, enmPriority));
1616
1617 /*
1618 * Validate, get basics and take the semaphore.
1619 */
1620 AssertReturn(idCpu == 0, VERR_INVALID_CPU_ID);
1621 PGMM pGMM;
1622 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
1623 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
1624 if (RT_FAILURE(rc))
1625 return rc;
1626
1627 AssertReturn(cBasePages, VERR_INVALID_PARAMETER);
1628 AssertReturn(cShadowPages, VERR_INVALID_PARAMETER);
1629 AssertReturn(cFixedPages, VERR_INVALID_PARAMETER);
1630 AssertReturn(enmPolicy > GMMOCPOLICY_INVALID && enmPolicy < GMMOCPOLICY_END, VERR_INVALID_PARAMETER);
1631 AssertReturn(enmPriority > GMMPRIORITY_INVALID && enmPriority < GMMPRIORITY_END, VERR_INVALID_PARAMETER);
1632
1633 gmmR0MutexAcquire(pGMM);
1634 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
1635 {
1636 if ( !pGVM->gmm.s.Stats.Reserved.cBasePages
1637 && !pGVM->gmm.s.Stats.Reserved.cFixedPages
1638 && !pGVM->gmm.s.Stats.Reserved.cShadowPages)
1639 {
1640 /*
1641 * Check if we can accommodate this.
1642 */
1643 /* ... later ... */
1644 if (RT_SUCCESS(rc))
1645 {
1646 /*
1647 * Update the records.
1648 */
1649 pGVM->gmm.s.Stats.Reserved.cBasePages = cBasePages;
1650 pGVM->gmm.s.Stats.Reserved.cFixedPages = cFixedPages;
1651 pGVM->gmm.s.Stats.Reserved.cShadowPages = cShadowPages;
1652 pGVM->gmm.s.Stats.enmPolicy = enmPolicy;
1653 pGVM->gmm.s.Stats.enmPriority = enmPriority;
1654 pGVM->gmm.s.Stats.fMayAllocate = true;
1655
1656 pGMM->cReservedPages += cBasePages + cFixedPages + cShadowPages;
1657 pGMM->cRegisteredVMs++;
1658 }
1659 }
1660 else
1661 rc = VERR_WRONG_ORDER;
1662 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1663 }
1664 else
1665 rc = VERR_GMM_IS_NOT_SANE;
1666 gmmR0MutexRelease(pGMM);
1667 LogFlow(("GMMR0InitialReservation: returns %Rrc\n", rc));
1668 return rc;
1669}
1670
1671
1672/**
1673 * VMMR0 request wrapper for GMMR0InitialReservation.
1674 *
1675 * @returns see GMMR0InitialReservation.
1676 * @param pGVM The global (ring-0) VM structure.
1677 * @param idCpu The VCPU id.
1678 * @param pReq Pointer to the request packet.
1679 */
1680GMMR0DECL(int) GMMR0InitialReservationReq(PGVM pGVM, VMCPUID idCpu, PGMMINITIALRESERVATIONREQ pReq)
1681{
1682 /*
1683 * Validate input and pass it on.
1684 */
1685 AssertPtrReturn(pGVM, VERR_INVALID_POINTER);
1686 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
1687 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
1688
1689 return GMMR0InitialReservation(pGVM, idCpu, pReq->cBasePages, pReq->cShadowPages,
1690 pReq->cFixedPages, pReq->enmPolicy, pReq->enmPriority);
1691}
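/*
 * Illustrative sketch only (not part of the build): how the ring-3 side might
 * fill in the request handled by the wrapper above.  The cbGuestRam, cShadow
 * and cFixed inputs as well as the GMMOCPOLICY_NO_OC / GMMPRIORITY_NORMAL
 * values are assumptions made for the example, not taken from this file.
 *
 *      GMMINITIALRESERVATIONREQ Req;
 *      Req.Hdr.cbReq    = sizeof(Req);                 // checked by the wrapper
 *      Req.cBasePages   = cbGuestRam >> PAGE_SHIFT;    // base RAM + ROMs, no MMIO2
 *      Req.cShadowPages = cShadow;                     // shadow paging structures
 *      Req.cFixedPages  = cFixed;                      // hyper heap, MMIO2 and similar
 *      Req.enmPolicy    = GMMOCPOLICY_NO_OC;           // assumed policy value
 *      Req.enmPriority  = GMMPRIORITY_NORMAL;          // assumed priority value
 *      int rc = GMMR0InitialReservationReq(pGVM, 0 /*idCpu*/, &Req);
 */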
1692
1693
1694/**
1695 * This updates the memory reservation with the additional MMIO2 and ROM pages.
1696 *
1697 * @returns VBox status code.
1698 * @retval VERR_GMM_MEMORY_RESERVATION_DECLINED
1699 *
1700 * @param pGVM The global (ring-0) VM structure.
1701 * @param idCpu The VCPU id.
1702 * @param cBasePages The number of pages that may be allocated for the base RAM and ROMs.
1703 * This does not include MMIO2 and similar.
1704 * @param cShadowPages The number of pages that may be allocated for shadow paging structures.
1705 * @param cFixedPages The number of pages that may be allocated for fixed objects like the
1706 * hyper heap, MMIO2 and similar.
1707 *
1708 * @thread EMT(idCpu)
1709 */
1710GMMR0DECL(int) GMMR0UpdateReservation(PGVM pGVM, VMCPUID idCpu, uint64_t cBasePages,
1711 uint32_t cShadowPages, uint32_t cFixedPages)
1712{
1713 LogFlow(("GMMR0UpdateReservation: pGVM=%p cBasePages=%#llx cShadowPages=%#x cFixedPages=%#x\n",
1714 pGVM, cBasePages, cShadowPages, cFixedPages));
1715
1716 /*
1717 * Validate, get basics and take the semaphore.
1718 */
1719 PGMM pGMM;
1720 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
1721 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
1722 if (RT_FAILURE(rc))
1723 return rc;
1724
1725 AssertReturn(cBasePages, VERR_INVALID_PARAMETER);
1726 AssertReturn(cShadowPages, VERR_INVALID_PARAMETER);
1727 AssertReturn(cFixedPages, VERR_INVALID_PARAMETER);
1728
1729 gmmR0MutexAcquire(pGMM);
1730 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
1731 {
1732 if ( pGVM->gmm.s.Stats.Reserved.cBasePages
1733 && pGVM->gmm.s.Stats.Reserved.cFixedPages
1734 && pGVM->gmm.s.Stats.Reserved.cShadowPages)
1735 {
1736 /*
1737 * Check if we can accommodate this.
1738 */
1739 /* ... later ... */
1740 if (RT_SUCCESS(rc))
1741 {
1742 /*
1743 * Update the records.
1744 */
1745 pGMM->cReservedPages -= pGVM->gmm.s.Stats.Reserved.cBasePages
1746 + pGVM->gmm.s.Stats.Reserved.cFixedPages
1747 + pGVM->gmm.s.Stats.Reserved.cShadowPages;
1748 pGMM->cReservedPages += cBasePages + cFixedPages + cShadowPages;
1749
1750 pGVM->gmm.s.Stats.Reserved.cBasePages = cBasePages;
1751 pGVM->gmm.s.Stats.Reserved.cFixedPages = cFixedPages;
1752 pGVM->gmm.s.Stats.Reserved.cShadowPages = cShadowPages;
1753 }
1754 }
1755 else
1756 rc = VERR_WRONG_ORDER;
1757 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1758 }
1759 else
1760 rc = VERR_GMM_IS_NOT_SANE;
1761 gmmR0MutexRelease(pGMM);
1762 LogFlow(("GMMR0UpdateReservation: returns %Rrc\n", rc));
1763 return rc;
1764}
1765
1766
1767/**
1768 * VMMR0 request wrapper for GMMR0UpdateReservation.
1769 *
1770 * @returns see GMMR0UpdateReservation.
1771 * @param pGVM The global (ring-0) VM structure.
1772 * @param idCpu The VCPU id.
1773 * @param pReq Pointer to the request packet.
1774 */
1775GMMR0DECL(int) GMMR0UpdateReservationReq(PGVM pGVM, VMCPUID idCpu, PGMMUPDATERESERVATIONREQ pReq)
1776{
1777 /*
1778 * Validate input and pass it on.
1779 */
1780 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
1781 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
1782
1783 return GMMR0UpdateReservation(pGVM, idCpu, pReq->cBasePages, pReq->cShadowPages, pReq->cFixedPages);
1784}
1785
1786#ifdef GMMR0_WITH_SANITY_CHECK
1787
1788/**
1789 * Performs sanity checks on a free set.
1790 *
1791 * @returns Error count.
1792 *
1793 * @param pGMM Pointer to the GMM instance.
1794 * @param pSet Pointer to the set.
1795 * @param pszSetName The set name.
1796 * @param pszFunction The function from which it was called.
1797 * @param uLineNo The line number.
1798 */
1799static uint32_t gmmR0SanityCheckSet(PGMM pGMM, PGMMCHUNKFREESET pSet, const char *pszSetName,
1800 const char *pszFunction, unsigned uLineNo)
1801{
1802 uint32_t cErrors = 0;
1803
1804 /*
1805 * Count the free pages in all the chunks and match it against pSet->cFreePages.
1806 */
1807 uint32_t cPages = 0;
1808 for (unsigned i = 0; i < RT_ELEMENTS(pSet->apLists); i++)
1809 {
1810 for (PGMMCHUNK pCur = pSet->apLists[i]; pCur; pCur = pCur->pFreeNext)
1811 {
1812 /** @todo check that the chunk is hashed into the right set. */
1813 cPages += pCur->cFree;
1814 }
1815 }
1816 if (RT_UNLIKELY(cPages != pSet->cFreePages))
1817 {
1818 SUPR0Printf("GMM insanity: found %#x pages in the %s set, expected %#x. (%s, line %u)\n",
1819 cPages, pszSetName, pSet->cFreePages, pszFunction, uLineNo);
1820 cErrors++;
1821 }
1822
1823 return cErrors;
1824}
1825
1826
1827/**
1828 * Performs some sanity checks on the GMM while owning the lock.
1829 *
1830 * @returns Error count.
1831 *
1832 * @param pGMM Pointer to the GMM instance.
1833 * @param pszFunction The function from which it is called.
1834 * @param uLineNo The line number.
1835 */
1836static uint32_t gmmR0SanityCheck(PGMM pGMM, const char *pszFunction, unsigned uLineNo)
1837{
1838 uint32_t cErrors = 0;
1839
1840 cErrors += gmmR0SanityCheckSet(pGMM, &pGMM->PrivateX, "private", pszFunction, uLineNo);
1841 cErrors += gmmR0SanityCheckSet(pGMM, &pGMM->Shared, "shared", pszFunction, uLineNo);
1842 /** @todo add more sanity checks. */
1843
1844 return cErrors;
1845}
1846
1847#endif /* GMMR0_WITH_SANITY_CHECK */
1848
1849/**
1850 * Looks up a chunk in the tree and fills in the TLB entry for it.
1851 *
1852 * This is not expected to fail and will bitch if it does.
1853 *
1854 * @returns Pointer to the allocation chunk, NULL if not found.
1855 * @param pGMM Pointer to the GMM instance.
1856 * @param idChunk The ID of the chunk to find.
1857 * @param pTlbe Pointer to the TLB entry.
1858 */
1859static PGMMCHUNK gmmR0GetChunkSlow(PGMM pGMM, uint32_t idChunk, PGMMCHUNKTLBE pTlbe)
1860{
1861 PGMMCHUNK pChunk = (PGMMCHUNK)RTAvlU32Get(&pGMM->pChunks, idChunk);
1862 AssertMsgReturn(pChunk, ("Chunk %#x not found!\n", idChunk), NULL);
1863 pTlbe->idChunk = idChunk;
1864 pTlbe->pChunk = pChunk;
1865 return pChunk;
1866}
1867
1868
1869/**
1870 * Finds an allocation chunk.
1871 *
1872 * This is not expected to fail and will bitch if it does.
1873 *
1874 * @returns Pointer to the allocation chunk, NULL if not found.
1875 * @param pGMM Pointer to the GMM instance.
1876 * @param idChunk The ID of the chunk to find.
1877 */
1878DECLINLINE(PGMMCHUNK) gmmR0GetChunk(PGMM pGMM, uint32_t idChunk)
1879{
1880 /*
1881 * Do a TLB lookup, branch if not in the TLB.
1882 */
1883 PGMMCHUNKTLBE pTlbe = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(idChunk)];
1884 if ( pTlbe->idChunk != idChunk
1885 || !pTlbe->pChunk)
1886 return gmmR0GetChunkSlow(pGMM, idChunk, pTlbe);
1887 return pTlbe->pChunk;
1888}
1889
1890
1891/**
1892 * Finds a page.
1893 *
1894 * This is not expected to fail and will bitch if it does.
1895 *
1896 * @returns Pointer to the page, NULL if not found.
1897 * @param pGMM Pointer to the GMM instance.
1898 * @param idPage The ID of the page to find.
1899 */
1900DECLINLINE(PGMMPAGE) gmmR0GetPage(PGMM pGMM, uint32_t idPage)
1901{
1902 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
1903 if (RT_LIKELY(pChunk))
1904 return &pChunk->aPages[idPage & GMM_PAGEID_IDX_MASK];
1905 return NULL;
1906}
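/*
 * Illustrative note (not part of the build): the page ID decomposition the two
 * helpers above rely on, written out as plain expressions:
 *
 *      uint32_t const idChunk = idPage >> GMM_CHUNKID_SHIFT;       // chunk ID part
 *      uint32_t const iPage   = idPage &  GMM_PAGEID_IDX_MASK;     // page index within the chunk
 *      PGMMPAGE const pPage   = &gmmR0GetChunk(pGMM, idChunk)->aPages[iPage]; // assumes the chunk exists
 */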
1907
1908
1909#if 0 /* unused */
1910/**
1911 * Gets the host physical address for a page given by its ID.
1912 *
1913 * @returns The host physical address or NIL_RTHCPHYS.
1914 * @param pGMM Pointer to the GMM instance.
1915 * @param idPage The ID of the page to find.
1916 */
1917DECLINLINE(RTHCPHYS) gmmR0GetPageHCPhys(PGMM pGMM, uint32_t idPage)
1918{
1919 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
1920 if (RT_LIKELY(pChunk))
1921 return RTR0MemObjGetPagePhysAddr(pChunk->hMemObj, idPage & GMM_PAGEID_IDX_MASK);
1922 return NIL_RTHCPHYS;
1923}
1924#endif /* unused */
1925
1926
1927/**
1928 * Selects the appropriate free list given the number of free pages.
1929 *
1930 * @returns Free list index.
1931 * @param cFree The number of free pages in the chunk.
1932 */
1933DECLINLINE(unsigned) gmmR0SelectFreeSetList(unsigned cFree)
1934{
1935 unsigned iList = cFree >> GMM_CHUNK_FREE_SET_SHIFT;
1936 AssertMsg(iList < RT_SIZEOFMEMB(GMMCHUNKFREESET, apLists) / RT_SIZEOFMEMB(GMMCHUNKFREESET, apLists[0]),
1937 ("%d (%u)\n", iList, cFree));
1938 return iList;
1939}
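/*
 * Illustrative note (not part of the build): gmmR0SelectFreeSetList buckets
 * chunks by their free page count.  For example, if GMM_CHUNK_FREE_SET_SHIFT
 * were 4 (an assumed value, not taken from this file), chunks with 0..15 free
 * pages would land on list 0, 16..31 on list 1, and so on, with completely
 * unused chunks ending up on GMM_CHUNK_FREE_SET_UNUSED_LIST (see the empty
 * chunk allocator further down).
 */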
1940
1941
1942/**
1943 * Unlinks the chunk from the free list it's currently on (if any).
1944 *
1945 * @param pChunk The allocation chunk.
1946 */
1947DECLINLINE(void) gmmR0UnlinkChunk(PGMMCHUNK pChunk)
1948{
1949 PGMMCHUNKFREESET pSet = pChunk->pSet;
1950 if (RT_LIKELY(pSet))
1951 {
1952 pSet->cFreePages -= pChunk->cFree;
1953 pSet->idGeneration++;
1954
1955 PGMMCHUNK pPrev = pChunk->pFreePrev;
1956 PGMMCHUNK pNext = pChunk->pFreeNext;
1957 if (pPrev)
1958 pPrev->pFreeNext = pNext;
1959 else
1960 pSet->apLists[gmmR0SelectFreeSetList(pChunk->cFree)] = pNext;
1961 if (pNext)
1962 pNext->pFreePrev = pPrev;
1963
1964 pChunk->pSet = NULL;
1965 pChunk->pFreeNext = NULL;
1966 pChunk->pFreePrev = NULL;
1967 }
1968 else
1969 {
1970 Assert(!pChunk->pFreeNext);
1971 Assert(!pChunk->pFreePrev);
1972 Assert(!pChunk->cFree);
1973 }
1974}
1975
1976
1977/**
1978 * Links the chunk onto the appropriate free list in the specified free set.
1979 *
1980 * If no free entries, it's not linked into any list.
1981 *
1982 * @param pChunk The allocation chunk.
1983 * @param pSet The free set.
1984 */
1985DECLINLINE(void) gmmR0LinkChunk(PGMMCHUNK pChunk, PGMMCHUNKFREESET pSet)
1986{
1987 Assert(!pChunk->pSet);
1988 Assert(!pChunk->pFreeNext);
1989 Assert(!pChunk->pFreePrev);
1990
1991 if (pChunk->cFree > 0)
1992 {
1993 pChunk->pSet = pSet;
1994 pChunk->pFreePrev = NULL;
1995 unsigned const iList = gmmR0SelectFreeSetList(pChunk->cFree);
1996 pChunk->pFreeNext = pSet->apLists[iList];
1997 if (pChunk->pFreeNext)
1998 pChunk->pFreeNext->pFreePrev = pChunk;
1999 pSet->apLists[iList] = pChunk;
2000
2001 pSet->cFreePages += pChunk->cFree;
2002 pSet->idGeneration++;
2003 }
2004}
2005
2006
2007/**
2008 * Selects the appropriate free set for the chunk (the per-VM private set in
2009 * bound memory mode, otherwise the global shared or private set) and links
2010 * the chunk onto it. If the chunk has no free entries, it's not linked into
2011 * any list.
2012 * @param pGMM Pointer to the GMM instance.
2013 * @param pGVM Pointer to the kernel-only VM instance data.
2014 * @param pChunk The allocation chunk.
2015 */
2016DECLINLINE(void) gmmR0SelectSetAndLinkChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
2017{
2018 PGMMCHUNKFREESET pSet;
2019 if (pGMM->fBoundMemoryMode)
2020 pSet = &pGVM->gmm.s.Private;
2021 else if (pChunk->cShared)
2022 pSet = &pGMM->Shared;
2023 else
2024 pSet = &pGMM->PrivateX;
2025 gmmR0LinkChunk(pChunk, pSet);
2026}
2027
2028
2029/**
2030 * Frees a Chunk ID.
2031 *
2032 * @param pGMM Pointer to the GMM instance.
2033 * @param idChunk The Chunk ID to free.
2034 */
2035static void gmmR0FreeChunkId(PGMM pGMM, uint32_t idChunk)
2036{
2037 AssertReturnVoid(idChunk != NIL_GMM_CHUNKID);
2038 AssertMsg(ASMBitTest(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk));
2039 ASMAtomicBitClear(&pGMM->bmChunkId[0], idChunk);
2040}
2041
2042
2043/**
2044 * Allocates a new Chunk ID.
2045 *
2046 * @returns The Chunk ID.
2047 * @param pGMM Pointer to the GMM instance.
2048 */
2049static uint32_t gmmR0AllocateChunkId(PGMM pGMM)
2050{
2051 AssertCompile(!((GMM_CHUNKID_LAST + 1) & 31)); /* must be a multiple of 32 */
2052 AssertCompile(NIL_GMM_CHUNKID == 0);
2053
2054 /*
2055 * Try the next sequential one.
2056 */
2057 int32_t idChunk = ++pGMM->idChunkPrev;
2058#if 0 /** @todo enable this code */
2059 if ( idChunk <= GMM_CHUNKID_LAST
2060 && idChunk > NIL_GMM_CHUNKID
2061 && !ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk))
2062 return idChunk;
2063#endif
2064
2065 /*
2066 * Scan sequentially from the last one.
2067 */
2068 if ( (uint32_t)idChunk < GMM_CHUNKID_LAST
2069 && idChunk > NIL_GMM_CHUNKID)
2070 {
2071 idChunk = ASMBitNextClear(&pGMM->bmChunkId[0], GMM_CHUNKID_LAST + 1, idChunk - 1);
2072 if (idChunk > NIL_GMM_CHUNKID)
2073 {
2074 AssertMsgReturn(!ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk), NIL_GMM_CHUNKID);
2075 return pGMM->idChunkPrev = idChunk;
2076 }
2077 }
2078
2079 /*
2080 * Ok, scan from the start.
2081 * We're not racing anyone, so there is no need to expect failures or have restart loops.
2082 */
2083 idChunk = ASMBitFirstClear(&pGMM->bmChunkId[0], GMM_CHUNKID_LAST + 1);
2084 AssertMsgReturn(idChunk > NIL_GMM_CHUNKID, ("%#x\n", idChunk), NIL_GMM_CHUNKID);
2085 AssertMsgReturn(!ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk), NIL_GMM_CHUNKID);
2086
2087 return pGMM->idChunkPrev = idChunk;
2088}
2089
2090
2091/**
2092 * Allocates one private page.
2093 *
2094 * Worker for gmmR0AllocatePages.
2095 *
2096 * @param pChunk The chunk to allocate it from.
2097 * @param hGVM The GVM handle of the VM requesting memory.
2098 * @param pPageDesc The page descriptor.
2099 */
2100static void gmmR0AllocatePage(PGMMCHUNK pChunk, uint32_t hGVM, PGMMPAGEDESC pPageDesc)
2101{
2102 /* update the chunk stats. */
2103 if (pChunk->hGVM == NIL_GVM_HANDLE)
2104 pChunk->hGVM = hGVM;
2105 Assert(pChunk->cFree);
2106 pChunk->cFree--;
2107 pChunk->cPrivate++;
2108
2109 /* unlink the first free page. */
2110 const uint32_t iPage = pChunk->iFreeHead;
2111 AssertReleaseMsg(iPage < RT_ELEMENTS(pChunk->aPages), ("%d\n", iPage));
2112 PGMMPAGE pPage = &pChunk->aPages[iPage];
2113 Assert(GMM_PAGE_IS_FREE(pPage));
2114 pChunk->iFreeHead = pPage->Free.iNext;
2115 Log3(("A pPage=%p iPage=%#x/%#x u2State=%d iFreeHead=%#x iNext=%#x\n",
2116 pPage, iPage, (pChunk->Core.Key << GMM_CHUNKID_SHIFT) | iPage,
2117 pPage->Common.u2State, pChunk->iFreeHead, pPage->Free.iNext));
2118
2119 /* make the page private. */
2120 pPage->u = 0;
2121 AssertCompile(GMM_PAGE_STATE_PRIVATE == 0);
2122 pPage->Private.hGVM = hGVM;
2123 AssertCompile(NIL_RTHCPHYS >= GMM_GCPHYS_LAST);
2124 AssertCompile(GMM_GCPHYS_UNSHAREABLE >= GMM_GCPHYS_LAST);
2125 if (pPageDesc->HCPhysGCPhys <= GMM_GCPHYS_LAST)
2126 pPage->Private.pfn = pPageDesc->HCPhysGCPhys >> PAGE_SHIFT;
2127 else
2128 pPage->Private.pfn = GMM_PAGE_PFN_UNSHAREABLE; /* unshareable / unassigned - same thing. */
2129
2130 /* update the page descriptor. */
2131 pPageDesc->HCPhysGCPhys = RTR0MemObjGetPagePhysAddr(pChunk->hMemObj, iPage);
2132 Assert(pPageDesc->HCPhysGCPhys != NIL_RTHCPHYS);
2133 pPageDesc->idPage = (pChunk->Core.Key << GMM_CHUNKID_SHIFT) | iPage;
2134 pPageDesc->idSharedPage = NIL_GMM_PAGEID;
2135}
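/*
 * Illustrative note (not part of the build): the Private.pfn value stored above
 * is just the guest physical address shifted right, so it can be turned back
 * into an address like this (assuming it isn't GMM_PAGE_PFN_UNSHAREABLE):
 *
 *      RTGCPHYS const GCPhys = (RTGCPHYS)pPage->Private.pfn << PAGE_SHIFT;
 */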
2136
2137
2138/**
2139 * Picks the free pages from a chunk.
2140 *
2141 * @returns The new page descriptor table index.
2142 * @param pChunk The chunk.
2143 * @param hGVM The affinity of the chunk. NIL_GVM_HANDLE for no
2144 * affinity.
2145 * @param iPage The current page descriptor table index.
2146 * @param cPages The total number of pages to allocate.
2147 * @param paPages The page descriptor table (input + output).
2148 */
2149static uint32_t gmmR0AllocatePagesFromChunk(PGMMCHUNK pChunk, uint16_t const hGVM, uint32_t iPage, uint32_t cPages,
2150 PGMMPAGEDESC paPages)
2151{
2152 PGMMCHUNKFREESET pSet = pChunk->pSet; Assert(pSet);
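    /* Temporarily unlink the chunk here so the gmmR0LinkChunk call below
       re-files it on the free list matching its updated cFree count. */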
2153 gmmR0UnlinkChunk(pChunk);
2154
2155 for (; pChunk->cFree && iPage < cPages; iPage++)
2156 gmmR0AllocatePage(pChunk, hGVM, &paPages[iPage]);
2157
2158 gmmR0LinkChunk(pChunk, pSet);
2159 return iPage;
2160}
2161
2162
2163/**
2164 * Registers a new chunk of memory.
2165 *
2166 * This is called by gmmR0AllocateChunkNew, GMMR0AllocateLargePage and GMMR0SeedChunk.
2167 *
2168 * @returns VBox status code. On success, the giant GMM lock will be held, the
2169 * caller must release it (ugly).
2170 * @param pGMM Pointer to the GMM instance.
2171 * @param pSet Pointer to the set.
2172 * @param hMemObj The memory object for the chunk.
2173 * @param hGVM The affinity of the chunk. NIL_GVM_HANDLE for no
2174 * affinity.
2175 * @param fChunkFlags The chunk flags, GMM_CHUNK_FLAGS_XXX.
2176 * @param ppChunk Chunk address (out). Optional.
2177 *
2178 * @remarks The caller must not own the giant GMM mutex.
2179 * The giant GMM mutex will be acquired and returned acquired in
2180 * the success path. On failure, no locks will be held.
2181 */
2182static int gmmR0RegisterChunk(PGMM pGMM, PGMMCHUNKFREESET pSet, RTR0MEMOBJ hMemObj, uint16_t hGVM, uint16_t fChunkFlags,
2183 PGMMCHUNK *ppChunk)
2184{
2185 Assert(pGMM->hMtxOwner != RTThreadNativeSelf());
2186 Assert(hGVM != NIL_GVM_HANDLE || pGMM->fBoundMemoryMode);
2187#ifdef GMM_WITH_LEGACY_MODE
2188 Assert(fChunkFlags == 0 || fChunkFlags == GMM_CHUNK_FLAGS_LARGE_PAGE || fChunkFlags == GMM_CHUNK_FLAGS_SEEDED);
2189#else
2190 Assert(fChunkFlags == 0 || fChunkFlags == GMM_CHUNK_FLAGS_LARGE_PAGE);
2191#endif
2192
2193#if defined(VBOX_WITH_RAM_IN_KERNEL) && !defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)
2194 /*
2195 * Get a ring-0 mapping of the object.
2196 */
2197# ifdef GMM_WITH_LEGACY_MODE
2198 uint8_t *pbMapping = !(fChunkFlags & GMM_CHUNK_FLAGS_SEEDED) ? (uint8_t *)RTR0MemObjAddress(hMemObj) : NULL;
2199# else
2200 uint8_t *pbMapping = (uint8_t *)RTR0MemObjAddress(hMemObj);
2201# endif
2202 if (!pbMapping)
2203 {
2204 RTR0MEMOBJ hMapObj;
2205 int rc = RTR0MemObjMapKernel(&hMapObj, hMemObj, (void *)-1, 0, RTMEM_PROT_READ | RTMEM_PROT_WRITE);
2206 if (RT_SUCCESS(rc))
2207 pbMapping = (uint8_t *)RTR0MemObjAddress(hMapObj);
2208 else
2209 return rc;
2210 AssertPtr(pbMapping);
2211 }
2212#endif
2213
2214 /*
2215 * Allocate a chunk.
2216 */
2217 int rc;
2218 PGMMCHUNK pChunk = (PGMMCHUNK)RTMemAllocZ(sizeof(*pChunk));
2219 if (pChunk)
2220 {
2221 /*
2222 * Initialize it.
2223 */
2224 pChunk->hMemObj = hMemObj;
2225#if defined(VBOX_WITH_RAM_IN_KERNEL) && !defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)
2226 pChunk->pbMapping = pbMapping;
2227#endif
2228 pChunk->cFree = GMM_CHUNK_NUM_PAGES;
2229 pChunk->hGVM = hGVM;
2230 /*pChunk->iFreeHead = 0;*/
2231 pChunk->idNumaNode = gmmR0GetCurrentNumaNodeId();
2232 pChunk->iChunkMtx = UINT8_MAX;
2233 pChunk->fFlags = fChunkFlags;
2234 for (unsigned iPage = 0; iPage < RT_ELEMENTS(pChunk->aPages) - 1; iPage++)
2235 {
2236 pChunk->aPages[iPage].Free.u2State = GMM_PAGE_STATE_FREE;
2237 pChunk->aPages[iPage].Free.iNext = iPage + 1;
2238 }
2239 pChunk->aPages[RT_ELEMENTS(pChunk->aPages) - 1].Free.u2State = GMM_PAGE_STATE_FREE;
2240 pChunk->aPages[RT_ELEMENTS(pChunk->aPages) - 1].Free.iNext = UINT16_MAX;
2241
2242 /*
2243 * Allocate a Chunk ID and insert it into the tree.
2244 * This has to be done behind the mutex of course.
2245 */
2246 rc = gmmR0MutexAcquire(pGMM);
2247 if (RT_SUCCESS(rc))
2248 {
2249 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2250 {
2251 pChunk->Core.Key = gmmR0AllocateChunkId(pGMM);
2252 if ( pChunk->Core.Key != NIL_GMM_CHUNKID
2253 && pChunk->Core.Key <= GMM_CHUNKID_LAST
2254 && RTAvlU32Insert(&pGMM->pChunks, &pChunk->Core))
2255 {
2256 pGMM->cChunks++;
2257 RTListAppend(&pGMM->ChunkList, &pChunk->ListNode);
2258 gmmR0LinkChunk(pChunk, pSet);
2259 LogFlow(("gmmR0RegisterChunk: pChunk=%p id=%#x cChunks=%d\n", pChunk, pChunk->Core.Key, pGMM->cChunks));
2260
2261 if (ppChunk)
2262 *ppChunk = pChunk;
2263 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
2264 return VINF_SUCCESS;
2265 }
2266
2267 /* bail out */
2268 rc = VERR_GMM_CHUNK_INSERT;
2269 }
2270 else
2271 rc = VERR_GMM_IS_NOT_SANE;
2272 gmmR0MutexRelease(pGMM);
2273 }
2274
2275 RTMemFree(pChunk);
2276 }
2277 else
2278 rc = VERR_NO_MEMORY;
2279 return rc;
2280}
2281
2282
2283/**
2284 * Allocates a new chunk, immediately picks the requested pages from it, and adds
2285 * what's remaining to the specified free set.
2286 *
2287 * @note This will leave the giant mutex while allocating the new chunk!
2288 *
2289 * @returns VBox status code.
2290 * @param pGMM Pointer to the GMM instance data.
2291 * @param pGVM Pointer to the kernel-only VM instance data.
2292 * @param pSet Pointer to the free set.
2293 * @param cPages The number of pages requested.
2294 * @param paPages The page descriptor table (input + output).
2295 * @param piPage The pointer to the page descriptor table index variable.
2296 * This will be updated.
2297 */
2298static int gmmR0AllocateChunkNew(PGMM pGMM, PGVM pGVM, PGMMCHUNKFREESET pSet, uint32_t cPages,
2299 PGMMPAGEDESC paPages, uint32_t *piPage)
2300{
2301 gmmR0MutexRelease(pGMM);
2302
2303 RTR0MEMOBJ hMemObj;
2304#ifndef GMM_WITH_LEGACY_MODE
2305 int rc;
2306# ifdef VBOX_WITH_LINEAR_HOST_PHYS_MEM
2307 if (pGMM->fHasWorkingAllocPhysNC)
2308 rc = RTR0MemObjAllocPhysNC(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS);
2309 else
2310# endif
2311 rc = RTR0MemObjAllocPage(&hMemObj, GMM_CHUNK_SIZE, false /*fExecutable*/);
2312#else
2313 int rc = RTR0MemObjAllocPhysNC(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS);
2314#endif
2315 if (RT_SUCCESS(rc))
2316 {
2317 /** @todo Duplicate gmmR0RegisterChunk here so we can avoid chaining up the
2318 * free pages first and then unchaining them right afterwards. Instead
2319 * do as much work as possible without holding the giant lock. */
2320 PGMMCHUNK pChunk;
2321 rc = gmmR0RegisterChunk(pGMM, pSet, hMemObj, pGVM->hSelf, 0 /*fChunkFlags*/, &pChunk);
2322 if (RT_SUCCESS(rc))
2323 {
2324 *piPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, *piPage, cPages, paPages);
2325 return VINF_SUCCESS;
2326 }
2327
2328 /* bail out */
2329 RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
2330 }
2331
2332 int rc2 = gmmR0MutexAcquire(pGMM);
2333 AssertRCReturn(rc2, RT_FAILURE(rc) ? rc : rc2);
2334 return rc;
2335
2336}
2337
2338
2339/**
2340 * As a last resort we'll pick any page we can get.
2341 *
2342 * @returns The new page descriptor table index.
2343 * @param pSet The set to pick from.
2344 * @param pGVM Pointer to the global VM structure.
2345 * @param iPage The current page descriptor table index.
2346 * @param cPages The total number of pages to allocate.
2347 * @param paPages The page descriptor table (input + output).
2348 */
2349static uint32_t gmmR0AllocatePagesIndiscriminately(PGMMCHUNKFREESET pSet, PGVM pGVM,
2350 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2351{
2352 unsigned iList = RT_ELEMENTS(pSet->apLists);
2353 while (iList-- > 0)
2354 {
2355 PGMMCHUNK pChunk = pSet->apLists[iList];
2356 while (pChunk)
2357 {
2358 PGMMCHUNK pNext = pChunk->pFreeNext;
2359
2360 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2361 if (iPage >= cPages)
2362 return iPage;
2363
2364 pChunk = pNext;
2365 }
2366 }
2367 return iPage;
2368}
2369
2370
2371/**
2372 * Pick pages from empty chunks on the same NUMA node.
2373 *
2374 * @returns The new page descriptor table index.
2375 * @param pSet The set to pick from.
2376 * @param pGVM Pointer to the global VM structure.
2377 * @param iPage The current page descriptor table index.
2378 * @param cPages The total number of pages to allocate.
2379 * @param paPages The page descriptor table (input + output).
2380 */
2381static uint32_t gmmR0AllocatePagesFromEmptyChunksOnSameNode(PGMMCHUNKFREESET pSet, PGVM pGVM,
2382 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2383{
2384 PGMMCHUNK pChunk = pSet->apLists[GMM_CHUNK_FREE_SET_UNUSED_LIST];
2385 if (pChunk)
2386 {
2387 uint16_t const idNumaNode = gmmR0GetCurrentNumaNodeId();
2388 while (pChunk)
2389 {
2390 PGMMCHUNK pNext = pChunk->pFreeNext;
2391
2392 if (pChunk->idNumaNode == idNumaNode)
2393 {
2394 pChunk->hGVM = pGVM->hSelf;
2395 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2396 if (iPage >= cPages)
2397 {
2398 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2399 return iPage;
2400 }
2401 }
2402
2403 pChunk = pNext;
2404 }
2405 }
2406 return iPage;
2407}
2408
2409
2410/**
2411 * Pick pages from non-empty chunks on the same NUMA node.
2412 *
2413 * @returns The new page descriptor table index.
2414 * @param pSet The set to pick from.
2415 * @param pGVM Pointer to the global VM structure.
2416 * @param iPage The current page descriptor table index.
2417 * @param cPages The total number of pages to allocate.
2418 * @param paPages The page descriptor table (input + output).
2419 */
2420static uint32_t gmmR0AllocatePagesFromSameNode(PGMMCHUNKFREESET pSet, PGVM pGVM,
2421 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2422{
2423 /** @todo start by picking from chunks with about the right size first? */
2424 uint16_t const idNumaNode = gmmR0GetCurrentNumaNodeId();
2425 unsigned iList = GMM_CHUNK_FREE_SET_UNUSED_LIST;
2426 while (iList-- > 0)
2427 {
2428 PGMMCHUNK pChunk = pSet->apLists[iList];
2429 while (pChunk)
2430 {
2431 PGMMCHUNK pNext = pChunk->pFreeNext;
2432
2433 if (pChunk->idNumaNode == idNumaNode)
2434 {
2435 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2436 if (iPage >= cPages)
2437 {
2438 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2439 return iPage;
2440 }
2441 }
2442
2443 pChunk = pNext;
2444 }
2445 }
2446 return iPage;
2447}
2448
2449
2450/**
2451 * Pick pages that are in chunks already associated with the VM.
2452 *
2453 * @returns The new page descriptor table index.
2454 * @param pGMM Pointer to the GMM instance data.
2455 * @param pGVM Pointer to the global VM structure.
2456 * @param pSet The set to pick from.
2457 * @param iPage The current page descriptor table index.
2458 * @param cPages The total number of pages to allocate.
2459 * @param paPages The page descriptor table (input + output).
2460 */
2461static uint32_t gmmR0AllocatePagesAssociatedWithVM(PGMM pGMM, PGVM pGVM, PGMMCHUNKFREESET pSet,
2462 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2463{
2464 uint16_t const hGVM = pGVM->hSelf;
2465
2466 /* Hint. */
2467 if (pGVM->gmm.s.idLastChunkHint != NIL_GMM_CHUNKID)
2468 {
2469 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pGVM->gmm.s.idLastChunkHint);
2470 if (pChunk && pChunk->cFree)
2471 {
2472 iPage = gmmR0AllocatePagesFromChunk(pChunk, hGVM, iPage, cPages, paPages);
2473 if (iPage >= cPages)
2474 return iPage;
2475 }
2476 }
2477
2478 /* Scan. */
2479 for (unsigned iList = 0; iList < RT_ELEMENTS(pSet->apLists); iList++)
2480 {
2481 PGMMCHUNK pChunk = pSet->apLists[iList];
2482 while (pChunk)
2483 {
2484 PGMMCHUNK pNext = pChunk->pFreeNext;
2485
2486 if (pChunk->hGVM == hGVM)
2487 {
2488 iPage = gmmR0AllocatePagesFromChunk(pChunk, hGVM, iPage, cPages, paPages);
2489 if (iPage >= cPages)
2490 {
2491 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2492 return iPage;
2493 }
2494 }
2495
2496 pChunk = pNext;
2497 }
2498 }
2499 return iPage;
2500}
2501
2502
2503
2504/**
2505 * Pick pages in bound memory mode.
2506 *
2507 * @returns The new page descriptor table index.
2508 * @param pGVM Pointer to the global VM structure.
2509 * @param iPage The current page descriptor table index.
2510 * @param cPages The total number of pages to allocate.
2511 * @param paPages The page descriptor table (input + output).
2512 */
2513static uint32_t gmmR0AllocatePagesInBoundMode(PGVM pGVM, uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2514{
2515 for (unsigned iList = 0; iList < RT_ELEMENTS(pGVM->gmm.s.Private.apLists); iList++)
2516 {
2517 PGMMCHUNK pChunk = pGVM->gmm.s.Private.apLists[iList];
2518 while (pChunk)
2519 {
2520 Assert(pChunk->hGVM == pGVM->hSelf);
2521 PGMMCHUNK pNext = pChunk->pFreeNext;
2522 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2523 if (iPage >= cPages)
2524 return iPage;
2525 pChunk = pNext;
2526 }
2527 }
2528 return iPage;
2529}
2530
2531
2532/**
2533 * Checks if we should start picking pages from chunks of other VMs because
2534 * we're getting close to the system memory limit or our reservation limit.
2535 *
2536 * @returns @c true if we should, @c false if we should first try to allocate
2537 * more chunks.
2538 */
2539static bool gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLimits(PGVM pGVM)
2540{
2541 /*
2542 * Don't allocate a new chunk if we're close to exhausting the VM's reservation.
2543 */
2544 uint64_t cPgReserved = pGVM->gmm.s.Stats.Reserved.cBasePages
2545 + pGVM->gmm.s.Stats.Reserved.cFixedPages
2546 - pGVM->gmm.s.Stats.cBalloonedPages
2547 /** @todo what about shared pages? */;
2548 uint64_t cPgAllocated = pGVM->gmm.s.Stats.Allocated.cBasePages
2549 + pGVM->gmm.s.Stats.Allocated.cFixedPages;
2550 uint64_t cPgDelta = cPgReserved - cPgAllocated;
2551 if (cPgDelta < GMM_CHUNK_NUM_PAGES * 4)
2552 return true;
2553 /** @todo make the threshold configurable, also test the code to see if
2554 * this ever kicks in (we might be reserving too much or something). */
2555
2556 /*
2557 * Check how close we are to the max memory limit and how many fragments
2558 * there are...
2559 */
2560 /** @todo */
2561
2562 return false;
2563}
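/*
 * Illustrative arithmetic (not part of the build): with 2 MB chunks (see the
 * AssertCompile(GMM_CHUNK_SIZE == _2M) further down) and assuming 4 KiB pages,
 * GMM_CHUNK_NUM_PAGES is 512, so the check above returns true once fewer than
 * 2048 reserved-but-unallocated pages (roughly 8 MB of headroom) remain.
 */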
2564
2565
2566/**
2567 * Checks if we should start picking pages from chunks of other VMs because
2568 * there is a lot of free pages around.
2569 *
2570 * @returns @c true if we should, @c false if we should first try to allocate more
2571 * chunks.
2572 */
2573static bool gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLotsFree(PGMM pGMM)
2574{
2575 /*
2576 * Setting the limit at 16 chunks (32 MB) at the moment.
2577 */
2578 if (pGMM->PrivateX.cFreePages >= GMM_CHUNK_NUM_PAGES * 16)
2579 return true;
2580 return false;
2581}
2582
2583
2584/**
2585 * Common worker for GMMR0AllocateHandyPages and GMMR0AllocatePages.
2586 *
2587 * @returns VBox status code:
2588 * @retval VINF_SUCCESS on success.
2589 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk or
2590 * gmmR0AllocateMoreChunks is necessary.
2591 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2592 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2593 * that is we're trying to allocate more than we've reserved.
2594 *
2595 * @param pGMM Pointer to the GMM instance data.
2596 * @param pGVM Pointer to the VM.
2597 * @param cPages The number of pages to allocate.
2598 * @param paPages Pointer to the page descriptors. See GMMPAGEDESC for
2599 * details on what is expected on input.
2600 * @param enmAccount The account to charge.
2601 *
2602 * @remarks The caller must hold the giant GMM lock.
2603 */
2604static int gmmR0AllocatePagesNew(PGMM pGMM, PGVM pGVM, uint32_t cPages, PGMMPAGEDESC paPages, GMMACCOUNT enmAccount)
2605{
2606 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
2607
2608 /*
2609 * Check allocation limits.
2610 */
2611 if (RT_UNLIKELY(pGMM->cAllocatedPages + cPages > pGMM->cMaxPages))
2612 return VERR_GMM_HIT_GLOBAL_LIMIT;
2613
2614 switch (enmAccount)
2615 {
2616 case GMMACCOUNT_BASE:
2617 if (RT_UNLIKELY( pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cPages
2618 > pGVM->gmm.s.Stats.Reserved.cBasePages))
2619 {
2620 Log(("gmmR0AllocatePages:Base: Reserved=%#llx Allocated+Ballooned+Requested=%#llx+%#llx+%#x!\n",
2621 pGVM->gmm.s.Stats.Reserved.cBasePages, pGVM->gmm.s.Stats.Allocated.cBasePages,
2622 pGVM->gmm.s.Stats.cBalloonedPages, cPages));
2623 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2624 }
2625 break;
2626 case GMMACCOUNT_SHADOW:
2627 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cShadowPages + cPages > pGVM->gmm.s.Stats.Reserved.cShadowPages))
2628 {
2629 Log(("gmmR0AllocatePages:Shadow: Reserved=%#x Allocated+Requested=%#x+%#x!\n",
2630 pGVM->gmm.s.Stats.Reserved.cShadowPages, pGVM->gmm.s.Stats.Allocated.cShadowPages, cPages));
2631 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2632 }
2633 break;
2634 case GMMACCOUNT_FIXED:
2635 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cFixedPages + cPages > pGVM->gmm.s.Stats.Reserved.cFixedPages))
2636 {
2637 Log(("gmmR0AllocatePages:Fixed: Reserved=%#x Allocated+Requested=%#x+%#x!\n",
2638 pGVM->gmm.s.Stats.Reserved.cFixedPages, pGVM->gmm.s.Stats.Allocated.cFixedPages, cPages));
2639 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2640 }
2641 break;
2642 default:
2643 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2644 }
2645
2646#ifdef GMM_WITH_LEGACY_MODE
2647 /*
2648 * If we're in legacy memory mode, it's easy to figure out up front whether
2649 * we have a sufficient number of pages.
2650 */
2651 if ( pGMM->fLegacyAllocationMode
2652 && pGVM->gmm.s.Private.cFreePages < cPages)
2653 {
2654 Assert(pGMM->fBoundMemoryMode);
2655 return VERR_GMM_SEED_ME;
2656 }
2657#endif
2658
2659 /*
2660 * Update the accounts before we proceed because we might be leaving the
2661 * protection of the global mutex and thus run the risk of permitting
2662 * too much memory to be allocated.
2663 */
2664 switch (enmAccount)
2665 {
2666 case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages += cPages; break;
2667 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages += cPages; break;
2668 case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages += cPages; break;
2669 default: AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2670 }
2671 pGVM->gmm.s.Stats.cPrivatePages += cPages;
2672 pGMM->cAllocatedPages += cPages;
2673
2674#ifdef GMM_WITH_LEGACY_MODE
2675 /*
2676 * Part two of it's-easy-in-legacy-memory-mode.
2677 */
2678 if (pGMM->fLegacyAllocationMode)
2679 {
2680 uint32_t iPage = gmmR0AllocatePagesInBoundMode(pGVM, 0, cPages, paPages);
2681 AssertReleaseReturn(iPage == cPages, VERR_GMM_ALLOC_PAGES_IPE);
2682 return VINF_SUCCESS;
2683 }
2684#endif
2685
2686 /*
2687 * Bound mode is also relatively straightforward.
2688 */
2689 uint32_t iPage = 0;
2690 int rc = VINF_SUCCESS;
2691 if (pGMM->fBoundMemoryMode)
2692 {
2693 iPage = gmmR0AllocatePagesInBoundMode(pGVM, iPage, cPages, paPages);
2694 if (iPage < cPages)
2695 do
2696 rc = gmmR0AllocateChunkNew(pGMM, pGVM, &pGVM->gmm.s.Private, cPages, paPages, &iPage);
2697 while (iPage < cPages && RT_SUCCESS(rc));
2698 }
2699 /*
2700 * Shared mode is trickier as we should try to achieve the same locality as
2701 * in bound mode, but smartly make use of non-full chunks allocated by
2702 * other VMs if we're low on memory.
2703 */
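    /* Rough order of preference used below: 1) chunks already associated with
       this VM, 2) when close to the limits, other chunks on the same NUMA node,
       3) empty chunks on the same NUMA node (private set, then shared set),
       4) when plenty of pages are free anyway, whatever we can find, 5) brand
       new chunks, and 6) as a last resort when the host is out of memory, any
       page from any set. */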
2704 else
2705 {
2706 /* Pick the most optimal pages first. */
2707 iPage = gmmR0AllocatePagesAssociatedWithVM(pGMM, pGVM, &pGMM->PrivateX, iPage, cPages, paPages);
2708 if (iPage < cPages)
2709 {
2710 /* Maybe we should try getting pages from chunks "belonging" to
2711 other VMs before allocating more chunks? */
2712 bool fTriedOnSameAlready = false;
2713 if (gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLimits(pGVM))
2714 {
2715 iPage = gmmR0AllocatePagesFromSameNode(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2716 fTriedOnSameAlready = true;
2717 }
2718
2719 /* Allocate memory from empty chunks. */
2720 if (iPage < cPages)
2721 iPage = gmmR0AllocatePagesFromEmptyChunksOnSameNode(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2722
2723 /* Grab empty shared chunks. */
2724 if (iPage < cPages)
2725 iPage = gmmR0AllocatePagesFromEmptyChunksOnSameNode(&pGMM->Shared, pGVM, iPage, cPages, paPages);
2726
2727 /* If there are a lot of free pages spread around, try not to waste
2728 system memory on more chunks. (Should trigger defragmentation.) */
2729 if ( !fTriedOnSameAlready
2730 && gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLotsFree(pGMM))
2731 {
2732 iPage = gmmR0AllocatePagesFromSameNode(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2733 if (iPage < cPages)
2734 iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2735 }
2736
2737 /*
2738 * OK, try to allocate new chunks.
2739 */
2740 if (iPage < cPages)
2741 {
2742 do
2743 rc = gmmR0AllocateChunkNew(pGMM, pGVM, &pGMM->PrivateX, cPages, paPages, &iPage);
2744 while (iPage < cPages && RT_SUCCESS(rc));
2745
2746 /* If the host is out of memory, take whatever we can get. */
2747 if ( (rc == VERR_NO_MEMORY || rc == VERR_NO_PHYS_MEMORY)
2748 && pGMM->PrivateX.cFreePages + pGMM->Shared.cFreePages >= cPages - iPage)
2749 {
2750 iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2751 if (iPage < cPages)
2752 iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->Shared, pGVM, iPage, cPages, paPages);
2753 AssertRelease(iPage == cPages);
2754 rc = VINF_SUCCESS;
2755 }
2756 }
2757 }
2758 }
2759
2760 /*
2761 * Clean up on failure. Since this is bound to be a low-memory condition
2762 * we will give back any empty chunks that might be hanging around.
2763 */
2764 if (RT_FAILURE(rc))
2765 {
2766 /* Update the statistics. */
2767 pGVM->gmm.s.Stats.cPrivatePages -= cPages;
2768 pGMM->cAllocatedPages -= cPages - iPage;
2769 switch (enmAccount)
2770 {
2771 case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages -= cPages; break;
2772 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages -= cPages; break;
2773 case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages -= cPages; break;
2774 default: AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2775 }
2776
2777 /* Release the pages. */
2778 while (iPage-- > 0)
2779 {
2780 uint32_t idPage = paPages[iPage].idPage;
2781 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
2782 if (RT_LIKELY(pPage))
2783 {
2784 Assert(GMM_PAGE_IS_PRIVATE(pPage));
2785 Assert(pPage->Private.hGVM == pGVM->hSelf);
2786 gmmR0FreePrivatePage(pGMM, pGVM, idPage, pPage);
2787 }
2788 else
2789 AssertMsgFailed(("idPage=%#x\n", idPage));
2790
2791 paPages[iPage].idPage = NIL_GMM_PAGEID;
2792 paPages[iPage].idSharedPage = NIL_GMM_PAGEID;
2793 paPages[iPage].HCPhysGCPhys = NIL_RTHCPHYS;
2794 }
2795
2796 /* Free empty chunks. */
2797 /** @todo */
2798
2799 /* Return the failure status. */
2800 return rc;
2801 }
2802 return VINF_SUCCESS;
2803}
2804
2805
2806/**
2807 * Updates the previous allocations and allocates more pages.
2808 *
2809 * The handy pages are always taken from the 'base' memory account.
2810 * The allocated pages are not cleared and will contain random garbage.
2811 *
2812 * @returns VBox status code:
2813 * @retval VINF_SUCCESS on success.
2814 * @retval VERR_NOT_OWNER if the caller is not an EMT.
2815 * @retval VERR_GMM_PAGE_NOT_FOUND if one of the pages to update wasn't found.
2816 * @retval VERR_GMM_PAGE_NOT_PRIVATE if one of the pages to update wasn't a
2817 * private page.
2818 * @retval VERR_GMM_PAGE_NOT_SHARED if one of the pages to update wasn't a
2819 * shared page.
2820 * @retval VERR_GMM_NOT_PAGE_OWNER if one of the pages to be updated wasn't
2821 * owned by the VM.
2822 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is necessary.
2823 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2824 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2825 * that is we're trying to allocate more than we've reserved.
2826 *
2827 * @param pGVM The global (ring-0) VM structure.
2828 * @param idCpu The VCPU id.
2829 * @param cPagesToUpdate The number of pages to update (starting from the head).
2830 * @param cPagesToAlloc The number of pages to allocate (starting from the head).
2831 * @param paPages The array of page descriptors.
2832 * See GMMPAGEDESC for details on what is expected on input.
2833 * @thread EMT(idCpu)
2834 */
2835GMMR0DECL(int) GMMR0AllocateHandyPages(PGVM pGVM, VMCPUID idCpu, uint32_t cPagesToUpdate,
2836 uint32_t cPagesToAlloc, PGMMPAGEDESC paPages)
2837{
2838 LogFlow(("GMMR0AllocateHandyPages: pGVM=%p cPagesToUpdate=%#x cPagesToAlloc=%#x paPages=%p\n",
2839 pGVM, cPagesToUpdate, cPagesToAlloc, paPages));
2840
2841 /*
2842 * Validate, get basics and take the semaphore.
2843 * (This is a relatively busy path, so make predictions where possible.)
2844 */
2845 PGMM pGMM;
2846 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
2847 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
2848 if (RT_FAILURE(rc))
2849 return rc;
2850
2851 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
2852 AssertMsgReturn( (cPagesToUpdate && cPagesToUpdate < 1024)
2853 || (cPagesToAlloc && cPagesToAlloc < 1024),
2854 ("cPagesToUpdate=%#x cPagesToAlloc=%#x\n", cPagesToUpdate, cPagesToAlloc),
2855 VERR_INVALID_PARAMETER);
2856
2857 unsigned iPage = 0;
2858 for (; iPage < cPagesToUpdate; iPage++)
2859 {
2860 AssertMsgReturn( ( paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST
2861 && !(paPages[iPage].HCPhysGCPhys & PAGE_OFFSET_MASK))
2862 || paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS
2863 || paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE,
2864 ("#%#x: %RHp\n", iPage, paPages[iPage].HCPhysGCPhys),
2865 VERR_INVALID_PARAMETER);
2866 AssertMsgReturn( paPages[iPage].idPage <= GMM_PAGEID_LAST
2867 /*|| paPages[iPage].idPage == NIL_GMM_PAGEID*/,
2868 ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2869 AssertMsgReturn( paPages[iPage].idSharedPage <= GMM_PAGEID_LAST
2870 /*|| paPages[iPage].idSharedPage == NIL_GMM_PAGEID*/,
2871 ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2872 }
2873
2874 for (; iPage < cPagesToAlloc; iPage++)
2875 {
2876 AssertMsgReturn(paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS, ("#%#x: %RHp\n", iPage, paPages[iPage].HCPhysGCPhys), VERR_INVALID_PARAMETER);
2877 AssertMsgReturn(paPages[iPage].idPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2878 AssertMsgReturn(paPages[iPage].idSharedPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2879 }
2880
2881 gmmR0MutexAcquire(pGMM);
2882 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2883 {
2884 /* No allocations before the initial reservation has been made! */
2885 if (RT_LIKELY( pGVM->gmm.s.Stats.Reserved.cBasePages
2886 && pGVM->gmm.s.Stats.Reserved.cFixedPages
2887 && pGVM->gmm.s.Stats.Reserved.cShadowPages))
2888 {
2889 /*
2890 * Perform the updates.
2891 * Stop on the first error.
2892 */
2893 for (iPage = 0; iPage < cPagesToUpdate; iPage++)
2894 {
2895 if (paPages[iPage].idPage != NIL_GMM_PAGEID)
2896 {
2897 PGMMPAGE pPage = gmmR0GetPage(pGMM, paPages[iPage].idPage);
2898 if (RT_LIKELY(pPage))
2899 {
2900 if (RT_LIKELY(GMM_PAGE_IS_PRIVATE(pPage)))
2901 {
2902 if (RT_LIKELY(pPage->Private.hGVM == pGVM->hSelf))
2903 {
2904 AssertCompile(NIL_RTHCPHYS > GMM_GCPHYS_LAST && GMM_GCPHYS_UNSHAREABLE > GMM_GCPHYS_LAST);
2905 if (RT_LIKELY(paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST))
2906 pPage->Private.pfn = paPages[iPage].HCPhysGCPhys >> PAGE_SHIFT;
2907 else if (paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE)
2908 pPage->Private.pfn = GMM_PAGE_PFN_UNSHAREABLE;
2909 /* else: NIL_RTHCPHYS nothing */
2910
2911 paPages[iPage].idPage = NIL_GMM_PAGEID;
2912 paPages[iPage].HCPhysGCPhys = NIL_RTHCPHYS;
2913 }
2914 else
2915 {
2916 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not owner! hGVM=%#x hSelf=%#x\n",
2917 iPage, paPages[iPage].idPage, pPage->Private.hGVM, pGVM->hSelf));
2918 rc = VERR_GMM_NOT_PAGE_OWNER;
2919 break;
2920 }
2921 }
2922 else
2923 {
2924 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not private! %.*Rhxs (type %d)\n", iPage, paPages[iPage].idPage, sizeof(*pPage), pPage, pPage->Common.u2State));
2925 rc = VERR_GMM_PAGE_NOT_PRIVATE;
2926 break;
2927 }
2928 }
2929 else
2930 {
2931 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not found! (private)\n", iPage, paPages[iPage].idPage));
2932 rc = VERR_GMM_PAGE_NOT_FOUND;
2933 break;
2934 }
2935 }
2936
2937 if (paPages[iPage].idSharedPage != NIL_GMM_PAGEID)
2938 {
2939 PGMMPAGE pPage = gmmR0GetPage(pGMM, paPages[iPage].idSharedPage);
2940 if (RT_LIKELY(pPage))
2941 {
2942 if (RT_LIKELY(GMM_PAGE_IS_SHARED(pPage)))
2943 {
2944 AssertCompile(NIL_RTHCPHYS > GMM_GCPHYS_LAST && GMM_GCPHYS_UNSHAREABLE > GMM_GCPHYS_LAST);
2945 Assert(pPage->Shared.cRefs);
2946 Assert(pGVM->gmm.s.Stats.cSharedPages);
2947 Assert(pGVM->gmm.s.Stats.Allocated.cBasePages);
2948
2949 Log(("GMMR0AllocateHandyPages: free shared page %x cRefs=%d\n", paPages[iPage].idSharedPage, pPage->Shared.cRefs));
2950 pGVM->gmm.s.Stats.cSharedPages--;
2951 pGVM->gmm.s.Stats.Allocated.cBasePages--;
2952 if (!--pPage->Shared.cRefs)
2953 gmmR0FreeSharedPage(pGMM, pGVM, paPages[iPage].idSharedPage, pPage);
2954 else
2955 {
2956 Assert(pGMM->cDuplicatePages);
2957 pGMM->cDuplicatePages--;
2958 }
2959
2960 paPages[iPage].idSharedPage = NIL_GMM_PAGEID;
2961 }
2962 else
2963 {
2964 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not shared!\n", iPage, paPages[iPage].idSharedPage));
2965 rc = VERR_GMM_PAGE_NOT_SHARED;
2966 break;
2967 }
2968 }
2969 else
2970 {
2971 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not found! (shared)\n", iPage, paPages[iPage].idSharedPage));
2972 rc = VERR_GMM_PAGE_NOT_FOUND;
2973 break;
2974 }
2975 }
2976 } /* for each page to update */
2977
2978 if (RT_SUCCESS(rc) && cPagesToAlloc > 0)
2979 {
2980#if defined(VBOX_STRICT) && 0 /** @todo re-test this later. Appeared to be a PGM init bug. */
2981 for (iPage = 0; iPage < cPagesToAlloc; iPage++)
2982 {
2983 Assert(paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS);
2984 Assert(paPages[iPage].idPage == NIL_GMM_PAGEID);
2985 Assert(paPages[iPage].idSharedPage == NIL_GMM_PAGEID);
2986 }
2987#endif
2988
2989 /*
2990 * Join paths with GMMR0AllocatePages for the allocation.
2991 * Note! gmmR0AllocatePagesNew may leave the protection of the mutex while allocating new chunks!
2992 */
2993 rc = gmmR0AllocatePagesNew(pGMM, pGVM, cPagesToAlloc, paPages, GMMACCOUNT_BASE);
2994 }
2995 }
2996 else
2997 rc = VERR_WRONG_ORDER;
2998 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
2999 }
3000 else
3001 rc = VERR_GMM_IS_NOT_SANE;
3002 gmmR0MutexRelease(pGMM);
3003 LogFlow(("GMMR0AllocateHandyPages: returns %Rrc\n", rc));
3004 return rc;
3005}
3006
3007
3008/**
3009 * Allocate one or more pages.
3010 *
3011 * This is typically used for ROMs and MMIO2 (VRAM) during VM creation.
3012 * The allocated pages are not cleared and will contain random garbage.
3013 *
3014 * @returns VBox status code:
3015 * @retval VINF_SUCCESS on success.
3016 * @retval VERR_NOT_OWNER if the caller is not an EMT.
3017 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is necessary.
3018 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
3019 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
3020 * that is we're trying to allocate more than we've reserved.
3021 *
3022 * @param pGVM The global (ring-0) VM structure.
3023 * @param idCpu The VCPU id.
3024 * @param cPages The number of pages to allocate.
3025 * @param paPages Pointer to the page descriptors.
3026 * See GMMPAGEDESC for details on what is expected on
3027 * input.
3028 * @param enmAccount The account to charge.
3029 *
3030 * @thread EMT.
3031 */
3032GMMR0DECL(int) GMMR0AllocatePages(PGVM pGVM, VMCPUID idCpu, uint32_t cPages, PGMMPAGEDESC paPages, GMMACCOUNT enmAccount)
3033{
3034 LogFlow(("GMMR0AllocatePages: pGVM=%p cPages=%#x paPages=%p enmAccount=%d\n", pGVM, cPages, paPages, enmAccount));
3035
3036 /*
3037 * Validate, get basics and take the semaphore.
3038 */
3039 PGMM pGMM;
3040 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3041 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3042 if (RT_FAILURE(rc))
3043 return rc;
3044
3045 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
3046 AssertMsgReturn(enmAccount > GMMACCOUNT_INVALID && enmAccount < GMMACCOUNT_END, ("%d\n", enmAccount), VERR_INVALID_PARAMETER);
3047 AssertMsgReturn(cPages > 0 && cPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cPages), VERR_INVALID_PARAMETER);
3048
3049 for (unsigned iPage = 0; iPage < cPages; iPage++)
3050 {
3051 AssertMsgReturn( paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS
3052 || paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE
3053 || ( enmAccount == GMMACCOUNT_BASE
3054 && paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST
3055 && !(paPages[iPage].HCPhysGCPhys & PAGE_OFFSET_MASK)),
3056 ("#%#x: %RHp enmAccount=%d\n", iPage, paPages[iPage].HCPhysGCPhys, enmAccount),
3057 VERR_INVALID_PARAMETER);
3058 AssertMsgReturn(paPages[iPage].idPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
3059 AssertMsgReturn(paPages[iPage].idSharedPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
3060 }
3061
3062 gmmR0MutexAcquire(pGMM);
3063 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3064 {
3065
3066 /* No allocations before the initial reservation has been made! */
3067 if (RT_LIKELY( pGVM->gmm.s.Stats.Reserved.cBasePages
3068 && pGVM->gmm.s.Stats.Reserved.cFixedPages
3069 && pGVM->gmm.s.Stats.Reserved.cShadowPages))
3070 rc = gmmR0AllocatePagesNew(pGMM, pGVM, cPages, paPages, enmAccount);
3071 else
3072 rc = VERR_WRONG_ORDER;
3073 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3074 }
3075 else
3076 rc = VERR_GMM_IS_NOT_SANE;
3077 gmmR0MutexRelease(pGMM);
3078 LogFlow(("GMMR0AllocatePages: returns %Rrc\n", rc));
3079 return rc;
3080}
3081
3082
3083/**
3084 * VMMR0 request wrapper for GMMR0AllocatePages.
3085 *
3086 * @returns see GMMR0AllocatePages.
3087 * @param pGVM The global (ring-0) VM structure.
3088 * @param idCpu The VCPU id.
3089 * @param pReq Pointer to the request packet.
3090 */
3091GMMR0DECL(int) GMMR0AllocatePagesReq(PGVM pGVM, VMCPUID idCpu, PGMMALLOCATEPAGESREQ pReq)
3092{
3093 /*
3094 * Validate input and pass it on.
3095 */
3096 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3097 AssertMsgReturn(pReq->Hdr.cbReq >= RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[0]),
3098 ("%#x < %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[0])),
3099 VERR_INVALID_PARAMETER);
3100 AssertMsgReturn(pReq->Hdr.cbReq == RT_UOFFSETOF_DYN(GMMALLOCATEPAGESREQ, aPages[pReq->cPages]),
3101 ("%#x != %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF_DYN(GMMALLOCATEPAGESREQ, aPages[pReq->cPages])),
3102 VERR_INVALID_PARAMETER);
3103
3104 return GMMR0AllocatePages(pGVM, idCpu, pReq->cPages, &pReq->aPages[0], pReq->enmAccount);
3105}
3106
3107
3108/**
3109 * Allocate a large page to represent guest RAM.
3110 *
3111 * The allocated pages are not cleared and will contain random garbage.
3112 *
3113 * @returns VBox status code:
3114 * @retval VINF_SUCCESS on success.
3115 * @retval VERR_NOT_OWNER if the caller is not an EMT.
3116 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is necessary.
3117 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
3118 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
3119 * that is we're trying to allocate more than we've reserved.
3121 *
3122 * @param pGVM The global (ring-0) VM structure.
3123 * @param idCpu The VCPU id.
3124 * @param cbPage Large page size.
3125 * @param pIdPage Where to return the GMM page ID of the page.
3126 * @param pHCPhys Where to return the host physical address of the page.
3127 */
3128GMMR0DECL(int) GMMR0AllocateLargePage(PGVM pGVM, VMCPUID idCpu, uint32_t cbPage, uint32_t *pIdPage, RTHCPHYS *pHCPhys)
3129{
3130 LogFlow(("GMMR0AllocateLargePage: pGVM=%p cbPage=%x\n", pGVM, cbPage));
3131
3132 AssertReturn(cbPage == GMM_CHUNK_SIZE, VERR_INVALID_PARAMETER);
3133 AssertPtrReturn(pIdPage, VERR_INVALID_PARAMETER);
3134 AssertPtrReturn(pHCPhys, VERR_INVALID_PARAMETER);
3135
3136 /*
3137 * Validate, get basics and take the semaphore.
3138 */
3139 PGMM pGMM;
3140 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3141 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3142 if (RT_FAILURE(rc))
3143 return rc;
3144
3145#ifdef GMM_WITH_LEGACY_MODE
3146 // /* Not supported in legacy mode where we allocate the memory in ring 3 and lock it in ring 0. */
3147 // if (pGMM->fLegacyAllocationMode)
3148 // return VERR_NOT_SUPPORTED;
3149#endif
3150
3151 *pHCPhys = NIL_RTHCPHYS;
3152 *pIdPage = NIL_GMM_PAGEID;
3153
3154 gmmR0MutexAcquire(pGMM);
3155 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3156 {
3157 const unsigned cPages = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
3158 if (RT_UNLIKELY( pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cPages
3159 > pGVM->gmm.s.Stats.Reserved.cBasePages))
3160 {
3161 Log(("GMMR0AllocateLargePage: Reserved=%#llx Allocated+Requested=%#llx+%#x!\n",
3162 pGVM->gmm.s.Stats.Reserved.cBasePages, pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3163 gmmR0MutexRelease(pGMM);
3164 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
3165 }
3166
3167 /*
3168 * Allocate a new large page chunk.
3169 *
3170 * Note! We leave the giant GMM lock temporarily as the allocation might
3171 * take a long time. gmmR0RegisterChunk will retake it (ugly).
3172 */
3173 AssertCompile(GMM_CHUNK_SIZE == _2M);
3174 gmmR0MutexRelease(pGMM);
3175
3176 RTR0MEMOBJ hMemObj;
3177 rc = RTR0MemObjAllocPhysEx(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS, GMM_CHUNK_SIZE);
3178 if (RT_SUCCESS(rc))
3179 {
3180 PGMMCHUNKFREESET pSet = pGMM->fBoundMemoryMode ? &pGVM->gmm.s.Private : &pGMM->PrivateX;
3181 PGMMCHUNK pChunk;
3182 rc = gmmR0RegisterChunk(pGMM, pSet, hMemObj, pGVM->hSelf, GMM_CHUNK_FLAGS_LARGE_PAGE, &pChunk);
3183 if (RT_SUCCESS(rc))
3184 {
3185 /*
3186 * Allocate all the pages in the chunk.
3187 */
3188 /* Unlink the new chunk from the free list. */
3189 gmmR0UnlinkChunk(pChunk);
3190
3191 /** @todo rewrite this to skip the looping. */
3192 /* Allocate all pages. */
3193 GMMPAGEDESC PageDesc;
3194 gmmR0AllocatePage(pChunk, pGVM->hSelf, &PageDesc);
3195
3196 /* Return the first page as we'll use the whole chunk as one big page. */
3197 *pIdPage = PageDesc.idPage;
3198 *pHCPhys = PageDesc.HCPhysGCPhys;
3199
3200 for (unsigned i = 1; i < cPages; i++)
3201 gmmR0AllocatePage(pChunk, pGVM->hSelf, &PageDesc);
3202
3203 /* Update accounting. */
3204 pGVM->gmm.s.Stats.Allocated.cBasePages += cPages;
3205 pGVM->gmm.s.Stats.cPrivatePages += cPages;
3206 pGMM->cAllocatedPages += cPages;
3207
3208 gmmR0LinkChunk(pChunk, pSet);
3209 gmmR0MutexRelease(pGMM);
3210 LogFlow(("GMMR0AllocateLargePage: returns VINF_SUCCESS\n"));
3211 return VINF_SUCCESS;
3212 }
3213 RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
3214 }
3215 }
3216 else
3217 {
3218 gmmR0MutexRelease(pGMM);
3219 rc = VERR_GMM_IS_NOT_SANE;
3220 }
3221
3222 LogFlow(("GMMR0AllocateLargePage: returns %Rrc\n", rc));
3223 return rc;
3224}
3225
3226
3227/**
3228 * Free a large page.
3229 *
3230 * @returns VBox status code:
3231 * @param pGVM The global (ring-0) VM structure.
3232 * @param idCpu The VCPU id.
3233 * @param idPage The large page id.
3234 */
3235GMMR0DECL(int) GMMR0FreeLargePage(PGVM pGVM, VMCPUID idCpu, uint32_t idPage)
3236{
3237 LogFlow(("GMMR0FreeLargePage: pGVM=%p idPage=%x\n", pGVM, idPage));
3238
3239 /*
3240 * Validate, get basics and take the semaphore.
3241 */
3242 PGMM pGMM;
3243 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3244 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3245 if (RT_FAILURE(rc))
3246 return rc;
3247
3248#ifdef GMM_WITH_LEGACY_MODE
3249 // /* Not supported in legacy mode where we allocate the memory in ring 3 and lock it in ring 0. */
3250 // if (pGMM->fLegacyAllocationMode)
3251 // return VERR_NOT_SUPPORTED;
3252#endif
3253
3254 gmmR0MutexAcquire(pGMM);
3255 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3256 {
3257 const unsigned cPages = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
3258
3259 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages < cPages))
3260 {
3261 Log(("GMMR0FreeLargePage: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3262 gmmR0MutexRelease(pGMM);
3263 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3264 }
3265
3266 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
3267 if (RT_LIKELY( pPage
3268 && GMM_PAGE_IS_PRIVATE(pPage)))
3269 {
3270 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3271 Assert(pChunk);
3272 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3273 Assert(pChunk->cPrivate > 0);
3274
3275 /* Release the memory immediately. */
3276 gmmR0FreeChunk(pGMM, NULL, pChunk, false /*fRelaxedSem*/); /** @todo this can be relaxed too! */
3277
3278 /* Update accounting. */
3279 pGVM->gmm.s.Stats.Allocated.cBasePages -= cPages;
3280 pGVM->gmm.s.Stats.cPrivatePages -= cPages;
3281 pGMM->cAllocatedPages -= cPages;
3282 }
3283 else
3284 rc = VERR_GMM_PAGE_NOT_FOUND;
3285 }
3286 else
3287 rc = VERR_GMM_IS_NOT_SANE;
3288
3289 gmmR0MutexRelease(pGMM);
3290 LogFlow(("GMMR0FreeLargePage: returns %Rrc\n", rc));
3291 return rc;
3292}
3293
3294
3295/**
3296 * VMMR0 request wrapper for GMMR0FreeLargePage.
3297 *
3298 * @returns see GMMR0FreeLargePage.
3299 * @param pGVM The global (ring-0) VM structure.
3300 * @param idCpu The VCPU id.
3301 * @param pReq Pointer to the request packet.
3302 */
3303GMMR0DECL(int) GMMR0FreeLargePageReq(PGVM pGVM, VMCPUID idCpu, PGMMFREELARGEPAGEREQ pReq)
3304{
3305 /*
3306 * Validate input and pass it on.
3307 */
3308 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3309 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMFREELARGEPAGEREQ),
3310 ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMFREELARGEPAGEREQ)),
3311 VERR_INVALID_PARAMETER);
3312
3313 return GMMR0FreeLargePage(pGVM, idCpu, pReq->idPage);
3314}
3315
3316
3317/**
3318 * Frees a chunk, giving it back to the host OS.
3319 *
3320 * @param pGMM Pointer to the GMM instance.
3321 * @param pGVM This is set when called from GMMR0CleanupVM so we can
3322 * unmap and free the chunk in one go.
3323 * @param pChunk The chunk to free.
3324 * @param fRelaxedSem Whether we can release the semaphore while doing the
3325 * freeing (@c true) or not.
3326 */
3327static bool gmmR0FreeChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem)
3328{
3329 Assert(pChunk->Core.Key != NIL_GMM_CHUNKID);
3330
3331 GMMR0CHUNKMTXSTATE MtxState;
3332 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
3333
3334 /*
3335 * Cleanup hack! Unmap the chunk from the callers address space.
3336 * This shouldn't happen, so screw lock contention...
3337 */
3338 if ( pChunk->cMappingsX
3339#ifdef GMM_WITH_LEGACY_MODE
3340 && (!pGMM->fLegacyAllocationMode || (pChunk->fFlags & GMM_CHUNK_FLAGS_LARGE_PAGE))
3341#endif
3342 && pGVM)
3343 gmmR0UnmapChunkLocked(pGMM, pGVM, pChunk);
3344
3345 /*
3346 * If there are current mappings of the chunk, then request the
3347 * VMs to unmap them. Reposition the chunk in the free list so
3348 * it won't be a likely candidate for allocations.
3349 */
3350 if (pChunk->cMappingsX)
3351 {
3352 /** @todo R0 -> VM request */
3353 /* The chunk can be mapped by more than one VM if fBoundMemoryMode is false! */
3354 Log(("gmmR0FreeChunk: chunk still has %d mappings; don't free!\n", pChunk->cMappingsX));
3355 gmmR0ChunkMutexRelease(&MtxState, pChunk);
3356 return false;
3357 }
3358
3359
3360 /*
3361 * Save and trash the handle.
3362 */
3363 RTR0MEMOBJ const hMemObj = pChunk->hMemObj;
3364 pChunk->hMemObj = NIL_RTR0MEMOBJ;
3365
3366 /*
3367 * Unlink it from everywhere.
3368 */
3369 gmmR0UnlinkChunk(pChunk);
3370
3371 RTListNodeRemove(&pChunk->ListNode);
3372
3373 PAVLU32NODECORE pCore = RTAvlU32Remove(&pGMM->pChunks, pChunk->Core.Key);
3374 Assert(pCore == &pChunk->Core); NOREF(pCore);
3375
3376 PGMMCHUNKTLBE pTlbe = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(pChunk->Core.Key)];
3377 if (pTlbe->pChunk == pChunk)
3378 {
3379 pTlbe->idChunk = NIL_GMM_CHUNKID;
3380 pTlbe->pChunk = NULL;
3381 }
3382
3383 Assert(pGMM->cChunks > 0);
3384 pGMM->cChunks--;
3385
3386 /*
3387 * Free the Chunk ID before dropping the locks and freeing the rest.
3388 */
3389 gmmR0FreeChunkId(pGMM, pChunk->Core.Key);
3390 pChunk->Core.Key = NIL_GMM_CHUNKID;
3391
3392 pGMM->cFreedChunks++;
3393
3394 gmmR0ChunkMutexRelease(&MtxState, NULL);
3395 if (fRelaxedSem)
3396 gmmR0MutexRelease(pGMM);
3397
3398 RTMemFree(pChunk->paMappingsX);
3399 pChunk->paMappingsX = NULL;
3400
3401 RTMemFree(pChunk);
3402
3403#if defined(VBOX_WITH_RAM_IN_KERNEL) && !defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)
3404 int rc = RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
3405#else
3406 int rc = RTR0MemObjFree(hMemObj, false /* fFreeMappings */);
3407#endif
3408 AssertLogRelRC(rc);
3409
3410 if (fRelaxedSem)
3411 gmmR0MutexAcquire(pGMM);
3412 return fRelaxedSem;
3413}
3414
3415
3416/**
3417 * Free page worker.
3418 *
3419 * The caller does all the statistics decrementing; we do all the incrementing.
3420 *
3421 * @param pGMM Pointer to the GMM instance data.
3422 * @param pGVM Pointer to the GVM instance.
3423 * @param pChunk Pointer to the chunk this page belongs to.
3424 * @param idPage The Page ID.
3425 * @param pPage Pointer to the page.
3426 */
3427static void gmmR0FreePageWorker(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, uint32_t idPage, PGMMPAGE pPage)
3428{
3429 Log3(("F pPage=%p iPage=%#x/%#x u2State=%d iFreeHead=%#x\n",
3430 pPage, pPage - &pChunk->aPages[0], idPage, pPage->Common.u2State, pChunk->iFreeHead)); NOREF(idPage);
3431
3432 /*
3433 * Put the page on the free list.
3434 */
3435 pPage->u = 0;
3436 pPage->Free.u2State = GMM_PAGE_STATE_FREE;
3437 Assert(pChunk->iFreeHead < RT_ELEMENTS(pChunk->aPages) || pChunk->iFreeHead == UINT16_MAX);
3438 pPage->Free.iNext = pChunk->iFreeHead;
3439 pChunk->iFreeHead = pPage - &pChunk->aPages[0];
3440
3441 /*
3442 * Update statistics (the cShared/cPrivate stats are up to date already),
3443 * and relink the chunk if necessary.
3444 */
3445 unsigned const cFree = pChunk->cFree;
3446 if ( !cFree
3447 || gmmR0SelectFreeSetList(cFree) != gmmR0SelectFreeSetList(cFree + 1))
3448 {
3449 gmmR0UnlinkChunk(pChunk);
3450 pChunk->cFree++;
3451 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
3452 }
3453 else
3454 {
3455 pChunk->cFree = cFree + 1;
3456 pChunk->pSet->cFreePages++;
3457 }
3458
3459 /*
3460 * If the chunk becomes empty, consider giving memory back to the host OS.
3461 *
3462 * The current strategy is to try to give it back if there are other chunks
3463 * in this free list, meaning if there are at least 240 free pages in this
3464 * category. Note that since there are probably mappings of the chunk,
3465 * it won't be freed up instantly, which probably screws up this logic
3466 * a bit...
3467 */
3468 /** @todo Do this on the way out. */
3469 if (RT_LIKELY( pChunk->cFree != GMM_CHUNK_NUM_PAGES
3470 || pChunk->pFreeNext == NULL
3471 || pChunk->pFreePrev == NULL /** @todo this is probably misfiring, see reset... */))
3472 { /* likely */ }
3473#ifdef GMM_WITH_LEGACY_MODE
3474 else if (RT_LIKELY(pGMM->fLegacyAllocationMode && !(pChunk->fFlags & GMM_CHUNK_FLAGS_LARGE_PAGE)))
3475 { /* likely */ }
3476#endif
3477 else
3478 gmmR0FreeChunk(pGMM, NULL, pChunk, false);
3479
3480}
3481
3482
3483/**
3484 * Frees a shared page, the page is known to exist and be valid and such.
3485 *
3486 * @param pGMM Pointer to the GMM instance.
3487 * @param pGVM Pointer to the GVM instance.
3488 * @param idPage The page id.
3489 * @param pPage The page structure.
3490 */
3491DECLINLINE(void) gmmR0FreeSharedPage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage)
3492{
3493 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3494 Assert(pChunk);
3495 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3496 Assert(pChunk->cShared > 0);
3497 Assert(pGMM->cSharedPages > 0);
3498 Assert(pGMM->cAllocatedPages > 0);
3499 Assert(!pPage->Shared.cRefs);
3500
3501 pChunk->cShared--;
3502 pGMM->cAllocatedPages--;
3503 pGMM->cSharedPages--;
3504 gmmR0FreePageWorker(pGMM, pGVM, pChunk, idPage, pPage);
3505}
3506
3507
3508/**
3509 * Frees a private page, the page is known to exist and be valid and such.
3510 *
3511 * @param pGMM Pointer to the GMM instance.
3512 * @param pGVM Pointer to the GVM instance.
3513 * @param idPage The page id.
3514 * @param pPage The page structure.
3515 */
3516DECLINLINE(void) gmmR0FreePrivatePage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage)
3517{
3518 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3519 Assert(pChunk);
3520 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3521 Assert(pChunk->cPrivate > 0);
3522 Assert(pGMM->cAllocatedPages > 0);
3523
3524 pChunk->cPrivate--;
3525 pGMM->cAllocatedPages--;
3526 gmmR0FreePageWorker(pGMM, pGVM, pChunk, idPage, pPage);
3527}
3528
3529
3530/**
3531 * Common worker for GMMR0FreePages and GMMR0BalloonedPages.
3532 *
3533 * @returns VBox status code:
3534 * @retval xxx
3535 *
3536 * @param pGMM Pointer to the GMM instance data.
3537 * @param pGVM Pointer to the VM.
3538 * @param cPages The number of pages to free.
3539 * @param paPages Pointer to the page descriptors.
3540 * @param enmAccount The account this relates to.
3541 */
3542static int gmmR0FreePages(PGMM pGMM, PGVM pGVM, uint32_t cPages, PGMMFREEPAGEDESC paPages, GMMACCOUNT enmAccount)
3543{
3544 /*
3545 * Check that the request isn't impossible wrt the account status.
3546 */
3547 switch (enmAccount)
3548 {
3549 case GMMACCOUNT_BASE:
3550 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages < cPages))
3551 {
3552 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3553 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3554 }
3555 break;
3556 case GMMACCOUNT_SHADOW:
3557 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cShadowPages < cPages))
3558 {
3559 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cShadowPages, cPages));
3560 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3561 }
3562 break;
3563 case GMMACCOUNT_FIXED:
3564 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cFixedPages < cPages))
3565 {
3566 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cFixedPages, cPages));
3567 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3568 }
3569 break;
3570 default:
3571 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
3572 }
3573
3574 /*
3575 * Walk the descriptors and free the pages.
3576 *
3577 * Statistics (except the account) are being updated as we go along,
3578 * unlike the alloc code. Also, stop on the first error.
3579 */
3580 int rc = VINF_SUCCESS;
3581 uint32_t iPage;
3582 for (iPage = 0; iPage < cPages; iPage++)
3583 {
3584 uint32_t idPage = paPages[iPage].idPage;
3585 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
3586 if (RT_LIKELY(pPage))
3587 {
3588 if (RT_LIKELY(GMM_PAGE_IS_PRIVATE(pPage)))
3589 {
3590 if (RT_LIKELY(pPage->Private.hGVM == pGVM->hSelf))
3591 {
3592 Assert(pGVM->gmm.s.Stats.cPrivatePages);
3593 pGVM->gmm.s.Stats.cPrivatePages--;
3594 gmmR0FreePrivatePage(pGMM, pGVM, idPage, pPage);
3595 }
3596 else
3597 {
3598 Log(("gmmR0FreePages: #%#x/%#x: not owner! hGVM=%#x hSelf=%#x\n", iPage, idPage,
3599 pPage->Private.hGVM, pGVM->hSelf));
3600 rc = VERR_GMM_NOT_PAGE_OWNER;
3601 break;
3602 }
3603 }
3604 else if (RT_LIKELY(GMM_PAGE_IS_SHARED(pPage)))
3605 {
3606 Assert(pGVM->gmm.s.Stats.cSharedPages);
3607 Assert(pPage->Shared.cRefs);
3608#if defined(VBOX_WITH_PAGE_SHARING) && defined(VBOX_STRICT) && HC_ARCH_BITS == 64
3609 if (pPage->Shared.u14Checksum)
3610 {
3611 uint32_t uChecksum = gmmR0StrictPageChecksum(pGMM, pGVM, idPage);
3612 uChecksum &= UINT32_C(0x00003fff);
3613 AssertMsg(!uChecksum || uChecksum == pPage->Shared.u14Checksum,
3614 ("%#x vs %#x - idPage=%#x\n", uChecksum, pPage->Shared.u14Checksum, idPage));
3615 }
3616#endif
3617 pGVM->gmm.s.Stats.cSharedPages--;
3618 if (!--pPage->Shared.cRefs)
3619 gmmR0FreeSharedPage(pGMM, pGVM, idPage, pPage);
3620 else
3621 {
3622 Assert(pGMM->cDuplicatePages);
3623 pGMM->cDuplicatePages--;
3624 }
3625 }
3626 else
3627 {
3628 Log(("gmmR0FreePages: #%#x/%#x: already free!\n", iPage, idPage));
3629 rc = VERR_GMM_PAGE_ALREADY_FREE;
3630 break;
3631 }
3632 }
3633 else
3634 {
3635 Log(("gmmR0FreePages: #%#x/%#x: not found!\n", iPage, idPage));
3636 rc = VERR_GMM_PAGE_NOT_FOUND;
3637 break;
3638 }
3639 paPages[iPage].idPage = NIL_GMM_PAGEID;
3640 }
3641
3642 /*
3643 * Update the account.
3644 */
3645 switch (enmAccount)
3646 {
3647 case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages -= iPage; break;
3648 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages -= iPage; break;
3649 case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages -= iPage; break;
3650 default:
3651 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
3652 }
3653
3654 /*
3655 * Any threshold stuff to be done here?
3656 */
3657
3658 return rc;
3659}
3660
3661
3662/**
3663 * Free one or more pages.
3664 *
3665 * This is typically used at reset time or power off.
3666 *
3667 * @returns VBox status code:
3668 * @retval xxx
3669 *
3670 * @param pGVM The global (ring-0) VM structure.
3671 * @param idCpu The VCPU id.
3672 * @param cPages The number of pages to free.
3673 * @param paPages Pointer to the page descriptors containing the page IDs
3674 * for each page.
3675 * @param enmAccount The account this relates to.
3676 * @thread EMT.
3677 */
3678GMMR0DECL(int) GMMR0FreePages(PGVM pGVM, VMCPUID idCpu, uint32_t cPages, PGMMFREEPAGEDESC paPages, GMMACCOUNT enmAccount)
3679{
3680 LogFlow(("GMMR0FreePages: pGVM=%p cPages=%#x paPages=%p enmAccount=%d\n", pGVM, cPages, paPages, enmAccount));
3681
3682 /*
3683 * Validate input and get the basics.
3684 */
3685 PGMM pGMM;
3686 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3687 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3688 if (RT_FAILURE(rc))
3689 return rc;
3690
3691 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
3692 AssertMsgReturn(enmAccount > GMMACCOUNT_INVALID && enmAccount < GMMACCOUNT_END, ("%d\n", enmAccount), VERR_INVALID_PARAMETER);
3693 AssertMsgReturn(cPages > 0 && cPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cPages), VERR_INVALID_PARAMETER);
3694
3695 for (unsigned iPage = 0; iPage < cPages; iPage++)
3696 AssertMsgReturn( paPages[iPage].idPage <= GMM_PAGEID_LAST
3697 /*|| paPages[iPage].idPage == NIL_GMM_PAGEID*/,
3698 ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
3699
3700 /*
3701 * Take the semaphore and call the worker function.
3702 */
3703 gmmR0MutexAcquire(pGMM);
3704 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3705 {
3706 rc = gmmR0FreePages(pGMM, pGVM, cPages, paPages, enmAccount);
3707 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3708 }
3709 else
3710 rc = VERR_GMM_IS_NOT_SANE;
3711 gmmR0MutexRelease(pGMM);
3712 LogFlow(("GMMR0FreePages: returns %Rrc\n", rc));
3713 return rc;
3714}
3715
3716
3717/**
3718 * VMMR0 request wrapper for GMMR0FreePages.
3719 *
3720 * @returns see GMMR0FreePages.
3721 * @param pGVM The global (ring-0) VM structure.
3722 * @param idCpu The VCPU id.
3723 * @param pReq Pointer to the request packet.
3724 */
3725GMMR0DECL(int) GMMR0FreePagesReq(PGVM pGVM, VMCPUID idCpu, PGMMFREEPAGESREQ pReq)
3726{
3727 /*
3728 * Validate input and pass it on.
3729 */
3730 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3731 AssertMsgReturn(pReq->Hdr.cbReq >= RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[0]),
3732 ("%#x < %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[0])),
3733 VERR_INVALID_PARAMETER);
3734 AssertMsgReturn(pReq->Hdr.cbReq == RT_UOFFSETOF_DYN(GMMFREEPAGESREQ, aPages[pReq->cPages]),
3735 ("%#x != %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF_DYN(GMMFREEPAGESREQ, aPages[pReq->cPages])),
3736 VERR_INVALID_PARAMETER);
3737
3738 return GMMR0FreePages(pGVM, idCpu, pReq->cPages, &pReq->aPages[0], pReq->enmAccount);
3739}
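
/*
 * Editor's example (a hedged sketch, not part of the original sources): building
 * the variable sized GMMFREEPAGESREQ that the wrapper above validates.  The
 * allocation helper and the page ID variables are illustrative; only the fields
 * checked by GMMR0FreePagesReq/GMMR0FreePages are taken from this file.
 * @code
 *  uint32_t const cPages = 2;
 *  PGMMFREEPAGESREQ pReq = (PGMMFREEPAGESREQ)RTMemAllocZ(RT_UOFFSETOF_DYN(GMMFREEPAGESREQ, aPages[cPages]));
 *  if (pReq)
 *  {
 *      pReq->Hdr.u32Magic = SUPVMMR0REQHDR_MAGIC;
 *      pReq->Hdr.cbReq    = RT_UOFFSETOF_DYN(GMMFREEPAGESREQ, aPages[cPages]);
 *      pReq->enmAccount   = GMMACCOUNT_BASE;
 *      pReq->cPages       = cPages;
 *      pReq->aPages[0].idPage = idFirstPage;    // IDs previously returned by the allocator
 *      pReq->aPages[1].idPage = idSecondPage;
 *      int rc = GMMR0FreePagesReq(pGVM, idCpu, pReq);
 *      RTMemFree(pReq);
 *  }
 * @endcode
 */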
3740
3741
3742/**
3743 * Report back on a memory ballooning request.
3744 *
3745 * The request may or may not have been initiated by the GMM. If it was initiated
3746 * by the GMM it is important that this function is called even if no pages were
3747 * ballooned.
3748 *
3749 * @returns VBox status code:
3750 * @retval VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH
3751 * @retval VERR_GMM_ATTEMPT_TO_DEFLATE_TOO_MUCH
3752 * @retval VERR_GMM_OVERCOMMITTED_TRY_AGAIN_IN_A_BIT - reset condition
3753 * indicating that we won't necessarily have sufficient RAM to boot
3754 * the VM again and that it should pause until this changes (we'll try
3755 * balloon some other VM). (For standard deflate we have little choice
3756 * but to hope the VM won't use the memory that was returned to it.)
3757 *
3758 * @param pGVM The global (ring-0) VM structure.
3759 * @param idCpu The VCPU id.
3760 * @param enmAction Inflate/deflate/reset.
3761 * @param cBalloonedPages The number of pages that were ballooned.
3762 *
3763 * @thread EMT(idCpu)
3764 */
3765GMMR0DECL(int) GMMR0BalloonedPages(PGVM pGVM, VMCPUID idCpu, GMMBALLOONACTION enmAction, uint32_t cBalloonedPages)
3766{
3767 LogFlow(("GMMR0BalloonedPages: pGVM=%p enmAction=%d cBalloonedPages=%#x\n",
3768 pGVM, enmAction, cBalloonedPages));
3769
3770 AssertMsgReturn(cBalloonedPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cBalloonedPages), VERR_INVALID_PARAMETER);
3771
3772 /*
3773 * Validate input and get the basics.
3774 */
3775 PGMM pGMM;
3776 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3777 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3778 if (RT_FAILURE(rc))
3779 return rc;
3780
3781 /*
3782 * Take the semaphore and do some more validations.
3783 */
3784 gmmR0MutexAcquire(pGMM);
3785 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3786 {
3787 switch (enmAction)
3788 {
3789 case GMMBALLOONACTION_INFLATE:
3790 {
3791 if (RT_LIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cBalloonedPages
3792 <= pGVM->gmm.s.Stats.Reserved.cBasePages))
3793 {
3794 /*
3795 * Record the ballooned memory.
3796 */
3797 pGMM->cBalloonedPages += cBalloonedPages;
3798 if (pGVM->gmm.s.Stats.cReqBalloonedPages)
3799 {
3800 /* Codepath never taken. Might be interesting in the future to request ballooned memory from guests in low memory conditions. */
3801 AssertFailed();
3802
3803 pGVM->gmm.s.Stats.cBalloonedPages += cBalloonedPages;
3804 pGVM->gmm.s.Stats.cReqActuallyBalloonedPages += cBalloonedPages;
3805 Log(("GMMR0BalloonedPages: +%#x - Global=%#llx / VM: Total=%#llx Req=%#llx Actual=%#llx (pending)\n",
3806 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages,
3807 pGVM->gmm.s.Stats.cReqBalloonedPages, pGVM->gmm.s.Stats.cReqActuallyBalloonedPages));
3808 }
3809 else
3810 {
3811 pGVM->gmm.s.Stats.cBalloonedPages += cBalloonedPages;
3812 Log(("GMMR0BalloonedPages: +%#x - Global=%#llx / VM: Total=%#llx (user)\n",
3813 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages));
3814 }
3815 }
3816 else
3817 {
3818 Log(("GMMR0BalloonedPages: cBasePages=%#llx Total=%#llx cBalloonedPages=%#llx Reserved=%#llx\n",
3819 pGVM->gmm.s.Stats.Allocated.cBasePages, pGVM->gmm.s.Stats.cBalloonedPages, cBalloonedPages,
3820 pGVM->gmm.s.Stats.Reserved.cBasePages));
3821 rc = VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3822 }
3823 break;
3824 }
3825
3826 case GMMBALLOONACTION_DEFLATE:
3827 {
3828 /* Deflate. */
3829 if (pGVM->gmm.s.Stats.cBalloonedPages >= cBalloonedPages)
3830 {
3831 /*
3832 * Record the ballooned memory.
3833 */
3834 Assert(pGMM->cBalloonedPages >= cBalloonedPages);
3835 pGMM->cBalloonedPages -= cBalloonedPages;
3836 pGVM->gmm.s.Stats.cBalloonedPages -= cBalloonedPages;
3837 if (pGVM->gmm.s.Stats.cReqDeflatePages)
3838 {
3839 AssertFailed(); /* This path is for later. */
3840 Log(("GMMR0BalloonedPages: -%#x - Global=%#llx / VM: Total=%#llx Req=%#llx\n",
3841 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages, pGVM->gmm.s.Stats.cReqDeflatePages));
3842
3843 /*
3844 * Anything we need to do here now when the request has been completed?
3845 */
3846 pGVM->gmm.s.Stats.cReqDeflatePages = 0;
3847 }
3848 else
3849 Log(("GMMR0BalloonedPages: -%#x - Global=%#llx / VM: Total=%#llx (user)\n",
3850 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages));
3851 }
3852 else
3853 {
3854 Log(("GMMR0BalloonedPages: Total=%#llx cBalloonedPages=%#llx\n", pGVM->gmm.s.Stats.cBalloonedPages, cBalloonedPages));
3855 rc = VERR_GMM_ATTEMPT_TO_DEFLATE_TOO_MUCH;
3856 }
3857 break;
3858 }
3859
3860 case GMMBALLOONACTION_RESET:
3861 {
3862 /* Reset to an empty balloon. */
3863 Assert(pGMM->cBalloonedPages >= pGVM->gmm.s.Stats.cBalloonedPages);
3864
3865 pGMM->cBalloonedPages -= pGVM->gmm.s.Stats.cBalloonedPages;
3866 pGVM->gmm.s.Stats.cBalloonedPages = 0;
3867 break;
3868 }
3869
3870 default:
3871 rc = VERR_INVALID_PARAMETER;
3872 break;
3873 }
3874 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3875 }
3876 else
3877 rc = VERR_GMM_IS_NOT_SANE;
3878
3879 gmmR0MutexRelease(pGMM);
3880 LogFlow(("GMMR0BalloonedPages: returns %Rrc\n", rc));
3881 return rc;
3882}
3883
3884
3885/**
3886 * VMMR0 request wrapper for GMMR0BalloonedPages.
3887 *
3888 * @returns see GMMR0BalloonedPages.
3889 * @param pGVM The global (ring-0) VM structure.
3890 * @param idCpu The VCPU id.
3891 * @param pReq Pointer to the request packet.
3892 */
3893GMMR0DECL(int) GMMR0BalloonedPagesReq(PGVM pGVM, VMCPUID idCpu, PGMMBALLOONEDPAGESREQ pReq)
3894{
3895 /*
3896 * Validate input and pass it on.
3897 */
3898 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3899 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMBALLOONEDPAGESREQ),
3900 ("%#x < %#x\n", pReq->Hdr.cbReq, sizeof(GMMBALLOONEDPAGESREQ)),
3901 VERR_INVALID_PARAMETER);
3902
3903 return GMMR0BalloonedPages(pGVM, idCpu, pReq->enmAction, pReq->cBalloonedPages);
3904}
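
/*
 * Editor's example (a hedged sketch, not part of the original sources): an
 * inflate report as validated by the wrapper above.  The page count variable
 * is illustrative; the action and field names come from this file.
 * @code
 *  GMMBALLOONEDPAGESREQ Req;
 *  Req.Hdr.u32Magic    = SUPVMMR0REQHDR_MAGIC;
 *  Req.Hdr.cbReq       = sizeof(Req);
 *  Req.enmAction       = GMMBALLOONACTION_INFLATE;
 *  Req.cBalloonedPages = cPagesGivenUpByGuest;
 *  int rc = GMMR0BalloonedPagesReq(pGVM, idCpu, &Req);
 * @endcode
 */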
3905
3906
3907/**
3908 * Return memory statistics for the hypervisor.
3909 *
3910 * @returns VBox status code.
3911 * @param pReq Pointer to the request packet.
3912 */
3913GMMR0DECL(int) GMMR0QueryHypervisorMemoryStatsReq(PGMMMEMSTATSREQ pReq)
3914{
3915 /*
3916 * Validate input and pass it on.
3917 */
3918 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3919 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMMEMSTATSREQ),
3920 ("%#x < %#x\n", pReq->Hdr.cbReq, sizeof(GMMMEMSTATSREQ)),
3921 VERR_INVALID_PARAMETER);
3922
3923 /*
3924 * Validate input and get the basics.
3925 */
3926 PGMM pGMM;
3927 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3928 pReq->cAllocPages = pGMM->cAllocatedPages;
3929 pReq->cFreePages = (pGMM->cChunks << (GMM_CHUNK_SHIFT - PAGE_SHIFT)) - pGMM->cAllocatedPages;
3930 pReq->cBalloonedPages = pGMM->cBalloonedPages;
3931 pReq->cMaxPages = pGMM->cMaxPages;
3932 pReq->cSharedPages = pGMM->cDuplicatePages;
3933 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3934
3935 return VINF_SUCCESS;
3936}
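
/*
 * Editor's example (a hedged sketch): querying the global statistics.  Note
 * that cFreePages only counts free pages inside currently allocated chunks,
 * i.e. (cChunks << (GMM_CHUNK_SHIFT - PAGE_SHIFT)) - cAllocatedPages.
 * @code
 *  GMMMEMSTATSREQ Req;
 *  RT_ZERO(Req);
 *  Req.Hdr.u32Magic = SUPVMMR0REQHDR_MAGIC;
 *  Req.Hdr.cbReq    = sizeof(Req);
 *  int rc = GMMR0QueryHypervisorMemoryStatsReq(&Req);
 *  // On success Req.cAllocPages, Req.cFreePages, Req.cBalloonedPages,
 *  // Req.cMaxPages and Req.cSharedPages hold the current global counters.
 * @endcode
 */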
3937
3938
3939/**
3940 * Return memory statistics for the VM.
3941 *
3942 * @returns VBox status code.
3943 * @param pGVM The global (ring-0) VM structure.
3944 * @param idCpu Cpu id.
3945 * @param pReq Pointer to the request packet.
3946 *
3947 * @thread EMT(idCpu)
3948 */
3949GMMR0DECL(int) GMMR0QueryMemoryStatsReq(PGVM pGVM, VMCPUID idCpu, PGMMMEMSTATSREQ pReq)
3950{
3951 /*
3952 * Validate input and pass it on.
3953 */
3954 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3955 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMMEMSTATSREQ),
3956 ("%#x < %#x\n", pReq->Hdr.cbReq, sizeof(GMMMEMSTATSREQ)),
3957 VERR_INVALID_PARAMETER);
3958
3959 /*
3960 * Validate input and get the basics.
3961 */
3962 PGMM pGMM;
3963 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3964 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3965 if (RT_FAILURE(rc))
3966 return rc;
3967
3968 /*
3969 * Take the semaphore and do some more validations.
3970 */
3971 gmmR0MutexAcquire(pGMM);
3972 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3973 {
3974 pReq->cAllocPages = pGVM->gmm.s.Stats.Allocated.cBasePages;
3975 pReq->cBalloonedPages = pGVM->gmm.s.Stats.cBalloonedPages;
3976 pReq->cMaxPages = pGVM->gmm.s.Stats.Reserved.cBasePages;
3977 pReq->cFreePages = pReq->cMaxPages - pReq->cAllocPages;
3978 }
3979 else
3980 rc = VERR_GMM_IS_NOT_SANE;
3981
3982 gmmR0MutexRelease(pGMM);
3983 LogFlow(("GMMR3QueryVMMemoryStats: returns %Rrc\n", rc));
3984 return rc;
3985}
3986
3987
3988/**
3989 * Worker for gmmR0UnmapChunk and gmmR0FreeChunk.
3990 *
3991 * Don't call this in legacy allocation mode!
3992 *
3993 * @returns VBox status code.
3994 * @param pGMM Pointer to the GMM instance data.
3995 * @param pGVM Pointer to the Global VM structure.
3996 * @param pChunk Pointer to the chunk to be unmapped.
3997 */
3998static int gmmR0UnmapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
3999{
4000 RT_NOREF_PV(pGMM);
4001#ifdef GMM_WITH_LEGACY_MODE
4002 Assert(!pGMM->fLegacyAllocationMode || (pChunk->fFlags & GMM_CHUNK_FLAGS_LARGE_PAGE));
4003#endif
4004
4005 /*
4006 * Find the mapping and try unmapping it.
4007 */
4008 uint32_t cMappings = pChunk->cMappingsX;
4009 for (uint32_t i = 0; i < cMappings; i++)
4010 {
4011 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
4012 if (pChunk->paMappingsX[i].pGVM == pGVM)
4013 {
4014 /* unmap */
4015 int rc = RTR0MemObjFree(pChunk->paMappingsX[i].hMapObj, false /* fFreeMappings (NA) */);
4016 if (RT_SUCCESS(rc))
4017 {
4018 /* update the record. */
4019 cMappings--;
4020 if (i < cMappings)
4021 pChunk->paMappingsX[i] = pChunk->paMappingsX[cMappings];
4022 pChunk->paMappingsX[cMappings].hMapObj = NIL_RTR0MEMOBJ;
4023 pChunk->paMappingsX[cMappings].pGVM = NULL;
4024 Assert(pChunk->cMappingsX - 1U == cMappings);
4025 pChunk->cMappingsX = cMappings;
4026 }
4027
4028 return rc;
4029 }
4030 }
4031
4032 Log(("gmmR0UnmapChunk: Chunk %#x is not mapped into pGVM=%p/%#x\n", pChunk->Core.Key, pGVM, pGVM->hSelf));
4033 return VERR_GMM_CHUNK_NOT_MAPPED;
4034}
4035
4036
4037/**
4038 * Unmaps a chunk previously mapped into the address space of the current process.
4039 *
4040 * @returns VBox status code.
4041 * @param pGMM Pointer to the GMM instance data.
4042 * @param pGVM Pointer to the Global VM structure.
4043 * @param pChunk Pointer to the chunk to be unmapped.
4044 * @param fRelaxedSem Whether we can release the semaphore while doing the
4045 * mapping (@c true) or not.
4046 */
4047static int gmmR0UnmapChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem)
4048{
4049#ifdef GMM_WITH_LEGACY_MODE
4050 if (!pGMM->fLegacyAllocationMode || (pChunk->fFlags & GMM_CHUNK_FLAGS_LARGE_PAGE))
4051 {
4052#endif
4053 /*
4054 * Lock the chunk and if possible leave the giant GMM lock.
4055 */
4056 GMMR0CHUNKMTXSTATE MtxState;
4057 int rc = gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk,
4058 fRelaxedSem ? GMMR0CHUNK_MTX_RETAKE_GIANT : GMMR0CHUNK_MTX_KEEP_GIANT);
4059 if (RT_SUCCESS(rc))
4060 {
4061 rc = gmmR0UnmapChunkLocked(pGMM, pGVM, pChunk);
4062 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4063 }
4064 return rc;
4065#ifdef GMM_WITH_LEGACY_MODE
4066 }
4067
4068 if (pChunk->hGVM == pGVM->hSelf)
4069 return VINF_SUCCESS;
4070
4071 Log(("gmmR0UnmapChunk: Chunk %#x is not mapped into pGVM=%p/%#x (legacy)\n", pChunk->Core.Key, pGVM, pGVM->hSelf));
4072 return VERR_GMM_CHUNK_NOT_MAPPED;
4073#endif
4074}
4075
4076
4077/**
4078 * Worker for gmmR0MapChunk.
4079 *
4080 * @returns VBox status code.
4081 * @param pGMM Pointer to the GMM instance data.
4082 * @param pGVM Pointer to the Global VM structure.
4083 * @param pChunk Pointer to the chunk to be mapped.
4084 * @param ppvR3 Where to store the ring-3 address of the mapping.
4085 * In the VERR_GMM_CHUNK_ALREADY_MAPPED case, this will
4086 * contain the address of the existing mapping.
4087 */
4088static int gmmR0MapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, PRTR3PTR ppvR3)
4089{
4090#ifdef GMM_WITH_LEGACY_MODE
4091 /*
4092 * If we're in legacy mode this is simple.
4093 */
4094 if (pGMM->fLegacyAllocationMode && !(pChunk->fFlags & GMM_CHUNK_FLAGS_LARGE_PAGE))
4095 {
4096 if (pChunk->hGVM != pGVM->hSelf)
4097 {
4098 Log(("gmmR0MapChunk: chunk %#x is already mapped at %p!\n", pChunk->Core.Key, *ppvR3));
4099 return VERR_GMM_CHUNK_NOT_FOUND;
4100 }
4101
4102 *ppvR3 = RTR0MemObjAddressR3(pChunk->hMemObj);
4103 return VINF_SUCCESS;
4104 }
4105#else
4106 RT_NOREF(pGMM);
4107#endif
4108
4109 /*
4110 * Check to see if the chunk is already mapped.
4111 */
4112 for (uint32_t i = 0; i < pChunk->cMappingsX; i++)
4113 {
4114 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
4115 if (pChunk->paMappingsX[i].pGVM == pGVM)
4116 {
4117 *ppvR3 = RTR0MemObjAddressR3(pChunk->paMappingsX[i].hMapObj);
4118 Log(("gmmR0MapChunk: chunk %#x is already mapped at %p!\n", pChunk->Core.Key, *ppvR3));
4119#ifdef VBOX_WITH_PAGE_SHARING
4120 /* The ring-3 chunk cache can be out of sync; don't fail. */
4121 return VINF_SUCCESS;
4122#else
4123 return VERR_GMM_CHUNK_ALREADY_MAPPED;
4124#endif
4125 }
4126 }
4127
4128 /*
4129 * Do the mapping.
4130 */
4131 RTR0MEMOBJ hMapObj;
4132 int rc = RTR0MemObjMapUser(&hMapObj, pChunk->hMemObj, (RTR3PTR)-1, 0, RTMEM_PROT_READ | RTMEM_PROT_WRITE, NIL_RTR0PROCESS);
4133 if (RT_SUCCESS(rc))
4134 {
4135 /* reallocate the array? assumes few users per chunk (usually one). */
4136 unsigned iMapping = pChunk->cMappingsX;
4137 if ( iMapping <= 3
4138 || (iMapping & 3) == 0)
4139 {
4140 unsigned cNewSize = iMapping <= 3
4141 ? iMapping + 1
4142 : iMapping + 4;
4143 Assert(cNewSize < 4 || RT_ALIGN_32(cNewSize, 4) == cNewSize);
4144 if (RT_UNLIKELY(cNewSize > UINT16_MAX))
4145 {
4146 rc = RTR0MemObjFree(hMapObj, false /* fFreeMappings (NA) */); AssertRC(rc);
4147 return VERR_GMM_TOO_MANY_CHUNK_MAPPINGS;
4148 }
4149
4150 void *pvMappings = RTMemRealloc(pChunk->paMappingsX, cNewSize * sizeof(pChunk->paMappingsX[0]));
4151 if (RT_UNLIKELY(!pvMappings))
4152 {
4153 rc = RTR0MemObjFree(hMapObj, false /* fFreeMappings (NA) */); AssertRC(rc);
4154 return VERR_NO_MEMORY;
4155 }
4156 pChunk->paMappingsX = (PGMMCHUNKMAP)pvMappings;
4157 }
4158
4159 /* insert new entry */
4160 pChunk->paMappingsX[iMapping].hMapObj = hMapObj;
4161 pChunk->paMappingsX[iMapping].pGVM = pGVM;
4162 Assert(pChunk->cMappingsX == iMapping);
4163 pChunk->cMappingsX = iMapping + 1;
4164
4165 *ppvR3 = RTR0MemObjAddressR3(hMapObj);
4166 }
4167
4168 return rc;
4169}
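
/*
 * Editor's note on the mapping array growth above: the array is reallocated
 * while cMappingsX is 0..3 (one extra entry each time) and again whenever it
 * is a multiple of four (four extra entries), so the capacity sequence is
 * 1, 2, 3, 4, 8, 12, 16, ...  This keeps the common one-mapping case at a
 * single small allocation while bounding the number of reallocations for
 * chunks mapped into many VMs (only possible when fBoundMemoryMode is false).
 */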
4170
4171
4172/**
4173 * Maps a chunk into the user address space of the current process.
4174 *
4175 * @returns VBox status code.
4176 * @param pGMM Pointer to the GMM instance data.
4177 * @param pGVM Pointer to the Global VM structure.
4178 * @param pChunk Pointer to the chunk to be mapped.
4179 * @param fRelaxedSem Whether we can release the semaphore while doing the
4180 * mapping (@c true) or not.
4181 * @param ppvR3 Where to store the ring-3 address of the mapping.
4182 * In the VERR_GMM_CHUNK_ALREADY_MAPPED case, this will
4183 * contain the address of the existing mapping.
4184 */
4185static int gmmR0MapChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem, PRTR3PTR ppvR3)
4186{
4187 /*
4188 * Take the chunk lock and leave the giant GMM lock when possible, then
4189 * call the worker function.
4190 */
4191 GMMR0CHUNKMTXSTATE MtxState;
4192 int rc = gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk,
4193 fRelaxedSem ? GMMR0CHUNK_MTX_RETAKE_GIANT : GMMR0CHUNK_MTX_KEEP_GIANT);
4194 if (RT_SUCCESS(rc))
4195 {
4196 rc = gmmR0MapChunkLocked(pGMM, pGVM, pChunk, ppvR3);
4197 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4198 }
4199
4200 return rc;
4201}
4202
4203
4204
4205#if defined(VBOX_WITH_PAGE_SHARING) || (defined(VBOX_STRICT) && HC_ARCH_BITS == 64)
4206/**
4207 * Check if a chunk is mapped into the specified VM
4208 *
4209 * @returns mapped yes/no
4210 * @param pGMM Pointer to the GMM instance.
4211 * @param pGVM Pointer to the Global VM structure.
4212 * @param pChunk Pointer to the chunk to be mapped.
4213 * @param ppvR3 Where to store the ring-3 address of the mapping.
4214 */
4215static bool gmmR0IsChunkMapped(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, PRTR3PTR ppvR3)
4216{
4217 GMMR0CHUNKMTXSTATE MtxState;
4218 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
4219 for (uint32_t i = 0; i < pChunk->cMappingsX; i++)
4220 {
4221 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
4222 if (pChunk->paMappingsX[i].pGVM == pGVM)
4223 {
4224 *ppvR3 = RTR0MemObjAddressR3(pChunk->paMappingsX[i].hMapObj);
4225 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4226 return true;
4227 }
4228 }
4229 *ppvR3 = NULL;
4230 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4231 return false;
4232}
4233#endif /* VBOX_WITH_PAGE_SHARING || (VBOX_STRICT && 64-BIT) */
4234
4235
4236/**
4237 * Map a chunk and/or unmap another chunk.
4238 *
4239 * The mapping and unmapping applies to the current process.
4240 *
4241 * This API does two things because it saves a kernel call per mapping
4242 * when the ring-3 mapping cache is full.
4243 *
4244 * @returns VBox status code.
4245 * @param pGVM The global (ring-0) VM structure.
4246 * @param idChunkMap The chunk to map. NIL_GMM_CHUNKID if nothing to map.
4247 * @param idChunkUnmap The chunk to unmap. NIL_GMM_CHUNKID if nothing to unmap.
4248 * @param ppvR3 Where to store the address of the mapped chunk. NULL is ok if nothing to map.
4249 * @thread EMT ???
4250 */
4251GMMR0DECL(int) GMMR0MapUnmapChunk(PGVM pGVM, uint32_t idChunkMap, uint32_t idChunkUnmap, PRTR3PTR ppvR3)
4252{
4253 LogFlow(("GMMR0MapUnmapChunk: pGVM=%p idChunkMap=%#x idChunkUnmap=%#x ppvR3=%p\n",
4254 pGVM, idChunkMap, idChunkUnmap, ppvR3));
4255
4256 /*
4257 * Validate input and get the basics.
4258 */
4259 PGMM pGMM;
4260 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4261 int rc = GVMMR0ValidateGVM(pGVM);
4262 if (RT_FAILURE(rc))
4263 return rc;
4264
4265 AssertCompile(NIL_GMM_CHUNKID == 0);
4266 AssertMsgReturn(idChunkMap <= GMM_CHUNKID_LAST, ("%#x\n", idChunkMap), VERR_INVALID_PARAMETER);
4267 AssertMsgReturn(idChunkUnmap <= GMM_CHUNKID_LAST, ("%#x\n", idChunkUnmap), VERR_INVALID_PARAMETER);
4268
4269 if ( idChunkMap == NIL_GMM_CHUNKID
4270 && idChunkUnmap == NIL_GMM_CHUNKID)
4271 return VERR_INVALID_PARAMETER;
4272
4273 if (idChunkMap != NIL_GMM_CHUNKID)
4274 {
4275 AssertPtrReturn(ppvR3, VERR_INVALID_POINTER);
4276 *ppvR3 = NIL_RTR3PTR;
4277 }
4278
4279 /*
4280 * Take the semaphore and do the work.
4281 *
4282 * The unmapping is done last since it's easier to undo a mapping than
4283 * undoing an unmapping. The ring-3 mapping cache cannot be so big
4284 * that it pushes the user virtual address space to within a chunk of
4285 * its limits, so no problem here.
4286 */
4287 gmmR0MutexAcquire(pGMM);
4288 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4289 {
4290 PGMMCHUNK pMap = NULL;
4291 if (idChunkMap != NIL_GMM_CHUNKID)
4292 {
4293 pMap = gmmR0GetChunk(pGMM, idChunkMap);
4294 if (RT_LIKELY(pMap))
4295 rc = gmmR0MapChunk(pGMM, pGVM, pMap, true /*fRelaxedSem*/, ppvR3);
4296 else
4297 {
4298 Log(("GMMR0MapUnmapChunk: idChunkMap=%#x\n", idChunkMap));
4299 rc = VERR_GMM_CHUNK_NOT_FOUND;
4300 }
4301 }
4302/** @todo split this operation, the bail out might (theoretically) not be
4303 * entirely safe. */
4304
4305 if ( idChunkUnmap != NIL_GMM_CHUNKID
4306 && RT_SUCCESS(rc))
4307 {
4308 PGMMCHUNK pUnmap = gmmR0GetChunk(pGMM, idChunkUnmap);
4309 if (RT_LIKELY(pUnmap))
4310 rc = gmmR0UnmapChunk(pGMM, pGVM, pUnmap, true /*fRelaxedSem*/);
4311 else
4312 {
4313 Log(("GMMR0MapUnmapChunk: idChunkUnmap=%#x\n", idChunkUnmap));
4314 rc = VERR_GMM_CHUNK_NOT_FOUND;
4315 }
4316
4317 if (RT_FAILURE(rc) && pMap)
4318 gmmR0UnmapChunk(pGMM, pGVM, pMap, false /*fRelaxedSem*/);
4319 }
4320
4321 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4322 }
4323 else
4324 rc = VERR_GMM_IS_NOT_SANE;
4325 gmmR0MutexRelease(pGMM);
4326
4327 LogFlow(("GMMR0MapUnmapChunk: returns %Rrc\n", rc));
4328 return rc;
4329}
4330
4331
4332/**
4333 * VMMR0 request wrapper for GMMR0MapUnmapChunk.
4334 *
4335 * @returns see GMMR0MapUnmapChunk.
4336 * @param pGVM The global (ring-0) VM structure.
4337 * @param pReq Pointer to the request packet.
4338 */
4339GMMR0DECL(int) GMMR0MapUnmapChunkReq(PGVM pGVM, PGMMMAPUNMAPCHUNKREQ pReq)
4340{
4341 /*
4342 * Validate input and pass it on.
4343 */
4344 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4345 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4346
4347 return GMMR0MapUnmapChunk(pGVM, pReq->idChunkMap, pReq->idChunkUnmap, &pReq->pvR3);
4348}
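
/*
 * Editor's example (a hedged sketch, not part of the original sources): mapping
 * one chunk while evicting another in the same call, as the ring-3 mapping
 * cache would when it is full.  The chunk ID variables are illustrative.
 * @code
 *  GMMMAPUNMAPCHUNKREQ Req;
 *  Req.Hdr.u32Magic = SUPVMMR0REQHDR_MAGIC;
 *  Req.Hdr.cbReq    = sizeof(Req);
 *  Req.idChunkMap   = idChunkToMap;    // NIL_GMM_CHUNKID if only unmapping
 *  Req.idChunkUnmap = idChunkToEvict;  // NIL_GMM_CHUNKID if only mapping
 *  Req.pvR3         = NIL_RTR3PTR;
 *  int rc = GMMR0MapUnmapChunkReq(pGVM, &Req);
 *  // On success Req.pvR3 holds the ring-3 address of the newly mapped chunk.
 * @endcode
 */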
4349
4350
4351/**
4352 * Legacy mode API for supplying pages.
4353 *
4354 * The specified user address points to an allocation chunk sized block that
4355 * will be locked down and used by the GMM when the GM asks for pages.
4356 *
4357 * @returns VBox status code.
4358 * @param pGVM The global (ring-0) VM structure.
4359 * @param idCpu The VCPU id.
4360 * @param pvR3 Pointer to the chunk size memory block to lock down.
4361 */
4362GMMR0DECL(int) GMMR0SeedChunk(PGVM pGVM, VMCPUID idCpu, RTR3PTR pvR3)
4363{
4364#ifdef GMM_WITH_LEGACY_MODE
4365 /*
4366 * Validate input and get the basics.
4367 */
4368 PGMM pGMM;
4369 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4370 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
4371 if (RT_FAILURE(rc))
4372 return rc;
4373
4374 AssertPtrReturn(pvR3, VERR_INVALID_POINTER);
4375 AssertReturn(!(PAGE_OFFSET_MASK & pvR3), VERR_INVALID_POINTER);
4376
4377 if (!pGMM->fLegacyAllocationMode)
4378 {
4379 Log(("GMMR0SeedChunk: not in legacy allocation mode!\n"));
4380 return VERR_NOT_SUPPORTED;
4381 }
4382
4383 /*
4384 * Lock the memory and add it as new chunk with our hGVM.
4385 * (The GMM locking is done inside gmmR0RegisterChunk.)
4386 */
4387 RTR0MEMOBJ hMemObj;
4388 rc = RTR0MemObjLockUser(&hMemObj, pvR3, GMM_CHUNK_SIZE, RTMEM_PROT_READ | RTMEM_PROT_WRITE, NIL_RTR0PROCESS);
4389 if (RT_SUCCESS(rc))
4390 {
4391 rc = gmmR0RegisterChunk(pGMM, &pGVM->gmm.s.Private, hMemObj, pGVM->hSelf, GMM_CHUNK_FLAGS_SEEDED, NULL);
4392 if (RT_SUCCESS(rc))
4393 gmmR0MutexRelease(pGMM);
4394 else
4395 RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
4396 }
4397
4398 LogFlow(("GMMR0SeedChunk: rc=%d (pvR3=%p)\n", rc, pvR3));
4399 return rc;
4400#else
4401 RT_NOREF(pGVM, idCpu, pvR3);
4402 return VERR_NOT_SUPPORTED;
4403#endif
4404}
4405
4406#if defined(VBOX_WITH_RAM_IN_KERNEL) && !defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)
4407
4408/**
4409 * Gets the ring-0 virtual address for the given page.
4410 *
4411 * @returns VBox status code.
4412 * @param pGVM Pointer to the kernel-only VM instance data.
4413 * @param idPage The page ID.
4414 * @param ppv Where to store the address.
4415 * @thread EMT
4416 */
4417GMMR0DECL(int) GMMR0PageIdToVirt(PGVM pGVM, uint32_t idPage, void **ppv)
4418{
4419 *ppv = NULL;
4420 PGMM pGMM;
4421 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4422 gmmR0MutexAcquire(pGMM); /** @todo shared access */
4423
4424 int rc;
4425 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
4426 if (pChunk)
4427 {
4428 const GMMPAGE *pPage = &pChunk->aPages[idPage & GMM_PAGEID_IDX_MASK];
4429 if (RT_LIKELY( ( GMM_PAGE_IS_PRIVATE(pPage)
4430 && pPage->Private.hGVM == pGVM->hSelf)
4431 || GMM_PAGE_IS_SHARED(pPage)))
4432 {
4433 AssertPtr(pChunk->pbMapping);
4434 *ppv = &pChunk->pbMapping[(idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT];
4435 rc = VINF_SUCCESS;
4436 }
4437 else
4438 rc = VERR_GMM_NOT_PAGE_OWNER;
4439 }
4440 else
4441 rc = VERR_GMM_PAGE_NOT_FOUND;
4442
4443 gmmR0MutexRelease(pGMM);
4444 return rc;
4445}
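
/*
 * Editor's example (a hedged sketch): the lookup above simply decodes the page
 * ID into a chunk and a page index: chunk ID = idPage >> GMM_CHUNKID_SHIFT,
 * byte offset into the chunk mapping = (idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT.
 * @code
 *  void *pvPage = NULL;
 *  int rc = GMMR0PageIdToVirt(pGVM, idPage, &pvPage);
 *  if (RT_SUCCESS(rc))
 *  {
 *      // pvPage now points at the start of the guest page in ring-0.
 *  }
 * @endcode
 */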
4446
4447#endif
4448
4449#ifdef VBOX_WITH_PAGE_SHARING
4450
4451# ifdef VBOX_STRICT
4452/**
4453 * For checksumming shared pages in strict builds.
4454 *
4455 * The purpose is making sure that a page doesn't change.
4456 *
4457 * @returns Checksum, 0 on failure.
4458 * @param pGMM The GMM instance data.
4459 * @param pGVM Pointer to the kernel-only VM instance data.
4460 * @param idPage The page ID.
4461 */
4462static uint32_t gmmR0StrictPageChecksum(PGMM pGMM, PGVM pGVM, uint32_t idPage)
4463{
4464 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
4465 AssertMsgReturn(pChunk, ("idPage=%#x\n", idPage), 0);
4466
4467 uint8_t *pbChunk;
4468 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
4469 return 0;
4470 uint8_t const *pbPage = pbChunk + ((idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
4471
4472 return RTCrc32(pbPage, PAGE_SIZE);
4473}
4474# endif /* VBOX_STRICT */
4475
4476
4477/**
4478 * Calculates the module hash value.
4479 *
4480 * @returns Hash value.
4481 * @param pszModuleName The module name.
4482 * @param pszVersion The module version string.
4483 */
4484static uint32_t gmmR0ShModCalcHash(const char *pszModuleName, const char *pszVersion)
4485{
4486 return RTStrHash1ExN(3, pszModuleName, RTSTR_MAX, "::", (size_t)2, pszVersion, RTSTR_MAX);
4487}
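
/*
 * Editor's note (assumption about the IPRT semantics): RTStrHash1ExN hashes the
 * three parts as if they were one concatenated string, so the module key is
 * effectively the hash of "<name>::<version>".
 * @code
 *  // Illustrative only -- the actual module name/version depend on the guest:
 *  uint32_t uHash = gmmR0ShModCalcHash("ntdll.dll", "6.1.7601.17514");
 * @endcode
 */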
4488
4489
4490/**
4491 * Finds a global module.
4492 *
4493 * @returns Pointer to the global module on success, NULL if not found.
4494 * @param pGMM The GMM instance data.
4495 * @param uHash The hash as calculated by gmmR0ShModCalcHash.
4496 * @param cbModule The module size.
4497 * @param enmGuestOS The guest OS type.
4498 * @param cRegions The number of regions.
4499 * @param pszModuleName The module name.
4500 * @param pszVersion The module version.
4501 * @param paRegions The region descriptions.
4502 */
4503static PGMMSHAREDMODULE gmmR0ShModFindGlobal(PGMM pGMM, uint32_t uHash, uint32_t cbModule, VBOXOSFAMILY enmGuestOS,
4504 uint32_t cRegions, const char *pszModuleName, const char *pszVersion,
4505 struct VMMDEVSHAREDREGIONDESC const *paRegions)
4506{
4507 for (PGMMSHAREDMODULE pGblMod = (PGMMSHAREDMODULE)RTAvllU32Get(&pGMM->pGlobalSharedModuleTree, uHash);
4508 pGblMod;
4509 pGblMod = (PGMMSHAREDMODULE)pGblMod->Core.pList)
4510 {
4511 if (pGblMod->cbModule != cbModule)
4512 continue;
4513 if (pGblMod->enmGuestOS != enmGuestOS)
4514 continue;
4515 if (pGblMod->cRegions != cRegions)
4516 continue;
4517 if (strcmp(pGblMod->szName, pszModuleName))
4518 continue;
4519 if (strcmp(pGblMod->szVersion, pszVersion))
4520 continue;
4521
4522 uint32_t i;
4523 for (i = 0; i < cRegions; i++)
4524 {
4525 uint32_t off = paRegions[i].GCRegionAddr & PAGE_OFFSET_MASK;
4526 if (pGblMod->aRegions[i].off != off)
4527 break;
4528
4529 uint32_t cb = RT_ALIGN_32(paRegions[i].cbRegion + off, PAGE_SIZE);
4530 if (pGblMod->aRegions[i].cb != cb)
4531 break;
4532 }
4533
4534 if (i == cRegions)
4535 return pGblMod;
4536 }
4537
4538 return NULL;
4539}
4540
4541
4542/**
4543 * Creates a new global module.
4544 *
4545 * @returns VBox status code.
4546 * @param pGMM The GMM instance data.
4547 * @param uHash The hash as calculated by gmmR0ShModCalcHash.
4548 * @param cbModule The module size.
4549 * @param enmGuestOS The guest OS type.
4550 * @param cRegions The number of regions.
4551 * @param pszModuleName The module name.
4552 * @param pszVersion The module version.
4553 * @param paRegions The region descriptions.
4554 * @param ppGblMod Where to return the new module on success.
4555 */
4556static int gmmR0ShModNewGlobal(PGMM pGMM, uint32_t uHash, uint32_t cbModule, VBOXOSFAMILY enmGuestOS,
4557 uint32_t cRegions, const char *pszModuleName, const char *pszVersion,
4558 struct VMMDEVSHAREDREGIONDESC const *paRegions, PGMMSHAREDMODULE *ppGblMod)
4559{
4560 Log(("gmmR0ShModNewGlobal: %s %s size %#x os %u rgn %u\n", pszModuleName, pszVersion, cbModule, enmGuestOS, cRegions));
4561 if (pGMM->cShareableModules >= GMM_MAX_SHARED_GLOBAL_MODULES)
4562 {
4563 Log(("gmmR0ShModNewGlobal: Too many modules\n"));
4564 return VERR_GMM_TOO_MANY_GLOBAL_MODULES;
4565 }
4566
4567 PGMMSHAREDMODULE pGblMod = (PGMMSHAREDMODULE)RTMemAllocZ(RT_UOFFSETOF_DYN(GMMSHAREDMODULE, aRegions[cRegions]));
4568 if (!pGblMod)
4569 {
4570 Log(("gmmR0ShModNewGlobal: No memory\n"));
4571 return VERR_NO_MEMORY;
4572 }
4573
4574 pGblMod->Core.Key = uHash;
4575 pGblMod->cbModule = cbModule;
4576 pGblMod->cRegions = cRegions;
4577 pGblMod->cUsers = 1;
4578 pGblMod->enmGuestOS = enmGuestOS;
4579 strcpy(pGblMod->szName, pszModuleName);
4580 strcpy(pGblMod->szVersion, pszVersion);
4581
4582 for (uint32_t i = 0; i < cRegions; i++)
4583 {
4584 Log(("gmmR0ShModNewGlobal: rgn[%u]=%RGvLB%#x\n", i, paRegions[i].GCRegionAddr, paRegions[i].cbRegion));
4585 pGblMod->aRegions[i].off = paRegions[i].GCRegionAddr & PAGE_OFFSET_MASK;
4586 pGblMod->aRegions[i].cb = paRegions[i].cbRegion + pGblMod->aRegions[i].off;
4587 pGblMod->aRegions[i].cb = RT_ALIGN_32(pGblMod->aRegions[i].cb, PAGE_SIZE);
4588 pGblMod->aRegions[i].paidPages = NULL; /* allocated when needed. */
4589 }
4590
4591 bool fInsert = RTAvllU32Insert(&pGMM->pGlobalSharedModuleTree, &pGblMod->Core);
4592 Assert(fInsert); NOREF(fInsert);
4593 pGMM->cShareableModules++;
4594
4595 *ppGblMod = pGblMod;
4596 return VINF_SUCCESS;
4597}
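
/*
 * Editor's worked example of the region rounding above: a region reported at
 * GCRegionAddr=0x7ffe1234 with cbRegion=0x1000 is stored as off=0x234 and
 * cb=RT_ALIGN_32(0x1000 + 0x234, PAGE_SIZE)=0x2000, i.e. the region is widened
 * to cover every guest page it touches.
 */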
4598
4599
4600/**
4601 * Deletes a global module which is no longer referenced by anyone.
4602 *
4603 * @param pGMM The GMM instance data.
4604 * @param pGblMod The module to delete.
4605 */
4606static void gmmR0ShModDeleteGlobal(PGMM pGMM, PGMMSHAREDMODULE pGblMod)
4607{
4608 Assert(pGblMod->cUsers == 0);
4609 Assert(pGMM->cShareableModules > 0 && pGMM->cShareableModules <= GMM_MAX_SHARED_GLOBAL_MODULES);
4610
4611 void *pvTest = RTAvllU32RemoveNode(&pGMM->pGlobalSharedModuleTree, &pGblMod->Core);
4612 Assert(pvTest == pGblMod); NOREF(pvTest);
4613 pGMM->cShareableModules--;
4614
4615 uint32_t i = pGblMod->cRegions;
4616 while (i-- > 0)
4617 {
4618 if (pGblMod->aRegions[i].paidPages)
4619 {
4620 /* We don't do anything to the pages as they are handled by the
4621 copy-on-write mechanism in PGM. */
4622 RTMemFree(pGblMod->aRegions[i].paidPages);
4623 pGblMod->aRegions[i].paidPages = NULL;
4624 }
4625 }
4626 RTMemFree(pGblMod);
4627}
4628
4629
4630static int gmmR0ShModNewPerVM(PGVM pGVM, RTGCPTR GCBaseAddr, uint32_t cRegions, const VMMDEVSHAREDREGIONDESC *paRegions,
4631 PGMMSHAREDMODULEPERVM *ppRecVM)
4632{
4633 if (pGVM->gmm.s.Stats.cShareableModules >= GMM_MAX_SHARED_PER_VM_MODULES)
4634 return VERR_GMM_TOO_MANY_PER_VM_MODULES;
4635
4636 PGMMSHAREDMODULEPERVM pRecVM;
4637 pRecVM = (PGMMSHAREDMODULEPERVM)RTMemAllocZ(RT_UOFFSETOF_DYN(GMMSHAREDMODULEPERVM, aRegionsGCPtrs[cRegions]));
4638 if (!pRecVM)
4639 return VERR_NO_MEMORY;
4640
4641 pRecVM->Core.Key = GCBaseAddr;
4642 for (uint32_t i = 0; i < cRegions; i++)
4643 pRecVM->aRegionsGCPtrs[i] = paRegions[i].GCRegionAddr;
4644
4645 bool fInsert = RTAvlGCPtrInsert(&pGVM->gmm.s.pSharedModuleTree, &pRecVM->Core);
4646 Assert(fInsert); NOREF(fInsert);
4647 pGVM->gmm.s.Stats.cShareableModules++;
4648
4649 *ppRecVM = pRecVM;
4650 return VINF_SUCCESS;
4651}
4652
4653
4654static void gmmR0ShModDeletePerVM(PGMM pGMM, PGVM pGVM, PGMMSHAREDMODULEPERVM pRecVM, bool fRemove)
4655{
4656 /*
4657 * Free the per-VM module.
4658 */
4659 PGMMSHAREDMODULE pGblMod = pRecVM->pGlobalModule;
4660 pRecVM->pGlobalModule = NULL;
4661
4662 if (fRemove)
4663 {
4664 void *pvTest = RTAvlGCPtrRemove(&pGVM->gmm.s.pSharedModuleTree, pRecVM->Core.Key);
4665 Assert(pvTest == &pRecVM->Core); NOREF(pvTest);
4666 }
4667
4668 RTMemFree(pRecVM);
4669
4670 /*
4671 * Release the global module.
4672 * (In the registration bailout case, it might not be.)
4673 */
4674 if (pGblMod)
4675 {
4676 Assert(pGblMod->cUsers > 0);
4677 pGblMod->cUsers--;
4678 if (pGblMod->cUsers == 0)
4679 gmmR0ShModDeleteGlobal(pGMM, pGblMod);
4680 }
4681}
4682
4683#endif /* VBOX_WITH_PAGE_SHARING */
4684
4685/**
4686 * Registers a new shared module for the VM.
4687 *
4688 * @returns VBox status code.
4689 * @param pGVM The global (ring-0) VM structure.
4690 * @param idCpu The VCPU id.
4691 * @param enmGuestOS The guest OS type.
4692 * @param pszModuleName The module name.
4693 * @param pszVersion The module version.
4694 * @param GCPtrModBase The module base address.
4695 * @param cbModule The module size.
4696 * @param cRegions The number of shared region descriptors.
4697 * @param paRegions Pointer to an array of shared region(s).
4698 * @thread EMT(idCpu)
4699 */
4700GMMR0DECL(int) GMMR0RegisterSharedModule(PGVM pGVM, VMCPUID idCpu, VBOXOSFAMILY enmGuestOS, char *pszModuleName,
4701 char *pszVersion, RTGCPTR GCPtrModBase, uint32_t cbModule,
4702 uint32_t cRegions, struct VMMDEVSHAREDREGIONDESC const *paRegions)
4703{
4704#ifdef VBOX_WITH_PAGE_SHARING
4705 /*
4706 * Validate input and get the basics.
4707 *
4708 * Note! Turns out the module size does not necessarily match the size of the
4709 * regions. (iTunes on XP)
4710 */
4711 PGMM pGMM;
4712 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4713 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
4714 if (RT_FAILURE(rc))
4715 return rc;
4716
4717 if (RT_UNLIKELY(cRegions > VMMDEVSHAREDREGIONDESC_MAX))
4718 return VERR_GMM_TOO_MANY_REGIONS;
4719
4720 if (RT_UNLIKELY(cbModule == 0 || cbModule > _1G))
4721 return VERR_GMM_BAD_SHARED_MODULE_SIZE;
4722
4723 uint32_t cbTotal = 0;
4724 for (uint32_t i = 0; i < cRegions; i++)
4725 {
4726 if (RT_UNLIKELY(paRegions[i].cbRegion == 0 || paRegions[i].cbRegion > _1G))
4727 return VERR_GMM_SHARED_MODULE_BAD_REGIONS_SIZE;
4728
4729 cbTotal += paRegions[i].cbRegion;
4730 if (RT_UNLIKELY(cbTotal > _1G))
4731 return VERR_GMM_SHARED_MODULE_BAD_REGIONS_SIZE;
4732 }
4733
4734 AssertPtrReturn(pszModuleName, VERR_INVALID_POINTER);
4735 if (RT_UNLIKELY(!memchr(pszModuleName, '\0', GMM_SHARED_MODULE_MAX_NAME_STRING)))
4736 return VERR_GMM_MODULE_NAME_TOO_LONG;
4737
4738 AssertPtrReturn(pszVersion, VERR_INVALID_POINTER);
4739 if (RT_UNLIKELY(!memchr(pszVersion, '\0', GMM_SHARED_MODULE_MAX_VERSION_STRING)))
4740 return VERR_GMM_MODULE_NAME_TOO_LONG;
4741
4742 uint32_t const uHash = gmmR0ShModCalcHash(pszModuleName, pszVersion);
4743 Log(("GMMR0RegisterSharedModule %s %s base %RGv size %x hash %x\n", pszModuleName, pszVersion, GCPtrModBase, cbModule, uHash));
4744
4745 /*
4746 * Take the semaphore and do some more validations.
4747 */
4748 gmmR0MutexAcquire(pGMM);
4749 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4750 {
4751 /*
4752 * Check if this module is already locally registered and register
4753 * it if it isn't. The base address is a unique module identifier
4754 * locally.
4755 */
4756 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)RTAvlGCPtrGet(&pGVM->gmm.s.pSharedModuleTree, GCPtrModBase);
4757 bool fNewModule = pRecVM == NULL;
4758 if (fNewModule)
4759 {
4760 rc = gmmR0ShModNewPerVM(pGVM, GCPtrModBase, cRegions, paRegions, &pRecVM);
4761 if (RT_SUCCESS(rc))
4762 {
4763 /*
4764 * Find a matching global module, register a new one if needed.
4765 */
4766 PGMMSHAREDMODULE pGblMod = gmmR0ShModFindGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4767 pszModuleName, pszVersion, paRegions);
4768 if (!pGblMod)
4769 {
4770 Assert(fNewModule);
4771 rc = gmmR0ShModNewGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4772 pszModuleName, pszVersion, paRegions, &pGblMod);
4773 if (RT_SUCCESS(rc))
4774 {
4775 pRecVM->pGlobalModule = pGblMod; /* (One reference returned by gmmR0ShModNewGlobal.) */
4776 Log(("GMMR0RegisterSharedModule: new module %s %s\n", pszModuleName, pszVersion));
4777 }
4778 else
4779 gmmR0ShModDeletePerVM(pGMM, pGVM, pRecVM, true /*fRemove*/);
4780 }
4781 else
4782 {
4783 Assert(pGblMod->cUsers > 0 && pGblMod->cUsers < UINT32_MAX / 2);
4784 pGblMod->cUsers++;
4785 pRecVM->pGlobalModule = pGblMod;
4786
4787 Log(("GMMR0RegisterSharedModule: new per vm module %s %s, gbl users %d\n", pszModuleName, pszVersion, pGblMod->cUsers));
4788 }
4789 }
4790 }
4791 else
4792 {
4793 /*
4794 * Attempt to re-register an existing module.
4795 */
4796 PGMMSHAREDMODULE pGblMod = gmmR0ShModFindGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4797 pszModuleName, pszVersion, paRegions);
4798 if (pRecVM->pGlobalModule == pGblMod)
4799 {
4800 Log(("GMMR0RegisterSharedModule: already registered %s %s, gbl users %d\n", pszModuleName, pszVersion, pGblMod->cUsers));
4801 rc = VINF_GMM_SHARED_MODULE_ALREADY_REGISTERED;
4802 }
4803 else
4804 {
4805 /** @todo may have to unregister+register when this happens in case it's caused
4806 * by VBoxService crashing and being restarted... */
4807 Log(("GMMR0RegisterSharedModule: Address clash!\n"
4808 " incoming at %RGvLB%#x %s %s rgns %u\n"
4809 " existing at %RGvLB%#x %s %s rgns %u\n",
4810 GCPtrModBase, cbModule, pszModuleName, pszVersion, cRegions,
4811 pRecVM->Core.Key, pRecVM->pGlobalModule->cbModule, pRecVM->pGlobalModule->szName,
4812 pRecVM->pGlobalModule->szVersion, pRecVM->pGlobalModule->cRegions));
4813 rc = VERR_GMM_SHARED_MODULE_ADDRESS_CLASH;
4814 }
4815 }
4816 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4817 }
4818 else
4819 rc = VERR_GMM_IS_NOT_SANE;
4820
4821 gmmR0MutexRelease(pGMM);
4822 return rc;
4823#else
4824
4825 NOREF(pGVM); NOREF(idCpu); NOREF(enmGuestOS); NOREF(pszModuleName); NOREF(pszVersion);
4826 NOREF(GCPtrModBase); NOREF(cbModule); NOREF(cRegions); NOREF(paRegions);
4827 return VERR_NOT_IMPLEMENTED;
4828#endif
4829}
4830
4831
4832/**
4833 * VMMR0 request wrapper for GMMR0RegisterSharedModule.
4834 *
4835 * @returns see GMMR0RegisterSharedModule.
4836 * @param pGVM The global (ring-0) VM structure.
4837 * @param idCpu The VCPU id.
4838 * @param pReq Pointer to the request packet.
4839 */
4840GMMR0DECL(int) GMMR0RegisterSharedModuleReq(PGVM pGVM, VMCPUID idCpu, PGMMREGISTERSHAREDMODULEREQ pReq)
4841{
4842 /*
4843 * Validate input and pass it on.
4844 */
4845 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4846 AssertMsgReturn( pReq->Hdr.cbReq >= sizeof(*pReq)
4847 && pReq->Hdr.cbReq == RT_UOFFSETOF_DYN(GMMREGISTERSHAREDMODULEREQ, aRegions[pReq->cRegions]),
4848 ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4849
4850 /* Pass back return code in the request packet to preserve informational codes. (VMMR3CallR0 chokes on them) */
4851 pReq->rc = GMMR0RegisterSharedModule(pGVM, idCpu, pReq->enmGuestOS, pReq->szName, pReq->szVersion,
4852 pReq->GCBaseAddr, pReq->cbModule, pReq->cRegions, pReq->aRegions);
4853 return VINF_SUCCESS;
4854}
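
/*
 * Editor's example (a hedged sketch, not part of the original sources): a
 * single-region registration request as validated by the wrapper above.  The
 * module name, version, guest OS family value and region layout are purely
 * illustrative assumptions; the structure fields are the ones referenced by
 * GMMR0RegisterSharedModuleReq.
 * @code
 *  uint32_t const cRegions = 1;
 *  PGMMREGISTERSHAREDMODULEREQ pReq = (PGMMREGISTERSHAREDMODULEREQ)
 *      RTMemAllocZ(RT_UOFFSETOF_DYN(GMMREGISTERSHAREDMODULEREQ, aRegions[cRegions]));
 *  if (pReq)
 *  {
 *      pReq->Hdr.u32Magic = SUPVMMR0REQHDR_MAGIC;
 *      pReq->Hdr.cbReq    = RT_UOFFSETOF_DYN(GMMREGISTERSHAREDMODULEREQ, aRegions[cRegions]);
 *      pReq->enmGuestOS   = VBOXOSFAMILY_Windows64;   // assumed enum value
 *      pReq->GCBaseAddr   = GCPtrModBase;
 *      pReq->cbModule     = cbModule;
 *      pReq->cRegions     = cRegions;
 *      strcpy(pReq->szName, "ntdll.dll");
 *      strcpy(pReq->szVersion, "6.1.7601.17514");
 *      pReq->aRegions[0].GCRegionAddr = GCPtrModBase;
 *      pReq->aRegions[0].cbRegion     = cbModule;
 *      int rc = GMMR0RegisterSharedModuleReq(pGVM, idCpu, pReq);
 *      // The status of the registration itself is returned in pReq->rc so that
 *      // informational codes survive the VMMR0 dispatcher.
 *      RTMemFree(pReq);
 *  }
 * @endcode
 */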
4855
4856
4857/**
4858 * Unregisters a shared module for the VM
4859 *
4860 * @returns VBox status code.
4861 * @param pGVM The global (ring-0) VM structure.
4862 * @param idCpu The VCPU id.
4863 * @param pszModuleName The module name.
4864 * @param pszVersion The module version.
4865 * @param GCPtrModBase The module base address.
4866 * @param cbModule The module size.
4867 */
4868GMMR0DECL(int) GMMR0UnregisterSharedModule(PGVM pGVM, VMCPUID idCpu, char *pszModuleName, char *pszVersion,
4869 RTGCPTR GCPtrModBase, uint32_t cbModule)
4870{
4871#ifdef VBOX_WITH_PAGE_SHARING
4872 /*
4873 * Validate input and get the basics.
4874 */
4875 PGMM pGMM;
4876 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4877 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
4878 if (RT_FAILURE(rc))
4879 return rc;
4880
4881 AssertPtrReturn(pszModuleName, VERR_INVALID_POINTER);
4882 AssertPtrReturn(pszVersion, VERR_INVALID_POINTER);
4883 if (RT_UNLIKELY(!memchr(pszModuleName, '\0', GMM_SHARED_MODULE_MAX_NAME_STRING)))
4884 return VERR_GMM_MODULE_NAME_TOO_LONG;
4885 if (RT_UNLIKELY(!memchr(pszVersion, '\0', GMM_SHARED_MODULE_MAX_VERSION_STRING)))
4886 return VERR_GMM_MODULE_NAME_TOO_LONG;
4887
4888 Log(("GMMR0UnregisterSharedModule %s %s base=%RGv size %x\n", pszModuleName, pszVersion, GCPtrModBase, cbModule));
4889
4890 /*
4891 * Take the semaphore and do some more validations.
4892 */
4893 gmmR0MutexAcquire(pGMM);
4894 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4895 {
4896 /*
4897 * Locate and remove the specified module.
4898 */
4899 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)RTAvlGCPtrGet(&pGVM->gmm.s.pSharedModuleTree, GCPtrModBase);
4900 if (pRecVM)
4901 {
4902 /** @todo Do we need to do more validations here, like that the
4903 * name + version + cbModule matches? */
4904 NOREF(cbModule);
4905 Assert(pRecVM->pGlobalModule);
4906 gmmR0ShModDeletePerVM(pGMM, pGVM, pRecVM, true /*fRemove*/);
4907 }
4908 else
4909 rc = VERR_GMM_SHARED_MODULE_NOT_FOUND;
4910
4911 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4912 }
4913 else
4914 rc = VERR_GMM_IS_NOT_SANE;
4915
4916 gmmR0MutexRelease(pGMM);
4917 return rc;
4918#else
4919
4920 NOREF(pGVM); NOREF(idCpu); NOREF(pszModuleName); NOREF(pszVersion); NOREF(GCPtrModBase); NOREF(cbModule);
4921 return VERR_NOT_IMPLEMENTED;
4922#endif
4923}
4924
4925
4926/**
4927 * VMMR0 request wrapper for GMMR0UnregisterSharedModule.
4928 *
4929 * @returns see GMMR0UnregisterSharedModule.
4930 * @param pGVM The global (ring-0) VM structure.
4931 * @param idCpu The VCPU id.
4932 * @param pReq Pointer to the request packet.
4933 */
4934GMMR0DECL(int) GMMR0UnregisterSharedModuleReq(PGVM pGVM, VMCPUID idCpu, PGMMUNREGISTERSHAREDMODULEREQ pReq)
4935{
4936 /*
4937 * Validate input and pass it on.
4938 */
4939 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4940 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4941
4942 return GMMR0UnregisterSharedModule(pGVM, idCpu, pReq->szName, pReq->szVersion, pReq->GCBaseAddr, pReq->cbModule);
4943}
4944
4945#ifdef VBOX_WITH_PAGE_SHARING
4946
4947/**
4948 * Increase the use count of a shared page, the page is known to exist and be valid and such.
4949 *
4950 * @param pGMM Pointer to the GMM instance.
4951 * @param pGVM Pointer to the GVM instance.
4952 * @param pPage The page structure.
4953 */
4954DECLINLINE(void) gmmR0UseSharedPage(PGMM pGMM, PGVM pGVM, PGMMPAGE pPage)
4955{
4956 Assert(pGMM->cSharedPages > 0);
4957 Assert(pGMM->cAllocatedPages > 0);
4958
4959 pGMM->cDuplicatePages++;
4960
4961 pPage->Shared.cRefs++;
4962 pGVM->gmm.s.Stats.cSharedPages++;
4963 pGVM->gmm.s.Stats.Allocated.cBasePages++;
4964}
4965
4966
4967/**
4968 * Converts a private page to a shared page, the page is known to exist and be valid and such.
4969 *
4970 * @param pGMM Pointer to the GMM instance.
4971 * @param pGVM Pointer to the GVM instance.
4972 * @param HCPhys Host physical address
4973 * @param idPage The Page ID
4974 * @param pPage The page structure.
4975 * @param pPageDesc Shared page descriptor
4976 */
4977DECLINLINE(void) gmmR0ConvertToSharedPage(PGMM pGMM, PGVM pGVM, RTHCPHYS HCPhys, uint32_t idPage, PGMMPAGE pPage,
4978 PGMMSHAREDPAGEDESC pPageDesc)
4979{
4980 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
4981 Assert(pChunk);
4982 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
4983 Assert(GMM_PAGE_IS_PRIVATE(pPage));
4984
4985 pChunk->cPrivate--;
4986 pChunk->cShared++;
4987
4988 pGMM->cSharedPages++;
4989
4990 pGVM->gmm.s.Stats.cSharedPages++;
4991 pGVM->gmm.s.Stats.cPrivatePages--;
4992
4993 /* Modify the page structure. */
4994 pPage->Shared.pfn = (uint32_t)(uint64_t)(HCPhys >> PAGE_SHIFT);
4995 pPage->Shared.cRefs = 1;
4996#ifdef VBOX_STRICT
4997 pPageDesc->u32StrictChecksum = gmmR0StrictPageChecksum(pGMM, pGVM, idPage);
4998 pPage->Shared.u14Checksum = pPageDesc->u32StrictChecksum;
4999#else
5000 NOREF(pPageDesc);
5001 pPage->Shared.u14Checksum = 0;
5002#endif
5003 pPage->Shared.u2State = GMM_PAGE_STATE_SHARED;
5004}
5005
5006
5007static int gmmR0SharedModuleCheckPageFirstTime(PGMM pGMM, PGVM pGVM, PGMMSHAREDMODULE pModule,
5008 unsigned idxRegion, unsigned idxPage,
5009 PGMMSHAREDPAGEDESC pPageDesc, PGMMSHAREDREGIONDESC pGlobalRegion)
5010{
5011 NOREF(pModule);
5012
5013 /* Easy case: just change the internal page type. */
5014 PGMMPAGE pPage = gmmR0GetPage(pGMM, pPageDesc->idPage);
5015 AssertMsgReturn(pPage, ("idPage=%#x (GCPhys=%RGp HCPhys=%RHp idxRegion=%#x idxPage=%#x) #1\n",
5016 pPageDesc->idPage, pPageDesc->GCPhys, pPageDesc->HCPhys, idxRegion, idxPage),
5017 VERR_PGM_PHYS_INVALID_PAGE_ID);
5018 NOREF(idxRegion);
5019
5020 AssertMsg(pPageDesc->GCPhys == (pPage->Private.pfn << 12), ("desc %RGp gmm %RGp\n", pPageDesc->GCPhys, (pPage->Private.pfn << 12)));
5021
5022 gmmR0ConvertToSharedPage(pGMM, pGVM, pPageDesc->HCPhys, pPageDesc->idPage, pPage, pPageDesc);
5023
5024 /* Keep track of these references. */
5025 pGlobalRegion->paidPages[idxPage] = pPageDesc->idPage;
5026
5027 return VINF_SUCCESS;
5028}
5029
5030/**
5031 * Checks the specified shared module page for changes.
5032 *
5033 * Performs the following tasks:
5034 * - If a shared page is new, then it changes the GMM page type to shared and
5035 * returns it in the pPageDesc descriptor.
5036 * - If a shared page already exists, then it checks if the VM page is
5037 * identical and if so frees the VM page and returns the shared page in
5038 * the pPageDesc descriptor.
5039 *
5040 * @remarks ASSUMES the caller has acquired the GMM semaphore!!
5041 *
5042 * @returns VBox status code.
5043 * @param pGVM Pointer to the GVM instance data.
5044 * @param pModule Module description
5045 * @param idxRegion Region index
5046 * @param idxPage Page index
5047 * @param pPageDesc Page descriptor
5048 */
5049GMMR0DECL(int) GMMR0SharedModuleCheckPage(PGVM pGVM, PGMMSHAREDMODULE pModule, uint32_t idxRegion, uint32_t idxPage,
5050 PGMMSHAREDPAGEDESC pPageDesc)
5051{
5052 int rc;
5053 PGMM pGMM;
5054 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5055 pPageDesc->u32StrictChecksum = 0;
5056
5057 AssertMsgReturn(idxRegion < pModule->cRegions,
5058 ("idxRegion=%#x cRegions=%#x %s %s\n", idxRegion, pModule->cRegions, pModule->szName, pModule->szVersion),
5059 VERR_INVALID_PARAMETER);
5060
5061 uint32_t const cPages = pModule->aRegions[idxRegion].cb >> PAGE_SHIFT;
5062 AssertMsgReturn(idxPage < cPages,
5063 ("idxRegion=%#x cRegions=%#x %s %s\n", idxRegion, pModule->cRegions, pModule->szName, pModule->szVersion),
5064 VERR_INVALID_PARAMETER);
5065
5066 LogFlow(("GMMR0SharedModuleCheckPage %s base %RGv region %d idxPage %d\n", pModule->szName, pModule->Core.Key, idxRegion, idxPage));
5067
5068 /*
5069 * First time; create a page descriptor array.
5070 */
5071 PGMMSHAREDREGIONDESC pGlobalRegion = &pModule->aRegions[idxRegion];
5072 if (!pGlobalRegion->paidPages)
5073 {
5074 Log(("Allocate page descriptor array for %d pages\n", cPages));
5075 pGlobalRegion->paidPages = (uint32_t *)RTMemAlloc(cPages * sizeof(pGlobalRegion->paidPages[0]));
5076 AssertReturn(pGlobalRegion->paidPages, VERR_NO_MEMORY);
5077
5078 /* Invalidate all descriptors. */
5079 uint32_t i = cPages;
5080 while (i-- > 0)
5081 pGlobalRegion->paidPages[i] = NIL_GMM_PAGEID;
5082 }
5083
5084 /*
5085 * We've seen this shared page for the first time?
5086 */
5087 if (pGlobalRegion->paidPages[idxPage] == NIL_GMM_PAGEID)
5088 {
5089 Log(("New shared page guest %RGp host %RHp\n", pPageDesc->GCPhys, pPageDesc->HCPhys));
5090 return gmmR0SharedModuleCheckPageFirstTime(pGMM, pGVM, pModule, idxRegion, idxPage, pPageDesc, pGlobalRegion);
5091 }
5092
5093 /*
5094 * We've seen it before...
5095 */
5096 Log(("Replace existing page guest %RGp host %RHp id %#x -> id %#x\n",
5097 pPageDesc->GCPhys, pPageDesc->HCPhys, pPageDesc->idPage, pGlobalRegion->paidPages[idxPage]));
5098 Assert(pPageDesc->idPage != pGlobalRegion->paidPages[idxPage]);
5099
5100 /*
5101 * Get the shared page source.
5102 */
5103 PGMMPAGE pPage = gmmR0GetPage(pGMM, pGlobalRegion->paidPages[idxPage]);
5104 AssertMsgReturn(pPage, ("idPage=%#x (idxRegion=%#x idxPage=%#x) #2\n", pGlobalRegion->paidPages[idxPage], idxRegion, idxPage),
5105 VERR_PGM_PHYS_INVALID_PAGE_ID);
5106
5107 if (pPage->Common.u2State != GMM_PAGE_STATE_SHARED)
5108 {
5109 /*
5110 * Page was freed at some point; invalidate this entry.
5111 */
5112 /** @todo this isn't really bullet proof. */
5113 Log(("Old shared page was freed -> create a new one\n"));
5114 pGlobalRegion->paidPages[idxPage] = NIL_GMM_PAGEID;
5115 return gmmR0SharedModuleCheckPageFirstTime(pGMM, pGVM, pModule, idxRegion, idxPage, pPageDesc, pGlobalRegion);
5116 }
5117
5118 Log(("Replace existing page guest host %RHp -> %RHp\n", pPageDesc->HCPhys, ((uint64_t)pPage->Shared.pfn) << PAGE_SHIFT));
5119
5120 /*
5121 * Calculate the virtual address of the local page.
5122 */
5123 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pPageDesc->idPage >> GMM_CHUNKID_SHIFT);
5124 AssertMsgReturn(pChunk, ("idPage=%#x (idxRegion=%#x idxPage=%#x) #4\n", pPageDesc->idPage, idxRegion, idxPage),
5125 VERR_PGM_PHYS_INVALID_PAGE_ID);
5126
5127 uint8_t *pbChunk;
5128 AssertMsgReturn(gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk),
5129 ("idPage=%#x (idxRegion=%#x idxPage=%#x) #3\n", pPageDesc->idPage, idxRegion, idxPage),
5130 VERR_PGM_PHYS_INVALID_PAGE_ID);
5131 uint8_t *pbLocalPage = pbChunk + ((pPageDesc->idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
5132
5133 /*
5134 * Calculate the virtual address of the shared page.
5135 */
5136 pChunk = gmmR0GetChunk(pGMM, pGlobalRegion->paidPages[idxPage] >> GMM_CHUNKID_SHIFT);
5137 Assert(pChunk); /* can't fail as gmmR0GetPage succeeded. */
5138
5139 /*
5140 * Get the virtual address of the physical page; map the chunk into the VM
5141 * process if not already done.
5142 */
5143 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5144 {
5145 Log(("Map chunk into process!\n"));
5146 rc = gmmR0MapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/, (PRTR3PTR)&pbChunk);
5147 AssertRCReturn(rc, rc);
5148 }
5149 uint8_t *pbSharedPage = pbChunk + ((pGlobalRegion->paidPages[idxPage] & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
5150
5151#ifdef VBOX_STRICT
5152 pPageDesc->u32StrictChecksum = RTCrc32(pbSharedPage, PAGE_SIZE);
5153 uint32_t uChecksum = pPageDesc->u32StrictChecksum & UINT32_C(0x00003fff);
5154 AssertMsg(!uChecksum || uChecksum == pPage->Shared.u14Checksum || !pPage->Shared.u14Checksum,
5155 ("%#x vs %#x - idPage=%#x - %s %s\n", uChecksum, pPage->Shared.u14Checksum,
5156 pGlobalRegion->paidPages[idxPage], pModule->szName, pModule->szVersion));
5157#endif
5158
5159 /** @todo write ASMMemComparePage. */
5160 if (memcmp(pbSharedPage, pbLocalPage, PAGE_SIZE))
5161 {
5162 Log(("Unexpected differences found between local and shared page; skip\n"));
5163 /* Signal to the caller that this one hasn't changed. */
5164 pPageDesc->idPage = NIL_GMM_PAGEID;
5165 return VINF_SUCCESS;
5166 }
5167
5168 /*
5169 * Free the old local page.
5170 */
5171 GMMFREEPAGEDESC PageDesc;
5172 PageDesc.idPage = pPageDesc->idPage;
5173 rc = gmmR0FreePages(pGMM, pGVM, 1, &PageDesc, GMMACCOUNT_BASE);
5174 AssertRCReturn(rc, rc);
5175
5176 gmmR0UseSharedPage(pGMM, pGVM, pPage);
5177
5178 /*
5179 * Pass along the new physical address & page id.
5180 */
5181 pPageDesc->HCPhys = ((uint64_t)pPage->Shared.pfn) << PAGE_SHIFT;
5182 pPageDesc->idPage = pGlobalRegion->paidPages[idxPage];
5183
5184 return VINF_SUCCESS;
5185}
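
/*
 * Usage sketch (illustration only): how a caller like PGM's shared module scan
 * could drive GMMR0SharedModuleCheckPage for the pages of one region and
 * interpret the result.  The paGCPhysPages/paHCPhysPages/paidPages arrays are
 * hypothetical caller-side bookkeeping, not GMM structures; how those values
 * are obtained is PGM's business.
 *
 * @code
 *      for (uint32_t idxPage = 0; idxPage < cPagesInRegion; idxPage++)
 *      {
 *          GMMSHAREDPAGEDESC PageDesc;
 *          PageDesc.GCPhys = paGCPhysPages[idxPage];   // guest physical address of the page
 *          PageDesc.HCPhys = paHCPhysPages[idxPage];   // current host physical address
 *          PageDesc.idPage = paidPages[idxPage];       // current (private) GMM page id
 *
 *          int rc = GMMR0SharedModuleCheckPage(pGVM, pModule, idxRegion, idxPage, &PageDesc);
 *          if (RT_FAILURE(rc))
 *              break;
 *          if (PageDesc.idPage == NIL_GMM_PAGEID)
 *              continue;   // page differs from the shared copy and was left untouched
 *
 *          // PageDesc.idPage and PageDesc.HCPhys now identify the shared page;
 *          // the caller is expected to remap the guest page accordingly.
 *      }
 * @endcode
 */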
5186
5187
5188/**
5189 * RTAvlGCPtrDestroy callback.
5190 *
5191 * @returns VINF_SUCCESS.
5192 * @param pNode The node to destroy.
5193 * @param pvArgs Pointer to an argument packet.
5194 */
5195static DECLCALLBACK(int) gmmR0CleanupSharedModule(PAVLGCPTRNODECORE pNode, void *pvArgs)
5196{
5197 gmmR0ShModDeletePerVM(((GMMR0SHMODPERVMDTORARGS *)pvArgs)->pGMM,
5198 ((GMMR0SHMODPERVMDTORARGS *)pvArgs)->pGVM,
5199 (PGMMSHAREDMODULEPERVM)pNode,
5200 false /*fRemove*/);
5201 return VINF_SUCCESS;
5202}
5203
5204
5205/**
5206 * Used by GMMR0CleanupVM to clean up shared modules.
5207 *
5208 * This is called without the caller holding the GMM lock, so that the lock
5209 * can be acquired and yielded here as needed.
5210 *
5211 * @param pGMM The GMM handle.
5212 * @param pGVM The global VM handle.
5213 */
5214static void gmmR0SharedModuleCleanup(PGMM pGMM, PGVM pGVM)
5215{
5216 gmmR0MutexAcquire(pGMM);
5217 GMM_CHECK_SANITY_UPON_ENTERING(pGMM);
5218
5219 GMMR0SHMODPERVMDTORARGS Args;
5220 Args.pGVM = pGVM;
5221 Args.pGMM = pGMM;
5222 RTAvlGCPtrDestroy(&pGVM->gmm.s.pSharedModuleTree, gmmR0CleanupSharedModule, &Args);
5223
5224 AssertMsg(pGVM->gmm.s.Stats.cShareableModules == 0, ("%d\n", pGVM->gmm.s.Stats.cShareableModules));
5225 pGVM->gmm.s.Stats.cShareableModules = 0;
5226
5227 gmmR0MutexRelease(pGMM);
5228}
5229
5230#endif /* VBOX_WITH_PAGE_SHARING */
5231
5232/**
5233 * Removes all shared modules for the specified VM
5234 *
5235 * @returns VBox status code.
5236 * @param pGVM The global (ring-0) VM structure.
5237 * @param idCpu The VCPU id.
5238 */
5239GMMR0DECL(int) GMMR0ResetSharedModules(PGVM pGVM, VMCPUID idCpu)
5240{
5241#ifdef VBOX_WITH_PAGE_SHARING
5242 /*
5243 * Validate input and get the basics.
5244 */
5245 PGMM pGMM;
5246 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5247 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
5248 if (RT_FAILURE(rc))
5249 return rc;
5250
5251 /*
5252 * Take the semaphore and do some more validations.
5253 */
5254 gmmR0MutexAcquire(pGMM);
5255 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5256 {
5257 Log(("GMMR0ResetSharedModules\n"));
5258 GMMR0SHMODPERVMDTORARGS Args;
5259 Args.pGVM = pGVM;
5260 Args.pGMM = pGMM;
5261 RTAvlGCPtrDestroy(&pGVM->gmm.s.pSharedModuleTree, gmmR0CleanupSharedModule, &Args);
5262 pGVM->gmm.s.Stats.cShareableModules = 0;
5263
5264 rc = VINF_SUCCESS;
5265 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
5266 }
5267 else
5268 rc = VERR_GMM_IS_NOT_SANE;
5269
5270 gmmR0MutexRelease(pGMM);
5271 return rc;
5272#else
5273 RT_NOREF(pGVM, idCpu);
5274 return VERR_NOT_IMPLEMENTED;
5275#endif
5276}
5277
5278#ifdef VBOX_WITH_PAGE_SHARING
5279
5280/**
5281 * Tree enumeration callback for checking a shared module.
5282 */
5283static DECLCALLBACK(int) gmmR0CheckSharedModule(PAVLGCPTRNODECORE pNode, void *pvUser)
5284{
5285 GMMCHECKSHAREDMODULEINFO *pArgs = (GMMCHECKSHAREDMODULEINFO*)pvUser;
5286 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)pNode;
5287 PGMMSHAREDMODULE pGblMod = pRecVM->pGlobalModule;
5288
5289 Log(("gmmR0CheckSharedModule: check %s %s base=%RGv size=%x\n",
5290 pGblMod->szName, pGblMod->szVersion, pGblMod->Core.Key, pGblMod->cbModule));
5291
5292 int rc = PGMR0SharedModuleCheck(pArgs->pGVM, pArgs->pGVM, pArgs->idCpu, pGblMod, pRecVM->aRegionsGCPtrs);
5293 if (RT_FAILURE(rc))
5294 return rc;
5295 return VINF_SUCCESS;
5296}
5297
5298#endif /* VBOX_WITH_PAGE_SHARING */
5299
5300/**
5301 * Checks all shared modules for the specified VM.
5302 *
5303 * @returns VBox status code.
5304 * @param pGVM The global (ring-0) VM structure.
5305 * @param idCpu The calling EMT number.
5306 * @thread EMT(idCpu)
5307 */
5308GMMR0DECL(int) GMMR0CheckSharedModules(PGVM pGVM, VMCPUID idCpu)
5309{
5310#ifdef VBOX_WITH_PAGE_SHARING
5311 /*
5312 * Validate input and get the basics.
5313 */
5314 PGMM pGMM;
5315 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5316 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
5317 if (RT_FAILURE(rc))
5318 return rc;
5319
5320# ifndef DEBUG_sandervl
5321 /*
5322 * Take the semaphore and do some more validations.
5323 */
5324 gmmR0MutexAcquire(pGMM);
5325# endif
5326 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5327 {
5328 /*
5329 * Walk the tree, checking each module.
5330 */
5331 Log(("GMMR0CheckSharedModules\n"));
5332
5333 GMMCHECKSHAREDMODULEINFO Args;
5334 Args.pGVM = pGVM;
5335 Args.idCpu = idCpu;
5336 rc = RTAvlGCPtrDoWithAll(&pGVM->gmm.s.pSharedModuleTree, true /* fFromLeft */, gmmR0CheckSharedModule, &Args);
5337
5338 Log(("GMMR0CheckSharedModules done (rc=%Rrc)!\n", rc));
5339 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
5340 }
5341 else
5342 rc = VERR_GMM_IS_NOT_SANE;
5343
5344# ifndef DEBUG_sandervl
5345 gmmR0MutexRelease(pGMM);
5346# endif
5347 return rc;
5348#else
5349 RT_NOREF(pGVM, idCpu);
5350 return VERR_NOT_IMPLEMENTED;
5351#endif
5352}
5353
5354#if defined(VBOX_STRICT) && HC_ARCH_BITS == 64
5355
5356/**
5357 * Worker for GMMR0FindDuplicatePageReq.
5358 *
5359 * @returns true if duplicate, false if not.
5360 */
5361static bool gmmR0FindDupPageInChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, uint8_t const *pbSourcePage)
5362{
5363 bool fFoundDuplicate = false;
5364 /* Only take chunks not mapped into this VM process; not entirely correct. */
5365 uint8_t *pbChunk;
5366 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5367 {
5368 int rc = gmmR0MapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/, (PRTR3PTR)&pbChunk);
5369 if (RT_SUCCESS(rc))
5370 {
5371 /*
5372 * Look for duplicate pages
5373 */
5374 uintptr_t iPage = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
5375 while (iPage-- > 0)
5376 {
5377 if (GMM_PAGE_IS_PRIVATE(&pChunk->aPages[iPage]))
5378 {
5379 uint8_t *pbDestPage = pbChunk + (iPage << PAGE_SHIFT);
5380 if (!memcmp(pbSourcePage, pbDestPage, PAGE_SIZE))
5381 {
5382 fFoundDuplicate = true;
5383 break;
5384 }
5385 }
5386 }
5387 gmmR0UnmapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/);
5388 }
5389 }
5390 return fFoundDuplicate;
5391}
5392
5393
5394/**
5395 * Finds a duplicate of the specified page in other active VMs.
5396 *
5397 * @returns VBox status code.
5398 * @param pGVM The global (ring-0) VM structure.
5399 * @param pReq Pointer to the request packet.
5400 */
5401GMMR0DECL(int) GMMR0FindDuplicatePageReq(PGVM pGVM, PGMMFINDDUPLICATEPAGEREQ pReq)
5402{
5403 /*
5404 * Validate input and pass it on.
5405 */
5406 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5407 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5408
5409 PGMM pGMM;
5410 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5411
5412 int rc = GVMMR0ValidateGVM(pGVM);
5413 if (RT_FAILURE(rc))
5414 return rc;
5415
5416 /*
5417 * Take the semaphore and do some more validations.
5418 */
5419 rc = gmmR0MutexAcquire(pGMM);
5420 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5421 {
5422 uint8_t *pbChunk;
5423 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pReq->idPage >> GMM_CHUNKID_SHIFT);
5424 if (pChunk)
5425 {
5426 if (gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5427 {
5428 uint8_t *pbSourcePage = pbChunk + ((pReq->idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
5429 PGMMPAGE pPage = gmmR0GetPage(pGMM, pReq->idPage);
5430 if (pPage)
5431 {
5432 /*
5433 * Walk the chunks
5434 */
5441 PGMMCHUNK pChunk;
5442 pReq->fDuplicate = false;
5443 RTListForEach(&pGMM->ChunkList, pChunk, GMMCHUNK, ListNode)
5444 {
5445 if (gmmR0FindDupPageInChunk(pGMM, pGVM, pChunk, pbSourcePage))
5446 {
5447 pReq->fDuplicate = true;
5448 break;
5449 }
5450 }
5451 }
5452 else
5453 {
5454 AssertFailed();
5455 rc = VERR_PGM_PHYS_INVALID_PAGE_ID;
5456 }
5457 }
5458 else
5459 AssertFailed();
5460 }
5461 else
5462 AssertFailed();
5463 }
5464 else
5465 rc = VERR_GMM_IS_NOT_SANE;
5466
5467 gmmR0MutexRelease(pGMM);
5468 return rc;
5469}
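
/*
 * Usage sketch (illustration only): driving the strict-build duplicate page
 * finder above.  Only the request fields validated/used by
 * GMMR0FindDuplicatePageReq are shown; idPage stands for whatever page id the
 * caller wants to check.
 *
 * @code
 *      GMMFINDDUPLICATEPAGEREQ Req;
 *      RT_ZERO(Req);
 *      Req.Hdr.cbReq  = sizeof(Req);       // checked on entry
 *      Req.idPage     = idPage;            // page to search duplicates of
 *      Req.fDuplicate = false;
 *
 *      int rc = GMMR0FindDuplicatePageReq(pGVM, &Req);
 *      if (RT_SUCCESS(rc) && Req.fDuplicate)
 *          Log(("Page %#x has at least one identical private page in some chunk\n", idPage));
 * @endcode
 */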
5470
5471#endif /* VBOX_STRICT && HC_ARCH_BITS == 64 */
5472
5473
5474/**
5475 * Retrieves the GMM statistics visible to the caller.
5476 *
5477 * @returns VBox status code.
5478 *
5479 * @param pStats Where to put the statistics.
5480 * @param pSession The current session.
5481 * @param pGVM The GVM to obtain statistics for. Optional.
5482 */
5483GMMR0DECL(int) GMMR0QueryStatistics(PGMMSTATS pStats, PSUPDRVSESSION pSession, PGVM pGVM)
5484{
5485 LogFlow(("GMMR0QueryStatistics: pStats=%p pSession=%p pGVM=%p\n", pStats, pSession, pGVM));
5486
5487 /*
5488 * Validate input.
5489 */
5490 AssertPtrReturn(pSession, VERR_INVALID_POINTER);
5491 AssertPtrReturn(pStats, VERR_INVALID_POINTER);
5492 pStats->cMaxPages = 0; /* (Touch the buffer so a bad pointer crashes here, before we take the mutex.) */
5493
5494 PGMM pGMM;
5495 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5496
5497 /*
5498 * Validate the VM handle, if not NULL, and lock the GMM.
5499 */
5500 int rc;
5501 if (pGVM)
5502 {
5503 rc = GVMMR0ValidateGVM(pGVM);
5504 if (RT_FAILURE(rc))
5505 return rc;
5506 }
5507
5508 rc = gmmR0MutexAcquire(pGMM);
5509 if (RT_FAILURE(rc))
5510 return rc;
5511
5512 /*
5513 * Copy out the GMM statistics.
5514 */
5515 pStats->cMaxPages = pGMM->cMaxPages;
5516 pStats->cReservedPages = pGMM->cReservedPages;
5517 pStats->cOverCommittedPages = pGMM->cOverCommittedPages;
5518 pStats->cAllocatedPages = pGMM->cAllocatedPages;
5519 pStats->cSharedPages = pGMM->cSharedPages;
5520 pStats->cDuplicatePages = pGMM->cDuplicatePages;
5521 pStats->cLeftBehindSharedPages = pGMM->cLeftBehindSharedPages;
5522 pStats->cBalloonedPages = pGMM->cBalloonedPages;
5523 pStats->cChunks = pGMM->cChunks;
5524 pStats->cFreedChunks = pGMM->cFreedChunks;
5525 pStats->cShareableModules = pGMM->cShareableModules;
5526 RT_ZERO(pStats->au64Reserved);
5527
5528 /*
5529 * Copy out the VM statistics.
5530 */
5531 if (pGVM)
5532 pStats->VMStats = pGVM->gmm.s.Stats;
5533 else
5534 RT_ZERO(pStats->VMStats);
5535
5536 gmmR0MutexRelease(pGMM);
5537 return rc;
5538}
5539
5540
5541/**
5542 * VMMR0 request wrapper for GMMR0QueryStatistics.
5543 *
5544 * @returns see GMMR0QueryStatistics.
5545 * @param pGVM The global (ring-0) VM structure. Optional.
5546 * @param pReq Pointer to the request packet.
5547 */
5548GMMR0DECL(int) GMMR0QueryStatisticsReq(PGVM pGVM, PGMMQUERYSTATISTICSSREQ pReq)
5549{
5550 /*
5551 * Validate input and pass it on.
5552 */
5553 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5554 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5555
5556 return GMMR0QueryStatistics(&pReq->Stats, pReq->pSession, pGVM);
5557}
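
/*
 * Usage sketch (illustration only): querying the GMM statistics through the
 * request wrapper above.  The request fields are the ones validated/consumed
 * by GMMR0QueryStatisticsReq; the counters are printed on the assumption that
 * they are 64-bit, and pSession stands for the caller's support driver
 * session.
 *
 * @code
 *      GMMQUERYSTATISTICSSREQ Req;
 *      RT_ZERO(Req);
 *      Req.Hdr.cbReq = sizeof(Req);        // checked on entry
 *      Req.pSession  = pSession;
 *
 *      int rc = GMMR0QueryStatisticsReq(pGVM, &Req);
 *      if (RT_SUCCESS(rc))
 *          Log(("GMM: cAllocatedPages=%RU64 cSharedPages=%RU64 cBalloonedPages=%RU64\n",
 *               Req.Stats.cAllocatedPages, Req.Stats.cSharedPages, Req.Stats.cBalloonedPages));
 *      // Per-VM numbers, when pGVM was given, are in Req.Stats.VMStats.
 * @endcode
 */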
5558
5559
5560/**
5561 * Resets the specified GMM statistics.
5562 *
5563 * @returns VBox status code.
5564 *
5565 * @param pStats Which statistics to reset; a non-zero field indicates
5566 * that the corresponding statistic should be reset.
5567 * @param pSession The current session.
5568 * @param pGVM The GVM to reset statistics for. Optional.
5569 */
5570GMMR0DECL(int) GMMR0ResetStatistics(PCGMMSTATS pStats, PSUPDRVSESSION pSession, PGVM pGVM)
5571{
5572 NOREF(pStats); NOREF(pSession); NOREF(pGVM);
5573 /* Nothing to reset at the moment. */
5574 return VINF_SUCCESS;
5575}
5576
5577
5578/**
5579 * VMMR0 request wrapper for GMMR0ResetStatistics.
5580 *
5581 * @returns see GMMR0ResetStatistics.
5582 * @param pGVM The global (ring-0) VM structure. Optional.
5583 * @param pReq Pointer to the request packet.
5584 */
5585GMMR0DECL(int) GMMR0ResetStatisticsReq(PGVM pGVM, PGMMRESETSTATISTICSSREQ pReq)
5586{
5587 /*
5588 * Validate input and pass it on.
5589 */
5590 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5591 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5592
5593 return GMMR0ResetStatistics(&pReq->Stats, pReq->pSession, pGVM);
5594}
5595