VirtualBox

source: vbox/trunk/src/VBox/VMM/VMMR0/GMMR0.cpp@82978

Last change on this file since 82978 was 82978, checked in by vboxsync, 5 years ago

VMM/GMMR0: Introduce a spinlock to protect the AVL tree and associated TLB. bugref:9627

  • Property svn:eol-style set to native
  • Property svn:keywords set to Id Revision
File size: 196.7 KB
 
1/* $Id: GMMR0.cpp 82978 2020-02-04 14:52:50Z vboxsync $ */
2/** @file
3 * GMM - Global Memory Manager.
4 */
5
6/*
7 * Copyright (C) 2007-2020 Oracle Corporation
8 *
9 * This file is part of VirtualBox Open Source Edition (OSE), as
10 * available from http://www.virtualbox.org. This file is free software;
11 * you can redistribute it and/or modify it under the terms of the GNU
12 * General Public License (GPL) as published by the Free Software
13 * Foundation, in version 2 as it comes in the "COPYING" file of the
14 * VirtualBox OSE distribution. VirtualBox OSE is distributed in the
15 * hope that it will be useful, but WITHOUT ANY WARRANTY of any kind.
16 */
17
18
19/** @page pg_gmm GMM - The Global Memory Manager
20 *
21 * As the name indicates, this component is responsible for global memory
22 * management. Currently only guest RAM is allocated from the GMM, but this
23 * may change to include shadow page tables and other bits later.
24 *
25 * Guest RAM is managed as individual pages, but allocated from the host OS
26 * in chunks for reasons of portability / efficiency. To minimize the memory
27 * footprint all tracking structure must be as small as possible without
28 * unnecessary performance penalties.
29 *
30 * The allocation chunks have a fixed size, defined at compile time
31 * by the #GMM_CHUNK_SIZE \#define.
32 *
33 * Each chunk is given a unique ID. Each page also has a unique ID. The
34 * relationship between the two IDs is:
35 * @code
36 * GMM_CHUNK_SHIFT = log2(GMM_CHUNK_SIZE / PAGE_SIZE);
37 * idPage = (idChunk << GMM_CHUNK_SHIFT) | iPage;
38 * @endcode
39 * Where iPage is the index of the page within the chunk. This ID scheme
40 * permits efficient chunk and page lookup, but it relies on the chunk size
41 * being set at compile time. The chunks are organized in an AVL tree with their
42 * IDs being the keys.
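 *
 * As a sketch (assuming a hypothetical gmmR0GetChunk() helper; the real code
 * resolves chunk IDs via the chunk TLB and the AVL tree), a page ID is taken
 * apart like this:
 * @code
 * uint32_t  idChunk = idPage >> GMM_CHUNK_SHIFT;
 * uint32_t  iPage   = idPage & (GMM_CHUNK_NUM_PAGES - 1);
 * PGMMCHUNK pChunk  = gmmR0GetChunk(pGMM, idChunk);
 * PGMMPAGE  pPage   = &pChunk->aPages[iPage];
 * @endcode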
43 *
44 * The physical address of each page in an allocation chunk is maintained by
45 * the #RTR0MEMOBJ and obtained using #RTR0MemObjGetPagePhysAddr. There is no
46 * need to duplicate this information (it would cost 8 bytes per page if we did).
47 *
48 * So what do we need to track per page? Most importantly we need to know
49 * which state the page is in:
50 * - Private - Allocated for (eventually) backing one particular VM page.
51 * - Shared - Readonly page that is used by one or more VMs and treated
52 * as COW by PGM.
53 * - Free - Not used by anyone.
54 *
55 * For the page replacement operations (sharing, defragmenting and freeing)
56 * to be somewhat efficient, private pages need to be associated with a
57 * particular page in a particular VM.
58 *
59 * Tracking the usage of shared pages is impractical and expensive, so we'll
60 * settle for a reference counting system instead.
61 *
62 * Free pages will be chained on LIFOs.
63 *
64 * On 64-bit systems we will use a 64-bit bitfield per page, while on 32-bit
65 * systems a 32-bit bitfield will have to suffice because of address space
66 * limitations. The #GMMPAGE structure shows the details.
67 *
68 *
69 * @section sec_gmm_alloc_strat Page Allocation Strategy
70 *
71 * The strategy for allocating pages has to take fragmentation and shared
72 * pages into account, or we may end up with 2000 chunks with only
73 * a few pages in each. Shared pages cannot easily be reallocated because
74 * of the inaccurate usage accounting (see above). Private pages can be
75 * reallocated by a defragmentation thread in the same manner that sharing
76 * is done.
77 *
78 * The first approach is to manage the free pages in two sets depending on
79 * whether they are mainly for the allocation of shared or private pages.
80 * In the initial implementation there will be almost no possibility for
81 * mixing shared and private pages in the same chunk (only if we're really
82 * stressed on memory), but when we implement forking of VMs and have to
83 * deal with lots of COW pages it'll start getting kind of interesting.
84 *
85 * The sets are lists of chunks with approximately the same number of
86 * free pages. Say the chunk size is 2MB, meaning 512 pages, and a set
87 * consists of 16 lists. So, the first list will contain the chunks with
88 * 1-31 free pages, the second covers 32-63, and so on. The chunks will be
89 * moved between the lists as pages are freed up or allocated.
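 *
 * A minimal sketch of the implied list selection (assuming the 32 pages per
 * list granularity above; the real code uses its own shift constant and puts
 * completely free chunks on a separate list):
 * @code
 * unsigned  iList = pChunk->cFree >> 5;    // 32 free pages per list
 * PGMMCHUNK pHead = pSet->apLists[iList];  // link pChunk in at the head
 * @endcode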
90 *
91 *
92 * @section sec_gmm_costs Costs
93 *
94 * The per-page cost in kernel space is 32 bits plus whatever RTR0MEMOBJ
95 * entails. In addition there is the chunk cost of approximately
96 * (sizeof(RTR0MEMOBJ) + sizeof(CHUNK)) / 2^CHUNK_SHIFT bytes per page.
97 *
98 * On Windows the per-page #RTR0MEMOBJ cost is 32 bits on 32-bit Windows
99 * and 64 bits on 64-bit Windows (a PFN_NUMBER in the MDL). So, 64 bits per page.
100 * The cost on Linux is identical, but here it's because of sizeof(struct page *).
101 *
102 *
103 * @section sec_gmm_legacy Legacy Mode for Non-Tier-1 Platforms
104 *
105 * In legacy mode the page source is locked user pages and not
106 * #RTR0MemObjAllocPhysNC, this means that a page can only be allocated
107 * by the VM that locked it. We will make no attempt at implementing
108 * page sharing on these systems, just do enough to make it all work.
109 *
110 * @note With 6.1 really dropping 32-bit support, the legacy mode is obsoleted
111 * under the assumption that there is sufficient kernel virtual address
112 * space to map all of the guest memory allocations. So, we'll be using
113 * #RTR0MemObjAllocPage on some platforms as an alternative to
114 * #RTR0MemObjAllocPhysNC.
115 *
116 *
117 * @subsection sub_gmm_locking Serializing
118 *
119 * One simple fast mutex will be employed in the initial implementation, not
120 * two as mentioned in @ref sec_pgmPhys_Serializing.
121 *
122 * @see @ref sec_pgmPhys_Serializing
123 *
124 *
125 * @section sec_gmm_overcommit Memory Over-Commitment Management
126 *
127 * The GVM will have to do the system-wide memory over-commitment
128 * management. My current ideas are:
129 * - Per VM oc policy that indicates how much to initially commit
130 * to it and what to do in an out-of-memory situation.
131 * - Prevent overtaxing the host.
132 *
133 * There are some challenges here, the main ones are configurability and
134 * security. Should we for instance permit anyone to request 100% memory
135 * commitment? Who should be allowed to do runtime adjustments of the
136 * config? And how do we prevent these settings from being lost when the last
137 * VM process exits? The solution is probably to have an optional root
138 * daemon that will keep VMMR0.r0 in memory and enable the security measures.
139 *
140 *
141 *
142 * @section sec_gmm_numa NUMA
143 *
144 * NUMA considerations will be designed and implemented a bit later.
145 *
146 * The preliminary guess is that we will have to try to allocate memory as
147 * close as possible to the CPUs the VM is executed on (EMT and additional CPU
148 * threads), which means it's mostly about allocation and sharing policies.
149 * Both the scheduler and the allocator interface will have to supply some NUMA info,
150 * and we'll need to have a way to calculate access costs.
151 *
152 */
153
154
155/*********************************************************************************************************************************
156* Header Files *
157*********************************************************************************************************************************/
158#define LOG_GROUP LOG_GROUP_GMM
159#include <VBox/rawpci.h>
160#include <VBox/vmm/gmm.h>
161#include "GMMR0Internal.h"
162#include <VBox/vmm/vmcc.h>
163#include <VBox/vmm/pgm.h>
164#include <VBox/log.h>
165#include <VBox/param.h>
166#include <VBox/err.h>
167#include <VBox/VMMDev.h>
168#include <iprt/asm.h>
169#include <iprt/avl.h>
170#ifdef VBOX_STRICT
171# include <iprt/crc.h>
172#endif
173#include <iprt/critsect.h>
174#include <iprt/list.h>
175#include <iprt/mem.h>
176#include <iprt/memobj.h>
177#include <iprt/mp.h>
178#include <iprt/semaphore.h>
179#include <iprt/spinlock.h>
180#include <iprt/string.h>
181#include <iprt/time.h>
182
183
184/*********************************************************************************************************************************
185* Defined Constants And Macros *
186*********************************************************************************************************************************/
187/** @def VBOX_USE_CRIT_SECT_FOR_GIANT
188 * Use a critical section instead of a fast mutex for the giant GMM lock.
189 *
190 * @remarks This is primarily a way of avoiding the deadlock checks in the
191 * windows driver verifier. */
192#if defined(RT_OS_WINDOWS) || defined(RT_OS_DARWIN) || defined(DOXYGEN_RUNNING)
193# define VBOX_USE_CRIT_SECT_FOR_GIANT
194#endif
195
196#if (!defined(VBOX_WITH_RAM_IN_KERNEL) || defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)) \
197 && !defined(RT_OS_DARWIN)
198/** Enable the legacy mode code (will be dropped soon). */
199# define GMM_WITH_LEGACY_MODE
200#endif
201
202
203/*********************************************************************************************************************************
204* Structures and Typedefs *
205*********************************************************************************************************************************/
206/** Pointer to set of free chunks. */
207typedef struct GMMCHUNKFREESET *PGMMCHUNKFREESET;
208
209/**
210 * The per-page tracking structure employed by the GMM.
211 *
212 * On 32-bit hosts some trickery is necessary to compress all
213 * the information into 32 bits. When the fSharedFree member is set,
214 * the 30th bit decides whether it's a free page or not.
215 *
216 * Because of the different layout on 32-bit and 64-bit hosts, macros
217 * are used to get and set some of the data.
218 */
219typedef union GMMPAGE
220{
221#if HC_ARCH_BITS == 64
222 /** Unsigned integer view. */
223 uint64_t u;
224
225 /** The common view. */
226 struct GMMPAGECOMMON
227 {
228 uint32_t uStuff1 : 32;
229 uint32_t uStuff2 : 30;
230 /** The page state. */
231 uint32_t u2State : 2;
232 } Common;
233
234 /** The view of a private page. */
235 struct GMMPAGEPRIVATE
236 {
237 /** The guest page frame number. (Max addressable: 2 ^ 44 - 16) */
238 uint32_t pfn;
239 /** The GVM handle. (64K VMs) */
240 uint32_t hGVM : 16;
241 /** Reserved. */
242 uint32_t u16Reserved : 14;
243 /** The page state. */
244 uint32_t u2State : 2;
245 } Private;
246
247 /** The view of a shared page. */
248 struct GMMPAGESHARED
249 {
250 /** The host page frame number. (Max addressable: 2 ^ 44 - 16) */
251 uint32_t pfn;
252 /** The reference count (64K VMs). */
253 uint32_t cRefs : 16;
254 /** Used for debug checksumming. */
255 uint32_t u14Checksum : 14;
256 /** The page state. */
257 uint32_t u2State : 2;
258 } Shared;
259
260 /** The view of a free page. */
261 struct GMMPAGEFREE
262 {
263 /** The index of the next page in the free list. UINT16_MAX is NIL. */
264 uint16_t iNext;
265 /** Reserved. Checksum or something? */
266 uint16_t u16Reserved0;
267 /** Reserved. Checksum or something? */
268 uint32_t u30Reserved1 : 30;
269 /** The page state. */
270 uint32_t u2State : 2;
271 } Free;
272
273#else /* 32-bit */
274 /** Unsigned integer view. */
275 uint32_t u;
276
277 /** The common view. */
278 struct GMMPAGECOMMON
279 {
280 uint32_t uStuff : 30;
281 /** The page state. */
282 uint32_t u2State : 2;
283 } Common;
284
285 /** The view of a private page. */
286 struct GMMPAGEPRIVATE
287 {
288 /** The guest page frame number. (Max addressable: 2 ^ 36) */
289 uint32_t pfn : 24;
290 /** The GVM handle. (127 VMs) */
291 uint32_t hGVM : 7;
292 /** The top page state bit, MBZ. */
293 uint32_t fZero : 1;
294 } Private;
295
296 /** The view of a shared page. */
297 struct GMMPAGESHARED
298 {
299 /** The reference count. */
300 uint32_t cRefs : 30;
301 /** The page state. */
302 uint32_t u2State : 2;
303 } Shared;
304
305 /** The view of a free page. */
306 struct GMMPAGEFREE
307 {
308 /** The index of the next page in the free list. UINT16_MAX is NIL. */
309 uint32_t iNext : 16;
310 /** Reserved. Checksum or something? */
311 uint32_t u14Reserved : 14;
312 /** The page state. */
313 uint32_t u2State : 2;
314 } Free;
315#endif
316} GMMPAGE;
317AssertCompileSize(GMMPAGE, sizeof(RTHCUINTPTR));
318/** Pointer to a GMMPAGE. */
319typedef GMMPAGE *PGMMPAGE;
320
321
322/** @name The Page States.
323 * @{ */
324/** A private page. */
325#define GMM_PAGE_STATE_PRIVATE 0
326/** A private page - alternative value used on the 32-bit implementation.
327 * This will never be used on 64-bit hosts. */
328#define GMM_PAGE_STATE_PRIVATE_32 1
329/** A shared page. */
330#define GMM_PAGE_STATE_SHARED 2
331/** A free page. */
332#define GMM_PAGE_STATE_FREE 3
333/** @} */
334
335
336/** @def GMM_PAGE_IS_PRIVATE
337 *
338 * @returns true if private, false if not.
339 * @param pPage The GMM page.
340 */
341#if HC_ARCH_BITS == 64
342# define GMM_PAGE_IS_PRIVATE(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_PRIVATE )
343#else
344# define GMM_PAGE_IS_PRIVATE(pPage) ( (pPage)->Private.fZero == 0 )
345#endif
346
347/** @def GMM_PAGE_IS_SHARED
348 *
349 * @returns true if shared, false if not.
350 * @param pPage The GMM page.
351 */
352#define GMM_PAGE_IS_SHARED(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_SHARED )
353
354/** @def GMM_PAGE_IS_FREE
355 *
356 * @returns true if free, false if not.
357 * @param pPage The GMM page.
358 */
359#define GMM_PAGE_IS_FREE(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_FREE )
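
/* Example of classifying a page with the predicates above (a sketch; the real
 * scanning code lives in gmmR0CleanupVMScanChunk further down):
 *      PGMMPAGE pPage = &pChunk->aPages[iPage];
 *      if (GMM_PAGE_IS_PRIVATE(pPage))
 *          cPrivate++;
 *      else if (GMM_PAGE_IS_FREE(pPage))
 *          cFree++;
 *      else
 *          cShared++;
 */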
360
361/** @def GMM_PAGE_PFN_LAST
362 * The last valid guest pfn.
363 * @remark Some of the values outside the range have special meanings,
364 * see GMM_PAGE_PFN_UNSHAREABLE.
365 */
366#if HC_ARCH_BITS == 64
367# define GMM_PAGE_PFN_LAST UINT32_C(0xfffffff0)
368#else
369# define GMM_PAGE_PFN_LAST UINT32_C(0x00fffff0)
370#endif
371AssertCompile(GMM_PAGE_PFN_LAST == (GMM_GCPHYS_LAST >> PAGE_SHIFT));
372
373/** @def GMM_PAGE_PFN_UNSHAREABLE
374 * Indicates that this page isn't used for normal guest memory and thus isn't shareable.
375 */
376#if HC_ARCH_BITS == 64
377# define GMM_PAGE_PFN_UNSHAREABLE UINT32_C(0xfffffff1)
378#else
379# define GMM_PAGE_PFN_UNSHAREABLE UINT32_C(0x00fffff1)
380#endif
381AssertCompile(GMM_PAGE_PFN_UNSHAREABLE == (GMM_GCPHYS_UNSHAREABLE >> PAGE_SHIFT));
382
383
384/**
385 * A GMM allocation chunk ring-3 mapping record.
386 *
387 * This should really be associated with a session and not a VM, but
388 * it's simpler to associate it with a VM and clean up when the VM object
389 * is destroyed.
390 */
391typedef struct GMMCHUNKMAP
392{
393 /** The mapping object. */
394 RTR0MEMOBJ hMapObj;
395 /** The VM owning the mapping. */
396 PGVM pGVM;
397} GMMCHUNKMAP;
398/** Pointer to a GMM allocation chunk mapping. */
399typedef struct GMMCHUNKMAP *PGMMCHUNKMAP;
400
401
402/**
403 * A GMM allocation chunk.
404 */
405typedef struct GMMCHUNK
406{
407 /** The AVL node core.
408 * The Key is the chunk ID. (Giant mtx.) */
409 AVLU32NODECORE Core;
410 /** The memory object.
411 * Either from RTR0MemObjAllocPhysNC or RTR0MemObjLockUser depending on
412 * what the host can dish up. (Chunk mtx protects mapping accesses
413 * and related frees.) */
414 RTR0MEMOBJ hMemObj;
415#if defined(VBOX_WITH_RAM_IN_KERNEL) && !defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)
416 /** Pointer to the kernel mapping. */
417 uint8_t *pbMapping;
418#endif
419 /** Pointer to the next chunk in the free list. (Giant mtx.) */
420 PGMMCHUNK pFreeNext;
421 /** Pointer to the previous chunk in the free list. (Giant mtx.) */
422 PGMMCHUNK pFreePrev;
423 /** Pointer to the free set this chunk belongs to. NULL for
424 * chunks with no free pages. (Giant mtx.) */
425 PGMMCHUNKFREESET pSet;
426 /** List node in the chunk list (GMM::ChunkList). (Giant mtx.) */
427 RTLISTNODE ListNode;
428 /** Pointer to an array of mappings. (Chunk mtx.) */
429 PGMMCHUNKMAP paMappingsX;
430 /** The number of mappings. (Chunk mtx.) */
431 uint16_t cMappingsX;
432 * The mapping lock this chunk is using. UINT8_MAX if nobody is
433 * mapping or freeing anything. (Giant mtx.) */
434 uint8_t volatile iChunkMtx;
435 /** GMM_CHUNK_FLAGS_XXX. (Giant mtx.) */
436 uint8_t fFlags;
437 /** The head of the list of free pages. UINT16_MAX is the NIL value.
438 * (Giant mtx.) */
439 uint16_t iFreeHead;
440 /** The number of free pages. (Giant mtx.) */
441 uint16_t cFree;
442 /** The GVM handle of the VM that first allocated pages from this chunk, this
443 * is used as a preference when there are several chunks to choose from.
444 * When in bound memory mode this isn't a preference any longer. (Giant
445 * mtx.) */
446 uint16_t hGVM;
447 /** The ID of the NUMA node the memory mostly resides on. (Reserved for
448 * future use.) (Giant mtx.) */
449 uint16_t idNumaNode;
450 /** The number of private pages. (Giant mtx.) */
451 uint16_t cPrivate;
452 /** The number of shared pages. (Giant mtx.) */
453 uint16_t cShared;
454 /** The pages. (Giant mtx.) */
455 GMMPAGE aPages[GMM_CHUNK_SIZE >> PAGE_SHIFT];
456} GMMCHUNK;
457
458/** Indicates that the NUMA properties of the memory are unknown. */
459#define GMM_CHUNK_NUMA_ID_UNKNOWN UINT16_C(0xfffe)
460
461/** @name GMM_CHUNK_FLAGS_XXX - chunk flags.
462 * @{ */
463/** Indicates that the chunk is a large page (2MB). */
464#define GMM_CHUNK_FLAGS_LARGE_PAGE UINT16_C(0x0001)
465#ifdef GMM_WITH_LEGACY_MODE
466/** Indicates that the chunk was locked rather than allocated directly. */
467# define GMM_CHUNK_FLAGS_SEEDED UINT16_C(0x0002)
468#endif
469/** @} */
470
471
472/**
473 * An allocation chunk TLB entry.
474 */
475typedef struct GMMCHUNKTLBE
476{
477 /** The chunk id. */
478 uint32_t idChunk;
479 /** Pointer to the chunk. */
480 PGMMCHUNK pChunk;
481} GMMCHUNKTLBE;
482/** Pointer to an allocation chunk TLB entry. */
483typedef GMMCHUNKTLBE *PGMMCHUNKTLBE;
484
485
486/** The number of entries in the allocation chunk TLB. */
487#define GMM_CHUNKTLB_ENTRIES 32
488/** Gets the TLB entry index for the given Chunk ID. */
489#define GMM_CHUNKTLB_IDX(idChunk) ( (idChunk) & (GMM_CHUNKTLB_ENTRIES - 1) )
490
491/**
492 * An allocation chunk TLB.
493 */
494typedef struct GMMCHUNKTLB
495{
496 /** The TLB entries. */
497 GMMCHUNKTLBE aEntries[GMM_CHUNKTLB_ENTRIES];
498} GMMCHUNKTLB;
499/** Pointer to an allocation chunk TLB. */
500typedef GMMCHUNKTLB *PGMMCHUNKTLB;
501
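/* Lookup sketch combining the TLB and the AVL tree (illustration only, not an
 * actual helper from this file; both structures are accessed while holding
 * the tree spinlock, see GMM::hSpinLockTree below):
 *      PGMMCHUNKTLBE pTlbe  = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(idChunk)];
 *      PGMMCHUNK     pChunk = pTlbe->idChunk == idChunk ? pTlbe->pChunk : NULL;
 *      if (!pChunk)
 *      {
 *          pChunk = (PGMMCHUNK)RTAvlU32Get(&pGMM->pChunks, idChunk);
 *          if (pChunk)
 *          {
 *              pTlbe->idChunk = idChunk;
 *              pTlbe->pChunk  = pChunk;
 *          }
 *      }
 */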
502
503/**
504 * The GMM instance data.
505 */
506typedef struct GMM
507{
508 /** Magic / eye catcher. GMM_MAGIC */
509 uint32_t u32Magic;
510 /** The number of threads waiting on the mutex. */
511 uint32_t cMtxContenders;
512#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
513 /** The critical section protecting the GMM.
514 * More fine grained locking can be implemented later if necessary. */
515 RTCRITSECT GiantCritSect;
516#else
517 /** The fast mutex protecting the GMM.
518 * More fine grained locking can be implemented later if necessary. */
519 RTSEMFASTMUTEX hMtx;
520#endif
521#ifdef VBOX_STRICT
522 /** The current mutex owner. */
523 RTNATIVETHREAD hMtxOwner;
524#endif
525 /** Spinlock protecting the AVL tree.
526 * @todo Make this a read-write spinlock as we should allow concurrent
527 * lookups. */
528 RTSPINLOCK hSpinLockTree;
529 /** The chunk tree.
530 * Protected by hSpinLockTree. */
531 PAVLU32NODECORE pChunks;
532 /** The chunk TLB.
533 * Protected by hSpinLockTree. */
534 GMMCHUNKTLB ChunkTLB;
535 /** The private free set. */
536 GMMCHUNKFREESET PrivateX;
537 /** The shared free set. */
538 GMMCHUNKFREESET Shared;
539
540 /** Shared module tree (global).
541 * @todo separate trees for distinctly different guest OSes. */
542 PAVLLU32NODECORE pGlobalSharedModuleTree;
543 /** Sharable modules (count of nodes in pGlobalSharedModuleTree). */
544 uint32_t cShareableModules;
545
546 /** The chunk list. For simplifying the cleanup process and avoiding tree
547 * traversal. */
548 RTLISTANCHOR ChunkList;
549
550 /** The maximum number of pages we're allowed to allocate.
551 * @gcfgm{GMM/MaxPages,64-bit, Direct.}
552 * @gcfgm{GMM/PctPages,32-bit, Relative to the number of host pages.} */
553 uint64_t cMaxPages;
554 /** The number of pages that have been reserved.
555 * The deal is that cReservedPages - cOverCommittedPages <= cMaxPages. */
556 uint64_t cReservedPages;
557 /** The number of pages that we have over-committed in reservations. */
558 uint64_t cOverCommittedPages;
559 /** The number of actually allocated (committed if you like) pages. */
560 uint64_t cAllocatedPages;
561 /** The number of pages that are shared. A subset of cAllocatedPages. */
562 uint64_t cSharedPages;
563 /** The number of pages that are actually shared between VMs. */
564 uint64_t cDuplicatePages;
565 /** The number of shared pages that have been left behind by
566 * VMs not doing proper cleanups. */
567 uint64_t cLeftBehindSharedPages;
568 /** The number of allocation chunks.
569 * (The number of pages we've allocated from the host can be derived from this.) */
570 uint32_t cChunks;
571 /** The number of current ballooned pages. */
572 uint64_t cBalloonedPages;
573
574#ifndef GMM_WITH_LEGACY_MODE
575# ifdef VBOX_WITH_LINEAR_HOST_PHYS_MEM
576 /** Whether #RTR0MemObjAllocPhysNC works. */
577 bool fHasWorkingAllocPhysNC;
578# else
579 bool fPadding;
580# endif
581#else
582 /** The legacy allocation mode indicator.
583 * This is determined at initialization time. */
584 bool fLegacyAllocationMode;
585#endif
586 /** The bound memory mode indicator.
587 * When set, the memory will be bound to a specific VM and never
588 * shared. This is always set if fLegacyAllocationMode is set.
589 * (Also determined at initialization time.) */
590 bool fBoundMemoryMode;
591 /** The number of registered VMs. */
592 uint16_t cRegisteredVMs;
593
594 /** The number of freed chunks ever. This is used as a list generation to
595 * avoid restarting the cleanup scanning when the list wasn't modified. */
596 uint32_t cFreedChunks;
597 /** The previously allocated Chunk ID.
598 * Used as a hint to avoid scanning the whole bitmap. */
599 uint32_t idChunkPrev;
600 /** Chunk ID allocation bitmap.
601 * Bits of allocated IDs are set, free ones are clear.
602 * The NIL id (0) is marked allocated. */
603 uint32_t bmChunkId[(GMM_CHUNKID_LAST + 1 + 31) / 32];
604
605 /** The index of the next mutex to use. */
606 uint32_t iNextChunkMtx;
607 /** Chunk locks for reducing lock contention without having to allocate
608 * one lock per chunk. */
609 struct
610 {
611 /** The mutex */
612 RTSEMFASTMUTEX hMtx;
613 /** The number of threads currently using this mutex. */
614 uint32_t volatile cUsers;
615 } aChunkMtx[64];
616} GMM;
617/** Pointer to the GMM instance. */
618typedef GMM *PGMM;
619
620/** The value of GMM::u32Magic (Katsuhiro Otomo). */
621#define GMM_MAGIC UINT32_C(0x19540414)
622
623
624/**
625 * GMM chunk mutex state.
626 *
627 * This is returned by gmmR0ChunkMutexAcquire and is used by the other
628 * gmmR0ChunkMutex* methods.
629 */
630typedef struct GMMR0CHUNKMTXSTATE
631{
632 PGMM pGMM;
633 /** The index of the chunk mutex. */
634 uint8_t iChunkMtx;
635 /** The relevant flags (GMMR0CHUNK_MTX_XXX). */
636 uint8_t fFlags;
637} GMMR0CHUNKMTXSTATE;
638/** Pointer to a chunk mutex state. */
639typedef GMMR0CHUNKMTXSTATE *PGMMR0CHUNKMTXSTATE;
640
641/** @name GMMR0CHUNK_MTX_XXX
642 * @{ */
643#define GMMR0CHUNK_MTX_INVALID UINT32_C(0)
644#define GMMR0CHUNK_MTX_KEEP_GIANT UINT32_C(1)
645#define GMMR0CHUNK_MTX_RETAKE_GIANT UINT32_C(2)
646#define GMMR0CHUNK_MTX_DROP_GIANT UINT32_C(3)
647#define GMMR0CHUNK_MTX_END UINT32_C(4)
648/** @} */
649
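/* Typical usage of the chunk mutex helpers defined further down (a sketch;
 * see gmmR0CleanupVMScanChunk for a real caller):
 *      GMMR0CHUNKMTXSTATE MtxState;
 *      gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
 *      // ... work on the chunk's mappings goes here ...
 *      gmmR0ChunkMutexRelease(&MtxState, pChunk);
 */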
650
651/** The maximum number of shared modules per-vm. */
652#define GMM_MAX_SHARED_PER_VM_MODULES 2048
653/** The maximum number of shared modules GMM is allowed to track. */
654#define GMM_MAX_SHARED_GLOBAL_MODULES 16834
655
656
657/**
658 * Argument packet for gmmR0SharedModuleCleanup.
659 */
660typedef struct GMMR0SHMODPERVMDTORARGS
661{
662 PGVM pGVM;
663 PGMM pGMM;
664} GMMR0SHMODPERVMDTORARGS;
665
666/**
667 * Argument packet for gmmR0CheckSharedModule.
668 */
669typedef struct GMMCHECKSHAREDMODULEINFO
670{
671 PGVM pGVM;
672 VMCPUID idCpu;
673} GMMCHECKSHAREDMODULEINFO;
674
675
676/*********************************************************************************************************************************
677* Global Variables *
678*********************************************************************************************************************************/
679/** Pointer to the GMM instance data. */
680static PGMM g_pGMM = NULL;
681
682/** Macro for obtaining and validating the g_pGMM pointer.
683 *
684 * On failure it will return from the invoking function with the specified
685 * return value.
686 *
687 * @param pGMM The name of the pGMM variable.
688 * @param rc The return value on failure. Use VERR_GMM_INSTANCE for VBox
689 * status codes.
690 */
691#define GMM_GET_VALID_INSTANCE(pGMM, rc) \
692 do { \
693 (pGMM) = g_pGMM; \
694 AssertPtrReturn((pGMM), (rc)); \
695 AssertMsgReturn((pGMM)->u32Magic == GMM_MAGIC, ("%p - %#x\n", (pGMM), (pGMM)->u32Magic), (rc)); \
696 } while (0)
697
698/** Macro for obtaining and validating the g_pGMM pointer, void function
699 * variant.
700 *
701 * On failure it will return from the invoking function.
702 *
703 * @param pGMM The name of the pGMM variable.
704 */
705#define GMM_GET_VALID_INSTANCE_VOID(pGMM) \
706 do { \
707 (pGMM) = g_pGMM; \
708 AssertPtrReturnVoid((pGMM)); \
709 AssertMsgReturnVoid((pGMM)->u32Magic == GMM_MAGIC, ("%p - %#x\n", (pGMM), (pGMM)->u32Magic)); \
710 } while (0)
711
712
713/** @def GMM_CHECK_SANITY_UPON_ENTERING
714 * Checks the sanity of the GMM instance data before making changes.
715 *
716 * This macro is a stub by default and must be enabled manually in the code.
717 *
718 * @returns true if sane, false if not.
719 * @param pGMM The name of the pGMM variable.
720 */
721#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
722# define GMM_CHECK_SANITY_UPON_ENTERING(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
723#else
724# define GMM_CHECK_SANITY_UPON_ENTERING(pGMM) (true)
725#endif
726
727/** @def GMM_CHECK_SANITY_UPON_LEAVING
728 * Checks the sanity of the GMM instance data after making changes.
729 *
730 * This macro is a stub by default and must be enabled manually in the code.
731 *
732 * @returns true if sane, false if not.
733 * @param pGMM The name of the pGMM variable.
734 */
735#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
736# define GMM_CHECK_SANITY_UPON_LEAVING(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
737#else
738# define GMM_CHECK_SANITY_UPON_LEAVING(pGMM) (true)
739#endif
740
741/** @def GMM_CHECK_SANITY_IN_LOOPS
742 * Checks the sanity of the GMM instance in the allocation loops.
743 *
744 * This macro is a stub by default and must be enabled manually in the code.
745 *
746 * @returns true if sane, false if not.
747 * @param pGMM The name of the pGMM variable.
748 */
749#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
750# define GMM_CHECK_SANITY_IN_LOOPS(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
751#else
752# define GMM_CHECK_SANITY_IN_LOOPS(pGMM) (true)
753#endif
754
755
756/*********************************************************************************************************************************
757* Internal Functions *
758*********************************************************************************************************************************/
759static DECLCALLBACK(int) gmmR0TermDestroyChunk(PAVLU32NODECORE pNode, void *pvGMM);
760static bool gmmR0CleanupVMScanChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
761DECLINLINE(void) gmmR0UnlinkChunk(PGMMCHUNK pChunk);
762DECLINLINE(void) gmmR0LinkChunk(PGMMCHUNK pChunk, PGMMCHUNKFREESET pSet);
763DECLINLINE(void) gmmR0SelectSetAndLinkChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
764#ifdef GMMR0_WITH_SANITY_CHECK
765static uint32_t gmmR0SanityCheck(PGMM pGMM, const char *pszFunction, unsigned uLineNo);
766#endif
767static bool gmmR0FreeChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem);
768DECLINLINE(void) gmmR0FreePrivatePage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage);
769DECLINLINE(void) gmmR0FreeSharedPage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage);
770static int gmmR0UnmapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
771#ifdef VBOX_WITH_PAGE_SHARING
772static void gmmR0SharedModuleCleanup(PGMM pGMM, PGVM pGVM);
773# ifdef VBOX_STRICT
774static uint32_t gmmR0StrictPageChecksum(PGMM pGMM, PGVM pGVM, uint32_t idPage);
775# endif
776#endif
777
778
779
780/**
781 * Initializes the GMM component.
782 *
783 * This is called when the VMMR0.r0 module is loaded and protected by the
784 * loader semaphore.
785 *
786 * @returns VBox status code.
787 */
788GMMR0DECL(int) GMMR0Init(void)
789{
790 LogFlow(("GMMInit:\n"));
791
792 /*
793 * Allocate the instance data and the locks.
794 */
795 PGMM pGMM = (PGMM)RTMemAllocZ(sizeof(*pGMM));
796 if (!pGMM)
797 return VERR_NO_MEMORY;
798
799 pGMM->u32Magic = GMM_MAGIC;
800 for (unsigned i = 0; i < RT_ELEMENTS(pGMM->ChunkTLB.aEntries); i++)
801 pGMM->ChunkTLB.aEntries[i].idChunk = NIL_GMM_CHUNKID;
802 RTListInit(&pGMM->ChunkList);
803 ASMBitSet(&pGMM->bmChunkId[0], NIL_GMM_CHUNKID);
804
805#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
806 int rc = RTCritSectInit(&pGMM->GiantCritSect);
807#else
808 int rc = RTSemFastMutexCreate(&pGMM->hMtx);
809#endif
810 if (RT_SUCCESS(rc))
811 {
812 unsigned iMtx;
813 for (iMtx = 0; iMtx < RT_ELEMENTS(pGMM->aChunkMtx); iMtx++)
814 {
815 rc = RTSemFastMutexCreate(&pGMM->aChunkMtx[iMtx].hMtx);
816 if (RT_FAILURE(rc))
817 break;
818 }
819 pGMM->hSpinLockTree = NIL_RTSPINLOCK;
820 if (RT_SUCCESS(rc))
821 rc = RTSpinlockCreate(&pGMM->hSpinLockTree, RTSPINLOCK_FLAGS_INTERRUPT_SAFE, "gmm-chunk-tree");
822 if (RT_SUCCESS(rc))
823 {
824#ifndef GMM_WITH_LEGACY_MODE
825 /*
826 * Figure out how we're going to allocate stuff (only applicable to
827 * host with linear physical memory mappings).
828 */
829 pGMM->fBoundMemoryMode = false;
830# ifdef VBOX_WITH_LINEAR_HOST_PHYS_MEM
831 pGMM->fHasWorkingAllocPhysNC = false;
832
833 RTR0MEMOBJ hMemObj;
834 rc = RTR0MemObjAllocPhysNC(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS);
835 if (RT_SUCCESS(rc))
836 {
837 rc = RTR0MemObjFree(hMemObj, true);
838 AssertRC(rc);
839 pGMM->fHasWorkingAllocPhysNC = true;
840 }
841 else if (rc != VERR_NOT_SUPPORTED)
842 SUPR0Printf("GMMR0Init: Warning! RTR0MemObjAllocPhysNC(, %u, NIL_RTHCPHYS) -> %d!\n", GMM_CHUNK_SIZE, rc);
843# endif
844#else /* GMM_WITH_LEGACY_MODE */
845 /*
846 * Check and see if RTR0MemObjAllocPhysNC works.
847 */
848# if 0 /* later, see @bugref{3170}. */
849 RTR0MEMOBJ MemObj;
850 rc = RTR0MemObjAllocPhysNC(&MemObj, _64K, NIL_RTHCPHYS);
851 if (RT_SUCCESS(rc))
852 {
853 rc = RTR0MemObjFree(MemObj, true);
854 AssertRC(rc);
855 }
856 else if (rc == VERR_NOT_SUPPORTED)
857 pGMM->fLegacyAllocationMode = pGMM->fBoundMemoryMode = true;
858 else
859 SUPR0Printf("GMMR0Init: RTR0MemObjAllocPhysNC(,64K,Any) -> %d!\n", rc);
860# else
861# if defined(RT_OS_WINDOWS) || (defined(RT_OS_SOLARIS) && ARCH_BITS == 64) || defined(RT_OS_LINUX) || defined(RT_OS_FREEBSD)
862 pGMM->fLegacyAllocationMode = false;
863# if ARCH_BITS == 32
864 /* Don't reuse possibly partial chunks because of the virtual
865 address space limitation. */
866 pGMM->fBoundMemoryMode = true;
867# else
868 pGMM->fBoundMemoryMode = false;
869# endif
870# else
871 pGMM->fLegacyAllocationMode = true;
872 pGMM->fBoundMemoryMode = true;
873# endif
874# endif
875#endif /* GMM_WITH_LEGACY_MODE */
876
877 /*
878 * Query system page count and guess a reasonable cMaxPages value.
879 */
880 pGMM->cMaxPages = UINT32_MAX; /** @todo IPRT function for query ram size and such. */
881
882 g_pGMM = pGMM;
883#ifdef GMM_WITH_LEGACY_MODE
884 LogFlow(("GMMInit: pGMM=%p fLegacyAllocationMode=%RTbool fBoundMemoryMode=%RTbool\n", pGMM, pGMM->fLegacyAllocationMode, pGMM->fBoundMemoryMode));
885#elif defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)
886 LogFlow(("GMMInit: pGMM=%p fBoundMemoryMode=%RTbool fHasWorkingAllocPhysNC=%RTbool\n", pGMM, pGMM->fBoundMemoryMode, pGMM->fHasWorkingAllocPhysNC));
887#else
888 LogFlow(("GMMInit: pGMM=%p fBoundMemoryMode=%RTbool\n", pGMM, pGMM->fBoundMemoryMode));
889#endif
890 return VINF_SUCCESS;
891 }
892
893 /*
894 * Bail out.
895 */
896 RTSpinlockDestroy(pGMM->hSpinLockTree);
897 while (iMtx-- > 0)
898 RTSemFastMutexDestroy(pGMM->aChunkMtx[iMtx].hMtx);
899#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
900 RTCritSectDelete(&pGMM->GiantCritSect);
901#else
902 RTSemFastMutexDestroy(pGMM->hMtx);
903#endif
904 }
905
906 pGMM->u32Magic = 0;
907 RTMemFree(pGMM);
908 SUPR0Printf("GMMR0Init: failed! rc=%d\n", rc);
909 return rc;
910}
911
912
913/**
914 * Terminates the GMM component.
915 */
916GMMR0DECL(void) GMMR0Term(void)
917{
918 LogFlow(("GMMTerm:\n"));
919
920 /*
921 * Take care / be paranoid...
922 */
923 PGMM pGMM = g_pGMM;
924 if (!VALID_PTR(pGMM))
925 return;
926 if (pGMM->u32Magic != GMM_MAGIC)
927 {
928 SUPR0Printf("GMMR0Term: u32Magic=%#x\n", pGMM->u32Magic);
929 return;
930 }
931
932 /*
933 * Undo what init did and free all the resources we've acquired.
934 */
935 /* Destroy the fundamentals. */
936 g_pGMM = NULL;
937 pGMM->u32Magic = ~GMM_MAGIC;
938#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
939 RTCritSectDelete(&pGMM->GiantCritSect);
940#else
941 RTSemFastMutexDestroy(pGMM->hMtx);
942 pGMM->hMtx = NIL_RTSEMFASTMUTEX;
943#endif
944 RTSpinlockDestroy(pGMM->hSpinLockTree);
945 pGMM->hSpinLockTree = NIL_RTSPINLOCK;
946
947 /* Free any chunks still hanging around. */
948 RTAvlU32Destroy(&pGMM->pChunks, gmmR0TermDestroyChunk, pGMM);
949
950 /* Destroy the chunk locks. */
951 for (unsigned iMtx = 0; iMtx < RT_ELEMENTS(pGMM->aChunkMtx); iMtx++)
952 {
953 Assert(pGMM->aChunkMtx[iMtx].cUsers == 0);
954 RTSemFastMutexDestroy(pGMM->aChunkMtx[iMtx].hMtx);
955 pGMM->aChunkMtx[iMtx].hMtx = NIL_RTSEMFASTMUTEX;
956 }
957
958 /* Finally the instance data itself. */
959 RTMemFree(pGMM);
960 LogFlow(("GMMTerm: done\n"));
961}
962
963
964/**
965 * RTAvlU32Destroy callback.
966 *
967 * @returns 0
968 * @param pNode The node to destroy.
969 * @param pvGMM The GMM handle.
970 */
971static DECLCALLBACK(int) gmmR0TermDestroyChunk(PAVLU32NODECORE pNode, void *pvGMM)
972{
973 PGMMCHUNK pChunk = (PGMMCHUNK)pNode;
974
975 if (pChunk->cFree != (GMM_CHUNK_SIZE >> PAGE_SHIFT))
976 SUPR0Printf("GMMR0Term: %RKv/%#x: cFree=%d cPrivate=%d cShared=%d cMappings=%d\n", pChunk,
977 pChunk->Core.Key, pChunk->cFree, pChunk->cPrivate, pChunk->cShared, pChunk->cMappingsX);
978
979 int rc = RTR0MemObjFree(pChunk->hMemObj, true /* fFreeMappings */);
980 if (RT_FAILURE(rc))
981 {
982 SUPR0Printf("GMMR0Term: %RKv/%#x: RTRMemObjFree(%RKv,true) -> %d (cMappings=%d)\n", pChunk,
983 pChunk->Core.Key, pChunk->hMemObj, rc, pChunk->cMappingsX);
984 AssertRC(rc);
985 }
986 pChunk->hMemObj = NIL_RTR0MEMOBJ;
987
988 RTMemFree(pChunk->paMappingsX);
989 pChunk->paMappingsX = NULL;
990
991 RTMemFree(pChunk);
992 NOREF(pvGMM);
993 return 0;
994}
995
996
997/**
998 * Initializes the per-VM data for the GMM.
999 *
1000 * This is called from within the GVMM lock (from GVMMR0CreateVM)
1001 * and should only initialize the data members so GMMR0CleanupVM
1002 * can deal with them. We reserve no memory or anything here,
1003 * that's done later in GMMR0InitVM.
1004 *
1005 * @param pGVM Pointer to the Global VM structure.
1006 */
1007GMMR0DECL(void) GMMR0InitPerVMData(PGVM pGVM)
1008{
1009 AssertCompile(RT_SIZEOFMEMB(GVM,gmm.s) <= RT_SIZEOFMEMB(GVM,gmm.padding));
1010
1011 pGVM->gmm.s.Stats.enmPolicy = GMMOCPOLICY_INVALID;
1012 pGVM->gmm.s.Stats.enmPriority = GMMPRIORITY_INVALID;
1013 pGVM->gmm.s.Stats.fMayAllocate = false;
1014}
1015
1016
1017/**
1018 * Acquires the GMM giant lock.
1019 *
1020 * @returns Assert status code from RTSemFastMutexRequest.
1021 * @param pGMM Pointer to the GMM instance.
1022 */
1023static int gmmR0MutexAcquire(PGMM pGMM)
1024{
1025 ASMAtomicIncU32(&pGMM->cMtxContenders);
1026#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1027 int rc = RTCritSectEnter(&pGMM->GiantCritSect);
1028#else
1029 int rc = RTSemFastMutexRequest(pGMM->hMtx);
1030#endif
1031 ASMAtomicDecU32(&pGMM->cMtxContenders);
1032 AssertRC(rc);
1033#ifdef VBOX_STRICT
1034 pGMM->hMtxOwner = RTThreadNativeSelf();
1035#endif
1036 return rc;
1037}
1038
1039
1040/**
1041 * Releases the GMM giant lock.
1042 *
1043 * @returns Assert status code from RTSemFastMutexRelease.
1044 * @param pGMM Pointer to the GMM instance.
1045 */
1046static int gmmR0MutexRelease(PGMM pGMM)
1047{
1048#ifdef VBOX_STRICT
1049 pGMM->hMtxOwner = NIL_RTNATIVETHREAD;
1050#endif
1051#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1052 int rc = RTCritSectLeave(&pGMM->GiantCritSect);
1053#else
1054 int rc = RTSemFastMutexRelease(pGMM->hMtx);
1055 AssertRC(rc);
1056#endif
1057 return rc;
1058}
1059
1060
1061/**
1062 * Yields the GMM giant lock if there is contention and a certain minimum time
1063 * has elapsed since we took it.
1064 *
1065 * @returns @c true if the mutex was yielded, @c false if not.
1066 * @param pGMM Pointer to the GMM instance.
1067 * @param puLockNanoTS Where the lock acquisition time stamp is kept
1068 * (in/out).
1069 */
1070static bool gmmR0MutexYield(PGMM pGMM, uint64_t *puLockNanoTS)
1071{
1072 /*
1073 * If nobody is contending the mutex, don't bother checking the time.
1074 */
1075 if (ASMAtomicReadU32(&pGMM->cMtxContenders) == 0)
1076 return false;
1077
1078 /*
1079 * Don't yield if we haven't executed for at least 2 milliseconds.
1080 */
1081 uint64_t uNanoNow = RTTimeSystemNanoTS();
1082 if (uNanoNow - *puLockNanoTS < UINT32_C(2000000))
1083 return false;
1084
1085 /*
1086 * Yield the mutex.
1087 */
1088#ifdef VBOX_STRICT
1089 pGMM->hMtxOwner = NIL_RTNATIVETHREAD;
1090#endif
1091 ASMAtomicIncU32(&pGMM->cMtxContenders);
1092#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1093 int rc1 = RTCritSectLeave(&pGMM->GiantCritSect); AssertRC(rc1);
1094#else
1095 int rc1 = RTSemFastMutexRelease(pGMM->hMtx); AssertRC(rc1);
1096#endif
1097
1098 RTThreadYield();
1099
1100#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1101 int rc2 = RTCritSectEnter(&pGMM->GiantCritSect); AssertRC(rc2);
1102#else
1103 int rc2 = RTSemFastMutexRequest(pGMM->hMtx); AssertRC(rc2);
1104#endif
1105 *puLockNanoTS = RTTimeSystemNanoTS();
1106 ASMAtomicDecU32(&pGMM->cMtxContenders);
1107#ifdef VBOX_STRICT
1108 pGMM->hMtxOwner = RTThreadNativeSelf();
1109#endif
1110
1111 return true;
1112}
1113
1114
1115/**
1116 * Acquires a chunk lock.
1117 *
1118 * The caller must own the giant lock.
1119 *
1120 * @returns Assert status code from RTSemFastMutexRequest.
1121 * @param pMtxState The chunk mutex state info. (Avoids
1122 * passing the same flags and stuff around
1123 * for subsequent release and drop-giant
1124 * calls.)
1125 * @param pGMM Pointer to the GMM instance.
1126 * @param pChunk Pointer to the chunk.
1127 * @param fFlags Flags regarding the giant lock, GMMR0CHUNK_MTX_XXX.
1128 */
1129static int gmmR0ChunkMutexAcquire(PGMMR0CHUNKMTXSTATE pMtxState, PGMM pGMM, PGMMCHUNK pChunk, uint32_t fFlags)
1130{
1131 Assert(fFlags > GMMR0CHUNK_MTX_INVALID && fFlags < GMMR0CHUNK_MTX_END);
1132 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
1133
1134 pMtxState->pGMM = pGMM;
1135 pMtxState->fFlags = (uint8_t)fFlags;
1136
1137 /*
1138 * Get the lock index and reference the lock.
1139 */
1140 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
1141 uint32_t iChunkMtx = pChunk->iChunkMtx;
1142 if (iChunkMtx == UINT8_MAX)
1143 {
1144 iChunkMtx = pGMM->iNextChunkMtx++;
1145 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1146
1147 /* Try get an unused one... */
1148 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1149 {
1150 iChunkMtx = pGMM->iNextChunkMtx++;
1151 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1152 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1153 {
1154 iChunkMtx = pGMM->iNextChunkMtx++;
1155 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1156 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1157 {
1158 iChunkMtx = pGMM->iNextChunkMtx++;
1159 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1160 }
1161 }
1162 }
1163
1164 pChunk->iChunkMtx = iChunkMtx;
1165 }
1166 AssertCompile(RT_ELEMENTS(pGMM->aChunkMtx) < UINT8_MAX);
1167 pMtxState->iChunkMtx = (uint8_t)iChunkMtx;
1168 ASMAtomicIncU32(&pGMM->aChunkMtx[iChunkMtx].cUsers);
1169
1170 /*
1171 * Drop the giant?
1172 */
1173 if (fFlags != GMMR0CHUNK_MTX_KEEP_GIANT)
1174 {
1175 /** @todo GMM life cycle cleanup (we may race someone
1176 * destroying and cleaning up GMM)? */
1177 gmmR0MutexRelease(pGMM);
1178 }
1179
1180 /*
1181 * Take the chunk mutex.
1182 */
1183 int rc = RTSemFastMutexRequest(pGMM->aChunkMtx[iChunkMtx].hMtx);
1184 AssertRC(rc);
1185 return rc;
1186}
1187
1188
1189/**
1190 * Releases the chunk mutex acquired by gmmR0ChunkMutexAcquire.
1191 *
1192 * @returns Assert status code from RTSemFastMutexRequest.
1193 * @param pMtxState Pointer to the chunk mutex state.
1194 * @param pChunk Pointer to the chunk if it's still
1195 * alive, NULL if it isn't. This is used to deassociate
1196 * the chunk from the mutex on the way out so a new one
1197 * can be selected next time, thus avoiding contended
1198 * mutexes.
1199 */
1200static int gmmR0ChunkMutexRelease(PGMMR0CHUNKMTXSTATE pMtxState, PGMMCHUNK pChunk)
1201{
1202 PGMM pGMM = pMtxState->pGMM;
1203
1204 /*
1205 * Release the chunk mutex and reacquire the giant if requested.
1206 */
1207 int rc = RTSemFastMutexRelease(pGMM->aChunkMtx[pMtxState->iChunkMtx].hMtx);
1208 AssertRC(rc);
1209 if (pMtxState->fFlags == GMMR0CHUNK_MTX_RETAKE_GIANT)
1210 rc = gmmR0MutexAcquire(pGMM);
1211 else
1212 Assert((pMtxState->fFlags != GMMR0CHUNK_MTX_DROP_GIANT) == (pGMM->hMtxOwner == RTThreadNativeSelf()));
1213
1214 /*
1215 * Drop the chunk mutex user reference and deassociate it from the chunk
1216 * when possible.
1217 */
1218 if ( ASMAtomicDecU32(&pGMM->aChunkMtx[pMtxState->iChunkMtx].cUsers) == 0
1219 && pChunk
1220 && RT_SUCCESS(rc) )
1221 {
1222 if (pMtxState->fFlags != GMMR0CHUNK_MTX_DROP_GIANT)
1223 pChunk->iChunkMtx = UINT8_MAX;
1224 else
1225 {
1226 rc = gmmR0MutexAcquire(pGMM);
1227 if (RT_SUCCESS(rc))
1228 {
1229 if (pGMM->aChunkMtx[pMtxState->iChunkMtx].cUsers == 0)
1230 pChunk->iChunkMtx = UINT8_MAX;
1231 rc = gmmR0MutexRelease(pGMM);
1232 }
1233 }
1234 }
1235
1236 pMtxState->pGMM = NULL;
1237 return rc;
1238}
1239
1240
1241/**
1242 * Drops the giant GMM lock we kept in gmmR0ChunkMutexAcquire while keeping the
1243 * chunk locked.
1244 *
1245 * This only works if gmmR0ChunkMutexAcquire was called with
1246 * GMMR0CHUNK_MTX_KEEP_GIANT. gmmR0ChunkMutexRelease will retake the giant
1247 * mutex, i.e. behave as if GMMR0CHUNK_MTX_RETAKE_GIANT was used.
1248 *
1249 * @returns VBox status code (assuming success is ok).
1250 * @param pMtxState Pointer to the chunk mutex state.
1251 */
1252static int gmmR0ChunkMutexDropGiant(PGMMR0CHUNKMTXSTATE pMtxState)
1253{
1254 AssertReturn(pMtxState->fFlags == GMMR0CHUNK_MTX_KEEP_GIANT, VERR_GMM_MTX_FLAGS);
1255 Assert(pMtxState->pGMM->hMtxOwner == RTThreadNativeSelf());
1256 pMtxState->fFlags = GMMR0CHUNK_MTX_RETAKE_GIANT;
1257 /** @todo GMM life cycle cleanup (we may race someone
1258 * destroying and cleaning up GMM)? */
1259 return gmmR0MutexRelease(pMtxState->pGMM);
1260}
1261
1262
1263/**
1264 * For experimenting with NUMA affinity and such.
1265 *
1266 * @returns The current NUMA Node ID.
1267 */
1268static uint16_t gmmR0GetCurrentNumaNodeId(void)
1269{
1270#if 1
1271 return GMM_CHUNK_NUMA_ID_UNKNOWN;
1272#else
1273 return RTMpCpuId() / 16;
1274#endif
1275}
1276
1277
1278
1279/**
1280 * Cleans up when a VM is terminating.
1281 *
1282 * @param pGVM Pointer to the Global VM structure.
1283 */
1284GMMR0DECL(void) GMMR0CleanupVM(PGVM pGVM)
1285{
1286 LogFlow(("GMMR0CleanupVM: pGVM=%p:{.hSelf=%#x}\n", pGVM, pGVM->hSelf));
1287
1288 PGMM pGMM;
1289 GMM_GET_VALID_INSTANCE_VOID(pGMM);
1290
1291#ifdef VBOX_WITH_PAGE_SHARING
1292 /*
1293 * Clean up all registered shared modules first.
1294 */
1295 gmmR0SharedModuleCleanup(pGMM, pGVM);
1296#endif
1297
1298 gmmR0MutexAcquire(pGMM);
1299 uint64_t uLockNanoTS = RTTimeSystemNanoTS();
1300 GMM_CHECK_SANITY_UPON_ENTERING(pGMM);
1301
1302 /*
1303 * The policy is 'INVALID' until the initial reservation
1304 * request has been serviced.
1305 */
1306 if ( pGVM->gmm.s.Stats.enmPolicy > GMMOCPOLICY_INVALID
1307 && pGVM->gmm.s.Stats.enmPolicy < GMMOCPOLICY_END)
1308 {
1309 /*
1310 * If it's the last VM around, we can skip walking all the chunks looking
1311 * for the pages owned by this VM and instead flush the whole shebang.
1312 *
1313 * This takes care of the eventuality that a VM has left shared page
1314 * references behind (shouldn't happen of course, but you never know).
1315 */
1316 Assert(pGMM->cRegisteredVMs);
1317 pGMM->cRegisteredVMs--;
1318
1319 /*
1320 * Walk the entire pool looking for pages that belong to this VM
1321 * and leftover mappings. (This'll only catch private pages,
1322 * shared pages will be 'left behind'.)
1323 */
1324 /** @todo r=bird: This scanning+freeing could be optimized in bound mode! */
1325 uint64_t cPrivatePages = pGVM->gmm.s.Stats.cPrivatePages; /* save */
1326
1327 unsigned iCountDown = 64;
1328 bool fRedoFromStart;
1329 PGMMCHUNK pChunk;
1330 do
1331 {
1332 fRedoFromStart = false;
1333 RTListForEachReverse(&pGMM->ChunkList, pChunk, GMMCHUNK, ListNode)
1334 {
1335 uint32_t const cFreeChunksOld = pGMM->cFreedChunks;
1336 if ( ( !pGMM->fBoundMemoryMode
1337 || pChunk->hGVM == pGVM->hSelf)
1338 && gmmR0CleanupVMScanChunk(pGMM, pGVM, pChunk))
1339 {
1340 /* We left the giant mutex, so reset the yield counters. */
1341 uLockNanoTS = RTTimeSystemNanoTS();
1342 iCountDown = 64;
1343 }
1344 else
1345 {
1346 /* Didn't leave it, so do normal yielding. */
1347 if (!iCountDown)
1348 gmmR0MutexYield(pGMM, &uLockNanoTS);
1349 else
1350 iCountDown--;
1351 }
1352 if (pGMM->cFreedChunks != cFreeChunksOld)
1353 {
1354 fRedoFromStart = true;
1355 break;
1356 }
1357 }
1358 } while (fRedoFromStart);
1359
1360 if (pGVM->gmm.s.Stats.cPrivatePages)
1361 SUPR0Printf("GMMR0CleanupVM: hGVM=%#x has %#x private pages that cannot be found!\n", pGVM->hSelf, pGVM->gmm.s.Stats.cPrivatePages);
1362
1363 pGMM->cAllocatedPages -= cPrivatePages;
1364
1365 /*
1366 * Free empty chunks.
1367 */
1368 PGMMCHUNKFREESET pPrivateSet = pGMM->fBoundMemoryMode ? &pGVM->gmm.s.Private : &pGMM->PrivateX;
1369 do
1370 {
1371 fRedoFromStart = false;
1372 iCountDown = 10240;
1373 pChunk = pPrivateSet->apLists[GMM_CHUNK_FREE_SET_UNUSED_LIST];
1374 while (pChunk)
1375 {
1376 PGMMCHUNK pNext = pChunk->pFreeNext;
1377 Assert(pChunk->cFree == GMM_CHUNK_NUM_PAGES);
1378 if ( !pGMM->fBoundMemoryMode
1379 || pChunk->hGVM == pGVM->hSelf)
1380 {
1381 uint64_t const idGenerationOld = pPrivateSet->idGeneration;
1382 if (gmmR0FreeChunk(pGMM, pGVM, pChunk, true /*fRelaxedSem*/))
1383 {
1384 /* We've left the giant mutex, restart? (+1 for our unlink) */
1385 fRedoFromStart = pPrivateSet->idGeneration != idGenerationOld + 1;
1386 if (fRedoFromStart)
1387 break;
1388 uLockNanoTS = RTTimeSystemNanoTS();
1389 iCountDown = 10240;
1390 }
1391 }
1392
1393 /* Advance and maybe yield the lock. */
1394 pChunk = pNext;
1395 if (--iCountDown == 0)
1396 {
1397 uint64_t const idGenerationOld = pPrivateSet->idGeneration;
1398 fRedoFromStart = gmmR0MutexYield(pGMM, &uLockNanoTS)
1399 && pPrivateSet->idGeneration != idGenerationOld;
1400 if (fRedoFromStart)
1401 break;
1402 iCountDown = 10240;
1403 }
1404 }
1405 } while (fRedoFromStart);
1406
1407 /*
1408 * Account for shared pages that weren't freed.
1409 */
1410 if (pGVM->gmm.s.Stats.cSharedPages)
1411 {
1412 Assert(pGMM->cSharedPages >= pGVM->gmm.s.Stats.cSharedPages);
1413 SUPR0Printf("GMMR0CleanupVM: hGVM=%#x left %#x shared pages behind!\n", pGVM->hSelf, pGVM->gmm.s.Stats.cSharedPages);
1414 pGMM->cLeftBehindSharedPages += pGVM->gmm.s.Stats.cSharedPages;
1415 }
1416
1417 /*
1418 * Clean up balloon statistics in case the VM process crashed.
1419 */
1420 Assert(pGMM->cBalloonedPages >= pGVM->gmm.s.Stats.cBalloonedPages);
1421 pGMM->cBalloonedPages -= pGVM->gmm.s.Stats.cBalloonedPages;
1422
1423 /*
1424 * Update the over-commitment management statistics.
1425 */
1426 pGMM->cReservedPages -= pGVM->gmm.s.Stats.Reserved.cBasePages
1427 + pGVM->gmm.s.Stats.Reserved.cFixedPages
1428 + pGVM->gmm.s.Stats.Reserved.cShadowPages;
1429 switch (pGVM->gmm.s.Stats.enmPolicy)
1430 {
1431 case GMMOCPOLICY_NO_OC:
1432 break;
1433 default:
1434 /** @todo Update GMM->cOverCommittedPages */
1435 break;
1436 }
1437 }
1438
1439 /* zap the GVM data. */
1440 pGVM->gmm.s.Stats.enmPolicy = GMMOCPOLICY_INVALID;
1441 pGVM->gmm.s.Stats.enmPriority = GMMPRIORITY_INVALID;
1442 pGVM->gmm.s.Stats.fMayAllocate = false;
1443
1444 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1445 gmmR0MutexRelease(pGMM);
1446
1447 LogFlow(("GMMR0CleanupVM: returns\n"));
1448}
1449
1450
1451/**
1452 * Scan one chunk for private pages belonging to the specified VM.
1453 *
1454 * @note This function may drop the giant mutex!
1455 *
1456 * @returns @c true if we've temporarily dropped the giant mutex, @c false if
1457 * we didn't.
1458 * @param pGMM Pointer to the GMM instance.
1459 * @param pGVM The global VM handle.
1460 * @param pChunk The chunk to scan.
1461 */
1462static bool gmmR0CleanupVMScanChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
1463{
1464 Assert(!pGMM->fBoundMemoryMode || pChunk->hGVM == pGVM->hSelf);
1465
1466 /*
1467 * Look for pages belonging to the VM.
1468 * (Perform some internal checks while we're scanning.)
1469 */
1470#ifndef VBOX_STRICT
1471 if (pChunk->cFree != (GMM_CHUNK_SIZE >> PAGE_SHIFT))
1472#endif
1473 {
1474 unsigned cPrivate = 0;
1475 unsigned cShared = 0;
1476 unsigned cFree = 0;
1477
1478 gmmR0UnlinkChunk(pChunk); /* avoiding cFreePages updates. */
1479
1480 uint16_t hGVM = pGVM->hSelf;
1481 unsigned iPage = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
1482 while (iPage-- > 0)
1483 if (GMM_PAGE_IS_PRIVATE(&pChunk->aPages[iPage]))
1484 {
1485 if (pChunk->aPages[iPage].Private.hGVM == hGVM)
1486 {
1487 /*
1488 * Free the page.
1489 *
1490 * The reason for not using gmmR0FreePrivatePage here is that we
1491 * must *not* cause the chunk to be freed from under us - we're in
1492 * an AVL tree walk here.
1493 */
1494 pChunk->aPages[iPage].u = 0;
1495 pChunk->aPages[iPage].Free.iNext = pChunk->iFreeHead;
1496 pChunk->aPages[iPage].Free.u2State = GMM_PAGE_STATE_FREE;
1497 pChunk->iFreeHead = iPage;
1498 pChunk->cPrivate--;
1499 pChunk->cFree++;
1500 pGVM->gmm.s.Stats.cPrivatePages--;
1501 cFree++;
1502 }
1503 else
1504 cPrivate++;
1505 }
1506 else if (GMM_PAGE_IS_FREE(&pChunk->aPages[iPage]))
1507 cFree++;
1508 else
1509 cShared++;
1510
1511 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
1512
1513 /*
1514 * Did it add up?
1515 */
1516 if (RT_UNLIKELY( pChunk->cFree != cFree
1517 || pChunk->cPrivate != cPrivate
1518 || pChunk->cShared != cShared))
1519 {
1520 SUPR0Printf("gmmR0CleanupVMScanChunk: Chunk %RKv/%#x has bogus stats - free=%d/%d private=%d/%d shared=%d/%d\n",
1521 pChunk, pChunk->Core.Key, pChunk->cFree, cFree, pChunk->cPrivate, cPrivate, pChunk->cShared, cShared);
1522 pChunk->cFree = cFree;
1523 pChunk->cPrivate = cPrivate;
1524 pChunk->cShared = cShared;
1525 }
1526 }
1527
1528 /*
1529 * If not in bound memory mode, we should reset the hGVM field
1530 * if it has our handle in it.
1531 */
1532 if (pChunk->hGVM == pGVM->hSelf)
1533 {
1534 if (!g_pGMM->fBoundMemoryMode)
1535 pChunk->hGVM = NIL_GVM_HANDLE;
1536 else if (pChunk->cFree != GMM_CHUNK_NUM_PAGES)
1537 {
1538 SUPR0Printf("gmmR0CleanupVMScanChunk: %RKv/%#x: cFree=%#x - it should be 0 in bound mode!\n",
1539 pChunk, pChunk->Core.Key, pChunk->cFree);
1540 AssertMsgFailed(("%p/%#x: cFree=%#x - it should be 0 in bound mode!\n", pChunk, pChunk->Core.Key, pChunk->cFree));
1541
1542 gmmR0UnlinkChunk(pChunk);
1543 pChunk->cFree = GMM_CHUNK_NUM_PAGES;
1544 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
1545 }
1546 }
1547
1548 /*
1549 * Look for a mapping belonging to the terminating VM.
1550 */
1551 GMMR0CHUNKMTXSTATE MtxState;
1552 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
1553 unsigned cMappings = pChunk->cMappingsX;
1554 for (unsigned i = 0; i < cMappings; i++)
1555 if (pChunk->paMappingsX[i].pGVM == pGVM)
1556 {
1557 gmmR0ChunkMutexDropGiant(&MtxState);
1558
1559 RTR0MEMOBJ hMemObj = pChunk->paMappingsX[i].hMapObj;
1560
1561 cMappings--;
1562 if (i < cMappings)
1563 pChunk->paMappingsX[i] = pChunk->paMappingsX[cMappings];
1564 pChunk->paMappingsX[cMappings].pGVM = NULL;
1565 pChunk->paMappingsX[cMappings].hMapObj = NIL_RTR0MEMOBJ;
1566 Assert(pChunk->cMappingsX - 1U == cMappings);
1567 pChunk->cMappingsX = cMappings;
1568
1569 int rc = RTR0MemObjFree(hMemObj, false /* fFreeMappings (NA) */);
1570 if (RT_FAILURE(rc))
1571 {
1572 SUPR0Printf("gmmR0CleanupVMScanChunk: %RKv/%#x: mapping #%x: RTRMemObjFree(%RKv,false) -> %d \n",
1573 pChunk, pChunk->Core.Key, i, hMemObj, rc);
1574 AssertRC(rc);
1575 }
1576
1577 gmmR0ChunkMutexRelease(&MtxState, pChunk);
1578 return true;
1579 }
1580
1581 gmmR0ChunkMutexRelease(&MtxState, pChunk);
1582 return false;
1583}
1584
1585
1586/**
1587 * The initial resource reservations.
1588 *
1589 * This will make memory reservations according to policy and priority. If there aren't
1590 * sufficient resources available to sustain the VM this function will fail and all
1591 * future allocations requests will fail as well.
1592 *
1593 * These are just the initial reservations made very very early during the VM creation
1594 * process and will be adjusted later in the GMMR0UpdateReservation call after the
1595 * ring-3 init has completed.
1596 *
1597 * @returns VBox status code.
1598 * @retval VERR_GMM_MEMORY_RESERVATION_DECLINED
1599 * @retval VERR_GMM_
1600 *
1601 * @param pGVM The global (ring-0) VM structure.
1602 * @param idCpu The VCPU id - must be zero.
1603 * @param cBasePages The number of pages that may be allocated for the base RAM and ROMs.
1604 * This does not include MMIO2 and similar.
1605 * @param cShadowPages The number of pages that may be allocated for shadow paging structures.
1606 * @param cFixedPages The number of pages that may be allocated for fixed objects like the
1607 * hyper heap, MMIO2 and similar.
1608 * @param enmPolicy The OC policy to use on this VM.
1609 * @param enmPriority The priority in an out-of-memory situation.
1610 *
1611 * @thread The creator thread / EMT(0).
1612 */
1613GMMR0DECL(int) GMMR0InitialReservation(PGVM pGVM, VMCPUID idCpu, uint64_t cBasePages, uint32_t cShadowPages,
1614 uint32_t cFixedPages, GMMOCPOLICY enmPolicy, GMMPRIORITY enmPriority)
1615{
1616 LogFlow(("GMMR0InitialReservation: pGVM=%p cBasePages=%#llx cShadowPages=%#x cFixedPages=%#x enmPolicy=%d enmPriority=%d\n",
1617 pGVM, cBasePages, cShadowPages, cFixedPages, enmPolicy, enmPriority));
1618
1619 /*
1620 * Validate, get basics and take the semaphore.
1621 */
1622 AssertReturn(idCpu == 0, VERR_INVALID_CPU_ID);
1623 PGMM pGMM;
1624 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
1625 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
1626 if (RT_FAILURE(rc))
1627 return rc;
1628
1629 AssertReturn(cBasePages, VERR_INVALID_PARAMETER);
1630 AssertReturn(cShadowPages, VERR_INVALID_PARAMETER);
1631 AssertReturn(cFixedPages, VERR_INVALID_PARAMETER);
1632 AssertReturn(enmPolicy > GMMOCPOLICY_INVALID && enmPolicy < GMMOCPOLICY_END, VERR_INVALID_PARAMETER);
1633 AssertReturn(enmPriority > GMMPRIORITY_INVALID && enmPriority < GMMPRIORITY_END, VERR_INVALID_PARAMETER);
1634
1635 gmmR0MutexAcquire(pGMM);
1636 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
1637 {
1638 if ( !pGVM->gmm.s.Stats.Reserved.cBasePages
1639 && !pGVM->gmm.s.Stats.Reserved.cFixedPages
1640 && !pGVM->gmm.s.Stats.Reserved.cShadowPages)
1641 {
1642 /*
1643 * Check if we can accommodate this.
1644 */
1645 /* ... later ... */
1646 if (RT_SUCCESS(rc))
1647 {
1648 /*
1649 * Update the records.
1650 */
1651 pGVM->gmm.s.Stats.Reserved.cBasePages = cBasePages;
1652 pGVM->gmm.s.Stats.Reserved.cFixedPages = cFixedPages;
1653 pGVM->gmm.s.Stats.Reserved.cShadowPages = cShadowPages;
1654 pGVM->gmm.s.Stats.enmPolicy = enmPolicy;
1655 pGVM->gmm.s.Stats.enmPriority = enmPriority;
1656 pGVM->gmm.s.Stats.fMayAllocate = true;
1657
1658 pGMM->cReservedPages += cBasePages + cFixedPages + cShadowPages;
1659 pGMM->cRegisteredVMs++;
1660 }
1661 }
1662 else
1663 rc = VERR_WRONG_ORDER;
1664 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1665 }
1666 else
1667 rc = VERR_GMM_IS_NOT_SANE;
1668 gmmR0MutexRelease(pGMM);
1669 LogFlow(("GMMR0InitialReservation: returns %Rrc\n", rc));
1670 return rc;
1671}
1672
1673
1674/**
1675 * VMMR0 request wrapper for GMMR0InitialReservation.
1676 *
1677 * @returns see GMMR0InitialReservation.
1678 * @param pGVM The global (ring-0) VM structure.
1679 * @param idCpu The VCPU id.
1680 * @param pReq Pointer to the request packet.
1681 */
1682GMMR0DECL(int) GMMR0InitialReservationReq(PGVM pGVM, VMCPUID idCpu, PGMMINITIALRESERVATIONREQ pReq)
1683{
1684 /*
1685 * Validate input and pass it on.
1686 */
1687 AssertPtrReturn(pGVM, VERR_INVALID_POINTER);
1688 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
1689 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
1690
1691 return GMMR0InitialReservation(pGVM, idCpu, pReq->cBasePages, pReq->cShadowPages,
1692 pReq->cFixedPages, pReq->enmPolicy, pReq->enmPriority);
1693}
1694
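/*
 * Illustration only (not part of the original file): a minimal sketch of the
 * request a hypothetical caller would prepare for a VM with 1 GiB of base
 * RAM, assuming 4 KiB pages.  The shadow/fixed counts and the policy and
 * priority values are made-up examples; a real caller dispatches the request
 * through the VMMR0 request path rather than calling the wrapper directly.
 *
 * @code
 *      GMMINITIALRESERVATIONREQ Req;
 *      Req.Hdr.cbReq    = sizeof(Req);
 *      Req.cBasePages   = _1G / PAGE_SIZE;     // 0x40000 pages for base RAM + ROMs
 *      Req.cShadowPages = 0x200;               // hypothetical
 *      Req.cFixedPages  = 0x400;               // hypothetical: hyper heap, MMIO2, ...
 *      Req.enmPolicy    = GMMOCPOLICY_NO_OC;   // assumed enum value
 *      Req.enmPriority  = GMMPRIORITY_NORMAL;  // assumed enum value
 *      int rc = GMMR0InitialReservationReq(pGVM, 0, &Req); // idCpu must be 0
 * @endcode
 */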
1695
1696/**
1697 * This updates the memory reservation with the additional MMIO2 and ROM pages.
1698 *
1699 * @returns VBox status code.
1700 * @retval VERR_GMM_MEMORY_RESERVATION_DECLINED
1701 *
1702 * @param pGVM The global (ring-0) VM structure.
1703 * @param idCpu The VCPU id.
1704 * @param cBasePages The number of pages that may be allocated for the base RAM and ROMs.
1705 * This does not include MMIO2 and similar.
1706 * @param cShadowPages The number of pages that may be allocated for shadow paging structures.
1707 * @param cFixedPages The number of pages that may be allocated for fixed objects like the
1708 * hyper heap, MMIO2 and similar.
1709 *
1710 * @thread EMT(idCpu)
1711 */
1712GMMR0DECL(int) GMMR0UpdateReservation(PGVM pGVM, VMCPUID idCpu, uint64_t cBasePages,
1713 uint32_t cShadowPages, uint32_t cFixedPages)
1714{
1715 LogFlow(("GMMR0UpdateReservation: pGVM=%p cBasePages=%#llx cShadowPages=%#x cFixedPages=%#x\n",
1716 pGVM, cBasePages, cShadowPages, cFixedPages));
1717
1718 /*
1719 * Validate, get basics and take the semaphore.
1720 */
1721 PGMM pGMM;
1722 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
1723 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
1724 if (RT_FAILURE(rc))
1725 return rc;
1726
1727 AssertReturn(cBasePages, VERR_INVALID_PARAMETER);
1728 AssertReturn(cShadowPages, VERR_INVALID_PARAMETER);
1729 AssertReturn(cFixedPages, VERR_INVALID_PARAMETER);
1730
1731 gmmR0MutexAcquire(pGMM);
1732 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
1733 {
1734 if ( pGVM->gmm.s.Stats.Reserved.cBasePages
1735 && pGVM->gmm.s.Stats.Reserved.cFixedPages
1736 && pGVM->gmm.s.Stats.Reserved.cShadowPages)
1737 {
1738 /*
1739 * Check if we can accommodate this.
1740 */
1741 /* ... later ... */
1742 if (RT_SUCCESS(rc))
1743 {
1744 /*
1745 * Update the records.
1746 */
1747 pGMM->cReservedPages -= pGVM->gmm.s.Stats.Reserved.cBasePages
1748 + pGVM->gmm.s.Stats.Reserved.cFixedPages
1749 + pGVM->gmm.s.Stats.Reserved.cShadowPages;
1750 pGMM->cReservedPages += cBasePages + cFixedPages + cShadowPages;
1751
1752 pGVM->gmm.s.Stats.Reserved.cBasePages = cBasePages;
1753 pGVM->gmm.s.Stats.Reserved.cFixedPages = cFixedPages;
1754 pGVM->gmm.s.Stats.Reserved.cShadowPages = cShadowPages;
1755 }
1756 }
1757 else
1758 rc = VERR_WRONG_ORDER;
1759 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1760 }
1761 else
1762 rc = VERR_GMM_IS_NOT_SANE;
1763 gmmR0MutexRelease(pGMM);
1764 LogFlow(("GMMR0UpdateReservation: returns %Rrc\n", rc));
1765 return rc;
1766}
1767
1768
1769/**
1770 * VMMR0 request wrapper for GMMR0UpdateReservation.
1771 *
1772 * @returns see GMMR0UpdateReservation.
1773 * @param pGVM The global (ring-0) VM structure.
1774 * @param idCpu The VCPU id.
1775 * @param pReq Pointer to the request packet.
1776 */
1777GMMR0DECL(int) GMMR0UpdateReservationReq(PGVM pGVM, VMCPUID idCpu, PGMMUPDATERESERVATIONREQ pReq)
1778{
1779 /*
1780 * Validate input and pass it on.
1781 */
1782 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
1783 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
1784
1785 return GMMR0UpdateReservation(pGVM, idCpu, pReq->cBasePages, pReq->cShadowPages, pReq->cFixedPages);
1786}
1787
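/*
 * Illustration only: how the global reservation counter moves when a VM
 * updates its reservation (hypothetical numbers).  GMMR0UpdateReservation
 * first subtracts the VM's old per-account totals and then adds the new
 * ones, e.g.:
 *
 *      old: cBasePages=0x40000, cShadowPages=0x200, cFixedPages=0x400  -> 0x40600 removed
 *      new: cBasePages=0x40000, cShadowPages=0x200, cFixedPages=0x800  -> 0x40a00 added
 *
 * so pGMM->cReservedPages grows by 0x400 pages in this example.
 */
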
1788#ifdef GMMR0_WITH_SANITY_CHECK
1789
1790/**
1791 * Performs sanity checks on a free set.
1792 *
1793 * @returns Error count.
1794 *
1795 * @param pGMM Pointer to the GMM instance.
1796 * @param pSet Pointer to the set.
1797 * @param pszSetName The set name.
1798 * @param pszFunction The function from which it was called.
1799 * @param uLineNo       The line number.
1800 */
1801static uint32_t gmmR0SanityCheckSet(PGMM pGMM, PGMMCHUNKFREESET pSet, const char *pszSetName,
1802 const char *pszFunction, unsigned uLineNo)
1803{
1804 uint32_t cErrors = 0;
1805
1806 /*
1807 * Count the free pages in all the chunks and match it against pSet->cFreePages.
1808 */
1809 uint32_t cPages = 0;
1810 for (unsigned i = 0; i < RT_ELEMENTS(pSet->apLists); i++)
1811 {
1812 for (PGMMCHUNK pCur = pSet->apLists[i]; pCur; pCur = pCur->pFreeNext)
1813 {
1814            /** @todo check that the chunk is hashed into the right set. */
1815 cPages += pCur->cFree;
1816 }
1817 }
1818 if (RT_UNLIKELY(cPages != pSet->cFreePages))
1819 {
1820 SUPR0Printf("GMM insanity: found %#x pages in the %s set, expected %#x. (%s, line %u)\n",
1821 cPages, pszSetName, pSet->cFreePages, pszFunction, uLineNo);
1822 cErrors++;
1823 }
1824
1825 return cErrors;
1826}
1827
1828
1829/**
1830 * Performs some sanity checks on the GMM while owning the lock.
1831 *
1832 * @returns Error count.
1833 *
1834 * @param pGMM Pointer to the GMM instance.
1835 * @param pszFunction The function from which it is called.
1836 * @param uLineNo The line number.
1837 */
1838static uint32_t gmmR0SanityCheck(PGMM pGMM, const char *pszFunction, unsigned uLineNo)
1839{
1840 uint32_t cErrors = 0;
1841
1842 cErrors += gmmR0SanityCheckSet(pGMM, &pGMM->PrivateX, "private", pszFunction, uLineNo);
1843 cErrors += gmmR0SanityCheckSet(pGMM, &pGMM->Shared, "shared", pszFunction, uLineNo);
1844 /** @todo add more sanity checks. */
1845
1846 return cErrors;
1847}
1848
1849#endif /* GMMR0_WITH_SANITY_CHECK */
1850
1851/**
1852 * Looks up a chunk in the tree and fills in the TLB entry for it.
1853 *
1854 * This is not expected to fail and will bitch if it does.
1855 *
1856 * @returns Pointer to the allocation chunk, NULL if not found.
1857 * @param pGMM Pointer to the GMM instance.
1858 * @param idChunk The ID of the chunk to find.
1859 * @param pTlbe Pointer to the TLB entry.
1860 *
1861 * @note Caller owns spinlock.
1862 */
1863static PGMMCHUNK gmmR0GetChunkSlow(PGMM pGMM, uint32_t idChunk, PGMMCHUNKTLBE pTlbe)
1864{
1865 PGMMCHUNK pChunk = (PGMMCHUNK)RTAvlU32Get(&pGMM->pChunks, idChunk);
1866 AssertMsgReturn(pChunk, ("Chunk %#x not found!\n", idChunk), NULL);
1867 pTlbe->idChunk = idChunk;
1868 pTlbe->pChunk = pChunk;
1869 return pChunk;
1870}
1871
1872
1873/**
1874 * Finds an allocation chunk, spin-locked.
1875 *
1876 * This is not expected to fail and will bitch if it does.
1877 *
1878 * @returns Pointer to the allocation chunk, NULL if not found.
1879 * @param pGMM Pointer to the GMM instance.
1880 * @param idChunk The ID of the chunk to find.
1881 */
1882DECLINLINE(PGMMCHUNK) gmmR0GetChunkLocked(PGMM pGMM, uint32_t idChunk)
1883{
1884 /*
1885 * Do a TLB lookup, branch if not in the TLB.
1886 */
1887 PGMMCHUNKTLBE pTlbe = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(idChunk)];
1888 PGMMCHUNK pChunk = pTlbe->pChunk;
1889 if ( pChunk == NULL
1890 || pTlbe->idChunk != idChunk)
1891 pChunk = gmmR0GetChunkSlow(pGMM, idChunk, pTlbe);
1892 return pChunk;
1893}
1894
1895
1896/**
1897 * Finds an allocation chunk.
1898 *
1899 * This is not expected to fail and will bitch if it does.
1900 *
1901 * @returns Pointer to the allocation chunk, NULL if not found.
1902 * @param pGMM Pointer to the GMM instance.
1903 * @param idChunk The ID of the chunk to find.
1904 */
1905DECLINLINE(PGMMCHUNK) gmmR0GetChunk(PGMM pGMM, uint32_t idChunk)
1906{
1907 RTSpinlockAcquire(pGMM->hSpinLockTree);
1908 PGMMCHUNK pChunk = gmmR0GetChunkLocked(pGMM, idChunk);
1909 RTSpinlockRelease(pGMM->hSpinLockTree);
1910 return pChunk;
1911}
1912
1913
1914/**
1915 * Finds a page.
1916 *
1917 * This is not expected to fail and will bitch if it does.
1918 *
1919 * @returns Pointer to the page, NULL if not found.
1920 * @param pGMM Pointer to the GMM instance.
1921 * @param idPage The ID of the page to find.
1922 */
1923DECLINLINE(PGMMPAGE) gmmR0GetPage(PGMM pGMM, uint32_t idPage)
1924{
1925 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
1926 if (RT_LIKELY(pChunk))
1927 return &pChunk->aPages[idPage & GMM_PAGEID_IDX_MASK];
1928 return NULL;
1929}
1930
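/*
 * Illustration only: how gmmR0GetPage above decomposes a page ID, assuming
 * the usual 2 MB chunks and 4 KB pages (i.e. GMM_CHUNKID_SHIFT == 9 and
 * GMM_PAGEID_IDX_MASK == 0x1ff; not re-verified here).  For a hypothetical
 * idPage = 0x12345:
 *
 *      idChunk = 0x12345 >> 9    = 0x91    (key for the chunk TLB / AVL tree)
 *      iPage   = 0x12345 & 0x1ff = 0x145   (index into pChunk->aPages)
 */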
1931
1932#if 0 /* unused */
1933/**
1934 * Gets the host physical address for a page given by its ID.
1935 *
1936 * @returns The host physical address or NIL_RTHCPHYS.
1937 * @param pGMM Pointer to the GMM instance.
1938 * @param idPage The ID of the page to find.
1939 */
1940DECLINLINE(RTHCPHYS) gmmR0GetPageHCPhys(PGMM pGMM, uint32_t idPage)
1941{
1942 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
1943 if (RT_LIKELY(pChunk))
1944 return RTR0MemObjGetPagePhysAddr(pChunk->hMemObj, idPage & GMM_PAGEID_IDX_MASK);
1945 return NIL_RTHCPHYS;
1946}
1947#endif /* unused */
1948
1949
1950/**
1951 * Selects the appropriate free list given the number of free pages.
1952 *
1953 * @returns Free list index.
1954 * @param cFree The number of free pages in the chunk.
1955 */
1956DECLINLINE(unsigned) gmmR0SelectFreeSetList(unsigned cFree)
1957{
1958 unsigned iList = cFree >> GMM_CHUNK_FREE_SET_SHIFT;
1959 AssertMsg(iList < RT_SIZEOFMEMB(GMMCHUNKFREESET, apLists) / RT_SIZEOFMEMB(GMMCHUNKFREESET, apLists[0]),
1960 ("%d (%u)\n", iList, cFree));
1961 return iList;
1962}
1963
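/*
 * Illustration only: the free lists bucket chunks by how many free pages
 * they have, dropping the low GMM_CHUNK_FREE_SET_SHIFT bits.  Assuming a
 * shift of 4 (an assumption, not verified here), a chunk with 37 free pages
 * would land in apLists[37 >> 4] = apLists[2], while a completely unused
 * chunk ends up in the last list, which is what the "empty chunk" scans
 * below (GMM_CHUNK_FREE_SET_UNUSED_LIST) rely on.
 */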
1964
1965/**
1966 * Unlinks the chunk from the free list it's currently on (if any).
1967 *
1968 * @param pChunk The allocation chunk.
1969 */
1970DECLINLINE(void) gmmR0UnlinkChunk(PGMMCHUNK pChunk)
1971{
1972 PGMMCHUNKFREESET pSet = pChunk->pSet;
1973 if (RT_LIKELY(pSet))
1974 {
1975 pSet->cFreePages -= pChunk->cFree;
1976 pSet->idGeneration++;
1977
1978 PGMMCHUNK pPrev = pChunk->pFreePrev;
1979 PGMMCHUNK pNext = pChunk->pFreeNext;
1980 if (pPrev)
1981 pPrev->pFreeNext = pNext;
1982 else
1983 pSet->apLists[gmmR0SelectFreeSetList(pChunk->cFree)] = pNext;
1984 if (pNext)
1985 pNext->pFreePrev = pPrev;
1986
1987 pChunk->pSet = NULL;
1988 pChunk->pFreeNext = NULL;
1989 pChunk->pFreePrev = NULL;
1990 }
1991 else
1992 {
1993 Assert(!pChunk->pFreeNext);
1994 Assert(!pChunk->pFreePrev);
1995 Assert(!pChunk->cFree);
1996 }
1997}
1998
1999
2000/**
2001 * Links the chunk onto the appropriate free list in the specified free set.
2002 *
2003 * If no free entries, it's not linked into any list.
2004 *
2005 * @param pChunk The allocation chunk.
2006 * @param pSet The free set.
2007 */
2008DECLINLINE(void) gmmR0LinkChunk(PGMMCHUNK pChunk, PGMMCHUNKFREESET pSet)
2009{
2010 Assert(!pChunk->pSet);
2011 Assert(!pChunk->pFreeNext);
2012 Assert(!pChunk->pFreePrev);
2013
2014 if (pChunk->cFree > 0)
2015 {
2016 pChunk->pSet = pSet;
2017 pChunk->pFreePrev = NULL;
2018 unsigned const iList = gmmR0SelectFreeSetList(pChunk->cFree);
2019 pChunk->pFreeNext = pSet->apLists[iList];
2020 if (pChunk->pFreeNext)
2021 pChunk->pFreeNext->pFreePrev = pChunk;
2022 pSet->apLists[iList] = pChunk;
2023
2024 pSet->cFreePages += pChunk->cFree;
2025 pSet->idGeneration++;
2026 }
2027}
2028
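/*
 * Illustration only: because the list index depends on pChunk->cFree, any
 * code that changes the free count must unlink the chunk first and relink
 * it afterwards so it lands on the right list.  A minimal sketch of the
 * idiom (this is exactly what gmmR0AllocatePagesFromChunk below does):
 *
 * @code
 *      PGMMCHUNKFREESET pSet = pChunk->pSet;
 *      gmmR0UnlinkChunk(pChunk);
 *      // ... allocate or free pages here, updating pChunk->cFree ...
 *      gmmR0LinkChunk(pChunk, pSet);
 * @endcode
 */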
2029
2030/**
2031 * Selects the appropriate free set for the chunk and links it onto the free list there.
2032 *
2033 * If the chunk has no free entries, it's not linked into any list.
2034 *
2035 * @param pGMM Pointer to the GMM instance.
2036 * @param pGVM      Pointer to the kernel-only VM instance data.
2037 * @param pChunk The allocation chunk.
2038 */
2039DECLINLINE(void) gmmR0SelectSetAndLinkChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
2040{
2041 PGMMCHUNKFREESET pSet;
2042 if (pGMM->fBoundMemoryMode)
2043 pSet = &pGVM->gmm.s.Private;
2044 else if (pChunk->cShared)
2045 pSet = &pGMM->Shared;
2046 else
2047 pSet = &pGMM->PrivateX;
2048 gmmR0LinkChunk(pChunk, pSet);
2049}
2050
2051
2052/**
2053 * Frees a Chunk ID.
2054 *
2055 * @param pGMM Pointer to the GMM instance.
2056 * @param idChunk The Chunk ID to free.
2057 */
2058static void gmmR0FreeChunkId(PGMM pGMM, uint32_t idChunk)
2059{
2060 AssertReturnVoid(idChunk != NIL_GMM_CHUNKID);
2061 AssertMsg(ASMBitTest(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk));
2062 ASMAtomicBitClear(&pGMM->bmChunkId[0], idChunk);
2063}
2064
2065
2066/**
2067 * Allocates a new Chunk ID.
2068 *
2069 * @returns The Chunk ID.
2070 * @param pGMM Pointer to the GMM instance.
2071 */
2072static uint32_t gmmR0AllocateChunkId(PGMM pGMM)
2073{
2074 AssertCompile(!((GMM_CHUNKID_LAST + 1) & 31)); /* must be a multiple of 32 */
2075 AssertCompile(NIL_GMM_CHUNKID == 0);
2076
2077 /*
2078 * Try the next sequential one.
2079 */
2080 int32_t idChunk = ++pGMM->idChunkPrev;
2081#if 0 /** @todo enable this code */
2082 if ( idChunk <= GMM_CHUNKID_LAST
2083 && idChunk > NIL_GMM_CHUNKID
2084        && !ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk))
2085 return idChunk;
2086#endif
2087
2088 /*
2089 * Scan sequentially from the last one.
2090 */
2091 if ( (uint32_t)idChunk < GMM_CHUNKID_LAST
2092 && idChunk > NIL_GMM_CHUNKID)
2093 {
2094 idChunk = ASMBitNextClear(&pGMM->bmChunkId[0], GMM_CHUNKID_LAST + 1, idChunk - 1);
2095 if (idChunk > NIL_GMM_CHUNKID)
2096 {
2097 AssertMsgReturn(!ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk), NIL_GMM_CHUNKID);
2098 return pGMM->idChunkPrev = idChunk;
2099 }
2100 }
2101
2102 /*
2103 * Ok, scan from the start.
2104 * We're not racing anyone, so there is no need to expect failures or have restart loops.
2105 */
2106 idChunk = ASMBitFirstClear(&pGMM->bmChunkId[0], GMM_CHUNKID_LAST + 1);
2107    AssertMsgReturn(idChunk > NIL_GMM_CHUNKID, ("%#x\n", idChunk), NIL_GMM_CHUNKID);
2108 AssertMsgReturn(!ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk), NIL_GMM_CHUNKID);
2109
2110 return pGMM->idChunkPrev = idChunk;
2111}
2112
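/*
 * Illustration only (hypothetical bitmap state): the ID allocator resumes
 * scanning right after the previously handed out ID.  If idChunkPrev is
 * 0x42 and bits 0x43..0x45 are already set in bmChunkId, the
 * ASMBitNextClear() pass above returns 0x46, which is then atomically set
 * and recorded as the new idChunkPrev.  Only when the tail of the bitmap
 * is exhausted does the allocator fall back to scanning from the start.
 */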
2113
2114/**
2115 * Allocates one private page.
2116 *
2117 * Worker for gmmR0AllocatePagesFromChunk and GMMR0AllocateLargePage.
2118 *
2119 * @param pChunk The chunk to allocate it from.
2120 * @param hGVM The GVM handle of the VM requesting memory.
2121 * @param pPageDesc The page descriptor.
2122 */
2123static void gmmR0AllocatePage(PGMMCHUNK pChunk, uint32_t hGVM, PGMMPAGEDESC pPageDesc)
2124{
2125 /* update the chunk stats. */
2126 if (pChunk->hGVM == NIL_GVM_HANDLE)
2127 pChunk->hGVM = hGVM;
2128 Assert(pChunk->cFree);
2129 pChunk->cFree--;
2130 pChunk->cPrivate++;
2131
2132 /* unlink the first free page. */
2133 const uint32_t iPage = pChunk->iFreeHead;
2134 AssertReleaseMsg(iPage < RT_ELEMENTS(pChunk->aPages), ("%d\n", iPage));
2135 PGMMPAGE pPage = &pChunk->aPages[iPage];
2136 Assert(GMM_PAGE_IS_FREE(pPage));
2137 pChunk->iFreeHead = pPage->Free.iNext;
2138 Log3(("A pPage=%p iPage=%#x/%#x u2State=%d iFreeHead=%#x iNext=%#x\n",
2139 pPage, iPage, (pChunk->Core.Key << GMM_CHUNKID_SHIFT) | iPage,
2140 pPage->Common.u2State, pChunk->iFreeHead, pPage->Free.iNext));
2141
2142 /* make the page private. */
2143 pPage->u = 0;
2144 AssertCompile(GMM_PAGE_STATE_PRIVATE == 0);
2145 pPage->Private.hGVM = hGVM;
2146 AssertCompile(NIL_RTHCPHYS >= GMM_GCPHYS_LAST);
2147 AssertCompile(GMM_GCPHYS_UNSHAREABLE >= GMM_GCPHYS_LAST);
2148 if (pPageDesc->HCPhysGCPhys <= GMM_GCPHYS_LAST)
2149 pPage->Private.pfn = pPageDesc->HCPhysGCPhys >> PAGE_SHIFT;
2150 else
2151 pPage->Private.pfn = GMM_PAGE_PFN_UNSHAREABLE; /* unshareable / unassigned - same thing. */
2152
2153 /* update the page descriptor. */
2154 pPageDesc->HCPhysGCPhys = RTR0MemObjGetPagePhysAddr(pChunk->hMemObj, iPage);
2155 Assert(pPageDesc->HCPhysGCPhys != NIL_RTHCPHYS);
2156 pPageDesc->idPage = (pChunk->Core.Key << GMM_CHUNKID_SHIFT) | iPage;
2157 pPageDesc->idSharedPage = NIL_GMM_PAGEID;
2158}
2159
2160
2161/**
2162 * Picks the free pages from a chunk.
2163 *
2164 * @returns The new page descriptor table index.
2165 * @param pChunk The chunk.
2166 * @param hGVM The affinity of the chunk. NIL_GVM_HANDLE for no
2167 * affinity.
2168 * @param iPage The current page descriptor table index.
2169 * @param cPages The total number of pages to allocate.
2170 * @param paPages       The page descriptor table (input + output).
2171 */
2172static uint32_t gmmR0AllocatePagesFromChunk(PGMMCHUNK pChunk, uint16_t const hGVM, uint32_t iPage, uint32_t cPages,
2173 PGMMPAGEDESC paPages)
2174{
2175 PGMMCHUNKFREESET pSet = pChunk->pSet; Assert(pSet);
2176 gmmR0UnlinkChunk(pChunk);
2177
2178 for (; pChunk->cFree && iPage < cPages; iPage++)
2179 gmmR0AllocatePage(pChunk, hGVM, &paPages[iPage]);
2180
2181 gmmR0LinkChunk(pChunk, pSet);
2182 return iPage;
2183}
2184
2185
2186/**
2187 * Registers a new chunk of memory.
2188 *
2189 * This is called by gmmR0AllocateChunkNew, GMMR0AllocateLargePage and GMMR0SeedChunk.
2190 *
2191 * @returns VBox status code.  On success, the giant GMM lock will be held and
2192 *          the caller must release it (ugly).
2193 * @param pGMM Pointer to the GMM instance.
2194 * @param pSet Pointer to the set.
2195 * @param hMemObj The memory object for the chunk.
2196 * @param hGVM The affinity of the chunk. NIL_GVM_HANDLE for no
2197 * affinity.
2198 * @param fChunkFlags The chunk flags, GMM_CHUNK_FLAGS_XXX.
2199 * @param ppChunk Chunk address (out). Optional.
2200 *
2201 * @remarks The caller must not own the giant GMM mutex.
2202 * The giant GMM mutex will be acquired and returned acquired in
2203 * the success path. On failure, no locks will be held.
2204 */
2205static int gmmR0RegisterChunk(PGMM pGMM, PGMMCHUNKFREESET pSet, RTR0MEMOBJ hMemObj, uint16_t hGVM, uint16_t fChunkFlags,
2206 PGMMCHUNK *ppChunk)
2207{
2208 Assert(pGMM->hMtxOwner != RTThreadNativeSelf());
2209 Assert(hGVM != NIL_GVM_HANDLE || pGMM->fBoundMemoryMode);
2210#ifdef GMM_WITH_LEGACY_MODE
2211 Assert(fChunkFlags == 0 || fChunkFlags == GMM_CHUNK_FLAGS_LARGE_PAGE || fChunkFlags == GMM_CHUNK_FLAGS_SEEDED);
2212#else
2213 Assert(fChunkFlags == 0 || fChunkFlags == GMM_CHUNK_FLAGS_LARGE_PAGE);
2214#endif
2215
2216#if defined(VBOX_WITH_RAM_IN_KERNEL) && !defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)
2217 /*
2218 * Get a ring-0 mapping of the object.
2219 */
2220# ifdef GMM_WITH_LEGACY_MODE
2221 uint8_t *pbMapping = !(fChunkFlags & GMM_CHUNK_FLAGS_SEEDED) ? (uint8_t *)RTR0MemObjAddress(hMemObj) : NULL;
2222# else
2223 uint8_t *pbMapping = (uint8_t *)RTR0MemObjAddress(hMemObj);
2224# endif
2225 if (!pbMapping)
2226 {
2227 RTR0MEMOBJ hMapObj;
2228 int rc = RTR0MemObjMapKernel(&hMapObj, hMemObj, (void *)-1, 0, RTMEM_PROT_READ | RTMEM_PROT_WRITE);
2229 if (RT_SUCCESS(rc))
2230 pbMapping = (uint8_t *)RTR0MemObjAddress(hMapObj);
2231 else
2232 return rc;
2233 AssertPtr(pbMapping);
2234 }
2235#endif
2236
2237 /*
2238 * Allocate a chunk.
2239 */
2240 int rc;
2241 PGMMCHUNK pChunk = (PGMMCHUNK)RTMemAllocZ(sizeof(*pChunk));
2242 if (pChunk)
2243 {
2244 /*
2245 * Initialize it.
2246 */
2247 pChunk->hMemObj = hMemObj;
2248#if defined(VBOX_WITH_RAM_IN_KERNEL) && !defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)
2249 pChunk->pbMapping = pbMapping;
2250#endif
2251 pChunk->cFree = GMM_CHUNK_NUM_PAGES;
2252 pChunk->hGVM = hGVM;
2253 /*pChunk->iFreeHead = 0;*/
2254 pChunk->idNumaNode = gmmR0GetCurrentNumaNodeId();
2255 pChunk->iChunkMtx = UINT8_MAX;
2256 pChunk->fFlags = fChunkFlags;
2257 for (unsigned iPage = 0; iPage < RT_ELEMENTS(pChunk->aPages) - 1; iPage++)
2258 {
2259 pChunk->aPages[iPage].Free.u2State = GMM_PAGE_STATE_FREE;
2260 pChunk->aPages[iPage].Free.iNext = iPage + 1;
2261 }
2262 pChunk->aPages[RT_ELEMENTS(pChunk->aPages) - 1].Free.u2State = GMM_PAGE_STATE_FREE;
2263 pChunk->aPages[RT_ELEMENTS(pChunk->aPages) - 1].Free.iNext = UINT16_MAX;
2264
2265 /*
2266 * Allocate a Chunk ID and insert it into the tree.
2267 * This has to be done behind the mutex of course.
2268 */
2269 rc = gmmR0MutexAcquire(pGMM);
2270 if (RT_SUCCESS(rc))
2271 {
2272 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2273 {
2274 pChunk->Core.Key = gmmR0AllocateChunkId(pGMM);
2275 if ( pChunk->Core.Key != NIL_GMM_CHUNKID
2276 && pChunk->Core.Key <= GMM_CHUNKID_LAST)
2277 {
2278 RTSpinlockAcquire(pGMM->hSpinLockTree);
2279 if (RTAvlU32Insert(&pGMM->pChunks, &pChunk->Core))
2280 {
2281 pGMM->cChunks++;
2282 RTListAppend(&pGMM->ChunkList, &pChunk->ListNode);
2283 RTSpinlockRelease(pGMM->hSpinLockTree);
2284
2285 gmmR0LinkChunk(pChunk, pSet);
2286
2287 LogFlow(("gmmR0RegisterChunk: pChunk=%p id=%#x cChunks=%d\n", pChunk, pChunk->Core.Key, pGMM->cChunks));
2288
2289 if (ppChunk)
2290 *ppChunk = pChunk;
2291 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
2292 return VINF_SUCCESS;
2293 }
2294 RTSpinlockRelease(pGMM->hSpinLockTree);
2295 }
2296
2297 /* bail out */
2298 rc = VERR_GMM_CHUNK_INSERT;
2299 }
2300 else
2301 rc = VERR_GMM_IS_NOT_SANE;
2302 gmmR0MutexRelease(pGMM);
2303 }
2304
2305 RTMemFree(pChunk);
2306 }
2307 else
2308 rc = VERR_NO_MEMORY;
2309 return rc;
2310}
2311
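/*
 * Illustration only: the calling pattern gmmR0RegisterChunk expects, as used
 * by gmmR0AllocateChunkNew and GMMR0AllocateLargePage below.  The giant mutex
 * is dropped around the (potentially slow) host allocation and is returned
 * held on success:
 *
 * @code
 *      gmmR0MutexRelease(pGMM);                // must not own the giant lock here
 *      rc = RTR0MemObjAllocPage(&hMemObj, GMM_CHUNK_SIZE, false);
 *      if (RT_SUCCESS(rc))
 *          rc = gmmR0RegisterChunk(pGMM, pSet, hMemObj, pGVM->hSelf, 0, &pChunk);
 *      // on success the giant lock is held again and the caller must release it
 * @endcode
 */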
2312
2313/**
2314 * Allocates a new chunk, immediately picks the requested pages from it, and
2315 * adds what's remaining to the specified free set.
2316 *
2317 * @note This will leave the giant mutex while allocating the new chunk!
2318 *
2319 * @returns VBox status code.
2320 * @param pGMM Pointer to the GMM instance data.
2321 * @param pGVM      Pointer to the kernel-only VM instance data.
2322 * @param pSet Pointer to the free set.
2323 * @param cPages The number of pages requested.
2324 * @param paPages The page descriptor table (input + output).
2325 * @param piPage The pointer to the page descriptor table index variable.
2326 * This will be updated.
2327 */
2328static int gmmR0AllocateChunkNew(PGMM pGMM, PGVM pGVM, PGMMCHUNKFREESET pSet, uint32_t cPages,
2329 PGMMPAGEDESC paPages, uint32_t *piPage)
2330{
2331 gmmR0MutexRelease(pGMM);
2332
2333 RTR0MEMOBJ hMemObj;
2334#ifndef GMM_WITH_LEGACY_MODE
2335 int rc;
2336# ifdef VBOX_WITH_LINEAR_HOST_PHYS_MEM
2337 if (pGMM->fHasWorkingAllocPhysNC)
2338 rc = RTR0MemObjAllocPhysNC(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS);
2339 else
2340# endif
2341 rc = RTR0MemObjAllocPage(&hMemObj, GMM_CHUNK_SIZE, false /*fExecutable*/);
2342#else
2343 int rc = RTR0MemObjAllocPhysNC(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS);
2344#endif
2345 if (RT_SUCCESS(rc))
2346 {
2347 /** @todo Duplicate gmmR0RegisterChunk here so we can avoid chaining up the
2348 * free pages first and then unchaining them right afterwards. Instead
2349 * do as much work as possible without holding the giant lock. */
2350 PGMMCHUNK pChunk;
2351 rc = gmmR0RegisterChunk(pGMM, pSet, hMemObj, pGVM->hSelf, 0 /*fChunkFlags*/, &pChunk);
2352 if (RT_SUCCESS(rc))
2353 {
2354 *piPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, *piPage, cPages, paPages);
2355 return VINF_SUCCESS;
2356 }
2357
2358 /* bail out */
2359 RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
2360 }
2361
2362 int rc2 = gmmR0MutexAcquire(pGMM);
2363 AssertRCReturn(rc2, RT_FAILURE(rc) ? rc : rc2);
2364 return rc;
2365
2366}
2367
2368
2369/**
2370 * As a last resort we'll pick any page we can get.
2371 *
2372 * @returns The new page descriptor table index.
2373 * @param pSet The set to pick from.
2374 * @param pGVM Pointer to the global VM structure.
2375 * @param iPage The current page descriptor table index.
2376 * @param cPages The total number of pages to allocate.
2377 * @param paPages   The page descriptor table (input + output).
2378 */
2379static uint32_t gmmR0AllocatePagesIndiscriminately(PGMMCHUNKFREESET pSet, PGVM pGVM,
2380 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2381{
2382 unsigned iList = RT_ELEMENTS(pSet->apLists);
2383 while (iList-- > 0)
2384 {
2385 PGMMCHUNK pChunk = pSet->apLists[iList];
2386 while (pChunk)
2387 {
2388 PGMMCHUNK pNext = pChunk->pFreeNext;
2389
2390 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2391 if (iPage >= cPages)
2392 return iPage;
2393
2394 pChunk = pNext;
2395 }
2396 }
2397 return iPage;
2398}
2399
2400
2401/**
2402 * Pick pages from empty chunks on the same NUMA node.
2403 *
2404 * @returns The new page descriptor table index.
2405 * @param pSet The set to pick from.
2406 * @param pGVM Pointer to the global VM structure.
2407 * @param iPage The current page descriptor table index.
2408 * @param cPages The total number of pages to allocate.
2409 * @param paPages   The page descriptor table (input + output).
2410 */
2411static uint32_t gmmR0AllocatePagesFromEmptyChunksOnSameNode(PGMMCHUNKFREESET pSet, PGVM pGVM,
2412 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2413{
2414 PGMMCHUNK pChunk = pSet->apLists[GMM_CHUNK_FREE_SET_UNUSED_LIST];
2415 if (pChunk)
2416 {
2417 uint16_t const idNumaNode = gmmR0GetCurrentNumaNodeId();
2418 while (pChunk)
2419 {
2420 PGMMCHUNK pNext = pChunk->pFreeNext;
2421
2422 if (pChunk->idNumaNode == idNumaNode)
2423 {
2424 pChunk->hGVM = pGVM->hSelf;
2425 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2426 if (iPage >= cPages)
2427 {
2428 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2429 return iPage;
2430 }
2431 }
2432
2433 pChunk = pNext;
2434 }
2435 }
2436 return iPage;
2437}
2438
2439
2440/**
2441 * Pick pages from non-empty chunks on the same NUMA node.
2442 *
2443 * @returns The new page descriptor table index.
2444 * @param pSet The set to pick from.
2445 * @param pGVM Pointer to the global VM structure.
2446 * @param iPage The current page descriptor table index.
2447 * @param cPages The total number of pages to allocate.
2448 * @param paPages   The page descriptor table (input + output).
2449 */
2450static uint32_t gmmR0AllocatePagesFromSameNode(PGMMCHUNKFREESET pSet, PGVM pGVM,
2451 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2452{
2453 /** @todo start by picking from chunks with about the right size first? */
2454 uint16_t const idNumaNode = gmmR0GetCurrentNumaNodeId();
2455 unsigned iList = GMM_CHUNK_FREE_SET_UNUSED_LIST;
2456 while (iList-- > 0)
2457 {
2458 PGMMCHUNK pChunk = pSet->apLists[iList];
2459 while (pChunk)
2460 {
2461 PGMMCHUNK pNext = pChunk->pFreeNext;
2462
2463 if (pChunk->idNumaNode == idNumaNode)
2464 {
2465 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2466 if (iPage >= cPages)
2467 {
2468 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2469 return iPage;
2470 }
2471 }
2472
2473 pChunk = pNext;
2474 }
2475 }
2476 return iPage;
2477}
2478
2479
2480/**
2481 * Pick pages that are in chunks already associated with the VM.
2482 *
2483 * @returns The new page descriptor table index.
2484 * @param pGMM Pointer to the GMM instance data.
2485 * @param pGVM Pointer to the global VM structure.
2486 * @param pSet The set to pick from.
2487 * @param iPage The current page descriptor table index.
2488 * @param cPages The total number of pages to allocate.
2489 * @param paPages   The page descriptor table (input + output).
2490 */
2491static uint32_t gmmR0AllocatePagesAssociatedWithVM(PGMM pGMM, PGVM pGVM, PGMMCHUNKFREESET pSet,
2492 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2493{
2494 uint16_t const hGVM = pGVM->hSelf;
2495
2496 /* Hint. */
2497 if (pGVM->gmm.s.idLastChunkHint != NIL_GMM_CHUNKID)
2498 {
2499 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pGVM->gmm.s.idLastChunkHint);
2500 if (pChunk && pChunk->cFree)
2501 {
2502 iPage = gmmR0AllocatePagesFromChunk(pChunk, hGVM, iPage, cPages, paPages);
2503 if (iPage >= cPages)
2504 return iPage;
2505 }
2506 }
2507
2508 /* Scan. */
2509 for (unsigned iList = 0; iList < RT_ELEMENTS(pSet->apLists); iList++)
2510 {
2511 PGMMCHUNK pChunk = pSet->apLists[iList];
2512 while (pChunk)
2513 {
2514 PGMMCHUNK pNext = pChunk->pFreeNext;
2515
2516 if (pChunk->hGVM == hGVM)
2517 {
2518 iPage = gmmR0AllocatePagesFromChunk(pChunk, hGVM, iPage, cPages, paPages);
2519 if (iPage >= cPages)
2520 {
2521 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2522 return iPage;
2523 }
2524 }
2525
2526 pChunk = pNext;
2527 }
2528 }
2529 return iPage;
2530}
2531
2532
2533
2534/**
2535 * Pick pages in bound memory mode.
2536 *
2537 * @returns The new page descriptor table index.
2538 * @param pGVM Pointer to the global VM structure.
2539 * @param iPage The current page descriptor table index.
2540 * @param cPages The total number of pages to allocate.
2541 * @param paPages   The page descriptor table (input + output).
2542 */
2543static uint32_t gmmR0AllocatePagesInBoundMode(PGVM pGVM, uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2544{
2545 for (unsigned iList = 0; iList < RT_ELEMENTS(pGVM->gmm.s.Private.apLists); iList++)
2546 {
2547 PGMMCHUNK pChunk = pGVM->gmm.s.Private.apLists[iList];
2548 while (pChunk)
2549 {
2550 Assert(pChunk->hGVM == pGVM->hSelf);
2551 PGMMCHUNK pNext = pChunk->pFreeNext;
2552 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2553 if (iPage >= cPages)
2554 return iPage;
2555 pChunk = pNext;
2556 }
2557 }
2558 return iPage;
2559}
2560
2561
2562/**
2563 * Checks if we should start picking pages from chunks of other VMs because
2564 * we're getting close to the system memory or reserved limit.
2565 *
2566 * @returns @c true if we should, @c false if we should first try allocate more
2567 * chunks.
2568 */
2569static bool gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLimits(PGVM pGVM)
2570{
2571 /*
2572     * Don't allocate a new chunk if we're about to hit the VM's reservation limit anyway.
2573     */
2574 uint64_t cPgReserved = pGVM->gmm.s.Stats.Reserved.cBasePages
2575 + pGVM->gmm.s.Stats.Reserved.cFixedPages
2576 - pGVM->gmm.s.Stats.cBalloonedPages
2577 /** @todo what about shared pages? */;
2578 uint64_t cPgAllocated = pGVM->gmm.s.Stats.Allocated.cBasePages
2579 + pGVM->gmm.s.Stats.Allocated.cFixedPages;
2580 uint64_t cPgDelta = cPgReserved - cPgAllocated;
2581 if (cPgDelta < GMM_CHUNK_NUM_PAGES * 4)
2582 return true;
2583 /** @todo make the threshold configurable, also test the code to see if
2584 * this ever kicks in (we might be reserving too much or smth). */
2585
2586 /*
2587     * Check how close we are to the max memory limit and how many fragments
2588     * there are...
2589 */
2590 /** @todo */
2591
2592 return false;
2593}
2594
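/*
 * Illustration only (hypothetical numbers): with 2 MB chunks and 4 KB pages
 * GMM_CHUNK_NUM_PAGES is 512, so the check above fires when fewer than
 * 4 * 512 = 2048 pages (8 MB) of the reservation remain unallocated.  E.g.
 * Reserved.cBasePages=0x40000, Reserved.cFixedPages=0x400, nothing ballooned
 * and Allocated totals of 0x40000 leave a delta of 0x400 (1024) pages, which
 * is below the threshold, so we prefer raiding other chunks over allocating
 * a new one.
 */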
2595
2596/**
2597 * Checks if we should start picking pages from chunks of other VMs because
2598 * there are a lot of free pages around.
2599 *
2600 * @returns @c true if we should, @c false if we should first try allocate more
2601 * chunks.
2602 */
2603static bool gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLotsFree(PGMM pGMM)
2604{
2605 /*
2606 * Setting the limit at 16 chunks (32 MB) at the moment.
2607 */
2608 if (pGMM->PrivateX.cFreePages >= GMM_CHUNK_NUM_PAGES * 16)
2609 return true;
2610 return false;
2611}
2612
2613
2614/**
2615 * Common worker for GMMR0AllocateHandyPages and GMMR0AllocatePages.
2616 *
2617 * @returns VBox status code:
2618 * @retval VINF_SUCCESS on success.
2619 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk or
2620 * gmmR0AllocateMoreChunks is necessary.
2621 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2622 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2623 * that is we're trying to allocate more than we've reserved.
2624 *
2625 * @param pGMM Pointer to the GMM instance data.
2626 * @param pGVM Pointer to the VM.
2627 * @param cPages The number of pages to allocate.
2628 * @param paPages Pointer to the page descriptors. See GMMPAGEDESC for
2629 * details on what is expected on input.
2630 * @param enmAccount The account to charge.
2631 *
2632 * @remarks Caller owns the giant GMM lock.
2633 */
2634static int gmmR0AllocatePagesNew(PGMM pGMM, PGVM pGVM, uint32_t cPages, PGMMPAGEDESC paPages, GMMACCOUNT enmAccount)
2635{
2636 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
2637
2638 /*
2639 * Check allocation limits.
2640 */
2641 if (RT_UNLIKELY(pGMM->cAllocatedPages + cPages > pGMM->cMaxPages))
2642 return VERR_GMM_HIT_GLOBAL_LIMIT;
2643
2644 switch (enmAccount)
2645 {
2646 case GMMACCOUNT_BASE:
2647 if (RT_UNLIKELY( pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cPages
2648 > pGVM->gmm.s.Stats.Reserved.cBasePages))
2649 {
2650 Log(("gmmR0AllocatePages:Base: Reserved=%#llx Allocated+Ballooned+Requested=%#llx+%#llx+%#x!\n",
2651 pGVM->gmm.s.Stats.Reserved.cBasePages, pGVM->gmm.s.Stats.Allocated.cBasePages,
2652 pGVM->gmm.s.Stats.cBalloonedPages, cPages));
2653 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2654 }
2655 break;
2656 case GMMACCOUNT_SHADOW:
2657 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cShadowPages + cPages > pGVM->gmm.s.Stats.Reserved.cShadowPages))
2658 {
2659 Log(("gmmR0AllocatePages:Shadow: Reserved=%#x Allocated+Requested=%#x+%#x!\n",
2660 pGVM->gmm.s.Stats.Reserved.cShadowPages, pGVM->gmm.s.Stats.Allocated.cShadowPages, cPages));
2661 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2662 }
2663 break;
2664 case GMMACCOUNT_FIXED:
2665 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cFixedPages + cPages > pGVM->gmm.s.Stats.Reserved.cFixedPages))
2666 {
2667 Log(("gmmR0AllocatePages:Fixed: Reserved=%#x Allocated+Requested=%#x+%#x!\n",
2668 pGVM->gmm.s.Stats.Reserved.cFixedPages, pGVM->gmm.s.Stats.Allocated.cFixedPages, cPages));
2669 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2670 }
2671 break;
2672 default:
2673 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2674 }
2675
2676#ifdef GMM_WITH_LEGACY_MODE
2677 /*
2678 * If we're in legacy memory mode, it's easy to figure if we have
2679 * sufficient number of pages up-front.
2680     * a sufficient number of pages up-front.
2681 if ( pGMM->fLegacyAllocationMode
2682 && pGVM->gmm.s.Private.cFreePages < cPages)
2683 {
2684 Assert(pGMM->fBoundMemoryMode);
2685 return VERR_GMM_SEED_ME;
2686 }
2687#endif
2688
2689 /*
2690 * Update the accounts before we proceed because we might be leaving the
2691 * protection of the global mutex and thus run the risk of permitting
2692 * too much memory to be allocated.
2693 */
2694 switch (enmAccount)
2695 {
2696 case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages += cPages; break;
2697 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages += cPages; break;
2698 case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages += cPages; break;
2699 default: AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2700 }
2701 pGVM->gmm.s.Stats.cPrivatePages += cPages;
2702 pGMM->cAllocatedPages += cPages;
2703
2704#ifdef GMM_WITH_LEGACY_MODE
2705 /*
2706 * Part two of it's-easy-in-legacy-memory-mode.
2707 */
2708 if (pGMM->fLegacyAllocationMode)
2709 {
2710 uint32_t iPage = gmmR0AllocatePagesInBoundMode(pGVM, 0, cPages, paPages);
2711 AssertReleaseReturn(iPage == cPages, VERR_GMM_ALLOC_PAGES_IPE);
2712 return VINF_SUCCESS;
2713 }
2714#endif
2715
2716 /*
2717 * Bound mode is also relatively straightforward.
2718 */
2719 uint32_t iPage = 0;
2720 int rc = VINF_SUCCESS;
2721 if (pGMM->fBoundMemoryMode)
2722 {
2723 iPage = gmmR0AllocatePagesInBoundMode(pGVM, iPage, cPages, paPages);
2724 if (iPage < cPages)
2725 do
2726 rc = gmmR0AllocateChunkNew(pGMM, pGVM, &pGVM->gmm.s.Private, cPages, paPages, &iPage);
2727 while (iPage < cPages && RT_SUCCESS(rc));
2728 }
2729 /*
2730     * Shared mode is trickier as we should try to achieve the same locality as
2731 * in bound mode, but smartly make use of non-full chunks allocated by
2732 * other VMs if we're low on memory.
2733 */
2734 else
2735 {
2736 /* Pick the most optimal pages first. */
2737 iPage = gmmR0AllocatePagesAssociatedWithVM(pGMM, pGVM, &pGMM->PrivateX, iPage, cPages, paPages);
2738 if (iPage < cPages)
2739 {
2740 /* Maybe we should try getting pages from chunks "belonging" to
2741 other VMs before allocating more chunks? */
2742 bool fTriedOnSameAlready = false;
2743 if (gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLimits(pGVM))
2744 {
2745 iPage = gmmR0AllocatePagesFromSameNode(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2746 fTriedOnSameAlready = true;
2747 }
2748
2749 /* Allocate memory from empty chunks. */
2750 if (iPage < cPages)
2751 iPage = gmmR0AllocatePagesFromEmptyChunksOnSameNode(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2752
2753 /* Grab empty shared chunks. */
2754 if (iPage < cPages)
2755 iPage = gmmR0AllocatePagesFromEmptyChunksOnSameNode(&pGMM->Shared, pGVM, iPage, cPages, paPages);
2756
2757            /* If there are a lot of free pages spread around, try not to waste
2758               system memory on more chunks. (Should trigger defragmentation.) */
2759 if ( !fTriedOnSameAlready
2760 && gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLotsFree(pGMM))
2761 {
2762 iPage = gmmR0AllocatePagesFromSameNode(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2763 if (iPage < cPages)
2764 iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2765 }
2766
2767 /*
2768              * Ok, try to allocate new chunks.
2769 */
2770 if (iPage < cPages)
2771 {
2772 do
2773 rc = gmmR0AllocateChunkNew(pGMM, pGVM, &pGMM->PrivateX, cPages, paPages, &iPage);
2774 while (iPage < cPages && RT_SUCCESS(rc));
2775
2776 /* If the host is out of memory, take whatever we can get. */
2777 if ( (rc == VERR_NO_MEMORY || rc == VERR_NO_PHYS_MEMORY)
2778 && pGMM->PrivateX.cFreePages + pGMM->Shared.cFreePages >= cPages - iPage)
2779 {
2780 iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2781 if (iPage < cPages)
2782 iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->Shared, pGVM, iPage, cPages, paPages);
2783 AssertRelease(iPage == cPages);
2784 rc = VINF_SUCCESS;
2785 }
2786 }
2787 }
2788 }
2789
2790 /*
2791 * Clean up on failure. Since this is bound to be a low-memory condition
2792 * we will give back any empty chunks that might be hanging around.
2793 */
2794 if (RT_FAILURE(rc))
2795 {
2796 /* Update the statistics. */
2797 pGVM->gmm.s.Stats.cPrivatePages -= cPages;
2798 pGMM->cAllocatedPages -= cPages - iPage;
2799 switch (enmAccount)
2800 {
2801 case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages -= cPages; break;
2802 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages -= cPages; break;
2803 case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages -= cPages; break;
2804 default: AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2805 }
2806
2807 /* Release the pages. */
2808 while (iPage-- > 0)
2809 {
2810 uint32_t idPage = paPages[iPage].idPage;
2811 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
2812 if (RT_LIKELY(pPage))
2813 {
2814 Assert(GMM_PAGE_IS_PRIVATE(pPage));
2815 Assert(pPage->Private.hGVM == pGVM->hSelf);
2816 gmmR0FreePrivatePage(pGMM, pGVM, idPage, pPage);
2817 }
2818 else
2819 AssertMsgFailed(("idPage=%#x\n", idPage));
2820
2821 paPages[iPage].idPage = NIL_GMM_PAGEID;
2822 paPages[iPage].idSharedPage = NIL_GMM_PAGEID;
2823 paPages[iPage].HCPhysGCPhys = NIL_RTHCPHYS;
2824 }
2825
2826 /* Free empty chunks. */
2827 /** @todo */
2828
2829 /* return the fail status on failure */
2830 return rc;
2831 }
2832 return VINF_SUCCESS;
2833}
2834
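/*
 * Readability note (summary of gmmR0AllocatePagesNew above; the code is
 * authoritative).  In shared mode the fallback order is roughly:
 *      1. chunks already associated with this VM (hint, then scan),
 *      2. if close to the reservation limit: chunks on the same NUMA node,
 *      3. empty chunks on the same NUMA node (private set, then shared set),
 *      4. if lots of pages are free anyway: same NUMA node, then anything,
 *      5. brand new chunks from the host,
 *      6. as a last resort when the host is out of memory: any free page.
 */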
2835
2836/**
2837 * Updates the previous allocations and allocates more pages.
2838 *
2839 * The handy pages are always taken from the 'base' memory account.
2840 * The allocated pages are not cleared and will contain random garbage.
2841 *
2842 * @returns VBox status code:
2843 * @retval VINF_SUCCESS on success.
2844 * @retval VERR_NOT_OWNER if the caller is not an EMT.
2845 * @retval VERR_GMM_PAGE_NOT_FOUND if one of the pages to update wasn't found.
2846 * @retval VERR_GMM_PAGE_NOT_PRIVATE if one of the pages to update wasn't a
2847 * private page.
2848 * @retval VERR_GMM_PAGE_NOT_SHARED if one of the pages to update wasn't a
2849 * shared page.
2850 * @retval VERR_GMM_NOT_PAGE_OWNER if one of the pages to be updated wasn't
2851 * owned by the VM.
2852 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is necessary.
2853 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2854 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2855 * that is we're trying to allocate more than we've reserved.
2856 *
2857 * @param pGVM The global (ring-0) VM structure.
2858 * @param idCpu The VCPU id.
2859 * @param cPagesToUpdate The number of pages to update (starting from the head).
2860 * @param cPagesToAlloc The number of pages to allocate (starting from the head).
2861 * @param paPages The array of page descriptors.
2862 * See GMMPAGEDESC for details on what is expected on input.
2863 * @thread EMT(idCpu)
2864 */
2865GMMR0DECL(int) GMMR0AllocateHandyPages(PGVM pGVM, VMCPUID idCpu, uint32_t cPagesToUpdate,
2866 uint32_t cPagesToAlloc, PGMMPAGEDESC paPages)
2867{
2868 LogFlow(("GMMR0AllocateHandyPages: pGVM=%p cPagesToUpdate=%#x cPagesToAlloc=%#x paPages=%p\n",
2869 pGVM, cPagesToUpdate, cPagesToAlloc, paPages));
2870
2871 /*
2872 * Validate, get basics and take the semaphore.
2873 * (This is a relatively busy path, so make predictions where possible.)
2874 */
2875 PGMM pGMM;
2876 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
2877 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
2878 if (RT_FAILURE(rc))
2879 return rc;
2880
2881 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
2882 AssertMsgReturn( (cPagesToUpdate && cPagesToUpdate < 1024)
2883 || (cPagesToAlloc && cPagesToAlloc < 1024),
2884 ("cPagesToUpdate=%#x cPagesToAlloc=%#x\n", cPagesToUpdate, cPagesToAlloc),
2885 VERR_INVALID_PARAMETER);
2886
2887 unsigned iPage = 0;
2888 for (; iPage < cPagesToUpdate; iPage++)
2889 {
2890 AssertMsgReturn( ( paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST
2891 && !(paPages[iPage].HCPhysGCPhys & PAGE_OFFSET_MASK))
2892 || paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS
2893 || paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE,
2894 ("#%#x: %RHp\n", iPage, paPages[iPage].HCPhysGCPhys),
2895 VERR_INVALID_PARAMETER);
2896 AssertMsgReturn( paPages[iPage].idPage <= GMM_PAGEID_LAST
2897 /*|| paPages[iPage].idPage == NIL_GMM_PAGEID*/,
2898 ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2899        AssertMsgReturn( paPages[iPage].idSharedPage <= GMM_PAGEID_LAST
2900 /*|| paPages[iPage].idSharedPage == NIL_GMM_PAGEID*/,
2901 ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2902 }
2903
2904 for (; iPage < cPagesToAlloc; iPage++)
2905 {
2906 AssertMsgReturn(paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS, ("#%#x: %RHp\n", iPage, paPages[iPage].HCPhysGCPhys), VERR_INVALID_PARAMETER);
2907 AssertMsgReturn(paPages[iPage].idPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2908 AssertMsgReturn(paPages[iPage].idSharedPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2909 }
2910
2911 gmmR0MutexAcquire(pGMM);
2912 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2913 {
2914 /* No allocations before the initial reservation has been made! */
2915 if (RT_LIKELY( pGVM->gmm.s.Stats.Reserved.cBasePages
2916 && pGVM->gmm.s.Stats.Reserved.cFixedPages
2917 && pGVM->gmm.s.Stats.Reserved.cShadowPages))
2918 {
2919 /*
2920 * Perform the updates.
2921 * Stop on the first error.
2922 */
2923 for (iPage = 0; iPage < cPagesToUpdate; iPage++)
2924 {
2925 if (paPages[iPage].idPage != NIL_GMM_PAGEID)
2926 {
2927 PGMMPAGE pPage = gmmR0GetPage(pGMM, paPages[iPage].idPage);
2928 if (RT_LIKELY(pPage))
2929 {
2930 if (RT_LIKELY(GMM_PAGE_IS_PRIVATE(pPage)))
2931 {
2932 if (RT_LIKELY(pPage->Private.hGVM == pGVM->hSelf))
2933 {
2934 AssertCompile(NIL_RTHCPHYS > GMM_GCPHYS_LAST && GMM_GCPHYS_UNSHAREABLE > GMM_GCPHYS_LAST);
2935 if (RT_LIKELY(paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST))
2936 pPage->Private.pfn = paPages[iPage].HCPhysGCPhys >> PAGE_SHIFT;
2937 else if (paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE)
2938 pPage->Private.pfn = GMM_PAGE_PFN_UNSHAREABLE;
2939 /* else: NIL_RTHCPHYS nothing */
2940
2941 paPages[iPage].idPage = NIL_GMM_PAGEID;
2942 paPages[iPage].HCPhysGCPhys = NIL_RTHCPHYS;
2943 }
2944 else
2945 {
2946 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not owner! hGVM=%#x hSelf=%#x\n",
2947 iPage, paPages[iPage].idPage, pPage->Private.hGVM, pGVM->hSelf));
2948 rc = VERR_GMM_NOT_PAGE_OWNER;
2949 break;
2950 }
2951 }
2952 else
2953 {
2954 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not private! %.*Rhxs (type %d)\n", iPage, paPages[iPage].idPage, sizeof(*pPage), pPage, pPage->Common.u2State));
2955 rc = VERR_GMM_PAGE_NOT_PRIVATE;
2956 break;
2957 }
2958 }
2959 else
2960 {
2961 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not found! (private)\n", iPage, paPages[iPage].idPage));
2962 rc = VERR_GMM_PAGE_NOT_FOUND;
2963 break;
2964 }
2965 }
2966
2967 if (paPages[iPage].idSharedPage != NIL_GMM_PAGEID)
2968 {
2969 PGMMPAGE pPage = gmmR0GetPage(pGMM, paPages[iPage].idSharedPage);
2970 if (RT_LIKELY(pPage))
2971 {
2972 if (RT_LIKELY(GMM_PAGE_IS_SHARED(pPage)))
2973 {
2974 AssertCompile(NIL_RTHCPHYS > GMM_GCPHYS_LAST && GMM_GCPHYS_UNSHAREABLE > GMM_GCPHYS_LAST);
2975 Assert(pPage->Shared.cRefs);
2976 Assert(pGVM->gmm.s.Stats.cSharedPages);
2977 Assert(pGVM->gmm.s.Stats.Allocated.cBasePages);
2978
2979 Log(("GMMR0AllocateHandyPages: free shared page %x cRefs=%d\n", paPages[iPage].idSharedPage, pPage->Shared.cRefs));
2980 pGVM->gmm.s.Stats.cSharedPages--;
2981 pGVM->gmm.s.Stats.Allocated.cBasePages--;
2982 if (!--pPage->Shared.cRefs)
2983 gmmR0FreeSharedPage(pGMM, pGVM, paPages[iPage].idSharedPage, pPage);
2984 else
2985 {
2986 Assert(pGMM->cDuplicatePages);
2987 pGMM->cDuplicatePages--;
2988 }
2989
2990 paPages[iPage].idSharedPage = NIL_GMM_PAGEID;
2991 }
2992 else
2993 {
2994 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not shared!\n", iPage, paPages[iPage].idSharedPage));
2995 rc = VERR_GMM_PAGE_NOT_SHARED;
2996 break;
2997 }
2998 }
2999 else
3000 {
3001 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not found! (shared)\n", iPage, paPages[iPage].idSharedPage));
3002 rc = VERR_GMM_PAGE_NOT_FOUND;
3003 break;
3004 }
3005 }
3006 } /* for each page to update */
3007
3008 if (RT_SUCCESS(rc) && cPagesToAlloc > 0)
3009 {
3010#if defined(VBOX_STRICT) && 0 /** @todo re-test this later. Appeared to be a PGM init bug. */
3011 for (iPage = 0; iPage < cPagesToAlloc; iPage++)
3012 {
3013 Assert(paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS);
3014 Assert(paPages[iPage].idPage == NIL_GMM_PAGEID);
3015 Assert(paPages[iPage].idSharedPage == NIL_GMM_PAGEID);
3016 }
3017#endif
3018
3019 /*
3020 * Join paths with GMMR0AllocatePages for the allocation.
3021                 * Note! gmmR0AllocateChunkNew may leave the protection of the mutex!
3022 */
3023 rc = gmmR0AllocatePagesNew(pGMM, pGVM, cPagesToAlloc, paPages, GMMACCOUNT_BASE);
3024 }
3025 }
3026 else
3027 rc = VERR_WRONG_ORDER;
3028 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3029 }
3030 else
3031 rc = VERR_GMM_IS_NOT_SANE;
3032 gmmR0MutexRelease(pGMM);
3033 LogFlow(("GMMR0AllocateHandyPages: returns %Rrc\n", rc));
3034 return rc;
3035}
3036
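/*
 * Illustration only: the per-descriptor conventions enforced by the input
 * validation in GMMR0AllocateHandyPages above (hypothetical values; ring-3
 * normally prepares the array).  An entry that is only to be allocated must
 * be fully NILed out:
 *
 * @code
 *      paPages[i].idPage       = NIL_GMM_PAGEID;
 *      paPages[i].idSharedPage = NIL_GMM_PAGEID;
 *      paPages[i].HCPhysGCPhys = NIL_RTHCPHYS;
 * @endcode
 *
 * An entry to be updated carries the ID of an existing private page in
 * idPage and the new guest physical address in HCPhysGCPhys (page aligned
 * and <= GMM_GCPHYS_LAST, or NIL_RTHCPHYS / GMM_GCPHYS_UNSHAREABLE).  On
 * return, allocated entries have idPage and HCPhysGCPhys filled in by
 * gmmR0AllocatePage.
 */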
3037
3038/**
3039 * Allocate one or more pages.
3040 *
3041 * This is typically used for ROMs and MMIO2 (VRAM) during VM creation.
3042 * The allocated pages are not cleared and will contain random garbage.
3043 *
3044 * @returns VBox status code:
3045 * @retval VINF_SUCCESS on success.
3046 * @retval VERR_NOT_OWNER if the caller is not an EMT.
3047 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is necessary.
3048 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
3049 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
3050 * that is we're trying to allocate more than we've reserved.
3051 *
3052 * @param pGVM The global (ring-0) VM structure.
3053 * @param idCpu The VCPU id.
3054 * @param cPages The number of pages to allocate.
3055 * @param paPages Pointer to the page descriptors.
3056 * See GMMPAGEDESC for details on what is expected on
3057 * input.
3058 * @param enmAccount The account to charge.
3059 *
3060 * @thread EMT.
3061 */
3062GMMR0DECL(int) GMMR0AllocatePages(PGVM pGVM, VMCPUID idCpu, uint32_t cPages, PGMMPAGEDESC paPages, GMMACCOUNT enmAccount)
3063{
3064 LogFlow(("GMMR0AllocatePages: pGVM=%p cPages=%#x paPages=%p enmAccount=%d\n", pGVM, cPages, paPages, enmAccount));
3065
3066 /*
3067 * Validate, get basics and take the semaphore.
3068 */
3069 PGMM pGMM;
3070 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3071 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3072 if (RT_FAILURE(rc))
3073 return rc;
3074
3075 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
3076 AssertMsgReturn(enmAccount > GMMACCOUNT_INVALID && enmAccount < GMMACCOUNT_END, ("%d\n", enmAccount), VERR_INVALID_PARAMETER);
3077 AssertMsgReturn(cPages > 0 && cPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cPages), VERR_INVALID_PARAMETER);
3078
3079 for (unsigned iPage = 0; iPage < cPages; iPage++)
3080 {
3081 AssertMsgReturn( paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS
3082 || paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE
3083 || ( enmAccount == GMMACCOUNT_BASE
3084 && paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST
3085 && !(paPages[iPage].HCPhysGCPhys & PAGE_OFFSET_MASK)),
3086 ("#%#x: %RHp enmAccount=%d\n", iPage, paPages[iPage].HCPhysGCPhys, enmAccount),
3087 VERR_INVALID_PARAMETER);
3088 AssertMsgReturn(paPages[iPage].idPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
3089 AssertMsgReturn(paPages[iPage].idSharedPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
3090 }
3091
3092 gmmR0MutexAcquire(pGMM);
3093 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3094 {
3095
3096 /* No allocations before the initial reservation has been made! */
3097 if (RT_LIKELY( pGVM->gmm.s.Stats.Reserved.cBasePages
3098 && pGVM->gmm.s.Stats.Reserved.cFixedPages
3099 && pGVM->gmm.s.Stats.Reserved.cShadowPages))
3100 rc = gmmR0AllocatePagesNew(pGMM, pGVM, cPages, paPages, enmAccount);
3101 else
3102 rc = VERR_WRONG_ORDER;
3103 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3104 }
3105 else
3106 rc = VERR_GMM_IS_NOT_SANE;
3107 gmmR0MutexRelease(pGMM);
3108 LogFlow(("GMMR0AllocatePages: returns %Rrc\n", rc));
3109 return rc;
3110}
3111
3112
3113/**
3114 * VMMR0 request wrapper for GMMR0AllocatePages.
3115 *
3116 * @returns see GMMR0AllocatePages.
3117 * @param pGVM The global (ring-0) VM structure.
3118 * @param idCpu The VCPU id.
3119 * @param pReq Pointer to the request packet.
3120 */
3121GMMR0DECL(int) GMMR0AllocatePagesReq(PGVM pGVM, VMCPUID idCpu, PGMMALLOCATEPAGESREQ pReq)
3122{
3123 /*
3124 * Validate input and pass it on.
3125 */
3126 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3127 AssertMsgReturn(pReq->Hdr.cbReq >= RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[0]),
3128 ("%#x < %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[0])),
3129 VERR_INVALID_PARAMETER);
3130 AssertMsgReturn(pReq->Hdr.cbReq == RT_UOFFSETOF_DYN(GMMALLOCATEPAGESREQ, aPages[pReq->cPages]),
3131 ("%#x != %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF_DYN(GMMALLOCATEPAGESREQ, aPages[pReq->cPages])),
3132 VERR_INVALID_PARAMETER);
3133
3134 return GMMR0AllocatePages(pGVM, idCpu, pReq->cPages, &pReq->aPages[0], pReq->enmAccount);
3135}
3136
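/*
 * Illustration only: GMMALLOCATEPAGESREQ is variable sized, so a hypothetical
 * caller sizes it for cPages descriptors as sketched below.  (A real ring-3
 * caller also initializes the remaining request header fields and dispatches
 * the request through the VMMR0 request path.)
 *
 * @code
 *      uint32_t const       cbReq = RT_UOFFSETOF_DYN(GMMALLOCATEPAGESREQ, aPages[cPages]);
 *      PGMMALLOCATEPAGESREQ pReq  = (PGMMALLOCATEPAGESREQ)RTMemAllocZ(cbReq);
 *      pReq->Hdr.cbReq  = cbReq;
 *      pReq->cPages     = cPages;
 *      pReq->enmAccount = GMMACCOUNT_BASE;
 *      for (uint32_t i = 0; i < cPages; i++)
 *      {
 *          pReq->aPages[i].HCPhysGCPhys = NIL_RTHCPHYS;
 *          pReq->aPages[i].idPage       = NIL_GMM_PAGEID;
 *          pReq->aPages[i].idSharedPage = NIL_GMM_PAGEID;
 *      }
 * @endcode
 */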
3137
3138/**
3139 * Allocate a large page to represent guest RAM.
3140 *
3141 * The allocated pages are not cleared and will contain random garbage.
3142 *
3143 * @returns VBox status code:
3144 * @retval VINF_SUCCESS on success.
3145 * @retval VERR_NOT_OWNER if the caller is not an EMT.
3146 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is necessary.
3147 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
3148 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
3149 * that is we're trying to allocate more than we've reserved.
3150 * @returns see GMMR0AllocatePages.
3151 *
3152 * @param pGVM The global (ring-0) VM structure.
3153 * @param idCpu The VCPU id.
3154 * @param cbPage Large page size.
3155 * @param pIdPage Where to return the GMM page ID of the page.
3156 * @param pHCPhys Where to return the host physical address of the page.
3157 */
3158GMMR0DECL(int) GMMR0AllocateLargePage(PGVM pGVM, VMCPUID idCpu, uint32_t cbPage, uint32_t *pIdPage, RTHCPHYS *pHCPhys)
3159{
3160 LogFlow(("GMMR0AllocateLargePage: pGVM=%p cbPage=%x\n", pGVM, cbPage));
3161
3162 AssertReturn(cbPage == GMM_CHUNK_SIZE, VERR_INVALID_PARAMETER);
3163 AssertPtrReturn(pIdPage, VERR_INVALID_PARAMETER);
3164 AssertPtrReturn(pHCPhys, VERR_INVALID_PARAMETER);
3165
3166 /*
3167 * Validate, get basics and take the semaphore.
3168 */
3169 PGMM pGMM;
3170 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3171 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3172 if (RT_FAILURE(rc))
3173 return rc;
3174
3175#ifdef GMM_WITH_LEGACY_MODE
3176 // /* Not supported in legacy mode where we allocate the memory in ring 3 and lock it in ring 0. */
3177 // if (pGMM->fLegacyAllocationMode)
3178 // return VERR_NOT_SUPPORTED;
3179#endif
3180
3181 *pHCPhys = NIL_RTHCPHYS;
3182 *pIdPage = NIL_GMM_PAGEID;
3183
3184 gmmR0MutexAcquire(pGMM);
3185 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3186 {
3187 const unsigned cPages = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
3188 if (RT_UNLIKELY( pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cPages
3189 > pGVM->gmm.s.Stats.Reserved.cBasePages))
3190 {
3191 Log(("GMMR0AllocateLargePage: Reserved=%#llx Allocated+Requested=%#llx+%#x!\n",
3192 pGVM->gmm.s.Stats.Reserved.cBasePages, pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3193 gmmR0MutexRelease(pGMM);
3194 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
3195 }
3196
3197 /*
3198 * Allocate a new large page chunk.
3199 *
3200 * Note! We leave the giant GMM lock temporarily as the allocation might
3201 * take a long time. gmmR0RegisterChunk will retake it (ugly).
3202 */
3203 AssertCompile(GMM_CHUNK_SIZE == _2M);
3204 gmmR0MutexRelease(pGMM);
3205
3206 RTR0MEMOBJ hMemObj;
3207 rc = RTR0MemObjAllocPhysEx(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS, GMM_CHUNK_SIZE);
3208 if (RT_SUCCESS(rc))
3209 {
3210 PGMMCHUNKFREESET pSet = pGMM->fBoundMemoryMode ? &pGVM->gmm.s.Private : &pGMM->PrivateX;
3211 PGMMCHUNK pChunk;
3212 rc = gmmR0RegisterChunk(pGMM, pSet, hMemObj, pGVM->hSelf, GMM_CHUNK_FLAGS_LARGE_PAGE, &pChunk);
3213 if (RT_SUCCESS(rc))
3214 {
3215 /*
3216 * Allocate all the pages in the chunk.
3217 */
3218 /* Unlink the new chunk from the free list. */
3219 gmmR0UnlinkChunk(pChunk);
3220
3221 /** @todo rewrite this to skip the looping. */
3222 /* Allocate all pages. */
3223 GMMPAGEDESC PageDesc;
3224 gmmR0AllocatePage(pChunk, pGVM->hSelf, &PageDesc);
3225
3226 /* Return the first page as we'll use the whole chunk as one big page. */
3227 *pIdPage = PageDesc.idPage;
3228 *pHCPhys = PageDesc.HCPhysGCPhys;
3229
3230 for (unsigned i = 1; i < cPages; i++)
3231 gmmR0AllocatePage(pChunk, pGVM->hSelf, &PageDesc);
3232
3233 /* Update accounting. */
3234 pGVM->gmm.s.Stats.Allocated.cBasePages += cPages;
3235 pGVM->gmm.s.Stats.cPrivatePages += cPages;
3236 pGMM->cAllocatedPages += cPages;
3237
3238 gmmR0LinkChunk(pChunk, pSet);
3239 gmmR0MutexRelease(pGMM);
3240 LogFlow(("GMMR0AllocateLargePage: returns VINF_SUCCESS\n"));
3241 return VINF_SUCCESS;
3242 }
3243 RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
3244 }
3245 }
3246 else
3247 {
3248 gmmR0MutexRelease(pGMM);
3249 rc = VERR_GMM_IS_NOT_SANE;
3250 }
3251
3252 LogFlow(("GMMR0AllocateLargePage: returns %Rrc\n", rc));
3253 return rc;
3254}
3255
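/*
 * Illustration only: with the 2 MB chunk size asserted above and 4 KB pages,
 * one large page is backed by a whole chunk and accounted as
 * GMM_CHUNK_SIZE >> PAGE_SHIFT = 512 base pages.  A successful
 * GMMR0AllocateLargePage therefore bumps Allocated.cBasePages, cPrivatePages
 * and pGMM->cAllocatedPages by 512 each, and GMMR0FreeLargePage below takes
 * the same 512 pages off again.
 */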
3256
3257/**
3258 * Free a large page.
3259 *
3260 * @returns VBox status code:
3261 * @param pGVM The global (ring-0) VM structure.
3262 * @param idCpu The VCPU id.
3263 * @param idPage The large page id.
3264 */
3265GMMR0DECL(int) GMMR0FreeLargePage(PGVM pGVM, VMCPUID idCpu, uint32_t idPage)
3266{
3267 LogFlow(("GMMR0FreeLargePage: pGVM=%p idPage=%x\n", pGVM, idPage));
3268
3269 /*
3270 * Validate, get basics and take the semaphore.
3271 */
3272 PGMM pGMM;
3273 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3274 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3275 if (RT_FAILURE(rc))
3276 return rc;
3277
3278#ifdef GMM_WITH_LEGACY_MODE
3279 // /* Not supported in legacy mode where we allocate the memory in ring 3 and lock it in ring 0. */
3280 // if (pGMM->fLegacyAllocationMode)
3281 // return VERR_NOT_SUPPORTED;
3282#endif
3283
3284 gmmR0MutexAcquire(pGMM);
3285 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3286 {
3287 const unsigned cPages = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
3288
3289 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages < cPages))
3290 {
3291 Log(("GMMR0FreeLargePage: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3292 gmmR0MutexRelease(pGMM);
3293 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3294 }
3295
3296 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
3297 if (RT_LIKELY( pPage
3298 && GMM_PAGE_IS_PRIVATE(pPage)))
3299 {
3300 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3301 Assert(pChunk);
3302 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3303 Assert(pChunk->cPrivate > 0);
3304
3305 /* Release the memory immediately. */
3306 gmmR0FreeChunk(pGMM, NULL, pChunk, false /*fRelaxedSem*/); /** @todo this can be relaxed too! */
3307
3308 /* Update accounting. */
3309 pGVM->gmm.s.Stats.Allocated.cBasePages -= cPages;
3310 pGVM->gmm.s.Stats.cPrivatePages -= cPages;
3311 pGMM->cAllocatedPages -= cPages;
3312 }
3313 else
3314 rc = VERR_GMM_PAGE_NOT_FOUND;
3315 }
3316 else
3317 rc = VERR_GMM_IS_NOT_SANE;
3318
3319 gmmR0MutexRelease(pGMM);
3320 LogFlow(("GMMR0FreeLargePage: returns %Rrc\n", rc));
3321 return rc;
3322}
3323
3324
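/*
 * Worked example of the accounting delta applied above, assuming the usual
 * 4 KiB host page size (PAGE_SHIFT == 12) and the 2 MB chunk size asserted
 * earlier in this file:
 * @code
 *      cPages = GMM_CHUNK_SIZE >> PAGE_SHIFT
 *             = _2M >> 12
 *             = 512
 * @endcode
 * So freeing one large page returns the whole chunk to the host and reduces
 * the VM's base page account by 512 pages in one go.
 */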
3325/**
3326 * VMMR0 request wrapper for GMMR0FreeLargePage.
3327 *
3328 * @returns see GMMR0FreeLargePage.
3329 * @param pGVM The global (ring-0) VM structure.
3330 * @param idCpu The VCPU id.
3331 * @param pReq Pointer to the request packet.
3332 */
3333GMMR0DECL(int) GMMR0FreeLargePageReq(PGVM pGVM, VMCPUID idCpu, PGMMFREELARGEPAGEREQ pReq)
3334{
3335 /*
3336 * Validate input and pass it on.
3337 */
3338 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3339 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMFREELARGEPAGEREQ),
3340 ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMFREELARGEPAGEREQ)),
3341 VERR_INVALID_PARAMETER);
3342
3343 return GMMR0FreeLargePage(pGVM, idCpu, pReq->idPage);
3344}
3345
3346
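/*
 * Minimal sketch of how the request packet validated above might be filled in;
 * only the fields this wrapper actually checks and uses are shown, idMyLargePage
 * is a hypothetical value from an earlier large page allocation, and the real
 * VMMR0 dispatch path sets up further header fields:
 * @code
 *      GMMFREELARGEPAGEREQ Req;
 *      Req.Hdr.cbReq = sizeof(Req);
 *      Req.idPage    = idMyLargePage;  // hypothetical
 *      int rc = GMMR0FreeLargePageReq(pGVM, idCpu, &Req);
 * @endcode
 */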
3347/**
3348 * Frees a chunk, giving it back to the host OS.
3349 *
3350 * @param pGMM Pointer to the GMM instance.
3351 * @param pGVM This is set when called from GMMR0CleanupVM so we can
3352 * unmap and free the chunk in one go.
3353 * @param pChunk The chunk to free.
3354 * @param fRelaxedSem Whether we can release the semaphore while doing the
3355 * freeing (@c true) or not.
3356 */
3357static bool gmmR0FreeChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem)
3358{
3359 Assert(pChunk->Core.Key != NIL_GMM_CHUNKID);
3360
3361 GMMR0CHUNKMTXSTATE MtxState;
3362 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
3363
3364 /*
3365 * Cleanup hack! Unmap the chunk from the caller's address space.
3366 * This shouldn't happen, so screw lock contention...
3367 */
3368 if ( pChunk->cMappingsX
3369#ifdef GMM_WITH_LEGACY_MODE
3370 && (!pGMM->fLegacyAllocationMode || (pChunk->fFlags & GMM_CHUNK_FLAGS_LARGE_PAGE))
3371#endif
3372 && pGVM)
3373 gmmR0UnmapChunkLocked(pGMM, pGVM, pChunk);
3374
3375 /*
3376 * If there are current mappings of the chunk, then request the
3377 * VMs to unmap them. Reposition the chunk in the free list so
3378 * it won't be a likely candidate for allocations.
3379 */
3380 if (pChunk->cMappingsX)
3381 {
3382 /** @todo R0 -> VM request */
3383 /* The chunk can be mapped by more than one VM if fBoundMemoryMode is false! */
3384 Log(("gmmR0FreeChunk: chunk still has %d mappings; don't free!\n", pChunk->cMappingsX));
3385 gmmR0ChunkMutexRelease(&MtxState, pChunk);
3386 return false;
3387 }
3388
3389
3390 /*
3391 * Save and trash the handle.
3392 */
3393 RTR0MEMOBJ const hMemObj = pChunk->hMemObj;
3394 pChunk->hMemObj = NIL_RTR0MEMOBJ;
3395
3396 /*
3397 * Unlink it from everywhere.
3398 */
3399 gmmR0UnlinkChunk(pChunk);
3400
3401 RTSpinlockAcquire(pGMM->hSpinLockTree);
3402
3403 RTListNodeRemove(&pChunk->ListNode);
3404
3405 PAVLU32NODECORE pCore = RTAvlU32Remove(&pGMM->pChunks, pChunk->Core.Key);
3406 Assert(pCore == &pChunk->Core); NOREF(pCore);
3407
3408 PGMMCHUNKTLBE pTlbe = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(pChunk->Core.Key)];
3409 if (pTlbe->pChunk == pChunk)
3410 {
3411 pTlbe->idChunk = NIL_GMM_CHUNKID;
3412 pTlbe->pChunk = NULL;
3413 }
3414
3415 Assert(pGMM->cChunks > 0);
3416 pGMM->cChunks--;
3417
3418 RTSpinlockRelease(pGMM->hSpinLockTree);
3419
3420 /*
3421 * Free the Chunk ID before dropping the locks and freeing the rest.
3422 */
3423 gmmR0FreeChunkId(pGMM, pChunk->Core.Key);
3424 pChunk->Core.Key = NIL_GMM_CHUNKID;
3425
3426 pGMM->cFreedChunks++;
3427
3428 gmmR0ChunkMutexRelease(&MtxState, NULL);
3429 if (fRelaxedSem)
3430 gmmR0MutexRelease(pGMM);
3431
3432 RTMemFree(pChunk->paMappingsX);
3433 pChunk->paMappingsX = NULL;
3434
3435 RTMemFree(pChunk);
3436
3437#if defined(VBOX_WITH_RAM_IN_KERNEL) && !defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)
3438 int rc = RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
3439#else
3440 int rc = RTR0MemObjFree(hMemObj, false /* fFreeMappings */);
3441#endif
3442 AssertLogRelRC(rc);
3443
3444 if (fRelaxedSem)
3445 gmmR0MutexAcquire(pGMM);
3446 return fRelaxedSem;
3447}
3448
3449
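/*
 * For illustration, a sketch (approximating the chunk lookup helpers used
 * elsewhere in this file) of how the TLB entry and AVL tree pruned above under
 * hSpinLockTree are consumed on the lookup side:
 * @code
 *      RTSpinlockAcquire(pGMM->hSpinLockTree);
 *      PGMMCHUNKTLBE pTlbe  = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(idChunk)];
 *      PGMMCHUNK     pChunk = pTlbe->idChunk == idChunk ? pTlbe->pChunk : NULL;
 *      if (!pChunk)  // TLB miss: fall back to the AVL tree keyed by the chunk ID.
 *      {
 *          PAVLU32NODECORE pNode = RTAvlU32Get(&pGMM->pChunks, idChunk);
 *          pChunk = pNode ? RT_FROM_MEMBER(pNode, GMMCHUNK, Core) : NULL;
 *      }
 *      RTSpinlockRelease(pGMM->hSpinLockTree);
 * @endcode
 * Clearing the TLB entry and removing the tree node under the same spinlock is
 * what keeps such lookups from returning a chunk that is being torn down.
 */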
3450/**
3451 * Free page worker.
3452 *
3453 * The caller does all the statistic decrementing, we do all the incrementing.
3454 *
3455 * @param pGMM Pointer to the GMM instance data.
3456 * @param pGVM Pointer to the GVM instance.
3457 * @param pChunk Pointer to the chunk this page belongs to.
3458 * @param idPage The Page ID.
3459 * @param pPage Pointer to the page.
3460 */
3461static void gmmR0FreePageWorker(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, uint32_t idPage, PGMMPAGE pPage)
3462{
3463 Log3(("F pPage=%p iPage=%#x/%#x u2State=%d iFreeHead=%#x\n",
3464 pPage, pPage - &pChunk->aPages[0], idPage, pPage->Common.u2State, pChunk->iFreeHead)); NOREF(idPage);
3465
3466 /*
3467 * Put the page on the free list.
3468 */
3469 pPage->u = 0;
3470 pPage->Free.u2State = GMM_PAGE_STATE_FREE;
3471 Assert(pChunk->iFreeHead < RT_ELEMENTS(pChunk->aPages) || pChunk->iFreeHead == UINT16_MAX);
3472 pPage->Free.iNext = pChunk->iFreeHead;
3473 pChunk->iFreeHead = pPage - &pChunk->aPages[0];
3474
3475 /*
3476 * Update statistics (the cShared/cPrivate stats are up to date already),
3477 * and relink the chunk if necessary.
3478 */
3479 unsigned const cFree = pChunk->cFree;
3480 if ( !cFree
3481 || gmmR0SelectFreeSetList(cFree) != gmmR0SelectFreeSetList(cFree + 1))
3482 {
3483 gmmR0UnlinkChunk(pChunk);
3484 pChunk->cFree++;
3485 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
3486 }
3487 else
3488 {
3489 pChunk->cFree = cFree + 1;
3490 pChunk->pSet->cFreePages++;
3491 }
3492
3493 /*
3494 * If the chunk becomes empty, consider giving memory back to the host OS.
3495 *
3496 * The current strategy is to try to give it back if there are other chunks
3497 * in this free list, meaning if there are at least 240 free pages in this
3498 * category. Note that since there are probably mappings of the chunk,
3499 * it won't be freed up instantly, which probably screws up this logic
3500 * a bit...
3501 */
3502 /** @todo Do this on the way out. */
3503 if (RT_LIKELY( pChunk->cFree != GMM_CHUNK_NUM_PAGES
3504 || pChunk->pFreeNext == NULL
3505 || pChunk->pFreePrev == NULL /** @todo this is probably misfiring, see reset... */))
3506 { /* likely */ }
3507#ifdef GMM_WITH_LEGACY_MODE
3508 else if (RT_LIKELY(pGMM->fLegacyAllocationMode && !(pChunk->fFlags & GMM_CHUNK_FLAGS_LARGE_PAGE)))
3509 { /* likely */ }
3510#endif
3511 else
3512 gmmR0FreeChunk(pGMM, NULL, pChunk, false);
3513
3514}
3515
3516
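/*
 * The worker above pushes the freed page onto the chunk's LIFO free list. For
 * illustration, the matching pop an allocation path would perform looks roughly
 * like this (a sketch, not the actual allocator code):
 * @code
 *      uint32_t const iPage = pChunk->iFreeHead;   // UINT16_MAX means no free pages
 *      if (iPage != UINT16_MAX)
 *      {
 *          PGMMPAGE pPage    = &pChunk->aPages[iPage];
 *          pChunk->iFreeHead = pPage->Free.iNext;  // unlink the head of the LIFO
 *          // ... mark pPage private/shared and hand it out ...
 *      }
 * @endcode
 */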
3517/**
3518 * Frees a shared page, the page is known to exist and be valid and such.
3519 *
3520 * @param pGMM Pointer to the GMM instance.
3521 * @param pGVM Pointer to the GVM instance.
3522 * @param idPage The page id.
3523 * @param pPage The page structure.
3524 */
3525DECLINLINE(void) gmmR0FreeSharedPage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage)
3526{
3527 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3528 Assert(pChunk);
3529 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3530 Assert(pChunk->cShared > 0);
3531 Assert(pGMM->cSharedPages > 0);
3532 Assert(pGMM->cAllocatedPages > 0);
3533 Assert(!pPage->Shared.cRefs);
3534
3535 pChunk->cShared--;
3536 pGMM->cAllocatedPages--;
3537 pGMM->cSharedPages--;
3538 gmmR0FreePageWorker(pGMM, pGVM, pChunk, idPage, pPage);
3539}
3540
3541
3542/**
3543 * Frees a private page, the page is known to exist and be valid and such.
3544 *
3545 * @param pGMM Pointer to the GMM instance.
3546 * @param pGVM Pointer to the GVM instance.
3547 * @param idPage The page id.
3548 * @param pPage The page structure.
3549 */
3550DECLINLINE(void) gmmR0FreePrivatePage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage)
3551{
3552 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3553 Assert(pChunk);
3554 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3555 Assert(pChunk->cPrivate > 0);
3556 Assert(pGMM->cAllocatedPages > 0);
3557
3558 pChunk->cPrivate--;
3559 pGMM->cAllocatedPages--;
3560 gmmR0FreePageWorker(pGMM, pGVM, pChunk, idPage, pPage);
3561}
3562
3563
3564/**
3565 * Common worker for GMMR0FreePages and GMMR0BalloonedPages.
3566 *
3567 * @returns VBox status code:
3568 * @retval xxx
3569 *
3570 * @param pGMM Pointer to the GMM instance data.
3571 * @param pGVM Pointer to the VM.
3572 * @param cPages The number of pages to free.
3573 * @param paPages Pointer to the page descriptors.
3574 * @param enmAccount The account this relates to.
3575 */
3576static int gmmR0FreePages(PGMM pGMM, PGVM pGVM, uint32_t cPages, PGMMFREEPAGEDESC paPages, GMMACCOUNT enmAccount)
3577{
3578 /*
3579 * Check that the request isn't impossible wrt to the account status.
3580 */
3581 switch (enmAccount)
3582 {
3583 case GMMACCOUNT_BASE:
3584 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages < cPages))
3585 {
3586 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3587 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3588 }
3589 break;
3590 case GMMACCOUNT_SHADOW:
3591 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cShadowPages < cPages))
3592 {
3593 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cShadowPages, cPages));
3594 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3595 }
3596 break;
3597 case GMMACCOUNT_FIXED:
3598 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cFixedPages < cPages))
3599 {
3600 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cFixedPages, cPages));
3601 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3602 }
3603 break;
3604 default:
3605 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
3606 }
3607
3608 /*
3609 * Walk the descriptors and free the pages.
3610 *
3611 * Statistics (except the account) are being updated as we go along,
3612 * unlike the alloc code. Also, stop on the first error.
3613 */
3614 int rc = VINF_SUCCESS;
3615 uint32_t iPage;
3616 for (iPage = 0; iPage < cPages; iPage++)
3617 {
3618 uint32_t idPage = paPages[iPage].idPage;
3619 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
3620 if (RT_LIKELY(pPage))
3621 {
3622 if (RT_LIKELY(GMM_PAGE_IS_PRIVATE(pPage)))
3623 {
3624 if (RT_LIKELY(pPage->Private.hGVM == pGVM->hSelf))
3625 {
3626 Assert(pGVM->gmm.s.Stats.cPrivatePages);
3627 pGVM->gmm.s.Stats.cPrivatePages--;
3628 gmmR0FreePrivatePage(pGMM, pGVM, idPage, pPage);
3629 }
3630 else
3631 {
3632 Log(("gmmR0FreePages: #%#x/%#x: not owner! hGVM=%#x hSelf=%#x\n", iPage, idPage,
3633 pPage->Private.hGVM, pGVM->hSelf));
3634 rc = VERR_GMM_NOT_PAGE_OWNER;
3635 break;
3636 }
3637 }
3638 else if (RT_LIKELY(GMM_PAGE_IS_SHARED(pPage)))
3639 {
3640 Assert(pGVM->gmm.s.Stats.cSharedPages);
3641 Assert(pPage->Shared.cRefs);
3642#if defined(VBOX_WITH_PAGE_SHARING) && defined(VBOX_STRICT) && HC_ARCH_BITS == 64
3643 if (pPage->Shared.u14Checksum)
3644 {
3645 uint32_t uChecksum = gmmR0StrictPageChecksum(pGMM, pGVM, idPage);
3646 uChecksum &= UINT32_C(0x00003fff);
3647 AssertMsg(!uChecksum || uChecksum == pPage->Shared.u14Checksum,
3648 ("%#x vs %#x - idPage=%#x\n", uChecksum, pPage->Shared.u14Checksum, idPage));
3649 }
3650#endif
3651 pGVM->gmm.s.Stats.cSharedPages--;
3652 if (!--pPage->Shared.cRefs)
3653 gmmR0FreeSharedPage(pGMM, pGVM, idPage, pPage);
3654 else
3655 {
3656 Assert(pGMM->cDuplicatePages);
3657 pGMM->cDuplicatePages--;
3658 }
3659 }
3660 else
3661 {
3662 Log(("gmmR0FreePages: #%#x/%#x: already free!\n", iPage, idPage));
3663 rc = VERR_GMM_PAGE_ALREADY_FREE;
3664 break;
3665 }
3666 }
3667 else
3668 {
3669 Log(("gmmR0FreePages: #%#x/%#x: not found!\n", iPage, idPage));
3670 rc = VERR_GMM_PAGE_NOT_FOUND;
3671 break;
3672 }
3673 paPages[iPage].idPage = NIL_GMM_PAGEID;
3674 }
3675
3676 /*
3677 * Update the account.
3678 */
3679 switch (enmAccount)
3680 {
3681 case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages -= iPage; break;
3682 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages -= iPage; break;
3683 case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages -= iPage; break;
3684 default:
3685 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
3686 }
3687
3688 /*
3689 * Any threshold stuff to be done here?
3690 */
3691
3692 return rc;
3693}
3694
3695
3696/**
3697 * Free one or more pages.
3698 *
3699 * This is typically used at reset time or power off.
3700 *
3701 * @returns VBox status code:
3702 * @retval xxx
3703 *
3704 * @param pGVM The global (ring-0) VM structure.
3705 * @param idCpu The VCPU id.
3706 * @param cPages The number of pages to free.
3707 * @param paPages Pointer to the page descriptors containing the page IDs
3708 * for each page.
3709 * @param enmAccount The account this relates to.
3710 * @thread EMT.
3711 */
3712GMMR0DECL(int) GMMR0FreePages(PGVM pGVM, VMCPUID idCpu, uint32_t cPages, PGMMFREEPAGEDESC paPages, GMMACCOUNT enmAccount)
3713{
3714 LogFlow(("GMMR0FreePages: pGVM=%p cPages=%#x paPages=%p enmAccount=%d\n", pGVM, cPages, paPages, enmAccount));
3715
3716 /*
3717 * Validate input and get the basics.
3718 */
3719 PGMM pGMM;
3720 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3721 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3722 if (RT_FAILURE(rc))
3723 return rc;
3724
3725 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
3726 AssertMsgReturn(enmAccount > GMMACCOUNT_INVALID && enmAccount < GMMACCOUNT_END, ("%d\n", enmAccount), VERR_INVALID_PARAMETER);
3727 AssertMsgReturn(cPages > 0 && cPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cPages), VERR_INVALID_PARAMETER);
3728
3729 for (unsigned iPage = 0; iPage < cPages; iPage++)
3730 AssertMsgReturn( paPages[iPage].idPage <= GMM_PAGEID_LAST
3731 /*|| paPages[iPage].idPage == NIL_GMM_PAGEID*/,
3732 ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
3733
3734 /*
3735 * Take the semaphore and call the worker function.
3736 */
3737 gmmR0MutexAcquire(pGMM);
3738 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3739 {
3740 rc = gmmR0FreePages(pGMM, pGVM, cPages, paPages, enmAccount);
3741 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3742 }
3743 else
3744 rc = VERR_GMM_IS_NOT_SANE;
3745 gmmR0MutexRelease(pGMM);
3746 LogFlow(("GMMR0FreePages: returns %Rrc\n", rc));
3747 return rc;
3748}
3749
3750
3751/**
3752 * VMMR0 request wrapper for GMMR0FreePages.
3753 *
3754 * @returns see GMMR0FreePages.
3755 * @param pGVM The global (ring-0) VM structure.
3756 * @param idCpu The VCPU id.
3757 * @param pReq Pointer to the request packet.
3758 */
3759GMMR0DECL(int) GMMR0FreePagesReq(PGVM pGVM, VMCPUID idCpu, PGMMFREEPAGESREQ pReq)
3760{
3761 /*
3762 * Validate input and pass it on.
3763 */
3764 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3765 AssertMsgReturn(pReq->Hdr.cbReq >= RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[0]),
3766 ("%#x < %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[0])),
3767 VERR_INVALID_PARAMETER);
3768 AssertMsgReturn(pReq->Hdr.cbReq == RT_UOFFSETOF_DYN(GMMFREEPAGESREQ, aPages[pReq->cPages]),
3769 ("%#x != %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF_DYN(GMMFREEPAGESREQ, aPages[pReq->cPages])),
3770 VERR_INVALID_PARAMETER);
3771
3772 return GMMR0FreePages(pGVM, idCpu, pReq->cPages, &pReq->aPages[0], pReq->enmAccount);
3773}
3774
3775
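/*
 * Sketch of how the variable-sized request validated above is laid out; the
 * RT_UOFFSETOF_DYN check enforces that exactly cPages descriptors trail the
 * fixed part (paMyPageIds is a hypothetical source of page IDs):
 * @code
 *      uint32_t const cbReq = RT_UOFFSETOF_DYN(GMMFREEPAGESREQ, aPages[cPages]);
 *      PGMMFREEPAGESREQ pReq = (PGMMFREEPAGESREQ)RTMemAllocZ(cbReq);  // sketch only
 *      pReq->Hdr.cbReq  = cbReq;
 *      pReq->cPages     = cPages;
 *      pReq->enmAccount = GMMACCOUNT_BASE;
 *      for (uint32_t i = 0; i < cPages; i++)
 *          pReq->aPages[i].idPage = paMyPageIds[i];
 * @endcode
 */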
3776/**
3777 * Report back on a memory ballooning request.
3778 *
3779 * The request may or may not have been initiated by the GMM. If it was initiated
3780 * by the GMM it is important that this function is called even if no pages were
3781 * ballooned.
3782 *
3783 * @returns VBox status code:
3784 * @retval VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH
3785 * @retval VERR_GMM_ATTEMPT_TO_DEFLATE_TOO_MUCH
3786 * @retval VERR_GMM_OVERCOMMITTED_TRY_AGAIN_IN_A_BIT - reset condition
3787 * indicating that we won't necessarily have sufficient RAM to boot
3788 * the VM again and that it should pause until this changes (we'll try
3789 * balloon some other VM). (For standard deflate we have little choice
3790 * but to hope the VM won't use the memory that was returned to it.)
3791 *
3792 * @param pGVM The global (ring-0) VM structure.
3793 * @param idCpu The VCPU id.
3794 * @param enmAction Inflate/deflate/reset.
3795 * @param cBalloonedPages The number of pages that were ballooned.
3796 *
3797 * @thread EMT(idCpu)
3798 */
3799GMMR0DECL(int) GMMR0BalloonedPages(PGVM pGVM, VMCPUID idCpu, GMMBALLOONACTION enmAction, uint32_t cBalloonedPages)
3800{
3801 LogFlow(("GMMR0BalloonedPages: pGVM=%p enmAction=%d cBalloonedPages=%#x\n",
3802 pGVM, enmAction, cBalloonedPages));
3803
3804 AssertMsgReturn(cBalloonedPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cBalloonedPages), VERR_INVALID_PARAMETER);
3805
3806 /*
3807 * Validate input and get the basics.
3808 */
3809 PGMM pGMM;
3810 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3811 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3812 if (RT_FAILURE(rc))
3813 return rc;
3814
3815 /*
3816 * Take the semaphore and do some more validations.
3817 */
3818 gmmR0MutexAcquire(pGMM);
3819 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3820 {
3821 switch (enmAction)
3822 {
3823 case GMMBALLOONACTION_INFLATE:
3824 {
3825 if (RT_LIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cBalloonedPages
3826 <= pGVM->gmm.s.Stats.Reserved.cBasePages))
3827 {
3828 /*
3829 * Record the ballooned memory.
3830 */
3831 pGMM->cBalloonedPages += cBalloonedPages;
3832 if (pGVM->gmm.s.Stats.cReqBalloonedPages)
3833 {
3834 /* Codepath never taken. Might be interesting in the future to request ballooned memory from guests in low-memory conditions. */
3835 AssertFailed();
3836
3837 pGVM->gmm.s.Stats.cBalloonedPages += cBalloonedPages;
3838 pGVM->gmm.s.Stats.cReqActuallyBalloonedPages += cBalloonedPages;
3839 Log(("GMMR0BalloonedPages: +%#x - Global=%#llx / VM: Total=%#llx Req=%#llx Actual=%#llx (pending)\n",
3840 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages,
3841 pGVM->gmm.s.Stats.cReqBalloonedPages, pGVM->gmm.s.Stats.cReqActuallyBalloonedPages));
3842 }
3843 else
3844 {
3845 pGVM->gmm.s.Stats.cBalloonedPages += cBalloonedPages;
3846 Log(("GMMR0BalloonedPages: +%#x - Global=%#llx / VM: Total=%#llx (user)\n",
3847 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages));
3848 }
3849 }
3850 else
3851 {
3852 Log(("GMMR0BalloonedPages: cBasePages=%#llx Total=%#llx cBalloonedPages=%#llx Reserved=%#llx\n",
3853 pGVM->gmm.s.Stats.Allocated.cBasePages, pGVM->gmm.s.Stats.cBalloonedPages, cBalloonedPages,
3854 pGVM->gmm.s.Stats.Reserved.cBasePages));
3855 rc = VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3856 }
3857 break;
3858 }
3859
3860 case GMMBALLOONACTION_DEFLATE:
3861 {
3862 /* Deflate. */
3863 if (pGVM->gmm.s.Stats.cBalloonedPages >= cBalloonedPages)
3864 {
3865 /*
3866 * Record the ballooned memory.
3867 */
3868 Assert(pGMM->cBalloonedPages >= cBalloonedPages);
3869 pGMM->cBalloonedPages -= cBalloonedPages;
3870 pGVM->gmm.s.Stats.cBalloonedPages -= cBalloonedPages;
3871 if (pGVM->gmm.s.Stats.cReqDeflatePages)
3872 {
3873 AssertFailed(); /* This path is for later. */
3874 Log(("GMMR0BalloonedPages: -%#x - Global=%#llx / VM: Total=%#llx Req=%#llx\n",
3875 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages, pGVM->gmm.s.Stats.cReqDeflatePages));
3876
3877 /*
3878 * Anything we need to do here now when the request has been completed?
3879 */
3880 pGVM->gmm.s.Stats.cReqDeflatePages = 0;
3881 }
3882 else
3883 Log(("GMMR0BalloonedPages: -%#x - Global=%#llx / VM: Total=%#llx (user)\n",
3884 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages));
3885 }
3886 else
3887 {
3888 Log(("GMMR0BalloonedPages: Total=%#llx cBalloonedPages=%#llx\n", pGVM->gmm.s.Stats.cBalloonedPages, cBalloonedPages));
3889 rc = VERR_GMM_ATTEMPT_TO_DEFLATE_TOO_MUCH;
3890 }
3891 break;
3892 }
3893
3894 case GMMBALLOONACTION_RESET:
3895 {
3896 /* Reset to an empty balloon. */
3897 Assert(pGMM->cBalloonedPages >= pGVM->gmm.s.Stats.cBalloonedPages);
3898
3899 pGMM->cBalloonedPages -= pGVM->gmm.s.Stats.cBalloonedPages;
3900 pGVM->gmm.s.Stats.cBalloonedPages = 0;
3901 break;
3902 }
3903
3904 default:
3905 rc = VERR_INVALID_PARAMETER;
3906 break;
3907 }
3908 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3909 }
3910 else
3911 rc = VERR_GMM_IS_NOT_SANE;
3912
3913 gmmR0MutexRelease(pGMM);
3914 LogFlow(("GMMR0BalloonedPages: returns %Rrc\n", rc));
3915 return rc;
3916}
3917
3918
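/*
 * The inflate path above enforces a simple reservation invariant; written out
 * as a standalone check (illustrative only):
 * @code
 *      bool const fFits =    pGVM->gmm.s.Stats.Allocated.cBasePages
 *                          + pGVM->gmm.s.Stats.cBalloonedPages
 *                          + cBalloonedPages
 *                         <= pGVM->gmm.s.Stats.Reserved.cBasePages;
 * @endcode
 * Deflate is the mirror image: it only requires that the VM actually has that
 * many pages ballooned, and the global pGMM->cBalloonedPages is kept in step
 * in both directions.
 */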
3919/**
3920 * VMMR0 request wrapper for GMMR0BalloonedPages.
3921 *
3922 * @returns see GMMR0BalloonedPages.
3923 * @param pGVM The global (ring-0) VM structure.
3924 * @param idCpu The VCPU id.
3925 * @param pReq Pointer to the request packet.
3926 */
3927GMMR0DECL(int) GMMR0BalloonedPagesReq(PGVM pGVM, VMCPUID idCpu, PGMMBALLOONEDPAGESREQ pReq)
3928{
3929 /*
3930 * Validate input and pass it on.
3931 */
3932 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3933 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMBALLOONEDPAGESREQ),
3934 ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMBALLOONEDPAGESREQ)),
3935 VERR_INVALID_PARAMETER);
3936
3937 return GMMR0BalloonedPages(pGVM, idCpu, pReq->enmAction, pReq->cBalloonedPages);
3938}
3939
3940
3941/**
3942 * Return memory statistics for the hypervisor
3943 *
3944 * @returns VBox status code.
3945 * @param pReq Pointer to the request packet.
3946 */
3947GMMR0DECL(int) GMMR0QueryHypervisorMemoryStatsReq(PGMMMEMSTATSREQ pReq)
3948{
3949 /*
3950 * Validate input and pass it on.
3951 */
3952 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3953 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMMEMSTATSREQ),
3954 ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMMEMSTATSREQ)),
3955 VERR_INVALID_PARAMETER);
3956
3957 /*
3958 * Validate input and get the basics.
3959 */
3960 PGMM pGMM;
3961 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3962 pReq->cAllocPages = pGMM->cAllocatedPages;
3963 pReq->cFreePages = (pGMM->cChunks << (GMM_CHUNK_SHIFT - PAGE_SHIFT)) - pGMM->cAllocatedPages;
3964 pReq->cBalloonedPages = pGMM->cBalloonedPages;
3965 pReq->cMaxPages = pGMM->cMaxPages;
3966 pReq->cSharedPages = pGMM->cDuplicatePages;
3967 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3968
3969 return VINF_SUCCESS;
3970}
3971
3972
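/*
 * The cFreePages value above is derived rather than tracked directly; a worked
 * example assuming 4 KiB host pages (PAGE_SHIFT == 12) and the 2 MB chunk size
 * asserted earlier in this file:
 * @code
 *      // Each chunk contributes 1 << (GMM_CHUNK_SHIFT - PAGE_SHIFT) = 512 pages.
 *      cFreePages = (cChunks << (GMM_CHUNK_SHIFT - PAGE_SHIFT)) - cAllocatedPages
 *                 =  cChunks * 512                              - cAllocatedPages
 * @endcode
 */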
3973/**
3974 * Return memory statistics for the VM
3975 *
3976 * @returns VBox status code.
3977 * @param pGVM The global (ring-0) VM structure.
3978 * @param idCpu Cpu id.
3979 * @param pReq Pointer to the request packet.
3980 *
3981 * @thread EMT(idCpu)
3982 */
3983GMMR0DECL(int) GMMR0QueryMemoryStatsReq(PGVM pGVM, VMCPUID idCpu, PGMMMEMSTATSREQ pReq)
3984{
3985 /*
3986 * Validate input and pass it on.
3987 */
3988 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3989 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMMEMSTATSREQ),
3990 ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMMEMSTATSREQ)),
3991 VERR_INVALID_PARAMETER);
3992
3993 /*
3994 * Validate input and get the basics.
3995 */
3996 PGMM pGMM;
3997 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3998 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3999 if (RT_FAILURE(rc))
4000 return rc;
4001
4002 /*
4003 * Take the semaphore and do some more validations.
4004 */
4005 gmmR0MutexAcquire(pGMM);
4006 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4007 {
4008 pReq->cAllocPages = pGVM->gmm.s.Stats.Allocated.cBasePages;
4009 pReq->cBalloonedPages = pGVM->gmm.s.Stats.cBalloonedPages;
4010 pReq->cMaxPages = pGVM->gmm.s.Stats.Reserved.cBasePages;
4011 pReq->cFreePages = pReq->cMaxPages - pReq->cAllocPages;
4012 }
4013 else
4014 rc = VERR_GMM_IS_NOT_SANE;
4015
4016 gmmR0MutexRelease(pGMM);
4017 LogFlow(("GMMR0QueryMemoryStatsReq: returns %Rrc\n", rc));
4018 return rc;
4019}
4020
4021
4022/**
4023 * Worker for gmmR0UnmapChunk and gmmr0FreeChunk.
4024 *
4025 * Don't call this in legacy allocation mode!
4026 *
4027 * @returns VBox status code.
4028 * @param pGMM Pointer to the GMM instance data.
4029 * @param pGVM Pointer to the Global VM structure.
4030 * @param pChunk Pointer to the chunk to be unmapped.
4031 */
4032static int gmmR0UnmapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
4033{
4034 RT_NOREF_PV(pGMM);
4035#ifdef GMM_WITH_LEGACY_MODE
4036 Assert(!pGMM->fLegacyAllocationMode || (pChunk->fFlags & GMM_CHUNK_FLAGS_LARGE_PAGE));
4037#endif
4038
4039 /*
4040 * Find the mapping and try unmapping it.
4041 */
4042 uint32_t cMappings = pChunk->cMappingsX;
4043 for (uint32_t i = 0; i < cMappings; i++)
4044 {
4045 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
4046 if (pChunk->paMappingsX[i].pGVM == pGVM)
4047 {
4048 /* unmap */
4049 int rc = RTR0MemObjFree(pChunk->paMappingsX[i].hMapObj, false /* fFreeMappings (NA) */);
4050 if (RT_SUCCESS(rc))
4051 {
4052 /* update the record. */
4053 cMappings--;
4054 if (i < cMappings)
4055 pChunk->paMappingsX[i] = pChunk->paMappingsX[cMappings];
4056 pChunk->paMappingsX[cMappings].hMapObj = NIL_RTR0MEMOBJ;
4057 pChunk->paMappingsX[cMappings].pGVM = NULL;
4058 Assert(pChunk->cMappingsX - 1U == cMappings);
4059 pChunk->cMappingsX = cMappings;
4060 }
4061
4062 return rc;
4063 }
4064 }
4065
4066 Log(("gmmR0UnmapChunk: Chunk %#x is not mapped into pGVM=%p/%#x\n", pChunk->Core.Key, pGVM, pGVM->hSelf));
4067 return VERR_GMM_CHUNK_NOT_MAPPED;
4068}
4069
4070
4071/**
4072 * Unmaps a chunk previously mapped into the address space of the current process.
4073 *
4074 * @returns VBox status code.
4075 * @param pGMM Pointer to the GMM instance data.
4076 * @param pGVM Pointer to the Global VM structure.
4077 * @param pChunk Pointer to the chunk to be unmapped.
4078 * @param fRelaxedSem Whether we can release the semaphore while doing the
4079 * unmapping (@c true) or not.
4080 */
4081static int gmmR0UnmapChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem)
4082{
4083#ifdef GMM_WITH_LEGACY_MODE
4084 if (!pGMM->fLegacyAllocationMode || (pChunk->fFlags & GMM_CHUNK_FLAGS_LARGE_PAGE))
4085 {
4086#endif
4087 /*
4088 * Lock the chunk and if possible leave the giant GMM lock.
4089 */
4090 GMMR0CHUNKMTXSTATE MtxState;
4091 int rc = gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk,
4092 fRelaxedSem ? GMMR0CHUNK_MTX_RETAKE_GIANT : GMMR0CHUNK_MTX_KEEP_GIANT);
4093 if (RT_SUCCESS(rc))
4094 {
4095 rc = gmmR0UnmapChunkLocked(pGMM, pGVM, pChunk);
4096 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4097 }
4098 return rc;
4099#ifdef GMM_WITH_LEGACY_MODE
4100 }
4101
4102 if (pChunk->hGVM == pGVM->hSelf)
4103 return VINF_SUCCESS;
4104
4105 Log(("gmmR0UnmapChunk: Chunk %#x is not mapped into pGVM=%p/%#x (legacy)\n", pChunk->Core.Key, pGVM, pGVM->hSelf));
4106 return VERR_GMM_CHUNK_NOT_MAPPED;
4107#endif
4108}
4109
4110
4111/**
4112 * Worker for gmmR0MapChunk.
4113 *
4114 * @returns VBox status code.
4115 * @param pGMM Pointer to the GMM instance data.
4116 * @param pGVM Pointer to the Global VM structure.
4117 * @param pChunk Pointer to the chunk to be mapped.
4118 * @param ppvR3 Where to store the ring-3 address of the mapping.
4119 * In the VERR_GMM_CHUNK_ALREADY_MAPPED case, this will
4120 * contain the address of the existing mapping.
4121 */
4122static int gmmR0MapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, PRTR3PTR ppvR3)
4123{
4124#ifdef GMM_WITH_LEGACY_MODE
4125 /*
4126 * If we're in legacy mode this is simple.
4127 */
4128 if (pGMM->fLegacyAllocationMode && !(pChunk->fFlags & GMM_CHUNK_FLAGS_LARGE_PAGE))
4129 {
4130 if (pChunk->hGVM != pGVM->hSelf)
4131 {
4132 Log(("gmmR0MapChunk: chunk %#x is already mapped at %p!\n", pChunk->Core.Key, *ppvR3));
4133 return VERR_GMM_CHUNK_NOT_FOUND;
4134 }
4135
4136 *ppvR3 = RTR0MemObjAddressR3(pChunk->hMemObj);
4137 return VINF_SUCCESS;
4138 }
4139#else
4140 RT_NOREF(pGMM);
4141#endif
4142
4143 /*
4144 * Check to see if the chunk is already mapped.
4145 */
4146 for (uint32_t i = 0; i < pChunk->cMappingsX; i++)
4147 {
4148 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
4149 if (pChunk->paMappingsX[i].pGVM == pGVM)
4150 {
4151 *ppvR3 = RTR0MemObjAddressR3(pChunk->paMappingsX[i].hMapObj);
4152 Log(("gmmR0MapChunk: chunk %#x is already mapped at %p!\n", pChunk->Core.Key, *ppvR3));
4153#ifdef VBOX_WITH_PAGE_SHARING
4154 /* The ring-3 chunk cache can be out of sync; don't fail. */
4155 return VINF_SUCCESS;
4156#else
4157 return VERR_GMM_CHUNK_ALREADY_MAPPED;
4158#endif
4159 }
4160 }
4161
4162 /*
4163 * Do the mapping.
4164 */
4165 RTR0MEMOBJ hMapObj;
4166 int rc = RTR0MemObjMapUser(&hMapObj, pChunk->hMemObj, (RTR3PTR)-1, 0, RTMEM_PROT_READ | RTMEM_PROT_WRITE, NIL_RTR0PROCESS);
4167 if (RT_SUCCESS(rc))
4168 {
4169 /* reallocate the array? assumes few users per chunk (usually one). */
4170 unsigned iMapping = pChunk->cMappingsX;
4171 if ( iMapping <= 3
4172 || (iMapping & 3) == 0)
4173 {
4174 unsigned cNewSize = iMapping <= 3
4175 ? iMapping + 1
4176 : iMapping + 4;
4177 Assert(cNewSize < 4 || RT_ALIGN_32(cNewSize, 4) == cNewSize);
4178 if (RT_UNLIKELY(cNewSize > UINT16_MAX))
4179 {
4180 rc = RTR0MemObjFree(hMapObj, false /* fFreeMappings (NA) */); AssertRC(rc);
4181 return VERR_GMM_TOO_MANY_CHUNK_MAPPINGS;
4182 }
4183
4184 void *pvMappings = RTMemRealloc(pChunk->paMappingsX, cNewSize * sizeof(pChunk->paMappingsX[0]));
4185 if (RT_UNLIKELY(!pvMappings))
4186 {
4187 rc = RTR0MemObjFree(hMapObj, false /* fFreeMappings (NA) */); AssertRC(rc);
4188 return VERR_NO_MEMORY;
4189 }
4190 pChunk->paMappingsX = (PGMMCHUNKMAP)pvMappings;
4191 }
4192
4193 /* insert new entry */
4194 pChunk->paMappingsX[iMapping].hMapObj = hMapObj;
4195 pChunk->paMappingsX[iMapping].pGVM = pGVM;
4196 Assert(pChunk->cMappingsX == iMapping);
4197 pChunk->cMappingsX = iMapping + 1;
4198
4199 *ppvR3 = RTR0MemObjAddressR3(hMapObj);
4200 }
4201
4202 return rc;
4203}
4204
4205
4206/**
4207 * Maps a chunk into the user address space of the current process.
4208 *
4209 * @returns VBox status code.
4210 * @param pGMM Pointer to the GMM instance data.
4211 * @param pGVM Pointer to the Global VM structure.
4212 * @param pChunk Pointer to the chunk to be mapped.
4213 * @param fRelaxedSem Whether we can release the semaphore while doing the
4214 * mapping (@c true) or not.
4215 * @param ppvR3 Where to store the ring-3 address of the mapping.
4216 * In the VERR_GMM_CHUNK_ALREADY_MAPPED case, this will
4217 * contain the address of the existing mapping.
4218 */
4219static int gmmR0MapChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem, PRTR3PTR ppvR3)
4220{
4221 /*
4222 * Take the chunk lock and leave the giant GMM lock when possible, then
4223 * call the worker function.
4224 */
4225 GMMR0CHUNKMTXSTATE MtxState;
4226 int rc = gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk,
4227 fRelaxedSem ? GMMR0CHUNK_MTX_RETAKE_GIANT : GMMR0CHUNK_MTX_KEEP_GIANT);
4228 if (RT_SUCCESS(rc))
4229 {
4230 rc = gmmR0MapChunkLocked(pGMM, pGVM, pChunk, ppvR3);
4231 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4232 }
4233
4234 return rc;
4235}
4236
4237
4238
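/*
 * The reallocation policy in gmmR0MapChunkLocked grows the mapping array one
 * entry at a time for the first few users and in groups of four afterwards,
 * matching the "few users per chunk (usually one)" assumption noted in the
 * code. Illustrative capacity progression as mappings are added (sketch):
 * @code
 *      iMapping before the insert:  0  1  2  3  4  8  12 ...
 *      cNewSize allocated:          1  2  3  4  8  12 16 ...
 * @endcode
 * i.e. a reallocation only happens when iMapping <= 3 or iMapping is a
 * multiple of four; in between, the previously allocated slack is used.
 */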
4239#if defined(VBOX_WITH_PAGE_SHARING) || (defined(VBOX_STRICT) && HC_ARCH_BITS == 64)
4240/**
4241 * Check if a chunk is mapped into the specified VM
4242 *
4243 * @returns mapped yes/no
4244 * @param pGMM Pointer to the GMM instance.
4245 * @param pGVM Pointer to the Global VM structure.
4246 * @param pChunk Pointer to the chunk to be mapped.
4247 * @param ppvR3 Where to store the ring-3 address of the mapping.
4248 */
4249static bool gmmR0IsChunkMapped(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, PRTR3PTR ppvR3)
4250{
4251 GMMR0CHUNKMTXSTATE MtxState;
4252 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
4253 for (uint32_t i = 0; i < pChunk->cMappingsX; i++)
4254 {
4255 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
4256 if (pChunk->paMappingsX[i].pGVM == pGVM)
4257 {
4258 *ppvR3 = RTR0MemObjAddressR3(pChunk->paMappingsX[i].hMapObj);
4259 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4260 return true;
4261 }
4262 }
4263 *ppvR3 = NULL;
4264 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4265 return false;
4266}
4267#endif /* VBOX_WITH_PAGE_SHARING || (VBOX_STRICT && 64-BIT) */
4268
4269
4270/**
4271 * Map a chunk and/or unmap another chunk.
4272 *
4273 * The mapping and unmapping applies to the current process.
4274 *
4275 * This API does two things because it saves a kernel call per mapping
4276 * when the ring-3 mapping cache is full.
4277 *
4278 * @returns VBox status code.
4279 * @param pGVM The global (ring-0) VM structure.
4280 * @param idChunkMap The chunk to map. NIL_GMM_CHUNKID if nothing to map.
4281 * @param idChunkUnmap The chunk to unmap. NIL_GMM_CHUNKID if nothing to unmap.
4282 * @param ppvR3 Where to store the address of the mapped chunk. NULL is ok if nothing to map.
4283 * @thread EMT ???
4284 */
4285GMMR0DECL(int) GMMR0MapUnmapChunk(PGVM pGVM, uint32_t idChunkMap, uint32_t idChunkUnmap, PRTR3PTR ppvR3)
4286{
4287 LogFlow(("GMMR0MapUnmapChunk: pGVM=%p idChunkMap=%#x idChunkUnmap=%#x ppvR3=%p\n",
4288 pGVM, idChunkMap, idChunkUnmap, ppvR3));
4289
4290 /*
4291 * Validate input and get the basics.
4292 */
4293 PGMM pGMM;
4294 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4295 int rc = GVMMR0ValidateGVM(pGVM);
4296 if (RT_FAILURE(rc))
4297 return rc;
4298
4299 AssertCompile(NIL_GMM_CHUNKID == 0);
4300 AssertMsgReturn(idChunkMap <= GMM_CHUNKID_LAST, ("%#x\n", idChunkMap), VERR_INVALID_PARAMETER);
4301 AssertMsgReturn(idChunkUnmap <= GMM_CHUNKID_LAST, ("%#x\n", idChunkUnmap), VERR_INVALID_PARAMETER);
4302
4303 if ( idChunkMap == NIL_GMM_CHUNKID
4304 && idChunkUnmap == NIL_GMM_CHUNKID)
4305 return VERR_INVALID_PARAMETER;
4306
4307 if (idChunkMap != NIL_GMM_CHUNKID)
4308 {
4309 AssertPtrReturn(ppvR3, VERR_INVALID_POINTER);
4310 *ppvR3 = NIL_RTR3PTR;
4311 }
4312
4313 /*
4314 * Take the semaphore and do the work.
4315 *
4316 * The unmapping is done last since it's easier to undo a mapping than
4317 * undoing an unmapping. The ring-3 mapping cache cannot be so big
4318 * that it pushes the user virtual address space to within a chunk of
4319 * its limits, so no problem here.
4320 */
4321 gmmR0MutexAcquire(pGMM);
4322 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4323 {
4324 PGMMCHUNK pMap = NULL;
4325 if (idChunkMap != NIL_GMM_CHUNKID)
4326 {
4327 pMap = gmmR0GetChunk(pGMM, idChunkMap);
4328 if (RT_LIKELY(pMap))
4329 rc = gmmR0MapChunk(pGMM, pGVM, pMap, true /*fRelaxedSem*/, ppvR3);
4330 else
4331 {
4332 Log(("GMMR0MapUnmapChunk: idChunkMap=%#x\n", idChunkMap));
4333 rc = VERR_GMM_CHUNK_NOT_FOUND;
4334 }
4335 }
4336/** @todo split this operation, the bail out might (theoretically) not be
4337 * entirely safe. */
4338
4339 if ( idChunkUnmap != NIL_GMM_CHUNKID
4340 && RT_SUCCESS(rc))
4341 {
4342 PGMMCHUNK pUnmap = gmmR0GetChunk(pGMM, idChunkUnmap);
4343 if (RT_LIKELY(pUnmap))
4344 rc = gmmR0UnmapChunk(pGMM, pGVM, pUnmap, true /*fRelaxedSem*/);
4345 else
4346 {
4347 Log(("GMMR0MapUnmapChunk: idChunkUnmap=%#x\n", idChunkUnmap));
4348 rc = VERR_GMM_CHUNK_NOT_FOUND;
4349 }
4350
4351 if (RT_FAILURE(rc) && pMap)
4352 gmmR0UnmapChunk(pGMM, pGVM, pMap, false /*fRelaxedSem*/);
4353 }
4354
4355 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4356 }
4357 else
4358 rc = VERR_GMM_IS_NOT_SANE;
4359 gmmR0MutexRelease(pGMM);
4360
4361 LogFlow(("GMMR0MapUnmapChunk: returns %Rrc\n", rc));
4362 return rc;
4363}
4364
4365
4366/**
4367 * VMMR0 request wrapper for GMMR0MapUnmapChunk.
4368 *
4369 * @returns see GMMR0MapUnmapChunk.
4370 * @param pGVM The global (ring-0) VM structure.
4371 * @param pReq Pointer to the request packet.
4372 */
4373GMMR0DECL(int) GMMR0MapUnmapChunkReq(PGVM pGVM, PGMMMAPUNMAPCHUNKREQ pReq)
4374{
4375 /*
4376 * Validate input and pass it on.
4377 */
4378 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4379 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4380
4381 return GMMR0MapUnmapChunk(pGVM, pReq->idChunkMap, pReq->idChunkUnmap, &pReq->pvR3);
4382}
4383
4384
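/*
 * Minimal sketch of a request for the wrapper above, mapping one chunk and
 * unmapping another in a single call; idChunkToMap/idChunkToEvict are
 * hypothetical IDs a ring-3 mapping cache would supply, and either side can be
 * skipped by passing NIL_GMM_CHUNKID:
 * @code
 *      GMMMAPUNMAPCHUNKREQ Req;
 *      Req.Hdr.cbReq    = sizeof(Req);
 *      Req.idChunkMap   = idChunkToMap;    // hypothetical
 *      Req.idChunkUnmap = idChunkToEvict;  // hypothetical, or NIL_GMM_CHUNKID
 *      Req.pvR3         = NIL_RTR3PTR;
 *      int rc = GMMR0MapUnmapChunkReq(pGVM, &Req);
 *      // On success Req.pvR3 holds the ring-3 address of the newly mapped chunk.
 * @endcode
 */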
4385/**
4386 * Legacy mode API for supplying pages.
4387 *
4388 * The specified user address points to an allocation chunk sized block that
4389 * will be locked down and used by the GMM when the GM asks for pages.
4390 *
4391 * @returns VBox status code.
4392 * @param pGVM The global (ring-0) VM structure.
4393 * @param idCpu The VCPU id.
4394 * @param pvR3 Pointer to the chunk size memory block to lock down.
4395 */
4396GMMR0DECL(int) GMMR0SeedChunk(PGVM pGVM, VMCPUID idCpu, RTR3PTR pvR3)
4397{
4398#ifdef GMM_WITH_LEGACY_MODE
4399 /*
4400 * Validate input and get the basics.
4401 */
4402 PGMM pGMM;
4403 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4404 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
4405 if (RT_FAILURE(rc))
4406 return rc;
4407
4408 AssertPtrReturn(pvR3, VERR_INVALID_POINTER);
4409 AssertReturn(!(PAGE_OFFSET_MASK & pvR3), VERR_INVALID_POINTER);
4410
4411 if (!pGMM->fLegacyAllocationMode)
4412 {
4413 Log(("GMMR0SeedChunk: not in legacy allocation mode!\n"));
4414 return VERR_NOT_SUPPORTED;
4415 }
4416
4417 /*
4418 * Lock the memory and add it as new chunk with our hGVM.
4419 * (The GMM locking is done inside gmmR0RegisterChunk.)
4420 */
4421 RTR0MEMOBJ hMemObj;
4422 rc = RTR0MemObjLockUser(&hMemObj, pvR3, GMM_CHUNK_SIZE, RTMEM_PROT_READ | RTMEM_PROT_WRITE, NIL_RTR0PROCESS);
4423 if (RT_SUCCESS(rc))
4424 {
4425 rc = gmmR0RegisterChunk(pGMM, &pGVM->gmm.s.Private, hMemObj, pGVM->hSelf, GMM_CHUNK_FLAGS_SEEDED, NULL);
4426 if (RT_SUCCESS(rc))
4427 gmmR0MutexRelease(pGMM);
4428 else
4429 RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
4430 }
4431
4432 LogFlow(("GMMR0SeedChunk: rc=%d (pvR3=%p)\n", rc, pvR3));
4433 return rc;
4434#else
4435 RT_NOREF(pGVM, idCpu, pvR3);
4436 return VERR_NOT_SUPPORTED;
4437#endif
4438}
4439
4440#if defined(VBOX_WITH_RAM_IN_KERNEL) && !defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)
4441
4442/**
4443 * Gets the ring-0 virtual address for the given page.
4444 *
4445 * This is used by PGM when IEM and such wants to access guest RAM from ring-0.
4446 * One of the ASSUMPTIONS here is that the @a idPage is used by the VM and the
4447 * corresponding chunk will remain valid beyond the call (at least till the EMT
4448 * returns to ring-3).
4449 *
4450 * @returns VBox status code.
4451 * @param pGVM Pointer to the kernel-only VM instance data.
4452 * @param idPage The page ID.
4453 * @param ppv Where to store the address.
4454 * @thread EMT
4455 */
4456GMMR0DECL(int) GMMR0PageIdToVirt(PGVM pGVM, uint32_t idPage, void **ppv)
4457{
4458 *ppv = NULL;
4459 PGMM pGMM;
4460 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4461
4462 RTSpinlockAcquire(pGMM->hSpinLockTree);
4463
4464 int rc;
4465 PGMMCHUNK pChunk = gmmR0GetChunkLocked(pGMM, idPage >> GMM_CHUNKID_SHIFT);
4466 if (RT_LIKELY(pChunk))
4467 {
4468 const GMMPAGE *pPage = &pChunk->aPages[idPage & GMM_PAGEID_IDX_MASK];
4469 if (RT_LIKELY( ( GMM_PAGE_IS_PRIVATE(pPage)
4470 && pPage->Private.hGVM == pGVM->hSelf)
4471 || GMM_PAGE_IS_SHARED(pPage)))
4472 {
4473 AssertPtr(pChunk->pbMapping);
4474 *ppv = &pChunk->pbMapping[(idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT];
4475 rc = VINF_SUCCESS;
4476 }
4477 else
4478 rc = VERR_GMM_NOT_PAGE_OWNER;
4479 }
4480 else
4481 rc = VERR_GMM_PAGE_NOT_FOUND;
4482
4483 RTSpinlockRelease(pGMM->hSpinLockTree);
4484 return rc;
4485}
4486
4487#endif
4488
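/*
 * Simple usage sketch for GMMR0PageIdToVirt above; the copy target pvDst is
 * hypothetical, and the returned pointer is only safe to use while the owning
 * chunk stays valid, i.e. until the EMT returns to ring-3 as noted in the
 * function docs:
 * @code
 *      void *pvPage;
 *      int rc = GMMR0PageIdToVirt(pGVM, idPage, &pvPage);
 *      if (RT_SUCCESS(rc))
 *          memcpy(pvDst, pvPage, PAGE_SIZE);  // hypothetical ring-0 read of the guest page
 * @endcode
 */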
4489#ifdef VBOX_WITH_PAGE_SHARING
4490
4491# ifdef VBOX_STRICT
4492/**
4493 * For checksumming shared pages in strict builds.
4494 *
4495 * The purpose is making sure that a page doesn't change.
4496 *
4497 * @returns Checksum, 0 on failure.
4498 * @param pGMM The GMM instance data.
4499 * @param pGVM Pointer to the kernel-only VM instance data.
4500 * @param idPage The page ID.
4501 */
4502static uint32_t gmmR0StrictPageChecksum(PGMM pGMM, PGVM pGVM, uint32_t idPage)
4503{
4504 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
4505 AssertMsgReturn(pChunk, ("idPage=%#x\n", idPage), 0);
4506
4507 uint8_t *pbChunk;
4508 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
4509 return 0;
4510 uint8_t const *pbPage = pbChunk + ((idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
4511
4512 return RTCrc32(pbPage, PAGE_SIZE);
4513}
4514# endif /* VBOX_STRICT */
4515
4516
4517/**
4518 * Calculates the module hash value.
4519 *
4520 * @returns Hash value.
4521 * @param pszModuleName The module name.
4522 * @param pszVersion The module version string.
4523 */
4524static uint32_t gmmR0ShModCalcHash(const char *pszModuleName, const char *pszVersion)
4525{
4526 return RTStrHash1ExN(3, pszModuleName, RTSTR_MAX, "::", (size_t)2, pszVersion, RTSTR_MAX);
4527}
4528
4529
4530/**
4531 * Finds a global module.
4532 *
4533 * @returns Pointer to the global module on success, NULL if not found.
4534 * @param pGMM The GMM instance data.
4535 * @param uHash The hash as calculated by gmmR0ShModCalcHash.
4536 * @param cbModule The module size.
4537 * @param enmGuestOS The guest OS type.
4538 * @param cRegions The number of regions.
4539 * @param pszModuleName The module name.
4540 * @param pszVersion The module version.
4541 * @param paRegions The region descriptions.
4542 */
4543static PGMMSHAREDMODULE gmmR0ShModFindGlobal(PGMM pGMM, uint32_t uHash, uint32_t cbModule, VBOXOSFAMILY enmGuestOS,
4544 uint32_t cRegions, const char *pszModuleName, const char *pszVersion,
4545 struct VMMDEVSHAREDREGIONDESC const *paRegions)
4546{
4547 for (PGMMSHAREDMODULE pGblMod = (PGMMSHAREDMODULE)RTAvllU32Get(&pGMM->pGlobalSharedModuleTree, uHash);
4548 pGblMod;
4549 pGblMod = (PGMMSHAREDMODULE)pGblMod->Core.pList)
4550 {
4551 if (pGblMod->cbModule != cbModule)
4552 continue;
4553 if (pGblMod->enmGuestOS != enmGuestOS)
4554 continue;
4555 if (pGblMod->cRegions != cRegions)
4556 continue;
4557 if (strcmp(pGblMod->szName, pszModuleName))
4558 continue;
4559 if (strcmp(pGblMod->szVersion, pszVersion))
4560 continue;
4561
4562 uint32_t i;
4563 for (i = 0; i < cRegions; i++)
4564 {
4565 uint32_t off = paRegions[i].GCRegionAddr & PAGE_OFFSET_MASK;
4566 if (pGblMod->aRegions[i].off != off)
4567 break;
4568
4569 uint32_t cb = RT_ALIGN_32(paRegions[i].cbRegion + off, PAGE_SIZE);
4570 if (pGblMod->aRegions[i].cb != cb)
4571 break;
4572 }
4573
4574 if (i == cRegions)
4575 return pGblMod;
4576 }
4577
4578 return NULL;
4579}
4580
4581
4582/**
4583 * Creates a new global module.
4584 *
4585 * @returns VBox status code.
4586 * @param pGMM The GMM instance data.
4587 * @param uHash The hash as calculated by gmmR0ShModCalcHash.
4588 * @param cbModule The module size.
4589 * @param enmGuestOS The guest OS type.
4590 * @param cRegions The number of regions.
4591 * @param pszModuleName The module name.
4592 * @param pszVersion The module version.
4593 * @param paRegions The region descriptions.
4594 * @param ppGblMod Where to return the new module on success.
4595 */
4596static int gmmR0ShModNewGlobal(PGMM pGMM, uint32_t uHash, uint32_t cbModule, VBOXOSFAMILY enmGuestOS,
4597 uint32_t cRegions, const char *pszModuleName, const char *pszVersion,
4598 struct VMMDEVSHAREDREGIONDESC const *paRegions, PGMMSHAREDMODULE *ppGblMod)
4599{
4600 Log(("gmmR0ShModNewGlobal: %s %s size %#x os %u rgn %u\n", pszModuleName, pszVersion, cbModule, enmGuestOS, cRegions));
4601 if (pGMM->cShareableModules >= GMM_MAX_SHARED_GLOBAL_MODULES)
4602 {
4603 Log(("gmmR0ShModNewGlobal: Too many modules\n"));
4604 return VERR_GMM_TOO_MANY_GLOBAL_MODULES;
4605 }
4606
4607 PGMMSHAREDMODULE pGblMod = (PGMMSHAREDMODULE)RTMemAllocZ(RT_UOFFSETOF_DYN(GMMSHAREDMODULE, aRegions[cRegions]));
4608 if (!pGblMod)
4609 {
4610 Log(("gmmR0ShModNewGlobal: No memory\n"));
4611 return VERR_NO_MEMORY;
4612 }
4613
4614 pGblMod->Core.Key = uHash;
4615 pGblMod->cbModule = cbModule;
4616 pGblMod->cRegions = cRegions;
4617 pGblMod->cUsers = 1;
4618 pGblMod->enmGuestOS = enmGuestOS;
4619 strcpy(pGblMod->szName, pszModuleName);
4620 strcpy(pGblMod->szVersion, pszVersion);
4621
4622 for (uint32_t i = 0; i < cRegions; i++)
4623 {
4624 Log(("gmmR0ShModNewGlobal: rgn[%u]=%RGvLB%#x\n", i, paRegions[i].GCRegionAddr, paRegions[i].cbRegion));
4625 pGblMod->aRegions[i].off = paRegions[i].GCRegionAddr & PAGE_OFFSET_MASK;
4626 pGblMod->aRegions[i].cb = paRegions[i].cbRegion + pGblMod->aRegions[i].off;
4627 pGblMod->aRegions[i].cb = RT_ALIGN_32(pGblMod->aRegions[i].cb, PAGE_SIZE);
4628 pGblMod->aRegions[i].paidPages = NULL; /* allocated when needed. */
4629 }
4630
4631 bool fInsert = RTAvllU32Insert(&pGMM->pGlobalSharedModuleTree, &pGblMod->Core);
4632 Assert(fInsert); NOREF(fInsert);
4633 pGMM->cShareableModules++;
4634
4635 *ppGblMod = pGblMod;
4636 return VINF_SUCCESS;
4637}
4638
4639
4640/**
4641 * Deletes a global module which is no longer referenced by anyone.
4642 *
4643 * @param pGMM The GMM instance data.
4644 * @param pGblMod The module to delete.
4645 */
4646static void gmmR0ShModDeleteGlobal(PGMM pGMM, PGMMSHAREDMODULE pGblMod)
4647{
4648 Assert(pGblMod->cUsers == 0);
4649 Assert(pGMM->cShareableModules > 0 && pGMM->cShareableModules <= GMM_MAX_SHARED_GLOBAL_MODULES);
4650
4651 void *pvTest = RTAvllU32RemoveNode(&pGMM->pGlobalSharedModuleTree, &pGblMod->Core);
4652 Assert(pvTest == pGblMod); NOREF(pvTest);
4653 pGMM->cShareableModules--;
4654
4655 uint32_t i = pGblMod->cRegions;
4656 while (i-- > 0)
4657 {
4658 if (pGblMod->aRegions[i].paidPages)
4659 {
4660 /* We don't do anything to the pages as they are handled by the
4661 copy-on-write mechanism in PGM. */
4662 RTMemFree(pGblMod->aRegions[i].paidPages);
4663 pGblMod->aRegions[i].paidPages = NULL;
4664 }
4665 }
4666 RTMemFree(pGblMod);
4667}
4668
4669
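/**
 * Creates the per-VM tracking record for a shared module.
 *
 * The caller is responsible for linking it to a global module afterwards.
 *
 * @returns VBox status code.
 * @param pGVM Pointer to the GVM instance.
 * @param GCBaseAddr The module base address, used as the per-VM key.
 * @param cRegions The number of regions.
 * @param paRegions The region descriptions.
 * @param ppRecVM Where to return the new per-VM record on success.
 */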
4670static int gmmR0ShModNewPerVM(PGVM pGVM, RTGCPTR GCBaseAddr, uint32_t cRegions, const VMMDEVSHAREDREGIONDESC *paRegions,
4671 PGMMSHAREDMODULEPERVM *ppRecVM)
4672{
4673 if (pGVM->gmm.s.Stats.cShareableModules >= GMM_MAX_SHARED_PER_VM_MODULES)
4674 return VERR_GMM_TOO_MANY_PER_VM_MODULES;
4675
4676 PGMMSHAREDMODULEPERVM pRecVM;
4677 pRecVM = (PGMMSHAREDMODULEPERVM)RTMemAllocZ(RT_UOFFSETOF_DYN(GMMSHAREDMODULEPERVM, aRegionsGCPtrs[cRegions]));
4678 if (!pRecVM)
4679 return VERR_NO_MEMORY;
4680
4681 pRecVM->Core.Key = GCBaseAddr;
4682 for (uint32_t i = 0; i < cRegions; i++)
4683 pRecVM->aRegionsGCPtrs[i] = paRegions[i].GCRegionAddr;
4684
4685 bool fInsert = RTAvlGCPtrInsert(&pGVM->gmm.s.pSharedModuleTree, &pRecVM->Core);
4686 Assert(fInsert); NOREF(fInsert);
4687 pGVM->gmm.s.Stats.cShareableModules++;
4688
4689 *ppRecVM = pRecVM;
4690 return VINF_SUCCESS;
4691}
4692
4693
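/**
 * Deletes a per-VM shared module record and releases the global module it
 * references, deleting that too when the last reference goes away.
 *
 * @param pGMM The GMM instance data.
 * @param pGVM Pointer to the GVM instance.
 * @param pRecVM The per-VM record to delete.
 * @param fRemove Whether to remove the record from the per-VM tree first.
 */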
4694static void gmmR0ShModDeletePerVM(PGMM pGMM, PGVM pGVM, PGMMSHAREDMODULEPERVM pRecVM, bool fRemove)
4695{
4696 /*
4697 * Free the per-VM module.
4698 */
4699 PGMMSHAREDMODULE pGblMod = pRecVM->pGlobalModule;
4700 pRecVM->pGlobalModule = NULL;
4701
4702 if (fRemove)
4703 {
4704 void *pvTest = RTAvlGCPtrRemove(&pGVM->gmm.s.pSharedModuleTree, pRecVM->Core.Key);
4705 Assert(pvTest == &pRecVM->Core); NOREF(pvTest);
4706 }
4707
4708 RTMemFree(pRecVM);
4709
4710 /*
4711 * Release the global module.
4712 * (In the registration bailout case, it might not be.)
4713 */
4714 if (pGblMod)
4715 {
4716 Assert(pGblMod->cUsers > 0);
4717 pGblMod->cUsers--;
4718 if (pGblMod->cUsers == 0)
4719 gmmR0ShModDeleteGlobal(pGMM, pGblMod);
4720 }
4721}
4722
4723#endif /* VBOX_WITH_PAGE_SHARING */
4724
4725/**
4726 * Registers a new shared module for the VM.
4727 *
4728 * @returns VBox status code.
4729 * @param pGVM The global (ring-0) VM structure.
4730 * @param idCpu The VCPU id.
4731 * @param enmGuestOS The guest OS type.
4732 * @param pszModuleName The module name.
4733 * @param pszVersion The module version.
4734 * @param GCPtrModBase The module base address.
4735 * @param cbModule The module size.
4736 * @param cRegions The number of shared region descriptors.
4737 * @param paRegions Pointer to an array of shared region(s).
4738 * @thread EMT(idCpu)
4739 */
4740GMMR0DECL(int) GMMR0RegisterSharedModule(PGVM pGVM, VMCPUID idCpu, VBOXOSFAMILY enmGuestOS, char *pszModuleName,
4741 char *pszVersion, RTGCPTR GCPtrModBase, uint32_t cbModule,
4742 uint32_t cRegions, struct VMMDEVSHAREDREGIONDESC const *paRegions)
4743{
4744#ifdef VBOX_WITH_PAGE_SHARING
4745 /*
4746 * Validate input and get the basics.
4747 *
4748 * Note! Turns out the module size does not necessarily match the size of the
4749 * regions. (iTunes on XP)
4750 */
4751 PGMM pGMM;
4752 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4753 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
4754 if (RT_FAILURE(rc))
4755 return rc;
4756
4757 if (RT_UNLIKELY(cRegions > VMMDEVSHAREDREGIONDESC_MAX))
4758 return VERR_GMM_TOO_MANY_REGIONS;
4759
4760 if (RT_UNLIKELY(cbModule == 0 || cbModule > _1G))
4761 return VERR_GMM_BAD_SHARED_MODULE_SIZE;
4762
4763 uint32_t cbTotal = 0;
4764 for (uint32_t i = 0; i < cRegions; i++)
4765 {
4766 if (RT_UNLIKELY(paRegions[i].cbRegion == 0 || paRegions[i].cbRegion > _1G))
4767 return VERR_GMM_SHARED_MODULE_BAD_REGIONS_SIZE;
4768
4769 cbTotal += paRegions[i].cbRegion;
4770 if (RT_UNLIKELY(cbTotal > _1G))
4771 return VERR_GMM_SHARED_MODULE_BAD_REGIONS_SIZE;
4772 }
4773
4774 AssertPtrReturn(pszModuleName, VERR_INVALID_POINTER);
4775 if (RT_UNLIKELY(!memchr(pszModuleName, '\0', GMM_SHARED_MODULE_MAX_NAME_STRING)))
4776 return VERR_GMM_MODULE_NAME_TOO_LONG;
4777
4778 AssertPtrReturn(pszVersion, VERR_INVALID_POINTER);
4779 if (RT_UNLIKELY(!memchr(pszVersion, '\0', GMM_SHARED_MODULE_MAX_VERSION_STRING)))
4780 return VERR_GMM_MODULE_NAME_TOO_LONG;
4781
4782 uint32_t const uHash = gmmR0ShModCalcHash(pszModuleName, pszVersion);
4783 Log(("GMMR0RegisterSharedModule %s %s base %RGv size %x hash %x\n", pszModuleName, pszVersion, GCPtrModBase, cbModule, uHash));
4784
4785 /*
4786 * Take the semaphore and do some more validations.
4787 */
4788 gmmR0MutexAcquire(pGMM);
4789 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4790 {
4791 /*
4792 * Check if this module is already locally registered and register
4793 * it if it isn't. The base address is a unique module identifier
4794 * locally.
4795 */
4796 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)RTAvlGCPtrGet(&pGVM->gmm.s.pSharedModuleTree, GCPtrModBase);
4797 bool fNewModule = pRecVM == NULL;
4798 if (fNewModule)
4799 {
4800 rc = gmmR0ShModNewPerVM(pGVM, GCPtrModBase, cRegions, paRegions, &pRecVM);
4801 if (RT_SUCCESS(rc))
4802 {
4803 /*
4804 * Find a matching global module, register a new one if needed.
4805 */
4806 PGMMSHAREDMODULE pGblMod = gmmR0ShModFindGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4807 pszModuleName, pszVersion, paRegions);
4808 if (!pGblMod)
4809 {
4810 Assert(fNewModule);
4811 rc = gmmR0ShModNewGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4812 pszModuleName, pszVersion, paRegions, &pGblMod);
4813 if (RT_SUCCESS(rc))
4814 {
4815 pRecVM->pGlobalModule = pGblMod; /* (One reference returned by gmmR0ShModNewGlobal.) */
4816 Log(("GMMR0RegisterSharedModule: new module %s %s\n", pszModuleName, pszVersion));
4817 }
4818 else
4819 gmmR0ShModDeletePerVM(pGMM, pGVM, pRecVM, true /*fRemove*/);
4820 }
4821 else
4822 {
4823 Assert(pGblMod->cUsers > 0 && pGblMod->cUsers < UINT32_MAX / 2);
4824 pGblMod->cUsers++;
4825 pRecVM->pGlobalModule = pGblMod;
4826
4827 Log(("GMMR0RegisterSharedModule: new per vm module %s %s, gbl users %d\n", pszModuleName, pszVersion, pGblMod->cUsers));
4828 }
4829 }
4830 }
4831 else
4832 {
4833 /*
4834 * Attempt to re-register an existing module.
4835 */
4836 PGMMSHAREDMODULE pGblMod = gmmR0ShModFindGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4837 pszModuleName, pszVersion, paRegions);
4838 if (pRecVM->pGlobalModule == pGblMod)
4839 {
4840 Log(("GMMR0RegisterSharedModule: already registered %s %s, gbl users %d\n", pszModuleName, pszVersion, pGblMod->cUsers));
4841 rc = VINF_GMM_SHARED_MODULE_ALREADY_REGISTERED;
4842 }
4843 else
4844 {
4845 /** @todo may have to unregister+register when this happens in case it's caused
4846 * by VBoxService crashing and being restarted... */
4847 Log(("GMMR0RegisterSharedModule: Address clash!\n"
4848 " incoming at %RGvLB%#x %s %s rgns %u\n"
4849 " existing at %RGvLB%#x %s %s rgns %u\n",
4850 GCPtrModBase, cbModule, pszModuleName, pszVersion, cRegions,
4851 pRecVM->Core.Key, pRecVM->pGlobalModule->cbModule, pRecVM->pGlobalModule->szName,
4852 pRecVM->pGlobalModule->szVersion, pRecVM->pGlobalModule->cRegions));
4853 rc = VERR_GMM_SHARED_MODULE_ADDRESS_CLASH;
4854 }
4855 }
4856 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4857 }
4858 else
4859 rc = VERR_GMM_IS_NOT_SANE;
4860
4861 gmmR0MutexRelease(pGMM);
4862 return rc;
4863#else
4864
4865 NOREF(pGVM); NOREF(idCpu); NOREF(enmGuestOS); NOREF(pszModuleName); NOREF(pszVersion);
4866 NOREF(GCPtrModBase); NOREF(cbModule); NOREF(cRegions); NOREF(paRegions);
4867 return VERR_NOT_IMPLEMENTED;
4868#endif
4869}
4870
4871
4872/**
4873 * VMMR0 request wrapper for GMMR0RegisterSharedModule.
4874 *
4875 * @returns see GMMR0RegisterSharedModule.
4876 * @param pGVM The global (ring-0) VM structure.
4877 * @param idCpu The VCPU id.
4878 * @param pReq Pointer to the request packet.
4879 */
4880GMMR0DECL(int) GMMR0RegisterSharedModuleReq(PGVM pGVM, VMCPUID idCpu, PGMMREGISTERSHAREDMODULEREQ pReq)
4881{
4882 /*
4883 * Validate input and pass it on.
4884 */
4885 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4886 AssertMsgReturn( pReq->Hdr.cbReq >= sizeof(*pReq)
4887 && pReq->Hdr.cbReq == RT_UOFFSETOF_DYN(GMMREGISTERSHAREDMODULEREQ, aRegions[pReq->cRegions]),
4888 ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4889
4890 /* Pass back return code in the request packet to preserve informational codes. (VMMR3CallR0 chokes on them) */
4891 pReq->rc = GMMR0RegisterSharedModule(pGVM, idCpu, pReq->enmGuestOS, pReq->szName, pReq->szVersion,
4892 pReq->GCBaseAddr, pReq->cbModule, pReq->cRegions, pReq->aRegions);
4893 return VINF_SUCCESS;
4894}
4895
4896
4897/**
4898 * Unregisters a shared module for the VM
4899 *
4900 * @returns VBox status code.
4901 * @param pGVM The global (ring-0) VM structure.
4902 * @param idCpu The VCPU id.
4903 * @param pszModuleName The module name.
4904 * @param pszVersion The module version.
4905 * @param GCPtrModBase The module base address.
4906 * @param cbModule The module size.
4907 */
4908GMMR0DECL(int) GMMR0UnregisterSharedModule(PGVM pGVM, VMCPUID idCpu, char *pszModuleName, char *pszVersion,
4909 RTGCPTR GCPtrModBase, uint32_t cbModule)
4910{
4911#ifdef VBOX_WITH_PAGE_SHARING
4912 /*
4913 * Validate input and get the basics.
4914 */
4915 PGMM pGMM;
4916 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4917 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
4918 if (RT_FAILURE(rc))
4919 return rc;
4920
4921 AssertPtrReturn(pszModuleName, VERR_INVALID_POINTER);
4922 AssertPtrReturn(pszVersion, VERR_INVALID_POINTER);
4923 if (RT_UNLIKELY(!memchr(pszModuleName, '\0', GMM_SHARED_MODULE_MAX_NAME_STRING)))
4924 return VERR_GMM_MODULE_NAME_TOO_LONG;
4925 if (RT_UNLIKELY(!memchr(pszVersion, '\0', GMM_SHARED_MODULE_MAX_VERSION_STRING)))
4926 return VERR_GMM_MODULE_NAME_TOO_LONG;
4927
4928 Log(("GMMR0UnregisterSharedModule %s %s base=%RGv size %x\n", pszModuleName, pszVersion, GCPtrModBase, cbModule));
4929
4930 /*
4931 * Take the semaphore and do some more validations.
4932 */
4933 gmmR0MutexAcquire(pGMM);
4934 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4935 {
4936 /*
4937 * Locate and remove the specified module.
4938 */
4939 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)RTAvlGCPtrGet(&pGVM->gmm.s.pSharedModuleTree, GCPtrModBase);
4940 if (pRecVM)
4941 {
4942 /** @todo Do we need to do more validations here, like that the
4943 * name + version + cbModule matches? */
4944 NOREF(cbModule);
4945 Assert(pRecVM->pGlobalModule);
4946 gmmR0ShModDeletePerVM(pGMM, pGVM, pRecVM, true /*fRemove*/);
4947 }
4948 else
4949 rc = VERR_GMM_SHARED_MODULE_NOT_FOUND;
4950
4951 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4952 }
4953 else
4954 rc = VERR_GMM_IS_NOT_SANE;
4955
4956 gmmR0MutexRelease(pGMM);
4957 return rc;
4958#else
4959
4960 NOREF(pGVM); NOREF(idCpu); NOREF(pszModuleName); NOREF(pszVersion); NOREF(GCPtrModBase); NOREF(cbModule);
4961 return VERR_NOT_IMPLEMENTED;
4962#endif
4963}
4964
4965
4966/**
4967 * VMMR0 request wrapper for GMMR0UnregisterSharedModule.
4968 *
4969 * @returns see GMMR0UnregisterSharedModule.
4970 * @param pGVM The global (ring-0) VM structure.
4971 * @param idCpu The VCPU id.
4972 * @param pReq Pointer to the request packet.
4973 */
4974GMMR0DECL(int) GMMR0UnregisterSharedModuleReq(PGVM pGVM, VMCPUID idCpu, PGMMUNREGISTERSHAREDMODULEREQ pReq)
4975{
4976 /*
4977 * Validate input and pass it on.
4978 */
4979 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4980 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4981
4982 return GMMR0UnregisterSharedModule(pGVM, idCpu, pReq->szName, pReq->szVersion, pReq->GCBaseAddr, pReq->cbModule);
4983}
4984
4985#ifdef VBOX_WITH_PAGE_SHARING
4986
4987/**
4988 * Increase the use count of a shared page, the page is known to exist and be valid and such.
4989 *
4990 * @param pGMM Pointer to the GMM instance.
4991 * @param pGVM Pointer to the GVM instance.
4992 * @param pPage The page structure.
4993 */
4994DECLINLINE(void) gmmR0UseSharedPage(PGMM pGMM, PGVM pGVM, PGMMPAGE pPage)
4995{
4996 Assert(pGMM->cSharedPages > 0);
4997 Assert(pGMM->cAllocatedPages > 0);
4998
4999 pGMM->cDuplicatePages++;
5000
5001 pPage->Shared.cRefs++;
5002 pGVM->gmm.s.Stats.cSharedPages++;
5003 pGVM->gmm.s.Stats.Allocated.cBasePages++;
5004}
5005
5006
5007/**
5008 * Converts a private page to a shared page. The page is known to exist and to be valid.
5009 *
5010 * @param pGMM Pointer to the GMM instance.
5011 * @param pGVM Pointer to the GVM instance.
5012 * @param HCPhys The host physical address.
5013 * @param idPage The page ID.
5014 * @param pPage The page structure.
5015 * @param pPageDesc The shared page descriptor.
5016 */
5017DECLINLINE(void) gmmR0ConvertToSharedPage(PGMM pGMM, PGVM pGVM, RTHCPHYS HCPhys, uint32_t idPage, PGMMPAGE pPage,
5018 PGMMSHAREDPAGEDESC pPageDesc)
5019{
5020 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
5021 Assert(pChunk);
5022 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
5023 Assert(GMM_PAGE_IS_PRIVATE(pPage));
5024
5025 pChunk->cPrivate--;
5026 pChunk->cShared++;
5027
5028 pGMM->cSharedPages++;
5029
5030 pGVM->gmm.s.Stats.cSharedPages++;
5031 pGVM->gmm.s.Stats.cPrivatePages--;
5032
5033 /* Modify the page structure. */
5034 pPage->Shared.pfn = (uint32_t)(uint64_t)(HCPhys >> PAGE_SHIFT);
5035 pPage->Shared.cRefs = 1;
5036#ifdef VBOX_STRICT
5037 pPageDesc->u32StrictChecksum = gmmR0StrictPageChecksum(pGMM, pGVM, idPage);
5038 pPage->Shared.u14Checksum = pPageDesc->u32StrictChecksum;
5039#else
5040 NOREF(pPageDesc);
5041 pPage->Shared.u14Checksum = 0;
5042#endif
5043 pPage->Shared.u2State = GMM_PAGE_STATE_SHARED;
5044}
5045
5046
5047static int gmmR0SharedModuleCheckPageFirstTime(PGMM pGMM, PGVM pGVM, PGMMSHAREDMODULE pModule,
5048 unsigned idxRegion, unsigned idxPage,
5049 PGMMSHAREDPAGEDESC pPageDesc, PGMMSHAREDREGIONDESC pGlobalRegion)
5050{
5051 NOREF(pModule);
5052
5053 /* Easy case: just change the internal page type. */
5054 PGMMPAGE pPage = gmmR0GetPage(pGMM, pPageDesc->idPage);
5055 AssertMsgReturn(pPage, ("idPage=%#x (GCPhys=%RGp HCPhys=%RHp idxRegion=%#x idxPage=%#x) #1\n",
5056 pPageDesc->idPage, pPageDesc->GCPhys, pPageDesc->HCPhys, idxRegion, idxPage),
5057 VERR_PGM_PHYS_INVALID_PAGE_ID);
5058 NOREF(idxRegion);
5059
5060 AssertMsg(pPageDesc->GCPhys == (pPage->Private.pfn << 12), ("desc %RGp gmm %RGp\n", pPageDesc->GCPhys, (pPage->Private.pfn << 12)));
5061
5062 gmmR0ConvertToSharedPage(pGMM, pGVM, pPageDesc->HCPhys, pPageDesc->idPage, pPage, pPageDesc);
5063
5064 /* Keep track of these references. */
5065 pGlobalRegion->paidPages[idxPage] = pPageDesc->idPage;
5066
5067 return VINF_SUCCESS;
5068}
5069
5070/**
5071 * Checks a page in the specified shared module region for changes.
5072 *
5073 * Performs the following tasks:
5074 * - If a shared page is new, then it changes the GMM page type to shared and
5075 * returns it in the pPageDesc descriptor.
5076 * - If a shared page already exists, then it checks if the VM page is
5077 * identical and, if so, frees the VM page and returns the shared page in
5078 * the pPageDesc descriptor.
5079 *
5080 * @remarks ASSUMES the caller has acquired the GMM semaphore!!
5081 *
5082 * @returns VBox status code.
5083 * @param pGVM Pointer to the GVM instance data.
5084 * @param pModule The module description.
5085 * @param idxRegion The region index.
5086 * @param idxPage The page index.
5087 * @param pPageDesc The page descriptor.
5088 */
5089GMMR0DECL(int) GMMR0SharedModuleCheckPage(PGVM pGVM, PGMMSHAREDMODULE pModule, uint32_t idxRegion, uint32_t idxPage,
5090 PGMMSHAREDPAGEDESC pPageDesc)
5091{
5092 int rc;
5093 PGMM pGMM;
5094 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5095 pPageDesc->u32StrictChecksum = 0;
5096
5097 AssertMsgReturn(idxRegion < pModule->cRegions,
5098 ("idxRegion=%#x cRegions=%#x %s %s\n", idxRegion, pModule->cRegions, pModule->szName, pModule->szVersion),
5099 VERR_INVALID_PARAMETER);
5100
5101 uint32_t const cPages = pModule->aRegions[idxRegion].cb >> PAGE_SHIFT;
5102 AssertMsgReturn(idxPage < cPages,
5103 ("idxRegion=%#x cRegions=%#x %s %s\n", idxRegion, pModule->cRegions, pModule->szName, pModule->szVersion),
5104 VERR_INVALID_PARAMETER);
5105
5106 LogFlow(("GMMR0SharedModuleCheckRange %s base %RGv region %d idxPage %d\n", pModule->szName, pModule->Core.Key, idxRegion, idxPage));
5107
5108 /*
5109 * First time; create a page descriptor array.
5110 */
5111 PGMMSHAREDREGIONDESC pGlobalRegion = &pModule->aRegions[idxRegion];
5112 if (!pGlobalRegion->paidPages)
5113 {
5114 Log(("Allocate page descriptor array for %d pages\n", cPages));
5115 pGlobalRegion->paidPages = (uint32_t *)RTMemAlloc(cPages * sizeof(pGlobalRegion->paidPages[0]));
5116 AssertReturn(pGlobalRegion->paidPages, VERR_NO_MEMORY);
5117
5118 /* Invalidate all descriptors. */
5119 uint32_t i = cPages;
5120 while (i-- > 0)
5121 pGlobalRegion->paidPages[i] = NIL_GMM_PAGEID;
5122 }
5123
5124 /*
5125 * Is this the first time we've seen this shared page?
5126 */
5127 if (pGlobalRegion->paidPages[idxPage] == NIL_GMM_PAGEID)
5128 {
5129 Log(("New shared page guest %RGp host %RHp\n", pPageDesc->GCPhys, pPageDesc->HCPhys));
5130 return gmmR0SharedModuleCheckPageFirstTime(pGMM, pGVM, pModule, idxRegion, idxPage, pPageDesc, pGlobalRegion);
5131 }
5132
5133 /*
5134 * We've seen it before...
5135 */
5136 Log(("Replace existing page guest %RGp host %RHp id %#x -> id %#x\n",
5137 pPageDesc->GCPhys, pPageDesc->HCPhys, pPageDesc->idPage, pGlobalRegion->paidPages[idxPage]));
5138 Assert(pPageDesc->idPage != pGlobalRegion->paidPages[idxPage]);
5139
5140 /*
5141 * Get the shared page source.
5142 */
5143 PGMMPAGE pPage = gmmR0GetPage(pGMM, pGlobalRegion->paidPages[idxPage]);
5144 AssertMsgReturn(pPage, ("idPage=%#x (idxRegion=%#x idxPage=%#x) #2\n", pPageDesc->idPage, idxRegion, idxPage),
5145 VERR_PGM_PHYS_INVALID_PAGE_ID);
5146
5147 if (pPage->Common.u2State != GMM_PAGE_STATE_SHARED)
5148 {
5149 /*
5150 * Page was freed at some point; invalidate this entry.
5151 */
5152 /** @todo this isn't really bullet proof. */
5153 Log(("Old shared page was freed -> create a new one\n"));
5154 pGlobalRegion->paidPages[idxPage] = NIL_GMM_PAGEID;
5155 return gmmR0SharedModuleCheckPageFirstTime(pGMM, pGVM, pModule, idxRegion, idxPage, pPageDesc, pGlobalRegion);
5156 }
5157
5158 Log(("Replace existing page guest host %RHp -> %RHp\n", pPageDesc->HCPhys, ((uint64_t)pPage->Shared.pfn) << PAGE_SHIFT));
5159
5160 /*
5161 * Calculate the virtual address of the local page.
5162 */
5163 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pPageDesc->idPage >> GMM_CHUNKID_SHIFT);
5164 AssertMsgReturn(pChunk, ("idPage=%#x (idxRegion=%#x idxPage=%#x) #4\n", pPageDesc->idPage, idxRegion, idxPage),
5165 VERR_PGM_PHYS_INVALID_PAGE_ID);
5166
5167 uint8_t *pbChunk;
5168 AssertMsgReturn(gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk),
5169 ("idPage=%#x (idxRegion=%#x idxPage=%#x) #3\n", pPageDesc->idPage, idxRegion, idxPage),
5170 VERR_PGM_PHYS_INVALID_PAGE_ID);
5171 uint8_t *pbLocalPage = pbChunk + ((pPageDesc->idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
5172
5173 /*
5174 * Calculate the virtual address of the shared page.
5175 */
5176 pChunk = gmmR0GetChunk(pGMM, pGlobalRegion->paidPages[idxPage] >> GMM_CHUNKID_SHIFT);
5177 Assert(pChunk); /* can't fail as gmmR0GetPage succeeded. */
5178
5179 /*
5180 * Get the virtual address of the physical page; map the chunk into the VM
5181 * process if not already done.
5182 */
5183 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5184 {
5185 Log(("Map chunk into process!\n"));
5186 rc = gmmR0MapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/, (PRTR3PTR)&pbChunk);
5187 AssertRCReturn(rc, rc);
5188 }
5189 uint8_t *pbSharedPage = pbChunk + ((pGlobalRegion->paidPages[idxPage] & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
5190
5191#ifdef VBOX_STRICT
5192 pPageDesc->u32StrictChecksum = RTCrc32(pbSharedPage, PAGE_SIZE);
5193 uint32_t uChecksum = pPageDesc->u32StrictChecksum & UINT32_C(0x00003fff);
5194 AssertMsg(!uChecksum || uChecksum == pPage->Shared.u14Checksum || !pPage->Shared.u14Checksum,
5195 ("%#x vs %#x - idPage=%#x - %s %s\n", uChecksum, pPage->Shared.u14Checksum,
5196 pGlobalRegion->paidPages[idxPage], pModule->szName, pModule->szVersion));
5197#endif
5198
5199 /** @todo write ASMMemComparePage. */
5200 if (memcmp(pbSharedPage, pbLocalPage, PAGE_SIZE))
5201 {
5202 Log(("Unexpected differences found between local and shared page; skip\n"));
5203 /* Signal to the caller that this one hasn't changed. */
5204 pPageDesc->idPage = NIL_GMM_PAGEID;
5205 return VINF_SUCCESS;
5206 }
5207
5208 /*
5209 * Free the old local page.
5210 */
5211 GMMFREEPAGEDESC PageDesc;
5212 PageDesc.idPage = pPageDesc->idPage;
5213 rc = gmmR0FreePages(pGMM, pGVM, 1, &PageDesc, GMMACCOUNT_BASE);
5214 AssertRCReturn(rc, rc);
5215
5216 gmmR0UseSharedPage(pGMM, pGVM, pPage);
5217
5218 /*
5219 * Pass along the new physical address & page id.
5220 */
5221 pPageDesc->HCPhys = ((uint64_t)pPage->Shared.pfn) << PAGE_SHIFT;
5222 pPageDesc->idPage = pGlobalRegion->paidPages[idxPage];
5223
5224 return VINF_SUCCESS;
5225}
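
/*
 * Illustrative sketch (not part of the original source) of the caller-side
 * contract, based on how the page descriptor is used above. The caller fills
 * in its current private page; on return the descriptor either carries
 * NIL_GMM_PAGEID (content differed, nothing changed) or the ID and host
 * address of the shared page that now backs the guest page. Field names match
 * the usage in this function; the surrounding PGM plumbing and the variables
 * holding the current page info are assumptions for the example.
 *
 * @code
 *      GMMSHAREDPAGEDESC PageDesc;
 *      PageDesc.idPage            = idMyPrivatePage;   // current private page ID (caller supplied)
 *      PageDesc.GCPhys            = GCPhysGuestPage;   // guest physical address of the page
 *      PageDesc.HCPhys            = HCPhysGuestPage;   // current host physical address
 *      PageDesc.u32StrictChecksum = 0;
 *      int rc = GMMR0SharedModuleCheckPage(pGVM, pModule, idxRegion, idxPage, &PageDesc);
 *      if (RT_SUCCESS(rc) && PageDesc.idPage != NIL_GMM_PAGEID)
 *      {
 *          // The private page was freed; remap the guest page to
 *          // PageDesc.HCPhys / PageDesc.idPage (now the shared page).
 *      }
 * @endcode
 */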
5226
5227
5228/**
5229 * RTAvlGCPtrDestroy callback.
5230 *
5231 * @returns VINF_SUCCESS.
5232 * @param pNode The node to destroy.
5233 * @param pvArgs Pointer to an argument packet.
5234 */
5235static DECLCALLBACK(int) gmmR0CleanupSharedModule(PAVLGCPTRNODECORE pNode, void *pvArgs)
5236{
5237 gmmR0ShModDeletePerVM(((GMMR0SHMODPERVMDTORARGS *)pvArgs)->pGMM,
5238 ((GMMR0SHMODPERVMDTORARGS *)pvArgs)->pGVM,
5239 (PGMMSHAREDMODULEPERVM)pNode,
5240 false /*fRemove*/);
5241 return VINF_SUCCESS;
5242}
5243
5244
5245/**
5246 * Used by GMMR0CleanupVM to clean up shared modules.
5247 *
5248 * This is called without taking the GMM lock so that it can be yielded as
5249 * needed here.
5250 *
5251 * @param pGMM The GMM handle.
5252 * @param pGVM The global VM handle.
5253 */
5254static void gmmR0SharedModuleCleanup(PGMM pGMM, PGVM pGVM)
5255{
5256 gmmR0MutexAcquire(pGMM);
5257 GMM_CHECK_SANITY_UPON_ENTERING(pGMM);
5258
5259 GMMR0SHMODPERVMDTORARGS Args;
5260 Args.pGVM = pGVM;
5261 Args.pGMM = pGMM;
5262 RTAvlGCPtrDestroy(&pGVM->gmm.s.pSharedModuleTree, gmmR0CleanupSharedModule, &Args);
5263
5264 AssertMsg(pGVM->gmm.s.Stats.cShareableModules == 0, ("%d\n", pGVM->gmm.s.Stats.cShareableModules));
5265 pGVM->gmm.s.Stats.cShareableModules = 0;
5266
5267 gmmR0MutexRelease(pGMM);
5268}
5269
5270#endif /* VBOX_WITH_PAGE_SHARING */
5271
5272/**
5273 * Removes all shared modules for the specified VM.
5274 *
5275 * @returns VBox status code.
5276 * @param pGVM The global (ring-0) VM structure.
5277 * @param idCpu The VCPU id.
5278 */
5279GMMR0DECL(int) GMMR0ResetSharedModules(PGVM pGVM, VMCPUID idCpu)
5280{
5281#ifdef VBOX_WITH_PAGE_SHARING
5282 /*
5283 * Validate input and get the basics.
5284 */
5285 PGMM pGMM;
5286 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5287 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
5288 if (RT_FAILURE(rc))
5289 return rc;
5290
5291 /*
5292 * Take the semaphore and do some more validations.
5293 */
5294 gmmR0MutexAcquire(pGMM);
5295 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5296 {
5297 Log(("GMMR0ResetSharedModules\n"));
5298 GMMR0SHMODPERVMDTORARGS Args;
5299 Args.pGVM = pGVM;
5300 Args.pGMM = pGMM;
5301 RTAvlGCPtrDestroy(&pGVM->gmm.s.pSharedModuleTree, gmmR0CleanupSharedModule, &Args);
5302 pGVM->gmm.s.Stats.cShareableModules = 0;
5303
5304 rc = VINF_SUCCESS;
5305 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
5306 }
5307 else
5308 rc = VERR_GMM_IS_NOT_SANE;
5309
5310 gmmR0MutexRelease(pGMM);
5311 return rc;
5312#else
5313 RT_NOREF(pGVM, idCpu);
5314 return VERR_NOT_IMPLEMENTED;
5315#endif
5316}
5317
5318#ifdef VBOX_WITH_PAGE_SHARING
5319
5320/**
5321 * Tree enumeration callback for checking a shared module.
5322 */
5323static DECLCALLBACK(int) gmmR0CheckSharedModule(PAVLGCPTRNODECORE pNode, void *pvUser)
5324{
5325 GMMCHECKSHAREDMODULEINFO *pArgs = (GMMCHECKSHAREDMODULEINFO*)pvUser;
5326 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)pNode;
5327 PGMMSHAREDMODULE pGblMod = pRecVM->pGlobalModule;
5328
5329 Log(("gmmR0CheckSharedModule: check %s %s base=%RGv size=%x\n",
5330 pGblMod->szName, pGblMod->szVersion, pGblMod->Core.Key, pGblMod->cbModule));
5331
5332 int rc = PGMR0SharedModuleCheck(pArgs->pGVM, pArgs->pGVM, pArgs->idCpu, pGblMod, pRecVM->aRegionsGCPtrs);
5333 if (RT_FAILURE(rc))
5334 return rc;
5335 return VINF_SUCCESS;
5336}
5337
5338#endif /* VBOX_WITH_PAGE_SHARING */
5339
5340/**
5341 * Check all shared modules for the specified VM.
5342 *
5343 * @returns VBox status code.
5344 * @param pGVM The global (ring-0) VM structure.
5345 * @param idCpu The calling EMT number.
5346 * @thread EMT(idCpu)
5347 */
5348GMMR0DECL(int) GMMR0CheckSharedModules(PGVM pGVM, VMCPUID idCpu)
5349{
5350#ifdef VBOX_WITH_PAGE_SHARING
5351 /*
5352 * Validate input and get the basics.
5353 */
5354 PGMM pGMM;
5355 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5356 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
5357 if (RT_FAILURE(rc))
5358 return rc;
5359
5360# ifndef DEBUG_sandervl
5361 /*
5362 * Take the semaphore and do some more validations.
5363 */
5364 gmmR0MutexAcquire(pGMM);
5365# endif
5366 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5367 {
5368 /*
5369 * Walk the tree, checking each module.
5370 */
5371 Log(("GMMR0CheckSharedModules\n"));
5372
5373 GMMCHECKSHAREDMODULEINFO Args;
5374 Args.pGVM = pGVM;
5375 Args.idCpu = idCpu;
5376 rc = RTAvlGCPtrDoWithAll(&pGVM->gmm.s.pSharedModuleTree, true /* fFromLeft */, gmmR0CheckSharedModule, &Args);
5377
5378 Log(("GMMR0CheckSharedModules done (rc=%Rrc)!\n", rc));
5379 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
5380 }
5381 else
5382 rc = VERR_GMM_IS_NOT_SANE;
5383
5384# ifndef DEBUG_sandervl
5385 gmmR0MutexRelease(pGMM);
5386# endif
5387 return rc;
5388#else
5389 RT_NOREF(pGVM, idCpu);
5390 return VERR_NOT_IMPLEMENTED;
5391#endif
5392}
5393
5394#if defined(VBOX_STRICT) && HC_ARCH_BITS == 64
5395
5396/**
5397 * Worker for GMMR0FindDuplicatePageReq.
5398 *
5399 * @returns true if duplicate, false if not.
5400 */
5401static bool gmmR0FindDupPageInChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, uint8_t const *pbSourcePage)
5402{
5403 bool fFoundDuplicate = false;
5404 /* Only take chunks not mapped into this VM process; not entirely correct. */
5405 uint8_t *pbChunk;
5406 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5407 {
5408 int rc = gmmR0MapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/, (PRTR3PTR)&pbChunk);
5409 if (RT_SUCCESS(rc))
5410 {
5411 /*
5412 * Look for duplicate pages
5413 */
5414 uintptr_t iPage = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
5415 while (iPage-- > 0)
5416 {
5417 if (GMM_PAGE_IS_PRIVATE(&pChunk->aPages[iPage]))
5418 {
5419 uint8_t *pbDestPage = pbChunk + (iPage << PAGE_SHIFT);
5420 if (!memcmp(pbSourcePage, pbDestPage, PAGE_SIZE))
5421 {
5422 fFoundDuplicate = true;
5423 break;
5424 }
5425 }
5426 }
5427 gmmR0UnmapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/);
5428 }
5429 }
5430 return fFoundDuplicate;
5431}
5432
5433
5434/**
5435 * Finds a duplicate of the specified page in other active VMs.
5436 *
5437 * @returns VBox status code.
5438 * @param pGVM The global (ring-0) VM structure.
5439 * @param pReq Pointer to the request packet.
5440 */
5441GMMR0DECL(int) GMMR0FindDuplicatePageReq(PGVM pGVM, PGMMFINDDUPLICATEPAGEREQ pReq)
5442{
5443 /*
5444 * Validate input and pass it on.
5445 */
5446 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5447 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5448
5449 PGMM pGMM;
5450 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5451
5452 int rc = GVMMR0ValidateGVM(pGVM);
5453 if (RT_FAILURE(rc))
5454 return rc;
5455
5456 /*
5457 * Take the semaphore and do some more validations.
5458 */
5459 rc = gmmR0MutexAcquire(pGMM);
5460 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5461 {
5462 uint8_t *pbChunk;
5463 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pReq->idPage >> GMM_CHUNKID_SHIFT);
5464 if (pChunk)
5465 {
5466 if (gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5467 {
5468 uint8_t *pbSourcePage = pbChunk + ((pReq->idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
5469 PGMMPAGE pPage = gmmR0GetPage(pGMM, pReq->idPage);
5470 if (pPage)
5471 {
5472 /*
5473 * Walk the chunks
5474 */
5475 pReq->fDuplicate = false;
5476 RTListForEach(&pGMM->ChunkList, pChunk, GMMCHUNK, ListNode)
5477 {
5478 if (gmmR0FindDupPageInChunk(pGMM, pGVM, pChunk, pbSourcePage))
5479 {
5480 pReq->fDuplicate = true;
5481 break;
5482 }
5483 }
5484 }
5485 else
5486 {
5487 AssertFailed();
5488 rc = VERR_PGM_PHYS_INVALID_PAGE_ID;
5489 }
5490 }
5491 else
5492 AssertFailed();
5493 }
5494 else
5495 AssertFailed();
5496 }
5497 else
5498 rc = VERR_GMM_IS_NOT_SANE;
5499
5500 gmmR0MutexRelease(pGMM);
5501 return rc;
5502}
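
/*
 * Illustrative sketch (not part of the original source): querying for a
 * duplicate page in a strict 64-bit build, which is the only configuration
 * where this request is compiled in. Only the fields validated or used above
 * (Hdr.cbReq, idPage, fDuplicate) are shown; idPage stands for whatever page
 * ID the caller wants to check.
 *
 * @code
 *      GMMFINDDUPLICATEPAGEREQ Req;
 *      RT_ZERO(Req);
 *      Req.Hdr.cbReq = sizeof(Req);
 *      Req.idPage    = idPage;                 // page to look for in other chunks
 *      int rc = GMMR0FindDuplicatePageReq(pGVM, &Req);
 *      if (RT_SUCCESS(rc) && Req.fDuplicate)
 *          Log(("Page %#x has an identical copy in some other chunk\n", idPage));
 * @endcode
 */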
5503
5504#endif /* VBOX_STRICT && HC_ARCH_BITS == 64 */
5505
5506
5507/**
5508 * Retrieves the GMM statistics visible to the caller.
5509 *
5510 * @returns VBox status code.
5511 *
5512 * @param pStats Where to put the statistics.
5513 * @param pSession The current session.
5514 * @param pGVM The GVM to obtain statistics for. Optional.
5515 */
5516GMMR0DECL(int) GMMR0QueryStatistics(PGMMSTATS pStats, PSUPDRVSESSION pSession, PGVM pGVM)
5517{
5518 LogFlow(("GVMMR0QueryStatistics: pStats=%p pSession=%p pGVM=%p\n", pStats, pSession, pGVM));
5519
5520 /*
5521 * Validate input.
5522 */
5523 AssertPtrReturn(pSession, VERR_INVALID_POINTER);
5524 AssertPtrReturn(pStats, VERR_INVALID_POINTER);
5525 pStats->cMaxPages = 0; /* (crash before taking the mutex...) */
5526
5527 PGMM pGMM;
5528 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5529
5530 /*
5531 * Validate the VM handle, if not NULL, and lock the GMM.
5532 */
5533 int rc;
5534 if (pGVM)
5535 {
5536 rc = GVMMR0ValidateGVM(pGVM);
5537 if (RT_FAILURE(rc))
5538 return rc;
5539 }
5540
5541 rc = gmmR0MutexAcquire(pGMM);
5542 if (RT_FAILURE(rc))
5543 return rc;
5544
5545 /*
5546 * Copy out the GMM statistics.
5547 */
5548 pStats->cMaxPages = pGMM->cMaxPages;
5549 pStats->cReservedPages = pGMM->cReservedPages;
5550 pStats->cOverCommittedPages = pGMM->cOverCommittedPages;
5551 pStats->cAllocatedPages = pGMM->cAllocatedPages;
5552 pStats->cSharedPages = pGMM->cSharedPages;
5553 pStats->cDuplicatePages = pGMM->cDuplicatePages;
5554 pStats->cLeftBehindSharedPages = pGMM->cLeftBehindSharedPages;
5555 pStats->cBalloonedPages = pGMM->cBalloonedPages;
5556 pStats->cChunks = pGMM->cChunks;
5557 pStats->cFreedChunks = pGMM->cFreedChunks;
5558 pStats->cShareableModules = pGMM->cShareableModules;
5559 RT_ZERO(pStats->au64Reserved);
5560
5561 /*
5562 * Copy out the VM statistics.
5563 */
5564 if (pGVM)
5565 pStats->VMStats = pGVM->gmm.s.Stats;
5566 else
5567 RT_ZERO(pStats->VMStats);
5568
5569 gmmR0MutexRelease(pGMM);
5570 return rc;
5571}
5572
5573
5574/**
5575 * VMMR0 request wrapper for GMMR0QueryStatistics.
5576 *
5577 * @returns see GMMR0QueryStatistics.
5578 * @param pGVM The global (ring-0) VM structure. Optional.
5579 * @param pReq Pointer to the request packet.
5580 */
5581GMMR0DECL(int) GMMR0QueryStatisticsReq(PGVM pGVM, PGMMQUERYSTATISTICSSREQ pReq)
5582{
5583 /*
5584 * Validate input and pass it on.
5585 */
5586 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5587 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5588
5589 return GMMR0QueryStatistics(&pReq->Stats, pReq->pSession, pGVM);
5590}
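
/*
 * Illustrative sketch (not part of the original source): fetching the GMM
 * statistics through the request wrapper above. Only the fields the wrapper
 * validates and forwards (Hdr.cbReq, pSession, Stats) are shown; pSession is
 * assumed to be the caller's valid support driver session, and the log format
 * specifiers assume the usual 64-bit page counters and 32-bit chunk count.
 *
 * @code
 *      GMMQUERYSTATISTICSSREQ Req;
 *      RT_ZERO(Req);
 *      Req.Hdr.cbReq = sizeof(Req);
 *      Req.pSession  = pSession;
 *      int rc = GMMR0QueryStatisticsReq(pGVM, &Req);
 *      if (RT_SUCCESS(rc))
 *          LogRel(("GMM: cAllocatedPages=%RX64 cSharedPages=%RX64 cChunks=%u\n",
 *                  Req.Stats.cAllocatedPages, Req.Stats.cSharedPages, Req.Stats.cChunks));
 * @endcode
 */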
5591
5592
5593/**
5594 * Resets the specified GMM statistics.
5595 *
5596 * @returns VBox status code.
5597 *
5598 * @param pStats Which statistics to reset; non-zero fields indicate
5599 * the ones to reset.
5600 * @param pSession The current session.
5601 * @param pGVM The GVM to reset statistics for. Optional.
5602 */
5603GMMR0DECL(int) GMMR0ResetStatistics(PCGMMSTATS pStats, PSUPDRVSESSION pSession, PGVM pGVM)
5604{
5605 NOREF(pStats); NOREF(pSession); NOREF(pGVM);
5606 /* Nothing to reset at the moment. */
5607 return VINF_SUCCESS;
5608}
5609
5610
5611/**
5612 * VMMR0 request wrapper for GMMR0ResetStatistics.
5613 *
5614 * @returns see GMMR0ResetStatistics.
5615 * @param pGVM The global (ring-0) VM structure. Optional.
5616 * @param pReq Pointer to the request packet.
5617 */
5618GMMR0DECL(int) GMMR0ResetStatisticsReq(PGVM pGVM, PGMMRESETSTATISTICSSREQ pReq)
5619{
5620 /*
5621 * Validate input and pass it on.
5622 */
5623 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5624 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5625
5626 return GMMR0ResetStatistics(&pReq->Stats, pReq->pSession, pGVM);
5627}
5628