Buffer mapping patterns
-----------------------

There are two main strategies the driver has for CPU access to GL buffer
objects. One is that the GL calls allocate temporary storage and blit to the GPU
at
``glBufferSubData()``/``glBufferData()``/``glFlushMappedBufferRange()``/``glUnmapBuffer()``
time. This makes it easy to match the behavior GL specifies, since each copy is
ordered with the draws around it. However, this may be more costly
than direct mapping of the GL BO on some platforms, and is essentially not
available to tiling GPUs (since tiling involves running through the command
stream multiple times). Thus, GL has additional interfaces that let apps
directly access memory while avoiding implicit blocking on the GPU
rendering from those BOs.

Rendering engines have a variety of knobs to set on those GL interfaces for data
upload, and as a whole they seem to take just about every path available. Let's
look at some examples to see how they might constrain GL driver buffer upload
behavior.

Portal 2
========

.. code-block:: console

    1030842 glXSwapBuffers(dpy = 0x82a8000, drawable = 20971540)
    1030876 glBufferDataARB(target = GL_ELEMENT_ARRAY_BUFFER, size = 65536, data = NULL, usage = GL_DYNAMIC_DRAW)
    1030877 glBufferSubData(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, size = 576, data = blob(576))
    1030896 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 526, count = 252, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 0)
    1030915 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 19657, count = 36, type = GL_UNSIGNED_SHORT, indices = 0x1f8, basevertex = 0)
    1030917 glBufferDataARB(target = GL_ARRAY_BUFFER, size = 1572864, data = NULL, usage = GL_DYNAMIC_DRAW)
    1030918 glBufferSubData(target = GL_ARRAY_BUFFER, offset = 0, size = 128, data = blob(128))
    1030919 glBufferSubData(target = GL_ELEMENT_ARRAY_BUFFER, offset = 576, size = 12, data = blob(12))
    1030936 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 3, count = 6, type = GL_UNSIGNED_SHORT, indices = 0x240, basevertex = 0)
    1030937 glBufferSubData(target = GL_ARRAY_BUFFER, offset = 128, size = 128, data = blob(128))
    1030938 glBufferSubData(target = GL_ELEMENT_ARRAY_BUFFER, offset = 588, size = 12, data = blob(12))
    1030940 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 4, end = 7, count = 6, type = GL_UNSIGNED_SHORT, indices = 0x24c, basevertex = 0)
    [... repeated draws at increasing offsets]
    1033097 glXSwapBuffers(dpy = 0x82a8000, drawable = 20971540)

From this sequence, we can see that it is important that the driver either
implement ``glBufferSubData()`` as a blit from a streaming uploader in sequence with
the ``glDraw*()`` calls (a common behavior for non-tiled GPUs, particularly those with
dedicated memory), or that you:

1) Track the valid range of the buffer so that you don't have to flush the draws
   and synchronize on each following ``glBufferSubData()``.

2) Reallocate the buffer storage on ``glBufferData()`` so that your first
   ``glBufferSubData()`` of the frame doesn't stall on the last frame's
   rendering completing.

You can't just empty your valid range on ``glBufferData()`` unless you know that
the GPU access from the previous frame has completed. This pattern of
incrementing ``glBufferSubData()`` offsets interleaved with draws from that data
is common among newer Valve games.

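
The valid-range rule in item 1 can be sketched as follows. This is a minimal
model rather than Mesa code: ``struct buf_state``, ``upload()``, and ``draw()``
are hypothetical names, and the "bytes valid starting from 0" representation is
an assumption chosen for simplicity. A write that starts at or beyond the valid
range cannot collide with anything the GPU is reading, so it skips the
synchronization.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical per-buffer state; not a real Mesa structure. */
    struct buf_state {
        size_t valid_end;  /* bytes [0, valid_end) were written since allocation */
        bool gpu_busy;     /* submitted GPU work may still read the storage */
    };

    /* True if a CPU write starting at offset must first wait for the GPU. */
    static bool
    write_needs_sync(const struct buf_state *buf, size_t offset)
    {
        if (!buf->gpu_busy)
            return false;
        /* Nothing the GPU could be reading lives at or beyond valid_end. */
        return offset < buf->valid_end;
    }

    /* Record an upload and report whether it had to synchronize. */
    static bool
    upload(struct buf_state *buf, size_t offset, size_t size)
    {
        bool sync = write_needs_sync(buf, offset);
        if (sync)
            buf->gpu_busy = false;  /* the wait drained all pending GPU access */
        /* Conservative: a write past valid_end also marks the gap as valid. */
        if (offset + size > buf->valid_end)
            buf->valid_end = offset + size;
        return sync;
    }

    /* Model a draw that reads from the buffer. */
    static void
    draw(struct buf_state *buf)
    {
        buf->gpu_busy = true;
    }

Replaying the element-array uploads from the trace (576 bytes at offset 0, then
12 bytes at 576 and at 588, with draws in between) never synchronizes in this
model, while a later write back at offset 0 would.
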
.. code-block:: console

    [ during setup ]

    679259 glGenBuffersARB(n = 1, buffers = &1314)
    679260 glBindBufferARB(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 1314)
    679261 glBufferDataARB(target = GL_ELEMENT_ARRAY_BUFFER, size = 3072, data = NULL, usage = GL_STATIC_DRAW)
    679264 glMapBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 3072, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT) = 0xd7384000
    679269 glFlushMappedBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 3072)
    679270 glUnmapBuffer(target = GL_ELEMENT_ARRAY_BUFFER) = GL_TRUE

    [... setup of other buffers on this binding point]

    679343 glBindBufferARB(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 1314)
    679344 glMapBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 768, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT) = 0xd7384000
    679346 glFlushMappedBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 768)
    679347 glUnmapBuffer(target = GL_ELEMENT_ARRAY_BUFFER) = GL_TRUE
    679348 glMapBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 768, length = 768, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT) = 0xd7384300
    679350 glFlushMappedBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 768)
    679351 glUnmapBuffer(target = GL_ELEMENT_ARRAY_BUFFER) = GL_TRUE
    679352 glMapBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 1536, length = 768, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT) = 0xd7384600
    679354 glFlushMappedBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 768)
    679355 glUnmapBuffer(target = GL_ELEMENT_ARRAY_BUFFER) = GL_TRUE
    679356 glMapBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 2304, length = 768, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT) = 0xd7384900
    679358 glFlushMappedBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 768)
    679359 glUnmapBuffer(target = GL_ELEMENT_ARRAY_BUFFER) = GL_TRUE

    [... setup completes and we start drawing later]

    761845 glBindBufferARB(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 1314)
    761846 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 323, count = 384, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 0)

This suggests that, for non-blitting drivers, resetting your "might be used on
the GPU" range after a stall could save you a number of additional GPU stalls
during setup, since every later map in this sequence overlaps a range that has
already been written.

Terraria
========

.. code-block:: console

    167581 glXSwapBuffers(dpy = 0x3004630, drawable = 25165844)

    167585 glBufferData(target = GL_ARRAY_BUFFER, size = 196608, data = NULL, usage = GL_STREAM_DRAW)
    167586 glBufferSubData(target = GL_ARRAY_BUFFER, offset = 0, size = 1728, data = blob(1728))
    167588 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 71, count = 108, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 0)
    167589 glBufferData(target = GL_ARRAY_BUFFER, size = 196608, data = NULL, usage = GL_STREAM_DRAW)
    167590 glBufferSubData(target = GL_ARRAY_BUFFER, offset = 0, size = 27456, data = blob(27456))
    167592 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 7, count = 12, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 0)
    167594 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 3, count = 6, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 8)
    167596 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 3, count = 6, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 12)
    [...]

In this game, we can see ``glBufferData()`` being used on the same array buffer
throughout, to get new storage so that the ``glBufferSubData()`` doesn't cause
synchronization.

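
One way a non-blitting driver might honor this pattern is sketched below. It is
only an illustration: ``struct storage``, ``struct gl_buf``, and
``buffer_data_null()`` are hypothetical names, plain ``malloc()`` stands in for
BO allocation, and the "reuse only if idle and same size" policy is an
assumption. The point is that a busy buffer gets fresh storage instead of
waiting for the GPU.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdlib.h>

    /* Hypothetical backing store; a real driver would use refcounted BOs. */
    struct storage {
        void *mem;
        size_t size;
        bool gpu_busy;  /* submitted GPU work may still read this storage */
    };

    struct gl_buf {
        struct storage *store;
        size_t valid_end;  /* bytes [0, valid_end) hold defined data */
    };

    static struct storage *
    storage_create(size_t size)
    {
        struct storage *s = calloc(1, sizeof(*s));
        assert(s);
        s->mem = malloc(size);
        assert(s->mem);
        s->size = size;
        return s;
    }

    /*
     * Models glBufferData(size, NULL). Returns the old storage that the caller
     * must release once the GPU has retired it, or NULL if there was none or
     * it was reused in place.
     */
    static struct storage *
    buffer_data_null(struct gl_buf *buf, size_t size)
    {
        struct storage *old = buf->store;

        buf->valid_end = 0;
        if (old && !old->gpu_busy && old->size == size)
            return NULL;  /* idle and right-sized: just empty the valid range */

        buf->store = storage_create(size);
        return old;
    }

With the sizes from the trace, the second ``glBufferData()`` arrives while the
first draw may still be reading the 196608-byte storage, so this model swaps in
a new allocation and the following ``glBufferSubData()`` has nothing to wait on.
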
Don't Starve
============

.. code-block:: console

    7251917 glGenBuffers(n = 1, buffers = &115052)
    7251918 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 115052)
    7251919 glBufferData(target = GL_ARRAY_BUFFER, size = 144, data = blob(144), usage = GL_STREAM_DRAW)
    7251921 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 115052)
    7251928 glDrawArrays(mode = GL_TRIANGLES, first = 0, count = 6)
    7251930 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 114872)
    7251936 glDrawArrays(mode = GL_TRIANGLES, first = 0, count = 18)
    7251938 glGenBuffers(n = 1, buffers = &115053)
    7251939 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 115053)
    7251940 glBufferData(target = GL_ARRAY_BUFFER, size = 144, data = blob(144), usage = GL_STREAM_DRAW)
    7251942 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 115053)
    7251949 glDrawArrays(mode = GL_TRIANGLES, first = 0, count = 6)
    7251973 glXSwapBuffers(dpy = 0x86dd860, drawable = 20971540)
    [... drawing next frame]
    7252388 glDeleteBuffers(n = 1, buffers = &115052)
    7252389 glDeleteBuffers(n = 1, buffers = &115053)
    7252390 glXSwapBuffers(dpy = 0x86dd860, drawable = 20971540)

In this game we have a lot of tiny ``glBufferData()`` calls, suggesting that we
could see working set wins and possibly CPU overhead reduction by packing small
GL buffers in the same BO. Interestingly, the deletes of the temporary buffers
always happen at the end of the next frame.

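
Packing small buffers into a shared BO could be done with a simple bump
allocator, sketched here under stated assumptions: ``struct slab``,
``slab_alloc()``, the 64 KiB slab size, and the 64-byte alignment are all
hypothetical choices for illustration, and freeing is left out entirely.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    #define SLAB_SIZE  (64 * 1024)  /* assumed size of the shared BO */
    #define SLAB_ALIGN 64           /* assumed alignment of each sub-buffer */

    struct slab {
        size_t head;  /* next free byte in the shared BO */
    };

    /* Reserve size bytes from the slab; returns false if they do not fit. */
    static bool
    slab_alloc(struct slab *slab, size_t size, size_t *offset_out)
    {
        /* Round the current head up to the next aligned offset. */
        size_t start = (slab->head + SLAB_ALIGN - 1) & ~(size_t)(SLAB_ALIGN - 1);

        if (size > SLAB_SIZE || start > SLAB_SIZE - size)
            return false;

        *offset_out = start;
        slab->head = start + size;
        return true;
    }

Two 144-byte vertex buffers like the ones in the trace would land at offsets 0
and 192 of one slab instead of occupying two separate allocations. Since the
game deletes its temporary buffers a frame later, a driver taking this approach
would still need some way to recycle slabs once the GPU is done with them.
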
Euro Truck Simulator
====================

.. code-block:: console

    [usage of VBO 14,15]
    [...]
    885199 glXSwapBuffers(dpy = 0x379a3e0, drawable = 20971527)
    885203 glInvalidateBufferData(buffer = 14)
    885204 glInvalidateBufferData(buffer = 15)
    [...]
    889330 glXSwapBuffers(dpy = 0x379a3e0, drawable = 20971527)
    889334 glInvalidateBufferData(buffer = 12)
    889335 glInvalidateBufferData(buffer = 16)
    [...]
    893461 glXSwapBuffers(dpy = 0x379a3e0, drawable = 20971527)
    893462 glClientWaitSync(sync = 0x77eee10, flags = 0x0, timeout = 0) = GL_ALREADY_SIGNALED
    893463 glDeleteSync(sync = 0x780a630)
    893464 glFenceSync(condition = GL_SYNC_GPU_COMMANDS_COMPLETE, flags = 0) = 0x78ec730
    893465 glInvalidateBufferData(buffer = 13)
    893466 glInvalidateBufferData(buffer = 17)
    893505 glBindBuffer(target = GL_COPY_READ_BUFFER, buffer = 14)
    893506 glMapBufferRange(target = GL_COPY_READ_BUFFER, offset = 0, length = 788, access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b034efd1000
    893508 glUnmapBuffer(target = GL_COPY_READ_BUFFER) = GL_TRUE
    893509 glBindBuffer(target = GL_COPY_READ_BUFFER, buffer = 15)
    893510 glMapBufferRange(target = GL_COPY_READ_BUFFER, offset = 0, length = 32, access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b034e5df000
    893512 glUnmapBuffer(target = GL_COPY_READ_BUFFER) = GL_TRUE
    893532 glBindVertexBuffers(first = 0, count = 2, buffers = {10, 15}, offsets = {0, 0}, strides = {52, 16})
    893552 glDrawElementsInstancedBaseVertex(mode = GL_TRIANGLES, count = 18, type = GL_UNSIGNED_SHORT, indices = 0x13f280, instancecount = 1, basevertex = 25131)
    893609 glDrawArrays(mode = GL_TRIANGLES, first = 0, count = 6)
    893732 glBindVertexBuffers(first = 0, count = 1, buffers = &14, offsets = &0, strides = &48)
    893733 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 14)
    893744 glDrawElementsBaseVertex(mode = GL_TRIANGLES, count = 6, type = GL_UNSIGNED_SHORT, indices = 0xf0, basevertex = 0)
    893759 glDrawElementsBaseVertex(mode = GL_TRIANGLES, count = 24, type = GL_UNSIGNED_SHORT, indices = 0x2e0, basevertex = 6)
    893786 glDrawElementsBaseVertex(mode = GL_TRIANGLES, count = 600, type = GL_UNSIGNED_SHORT, indices = 0xe87b0, basevertex = 21515)
    893822 glDrawArrays(mode = GL_TRIANGLES, first = 0, count = 6)
    893845 glBindBuffer(target = GL_COPY_READ_BUFFER, buffer = 14)
    893846 glMapBufferRange(target = GL_COPY_READ_BUFFER, offset = 788, length = 788, access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_RANGE_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b034efd1314
    893848 glUnmapBuffer(target = GL_COPY_READ_BUFFER) = GL_TRUE
    893886 glDrawElementsInstancedBaseVertex(mode = GL_TRIANGLES, count = 18, type = GL_UNSIGNED_SHORT, indices = 0x13f280, instancecount = 1, basevertex = 25131)
    893943 glDrawArrays(mode = GL_TRIANGLES, first = 0, count = 6)

At the start of this frame, buffers 14 and 15 haven't been used in the previous 2
frames, and the :ext:`GL_ARB_sync` fence has ensured that the GPU has at least started
frame n-1 as the CPU starts the current frame. The first map is
``offset = 0, INVALIDATE_BUFFER | UNSYNCHRONIZED``, which suggests that the driver
should reallocate storage for the mapping even in the ``UNSYNCHRONIZED`` case. Here,
however, the buffer is definitely going to be idle, making reallocation unnecessary
(you may need to empty your valid range, though, to prevent unnecessary batch
flushes).

Also note the use of a totally unrelated binding point for the mapping of the
vertex array -- you can't effectively use it as a hint for any buffer placement
in memory. The game does also use ``glCopyBufferSubData()``, but only on a
different buffer.

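
The decision described above can be written down as a small table-like
function. This is a sketch under assumptions, not driver code: the ``SK_*``
flag bits, ``enum map_action``, and ``decide_map()`` are hypothetical names
that do not correspond to real GL or Gallium values, and valid-range checks for
plain synchronized writes are left out to keep the example short.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>

    /* Illustrative flag bits; these are not the real GL or Gallium values. */
    enum {
        SK_MAP_WRITE             = 1 << 0,
        SK_MAP_INVALIDATE_BUFFER = 1 << 1,
        SK_MAP_UNSYNCHRONIZED    = 1 << 2,
    };

    enum map_action {
        MAP_DIRECT,      /* hand out a pointer to the current storage */
        MAP_REALLOCATE,  /* swap in fresh storage, then map that */
        MAP_STALL,       /* wait for the GPU, then map the current storage */
    };

    struct map_decision {
        enum map_action action;
        bool reset_valid_range;
    };

    static struct map_decision
    decide_map(unsigned flags, bool gpu_busy)
    {
        struct map_decision d = { MAP_DIRECT, false };

        if (flags & SK_MAP_INVALIDATE_BUFFER) {
            /* Old contents are dead either way, so the valid range empties. */
            d.reset_valid_range = true;
            /* Reallocate even if UNSYNCHRONIZED is set, but only when busy. */
            if (gpu_busy)
                d.action = MAP_REALLOCATE;
        } else if (gpu_busy && !(flags & SK_MAP_UNSYNCHRONIZED)) {
            d.action = MAP_STALL;
        }
        return d;
    }

For the first map in the trace (invalidate plus unsynchronized on an idle
buffer), this model maps the existing storage directly and only resets the
valid range.
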
Plague Inc
==========

.. code-block:: console

    1640732 glXSwapBuffers(dpy = 0xb218f20, drawable = 23068674)
    1640733 glClientWaitSync(sync = 0xb4141430, flags = 0x0, timeout = 0) = GL_ALREADY_SIGNALED
    1640734 glDeleteSync(sync = 0xb4141430)
    1640735 glFenceSync(condition = GL_SYNC_GPU_COMMANDS_COMPLETE, flags = 0) = 0xb4141430

    1640780 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 78)
    1640787 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 79)
    1640788 glDrawElements(mode = GL_TRIANGLES, count = 9636, type = GL_UNSIGNED_SHORT, indices = NULL)
    1640795 glDrawElements(mode = GL_TRIANGLES, count = 9636, type = GL_UNSIGNED_SHORT, indices = NULL)
    1640813 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1096)
    1640814 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 67584, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0xbfef4000
    1640815 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1091)
    1640816 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 12, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0xc3998000
    1640817 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1096)
    1640819 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 352)
    1640820 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
    1640821 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1091)
    1640823 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 12)
    1640824 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
    1640825 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 1096)
    1640831 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 1091)
    1640832 glDrawElements(mode = GL_TRIANGLES, count = 6, type = GL_UNSIGNED_SHORT, indices = NULL)

    1640847 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1096)
    1640848 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 352, length = 67584, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0xbfef4160
    1640849 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1091)
    1640850 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 88, length = 12, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0xc3998058
    1640851 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1096)
    1640853 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 352)
    1640854 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
    1640855 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1091)
    1640857 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 12)
    1640858 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
    1640863 glDrawElementsBaseVertex(mode = GL_TRIANGLES, count = 6, type = GL_UNSIGNED_SHORT, indices = 0x58, basevertex = 4)

At the start of this frame, the VBOs haven't been used in about 6 frames, and
the :ext:`GL_ARB_sync` fence has ensured that the GPU has started frame n-1.

Note the use of ``glFlushMappedBufferRange()`` on a small fraction of the size
of the VBO -- it is important that a blitting driver make use of the flush
ranges when in explicit mode.

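
How a blitting driver might turn flush calls into a blit range is sketched
below. The names ``struct mapping``, ``flush_region()``, and
``unmap_blit_range()`` are hypothetical, and merging all flushes into one
bounding range is an assumption made for brevity. The sketch relies on the fact
that ``glFlushMappedBufferRange()`` offsets are relative to the start of the
mapping, not to the start of the buffer.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    struct range {
        size_t offset;  /* absolute offset into the buffer */
        size_t size;
    };

    /* Hypothetical record of one active mapping. */
    struct mapping {
        struct range mapped;  /* range passed to the map call */
        bool flush_explicit;  /* the app flushes what it wrote */
        struct range dirty;   /* bounding range of all flushes; size 0 if none */
    };

    /* Accumulate a flush; rel_offset is relative to the start of the mapping. */
    static void
    flush_region(struct mapping *m, size_t rel_offset, size_t size)
    {
        assert(rel_offset <= m->mapped.size && size <= m->mapped.size - rel_offset);
        if (size == 0)
            return;

        size_t start = m->mapped.offset + rel_offset;
        size_t end = start + size;
        if (m->dirty.size != 0) {
            /* Grow the existing dirty range to cover the new flush. */
            size_t old_end = m->dirty.offset + m->dirty.size;
            if (m->dirty.offset < start)
                start = m->dirty.offset;
            if (old_end > end)
                end = old_end;
        }
        m->dirty.offset = start;
        m->dirty.size = end - start;
    }

    /* Range to blit at unmap time. */
    static struct range
    unmap_blit_range(const struct mapping *m)
    {
        return m->flush_explicit ? m->dirty : m->mapped;
    }

For the second map of buffer 1096 in the trace (offset 352, length 67584,
followed by a flush of 352 bytes at relative offset 0), this yields a 352-byte
blit at absolute offset 352 rather than a copy of the whole mapped range.
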
Darkest Dungeon
===============

.. code-block:: console

    938384 glXSwapBuffers(dpy = 0x377fcd0, drawable = 23068692)

    938385 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 2)
    938386 glBufferData(target = GL_ARRAY_BUFFER, size = 1048576, data = NULL, usage = GL_STREAM_DRAW)
    938511 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 2)
    938512 glMapBufferRange(target = GL_ARRAY_BUFFER, offset = 0, length = 1048576, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7a73fcaa7000
    938514 glFlushMappedBufferRange(target = GL_ARRAY_BUFFER, offset = 0, length = 512)
    938515 glUnmapBuffer(target = GL_ARRAY_BUFFER) = GL_TRUE
    938523 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 1)
    938524 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 2)
    938525 glDrawElements(mode = GL_TRIANGLES, count = 24, type = GL_UNSIGNED_SHORT, indices = NULL)
    938527 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 2)
    938528 glMapBufferRange(target = GL_ARRAY_BUFFER, offset = 0, length = 1048576, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7a73fcaa7000
    938530 glFlushMappedBufferRange(target = GL_ARRAY_BUFFER, offset = 512, length = 512)
    938531 glUnmapBuffer(target = GL_ARRAY_BUFFER) = GL_TRUE
    938539 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 1)
    938540 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 2)
    938541 glDrawElements(mode = GL_TRIANGLES, count = 24, type = GL_UNSIGNED_SHORT, indices = 0x30)
    [... more maps and draws at increasing offsets]

An interesting note for this game: after the initial ``glBufferData()`` in the
frame to reallocate the storage, it maps the whole buffer unsynchronized each
time and just changes which region it flushes. The same GL buffer name is used
in every frame.

Tabletop Simulator
==================

.. code-block:: console

    1287594 glXSwapBuffers(dpy = 0x3e10810, drawable = 23068692)
    1287595 glClientWaitSync(sync = 0x7abf554e37b0, flags = 0x0, timeout = 0) = GL_ALREADY_SIGNALED
    1287596 glDeleteSync(sync = 0x7abf554e37b0)
    1287597 glFenceSync(condition = GL_SYNC_GPU_COMMANDS_COMPLETE, flags = 0) = 0x7abf56647490

    1287614 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 480)
    1287615 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 384, access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_RANGE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7abf2e79a000
    1287642 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 614)
    1287650 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 5)
    1287651 glBufferSubData(target = GL_COPY_WRITE_BUFFER, offset = 0, size = 1088, data = blob(1088))
    1287652 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 615)
    1287653 glDrawElements(mode = GL_TRIANGLES, count = 1788, type = GL_UNSIGNED_SHORT, indices = NULL)
    [... more draw calls]
    1289055 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 480)
    1289057 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 384)
    1289058 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
    1289059 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 480)
    1289066 glDrawArrays(mode = GL_TRIANGLE_STRIP, first = 12, count = 4)
    1289068 glDrawArrays(mode = GL_TRIANGLE_STRIP, first = 8, count = 4)
    1289553 glXSwapBuffers(dpy = 0x3e10810, drawable = 23068692)

In this app, buffer 480 gets used like this every other frame. The :ext:`GL_ARB_sync`
fence ensures that frame n-1 has started on the GPU before CPU work starts on
the current frame, so the unsynchronized access to the buffers is safe.

Hollow Knight
=============

.. code-block:: console

    1873034 glXSwapBuffers(dpy = 0x28609d0, drawable = 23068692)
    1873035 glClientWaitSync(sync = 0x7b1a5ca6e130, flags = 0x0, timeout = 0) = GL_ALREADY_SIGNALED
    1873036 glDeleteSync(sync = 0x7b1a5ca6e130)
    1873037 glFenceSync(condition = GL_SYNC_GPU_COMMANDS_COMPLETE, flags = 0) = 0x7b1a5ca6e130
    1873038 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 29)
    1873039 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 8640, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b1a04c7e000
    1873040 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 30)
    1873041 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 720, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b1a07430000
    1873065 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 29)
    1873067 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 8640)
    1873068 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
    1873069 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 30)
    1873071 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 720)
    1873072 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
    1873073 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 29)
    1873074 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 8640, length = 576, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b1a04c801c0
    1873075 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 30)
    1873076 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 720, length = 72, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b1a074302d0
    1873077 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 29)
    1873079 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 576)
    1873080 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
    1873081 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 30)
    1873083 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 72)
    1873084 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
    1873085 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 29)
    1873096 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 30)
    1873097 glDrawElementsBaseVertex(mode = GL_TRIANGLES, count = 36, type = GL_UNSIGNED_SHORT, indices = 0x2d0, basevertex = 240)

In this app, buffers 29 and 30 are used like this, starting from offset 0, every
other frame. The :ext:`GL_ARB_sync` fence is used to make sure that the GPU has
reached the start of the previous frame before we go unsynchronized writing over
the n-2 frame's buffer.

Borderlands 2
=============

.. code-block:: console

    3561998 glFlush()
    3562004 glXSwapBuffers(dpy = 0xbaf0f90, drawable = 23068705)
    3562006 glClientWaitSync(sync = 0x231c2ab0, flags = GL_SYNC_FLUSH_COMMANDS_BIT, timeout = 10000000000) = GL_ALREADY_SIGNALED
    3562007 glDeleteSync(sync = 0x231c2ab0)
    3562008 glFenceSync(condition = GL_SYNC_GPU_COMMANDS_COMPLETE, flags = 0) = 0x231aadc0

    3562050 glBindBufferARB(target = GL_ARRAY_BUFFER, buffer = 1193)
    3562051 glMapBufferRange(target = GL_ARRAY_BUFFER, offset = 0, length = 1792, access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT) = 0xde056000
    3562053 glUnmapBufferARB(target = GL_ARRAY_BUFFER) = GL_TRUE
    3562054 glBindBufferARB(target = GL_ARRAY_BUFFER, buffer = 1194)
    3562055 glMapBufferRange(target = GL_ARRAY_BUFFER, offset = 0, length = 1280, access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT) = 0xd9426000
    3562057 glUnmapBufferARB(target = GL_ARRAY_BUFFER) = GL_TRUE
    [... unrelated draws]
    3563051 glBindBufferARB(target = GL_ARRAY_BUFFER, buffer = 1193)
    3563064 glBindBufferARB(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 875)
    3563065 glDrawElementsInstancedARB(mode = GL_TRIANGLES, count = 72, type = GL_UNSIGNED_SHORT, indices = NULL, instancecount = 28)

The :ext:`GL_ARB_sync` fence ensures that the GPU has started frame n-1 before the CPU
starts on the current frame.

This sequence of buffer uploads appears in each frame with the same buffer
names, so you do need to handle the ``GL_MAP_INVALIDATE_BUFFER_BIT`` as a
reallocate if the buffer is GPU-busy (it wasn't in this trace capture) to avoid
stalls on the n-1 frame completing.

Note that this is just one small buffer. Most of the vertex data goes through a
``glBufferSubData()``/``glDraw*()`` path with the VBO used across multiple
frames, with a ``glBufferData()`` when needing to wrap.

Buffer mapping conclusions
--------------------------

* Non-blitting drivers must track the valid range of a freshly allocated buffer
  as it gets uploaded in ``pipe_transfer_map()`` and avoid stalling on the GPU
  when mapping an undefined portion of the buffer when ``glBufferSubData()`` is
  interleaved with drawing.

* Non-blitting drivers must reallocate storage on ``glBufferData(NULL)`` so that
  the following ``glBufferSubData()`` won't stall. That ``glBufferData(NULL)``
  call will appear in the driver as an ``invalidate_resource()`` call if
  ``PIPE_CAP_INVALIDATE_BUFFER`` is available. (If that cap is not set, then
  mesa/st will create a new ``pipe_resource`` for you.) Storage reallocation may
  be skipped if you for some reason know that the buffer is idle, in which case
  you can just empty the valid region.

* Blitting drivers must use the region passed to ``transfer_flush_region()``
  instead of the mapped range when ``PIPE_MAP_FLUSH_EXPLICIT`` is set, to avoid
  blitting too much data. (When that bit is unset, you just blit the whole
  mapped range at unmap time.)

* Buffer valid range tracking in non-blitting drivers must use the region
  passed to ``transfer_flush_region()`` instead of the mapped range when
  ``PIPE_MAP_FLUSH_EXPLICIT`` is set, to avoid excess stalls.

* Buffer valid range tracking doesn't need to be fancy; "number of bytes
  valid starting from 0" is sufficient for all examples found.

* Use the ``util_debug_callback`` to report stalls on buffer mapping to ease
  debugging.

* Buffer binding points are not useful for tuning buffer placement (see all the
  ``GL_COPY_WRITE_BUFFER`` instances); you have to track the actual usage
  history of a GL BO name. mesa/st does this for optimizing its state updates
  on reallocation in the ``!PIPE_CAP_INVALIDATE_BUFFER`` case, and if you set
  ``PIPE_CAP_INVALIDATE_BUFFER`` then you have to flag your own internal state
  updates (VBO addresses, XFB addresses, texture buffer addresses, etc.) on
  reallocation based on usage history.