tgsi.rst@ 105254

最後變更在這個檔案從105254是 103996,由 vboxsync 提交於 12 月前
Additions/3D/mesa: export mesa-24.0.2 to OSE. bugref:10606
屬性 svn:eol-style 設為 `native`
檔案大小: 97.2 KB

行
1	TGSI
2	====
3
4	TGSI, Tungsten Graphics Shader Infrastructure, is an intermediate language
5	for describing shaders. Since Gallium is inherently shaderful, shaders are
6	an important part of the API. TGSI is the only intermediate representation
7	used by all drivers.
8
9	Basics
10	------
11
12	All TGSI instructions, known as opcodes, operate on arbitrary-precision
13	floating-point four-component vectors. An opcode may have up to one
14	destination register, known as dst, and between zero and three source
15	registers, called src0 through src2, or simply src if there is only
16	one.
17
18	Some instructions, like :opcode:`I2F`, permit re-interpretation of vector
19	components as integers. Other instructions permit using registers as
20	two-component vectors with double precision; see :ref:`doubleopcodes`.
21
22	When an instruction has a scalar result, the result is usually copied into
23	each of the components of dst. When this happens, the result is said to be
24	replicated to dst. :opcode:`RCP` is one such instruction.
25
26	Source Modifiers
27	^^^^^^^^^^^^^^^^
28
29	TGSI supports 32-bit negate and absolute value modifiers on floating-point
30	inputs, and 32-bit integer negates on some drivers. The negate applies after
31	absolute value if both are present.
32
33	The type of an input can be found by ``tgsi_opcode_infer_src_type()``, and
34	TGSI_OPCODE_MOV and the second and third operands of TGSI_OPCODE_UCMP (which
35	return TGSI_TYPE_UNTYPED) are also considered floats for the purpose of source
36	modifiers.
37
38
39	Other Modifiers
40	^^^^^^^^^^^^^^^
41
42	The saturate modifier clamps 32-bit destination stores to [0.0, 1.0].
43
44	For arithmetic instruction having a precise modifier certain optimizations
45	which may alter the result are disallowed. Example: add(mul(a,b),c) can't be
46	optimized to TGSI_OPCODE_MAD, because some hardware only supports the fused
47	MAD instruction.
48
49	Instruction Set
50	---------------
51
52	Core ISA
53	^^^^^^^^^^^^^^^^^^^^^^^^^
54
55	These opcodes are guaranteed to be available regardless of the driver being
56	used.
57
58	.. opcode:: ARL - Address Register Load
59
60	.. math::
61
62	dst.x = (int) \lfloor src.x\rfloor
63
64	dst.y = (int) \lfloor src.y\rfloor
65
66	dst.z = (int) \lfloor src.z\rfloor
67
68	dst.w = (int) \lfloor src.w\rfloor
69
70
71	.. opcode:: MOV - Move
72
73	.. math::
74
75	dst.x = src.x
76
77	dst.y = src.y
78
79	dst.z = src.z
80
81	dst.w = src.w
82
83
84	.. opcode:: LIT - Light Coefficients
85
86	.. math::
87
88	dst.x = 1
89
90	dst.y = max(src.x, 0)
91
92	dst.z =
93	\left\{
94	\begin{array}{ c l }
95	max(src.y, 0)^{clamp(src.w, -128, 128)} & \quad \textrm{if } src.x \gt 0 \\
96	0 & \quad \textrm{otherwise}
97	\end{array}
98	\right.
99
100	dst.w = 1
101
102
103	.. opcode:: RCP - Reciprocal
104
105	This instruction replicates its result.
106
107	.. math::
108
109	dst = \frac{1}{src.x}
110
111
112	.. opcode:: RSQ - Reciprocal Square Root
113
114	This instruction replicates its result. The results are undefined for src <= 0.
115
116	.. math::
117
118	dst = \frac{1}{\sqrt{src.x}}
119
120
121	.. opcode:: SQRT - Square Root
122
123	This instruction replicates its result. The results are undefined for src < 0.
124
125	.. math::
126
127	dst = {\sqrt{src.x}}
128
129
130	.. opcode:: EXP - Approximate Exponential Base 2
131
132	.. math::
133
134	dst.x &= 2^{\lfloor src.x\rfloor} \\
135	dst.y &= src.x - \lfloor src.x\rfloor \\
136	dst.z &= 2^{src.x} \\
137	dst.w &= 1
138
139
140	.. opcode:: LOG - Approximate Logarithm Base 2
141
142	.. math::
143
144	dst.x &= \lfloor\log_2{\|src.x\|}\rfloor \\
145	dst.y &= \frac{\|src.x\|}{2^{\lfloor\log_2{\|src.x\|}\rfloor}} \\
146	dst.z &= \log_2{\|src.x\|} \\
147	dst.w &= 1
148
149
150	.. opcode:: MUL - Multiply
151
152	.. math::
153
154	dst.x = src0.x \times src1.x
155
156	dst.y = src0.y \times src1.y
157
158	dst.z = src0.z \times src1.z
159
160	dst.w = src0.w \times src1.w
161
162
163	.. opcode:: ADD - Add
164
165	.. math::
166
167	dst.x = src0.x + src1.x
168
169	dst.y = src0.y + src1.y
170
171	dst.z = src0.z + src1.z
172
173	dst.w = src0.w + src1.w
174
175
176	.. opcode:: DP3 - 3-component Dot Product
177
178	This instruction replicates its result.
179
180	.. math::
181
182	\begin{aligned}
183	dst = & src0.x \times src1.x +\\
184	& src0.y \times src1.y +\\
185	& src0.z \times src1.z
186	\end{aligned}
187
188
189	.. opcode:: DP4 - 4-component Dot Product
190
191	This instruction replicates its result.
192
193	.. math::
194
195	\begin{aligned}
196	dst = & src0.x \times src1.x +\\
197	& src0.y \times src1.y +\\
198	& src0.z \times src1.z +\\
199	& src0.w \times src1.w
200	\end{aligned}
201
202
203	.. opcode:: DST - Distance Vector
204
205	.. math::
206
207	dst.x &= 1\\
208	dst.y &= src0.y \times src1.y\\
209	dst.z &= src0.z\\
210	dst.w &= src1.w
211
212
213	.. opcode:: MIN - Minimum
214
215	.. math::
216
217	dst.x = min(src0.x, src1.x)
218
219	dst.y = min(src0.y, src1.y)
220
221	dst.z = min(src0.z, src1.z)
222
223	dst.w = min(src0.w, src1.w)
224
225
226	.. opcode:: MAX - Maximum
227
228	.. math::
229
230	dst.x = max(src0.x, src1.x)
231
232	dst.y = max(src0.y, src1.y)
233
234	dst.z = max(src0.z, src1.z)
235
236	dst.w = max(src0.w, src1.w)
237
238
239	.. opcode:: SLT - Set On Less Than
240
241	.. math::
242
243	dst.x = (src0.x < src1.x) ? 1.0F : 0.0F
244
245	dst.y = (src0.y < src1.y) ? 1.0F : 0.0F
246
247	dst.z = (src0.z < src1.z) ? 1.0F : 0.0F
248
249	dst.w = (src0.w < src1.w) ? 1.0F : 0.0F
250
251
252	.. opcode:: SGE - Set On Greater Equal Than
253
254	.. math::
255
256	dst.x = (src0.x >= src1.x) ? 1.0F : 0.0F
257
258	dst.y = (src0.y >= src1.y) ? 1.0F : 0.0F
259
260	dst.z = (src0.z >= src1.z) ? 1.0F : 0.0F
261
262	dst.w = (src0.w >= src1.w) ? 1.0F : 0.0F
263
264
265	.. opcode:: MAD - Multiply And Add
266
267	Perform a * b + c. The implementation is free to decide whether there is an
268	intermediate rounding step or not.
269
270	.. math::
271
272	dst.x = src0.x \times src1.x + src2.x
273
274	dst.y = src0.y \times src1.y + src2.y
275
276	dst.z = src0.z \times src1.z + src2.z
277
278	dst.w = src0.w \times src1.w + src2.w
279
280
281	.. opcode:: LRP - Linear Interpolate
282
283	.. math::
284
285	dst.x = src0.x \times src1.x + (1 - src0.x) \times src2.x
286
287	dst.y = src0.y \times src1.y + (1 - src0.y) \times src2.y
288
289	dst.z = src0.z \times src1.z + (1 - src0.z) \times src2.z
290
291	dst.w = src0.w \times src1.w + (1 - src0.w) \times src2.w
292
293
294	.. opcode:: FMA - Fused Multiply-Add
295
296	Perform a * b + c with no intermediate rounding step.
297
298	.. math::
299
300	dst.x = src0.x \times src1.x + src2.x
301
302	dst.y = src0.y \times src1.y + src2.y
303
304	dst.z = src0.z \times src1.z + src2.z
305
306	dst.w = src0.w \times src1.w + src2.w
307
308
309	.. opcode:: FRC - Fraction
310
311	.. math::
312
313	dst.x = src.x - \lfloor src.x\rfloor
314
315	dst.y = src.y - \lfloor src.y\rfloor
316
317	dst.z = src.z - \lfloor src.z\rfloor
318
319	dst.w = src.w - \lfloor src.w\rfloor
320
321
322	.. opcode:: FLR - Floor
323
324	.. math::
325
326	dst.x = \lfloor src.x\rfloor
327
328	dst.y = \lfloor src.y\rfloor
329
330	dst.z = \lfloor src.z\rfloor
331
332	dst.w = \lfloor src.w\rfloor
333
334
335	.. opcode:: ROUND - Round
336
337	.. math::
338
339	dst.x = round(src.x)
340
341	dst.y = round(src.y)
342
343	dst.z = round(src.z)
344
345	dst.w = round(src.w)
346
347
348	.. opcode:: EX2 - Exponential Base 2
349
350	This instruction replicates its result.
351
352	.. math::
353
354	dst = 2^{src.x}
355
356
357	.. opcode:: LG2 - Logarithm Base 2
358
359	This instruction replicates its result.
360
361	.. math::
362
363	dst = \log_2{src.x}
364
365
366	.. opcode:: POW - Power
367
368	This instruction replicates its result.
369
370	.. math::
371
372	dst = src0.x^{src1.x}
373
374
375	.. opcode:: LDEXP - Multiply Number by Integral Power of 2
376
377	src1 is an integer.
378
379	.. math::
380
381	dst.x = src0.x * 2^{src1.x}
382
383	dst.y = src0.y * 2^{src1.y}
384
385	dst.z = src0.z * 2^{src1.z}
386
387	dst.w = src0.w * 2^{src1.w}
388
389
390	.. opcode:: COS - Cosine
391
392	This instruction replicates its result.
393
394	.. math::
395
396	dst = \cos{src.x}
397
398
399	.. opcode:: DDX, DDX_FINE - Derivative Relative To X
400
401	The fine variant is only used when ``PIPE_CAP_FS_FINE_DERIVATIVE`` is
402	advertised. When it is, the fine version guarantees one derivative per
403	row while DDX is allowed to be the same for the entire 2x2 quad.
404
405	.. math::
406
407	dst.x = partialx(src.x)
408
409	dst.y = partialx(src.y)
410
411	dst.z = partialx(src.z)
412
413	dst.w = partialx(src.w)
414
415
416	.. opcode:: DDY, DDY_FINE - Derivative Relative To Y
417
418	The fine variant is only used when ``PIPE_CAP_FS_FINE_DERIVATIVE`` is
419	advertised. When it is, the fine version guarantees one derivative per
420	column while DDY is allowed to be the same for the entire 2x2 quad.
421
422	.. math::
423
424	dst.x = partialy(src.x)
425
426	dst.y = partialy(src.y)
427
428	dst.z = partialy(src.z)
429
430	dst.w = partialy(src.w)
431
432
433	.. opcode:: PK2H - Pack Two 16-bit Floats
434
435	This instruction replicates its result.
436
437	.. math::
438
439	\begin{aligned}
440	dst = & f32\_to\_f16(src.x) \| \\
441	( & f32\_to\_f16(src.y) \ll 16)
442	\end{aligned}
443
444	.. opcode:: PK2US - Pack Two Unsigned 16-bit Scalars
445
446	This instruction replicates its result.
447
448	.. math::
449
450	\begin{aligned}
451	dst = & f32\_to\_unorm16(src.x) \| \\
452	( & f32\_to\_unorm16(src.y) \ll 16)
453	\end{aligned}
454
455
456	.. opcode:: PK4B - Pack Four Signed 8-bit Scalars
457
458	This instruction replicates its result.
459
460	.. math::
461
462	\begin{aligned}
463	dst = & f32\_to\_snorm8(src.x) \| \\
464	( & f32\_to\_snorm8(src.y) \ll 8) \| \\
465	( & f32\_to\_snorm8(src.z) \ll 16) \| \\
466	( & f32\_to\_snorm8(src.w) \ll 24)
467	\end{aligned}
468
469
470	.. opcode:: PK4UB - Pack Four Unsigned 8-bit Scalars
471
472	This instruction replicates its result.
473
474	.. math::
475
476	\begin{aligned}
477	dst = & f32\_to\_unorm8(src.x) \| \\
478	( & f32\_to\_unorm8(src.y) \ll 8) \| \\
479	( & f32\_to\_unorm8(src.z) \ll 16) \| \\
480	( & f32\_to\_unorm8(src.w) \ll 24)
481	\end{aligned}
482
483
484	.. opcode:: SEQ - Set On Equal
485
486	.. math::
487
488	dst.x = (src0.x == src1.x) ? 1.0F : 0.0F
489
490	dst.y = (src0.y == src1.y) ? 1.0F : 0.0F
491
492	dst.z = (src0.z == src1.z) ? 1.0F : 0.0F
493
494	dst.w = (src0.w == src1.w) ? 1.0F : 0.0F
495
496
497	.. opcode:: SGT - Set On Greater Than
498
499	.. math::
500
501	dst.x = (src0.x > src1.x) ? 1.0F : 0.0F
502
503	dst.y = (src0.y > src1.y) ? 1.0F : 0.0F
504
505	dst.z = (src0.z > src1.z) ? 1.0F : 0.0F
506
507	dst.w = (src0.w > src1.w) ? 1.0F : 0.0F
508
509
510	.. opcode:: SIN - Sine
511
512	This instruction replicates its result.
513
514	.. math::
515
516	dst = \sin{src.x}
517
518
519	.. opcode:: SLE - Set On Less Equal Than
520
521	.. math::
522
523	dst.x = (src0.x <= src1.x) ? 1.0F : 0.0F
524
525	dst.y = (src0.y <= src1.y) ? 1.0F : 0.0F
526
527	dst.z = (src0.z <= src1.z) ? 1.0F : 0.0F
528
529	dst.w = (src0.w <= src1.w) ? 1.0F : 0.0F
530
531
532	.. opcode:: SNE - Set On Not Equal
533
534	.. math::
535
536	dst.x = (src0.x != src1.x) ? 1.0F : 0.0F
537
538	dst.y = (src0.y != src1.y) ? 1.0F : 0.0F
539
540	dst.z = (src0.z != src1.z) ? 1.0F : 0.0F
541
542	dst.w = (src0.w != src1.w) ? 1.0F : 0.0F
543
544
545	.. opcode:: TEX - Texture Lookup
546
547	for array textures src0.y contains the slice for 1D,
548	and src0.z contain the slice for 2D.
549
550	for shadow textures with no arrays (and not cube map),
551	src0.z contains the reference value.
552
553	for shadow textures with arrays, src0.z contains
554	the reference value for 1D arrays, and src0.w contains
555	the reference value for 2D arrays and cube maps.
556
557	for cube map array shadow textures, the reference value
558	cannot be passed in src0.w, and TEX2 must be used instead.
559
560	.. math::
561
562	coord = src0
563
564	shadow\_ref = src0.z \textrm{ or } src0.w \textrm{ (optional)}
565
566	unit = src1
567
568	dst = texture\_sample(unit, coord, shadow\_ref)
569
570
571	.. opcode:: TEX2 - Texture Lookup (for shadow cube map arrays only)
572
573	this is the same as TEX, but uses another reg to encode the
574	reference value.
575
576	.. math::
577
578	coord = src0
579
580	shadow\_ref = src1.x
581
582	unit = src2
583
584	dst = texture\_sample(unit, coord, shadow\_ref)
585
586
587	.. opcode:: TXD - Texture Lookup with Derivatives
588
589	.. math::
590
591	coord = src0
592
593	ddx = src1
594
595	ddy = src2
596
597	unit = src3
598
599	dst = texture\_sample\_deriv(unit, coord, ddx, ddy)
600
601
602	.. opcode:: TXP - Projective Texture Lookup
603
604	.. math::
605
606	coord.x = src0.x / src0.w
607
608	coord.y = src0.y / src0.w
609
610	coord.z = src0.z / src0.w
611
612	coord.w = src0.w
613
614	unit = src1
615
616	dst = texture\_sample(unit, coord)
617
618
619	.. opcode:: UP2H - Unpack Two 16-Bit Floats
620
621	.. math::
622
623	dst.x = f16\_to\_f32(src0.x \& 0xffff)
624
625	dst.y = f16\_to\_f32(src0.x \gg 16)
626
627	dst.z = f16\_to\_f32(src0.x \& 0xffff)
628
629	dst.w = f16\_to\_f32(src0.x \gg 16)
630
631	.. note::
632
633	Considered for removal.
634
635	.. opcode:: UP2US - Unpack Two Unsigned 16-Bit Scalars
636
637	TBD
638
639	.. note::
640
641	Considered for removal.
642
643	.. opcode:: UP4B - Unpack Four Signed 8-Bit Values
644
645	TBD
646
647	.. note::
648
649	Considered for removal.
650
651	.. opcode:: UP4UB - Unpack Four Unsigned 8-Bit Scalars
652
653	TBD
654
655	.. note::
656
657	Considered for removal.
658
659
660	.. opcode:: ARR - Address Register Load With Round
661
662	.. math::
663
664	dst.x = (int) round(src.x)
665
666	dst.y = (int) round(src.y)
667
668	dst.z = (int) round(src.z)
669
670	dst.w = (int) round(src.w)
671
672
673	.. opcode:: SSG - Set Sign
674
675	.. math::
676
677	dst.x = (src.x > 0) ? 1 : (src.x < 0) ? -1 : 0
678
679	dst.y = (src.y > 0) ? 1 : (src.y < 0) ? -1 : 0
680
681	dst.z = (src.z > 0) ? 1 : (src.z < 0) ? -1 : 0
682
683	dst.w = (src.w > 0) ? 1 : (src.w < 0) ? -1 : 0
684
685
686	.. opcode:: CMP - Compare
687
688	.. math::
689
690	dst.x = (src0.x < 0) ? src1.x : src2.x
691
692	dst.y = (src0.y < 0) ? src1.y : src2.y
693
694	dst.z = (src0.z < 0) ? src1.z : src2.z
695
696	dst.w = (src0.w < 0) ? src1.w : src2.w
697
698
699	.. opcode:: KILL_IF - Conditional Discard
700
701	Conditional discard. Allowed in fragment shaders only.
702
703	Pseudocode::
704
705	if (src.x < 0 \|\| src.y < 0 \|\| src.z < 0 \|\| src.w < 0)
706	discard
707	endif
708
709
710	.. opcode:: KILL - Discard
711
712	Unconditional discard. Allowed in fragment shaders only.
713
714
715	.. opcode:: DEMOTE - Demote Invocation to a Helper
716
717	This demotes the current invocation to a helper, but continues
718	execution (while KILL may or may not terminate the
719	invocation). After this runs, all the usual helper invocation rules
720	apply about discarding buffer and render target writes. This is
721	useful for having accurate derivatives in the other invocations
722	which have not been demoted.
723
724	Allowed in fragment shaders only.
725
726
727	.. opcode:: READ_HELPER - Reads Invocation Helper Status
728
729	This is identical to ``TGSI_SEMANTIC_HELPER_INVOCATION``, except
730	this will read the current value, which might change as a result of
731	a ``DEMOTE`` instruction.
732
733	Allowed in fragment shaders only.
734
735
736	.. opcode:: TXB - Texture Lookup With Bias
737
738	for cube map array textures and shadow cube maps, the bias value
739	cannot be passed in src0.w, and TXB2 must be used instead.
740
741	if the target is a shadow texture, the reference value is always
742	in src.z (this prevents shadow 3d and shadow 2d arrays from
743	using this instruction, but this is not needed).
744
745	.. math::
746
747	coord.x = src0.x
748
749	coord.y = src0.y
750
751	coord.z = src0.z
752
753	coord.w = none
754
755	bias = src0.w
756
757	unit = src1
758
759	dst = texture\_sample(unit, coord, bias)
760
761
762	.. opcode:: TXB2 - Texture Lookup With Bias (some cube maps only)
763
764	this is the same as TXB, but uses another reg to encode the
765	LOD bias value for cube map arrays and shadow cube maps.
766	Presumably shadow 2d arrays and shadow 3d targets could use
767	this encoding too, but this is not legal.
768
769	if the target is a shadow cube map array, the reference value is in
770	src1.y.
771
772	.. math::
773
774	coord = src0
775
776	bias = src1.x
777
778	unit = src2
779
780	dst = texture\_sample(unit, coord, bias)
781
782
783	.. opcode:: DIV - Divide
784
785	.. math::
786
787	dst.x = \frac{src0.x}{src1.x}
788
789	dst.y = \frac{src0.y}{src1.y}
790
791	dst.z = \frac{src0.z}{src1.z}
792
793	dst.w = \frac{src0.w}{src1.w}
794
795
796	.. opcode:: DP2 - 2-component Dot Product
797
798	This instruction replicates its result.
799
800	.. math::
801
802	\begin{aligned}
803	dst = & src0.x \times src1.x + \\
804	& src0.y \times src1.y
805	\end{aligned}
806
807	.. opcode:: TEX_LZ - Texture Lookup With LOD = 0
808
809	This is the same as TXL with LOD = 0. Like every texture opcode, it obeys
810	pipe_sampler_view::u.tex.first_level and pipe_sampler_state::min_lod.
811	There is no way to override those two in shaders.
812
813	.. math::
814
815	coord.x = src0.x
816
817	coord.y = src0.y
818
819	coord.z = src0.z
820
821	coord.w = none
822
823	lod = 0
824
825	unit = src1
826
827	dst = texture\_sample(unit, coord, lod)
828
829
830	.. opcode:: TXL - Texture Lookup With explicit LOD
831
832	for cube map array textures, the explicit LOD value
833	cannot be passed in src0.w, and TXL2 must be used instead.
834
835	if the target is a shadow texture, the reference value is always
836	in src.z (this prevents shadow 3d / 2d array / cube targets from
837	using this instruction, but this is not needed).
838
839	.. math::
840
841	coord.x = src0.x
842
843	coord.y = src0.y
844
845	coord.z = src0.z
846
847	coord.w = none
848
849	lod = src0.w
850
851	unit = src1
852
853	dst = texture\_sample(unit, coord, lod)
854
855
856	.. opcode:: TXL2 - Texture Lookup With explicit LOD (for cube map arrays only)
857
858	this is the same as TXL, but uses another reg to encode the
859	explicit LOD value.
860	Presumably shadow 3d / 2d array / cube targets could use
861	this encoding too, but this is not legal.
862
863	if the target is a shadow cube map array, the reference value is in
864	src1.y.
865
866	.. math::
867
868	coord = src0
869
870	lod = src1.x
871
872	unit = src2
873
874	dst = texture\_sample(unit, coord, lod)
875
876
877	Compute ISA
878	^^^^^^^^^^^^^^^^^^^^^^^^
879
880	These opcodes are primarily provided for special-use computational shaders.
881	Support for these opcodes indicated by a special pipe capability bit (TBD).
882
883	XXX doesn't look like most of the opcodes really belong here.
884
885	.. opcode:: CEIL - Ceiling
886
887	.. math::
888
889	dst.x = \lceil src.x\rceil
890
891	dst.y = \lceil src.y\rceil
892
893	dst.z = \lceil src.z\rceil
894
895	dst.w = \lceil src.w\rceil
896
897
898	.. opcode:: TRUNC - Truncate
899
900	.. math::
901
902	dst.x = trunc(src.x)
903
904	dst.y = trunc(src.y)
905
906	dst.z = trunc(src.z)
907
908	dst.w = trunc(src.w)
909
910
911	.. opcode:: MOD - Modulus
912
913	.. math::
914
915	dst.x = src0.x \bmod src1.x
916
917	dst.y = src0.y \bmod src1.y
918
919	dst.z = src0.z \bmod src1.z
920
921	dst.w = src0.w \bmod src1.w
922
923
924	.. opcode:: UARL - Integer Address Register Load
925
926	Moves the contents of the source register, assumed to be an integer, into the
927	destination register, which is assumed to be an address (ADDR) register.
928
929
930	.. opcode:: TXF - Texel Fetch
931
932	As per :ext:`GL_NV_gpu_program4`, extract a single texel from a specified
933	texture image or PIPE_BUFFER resource. The source sampler may not be a
934	CUBE or SHADOW. src0 is a
935	four-component signed integer vector used to identify the single texel
936	accessed. 3 components + level. If the texture is multisampled, then
937	the fourth component indicates the sample, not the mipmap level.
938	Just like texture instructions, an optional
939	offset vector is provided, which is subject to various driver restrictions
940	(regarding range, source of offsets). This instruction ignores the sampler
941	state.
942
943	TXF(uint_vec coord, int_vec offset).
944
945
946	.. opcode:: TXQ - Texture Size Query
947
948	As per :ext:`GL_NV_gpu_program4`, retrieve the dimensions of the texture
949	depending on the target. For 1D (width), 2D/RECT/CUBE (width, height),
950	3D (width, height, depth), 1D array (width, layers), 2D array (width,
951	height, layers). Also return the number of accessible levels
952	(last_level - first_level + 1) in W.
953
954	For components which don't return a resource dimension, their value
955	is undefined.
956
957	.. math::
958
959	lod = src0.x
960
961	dst.x = texture\_width(unit, lod)
962
963	dst.y = texture\_height(unit, lod)
964
965	dst.z = texture\_depth(unit, lod)
966
967	dst.w = texture\_levels(unit)
968
969
970	.. opcode:: TXQS - Texture Samples Query
971
972	This retrieves the number of samples in the texture, and stores it
973	into the x component as an unsigned integer. The other components are
974	undefined. If the texture is not multisampled, this function returns
975	(1, undef, undef, undef).
976
977	.. math::
978
979	dst.x = texture\_samples(unit)
980
981
982	.. opcode:: TG4 - Texture Gather
983
984	As per :ext:`GL_ARB_texture_gather`, gathers the four texels to be used in a
985	bi-linear filtering operation and packs them into a single register.
986	Only works with 2D, 2D array, cubemaps, and cubemaps arrays. For 2D
987	textures, only the addressing modes of the sampler and the top level of any
988	mip pyramid are used. Set W to zero. It behaves like the TEX instruction,
989	but a filtered sample is not generated. The four samples that contribute to
990	filtering are placed into XYZW in clockwise order, starting with the (u,v)
991	texture coordinate delta at the following locations (-, +), (+, +), (+, -),
992	(-, -), where the magnitude of the deltas are half a texel.
993
994	PIPE_CAP_TEXTURE_SM5 enhances this instruction to support shadow per-sample
995	depth compares, single component selection, and a non-constant offset. It
996	doesn't allow support for the GL independent offset to get i0,j0. This would
997	require another CAP is HW can do it natively. For now we lower that before
998	TGSI.
999
1000	PIPE_CAP_TGSI_TG4_COMPONENT_IN_SWIZZLE changes the encoding so that component
1001	is stored in the sampler source swizzle x.
1002
1003	(without TGSI_TG4_COMPONENT_IN_SWIZZLE)
1004
1005	.. math::
1006
1007	coord = src0
1008
1009	component = src1
1010
1011	dst = texture\_gather4 (unit, coord, component)
1012
1013	(with TGSI_TG4_COMPONENT_IN_SWIZZLE)
1014
1015	.. math::
1016
1017	coord = src0
1018
1019	dst = texture\_gather4 (unit, coord)
1020
1021	\text{component is encoded in sampler swizzle.}
1022
1023	(with SM5 - cube array shadow)
1024
1025	.. math::
1026
1027	coord = src0
1028
1029	compare = src1
1030
1031	dst = texture\_gather (uint, coord, compare)
1032
1033	.. opcode:: LODQ - level of detail query
1034
1035	Compute the LOD information that the texture pipe would use to access the
1036	texture. The Y component contains the computed LOD lambda_prime. The X
1037	component contains the LOD that will be accessed, based on min/max LODs
1038	and mipmap filters.
1039
1040	.. math::
1041
1042	coord = src0
1043
1044	dst.xy = lodq(uint, coord);
1045
1046	.. opcode:: CLOCK - retrieve the current shader time
1047
1048	Invoking this instruction multiple times in the same shader should
1049	cause monotonically increasing values to be returned. The values
1050	are implicitly 64-bit, so if fewer than 64 bits of precision are
1051	available, to provide expected wraparound semantics, the value
1052	should be shifted up so that the most significant bit of the time
1053	is the most significant bit of the 64-bit value.
1054
1055	.. math::
1056
1057	dst.xy = clock()
1058
1059
1060	Integer ISA
1061	^^^^^^^^^^^^^^^^^^^^^^^^
1062	These opcodes are used for integer operations.
1063	Support for these opcodes indicated by PIPE_SHADER_CAP_INTEGERS (all of them?)
1064
1065
1066	.. opcode:: I2F - Signed Integer To Float
1067
1068	Rounding is unspecified (round to nearest even suggested).
1069
1070	.. math::
1071
1072	dst.x = (float) src.x
1073
1074	dst.y = (float) src.y
1075
1076	dst.z = (float) src.z
1077
1078	dst.w = (float) src.w
1079
1080
1081	.. opcode:: U2F - Unsigned Integer To Float
1082
1083	Rounding is unspecified (round to nearest even suggested).
1084
1085	.. math::
1086
1087	dst.x = (float) src.x
1088
1089	dst.y = (float) src.y
1090
1091	dst.z = (float) src.z
1092
1093	dst.w = (float) src.w
1094
1095
1096	.. opcode:: F2I - Float to Signed Integer
1097
1098	Rounding is towards zero (truncate).
1099	Values outside signed range (including NaNs) produce undefined results.
1100
1101	.. math::
1102
1103	dst.x = (int) src.x
1104
1105	dst.y = (int) src.y
1106
1107	dst.z = (int) src.z
1108
1109	dst.w = (int) src.w
1110
1111
1112	.. opcode:: F2U - Float to Unsigned Integer
1113
1114	Rounding is towards zero (truncate).
1115	Values outside unsigned range (including NaNs) produce undefined results.
1116
1117	.. math::
1118
1119	dst.x = (unsigned) src.x
1120
1121	dst.y = (unsigned) src.y
1122
1123	dst.z = (unsigned) src.z
1124
1125	dst.w = (unsigned) src.w
1126
1127
1128	.. opcode:: UADD - Integer Add
1129
1130	This instruction works the same for signed and unsigned integers.
1131	The low 32bit of the result is returned.
1132
1133	.. math::
1134
1135	dst.x = src0.x + src1.x
1136
1137	dst.y = src0.y + src1.y
1138
1139	dst.z = src0.z + src1.z
1140
1141	dst.w = src0.w + src1.w
1142
1143
1144	.. opcode:: UMAD - Integer Multiply And Add
1145
1146	This instruction works the same for signed and unsigned integers.
1147	The multiplication returns the low 32bit (as does the result itself).
1148
1149	.. math::
1150
1151	dst.x = src0.x \times src1.x + src2.x
1152
1153	dst.y = src0.y \times src1.y + src2.y
1154
1155	dst.z = src0.z \times src1.z + src2.z
1156
1157	dst.w = src0.w \times src1.w + src2.w
1158
1159
1160	.. opcode:: UMUL - Integer Multiply
1161
1162	This instruction works the same for signed and unsigned integers.
1163	The low 32bit of the result is returned.
1164
1165	.. math::
1166
1167	dst.x = src0.x \times src1.x
1168
1169	dst.y = src0.y \times src1.y
1170
1171	dst.z = src0.z \times src1.z
1172
1173	dst.w = src0.w \times src1.w
1174
1175
1176	.. opcode:: IMUL_HI - Signed Integer Multiply High Bits
1177
1178	The high 32bits of the multiplication of 2 signed integers are returned.
1179
1180	.. math::
1181
1182	dst.x = (src0.x \times src1.x) \gg 32
1183
1184	dst.y = (src0.y \times src1.y) \gg 32
1185
1186	dst.z = (src0.z \times src1.z) \gg 32
1187
1188	dst.w = (src0.w \times src1.w) \gg 32
1189
1190
1191	.. opcode:: UMUL_HI - Unsigned Integer Multiply High Bits
1192
1193	The high 32bits of the multiplication of 2 unsigned integers are returned.
1194
1195	.. math::
1196
1197	dst.x = (src0.x \times src1.x) \gg 32
1198
1199	dst.y = (src0.y \times src1.y) \gg 32
1200
1201	dst.z = (src0.z \times src1.z) \gg 32
1202
1203	dst.w = (src0.w \times src1.w) \gg 32
1204
1205
1206	.. opcode:: IDIV - Signed Integer Division
1207
1208	TBD: behavior for division by zero.
1209
1210	.. math::
1211
1212	dst.x = \frac{src0.x}{src1.x}
1213
1214	dst.y = \frac{src0.y}{src1.y}
1215
1216	dst.z = \frac{src0.z}{src1.z}
1217
1218	dst.w = \frac{src0.w}{src1.w}
1219
1220
1221	.. opcode:: UDIV - Unsigned Integer Division
1222
1223	For division by zero, ``0xffffffff`` is returned.
1224
1225	.. math::
1226
1227	dst.x = \frac{src0.x}{src1.x}
1228
1229	dst.y = \frac{src0.y}{src1.y}
1230
1231	dst.z = \frac{src0.z}{src1.z}
1232
1233	dst.w = \frac{src0.w}{src1.w}
1234
1235
1236	.. opcode:: UMOD - Unsigned Integer Remainder
1237
1238	If src1 is zero, ``0xffffffff`` is returned.
1239
1240	.. math::
1241
1242	dst.x = src0.x \bmod src1.x
1243
1244	dst.y = src0.y \bmod src1.y
1245
1246	dst.z = src0.z \bmod src1.z
1247
1248	dst.w = src0.w \bmod src1.w
1249
1250
1251	.. opcode:: NOT - Bitwise Not
1252
1253	.. math::
1254
1255	dst.x = \sim src.x
1256
1257	dst.y = \sim src.y
1258
1259	dst.z = \sim src.z
1260
1261	dst.w = \sim src.w
1262
1263
1264	.. opcode:: AND - Bitwise And
1265
1266	.. math::
1267
1268	dst.x = src0.x \& src1.x
1269
1270	dst.y = src0.y \& src1.y
1271
1272	dst.z = src0.z \& src1.z
1273
1274	dst.w = src0.w \& src1.w
1275
1276
1277	.. opcode:: OR - Bitwise Or
1278
1279	.. math::
1280
1281	dst.x = src0.x \| src1.x
1282
1283	dst.y = src0.y \| src1.y
1284
1285	dst.z = src0.z \| src1.z
1286
1287	dst.w = src0.w \| src1.w
1288
1289
1290	.. opcode:: XOR - Bitwise Xor
1291
1292	.. math::
1293
1294	dst.x = src0.x \oplus src1.x
1295
1296	dst.y = src0.y \oplus src1.y
1297
1298	dst.z = src0.z \oplus src1.z
1299
1300	dst.w = src0.w \oplus src1.w
1301
1302
1303	.. opcode:: IMAX - Maximum of Signed Integers
1304
1305	.. math::
1306
1307	dst.x = max(src0.x, src1.x)
1308
1309	dst.y = max(src0.y, src1.y)
1310
1311	dst.z = max(src0.z, src1.z)
1312
1313	dst.w = max(src0.w, src1.w)
1314
1315
1316	.. opcode:: UMAX - Maximum of Unsigned Integers
1317
1318	.. math::
1319
1320	dst.x = max(src0.x, src1.x)
1321
1322	dst.y = max(src0.y, src1.y)
1323
1324	dst.z = max(src0.z, src1.z)
1325
1326	dst.w = max(src0.w, src1.w)
1327
1328
1329	.. opcode:: IMIN - Minimum of Signed Integers
1330
1331	.. math::
1332
1333	dst.x = min(src0.x, src1.x)
1334
1335	dst.y = min(src0.y, src1.y)
1336
1337	dst.z = min(src0.z, src1.z)
1338
1339	dst.w = min(src0.w, src1.w)
1340
1341
1342	.. opcode:: UMIN - Minimum of Unsigned Integers
1343
1344	.. math::
1345
1346	dst.x = min(src0.x, src1.x)
1347
1348	dst.y = min(src0.y, src1.y)
1349
1350	dst.z = min(src0.z, src1.z)
1351
1352	dst.w = min(src0.w, src1.w)
1353
1354
1355	.. opcode:: SHL - Shift Left
1356
1357	The shift count is masked with ``0x1f`` before the shift is applied.
1358
1359	.. math::
1360
1361	dst.x = src0.x \ll (0x1f \& src1.x)
1362
1363	dst.y = src0.y \ll (0x1f \& src1.y)
1364
1365	dst.z = src0.z \ll (0x1f \& src1.z)
1366
1367	dst.w = src0.w \ll (0x1f \& src1.w)
1368
1369
1370	.. opcode:: ISHR - Arithmetic Shift Right (of Signed Integer)
1371
1372	The shift count is masked with ``0x1f`` before the shift is applied.
1373
1374	.. math::
1375
1376	dst.x = src0.x \gg (0x1f \& src1.x)
1377
1378	dst.y = src0.y \gg (0x1f \& src1.y)
1379
1380	dst.z = src0.z \gg (0x1f \& src1.z)
1381
1382	dst.w = src0.w \gg (0x1f \& src1.w)
1383
1384
1385	.. opcode:: USHR - Logical Shift Right
1386
1387	The shift count is masked with ``0x1f`` before the shift is applied.
1388
1389	.. math::
1390
1391	dst.x = src0.x \gg (unsigned) (0x1f \& src1.x)
1392
1393	dst.y = src0.y \gg (unsigned) (0x1f \& src1.y)
1394
1395	dst.z = src0.z \gg (unsigned) (0x1f \& src1.z)
1396
1397	dst.w = src0.w \gg (unsigned) (0x1f \& src1.w)
1398
1399
1400	.. opcode:: UCMP - Integer Conditional Move
1401
1402	.. math::
1403
1404	dst.x = src0.x ? src1.x : src2.x
1405
1406	dst.y = src0.y ? src1.y : src2.y
1407
1408	dst.z = src0.z ? src1.z : src2.z
1409
1410	dst.w = src0.w ? src1.w : src2.w
1411
1412
1413
1414	.. opcode:: ISSG - Integer Set Sign
1415
1416	.. math::
1417
1418	dst.x = (src0.x < 0) ? -1 : (src0.x > 0) ? 1 : 0
1419
1420	dst.y = (src0.y < 0) ? -1 : (src0.y > 0) ? 1 : 0
1421
1422	dst.z = (src0.z < 0) ? -1 : (src0.z > 0) ? 1 : 0
1423
1424	dst.w = (src0.w < 0) ? -1 : (src0.w > 0) ? 1 : 0
1425
1426
1427
1428	.. opcode:: FSLT - Float Set On Less Than (ordered)
1429
1430	Same comparison as SLT but returns integer instead of 1.0/0.0 float
1431
1432	.. math::
1433
1434	dst.x = (src0.x < src1.x) ? \sim 0 : 0
1435
1436	dst.y = (src0.y < src1.y) ? \sim 0 : 0
1437
1438	dst.z = (src0.z < src1.z) ? \sim 0 : 0
1439
1440	dst.w = (src0.w < src1.w) ? \sim 0 : 0
1441
1442
1443	.. opcode:: ISLT - Signed Integer Set On Less Than
1444
1445	.. math::
1446
1447	dst.x = (src0.x < src1.x) ? \sim 0 : 0
1448
1449	dst.y = (src0.y < src1.y) ? \sim 0 : 0
1450
1451	dst.z = (src0.z < src1.z) ? \sim 0 : 0
1452
1453	dst.w = (src0.w < src1.w) ? \sim 0 : 0
1454
1455
1456	.. opcode:: USLT - Unsigned Integer Set On Less Than
1457
1458	.. math::
1459
1460	dst.x = (src0.x < src1.x) ? \sim 0 : 0
1461
1462	dst.y = (src0.y < src1.y) ? \sim 0 : 0
1463
1464	dst.z = (src0.z < src1.z) ? \sim 0 : 0
1465
1466	dst.w = (src0.w < src1.w) ? \sim 0 : 0
1467
1468
1469	.. opcode:: FSGE - Float Set On Greater Equal Than (ordered)
1470
1471	Same comparison as SGE but returns integer instead of 1.0/0.0 float
1472
1473	.. math::
1474
1475	dst.x = (src0.x >= src1.x) ? \sim 0 : 0
1476
1477	dst.y = (src0.y >= src1.y) ? \sim 0 : 0
1478
1479	dst.z = (src0.z >= src1.z) ? \sim 0 : 0
1480
1481	dst.w = (src0.w >= src1.w) ? \sim 0 : 0
1482
1483
1484	.. opcode:: ISGE - Signed Integer Set On Greater Equal Than
1485
1486	.. math::
1487
1488	dst.x = (src0.x >= src1.x) ? \sim 0 : 0
1489
1490	dst.y = (src0.y >= src1.y) ? \sim 0 : 0
1491
1492	dst.z = (src0.z >= src1.z) ? \sim 0 : 0
1493
1494	dst.w = (src0.w >= src1.w) ? \sim 0 : 0
1495
1496
1497	.. opcode:: USGE - Unsigned Integer Set On Greater Equal Than
1498
1499	.. math::
1500
1501	dst.x = (src0.x >= src1.x) ? \sim 0 : 0
1502
1503	dst.y = (src0.y >= src1.y) ? \sim 0 : 0
1504
1505	dst.z = (src0.z >= src1.z) ? \sim 0 : 0
1506
1507	dst.w = (src0.w >= src1.w) ? \sim 0 : 0
1508
1509
1510	.. opcode:: FSEQ - Float Set On Equal (ordered)
1511
1512	Same comparison as SEQ but returns integer instead of 1.0/0.0 float
1513
1514	.. math::
1515
1516	dst.x = (src0.x == src1.x) ? \sim 0 : 0
1517
1518	dst.y = (src0.y == src1.y) ? \sim 0 : 0
1519
1520	dst.z = (src0.z == src1.z) ? \sim 0 : 0
1521
1522	dst.w = (src0.w == src1.w) ? \sim 0 : 0
1523
1524
1525	.. opcode:: USEQ - Integer Set On Equal
1526
1527	.. math::
1528
1529	dst.x = (src0.x == src1.x) ? \sim 0 : 0
1530
1531	dst.y = (src0.y == src1.y) ? \sim 0 : 0
1532
1533	dst.z = (src0.z == src1.z) ? \sim 0 : 0
1534
1535	dst.w = (src0.w == src1.w) ? \sim 0 : 0
1536
1537
1538	.. opcode:: FSNE - Float Set On Not Equal (unordered)
1539
1540	Same comparison as SNE but returns integer instead of 1.0/0.0 float
1541
1542	.. math::
1543
1544	dst.x = (src0.x != src1.x) ? \sim 0 : 0
1545
1546	dst.y = (src0.y != src1.y) ? \sim 0 : 0
1547
1548	dst.z = (src0.z != src1.z) ? \sim 0 : 0
1549
1550	dst.w = (src0.w != src1.w) ? \sim 0 : 0
1551
1552
1553	.. opcode:: USNE - Integer Set On Not Equal
1554
1555	.. math::
1556
1557	dst.x = (src0.x != src1.x) ? \sim 0 : 0
1558
1559	dst.y = (src0.y != src1.y) ? \sim 0 : 0
1560
1561	dst.z = (src0.z != src1.z) ? \sim 0 : 0
1562
1563	dst.w = (src0.w != src1.w) ? \sim 0 : 0
1564
1565
1566	.. opcode:: INEG - Integer Negate
1567
1568	Two's complement.
1569
1570	.. math::
1571
1572	dst.x = -src.x
1573
1574	dst.y = -src.y
1575
1576	dst.z = -src.z
1577
1578	dst.w = -src.w
1579
1580
1581	.. opcode:: IABS - Integer Absolute Value
1582
1583	.. math::
1584
1585	dst.x = \|src.x\|
1586
1587	dst.y = \|src.y\|
1588
1589	dst.z = \|src.z\|
1590
1591	dst.w = \|src.w\|
1592
1593	Bitwise ISA
1594	^^^^^^^^^^^
1595	These opcodes are used for bit-level manipulation of integers.
1596
1597	.. opcode:: IBFE - Signed Bitfield Extract
1598
1599	Like GLSL bitfieldExtract. Extracts a set of bits from the input, and
1600	sign-extends them if the high bit of the extracted window is set.
1601
1602	Pseudocode::
1603
1604	def ibfe(value, offset, bits):
1605	if offset < 0 or bits < 0 or offset + bits > 32:
1606	return undefined
1607	if bits == 0: return 0
1608	# Note: >> sign-extends
1609	return (value << (32 - offset - bits)) >> (32 - bits)
1610
1611	.. opcode:: UBFE - Unsigned Bitfield Extract
1612
1613	Like GLSL bitfieldExtract. Extracts a set of bits from the input, without
1614	any sign-extension.
1615
1616	Pseudocode::
1617
1618	def ubfe(value, offset, bits):
1619	if offset < 0 or bits < 0 or offset + bits > 32:
1620	return undefined
1621	if bits == 0: return 0
1622	# Note: >> does not sign-extend
1623	return (value << (32 - offset - bits)) >> (32 - bits)
1624
1625	.. opcode:: BFI - Bitfield Insert
1626
1627	Like GLSL bitfieldInsert. Replaces a bit region of 'base' with the low bits
1628	of 'insert'.
1629
1630	Pseudocode::
1631
1632	def bfi(base, insert, offset, bits):
1633	if offset < 0 or bits < 0 or offset + bits > 32:
1634	return undefined
1635	# << defined such that mask == ~0 when bits == 32, offset == 0
1636	mask = ((1 << bits) - 1) << offset
1637	return ((insert << offset) & mask) \| (base & ~mask)
1638
1639	.. opcode:: BREV - Bitfield Reverse
1640
1641	See SM5 instruction BFREV. Reverses the bits of the argument.
1642
1643	.. opcode:: POPC - Population Count
1644
1645	See SM5 instruction COUNTBITS. Counts the number of set bits in the argument.
1646
1647	.. opcode:: LSB - Index of lowest set bit
1648
1649	See SM5 instruction FIRSTBIT_LO. Computes the 0-based index of the first set
1650	bit of the argument. Returns -1 if none are set.
1651
1652	.. opcode:: IMSB - Index of highest non-sign bit
1653
1654	See SM5 instruction FIRSTBIT_SHI. Computes the 0-based index of the highest
1655	non-sign bit of the argument (i.e. highest 0 bit for negative numbers,
1656	highest 1 bit for positive numbers). Returns -1 if all bits are the same
1657	(i.e. for inputs 0 and -1).
1658
1659	.. opcode:: UMSB - Index of highest set bit
1660
1661	See SM5 instruction FIRSTBIT_HI. Computes the 0-based index of the highest
1662	set bit of the argument. Returns -1 if none are set.
1663
1664	Geometry ISA
1665	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1666
1667	These opcodes are only supported in geometry shaders; they have no meaning
1668	in any other type of shader.
1669
1670	.. opcode:: EMIT - Emit
1671
1672	Generate a new vertex for the current primitive into the specified vertex
1673	stream using the values in the output registers.
1674
1675
1676	.. opcode:: ENDPRIM - End Primitive
1677
1678	Complete the current primitive in the specified vertex stream (consisting of
1679	the emitted vertices), and start a new one.
1680
1681
1682	GLSL ISA
1683	^^^^^^^^^^
1684
1685	These opcodes are part of :term:`GLSL`'s opcode set. Support for these
1686	opcodes is determined by a special capability bit, ``GLSL``.
1687	Some require glsl version 1.30 (UIF/SWITCH/CASE/DEFAULT/ENDSWITCH).
1688
1689	.. opcode:: CAL - Subroutine Call
1690
1691	Pseudocode::
1692
1693	push(pc)
1694	pc = target
1695
1696
1697	.. opcode:: RET - Subroutine Call Return
1698
1699	Pseudocode::
1700
1701	pc = pop()
1702
1703
1704	.. opcode:: CONT - Continue
1705
1706	Unconditionally moves the point of execution to the instruction after the
1707	last BGNLOOP. The instruction must appear within a BGNLOOP/ENDLOOP.
1708
1709	.. note::
1710
1711	Support for CONT is determined by a special capability bit,
1712	``TGSI_CONT_SUPPORTED``. See :ref:`Screen` for more information.
1713
1714
1715	.. opcode:: BGNLOOP - Begin a Loop
1716
1717	Start a loop. Must have a matching ENDLOOP.
1718
1719
1720	.. opcode:: BGNSUB - Begin Subroutine
1721
1722	Starts definition of a subroutine. Must have a matching ENDSUB.
1723
1724
1725	.. opcode:: ENDLOOP - End a Loop
1726
1727	End a loop started with BGNLOOP.
1728
1729
1730	.. opcode:: ENDSUB - End Subroutine
1731
1732	Ends definition of a subroutine.
1733
1734
1735	.. opcode:: NOP - No Operation
1736
1737	Do nothing.
1738
1739
1740	.. opcode:: BRK - Break
1741
1742	Unconditionally moves the point of execution to the instruction after the
1743	next ENDLOOP or ENDSWITCH. The instruction must appear within a
1744	BGNLOOP/ENDLOOP or SWITCH/ENDSWITCH.
1745
1746
1747	.. opcode:: IF - Float If
1748
1749	Start an IF ... ELSE .. ENDIF block. Condition evaluates to true if
1750
1751	src0.x != 0.0
1752
1753	where src0.x is interpreted as a floating point register.
1754
1755
1756	.. opcode:: UIF - Bitwise If
1757
1758	Start an UIF ... ELSE .. ENDIF block. Condition evaluates to true if
1759
1760	src0.x != 0
1761
1762	where src0.x is interpreted as an integer register.
1763
1764
1765	.. opcode:: ELSE - Else
1766
1767	Starts an else block, after an IF or UIF statement.
1768
1769
1770	.. opcode:: ENDIF - End If
1771
1772	Ends an IF or UIF block.
1773
1774
1775	.. opcode:: SWITCH - Switch
1776
1777	Starts a C-style switch expression. The switch consists of one or multiple
1778	CASE statements, and at most one DEFAULT statement. Execution of a statement
1779	ends when a BRK is hit, but just like in C falling through to other cases
1780	without a break is allowed. Similarly, DEFAULT label is allowed anywhere not
1781	just as last statement, and fallthrough is allowed into/from it.
1782	CASE src arguments are evaluated at bit level against the SWITCH src argument.
1783
1784	Example::
1785
1786	SWITCH src[0].x
1787	CASE src[0].x
1788	(some instructions here)
1789	(optional BRK here)
1790	DEFAULT
1791	(some instructions here)
1792	(optional BRK here)
1793	CASE src[0].x
1794	(some instructions here)
1795	(optional BRK here)
1796	ENDSWITCH
1797
1798
1799	.. opcode:: CASE - Switch case
1800
1801	This represents a switch case label. The src arg must be an integer immediate.
1802
1803
1804	.. opcode:: DEFAULT - Switch default
1805
1806	This represents the default case in the switch, which is taken if no other
1807	case matches.
1808
1809
1810	.. opcode:: ENDSWITCH - End of switch
1811
1812	Ends a switch expression.
1813
1814
1815	Interpolation ISA
1816	^^^^^^^^^^^^^^^^^
1817
1818	The interpolation instructions allow an input to be interpolated in a
1819	different way than its declaration. This corresponds to the GLSL 4.00
1820	interpolateAt* functions. The first argument of each of these must come from
1821	``TGSI_FILE_INPUT``.
1822
1823	.. opcode:: INTERP_CENTROID - Interpolate at the centroid
1824
1825	Interpolates the varying specified by src0 at the centroid
1826
1827	.. opcode:: INTERP_SAMPLE - Interpolate at the specified sample
1828
1829	Interpolates the varying specified by src0 at the sample id
1830	specified by src1.x (interpreted as an integer)
1831
1832	.. opcode:: INTERP_OFFSET - Interpolate at the specified offset
1833
1834	Interpolates the varying specified by src0 at the offset src1.xy
1835	from the pixel center (interpreted as floats)
1836
1837
1838	.. _doubleopcodes:
1839
1840	Double ISA
1841	^^^^^^^^^^^^^^^
1842
1843	The double-precision opcodes reinterpret four-component vectors into
1844	two-component vectors with doubled precision in each component.
1845
1846	.. opcode:: DABS - Absolute
1847
1848	.. math::
1849
1850	dst.xy = \|src0.xy\|
1851
1852	dst.zw = \|src0.zw\|
1853
1854	.. opcode:: DADD - Add
1855
1856	.. math::
1857
1858	dst.xy = src0.xy + src1.xy
1859
1860	dst.zw = src0.zw + src1.zw
1861
1862	.. opcode:: DSEQ - Set on Equal
1863
1864	.. math::
1865
1866	dst.x = src0.xy == src1.xy ? \sim 0 : 0
1867
1868	dst.z = src0.zw == src1.zw ? \sim 0 : 0
1869
1870	.. opcode:: DSNE - Set on Not Equal
1871
1872	.. math::
1873
1874	dst.x = src0.xy != src1.xy ? \sim 0 : 0
1875
1876	dst.z = src0.zw != src1.zw ? \sim 0 : 0
1877
1878	.. opcode:: DSLT - Set on Less than
1879
1880	.. math::
1881
1882	dst.x = src0.xy < src1.xy ? \sim 0 : 0
1883
1884	dst.z = src0.zw < src1.zw ? \sim 0 : 0
1885
1886	.. opcode:: DSGE - Set on Greater equal
1887
1888	.. math::
1889
1890	dst.x = src0.xy >= src1.xy ? \sim 0 : 0
1891
1892	dst.z = src0.zw >= src1.zw ? \sim 0 : 0
1893
1894	.. opcode:: DFRAC - Fraction
1895
1896	.. math::
1897
1898	dst.xy = src.xy - \lfloor src.xy\rfloor
1899
1900	dst.zw = src.zw - \lfloor src.zw\rfloor
1901
1902	.. opcode:: DTRUNC - Truncate
1903
1904	.. math::
1905
1906	dst.xy = trunc(src.xy)
1907
1908	dst.zw = trunc(src.zw)
1909
1910	.. opcode:: DCEIL - Ceiling
1911
1912	.. math::
1913
1914	dst.xy = \lceil src.xy\rceil
1915
1916	dst.zw = \lceil src.zw\rceil
1917
1918	.. opcode:: DFLR - Floor
1919
1920	.. math::
1921
1922	dst.xy = \lfloor src.xy\rfloor
1923
1924	dst.zw = \lfloor src.zw\rfloor
1925
1926	.. opcode:: DROUND - Fraction
1927
1928	.. math::
1929
1930	dst.xy = round(src.xy)
1931
1932	dst.zw = round(src.zw)
1933
1934	.. opcode:: DSSG - Set Sign
1935
1936	.. math::
1937
1938	dst.xy = (src.xy > 0) ? 1.0 : (src.xy < 0) ? -1.0 : 0.0
1939
1940	dst.zw = (src.zw > 0) ? 1.0 : (src.zw < 0) ? -1.0 : 0.0
1941
1942	.. opcode:: DLDEXP - Multiply Number by Integral Power of 2
1943
1944	This opcode is the inverse of frexp. The second
1945	source is an integer.
1946
1947	.. math::
1948
1949	dst.xy = src0.xy \times 2^{src1.x}
1950
1951	dst.zw = src0.zw \times 2^{src1.z}
1952
1953	.. opcode:: DMIN - Minimum
1954
1955	.. math::
1956
1957	dst.xy = min(src0.xy, src1.xy)
1958
1959	dst.zw = min(src0.zw, src1.zw)
1960
1961	.. opcode:: DMAX - Maximum
1962
1963	.. math::
1964
1965	dst.xy = max(src0.xy, src1.xy)
1966
1967	dst.zw = max(src0.zw, src1.zw)
1968
1969	.. opcode:: DMUL - Multiply
1970
1971	.. math::
1972
1973	dst.xy = src0.xy \times src1.xy
1974
1975	dst.zw = src0.zw \times src1.zw
1976
1977
1978	.. opcode:: DMAD - Multiply And Add
1979
1980	.. math::
1981
1982	dst.xy = src0.xy \times src1.xy + src2.xy
1983
1984	dst.zw = src0.zw \times src1.zw + src2.zw
1985
1986
1987	.. opcode:: DFMA - Fused Multiply-Add
1988
1989	Perform a * b + c with no intermediate rounding step.
1990
1991	.. math::
1992
1993	dst.xy = src0.xy \times src1.xy + src2.xy
1994
1995	dst.zw = src0.zw \times src1.zw + src2.zw
1996
1997
1998	.. opcode:: DDIV - Divide
1999
2000	.. math::
2001
2002	dst.xy = \frac{src0.xy}{src1.xy}
2003
2004	dst.zw = \frac{src0.zw}{src1.zw}
2005
2006
2007	.. opcode:: DRCP - Reciprocal
2008
2009	.. math::
2010
2011	dst.xy = \frac{1}{src.xy}
2012
2013	dst.zw = \frac{1}{src.zw}
2014
2015	.. opcode:: DSQRT - Square Root
2016
2017	.. math::
2018
2019	dst.xy = \sqrt{src.xy}
2020
2021	dst.zw = \sqrt{src.zw}
2022
2023	.. opcode:: DRSQ - Reciprocal Square Root
2024
2025	.. math::
2026
2027	dst.xy = \frac{1}{\sqrt{src.xy}}
2028
2029	dst.zw = \frac{1}{\sqrt{src.zw}}
2030
2031	.. opcode:: F2D - Float to Double
2032
2033	.. math::
2034
2035	dst.xy = double(src0.x)
2036
2037	dst.zw = double(src0.y)
2038
2039	.. opcode:: D2F - Double to Float
2040
2041	.. math::
2042
2043	dst.x = float(src0.xy)
2044
2045	dst.y = float(src0.zw)
2046
2047	.. opcode:: I2D - Int to Double
2048
2049	.. math::
2050
2051	dst.xy = double(src0.x)
2052
2053	dst.zw = double(src0.y)
2054
2055	.. opcode:: D2I - Double to Int
2056
2057	.. math::
2058
2059	dst.x = int(src0.xy)
2060
2061	dst.y = int(src0.zw)
2062
2063	.. opcode:: U2D - Unsigned Int to Double
2064
2065	.. math::
2066
2067	dst.xy = double(src0.x)
2068
2069	dst.zw = double(src0.y)
2070
2071	.. opcode:: D2U - Double to Unsigned Int
2072
2073	.. math::
2074
2075	dst.x = unsigned(src0.xy)
2076
2077	dst.y = unsigned(src0.zw)
2078
2079	64-bit Integer ISA
2080	^^^^^^^^^^^^^^^^^^
2081
2082	The 64-bit integer opcodes reinterpret four-component vectors into
2083	two-component vectors with 64-bits in each component.
2084
2085	.. opcode:: I64ABS - 64-bit Integer Absolute Value
2086
2087	.. math::
2088
2089	dst.xy = \|src0.xy\|
2090
2091	dst.zw = \|src0.zw\|
2092
2093	.. opcode:: I64NEG - 64-bit Integer Negate
2094
2095	Two's complement.
2096
2097	.. math::
2098
2099	dst.xy = -src.xy
2100
2101	dst.zw = -src.zw
2102
2103	.. opcode:: I64SSG - 64-bit Integer Set Sign
2104
2105	.. math::
2106
2107	dst.xy = (src0.xy < 0) ? -1 : (src0.xy > 0) ? 1 : 0
2108
2109	dst.zw = (src0.zw < 0) ? -1 : (src0.zw > 0) ? 1 : 0
2110
2111	.. opcode:: U64ADD - 64-bit Integer Add
2112
2113	.. math::
2114
2115	dst.xy = src0.xy + src1.xy
2116
2117	dst.zw = src0.zw + src1.zw
2118
2119	.. opcode:: U64MUL - 64-bit Integer Multiply
2120
2121	.. math::
2122
2123	dst.xy = src0.xy * src1.xy
2124
2125	dst.zw = src0.zw * src1.zw
2126
2127	.. opcode:: U64SEQ - 64-bit Integer Set on Equal
2128
2129	.. math::
2130
2131	dst.x = src0.xy == src1.xy ? \sim 0 : 0
2132
2133	dst.z = src0.zw == src1.zw ? \sim 0 : 0
2134
2135	.. opcode:: U64SNE - 64-bit Integer Set on Not Equal
2136
2137	.. math::
2138
2139	dst.x = src0.xy != src1.xy ? \sim 0 : 0
2140
2141	dst.z = src0.zw != src1.zw ? \sim 0 : 0
2142
2143	.. opcode:: U64SLT - 64-bit Unsigned Integer Set on Less Than
2144
2145	.. math::
2146
2147	dst.x = src0.xy < src1.xy ? \sim 0 : 0
2148
2149	dst.z = src0.zw < src1.zw ? \sim 0 : 0
2150
2151	.. opcode:: U64SGE - 64-bit Unsigned Integer Set on Greater Equal
2152
2153	.. math::
2154
2155	dst.x = src0.xy >= src1.xy ? \sim 0 : 0
2156
2157	dst.z = src0.zw >= src1.zw ? \sim 0 : 0
2158
2159	.. opcode:: I64SLT - 64-bit Signed Integer Set on Less Than
2160
2161	.. math::
2162
2163	dst.x = src0.xy < src1.xy ? \sim 0 : 0
2164
2165	dst.z = src0.zw < src1.zw ? \sim 0 : 0
2166
2167	.. opcode:: I64SGE - 64-bit Signed Integer Set on Greater Equal
2168
2169	.. math::
2170
2171	dst.x = src0.xy >= src1.xy ? \sim 0 : 0
2172
2173	dst.z = src0.zw >= src1.zw ? \sim 0 : 0
2174
2175	.. opcode:: I64MIN - Minimum of 64-bit Signed Integers
2176
2177	.. math::
2178
2179	dst.xy = min(src0.xy, src1.xy)
2180
2181	dst.zw = min(src0.zw, src1.zw)
2182
2183	.. opcode:: U64MIN - Minimum of 64-bit Unsigned Integers
2184
2185	.. math::
2186
2187	dst.xy = min(src0.xy, src1.xy)
2188
2189	dst.zw = min(src0.zw, src1.zw)
2190
2191	.. opcode:: I64MAX - Maximum of 64-bit Signed Integers
2192
2193	.. math::
2194
2195	dst.xy = max(src0.xy, src1.xy)
2196
2197	dst.zw = max(src0.zw, src1.zw)
2198
2199	.. opcode:: U64MAX - Maximum of 64-bit Unsigned Integers
2200
2201	.. math::
2202
2203	dst.xy = max(src0.xy, src1.xy)
2204
2205	dst.zw = max(src0.zw, src1.zw)
2206
2207	.. opcode:: U64SHL - Shift Left 64-bit Unsigned Integer
2208
2209	The shift count is masked with ``0x3f`` before the shift is applied.
2210
2211	.. math::
2212
2213	dst.xy = src0.xy \ll (0x3f \& src1.x)
2214
2215	dst.zw = src0.zw \ll (0x3f \& src1.y)
2216
2217	.. opcode:: I64SHR - Arithmetic Shift Right (of 64-bit Signed Integer)
2218
2219	The shift count is masked with ``0x3f`` before the shift is applied.
2220
2221	.. math::
2222
2223	dst.xy = src0.xy \gg (0x3f \& src1.x)
2224
2225	dst.zw = src0.zw \gg (0x3f \& src1.y)
2226
2227	.. opcode:: U64SHR - Logical Shift Right (of 64-bit Unsigned Integer)
2228
2229	The shift count is masked with ``0x3f`` before the shift is applied.
2230
2231	.. math::
2232
2233	dst.xy = src0.xy \gg (unsigned) (0x3f \& src1.x)
2234
2235	dst.zw = src0.zw \gg (unsigned) (0x3f \& src1.y)
2236
2237	.. opcode:: I64DIV - 64-bit Signed Integer Division
2238
2239	.. math::
2240
2241	dst.xy = \frac{src0.xy}{src1.xy}
2242
2243	dst.zw = \frac{src0.zw}{src1.zw}
2244
2245	.. opcode:: U64DIV - 64-bit Unsigned Integer Division
2246
2247	.. math::
2248
2249	dst.xy = \frac{src0.xy}{src1.xy}
2250
2251	dst.zw = \frac{src0.zw}{src1.zw}
2252
2253	.. opcode:: U64MOD - 64-bit Unsigned Integer Remainder
2254
2255	.. math::
2256
2257	dst.xy = src0.xy \bmod src1.xy
2258
2259	dst.zw = src0.zw \bmod src1.zw
2260
2261	.. opcode:: I64MOD - 64-bit Signed Integer Remainder
2262
2263	.. math::
2264
2265	dst.xy = src0.xy \bmod src1.xy
2266
2267	dst.zw = src0.zw \bmod src1.zw
2268
2269	.. opcode:: F2U64 - Float to 64-bit Unsigned Int
2270
2271	.. math::
2272
2273	dst.xy = (uint64_t) src0.x
2274
2275	dst.zw = (uint64_t) src0.y
2276
2277	.. opcode:: F2I64 - Float to 64-bit Int
2278
2279	.. math::
2280
2281	dst.xy = (int64_t) src0.x
2282
2283	dst.zw = (int64_t) src0.y
2284
2285	.. opcode:: U2I64 - Unsigned Integer to 64-bit Integer
2286
2287	This is a zero extension.
2288
2289	.. math::
2290
2291	dst.xy = (int64_t) src0.x
2292
2293	dst.zw = (int64_t) src0.y
2294
2295	.. opcode:: I2I64 - Signed Integer to 64-bit Integer
2296
2297	This is a sign extension.
2298
2299	.. math::
2300
2301	dst.xy = (int64_t) src0.x
2302
2303	dst.zw = (int64_t) src0.y
2304
2305	.. opcode:: D2U64 - Double to 64-bit Unsigned Int
2306
2307	.. math::
2308
2309	dst.xy = (uint64_t) src0.xy
2310
2311	dst.zw = (uint64_t) src0.zw
2312
2313	.. opcode:: D2I64 - Double to 64-bit Int
2314
2315	.. math::
2316
2317	dst.xy = (int64_t) src0.xy
2318
2319	dst.zw = (int64_t) src0.zw
2320
2321	.. opcode:: U642F - 64-bit unsigned integer to float
2322
2323	.. math::
2324
2325	dst.x = (float) src0.xy
2326
2327	dst.y = (float) src0.zw
2328
2329	.. opcode:: I642F - 64-bit Int to Float
2330
2331	.. math::
2332
2333	dst.x = (float) src0.xy
2334
2335	dst.y = (float) src0.zw
2336
2337	.. opcode:: U642D - 64-bit unsigned integer to double
2338
2339	.. math::
2340
2341	dst.xy = (double) src0.xy
2342
2343	dst.zw = (double) src0.zw
2344
2345	.. opcode:: I642D - 64-bit Int to double
2346
2347	.. math::
2348
2349	dst.xy = (double) src0.xy
2350
2351	dst.zw = (double) src0.zw
2352
2353	.. _samplingopcodes:
2354
2355	Resource Sampling Opcodes
2356	^^^^^^^^^^^^^^^^^^^^^^^^^
2357
2358	Those opcodes follow very closely semantics of the respective Direct3D
2359	instructions. If in doubt double check Direct3D documentation.
2360	Note that the swizzle on SVIEW (src1) determines texel swizzling
2361	after lookup.
2362
2363	.. opcode:: SAMPLE
2364
2365	Using provided address, sample data from the specified texture using the
2366	filtering mode identified by the given sampler. The source data may come from
2367	any resource type other than buffers.
2368
2369	Syntax: ``SAMPLE dst, address, sampler_view, sampler``
2370
2371	Example: ``SAMPLE TEMP[0], TEMP[1], SVIEW[0], SAMP[0]``
2372
2373	.. opcode:: SAMPLE_I
2374
2375	Simplified alternative to the SAMPLE instruction. Using the provided
2376	integer address, SAMPLE_I fetches data from the specified sampler view
2377	without any filtering. The source data may come from any resource type
2378	other than CUBE.
2379
2380	Syntax: ``SAMPLE_I dst, address, sampler_view``
2381
2382	Example: ``SAMPLE_I TEMP[0], TEMP[1], SVIEW[0]``
2383
2384	The 'address' is specified as unsigned integers. If the 'address' is out of
2385	range [0...(# texels - 1)] the result of the fetch is always 0 in all
2386	components. As such the instruction doesn't honor address wrap modes, in
2387	cases where that behavior is desirable 'SAMPLE' instruction should be used.
2388	address.w always provides an unsigned integer mipmap level. If the value is
2389	out of the range then the instruction always returns 0 in all components.
2390	address.yz are ignored for buffers and 1d textures. address.z is ignored
2391	for 1d texture arrays and 2d textures.
2392
2393	For 1D texture arrays address.y provides the array index (also as unsigned
2394	integer). If the value is out of the range of available array indices
2395	[0... (array size - 1)] then the opcode always returns 0 in all components.
2396	For 2D texture arrays address.z provides the array index, otherwise it
2397	exhibits the same behavior as in the case for 1D texture arrays. The exact
2398	semantics of the source address are presented in the table below:
2399
2400	+---------------------------+----+-----+-----+---------+
2401	\| resource type \| X \| Y \| Z \| W \|
2402	+===========================+====+=====+=====+=========+
2403	\| ``PIPE_BUFFER`` \| x \| \| \| ignored \|
2404	+---------------------------+----+-----+-----+---------+
2405	\| ``PIPE_TEXTURE_1D`` \| x \| \| \| mpl \|
2406	+---------------------------+----+-----+-----+---------+
2407	\| ``PIPE_TEXTURE_2D`` \| x \| y \| \| mpl \|
2408	+---------------------------+----+-----+-----+---------+
2409	\| ``PIPE_TEXTURE_3D`` \| x \| y \| z \| mpl \|
2410	+---------------------------+----+-----+-----+---------+
2411	\| ``PIPE_TEXTURE_RECT`` \| x \| y \| \| mpl \|
2412	+---------------------------+----+-----+-----+---------+
2413	\| ``PIPE_TEXTURE_CUBE`` \| not allowed as source \|
2414	+---------------------------+----+-----+-----+---------+
2415	\| ``PIPE_TEXTURE_1D_ARRAY`` \| x \| idx \| \| mpl \|
2416	+---------------------------+----+-----+-----+---------+
2417	\| ``PIPE_TEXTURE_2D_ARRAY`` \| x \| y \| idx \| mpl \|
2418	+---------------------------+----+-----+-----+---------+
2419
2420	Where 'mpl' is a mipmap level and 'idx' is the array index.
2421
2422	.. opcode:: SAMPLE_I_MS
2423
2424	Just like SAMPLE_I but allows fetch data from multi-sampled surfaces.
2425
2426	Syntax: ``SAMPLE_I_MS dst, address, sampler_view, sample``
2427
2428	.. opcode:: SAMPLE_B
2429
2430	Just like the SAMPLE instruction with the exception that an additional bias
2431	is applied to the level of detail computed as part of the instruction
2432	execution.
2433
2434	Syntax: ``SAMPLE_B dst, address, sampler_view, sampler, lod_bias``
2435
2436	Example: ``SAMPLE_B TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2].x``
2437
2438	.. opcode:: SAMPLE_C
2439
2440	Similar to the SAMPLE instruction but it performs a comparison filter. The
2441	operands to SAMPLE_C are identical to SAMPLE, except that there is an
2442	additional float32 operand, reference value, which must be a register with
2443	single-component, or a scalar literal. SAMPLE_C makes the hardware use the
2444	current samplers compare_func (in pipe_sampler_state) to compare reference
2445	value against the red component value for the source resource at each texel
2446	that the currently configured texture filter covers based on the provided
2447	coordinates.
2448
2449	Syntax: ``SAMPLE_C dst, address, sampler_view.r, sampler, ref_value``
2450
2451	Example: ``SAMPLE_C TEMP[0], TEMP[1], SVIEW[0].r, SAMP[0], TEMP[2].x``
2452
2453	.. opcode:: SAMPLE_C_LZ
2454
2455	Same as SAMPLE_C, but LOD is 0 and derivatives are ignored. The LZ stands
2456	for level-zero.
2457
2458	Syntax: ``SAMPLE_C_LZ dst, address, sampler_view.r, sampler, ref_value``
2459
2460	Example: ``SAMPLE_C_LZ TEMP[0], TEMP[1], SVIEW[0].r, SAMP[0], TEMP[2].x``
2461
2462
2463	.. opcode:: SAMPLE_D
2464
2465	SAMPLE_D is identical to the SAMPLE opcode except that the derivatives for
2466	the source address in the x direction and the y direction are provided by
2467	extra parameters.
2468
2469	Syntax: ``SAMPLE_D dst, address, sampler_view, sampler, der_x, der_y``
2470
2471	Example: ``SAMPLE_D TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2], TEMP[3]``
2472
2473	.. opcode:: SAMPLE_L
2474
2475	SAMPLE_L is identical to the SAMPLE opcode except that the LOD is provided
2476	directly as a scalar value, representing no anisotropy.
2477
2478	Syntax: ``SAMPLE_L dst, address, sampler_view, sampler, explicit_lod``
2479
2480	Example: ``SAMPLE_L TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2].x``
2481
2482	.. opcode:: GATHER4
2483
2484	Gathers the four texels to be used in a bi-linear filtering operation and
2485	packs them into a single register. Only works with 2D, 2D array, cubemaps,
2486	and cubemaps arrays. For 2D textures, only the addressing modes of the
2487	sampler and the top level of any mip pyramid are used. Set W to zero. It
2488	behaves like the SAMPLE instruction, but a filtered sample is not
2489	generated. The four samples that contribute to filtering are placed into
2490	XYZW in counter-clockwise order, starting with the (u,v) texture coordinate
2491	delta at the following locations (-, +), (+, +), (+, -), (-, -), where the
2492	magnitude of the deltas are half a texel.
2493
2494
2495	.. opcode:: SVIEWINFO
2496
2497	Query the dimensions of a given sampler view. dst receives width, height,
2498	depth or array size and number of mipmap levels as int4. The dst can have a
2499	writemask which will specify what info is the caller interested in.
2500
2501	Syntax: ``SVIEWINFO dst, src_mip_level, sampler_view``
2502
2503	Example: ``SVIEWINFO TEMP[0], TEMP[1].x, SVIEW[0]``
2504
2505	src_mip_level is an unsigned integer scalar. If it's out of range then
2506	returns 0 for width, height and depth/array size but the total number of
2507	mipmap is still returned correctly for the given sampler view. The returned
2508	width, height and depth values are for the mipmap level selected by the
2509	src_mip_level and are in the number of texels. For 1d texture array width
2510	is in dst.x, array size is in dst.y and dst.z is 0. The number of mipmaps is
2511	still in dst.w. In contrast to d3d10 resinfo, there's no way in the tgsi
2512	instruction encoding to specify the return type (float/rcpfloat/uint), hence
2513	always using uint. Also, unlike the SAMPLE instructions, the swizzle on src1
2514	resinfo allowing swizzling dst values is ignored (due to the interaction
2515	with rcpfloat modifier which requires some swizzle handling in the state
2516	tracker anyway).
2517
2518	.. opcode:: SAMPLE_POS
2519
2520	Query the position of a sample in the given resource or render target
2521	when per-sample fragment shading is in effect.
2522
2523	Syntax: ``SAMPLE_POS dst, source, sample_index``
2524
2525	dst receives float4 (x, y, undef, undef) indicated where the sample is
2526	located. Sample locations are in the range [0, 1] where 0.5 is the center
2527	of the fragment.
2528
2529	source is either a sampler view (to indicate a shader resource) or temp
2530	register (to indicate the render target). The source register may have
2531	an optional swizzle to apply to the returned result
2532
2533	sample_index is an integer scalar indicating which sample position is to
2534	be queried.
2535
2536	If per-sample shading is not in effect or the source resource or render
2537	target is not multisampled, the result is (0.5, 0.5, undef, undef).
2538
2539	NOTE: no driver has implemented this opcode yet (and no gallium frontend
2540	emits it). This information is subject to change.
2541
2542	.. opcode:: SAMPLE_INFO
2543
2544	Query the number of samples in a multisampled resource or render target.
2545
2546	Syntax: ``SAMPLE_INFO dst, source``
2547
2548	dst receives int4 (n, 0, 0, 0) where n is the number of samples in a
2549	resource or the render target.
2550
2551	source is either a sampler view (to indicate a shader resource) or temp
2552	register (to indicate the render target). The source register may have
2553	an optional swizzle to apply to the returned result
2554
2555	If per-sample shading is not in effect or the source resource or render
2556	target is not multisampled, the result is (1, 0, 0, 0).
2557
2558	NOTE: no driver has implemented this opcode yet (and no gallium frontend
2559	emits it). This information is subject to change.
2560
2561	.. opcode:: LOD - level of detail
2562
2563	Same syntax as the SAMPLE opcode but instead of performing an actual
2564	texture lookup/filter, return the computed LOD information that the
2565	texture pipe would use to access the texture. The Y component contains
2566	the computed LOD lambda_prime. The X component contains the LOD that will
2567	be accessed, based on min/max lod's and mipmap filters.
2568	The Z and W components are set to 0.
2569
2570	Syntax: ``LOD dst, address, sampler_view, sampler``
2571
2572
2573	.. _resourceopcodes:
2574
2575	Resource Access Opcodes
2576	^^^^^^^^^^^^^^^^^^^^^^^
2577
2578	For these opcodes, the resource can be a BUFFER, IMAGE, or MEMORY.
2579
2580	.. opcode:: LOAD - Fetch data from a shader buffer or image
2581
2582	Syntax: ``LOAD dst, resource, address``
2583
2584	Example: ``LOAD TEMP[0], BUFFER[0], TEMP[1]``
2585
2586	Using the provided integer address, LOAD fetches data from the
2587	specified buffer or texture without any filtering.
2588
2589	The 'address' is specified as a vector of unsigned integers. If the
2590	'address' is out of range the result is unspecified.
2591
2592	Only the first mipmap level of a resource can be read from using this
2593	instruction.
2594
2595	For 1D or 2D texture arrays, the array index is provided as an
2596	unsigned integer in address.y or address.z, respectively. address.yz
2597	are ignored for buffers and 1D textures. address.z is ignored for 1D
2598	texture arrays and 2D textures. address.w is always ignored.
2599
2600	A swizzle suffix may be added to the resource argument this will
2601	cause the resource data to be swizzled accordingly.
2602
2603	.. opcode:: STORE - Write data to a shader resource
2604
2605	Syntax: ``STORE resource, address, src``
2606
2607	Example: ``STORE BUFFER[0], TEMP[0], TEMP[1]``
2608
2609	Using the provided integer address, STORE writes data to the
2610	specified buffer or texture.
2611
2612	The 'address' is specified as a vector of unsigned integers. If the
2613	'address' is out of range the result is unspecified.
2614
2615	Only the first mipmap level of a resource can be written to using
2616	this instruction.
2617
2618	For 1D or 2D texture arrays, the array index is provided as an
2619	unsigned integer in address.y or address.z, respectively.
2620	address.yz are ignored for buffers and 1D textures. address.z is
2621	ignored for 1D texture arrays and 2D textures. address.w is always
2622	ignored.
2623
2624	.. opcode:: RESQ - Query information about a resource
2625
2626	Syntax: ``RESQ dst, resource``
2627
2628	Example: ``RESQ TEMP[0], BUFFER[0]``
2629
2630	Returns information about the buffer or image resource. For buffer
2631	resources, the size (in bytes) is returned in the x component. For
2632	image resources, .xyz will contain the width/height/layers of the
2633	image, while .w will contain the number of samples for multi-sampled
2634	images.
2635
2636	.. opcode:: FBFETCH - Load data from framebuffer
2637
2638	Syntax: ``FBFETCH dst, output``
2639
2640	Example: ``FBFETCH TEMP[0], OUT[0]``
2641
2642	This is only valid on ``COLOR`` semantic outputs. Returns the color
2643	of the current position in the framebuffer from before this fragment
2644	shader invocation. May return the same value from multiple calls for
2645	a particular output within a single invocation. Note that result may
2646	be undefined if a fragment is drawn multiple times without a blend
2647	barrier in between.
2648
2649
2650	.. _bindlessopcodes:
2651
2652	Bindless Opcodes
2653	^^^^^^^^^^^^^^^^
2654
2655	These opcodes are for working with bindless sampler or image handles and
2656	require PIPE_CAP_BINDLESS_TEXTURE.
2657
2658	.. opcode:: IMG2HND - Get a bindless handle for a image
2659
2660	Syntax: ``IMG2HND dst, image``
2661
2662	Example: ``IMG2HND TEMP[0], IMAGE[0]``
2663
2664	Sets 'dst' to a bindless handle for 'image'.
2665
2666	.. opcode:: SAMP2HND - Get a bindless handle for a sampler
2667
2668	Syntax: ``SAMP2HND dst, sampler``
2669
2670	Example: ``SAMP2HND TEMP[0], SAMP[0]``
2671
2672	Sets 'dst' to a bindless handle for 'sampler'.
2673
2674
2675	.. _threadsyncopcodes:
2676
2677	Inter-thread synchronization opcodes
2678	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2679
2680	These opcodes are intended for communication between threads running
2681	within the same compute grid. For now they're only valid in compute
2682	programs.
2683
2684	.. opcode:: BARRIER - Thread group barrier
2685
2686	``BARRIER``
2687
2688	This opcode suspends the execution of the current thread until all
2689	the remaining threads in the working group reach the same point of
2690	the program. Results are unspecified if any of the remaining
2691	threads terminates or never reaches an executed BARRIER instruction.
2692
2693	.. opcode:: MEMBAR - Memory barrier
2694
2695	``MEMBAR type``
2696
2697	This opcode waits for the completion of all memory accesses based on
2698	the type passed in. The type is an immediate bitfield with the following
2699	meaning:
2700
2701	Bit 0: Shader storage buffers
2702	Bit 1: Atomic buffers
2703	Bit 2: Images
2704	Bit 3: Shared memory
2705	Bit 4: Thread group
2706
2707	These may be passed in in any combination. An implementation is free to not
2708	distinguish between these as it sees fit. However these map to all the
2709	possibilities made available by GLSL.
2710
2711	.. _atomopcodes:
2712
2713	Atomic opcodes
2714	^^^^^^^^^^^^^^
2715
2716	These opcodes provide atomic variants of some common arithmetic and
2717	logical operations. In this context atomicity means that another
2718	concurrent memory access operation that affects the same memory
2719	location is guaranteed to be performed strictly before or after the
2720	entire execution of the atomic operation. The resource may be a BUFFER,
2721	IMAGE, HWATOMIC, or MEMORY. In the case of an image, the offset works
2722	the same as for ``LOAD`` and ``STORE``, specified above. For atomic
2723	counters, the offset is an immediate index to the base HW atomic
2724	counter for this operation.
2725	These atomic operations may only be used with 32-bit integer image formats.
2726
2727	.. opcode:: ATOMUADD - Atomic integer addition
2728
2729	Syntax: ``ATOMUADD dst, resource, offset, src``
2730
2731	Example: ``ATOMUADD TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2732
2733	The following operation is performed atomically:
2734
2735	.. math::
2736
2737	dst_x = resource[offset]
2738
2739	resource[offset] = dst_x + src_x
2740
2741
2742	.. opcode:: ATOMFADD - Atomic floating point addition
2743
2744	Syntax: ``ATOMFADD dst, resource, offset, src``
2745
2746	Example: ``ATOMFADD TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2747
2748	The following operation is performed atomically:
2749
2750	.. math::
2751
2752	dst_x = resource[offset]
2753
2754	resource[offset] = dst_x + src_x
2755
2756
2757	.. opcode:: ATOMXCHG - Atomic exchange
2758
2759	Syntax: ``ATOMXCHG dst, resource, offset, src``
2760
2761	Example: ``ATOMXCHG TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2762
2763	The following operation is performed atomically:
2764
2765	.. math::
2766
2767	dst_x = resource[offset]
2768
2769	resource[offset] = src_x
2770
2771
2772	.. opcode:: ATOMCAS - Atomic compare-and-exchange
2773
2774	Syntax: ``ATOMCAS dst, resource, offset, cmp, src``
2775
2776	Example: ``ATOMCAS TEMP[0], BUFFER[0], TEMP[1], TEMP[2], TEMP[3]``
2777
2778	The following operation is performed atomically:
2779
2780	.. math::
2781
2782	dst_x = resource[offset]
2783
2784	resource[offset] = (dst_x == cmp_x ? src_x : dst_x)
2785
2786
2787	.. opcode:: ATOMAND - Atomic bitwise And
2788
2789	Syntax: ``ATOMAND dst, resource, offset, src``
2790
2791	Example: ``ATOMAND TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2792
2793	The following operation is performed atomically:
2794
2795	.. math::
2796
2797	dst_x = resource[offset]
2798
2799	resource[offset] = dst_x \& src_x
2800
2801
2802	.. opcode:: ATOMOR - Atomic bitwise Or
2803
2804	Syntax: ``ATOMOR dst, resource, offset, src``
2805
2806	Example: ``ATOMOR TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2807
2808	The following operation is performed atomically:
2809
2810	.. math::
2811
2812	dst_x = resource[offset]
2813
2814	resource[offset] = dst_x \| src_x
2815
2816
2817	.. opcode:: ATOMXOR - Atomic bitwise Xor
2818
2819	Syntax: ``ATOMXOR dst, resource, offset, src``
2820
2821	Example: ``ATOMXOR TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2822
2823	The following operation is performed atomically:
2824
2825	.. math::
2826
2827	dst_x = resource[offset]
2828
2829	resource[offset] = dst_x \oplus src_x
2830
2831
2832	.. opcode:: ATOMUMIN - Atomic unsigned minimum
2833
2834	Syntax: ``ATOMUMIN dst, resource, offset, src``
2835
2836	Example: ``ATOMUMIN TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2837
2838	The following operation is performed atomically:
2839
2840	.. math::
2841
2842	dst_x = resource[offset]
2843
2844	resource[offset] = (dst_x < src_x ? dst_x : src_x)
2845
2846
2847	.. opcode:: ATOMUMAX - Atomic unsigned maximum
2848
2849	Syntax: ``ATOMUMAX dst, resource, offset, src``
2850
2851	Example: ``ATOMUMAX TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2852
2853	The following operation is performed atomically:
2854
2855	.. math::
2856
2857	dst_x = resource[offset]
2858
2859	resource[offset] = (dst_x > src_x ? dst_x : src_x)
2860
2861
2862	.. opcode:: ATOMIMIN - Atomic signed minimum
2863
2864	Syntax: ``ATOMIMIN dst, resource, offset, src``
2865
2866	Example: ``ATOMIMIN TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2867
2868	The following operation is performed atomically:
2869
2870	.. math::
2871
2872	dst_x = resource[offset]
2873
2874	resource[offset] = (dst_x < src_x ? dst_x : src_x)
2875
2876
2877	.. opcode:: ATOMIMAX - Atomic signed maximum
2878
2879	Syntax: ``ATOMIMAX dst, resource, offset, src``
2880
2881	Example: ``ATOMIMAX TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2882
2883	The following operation is performed atomically:
2884
2885	.. math::
2886
2887	dst_x = resource[offset]
2888
2889	resource[offset] = (dst_x > src_x ? dst_x : src_x)
2890
2891
2892	.. opcode:: ATOMINC_WRAP - Atomic increment + wrap around
2893
2894	Syntax: ``ATOMINC_WRAP dst, resource, offset, src``
2895
2896	Example: ``ATOMINC_WRAP TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2897
2898	The following operation is performed atomically:
2899
2900	.. math::
2901
2902	dst_x = resource[offset] + 1
2903
2904	resource[offset] = dst_x <= src_x ? dst_x : 0
2905
2906
2907	.. opcode:: ATOMDEC_WRAP - Atomic decrement + wrap around
2908
2909	Syntax: ``ATOMDEC_WRAP dst, resource, offset, src``
2910
2911	Example: ``ATOMDEC_WRAP TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2912
2913	The following operation is performed atomically:
2914
2915	.. math::
2916
2917	dst_x = resource[offset]
2918
2919	resource[offset] =
2920	\left\{
2921	\begin{array}{ c l }
2922	dst_x - 1 & \quad \textrm{if } dst_x \gt 0 \textrm{ and } dst_x \lt src_x \\
2923	0 & \quad \textrm{otherwise}
2924	\end{array}
2925	\right.
2926
2927	.. _interlaneopcodes:
2928
2929	Inter-lane opcodes
2930	^^^^^^^^^^^^^^^^^^
2931
2932	These opcodes reduce the given value across the shader invocations
2933	running in the current SIMD group. Every thread in the subgroup will receive
2934	the same result. The BALLOT operations accept a single-channel argument that
2935	is treated as a boolean and produce a 64-bit value.
2936
2937	.. opcode:: VOTE_ANY - Value is set in any of the active invocations
2938
2939	Syntax: ``VOTE_ANY dst, value``
2940
2941	Example: ``VOTE_ANY TEMP[0].x, TEMP[1].x``
2942
2943
2944	.. opcode:: VOTE_ALL - Value is set in all of the active invocations
2945
2946	Syntax: ``VOTE_ALL dst, value``
2947
2948	Example: ``VOTE_ALL TEMP[0].x, TEMP[1].x``
2949
2950
2951	.. opcode:: VOTE_EQ - Value is the same in all of the active invocations
2952
2953	Syntax: ``VOTE_EQ dst, value``
2954
2955	Example: ``VOTE_EQ TEMP[0].x, TEMP[1].x``
2956
2957
2958	.. opcode:: BALLOT - Lanemask of whether the value is set in each active
2959	invocation
2960
2961	Syntax: ``BALLOT dst, value``
2962
2963	Example: ``BALLOT TEMP[0].xy, TEMP[1].x``
2964
2965	When the argument is a constant true, this produces a bitmask of active
2966	invocations. In fragment shaders, this can include helper invocations
2967	(invocations whose outputs and writes to memory are discarded, but which
2968	are used to compute derivatives).
2969
2970
2971	.. opcode:: READ_FIRST - Broadcast the value from the first active
2972	invocation to all active lanes
2973
2974	Syntax: ``READ_FIRST dst, value``
2975
2976	Example: ``READ_FIRST TEMP[0], TEMP[1]``
2977
2978
2979	.. opcode:: READ_INVOC - Retrieve the value from the given invocation
2980	(need not be uniform)
2981
2982	Syntax: ``READ_INVOC dst, value, invocation``
2983
2984	Example: ``READ_INVOC TEMP[0].xy, TEMP[1].xy, TEMP[2].x``
2985
2986	invocation.x controls the invocation number to read from for all channels.
2987	The invocation number must be the same across all active invocations in a
2988	sub-group; otherwise, the results are undefined.
2989
2990
2991	Explanation of symbols used
2992	------------------------------
2993
2994
2995	Functions
2996	^^^^^^^^^^^^^^
2997
2998
2999	:math:`\|x\|` Absolute value of ``x``.
3000
3001	:math:`\lceil x \rceil` Ceiling of ``x``.
3002
3003	clamp(x,y,z) Clamp x between y and z.
3004	(x < y) ? y : (x > z) ? z : x
3005
3006	:math:`\lfloor x\rfloor` Floor of ``x``.
3007
3008	:math:`\log_2{x}` Logarithm of ``x``, base 2.
3009
3010	max(x,y) Maximum of x and y.
3011	(x > y) ? x : y
3012
3013	min(x,y) Minimum of x and y.
3014	(x < y) ? x : y
3015
3016	partialx(x) Derivative of x relative to fragment's X.
3017
3018	partialy(x) Derivative of x relative to fragment's Y.
3019
3020	pop() Pop from stack.
3021
3022	:math:`x^y` ``x`` to the power ``y``.
3023
3024	push(x) Push x on stack.
3025
3026	round(x) Round x.
3027
3028	trunc(x) Truncate x, i.e. drop the fraction bits.
3029
3030
3031	Keywords
3032	^^^^^^^^^^^^^
3033
3034
3035	discard Discard fragment.
3036
3037	pc Program counter.
3038
3039	target Label of target instruction.
3040
3041
3042	Other tokens
3043	---------------
3044
3045
3046	Declaration
3047	^^^^^^^^^^^
3048
3049
3050	Declares a register that is will be referenced as an operand in Instruction
3051	tokens.
3052
3053	File field contains register file that is being declared and is one
3054	of TGSI_FILE.
3055
3056	UsageMask field specifies which of the register components can be accessed
3057	and is one of TGSI_WRITEMASK.
3058
3059	The Local flag specifies that a given value isn't intended for
3060	subroutine parameter passing and, as a result, the implementation
3061	isn't required to give any guarantees of it being preserved across
3062	subroutine boundaries. As it's merely a compiler hint, the
3063	implementation is free to ignore it.
3064
3065	If Dimension flag is set to 1, a Declaration Dimension token follows.
3066
3067	If Semantic flag is set to 1, a Declaration Semantic token follows.
3068
3069	If Interpolate flag is set to 1, a Declaration Interpolate token follows.
3070
3071	If file is TGSI_FILE_RESOURCE, a Declaration Resource token follows.
3072
3073	If Array flag is set to 1, a Declaration Array token follows.
3074
3075	Array Declaration
3076	^^^^^^^^^^^^^^^^^^^^^^^^
3077
3078	Declarations can optional have an ArrayID attribute which can be referred by
3079	indirect addressing operands. An ArrayID of zero is reserved and treated as
3080	if no ArrayID is specified.
3081
3082	If an indirect addressing operand refers to a specific declaration by using
3083	an ArrayID only the registers in this declaration are guaranteed to be
3084	accessed, accessing any register outside this declaration results in undefined
3085	behavior. Note that for compatibility the effective index is zero-based and
3086	not relative to the specified declaration
3087
3088	If no ArrayID is specified with an indirect addressing operand the whole
3089	register file might be accessed by this operand. This is strongly discouraged
3090	and will prevent packing of scalar/vec2 arrays and effective alias analysis.
3091	This is only legal for TEMP and CONST register files.
3092
3093	Declaration Semantic
3094	^^^^^^^^^^^^^^^^^^^^^^^^
3095
3096	Vertex and fragment shader input and output registers may be labeled
3097	with semantic information consisting of a name and index.
3098
3099	Follows Declaration token if Semantic bit is set.
3100
3101	Since its purpose is to link a shader with other stages of the pipeline,
3102	it is valid to follow only those Declaration tokens that declare a register
3103	either in INPUT or OUTPUT file.
3104
3105	SemanticName field contains the semantic name of the register being declared.
3106	There is no default value.
3107
3108	SemanticIndex is an optional subscript that can be used to distinguish
3109	different register declarations with the same semantic name. The default value
3110	is 0.
3111
3112	The meanings of the individual semantic names are explained in the following
3113	sections.
3114
3115	TGSI_SEMANTIC_POSITION
3116	""""""""""""""""""""""
3117
3118	For vertex shaders, TGSI_SEMANTIC_POSITION indicates the vertex shader
3119	output register which contains the homogeneous vertex position in the clip
3120	space coordinate system. After clipping, the X, Y and Z components of the
3121	vertex will be divided by the W value to get normalized device coordinates.
3122
3123	For fragment shaders, TGSI_SEMANTIC_POSITION is used to indicate that
3124	fragment shader input (or system value, depending on which one is
3125	supported by the driver) contains the fragment's window position. The X
3126	component starts at zero and always increases from left to right.
3127	The Y component starts at zero and always increases but Y=0 may either
3128	indicate the top of the window or the bottom depending on the fragment
3129	coordinate origin convention (see TGSI_PROPERTY_FS_COORD_ORIGIN).
3130	The Z coordinate ranges from 0 to 1 to represent depth from the front
3131	to the back of the Z buffer. The W component contains the interpolated
3132	reciprocal of the vertex position W component (corresponding to gl_Fragcoord,
3133	but unlike d3d10 which interpolates the same 1/w but then gives back
3134	the reciprocal of the interpolated value).
3135
3136	Fragment shaders may also declare an output register with
3137	TGSI_SEMANTIC_POSITION. Only the Z component is writable. This allows
3138	the fragment shader to change the fragment's Z position.
3139
3140
3141
3142	TGSI_SEMANTIC_COLOR
3143	"""""""""""""""""""
3144
3145	For vertex shader outputs or fragment shader inputs/outputs, this
3146	label indicates that the register contains an R,G,B,A color.
3147
3148	Several shader inputs/outputs may contain colors so the semantic index
3149	is used to distinguish them. For example, color[0] may be the diffuse
3150	color while color[1] may be the specular color.
3151
3152	This label is needed so that the flat/smooth shading can be applied
3153	to the right interpolants during rasterization.
3154
3155
3156
3157	TGSI_SEMANTIC_BCOLOR
3158	""""""""""""""""""""
3159
3160	Back-facing colors are only used for back-facing polygons, and are only valid
3161	in vertex shader outputs. After rasterization, all polygons are front-facing
3162	and COLOR and BCOLOR end up occupying the same slots in the fragment shader,
3163	so all BCOLORs effectively become regular COLORs in the fragment shader.
3164
3165
3166	TGSI_SEMANTIC_FOG
3167	"""""""""""""""""
3168
3169	Vertex shader inputs and outputs and fragment shader inputs may be
3170	labeled with TGSI_SEMANTIC_FOG to indicate that the register contains
3171	a fog coordinate. Typically, the fragment shader will use the fog coordinate
3172	to compute a fog blend factor which is used to blend the normal fragment color
3173	with a constant fog color. But fog coord really is just an ordinary vec4
3174	register like regular semantics.
3175
3176
3177	TGSI_SEMANTIC_PSIZE
3178	"""""""""""""""""""
3179
3180	Vertex shader input and output registers may be labeled with
3181	TGIS_SEMANTIC_PSIZE to indicate that the register contains a point size
3182	in the form (S, 0, 0, 1). The point size controls the width or diameter
3183	of points for rasterization. This label cannot be used in fragment
3184	shaders.
3185
3186	When using this semantic, be sure to set the appropriate state in the
3187	:ref:`rasterizer` first.
3188
3189
3190	TGSI_SEMANTIC_TEXCOORD
3191	""""""""""""""""""""""
3192
3193	Only available if PIPE_CAP_TGSI_TEXCOORD is exposed !
3194
3195	Vertex shader outputs and fragment shader inputs may be labeled with
3196	this semantic to make them replaceable by sprite coordinates via the
3197	sprite_coord_enable state in the :ref:`rasterizer`.
3198	The semantic index permitted with this semantic is limited to <= 7.
3199
3200	If the driver does not support TEXCOORD, sprite coordinate replacement
3201	applies to inputs with the GENERIC semantic instead.
3202
3203	The intended use case for this semantic is gl_TexCoord.
3204
3205
3206	TGSI_SEMANTIC_PCOORD
3207	""""""""""""""""""""
3208
3209	Only available if PIPE_CAP_TGSI_TEXCOORD is exposed !
3210
3211	Fragment shader inputs may be labeled with TGSI_SEMANTIC_PCOORD to indicate
3212	that the register contains sprite coordinates in the form (x, y, 0, 1), if
3213	the current primitive is a point and point sprites are enabled. Otherwise,
3214	the contents of the register are undefined.
3215
3216	The intended use case for this semantic is gl_PointCoord.
3217
3218
3219	TGSI_SEMANTIC_GENERIC
3220	"""""""""""""""""""""
3221
3222	All vertex/fragment shader inputs/outputs not labeled with any other
3223	semantic label can be considered to be generic attributes. Typical
3224	uses of generic inputs/outputs are texcoords and user-defined values.
3225
3226
3227	TGSI_SEMANTIC_NORMAL
3228	""""""""""""""""""""
3229
3230	Indicates that a vertex shader input is a normal vector. This is
3231	typically only used for legacy graphics APIs.
3232
3233
3234	TGSI_SEMANTIC_FACE
3235	""""""""""""""""""
3236
3237	This label applies to fragment shader inputs (or system values,
3238	depending on which one is supported by the driver) and indicates that
3239	the register contains front/back-face information.
3240
3241	If it is an input, it will be a floating-point vector in the form (F, 0, 0, 1),
3242	where F will be positive when the fragment belongs to a front-facing polygon,
3243	and negative when the fragment belongs to a back-facing polygon.
3244
3245	If it is a system value, it will be an integer vector in the form (F, 0, 0, 1),
3246	where F is ``0xffffffff`` when the fragment belongs to a front-facing polygon
3247	and ``0`` when the fragment belongs to a back-facing polygon.
3248
3249
3250	TGSI_SEMANTIC_EDGEFLAG
3251	""""""""""""""""""""""
3252
3253	For vertex shaders, this semantic label indicates that an input or
3254	output is a boolean edge flag. The register layout is [F, x, x, x]
3255	where F is 0.0 or 1.0 and x = don't care. Normally, the vertex shader
3256	simply copies the edge flag input to the edgeflag output.
3257
3258	Edge flags are used to control which lines or points are actually
3259	drawn when the polygon mode converts triangles/quads/polygons into
3260	points or lines.
3261
3262
3263	TGSI_SEMANTIC_STENCIL
3264	"""""""""""""""""""""
3265
3266	For fragment shaders, this semantic label indicates that an output
3267	is a writable stencil reference value. Only the Y component is writable.
3268	This allows the fragment shader to change the fragments stencilref value.
3269
3270
3271	TGSI_SEMANTIC_VIEWPORT_INDEX
3272	""""""""""""""""""""""""""""
3273
3274	For geometry shaders, this semantic label indicates that an output
3275	contains the index of the viewport (and scissor) to use.
3276	This is an integer value, and only the X component is used.
3277
3278	If PIPE_CAP_VS_LAYER_VIEWPORT or PIPE_CAP_TES_LAYER_VIEWPORT is
3279	supported, then this semantic label can also be used in vertex or
3280	tessellation evaluation shaders, respectively. Only the value written in the
3281	last vertex processing stage is used.
3282
3283
3284	TGSI_SEMANTIC_LAYER
3285	"""""""""""""""""""
3286
3287	For geometry shaders, this semantic label indicates that an output
3288	contains the layer value to use for the color and depth/stencil surfaces.
3289	This is an integer value, and only the X component is used.
3290	(Also known as rendertarget array index.)
3291
3292	If PIPE_CAP_VS_LAYER_VIEWPORT or PIPE_CAP_TES_LAYER_VIEWPORT is
3293	supported, then this semantic label can also be used in vertex or
3294	tessellation evaluation shaders, respectively. Only the value written in the
3295	last vertex processing stage is used.
3296
3297
3298	TGSI_SEMANTIC_CLIPDIST
3299	""""""""""""""""""""""
3300
3301	Note this covers clipping and culling distances.
3302
3303	When components of vertex elements are identified this way, these
3304	values are each assumed to be a float32 signed distance to a plane.
3305
3306	For clip distances:
3307	Primitive setup only invokes rasterization on pixels for which
3308	the interpolated plane distances are >= 0.
3309
3310	For cull distances:
3311	Primitives will be completely discarded if the plane distance
3312	for all of the vertices in the primitive are < 0.
3313	If a vertex has a cull distance of NaN, that vertex counts as "out"
3314	(as if its < 0);
3315
3316	Multiple clip/cull planes can be implemented simultaneously, by
3317	annotating multiple components of one or more vertex elements with
3318	the above specified semantic.
3319	The limits on both clip and cull distances are bound
3320	by the PIPE_MAX_CLIP_OR_CULL_DISTANCE_COUNT define which defines
3321	the maximum number of components that can be used to hold the
3322	distances and by the PIPE_MAX_CLIP_OR_CULL_DISTANCE_ELEMENT_COUNT
3323	which specifies the maximum number of registers which can be
3324	annotated with those semantics.
3325	The properties NUM_CLIPDIST_ENABLED and NUM_CULLDIST_ENABLED
3326	are used to divide up the 2 x vec4 space between clipping and culling.
3327
3328	TGSI_SEMANTIC_SAMPLEID
3329	""""""""""""""""""""""
3330
3331	For fragment shaders, this semantic label indicates that a system value
3332	contains the current sample id (i.e. gl_SampleID) as an unsigned int.
3333	Only the X component is used. If per-sample shading is not enabled,
3334	the result is (0, undef, undef, undef).
3335
3336	Note that if the fragment shader uses this system value, the fragment
3337	shader is automatically executed at per sample frequency.
3338
3339	TGSI_SEMANTIC_SAMPLEPOS
3340	"""""""""""""""""""""""
3341
3342	For fragment shaders, this semantic label indicates that a system
3343	value contains the current sample's position as float4(x, y, undef, undef)
3344	in the render target (i.e. gl_SamplePosition) when per-fragment shading
3345	is in effect. Position values are in the range [0, 1] where 0.5 is
3346	the center of the fragment.
3347
3348	Note that if the fragment shader uses this system value, the fragment
3349	shader is automatically executed at per sample frequency.
3350
3351	TGSI_SEMANTIC_SAMPLEMASK
3352	""""""""""""""""""""""""
3353
3354	For fragment shaders, this semantic label can be applied to either a
3355	shader system value input or output.
3356
3357	For a system value, the sample mask indicates the set of samples covered by
3358	the current primitive. If MSAA is not enabled, the value is (1, 0, 0, 0).
3359
3360	For an output, the sample mask is used to disable further sample processing.
3361
3362	For both, the register type is uint[4] but only the X component is used
3363	(i.e. gl_SampleMask[0]). Each bit corresponds to one sample position (up
3364	to 32x MSAA is supported).
3365
3366	TGSI_SEMANTIC_INVOCATIONID
3367	""""""""""""""""""""""""""
3368
3369	For geometry shaders, this semantic label indicates that a system value
3370	contains the current invocation id (i.e. gl_InvocationID).
3371	This is an integer value, and only the X component is used.
3372
3373	TGSI_SEMANTIC_INSTANCEID
3374	""""""""""""""""""""""""
3375
3376	For vertex shaders, this semantic label indicates that a system value contains
3377	the current instance id (i.e. gl_InstanceID). It does not include the base
3378	instance. This is an integer value, and only the X component is used.
3379
3380	TGSI_SEMANTIC_VERTEXID
3381	""""""""""""""""""""""
3382
3383	For vertex shaders, this semantic label indicates that a system value contains
3384	the current vertex id (i.e. gl_VertexID). It does (unlike in d3d10) include the
3385	base vertex. This is an integer value, and only the X component is used.
3386
3387	TGSI_SEMANTIC_VERTEXID_NOBASE
3388	"""""""""""""""""""""""""""""""
3389
3390	For vertex shaders, this semantic label indicates that a system value contains
3391	the current vertex id without including the base vertex (this corresponds to
3392	d3d10 vertex id, so TGSI_SEMANTIC_VERTEXID_NOBASE + TGSI_SEMANTIC_BASEVERTEX
3393	== TGSI_SEMANTIC_VERTEXID). This is an integer value, and only the X component
3394	is used.
3395
3396	TGSI_SEMANTIC_BASEVERTEX
3397	""""""""""""""""""""""""
3398
3399	For vertex shaders, this semantic label indicates that a system value contains
3400	the base vertex (i.e. gl_BaseVertex). Note that for non-indexed draw calls,
3401	this contains the first (or start) value instead.
3402	This is an integer value, and only the X component is used.
3403
3404	TGSI_SEMANTIC_PRIMID
3405	""""""""""""""""""""
3406
3407	For geometry and fragment shaders, this semantic label indicates the value
3408	contains the primitive id (i.e. gl_PrimitiveID). This is an integer value,
3409	and only the X component is used.
3410	FIXME: This right now can be either a ordinary input or a system value...
3411
3412
3413	TGSI_SEMANTIC_PATCH
3414	"""""""""""""""""""
3415
3416	For tessellation evaluation/control shaders, this semantic label indicates a
3417	generic per-patch attribute. Such semantics will not implicitly be per-vertex
3418	arrays.
3419
3420	TGSI_SEMANTIC_TESSCOORD
3421	"""""""""""""""""""""""
3422
3423	For tessellation evaluation shaders, this semantic label indicates the
3424	coordinates of the vertex being processed. This is available in XYZ; W is
3425	undefined.
3426
3427	TGSI_SEMANTIC_TESSOUTER
3428	"""""""""""""""""""""""
3429
3430	For tessellation evaluation/control shaders, this semantic label indicates the
3431	outer tessellation levels of the patch. Isoline tessellation will only have XY
3432	defined, triangle will have XYZ and quads will have XYZW defined. This
3433	corresponds to gl_TessLevelOuter.
3434
3435	TGSI_SEMANTIC_TESSINNER
3436	"""""""""""""""""""""""
3437
3438	For tessellation evaluation/control shaders, this semantic label indicates the
3439	inner tessellation levels of the patch. The X value is only defined for
3440	triangle tessellation, while quads will have XY defined. This is entirely
3441	undefined for isoline tessellation.
3442
3443	TGSI_SEMANTIC_VERTICESIN
3444	""""""""""""""""""""""""
3445
3446	For tessellation evaluation/control shaders, this semantic label indicates the
3447	number of vertices provided in the input patch. Only the X value is defined.
3448
3449	TGSI_SEMANTIC_HELPER_INVOCATION
3450	"""""""""""""""""""""""""""""""
3451
3452	For fragment shaders, this semantic indicates whether the current
3453	invocation is covered or not. Helper invocations are created in order
3454	to properly compute derivatives, however it may be desirable to skip
3455	some of the logic in those cases. See ``gl_HelperInvocation`` documentation.
3456
3457	TGSI_SEMANTIC_BASEINSTANCE
3458	""""""""""""""""""""""""""
3459
3460	For vertex shaders, the base instance argument supplied for this
3461	draw. This is an integer value, and only the X component is used.
3462
3463	TGSI_SEMANTIC_DRAWID
3464	""""""""""""""""""""
3465
3466	For vertex shaders, the zero-based index of the current draw in a
3467	``glMultiDraw*`` invocation. This is an integer value, and only the X
3468	component is used.
3469
3470
3471	TGSI_SEMANTIC_WORK_DIM
3472	""""""""""""""""""""""
3473
3474	For compute shaders started via OpenCL this retrieves the work_dim
3475	parameter to the clEnqueueNDRangeKernel call with which the shader
3476	was started.
3477
3478
3479	TGSI_SEMANTIC_GRID_SIZE
3480	"""""""""""""""""""""""
3481
3482	For compute shaders, this semantic indicates the maximum (x, y, z) dimensions
3483	of a grid of thread blocks.
3484
3485
3486	TGSI_SEMANTIC_BLOCK_ID
3487	""""""""""""""""""""""
3488
3489	For compute shaders, this semantic indicates the (x, y, z) coordinates of the
3490	current block inside of the grid.
3491
3492
3493	TGSI_SEMANTIC_BLOCK_SIZE
3494	""""""""""""""""""""""""
3495
3496	For compute shaders, this semantic indicates the maximum (x, y, z) dimensions
3497	of a block in threads.
3498
3499
3500	TGSI_SEMANTIC_THREAD_ID
3501	"""""""""""""""""""""""
3502
3503	For compute shaders, this semantic indicates the (x, y, z) coordinates of the
3504	current thread inside of the block.
3505
3506
3507	TGSI_SEMANTIC_SUBGROUP_SIZE
3508	"""""""""""""""""""""""""""
3509
3510	This semantic indicates the subgroup size for the current invocation. This is
3511	an integer of at most 64, as it indicates the width of lanemasks. It does not
3512	depend on the number of invocations that are active.
3513
3514
3515	TGSI_SEMANTIC_SUBGROUP_INVOCATION
3516	"""""""""""""""""""""""""""""""""
3517
3518	The index of the current invocation within its subgroup.
3519
3520
3521	TGSI_SEMANTIC_SUBGROUP_EQ_MASK
3522	""""""""""""""""""""""""""""""
3523
3524	A bit mask of ``bit index == TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e.
3525	``1 << subgroup_invocation`` in arbitrary precision arithmetic.
3526
3527
3528	TGSI_SEMANTIC_SUBGROUP_GE_MASK
3529	""""""""""""""""""""""""""""""
3530
3531	A bit mask of ``bit index >= TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e.
3532	``((1 << (subgroup_size - subgroup_invocation)) - 1) << subgroup_invocation``
3533	in arbitrary precision arithmetic.
3534
3535
3536	TGSI_SEMANTIC_SUBGROUP_GT_MASK
3537	""""""""""""""""""""""""""""""
3538
3539	A bit mask of ``bit index > TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e.
3540	``((1 << (subgroup_size - subgroup_invocation - 1)) - 1) << (subgroup_invocation + 1)``
3541	in arbitrary precision arithmetic.
3542
3543
3544	TGSI_SEMANTIC_SUBGROUP_LE_MASK
3545	""""""""""""""""""""""""""""""
3546
3547	A bit mask of ``bit index <= TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e.
3548	``(1 << (subgroup_invocation + 1)) - 1`` in arbitrary precision arithmetic.
3549
3550
3551	TGSI_SEMANTIC_SUBGROUP_LT_MASK
3552	""""""""""""""""""""""""""""""
3553
3554	A bit mask of ``bit index < TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e.
3555	``(1 << subgroup_invocation) - 1`` in arbitrary precision arithmetic.
3556
3557
3558	TGSI_SEMANTIC_VIEWPORT_MASK
3559	"""""""""""""""""""""""""""
3560
3561	A bit mask of viewports to broadcast the current primitive to. See
3562	:ext:`GL_NV_viewport_array2` for more details.
3563
3564
3565	TGSI_SEMANTIC_TESS_DEFAULT_OUTER_LEVEL
3566	""""""""""""""""""""""""""""""""""""""
3567
3568	A system value equal to the default_outer_level array set via set_tess_level.
3569
3570
3571	TGSI_SEMANTIC_TESS_DEFAULT_INNER_LEVEL
3572	""""""""""""""""""""""""""""""""""""""
3573
3574	A system value equal to the default_inner_level array set via set_tess_level.
3575
3576
3577	Declaration Interpolate
3578	^^^^^^^^^^^^^^^^^^^^^^^
3579
3580	This token is only valid for fragment shader INPUT declarations.
3581
3582	The Interpolate field specifies the way input is being interpolated by
3583	the rasterizer and is one of TGSI_INTERPOLATE_*.
3584
3585	The Location field specifies the location inside the pixel that the
3586	interpolation should be done at, one of ``TGSI_INTERPOLATE_LOC_*``. Note that
3587	when per-sample shading is enabled, the implementation may choose to
3588	interpolate at the sample irrespective of the Location field.
3589
3590
3591	Declaration Sampler View
3592	^^^^^^^^^^^^^^^^^^^^^^^^
3593
3594	Follows Declaration token if file is TGSI_FILE_SAMPLER_VIEW.
3595
3596	DCL SVIEW[#], resource, type(s)
3597
3598	Declares a shader input sampler view and assigns it to a SVIEW[#]
3599	register.
3600
3601	resource can be one of BUFFER, 1D, 2D, 3D, 1D_ARRAY and 2D_ARRAY.
3602
3603	type must be 1 or 4 entries (if specifying on a per-component
3604	level) out of UNORM, SNORM, SINT, UINT and FLOAT.
3605
3606	For TEX\* style texture sample opcodes (as opposed to SAMPLE\* opcodes
3607	which take an explicit SVIEW[#] source register), there may be optionally
3608	SVIEW[#] declarations. In this case, the SVIEW index is implied by the
3609	SAMP index, and there must be a corresponding SVIEW[#] declaration for
3610	each SAMP[#] declaration. Drivers are free to ignore this if they wish.
3611	But note in particular that some drivers need to know the sampler type
3612	(float/int/unsigned) in order to generate the correct code, so cases
3613	where integer textures are sampled, SVIEW[#] declarations should be
3614	used.
3615
3616	NOTE: It is NOT legal to mix SAMPLE\* style opcodes and TEX\* opcodes
3617	in the same shader.
3618
3619	Declaration Resource
3620	^^^^^^^^^^^^^^^^^^^^
3621
3622	Follows Declaration token if file is TGSI_FILE_RESOURCE.
3623
3624	DCL RES[#], resource [, WR] [, RAW]
3625
3626	Declares a shader input resource and assigns it to a RES[#]
3627	register.
3628
3629	resource can be one of BUFFER, 1D, 2D, 3D, CUBE, 1D_ARRAY and
3630	2D_ARRAY.
3631
3632	If the RAW keyword is not specified, the texture data will be
3633	subject to conversion, swizzling and scaling as required to yield
3634	the specified data type from the physical data format of the bound
3635	resource.
3636
3637	If the RAW keyword is specified, no channel conversion will be
3638	performed: the values read for each of the channels (X,Y,Z,W) will
3639	correspond to consecutive words in the same order and format
3640	they're found in memory. No element-to-address conversion will be
3641	performed either: the value of the provided X coordinate will be
3642	interpreted in byte units instead of texel units. The result of
3643	accessing a misaligned address is undefined.
3644
3645	Usage of the STORE opcode is only allowed if the WR (writable) flag
3646	is set.
3647
3648	Hardware Atomic Register File
3649	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
3650
3651	Hardware atomics are declared as a 2D array with an optional array id.
3652
3653	The first member of the dimension is the buffer resource the atomic
3654	is located in.
3655	The second member is a range into the buffer resource, either for
3656	one or multiple counters. If this is an array, the declaration will have
3657	an unique array id.
3658
3659	Each counter is 4 bytes in size, and index and ranges are in counters not bytes.
3660	DCL HWATOMIC[0][0]
3661	DCL HWATOMIC[0][1]
3662
3663	This declares two atomics, one at the start of the buffer and one in the
3664	second 4 bytes.
3665
3666	DCL HWATOMIC[0][0]
3667	DCL HWATOMIC[1][0]
3668	DCL HWATOMIC[1][1..3], ARRAY(1)
3669
3670	This declares 5 atomics, one in buffer 0 at 0,
3671	one in buffer 1 at 0, and an array of 3 atomics in
3672	the buffer 1, starting at 1.
3673
3674	Properties
3675	^^^^^^^^^^^^^^^^^^^^^^^^
3676
3677	Properties are general directives that apply to the whole TGSI program.
3678
3679	FS_COORD_ORIGIN
3680	"""""""""""""""
3681
3682	Specifies the fragment shader TGSI_SEMANTIC_POSITION coordinate origin.
3683	The default value is UPPER_LEFT.
3684
3685	If UPPER_LEFT, the position will be (0,0) at the upper left corner and
3686	increase downward and rightward.
3687	If LOWER_LEFT, the position will be (0,0) at the lower left corner and
3688	increase upward and rightward.
3689
3690	OpenGL defaults to LOWER_LEFT, and is configurable with the
3691	:ext:`GL_ARB_fragment_coord_conventions` extension.
3692
3693	DirectX 9/10 use UPPER_LEFT.
3694
3695	FS_COORD_PIXEL_CENTER
3696	"""""""""""""""""""""
3697
3698	Specifies the fragment shader TGSI_SEMANTIC_POSITION pixel center convention.
3699	The default value is HALF_INTEGER.
3700
3701	If HALF_INTEGER, the fractional part of the position will be 0.5
3702	If INTEGER, the fractional part of the position will be 0.0
3703
3704	Note that this does not affect the set of fragments generated by
3705	rasterization, which is instead controlled by half_pixel_center in the
3706	rasterizer.
3707
3708	OpenGL defaults to HALF_INTEGER, and is configurable with the
3709	:ext:`GL_ARB_fragment_coord_conventions` extension.
3710
3711	DirectX 9 uses INTEGER.
3712	DirectX 10 uses HALF_INTEGER.
3713
3714	FS_COLOR0_WRITES_ALL_CBUFS
3715	""""""""""""""""""""""""""
3716	Specifies that writes to the fragment shader color 0 are replicated to all
3717	bound cbufs. This facilitates OpenGL's fragColor output vs fragData[0] where
3718	fragData is directed to a single color buffer, but fragColor is broadcast.
3719
3720	VS_PROHIBIT_UCPS
3721	""""""""""""""""""""""""""
3722	If this property is set on the program bound to the shader stage before the
3723	fragment shader, user clip planes should have no effect (be disabled) even if
3724	that shader does not write to any clip distance outputs and the rasterizer's
3725	clip_plane_enable is non-zero.
3726	This property is only supported by drivers that also support shader clip
3727	distance outputs.
3728	This is useful for APIs that don't have UCPs and where clip distances written
3729	by a shader cannot be disabled.
3730
3731	GS_INVOCATIONS
3732	""""""""""""""
3733
3734	Specifies the number of times a geometry shader should be executed for each
3735	input primitive. Each invocation will have a different
3736	TGSI_SEMANTIC_INVOCATIONID system value set. If not specified, assumed to
3737	be 1.
3738
3739	VS_WINDOW_SPACE_POSITION
3740	""""""""""""""""""""""""""
3741	If this property is set on the vertex shader, the TGSI_SEMANTIC_POSITION output
3742	is assumed to contain window space coordinates.
3743	Division of X,Y,Z by W and the viewport transformation are disabled, and 1/W is
3744	directly taken from the 4-th component of the shader output.
3745	Naturally, clipping is not performed on window coordinates either.
3746	The effect of this property is undefined if a geometry or tessellation shader
3747	are in use.
3748
3749	TCS_VERTICES_OUT
3750	""""""""""""""""
3751
3752	The number of vertices written by the tessellation control shader. This
3753	effectively defines the patch input size of the tessellation evaluation shader
3754	as well.
3755
3756	TES_PRIM_MODE
3757	"""""""""""""
3758
3759	This sets the tessellation primitive mode, one of ``MESA_PRIM_TRIANGLES``,
3760	``MESA_PRIM_QUADS``, or ``MESA_PRIM_LINES``. (Unlike in GL, there is no
3761	separate isolines settings, the regular lines is assumed to mean isolines.)
3762
3763	TES_SPACING
3764	"""""""""""
3765
3766	This sets the spacing mode of the tessellation generator, one of
3767	``PIPE_TESS_SPACING_*``.
3768
3769	TES_VERTEX_ORDER_CW
3770	"""""""""""""""""""
3771
3772	This sets the vertex order to be clockwise if the value is 1, or
3773	counter-clockwise if set to 0.
3774
3775	TES_POINT_MODE
3776	""""""""""""""
3777
3778	If set to a non-zero value, this turns on point mode for the tessellator,
3779	which means that points will be generated instead of primitives.
3780
3781	NUM_CLIPDIST_ENABLED
3782	""""""""""""""""""""
3783
3784	How many clip distance scalar outputs are enabled.
3785
3786	NUM_CULLDIST_ENABLED
3787	""""""""""""""""""""
3788
3789	How many cull distance scalar outputs are enabled.
3790
3791	FS_EARLY_DEPTH_STENCIL
3792	""""""""""""""""""""""
3793
3794	Whether depth test, stencil test, and occlusion query should run before
3795	the fragment shader (regardless of fragment shader side effects). Corresponds
3796	to GLSL early_fragment_tests.
3797
3798	NEXT_SHADER
3799	"""""""""""
3800
3801	Which shader stage will MOST LIKELY follow after this shader when the shader
3802	is bound. This is only a hint to the driver and doesn't have to be precise.
3803	Only set for VS and TES.
3804
3805	CS_FIXED_BLOCK_WIDTH / HEIGHT / DEPTH
3806	"""""""""""""""""""""""""""""""""""""
3807
3808	Threads per block in each dimension, if known at compile time. If the block size
3809	is known all three should be at least 1. If it is unknown they should all be set
3810	to 0 or not set.
3811
3812	LEGACY_MATH_RULES
3813	"""""""""""""""""
3814
3815	The MUL TGSI operation (FP32 multiplication) will return 0 if either
3816	of the operands are equal to 0. That means that 0 * Inf = 0. This
3817	should be set the same way for an entire pipeline. Note that this
3818	applies not only to the literal MUL TGSI opcode, but all FP32
3819	multiplications implied by other operations, such as MAD, FMA, DP2,
3820	DP3, DP4, DST, LOG, LRP, and possibly others. If there is a
3821	mismatch between shaders, then it is unspecified whether this behavior
3822	will be enabled.
3823
3824	FS_POST_DEPTH_COVERAGE
3825	""""""""""""""""""""""
3826
3827	When enabled, the input for TGSI_SEMANTIC_SAMPLEMASK will exclude samples
3828	that have failed the depth/stencil tests. This is only valid when
3829	FS_EARLY_DEPTH_STENCIL is also specified.
3830
3831	LAYER_VIEWPORT_RELATIVE
3832	"""""""""""""""""""""""
3833
3834	When enabled, the TGSI_SEMATNIC_LAYER output value is relative to the
3835	current viewport. This is especially useful in conjunction with
3836	TGSI_SEMANTIC_VIEWPORT_MASK.
3837
3838
3839	Texture Sampling and Texture Formats
3840	------------------------------------
3841
3842	This table shows how texture image components are returned as (x,y,z,w) tuples
3843	by TGSI texture instructions, such as :opcode:`TEX`, :opcode:`TXD`, and
3844	:opcode:`TXP`. For reference, OpenGL and Direct3D conventions are shown as
3845	well.
3846
3847	+--------------------+--------------+--------------------+--------------+
3848	\| Texture Components \| Gallium \| OpenGL \| Direct3D 9 \|
3849	+====================+==============+====================+==============+
3850	\| R \| (r, 0, 0, 1) \| (r, 0, 0, 1) \| (r, 1, 1, 1) \|
3851	+--------------------+--------------+--------------------+--------------+
3852	\| RG \| (r, g, 0, 1) \| (r, g, 0, 1) \| (r, g, 1, 1) \|
3853	+--------------------+--------------+--------------------+--------------+
3854	\| RGB \| (r, g, b, 1) \| (r, g, b, 1) \| (r, g, b, 1) \|
3855	+--------------------+--------------+--------------------+--------------+
3856	\| RGBA \| (r, g, b, a) \| (r, g, b, a) \| (r, g, b, a) \|
3857	+--------------------+--------------+--------------------+--------------+
3858	\| A \| (0, 0, 0, a) \| (0, 0, 0, a) \| (0, 0, 0, a) \|
3859	+--------------------+--------------+--------------------+--------------+
3860	\| L \| (l, l, l, 1) \| (l, l, l, 1) \| (l, l, l, 1) \|
3861	+--------------------+--------------+--------------------+--------------+
3862	\| LA \| (l, l, l, a) \| (l, l, l, a) \| (l, l, l, a) \|
3863	+--------------------+--------------+--------------------+--------------+
3864	\| I \| (i, i, i, i) \| (i, i, i, i) \| N/A \|
3865	+--------------------+--------------+--------------------+--------------+
3866	\| UV \| XXX TBD \| (0, 0, 0, 1) \| (u, v, 1, 1) \|
3867	\| \| \| [#envmap-bumpmap]_ \| \|
3868	+--------------------+--------------+--------------------+--------------+
3869	\| Z \| (z, z, z, z) \| (z, z, z, 1) \| (0, z, 0, 1) \|
3870	\| \| \| [#depth-tex-mode]_ \| \|
3871	+--------------------+--------------+--------------------+--------------+
3872	\| S \| (s, s, s, s) \| unknown \| unknown \|
3873	+--------------------+--------------+--------------------+--------------+
3874
3875	.. [#envmap-bumpmap] https://registry.khronos.org/OpenGL/extensions/ATI/ATI_envmap_bumpmap.txt
3876	.. [#depth-tex-mode] the default is (z, z, z, 1) but may also be (0, 0, 0, z)
3877	or (z, z, z, z) depending on the value of GL_DEPTH_TEXTURE_MODE.

注意: 瀏覽 TracBrowser 來幫助您使用儲存庫瀏覽器

source: vbox/trunk/src/VBox/Additions/3D/mesa/mesa-24.0.2/docs/gallium/tgsi.rst@ 105254

以其他格式下載: