VirtualBox

source: vbox/trunk/src/VBox/Additions/3D/mesa/mesa-24.0.2/docs/gallium/tgsi.rst@ 105254

最後變更 在這個檔案從105254是 103996,由 vboxsync 提交於 12 月 前

Additions/3D/mesa: export mesa-24.0.2 to OSE. bugref:10606

  • 屬性 svn:eol-style 設為 native
檔案大小: 97.2 KB
 
1TGSI
2====
3
4TGSI, Tungsten Graphics Shader Infrastructure, is an intermediate language
5for describing shaders. Since Gallium is inherently shaderful, shaders are
6an important part of the API. TGSI is the only intermediate representation
7used by all drivers.
8
9Basics
10------
11
12All TGSI instructions, known as *opcodes*, operate on arbitrary-precision
13floating-point four-component vectors. An opcode may have up to one
14destination register, known as *dst*, and between zero and three source
15registers, called *src0* through *src2*, or simply *src* if there is only
16one.
17
18Some instructions, like :opcode:`I2F`, permit re-interpretation of vector
19components as integers. Other instructions permit using registers as
20two-component vectors with double precision; see :ref:`doubleopcodes`.
21
22When an instruction has a scalar result, the result is usually copied into
23each of the components of *dst*. When this happens, the result is said to be
24*replicated* to *dst*. :opcode:`RCP` is one such instruction.
25
26Source Modifiers
27^^^^^^^^^^^^^^^^
28
29TGSI supports 32-bit negate and absolute value modifiers on floating-point
30inputs, and 32-bit integer negates on some drivers. The negate applies after
31absolute value if both are present.
32
33The type of an input can be found by ``tgsi_opcode_infer_src_type()``, and
34TGSI_OPCODE_MOV and the second and third operands of TGSI_OPCODE_UCMP (which
35return TGSI_TYPE_UNTYPED) are also considered floats for the purpose of source
36modifiers.
37
38
39Other Modifiers
40^^^^^^^^^^^^^^^
41
42The saturate modifier clamps 32-bit destination stores to [0.0, 1.0].
43
44For arithmetic instruction having a precise modifier certain optimizations
45which may alter the result are disallowed. Example: *add(mul(a,b),c)* can't be
46optimized to TGSI_OPCODE_MAD, because some hardware only supports the fused
47MAD instruction.
48
49Instruction Set
50---------------
51
52Core ISA
53^^^^^^^^^^^^^^^^^^^^^^^^^
54
55These opcodes are guaranteed to be available regardless of the driver being
56used.
57
58.. opcode:: ARL - Address Register Load
59
60 .. math::
61
62 dst.x = (int) \lfloor src.x\rfloor
63
64 dst.y = (int) \lfloor src.y\rfloor
65
66 dst.z = (int) \lfloor src.z\rfloor
67
68 dst.w = (int) \lfloor src.w\rfloor
69
70
71.. opcode:: MOV - Move
72
73 .. math::
74
75 dst.x = src.x
76
77 dst.y = src.y
78
79 dst.z = src.z
80
81 dst.w = src.w
82
83
84.. opcode:: LIT - Light Coefficients
85
86 .. math::
87
88 dst.x = 1
89
90 dst.y = max(src.x, 0)
91
92 dst.z =
93 \left\{
94 \begin{array}{ c l }
95 max(src.y, 0)^{clamp(src.w, -128, 128)} & \quad \textrm{if } src.x \gt 0 \\
96 0 & \quad \textrm{otherwise}
97 \end{array}
98 \right.
99
100 dst.w = 1
101
102
103.. opcode:: RCP - Reciprocal
104
105 This instruction replicates its result.
106
107 .. math::
108
109 dst = \frac{1}{src.x}
110
111
112.. opcode:: RSQ - Reciprocal Square Root
113
114 This instruction replicates its result. The results are undefined for *src* <= 0.
115
116 .. math::
117
118 dst = \frac{1}{\sqrt{src.x}}
119
120
121.. opcode:: SQRT - Square Root
122
123 This instruction replicates its result. The results are undefined for *src* < 0.
124
125 .. math::
126
127 dst = {\sqrt{src.x}}
128
129
130.. opcode:: EXP - Approximate Exponential Base 2
131
132 .. math::
133
134 dst.x &= 2^{\lfloor src.x\rfloor} \\
135 dst.y &= src.x - \lfloor src.x\rfloor \\
136 dst.z &= 2^{src.x} \\
137 dst.w &= 1
138
139
140.. opcode:: LOG - Approximate Logarithm Base 2
141
142 .. math::
143
144 dst.x &= \lfloor\log_2{|src.x|}\rfloor \\
145 dst.y &= \frac{|src.x|}{2^{\lfloor\log_2{|src.x|}\rfloor}} \\
146 dst.z &= \log_2{|src.x|} \\
147 dst.w &= 1
148
149
150.. opcode:: MUL - Multiply
151
152 .. math::
153
154 dst.x = src0.x \times src1.x
155
156 dst.y = src0.y \times src1.y
157
158 dst.z = src0.z \times src1.z
159
160 dst.w = src0.w \times src1.w
161
162
163.. opcode:: ADD - Add
164
165 .. math::
166
167 dst.x = src0.x + src1.x
168
169 dst.y = src0.y + src1.y
170
171 dst.z = src0.z + src1.z
172
173 dst.w = src0.w + src1.w
174
175
176.. opcode:: DP3 - 3-component Dot Product
177
178 This instruction replicates its result.
179
180 .. math::
181
182 \begin{aligned}
183 dst = & src0.x \times src1.x +\\
184 & src0.y \times src1.y +\\
185 & src0.z \times src1.z
186 \end{aligned}
187
188
189.. opcode:: DP4 - 4-component Dot Product
190
191 This instruction replicates its result.
192
193 .. math::
194
195 \begin{aligned}
196 dst = & src0.x \times src1.x +\\
197 & src0.y \times src1.y +\\
198 & src0.z \times src1.z +\\
199 & src0.w \times src1.w
200 \end{aligned}
201
202
203.. opcode:: DST - Distance Vector
204
205 .. math::
206
207 dst.x &= 1\\
208 dst.y &= src0.y \times src1.y\\
209 dst.z &= src0.z\\
210 dst.w &= src1.w
211
212
213.. opcode:: MIN - Minimum
214
215 .. math::
216
217 dst.x = min(src0.x, src1.x)
218
219 dst.y = min(src0.y, src1.y)
220
221 dst.z = min(src0.z, src1.z)
222
223 dst.w = min(src0.w, src1.w)
224
225
226.. opcode:: MAX - Maximum
227
228 .. math::
229
230 dst.x = max(src0.x, src1.x)
231
232 dst.y = max(src0.y, src1.y)
233
234 dst.z = max(src0.z, src1.z)
235
236 dst.w = max(src0.w, src1.w)
237
238
239.. opcode:: SLT - Set On Less Than
240
241 .. math::
242
243 dst.x = (src0.x < src1.x) ? 1.0F : 0.0F
244
245 dst.y = (src0.y < src1.y) ? 1.0F : 0.0F
246
247 dst.z = (src0.z < src1.z) ? 1.0F : 0.0F
248
249 dst.w = (src0.w < src1.w) ? 1.0F : 0.0F
250
251
252.. opcode:: SGE - Set On Greater Equal Than
253
254 .. math::
255
256 dst.x = (src0.x >= src1.x) ? 1.0F : 0.0F
257
258 dst.y = (src0.y >= src1.y) ? 1.0F : 0.0F
259
260 dst.z = (src0.z >= src1.z) ? 1.0F : 0.0F
261
262 dst.w = (src0.w >= src1.w) ? 1.0F : 0.0F
263
264
265.. opcode:: MAD - Multiply And Add
266
267 Perform a * b + c. The implementation is free to decide whether there is an
268 intermediate rounding step or not.
269
270 .. math::
271
272 dst.x = src0.x \times src1.x + src2.x
273
274 dst.y = src0.y \times src1.y + src2.y
275
276 dst.z = src0.z \times src1.z + src2.z
277
278 dst.w = src0.w \times src1.w + src2.w
279
280
281.. opcode:: LRP - Linear Interpolate
282
283 .. math::
284
285 dst.x = src0.x \times src1.x + (1 - src0.x) \times src2.x
286
287 dst.y = src0.y \times src1.y + (1 - src0.y) \times src2.y
288
289 dst.z = src0.z \times src1.z + (1 - src0.z) \times src2.z
290
291 dst.w = src0.w \times src1.w + (1 - src0.w) \times src2.w
292
293
294.. opcode:: FMA - Fused Multiply-Add
295
296 Perform a * b + c with no intermediate rounding step.
297
298 .. math::
299
300 dst.x = src0.x \times src1.x + src2.x
301
302 dst.y = src0.y \times src1.y + src2.y
303
304 dst.z = src0.z \times src1.z + src2.z
305
306 dst.w = src0.w \times src1.w + src2.w
307
308
309.. opcode:: FRC - Fraction
310
311 .. math::
312
313 dst.x = src.x - \lfloor src.x\rfloor
314
315 dst.y = src.y - \lfloor src.y\rfloor
316
317 dst.z = src.z - \lfloor src.z\rfloor
318
319 dst.w = src.w - \lfloor src.w\rfloor
320
321
322.. opcode:: FLR - Floor
323
324 .. math::
325
326 dst.x = \lfloor src.x\rfloor
327
328 dst.y = \lfloor src.y\rfloor
329
330 dst.z = \lfloor src.z\rfloor
331
332 dst.w = \lfloor src.w\rfloor
333
334
335.. opcode:: ROUND - Round
336
337 .. math::
338
339 dst.x = round(src.x)
340
341 dst.y = round(src.y)
342
343 dst.z = round(src.z)
344
345 dst.w = round(src.w)
346
347
348.. opcode:: EX2 - Exponential Base 2
349
350 This instruction replicates its result.
351
352 .. math::
353
354 dst = 2^{src.x}
355
356
357.. opcode:: LG2 - Logarithm Base 2
358
359 This instruction replicates its result.
360
361 .. math::
362
363 dst = \log_2{src.x}
364
365
366.. opcode:: POW - Power
367
368 This instruction replicates its result.
369
370 .. math::
371
372 dst = src0.x^{src1.x}
373
374
375.. opcode:: LDEXP - Multiply Number by Integral Power of 2
376
377 *src1* is an integer.
378
379 .. math::
380
381 dst.x = src0.x * 2^{src1.x}
382
383 dst.y = src0.y * 2^{src1.y}
384
385 dst.z = src0.z * 2^{src1.z}
386
387 dst.w = src0.w * 2^{src1.w}
388
389
390.. opcode:: COS - Cosine
391
392 This instruction replicates its result.
393
394 .. math::
395
396 dst = \cos{src.x}
397
398
399.. opcode:: DDX, DDX_FINE - Derivative Relative To X
400
401 The fine variant is only used when ``PIPE_CAP_FS_FINE_DERIVATIVE`` is
402 advertised. When it is, the fine version guarantees one derivative per
403 row while DDX is allowed to be the same for the entire 2x2 quad.
404
405 .. math::
406
407 dst.x = partialx(src.x)
408
409 dst.y = partialx(src.y)
410
411 dst.z = partialx(src.z)
412
413 dst.w = partialx(src.w)
414
415
416.. opcode:: DDY, DDY_FINE - Derivative Relative To Y
417
418 The fine variant is only used when ``PIPE_CAP_FS_FINE_DERIVATIVE`` is
419 advertised. When it is, the fine version guarantees one derivative per
420 column while DDY is allowed to be the same for the entire 2x2 quad.
421
422 .. math::
423
424 dst.x = partialy(src.x)
425
426 dst.y = partialy(src.y)
427
428 dst.z = partialy(src.z)
429
430 dst.w = partialy(src.w)
431
432
433.. opcode:: PK2H - Pack Two 16-bit Floats
434
435 This instruction replicates its result.
436
437 .. math::
438
439 \begin{aligned}
440 dst = & f32\_to\_f16(src.x) | \\
441 ( & f32\_to\_f16(src.y) \ll 16)
442 \end{aligned}
443
444.. opcode:: PK2US - Pack Two Unsigned 16-bit Scalars
445
446 This instruction replicates its result.
447
448 .. math::
449
450 \begin{aligned}
451 dst = & f32\_to\_unorm16(src.x) | \\
452 ( & f32\_to\_unorm16(src.y) \ll 16)
453 \end{aligned}
454
455
456.. opcode:: PK4B - Pack Four Signed 8-bit Scalars
457
458 This instruction replicates its result.
459
460 .. math::
461
462 \begin{aligned}
463 dst = & f32\_to\_snorm8(src.x) | \\
464 ( & f32\_to\_snorm8(src.y) \ll 8) | \\
465 ( & f32\_to\_snorm8(src.z) \ll 16) | \\
466 ( & f32\_to\_snorm8(src.w) \ll 24)
467 \end{aligned}
468
469
470.. opcode:: PK4UB - Pack Four Unsigned 8-bit Scalars
471
472 This instruction replicates its result.
473
474 .. math::
475
476 \begin{aligned}
477 dst = & f32\_to\_unorm8(src.x) | \\
478 ( & f32\_to\_unorm8(src.y) \ll 8) | \\
479 ( & f32\_to\_unorm8(src.z) \ll 16) | \\
480 ( & f32\_to\_unorm8(src.w) \ll 24)
481 \end{aligned}
482
483
484.. opcode:: SEQ - Set On Equal
485
486 .. math::
487
488 dst.x = (src0.x == src1.x) ? 1.0F : 0.0F
489
490 dst.y = (src0.y == src1.y) ? 1.0F : 0.0F
491
492 dst.z = (src0.z == src1.z) ? 1.0F : 0.0F
493
494 dst.w = (src0.w == src1.w) ? 1.0F : 0.0F
495
496
497.. opcode:: SGT - Set On Greater Than
498
499 .. math::
500
501 dst.x = (src0.x > src1.x) ? 1.0F : 0.0F
502
503 dst.y = (src0.y > src1.y) ? 1.0F : 0.0F
504
505 dst.z = (src0.z > src1.z) ? 1.0F : 0.0F
506
507 dst.w = (src0.w > src1.w) ? 1.0F : 0.0F
508
509
510.. opcode:: SIN - Sine
511
512 This instruction replicates its result.
513
514 .. math::
515
516 dst = \sin{src.x}
517
518
519.. opcode:: SLE - Set On Less Equal Than
520
521 .. math::
522
523 dst.x = (src0.x <= src1.x) ? 1.0F : 0.0F
524
525 dst.y = (src0.y <= src1.y) ? 1.0F : 0.0F
526
527 dst.z = (src0.z <= src1.z) ? 1.0F : 0.0F
528
529 dst.w = (src0.w <= src1.w) ? 1.0F : 0.0F
530
531
532.. opcode:: SNE - Set On Not Equal
533
534 .. math::
535
536 dst.x = (src0.x != src1.x) ? 1.0F : 0.0F
537
538 dst.y = (src0.y != src1.y) ? 1.0F : 0.0F
539
540 dst.z = (src0.z != src1.z) ? 1.0F : 0.0F
541
542 dst.w = (src0.w != src1.w) ? 1.0F : 0.0F
543
544
545.. opcode:: TEX - Texture Lookup
546
547 for array textures *src0.y* contains the slice for 1D,
548 and *src0.z* contain the slice for 2D.
549
550 for shadow textures with no arrays (and not cube map),
551 *src0.z* contains the reference value.
552
553 for shadow textures with arrays, *src0.z* contains
554 the reference value for 1D arrays, and *src0.w* contains
555 the reference value for 2D arrays and cube maps.
556
557 for cube map array shadow textures, the reference value
558 cannot be passed in *src0.w*, and TEX2 must be used instead.
559
560 .. math::
561
562 coord = src0
563
564 shadow\_ref = src0.z \textrm{ or } src0.w \textrm{ (optional)}
565
566 unit = src1
567
568 dst = texture\_sample(unit, coord, shadow\_ref)
569
570
571.. opcode:: TEX2 - Texture Lookup (for shadow cube map arrays only)
572
573 this is the same as TEX, but uses another reg to encode the
574 reference value.
575
576 .. math::
577
578 coord = src0
579
580 shadow\_ref = src1.x
581
582 unit = src2
583
584 dst = texture\_sample(unit, coord, shadow\_ref)
585
586
587.. opcode:: TXD - Texture Lookup with Derivatives
588
589 .. math::
590
591 coord = src0
592
593 ddx = src1
594
595 ddy = src2
596
597 unit = src3
598
599 dst = texture\_sample\_deriv(unit, coord, ddx, ddy)
600
601
602.. opcode:: TXP - Projective Texture Lookup
603
604 .. math::
605
606 coord.x = src0.x / src0.w
607
608 coord.y = src0.y / src0.w
609
610 coord.z = src0.z / src0.w
611
612 coord.w = src0.w
613
614 unit = src1
615
616 dst = texture\_sample(unit, coord)
617
618
619.. opcode:: UP2H - Unpack Two 16-Bit Floats
620
621 .. math::
622
623 dst.x = f16\_to\_f32(src0.x \& 0xffff)
624
625 dst.y = f16\_to\_f32(src0.x \gg 16)
626
627 dst.z = f16\_to\_f32(src0.x \& 0xffff)
628
629 dst.w = f16\_to\_f32(src0.x \gg 16)
630
631 .. note::
632
633 Considered for removal.
634
635.. opcode:: UP2US - Unpack Two Unsigned 16-Bit Scalars
636
637 TBD
638
639 .. note::
640
641 Considered for removal.
642
643.. opcode:: UP4B - Unpack Four Signed 8-Bit Values
644
645 TBD
646
647 .. note::
648
649 Considered for removal.
650
651.. opcode:: UP4UB - Unpack Four Unsigned 8-Bit Scalars
652
653 TBD
654
655 .. note::
656
657 Considered for removal.
658
659
660.. opcode:: ARR - Address Register Load With Round
661
662 .. math::
663
664 dst.x = (int) round(src.x)
665
666 dst.y = (int) round(src.y)
667
668 dst.z = (int) round(src.z)
669
670 dst.w = (int) round(src.w)
671
672
673.. opcode:: SSG - Set Sign
674
675 .. math::
676
677 dst.x = (src.x > 0) ? 1 : (src.x < 0) ? -1 : 0
678
679 dst.y = (src.y > 0) ? 1 : (src.y < 0) ? -1 : 0
680
681 dst.z = (src.z > 0) ? 1 : (src.z < 0) ? -1 : 0
682
683 dst.w = (src.w > 0) ? 1 : (src.w < 0) ? -1 : 0
684
685
686.. opcode:: CMP - Compare
687
688 .. math::
689
690 dst.x = (src0.x < 0) ? src1.x : src2.x
691
692 dst.y = (src0.y < 0) ? src1.y : src2.y
693
694 dst.z = (src0.z < 0) ? src1.z : src2.z
695
696 dst.w = (src0.w < 0) ? src1.w : src2.w
697
698
699.. opcode:: KILL_IF - Conditional Discard
700
701 Conditional discard. Allowed in fragment shaders only.
702
703 Pseudocode::
704
705 if (src.x < 0 || src.y < 0 || src.z < 0 || src.w < 0)
706 discard
707 endif
708
709
710.. opcode:: KILL - Discard
711
712 Unconditional discard. Allowed in fragment shaders only.
713
714
715.. opcode:: DEMOTE - Demote Invocation to a Helper
716
717 This demotes the current invocation to a helper, but continues
718 execution (while KILL may or may not terminate the
719 invocation). After this runs, all the usual helper invocation rules
720 apply about discarding buffer and render target writes. This is
721 useful for having accurate derivatives in the other invocations
722 which have not been demoted.
723
724 Allowed in fragment shaders only.
725
726
727.. opcode:: READ_HELPER - Reads Invocation Helper Status
728
729 This is identical to ``TGSI_SEMANTIC_HELPER_INVOCATION``, except
730 this will read the current value, which might change as a result of
731 a ``DEMOTE`` instruction.
732
733 Allowed in fragment shaders only.
734
735
736.. opcode:: TXB - Texture Lookup With Bias
737
738 for cube map array textures and shadow cube maps, the bias value
739 cannot be passed in *src0.w*, and TXB2 must be used instead.
740
741 if the target is a shadow texture, the reference value is always
742 in *src.z* (this prevents shadow 3d and shadow 2d arrays from
743 using this instruction, but this is not needed).
744
745 .. math::
746
747 coord.x = src0.x
748
749 coord.y = src0.y
750
751 coord.z = src0.z
752
753 coord.w = none
754
755 bias = src0.w
756
757 unit = src1
758
759 dst = texture\_sample(unit, coord, bias)
760
761
762.. opcode:: TXB2 - Texture Lookup With Bias (some cube maps only)
763
764 this is the same as TXB, but uses another reg to encode the
765 LOD bias value for cube map arrays and shadow cube maps.
766 Presumably shadow 2d arrays and shadow 3d targets could use
767 this encoding too, but this is not legal.
768
769 if the target is a shadow cube map array, the reference value is in
770 *src1.y*.
771
772 .. math::
773
774 coord = src0
775
776 bias = src1.x
777
778 unit = src2
779
780 dst = texture\_sample(unit, coord, bias)
781
782
783.. opcode:: DIV - Divide
784
785 .. math::
786
787 dst.x = \frac{src0.x}{src1.x}
788
789 dst.y = \frac{src0.y}{src1.y}
790
791 dst.z = \frac{src0.z}{src1.z}
792
793 dst.w = \frac{src0.w}{src1.w}
794
795
796.. opcode:: DP2 - 2-component Dot Product
797
798 This instruction replicates its result.
799
800 .. math::
801
802 \begin{aligned}
803 dst = & src0.x \times src1.x + \\
804 & src0.y \times src1.y
805 \end{aligned}
806
807.. opcode:: TEX_LZ - Texture Lookup With LOD = 0
808
809 This is the same as TXL with LOD = 0. Like every texture opcode, it obeys
810 pipe_sampler_view::u.tex.first_level and pipe_sampler_state::min_lod.
811 There is no way to override those two in shaders.
812
813 .. math::
814
815 coord.x = src0.x
816
817 coord.y = src0.y
818
819 coord.z = src0.z
820
821 coord.w = none
822
823 lod = 0
824
825 unit = src1
826
827 dst = texture\_sample(unit, coord, lod)
828
829
830.. opcode:: TXL - Texture Lookup With explicit LOD
831
832 for cube map array textures, the explicit LOD value
833 cannot be passed in *src0.w*, and TXL2 must be used instead.
834
835 if the target is a shadow texture, the reference value is always
836 in *src.z* (this prevents shadow 3d / 2d array / cube targets from
837 using this instruction, but this is not needed).
838
839 .. math::
840
841 coord.x = src0.x
842
843 coord.y = src0.y
844
845 coord.z = src0.z
846
847 coord.w = none
848
849 lod = src0.w
850
851 unit = src1
852
853 dst = texture\_sample(unit, coord, lod)
854
855
856.. opcode:: TXL2 - Texture Lookup With explicit LOD (for cube map arrays only)
857
858 this is the same as TXL, but uses another reg to encode the
859 explicit LOD value.
860 Presumably shadow 3d / 2d array / cube targets could use
861 this encoding too, but this is not legal.
862
863 if the target is a shadow cube map array, the reference value is in
864 *src1.y*.
865
866 .. math::
867
868 coord = src0
869
870 lod = src1.x
871
872 unit = src2
873
874 dst = texture\_sample(unit, coord, lod)
875
876
877Compute ISA
878^^^^^^^^^^^^^^^^^^^^^^^^
879
880These opcodes are primarily provided for special-use computational shaders.
881Support for these opcodes indicated by a special pipe capability bit (TBD).
882
883XXX doesn't look like most of the opcodes really belong here.
884
885.. opcode:: CEIL - Ceiling
886
887 .. math::
888
889 dst.x = \lceil src.x\rceil
890
891 dst.y = \lceil src.y\rceil
892
893 dst.z = \lceil src.z\rceil
894
895 dst.w = \lceil src.w\rceil
896
897
898.. opcode:: TRUNC - Truncate
899
900 .. math::
901
902 dst.x = trunc(src.x)
903
904 dst.y = trunc(src.y)
905
906 dst.z = trunc(src.z)
907
908 dst.w = trunc(src.w)
909
910
911.. opcode:: MOD - Modulus
912
913 .. math::
914
915 dst.x = src0.x \bmod src1.x
916
917 dst.y = src0.y \bmod src1.y
918
919 dst.z = src0.z \bmod src1.z
920
921 dst.w = src0.w \bmod src1.w
922
923
924.. opcode:: UARL - Integer Address Register Load
925
926 Moves the contents of the source register, assumed to be an integer, into the
927 destination register, which is assumed to be an address (ADDR) register.
928
929
930.. opcode:: TXF - Texel Fetch
931
932 As per :ext:`GL_NV_gpu_program4`, extract a single texel from a specified
933 texture image or PIPE_BUFFER resource. The source sampler may not be a
934 CUBE or SHADOW. *src0* is a
935 four-component signed integer vector used to identify the single texel
936 accessed. 3 components + level. If the texture is multisampled, then
937 the fourth component indicates the sample, not the mipmap level.
938 Just like texture instructions, an optional
939 offset vector is provided, which is subject to various driver restrictions
940 (regarding range, source of offsets). This instruction ignores the sampler
941 state.
942
943 TXF(uint_vec coord, int_vec offset).
944
945
946.. opcode:: TXQ - Texture Size Query
947
948 As per :ext:`GL_NV_gpu_program4`, retrieve the dimensions of the texture
949 depending on the target. For 1D (width), 2D/RECT/CUBE (width, height),
950 3D (width, height, depth), 1D array (width, layers), 2D array (width,
951 height, layers). Also return the number of accessible levels
952 (last_level - first_level + 1) in W.
953
954 For components which don't return a resource dimension, their value
955 is undefined.
956
957 .. math::
958
959 lod = src0.x
960
961 dst.x = texture\_width(unit, lod)
962
963 dst.y = texture\_height(unit, lod)
964
965 dst.z = texture\_depth(unit, lod)
966
967 dst.w = texture\_levels(unit)
968
969
970.. opcode:: TXQS - Texture Samples Query
971
972 This retrieves the number of samples in the texture, and stores it
973 into the x component as an unsigned integer. The other components are
974 undefined. If the texture is not multisampled, this function returns
975 (1, undef, undef, undef).
976
977 .. math::
978
979 dst.x = texture\_samples(unit)
980
981
982.. opcode:: TG4 - Texture Gather
983
984 As per :ext:`GL_ARB_texture_gather`, gathers the four texels to be used in a
985 bi-linear filtering operation and packs them into a single register.
986 Only works with 2D, 2D array, cubemaps, and cubemaps arrays. For 2D
987 textures, only the addressing modes of the sampler and the top level of any
988 mip pyramid are used. Set W to zero. It behaves like the TEX instruction,
989 but a filtered sample is not generated. The four samples that contribute to
990 filtering are placed into XYZW in clockwise order, starting with the (u,v)
991 texture coordinate delta at the following locations (-, +), (+, +), (+, -),
992 (-, -), where the magnitude of the deltas are half a texel.
993
994 PIPE_CAP_TEXTURE_SM5 enhances this instruction to support shadow per-sample
995 depth compares, single component selection, and a non-constant offset. It
996 doesn't allow support for the GL independent offset to get i0,j0. This would
997 require another CAP is HW can do it natively. For now we lower that before
998 TGSI.
999
1000 PIPE_CAP_TGSI_TG4_COMPONENT_IN_SWIZZLE changes the encoding so that component
1001 is stored in the sampler source swizzle x.
1002
1003 (without TGSI_TG4_COMPONENT_IN_SWIZZLE)
1004
1005 .. math::
1006
1007 coord = src0
1008
1009 component = src1
1010
1011 dst = texture\_gather4 (unit, coord, component)
1012
1013 (with TGSI_TG4_COMPONENT_IN_SWIZZLE)
1014
1015 .. math::
1016
1017 coord = src0
1018
1019 dst = texture\_gather4 (unit, coord)
1020
1021 \text{component is encoded in sampler swizzle.}
1022
1023 (with SM5 - cube array shadow)
1024
1025 .. math::
1026
1027 coord = src0
1028
1029 compare = src1
1030
1031 dst = texture\_gather (uint, coord, compare)
1032
1033.. opcode:: LODQ - level of detail query
1034
1035 Compute the LOD information that the texture pipe would use to access the
1036 texture. The Y component contains the computed LOD lambda_prime. The X
1037 component contains the LOD that will be accessed, based on min/max LODs
1038 and mipmap filters.
1039
1040 .. math::
1041
1042 coord = src0
1043
1044 dst.xy = lodq(uint, coord);
1045
1046.. opcode:: CLOCK - retrieve the current shader time
1047
1048 Invoking this instruction multiple times in the same shader should
1049 cause monotonically increasing values to be returned. The values
1050 are implicitly 64-bit, so if fewer than 64 bits of precision are
1051 available, to provide expected wraparound semantics, the value
1052 should be shifted up so that the most significant bit of the time
1053 is the most significant bit of the 64-bit value.
1054
1055 .. math::
1056
1057 dst.xy = clock()
1058
1059
1060Integer ISA
1061^^^^^^^^^^^^^^^^^^^^^^^^
1062These opcodes are used for integer operations.
1063Support for these opcodes indicated by PIPE_SHADER_CAP_INTEGERS (all of them?)
1064
1065
1066.. opcode:: I2F - Signed Integer To Float
1067
1068 Rounding is unspecified (round to nearest even suggested).
1069
1070 .. math::
1071
1072 dst.x = (float) src.x
1073
1074 dst.y = (float) src.y
1075
1076 dst.z = (float) src.z
1077
1078 dst.w = (float) src.w
1079
1080
1081.. opcode:: U2F - Unsigned Integer To Float
1082
1083 Rounding is unspecified (round to nearest even suggested).
1084
1085 .. math::
1086
1087 dst.x = (float) src.x
1088
1089 dst.y = (float) src.y
1090
1091 dst.z = (float) src.z
1092
1093 dst.w = (float) src.w
1094
1095
1096.. opcode:: F2I - Float to Signed Integer
1097
1098 Rounding is towards zero (truncate).
1099 Values outside signed range (including NaNs) produce undefined results.
1100
1101 .. math::
1102
1103 dst.x = (int) src.x
1104
1105 dst.y = (int) src.y
1106
1107 dst.z = (int) src.z
1108
1109 dst.w = (int) src.w
1110
1111
1112.. opcode:: F2U - Float to Unsigned Integer
1113
1114 Rounding is towards zero (truncate).
1115 Values outside unsigned range (including NaNs) produce undefined results.
1116
1117 .. math::
1118
1119 dst.x = (unsigned) src.x
1120
1121 dst.y = (unsigned) src.y
1122
1123 dst.z = (unsigned) src.z
1124
1125 dst.w = (unsigned) src.w
1126
1127
1128.. opcode:: UADD - Integer Add
1129
1130 This instruction works the same for signed and unsigned integers.
1131 The low 32bit of the result is returned.
1132
1133 .. math::
1134
1135 dst.x = src0.x + src1.x
1136
1137 dst.y = src0.y + src1.y
1138
1139 dst.z = src0.z + src1.z
1140
1141 dst.w = src0.w + src1.w
1142
1143
1144.. opcode:: UMAD - Integer Multiply And Add
1145
1146 This instruction works the same for signed and unsigned integers.
1147 The multiplication returns the low 32bit (as does the result itself).
1148
1149 .. math::
1150
1151 dst.x = src0.x \times src1.x + src2.x
1152
1153 dst.y = src0.y \times src1.y + src2.y
1154
1155 dst.z = src0.z \times src1.z + src2.z
1156
1157 dst.w = src0.w \times src1.w + src2.w
1158
1159
1160.. opcode:: UMUL - Integer Multiply
1161
1162 This instruction works the same for signed and unsigned integers.
1163 The low 32bit of the result is returned.
1164
1165 .. math::
1166
1167 dst.x = src0.x \times src1.x
1168
1169 dst.y = src0.y \times src1.y
1170
1171 dst.z = src0.z \times src1.z
1172
1173 dst.w = src0.w \times src1.w
1174
1175
1176.. opcode:: IMUL_HI - Signed Integer Multiply High Bits
1177
1178 The high 32bits of the multiplication of 2 signed integers are returned.
1179
1180 .. math::
1181
1182 dst.x = (src0.x \times src1.x) \gg 32
1183
1184 dst.y = (src0.y \times src1.y) \gg 32
1185
1186 dst.z = (src0.z \times src1.z) \gg 32
1187
1188 dst.w = (src0.w \times src1.w) \gg 32
1189
1190
1191.. opcode:: UMUL_HI - Unsigned Integer Multiply High Bits
1192
1193 The high 32bits of the multiplication of 2 unsigned integers are returned.
1194
1195 .. math::
1196
1197 dst.x = (src0.x \times src1.x) \gg 32
1198
1199 dst.y = (src0.y \times src1.y) \gg 32
1200
1201 dst.z = (src0.z \times src1.z) \gg 32
1202
1203 dst.w = (src0.w \times src1.w) \gg 32
1204
1205
1206.. opcode:: IDIV - Signed Integer Division
1207
1208 TBD: behavior for division by zero.
1209
1210 .. math::
1211
1212 dst.x = \frac{src0.x}{src1.x}
1213
1214 dst.y = \frac{src0.y}{src1.y}
1215
1216 dst.z = \frac{src0.z}{src1.z}
1217
1218 dst.w = \frac{src0.w}{src1.w}
1219
1220
1221.. opcode:: UDIV - Unsigned Integer Division
1222
1223 For division by zero, ``0xffffffff`` is returned.
1224
1225 .. math::
1226
1227 dst.x = \frac{src0.x}{src1.x}
1228
1229 dst.y = \frac{src0.y}{src1.y}
1230
1231 dst.z = \frac{src0.z}{src1.z}
1232
1233 dst.w = \frac{src0.w}{src1.w}
1234
1235
1236.. opcode:: UMOD - Unsigned Integer Remainder
1237
1238 If *src1* is zero, ``0xffffffff`` is returned.
1239
1240 .. math::
1241
1242 dst.x = src0.x \bmod src1.x
1243
1244 dst.y = src0.y \bmod src1.y
1245
1246 dst.z = src0.z \bmod src1.z
1247
1248 dst.w = src0.w \bmod src1.w
1249
1250
1251.. opcode:: NOT - Bitwise Not
1252
1253 .. math::
1254
1255 dst.x = \sim src.x
1256
1257 dst.y = \sim src.y
1258
1259 dst.z = \sim src.z
1260
1261 dst.w = \sim src.w
1262
1263
1264.. opcode:: AND - Bitwise And
1265
1266 .. math::
1267
1268 dst.x = src0.x \& src1.x
1269
1270 dst.y = src0.y \& src1.y
1271
1272 dst.z = src0.z \& src1.z
1273
1274 dst.w = src0.w \& src1.w
1275
1276
1277.. opcode:: OR - Bitwise Or
1278
1279 .. math::
1280
1281 dst.x = src0.x | src1.x
1282
1283 dst.y = src0.y | src1.y
1284
1285 dst.z = src0.z | src1.z
1286
1287 dst.w = src0.w | src1.w
1288
1289
1290.. opcode:: XOR - Bitwise Xor
1291
1292 .. math::
1293
1294 dst.x = src0.x \oplus src1.x
1295
1296 dst.y = src0.y \oplus src1.y
1297
1298 dst.z = src0.z \oplus src1.z
1299
1300 dst.w = src0.w \oplus src1.w
1301
1302
1303.. opcode:: IMAX - Maximum of Signed Integers
1304
1305 .. math::
1306
1307 dst.x = max(src0.x, src1.x)
1308
1309 dst.y = max(src0.y, src1.y)
1310
1311 dst.z = max(src0.z, src1.z)
1312
1313 dst.w = max(src0.w, src1.w)
1314
1315
1316.. opcode:: UMAX - Maximum of Unsigned Integers
1317
1318 .. math::
1319
1320 dst.x = max(src0.x, src1.x)
1321
1322 dst.y = max(src0.y, src1.y)
1323
1324 dst.z = max(src0.z, src1.z)
1325
1326 dst.w = max(src0.w, src1.w)
1327
1328
1329.. opcode:: IMIN - Minimum of Signed Integers
1330
1331 .. math::
1332
1333 dst.x = min(src0.x, src1.x)
1334
1335 dst.y = min(src0.y, src1.y)
1336
1337 dst.z = min(src0.z, src1.z)
1338
1339 dst.w = min(src0.w, src1.w)
1340
1341
1342.. opcode:: UMIN - Minimum of Unsigned Integers
1343
1344 .. math::
1345
1346 dst.x = min(src0.x, src1.x)
1347
1348 dst.y = min(src0.y, src1.y)
1349
1350 dst.z = min(src0.z, src1.z)
1351
1352 dst.w = min(src0.w, src1.w)
1353
1354
1355.. opcode:: SHL - Shift Left
1356
1357 The shift count is masked with ``0x1f`` before the shift is applied.
1358
1359 .. math::
1360
1361 dst.x = src0.x \ll (0x1f \& src1.x)
1362
1363 dst.y = src0.y \ll (0x1f \& src1.y)
1364
1365 dst.z = src0.z \ll (0x1f \& src1.z)
1366
1367 dst.w = src0.w \ll (0x1f \& src1.w)
1368
1369
1370.. opcode:: ISHR - Arithmetic Shift Right (of Signed Integer)
1371
1372 The shift count is masked with ``0x1f`` before the shift is applied.
1373
1374 .. math::
1375
1376 dst.x = src0.x \gg (0x1f \& src1.x)
1377
1378 dst.y = src0.y \gg (0x1f \& src1.y)
1379
1380 dst.z = src0.z \gg (0x1f \& src1.z)
1381
1382 dst.w = src0.w \gg (0x1f \& src1.w)
1383
1384
1385.. opcode:: USHR - Logical Shift Right
1386
1387 The shift count is masked with ``0x1f`` before the shift is applied.
1388
1389 .. math::
1390
1391 dst.x = src0.x \gg (unsigned) (0x1f \& src1.x)
1392
1393 dst.y = src0.y \gg (unsigned) (0x1f \& src1.y)
1394
1395 dst.z = src0.z \gg (unsigned) (0x1f \& src1.z)
1396
1397 dst.w = src0.w \gg (unsigned) (0x1f \& src1.w)
1398
1399
1400.. opcode:: UCMP - Integer Conditional Move
1401
1402 .. math::
1403
1404 dst.x = src0.x ? src1.x : src2.x
1405
1406 dst.y = src0.y ? src1.y : src2.y
1407
1408 dst.z = src0.z ? src1.z : src2.z
1409
1410 dst.w = src0.w ? src1.w : src2.w
1411
1412
1413
1414.. opcode:: ISSG - Integer Set Sign
1415
1416 .. math::
1417
1418 dst.x = (src0.x < 0) ? -1 : (src0.x > 0) ? 1 : 0
1419
1420 dst.y = (src0.y < 0) ? -1 : (src0.y > 0) ? 1 : 0
1421
1422 dst.z = (src0.z < 0) ? -1 : (src0.z > 0) ? 1 : 0
1423
1424 dst.w = (src0.w < 0) ? -1 : (src0.w > 0) ? 1 : 0
1425
1426
1427
1428.. opcode:: FSLT - Float Set On Less Than (ordered)
1429
1430 Same comparison as SLT but returns integer instead of 1.0/0.0 float
1431
1432 .. math::
1433
1434 dst.x = (src0.x < src1.x) ? \sim 0 : 0
1435
1436 dst.y = (src0.y < src1.y) ? \sim 0 : 0
1437
1438 dst.z = (src0.z < src1.z) ? \sim 0 : 0
1439
1440 dst.w = (src0.w < src1.w) ? \sim 0 : 0
1441
1442
1443.. opcode:: ISLT - Signed Integer Set On Less Than
1444
1445 .. math::
1446
1447 dst.x = (src0.x < src1.x) ? \sim 0 : 0
1448
1449 dst.y = (src0.y < src1.y) ? \sim 0 : 0
1450
1451 dst.z = (src0.z < src1.z) ? \sim 0 : 0
1452
1453 dst.w = (src0.w < src1.w) ? \sim 0 : 0
1454
1455
1456.. opcode:: USLT - Unsigned Integer Set On Less Than
1457
1458 .. math::
1459
1460 dst.x = (src0.x < src1.x) ? \sim 0 : 0
1461
1462 dst.y = (src0.y < src1.y) ? \sim 0 : 0
1463
1464 dst.z = (src0.z < src1.z) ? \sim 0 : 0
1465
1466 dst.w = (src0.w < src1.w) ? \sim 0 : 0
1467
1468
1469.. opcode:: FSGE - Float Set On Greater Equal Than (ordered)
1470
1471 Same comparison as SGE but returns integer instead of 1.0/0.0 float
1472
1473 .. math::
1474
1475 dst.x = (src0.x >= src1.x) ? \sim 0 : 0
1476
1477 dst.y = (src0.y >= src1.y) ? \sim 0 : 0
1478
1479 dst.z = (src0.z >= src1.z) ? \sim 0 : 0
1480
1481 dst.w = (src0.w >= src1.w) ? \sim 0 : 0
1482
1483
1484.. opcode:: ISGE - Signed Integer Set On Greater Equal Than
1485
1486 .. math::
1487
1488 dst.x = (src0.x >= src1.x) ? \sim 0 : 0
1489
1490 dst.y = (src0.y >= src1.y) ? \sim 0 : 0
1491
1492 dst.z = (src0.z >= src1.z) ? \sim 0 : 0
1493
1494 dst.w = (src0.w >= src1.w) ? \sim 0 : 0
1495
1496
1497.. opcode:: USGE - Unsigned Integer Set On Greater Equal Than
1498
1499 .. math::
1500
1501 dst.x = (src0.x >= src1.x) ? \sim 0 : 0
1502
1503 dst.y = (src0.y >= src1.y) ? \sim 0 : 0
1504
1505 dst.z = (src0.z >= src1.z) ? \sim 0 : 0
1506
1507 dst.w = (src0.w >= src1.w) ? \sim 0 : 0
1508
1509
1510.. opcode:: FSEQ - Float Set On Equal (ordered)
1511
1512 Same comparison as SEQ but returns integer instead of 1.0/0.0 float
1513
1514 .. math::
1515
1516 dst.x = (src0.x == src1.x) ? \sim 0 : 0
1517
1518 dst.y = (src0.y == src1.y) ? \sim 0 : 0
1519
1520 dst.z = (src0.z == src1.z) ? \sim 0 : 0
1521
1522 dst.w = (src0.w == src1.w) ? \sim 0 : 0
1523
1524
1525.. opcode:: USEQ - Integer Set On Equal
1526
1527 .. math::
1528
1529 dst.x = (src0.x == src1.x) ? \sim 0 : 0
1530
1531 dst.y = (src0.y == src1.y) ? \sim 0 : 0
1532
1533 dst.z = (src0.z == src1.z) ? \sim 0 : 0
1534
1535 dst.w = (src0.w == src1.w) ? \sim 0 : 0
1536
1537
1538.. opcode:: FSNE - Float Set On Not Equal (unordered)
1539
1540 Same comparison as SNE but returns integer instead of 1.0/0.0 float
1541
1542 .. math::
1543
1544 dst.x = (src0.x != src1.x) ? \sim 0 : 0
1545
1546 dst.y = (src0.y != src1.y) ? \sim 0 : 0
1547
1548 dst.z = (src0.z != src1.z) ? \sim 0 : 0
1549
1550 dst.w = (src0.w != src1.w) ? \sim 0 : 0
1551
1552
1553.. opcode:: USNE - Integer Set On Not Equal
1554
1555 .. math::
1556
1557 dst.x = (src0.x != src1.x) ? \sim 0 : 0
1558
1559 dst.y = (src0.y != src1.y) ? \sim 0 : 0
1560
1561 dst.z = (src0.z != src1.z) ? \sim 0 : 0
1562
1563 dst.w = (src0.w != src1.w) ? \sim 0 : 0
1564
1565
1566.. opcode:: INEG - Integer Negate
1567
1568 Two's complement.
1569
1570 .. math::
1571
1572 dst.x = -src.x
1573
1574 dst.y = -src.y
1575
1576 dst.z = -src.z
1577
1578 dst.w = -src.w
1579
1580
1581.. opcode:: IABS - Integer Absolute Value
1582
1583 .. math::
1584
1585 dst.x = |src.x|
1586
1587 dst.y = |src.y|
1588
1589 dst.z = |src.z|
1590
1591 dst.w = |src.w|
1592
1593Bitwise ISA
1594^^^^^^^^^^^
1595These opcodes are used for bit-level manipulation of integers.
1596
1597.. opcode:: IBFE - Signed Bitfield Extract
1598
1599 Like GLSL bitfieldExtract. Extracts a set of bits from the input, and
1600 sign-extends them if the high bit of the extracted window is set.
1601
1602 Pseudocode::
1603
1604 def ibfe(value, offset, bits):
1605 if offset < 0 or bits < 0 or offset + bits > 32:
1606 return undefined
1607 if bits == 0: return 0
1608 # Note: >> sign-extends
1609 return (value << (32 - offset - bits)) >> (32 - bits)
1610
1611.. opcode:: UBFE - Unsigned Bitfield Extract
1612
1613 Like GLSL bitfieldExtract. Extracts a set of bits from the input, without
1614 any sign-extension.
1615
1616 Pseudocode::
1617
1618 def ubfe(value, offset, bits):
1619 if offset < 0 or bits < 0 or offset + bits > 32:
1620 return undefined
1621 if bits == 0: return 0
1622 # Note: >> does not sign-extend
1623 return (value << (32 - offset - bits)) >> (32 - bits)
1624
1625.. opcode:: BFI - Bitfield Insert
1626
1627 Like GLSL bitfieldInsert. Replaces a bit region of 'base' with the low bits
1628 of 'insert'.
1629
1630 Pseudocode::
1631
1632 def bfi(base, insert, offset, bits):
1633 if offset < 0 or bits < 0 or offset + bits > 32:
1634 return undefined
1635 # << defined such that mask == ~0 when bits == 32, offset == 0
1636 mask = ((1 << bits) - 1) << offset
1637 return ((insert << offset) & mask) | (base & ~mask)
1638
1639.. opcode:: BREV - Bitfield Reverse
1640
1641 See SM5 instruction BFREV. Reverses the bits of the argument.
1642
1643.. opcode:: POPC - Population Count
1644
1645 See SM5 instruction COUNTBITS. Counts the number of set bits in the argument.
1646
1647.. opcode:: LSB - Index of lowest set bit
1648
1649 See SM5 instruction FIRSTBIT_LO. Computes the 0-based index of the first set
1650 bit of the argument. Returns -1 if none are set.
1651
1652.. opcode:: IMSB - Index of highest non-sign bit
1653
1654 See SM5 instruction FIRSTBIT_SHI. Computes the 0-based index of the highest
1655 non-sign bit of the argument (i.e. highest 0 bit for negative numbers,
1656 highest 1 bit for positive numbers). Returns -1 if all bits are the same
1657 (i.e. for inputs 0 and -1).
1658
1659.. opcode:: UMSB - Index of highest set bit
1660
1661 See SM5 instruction FIRSTBIT_HI. Computes the 0-based index of the highest
1662 set bit of the argument. Returns -1 if none are set.
1663
1664Geometry ISA
1665^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1666
1667These opcodes are only supported in geometry shaders; they have no meaning
1668in any other type of shader.
1669
1670.. opcode:: EMIT - Emit
1671
1672 Generate a new vertex for the current primitive into the specified vertex
1673 stream using the values in the output registers.
1674
1675
1676.. opcode:: ENDPRIM - End Primitive
1677
1678 Complete the current primitive in the specified vertex stream (consisting of
1679 the emitted vertices), and start a new one.
1680
1681
1682GLSL ISA
1683^^^^^^^^^^
1684
1685These opcodes are part of :term:`GLSL`'s opcode set. Support for these
1686opcodes is determined by a special capability bit, ``GLSL``.
1687Some require glsl version 1.30 (UIF/SWITCH/CASE/DEFAULT/ENDSWITCH).
1688
1689.. opcode:: CAL - Subroutine Call
1690
1691 Pseudocode::
1692
1693 push(pc)
1694 pc = target
1695
1696
1697.. opcode:: RET - Subroutine Call Return
1698
1699 Pseudocode::
1700
1701 pc = pop()
1702
1703
1704.. opcode:: CONT - Continue
1705
1706 Unconditionally moves the point of execution to the instruction after the
1707 last BGNLOOP. The instruction must appear within a BGNLOOP/ENDLOOP.
1708
1709.. note::
1710
1711 Support for CONT is determined by a special capability bit,
1712 ``TGSI_CONT_SUPPORTED``. See :ref:`Screen` for more information.
1713
1714
1715.. opcode:: BGNLOOP - Begin a Loop
1716
1717 Start a loop. Must have a matching ENDLOOP.
1718
1719
1720.. opcode:: BGNSUB - Begin Subroutine
1721
1722 Starts definition of a subroutine. Must have a matching ENDSUB.
1723
1724
1725.. opcode:: ENDLOOP - End a Loop
1726
1727 End a loop started with BGNLOOP.
1728
1729
1730.. opcode:: ENDSUB - End Subroutine
1731
1732 Ends definition of a subroutine.
1733
1734
1735.. opcode:: NOP - No Operation
1736
1737 Do nothing.
1738
1739
1740.. opcode:: BRK - Break
1741
1742 Unconditionally moves the point of execution to the instruction after the
1743 next ENDLOOP or ENDSWITCH. The instruction must appear within a
1744 BGNLOOP/ENDLOOP or SWITCH/ENDSWITCH.
1745
1746
1747.. opcode:: IF - Float If
1748
1749 Start an IF ... ELSE .. ENDIF block. Condition evaluates to true if
1750
1751 *src0.x* != 0.0
1752
1753 where *src0.x* is interpreted as a floating point register.
1754
1755
1756.. opcode:: UIF - Bitwise If
1757
1758 Start an UIF ... ELSE .. ENDIF block. Condition evaluates to true if
1759
1760 *src0.x* != 0
1761
1762 where *src0.x* is interpreted as an integer register.
1763
1764
1765.. opcode:: ELSE - Else
1766
1767 Starts an else block, after an IF or UIF statement.
1768
1769
1770.. opcode:: ENDIF - End If
1771
1772 Ends an IF or UIF block.
1773
1774
1775.. opcode:: SWITCH - Switch
1776
1777 Starts a C-style switch expression. The switch consists of one or multiple
1778 CASE statements, and at most one DEFAULT statement. Execution of a statement
1779 ends when a BRK is hit, but just like in C falling through to other cases
1780 without a break is allowed. Similarly, DEFAULT label is allowed anywhere not
1781 just as last statement, and fallthrough is allowed into/from it.
1782 CASE *src* arguments are evaluated at bit level against the SWITCH *src* argument.
1783
1784 Example::
1785
1786 SWITCH src[0].x
1787 CASE src[0].x
1788 (some instructions here)
1789 (optional BRK here)
1790 DEFAULT
1791 (some instructions here)
1792 (optional BRK here)
1793 CASE src[0].x
1794 (some instructions here)
1795 (optional BRK here)
1796 ENDSWITCH
1797
1798
1799.. opcode:: CASE - Switch case
1800
1801 This represents a switch case label. The *src* arg must be an integer immediate.
1802
1803
1804.. opcode:: DEFAULT - Switch default
1805
1806 This represents the default case in the switch, which is taken if no other
1807 case matches.
1808
1809
1810.. opcode:: ENDSWITCH - End of switch
1811
1812 Ends a switch expression.
1813
1814
1815Interpolation ISA
1816^^^^^^^^^^^^^^^^^
1817
1818The interpolation instructions allow an input to be interpolated in a
1819different way than its declaration. This corresponds to the GLSL 4.00
1820interpolateAt* functions. The first argument of each of these must come from
1821``TGSI_FILE_INPUT``.
1822
1823.. opcode:: INTERP_CENTROID - Interpolate at the centroid
1824
1825 Interpolates the varying specified by *src0* at the centroid
1826
1827.. opcode:: INTERP_SAMPLE - Interpolate at the specified sample
1828
1829 Interpolates the varying specified by *src0* at the sample id
1830 specified by *src1.x* (interpreted as an integer)
1831
1832.. opcode:: INTERP_OFFSET - Interpolate at the specified offset
1833
1834 Interpolates the varying specified by *src0* at the offset *src1.xy*
1835 from the pixel center (interpreted as floats)
1836
1837
1838.. _doubleopcodes:
1839
1840Double ISA
1841^^^^^^^^^^^^^^^
1842
1843The double-precision opcodes reinterpret four-component vectors into
1844two-component vectors with doubled precision in each component.
1845
1846.. opcode:: DABS - Absolute
1847
1848 .. math::
1849
1850 dst.xy = |src0.xy|
1851
1852 dst.zw = |src0.zw|
1853
1854.. opcode:: DADD - Add
1855
1856 .. math::
1857
1858 dst.xy = src0.xy + src1.xy
1859
1860 dst.zw = src0.zw + src1.zw
1861
1862.. opcode:: DSEQ - Set on Equal
1863
1864 .. math::
1865
1866 dst.x = src0.xy == src1.xy ? \sim 0 : 0
1867
1868 dst.z = src0.zw == src1.zw ? \sim 0 : 0
1869
1870.. opcode:: DSNE - Set on Not Equal
1871
1872 .. math::
1873
1874 dst.x = src0.xy != src1.xy ? \sim 0 : 0
1875
1876 dst.z = src0.zw != src1.zw ? \sim 0 : 0
1877
1878.. opcode:: DSLT - Set on Less than
1879
1880 .. math::
1881
1882 dst.x = src0.xy < src1.xy ? \sim 0 : 0
1883
1884 dst.z = src0.zw < src1.zw ? \sim 0 : 0
1885
1886.. opcode:: DSGE - Set on Greater equal
1887
1888 .. math::
1889
1890 dst.x = src0.xy >= src1.xy ? \sim 0 : 0
1891
1892 dst.z = src0.zw >= src1.zw ? \sim 0 : 0
1893
1894.. opcode:: DFRAC - Fraction
1895
1896 .. math::
1897
1898 dst.xy = src.xy - \lfloor src.xy\rfloor
1899
1900 dst.zw = src.zw - \lfloor src.zw\rfloor
1901
1902.. opcode:: DTRUNC - Truncate
1903
1904 .. math::
1905
1906 dst.xy = trunc(src.xy)
1907
1908 dst.zw = trunc(src.zw)
1909
1910.. opcode:: DCEIL - Ceiling
1911
1912 .. math::
1913
1914 dst.xy = \lceil src.xy\rceil
1915
1916 dst.zw = \lceil src.zw\rceil
1917
1918.. opcode:: DFLR - Floor
1919
1920 .. math::
1921
1922 dst.xy = \lfloor src.xy\rfloor
1923
1924 dst.zw = \lfloor src.zw\rfloor
1925
1926.. opcode:: DROUND - Fraction
1927
1928 .. math::
1929
1930 dst.xy = round(src.xy)
1931
1932 dst.zw = round(src.zw)
1933
1934.. opcode:: DSSG - Set Sign
1935
1936 .. math::
1937
1938 dst.xy = (src.xy > 0) ? 1.0 : (src.xy < 0) ? -1.0 : 0.0
1939
1940 dst.zw = (src.zw > 0) ? 1.0 : (src.zw < 0) ? -1.0 : 0.0
1941
1942.. opcode:: DLDEXP - Multiply Number by Integral Power of 2
1943
1944 This opcode is the inverse of frexp. The second
1945 source is an integer.
1946
1947 .. math::
1948
1949 dst.xy = src0.xy \times 2^{src1.x}
1950
1951 dst.zw = src0.zw \times 2^{src1.z}
1952
1953.. opcode:: DMIN - Minimum
1954
1955 .. math::
1956
1957 dst.xy = min(src0.xy, src1.xy)
1958
1959 dst.zw = min(src0.zw, src1.zw)
1960
1961.. opcode:: DMAX - Maximum
1962
1963 .. math::
1964
1965 dst.xy = max(src0.xy, src1.xy)
1966
1967 dst.zw = max(src0.zw, src1.zw)
1968
1969.. opcode:: DMUL - Multiply
1970
1971 .. math::
1972
1973 dst.xy = src0.xy \times src1.xy
1974
1975 dst.zw = src0.zw \times src1.zw
1976
1977
1978.. opcode:: DMAD - Multiply And Add
1979
1980 .. math::
1981
1982 dst.xy = src0.xy \times src1.xy + src2.xy
1983
1984 dst.zw = src0.zw \times src1.zw + src2.zw
1985
1986
1987.. opcode:: DFMA - Fused Multiply-Add
1988
1989 Perform a * b + c with no intermediate rounding step.
1990
1991 .. math::
1992
1993 dst.xy = src0.xy \times src1.xy + src2.xy
1994
1995 dst.zw = src0.zw \times src1.zw + src2.zw
1996
1997
1998.. opcode:: DDIV - Divide
1999
2000 .. math::
2001
2002 dst.xy = \frac{src0.xy}{src1.xy}
2003
2004 dst.zw = \frac{src0.zw}{src1.zw}
2005
2006
2007.. opcode:: DRCP - Reciprocal
2008
2009 .. math::
2010
2011 dst.xy = \frac{1}{src.xy}
2012
2013 dst.zw = \frac{1}{src.zw}
2014
2015.. opcode:: DSQRT - Square Root
2016
2017 .. math::
2018
2019 dst.xy = \sqrt{src.xy}
2020
2021 dst.zw = \sqrt{src.zw}
2022
2023.. opcode:: DRSQ - Reciprocal Square Root
2024
2025 .. math::
2026
2027 dst.xy = \frac{1}{\sqrt{src.xy}}
2028
2029 dst.zw = \frac{1}{\sqrt{src.zw}}
2030
2031.. opcode:: F2D - Float to Double
2032
2033 .. math::
2034
2035 dst.xy = double(src0.x)
2036
2037 dst.zw = double(src0.y)
2038
2039.. opcode:: D2F - Double to Float
2040
2041 .. math::
2042
2043 dst.x = float(src0.xy)
2044
2045 dst.y = float(src0.zw)
2046
2047.. opcode:: I2D - Int to Double
2048
2049 .. math::
2050
2051 dst.xy = double(src0.x)
2052
2053 dst.zw = double(src0.y)
2054
2055.. opcode:: D2I - Double to Int
2056
2057 .. math::
2058
2059 dst.x = int(src0.xy)
2060
2061 dst.y = int(src0.zw)
2062
2063.. opcode:: U2D - Unsigned Int to Double
2064
2065 .. math::
2066
2067 dst.xy = double(src0.x)
2068
2069 dst.zw = double(src0.y)
2070
2071.. opcode:: D2U - Double to Unsigned Int
2072
2073 .. math::
2074
2075 dst.x = unsigned(src0.xy)
2076
2077 dst.y = unsigned(src0.zw)
2078
207964-bit Integer ISA
2080^^^^^^^^^^^^^^^^^^
2081
2082The 64-bit integer opcodes reinterpret four-component vectors into
2083two-component vectors with 64-bits in each component.
2084
2085.. opcode:: I64ABS - 64-bit Integer Absolute Value
2086
2087 .. math::
2088
2089 dst.xy = |src0.xy|
2090
2091 dst.zw = |src0.zw|
2092
2093.. opcode:: I64NEG - 64-bit Integer Negate
2094
2095 Two's complement.
2096
2097 .. math::
2098
2099 dst.xy = -src.xy
2100
2101 dst.zw = -src.zw
2102
2103.. opcode:: I64SSG - 64-bit Integer Set Sign
2104
2105 .. math::
2106
2107 dst.xy = (src0.xy < 0) ? -1 : (src0.xy > 0) ? 1 : 0
2108
2109 dst.zw = (src0.zw < 0) ? -1 : (src0.zw > 0) ? 1 : 0
2110
2111.. opcode:: U64ADD - 64-bit Integer Add
2112
2113 .. math::
2114
2115 dst.xy = src0.xy + src1.xy
2116
2117 dst.zw = src0.zw + src1.zw
2118
2119.. opcode:: U64MUL - 64-bit Integer Multiply
2120
2121 .. math::
2122
2123 dst.xy = src0.xy * src1.xy
2124
2125 dst.zw = src0.zw * src1.zw
2126
2127.. opcode:: U64SEQ - 64-bit Integer Set on Equal
2128
2129 .. math::
2130
2131 dst.x = src0.xy == src1.xy ? \sim 0 : 0
2132
2133 dst.z = src0.zw == src1.zw ? \sim 0 : 0
2134
2135.. opcode:: U64SNE - 64-bit Integer Set on Not Equal
2136
2137 .. math::
2138
2139 dst.x = src0.xy != src1.xy ? \sim 0 : 0
2140
2141 dst.z = src0.zw != src1.zw ? \sim 0 : 0
2142
2143.. opcode:: U64SLT - 64-bit Unsigned Integer Set on Less Than
2144
2145 .. math::
2146
2147 dst.x = src0.xy < src1.xy ? \sim 0 : 0
2148
2149 dst.z = src0.zw < src1.zw ? \sim 0 : 0
2150
2151.. opcode:: U64SGE - 64-bit Unsigned Integer Set on Greater Equal
2152
2153 .. math::
2154
2155 dst.x = src0.xy >= src1.xy ? \sim 0 : 0
2156
2157 dst.z = src0.zw >= src1.zw ? \sim 0 : 0
2158
2159.. opcode:: I64SLT - 64-bit Signed Integer Set on Less Than
2160
2161 .. math::
2162
2163 dst.x = src0.xy < src1.xy ? \sim 0 : 0
2164
2165 dst.z = src0.zw < src1.zw ? \sim 0 : 0
2166
2167.. opcode:: I64SGE - 64-bit Signed Integer Set on Greater Equal
2168
2169 .. math::
2170
2171 dst.x = src0.xy >= src1.xy ? \sim 0 : 0
2172
2173 dst.z = src0.zw >= src1.zw ? \sim 0 : 0
2174
2175.. opcode:: I64MIN - Minimum of 64-bit Signed Integers
2176
2177 .. math::
2178
2179 dst.xy = min(src0.xy, src1.xy)
2180
2181 dst.zw = min(src0.zw, src1.zw)
2182
2183.. opcode:: U64MIN - Minimum of 64-bit Unsigned Integers
2184
2185 .. math::
2186
2187 dst.xy = min(src0.xy, src1.xy)
2188
2189 dst.zw = min(src0.zw, src1.zw)
2190
2191.. opcode:: I64MAX - Maximum of 64-bit Signed Integers
2192
2193 .. math::
2194
2195 dst.xy = max(src0.xy, src1.xy)
2196
2197 dst.zw = max(src0.zw, src1.zw)
2198
2199.. opcode:: U64MAX - Maximum of 64-bit Unsigned Integers
2200
2201 .. math::
2202
2203 dst.xy = max(src0.xy, src1.xy)
2204
2205 dst.zw = max(src0.zw, src1.zw)
2206
2207.. opcode:: U64SHL - Shift Left 64-bit Unsigned Integer
2208
2209 The shift count is masked with ``0x3f`` before the shift is applied.
2210
2211 .. math::
2212
2213 dst.xy = src0.xy \ll (0x3f \& src1.x)
2214
2215 dst.zw = src0.zw \ll (0x3f \& src1.y)
2216
2217.. opcode:: I64SHR - Arithmetic Shift Right (of 64-bit Signed Integer)
2218
2219 The shift count is masked with ``0x3f`` before the shift is applied.
2220
2221 .. math::
2222
2223 dst.xy = src0.xy \gg (0x3f \& src1.x)
2224
2225 dst.zw = src0.zw \gg (0x3f \& src1.y)
2226
2227.. opcode:: U64SHR - Logical Shift Right (of 64-bit Unsigned Integer)
2228
2229 The shift count is masked with ``0x3f`` before the shift is applied.
2230
2231 .. math::
2232
2233 dst.xy = src0.xy \gg (unsigned) (0x3f \& src1.x)
2234
2235 dst.zw = src0.zw \gg (unsigned) (0x3f \& src1.y)
2236
2237.. opcode:: I64DIV - 64-bit Signed Integer Division
2238
2239 .. math::
2240
2241 dst.xy = \frac{src0.xy}{src1.xy}
2242
2243 dst.zw = \frac{src0.zw}{src1.zw}
2244
2245.. opcode:: U64DIV - 64-bit Unsigned Integer Division
2246
2247 .. math::
2248
2249 dst.xy = \frac{src0.xy}{src1.xy}
2250
2251 dst.zw = \frac{src0.zw}{src1.zw}
2252
2253.. opcode:: U64MOD - 64-bit Unsigned Integer Remainder
2254
2255 .. math::
2256
2257 dst.xy = src0.xy \bmod src1.xy
2258
2259 dst.zw = src0.zw \bmod src1.zw
2260
2261.. opcode:: I64MOD - 64-bit Signed Integer Remainder
2262
2263 .. math::
2264
2265 dst.xy = src0.xy \bmod src1.xy
2266
2267 dst.zw = src0.zw \bmod src1.zw
2268
2269.. opcode:: F2U64 - Float to 64-bit Unsigned Int
2270
2271 .. math::
2272
2273 dst.xy = (uint64_t) src0.x
2274
2275 dst.zw = (uint64_t) src0.y
2276
2277.. opcode:: F2I64 - Float to 64-bit Int
2278
2279 .. math::
2280
2281 dst.xy = (int64_t) src0.x
2282
2283 dst.zw = (int64_t) src0.y
2284
2285.. opcode:: U2I64 - Unsigned Integer to 64-bit Integer
2286
2287 This is a zero extension.
2288
2289 .. math::
2290
2291 dst.xy = (int64_t) src0.x
2292
2293 dst.zw = (int64_t) src0.y
2294
2295.. opcode:: I2I64 - Signed Integer to 64-bit Integer
2296
2297 This is a sign extension.
2298
2299 .. math::
2300
2301 dst.xy = (int64_t) src0.x
2302
2303 dst.zw = (int64_t) src0.y
2304
2305.. opcode:: D2U64 - Double to 64-bit Unsigned Int
2306
2307 .. math::
2308
2309 dst.xy = (uint64_t) src0.xy
2310
2311 dst.zw = (uint64_t) src0.zw
2312
2313.. opcode:: D2I64 - Double to 64-bit Int
2314
2315 .. math::
2316
2317 dst.xy = (int64_t) src0.xy
2318
2319 dst.zw = (int64_t) src0.zw
2320
2321.. opcode:: U642F - 64-bit unsigned integer to float
2322
2323 .. math::
2324
2325 dst.x = (float) src0.xy
2326
2327 dst.y = (float) src0.zw
2328
2329.. opcode:: I642F - 64-bit Int to Float
2330
2331 .. math::
2332
2333 dst.x = (float) src0.xy
2334
2335 dst.y = (float) src0.zw
2336
2337.. opcode:: U642D - 64-bit unsigned integer to double
2338
2339 .. math::
2340
2341 dst.xy = (double) src0.xy
2342
2343 dst.zw = (double) src0.zw
2344
2345.. opcode:: I642D - 64-bit Int to double
2346
2347 .. math::
2348
2349 dst.xy = (double) src0.xy
2350
2351 dst.zw = (double) src0.zw
2352
2353.. _samplingopcodes:
2354
2355Resource Sampling Opcodes
2356^^^^^^^^^^^^^^^^^^^^^^^^^
2357
2358Those opcodes follow very closely semantics of the respective Direct3D
2359instructions. If in doubt double check Direct3D documentation.
2360Note that the swizzle on SVIEW (src1) determines texel swizzling
2361after lookup.
2362
2363.. opcode:: SAMPLE
2364
2365 Using provided address, sample data from the specified texture using the
2366 filtering mode identified by the given sampler. The source data may come from
2367 any resource type other than buffers.
2368
2369 Syntax: ``SAMPLE dst, address, sampler_view, sampler``
2370
2371 Example: ``SAMPLE TEMP[0], TEMP[1], SVIEW[0], SAMP[0]``
2372
2373.. opcode:: SAMPLE_I
2374
2375 Simplified alternative to the SAMPLE instruction. Using the provided
2376 integer address, SAMPLE_I fetches data from the specified sampler view
2377 without any filtering. The source data may come from any resource type
2378 other than CUBE.
2379
2380 Syntax: ``SAMPLE_I dst, address, sampler_view``
2381
2382 Example: ``SAMPLE_I TEMP[0], TEMP[1], SVIEW[0]``
2383
2384 The 'address' is specified as unsigned integers. If the 'address' is out of
2385 range [0...(# texels - 1)] the result of the fetch is always 0 in all
2386 components. As such the instruction doesn't honor address wrap modes, in
2387 cases where that behavior is desirable 'SAMPLE' instruction should be used.
2388 address.w always provides an unsigned integer mipmap level. If the value is
2389 out of the range then the instruction always returns 0 in all components.
2390 address.yz are ignored for buffers and 1d textures. address.z is ignored
2391 for 1d texture arrays and 2d textures.
2392
2393 For 1D texture arrays address.y provides the array index (also as unsigned
2394 integer). If the value is out of the range of available array indices
2395 [0... (array size - 1)] then the opcode always returns 0 in all components.
2396 For 2D texture arrays address.z provides the array index, otherwise it
2397 exhibits the same behavior as in the case for 1D texture arrays. The exact
2398 semantics of the source address are presented in the table below:
2399
2400 +---------------------------+----+-----+-----+---------+
2401 | resource type | X | Y | Z | W |
2402 +===========================+====+=====+=====+=========+
2403 | ``PIPE_BUFFER`` | x | | | ignored |
2404 +---------------------------+----+-----+-----+---------+
2405 | ``PIPE_TEXTURE_1D`` | x | | | mpl |
2406 +---------------------------+----+-----+-----+---------+
2407 | ``PIPE_TEXTURE_2D`` | x | y | | mpl |
2408 +---------------------------+----+-----+-----+---------+
2409 | ``PIPE_TEXTURE_3D`` | x | y | z | mpl |
2410 +---------------------------+----+-----+-----+---------+
2411 | ``PIPE_TEXTURE_RECT`` | x | y | | mpl |
2412 +---------------------------+----+-----+-----+---------+
2413 | ``PIPE_TEXTURE_CUBE`` | not allowed as source |
2414 +---------------------------+----+-----+-----+---------+
2415 | ``PIPE_TEXTURE_1D_ARRAY`` | x | idx | | mpl |
2416 +---------------------------+----+-----+-----+---------+
2417 | ``PIPE_TEXTURE_2D_ARRAY`` | x | y | idx | mpl |
2418 +---------------------------+----+-----+-----+---------+
2419
2420 Where 'mpl' is a mipmap level and 'idx' is the array index.
2421
2422.. opcode:: SAMPLE_I_MS
2423
2424 Just like SAMPLE_I but allows fetch data from multi-sampled surfaces.
2425
2426 Syntax: ``SAMPLE_I_MS dst, address, sampler_view, sample``
2427
2428.. opcode:: SAMPLE_B
2429
2430 Just like the SAMPLE instruction with the exception that an additional bias
2431 is applied to the level of detail computed as part of the instruction
2432 execution.
2433
2434 Syntax: ``SAMPLE_B dst, address, sampler_view, sampler, lod_bias``
2435
2436 Example: ``SAMPLE_B TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2].x``
2437
2438.. opcode:: SAMPLE_C
2439
2440 Similar to the SAMPLE instruction but it performs a comparison filter. The
2441 operands to SAMPLE_C are identical to SAMPLE, except that there is an
2442 additional float32 operand, reference value, which must be a register with
2443 single-component, or a scalar literal. SAMPLE_C makes the hardware use the
2444 current samplers compare_func (in pipe_sampler_state) to compare reference
2445 value against the red component value for the source resource at each texel
2446 that the currently configured texture filter covers based on the provided
2447 coordinates.
2448
2449 Syntax: ``SAMPLE_C dst, address, sampler_view.r, sampler, ref_value``
2450
2451 Example: ``SAMPLE_C TEMP[0], TEMP[1], SVIEW[0].r, SAMP[0], TEMP[2].x``
2452
2453.. opcode:: SAMPLE_C_LZ
2454
2455 Same as SAMPLE_C, but LOD is 0 and derivatives are ignored. The LZ stands
2456 for level-zero.
2457
2458 Syntax: ``SAMPLE_C_LZ dst, address, sampler_view.r, sampler, ref_value``
2459
2460 Example: ``SAMPLE_C_LZ TEMP[0], TEMP[1], SVIEW[0].r, SAMP[0], TEMP[2].x``
2461
2462
2463.. opcode:: SAMPLE_D
2464
2465 SAMPLE_D is identical to the SAMPLE opcode except that the derivatives for
2466 the source address in the x direction and the y direction are provided by
2467 extra parameters.
2468
2469 Syntax: ``SAMPLE_D dst, address, sampler_view, sampler, der_x, der_y``
2470
2471 Example: ``SAMPLE_D TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2], TEMP[3]``
2472
2473.. opcode:: SAMPLE_L
2474
2475 SAMPLE_L is identical to the SAMPLE opcode except that the LOD is provided
2476 directly as a scalar value, representing no anisotropy.
2477
2478 Syntax: ``SAMPLE_L dst, address, sampler_view, sampler, explicit_lod``
2479
2480 Example: ``SAMPLE_L TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2].x``
2481
2482.. opcode:: GATHER4
2483
2484 Gathers the four texels to be used in a bi-linear filtering operation and
2485 packs them into a single register. Only works with 2D, 2D array, cubemaps,
2486 and cubemaps arrays. For 2D textures, only the addressing modes of the
2487 sampler and the top level of any mip pyramid are used. Set W to zero. It
2488 behaves like the SAMPLE instruction, but a filtered sample is not
2489 generated. The four samples that contribute to filtering are placed into
2490 XYZW in counter-clockwise order, starting with the (u,v) texture coordinate
2491 delta at the following locations (-, +), (+, +), (+, -), (-, -), where the
2492 magnitude of the deltas are half a texel.
2493
2494
2495.. opcode:: SVIEWINFO
2496
2497 Query the dimensions of a given sampler view. dst receives width, height,
2498 depth or array size and number of mipmap levels as int4. The dst can have a
2499 writemask which will specify what info is the caller interested in.
2500
2501 Syntax: ``SVIEWINFO dst, src_mip_level, sampler_view``
2502
2503 Example: ``SVIEWINFO TEMP[0], TEMP[1].x, SVIEW[0]``
2504
2505 src_mip_level is an unsigned integer scalar. If it's out of range then
2506 returns 0 for width, height and depth/array size but the total number of
2507 mipmap is still returned correctly for the given sampler view. The returned
2508 width, height and depth values are for the mipmap level selected by the
2509 src_mip_level and are in the number of texels. For 1d texture array width
2510 is in dst.x, array size is in dst.y and dst.z is 0. The number of mipmaps is
2511 still in dst.w. In contrast to d3d10 resinfo, there's no way in the tgsi
2512 instruction encoding to specify the return type (float/rcpfloat/uint), hence
2513 always using uint. Also, unlike the SAMPLE instructions, the swizzle on src1
2514 resinfo allowing swizzling dst values is ignored (due to the interaction
2515 with rcpfloat modifier which requires some swizzle handling in the state
2516 tracker anyway).
2517
2518.. opcode:: SAMPLE_POS
2519
2520 Query the position of a sample in the given resource or render target
2521 when per-sample fragment shading is in effect.
2522
2523 Syntax: ``SAMPLE_POS dst, source, sample_index``
2524
2525 dst receives float4 (x, y, undef, undef) indicated where the sample is
2526 located. Sample locations are in the range [0, 1] where 0.5 is the center
2527 of the fragment.
2528
2529 source is either a sampler view (to indicate a shader resource) or temp
2530 register (to indicate the render target). The source register may have
2531 an optional swizzle to apply to the returned result
2532
2533 sample_index is an integer scalar indicating which sample position is to
2534 be queried.
2535
2536 If per-sample shading is not in effect or the source resource or render
2537 target is not multisampled, the result is (0.5, 0.5, undef, undef).
2538
2539 NOTE: no driver has implemented this opcode yet (and no gallium frontend
2540 emits it). This information is subject to change.
2541
2542.. opcode:: SAMPLE_INFO
2543
2544 Query the number of samples in a multisampled resource or render target.
2545
2546 Syntax: ``SAMPLE_INFO dst, source``
2547
2548 dst receives int4 (n, 0, 0, 0) where n is the number of samples in a
2549 resource or the render target.
2550
2551 source is either a sampler view (to indicate a shader resource) or temp
2552 register (to indicate the render target). The source register may have
2553 an optional swizzle to apply to the returned result
2554
2555 If per-sample shading is not in effect or the source resource or render
2556 target is not multisampled, the result is (1, 0, 0, 0).
2557
2558 NOTE: no driver has implemented this opcode yet (and no gallium frontend
2559 emits it). This information is subject to change.
2560
2561.. opcode:: LOD - level of detail
2562
2563 Same syntax as the SAMPLE opcode but instead of performing an actual
2564 texture lookup/filter, return the computed LOD information that the
2565 texture pipe would use to access the texture. The Y component contains
2566 the computed LOD lambda_prime. The X component contains the LOD that will
2567 be accessed, based on min/max lod's and mipmap filters.
2568 The Z and W components are set to 0.
2569
2570 Syntax: ``LOD dst, address, sampler_view, sampler``
2571
2572
2573.. _resourceopcodes:
2574
2575Resource Access Opcodes
2576^^^^^^^^^^^^^^^^^^^^^^^
2577
2578For these opcodes, the resource can be a BUFFER, IMAGE, or MEMORY.
2579
2580.. opcode:: LOAD - Fetch data from a shader buffer or image
2581
2582 Syntax: ``LOAD dst, resource, address``
2583
2584 Example: ``LOAD TEMP[0], BUFFER[0], TEMP[1]``
2585
2586 Using the provided integer address, LOAD fetches data from the
2587 specified buffer or texture without any filtering.
2588
2589 The 'address' is specified as a vector of unsigned integers. If the
2590 'address' is out of range the result is unspecified.
2591
2592 Only the first mipmap level of a resource can be read from using this
2593 instruction.
2594
2595 For 1D or 2D texture arrays, the array index is provided as an
2596 unsigned integer in address.y or address.z, respectively. address.yz
2597 are ignored for buffers and 1D textures. address.z is ignored for 1D
2598 texture arrays and 2D textures. address.w is always ignored.
2599
2600 A swizzle suffix may be added to the resource argument this will
2601 cause the resource data to be swizzled accordingly.
2602
2603.. opcode:: STORE - Write data to a shader resource
2604
2605 Syntax: ``STORE resource, address, src``
2606
2607 Example: ``STORE BUFFER[0], TEMP[0], TEMP[1]``
2608
2609 Using the provided integer address, STORE writes data to the
2610 specified buffer or texture.
2611
2612 The 'address' is specified as a vector of unsigned integers. If the
2613 'address' is out of range the result is unspecified.
2614
2615 Only the first mipmap level of a resource can be written to using
2616 this instruction.
2617
2618 For 1D or 2D texture arrays, the array index is provided as an
2619 unsigned integer in address.y or address.z, respectively.
2620 address.yz are ignored for buffers and 1D textures. address.z is
2621 ignored for 1D texture arrays and 2D textures. address.w is always
2622 ignored.
2623
2624.. opcode:: RESQ - Query information about a resource
2625
2626 Syntax: ``RESQ dst, resource``
2627
2628 Example: ``RESQ TEMP[0], BUFFER[0]``
2629
2630 Returns information about the buffer or image resource. For buffer
2631 resources, the size (in bytes) is returned in the x component. For
2632 image resources, .xyz will contain the width/height/layers of the
2633 image, while .w will contain the number of samples for multi-sampled
2634 images.
2635
2636.. opcode:: FBFETCH - Load data from framebuffer
2637
2638 Syntax: ``FBFETCH dst, output``
2639
2640 Example: ``FBFETCH TEMP[0], OUT[0]``
2641
2642 This is only valid on ``COLOR`` semantic outputs. Returns the color
2643 of the current position in the framebuffer from before this fragment
2644 shader invocation. May return the same value from multiple calls for
2645 a particular output within a single invocation. Note that result may
2646 be undefined if a fragment is drawn multiple times without a blend
2647 barrier in between.
2648
2649
2650.. _bindlessopcodes:
2651
2652Bindless Opcodes
2653^^^^^^^^^^^^^^^^
2654
2655These opcodes are for working with bindless sampler or image handles and
2656require PIPE_CAP_BINDLESS_TEXTURE.
2657
2658.. opcode:: IMG2HND - Get a bindless handle for a image
2659
2660 Syntax: ``IMG2HND dst, image``
2661
2662 Example: ``IMG2HND TEMP[0], IMAGE[0]``
2663
2664 Sets 'dst' to a bindless handle for 'image'.
2665
2666.. opcode:: SAMP2HND - Get a bindless handle for a sampler
2667
2668 Syntax: ``SAMP2HND dst, sampler``
2669
2670 Example: ``SAMP2HND TEMP[0], SAMP[0]``
2671
2672 Sets 'dst' to a bindless handle for 'sampler'.
2673
2674
2675.. _threadsyncopcodes:
2676
2677Inter-thread synchronization opcodes
2678^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2679
2680These opcodes are intended for communication between threads running
2681within the same compute grid. For now they're only valid in compute
2682programs.
2683
2684.. opcode:: BARRIER - Thread group barrier
2685
2686 ``BARRIER``
2687
2688 This opcode suspends the execution of the current thread until all
2689 the remaining threads in the working group reach the same point of
2690 the program. Results are unspecified if any of the remaining
2691 threads terminates or never reaches an executed BARRIER instruction.
2692
2693.. opcode:: MEMBAR - Memory barrier
2694
2695 ``MEMBAR type``
2696
2697 This opcode waits for the completion of all memory accesses based on
2698 the type passed in. The type is an immediate bitfield with the following
2699 meaning:
2700
2701 Bit 0: Shader storage buffers
2702 Bit 1: Atomic buffers
2703 Bit 2: Images
2704 Bit 3: Shared memory
2705 Bit 4: Thread group
2706
2707 These may be passed in in any combination. An implementation is free to not
2708 distinguish between these as it sees fit. However these map to all the
2709 possibilities made available by GLSL.
2710
2711.. _atomopcodes:
2712
2713Atomic opcodes
2714^^^^^^^^^^^^^^
2715
2716These opcodes provide atomic variants of some common arithmetic and
2717logical operations. In this context atomicity means that another
2718concurrent memory access operation that affects the same memory
2719location is guaranteed to be performed strictly before or after the
2720entire execution of the atomic operation. The resource may be a BUFFER,
2721IMAGE, HWATOMIC, or MEMORY. In the case of an image, the offset works
2722the same as for ``LOAD`` and ``STORE``, specified above. For atomic
2723counters, the offset is an immediate index to the base HW atomic
2724counter for this operation.
2725These atomic operations may only be used with 32-bit integer image formats.
2726
2727.. opcode:: ATOMUADD - Atomic integer addition
2728
2729 Syntax: ``ATOMUADD dst, resource, offset, src``
2730
2731 Example: ``ATOMUADD TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2732
2733 The following operation is performed atomically:
2734
2735 .. math::
2736
2737 dst_x = resource[offset]
2738
2739 resource[offset] = dst_x + src_x
2740
2741
2742.. opcode:: ATOMFADD - Atomic floating point addition
2743
2744 Syntax: ``ATOMFADD dst, resource, offset, src``
2745
2746 Example: ``ATOMFADD TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2747
2748 The following operation is performed atomically:
2749
2750 .. math::
2751
2752 dst_x = resource[offset]
2753
2754 resource[offset] = dst_x + src_x
2755
2756
2757.. opcode:: ATOMXCHG - Atomic exchange
2758
2759 Syntax: ``ATOMXCHG dst, resource, offset, src``
2760
2761 Example: ``ATOMXCHG TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2762
2763 The following operation is performed atomically:
2764
2765 .. math::
2766
2767 dst_x = resource[offset]
2768
2769 resource[offset] = src_x
2770
2771
2772.. opcode:: ATOMCAS - Atomic compare-and-exchange
2773
2774 Syntax: ``ATOMCAS dst, resource, offset, cmp, src``
2775
2776 Example: ``ATOMCAS TEMP[0], BUFFER[0], TEMP[1], TEMP[2], TEMP[3]``
2777
2778 The following operation is performed atomically:
2779
2780 .. math::
2781
2782 dst_x = resource[offset]
2783
2784 resource[offset] = (dst_x == cmp_x ? src_x : dst_x)
2785
2786
2787.. opcode:: ATOMAND - Atomic bitwise And
2788
2789 Syntax: ``ATOMAND dst, resource, offset, src``
2790
2791 Example: ``ATOMAND TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2792
2793 The following operation is performed atomically:
2794
2795 .. math::
2796
2797 dst_x = resource[offset]
2798
2799 resource[offset] = dst_x \& src_x
2800
2801
2802.. opcode:: ATOMOR - Atomic bitwise Or
2803
2804 Syntax: ``ATOMOR dst, resource, offset, src``
2805
2806 Example: ``ATOMOR TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2807
2808 The following operation is performed atomically:
2809
2810 .. math::
2811
2812 dst_x = resource[offset]
2813
2814 resource[offset] = dst_x | src_x
2815
2816
2817.. opcode:: ATOMXOR - Atomic bitwise Xor
2818
2819 Syntax: ``ATOMXOR dst, resource, offset, src``
2820
2821 Example: ``ATOMXOR TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2822
2823 The following operation is performed atomically:
2824
2825 .. math::
2826
2827 dst_x = resource[offset]
2828
2829 resource[offset] = dst_x \oplus src_x
2830
2831
2832.. opcode:: ATOMUMIN - Atomic unsigned minimum
2833
2834 Syntax: ``ATOMUMIN dst, resource, offset, src``
2835
2836 Example: ``ATOMUMIN TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2837
2838 The following operation is performed atomically:
2839
2840 .. math::
2841
2842 dst_x = resource[offset]
2843
2844 resource[offset] = (dst_x < src_x ? dst_x : src_x)
2845
2846
2847.. opcode:: ATOMUMAX - Atomic unsigned maximum
2848
2849 Syntax: ``ATOMUMAX dst, resource, offset, src``
2850
2851 Example: ``ATOMUMAX TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2852
2853 The following operation is performed atomically:
2854
2855 .. math::
2856
2857 dst_x = resource[offset]
2858
2859 resource[offset] = (dst_x > src_x ? dst_x : src_x)
2860
2861
2862.. opcode:: ATOMIMIN - Atomic signed minimum
2863
2864 Syntax: ``ATOMIMIN dst, resource, offset, src``
2865
2866 Example: ``ATOMIMIN TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2867
2868 The following operation is performed atomically:
2869
2870 .. math::
2871
2872 dst_x = resource[offset]
2873
2874 resource[offset] = (dst_x < src_x ? dst_x : src_x)
2875
2876
2877.. opcode:: ATOMIMAX - Atomic signed maximum
2878
2879 Syntax: ``ATOMIMAX dst, resource, offset, src``
2880
2881 Example: ``ATOMIMAX TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2882
2883 The following operation is performed atomically:
2884
2885 .. math::
2886
2887 dst_x = resource[offset]
2888
2889 resource[offset] = (dst_x > src_x ? dst_x : src_x)
2890
2891
2892.. opcode:: ATOMINC_WRAP - Atomic increment + wrap around
2893
2894 Syntax: ``ATOMINC_WRAP dst, resource, offset, src``
2895
2896 Example: ``ATOMINC_WRAP TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2897
2898 The following operation is performed atomically:
2899
2900 .. math::
2901
2902 dst_x = resource[offset] + 1
2903
2904 resource[offset] = dst_x <= src_x ? dst_x : 0
2905
2906
2907.. opcode:: ATOMDEC_WRAP - Atomic decrement + wrap around
2908
2909 Syntax: ``ATOMDEC_WRAP dst, resource, offset, src``
2910
2911 Example: ``ATOMDEC_WRAP TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2912
2913 The following operation is performed atomically:
2914
2915 .. math::
2916
2917 dst_x = resource[offset]
2918
2919 resource[offset] =
2920 \left\{
2921 \begin{array}{ c l }
2922 dst_x - 1 & \quad \textrm{if } dst_x \gt 0 \textrm{ and } dst_x \lt src_x \\
2923 0 & \quad \textrm{otherwise}
2924 \end{array}
2925 \right.
2926
2927.. _interlaneopcodes:
2928
2929Inter-lane opcodes
2930^^^^^^^^^^^^^^^^^^
2931
2932These opcodes reduce the given value across the shader invocations
2933running in the current SIMD group. Every thread in the subgroup will receive
2934the same result. The BALLOT operations accept a single-channel argument that
2935is treated as a boolean and produce a 64-bit value.
2936
2937.. opcode:: VOTE_ANY - Value is set in any of the active invocations
2938
2939 Syntax: ``VOTE_ANY dst, value``
2940
2941 Example: ``VOTE_ANY TEMP[0].x, TEMP[1].x``
2942
2943
2944.. opcode:: VOTE_ALL - Value is set in all of the active invocations
2945
2946 Syntax: ``VOTE_ALL dst, value``
2947
2948 Example: ``VOTE_ALL TEMP[0].x, TEMP[1].x``
2949
2950
2951.. opcode:: VOTE_EQ - Value is the same in all of the active invocations
2952
2953 Syntax: ``VOTE_EQ dst, value``
2954
2955 Example: ``VOTE_EQ TEMP[0].x, TEMP[1].x``
2956
2957
2958.. opcode:: BALLOT - Lanemask of whether the value is set in each active
2959 invocation
2960
2961 Syntax: ``BALLOT dst, value``
2962
2963 Example: ``BALLOT TEMP[0].xy, TEMP[1].x``
2964
2965 When the argument is a constant true, this produces a bitmask of active
2966 invocations. In fragment shaders, this can include helper invocations
2967 (invocations whose outputs and writes to memory are discarded, but which
2968 are used to compute derivatives).
2969
2970
2971.. opcode:: READ_FIRST - Broadcast the value from the first active
2972 invocation to all active lanes
2973
2974 Syntax: ``READ_FIRST dst, value``
2975
2976 Example: ``READ_FIRST TEMP[0], TEMP[1]``
2977
2978
2979.. opcode:: READ_INVOC - Retrieve the value from the given invocation
2980 (need not be uniform)
2981
2982 Syntax: ``READ_INVOC dst, value, invocation``
2983
2984 Example: ``READ_INVOC TEMP[0].xy, TEMP[1].xy, TEMP[2].x``
2985
2986 invocation.x controls the invocation number to read from for all channels.
2987 The invocation number must be the same across all active invocations in a
2988 sub-group; otherwise, the results are undefined.
2989
2990
2991Explanation of symbols used
2992------------------------------
2993
2994
2995Functions
2996^^^^^^^^^^^^^^
2997
2998
2999 :math:`|x|` Absolute value of ``x``.
3000
3001 :math:`\lceil x \rceil` Ceiling of ``x``.
3002
3003 clamp(x,y,z) Clamp x between y and z.
3004 (x < y) ? y : (x > z) ? z : x
3005
3006 :math:`\lfloor x\rfloor` Floor of ``x``.
3007
3008 :math:`\log_2{x}` Logarithm of ``x``, base 2.
3009
3010 max(x,y) Maximum of x and y.
3011 (x > y) ? x : y
3012
3013 min(x,y) Minimum of x and y.
3014 (x < y) ? x : y
3015
3016 partialx(x) Derivative of x relative to fragment's X.
3017
3018 partialy(x) Derivative of x relative to fragment's Y.
3019
3020 pop() Pop from stack.
3021
3022 :math:`x^y` ``x`` to the power ``y``.
3023
3024 push(x) Push x on stack.
3025
3026 round(x) Round x.
3027
3028 trunc(x) Truncate x, i.e. drop the fraction bits.
3029
3030
3031Keywords
3032^^^^^^^^^^^^^
3033
3034
3035 discard Discard fragment.
3036
3037 pc Program counter.
3038
3039 target Label of target instruction.
3040
3041
3042Other tokens
3043---------------
3044
3045
3046Declaration
3047^^^^^^^^^^^
3048
3049
3050Declares a register that is will be referenced as an operand in Instruction
3051tokens.
3052
3053File field contains register file that is being declared and is one
3054of TGSI_FILE.
3055
3056UsageMask field specifies which of the register components can be accessed
3057and is one of TGSI_WRITEMASK.
3058
3059The Local flag specifies that a given value isn't intended for
3060subroutine parameter passing and, as a result, the implementation
3061isn't required to give any guarantees of it being preserved across
3062subroutine boundaries. As it's merely a compiler hint, the
3063implementation is free to ignore it.
3064
3065If Dimension flag is set to 1, a Declaration Dimension token follows.
3066
3067If Semantic flag is set to 1, a Declaration Semantic token follows.
3068
3069If Interpolate flag is set to 1, a Declaration Interpolate token follows.
3070
3071If file is TGSI_FILE_RESOURCE, a Declaration Resource token follows.
3072
3073If Array flag is set to 1, a Declaration Array token follows.
3074
3075Array Declaration
3076^^^^^^^^^^^^^^^^^^^^^^^^
3077
3078Declarations can optional have an ArrayID attribute which can be referred by
3079indirect addressing operands. An ArrayID of zero is reserved and treated as
3080if no ArrayID is specified.
3081
3082If an indirect addressing operand refers to a specific declaration by using
3083an ArrayID only the registers in this declaration are guaranteed to be
3084accessed, accessing any register outside this declaration results in undefined
3085behavior. Note that for compatibility the effective index is zero-based and
3086not relative to the specified declaration
3087
3088If no ArrayID is specified with an indirect addressing operand the whole
3089register file might be accessed by this operand. This is strongly discouraged
3090and will prevent packing of scalar/vec2 arrays and effective alias analysis.
3091This is only legal for TEMP and CONST register files.
3092
3093Declaration Semantic
3094^^^^^^^^^^^^^^^^^^^^^^^^
3095
3096Vertex and fragment shader input and output registers may be labeled
3097with semantic information consisting of a name and index.
3098
3099Follows Declaration token if Semantic bit is set.
3100
3101Since its purpose is to link a shader with other stages of the pipeline,
3102it is valid to follow only those Declaration tokens that declare a register
3103either in INPUT or OUTPUT file.
3104
3105SemanticName field contains the semantic name of the register being declared.
3106There is no default value.
3107
3108SemanticIndex is an optional subscript that can be used to distinguish
3109different register declarations with the same semantic name. The default value
3110is 0.
3111
3112The meanings of the individual semantic names are explained in the following
3113sections.
3114
3115TGSI_SEMANTIC_POSITION
3116""""""""""""""""""""""
3117
3118For vertex shaders, TGSI_SEMANTIC_POSITION indicates the vertex shader
3119output register which contains the homogeneous vertex position in the clip
3120space coordinate system. After clipping, the X, Y and Z components of the
3121vertex will be divided by the W value to get normalized device coordinates.
3122
3123For fragment shaders, TGSI_SEMANTIC_POSITION is used to indicate that
3124fragment shader input (or system value, depending on which one is
3125supported by the driver) contains the fragment's window position. The X
3126component starts at zero and always increases from left to right.
3127The Y component starts at zero and always increases but Y=0 may either
3128indicate the top of the window or the bottom depending on the fragment
3129coordinate origin convention (see TGSI_PROPERTY_FS_COORD_ORIGIN).
3130The Z coordinate ranges from 0 to 1 to represent depth from the front
3131to the back of the Z buffer. The W component contains the interpolated
3132reciprocal of the vertex position W component (corresponding to gl_Fragcoord,
3133but unlike d3d10 which interpolates the same 1/w but then gives back
3134the reciprocal of the interpolated value).
3135
3136Fragment shaders may also declare an output register with
3137TGSI_SEMANTIC_POSITION. Only the Z component is writable. This allows
3138the fragment shader to change the fragment's Z position.
3139
3140
3141
3142TGSI_SEMANTIC_COLOR
3143"""""""""""""""""""
3144
3145For vertex shader outputs or fragment shader inputs/outputs, this
3146label indicates that the register contains an R,G,B,A color.
3147
3148Several shader inputs/outputs may contain colors so the semantic index
3149is used to distinguish them. For example, color[0] may be the diffuse
3150color while color[1] may be the specular color.
3151
3152This label is needed so that the flat/smooth shading can be applied
3153to the right interpolants during rasterization.
3154
3155
3156
3157TGSI_SEMANTIC_BCOLOR
3158""""""""""""""""""""
3159
3160Back-facing colors are only used for back-facing polygons, and are only valid
3161in vertex shader outputs. After rasterization, all polygons are front-facing
3162and COLOR and BCOLOR end up occupying the same slots in the fragment shader,
3163so all BCOLORs effectively become regular COLORs in the fragment shader.
3164
3165
3166TGSI_SEMANTIC_FOG
3167"""""""""""""""""
3168
3169Vertex shader inputs and outputs and fragment shader inputs may be
3170labeled with TGSI_SEMANTIC_FOG to indicate that the register contains
3171a fog coordinate. Typically, the fragment shader will use the fog coordinate
3172to compute a fog blend factor which is used to blend the normal fragment color
3173with a constant fog color. But fog coord really is just an ordinary vec4
3174register like regular semantics.
3175
3176
3177TGSI_SEMANTIC_PSIZE
3178"""""""""""""""""""
3179
3180Vertex shader input and output registers may be labeled with
3181TGIS_SEMANTIC_PSIZE to indicate that the register contains a point size
3182in the form (S, 0, 0, 1). The point size controls the width or diameter
3183of points for rasterization. This label cannot be used in fragment
3184shaders.
3185
3186When using this semantic, be sure to set the appropriate state in the
3187:ref:`rasterizer` first.
3188
3189
3190TGSI_SEMANTIC_TEXCOORD
3191""""""""""""""""""""""
3192
3193Only available if PIPE_CAP_TGSI_TEXCOORD is exposed !
3194
3195Vertex shader outputs and fragment shader inputs may be labeled with
3196this semantic to make them replaceable by sprite coordinates via the
3197sprite_coord_enable state in the :ref:`rasterizer`.
3198The semantic index permitted with this semantic is limited to <= 7.
3199
3200If the driver does not support TEXCOORD, sprite coordinate replacement
3201applies to inputs with the GENERIC semantic instead.
3202
3203The intended use case for this semantic is gl_TexCoord.
3204
3205
3206TGSI_SEMANTIC_PCOORD
3207""""""""""""""""""""
3208
3209Only available if PIPE_CAP_TGSI_TEXCOORD is exposed !
3210
3211Fragment shader inputs may be labeled with TGSI_SEMANTIC_PCOORD to indicate
3212that the register contains sprite coordinates in the form (x, y, 0, 1), if
3213the current primitive is a point and point sprites are enabled. Otherwise,
3214the contents of the register are undefined.
3215
3216The intended use case for this semantic is gl_PointCoord.
3217
3218
3219TGSI_SEMANTIC_GENERIC
3220"""""""""""""""""""""
3221
3222All vertex/fragment shader inputs/outputs not labeled with any other
3223semantic label can be considered to be generic attributes. Typical
3224uses of generic inputs/outputs are texcoords and user-defined values.
3225
3226
3227TGSI_SEMANTIC_NORMAL
3228""""""""""""""""""""
3229
3230Indicates that a vertex shader input is a normal vector. This is
3231typically only used for legacy graphics APIs.
3232
3233
3234TGSI_SEMANTIC_FACE
3235""""""""""""""""""
3236
3237This label applies to fragment shader inputs (or system values,
3238depending on which one is supported by the driver) and indicates that
3239the register contains front/back-face information.
3240
3241If it is an input, it will be a floating-point vector in the form (F, 0, 0, 1),
3242where F will be positive when the fragment belongs to a front-facing polygon,
3243and negative when the fragment belongs to a back-facing polygon.
3244
3245If it is a system value, it will be an integer vector in the form (F, 0, 0, 1),
3246where F is ``0xffffffff`` when the fragment belongs to a front-facing polygon
3247and ``0`` when the fragment belongs to a back-facing polygon.
3248
3249
3250TGSI_SEMANTIC_EDGEFLAG
3251""""""""""""""""""""""
3252
3253For vertex shaders, this semantic label indicates that an input or
3254output is a boolean edge flag. The register layout is [F, x, x, x]
3255where F is 0.0 or 1.0 and x = don't care. Normally, the vertex shader
3256simply copies the edge flag input to the edgeflag output.
3257
3258Edge flags are used to control which lines or points are actually
3259drawn when the polygon mode converts triangles/quads/polygons into
3260points or lines.
3261
3262
3263TGSI_SEMANTIC_STENCIL
3264"""""""""""""""""""""
3265
3266For fragment shaders, this semantic label indicates that an output
3267is a writable stencil reference value. Only the Y component is writable.
3268This allows the fragment shader to change the fragments stencilref value.
3269
3270
3271TGSI_SEMANTIC_VIEWPORT_INDEX
3272""""""""""""""""""""""""""""
3273
3274For geometry shaders, this semantic label indicates that an output
3275contains the index of the viewport (and scissor) to use.
3276This is an integer value, and only the X component is used.
3277
3278If PIPE_CAP_VS_LAYER_VIEWPORT or PIPE_CAP_TES_LAYER_VIEWPORT is
3279supported, then this semantic label can also be used in vertex or
3280tessellation evaluation shaders, respectively. Only the value written in the
3281last vertex processing stage is used.
3282
3283
3284TGSI_SEMANTIC_LAYER
3285"""""""""""""""""""
3286
3287For geometry shaders, this semantic label indicates that an output
3288contains the layer value to use for the color and depth/stencil surfaces.
3289This is an integer value, and only the X component is used.
3290(Also known as rendertarget array index.)
3291
3292If PIPE_CAP_VS_LAYER_VIEWPORT or PIPE_CAP_TES_LAYER_VIEWPORT is
3293supported, then this semantic label can also be used in vertex or
3294tessellation evaluation shaders, respectively. Only the value written in the
3295last vertex processing stage is used.
3296
3297
3298TGSI_SEMANTIC_CLIPDIST
3299""""""""""""""""""""""
3300
3301Note this covers clipping and culling distances.
3302
3303When components of vertex elements are identified this way, these
3304values are each assumed to be a float32 signed distance to a plane.
3305
3306For clip distances:
3307Primitive setup only invokes rasterization on pixels for which
3308the interpolated plane distances are >= 0.
3309
3310For cull distances:
3311Primitives will be completely discarded if the plane distance
3312for all of the vertices in the primitive are < 0.
3313If a vertex has a cull distance of NaN, that vertex counts as "out"
3314(as if its < 0);
3315
3316Multiple clip/cull planes can be implemented simultaneously, by
3317annotating multiple components of one or more vertex elements with
3318the above specified semantic.
3319The limits on both clip and cull distances are bound
3320by the PIPE_MAX_CLIP_OR_CULL_DISTANCE_COUNT define which defines
3321the maximum number of components that can be used to hold the
3322distances and by the PIPE_MAX_CLIP_OR_CULL_DISTANCE_ELEMENT_COUNT
3323which specifies the maximum number of registers which can be
3324annotated with those semantics.
3325The properties NUM_CLIPDIST_ENABLED and NUM_CULLDIST_ENABLED
3326are used to divide up the 2 x vec4 space between clipping and culling.
3327
3328TGSI_SEMANTIC_SAMPLEID
3329""""""""""""""""""""""
3330
3331For fragment shaders, this semantic label indicates that a system value
3332contains the current sample id (i.e. gl_SampleID) as an unsigned int.
3333Only the X component is used. If per-sample shading is not enabled,
3334the result is (0, undef, undef, undef).
3335
3336Note that if the fragment shader uses this system value, the fragment
3337shader is automatically executed at per sample frequency.
3338
3339TGSI_SEMANTIC_SAMPLEPOS
3340"""""""""""""""""""""""
3341
3342For fragment shaders, this semantic label indicates that a system
3343value contains the current sample's position as float4(x, y, undef, undef)
3344in the render target (i.e. gl_SamplePosition) when per-fragment shading
3345is in effect. Position values are in the range [0, 1] where 0.5 is
3346the center of the fragment.
3347
3348Note that if the fragment shader uses this system value, the fragment
3349shader is automatically executed at per sample frequency.
3350
3351TGSI_SEMANTIC_SAMPLEMASK
3352""""""""""""""""""""""""
3353
3354For fragment shaders, this semantic label can be applied to either a
3355shader system value input or output.
3356
3357For a system value, the sample mask indicates the set of samples covered by
3358the current primitive. If MSAA is not enabled, the value is (1, 0, 0, 0).
3359
3360For an output, the sample mask is used to disable further sample processing.
3361
3362For both, the register type is uint[4] but only the X component is used
3363(i.e. gl_SampleMask[0]). Each bit corresponds to one sample position (up
3364to 32x MSAA is supported).
3365
3366TGSI_SEMANTIC_INVOCATIONID
3367""""""""""""""""""""""""""
3368
3369For geometry shaders, this semantic label indicates that a system value
3370contains the current invocation id (i.e. gl_InvocationID).
3371This is an integer value, and only the X component is used.
3372
3373TGSI_SEMANTIC_INSTANCEID
3374""""""""""""""""""""""""
3375
3376For vertex shaders, this semantic label indicates that a system value contains
3377the current instance id (i.e. gl_InstanceID). It does not include the base
3378instance. This is an integer value, and only the X component is used.
3379
3380TGSI_SEMANTIC_VERTEXID
3381""""""""""""""""""""""
3382
3383For vertex shaders, this semantic label indicates that a system value contains
3384the current vertex id (i.e. gl_VertexID). It does (unlike in d3d10) include the
3385base vertex. This is an integer value, and only the X component is used.
3386
3387TGSI_SEMANTIC_VERTEXID_NOBASE
3388"""""""""""""""""""""""""""""""
3389
3390For vertex shaders, this semantic label indicates that a system value contains
3391the current vertex id without including the base vertex (this corresponds to
3392d3d10 vertex id, so TGSI_SEMANTIC_VERTEXID_NOBASE + TGSI_SEMANTIC_BASEVERTEX
3393== TGSI_SEMANTIC_VERTEXID). This is an integer value, and only the X component
3394is used.
3395
3396TGSI_SEMANTIC_BASEVERTEX
3397""""""""""""""""""""""""
3398
3399For vertex shaders, this semantic label indicates that a system value contains
3400the base vertex (i.e. gl_BaseVertex). Note that for non-indexed draw calls,
3401this contains the first (or start) value instead.
3402This is an integer value, and only the X component is used.
3403
3404TGSI_SEMANTIC_PRIMID
3405""""""""""""""""""""
3406
3407For geometry and fragment shaders, this semantic label indicates the value
3408contains the primitive id (i.e. gl_PrimitiveID). This is an integer value,
3409and only the X component is used.
3410FIXME: This right now can be either a ordinary input or a system value...
3411
3412
3413TGSI_SEMANTIC_PATCH
3414"""""""""""""""""""
3415
3416For tessellation evaluation/control shaders, this semantic label indicates a
3417generic per-patch attribute. Such semantics will not implicitly be per-vertex
3418arrays.
3419
3420TGSI_SEMANTIC_TESSCOORD
3421"""""""""""""""""""""""
3422
3423For tessellation evaluation shaders, this semantic label indicates the
3424coordinates of the vertex being processed. This is available in XYZ; W is
3425undefined.
3426
3427TGSI_SEMANTIC_TESSOUTER
3428"""""""""""""""""""""""
3429
3430For tessellation evaluation/control shaders, this semantic label indicates the
3431outer tessellation levels of the patch. Isoline tessellation will only have XY
3432defined, triangle will have XYZ and quads will have XYZW defined. This
3433corresponds to gl_TessLevelOuter.
3434
3435TGSI_SEMANTIC_TESSINNER
3436"""""""""""""""""""""""
3437
3438For tessellation evaluation/control shaders, this semantic label indicates the
3439inner tessellation levels of the patch. The X value is only defined for
3440triangle tessellation, while quads will have XY defined. This is entirely
3441undefined for isoline tessellation.
3442
3443TGSI_SEMANTIC_VERTICESIN
3444""""""""""""""""""""""""
3445
3446For tessellation evaluation/control shaders, this semantic label indicates the
3447number of vertices provided in the input patch. Only the X value is defined.
3448
3449TGSI_SEMANTIC_HELPER_INVOCATION
3450"""""""""""""""""""""""""""""""
3451
3452For fragment shaders, this semantic indicates whether the current
3453invocation is covered or not. Helper invocations are created in order
3454to properly compute derivatives, however it may be desirable to skip
3455some of the logic in those cases. See ``gl_HelperInvocation`` documentation.
3456
3457TGSI_SEMANTIC_BASEINSTANCE
3458""""""""""""""""""""""""""
3459
3460For vertex shaders, the base instance argument supplied for this
3461draw. This is an integer value, and only the X component is used.
3462
3463TGSI_SEMANTIC_DRAWID
3464""""""""""""""""""""
3465
3466For vertex shaders, the zero-based index of the current draw in a
3467``glMultiDraw*`` invocation. This is an integer value, and only the X
3468component is used.
3469
3470
3471TGSI_SEMANTIC_WORK_DIM
3472""""""""""""""""""""""
3473
3474For compute shaders started via OpenCL this retrieves the work_dim
3475parameter to the clEnqueueNDRangeKernel call with which the shader
3476was started.
3477
3478
3479TGSI_SEMANTIC_GRID_SIZE
3480"""""""""""""""""""""""
3481
3482For compute shaders, this semantic indicates the maximum (x, y, z) dimensions
3483of a grid of thread blocks.
3484
3485
3486TGSI_SEMANTIC_BLOCK_ID
3487""""""""""""""""""""""
3488
3489For compute shaders, this semantic indicates the (x, y, z) coordinates of the
3490current block inside of the grid.
3491
3492
3493TGSI_SEMANTIC_BLOCK_SIZE
3494""""""""""""""""""""""""
3495
3496For compute shaders, this semantic indicates the maximum (x, y, z) dimensions
3497of a block in threads.
3498
3499
3500TGSI_SEMANTIC_THREAD_ID
3501"""""""""""""""""""""""
3502
3503For compute shaders, this semantic indicates the (x, y, z) coordinates of the
3504current thread inside of the block.
3505
3506
3507TGSI_SEMANTIC_SUBGROUP_SIZE
3508"""""""""""""""""""""""""""
3509
3510This semantic indicates the subgroup size for the current invocation. This is
3511an integer of at most 64, as it indicates the width of lanemasks. It does not
3512depend on the number of invocations that are active.
3513
3514
3515TGSI_SEMANTIC_SUBGROUP_INVOCATION
3516"""""""""""""""""""""""""""""""""
3517
3518The index of the current invocation within its subgroup.
3519
3520
3521TGSI_SEMANTIC_SUBGROUP_EQ_MASK
3522""""""""""""""""""""""""""""""
3523
3524A bit mask of ``bit index == TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e.
3525``1 << subgroup_invocation`` in arbitrary precision arithmetic.
3526
3527
3528TGSI_SEMANTIC_SUBGROUP_GE_MASK
3529""""""""""""""""""""""""""""""
3530
3531A bit mask of ``bit index >= TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e.
3532``((1 << (subgroup_size - subgroup_invocation)) - 1) << subgroup_invocation``
3533in arbitrary precision arithmetic.
3534
3535
3536TGSI_SEMANTIC_SUBGROUP_GT_MASK
3537""""""""""""""""""""""""""""""
3538
3539A bit mask of ``bit index > TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e.
3540``((1 << (subgroup_size - subgroup_invocation - 1)) - 1) << (subgroup_invocation + 1)``
3541in arbitrary precision arithmetic.
3542
3543
3544TGSI_SEMANTIC_SUBGROUP_LE_MASK
3545""""""""""""""""""""""""""""""
3546
3547A bit mask of ``bit index <= TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e.
3548``(1 << (subgroup_invocation + 1)) - 1`` in arbitrary precision arithmetic.
3549
3550
3551TGSI_SEMANTIC_SUBGROUP_LT_MASK
3552""""""""""""""""""""""""""""""
3553
3554A bit mask of ``bit index < TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e.
3555``(1 << subgroup_invocation) - 1`` in arbitrary precision arithmetic.
3556
3557
3558TGSI_SEMANTIC_VIEWPORT_MASK
3559"""""""""""""""""""""""""""
3560
3561A bit mask of viewports to broadcast the current primitive to. See
3562:ext:`GL_NV_viewport_array2` for more details.
3563
3564
3565TGSI_SEMANTIC_TESS_DEFAULT_OUTER_LEVEL
3566""""""""""""""""""""""""""""""""""""""
3567
3568A system value equal to the default_outer_level array set via set_tess_level.
3569
3570
3571TGSI_SEMANTIC_TESS_DEFAULT_INNER_LEVEL
3572""""""""""""""""""""""""""""""""""""""
3573
3574A system value equal to the default_inner_level array set via set_tess_level.
3575
3576
3577Declaration Interpolate
3578^^^^^^^^^^^^^^^^^^^^^^^
3579
3580This token is only valid for fragment shader INPUT declarations.
3581
3582The Interpolate field specifies the way input is being interpolated by
3583the rasterizer and is one of TGSI_INTERPOLATE_*.
3584
3585The Location field specifies the location inside the pixel that the
3586interpolation should be done at, one of ``TGSI_INTERPOLATE_LOC_*``. Note that
3587when per-sample shading is enabled, the implementation may choose to
3588interpolate at the sample irrespective of the Location field.
3589
3590
3591Declaration Sampler View
3592^^^^^^^^^^^^^^^^^^^^^^^^
3593
3594Follows Declaration token if file is TGSI_FILE_SAMPLER_VIEW.
3595
3596DCL SVIEW[#], resource, type(s)
3597
3598Declares a shader input sampler view and assigns it to a SVIEW[#]
3599register.
3600
3601resource can be one of BUFFER, 1D, 2D, 3D, 1D_ARRAY and 2D_ARRAY.
3602
3603type must be 1 or 4 entries (if specifying on a per-component
3604level) out of UNORM, SNORM, SINT, UINT and FLOAT.
3605
3606For TEX\* style texture sample opcodes (as opposed to SAMPLE\* opcodes
3607which take an explicit SVIEW[#] source register), there may be optionally
3608SVIEW[#] declarations. In this case, the SVIEW index is implied by the
3609SAMP index, and there must be a corresponding SVIEW[#] declaration for
3610each SAMP[#] declaration. Drivers are free to ignore this if they wish.
3611But note in particular that some drivers need to know the sampler type
3612(float/int/unsigned) in order to generate the correct code, so cases
3613where integer textures are sampled, SVIEW[#] declarations should be
3614used.
3615
3616NOTE: It is NOT legal to mix SAMPLE\* style opcodes and TEX\* opcodes
3617in the same shader.
3618
3619Declaration Resource
3620^^^^^^^^^^^^^^^^^^^^
3621
3622Follows Declaration token if file is TGSI_FILE_RESOURCE.
3623
3624DCL RES[#], resource [, WR] [, RAW]
3625
3626Declares a shader input resource and assigns it to a RES[#]
3627register.
3628
3629resource can be one of BUFFER, 1D, 2D, 3D, CUBE, 1D_ARRAY and
36302D_ARRAY.
3631
3632If the RAW keyword is not specified, the texture data will be
3633subject to conversion, swizzling and scaling as required to yield
3634the specified data type from the physical data format of the bound
3635resource.
3636
3637If the RAW keyword is specified, no channel conversion will be
3638performed: the values read for each of the channels (X,Y,Z,W) will
3639correspond to consecutive words in the same order and format
3640they're found in memory. No element-to-address conversion will be
3641performed either: the value of the provided X coordinate will be
3642interpreted in byte units instead of texel units. The result of
3643accessing a misaligned address is undefined.
3644
3645Usage of the STORE opcode is only allowed if the WR (writable) flag
3646is set.
3647
3648Hardware Atomic Register File
3649^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
3650
3651Hardware atomics are declared as a 2D array with an optional array id.
3652
3653The first member of the dimension is the buffer resource the atomic
3654is located in.
3655The second member is a range into the buffer resource, either for
3656one or multiple counters. If this is an array, the declaration will have
3657an unique array id.
3658
3659Each counter is 4 bytes in size, and index and ranges are in counters not bytes.
3660DCL HWATOMIC[0][0]
3661DCL HWATOMIC[0][1]
3662
3663This declares two atomics, one at the start of the buffer and one in the
3664second 4 bytes.
3665
3666DCL HWATOMIC[0][0]
3667DCL HWATOMIC[1][0]
3668DCL HWATOMIC[1][1..3], ARRAY(1)
3669
3670This declares 5 atomics, one in buffer 0 at 0,
3671one in buffer 1 at 0, and an array of 3 atomics in
3672the buffer 1, starting at 1.
3673
3674Properties
3675^^^^^^^^^^^^^^^^^^^^^^^^
3676
3677Properties are general directives that apply to the whole TGSI program.
3678
3679FS_COORD_ORIGIN
3680"""""""""""""""
3681
3682Specifies the fragment shader TGSI_SEMANTIC_POSITION coordinate origin.
3683The default value is UPPER_LEFT.
3684
3685If UPPER_LEFT, the position will be (0,0) at the upper left corner and
3686increase downward and rightward.
3687If LOWER_LEFT, the position will be (0,0) at the lower left corner and
3688increase upward and rightward.
3689
3690OpenGL defaults to LOWER_LEFT, and is configurable with the
3691:ext:`GL_ARB_fragment_coord_conventions` extension.
3692
3693DirectX 9/10 use UPPER_LEFT.
3694
3695FS_COORD_PIXEL_CENTER
3696"""""""""""""""""""""
3697
3698Specifies the fragment shader TGSI_SEMANTIC_POSITION pixel center convention.
3699The default value is HALF_INTEGER.
3700
3701If HALF_INTEGER, the fractional part of the position will be 0.5
3702If INTEGER, the fractional part of the position will be 0.0
3703
3704Note that this does not affect the set of fragments generated by
3705rasterization, which is instead controlled by half_pixel_center in the
3706rasterizer.
3707
3708OpenGL defaults to HALF_INTEGER, and is configurable with the
3709:ext:`GL_ARB_fragment_coord_conventions` extension.
3710
3711DirectX 9 uses INTEGER.
3712DirectX 10 uses HALF_INTEGER.
3713
3714FS_COLOR0_WRITES_ALL_CBUFS
3715""""""""""""""""""""""""""
3716Specifies that writes to the fragment shader color 0 are replicated to all
3717bound cbufs. This facilitates OpenGL's fragColor output vs fragData[0] where
3718fragData is directed to a single color buffer, but fragColor is broadcast.
3719
3720VS_PROHIBIT_UCPS
3721""""""""""""""""""""""""""
3722If this property is set on the program bound to the shader stage before the
3723fragment shader, user clip planes should have no effect (be disabled) even if
3724that shader does not write to any clip distance outputs and the rasterizer's
3725clip_plane_enable is non-zero.
3726This property is only supported by drivers that also support shader clip
3727distance outputs.
3728This is useful for APIs that don't have UCPs and where clip distances written
3729by a shader cannot be disabled.
3730
3731GS_INVOCATIONS
3732""""""""""""""
3733
3734Specifies the number of times a geometry shader should be executed for each
3735input primitive. Each invocation will have a different
3736TGSI_SEMANTIC_INVOCATIONID system value set. If not specified, assumed to
3737be 1.
3738
3739VS_WINDOW_SPACE_POSITION
3740""""""""""""""""""""""""""
3741If this property is set on the vertex shader, the TGSI_SEMANTIC_POSITION output
3742is assumed to contain window space coordinates.
3743Division of X,Y,Z by W and the viewport transformation are disabled, and 1/W is
3744directly taken from the 4-th component of the shader output.
3745Naturally, clipping is not performed on window coordinates either.
3746The effect of this property is undefined if a geometry or tessellation shader
3747are in use.
3748
3749TCS_VERTICES_OUT
3750""""""""""""""""
3751
3752The number of vertices written by the tessellation control shader. This
3753effectively defines the patch input size of the tessellation evaluation shader
3754as well.
3755
3756TES_PRIM_MODE
3757"""""""""""""
3758
3759This sets the tessellation primitive mode, one of ``MESA_PRIM_TRIANGLES``,
3760``MESA_PRIM_QUADS``, or ``MESA_PRIM_LINES``. (Unlike in GL, there is no
3761separate isolines settings, the regular lines is assumed to mean isolines.)
3762
3763TES_SPACING
3764"""""""""""
3765
3766This sets the spacing mode of the tessellation generator, one of
3767``PIPE_TESS_SPACING_*``.
3768
3769TES_VERTEX_ORDER_CW
3770"""""""""""""""""""
3771
3772This sets the vertex order to be clockwise if the value is 1, or
3773counter-clockwise if set to 0.
3774
3775TES_POINT_MODE
3776""""""""""""""
3777
3778If set to a non-zero value, this turns on point mode for the tessellator,
3779which means that points will be generated instead of primitives.
3780
3781NUM_CLIPDIST_ENABLED
3782""""""""""""""""""""
3783
3784How many clip distance scalar outputs are enabled.
3785
3786NUM_CULLDIST_ENABLED
3787""""""""""""""""""""
3788
3789How many cull distance scalar outputs are enabled.
3790
3791FS_EARLY_DEPTH_STENCIL
3792""""""""""""""""""""""
3793
3794Whether depth test, stencil test, and occlusion query should run before
3795the fragment shader (regardless of fragment shader side effects). Corresponds
3796to GLSL early_fragment_tests.
3797
3798NEXT_SHADER
3799"""""""""""
3800
3801Which shader stage will MOST LIKELY follow after this shader when the shader
3802is bound. This is only a hint to the driver and doesn't have to be precise.
3803Only set for VS and TES.
3804
3805CS_FIXED_BLOCK_WIDTH / HEIGHT / DEPTH
3806"""""""""""""""""""""""""""""""""""""
3807
3808Threads per block in each dimension, if known at compile time. If the block size
3809is known all three should be at least 1. If it is unknown they should all be set
3810to 0 or not set.
3811
3812LEGACY_MATH_RULES
3813"""""""""""""""""
3814
3815The MUL TGSI operation (FP32 multiplication) will return 0 if either
3816of the operands are equal to 0. That means that 0 * Inf = 0. This
3817should be set the same way for an entire pipeline. Note that this
3818applies not only to the literal MUL TGSI opcode, but all FP32
3819multiplications implied by other operations, such as MAD, FMA, DP2,
3820DP3, DP4, DST, LOG, LRP, and possibly others. If there is a
3821mismatch between shaders, then it is unspecified whether this behavior
3822will be enabled.
3823
3824FS_POST_DEPTH_COVERAGE
3825""""""""""""""""""""""
3826
3827When enabled, the input for TGSI_SEMANTIC_SAMPLEMASK will exclude samples
3828that have failed the depth/stencil tests. This is only valid when
3829FS_EARLY_DEPTH_STENCIL is also specified.
3830
3831LAYER_VIEWPORT_RELATIVE
3832"""""""""""""""""""""""
3833
3834When enabled, the TGSI_SEMATNIC_LAYER output value is relative to the
3835current viewport. This is especially useful in conjunction with
3836TGSI_SEMANTIC_VIEWPORT_MASK.
3837
3838
3839Texture Sampling and Texture Formats
3840------------------------------------
3841
3842This table shows how texture image components are returned as (x,y,z,w) tuples
3843by TGSI texture instructions, such as :opcode:`TEX`, :opcode:`TXD`, and
3844:opcode:`TXP`. For reference, OpenGL and Direct3D conventions are shown as
3845well.
3846
3847+--------------------+--------------+--------------------+--------------+
3848| Texture Components | Gallium | OpenGL | Direct3D 9 |
3849+====================+==============+====================+==============+
3850| R | (r, 0, 0, 1) | (r, 0, 0, 1) | (r, 1, 1, 1) |
3851+--------------------+--------------+--------------------+--------------+
3852| RG | (r, g, 0, 1) | (r, g, 0, 1) | (r, g, 1, 1) |
3853+--------------------+--------------+--------------------+--------------+
3854| RGB | (r, g, b, 1) | (r, g, b, 1) | (r, g, b, 1) |
3855+--------------------+--------------+--------------------+--------------+
3856| RGBA | (r, g, b, a) | (r, g, b, a) | (r, g, b, a) |
3857+--------------------+--------------+--------------------+--------------+
3858| A | (0, 0, 0, a) | (0, 0, 0, a) | (0, 0, 0, a) |
3859+--------------------+--------------+--------------------+--------------+
3860| L | (l, l, l, 1) | (l, l, l, 1) | (l, l, l, 1) |
3861+--------------------+--------------+--------------------+--------------+
3862| LA | (l, l, l, a) | (l, l, l, a) | (l, l, l, a) |
3863+--------------------+--------------+--------------------+--------------+
3864| I | (i, i, i, i) | (i, i, i, i) | N/A |
3865+--------------------+--------------+--------------------+--------------+
3866| UV | XXX TBD | (0, 0, 0, 1) | (u, v, 1, 1) |
3867| | | [#envmap-bumpmap]_ | |
3868+--------------------+--------------+--------------------+--------------+
3869| Z | (z, z, z, z) | (z, z, z, 1) | (0, z, 0, 1) |
3870| | | [#depth-tex-mode]_ | |
3871+--------------------+--------------+--------------------+--------------+
3872| S | (s, s, s, s) | unknown | unknown |
3873+--------------------+--------------+--------------------+--------------+
3874
3875.. [#envmap-bumpmap] https://registry.khronos.org/OpenGL/extensions/ATI/ATI_envmap_bumpmap.txt
3876.. [#depth-tex-mode] the default is (z, z, z, 1) but may also be (0, 0, 0, z)
3877 or (z, z, z, z) depending on the value of GL_DEPTH_TEXTURE_MODE.
注意: 瀏覽 TracBrowser 來幫助您使用儲存庫瀏覽器

© 2025 Oracle Support Privacy / Do Not Sell My Info Terms of Use Trademark Policy Automated Access Etiquette