2	<HTML>
3
4	<HEAD>
5	<TITLE>Berkeley TestFloat General Documentation</TITLE>
6	</HEAD>
7
8	<BODY>
9
10	<H1>Berkeley TestFloat Release 3e: General Documentation</H1>
11
12	<P>
13	John R. Hauser<BR>
14	2018 January 20<BR>
15	</P>
16
17
18	<H2>Contents</H2>
19
20	<BLOCKQUOTE>
21	<TABLE BORDER=0 CELLSPACING=0 CELLPADDING=0>
22	<COL WIDTH=25>
23	<COL WIDTH=*>
24	<TR><TD COLSPAN=2>1. Introduction</TD></TR>
25	<TR><TD COLSPAN=2>2. Limitations</TD></TR>
26	<TR><TD COLSPAN=2>3. Acknowledgments and License</TD></TR>
27	<TR><TD COLSPAN=2>4. What TestFloat Does</TD></TR>
28	<TR><TD COLSPAN=2>5. Executing TestFloat</TD></TR>
29	<TR><TD COLSPAN=2>6. Operations Tested by TestFloat</TD></TR>
30	<TR><TD></TD><TD>6.1. Conversion Operations</TD></TR>
31	<TR><TD></TD><TD>6.2. Basic Arithmetic Operations</TD></TR>
32	<TR><TD></TD><TD>6.3. Fused Multiply-Add Operations</TD></TR>
33	<TR><TD></TD><TD>6.4. Remainder Operations</TD></TR>
34	<TR><TD></TD><TD>6.5. Round-to-Integer Operations</TD></TR>
35	<TR><TD></TD><TD>6.6. Comparison Operations</TD></TR>
36	<TR><TD COLSPAN=2>7. Interpreting TestFloat Output</TD></TR>
37	<TR>
38	<TD COLSPAN=2>8. Variations Allowed by the IEEE Floating-Point Standard</TD>
39	</TR>
40	<TR><TD></TD><TD>8.1. Underflow</TD></TR>
41	<TR><TD></TD><TD>8.2. NaNs</TD></TR>
42	<TR><TD></TD><TD>8.3. Conversions to Integer</TD></TR>
43	<TR><TD COLSPAN=2>9. Contact Information</TD></TR>
44	</TABLE>
45	</BLOCKQUOTE>
46
47
48	<H2>1. Introduction</H2>
49
50	<P>
51	Berkeley TestFloat is a small collection of programs for testing that an
52	implementation of binary floating-point conforms to the IEEE Standard for
53	Floating-Point Arithmetic.
54	All operations required by the original 1985 version of the IEEE Floating-Point
55	Standard can be tested, except for conversions to and from decimal.
56	With the current release, the following binary formats can be tested:
57	<NOBR>16-bit</NOBR> half-precision, <NOBR>32-bit</NOBR> single-precision,
58	<NOBR>64-bit</NOBR> double-precision, <NOBR>80-bit</NOBR>
59	double-extended-precision, and/or <NOBR>128-bit</NOBR> quadruple-precision.
60	TestFloat cannot test decimal floating-point.
61	</P>
62
63	<P>
64	Included in the TestFloat package are the <CODE>testsoftfloat</CODE> and
65	<CODE>timesoftfloat</CODE> programs for testing the Berkeley SoftFloat software
66	implementation of floating-point and for measuring its speed.
67	Information about SoftFloat can be found at the SoftFloat Web page,
68	<A HREF="http://www.jhauser.us/arithmetic/SoftFloat.html"><NOBR><CODE>http://www.jhauser.us/arithmetic/SoftFloat.html</CODE></NOBR></A>.
69	The <CODE>testsoftfloat</CODE> and <CODE>timesoftfloat</CODE> programs are
70	expected to be of interest only to people compiling the SoftFloat sources.
71	</P>
72
73	<P>
74	This document explains how to use the TestFloat programs.
75	It does not attempt to define or explain much of the IEEE Floating-Point
76	Standard.
77	Details about the standard are available elsewhere.
78	</P>
79
80	<P>
81	The current version of TestFloat is <NOBR>Release 3e</NOBR>.
82	This version differs from earlier releases 3b through 3d in only minor ways.
83	Compared to the original <NOBR>Release 3</NOBR>:
84	<UL>
85	<LI>
86	<NOBR>Release 3b</NOBR> added the ability to test the <NOBR>16-bit</NOBR>
87	half-precision format.
88	<LI>
89	<NOBR>Release 3c</NOBR> added the ability to test a rarely used rounding mode,
90	<I>round to odd</I>, also known as <I>jamming</I>.
91	<LI>
92	<NOBR>Release 3d</NOBR> modified the code for testing C arithmetic to
93	potentially include testing newer library functions <CODE>sqrtf</CODE>,
94	<CODE>sqrtl</CODE>, <CODE>fmaf</CODE>, <CODE>fma</CODE>, and <CODE>fmal</CODE>.
95	</UL>
96	This release adds a few more small improvements, including modifying the
97	expected behavior of rounding mode <CODE>odd</CODE> and fixing a minor bug in
98	the all-in-one <CODE>testfloat</CODE> program.
99	</P>
100
101	<P>
102	Compared to Release 2c and earlier, the set of TestFloat programs, as well as
the programs’ arguments and behavior, changed somewhat with
104	<NOBR>Release 3</NOBR>.
105	For more about the evolution of TestFloat releases, see
106	<A HREF="TestFloat-history.html"><NOBR><CODE>TestFloat-history.html</CODE></NOBR></A>.
107	</P>
108
109
110	<H2>2. Limitations</H2>
111
112	<P>
113	TestFloat output is not always easily interpreted.
114	Detailed knowledge of the IEEE Floating-Point Standard and its vagaries is
115	needed to use TestFloat responsibly.
116	</P>
117
118	<P>
119	TestFloat performs relatively simple tests designed to check the fundamental
120	soundness of the floating-point under test.
121	TestFloat may also at times manage to find rarer and more subtle bugs, but it
122	will probably only find such bugs by chance.
123	Software that purposefully seeks out various kinds of subtle floating-point
124	bugs can be found through links posted on the TestFloat Web page,
125	<A HREF="http://www.jhauser.us/arithmetic/TestFloat.html"><NOBR><CODE>http://www.jhauser.us/arithmetic/TestFloat.html</CODE></NOBR></A>.
126	</P>
127
128
129	<H2>3. Acknowledgments and License</H2>
130
131	<P>
132	The TestFloat package was written by me, <NOBR>John R.</NOBR> Hauser.
133	<NOBR>Release 3</NOBR> of TestFloat was a completely new implementation
134	supplanting earlier releases.
135	The project to create <NOBR>Release 3</NOBR> (now <NOBR>through 3e</NOBR>) was
136	done in the employ of the University of California, Berkeley, within the
137	Department of Electrical Engineering and Computer Sciences, first for the
138	Parallel Computing Laboratory (Par Lab) and then for the ASPIRE Lab.
139	The work was officially overseen by Prof. Krste Asanovic, with funding provided
140	by these sources:
141	<BLOCKQUOTE>
142	<TABLE>
143	<COL>
144	<COL WIDTH=10>
145	<COL>
146	<TR>
147	<TD VALIGN=TOP><NOBR>Par Lab:</NOBR></TD>
148	<TD></TD>
149	<TD>
150	Microsoft (Award #024263), Intel (Award #024894), and U.C. Discovery
151	(Award #DIG07-10227), with additional support from Par Lab affiliates Nokia,
152	NVIDIA, Oracle, and Samsung.
153	</TD>
154	</TR>
155	<TR>
156	<TD VALIGN=TOP><NOBR>ASPIRE Lab:</NOBR></TD>
157	<TD></TD>
158	<TD>
159	DARPA PERFECT program (Award #HR0011-12-2-0016), with additional support from
160	ASPIRE industrial sponsor Intel and ASPIRE affiliates Google, Nokia, NVIDIA,
161	Oracle, and Samsung.
162	</TD>
163	</TR>
164	</TABLE>
165	</BLOCKQUOTE>
166	</P>
167
168	<P>
169	The following applies to the whole of TestFloat <NOBR>Release 3e</NOBR> as well
170	as to each source file individually.
171	</P>
172
173	<P>
174	Copyright 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018 The Regents of the
175	University of California.
176	All rights reserved.
177	</P>
178
179	<P>
180	Redistribution and use in source and binary forms, with or without
181	modification, are permitted provided that the following conditions are met:
182	<OL>
183
184	<LI>
185	<P>
186	Redistributions of source code must retain the above copyright notice, this
187	list of conditions, and the following disclaimer.
188	</P>
189
190	<LI>
191	<P>
192	Redistributions in binary form must reproduce the above copyright notice, this
193	list of conditions, and the following disclaimer in the documentation and/or
194	other materials provided with the distribution.
195	</P>
196
197	<LI>
198	<P>
199	Neither the name of the University nor the names of its contributors may be
200	used to endorse or promote products derived from this software without specific
201	prior written permission.
202	</P>
203
204	</OL>
205	</P>
206
207	<P>
208	THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS “AS IS”,
209	AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
210	IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE, ARE
211	DISCLAIMED.
212	IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
213	INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
214	BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
215	DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
216	LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
217	OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
218	ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
219	</P>
220
221
222	<H2>4. What TestFloat Does</H2>
223
224	<P>
225	TestFloat is designed to test a floating-point implementation by comparing its
226	behavior with that of TestFloat’s own internal floating-point implemented
227	in software.
228	For each operation to be tested, the TestFloat programs can generate a large
229	number of test cases, made up of simple pattern tests intermixed with weighted
230	random inputs.
231	The cases generated should be adequate for testing carry chain propagations,
232	and the rounding of addition, subtraction, multiplication, and simple
233	operations like conversions.
234	TestFloat makes a point of checking all boundary cases of the arithmetic,
235	including underflows, overflows, invalid operations, subnormal inputs, zeros
236	(positive and negative), infinities, and NaNs.
237	For the interesting operations like addition and multiplication, millions of
238	test cases may be checked.
239	</P>
240
241	<P>
242	TestFloat is not remarkably good at testing difficult rounding cases for
243	division and square root.
244	It also makes no attempt to find bugs specific to SRT division and the like
245	(such as the infamous Pentium division bug).
246	Software that tests for such failures can be found through links on the
247	TestFloat Web page,
248	<A HREF="http://www.jhauser.us/arithmetic/TestFloat.html"><NOBR><CODE>http://www.jhauser.us/arithmetic/TestFloat.html</CODE></NOBR></A>.
249	</P>
250
251	<P>
252	NOTE!<BR>
253	It is the responsibility of the user to verify that the discrepancies TestFloat
254	finds actually represent faults in the implementation being tested.
255	Advice to help with this task is provided later in this document.
256	Furthermore, even if TestFloat finds no fault with a floating-point
257	implementation, that in no way guarantees that the implementation is bug-free.
258	</P>
259
260	<P>
261	For each operation, TestFloat can test all five rounding modes defined by the
262	IEEE Floating-Point Standard, plus possibly a sixth mode, <I>round to odd</I>
263	(depending on the options selected when TestFloat was built).
264	TestFloat verifies not only that the numeric results of an operation are
265	correct, but also that the proper floating-point exception flags are raised.
266	All five exception flags are tested, including the <I>inexact</I> flag.
267	TestFloat does not attempt to verify that the floating-point exception flags
268	are actually implemented as sticky flags.
269	</P>
270
271	<P>
272	For the <NOBR>80-bit</NOBR> double-extended-precision format, TestFloat can
273	test the addition, subtraction, multiplication, division, and square root
274	operations at all three of the standard rounding precisions.
275	The rounding precision can be set to <NOBR>32 bits</NOBR>, equivalent to
276	single-precision, to <NOBR>64 bits</NOBR>, equivalent to double-precision, or
277	to the full <NOBR>80 bits</NOBR> of the double-extended-precision.
278	Rounding precision control can be applied only to the double-extended-precision
279	format and only for the five basic arithmetic operations: addition,
280	subtraction, multiplication, division, and square root.
281	Other operations can be tested only at full precision.
282	</P>
283
284	<P>
285	As a rule, TestFloat is not particular about the bit patterns of NaNs that
286	appear as operation results.
287	Any NaN is considered as good a result as another.
288	This laxness can be overridden so that TestFloat checks for particular bit
289	patterns within NaN results.
290	See <NOBR>section 8</NOBR> below, <I>Variations Allowed by the IEEE
291	Floating-Point Standard</I>, plus the <CODE>-checkNaNs</CODE> and
292	<CODE>-checkInvInts</CODE> options documented for programs
293	<CODE>testfloat_ver</CODE> and <CODE>testfloat</CODE>.
294	</P>
295
296	<P>
297	TestFloat normally compares an implementation of floating-point against the
298	Berkeley SoftFloat software implementation of floating-point, also created by
299	me.
300	The SoftFloat functions are linked into each TestFloat program’s
301	executable.
302	Information about SoftFloat can be found at the Web page
303	<A HREF="http://www.jhauser.us/arithmetic/SoftFloat.html"><NOBR><CODE>http://www.jhauser.us/arithmetic/SoftFloat.html</CODE></NOBR></A>.
304	</P>
305
306	<P>
307	For testing SoftFloat itself, the TestFloat package includes a
308	<CODE>testsoftfloat</CODE> program that compares SoftFloat’s
309	floating-point against <EM>another</EM> software floating-point implementation.
310	The second software floating-point is simpler and slower than SoftFloat, and is
311	completely independent of SoftFloat.
312	Although the second software floating-point cannot be guaranteed to be
313	bug-free, the chance that it would mimic any of SoftFloat’s bugs is low.
314	Consequently, an error in one or the other floating-point version should appear
315	as an unexpected difference between the two implementations.
316	Note that testing SoftFloat should be necessary only when compiling a new
317	TestFloat executable or when compiling SoftFloat for some other reason.
318	</P>
319
320
321	<H2>5. Executing TestFloat</H2>
322
323	<P>
324	The TestFloat package consists of five programs, all intended to be executed
325	from a command-line interpreter:
326	<BLOCKQUOTE>
327	<TABLE>
328	<TR>
329	<TD>
330	<A HREF="testfloat_gen.html"><CODE>testfloat_gen</CODE></A><CODE>   </CODE>
331	</TD>
332	<TD>
333	Generates test cases for a specific floating-point operation.
334	</TD>
335	</TR>
336	<TR>
337	<TD>
338	<A HREF="testfloat_ver.html"><CODE>testfloat_ver</CODE></A>
339	</TD>
340	<TD>
341	Verifies whether the results from executing a floating-point operation are as
342	expected.
343	</TD>
344	</TR>
345	<TR>
346	<TD>
347	<A HREF="testfloat.html"><CODE>testfloat</CODE></A>
348	</TD>
349	<TD>
350	An all-in-one program that generates test cases, executes floating-point
351	operations, and verifies whether the results match expectations.
352	</TD>
353	</TR>
354	<TR>
355	<TD>
356	<A HREF="testsoftfloat.html"><CODE>testsoftfloat</CODE></A><CODE>   </CODE>
357	</TD>
358	<TD>
359	Like <CODE>testfloat</CODE>, but for testing SoftFloat.
360	</TD>
361	</TR>
362	<TR>
363	<TD>
364	<A HREF="timesoftfloat.html"><CODE>timesoftfloat</CODE></A><CODE>   </CODE>
365	</TD>
366	<TD>
367	A program for measuring the speed of SoftFloat (included in the TestFloat
368	package for convenience).
369	</TD>
370	</TR>
371	</TABLE>
372	</BLOCKQUOTE>
373	Each program has its own page of documentation that can be opened through the
374	links in the table above.
375	</P>
376
377	<P>
378	To test a floating-point implementation other than SoftFloat, one of three
379	different methods can be used.
380	The first method pipes output from <CODE>testfloat_gen</CODE> to a program
381	that:
382	<NOBR>(a) reads</NOBR> the incoming test cases, <NOBR>(b) invokes</NOBR> the
383	floating-point operation being tested, and <NOBR>(c) writes</NOBR> the
384	operation results to output.
385	These results can then be piped to <CODE>testfloat_ver</CODE> to be checked for
386	correctness.
Assuming a vertical bar (<CODE>|</CODE>) indicates a pipe between programs, the
388	complete process could be written as a single command like so:
389	<BLOCKQUOTE>
390	<PRE>
testfloat_gen ... &lt;<I>type</I>&gt; | &lt;<I>program-that-invokes-op</I>&gt; | testfloat_ver ... &lt;<I>function</I>&gt;
392	</PRE>
393	</BLOCKQUOTE>
394	The program in the middle is not supplied by TestFloat but must be created
395	independently.
396	If for some reason this program cannot take command-line arguments, the
397	<CODE>-prefix</CODE> option of <CODE>testfloat_gen</CODE> can communicate
398	parameters through the pipe.
399	</P>
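<P>
As a rough illustration of the first method, the middle program could be
structured along the lines of the Python sketch below.
This is only a sketch under stated assumptions: it assumes each test case
arrives as one line of whitespace-separated hexadecimal operands, and that the
result line repeats those operands followed by the result and the exception
flags in hexadecimal.
The exact formats are defined in the documentation for
<CODE>testfloat_gen</CODE> and <CODE>testfloat_ver</CODE> and should be checked
there.
The names <CODE>process_case</CODE>, <CODE>run_filter</CODE>, and
<CODE>invoke_op</CODE> are hypothetical and are not part of TestFloat.
</P>

```python
import sys


def process_case(line, invoke_op, result_digits):
    """Turn one test-case line into one result line.

    Assumption: operands are whitespace-separated hexadecimal fields.
    invoke_op is a hypothetical callback that takes the operands as
    integers and returns (result_bits, exception_flags) obtained from
    the floating-point implementation under test.
    """
    fields = line.split()
    operands = [int(field, 16) for field in fields]
    result_bits, flags = invoke_op(*operands)
    # Echo the inputs unchanged, then append the result and the flags.
    result_text = format(result_bits, "0{}X".format(result_digits))
    return " ".join(fields + [result_text, format(flags, "02X")])


def run_filter(invoke_op, result_digits, source=sys.stdin, sink=sys.stdout):
    """(a) Read test cases, (b) invoke the operation, (c) write results."""
    for line in source:
        if line.strip():
            sink.write(process_case(line, invoke_op, result_digits) + "\n")
```

<P>
A real program would replace <CODE>invoke_op</CODE> with code that actually
executes the operation being tested and collects its exception flags.
</P>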
400
401	<P>
402	A second method for running TestFloat is similar but has
403	<CODE>testfloat_gen</CODE> supply not only the test inputs but also the
404	expected results for each case.
405	With this additional information, the job done by <CODE>testfloat_ver</CODE>
406	can be folded into the invoking program to give the following command:
407	<BLOCKQUOTE>
408	<PRE>
testfloat_gen ... &lt;<I>function</I>&gt; | &lt;<I>program-that-invokes-op-and-compares-results</I>&gt;
410	</PRE>
411	</BLOCKQUOTE>
412	Again, the program that actually invokes the floating-point operation is not
413	supplied by TestFloat but must be created independently.
414	Depending on circumstance, it may be preferable either to let
415	<CODE>testfloat_ver</CODE> check and report suspected errors (first method) or
416	to include this step in the invoking program (second method).
417	</P>
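<P>
For the second method, the comparison step moves into the invoking program.
The Python sketch below shows one possible shape for that check.
It assumes that the last two hexadecimal fields on each line are the expected
result and the expected exception flags, with all earlier fields being
operands; this layout is an assumption to be confirmed against the
<CODE>testfloat_gen</CODE> documentation.
The names <CODE>check_case</CODE> and <CODE>invoke_op</CODE> are hypothetical.
</P>

```python
def check_case(line, invoke_op):
    """Return None if the case passes, else (operands, observed, expected).

    Assumption: the last two hexadecimal fields are the expected result
    and expected exception flags; the earlier fields are the operands.
    invoke_op is a hypothetical callback returning (result_bits, flags)
    from the floating-point implementation under test.
    """
    fields = [int(field, 16) for field in line.split()]
    operands = tuple(fields[:-2])
    expected = (fields[-2], fields[-1])
    observed = tuple(invoke_op(*operands))
    if observed == expected:
        return None
    return (operands, observed, expected)
```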
418
419	<P>
420	The third way to use TestFloat is the all-in-one <CODE>testfloat</CODE>
421	program.
422	This program can perform all the steps of creating test cases, invoking the
423	floating-point operation, checking the results, and reporting suspected errors.
424	However, for this to be possible, <CODE>testfloat</CODE> must be compiled to
425	contain the method for invoking the floating-point operations to test.
426	Each build of <CODE>testfloat</CODE> is therefore capable of testing
427	<EM>only</EM> the floating-point implementation it was built to invoke.
428	To test a new implementation of floating-point, a new <CODE>testfloat</CODE>
429	must be created, linked to that specific implementation.
430	By comparison, the <CODE>testfloat_gen</CODE> and <CODE>testfloat_ver</CODE>
431	programs are entirely generic;
432	one instance is usable for testing any floating-point implementation, because
433	implementation-specific details are segregated in the custom program that
434	follows <CODE>testfloat_gen</CODE>.
435	</P>
436
437	<P>
438	Program <CODE>testsoftfloat</CODE> is another all-in-one program specifically
439	for testing SoftFloat.
440	</P>
441
442	<P>
443	Programs <CODE>testfloat_ver</CODE>, <CODE>testfloat</CODE>, and
444	<CODE>testsoftfloat</CODE> all report status and error information in a common
445	way.
446	As it executes, each of these programs writes status information to the
447	standard error output, which should be the screen by default.
448	In order for this status to be displayed properly, the standard error stream
449	should not be redirected to a file.
450	Any discrepancies that are found are written to the standard output stream,
451	which is easily redirected to a file if desired.
452	Unless redirected, reported errors will appear intermixed with the ongoing
453	status information in the output.
454	</P>
455
456
457	<H2>6. Operations Tested by TestFloat</H2>
458
459	<P>
460	TestFloat can test all operations required by the original 1985 IEEE
461	Floating-Point Standard except for conversions to and from decimal.
462	These operations are:
463	<UL>
464	<LI>
465	conversions among the supported floating-point formats, and also between
466	integers (<NOBR>32-bit</NOBR> and <NOBR>64-bit</NOBR>, signed and unsigned) and
467	any of the floating-point formats;
468	<LI>
469	for each floating-point format, the usual addition, subtraction,
470	multiplication, division, and square root operations;
471	<LI>
472	for each format, the floating-point remainder operation defined by the IEEE
473	Standard;
474	<LI>
475	for each format, a “round to integer” operation that rounds to the
476	nearest integer value in the same format; and
477	<LI>
478	comparisons between two values in the same floating-point format.
479	</UL>
480	In addition, TestFloat can also test
481	<UL>
482	<LI>
483	for each floating-point format except <NOBR>80-bit</NOBR>
484	double-extended-precision, the fused multiply-add operation defined by the 2008
485	IEEE Standard.
486	</UL>
487	</P>
488
489	<P>
490	More information about all these operations is given below.
491	In the operation names used by TestFloat, <NOBR>16-bit</NOBR> half-precision is
492	called <CODE>f16</CODE>, <NOBR>32-bit</NOBR> single-precision is
493	<CODE>f32</CODE>, <NOBR>64-bit</NOBR> double-precision is <CODE>f64</CODE>,
494	<NOBR>80-bit</NOBR> double-extended-precision is <CODE>extF80</CODE>, and
495	<NOBR>128-bit</NOBR> quadruple-precision is <CODE>f128</CODE>.
496	TestFloat generally uses the same names for operations as Berkeley SoftFloat,
497	except that TestFloat’s names never include the <CODE>M</CODE> that
498	SoftFloat uses to indicate that values are passed through pointers.
499	</P>
500
501	<H3>6.1. Conversion Operations</H3>
502
503	<P>
504	All conversions among the floating-point formats and all conversions between a
505	floating-point format and <NOBR>32-bit</NOBR> and <NOBR>64-bit</NOBR> integers
506	can be tested.
507	The conversion operations are:
508	<BLOCKQUOTE>
509	<PRE>
510	ui32_to_f16 ui64_to_f16 i32_to_f16 i64_to_f16
511	ui32_to_f32 ui64_to_f32 i32_to_f32 i64_to_f32
512	ui32_to_f64 ui64_to_f64 i32_to_f64 i64_to_f64
513	ui32_to_extF80 ui64_to_extF80 i32_to_extF80 i64_to_extF80
514	ui32_to_f128 ui64_to_f128 i32_to_f128 i64_to_f128
515
516	f16_to_ui32 f32_to_ui32 f64_to_ui32 extF80_to_ui32 f128_to_ui32
517	f16_to_ui64 f32_to_ui64 f64_to_ui64 extF80_to_ui64 f128_to_ui64
518	f16_to_i32 f32_to_i32 f64_to_i32 extF80_to_i32 f128_to_i32
519	f16_to_i64 f32_to_i64 f64_to_i64 extF80_to_i64 f128_to_i64
520
521	f16_to_f32 f32_to_f16 f64_to_f16 extF80_to_f16 f128_to_f16
522	f16_to_f64 f32_to_f64 f64_to_f32 extF80_to_f32 f128_to_f32
523	f16_to_extF80 f32_to_extF80 f64_to_extF80 extF80_to_f64 f128_to_f64
524	f16_to_f128 f32_to_f128 f64_to_f128 extF80_to_f128 f128_to_extF80
525	</PRE>
526	</BLOCKQUOTE>
527	Abbreviations <CODE>ui32</CODE> and <CODE>ui64</CODE> indicate
528	<NOBR>32-bit</NOBR> and <NOBR>64-bit</NOBR> unsigned integer types, while
529	<CODE>i32</CODE> and <CODE>i64</CODE> indicate their signed counterparts.
530	These conversions all round according to the current rounding mode as relevant.
531	Conversions from a smaller to a larger floating-point format are always exact
532	and so require no rounding.
533	Likewise, conversions from <NOBR>32-bit</NOBR> integers to <NOBR>64-bit</NOBR>
534	double-precision or to any larger floating-point format are also exact, as are
535	conversions from <NOBR>64-bit</NOBR> integers to <NOBR>80-bit</NOBR>
536	double-extended-precision and <NOBR>128-bit</NOBR> quadruple-precision.
537	</P>
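<P>
The exactness rules above follow from the significand precision of each
format.
The Python sketch below restates them as a small table lookup.
The precisions listed (11, 24, 53, 64, and 113 bits) are the standard
significand widths of the five formats; the function names are hypothetical
and serve only to illustrate the reasoning.
</P>

```python
# Significand precision in bits, counting the leading bit.
PRECISION = {"f16": 11, "f32": 24, "f64": 53, "extF80": 64, "f128": 113}

# Bits needed to hold the largest magnitude of each integer type.
# For signed types the sign is stored separately, and the most negative
# value is a power of two, so one bit fewer is enough.
INT_VALUE_BITS = {"ui32": 32, "i32": 31, "ui64": 64, "i64": 63}


def int_to_float_is_exact(int_type, float_type):
    """True if every value of int_type fits the format's significand."""
    return INT_VALUE_BITS[int_type] <= PRECISION[float_type]


def float_to_float_is_exact(src, dst):
    """True if dst is at least as wide as src.

    Comparing precision alone is enough for these five formats because
    their exponent ranges grow in the same order as their precisions.
    """
    return PRECISION[src] <= PRECISION[dst]
```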
538
539	<P>
540	For the all-in-one <CODE>testfloat</CODE> program, this list of conversion
541	operations requires amendment.
542	For <CODE>testfloat</CODE> only, conversions to an integer type have names that
543	explicitly specify the rounding mode and treatment of inexactness.
544	Thus, instead of
545	<BLOCKQUOTE>
546	<PRE>
&lt;<I>float</I>&gt;_to_&lt;<I>int</I>&gt;
548	</PRE>
549	</BLOCKQUOTE>
550	as listed above, operations converting to integer type have names of these
551	forms:
552	<BLOCKQUOTE>
553	<PRE>
&lt;<I>float</I>&gt;_to_&lt;<I>int</I>&gt;_r_&lt;<I>round</I>&gt;
&lt;<I>float</I>&gt;_to_&lt;<I>int</I>&gt;_rx_&lt;<I>round</I>&gt;
556	</PRE>
557	</BLOCKQUOTE>
The <CODE>&lt;<I>round</I>&gt;</CODE> component is one of
559	‘<CODE>near_even</CODE>’, ‘<CODE>near_maxMag</CODE>’,
560	‘<CODE>minMag</CODE>’, ‘<CODE>min</CODE>’, or
561	‘<CODE>max</CODE>’, choosing the rounding mode.
562	Any other indication of rounding mode is ignored.
563	The operations with ‘<CODE>_r_</CODE>’ in their names never raise
564	the <I>inexact</I> exception, while those with ‘<CODE>_rx_</CODE>’
565	raise the <I>inexact</I> exception whenever the result is not exact.
566	</P>
567
568	<P>
569	TestFloat assumes that conversions from floating-point to an integer type
570	should raise the <I>invalid</I> exception if the input cannot be rounded to an
571	integer representable in the result format.
572	In such a circumstance:
573	<UL>
574
575	<LI>
576	<P>
577	If the result type is an unsigned integer, TestFloat normally expects the
578	result of the operation to be the type’s largest integer value.
579	In the case that the input is a negative number (not a NaN), a zero result may
580	also be accepted.
581	</P>
582
583	<LI>
584	<P>
585	If the result type is a signed integer and the input is a number (not a NaN),
586	TestFloat expects the result to be the largest-magnitude integer with the same
587	sign as the input.
588	When a NaN is converted to a signed integer type, TestFloat allows either the
largest positive or largest-magnitude negative integer to be returned.
590	</P>
591
592	</UL>
593	Conversions to integer types are expected never to raise the <I>overflow</I>
594	exception.
595	</P>
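<P>
The default expectations for invalid conversions can be summarized as a set of
accepted results.
The Python sketch below models only the behavior described in this section;
it does not account for the variations discussed in
<NOBR>section 8.3</NOBR>, and the function name and parameters are
hypothetical.
</P>

```python
def accepted_invalid_results(bits, signed, is_nan, negative):
    """Integer results accepted when a conversion raises invalid.

    bits is the width of the integer type (32 or 64), signed selects a
    signed result type, is_nan tells whether the input is a NaN, and
    negative gives the sign of a non-NaN input.
    """
    if not signed:
        accepted = {2**bits - 1}  # the type's largest value
        if negative and not is_nan:
            accepted.add(0)  # zero may also be accepted for negative inputs
        return accepted
    largest_positive = 2**(bits - 1) - 1
    most_negative = -(2**(bits - 1))
    if is_nan:
        return {largest_positive, most_negative}
    return {most_negative} if negative else {largest_positive}
```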
596
597	<H3>6.2. Basic Arithmetic Operations</H3>
598
599	<P>
600	The following standard arithmetic operations can be tested:
601	<BLOCKQUOTE>
602	<PRE>
603	f16_add f16_sub f16_mul f16_div f16_sqrt
604	f32_add f32_sub f32_mul f32_div f32_sqrt
605	f64_add f64_sub f64_mul f64_div f64_sqrt
606	extF80_add extF80_sub extF80_mul extF80_div extF80_sqrt
607	f128_add f128_sub f128_mul f128_div f128_sqrt
608	</PRE>
609	</BLOCKQUOTE>
610	The double-extended-precision (<CODE>extF80</CODE>) operations can be rounded
611	to reduced precision under rounding precision control.
612	</P>
613
614	<H3>6.3. Fused Multiply-Add Operations</H3>
615
616	<P>
617	For all floating-point formats except <NOBR>80-bit</NOBR>
618	double-extended-precision, TestFloat can test the fused multiply-add operation
619	defined by the 2008 IEEE Floating-Point Standard.
620	The fused multiply-add operations are:
621	<BLOCKQUOTE>
622	<PRE>
623	f16_mulAdd
624	f32_mulAdd
625	f64_mulAdd
626	f128_mulAdd
627	</PRE>
628	</BLOCKQUOTE>
629	</P>
630
631	<P>
632	If one of the multiplication operands is infinite and the other is zero,
633	TestFloat expects the fused multiply-add operation to raise the <I>invalid</I>
634	exception even if the third operand is a quiet NaN.
635	</P>
636
637	<H3>6.4. Remainder Operations</H3>
638
639	<P>
640	For each format, TestFloat can test the IEEE Standard’s remainder
641	operation.
642	These operations are:
643	<BLOCKQUOTE>
644	<PRE>
645	f16_rem
646	f32_rem
647	f64_rem
648	extF80_rem
649	f128_rem
650	</PRE>
651	</BLOCKQUOTE>
652	The remainder operations are always exact and so require no rounding.
653	</P>
654
655	<H3>6.5. Round-to-Integer Operations</H3>
656
657	<P>
658	For each format, TestFloat can test the IEEE Standard’s round-to-integer
659	operation.
660	For most TestFloat programs, these operations are:
661	<BLOCKQUOTE>
662	<PRE>
663	f16_roundToInt
664	f32_roundToInt
665	f64_roundToInt
666	extF80_roundToInt
667	f128_roundToInt
668	</PRE>
669	</BLOCKQUOTE>
670	</P>
671
672	<P>
673	Just as for conversions to integer types (<NOBR>section 6.1</NOBR> above), the
674	all-in-one <CODE>testfloat</CODE> program is again an exception.
675	For <CODE>testfloat</CODE> only, the round-to-integer operations have names of
676	these forms:
677	<BLOCKQUOTE>
678	<PRE>
&lt;<I>float</I>&gt;_roundToInt_r_&lt;<I>round</I>&gt;
&lt;<I>float</I>&gt;_roundToInt_x
681	</PRE>
682	</BLOCKQUOTE>
683	For the ‘<CODE>_r_</CODE>’ versions, the <I>inexact</I> exception
is never raised, and the <CODE>&lt;<I>round</I>&gt;</CODE> component specifies
685	the rounding mode as one of ‘<CODE>near_even</CODE>’,
686	‘<CODE>near_maxMag</CODE>’, ‘<CODE>minMag</CODE>’,
687	‘<CODE>min</CODE>’, or ‘<CODE>max</CODE>’.
688	The usual indication of rounding mode is ignored.
689	In contrast, the ‘<CODE>_x</CODE>’ versions accept the usual
690	indication of rounding mode and raise the <I>inexact</I> exception whenever the
691	result is not exact.
692	This irregular system follows the IEEE Standard’s particular
693	specification for the round-to-integer operations.
694	</P>
695
696	<H3>6.6. Comparison Operations</H3>
697
698	<P>
699	The following floating-point comparison operations can be tested:
700	<BLOCKQUOTE>
701	<PRE>
702	f16_eq f16_le f16_lt
703	f32_eq f32_le f32_lt
704	f64_eq f64_le f64_lt
705	extF80_eq extF80_le extF80_lt
706	f128_eq f128_le f128_lt
707	</PRE>
708	</BLOCKQUOTE>
709	The abbreviation <CODE>eq</CODE> stands for “equal” (=),
710	<CODE>le</CODE> stands for “less than or equal” (≤), and
<CODE>lt</CODE> stands for “less than” (&lt;).
712	</P>
713
714	<P>
715	The IEEE Standard specifies that, by default, the less-than-or-equal and
716	less-than comparisons raise the <I>invalid</I> exception if either input is any
717	kind of NaN.
718	The equality comparisons, on the other hand, are defined by default to raise
719	the <I>invalid</I> exception only for signaling NaNs, not for quiet NaNs.
720	For completeness, the following additional operations can be tested if
721	supported:
722	<BLOCKQUOTE>
723	<PRE>
724	f16_eq_signaling f16_le_quiet f16_lt_quiet
725	f32_eq_signaling f32_le_quiet f32_lt_quiet
726	f64_eq_signaling f64_le_quiet f64_lt_quiet
727	extF80_eq_signaling extF80_le_quiet extF80_lt_quiet
728	f128_eq_signaling f128_le_quiet f128_lt_quiet
729	</PRE>
730	</BLOCKQUOTE>
731	The <CODE>signaling</CODE> equality comparisons are identical to the standard
732	operations except that the <I>invalid</I> exception should be raised for any
733	NaN input.
734	Similarly, the <CODE>quiet</CODE> comparison operations should be identical to
735	their counterparts except that the <I>invalid</I> exception is not raised for
736	quiet NaNs.
737	</P>
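<P>
The rules for when a comparison raises the <I>invalid</I> exception can be
condensed into a short decision function.
The Python sketch below is one way to express them; the function name and the
labels <CODE>'number'</CODE>, <CODE>'qNaN'</CODE>, and <CODE>'sNaN'</CODE> are
hypothetical conventions chosen for this illustration.
</P>

```python
def raises_invalid(op, a_kind, b_kind):
    """Tell whether a comparison is expected to raise invalid.

    op is the operation suffix: 'eq', 'le', 'lt', 'eq_signaling',
    'le_quiet', or 'lt_quiet'. Each kind is 'number', 'qNaN', or 'sNaN'.
    """
    kinds = (a_kind, b_kind)
    if "sNaN" in kinds:
        return True  # a signaling NaN raises invalid for every comparison
    if "qNaN" not in kinds:
        return False  # no NaN input, so nothing to signal
    # Only a quiet NaN is present: the default ordered comparisons and
    # the signaling equality raise invalid; the others do not.
    return op in ("le", "lt", "eq_signaling")
```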
738
739	<P>
740	Obviously, no comparison operations ever require rounding.
741	Any rounding mode is ignored.
742	</P>
743
744
745	<H2>7. Interpreting TestFloat Output</H2>
746
747	<P>
748	The “errors” reported by TestFloat programs may or may not really
749	represent errors in the system being tested.
750	For each test case tried, the results from the floating-point implementation
751	being tested could differ from the expected results for several reasons:
752	<UL>
753	<LI>
754	The IEEE Floating-Point Standard allows for some variation in how conforming
755	floating-point behaves.
756	Two implementations can sometimes give different results without either being
757	incorrect.
758	<LI>
759	The trusted floating-point emulation could be faulty.
760	This could be because there is a bug in the way the emulation is coded, or
761	because a mistake was made when the code was compiled for the current system.
762	<LI>
763	The TestFloat program may not work properly, reporting differences that do not
764	exist.
765	<LI>
766	Lastly, the floating-point being tested could actually be faulty.
767	</UL>
768	It is the responsibility of the user to determine the causes for the
769	discrepancies that are reported.
770	Making this determination can require detailed knowledge about the IEEE
771	Standard.
772	Assuming TestFloat is working properly, any differences found will be due to
773	either the first or last of the reasons above.
774	Variations in the IEEE Standard that could lead to false error reports are
775	discussed in <NOBR>section 8</NOBR>, <I>Variations Allowed by the IEEE
776	Floating-Point Standard</I>.
777	</P>
778
779	<P>
780	For each reported error (or apparent error), a line of text is written to the
standard output.
782	If a line would be longer than 79 characters, it is divided.
783	The first part of each error line begins in the leftmost column, and any
784	subsequent “continuation” lines are indented with a tab.
785	</P>
786
787	<P>
788	Each error reported is of the form:
789	<BLOCKQUOTE>
790	<PRE>
&lt;<I>inputs</I>&gt; => &lt;<I>observed-output</I>&gt; expected: &lt;<I>expected-output</I>&gt;
792	</PRE>
793	</BLOCKQUOTE>
The <CODE>&lt;<I>inputs</I>&gt;</CODE> are the inputs to the operation.
795	Each output (observed or expected) is shown as a pair: the result value first,
796	followed by the exception flags.
797	</P>

<P>
For example, two typical error lines could be
<BLOCKQUOTE>
<PRE>
-00.7FFF00 -7F.000100 =&gt; +01.000000 ...ux expected: +01.000000 ....x
+81.000004 +00.1FFFFF =&gt; +01.000000 ...ux expected: +01.000000 ....x
</PRE>
</BLOCKQUOTE>
In the first line, the inputs are <CODE>-00.7FFF00</CODE> and
<CODE>-7F.000100</CODE>, and the observed result is <CODE>+01.000000</CODE>
with flags <CODE>...ux</CODE>.
The trusted emulation result is the same but with different flags,
<CODE>....x</CODE>.
Items such as <CODE>-00.7FFF00</CODE> composed of a sign character
<NOBR>(<CODE>+</CODE>/<CODE>-</CODE>)</NOBR>, hexadecimal digits, and a single
period represent floating-point values (here <NOBR>32-bit</NOBR>
single-precision).
The two instances above were reported as errors because the exception flag
results differ.
</P>
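
<P>
As an illustration only, the following Python sketch splits one error line of
the form described above into its parts.
The function name <CODE>parse_error_line</CODE> is hypothetical and not part of
TestFloat, and the sketch assumes that any tab-indented continuation lines have
already been joined back onto the first line.
</P>

```python
def parse_error_line(line):
    """Return (inputs, observed, expected); each output is (value, flags)."""
    left, expected = line.split("expected:")
    inputs, observed = left.split("=>")

    def as_pair(text):
        # An output is the result value followed by the exception flags.
        value, flags = text.split()
        return value, flags

    return inputs.split(), as_pair(observed), as_pair(expected)


inputs, observed, expected = parse_error_line(
    "-00.7FFF00 -7F.000100 => +01.000000 ...ux expected: +01.000000 ....x")
print(inputs)      # ['-00.7FFF00', '-7F.000100']
print(observed)    # ('+01.000000', '...ux')
print(expected)    # ('+01.000000', '....x')
```

<P>
Comparing the two pairs field by field shows at a glance whether a reported
difference lies in the result value, in the exception flags, or in both.
</P>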

<P>
Aside from the exception flags, there are ten data types that may be
represented.
Five are floating-point types: <NOBR>16-bit</NOBR> half-precision,
<NOBR>32-bit</NOBR> single-precision, <NOBR>64-bit</NOBR> double-precision,
<NOBR>80-bit</NOBR> double-extended-precision, and <NOBR>128-bit</NOBR>
quadruple-precision.
The remaining five types are <NOBR>32-bit</NOBR> and <NOBR>64-bit</NOBR>
unsigned integers, <NOBR>32-bit</NOBR> and <NOBR>64-bit</NOBR>
two’s-complement signed integers, and Boolean values (the results of
comparison operations).
Boolean values are represented as a single character, either a <CODE>0</CODE>
(false) or a <CODE>1</CODE> (true).
A <NOBR>32-bit</NOBR> integer is represented as 8 hexadecimal digits.
Thus, for a signed <NOBR>32-bit</NOBR> integer, <CODE>FFFFFFFF</CODE> is
−1, and <CODE>7FFFFFFF</CODE> is the largest positive value.
<NOBR>64-bit</NOBR> integers are the same except with 16 hexadecimal digits.
</P>
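
<P>
The integer notation can be decoded with a few lines of Python.
This is a minimal sketch; the helper name <CODE>decode_int</CODE> is an
assumption made for illustration, and the caller must say whether the digits
are to be read as signed or unsigned, since the text form alone does not say.
</P>

```python
def decode_int(hex_digits, signed):
    """Decode 8 or 16 hexadecimal digits as a 32-bit or 64-bit integer."""
    if len(hex_digits) not in (8, 16):
        raise ValueError("expected 8 or 16 hexadecimal digits")
    bits = 4 * len(hex_digits)
    value = int(hex_digits, 16)
    # Two's complement: patterns with the top bit set denote negative values.
    if signed and value >= 2 ** (bits - 1):
        value -= 2 ** bits
    return value


print(decode_int("FFFFFFFF", signed=True))    # -1
print(decode_int("7FFFFFFF", signed=True))    # 2147483647
print(decode_int("FFFFFFFF", signed=False))   # 4294967295
```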

<P>
Floating-point values are written decomposed into their sign, encoded exponent,
and encoded significand.
First is the sign character <NOBR>(<CODE>+</CODE> or <CODE>-</CODE>),</NOBR>
followed by the encoded exponent in hexadecimal, then a period
(<CODE>.</CODE>), and lastly the encoded significand in hexadecimal.
</P>

<P>
For <NOBR>16-bit</NOBR> half-precision, notable values include:
<BLOCKQUOTE>
<TABLE CELLSPACING=0 CELLPADDING=0>
<TR><TD><CODE>+00.000&nbsp;&nbsp;&nbsp;&nbsp;</CODE></TD><TD>+0</TD></TR>
<TR><TD><CODE>+0F.000</CODE></TD><TD>&nbsp;1</TD></TR>
<TR><TD><CODE>+10.000</CODE></TD><TD>&nbsp;2</TD></TR>
<TR><TD><CODE>+1E.3FF</CODE></TD><TD>maximum finite value</TD></TR>
<TR><TD><CODE>+1F.000</CODE></TD><TD>+infinity</TD></TR>
<TR><TD>&nbsp;</TD></TR>
<TR><TD><CODE>-00.000</CODE></TD><TD>−0</TD></TR>
<TR><TD><CODE>-0F.000</CODE></TD><TD>−1</TD></TR>
<TR><TD><CODE>-10.000</CODE></TD><TD>−2</TD></TR>
<TR>
<TD><CODE>-1E.3FF</CODE></TD>
<TD>minimum finite value (largest magnitude, but negative)</TD>
</TR>
<TR><TD><CODE>-1F.000</CODE></TD><TD>−infinity</TD></TR>
</TABLE>
</BLOCKQUOTE>
Certain categories are easily distinguished (assuming the <CODE>x</CODE>s are
not all 0):
<BLOCKQUOTE>
<TABLE CELLSPACING=0 CELLPADDING=0>
<TR>
<TD><CODE>+00.xxx&nbsp;&nbsp;&nbsp;&nbsp;</CODE></TD>
<TD>positive subnormal numbers</TD>
</TR>
<TR><TD><CODE>+1F.xxx</CODE></TD><TD>positive NaNs</TD></TR>
<TR><TD><CODE>-00.xxx</CODE></TD><TD>negative subnormal numbers</TD></TR>
<TR><TD><CODE>-1F.xxx</CODE></TD><TD>negative NaNs</TD></TR>
</TABLE>
</BLOCKQUOTE>
</P>

<P>
Likewise for other formats:
<BLOCKQUOTE>
<TABLE CELLSPACING=0 CELLPADDING=0>
<TR><TD>32-bit single</TD><TD>64-bit double</TD><TD>128-bit quadruple</TD></TR>
<TR><TD>&nbsp;</TD></TR>
<TR>
<TD><CODE>+00.000000&nbsp;&nbsp;&nbsp;&nbsp;</CODE></TD>
<TD><CODE>+000.0000000000000&nbsp;&nbsp;&nbsp;&nbsp;</CODE></TD>
<TD><CODE>+0000.0000000000000000000000000000&nbsp;&nbsp;&nbsp;&nbsp;</CODE></TD>
<TD>+0</TD>
</TR>
<TR>
<TD><CODE>+7F.000000</CODE></TD>
<TD><CODE>+3FF.0000000000000</CODE></TD>
<TD><CODE>+3FFF.0000000000000000000000000000</CODE></TD>
<TD>&nbsp;1</TD>
</TR>
<TR>
<TD><CODE>+80.000000</CODE></TD>
<TD><CODE>+400.0000000000000</CODE></TD>
<TD><CODE>+4000.0000000000000000000000000000</CODE></TD>
<TD>&nbsp;2</TD>
</TR>
<TR>
<TD><CODE>+FE.7FFFFF</CODE></TD>
<TD><CODE>+7FE.FFFFFFFFFFFFF</CODE></TD>
<TD><CODE>+7FFE.FFFFFFFFFFFFFFFFFFFFFFFFFFFF</CODE></TD>
<TD>maximum finite value</TD>
</TR>
<TR>
<TD><CODE>+FF.000000</CODE></TD>
<TD><CODE>+7FF.0000000000000</CODE></TD>
<TD><CODE>+7FFF.0000000000000000000000000000</CODE></TD>
<TD>+infinity</TD>
</TR>
<TR><TD>&nbsp;</TD></TR>
<TR>
<TD><CODE>-00.000000&nbsp;&nbsp;&nbsp;&nbsp;</CODE></TD>
<TD><CODE>-000.0000000000000&nbsp;&nbsp;&nbsp;&nbsp;</CODE></TD>
<TD><CODE>-0000.0000000000000000000000000000&nbsp;&nbsp;&nbsp;&nbsp;</CODE></TD>
<TD>−0</TD>
</TR>
<TR>
<TD><CODE>-7F.000000</CODE></TD>
<TD><CODE>-3FF.0000000000000</CODE></TD>
<TD><CODE>-3FFF.0000000000000000000000000000</CODE></TD>
<TD>−1</TD>
</TR>
<TR>
<TD><CODE>-80.000000</CODE></TD>
<TD><CODE>-400.0000000000000</CODE></TD>
<TD><CODE>-4000.0000000000000000000000000000</CODE></TD>
<TD>−2</TD>
</TR>
<TR>
<TD><CODE>-FE.7FFFFF</CODE></TD>
<TD><CODE>-7FE.FFFFFFFFFFFFF</CODE></TD>
<TD><CODE>-7FFE.FFFFFFFFFFFFFFFFFFFFFFFFFFFF</CODE></TD>
<TD>minimum finite value</TD>
</TR>
<TR>
<TD><CODE>-FF.000000</CODE></TD>
<TD><CODE>-7FF.0000000000000</CODE></TD>
<TD><CODE>-7FFF.0000000000000000000000000000</CODE></TD>
<TD>−infinity</TD>
</TR>
<TR><TD>&nbsp;</TD></TR>
<TR>
<TD><CODE>+00.xxxxxx</CODE></TD>
<TD><CODE>+000.xxxxxxxxxxxxx</CODE></TD>
<TD><CODE>+0000.xxxxxxxxxxxxxxxxxxxxxxxxxxxx</CODE></TD>
<TD>positive subnormals</TD>
</TR>
<TR>
<TD><CODE>+FF.xxxxxx</CODE></TD>
<TD><CODE>+7FF.xxxxxxxxxxxxx</CODE></TD>
<TD><CODE>+7FFF.xxxxxxxxxxxxxxxxxxxxxxxxxxxx</CODE></TD>
<TD>positive NaNs</TD>
</TR>
<TR>
<TD><CODE>-00.xxxxxx</CODE></TD>
<TD><CODE>-000.xxxxxxxxxxxxx</CODE></TD>
<TD><CODE>-0000.xxxxxxxxxxxxxxxxxxxxxxxxxxxx</CODE></TD>
<TD>negative subnormals</TD>
</TR>
<TR>
<TD><CODE>-FF.xxxxxx</CODE></TD>
<TD><CODE>-7FF.xxxxxxxxxxxxx</CODE></TD>
<TD><CODE>-7FFF.xxxxxxxxxxxxxxxxxxxxxxxxxxxx</CODE></TD>
<TD>negative NaNs</TD>
</TR>
</TABLE>
</BLOCKQUOTE>
</P>
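
<P>
To check a reading of the tables above, the notation can be decoded
mechanically.
The Python sketch below is illustrative and not part of TestFloat; the name
<CODE>decode</CODE> and the idea of recognizing the format from the number of
hexadecimal digits on each side of the period are assumptions of this example.
It covers the four formats with a hidden leading bit, returns exact values as
<CODE>Fraction</CODE>s, and does not preserve the sign of zero.
</P>

```python
from fractions import Fraction

# (exponent digits, significand digits) -> (fraction bits, exponent bits)
FORMATS = {
    (2, 3): (10, 5),       # 16-bit half-precision
    (2, 6): (23, 8),       # 32-bit single-precision
    (3, 13): (52, 11),     # 64-bit double-precision
    (4, 28): (112, 15),    # 128-bit quadruple-precision
}


def decode(text):
    """Decode text such as '+7F.000000' into a Fraction or a category name."""
    sign = -1 if text[0] == "-" else 1
    exp_hex, sig_hex = text[1:].split(".")
    frac_bits, exp_bits = FORMATS[(len(exp_hex), len(sig_hex))]
    exp, frac = int(exp_hex, 16), int(sig_hex, 16)
    exp_max = 2 ** exp_bits - 1
    bias = exp_max // 2
    if exp == exp_max:
        if frac:
            return "NaN"
        return "+infinity" if sign == 1 else "-infinity"
    if exp == 0:
        # Zero or subnormal: no hidden bit, scale fixed at 2**(1 - bias).
        return sign * Fraction(frac, 2 ** frac_bits) * Fraction(2) ** (1 - bias)
    # Normal number: hidden leading 1 in front of the encoded fraction.
    return sign * (1 + Fraction(frac, 2 ** frac_bits)) * Fraction(2) ** (exp - bias)


print(decode("+7F.000000"))           # 1
print(decode("-10.000"))              # -2
print(decode("+1E.3FF"))              # 65504
print(decode("-7FF.0000000000000"))   # -infinity
```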

<P>
The <NOBR>80-bit</NOBR> double-extended-precision values are a little unusual
in that the leading bit of precision is not hidden as with other formats.
When canonically encoded, the leading significand bit of an <NOBR>80-bit</NOBR>
double-extended-precision value will be 0 if the value is zero or subnormal,
and will be 1 otherwise.
Hence, the same values listed above appear in <NOBR>80-bit</NOBR>
double-extended-precision as follows (note the leading <CODE>8</CODE> digit in
the significands):
<BLOCKQUOTE>
<TABLE CELLSPACING=0 CELLPADDING=0>
<TR>
<TD><CODE>+0000.0000000000000000&nbsp;&nbsp;&nbsp;&nbsp;</CODE></TD>
<TD>+0</TD>
</TR>
<TR><TD><CODE>+3FFF.8000000000000000</CODE></TD><TD>&nbsp;1</TD></TR>
<TR><TD><CODE>+4000.8000000000000000</CODE></TD><TD>&nbsp;2</TD></TR>
<TR>
<TD><CODE>+7FFE.FFFFFFFFFFFFFFFF</CODE></TD>
<TD>maximum finite value</TD>
</TR>
<TR><TD><CODE>+7FFF.8000000000000000</CODE></TD><TD>+infinity</TD></TR>
<TR><TD>&nbsp;</TD></TR>
<TR><TD><CODE>-0000.0000000000000000</CODE></TD><TD>−0</TD></TR>
<TR><TD><CODE>-3FFF.8000000000000000</CODE></TD><TD>−1</TD></TR>
<TR><TD><CODE>-4000.8000000000000000</CODE></TD><TD>−2</TD></TR>
<TR>
<TD><CODE>-7FFE.FFFFFFFFFFFFFFFF</CODE></TD>
<TD>minimum finite value</TD>
</TR>
<TR><TD><CODE>-7FFF.8000000000000000</CODE></TD><TD>−infinity</TD></TR>
</TABLE>
</BLOCKQUOTE>
</P>
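
<P>
The effect of the explicit leading bit can be seen in a small decoder for the
<NOBR>80-bit</NOBR> notation.
This Python sketch is an illustration under stated assumptions:
<CODE>decode_extended</CODE> is a hypothetical name, the input is taken to be
canonically encoded, and the sign of zero is not preserved.
</P>

```python
from fractions import Fraction


def decode_extended(text):
    """Decode 80-bit double-extended text such as '+3FFF.8000000000000000'."""
    sign = -1 if text[0] == "-" else 1
    exp_hex, sig_hex = text[1:].split(".")
    exp, sig = int(exp_hex, 16), int(sig_hex, 16)
    if exp == 0x7FFF:
        # Ignore the explicit leading bit; the low 63 bits tell infinity from NaN.
        if sig % 2 ** 63:
            return "NaN"
        return "+infinity" if sign == 1 else "-infinity"
    # The leading significand bit is explicit, so no hidden 1 is added.
    # Encoded exponent 0 (zero or subnormal) uses the same scale as exponent 1.
    return sign * Fraction(sig, 2 ** 63) * Fraction(2) ** (max(exp, 1) - 16383)


print(decode_extended("+3FFF.8000000000000000"))   # 1
print(decode_extended("-4000.8000000000000000"))   # -2
print(decode_extended("+7FFF.8000000000000000"))   # +infinity
```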

<P>
Lastly, exception flag values are represented by five characters, one character
per flag.
Each flag is written as either a letter or a period (<CODE>.</CODE>) according
to whether the flag was set or not by the operation.
A period indicates the flag was not set.
The letter used to indicate a set flag depends on the flag:
<BLOCKQUOTE>
<TABLE CELLSPACING=0 CELLPADDING=0>
<TR>
<TD><CODE>v&nbsp;&nbsp;&nbsp;&nbsp;</CODE></TD>
<TD><I>invalid</I> exception</TD>
</TR>
<TR>
<TD><CODE>i</CODE></TD>
<TD><I>infinite</I> exception (“divide by zero”)</TD>
</TR>
<TR><TD><CODE>o</CODE></TD><TD><I>overflow</I> exception</TD></TR>
<TR><TD><CODE>u</CODE></TD><TD><I>underflow</I> exception</TD></TR>
<TR><TD><CODE>x</CODE></TD><TD><I>inexact</I> exception</TD></TR>
</TABLE>
</BLOCKQUOTE>
For example, the notation <CODE>...ux</CODE> indicates that the
<I>underflow</I> and <I>inexact</I> exception flags were set and that the other
three flags (<I>invalid</I>, <I>infinite</I>, and <I>overflow</I>) were not
set.
The exception flags are always written following the value returned as the
result of the operation.
</P>
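
<P>
A flag field can be turned into a list of exception names as sketched below in
Python.
The helper <CODE>decode_flags</CODE> is hypothetical, and the sketch assumes
the five positions follow the order of the table above
(<CODE>v</CODE>, <CODE>i</CODE>, <CODE>o</CODE>, <CODE>u</CODE>,
<CODE>x</CODE>), which is consistent with the <CODE>...ux</CODE> example.
</P>

```python
FLAG_NAMES = (
    ("v", "invalid"),
    ("i", "infinite"),
    ("o", "overflow"),
    ("u", "underflow"),
    ("x", "inexact"),
)


def decode_flags(text):
    """Return the names of the exception flags set in text such as '...ux'."""
    if len(text) != len(FLAG_NAMES):
        raise ValueError("expected five flag characters")
    raised = []
    for char, (letter, name) in zip(text, FLAG_NAMES):
        if char == letter:
            raised.append(name)
        elif char != ".":
            raise ValueError("unexpected flag character: " + char)
    return raised


print(decode_flags("...ux"))   # ['underflow', 'inexact']
print(decode_flags("....x"))   # ['inexact']
```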


<H2>8. Variations Allowed by the IEEE Floating-Point Standard</H2>

<P>
The IEEE Floating-Point Standard admits some variation among conforming
implementations.
Because TestFloat expects the two implementations being compared to deliver
bit-for-bit identical results under most circumstances, this leeway in the
standard can result in false errors being reported if the two implementations
do not make the same choices everywhere the standard provides an option.
</P>

<H3>8.1. Underflow</H3>

<P>
The standard specifies that the <I>underflow</I> exception flag is to be raised
when two conditions are met simultaneously:
<NOBR>(1) <I>tininess</I></NOBR> and <NOBR>(2) <I>loss of accuracy</I></NOBR>.
</P>

<P>
A result is tiny when its magnitude is nonzero yet smaller than any normalized
floating-point number.
The standard allows tininess to be determined either before or after a result
is rounded to the destination precision.
If tininess is detected before rounding, some borderline cases will be flagged
as underflows even though the result after rounding actually lies within the
normal floating-point range.
By detecting tininess after rounding, a system can avoid some unnecessary
signaling of underflow.
All the TestFloat programs support options <CODE>-tininessbefore</CODE> and
<CODE>-tininessafter</CODE> to control whether TestFloat expects tininess on
underflow to be detected before or after rounding.
One or the other is selected as the default when TestFloat is compiled, but
these command options allow the default to be overridden.
</P>
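
<P>
The borderline case can be made concrete with exact rational arithmetic.
The Python sketch below is an illustration only, not how TestFloat or SoftFloat
is implemented; the names <CODE>round_unbounded</CODE> and
<CODE>is_tiny</CODE> are hypothetical, the format is taken to be
<NOBR>32-bit</NOBR> single-precision with round-to-nearest-even, and the sample
value is the exact product of the two inputs in the first example error line of
section 7, on the assumption that the operation there were a multiplication.
</P>

```python
from fractions import Fraction

PRECISION = 24                      # significand bits of 32-bit single-precision
MIN_NORMAL = Fraction(2) ** -126    # smallest positive normal 32-bit value


def round_unbounded(x, precision):
    """Round positive x to `precision` bits (nearest-even), exponent unbounded."""
    e = 0
    while Fraction(2) ** e > x:           # lower e until 2**e is at most x
        e -= 1
    while x >= Fraction(2) ** (e + 1):    # raise e until x is below 2**(e+1)
        e += 1
    ulp = Fraction(2) ** (e - precision + 1)
    return round(x / ulp) * ulp           # round() on a Fraction rounds half to even


def is_tiny(exact, after_rounding):
    """Apply the tininess test to an exact result, before or after rounding."""
    magnitude = abs(exact)
    if magnitude == 0:
        return False
    if after_rounding:
        magnitude = round_unbounded(magnitude, PRECISION)
    return MIN_NORMAL > magnitude


# Exact product of the values written -00.7FFF00 and -7F.000100.
product = Fraction(0x7FFF00, 2 ** 149) * (1 + Fraction(0x100, 2 ** 23))
print(is_tiny(product, after_rounding=False))   # True
print(is_tiny(product, after_rounding=True))    # False
```

<P>
The exact product lies just below the smallest normal number but rounds up to
it, so the result is inexact in either case and only the tininess test differs.
A mismatch of this kind would show up as <CODE>...ux</CODE> on one side and
<CODE>....x</CODE> on the other.
</P>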

<P>
Loss of accuracy occurs when the subnormal format is not sufficient to
represent an underflowed result accurately.
The original 1985 version of the IEEE Standard allowed loss of accuracy to be
detected either as an <I>inexact result</I> or as a
<I>denormalization loss</I>;
however, few if any systems ever chose the latter.
Since its 2008 revision, the standard requires that loss of accuracy be
detected as an inexact result, and TestFloat can test only for this case.
</P>

<H3>8.2. NaNs</H3>

<P>
The IEEE Standard gives the floating-point formats a large number of NaN
encodings and specifies that NaNs are to be returned as results under certain
conditions.
However, the standard allows an implementation almost complete freedom over
<EM>which</EM> NaN to return in each situation.
</P>

<P>
By default, TestFloat does not check the bit patterns of NaN results.
When the result of an operation should be a NaN, any NaN is considered as good
as another.
This laxness can be overridden with the <CODE>-checkNaNs</CODE> option of
programs <CODE>testfloat_ver</CODE> and <CODE>testfloat</CODE>.
In order for this option to be sensible, TestFloat must have been compiled so
that its internal floating-point implementation (SoftFloat) generates the
proper NaN results for the system being tested.
</P>
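
<P>
The comparison rule described above can be sketched as follows in Python.
This is not TestFloat’s code: the helpers <CODE>is_nan_f32</CODE> and
<CODE>values_match</CODE> and the <CODE>check_nans</CODE> parameter are
hypothetical stand-ins for the default behavior and the
<CODE>-checkNaNs</CODE> option, and only <NOBR>32-bit</NOBR> single-precision
values in the text notation of section 7 are handled.
</P>

```python
def is_nan_f32(text):
    """True if text such as '+FF.400000' denotes a 32-bit single-precision NaN."""
    exp_hex, sig_hex = text[1:].split(".")
    return int(exp_hex, 16) == 0xFF and int(sig_hex, 16) != 0


def values_match(observed, expected, check_nans=False):
    """Compare two result values, treating all NaNs alike unless check_nans."""
    if not check_nans and is_nan_f32(observed) and is_nan_f32(expected):
        return True
    return observed == expected


print(values_match("+FF.400000", "-FF.7FFFFF"))                    # True
print(values_match("+FF.400000", "-FF.7FFFFF", check_nans=True))   # False
print(values_match("+FF.000000", "+FF.400000"))                    # False
```

<P>
The last call compares +infinity with a NaN, which is a mismatch under either
setting.
</P>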

<H3>8.3. Conversions to Integer</H3>

<P>
Conversion of a floating-point value to an integer format will fail if the
source value is a NaN or if it is too large.
The IEEE Standard does not specify what value should be returned as the integer
result in these cases.
Moreover, according to the standard, the <I>invalid</I> exception can be raised
or an unspecified alternative mechanism may be used to signal such cases.
</P>

<P>
TestFloat assumes that conversions to integer will raise the <I>invalid</I>
exception if the source value cannot be rounded to a representable integer.
In such cases, TestFloat expects the result value to be the largest-magnitude
positive or negative integer or zero, as detailed earlier in
<NOBR>section 6.1</NOBR>, <I>Conversion Operations</I>.
If option <CODE>-checkInvInts</CODE> is selected with programs
<CODE>testfloat_ver</CODE> and <CODE>testfloat</CODE>, integer results of
invalid operations are checked for an exact match.
In order for this option to be sensible, TestFloat must have been compiled so
that its internal floating-point implementation (SoftFloat) generates the
proper integer results for the system being tested.
</P>


<H2>9. Contact Information</H2>

<P>
At the time of this writing, the most up-to-date information about TestFloat
and the latest release can be found at the Web page
<A HREF="http://www.jhauser.us/arithmetic/TestFloat.html"><NOBR><CODE>http://www.jhauser.us/arithmetic/TestFloat.html</CODE></NOBR></A>.
</P>


</BODY>

