[cffi-devel] how to treat expected failures in tests
Jeffrey Cunningham
jeffrey at jkcunningham.com
Wed Jan 11 16:30:48 UTC 2012
On Wed, 11 Jan 2012 07:00:53 -0800, Robert Goldman <rpgoldman at sift.info>
wrote:
> On 1/11/12 Jan 11 -1:16 AM, Daniel Herring wrote:
>> On Wed, 11 Jan 2012, Daniel Herring wrote:
>>> On Tue, 10 Jan 2012, Jeff Cunningham wrote:
>>>> How about OK, FAIL, UNEXPECTEDOK, and EXPECTEDFAIL?
>>>
>>> FWIW, here's one established set of terms:
>>> PASS, FAIL, UNRESOLVED, UNTESTED, UNSUPPORTED
>>> (XPASS and XFAIL are not in POSIX; change test polarity if desired)
>>> http://www.gnu.org/software/dejagnu/manual/x47.html#posix
>>
>
> I guess I'd be inclined to say "too bad for POSIX" and add XPASS and
> XFAIL....
>
> The reason that I'd be willing to flout (or "extend and extinguish" ;->)
> the standard is that there is no obvious advantage to POSIX compliance
> in this case that would compensate for the loss in information.
>
> cheers,
> r
I agree.
I really have no idea what common practice is in standard unit-testing
protocols; it isn't my background (which is mathematics). The only reason
I suggested the additions is that they carry useful information, some of
which is lost if you don't have all four cases. In my consulting practice
I have used all four, and have seen others use them in one form or another
in most test settings.
There are many good descriptions of binary hypothesis testing; here is
one (the two models in this setting would be something like H_1 = 'test
passes' and H_0 = 'test fails'):
"In binary hypothesis testing, assuming at least one of the two models
does indeed correspond to reality, there are four possible scenarios:
Case 1: H_0 is true, and we declare H_0 to be true
Case 2: H_0 is true, but we declare H_1 to be true
Case 3: H_1 is true, and we declare H_1 to be true
Case 4: H_1 is true, but we declare H_0 to be true
In cases 2 and 4, errors occur. The names given to these errors depend on
the area of application. In statistics, they are called type I and type II
errors respectively, while in signal processing they are known as a false
alarm or a miss."
(from http://cnx.org/content/m11531/latest/)
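To make the correspondence with test outcomes concrete, here is a minimal
sketch in Common Lisp. The function and status names are hypothetical --
this is not CFFI's actual test harness -- and I am reading the
"declaration" as the tester's recorded expectation:

  (defun classify-result (expected actual)
    "Map a test's EXPECTED and ACTUAL results (each :PASS or :FAIL)
  onto the four cases above.  Hypothetical names only."
    (cond ((and (eq actual :fail) (eq expected :fail)) :xfail)   ; Case 1
          ((and (eq actual :fail) (eq expected :pass)) :fail)    ; Case 2 (type I / false alarm)
          ((and (eq actual :pass) (eq expected :pass)) :pass)    ; Case 3
          ((and (eq actual :pass) (eq expected :fail)) :xpass))) ; Case 4 (type II / miss)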
One might argue that Bayesian testing procedures are not appropriate for
software verification tests, but I think that would be short-sighted. It
is virtually impossible to design tests that cover every possible
data/usage scenario for any but the simplest pieces of code. So what in
fact happens is that the test designer picks the tests he thinks are most
important. That's where statistics, in the broader sense, come in.
Testing several hundred out of the hundreds of thousands or millions of
possible permutations of test parameters (six parameters with ten
settings each already give a million combinations) always implies that
statistical assumptions are being made. Being limited to two of the four
test results makes it impossible to evaluate those results with any
degree of rigor.
I am indifferent as to the terminology applied to cases 2 and 4, so long
as they are available. If they are not, it throws unnecessary uncertainty
over the entire corpus of test results. And having them available doesn't
force those who don't see their necessity to use them. They can choose to
simply ignore them and limit their information to the two conditional
cases:
{Case 1 | not Case 2}
{Case 3 | not Case 4}
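As a sketch of that collapse, continuing the hypothetical status names
from the earlier snippet, dropping the expectation information is exactly
what discards the distinction:

  (defun collapse-outcome (outcome)
    "Reduce the four outcomes to plain pass/fail, as one would
  when XPASS and XFAIL are unavailable or ignored."
    (case outcome
      ((:pass :xpass) :pass)    ; the test passed, expected or not
      ((:fail :xfail) :fail)))  ; the test failed, expected or not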
Regards,
Jeff