Discussion:
parallel check output changes?
Andrew MacLeod
2014-09-18 12:56:50 UTC
Permalink
Have the changes that have gone into the check parallelization made the
.sum file non-deterministic?
I'm seeing a lot of small hunks in different orders which cause my
comparison scripts to show big differences.
I haven't been paying attention to the nature of the make check changes
so I'm not sure if this is expected...

Or is this something else? It's the same code base between runs, just
with a few changes made to some include files.

i.e. the order of the options -mstackrealign and -mno-stackrealign is
swapped in this output:

Running /gcc/2014-09-16/gcc/gcc/testsuite/gcc.target/i386/stackalign/stackalign.exp ...
- UNSUPPORTED: gcc.target/i386/stackalign/asm-1.c -mstackrealign
UNSUPPORTED: gcc.target/i386/stackalign/asm-1.c -mno-stackrealign
! UNSUPPORTED: gcc.target/i386/stackalign/longlong-1.c -mstackrealign
UNSUPPORTED: gcc.target/i386/stackalign/longlong-1.c -mno-stackrealign
UNSUPPORTED: gcc.target/i386/stackalign/longlong-2.c -mstackrealign
UNSUPPORTED: gcc.target/i386/stackalign/longlong-2.c -mno-stackrealign
PASS: gcc.target/i386/stackalign/pr39146.c -mstackrealign (test for excess errors)
--- 110393,110402 ----
PASS: gcc.target/i386/math-torture/trunc.c -O2 -flto -fno-use-linker-plugin -flto-partition=none (test for excess errors)
PASS: gcc.target/i386/math-torture/trunc.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects (test for excess errors)
Running /gcc/2014-09-16/gcc/gcc/testsuite/gcc.target/i386/stackalign/stackalign.exp ...
UNSUPPORTED: gcc.target/i386/stackalign/asm-1.c -mno-stackrealign
! UNSUPPORTED: gcc.target/i386/stackalign/asm-1.c -mstackrealign
UNSUPPORTED: gcc.target/i386/stackalign/longlong-1.c -mno-stackrealign
+ UNSUPPORTED: gcc.target/i386/stackalign/longlong-1.c -mstackrealign
UNSUPPORTED: gcc.target/i386/stackalign/longlong-2.c -mstackrealign
UNSUPPORTED: gcc.target/i386/stackalign/longlong-2.c -mno-stackrealign
PASS: gcc.target/i386/stackalign/pr39146.c -mstackrealign (test for excess errors)


Andrew
Andrew MacLeod
2014-09-18 13:05:04 UTC
Permalink
Have the changes that have gone into the check parallelization made the .sum
file non-deterministic?
I'm seeing a lot of small hunks in different orders which cause my
comparison scripts to show big differences.
I haven't been paying attention to the nature of the make check changes so
I'm not sure if this is expected...
Or is this something else? It's the same code base between runs, just with a
few changes made to some include files.
I'm using contrib/test_summary and haven't seen any non-determinisms in the
output of that command. As for dg-extract-results.sh, we have two versions
of that, one if you have python 2.6 or newer, another one if you don't.
Perhaps the behavior of those two (I'm using the python version probably)
differs?
Jakub
Not sure, although I do have python 2.7.5 installed for what it's
worth... I'll try another run in a bit.

Andrew
Andrew MacLeod
2014-09-18 15:03:45 UTC
Permalink
Post by Andrew MacLeod
Have the changes that have gone into the check parallelization made the .sum
file non-deterministic?
I'm seeing a lot of small hunks in different orders which cause my
comparison scripts to show big differences.
I haven't been paying attention to the nature of the make check changes so
I'm not sure if this is expected...
Or is this something else? It's the same code base between runs, just with a
few changes made to some include files.
I'm using contrib/test_summary and haven't seen any non-determinisms in the
output of that command. As for dg-extract-results.sh, we have two versions
of that, one if you have python 2.6 or newer, another one if you don't.
Perhaps the behavior of those two (I'm using the python version probably)
differs?
Jakub
Not sure, although I do have python 2.7.5 installed for what it's
worth... I'll try another run in a bit.
Andrew
Hmm. My 3rd run (which has no compilation change from the 2nd one) is
different from both other runs :-P. I did tweak my -j parameter in
the make check, but that is it.

Andrew
Bernd Schmidt
2014-09-18 17:32:00 UTC
Permalink
Post by Andrew MacLeod
Post by Andrew MacLeod
Have the changes that have gone into the check parallelization made the .sum
file non-deterministic?
I'm seeing a lot of small hunks in different orders which cause my
comparison scripts to show big differences.
I haven't been paying attention to the nature of the make check changes so
I'm not sure if this is expected...
Or is this something else? It's the same code base between runs, just with a
few changes made to some include files.
I'm using contrib/test_summary and haven't seen any non-determinisms in the
output of that command. As for dg-extract-results.sh, we have two versions
of that, one if you have python 2.6 or newer, another one if you don't.
Perhaps the behavior of those two (I'm using the python version probably)
differs?
Jakub
Not sure, although I do have python 2.7.5 installed for what it's
worth... I'll try another run in a bit.
Andrew
Hmm. My 3rd run (which has no compilation change from the 2nd one) is
different from both other runs :-P. I did tweak my -j parameter in
the make check, but that is it.
I'm also seeing this. Python 3.3.5 here.


Bernd
Jakub Jelinek
2014-09-18 17:36:09 UTC
Permalink
Post by Bernd Schmidt
Post by Andrew MacLeod
Hmm. My 3rd run (which has no compilation change from the 2nd one) is
different from both other runs :-P. I did tweak my -j parameter in
the make check, but that is it.
I'm also seeing this. Python 3.3.5 here.
Segher on IRC mentioned that changing result_re in dg-extract-results.py
should help here (or disabling the python version, *.sh version should
sort everything).

Jakub
Segher Boessenkool
2014-09-18 18:44:55 UTC
Permalink
Post by Jakub Jelinek
Segher on IRC mentioned that changing result_re in dg-extract-results.py
should help here (or disabling the python version, *.sh version should
sort everything).
I am testing a patch that is just


diff --git a/contrib/dg-extract-results.py b/contrib/dg-extract-results.py
index cccbfd3..3781423 100644
--- a/contrib/dg-extract-results.py
+++ b/contrib/dg-extract-results.py
@@ -117,7 +117,7 @@ class Prog:
self.tool_re = re.compile (r'^\t\t=== (.*) tests ===$')
self.result_re = re.compile (r'^(PASS|XPASS|FAIL|XFAIL|UNRESOLVED'
r'|WARNING|ERROR|UNSUPPORTED|UNTESTED'
- r'|KFAIL):\s*(\S+)')
+ r'|KFAIL):\s*(.+)')
self.completed_re = re.compile (r'.* completed at (.*)')
# Pieces of text to write at the head of the output.
# start_line is a pair in which the first element is a datetime
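As an aside for readers following the patch: a small Python sketch (mine, not part of the patch) of what the `(\S+)` → `(.+)` change does to the captured test name used as the merge/sort key:

```python
import re

# Old and new capture groups, mirroring the patch above; only the
# trailing (\S+) vs (.+) differs.
old_re = re.compile (r'^(PASS|XPASS|FAIL|XFAIL|UNRESOLVED'
                     r'|WARNING|ERROR|UNSUPPORTED|UNTESTED'
                     r'|KFAIL):\s*(\S+)')
new_re = re.compile (r'^(PASS|XPASS|FAIL|XFAIL|UNRESOLVED'
                     r'|WARNING|ERROR|UNSUPPORTED|UNTESTED'
                     r'|KFAIL):\s*(.+)')

line = 'UNSUPPORTED: gcc.target/i386/stackalign/asm-1.c -mstackrealign'
# The old key stops at the first whitespace, so the options are lost:
print(old_re.match(line).group(2))
# The new key keeps the whole rest of the line, options included:
print(new_re.match(line).group(2))
```

With the old key, -mstackrealign and -mno-stackrealign runs of asm-1.c compare equal, so their relative order is whatever the parallel jobs produced; with the new key the options break the tie deterministically.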


Relatedly, is it just me or are most lines of the test summaries (the "#"
lines after "===") missing since the parallelisation patches?


Segher
Segher Boessenkool
2014-09-19 09:37:23 UTC
Permalink
Post by Segher Boessenkool
I am testing a patch that is just
diff --git a/contrib/dg-extract-results.py b/contrib/dg-extract-results.py
index cccbfd3..3781423 100644
--- a/contrib/dg-extract-results.py
+++ b/contrib/dg-extract-results.py
self.tool_re = re.compile (r'^\t\t=== (.*) tests ===$')
self.result_re = re.compile (r'^(PASS|XPASS|FAIL|XFAIL|UNRESOLVED'
r'|WARNING|ERROR|UNSUPPORTED|UNTESTED'
- r'|KFAIL):\s*(\S+)')
+ r'|KFAIL):\s*(.+)')
self.completed_re = re.compile (r'.* completed at (.*)')
# Pieces of text to write at the head of the output.
# start_line is a pair in which the first element is a datetime
Tested that with four runs on powerpc64-linux, four configs each time;
test-summary
shows the same in all cases. Many lines have moved compared to without
the patch, but that cannot be helped. Okay for mainline?


2014-09-19 Segher Boessenkool <***@kernel.crashing.org>

contrib/
* dg-extract-results.py (Prog.result_re): Include options in test name.
Post by Segher Boessenkool
Relatedly, is it just me or are most lines of the test summaries (the "#"
lines after "===") missing since the parallelisation patches?
This is still open.


I also did some timings for make -j60 -k check, same -m64,-m32,-m32/-mpowerpc64,
-m64/-mlra configs. A run takes 65m, is effectively 42x parallel, and has 15%
system time.


Segher
Mike Stump
2014-09-19 16:30:52 UTC
Permalink
Post by Segher Boessenkool
Post by Segher Boessenkool
I am testing a patch that is just
diff --git a/contrib/dg-extract-results.py b/contrib/dg-extract-results.py
index cccbfd3..3781423 100644
--- a/contrib/dg-extract-results.py
+++ b/contrib/dg-extract-results.py
self.tool_re = re.compile (r'^\t\t=== (.*) tests ===$')
self.result_re = re.compile (r'^(PASS|XPASS|FAIL|XFAIL|UNRESOLVED'
r'|WARNING|ERROR|UNSUPPORTED|UNTESTED'
- r'|KFAIL):\s*(\S+)')
+ r'|KFAIL):\s*(.+)')
self.completed_re = re.compile (r'.* completed at (.*)')
# Pieces of text to write at the head of the output.
# start_line is a pair in which the first element is a datetime
Tested that with four runs on powerpc64-linux, four configs each time;
test-summary
shows the same in all cases. Many lines have moved compared to without
the patch, but that cannot be helped. Okay for mainline?
Ok.
Post by Segher Boessenkool
I also did some timings for make -j60 -k check, same -m64,-m32,-m32/-mpowerpc64,
-m64/-mlra configs. A run takes 65m, is effectively 42x parallel, and has 15%
system time.
Thanks for the work and for the timings.
Richard Sandiford
2014-09-23 15:33:19 UTC
Permalink
Post by Segher Boessenkool
Post by Segher Boessenkool
I am testing a patch that is just
diff --git a/contrib/dg-extract-results.py b/contrib/dg-extract-results.py
index cccbfd3..3781423 100644
--- a/contrib/dg-extract-results.py
+++ b/contrib/dg-extract-results.py
self.tool_re = re.compile (r'^\t\t=== (.*) tests ===$')
self.result_re = re.compile (r'^(PASS|XPASS|FAIL|XFAIL|UNRESOLVED'
r'|WARNING|ERROR|UNSUPPORTED|UNTESTED'
- r'|KFAIL):\s*(\S+)')
+ r'|KFAIL):\s*(.+)')
self.completed_re = re.compile (r'.* completed at (.*)')
# Pieces of text to write at the head of the output.
# start_line is a pair in which the first element is a datetime
Tested that with four runs on powerpc64-linux, four configs each time;
test-summary
shows the same in all cases. Many lines have moved compared to without
the patch, but that cannot be helped. Okay for mainline?
contrib/
* dg-extract-results.py (Prog.result_re): Include options in test name.
FWIW, the \S+ thing was deliberate. When one test is run multiple times
with different options, those options aren't necessarily tried in
alphabetical order. The old sh/awk script therefore used just the test
name as the key and kept tests with the same name in the order that
they were encountered:

/^(PASS|XPASS|FAIL|XFAIL|UNRESOLVED|WARNING|ERROR|UNSUPPORTED|UNTESTED|KFAIL):/ {
testname=\$2
# Ugly hack for gfortran.dg/dg.exp
if ("$TOOL" == "gfortran" && testname ~ /^gfortran.dg\/g77\//)
testname="h"testname
}

(note the "$2"). This means that the output of the script is in the same
order as it would be for non-parallel runs. I was following (or trying
to follow) that behaviour in the python script.

Your patch instead sorts based on the full test name, including options,
which means that the output no longer matches what you'd get from a
non-parallel run. AFAICT, it also no longer matches what you'd get from
the .sh version. That might be OK, just thought I'd mention it.

Thanks,
Richard
Jakub Jelinek
2014-09-23 15:42:50 UTC
Permalink
Post by Richard Sandiford
FWIW, the \S+ thing was deliberate. When one test is run multiple times
with different options, those options aren't necessarily tried in
alphabetical order. The old sh/awk script therefore used just the test
name as the key and kept tests with the same name in the order that
they were encountered:
/^(PASS|XPASS|FAIL|XFAIL|UNRESOLVED|WARNING|ERROR|UNSUPPORTED|UNTESTED|KFAIL):/ {
testname=\$2
# Ugly hack for gfortran.dg/dg.exp
if ("$TOOL" == "gfortran" && testname ~ /^gfortran.dg\/g77\//)
testname="h"testname
}
(note the "$2"). This means that the output of the script is in the same
order as it would be for non-parallel runs. I was following (or trying
to follow) that behaviour in the python script.
My understanding was that the sh version sorts by the testcase name followed
by the full line and then removes whatever has been there before the
PASS/XPASS etc., so while e.g. whether some test PASSed or FAILed
is then more important than the option, if two tests PASS, the options are
still used for the sorting. Note that before the parallelization changes,
usually the same test filename would be run all by a single runtest
instance, so it really didn't matter that much.
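A rough Python model of that reading (my sketch, not the actual .sh script): sort on (testcase name, full line), so the options still decide the order between two PASSes of the same test:

```python
# Hypothetical model: key on the testcase name first, then on the
# full line, so for two PASSes of the same test the options order them.
lines = [
    'PASS: asm-1.c -mstackrealign (test for excess errors)',
    'PASS: asm-1.c -mno-stackrealign (test for excess errors)',
]

def key(line):
    # line.split()[1] is the testcase name after the 'PASS:' prefix
    return (line.split()[1], line)

for line in sorted(lines, key=key):
    print(line)
# -mno-stackrealign sorts before -mstackrealign
```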
Post by Richard Sandiford
Your patch instead sorts based on the full test name, including options,
which means that the output no longer matches what you'd get from a
non-parallel run. AFAICT, it also no longer matches what you'd get from
the .sh version. That might be OK, just thought I'd mention it.
I'm afraid there is not enough info to reconstruct the order the serial
version has.

Jakub
Andrew MacLeod
2014-09-24 14:54:57 UTC
Permalink
Post by Richard Sandiford
Post by Segher Boessenkool
Post by Segher Boessenkool
I am testing a patch that is just
diff --git a/contrib/dg-extract-results.py b/contrib/dg-extract-results.py
index cccbfd3..3781423 100644
--- a/contrib/dg-extract-results.py
+++ b/contrib/dg-extract-results.py
self.tool_re = re.compile (r'^\t\t=== (.*) tests ===$')
self.result_re = re.compile (r'^(PASS|XPASS|FAIL|XFAIL|UNRESOLVED'
r'|WARNING|ERROR|UNSUPPORTED|UNTESTED'
- r'|KFAIL):\s*(\S+)')
+ r'|KFAIL):\s*(.+)')
self.completed_re = re.compile (r'.* completed at (.*)')
# Pieces of text to write at the head of the output.
# start_line is a pair in which the first element is a datetime
Tested that with four runs on powerpc64-linux, four configs each time;
test-summary
shows the same in all cases. Many lines have moved compared to without
the patch, but that cannot be helped. Okay for mainline?
contrib/
* dg-extract-results.py (Prog.result_re): Include options in test name.
FWIW, the \S+ thing was deliberate. When one test is run multiple times
with different options, those options aren't necessarily tried in
alphabetical order. The old sh/awk script therefore used just the test
name as the key and kept tests with the same name in the order that
they were encountered:
/^(PASS|XPASS|FAIL|XFAIL|UNRESOLVED|WARNING|ERROR|UNSUPPORTED|UNTESTED|KFAIL):/ {
testname=\$2
# Ugly hack for gfortran.dg/dg.exp
if ("$TOOL" == "gfortran" && testname ~ /^gfortran.dg\/g77\//)
testname="h"testname
}
(note the "$2"). This means that the output of the script is in the same
order as it would be for non-parallel runs. I was following (or trying
to follow) that behaviour in the python script.
Your patch instead sorts based on the full test name, including options,
which means that the output no longer matches what you'd get from a
non-parallel run. AFAICT, it also no longer matches what you'd get from
the .sh version. That might be OK, just thought I'd mention it.
Thanks,
Richard
Is this supposed to be resolved now? I'm still seeing some issues with a
branch cut from mainline from yesterday. This is from the following
sequence:

check out revision 215511, build, make -j16 check, make -j16 check,
then compare all the .sum files:

PASS: gcc.dg/tls/asm-1.c (test for errors, line 7)
PASS: gcc.dg/tls/asm-1.c (test for excess errors)
PASS: gcc.dg/tls/debug-1.c (test for excess errors)
PASS: gcc.dg/tls/diag-1.c (test for excess errors)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 4)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 5)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 6)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 7)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 11)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 12)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 13)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 14)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 17)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 18)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 19)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 20)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 22)

and then
PASS: gcc.dg/tls/asm-1.c (test for errors, line 7)
PASS: gcc.dg/tls/asm-1.c (test for excess errors)
PASS: gcc.dg/tls/debug-1.c (test for excess errors)
PASS: gcc.dg/tls/diag-1.c (test for excess errors)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 11)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 12)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 13)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 14)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 17)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 18)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 19)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 20)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 22)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 4)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 5)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 6)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 7)

It looks like the first run sorted by line number numerically (or just
happened to leave the run order) and the second run did the sort
alphabetically...
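That would match a plain string sort; a tiny sketch (hypothetical two-line sample) of why "line 11" jumps ahead of "line 4":

```python
# A lexicographic sort compares character by character, so 'line 11'
# comes before 'line 4' ('1' < '4'), unlike the numeric run order.
results = [
    'PASS: gcc.dg/tls/diag-2.c (test for errors, line 4)',
    'PASS: gcc.dg/tls/diag-2.c (test for errors, line 11)',
]
for line in sorted(results):
    print(line)
```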

Andrew
Segher Boessenkool
2014-09-24 16:10:39 UTC
Permalink
Post by Andrew MacLeod
Post by Richard Sandiford
Your patch instead sorts based on the full test name, including options,
which means that the output no longer matches what you'd get from a
non-parallel run. AFAICT, it also no longer matches what you'd get from
the .sh version. That might be OK, just thought I'd mention it.
With the parallelisation changes the output was in pretty random order. My
patch made that a fixed order again, albeit a different one from before.
Post by Andrew MacLeod
Is this supposed to be resolved now? I'm still seeing some issues with a
branch cut from mainline from yesterday. This is from the following
check out revision 215511, build, make -j16 check, make -j16 check,
I don't understand what exactly you did; you have left out some steps
I think?


Segher
Andrew MacLeod
2014-09-24 16:29:40 UTC
Permalink
Post by Segher Boessenkool
Post by Andrew MacLeod
Post by Richard Sandiford
Your patch instead sorts based on the full test name, including options,
which means that the output no longer matches what you'd get from a
non-parallel run. AFAICT, it also no longer matches what you'd get from
the .sh version. That might be OK, just thought I'd mention it.
With the parallelisation changes the output was in pretty random order. My
patch made that a fixed order again, albeit a different one from before.
Post by Andrew MacLeod
Is this supposed to be resolved now? I'm still seeing some issues with a
branch cut from mainline from yesterday. This is from the following
check out revision 215511, build, make -j16 check, make -j16 check,
I don't understand what exactly you did; you have left out some steps
I think?
What? No... like what? Check out a tree, basic configure and build from
scratch (./configure --verbose, make -j16 all) and then run make check
twice in a row... literally "make -j16 -i check". Nothing in between, so
the compiler and toolchain are exactly the same, and different results.
Same way I've done it forever, except I am still getting some different
results from run to run. The target is a normal build-x86_64-unknown-linux-gnu.

What I'm saying is that something still isn't getting sorted all the time
(maybe if a section wasn't split up, it doesn't sort?), or not all the
patches to fix it are in, or there is something else still amok.
Notice it isn't the options that are the problem this time... it's the
trailing line number of the test case warning. One is in numerical order,
the other is in alphabetical order.

I'm running it a third time now... we'll see if it's different from both
the others or not.

Andrew
Andrew MacLeod
2014-09-24 17:58:55 UTC
Permalink
Post by Andrew MacLeod
Post by Segher Boessenkool
Post by Andrew MacLeod
Post by Richard Sandiford
Your patch instead sorts based on the full test name, including options,
which means that the output no longer matches what you'd get from a
non-parallel run. AFAICT, it also no longer matches what you'd get from
the .sh version. That might be OK, just thought I'd mention it.
With the parallelisation changes the output was in pretty random order. My
patch made that a fixed order again, albeit a different one from before.
Post by Andrew MacLeod
Is this supposed to be resolved now? I'm still seeing some issues with a
branch cut from mainline from yesterday. This is from the following
check out revision 215511, build, make -j16 check, make -j16 check,
I don't understand what exactly you did; you have left out some steps
I think?
What? No... like what? Check out a tree, basic configure and build
from scratch (./configure --verbose, make -j16 all) and then run make
check twice in a row... literally "make -j16 -i check". Nothing in
between, so the compiler and toolchain are exactly the same, and
different results. Same way I've done it forever, except I am still
getting some different results from run to run. The target is a normal
build-x86_64-unknown-linux-gnu.
What I'm saying is that something still isn't getting sorted all the
time (maybe if a section wasn't split up, it doesn't sort?), or not all
the patches to fix it are in, or there is something else still
amok. Notice it isn't the options that are the problem this time... it's
the trailing line number of the test case warning. One is in numerical
order, the other is in alphabetical order.
I'm running it a third time now... we'll see if it's different from both
the others or not.
Andrew
Ah, interesting.

The third run has a gcc.sum that is exactly the same as the first run's,
so only the second run differs, and it seems to be from an alphabetical
sort. So runs 3 and 1 match.
The gfortran.sum from the third run is identical to the *second* run's,
but it is different from the *first* run's. So runs 2 and 3 match.

the two runs that match (2nd and 3rd run) look like:
PASS: gfortran.dg/coarray/this_image_1.f90 -fcoarray=single -O2 (test for excess errors)
PASS: gfortran.dg/coarray/this_image_1.f90 -fcoarray=single -O2 execution test
PASS: gfortran.dg/coarray/this_image_1.f90 -fcoarray=lib -O2 -lcaf_single (test for excess errors)
PASS: gfortran.dg/coarray/this_image_1.f90 -fcoarray=lib -O2 -lcaf_single execution test
PASS: gfortran.dg/coarray/this_image_2.f90 -fcoarray=single -O2 (test for excess errors)
PASS: gfortran.dg/coarray/this_image_2.f90 -fcoarray=single -O2 execution test
PASS: gfortran.dg/coarray/this_image_2.f90 -fcoarray=lib -O2 -lcaf_single (test for excess errors)
PASS: gfortran.dg/coarray/this_image_2.f90 -fcoarray=lib -O2 -lcaf_single execution test

and the odd one out (first run):
PASS: gfortran.dg/coarray/this_image_1.f90 -fcoarray=lib -O2 -lcaf_single (test for excess errors)
PASS: gfortran.dg/coarray/this_image_1.f90 -fcoarray=lib -O2 -lcaf_single execution test
PASS: gfortran.dg/coarray/this_image_1.f90 -fcoarray=single -O2 (test for excess errors)
PASS: gfortran.dg/coarray/this_image_1.f90 -fcoarray=single -O2 execution test
PASS: gfortran.dg/coarray/this_image_2.f90 -fcoarray=lib -O2 -lcaf_single (test for excess errors)
PASS: gfortran.dg/coarray/this_image_2.f90 -fcoarray=lib -O2 -lcaf_single execution test
PASS: gfortran.dg/coarray/this_image_2.f90 -fcoarray=single -O2 (test for excess errors)
PASS: gfortran.dg/coarray/this_image_2.f90 -fcoarray=single -O2 execution test

It looks like the first run was sorted, and the other two weren't.

There must be some condition under which we don't sort the results, or
another place which needs to be tweaked to do the sort as well...?

Andrew
Andrew MacLeod
2014-09-25 12:22:29 UTC
Permalink
Post by Andrew MacLeod
Ah, interesting.
The third run has a gcc.sum that is exactly the same as the first run's,
so only the second run differs, and it seems to be from an
alphabetical sort. So runs 3 and 1 match.
The gfortran.sum from the third run is identical to the *second* run's,
but it is different from the *first* run's. So runs 2 and 3 match.
PASS: gfortran.dg/coarray/this_image_1.f90 -fcoarray=single -O2 (test for excess errors)
PASS: gfortran.dg/coarray/this_image_1.f90 -fcoarray=single -O2 execution test
PASS: gfortran.dg/coarray/this_image_1.f90 -fcoarray=lib -O2 -lcaf_single (test for excess errors)
PASS: gfortran.dg/coarray/this_image_1.f90 -fcoarray=lib -O2 -lcaf_single execution test
PASS: gfortran.dg/coarray/this_image_2.f90 -fcoarray=single -O2 (test for excess errors)
PASS: gfortran.dg/coarray/this_image_2.f90 -fcoarray=single -O2 execution test
PASS: gfortran.dg/coarray/this_image_2.f90 -fcoarray=lib -O2 -lcaf_single (test for excess errors)
PASS: gfortran.dg/coarray/this_image_2.f90 -fcoarray=lib -O2 -lcaf_single execution test
and the odd one out (first run):
PASS: gfortran.dg/coarray/this_image_1.f90 -fcoarray=lib -O2 -lcaf_single (test for excess errors)
PASS: gfortran.dg/coarray/this_image_1.f90 -fcoarray=lib -O2 -lcaf_single execution test
PASS: gfortran.dg/coarray/this_image_1.f90 -fcoarray=single -O2 (test for excess errors)
PASS: gfortran.dg/coarray/this_image_1.f90 -fcoarray=single -O2 execution test
PASS: gfortran.dg/coarray/this_image_2.f90 -fcoarray=lib -O2 -lcaf_single (test for excess errors)
PASS: gfortran.dg/coarray/this_image_2.f90 -fcoarray=lib -O2 -lcaf_single execution test
PASS: gfortran.dg/coarray/this_image_2.f90 -fcoarray=single -O2 (test for excess errors)
PASS: gfortran.dg/coarray/this_image_2.f90 -fcoarray=single -O2 execution test
It looks like the first run was sorted, and the other two weren't.
There must be some condition under which we don't sort the results, or
another place which needs to be tweaked to do the sort as well...?
Andrew
So to be fair, I could use test_summary, but I think the concern is
warranted because if this inconsistent ordering can happen to PASSes, I
would expect the same non-deterministic behaviour if those tests happen
to FAIL. We just have far fewer FAILs, so we aren't seeing it with
test_summary at the moment...

Aggregating all my .sum files, I see a sampling of about 257,000 PASSes,
whereas I see a total of 141 FAILs. FAILs only account for < 0.06% of
the output. (I'm getting an average of about 510 mis-ordered PASSes, so
it only affects a small portion of them as well.)

I would think the output of .sum needs to be consistent from one run to
the next in order for test_summary to consistently report its results as
well.

Andrew
Segher Boessenkool
2014-09-25 17:02:32 UTC
Permalink
Post by Andrew MacLeod
So to be fair, I could use test_summary, but I think the concern is
warranted because if this inconsistent ordering can happen to PASSes, I
would expect the same non-deterministic behaviour if those tests happen
to FAIL. We just have far fewer FAILs, so we aren't seeing it with
test_summary at the moment...
Aggregating all my .sum files, I see a sampling of about 257,000 PASSes,
whereas I see a total of 141 FAILs. FAILs only account for < 0.06% of
the output. (I'm getting an average of about 510 mis-ordered PASSes, so
it only affects a small portion of them as well.)
0.24% here (2241 FAILs, 917715 PASSes).

You're seeing about 1 in 500 misordered, so if it were independent (which
of course it is not) I should see it in the FAILs already.
Post by Andrew MacLeod
I would think the output of .sum needs to be consistent from one run to
the next in order for test_summary to consistently report its results as
well.
Yes. There also is the problem of the summaries being messed up (which
they were already before the parallelisation changes, but now the result
is much worse).

I'll have another look.


Segher
Segher Boessenkool
2014-10-02 16:47:39 UTC
Permalink
Post by Andrew MacLeod
Is this supposed to be resolved now? I'm still seeing some issues with a
branch cut from mainline from yesterday.
Confirmed. The following patch works for me, and Andrew has tested it
as well. The comment it removes isn't valid before the patch either.

Okay for mainline?


Segher


2014-10-02 Segher Boessenkool <***@kernel.crashing.org>

contrib/
* dg-extract-results.py (output_variation): Always sort if do_sum.


diff --git a/contrib/dg-extract-results.py b/contrib/dg-extract-results.py
index fafd38e..7db5e64 100644
--- a/contrib/dg-extract-results.py
+++ b/contrib/dg-extract-results.py
@@ -495,15 +495,7 @@ class Prog:
key = attrgetter ('name')):
sys.stdout.write ('Running ' + harness.name + ' ...\n')
if self.do_sum:
- # Keep the original test result order if there was only
- # one segment for this harness. This is needed for
- # unsorted.exp, which has unusual test names. Otherwise
- # sort the tests by test filename. If there are several
- # subtests for the same test filename (such as 'compilation',
- # 'test for excess errors', etc.) then keep the subtests
- # in the original order.
- if len (harness.segments) > 1:
- harness.results.sort()
+ harness.results.sort()
for (key, line) in harness.results:
sys.stdout.write (line)
else:
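A toy model (mine, not the script itself) of why the removed condition made the order depend on the -j level: how many segments a harness is split into varies with the parallelism, and only multi-segment harnesses were sorted:

```python
# Toy model of the removed condition: results were sorted only when a
# harness had been split into more than one segment, so the output
# order changed with the degree of parallelism.
results = [('b.c', 'PASS: b.c\n'), ('a.c', 'PASS: a.c\n')]

def old_output(segments, results):
    results = list(results)
    if len(segments) > 1:  # the condition the patch removes
        results.sort()
    return [line for key, line in results]

print(old_output(['seg1'], results))          # run order kept
print(old_output(['seg1', 'seg2'], results))  # sorted by test name
```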
Jakub Jelinek
2014-10-02 17:05:03 UTC
Permalink
Post by Segher Boessenkool
Post by Andrew MacLeod
Is this supposed to be resolved now? I'm still seeing some issues with a
branch cut from mainline from yesterday.
Confirmed. The following patch works for me, and Andrew has tested it
as well. The comment it removes isn't valid before the patch either.
Okay for mainline?
Segher
contrib/
* dg-extract-results.py (output_variation): Always sort if do_sum.
Ok, thanks.
Post by Segher Boessenkool
--- a/contrib/dg-extract-results.py
+++ b/contrib/dg-extract-results.py
sys.stdout.write ('Running ' + harness.name + ' ...\n')
- # Keep the original test result order if there was only
- # one segment for this harness. This is needed for
- # unsorted.exp, which has unusual test names. Otherwise
- # sort the tests by test filename. If there are several
- # subtests for the same test filename (such as 'compilation',
- # 'test for excess errors', etc.) then keep the subtests
- # in the original order.
- harness.results.sort()
+ harness.results.sort()
sys.stdout.write (line)
Jakub
Richard Sandiford
2014-10-02 17:46:19 UTC
Permalink
Post by Segher Boessenkool
Post by Andrew MacLeod
Is this supposed to be resolved now? I'm still seeing some issues with a
branch cut from mainline from yesterday.
Confirmed. The following patch works for me, and Andrew has tested it
as well. The comment it removes isn't valid before the patch either.
I get the impression from a short dismissal like that that this script
is pretty hated :-(. Remember that originally the script was trying to
make the result of combining separate .sum files the same as the .sum
file you'd get for -j1. As Jakub said upthread, that's a lost cause
with the new approach to parallel testing, but I think the comment was
valid while matching -j1 was still a goal.

Thanks,
Richard
Jakub Jelinek
2014-10-02 17:59:50 UTC
Permalink
Post by Richard Sandiford
Post by Segher Boessenkool
Post by Andrew MacLeod
Is this supposed to be resolved now? I'm still seeing some issues with a
branch cut from mainline from yesterday.
Confirmed. The following patch works for me, and Andrew has tested it
as well. The comment it removes isn't valid before the patch either.
I get the impression from a short dismissal like that that this script
is pretty hated :-(. Remember that originally the script was trying to
No, it is certainly appreciated that it sped up the processing.
Post by Richard Sandiford
make the result of combining separate .sum files the same as the .sum
file you'd get for -j1. As Jakub said upthread, that's a lost cause
with the new approach to parallel testing, but I think the comment was
valid while matching -j1 was still a goal.
I'm sorry for invalidating those assumptions. Indeed, before my recent
changes, all tests for the same testcase name were run serially by the same
job. If we wanted to preserve that property, we could e.g. store the
results of gcc_parallel_test_run_p in some tcl array with testcase as the
key, and after the
if { $gcc_runtest_parallelize_enable == 0 } {
return 1
}
test, add a check whether we've been asked about a particular testcase
already, and if so just return what we returned before. Perhaps accompany
that with lowering the granularity (e.g. from 10 to 5).
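[For illustration: a rough single-process Python model of the caching Jakub
describes. The real change would be Tcl in the testsuite; the function name,
the plain dict, and the in-process counter standing in for the testsuite's
lock-protected shared counter are all simplifications of this sketch.]

```python
GRANULARITY = 5        # Jakub suggests lowering this from 10 to 5

_seen = {}             # testcase name -> answer we gave the first time
_counter = 0           # stands in for the shared global test counter

def parallel_test_run_p(testcase, my_slot, num_slots):
    """Decide whether this job runs `testcase`; repeat earlier answers."""
    global _counter
    if testcase in _seen:          # asked before: return the cached answer
        return _seen[testcase]
    # Otherwise hand out tests in batches of GRANULARITY, round-robin.
    batch = (_counter // GRANULARITY) % num_slots
    _counter += 1
    result = (batch == my_slot)
    _seen[testcase] = result
    return result
```

With this caching, every variant of e.g. asm-1.c (-mstackrealign and
-mno-stackrealign) gets the same answer and therefore runs in the same job,
which is the property that held before the parallelization changes.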

Jakub
Richard Sandiford
2014-10-04 10:32:20 UTC
Permalink
Post by Jakub Jelinek
Post by Richard Sandiford
make the result of combining separate .sum files the same as the .sum
file you'd get for -j1. As Jakub said upthread, that's a lost cause
with the new approach to parallel testing, but I think the comment was
valid while matching -j1 was still a goal.
I'm sorry for invalidating those assumptions. Indeed, before my recent
changes, all tests for the same testcase name were run serially by the same
job. If we wanted to preserve that property, we could e.g. store the
results of gcc_parallel_test_run_p in a Tcl array with the testcase as the
key, and after the
if { $gcc_runtest_parallelize_enable == 0 } {
    return 1
}
test, add a check whether we've been asked about a particular testcase
already, and if so just return what we returned before. Perhaps accompany
that with lowering the granularity (e.g. from 10 to 5).
That sounds like it'd help if we have any lingering cases where the
text after the PASS: etc. isn't unique, since otherwise it's probably
unpredictable which one comes first in the combined summary (as it was
when the test order was still keyed only off filename). OTOH I suppose
we should just fix those tests so that the name is unique.

Also, now that even partial RUNTESTFLAGS-based testsuite runs can be
fully parallelised (very nice, thanks), what -j1 did is probably
no longer relevant anyway.

Thanks,
Richard
Mike Stump
2014-10-05 17:53:11 UTC
Permalink
Post by Richard Sandiford
we should just fix those tests so that the name is unique.
Yes. This is good in all sorts of ways.

Segher Boessenkool
2014-10-02 18:14:50 UTC
Permalink
Post by Richard Sandiford
Post by Segher Boessenkool
Post by Andrew MacLeod
Is this supposed to be resolved now? I'm still seeing some issues with a
branch cut from mainline from yesterday.
Confirmed. The following patch works for me, and Andrew has tested it
as well. The comment it removes isn't valid before the patch either.
I get the impression from a short dismissal like that that this script
is pretty hated :-(.
I meant that it isn't valid currently; it was valid before the parallelisation
patches. It would be nice if we could reconstruct the original order somehow.
Without this patch the order is different every run though, and that makes
comparing test results unworkable.


Segher
Andrew MacLeod
2014-10-02 19:04:35 UTC
Permalink
Post by Segher Boessenkool
Post by Richard Sandiford
Post by Segher Boessenkool
Post by Andrew MacLeod
Is this supposed to be resolved now? I'm still seeing some issues with a
branch cut from mainline from yesterday.
Confirmed. The following patch works for me, and Andrew has tested it
as well. The comment it removes isn't valid before the patch either.
I get the impression from a short dismissal like that that this script
is pretty hated :-(.
I meant that it isn't valid currently; it was valid before the parallelisation
patches. It would be nice if we could reconstruct the original order somehow.
Without this patch the order is different every run though, and that makes
comparing testresults unworkable.
Doesn't this patch make it always sort? And that should mean that -j1
will be the same as -jN again... ? It won't be the same order as
before the patches... but I doubt that is important... not that I'm
aware of anyway.
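[For illustration: a toy Python snippet of Andrew's point, with result lines
shortened from the .sum excerpts earlier in the thread. Once the result lines
are sorted, two runs that interleave them differently compare equal regardless
of the -j level, even though neither matches the historical -j1 order.]

```python
# Two runs of the same testsuite emit identical result lines, but the
# parallel jobs finish in a different order each time.
run_a = ["UNSUPPORTED: asm-1.c -mstackrealign",
         "UNSUPPORTED: asm-1.c -mno-stackrealign",
         "PASS: pr39146.c -mstackrealign (test for excess errors)"]
run_b = [run_a[1], run_a[0], run_a[2]]   # same lines, swapped order

assert run_a != run_b                    # a naive diff shows spurious hunks
assert sorted(run_a) == sorted(run_b)    # sorted output is run-to-run stable
```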

Andrew
Jakub Jelinek
2014-09-18 13:01:09 UTC
Permalink
Have the changes that have gone into the check parallelization made the .sum
file non-deterministic?
I'm seeing a lot of small hunks in different orders which cause my
comparison scripts to show big differences.
I haven't been paying attention to the nature of the make check changes so
I'm not sure if this is expected...
Or is this something else? It's the same code base between runs, just with a
few changes made to some include files.
I'm using contrib/test_summary and haven't seen any non-determinism in the
output of that command. As for dg-extract-results.sh, we have two versions
of that: one if you have Python 2.6 or newer, another one if you don't.
Perhaps the behavior of the two (I'm probably using the Python version)
differs?

Jakub