What we have now:
1) no OpenCL support for HD4xxx cards in new drivers
2) no OpenCL support in Windows XP with new drivers
3) the same erratic elapsed times when running on a host whose CPU is not idle, under the new drivers too.
So maybe it would have been better not to pursue the "great goals" 1) and 2), and instead just fix issue 3), which has remained untouched for more than half a year already?
AMD people, what are your opinions?
Looks like AMD staff missed the question.
Well, this picture illustrates the situation I am talking about.
Here one can see the results of a few test runs of exactly the same workload. Two runs were done with the CPU idle (a C-60 APU device was used for this bench). Another two runs were done with the CPU busy with idle-priority processes. This is very important: the CPU processes had idle priority, while the priority of the GPU app's process is "below normal". What do you think, is the difference between the pairs of green dots acceptable? For the first dot it corresponds to a performance difference of more than an order of magnitude.
The X-axis represents the domain size for some kernels in the app. The bigger the parameter, the bigger the corresponding kernels.
And an additional observation to think about: long elapsed times usually correspond to smaller CPU times. Because the app code does exactly the same thing in every run, I attribute this CPU time difference to different synching modes being used inside the driver. So, just as in another recent post of mine, there is some fundamental issue in the host<->GPU synching implementation in recent drivers. It manifests itself sometimes as a big change in performance (as shown here) or as greatly increased CPU time consumption (as shown here: http://devgurus.amd.com/message/1282663#1282663).
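To make the elapsed-time vs CPU-time observation concrete, here is a minimal sketch of how the two wait styles show up in those two numbers. It is not driver code and does not touch OpenCL at all: `measure`, `busy_wait` and `blocking_wait` are hypothetical names, and `time.sleep` only stands in for an interrupt-driven wait on the GPU.

```python
import time

def measure(wait_fn, duration):
    """Return (elapsed_seconds, cpu_seconds) spent inside wait_fn(duration)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    wait_fn(duration)
    cpu = time.process_time() - cpu_start
    elapsed = time.perf_counter() - wall_start
    return elapsed, cpu

def busy_wait(duration):
    # Poll the clock in a tight loop: reacts quickly, but occupies a CPU core.
    deadline = time.perf_counter() + duration
    while time.perf_counter() < deadline:
        pass

def blocking_wait(duration):
    # Give the CPU up until the OS wakes us (assumption: a stand-in for an
    # interrupt-driven wait; the real driver mechanism is not known here).
    time.sleep(duration)

if __name__ == "__main__":
    for name, fn in (("busy-wait", busy_wait), ("blocking", blocking_wait)):
        elapsed, cpu = measure(fn, 0.2)
        print(f"{name:10s} elapsed={elapsed:.3f}s cpu={cpu:.3f}s")
```

A busy-waiting host thread accumulates CPU time for as long as it is scheduled, while a blocked one accumulates almost none, so logging both numbers per run is a cheap way to guess which mode was in effect.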
Maybe it would be useful to expose some control over the sync mode in the OpenCL runtime API, via an extension or a runtime call? Busy-wait or interrupt-driven...
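Just to illustrate what such a control could look like, here is a rough sketch of a wait routine with a selectable mode. Everything in it is hypothetical (the `wait_for` name, the mode strings, the `spin_budget` parameter); no such OpenCL call exists as far as I know, and a `threading.Event` only plays the role of the "GPU finished" signal.

```python
import threading
import time

def wait_for(event, mode="hybrid", spin_budget=0.001, timeout=5.0):
    """Wait until `event` is set; return True if it was set before `timeout`.

    mode (all hypothetical):
      "spin"   - busy-wait the whole time
      "block"  - sleep until signalled (interrupt-like)
      "hybrid" - busy-wait for `spin_budget` seconds, then block
    """
    if mode not in ("spin", "block", "hybrid"):
        raise ValueError("unknown mode: %r" % (mode,))
    start = time.perf_counter()
    if mode in ("spin", "hybrid"):
        limit = timeout if mode == "spin" else min(spin_budget, timeout)
        while time.perf_counter() - start < limit:
            if event.is_set():
                return True
        if mode == "spin":
            return event.is_set()
    # Block for whatever is left of the timeout.
    remaining = max(0.0, timeout - (time.perf_counter() - start))
    return event.wait(remaining)

if __name__ == "__main__":
    for mode in ("spin", "block", "hybrid"):
        done = threading.Event()
        threading.Timer(0.05, done.set).start()  # pretend the GPU finishes after 50 ms
        print(mode, wait_for(done, mode=mode))
```

The hybrid mode is the usual compromise: short kernels complete inside the spin window with low latency, and long ones do not burn a core while waiting.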
I hope to hear some thoughts on this topic from AMD devs....