Nested loop on GPU with OpenACC gives a segmentation fault

There are weeks when I lose my mind with OpenACC! For example, I have to parallelize this code:

float temp;
for( i_v = 0; i_v < ntv; i_v++ )
{
    temp = 0;

    for( i_el = 0; i_el < len_tv; i_el++ )
        temp += pow( tva[i_v*len_tv + i_el], (float)2.0 );

    tv_sq[i_v]=temp;
}    

To do that I have tried several approaches, the latest after reading a text from the Nvidia Technology Group that explains very well how gang and vector work. So I tried:

float temp;
#pragma acc data copyin(tva[:nfa]) copyout(tv_sq[:ntv]) create(temp)
{
    #pragma acc kernels loop independent 
        for( i_v = 0; i_v < ntv; i_v++ )
        {
            temp = 0;

            #pragma acc loop independent gang vector reduction(+:temp)
                for( i_el = 0; i_el < len_tv; i_el++ )
                    temp += pow( tva[i_v*len_tv + i_el], (float)2.0 );

            tv_sq[i_v]=temp;
        }
}

The compiler reported:

 710, Generating create(temp)
     Generating copyout(tv_sq[0:ntv])
     Generating copyin(tva[0:nfa])
712, Loop is parallelizable
     Accelerator kernel generated
    712, #pragma acc loop gang /* blockIdx.x */
    716, #pragma acc loop vector(128) /* threadIdx.x */
712, Generating present_or_copyout(tv_sq[0:ntv])
     Generating present_or_copyin(tva[0:nfa])
     Generating NVIDIA code
     Generating compute capability 3.5 binary
716, Loop is parallelizable

but when I run it I get:

Segmentation fault (core dumped)

Running it under cuda-memcheck gives the same result: it reports only the segmentation fault. Previously I had written the pragmas with `parallel` instead of `kernels`, but then I got "Invalid global write of size 8 (...) Out of bounds" errors.

Clearly there is something I haven't understood, so if somebody can tell me how to parallelize this code I'll try to understand my mistakes.

Thank you very, very much.

asked Feb 14 '14 at 01:02

A seg fault almost always indicates a problem in host code, not accelerator code. You probably haven't shown enough of what you are doing. How is nfa set, for example? What version of the PGI tools are you using? When I build a simple app (only modifying nfa) around the code you have shown, I get good results. If you want help, please provide a complete application, just as I have shown, along with the compile command and output, and the results from running it, just as I have shown.

Discard the pragmas and check whether you still get the segfault. To do this, compile the source code with gcc/g++.
