Nested loop on GPU with OpenACC gives a segmentation fault
There are weeks when OpenACC drives me crazy! For example, I need to parallelize this code:
    float temp;
    for( i_v = 0; i_v < ntv; i_v++ )
    {
        temp = 0;
        for( i_el = 0; i_el < len_tv; i_el++ )
            temp += pow( tva[i_v*len_tv + i_el], (float)2.0 );
        tv_sq[i_v]=temp;
    }
To do that I have tried several approaches, the latest after reading material from the Nvidia Technology Group that explains the roles of gang and vector very well. So I tried:
    float temp;
    #pragma acc data copyin(tva[:nfa]) copyout(tv_sq[:ntv]) create(temp)
    {
        #pragma acc kernels loop independent
        for( i_v = 0; i_v < ntv; i_v++ )
        {
            temp = 0;
            #pragma acc loop independent gang vector reduction(+:temp)
            for( i_el = 0; i_el < len_tv; i_el++ )
                temp += pow( tva[i_v*len_tv + i_el], (float)2.0 );
            tv_sq[i_v]=temp;
        }
    }
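Separately from the segfault, one thing in the code above looks suspicious: temp is declared outside the outer loop and given its own device copy with create(temp), so every outer iteration (which the loop is asserted to be independent) may end up sharing one accumulator. A minimal sketch of the usual nested gang/vector pattern, assuming tva holds ntv rows of len_tv floats in row-major order (so nfa should equal ntv * len_tv), declares the accumulator inside the outer loop so it is private per iteration, maps the outer loop to gangs and the inner loop to a vector reduction. The function name row_sq_norms is hypothetical. A compiler without OpenACC support ignores the pragmas and runs the loops serially.

```c
#include <stddef.h>

/* Hypothetical helper: tv_sq[i] = sum of squares of row i of tva.
 * Assumption: tva holds ntv rows of len_tv floats, row-major,
 * i.e. at least ntv * len_tv elements. */
void row_sq_norms(const float *restrict tva, float *restrict tv_sq,
                  int ntv, int len_tv)
{
    /* Number of elements the kernel actually reads. */
    size_t nfa = (size_t)ntv * (size_t)len_tv;

    /* Outer loop: one row per gang. */
    #pragma acc parallel loop gang copyin(tva[0:nfa]) copyout(tv_sq[0:ntv])
    for (int i_v = 0; i_v < ntv; i_v++) {
        float temp = 0.0f;   /* declared here, so private to each row */

        /* Inner loop: vector lanes cooperate on one row's sum. */
        #pragma acc loop vector reduction(+:temp)
        for (int i_el = 0; i_el < len_tv; i_el++) {
            float x = tva[(size_t)i_v * len_tv + i_el];
            temp += x * x;   /* same value as pow(x, 2.0), without the call */
        }
        tv_sq[i_v] = temp;
    }
}
```

The data clauses are placed on the compute construct here only to keep the sketch self-contained; an enclosing acc data region, as in the question, works the same way as long as nfa matches the real allocation size of tva.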
The compiler reports:
    710, Generating create(temp)
         Generating copyout(tv_sq[0:ntv])
         Generating copyin(tva[0:nfa])
    712, Loop is parallelizable
         Accelerator kernel generated
    712, #pragma acc loop gang /* blockIdx.x */
    716, #pragma acc loop vector(128) /* threadIdx.x */
    712, Generating present_or_copyout(tv_sq[0:ntv])
         Generating present_or_copyin(tva[0:nfa])
         Generating NVIDIA code
         Generating compute capability 3.5 binary
    716, Loop is parallelizable
But when I run it, I get:

    Segmentation fault (core dumped)
Running it under cuda-memcheck gives the same result: only a segmentation fault. Previously I wrote the pragmas using "parallel" rather than "kernels", but then I got "Invalid global write of size 8 (...) Out of bounds" errors.
Clearly there is something I haven't understood, so if somebody can show me how to parallelize this code, I'll try to understand my mistakes.
Many, many thanks.
A seg fault almost always indicates a problem in host code, not accelerator code. Probably you haven't shown enough of what you are doing. How is nfa set, for example? What version of PGI tools are you using? When I build a simple app (only modifying nfa) around the code you have shown, I get good results. If you want help, please provide a complete application, just as I have shown, along with the compile command and output, and results from running it, just as I have shown. - Robert Crovella

Discard the pragmas and check whether you still get the segfault. To do this, compile the source code with gcc/g++. - lashgar
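Following lashgar's suggestion and Robert Crovella's question about nfa, a host-only reference can be compiled with plain gcc to rule out the host side and to validate the GPU output. The sketch below is an assumption-laden illustration: serial_row_sq_norms and max_abs_diff are hypothetical names, and it assumes nfa is meant to be the number of floats allocated for tva. It refuses to run when nfa is smaller than ntv * len_tv, since the indexing tva[i_v*len_tv + i_el] would then read past the end of the buffer.

```c
#include <stddef.h>

/* Hypothetical host-only reference, built without any OpenACC pragmas.
 * Assumption: nfa is the number of floats allocated for tva.
 * Returns -1 (and writes nothing) if the sizes are inconsistent,
 * 0 after filling tv_sq otherwise. */
int serial_row_sq_norms(const float *tva, size_t nfa, float *tv_sq,
                        int ntv, int len_tv)
{
    if (ntv < 0 || len_tv < 0 || (size_t)ntv * (size_t)len_tv > nfa)
        return -1;   /* tva too small for ntv rows of len_tv elements */

    for (int i_v = 0; i_v < ntv; i_v++) {
        float temp = 0.0f;
        for (int i_el = 0; i_el < len_tv; i_el++) {
            float x = tva[(size_t)i_v * len_tv + i_el];
            temp += x * x;
        }
        tv_sq[i_v] = temp;
    }
    return 0;
}

/* Largest absolute difference between two result vectors of length n,
 * e.g. accelerator output versus the serial reference above. */
float max_abs_diff(const float *a, const float *b, int n)
{
    float m = 0.0f;
    for (int i = 0; i < n; i++) {
        float d = a[i] > b[i] ? a[i] - b[i] : b[i] - a[i];
        if (d > m)
            m = d;
    }
    return m;
}
```

If the serial build still crashes, the problem is in the host code (allocation sizes, uninitialized pointers) rather than in the OpenACC directives.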