SSE _mm_load_ps causing segmentation faults
Viewed 3,466 times
3
So I have been having trouble with this toy example for learning to program with SSE intrinsics. I read in other threads here that segmentation faults with the _mm_load_ps function are sometimes caused by misaligned data, but I thought that should be solved by the __attribute__((__aligned__(16))) thing that I did. Also, when I comment out either line 23 or 24 (or both) in my code the problem goes away, but obviously this makes the code not work.
#include <iostream>
using namespace std;
int main()
{
float temp1[] __attribute__((__aligned__(16))) = {1.1,1.2,1.3,14.5,3.1,5.2,2.3,3.4};
float temp2[] __attribute__((__aligned__(16))) = {1.2,2.3,3.4,3.5,1.2,2.3,4.2,2.2};
float temp3[8];
__m128 m, *m_result;
__m128 arr1 = _mm_load_ps(temp1);
__m128 arr2 = _mm_load_ps(temp2);
m = _mm_mul_ps(arr1, arr2);
*m_result = _mm_add_ps(m, m);
_mm_store_ps(temp3, *m_result);
for(int i = 0; i < 4; i++)
{
cout << temp3[i] << endl;
}
m_result++;
arr1 = _mm_load_ps(temp1+4);
arr2 = _mm_load_ps(temp2+4);
m = _mm_mul_ps(arr1, arr2);
*m_result = _mm_add_ps(m,m);
_mm_store_ps(temp3, *m_result);
for(int i = 0; i < 4; i++)
{
cout << temp3[i] << endl;
}
return 0;
}
Line 23 is arr1 = _mm_load_ps(temp1+4). It's weird to me that I can do one or the other but not both. Any help would be appreciated, thanks!
2 Answers
6
Your problem is that you declare a pointer __m128 *m_result but you never allocate any space for it, so *m_result writes through an uninitialized pointer. Later you also do m_result++, which moves it to another memory address that has not been allocated either. There is no reason to use a pointer here.
#include <xmmintrin.h> // SSE
#include <iostream>
using namespace std;
int main()
{
float temp1[] __attribute__((__aligned__(16))) = {1.1,1.2,1.3,14.5,3.1,5.2,2.3,3.4};
float temp2[] __attribute__((__aligned__(16))) = {1.2,2.3,3.4,3.5,1.2,2.3,4.2,2.2};
float temp3[8];
__m128 m, m_result;
__m128 arr1 = _mm_load_ps(temp1);
__m128 arr2 = _mm_load_ps(temp2);
m = _mm_mul_ps(arr1, arr2);
m_result = _mm_add_ps(m, m);
_mm_store_ps(temp3, m_result);
for(int i = 0; i < 4; i++)
{
cout << temp3[i] << endl;
}
arr1 = _mm_load_ps(temp1+4);
arr2 = _mm_load_ps(temp2+4);
m = _mm_mul_ps(arr1, arr2);
m_result = _mm_add_ps(m,m);
_mm_store_ps(temp3, m_result);
for(int i = 0; i < 4; i++)
{
cout << temp3[i] << endl;
}
return 0;
}
Answered 12 Feb 14, 09:02
Note also that temp3 is potentially misaligned. - Paul R
Good point. I just tried the code and it worked. That's probably because it was compiled in 64-bit mode, which aligns variables on the stack to 16 bytes. - Z boson
Yes, compiling for 64 bits can be both a blessing and a curse. Note also that you do not need SSE4.1 here - there's nothing beyond SSE2 in the code above. - Paul R
Right again. I thought he said he used _mm_dp_ps but I don't see that now. - Z boson
@PaulR, actually, there is nothing beyond SSE here. Not even SSE2 is needed. - Z boson
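Whether temp3 happens to land on a 16-byte boundary depends on the compiler and target, so it can be worth checking at run time instead of guessing. A minimal sketch of such a check, where is_aligned16 is a hypothetical helper name and not part of any SSE header:

```cpp
#include <cstdint>

// Hypothetical helper: returns true if p sits on a 16-byte boundary,
// which is what _mm_load_ps and _mm_store_ps require.
bool is_aligned16(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}
```

Calling is_aligned16(temp3) on the unaligned declaration may well return true on a 64-bit build, which would explain why the code appears to work there, but nothing guarantees it.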
3
(1) m_result is just a wild pointer:
__m128 m, *m_result;
Change all occurrences of *m_result to m_result and get rid of the m_result++; (m_result is just a temporary vector variable that you are subsequently storing to temp3).
(2) Your two stores are potentially misaligned, since temp3 has no guaranteed alignment - either change:
float temp3[8];
to:
float temp3[8] __attribute__((__aligned__(16)));
or use _mm_storeu_ps:
_mm_storeu_ps(temp3, m_result);
^^^
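Putting (1) and (2) together, here is a rough sketch of how the computation might look as a loop over chunks of four floats. The function name mul_twice and its parameters are made up for illustration. It assumes the two inputs are 16-byte aligned and that n is a multiple of 4; the output may have any alignment because it uses the unaligned store.

```cpp
#include <xmmintrin.h> // SSE intrinsics
#include <cstddef>

// Hypothetical helper: out[i] = 2 * a[i] * b[i].
// Assumes a and b are 16-byte aligned and n is a multiple of 4.
// out may have any alignment because the store is unaligned.
void mul_twice(const float* a, const float* b, float* out, std::size_t n)
{
    for (std::size_t i = 0; i < n; i += 4)
    {
        __m128 va = _mm_load_ps(a + i);  // aligned load
        __m128 vb = _mm_load_ps(b + i);  // aligned load
        __m128 m  = _mm_mul_ps(va, vb);
        __m128 r  = _mm_add_ps(m, m);    // a plain value, no pointer needed
        _mm_storeu_ps(out + i, r);       // unaligned store, safe for any out
    }
}
```

Called as mul_twice(temp1, temp2, temp3, 8), this handles both halves of the arrays in one loop. If temp3 is also declared with 16-byte alignment, _mm_store_ps could be used instead of _mm_storeu_ps.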
Answered 12 Feb 14, 13:02
+1 since your answer is a bit better than mine. Though I think the proper term here is a wild pointer, not a dangling pointer. Wild pointers are pointers that have not been initialized to allocated memory, and dangling pointers are ones that point to memory that has already been deallocated. en.wikipedia.org/wiki/Dangling_pointer - Z boson
You're right - strictly speaking this is a wild pointer - I've always called them dangling pointers, regardless of whether they are wild or dangling, and it's a hard habit to break. I'll fix it. - Paul R
You've aligned temp1 to 16 bytes but then added 4 to it pretty much ensuring that it is NOT 16 byte aligned... - jcoder
I'm sorry, I don't follow, don't I have to use _mm_load_ps(temp1+4) to load in the next 4 values? - Dan
@jcoder: +4 (floats) is +16 bytes, so the loads are OK - it's the two stores that are the problem here. - Paul R
...and the dangling pointer of course. - Paul R
@PaulR absolutely. Please disregard my comment... I shouldn't comment until I've had at least 2 cups of coffee! My apologies for the wrong comment. - jcoder
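The point about +4 is easy to verify, since pointer arithmetic on a float* moves in units of sizeof(float). A small sketch, where byte_offset is a hypothetical helper and float is assumed to be 4 bytes wide as it is on x86:

```cpp
#include <cstddef>

// Hypothetical helper: number of bytes between base and base + count.
// Assumes base points into an array with at least count elements.
std::ptrdiff_t byte_offset(const float* base, std::size_t count)
{
    const char* from = reinterpret_cast<const char*>(base);
    const char* to   = reinterpret_cast<const char*>(base + count);
    return to - from;
}
```

With 4-byte floats, byte_offset(temp1, 4) is 16, so temp1+4 stays on a 16-byte boundary whenever temp1 does. An offset such as temp1+1 would move only 4 bytes and break the alignment.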