SSE _mm_load_ps causing segmentation faults

I have been having trouble with this toy example for learning to program with SSE intrinsics. I read in other threads here that segmentation faults with _mm_load_ps are sometimes caused by misaligned data, but I thought that should be solved by the __attribute__((__aligned__(16))) that I added. Also, when I comment out either line 23 or 24 (or both) in my code the problem goes away, but obviously this makes the code not work.

#include <xmmintrin.h>
#include <iostream>
using namespace std;

int main()
{
        float temp1[] __attribute__((__aligned__(16))) = {1.1,1.2,1.3,14.5,3.1,5.2,2.3,3.4};
        float temp2[] __attribute__((__aligned__(16))) = {1.2,2.3,3.4,3.5,1.2,2.3,4.2,2.2};
        float temp3[8];
        __m128 m, *m_result;
        __m128 arr1 = _mm_load_ps(temp1);
        __m128 arr2 = _mm_load_ps(temp2);

        m = _mm_mul_ps(arr1, arr2);
        *m_result = _mm_add_ps(m, m); 
        _mm_store_ps(temp3, *m_result); 
        for(int i = 0; i < 4; i++)
        {   
            cout << temp3[i] << endl;
        }   

        m_result++;
        arr1 = _mm_load_ps(temp1+4);
        arr2 = _mm_load_ps(temp2+4);
        m = _mm_mul_ps(arr1, arr2);
        *m_result = _mm_add_ps(m,m);
        _mm_store_ps(temp3, *m_result); 


        for(int i = 0; i < 4; i++)
        {   
            cout << temp3[i] << endl;
        }   
        return 0;
}

Line 23 is arr1 = _mm_load_ps(temp1+4). It's weird to me that I can do one or the other but not both. Any help would be appreciated, thanks!

asked Feb 12, 2014 at 08:02

You've aligned temp1 to 16 bytes but then added 4 to it pretty much ensuring that it is NOT 16 byte aligned...

I'm sorry, I don't follow, don't I have to use _mm_load_ps(temp1+4) to load in the next 4 values?

@jcoder: +4 (floats) is +16 bytes, so the loads are OK - it's the two stores that are the problem here.

...and the dangling pointer of course.

@PaulR absolutely. Please disregard my comment... I shouldn't comment until I've had at least 2 cups of coffee! My apologies for the wrong comment.

2 Answers

Your problem is that you declare a pointer __m128 *m_result but never allocate any space for it, so *m_result writes through an uninitialized pointer. Later you also do m_result++, which moves it to another address that has not been allocated either. There is no reason to use a pointer here.

#include <xmmintrin.h>                 // SSE
#include <iostream>
using namespace std;

int main()
{
        float temp1[] __attribute__((__aligned__(16))) = {1.1,1.2,1.3,14.5,3.1,5.2,2.3,3.4};
        float temp2[] __attribute__((__aligned__(16))) = {1.2,2.3,3.4,3.5,1.2,2.3,4.2,2.2};
        float temp3[8];
        __m128 m, m_result;
        __m128 arr1 = _mm_load_ps(temp1);
        __m128 arr2 = _mm_load_ps(temp2);

        m = _mm_mul_ps(arr1, arr2);
        m_result = _mm_add_ps(m, m); 
        _mm_store_ps(temp3, m_result); 
        for(int i = 0; i < 4; i++)
        {   
            cout << temp3[i] << endl;
        }   

        arr1 = _mm_load_ps(temp1+4);
        arr2 = _mm_load_ps(temp2+4);
        m = _mm_mul_ps(arr1, arr2);
        m_result = _mm_add_ps(m,m);
        _mm_store_ps(temp3, m_result); 


        for(int i = 0; i < 4; i++)
        {   
            cout << temp3[i] << endl;
        }   
        return 0;
}

answered Feb 12, 2014 at 09:02

Note also that temp3 is potentially misaligned. - Paul R

Good point. I just tried the code and it worked. That's probably because it was compiled in 64-bit mode, which aligns variables on the stack to 16 bytes. - Z boson

Yes, compiling for 64 bits can be both a blessing and a curse. Note also that you do not need SSE4.1 here - there's nothing beyond SSE2 in the code above. - Paul R

Right again. I thought he said he used _mm_dp_ps but I don't see that now. - Z boson

@PaulR, actually, there is nothing beyond SSE here. Not even SSE2 is needed. - Z boson

(1) m_result is just a wild pointer:

     __m128 m, *m_result;

Change all occurrences of *m_result to m_result and get rid of the m_result++; (m_result is just a temporary vector variable that you are subsequently storing to temp3).

(2) Your two stores are potentially misaligned, since temp3 has no guaranteed alignment - either change:

    float temp3[8];

to:

    float temp3[8] __attribute__((__aligned__(16)));

or use _mm_storeu_ps:

    _mm_storeu_ps(temp3, m_result); 
            ^^^

answered Feb 12, 2014 at 13:02

+1 since your answer is a bit better than mine. Though I think the proper term here is a wild pointer, not a dangling pointer. Wild pointers are pointers that have not been initialized to allocated memory, and dangling pointers are ones that point to memory that has already been deallocated. en.wikipedia.org/wiki/Dangling_pointer - Z boson

You're right - strictly speaking this is a wild pointer - I've always called them dangling pointers, regardless of whether they are wild or dangling, and it's a hard habit to break. I'll fix it. - Paul R
