Intel MKL matrix product performance with double, float and int data types

I'm experimenting with the Intel MKL library for matrix multiplication through the Boost::uBLAS interface it provides (by including mkl_boost_ublas_matrix_prod.hpp). My data are just integers, so I tried changing my matrix template type to int, and performance dropped dramatically, apparently because the code then used only a single CPU core instead of the 12 I have available. I couldn't find anything in the MKL docs explaining why the integer case doesn't use MKL's OpenMP multithreading (my guess is that it isn't using MKL at all?).
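
As a side check (not something shown in the post above), a minimal sketch that queries the thread counts reported by MKL and the OpenMP runtime can confirm whether the single-core behaviour comes from a threading setting or from MKL being bypassed entirely:

#include <mkl.h>      // mkl_get_max_threads()
#include <omp.h>      // omp_get_max_threads()
#include <cstdio>

int main() {
    // Print the number of threads each runtime says it will use.
    // If MKL reports 12 here but the int case still runs on one core,
    // the multiplication is not going through MKL at all.
    std::printf("MKL threads:    %d\n", mkl_get_max_threads());
    std::printf("OpenMP threads: %d\n", omp_get_max_threads());
    return 0;
}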

Furthermore, doubles take roughly twice as long as floats.

QUESTIONS:

  1. Why the disparity between floats and doubles?
  2. Why can't I use integers?

Here are my results from the code below:

  matrix<float>(10000x10000):  13 seconds (12 threads used)
 matrix<double>(10000x10000):  26 seconds (12 threads used)
    matrix<int>(10000x10000):  >1000 seconds (1 thread used, stopped early)
  matrix<float>(25000x25000): 187 seconds (12 threads used)
 matrix<double>(25000x25000): 401 seconds (12 threads used)

Code Used (replace both matrix< type > lines as required):

#include <boost/numeric/ublas/matrix.hpp>
#include <mkl_boost_ublas_matrix_prod.hpp>

using namespace boost::numeric::ublas;

void benchmark() {

    int size = 10000;
    matrix<float> m(size, size);   // change <float> to <double> or <int> as needed
    for (int i = 0; i < size; ++i) {
        for (int j = 0; j < size; ++j) {
            m(i,j) = 2*i-j;        // arbitrary fill pattern
        }
    }
    matrix<float> r(size, size);   // result matrix, same element type as m
    r = prod(m,m);                 // dispatched to MKL ?GEMM for float/double element types
}

int main(int argc, char *argv[]) {
    benchmark();
    return 0;
}
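
The code above doesn't include any timing. One way to measure the prod call itself, sketched here with omp_get_wtime() from the OpenMP runtime that is already linked in (the smaller matrix size is just for a quick check, not one of the sizes benchmarked above):

#include <boost/numeric/ublas/matrix.hpp>
#include <mkl_boost_ublas_matrix_prod.hpp>
#include <omp.h>
#include <cstdio>

using namespace boost::numeric::ublas;

int main() {
    const int size = 1000;                 // smaller size for a quick sanity check
    matrix<float> m(size, size), r(size, size);
    for (int i = 0; i < size; ++i)
        for (int j = 0; j < size; ++j)
            m(i, j) = 2 * i - j;

    double t0 = omp_get_wtime();           // wall-clock timer from the OpenMP runtime
    r = prod(m, m);                        // time only the multiplication
    double t1 = omp_get_wtime();
    std::printf("prod took %.2f s\n", t1 - t0);
    return 0;
}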

Compiled with:

 g++ Flags: -std=c++0x -O3 -DNDEBUG -DMKL_ILP64  -m64 -msse4.2 -march=native -mtune=native
 ld Flags:  -lmkl_intel_ilp64 -lmkl_gnu_thread -lmkl_core -fopenmp -lpthread -lm
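
For reference, one way these flags could be combined into a single g++ invocation (the file name benchmark.cpp and the MKLROOT-based include/library paths are assumptions, not taken from the post):

 g++ -std=c++0x -O3 -DNDEBUG -DMKL_ILP64 -m64 -msse4.2 -march=native -mtune=native \
     -I${MKLROOT}/include benchmark.cpp \
     -L${MKLROOT}/lib/intel64 -lmkl_intel_ilp64 -lmkl_gnu_thread -lmkl_core \
     -fopenmp -lpthread -lm -o benchmark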

Processor:

 Two Intel Xeon E7530 CPUs (6 cores each) with HyperThreading.

MKL doesn't use the hyperthreads (Intel says they wouldn't help here), so I have 12 threads available, not 24.

asked Jul 31 '12 at 15:07

1 Answer

Why the disparity between floats and doubles?

Modern CPUs use vector (SIMD) instructions to perform floating-point arithmetic. These instructions operate on fixed-width registers: a 128-bit SSE register holds 2 doubles or 4 floats. Each core of an Intel Xeon E7530 can execute two 128-bit adds or multiplies per cycle, which works out to 4 double or 8 float operations per cycle. Float throughput is therefore twice that of double, which matches the roughly 2x difference in your timings.
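
To make the register-width difference concrete, here is a small stand-alone sketch (not from the original answer) using SSE intrinsics, which the E7530 supports: one 128-bit add processes 4 floats but only 2 doubles.

#include <immintrin.h>   // SSE intrinsics (the E7530 supports up to SSE4.2)
#include <cstdio>

int main() {
    __m128  f = _mm_set_ps(4.f, 3.f, 2.f, 1.f);   // 4 single-precision lanes in one 128-bit register
    __m128d d = _mm_set_pd(2.0, 1.0);             // only 2 double-precision lanes fit

    f = _mm_add_ps(f, f);   // one instruction: 4 float additions
    d = _mm_add_pd(d, d);   // one instruction: 2 double additions

    float  fout[4];
    double dout[2];
    _mm_storeu_ps(fout, f);
    _mm_storeu_pd(dout, d);

    std::printf("floats:  %g %g %g %g\n", fout[0], fout[1], fout[2], fout[3]);
    std::printf("doubles: %g %g\n", dout[0], dout[1]);
    return 0;
}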

Why can't I use integers?

The template specializations in the uBLAS examples map matrix multiplication on float and double matrices to MKL's SGEMM and DGEMM routines. When you change the matrix template argument from float/double to int, Boost falls back to its own single-threaded reference implementation of matrix multiplication, because MKL does not provide integer matrix multiplication.
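
If your integers are small enough that every product and partial sum is exactly representable in a double (53-bit mantissa), one possible workaround, not mentioned in the original answer, is to promote to double, call MKL's DGEMM directly, and round back. A rough sketch (the helper name and row-major layout are arbitrary choices):

#include <mkl.h>      // cblas_dgemm
#include <vector>
#include <cmath>

// Hypothetical helper: multiply two n x n int matrices (row-major) via MKL DGEMM.
// Exact only while every value and partial sum fits in a double's 53-bit mantissa.
void int_gemm_via_dgemm(const std::vector<int>& a,
                        const std::vector<int>& b,
                        std::vector<int>& c,
                        int n)
{
    std::vector<double> da(a.begin(), a.end());   // promote inputs to double
    std::vector<double> db(b.begin(), b.end());
    std::vector<double> dc(static_cast<std::size_t>(n) * n);

    // C = 1.0 * A * B + 0.0 * C, all matrices n x n, row-major
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n, n, n,
                1.0, da.data(), n,
                     db.data(), n,
                0.0, dc.data(), n);

    c.resize(dc.size());
    for (std::size_t i = 0; i < dc.size(); ++i)
        c[i] = static_cast<int>(std::llround(dc[i]));  // round the result back to int
}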

answered Aug 1 '12 at 11:08
