Row-wise texture processing accelerated with OpenGL / OpenCL

I have a rendering step which I would like to perform on a dynamically-generated texture.

The algorithm can operate on rows independently, in parallel. For each row, the algorithm visits each pixel in left-to-right order and modifies it in situ (no distinct output buffer is needed, if that helps). Each pass uses state variables which must be reset at the beginning of each row and persist as we traverse the columns.
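For concreteness, here is a minimal CPU sketch of the traversal pattern described above. The state update (`state = 0.5f * state + pixel`) is a made-up placeholder, not the real algorithm; only the access pattern matters:

```c
#define W 4
#define H 2

/* Hypothetical per-row pass. The state update below is a placeholder
 * for the real algorithm; only the traversal pattern matters. */
static void process_rows(float img[H][W])
{
    for (int y = 0; y < H; ++y) {        /* rows are independent */
        float state = 0.0f;              /* reset at the start of each row */
        for (int x = 0; x < W; ++x) {    /* strict left-to-right scan */
            state = 0.5f * state + img[y][x];
            img[y][x] = state;           /* modified in situ */
        }
    }
}
```

The goal is to run one such row scan per GPU processing element, rather than one scan for the whole image.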

Can I set up OpenGL shaders, or OpenCL, or whatever, to do this? Please provide a minimal example with code.

asked Feb 2 '12 at 10:02

What sort of algorithm are you applying to the pixels? Random noise with some restrictions on colour? Are the pixel values related to their neighbours'? (I assume this is what the state is used for.)

Yes, pixel values are related to their neighbours'. Nothing random.

2 Answers

If you have access to GL 4.x-class hardware that implements EXT_shader_image_load_store or ARB_shader_image_load_store, I imagine you could pull it off. Otherwise, in-situ read/write of an image is generally not possible (though there are ways with NV_texture_barrier).

That being said, once you start wanting pixels to share state the way you do, you kill off most of your potential gains from parallelism. If the value you compute for a pixel is dependent on the computations of the pixel to its left, then you cannot actually execute each pixel in parallel. Which means that the only parallelism your algorithm actually has is per-row.

That's not going to buy you much.

If you really want to do this, use OpenCL. It's much friendlier to this kind of thing.
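For illustration, a minimal OpenCL kernel along these lines would launch one work-item per row (1-D global size equal to the image height) and keep the left-to-right dependency inside the work-item's loop. The `0.5f * state + ...` update is a placeholder for the real per-pixel operation, and the host-side setup (`clCreateBuffer`, `clEnqueueNDRangeKernel`, etc.) is omitted:

```c
// One work-item per row. The loop over x runs sequentially inside
// each work-item, so the left-to-right state dependency is preserved.
__kernel void process_rows(__global float *img, const int width)
{
    int y = get_global_id(0);            // this work-item's row
    float state = 0.0f;                  // reset per row
    for (int x = 0; x < width; ++x) {
        int i = y * width + x;
        state = 0.5f * state + img[i];   // placeholder update
        img[i] = state;                  // in-place write is safe here:
                                         // no other work-item touches this row
    }
}
```

No synchronization is needed because each row is owned by exactly one work-item.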

answered Feb 2 '12 at 20:02

Yes, as the question says, it's per-row parallelism. And it buys a lot: consider a 1000×1000 image — does your GPU have 1,000,000 processing units? In-situ processing isn't a requirement, it's a possibility. - Spraff

@spraff: I think you're missing Nicol's point. OpenGL does not typically process rows, it processes individual pixels. So sharing data means synchronizing pixels to some extent. OpenCL, OTOH, can process a single row in one kernel invocation. The execution model there makes it easier. - Bahbar

Ok, I've expanded the question. I would welcome an OpenCL solution, but don't know how to go about it. - Spraff

@spraff: That didn't really expand the question; I already answered that yes, you can do it with the right hardware. In both OpenGL and OpenCL. And while you might gain something from processing 1000 rows, the problem is that to share state along each row, you have to either synchronize fragment shaders with each other (difficult and slow) or process the entire row with a single fragment shader, which could run into implementation limits on the number of fetches you can do in a single shader. - Nicol Bolas

Yes, you can do it. No, you don't need 4.x hardware for that; you need fragment shaders (with flow control), framebuffer objects, and floating-point texture support.

You need to encode your data into a 2D texture.

Store the "state variable" in the first pixel of each row, and encode the rest of the data into the remaining pixels. It goes without saying that a floating-point texture format is recommended.

Use two framebuffers, and render them onto each other in a loop ("ping-pong") using a fragment shader that updates the "state variable" in the first column and performs whatever operation you need on another column, the "current" one. To reduce wasted work, you can limit rendering to the columns you want to process. The NVidia OpenGL SDK examples had "Game of Life", "GPGPU fluid", and "GPU particles" demos that work in a similar fashion — by encoding data into a texture and then using shaders to update it.
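A rough sketch of the fragment shader this describes (old-style GLSL; the texture layout is as above, with column 0 holding the per-row state, and the state/pixel updates are placeholders for the real operation). `curCol` is a uniform the host advances each ping-pong iteration:

```glsl
uniform sampler2D src;    // texture rendered by the previous pass
uniform float texWidth;   // texture width in pixels
uniform float curCol;     // column index processed in this pass

void main()
{
    vec2 uv   = gl_TexCoord[0].st;
    float x   = floor(uv.s * texWidth);  // this fragment's column
    vec4 self = texture2D(src, uv);
    // per-row state lives in column 0 of the same row
    vec4 state = texture2D(src, vec2(0.5 / texWidth, uv.t));

    if (x == 0.0) {
        // placeholder state update: fold in the pixel at curCol
        vec4 cur = texture2D(src, vec2((curCol + 0.5) / texWidth, uv.t));
        gl_FragColor = 0.5 * state + cur;
    } else if (x == curCol) {
        gl_FragColor = state;   // placeholder per-pixel operation
    } else {
        gl_FragColor = self;    // copy all other columns through unchanged
    }
}
```

Note this makes one full-screen (or column-restricted) pass per column, which is why the answer below warns it isn't guaranteed to be fast.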

However, just because you can do it doesn't mean you should, and it doesn't mean it is guaranteed to be fast. Some GPUs have very high texture memory read speed but relatively slow computation speed (or vice versa), and not all GPUs have many parallel pipelines for processing things concurrently.

Also, depending on your app, CUDA or OpenCL might be more suitable.

answered Feb 2 '12 at 21:02
