Why are my VBOs slower than display lists?

I created two simple voxel engines, literally just chunks that hold cubes. For the first one I use display lists, and I can render hundreds of chunks at 60 FPS with no problem, even though the technology behind it is years old and deprecated by now. With my VBO version, I try to render 27 chunks and suddenly drop to less than 50 FPS. What gives? I use shaders for my VBO version but not for the display list one; without shaders, the VBO version still gets the same frame rate. Here is the relevant code:

VBO

Chunk initialization:

public void initGL() {
    rand = new Random();

    sizeX = (int) pos.getX() + CHUNKSIZE;
    sizeY = (int) pos.getY() + CHUNKSIZE;
    sizeZ = (int) pos.getZ() + CHUNKSIZE;

    tiles = new byte[sizeX][sizeY][sizeZ];

    vCoords = BufferUtils.createFloatBuffer(CHUNKSIZE * CHUNKSIZE * CHUNKSIZE * (3 * 4 * 6));
    cCoords = BufferUtils.createFloatBuffer(CHUNKSIZE * CHUNKSIZE * CHUNKSIZE * (4 * 4 * 6));

    createChunk();

    // glDrawArrays counts vertices: 6 faces * 4 vertices per cube
    verticeCount = CHUNKSIZE * CHUNKSIZE * CHUNKSIZE * (4 * 6);

    vCoords.flip();
    cCoords.flip();

    vID = glGenBuffers();
    glBindBuffer(GL_ARRAY_BUFFER, vID);
    glBufferData(GL_ARRAY_BUFFER, vCoords, GL_STATIC_DRAW);
    glBindBuffer(GL_ARRAY_BUFFER, 0);

    cID = glGenBuffers();
    glBindBuffer(GL_ARRAY_BUFFER, cID);
    glBufferData(GL_ARRAY_BUFFER, cCoords, GL_STATIC_DRAW);
    glBindBuffer(GL_ARRAY_BUFFER, 0);
}
private void createChunk() {
    for (int x = (int) pos.getX(); x < sizeX; x++) {
        for (int y = (int) pos.getY(); y < sizeY; y++) {
            for (int z = (int) pos.getZ(); z < sizeZ; z++) {
                if (rand.nextBoolean()) {
                    tiles[x][y][z] = Tile.Grass.getId();
                } else {
                    tiles[x][y][z] = Tile.Void.getId();
                }
                vCoords.put(Shape.createCubeVertices(x, y, z, 1));
                cCoords.put(Shape.getCubeColors(tiles[x][y][z]));
            }
        }
    }
}
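glDrawArrays takes a number of vertices, not floats. Assuming, as the buffer sizes in initGL suggest, that each cube contributes 6 faces × 4 vertices, with 3 position floats and 4 colour floats per vertex, the counts work out as below. ChunkCounts is a hypothetical helper and 16 is an assumed CHUNKSIZE. Note that CHUNKSIZE * CHUNKSIZE * CHUNKSIZE * (4 * 4 * 6), the size used for cCoords, is the colour float count, four times the number of vertices actually stored in vCoords; passing a count that large to glDrawArrays asks the driver to read past the end of the buffers.

```java
// Hypothetical helper: per-chunk counts, assuming 6 faces * 4 vertices per cube (GL_QUADS),
// 3 position floats and 4 colour floats per vertex.
final class ChunkCounts {
    static final int FACES_PER_CUBE = 6;
    static final int VERTICES_PER_FACE = 4;   // one quad per face
    static final int POSITION_COMPONENTS = 3; // x, y, z
    static final int COLOR_COMPONENTS = 4;    // r, g, b, a

    private ChunkCounts() {}

    static int cubes(int chunkSize) {
        return chunkSize * chunkSize * chunkSize;
    }

    // The value glDrawArrays expects as its count argument.
    static int vertexCount(int chunkSize) {
        return cubes(chunkSize) * FACES_PER_CUBE * VERTICES_PER_FACE;
    }

    // Floats needed for the position buffer (vCoords).
    static int positionFloats(int chunkSize) {
        return vertexCount(chunkSize) * POSITION_COMPONENTS;
    }

    // Floats needed for the colour buffer (cCoords).
    static int colorFloats(int chunkSize) {
        return vertexCount(chunkSize) * COLOR_COMPONENTS;
    }

    public static void main(String[] args) {
        int chunkSize = 16; // assumed CHUNKSIZE
        System.out.println(vertexCount(chunkSize));    // 98304
        System.out.println(positionFloats(chunkSize)); // 294912
        System.out.println(colorFloats(chunkSize));    // 393216
    }
}
```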

And then rendering:

public void render() {
    glBindBuffer(GL_ARRAY_BUFFER, vID);
    glVertexPointer(3, GL_FLOAT, 0, 0L);

    glBindBuffer(GL_ARRAY_BUFFER, cID);
    glColorPointer(4, GL_FLOAT, 0, 0L);

    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_COLOR_ARRAY);

    shader.use();
    glDrawArrays(GL_QUADS, 0, verticeCount);
    shader.release();

    glDisableClientState(GL_COLOR_ARRAY);
    glDisableClientState(GL_VERTEX_ARRAY);
}

I know I'm using quads, and that's bad, but I'm also using quads in my display list engine. The shaders are very simple: all they do is take a color and apply it to the vertices, so I won't post them.
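GL_QUADS is not available in core-profile OpenGL, so the usual replacement is to draw each quad as two triangles. Since the chunk data already stores four vertices per face, one approach is an index buffer used with glDrawElements(GL_TRIANGLES, ...), which leaves the vertex data unchanged. A sketch, assuming each quad's four vertices are stored consecutively in a consistent winding order (QuadIndices is a hypothetical name):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

// Hypothetical helper: indices that draw every quad (v0, v1, v2, v3) as the triangles
// (v0, v1, v2) and (v2, v3, v0), for glDrawElements(GL_TRIANGLES, ...).
// Assumes each quad's 4 vertices are stored consecutively with consistent winding.
final class QuadIndices {
    private QuadIndices() {}

    static IntBuffer build(int quadCount) {
        // Direct, native-order buffer, the kind LWJGL requires for glBufferData.
        IntBuffer indices = ByteBuffer.allocateDirect(quadCount * 6 * Integer.BYTES)
                .order(ByteOrder.nativeOrder())
                .asIntBuffer();
        for (int q = 0; q < quadCount; q++) {
            int base = q * 4; // first vertex of this quad
            indices.put(base).put(base + 1).put(base + 2); // first triangle
            indices.put(base + 2).put(base + 3).put(base); // second triangle, same winding
        }
        indices.flip(); // ready to be read from position 0
        return indices;
    }
}
```

With this layout the index count passed to glDrawElements is 6 per quad, i.e. 1.5 times the vertex count.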

Display list

Initialization:

public void init() {
    rand = new Random();

    opaqueID = glGenLists(1);

    tiles = new byte[(int) lPosition.x][(int) lPosition.y][(int) lPosition.z];

    genRandomWorld();
    rebuild();
}
public void rebuild() {
    glNewList(opaqueID, GL_COMPILE);
    glBegin(GL_QUADS);
    for (int x = (int) sPosition.x; x < (int) lPosition.x; x++) {
        for (int y = (int) sPosition.y; y < (int) lPosition.y; y++) {
            for (int z = (int) sPosition.z; z < (int) lPosition.z; z++) {
                if (checkCubeHidden(x, y, z)) {
                    // check if tiles hidden. if not, add vertices to
                    // display list
                    if (type != 0) {
                        Tile.getTile(tiles[x][y][z]).getVertices(x, y, z, 1, spritesheet.getTextureCoordsX(tiles[x][y][z]), spritesheet.getTextureCoordsY(tiles[x][y][z]));
                    } else {
                        Tile.getTile(tiles[x][y][z]).getVertices(x, y, z, 1);
                    }
                }
            }
        }
    }
    glEnd();
    glEndList();
    spritesheet.bind();
}

I should note that my display list version only adds the visible cubes. That may be an unfair advantage, but it shouldn't drag the VBO version down to that frame rate with just 27 chunks, versus 500 chunks in the display list version. I render like this:

public void render() {
    if (tiles != null && tiles.length > 0) { // skip chunks without tile data
        glEnable(GL_BLEND);
        glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
        glCallList(opaqueID);
    }
}
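In the VBO version, createChunk writes 24 vertices for every cell, including cells set to Tile.Void, so each chunk draws all CHUNKSIZE³ cubes, while the display list version only adds cubes that pass checkCubeHidden. The same kind of visibility test can be applied when filling the VBO. The sketch below only counts the faces that would be emitted, and goes one step further than a per-cube check by testing each face. It assumes tiles are indexed locally from 0, that a tile id of 0 means empty (the question does not show Tile.Void's id), and that cells outside the chunk count as empty; VisibleFaces is a hypothetical name.

```java
// Hypothetical helper: counts the cube faces worth putting in the VBO, i.e. faces of
// solid cells whose neighbour across that face is empty or outside the chunk.
// Assumptions: tiles are indexed locally from 0, and id 0 means empty (Tile.Void).
final class VisibleFaces {
    private static final byte EMPTY = 0; // assumed id of Tile.Void

    // Neighbour offsets for the six faces: -x, +x, -y, +y, -z, +z.
    private static final int[][] DIRECTIONS = {
        {-1, 0, 0}, {1, 0, 0}, {0, -1, 0}, {0, 1, 0}, {0, 0, -1}, {0, 0, 1}
    };

    private VisibleFaces() {}

    static boolean isSolid(byte[][][] tiles, int x, int y, int z) {
        if (x < 0 || y < 0 || z < 0
                || x >= tiles.length || y >= tiles[0].length || z >= tiles[0][0].length) {
            return false; // outside the chunk counts as empty, so border faces are kept
        }
        return tiles[x][y][z] != EMPTY;
    }

    static int countVisibleFaces(byte[][][] tiles) {
        int faces = 0;
        for (int x = 0; x < tiles.length; x++) {
            for (int y = 0; y < tiles[0].length; y++) {
                for (int z = 0; z < tiles[0][0].length; z++) {
                    if (!isSolid(tiles, x, y, z)) {
                        continue; // empty cells emit no geometry at all
                    }
                    for (int[] d : DIRECTIONS) {
                        if (!isSolid(tiles, x + d[0], y + d[1], z + d[2])) {
                            faces++; // a real builder would append this face's 4 vertices here
                        }
                    }
                }
            }
        }
        return faces;
    }
}
```

Each counted face would contribute 4 vertices to the buffer, so the vertex count for glDrawArrays becomes 4 × the number of visible faces instead of 24 per cell.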

So, after all of that code, I still wonder: why is my VBO version so slow? My display list version keeps its chunks in a one-dimensional list for rendering, while my VBO version uses a three-dimensional one, but I think the JVM pretty much eliminates any overhead from the extra dimensions. So, what am I doing wrong?
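If you want to rule out the three-dimensional chunk array, one common alternative is a single flat array indexed by x + sizeX * (y + sizeY * z). On the JVM a Chunk[][][] is an array of arrays of arrays, so each lookup follows extra references, although with only 27 chunks this is unlikely to explain the frame-rate gap. A minimal sketch; ChunkGrid is a hypothetical class, not part of the question's code:

```java
import java.util.function.Consumer;

// Hypothetical flat store for a 3D grid of chunks: x varies fastest, then y, then z.
final class ChunkGrid<T> {
    private final int sizeX, sizeY, sizeZ;
    private final Object[] cells;

    ChunkGrid(int sizeX, int sizeY, int sizeZ) {
        this.sizeX = sizeX;
        this.sizeY = sizeY;
        this.sizeZ = sizeZ;
        this.cells = new Object[sizeX * sizeY * sizeZ];
    }

    // Maps (x, y, z) to a position in the flat array, rejecting out-of-range coordinates.
    int index(int x, int y, int z) {
        if (x < 0 || y < 0 || z < 0 || x >= sizeX || y >= sizeY || z >= sizeZ) {
            throw new IndexOutOfBoundsException("(" + x + ", " + y + ", " + z + ")");
        }
        return x + sizeX * (y + sizeY * z);
    }

    @SuppressWarnings("unchecked")
    T get(int x, int y, int z) {
        return (T) cells[index(x, y, z)];
    }

    void set(int x, int y, int z, T chunk) {
        cells[index(x, y, z)] = chunk;
    }

    // Visits every non-null chunk in memory order, e.g. to call render() on each.
    @SuppressWarnings("unchecked")
    void forEach(Consumer<? super T> action) {
        for (Object c : cells) {
            if (c != null) {
                action.accept((T) c);
            }
        }
    }
}
```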

asked Nov 27 '13 at 01:11

I would not call display lists old technology, to be honest. They are simply a way of encapsulating the graphics API's command stream, and they still exist today in some surprising places; I can tell you from experience that they exist in the PS3's proprietary graphics API. As for why they are deprecated in OpenGL, as far as I am concerned they were never designed correctly: they can cause all sorts of state leaks and other undesirable side effects. As you have discovered, VBOs are not always faster, particularly with NVIDIA's driver implementations.

So, would you recommend just sticking with display lists? What confounds me is that I've always been told buffer objects are great for static geometry, like the geometry I'm using. The only thing I can think of that could be slowing the program down is the buffer switching. I have 27 chunks and render all of them at once, each with its own VBO. Maybe that's why it's slow? I have no idea how slow switching a buffer is, though.

Nobody really knows how slow switching a bound GL object is, because it is implementation-specific. You are definitely right, however, that binding objects has additional overhead (how much, I could not say). If you can coalesce some of these VBOs into vertex ranges within larger VBOs, you might see an improvement. Likewise, you are not using interleaved VBOs, which will often improve performance further (and this is the simpler thing to try first).

You could try interleaving your position and colour data in a single buffer. That is the usual recommendation for static data, as it gives better memory access patterns during rendering.

As SAKrisT says, use a VAO. That will encapsulate the VBO state so you won't need those extra calls. Also, interleave your vertices: put the attributes for each vertex (position, normal) next to each other, not in two separate arrays. Interleaving is kinder on the cache.
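Following the suggestion to coalesce VBOs into ranges of a larger buffer: one way is to upload all chunks' vertices into one buffer and keep each chunk's first vertex and vertex count, so the buffer is bound once and each chunk is drawn with glDrawArrays(GL_QUADS, first, count), or all of them in one call with glMultiDrawArrays. A minimal sketch of the bookkeeping only (no GL calls); ChunkRanges and Range are hypothetical names:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical bookkeeping for packing several chunks into one VBO: each chunk gets a
// (first, count) range, measured in vertices, within the shared buffer.
final class ChunkRanges {
    static final class Range {
        final int first; // index of the chunk's first vertex in the shared buffer
        final int count; // number of vertices belonging to the chunk

        Range(int first, int count) {
            this.first = first;
            this.count = count;
        }
    }

    private ChunkRanges() {}

    // Lays chunks out back to back in the order given.
    static List<Range> layout(int[] vertexCounts) {
        List<Range> ranges = new ArrayList<>(vertexCounts.length);
        int next = 0;
        for (int count : vertexCounts) {
            ranges.add(new Range(next, count));
            next += count; // the following chunk starts right after this one
        }
        return Collections.unmodifiableList(ranges);
    }

    // Total vertices the shared buffer must hold.
    static int totalVertices(List<Range> ranges) {
        int total = 0;
        for (Range r : ranges) {
            total += r.count;
        }
        return total;
    }
}
```

When filling the shared buffer, a chunk's data starts at range.first times the number of floats per vertex; at draw time the buffer is bound and the pointers are set once, then only the (first, count) pair changes per chunk.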

1 Answer

It is hard to answer such a question without the actual project and a profiler at hand, so these are theories:

  • You don't show your display list generation code in detail, so I'm assuming you are doing something like glColor(); glVertex3f(); in a loop (rather than setting the colour once and being done with it).
  • How a display list is stored is implementation-specific, but usually it becomes an interleaved array of vertex attributes, because that is much more cache-friendly (all of a vertex's attributes are packed tightly together, aligned to 16 bytes, instead of being spread a whole array apart). The VBOs you use, on the other hand, come in two non-interleaved chunks: coordinates and colours. This could cause cache-unfriendly access patterns, especially with large amounts of data.

As noted in the comments:

try interleaving your position and colour data in a single buffer. That is the usual recommendation for static data as it gives better memory access patterns during rendering. – GuyRT
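A minimal sketch of that interleaving: position (x, y, z) and colour (r, g, b, a) for each vertex are written next to each other in one buffer, giving a stride of 7 floats (28 bytes), with the colour starting 12 bytes into each vertex. The class and method names are hypothetical, and the code assumes the position and colour arrays list vertices in the same order, as vCoords and cCoords do in the question.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Hypothetical helper: builds one interleaved buffer laid out as
// [x, y, z, r, g, b, a] per vertex from separate position and colour arrays.
final class Interleave {
    static final int POSITION_FLOATS = 3;
    static final int COLOR_FLOATS = 4;
    static final int FLOATS_PER_VERTEX = POSITION_FLOATS + COLOR_FLOATS;
    static final int STRIDE_BYTES = FLOATS_PER_VERTEX * Float.BYTES;      // 28
    static final long COLOR_OFFSET_BYTES = POSITION_FLOATS * Float.BYTES; // 12

    private Interleave() {}

    static FloatBuffer interleave(float[] positions, float[] colors) {
        if (positions.length % POSITION_FLOATS != 0) {
            throw new IllegalArgumentException("positions must hold 3 floats per vertex");
        }
        int vertices = positions.length / POSITION_FLOATS;
        if (colors.length != vertices * COLOR_FLOATS) {
            throw new IllegalArgumentException("colours must hold 4 floats per vertex");
        }
        // Direct, native-order buffer, the kind LWJGL expects for glBufferData.
        FloatBuffer out = ByteBuffer.allocateDirect(vertices * STRIDE_BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
        for (int v = 0; v < vertices; v++) {
            out.put(positions, v * POSITION_FLOATS, POSITION_FLOATS); // x, y, z
            out.put(colors, v * COLOR_FLOATS, COLOR_FLOATS);          // r, g, b, a
        }
        out.flip(); // ready for upload from position 0
        return out;
    }
}
```

With that single buffer uploaded and bound, the two pointer calls in the question's render() would become glVertexPointer(3, GL_FLOAT, 28, 0L) and glColorPointer(4, GL_FLOAT, 28, 12L), so each chunk needs one buffer bind instead of two.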

answered Jul 10 '14 at 06:07
