C++: working with bytes

My problem is that I need to load a binary file and work with single bits from the file. After that I need to save it back out as bytes, of course.

My main problem is - what datatype to choose to work in - char or long int? Can I somehow work with chars?

asked Mar 9 '12 at 14:03

Btw, how long is your file? Is it really necessary to think about optimization already? And do you have to change single bits, or are the 'single bits' really chunks of bytes? -

@Deepak: Using ints to parse binary data is just asking for endianness problems. -

It depends on what operations he wants to do, ANDing 8 chars is equal to one int operation.(x64) -

@Deepak: sizeof(long int) is not always the same as sizeof(int). It's certainly not on the setup I'm typing this on. -

@Deepak: if it's the same, then why is sizeof(long int) != sizeof(int) here? -

6 Answers

Unless performance is mission-critical here, use whatever makes your code easiest to understand and maintain.

answered Mar 9 '12 at 14:03

Disregard my answer, this is rule #1 - Daramarak

+1 And do not reinvent the wheel if possible, if you do not have to work with a predefined serialization format, don't go invent one. - KillianDS

Agree, even though it is so fun reinventing the wheel. "Look, mine is squared" - Daramarak

It's possible that a clarified question could invite a more detailed recommendation. Not clear to me that this needs to be overthought from the info to hand, though. - Steve Townsend

Before beginning to code anything, make sure you understand endianness, C++ type sizes, and how strange they can be.

unsigned char is the only type that is a fixed size (the natural byte of the machine, normally 8 bits). So if you design for portability, that is a safe bet. But it isn't hard to use unsigned int or even long long to speed up the process, and use sizeof (multiplied by CHAR_BIT) to find out how many bits you are getting in each read, although the code gets more complex that way.
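As a rough illustration of working on single bits while keeping the data in unsigned char, here is a minimal sketch. The names Buffer, get_bit and set_bit are hypothetical helpers, not something from this answer, and numbering bits from the most significant bit of the first byte is just one possible convention.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Hypothetical helpers for illustration only.
// Assumption: bit 0 is the most significant bit of byte 0.
using Buffer = std::vector<unsigned char>;

// Mask selecting the requested bit inside its byte.
inline unsigned char bit_mask(std::size_t bit_index)
{
    return static_cast<unsigned char>(1u << (CHAR_BIT - 1 - bit_index % CHAR_BIT));
}

// Returns true if the bit at bit_index is set.
bool get_bit(const Buffer& data, std::size_t bit_index)
{
    assert(bit_index / CHAR_BIT < data.size());
    return (data[bit_index / CHAR_BIT] & bit_mask(bit_index)) != 0;
}

// Sets or clears the bit at bit_index.
void set_bit(Buffer& data, std::size_t bit_index, bool value)
{
    assert(bit_index / CHAR_BIT < data.size());
    unsigned char& byte = data[bit_index / CHAR_BIT];
    const unsigned char mask = bit_mask(bit_index);
    byte = static_cast<unsigned char>(value ? (byte | mask) : (byte & ~mask));
}
```

Because the buffer stays a sequence of bytes, it can be written back to the file unchanged after the bit edits, with no endianness concerns.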

You should know that for true portability none of the built-in types of C++ has a fixed size. An unsigned char might have 9 bits, and an int might be as small as 16 bits (so an unsigned int may only cover the range 0 to 65535), as noted in this and this https://www.youtube.com/watch?v=xB-eutXNUMXJtA&feature=youtu.be
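If the code does rely on particular sizes, one option is to state those assumptions so the build fails on a platform where they do not hold. The sketch below is only an illustration: storage_bits is a hypothetical helper, and the last check is an example of an assumption a particular program might add, not a guarantee of the language.

```cpp
#include <climits>
#include <cstddef>
#include <limits>

// Hypothetical helper: number of bits of storage T occupies on this platform.
template <typename T>
constexpr std::size_t storage_bits()
{
    return sizeof(T) * CHAR_BIT;
}

// Minimums that the language guarantees everywhere.
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");
static_assert(storage_bits<unsigned char>() == CHAR_BIT, "sizeof(unsigned char) is 1");
static_assert(std::numeric_limits<unsigned int>::max() >= 65535u,
              "unsigned int covers at least 0 to 65535");

// Example of a program-specific assumption; this one can fail on exotic platforms.
static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");
```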

Another alternative, as user1200129 suggests, is to use the boost integer library to reduce all these uncertainties. This is if you have boost available on your platform. Although if going for external libraries there are many serializing libraries to choose from.

But first and foremost, before you even start optimizing, make something simple that works. Then you can start profiling when you start experiencing timing issues.

answered May 23 '17 at 13:05

Yeah, the world of programming gets strange once you start exploring alien platforms ;) - Daramarak

You can use boost integer.hpp for portable int types. For example, if you need to ensure you get 64 signed bits, you can use boost::int64_t across different compilers and operating systems and you'll always get the type you expect. This is especially important when you need to reinterpret_cast data. - 01100110

It really just depends on what you are wanting to do, but I would say in general, the best speed will be to stick with the size of integers that your program is compiled in. So if you have a 32 bit program, then choose 32 bit integers, and if you have 64 bit, choose 64 bit.

This could be different if there are some bytes in your file, or if there are integers. Without knowing the exact structure of your file, it's difficult to determine what the optimal value is.

answered Mar 9 '12 at 14:03

Your sentences are not really correct English, but as far as I can interpret the question, you are better off using the unsigned char type (which is a byte) so that you can modify each byte separately.

Edit: changed according to comment.

answered Mar 9 '12 at 14:03

What's an unsigned byte? byte is an unsigned char. - MByD

Now it is somewhat proper English. :) - Profesor Falken

Since there is no definition for byte in C, you can't say if it's signed or not. - Señor lister

@Michel you edited it the wrong way round. you were looking for unsigned char. - Señor lister

Fixed (Friday Afternoon Syndrome) - michel keijzers

If you are dealing with bytes, then the best way to do this is to use a size-specific type.

#include <cstdint>
#include <fstream>
#include <iterator>
#include <vector>

int main()
{
     std::ifstream file("file_name", std::ios::binary);

     // read: istreambuf_iterator does unformatted reads, so bytes that
     // look like whitespace are kept (istream_iterator would skip them)
     std::vector<std::uint8_t> file_data((std::istreambuf_iterator<char>(file)),
                                         std::istreambuf_iterator<char>());

     // write: open in binary mode so no newline translation takes place
     std::ofstream out("outfile", std::ios::binary);
     out.write(reinterpret_cast<const char*>(file_data.data()),
               static_cast<std::streamsize>(file_data.size()));
}

EDIT fixed bug

answered Mar 9 '12 at 14:03

The uint8_t types are not guaranteed to be defined on all systems, but they state the intent much more clearly. - Daramarak

The C99 standard has been around a long time, and almost all systems have <stdint.h>. (I can't think of one that doesn't, honestly. It's one of the easiest headers ever to provide.) The C++ equivalent might not be there, but that's easily worked around. - mike de simone

If you need to enforce how many bits are in an integer type, you need to be using the <stdint.h> header. It is present in both C and C++. It defines types such as uint8_t (8-bit unsigned integer), which are guaranteed to resolve to the proper type on the platform. It also tells other programmers who read your code that the number of bits is important.

If you're worrying about performance, you might want to use the larger-than-8-bit types, such as uint32_t. However, when reading and writing files, you will need to pay attention to the endianness of your system. Notably, if you have a little endian system (e.g. x86, almost all ARM), then the 32-bit value 0x12345678 will be written to the file as the four bytes 0x78 0x56 0x34 0x12, while if you have a big endian system (e.g. Sparc, PowerPC, Cell, some ARM, and the Internet), it will be written as 0x12 0x34 0x56 0x78. (The same goes for reading.) You can, of course, work with 8-bit types and avoid this issue entirely.
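One way to sidestep the host's byte order while still working with uint32_t in memory is to fix the byte order of the file and convert with shifts. The sketch below assumes a little endian file format; append_le32 and read_le32 are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helpers: the file format is assumed to be little endian,
// whatever the byte order of the machine running the code.

// Appends v to buf as four bytes, least significant byte first.
void append_le32(std::vector<std::uint8_t>& buf, std::uint32_t v)
{
    for (int shift = 0; shift < 32; shift += 8)
        buf.push_back(static_cast<std::uint8_t>((v >> shift) & 0xFFu));
}

// Reads four bytes starting at offset and reassembles the 32-bit value.
std::uint32_t read_le32(const std::vector<std::uint8_t>& buf, std::size_t offset)
{
    assert(offset + 4 <= buf.size());
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i)
        v |= static_cast<std::uint32_t>(buf[offset + i]) << (8 * i);
    return v;
}
```

Since the shifts operate on values rather than on memory layout, the same bytes come out on little endian and big endian hosts: 0x12345678 is stored as 0x78 0x56 0x34 0x12 in both cases.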

answered Mar 9 '12 at 14:03
