Does pointer arithmetic still work outside the array?

I am always reading that pointer arithmetic is defined as long as you don't leave the bounds of the array. I am not sure I completely understand what this means and I was a little worried. Hence this question.

Suppose I start with a pointer to the beginning of an array:

int *p = (int*) malloc(4 * sizeof(int));

Now I create two new pointers that lie outside the bounds of the array:

int *q = p + 10;
int *r = p - 2;

Now the pointers q-10, q-9, ..., r+2, r+3, and so on all lie inside the bounds of the array. Are they valid? For example, is r[3] guaranteed to give the same result as p[1]?

I have done some testing and it works. But I want to know if this is covered by the usual C specifications. Specifically, I am using Visual Studio 2010, Windows, and I am programming in native C (not C++). Am I covered?

asked Aug 24 '12 at 04:08

@Chris post that as an answer -

As long as the address being dereferenced points into the block of memory, all of which is within bounds, the behavior should be defined. -

@chris If you post that as an answer, please, add some more details. I am asking the question because I already read that and it wasn't completely clear to me. -

@chris: Historically, the reason is segmented memory; while pointers might involve a segment and offset component, a compiler could do the arithmetic just on the offset component to avoid runtime cost. This makes arithmetic between pointers not in the same segment invalid. -

@R.. You should insert that last comment in your answer. -

3 Answers

What you're doing works on the implementation you're using, as well as most popular implementations, but it's not conforming C. As chris cited,

§6.5.6/8: If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined

The fact that it's undefined will probably become increasingly important in the future, with more advanced static analysis allowing compilers to turn this kind of code into fatal errors without incurring runtime cost.

By the way, the historical reason for subtracting pointers not within the same array being undefined is segmented memory (think 16-bit x86; those familiar with it will want to think of the "large" memory model). While pointers might involve a segment and offset component, a compiler could do the arithmetic just on the offset component to avoid runtime cost. This makes arithmetic between pointers not in the same segment invalid since the "high part" of the difference is lost.
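If code like the one in the question needs to stay strictly conforming, one common approach (a minimal sketch of my own, not something from this answer) is to keep the out-of-range part of the computation in plain integer arithmetic and only form pointers that land inside the allocation or one past its end:

    #include <stdlib.h>
    #include <stddef.h>

    int main(void)
    {
        int *p = malloc(4 * sizeof *p);   /* elements p[0] .. p[3] */
        if (!p)
            return 1;

        /* Instead of int *r = p - 2; (undefined), keep the "-2" as an
           integer offset and add it only once the final index is known
           to be in range. */
        ptrdiff_t offset = -2;
        int *same_as_p1 = p + (offset + 3);   /* p + 1: well defined */

        *same_as_p1 = 42;                     /* same element as p[1] */

        free(p);
        return 0;
    }

The integers can take any value; only the pointer values themselves are constrained to the array and its one-past-the-end position.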

answered Aug 24 '12 at 05:08

In addition to issues regarding x86-style segmentation, it may also be helpful for compilers, especially in some "troubleshooting" scenarios, to have each "pointer" actually hold three addresses: the start and end of the allocated region of which the pointed-to object is a part, as well as a pointer to the object itself, and to use such information to trap code which would perform invalid pointer computations. Trapping erroneous pointer computations when they occur can often greatly facilitate the tracking down of pointer-related bugs, and the Standard doesn't want to forbid that. - supercat
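To make the idea concrete, a checked pointer representation could look roughly like this (purely hypothetical; the struct and function names are made up and no real compiler is being described):

    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical "fat pointer": the address itself plus the bounds of
       the allocation it belongs to. */
    struct fat_ptr {
        int *base;   /* start of the allocated region  */
        int *limit;  /* one past the end of the region */
        int *cur;    /* the actual pointer value       */
    };

    /* Checked pointer + integer: traps before the result could leave
       [base, limit], instead of silently producing a bad pointer. */
    struct fat_ptr fat_add(struct fat_ptr p, ptrdiff_t n)
    {
        ptrdiff_t idx  = p.cur - p.base;    /* current index      */
        ptrdiff_t size = p.limit - p.base;  /* number of elements */
        assert(idx + n >= 0 && idx + n <= size);
        p.cur = p.base + (idx + n);
        return p;
    }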

@supercat: Absolutely. Advanced pointer representations aimed at helping to diagnose/catch usage errors, or even make the system provably memory-safe (this is possible!), are great reasons for maintaining the C language's restrictions on pointer arithmetic. - R.. GitHub STOP HELPING ICE

IMHO, the proper thing for the Standard to do in many such cases would be to define __STDC_* macros which would indicate what guarantees the compiler will or won't provide as presently configured. Since many compilers have options that can guarantee behaviors in cases not required by the Standard, being able to precede existing code that relies upon such semantics with #if (__STDC_GUARANTEES && !__STDC_DIRECT_LINEAR_POINTERS) #error This code requires direct linear pointer semantics. #endif would ensure that moving to a new C17 compiler would not cause the code to behave erroneously. - supercat

It may be that the only way to make the code usable with a new compiler would be to rewrite it so as not to require such semantics, but the need for a rewrite would become apparent when it arose. If within the useful lifetime of the code it never becomes necessary to use it with a compiler that can't support such semantics, rewriting the code for compatibility with such compilers would impose huge costs but negative benefit (since any mistakes in the rewrite could add bugs to code whose behavior would otherwise have been correct when evaluated using the specified semantics). - supercat

@supercat: There's already a way to get compile-time errors when the application requires munging pointers as integers and the implementation doesn't support it: the cast to uintptr_t is an error because the (optional) type uintptr_t is not defined. Of course there's a theoretical case where the type/conversion is defined but not a flat linear mapping, so if that would break your application too, you need to deal with it in some other way... - R.. GitHub STOP HELPING ICE
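That check can also be made explicit at compile time; a short sketch, relying only on the rule that <stdint.h> defines UINTPTR_MAX exactly when uintptr_t is available:

    #include <stdint.h>

    /* uintptr_t is optional, so its limit macro is defined only when the
       type exists; this turns its absence into a build error. */
    #ifndef UINTPTR_MAX
    #error "This code requires uintptr_t (pointer <-> integer conversion)."
    #endif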

According to the C11 standard, §6.5.6/8 (I put in the first part for context):

When an expression that has integer type is added to or subtracted from a pointer
...
If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

Therefore, a result that is outside of the array and not one past the end is undefined behaviour.
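Applied to the pointers from the question (my own annotation of the rule, not part of this answer):

    #include <stdlib.h>

    int main(void)
    {
        int *p = malloc(4 * sizeof *p);   /* elements p[0] .. p[3] */
        if (!p)
            return 1;

        int *last     = p + 3;   /* defined: points at the last element  */
        int *one_past = p + 4;   /* defined: may be formed and compared,
                                    but not dereferenced                 */
        /* int *q = p + 10; */   /* undefined: beyond one past the end   */
        /* int *r = p - 2;  */   /* undefined: before the first element  */

        (void)last;
        (void)one_past;
        free(p);
        return 0;
    }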

answered Aug 24 '12 at 04:08

Which is funny because p + 2 - 2 is defined, but p - 2 + 2 is undefined. So much for associativity... - Mysticial

@Mysticial: Indeed. By the way, this brings up an easy-to-make error. Code like ptr + index - base_index is wrong and invokes UB if index goes outside the bounds of the pointed-to array. The alternate forms ptr + (index - base_index) or &ptr[index-base_index] are usually what you need. I've made this mistake a number of times myself. - R.. GitHub STOP HELPING ICE
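A small sketch of the mistake described above (the function and parameter names are placeholders, not taken from the comment):

    /* Look up an entry in a window of a table whose indices start at
       base_index. Assumes base_index <= index < base_index + window size. */
    int lookup(const int *window, int index, int base_index)
    {
        /* Wrong: evaluated as (window + index) - base_index, so the
           intermediate pointer can step outside the array, which is
           undefined behaviour. */
        /* return *(window + index - base_index); */

        /* Right: do the integer arithmetic first, so the pointer never
           leaves the array. */
        return *(window + (index - base_index));
    }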

@Mysticial Exactly! That's why I thought it was weird. - hacer señas

@R.. But this mistake has resulted in an actual error? Or just, say, a warning? - hacer señas

Not that exact issue, but a similar one has led to a major error. I was mistakenly adding a moderately over-large offset to a base pointer to get an "end pointer", then looping as long as the pointer was less than the end pointer. The code worked fine on i386-linux running native on 32-bit processors (where pointers in the 3-4gb range never occur) and on x86_64 processors, but failed in i386 code running on a 64-bit kernel, since the stack got put very close to 0xffffffff and the addition overflowed. The result was a bad crash that was difficult to track down. - R.. GitHub STOP HELPING ICE
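A sketch of that pattern and a defined alternative (the buffer and length names are invented for illustration):

    #include <stddef.h>

    void zero_some(int *buf, size_t len, size_t max_items)
    {
        /* Risky: if max_items overshoots the allocation, buf + max_items
           is already undefined, and on some targets the address can wrap. */
        /* int *end = buf + max_items;
           for (int *p = buf; p < end; ++p) *p = 0; */

        /* Defined: clamp in integer arithmetic first, so the end pointer
           is never more than one past the end of buf. */
        size_t n = max_items < len ? max_items : len;
        for (int *p = buf, *end = buf + n; p < end; ++p)
            *p = 0;
    }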

"Yes" the conditions you mentioned are covered in specifications.

    int *r = p - 2; 

r is outside the bounds of the array p; the evaluation makes r point 2 int positions before the address of p.

`r[3]` is simply the 4th int position after the address held in r.

answered Aug 24 '12 at 07:08
