Justin Vallon writes:
On Wed, 16 Jun 1999, Kyle Jones wrote:
> To try and answer Hrvoje's question, which was roughly "What is the
> difference between these two functions?"
>
> >   int f() {                int f() {
> >     a_union t;               int a;
> >     int* ip;                 int *ip;
> >     t.d = 3.0;               a = 10;
> >     ip = &t.i;               ip = &a;
> >     return *ip;              return *ip;
> >   }                        }
>
> In the function on the left the optimizer will see that there is
> no pointer reference to t.d. &t.i, not &t.d, is copied to ip.
> t.d and t.i don't necessarily share storage space unless you
> reference the union element directly. Sort of like quantum
> states being unresolved until you look at them. So 3.0 never
> has to be copied from a register to memory.
I don't think I understand that. t.i and t.d do share the same memory
space. Since the address of t.i is taken, the compiler must ensure that,
at all "places that matter" (function calls, dereferences, etc.), t.d
and t.i are "flushed" to memory, since they occupy the same bits.
Not with the new rules. Taking the address does not guarantee the
flush will happen. The reason the compiler can get away with it
is the union t has a storage class of auto. No legal pointers to
this object can exist outside the block, so the compiler can avoid the
flush, confident that nothing is pointing to the union's memory.
A clever optimizer may be able to save the memory store, but it would have
to know how to slice a double into an int. [Remember that f() does not
return 3, but the upper/lower 2/4/8 bytes of the double.]
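To make the bracketed remark concrete, here is a minimal sketch of what
"slicing a double into an int" amounts to. The function names are
hypothetical, and uint32_t stands in for the int member of a_union so the
width is fixed; neither appears in the original messages. One version
copies the first bytes of the double with memcpy, the other reads the
union member directly rather than through a pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the thread's a_union; uint32_t is an
   assumption made here so the integer member has a known width. */
typedef union {
    uint32_t i;
    double   d;
} a_union;

/* Copy the first sizeof(uint32_t) bytes of d's object representation. */
uint32_t first_word_of_double(double d)
{
    uint32_t w;
    assert(sizeof w <= sizeof d);
    memcpy(&w, &d, sizeof w);
    return w;
}

/* Same bytes, read through the union member itself (no pointer). */
uint32_t first_word_via_union(double d)
{
    a_union t;
    t.d = d;
    return t.i;
}
```

Assuming an IEEE 754 double, both functions return either the low or the
high half of the bit pattern of 3.0 depending on byte order, and in
neither case the value 3.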
That's the reason for the 'similar types' exception in
[...] an object of one type is assumed never to reside at the
same address as an object of a different type, unless the
types are almost the same.
The compiler writers are willing to do punning where the two
objects could fit into the same sort of register. Otherwise
forget it.
> In the function on the right, &a is copied to ip. ip is later
> dereferenced, so the register contents need to be in memory before
> that dereference happens. (ip should really be declared "volatile
> int *ip" to keep the ip assignment and subsequent dereference from
> being optimized away.)
Why bother? Let the optimizer do its best:
I meant for the sake of the example, which falls apart otherwise.
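For reference, a sketch of the right-hand function with the volatile
qualifier suggested in the quoted text. The name f_right is hypothetical,
chosen only to tell it apart from the other f(). The assumption is that
the compiler treats the access through the volatile-qualified lvalue as
one it may not elide, which is how GCC behaves in practice.

```c
/* Right-hand function from the thread, with ip declared as suggested. */
int f_right(void)
{
    int a;
    volatile int *ip;   /* the load through *ip is expected to be kept */

    a = 10;
    ip = &a;            /* a must be in memory before the dereference */
    return *ip;
}
```

Without the qualifier, an optimizer is free to reduce the whole body to
"return 10", which is why the example would otherwise fall apart.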