What are the advantages of aligning double on 8 bytes with the -malign-double option?

Asked 2 years ago, Updated 2 years ago, 270 views

What specific benefits do you get from aligning double on an 8-byte boundary with the -malign-double option?

c++ c gcc

2022-09-30 22:00

2 Answers

According to the x86 options section of the GCC manual,

  • it controls whether double, long double, and long long are aligned on a 32-bit or a 64-bit boundary
  • on a Pentium, code runs somewhat faster (at the expense of a little more memory)
  • on x86-64, it is enabled by default

and so on. The last item is subtly inaccurate: on x86-64, -malign-double is always in effect and -mno-align-double cannot turn it off.

So what is the benefit? It is hard to see without knowing the hardware specifications. Setting rigor aside and speaking roughly:

  • A 32-bit CPU has a 32-bit bus, so it can read or write a 32-bit value in a single bus access. In other words, one read or write of a double takes two bus accesses.
  • A 64-bit CPU has a 64-bit bus, so it can read or write a 64-bit value in a single bus access. In other words, one read or write of a double takes one bus access.

The above holds only when the data is aligned. Can you see that if a 64-bit value such as a double is not aligned to a 64-bit boundary, it takes two bus accesses even on a 64-bit bus? A diagram would make this obvious, but I'm not good at ASCII art, so I'll skip it.

Therefore, the answer is that alignment reduces the number of bus accesses in hardware, so the code can be faster.

Objects compiled with -malign-double and objects compiled with -mno-align-double (on 32-bit x86) are incompatible with each other, and mixing them is dangerous.

I don't think I explained this well, so here is an addendum.

With a 32-bit CPU and a 32-bit OS/app, the speed does not change whether or not you specify -malign-double (the number of bus accesses stays the same). It is safer not to specify it, so that you avoid the risk of mixing incompatible objects.

Only with a 64-bit CPU and a 32-bit OS/app is there a chance of saving one bus access (and not always, since the data may happen to be aligned even without -malign-double). Even then, on x86, saving one L1 cache access does not make a difference you can feel.

For 64-bit CPUs and 64-bit OS/apps, -malign-double is enabled from the beginning and cannot be disabled, so there is no difference in the first place.

So a difference you can actually feel would show up only when

  • 64-bit CPUs have just become widespread, but 32-bit software is still the mainstream, and
  • you are using 32-bit numerical analysis software that does nothing but repeat double operations

(a true power user would have moved the software to 64-bit right away, so if you are in this situation, you are probably an ordinary user).

For general use, you cannot expect a noticeable speedup, there is little to gain in return for the slight increase in memory consumption, and you are left with only the disadvantage that structures become ABI-incompatible. Compiler implementers naturally understand all of this, so there are good reasons why an option like this is not enabled by default. It is important for us end users to understand the risks and benefits before using one.


2022-09-30 22:00

As 774RR suggests, this seems to be an old topic.

The Intel Architecture Optimization Manual (1997), which predates the introduction of SSE2, says:

3.4.2 Data

On Pentium processors, a misaligned access in the data cache or on the bus costs at least three extra clock cycles. On Pentium Pro and Pentium II processors, a misaligned access in the data cache that crosses a cache line boundary costs 9 to 12 clock cycles. For the best execution performance on all processors, we recommend aligning data on the following boundaries:

  • Align 8-bit data on any boundary.
  • Align 16-bit data so that it is contained within an aligned 4-byte word.
  • Align 32-bit data on any boundary that is a multiple of 4 bytes.
  • Align 64-bit data on any boundary that is a multiple of 8 bytes.
  • Align 80-bit data on a 128-bit boundary (that is, any boundary that is a multiple of 16 bytes).

So for good performance, 8-byte alignment of double is recommended. The same manual also contains

3.5.1.5 Alignment of data in memory and on the stack

Pentium processors need three extra cycles to access a 64-bit variable that is not aligned on an 8-byte boundary. On Pentium Pro and Pentium II processors, such a variable may straddle a 32-byte cache line boundary. Some compilers on the market do not align double-precision data on 8-byte boundaries.

a passage that reads rather like a complaint.

The Fundamental Types section of the SYSTEM V APPLICATION BINARY INTERFACE Intel386 Architecture Processor Supplement, which Linux and others follow, specifies 4-byte alignment for double. Below that it says:

The Intel386 architecture does not require doubleword alignment for double-precision values. Nevertheless, for data structure compatibility with other Intel architectures, compilers may provide a method to align double-precision values on doubleword boundaries.

So the ABI appears to anticipate that 8-byte alignment may be used.

Therefore, I think gcc follows the platform specification and uses 4-byte alignment by default, while also offering -malign-double as an option that gives 8-byte alignment for better performance.

By the way, on Windows, the default for /Zp (Struct Member Alignment) is

  • 8 bytes on x86, ARM, and ARM64
  • 16 bytes on x64

so the equivalent of -malign-double is the default, in line with Intel's recommendation.

With the advent of SSE instructions, the situation changed completely. Unlike earlier instructions, SSE instructions require aligned data. For example, SSE handles 128-bit data, so the data must be aligned to 128 bits, that is, 16 bytes.
As a result, the topic of 8-byte alignment no longer appears in the Intel 64 and IA-32 Architectures Optimization Reference Manual (2011).


2022-09-30 22:00


