According to the x86 options section of the gcc manual, -malign-double controls whether double, long double, and long long variables are placed on a 32-bit or a 64-bit boundary, is enabled by default on x86-64, and so on.
That last point is subtly inaccurate. Or rather, on x86-64 -malign-double is always in effect and -mno-align-double does nothing.
So what do you gain from it? That is hard to understand without knowing the hardware specifications, so let me set rigor aside and explain it roughly.
On a 32-bit CPU, reading or writing one double requires two bus accesses.
On a 64-bit CPU, reading or writing one double requires one bus access.
The above holds only when the value is aligned. Can you see that if a 64-bit value, that is, a double, is not aligned on a 64-bit boundary, two bus accesses are needed? A diagram would make this easier to follow, but I am not good at ASCII art, so I will skip it.
So the answer is that alignment reduces the number of bus accesses in hardware, and the code can therefore run faster.
Objects compiled with -malign-double and objects compiled with -mno-align-double (on 32-bit x86) are incompatible, and mixing them is dangerous.
Addendum, since I don't think I explained this well:
With a 32-bit CPU and a 32-bit OS/app, the speed does not change (the number of bus accesses stays the same) whether or not you specify -malign-double. It is safer not to specify it, so that you avoid the risk of mixing incompatible objects.
Only with a 64-bit CPU and a 32-bit OS/app might the number of bus accesses drop by one (even without -malign-double a double is not always misaligned, so the option does not always help). On x86, however, saving a single L1 cache access does not make a difference you can feel.
With a 64-bit CPU and a 64-bit OS/app, -malign-double is enabled from the start and cannot be disabled, so there is no difference in the first place.
So if there is any difference you can actually feel, it would only be in that one case, a 64-bit CPU running a 32-bit OS/app (a truly ordinary user would have moved to 64-bit software right away, so if you are in this situation you are probably not an ordinary user).
For general use you cannot expect a noticeable speedup, so there is little benefit to be had, only the drawbacks of slightly higher memory consumption and of structures becoming ABI incompatible.
Compiler implementers naturally understand all of this, so when a compile option is not enabled by default, there is a good reason for it. It is important for us end users to understand the risks and benefits before turning one on.
As 774RR suggests, it seems to be an old topic.
Before SSE2 was introduced, the Intel Architecture Optimization Manual (1997) said:
On Pentium processors, a misaligned access in the data cache or on the bus costs at least three extra clock cycles. On Pentium Pro and Pentium II processors, a misaligned access that crosses a cache line boundary costs 9 to 12 clock cycles. For the best execution performance on all processors, we recommend aligning data on boundaries according to the following guidelines:
and, for proper performance, it recommends that double be 8-byte aligned. The same manual also contains
On Pentium processors, accessing a 64-bit variable that is not aligned on an 8-byte boundary costs three extra cycles. On Pentium Pro and Pentium II processors, such a variable may straddle a 32-byte cache line boundary. Some commercial compilers do not align double-precision data on 8-byte boundaries.
which reads rather like a complaint.
The Fundamental Types section of the System V Application Binary Interface Intel386 Architecture Processor Supplement, which Linux and others refer to, states that double has 4-byte alignment. Below that it says
The Intel386 architecture does not require doubleword alignment for double-precision values. Nevertheless, for data structure compatibility with other Intel architectures, compilers may provide a method to align double-precision values on doubleword boundaries.
so the authors seem to have anticipated a move to 8-byte alignment. I therefore think gcc follows the platform specification and uses 4-byte alignment, while also providing -malign-double as an option for 8-byte alignment for proper performance.
By the way, on Windows, according to /Zp (Struct Member Alignment), the equivalent of -malign-double is the default, in line with Intel's recommendation.
With the advent of the SSE instructions, the situation changed completely. Unlike earlier instructions, SSE instructions require data alignment. For example, SSE handles 128-bit data, so the data must be 128-bit aligned, that is, 16-byte aligned.
As a result, the topic of 8-byte alignment has disappeared from the Intel 64 and IA-32 Architectures Optimization Reference Manual (2011).