In general, developing fast applications for Intel Architecture (IA) processors is not difficult. An understanding of the architecture and good development practices make the difference between a fast application and one that runs significantly below its potential. Applications developed for the 8086/8088, 80286, Intel386™ (DX or SX), and Intel486™ processors will execute on the Pentium®, Pentium Pro, and Pentium II processors without any modification or recompilation. However, the code optimization techniques and architectural information that follow will help you tune your application to its greatest potential.

Tuning an application to execute quickly across the Intel Architecture is relatively simple when the programmer has the appropriate tools. To begin the tuning process, you need the following:
• Knowledge of the Intel Architecture. See Chapter 2.
• Knowledge of critical stall situations that may impact the performance of your application. See Chapters 3, 4 and 5.
• Knowledge of how good your compiler is at optimization and an understanding of how to help the compiler produce good code.
• Knowledge of the performance bottlenecks within your application. Use the VTune performance monitoring tool described in this document.
• Ability to monitor the performance of the application. Use VTune.

VTune, Intel’s Visual Tuning Environment Release 2.0, is a useful tool to help you understand your application and decide where to begin tuning. The Pentium and Pentium Pro processors provide performance event counters that let you monitor your code, and these counters can be accessed using VTune. Each section of this document notes the appropriate performance counter for measurement, along with additional tuning information. Additional information on the performance counter events and on programming the counters can be found in Chapter 7. Section 1.4 contains ordering information for VTune.
