Software optimization resources. C++ and assembly. Windows, Linux, BSD, Mac OS X

This series of five manuals describes everything you need to know about optimizing code for x86 and x86-64 family microprocessors, including optimization advices for C++ and assembly language, details about the microarchitecture and instruction timings of most Intel, AMD and VIA processors, and details about different compilers and calling conventions. This is a collection of C++ classes, functions and operators that makes it easier to use the the vector instructions (Single Instruction Multiple Data instructions) of modern CPUs without using assembly language. Supports the SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, AVX512, FMA, and XOP instruction sets. Includes standard mathematical functions. Can compile for different instruction sets from the same source code. Description and instructions. Message board. This utility can be used for converting object files between COFF/PE, OMF, ELF and Mach-O formats for all 32-bit and 64-bit x86 platforms. Can modify symbol names in object files. Can build, modify and convert function libraries across platforms. Can dump object files and executable files. Also includes a very good disassembler supporting the SSE4, AVX, AVX2, AVX512, FMA3, FMA4, XOP and Knights Corner instruction sets. Source code included (GPL). Manual. This is a library of optimized subroutines coded in assembly language. The functions in this library can be called from C, C++ and other compiled high-level languages. Supports many different compilers under Windows, Linux, BSD and Mac OS X operating systems, 32 and 64 bits. This library contains faster versions of common C/C++ memory and string functions, fast functions for string search and string parsing, fast integer division and integer vector division, as well as several useful functions not found elsewhere. The package contains library files in many different file formats, C++ header file and assembly language source code. Gnu general public license applies. Manual. This is a proposal and discussion of how an ideal instruction set architecture can be constructed. The proposed instruction set combines the best from the RISC and CISC principles to produce a flexible, consistent, modular, orthogonal and expansible instruction set for high performance microprocessors. This instruction set has variable-length vector registers and a special addressing mode that allows the software to automatically adapt to different microprocessors with different maximum vector lengths and make efficient loops through arrays regardless of whether the array size is divisible by the vector length. Standardization of the corresponding ecosystem of ABI standards, function libraries, compilers, etc. makes it possible to combine different programming languages in the same program. Test programs that I have used for my research. Can measure clock cycles and performance monitor counters such as cache misses, branch mispredictions, resource stalls etc. in a small piece of code in C, C++ or assembly. Can also set up performance monitor counters for reading inside another program. Supports Windows and Linux, 32 and 64 bit mode, multiple threads. This is a program that can change the CPUID vendor string, family and model number on VIA Nano processors. See my blog for a discussion of the purpose of this program. MAQAO (Modular Assembly Quality Analyzer and Optimizer), a tool for analyzing and optimizing binary codes. Reference manuals and other documents can be found at Intel's web site. Intel's web site is refurnished so often that any link I could provide here to specific documents would be broken after a few months. I will therefore recommend that you use the search facilities at and search for 'Software Developer's Manual' and 'Optimization Reference Manual'. Source.

Яндекс.Метрика Рейтинг Free Web Counter
page counter
Last Modified: April 18, 2016 @ 6:06 am