SH Options - Using the GNU Compiler Collection (GCC)

3.17.38 SH Options

These ‘-m’ options are defined for the SH implementations:

-m1

Generate code for the SH1.

-m2

Generate code for the SH2.

-m2e

Generate code for the SH2e.

-m2a-nofpu

Generate code for the SH2a without FPU, or for an SH2a-FPU in such a way that the floating-point unit is not used.

-m2a-single-only

Generate code for the SH2a-FPU, in such a way that no double-precision floating-point operations are used.

-m2a-single

Generate code for the SH2a-FPU assuming the floating-point unit is in single-precision mode by default.

-m2a

Generate code for the SH2a-FPU assuming the floating-point unit is in double-precision mode by default.

-m3

Generate code for the SH3.

-m3e

Generate code for the SH3e.

-m4-nofpu

Generate code for the SH4 without a floating-point unit.

-m4-single-only

Generate code for the SH4 with a floating-point unit that only supports single-precision arithmetic.

-m4-single

Generate code for the SH4 assuming the floating-point unit is in single-precision mode by default.

-m4

Generate code for the SH4.

-m4-100

Generate code for SH4-100.

-m4-100-nofpu

Generate code for SH4-100 in such a way that the floating-point unit is not used.

-m4-100-single

Generate code for SH4-100 assuming the floating-point unit is in single-precision mode by default.

-m4-100-single-only

Generate code for SH4-100 in such a way that no double-precision floating-point operations are used.

-m4-200

Generate code for SH4-200.

-m4-200-nofpu

Generate code for SH4-200 in such a way that the floating-point unit is not used.

-m4-200-single

Generate code for SH4-200 assuming the floating-point unit is in single-precision mode by default.

-m4-200-single-only

Generate code for SH4-200 in such a way that no double-precision floating-point operations are used.

-m4-300

Generate code for SH4-300.

-m4-300-nofpu

Generate code for SH4-300 in such a way that the floating-point unit is not used.

-m4-300-single

Generate code for SH4-300 assuming the floating-point unit is in single-precision mode by default.

-m4-300-single-only

Generate code for SH4-300 in such a way that no double-precision floating-point operations are used.

-m4-340

Generate code for SH4-340 (no MMU, no FPU).

-m4-500

Generate code for SH4-500 (no FPU). Passes -isa=sh4-nofpu to the assembler.

-m4a-nofpu

Generate code for the SH4al-dsp, or for an SH4a in such a way that the floating-point unit is not used.

-m4a-single-only

Generate code for the SH4a, in such a way that no double-precision floating-point operations are used.

-m4a-single

Generate code for the SH4a assuming the floating-point unit is in single-precision mode by default.

-m4a

Generate code for the SH4a.

-m4al

Same as -m4a-nofpu, except that it implicitly passes -dsp to the assembler. GCC doesn't generate any DSP instructions at the moment.

-m5-32media

Generate 32-bit code for SHmedia.

-m5-32media-nofpu

Generate 32-bit code for SHmedia in such a way that the floating-point unit is not used.

-m5-64media

Generate 64-bit code for SHmedia.

-m5-64media-nofpu

Generate 64-bit code for SHmedia in such a way that the floating-point unit is not used.

-m5-compact

Generate code for SHcompact.

-m5-compact-nofpu

Generate code for SHcompact in such a way that the floating-point unit is not used.

-mb

Compile code for the processor in big-endian mode.

-ml

Compile code for the processor in little-endian mode.

-mdalign

Align doubles at 64-bit boundaries. Note that this changes the calling conventions, and thus some functions from the standard C library do not work unless you recompile it first with -mdalign.
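The effect of the alignment change can be made visible with a small C sketch. The struct and function names below are hypothetical and exist only for illustration. Going by the description above, one would expect the reported alignment of double to be 8 bytes under -mdalign; without it the value depends on the target ABI.

```c
#include <stddef.h>

/* Hypothetical struct, used only to expose how a double member is placed. */
struct sample {
    char tag;
    double value;
};

/* Offset of the double member. The padding inserted after 'tag'
   depends on the alignment the ABI requires for double. */
size_t sample_value_offset(void)
{
    return offsetof(struct sample, value);
}

/* Alignment the compiler currently assumes for double. */
size_t double_alignment(void)
{
    return _Alignof(double);
}
```

Because struct layout and argument passing both depend on this alignment, objects compiled with and without -mdalign can disagree about where a double lives, which is why the C library has to be rebuilt with the same setting.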

-mrelax

Shorten some address references at link time, when possible; uses the linker option -relax.

-mbigtable

Use 32-bit offsets in switch tables. The default is to use 16-bit offsets.

-mbitops

Enable the use of bit manipulation instructions on SH2A.

-mfmovd

Enable the use of the instruction fmovd. Check -mdalign for alignment constraints.

-mrenesas

Comply with the calling conventions defined by Renesas.

-mno-renesas

Comply with the calling conventions defined for GCC before the Renesas conventions were available. This option is the default for all targets of the SH toolchain.

-mnomacsave

Mark the MAC register as call-clobbered, even if -mrenesas is given.

-mieee

-mno-ieee

Control the IEEE compliance of floating-point comparisons, which affects the handling of cases where the result of a comparison is unordered. By default, -mieee is implicitly enabled. If -ffinite-math-only is enabled, -mno-ieee is implicitly set, which results in faster floating-point greater-equal and less-equal comparisons. The implicit settings can be overridden by specifying either -mieee or -mno-ieee.
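A short C sketch shows what an unordered comparison is and why greater-equal is the delicate case. The function names are hypothetical. Under IEEE semantics the two functions disagree when an operand is NaN; the assumption here is that -mno-ieee lets the compiler stop distinguishing them, which is where the faster code would come from.

```c
#include <math.h>   /* NAN macro only; no libm functions are called */

/* Greater-or-equal written directly. With IEEE semantics this is false
   whenever either operand is NaN, because the comparison is unordered. */
int ge_direct(float a, float b)
{
    return a >= b;
}

/* Greater-or-equal written as "not less than". With IEEE semantics this
   is true for unordered operands, so it differs from ge_direct on NaN. */
int ge_via_not_less(float a, float b)
{
    return !(a < b);
}
```

On an IEEE-compliant build, ge_direct(NAN, 1.0f) yields 0 while ge_via_not_less(NAN, 1.0f) yields 1. Code that relies on this difference should be compiled with -mieee in effect.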

-minline-ic_invalidate

Inline code to invalidate instruction cache entries after setting up nested function trampolines. This option has no effect if -musermode is in effect and the selected code generation option (e.g. -m4) does not allow the use of the icbi instruction. If the selected code generation option does not allow the use of the icbi instruction, and -musermode is not in effect, the inlined code manipulates the instruction cache address array directly with an associative write. This not only requires privileged mode at run time, but it also fails if the cache line had been mapped via the TLB and has become unmapped.

-misize

Dump instruction size and location in the assembly code.

-mpadstruct

This option is deprecated. It pads structures to a multiple of 4 bytes, which is incompatible with the SH ABI.

-matomic-model=model

Sets the model of atomic operations and additional parameters as a comma separated list. For details on the atomic built-in functions see __atomic Builtins. The following models and parameters are supported:

‘none’: Disable compiler generated atomic sequences and emit library calls for atomic operations. This is the default if the target is not sh*-*-linux*.
‘soft-gusa’: Generate GNU/Linux compatible gUSA software atomic sequences for the atomic built-in functions. The generated atomic sequences require additional support from the interrupt/exception handling code of the system and are only suitable for SH3* and SH4* single-core systems. This option is enabled by default when the target is sh*-*-linux* and SH3* or SH4*. When the target is SH4A, this option will also partially utilize the hardware atomic instructions movli.l and movco.l to create more efficient code, unless ‘strict’ is specified.
‘soft-tcb’: Generate software atomic sequences that use a variable in the thread control block. This is a variation of the gUSA sequences which can also be used on SH1* and SH2* targets. The generated atomic sequences require additional support from the interrupt/exception handling code of the system and are only suitable for single-core systems. When using this model, the ‘gbr-offset=’ parameter has to be specified as well.
‘soft-imask’: Generate software atomic sequences that temporarily disable interrupts by setting SR.IMASK = 1111. This model works only when the program runs in privileged mode and is only suitable for single-core systems. Additional support from the interrupt/exception handling code of the system is not required. This model is enabled by default when the target is sh*-*-linux* and SH1* or SH2*.
‘hard-llcs’: Generate hardware atomic sequences using the movli.l and movco.l instructions only. This is only available on SH4A and is suitable for multi-core systems. Since the hardware instructions support only 32-bit atomic variables, access to 8-bit or 16-bit variables is emulated with 32-bit accesses. Code compiled with this option will also be compatible with other software atomic model interrupt/exception handling systems if executed on an SH4A system. Additional support from the interrupt/exception handling code of the system is not required for this model.
‘gbr-offset=’: This parameter specifies the offset in bytes of the variable in the thread control block structure that should be used by the generated atomic sequences when the ‘soft-tcb’ model has been selected. For other models this parameter is ignored. The specified value must be an integer multiple of four and in the range 0-1020.
‘strict’: This parameter prevents mixed usage of multiple atomic models, even though they would be compatible, and will make the compiler generate atomic sequences of the specified model only.
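The models listed above only change how the compiler expands the atomic built-in functions; the source code stays the same. The following C sketch uses two such built-ins. The variable and function names are hypothetical, and the cross-compiler name in the comment is an assumption about how the toolchain is installed.

```c
#include <stdint.h>

/* Possible invocations (driver name 'sh-elf-gcc' is an assumption):
 *   sh-elf-gcc -m4a -matomic-model=hard-llcs -c events.c
 *   sh-elf-gcc -m2  -matomic-model=soft-tcb,gbr-offset=16 -c events.c
 * The gbr-offset value must be a multiple of four in the range 0-1020.
 */

/* Hypothetical shared 32-bit counter. */
uint32_t event_count;

/* Atomically increments the counter and returns the new value. */
uint32_t record_event(void)
{
    return __atomic_add_fetch(&event_count, 1, __ATOMIC_SEQ_CST);
}

/* Hypothetical shared 8-bit flag set. Per the text above, 'hard-llcs'
   emulates sub-word accesses like this one with 32-bit accesses. */
uint8_t status_flags;

/* Atomically sets the bits in 'mask' and returns the updated flags. */
uint8_t set_flag(uint8_t mask)
{
    return __atomic_or_fetch(&status_flags, mask, __ATOMIC_SEQ_CST);
}
```

With the ‘none’ model the same calls would be emitted as library calls instead of inline sequences, so a suitable library implementation has to be available at link time.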

-mtas

Generate the tas.b opcode for __atomic_test_and_set. Notice that depending on the particular hardware and software configuration this can degrade overall performance due to the operand cache line flushes that are implied by the tas.b instruction. On multi-core SH4A processors the tas.b instruction must be used with caution since it can result in data corruption for certain cache configurations.
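As an illustration of the code this option affects, here is a minimal try-lock built on __atomic_test_and_set. The names lock_flag, try_lock and unlock are hypothetical. Whether the built-in is expanded to tas.b depends on -mtas and the selected target.

```c
#include <stdbool.h>

/* Hypothetical lock flag; zero means unlocked. */
char lock_flag;

/* Attempts to take the lock. Returns true if the flag was previously
   clear, meaning this caller now owns the lock. */
bool try_lock(void)
{
    return !__atomic_test_and_set(&lock_flag, __ATOMIC_ACQUIRE);
}

/* Releases the lock by clearing the flag. */
void unlock(void)
{
    __atomic_clear(&lock_flag, __ATOMIC_RELEASE);
}
```

A loop that spins on try_lock would execute the test-and-set repeatedly, which is the situation where the cache line flushes mentioned above are most likely to hurt performance.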

-mprefergot

When generating position-independent code, emit function calls using the Global Offset Table instead of the Procedure Linkage Table.

-musermode

-mno-usermode

Don't allow (allow) the compiler to generate privileged mode code. Specifying -musermode also implies -mno-inline-ic_invalidate if the inlined code would not work in user mode. -musermode is the default when the target is sh*-*-linux*. If the target is SH1* or SH2*, -musermode has no effect, since there is no user mode.

-multcost=number

Set the cost to assume for a multiply insn.

-mdiv=strategy

Set the division strategy to be used for integer division operations. For SHmedia strategy can be one of:

‘fp’: Performs the operation in floating point. This has a very high latency, but needs only a few instructions, so it might be a good choice if your code has enough easily-exploitable ILP to allow the compiler to schedule the floating-point instructions together with other instructions. Division by zero causes a floating-point exception.
‘inv’: Uses integer operations to calculate the inverse of the divisor, and then multiplies the dividend with the inverse. This strategy allows CSE and hoisting of the inverse calculation. Division by zero calculates an unspecified result, but does not trap.
‘inv:minlat’: A variant of ‘inv’ where, if no CSE or hoisting opportunities have been found, or if the entire operation has been hoisted to the same place, the last stages of the inverse calculation are intertwined with the final multiply to reduce the overall latency, at the expense of using a few more instructions, and thus offering fewer scheduling opportunities with other code.
‘call’: Calls a library function that usually implements the ‘inv:minlat’ strategy. This gives high code density for m5-*media-nofpu compilations.
‘call2’: Uses a different entry point of the same library function, where it assumes that a pointer to a lookup table has already been set up, which exposes the pointer load to CSE and code hoisting optimizations.
‘inv:call’, ‘inv:call2’, ‘inv:fp’: Use the ‘inv’ algorithm for initial code generation, but if the code stays unoptimized, revert to the ‘call’, ‘call2’, or ‘fp’ strategies, respectively. Note that the potentially-trapping side effect of division by zero is carried by a separate instruction, so it is possible that all the integer instructions are hoisted out, but the marker for the side effect stays where it is. A recombination to floating-point operations or a call is not possible in that case.
‘inv20u’, ‘inv20l’: Variants of the ‘inv:minlat’ strategy. In the case that the inverse calculation is not separated from the multiply, they speed up division where the dividend fits into 20 bits (plus sign where applicable) by inserting a test to skip a number of operations in this case; this test slows down the case of larger dividends. ‘inv20u’ assumes the case of such a small dividend to be unlikely, and ‘inv20l’ assumes it to be likely.

For targets other than SHmedia strategy can be one of:

‘call-div1’: Calls a library function that uses the single-step division instruction div1 to perform the operation. Division by zero calculates an unspecified result and does not trap. This is the default except for SH4, SH2A and SHcompact.
‘call-fp’: Calls a library function that performs the operation in double precision floating point. Division by zero causes a floating-point exception. This is the default for SHcompact with FPU. Specifying this for targets that do not have a double precision FPU will default to call-div1.
‘call-table’: Calls a library function that uses a lookup table for small divisors and the div1 instruction with case distinction for larger divisors. Division by zero calculates an unspecified result and does not trap. This is the default for SH4. Specifying this for targets that do not have dynamic shift instructions will default to call-div1.

When a division strategy has not been specified the default strategy will be selected based on the current target. For SH2A the default strategy is to use the divs and divu instructions instead of library function calls.
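To make the trade-offs more concrete, consider a loop that divides many values by the same divisor. The function below is a hypothetical example, not taken from GCC. With an ‘inv’-style strategy, the inverse of the loop-invariant divisor is the kind of computation that could be hoisted out of the loop, whereas the ‘call’-style strategies would call the library function on every iteration.

```c
#include <stddef.h>

/* Divides every element by the same divisor. The divisor does not
   change inside the loop, so any per-divisor setup work is
   loop-invariant. The caller must pass a non-zero divisor: depending on
   the strategy, division by zero either traps or gives an unspecified
   result. */
void scale_down(int *values, size_t count, int divisor)
{
    for (size_t i = 0; i < count; i++)
        values[i] /= divisor;
}
```

The result of each division is the same under every strategy for a non-zero divisor; only latency, code size and the behavior on division by zero differ.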

-maccumulate-outgoing-args

Reserve space once for outgoing arguments in the function prologue rather than around each call. Generally beneficial for performance and size. Also needed for unwinding to avoid changing the stack frame around conditional code.

-mdivsi3_libfunc=name

Set the name of the library function used for 32-bit signed division to name. This only affects the name used in the ‘call’ and ‘inv:call’ division strategies, and the compiler still expects the same sets of input/output/clobbered registers as if this option were not present.

-mfixed-range=register-range

Generate code treating the given register range as fixed registers. A fixed register is one that the register allocator cannot use. This is useful when compiling kernel code. A register range is specified as two registers separated by a dash. Multiple register ranges can be specified separated by a comma.

-mindexed-addressing

Enable the use of the indexed addressing mode for SHmedia32/SHcompact. This is only safe if the hardware and/or OS implement 32-bit wrap-around semantics for the indexed addressing mode. The architecture allows the implementation of processors with 64-bit MMU, which the OS could use to get 32-bit addressing, but since no current hardware implementation supports this or any other way to make the indexed addressing mode safe to use in the 32-bit ABI, the default is -mno-indexed-addressing.

-mgettrcost=number

Set the cost assumed for the gettr instruction to number. The default is 2 if -mpt-fixed is in effect, 100 otherwise.

-mpt-fixed

Assume pt* instructions won't trap. This generally generates better-scheduled code, but is unsafe on current hardware. The current architecture definition says that ptabs and ptrel trap when the target anded with 3 is 3. This has the unintentional effect of making it unsafe to schedule these instructions before a branch, or hoist them out of a loop. For example, __do_global_ctors, a part of libgcc that runs constructors at program startup, calls functions in a list which is delimited by −1. With the -mpt-fixed option, the ptabs is done before testing against −1. That means that all the constructors run a bit more quickly, but when the loop comes to the end of the list, the program crashes because ptabs loads −1 into a target register.

Since this option is unsafe for any hardware implementing the current architecture specification, the default is -mno-pt-fixed. Unless specified explicitly with -mgettrcost, -mno-pt-fixed also implies -mgettrcost=100; this deters register allocation from using target registers for storing ordinary integers.
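The constructor-list problem described for -mpt-fixed can be pictured with a simplified C sketch. This is not the actual libgcc source; all names are hypothetical, and the list layout is reduced to the one property that matters here, namely a terminating entry with the value −1.

```c
#include <stdint.h>

typedef void (*ctor_fn)(void);

/* End-of-list marker: the value -1 converted to a function pointer.
   It is never a valid call target. */
#define CTOR_END ((ctor_fn)(intptr_t)(-1))

/* Hypothetical constructors that record that they ran. */
int ctors_run;
void ctor_a(void) { ctors_run += 1; }
void ctor_b(void) { ctors_run += 10; }

ctor_fn ctor_list[] = { ctor_a, ctor_b, CTOR_END };

/* Calls each entry until the marker is reached. In the source, the
   test against the marker comes before the entry is used as a call
   target. */
void run_ctors(const ctor_fn *list)
{
    for (; *list != CTOR_END; list++)
        (*list)();
}
```

In the source the comparison guards the call, but with -mpt-fixed the compiler is allowed to prepare the branch target (the ptabs) before the comparison. For the final entry that means −1 is handed to ptabs, whose low two bits are both set, so the instruction traps even though the call itself would never have been made.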

-minvalid-symbols

Assume symbols might be invalid. Ordinary function symbols generated by the compiler are always valid to load with movi/shori/ptabs or movi/shori/ptrel, but with assembler and/or linker tricks it is possible to generate symbols that cause ptabs or ptrel to trap. This option is only meaningful when -mno-pt-fixed is in effect. It prevents cross-basic-block CSE, hoisting and most scheduling of symbol loads. The default is -mno-invalid-symbols.

-mbranch-cost=num

Assume num to be the cost for a branch instruction. Higher numbers make the compiler try to generate more branch-free code if possible. If not specified the value is selected depending on the processor type that is being compiled for.

-mzdcbranch

-mno-zdcbranch

Assume (do not assume) that zero displacement conditional branch instructions bt and bf are fast. If -mzdcbranch is specified, the compiler will try to prefer zero displacement branch code sequences. This is enabled by default when generating code for SH4 and SH4A. It can be explicitly disabled by specifying -mno-zdcbranch.

-mcbranchdi

Enable the cbranchdi4 instruction pattern.

-mcmpeqdi

Emit the cmpeqdi_t instruction pattern even when -mcbranchdi is in effect.

-mfused-madd

-mno-fused-madd

Generate code that uses (does not use) the floating-point multiply and accumulate instructions. These instructions are generated by default if hardware floating point is used. The machine-dependent -mfused-madd option is now mapped to the machine-independent -ffp-contract=fast option, and -mno-fused-madd is mapped to -ffp-contract=off.
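A typical candidate for multiply and accumulate instructions is a dot product, sketched below with a hypothetical function name. Each loop iteration is a multiply followed by an add, which the compiler may contract into a single fused operation when contraction is enabled.

```c
/* Dot product of two 3-element vectors. The statement in the loop body
   has the multiply-then-add shape that can be contracted into one
   multiply-accumulate instruction. */
float dot3(const float a[3], const float b[3])
{
    float sum = 0.0f;
    for (int i = 0; i < 3; i++)
        sum += a[i] * b[i];
    return sum;
}
```

A fused operation rounds once instead of twice, so results can differ in the last bit from the unfused sequence for inputs that are not exactly representable. Code that needs bit-identical results across targets may prefer -ffp-contract=off.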

-mfsca

-mno-fsca

Allow or disallow the compiler to emit the fsca instruction for sine and cosine approximations. The option -mfsca must be used in combination with -funsafe-math-optimizations. It is enabled by default when generating code for SH4A. Using -mno-fsca disables sine and cosine approximations even if -funsafe-math-optimizations is in effect.

-mfsrra

-mno-fsrra

Allow or disallow the compiler to emit the fsrra instruction for reciprocal square root approximations. The option -mfsrra must be used in combination with -funsafe-math-optimizations and -ffinite-math-only. It is enabled by default when generating code for SH4A. Using -mno-fsrra disables reciprocal square root approximations even if -funsafe-math-optimizations and -ffinite-math-only are in effect.

-mpretend-cmove

Prefer zero-displacement conditional branches for conditional move instruction patterns. This can result in faster code on the SH4 processor.