marvell-doc/MARVELL_GCC_Options.txt - toolchains/armada - Git at Google

 Topic:		Marvell-defined options in GCC C/C++ Compiler
 Release:	2013Q2 (gcc version 4.6.4)
 Date:		2013/06


     The document describes the command line options of GCC created by Marvell
 in addition to original options.


 	Default Core Options of Marvell GCC
 	Marvell-defined GCC Options
 	Depreciated Marvell-defined GCC Options
 	Marvell-defined Binutils Features


 --------------------------------------------------------------------------------
 Default Core Options of Marvell GCC
 --------------------------------------------------------------------------------


     The following listing is the basic Marvell's CPU cores:

 	-------------- --------- ------- ---------- ---------------
 	CPU Cores      ISA       Thumb   VFP        VFP+SIMD(NEON)
 	-mcpu=         -march=   -mthumb -mfpu=     -mfpu=
 	-------------- --------- ------- ---------- ---------------
 	marvell-pj1    armv5te   THUMB1  vfp        N/A
 	marvell-f      armv5te   THUMB1  vfp        N/A
 	marvell-fv7    armv7-a   THUMB2  vfpv3-d16  N/A
 	marvell-pj4    armv7-a   THUMB2  vfpv3-d16  N/A
 	marvell-pj4b   armv7-a   THUMB2  vfpv3      neon
 	                                 vfpv3-fp16 neon-fp16
 	marvell-pj4c   armv7-a   THUMB2  vfpv3      neon
 	                                 vfpv3-fp16 neon-fp16
 	-------------- --------- ------- ---------- ---------------

     'marvell-pj4c' is only for experiment.

     The Marvell GCC uses the following options to select what features the
 target supports.

 	-mcpu=<marvell-...>
 		Specify the target core with its ISA.  It's better to use
 		'-mcpu=<marvell-...>' instead of '-march=<armv5te/armv7-a>'.
 		The latter may not known the new features of Marvell cores.

 	-mfpu=<vfp/vfpv3-d16/vfpv3/vfpv3-fp16/neon/neon-fp16>
 		Specify the version of VFP or SIMD(NEON).

 	-mfloat-abi=<soft/softfp/hard>
 		'soft' float ABI does not generate VFP instructions, and others
 		do.  For function calls, 'softfp' does not pass floating-point
 		value by VFP registers, but 'hard' may do.  Object files
 		generated with 'soft' and 'softfp' are compatible, but not
 		with 'hard'.

 	-mthumb
 		For ARMv5te, enable THUMB1 features; for ARMv7-a, enable THUMB2.
 		Note that THUMB1 and VFP cannot co-work, so it is necessary to
 		use -mfloat-abi=soft to stop generating VFP instruction.

 	-mwmmxt
 		If your Marvell Cores support WMMXT feature, this option let
 		GCC generate the WMMXT instructions.  WMMXT and NEON are
 		incompatible.


     It's hard to build the Marvell GCC Toolchains by user himself, and use
 the correct options mentioned above.  Marvell provides some pre-built toolchains
 to support different target features.  See the list:

 --------------- ------- ----------------- ------------------------------------
 Package         ISA     CPU Core          VFP and ABI
 --------------- ------- ----------------- ------------------------------------
 armv5-*-soft    ARMv5te -mcpu=marvell-f                   -mfloat-abi=soft
 armv5-*-softfp  ARMv5te -mcpu=marvell-f   -mfpu=vfp       -mfloat-abi=softfp
 armv5-*-hard    ARMv5te -mcpu=marvell-f   -mfpu=vfp       -mfloat-abi=hard
 armv7-*-soft    ARMv7-a -mcpu=marvell-pj4                 -mfloat-abi=soft
 armv7-*-softfp  ARMv7-a -mcpu=marvell-pj4 -mfpu=vfpv3-d16 -mfloat-abi=softfp
 armv7-*-hard    ARMv7-a -mcpu=marvell-pj4 -mfpu=vfpv3-d16 -mfloat-abi=hard
 --------------- ------- ----------------- ------------------------------------
 * Note: For BE32(ARMv5te)/BE8(ARMv7-a) package, the prefix is armeb*-*-*.


     The armv5-* packages only provide ARM-code run-time and standard C/C++
 libraries.  That is your program may be built as THUMB1-code with '-mthumb'
 options, but linked to the ARM-code libraries.


     The armv7-* packages provide the multilib mechanism and provide ARM-code
 and THUMB2-code run-time and standard C/C++ libraries.  Use '-mthumb' option
 to link to the THUMB2-code libraries.


     If you are not sure the default core options, you can use 'gcc -v' to
 check the setting of '--with-cpu', '--with-fpu'  and '--with-float'.  For
 example:

 	$ arm-marvell-eabi-gcc -v
 	Using built-in specs.
 	Target: arm-marvell-eabi
 	Configured with: ... --with-cpu=marvell-f
 	                     --with-fpu=vfp
 	                     --with-float=softfp ...


     If your Marvell core is not matched, you should select the best similar
 package according to ARMv5te/ARMv7-a and soft/softfp/hard features.  And then
 use '-mcpu=' and '-mfpu=' options to compile the programs.  For example:

 	$ arm-marvell-eabi-gcc ... -mcpu=marvell-pj4b -mfpu=neon ...


     But you should known that about options does not affect the compiled
 libraries, unless the whole toolchains are re-configured and re-built.


 --------------------------------------------------------------------------------
 Marvell-defined GCC Options
 --------------------------------------------------------------------------------

 -mmarvell-div				(default: off)

     Generate Marvell-defined hardware integer division instructions supported
     by some Marvell cores instead of calling run-time suport library.

     For example, r1 = r2 / r3, the following co-processor instructions would be
     emitted according to singed or unsigned integer:

 	mrc	p6, 1, r1, c2, c3, 4	@ signed div
 	mrc	p6, 1, r1, c2, c3, 0	@ unsigned div

     Above codes are incompatiable with not-supporting chips and non-Marvell ARM
     cores.

     This option makes the predefined macro "__hw_int_div__" available.  This
     may be helpful for assembler coding.  Assembler can use SDIV/UDIV pseudo
     code instead of MRC code if the under gas is supported by Marvell.  User
     should note that the predefined macro "__hw_int_div__" is for Marvell-
     defined hardware integer division instructions, but "__ARM_ARCH_EXT_IDIV__"
     for ARM-defined ones.


 -mldrd-strd				(default: on for LDRD/STRD is supported)

     Turn on to support ldrd/strd code generation. It works iff architecture is
     ARMv6t2/ARMv7 in ARM/THUMB mode or ARMv5te in ARM mode.

     The old ABI -mabi=apcs-gnu/atpcs does not require double words at 8-bytes
     alignment, so -mldrd-strd is useless.

     The AAPCS ABI -mabi=aapcs/aapcs-linux always uses LDRD/STRD to access
     8-byte-aligned double-word data (e.g. 'long long', 'double', or 'struct'
     contains them).  But for some stange reasons that the data are not aligned
     on 8-bytes, you may wish to use -mno-ldrd-strd to avoid generating
     LDRD/STRD instructions.

     If this option is turned off, the predefined macro "__no_ldrd_strd__" is
     available.

     If you need to trun it off to solve the non-alignment access problem, you
     may better check your whole program.  Maybe the initial code (before 'main'
     function) does not adjust the 'sp' register to the 8-byte boundary.  Or
     maybe the data layout does not follow the AAPCS ABI.


 -mtune-ldrd				(default: off)

     Tune to generate LDRD/STRD (double word load-stores) instructions over
     LDM/STM.  For some Marvell's chips, the performacne of LDRD/STRD is better
     than LDM/STM.  Becasue of the requirement of 8-byte alignment for LDRD/STRD,
     the compiler may rearrange data aligned at most on 8-byte boundary in order
     to promote the probability of using them.

     Example 1:

 	struct S {
 	  char a[16];
 	};

 	struct S a, b;

 	void foo() { a = b; }

     Using -mtune-ldrd, the structure 'a' and 'b' are placed on the 8-byte
     boundary, so the compiler can generate LDRD/STRD insructions instead of
     LDM/STM to do the memory block coping.

     Example 2:

 	memcpy(q, "01234567", 8);

     The const string literal, i.e. "01234567", may be forced to aling 8-byte,
     so the memory routines, e.g. memcpy, may do a better job by using LDRD/STRD.
     Moreover, the compiler may inline or expand these routines and use
     LDRD/STRD.

     Example 3:

 	long long q;
 	q = 0x3456789034567890LL;

     The const long long literal, i.e. 0x3456789034567890LL, is divided into two
     words, and use 2 LDR instructions to load them.  Using -mtune-ldrd, the
     two-word long long literal is placed on the 8-byte boundary and 1 LDRD
     instruction is generated to load.

     Because of larger alignment and more LDRD/STRD code numbers, this option
     may waste code and data space, so is turned off if using -Os.

     Note that some data may be adjusted to align on 8-byte boundary.  If you
     want to keep the data's natural alignment, e.g. 4-byte alignment, you
     should do not use this option.  For building Linux or uBoot examples,
     turning on may corrupt the layout of some arrays which natural
     4-byte-aligned items come from different objects and are merged by the
     static linker (i.e ld).


 -mbxret					(default: on)

     Try to use the 'BX LR' instruction for function returns as possible.
     Marvell cores has tuned it with the hardware return-stack mechanism.

     For example, the original output codes:

 	foo:
 		@ function prologue
 		stmfd	sp!, {r3, lr}
 		...
 		@ function epilogue
 		ldmfd	sp!, {r3, pc}

     would becomes:

 	foo:
 		@ function prologue
 		stmfd	sp!, {r3, lr}
 		...
 		@ function epilogue
 		ldmfd	sp!, {r3, lr}
 		bx	lr

     This option may increase the code size.  Use -mno-bxret to turn it off.


 -mcond-exec				(default: on)

     Enable conditional execution (CE) instructions, overriding processor
     specific tune settings, while -mno-cond-exec does the reverse and forces
     all CE to be avoided.

     If this option is turned off, the predefined macro "__no_cond_exec__" is
     available.

     Some Marvell cores may get benefits from the pipeline scheduling by
     avoiding conditional execution.


 -mwmmxt					(default: off)

     Enable IWMMXT feature in Marvell Core.  You'd better use this option
     instead of -mcpu=xscale/iwmmxt/iwmmxt2.

     If this option is turned on, the predefined macro "__IWMMXT__"、
     __IWMMXT2__ and "__ARM_WMMX" are available.


 -msched1=ARM_CORE			(default: same as -mcpu=)
 -msched2=ARM_CORE			(default: same as -mcpu=)

     Specify pipeline description in pipeline scheduling staget before (
     -msched1=) and after (-msched2=) register allocation.  The valid ARM_CORE
     name is the same with '-mcpu=ARM_CORE' option.


 -mthumb-cbz				(default: off)

     Generate thumb instructions CBZ/CBNZ if possible. So far Marvell chips
     don't prefer them.


 -mpromote-inline-asm-input-int		(default: on)

     Promote the small integer type, e.g. short, of input operands of the inline
     asm to the full word-size integer.

 	__asm__ ("..." : ... : "r" (short_var) : ... );

     would be converted to:

 	__asm__ ("..." : ... : "r" ((int)short_var) : ... );


     The default is true.  If turning off, the compiler may do not zero- nor
     signed-extend the small int to the full one in the general integer
     register, and the high-part bytes of that register may contain garbage
     value.


 -mmemcpy-ninsn=N			(default: 4. Internal tunning option.)

     Specify the threshold of inlined memcpy instruction number. If exceeding
     this threshold, memcpy will take a library call.

     This internal tunning option affects the MOVE_RATIO value used in the
     Scalar Reduction of Aggregates (SRA) pass. For example:

         struct S s, *p;
         s = *p;        // structure copy even though some fields are not used!
 	...
 	s.f2++;

     If the threshold is higher, the compiler may not use memcpy call to do the
     structure assignment. Instead it may be tuned to the following code:

         s.f1 = p->f1;  // maybe redundant and can be removed latter.
         s.f2 = p->f2;
         ...
 	s.f2++;

     The member-wise assignment may be good sometime, e.g. simple structure and
     removing the redundant assignments,  but the higher the threshold is, the
     code size and register pressure also may become higher.

     This option may waste code and data space, so is turned off if using -Os.


 -freduce-passed-addressof		(default: off)

     Try to reduce the number of passing &var argument for improving points-to
     analysis.  For example:

 	foo(&local_var);

     If foo does not escape the local_var outside the current function, the
     compiler may convert it under safety to:

 	tmp = local_var;
 	foo(&tmp);
 	local_var = tmp;

     If being lucky, some optimizers may improve the useage of that local
     variable because of not escaping.


 -fargument-restrict			(default: off)

     Force pointer arguments to be qualified with restrict keyword. For
     example:

 	foo(int* p1, int* p2);

     is converted to:

 	foo(int* restrict p1, int* restrict p2);

     So that the compiler can assume the memory blocks pointed to by p1 and
     p2 are not overlapped and do more advance optimization.

     This option only works for the C language.  This is a dangerous option
     and works on the whole compilation unit.  Users should use it carefully.


 -floop-post-opt				(default: off)

     Perform some optimizations immediately after RTL-level loop unrolling, such
     as address forwarding and dead-code elimination.  From the experience of
     clinpack benchmark, moving some optimization passes early can reduce
     the data dependences and improve the instruction scheduling finally.


 -falign-arrays=N			(default: 0)

     Align the start of non-local arrays to the next power-of-two greater than or
     equal to the maximum value of N and their natural alignment.  For example:

 	char a1[11];
 	static char a2[22] = { 'a', 'b', ... };
 	void foo() { char a3[33]; ... }

     If N is 17 ~ 32, then a1 and a2 are aligned to 32-byte boundary, but the
     local stack variable, a3, is not affected by it.

     Sometime it can improve the cache operation performance. E.g. benchmark
     STREAM with N=32 may be improved on some Marvell's platforms.

     This option may waste data space, so is turned off if using -Os.


 --------------------------------------------------------------------------------
 Depreciated Marvell-defined GCC Options
 --------------------------------------------------------------------------------


     The following options are depreciated and will become obsolete (nothing) in
 the next Marvell GCC toolchains (mgcc48x).


 -miwmmxt-use-realign			(default: off)

     Enable IWMMXT Vectorizer using WALIGNR & WALIGNI to support unaligned
     load.


 -miwmmxt-use-aggressive-realign		(default: off)

     Use optimized realign scheme for IWMMXT vectorizer.


 -mneon-no-always-misalign		(default: on)

     Enable Vectorizer do loop peeling for vectorized loop to avoid unaligned
     load/store.


 -mfused-mac				(default: off, experiment)

     For VFPv4 or above, generate floating-point fused multiply-add instructions
     supported by some Marvell cores.


 --------------------------------------------------------------------------------
 Marvell-defined Binutils Features
 --------------------------------------------------------------------------------


 . Marvell-defined ld option

 	--skip-vendor-check
 		This option will skip vendor information check, so you
 		can link libraries built by different toolchain vendors, like
 		ARM RVCT, successfully.


 . Marvell-defined opcodes

   The following testsuite give you the Marvell-defined opcode list:
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 @	Tests for Marvell ALU extersion instructions
 	.text
 	.syntax unified
 	.thumb
 	sdiv r0, r1, r2
 	mrc p6, 1, r0, cr1, cr2, 4
 	sdiv r11, r7, r3
 	mrc p6, 1, r11, cr7, cr3, 4

 	.arm
 	sdiv r0, r1, r2
 	mrc p6, 1, r0, cr1, cr2, 4
 	sdiv r11, r7, r3
 	mrc p6, 1, r11, cr7, cr3, 4

 	.thumb
 	udiv r1, r2, r3
 	mrc p6, 1, r1, cr2, cr3, 0
 	udiv r10, r6, r2
 	mrc p6, 1, r10, cr6, cr2, 0

 	.arm
 	udiv r1, r2, r3
 	mrc p6, 1, r1, cr2, cr3, 0
 	udiv r10, r6, r2
 	mrc p6, 1, r10, cr6, cr2, 0

 	.thumb
 	cnt32 r11, r12
 	mrc p6,2,r11, cr12,cr0,0
 	cnt32 r1, r2
 	mrc p6,2,r1, cr2,cr0,0
 	cnt32 r3, r7
 	mrc p6,2,r3, cr7,cr0,0

 	.arm
 	cnt32 r11, r12
 	mrc p6,2,r11, cr12,cr0,0
 	cnt32 r1, r2
 	mrc p6,2,r1, cr2,cr0,0
 	cnt32 r3, r7
 	mrc p6,2,r3, cr7,cr0,0

 	.thumb
 	bitcnt2 r11, r12, r14
 	mrc p6,2,r11, cr12,cr14,1
 	bitcnt2 r1, r2, r0
 	mrc p6,2,r1, cr2,cr0,1
 	bitcnt2 r3, r7, r7
 	mrc p6,2,r3, cr7,cr7,1

 	and3 r0, r14, r8, r9, 7
 	mrc p6, 3, r0, cr14, cr8, 7
 	and3 r4, r2, r12, r13, 1
 	mrc p6, 3, r4, cr2, cr12, 1
 	and3 r14, r7, r0, r1, 3
 	mrc p6, 3, r14, cr7, cr0, 3

 	and3one r0, r14, r8, r9, 7
 	mrc p6, 4, r0, cr14, cr8, 7
 	and3one r4, r2, r12, r13, 1
 	mrc p6, 4, r4, cr2, cr12, 1
 	and3one r14, r7, r0, r1, 3
 	mrc p6, 4, r14, cr7, cr0, 3

 	mrc p6, 5, r4, cr2, cr3, 7
 	qaddsub r4, r7, r2, r3

 	mrc p6, 6, r4, cr2, cr3, 7
 	qdaddsub r4, r7, r2, r3

 	cmp4x4 r14, r7, r7
 	mrc p6, 0, r14, cr7, cr7, 0
 	cmp4x4 r7, r1, r10
 	mrc p6, 0, r7, cr1, cr10, 0
 	cmp4x4 r2, r9, r2
 	mrc p6, 0, r2, cr9, cr2, 0
 	cmp4x4 r3, r2, r15
 	mrc p6, 0, r3, cr2, cr15, 0

 	cmp4x4s r7, r1, r10
 	mrc p6, 0, r7, cr1, cr10, 1
 	cmp1x4 r2, r9, r2
 	mrc p6, 0, r2, cr9, cr2, 2
 	cmp1x4s r3, r2, r15
 	mrc p6, 0, r3, cr2, cr15, 3
 	cmp1x3 r2, r9, r2
 	mrc p6, 0, r2, cr9, cr2, 4
 	cmp1x3s r3, r2, r15
 	mrc p6, 0, r3, cr2, cr15, 5
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	Topic: Marvell-defined options in GCC C/C++ Compiler
	Release: 2013Q2 (gcc version 4.6.4)
	Date: 2013/06


	The document describes the command line options of GCC created by Marvell
	in addition to original options.


	Default Core Options of Marvell GCC
	Marvell-defined GCC Options
	Depreciated Marvell-defined GCC Options
	Marvell-defined Binutils Features





	--------------------------------------------------------------------------------
	Default Core Options of Marvell GCC
	--------------------------------------------------------------------------------


	The following listing is the basic Marvell's CPU cores:

	-------------- --------- ------- ---------- ---------------
	CPU Cores ISA Thumb VFP VFP+SIMD(NEON)
	-mcpu= -march= -mthumb -mfpu= -mfpu=
	-------------- --------- ------- ---------- ---------------
	marvell-pj1 armv5te THUMB1 vfp N/A
	marvell-f armv5te THUMB1 vfp N/A
	marvell-fv7 armv7-a THUMB2 vfpv3-d16 N/A
	marvell-pj4 armv7-a THUMB2 vfpv3-d16 N/A
	marvell-pj4b armv7-a THUMB2 vfpv3 neon
	vfpv3-fp16 neon-fp16
	marvell-pj4c armv7-a THUMB2 vfpv3 neon
	vfpv3-fp16 neon-fp16
	-------------- --------- ------- ---------- ---------------

	'marvell-pj4c' is only for experiment.

	The Marvell GCC uses the following options to select what features the
	target supports.

	-mcpu=<marvell-...>
	Specify the target core with its ISA. It's better to use
	'-mcpu=<marvell-...>' instead of '-march=<armv5te/armv7-a>'.
	The latter may not known the new features of Marvell cores.

	-mfpu=<vfp/vfpv3-d16/vfpv3/vfpv3-fp16/neon/neon-fp16>
	Specify the version of VFP or SIMD(NEON).

	-mfloat-abi=<soft/softfp/hard>
	'soft' float ABI does not generate VFP instructions, and others
	do. For function calls, 'softfp' does not pass floating-point
	value by VFP registers, but 'hard' may do. Object files
	generated with 'soft' and 'softfp' are compatible, but not
	with 'hard'.

	-mthumb
	For ARMv5te, enable THUMB1 features; for ARMv7-a, enable THUMB2.
	Note that THUMB1 and VFP cannot co-work, so it is necessary to
	use -mfloat-abi=soft to stop generating VFP instruction.

	-mwmmxt
	If your Marvell Cores support WMMXT feature, this option let
	GCC generate the WMMXT instructions. WMMXT and NEON are
	incompatible.


	It's hard to build the Marvell GCC Toolchains by user himself, and use
	the correct options mentioned above. Marvell provides some pre-built toolchains
	to support different target features. See the list:

	--------------- ------- ----------------- ------------------------------------
	Package ISA CPU Core VFP and ABI
	--------------- ------- ----------------- ------------------------------------
	armv5-*-soft ARMv5te -mcpu=marvell-f -mfloat-abi=soft
	armv5-*-softfp ARMv5te -mcpu=marvell-f -mfpu=vfp -mfloat-abi=softfp
	armv5-*-hard ARMv5te -mcpu=marvell-f -mfpu=vfp -mfloat-abi=hard
	armv7-*-soft ARMv7-a -mcpu=marvell-pj4 -mfloat-abi=soft
	armv7-*-softfp ARMv7-a -mcpu=marvell-pj4 -mfpu=vfpv3-d16 -mfloat-abi=softfp
	armv7-*-hard ARMv7-a -mcpu=marvell-pj4 -mfpu=vfpv3-d16 -mfloat-abi=hard
	--------------- ------- ----------------- ------------------------------------
	* Note: For BE32(ARMv5te)/BE8(ARMv7-a) package, the prefix is armeb--*.


	The armv5-* packages only provide ARM-code run-time and standard C/C++
	libraries. That is your program may be built as THUMB1-code with '-mthumb'
	options, but linked to the ARM-code libraries.


	The armv7-* packages provide the multilib mechanism and provide ARM-code
	and THUMB2-code run-time and standard C/C++ libraries. Use '-mthumb' option
	to link to the THUMB2-code libraries.


	If you are not sure the default core options, you can use 'gcc -v' to
	check the setting of '--with-cpu', '--with-fpu' and '--with-float'. For
	example:

	$ arm-marvell-eabi-gcc -v
	Using built-in specs.
	Target: arm-marvell-eabi
	Configured with: ... --with-cpu=marvell-f
	--with-fpu=vfp
	--with-float=softfp ...


	If your Marvell core is not matched, you should select the best similar
	package according to ARMv5te/ARMv7-a and soft/softfp/hard features. And then
	use '-mcpu=' and '-mfpu=' options to compile the programs. For example:

	$ arm-marvell-eabi-gcc ... -mcpu=marvell-pj4b -mfpu=neon ...


	But you should known that about options does not affect the compiled
	libraries, unless the whole toolchains are re-configured and re-built.





	--------------------------------------------------------------------------------
	Marvell-defined GCC Options
	--------------------------------------------------------------------------------

	-mmarvell-div (default: off)

	Generate Marvell-defined hardware integer division instructions supported
	by some Marvell cores instead of calling run-time suport library.

	For example, r1 = r2 / r3, the following co-processor instructions would be
	emitted according to singed or unsigned integer:

	mrc p6, 1, r1, c2, c3, 4 @ signed div
	mrc p6, 1, r1, c2, c3, 0 @ unsigned div

	Above codes are incompatiable with not-supporting chips and non-Marvell ARM
	cores.

	This option makes the predefined macro "__hw_int_div__" available. This
	may be helpful for assembler coding. Assembler can use SDIV/UDIV pseudo
	code instead of MRC code if the under gas is supported by Marvell. User
	should note that the predefined macro "__hw_int_div__" is for Marvell-
	defined hardware integer division instructions, but "__ARM_ARCH_EXT_IDIV__"
	for ARM-defined ones.



	-mldrd-strd (default: on for LDRD/STRD is supported)

	Turn on to support ldrd/strd code generation. It works iff architecture is
	ARMv6t2/ARMv7 in ARM/THUMB mode or ARMv5te in ARM mode.

	The old ABI -mabi=apcs-gnu/atpcs does not require double words at 8-bytes
	alignment, so -mldrd-strd is useless.

	The AAPCS ABI -mabi=aapcs/aapcs-linux always uses LDRD/STRD to access
	8-byte-aligned double-word data (e.g. 'long long', 'double', or 'struct'
	contains them). But for some stange reasons that the data are not aligned
	on 8-bytes, you may wish to use -mno-ldrd-strd to avoid generating
	LDRD/STRD instructions.

	If this option is turned off, the predefined macro "__no_ldrd_strd__" is
	available.

	If you need to trun it off to solve the non-alignment access problem, you
	may better check your whole program. Maybe the initial code (before 'main'
	function) does not adjust the 'sp' register to the 8-byte boundary. Or
	maybe the data layout does not follow the AAPCS ABI.



	-mtune-ldrd (default: off)

	Tune to generate LDRD/STRD (double word load-stores) instructions over
	LDM/STM. For some Marvell's chips, the performacne of LDRD/STRD is better
	than LDM/STM. Becasue of the requirement of 8-byte alignment for LDRD/STRD,
	the compiler may rearrange data aligned at most on 8-byte boundary in order
	to promote the probability of using them.

	Example 1:

	struct S {
	char a[16];
	};

	struct S a, b;

	void foo() { a = b; }

	Using -mtune-ldrd, the structure 'a' and 'b' are placed on the 8-byte
	boundary, so the compiler can generate LDRD/STRD insructions instead of
	LDM/STM to do the memory block coping.

	Example 2:

	memcpy(q, "01234567", 8);

	The const string literal, i.e. "01234567", may be forced to aling 8-byte,
	so the memory routines, e.g. memcpy, may do a better job by using LDRD/STRD.
	Moreover, the compiler may inline or expand these routines and use
	LDRD/STRD.

	Example 3:

	long long q;
	q = 0x3456789034567890LL;

	The const long long literal, i.e. 0x3456789034567890LL, is divided into two
	words, and use 2 LDR instructions to load them. Using -mtune-ldrd, the
	two-word long long literal is placed on the 8-byte boundary and 1 LDRD
	instruction is generated to load.

	Because of larger alignment and more LDRD/STRD code numbers, this option
	may waste code and data space, so is turned off if using -Os.

	Note that some data may be adjusted to align on 8-byte boundary. If you
	want to keep the data's natural alignment, e.g. 4-byte alignment, you
	should do not use this option. For building Linux or uBoot examples,
	turning on may corrupt the layout of some arrays which natural
	4-byte-aligned items come from different objects and are merged by the
	static linker (i.e ld).



	-mbxret (default: on)

	Try to use the 'BX LR' instruction for function returns as possible.
	Marvell cores has tuned it with the hardware return-stack mechanism.

	For example, the original output codes:

	foo:
	@ function prologue
	stmfd sp!, {r3, lr}
	...
	@ function epilogue
	ldmfd sp!, {r3, pc}

	would becomes:

	foo:
	@ function prologue
	stmfd sp!, {r3, lr}
	...
	@ function epilogue
	ldmfd sp!, {r3, lr}
	bx lr

	This option may increase the code size. Use -mno-bxret to turn it off.



	-mcond-exec (default: on)

	Enable conditional execution (CE) instructions, overriding processor
	specific tune settings, while -mno-cond-exec does the reverse and forces
	all CE to be avoided.

	If this option is turned off, the predefined macro "__no_cond_exec__" is
	available.

	Some Marvell cores may get benefits from the pipeline scheduling by
	avoiding conditional execution.



	-mwmmxt (default: off)

	Enable IWMMXT feature in Marvell Core. You'd better use this option
	instead of -mcpu=xscale/iwmmxt/iwmmxt2.

	If this option is turned on, the predefined macro "__IWMMXT__"、
	__IWMMXT2__ and "__ARM_WMMX" are available.



	-msched1=ARM_CORE (default: same as -mcpu=)
	-msched2=ARM_CORE (default: same as -mcpu=)

	Specify pipeline description in pipeline scheduling staget before (
	-msched1=) and after (-msched2=) register allocation. The valid ARM_CORE
	name is the same with '-mcpu=ARM_CORE' option.



	-mthumb-cbz (default: off)

	Generate thumb instructions CBZ/CBNZ if possible. So far Marvell chips
	don't prefer them.



	-mpromote-inline-asm-input-int (default: on)

	Promote the small integer type, e.g. short, of input operands of the inline
	asm to the full word-size integer.

	__asm__ ("..." : ... : "r" (short_var) : ... );

	would be converted to:

	__asm__ ("..." : ... : "r" ((int)short_var) : ... );


	The default is true. If turning off, the compiler may do not zero- nor
	signed-extend the small int to the full one in the general integer
	register, and the high-part bytes of that register may contain garbage
	value.



	-mmemcpy-ninsn=N (default: 4. Internal tunning option.)

	Specify the threshold of inlined memcpy instruction number. If exceeding
	this threshold, memcpy will take a library call.

	This internal tunning option affects the MOVE_RATIO value used in the
	Scalar Reduction of Aggregates (SRA) pass. For example:

	struct S s, *p;
	s = *p; // structure copy even though some fields are not used!
	...
	s.f2++;

	If the threshold is higher, the compiler may not use memcpy call to do the
	structure assignment. Instead it may be tuned to the following code:

	s.f1 = p->f1; // maybe redundant and can be removed latter.
	s.f2 = p->f2;
	...
	s.f2++;

	The member-wise assignment may be good sometime, e.g. simple structure and
	removing the redundant assignments, but the higher the threshold is, the
	code size and register pressure also may become higher.

	This option may waste code and data space, so is turned off if using -Os.



	-freduce-passed-addressof (default: off)

	Try to reduce the number of passing &var argument for improving points-to
	analysis. For example:

	foo(&local_var);

	If foo does not escape the local_var outside the current function, the
	compiler may convert it under safety to:

	tmp = local_var;
	foo(&tmp);
	local_var = tmp;

	If being lucky, some optimizers may improve the useage of that local
	variable because of not escaping.



	-fargument-restrict (default: off)

	Force pointer arguments to be qualified with restrict keyword. For
	example:

	foo(int* p1, int* p2);

	is converted to:

	foo(int* restrict p1, int* restrict p2);

	So that the compiler can assume the memory blocks pointed to by p1 and
	p2 are not overlapped and do more advance optimization.

	This option only works for the C language. This is a dangerous option
	and works on the whole compilation unit. Users should use it carefully.



	-floop-post-opt (default: off)

	Perform some optimizations immediately after RTL-level loop unrolling, such
	as address forwarding and dead-code elimination. From the experience of
	clinpack benchmark, moving some optimization passes early can reduce
	the data dependences and improve the instruction scheduling finally.



	-falign-arrays=N (default: 0)

	Align the start of non-local arrays to the next power-of-two greater than or
	equal to the maximum value of N and their natural alignment. For example:

	char a1[11];
	static char a2[22] = { 'a', 'b', ... };
	void foo() { char a3[33]; ... }

	If N is 17 ~ 32, then a1 and a2 are aligned to 32-byte boundary, but the
	local stack variable, a3, is not affected by it.

	Sometime it can improve the cache operation performance. E.g. benchmark
	STREAM with N=32 may be improved on some Marvell's platforms.

	This option may waste data space, so is turned off if using -Os.





	--------------------------------------------------------------------------------
	Depreciated Marvell-defined GCC Options
	--------------------------------------------------------------------------------



	The following options are depreciated and will become obsolete (nothing) in
	the next Marvell GCC toolchains (mgcc48x).



	-miwmmxt-use-realign (default: off)

	Enable IWMMXT Vectorizer using WALIGNR & WALIGNI to support unaligned
	load.



	-miwmmxt-use-aggressive-realign (default: off)

	Use optimized realign scheme for IWMMXT vectorizer.



	-mneon-no-always-misalign (default: on)

	Enable Vectorizer do loop peeling for vectorized loop to avoid unaligned
	load/store.



	-mfused-mac (default: off, experiment)

	For VFPv4 or above, generate floating-point fused multiply-add instructions
	supported by some Marvell cores.





	--------------------------------------------------------------------------------
	Marvell-defined Binutils Features
	--------------------------------------------------------------------------------



	. Marvell-defined ld option

	--skip-vendor-check
	This option will skip vendor information check, so you
	can link libraries built by different toolchain vendors, like
	ARM RVCT, successfully.



	. Marvell-defined opcodes

	The following testsuite give you the Marvell-defined opcode list:
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	@ Tests for Marvell ALU extersion instructions
	.text
	.syntax unified
	.thumb
	sdiv r0, r1, r2
	mrc p6, 1, r0, cr1, cr2, 4
	sdiv r11, r7, r3
	mrc p6, 1, r11, cr7, cr3, 4

	.arm
	sdiv r0, r1, r2
	mrc p6, 1, r0, cr1, cr2, 4
	sdiv r11, r7, r3
	mrc p6, 1, r11, cr7, cr3, 4

	.thumb
	udiv r1, r2, r3
	mrc p6, 1, r1, cr2, cr3, 0
	udiv r10, r6, r2
	mrc p6, 1, r10, cr6, cr2, 0

	.arm
	udiv r1, r2, r3
	mrc p6, 1, r1, cr2, cr3, 0
	udiv r10, r6, r2
	mrc p6, 1, r10, cr6, cr2, 0

	.thumb
	cnt32 r11, r12
	mrc p6,2,r11, cr12,cr0,0
	cnt32 r1, r2
	mrc p6,2,r1, cr2,cr0,0
	cnt32 r3, r7
	mrc p6,2,r3, cr7,cr0,0

	.arm
	cnt32 r11, r12
	mrc p6,2,r11, cr12,cr0,0
	cnt32 r1, r2
	mrc p6,2,r1, cr2,cr0,0
	cnt32 r3, r7
	mrc p6,2,r3, cr7,cr0,0

	.thumb
	bitcnt2 r11, r12, r14
	mrc p6,2,r11, cr12,cr14,1
	bitcnt2 r1, r2, r0
	mrc p6,2,r1, cr2,cr0,1
	bitcnt2 r3, r7, r7
	mrc p6,2,r3, cr7,cr7,1

	and3 r0, r14, r8, r9, 7
	mrc p6, 3, r0, cr14, cr8, 7
	and3 r4, r2, r12, r13, 1
	mrc p6, 3, r4, cr2, cr12, 1
	and3 r14, r7, r0, r1, 3
	mrc p6, 3, r14, cr7, cr0, 3

	and3one r0, r14, r8, r9, 7
	mrc p6, 4, r0, cr14, cr8, 7
	and3one r4, r2, r12, r13, 1
	mrc p6, 4, r4, cr2, cr12, 1
	and3one r14, r7, r0, r1, 3
	mrc p6, 4, r14, cr7, cr0, 3

	mrc p6, 5, r4, cr2, cr3, 7
	qaddsub r4, r7, r2, r3

	mrc p6, 6, r4, cr2, cr3, 7
	qdaddsub r4, r7, r2, r3

	cmp4x4 r14, r7, r7
	mrc p6, 0, r14, cr7, cr7, 0
	cmp4x4 r7, r1, r10
	mrc p6, 0, r7, cr1, cr10, 0
	cmp4x4 r2, r9, r2
	mrc p6, 0, r2, cr9, cr2, 0
	cmp4x4 r3, r2, r15
	mrc p6, 0, r3, cr2, cr15, 0

	cmp4x4s r7, r1, r10
	mrc p6, 0, r7, cr1, cr10, 1
	cmp1x4 r2, r9, r2
	mrc p6, 0, r2, cr9, cr2, 2
	cmp1x4s r3, r2, r15
	mrc p6, 0, r3, cr2, cr15, 3
	cmp1x3 r2, r9, r2
	mrc p6, 0, r2, cr9, cr2, 4
	cmp1x3s r3, r2, r15
	mrc p6, 0, r3, cr2, cr15, 5
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~