Yamamo's Blog

A blog of rough notes (^_^;A

ARM DEN0013C - Appendix B NEON and VFP Instruction Summary

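Cortex-A Series Programmer’s Guide
Version: 3.0
ARM DEN0013C (ID071612)

I compiled "Appendix B NEON and VFP Instruction Summary" from the guide above into a single table.
I extracted the information with a Perl script, but there were many places it could not pull out cleanly, so I copied those by hand. There may therefore be transcription errors m(_ _)m
On the blog the table does not look pretty because of the page width, but if you copy it and paste it into Microsoft Excel or LibreOffice Calc, it becomes a proper spreadsheet.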



B.1 NEON general data processing instructions
B.1.1 VCVT: Vector Convert (fixed-point or integer to floating-point)
VCVT{cond}.type Qd, Qm {, #fbits}
VCVT{cond}.type Dd, Dm {, #fbits}
B.1.2 VCVT: Vector Convert (between half-precision and single-precision floating-point)
VCVT{cond}.F32.F16 Qd, Dm
VCVT{cond}.F16.F32 Dd, Qm
B.1.3 VDUP: Vector Duplicate
VDUP{cond}.size Qd, Dm[x]
VDUP{cond}.size Dd, Dm[x]
VDUP{cond}.size Qd, Rm
VDUP{cond}.size Dd, Rm
B.1.4 VEXT: Vector Extract
VEXT{cond}.8 {Qd,} Qn, Qm, #imm
VEXT{cond}.8 {Dd,} Dn, Dm, #imm
B.1.5 VMOV: Vector Bitwise Move
VMOV{cond}.datatype Qd, #imm
VMOV{cond}.datatype Dd, #imm
B.1.6 VMVN: Vector Bitwise NOT
VMVN{cond}.datatype Qd, #imm
VMVN{cond}.datatype Dd, #imm
B.1.7 VMOVL, V{Q}MOVN, VQMOVUN: VMOVL (Vector Move Long), VMOVN (Vector Move and Narrow), VQMOVN (Vector Saturating Move and Narrow), VQMOVUN (Vector Saturating Move and Narrow, signed operand with Unsigned result)
VMOVL{cond}.datatype Qd, Dm
V{Q}MOVN{cond}.datatype Dd, Qm
VQMOVUN{cond}.datatype Dd, Qm
B.1.8 VREV: VREV16 (Vector Reverse halfwords), VREV32 (Vector Reverse words), VREV64 (Vector Reverse doublewords)
VREVn{cond}.size Qd, Qm
VREVn{cond}.size Dd, Dm
B.1.9 VSWP: Vector Swap
VSWP{cond}{.datatype} Qd, Qm
VSWP{cond}{.datatype} Dd, Dm
B.1.10 VTBL: Vector Table Lookup
VTBL{cond}.8 Dd, list, Dm
B.1.11 VTBX: Vector Table Extension
VTBX{cond}.8 Dd, list, Dm
B.1.12 VTRN: Vector Transpose
VTRN{cond}.size Qd, Qm
VTRN{cond}.size Dd, Dm
B.1.13 VUZP: Vector Unzip
VUZP{cond}.size Qd, Qm
VUZP{cond}.size Dd, Dm
B.1.14 VZIP: Vector Zip
VZIP{cond}.size Qd, Qm
VZIP{cond}.size Dd, Dm
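
The one-line descriptions of VEXT, VZIP and VUZP are hard to picture, so here is a rough sketch of what they do to the lanes. It is a plain Python model on lists, not real NEON code. The function names are made up for this note, and lane 0 is assumed to be the least significant element.

```python
def vext(dn, dm, imm):
    """Model of VEXT: take len(dn) lanes starting at lane imm
    from the concatenation dn + dm (lane 0 = least significant)."""
    assert len(dn) == len(dm) and 0 <= imm < len(dn)
    return (dn + dm)[imm:imm + len(dn)]


def vzip(dd, dm):
    """Model of VZIP: interleave the lanes of two vectors.
    Returns the new contents of (dd, dm)."""
    zipped = [lane for pair in zip(dd, dm) for lane in pair]
    half = len(dd)
    return zipped[:half], zipped[half:]


def vuzp(dd, dm):
    """Model of VUZP: de-interleave, the inverse of vzip.
    Even lanes go to dd, odd lanes go to dm."""
    joined = dd + dm
    return joined[0::2], joined[1::2]


a = [0, 1, 2, 3, 4, 5, 6, 7]
b = [10, 11, 12, 13, 14, 15, 16, 17]
print(vext(a, b, 3))
print(vzip(a, b))
print(vuzp(*vzip(a, b)) == (a, b))
```

Zipping and then unzipping gives back the original pair, which is a handy sanity check when reading code that uses these instructions.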

B.2 NEON shift instructions
B.2.1 VSHL, VQSHL, VQSHLU, VSHLL: Vector Shift Left (by immediate). VSHL (Vector Shift Left), VQSHL (Vector Saturating Shift Left), VQSHLU (Vector Saturating Shift Left Unsigned), VSHLL (Vector Shift Left Long)
V{Q}SHL{U}{cond}.datatype {Qd,} Qm, #imm
V{Q}SHL{U}{cond}.datatype {Dd,} Dm, #imm
VSHLL{cond}.datatype Qd, Dm, #imm
B.2.2 V{Q}{R}SHL: Vector Shift Left (by signed variable)
V{Q}{R}SHL{cond}.datatype {Qd,} Qm, Qn
V{Q}{R}SHL{cond}.datatype {Dd,} Dm, Dn
B.2.3 V{R}SHR{N}, V{R}SRA: V{R}SHR{N} (Vector Shift Right (by immediate value)), V{R}SRA (Vector Shift Right (by immediate value) and Accumulate)
V{R}SHR{cond}.datatype {Qd,} Qm, #imm
V{R}SHR{cond}.datatype {Dd,} Dm, #imm
V{R}SRA{cond}.datatype {Qd,} Qm, #imm
V{R}SRA{cond}.datatype {Dd,} Dm, #imm
V{R}SHRN{cond}.datatype Dd, Qm, #imm
B.2.4 VQ{R}SHR{U}N: Vector Saturating Shift Right, Narrow (by immediate value) with optional Rounding
VQ{R}SHR{U}N{cond}.datatype Dd, Qm, #imm
B.2.5 VSLI: Vector Shift Left and Insert
VSLI{cond}.size {Qd,} Qm, #imm
VSLI{cond}.size {Dd,} Dm, #imm
B.2.6 VSRI: Vector Shift Right and Insert
VSRI{cond}.size {Qd,} Qm, #imm
VSRI{cond}.size {Dd,} Dm, #imm
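
To make the Q (saturating) and R (rounding) prefixes and the "insert" variants a bit more concrete, here is a small per-lane sketch in Python. It is only my own model of one lane with immediate shift counts. The helper names and the default 8-bit lane width are assumptions for illustration, not anything from the guide.

```python
def vrshr_u(x, n, bits=8):
    """VRSHR.U<bits> #n on one lane: shift right with rounding
    (half of the last dropped bit position is added first)."""
    return ((x + (1 << (n - 1))) >> n) & ((1 << bits) - 1)


def vqshl_s(x, n, bits=8):
    """VQSHL.S<bits> #n on one lane: shift left, then clamp
    to the signed range instead of wrapping around."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, x << n))


def vsli(dest, src, n, bits=8):
    """VSLI.<bits> #n: shift src left by n and insert it into dest,
    keeping the low n bits of dest."""
    mask = (1 << bits) - 1
    low = (1 << n) - 1
    return ((src << n) & mask) | (dest & low)


def vsri(dest, src, n, bits=8):
    """VSRI.<bits> #n: shift src right by n and insert it into dest,
    keeping the top n bits of dest."""
    mask = (1 << bits) - 1
    keep = mask & ~(mask >> n)
    return ((src & mask) >> n) | (dest & keep)


print(vrshr_u(5, 1), vqshl_s(100, 1), vqshl_s(-100, 1))
print(bin(vsli(0b00000011, 0b00001111, 2)), bin(vsri(0b11000000, 0b11110000, 2)))
```

VSLI and VSRI are the ones that differ most from an ordinary shift: part of the destination survives, which is what makes them useful for packing bit fields.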

B.3 NEON logical and compare operations
B.3.1 VACGE, VACGT: Vector Absolute Compare
VACop{cond}.F32 {Qd,} Qn, Qm
VACop{cond}.F32 {Dd,} Dn, Dm
B.3.2 VAND: Bitwise logical AND operation
VAND{cond}{.datatype} {Qd,} Qn, Qm
VAND{cond}{.datatype} {Dd,} Dn, Dm
B.3.3 VBIC: Vector Bitwise Clear
VBIC{cond}.datatype Qd, #imm
VBIC{cond}.datatype Dd, #imm
B.3.4 VBIC: Vector Bitwise Clear
VBIC{cond}{.datatype} {Qd,} Qn, Qm
VBIC{cond}{.datatype} {Dd,} Dn, Dm
B.3.5 VBIF: Vector Bitwise Insert if False
VBIF{cond}{.datatype} {Qd,} Qn, Qm
VBIF{cond}{.datatype} {Dd,} Dn, Dm
B.3.6 VBIT: Vector Bitwise Insert if True
VBIT{cond}{.datatype} {Qd,} Qn, Qm
VBIT{cond}{.datatype} {Dd,} Dn, Dm
B.3.7 VBSL: Vector Bitwise Select
VBSL{cond}{.datatype} {Qd,} Qn, Qm
VBSL{cond}{.datatype} {Dd,} Dn, Dm
B.3.8 VCEQ, VCGE, VCGT, VCLE, VCLT: Vector Compare
VCop{cond}.datatype {Qd,} Qn, Qm
B.3.9 VEOR: Bitwise logical exclusive OR
VEOR{cond}{.datatype} {Qd,} Qn, Qm
VEOR{cond}{.datatype} {Dd,} Dn, Dm
B.3.10 VMOV: Vector Move (register)
VMOV{cond}{.datatype} Qd, Qm
B.3.11 VMVN: Vector Bitwise NOT (register)
VMVN{cond}{.datatype} Qd, Qm
VMVN{cond}{.datatype} Dd, Dm
B.3.12 VORN: Bitwise logical OR NOT operation
VORN{cond}{.datatype} {Qd,} Qn, Qm
VORN{cond}{.datatype} {Dd,} Dn, Dm
B.3.13 VORR: Bitwise OR (immediate)
VORR{cond}.datatype Qd, #imm
VORR{cond}.datatype Dd, #imm
B.3.14 VORR: Bitwise OR (register)
VORR{cond}{.datatype} {Qd,} Qn, Qm
VORR{cond}{.datatype} {Dd,} Dn, Dm
B.3.15 VTST: Vector Test Bits
VTST{cond}.size {Qd,} Qn, Qm
VTST{cond}.size {Dd,} Dn, Dm
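
The compare instructions produce an all-ones or all-zeros mask per lane, and VBSL, VBIT and VBIF then pick bits using such a mask. The three select instructions are easy to mix up, so below is a minimal Python sketch of how I understand them, using 8-bit unsigned lanes. The function names are hypothetical and only model the bit logic.

```python
MASK8 = 0xFF  # assumed lane width: 8 bits


def vcgt_u8(a, b):
    """VCGT.U8: all ones in a lane where a > b, otherwise zero."""
    return [MASK8 if x > y else 0 for x, y in zip(a, b)]


def vbsl(d, n, m):
    """VBSL Dd, Dn, Dm: d is the mask. Take the bit of n where d is 1, else m."""
    return [(nn & dd) | (mm & ~dd & MASK8) for dd, nn, mm in zip(d, n, m)]


def vbit(d, n, m):
    """VBIT Dd, Dn, Dm: insert the bit of n into d where m is 1."""
    return [(nn & mm) | (dd & ~mm & MASK8) for dd, nn, mm in zip(d, n, m)]


def vbif(d, n, m):
    """VBIF Dd, Dn, Dm: insert the bit of n into d where m is 0."""
    return [(nn & ~mm & MASK8) | (dd & mm) for dd, nn, mm in zip(d, n, m)]


a = [1, 200, 30, 4]
b = [9, 100, 30, 5]
mask = vcgt_u8(a, b)
# Three ways to build a per-lane maximum from the same mask.
print(vbsl(mask, a, b), vbit(b, a, mask), vbif(a, b, mask))
```

The only real difference between the three is which operand is overwritten: the mask (VBSL), the "false" value (VBIT) or the "true" value (VBIF).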

B.4 NEON arithmetic instructions
B.4.1 VABA{L}: Vector Absolute Difference and Accumulate
VABA{cond}.datatype {Qd,} Qn, Qm
VABA{cond}.datatype {Dd,} Dn, Dm
VABAL{cond}.datatype Qd, Dn, Dm
B.4.2 VABD{L}: Vector Absolute Difference
VABD{cond}.datatype {Qd,} Qn, Qm
VABD{cond}.datatype {Dd,} Dn, Dm
VABDL{cond}.datatype Qd, Dn, Dm
B.4.3 V{Q}ABS: Vector Absolute
V{Q}ABS{cond}.datatype Qd, Qm
V{Q}ABS{cond}.datatype Dd, Dm
B.4.4 V{Q}ADD, VADDL, VADDW: Vector Add
V{Q}ADD{cond}.datatype {Qd,} Qn, Qm
V{Q}ADD{cond}.datatype {Dd,} Dn, Dm
VADDL{cond}.datatype Qd, Dn, Dm
VADDW{cond}.datatype {Qd,} Qn, Dm
B.4.5 V{R}ADDHN: Vector Add and Narrow, selecting High half
V{R}ADDHN{cond}.datatype Dd, Qn, Qm
B.4.6 VCLS: Vector Count Leading Sign Bits
VCLS{cond}.datatype Qd, Qm
VCLS{cond}.datatype Dd, Dm
B.4.7 VCLZ: Vector Count Leading Zeros
VCLZ{cond}.datatype Qd, Qm
VCLZ{cond}.datatype Dd, Dm
B.4.8 VCNT: Vector Count Set Bits
VCNT{cond}.datatype Qd, Qm
VCNT{cond}.datatype Dd, Dm
B.4.9 V{R}HADD: Vector Halving Add
V{R}HADD{cond}.datatype {Qd,} Qn, Qm
V{R}HADD{cond}.datatype {Dd,} Dn, Dm
B.4.10 VHSUB: Vector Halving Subtract
VHSUB{cond}.datatype {Qd,} Qn, Qm
VHSUB{cond}.datatype {Dd,} Dn, Dm
B.4.11 VMAX, VMIN: VMAX (Vector Maximum), VMIN (Vector Minimum)
VMAX{cond}.datatype Qd, Qn, Qm
VMAX{cond}.datatype Dd, Dn, Dm
VMIN{cond}.datatype Qd, Qn, Qm
VMIN{cond}.datatype Dd, Dn, Dm
B.4.12 V{Q}NEG: Vector Negate
V{Q}NEG{cond}.datatype Qd, Qm
V{Q}NEG{cond}.datatype Dd, Dm
B.4.13 VPADD{L}, VPADAL: VPADD (Vector Pairwise Add), VPADDL (Vector Pairwise Add Long), VPADAL (Vector Pairwise Add and Accumulate Long)
VPADD{cond}.datatype {Dd,} Dn, Dm
VPopL{cond}.datatype Qd, Qm
VPopL{cond}.datatype Dd, Dm
B.4.14 VPMAX, VPMIN: VPMAX (Vector Pairwise Maximum), VPMIN (Vector Pairwise Minimum)
VPMAX{cond}.datatype Dd, Dn, Dm
VPMIN{cond}.datatype Dd, Dn, Dm
B.4.15 VRECPE: Vector Reciprocal Estimate
VRECPE{cond}.datatype Qd, Qm
VRECPE{cond}.datatype Dd, Dm
B.4.16 VRECPS: Vector Reciprocal Step
VRECPS{cond}.F32 {Qd,} Qn, Qm
VRECPS{cond}.F32 {Dd,} Dn, Dm
B.4.17 VRSQRTE: Vector Reciprocal Square Root Estimate
VRSQRTE{cond}.datatype Qd, Qm
VRSQRTE{cond}.datatype Dd, Dm
B.4.18 VRSQRTS: Vector Reciprocal Square Root Step
VRSQRTS{cond}.F32 {Qd,} Qn, Qm
VRSQRTS{cond}.F32 {Dd,} Dn, Dm
B.4.19 V{Q}SUB, VSUBL, VSUBW: Vector Subtract
V{Q}SUB{cond}.datatype {Qd,} Qn, Qm
V{Q}SUB{cond}.datatype {Dd,} Dn, Dm
VSUBL{cond}.datatype Qd, Dn, Dm
VSUBW{cond}.datatype {Qd,} Qn, Dm
B.4.20 V{R}SUBHN: Vector Subtract and Narrow, selecting High Half
V{R}SUBHN{cond}.datatype Dd, Qn, Qm
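
The "Estimate" and "Step" pairs (VRECPE/VRECPS and VRSQRTE/VRSQRTS) are meant to be combined into Newton-Raphson iterations. Here is a rough Python sketch of that idea. The step functions follow the per-lane formulas 2 - a*b and (3 - a*b)/2. The initial estimate is simply passed in as an assumed rough guess, because the real VRECPE and VRSQRTE estimates are not modelled here. Python floats are double precision, while the hardware works in single precision.

```python
def vrecps(a, b):
    """One lane of VRECPS: 2.0 - a * b."""
    return 2.0 - a * b


def vrsqrts(a, b):
    """One lane of VRSQRTS: (3.0 - a * b) / 2.0."""
    return (3.0 - a * b) / 2.0


def reciprocal(d, estimate, steps=2):
    """Refine an estimate of 1/d: x = x * (2 - d * x)."""
    x = estimate
    for _ in range(steps):
        x = x * vrecps(d, x)
    return x


def reciprocal_sqrt(d, estimate, steps=2):
    """Refine an estimate of 1/sqrt(d): x = x * (3 - d * x * x) / 2."""
    x = estimate
    for _ in range(steps):
        x = x * vrsqrts(d * x, x)
    return x


# 0.3 and 0.45 stand in for the hardware estimates of 1/3 and 1/sqrt(4).
print(reciprocal(3.0, 0.3, steps=3))
print(reciprocal_sqrt(4.0, 0.45, steps=3))
```

Each step roughly doubles the number of correct bits, which is why one or two steps after the estimate are usually enough for single precision.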

B.5 NEON multiply instructions
B.5.1 VFMA, VFMS: VFMA (Vector Fused Multiply Accumulate), VFMS (Vector Fused Multiply Subtract)
Vop{cond}.F32 {Qd,} Qn, Qm
Vop{cond}.F32 {Dd,} Dn, Dm
Vop{cond}.F64 {Dd,} Dn, Dm
Vop{cond}.F32 {Sd,} Sn, Sm
B.5.2 VMUL{L}, VMLA{L}, VMLS{L}: VMUL (Vector Multiply), VMLA (Vector Multiply Accumulate), VMLS (Vector Multiply Subtract)
Vop{cond}.datatype {Qd,} Qn, Qm
Vop{cond}.datatype {Dd,} Dn, Dm
VopL{cond}.datatype Qd, Dn, Dm
B.5.3 VMUL{L}, VMLA{L}, VMLS{L}: VMUL (Vector Multiply by scalar), VMLA (Vector Multiply Accumulate), VMLS (Vector Multiply Subtract)
Vop{cond}.datatype {Qd,} Qn, Dm[x]
Vop{cond}.datatype {Dd,} Dn, Dm[x]
VopL{cond}.datatype Qd, Dn, Dm[x]
B.5.4 VQ{R}DMULH: Vector Saturating Doubling Multiply Returning High Half (by vector or by scalar)
VQ{R}DMULH{cond}.datatype {Qd,} Qn, Qm
VQ{R}DMULH{cond}.datatype {Dd,} Dn, Dm
VQ{R}DMULH{cond}.datatype {Qd,} Qn, Dm[x]
VQ{R}DMULH{cond}.datatype {Dd,} Dn, Dm[x]
B.5.5 VQDMULL, VQDMLAL, VQDMLSL: Vector Saturating Doubling Multiply Long (by vector or by scalar)
VQDopL{cond}.datatype Qd, Dn, Dm
VQDopL{cond}.datatype Qd, Dn, Dm[x]
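
"Saturating Doubling Multiply Returning High Half" is quite a mouthful. In practice it is a Q15 (or Q31) fixed-point multiply. Below is a small Python sketch of one 16-bit lane as I understand it. The function names are made up, and only the S16 case is modelled.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def sat16(x):
    """Clamp to the signed 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, x))


def vqdmulh_s16(a, b):
    """VQDMULH.S16 lane: double the product, keep the high 16 bits, saturate."""
    return sat16((2 * a * b) >> 16)


def vqrdmulh_s16(a, b):
    """VQRDMULH.S16 lane: same, but round before taking the high half."""
    return sat16((2 * a * b + 0x8000) >> 16)


half = 16384  # 0.5 in Q15
print(vqdmulh_s16(half, half))          # 0.5 * 0.5 in Q15
print(vqdmulh_s16(3, half), vqrdmulh_s16(3, half))
print(vqdmulh_s16(-32768, -32768))      # the only case that saturates
```

The doubling is what turns a plain 16x16 multiply into a Q15 multiply: the product of two Q15 numbers is Q30, and doubling makes the high half line up as Q15 again.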

B.6 NEON load and store element and structure instructions
B.6.1 VLDn, VSTn: VLDn (Vector Load single n-element structure to one lane), VSTn (Vector Store single n-element structure to one lane)
Vopn{cond}.datatype list, [Rn{:align}]{!}
Vopn{cond}.datatype list, [Rn{:align}], Rm
B.6.2 VLDn: Vector Load single n-element structure to all lanes
VLDn{cond}.datatype list, [Rn{:align}]{!}
VLDn{cond}.datatype list, [Rn{:align}], Rm
B.6.3 VLDn, VSTn: VLDn (Vector Load multiple n-element structures), VSTn (Vector Store multiple n-element structures)
Vopn{cond}.datatype list, [Rn{:align}]{!}
Vopn{cond}.datatype list, [Rn{:align}], Rm
B.6.4 VLDR, VSTR: VLDR loads a single extension register from memory, using an address from an ARM core register, with an optional offset. VSTR saves the contents of a NEON or VFP register to memory.
VLDR{cond}{.size} Fd, [Rn{, #offset}]
VSTR{cond}{.size} Fd, [Rn{, #offset}]
VLDR{cond}{.size} Fd, label
VSTR{cond}{.size} Fd, label
B.6.5 VLDM, VSTM, VPOP, VPUSH: NEON and VFP register load multiple (VLDM), store multiple (VSTM), pop from stack (VPOP), push onto stack (VPUSH).
VLDMmode{cond} Rn{!}, Registers
VSTMmode{cond} Rn{!}, Registers
VPOP{cond} Registers
VPUSH{cond} Registers
B.6.6 VMOV: Transfer contents between two ARM registers and a 64-bit NEON or VFP register, or two consecutive 32-bit VFP registers.
VMOV{cond} Dm, Rd, Rn
VMOV{cond} Rd, Rn, Dm
VMOV{cond} Sm, Sm1, Rd, Rn
VMOV{cond} Rd, Rn, Sm, Sm1
B.6.7 VMOV: Transfer contents between an ARM register and a NEON scalar.
VMOV{cond}{.size} Dn[x], Rd
VMOV{cond}{.datatype} Rd, Dn[x]
B.6.8 VMRS, VMSR: VMRS transfers the contents of NEON or VFP system register FPSCR into Rd. VMSR transfers the contents of Rd into a NEON or VFP system register, FPSCR.
VMRS{cond} Rd, extsysreg
VMSR{cond} extsysreg, Rd
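
The "multiple n-element structures" form of VLDn and VSTn (B.6.3) is the one that de-interleaves data such as packed RGB pixels. A minimal Python sketch of that behaviour is below, with memory as a flat list and each register as a list of lanes. The function names and the list-based model are assumptions for illustration only. Alignment and write-back are ignored.

```python
def vld_n(memory, n, lanes):
    """Model of VLDn (multiple n-element structures):
    element i of structure s goes to lane s of register i."""
    assert len(memory) >= n * lanes
    return [memory[i:n * lanes:n] for i in range(n)]


def vst_n(registers):
    """Model of VSTn (multiple n-element structures):
    interleave the registers back into memory order."""
    lanes = len(registers[0])
    return [reg[s] for s in range(lanes) for reg in registers]


# 8 packed RGB pixels: R0 G0 B0 R1 G1 B1 ...
pixels = list(range(24))
r, g, b = vld_n(pixels, 3, 8)   # like VLD3.8 {d0, d1, d2}, [r0]
print(r)
print(g)
print(b)
print(vst_n([r, g, b]) == pixels)
```

After the load, each register holds one colour channel, so the usual per-lane arithmetic can be applied to a whole channel at once.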

B.7 VFP instructions
B.7.1 VABS: Floating-point absolute value (VABS).
VABS{cond}.F32 Sd, Sm
VABS{cond}.F64 Dd, Dm
B.7.2 VADD: VADD adds the values in the operand registers and places the result in the destination register.
VADD{cond}.F32 {Sd,} Sn, Sm
VADD{cond}.F64 {Dd,} Dn, Dm
B.7.3 VCMP: Floating-Point Compare
VCMP{cond}.F32 Sd, Sm
VCMP{cond}.F32 Sd, #0
VCMP{cond}.F64 Dd, Dm
VCMP{cond}.F64 Dd, #0
B.7.4 VCVT (between single-precision and double-precision): VCVT converts the single-precision value in Sm to double-precision and stores the result in Dd, or converts the double-precision value in Dm to single-precision, storing the result in Sd.
VCVT{cond}.F64.F32 Dd, Sm
VCVT{cond}.F32.F64 Sd, Dm
B.7.5 VCVT (between floating-point and integer): VCVT forms which convert from floating-point to integer or from integer to floating-point.
VCVT{R}{cond}.type.F64 Sd, Dm
VCVT{R}{cond}.type.F32 Sd, Sm
VCVT{cond}.F64.type Dd, Sm
VCVT{cond}.F32.type Sd, Sm
B.7.6 VCVT (between floating-point and fixed-point): Convert between floating-point and fixed-point numbers.
VCVT{cond}.type.F64 Dd, Dd, #fbits
VCVT{cond}.type.F32 Sd, Sd, #fbits
VCVT{cond}.F64.type Dd, Dd, #fbits
VCVT{cond}.F32.type Sd, Sd, #fbits
B.7.7 VCVTB, VCVTT: VCVTB uses the lower half (bits [15:0]) of the single word register to obtain the half-precision value. VCVTT uses the upper half (bits [31:16]) of the single word register to obtain the half-precision value.
VCVTB{cond}.type Sd, Sm
VCVTT{cond}.type Sd, Sm
B.7.8 VDIV: VDIV divides the value in the first operand register by the value in the second operand register, and places the result in the destination register.
VDIV{cond}.F32 {Sd,} Sn, Sm
VDIV{cond}.F64 {Dd,} Dn, Dm
B.7.9 VFMA, VFNMA, VFMS, VFNMS: Fused Floating-point Multiply Accumulate
VF{N}op{cond}.F64 {Dd,} Dn, Dm
VF{N}op{cond}.F32 {Sd,} Sn, Sm
B.7.10 VMOV: VMOV puts a floating-point immediate value into a single-precision or double-precision register, or copies one register into another register.
VMOV{cond}.F32 Sd, #imm
VMOV{cond}.F64 Dd, #imm
VMOV{cond}.F32 Sd, Sm
VMOV{cond}.F64 Dd, Dm
B.7.11 VMOV: Transfer contents between a single-precision floating-point register and an ARM register.
VMOV{cond} Rd, Sn
VMOV{cond} Sn, Rd
B.7.12 VMUL, VMLA, VMLS, VNMUL, VNMLA, VNMLS: VMUL (Floating-point Multiply (with optional Negation)), VMLA (Floating-point Multiply Accumulate (with optional Negation)), VMLS (Floating-point Multiply Subtract (with optional Negation)). The final result is negated if the N option is used.
V{N}MUL{cond}.F32 {Sd,} Sn, Sm
V{N}MUL{cond}.F64 {Dd,} Dn, Dm
V{N}MLA{cond}.F32 Sd, Sn, Sm
V{N}MLA{cond}.F64 Dd, Dn, Dm
V{N}MLS{cond}.F32 Sd, Sn, Sm
V{N}MLS{cond}.F64 Dd, Dn, Dm
B.7.13 VNEG: Floating-point negate
VNEG{cond}.F32 Sd, Sm
B.7.14 VSQRT: Floating-point square root
VSQRT{cond}.F32 Sd, Sm
VSQRT{cond}.F64 Dd, Dm
B.7.15 VSUB: VSUB subtracts the value in the second operand register from the value in the first operand register, and places the result in the destination register.
VSUB{cond}.F32 {Sd,} Sn, Sm
VSUB{cond}.F64 {Dd,} Dn, Dm
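
The VCVTB/VCVTT entry (B.7.7) says which half of the 32-bit register holds the half-precision value. Here is a small Python sketch of that bit selection, treating a single-precision register as a 32-bit integer. It relies on the `e` (IEEE half-precision) format of the standard `struct` module. The function names are my own, and rounding modes and exceptions are not modelled.

```python
import struct


def half_bits_to_float(h):
    """Interpret a 16-bit pattern as an IEEE half-precision number."""
    return struct.unpack('<e', struct.pack('<H', h))[0]


def float_to_half_bits(x):
    """Convert a float to its 16-bit half-precision bit pattern."""
    return struct.unpack('<H', struct.pack('<e', x))[0]


def vcvtb_f32_f16(sm):
    """VCVTB.F32.F16: read the half-precision value from bits [15:0]."""
    return half_bits_to_float(sm & 0xFFFF)


def vcvtt_f32_f16(sm):
    """VCVTT.F32.F16: read the half-precision value from bits [31:16]."""
    return half_bits_to_float((sm >> 16) & 0xFFFF)


def vcvtb_f16_f32(sd, value):
    """VCVTB.F16.F32: write the result to bits [15:0], keep bits [31:16]."""
    return (sd & 0xFFFF0000) | float_to_half_bits(value)


def vcvtt_f16_f32(sd, value):
    """VCVTT.F16.F32: write the result to bits [31:16], keep bits [15:0]."""
    return (sd & 0x0000FFFF) | (float_to_half_bits(value) << 16)


word = 0xC0003C00  # top half = -2.0, bottom half = 1.0
print(vcvtb_f32_f16(word), vcvtt_f32_f16(word))
print(hex(vcvtb_f16_f32(word, 0.5)), hex(vcvtt_f16_f32(word, 0.5)))
```

So one S register can carry two packed half-precision values, with VCVTB and VCVTT addressing the bottom and top one respectively.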

B.8 NEON and VFP pseudo-instructions
B.8.1 VACLE, VACLT: Vector Absolute Compare takes the absolute value of each element in a vector, and compares with the absolute value of the corresponding element of a second vector.
VACop{cond}.datatype {Qd,} Qn, Qm
VACop{cond}.datatype {Dd,} Dn, Dm
B.8.2 VAND (immediate): Bitwise AND immediate
VAND{cond}.datatype Qd, #imm
VAND{cond}.datatype Dd, #imm
B.8.3 VCLE, VCLT: Vector Compare takes the value of each element in a vector, and compares it with the value of the corresponding element of a second vector.
VCop{cond}.datatype {Qd,} Qn, Qm
VCop{cond}.datatype {Dd,} Dn, Dm
B.8.4 VLDR pseudo-instruction: The VLDR pseudo-instruction loads a constant value into every element of a 64-bit NEON vector (or a VFP single-precision or double-precision register).
VLDR{cond}.datatype Dd,=constant
VLDR{cond}.datatype Sd,=constant
B.8.5 VLDR, VSTR (post-increment and pre-decrement): The VLDR and VSTR pseudo-instructions load or store extension registers with post-increment and pre-decrement.
VLDR{cond}{.size} Fd, [Rn], #offset
VLDR{cond}{.size} Fd, [Rn, #-offset]!
VSTR{cond}{.size} Fd, [Rn], #offset
VSTR{cond}{.size} Fd, [Rn, #-offset]!
B.8.6 VMOV2: The VMOV2 pseudo-instruction generates an immediate value and places it in every element of a NEON vector, without a load from a literal pool.
VMOV2{cond}.datatype Qd, #constant
VMOV2{cond}.datatype Dd, #constant
B.8.7 VORN: Bitwise OR NOT (immediate)
VORN{cond}.datatype Qd, #imm
VORN{cond}.datatype Dd, #imm
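
Several of these pseudo-instructions exist because the assembler can rewrite them into a real instruction: VAND (immediate) and VORN (immediate) become VBIC and VORR with the inverted immediate, and VCLE, VCLT, VACLE and VACLT become the corresponding "greater" compare with the operands swapped. The Python sketch below just checks those identities on 16-bit lanes. It is my own model with made-up function names, not assembler output.

```python
MASK16 = 0xFFFF  # assumed lane width: 16 bits


def vbic_imm(x, imm):
    """VBIC (immediate): clear the bits of x that are set in imm."""
    return x & ~imm & MASK16


def vorr_imm(x, imm):
    """VORR (immediate): set the bits of x that are set in imm."""
    return (x | imm) & MASK16


def vand_imm(x, imm):
    """VAND #imm modelled as VBIC with the inverted immediate."""
    return vbic_imm(x, ~imm & MASK16)


def vorn_imm(x, imm):
    """VORN #imm modelled as VORR with the inverted immediate."""
    return vorr_imm(x, ~imm & MASK16)


def vcge(a, b):
    """VCGE: all ones in a lane where a >= b, otherwise zero."""
    return [MASK16 if x >= y else 0 for x, y in zip(a, b)]


def vcle(a, b):
    """VCLE modelled as VCGE with the operands swapped."""
    return vcge(b, a)


print(hex(vand_imm(0x1234, 0xFF00)), hex(vorn_imm(0x1234, 0xFF00)))
print(vcle([1, 5, 3], [2, 5, 1]))
```

This is why these mnemonics do not show up in disassembly: what you see there is the real instruction the assembler chose.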