ARM DEN0013C - Appendix B NEON and VFP Instruction Summary
Cortex-A Series Programmer’s Guide, Version 3.0, ARM DEN0013C (ID071612) |
Appendix B NEON and VFP Instruction Summary |
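I extracted this information with a Perl script, but in many places the extraction didn't work cleanly, so I copied and pasted those parts by hand. There may be transcription errors m(_ _)m
The layout looks messy because of the blog's page width, but if you copy it and paste it into Microsoft Excel or LibreOffice Calc, it turns into a proper spreadsheet.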
B.1 NEON general data processing instructions | |||
B.1.1 | VCVT | Vector Convert (fixed-point or integer to floating-point) | VCVT{cond}.type Qd, Qm {, #fbits} VCVT{cond}.type Dd, Dm {, #fbits} |
B.1.2 | VCVT | Vector Convert (between half-precision and single-precision floating-point) | VCVT{cond}.F32.F16 Qd, Dm VCVT{cond}.F16.F32 Dd, Qm |
B.1.3 | VDUP | Vector Duplicate | VDUP{cond}.size Qd, Dm[x] VDUP{cond}.size Dd, Dm[x] VDUP{cond}.size Qd, Rm VDUP{cond}.size Dd, Rm |
B.1.4 | VEXT | Vector Extract | VEXT{cond}.8 {Qd,} Qn, Qm, #imm VEXT{cond}.8 {Dd,} Dn, Dm, #imm |
B.1.5 | VMOV | Vector Bitwise Move (immediate) | VMOV{cond}.datatype Qd, #imm VMOV{cond}.datatype Dd, #imm |
B.1.6 | VMVN | Vector Bitwise NOT (immediate) | VMVN{cond}.datatype Qd, #imm VMVN{cond}.datatype Dd, #imm |
B.1.7 | VMOVL, V{Q}MOVN, VQMOVUN | VMOVL (Vector Move Long) VMOVN (Vector Move and Narrow) VQMOVN (Vector Saturating Move and Narrow) VQMOVUN (Vector Saturating Move and Narrow, signed operand with Unsigned result) | VMOVL{cond}.datatype Qd, Dm V{Q}MOVN{cond}.datatype Dd, Qm VQMOVUN{cond}.datatype Dd, Qm |
B.1.8 | VREV | VREV16 (Vector Reverse in halfwords) VREV32 (Vector Reverse in words) VREV64 (Vector Reverse in doublewords) | VREVn{cond}.size Qd, Qm VREVn{cond}.size Dd, Dm |
B.1.9 | VSWP | Vector Swap | VSWP{cond}{.datatype} Qd, Qm VSWP{cond}{.datatype} Dd, Dm |
B.1.10 | VTBL | Vector Table Lookup | VTBL{cond}.8 Dd, list, Dm |
B.1.11 | VTBX | Vector Table Extension | VTBX{cond}.8 Dd, list, Dm |
B.1.12 | VTRN | Vector Transpose | VTRN{cond}.size Qd, Qm VTRN{cond}.size Dd, Dm |
B.1.13 | VUZP | Vector Unzip | VUZP{cond}.size Qd, Qm VUZP{cond}.size Dd, Dm |
B.1.14 | VZIP | Vector Zip | VZIP{cond}.size Qd, Qm VZIP{cond}.size Dd, Dm |
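The B.1 rows only give syntax. As a minimal sketch, here is a scalar C model of what VEXT.8, VTBL/VTBX (with a one-register table) and VZIP.8 do to the lanes of a D register. The type `d8x8` and the `*_ref` function names are my own hypothetical helpers, not `arm_neon.h` intrinsics, and lane 0 is assumed to be the least significant byte.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit D register viewed as eight 8-bit lanes; lane 0 is the least significant byte. */
typedef struct { uint8_t b[8]; } d8x8;

/* VEXT.8 Dd, Dn, Dm, #imm: take 8 consecutive bytes from the pair Dm:Dn,
   starting at byte imm of Dn (Dn supplies the low bytes, Dm the high bytes). */
static d8x8 vext8_ref(d8x8 n, d8x8 m, int imm) {
    assert(imm >= 0 && imm <= 7);
    d8x8 d;
    for (int i = 0; i < 8; i++) {
        int k = i + imm;
        d.b[i] = (k < 8) ? n.b[k] : m.b[k - 8];
    }
    return d;
}

/* VTBL.8 Dd, {Dn}, Dm (one-register table): each lane of Dm indexes into Dn;
   out-of-range indices give 0. */
static d8x8 vtbl1_8_ref(d8x8 table, d8x8 idx) {
    d8x8 d;
    for (int i = 0; i < 8; i++)
        d.b[i] = (idx.b[i] < 8) ? table.b[idx.b[i]] : 0;
    return d;
}

/* VTBX.8 Dd, {Dn}, Dm: like VTBL, but out-of-range indices leave the Dd lane unchanged. */
static d8x8 vtbx1_8_ref(d8x8 d, d8x8 table, d8x8 idx) {
    for (int i = 0; i < 8; i++)
        if (idx.b[i] < 8)
            d.b[i] = table.b[idx.b[i]];
    return d;
}

/* VZIP.8 Dd, Dm: interleave the lanes of both registers; both are overwritten.
   Dd receives the interleaved low halves, Dm the interleaved high halves. */
static void vzip8_ref(d8x8 *d, d8x8 *m) {
    d8x8 a = *d, b = *m;
    for (int i = 0; i < 4; i++) {
        d->b[2 * i]     = a.b[i];
        d->b[2 * i + 1] = b.b[i];
        m->b[2 * i]     = a.b[i + 4];
        m->b[2 * i + 1] = b.b[i + 4];
    }
}
```

A typical use of VEXT is a sliding window. With #1, #2, and so on, it yields the input shifted by one, two, ... bytes without extra loads.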
B.2 NEON shift instructions | |||
B.2.1 | VSHL, VQSHL, VQSHLU, VSHLL | Vector Shift Left (by immediate) VSHL (Vector Shift Left), VQSHL (Vector Saturating Shift Left), VQSHLU (Vector Saturating Shift Left Unsigned), VSHLL (Vector Shift Left Long) | V{Q}SHL{U}{cond}.datatype {Qd,} Qm, #imm V{Q}SHL{U}{cond}.datatype {Dd,} Dm, #imm VSHLL{cond}.datatype Qd, Dm, #imm |
B.2.2 | V{Q}{R}SHL | Vector Shift Left (by signed variable) | V{Q}{R}SHL{cond}.datatype {Qd,} Qm, Qn V{Q}{R}SHL{cond}.datatype {Dd,} Dm, Dn |
B.2.3 | V{R}SHR{N}, V{R}SRA | V{R}SHR{N} (Vector Shift Right by immediate value, with optional Rounding and Narrowing) V{R}SRA (Vector Shift Right by immediate value and Accumulate, with optional Rounding) | V{R}SHR{cond}.datatype {Qd,} Qm, #imm V{R}SHR{cond}.datatype {Dd,} Dm, #imm V{R}SRA{cond}.datatype {Qd,} Qm, #imm V{R}SRA{cond}.datatype {Dd,} Dm, #imm V{R}SHRN{cond}.datatype Dd, Qm, #imm |
B.2.4 | VQ{R}SHR{U}N | Vector Saturating Shift Right, Narrow (by immediate value) with optional Rounding | VQ{R}SHR{U}N{cond}.datatype Dd, Qm, #imm |
B.2.5 | VSLI | Vector Shift Left and Insert | VSLI{cond}.size {Qd,} Qm, #imm VSLI{cond}.size {Dd,} Dm, #imm |
B.2.6 | VSRI | Vector Shift Right and Insert | VSRI{cond}.size {Qd,} Qm, #imm VSRI{cond}.size {Dd,} Dm, #imm |
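To make the rounding, saturating, narrowing and insert variants in B.2 concrete, here is a minimal per-lane C sketch for 16-bit lanes. The function names are hypothetical (not `arm_neon.h` intrinsics). Each function models one lane only and ignores the sticky FPSCR.QC saturation flag that the real saturating instructions set.

```c
#include <assert.h>
#include <stdint.h>

/* VRSHR.U16 #n (one lane): rounding shift right, i.e. add 2^(n-1) before shifting.
   The sum is formed in 32 bits, so it cannot overflow. */
static uint16_t vrshr_u16_lane(uint16_t x, int n) {
    assert(n >= 1 && n <= 16);
    return (uint16_t)(((uint32_t)x + (1u << (n - 1))) >> n);
}

/* VQSHL.S16 #n (one lane): shift left, clamping to the int16 range.
   (The real instruction also sets FPSCR.QC on saturation; not modelled here.) */
static int16_t vqshl_s16_lane(int16_t x, int n) {
    assert(n >= 0 && n <= 15);
    int32_t r = (int32_t)x * (1 << n);   /* |r| <= 2^30, so no int32 overflow */
    if (r > INT16_MAX) return INT16_MAX;
    if (r < INT16_MIN) return INT16_MIN;
    return (int16_t)r;
}

/* VSHRN.I16 #n (one lane): shift right, then keep the low 8 bits (narrow, truncating). */
static uint8_t vshrn_i16_lane(uint16_t x, int n) {
    assert(n >= 1 && n <= 8);
    return (uint8_t)(x >> n);
}

/* VSRI.16 #n (one lane): shift m right by n and insert it into d,
   keeping the top n bits of d unchanged. */
static uint16_t vsri_16_lane(uint16_t d, uint16_t m, int n) {
    assert(n >= 1 && n <= 16);
    uint32_t keep = ~(0xFFFFu >> n) & 0xFFFFu;   /* mask of the top n bits of d */
    return (uint16_t)((d & keep) | ((uint32_t)m >> n));
}
```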
B.3 NEON logical and compare operations | |||
B.3.1 | VACGE, VACGT | Vector Absolute Compare | VACop{cond}.F32 {Qd,} Qn, Qm VACop{cond}.F32 {Dd,} Dn, Dm |
B.3.2 | VAND | bitwise logical AND operation | VAND{cond}{.datatype} {Qd,} Qn, Qm VAND{cond}{.datatype} {Dd,} Dn, Dm |
B.3.3 | VBIC | Vector Bitwise Clear (immediate) | VBIC{cond}.datatype Qd, #imm VBIC{cond}.datatype Dd, #imm |
B.3.4 | VBIC | Vector Bitwise Clear (register) | VBIC{cond}{.datatype} {Qd,} Qn, Qm VBIC{cond}{.datatype} {Dd,} Dn, Dm |
B.3.5 | VBIF | Vector Bitwise Insert if False | VBIF{cond}{.datatype} {Qd,} Qn, Qm VBIF{cond}{.datatype} {Dd,} Dn, Dm |
B.3.6 | VBIT | Vector Bitwise Insert if True | VBIT{cond}{.datatype} {Qd,} Qn, Qm VBIT{cond}{.datatype} {Dd,} Dn, Dm |
B.3.7 | VBSL | Vector Bitwise Select | VBSL{cond}{.datatype} {Qd,} Qn, Qm VBSL{cond}{.datatype} {Dd,} Dn, Dm |
B.3.8 | VCEQ, VCGE, VCGT, VCLE, VCLT | Vector Compare | VCop{cond}.datatype {Qd,} Qn, Qm VCop{cond}.datatype {Dd,} Dn, Dm |
B.3.9 | VEOR | bitwise logical exclusive OR | VEOR{cond}{.datatype} {Qd,} Qn, Qm VEOR{cond}{.datatype} {Dd,} Dn, Dm |
B.3.10 | VMOV | Vector Move (register) | VMOV{cond}{.datatype} Qd, Qm VMOV{cond}{.datatype} Dd, Dm |
B.3.11 | VMVN | Vector Bitwise NOT (register) | VMVN{cond}{.datatype} Qd, Qm VMVN{cond}{.datatype} Dd, Dm |
B.3.12 | VORN | bitwise logical OR NOT operation | VORN{cond}{.datatype} {Qd,} Qn, Qm VORN{cond}{.datatype} {Dd,} Dn, Dm |
B.3.13 | VORR | Bitwise OR (immediate) | VORR{cond}.datatype Qd, #imm VORR{cond}.datatype Dd, #imm |
B.3.14 | VORR | Bitwise OR (register) | VORR{cond}{.datatype} {Qd,} Qn, Qm VORR{cond}{.datatype} {Dd,} Dn, Dm |
B.3.15 | VTST | Vector Test Bits | VTST{cond}.size {Qd,} Qn, Qm VTST{cond}.size {Dd,} Dn, Dm |
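The compare instructions in B.3 produce per-lane masks (all ones or all zeros), and the three bitwise-select instructions differ only in which operand acts as the mask. The sketch below models one 32-bit lane to show that difference, plus the common compare-then-VBSL idiom. All function names are hypothetical helpers, not `arm_neon.h` intrinsics.

```c
#include <assert.h>
#include <stdint.h>

/* NEON compare results are per-lane masks: all ones for true, all zeros for false. */
#define LANE_TRUE  0xFFFFFFFFu
#define LANE_FALSE 0x00000000u

/* VCGT.U32 (one lane): unsigned greater-than. */
static uint32_t vcgt_u32_lane(uint32_t n, uint32_t m) { return n > m ? LANE_TRUE : LANE_FALSE; }

/* VTST.32 (one lane): true if n and m have any set bit in common. */
static uint32_t vtst_32_lane(uint32_t n, uint32_t m) { return (n & m) != 0 ? LANE_TRUE : LANE_FALSE; }

/* VBSL Dd, Dn, Dm: Dd is the selector; take Dn's bits where it is 1, Dm's where it is 0. */
static uint32_t vbsl_lane(uint32_t d, uint32_t n, uint32_t m) { return (d & n) | (~d & m); }

/* VBIT Dd, Dn, Dm: copy Dn's bits into Dd wherever Dm is 1. */
static uint32_t vbit_lane(uint32_t d, uint32_t n, uint32_t m) { return (d & ~m) | (n & m); }

/* VBIF Dd, Dn, Dm: copy Dn's bits into Dd wherever Dm is 0. */
static uint32_t vbif_lane(uint32_t d, uint32_t n, uint32_t m) { return (d & m) | (n & ~m); }

/* Branch-free maximum: build a mask with a compare, then select with VBSL. */
static uint32_t max_u32_lane(uint32_t a, uint32_t b) {
    return vbsl_lane(vcgt_u32_lane(a, b), a, b);
}
```

For a plain maximum you would just use VMAX, but the compare-then-VBSL pattern works for any per-lane condition, on all lanes at once.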
B.4 NEON arithmetic instructions | |||
B.4.1 | VABA{L} | Vector Absolute Difference and Accumulate | VABA{cond}.datatype {Qd,} Qn, Qm VABA{cond}.datatype {Dd,} Dn, Dm VABAL{cond}.datatype Qd, Dn, Dm |
B.4.2 | VABD{L} | Vector Absolute Difference | VABD{cond}.datatype {Qd,} Qn, Qm VABD{cond}.datatype {Dd,} Dn, Dm VABDL{cond}.datatype Qd, Dn, Dm |
B.4.3 | V{Q}ABS | Vector Absolute | V{Q}ABS{cond}.datatype Qd, Qm V{Q}ABS{cond}.datatype Dd, Dm |
B.4.4 | V{Q}ADD, VADDL, VADDW | Vector Add | V{Q}ADD{cond}.datatype {Qd,} Qn, Qm V{Q}ADD{cond}.datatype {Dd,} Dn, Dm VADDL{cond}.datatype Qd, Dn, Dm VADDW{cond}.datatype {Qd,} Qn, Dm |
B.4.5 | V{R}ADDHN | Vector Add and Narrow, selecting High half | V{R}ADDHN{cond}.datatype Dd, Qn, Qm |
B.4.6 | VCLS | Vector Count Leading Sign Bits | VCLS{cond}.datatype Qd, Qm VCLS{cond}.datatype Dd, Dm |
B.4.7 | VCLZ | Vector Count Leading Zeros | VCLZ{cond}.datatype Qd, Qm VCLZ{cond}.datatype Dd, Dm |
B.4.8 | VCNT | Vector Count Set Bits | VCNT{cond}.datatype Qd, Qm VCNT{cond}.datatype Dd, Dm |
B.4.9 | V{R}HADD | Vector Halving Add | V{R}HADD{cond}.datatype {Qd,} Qn, Qm V{R}HADD{cond}.datatype {Dd,} Dn, Dm |
B.4.10 | VHSUB | Vector Halving Subtract | VHSUB{cond}.datatype {Qd,} Qn, Qm VHSUB{cond}.datatype {Dd,} Dn, Dm |
B.4.11 | VMAX, VMIN | VMAX (Vector Maximum) VMIN (Vector Minimum) | VMAX{cond}.datatype Qd, Qn, Qm VMAX{cond}.datatype Dd, Dn, Dm VMIN{cond}.datatype Qd, Qn, Qm VMIN{cond}.datatype Dd, Dn, Dm |
B.4.12 | V{Q}NEG | Vector Negate | V{Q}NEG{cond}.datatype Qd, Qm V{Q}NEG{cond}.datatype Dd, Dm |
B.4.13 | VPADD{L}, VPADAL | VPADD (Vector Pairwise Add) VPADDL (Vector Pairwise Add Long) VPADAL (Vector Pairwise Add and Accumulate Long) | VPADD{cond}.datatype {Dd,} Dn, Dm VPopL{cond}.datatype Qd, Qm VPopL{cond}.datatype Dd, Dm |
B.4.14 | VPMAX, VPMIN | VPMAX (Vector Pairwise Maximum) VPMIN (Vector Pairwise Minimum) | VPMAX{cond}.datatype Dd, Dn, Dm VPMIN{cond}.datatype Dd, Dn, Dm |
B.4.15 | VRECPE | Vector Reciprocal Estimate | VRECPE{cond}.datatype Qd, Qm VRECPE{cond}.datatype Dd, Dm |
B.4.16 | VRECPS | Vector Reciprocal Step | VRECPS{cond}.F32 {Qd,} Qn, Qm VRECPS{cond}.F32 {Dd,} Dn, Dm |
B.4.17 | VRSQRTE | Vector Reciprocal Square Root Estimate | VRSQRTE{cond}.datatype Qd, Qm VRSQRTE{cond}.datatype Dd, Dm |
B.4.18 | VRSQRTS | Vector Reciprocal Square Root Step | VRSQRTS{cond}.F32 {Qd,} Qn, Qm VRSQRTS{cond}.F32 {Dd,} Dn, Dm |
B.4.19 | V{Q}SUB, VSUBL, VSUBW | Vector Subtract | V{Q}SUB{cond}.datatype {Qd,} Qn, Qm V{Q}SUB{cond}.datatype {Dd,} Dn, Dm VSUBL{cond}.datatype Qd, Dn, Dm VSUBW{cond}.datatype {Qd,} Qn, Dm |
B.4.20 | V{R}SUBHN | Vector Subtract and Narrow, selecting High Half | V{R}SUBHN{cond}.datatype Dd, Qn, Qm |
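Here is a per-lane C sketch of a few B.4 arithmetic variants (saturating, rounding-halving, absolute difference, add-and-narrow-high). It also shows how VRECPS and VRSQRTS act as Newton-Raphson steps. VRECPS computes 2 - a*b and VRSQRTS computes (3 - a*b)/2, so multiplying an estimate by them refines 1/d and 1/sqrt(d). On hardware the starting estimate would come from VRECPE/VRSQRTE. Here it is simply passed in. The helper names are hypothetical, and special cases such as infinity times zero are not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* VQADD.U8 (one lane): add, saturating at 255. */
static uint8_t vqadd_u8_lane(uint8_t a, uint8_t b) {
    unsigned s = (unsigned)a + b;
    return (uint8_t)(s > 255u ? 255u : s);
}

/* VRHADD.U8 (one lane): (a + b + 1) >> 1 without overflow; VHADD omits the + 1. */
static uint8_t vrhadd_u8_lane(uint8_t a, uint8_t b) {
    return (uint8_t)(((unsigned)a + b + 1u) >> 1);
}

/* VABD.U8 (one lane): absolute difference |a - b|. */
static uint8_t vabd_u8_lane(uint8_t a, uint8_t b) {
    return (uint8_t)(a > b ? a - b : b - a);
}

/* VADDHN.I16 (one lane): add modulo 2^16, keep the high 8 bits of the sum. */
static uint8_t vaddhn_i16_lane(uint16_t a, uint16_t b) {
    return (uint8_t)((uint16_t)(a + b) >> 8);
}

/* VRECPS.F32 / VRSQRTS.F32 step values (one lane). */
static float vrecps_f32_lane(float a, float b)  { return 2.0f - a * b; }
static float vrsqrts_f32_lane(float a, float b) { return (3.0f - a * b) * 0.5f; }

/* Refine an estimate x of 1/d: x <- x * (2 - d*x). */
static float refine_recip(float d, float x, int steps) {
    for (int i = 0; i < steps; i++) x *= vrecps_f32_lane(d, x);
    return x;
}

/* Refine an estimate x of 1/sqrt(d): x <- x * (3 - (d*x)*x) / 2. */
static float refine_rsqrt(float d, float x, int steps) {
    for (int i = 0; i < steps; i++) x *= vrsqrts_f32_lane(d * x, x);
    return x;
}
```

Each reciprocal step roughly squares the relative error, which is why one or two steps after VRECPE are usually enough for single precision.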
B.5 NEON multiply instructions | |||
B.5.1 | VFMA, VFMS | VFMA (Vector Fused Multiply Accumulate) VFMS (Vector Fused Multiply Subtract) | Vop{cond}.F32 {Qd,} Qn, Qm Vop{cond}.F32 {Dd,} Dn, Dm Vop{cond}.F64 {Dd,} Dn, Dm Vop{cond}.F32 {Sd,} Sn, Sm |
B.5.2 | VMUL{L}, VMLA{L}, VMLS{L} | VMUL (Vector Multiply) VMLA (Vector Multiply Accumulate) VMLS (Vector Multiply Subtract) | Vop{cond}.datatype {Qd,} Qn, Qm Vop{cond}.datatype {Dd,} Dn, Dm VopL{cond}.datatype Qd, Dn, Dm |
B.5.3 | VMUL{L}, VMLA{L}, VMLS{L} | VMUL (Vector Multiply by scalar) VMLA (Vector Multiply Accumulate) VMLS (Vector Multiply Subtract) | Vop{cond}.datatype {Qd,} Qn, Dm[x] Vop{cond}.datatype {Dd,} Dn, Dm[x] VopL{cond}.datatype Qd, Dn, Dm[x] |
B.5.4 | VQ{R}DMULH | Vector Saturating Doubling Multiply Returning High Half (by vector or by scalar) | VQ{R}DMULH{cond}.datatype {Qd,} Qn, Qm VQ{R}DMULH{cond}.datatype {Dd,} Dn, Dm VQ{R}DMULH{cond}.datatype {Qd,} Qn, Dm[x] VQ{R}DMULH{cond}.datatype {Dd,} Dn, Dm[x] |
B.5.5 | VQDMULL, VQDMLAL, VQDMLSL | Vector Saturating Doubling Multiply Long (by vector or by scalar) | VQDopL{cond}.datatype Qd, Dn, Dm VQDopL{cond}.datatype Qd, Dn, Dm[x] |
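VQ{R}DMULH and VQDMULL are easiest to understand as Q15 fixed-point multiplies. If each int16 lane is a fraction x/32768, then "doubling, high half" is exactly the Q15 product, and the only input pair that saturates is (-32768) * (-32768). Below is a per-lane sketch with hypothetical helper names. The arithmetic shift is written out portably because right-shifting a negative value is implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Floor division by 2^s (an arithmetic shift), valid for the small values used here. */
static int64_t asr64(int64_t v, int s) {
    return v >= 0 ? v >> s : -((-v - 1) >> s) - 1;
}

static int16_t sat16(int64_t v) {
    return (int16_t)(v > INT16_MAX ? INT16_MAX : v < INT16_MIN ? INT16_MIN : v);
}

/* VQDMULH.S16 (one lane): high half of 2*a*b, saturated (Q15 multiply, truncating). */
static int16_t vqdmulh_s16_lane(int16_t a, int16_t b) {
    return sat16(asr64(2 * (int64_t)a * b, 16));
}

/* VQRDMULH.S16 (one lane): as above, but adds 2^15 before taking the high half (rounding). */
static int16_t vqrdmulh_s16_lane(int16_t a, int16_t b) {
    return sat16(asr64(2 * (int64_t)a * b + (1 << 15), 16));
}

/* VQDMULL.S16 (one lane): 2*a*b widened to 32 bits; only (-32768)*(-32768) saturates. */
static int32_t vqdmull_s16_lane(int16_t a, int16_t b) {
    int64_t p = 2 * (int64_t)a * b;
    return (int32_t)(p > INT32_MAX ? INT32_MAX : p);
}
```

For example, with 16384 representing 0.5, VQDMULH of 0.5 by 0.5 gives 8192, i.e. 0.25.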
B.6 NEON load and store element and structure instructions | |||
B.6.1 | VLDn, VSTn | VLDn (Vector Load single n-element structure to one lane) VSTn (Vector Store single n-element structure to one lane) | Vopn{cond}.datatype list, [Rn{:align}]{!} Vopn{cond}.datatype list, [Rn{:align}], Rm |
B.6.2 | VLDn | Vector Load single n-element structure to all lanes | VLDn{cond}.datatype list, [Rn{:align}]{!} VLDn{cond}.datatype list, [Rn{:align}], Rm |
B.6.3 | VLDn, VSTn | VLDn (Vector Load multiple n-element structures) VSTn (Vector Store multiple n-element structures) | Vopn{cond}.datatype list, [Rn{:align}]{!} Vopn{cond}.datatype list, [Rn{:align}], Rm |
B.6.4 | VLDR, VSTR | VLDR loads a single extension register from memory, using an address from an ARM core register, with an optional offset. VSTR saves the contents of a NEON or VFP register to memory. | VLDR{cond}{.size} Fd, [Rn{, #offset}] VSTR{cond}{.size} Fd, [Rn{, #offset}] VLDR{cond}{.size} Fd, label VSTR{cond}{.size} Fd, label |
B.6.5 | VLDM, VSTM, VPOP, VPUSH | NEON and VFP register load multiple (VLDM), store multiple (VSTM), pop from stack (VPOP), push onto stack (VPUSH). | VLDMmode{cond} Rn{!}, Registers VSTMmode{cond} Rn{!}, Registers VPOP{cond} Registers VPUSH{cond} Registers |
B.6.6 | VMOV | Transfer contents between two ARM registers and a 64-bit NEON or VFP register, or two consecutive 32-bit VFP registers. | VMOV{cond} Dm, Rd, Rn VMOV{cond} Rd, Rn, Dm VMOV{cond} Sm, Sm1, Rd, Rn VMOV{cond} Rd, Rn, Sm, Sm1 |
B.6.7 | VMOV | Transfer contents between an ARM register and a NEON scalar. | VMOV{cond}{.size} Dn[x], Rd VMOV{cond}{.datatype} Rd, Dn[x] |
B.6.8 | VMRS, VMSR | VMRS transfers the contents of a NEON or VFP system register, such as FPSCR, into Rd. VMSR transfers the contents of Rd into a NEON or VFP system register, such as FPSCR. | VMRS{cond} Rd, extsysreg VMSR{cond} extsysreg, Rd |
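The structure loads in B.6 are where NEON de-interleaving happens. For example, VLD3.8 on packed RGB bytes puts all R values in one D register, all G in the next and all B in the third, and VST3.8 reverses it. The C sketch below models that, plus the "to all lanes" form of VLD1 and the VMOV Dm, Rd, Rn transfer (Rd goes to the low half of Dm, Rn to the high half). Addressing modes, alignment and writeback are not modelled, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* VLD3.8 {Da, Db, Dc}, [Rn]: de-interleave eight 3-byte structures (24 bytes).
   Element 0 of each structure goes to Da, element 1 to Db, element 2 to Dc. */
static void vld3_8_ref(const uint8_t *src, uint8_t regs[3][8]) {
    for (int lane = 0; lane < 8; lane++)
        for (int e = 0; e < 3; e++)
            regs[e][lane] = src[3 * lane + e];
}

/* VST3.8 {Da, Db, Dc}, [Rn]: the inverse, re-interleaving three registers into 24 bytes. */
static void vst3_8_ref(uint8_t regs[3][8], uint8_t *dst) {
    for (int lane = 0; lane < 8; lane++)
        for (int e = 0; e < 3; e++)
            dst[3 * lane + e] = regs[e][lane];
}

/* VLD1.8 {Dd[]}, [Rn]: load a single byte and replicate it into all eight lanes. */
static void vld1_dup_8_ref(const uint8_t *src, uint8_t reg[8]) {
    for (int lane = 0; lane < 8; lane++)
        reg[lane] = src[0];
}

/* VMOV Dm, Rd, Rn: Rd becomes the low 32 bits of Dm, Rn the high 32 bits. */
static uint64_t vmov_d_rr_ref(uint32_t rd, uint32_t rn) {
    return ((uint64_t)rn << 32) | rd;
}
```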
B.7 VFP instructions | |||
B.7.1 | VABS | Floating-point absolute value (VABS). | VABS{cond}.F32 Sd, Sm VABS{cond}.F64 Dd, Dm |
B.7.2 | VADD | VADD adds the values in the operand registers and places the result in the destination register. | VADD{cond}.F32 {Sd,} Sn, Sm VADD{cond}.F64 {Dd,} Dn, Dm |
B.7.3 | VCMP | Floating-Point Compare | VCMP{cond}.F32 Sd, Sm VCMP{cond}.F32 Sd, #0 VCMP{cond}.F64 Dd, Dm VCMP{cond}.F64 Dd, #0 |
B.7.4 | VCVT (between single-precision and double-precision) | VCVT converts the single-precision value in Sm to double-precision and stores the result in Dd, or converts the double-precision value in Dm to single-precision, storing the result in Sd. | VCVT{cond}.F64.F32 Dd, Sm VCVT{cond}.F32.F64 Sd, Dm |
B.7.5 | VCVT (between floating-point and integer) | Converts from floating-point to integer or from integer to floating-point. With the R suffix, a floating-point to integer conversion uses the rounding mode in FPSCR; without it, the result is rounded toward zero. | VCVT{R}{cond}.type.F64 Sd, Dm VCVT{R}{cond}.type.F32 Sd, Sm VCVT{cond}.F64.type Dd, Sm VCVT{cond}.F32.type Sd, Sm |
B.7.6 | VCVT (between floating-point and fixed-point) | Convert between floating-point and fixed-point numbers. | VCVT{cond}.type.F64 Dd, Dd, #fbits VCVT{cond}.type.F32 Sd, Sd, #fbits VCVT{cond}.F64.type Dd, Dd, #fbits VCVT{cond}.F32.type Sd, Sd, #fbits |
B.7.7 | VCVTB, VCVTT | VCVTB uses the lower half (bits [15:0]) of the single-precision register to hold the half-precision value. VCVTT uses the upper half (bits [31:16]) of the single-precision register to hold the half-precision value. | VCVTB{cond}.type Sd, Sm VCVTT{cond}.type Sd, Sm |
B.7.8 | VDIV | VDIV divides the value in the first operand register by the value in the second operand register, and places the result in the destination register. | VDIV{cond}.F32 {Sd,} Sn, Sm VDIV{cond}.F64 {Dd,} Dn, Dm |
B.7.9 | VFMA, VFNMA, VFMS, VFNMS | Fused Floating-point Multiply Accumulate | VF{N}op{cond}.F64 {Dd,} Dn, Dm VF{N}op{cond}.F32 {Sd,} Sn, Sm |
B.7.10 | VMOV | VMOV puts a floating-point immediate value into a single-precision or double-precision register, or copies one register into another register. | VMOV{cond}.F32 Sd, #imm VMOV{cond}.F64 Dd, #imm VMOV{cond}.F32 Sd, Sm VMOV{cond}.F64 Dd, Dm |
B.7.11 | VMOV | Transfer contents between a single-precision floating-point register and an ARM register. | VMOV{cond} Rd, Sn VMOV{cond} Sn, Rd |
B.7.12 | VMUL, VMLA, VMLS, VNMUL, VNMLA, VNMLS | VMUL (Floating-point Multiply) VMLA (Floating-point Multiply Accumulate) VMLS (Floating-point Multiply Subtract), each with optional Negation (VNMUL, VNMLA, VNMLS). The final result is negated if the N option is used. | V{N}MUL{cond}.F32 {Sd,} Sn, Sm V{N}MUL{cond}.F64 {Dd,} Dn, Dm V{N}MLA{cond}.F32 Sd, Sn, Sm V{N}MLA{cond}.F64 Dd, Dn, Dm V{N}MLS{cond}.F32 Sd, Sn, Sm V{N}MLS{cond}.F64 Dd, Dn, Dm |
B.7.13 | VNEG | Floating-point negate | VNEG{cond}.F32 Sd, Sm VNEG{cond}.F64 Dd, Dm |
B.7.14 | VSQRT | Floating-point square root | VSQRT{cond}.F32 Sd, Sm VSQRT{cond}.F64 Dd, Dm |
B.7.15 | VSUB | VSUB subtracts the value in the second operand register from the value in the first operand register, and places the result in the destination register. | VSUB{cond}.F32 {Sd,} Sn, Sm VSUB{cond}.F64 {Dd,} Dn, Dm |
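Here is a small C sketch of two of the VFP rows. The first is VCVT between float and signed 32-bit fixed point with #fbits fraction bits: scale by 2^fbits, round toward zero, and saturate. The second is the negated multiply-accumulate forms, which (per B.7.12) negate the final VMLA/VMLS result. The function names and the 1..32 fbits range check are my assumptions for illustration. The sketch does not model FPSCR rounding modes, exceptions, or the unfused rounding of VMLA exactly.

```c
#include <assert.h>
#include <stdint.h>

/* VCVT.S32.F32 Sd, Sd, #fbits (float -> signed fixed point): scale by 2^fbits,
   round toward zero, saturate to the int32 range; NaN gives 0. */
static int32_t vcvt_s32_f32_fixed_lane(float x, int fbits) {
    assert(fbits >= 1 && fbits <= 32);
    double scaled = (double)x * (double)(1ull << fbits);  /* exact: power-of-two scale */
    if (scaled != scaled) return 0;                       /* NaN */
    if (scaled >= 2147483647.0) return INT32_MAX;
    if (scaled <= -2147483648.0) return INT32_MIN;
    return (int32_t)scaled;                               /* C conversion truncates toward zero */
}

/* VCVT.F32.S32 Sd, Sd, #fbits (signed fixed point -> float). */
static float vcvt_f32_s32_fixed_lane(int32_t q, int fbits) {
    assert(fbits >= 1 && fbits <= 32);
    return (float)((double)q / (double)(1ull << fbits));
}

/* VNMLA / VNMLS with d as accumulator: negated VMLA (d + n*m) and VMLS (d - n*m). */
static float vnmla_f32_lane(float d, float n, float m) { return -(d + n * m); }
static float vnmls_f32_lane(float d, float n, float m) { return -(d - n * m); }
```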
B.8 NEON and VFP pseudo-instructions | |||
B.8.1 | VACLE, VACLT | Vector Absolute Compare takes the absolute value of each element in a vector, and compares it with the absolute value of the corresponding element of a second vector. | VACop{cond}.datatype {Qd,} Qn, Qm VACop{cond}.datatype {Dd,} Dn, Dm |
B.8.2 | VAND (immediate) | bitwise AND immediate | VAND{cond}.datatype Qd, #imm VAND{cond}.datatype Dd, #imm |
B.8.3 | VCLE, VCLT | Vector Compare takes the value of each element in a vector, and compares it with the value of the corresponding element of a second vector. | VCop{cond}.datatype {Qd,} Qn, Qm VCop{cond}.datatype {Dd,} Dn, Dm |
B.8.4 | VLDR pseudo-instruction | The VLDR pseudo-instruction loads a constant value into every element of a 64-bit NEON vector (or a VFP single-precision or double-precision register). | VLDR{cond}.datatype Dd,=constant VLDR{cond}.datatype Sd,=constant |
B.8.5 | VLDR, VSTR (post-increment and pre-decrement) | The VLDR and VSTR pseudo-instructions load or store extension registers with post-increment and pre-decrement. | VLDR{cond}{.size} Fd, [Rn], #offset VSTR{cond}{.size} Fd, [Rn], #offset VLDR{cond}{.size} Fd, [Rn, #-offset]! VSTR{cond}{.size} Fd, [Rn, #-offset]! |
B.8.6 | VMOV2 | The VMOV2 pseudo-instruction generates an immediate value and places it in every element of a NEON vector, without a load from a literal pool. | VMOV2{cond}.datatype Qd, #constant VMOV2{cond}.datatype Dd, #constant |
B.8.7 | VORN | Bitwise OR NOT (immediate) | VORN{cond}.datatype Qd, #imm VORN{cond}.datatype Dd, #imm |
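The B.8 entries are pseudo-instructions, so the assembler rewrites them into real instructions. As I understand the armasm documentation, VAND (immediate) becomes VBIC with the inverted immediate, VORN (immediate) becomes VORR with the inverted immediate, and VACLE/VACLT become VACGE/VACGT with the operands swapped. The per-lane C sketch below checks that those rewrites compute the same thing. It ignores the restriction that real immediates must be encodable, and the names are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

#define LANE_TRUE  0xFFFFFFFFu
#define LANE_FALSE 0x00000000u

/* Real instructions (one 32-bit lane). */
static uint32_t vbic_imm_lane(uint32_t d, uint32_t imm) { return d & ~imm; }
static uint32_t vorr_imm_lane(uint32_t d, uint32_t imm) { return d | imm; }
static float abs_f32(float x) { return x < 0.0f ? -x : x; }
static uint32_t vacge_f32_lane(float n, float m) { return abs_f32(n) >= abs_f32(m) ? LANE_TRUE : LANE_FALSE; }
static uint32_t vacgt_f32_lane(float n, float m) { return abs_f32(n) >  abs_f32(m) ? LANE_TRUE : LANE_FALSE; }

/* Pseudo-instructions, expressed through the real instructions they are assumed to map to. */
static uint32_t vand_imm_lane(uint32_t d, uint32_t imm) { return vbic_imm_lane(d, ~imm); }
static uint32_t vorn_imm_lane(uint32_t d, uint32_t imm) { return vorr_imm_lane(d, ~imm); }
static uint32_t vacle_f32_lane(float n, float m) { return vacge_f32_lane(m, n); }  /* |n| <= |m| */
static uint32_t vaclt_f32_lane(float n, float m) { return vacgt_f32_lane(m, n); }  /* |n| <  |m| */
```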