package archsimd

Import Path: simd/archsimd (on go.dev)

Dependency Relation: imports 3 packages, and imported by 0 packages

Involved Source Files: compare_gen_amd64.go, cpu.go

Package archsimd provides access to architecture-specific SIMD operations.
This is a low-level package that exposes hardware-specific functionality.
It currently supports AMD64.

This package is experimental and not subject to the Go 1 compatibility promise.
It exists only when building with the GOEXPERIMENT=simd environment variable set.
# Vector types and operations
Vector types are defined as structs, such as Int8x16 and Float64x8, corresponding
to the hardware's vector registers. On AMD64, 128-, 256-, and 512-bit vectors are
supported.
Mask types are defined similarly, such as Mask8x16, and are represented as
opaque types, handling the differences in the underlying representations.
A mask can be converted to/from the corresponding integer vector type, or
to/from a bitmask.
Operations are mostly defined as methods on the vector types. Most of them
are compiler intrinsics and correspond directly to hardware instructions.
Common operations include:
- Load/Store: Load a vector from memory or store a vector to memory.
- Arithmetic: Add, Sub, Mul, etc.
- Bitwise: And, Or, Xor, etc.
- Comparison: Equal, Greater, etc., which produce a mask.
- Conversion: Convert between different vector types.
- Field selection and rearrangement: GetElem, Permute, etc.
- Masking: Masked, Merge.
The compiler recognizes certain patterns of operations and may optimize
them to more performant instructions. For example, on AVX512, an Add operation
followed by Masked may be optimized to a masked add instruction.
For this reason, not all hardware instructions are available as APIs.
# CPU feature checks
The package provides global variables to check for CPU features available
at runtime. For example, on AMD64, the [X86] variable provides methods to
check for AVX2, AVX512, etc.
It is recommended to check for CPU features before using the corresponding
vector operations.
# Notes
- This package is not portable, as the available types and operations depend
on the target architecture. It is not recommended to expose the SIMD types
defined in this package in public APIs.
- For performance reasons, it is recommended to use the vector types directly
as values. It is not recommended to take the address of a vector value,
allocate it on the heap, or place it in an aggregate type.

Involved Source Files (continued): extra_amd64.go, generate.go, maskmerge_gen_amd64.go, ops_amd64.go, ops_internal_amd64.go, other_gen_amd64.go, shuffles_amd64.go, slice_gen_amd64.go, slicepart_amd64.go, string.go, types_amd64.go, unsafe_helpers.go, dummy.s
Package-Level Type Names (total 43)
Float32x16 is a 512-bit SIMD vector of 16 float32.

Add adds corresponding elements of two vectors.
Asm: VADDPS, CPU Feature: AVX512

AsFloat64x8 converts from Float32x16 to Float64x8.
AsInt16x32 converts from Float32x16 to Int16x32.
AsInt32x16 converts from Float32x16 to Int32x16.
AsInt64x8 converts from Float32x16 to Int64x8.
AsInt8x64 converts from Float32x16 to Int8x64.
AsUint16x32 converts from Float32x16 to Uint16x32.
AsUint32x16 converts from Float32x16 to Uint32x16.
AsUint64x8 converts from Float32x16 to Uint64x8.
AsUint8x64 converts from Float32x16 to Uint8x64.

CeilScaled rounds elements up with the specified precision.
prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPS, CPU Feature: AVX512

CeilScaledResidue computes the difference after ceiling with the specified precision.
prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPS, CPU Feature: AVX512

Compress performs a compression on vector x using mask,
selecting the elements indicated by mask and packing them into lower-indexed elements.
Asm: VCOMPRESSPS, CPU Feature: AVX512

ConcatPermute performs a full permutation of vectors x and y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the bits needed to represent an index into xy are used in the elements of indices.
Asm: VPERMI2PS, CPU Feature: AVX512

ConvertToInt32 converts element values to int32.
When a conversion is inexact, a truncated (round toward zero) value is returned.
If a converted result cannot be represented in int32, an implementation-defined
architecture-specific value is returned.
Asm: VCVTTPS2DQ, CPU Feature: AVX512

ConvertToUint32 converts element values to uint32.
When a conversion is inexact, a truncated (round toward zero) value is returned.
If a converted result cannot be represented in uint32, an implementation-defined
architecture-specific value is returned.
Asm: VCVTTPS2UDQ, CPU Feature: AVX512

Div divides elements of two vectors.
Asm: VDIVPS, CPU Feature: AVX512

Equal returns x equals y, elementwise.
Asm: VCMPPS, CPU Feature: AVX512

Expand performs an expansion on a vector x whose elements are packed into the lower part.
The expansion distributes elements to the positions indexed by mask, from lower mask elements to upper, in order.
Asm: VEXPANDPS, CPU Feature: AVX512

FloorScaled rounds elements down with the specified precision.
prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPS, CPU Feature: AVX512

FloorScaledResidue computes the difference after flooring with the specified precision.
prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPS, CPU Feature: AVX512

GetHi returns the upper half of x.
Asm: VEXTRACTF64X4, CPU Feature: AVX512

GetLo returns the lower half of x.
Asm: VEXTRACTF64X4, CPU Feature: AVX512

Greater returns x greater-than y, elementwise.
Asm: VCMPPS, CPU Feature: AVX512

GreaterEqual returns x greater-than-or-equals y, elementwise.
Asm: VCMPPS, CPU Feature: AVX512

IsNan checks whether elements are NaN. Use as x.IsNan(x).
Asm: VCMPPS, CPU Feature: AVX512

Len returns the number of elements in a Float32x16.

Less returns x less-than y, elementwise.
Asm: VCMPPS, CPU Feature: AVX512

LessEqual returns x less-than-or-equals y, elementwise.
Asm: VCMPPS, CPU Feature: AVX512

Masked returns x but with elements zeroed where mask is false.

Max computes the maximum of corresponding elements.
Asm: VMAXPS, CPU Feature: AVX512

Merge returns x but with elements set to y where mask is false.

Min computes the minimum of corresponding elements.
Asm: VMINPS, CPU Feature: AVX512

Mul multiplies corresponding elements of two vectors.
Asm: VMULPS, CPU Feature: AVX512

MulAdd performs a fused (x * y) + z.
Asm: VFMADD213PS, CPU Feature: AVX512

MulAddSub performs a fused (x * y) - z for odd-indexed elements, and (x * y) + z for even-indexed elements.
Asm: VFMADDSUB213PS, CPU Feature: AVX512

MulSubAdd performs a fused (x * y) + z for odd-indexed elements, and (x * y) - z for even-indexed elements.
Asm: VFMSUBADD213PS, CPU Feature: AVX512

NotEqual returns x not-equals y, elementwise.
Asm: VCMPPS, CPU Feature: AVX512

Permute performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 4 bits (values 0-15) of each element of indices are used.
Asm: VPERMPS, CPU Feature: AVX512

Reciprocal computes an approximate reciprocal of each element.
Asm: VRCP14PS, CPU Feature: AVX512

ReciprocalSqrt computes an approximate reciprocal of the square root of each element.
Asm: VRSQRT14PS, CPU Feature: AVX512

RoundToEvenScaled rounds elements with the specified precision.
prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPS, CPU Feature: AVX512

RoundToEvenScaledResidue computes the difference after rounding with the specified precision.
prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPS, CPU Feature: AVX512

Scale multiplies elements by a power of 2.
Asm: VSCALEFPS, CPU Feature: AVX512

SelectFromPairGrouped returns, for each of the four 128-bit subvectors
of the vectors x and y, the selection of four elements from x and y,
where selector values in the range 0-3 specify elements from x and
values in the range 4-7 specify elements 0-3 of y.
When the selectors are constants and the selection can be implemented
in a single instruction, it will be; otherwise it requires two.
If the selectors are not constant, this will translate to a function call.
Asm: VSHUFPS, CPU Feature: AVX512

SetHi returns x with its upper half set to y.
Asm: VINSERTF64X4, CPU Feature: AVX512

SetLo returns x with its lower half set to y.
Asm: VINSERTF64X4, CPU Feature: AVX512

Sqrt computes the square root of each element.
Asm: VSQRTPS, CPU Feature: AVX512

Store stores a Float32x16 to an array.

StoreMasked stores a Float32x16 to an array, at those elements enabled by mask.
Asm: VMOVDQU32, CPU Feature: AVX512

StoreSlice stores x into a slice of at least 16 float32s.

StoreSlicePart stores the 16 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 16 or more elements, the method is equivalent to x.StoreSlice.

String returns a string representation of SIMD vector x.

Sub subtracts corresponding elements of two vectors.
Asm: VSUBPS, CPU Feature: AVX512

TruncScaled truncates elements with the specified precision.
prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPS, CPU Feature: AVX512

TruncScaledResidue computes the difference after truncating with the specified precision.
prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPS, CPU Feature: AVX512
Float32x16 : expvar.Var
Float32x16 : fmt.Stringer
func BroadcastFloat32x16(x float32) Float32x16
func LoadFloat32x16(y *[16]float32) Float32x16
func LoadFloat32x16Slice(s []float32) Float32x16
func LoadFloat32x16SlicePart(s []float32) Float32x16
func LoadMaskedFloat32x16(y *[16]float32, mask Mask32x16) Float32x16
func Float32x16.Add(y Float32x16) Float32x16
func Float32x16.CeilScaled(prec uint8) Float32x16
func Float32x16.CeilScaledResidue(prec uint8) Float32x16
func Float32x16.Compress(mask Mask32x16) Float32x16
func Float32x16.ConcatPermute(y Float32x16, indices Uint32x16) Float32x16
func Float32x16.Div(y Float32x16) Float32x16
func Float32x16.Expand(mask Mask32x16) Float32x16
func Float32x16.FloorScaled(prec uint8) Float32x16
func Float32x16.FloorScaledResidue(prec uint8) Float32x16
func Float32x16.Masked(mask Mask32x16) Float32x16
func Float32x16.Max(y Float32x16) Float32x16
func Float32x16.Merge(y Float32x16, mask Mask32x16) Float32x16
func Float32x16.Min(y Float32x16) Float32x16
func Float32x16.Mul(y Float32x16) Float32x16
func Float32x16.MulAdd(y Float32x16, z Float32x16) Float32x16
func Float32x16.MulAddSub(y Float32x16, z Float32x16) Float32x16
func Float32x16.MulSubAdd(y Float32x16, z Float32x16) Float32x16
func Float32x16.Permute(indices Uint32x16) Float32x16
func Float32x16.Reciprocal() Float32x16
func Float32x16.ReciprocalSqrt() Float32x16
func Float32x16.RoundToEvenScaled(prec uint8) Float32x16
func Float32x16.RoundToEvenScaledResidue(prec uint8) Float32x16
func Float32x16.Scale(y Float32x16) Float32x16
func Float32x16.SelectFromPairGrouped(a, b, c, d uint8, y Float32x16) Float32x16
func Float32x16.SetHi(y Float32x8) Float32x16
func Float32x16.SetLo(y Float32x8) Float32x16
func Float32x16.Sqrt() Float32x16
func Float32x16.Sub(y Float32x16) Float32x16
func Float32x16.TruncScaled(prec uint8) Float32x16
func Float32x16.TruncScaledResidue(prec uint8) Float32x16
func Float32x4.Broadcast512() Float32x16
func Float64x8.AsFloat32x16() (to Float32x16)
func Int16x32.AsFloat32x16() (to Float32x16)
func Int32x16.AsFloat32x16() (to Float32x16)
func Int32x16.ConvertToFloat32() Float32x16
func Int64x8.AsFloat32x16() (to Float32x16)
func Int8x64.AsFloat32x16() (to Float32x16)
func Uint16x32.AsFloat32x16() (to Float32x16)
func Uint32x16.AsFloat32x16() (to Float32x16)
func Uint32x16.ConvertToFloat32() Float32x16
func Uint64x8.AsFloat32x16() (to Float32x16)
func Uint8x64.AsFloat32x16() (to Float32x16)
func Float32x16.Add(y Float32x16) Float32x16
func Float32x16.ConcatPermute(y Float32x16, indices Uint32x16) Float32x16
func Float32x16.Div(y Float32x16) Float32x16
func Float32x16.Equal(y Float32x16) Mask32x16
func Float32x16.Greater(y Float32x16) Mask32x16
func Float32x16.GreaterEqual(y Float32x16) Mask32x16
func Float32x16.IsNan(y Float32x16) Mask32x16
func Float32x16.Less(y Float32x16) Mask32x16
func Float32x16.LessEqual(y Float32x16) Mask32x16
func Float32x16.Max(y Float32x16) Float32x16
func Float32x16.Merge(y Float32x16, mask Mask32x16) Float32x16
func Float32x16.Min(y Float32x16) Float32x16
func Float32x16.Mul(y Float32x16) Float32x16
func Float32x16.MulAdd(y Float32x16, z Float32x16) Float32x16
func Float32x16.MulAddSub(y Float32x16, z Float32x16) Float32x16
func Float32x16.MulSubAdd(y Float32x16, z Float32x16) Float32x16
func Float32x16.NotEqual(y Float32x16) Mask32x16
func Float32x16.Scale(y Float32x16) Float32x16
func Float32x16.SelectFromPairGrouped(a, b, c, d uint8, y Float32x16) Float32x16
func Float32x16.Sub(y Float32x16) Float32x16
Float32x4 is a 128-bit SIMD vector of 4 float32.

Add adds corresponding elements of two vectors.
Asm: VADDPS, CPU Feature: AVX

AddPairs horizontally adds adjacent pairs of elements.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].
Asm: VHADDPS, CPU Feature: AVX

AddSub subtracts even elements and adds odd elements of two vectors.
Asm: VADDSUBPS, CPU Feature: AVX

AsFloat64x2 converts from Float32x4 to Float64x2.
AsInt16x8 converts from Float32x4 to Int16x8.
AsInt32x4 converts from Float32x4 to Int32x4.
AsInt64x2 converts from Float32x4 to Int64x2.
AsInt8x16 converts from Float32x4 to Int8x16.
AsUint16x8 converts from Float32x4 to Uint16x8.
AsUint32x4 converts from Float32x4 to Uint32x4.
AsUint64x2 converts from Float32x4 to Uint64x2.
AsUint8x16 converts from Float32x4 to Uint8x16.

Broadcast128 copies element zero of its (128-bit) input to all elements of
the 128-bit output vector.
Asm: VBROADCASTSS, CPU Feature: AVX2 Broadcast256 copies element zero of its (128-bit) input to all elements of
the 256-bit output vector.
Asm: VBROADCASTSS, CPU Feature: AVX2 Broadcast512 copies element zero of its (128-bit) input to all elements of
the 512-bit output vector.
Asm: VBROADCASTSS, CPU Feature: AVX512 Ceil rounds elements up to the nearest integer.
Asm: VROUNDPS, CPU Feature: AVX CeilScaled rounds elements up with specified precision.
prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPS, CPU Feature: AVX512 CeilScaledResidue computes the difference after ceiling with specified precision.
prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPS, CPU Feature: AVX512

Compress performs a compression on vector x using mask,
selecting the elements indicated by mask and packing them into lower-indexed elements.
Asm: VCOMPRESSPS, CPU Feature: AVX512 ConcatPermute performs a full permutation of vector x, y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the needed bits to represent xy's index are used in indices' elements.
Asm: VPERMI2PS, CPU Feature: AVX512 ConvertToFloat64 converts element values to float64.
Asm: VCVTPS2PD, CPU Feature: AVX ConvertToInt32 converts element values to int32.
When a conversion is inexact, a truncated (round toward zero) value is returned.
If a converted result cannot be represented in int32, an implementation-defined
architecture-specific value is returned.
Asm: VCVTTPS2DQ, CPU Feature: AVX ConvertToInt64 converts element values to int64.
When a conversion is inexact, a truncated (round toward zero) value is returned.
If a converted result cannot be represented in int64, an implementation-defined
architecture-specific value is returned.
Asm: VCVTTPS2QQ, CPU Feature: AVX512 ConvertToUint32 converts element values to uint32.
When a conversion is inexact, a truncated (round toward zero) value is returned.
If a converted result cannot be represented in uint32, an implementation-defined
architecture-specific value is returned.
Asm: VCVTTPS2UDQ, CPU Feature: AVX512 ConvertToUint64 converts element values to uint64.
When a conversion is inexact, a truncated (round toward zero) value is returned.
If a converted result cannot be represented in uint64, an implementation-defined
architecture-specific value is returned.
Asm: VCVTTPS2UQQ, CPU Feature: AVX512 Div divides elements of two vectors.
Asm: VDIVPS, CPU Feature: AVX Equal returns x equals y, elementwise.
Asm: VCMPPS, CPU Feature: AVX

Expand performs an expansion on a vector x whose elements are packed into the lower part.
The expansion distributes elements to the positions indexed by mask, from lower mask elements to upper, in order.
Asm: VEXPANDPS, CPU Feature: AVX512 Floor rounds elements down to the nearest integer.
Asm: VROUNDPS, CPU Feature: AVX FloorScaled rounds elements down with specified precision.
prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPS, CPU Feature: AVX512 FloorScaledResidue computes the difference after flooring with specified precision.
prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPS, CPU Feature: AVX512 GetElem retrieves a single constant-indexed element's value.
index results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPEXTRD, CPU Feature: AVX Greater returns x greater-than y, elementwise.
Asm: VCMPPS, CPU Feature: AVX GreaterEqual returns x greater-than-or-equals y, elementwise.
Asm: VCMPPS, CPU Feature: AVX IsNan checks if elements are NaN. Use as x.IsNan(x).
Asm: VCMPPS, CPU Feature: AVX Len returns the number of elements in a Float32x4 Less returns x less-than y, elementwise.
Asm: VCMPPS, CPU Feature: AVX LessEqual returns x less-than-or-equals y, elementwise.
Asm: VCMPPS, CPU Feature: AVX Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VMAXPS, CPU Feature: AVX Merge returns x but with elements set to y where mask is false. Min computes the minimum of corresponding elements.
Asm: VMINPS, CPU Feature: AVX Mul multiplies corresponding elements of two vectors.
Asm: VMULPS, CPU Feature: AVX MulAdd performs a fused (x * y) + z.
Asm: VFMADD213PS, CPU Feature: AVX512 MulAddSub performs a fused (x * y) - z for odd-indexed elements, and (x * y) + z for even-indexed elements.
Asm: VFMADDSUB213PS, CPU Feature: AVX512 MulSubAdd performs a fused (x * y) + z for odd-indexed elements, and (x * y) - z for even-indexed elements.
Asm: VFMSUBADD213PS, CPU Feature: AVX512 NotEqual returns x not-equals y, elementwise.
Asm: VCMPPS, CPU Feature: AVX Reciprocal computes an approximate reciprocal of each element.
Asm: VRCPPS, CPU Feature: AVX ReciprocalSqrt computes an approximate reciprocal of the square root of each element.
Asm: VRSQRTPS, CPU Feature: AVX RoundToEven rounds elements to the nearest integer.
Asm: VROUNDPS, CPU Feature: AVX RoundToEvenScaled rounds elements with specified precision.
prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPS, CPU Feature: AVX512 RoundToEvenScaledResidue computes the difference after rounding with specified precision.
prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPS, CPU Feature: AVX512 Scale multiplies elements by a power of 2.
Asm: VSCALEFPS, CPU Feature: AVX512

SelectFromPair returns the selection of four elements from the two
vectors x and y, where selector values in the range 0-3 specify
elements from x and values in the range 4-7 specify elements 0-3
of y. When the selectors are constants and the selection can be
implemented in a single instruction, it will be; otherwise it
requires two. a is the source index of the least element in the
output, and b, c, and d are the indices of the 2nd, 3rd, and 4th
elements in the output. For example,
{1,2,4,8}.SelectFromPair(2,3,5,7,{9,25,49,81}) returns {4,8,25,81}.
If the selectors are not constant, this will translate to a function
call.
Asm: VSHUFPS, CPU Feature: AVX SetElem sets a single constant-indexed element's value.
index results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPINSRD, CPU Feature: AVX Sqrt computes the square root of each element.
Asm: VSQRTPS, CPU Feature: AVX Store stores a Float32x4 to an array StoreMasked stores a Float32x4 to an array,
at those elements enabled by mask
Asm: VMASKMOVD, CPU Feature: AVX2 StoreSlice stores x into a slice of at least 4 float32s StoreSlicePart stores the 4 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 4 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x Sub subtracts corresponding elements of two vectors.
Asm: VSUBPS, CPU Feature: AVX SubPairs horizontally subtracts adjacent pairs of elements.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].
Asm: VHSUBPS, CPU Feature: AVX Trunc truncates elements towards zero.
Asm: VROUNDPS, CPU Feature: AVX TruncScaled truncates elements with specified precision.
prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPS, CPU Feature: AVX512 TruncScaledResidue computes the difference after truncating with specified precision.
prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPS, CPU Feature: AVX512
Float32x4 : expvar.Var
Float32x4 : fmt.Stringer
func BroadcastFloat32x4(x float32) Float32x4
func LoadFloat32x4(y *[4]float32) Float32x4
func LoadFloat32x4Slice(s []float32) Float32x4
func LoadFloat32x4SlicePart(s []float32) Float32x4
func LoadMaskedFloat32x4(y *[4]float32, mask Mask32x4) Float32x4
func Float32x4.Add(y Float32x4) Float32x4
func Float32x4.AddPairs(y Float32x4) Float32x4
func Float32x4.AddSub(y Float32x4) Float32x4
func Float32x4.Broadcast128() Float32x4
func Float32x4.Ceil() Float32x4
func Float32x4.CeilScaled(prec uint8) Float32x4
func Float32x4.CeilScaledResidue(prec uint8) Float32x4
func Float32x4.Compress(mask Mask32x4) Float32x4
func Float32x4.ConcatPermute(y Float32x4, indices Uint32x4) Float32x4
func Float32x4.Div(y Float32x4) Float32x4
func Float32x4.Expand(mask Mask32x4) Float32x4
func Float32x4.Floor() Float32x4
func Float32x4.FloorScaled(prec uint8) Float32x4
func Float32x4.FloorScaledResidue(prec uint8) Float32x4
func Float32x4.Masked(mask Mask32x4) Float32x4
func Float32x4.Max(y Float32x4) Float32x4
func Float32x4.Merge(y Float32x4, mask Mask32x4) Float32x4
func Float32x4.Min(y Float32x4) Float32x4
func Float32x4.Mul(y Float32x4) Float32x4
func Float32x4.MulAdd(y Float32x4, z Float32x4) Float32x4
func Float32x4.MulAddSub(y Float32x4, z Float32x4) Float32x4
func Float32x4.MulSubAdd(y Float32x4, z Float32x4) Float32x4
func Float32x4.Reciprocal() Float32x4
func Float32x4.ReciprocalSqrt() Float32x4
func Float32x4.RoundToEven() Float32x4
func Float32x4.RoundToEvenScaled(prec uint8) Float32x4
func Float32x4.RoundToEvenScaledResidue(prec uint8) Float32x4
func Float32x4.Scale(y Float32x4) Float32x4
func Float32x4.SelectFromPair(a, b, c, d uint8, y Float32x4) Float32x4
func Float32x4.SetElem(index uint8, y float32) Float32x4
func Float32x4.Sqrt() Float32x4
func Float32x4.Sub(y Float32x4) Float32x4
func Float32x4.SubPairs(y Float32x4) Float32x4
func Float32x4.Trunc() Float32x4
func Float32x4.TruncScaled(prec uint8) Float32x4
func Float32x4.TruncScaledResidue(prec uint8) Float32x4
func Float32x8.GetHi() Float32x4
func Float32x8.GetLo() Float32x4
func Float64x2.AsFloat32x4() (to Float32x4)
func Float64x2.ConvertToFloat32() Float32x4
func Float64x4.ConvertToFloat32() Float32x4
func Int16x8.AsFloat32x4() (to Float32x4)
func Int32x4.AsFloat32x4() (to Float32x4)
func Int32x4.ConvertToFloat32() Float32x4
func Int64x2.AsFloat32x4() (to Float32x4)
func Int64x2.ConvertToFloat32() Float32x4
func Int64x4.ConvertToFloat32() Float32x4
func Int8x16.AsFloat32x4() (to Float32x4)
func Uint16x8.AsFloat32x4() (to Float32x4)
func Uint32x4.AsFloat32x4() (to Float32x4)
func Uint32x4.ConvertToFloat32() Float32x4
func Uint64x2.AsFloat32x4() (to Float32x4)
func Uint64x2.ConvertToFloat32() Float32x4
func Uint64x4.ConvertToFloat32() Float32x4
func Uint8x16.AsFloat32x4() (to Float32x4)
func Float32x4.Add(y Float32x4) Float32x4
func Float32x4.AddPairs(y Float32x4) Float32x4
func Float32x4.AddSub(y Float32x4) Float32x4
func Float32x4.ConcatPermute(y Float32x4, indices Uint32x4) Float32x4
func Float32x4.Div(y Float32x4) Float32x4
func Float32x4.Equal(y Float32x4) Mask32x4
func Float32x4.Greater(y Float32x4) Mask32x4
func Float32x4.GreaterEqual(y Float32x4) Mask32x4
func Float32x4.IsNan(y Float32x4) Mask32x4
func Float32x4.Less(y Float32x4) Mask32x4
func Float32x4.LessEqual(y Float32x4) Mask32x4
func Float32x4.Max(y Float32x4) Float32x4
func Float32x4.Merge(y Float32x4, mask Mask32x4) Float32x4
func Float32x4.Min(y Float32x4) Float32x4
func Float32x4.Mul(y Float32x4) Float32x4
func Float32x4.MulAdd(y Float32x4, z Float32x4) Float32x4
func Float32x4.MulAddSub(y Float32x4, z Float32x4) Float32x4
func Float32x4.MulSubAdd(y Float32x4, z Float32x4) Float32x4
func Float32x4.NotEqual(y Float32x4) Mask32x4
func Float32x4.Scale(y Float32x4) Float32x4
func Float32x4.SelectFromPair(a, b, c, d uint8, y Float32x4) Float32x4
func Float32x4.Sub(y Float32x4) Float32x4
func Float32x4.SubPairs(y Float32x4) Float32x4
func Float32x8.SetHi(y Float32x4) Float32x8
func Float32x8.SetLo(y Float32x4) Float32x8
Float32x8 is a 256-bit SIMD vector of 8 float32.

Add adds corresponding elements of two vectors.
Asm: VADDPS, CPU Feature: AVX

AddPairs horizontally adds adjacent pairs of elements.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].
Asm: VHADDPS, CPU Feature: AVX

AddSub subtracts even elements and adds odd elements of two vectors.
Asm: VADDSUBPS, CPU Feature: AVX

AsFloat64x4 converts from Float32x8 to Float64x4.
AsInt16x16 converts from Float32x8 to Int16x16.
AsInt32x8 converts from Float32x8 to Int32x8.
AsInt64x4 converts from Float32x8 to Int64x4.
AsInt8x32 converts from Float32x8 to Int8x32.
AsUint16x16 converts from Float32x8 to Uint16x16.
AsUint32x8 converts from Float32x8 to Uint32x8.
AsUint64x4 converts from Float32x8 to Uint64x4.
AsUint8x32 converts from Float32x8 to Uint8x32.

Ceil rounds elements up to the nearest integer.
Asm: VROUNDPS, CPU Feature: AVX CeilScaled rounds elements up with specified precision.
prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPS, CPU Feature: AVX512 CeilScaledResidue computes the difference after ceiling with specified precision.
prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPS, CPU Feature: AVX512

Compress performs a compression on vector x using mask,
selecting the elements indicated by mask and packing them into lower-indexed elements.
Asm: VCOMPRESSPS, CPU Feature: AVX512 ConcatPermute performs a full permutation of vector x, y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the needed bits to represent xy's index are used in indices' elements.
Asm: VPERMI2PS, CPU Feature: AVX512 ConvertToFloat64 converts element values to float64.
Asm: VCVTPS2PD, CPU Feature: AVX512 ConvertToInt32 converts element values to int32.
When a conversion is inexact, a truncated (round toward zero) value is returned.
If a converted result cannot be represented in int32, an implementation-defined
architecture-specific value is returned.
Asm: VCVTTPS2DQ, CPU Feature: AVX ConvertToInt64 converts element values to int64.
When a conversion is inexact, a truncated (round toward zero) value is returned.
If a converted result cannot be represented in int64, an implementation-defined
architecture-specific value is returned.
Asm: VCVTTPS2QQ, CPU Feature: AVX512 ConvertToUint32 converts element values to uint32.
When a conversion is inexact, a truncated (round toward zero) value is returned.
If a converted result cannot be represented in uint32, an implementation-defined
architecture-specific value is returned.
Asm: VCVTTPS2UDQ, CPU Feature: AVX512 ConvertToUint64 converts element values to uint64.
When a conversion is inexact, a truncated (round toward zero) value is returned.
If a converted result cannot be represented in uint64, an implementation-defined
architecture-specific value is returned.
Asm: VCVTTPS2UQQ, CPU Feature: AVX512 Div divides elements of two vectors.
Asm: VDIVPS, CPU Feature: AVX Equal returns x equals y, elementwise.
Asm: VCMPPS, CPU Feature: AVX

Expand performs an expansion on a vector x whose elements are packed into the lower part.
The expansion distributes elements to the positions indexed by mask, from lower mask elements to upper, in order.
Asm: VEXPANDPS, CPU Feature: AVX512 Floor rounds elements down to the nearest integer.
Asm: VROUNDPS, CPU Feature: AVX FloorScaled rounds elements down with specified precision.
prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPS, CPU Feature: AVX512 FloorScaledResidue computes the difference after flooring with specified precision.
prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPS, CPU Feature: AVX512 GetHi returns the upper half of x.
Asm: VEXTRACTF128, CPU Feature: AVX GetLo returns the lower half of x.
Asm: VEXTRACTF128, CPU Feature: AVX Greater returns x greater-than y, elementwise.
Asm: VCMPPS, CPU Feature: AVX GreaterEqual returns x greater-than-or-equals y, elementwise.
Asm: VCMPPS, CPU Feature: AVX IsNan checks if elements are NaN. Use as x.IsNan(x).
Asm: VCMPPS, CPU Feature: AVX Len returns the number of elements in a Float32x8 Less returns x less-than y, elementwise.
Asm: VCMPPS, CPU Feature: AVX LessEqual returns x less-than-or-equals y, elementwise.
Asm: VCMPPS, CPU Feature: AVX Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VMAXPS, CPU Feature: AVX Merge returns x but with elements set to y where mask is false. Min computes the minimum of corresponding elements.
Asm: VMINPS, CPU Feature: AVX Mul multiplies corresponding elements of two vectors.
Asm: VMULPS, CPU Feature: AVX MulAdd performs a fused (x * y) + z.
Asm: VFMADD213PS, CPU Feature: AVX512 MulAddSub performs a fused (x * y) - z for odd-indexed elements, and (x * y) + z for even-indexed elements.
Asm: VFMADDSUB213PS, CPU Feature: AVX512 MulSubAdd performs a fused (x * y) + z for odd-indexed elements, and (x * y) - z for even-indexed elements.
Asm: VFMSUBADD213PS, CPU Feature: AVX512 NotEqual returns x not-equals y, elementwise.
Asm: VCMPPS, CPU Feature: AVX Permute performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 3 bits (values 0-7) of each element of indices are used.
Asm: VPERMPS, CPU Feature: AVX2 Reciprocal computes an approximate reciprocal of each element.
Asm: VRCPPS, CPU Feature: AVX ReciprocalSqrt computes an approximate reciprocal of the square root of each element.
Asm: VRSQRTPS, CPU Feature: AVX RoundToEven rounds elements to the nearest integer.
Asm: VROUNDPS, CPU Feature: AVX RoundToEvenScaled rounds elements with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPS, CPU Feature: AVX512 RoundToEvenScaledResidue computes the difference after rounding with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPS, CPU Feature: AVX512 Scale multiplies elements by a power of 2.
Asm: VSCALEFPS, CPU Feature: AVX512 Select128FromPair treats the 256-bit vectors x and y as a single vector of four
128-bit elements, and returns a 256-bit result formed by
concatenating the two elements specified by lo and hi.
For example,
{40, 41, 42, 43, 50, 51, 52, 53}.Select128FromPair(3, 0, {60, 61, 62, 63, 70, 71, 72, 73})
returns {70, 71, 72, 73, 40, 41, 42, 43}.
lo, hi result in better performance when they are constants; non-constant values will be translated into a jump table.
lo, hi should be between 0 and 3, inclusive; other values may result in a runtime panic.
Asm: VPERM2F128, CPU Feature: AVX SelectFromPairGrouped returns, for each of the two 128-bit halves of
the vectors x and y, the selection of four elements from x and y,
where selector values in the range 0-3 specify elements from x and
values in the range 4-7 specify elements 0-3 of y.
When the selectors are constants and the selection
can be implemented in a single instruction, it will be; otherwise
it requires two. a is the source index of the lowest-indexed element in the
output, and b, c, and d are the indices of the 2nd, 3rd, and 4th
elements in the output. For example,
{1,2,4,8,16,32,64,128}.SelectFromPairGrouped(2,3,5,7,{9,25,49,81,121,169,225,289})
returns {4,8,25,81,64,128,169,289}.
If the selectors are not constant this will translate to a function
call.
Asm: VSHUFPS, CPU Feature: AVX SetHi returns x with its upper half set to y.
Asm: VINSERTF128, CPU Feature: AVX SetLo returns x with its lower half set to y.
Asm: VINSERTF128, CPU Feature: AVX Sqrt computes the square root of each element.
Asm: VSQRTPS, CPU Feature: AVX Store stores a Float32x8 to an array StoreMasked stores a Float32x8 to an array,
at those elements enabled by mask
Asm: VMASKMOVD, CPU Feature: AVX2 StoreSlice stores x into a slice of at least 8 float32s StoreSlicePart stores the 8 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 8 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x Sub subtracts corresponding elements of two vectors.
Asm: VSUBPS, CPU Feature: AVX SubPairs horizontally subtracts adjacent pairs of elements.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].
Asm: VHSUBPS, CPU Feature: AVX Trunc truncates elements towards zero.
Asm: VROUNDPS, CPU Feature: AVX TruncScaled truncates elements with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPS, CPU Feature: AVX512 TruncScaledResidue computes the difference after truncating with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPS, CPU Feature: AVX512
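The index-masking rule in Permute's documentation (only the low 3 bits of each element of indices are used) can be modeled in plain Go. This is a scalar sketch of the documented semantics, not the VPERMPS intrinsic; permute8 is a hypothetical helper, not part of the package:

```go
package main

import "fmt"

// permute8 models the documented semantics of Float32x8.Permute:
// result[i] = x[indices[i] & 7]. Only the low 3 bits of each index
// element select a source lane. A plain-Go sketch, not the intrinsic.
func permute8(x [8]float32, indices [8]uint32) [8]float32 {
	var r [8]float32
	for i, idx := range indices {
		r[i] = x[idx&7] // low 3 bits select the source lane
	}
	return r
}

func main() {
	x := [8]float32{10, 11, 12, 13, 14, 15, 16, 17}
	// Reverse the lanes; the final index 9 demonstrates masking: 9&7 == 1.
	fmt.Println(permute8(x, [8]uint32{7, 6, 5, 4, 3, 2, 1, 9}))
	// prints [17 16 15 14 13 12 11 11]
}
```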
Float32x8 : expvar.Var
Float32x8 : fmt.Stringer
func BroadcastFloat32x8(x float32) Float32x8
func LoadFloat32x8(y *[8]float32) Float32x8
func LoadFloat32x8Slice(s []float32) Float32x8
func LoadFloat32x8SlicePart(s []float32) Float32x8
func LoadMaskedFloat32x8(y *[8]float32, mask Mask32x8) Float32x8
func Float32x16.GetHi() Float32x8
func Float32x16.GetLo() Float32x8
func Float32x4.Broadcast256() Float32x8
func Float32x8.Add(y Float32x8) Float32x8
func Float32x8.AddPairs(y Float32x8) Float32x8
func Float32x8.AddSub(y Float32x8) Float32x8
func Float32x8.Ceil() Float32x8
func Float32x8.CeilScaled(prec uint8) Float32x8
func Float32x8.CeilScaledResidue(prec uint8) Float32x8
func Float32x8.Compress(mask Mask32x8) Float32x8
func Float32x8.ConcatPermute(y Float32x8, indices Uint32x8) Float32x8
func Float32x8.Div(y Float32x8) Float32x8
func Float32x8.Expand(mask Mask32x8) Float32x8
func Float32x8.Floor() Float32x8
func Float32x8.FloorScaled(prec uint8) Float32x8
func Float32x8.FloorScaledResidue(prec uint8) Float32x8
func Float32x8.Masked(mask Mask32x8) Float32x8
func Float32x8.Max(y Float32x8) Float32x8
func Float32x8.Merge(y Float32x8, mask Mask32x8) Float32x8
func Float32x8.Min(y Float32x8) Float32x8
func Float32x8.Mul(y Float32x8) Float32x8
func Float32x8.MulAdd(y Float32x8, z Float32x8) Float32x8
func Float32x8.MulAddSub(y Float32x8, z Float32x8) Float32x8
func Float32x8.MulSubAdd(y Float32x8, z Float32x8) Float32x8
func Float32x8.Permute(indices Uint32x8) Float32x8
func Float32x8.Reciprocal() Float32x8
func Float32x8.ReciprocalSqrt() Float32x8
func Float32x8.RoundToEven() Float32x8
func Float32x8.RoundToEvenScaled(prec uint8) Float32x8
func Float32x8.RoundToEvenScaledResidue(prec uint8) Float32x8
func Float32x8.Scale(y Float32x8) Float32x8
func Float32x8.Select128FromPair(lo, hi uint8, y Float32x8) Float32x8
func Float32x8.SelectFromPairGrouped(a, b, c, d uint8, y Float32x8) Float32x8
func Float32x8.SetHi(y Float32x4) Float32x8
func Float32x8.SetLo(y Float32x4) Float32x8
func Float32x8.Sqrt() Float32x8
func Float32x8.Sub(y Float32x8) Float32x8
func Float32x8.SubPairs(y Float32x8) Float32x8
func Float32x8.Trunc() Float32x8
func Float32x8.TruncScaled(prec uint8) Float32x8
func Float32x8.TruncScaledResidue(prec uint8) Float32x8
func Float64x4.AsFloat32x8() (to Float32x8)
func Float64x8.ConvertToFloat32() Float32x8
func Int16x16.AsFloat32x8() (to Float32x8)
func Int32x8.AsFloat32x8() (to Float32x8)
func Int32x8.ConvertToFloat32() Float32x8
func Int64x4.AsFloat32x8() (to Float32x8)
func Int64x8.ConvertToFloat32() Float32x8
func Int8x32.AsFloat32x8() (to Float32x8)
func Uint16x16.AsFloat32x8() (to Float32x8)
func Uint32x8.AsFloat32x8() (to Float32x8)
func Uint32x8.ConvertToFloat32() Float32x8
func Uint64x4.AsFloat32x8() (to Float32x8)
func Uint64x8.ConvertToFloat32() Float32x8
func Uint8x32.AsFloat32x8() (to Float32x8)
func Float32x16.SetHi(y Float32x8) Float32x16
func Float32x16.SetLo(y Float32x8) Float32x16
func Float32x8.Add(y Float32x8) Float32x8
func Float32x8.AddPairs(y Float32x8) Float32x8
func Float32x8.AddSub(y Float32x8) Float32x8
func Float32x8.ConcatPermute(y Float32x8, indices Uint32x8) Float32x8
func Float32x8.Div(y Float32x8) Float32x8
func Float32x8.Equal(y Float32x8) Mask32x8
func Float32x8.Greater(y Float32x8) Mask32x8
func Float32x8.GreaterEqual(y Float32x8) Mask32x8
func Float32x8.IsNan(y Float32x8) Mask32x8
func Float32x8.Less(y Float32x8) Mask32x8
func Float32x8.LessEqual(y Float32x8) Mask32x8
func Float32x8.Max(y Float32x8) Float32x8
func Float32x8.Merge(y Float32x8, mask Mask32x8) Float32x8
func Float32x8.Min(y Float32x8) Float32x8
func Float32x8.Mul(y Float32x8) Float32x8
func Float32x8.MulAdd(y Float32x8, z Float32x8) Float32x8
func Float32x8.MulAddSub(y Float32x8, z Float32x8) Float32x8
func Float32x8.MulSubAdd(y Float32x8, z Float32x8) Float32x8
func Float32x8.NotEqual(y Float32x8) Mask32x8
func Float32x8.Scale(y Float32x8) Float32x8
func Float32x8.Select128FromPair(lo, hi uint8, y Float32x8) Float32x8
func Float32x8.SelectFromPairGrouped(a, b, c, d uint8, y Float32x8) Float32x8
func Float32x8.Sub(y Float32x8) Float32x8
func Float32x8.SubPairs(y Float32x8) Float32x8
Float64x2 is a 128-bit SIMD vector of 2 float64 Add adds corresponding elements of two vectors.
Asm: VADDPD, CPU Feature: AVX AddPairs horizontally adds adjacent pairs of elements.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].
Asm: VHADDPD, CPU Feature: AVX AddSub subtracts even elements and adds odd elements of two vectors.
Asm: VADDSUBPD, CPU Feature: AVX Float32x4 converts from Float64x2 to Float32x4 Int16x8 converts from Float64x2 to Int16x8 Int32x4 converts from Float64x2 to Int32x4 Int64x2 converts from Float64x2 to Int64x2 Int8x16 converts from Float64x2 to Int8x16 Uint16x8 converts from Float64x2 to Uint16x8 Uint32x4 converts from Float64x2 to Uint32x4 Uint64x2 converts from Float64x2 to Uint64x2 Uint8x16 converts from Float64x2 to Uint8x16 Broadcast128 copies element zero of its (128-bit) input to all elements of
the 128-bit output vector.
Asm: VPBROADCASTQ, CPU Feature: AVX2 Broadcast256 copies element zero of its (128-bit) input to all elements of
the 256-bit output vector.
Asm: VBROADCASTSD, CPU Feature: AVX2 Broadcast512 copies element zero of its (128-bit) input to all elements of
the 512-bit output vector.
Asm: VBROADCASTSD, CPU Feature: AVX512 Ceil rounds elements up to the nearest integer.
Asm: VROUNDPD, CPU Feature: AVX CeilScaled rounds elements up with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPD, CPU Feature: AVX512 CeilScaledResidue computes the difference after ceiling with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPD, CPU Feature: AVX512 Compress performs a compression on vector x using mask by
selecting elements as indicated by mask, packing them into the lower-indexed elements.
Asm: VCOMPRESSPD, CPU Feature: AVX512 ConcatPermute performs a full permutation of vector x, y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the bits needed to represent an index into xy are used from each element of indices.
Asm: VPERMI2PD, CPU Feature: AVX512 ConvertToFloat32 converts element values to float32.
The result vector's elements are rounded to the nearest value.
Asm: VCVTPD2PSX, CPU Feature: AVX ConvertToInt32 converts element values to int32.
When a conversion is inexact, a truncated (round toward zero) value is returned.
If a converted result cannot be represented in int32, an implementation-defined
architecture-specific value is returned.
Asm: VCVTTPD2DQX, CPU Feature: AVX ConvertToInt64 converts element values to int64.
When a conversion is inexact, a truncated (round toward zero) value is returned.
If a converted result cannot be represented in int64, an implementation-defined
architecture-specific value is returned.
Asm: VCVTTPD2QQ, CPU Feature: AVX512 ConvertToUint32 converts element values to uint32.
When a conversion is inexact, a truncated (round toward zero) value is returned.
If a converted result cannot be represented in uint32, an implementation-defined
architecture-specific value is returned.
Asm: VCVTTPD2UDQX, CPU Feature: AVX512 ConvertToUint64 converts element values to uint64.
When a conversion is inexact, a truncated (round toward zero) value is returned.
If a converted result cannot be represented in uint64, an implementation-defined
architecture-specific value is returned.
Asm: VCVTTPD2UQQ, CPU Feature: AVX512 Div divides elements of two vectors.
Asm: VDIVPD, CPU Feature: AVX Equal returns x equals y, elementwise.
Asm: VCMPPD, CPU Feature: AVX Expand performs an expansion on a vector x whose elements are packed to lower parts.
The expansion distributes the packed lower elements of x to the positions selected by mask, in order from the lowest mask element to the highest.
Asm: VEXPANDPD, CPU Feature: AVX512 Floor rounds elements down to the nearest integer.
Asm: VROUNDPD, CPU Feature: AVX FloorScaled rounds elements down with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPD, CPU Feature: AVX512 FloorScaledResidue computes the difference after flooring with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPD, CPU Feature: AVX512 GetElem retrieves a single constant-indexed element's value.
index results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPEXTRQ, CPU Feature: AVX Greater returns x greater-than y, elementwise.
Asm: VCMPPD, CPU Feature: AVX GreaterEqual returns x greater-than-or-equals y, elementwise.
Asm: VCMPPD, CPU Feature: AVX IsNan checks if elements are NaN. Use as x.IsNan(x).
Asm: VCMPPD, CPU Feature: AVX Len returns the number of elements in a Float64x2 Less returns x less-than y, elementwise.
Asm: VCMPPD, CPU Feature: AVX LessEqual returns x less-than-or-equals y, elementwise.
Asm: VCMPPD, CPU Feature: AVX Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VMAXPD, CPU Feature: AVX Merge returns x but with elements set to y where mask is false. Min computes the minimum of corresponding elements.
Asm: VMINPD, CPU Feature: AVX Mul multiplies corresponding elements of two vectors.
Asm: VMULPD, CPU Feature: AVX MulAdd performs a fused (x * y) + z.
Asm: VFMADD213PD, CPU Feature: AVX512 MulAddSub performs a fused (x * y) - z for even-indexed elements, and (x * y) + z for odd-indexed elements.
Asm: VFMADDSUB213PD, CPU Feature: AVX512 MulSubAdd performs a fused (x * y) + z for even-indexed elements, and (x * y) - z for odd-indexed elements.
Asm: VFMSUBADD213PD, CPU Feature: AVX512 NotEqual returns x not-equals y, elementwise.
Asm: VCMPPD, CPU Feature: AVX Reciprocal computes an approximate reciprocal of each element.
Asm: VRCP14PD, CPU Feature: AVX512 ReciprocalSqrt computes an approximate reciprocal of the square root of each element.
Asm: VRSQRT14PD, CPU Feature: AVX512 RoundToEven rounds elements to the nearest integer.
Asm: VROUNDPD, CPU Feature: AVX RoundToEvenScaled rounds elements with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPD, CPU Feature: AVX512 RoundToEvenScaledResidue computes the difference after rounding with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPD, CPU Feature: AVX512 Scale multiplies elements by a power of 2.
Asm: VSCALEFPD, CPU Feature: AVX512 SelectFromPair returns the selection of two elements from the two
vectors x and y, where selector values in the range 0-1 specify
elements from x and values in the range 2-3 specify elements 0-1
of y. When the selectors are constants the selection can be
implemented in a single instruction.
If the selectors are not constant this will translate to a function
call.
Asm: VSHUFPD, CPU Feature: AVX SetElem sets a single constant-indexed element's value.
index results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPINSRQ, CPU Feature: AVX Sqrt computes the square root of each element.
Asm: VSQRTPD, CPU Feature: AVX Store stores a Float64x2 to an array StoreMasked stores a Float64x2 to an array,
at those elements enabled by mask
Asm: VMASKMOVQ, CPU Feature: AVX2 StoreSlice stores x into a slice of at least 2 float64s StoreSlicePart stores the 2 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 2 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x Sub subtracts corresponding elements of two vectors.
Asm: VSUBPD, CPU Feature: AVX SubPairs horizontally subtracts adjacent pairs of elements.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].
Asm: VHSUBPD, CPU Feature: AVX Trunc truncates elements towards zero.
Asm: VROUNDPD, CPU Feature: AVX TruncScaled truncates elements with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPD, CPU Feature: AVX512 TruncScaledResidue computes the difference after truncating with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPD, CPU Feature: AVX512
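ConcatPermute's indexing over the concatenation of x and y can be sketched in plain Go for the 2-element case. concatPermute2 is a hypothetical helper illustrating the documented semantics, not the VPERMI2PD intrinsic:

```go
package main

import "fmt"

// concatPermute2 models the documented semantics of
// Float64x2.ConcatPermute: x and y are concatenated (x is the lower
// half, y the upper half) into a 4-element pool, and each result
// element selects from that pool. Only the low 2 bits of each index
// are used, mirroring the "only the needed bits" rule.
func concatPermute2(x, y [2]float64, indices [2]uint64) [2]float64 {
	xy := [4]float64{x[0], x[1], y[0], y[1]} // lower half x, upper half y
	var r [2]float64
	for i, idx := range indices {
		r[i] = xy[idx&3]
	}
	return r
}

func main() {
	// Index 3 selects y[1], index 0 selects x[0].
	fmt.Println(concatPermute2([2]float64{1, 2}, [2]float64{3, 4}, [2]uint64{3, 0}))
	// prints [4 1]
}
```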
Float64x2 : expvar.Var
Float64x2 : fmt.Stringer
func BroadcastFloat64x2(x float64) Float64x2
func LoadFloat64x2(y *[2]float64) Float64x2
func LoadFloat64x2Slice(s []float64) Float64x2
func LoadFloat64x2SlicePart(s []float64) Float64x2
func LoadMaskedFloat64x2(y *[2]float64, mask Mask64x2) Float64x2
func Float32x4.AsFloat64x2() (to Float64x2)
func Float64x2.Add(y Float64x2) Float64x2
func Float64x2.AddPairs(y Float64x2) Float64x2
func Float64x2.AddSub(y Float64x2) Float64x2
func Float64x2.Broadcast128() Float64x2
func Float64x2.Ceil() Float64x2
func Float64x2.CeilScaled(prec uint8) Float64x2
func Float64x2.CeilScaledResidue(prec uint8) Float64x2
func Float64x2.Compress(mask Mask64x2) Float64x2
func Float64x2.ConcatPermute(y Float64x2, indices Uint64x2) Float64x2
func Float64x2.Div(y Float64x2) Float64x2
func Float64x2.Expand(mask Mask64x2) Float64x2
func Float64x2.Floor() Float64x2
func Float64x2.FloorScaled(prec uint8) Float64x2
func Float64x2.FloorScaledResidue(prec uint8) Float64x2
func Float64x2.Masked(mask Mask64x2) Float64x2
func Float64x2.Max(y Float64x2) Float64x2
func Float64x2.Merge(y Float64x2, mask Mask64x2) Float64x2
func Float64x2.Min(y Float64x2) Float64x2
func Float64x2.Mul(y Float64x2) Float64x2
func Float64x2.MulAdd(y Float64x2, z Float64x2) Float64x2
func Float64x2.MulAddSub(y Float64x2, z Float64x2) Float64x2
func Float64x2.MulSubAdd(y Float64x2, z Float64x2) Float64x2
func Float64x2.Reciprocal() Float64x2
func Float64x2.ReciprocalSqrt() Float64x2
func Float64x2.RoundToEven() Float64x2
func Float64x2.RoundToEvenScaled(prec uint8) Float64x2
func Float64x2.RoundToEvenScaledResidue(prec uint8) Float64x2
func Float64x2.Scale(y Float64x2) Float64x2
func Float64x2.SelectFromPair(a, b uint8, y Float64x2) Float64x2
func Float64x2.SetElem(index uint8, y float64) Float64x2
func Float64x2.Sqrt() Float64x2
func Float64x2.Sub(y Float64x2) Float64x2
func Float64x2.SubPairs(y Float64x2) Float64x2
func Float64x2.Trunc() Float64x2
func Float64x2.TruncScaled(prec uint8) Float64x2
func Float64x2.TruncScaledResidue(prec uint8) Float64x2
func Float64x4.GetHi() Float64x2
func Float64x4.GetLo() Float64x2
func Int16x8.AsFloat64x2() (to Float64x2)
func Int32x4.AsFloat64x2() (to Float64x2)
func Int64x2.AsFloat64x2() (to Float64x2)
func Int64x2.ConvertToFloat64() Float64x2
func Int8x16.AsFloat64x2() (to Float64x2)
func Uint16x8.AsFloat64x2() (to Float64x2)
func Uint32x4.AsFloat64x2() (to Float64x2)
func Uint64x2.AsFloat64x2() (to Float64x2)
func Uint64x2.ConvertToFloat64() Float64x2
func Uint8x16.AsFloat64x2() (to Float64x2)
func Float64x2.Add(y Float64x2) Float64x2
func Float64x2.AddPairs(y Float64x2) Float64x2
func Float64x2.AddSub(y Float64x2) Float64x2
func Float64x2.ConcatPermute(y Float64x2, indices Uint64x2) Float64x2
func Float64x2.Div(y Float64x2) Float64x2
func Float64x2.Equal(y Float64x2) Mask64x2
func Float64x2.Greater(y Float64x2) Mask64x2
func Float64x2.GreaterEqual(y Float64x2) Mask64x2
func Float64x2.IsNan(y Float64x2) Mask64x2
func Float64x2.Less(y Float64x2) Mask64x2
func Float64x2.LessEqual(y Float64x2) Mask64x2
func Float64x2.Max(y Float64x2) Float64x2
func Float64x2.Merge(y Float64x2, mask Mask64x2) Float64x2
func Float64x2.Min(y Float64x2) Float64x2
func Float64x2.Mul(y Float64x2) Float64x2
func Float64x2.MulAdd(y Float64x2, z Float64x2) Float64x2
func Float64x2.MulAddSub(y Float64x2, z Float64x2) Float64x2
func Float64x2.MulSubAdd(y Float64x2, z Float64x2) Float64x2
func Float64x2.NotEqual(y Float64x2) Mask64x2
func Float64x2.Scale(y Float64x2) Float64x2
func Float64x2.SelectFromPair(a, b uint8, y Float64x2) Float64x2
func Float64x2.Sub(y Float64x2) Float64x2
func Float64x2.SubPairs(y Float64x2) Float64x2
func Float64x4.SetHi(y Float64x2) Float64x4
func Float64x4.SetLo(y Float64x2) Float64x4
Float64x4 is a 256-bit SIMD vector of 4 float64 Add adds corresponding elements of two vectors.
Asm: VADDPD, CPU Feature: AVX AddPairs horizontally adds adjacent pairs of elements.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].
Asm: VHADDPD, CPU Feature: AVX AddSub subtracts even elements and adds odd elements of two vectors.
Asm: VADDSUBPD, CPU Feature: AVX Float32x8 converts from Float64x4 to Float32x8 Int16x16 converts from Float64x4 to Int16x16 Int32x8 converts from Float64x4 to Int32x8 Int64x4 converts from Float64x4 to Int64x4 Int8x32 converts from Float64x4 to Int8x32 Uint16x16 converts from Float64x4 to Uint16x16 Uint32x8 converts from Float64x4 to Uint32x8 Uint64x4 converts from Float64x4 to Uint64x4 Uint8x32 converts from Float64x4 to Uint8x32 Ceil rounds elements up to the nearest integer.
Asm: VROUNDPD, CPU Feature: AVX CeilScaled rounds elements up with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPD, CPU Feature: AVX512 CeilScaledResidue computes the difference after ceiling with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPD, CPU Feature: AVX512 Compress performs a compression on vector x using mask by
selecting elements as indicated by mask, packing them into the lower-indexed elements.
Asm: VCOMPRESSPD, CPU Feature: AVX512 ConcatPermute performs a full permutation of vector x, y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the bits needed to represent an index into xy are used from each element of indices.
Asm: VPERMI2PD, CPU Feature: AVX512 ConvertToFloat32 converts element values to float32.
The result vector's elements are rounded to the nearest value.
Asm: VCVTPD2PSY, CPU Feature: AVX ConvertToInt32 converts element values to int32.
When a conversion is inexact, a truncated (round toward zero) value is returned.
If a converted result cannot be represented in int32, an implementation-defined
architecture-specific value is returned.
Asm: VCVTTPD2DQY, CPU Feature: AVX ConvertToInt64 converts element values to int64.
When a conversion is inexact, a truncated (round toward zero) value is returned.
If a converted result cannot be represented in int64, an implementation-defined
architecture-specific value is returned.
Asm: VCVTTPD2QQ, CPU Feature: AVX512 ConvertToUint32 converts element values to uint32.
When a conversion is inexact, a truncated (round toward zero) value is returned.
If a converted result cannot be represented in uint32, an implementation-defined
architecture-specific value is returned.
Asm: VCVTTPD2UDQY, CPU Feature: AVX512 ConvertToUint64 converts element values to uint64.
When a conversion is inexact, a truncated (round toward zero) value is returned.
If a converted result cannot be represented in uint64, an implementation-defined
architecture-specific value is returned.
Asm: VCVTTPD2UQQ, CPU Feature: AVX512 Div divides elements of two vectors.
Asm: VDIVPD, CPU Feature: AVX Equal returns x equals y, elementwise.
Asm: VCMPPD, CPU Feature: AVX Expand performs an expansion on a vector x whose elements are packed to lower parts.
The expansion distributes the packed lower elements of x to the positions selected by mask, in order from the lowest mask element to the highest.
Asm: VEXPANDPD, CPU Feature: AVX512 Floor rounds elements down to the nearest integer.
Asm: VROUNDPD, CPU Feature: AVX FloorScaled rounds elements down with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPD, CPU Feature: AVX512 FloorScaledResidue computes the difference after flooring with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPD, CPU Feature: AVX512 GetHi returns the upper half of x.
Asm: VEXTRACTF128, CPU Feature: AVX GetLo returns the lower half of x.
Asm: VEXTRACTF128, CPU Feature: AVX Greater returns x greater-than y, elementwise.
Asm: VCMPPD, CPU Feature: AVX GreaterEqual returns x greater-than-or-equals y, elementwise.
Asm: VCMPPD, CPU Feature: AVX IsNan checks if elements are NaN. Use as x.IsNan(x).
Asm: VCMPPD, CPU Feature: AVX Len returns the number of elements in a Float64x4 Less returns x less-than y, elementwise.
Asm: VCMPPD, CPU Feature: AVX LessEqual returns x less-than-or-equals y, elementwise.
Asm: VCMPPD, CPU Feature: AVX Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VMAXPD, CPU Feature: AVX Merge returns x but with elements set to y where mask is false. Min computes the minimum of corresponding elements.
Asm: VMINPD, CPU Feature: AVX Mul multiplies corresponding elements of two vectors.
Asm: VMULPD, CPU Feature: AVX MulAdd performs a fused (x * y) + z.
Asm: VFMADD213PD, CPU Feature: AVX512 MulAddSub performs a fused (x * y) - z for even-indexed elements, and (x * y) + z for odd-indexed elements.
Asm: VFMADDSUB213PD, CPU Feature: AVX512 MulSubAdd performs a fused (x * y) + z for even-indexed elements, and (x * y) - z for odd-indexed elements.
Asm: VFMSUBADD213PD, CPU Feature: AVX512 NotEqual returns x not-equals y, elementwise.
Asm: VCMPPD, CPU Feature: AVX Permute performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 2 bits (values 0-3) of each element of indices are used.
Asm: VPERMPD, CPU Feature: AVX512 Reciprocal computes an approximate reciprocal of each element.
Asm: VRCP14PD, CPU Feature: AVX512 ReciprocalSqrt computes an approximate reciprocal of the square root of each element.
Asm: VRSQRT14PD, CPU Feature: AVX512 RoundToEven rounds elements to the nearest integer.
Asm: VROUNDPD, CPU Feature: AVX RoundToEvenScaled rounds elements with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPD, CPU Feature: AVX512 RoundToEvenScaledResidue computes the difference after rounding with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPD, CPU Feature: AVX512 Scale multiplies elements by a power of 2.
Asm: VSCALEFPD, CPU Feature: AVX512 Select128FromPair treats the 256-bit vectors x and y as a single vector of four
128-bit elements, and returns a 256-bit result formed by
concatenating the two elements specified by lo and hi.
For example,
{40, 41, 50, 51}.Select128FromPair(3, 0, {60, 61, 70, 71})
returns {70, 71, 40, 41}.
lo, hi result in better performance when they are constants; non-constant values will be translated into a jump table.
lo, hi should be between 0 and 3, inclusive; other values may result in a runtime panic.
Asm: VPERM2F128, CPU Feature: AVX SelectFromPairGrouped returns, for each of the two 128-bit halves of
the vectors x and y, the selection of two elements from the two
vectors x and y, where selector values in the range 0-1 specify
elements from x and values in the range 2-3 specify elements 0-1
of y. When the selectors are constants the selection can be
implemented in a single instruction.
If the selectors are not constant this will translate to a function
call.
Asm: VSHUFPD, CPU Feature: AVX SetHi returns x with its upper half set to y.
Asm: VINSERTF128, CPU Feature: AVX SetLo returns x with its lower half set to y.
Asm: VINSERTF128, CPU Feature: AVX Sqrt computes the square root of each element.
Asm: VSQRTPD, CPU Feature: AVX Store stores a Float64x4 to an array StoreMasked stores a Float64x4 to an array,
at those elements enabled by mask
Asm: VMASKMOVQ, CPU Feature: AVX2 StoreSlice stores x into a slice of at least 4 float64s StoreSlicePart stores the 4 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 4 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x Sub subtracts corresponding elements of two vectors.
Asm: VSUBPD, CPU Feature: AVX SubPairs horizontally subtracts adjacent pairs of elements.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].
Asm: VHSUBPD, CPU Feature: AVX Trunc truncates elements towards zero.
Asm: VROUNDPD, CPU Feature: AVX TruncScaled truncates elements with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPD, CPU Feature: AVX512 TruncScaledResidue computes the difference after truncating with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPD, CPU Feature: AVX512
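Select128FromPair's lane selection can be sketched in plain Go, reproducing the documented example. select128FromPair4 is a hypothetical helper modeling the documented behavior, not the VPERM2F128 intrinsic:

```go
package main

import "fmt"

// select128FromPair4 models Float64x4.Select128FromPair as documented:
// x and y are viewed as four 128-bit lanes (two float64 each); lo picks
// the lane for the lower half of the result and hi the lane for the
// upper half. A plain-Go sketch, not the intrinsic.
func select128FromPair4(x, y [4]float64, lo, hi uint8) [4]float64 {
	lanes := [4][2]float64{
		{x[0], x[1]}, {x[2], x[3]}, // lanes 0-1 come from x
		{y[0], y[1]}, {y[2], y[3]}, // lanes 2-3 come from y
	}
	return [4]float64{
		lanes[lo][0], lanes[lo][1],
		lanes[hi][0], lanes[hi][1],
	}
}

func main() {
	// The documented example: lo=3 selects {70, 71}, hi=0 selects {40, 41}.
	fmt.Println(select128FromPair4(
		[4]float64{40, 41, 50, 51},
		[4]float64{60, 61, 70, 71}, 3, 0))
	// prints [70 71 40 41]
}
```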
Float64x4 : expvar.Var
Float64x4 : fmt.Stringer
func BroadcastFloat64x4(x float64) Float64x4
func LoadFloat64x4(y *[4]float64) Float64x4
func LoadFloat64x4Slice(s []float64) Float64x4
func LoadFloat64x4SlicePart(s []float64) Float64x4
func LoadMaskedFloat64x4(y *[4]float64, mask Mask64x4) Float64x4
func Float32x4.ConvertToFloat64() Float64x4
func Float32x8.AsFloat64x4() (to Float64x4)
func Float64x2.Broadcast256() Float64x4
func Float64x4.Add(y Float64x4) Float64x4
func Float64x4.AddPairs(y Float64x4) Float64x4
func Float64x4.AddSub(y Float64x4) Float64x4
func Float64x4.Ceil() Float64x4
func Float64x4.CeilScaled(prec uint8) Float64x4
func Float64x4.CeilScaledResidue(prec uint8) Float64x4
func Float64x4.Compress(mask Mask64x4) Float64x4
func Float64x4.ConcatPermute(y Float64x4, indices Uint64x4) Float64x4
func Float64x4.Div(y Float64x4) Float64x4
func Float64x4.Expand(mask Mask64x4) Float64x4
func Float64x4.Floor() Float64x4
func Float64x4.FloorScaled(prec uint8) Float64x4
func Float64x4.FloorScaledResidue(prec uint8) Float64x4
func Float64x4.Masked(mask Mask64x4) Float64x4
func Float64x4.Max(y Float64x4) Float64x4
func Float64x4.Merge(y Float64x4, mask Mask64x4) Float64x4
func Float64x4.Min(y Float64x4) Float64x4
func Float64x4.Mul(y Float64x4) Float64x4
func Float64x4.MulAdd(y Float64x4, z Float64x4) Float64x4
func Float64x4.MulAddSub(y Float64x4, z Float64x4) Float64x4
func Float64x4.MulSubAdd(y Float64x4, z Float64x4) Float64x4
func Float64x4.Permute(indices Uint64x4) Float64x4
func Float64x4.Reciprocal() Float64x4
func Float64x4.ReciprocalSqrt() Float64x4
func Float64x4.RoundToEven() Float64x4
func Float64x4.RoundToEvenScaled(prec uint8) Float64x4
func Float64x4.RoundToEvenScaledResidue(prec uint8) Float64x4
func Float64x4.Scale(y Float64x4) Float64x4
func Float64x4.Select128FromPair(lo, hi uint8, y Float64x4) Float64x4
func Float64x4.SelectFromPairGrouped(a, b uint8, y Float64x4) Float64x4
func Float64x4.SetHi(y Float64x2) Float64x4
func Float64x4.SetLo(y Float64x2) Float64x4
func Float64x4.Sqrt() Float64x4
func Float64x4.Sub(y Float64x4) Float64x4
func Float64x4.SubPairs(y Float64x4) Float64x4
func Float64x4.Trunc() Float64x4
func Float64x4.TruncScaled(prec uint8) Float64x4
func Float64x4.TruncScaledResidue(prec uint8) Float64x4
func Float64x8.GetHi() Float64x4
func Float64x8.GetLo() Float64x4
func Int16x16.AsFloat64x4() (to Float64x4)
func Int32x4.ConvertToFloat64() Float64x4
func Int32x8.AsFloat64x4() (to Float64x4)
func Int64x4.AsFloat64x4() (to Float64x4)
func Int64x4.ConvertToFloat64() Float64x4
func Int8x32.AsFloat64x4() (to Float64x4)
func Uint16x16.AsFloat64x4() (to Float64x4)
func Uint32x4.ConvertToFloat64() Float64x4
func Uint32x8.AsFloat64x4() (to Float64x4)
func Uint64x4.AsFloat64x4() (to Float64x4)
func Uint64x4.ConvertToFloat64() Float64x4
func Uint8x32.AsFloat64x4() (to Float64x4)
func Float64x4.Add(y Float64x4) Float64x4
func Float64x4.AddPairs(y Float64x4) Float64x4
func Float64x4.AddSub(y Float64x4) Float64x4
func Float64x4.ConcatPermute(y Float64x4, indices Uint64x4) Float64x4
func Float64x4.Div(y Float64x4) Float64x4
func Float64x4.Equal(y Float64x4) Mask64x4
func Float64x4.Greater(y Float64x4) Mask64x4
func Float64x4.GreaterEqual(y Float64x4) Mask64x4
func Float64x4.IsNan(y Float64x4) Mask64x4
func Float64x4.Less(y Float64x4) Mask64x4
func Float64x4.LessEqual(y Float64x4) Mask64x4
func Float64x4.Max(y Float64x4) Float64x4
func Float64x4.Merge(y Float64x4, mask Mask64x4) Float64x4
func Float64x4.Min(y Float64x4) Float64x4
func Float64x4.Mul(y Float64x4) Float64x4
func Float64x4.MulAdd(y Float64x4, z Float64x4) Float64x4
func Float64x4.MulAddSub(y Float64x4, z Float64x4) Float64x4
func Float64x4.MulSubAdd(y Float64x4, z Float64x4) Float64x4
func Float64x4.NotEqual(y Float64x4) Mask64x4
func Float64x4.Scale(y Float64x4) Float64x4
func Float64x4.Select128FromPair(lo, hi uint8, y Float64x4) Float64x4
func Float64x4.SelectFromPairGrouped(a, b uint8, y Float64x4) Float64x4
func Float64x4.Sub(y Float64x4) Float64x4
func Float64x4.SubPairs(y Float64x4) Float64x4
func Float64x8.SetHi(y Float64x4) Float64x8
func Float64x8.SetLo(y Float64x4) Float64x8
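The fused multiply methods listed above (MulAdd, MulAddSub, MulSubAdd) combine a multiply with an add or subtract per element, with the sign alternating by element index for the AddSub/SubAdd forms. A rough scalar sketch of the documented semantics (plain Go, hypothetical helper names, ignoring the single-rounding behavior of a true hardware fused multiply-add):

```go
package main

import "fmt"

// mulAdd models Float64x4.MulAdd: (x*y) + z per element.
func mulAdd(x, y, z [4]float64) (r [4]float64) {
	for i := range r {
		r[i] = x[i]*y[i] + z[i]
	}
	return
}

// mulAddSub follows the description on this page: (x*y) - z for
// odd-indexed elements, (x*y) + z for even-indexed elements.
func mulAddSub(x, y, z [4]float64) (r [4]float64) {
	for i := range r {
		if i%2 == 1 {
			r[i] = x[i]*y[i] - z[i]
		} else {
			r[i] = x[i]*y[i] + z[i]
		}
	}
	return
}

func main() {
	x := [4]float64{1, 2, 3, 4}
	y := [4]float64{10, 10, 10, 10}
	z := [4]float64{1, 1, 1, 1}
	fmt.Println(mulAdd(x, y, z))    // [11 21 31 41]
	fmt.Println(mulAddSub(x, y, z)) // [11 19 31 39]
}
```

Unlike this sketch, the intrinsic performs only one rounding step per element.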
Float64x8 is a 512-bit SIMD vector of 8 float64 Add adds corresponding elements of two vectors.
Asm: VADDPD, CPU Feature: AVX512 Float32x16 converts from Float64x8 to Float32x16 Int16x32 converts from Float64x8 to Int16x32 Int32x16 converts from Float64x8 to Int32x16 Int64x8 converts from Float64x8 to Int64x8 Int8x64 converts from Float64x8 to Int8x64 Uint16x32 converts from Float64x8 to Uint16x32 Uint32x16 converts from Float64x8 to Uint32x16 Uint64x8 converts from Float64x8 to Uint64x8 Uint8x64 converts from Float64x8 to Uint8x64 CeilScaled rounds elements up with specified precision.
prec yields better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPD, CPU Feature: AVX512 CeilScaledResidue computes the difference after ceiling with specified precision.
prec yields better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPD, CPU Feature: AVX512 Compress performs a compression on vector x using mask by
selecting elements as indicated by mask and packing them into lower-indexed elements.
Asm: VCOMPRESSPD, CPU Feature: AVX512 ConcatPermute performs a full permutation of vector x, y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the bits needed to represent an index into xy are used from each element of indices.
Asm: VPERMI2PD, CPU Feature: AVX512 ConvertToFloat32 converts element values to float32.
The result vector's elements are rounded to the nearest value.
Asm: VCVTPD2PS, CPU Feature: AVX512 ConvertToInt32 converts element values to int32.
When a conversion is inexact, a truncated (round toward zero) value is returned.
If a converted result cannot be represented in int32, an implementation-defined
architecture-specific value is returned.
Asm: VCVTTPD2DQ, CPU Feature: AVX512 ConvertToInt64 converts element values to int64.
When a conversion is inexact, a truncated (round toward zero) value is returned.
If a converted result cannot be represented in int64, an implementation-defined
architecture-specific value is returned.
Asm: VCVTTPD2QQ, CPU Feature: AVX512 ConvertToUint32 converts element values to uint32.
When a conversion is inexact, a truncated (round toward zero) value is returned.
If a converted result cannot be represented in uint32, an implementation-defined
architecture-specific value is returned.
Asm: VCVTTPD2UDQ, CPU Feature: AVX512 ConvertToUint64 converts element values to uint64.
When a conversion is inexact, a truncated (round toward zero) value is returned.
If a converted result cannot be represented in uint64, an implementation-defined
architecture-specific value is returned.
Asm: VCVTTPD2UQQ, CPU Feature: AVX512 Div divides elements of two vectors.
Asm: VDIVPD, CPU Feature: AVX512 Equal returns x equals y, elementwise.
Asm: VCMPPD, CPU Feature: AVX512 Expand performs an expansion on a vector x whose elements are packed to lower parts.
The expansion distributes elements to the positions indicated by mask, in order from lower mask elements to upper.
Asm: VEXPANDPD, CPU Feature: AVX512 FloorScaled rounds elements down with specified precision.
prec yields better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPD, CPU Feature: AVX512 FloorScaledResidue computes the difference after flooring with specified precision.
prec yields better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPD, CPU Feature: AVX512 GetHi returns the upper half of x.
Asm: VEXTRACTF64X4, CPU Feature: AVX512 GetLo returns the lower half of x.
Asm: VEXTRACTF64X4, CPU Feature: AVX512 Greater returns x greater-than y, elementwise.
Asm: VCMPPD, CPU Feature: AVX512 GreaterEqual returns x greater-than-or-equals y, elementwise.
Asm: VCMPPD, CPU Feature: AVX512 IsNan checks if elements are NaN. Use as x.IsNan(x).
Asm: VCMPPD, CPU Feature: AVX512 Len returns the number of elements in a Float64x8 Less returns x less-than y, elementwise.
Asm: VCMPPD, CPU Feature: AVX512 LessEqual returns x less-than-or-equals y, elementwise.
Asm: VCMPPD, CPU Feature: AVX512 Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VMAXPD, CPU Feature: AVX512 Merge returns x but with elements set to y where m is false. Min computes the minimum of corresponding elements.
Asm: VMINPD, CPU Feature: AVX512 Mul multiplies corresponding elements of two vectors.
Asm: VMULPD, CPU Feature: AVX512 MulAdd performs a fused (x * y) + z.
Asm: VFMADD213PD, CPU Feature: AVX512 MulAddSub performs a fused (x * y) - z for odd-indexed elements, and (x * y) + z for even-indexed elements.
Asm: VFMADDSUB213PD, CPU Feature: AVX512 MulSubAdd performs a fused (x * y) + z for odd-indexed elements, and (x * y) - z for even-indexed elements.
Asm: VFMSUBADD213PD, CPU Feature: AVX512 NotEqual returns x not-equals y, elementwise.
Asm: VCMPPD, CPU Feature: AVX512 Permute performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 3 bits (values 0-7) of each element of indices are used.
Asm: VPERMPD, CPU Feature: AVX512 Reciprocal computes an approximate reciprocal of each element.
Asm: VRCP14PD, CPU Feature: AVX512 ReciprocalSqrt computes an approximate reciprocal of the square root of each element.
Asm: VRSQRT14PD, CPU Feature: AVX512 RoundToEvenScaled rounds elements with specified precision.
prec yields better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPD, CPU Feature: AVX512 RoundToEvenScaledResidue computes the difference after rounding with specified precision.
prec yields better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPD, CPU Feature: AVX512 Scale multiplies elements by a power of 2.
Asm: VSCALEFPD, CPU Feature: AVX512 SelectFromPairGrouped returns, for each of the four 128-bit subvectors
of the vectors x and y, the selection of two elements from the two
vectors x and y, where selector values in the range 0-1 specify
elements from x, and values in the range 2-3 specify elements 0-1
of y. When the selectors are constants, the selection can be
implemented in a single instruction.
If the selectors are not constant this will translate to a function
call.
Asm: VSHUFPD, CPU Feature: AVX512 SetHi returns x with its upper half set to y.
Asm: VINSERTF64X4, CPU Feature: AVX512 SetLo returns x with its lower half set to y.
Asm: VINSERTF64X4, CPU Feature: AVX512 Sqrt computes the square root of each element.
Asm: VSQRTPD, CPU Feature: AVX512 Store stores a Float64x8 to an array StoreMasked stores a Float64x8 to an array,
at those elements enabled by mask
Asm: VMOVDQU64, CPU Feature: AVX512 StoreSlice stores x into a slice of at least 8 float64s StoreSlicePart stores the 8 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 8 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x Sub subtracts corresponding elements of two vectors.
Asm: VSUBPD, CPU Feature: AVX512 TruncScaled truncates elements with specified precision.
prec yields better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPD, CPU Feature: AVX512 TruncScaledResidue computes the difference after truncating with specified precision.
prec yields better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPD, CPU Feature: AVX512
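Compress and Expand, described above, are inverse redistributions driven by a mask. They can be modeled in plain Go, using a bool array in place of a hardware mask (hypothetical scalar helpers, not part of the archsimd API; the tail handling here assumes the zeroing form of the instruction):

```go
package main

import "fmt"

// compress packs the elements of x selected by mask into the
// lower-indexed elements of the result; the remainder is zeroed.
func compress(x [8]float64, mask [8]bool) (r [8]float64) {
	j := 0
	for i, m := range mask {
		if m {
			r[j] = x[i]
			j++
		}
	}
	return
}

// expand distributes the packed lower elements of x to the positions
// where mask is true, in order from lower mask elements to upper.
func expand(x [8]float64, mask [8]bool) (r [8]float64) {
	j := 0
	for i, m := range mask {
		if m {
			r[i] = x[j]
			j++
		}
	}
	return
}

func main() {
	x := [8]float64{1, 2, 3, 4, 5, 6, 7, 8}
	mask := [8]bool{false, true, false, true, false, false, true, false}
	c := compress(x, mask)
	fmt.Println(c)               // [2 4 7 0 0 0 0 0]
	fmt.Println(expand(c, mask)) // [0 2 0 4 0 0 7 0]
}
```

Note that expand(compress(x, m), m) recovers exactly the masked-in elements of x.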
Float64x8 : expvar.Var
Float64x8 : fmt.Stringer
func BroadcastFloat64x8(x float64) Float64x8
func LoadFloat64x8(y *[8]float64) Float64x8
func LoadFloat64x8Slice(s []float64) Float64x8
func LoadFloat64x8SlicePart(s []float64) Float64x8
func LoadMaskedFloat64x8(y *[8]float64, mask Mask64x8) Float64x8
func Float32x16.AsFloat64x8() (to Float64x8)
func Float32x8.ConvertToFloat64() Float64x8
func Float64x2.Broadcast512() Float64x8
func Float64x8.Add(y Float64x8) Float64x8
func Float64x8.CeilScaled(prec uint8) Float64x8
func Float64x8.CeilScaledResidue(prec uint8) Float64x8
func Float64x8.Compress(mask Mask64x8) Float64x8
func Float64x8.ConcatPermute(y Float64x8, indices Uint64x8) Float64x8
func Float64x8.Div(y Float64x8) Float64x8
func Float64x8.Expand(mask Mask64x8) Float64x8
func Float64x8.FloorScaled(prec uint8) Float64x8
func Float64x8.FloorScaledResidue(prec uint8) Float64x8
func Float64x8.Masked(mask Mask64x8) Float64x8
func Float64x8.Max(y Float64x8) Float64x8
func Float64x8.Merge(y Float64x8, mask Mask64x8) Float64x8
func Float64x8.Min(y Float64x8) Float64x8
func Float64x8.Mul(y Float64x8) Float64x8
func Float64x8.MulAdd(y Float64x8, z Float64x8) Float64x8
func Float64x8.MulAddSub(y Float64x8, z Float64x8) Float64x8
func Float64x8.MulSubAdd(y Float64x8, z Float64x8) Float64x8
func Float64x8.Permute(indices Uint64x8) Float64x8
func Float64x8.Reciprocal() Float64x8
func Float64x8.ReciprocalSqrt() Float64x8
func Float64x8.RoundToEvenScaled(prec uint8) Float64x8
func Float64x8.RoundToEvenScaledResidue(prec uint8) Float64x8
func Float64x8.Scale(y Float64x8) Float64x8
func Float64x8.SelectFromPairGrouped(a, b uint8, y Float64x8) Float64x8
func Float64x8.SetHi(y Float64x4) Float64x8
func Float64x8.SetLo(y Float64x4) Float64x8
func Float64x8.Sqrt() Float64x8
func Float64x8.Sub(y Float64x8) Float64x8
func Float64x8.TruncScaled(prec uint8) Float64x8
func Float64x8.TruncScaledResidue(prec uint8) Float64x8
func Int16x32.AsFloat64x8() (to Float64x8)
func Int32x16.AsFloat64x8() (to Float64x8)
func Int32x8.ConvertToFloat64() Float64x8
func Int64x8.AsFloat64x8() (to Float64x8)
func Int64x8.ConvertToFloat64() Float64x8
func Int8x64.AsFloat64x8() (to Float64x8)
func Uint16x32.AsFloat64x8() (to Float64x8)
func Uint32x16.AsFloat64x8() (to Float64x8)
func Uint32x8.ConvertToFloat64() Float64x8
func Uint64x8.AsFloat64x8() (to Float64x8)
func Uint64x8.ConvertToFloat64() Float64x8
func Uint8x64.AsFloat64x8() (to Float64x8)
func Float64x8.Add(y Float64x8) Float64x8
func Float64x8.ConcatPermute(y Float64x8, indices Uint64x8) Float64x8
func Float64x8.Div(y Float64x8) Float64x8
func Float64x8.Equal(y Float64x8) Mask64x8
func Float64x8.Greater(y Float64x8) Mask64x8
func Float64x8.GreaterEqual(y Float64x8) Mask64x8
func Float64x8.IsNan(y Float64x8) Mask64x8
func Float64x8.Less(y Float64x8) Mask64x8
func Float64x8.LessEqual(y Float64x8) Mask64x8
func Float64x8.Max(y Float64x8) Float64x8
func Float64x8.Merge(y Float64x8, mask Mask64x8) Float64x8
func Float64x8.Min(y Float64x8) Float64x8
func Float64x8.Mul(y Float64x8) Float64x8
func Float64x8.MulAdd(y Float64x8, z Float64x8) Float64x8
func Float64x8.MulAddSub(y Float64x8, z Float64x8) Float64x8
func Float64x8.MulSubAdd(y Float64x8, z Float64x8) Float64x8
func Float64x8.NotEqual(y Float64x8) Mask64x8
func Float64x8.Scale(y Float64x8) Float64x8
func Float64x8.SelectFromPairGrouped(a, b uint8, y Float64x8) Float64x8
func Float64x8.Sub(y Float64x8) Float64x8
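ConcatPermute, as documented above, selects each result element from the concatenation of two vectors. A scalar sketch of that indexing (plain Go, hypothetical helper name):

```go
package main

import "fmt"

// concatPermute models Float64x8.ConcatPermute: xy is the concatenation
// of x (lower half) and y (upper half), and result element i is
// xy[indices[i]]. With 16 combined elements, only the low 4 bits of
// each index are needed, so the rest are masked off here.
func concatPermute(x, y [8]float64, indices [8]uint64) (r [8]float64) {
	var xy [16]float64
	copy(xy[:8], x[:])
	copy(xy[8:], y[:])
	for i, idx := range indices {
		r[i] = xy[idx&15]
	}
	return
}

func main() {
	x := [8]float64{0, 1, 2, 3, 4, 5, 6, 7}
	y := [8]float64{10, 11, 12, 13, 14, 15, 16, 17}
	idx := [8]uint64{15, 0, 8, 7, 1, 9, 2, 10}
	fmt.Println(concatPermute(x, y, idx)) // [17 0 10 7 1 11 2 12]
}
```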
Int16x16 is a 256-bit SIMD vector of 16 int16 Abs computes the absolute value of each element.
Asm: VPABSW, CPU Feature: AVX2 Add adds corresponding elements of two vectors.
Asm: VPADDW, CPU Feature: AVX2 AddPairs horizontally adds adjacent pairs of elements.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].
Asm: VPHADDW, CPU Feature: AVX2 AddPairsSaturated horizontally adds adjacent pairs of elements with saturation.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].
Asm: VPHADDSW, CPU Feature: AVX2 AddSaturated adds corresponding elements of two vectors with saturation.
Asm: VPADDSW, CPU Feature: AVX2 And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX2 AndNot performs a bitwise x &^ y.
Asm: VPANDN, CPU Feature: AVX2 Float32x8 converts from Int16x16 to Float32x8 Float64x4 converts from Int16x16 to Float64x4 Int32x8 converts from Int16x16 to Int32x8 Int64x4 converts from Int16x16 to Int64x4 Int8x32 converts from Int16x16 to Int8x32 Uint16x16 converts from Int16x16 to Uint16x16 Uint32x8 converts from Int16x16 to Uint32x8 Uint64x4 converts from Int16x16 to Uint64x4 Uint8x32 converts from Int16x16 to Uint8x32 Compress performs a compression on vector x using mask by
selecting elements as indicated by mask and packing them into lower-indexed elements.
Asm: VPCOMPRESSW, CPU Feature: AVX512VBMI2 ConcatPermute performs a full permutation of vector x, y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the bits needed to represent an index into xy are used from each element of indices.
Asm: VPERMI2W, CPU Feature: AVX512 CopySign returns the product of the first operand with -1, 0, or 1,
whichever constant is nearest to the value of the second operand.
Asm: VPSIGNW, CPU Feature: AVX2 DotProductPairs multiplies the elements and adds the pairs together,
yielding a vector of half as many elements with twice the input element size.
Asm: VPMADDWD, CPU Feature: AVX2 Equal returns x equals y, elementwise.
Asm: VPCMPEQW, CPU Feature: AVX2 Expand performs an expansion on a vector x whose elements are packed to lower parts.
The expansion distributes elements to the positions indicated by mask, in order from lower mask elements to upper.
Asm: VPEXPANDW, CPU Feature: AVX512VBMI2 ExtendToInt32 converts element values to int32.
The result vector's elements are sign-extended.
Asm: VPMOVSXWD, CPU Feature: AVX512 GetHi returns the upper half of x.
Asm: VEXTRACTI128, CPU Feature: AVX2 GetLo returns the lower half of x.
Asm: VEXTRACTI128, CPU Feature: AVX2 Greater returns x greater-than y, elementwise.
Asm: VPCMPGTW, CPU Feature: AVX2 GreaterEqual returns a mask whose elements indicate whether x >= y
Emulated, CPU Feature: AVX2 InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.
Asm: VPUNPCKHWD, CPU Feature: AVX2 InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.
Asm: VPUNPCKLWD, CPU Feature: AVX2 IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x.
x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y
Asm: VPTEST, CPU Feature: AVX Len returns the number of elements in an Int16x16 Less returns a mask whose elements indicate whether x < y
Emulated, CPU Feature: AVX2 LessEqual returns a mask whose elements indicate whether x <= y
Emulated, CPU Feature: AVX2 Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VPMAXSW, CPU Feature: AVX2 Merge returns x but with elements set to y where mask is false. Min computes the minimum of corresponding elements.
Asm: VPMINSW, CPU Feature: AVX2 Mul multiplies corresponding elements of two vectors.
Asm: VPMULLW, CPU Feature: AVX2 MulHigh multiplies elements and stores the high part of the result.
Asm: VPMULHW, CPU Feature: AVX2 Not returns the bitwise complement of x
Emulated, CPU Feature: AVX2 NotEqual returns a mask whose elements indicate whether x != y
Emulated, CPU Feature: AVX2 OnesCount counts the number of set bits in each element.
Asm: VPOPCNTW, CPU Feature: AVX512BITALG Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX2 Permute performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 4 bits (values 0-15) of each element of indices are used.
Asm: VPERMW, CPU Feature: AVX512 PermuteScalarsHiGrouped performs a grouped permutation of vector x using the supplied indices:
result =
{x[0], x[1], x[2], x[3], x[a+4], x[b+4], x[c+4], x[d+4],
x[8], x[9], x[10], x[11], x[a+12], x[b+12], x[c+12], x[d+12]}
Parameters a,b,c,d should have values between 0 and 3.
If a through d are constants, then an instruction will be inlined, otherwise
a jump table is generated.
Asm: VPSHUFHW, CPU Feature: AVX2 PermuteScalarsLoGrouped performs a grouped permutation of vector x using the supplied indices:
result =
{x[a], x[b], x[c], x[d], x[4], x[5], x[6], x[7],
x[a+8], x[b+8], x[c+8], x[d+8], x[12], x[13], x[14], x[15]}
Parameters a,b,c,d should have values between 0 and 3.
If a through d are constants, then an instruction will be inlined, otherwise
a jump table is generated.
Asm: VPSHUFLW, CPU Feature: AVX2 SaturateToInt8 converts element values to int8.
Conversion is done with saturation on the vector elements.
Results are packed into the low elements of the returned vector; its upper elements are zero-cleared.
Asm: VPMOVSWB, CPU Feature: AVX512 SaturateToUint8 converts element values to uint8.
Conversion is done with saturation on the vector elements.
Results are packed into the low elements of the returned vector; its upper elements are zero-cleared.
Asm: VPMOVSWB, CPU Feature: AVX512 Select128FromPair treats the 256-bit vectors x and y as a single vector of four
128-bit elements, and returns a 256-bit result formed by
concatenating the two elements specified by lo and hi.
For example,
{40, 41, 42, 43, 44, 45, 46, 47, 50, 51, 52, 53, 54, 55, 56, 57}.Select128FromPair(3, 0,
{60, 61, 62, 63, 64, 65, 66, 67, 70, 71, 72, 73, 74, 75, 76, 77})
returns {70, 71, 72, 73, 74, 75, 76, 77, 40, 41, 42, 43, 44, 45, 46, 47}.
lo and hi yield better performance when they are constants; non-constant values will be translated into a jump table.
lo, hi should be between 0 and 3, inclusive; other values may result in a runtime panic.
Asm: VPERM2I128, CPU Feature: AVX2 SetHi returns x with its upper half set to y.
Asm: VINSERTI128, CPU Feature: AVX2 SetLo returns x with its lower half set to y.
Asm: VINSERTI128, CPU Feature: AVX2 ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLW, CPU Feature: AVX2 ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift yields better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDW, CPU Feature: AVX512VBMI2 ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are filled with the sign bit.
Asm: VPSRAW, CPU Feature: AVX2 ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift yields better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDW, CPU Feature: AVX512VBMI2 ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVW, CPU Feature: AVX512 ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVW, CPU Feature: AVX512VBMI2 ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are filled with the sign bit.
Asm: VPSRAVW, CPU Feature: AVX512 ShiftRightConcat shifts each element of x to the right by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVW, CPU Feature: AVX512VBMI2 Store stores an Int16x16 to an array StoreSlice stores x into a slice of at least 16 int16s StoreSlicePart stores the elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 16 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x Sub subtracts corresponding elements of two vectors.
Asm: VPSUBW, CPU Feature: AVX2 SubPairs horizontally subtracts adjacent pairs of elements.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].
Asm: VPHSUBW, CPU Feature: AVX2 SubPairsSaturated horizontally subtracts adjacent pairs of elements with saturation.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].
Asm: VPHSUBSW, CPU Feature: AVX2 SubSaturated subtracts corresponding elements of two vectors with saturation.
Asm: VPSUBSW, CPU Feature: AVX2 ToMask converts from Int16x16 to Mask16x16; a mask element is set to true when the corresponding vector element is non-zero. TruncateToInt8 converts element values to int8.
Conversion is done with truncation on the vector elements.
Results are packed into the low elements of the returned vector; its upper elements are zero-cleared.
Asm: VPMOVWB, CPU Feature: AVX512 Xor performs a bitwise XOR operation between two vectors.
Asm: VPXOR, CPU Feature: AVX2
Int16x16 : expvar.Var
Int16x16 : fmt.Stringer
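The saturated arithmetic listed for Int16x16 below (AddSaturated, SubSaturated) clamps on overflow rather than wrapping. A per-element scalar model (plain Go, hypothetical helper names):

```go
package main

import "fmt"

// clamp16 clamps a 32-bit intermediate result to the int16 range,
// which is what the saturating vector instructions do per element.
func clamp16(v int32) int16 {
	if v > 32767 {
		return 32767
	}
	if v < -32768 {
		return -32768
	}
	return int16(v)
}

// addSat16 and subSat16 model Int16x16.AddSaturated and SubSaturated
// for a single element pair.
func addSat16(a, b int16) int16 { return clamp16(int32(a) + int32(b)) }
func subSat16(a, b int16) int16 { return clamp16(int32(a) - int32(b)) }

func main() {
	fmt.Println(addSat16(30000, 10000))  // 32767 (clamped, not wrapped)
	fmt.Println(subSat16(-30000, 10000)) // -32768 (clamped)
	fmt.Println(addSat16(100, 23))       // 123
}
```

By contrast, the plain Add and Sub methods wrap around on overflow like ordinary Go int16 arithmetic.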
func BroadcastInt16x16(x int16) Int16x16
func LoadInt16x16(y *[16]int16) Int16x16
func LoadInt16x16Slice(s []int16) Int16x16
func LoadInt16x16SlicePart(s []int16) Int16x16
func Float32x8.AsInt16x16() (to Int16x16)
func Float64x4.AsInt16x16() (to Int16x16)
func Int16x16.Abs() Int16x16
func Int16x16.Add(y Int16x16) Int16x16
func Int16x16.AddPairs(y Int16x16) Int16x16
func Int16x16.AddPairsSaturated(y Int16x16) Int16x16
func Int16x16.AddSaturated(y Int16x16) Int16x16
func Int16x16.And(y Int16x16) Int16x16
func Int16x16.AndNot(y Int16x16) Int16x16
func Int16x16.Compress(mask Mask16x16) Int16x16
func Int16x16.ConcatPermute(y Int16x16, indices Uint16x16) Int16x16
func Int16x16.CopySign(y Int16x16) Int16x16
func Int16x16.Expand(mask Mask16x16) Int16x16
func Int16x16.InterleaveHiGrouped(y Int16x16) Int16x16
func Int16x16.InterleaveLoGrouped(y Int16x16) Int16x16
func Int16x16.Masked(mask Mask16x16) Int16x16
func Int16x16.Max(y Int16x16) Int16x16
func Int16x16.Merge(y Int16x16, mask Mask16x16) Int16x16
func Int16x16.Min(y Int16x16) Int16x16
func Int16x16.Mul(y Int16x16) Int16x16
func Int16x16.MulHigh(y Int16x16) Int16x16
func Int16x16.Not() Int16x16
func Int16x16.OnesCount() Int16x16
func Int16x16.Or(y Int16x16) Int16x16
func Int16x16.Permute(indices Uint16x16) Int16x16
func Int16x16.PermuteScalarsHiGrouped(a, b, c, d uint8) Int16x16
func Int16x16.PermuteScalarsLoGrouped(a, b, c, d uint8) Int16x16
func Int16x16.Select128FromPair(lo, hi uint8, y Int16x16) Int16x16
func Int16x16.SetHi(y Int16x8) Int16x16
func Int16x16.SetLo(y Int16x8) Int16x16
func Int16x16.ShiftAllLeft(y uint64) Int16x16
func Int16x16.ShiftAllLeftConcat(shift uint8, y Int16x16) Int16x16
func Int16x16.ShiftAllRight(y uint64) Int16x16
func Int16x16.ShiftAllRightConcat(shift uint8, y Int16x16) Int16x16
func Int16x16.ShiftLeft(y Int16x16) Int16x16
func Int16x16.ShiftLeftConcat(y Int16x16, z Int16x16) Int16x16
func Int16x16.ShiftRight(y Int16x16) Int16x16
func Int16x16.ShiftRightConcat(y Int16x16, z Int16x16) Int16x16
func Int16x16.Sub(y Int16x16) Int16x16
func Int16x16.SubPairs(y Int16x16) Int16x16
func Int16x16.SubPairsSaturated(y Int16x16) Int16x16
func Int16x16.SubSaturated(y Int16x16) Int16x16
func Int16x16.Xor(y Int16x16) Int16x16
func Int16x32.GetHi() Int16x16
func Int16x32.GetLo() Int16x16
func Int16x8.Broadcast256() Int16x16
func Int32x16.SaturateToInt16() Int16x16
func Int32x16.TruncateToInt16() Int16x16
func Int32x8.AsInt16x16() (to Int16x16)
func Int32x8.SaturateToInt16Concat(y Int32x8) Int16x16
func Int64x4.AsInt16x16() (to Int16x16)
func Int8x16.ExtendToInt16() Int16x16
func Int8x32.AsInt16x16() (to Int16x16)
func Mask16x16.ToInt16x16() (to Int16x16)
func Uint16x16.AsInt16x16() (to Int16x16)
func Uint32x8.AsInt16x16() (to Int16x16)
func Uint64x4.AsInt16x16() (to Int16x16)
func Uint8x32.AsInt16x16() (to Int16x16)
func Uint8x32.DotProductPairsSaturated(y Int8x32) Int16x16
func Int16x16.Add(y Int16x16) Int16x16
func Int16x16.AddPairs(y Int16x16) Int16x16
func Int16x16.AddPairsSaturated(y Int16x16) Int16x16
func Int16x16.AddSaturated(y Int16x16) Int16x16
func Int16x16.And(y Int16x16) Int16x16
func Int16x16.AndNot(y Int16x16) Int16x16
func Int16x16.ConcatPermute(y Int16x16, indices Uint16x16) Int16x16
func Int16x16.CopySign(y Int16x16) Int16x16
func Int16x16.DotProductPairs(y Int16x16) Int32x8
func Int16x16.Equal(y Int16x16) Mask16x16
func Int16x16.Greater(y Int16x16) Mask16x16
func Int16x16.GreaterEqual(y Int16x16) Mask16x16
func Int16x16.InterleaveHiGrouped(y Int16x16) Int16x16
func Int16x16.InterleaveLoGrouped(y Int16x16) Int16x16
func Int16x16.Less(y Int16x16) Mask16x16
func Int16x16.LessEqual(y Int16x16) Mask16x16
func Int16x16.Max(y Int16x16) Int16x16
func Int16x16.Merge(y Int16x16, mask Mask16x16) Int16x16
func Int16x16.Min(y Int16x16) Int16x16
func Int16x16.Mul(y Int16x16) Int16x16
func Int16x16.MulHigh(y Int16x16) Int16x16
func Int16x16.NotEqual(y Int16x16) Mask16x16
func Int16x16.Or(y Int16x16) Int16x16
func Int16x16.Select128FromPair(lo, hi uint8, y Int16x16) Int16x16
func Int16x16.ShiftAllLeftConcat(shift uint8, y Int16x16) Int16x16
func Int16x16.ShiftAllRightConcat(shift uint8, y Int16x16) Int16x16
func Int16x16.ShiftLeft(y Int16x16) Int16x16
func Int16x16.ShiftLeftConcat(y Int16x16, z Int16x16) Int16x16
func Int16x16.ShiftRight(y Int16x16) Int16x16
func Int16x16.ShiftRightConcat(y Int16x16, z Int16x16) Int16x16
func Int16x16.Sub(y Int16x16) Int16x16
func Int16x16.SubPairs(y Int16x16) Int16x16
func Int16x16.SubPairsSaturated(y Int16x16) Int16x16
func Int16x16.SubSaturated(y Int16x16) Int16x16
func Int16x16.Xor(y Int16x16) Int16x16
func Int16x32.SetHi(y Int16x16) Int16x32
func Int16x32.SetLo(y Int16x16) Int16x32
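The ShiftLeftConcat family listed above behaves like a funnel shift: the bits vacated by shifting an element of x are filled from the corresponding element of the partner vector. A per-element scalar model for shift counts 0-15 (plain Go on raw uint16 bit patterns, hypothetical helper name; behavior for larger counts is not modeled here):

```go
package main

import "fmt"

// shiftLeftConcat models one element of ShiftLeftConcat for counts
// 0-15: x is shifted left and its emptied low bits are filled with the
// high bits of z, i.e. the high half of the 32-bit value (x:z) << s.
// Go defines over-wide shifts as 0, so the s == 0 case falls out
// naturally (z >> 16 is 0 and x is returned unchanged).
func shiftLeftConcat(x, z uint16, shift uint) uint16 {
	s := shift & 15
	return x<<s | z>>(16-s)
}

func main() {
	fmt.Printf("%04x\n", shiftLeftConcat(0x1234, 0xabcd, 4)) // 234a
}
```

ShiftRightConcat is the mirror image: the emptied high bits are filled from the low bits of the partner element.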
Int16x32 is a 512-bit SIMD vector of 32 int16 Abs computes the absolute value of each element.
Asm: VPABSW, CPU Feature: AVX512 Add adds corresponding elements of two vectors.
Asm: VPADDW, CPU Feature: AVX512 AddSaturated adds corresponding elements of two vectors with saturation.
Asm: VPADDSW, CPU Feature: AVX512 And performs a bitwise AND operation between two vectors.
Asm: VPANDD, CPU Feature: AVX512 AndNot performs a bitwise x &^ y.
Asm: VPANDND, CPU Feature: AVX512 Float32x16 converts from Int16x32 to Float32x16 Float64x8 converts from Int16x32 to Float64x8 Int32x16 converts from Int16x32 to Int32x16 Int64x8 converts from Int16x32 to Int64x8 Int8x64 converts from Int16x32 to Int8x64 Uint16x32 converts from Int16x32 to Uint16x32 Uint32x16 converts from Int16x32 to Uint32x16 Uint64x8 converts from Int16x32 to Uint64x8 Uint8x64 converts from Int16x32 to Uint8x64 Compress performs a compression on vector x using mask by
selecting elements as indicated by mask and packing them into lower-indexed elements.
Asm: VPCOMPRESSW, CPU Feature: AVX512VBMI2 ConcatPermute performs a full permutation of vector x, y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the bits needed to represent an index into xy are used from each element of indices.
Asm: VPERMI2W, CPU Feature: AVX512 DotProductPairs multiplies the elements and adds the pairs together,
yielding a vector of half as many elements with twice the input element size.
Asm: VPMADDWD, CPU Feature: AVX512 Equal returns x equals y, elementwise.
Asm: VPCMPEQW, CPU Feature: AVX512 Expand performs an expansion on a vector x whose elements are packed to lower parts.
The expansion distributes elements to the positions indicated by mask, in order from lower mask elements to upper.
Asm: VPEXPANDW, CPU Feature: AVX512VBMI2 GetHi returns the upper half of x.
Asm: VEXTRACTI64X4, CPU Feature: AVX512 GetLo returns the lower half of x.
Asm: VEXTRACTI64X4, CPU Feature: AVX512 Greater returns x greater-than y, elementwise.
Asm: VPCMPGTW, CPU Feature: AVX512 GreaterEqual returns x greater-than-or-equals y, elementwise.
Asm: VPCMPW, CPU Feature: AVX512 InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.
Asm: VPUNPCKHWD, CPU Feature: AVX512 InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.
Asm: VPUNPCKLWD, CPU Feature: AVX512 Len returns the number of elements in an Int16x32 Less returns x less-than y, elementwise.
Asm: VPCMPW, CPU Feature: AVX512 LessEqual returns x less-than-or-equals y, elementwise.
Asm: VPCMPW, CPU Feature: AVX512 Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VPMAXSW, CPU Feature: AVX512 Merge returns x but with elements set to y where m is false. Min computes the minimum of corresponding elements.
Asm: VPMINSW, CPU Feature: AVX512 Mul multiplies corresponding elements of two vectors.
Asm: VPMULLW, CPU Feature: AVX512 MulHigh multiplies elements and stores the high part of the result.
Asm: VPMULHW, CPU Feature: AVX512 Not returns the bitwise complement of x
Emulated, CPU Feature: AVX512 NotEqual returns x not-equals y, elementwise.
Asm: VPCMPW, CPU Feature: AVX512 OnesCount counts the number of set bits in each element.
Asm: VPOPCNTW, CPU Feature: AVX512BITALG Or performs a bitwise OR operation between two vectors.
Asm: VPORD, CPU Feature: AVX512 Permute performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 5 bits (values 0-31) of each element of indices are used.
Asm: VPERMW, CPU Feature: AVX512 PermuteScalarsHiGrouped performs a grouped permutation of vector x using the supplied indices:
result =
{x[0], x[1], x[2], x[3], x[a+4], x[b+4], x[c+4], x[d+4],
x[8], x[9], x[10], x[11], x[a+12], x[b+12], x[c+12], x[d+12],
x[16], x[17], x[18], x[19], x[a+20], x[b+20], x[c+20], x[d+20],
x[24], x[25], x[26], x[27], x[a+28], x[b+28], x[c+28], x[d+28]}
Parameters a,b,c,d should have values between 0 and 3.
If a through d are constants, then an instruction will be inlined, otherwise
a jump table is generated.
Asm: VPSHUFHW, CPU Feature: AVX512 PermuteScalarsLoGrouped performs a grouped permutation of vector x using the supplied indices:
result =
{x[a], x[b], x[c], x[d], x[4], x[5], x[6], x[7],
x[a+8], x[b+8], x[c+8], x[d+8], x[12], x[13], x[14], x[15],
x[a+16], x[b+16], x[c+16], x[d+16], x[20], x[21], x[22], x[23],
x[a+24], x[b+24], x[c+24], x[d+24], x[28], x[29], x[30], x[31]}
Parameters a,b,c,d should have values between 0 and 3.
If a through d are constants, then an instruction will be inlined, otherwise
a jump table is generated.
Asm: VPSHUFLW, CPU Feature: AVX512 SaturateToInt8 converts element values to int8.
Conversion is done with saturation on the vector elements.
Asm: VPMOVSWB, CPU Feature: AVX512 SetHi returns x with its upper half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512 SetLo returns x with its lower half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512 ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLW, CPU Feature: AVX512 ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDW, CPU Feature: AVX512VBMI2 ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are filled with the sign bit.
Asm: VPSRAW, CPU Feature: AVX512 ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDW, CPU Feature: AVX512VBMI2 ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVW, CPU Feature: AVX512 ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVW, CPU Feature: AVX512VBMI2 ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are filled with the sign bit.
Asm: VPSRAVW, CPU Feature: AVX512 ShiftRightConcat shifts each element of x to the right by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVW, CPU Feature: AVX512VBMI2 Store stores an Int16x32 to an array. StoreMasked stores an Int16x32 to an array,
at those elements enabled by mask.
Asm: VMOVDQU16, CPU Feature: AVX512 StoreSlice stores x into a slice of at least 32 int16s. StoreSlicePart stores the 32 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 32 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x. Sub subtracts corresponding elements of two vectors.
Asm: VPSUBW, CPU Feature: AVX512 SubSaturated subtracts corresponding elements of two vectors with saturation.
Asm: VPSUBSW, CPU Feature: AVX512 ToMask converts from Int16x32 to Mask16x32; each mask element is set to true when the corresponding vector element is non-zero. TruncateToInt8 converts element values to int8.
Conversion is done with truncation on the vector elements.
Asm: VPMOVWB, CPU Feature: AVX512 Xor performs a bitwise XOR operation between two vectors.
Asm: VPXORD, CPU Feature: AVX512
Int16x32 : expvar.Var
Int16x32 : fmt.Stringer
func BroadcastInt16x32(x int16) Int16x32
func LoadInt16x32(y *[32]int16) Int16x32
func LoadInt16x32Slice(s []int16) Int16x32
func LoadInt16x32SlicePart(s []int16) Int16x32
func LoadMaskedInt16x32(y *[32]int16, mask Mask16x32) Int16x32
func Float32x16.AsInt16x32() (to Int16x32)
func Float64x8.AsInt16x32() (to Int16x32)
func Int16x32.Abs() Int16x32
func Int16x32.Add(y Int16x32) Int16x32
func Int16x32.AddSaturated(y Int16x32) Int16x32
func Int16x32.And(y Int16x32) Int16x32
func Int16x32.AndNot(y Int16x32) Int16x32
func Int16x32.Compress(mask Mask16x32) Int16x32
func Int16x32.ConcatPermute(y Int16x32, indices Uint16x32) Int16x32
func Int16x32.Expand(mask Mask16x32) Int16x32
func Int16x32.InterleaveHiGrouped(y Int16x32) Int16x32
func Int16x32.InterleaveLoGrouped(y Int16x32) Int16x32
func Int16x32.Masked(mask Mask16x32) Int16x32
func Int16x32.Max(y Int16x32) Int16x32
func Int16x32.Merge(y Int16x32, mask Mask16x32) Int16x32
func Int16x32.Min(y Int16x32) Int16x32
func Int16x32.Mul(y Int16x32) Int16x32
func Int16x32.MulHigh(y Int16x32) Int16x32
func Int16x32.Not() Int16x32
func Int16x32.OnesCount() Int16x32
func Int16x32.Or(y Int16x32) Int16x32
func Int16x32.Permute(indices Uint16x32) Int16x32
func Int16x32.PermuteScalarsHiGrouped(a, b, c, d uint8) Int16x32
func Int16x32.PermuteScalarsLoGrouped(a, b, c, d uint8) Int16x32
func Int16x32.SetHi(y Int16x16) Int16x32
func Int16x32.SetLo(y Int16x16) Int16x32
func Int16x32.ShiftAllLeft(y uint64) Int16x32
func Int16x32.ShiftAllLeftConcat(shift uint8, y Int16x32) Int16x32
func Int16x32.ShiftAllRight(y uint64) Int16x32
func Int16x32.ShiftAllRightConcat(shift uint8, y Int16x32) Int16x32
func Int16x32.ShiftLeft(y Int16x32) Int16x32
func Int16x32.ShiftLeftConcat(y Int16x32, z Int16x32) Int16x32
func Int16x32.ShiftRight(y Int16x32) Int16x32
func Int16x32.ShiftRightConcat(y Int16x32, z Int16x32) Int16x32
func Int16x32.Sub(y Int16x32) Int16x32
func Int16x32.SubSaturated(y Int16x32) Int16x32
func Int16x32.Xor(y Int16x32) Int16x32
func Int16x8.Broadcast512() Int16x32
func Int32x16.AsInt16x32() (to Int16x32)
func Int32x16.SaturateToInt16Concat(y Int32x16) Int16x32
func Int64x8.AsInt16x32() (to Int16x32)
func Int8x32.ExtendToInt16() Int16x32
func Int8x64.AsInt16x32() (to Int16x32)
func Mask16x32.ToInt16x32() (to Int16x32)
func Uint16x32.AsInt16x32() (to Int16x32)
func Uint32x16.AsInt16x32() (to Int16x32)
func Uint64x8.AsInt16x32() (to Int16x32)
func Uint8x64.AsInt16x32() (to Int16x32)
func Uint8x64.DotProductPairsSaturated(y Int8x64) Int16x32
func Int16x32.Add(y Int16x32) Int16x32
func Int16x32.AddSaturated(y Int16x32) Int16x32
func Int16x32.And(y Int16x32) Int16x32
func Int16x32.AndNot(y Int16x32) Int16x32
func Int16x32.ConcatPermute(y Int16x32, indices Uint16x32) Int16x32
func Int16x32.DotProductPairs(y Int16x32) Int32x16
func Int16x32.Equal(y Int16x32) Mask16x32
func Int16x32.Greater(y Int16x32) Mask16x32
func Int16x32.GreaterEqual(y Int16x32) Mask16x32
func Int16x32.InterleaveHiGrouped(y Int16x32) Int16x32
func Int16x32.InterleaveLoGrouped(y Int16x32) Int16x32
func Int16x32.Less(y Int16x32) Mask16x32
func Int16x32.LessEqual(y Int16x32) Mask16x32
func Int16x32.Max(y Int16x32) Int16x32
func Int16x32.Merge(y Int16x32, mask Mask16x32) Int16x32
func Int16x32.Min(y Int16x32) Int16x32
func Int16x32.Mul(y Int16x32) Int16x32
func Int16x32.MulHigh(y Int16x32) Int16x32
func Int16x32.NotEqual(y Int16x32) Mask16x32
func Int16x32.Or(y Int16x32) Int16x32
func Int16x32.ShiftAllLeftConcat(shift uint8, y Int16x32) Int16x32
func Int16x32.ShiftAllRightConcat(shift uint8, y Int16x32) Int16x32
func Int16x32.ShiftLeft(y Int16x32) Int16x32
func Int16x32.ShiftLeftConcat(y Int16x32, z Int16x32) Int16x32
func Int16x32.ShiftRight(y Int16x32) Int16x32
func Int16x32.ShiftRightConcat(y Int16x32, z Int16x32) Int16x32
func Int16x32.Sub(y Int16x32) Int16x32
func Int16x32.SubSaturated(y Int16x32) Int16x32
func Int16x32.Xor(y Int16x32) Int16x32
Int16x8 is a 128-bit SIMD vector of 8 int16. Abs computes the absolute value of each element.
Asm: VPABSW, CPU Feature: AVX Add adds corresponding elements of two vectors.
Asm: VPADDW, CPU Feature: AVX AddPairs horizontally adds adjacent pairs of elements.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].
Asm: VPHADDW, CPU Feature: AVX AddPairsSaturated horizontally adds adjacent pairs of elements with saturation.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].
Asm: VPHADDSW, CPU Feature: AVX AddSaturated adds corresponding elements of two vectors with saturation.
Asm: VPADDSW, CPU Feature: AVX And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX AndNot performs a bitwise x &^ y.
Asm: VPANDN, CPU Feature: AVX Float32x4 converts from Int16x8 to Float32x4 Float64x2 converts from Int16x8 to Float64x2 Int32x4 converts from Int16x8 to Int32x4 Int64x2 converts from Int16x8 to Int64x2 Int8x16 converts from Int16x8 to Int8x16 Uint16x8 converts from Int16x8 to Uint16x8 Uint32x4 converts from Int16x8 to Uint32x4 Uint64x2 converts from Int16x8 to Uint64x2 Uint8x16 converts from Int16x8 to Uint8x16 Broadcast128 copies element zero of its (128-bit) input to all elements of
the 128-bit output vector.
Asm: VPBROADCASTW, CPU Feature: AVX2 Broadcast256 copies element zero of its (128-bit) input to all elements of
the 256-bit output vector.
Asm: VPBROADCASTW, CPU Feature: AVX2 Broadcast512 copies element zero of its (128-bit) input to all elements of
the 512-bit output vector.
Asm: VPBROADCASTW, CPU Feature: AVX512 Compress performs a compression on vector x using mask by
selecting elements as indicated by mask, and packs them to lower-indexed elements.
Asm: VPCOMPRESSW, CPU Feature: AVX512VBMI2 ConcatPermute performs a full permutation of vector x, y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the bits needed to represent an index into xy are used from each element of indices.
Asm: VPERMI2W, CPU Feature: AVX512 CopySign returns the product of the first operand with -1, 0, or 1,
whichever constant is nearest to the value of the second operand.
Asm: VPSIGNW, CPU Feature: AVX DotProductPairs multiplies the elements and adds the pairs together,
yielding a vector of half as many elements with twice the input element size.
Asm: VPMADDWD, CPU Feature: AVX Equal returns x equals y, elementwise.
Asm: VPCMPEQW, CPU Feature: AVX Expand performs an expansion on a vector x whose elements are packed to lower parts.
The expansion is to distribute elements as indexed by mask, from lower mask elements to upper in order.
Asm: VPEXPANDW, CPU Feature: AVX512VBMI2 ExtendLo2ToInt64x2 converts 2 lowest vector element values to int64.
The result vector's elements are sign-extended.
Asm: VPMOVSXWQ, CPU Feature: AVX ExtendLo4ToInt32x4 converts 4 lowest vector element values to int32.
The result vector's elements are sign-extended.
Asm: VPMOVSXWD, CPU Feature: AVX ExtendLo4ToInt64x4 converts 4 lowest vector element values to int64.
The result vector's elements are sign-extended.
Asm: VPMOVSXWQ, CPU Feature: AVX2 ExtendToInt32 converts element values to int32.
The result vector's elements are sign-extended.
Asm: VPMOVSXWD, CPU Feature: AVX2 ExtendToInt64 converts element values to int64.
The result vector's elements are sign-extended.
Asm: VPMOVSXWQ, CPU Feature: AVX512 GetElem retrieves a single constant-indexed element's value.
index results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPEXTRW, CPU Feature: AVX512 Greater returns x greater-than y, elementwise.
Asm: VPCMPGTW, CPU Feature: AVX GreaterEqual returns a mask whose elements indicate whether x >= y.
Emulated, CPU Feature: AVX InterleaveHi interleaves the elements of the high halves of x and y.
Asm: VPUNPCKHWD, CPU Feature: AVX InterleaveLo interleaves the elements of the low halves of x and y.
Asm: VPUNPCKLWD, CPU Feature: AVX IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x.
x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y.
Asm: VPTEST, CPU Feature: AVX Len returns the number of elements in an Int16x8. Less returns a mask whose elements indicate whether x < y.
Emulated, CPU Feature: AVX LessEqual returns a mask whose elements indicate whether x <= y.
Emulated, CPU Feature: AVX Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VPMAXSW, CPU Feature: AVX Merge returns x but with elements set to y where mask is false. Min computes the minimum of corresponding elements.
Asm: VPMINSW, CPU Feature: AVX Mul multiplies corresponding elements of two vectors.
Asm: VPMULLW, CPU Feature: AVX MulHigh multiplies elements and stores the high part of the result.
Asm: VPMULHW, CPU Feature: AVX Not returns the bitwise complement of x.
Emulated, CPU Feature: AVX NotEqual returns a mask whose elements indicate whether x != y.
Emulated, CPU Feature: AVX OnesCount counts the number of set bits in each element.
Asm: VPOPCNTW, CPU Feature: AVX512BITALG Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX Permute performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 3 bits (values 0-7) of each element of indices are used.
Asm: VPERMW, CPU Feature: AVX512 PermuteScalarsHi performs a permutation of vector x using the supplied indices:
result = {x[0], x[1], x[2], x[3], x[a+4], x[b+4], x[c+4], x[d+4]}
Parameters a,b,c,d should have values between 0 and 3.
If a through d are constants, then an instruction will be inlined, otherwise
a jump table is generated.
Asm: VPSHUFHW, CPU Feature: AVX512 PermuteScalarsLo performs a permutation of vector x using the supplied indices:
result = {x[a], x[b], x[c], x[d], x[4], x[5], x[6], x[7]}
Parameters a,b,c,d should have values between 0 and 3.
If a through d are constants, then an instruction will be inlined, otherwise
a jump table is generated.
Asm: VPSHUFLW, CPU Feature: AVX512 SaturateToInt8 converts element values to int8.
Conversion is done with saturation on the vector elements.
Results are packed to low elements in the returned vector; its upper elements are zero-cleared.
Asm: VPMOVSWB, CPU Feature: AVX512 SaturateToUint8 converts element values to uint8.
Conversion is done with saturation on the vector elements.
Results are packed to low elements in the returned vector; its upper elements are zero-cleared.
Asm: VPMOVSWB, CPU Feature: AVX512 SetElem sets a single constant-indexed element's value.
index results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPINSRW, CPU Feature: AVX ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLW, CPU Feature: AVX ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDW, CPU Feature: AVX512VBMI2 ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are filled with the sign bit.
Asm: VPSRAW, CPU Feature: AVX ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDW, CPU Feature: AVX512VBMI2 ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVW, CPU Feature: AVX512 ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVW, CPU Feature: AVX512VBMI2 ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are filled with the sign bit.
Asm: VPSRAVW, CPU Feature: AVX512 ShiftRightConcat shifts each element of x to the right by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVW, CPU Feature: AVX512VBMI2 Store stores an Int16x8 to an array. StoreSlice stores x into a slice of at least 8 int16s. StoreSlicePart stores the elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 8 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x. Sub subtracts corresponding elements of two vectors.
Asm: VPSUBW, CPU Feature: AVX SubPairs horizontally subtracts adjacent pairs of elements.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].
Asm: VPHSUBW, CPU Feature: AVX SubPairsSaturated horizontally subtracts adjacent pairs of elements with saturation.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].
Asm: VPHSUBSW, CPU Feature: AVX SubSaturated subtracts corresponding elements of two vectors with saturation.
Asm: VPSUBSW, CPU Feature: AVX ToMask converts from Int16x8 to Mask16x8; each mask element is set to true when the corresponding vector element is non-zero. TruncateToInt8 converts element values to int8.
Conversion is done with truncation on the vector elements.
Results are packed to low elements in the returned vector; its upper elements are zero-cleared.
Asm: VPMOVWB, CPU Feature: AVX512 Xor performs a bitwise XOR operation between two vectors.
Asm: VPXOR, CPU Feature: AVX
Int16x8 : expvar.Var
Int16x8 : fmt.Stringer
func BroadcastInt16x8(x int16) Int16x8
func LoadInt16x8(y *[8]int16) Int16x8
func LoadInt16x8Slice(s []int16) Int16x8
func LoadInt16x8SlicePart(s []int16) Int16x8
func Float32x4.AsInt16x8() (to Int16x8)
func Float64x2.AsInt16x8() (to Int16x8)
func Int16x16.GetHi() Int16x8
func Int16x16.GetLo() Int16x8
func Int16x8.Abs() Int16x8
func Int16x8.Add(y Int16x8) Int16x8
func Int16x8.AddPairs(y Int16x8) Int16x8
func Int16x8.AddPairsSaturated(y Int16x8) Int16x8
func Int16x8.AddSaturated(y Int16x8) Int16x8
func Int16x8.And(y Int16x8) Int16x8
func Int16x8.AndNot(y Int16x8) Int16x8
func Int16x8.Broadcast128() Int16x8
func Int16x8.Compress(mask Mask16x8) Int16x8
func Int16x8.ConcatPermute(y Int16x8, indices Uint16x8) Int16x8
func Int16x8.CopySign(y Int16x8) Int16x8
func Int16x8.Expand(mask Mask16x8) Int16x8
func Int16x8.InterleaveHi(y Int16x8) Int16x8
func Int16x8.InterleaveLo(y Int16x8) Int16x8
func Int16x8.Masked(mask Mask16x8) Int16x8
func Int16x8.Max(y Int16x8) Int16x8
func Int16x8.Merge(y Int16x8, mask Mask16x8) Int16x8
func Int16x8.Min(y Int16x8) Int16x8
func Int16x8.Mul(y Int16x8) Int16x8
func Int16x8.MulHigh(y Int16x8) Int16x8
func Int16x8.Not() Int16x8
func Int16x8.OnesCount() Int16x8
func Int16x8.Or(y Int16x8) Int16x8
func Int16x8.Permute(indices Uint16x8) Int16x8
func Int16x8.PermuteScalarsHi(a, b, c, d uint8) Int16x8
func Int16x8.PermuteScalarsLo(a, b, c, d uint8) Int16x8
func Int16x8.SetElem(index uint8, y int16) Int16x8
func Int16x8.ShiftAllLeft(y uint64) Int16x8
func Int16x8.ShiftAllLeftConcat(shift uint8, y Int16x8) Int16x8
func Int16x8.ShiftAllRight(y uint64) Int16x8
func Int16x8.ShiftAllRightConcat(shift uint8, y Int16x8) Int16x8
func Int16x8.ShiftLeft(y Int16x8) Int16x8
func Int16x8.ShiftLeftConcat(y Int16x8, z Int16x8) Int16x8
func Int16x8.ShiftRight(y Int16x8) Int16x8
func Int16x8.ShiftRightConcat(y Int16x8, z Int16x8) Int16x8
func Int16x8.Sub(y Int16x8) Int16x8
func Int16x8.SubPairs(y Int16x8) Int16x8
func Int16x8.SubPairsSaturated(y Int16x8) Int16x8
func Int16x8.SubSaturated(y Int16x8) Int16x8
func Int16x8.Xor(y Int16x8) Int16x8
func Int32x4.AsInt16x8() (to Int16x8)
func Int32x4.SaturateToInt16() Int16x8
func Int32x4.SaturateToInt16Concat(y Int32x4) Int16x8
func Int32x4.TruncateToInt16() Int16x8
func Int32x8.SaturateToInt16() Int16x8
func Int32x8.TruncateToInt16() Int16x8
func Int64x2.AsInt16x8() (to Int16x8)
func Int64x2.SaturateToInt16() Int16x8
func Int64x2.TruncateToInt16() Int16x8
func Int64x4.SaturateToInt16() Int16x8
func Int64x4.TruncateToInt16() Int16x8
func Int64x8.SaturateToInt16() Int16x8
func Int64x8.TruncateToInt16() Int16x8
func Int8x16.AsInt16x8() (to Int16x8)
func Int8x16.ExtendLo8ToInt16x8() Int16x8
func Mask16x8.ToInt16x8() (to Int16x8)
func Uint16x8.AsInt16x8() (to Int16x8)
func Uint32x4.AsInt16x8() (to Int16x8)
func Uint64x2.AsInt16x8() (to Int16x8)
func Uint8x16.AsInt16x8() (to Int16x8)
func Uint8x16.DotProductPairsSaturated(y Int8x16) Int16x8
func Int16x16.SetHi(y Int16x8) Int16x16
func Int16x16.SetLo(y Int16x8) Int16x16
func Int16x8.Add(y Int16x8) Int16x8
func Int16x8.AddPairs(y Int16x8) Int16x8
func Int16x8.AddPairsSaturated(y Int16x8) Int16x8
func Int16x8.AddSaturated(y Int16x8) Int16x8
func Int16x8.And(y Int16x8) Int16x8
func Int16x8.AndNot(y Int16x8) Int16x8
func Int16x8.ConcatPermute(y Int16x8, indices Uint16x8) Int16x8
func Int16x8.CopySign(y Int16x8) Int16x8
func Int16x8.DotProductPairs(y Int16x8) Int32x4
func Int16x8.Equal(y Int16x8) Mask16x8
func Int16x8.Greater(y Int16x8) Mask16x8
func Int16x8.GreaterEqual(y Int16x8) Mask16x8
func Int16x8.InterleaveHi(y Int16x8) Int16x8
func Int16x8.InterleaveLo(y Int16x8) Int16x8
func Int16x8.Less(y Int16x8) Mask16x8
func Int16x8.LessEqual(y Int16x8) Mask16x8
func Int16x8.Max(y Int16x8) Int16x8
func Int16x8.Merge(y Int16x8, mask Mask16x8) Int16x8
func Int16x8.Min(y Int16x8) Int16x8
func Int16x8.Mul(y Int16x8) Int16x8
func Int16x8.MulHigh(y Int16x8) Int16x8
func Int16x8.NotEqual(y Int16x8) Mask16x8
func Int16x8.Or(y Int16x8) Int16x8
func Int16x8.ShiftAllLeftConcat(shift uint8, y Int16x8) Int16x8
func Int16x8.ShiftAllRightConcat(shift uint8, y Int16x8) Int16x8
func Int16x8.ShiftLeft(y Int16x8) Int16x8
func Int16x8.ShiftLeftConcat(y Int16x8, z Int16x8) Int16x8
func Int16x8.ShiftRight(y Int16x8) Int16x8
func Int16x8.ShiftRightConcat(y Int16x8, z Int16x8) Int16x8
func Int16x8.Sub(y Int16x8) Int16x8
func Int16x8.SubPairs(y Int16x8) Int16x8
func Int16x8.SubPairsSaturated(y Int16x8) Int16x8
func Int16x8.SubSaturated(y Int16x8) Int16x8
func Int16x8.Xor(y Int16x8) Int16x8
Int32x16 is a 512-bit SIMD vector of 16 int32. Abs computes the absolute value of each element.
Asm: VPABSD, CPU Feature: AVX512 Add adds corresponding elements of two vectors.
Asm: VPADDD, CPU Feature: AVX512 And performs a bitwise AND operation between two vectors.
Asm: VPANDD, CPU Feature: AVX512 AndNot performs a bitwise x &^ y.
Asm: VPANDND, CPU Feature: AVX512 Float32x16 converts from Int32x16 to Float32x16 Float64x8 converts from Int32x16 to Float64x8 Int16x32 converts from Int32x16 to Int16x32 Int64x8 converts from Int32x16 to Int64x8 Int8x64 converts from Int32x16 to Int8x64 Uint16x32 converts from Int32x16 to Uint16x32 Uint32x16 converts from Int32x16 to Uint32x16 Uint64x8 converts from Int32x16 to Uint64x8 Uint8x64 converts from Int32x16 to Uint8x64 Compress performs a compression on vector x using mask by
selecting elements as indicated by mask, and packs them to lower-indexed elements.
Asm: VPCOMPRESSD, CPU Feature: AVX512 ConcatPermute performs a full permutation of vector x, y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the bits needed to represent an index into xy are used from each element of indices.
Asm: VPERMI2D, CPU Feature: AVX512 ConvertToFloat32 converts element values to float32.
Asm: VCVTDQ2PS, CPU Feature: AVX512 Equal returns x equals y, elementwise.
Asm: VPCMPEQD, CPU Feature: AVX512 Expand performs an expansion on a vector x whose elements are packed to lower parts.
The expansion is to distribute elements as indexed by mask, from lower mask elements to upper in order.
Asm: VPEXPANDD, CPU Feature: AVX512 GetHi returns the upper half of x.
Asm: VEXTRACTI64X4, CPU Feature: AVX512 GetLo returns the lower half of x.
Asm: VEXTRACTI64X4, CPU Feature: AVX512 Greater returns x greater-than y, elementwise.
Asm: VPCMPGTD, CPU Feature: AVX512 GreaterEqual returns x greater-than-or-equals y, elementwise.
Asm: VPCMPD, CPU Feature: AVX512 InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.
Asm: VPUNPCKHDQ, CPU Feature: AVX512 InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.
Asm: VPUNPCKLDQ, CPU Feature: AVX512 LeadingZeros counts the leading zeros of each element in x.
Asm: VPLZCNTD, CPU Feature: AVX512 Len returns the number of elements in an Int32x16. Less returns x less-than y, elementwise.
Asm: VPCMPD, CPU Feature: AVX512 LessEqual returns x less-than-or-equals y, elementwise.
Asm: VPCMPD, CPU Feature: AVX512 Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VPMAXSD, CPU Feature: AVX512 Merge returns x but with elements set to y where mask is false. Min computes the minimum of corresponding elements.
Asm: VPMINSD, CPU Feature: AVX512 Mul multiplies corresponding elements of two vectors.
Asm: VPMULLD, CPU Feature: AVX512 Not returns the bitwise complement of x.
Emulated, CPU Feature: AVX512 NotEqual returns x not-equals y, elementwise.
Asm: VPCMPD, CPU Feature: AVX512 OnesCount counts the number of set bits in each element.
Asm: VPOPCNTD, CPU Feature: AVX512VPOPCNTDQ Or performs a bitwise OR operation between two vectors.
Asm: VPORD, CPU Feature: AVX512 Permute performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 4 bits (values 0-15) of each element of indices are used.
Asm: VPERMD, CPU Feature: AVX512 PermuteScalarsGrouped performs a grouped permutation of vector x using the supplied indices:
result =
{ x[a], x[b], x[c], x[d], x[a+4], x[b+4], x[c+4], x[d+4],
x[a+8], x[b+8], x[c+8], x[d+8], x[a+12], x[b+12], x[c+12], x[d+12]}
Parameters a,b,c,d should have values between 0 and 3.
If a through d are constants, then an instruction will be inlined, otherwise
a jump table may be generated.
Asm: VPSHUFD, CPU Feature: AVX512 RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPROLD, CPU Feature: AVX512 RotateAllRight rotates each element to the right by the number of bits specified by the immediate.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPRORD, CPU Feature: AVX512 RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.
Asm: VPROLVD, CPU Feature: AVX512 RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.
Asm: VPRORVD, CPU Feature: AVX512 SaturateToInt16 converts element values to int16.
Conversion is done with saturation on the vector elements.
Asm: VPMOVSDW, CPU Feature: AVX512 SaturateToInt16Concat converts element values to int16.
With each 128-bit subvector as a group:
The converted group from the first input vector will be packed to the lower part of the result vector,
the converted group from the second input vector will be packed to the upper part of the result vector.
Conversion is done with saturation on the vector elements.
Asm: VPACKSSDW, CPU Feature: AVX512 SaturateToInt8 converts element values to int8.
Conversion is done with saturation on the vector elements.
Results are packed to low elements in the returned vector, its upper elements are zero-cleared.
Asm: VPMOVSDB, CPU Feature: AVX512 SaturateToUint8 converts element values to uint8.
Conversion is done with saturation on the vector elements.
Results are packed to low elements in the returned vector, its upper elements are zero-cleared.
Asm: VPMOVSDB, CPU Feature: AVX512 SelectFromPairGrouped returns, for each of the four 128-bit subvectors
of the vectors x and y, the selection of four elements from x and y,
where selector values in the range 0-3 specify elements from x and
values in the range 4-7 specify elements 0-3 of y.
When the selectors are constants and the selection can be implemented
in a single instruction, it will be; otherwise it requires two.
If the selectors are not constant, this will translate to a function
call.
Asm: VSHUFPS, CPU Feature: AVX512 SetHi returns x with its upper half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512 SetLo returns x with its lower half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512 ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLD, CPU Feature: AVX512 ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDD, CPU Feature: AVX512VBMI2 ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are filled with the sign bit.
Asm: VPSRAD, CPU Feature: AVX512 ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDD, CPU Feature: AVX512VBMI2 ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVD, CPU Feature: AVX512 ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVD, CPU Feature: AVX512VBMI2 ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are filled with the sign bit.
Asm: VPSRAVD, CPU Feature: AVX512 ShiftRightConcat shifts each element of x to the right by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVD, CPU Feature: AVX512VBMI2 Store stores an Int32x16 to an array. StoreMasked stores an Int32x16 to an array,
at those elements enabled by mask.
Asm: VMOVDQU32, CPU Feature: AVX512 StoreSlice stores x into a slice of at least 16 int32s StoreSlicePart stores the 16 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 16 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x Sub subtracts corresponding elements of two vectors.
Asm: VPSUBD, CPU Feature: AVX512 ToMask converts from Int32x16 to Mask32x16, mask element is set to true when the corresponding vector element is non-zero. TruncateToInt16 converts element values to int16.
Conversion is done with truncation on the vector elements.
Asm: VPMOVDW, CPU Feature: AVX512 TruncateToInt8 converts element values to int8.
Conversion is done with truncation on the vector elements.
Results are packed to low elements in the returned vector; its upper elements are zero-cleared.
Asm: VPMOVDB, CPU Feature: AVX512 Xor performs a bitwise XOR operation between two vectors.
Asm: VPXORD, CPU Feature: AVX512
Int32x16 : expvar.Var
Int32x16 : fmt.Stringer
func BroadcastInt32x16(x int32) Int32x16
func LoadInt32x16(y *[16]int32) Int32x16
func LoadInt32x16Slice(s []int32) Int32x16
func LoadInt32x16SlicePart(s []int32) Int32x16
func LoadMaskedInt32x16(y *[16]int32, mask Mask32x16) Int32x16
func Float32x16.AsInt32x16() (to Int32x16)
func Float32x16.ConvertToInt32() Int32x16
func Float64x8.AsInt32x16() (to Int32x16)
func Int16x16.ExtendToInt32() Int32x16
func Int16x32.AsInt32x16() (to Int32x16)
func Int16x32.DotProductPairs(y Int16x32) Int32x16
func Int32x16.Abs() Int32x16
func Int32x16.Add(y Int32x16) Int32x16
func Int32x16.And(y Int32x16) Int32x16
func Int32x16.AndNot(y Int32x16) Int32x16
func Int32x16.Compress(mask Mask32x16) Int32x16
func Int32x16.ConcatPermute(y Int32x16, indices Uint32x16) Int32x16
func Int32x16.Expand(mask Mask32x16) Int32x16
func Int32x16.InterleaveHiGrouped(y Int32x16) Int32x16
func Int32x16.InterleaveLoGrouped(y Int32x16) Int32x16
func Int32x16.LeadingZeros() Int32x16
func Int32x16.Masked(mask Mask32x16) Int32x16
func Int32x16.Max(y Int32x16) Int32x16
func Int32x16.Merge(y Int32x16, mask Mask32x16) Int32x16
func Int32x16.Min(y Int32x16) Int32x16
func Int32x16.Mul(y Int32x16) Int32x16
func Int32x16.Not() Int32x16
func Int32x16.OnesCount() Int32x16
func Int32x16.Or(y Int32x16) Int32x16
func Int32x16.Permute(indices Uint32x16) Int32x16
func Int32x16.PermuteScalarsGrouped(a, b, c, d uint8) Int32x16
func Int32x16.RotateAllLeft(shift uint8) Int32x16
func Int32x16.RotateAllRight(shift uint8) Int32x16
func Int32x16.RotateLeft(y Int32x16) Int32x16
func Int32x16.RotateRight(y Int32x16) Int32x16
func Int32x16.SelectFromPairGrouped(a, b, c, d uint8, y Int32x16) Int32x16
func Int32x16.SetHi(y Int32x8) Int32x16
func Int32x16.SetLo(y Int32x8) Int32x16
func Int32x16.ShiftAllLeft(y uint64) Int32x16
func Int32x16.ShiftAllLeftConcat(shift uint8, y Int32x16) Int32x16
func Int32x16.ShiftAllRight(y uint64) Int32x16
func Int32x16.ShiftAllRightConcat(shift uint8, y Int32x16) Int32x16
func Int32x16.ShiftLeft(y Int32x16) Int32x16
func Int32x16.ShiftLeftConcat(y Int32x16, z Int32x16) Int32x16
func Int32x16.ShiftRight(y Int32x16) Int32x16
func Int32x16.ShiftRightConcat(y Int32x16, z Int32x16) Int32x16
func Int32x16.Sub(y Int32x16) Int32x16
func Int32x16.Xor(y Int32x16) Int32x16
func Int32x4.Broadcast512() Int32x16
func Int64x8.AsInt32x16() (to Int32x16)
func Int8x16.ExtendToInt32() Int32x16
func Int8x64.AsInt32x16() (to Int32x16)
func Int8x64.DotProductQuadruple(y Uint8x64) Int32x16
func Int8x64.DotProductQuadrupleSaturated(y Uint8x64) Int32x16
func Mask32x16.ToInt32x16() (to Int32x16)
func Uint16x32.AsInt32x16() (to Int32x16)
func Uint32x16.AsInt32x16() (to Int32x16)
func Uint64x8.AsInt32x16() (to Int32x16)
func Uint8x64.AsInt32x16() (to Int32x16)
func Int32x16.Add(y Int32x16) Int32x16
func Int32x16.And(y Int32x16) Int32x16
func Int32x16.AndNot(y Int32x16) Int32x16
func Int32x16.ConcatPermute(y Int32x16, indices Uint32x16) Int32x16
func Int32x16.Equal(y Int32x16) Mask32x16
func Int32x16.Greater(y Int32x16) Mask32x16
func Int32x16.GreaterEqual(y Int32x16) Mask32x16
func Int32x16.InterleaveHiGrouped(y Int32x16) Int32x16
func Int32x16.InterleaveLoGrouped(y Int32x16) Int32x16
func Int32x16.Less(y Int32x16) Mask32x16
func Int32x16.LessEqual(y Int32x16) Mask32x16
func Int32x16.Max(y Int32x16) Int32x16
func Int32x16.Merge(y Int32x16, mask Mask32x16) Int32x16
func Int32x16.Min(y Int32x16) Int32x16
func Int32x16.Mul(y Int32x16) Int32x16
func Int32x16.NotEqual(y Int32x16) Mask32x16
func Int32x16.Or(y Int32x16) Int32x16
func Int32x16.RotateLeft(y Int32x16) Int32x16
func Int32x16.RotateRight(y Int32x16) Int32x16
func Int32x16.SaturateToInt16Concat(y Int32x16) Int16x32
func Int32x16.SelectFromPairGrouped(a, b, c, d uint8, y Int32x16) Int32x16
func Int32x16.ShiftAllLeftConcat(shift uint8, y Int32x16) Int32x16
func Int32x16.ShiftAllRightConcat(shift uint8, y Int32x16) Int32x16
func Int32x16.ShiftLeft(y Int32x16) Int32x16
func Int32x16.ShiftLeftConcat(y Int32x16, z Int32x16) Int32x16
func Int32x16.ShiftRight(y Int32x16) Int32x16
func Int32x16.ShiftRightConcat(y Int32x16, z Int32x16) Int32x16
func Int32x16.Sub(y Int32x16) Int32x16
func Int32x16.Xor(y Int32x16) Int32x16
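The ShiftLeftConcat/ShiftAllLeftConcat methods listed above are funnel shifts: the bits emptied at the low end of the shifted x are filled from the top of y. A minimal plain-Go sketch of the documented per-element rule; the helper shiftLeftConcat32 is illustrative only, not part of the archsimd API:

```go
package main

import "fmt"

// shiftLeftConcat32 models the per-element behavior documented for
// Int32x16.ShiftLeftConcat (VPSHLDVD): shift x left by the low 5 bits
// of count, filling the emptied low bits from the top of y.
func shiftLeftConcat32(x, y int32, count uint32) int32 {
	s := count & 31 // only the lower 5 bits of the shift count are used
	if s == 0 {
		return x // a zero shift leaves x unchanged
	}
	return int32(uint32(x)<<s | uint32(y)>>(32-s))
}

func main() {
	// The low 4 bits of the result come from the top of y (all ones here).
	fmt.Printf("%#x\n", shiftLeftConcat32(0x1, -1, 4)) // 0x1f
}
```

The same model with the shift direction reversed describes ShiftRightConcat, which fills emptied upper bits from the low end of the second operand.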
Int32x4 is a 128-bit SIMD vector of 4 int32 Abs computes the absolute value of each element.
Asm: VPABSD, CPU Feature: AVX Add adds corresponding elements of two vectors.
Asm: VPADDD, CPU Feature: AVX AddPairs horizontally adds adjacent pairs of elements.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].
Asm: VPHADDD, CPU Feature: AVX And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX AndNot performs a bitwise x &^ y.
Asm: VPANDN, CPU Feature: AVX Float32x4 converts from Int32x4 to Float32x4 Float64x2 converts from Int32x4 to Float64x2 Int16x8 converts from Int32x4 to Int16x8 Int64x2 converts from Int32x4 to Int64x2 Int8x16 converts from Int32x4 to Int8x16 Uint16x8 converts from Int32x4 to Uint16x8 Uint32x4 converts from Int32x4 to Uint32x4 Uint64x2 converts from Int32x4 to Uint64x2 Uint8x16 converts from Int32x4 to Uint8x16 Broadcast128 copies element zero of its (128-bit) input to all elements of
the 128-bit output vector.
Asm: VPBROADCASTD, CPU Feature: AVX2 Broadcast256 copies element zero of its (128-bit) input to all elements of
the 256-bit output vector.
Asm: VPBROADCASTD, CPU Feature: AVX2 Broadcast512 copies element zero of its (128-bit) input to all elements of
the 512-bit output vector.
Asm: VPBROADCASTD, CPU Feature: AVX512 Compress performs a compression on vector x using mask by
selecting elements as indicated by mask and packing them to lower-indexed elements.
Asm: VPCOMPRESSD, CPU Feature: AVX512 ConcatPermute performs a full permutation of vector x, y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the bits needed to represent an index into xy are used in indices' elements.
Asm: VPERMI2D, CPU Feature: AVX512 ConvertToFloat32 converts element values to float32.
Asm: VCVTDQ2PS, CPU Feature: AVX ConvertToFloat64 converts element values to float64.
Asm: VCVTDQ2PD, CPU Feature: AVX CopySign returns the product of the first operand with -1, 0, or 1,
whichever constant is nearest to the value of the second operand.
Asm: VPSIGND, CPU Feature: AVX Equal returns x equals y, elementwise.
Asm: VPCMPEQD, CPU Feature: AVX Expand performs an expansion on a vector x whose elements are packed to lower parts.
The expansion distributes the elements to the positions indicated by mask, from lower mask elements to upper, in order.
Asm: VPEXPANDD, CPU Feature: AVX512 ExtendLo2ToInt64x2 converts 2 lowest vector element values to int64.
The result vector's elements are sign-extended.
Asm: VPMOVSXDQ, CPU Feature: AVX ExtendToInt64 converts element values to int64.
The result vector's elements are sign-extended.
Asm: VPMOVSXDQ, CPU Feature: AVX2 GetElem retrieves a single constant-indexed element's value.
index results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPEXTRD, CPU Feature: AVX Greater returns x greater-than y, elementwise.
Asm: VPCMPGTD, CPU Feature: AVX GreaterEqual returns a mask whose elements indicate whether x >= y
Emulated, CPU Feature: AVX InterleaveHi interleaves the elements of the high halves of x and y.
Asm: VPUNPCKHDQ, CPU Feature: AVX InterleaveLo interleaves the elements of the low halves of x and y.
Asm: VPUNPCKLDQ, CPU Feature: AVX IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x.
x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y
Asm: VPTEST, CPU Feature: AVX LeadingZeros counts the leading zeros of each element in x.
Asm: VPLZCNTD, CPU Feature: AVX512 Len returns the number of elements in an Int32x4 Less returns a mask whose elements indicate whether x < y
Emulated, CPU Feature: AVX LessEqual returns a mask whose elements indicate whether x <= y
Emulated, CPU Feature: AVX Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VPMAXSD, CPU Feature: AVX Merge returns x but with elements set to y where mask is false. Min computes the minimum of corresponding elements.
Asm: VPMINSD, CPU Feature: AVX Mul multiplies corresponding elements of two vectors.
Asm: VPMULLD, CPU Feature: AVX MulEvenWiden multiplies even-indexed elements, widening the result.
Result[i] = v1.Even[i] * v2.Even[i].
Asm: VPMULDQ, CPU Feature: AVX Not returns the bitwise complement of x
Emulated, CPU Feature: AVX NotEqual returns a mask whose elements indicate whether x != y
Emulated, CPU Feature: AVX OnesCount counts the number of set bits in each element.
Asm: VPOPCNTD, CPU Feature: AVX512VPOPCNTDQ Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX PermuteScalars performs a permutation of vector x's elements using the supplied indices:
result = {x[a], x[b], x[c], x[d]}
Parameters a,b,c,d should have values between 0 and 3.
If a through d are constants, then an instruction will be inlined; otherwise
a jump table may be generated.
Asm: VPSHUFD, CPU Feature: AVX RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPROLD, CPU Feature: AVX512 RotateAllRight rotates each element to the right by the number of bits specified by the immediate.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPRORD, CPU Feature: AVX512 RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.
Asm: VPROLVD, CPU Feature: AVX512 RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.
Asm: VPRORVD, CPU Feature: AVX512 SaturateToInt16 converts element values to int16.
Conversion is done with saturation on the vector elements.
Asm: VPMOVSDW, CPU Feature: AVX512 SaturateToInt16Concat converts element values to int16.
With each 128-bit lane as a group:
The converted group from the first input vector will be packed to the lower part of the result vector,
the converted group from the second input vector will be packed to the upper part of the result vector.
Conversion is done with saturation on the vector elements.
Asm: VPACKSSDW, CPU Feature: AVX SaturateToInt8 converts element values to int8.
Conversion is done with saturation on the vector elements.
Results are packed to low elements in the returned vector; its upper elements are zero-cleared.
Asm: VPMOVSDB, CPU Feature: AVX512 SaturateToUint8 converts element values to uint8.
Conversion is done with saturation on the vector elements.
Results are packed to low elements in the returned vector; its upper elements are zero-cleared.
Asm: VPMOVUSDB, CPU Feature: AVX512 SelectFromPair returns the selection of four elements from the two
vectors x and y, where selector values in the range 0-3 specify
elements from x and values in the range 4-7 specify elements 0-3
of y. When the selectors are constants and the selection can be
implemented in a single instruction, it will be; otherwise it
requires two. a is the source index of the least element in the
output, and b, c, and d are the indices of the 2nd, 3rd, and 4th
elements in the output. For example,
{1,2,4,8}.SelectFromPair(2,3,5,7,{9,25,49,81}) returns {4,8,25,81}.
If the selectors are not constant this will translate to a function
call.
Asm: VSHUFPS, CPU Feature: AVX SetElem sets a single constant-indexed element's value.
index results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPINSRD, CPU Feature: AVX ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLD, CPU Feature: AVX ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDD, CPU Feature: AVX512VBMI2 ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are filled with the sign bit.
Asm: VPSRAD, CPU Feature: AVX ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDD, CPU Feature: AVX512VBMI2 ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVD, CPU Feature: AVX2 ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVD, CPU Feature: AVX512VBMI2 ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are filled with the sign bit.
Asm: VPSRAVD, CPU Feature: AVX2 ShiftRightConcat shifts each element of x to the right by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVD, CPU Feature: AVX512VBMI2 Store stores an Int32x4 to an array StoreMasked stores an Int32x4 to an array,
at those elements enabled by mask
Asm: VMASKMOVD, CPU Feature: AVX2 StoreSlice stores x into a slice of at least 4 int32s StoreSlicePart stores the 4 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 4 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x Sub subtracts corresponding elements of two vectors.
Asm: VPSUBD, CPU Feature: AVX SubPairs horizontally subtracts adjacent pairs of elements.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].
Asm: VPHSUBD, CPU Feature: AVX ToMask converts from Int32x4 to Mask32x4, mask element is set to true when the corresponding vector element is non-zero. TruncateToInt16 converts element values to int16.
Conversion is done with truncation on the vector elements.
Asm: VPMOVDW, CPU Feature: AVX512 TruncateToInt8 converts element values to int8.
Conversion is done with truncation on the vector elements.
Results are packed to low elements in the returned vector; its upper elements are zero-cleared.
Asm: VPMOVDB, CPU Feature: AVX512 Xor performs a bitwise XOR operation between two vectors.
Asm: VPXOR, CPU Feature: AVX
Int32x4 : expvar.Var
Int32x4 : fmt.Stringer
func BroadcastInt32x4(x int32) Int32x4
func LoadInt32x4(y *[4]int32) Int32x4
func LoadInt32x4Slice(s []int32) Int32x4
func LoadInt32x4SlicePart(s []int32) Int32x4
func LoadMaskedInt32x4(y *[4]int32, mask Mask32x4) Int32x4
func Float32x4.AsInt32x4() (to Int32x4)
func Float32x4.ConvertToInt32() Int32x4
func Float64x2.AsInt32x4() (to Int32x4)
func Float64x2.ConvertToInt32() Int32x4
func Float64x4.ConvertToInt32() Int32x4
func Int16x8.AsInt32x4() (to Int32x4)
func Int16x8.DotProductPairs(y Int16x8) Int32x4
func Int16x8.ExtendLo4ToInt32x4() Int32x4
func Int32x4.Abs() Int32x4
func Int32x4.Add(y Int32x4) Int32x4
func Int32x4.AddPairs(y Int32x4) Int32x4
func Int32x4.And(y Int32x4) Int32x4
func Int32x4.AndNot(y Int32x4) Int32x4
func Int32x4.Broadcast128() Int32x4
func Int32x4.Compress(mask Mask32x4) Int32x4
func Int32x4.ConcatPermute(y Int32x4, indices Uint32x4) Int32x4
func Int32x4.CopySign(y Int32x4) Int32x4
func Int32x4.Expand(mask Mask32x4) Int32x4
func Int32x4.InterleaveHi(y Int32x4) Int32x4
func Int32x4.InterleaveLo(y Int32x4) Int32x4
func Int32x4.LeadingZeros() Int32x4
func Int32x4.Masked(mask Mask32x4) Int32x4
func Int32x4.Max(y Int32x4) Int32x4
func Int32x4.Merge(y Int32x4, mask Mask32x4) Int32x4
func Int32x4.Min(y Int32x4) Int32x4
func Int32x4.Mul(y Int32x4) Int32x4
func Int32x4.Not() Int32x4
func Int32x4.OnesCount() Int32x4
func Int32x4.Or(y Int32x4) Int32x4
func Int32x4.PermuteScalars(a, b, c, d uint8) Int32x4
func Int32x4.RotateAllLeft(shift uint8) Int32x4
func Int32x4.RotateAllRight(shift uint8) Int32x4
func Int32x4.RotateLeft(y Int32x4) Int32x4
func Int32x4.RotateRight(y Int32x4) Int32x4
func Int32x4.SelectFromPair(a, b, c, d uint8, y Int32x4) Int32x4
func Int32x4.SetElem(index uint8, y int32) Int32x4
func Int32x4.ShiftAllLeft(y uint64) Int32x4
func Int32x4.ShiftAllLeftConcat(shift uint8, y Int32x4) Int32x4
func Int32x4.ShiftAllRight(y uint64) Int32x4
func Int32x4.ShiftAllRightConcat(shift uint8, y Int32x4) Int32x4
func Int32x4.ShiftLeft(y Int32x4) Int32x4
func Int32x4.ShiftLeftConcat(y Int32x4, z Int32x4) Int32x4
func Int32x4.ShiftRight(y Int32x4) Int32x4
func Int32x4.ShiftRightConcat(y Int32x4, z Int32x4) Int32x4
func Int32x4.Sub(y Int32x4) Int32x4
func Int32x4.SubPairs(y Int32x4) Int32x4
func Int32x4.Xor(y Int32x4) Int32x4
func Int32x8.GetHi() Int32x4
func Int32x8.GetLo() Int32x4
func Int64x2.AsInt32x4() (to Int32x4)
func Int64x2.SaturateToInt32() Int32x4
func Int64x2.TruncateToInt32() Int32x4
func Int64x4.SaturateToInt32() Int32x4
func Int64x4.TruncateToInt32() Int32x4
func Int8x16.AsInt32x4() (to Int32x4)
func Int8x16.DotProductQuadruple(y Uint8x16) Int32x4
func Int8x16.DotProductQuadrupleSaturated(y Uint8x16) Int32x4
func Int8x16.ExtendLo4ToInt32x4() Int32x4
func Mask32x4.ToInt32x4() (to Int32x4)
func Uint16x8.AsInt32x4() (to Int32x4)
func Uint32x4.AsInt32x4() (to Int32x4)
func Uint64x2.AsInt32x4() (to Int32x4)
func Uint8x16.AsInt32x4() (to Int32x4)
func Int32x4.Add(y Int32x4) Int32x4
func Int32x4.AddPairs(y Int32x4) Int32x4
func Int32x4.And(y Int32x4) Int32x4
func Int32x4.AndNot(y Int32x4) Int32x4
func Int32x4.ConcatPermute(y Int32x4, indices Uint32x4) Int32x4
func Int32x4.CopySign(y Int32x4) Int32x4
func Int32x4.Equal(y Int32x4) Mask32x4
func Int32x4.Greater(y Int32x4) Mask32x4
func Int32x4.GreaterEqual(y Int32x4) Mask32x4
func Int32x4.InterleaveHi(y Int32x4) Int32x4
func Int32x4.InterleaveLo(y Int32x4) Int32x4
func Int32x4.Less(y Int32x4) Mask32x4
func Int32x4.LessEqual(y Int32x4) Mask32x4
func Int32x4.Max(y Int32x4) Int32x4
func Int32x4.Merge(y Int32x4, mask Mask32x4) Int32x4
func Int32x4.Min(y Int32x4) Int32x4
func Int32x4.Mul(y Int32x4) Int32x4
func Int32x4.MulEvenWiden(y Int32x4) Int64x2
func Int32x4.NotEqual(y Int32x4) Mask32x4
func Int32x4.Or(y Int32x4) Int32x4
func Int32x4.RotateLeft(y Int32x4) Int32x4
func Int32x4.RotateRight(y Int32x4) Int32x4
func Int32x4.SaturateToInt16Concat(y Int32x4) Int16x8
func Int32x4.SelectFromPair(a, b, c, d uint8, y Int32x4) Int32x4
func Int32x4.ShiftAllLeftConcat(shift uint8, y Int32x4) Int32x4
func Int32x4.ShiftAllRightConcat(shift uint8, y Int32x4) Int32x4
func Int32x4.ShiftLeft(y Int32x4) Int32x4
func Int32x4.ShiftLeftConcat(y Int32x4, z Int32x4) Int32x4
func Int32x4.ShiftRight(y Int32x4) Int32x4
func Int32x4.ShiftRightConcat(y Int32x4, z Int32x4) Int32x4
func Int32x4.Sub(y Int32x4) Int32x4
func Int32x4.SubPairs(y Int32x4) Int32x4
func Int32x4.Xor(y Int32x4) Int32x4
func Int32x8.SetHi(y Int32x4) Int32x8
func Int32x8.SetLo(y Int32x4) Int32x8
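The selector rule documented for SelectFromPair (values 0-3 pick elements of x, values 4-7 pick elements 0-3 of y) can be modeled in plain Go. The helper below is an illustrative reference sketch, not the intrinsic, and reproduces the documented example:

```go
package main

import "fmt"

// selectFromPair models the documented selector rule of
// Int32x4.SelectFromPair: selector values 0-3 pick elements of x,
// and values 4-7 pick elements 0-3 of y.
func selectFromPair(x, y [4]int32, a, b, c, d uint8) [4]int32 {
	xy := append(x[:], y[:]...) // conceptual 8-element concatenation of x and y
	return [4]int32{xy[a], xy[b], xy[c], xy[d]}
}

func main() {
	x := [4]int32{1, 2, 4, 8}
	y := [4]int32{9, 25, 49, 81}
	fmt.Println(selectFromPair(x, y, 2, 3, 5, 7)) // [4 8 25 81]
}
```

SelectFromPairGrouped applies the same rule independently within each 128-bit half of a wider vector.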
Int32x8 is a 256-bit SIMD vector of 8 int32 Abs computes the absolute value of each element.
Asm: VPABSD, CPU Feature: AVX2 Add adds corresponding elements of two vectors.
Asm: VPADDD, CPU Feature: AVX2 AddPairs horizontally adds adjacent pairs of elements.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].
Asm: VPHADDD, CPU Feature: AVX2 And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX2 AndNot performs a bitwise x &^ y.
Asm: VPANDN, CPU Feature: AVX2 Float32x8 converts from Int32x8 to Float32x8 Float64x4 converts from Int32x8 to Float64x4 Int16x16 converts from Int32x8 to Int16x16 Int64x4 converts from Int32x8 to Int64x4 Int8x32 converts from Int32x8 to Int8x32 Uint16x16 converts from Int32x8 to Uint16x16 Uint32x8 converts from Int32x8 to Uint32x8 Uint64x4 converts from Int32x8 to Uint64x4 Uint8x32 converts from Int32x8 to Uint8x32 Compress performs a compression on vector x using mask by
selecting elements as indicated by mask and packing them to lower-indexed elements.
Asm: VPCOMPRESSD, CPU Feature: AVX512 ConcatPermute performs a full permutation of vector x, y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the bits needed to represent an index into xy are used in indices' elements.
Asm: VPERMI2D, CPU Feature: AVX512 ConvertToFloat32 converts element values to float32.
Asm: VCVTDQ2PS, CPU Feature: AVX ConvertToFloat64 converts element values to float64.
Asm: VCVTDQ2PD, CPU Feature: AVX512 CopySign returns the product of the first operand with -1, 0, or 1,
whichever constant is nearest to the value of the second operand.
Asm: VPSIGND, CPU Feature: AVX2 Equal returns x equals y, elementwise.
Asm: VPCMPEQD, CPU Feature: AVX2 Expand performs an expansion on a vector x whose elements are packed to lower parts.
The expansion distributes the elements to the positions indicated by mask, from lower mask elements to upper, in order.
Asm: VPEXPANDD, CPU Feature: AVX512 ExtendToInt64 converts element values to int64.
The result vector's elements are sign-extended.
Asm: VPMOVSXDQ, CPU Feature: AVX512 GetHi returns the upper half of x.
Asm: VEXTRACTI128, CPU Feature: AVX2 GetLo returns the lower half of x.
Asm: VEXTRACTI128, CPU Feature: AVX2 Greater returns x greater-than y, elementwise.
Asm: VPCMPGTD, CPU Feature: AVX2 GreaterEqual returns a mask whose elements indicate whether x >= y
Emulated, CPU Feature: AVX2 InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.
Asm: VPUNPCKHDQ, CPU Feature: AVX2 InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.
Asm: VPUNPCKLDQ, CPU Feature: AVX2 IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x.
x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y
Asm: VPTEST, CPU Feature: AVX LeadingZeros counts the leading zeros of each element in x.
Asm: VPLZCNTD, CPU Feature: AVX512 Len returns the number of elements in an Int32x8 Less returns a mask whose elements indicate whether x < y
Emulated, CPU Feature: AVX2 LessEqual returns a mask whose elements indicate whether x <= y
Emulated, CPU Feature: AVX2 Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VPMAXSD, CPU Feature: AVX2 Merge returns x but with elements set to y where mask is false. Min computes the minimum of corresponding elements.
Asm: VPMINSD, CPU Feature: AVX2 Mul multiplies corresponding elements of two vectors.
Asm: VPMULLD, CPU Feature: AVX2 MulEvenWiden multiplies even-indexed elements, widening the result.
Result[i] = v1.Even[i] * v2.Even[i].
Asm: VPMULDQ, CPU Feature: AVX2 Not returns the bitwise complement of x
Emulated, CPU Feature: AVX2 NotEqual returns a mask whose elements indicate whether x != y
Emulated, CPU Feature: AVX2 OnesCount counts the number of set bits in each element.
Asm: VPOPCNTD, CPU Feature: AVX512VPOPCNTDQ Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX2 Permute performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 3 bits (values 0-7) of each element of indices are used.
Asm: VPERMD, CPU Feature: AVX2 PermuteScalarsGrouped performs a grouped permutation of vector x using the supplied indices:
result = {x[a], x[b], x[c], x[d], x[a+4], x[b+4], x[c+4], x[d+4]}
Parameters a,b,c,d should have values between 0 and 3.
If a through d are constants, then an instruction will be inlined; otherwise
a jump table may be generated.
Asm: VPSHUFD, CPU Feature: AVX2 RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPROLD, CPU Feature: AVX512 RotateAllRight rotates each element to the right by the number of bits specified by the immediate.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPRORD, CPU Feature: AVX512 RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.
Asm: VPROLVD, CPU Feature: AVX512 RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.
Asm: VPRORVD, CPU Feature: AVX512 SaturateToInt16 converts element values to int16.
Conversion is done with saturation on the vector elements.
Asm: VPMOVSDW, CPU Feature: AVX512 SaturateToInt16Concat converts element values to int16.
With each 128-bit lane as a group:
The converted group from the first input vector will be packed to the lower part of the result vector,
the converted group from the second input vector will be packed to the upper part of the result vector.
Conversion is done with saturation on the vector elements.
Asm: VPACKSSDW, CPU Feature: AVX2 SaturateToInt8 converts element values to int8.
Conversion is done with saturation on the vector elements.
Results are packed to low elements in the returned vector; its upper elements are zero-cleared.
Asm: VPMOVSDB, CPU Feature: AVX512 SaturateToUint8 converts element values to uint8.
Conversion is done with saturation on the vector elements.
Results are packed to low elements in the returned vector; its upper elements are zero-cleared.
Asm: VPMOVUSDB, CPU Feature: AVX512 Select128FromPair treats the 256-bit vectors x and y as a single vector of four
128-bit elements, and returns a 256-bit result formed by
concatenating the two elements specified by lo and hi.
For example,
{40, 41, 42, 43, 50, 51, 52, 53}.Select128FromPair(3, 0, {60, 61, 62, 63, 70, 71, 72, 73})
returns {70, 71, 72, 73, 40, 41, 42, 43}.
lo, hi result in better performance when they are constants; non-constant values will be translated into a jump table.
lo, hi should be between 0 and 3, inclusive; other values may result in a runtime panic.
Asm: VPERM2I128, CPU Feature: AVX2 SelectFromPairGrouped returns, for each of the two 128-bit halves of
the vectors x and y, the selection of four elements from x and y,
where selector values in the range 0-3 specify elements from x and
values in the range 4-7 specify elements 0-3 of y.
When the selectors are constants and the selection can be
implemented in a single instruction, it will be; otherwise
it requires two. a is the source index of the least element in the
output, and b, c, and d are the indices of the 2nd, 3rd, and 4th
elements in the output. For example,
{1,2,4,8,16,32,64,128}.SelectFromPairGrouped(2,3,5,7,{9,25,49,81,121,169,225,289})
returns {4,8,25,81,64,128,169,289}.
If the selectors are not constant this will translate to a function
call.
Asm: VSHUFPS, CPU Feature: AVX SetHi returns x with its upper half set to y.
Asm: VINSERTI128, CPU Feature: AVX2 SetLo returns x with its lower half set to y.
Asm: VINSERTI128, CPU Feature: AVX2 ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLD, CPU Feature: AVX2 ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDD, CPU Feature: AVX512VBMI2 ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are filled with the sign bit.
Asm: VPSRAD, CPU Feature: AVX2 ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDD, CPU Feature: AVX512VBMI2 ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVD, CPU Feature: AVX2 ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVD, CPU Feature: AVX512VBMI2 ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are filled with the sign bit.
Asm: VPSRAVD, CPU Feature: AVX2 ShiftRightConcat shifts each element of x to the right by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVD, CPU Feature: AVX512VBMI2 Store stores an Int32x8 to an array StoreMasked stores an Int32x8 to an array,
at those elements enabled by mask
Asm: VMASKMOVD, CPU Feature: AVX2 StoreSlice stores x into a slice of at least 8 int32s StoreSlicePart stores the 8 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 8 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x Sub subtracts corresponding elements of two vectors.
Asm: VPSUBD, CPU Feature: AVX2 SubPairs horizontally subtracts adjacent pairs of elements.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].
Asm: VPHSUBD, CPU Feature: AVX2 ToMask converts from Int32x8 to Mask32x8, mask element is set to true when the corresponding vector element is non-zero. TruncateToInt16 converts element values to int16.
Conversion is done with truncation on the vector elements.
Asm: VPMOVDW, CPU Feature: AVX512 TruncateToInt8 converts element values to int8.
Conversion is done with truncation on the vector elements.
Results are packed to low elements in the returned vector; its upper elements are zero-cleared.
Asm: VPMOVDB, CPU Feature: AVX512 Xor performs a bitwise XOR operation between two vectors.
Asm: VPXOR, CPU Feature: AVX2
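The ShiftLeftConcat/ShiftRightConcat operations above are "funnel" shifts. As a plain-Go sketch of the per-lane semantics described in the docs (the helper name is illustrative, not part of the package):

```go
package main

import "fmt"

// shiftLeftConcat32 models one 32-bit lane of ShiftLeftConcat:
// shift x left by s, filling the emptied low bits with the high bits of z.
func shiftLeftConcat32(x, z, s uint32) uint32 {
	s &= 31 // only the lower 5 bits of the shift count are used
	if s == 0 {
		return x
	}
	return (x << s) | (z >> (32 - s))
}

func main() {
	fmt.Printf("%#x\n", shiftLeftConcat32(0x1, 0xF0000000, 4)) // prints 0x1f
}
```

ShiftRightConcat is the mirror image: x is shifted right and the low bits of z fill the emptied upper bits.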
Int32x8 : expvar.Var
Int32x8 : fmt.Stringer
func BroadcastInt32x8(x int32) Int32x8
func LoadInt32x8(y *[8]int32) Int32x8
func LoadInt32x8Slice(s []int32) Int32x8
func LoadInt32x8SlicePart(s []int32) Int32x8
func LoadMaskedInt32x8(y *[8]int32, mask Mask32x8) Int32x8
func Float32x8.AsInt32x8() (to Int32x8)
func Float32x8.ConvertToInt32() Int32x8
func Float64x4.AsInt32x8() (to Int32x8)
func Float64x8.ConvertToInt32() Int32x8
func Int16x16.AsInt32x8() (to Int32x8)
func Int16x16.DotProductPairs(y Int16x16) Int32x8
func Int16x8.ExtendToInt32() Int32x8
func Int32x16.GetHi() Int32x8
func Int32x16.GetLo() Int32x8
func Int32x4.Broadcast256() Int32x8
func Int32x8.Abs() Int32x8
func Int32x8.Add(y Int32x8) Int32x8
func Int32x8.AddPairs(y Int32x8) Int32x8
func Int32x8.And(y Int32x8) Int32x8
func Int32x8.AndNot(y Int32x8) Int32x8
func Int32x8.Compress(mask Mask32x8) Int32x8
func Int32x8.ConcatPermute(y Int32x8, indices Uint32x8) Int32x8
func Int32x8.CopySign(y Int32x8) Int32x8
func Int32x8.Expand(mask Mask32x8) Int32x8
func Int32x8.InterleaveHiGrouped(y Int32x8) Int32x8
func Int32x8.InterleaveLoGrouped(y Int32x8) Int32x8
func Int32x8.LeadingZeros() Int32x8
func Int32x8.Masked(mask Mask32x8) Int32x8
func Int32x8.Max(y Int32x8) Int32x8
func Int32x8.Merge(y Int32x8, mask Mask32x8) Int32x8
func Int32x8.Min(y Int32x8) Int32x8
func Int32x8.Mul(y Int32x8) Int32x8
func Int32x8.Not() Int32x8
func Int32x8.OnesCount() Int32x8
func Int32x8.Or(y Int32x8) Int32x8
func Int32x8.Permute(indices Uint32x8) Int32x8
func Int32x8.PermuteScalarsGrouped(a, b, c, d uint8) Int32x8
func Int32x8.RotateAllLeft(shift uint8) Int32x8
func Int32x8.RotateAllRight(shift uint8) Int32x8
func Int32x8.RotateLeft(y Int32x8) Int32x8
func Int32x8.RotateRight(y Int32x8) Int32x8
func Int32x8.Select128FromPair(lo, hi uint8, y Int32x8) Int32x8
func Int32x8.SelectFromPairGrouped(a, b, c, d uint8, y Int32x8) Int32x8
func Int32x8.SetHi(y Int32x4) Int32x8
func Int32x8.SetLo(y Int32x4) Int32x8
func Int32x8.ShiftAllLeft(y uint64) Int32x8
func Int32x8.ShiftAllLeftConcat(shift uint8, y Int32x8) Int32x8
func Int32x8.ShiftAllRight(y uint64) Int32x8
func Int32x8.ShiftAllRightConcat(shift uint8, y Int32x8) Int32x8
func Int32x8.ShiftLeft(y Int32x8) Int32x8
func Int32x8.ShiftLeftConcat(y Int32x8, z Int32x8) Int32x8
func Int32x8.ShiftRight(y Int32x8) Int32x8
func Int32x8.ShiftRightConcat(y Int32x8, z Int32x8) Int32x8
func Int32x8.Sub(y Int32x8) Int32x8
func Int32x8.SubPairs(y Int32x8) Int32x8
func Int32x8.Xor(y Int32x8) Int32x8
func Int64x4.AsInt32x8() (to Int32x8)
func Int64x8.SaturateToInt32() Int32x8
func Int64x8.TruncateToInt32() Int32x8
func Int8x16.ExtendLo8ToInt32x8() Int32x8
func Int8x32.AsInt32x8() (to Int32x8)
func Int8x32.DotProductQuadruple(y Uint8x32) Int32x8
func Int8x32.DotProductQuadrupleSaturated(y Uint8x32) Int32x8
func Mask32x8.ToInt32x8() (to Int32x8)
func Uint16x16.AsInt32x8() (to Int32x8)
func Uint32x8.AsInt32x8() (to Int32x8)
func Uint64x4.AsInt32x8() (to Int32x8)
func Uint8x32.AsInt32x8() (to Int32x8)
func Int32x16.SetHi(y Int32x8) Int32x16
func Int32x16.SetLo(y Int32x8) Int32x16
func Int32x8.Add(y Int32x8) Int32x8
func Int32x8.AddPairs(y Int32x8) Int32x8
func Int32x8.And(y Int32x8) Int32x8
func Int32x8.AndNot(y Int32x8) Int32x8
func Int32x8.ConcatPermute(y Int32x8, indices Uint32x8) Int32x8
func Int32x8.CopySign(y Int32x8) Int32x8
func Int32x8.Equal(y Int32x8) Mask32x8
func Int32x8.Greater(y Int32x8) Mask32x8
func Int32x8.GreaterEqual(y Int32x8) Mask32x8
func Int32x8.InterleaveHiGrouped(y Int32x8) Int32x8
func Int32x8.InterleaveLoGrouped(y Int32x8) Int32x8
func Int32x8.Less(y Int32x8) Mask32x8
func Int32x8.LessEqual(y Int32x8) Mask32x8
func Int32x8.Max(y Int32x8) Int32x8
func Int32x8.Merge(y Int32x8, mask Mask32x8) Int32x8
func Int32x8.Min(y Int32x8) Int32x8
func Int32x8.Mul(y Int32x8) Int32x8
func Int32x8.MulEvenWiden(y Int32x8) Int64x4
func Int32x8.NotEqual(y Int32x8) Mask32x8
func Int32x8.Or(y Int32x8) Int32x8
func Int32x8.RotateLeft(y Int32x8) Int32x8
func Int32x8.RotateRight(y Int32x8) Int32x8
func Int32x8.SaturateToInt16Concat(y Int32x8) Int16x16
func Int32x8.Select128FromPair(lo, hi uint8, y Int32x8) Int32x8
func Int32x8.SelectFromPairGrouped(a, b, c, d uint8, y Int32x8) Int32x8
func Int32x8.ShiftAllLeftConcat(shift uint8, y Int32x8) Int32x8
func Int32x8.ShiftAllRightConcat(shift uint8, y Int32x8) Int32x8
func Int32x8.ShiftLeft(y Int32x8) Int32x8
func Int32x8.ShiftLeftConcat(y Int32x8, z Int32x8) Int32x8
func Int32x8.ShiftRight(y Int32x8) Int32x8
func Int32x8.ShiftRightConcat(y Int32x8, z Int32x8) Int32x8
func Int32x8.Sub(y Int32x8) Int32x8
func Int32x8.SubPairs(y Int32x8) Int32x8
func Int32x8.Xor(y Int32x8) Int32x8
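ConcatPermute, which appears in the lists above, draws each result element from the concatenation of both inputs. A scalar sketch of that behavior (the helper is illustrative, not package API):

```go
package main

import "fmt"

// concatPermute models ConcatPermute: result[i] = xy[indices[i]], where xy
// is x (lower half) followed by y (upper half), and only the bits needed to
// index xy are taken from each index element.
func concatPermute(x, y, indices []int32) []int32 {
	xy := append(append([]int32{}, x...), y...)
	out := make([]int32, len(x))
	for i, idx := range indices {
		out[i] = xy[int(idx)&(len(xy)-1)] // len(xy) is a power of two
	}
	return out
}

func main() {
	x := []int32{0, 1, 2, 3}
	y := []int32{4, 5, 6, 7}
	fmt.Println(concatPermute(x, y, []int32{7, 0, 8, 3})) // index 8 wraps to 0
}
```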
Int64x2 is a 128-bit SIMD vector of 2 int64 Abs computes the absolute value of each element.
Asm: VPABSQ, CPU Feature: AVX512 Add adds corresponding elements of two vectors.
Asm: VPADDQ, CPU Feature: AVX And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX AndNot performs a bitwise x &^ y.
Asm: VPANDN, CPU Feature: AVX Float32x4 converts from Int64x2 to Float32x4 Float64x2 converts from Int64x2 to Float64x2 Int16x8 converts from Int64x2 to Int16x8 Int32x4 converts from Int64x2 to Int32x4 Int8x16 converts from Int64x2 to Int8x16 Uint16x8 converts from Int64x2 to Uint16x8 Uint32x4 converts from Int64x2 to Uint32x4 Uint64x2 converts from Int64x2 to Uint64x2 Uint8x16 converts from Int64x2 to Uint8x16 Broadcast128 copies element zero of its (128-bit) input to all elements of
the 128-bit output vector.
Asm: VPBROADCASTQ, CPU Feature: AVX2 Broadcast256 copies element zero of its (128-bit) input to all elements of
the 256-bit output vector.
Asm: VPBROADCASTQ, CPU Feature: AVX2 Broadcast512 copies element zero of its (128-bit) input to all elements of
the 512-bit output vector.
Asm: VPBROADCASTQ, CPU Feature: AVX512 Compress performs a compression on vector x using mask by
selecting the elements indicated by mask and packing them to lower-indexed elements.
Asm: VPCOMPRESSQ, CPU Feature: AVX512 ConcatPermute performs a full permutation of vector x, y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the bits needed to index xy are used in indices' elements.
Asm: VPERMI2Q, CPU Feature: AVX512 ConvertToFloat32 converts element values to float32.
Asm: VCVTQQ2PSX, CPU Feature: AVX512 ConvertToFloat64 converts element values to float64.
Asm: VCVTQQ2PD, CPU Feature: AVX512 Equal returns x equals y, elementwise.
Asm: VPCMPEQQ, CPU Feature: AVX Expand performs an expansion on a vector x whose elements are packed to lower parts.
The expansion distributes elements to the positions indicated by mask, from lower mask elements to upper, in order.
Asm: VPEXPANDQ, CPU Feature: AVX512 GetElem retrieves a single constant-indexed element's value.
index results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPEXTRQ, CPU Feature: AVX Greater returns x greater-than y, elementwise.
Asm: VPCMPGTQ, CPU Feature: AVX GreaterEqual returns a mask whose elements indicate whether x >= y
Emulated, CPU Feature: AVX. InterleaveHi interleaves the elements of the high halves of x and y.
Asm: VPUNPCKHQDQ, CPU Feature: AVX InterleaveLo interleaves the elements of the low halves of x and y.
Asm: VPUNPCKLQDQ, CPU Feature: AVX IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x.
x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y
Asm: VPTEST, CPU Feature: AVX LeadingZeros counts the leading zeros of each element in x.
Asm: VPLZCNTQ, CPU Feature: AVX512 Len returns the number of elements in an Int64x2. Less returns a mask whose elements indicate whether x < y.
Emulated, CPU Feature: AVX. LessEqual returns a mask whose elements indicate whether x <= y.
Emulated, CPU Feature: AVX. Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VPMAXSQ, CPU Feature: AVX512 Merge returns x but with elements set to y where mask is false. Min computes the minimum of corresponding elements.
Asm: VPMINSQ, CPU Feature: AVX512 Mul multiplies corresponding elements of two vectors.
Asm: VPMULLQ, CPU Feature: AVX512 Not returns the bitwise complement of x.
Emulated, CPU Feature: AVX. NotEqual returns a mask whose elements indicate whether x != y.
Emulated, CPU Feature: AVX. OnesCount counts the number of set bits in each element.
Asm: VPOPCNTQ, CPU Feature: AVX512VPOPCNTDQ Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPROLQ, CPU Feature: AVX512 RotateAllRight rotates each element to the right by the number of bits specified by the immediate.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPRORQ, CPU Feature: AVX512 RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.
Asm: VPROLVQ, CPU Feature: AVX512 RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.
Asm: VPRORVQ, CPU Feature: AVX512 SaturateToInt16 converts element values to int16.
Conversion is done with saturation on the vector elements.
Asm: VPMOVSQW, CPU Feature: AVX512 SaturateToInt32 converts element values to int32.
Conversion is done with saturation on the vector elements.
Asm: VPMOVSQD, CPU Feature: AVX512 SaturateToInt8 converts element values to int8.
Conversion is done with saturation on the vector elements.
Results are packed to low elements in the returned vector; its upper elements are zero-cleared.
Asm: VPMOVSQB, CPU Feature: AVX512 SaturateToUint8 converts element values to uint8.
Conversion is done with saturation on the vector elements.
Results are packed to low elements in the returned vector; its upper elements are zero-cleared.
Asm: VPMOVSQB, CPU Feature: AVX512 SelectFromPair returns the selection of two elements from the two
vectors x and y, where selector values in the range 0-1 specify
elements from x and values in the range 2-3 specify elements 0-1
of y. When the selectors are constants, the selection can be
implemented in a single instruction.
If the selectors are not constant this will translate to a function
call.
Asm: VSHUFPD, CPU Feature: AVX SetElem sets a single constant-indexed element's value.
index results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPINSRQ, CPU Feature: AVX ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLQ, CPU Feature: AVX ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDQ, CPU Feature: AVX512VBMI2 ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are filled with the sign bit.
Asm: VPSRAQ, CPU Feature: AVX512 ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDQ, CPU Feature: AVX512VBMI2 ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVQ, CPU Feature: AVX2 ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVQ, CPU Feature: AVX512VBMI2 ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are filled with the sign bit.
Asm: VPSRAVQ, CPU Feature: AVX512 ShiftRightConcat shifts each element of x to the right by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVQ, CPU Feature: AVX512VBMI2 Store stores an Int64x2 to an array. StoreMasked stores an Int64x2 to an array,
at those elements enabled by mask.
Asm: VMASKMOVQ, CPU Feature: AVX2 StoreSlice stores x into a slice of at least 2 int64s StoreSlicePart stores the 2 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 2 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x Sub subtracts corresponding elements of two vectors.
Asm: VPSUBQ, CPU Feature: AVX ToMask converts from Int64x2 to Mask64x2; a mask element is set to true when the corresponding vector element is non-zero. TruncateToInt16 converts element values to int16.
Conversion is done with truncation on the vector elements.
Asm: VPMOVQW, CPU Feature: AVX512 TruncateToInt32 converts element values to int32.
Conversion is done with truncation on the vector elements.
Asm: VPMOVQD, CPU Feature: AVX512 TruncateToInt8 converts element values to int8.
Conversion is done with truncation on the vector elements.
Results are packed to low elements in the returned vector; its upper elements are zero-cleared.
Asm: VPMOVQB, CPU Feature: AVX512 Xor performs a bitwise XOR operation between two vectors.
Asm: VPXOR, CPU Feature: AVX
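Compress, described above, packs the mask-selected elements to the low end of the result. A plain-Go model of one vector's worth of lanes (the mask is shown as []bool for illustration, and unselected result elements are assumed zeroed, as with zeroing masking):

```go
package main

import "fmt"

// compress models Compress: elements of x where mask is true are packed
// to the lowest indices of the result; remaining elements are zero.
func compress(x []int64, mask []bool) []int64 {
	out := make([]int64, len(x))
	j := 0
	for i, m := range mask {
		if m {
			out[j] = x[i]
			j++
		}
	}
	return out
}

func main() {
	fmt.Println(compress([]int64{10, 20, 30, 40}, []bool{true, false, true, false})) // [10 30 0 0]
}
```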
Int64x2 : expvar.Var
Int64x2 : fmt.Stringer
func BroadcastInt64x2(x int64) Int64x2
func LoadInt64x2(y *[2]int64) Int64x2
func LoadInt64x2Slice(s []int64) Int64x2
func LoadInt64x2SlicePart(s []int64) Int64x2
func LoadMaskedInt64x2(y *[2]int64, mask Mask64x2) Int64x2
func Float32x4.AsInt64x2() (to Int64x2)
func Float64x2.AsInt64x2() (to Int64x2)
func Float64x2.ConvertToInt64() Int64x2
func Int16x8.AsInt64x2() (to Int64x2)
func Int16x8.ExtendLo2ToInt64x2() Int64x2
func Int32x4.AsInt64x2() (to Int64x2)
func Int32x4.ExtendLo2ToInt64x2() Int64x2
func Int32x4.MulEvenWiden(y Int32x4) Int64x2
func Int64x2.Abs() Int64x2
func Int64x2.Add(y Int64x2) Int64x2
func Int64x2.And(y Int64x2) Int64x2
func Int64x2.AndNot(y Int64x2) Int64x2
func Int64x2.Broadcast128() Int64x2
func Int64x2.Compress(mask Mask64x2) Int64x2
func Int64x2.ConcatPermute(y Int64x2, indices Uint64x2) Int64x2
func Int64x2.Expand(mask Mask64x2) Int64x2
func Int64x2.InterleaveHi(y Int64x2) Int64x2
func Int64x2.InterleaveLo(y Int64x2) Int64x2
func Int64x2.LeadingZeros() Int64x2
func Int64x2.Masked(mask Mask64x2) Int64x2
func Int64x2.Max(y Int64x2) Int64x2
func Int64x2.Merge(y Int64x2, mask Mask64x2) Int64x2
func Int64x2.Min(y Int64x2) Int64x2
func Int64x2.Mul(y Int64x2) Int64x2
func Int64x2.Not() Int64x2
func Int64x2.OnesCount() Int64x2
func Int64x2.Or(y Int64x2) Int64x2
func Int64x2.RotateAllLeft(shift uint8) Int64x2
func Int64x2.RotateAllRight(shift uint8) Int64x2
func Int64x2.RotateLeft(y Int64x2) Int64x2
func Int64x2.RotateRight(y Int64x2) Int64x2
func Int64x2.SelectFromPair(a, b uint8, y Int64x2) Int64x2
func Int64x2.SetElem(index uint8, y int64) Int64x2
func Int64x2.ShiftAllLeft(y uint64) Int64x2
func Int64x2.ShiftAllLeftConcat(shift uint8, y Int64x2) Int64x2
func Int64x2.ShiftAllRight(y uint64) Int64x2
func Int64x2.ShiftAllRightConcat(shift uint8, y Int64x2) Int64x2
func Int64x2.ShiftLeft(y Int64x2) Int64x2
func Int64x2.ShiftLeftConcat(y Int64x2, z Int64x2) Int64x2
func Int64x2.ShiftRight(y Int64x2) Int64x2
func Int64x2.ShiftRightConcat(y Int64x2, z Int64x2) Int64x2
func Int64x2.Sub(y Int64x2) Int64x2
func Int64x2.Xor(y Int64x2) Int64x2
func Int64x4.GetHi() Int64x2
func Int64x4.GetLo() Int64x2
func Int8x16.AsInt64x2() (to Int64x2)
func Int8x16.ExtendLo2ToInt64x2() Int64x2
func Mask64x2.ToInt64x2() (to Int64x2)
func Uint16x8.AsInt64x2() (to Int64x2)
func Uint32x4.AsInt64x2() (to Int64x2)
func Uint64x2.AsInt64x2() (to Int64x2)
func Uint8x16.AsInt64x2() (to Int64x2)
func Int64x2.Add(y Int64x2) Int64x2
func Int64x2.And(y Int64x2) Int64x2
func Int64x2.AndNot(y Int64x2) Int64x2
func Int64x2.ConcatPermute(y Int64x2, indices Uint64x2) Int64x2
func Int64x2.Equal(y Int64x2) Mask64x2
func Int64x2.Greater(y Int64x2) Mask64x2
func Int64x2.GreaterEqual(y Int64x2) Mask64x2
func Int64x2.InterleaveHi(y Int64x2) Int64x2
func Int64x2.InterleaveLo(y Int64x2) Int64x2
func Int64x2.Less(y Int64x2) Mask64x2
func Int64x2.LessEqual(y Int64x2) Mask64x2
func Int64x2.Max(y Int64x2) Int64x2
func Int64x2.Merge(y Int64x2, mask Mask64x2) Int64x2
func Int64x2.Min(y Int64x2) Int64x2
func Int64x2.Mul(y Int64x2) Int64x2
func Int64x2.NotEqual(y Int64x2) Mask64x2
func Int64x2.Or(y Int64x2) Int64x2
func Int64x2.RotateLeft(y Int64x2) Int64x2
func Int64x2.RotateRight(y Int64x2) Int64x2
func Int64x2.SelectFromPair(a, b uint8, y Int64x2) Int64x2
func Int64x2.ShiftAllLeftConcat(shift uint8, y Int64x2) Int64x2
func Int64x2.ShiftAllRightConcat(shift uint8, y Int64x2) Int64x2
func Int64x2.ShiftLeft(y Int64x2) Int64x2
func Int64x2.ShiftLeftConcat(y Int64x2, z Int64x2) Int64x2
func Int64x2.ShiftRight(y Int64x2) Int64x2
func Int64x2.ShiftRightConcat(y Int64x2, z Int64x2) Int64x2
func Int64x2.Sub(y Int64x2) Int64x2
func Int64x2.Xor(y Int64x2) Int64x2
func Int64x4.SetHi(y Int64x2) Int64x4
func Int64x4.SetLo(y Int64x2) Int64x4
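The RotateLeft/RotateRight methods listed above rotate each lane independently; per lane this is ordinary bit rotation, as in math/bits. A sketch over uint64 lanes (helper name illustrative):

```go
package main

import (
	"fmt"
	"math/bits"
)

// rotateLeft models RotateLeft lane by lane: each element of x is rotated
// left by the count in the corresponding element of y (only the low 6 bits
// of the count matter for 64-bit lanes).
func rotateLeft(x, y []uint64) []uint64 {
	out := make([]uint64, len(x))
	for i := range x {
		out[i] = bits.RotateLeft64(x[i], int(y[i]&63))
	}
	return out
}

func main() {
	fmt.Println(rotateLeft([]uint64{0x8000000000000000, 1}, []uint64{1, 4})) // [1 16]
}
```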
Int64x4 is a 256-bit SIMD vector of 4 int64 Abs computes the absolute value of each element.
Asm: VPABSQ, CPU Feature: AVX512 Add adds corresponding elements of two vectors.
Asm: VPADDQ, CPU Feature: AVX2 And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX2 AndNot performs a bitwise x &^ y.
Asm: VPANDN, CPU Feature: AVX2 Float32x8 converts from Int64x4 to Float32x8 Float64x4 converts from Int64x4 to Float64x4 Int16x16 converts from Int64x4 to Int16x16 Int32x8 converts from Int64x4 to Int32x8 Int8x32 converts from Int64x4 to Int8x32 Uint16x16 converts from Int64x4 to Uint16x16 Uint32x8 converts from Int64x4 to Uint32x8 Uint64x4 converts from Int64x4 to Uint64x4 Uint8x32 converts from Int64x4 to Uint8x32 Compress performs a compression on vector x using mask by
selecting the elements indicated by mask and packing them to lower-indexed elements.
Asm: VPCOMPRESSQ, CPU Feature: AVX512 ConcatPermute performs a full permutation of vector x, y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the bits needed to index xy are used in indices' elements.
Asm: VPERMI2Q, CPU Feature: AVX512 ConvertToFloat32 converts element values to float32.
Asm: VCVTQQ2PSY, CPU Feature: AVX512 ConvertToFloat64 converts element values to float64.
Asm: VCVTQQ2PD, CPU Feature: AVX512 Equal returns x equals y, elementwise.
Asm: VPCMPEQQ, CPU Feature: AVX2 Expand performs an expansion on a vector x whose elements are packed to lower parts.
The expansion distributes elements to the positions indicated by mask, from lower mask elements to upper, in order.
Asm: VPEXPANDQ, CPU Feature: AVX512 GetHi returns the upper half of x.
Asm: VEXTRACTI128, CPU Feature: AVX2 GetLo returns the lower half of x.
Asm: VEXTRACTI128, CPU Feature: AVX2 Greater returns x greater-than y, elementwise.
Asm: VPCMPGTQ, CPU Feature: AVX2 GreaterEqual returns a mask whose elements indicate whether x >= y
Emulated, CPU Feature: AVX2. InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.
Asm: VPUNPCKHQDQ, CPU Feature: AVX2 InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.
Asm: VPUNPCKLQDQ, CPU Feature: AVX2 IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x.
x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y
Asm: VPTEST, CPU Feature: AVX LeadingZeros counts the leading zeros of each element in x.
Asm: VPLZCNTQ, CPU Feature: AVX512 Len returns the number of elements in an Int64x4. Less returns a mask whose elements indicate whether x < y.
Emulated, CPU Feature: AVX2. LessEqual returns a mask whose elements indicate whether x <= y.
Emulated, CPU Feature: AVX2. Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VPMAXSQ, CPU Feature: AVX512 Merge returns x but with elements set to y where mask is false. Min computes the minimum of corresponding elements.
Asm: VPMINSQ, CPU Feature: AVX512 Mul multiplies corresponding elements of two vectors.
Asm: VPMULLQ, CPU Feature: AVX512 Not returns the bitwise complement of x.
Emulated, CPU Feature: AVX2. NotEqual returns a mask whose elements indicate whether x != y.
Emulated, CPU Feature: AVX2. OnesCount counts the number of set bits in each element.
Asm: VPOPCNTQ, CPU Feature: AVX512VPOPCNTDQ Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX2 Permute performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 2 bits (values 0-3) of each element of indices are used.
Asm: VPERMQ, CPU Feature: AVX512 RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPROLQ, CPU Feature: AVX512 RotateAllRight rotates each element to the right by the number of bits specified by the immediate.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPRORQ, CPU Feature: AVX512 RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.
Asm: VPROLVQ, CPU Feature: AVX512 RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.
Asm: VPRORVQ, CPU Feature: AVX512 SaturateToInt16 converts element values to int16.
Conversion is done with saturation on the vector elements.
Asm: VPMOVSQW, CPU Feature: AVX512 SaturateToInt32 converts element values to int32.
Conversion is done with saturation on the vector elements.
Asm: VPMOVSQD, CPU Feature: AVX512 SaturateToInt8 converts element values to int8.
Conversion is done with saturation on the vector elements.
Results are packed to low elements in the returned vector; its upper elements are zero-cleared.
Asm: VPMOVSQB, CPU Feature: AVX512 SaturateToUint8 converts element values to uint8.
Conversion is done with saturation on the vector elements.
Results are packed to low elements in the returned vector; its upper elements are zero-cleared.
Asm: VPMOVSQB, CPU Feature: AVX512 Select128FromPair treats the 256-bit vectors x and y as a single vector of four
128-bit elements, and returns a 256-bit result formed by
concatenating the two elements specified by lo and hi.
For example,
{40, 41, 50, 51}.Select128FromPair(3, 0, {60, 61, 70, 71})
returns {70, 71, 40, 41}.
lo, hi result in better performance when they are constants; non-constant values will be translated into a jump table.
lo, hi should be between 0 and 3, inclusive; other values may result in a runtime panic.
Asm: VPERM2I128, CPU Feature: AVX2 SelectFromPairGrouped returns, for each of the two 128-bit halves of
the vectors x and y, the selection of two elements from the two
vectors x and y, where selector values in the range 0-1 specify
elements from x and values in the range 2-3 specify elements 0-1
of y. When the selectors are constants, the selection can be
implemented in a single instruction.
If the selectors are not constant this will translate to a function
call.
Asm: VSHUFPD, CPU Feature: AVX SetHi returns x with its upper half set to y.
Asm: VINSERTI128, CPU Feature: AVX2 SetLo returns x with its lower half set to y.
Asm: VINSERTI128, CPU Feature: AVX2 ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLQ, CPU Feature: AVX2 ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDQ, CPU Feature: AVX512VBMI2 ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are filled with the sign bit.
Asm: VPSRAQ, CPU Feature: AVX512 ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDQ, CPU Feature: AVX512VBMI2 ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVQ, CPU Feature: AVX2 ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVQ, CPU Feature: AVX512VBMI2 ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are filled with the sign bit.
Asm: VPSRAVQ, CPU Feature: AVX512 ShiftRightConcat shifts each element of x to the right by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVQ, CPU Feature: AVX512VBMI2 Store stores an Int64x4 to an array. StoreMasked stores an Int64x4 to an array,
at those elements enabled by mask.
Asm: VMASKMOVQ, CPU Feature: AVX2 StoreSlice stores x into a slice of at least 4 int64s StoreSlicePart stores the 4 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 4 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x Sub subtracts corresponding elements of two vectors.
Asm: VPSUBQ, CPU Feature: AVX2 ToMask converts from Int64x4 to Mask64x4; a mask element is set to true when the corresponding vector element is non-zero. TruncateToInt16 converts element values to int16.
Conversion is done with truncation on the vector elements.
Asm: VPMOVQW, CPU Feature: AVX512 TruncateToInt32 converts element values to int32.
Conversion is done with truncation on the vector elements.
Asm: VPMOVQD, CPU Feature: AVX512 TruncateToInt8 converts element values to int8.
Conversion is done with truncation on the vector elements.
Results are packed to low elements in the returned vector; its upper elements are zero-cleared.
Asm: VPMOVQB, CPU Feature: AVX512 Xor performs a bitwise XOR operation between two vectors.
Asm: VPXOR, CPU Feature: AVX2
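Select128FromPair, described above, picks two 128-bit halves out of the four available across x and y. A scalar model that reproduces the docs' own worked example (helper name illustrative):

```go
package main

import "fmt"

// select128FromPair models Int64x4.Select128FromPair: x and y are viewed as
// four 128-bit halves (0: x lo, 1: x hi, 2: y lo, 3: y hi); the result is
// the half selected by lo followed by the half selected by hi.
func select128FromPair(x, y [4]int64, lo, hi int) [4]int64 {
	halves := [4][2]int64{
		{x[0], x[1]}, {x[2], x[3]},
		{y[0], y[1]}, {y[2], y[3]},
	}
	return [4]int64{halves[lo][0], halves[lo][1], halves[hi][0], halves[hi][1]}
}

func main() {
	// The example from the docs: {40,41,50,51}.Select128FromPair(3, 0, {60,61,70,71})
	fmt.Println(select128FromPair([4]int64{40, 41, 50, 51}, [4]int64{60, 61, 70, 71}, 3, 0)) // [70 71 40 41]
}
```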
Int64x4 : expvar.Var
Int64x4 : fmt.Stringer
func BroadcastInt64x4(x int64) Int64x4
func LoadInt64x4(y *[4]int64) Int64x4
func LoadInt64x4Slice(s []int64) Int64x4
func LoadInt64x4SlicePart(s []int64) Int64x4
func LoadMaskedInt64x4(y *[4]int64, mask Mask64x4) Int64x4
func Float32x4.ConvertToInt64() Int64x4
func Float32x8.AsInt64x4() (to Int64x4)
func Float64x4.AsInt64x4() (to Int64x4)
func Float64x4.ConvertToInt64() Int64x4
func Int16x16.AsInt64x4() (to Int64x4)
func Int16x8.ExtendLo4ToInt64x4() Int64x4
func Int32x4.ExtendToInt64() Int64x4
func Int32x8.AsInt64x4() (to Int64x4)
func Int32x8.MulEvenWiden(y Int32x8) Int64x4
func Int64x2.Broadcast256() Int64x4
func Int64x4.Abs() Int64x4
func Int64x4.Add(y Int64x4) Int64x4
func Int64x4.And(y Int64x4) Int64x4
func Int64x4.AndNot(y Int64x4) Int64x4
func Int64x4.Compress(mask Mask64x4) Int64x4
func Int64x4.ConcatPermute(y Int64x4, indices Uint64x4) Int64x4
func Int64x4.Expand(mask Mask64x4) Int64x4
func Int64x4.InterleaveHiGrouped(y Int64x4) Int64x4
func Int64x4.InterleaveLoGrouped(y Int64x4) Int64x4
func Int64x4.LeadingZeros() Int64x4
func Int64x4.Masked(mask Mask64x4) Int64x4
func Int64x4.Max(y Int64x4) Int64x4
func Int64x4.Merge(y Int64x4, mask Mask64x4) Int64x4
func Int64x4.Min(y Int64x4) Int64x4
func Int64x4.Mul(y Int64x4) Int64x4
func Int64x4.Not() Int64x4
func Int64x4.OnesCount() Int64x4
func Int64x4.Or(y Int64x4) Int64x4
func Int64x4.Permute(indices Uint64x4) Int64x4
func Int64x4.RotateAllLeft(shift uint8) Int64x4
func Int64x4.RotateAllRight(shift uint8) Int64x4
func Int64x4.RotateLeft(y Int64x4) Int64x4
func Int64x4.RotateRight(y Int64x4) Int64x4
func Int64x4.Select128FromPair(lo, hi uint8, y Int64x4) Int64x4
func Int64x4.SelectFromPairGrouped(a, b uint8, y Int64x4) Int64x4
func Int64x4.SetHi(y Int64x2) Int64x4
func Int64x4.SetLo(y Int64x2) Int64x4
func Int64x4.ShiftAllLeft(y uint64) Int64x4
func Int64x4.ShiftAllLeftConcat(shift uint8, y Int64x4) Int64x4
func Int64x4.ShiftAllRight(y uint64) Int64x4
func Int64x4.ShiftAllRightConcat(shift uint8, y Int64x4) Int64x4
func Int64x4.ShiftLeft(y Int64x4) Int64x4
func Int64x4.ShiftLeftConcat(y Int64x4, z Int64x4) Int64x4
func Int64x4.ShiftRight(y Int64x4) Int64x4
func Int64x4.ShiftRightConcat(y Int64x4, z Int64x4) Int64x4
func Int64x4.Sub(y Int64x4) Int64x4
func Int64x4.Xor(y Int64x4) Int64x4
func Int64x8.GetHi() Int64x4
func Int64x8.GetLo() Int64x4
func Int8x16.ExtendLo4ToInt64x4() Int64x4
func Int8x32.AsInt64x4() (to Int64x4)
func Mask64x4.ToInt64x4() (to Int64x4)
func Uint16x16.AsInt64x4() (to Int64x4)
func Uint32x8.AsInt64x4() (to Int64x4)
func Uint64x4.AsInt64x4() (to Int64x4)
func Uint8x32.AsInt64x4() (to Int64x4)
func Int64x4.Add(y Int64x4) Int64x4
func Int64x4.And(y Int64x4) Int64x4
func Int64x4.AndNot(y Int64x4) Int64x4
func Int64x4.ConcatPermute(y Int64x4, indices Uint64x4) Int64x4
func Int64x4.Equal(y Int64x4) Mask64x4
func Int64x4.Greater(y Int64x4) Mask64x4
func Int64x4.GreaterEqual(y Int64x4) Mask64x4
func Int64x4.InterleaveHiGrouped(y Int64x4) Int64x4
func Int64x4.InterleaveLoGrouped(y Int64x4) Int64x4
func Int64x4.Less(y Int64x4) Mask64x4
func Int64x4.LessEqual(y Int64x4) Mask64x4
func Int64x4.Max(y Int64x4) Int64x4
func Int64x4.Merge(y Int64x4, mask Mask64x4) Int64x4
func Int64x4.Min(y Int64x4) Int64x4
func Int64x4.Mul(y Int64x4) Int64x4
func Int64x4.NotEqual(y Int64x4) Mask64x4
func Int64x4.Or(y Int64x4) Int64x4
func Int64x4.RotateLeft(y Int64x4) Int64x4
func Int64x4.RotateRight(y Int64x4) Int64x4
func Int64x4.Select128FromPair(lo, hi uint8, y Int64x4) Int64x4
func Int64x4.SelectFromPairGrouped(a, b uint8, y Int64x4) Int64x4
func Int64x4.ShiftAllLeftConcat(shift uint8, y Int64x4) Int64x4
func Int64x4.ShiftAllRightConcat(shift uint8, y Int64x4) Int64x4
func Int64x4.ShiftLeft(y Int64x4) Int64x4
func Int64x4.ShiftLeftConcat(y Int64x4, z Int64x4) Int64x4
func Int64x4.ShiftRight(y Int64x4) Int64x4
func Int64x4.ShiftRightConcat(y Int64x4, z Int64x4) Int64x4
func Int64x4.Sub(y Int64x4) Int64x4
func Int64x4.Xor(y Int64x4) Int64x4
func Int64x8.SetHi(y Int64x4) Int64x8
func Int64x8.SetLo(y Int64x4) Int64x8
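Masked and Merge, which appear throughout the method lists above, are the two basic mask-selection operations. As a scalar sketch of their documented semantics in plain Go (not the intrinsics, which require GOEXPERIMENT=simd):

```go
package main

import "fmt"

// masked returns x but with elements zeroed where mask is false
// (the documented behavior of Int64x4.Masked).
func masked(x []int64, mask []bool) []int64 {
	out := make([]int64, len(x))
	for i := range x {
		if mask[i] {
			out[i] = x[i]
		}
	}
	return out
}

// merge returns x but with elements set to y where mask is false
// (the documented behavior of Int64x4.Merge).
func merge(x, y []int64, mask []bool) []int64 {
	out := make([]int64, len(x))
	for i := range x {
		if mask[i] {
			out[i] = x[i]
		} else {
			out[i] = y[i]
		}
	}
	return out
}

func main() {
	m := []bool{true, false, true, false}
	fmt.Println(masked([]int64{1, 2, 3, 4}, m))                     // [1 0 3 0]
	fmt.Println(merge([]int64{1, 2, 3, 4}, []int64{9, 9, 9, 9}, m)) // [1 9 3 9]
}
```

As the package documentation notes, on AVX512 the compiler may fuse an operation followed by Masked or Merge into a single masked instruction.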
Int64x8 is a 512-bit SIMD vector of 8 int64 Abs computes the absolute value of each element.
Asm: VPABSQ, CPU Feature: AVX512 Add adds corresponding elements of two vectors.
Asm: VPADDQ, CPU Feature: AVX512 And performs a bitwise AND operation between two vectors.
Asm: VPANDQ, CPU Feature: AVX512 AndNot performs a bitwise x &^ y.
Asm: VPANDNQ, CPU Feature: AVX512 Float32x16 converts from Int64x8 to Float32x16 Float64x8 converts from Int64x8 to Float64x8 Int16x32 converts from Int64x8 to Int16x32 Int32x16 converts from Int64x8 to Int32x16 Int8x64 converts from Int64x8 to Int8x64 Uint16x32 converts from Int64x8 to Uint16x32 Uint32x16 converts from Int64x8 to Uint32x16 Uint64x8 converts from Int64x8 to Uint64x8 Uint8x64 converts from Int64x8 to Uint8x64 Compress performs a compression on vector x using mask by
selecting the elements indicated by mask and packing them into the lower-indexed elements.
Asm: VPCOMPRESSQ, CPU Feature: AVX512 ConcatPermute performs a full permutation of vector x, y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the low-order bits of each element of indices needed to index xy are used.
Asm: VPERMI2Q, CPU Feature: AVX512 ConvertToFloat32 converts element values to float32.
Asm: VCVTQQ2PS, CPU Feature: AVX512 ConvertToFloat64 converts element values to float64.
Asm: VCVTQQ2PD, CPU Feature: AVX512 Equal returns x equals y, elementwise.
Asm: VPCMPEQQ, CPU Feature: AVX512 Expand performs an expansion on a vector x whose elements are packed to lower parts.
The expansion distributes the packed elements into the positions indicated by mask, from lower mask elements to upper, in order.
Asm: VPEXPANDQ, CPU Feature: AVX512 GetHi returns the upper half of x.
Asm: VEXTRACTI64X4, CPU Feature: AVX512 GetLo returns the lower half of x.
Asm: VEXTRACTI64X4, CPU Feature: AVX512 Greater returns x greater-than y, elementwise.
Asm: VPCMPGTQ, CPU Feature: AVX512 GreaterEqual returns x greater-than-or-equals y, elementwise.
Asm: VPCMPQ, CPU Feature: AVX512 InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.
Asm: VPUNPCKHQDQ, CPU Feature: AVX512 InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.
Asm: VPUNPCKLQDQ, CPU Feature: AVX512 LeadingZeros counts the leading zeros of each element in x.
Asm: VPLZCNTQ, CPU Feature: AVX512 Len returns the number of elements in an Int64x8 Less returns x less-than y, elementwise.
Asm: VPCMPQ, CPU Feature: AVX512 LessEqual returns x less-than-or-equals y, elementwise.
Asm: VPCMPQ, CPU Feature: AVX512 Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VPMAXSQ, CPU Feature: AVX512 Merge returns x but with elements set to y where m is false. Min computes the minimum of corresponding elements.
Asm: VPMINSQ, CPU Feature: AVX512 Mul multiplies corresponding elements of two vectors.
Asm: VPMULLQ, CPU Feature: AVX512 Not returns the bitwise complement of x
Emulated, CPU Feature AVX512 NotEqual returns x not-equals y, elementwise.
Asm: VPCMPQ, CPU Feature: AVX512 OnesCount counts the number of set bits in each element.
Asm: VPOPCNTQ, CPU Feature: AVX512VPOPCNTDQ Or performs a bitwise OR operation between two vectors.
Asm: VPORQ, CPU Feature: AVX512 Permute performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 3 bits (values 0-7) of each element of indices is used
Asm: VPERMQ, CPU Feature: AVX512 RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPROLQ, CPU Feature: AVX512 RotateAllRight rotates each element to the right by the number of bits specified by the immediate.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPRORQ, CPU Feature: AVX512 RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.
Asm: VPROLVQ, CPU Feature: AVX512 RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.
Asm: VPRORVQ, CPU Feature: AVX512 SaturateToInt16 converts element values to int16.
Conversion is done with saturation on the vector elements.
Asm: VPMOVSQW, CPU Feature: AVX512 SaturateToInt32 converts element values to int32.
Conversion is done with saturation on the vector elements.
Asm: VPMOVSQD, CPU Feature: AVX512 SaturateToInt8 converts element values to int8.
Conversion is done with saturation on the vector elements.
Results are packed to the low elements of the returned vector; its upper elements are zero-cleared.
Asm: VPMOVSQB, CPU Feature: AVX512 SaturateToUint8 converts element values to uint8.
Conversion is done with saturation on the vector elements.
Results are packed to the low elements of the returned vector; its upper elements are zero-cleared.
Asm: VPMOVSQB, CPU Feature: AVX512 SelectFromPairGrouped returns, for each of the four 128-bit subvectors
of the vectors x and y, the selection of two elements from the two
vectors x and y, where selector values in the range 0-1 specify
elements from x and values in the range 2-3 specify elements 0-1
of y. When the selectors are constants, the selection can be
implemented in a single instruction.
If the selectors are not constant this will translate to a function
call.
Asm: VSHUFPD, CPU Feature: AVX512 SetHi returns x with its upper half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512 SetLo returns x with its lower half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512 ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLQ, CPU Feature: AVX512 ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDQ, CPU Feature: AVX512VBMI2 ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are filled with the sign bit.
Asm: VPSRAQ, CPU Feature: AVX512 ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDQ, CPU Feature: AVX512VBMI2 ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVQ, CPU Feature: AVX512 ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVQ, CPU Feature: AVX512VBMI2 ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are filled with the sign bit.
Asm: VPSRAVQ, CPU Feature: AVX512 ShiftRightConcat shifts each element of x to the right by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVQ, CPU Feature: AVX512VBMI2 Store stores an Int64x8 to an array StoreMasked stores an Int64x8 to an array,
at those elements enabled by mask
Asm: VMOVDQU64, CPU Feature: AVX512 StoreSlice stores x into a slice of at least 8 int64s StoreSlicePart stores the 8 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 8 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x Sub subtracts corresponding elements of two vectors.
Asm: VPSUBQ, CPU Feature: AVX512 ToMask converts from Int64x8 to Mask64x8; a mask element is set to true when the corresponding vector element is non-zero. TruncateToInt16 converts element values to int16.
Conversion is done with truncation on the vector elements.
Asm: VPMOVQW, CPU Feature: AVX512 TruncateToInt32 converts element values to int32.
Conversion is done with truncation on the vector elements.
Asm: VPMOVQD, CPU Feature: AVX512 TruncateToInt8 converts element values to int8.
Conversion is done with truncation on the vector elements.
Results are packed to the low elements of the returned vector; its upper elements are zero-cleared.
Asm: VPMOVQB, CPU Feature: AVX512 Xor performs a bitwise XOR operation between two vectors.
Asm: VPXORQ, CPU Feature: AVX512
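ConcatPermute's documented rule is to index into xy, the concatenation of x (lower half) and y (upper half), keeping only the bits of each index needed to address xy. A plain-Go scalar model of Int64x8.ConcatPermute (a sketch of the semantics, not the intrinsic):

```go
package main

import "fmt"

// concatPermute models Int64x8.ConcatPermute: xy is the concatenation of
// x (lower half) and y (upper half); each result element is xy[indices[i]],
// using only the low 4 bits of each index since xy has 16 elements.
func concatPermute(x, y [8]int64, indices [8]uint64) [8]int64 {
	var xy [16]int64
	copy(xy[:8], x[:])
	copy(xy[8:], y[:])
	var out [8]int64
	for i, idx := range indices {
		out[i] = xy[idx&15]
	}
	return out
}

func main() {
	x := [8]int64{0, 1, 2, 3, 4, 5, 6, 7}
	y := [8]int64{10, 11, 12, 13, 14, 15, 16, 17}
	// Indices 0-7 select from x, 8-15 from y; 16 and 31 wrap to 0 and 15.
	fmt.Println(concatPermute(x, y, [8]uint64{0, 8, 1, 9, 15, 7, 16, 31})) // [0 10 1 11 17 7 0 17]
}
```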
Int64x8 : expvar.Var
Int64x8 : fmt.Stringer
func BroadcastInt64x8(x int64) Int64x8
func LoadInt64x8(y *[8]int64) Int64x8
func LoadInt64x8Slice(s []int64) Int64x8
func LoadInt64x8SlicePart(s []int64) Int64x8
func LoadMaskedInt64x8(y *[8]int64, mask Mask64x8) Int64x8
func Float32x16.AsInt64x8() (to Int64x8)
func Float32x8.ConvertToInt64() Int64x8
func Float64x8.AsInt64x8() (to Int64x8)
func Float64x8.ConvertToInt64() Int64x8
func Int16x32.AsInt64x8() (to Int64x8)
func Int16x8.ExtendToInt64() Int64x8
func Int32x16.AsInt64x8() (to Int64x8)
func Int32x8.ExtendToInt64() Int64x8
func Int64x2.Broadcast512() Int64x8
func Int64x8.Abs() Int64x8
func Int64x8.Add(y Int64x8) Int64x8
func Int64x8.And(y Int64x8) Int64x8
func Int64x8.AndNot(y Int64x8) Int64x8
func Int64x8.Compress(mask Mask64x8) Int64x8
func Int64x8.ConcatPermute(y Int64x8, indices Uint64x8) Int64x8
func Int64x8.Expand(mask Mask64x8) Int64x8
func Int64x8.InterleaveHiGrouped(y Int64x8) Int64x8
func Int64x8.InterleaveLoGrouped(y Int64x8) Int64x8
func Int64x8.LeadingZeros() Int64x8
func Int64x8.Masked(mask Mask64x8) Int64x8
func Int64x8.Max(y Int64x8) Int64x8
func Int64x8.Merge(y Int64x8, mask Mask64x8) Int64x8
func Int64x8.Min(y Int64x8) Int64x8
func Int64x8.Mul(y Int64x8) Int64x8
func Int64x8.Not() Int64x8
func Int64x8.OnesCount() Int64x8
func Int64x8.Or(y Int64x8) Int64x8
func Int64x8.Permute(indices Uint64x8) Int64x8
func Int64x8.RotateAllLeft(shift uint8) Int64x8
func Int64x8.RotateAllRight(shift uint8) Int64x8
func Int64x8.RotateLeft(y Int64x8) Int64x8
func Int64x8.RotateRight(y Int64x8) Int64x8
func Int64x8.SelectFromPairGrouped(a, b uint8, y Int64x8) Int64x8
func Int64x8.SetHi(y Int64x4) Int64x8
func Int64x8.SetLo(y Int64x4) Int64x8
func Int64x8.ShiftAllLeft(y uint64) Int64x8
func Int64x8.ShiftAllLeftConcat(shift uint8, y Int64x8) Int64x8
func Int64x8.ShiftAllRight(y uint64) Int64x8
func Int64x8.ShiftAllRightConcat(shift uint8, y Int64x8) Int64x8
func Int64x8.ShiftLeft(y Int64x8) Int64x8
func Int64x8.ShiftLeftConcat(y Int64x8, z Int64x8) Int64x8
func Int64x8.ShiftRight(y Int64x8) Int64x8
func Int64x8.ShiftRightConcat(y Int64x8, z Int64x8) Int64x8
func Int64x8.Sub(y Int64x8) Int64x8
func Int64x8.Xor(y Int64x8) Int64x8
func Int8x16.ExtendLo8ToInt64x8() Int64x8
func Int8x64.AsInt64x8() (to Int64x8)
func Mask64x8.ToInt64x8() (to Int64x8)
func Uint16x32.AsInt64x8() (to Int64x8)
func Uint32x16.AsInt64x8() (to Int64x8)
func Uint64x8.AsInt64x8() (to Int64x8)
func Uint8x64.AsInt64x8() (to Int64x8)
func Int64x8.Add(y Int64x8) Int64x8
func Int64x8.And(y Int64x8) Int64x8
func Int64x8.AndNot(y Int64x8) Int64x8
func Int64x8.ConcatPermute(y Int64x8, indices Uint64x8) Int64x8
func Int64x8.Equal(y Int64x8) Mask64x8
func Int64x8.Greater(y Int64x8) Mask64x8
func Int64x8.GreaterEqual(y Int64x8) Mask64x8
func Int64x8.InterleaveHiGrouped(y Int64x8) Int64x8
func Int64x8.InterleaveLoGrouped(y Int64x8) Int64x8
func Int64x8.Less(y Int64x8) Mask64x8
func Int64x8.LessEqual(y Int64x8) Mask64x8
func Int64x8.Max(y Int64x8) Int64x8
func Int64x8.Merge(y Int64x8, mask Mask64x8) Int64x8
func Int64x8.Min(y Int64x8) Int64x8
func Int64x8.Mul(y Int64x8) Int64x8
func Int64x8.NotEqual(y Int64x8) Mask64x8
func Int64x8.Or(y Int64x8) Int64x8
func Int64x8.RotateLeft(y Int64x8) Int64x8
func Int64x8.RotateRight(y Int64x8) Int64x8
func Int64x8.SelectFromPairGrouped(a, b uint8, y Int64x8) Int64x8
func Int64x8.ShiftAllLeftConcat(shift uint8, y Int64x8) Int64x8
func Int64x8.ShiftAllRightConcat(shift uint8, y Int64x8) Int64x8
func Int64x8.ShiftLeft(y Int64x8) Int64x8
func Int64x8.ShiftLeftConcat(y Int64x8, z Int64x8) Int64x8
func Int64x8.ShiftRight(y Int64x8) Int64x8
func Int64x8.ShiftRightConcat(y Int64x8, z Int64x8) Int64x8
func Int64x8.Sub(y Int64x8) Int64x8
func Int64x8.Xor(y Int64x8) Int64x8
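ShiftLeftConcat and ShiftRightConcat are funnel shifts: the bits vacated in the shifted x are filled from the second operand. A scalar sketch of one 64-bit element of the left variant in plain Go (not the intrinsic; the shift count is assumed to already be reduced to the range 0-63):

```go
package main

import "fmt"

// shiftLeftConcat models one element of Int64x8.ShiftLeftConcat: x is
// shifted left by s bits and the emptied low bits are filled with the
// upper s bits of z. s is assumed to be in [0, 63].
func shiftLeftConcat(x, z uint64, s uint) uint64 {
	if s == 0 {
		return x
	}
	return x<<s | z>>(64-s)
}

func main() {
	x := uint64(0x00000000000000FF)
	z := uint64(0xA000000000000000)
	// Shift x left by 4; the top 4 bits of z (0xA) fill the low bits.
	fmt.Printf("%#x\n", shiftLeftConcat(x, z, 4)) // 0xffa
}
```

The right variant is symmetric: x shifts right and the low bits of z fill the emptied upper bits.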
Int8x16 is a 128-bit SIMD vector of 16 int8 Abs computes the absolute value of each element.
Asm: VPABSB, CPU Feature: AVX Add adds corresponding elements of two vectors.
Asm: VPADDB, CPU Feature: AVX AddSaturated adds corresponding elements of two vectors with saturation.
Asm: VPADDSB, CPU Feature: AVX And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX AndNot performs a bitwise x &^ y.
Asm: VPANDN, CPU Feature: AVX Float32x4 converts from Int8x16 to Float32x4 Float64x2 converts from Int8x16 to Float64x2 Int16x8 converts from Int8x16 to Int16x8 Int32x4 converts from Int8x16 to Int32x4 Int64x2 converts from Int8x16 to Int64x2 Uint16x8 converts from Int8x16 to Uint16x8 Uint32x4 converts from Int8x16 to Uint32x4 Uint64x2 converts from Int8x16 to Uint64x2 Uint8x16 converts from Int8x16 to Uint8x16 Broadcast128 copies element zero of its (128-bit) input to all elements of
the 128-bit output vector.
Asm: VPBROADCASTB, CPU Feature: AVX2 Broadcast256 copies element zero of its (128-bit) input to all elements of
the 256-bit output vector.
Asm: VPBROADCASTB, CPU Feature: AVX2 Broadcast512 copies element zero of its (128-bit) input to all elements of
the 512-bit output vector.
Asm: VPBROADCASTB, CPU Feature: AVX512 Compress performs a compression on vector x using mask by
selecting the elements indicated by mask and packing them into the lower-indexed elements.
Asm: VPCOMPRESSB, CPU Feature: AVX512VBMI2 ConcatPermute performs a full permutation of vector x, y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the low-order bits of each element of indices needed to index xy are used.
Asm: VPERMI2B, CPU Feature: AVX512VBMI CopySign returns the product of the first operand with -1, 0, or 1,
whichever constant is nearest to the value of the second operand.
Asm: VPSIGNB, CPU Feature: AVX DotProductQuadruple performs dot products on groups of 4 elements of x and y.
DotProductQuadruple(x, y).Add(z) will be optimized to the full form of the underlying instruction.
Asm: VPDPBUSD, CPU Feature: AVXVNNI DotProductQuadrupleSaturated performs dot products on groups of 4 elements of x and y.
DotProductQuadrupleSaturated(x, y).Add(z) will be optimized to the full form of the underlying instruction.
Asm: VPDPBUSDS, CPU Feature: AVXVNNI Equal returns x equals y, elementwise.
Asm: VPCMPEQB, CPU Feature: AVX Expand performs an expansion on a vector x whose elements are packed to lower parts.
The expansion distributes the packed elements into the positions indicated by mask, from lower mask elements to upper, in order.
Asm: VPEXPANDB, CPU Feature: AVX512VBMI2 ExtendLo2ToInt64x2 converts 2 lowest vector element values to int64.
The result vector's elements are sign-extended.
Asm: VPMOVSXBQ, CPU Feature: AVX ExtendLo4ToInt32x4 converts 4 lowest vector element values to int32.
The result vector's elements are sign-extended.
Asm: VPMOVSXBD, CPU Feature: AVX ExtendLo4ToInt64x4 converts 4 lowest vector element values to int64.
The result vector's elements are sign-extended.
Asm: VPMOVSXBQ, CPU Feature: AVX2 ExtendLo8ToInt16x8 converts 8 lowest vector element values to int16.
The result vector's elements are sign-extended.
Asm: VPMOVSXBW, CPU Feature: AVX ExtendLo8ToInt32x8 converts 8 lowest vector element values to int32.
The result vector's elements are sign-extended.
Asm: VPMOVSXBD, CPU Feature: AVX2 ExtendLo8ToInt64x8 converts 8 lowest vector element values to int64.
The result vector's elements are sign-extended.
Asm: VPMOVSXBQ, CPU Feature: AVX512 ExtendToInt16 converts element values to int16.
The result vector's elements are sign-extended.
Asm: VPMOVSXBW, CPU Feature: AVX2 ExtendToInt32 converts element values to int32.
The result vector's elements are sign-extended.
Asm: VPMOVSXBD, CPU Feature: AVX512 GetElem retrieves a single constant-indexed element's value.
index results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPEXTRB, CPU Feature: AVX512 Greater returns x greater-than y, elementwise.
Asm: VPCMPGTB, CPU Feature: AVX GreaterEqual returns a mask whose elements indicate whether x >= y
Emulated, CPU Feature AVX IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x.
x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y
Asm: VPTEST, CPU Feature: AVX Len returns the number of elements in an Int8x16 Less returns a mask whose elements indicate whether x < y
Emulated, CPU Feature AVX LessEqual returns a mask whose elements indicate whether x <= y
Emulated, CPU Feature AVX Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VPMAXSB, CPU Feature: AVX Merge returns x but with elements set to y where mask is false. Min computes the minimum of corresponding elements.
Asm: VPMINSB, CPU Feature: AVX Not returns the bitwise complement of x
Emulated, CPU Feature AVX NotEqual returns a mask whose elements indicate whether x != y
Emulated, CPU Feature AVX OnesCount counts the number of set bits in each element.
Asm: VPOPCNTB, CPU Feature: AVX512BITALG Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX Permute performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 4 bits (values 0-15) of each element of indices are used
Asm: VPERMB, CPU Feature: AVX512VBMI PermuteOrZero performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The lower four bits of each byte-sized index in indices select an element from x,
unless the index's sign bit is set in which case zero is used instead.
Asm: VPSHUFB, CPU Feature: AVX SetElem sets a single constant-indexed element's value.
index results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPINSRB, CPU Feature: AVX Store stores an Int8x16 to an array StoreSlice stores x into a slice of at least 16 int8s StoreSlicePart stores the elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 16 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x Sub subtracts corresponding elements of two vectors.
Asm: VPSUBB, CPU Feature: AVX SubSaturated subtracts corresponding elements of two vectors with saturation.
Asm: VPSUBSB, CPU Feature: AVX ToMask converts from Int8x16 to Mask8x16; a mask element is set to true when the corresponding vector element is non-zero. Xor performs a bitwise XOR operation between two vectors.
Asm: VPXOR, CPU Feature: AVX
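PermuteOrZero follows the documented VPSHUFB rule: the lower four bits of each byte-sized index select an element of x, unless the index's sign bit is set, in which case the result byte is zero. A plain-Go scalar model (a sketch, not the intrinsic):

```go
package main

import "fmt"

// permuteOrZero models Int8x16.PermuteOrZero: the lower four bits of each
// index select an element from x, unless the index's sign bit is set, in
// which case zero is produced instead.
func permuteOrZero(x, indices [16]int8) [16]int8 {
	var out [16]int8
	for i, idx := range indices {
		if idx < 0 { // sign bit set -> zero
			out[i] = 0
		} else {
			out[i] = x[idx&0x0F]
		}
	}
	return out
}

func main() {
	var x [16]int8
	for i := range x {
		x[i] = int8(i + 1) // elements 1..16
	}
	// Reverse the vector, but zero the first two outputs via the sign bit.
	idx := [16]int8{-1, -1, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0}
	fmt.Println(permuteOrZero(x, idx)) // [0 0 14 13 12 11 10 9 8 7 6 5 4 3 2 1]
}
```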
Int8x16 : expvar.Var
Int8x16 : fmt.Stringer
func BroadcastInt8x16(x int8) Int8x16
func LoadInt8x16(y *[16]int8) Int8x16
func LoadInt8x16Slice(s []int8) Int8x16
func LoadInt8x16SlicePart(s []int8) Int8x16
func Float32x4.AsInt8x16() (to Int8x16)
func Float64x2.AsInt8x16() (to Int8x16)
func Int16x16.SaturateToInt8() Int8x16
func Int16x16.SaturateToUint8() Int8x16
func Int16x16.TruncateToInt8() Int8x16
func Int16x8.AsInt8x16() (to Int8x16)
func Int16x8.SaturateToInt8() Int8x16
func Int16x8.SaturateToUint8() Int8x16
func Int16x8.TruncateToInt8() Int8x16
func Int32x16.SaturateToInt8() Int8x16
func Int32x16.SaturateToUint8() Int8x16
func Int32x16.TruncateToInt8() Int8x16
func Int32x4.AsInt8x16() (to Int8x16)
func Int32x4.SaturateToInt8() Int8x16
func Int32x4.SaturateToUint8() Int8x16
func Int32x4.TruncateToInt8() Int8x16
func Int32x8.SaturateToInt8() Int8x16
func Int32x8.SaturateToUint8() Int8x16
func Int32x8.TruncateToInt8() Int8x16
func Int64x2.AsInt8x16() (to Int8x16)
func Int64x2.SaturateToInt8() Int8x16
func Int64x2.SaturateToUint8() Int8x16
func Int64x2.TruncateToInt8() Int8x16
func Int64x4.SaturateToInt8() Int8x16
func Int64x4.SaturateToUint8() Int8x16
func Int64x4.TruncateToInt8() Int8x16
func Int64x8.SaturateToInt8() Int8x16
func Int64x8.SaturateToUint8() Int8x16
func Int64x8.TruncateToInt8() Int8x16
func Int8x16.Abs() Int8x16
func Int8x16.Add(y Int8x16) Int8x16
func Int8x16.AddSaturated(y Int8x16) Int8x16
func Int8x16.And(y Int8x16) Int8x16
func Int8x16.AndNot(y Int8x16) Int8x16
func Int8x16.Broadcast128() Int8x16
func Int8x16.Compress(mask Mask8x16) Int8x16
func Int8x16.ConcatPermute(y Int8x16, indices Uint8x16) Int8x16
func Int8x16.CopySign(y Int8x16) Int8x16
func Int8x16.Expand(mask Mask8x16) Int8x16
func Int8x16.Masked(mask Mask8x16) Int8x16
func Int8x16.Max(y Int8x16) Int8x16
func Int8x16.Merge(y Int8x16, mask Mask8x16) Int8x16
func Int8x16.Min(y Int8x16) Int8x16
func Int8x16.Not() Int8x16
func Int8x16.OnesCount() Int8x16
func Int8x16.Or(y Int8x16) Int8x16
func Int8x16.Permute(indices Uint8x16) Int8x16
func Int8x16.PermuteOrZero(indices Int8x16) Int8x16
func Int8x16.SetElem(index uint8, y int8) Int8x16
func Int8x16.Sub(y Int8x16) Int8x16
func Int8x16.SubSaturated(y Int8x16) Int8x16
func Int8x16.Xor(y Int8x16) Int8x16
func Int8x32.GetHi() Int8x16
func Int8x32.GetLo() Int8x16
func Mask8x16.ToInt8x16() (to Int8x16)
func Uint16x8.AsInt8x16() (to Int8x16)
func Uint32x4.AsInt8x16() (to Int8x16)
func Uint64x2.AsInt8x16() (to Int8x16)
func Uint8x16.AsInt8x16() (to Int8x16)
func Int8x16.Add(y Int8x16) Int8x16
func Int8x16.AddSaturated(y Int8x16) Int8x16
func Int8x16.And(y Int8x16) Int8x16
func Int8x16.AndNot(y Int8x16) Int8x16
func Int8x16.ConcatPermute(y Int8x16, indices Uint8x16) Int8x16
func Int8x16.CopySign(y Int8x16) Int8x16
func Int8x16.Equal(y Int8x16) Mask8x16
func Int8x16.Greater(y Int8x16) Mask8x16
func Int8x16.GreaterEqual(y Int8x16) Mask8x16
func Int8x16.Less(y Int8x16) Mask8x16
func Int8x16.LessEqual(y Int8x16) Mask8x16
func Int8x16.Max(y Int8x16) Int8x16
func Int8x16.Merge(y Int8x16, mask Mask8x16) Int8x16
func Int8x16.Min(y Int8x16) Int8x16
func Int8x16.NotEqual(y Int8x16) Mask8x16
func Int8x16.Or(y Int8x16) Int8x16
func Int8x16.PermuteOrZero(indices Int8x16) Int8x16
func Int8x16.Sub(y Int8x16) Int8x16
func Int8x16.SubSaturated(y Int8x16) Int8x16
func Int8x16.Xor(y Int8x16) Int8x16
func Int8x32.SetHi(y Int8x16) Int8x32
func Int8x32.SetLo(y Int8x16) Int8x32
func Uint8x16.DotProductPairsSaturated(y Int8x16) Int16x8
func Uint8x16.PermuteOrZero(indices Int8x16) Uint8x16
Int8x32 is a 256-bit SIMD vector of 32 int8 Abs computes the absolute value of each element.
Asm: VPABSB, CPU Feature: AVX2 Add adds corresponding elements of two vectors.
Asm: VPADDB, CPU Feature: AVX2 AddSaturated adds corresponding elements of two vectors with saturation.
Asm: VPADDSB, CPU Feature: AVX2 And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX2 AndNot performs a bitwise x &^ y.
Asm: VPANDN, CPU Feature: AVX2 Float32x8 converts from Int8x32 to Float32x8 Float64x4 converts from Int8x32 to Float64x4 Int16x16 converts from Int8x32 to Int16x16 Int32x8 converts from Int8x32 to Int32x8 Int64x4 converts from Int8x32 to Int64x4 Uint16x16 converts from Int8x32 to Uint16x16 Uint32x8 converts from Int8x32 to Uint32x8 Uint64x4 converts from Int8x32 to Uint64x4 Uint8x32 converts from Int8x32 to Uint8x32 Compress performs a compression on vector x using mask by
selecting the elements indicated by mask and packing them into the lower-indexed elements.
Asm: VPCOMPRESSB, CPU Feature: AVX512VBMI2 ConcatPermute performs a full permutation of vector x, y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the low-order bits of each element of indices needed to index xy are used.
Asm: VPERMI2B, CPU Feature: AVX512VBMI CopySign returns the product of the first operand with -1, 0, or 1,
whichever constant is nearest to the value of the second operand.
Asm: VPSIGNB, CPU Feature: AVX2 DotProductQuadruple performs dot products on groups of 4 elements of x and y.
DotProductQuadruple(x, y).Add(z) will be optimized to the full form of the underlying instruction.
Asm: VPDPBUSD, CPU Feature: AVXVNNI DotProductQuadrupleSaturated performs dot products on groups of 4 elements of x and y.
DotProductQuadrupleSaturated(x, y).Add(z) will be optimized to the full form of the underlying instruction.
Asm: VPDPBUSDS, CPU Feature: AVXVNNI Equal returns x equals y, elementwise.
Asm: VPCMPEQB, CPU Feature: AVX2 Expand performs an expansion on a vector x whose elements are packed to lower parts.
The expansion distributes the packed elements into the positions indicated by mask, from lower mask elements to upper, in order.
Asm: VPEXPANDB, CPU Feature: AVX512VBMI2 ExtendToInt16 converts element values to int16.
The result vector's elements are sign-extended.
Asm: VPMOVSXBW, CPU Feature: AVX512 GetHi returns the upper half of x.
Asm: VEXTRACTI128, CPU Feature: AVX2 GetLo returns the lower half of x.
Asm: VEXTRACTI128, CPU Feature: AVX2 Greater returns x greater-than y, elementwise.
Asm: VPCMPGTB, CPU Feature: AVX2 GreaterEqual returns a mask whose elements indicate whether x >= y
Emulated, CPU Feature AVX2 IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x.
x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y
Asm: VPTEST, CPU Feature: AVX Len returns the number of elements in an Int8x32 Less returns a mask whose elements indicate whether x < y
Emulated, CPU Feature AVX2 LessEqual returns a mask whose elements indicate whether x <= y
Emulated, CPU Feature AVX2 Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VPMAXSB, CPU Feature: AVX2 Merge returns x but with elements set to y where mask is false. Min computes the minimum of corresponding elements.
Asm: VPMINSB, CPU Feature: AVX2 Not returns the bitwise complement of x
Emulated, CPU Feature AVX2 NotEqual returns a mask whose elements indicate whether x != y
Emulated, CPU Feature AVX2 OnesCount counts the number of set bits in each element.
Asm: VPOPCNTB, CPU Feature: AVX512BITALG Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX2 Permute performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 5 bits (values 0-31) of each element of indices are used
Asm: VPERMB, CPU Feature: AVX512VBMI PermuteOrZeroGrouped performs a grouped permutation of vector x using indices:
result = {x_group0[indices[0]], x_group0[indices[1]], ..., x_group1[indices[16]], x_group1[indices[17]], ...}
The lower four bits of each byte-sized index in indices select an element from its corresponding group in x,
unless the index's sign bit is set in which case zero is used instead.
Each group is of size 128-bit.
Asm: VPSHUFB, CPU Feature: AVX2 Select128FromPair treats the 256-bit vectors x and y as a single vector of four
128-bit elements, and returns a 256-bit result formed by
concatenating the two elements specified by lo and hi.
For example,
{0x40, 0x41, ..., 0x4f, 0x50, 0x51, ..., 0x5f}.Select128FromPair(3, 0,
{0x60, 0x61, ..., 0x6f, 0x70, 0x71, ..., 0x7f})
returns {0x70, 0x71, ..., 0x7f, 0x40, 0x41, ..., 0x4f}.
lo, hi result in better performance when they are constants; non-constant values will be translated into a jump table.
lo, hi should be between 0 and 3, inclusive; other values may result in a runtime panic.
Asm: VPERM2I128, CPU Feature: AVX2 SetHi returns x with its upper half set to y.
Asm: VINSERTI128, CPU Feature: AVX2 SetLo returns x with its lower half set to y.
Asm: VINSERTI128, CPU Feature: AVX2 Store stores an Int8x32 to an array StoreSlice stores x into a slice of at least 32 int8s StoreSlicePart stores the elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 32 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x Sub subtracts corresponding elements of two vectors.
Asm: VPSUBB, CPU Feature: AVX2 SubSaturated subtracts corresponding elements of two vectors with saturation.
Asm: VPSUBSB, CPU Feature: AVX2 ToMask converts from Int8x32 to Mask8x32; a mask element is set to true when the corresponding vector element is non-zero. Xor performs a bitwise XOR operation between two vectors.
Asm: VPXOR, CPU Feature: AVX2
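The documented Select128FromPair example can be checked against a scalar model: x and y are viewed as four 16-byte lanes (x low, x high, y low, y high), and lo and hi pick the result's low and high lanes. A plain-Go sketch (not the intrinsic):

```go
package main

import "fmt"

// select128FromPair models Int8x32.Select128FromPair over bytes: the pair
// (x, y) is treated as four 16-byte lanes — x[0:16], x[16:32], y[0:16],
// y[16:32] — and the result is lane[lo] followed by lane[hi].
// lo and hi must be between 0 and 3, per the documentation.
func select128FromPair(x, y [32]byte, lo, hi uint8) [32]byte {
	lanes := [4][]byte{x[0:16], x[16:32], y[0:16], y[16:32]}
	var out [32]byte
	copy(out[0:16], lanes[lo])
	copy(out[16:32], lanes[hi])
	return out
}

func main() {
	var x, y [32]byte
	for i := 0; i < 32; i++ {
		x[i] = byte(0x40 + i) // 0x40..0x5f
		y[i] = byte(0x60 + i) // 0x60..0x7f
	}
	r := select128FromPair(x, y, 3, 0)
	// Matches the documented example: {0x70..0x7f, 0x40..0x4f}.
	fmt.Printf("%#x %#x\n", r[0], r[16]) // 0x70 0x40
}
```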
Int8x32 : expvar.Var
Int8x32 : fmt.Stringer
func BroadcastInt8x32(x int8) Int8x32
func LoadInt8x32(y *[32]int8) Int8x32
func LoadInt8x32Slice(s []int8) Int8x32
func LoadInt8x32SlicePart(s []int8) Int8x32
func Float32x8.AsInt8x32() (to Int8x32)
func Float64x4.AsInt8x32() (to Int8x32)
func Int16x16.AsInt8x32() (to Int8x32)
func Int16x32.SaturateToInt8() Int8x32
func Int16x32.TruncateToInt8() Int8x32
func Int32x8.AsInt8x32() (to Int8x32)
func Int64x4.AsInt8x32() (to Int8x32)
func Int8x16.Broadcast256() Int8x32
func Int8x32.Abs() Int8x32
func Int8x32.Add(y Int8x32) Int8x32
func Int8x32.AddSaturated(y Int8x32) Int8x32
func Int8x32.And(y Int8x32) Int8x32
func Int8x32.AndNot(y Int8x32) Int8x32
func Int8x32.Compress(mask Mask8x32) Int8x32
func Int8x32.ConcatPermute(y Int8x32, indices Uint8x32) Int8x32
func Int8x32.CopySign(y Int8x32) Int8x32
func Int8x32.Expand(mask Mask8x32) Int8x32
func Int8x32.Masked(mask Mask8x32) Int8x32
func Int8x32.Max(y Int8x32) Int8x32
func Int8x32.Merge(y Int8x32, mask Mask8x32) Int8x32
func Int8x32.Min(y Int8x32) Int8x32
func Int8x32.Not() Int8x32
func Int8x32.OnesCount() Int8x32
func Int8x32.Or(y Int8x32) Int8x32
func Int8x32.Permute(indices Uint8x32) Int8x32
func Int8x32.PermuteOrZeroGrouped(indices Int8x32) Int8x32
func Int8x32.Select128FromPair(lo, hi uint8, y Int8x32) Int8x32
func Int8x32.SetHi(y Int8x16) Int8x32
func Int8x32.SetLo(y Int8x16) Int8x32
func Int8x32.Sub(y Int8x32) Int8x32
func Int8x32.SubSaturated(y Int8x32) Int8x32
func Int8x32.Xor(y Int8x32) Int8x32
func Int8x64.GetHi() Int8x32
func Int8x64.GetLo() Int8x32
func Mask8x32.ToInt8x32() (to Int8x32)
func Uint16x16.AsInt8x32() (to Int8x32)
func Uint32x8.AsInt8x32() (to Int8x32)
func Uint64x4.AsInt8x32() (to Int8x32)
func Uint8x32.AsInt8x32() (to Int8x32)
func Int8x32.Add(y Int8x32) Int8x32
func Int8x32.AddSaturated(y Int8x32) Int8x32
func Int8x32.And(y Int8x32) Int8x32
func Int8x32.AndNot(y Int8x32) Int8x32
func Int8x32.ConcatPermute(y Int8x32, indices Uint8x32) Int8x32
func Int8x32.CopySign(y Int8x32) Int8x32
func Int8x32.Equal(y Int8x32) Mask8x32
func Int8x32.Greater(y Int8x32) Mask8x32
func Int8x32.GreaterEqual(y Int8x32) Mask8x32
func Int8x32.Less(y Int8x32) Mask8x32
func Int8x32.LessEqual(y Int8x32) Mask8x32
func Int8x32.Max(y Int8x32) Int8x32
func Int8x32.Merge(y Int8x32, mask Mask8x32) Int8x32
func Int8x32.Min(y Int8x32) Int8x32
func Int8x32.NotEqual(y Int8x32) Mask8x32
func Int8x32.Or(y Int8x32) Int8x32
func Int8x32.PermuteOrZeroGrouped(indices Int8x32) Int8x32
func Int8x32.Select128FromPair(lo, hi uint8, y Int8x32) Int8x32
func Int8x32.Sub(y Int8x32) Int8x32
func Int8x32.SubSaturated(y Int8x32) Int8x32
func Int8x32.Xor(y Int8x32) Int8x32
func Int8x64.SetHi(y Int8x32) Int8x64
func Int8x64.SetLo(y Int8x32) Int8x64
func Uint8x32.DotProductPairsSaturated(y Int8x32) Int16x16
func Uint8x32.PermuteOrZeroGrouped(indices Int8x32) Uint8x32
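The PermuteOrZeroGrouped methods indexed above follow the VPSHUFB byte-shuffle rule documented later in this page. As a plain-Go sketch of that rule (this models the documented semantics only; it does not import archsimd, which requires GOEXPERIMENT=simd):

```go
package main

import "fmt"

// permuteOrZeroGrouped models the documented rule: within each 16-byte
// (128-bit) group, the lower four bits of each index select a source
// byte from the same group of x; if the index's sign bit is set, the
// result byte is zero instead.
func permuteOrZeroGrouped(x, indices []int8) []int8 {
	out := make([]int8, len(x))
	for i := range x {
		idx := indices[i]
		if idx < 0 { // sign bit set: zero this lane
			continue
		}
		group := i / 16 * 16 // start of this 16-byte group
		out[i] = x[group+int(idx&0x0f)]
	}
	return out
}

func main() {
	x := make([]int8, 32)
	for i := range x {
		x[i] = int8(i)
	}
	// Reverse each 16-byte group; zero lane 0 of each group with a
	// negative index.
	idx := make([]int8, 32)
	for i := range idx {
		idx[i] = int8(15 - i%16)
	}
	idx[0], idx[16] = -1, -1
	fmt.Println(permuteOrZeroGrouped(x, idx))
}
```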
Int8x64 is a 512-bit SIMD vector of 64 int8. Abs computes the absolute value of each element.
Asm: VPABSB, CPU Feature: AVX512 Add adds corresponding elements of two vectors.
Asm: VPADDB, CPU Feature: AVX512 AddSaturated adds corresponding elements of two vectors with saturation.
Asm: VPADDSB, CPU Feature: AVX512 And performs a bitwise AND operation between two vectors.
Asm: VPANDD, CPU Feature: AVX512 AndNot performs a bitwise x &^ y.
Asm: VPANDND, CPU Feature: AVX512 Float32x16 converts from Int8x64 to Float32x16 Float64x8 converts from Int8x64 to Float64x8 Int16x32 converts from Int8x64 to Int16x32 Int32x16 converts from Int8x64 to Int32x16 Int64x8 converts from Int8x64 to Int64x8 Uint16x32 converts from Int8x64 to Uint16x32 Uint32x16 converts from Int8x64 to Uint32x16 Uint64x8 converts from Int8x64 to Uint64x8 Uint8x64 converts from Int8x64 to Uint8x64 Compress performs a compression on vector x using mask by
selecting the elements indicated by mask and packing them into the lower-indexed elements.
Asm: VPCOMPRESSB, CPU Feature: AVX512VBMI2 ConcatPermute performs a full permutation of vector x, y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the bits needed to represent an index into xy are used from each element of indices.
Asm: VPERMI2B, CPU Feature: AVX512VBMI DotProductQuadruple performs dot products on groups of 4 elements of x and y.
DotProductQuadruple(x, y).Add(z) will be optimized to the full form of the underlying instruction.
Asm: VPDPBUSD, CPU Feature: AVX512VNNI DotProductQuadrupleSaturated performs dot products on groups of 4 elements of x and y, with saturation.
DotProductQuadrupleSaturated(x, y).Add(z) will be optimized to the full form of the underlying instruction.
Asm: VPDPBUSDS, CPU Feature: AVX512VNNI Equal returns x equals y, elementwise.
Asm: VPCMPEQB, CPU Feature: AVX512 Expand performs an expansion on a vector x whose elements are packed into the lower lanes.
The expansion distributes those elements to the positions selected by mask, in order from the lowest mask element to the highest.
Asm: VPEXPANDB, CPU Feature: AVX512VBMI2 GetHi returns the upper half of x.
Asm: VEXTRACTI64X4, CPU Feature: AVX512 GetLo returns the lower half of x.
Asm: VEXTRACTI64X4, CPU Feature: AVX512 Greater returns x greater-than y, elementwise.
Asm: VPCMPGTB, CPU Feature: AVX512 GreaterEqual returns x greater-than-or-equals y, elementwise.
Asm: VPCMPB, CPU Feature: AVX512 Len returns the number of elements in an Int8x64 Less returns x less-than y, elementwise.
Asm: VPCMPB, CPU Feature: AVX512 LessEqual returns x less-than-or-equals y, elementwise.
Asm: VPCMPB, CPU Feature: AVX512 Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VPMAXSB, CPU Feature: AVX512 Merge returns x but with elements set to y where mask is false. Min computes the minimum of corresponding elements.
Asm: VPMINSB, CPU Feature: AVX512 Not returns the bitwise complement of x.
Emulated, CPU Feature: AVX512 NotEqual returns x not-equals y, elementwise.
Asm: VPCMPB, CPU Feature: AVX512 OnesCount counts the number of set bits in each element.
Asm: VPOPCNTB, CPU Feature: AVX512BITALG Or performs a bitwise OR operation between two vectors.
Asm: VPORD, CPU Feature: AVX512 Permute performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 6 bits (values 0-63) of each element of indices are used.
Asm: VPERMB, CPU Feature: AVX512VBMI PermuteOrZeroGrouped performs a grouped permutation of vector x using indices:
result = {x_group0[indices[0]], x_group0[indices[1]], ..., x_group1[indices[16]], x_group1[indices[17]], ...}
The lower four bits of each byte-sized index in indices select an element from its corresponding group in x,
unless the index's sign bit is set in which case zero is used instead.
Each group is of size 128-bit.
Asm: VPSHUFB, CPU Feature: AVX512 SetHi returns x with its upper half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512 SetLo returns x with its lower half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512 Store stores an Int8x64 to an array StoreMasked stores an Int8x64 to an array,
at those elements enabled by mask
Asm: VMOVDQU8, CPU Feature: AVX512 StoreSlice stores x into a slice of at least 64 int8s StoreSlicePart stores the 64 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 64 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x Sub subtracts corresponding elements of two vectors.
Asm: VPSUBB, CPU Feature: AVX512 SubSaturated subtracts corresponding elements of two vectors with saturation.
Asm: VPSUBSB, CPU Feature: AVX512 ToMask converts from Int8x64 to Mask8x64, mask element is set to true when the corresponding vector element is non-zero. Xor performs a bitwise XOR operation between two vectors.
Asm: VPXORD, CPU Feature: AVX512
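The ConcatPermute rule documented above ("only the needed bits to represent xy's index are used") can be modeled in scalar Go. This is a sketch of the documented semantics, not the archsimd implementation; it assumes the concatenated length is a power of two, as it is for every vector shape in this package:

```go
package main

import "fmt"

// concatPermute models the documented ConcatPermute rule: each index
// selects from xy, the concatenation of x (lower half) and y (upper
// half), and only the low bits needed to address xy are consulted.
func concatPermute(x, y []int8, indices []uint8) []int8 {
	xy := append(append([]int8{}, x...), y...)
	mask := len(xy) - 1 // len(xy) is a power of two for all vector shapes
	out := make([]int8, len(x))
	for i, idx := range indices {
		out[i] = xy[int(idx)&mask]
	}
	return out
}

func main() {
	x := []int8{0, 1, 2, 3}
	y := []int8{4, 5, 6, 7}
	// Index 9 exceeds len(xy)-1; only its low 3 bits (9&7 = 1) are used.
	fmt.Println(concatPermute(x, y, []uint8{7, 0, 9, 2})) // [7 0 1 2]
}
```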
Int8x64 : expvar.Var
Int8x64 : fmt.Stringer
func BroadcastInt8x64(x int8) Int8x64
func LoadInt8x64(y *[64]int8) Int8x64
func LoadInt8x64Slice(s []int8) Int8x64
func LoadInt8x64SlicePart(s []int8) Int8x64
func LoadMaskedInt8x64(y *[64]int8, mask Mask8x64) Int8x64
func Float32x16.AsInt8x64() (to Int8x64)
func Float64x8.AsInt8x64() (to Int8x64)
func Int16x32.AsInt8x64() (to Int8x64)
func Int32x16.AsInt8x64() (to Int8x64)
func Int64x8.AsInt8x64() (to Int8x64)
func Int8x16.Broadcast512() Int8x64
func Int8x64.Abs() Int8x64
func Int8x64.Add(y Int8x64) Int8x64
func Int8x64.AddSaturated(y Int8x64) Int8x64
func Int8x64.And(y Int8x64) Int8x64
func Int8x64.AndNot(y Int8x64) Int8x64
func Int8x64.Compress(mask Mask8x64) Int8x64
func Int8x64.ConcatPermute(y Int8x64, indices Uint8x64) Int8x64
func Int8x64.Expand(mask Mask8x64) Int8x64
func Int8x64.Masked(mask Mask8x64) Int8x64
func Int8x64.Max(y Int8x64) Int8x64
func Int8x64.Merge(y Int8x64, mask Mask8x64) Int8x64
func Int8x64.Min(y Int8x64) Int8x64
func Int8x64.Not() Int8x64
func Int8x64.OnesCount() Int8x64
func Int8x64.Or(y Int8x64) Int8x64
func Int8x64.Permute(indices Uint8x64) Int8x64
func Int8x64.PermuteOrZeroGrouped(indices Int8x64) Int8x64
func Int8x64.SetHi(y Int8x32) Int8x64
func Int8x64.SetLo(y Int8x32) Int8x64
func Int8x64.Sub(y Int8x64) Int8x64
func Int8x64.SubSaturated(y Int8x64) Int8x64
func Int8x64.Xor(y Int8x64) Int8x64
func Mask8x64.ToInt8x64() (to Int8x64)
func Uint16x32.AsInt8x64() (to Int8x64)
func Uint32x16.AsInt8x64() (to Int8x64)
func Uint64x8.AsInt8x64() (to Int8x64)
func Uint8x64.AsInt8x64() (to Int8x64)
func Int8x64.Add(y Int8x64) Int8x64
func Int8x64.AddSaturated(y Int8x64) Int8x64
func Int8x64.And(y Int8x64) Int8x64
func Int8x64.AndNot(y Int8x64) Int8x64
func Int8x64.ConcatPermute(y Int8x64, indices Uint8x64) Int8x64
func Int8x64.Equal(y Int8x64) Mask8x64
func Int8x64.Greater(y Int8x64) Mask8x64
func Int8x64.GreaterEqual(y Int8x64) Mask8x64
func Int8x64.Less(y Int8x64) Mask8x64
func Int8x64.LessEqual(y Int8x64) Mask8x64
func Int8x64.Max(y Int8x64) Int8x64
func Int8x64.Merge(y Int8x64, mask Mask8x64) Int8x64
func Int8x64.Min(y Int8x64) Int8x64
func Int8x64.NotEqual(y Int8x64) Mask8x64
func Int8x64.Or(y Int8x64) Int8x64
func Int8x64.PermuteOrZeroGrouped(indices Int8x64) Int8x64
func Int8x64.Sub(y Int8x64) Int8x64
func Int8x64.SubSaturated(y Int8x64) Int8x64
func Int8x64.Xor(y Int8x64) Int8x64
func Uint8x64.DotProductPairsSaturated(y Int8x64) Int16x32
func Uint8x64.PermuteOrZeroGrouped(indices Int8x64) Uint8x64
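Compress and Expand, described in the Int8x64 documentation above, are inverse packing operations. A scalar sketch of the documented semantics (plain Go, no archsimd import):

```go
package main

import "fmt"

// compress models the documented Compress: the mask-selected elements
// of x are packed, in order, into the lowest lanes of the result, and
// the remaining lanes are zeroed.
func compress(x []int8, mask []bool) []int8 {
	out := make([]int8, len(x))
	j := 0
	for i, m := range mask {
		if m {
			out[j] = x[i]
			j++
		}
	}
	return out
}

// expand models the documented Expand, the inverse operation: the
// lowest lanes of x are distributed, in order, to the mask-selected
// positions; unselected positions are zeroed.
func expand(x []int8, mask []bool) []int8 {
	out := make([]int8, len(x))
	j := 0
	for i, m := range mask {
		if m {
			out[i] = x[j]
			j++
		}
	}
	return out
}

func main() {
	x := []int8{10, 20, 30, 40}
	mask := []bool{false, true, false, true}
	fmt.Println(compress(x, mask)) // [20 40 0 0]
	fmt.Println(expand(x, mask))   // [0 10 0 20]
}
```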
Uint16x16 is a 256-bit SIMD vector of 16 uint16. Add adds corresponding elements of two vectors.
Asm: VPADDW, CPU Feature: AVX2 AddPairs horizontally adds adjacent pairs of elements.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].
Asm: VPHADDW, CPU Feature: AVX2 AddSaturated adds corresponding elements of two vectors with saturation.
Asm: VPADDUSW, CPU Feature: AVX2 And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX2 AndNot performs a bitwise x &^ y.
Asm: VPANDN, CPU Feature: AVX2 Float32x8 converts from Uint16x16 to Float32x8 Float64x4 converts from Uint16x16 to Float64x4 Int16x16 converts from Uint16x16 to Int16x16 Int32x8 converts from Uint16x16 to Int32x8 Int64x4 converts from Uint16x16 to Int64x4 Int8x32 converts from Uint16x16 to Int8x32 Uint32x8 converts from Uint16x16 to Uint32x8 Uint64x4 converts from Uint16x16 to Uint64x4 Uint8x32 converts from Uint16x16 to Uint8x32 Average computes the rounded average of corresponding elements.
Asm: VPAVGW, CPU Feature: AVX2 Compress performs a compression on vector x using mask by
selecting the elements indicated by mask and packing them into the lower-indexed elements.
Asm: VPCOMPRESSW, CPU Feature: AVX512VBMI2 ConcatPermute performs a full permutation of vector x, y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the bits needed to represent an index into xy are used from each element of indices.
Asm: VPERMI2W, CPU Feature: AVX512 Equal returns x equals y, elementwise.
Asm: VPCMPEQW, CPU Feature: AVX2 Expand performs an expansion on a vector x whose elements are packed into the lower lanes.
The expansion distributes those elements to the positions selected by mask, in order from the lowest mask element to the highest.
Asm: VPEXPANDW, CPU Feature: AVX512VBMI2 ExtendToUint32 converts element values to uint32.
The result vector's elements are zero-extended.
Asm: VPMOVZXWD, CPU Feature: AVX512 GetHi returns the upper half of x.
Asm: VEXTRACTI128, CPU Feature: AVX2 GetLo returns the lower half of x.
Asm: VEXTRACTI128, CPU Feature: AVX2 Greater returns a mask whose elements indicate whether x > y.
Emulated, CPU Feature: AVX2 GreaterEqual returns a mask whose elements indicate whether x >= y.
Emulated, CPU Feature: AVX2 InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.
Asm: VPUNPCKHWD, CPU Feature: AVX2 InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.
Asm: VPUNPCKLWD, CPU Feature: AVX2 IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x.
x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y
Asm: VPTEST, CPU Feature: AVX Len returns the number of elements in a Uint16x16 Less returns a mask whose elements indicate whether x < y.
Emulated, CPU Feature: AVX2 LessEqual returns a mask whose elements indicate whether x <= y.
Emulated, CPU Feature: AVX2 Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VPMAXUW, CPU Feature: AVX2 Merge returns x but with elements set to y where mask is false. Min computes the minimum of corresponding elements.
Asm: VPMINUW, CPU Feature: AVX2 Mul multiplies corresponding elements of two vectors.
Asm: VPMULLW, CPU Feature: AVX2 MulHigh multiplies elements and stores the high part of the result.
Asm: VPMULHUW, CPU Feature: AVX2 Not returns the bitwise complement of x.
Emulated, CPU Feature: AVX2 NotEqual returns a mask whose elements indicate whether x != y.
Emulated, CPU Feature: AVX2 OnesCount counts the number of set bits in each element.
Asm: VPOPCNTW, CPU Feature: AVX512BITALG Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX2 Permute performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 4 bits (values 0-15) of each element of indices are used.
Asm: VPERMW, CPU Feature: AVX512 PermuteScalarsHiGrouped performs a grouped permutation of vector x using the supplied indices:
result =
{x[0], x[1], x[2], x[3], x[a+4], x[b+4], x[c+4], x[d+4],
x[8], x[9], x[10], x[11], x[a+12], x[b+12], x[c+12], x[d+12]}
Each group is of size 128-bit.
Parameters a,b,c,d should have values between 0 and 3.
If a through d are constants, then an instruction will be inlined, otherwise
a jump table is generated.
Asm: VPSHUFHW, CPU Feature: AVX2 PermuteScalarsLoGrouped performs a grouped permutation of vector x using the supplied indices:
result = {x[a], x[b], x[c], x[d], x[4], x[5], x[6], x[7],
x[a+8], x[b+8], x[c+8], x[d+8], x[12], x[13], x[14], x[15]}
Parameters a,b,c,d should have values between 0 and 3.
If a through d are constants, then an instruction will be inlined, otherwise
a jump table is generated.
Asm: VPSHUFLW, CPU Feature: AVX2 Select128FromPair treats the 256-bit vectors x and y as a single vector of four
128-bit elements, and returns a 256-bit result formed by
concatenating the two elements specified by lo and hi.
For example,
{40, 41, 42, 43, 44, 45, 46, 47, 50, 51, 52, 53, 54, 55, 56, 57}.Select128FromPair(3, 0,
{60, 61, 62, 63, 64, 65, 66, 67, 70, 71, 72, 73, 74, 75, 76, 77})
returns {70, 71, 72, 73, 74, 75, 76, 77, 40, 41, 42, 43, 44, 45, 46, 47}.
lo and hi result in better performance when they are constants; non-constant values are translated into a jump table.
lo, hi should be between 0 and 3, inclusive; other values may result in a runtime panic.
Asm: VPERM2I128, CPU Feature: AVX2 SetHi returns x with its upper half set to y.
Asm: VINSERTI128, CPU Feature: AVX2 SetLo returns x with its lower half set to y.
Asm: VINSERTI128, CPU Feature: AVX2 ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLW, CPU Feature: AVX2 ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value is translated into a jump table.
Asm: VPSHLDW, CPU Feature: AVX512VBMI2 ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are zeroed.
Asm: VPSRLW, CPU Feature: AVX2 ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value is translated into a jump table.
Asm: VPSHRDW, CPU Feature: AVX512VBMI2 ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVW, CPU Feature: AVX512 ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVW, CPU Feature: AVX512VBMI2 ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are zeroed.
Asm: VPSRLVW, CPU Feature: AVX512 ShiftRightConcat shifts each element of x to the right by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVW, CPU Feature: AVX512VBMI2 Store stores a Uint16x16 to an array StoreSlice stores x into a slice of at least 16 uint16s StoreSlicePart stores the 16 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 16 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x Sub subtracts corresponding elements of two vectors.
Asm: VPSUBW, CPU Feature: AVX2 SubPairs horizontally subtracts adjacent pairs of elements.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].
Asm: VPHSUBW, CPU Feature: AVX2 SubSaturated subtracts corresponding elements of two vectors with saturation.
Asm: VPSUBUSW, CPU Feature: AVX2 TruncateToUint8 converts element values to uint8.
Conversion is done with truncation on the vector elements.
Results are packed to the low elements of the returned vector; its upper elements are zero-cleared.
Asm: VPMOVWB, CPU Feature: AVX512 Xor performs a bitwise XOR operation between two vectors.
Asm: VPXOR, CPU Feature: AVX2
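The Select128FromPair rule documented above can be checked against the documentation's own worked example. A scalar sketch for Uint16x16 (plain Go, no archsimd import; each 128-bit element is 8 lanes):

```go
package main

import "fmt"

// select128FromPair models the documented rule for Uint16x16: treat the
// pair (x, y) as four 128-bit elements {x_lo, x_hi, y_lo, y_hi} and
// concatenate the two elements chosen by lo and hi.
func select128FromPair(x []uint16, lo, hi uint8, y []uint16) []uint16 {
	quarters := [4][]uint16{x[:8], x[8:], y[:8], y[8:]}
	out := append([]uint16{}, quarters[lo]...)
	return append(out, quarters[hi]...)
}

func main() {
	x := []uint16{40, 41, 42, 43, 44, 45, 46, 47, 50, 51, 52, 53, 54, 55, 56, 57}
	y := []uint16{60, 61, 62, 63, 64, 65, 66, 67, 70, 71, 72, 73, 74, 75, 76, 77}
	// Reproduces the example in the documentation above.
	fmt.Println(select128FromPair(x, 3, 0, y))
	// [70 71 72 73 74 75 76 77 40 41 42 43 44 45 46 47]
}
```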
Uint16x16 : expvar.Var
Uint16x16 : fmt.Stringer
func BroadcastUint16x16(x uint16) Uint16x16
func LoadUint16x16(y *[16]uint16) Uint16x16
func LoadUint16x16Slice(s []uint16) Uint16x16
func LoadUint16x16SlicePart(s []uint16) Uint16x16
func Float32x8.AsUint16x16() (to Uint16x16)
func Float64x4.AsUint16x16() (to Uint16x16)
func Int16x16.AsUint16x16() (to Uint16x16)
func Int32x8.AsUint16x16() (to Uint16x16)
func Int64x4.AsUint16x16() (to Uint16x16)
func Int8x32.AsUint16x16() (to Uint16x16)
func Uint16x16.Add(y Uint16x16) Uint16x16
func Uint16x16.AddPairs(y Uint16x16) Uint16x16
func Uint16x16.AddSaturated(y Uint16x16) Uint16x16
func Uint16x16.And(y Uint16x16) Uint16x16
func Uint16x16.AndNot(y Uint16x16) Uint16x16
func Uint16x16.Average(y Uint16x16) Uint16x16
func Uint16x16.Compress(mask Mask16x16) Uint16x16
func Uint16x16.ConcatPermute(y Uint16x16, indices Uint16x16) Uint16x16
func Uint16x16.Expand(mask Mask16x16) Uint16x16
func Uint16x16.InterleaveHiGrouped(y Uint16x16) Uint16x16
func Uint16x16.InterleaveLoGrouped(y Uint16x16) Uint16x16
func Uint16x16.Masked(mask Mask16x16) Uint16x16
func Uint16x16.Max(y Uint16x16) Uint16x16
func Uint16x16.Merge(y Uint16x16, mask Mask16x16) Uint16x16
func Uint16x16.Min(y Uint16x16) Uint16x16
func Uint16x16.Mul(y Uint16x16) Uint16x16
func Uint16x16.MulHigh(y Uint16x16) Uint16x16
func Uint16x16.Not() Uint16x16
func Uint16x16.OnesCount() Uint16x16
func Uint16x16.Or(y Uint16x16) Uint16x16
func Uint16x16.Permute(indices Uint16x16) Uint16x16
func Uint16x16.PermuteScalarsHiGrouped(a, b, c, d uint8) Uint16x16
func Uint16x16.PermuteScalarsLoGrouped(a, b, c, d uint8) Uint16x16
func Uint16x16.Select128FromPair(lo, hi uint8, y Uint16x16) Uint16x16
func Uint16x16.SetHi(y Uint16x8) Uint16x16
func Uint16x16.SetLo(y Uint16x8) Uint16x16
func Uint16x16.ShiftAllLeft(y uint64) Uint16x16
func Uint16x16.ShiftAllLeftConcat(shift uint8, y Uint16x16) Uint16x16
func Uint16x16.ShiftAllRight(y uint64) Uint16x16
func Uint16x16.ShiftAllRightConcat(shift uint8, y Uint16x16) Uint16x16
func Uint16x16.ShiftLeft(y Uint16x16) Uint16x16
func Uint16x16.ShiftLeftConcat(y Uint16x16, z Uint16x16) Uint16x16
func Uint16x16.ShiftRight(y Uint16x16) Uint16x16
func Uint16x16.ShiftRightConcat(y Uint16x16, z Uint16x16) Uint16x16
func Uint16x16.Sub(y Uint16x16) Uint16x16
func Uint16x16.SubPairs(y Uint16x16) Uint16x16
func Uint16x16.SubSaturated(y Uint16x16) Uint16x16
func Uint16x16.Xor(y Uint16x16) Uint16x16
func Uint16x32.GetHi() Uint16x16
func Uint16x32.GetLo() Uint16x16
func Uint16x8.Broadcast256() Uint16x16
func Uint32x16.SaturateToUint16() Uint16x16
func Uint32x16.TruncateToUint16() Uint16x16
func Uint32x8.AsUint16x16() (to Uint16x16)
func Uint32x8.SaturateToUint16Concat(y Uint32x8) Uint16x16
func Uint64x4.AsUint16x16() (to Uint16x16)
func Uint8x16.ExtendToUint16() Uint16x16
func Uint8x32.AsUint16x16() (to Uint16x16)
func Uint8x32.SumAbsDiff(y Uint8x32) Uint16x16
func Int16x16.ConcatPermute(y Int16x16, indices Uint16x16) Int16x16
func Int16x16.Permute(indices Uint16x16) Int16x16
func Uint16x16.Add(y Uint16x16) Uint16x16
func Uint16x16.AddPairs(y Uint16x16) Uint16x16
func Uint16x16.AddSaturated(y Uint16x16) Uint16x16
func Uint16x16.And(y Uint16x16) Uint16x16
func Uint16x16.AndNot(y Uint16x16) Uint16x16
func Uint16x16.Average(y Uint16x16) Uint16x16
func Uint16x16.ConcatPermute(y Uint16x16, indices Uint16x16) Uint16x16
func Uint16x16.Equal(y Uint16x16) Mask16x16
func Uint16x16.Greater(y Uint16x16) Mask16x16
func Uint16x16.GreaterEqual(y Uint16x16) Mask16x16
func Uint16x16.InterleaveHiGrouped(y Uint16x16) Uint16x16
func Uint16x16.InterleaveLoGrouped(y Uint16x16) Uint16x16
func Uint16x16.Less(y Uint16x16) Mask16x16
func Uint16x16.LessEqual(y Uint16x16) Mask16x16
func Uint16x16.Max(y Uint16x16) Uint16x16
func Uint16x16.Merge(y Uint16x16, mask Mask16x16) Uint16x16
func Uint16x16.Min(y Uint16x16) Uint16x16
func Uint16x16.Mul(y Uint16x16) Uint16x16
func Uint16x16.MulHigh(y Uint16x16) Uint16x16
func Uint16x16.NotEqual(y Uint16x16) Mask16x16
func Uint16x16.Or(y Uint16x16) Uint16x16
func Uint16x16.Permute(indices Uint16x16) Uint16x16
func Uint16x16.Select128FromPair(lo, hi uint8, y Uint16x16) Uint16x16
func Uint16x16.ShiftAllLeftConcat(shift uint8, y Uint16x16) Uint16x16
func Uint16x16.ShiftAllRightConcat(shift uint8, y Uint16x16) Uint16x16
func Uint16x16.ShiftLeft(y Uint16x16) Uint16x16
func Uint16x16.ShiftLeftConcat(y Uint16x16, z Uint16x16) Uint16x16
func Uint16x16.ShiftRight(y Uint16x16) Uint16x16
func Uint16x16.ShiftRightConcat(y Uint16x16, z Uint16x16) Uint16x16
func Uint16x16.Sub(y Uint16x16) Uint16x16
func Uint16x16.SubPairs(y Uint16x16) Uint16x16
func Uint16x16.SubSaturated(y Uint16x16) Uint16x16
func Uint16x16.Xor(y Uint16x16) Uint16x16
func Uint16x32.SetHi(y Uint16x16) Uint16x32
func Uint16x32.SetLo(y Uint16x16) Uint16x32
Uint16x32 is a 512-bit SIMD vector of 32 uint16. Add adds corresponding elements of two vectors.
Asm: VPADDW, CPU Feature: AVX512 AddSaturated adds corresponding elements of two vectors with saturation.
Asm: VPADDUSW, CPU Feature: AVX512 And performs a bitwise AND operation between two vectors.
Asm: VPANDD, CPU Feature: AVX512 AndNot performs a bitwise x &^ y.
Asm: VPANDND, CPU Feature: AVX512 Float32x16 converts from Uint16x32 to Float32x16 Float64x8 converts from Uint16x32 to Float64x8 Int16x32 converts from Uint16x32 to Int16x32 Int32x16 converts from Uint16x32 to Int32x16 Int64x8 converts from Uint16x32 to Int64x8 Int8x64 converts from Uint16x32 to Int8x64 Uint32x16 converts from Uint16x32 to Uint32x16 Uint64x8 converts from Uint16x32 to Uint64x8 Uint8x64 converts from Uint16x32 to Uint8x64 Average computes the rounded average of corresponding elements.
Asm: VPAVGW, CPU Feature: AVX512 Compress performs a compression on vector x using mask by
selecting the elements indicated by mask and packing them into the lower-indexed elements.
Asm: VPCOMPRESSW, CPU Feature: AVX512VBMI2 ConcatPermute performs a full permutation of vector x, y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the bits needed to represent an index into xy are used from each element of indices.
Asm: VPERMI2W, CPU Feature: AVX512 Equal returns x equals y, elementwise.
Asm: VPCMPEQW, CPU Feature: AVX512 Expand performs an expansion on a vector x whose elements are packed into the lower lanes.
The expansion distributes those elements to the positions selected by mask, in order from the lowest mask element to the highest.
Asm: VPEXPANDW, CPU Feature: AVX512VBMI2 GetHi returns the upper half of x.
Asm: VEXTRACTI64X4, CPU Feature: AVX512 GetLo returns the lower half of x.
Asm: VEXTRACTI64X4, CPU Feature: AVX512 Greater returns x greater-than y, elementwise.
Asm: VPCMPUW, CPU Feature: AVX512 GreaterEqual returns x greater-than-or-equals y, elementwise.
Asm: VPCMPUW, CPU Feature: AVX512 InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.
Asm: VPUNPCKHWD, CPU Feature: AVX512 InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.
Asm: VPUNPCKLWD, CPU Feature: AVX512 Len returns the number of elements in a Uint16x32 Less returns x less-than y, elementwise.
Asm: VPCMPUW, CPU Feature: AVX512 LessEqual returns x less-than-or-equals y, elementwise.
Asm: VPCMPUW, CPU Feature: AVX512 Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VPMAXUW, CPU Feature: AVX512 Merge returns x but with elements set to y where mask is false. Min computes the minimum of corresponding elements.
Asm: VPMINUW, CPU Feature: AVX512 Mul multiplies corresponding elements of two vectors.
Asm: VPMULLW, CPU Feature: AVX512 MulHigh multiplies elements and stores the high part of the result.
Asm: VPMULHUW, CPU Feature: AVX512 Not returns the bitwise complement of x.
Emulated, CPU Feature: AVX512 NotEqual returns x not-equals y, elementwise.
Asm: VPCMPUW, CPU Feature: AVX512 OnesCount counts the number of set bits in each element.
Asm: VPOPCNTW, CPU Feature: AVX512BITALG Or performs a bitwise OR operation between two vectors.
Asm: VPORD, CPU Feature: AVX512 Permute performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 5 bits (values 0-31) of each element of indices are used.
Asm: VPERMW, CPU Feature: AVX512 PermuteScalarsHiGrouped performs a grouped permutation of vector x using the supplied indices:
result =
{ x[0], x[1], x[2], x[3], x[a+4], x[b+4], x[c+4], x[d+4],
x[8], x[9], x[10], x[11], x[a+12], x[b+12], x[c+12], x[d+12],
x[16], x[17], x[18], x[19], x[a+20], x[b+20], x[c+20], x[d+20],
x[24], x[25], x[26], x[27], x[a+28], x[b+28], x[c+28], x[d+28]}
Parameters a,b,c,d should have values between 0 and 3.
If a through d are constants, then an instruction will be inlined, otherwise
a jump table is generated.
Asm: VPSHUFHW, CPU Feature: AVX512 PermuteScalarsLoGrouped performs a grouped permutation of vector x using the supplied indices:
result =
{x[a], x[b], x[c], x[d], x[4], x[5], x[6], x[7],
x[a+8], x[b+8], x[c+8], x[d+8], x[12], x[13], x[14], x[15],
x[a+16], x[b+16], x[c+16], x[d+16], x[20], x[21], x[22], x[23],
x[a+24], x[b+24], x[c+24], x[d+24], x[28], x[29], x[30], x[31]}
Each group is of size 128-bit.
Parameters a,b,c,d should have values between 0 and 3.
If a through d are constants, then an instruction will be inlined, otherwise
a jump table is generated.
Asm: VPSHUFLW, CPU Feature: AVX512 SaturateToUint8 converts element values to uint8.
Conversion is done with saturation on the vector elements.
Asm: VPMOVUSWB, CPU Feature: AVX512 SetHi returns x with its upper half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512 SetLo returns x with its lower half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512 ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLW, CPU Feature: AVX512 ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value is translated into a jump table.
Asm: VPSHLDW, CPU Feature: AVX512VBMI2 ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are zeroed.
Asm: VPSRLW, CPU Feature: AVX512 ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value is translated into a jump table.
Asm: VPSHRDW, CPU Feature: AVX512VBMI2 ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVW, CPU Feature: AVX512 ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVW, CPU Feature: AVX512VBMI2 ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are zeroed.
Asm: VPSRLVW, CPU Feature: AVX512 ShiftRightConcat shifts each element of x to the right by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVW, CPU Feature: AVX512VBMI2 Store stores a Uint16x32 to an array StoreMasked stores a Uint16x32 to an array,
at those elements enabled by mask
Asm: VMOVDQU16, CPU Feature: AVX512 StoreSlice stores x into a slice of at least 32 uint16s StoreSlicePart stores the 32 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 32 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x Sub subtracts corresponding elements of two vectors.
Asm: VPSUBW, CPU Feature: AVX512 SubSaturated subtracts corresponding elements of two vectors with saturation.
Asm: VPSUBUSW, CPU Feature: AVX512 TruncateToUint8 converts element values to uint8.
Conversion is done with truncation on the vector elements.
Asm: VPMOVWB, CPU Feature: AVX512 Xor performs a bitwise XOR operation between two vectors.
Asm: VPXORD, CPU Feature: AVX512
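The ShiftAllLeftConcat and ShiftAllRightConcat descriptions above are 16-bit funnel shifts (VPSHLDW / VPSHRDW). A scalar sketch of the documented semantics for shift counts below the lane width (counts of 16 or more are not modeled faithfully here):

```go
package main

import "fmt"

// shiftAllLeftConcat models the documented left variant: each 16-bit
// lane of x is shifted left and the emptied low bits are filled from
// the top of the matching lane of y. Per the docs, only the lower 5
// bits of shift are used.
func shiftAllLeftConcat(x []uint16, shift uint8, y []uint16) []uint16 {
	s := uint(shift & 0x1f)
	out := make([]uint16, len(x))
	for i := range x {
		out[i] = x[i]<<s | y[i]>>(16-s)
	}
	return out
}

// shiftAllRightConcat models the right variant: the emptied high bits
// of the shifted x are filled from the bottom of y.
func shiftAllRightConcat(x []uint16, shift uint8, y []uint16) []uint16 {
	s := uint(shift & 0x1f)
	out := make([]uint16, len(x))
	for i := range x {
		out[i] = x[i]>>s | y[i]<<(16-s)
	}
	return out
}

func main() {
	fmt.Printf("%#04x\n", shiftAllLeftConcat([]uint16{0x00ff}, 8, []uint16{0xab00})[0])  // 0xffab
	fmt.Printf("%#04x\n", shiftAllRightConcat([]uint16{0xff00}, 8, []uint16{0x00ab})[0]) // 0xabff
}
```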
Uint16x32 : expvar.Var
Uint16x32 : fmt.Stringer
func BroadcastUint16x32(x uint16) Uint16x32
func LoadMaskedUint16x32(y *[32]uint16, mask Mask16x32) Uint16x32
func LoadUint16x32(y *[32]uint16) Uint16x32
func LoadUint16x32Slice(s []uint16) Uint16x32
func LoadUint16x32SlicePart(s []uint16) Uint16x32
func Float32x16.AsUint16x32() (to Uint16x32)
func Float64x8.AsUint16x32() (to Uint16x32)
func Int16x32.AsUint16x32() (to Uint16x32)
func Int32x16.AsUint16x32() (to Uint16x32)
func Int64x8.AsUint16x32() (to Uint16x32)
func Int8x64.AsUint16x32() (to Uint16x32)
func Uint16x32.Add(y Uint16x32) Uint16x32
func Uint16x32.AddSaturated(y Uint16x32) Uint16x32
func Uint16x32.And(y Uint16x32) Uint16x32
func Uint16x32.AndNot(y Uint16x32) Uint16x32
func Uint16x32.Average(y Uint16x32) Uint16x32
func Uint16x32.Compress(mask Mask16x32) Uint16x32
func Uint16x32.ConcatPermute(y Uint16x32, indices Uint16x32) Uint16x32
func Uint16x32.Expand(mask Mask16x32) Uint16x32
func Uint16x32.InterleaveHiGrouped(y Uint16x32) Uint16x32
func Uint16x32.InterleaveLoGrouped(y Uint16x32) Uint16x32
func Uint16x32.Masked(mask Mask16x32) Uint16x32
func Uint16x32.Max(y Uint16x32) Uint16x32
func Uint16x32.Merge(y Uint16x32, mask Mask16x32) Uint16x32
func Uint16x32.Min(y Uint16x32) Uint16x32
func Uint16x32.Mul(y Uint16x32) Uint16x32
func Uint16x32.MulHigh(y Uint16x32) Uint16x32
func Uint16x32.Not() Uint16x32
func Uint16x32.OnesCount() Uint16x32
func Uint16x32.Or(y Uint16x32) Uint16x32
func Uint16x32.Permute(indices Uint16x32) Uint16x32
func Uint16x32.PermuteScalarsHiGrouped(a, b, c, d uint8) Uint16x32
func Uint16x32.PermuteScalarsLoGrouped(a, b, c, d uint8) Uint16x32
func Uint16x32.SetHi(y Uint16x16) Uint16x32
func Uint16x32.SetLo(y Uint16x16) Uint16x32
func Uint16x32.ShiftAllLeft(y uint64) Uint16x32
func Uint16x32.ShiftAllLeftConcat(shift uint8, y Uint16x32) Uint16x32
func Uint16x32.ShiftAllRight(y uint64) Uint16x32
func Uint16x32.ShiftAllRightConcat(shift uint8, y Uint16x32) Uint16x32
func Uint16x32.ShiftLeft(y Uint16x32) Uint16x32
func Uint16x32.ShiftLeftConcat(y Uint16x32, z Uint16x32) Uint16x32
func Uint16x32.ShiftRight(y Uint16x32) Uint16x32
func Uint16x32.ShiftRightConcat(y Uint16x32, z Uint16x32) Uint16x32
func Uint16x32.Sub(y Uint16x32) Uint16x32
func Uint16x32.SubSaturated(y Uint16x32) Uint16x32
func Uint16x32.Xor(y Uint16x32) Uint16x32
func Uint16x8.Broadcast512() Uint16x32
func Uint32x16.AsUint16x32() (to Uint16x32)
func Uint32x16.SaturateToUint16Concat(y Uint32x16) Uint16x32
func Uint64x8.AsUint16x32() (to Uint16x32)
func Uint8x32.ExtendToUint16() Uint16x32
func Uint8x64.AsUint16x32() (to Uint16x32)
func Uint8x64.SumAbsDiff(y Uint8x64) Uint16x32
func Int16x32.ConcatPermute(y Int16x32, indices Uint16x32) Int16x32
func Int16x32.Permute(indices Uint16x32) Int16x32
func Uint16x32.Add(y Uint16x32) Uint16x32
func Uint16x32.AddSaturated(y Uint16x32) Uint16x32
func Uint16x32.And(y Uint16x32) Uint16x32
func Uint16x32.AndNot(y Uint16x32) Uint16x32
func Uint16x32.Average(y Uint16x32) Uint16x32
func Uint16x32.ConcatPermute(y Uint16x32, indices Uint16x32) Uint16x32
func Uint16x32.Equal(y Uint16x32) Mask16x32
func Uint16x32.Greater(y Uint16x32) Mask16x32
func Uint16x32.GreaterEqual(y Uint16x32) Mask16x32
func Uint16x32.InterleaveHiGrouped(y Uint16x32) Uint16x32
func Uint16x32.InterleaveLoGrouped(y Uint16x32) Uint16x32
func Uint16x32.Less(y Uint16x32) Mask16x32
func Uint16x32.LessEqual(y Uint16x32) Mask16x32
func Uint16x32.Max(y Uint16x32) Uint16x32
func Uint16x32.Merge(y Uint16x32, mask Mask16x32) Uint16x32
func Uint16x32.Min(y Uint16x32) Uint16x32
func Uint16x32.Mul(y Uint16x32) Uint16x32
func Uint16x32.MulHigh(y Uint16x32) Uint16x32
func Uint16x32.NotEqual(y Uint16x32) Mask16x32
func Uint16x32.Or(y Uint16x32) Uint16x32
func Uint16x32.Permute(indices Uint16x32) Uint16x32
func Uint16x32.ShiftAllLeftConcat(shift uint8, y Uint16x32) Uint16x32
func Uint16x32.ShiftAllRightConcat(shift uint8, y Uint16x32) Uint16x32
func Uint16x32.ShiftLeft(y Uint16x32) Uint16x32
func Uint16x32.ShiftLeftConcat(y Uint16x32, z Uint16x32) Uint16x32
func Uint16x32.ShiftRight(y Uint16x32) Uint16x32
func Uint16x32.ShiftRightConcat(y Uint16x32, z Uint16x32) Uint16x32
func Uint16x32.Sub(y Uint16x32) Uint16x32
func Uint16x32.SubSaturated(y Uint16x32) Uint16x32
func Uint16x32.Xor(y Uint16x32) Uint16x32
Uint16x8 is a 128-bit SIMD vector of 8 uint16.

Add adds corresponding elements of two vectors.
Asm: VPADDW, CPU Feature: AVX

AddPairs horizontally adds adjacent pairs of elements.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].
Asm: VPHADDW, CPU Feature: AVX

AddSaturated adds corresponding elements of two vectors with saturation.
Asm: VPADDUSW, CPU Feature: AVX

And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX

AndNot performs a bitwise x &^ y.
Asm: VPANDN, CPU Feature: AVX

AsFloat32x4 converts from Uint16x8 to Float32x4.
AsFloat64x2 converts from Uint16x8 to Float64x2.
AsInt16x8 converts from Uint16x8 to Int16x8.
AsInt32x4 converts from Uint16x8 to Int32x4.
AsInt64x2 converts from Uint16x8 to Int64x2.
AsInt8x16 converts from Uint16x8 to Int8x16.
AsUint32x4 converts from Uint16x8 to Uint32x4.
AsUint64x2 converts from Uint16x8 to Uint64x2.
AsUint8x16 converts from Uint16x8 to Uint8x16.

Average computes the rounded average of corresponding elements.
Asm: VPAVGW, CPU Feature: AVX

Broadcast128 copies element zero of its (128-bit) input to all elements of the 128-bit output vector.
Asm: VPBROADCASTW, CPU Feature: AVX2

Broadcast256 copies element zero of its (128-bit) input to all elements of the 256-bit output vector.
Asm: VPBROADCASTW, CPU Feature: AVX2

Broadcast512 copies element zero of its (128-bit) input to all elements of the 512-bit output vector.
Asm: VPBROADCASTW, CPU Feature: AVX512

Compress performs a compression on vector x using mask, selecting elements as indicated by mask and packing them to lower-indexed elements.
Asm: VPCOMPRESSW, CPU Feature: AVX512VBMI2

ConcatPermute performs a full permutation of vectors x and y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the bits needed to represent an index into xy are used in indices' elements.
Asm: VPERMI2W, CPU Feature: AVX512

Equal returns x equals y, elementwise.
Asm: VPCMPEQW, CPU Feature: AVX

Expand performs an expansion on a vector x whose elements are packed into the lower part, distributing them to the positions indicated by mask, from lower mask elements to upper in order.
Asm: VPEXPANDW, CPU Feature: AVX512VBMI2

ExtendLo2ToUint64x2 converts the 2 lowest vector elements to uint64.
The result vector's elements are zero-extended.
Asm: VPMOVZXWQ, CPU Feature: AVX

ExtendLo4ToUint32x4 converts the 4 lowest vector elements to uint32.
The result vector's elements are zero-extended.
Asm: VPMOVZXWD, CPU Feature: AVX

ExtendLo4ToUint64x4 converts the 4 lowest vector elements to uint64.
The result vector's elements are zero-extended.
Asm: VPMOVZXWQ, CPU Feature: AVX2

ExtendToUint32 converts element values to uint32.
The result vector's elements are zero-extended.
Asm: VPMOVZXWD, CPU Feature: AVX2

ExtendToUint64 converts element values to uint64.
The result vector's elements are zero-extended.
Asm: VPMOVZXWQ, CPU Feature: AVX512

GetElem retrieves a single constant-indexed element's value.
index results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPEXTRW, CPU Feature: AVX512

Greater returns a mask whose elements indicate whether x > y.
Emulated, CPU Feature: AVX

GreaterEqual returns a mask whose elements indicate whether x >= y.
Emulated, CPU Feature: AVX

InterleaveHi interleaves the elements of the high halves of x and y.
Asm: VPUNPCKHWD, CPU Feature: AVX

InterleaveLo interleaves the elements of the low halves of x and y.
Asm: VPUNPCKLWD, CPU Feature: AVX

IsZero returns true if all elements of x are zero.
This method compiles to VPTEST x, x.
x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y.
Asm: VPTEST, CPU Feature: AVX

Len returns the number of elements in a Uint16x8.

Less returns a mask whose elements indicate whether x < y.
Emulated, CPU Feature: AVX

LessEqual returns a mask whose elements indicate whether x <= y.
Emulated, CPU Feature: AVX

Masked returns x but with elements zeroed where mask is false.

Max computes the maximum of corresponding elements.
Asm: VPMAXUW, CPU Feature: AVX

Merge returns x but with elements set to y where mask is false.

Min computes the minimum of corresponding elements.
Asm: VPMINUW, CPU Feature: AVX

Mul multiplies corresponding elements of two vectors.
Asm: VPMULLW, CPU Feature: AVX

MulHigh multiplies elements and stores the high part of the result.
Asm: VPMULHUW, CPU Feature: AVX

Not returns the bitwise complement of x.
Emulated, CPU Feature: AVX

NotEqual returns a mask whose elements indicate whether x != y.
Emulated, CPU Feature: AVX

OnesCount counts the number of set bits in each element.
Asm: VPOPCNTW, CPU Feature: AVX512BITALG

Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX

Permute performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 3 bits (values 0-7) of each element of indices are used.
Asm: VPERMW, CPU Feature: AVX512

PermuteScalarsHi performs a permutation of vector x using the supplied indices:
result = {x[0], x[1], x[2], x[3], x[a+4], x[b+4], x[c+4], x[d+4]}
Parameters a, b, c, and d should have values between 0 and 3.
If a through d are constants, an instruction will be inlined; otherwise a jump table is generated.
Asm: VPSHUFHW, CPU Feature: AVX512

PermuteScalarsLo performs a permutation of vector x using the supplied indices:
result = {x[a], x[b], x[c], x[d], x[4], x[5], x[6], x[7]}
Parameters a, b, c, and d should have values between 0 and 3.
If a through d are constants, an instruction will be inlined; otherwise a jump table is generated.
Asm: VPSHUFLW, CPU Feature: AVX512

SetElem sets a single constant-indexed element's value.
index results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPINSRW, CPU Feature: AVX

ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLW, CPU Feature: AVX

ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate (only the lower 5 bits are used), then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDW, CPU Feature: AVX512VBMI2

ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are zeroed.
Asm: VPSRLW, CPU Feature: AVX

ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate (only the lower 5 bits are used), then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDW, CPU Feature: AVX512VBMI2

ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVW, CPU Feature: AVX512

ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVW, CPU Feature: AVX512VBMI2

ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are zeroed.
Asm: VPSRLVW, CPU Feature: AVX512

ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVW, CPU Feature: AVX512VBMI2

Store stores a Uint16x8 to an array.

StoreSlice stores x into a slice of at least 8 uint16s.

StoreSlicePart stores the 8 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 8 or more elements, the method is equivalent to x.StoreSlice.

String returns a string representation of SIMD vector x.

Sub subtracts corresponding elements of two vectors.
Asm: VPSUBW, CPU Feature: AVX

SubPairs horizontally subtracts adjacent pairs of elements.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].
Asm: VPHSUBW, CPU Feature: AVX

SubSaturated subtracts corresponding elements of two vectors with saturation.
Asm: VPSUBUSW, CPU Feature: AVX

TruncateToUint8 converts element values to uint8.
Conversion is done with truncation on the vector elements.
Results are packed to low elements in the returned vector; its upper elements are zero-cleared.
Asm: VPMOVWB, CPU Feature: AVX512

Xor performs a bitwise XOR operation between two vectors.
Asm: VPXOR, CPU Feature: AVX
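The Permute and ConcatPermute docs above describe how indices select elements. As a scalar sketch of those semantics for the 8-element case (hypothetical helper functions written in plain Go, not the package's intrinsics):

```go
package main

import "fmt"

// permute mimics x.Permute(indices) for an 8-element vector:
// only the low 3 bits of each index element are used.
func permute(x, indices [8]uint16) [8]uint16 {
	var r [8]uint16
	for i, idx := range indices {
		r[i] = x[idx&7] // low 3 bits select within x
	}
	return r
}

// concatPermute mimics x.ConcatPermute(y, indices): indices select
// from the 16-element concatenation of x (lower half) and y (upper half).
func concatPermute(x, y, indices [8]uint16) [8]uint16 {
	var xy [16]uint16
	copy(xy[:8], x[:])
	copy(xy[8:], y[:])
	var r [8]uint16
	for i, idx := range indices {
		r[i] = xy[idx&15] // low 4 bits index the concatenation
	}
	return r
}

func main() {
	x := [8]uint16{0, 1, 2, 3, 4, 5, 6, 7}
	y := [8]uint16{10, 11, 12, 13, 14, 15, 16, 17}
	fmt.Println(permute(x, [8]uint16{7, 6, 5, 4, 3, 2, 1, 0}))         // reverse x
	fmt.Println(concatPermute(x, y, [8]uint16{0, 8, 1, 9, 2, 10, 3, 11})) // interleave x and y
}
```

The real methods compile to single VPERMW/VPERMI2W instructions; the sketch only illustrates which elements end up where.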
Uint16x8 : expvar.Var
Uint16x8 : fmt.Stringer
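The StoreSlicePart rule documented above (store as many elements as fit in s) matches the clamping behavior of Go's built-in copy. A scalar sketch (hypothetical helper, not the package's API):

```go
package main

import "fmt"

// storeSlicePart mimics Uint16x8.StoreSlicePart: it stores as many of
// the 8 elements as will fit in s. copy already clamps to min(len(s), 8),
// and returns the number of elements stored.
func storeSlicePart(x [8]uint16, s []uint16) int {
	return copy(s, x[:])
}

func main() {
	x := [8]uint16{1, 2, 3, 4, 5, 6, 7, 8}
	short := make([]uint16, 3)
	n := storeSlicePart(x, short)
	fmt.Println(n, short) // only the first 3 elements fit
}
```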
func BroadcastUint16x8(x uint16) Uint16x8
func LoadUint16x8(y *[8]uint16) Uint16x8
func LoadUint16x8Slice(s []uint16) Uint16x8
func LoadUint16x8SlicePart(s []uint16) Uint16x8
func Float32x4.AsUint16x8() (to Uint16x8)
func Float64x2.AsUint16x8() (to Uint16x8)
func Int16x8.AsUint16x8() (to Uint16x8)
func Int32x4.AsUint16x8() (to Uint16x8)
func Int64x2.AsUint16x8() (to Uint16x8)
func Int8x16.AsUint16x8() (to Uint16x8)
func Uint16x16.GetHi() Uint16x8
func Uint16x16.GetLo() Uint16x8
func Uint16x8.Add(y Uint16x8) Uint16x8
func Uint16x8.AddPairs(y Uint16x8) Uint16x8
func Uint16x8.AddSaturated(y Uint16x8) Uint16x8
func Uint16x8.And(y Uint16x8) Uint16x8
func Uint16x8.AndNot(y Uint16x8) Uint16x8
func Uint16x8.Average(y Uint16x8) Uint16x8
func Uint16x8.Broadcast128() Uint16x8
func Uint16x8.Compress(mask Mask16x8) Uint16x8
func Uint16x8.ConcatPermute(y Uint16x8, indices Uint16x8) Uint16x8
func Uint16x8.Expand(mask Mask16x8) Uint16x8
func Uint16x8.InterleaveHi(y Uint16x8) Uint16x8
func Uint16x8.InterleaveLo(y Uint16x8) Uint16x8
func Uint16x8.Masked(mask Mask16x8) Uint16x8
func Uint16x8.Max(y Uint16x8) Uint16x8
func Uint16x8.Merge(y Uint16x8, mask Mask16x8) Uint16x8
func Uint16x8.Min(y Uint16x8) Uint16x8
func Uint16x8.Mul(y Uint16x8) Uint16x8
func Uint16x8.MulHigh(y Uint16x8) Uint16x8
func Uint16x8.Not() Uint16x8
func Uint16x8.OnesCount() Uint16x8
func Uint16x8.Or(y Uint16x8) Uint16x8
func Uint16x8.Permute(indices Uint16x8) Uint16x8
func Uint16x8.PermuteScalarsHi(a, b, c, d uint8) Uint16x8
func Uint16x8.PermuteScalarsLo(a, b, c, d uint8) Uint16x8
func Uint16x8.SetElem(index uint8, y uint16) Uint16x8
func Uint16x8.ShiftAllLeft(y uint64) Uint16x8
func Uint16x8.ShiftAllLeftConcat(shift uint8, y Uint16x8) Uint16x8
func Uint16x8.ShiftAllRight(y uint64) Uint16x8
func Uint16x8.ShiftAllRightConcat(shift uint8, y Uint16x8) Uint16x8
func Uint16x8.ShiftLeft(y Uint16x8) Uint16x8
func Uint16x8.ShiftLeftConcat(y Uint16x8, z Uint16x8) Uint16x8
func Uint16x8.ShiftRight(y Uint16x8) Uint16x8
func Uint16x8.ShiftRightConcat(y Uint16x8, z Uint16x8) Uint16x8
func Uint16x8.Sub(y Uint16x8) Uint16x8
func Uint16x8.SubPairs(y Uint16x8) Uint16x8
func Uint16x8.SubSaturated(y Uint16x8) Uint16x8
func Uint16x8.Xor(y Uint16x8) Uint16x8
func Uint32x4.AsUint16x8() (to Uint16x8)
func Uint32x4.SaturateToUint16() Uint16x8
func Uint32x4.SaturateToUint16Concat(y Uint32x4) Uint16x8
func Uint32x4.TruncateToUint16() Uint16x8
func Uint32x8.SaturateToUint16() Uint16x8
func Uint32x8.TruncateToUint16() Uint16x8
func Uint64x2.AsUint16x8() (to Uint16x8)
func Uint64x2.SaturateToUint16() Uint16x8
func Uint64x2.TruncateToUint16() Uint16x8
func Uint64x4.SaturateToUint16() Uint16x8
func Uint64x4.TruncateToUint16() Uint16x8
func Uint64x8.SaturateToUint16() Uint16x8
func Uint64x8.TruncateToUint16() Uint16x8
func Uint8x16.AsUint16x8() (to Uint16x8)
func Uint8x16.ExtendLo8ToUint16x8() Uint16x8
func Uint8x16.SumAbsDiff(y Uint8x16) Uint16x8
func Int16x8.ConcatPermute(y Int16x8, indices Uint16x8) Int16x8
func Int16x8.Permute(indices Uint16x8) Int16x8
func Uint16x16.SetHi(y Uint16x8) Uint16x16
func Uint16x16.SetLo(y Uint16x8) Uint16x16
func Uint16x8.Add(y Uint16x8) Uint16x8
func Uint16x8.AddPairs(y Uint16x8) Uint16x8
func Uint16x8.AddSaturated(y Uint16x8) Uint16x8
func Uint16x8.And(y Uint16x8) Uint16x8
func Uint16x8.AndNot(y Uint16x8) Uint16x8
func Uint16x8.Average(y Uint16x8) Uint16x8
func Uint16x8.ConcatPermute(y Uint16x8, indices Uint16x8) Uint16x8
func Uint16x8.Equal(y Uint16x8) Mask16x8
func Uint16x8.Greater(y Uint16x8) Mask16x8
func Uint16x8.GreaterEqual(y Uint16x8) Mask16x8
func Uint16x8.InterleaveHi(y Uint16x8) Uint16x8
func Uint16x8.InterleaveLo(y Uint16x8) Uint16x8
func Uint16x8.Less(y Uint16x8) Mask16x8
func Uint16x8.LessEqual(y Uint16x8) Mask16x8
func Uint16x8.Max(y Uint16x8) Uint16x8
func Uint16x8.Merge(y Uint16x8, mask Mask16x8) Uint16x8
func Uint16x8.Min(y Uint16x8) Uint16x8
func Uint16x8.Mul(y Uint16x8) Uint16x8
func Uint16x8.MulHigh(y Uint16x8) Uint16x8
func Uint16x8.NotEqual(y Uint16x8) Mask16x8
func Uint16x8.Or(y Uint16x8) Uint16x8
func Uint16x8.Permute(indices Uint16x8) Uint16x8
func Uint16x8.ShiftAllLeftConcat(shift uint8, y Uint16x8) Uint16x8
func Uint16x8.ShiftAllRightConcat(shift uint8, y Uint16x8) Uint16x8
func Uint16x8.ShiftLeft(y Uint16x8) Uint16x8
func Uint16x8.ShiftLeftConcat(y Uint16x8, z Uint16x8) Uint16x8
func Uint16x8.ShiftRight(y Uint16x8) Uint16x8
func Uint16x8.ShiftRightConcat(y Uint16x8, z Uint16x8) Uint16x8
func Uint16x8.Sub(y Uint16x8) Uint16x8
func Uint16x8.SubPairs(y Uint16x8) Uint16x8
func Uint16x8.SubSaturated(y Uint16x8) Uint16x8
func Uint16x8.Xor(y Uint16x8) Uint16x8
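Masked and Merge, as documented for Uint16x8 above, differ only in what fills the positions where the mask is false: zero for Masked, the corresponding element of y for Merge. A scalar sketch with the mask modeled as a bool array (hypothetical helpers, not the package's API):

```go
package main

import "fmt"

// masked mimics x.Masked(mask): elements are zeroed where mask is false.
func masked(x [8]uint16, mask [8]bool) [8]uint16 {
	var r [8]uint16
	for i, m := range mask {
		if m {
			r[i] = x[i]
		} // false lanes stay zero
	}
	return r
}

// merge mimics x.Merge(y, mask): elements are taken from y where mask is false.
func merge(x, y [8]uint16, mask [8]bool) [8]uint16 {
	r := x
	for i, m := range mask {
		if !m {
			r[i] = y[i]
		}
	}
	return r
}

func main() {
	x := [8]uint16{1, 2, 3, 4, 5, 6, 7, 8}
	y := [8]uint16{10, 20, 30, 40, 50, 60, 70, 80}
	m := [8]bool{true, false, true, false, true, false, true, false}
	fmt.Println(masked(x, m))
	fmt.Println(merge(x, y, m))
}
```

On AVX512 hardware the compiler can fold an operation followed by Masked or Merge into a single masked instruction, as noted in the package overview.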
Uint32x16 is a 512-bit SIMD vector of 16 uint32.

Add adds corresponding elements of two vectors.
Asm: VPADDD, CPU Feature: AVX512

And performs a bitwise AND operation between two vectors.
Asm: VPANDD, CPU Feature: AVX512

AndNot performs a bitwise x &^ y.
Asm: VPANDND, CPU Feature: AVX512

AsFloat32x16 converts from Uint32x16 to Float32x16.
AsFloat64x8 converts from Uint32x16 to Float64x8.
AsInt16x32 converts from Uint32x16 to Int16x32.
AsInt32x16 converts from Uint32x16 to Int32x16.
AsInt64x8 converts from Uint32x16 to Int64x8.
AsInt8x64 converts from Uint32x16 to Int8x64.
AsUint16x32 converts from Uint32x16 to Uint16x32.
AsUint64x8 converts from Uint32x16 to Uint64x8.
AsUint8x64 converts from Uint32x16 to Uint8x64.

Compress performs a compression on vector x using mask, selecting elements as indicated by mask and packing them to lower-indexed elements.
Asm: VPCOMPRESSD, CPU Feature: AVX512

ConcatPermute performs a full permutation of vectors x and y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the bits needed to represent an index into xy are used in indices' elements.
Asm: VPERMI2D, CPU Feature: AVX512

ConvertToFloat32 converts element values to float32.
Asm: VCVTUDQ2PS, CPU Feature: AVX512

Equal returns x equals y, elementwise.
Asm: VPCMPEQD, CPU Feature: AVX512

Expand performs an expansion on a vector x whose elements are packed into the lower part, distributing them to the positions indicated by mask, from lower mask elements to upper in order.
Asm: VPEXPANDD, CPU Feature: AVX512

GetHi returns the upper half of x.
Asm: VEXTRACTI64X4, CPU Feature: AVX512

GetLo returns the lower half of x.
Asm: VEXTRACTI64X4, CPU Feature: AVX512

Greater returns x greater-than y, elementwise.
Asm: VPCMPUD, CPU Feature: AVX512

GreaterEqual returns x greater-than-or-equals y, elementwise.
Asm: VPCMPUD, CPU Feature: AVX512

InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.
Asm: VPUNPCKHDQ, CPU Feature: AVX512

InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.
Asm: VPUNPCKLDQ, CPU Feature: AVX512

LeadingZeros counts the leading zeros of each element in x.
Asm: VPLZCNTD, CPU Feature: AVX512

Len returns the number of elements in a Uint32x16.

Less returns x less-than y, elementwise.
Asm: VPCMPUD, CPU Feature: AVX512

LessEqual returns x less-than-or-equals y, elementwise.
Asm: VPCMPUD, CPU Feature: AVX512

Masked returns x but with elements zeroed where mask is false.

Max computes the maximum of corresponding elements.
Asm: VPMAXUD, CPU Feature: AVX512

Merge returns x but with elements set to y where mask is false.

Min computes the minimum of corresponding elements.
Asm: VPMINUD, CPU Feature: AVX512

Mul multiplies corresponding elements of two vectors.
Asm: VPMULLD, CPU Feature: AVX512

Not returns the bitwise complement of x.
Emulated, CPU Feature: AVX512

NotEqual returns x not-equals y, elementwise.
Asm: VPCMPUD, CPU Feature: AVX512

OnesCount counts the number of set bits in each element.
Asm: VPOPCNTD, CPU Feature: AVX512VPOPCNTDQ

Or performs a bitwise OR operation between two vectors.
Asm: VPORD, CPU Feature: AVX512

Permute performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 4 bits (values 0-15) of each element of indices are used.
Asm: VPERMD, CPU Feature: AVX512

PermuteScalarsGrouped performs a grouped permutation of vector x using the supplied indices:
result =
{ x[a], x[b], x[c], x[d], x[a+4], x[b+4], x[c+4], x[d+4],
  x[a+8], x[b+8], x[c+8], x[d+8], x[a+12], x[b+12], x[c+12], x[d+12]}
Parameters a, b, c, and d should have values between 0 and 3.
If a through d are constants, an instruction will be inlined; otherwise a jump table is generated.
Asm: VPSHUFD, CPU Feature: AVX512

RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPROLD, CPU Feature: AVX512

RotateAllRight rotates each element to the right by the number of bits specified by the immediate.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPRORD, CPU Feature: AVX512

RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.
Asm: VPROLVD, CPU Feature: AVX512

RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.
Asm: VPRORVD, CPU Feature: AVX512

SaturateToUint16 converts element values to uint16.
Conversion is done with saturation on the vector elements.
Asm: VPMOVUSDW, CPU Feature: AVX512

SaturateToUint16Concat converts element values to uint16.
With each 128 bits as a group: the converted group from the first input vector is packed to the lower part of the result vector, and the converted group from the second input vector is packed to the upper part.
Conversion is done with saturation on the vector elements.
Asm: VPACKUSDW, CPU Feature: AVX512

SelectFromPairGrouped returns, for each of the four 128-bit subvectors of x and y, a selection of four elements from x and y, where selector values in the range 0-3 specify elements of x and values in the range 4-7 specify elements 0-3 of y.
When the selectors are constants and the selection can be implemented in a single instruction, it will be; otherwise it requires two.
If the selectors are not constant, this will translate to a function call.
Asm: VSHUFPS, CPU Feature: AVX512

SetHi returns x with its upper half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512

SetLo returns x with its lower half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512

ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLD, CPU Feature: AVX512

ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate (only the lower 5 bits are used), then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDD, CPU Feature: AVX512VBMI2

ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are zeroed.
Asm: VPSRLD, CPU Feature: AVX512

ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate (only the lower 5 bits are used), then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDD, CPU Feature: AVX512VBMI2

ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVD, CPU Feature: AVX512

ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVD, CPU Feature: AVX512VBMI2

ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are zeroed.
Asm: VPSRLVD, CPU Feature: AVX512

ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVD, CPU Feature: AVX512VBMI2

Store stores a Uint32x16 to an array.

StoreMasked stores a Uint32x16 to an array, at those elements enabled by mask.
Asm: VMOVDQU32, CPU Feature: AVX512

StoreSlice stores x into a slice of at least 16 uint32s.

StoreSlicePart stores the 16 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 16 or more elements, the method is equivalent to x.StoreSlice.

String returns a string representation of SIMD vector x.

Sub subtracts corresponding elements of two vectors.
Asm: VPSUBD, CPU Feature: AVX512

TruncateToUint16 converts element values to uint16.
Conversion is done with truncation on the vector elements.
Asm: VPMOVDW, CPU Feature: AVX512

TruncateToUint8 converts element values to uint8.
Conversion is done with truncation on the vector elements.
Results are packed to low elements in the returned vector; its upper elements are zero-cleared.
Asm: VPMOVDB, CPU Feature: AVX512

Xor performs a bitwise XOR operation between two vectors.
Asm: VPXORD, CPU Feature: AVX512
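The ShiftAllLeftConcat/ShiftAllRightConcat family above are funnel shifts (VPSHLDD/VPSHRDD): the bits shifted out of one operand are replaced by bits funneled in from the other. Per element, the left variant can be sketched in scalar Go as (hypothetical helper, not the intrinsic):

```go
package main

import "fmt"

// shiftAllLeftConcat sketches Uint32x16.ShiftAllLeftConcat for a single
// element: x is shifted left, and the upper bits of y fill the emptied
// lower bits. Only the lower 5 bits of shift are used. A Go shift of a
// uint32 by 32 yields 0, so the s == 0 case needs no special handling.
func shiftAllLeftConcat(x, y uint32, shift uint8) uint32 {
	s := uint(shift) & 31
	return x<<s | y>>(32-s)
}

func main() {
	// Shift left by 8: the top byte of y (0x9a) funnels into the low byte.
	fmt.Printf("%08x\n", shiftAllLeftConcat(0x12345678, 0x9abcdef0, 8))
}
```

The right-shift variant mirrors this: x is shifted right and the low bits of y fill the emptied upper bits.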
Uint32x16 : expvar.Var
Uint32x16 : fmt.Stringer
func BroadcastUint32x16(x uint32) Uint32x16
func LoadMaskedUint32x16(y *[16]uint32, mask Mask32x16) Uint32x16
func LoadUint32x16(y *[16]uint32) Uint32x16
func LoadUint32x16Slice(s []uint32) Uint32x16
func LoadUint32x16SlicePart(s []uint32) Uint32x16
func Float32x16.AsUint32x16() (to Uint32x16)
func Float32x16.ConvertToUint32() Uint32x16
func Float64x8.AsUint32x16() (to Uint32x16)
func Int16x32.AsUint32x16() (to Uint32x16)
func Int32x16.AsUint32x16() (to Uint32x16)
func Int64x8.AsUint32x16() (to Uint32x16)
func Int8x64.AsUint32x16() (to Uint32x16)
func Uint16x16.ExtendToUint32() Uint32x16
func Uint16x32.AsUint32x16() (to Uint32x16)
func Uint32x16.Add(y Uint32x16) Uint32x16
func Uint32x16.And(y Uint32x16) Uint32x16
func Uint32x16.AndNot(y Uint32x16) Uint32x16
func Uint32x16.Compress(mask Mask32x16) Uint32x16
func Uint32x16.ConcatPermute(y Uint32x16, indices Uint32x16) Uint32x16
func Uint32x16.Expand(mask Mask32x16) Uint32x16
func Uint32x16.InterleaveHiGrouped(y Uint32x16) Uint32x16
func Uint32x16.InterleaveLoGrouped(y Uint32x16) Uint32x16
func Uint32x16.LeadingZeros() Uint32x16
func Uint32x16.Masked(mask Mask32x16) Uint32x16
func Uint32x16.Max(y Uint32x16) Uint32x16
func Uint32x16.Merge(y Uint32x16, mask Mask32x16) Uint32x16
func Uint32x16.Min(y Uint32x16) Uint32x16
func Uint32x16.Mul(y Uint32x16) Uint32x16
func Uint32x16.Not() Uint32x16
func Uint32x16.OnesCount() Uint32x16
func Uint32x16.Or(y Uint32x16) Uint32x16
func Uint32x16.Permute(indices Uint32x16) Uint32x16
func Uint32x16.PermuteScalarsGrouped(a, b, c, d uint8) Uint32x16
func Uint32x16.RotateAllLeft(shift uint8) Uint32x16
func Uint32x16.RotateAllRight(shift uint8) Uint32x16
func Uint32x16.RotateLeft(y Uint32x16) Uint32x16
func Uint32x16.RotateRight(y Uint32x16) Uint32x16
func Uint32x16.SelectFromPairGrouped(a, b, c, d uint8, y Uint32x16) Uint32x16
func Uint32x16.SetHi(y Uint32x8) Uint32x16
func Uint32x16.SetLo(y Uint32x8) Uint32x16
func Uint32x16.ShiftAllLeft(y uint64) Uint32x16
func Uint32x16.ShiftAllLeftConcat(shift uint8, y Uint32x16) Uint32x16
func Uint32x16.ShiftAllRight(y uint64) Uint32x16
func Uint32x16.ShiftAllRightConcat(shift uint8, y Uint32x16) Uint32x16
func Uint32x16.ShiftLeft(y Uint32x16) Uint32x16
func Uint32x16.ShiftLeftConcat(y Uint32x16, z Uint32x16) Uint32x16
func Uint32x16.ShiftRight(y Uint32x16) Uint32x16
func Uint32x16.ShiftRightConcat(y Uint32x16, z Uint32x16) Uint32x16
func Uint32x16.Sub(y Uint32x16) Uint32x16
func Uint32x16.Xor(y Uint32x16) Uint32x16
func Uint32x4.Broadcast512() Uint32x16
func Uint64x8.AsUint32x16() (to Uint32x16)
func Uint8x16.ExtendToUint32() Uint32x16
func Uint8x64.AsUint32x16() (to Uint32x16)
func Float32x16.ConcatPermute(y Float32x16, indices Uint32x16) Float32x16
func Float32x16.Permute(indices Uint32x16) Float32x16
func Int32x16.ConcatPermute(y Int32x16, indices Uint32x16) Int32x16
func Int32x16.Permute(indices Uint32x16) Int32x16
func Uint32x16.Add(y Uint32x16) Uint32x16
func Uint32x16.And(y Uint32x16) Uint32x16
func Uint32x16.AndNot(y Uint32x16) Uint32x16
func Uint32x16.ConcatPermute(y Uint32x16, indices Uint32x16) Uint32x16
func Uint32x16.Equal(y Uint32x16) Mask32x16
func Uint32x16.Greater(y Uint32x16) Mask32x16
func Uint32x16.GreaterEqual(y Uint32x16) Mask32x16
func Uint32x16.InterleaveHiGrouped(y Uint32x16) Uint32x16
func Uint32x16.InterleaveLoGrouped(y Uint32x16) Uint32x16
func Uint32x16.Less(y Uint32x16) Mask32x16
func Uint32x16.LessEqual(y Uint32x16) Mask32x16
func Uint32x16.Max(y Uint32x16) Uint32x16
func Uint32x16.Merge(y Uint32x16, mask Mask32x16) Uint32x16
func Uint32x16.Min(y Uint32x16) Uint32x16
func Uint32x16.Mul(y Uint32x16) Uint32x16
func Uint32x16.NotEqual(y Uint32x16) Mask32x16
func Uint32x16.Or(y Uint32x16) Uint32x16
func Uint32x16.Permute(indices Uint32x16) Uint32x16
func Uint32x16.RotateLeft(y Uint32x16) Uint32x16
func Uint32x16.RotateRight(y Uint32x16) Uint32x16
func Uint32x16.SaturateToUint16Concat(y Uint32x16) Uint16x32
func Uint32x16.SelectFromPairGrouped(a, b, c, d uint8, y Uint32x16) Uint32x16
func Uint32x16.ShiftAllLeftConcat(shift uint8, y Uint32x16) Uint32x16
func Uint32x16.ShiftAllRightConcat(shift uint8, y Uint32x16) Uint32x16
func Uint32x16.ShiftLeft(y Uint32x16) Uint32x16
func Uint32x16.ShiftLeftConcat(y Uint32x16, z Uint32x16) Uint32x16
func Uint32x16.ShiftRight(y Uint32x16) Uint32x16
func Uint32x16.ShiftRightConcat(y Uint32x16, z Uint32x16) Uint32x16
func Uint32x16.Sub(y Uint32x16) Uint32x16
func Uint32x16.Xor(y Uint32x16) Uint32x16
func Uint8x64.AESDecryptLastRound(y Uint32x16) Uint8x64
func Uint8x64.AESDecryptOneRound(y Uint32x16) Uint8x64
func Uint8x64.AESEncryptLastRound(y Uint32x16) Uint8x64
func Uint8x64.AESEncryptOneRound(y Uint32x16) Uint8x64
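The Uint32x16 rotate operations listed above correspond, per element, to the scalar rotates in math/bits. A sketch of RotateAllLeft (hypothetical helper, not the intrinsic):

```go
package main

import (
	"fmt"
	"math/bits"
)

// rotateAllLeft sketches Uint32x16.RotateAllLeft using the scalar
// equivalent from math/bits; the real method compiles to a single VPROLD.
func rotateAllLeft(x [16]uint32, shift uint8) [16]uint32 {
	var r [16]uint32
	for i, v := range x {
		r[i] = bits.RotateLeft32(v, int(shift))
	}
	return r
}

func main() {
	x := [16]uint32{0x80000001} // remaining elements are zero
	// Rotating left by 1 wraps the top bit into the bottom: 0x00000003.
	fmt.Printf("%08x\n", rotateAllLeft(x, 1)[0])
}
```

RotateLeft/RotateRight (VPROLVD/VPRORVD) work the same way but take the per-element rotate counts from a second vector instead of an immediate.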
Uint32x4 is a 128-bit SIMD vector of 4 uint32 AESInvMixColumns performs the InvMixColumns operation in AES cipher algorithm defined in FIPS 197.
x is the chunk of w array in use.
result = InvMixColumns(x)
Asm: VAESIMC, CPU Feature: AVX, AES AESRoundKeyGenAssist performs some components of KeyExpansion in AES cipher algorithm defined in FIPS 197.
x is an array of AES words, but only x[0] and x[2] are used.
r is a value from the Rcon constant array.
result[0] = XOR(SubWord(RotWord(x[0])), r)
result[1] = SubWord(x[1])
result[2] = XOR(SubWord(RotWord(x[2])), r)
result[3] = SubWord(x[3])
rconVal results in better performance when it's a constant, a non-constant value will be translated into a jump table.
Asm: VAESKEYGENASSIST, CPU Feature: AVX, AES Add adds corresponding elements of two vectors.
Asm: VPADDD, CPU Feature: AVX AddPairs horizontally adds adjacent pairs of elements.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].
Asm: VPHADDD, CPU Feature: AVX And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX AndNot performs a bitwise x &^ y.
Asm: VPANDN, CPU Feature: AVX Float32x4 converts from Uint32x4 to Float32x4 Float64x2 converts from Uint32x4 to Float64x2 Int16x8 converts from Uint32x4 to Int16x8 Int32x4 converts from Uint32x4 to Int32x4 Int64x2 converts from Uint32x4 to Int64x2 Int8x16 converts from Uint32x4 to Int8x16 Uint16x8 converts from Uint32x4 to Uint16x8 Uint64x2 converts from Uint32x4 to Uint64x2 Uint8x16 converts from Uint32x4 to Uint8x16 Broadcast128 copies element zero of its (128-bit) input to all elements of
the 128-bit output vector.
Asm: VPBROADCASTD, CPU Feature: AVX2 Broadcast256 copies element zero of its (128-bit) input to all elements of
the 256-bit output vector.
Asm: VPBROADCASTD, CPU Feature: AVX2 Broadcast512 copies element zero of its (128-bit) input to all elements of
the 512-bit output vector.
Asm: VPBROADCASTD, CPU Feature: AVX512 Compress performs a compression on vector x using mask by
selecting elements as indicated by mask, and pack them to lower indexed elements.
Asm: VPCOMPRESSD, CPU Feature: AVX512 ConcatPermute performs a full permutation of vector x, y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the needed bits to represent xy's index are used in indices' elements.
Asm: VPERMI2D, CPU Feature: AVX512 ConvertToFloat32 converts element values to float32.
Asm: VCVTUDQ2PS, CPU Feature: AVX512 ConvertToFloat64 converts element values to float64.
Asm: VCVTUDQ2PD, CPU Feature: AVX512 Equal returns x equals y, elementwise.
Asm: VPCMPEQD, CPU Feature: AVX Expand performs an expansion on a vector x whose elements are packed to lower parts.
The expansion is to distribute elements as indexed by mask, from lower mask elements to upper in order.
Asm: VPEXPANDD, CPU Feature: AVX512 ExtendLo2ToUint64x2 converts 2 lowest vector element values to uint64.
The result vector's elements are zero-extended.
Asm: VPMOVZXDQ, CPU Feature: AVX ExtendToUint64 converts element values to uint64.
The result vector's elements are zero-extended.
Asm: VPMOVZXDQ, CPU Feature: AVX2 GetElem retrieves a single constant-indexed element's value.
index results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPEXTRD, CPU Feature: AVX Greater returns a mask whose elements indicate whether x > y
Emulated, CPU Feature: AVX GreaterEqual returns a mask whose elements indicate whether x >= y
Emulated, CPU Feature: AVX InterleaveHi interleaves the elements of the high halves of x and y.
Asm: VPUNPCKHDQ, CPU Feature: AVX InterleaveLo interleaves the elements of the low halves of x and y.
Asm: VPUNPCKLDQ, CPU Feature: AVX IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x.
x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y
Asm: VPTEST, CPU Feature: AVX LeadingZeros counts the leading zeros of each element in x.
Asm: VPLZCNTD, CPU Feature: AVX512 Len returns the number of elements in a Uint32x4 Less returns a mask whose elements indicate whether x < y
Emulated, CPU Feature: AVX LessEqual returns a mask whose elements indicate whether x <= y
Emulated, CPU Feature: AVX Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VPMAXUD, CPU Feature: AVX Merge returns x but with elements set to y where mask is false. Min computes the minimum of corresponding elements.
Asm: VPMINUD, CPU Feature: AVX Mul multiplies corresponding elements of two vectors.
Asm: VPMULLD, CPU Feature: AVX MulEvenWiden multiplies even-indexed elements, widening the result.
Result[i] = v1.Even[i] * v2.Even[i].
Asm: VPMULUDQ, CPU Feature: AVX Not returns the bitwise complement of x
Emulated, CPU Feature: AVX NotEqual returns a mask whose elements indicate whether x != y
Emulated, CPU Feature: AVX OnesCount counts the number of set bits in each element.
Asm: VPOPCNTD, CPU Feature: AVX512VPOPCNTDQ Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX PermuteScalars performs a permutation of vector x's elements using the supplied indices:
result = {x[a], x[b], x[c], x[d]}
Parameters a,b,c,d should have values between 0 and 3.
If a through d are constants, then an instruction will be inlined; otherwise
a jump table may be generated.
Asm: VPSHUFD, CPU Feature: AVX RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPROLD, CPU Feature: AVX512 RotateAllRight rotates each element to the right by the number of bits specified by the immediate.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPRORD, CPU Feature: AVX512 RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.
Asm: VPROLVD, CPU Feature: AVX512 RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.
Asm: VPRORVD, CPU Feature: AVX512 SHA1FourRounds performs 4 rounds of the main loop of the SHA-1 algorithm defined in FIPS 180-4.
x contains the state variables a, b, c and d from upper to lower order.
y contains the W array elements (with the state variable e added to the upper element) from upper to lower order.
result = the state variables a', b', c', d' updated after 4 rounds.
constant = 0 for the first 20 rounds of the loop, 1 for the next 20 rounds of the loop..., 3 for the last 20 rounds of the loop.
constant results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: SHA1RNDS4, CPU Feature: SHA SHA1Message1 does the XORing of step 1 of the message-schedule computation in the SHA-1 algorithm defined in FIPS 180-4.
x = {W3, W2, W1, W0}
y = {0, 0, W5, W4}
result = {W3^W5, W2^W4, W1^W3, W0^W2}.
Asm: SHA1MSG1, CPU Feature: SHA SHA1Message2 does the calculation of steps 3 and 4 of the message-schedule computation in the SHA-1 algorithm defined in FIPS 180-4.
x = result of step 2.
y = {W15, W14, W13}
result = {W19, W18, W17, W16}
Asm: SHA1MSG2, CPU Feature: SHA SHA1NextE calculates the state variable e' updated after 4 rounds in SHA1 algorithm defined in FIPS 180-4.
x contains the state variable a (before the 4 rounds), placed in the upper element.
y is the elements of W array for next 4 rounds from upper to lower order.
result = the elements of the W array for the next 4 rounds, with the updated state variable e' added to the upper element,
from upper to lower order.
For the last round of the loop, you can specify zero for y to obtain the e' value itself, or better, specify H4:0:0:0
for y to get e' added to H4. (Note that the value of e' is computed only from x, and values of y don't affect the
computation of the value of e'.)
Asm: SHA1NEXTE, CPU Feature: SHA SHA256Message1 does the sigma and addition of step 1 of the message-schedule computation in the SHA-256 algorithm defined in FIPS 180-4.
x = {W0, W1, W2, W3}
y = {W4, 0, 0, 0}
result = {W0+σ(W1), W1+σ(W2), W2+σ(W3), W3+σ(W4)}
Asm: SHA256MSG1, CPU Feature: SHA SHA256Message2 does the sigma and addition of step 3 of the message-schedule computation in the SHA-256 algorithm defined in FIPS 180-4.
x = result of step 2
y = {0, 0, W14, W15}
result = {W16, W17, W18, W19}
Asm: SHA256MSG2, CPU Feature: SHA SHA256TwoRounds does 2 rounds of the main loop to calculate the updated state variables in the SHA-256 algorithm defined in FIPS 180-4.
x = {h, g, d, c}
y = {f, e, b, a}
z = {W0+K0, W1+K1}
result = {f', e', b', a'}
The K array is a 64-DWORD constant array defined on page 11 of FIPS 180-4. Each element of the K array is to be added to
the corresponding element of the W array to make the input data z.
The updated state variables c', d', g', h' are not returned by this instruction, because they are equal to the input data
y (the state variables a, b, e, f before the 2 rounds).
Asm: SHA256RNDS2, CPU Feature: SHA SaturateToUint16 converts element values to uint16.
Conversion is done with saturation on the vector elements.
Asm: VPMOVUSDW, CPU Feature: AVX512 SaturateToUint16Concat converts element values to uint16.
With each 128-bit lane as a group:
The converted group from the first input vector will be packed to the lower part of the result vector,
the converted group from the second input vector will be packed to the upper part of the result vector.
Conversion is done with saturation on the vector elements.
Asm: VPACKUSDW, CPU Feature: AVX SelectFromPair returns the selection of four elements from the two
vectors x and y, where selector values in the range 0-3 specify
elements of x and values in the range 4-7 specify elements 0-3
of y. When the selectors are constants and the selection can be
implemented in a single instruction, it will be; otherwise it
requires two. a is the source index of the least element in the
output, and b, c, and d are the indices of the 2nd, 3rd, and 4th
elements in the output. For example,
{1,2,4,8}.SelectFromPair(2,3,5,7,{9,25,49,81}) returns {4,8,25,81}
If the selectors are not constant this will translate to a function
call.
Asm: VSHUFPS, CPU Feature: AVX SetElem sets a single constant-indexed element's value.
index results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPINSRD, CPU Feature: AVX ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLD, CPU Feature: AVX ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDD, CPU Feature: AVX512VBMI2 ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are zeroed.
Asm: VPSRLD, CPU Feature: AVX ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDD, CPU Feature: AVX512VBMI2 ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVD, CPU Feature: AVX2 ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVD, CPU Feature: AVX512VBMI2 ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are zeroed.
Asm: VPSRLVD, CPU Feature: AVX2 ShiftRightConcat shifts each element of x to the right by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVD, CPU Feature: AVX512VBMI2 Store stores a Uint32x4 to an array StoreMasked stores a Uint32x4 to an array,
at those elements enabled by mask
Asm: VMASKMOVD, CPU Feature: AVX2 StoreSlice stores x into a slice of at least 4 uint32s StoreSlicePart stores the 4 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 4 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x Sub subtracts corresponding elements of two vectors.
Asm: VPSUBD, CPU Feature: AVX SubPairs horizontally subtracts adjacent pairs of elements.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].
Asm: VPHSUBD, CPU Feature: AVX TruncateToUint16 converts element values to uint16.
Conversion is done with truncation on the vector elements.
Asm: VPMOVDW, CPU Feature: AVX512 TruncateToUint8 converts element values to uint8.
Conversion is done with truncation on the vector elements.
Results are packed to low elements in the returned vector; its upper elements are zero-cleared.
Asm: VPMOVDB, CPU Feature: AVX512 Xor performs a bitwise XOR operation between two vectors.
Asm: VPXOR, CPU Feature: AVX
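The Compress semantics documented above can be sketched in plain Go, without this package. This is a scalar model of the documented behavior, not the package API; compress4 is an illustrative helper name.

```go
package main

import "fmt"

// compress4 is a plain-Go model of the documented Compress semantics
// (VPCOMPRESSD): elements of x whose mask lane is set are packed into
// the lowest indices of the result, in order; the rest are zeroed.
func compress4(x [4]uint32, mask [4]bool) [4]uint32 {
	var out [4]uint32
	j := 0
	for i, keep := range mask {
		if keep {
			out[j] = x[i]
			j++
		}
	}
	return out
}

func main() {
	fmt.Println(compress4([4]uint32{10, 20, 30, 40}, [4]bool{true, false, true, false})) // [10 30 0 0]
}
```

Expand is the inverse direction: it distributes packed low elements back out to the mask-selected positions.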
Uint32x4 : expvar.Var
Uint32x4 : fmt.Stringer
func BroadcastUint32x4(x uint32) Uint32x4
func LoadMaskedUint32x4(y *[4]uint32, mask Mask32x4) Uint32x4
func LoadUint32x4(y *[4]uint32) Uint32x4
func LoadUint32x4Slice(s []uint32) Uint32x4
func LoadUint32x4SlicePart(s []uint32) Uint32x4
func Float32x4.AsUint32x4() (to Uint32x4)
func Float32x4.ConvertToUint32() Uint32x4
func Float64x2.AsUint32x4() (to Uint32x4)
func Float64x2.ConvertToUint32() Uint32x4
func Float64x4.ConvertToUint32() Uint32x4
func Int16x8.AsUint32x4() (to Uint32x4)
func Int32x4.AsUint32x4() (to Uint32x4)
func Int64x2.AsUint32x4() (to Uint32x4)
func Int8x16.AsUint32x4() (to Uint32x4)
func Uint16x8.AsUint32x4() (to Uint32x4)
func Uint16x8.ExtendLo4ToUint32x4() Uint32x4
func Uint32x4.Add(y Uint32x4) Uint32x4
func Uint32x4.AddPairs(y Uint32x4) Uint32x4
func Uint32x4.AESInvMixColumns() Uint32x4
func Uint32x4.AESRoundKeyGenAssist(rconVal uint8) Uint32x4
func Uint32x4.And(y Uint32x4) Uint32x4
func Uint32x4.AndNot(y Uint32x4) Uint32x4
func Uint32x4.Broadcast128() Uint32x4
func Uint32x4.Compress(mask Mask32x4) Uint32x4
func Uint32x4.ConcatPermute(y Uint32x4, indices Uint32x4) Uint32x4
func Uint32x4.Expand(mask Mask32x4) Uint32x4
func Uint32x4.InterleaveHi(y Uint32x4) Uint32x4
func Uint32x4.InterleaveLo(y Uint32x4) Uint32x4
func Uint32x4.LeadingZeros() Uint32x4
func Uint32x4.Masked(mask Mask32x4) Uint32x4
func Uint32x4.Max(y Uint32x4) Uint32x4
func Uint32x4.Merge(y Uint32x4, mask Mask32x4) Uint32x4
func Uint32x4.Min(y Uint32x4) Uint32x4
func Uint32x4.Mul(y Uint32x4) Uint32x4
func Uint32x4.Not() Uint32x4
func Uint32x4.OnesCount() Uint32x4
func Uint32x4.Or(y Uint32x4) Uint32x4
func Uint32x4.PermuteScalars(a, b, c, d uint8) Uint32x4
func Uint32x4.RotateAllLeft(shift uint8) Uint32x4
func Uint32x4.RotateAllRight(shift uint8) Uint32x4
func Uint32x4.RotateLeft(y Uint32x4) Uint32x4
func Uint32x4.RotateRight(y Uint32x4) Uint32x4
func Uint32x4.SelectFromPair(a, b, c, d uint8, y Uint32x4) Uint32x4
func Uint32x4.SetElem(index uint8, y uint32) Uint32x4
func Uint32x4.SHA1FourRounds(constant uint8, y Uint32x4) Uint32x4
func Uint32x4.SHA1Message1(y Uint32x4) Uint32x4
func Uint32x4.SHA1Message2(y Uint32x4) Uint32x4
func Uint32x4.SHA1NextE(y Uint32x4) Uint32x4
func Uint32x4.SHA256Message1(y Uint32x4) Uint32x4
func Uint32x4.SHA256Message2(y Uint32x4) Uint32x4
func Uint32x4.SHA256TwoRounds(y Uint32x4, z Uint32x4) Uint32x4
func Uint32x4.ShiftAllLeft(y uint64) Uint32x4
func Uint32x4.ShiftAllLeftConcat(shift uint8, y Uint32x4) Uint32x4
func Uint32x4.ShiftAllRight(y uint64) Uint32x4
func Uint32x4.ShiftAllRightConcat(shift uint8, y Uint32x4) Uint32x4
func Uint32x4.ShiftLeft(y Uint32x4) Uint32x4
func Uint32x4.ShiftLeftConcat(y Uint32x4, z Uint32x4) Uint32x4
func Uint32x4.ShiftRight(y Uint32x4) Uint32x4
func Uint32x4.ShiftRightConcat(y Uint32x4, z Uint32x4) Uint32x4
func Uint32x4.Sub(y Uint32x4) Uint32x4
func Uint32x4.SubPairs(y Uint32x4) Uint32x4
func Uint32x4.Xor(y Uint32x4) Uint32x4
func Uint32x8.GetHi() Uint32x4
func Uint32x8.GetLo() Uint32x4
func Uint64x2.AsUint32x4() (to Uint32x4)
func Uint64x2.SaturateToUint32() Uint32x4
func Uint64x2.TruncateToUint32() Uint32x4
func Uint64x4.SaturateToUint32() Uint32x4
func Uint64x4.TruncateToUint32() Uint32x4
func Uint8x16.AsUint32x4() (to Uint32x4)
func Uint8x16.ExtendLo4ToUint32x4() Uint32x4
func Float32x4.ConcatPermute(y Float32x4, indices Uint32x4) Float32x4
func Int32x4.ConcatPermute(y Int32x4, indices Uint32x4) Int32x4
func Uint32x4.Add(y Uint32x4) Uint32x4
func Uint32x4.AddPairs(y Uint32x4) Uint32x4
func Uint32x4.And(y Uint32x4) Uint32x4
func Uint32x4.AndNot(y Uint32x4) Uint32x4
func Uint32x4.ConcatPermute(y Uint32x4, indices Uint32x4) Uint32x4
func Uint32x4.Equal(y Uint32x4) Mask32x4
func Uint32x4.Greater(y Uint32x4) Mask32x4
func Uint32x4.GreaterEqual(y Uint32x4) Mask32x4
func Uint32x4.InterleaveHi(y Uint32x4) Uint32x4
func Uint32x4.InterleaveLo(y Uint32x4) Uint32x4
func Uint32x4.Less(y Uint32x4) Mask32x4
func Uint32x4.LessEqual(y Uint32x4) Mask32x4
func Uint32x4.Max(y Uint32x4) Uint32x4
func Uint32x4.Merge(y Uint32x4, mask Mask32x4) Uint32x4
func Uint32x4.Min(y Uint32x4) Uint32x4
func Uint32x4.Mul(y Uint32x4) Uint32x4
func Uint32x4.MulEvenWiden(y Uint32x4) Uint64x2
func Uint32x4.NotEqual(y Uint32x4) Mask32x4
func Uint32x4.Or(y Uint32x4) Uint32x4
func Uint32x4.RotateLeft(y Uint32x4) Uint32x4
func Uint32x4.RotateRight(y Uint32x4) Uint32x4
func Uint32x4.SaturateToUint16Concat(y Uint32x4) Uint16x8
func Uint32x4.SelectFromPair(a, b, c, d uint8, y Uint32x4) Uint32x4
func Uint32x4.SHA1FourRounds(constant uint8, y Uint32x4) Uint32x4
func Uint32x4.SHA1Message1(y Uint32x4) Uint32x4
func Uint32x4.SHA1Message2(y Uint32x4) Uint32x4
func Uint32x4.SHA1NextE(y Uint32x4) Uint32x4
func Uint32x4.SHA256Message1(y Uint32x4) Uint32x4
func Uint32x4.SHA256Message2(y Uint32x4) Uint32x4
func Uint32x4.SHA256TwoRounds(y Uint32x4, z Uint32x4) Uint32x4
func Uint32x4.ShiftAllLeftConcat(shift uint8, y Uint32x4) Uint32x4
func Uint32x4.ShiftAllRightConcat(shift uint8, y Uint32x4) Uint32x4
func Uint32x4.ShiftLeft(y Uint32x4) Uint32x4
func Uint32x4.ShiftLeftConcat(y Uint32x4, z Uint32x4) Uint32x4
func Uint32x4.ShiftRight(y Uint32x4) Uint32x4
func Uint32x4.ShiftRightConcat(y Uint32x4, z Uint32x4) Uint32x4
func Uint32x4.Sub(y Uint32x4) Uint32x4
func Uint32x4.SubPairs(y Uint32x4) Uint32x4
func Uint32x4.Xor(y Uint32x4) Uint32x4
func Uint32x8.SetHi(y Uint32x4) Uint32x8
func Uint32x8.SetLo(y Uint32x4) Uint32x8
func Uint8x16.AESDecryptLastRound(y Uint32x4) Uint8x16
func Uint8x16.AESDecryptOneRound(y Uint32x4) Uint8x16
func Uint8x16.AESEncryptLastRound(y Uint32x4) Uint8x16
func Uint8x16.AESEncryptOneRound(y Uint32x4) Uint8x16
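The SelectFromPair semantics documented above can be modeled in plain Go by indexing the 8-element concatenation of x and y. This is a scalar sketch, not the package API; selectFromPair is an illustrative name.

```go
package main

import "fmt"

// selectFromPair is a plain-Go model of the documented SelectFromPair
// semantics: selectors a..d index the 8-element concatenation of x
// (indices 0-3) and y (indices 4-7).
func selectFromPair(x [4]uint32, a, b, c, d uint8, y [4]uint32) [4]uint32 {
	xy := [8]uint32{x[0], x[1], x[2], x[3], y[0], y[1], y[2], y[3]}
	return [4]uint32{xy[a], xy[b], xy[c], xy[d]}
}

func main() {
	// The documentation's example: {1,2,4,8}.SelectFromPair(2,3,5,7,{9,25,49,81})
	fmt.Println(selectFromPair([4]uint32{1, 2, 4, 8}, 2, 3, 5, 7, [4]uint32{9, 25, 49, 81})) // [4 8 25 81]
}
```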
Uint32x8 is a 256-bit SIMD vector of 8 uint32 Add adds corresponding elements of two vectors.
Asm: VPADDD, CPU Feature: AVX2 AddPairs horizontally adds adjacent pairs of elements.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].
Asm: VPHADDD, CPU Feature: AVX2 And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX2 AndNot performs a bitwise x &^ y.
Asm: VPANDN, CPU Feature: AVX2 Float32x8 converts from Uint32x8 to Float32x8 Float64x4 converts from Uint32x8 to Float64x4 Int16x16 converts from Uint32x8 to Int16x16 Int32x8 converts from Uint32x8 to Int32x8 Int64x4 converts from Uint32x8 to Int64x4 Int8x32 converts from Uint32x8 to Int8x32 Uint16x16 converts from Uint32x8 to Uint16x16 Uint64x4 converts from Uint32x8 to Uint64x4 Uint8x32 converts from Uint32x8 to Uint8x32 Compress performs a compression on vector x using mask by
selecting the elements indicated by mask and packing them into lower-indexed elements.
Asm: VPCOMPRESSD, CPU Feature: AVX512 ConcatPermute performs a full permutation of vector x, y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the needed bits to represent xy's index are used in indices' elements.
Asm: VPERMI2D, CPU Feature: AVX512 ConvertToFloat32 converts element values to float32.
Asm: VCVTUDQ2PS, CPU Feature: AVX512 ConvertToFloat64 converts element values to float64.
Asm: VCVTUDQ2PD, CPU Feature: AVX512 Equal returns x equals y, elementwise.
Asm: VPCMPEQD, CPU Feature: AVX2 Expand performs an expansion on a vector x whose elements are packed to lower parts.
The expansion distributes the packed elements to the positions indicated by mask, from lower mask elements to upper, in order.
Asm: VPEXPANDD, CPU Feature: AVX512 ExtendToUint64 converts element values to uint64.
The result vector's elements are zero-extended.
Asm: VPMOVZXDQ, CPU Feature: AVX512 GetHi returns the upper half of x.
Asm: VEXTRACTI128, CPU Feature: AVX2 GetLo returns the lower half of x.
Asm: VEXTRACTI128, CPU Feature: AVX2 Greater returns a mask whose elements indicate whether x > y
Emulated, CPU Feature: AVX2 GreaterEqual returns a mask whose elements indicate whether x >= y
Emulated, CPU Feature: AVX2 InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.
Asm: VPUNPCKHDQ, CPU Feature: AVX2 InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.
Asm: VPUNPCKLDQ, CPU Feature: AVX2 IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x.
x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y
Asm: VPTEST, CPU Feature: AVX LeadingZeros counts the leading zeros of each element in x.
Asm: VPLZCNTD, CPU Feature: AVX512 Len returns the number of elements in a Uint32x8 Less returns a mask whose elements indicate whether x < y
Emulated, CPU Feature: AVX2 LessEqual returns a mask whose elements indicate whether x <= y
Emulated, CPU Feature: AVX2 Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VPMAXUD, CPU Feature: AVX2 Merge returns x but with elements set to y where mask is false. Min computes the minimum of corresponding elements.
Asm: VPMINUD, CPU Feature: AVX2 Mul multiplies corresponding elements of two vectors.
Asm: VPMULLD, CPU Feature: AVX2 MulEvenWiden multiplies even-indexed elements, widening the result.
Result[i] = v1.Even[i] * v2.Even[i].
Asm: VPMULUDQ, CPU Feature: AVX2 Not returns the bitwise complement of x
Emulated, CPU Feature: AVX2 NotEqual returns a mask whose elements indicate whether x != y
Emulated, CPU Feature: AVX2 OnesCount counts the number of set bits in each element.
Asm: VPOPCNTD, CPU Feature: AVX512VPOPCNTDQ Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX2 Permute performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 3 bits (values 0-7) of each element of indices are used.
Asm: VPERMD, CPU Feature: AVX2 PermuteScalarsGrouped performs a grouped permutation of vector x using the supplied indices:
result = {x[a], x[b], x[c], x[d], x[a+4], x[b+4], x[c+4], x[d+4]}
Parameters a,b,c,d should have values between 0 and 3.
If a through d are constants, then an instruction will be inlined; otherwise
a jump table is generated.
Asm: VPSHUFD, CPU Feature: AVX2 RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPROLD, CPU Feature: AVX512 RotateAllRight rotates each element to the right by the number of bits specified by the immediate.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPRORD, CPU Feature: AVX512 RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.
Asm: VPROLVD, CPU Feature: AVX512 RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.
Asm: VPRORVD, CPU Feature: AVX512 SaturateToUint16 converts element values to uint16.
Conversion is done with saturation on the vector elements.
Asm: VPMOVUSDW, CPU Feature: AVX512 SaturateToUint16Concat converts element values to uint16.
With each 128-bit lane as a group:
The converted group from the first input vector will be packed to the lower part of the result vector,
the converted group from the second input vector will be packed to the upper part of the result vector.
Conversion is done with saturation on the vector elements.
Asm: VPACKUSDW, CPU Feature: AVX2 Select128FromPair treats the 256-bit vectors x and y as a single vector of four
128-bit elements, and returns a 256-bit result formed by
concatenating the two elements specified by lo and hi.
For example,
{40, 41, 42, 43, 50, 51, 52, 53}.Select128FromPair(3, 0, {60, 61, 62, 63, 70, 71, 72, 73})
returns {70, 71, 72, 73, 40, 41, 42, 43}.
lo, hi result in better performance when they are constants; non-constant values will be translated into a jump table.
lo, hi should be between 0 and 3, inclusive; other values may result in a runtime panic.
Asm: VPERM2I128, CPU Feature: AVX2 SelectFromPairGrouped returns, for each of the two 128-bit halves of
the vectors x and y, the selection of four elements from x and y,
where selector values in the range 0-3 specify elements of x and
values in the range 4-7 specify elements 0-3 of y.
When the selectors are constants and the selection can be
implemented in a single instruction, it will be; otherwise it
requires two. a is the source index of the least element in the
output, and b, c, and d are the indices of the 2nd, 3rd, and 4th
elements in the output. For example,
{1,2,4,8,16,32,64,128}.SelectFromPairGrouped(2,3,5,7,{9,25,49,81,121,169,225,289})
returns {4,8,25,81,64,128,169,289}
If the selectors are not constant this will translate to a function
call.
Asm: VSHUFPS, CPU Feature: AVX SetHi returns x with its upper half set to y.
Asm: VINSERTI128, CPU Feature: AVX2 SetLo returns x with its lower half set to y.
Asm: VINSERTI128, CPU Feature: AVX2 ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLD, CPU Feature: AVX2 ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDD, CPU Feature: AVX512VBMI2 ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are zeroed.
Asm: VPSRLD, CPU Feature: AVX2 ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDD, CPU Feature: AVX512VBMI2 ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVD, CPU Feature: AVX2 ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVD, CPU Feature: AVX512VBMI2 ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are zeroed.
Asm: VPSRLVD, CPU Feature: AVX2 ShiftRightConcat shifts each element of x to the right by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVD, CPU Feature: AVX512VBMI2 Store stores a Uint32x8 to an array StoreMasked stores a Uint32x8 to an array,
at those elements enabled by mask
Asm: VMASKMOVD, CPU Feature: AVX2 StoreSlice stores x into a slice of at least 8 uint32s StoreSlicePart stores the 8 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 8 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x Sub subtracts corresponding elements of two vectors.
Asm: VPSUBD, CPU Feature: AVX2 SubPairs horizontally subtracts adjacent pairs of elements.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].
Asm: VPHSUBD, CPU Feature: AVX2 TruncateToUint16 converts element values to uint16.
Conversion is done with truncation on the vector elements.
Asm: VPMOVDW, CPU Feature: AVX512 TruncateToUint8 converts element values to uint8.
Conversion is done with truncation on the vector elements.
Results are packed to low elements in the returned vector; its upper elements are zero-cleared.
Asm: VPMOVDB, CPU Feature: AVX512 Xor performs a bitwise XOR operation between two vectors.
Asm: VPXOR, CPU Feature: AVX2
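The ...Concat funnel shifts documented above can be modeled lane-by-lane in plain Go. This sketch (shiftAllLeftConcat is an illustrative name, not part of the package) assumes the documented behavior: each element of x is shifted left and the vacated low bits are filled from the upper bits of y's corresponding element.

```go
package main

import "fmt"

// shiftAllLeftConcat is a plain-Go, lane-wise model of the documented
// ShiftAllLeftConcat (VPSHLDD) behavior: each element of x is shifted
// left by shift (only the lower 5 bits of shift are used), and the
// vacated low bits are filled from the upper bits of y's element.
func shiftAllLeftConcat(x, y []uint32, shift uint8) []uint32 {
	s := uint(shift) & 31
	out := make([]uint32, len(x))
	for i := range x {
		// Go defines a shift by >= 32 on uint32 as 0, so s == 0 yields x[i].
		out[i] = x[i]<<s | y[i]>>(32-s)
	}
	return out
}

func main() {
	fmt.Printf("%#x\n", shiftAllLeftConcat([]uint32{0x1}, []uint32{0x80000000}, 4)) // [0x18]
}
```

The right-shifting variant mirrors this, filling the vacated upper bits from the lower bits of y.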
Uint32x8 : expvar.Var
Uint32x8 : fmt.Stringer
func BroadcastUint32x8(x uint32) Uint32x8
func LoadMaskedUint32x8(y *[8]uint32, mask Mask32x8) Uint32x8
func LoadUint32x8(y *[8]uint32) Uint32x8
func LoadUint32x8Slice(s []uint32) Uint32x8
func LoadUint32x8SlicePart(s []uint32) Uint32x8
func Float32x8.AsUint32x8() (to Uint32x8)
func Float32x8.ConvertToUint32() Uint32x8
func Float64x4.AsUint32x8() (to Uint32x8)
func Float64x8.ConvertToUint32() Uint32x8
func Int16x16.AsUint32x8() (to Uint32x8)
func Int32x8.AsUint32x8() (to Uint32x8)
func Int64x4.AsUint32x8() (to Uint32x8)
func Int8x32.AsUint32x8() (to Uint32x8)
func Uint16x16.AsUint32x8() (to Uint32x8)
func Uint16x8.ExtendToUint32() Uint32x8
func Uint32x16.GetHi() Uint32x8
func Uint32x16.GetLo() Uint32x8
func Uint32x4.Broadcast256() Uint32x8
func Uint32x8.Add(y Uint32x8) Uint32x8
func Uint32x8.AddPairs(y Uint32x8) Uint32x8
func Uint32x8.And(y Uint32x8) Uint32x8
func Uint32x8.AndNot(y Uint32x8) Uint32x8
func Uint32x8.Compress(mask Mask32x8) Uint32x8
func Uint32x8.ConcatPermute(y Uint32x8, indices Uint32x8) Uint32x8
func Uint32x8.Expand(mask Mask32x8) Uint32x8
func Uint32x8.InterleaveHiGrouped(y Uint32x8) Uint32x8
func Uint32x8.InterleaveLoGrouped(y Uint32x8) Uint32x8
func Uint32x8.LeadingZeros() Uint32x8
func Uint32x8.Masked(mask Mask32x8) Uint32x8
func Uint32x8.Max(y Uint32x8) Uint32x8
func Uint32x8.Merge(y Uint32x8, mask Mask32x8) Uint32x8
func Uint32x8.Min(y Uint32x8) Uint32x8
func Uint32x8.Mul(y Uint32x8) Uint32x8
func Uint32x8.Not() Uint32x8
func Uint32x8.OnesCount() Uint32x8
func Uint32x8.Or(y Uint32x8) Uint32x8
func Uint32x8.Permute(indices Uint32x8) Uint32x8
func Uint32x8.PermuteScalarsGrouped(a, b, c, d uint8) Uint32x8
func Uint32x8.RotateAllLeft(shift uint8) Uint32x8
func Uint32x8.RotateAllRight(shift uint8) Uint32x8
func Uint32x8.RotateLeft(y Uint32x8) Uint32x8
func Uint32x8.RotateRight(y Uint32x8) Uint32x8
func Uint32x8.Select128FromPair(lo, hi uint8, y Uint32x8) Uint32x8
func Uint32x8.SelectFromPairGrouped(a, b, c, d uint8, y Uint32x8) Uint32x8
func Uint32x8.SetHi(y Uint32x4) Uint32x8
func Uint32x8.SetLo(y Uint32x4) Uint32x8
func Uint32x8.ShiftAllLeft(y uint64) Uint32x8
func Uint32x8.ShiftAllLeftConcat(shift uint8, y Uint32x8) Uint32x8
func Uint32x8.ShiftAllRight(y uint64) Uint32x8
func Uint32x8.ShiftAllRightConcat(shift uint8, y Uint32x8) Uint32x8
func Uint32x8.ShiftLeft(y Uint32x8) Uint32x8
func Uint32x8.ShiftLeftConcat(y Uint32x8, z Uint32x8) Uint32x8
func Uint32x8.ShiftRight(y Uint32x8) Uint32x8
func Uint32x8.ShiftRightConcat(y Uint32x8, z Uint32x8) Uint32x8
func Uint32x8.Sub(y Uint32x8) Uint32x8
func Uint32x8.SubPairs(y Uint32x8) Uint32x8
func Uint32x8.Xor(y Uint32x8) Uint32x8
func Uint64x4.AsUint32x8() (to Uint32x8)
func Uint64x8.SaturateToUint32() Uint32x8
func Uint64x8.TruncateToUint32() Uint32x8
func Uint8x16.ExtendLo8ToUint32x8() Uint32x8
func Uint8x32.AsUint32x8() (to Uint32x8)
func Float32x8.ConcatPermute(y Float32x8, indices Uint32x8) Float32x8
func Float32x8.Permute(indices Uint32x8) Float32x8
func Int32x8.ConcatPermute(y Int32x8, indices Uint32x8) Int32x8
func Int32x8.Permute(indices Uint32x8) Int32x8
func Uint32x16.SetHi(y Uint32x8) Uint32x16
func Uint32x16.SetLo(y Uint32x8) Uint32x16
func Uint32x8.Add(y Uint32x8) Uint32x8
func Uint32x8.AddPairs(y Uint32x8) Uint32x8
func Uint32x8.And(y Uint32x8) Uint32x8
func Uint32x8.AndNot(y Uint32x8) Uint32x8
func Uint32x8.ConcatPermute(y Uint32x8, indices Uint32x8) Uint32x8
func Uint32x8.Equal(y Uint32x8) Mask32x8
func Uint32x8.Greater(y Uint32x8) Mask32x8
func Uint32x8.GreaterEqual(y Uint32x8) Mask32x8
func Uint32x8.InterleaveHiGrouped(y Uint32x8) Uint32x8
func Uint32x8.InterleaveLoGrouped(y Uint32x8) Uint32x8
func Uint32x8.Less(y Uint32x8) Mask32x8
func Uint32x8.LessEqual(y Uint32x8) Mask32x8
func Uint32x8.Max(y Uint32x8) Uint32x8
func Uint32x8.Merge(y Uint32x8, mask Mask32x8) Uint32x8
func Uint32x8.Min(y Uint32x8) Uint32x8
func Uint32x8.Mul(y Uint32x8) Uint32x8
func Uint32x8.MulEvenWiden(y Uint32x8) Uint64x4
func Uint32x8.NotEqual(y Uint32x8) Mask32x8
func Uint32x8.Or(y Uint32x8) Uint32x8
func Uint32x8.Permute(indices Uint32x8) Uint32x8
func Uint32x8.RotateLeft(y Uint32x8) Uint32x8
func Uint32x8.RotateRight(y Uint32x8) Uint32x8
func Uint32x8.SaturateToUint16Concat(y Uint32x8) Uint16x16
func Uint32x8.Select128FromPair(lo, hi uint8, y Uint32x8) Uint32x8
func Uint32x8.SelectFromPairGrouped(a, b, c, d uint8, y Uint32x8) Uint32x8
func Uint32x8.ShiftAllLeftConcat(shift uint8, y Uint32x8) Uint32x8
func Uint32x8.ShiftAllRightConcat(shift uint8, y Uint32x8) Uint32x8
func Uint32x8.ShiftLeft(y Uint32x8) Uint32x8
func Uint32x8.ShiftLeftConcat(y Uint32x8, z Uint32x8) Uint32x8
func Uint32x8.ShiftRight(y Uint32x8) Uint32x8
func Uint32x8.ShiftRightConcat(y Uint32x8, z Uint32x8) Uint32x8
func Uint32x8.Sub(y Uint32x8) Uint32x8
func Uint32x8.SubPairs(y Uint32x8) Uint32x8
func Uint32x8.Xor(y Uint32x8) Uint32x8
func Uint8x32.AESDecryptLastRound(y Uint32x8) Uint8x32
func Uint8x32.AESDecryptOneRound(y Uint32x8) Uint8x32
func Uint8x32.AESEncryptLastRound(y Uint32x8) Uint8x32
func Uint8x32.AESEncryptOneRound(y Uint32x8) Uint8x32
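The comparison methods above (Equal, Greater, Less, and so on) produce masks that are then consumed by Merge or Masked. The following is a rough scalar sketch of those semantics, with plain Go slices standing in for Uint32x8 and Mask32x8; the helper names are illustrative, not part of the package.

```go
package main

import "fmt"

// greater models the documented Greater: an elementwise x > y comparison
// producing a per-element mask.
func greater(x, y []uint32) []bool {
	m := make([]bool, len(x))
	for i := range x {
		m[i] = x[i] > y[i]
	}
	return m
}

// merge models the documented Merge: keep x where mask is true,
// take y where mask is false.
func merge(x, y []uint32, mask []bool) []uint32 {
	r := make([]uint32, len(x))
	for i := range x {
		if mask[i] {
			r[i] = x[i]
		} else {
			r[i] = y[i]
		}
	}
	return r
}

func main() {
	x := []uint32{5, 1, 7, 2}
	y := []uint32{2, 9, 3, 8}
	// Greater followed by Merge computes an elementwise max.
	fmt.Println(merge(x, y, greater(x, y))) // [5 9 7 8]
}
```

The compare-then-Merge pattern is the one the package documentation says the compiler recognizes and may fuse into a single masked instruction on AVX512.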
Uint64x2 is a 128-bit SIMD vector of 2 uint64 values. Add adds corresponding elements of two vectors.
Asm: VPADDQ, CPU Feature: AVX And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX AndNot performs a bitwise x &^ y.
Asm: VPANDN, CPU Feature: AVX Float32x4 converts from Uint64x2 to Float32x4 Float64x2 converts from Uint64x2 to Float64x2 Int16x8 converts from Uint64x2 to Int16x8 Int32x4 converts from Uint64x2 to Int32x4 Int64x2 converts from Uint64x2 to Int64x2 Int8x16 converts from Uint64x2 to Int8x16 Uint16x8 converts from Uint64x2 to Uint16x8 Uint32x4 converts from Uint64x2 to Uint32x4 Uint8x16 converts from Uint64x2 to Uint8x16 Broadcast128 copies element zero of its (128-bit) input to all elements of
the 128-bit output vector.
Asm: VPBROADCASTQ, CPU Feature: AVX2 Broadcast256 copies element zero of its (128-bit) input to all elements of
the 256-bit output vector.
Asm: VPBROADCASTQ, CPU Feature: AVX2 Broadcast512 copies element zero of its (128-bit) input to all elements of
the 512-bit output vector.
Asm: VPBROADCASTQ, CPU Feature: AVX512 CarrylessMultiply computes one of four possible carryless
multiplications of selected high and low halves of x and y,
depending on the values of a and b, returning the 128-bit
product in the concatenated two elements of the result.
a selects the low (0) or high (1) element of x and
b selects the low (0) or high (1) element of y.
A carryless multiplication uses bitwise XOR instead of
add-with-carry, for example (in base two):
11 * 11 = 11 * (10 ^ 1) = (11 * 10) ^ (11 * 1) = 110 ^ 11 = 101
This also models multiplication of polynomials with coefficients
from GF(2) -- 11 * 11 models (x+1)*(x+1) = x**2 + (1^1)x + 1 =
x**2 + 0x + 1 = x**2 + 1 modeled by 101. (Note that "+" adds
polynomial terms, but coefficients "add" with XOR.)
Constant values of a and b will result in better performance;
otherwise the intrinsic may translate into a jump table.
Asm: VPCLMULQDQ, CPU Feature: AVX Compress performs a compression on vector x using mask,
selecting elements as indicated by mask and packing them into lower-indexed elements.
Asm: VPCOMPRESSQ, CPU Feature: AVX512 ConcatPermute performs a full permutation of vectors x and y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the bits needed to represent an index into xy are used in indices' elements.
Asm: VPERMI2Q, CPU Feature: AVX512 ConvertToFloat32 converts element values to float32.
Asm: VCVTUQQ2PSX, CPU Feature: AVX512 ConvertToFloat64 converts element values to float64.
Asm: VCVTUQQ2PD, CPU Feature: AVX512 Equal returns x equals y, elementwise.
Asm: VPCMPEQQ, CPU Feature: AVX Expand performs an expansion on a vector x whose elements are packed into the lower part.
The expansion distributes those elements to the positions indicated by mask, from lower mask elements to upper, in order.
Asm: VPEXPANDQ, CPU Feature: AVX512 GetElem retrieves a single constant-indexed element's value.
index results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPEXTRQ, CPU Feature: AVX Greater returns a mask whose elements indicate whether x > y.
Emulated, CPU Feature: AVX GreaterEqual returns a mask whose elements indicate whether x >= y.
Emulated, CPU Feature: AVX InterleaveHi interleaves the elements of the high halves of x and y.
Asm: VPUNPCKHQDQ, CPU Feature: AVX InterleaveLo interleaves the elements of the low halves of x and y.
Asm: VPUNPCKLQDQ, CPU Feature: AVX IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x.
x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y.
Asm: VPTEST, CPU Feature: AVX LeadingZeros counts the leading zeros of each element in x.
Asm: VPLZCNTQ, CPU Feature: AVX512 Len returns the number of elements in a Uint64x2. Less returns a mask whose elements indicate whether x < y.
Emulated, CPU Feature: AVX LessEqual returns a mask whose elements indicate whether x <= y.
Emulated, CPU Feature: AVX Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VPMAXUQ, CPU Feature: AVX512 Merge returns x but with elements set to y where mask is false. Min computes the minimum of corresponding elements.
Asm: VPMINUQ, CPU Feature: AVX512 Mul multiplies corresponding elements of two vectors.
Asm: VPMULLQ, CPU Feature: AVX512 Not returns the bitwise complement of x.
Emulated, CPU Feature: AVX NotEqual returns a mask whose elements indicate whether x != y.
Emulated, CPU Feature: AVX OnesCount counts the number of set bits in each element.
Asm: VPOPCNTQ, CPU Feature: AVX512VPOPCNTDQ Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPROLQ, CPU Feature: AVX512 RotateAllRight rotates each element to the right by the number of bits specified by the immediate.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPRORQ, CPU Feature: AVX512 RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.
Asm: VPROLVQ, CPU Feature: AVX512 RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.
Asm: VPRORVQ, CPU Feature: AVX512 SaturateToUint16 converts element values to uint16.
Conversion is done with saturation on the vector elements.
Asm: VPMOVUSQW, CPU Feature: AVX512 SaturateToUint32 converts element values to uint32.
Conversion is done with saturation on the vector elements.
Asm: VPMOVUSQD, CPU Feature: AVX512 SelectFromPair returns the selection of two elements from the two
vectors x and y, where selector values in the range 0-1 specify
elements from x and values in the range 2-3 specify elements 0-1
of y. When the selectors are constants, the selection can be
implemented in a single instruction.
If the selectors are not constant this will translate to a function
call.
Asm: VSHUFPD, CPU Feature: AVX SetElem sets a single constant-indexed element's value.
index results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPINSRQ, CPU Feature: AVX ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLQ, CPU Feature: AVX ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the
immediate (only the lower 6 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDQ, CPU Feature: AVX512VBMI2 ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are zeroed.
Asm: VPSRLQ, CPU Feature: AVX ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the
immediate (only the lower 6 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDQ, CPU Feature: AVX512VBMI2 ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVQ, CPU Feature: AVX2 ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the
corresponding elements in y (only the lower 6 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVQ, CPU Feature: AVX512VBMI2 ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are zeroed.
Asm: VPSRLVQ, CPU Feature: AVX2 ShiftRightConcat shifts each element of x to the right by the number of bits specified by the
corresponding elements in y (only the lower 6 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVQ, CPU Feature: AVX512VBMI2 Store stores a Uint64x2 to an array. StoreMasked stores a Uint64x2 to an array,
at those elements enabled by mask
Asm: VMASKMOVQ, CPU Feature: AVX2 StoreSlice stores x into a slice of at least 2 uint64s. StoreSlicePart stores the 2 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 2 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x. Sub subtracts corresponding elements of two vectors.
Asm: VPSUBQ, CPU Feature: AVX TruncateToUint16 converts element values to uint16.
Conversion is done with truncation on the vector elements.
Asm: VPMOVQW, CPU Feature: AVX512 TruncateToUint32 converts element values to uint32.
Conversion is done with truncation on the vector elements.
Asm: VPMOVQD, CPU Feature: AVX512 TruncateToUint8 converts element values to uint8.
Conversion is done with truncation on the vector elements.
Results are packed to low elements in the returned vector; its upper elements are zero-cleared.
Asm: VPMOVQB, CPU Feature: AVX512 Xor performs a bitwise XOR operation between two vectors.
Asm: VPXOR, CPU Feature: AVX
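The ConcatPermute indexing rule described above can be sketched in scalar Go. This is a model of the documented semantics on plain slices, not the VPERMI2Q intrinsic; the function name is illustrative.

```go
package main

import "fmt"

// concatPermute models ConcatPermute: xy is the concatenation of
// x (lower half) and y (upper half), and the result gathers
// xy[indices[i]] for each i. Only the bits needed to index xy are
// used, hence the modulo.
func concatPermute(x, y, indices []uint64) []uint64 {
	xy := append(append([]uint64{}, x...), y...)
	r := make([]uint64, len(x))
	for i, idx := range indices {
		r[i] = xy[idx%uint64(len(xy))]
	}
	return r
}

func main() {
	x := []uint64{10, 11}
	y := []uint64{20, 21}
	// Index 3 reaches into y's top element; index 0 into x's bottom.
	fmt.Println(concatPermute(x, y, []uint64{3, 0})) // [21 10]
}
```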
Uint64x2 : expvar.Var
Uint64x2 : fmt.Stringer
func BroadcastUint64x2(x uint64) Uint64x2
func LoadMaskedUint64x2(y *[2]uint64, mask Mask64x2) Uint64x2
func LoadUint64x2(y *[2]uint64) Uint64x2
func LoadUint64x2Slice(s []uint64) Uint64x2
func LoadUint64x2SlicePart(s []uint64) Uint64x2
func Float32x4.AsUint64x2() (to Uint64x2)
func Float64x2.AsUint64x2() (to Uint64x2)
func Float64x2.ConvertToUint64() Uint64x2
func Int16x8.AsUint64x2() (to Uint64x2)
func Int32x4.AsUint64x2() (to Uint64x2)
func Int64x2.AsUint64x2() (to Uint64x2)
func Int8x16.AsUint64x2() (to Uint64x2)
func Uint16x8.AsUint64x2() (to Uint64x2)
func Uint16x8.ExtendLo2ToUint64x2() Uint64x2
func Uint32x4.AsUint64x2() (to Uint64x2)
func Uint32x4.ExtendLo2ToUint64x2() Uint64x2
func Uint32x4.MulEvenWiden(y Uint32x4) Uint64x2
func Uint64x2.Add(y Uint64x2) Uint64x2
func Uint64x2.And(y Uint64x2) Uint64x2
func Uint64x2.AndNot(y Uint64x2) Uint64x2
func Uint64x2.Broadcast128() Uint64x2
func Uint64x2.CarrylessMultiply(a, b uint8, y Uint64x2) Uint64x2
func Uint64x2.Compress(mask Mask64x2) Uint64x2
func Uint64x2.ConcatPermute(y Uint64x2, indices Uint64x2) Uint64x2
func Uint64x2.Expand(mask Mask64x2) Uint64x2
func Uint64x2.InterleaveHi(y Uint64x2) Uint64x2
func Uint64x2.InterleaveLo(y Uint64x2) Uint64x2
func Uint64x2.LeadingZeros() Uint64x2
func Uint64x2.Masked(mask Mask64x2) Uint64x2
func Uint64x2.Max(y Uint64x2) Uint64x2
func Uint64x2.Merge(y Uint64x2, mask Mask64x2) Uint64x2
func Uint64x2.Min(y Uint64x2) Uint64x2
func Uint64x2.Mul(y Uint64x2) Uint64x2
func Uint64x2.Not() Uint64x2
func Uint64x2.OnesCount() Uint64x2
func Uint64x2.Or(y Uint64x2) Uint64x2
func Uint64x2.RotateAllLeft(shift uint8) Uint64x2
func Uint64x2.RotateAllRight(shift uint8) Uint64x2
func Uint64x2.RotateLeft(y Uint64x2) Uint64x2
func Uint64x2.RotateRight(y Uint64x2) Uint64x2
func Uint64x2.SelectFromPair(a, b uint8, y Uint64x2) Uint64x2
func Uint64x2.SetElem(index uint8, y uint64) Uint64x2
func Uint64x2.ShiftAllLeft(y uint64) Uint64x2
func Uint64x2.ShiftAllLeftConcat(shift uint8, y Uint64x2) Uint64x2
func Uint64x2.ShiftAllRight(y uint64) Uint64x2
func Uint64x2.ShiftAllRightConcat(shift uint8, y Uint64x2) Uint64x2
func Uint64x2.ShiftLeft(y Uint64x2) Uint64x2
func Uint64x2.ShiftLeftConcat(y Uint64x2, z Uint64x2) Uint64x2
func Uint64x2.ShiftRight(y Uint64x2) Uint64x2
func Uint64x2.ShiftRightConcat(y Uint64x2, z Uint64x2) Uint64x2
func Uint64x2.Sub(y Uint64x2) Uint64x2
func Uint64x2.Xor(y Uint64x2) Uint64x2
func Uint64x4.GetHi() Uint64x2
func Uint64x4.GetLo() Uint64x2
func Uint8x16.AsUint64x2() (to Uint64x2)
func Uint8x16.ExtendLo2ToUint64x2() Uint64x2
func Float64x2.ConcatPermute(y Float64x2, indices Uint64x2) Float64x2
func Int64x2.ConcatPermute(y Int64x2, indices Uint64x2) Int64x2
func Uint64x2.Add(y Uint64x2) Uint64x2
func Uint64x2.And(y Uint64x2) Uint64x2
func Uint64x2.AndNot(y Uint64x2) Uint64x2
func Uint64x2.CarrylessMultiply(a, b uint8, y Uint64x2) Uint64x2
func Uint64x2.ConcatPermute(y Uint64x2, indices Uint64x2) Uint64x2
func Uint64x2.Equal(y Uint64x2) Mask64x2
func Uint64x2.Greater(y Uint64x2) Mask64x2
func Uint64x2.GreaterEqual(y Uint64x2) Mask64x2
func Uint64x2.InterleaveHi(y Uint64x2) Uint64x2
func Uint64x2.InterleaveLo(y Uint64x2) Uint64x2
func Uint64x2.Less(y Uint64x2) Mask64x2
func Uint64x2.LessEqual(y Uint64x2) Mask64x2
func Uint64x2.Max(y Uint64x2) Uint64x2
func Uint64x2.Merge(y Uint64x2, mask Mask64x2) Uint64x2
func Uint64x2.Min(y Uint64x2) Uint64x2
func Uint64x2.Mul(y Uint64x2) Uint64x2
func Uint64x2.NotEqual(y Uint64x2) Mask64x2
func Uint64x2.Or(y Uint64x2) Uint64x2
func Uint64x2.RotateLeft(y Uint64x2) Uint64x2
func Uint64x2.RotateRight(y Uint64x2) Uint64x2
func Uint64x2.SelectFromPair(a, b uint8, y Uint64x2) Uint64x2
func Uint64x2.ShiftAllLeftConcat(shift uint8, y Uint64x2) Uint64x2
func Uint64x2.ShiftAllRightConcat(shift uint8, y Uint64x2) Uint64x2
func Uint64x2.ShiftLeft(y Uint64x2) Uint64x2
func Uint64x2.ShiftLeftConcat(y Uint64x2, z Uint64x2) Uint64x2
func Uint64x2.ShiftRight(y Uint64x2) Uint64x2
func Uint64x2.ShiftRightConcat(y Uint64x2, z Uint64x2) Uint64x2
func Uint64x2.Sub(y Uint64x2) Uint64x2
func Uint64x2.Xor(y Uint64x2) Uint64x2
func Uint64x4.SetHi(y Uint64x2) Uint64x4
func Uint64x4.SetLo(y Uint64x2) Uint64x4
func Uint8x16.GaloisFieldAffineTransform(y Uint64x2, b uint8) Uint8x16
func Uint8x16.GaloisFieldAffineTransformInverse(y Uint64x2, b uint8) Uint8x16
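The carryless multiplication documented for Uint64x2.CarrylessMultiply can be modeled in scalar Go: partial products are combined with XOR rather than add-with-carry. This is a reference sketch of the semantics, not the VPCLMULQDQ intrinsic; the function name is illustrative.

```go
package main

import "fmt"

// clmul64 carrylessly multiplies two 64-bit values, returning the full
// 128-bit product as (hi, lo). For each set bit i of y, the shifted
// multiplicand x<<i is XORed (not added) into the product.
func clmul64(x, y uint64) (hi, lo uint64) {
	for i := uint(0); i < 64; i++ {
		if y&(1<<i) != 0 {
			lo ^= x << i
			if i > 0 {
				hi ^= x >> (64 - i)
			}
		}
	}
	return hi, lo
}

func main() {
	// The documented example: 0b11 clmul 0b11 = 0b101.
	hi, lo := clmul64(3, 3)
	fmt.Printf("%b %b\n", hi, lo) // 0 101
}
```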
Uint64x4 is a 256-bit SIMD vector of 4 uint64 values. Add adds corresponding elements of two vectors.
Asm: VPADDQ, CPU Feature: AVX2 And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX2 AndNot performs a bitwise x &^ y.
Asm: VPANDN, CPU Feature: AVX2 Float32x8 converts from Uint64x4 to Float32x8 Float64x4 converts from Uint64x4 to Float64x4 Int16x16 converts from Uint64x4 to Int16x16 Int32x8 converts from Uint64x4 to Int32x8 Int64x4 converts from Uint64x4 to Int64x4 Int8x32 converts from Uint64x4 to Int8x32 Uint16x16 converts from Uint64x4 to Uint16x16 Uint32x8 converts from Uint64x4 to Uint32x8 Uint8x32 converts from Uint64x4 to Uint8x32 CarrylessMultiplyGrouped computes one of four possible carryless
multiplications of selected high and low halves of each of the two
128-bit lanes of x and y, depending on the values of a and b,
and returns the four 128-bit products in the result's lanes.
a selects the low (0) or high (1) elements of x's lanes and
b selects the low (0) or high (1) elements of y's lanes.
A carryless multiplication uses bitwise XOR instead of
add-with-carry, for example (in base two):
11 * 11 = 11 * (10 ^ 1) = (11 * 10) ^ (11 * 1) = 110 ^ 11 = 101
This also models multiplication of polynomials with coefficients
from GF(2) -- 11 * 11 models (x+1)*(x+1) = x**2 + (1^1)x + 1 =
x**2 + 0x + 1 = x**2 + 1 modeled by 101. (Note that "+" adds
polynomial terms, but coefficients "add" with XOR.)
Constant values of a and b will result in better performance;
otherwise the intrinsic may translate into a jump table.
Asm: VPCLMULQDQ, CPU Feature: AVX512VPCLMULQDQ Compress performs a compression on vector x using mask,
selecting elements as indicated by mask and packing them into lower-indexed elements.
Asm: VPCOMPRESSQ, CPU Feature: AVX512 ConcatPermute performs a full permutation of vectors x and y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the bits needed to represent an index into xy are used in indices' elements.
Asm: VPERMI2Q, CPU Feature: AVX512 ConvertToFloat32 converts element values to float32.
Asm: VCVTUQQ2PSY, CPU Feature: AVX512 ConvertToFloat64 converts element values to float64.
Asm: VCVTUQQ2PD, CPU Feature: AVX512 Equal returns x equals y, elementwise.
Asm: VPCMPEQQ, CPU Feature: AVX2 Expand performs an expansion on a vector x whose elements are packed into the lower part.
The expansion distributes those elements to the positions indicated by mask, from lower mask elements to upper, in order.
Asm: VPEXPANDQ, CPU Feature: AVX512 GetHi returns the upper half of x.
Asm: VEXTRACTI128, CPU Feature: AVX2 GetLo returns the lower half of x.
Asm: VEXTRACTI128, CPU Feature: AVX2 Greater returns a mask whose elements indicate whether x > y.
Emulated, CPU Feature: AVX2 GreaterEqual returns a mask whose elements indicate whether x >= y.
Emulated, CPU Feature: AVX2 InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.
Asm: VPUNPCKHQDQ, CPU Feature: AVX2 InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.
Asm: VPUNPCKLQDQ, CPU Feature: AVX2 IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x.
x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y.
Asm: VPTEST, CPU Feature: AVX LeadingZeros counts the leading zeros of each element in x.
Asm: VPLZCNTQ, CPU Feature: AVX512 Len returns the number of elements in a Uint64x4. Less returns a mask whose elements indicate whether x < y.
Emulated, CPU Feature: AVX2 LessEqual returns a mask whose elements indicate whether x <= y.
Emulated, CPU Feature: AVX2 Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VPMAXUQ, CPU Feature: AVX512 Merge returns x but with elements set to y where mask is false. Min computes the minimum of corresponding elements.
Asm: VPMINUQ, CPU Feature: AVX512 Mul multiplies corresponding elements of two vectors.
Asm: VPMULLQ, CPU Feature: AVX512 Not returns the bitwise complement of x.
Emulated, CPU Feature: AVX2 NotEqual returns a mask whose elements indicate whether x != y.
Emulated, CPU Feature: AVX2 OnesCount counts the number of set bits in each element.
Asm: VPOPCNTQ, CPU Feature: AVX512VPOPCNTDQ Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX2 Permute performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 2 bits (values 0-3) of each element of indices are used.
Asm: VPERMQ, CPU Feature: AVX512 RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPROLQ, CPU Feature: AVX512 RotateAllRight rotates each element to the right by the number of bits specified by the immediate.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPRORQ, CPU Feature: AVX512 RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.
Asm: VPROLVQ, CPU Feature: AVX512 RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.
Asm: VPRORVQ, CPU Feature: AVX512 SaturateToUint16 converts element values to uint16.
Conversion is done with saturation on the vector elements.
Asm: VPMOVUSQW, CPU Feature: AVX512 SaturateToUint32 converts element values to uint32.
Conversion is done with saturation on the vector elements.
Asm: VPMOVUSQD, CPU Feature: AVX512 Select128FromPair treats the 256-bit vectors x and y as a single vector of four
128-bit elements, and returns a 256-bit result formed by
concatenating the two elements specified by lo and hi.
For example,
{40, 41, 50, 51}.Select128FromPair(3, 0, {60, 61, 70, 71})
returns {70, 71, 40, 41}.
lo, hi result in better performance when they are constants; non-constant values will be translated into a jump table.
lo, hi should be between 0 and 3, inclusive; other values may result in a runtime panic.
Asm: VPERM2I128, CPU Feature: AVX2 SelectFromPairGrouped returns, for each of the two 128-bit halves of
the vectors x and y, the selection of two elements from the two
vectors x and y, where selector values in the range 0-1 specify
elements from x and values in the range 2-3 specify elements 0-1
of y. When the selectors are constants, the selection can be
implemented in a single instruction.
If the selectors are not constant this will translate to a function
call.
Asm: VSHUFPD, CPU Feature: AVX SetHi returns x with its upper half set to y.
Asm: VINSERTI128, CPU Feature: AVX2 SetLo returns x with its lower half set to y.
Asm: VINSERTI128, CPU Feature: AVX2 ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLQ, CPU Feature: AVX2 ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the
immediate (only the lower 6 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDQ, CPU Feature: AVX512VBMI2 ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are zeroed.
Asm: VPSRLQ, CPU Feature: AVX2 ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the
immediate (only the lower 6 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDQ, CPU Feature: AVX512VBMI2 ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVQ, CPU Feature: AVX2 ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the
corresponding elements in y (only the lower 6 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVQ, CPU Feature: AVX512VBMI2 ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are zeroed.
Asm: VPSRLVQ, CPU Feature: AVX2 ShiftRightConcat shifts each element of x to the right by the number of bits specified by the
corresponding elements in y (only the lower 6 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVQ, CPU Feature: AVX512VBMI2 Store stores a Uint64x4 to an array. StoreMasked stores a Uint64x4 to an array,
at those elements enabled by mask
Asm: VMASKMOVQ, CPU Feature: AVX2 StoreSlice stores x into a slice of at least 4 uint64s. StoreSlicePart stores the 4 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 4 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x. Sub subtracts corresponding elements of two vectors.
Asm: VPSUBQ, CPU Feature: AVX2 TruncateToUint16 converts element values to uint16.
Conversion is done with truncation on the vector elements.
Asm: VPMOVQW, CPU Feature: AVX512 TruncateToUint32 converts element values to uint32.
Conversion is done with truncation on the vector elements.
Asm: VPMOVQD, CPU Feature: AVX512 TruncateToUint8 converts element values to uint8.
Conversion is done with truncation on the vector elements.
Results are packed to low elements in the returned vector; its upper elements are zero-cleared.
Asm: VPMOVQB, CPU Feature: AVX512 Xor performs a bitwise XOR operation between two vectors.
Asm: VPXOR, CPU Feature: AVX2
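The Select128FromPair semantics described above can be sketched in scalar Go, using the example from the documentation. This models the documented behavior on plain slices, not the VPERM2I128 intrinsic; the function name is illustrative.

```go
package main

import "fmt"

// select128FromPair models Select128FromPair: x and y are viewed as four
// 128-bit elements (two uint64 each), numbered 0-1 within x and 2-3
// within y; the result concatenates element lo and element hi.
func select128FromPair(x, y []uint64, lo, hi int) []uint64 {
	xy := [][]uint64{x[0:2], x[2:4], y[0:2], y[2:4]}
	return append(append([]uint64{}, xy[lo]...), xy[hi]...)
}

func main() {
	x := []uint64{40, 41, 50, 51}
	y := []uint64{60, 61, 70, 71}
	// The documented example: elements 3 and 0 yield {70, 71, 40, 41}.
	fmt.Println(select128FromPair(x, y, 3, 0)) // [70 71 40 41]
}
```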
Uint64x4 : expvar.Var
Uint64x4 : fmt.Stringer
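The ShiftAllLeftConcat operation described above is a double-width "funnel" shift per element: x is shifted left and the emptied low bits are filled from the top of y. A single-element scalar sketch of that semantics (illustrative helper, not the VPSHLDQ intrinsic):

```go
package main

import "fmt"

// shiftAllLeftConcat models ShiftAllLeftConcat for one 64-bit element:
// shift x left by n bits and fill the emptied low bits with the upper
// n bits of y.
func shiftAllLeftConcat(x, y uint64, n uint) uint64 {
	if n == 0 {
		// In Go, y>>64 is already 0, so this case would fall out
		// naturally; kept explicit for clarity.
		return x
	}
	return x<<n | y>>(64-n)
}

func main() {
	fmt.Printf("%#x\n", shiftAllLeftConcat(0x1, 0x8000000000000000, 4)) // 0x18
}
```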
func BroadcastUint64x4(x uint64) Uint64x4
func LoadMaskedUint64x4(y *[4]uint64, mask Mask64x4) Uint64x4
func LoadUint64x4(y *[4]uint64) Uint64x4
func LoadUint64x4Slice(s []uint64) Uint64x4
func LoadUint64x4SlicePart(s []uint64) Uint64x4
func Float32x4.ConvertToUint64() Uint64x4
func Float32x8.AsUint64x4() (to Uint64x4)
func Float64x4.AsUint64x4() (to Uint64x4)
func Float64x4.ConvertToUint64() Uint64x4
func Int16x16.AsUint64x4() (to Uint64x4)
func Int32x8.AsUint64x4() (to Uint64x4)
func Int64x4.AsUint64x4() (to Uint64x4)
func Int8x32.AsUint64x4() (to Uint64x4)
func Uint16x16.AsUint64x4() (to Uint64x4)
func Uint16x8.ExtendLo4ToUint64x4() Uint64x4
func Uint32x4.ExtendToUint64() Uint64x4
func Uint32x8.AsUint64x4() (to Uint64x4)
func Uint32x8.MulEvenWiden(y Uint32x8) Uint64x4
func Uint64x2.Broadcast256() Uint64x4
func Uint64x4.Add(y Uint64x4) Uint64x4
func Uint64x4.And(y Uint64x4) Uint64x4
func Uint64x4.AndNot(y Uint64x4) Uint64x4
func Uint64x4.CarrylessMultiplyGrouped(a, b uint8, y Uint64x4) Uint64x4
func Uint64x4.Compress(mask Mask64x4) Uint64x4
func Uint64x4.ConcatPermute(y Uint64x4, indices Uint64x4) Uint64x4
func Uint64x4.Expand(mask Mask64x4) Uint64x4
func Uint64x4.InterleaveHiGrouped(y Uint64x4) Uint64x4
func Uint64x4.InterleaveLoGrouped(y Uint64x4) Uint64x4
func Uint64x4.LeadingZeros() Uint64x4
func Uint64x4.Masked(mask Mask64x4) Uint64x4
func Uint64x4.Max(y Uint64x4) Uint64x4
func Uint64x4.Merge(y Uint64x4, mask Mask64x4) Uint64x4
func Uint64x4.Min(y Uint64x4) Uint64x4
func Uint64x4.Mul(y Uint64x4) Uint64x4
func Uint64x4.Not() Uint64x4
func Uint64x4.OnesCount() Uint64x4
func Uint64x4.Or(y Uint64x4) Uint64x4
func Uint64x4.Permute(indices Uint64x4) Uint64x4
func Uint64x4.RotateAllLeft(shift uint8) Uint64x4
func Uint64x4.RotateAllRight(shift uint8) Uint64x4
func Uint64x4.RotateLeft(y Uint64x4) Uint64x4
func Uint64x4.RotateRight(y Uint64x4) Uint64x4
func Uint64x4.Select128FromPair(lo, hi uint8, y Uint64x4) Uint64x4
func Uint64x4.SelectFromPairGrouped(a, b uint8, y Uint64x4) Uint64x4
func Uint64x4.SetHi(y Uint64x2) Uint64x4
func Uint64x4.SetLo(y Uint64x2) Uint64x4
func Uint64x4.ShiftAllLeft(y uint64) Uint64x4
func Uint64x4.ShiftAllLeftConcat(shift uint8, y Uint64x4) Uint64x4
func Uint64x4.ShiftAllRight(y uint64) Uint64x4
func Uint64x4.ShiftAllRightConcat(shift uint8, y Uint64x4) Uint64x4
func Uint64x4.ShiftLeft(y Uint64x4) Uint64x4
func Uint64x4.ShiftLeftConcat(y Uint64x4, z Uint64x4) Uint64x4
func Uint64x4.ShiftRight(y Uint64x4) Uint64x4
func Uint64x4.ShiftRightConcat(y Uint64x4, z Uint64x4) Uint64x4
func Uint64x4.Sub(y Uint64x4) Uint64x4
func Uint64x4.Xor(y Uint64x4) Uint64x4
func Uint64x8.GetHi() Uint64x4
func Uint64x8.GetLo() Uint64x4
func Uint8x16.ExtendLo4ToUint64x4() Uint64x4
func Uint8x32.AsUint64x4() (to Uint64x4)
func Float64x4.ConcatPermute(y Float64x4, indices Uint64x4) Float64x4
func Float64x4.Permute(indices Uint64x4) Float64x4
func Int64x4.ConcatPermute(y Int64x4, indices Uint64x4) Int64x4
func Int64x4.Permute(indices Uint64x4) Int64x4
func Uint64x4.Add(y Uint64x4) Uint64x4
func Uint64x4.And(y Uint64x4) Uint64x4
func Uint64x4.AndNot(y Uint64x4) Uint64x4
func Uint64x4.CarrylessMultiplyGrouped(a, b uint8, y Uint64x4) Uint64x4
func Uint64x4.ConcatPermute(y Uint64x4, indices Uint64x4) Uint64x4
func Uint64x4.Equal(y Uint64x4) Mask64x4
func Uint64x4.Greater(y Uint64x4) Mask64x4
func Uint64x4.GreaterEqual(y Uint64x4) Mask64x4
func Uint64x4.InterleaveHiGrouped(y Uint64x4) Uint64x4
func Uint64x4.InterleaveLoGrouped(y Uint64x4) Uint64x4
func Uint64x4.Less(y Uint64x4) Mask64x4
func Uint64x4.LessEqual(y Uint64x4) Mask64x4
func Uint64x4.Max(y Uint64x4) Uint64x4
func Uint64x4.Merge(y Uint64x4, mask Mask64x4) Uint64x4
func Uint64x4.Min(y Uint64x4) Uint64x4
func Uint64x4.Mul(y Uint64x4) Uint64x4
func Uint64x4.NotEqual(y Uint64x4) Mask64x4
func Uint64x4.Or(y Uint64x4) Uint64x4
func Uint64x4.Permute(indices Uint64x4) Uint64x4
func Uint64x4.RotateLeft(y Uint64x4) Uint64x4
func Uint64x4.RotateRight(y Uint64x4) Uint64x4
func Uint64x4.Select128FromPair(lo, hi uint8, y Uint64x4) Uint64x4
func Uint64x4.SelectFromPairGrouped(a, b uint8, y Uint64x4) Uint64x4
func Uint64x4.ShiftAllLeftConcat(shift uint8, y Uint64x4) Uint64x4
func Uint64x4.ShiftAllRightConcat(shift uint8, y Uint64x4) Uint64x4
func Uint64x4.ShiftLeft(y Uint64x4) Uint64x4
func Uint64x4.ShiftLeftConcat(y Uint64x4, z Uint64x4) Uint64x4
func Uint64x4.ShiftRight(y Uint64x4) Uint64x4
func Uint64x4.ShiftRightConcat(y Uint64x4, z Uint64x4) Uint64x4
func Uint64x4.Sub(y Uint64x4) Uint64x4
func Uint64x4.Xor(y Uint64x4) Uint64x4
func Uint64x8.SetHi(y Uint64x4) Uint64x8
func Uint64x8.SetLo(y Uint64x4) Uint64x8
func Uint8x32.GaloisFieldAffineTransform(y Uint64x4, b uint8) Uint8x32
func Uint8x32.GaloisFieldAffineTransformInverse(y Uint64x4, b uint8) Uint8x32
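Compress and Expand, described above, are inverse distributions driven by a mask: Compress packs the selected elements to the low indices, Expand scatters packed low elements back to the mask's positions. A scalar sketch of both (plain slices standing in for the vector and mask types; helper names are illustrative):

```go
package main

import "fmt"

// compress models Compress: elements of x selected by mask are packed
// into the low indices of the result; remaining elements are zero.
func compress(x []uint64, mask []bool) []uint64 {
	r := make([]uint64, len(x))
	j := 0
	for i := range x {
		if mask[i] {
			r[j] = x[i]
			j++
		}
	}
	return r
}

// expand models Expand: the packed low elements of x are distributed,
// in order, to the positions where mask is true.
func expand(x []uint64, mask []bool) []uint64 {
	r := make([]uint64, len(x))
	j := 0
	for i := range x {
		if mask[i] {
			r[i] = x[j]
			j++
		}
	}
	return r
}

func main() {
	mask := []bool{true, false, true, false}
	packed := compress([]uint64{10, 20, 30, 40}, mask)
	fmt.Println(packed)               // [10 30 0 0]
	fmt.Println(expand(packed, mask)) // [10 0 30 0]
}
```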
Uint64x8 is a 512-bit SIMD vector of 8 uint64 values. Add adds corresponding elements of two vectors.
Asm: VPADDQ, CPU Feature: AVX512 And performs a bitwise AND operation between two vectors.
Asm: VPANDQ, CPU Feature: AVX512 AndNot performs a bitwise x &^ y.
Asm: VPANDNQ, CPU Feature: AVX512 Float32x16 converts from Uint64x8 to Float32x16 Float64x8 converts from Uint64x8 to Float64x8 Int16x32 converts from Uint64x8 to Int16x32 Int32x16 converts from Uint64x8 to Int32x16 Int64x8 converts from Uint64x8 to Int64x8 Int8x64 converts from Uint64x8 to Int8x64 Uint16x32 converts from Uint64x8 to Uint16x32 Uint32x16 converts from Uint64x8 to Uint32x16 Uint8x64 converts from Uint64x8 to Uint8x64 CarrylessMultiplyGrouped computes one of four possible carryless
multiplications of selected high and low halves of each of the four
128-bit lanes of x and y, depending on the values of a and b,
and returns the four 128-bit products in the result's lanes.
a selects the low (0) or high (1) elements of x's lanes and
b selects the low (0) or high (1) elements of y's lanes.
A carryless multiplication uses bitwise XOR instead of
add-with-carry, for example (in base two):
11 * 11 = 11 * (10 ^ 1) = (11 * 10) ^ (11 * 1) = 110 ^ 11 = 101
This also models multiplication of polynomials with coefficients
from GF(2) -- 11 * 11 models (x+1)*(x+1) = x**2 + (1^1)x + 1 =
x**2 + 0x + 1 = x**2 + 1 modeled by 101. (Note that "+" adds
polynomial terms, but coefficients "add" with XOR.)
Constant values of a and b will result in better performance;
otherwise the intrinsic may translate into a jump table.
Asm: VPCLMULQDQ, CPU Feature: AVX512VPCLMULQDQ Compress performs a compression on vector x using mask,
selecting elements as indicated by mask and packing them into lower-indexed elements.
Asm: VPCOMPRESSQ, CPU Feature: AVX512 ConcatPermute performs a full permutation of vectors x and y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the bits needed to represent an index into xy are used in indices' elements.
Asm: VPERMI2Q, CPU Feature: AVX512 ConvertToFloat32 converts element values to float32.
Asm: VCVTUQQ2PS, CPU Feature: AVX512 ConvertToFloat64 converts element values to float64.
Asm: VCVTUQQ2PD, CPU Feature: AVX512 Equal returns x equals y, elementwise.
Asm: VPCMPEQQ, CPU Feature: AVX512 Expand performs an expansion on a vector x whose elements are packed into the lower part.
The expansion distributes those elements to the positions indicated by mask, from lower mask elements to upper, in order.
Asm: VPEXPANDQ, CPU Feature: AVX512 GetHi returns the upper half of x.
Asm: VEXTRACTI64X4, CPU Feature: AVX512 GetLo returns the lower half of x.
Asm: VEXTRACTI64X4, CPU Feature: AVX512 Greater returns x greater-than y, elementwise.
Asm: VPCMPUQ, CPU Feature: AVX512 GreaterEqual returns x greater-than-or-equals y, elementwise.
Asm: VPCMPUQ, CPU Feature: AVX512 InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.
Asm: VPUNPCKHQDQ, CPU Feature: AVX512 InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.
Asm: VPUNPCKLQDQ, CPU Feature: AVX512 LeadingZeros counts the leading zeros of each element in x.
Asm: VPLZCNTQ, CPU Feature: AVX512 Len returns the number of elements in a Uint64x8 Less returns x less-than y, elementwise.
Asm: VPCMPUQ, CPU Feature: AVX512 LessEqual returns x less-than-or-equals y, elementwise.
Asm: VPCMPUQ, CPU Feature: AVX512 Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VPMAXUQ, CPU Feature: AVX512 Merge returns x but with elements set to y where mask is false. Min computes the minimum of corresponding elements.
Asm: VPMINUQ, CPU Feature: AVX512 Mul multiplies corresponding elements of two vectors.
Asm: VPMULLQ, CPU Feature: AVX512 Not returns the bitwise complement of x.
Emulated, CPU Feature: AVX512 NotEqual returns x not-equals y, elementwise.
Asm: VPCMPUQ, CPU Feature: AVX512 OnesCount counts the number of set bits in each element.
Asm: VPOPCNTQ, CPU Feature: AVX512VPOPCNTDQ Or performs a bitwise OR operation between two vectors.
Asm: VPORQ, CPU Feature: AVX512 Permute performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 3 bits (values 0-7) of each element of indices are used.
Asm: VPERMQ, CPU Feature: AVX512 RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPROLQ, CPU Feature: AVX512 RotateAllRight rotates each element to the right by the number of bits specified by the immediate.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPRORQ, CPU Feature: AVX512 RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.
Asm: VPROLVQ, CPU Feature: AVX512 RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.
Asm: VPRORVQ, CPU Feature: AVX512 SaturateToUint16 converts element values to uint16.
Conversion is done with saturation on the vector elements.
Asm: VPMOVUSQW, CPU Feature: AVX512 SaturateToUint32 converts element values to uint32.
Conversion is done with saturation on the vector elements.
Asm: VPMOVUSQD, CPU Feature: AVX512 SelectFromPairGrouped returns, for each of the four 128-bit subvectors
of the vectors x and y, the selection of two elements from the two
vectors x and y, where selector values in the range 0-1 specify
elements of x and values in the range 2-3 specify elements 0-1
of y. When the selectors are constants, the selection can be
implemented in a single instruction.
If the selectors are not constant, this will translate to a function
call.
Asm: VSHUFPD, CPU Feature: AVX512 SetHi returns x with its upper half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512 SetLo returns x with its lower half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512 ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLQ, CPU Feature: AVX512 ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDQ, CPU Feature: AVX512VBMI2 ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are zeroed.
Asm: VPSRLQ, CPU Feature: AVX512 ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDQ, CPU Feature: AVX512VBMI2 ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVQ, CPU Feature: AVX512 ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVQ, CPU Feature: AVX512VBMI2 ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are zeroed.
Asm: VPSRLVQ, CPU Feature: AVX512 ShiftRightConcat shifts each element of x to the right by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVQ, CPU Feature: AVX512VBMI2 Store stores a Uint64x8 to an array StoreMasked stores a Uint64x8 to an array,
at those elements enabled by mask
Asm: VMOVDQU64, CPU Feature: AVX512 StoreSlice stores x into a slice of at least 8 uint64s StoreSlicePart stores the 8 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 8 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x Sub subtracts corresponding elements of two vectors.
Asm: VPSUBQ, CPU Feature: AVX512 TruncateToUint16 converts element values to uint16.
Conversion is done with truncation on the vector elements.
Asm: VPMOVQW, CPU Feature: AVX512 TruncateToUint32 converts element values to uint32.
Conversion is done with truncation on the vector elements.
Asm: VPMOVQD, CPU Feature: AVX512 TruncateToUint8 converts element values to uint8.
Conversion is done with truncation on the vector elements.
Results are packed into the low elements of the returned vector; its upper elements are zero-cleared.
Asm: VPMOVQB, CPU Feature: AVX512 Xor performs a bitwise XOR operation between two vectors.
Asm: VPXORQ, CPU Feature: AVX512
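The carryless multiplication described for CarrylessMultiplyGrouped above can be modeled in scalar Go. This is an illustrative sketch, not part of the archsimd package; the helper name clmul64 is hypothetical.

```go
package main

import "fmt"

// clmul64 multiplies two 64-bit values carrylessly: partial products are
// combined with XOR instead of add-with-carry, which is polynomial
// multiplication over GF(2). It returns the low 64 bits of the product.
func clmul64(x, y uint64) uint64 {
	var acc uint64
	for i := 0; i < 64; i++ {
		if y&(1<<i) != 0 {
			acc ^= x << i
		}
	}
	return acc
}

func main() {
	// 11 * 11 = 101 in base two, as in the doc comment above.
	fmt.Printf("%b\n", clmul64(0b11, 0b11)) // prints 101
}
```

The hardware VPCLMULQDQ instruction produces the full 128-bit product per lane; this sketch keeps only the low half for brevity.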
Uint64x8 : expvar.Var
Uint64x8 : fmt.Stringer
func BroadcastUint64x8(x uint64) Uint64x8
func LoadMaskedUint64x8(y *[8]uint64, mask Mask64x8) Uint64x8
func LoadUint64x8(y *[8]uint64) Uint64x8
func LoadUint64x8Slice(s []uint64) Uint64x8
func LoadUint64x8SlicePart(s []uint64) Uint64x8
func Float32x16.AsUint64x8() (to Uint64x8)
func Float32x8.ConvertToUint64() Uint64x8
func Float64x8.AsUint64x8() (to Uint64x8)
func Float64x8.ConvertToUint64() Uint64x8
func Int16x32.AsUint64x8() (to Uint64x8)
func Int32x16.AsUint64x8() (to Uint64x8)
func Int64x8.AsUint64x8() (to Uint64x8)
func Int8x64.AsUint64x8() (to Uint64x8)
func Uint16x32.AsUint64x8() (to Uint64x8)
func Uint16x8.ExtendToUint64() Uint64x8
func Uint32x16.AsUint64x8() (to Uint64x8)
func Uint32x8.ExtendToUint64() Uint64x8
func Uint64x2.Broadcast512() Uint64x8
func Uint64x8.Add(y Uint64x8) Uint64x8
func Uint64x8.And(y Uint64x8) Uint64x8
func Uint64x8.AndNot(y Uint64x8) Uint64x8
func Uint64x8.CarrylessMultiplyGrouped(a, b uint8, y Uint64x8) Uint64x8
func Uint64x8.Compress(mask Mask64x8) Uint64x8
func Uint64x8.ConcatPermute(y Uint64x8, indices Uint64x8) Uint64x8
func Uint64x8.Expand(mask Mask64x8) Uint64x8
func Uint64x8.InterleaveHiGrouped(y Uint64x8) Uint64x8
func Uint64x8.InterleaveLoGrouped(y Uint64x8) Uint64x8
func Uint64x8.LeadingZeros() Uint64x8
func Uint64x8.Masked(mask Mask64x8) Uint64x8
func Uint64x8.Max(y Uint64x8) Uint64x8
func Uint64x8.Merge(y Uint64x8, mask Mask64x8) Uint64x8
func Uint64x8.Min(y Uint64x8) Uint64x8
func Uint64x8.Mul(y Uint64x8) Uint64x8
func Uint64x8.Not() Uint64x8
func Uint64x8.OnesCount() Uint64x8
func Uint64x8.Or(y Uint64x8) Uint64x8
func Uint64x8.Permute(indices Uint64x8) Uint64x8
func Uint64x8.RotateAllLeft(shift uint8) Uint64x8
func Uint64x8.RotateAllRight(shift uint8) Uint64x8
func Uint64x8.RotateLeft(y Uint64x8) Uint64x8
func Uint64x8.RotateRight(y Uint64x8) Uint64x8
func Uint64x8.SelectFromPairGrouped(a, b uint8, y Uint64x8) Uint64x8
func Uint64x8.SetHi(y Uint64x4) Uint64x8
func Uint64x8.SetLo(y Uint64x4) Uint64x8
func Uint64x8.ShiftAllLeft(y uint64) Uint64x8
func Uint64x8.ShiftAllLeftConcat(shift uint8, y Uint64x8) Uint64x8
func Uint64x8.ShiftAllRight(y uint64) Uint64x8
func Uint64x8.ShiftAllRightConcat(shift uint8, y Uint64x8) Uint64x8
func Uint64x8.ShiftLeft(y Uint64x8) Uint64x8
func Uint64x8.ShiftLeftConcat(y Uint64x8, z Uint64x8) Uint64x8
func Uint64x8.ShiftRight(y Uint64x8) Uint64x8
func Uint64x8.ShiftRightConcat(y Uint64x8, z Uint64x8) Uint64x8
func Uint64x8.Sub(y Uint64x8) Uint64x8
func Uint64x8.Xor(y Uint64x8) Uint64x8
func Uint8x16.ExtendLo8ToUint64x8() Uint64x8
func Uint8x64.AsUint64x8() (to Uint64x8)
func Float64x8.ConcatPermute(y Float64x8, indices Uint64x8) Float64x8
func Float64x8.Permute(indices Uint64x8) Float64x8
func Int64x8.ConcatPermute(y Int64x8, indices Uint64x8) Int64x8
func Int64x8.Permute(indices Uint64x8) Int64x8
func Uint64x8.Add(y Uint64x8) Uint64x8
func Uint64x8.And(y Uint64x8) Uint64x8
func Uint64x8.AndNot(y Uint64x8) Uint64x8
func Uint64x8.CarrylessMultiplyGrouped(a, b uint8, y Uint64x8) Uint64x8
func Uint64x8.ConcatPermute(y Uint64x8, indices Uint64x8) Uint64x8
func Uint64x8.Equal(y Uint64x8) Mask64x8
func Uint64x8.Greater(y Uint64x8) Mask64x8
func Uint64x8.GreaterEqual(y Uint64x8) Mask64x8
func Uint64x8.InterleaveHiGrouped(y Uint64x8) Uint64x8
func Uint64x8.InterleaveLoGrouped(y Uint64x8) Uint64x8
func Uint64x8.Less(y Uint64x8) Mask64x8
func Uint64x8.LessEqual(y Uint64x8) Mask64x8
func Uint64x8.Max(y Uint64x8) Uint64x8
func Uint64x8.Merge(y Uint64x8, mask Mask64x8) Uint64x8
func Uint64x8.Min(y Uint64x8) Uint64x8
func Uint64x8.Mul(y Uint64x8) Uint64x8
func Uint64x8.NotEqual(y Uint64x8) Mask64x8
func Uint64x8.Or(y Uint64x8) Uint64x8
func Uint64x8.Permute(indices Uint64x8) Uint64x8
func Uint64x8.RotateLeft(y Uint64x8) Uint64x8
func Uint64x8.RotateRight(y Uint64x8) Uint64x8
func Uint64x8.SelectFromPairGrouped(a, b uint8, y Uint64x8) Uint64x8
func Uint64x8.ShiftAllLeftConcat(shift uint8, y Uint64x8) Uint64x8
func Uint64x8.ShiftAllRightConcat(shift uint8, y Uint64x8) Uint64x8
func Uint64x8.ShiftLeft(y Uint64x8) Uint64x8
func Uint64x8.ShiftLeftConcat(y Uint64x8, z Uint64x8) Uint64x8
func Uint64x8.ShiftRight(y Uint64x8) Uint64x8
func Uint64x8.ShiftRightConcat(y Uint64x8, z Uint64x8) Uint64x8
func Uint64x8.Sub(y Uint64x8) Uint64x8
func Uint64x8.Xor(y Uint64x8) Uint64x8
func Uint8x64.GaloisFieldAffineTransform(y Uint64x8, b uint8) Uint8x64
func Uint8x64.GaloisFieldAffineTransformInverse(y Uint64x8, b uint8) Uint8x64
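The Compress and Expand operations documented above are near-inverses of each other. A scalar Go sketch of their semantics (illustrative only, not the package's implementation; the helper names are hypothetical):

```go
package main

import "fmt"

// compress packs the elements of x selected by mask into the low-indexed
// slots of the result; remaining slots are zeroed (VPCOMPRESSQ semantics).
func compress(x [8]uint64, mask [8]bool) [8]uint64 {
	var out [8]uint64
	j := 0
	for i, m := range mask {
		if m {
			out[j] = x[i]
			j++
		}
	}
	return out
}

// expand distributes the low-indexed elements of x, in order, to the
// positions where mask is true; other positions are zeroed (VPEXPANDQ).
func expand(x [8]uint64, mask [8]bool) [8]uint64 {
	var out [8]uint64
	j := 0
	for i, m := range mask {
		if m {
			out[i] = x[j]
			j++
		}
	}
	return out
}

func main() {
	x := [8]uint64{1, 2, 3, 4, 5, 6, 7, 8}
	m := [8]bool{true, false, true, false, true, false, true, false}
	fmt.Println(compress(x, m)) // [1 3 5 7 0 0 0 0]
	fmt.Println(expand(compress(x, m), m))
}
```

Expanding a compressed vector with the same mask restores the selected elements to their original positions, with the unselected positions zeroed.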
Uint8x16 is a 128-bit SIMD vector of 16 uint8 AESDecryptLastRound performs a series of operations in the AES cipher algorithm defined in FIPS 197.
x is the state array, starting from low index to high are s00, s10, s20, s30, s01, ..., s33.
y is the chunk of dw array in use.
result = AddRoundKey(InvShiftRows(InvSubBytes(x)), y)
Asm: VAESDECLAST, CPU Feature: AVX, AES AESDecryptOneRound performs a series of operations in the AES cipher algorithm defined in FIPS 197.
x is the state array, starting from low index to high are s00, s10, s20, s30, s01, ..., s33.
y is the chunk of dw array in use.
result = AddRoundKey(InvMixColumns(InvShiftRows(InvSubBytes(x))), y)
Asm: VAESDEC, CPU Feature: AVX, AES AESEncryptLastRound performs a series of operations in the AES cipher algorithm defined in FIPS 197.
x is the state array, starting from low index to high are s00, s10, s20, s30, s01, ..., s33.
y is the chunk of w array in use.
result = AddRoundKey((ShiftRows(SubBytes(x))), y)
Asm: VAESENCLAST, CPU Feature: AVX, AES AESEncryptOneRound performs a series of operations in the AES cipher algorithm defined in FIPS 197.
x is the state array, starting from low index to high are s00, s10, s20, s30, s01, ..., s33.
y is the chunk of w array in use.
result = AddRoundKey(MixColumns(ShiftRows(SubBytes(x))), y)
Asm: VAESENC, CPU Feature: AVX, AES Add adds corresponding elements of two vectors.
Asm: VPADDB, CPU Feature: AVX AddSaturated adds corresponding elements of two vectors with saturation.
Asm: VPADDUSB, CPU Feature: AVX And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX AndNot performs a bitwise x &^ y.
Asm: VPANDN, CPU Feature: AVX Float32x4 converts from Uint8x16 to Float32x4 Float64x2 converts from Uint8x16 to Float64x2 Int16x8 converts from Uint8x16 to Int16x8 Int32x4 converts from Uint8x16 to Int32x4 Int64x2 converts from Uint8x16 to Int64x2 Int8x16 converts from Uint8x16 to Int8x16 Uint16x8 converts from Uint8x16 to Uint16x8 Uint32x4 converts from Uint8x16 to Uint32x4 Uint64x2 converts from Uint8x16 to Uint64x2 Average computes the rounded average of corresponding elements.
Asm: VPAVGB, CPU Feature: AVX Broadcast128 copies element zero of its (128-bit) input to all elements of
the 128-bit output vector.
Asm: VPBROADCASTB, CPU Feature: AVX2 Broadcast256 copies element zero of its (128-bit) input to all elements of
the 256-bit output vector.
Asm: VPBROADCASTB, CPU Feature: AVX2 Broadcast512 copies element zero of its (128-bit) input to all elements of
the 512-bit output vector.
Asm: VPBROADCASTB, CPU Feature: AVX512 Compress performs a compression on vector x using mask by
selecting elements as indicated by mask, and packs them into lower-indexed elements.
Asm: VPCOMPRESSB, CPU Feature: AVX512VBMI2 ConcatPermute performs a full permutation of vector x, y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the bits needed to represent an index into xy are used in each element of indices.
Asm: VPERMI2B, CPU Feature: AVX512VBMI ConcatShiftBytesRight concatenates x and y and shifts the result right by constant bytes.
The result vector will be the lower half of the concatenated vector.
constant results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPALIGNR, CPU Feature: AVX DotProductPairsSaturated multiplies the elements and adds the pairs together with saturation,
yielding a vector of half as many elements with twice the input element size.
Asm: VPMADDUBSW, CPU Feature: AVX Equal returns x equals y, elementwise.
Asm: VPCMPEQB, CPU Feature: AVX Expand performs an expansion on a vector x whose elements are packed into its lower part.
The expansion distributes elements as indexed by mask, from lower mask elements to upper, in order.
Asm: VPEXPANDB, CPU Feature: AVX512VBMI2 ExtendLo2ToUint64x2 converts the 2 lowest vector element values to uint64.
The result vector's elements are zero-extended.
Asm: VPMOVZXBQ, CPU Feature: AVX ExtendLo4ToUint32x4 converts the 4 lowest vector element values to uint32.
The result vector's elements are zero-extended.
Asm: VPMOVZXBD, CPU Feature: AVX ExtendLo4ToUint64x4 converts the 4 lowest vector element values to uint64.
The result vector's elements are zero-extended.
Asm: VPMOVZXBQ, CPU Feature: AVX2 ExtendLo8ToUint16x8 converts the 8 lowest vector element values to uint16.
The result vector's elements are zero-extended.
Asm: VPMOVZXBW, CPU Feature: AVX ExtendLo8ToUint32x8 converts the 8 lowest vector element values to uint32.
The result vector's elements are zero-extended.
Asm: VPMOVZXBD, CPU Feature: AVX2 ExtendLo8ToUint64x8 converts the 8 lowest vector element values to uint64.
The result vector's elements are zero-extended.
Asm: VPMOVZXBQ, CPU Feature: AVX512 ExtendToUint16 converts element values to uint16.
The result vector's elements are zero-extended.
Asm: VPMOVZXBW, CPU Feature: AVX2 ExtendToUint32 converts element values to uint32.
The result vector's elements are zero-extended.
Asm: VPMOVZXBD, CPU Feature: AVX512 GaloisFieldAffineTransform computes an affine transformation in GF(2^8):
x is a vector of 8-bit vectors, with each adjacent 8 as a group; y is a vector of 8x8 1-bit matrices;
b is an 8-bit vector. The affine transformation is y * x + b, with each element of y
corresponding to a group of 8 elements in x.
b results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VGF2P8AFFINEQB, CPU Feature: AVX512GFNI GaloisFieldAffineTransformInverse computes an affine transformation in GF(2^8),
with x inverted with respect to reduction polynomial x^8 + x^4 + x^3 + x + 1:
x is a vector of 8-bit vectors, with each adjacent 8 as a group; y is a vector of 8x8 1-bit matrices;
b is an 8-bit vector. The affine transformation is y * x + b, with each element of y
corresponding to a group of 8 elements in x.
b results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VGF2P8AFFINEINVQB, CPU Feature: AVX512GFNI GaloisFieldMul computes element-wise GF(2^8) multiplication with
reduction polynomial x^8 + x^4 + x^3 + x + 1.
Asm: VGF2P8MULB, CPU Feature: AVX512GFNI GetElem retrieves a single constant-indexed element's value.
index results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPEXTRB, CPU Feature: AVX512 Greater returns a mask whose elements indicate whether x > y.
Emulated, CPU Feature: AVX2 GreaterEqual returns a mask whose elements indicate whether x >= y.
Emulated, CPU Feature: AVX2 IsZero returns true if all elements of x are zero.
This method compiles to VPTEST x, x.
x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y
Asm: VPTEST, CPU Feature: AVX Len returns the number of elements in a Uint8x16 Less returns a mask whose elements indicate whether x < y.
Emulated, CPU Feature: AVX2 LessEqual returns a mask whose elements indicate whether x <= y.
Emulated, CPU Feature: AVX2 Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VPMAXUB, CPU Feature: AVX Merge returns x but with elements set to y where mask is false. Min computes the minimum of corresponding elements.
Asm: VPMINUB, CPU Feature: AVX Not returns the bitwise complement of x.
Emulated, CPU Feature: AVX NotEqual returns a mask whose elements indicate whether x != y.
Emulated, CPU Feature: AVX OnesCount counts the number of set bits in each element.
Asm: VPOPCNTB, CPU Feature: AVX512BITALG Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX Permute performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 4 bits (values 0-15) of each element of indices are used.
Asm: VPERMB, CPU Feature: AVX512VBMI PermuteOrZero performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The lower four bits of each byte-sized index in indices select an element from x,
unless the index's sign bit is set in which case zero is used instead.
Asm: VPSHUFB, CPU Feature: AVX SetElem sets a single constant-indexed element's value.
index results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPINSRB, CPU Feature: AVX Store stores a Uint8x16 to an array StoreSlice stores x into a slice of at least 16 uint8s StoreSlicePart stores the 16 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 16 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x Sub subtracts corresponding elements of two vectors.
Asm: VPSUBB, CPU Feature: AVX SubSaturated subtracts corresponding elements of two vectors with saturation.
Asm: VPSUBUSB, CPU Feature: AVX SumAbsDiff sums the absolute differences of the two input vectors, treating each adjacent 8 bytes as a group. The output
is a vector of word-sized elements in which each 4*n-th element contains the sum for the n-th input group. The other elements in the result vector are zeroed.
This method can be seen as computing the L1 distance of each adjacent 8-byte group of the two input vectors.
Asm: VPSADBW, CPU Feature: AVX Xor performs a bitwise XOR operation between two vectors.
Asm: VPXOR, CPU Feature: AVX
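GaloisFieldMul's per-byte semantics — GF(2^8) multiplication with reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11B) — can be modeled in scalar Go. An illustrative sketch, not the package's implementation; the helper name gfMul is hypothetical:

```go
package main

import "fmt"

// gfMul multiplies two GF(2^8) elements, reducing by the polynomial
// x^8 + x^4 + x^3 + x + 1 (0x11B), matching VGF2P8MULB's per-byte behavior.
func gfMul(a, b uint8) uint8 {
	var p uint8
	for i := 0; i < 8; i++ {
		if b&1 != 0 {
			p ^= a // carryless "add" of the current partial product
		}
		hi := a & 0x80
		a <<= 1
		if hi != 0 {
			a ^= 0x1B // reduce: 0x11B truncated to 8 bits
		}
		b >>= 1
	}
	return p
}

func main() {
	fmt.Printf("%#x\n", gfMul(0x02, 0x80)) // x * x^7 = x^8 ≡ x^4+x^3+x+1 = 0x1b
	fmt.Printf("%#x\n", gfMul(0x03, 0x03)) // (x+1)^2 = x^2+1 = 0x5
}
```

This is the same field arithmetic used by AES (FIPS 197), which is why the GFNI and AES instructions share the reduction polynomial.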
Uint8x16 : expvar.Var
Uint8x16 : fmt.Stringer
func BroadcastUint8x16(x uint8) Uint8x16
func LoadUint8x16(y *[16]uint8) Uint8x16
func LoadUint8x16Slice(s []uint8) Uint8x16
func LoadUint8x16SlicePart(s []uint8) Uint8x16
func Float32x4.AsUint8x16() (to Uint8x16)
func Float64x2.AsUint8x16() (to Uint8x16)
func Int16x8.AsUint8x16() (to Uint8x16)
func Int32x4.AsUint8x16() (to Uint8x16)
func Int64x2.AsUint8x16() (to Uint8x16)
func Int8x16.AsUint8x16() (to Uint8x16)
func Uint16x16.TruncateToUint8() Uint8x16
func Uint16x8.AsUint8x16() (to Uint8x16)
func Uint16x8.TruncateToUint8() Uint8x16
func Uint32x16.TruncateToUint8() Uint8x16
func Uint32x4.AsUint8x16() (to Uint8x16)
func Uint32x4.TruncateToUint8() Uint8x16
func Uint32x8.TruncateToUint8() Uint8x16
func Uint64x2.AsUint8x16() (to Uint8x16)
func Uint64x2.TruncateToUint8() Uint8x16
func Uint64x4.TruncateToUint8() Uint8x16
func Uint64x8.TruncateToUint8() Uint8x16
func Uint8x16.Add(y Uint8x16) Uint8x16
func Uint8x16.AddSaturated(y Uint8x16) Uint8x16
func Uint8x16.AESDecryptLastRound(y Uint32x4) Uint8x16
func Uint8x16.AESDecryptOneRound(y Uint32x4) Uint8x16
func Uint8x16.AESEncryptLastRound(y Uint32x4) Uint8x16
func Uint8x16.AESEncryptOneRound(y Uint32x4) Uint8x16
func Uint8x16.And(y Uint8x16) Uint8x16
func Uint8x16.AndNot(y Uint8x16) Uint8x16
func Uint8x16.Average(y Uint8x16) Uint8x16
func Uint8x16.Broadcast128() Uint8x16
func Uint8x16.Compress(mask Mask8x16) Uint8x16
func Uint8x16.ConcatPermute(y Uint8x16, indices Uint8x16) Uint8x16
func Uint8x16.ConcatShiftBytesRight(constant uint8, y Uint8x16) Uint8x16
func Uint8x16.Expand(mask Mask8x16) Uint8x16
func Uint8x16.GaloisFieldAffineTransform(y Uint64x2, b uint8) Uint8x16
func Uint8x16.GaloisFieldAffineTransformInverse(y Uint64x2, b uint8) Uint8x16
func Uint8x16.GaloisFieldMul(y Uint8x16) Uint8x16
func Uint8x16.Masked(mask Mask8x16) Uint8x16
func Uint8x16.Max(y Uint8x16) Uint8x16
func Uint8x16.Merge(y Uint8x16, mask Mask8x16) Uint8x16
func Uint8x16.Min(y Uint8x16) Uint8x16
func Uint8x16.Not() Uint8x16
func Uint8x16.OnesCount() Uint8x16
func Uint8x16.Or(y Uint8x16) Uint8x16
func Uint8x16.Permute(indices Uint8x16) Uint8x16
func Uint8x16.PermuteOrZero(indices Int8x16) Uint8x16
func Uint8x16.SetElem(index uint8, y uint8) Uint8x16
func Uint8x16.Sub(y Uint8x16) Uint8x16
func Uint8x16.SubSaturated(y Uint8x16) Uint8x16
func Uint8x16.Xor(y Uint8x16) Uint8x16
func Uint8x32.GetHi() Uint8x16
func Uint8x32.GetLo() Uint8x16
func Int8x16.ConcatPermute(y Int8x16, indices Uint8x16) Int8x16
func Int8x16.DotProductQuadruple(y Uint8x16) Int32x4
func Int8x16.DotProductQuadrupleSaturated(y Uint8x16) Int32x4
func Int8x16.Permute(indices Uint8x16) Int8x16
func Uint8x16.Add(y Uint8x16) Uint8x16
func Uint8x16.AddSaturated(y Uint8x16) Uint8x16
func Uint8x16.And(y Uint8x16) Uint8x16
func Uint8x16.AndNot(y Uint8x16) Uint8x16
func Uint8x16.Average(y Uint8x16) Uint8x16
func Uint8x16.ConcatPermute(y Uint8x16, indices Uint8x16) Uint8x16
func Uint8x16.ConcatShiftBytesRight(constant uint8, y Uint8x16) Uint8x16
func Uint8x16.Equal(y Uint8x16) Mask8x16
func Uint8x16.GaloisFieldMul(y Uint8x16) Uint8x16
func Uint8x16.Greater(y Uint8x16) Mask8x16
func Uint8x16.GreaterEqual(y Uint8x16) Mask8x16
func Uint8x16.Less(y Uint8x16) Mask8x16
func Uint8x16.LessEqual(y Uint8x16) Mask8x16
func Uint8x16.Max(y Uint8x16) Uint8x16
func Uint8x16.Merge(y Uint8x16, mask Mask8x16) Uint8x16
func Uint8x16.Min(y Uint8x16) Uint8x16
func Uint8x16.NotEqual(y Uint8x16) Mask8x16
func Uint8x16.Or(y Uint8x16) Uint8x16
func Uint8x16.Permute(indices Uint8x16) Uint8x16
func Uint8x16.Sub(y Uint8x16) Uint8x16
func Uint8x16.SubSaturated(y Uint8x16) Uint8x16
func Uint8x16.SumAbsDiff(y Uint8x16) Uint16x8
func Uint8x16.Xor(y Uint8x16) Uint8x16
func Uint8x32.SetHi(y Uint8x16) Uint8x32
func Uint8x32.SetLo(y Uint8x16) Uint8x32
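PermuteOrZero's indexing rule (VPSHUFB: the low four bits of each index select an element, and a set sign bit yields zero) in a scalar Go sketch — illustrative only, not the package's code, with a hypothetical helper name:

```go
package main

import "fmt"

// permuteOrZero models VPSHUFB on a 16-byte vector: the low 4 bits of each
// index select an element of x, unless the index's sign bit (0x80) is set,
// in which case the result byte is zero.
func permuteOrZero(x, indices [16]uint8) [16]uint8 {
	var out [16]uint8
	for i, idx := range indices {
		if idx&0x80 == 0 {
			out[i] = x[idx&0x0F]
		}
	}
	return out
}

func main() {
	var x [16]uint8
	for i := range x {
		x[i] = uint8(0x40 + i) // 0x40..0x4f
	}
	idx := [16]uint8{0x80, 1, 0x1F} // rest zero
	r := permuteOrZero(x, idx)
	fmt.Println(r[:3]) // [0 65 79]: zeroed, x[1], x[0x1F&0xF]=x[15]
}
```

The grouped 256-bit variant applies this same rule independently within each 128-bit half.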
Uint8x32 is a 256-bit SIMD vector of 32 uint8 AESDecryptLastRound performs a series of operations in the AES cipher algorithm defined in FIPS 197.
x is the state array, starting from low index to high are s00, s10, s20, s30, s01, ..., s33.
y is the chunk of dw array in use.
result = AddRoundKey(InvShiftRows(InvSubBytes(x)), y)
Asm: VAESDECLAST, CPU Feature: AVX512VAES AESDecryptOneRound performs a series of operations in the AES cipher algorithm defined in FIPS 197.
x is the state array, starting from low index to high are s00, s10, s20, s30, s01, ..., s33.
y is the chunk of dw array in use.
result = AddRoundKey(InvMixColumns(InvShiftRows(InvSubBytes(x))), y)
Asm: VAESDEC, CPU Feature: AVX512VAES AESEncryptLastRound performs a series of operations in the AES cipher algorithm defined in FIPS 197.
x is the state array, starting from low index to high are s00, s10, s20, s30, s01, ..., s33.
y is the chunk of w array in use.
result = AddRoundKey((ShiftRows(SubBytes(x))), y)
Asm: VAESENCLAST, CPU Feature: AVX512VAES AESEncryptOneRound performs a series of operations in the AES cipher algorithm defined in FIPS 197.
x is the state array, starting from low index to high are s00, s10, s20, s30, s01, ..., s33.
y is the chunk of w array in use.
result = AddRoundKey(MixColumns(ShiftRows(SubBytes(x))), y)
Asm: VAESENC, CPU Feature: AVX512VAES Add adds corresponding elements of two vectors.
Asm: VPADDB, CPU Feature: AVX2 AddSaturated adds corresponding elements of two vectors with saturation.
Asm: VPADDUSB, CPU Feature: AVX2 And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX2 AndNot performs a bitwise x &^ y.
Asm: VPANDN, CPU Feature: AVX2 Float32x8 converts from Uint8x32 to Float32x8 Float64x4 converts from Uint8x32 to Float64x4 Int16x16 converts from Uint8x32 to Int16x16 Int32x8 converts from Uint8x32 to Int32x8 Int64x4 converts from Uint8x32 to Int64x4 Int8x32 converts from Uint8x32 to Int8x32 Uint16x16 converts from Uint8x32 to Uint16x16 Uint32x8 converts from Uint8x32 to Uint32x8 Uint64x4 converts from Uint8x32 to Uint64x4 Average computes the rounded average of corresponding elements.
Asm: VPAVGB, CPU Feature: AVX2 Compress performs a compression on vector x using mask by
selecting elements as indicated by mask, and packs them into lower-indexed elements.
Asm: VPCOMPRESSB, CPU Feature: AVX512VBMI2 ConcatPermute performs a full permutation of vector x, y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the bits needed to represent an index into xy are used in each element of indices.
Asm: VPERMI2B, CPU Feature: AVX512VBMI ConcatShiftBytesRightGrouped concatenates x and y and shifts the result right by constant bytes.
The result vector will be the lower half of the concatenated vector.
This operation is performed in groups of 16 bytes.
constant results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPALIGNR, CPU Feature: AVX2 DotProductPairsSaturated multiplies the elements and adds the pairs together with saturation,
yielding a vector of half as many elements with twice the input element size.
Asm: VPMADDUBSW, CPU Feature: AVX2 Equal returns x equals y, elementwise.
Asm: VPCMPEQB, CPU Feature: AVX2 Expand performs an expansion on a vector x whose elements are packed into its lower part.
The expansion distributes elements as indexed by mask, from lower mask elements to upper, in order.
Asm: VPEXPANDB, CPU Feature: AVX512VBMI2 ExtendToUint16 converts element values to uint16.
The result vector's elements are zero-extended.
Asm: VPMOVZXBW, CPU Feature: AVX512 GaloisFieldAffineTransform computes an affine transformation in GF(2^8):
x is a vector of 8-bit vectors, with each adjacent 8 as a group; y is a vector of 8x8 1-bit matrices;
b is an 8-bit vector. The affine transformation is y * x + b, with each element of y
corresponding to a group of 8 elements in x.
b results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VGF2P8AFFINEQB, CPU Feature: AVX512GFNI GaloisFieldAffineTransformInverse computes an affine transformation in GF(2^8),
with x inverted with respect to reduction polynomial x^8 + x^4 + x^3 + x + 1:
x is a vector of 8-bit vectors, with each adjacent 8 as a group; y is a vector of 8x8 1-bit matrices;
b is an 8-bit vector. The affine transformation is y * x + b, with each element of y
corresponding to a group of 8 elements in x.
b results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VGF2P8AFFINEINVQB, CPU Feature: AVX512GFNI GaloisFieldMul computes element-wise GF(2^8) multiplication with
reduction polynomial x^8 + x^4 + x^3 + x + 1.
Asm: VGF2P8MULB, CPU Feature: AVX512GFNI GetHi returns the upper half of x.
Asm: VEXTRACTI128, CPU Feature: AVX2 GetLo returns the lower half of x.
Asm: VEXTRACTI128, CPU Feature: AVX2 Greater returns a mask whose elements indicate whether x > y.
Emulated, CPU Feature: AVX2 GreaterEqual returns a mask whose elements indicate whether x >= y.
Emulated, CPU Feature: AVX2 IsZero returns true if all elements of x are zero.
This method compiles to VPTEST x, x.
x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y
Asm: VPTEST, CPU Feature: AVX Len returns the number of elements in a Uint8x32 Less returns a mask whose elements indicate whether x < y.
Emulated, CPU Feature: AVX2 LessEqual returns a mask whose elements indicate whether x <= y.
Emulated, CPU Feature: AVX2 Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VPMAXUB, CPU Feature: AVX2 Merge returns x but with elements set to y where mask is false. Min computes the minimum of corresponding elements.
Asm: VPMINUB, CPU Feature: AVX2 Not returns the bitwise complement of x.
Emulated, CPU Feature: AVX2 NotEqual returns a mask whose elements indicate whether x != y.
Emulated, CPU Feature: AVX2 OnesCount counts the number of set bits in each element.
Asm: VPOPCNTB, CPU Feature: AVX512BITALG Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX2 Permute performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 5 bits (values 0-31) of each element of indices are used.
Asm: VPERMB, CPU Feature: AVX512VBMI PermuteOrZeroGrouped performs a grouped permutation of vector x using indices:
result = {x_group0[indices[0]], x_group0[indices[1]], ..., x_group1[indices[16]], x_group1[indices[17]], ...}
The lower four bits of each byte-sized index in indices select an element from its corresponding group in x,
unless the index's sign bit is set in which case zero is used instead.
Each group is 128 bits in size.
Asm: VPSHUFB, CPU Feature: AVX2 Select128FromPair treats the 256-bit vectors x and y as a single vector of four
128-bit elements, and returns a 256-bit result formed by
concatenating the two elements specified by lo and hi.
For example,
{0x40, 0x41, ..., 0x4f, 0x50, 0x51, ..., 0x5f}.Select128FromPair(3, 0,
{0x60, 0x61, ..., 0x6f, 0x70, 0x71, ..., 0x7f})
returns {0x70, 0x71, ..., 0x7f, 0x40, 0x41, ..., 0x4f}.
lo, hi result in better performance when they are constants; non-constant values will be translated into a jump table.
lo, hi should be between 0 and 3, inclusive; other values may result in a runtime panic.
Asm: VPERM2I128, CPU Feature: AVX2 SetHi returns x with its upper half set to y.
Asm: VINSERTI128, CPU Feature: AVX2 SetLo returns x with its lower half set to y.
Asm: VINSERTI128, CPU Feature: AVX2 Store stores a Uint8x32 to an array StoreSlice stores x into a slice of at least 32 uint8s StoreSlicePart stores the 32 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 32 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x Sub subtracts corresponding elements of two vectors.
Asm: VPSUBB, CPU Feature: AVX2 SubSaturated subtracts corresponding elements of two vectors with saturation.
Asm: VPSUBUSB, CPU Feature: AVX2 SumAbsDiff sums the absolute differences of the two input vectors, treating each run of 8 adjacent bytes as a group. The output
is a vector of word-sized elements in which each 4*n-th element contains the sum for the n-th input group. The other elements in the result vector are zeroed.
This method can be seen as computing the L1 distance between corresponding 8-byte groups of the two input vectors.
Asm: VPSADBW, CPU Feature: AVX2 Xor performs a bitwise XOR operation between two vectors.
Asm: VPXOR, CPU Feature: AVX2
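The SumAbsDiff semantics described above can be pinned down with a scalar reference. The following sketch (the name sumAbsDiff32 is hypothetical, not part of the package) mirrors what VPSADBW computes for a 32-byte vector:

```go
package main

import "fmt"

// sumAbsDiff32 is a scalar sketch of Uint8x32.SumAbsDiff (VPSADBW):
// each adjacent 8-byte group of x and y contributes one sum of absolute
// differences, placed in every 4th uint16 lane of the result.
func sumAbsDiff32(x, y [32]uint8) [16]uint16 {
	var out [16]uint16
	for g := 0; g < 4; g++ { // four 8-byte groups in 32 bytes
		var sum uint16
		for i := 0; i < 8; i++ {
			a, b := x[g*8+i], y[g*8+i]
			if a >= b {
				sum += uint16(a - b)
			} else {
				sum += uint16(b - a)
			}
		}
		out[g*4] = sum // lanes 4g+1 .. 4g+3 stay zero
	}
	return out
}

func main() {
	var x, y [32]uint8
	for i := range x {
		x[i] = uint8(i)
	}
	fmt.Println(sumAbsDiff32(x, y))
}
```

With y all zero, lane 4n is simply the sum of the n-th group of x, which makes the "4*n-th element" layout easy to see.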
Uint8x32 : expvar.Var
Uint8x32 : fmt.Stringer
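The Select128FromPair example above can be restated as a scalar sketch. This hypothetical select128FromPair (not the intrinsic) treats x and y as four 16-byte lanes and reproduces the documented example:

```go
package main

import "fmt"

// select128FromPair is a scalar sketch of the Select128FromPair
// (VPERM2I128) semantics: x and y form four 16-byte lanes
// {x_lo, x_hi, y_lo, y_hi}; the result is lane[lo] followed by lane[hi].
// lo and hi must be in [0, 3], matching the documented panic behavior.
func select128FromPair(lo, hi uint8, x, y [32]uint8) [32]uint8 {
	lanes := [4][]uint8{x[:16], x[16:], y[:16], y[16:]}
	var out [32]uint8
	copy(out[:16], lanes[lo])
	copy(out[16:], lanes[hi])
	return out
}

func main() {
	var x, y [32]uint8
	for i := range x {
		x[i] = uint8(0x40 + i) // {0x40, ..., 0x5f}
		y[i] = uint8(0x60 + i) // {0x60, ..., 0x7f}
	}
	// The documented example: (3, 0) yields {0x70, ..., 0x7f, 0x40, ..., 0x4f}.
	out := select128FromPair(3, 0, x, y)
	fmt.Printf("%#x %#x\n", out[0], out[16])
}
```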
func BroadcastUint8x32(x uint8) Uint8x32
func LoadUint8x32(y *[32]uint8) Uint8x32
func LoadUint8x32Slice(s []uint8) Uint8x32
func LoadUint8x32SlicePart(s []uint8) Uint8x32
func Float32x8.AsUint8x32() (to Uint8x32)
func Float64x4.AsUint8x32() (to Uint8x32)
func Int16x16.AsUint8x32() (to Uint8x32)
func Int32x8.AsUint8x32() (to Uint8x32)
func Int64x4.AsUint8x32() (to Uint8x32)
func Int8x32.AsUint8x32() (to Uint8x32)
func Uint16x16.AsUint8x32() (to Uint8x32)
func Uint16x32.SaturateToUint8() Uint8x32
func Uint16x32.TruncateToUint8() Uint8x32
func Uint32x8.AsUint8x32() (to Uint8x32)
func Uint64x4.AsUint8x32() (to Uint8x32)
func Uint8x16.Broadcast256() Uint8x32
func Uint8x32.Add(y Uint8x32) Uint8x32
func Uint8x32.AddSaturated(y Uint8x32) Uint8x32
func Uint8x32.AESDecryptLastRound(y Uint32x8) Uint8x32
func Uint8x32.AESDecryptOneRound(y Uint32x8) Uint8x32
func Uint8x32.AESEncryptLastRound(y Uint32x8) Uint8x32
func Uint8x32.AESEncryptOneRound(y Uint32x8) Uint8x32
func Uint8x32.And(y Uint8x32) Uint8x32
func Uint8x32.AndNot(y Uint8x32) Uint8x32
func Uint8x32.Average(y Uint8x32) Uint8x32
func Uint8x32.Compress(mask Mask8x32) Uint8x32
func Uint8x32.ConcatPermute(y Uint8x32, indices Uint8x32) Uint8x32
func Uint8x32.ConcatShiftBytesRightGrouped(constant uint8, y Uint8x32) Uint8x32
func Uint8x32.Expand(mask Mask8x32) Uint8x32
func Uint8x32.GaloisFieldAffineTransform(y Uint64x4, b uint8) Uint8x32
func Uint8x32.GaloisFieldAffineTransformInverse(y Uint64x4, b uint8) Uint8x32
func Uint8x32.GaloisFieldMul(y Uint8x32) Uint8x32
func Uint8x32.Masked(mask Mask8x32) Uint8x32
func Uint8x32.Max(y Uint8x32) Uint8x32
func Uint8x32.Merge(y Uint8x32, mask Mask8x32) Uint8x32
func Uint8x32.Min(y Uint8x32) Uint8x32
func Uint8x32.Not() Uint8x32
func Uint8x32.OnesCount() Uint8x32
func Uint8x32.Or(y Uint8x32) Uint8x32
func Uint8x32.Permute(indices Uint8x32) Uint8x32
func Uint8x32.PermuteOrZeroGrouped(indices Int8x32) Uint8x32
func Uint8x32.Select128FromPair(lo, hi uint8, y Uint8x32) Uint8x32
func Uint8x32.SetHi(y Uint8x16) Uint8x32
func Uint8x32.SetLo(y Uint8x16) Uint8x32
func Uint8x32.Sub(y Uint8x32) Uint8x32
func Uint8x32.SubSaturated(y Uint8x32) Uint8x32
func Uint8x32.Xor(y Uint8x32) Uint8x32
func Uint8x64.GetHi() Uint8x32
func Uint8x64.GetLo() Uint8x32
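The grouped-permute rule described for PermuteOrZeroGrouped (low four index bits select within a 128-bit group, a set sign bit yields zero) can be sketched scalarly; permuteOrZeroGrouped here is a hypothetical reference, not the VPSHUFB intrinsic:

```go
package main

import "fmt"

// permuteOrZeroGrouped is a scalar sketch of PermuteOrZeroGrouped
// (VPSHUFB) on a 256-bit vector: within each 16-byte group, the low
// 4 bits of each index select a byte from the same group of x, unless
// the index's sign bit is set, in which case the result byte is zero.
func permuteOrZeroGrouped(x [32]uint8, indices [32]int8) [32]uint8 {
	var out [32]uint8
	for i, idx := range indices {
		if idx < 0 {
			continue // sign bit set: element stays zero
		}
		group := (i / 16) * 16 // start of this index's 16-byte group
		out[i] = x[group+int(idx&0x0F)]
	}
	return out
}

func main() {
	var x [32]uint8
	for i := range x {
		x[i] = uint8(0x40 + i)
	}
	var idx [32]int8
	idx[0] = 15 // selects x[15] within group 0
	idx[16] = 1 // selects x[17] within group 1
	idx[1] = -1 // zeroed
	out := permuteOrZeroGrouped(x, idx)
	fmt.Println(out[0], out[16], out[1])
}
```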
func Int8x32.ConcatPermute(y Int8x32, indices Uint8x32) Int8x32
func Int8x32.DotProductQuadruple(y Uint8x32) Int32x8
func Int8x32.DotProductQuadrupleSaturated(y Uint8x32) Int32x8
func Int8x32.Permute(indices Uint8x32) Int8x32
func Uint8x32.Add(y Uint8x32) Uint8x32
func Uint8x32.AddSaturated(y Uint8x32) Uint8x32
func Uint8x32.And(y Uint8x32) Uint8x32
func Uint8x32.AndNot(y Uint8x32) Uint8x32
func Uint8x32.Average(y Uint8x32) Uint8x32
func Uint8x32.ConcatPermute(y Uint8x32, indices Uint8x32) Uint8x32
func Uint8x32.ConcatShiftBytesRightGrouped(constant uint8, y Uint8x32) Uint8x32
func Uint8x32.Equal(y Uint8x32) Mask8x32
func Uint8x32.GaloisFieldMul(y Uint8x32) Uint8x32
func Uint8x32.Greater(y Uint8x32) Mask8x32
func Uint8x32.GreaterEqual(y Uint8x32) Mask8x32
func Uint8x32.Less(y Uint8x32) Mask8x32
func Uint8x32.LessEqual(y Uint8x32) Mask8x32
func Uint8x32.Max(y Uint8x32) Uint8x32
func Uint8x32.Merge(y Uint8x32, mask Mask8x32) Uint8x32
func Uint8x32.Min(y Uint8x32) Uint8x32
func Uint8x32.NotEqual(y Uint8x32) Mask8x32
func Uint8x32.Or(y Uint8x32) Uint8x32
func Uint8x32.Permute(indices Uint8x32) Uint8x32
func Uint8x32.Select128FromPair(lo, hi uint8, y Uint8x32) Uint8x32
func Uint8x32.Sub(y Uint8x32) Uint8x32
func Uint8x32.SubSaturated(y Uint8x32) Uint8x32
func Uint8x32.SumAbsDiff(y Uint8x32) Uint16x16
func Uint8x32.Xor(y Uint8x32) Uint8x32
func Uint8x64.SetHi(y Uint8x32) Uint8x64
func Uint8x64.SetLo(y Uint8x32) Uint8x64
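The complementary Masked and Merge semantics documented above (zero where the mask is false, versus take y where the mask is false) can be sketched with a plain bool array standing in for Mask8x32; the names masked and merge are illustrative only:

```go
package main

import "fmt"

// masked is a scalar sketch of Uint8x32.Masked: elements are
// zeroed where mask is false.
func masked(x [32]uint8, mask [32]bool) [32]uint8 {
	var out [32]uint8
	for i := range x {
		if mask[i] {
			out[i] = x[i]
		}
	}
	return out
}

// merge is a scalar sketch of Uint8x32.Merge: elements are
// set to y where mask is false.
func merge(x, y [32]uint8, mask [32]bool) [32]uint8 {
	out := x
	for i := range out {
		if !mask[i] {
			out[i] = y[i]
		}
	}
	return out
}

func main() {
	var x, y [32]uint8
	var m [32]bool
	x[0], x[1], y[1] = 10, 20, 99
	m[0] = true
	fmt.Println(masked(x, m)[0], masked(x, m)[1], merge(x, y, m)[1])
}
```

This is the pattern the compiler looks for when fusing an operation with a following Masked or Merge into a single masked instruction on AVX512.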
Uint8x64 is a 512-bit SIMD vector of 64 uint8 AESDecryptLastRound performs a series of operations in the AES cipher algorithm defined in FIPS 197.
x is the state array, starting from low index to high are s00, s10, s20, s30, s01, ..., s33.
y is the chunk of dw array in use.
result = AddRoundKey(InvShiftRows(InvSubBytes(x)), y)
Asm: VAESDECLAST, CPU Feature: AVX512VAES AESDecryptOneRound performs a series of operations in the AES cipher algorithm defined in FIPS 197.
x is the state array, starting from low index to high are s00, s10, s20, s30, s01, ..., s33.
y is the chunk of dw array in use.
result = AddRoundKey(InvMixColumns(InvShiftRows(InvSubBytes(x))), y)
Asm: VAESDEC, CPU Feature: AVX512VAES AESEncryptLastRound performs a series of operations in the AES cipher algorithm defined in FIPS 197.
x is the state array, starting from low index to high are s00, s10, s20, s30, s01, ..., s33.
y is the chunk of w array in use.
result = AddRoundKey((ShiftRows(SubBytes(x))), y)
Asm: VAESENCLAST, CPU Feature: AVX512VAES AESEncryptOneRound performs a series of operations in the AES cipher algorithm defined in FIPS 197.
x is the state array, starting from low index to high are s00, s10, s20, s30, s01, ..., s33.
y is the chunk of w array in use.
result = AddRoundKey(MixColumns(ShiftRows(SubBytes(x))), y)
Asm: VAESENC, CPU Feature: AVX512VAES Add adds corresponding elements of two vectors.
Asm: VPADDB, CPU Feature: AVX512 AddSaturated adds corresponding elements of two vectors with saturation.
Asm: VPADDUSB, CPU Feature: AVX512 And performs a bitwise AND operation between two vectors.
Asm: VPANDD, CPU Feature: AVX512 AndNot performs a bitwise x &^ y.
Asm: VPANDND, CPU Feature: AVX512 Float32x16 converts from Uint8x64 to Float32x16 Float64x8 converts from Uint8x64 to Float64x8 Int16x32 converts from Uint8x64 to Int16x32 Int32x16 converts from Uint8x64 to Int32x16 Int64x8 converts from Uint8x64 to Int64x8 Int8x64 converts from Uint8x64 to Int8x64 Uint16x32 converts from Uint8x64 to Uint16x32 Uint32x16 converts from Uint8x64 to Uint32x16 Uint64x8 converts from Uint8x64 to Uint64x8 Average computes the rounded average of corresponding elements.
Asm: VPAVGB, CPU Feature: AVX512 Compress performs a compression on vector x using mask by
selecting the elements indicated by mask and packing them into the lower-indexed elements.
Asm: VPCOMPRESSB, CPU Feature: AVX512VBMI2 ConcatPermute performs a full permutation of vector x, y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the bits needed to represent an index into xy are used from each element of indices.
Asm: VPERMI2B, CPU Feature: AVX512VBMI ConcatShiftBytesRightGrouped concatenates x and y and shifts the result right by constant bytes.
The result vector will be the lower half of the concatenated vector.
This operation is performed in groups of 16 bytes.
constant yields better performance when it is a compile-time constant; a non-constant value will be translated into a jump table.
Asm: VPALIGNR, CPU Feature: AVX512 DotProductPairsSaturated multiplies corresponding elements and adds the pairs together with saturation,
yielding a vector of half as many elements with twice the input element size.
Asm: VPMADDUBSW, CPU Feature: AVX512 Equal returns x equals y, elementwise.
Asm: VPCMPEQB, CPU Feature: AVX512 Expand performs an expansion of vector x, whose elements are packed into the lower positions.
The expansion distributes those elements to the positions selected by mask, filling from the lowest set mask element upward in order.
Asm: VPEXPANDB, CPU Feature: AVX512VBMI2 GaloisFieldAffineTransform computes an affine transformation in GF(2^8):
x is a vector of 8-bit elements, with each 8 adjacent elements forming a group; y is a vector of 8x8 1-bit matrices;
b is an 8-bit vector. The affine transformation is y * x + b, with each element of y
corresponding to a group of 8 elements in x.
b yields better performance when it is a compile-time constant; a non-constant value will be translated into a jump table.
Asm: VGF2P8AFFINEQB, CPU Feature: AVX512GFNI GaloisFieldAffineTransformInverse computes an affine transformation in GF(2^8),
with x inverted with respect to reduction polynomial x^8 + x^4 + x^3 + x + 1:
x is a vector of 8-bit elements, with each 8 adjacent elements forming a group; y is a vector of 8x8 1-bit matrices;
b is an 8-bit vector. The affine transformation is y * x + b, with each element of y
corresponding to a group of 8 elements in x.
b yields better performance when it is a compile-time constant; a non-constant value will be translated into a jump table.
Asm: VGF2P8AFFINEINVQB, CPU Feature: AVX512GFNI GaloisFieldMul computes element-wise GF(2^8) multiplication with
reduction polynomial x^8 + x^4 + x^3 + x + 1.
Asm: VGF2P8MULB, CPU Feature: AVX512GFNI GetHi returns the upper half of x.
Asm: VEXTRACTI64X4, CPU Feature: AVX512 GetLo returns the lower half of x.
Asm: VEXTRACTI64X4, CPU Feature: AVX512 Greater returns x greater-than y, elementwise.
Asm: VPCMPUB, CPU Feature: AVX512 GreaterEqual returns x greater-than-or-equals y, elementwise.
Asm: VPCMPUB, CPU Feature: AVX512 Len returns the number of elements in a Uint8x64 Less returns x less-than y, elementwise.
Asm: VPCMPUB, CPU Feature: AVX512 LessEqual returns x less-than-or-equals y, elementwise.
Asm: VPCMPUB, CPU Feature: AVX512 Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VPMAXUB, CPU Feature: AVX512 Merge returns x but with elements set to y where mask is false. Min computes the minimum of corresponding elements.
Asm: VPMINUB, CPU Feature: AVX512 Not returns the bitwise complement of x
Emulated, CPU Feature AVX512 NotEqual returns x not-equals y, elementwise.
Asm: VPCMPUB, CPU Feature: AVX512 OnesCount counts the number of set bits in each element.
Asm: VPOPCNTB, CPU Feature: AVX512BITALG Or performs a bitwise OR operation between two vectors.
Asm: VPORD, CPU Feature: AVX512 Permute performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 6 bits (values 0-63) of each element of indices are used
Asm: VPERMB, CPU Feature: AVX512VBMI PermuteOrZeroGrouped performs a grouped permutation of vector x using indices:
result = {x_group0[indices[0]], x_group0[indices[1]], ..., x_group1[indices[16]], x_group1[indices[17]], ...}
The lower four bits of each byte-sized index in indices select an element from its corresponding group in x,
unless the index's sign bit is set, in which case zero is used instead.
Each group is 128 bits in size.
Asm: VPSHUFB, CPU Feature: AVX512 SetHi returns x with its upper half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512 SetLo returns x with its lower half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512 Store stores a Uint8x64 to an array StoreMasked stores a Uint8x64 to an array,
at those elements enabled by mask
Asm: VMOVDQU8, CPU Feature: AVX512 StoreSlice stores x into a slice of at least 64 uint8s StoreSlicePart stores the 64 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 64 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x Sub subtracts corresponding elements of two vectors.
Asm: VPSUBB, CPU Feature: AVX512 SubSaturated subtracts corresponding elements of two vectors with saturation.
Asm: VPSUBUSB, CPU Feature: AVX512 SumAbsDiff sums the absolute differences of the two input vectors, treating each run of 8 adjacent bytes as a group. The output
is a vector of word-sized elements in which each 4*n-th element contains the sum for the n-th input group. The other elements in the result vector are zeroed.
This method can be seen as computing the L1 distance between corresponding 8-byte groups of the two input vectors.
Asm: VPSADBW, CPU Feature: AVX512 Xor performs a bitwise XOR operation between two vectors.
Asm: VPXORD, CPU Feature: AVX512
Uint8x64 : expvar.Var
Uint8x64 : fmt.Stringer
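The Compress and Expand operations above are inverse-shaped: Compress packs mask-selected elements downward, Expand distributes low elements to mask-selected positions. A scalar sketch (hypothetical names, mask modeled as []bool):

```go
package main

import "fmt"

// compress is a scalar sketch of the Compress (VPCOMPRESSB) semantics:
// mask-selected elements pack into the lowest positions; the tail is zero.
func compress(x []uint8, mask []bool) []uint8 {
	out := make([]uint8, len(x))
	j := 0
	for i, ok := range mask {
		if ok {
			out[j] = x[i]
			j++
		}
	}
	return out
}

// expand is a scalar sketch of the Expand (VPEXPANDB) semantics:
// the lowest elements of x distribute, in order, to the positions
// where mask is set.
func expand(x []uint8, mask []bool) []uint8 {
	out := make([]uint8, len(x))
	j := 0
	for i, ok := range mask {
		if ok {
			out[i] = x[j]
			j++
		}
	}
	return out
}

func main() {
	x := []uint8{1, 2, 3, 4}
	m := []bool{false, true, false, true}
	fmt.Println(compress(x, m), expand(x, m))
}
```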
func BroadcastUint8x64(x uint8) Uint8x64
func LoadMaskedUint8x64(y *[64]uint8, mask Mask8x64) Uint8x64
func LoadUint8x64(y *[64]uint8) Uint8x64
func LoadUint8x64Slice(s []uint8) Uint8x64
func LoadUint8x64SlicePart(s []uint8) Uint8x64
func Float32x16.AsUint8x64() (to Uint8x64)
func Float64x8.AsUint8x64() (to Uint8x64)
func Int16x32.AsUint8x64() (to Uint8x64)
func Int32x16.AsUint8x64() (to Uint8x64)
func Int64x8.AsUint8x64() (to Uint8x64)
func Int8x64.AsUint8x64() (to Uint8x64)
func Uint16x32.AsUint8x64() (to Uint8x64)
func Uint32x16.AsUint8x64() (to Uint8x64)
func Uint64x8.AsUint8x64() (to Uint8x64)
func Uint8x16.Broadcast512() Uint8x64
func Uint8x64.Add(y Uint8x64) Uint8x64
func Uint8x64.AddSaturated(y Uint8x64) Uint8x64
func Uint8x64.AESDecryptLastRound(y Uint32x16) Uint8x64
func Uint8x64.AESDecryptOneRound(y Uint32x16) Uint8x64
func Uint8x64.AESEncryptLastRound(y Uint32x16) Uint8x64
func Uint8x64.AESEncryptOneRound(y Uint32x16) Uint8x64
func Uint8x64.And(y Uint8x64) Uint8x64
func Uint8x64.AndNot(y Uint8x64) Uint8x64
func Uint8x64.Average(y Uint8x64) Uint8x64
func Uint8x64.Compress(mask Mask8x64) Uint8x64
func Uint8x64.ConcatPermute(y Uint8x64, indices Uint8x64) Uint8x64
func Uint8x64.ConcatShiftBytesRightGrouped(constant uint8, y Uint8x64) Uint8x64
func Uint8x64.Expand(mask Mask8x64) Uint8x64
func Uint8x64.GaloisFieldAffineTransform(y Uint64x8, b uint8) Uint8x64
func Uint8x64.GaloisFieldAffineTransformInverse(y Uint64x8, b uint8) Uint8x64
func Uint8x64.GaloisFieldMul(y Uint8x64) Uint8x64
func Uint8x64.Masked(mask Mask8x64) Uint8x64
func Uint8x64.Max(y Uint8x64) Uint8x64
func Uint8x64.Merge(y Uint8x64, mask Mask8x64) Uint8x64
func Uint8x64.Min(y Uint8x64) Uint8x64
func Uint8x64.Not() Uint8x64
func Uint8x64.OnesCount() Uint8x64
func Uint8x64.Or(y Uint8x64) Uint8x64
func Uint8x64.Permute(indices Uint8x64) Uint8x64
func Uint8x64.PermuteOrZeroGrouped(indices Int8x64) Uint8x64
func Uint8x64.SetHi(y Uint8x32) Uint8x64
func Uint8x64.SetLo(y Uint8x32) Uint8x64
func Uint8x64.Sub(y Uint8x64) Uint8x64
func Uint8x64.SubSaturated(y Uint8x64) Uint8x64
func Uint8x64.Xor(y Uint8x64) Uint8x64
func Int8x64.ConcatPermute(y Int8x64, indices Uint8x64) Int8x64
func Int8x64.DotProductQuadruple(y Uint8x64) Int32x16
func Int8x64.DotProductQuadrupleSaturated(y Uint8x64) Int32x16
func Int8x64.Permute(indices Uint8x64) Int8x64
func Uint8x64.Add(y Uint8x64) Uint8x64
func Uint8x64.AddSaturated(y Uint8x64) Uint8x64
func Uint8x64.And(y Uint8x64) Uint8x64
func Uint8x64.AndNot(y Uint8x64) Uint8x64
func Uint8x64.Average(y Uint8x64) Uint8x64
func Uint8x64.ConcatPermute(y Uint8x64, indices Uint8x64) Uint8x64
func Uint8x64.ConcatShiftBytesRightGrouped(constant uint8, y Uint8x64) Uint8x64
func Uint8x64.Equal(y Uint8x64) Mask8x64
func Uint8x64.GaloisFieldMul(y Uint8x64) Uint8x64
func Uint8x64.Greater(y Uint8x64) Mask8x64
func Uint8x64.GreaterEqual(y Uint8x64) Mask8x64
func Uint8x64.Less(y Uint8x64) Mask8x64
func Uint8x64.LessEqual(y Uint8x64) Mask8x64
func Uint8x64.Max(y Uint8x64) Uint8x64
func Uint8x64.Merge(y Uint8x64, mask Mask8x64) Uint8x64
func Uint8x64.Min(y Uint8x64) Uint8x64
func Uint8x64.NotEqual(y Uint8x64) Mask8x64
func Uint8x64.Or(y Uint8x64) Uint8x64
func Uint8x64.Permute(indices Uint8x64) Uint8x64
func Uint8x64.Sub(y Uint8x64) Uint8x64
func Uint8x64.SubSaturated(y Uint8x64) Uint8x64
func Uint8x64.SumAbsDiff(y Uint8x64) Uint16x32
func Uint8x64.Xor(y Uint8x64) Uint8x64
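GaloisFieldMul is documented above as element-wise GF(2^8) multiplication with reduction polynomial x^8 + x^4 + x^3 + x + 1 (the AES polynomial). A scalar sketch of that arithmetic, using a hypothetical gfMul helper:

```go
package main

import "fmt"

// gfMul is a scalar sketch of the GaloisFieldMul (VGF2P8MULB) semantics:
// carry-less multiplication in GF(2^8), reduced by the polynomial
// x^8 + x^4 + x^3 + x + 1 (0x11B).
func gfMul(a, b uint8) uint8 {
	var p uint8
	for i := 0; i < 8; i++ {
		if b&1 != 0 {
			p ^= a // add (XOR) the current shifted multiplicand
		}
		hi := a & 0x80
		a <<= 1
		if hi != 0 {
			a ^= 0x1B // reduce: x^8 ≡ x^4 + x^3 + x + 1
		}
		b >>= 1
	}
	return p
}

func main() {
	// The worked example from FIPS 197: {57} * {83} = {c1}.
	fmt.Printf("%#x\n", gfMul(0x57, 0x83))
}
```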
AES returns whether the CPU supports the AES feature.
AES is defined on all GOARCHes, but will only return true on
GOARCH amd64. AVX returns whether the CPU supports the AVX feature.
AVX is defined on all GOARCHes, but will only return true on
GOARCH amd64. AVX2 returns whether the CPU supports the AVX2 feature.
AVX2 is defined on all GOARCHes, but will only return true on
GOARCH amd64. AVX512 returns whether the CPU supports the AVX512F+CD+BW+DQ+VL features.
These five CPU features are bundled together, and no use of AVX-512
is allowed unless all of these features are supported together.
Nearly every CPU that has shipped with any support for AVX-512 has
supported all five of these features.
AVX512 is defined on all GOARCHes, but will only return true on
GOARCH amd64. AVX512BITALG returns whether the CPU supports the AVX512BITALG feature.
AVX512BITALG is defined on all GOARCHes, but will only return true on
GOARCH amd64. AVX512GFNI returns whether the CPU supports the AVX512GFNI feature.
AVX512GFNI is defined on all GOARCHes, but will only return true on
GOARCH amd64. AVX512VAES returns whether the CPU supports the AVX512VAES feature.
AVX512VAES is defined on all GOARCHes, but will only return true on
GOARCH amd64. AVX512VBMI returns whether the CPU supports the AVX512VBMI feature.
AVX512VBMI is defined on all GOARCHes, but will only return true on
GOARCH amd64. AVX512VBMI2 returns whether the CPU supports the AVX512VBMI2 feature.
AVX512VBMI2 is defined on all GOARCHes, but will only return true on
GOARCH amd64. AVX512VNNI returns whether the CPU supports the AVX512VNNI feature.
AVX512VNNI is defined on all GOARCHes, but will only return true on
GOARCH amd64. AVX512VPCLMULQDQ returns whether the CPU supports the AVX512VPCLMULQDQ feature.
AVX512VPCLMULQDQ is defined on all GOARCHes, but will only return true on
GOARCH amd64. AVX512VPOPCNTDQ returns whether the CPU supports the AVX512VPOPCNTDQ feature.
AVX512VPOPCNTDQ is defined on all GOARCHes, but will only return true on
GOARCH amd64. AVXVNNI returns whether the CPU supports the AVXVNNI feature.
AVXVNNI is defined on all GOARCHes, but will only return true on
GOARCH amd64. SHA returns whether the CPU supports the SHA feature.
SHA is defined on all GOARCHes, but will only return true on
GOARCH amd64.
var X86
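As the package documentation recommends, CPU features should be checked before using the corresponding operations. The sketch below shows the dispatch shape only: hasAVX2 stands in for a call like archsimd.X86.AVX2(), and the vector branch is left as a comment because archsimd itself is only available under GOEXPERIMENT=simd.

```go
package main

import "fmt"

// addBytes dispatches between a (elided) vector path and a scalar
// fallback. In real code the feature check would be
// archsimd.X86.AVX2(), queried once and reused.
func addBytes(dst, a, b []uint8, hasAVX2 bool) {
	if hasAVX2 {
		// Vector path would go here: LoadUint8x32Slice / Add / StoreSlice
		// over 32-byte chunks, with the scalar loop below handling the tail.
	}
	for i := range dst { // scalar fallback (and tail)
		dst[i] = a[i] + b[i]
	}
}

func main() {
	a := []uint8{1, 2, 3}
	b := []uint8{10, 20, 30}
	dst := make([]uint8, 3)
	addBytes(dst, a, b, false)
	fmt.Println(dst)
}
```

Note that on non-amd64 GOARCHes every feature method simply returns false, so this pattern degrades cleanly to the scalar path.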
Package-Level Functions (total 155)
BroadcastFloat32x16 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX512F
BroadcastFloat32x4 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX2
BroadcastFloat32x8 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX2
BroadcastFloat64x2 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX2
BroadcastFloat64x4 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX2
BroadcastFloat64x8 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX512F
BroadcastInt16x16 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX2
BroadcastInt16x32 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX512BW
BroadcastInt16x8 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX2
BroadcastInt32x16 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX512F
BroadcastInt32x4 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX2
BroadcastInt32x8 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX2
BroadcastInt64x2 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX2
BroadcastInt64x4 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX2
BroadcastInt64x8 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX512F
BroadcastInt8x16 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX2
BroadcastInt8x32 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX2
BroadcastInt8x64 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX512BW
BroadcastUint16x16 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX2
BroadcastUint16x32 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX512BW
BroadcastUint16x8 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX2
BroadcastUint32x16 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX512F
BroadcastUint32x4 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX2
BroadcastUint32x8 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX2
BroadcastUint64x2 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX2
BroadcastUint64x4 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX2
BroadcastUint64x8 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX512F
BroadcastUint8x16 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX2
BroadcastUint8x32 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX2
BroadcastUint8x64 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX512BW
ClearAVXUpperBits clears the high bits of Y0-Y15 and Z0-Z15 registers.
It is intended for transitioning from AVX to SSE, eliminating the
performance penalties caused by false dependencies.
Note: in the future the compiler may automatically generate the
instruction, making this function unnecessary.
Asm: VZEROUPPER, CPU Feature: AVX
LoadFloat32x16 loads a Float32x16 from an array
LoadFloat32x16Slice loads a Float32x16 from a slice of at least 16 float32s
LoadFloat32x16SlicePart loads a Float32x16 from the slice s.
If s has fewer than 16 elements, the remaining elements of the vector are filled with zeroes.
If s has 16 or more elements, the function is equivalent to LoadFloat32x16Slice.
LoadFloat32x4 loads a Float32x4 from an array
LoadFloat32x4Slice loads a Float32x4 from a slice of at least 4 float32s
LoadFloat32x4SlicePart loads a Float32x4 from the slice s.
If s has fewer than 4 elements, the remaining elements of the vector are filled with zeroes.
If s has 4 or more elements, the function is equivalent to LoadFloat32x4Slice.
LoadFloat32x8 loads a Float32x8 from an array
LoadFloat32x8Slice loads a Float32x8 from a slice of at least 8 float32s
LoadFloat32x8SlicePart loads a Float32x8 from the slice s.
If s has fewer than 8 elements, the remaining elements of the vector are filled with zeroes.
If s has 8 or more elements, the function is equivalent to LoadFloat32x8Slice.
LoadFloat64x2 loads a Float64x2 from an array
LoadFloat64x2Slice loads a Float64x2 from a slice of at least 2 float64s
LoadFloat64x2SlicePart loads a Float64x2 from the slice s.
If s has fewer than 2 elements, the remaining elements of the vector are filled with zeroes.
If s has 2 or more elements, the function is equivalent to LoadFloat64x2Slice.
LoadFloat64x4 loads a Float64x4 from an array
LoadFloat64x4Slice loads a Float64x4 from a slice of at least 4 float64s
LoadFloat64x4SlicePart loads a Float64x4 from the slice s.
If s has fewer than 4 elements, the remaining elements of the vector are filled with zeroes.
If s has 4 or more elements, the function is equivalent to LoadFloat64x4Slice.
LoadFloat64x8 loads a Float64x8 from an array
LoadFloat64x8Slice loads a Float64x8 from a slice of at least 8 float64s
LoadFloat64x8SlicePart loads a Float64x8 from the slice s.
If s has fewer than 8 elements, the remaining elements of the vector are filled with zeroes.
If s has 8 or more elements, the function is equivalent to LoadFloat64x8Slice.
LoadInt16x16 loads an Int16x16 from an array
LoadInt16x16Slice loads an Int16x16 from a slice of at least 16 int16s
LoadInt16x16SlicePart loads an Int16x16 from the slice s.
If s has fewer than 16 elements, the remaining elements of the vector are filled with zeroes.
If s has 16 or more elements, the function is equivalent to LoadInt16x16Slice.
LoadInt16x32 loads an Int16x32 from an array
LoadInt16x32Slice loads an Int16x32 from a slice of at least 32 int16s
LoadInt16x32SlicePart loads an Int16x32 from the slice s.
If s has fewer than 32 elements, the remaining elements of the vector are filled with zeroes.
If s has 32 or more elements, the function is equivalent to LoadInt16x32Slice.
LoadInt16x8 loads an Int16x8 from an array
LoadInt16x8Slice loads an Int16x8 from a slice of at least 8 int16s
LoadInt16x8SlicePart loads an Int16x8 from the slice s.
If s has fewer than 8 elements, the remaining elements of the vector are filled with zeroes.
If s has 8 or more elements, the function is equivalent to LoadInt16x8Slice.
LoadInt32x16 loads an Int32x16 from an array
LoadInt32x16Slice loads an Int32x16 from a slice of at least 16 int32s
LoadInt32x16SlicePart loads an Int32x16 from the slice s.
If s has fewer than 16 elements, the remaining elements of the vector are filled with zeroes.
If s has 16 or more elements, the function is equivalent to LoadInt32x16Slice.
LoadInt32x4 loads an Int32x4 from an array
LoadInt32x4Slice loads an Int32x4 from a slice of at least 4 int32s
LoadInt32x4SlicePart loads an Int32x4 from the slice s.
If s has fewer than 4 elements, the remaining elements of the vector are filled with zeroes.
If s has 4 or more elements, the function is equivalent to LoadInt32x4Slice.
LoadInt32x8 loads an Int32x8 from an array
LoadInt32x8Slice loads an Int32x8 from a slice of at least 8 int32s
LoadInt32x8SlicePart loads an Int32x8 from the slice s.
If s has fewer than 8 elements, the remaining elements of the vector are filled with zeroes.
If s has 8 or more elements, the function is equivalent to LoadInt32x8Slice.
LoadInt64x2 loads an Int64x2 from an array
LoadInt64x2Slice loads an Int64x2 from a slice of at least 2 int64s
LoadInt64x2SlicePart loads an Int64x2 from the slice s.
If s has fewer than 2 elements, the remaining elements of the vector are filled with zeroes.
If s has 2 or more elements, the function is equivalent to LoadInt64x2Slice.
LoadInt64x4 loads an Int64x4 from an array
LoadInt64x4Slice loads an Int64x4 from a slice of at least 4 int64s
LoadInt64x4SlicePart loads an Int64x4 from the slice s.
If s has fewer than 4 elements, the remaining elements of the vector are filled with zeroes.
If s has 4 or more elements, the function is equivalent to LoadInt64x4Slice.
LoadInt64x8 loads an Int64x8 from an array
LoadInt64x8Slice loads an Int64x8 from a slice of at least 8 int64s
LoadInt64x8SlicePart loads an Int64x8 from the slice s.
If s has fewer than 8 elements, the remaining elements of the vector are filled with zeroes.
If s has 8 or more elements, the function is equivalent to LoadInt64x8Slice.
LoadInt8x16 loads an Int8x16 from an array
LoadInt8x16Slice loads an Int8x16 from a slice of at least 16 int8s
LoadInt8x16SlicePart loads an Int8x16 from the slice s.
If s has fewer than 16 elements, the remaining elements of the vector are filled with zeroes.
If s has 16 or more elements, the function is equivalent to LoadInt8x16Slice.
LoadInt8x32 loads an Int8x32 from an array
LoadInt8x32Slice loads an Int8x32 from a slice of at least 32 int8s
LoadInt8x32SlicePart loads an Int8x32 from the slice s.
If s has fewer than 32 elements, the remaining elements of the vector are filled with zeroes.
If s has 32 or more elements, the function is equivalent to LoadInt8x32Slice.
LoadInt8x64 loads an Int8x64 from an array
LoadInt8x64Slice loads an Int8x64 from a slice of at least 64 int8s
LoadInt8x64SlicePart loads an Int8x64 from the slice s.
If s has fewer than 64 elements, the remaining elements of the vector are filled with zeroes.
If s has 64 or more elements, the function is equivalent to LoadInt8x64Slice.
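All of the SlicePart loaders above share one rule: copy as many elements as the slice provides, and zero-fill the rest. A scalar sketch of that contract for a hypothetical 16-byte case:

```go
package main

import "fmt"

// loadUint8x16SlicePartRef is a scalar sketch of the Load...SlicePart
// contract: elements beyond len(s) are zero-filled, and a slice with 16
// or more elements behaves exactly like the plain Slice loader.
func loadUint8x16SlicePartRef(s []uint8) [16]uint8 {
	var v [16]uint8
	copy(v[:], s) // copies min(len(s), 16) elements; the rest stay zero
	return v
}

func main() {
	v := loadUint8x16SlicePartRef([]uint8{1, 2, 3})
	fmt.Println(v[0], v[2], v[3], v[15])
}
```

The Store...SlicePart methods are the mirror image: they write only as many elements as fit in the destination slice.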
LoadMaskedFloat32x16 loads a Float32x16 from an array,
at those elements enabled by mask
Asm: VMOVDQU32.Z, CPU Feature: AVX512
LoadMaskedFloat32x4 loads a Float32x4 from an array,
at those elements enabled by mask
Asm: VMASKMOVD, CPU Feature: AVX2
LoadMaskedFloat32x8 loads a Float32x8 from an array,
at those elements enabled by mask
Asm: VMASKMOVD, CPU Feature: AVX2
LoadMaskedFloat64x2 loads a Float64x2 from an array,
at those elements enabled by mask
Asm: VMASKMOVQ, CPU Feature: AVX2
LoadMaskedFloat64x4 loads a Float64x4 from an array,
at those elements enabled by mask
Asm: VMASKMOVQ, CPU Feature: AVX2
LoadMaskedFloat64x8 loads a Float64x8 from an array,
at those elements enabled by mask
Asm: VMOVDQU64.Z, CPU Feature: AVX512
LoadMaskedInt16x32 loads an Int16x32 from an array,
at those elements enabled by mask
Asm: VMOVDQU16.Z, CPU Feature: AVX512
LoadMaskedInt32x16 loads an Int32x16 from an array,
at those elements enabled by mask
Asm: VMOVDQU32.Z, CPU Feature: AVX512
LoadMaskedInt32x4 loads an Int32x4 from an array,
at those elements enabled by mask
Asm: VMASKMOVD, CPU Feature: AVX2
LoadMaskedInt32x8 loads an Int32x8 from an array,
at those elements enabled by mask
Asm: VMASKMOVD, CPU Feature: AVX2
LoadMaskedInt64x2 loads an Int64x2 from an array,
at those elements enabled by mask
Asm: VMASKMOVQ, CPU Feature: AVX2
LoadMaskedInt64x4 loads an Int64x4 from an array,
at those elements enabled by mask
Asm: VMASKMOVQ, CPU Feature: AVX2
LoadMaskedInt64x8 loads an Int64x8 from an array,
at those elements enabled by mask
Asm: VMOVDQU64.Z, CPU Feature: AVX512
LoadMaskedInt8x64 loads an Int8x64 from an array,
at those elements enabled by mask
Asm: VMOVDQU8.Z, CPU Feature: AVX512
LoadMaskedUint16x32 loads a Uint16x32 from an array,
at those elements enabled by mask
Asm: VMOVDQU16.Z, CPU Feature: AVX512
LoadMaskedUint32x16 loads a Uint32x16 from an array,
at those elements enabled by mask
Asm: VMOVDQU32.Z, CPU Feature: AVX512
LoadMaskedUint32x4 loads a Uint32x4 from an array,
at those elements enabled by mask
Asm: VMASKMOVD, CPU Feature: AVX2
LoadMaskedUint32x8 loads a Uint32x8 from an array,
at those elements enabled by mask
Asm: VMASKMOVD, CPU Feature: AVX2
LoadMaskedUint64x2 loads a Uint64x2 from an array,
at those elements enabled by mask
Asm: VMASKMOVQ, CPU Feature: AVX2
LoadMaskedUint64x4 loads a Uint64x4 from an array,
at those elements enabled by mask
Asm: VMASKMOVQ, CPU Feature: AVX2
LoadMaskedUint64x8 loads a Uint64x8 from an array,
at those elements enabled by mask
Asm: VMOVDQU64.Z, CPU Feature: AVX512
LoadMaskedUint8x64 loads a Uint8x64 from an array,
at those elements enabled by mask
Asm: VMOVDQU8.Z, CPU Feature: AVX512
LoadUint16x16 loads a Uint16x16 from an array
LoadUint16x16Slice loads an Uint16x16 from a slice of at least 16 uint16s
LoadUint16x16SlicePart loads a Uint16x16 from the slice s.
If s has fewer than 16 elements, the remaining elements of the vector are filled with zeroes.
If s has 16 or more elements, the function is equivalent to LoadUint16x16Slice.
LoadUint16x32 loads a Uint16x32 from an array
LoadUint16x32Slice loads a Uint16x32 from a slice of at least 32 uint16s
LoadUint16x32SlicePart loads a Uint16x32 from the slice s.
If s has fewer than 32 elements, the remaining elements of the vector are filled with zeroes.
If s has 32 or more elements, the function is equivalent to LoadUint16x32Slice.
LoadUint16x8 loads a Uint16x8 from an array
LoadUint16x8Slice loads a Uint16x8 from a slice of at least 8 uint16s
LoadUint16x8SlicePart loads a Uint16x8 from the slice s.
If s has fewer than 8 elements, the remaining elements of the vector are filled with zeroes.
If s has 8 or more elements, the function is equivalent to LoadUint16x8Slice.
LoadUint32x16 loads a Uint32x16 from an array
LoadUint32x16Slice loads a Uint32x16 from a slice of at least 16 uint32s
LoadUint32x16SlicePart loads a Uint32x16 from the slice s.
If s has fewer than 16 elements, the remaining elements of the vector are filled with zeroes.
If s has 16 or more elements, the function is equivalent to LoadUint32x16Slice.
LoadUint32x4 loads a Uint32x4 from an array
LoadUint32x4Slice loads a Uint32x4 from a slice of at least 4 uint32s
LoadUint32x4SlicePart loads a Uint32x4 from the slice s.
If s has fewer than 4 elements, the remaining elements of the vector are filled with zeroes.
If s has 4 or more elements, the function is equivalent to LoadUint32x4Slice.
LoadUint32x8 loads a Uint32x8 from an array
LoadUint32x8Slice loads a Uint32x8 from a slice of at least 8 uint32s
LoadUint32x8SlicePart loads a Uint32x8 from the slice s.
If s has fewer than 8 elements, the remaining elements of the vector are filled with zeroes.
If s has 8 or more elements, the function is equivalent to LoadUint32x8Slice.
LoadUint64x2 loads a Uint64x2 from an array
LoadUint64x2Slice loads a Uint64x2 from a slice of at least 2 uint64s
LoadUint64x2SlicePart loads a Uint64x2 from the slice s.
If s has fewer than 2 elements, the remaining elements of the vector are filled with zeroes.
If s has 2 or more elements, the function is equivalent to LoadUint64x2Slice.
LoadUint64x4 loads a Uint64x4 from an array
LoadUint64x4Slice loads a Uint64x4 from a slice of at least 4 uint64s
LoadUint64x4SlicePart loads a Uint64x4 from the slice s.
If s has fewer than 4 elements, the remaining elements of the vector are filled with zeroes.
If s has 4 or more elements, the function is equivalent to LoadUint64x4Slice.
LoadUint64x8 loads a Uint64x8 from an array
LoadUint64x8Slice loads a Uint64x8 from a slice of at least 8 uint64s
LoadUint64x8SlicePart loads a Uint64x8 from the slice s.
If s has fewer than 8 elements, the remaining elements of the vector are filled with zeroes.
If s has 8 or more elements, the function is equivalent to LoadUint64x8Slice.
LoadUint8x16 loads a Uint8x16 from an array
LoadUint8x16Slice loads a Uint8x16 from a slice of at least 16 uint8s
LoadUint8x16SlicePart loads a Uint8x16 from the slice s.
If s has fewer than 16 elements, the remaining elements of the vector are filled with zeroes.
If s has 16 or more elements, the function is equivalent to LoadUint8x16Slice.
LoadUint8x32 loads a Uint8x32 from an array
LoadUint8x32Slice loads a Uint8x32 from a slice of at least 32 uint8s
LoadUint8x32SlicePart loads a Uint8x32 from the slice s.
If s has fewer than 32 elements, the remaining elements of the vector are filled with zeroes.
If s has 32 or more elements, the function is equivalent to LoadUint8x32Slice.
LoadUint8x64 loads a Uint8x64 from an array
LoadUint8x64Slice loads a Uint8x64 from a slice of at least 64 uint8s
LoadUint8x64SlicePart loads a Uint8x64 from the slice s.
If s has fewer than 64 elements, the remaining elements of the vector are filled with zeroes.
If s has 64 or more elements, the function is equivalent to LoadUint8x64Slice.
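Every Slice/SlicePart pair above follows the same pattern: the Slice form requires a full-length slice, while the SlicePart form tolerates a short one and zero-fills the missing tail lanes. A hedged sketch of the difference, assuming the signatures implied by the descriptions (a []uint32 argument returning a Uint32x4); building requires GOEXPERIMENT=simd:

```go
// Sketch: Slice vs SlicePart loads. Signatures are assumed from the
// doc text above, not verified against the real experimental API.
package main

import "simd/archsimd"

func main() {
	full := []uint32{1, 2, 3, 4, 5}
	tail := full[4:] // only 1 element

	// Slice form: s must have at least 4 elements.
	_ = archsimd.LoadUint32x4Slice(full)

	// SlicePart form: short slices are allowed; lanes 1..3 are
	// zero-filled here, which makes it the natural choice for
	// handling the final partial chunk of a loop.
	_ = archsimd.LoadUint32x4SlicePart(tail)
}
```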
Mask16x16FromBits constructs a Mask16x16 from a bitmap value, where 1 means set for the indexed element, 0 means unset.
Asm: KMOVW, CPU Feature: AVX512
Mask16x32FromBits constructs a Mask16x32 from a bitmap value, where 1 means set for the indexed element, 0 means unset.
Asm: KMOVW, CPU Feature: AVX512
Mask16x8FromBits constructs a Mask16x8 from a bitmap value, where 1 means set for the indexed element, 0 means unset.
Asm: KMOVW, CPU Feature: AVX512
Mask32x16FromBits constructs a Mask32x16 from a bitmap value, where 1 means set for the indexed element, 0 means unset.
Asm: KMOVD, CPU Feature: AVX512
Mask32x4FromBits constructs a Mask32x4 from a bitmap value, where 1 means set for the indexed element, 0 means unset.
Only the lower 4 bits of y are used.
Asm: KMOVD, CPU Feature: AVX512
Mask32x8FromBits constructs a Mask32x8 from a bitmap value, where 1 means set for the indexed element, 0 means unset.
Asm: KMOVD, CPU Feature: AVX512
Mask64x2FromBits constructs a Mask64x2 from a bitmap value, where 1 means set for the indexed element, 0 means unset.
Only the lower 2 bits of y are used.
Asm: KMOVQ, CPU Feature: AVX512
Mask64x4FromBits constructs a Mask64x4 from a bitmap value, where 1 means set for the indexed element, 0 means unset.
Only the lower 4 bits of y are used.
Asm: KMOVQ, CPU Feature: AVX512
Mask64x8FromBits constructs a Mask64x8 from a bitmap value, where 1 means set for the indexed element, 0 means unset.
Asm: KMOVQ, CPU Feature: AVX512
Mask8x16FromBits constructs a Mask8x16 from a bitmap value, where 1 means set for the indexed element, 0 means unset.
Asm: KMOVB, CPU Feature: AVX512
Mask8x32FromBits constructs a Mask8x32 from a bitmap value, where 1 means set for the indexed element, 0 means unset.
Asm: KMOVB, CPU Feature: AVX512
Mask8x64FromBits constructs a Mask8x64 from a bitmap value, where 1 means set for the indexed element, 0 means unset.
Asm: KMOVB, CPU Feature: AVX512
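All of the FromBits constructors share one convention: bit i of the argument controls element i of the mask. A hedged sketch (the argument type used here is an assumption; the KMOV* instructions listed above require AVX512, and the package only builds under GOEXPERIMENT=simd):

```go
// Sketch: building a mask from a bit pattern. The argument type is
// assumed, not verified against the real experimental API.
package main

import "simd/archsimd"

func main() {
	// 0b0101: elements 0 and 2 are set, elements 1 and 3 unset.
	// Per the doc above, only the lower 4 bits are used.
	m := archsimd.Mask32x4FromBits(0b0101)
	_ = m
}
```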
The pages are generated with Golds v0.8.3-preview. (GOOS=linux GOARCH=amd64)
Golds is a Go 101 project developed by Tapir Liu.