package archsimd

Import Path: simd/archsimd (on go.dev)

Dependency Relation: imports 3 packages, and imported by 0 packages

Involved Source Files: compare_gen_amd64.go, cpu.go

Package archsimd provides access to architecture-specific SIMD operations.
This is a low-level package that exposes hardware-specific functionality.
It currently supports AMD64.

This package is experimental and not subject to the Go 1 compatibility promise.
It exists only when building with the GOEXPERIMENT=simd environment variable set.
# Vector types and operations
Vector types are defined as structs, such as Int8x16 and Float64x8, corresponding
to the hardware's vector registers. On AMD64, 128-, 256-, and 512-bit vectors are
supported.
Mask types are defined similarly, such as Mask8x16, and are represented as
opaque types, handling the differences in the underlying representations.
A mask can be converted to/from the corresponding integer vector type, or
to/from a bitmask.
Operations are mostly defined as methods on the vector types. Most of them
are compiler intrinsics and correspond directly to hardware instructions.
Common operations include:
- Load/Store: Load a vector from memory or store a vector to memory.
- Arithmetic: Add, Sub, Mul, etc.
- Bitwise: And, Or, Xor, etc.
- Comparison: Equal, Greater, etc., which produce a mask.
- Conversion: Convert between different vector types.
- Field selection and rearrangement: GetElem, Permute, etc.
- Masking: Masked, Merge.
The compiler recognizes certain patterns of operations and may optimize
them to more performant instructions. For example, on AVX512, an Add operation
followed by Masked may be optimized to a masked add instruction.
For this reason, not all hardware instructions are available as APIs.
# CPU feature checks
The package provides global variables to check for CPU features available
at runtime. For example, on AMD64, the [X86] variable provides methods to
check for AVX2, AVX512, etc.
It is recommended to check for CPU features before using the corresponding
vector operations.
# Notes
- This package is not portable, as the available types and operations depend
on the target architecture. It is not recommended to expose the SIMD types
defined in this package in public APIs.
- For performance reasons, it is recommended to use the vector types directly
as values. It is not recommended to take the address of a vector value,
allocate it on the heap, or place it in an aggregate type.

Involved Source Files (continued): extra_amd64.go, generate.go, maskmerge_gen_amd64.go, ops_amd64.go, ops_internal_amd64.go, other_gen_amd64.go, shuffles_amd64.go, slice_gen_amd64.go, slicepart_amd64.go, string.go, types_amd64.go, unsafe_helpers.go, dummy.s
Package-Level Type Names (total 43)
Float32x16 is a 512-bit SIMD vector of 16 float32.

Add adds corresponding elements of two vectors.
Asm: VADDPS, CPU Feature: AVX512

AsFloat64x8 converts from Float32x16 to Float64x8.
AsInt16x32 converts from Float32x16 to Int16x32.
AsInt32x16 converts from Float32x16 to Int32x16.
AsInt64x8 converts from Float32x16 to Int64x8.
AsInt8x64 converts from Float32x16 to Int8x64.
AsUint16x32 converts from Float32x16 to Uint16x32.
AsUint32x16 converts from Float32x16 to Uint32x16.
AsUint64x8 converts from Float32x16 to Uint64x8.
AsUint8x64 converts from Float32x16 to Uint8x64.

CeilScaled rounds elements up with the specified precision.
prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPS, CPU Feature: AVX512

CeilScaledResidue computes the difference after ceiling with the specified precision.
prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPS, CPU Feature: AVX512

Compress performs a compression on vector x using mask,
selecting the elements indicated by mask and packing them into lower-indexed elements.
Asm: VCOMPRESSPS, CPU Feature: AVX512

ConcatPermute performs a full permutation of vectors x and y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the bits needed to represent an index into xy are used in the elements of indices.
Asm: VPERMI2PS, CPU Feature: AVX512

ConvertToInt32 converts element values to int32.
When a conversion is inexact, a truncated (round toward zero) value is returned.
If a converted result cannot be represented in int32, an implementation-defined
architecture-specific value is returned.
Asm: VCVTTPS2DQ, CPU Feature: AVX512

ConvertToUint32 converts element values to uint32.
When a conversion is inexact, a truncated (round toward zero) value is returned.
If a converted result cannot be represented in uint32, an implementation-defined
architecture-specific value is returned.
Asm: VCVTTPS2UDQ, CPU Feature: AVX512

Div divides elements of two vectors.
Asm: VDIVPS, CPU Feature: AVX512

Equal returns x equals y, elementwise.
Asm: VCMPPS, CPU Feature: AVX512

Expand performs an expansion on a vector x whose elements are packed into the lower part.
The expansion distributes elements to the positions indexed by mask, from lower mask elements to upper, in order.
Asm: VEXPANDPS, CPU Feature: AVX512

FloorScaled rounds elements down with the specified precision.
prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPS, CPU Feature: AVX512

FloorScaledResidue computes the difference after flooring with the specified precision.
prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPS, CPU Feature: AVX512

GetHi returns the upper half of x.
Asm: VEXTRACTF64X4, CPU Feature: AVX512

GetLo returns the lower half of x.
Asm: VEXTRACTF64X4, CPU Feature: AVX512

Greater returns x greater-than y, elementwise.
Asm: VCMPPS, CPU Feature: AVX512

GreaterEqual returns x greater-than-or-equals y, elementwise.
Asm: VCMPPS, CPU Feature: AVX512

IsNan checks whether elements are NaN. Use as x.IsNan(x).
Asm: VCMPPS, CPU Feature: AVX512

Len returns the number of elements in a Float32x16.

Less returns x less-than y, elementwise.
Asm: VCMPPS, CPU Feature: AVX512

LessEqual returns x less-than-or-equals y, elementwise.
Asm: VCMPPS, CPU Feature: AVX512

Masked returns x but with elements zeroed where mask is false.

Max computes the maximum of corresponding elements.
Asm: VMAXPS, CPU Feature: AVX512

Merge returns x but with elements set to y where mask is false.

Min computes the minimum of corresponding elements.
Asm: VMINPS, CPU Feature: AVX512

Mul multiplies corresponding elements of two vectors.
Asm: VMULPS, CPU Feature: AVX512

MulAdd performs a fused (x * y) + z.
Asm: VFMADD213PS, CPU Feature: AVX512

MulAddSub performs a fused (x * y) - z for odd-indexed elements, and (x * y) + z for even-indexed elements.
Asm: VFMADDSUB213PS, CPU Feature: AVX512

MulSubAdd performs a fused (x * y) + z for odd-indexed elements, and (x * y) - z for even-indexed elements.
Asm: VFMSUBADD213PS, CPU Feature: AVX512

NotEqual returns x not-equals y, elementwise.
Asm: VCMPPS, CPU Feature: AVX512

Permute performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 4 bits (values 0-15) of each element of indices are used.
Asm: VPERMPS, CPU Feature: AVX512

Reciprocal computes an approximate reciprocal of each element.
Asm: VRCP14PS, CPU Feature: AVX512

ReciprocalSqrt computes an approximate reciprocal of the square root of each element.
Asm: VRSQRT14PS, CPU Feature: AVX512

RoundToEvenScaled rounds elements with the specified precision.
prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPS, CPU Feature: AVX512

RoundToEvenScaledResidue computes the difference after rounding with the specified precision.
prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPS, CPU Feature: AVX512

Scale multiplies elements by a power of 2.
Asm: VSCALEFPS, CPU Feature: AVX512

SelectFromPairGrouped returns, for each of the four 128-bit subvectors
of the vectors x and y, the selection of four elements from x and y,
where selector values in the range 0-3 specify elements from x and
values in the range 4-7 specify elements 0-3 of y.
When the selectors are constants and the selection can be implemented
in a single instruction, it will be; otherwise it requires two.
If the selectors are not constant, this will translate to a function call.
Asm: VSHUFPS, CPU Feature: AVX512

SetHi returns x with its upper half set to y.
Asm: VINSERTF64X4, CPU Feature: AVX512

SetLo returns x with its lower half set to y.
Asm: VINSERTF64X4, CPU Feature: AVX512

Sqrt computes the square root of each element.
Asm: VSQRTPS, CPU Feature: AVX512

Store stores a Float32x16 to an array.

StoreMasked stores a Float32x16 to an array, at those elements enabled by mask.
Asm: VMOVDQU32, CPU Feature: AVX512

StoreSlice stores x into a slice of at least 16 float32s.

StoreSlicePart stores the 16 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 16 or more elements, the method is equivalent to x.StoreSlice.

String returns a string representation of SIMD vector x.

Sub subtracts corresponding elements of two vectors.
Asm: VSUBPS, CPU Feature: AVX512

TruncScaled truncates elements with the specified precision.
prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPS, CPU Feature: AVX512

TruncScaledResidue computes the difference after truncating with the specified precision.
prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPS, CPU Feature: AVX512
Float32x16 : expvar.Var
Float32x16 : fmt.Stringer
func BroadcastFloat32x16(x float32) Float32x16
func LoadFloat32x16(y *[16]float32) Float32x16
func LoadFloat32x16Slice(s []float32) Float32x16
func LoadFloat32x16SlicePart(s []float32) Float32x16
func LoadMaskedFloat32x16(y *[16]float32, mask Mask32x16) Float32x16
func Float32x16.Add(y Float32x16) Float32x16
func Float32x16.CeilScaled(prec uint8) Float32x16
func Float32x16.CeilScaledResidue(prec uint8) Float32x16
func Float32x16.Compress(mask Mask32x16) Float32x16
func Float32x16.ConcatPermute(y Float32x16, indices Uint32x16) Float32x16
func Float32x16.Div(y Float32x16) Float32x16
func Float32x16.Expand(mask Mask32x16) Float32x16
func Float32x16.FloorScaled(prec uint8) Float32x16
func Float32x16.FloorScaledResidue(prec uint8) Float32x16
func Float32x16.Masked(mask Mask32x16) Float32x16
func Float32x16.Max(y Float32x16) Float32x16
func Float32x16.Merge(y Float32x16, mask Mask32x16) Float32x16
func Float32x16.Min(y Float32x16) Float32x16
func Float32x16.Mul(y Float32x16) Float32x16
func Float32x16.MulAdd(y Float32x16, z Float32x16) Float32x16
func Float32x16.MulAddSub(y Float32x16, z Float32x16) Float32x16
func Float32x16.MulSubAdd(y Float32x16, z Float32x16) Float32x16
func Float32x16.Permute(indices Uint32x16) Float32x16
func Float32x16.Reciprocal() Float32x16
func Float32x16.ReciprocalSqrt() Float32x16
func Float32x16.RoundToEvenScaled(prec uint8) Float32x16
func Float32x16.RoundToEvenScaledResidue(prec uint8) Float32x16
func Float32x16.Scale(y Float32x16) Float32x16
func Float32x16.SelectFromPairGrouped(a, b, c, d uint8, y Float32x16) Float32x16
func Float32x16.SetHi(y Float32x8) Float32x16
func Float32x16.SetLo(y Float32x8) Float32x16
func Float32x16.Sqrt() Float32x16
func Float32x16.Sub(y Float32x16) Float32x16
func Float32x16.TruncScaled(prec uint8) Float32x16
func Float32x16.TruncScaledResidue(prec uint8) Float32x16
func Float32x4.Broadcast512() Float32x16
func Float64x8.AsFloat32x16() (to Float32x16)
func Int16x32.AsFloat32x16() (to Float32x16)
func Int32x16.AsFloat32x16() (to Float32x16)
func Int32x16.ConvertToFloat32() Float32x16
func Int64x8.AsFloat32x16() (to Float32x16)
func Int8x64.AsFloat32x16() (to Float32x16)
func Uint16x32.AsFloat32x16() (to Float32x16)
func Uint32x16.AsFloat32x16() (to Float32x16)
func Uint32x16.ConvertToFloat32() Float32x16
func Uint64x8.AsFloat32x16() (to Float32x16)
func Uint8x64.AsFloat32x16() (to Float32x16)
func Float32x16.Add(y Float32x16) Float32x16
func Float32x16.ConcatPermute(y Float32x16, indices Uint32x16) Float32x16
func Float32x16.Div(y Float32x16) Float32x16
func Float32x16.Equal(y Float32x16) Mask32x16
func Float32x16.Greater(y Float32x16) Mask32x16
func Float32x16.GreaterEqual(y Float32x16) Mask32x16
func Float32x16.IsNan(y Float32x16) Mask32x16
func Float32x16.Less(y Float32x16) Mask32x16
func Float32x16.LessEqual(y Float32x16) Mask32x16
func Float32x16.Max(y Float32x16) Float32x16
func Float32x16.Merge(y Float32x16, mask Mask32x16) Float32x16
func Float32x16.Min(y Float32x16) Float32x16
func Float32x16.Mul(y Float32x16) Float32x16
func Float32x16.MulAdd(y Float32x16, z Float32x16) Float32x16
func Float32x16.MulAddSub(y Float32x16, z Float32x16) Float32x16
func Float32x16.MulSubAdd(y Float32x16, z Float32x16) Float32x16
func Float32x16.NotEqual(y Float32x16) Mask32x16
func Float32x16.Scale(y Float32x16) Float32x16
func Float32x16.SelectFromPairGrouped(a, b, c, d uint8, y Float32x16) Float32x16
func Float32x16.Sub(y Float32x16) Float32x16
Float32x4 is a 128-bit SIMD vector of 4 float32.

Add adds corresponding elements of two vectors.
Asm: VADDPS, CPU Feature: AVX

AddPairs horizontally adds adjacent pairs of elements.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].
Asm: VHADDPS, CPU Feature: AVX

AddSub subtracts even elements and adds odd elements of two vectors.
Asm: VADDSUBPS, CPU Feature: AVX

AsFloat64x2 converts from Float32x4 to Float64x2.
AsInt16x8 converts from Float32x4 to Int16x8.
AsInt32x4 converts from Float32x4 to Int32x4.
AsInt64x2 converts from Float32x4 to Int64x2.
AsInt8x16 converts from Float32x4 to Int8x16.
AsUint16x8 converts from Float32x4 to Uint16x8.
AsUint32x4 converts from Float32x4 to Uint32x4.
AsUint64x2 converts from Float32x4 to Uint64x2.
AsUint8x16 converts from Float32x4 to Uint8x16.

Broadcast128 copies element zero of its (128-bit) input to all elements of
the 128-bit output vector.
Asm: VBROADCASTSS, CPU Feature: AVX2 Broadcast256 copies element zero of its (128-bit) input to all elements of
the 256-bit output vector.
Asm: VBROADCASTSS, CPU Feature: AVX2 Broadcast512 copies element zero of its (128-bit) input to all elements of
the 512-bit output vector.
Asm: VBROADCASTSS, CPU Feature: AVX512 Ceil rounds elements up to the nearest integer.
Asm: VROUNDPS, CPU Feature: AVX CeilScaled rounds elements up with specified precision.
prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPS, CPU Feature: AVX512 CeilScaledResidue computes the difference after ceiling with specified precision.
prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPS, CPU Feature: AVX512

Compress performs a compression on vector x using mask,
selecting the elements indicated by mask and packing them into lower-indexed elements.
Asm: VCOMPRESSPS, CPU Feature: AVX512 ConcatPermute performs a full permutation of vector x, y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the needed bits to represent xy's index are used in indices' elements.
Asm: VPERMI2PS, CPU Feature: AVX512 ConvertToFloat64 converts element values to float64.
Asm: VCVTPS2PD, CPU Feature: AVX ConvertToInt32 converts element values to int32.
When a conversion is inexact, a truncated (round toward zero) value is returned.
If a converted result cannot be represented in int32, an implementation-defined
architecture-specific value is returned.
Asm: VCVTTPS2DQ, CPU Feature: AVX ConvertToInt64 converts element values to int64.
When a conversion is inexact, a truncated (round toward zero) value is returned.
If a converted result cannot be represented in int64, an implementation-defined
architecture-specific value is returned.
Asm: VCVTTPS2QQ, CPU Feature: AVX512 ConvertToUint32 converts element values to uint32.
When a conversion is inexact, a truncated (round toward zero) value is returned.
If a converted result cannot be represented in uint32, an implementation-defined
architecture-specific value is returned.
Asm: VCVTTPS2UDQ, CPU Feature: AVX512 ConvertToUint64 converts element values to uint64.
When a conversion is inexact, a truncated (round toward zero) value is returned.
If a converted result cannot be represented in uint64, an implementation-defined
architecture-specific value is returned.
Asm: VCVTTPS2UQQ, CPU Feature: AVX512 Div divides elements of two vectors.
Asm: VDIVPS, CPU Feature: AVX Equal returns x equals y, elementwise.
Asm: VCMPPS, CPU Feature: AVX

Expand performs an expansion on a vector x whose elements are packed into the lower part.
The expansion distributes elements to the positions indexed by mask, from lower mask elements to upper, in order.
Asm: VEXPANDPS, CPU Feature: AVX512 Floor rounds elements down to the nearest integer.
Asm: VROUNDPS, CPU Feature: AVX FloorScaled rounds elements down with specified precision.
prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPS, CPU Feature: AVX512 FloorScaledResidue computes the difference after flooring with specified precision.
prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPS, CPU Feature: AVX512 GetElem retrieves a single constant-indexed element's value.
index results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPEXTRD, CPU Feature: AVX Greater returns x greater-than y, elementwise.
Asm: VCMPPS, CPU Feature: AVX GreaterEqual returns x greater-than-or-equals y, elementwise.
Asm: VCMPPS, CPU Feature: AVX IsNan checks if elements are NaN. Use as x.IsNan(x).
Asm: VCMPPS, CPU Feature: AVX Len returns the number of elements in a Float32x4 Less returns x less-than y, elementwise.
Asm: VCMPPS, CPU Feature: AVX LessEqual returns x less-than-or-equals y, elementwise.
Asm: VCMPPS, CPU Feature: AVX Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VMAXPS, CPU Feature: AVX Merge returns x but with elements set to y where mask is false. Min computes the minimum of corresponding elements.
Asm: VMINPS, CPU Feature: AVX Mul multiplies corresponding elements of two vectors.
Asm: VMULPS, CPU Feature: AVX MulAdd performs a fused (x * y) + z.
Asm: VFMADD213PS, CPU Feature: AVX512 MulAddSub performs a fused (x * y) - z for odd-indexed elements, and (x * y) + z for even-indexed elements.
Asm: VFMADDSUB213PS, CPU Feature: AVX512 MulSubAdd performs a fused (x * y) + z for odd-indexed elements, and (x * y) - z for even-indexed elements.
Asm: VFMSUBADD213PS, CPU Feature: AVX512 NotEqual returns x not-equals y, elementwise.
Asm: VCMPPS, CPU Feature: AVX Reciprocal computes an approximate reciprocal of each element.
Asm: VRCPPS, CPU Feature: AVX ReciprocalSqrt computes an approximate reciprocal of the square root of each element.
Asm: VRSQRTPS, CPU Feature: AVX RoundToEven rounds elements to the nearest integer.
Asm: VROUNDPS, CPU Feature: AVX RoundToEvenScaled rounds elements with specified precision.
prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPS, CPU Feature: AVX512 RoundToEvenScaledResidue computes the difference after rounding with specified precision.
prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPS, CPU Feature: AVX512 Scale multiplies elements by a power of 2.
Asm: VSCALEFPS, CPU Feature: AVX512

SelectFromPair returns the selection of four elements from the two
vectors x and y, where selector values in the range 0-3 specify
elements from x and values in the range 4-7 specify elements 0-3
of y. When the selectors are constants and the selection can be
implemented in a single instruction, it will be; otherwise it
requires two. a is the source index of the least element in the
output, and b, c, and d are the indices of the 2nd, 3rd, and 4th
elements in the output. For example,
{1,2,4,8}.SelectFromPair(2,3,5,7,{9,25,49,81}) returns {4,8,25,81}.
If the selectors are not constant, this will translate to a function
call.
Asm: VSHUFPS, CPU Feature: AVX SetElem sets a single constant-indexed element's value.
index results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPINSRD, CPU Feature: AVX Sqrt computes the square root of each element.
Asm: VSQRTPS, CPU Feature: AVX Store stores a Float32x4 to an array StoreMasked stores a Float32x4 to an array,
at those elements enabled by mask
Asm: VMASKMOVD, CPU Feature: AVX2 StoreSlice stores x into a slice of at least 4 float32s StoreSlicePart stores the 4 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 4 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x Sub subtracts corresponding elements of two vectors.
Asm: VSUBPS, CPU Feature: AVX SubPairs horizontally subtracts adjacent pairs of elements.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].
Asm: VHSUBPS, CPU Feature: AVX Trunc truncates elements towards zero.
Asm: VROUNDPS, CPU Feature: AVX TruncScaled truncates elements with specified precision.
prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPS, CPU Feature: AVX512 TruncScaledResidue computes the difference after truncating with specified precision.
prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPS, CPU Feature: AVX512
Float32x4 : expvar.Var
Float32x4 : fmt.Stringer
func BroadcastFloat32x4(x float32) Float32x4
func LoadFloat32x4(y *[4]float32) Float32x4
func LoadFloat32x4Slice(s []float32) Float32x4
func LoadFloat32x4SlicePart(s []float32) Float32x4
func LoadMaskedFloat32x4(y *[4]float32, mask Mask32x4) Float32x4
func Float32x4.Add(y Float32x4) Float32x4
func Float32x4.AddPairs(y Float32x4) Float32x4
func Float32x4.AddSub(y Float32x4) Float32x4
func Float32x4.Broadcast128() Float32x4
func Float32x4.Ceil() Float32x4
func Float32x4.CeilScaled(prec uint8) Float32x4
func Float32x4.CeilScaledResidue(prec uint8) Float32x4
func Float32x4.Compress(mask Mask32x4) Float32x4
func Float32x4.ConcatPermute(y Float32x4, indices Uint32x4) Float32x4
func Float32x4.Div(y Float32x4) Float32x4
func Float32x4.Expand(mask Mask32x4) Float32x4
func Float32x4.Floor() Float32x4
func Float32x4.FloorScaled(prec uint8) Float32x4
func Float32x4.FloorScaledResidue(prec uint8) Float32x4
func Float32x4.Masked(mask Mask32x4) Float32x4
func Float32x4.Max(y Float32x4) Float32x4
func Float32x4.Merge(y Float32x4, mask Mask32x4) Float32x4
func Float32x4.Min(y Float32x4) Float32x4
func Float32x4.Mul(y Float32x4) Float32x4
func Float32x4.MulAdd(y Float32x4, z Float32x4) Float32x4
func Float32x4.MulAddSub(y Float32x4, z Float32x4) Float32x4
func Float32x4.MulSubAdd(y Float32x4, z Float32x4) Float32x4
func Float32x4.Reciprocal() Float32x4
func Float32x4.ReciprocalSqrt() Float32x4
func Float32x4.RoundToEven() Float32x4
func Float32x4.RoundToEvenScaled(prec uint8) Float32x4
func Float32x4.RoundToEvenScaledResidue(prec uint8) Float32x4
func Float32x4.Scale(y Float32x4) Float32x4
func Float32x4.SelectFromPair(a, b, c, d uint8, y Float32x4) Float32x4
func Float32x4.SetElem(index uint8, y float32) Float32x4
func Float32x4.Sqrt() Float32x4
func Float32x4.Sub(y Float32x4) Float32x4
func Float32x4.SubPairs(y Float32x4) Float32x4
func Float32x4.Trunc() Float32x4
func Float32x4.TruncScaled(prec uint8) Float32x4
func Float32x4.TruncScaledResidue(prec uint8) Float32x4
func Float32x8.GetHi() Float32x4
func Float32x8.GetLo() Float32x4
func Float64x2.AsFloat32x4() (to Float32x4)
func Float64x2.ConvertToFloat32() Float32x4
func Float64x4.ConvertToFloat32() Float32x4
func Int16x8.AsFloat32x4() (to Float32x4)
func Int32x4.AsFloat32x4() (to Float32x4)
func Int32x4.ConvertToFloat32() Float32x4
func Int64x2.AsFloat32x4() (to Float32x4)
func Int64x2.ConvertToFloat32() Float32x4
func Int64x4.ConvertToFloat32() Float32x4
func Int8x16.AsFloat32x4() (to Float32x4)
func Uint16x8.AsFloat32x4() (to Float32x4)
func Uint32x4.AsFloat32x4() (to Float32x4)
func Uint32x4.ConvertToFloat32() Float32x4
func Uint64x2.AsFloat32x4() (to Float32x4)
func Uint64x2.ConvertToFloat32() Float32x4
func Uint64x4.ConvertToFloat32() Float32x4
func Uint8x16.AsFloat32x4() (to Float32x4)
func Float32x4.Add(y Float32x4) Float32x4
func Float32x4.AddPairs(y Float32x4) Float32x4
func Float32x4.AddSub(y Float32x4) Float32x4
func Float32x4.ConcatPermute(y Float32x4, indices Uint32x4) Float32x4
func Float32x4.Div(y Float32x4) Float32x4
func Float32x4.Equal(y Float32x4) Mask32x4
func Float32x4.Greater(y Float32x4) Mask32x4
func Float32x4.GreaterEqual(y Float32x4) Mask32x4
func Float32x4.IsNan(y Float32x4) Mask32x4
func Float32x4.Less(y Float32x4) Mask32x4
func Float32x4.LessEqual(y Float32x4) Mask32x4
func Float32x4.Max(y Float32x4) Float32x4
func Float32x4.Merge(y Float32x4, mask Mask32x4) Float32x4
func Float32x4.Min(y Float32x4) Float32x4
func Float32x4.Mul(y Float32x4) Float32x4
func Float32x4.MulAdd(y Float32x4, z Float32x4) Float32x4
func Float32x4.MulAddSub(y Float32x4, z Float32x4) Float32x4
func Float32x4.MulSubAdd(y Float32x4, z Float32x4) Float32x4
func Float32x4.NotEqual(y Float32x4) Mask32x4
func Float32x4.Scale(y Float32x4) Float32x4
func Float32x4.SelectFromPair(a, b, c, d uint8, y Float32x4) Float32x4
func Float32x4.Sub(y Float32x4) Float32x4
func Float32x4.SubPairs(y Float32x4) Float32x4
func Float32x8.SetHi(y Float32x4) Float32x8
func Float32x8.SetLo(y Float32x4) Float32x8
Float32x8 is a 256-bit SIMD vector of 8 float32.

Add adds corresponding elements of two vectors.
Asm: VADDPS, CPU Feature: AVX

AddPairs horizontally adds adjacent pairs of elements.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].
Asm: VHADDPS, CPU Feature: AVX

AddSub subtracts even elements and adds odd elements of two vectors.
Asm: VADDSUBPS, CPU Feature: AVX

AsFloat64x4 converts from Float32x8 to Float64x4.
AsInt16x16 converts from Float32x8 to Int16x16.
AsInt32x8 converts from Float32x8 to Int32x8.
AsInt64x4 converts from Float32x8 to Int64x4.
AsInt8x32 converts from Float32x8 to Int8x32.
AsUint16x16 converts from Float32x8 to Uint16x16.
AsUint32x8 converts from Float32x8 to Uint32x8.
AsUint64x4 converts from Float32x8 to Uint64x4.
AsUint8x32 converts from Float32x8 to Uint8x32.

Ceil rounds elements up to the nearest integer.
Asm: VROUNDPS, CPU Feature: AVX CeilScaled rounds elements up with specified precision.
prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPS, CPU Feature: AVX512 CeilScaledResidue computes the difference after ceiling with specified precision.
prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPS, CPU Feature: AVX512

Compress performs a compression on vector x using mask,
selecting the elements indicated by mask and packing them into lower-indexed elements.
Asm: VCOMPRESSPS, CPU Feature: AVX512 ConcatPermute performs a full permutation of vector x, y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the needed bits to represent xy's index are used in indices' elements.
Asm: VPERMI2PS, CPU Feature: AVX512 ConvertToFloat64 converts element values to float64.
Asm: VCVTPS2PD, CPU Feature: AVX512 ConvertToInt32 converts element values to int32.
When a conversion is inexact, a truncated (round toward zero) value is returned.
If a converted result cannot be represented in int32, an implementation-defined
architecture-specific value is returned.
Asm: VCVTTPS2DQ, CPU Feature: AVX ConvertToInt64 converts element values to int64.
When a conversion is inexact, a truncated (round toward zero) value is returned.
If a converted result cannot be represented in int64, an implementation-defined
architecture-specific value is returned.
Asm: VCVTTPS2QQ, CPU Feature: AVX512 ConvertToUint32 converts element values to uint32.
When a conversion is inexact, a truncated (round toward zero) value is returned.
If a converted result cannot be represented in uint32, an implementation-defined
architecture-specific value is returned.
Asm: VCVTTPS2UDQ, CPU Feature: AVX512 ConvertToUint64 converts element values to uint64.
When a conversion is inexact, a truncated (round toward zero) value is returned.
If a converted result cannot be represented in uint64, an implementation-defined
architecture-specific value is returned.
Asm: VCVTTPS2UQQ, CPU Feature: AVX512 Div divides elements of two vectors.
Asm: VDIVPS, CPU Feature: AVX Equal returns x equals y, elementwise.
Asm: VCMPPS, CPU Feature: AVX

Expand performs an expansion on a vector x whose elements are packed into the lower part.
The expansion distributes elements to the positions indexed by mask, from lower mask elements to upper, in order.
Asm: VEXPANDPS, CPU Feature: AVX512 Floor rounds elements down to the nearest integer.
Asm: VROUNDPS, CPU Feature: AVX FloorScaled rounds elements down with specified precision.
prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPS, CPU Feature: AVX512 FloorScaledResidue computes the difference after flooring with specified precision.
prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPS, CPU Feature: AVX512 GetHi returns the upper half of x.
Asm: VEXTRACTF128, CPU Feature: AVX GetLo returns the lower half of x.
Asm: VEXTRACTF128, CPU Feature: AVX Greater returns x greater-than y, elementwise.
Asm: VCMPPS, CPU Feature: AVX GreaterEqual returns x greater-than-or-equals y, elementwise.
Asm: VCMPPS, CPU Feature: AVX IsNan checks if elements are NaN. Use as x.IsNan(x).
Asm: VCMPPS, CPU Feature: AVX Len returns the number of elements in a Float32x8 Less returns x less-than y, elementwise.
Asm: VCMPPS, CPU Feature: AVX LessEqual returns x less-than-or-equals y, elementwise.
Asm: VCMPPS, CPU Feature: AVX Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VMAXPS, CPU Feature: AVX Merge returns x but with elements set to y where mask is false. Min computes the minimum of corresponding elements.
Asm: VMINPS, CPU Feature: AVX Mul multiplies corresponding elements of two vectors.
Asm: VMULPS, CPU Feature: AVX MulAdd performs a fused (x * y) + z.
Asm: VFMADD213PS, CPU Feature: AVX512 MulAddSub performs a fused (x * y) - z for odd-indexed elements, and (x * y) + z for even-indexed elements.
Asm: VFMADDSUB213PS, CPU Feature: AVX512 MulSubAdd performs a fused (x * y) + z for odd-indexed elements, and (x * y) - z for even-indexed elements.
Asm: VFMSUBADD213PS, CPU Feature: AVX512 NotEqual returns x not-equals y, elementwise.
Asm: VCMPPS, CPU Feature: AVX Permute performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 3 bits (values 0-7) of each element of indices are used.
Asm: VPERMPS, CPU Feature: AVX2 Reciprocal computes an approximate reciprocal of each element.
Asm: VRCPPS, CPU Feature: AVX ReciprocalSqrt computes an approximate reciprocal of the square root of each element.
Asm: VRSQRTPS, CPU Feature: AVX RoundToEven rounds elements to the nearest integer.
Asm: VROUNDPS, CPU Feature: AVX RoundToEvenScaled rounds elements with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPS, CPU Feature: AVX512 RoundToEvenScaledResidue computes the difference after rounding with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPS, CPU Feature: AVX512 Scale multiplies elements by a power of 2.
Asm: VSCALEFPS, CPU Feature: AVX512 Select128FromPair treats the 256-bit vectors x and y as a single vector of four
128-bit elements, and returns a 256-bit result formed by
concatenating the two elements specified by lo and hi.
For example,
{40, 41, 42, 43, 50, 51, 52, 53}.Select128FromPair(3, 0, {60, 61, 62, 63, 70, 71, 72, 73})
returns {70, 71, 72, 73, 40, 41, 42, 43}.
lo, hi result in better performance when they are constants; non-constant values will be translated into a jump table.
lo, hi should be between 0 and 3, inclusive; other values may result in a runtime panic.
Asm: VPERM2F128, CPU Feature: AVX SelectFromPairGrouped returns, for each of the two 128-bit halves of
the vectors x and y, the selection of four elements from x and y,
where selector values in the range 0-3 specify elements from x and
values in the range 4-7 specify elements 0-3 of y.
When the selectors are constants and the selection
can be implemented in a single instruction, it will be; otherwise
it requires two. a is the source index of the lowest-indexed element in the
output, and b, c, and d are the indices of the 2nd, 3rd, and 4th
elements in the output. For example,
{1,2,4,8,16,32,64,128}.SelectFromPairGrouped(2,3,5,7,{9,25,49,81,121,169,225,289})
returns {4,8,25,81,64,128,169,289}.
If the selectors are not constant this will translate to a function
call.
Asm: VSHUFPS, CPU Feature: AVX SetHi returns x with its upper half set to y.
Asm: VINSERTF128, CPU Feature: AVX SetLo returns x with its lower half set to y.
Asm: VINSERTF128, CPU Feature: AVX Sqrt computes the square root of each element.
Asm: VSQRTPS, CPU Feature: AVX Store stores a Float32x8 to an array StoreMasked stores a Float32x8 to an array,
at those elements enabled by mask
Asm: VMASKMOVD, CPU Feature: AVX2 StoreSlice stores x into a slice of at least 8 float32s StoreSlicePart stores the 8 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 8 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x Sub subtracts corresponding elements of two vectors.
Asm: VSUBPS, CPU Feature: AVX SubPairs horizontally subtracts adjacent pairs of elements.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].
Asm: VHSUBPS, CPU Feature: AVX Trunc truncates elements towards zero.
Asm: VROUNDPS, CPU Feature: AVX TruncScaled truncates elements with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPS, CPU Feature: AVX512 TruncScaledResidue computes the difference after truncating with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPS, CPU Feature: AVX512
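The index-masking rule in Permute's documentation (only the low 3 bits of each element of indices are used) can be modeled in plain Go. This is a scalar sketch of the documented semantics, not the VPERMPS intrinsic; permute8 is a hypothetical helper, not part of the package:

```go
package main

import "fmt"

// permute8 models the documented semantics of Float32x8.Permute:
// result[i] = x[indices[i] & 7]. Only the low 3 bits of each index
// element select a source lane. A plain-Go sketch, not the intrinsic.
func permute8(x [8]float32, indices [8]uint32) [8]float32 {
	var r [8]float32
	for i, idx := range indices {
		r[i] = x[idx&7] // low 3 bits select the source lane
	}
	return r
}

func main() {
	x := [8]float32{10, 11, 12, 13, 14, 15, 16, 17}
	// Reverse the lanes; the final index 9 demonstrates masking: 9&7 == 1.
	fmt.Println(permute8(x, [8]uint32{7, 6, 5, 4, 3, 2, 1, 9}))
	// prints [17 16 15 14 13 12 11 11]
}
```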
Float32x8 : expvar.Var
Float32x8 : fmt.Stringer
func BroadcastFloat32x8(x float32) Float32x8
func LoadFloat32x8(y *[8]float32) Float32x8
func LoadFloat32x8Slice(s []float32) Float32x8
func LoadFloat32x8SlicePart(s []float32) Float32x8
func LoadMaskedFloat32x8(y *[8]float32, mask Mask32x8) Float32x8
func Float32x16.GetHi() Float32x8
func Float32x16.GetLo() Float32x8
func Float32x4.Broadcast256() Float32x8
func Float32x8.Add(y Float32x8) Float32x8
func Float32x8.AddPairs(y Float32x8) Float32x8
func Float32x8.AddSub(y Float32x8) Float32x8
func Float32x8.Ceil() Float32x8
func Float32x8.CeilScaled(prec uint8) Float32x8
func Float32x8.CeilScaledResidue(prec uint8) Float32x8
func Float32x8.Compress(mask Mask32x8) Float32x8
func Float32x8.ConcatPermute(y Float32x8, indices Uint32x8) Float32x8
func Float32x8.Div(y Float32x8) Float32x8
func Float32x8.Expand(mask Mask32x8) Float32x8
func Float32x8.Floor() Float32x8
func Float32x8.FloorScaled(prec uint8) Float32x8
func Float32x8.FloorScaledResidue(prec uint8) Float32x8
func Float32x8.Masked(mask Mask32x8) Float32x8
func Float32x8.Max(y Float32x8) Float32x8
func Float32x8.Merge(y Float32x8, mask Mask32x8) Float32x8
func Float32x8.Min(y Float32x8) Float32x8
func Float32x8.Mul(y Float32x8) Float32x8
func Float32x8.MulAdd(y Float32x8, z Float32x8) Float32x8
func Float32x8.MulAddSub(y Float32x8, z Float32x8) Float32x8
func Float32x8.MulSubAdd(y Float32x8, z Float32x8) Float32x8
func Float32x8.Permute(indices Uint32x8) Float32x8
func Float32x8.Reciprocal() Float32x8
func Float32x8.ReciprocalSqrt() Float32x8
func Float32x8.RoundToEven() Float32x8
func Float32x8.RoundToEvenScaled(prec uint8) Float32x8
func Float32x8.RoundToEvenScaledResidue(prec uint8) Float32x8
func Float32x8.Scale(y Float32x8) Float32x8
func Float32x8.Select128FromPair(lo, hi uint8, y Float32x8) Float32x8
func Float32x8.SelectFromPairGrouped(a, b, c, d uint8, y Float32x8) Float32x8
func Float32x8.SetHi(y Float32x4) Float32x8
func Float32x8.SetLo(y Float32x4) Float32x8
func Float32x8.Sqrt() Float32x8
func Float32x8.Sub(y Float32x8) Float32x8
func Float32x8.SubPairs(y Float32x8) Float32x8
func Float32x8.Trunc() Float32x8
func Float32x8.TruncScaled(prec uint8) Float32x8
func Float32x8.TruncScaledResidue(prec uint8) Float32x8
func Float64x4.AsFloat32x8() (to Float32x8)
func Float64x8.ConvertToFloat32() Float32x8
func Int16x16.AsFloat32x8() (to Float32x8)
func Int32x8.AsFloat32x8() (to Float32x8)
func Int32x8.ConvertToFloat32() Float32x8
func Int64x4.AsFloat32x8() (to Float32x8)
func Int64x8.ConvertToFloat32() Float32x8
func Int8x32.AsFloat32x8() (to Float32x8)
func Uint16x16.AsFloat32x8() (to Float32x8)
func Uint32x8.AsFloat32x8() (to Float32x8)
func Uint32x8.ConvertToFloat32() Float32x8
func Uint64x4.AsFloat32x8() (to Float32x8)
func Uint64x8.ConvertToFloat32() Float32x8
func Uint8x32.AsFloat32x8() (to Float32x8)
func Float32x16.SetHi(y Float32x8) Float32x16
func Float32x16.SetLo(y Float32x8) Float32x16
func Float32x8.Add(y Float32x8) Float32x8
func Float32x8.AddPairs(y Float32x8) Float32x8
func Float32x8.AddSub(y Float32x8) Float32x8
func Float32x8.ConcatPermute(y Float32x8, indices Uint32x8) Float32x8
func Float32x8.Div(y Float32x8) Float32x8
func Float32x8.Equal(y Float32x8) Mask32x8
func Float32x8.Greater(y Float32x8) Mask32x8
func Float32x8.GreaterEqual(y Float32x8) Mask32x8
func Float32x8.IsNan(y Float32x8) Mask32x8
func Float32x8.Less(y Float32x8) Mask32x8
func Float32x8.LessEqual(y Float32x8) Mask32x8
func Float32x8.Max(y Float32x8) Float32x8
func Float32x8.Merge(y Float32x8, mask Mask32x8) Float32x8
func Float32x8.Min(y Float32x8) Float32x8
func Float32x8.Mul(y Float32x8) Float32x8
func Float32x8.MulAdd(y Float32x8, z Float32x8) Float32x8
func Float32x8.MulAddSub(y Float32x8, z Float32x8) Float32x8
func Float32x8.MulSubAdd(y Float32x8, z Float32x8) Float32x8
func Float32x8.NotEqual(y Float32x8) Mask32x8
func Float32x8.Scale(y Float32x8) Float32x8
func Float32x8.Select128FromPair(lo, hi uint8, y Float32x8) Float32x8
func Float32x8.SelectFromPairGrouped(a, b, c, d uint8, y Float32x8) Float32x8
func Float32x8.Sub(y Float32x8) Float32x8
func Float32x8.SubPairs(y Float32x8) Float32x8
Float64x2 is a 128-bit SIMD vector of 2 float64 Add adds corresponding elements of two vectors.
Asm: VADDPD, CPU Feature: AVX AddPairs horizontally adds adjacent pairs of elements.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].
Asm: VHADDPD, CPU Feature: AVX AddSub subtracts even elements and adds odd elements of two vectors.
Asm: VADDSUBPD, CPU Feature: AVX Float32x4 converts from Float64x2 to Float32x4 Int16x8 converts from Float64x2 to Int16x8 Int32x4 converts from Float64x2 to Int32x4 Int64x2 converts from Float64x2 to Int64x2 Int8x16 converts from Float64x2 to Int8x16 Uint16x8 converts from Float64x2 to Uint16x8 Uint32x4 converts from Float64x2 to Uint32x4 Uint64x2 converts from Float64x2 to Uint64x2 Uint8x16 converts from Float64x2 to Uint8x16 Broadcast128 copies element zero of its (128-bit) input to all elements of
the 128-bit output vector.
Asm: VPBROADCASTQ, CPU Feature: AVX2 Broadcast256 copies element zero of its (128-bit) input to all elements of
the 256-bit output vector.
Asm: VBROADCASTSD, CPU Feature: AVX2 Broadcast512 copies element zero of its (128-bit) input to all elements of
the 512-bit output vector.
Asm: VBROADCASTSD, CPU Feature: AVX512 Ceil rounds elements up to the nearest integer.
Asm: VROUNDPD, CPU Feature: AVX CeilScaled rounds elements up with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPD, CPU Feature: AVX512 CeilScaledResidue computes the difference after ceiling with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPD, CPU Feature: AVX512 Compress performs a compression on vector x using mask by
selecting elements as indicated by mask, packing them into the lower-indexed elements.
Asm: VCOMPRESSPD, CPU Feature: AVX512 ConcatPermute performs a full permutation of vector x, y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the bits needed to represent an index into xy are used from each element of indices.
Asm: VPERMI2PD, CPU Feature: AVX512 ConvertToFloat32 converts element values to float32.
The result vector's elements are rounded to the nearest value.
Asm: VCVTPD2PSX, CPU Feature: AVX ConvertToInt32 converts element values to int32.
When a conversion is inexact, a truncated (round toward zero) value is returned.
If a converted result cannot be represented in int32, an implementation-defined
architecture-specific value is returned.
Asm: VCVTTPD2DQX, CPU Feature: AVX ConvertToInt64 converts element values to int64.
When a conversion is inexact, a truncated (round toward zero) value is returned.
If a converted result cannot be represented in int64, an implementation-defined
architecture-specific value is returned.
Asm: VCVTTPD2QQ, CPU Feature: AVX512 ConvertToUint32 converts element values to uint32.
When a conversion is inexact, a truncated (round toward zero) value is returned.
If a converted result cannot be represented in uint32, an implementation-defined
architecture-specific value is returned.
Asm: VCVTTPD2UDQX, CPU Feature: AVX512 ConvertToUint64 converts element values to uint64.
When a conversion is inexact, a truncated (round toward zero) value is returned.
If a converted result cannot be represented in uint64, an implementation-defined
architecture-specific value is returned.
Asm: VCVTTPD2UQQ, CPU Feature: AVX512 Div divides elements of two vectors.
Asm: VDIVPD, CPU Feature: AVX Equal returns x equals y, elementwise.
Asm: VCMPPD, CPU Feature: AVX Expand performs an expansion on a vector x whose elements are packed to lower parts.
The expansion distributes the packed lower elements of x to the positions selected by mask, in order from the lowest mask element to the highest.
Asm: VEXPANDPD, CPU Feature: AVX512 Floor rounds elements down to the nearest integer.
Asm: VROUNDPD, CPU Feature: AVX FloorScaled rounds elements down with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPD, CPU Feature: AVX512 FloorScaledResidue computes the difference after flooring with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPD, CPU Feature: AVX512 GetElem retrieves a single constant-indexed element's value.
index results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPEXTRQ, CPU Feature: AVX Greater returns x greater-than y, elementwise.
Asm: VCMPPD, CPU Feature: AVX GreaterEqual returns x greater-than-or-equals y, elementwise.
Asm: VCMPPD, CPU Feature: AVX IsNan checks if elements are NaN. Use as x.IsNan(x).
Asm: VCMPPD, CPU Feature: AVX Len returns the number of elements in a Float64x2 Less returns x less-than y, elementwise.
Asm: VCMPPD, CPU Feature: AVX LessEqual returns x less-than-or-equals y, elementwise.
Asm: VCMPPD, CPU Feature: AVX Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VMAXPD, CPU Feature: AVX Merge returns x but with elements set to y where mask is false. Min computes the minimum of corresponding elements.
Asm: VMINPD, CPU Feature: AVX Mul multiplies corresponding elements of two vectors.
Asm: VMULPD, CPU Feature: AVX MulAdd performs a fused (x * y) + z.
Asm: VFMADD213PD, CPU Feature: AVX512 MulAddSub performs a fused (x * y) - z for even-indexed elements, and (x * y) + z for odd-indexed elements.
Asm: VFMADDSUB213PD, CPU Feature: AVX512 MulSubAdd performs a fused (x * y) + z for even-indexed elements, and (x * y) - z for odd-indexed elements.
Asm: VFMSUBADD213PD, CPU Feature: AVX512 NotEqual returns x not-equals y, elementwise.
Asm: VCMPPD, CPU Feature: AVX Reciprocal computes an approximate reciprocal of each element.
Asm: VRCP14PD, CPU Feature: AVX512 ReciprocalSqrt computes an approximate reciprocal of the square root of each element.
Asm: VRSQRT14PD, CPU Feature: AVX512 RoundToEven rounds elements to the nearest integer.
Asm: VROUNDPD, CPU Feature: AVX RoundToEvenScaled rounds elements with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPD, CPU Feature: AVX512 RoundToEvenScaledResidue computes the difference after rounding with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPD, CPU Feature: AVX512 Scale multiplies elements by a power of 2.
Asm: VSCALEFPD, CPU Feature: AVX512 SelectFromPair returns the selection of two elements from the two
vectors x and y, where selector values in the range 0-1 specify
elements from x and values in the range 2-3 specify elements 0-1
of y. When the selectors are constants the selection can be
implemented in a single instruction.
If the selectors are not constant this will translate to a function
call.
Asm: VSHUFPD, CPU Feature: AVX SetElem sets a single constant-indexed element's value.
index results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPINSRQ, CPU Feature: AVX Sqrt computes the square root of each element.
Asm: VSQRTPD, CPU Feature: AVX Store stores a Float64x2 to an array StoreMasked stores a Float64x2 to an array,
at those elements enabled by mask
Asm: VMASKMOVQ, CPU Feature: AVX2 StoreSlice stores x into a slice of at least 2 float64s StoreSlicePart stores the 2 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 2 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x Sub subtracts corresponding elements of two vectors.
Asm: VSUBPD, CPU Feature: AVX SubPairs horizontally subtracts adjacent pairs of elements.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].
Asm: VHSUBPD, CPU Feature: AVX Trunc truncates elements towards zero.
Asm: VROUNDPD, CPU Feature: AVX TruncScaled truncates elements with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPD, CPU Feature: AVX512 TruncScaledResidue computes the difference after truncating with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPD, CPU Feature: AVX512
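ConcatPermute's indexing over the concatenation of x and y can be sketched in plain Go for the 2-element case. concatPermute2 is a hypothetical helper illustrating the documented semantics, not the VPERMI2PD intrinsic:

```go
package main

import "fmt"

// concatPermute2 models the documented semantics of
// Float64x2.ConcatPermute: x and y are concatenated (x is the lower
// half, y the upper half) into a 4-element pool, and each result
// element selects from that pool. Only the low 2 bits of each index
// are used, mirroring the "only the needed bits" rule.
func concatPermute2(x, y [2]float64, indices [2]uint64) [2]float64 {
	xy := [4]float64{x[0], x[1], y[0], y[1]} // lower half x, upper half y
	var r [2]float64
	for i, idx := range indices {
		r[i] = xy[idx&3]
	}
	return r
}

func main() {
	// Index 3 selects y[1], index 0 selects x[0].
	fmt.Println(concatPermute2([2]float64{1, 2}, [2]float64{3, 4}, [2]uint64{3, 0}))
	// prints [4 1]
}
```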
Float64x2 : expvar.Var
Float64x2 : fmt.Stringer
func BroadcastFloat64x2(x float64) Float64x2
func LoadFloat64x2(y *[2]float64) Float64x2
func LoadFloat64x2Slice(s []float64) Float64x2
func LoadFloat64x2SlicePart(s []float64) Float64x2
func LoadMaskedFloat64x2(y *[2]float64, mask Mask64x2) Float64x2
func Float32x4.AsFloat64x2() (to Float64x2)
func Float64x2.Add(y Float64x2) Float64x2
func Float64x2.AddPairs(y Float64x2) Float64x2
func Float64x2.AddSub(y Float64x2) Float64x2
func Float64x2.Broadcast128() Float64x2
func Float64x2.Ceil() Float64x2
func Float64x2.CeilScaled(prec uint8) Float64x2
func Float64x2.CeilScaledResidue(prec uint8) Float64x2
func Float64x2.Compress(mask Mask64x2) Float64x2
func Float64x2.ConcatPermute(y Float64x2, indices Uint64x2) Float64x2
func Float64x2.Div(y Float64x2) Float64x2
func Float64x2.Expand(mask Mask64x2) Float64x2
func Float64x2.Floor() Float64x2
func Float64x2.FloorScaled(prec uint8) Float64x2
func Float64x2.FloorScaledResidue(prec uint8) Float64x2
func Float64x2.Masked(mask Mask64x2) Float64x2
func Float64x2.Max(y Float64x2) Float64x2
func Float64x2.Merge(y Float64x2, mask Mask64x2) Float64x2
func Float64x2.Min(y Float64x2) Float64x2
func Float64x2.Mul(y Float64x2) Float64x2
func Float64x2.MulAdd(y Float64x2, z Float64x2) Float64x2
func Float64x2.MulAddSub(y Float64x2, z Float64x2) Float64x2
func Float64x2.MulSubAdd(y Float64x2, z Float64x2) Float64x2
func Float64x2.Reciprocal() Float64x2
func Float64x2.ReciprocalSqrt() Float64x2
func Float64x2.RoundToEven() Float64x2
func Float64x2.RoundToEvenScaled(prec uint8) Float64x2
func Float64x2.RoundToEvenScaledResidue(prec uint8) Float64x2
func Float64x2.Scale(y Float64x2) Float64x2
func Float64x2.SelectFromPair(a, b uint8, y Float64x2) Float64x2
func Float64x2.SetElem(index uint8, y float64) Float64x2
func Float64x2.Sqrt() Float64x2
func Float64x2.Sub(y Float64x2) Float64x2
func Float64x2.SubPairs(y Float64x2) Float64x2
func Float64x2.Trunc() Float64x2
func Float64x2.TruncScaled(prec uint8) Float64x2
func Float64x2.TruncScaledResidue(prec uint8) Float64x2
func Float64x4.GetHi() Float64x2
func Float64x4.GetLo() Float64x2
func Int16x8.AsFloat64x2() (to Float64x2)
func Int32x4.AsFloat64x2() (to Float64x2)
func Int64x2.AsFloat64x2() (to Float64x2)
func Int64x2.ConvertToFloat64() Float64x2
func Int8x16.AsFloat64x2() (to Float64x2)
func Uint16x8.AsFloat64x2() (to Float64x2)
func Uint32x4.AsFloat64x2() (to Float64x2)
func Uint64x2.AsFloat64x2() (to Float64x2)
func Uint64x2.ConvertToFloat64() Float64x2
func Uint8x16.AsFloat64x2() (to Float64x2)
func Float64x2.Add(y Float64x2) Float64x2
func Float64x2.AddPairs(y Float64x2) Float64x2
func Float64x2.AddSub(y Float64x2) Float64x2
func Float64x2.ConcatPermute(y Float64x2, indices Uint64x2) Float64x2
func Float64x2.Div(y Float64x2) Float64x2
func Float64x2.Equal(y Float64x2) Mask64x2
func Float64x2.Greater(y Float64x2) Mask64x2
func Float64x2.GreaterEqual(y Float64x2) Mask64x2
func Float64x2.IsNan(y Float64x2) Mask64x2
func Float64x2.Less(y Float64x2) Mask64x2
func Float64x2.LessEqual(y Float64x2) Mask64x2
func Float64x2.Max(y Float64x2) Float64x2
func Float64x2.Merge(y Float64x2, mask Mask64x2) Float64x2
func Float64x2.Min(y Float64x2) Float64x2
func Float64x2.Mul(y Float64x2) Float64x2
func Float64x2.MulAdd(y Float64x2, z Float64x2) Float64x2
func Float64x2.MulAddSub(y Float64x2, z Float64x2) Float64x2
func Float64x2.MulSubAdd(y Float64x2, z Float64x2) Float64x2
func Float64x2.NotEqual(y Float64x2) Mask64x2
func Float64x2.Scale(y Float64x2) Float64x2
func Float64x2.SelectFromPair(a, b uint8, y Float64x2) Float64x2
func Float64x2.Sub(y Float64x2) Float64x2
func Float64x2.SubPairs(y Float64x2) Float64x2
func Float64x4.SetHi(y Float64x2) Float64x4
func Float64x4.SetLo(y Float64x2) Float64x4
Float64x4 is a 256-bit SIMD vector of 4 float64 Add adds corresponding elements of two vectors.
Asm: VADDPD, CPU Feature: AVX AddPairs horizontally adds adjacent pairs of elements.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].
Asm: VHADDPD, CPU Feature: AVX AddSub subtracts even elements and adds odd elements of two vectors.
Asm: VADDSUBPD, CPU Feature: AVX Float32x8 converts from Float64x4 to Float32x8 Int16x16 converts from Float64x4 to Int16x16 Int32x8 converts from Float64x4 to Int32x8 Int64x4 converts from Float64x4 to Int64x4 Int8x32 converts from Float64x4 to Int8x32 Uint16x16 converts from Float64x4 to Uint16x16 Uint32x8 converts from Float64x4 to Uint32x8 Uint64x4 converts from Float64x4 to Uint64x4 Uint8x32 converts from Float64x4 to Uint8x32 Ceil rounds elements up to the nearest integer.
Asm: VROUNDPD, CPU Feature: AVX CeilScaled rounds elements up with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPD, CPU Feature: AVX512 CeilScaledResidue computes the difference after ceiling with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPD, CPU Feature: AVX512 Compress performs a compression on vector x using mask by
selecting elements as indicated by mask, packing them into the lower-indexed elements.
Asm: VCOMPRESSPD, CPU Feature: AVX512 ConcatPermute performs a full permutation of vector x, y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the bits needed to represent an index into xy are used from each element of indices.
Asm: VPERMI2PD, CPU Feature: AVX512 ConvertToFloat32 converts element values to float32.
The result vector's elements are rounded to the nearest value.
Asm: VCVTPD2PSY, CPU Feature: AVX ConvertToInt32 converts element values to int32.
When a conversion is inexact, a truncated (round toward zero) value is returned.
If a converted result cannot be represented in int32, an implementation-defined
architecture-specific value is returned.
Asm: VCVTTPD2DQY, CPU Feature: AVX ConvertToInt64 converts element values to int64.
When a conversion is inexact, a truncated (round toward zero) value is returned.
If a converted result cannot be represented in int64, an implementation-defined
architecture-specific value is returned.
Asm: VCVTTPD2QQ, CPU Feature: AVX512 ConvertToUint32 converts element values to uint32.
When a conversion is inexact, a truncated (round toward zero) value is returned.
If a converted result cannot be represented in uint32, an implementation-defined
architecture-specific value is returned.
Asm: VCVTTPD2UDQY, CPU Feature: AVX512 ConvertToUint64 converts element values to uint64.
When a conversion is inexact, a truncated (round toward zero) value is returned.
If a converted result cannot be represented in uint64, an implementation-defined
architecture-specific value is returned.
Asm: VCVTTPD2UQQ, CPU Feature: AVX512 Div divides elements of two vectors.
Asm: VDIVPD, CPU Feature: AVX Equal returns x equals y, elementwise.
Asm: VCMPPD, CPU Feature: AVX Expand performs an expansion on a vector x whose elements are packed to lower parts.
The expansion distributes the packed lower elements of x to the positions selected by mask, in order from the lowest mask element to the highest.
Asm: VEXPANDPD, CPU Feature: AVX512 Floor rounds elements down to the nearest integer.
Asm: VROUNDPD, CPU Feature: AVX FloorScaled rounds elements down with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPD, CPU Feature: AVX512 FloorScaledResidue computes the difference after flooring with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPD, CPU Feature: AVX512 GetHi returns the upper half of x.
Asm: VEXTRACTF128, CPU Feature: AVX GetLo returns the lower half of x.
Asm: VEXTRACTF128, CPU Feature: AVX Greater returns x greater-than y, elementwise.
Asm: VCMPPD, CPU Feature: AVX GreaterEqual returns x greater-than-or-equals y, elementwise.
Asm: VCMPPD, CPU Feature: AVX IsNan checks if elements are NaN. Use as x.IsNan(x).
Asm: VCMPPD, CPU Feature: AVX Len returns the number of elements in a Float64x4 Less returns x less-than y, elementwise.
Asm: VCMPPD, CPU Feature: AVX LessEqual returns x less-than-or-equals y, elementwise.
Asm: VCMPPD, CPU Feature: AVX Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VMAXPD, CPU Feature: AVX Merge returns x but with elements set to y where mask is false. Min computes the minimum of corresponding elements.
Asm: VMINPD, CPU Feature: AVX Mul multiplies corresponding elements of two vectors.
Asm: VMULPD, CPU Feature: AVX MulAdd performs a fused (x * y) + z.
Asm: VFMADD213PD, CPU Feature: AVX512 MulAddSub performs a fused (x * y) - z for even-indexed elements, and (x * y) + z for odd-indexed elements.
Asm: VFMADDSUB213PD, CPU Feature: AVX512 MulSubAdd performs a fused (x * y) + z for even-indexed elements, and (x * y) - z for odd-indexed elements.
Asm: VFMSUBADD213PD, CPU Feature: AVX512 NotEqual returns x not-equals y, elementwise.
Asm: VCMPPD, CPU Feature: AVX Permute performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 2 bits (values 0-3) of each element of indices are used.
Asm: VPERMPD, CPU Feature: AVX512 Reciprocal computes an approximate reciprocal of each element.
Asm: VRCP14PD, CPU Feature: AVX512 ReciprocalSqrt computes an approximate reciprocal of the square root of each element.
Asm: VRSQRT14PD, CPU Feature: AVX512 RoundToEven rounds elements to the nearest integer.
Asm: VROUNDPD, CPU Feature: AVX RoundToEvenScaled rounds elements with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPD, CPU Feature: AVX512 RoundToEvenScaledResidue computes the difference after rounding with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPD, CPU Feature: AVX512 Scale multiplies elements by a power of 2.
Asm: VSCALEFPD, CPU Feature: AVX512 Select128FromPair treats the 256-bit vectors x and y as a single vector of four
128-bit elements, and returns a 256-bit result formed by
concatenating the two elements specified by lo and hi.
For example,
{40, 41, 50, 51}.Select128FromPair(3, 0, {60, 61, 70, 71})
returns {70, 71, 40, 41}.
lo, hi result in better performance when they are constants; non-constant values will be translated into a jump table.
lo, hi should be between 0 and 3, inclusive; other values may result in a runtime panic.
Asm: VPERM2F128, CPU Feature: AVX SelectFromPairGrouped returns, for each of the two 128-bit halves of
the vectors x and y, the selection of two elements from the two
vectors x and y, where selector values in the range 0-1 specify
elements from x and values in the range 2-3 specify elements 0-1
of y. When the selectors are constants the selection can be
implemented in a single instruction.
If the selectors are not constant this will translate to a function
call.
Asm: VSHUFPD, CPU Feature: AVX SetHi returns x with its upper half set to y.
Asm: VINSERTF128, CPU Feature: AVX SetLo returns x with its lower half set to y.
Asm: VINSERTF128, CPU Feature: AVX Sqrt computes the square root of each element.
Asm: VSQRTPD, CPU Feature: AVX Store stores a Float64x4 to an array StoreMasked stores a Float64x4 to an array,
at those elements enabled by mask
Asm: VMASKMOVQ, CPU Feature: AVX2 StoreSlice stores x into a slice of at least 4 float64s StoreSlicePart stores the 4 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 4 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x Sub subtracts corresponding elements of two vectors.
Asm: VSUBPD, CPU Feature: AVX SubPairs horizontally subtracts adjacent pairs of elements.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].
Asm: VHSUBPD, CPU Feature: AVX Trunc truncates elements towards zero.
Asm: VROUNDPD, CPU Feature: AVX TruncScaled truncates elements with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPD, CPU Feature: AVX512 TruncScaledResidue computes the difference after truncating with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPD, CPU Feature: AVX512
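Select128FromPair's lane selection can be sketched in plain Go, reproducing the documented example. select128FromPair4 is a hypothetical helper modeling the documented behavior, not the VPERM2F128 intrinsic:

```go
package main

import "fmt"

// select128FromPair4 models Float64x4.Select128FromPair as documented:
// x and y are viewed as four 128-bit lanes (two float64 each); lo picks
// the lane for the lower half of the result and hi the lane for the
// upper half. A plain-Go sketch, not the intrinsic.
func select128FromPair4(x, y [4]float64, lo, hi uint8) [4]float64 {
	lanes := [4][2]float64{
		{x[0], x[1]}, {x[2], x[3]}, // lanes 0-1 come from x
		{y[0], y[1]}, {y[2], y[3]}, // lanes 2-3 come from y
	}
	return [4]float64{
		lanes[lo][0], lanes[lo][1],
		lanes[hi][0], lanes[hi][1],
	}
}

func main() {
	// The documented example: lo=3 selects {70, 71}, hi=0 selects {40, 41}.
	fmt.Println(select128FromPair4(
		[4]float64{40, 41, 50, 51},
		[4]float64{60, 61, 70, 71}, 3, 0))
	// prints [70 71 40 41]
}
```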
Float64x4 : expvar.Var
Float64x4 : fmt.Stringer
func BroadcastFloat64x4(x float64) Float64x4
func LoadFloat64x4(y *[4]float64) Float64x4
func LoadFloat64x4Slice(s []float64) Float64x4
func LoadFloat64x4SlicePart(s []float64) Float64x4
func LoadMaskedFloat64x4(y *[4]float64, mask Mask64x4) Float64x4
func Float32x4.ConvertToFloat64() Float64x4
func Float32x8.AsFloat64x4() (to Float64x4)
func Float64x2.Broadcast256() Float64x4
func Float64x4.Add(y Float64x4) Float64x4
func Float64x4.AddPairs(y Float64x4) Float64x4
func Float64x4.AddSub(y Float64x4) Float64x4
func Float64x4.Ceil() Float64x4
func Float64x4.CeilScaled(prec uint8) Float64x4
func Float64x4.CeilScaledResidue(prec uint8) Float64x4
func Float64x4.Compress(mask Mask64x4) Float64x4
func Float64x4.ConcatPermute(y Float64x4, indices Uint64x4) Float64x4
func Float64x4.Div(y Float64x4) Float64x4
func Float64x4.Expand(mask Mask64x4) Float64x4
func Float64x4.Floor() Float64x4
func Float64x4.FloorScaled(prec uint8) Float64x4
func Float64x4.FloorScaledResidue(prec uint8) Float64x4
func Float64x4.Masked(mask Mask64x4) Float64x4
func Float64x4.Max(y Float64x4) Float64x4
func Float64x4.Merge(y Float64x4, mask Mask64x4) Float64x4
func Float64x4.Min(y Float64x4) Float64x4
func Float64x4.Mul(y Float64x4) Float64x4
func Float64x4.MulAdd(y Float64x4, z Float64x4) Float64x4
func Float64x4.MulAddSub(y Float64x4, z Float64x4) Float64x4
func Float64x4.MulSubAdd(y Float64x4, z Float64x4) Float64x4
func Float64x4.Permute(indices Uint64x4) Float64x4
func Float64x4.Reciprocal() Float64x4
func Float64x4.ReciprocalSqrt() Float64x4
func Float64x4.RoundToEven() Float64x4
func Float64x4.RoundToEvenScaled(prec uint8) Float64x4
func Float64x4.RoundToEvenScaledResidue(prec uint8) Float64x4
func Float64x4.Scale(y Float64x4) Float64x4
func Float64x4.Select128FromPair(lo, hi uint8, y Float64x4) Float64x4
func Float64x4.SelectFromPairGrouped(a, b uint8, y Float64x4) Float64x4
func Float64x4.SetHi(y Float64x2) Float64x4
func Float64x4.SetLo(y Float64x2) Float64x4
func Float64x4.Sqrt() Float64x4
func Float64x4.Sub(y Float64x4) Float64x4
func Float64x4.SubPairs(y Float64x4) Float64x4
func Float64x4.Trunc() Float64x4
func Float64x4.TruncScaled(prec uint8) Float64x4
func Float64x4.TruncScaledResidue(prec uint8) Float64x4
func Float64x8.GetHi() Float64x4
func Float64x8.GetLo() Float64x4
func Int16x16.AsFloat64x4() (to Float64x4)
func Int32x4.ConvertToFloat64() Float64x4
func Int32x8.AsFloat64x4() (to Float64x4)
func Int64x4.AsFloat64x4() (to Float64x4)
func Int64x4.ConvertToFloat64() Float64x4
func Int8x32.AsFloat64x4() (to Float64x4)
func Uint16x16.AsFloat64x4() (to Float64x4)
func Uint32x4.ConvertToFloat64() Float64x4
func Uint32x8.AsFloat64x4() (to Float64x4)
func Uint64x4.AsFloat64x4() (to Float64x4)
func Uint64x4.ConvertToFloat64() Float64x4
func Uint8x32.AsFloat64x4() (to Float64x4)
func Float64x4.Add(y Float64x4) Float64x4
func Float64x4.AddPairs(y Float64x4) Float64x4
func Float64x4.AddSub(y Float64x4) Float64x4
func Float64x4.ConcatPermute(y Float64x4, indices Uint64x4) Float64x4
func Float64x4.Div(y Float64x4) Float64x4
func Float64x4.Equal(y Float64x4) Mask64x4
func Float64x4.Greater(y Float64x4) Mask64x4
func Float64x4.GreaterEqual(y Float64x4) Mask64x4
func Float64x4.IsNan(y Float64x4) Mask64x4
func Float64x4.Less(y Float64x4) Mask64x4
func Float64x4.LessEqual(y Float64x4) Mask64x4
func Float64x4.Max(y Float64x4) Float64x4
func Float64x4.Merge(y Float64x4, mask Mask64x4) Float64x4
func Float64x4.Min(y Float64x4) Float64x4
func Float64x4.Mul(y Float64x4) Float64x4
func Float64x4.MulAdd(y Float64x4, z Float64x4) Float64x4
func Float64x4.MulAddSub(y Float64x4, z Float64x4) Float64x4
func Float64x4.MulSubAdd(y Float64x4, z Float64x4) Float64x4
func Float64x4.NotEqual(y Float64x4) Mask64x4
func Float64x4.Scale(y Float64x4) Float64x4
func Float64x4.Select128FromPair(lo, hi uint8, y Float64x4) Float64x4
func Float64x4.SelectFromPairGrouped(a, b uint8, y Float64x4) Float64x4
func Float64x4.Sub(y Float64x4) Float64x4
func Float64x4.SubPairs(y Float64x4) Float64x4
func Float64x8.SetHi(y Float64x4) Float64x8
func Float64x8.SetLo(y Float64x4) Float64x8
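The fused multiply methods listed above (MulAdd, MulAddSub, MulSubAdd) combine a multiply with an add or subtract per element, with the sign alternating by element index for the AddSub/SubAdd forms. A rough scalar sketch of the documented semantics (plain Go, hypothetical helper names, ignoring the single-rounding behavior of a true hardware fused multiply-add):

```go
package main

import "fmt"

// mulAdd models Float64x4.MulAdd: (x*y) + z per element.
func mulAdd(x, y, z [4]float64) (r [4]float64) {
	for i := range r {
		r[i] = x[i]*y[i] + z[i]
	}
	return
}

// mulAddSub follows the description on this page: (x*y) - z for
// odd-indexed elements, (x*y) + z for even-indexed elements.
func mulAddSub(x, y, z [4]float64) (r [4]float64) {
	for i := range r {
		if i%2 == 1 {
			r[i] = x[i]*y[i] - z[i]
		} else {
			r[i] = x[i]*y[i] + z[i]
		}
	}
	return
}

func main() {
	x := [4]float64{1, 2, 3, 4}
	y := [4]float64{10, 10, 10, 10}
	z := [4]float64{1, 1, 1, 1}
	fmt.Println(mulAdd(x, y, z))    // [11 21 31 41]
	fmt.Println(mulAddSub(x, y, z)) // [11 19 31 39]
}
```

Unlike this sketch, the intrinsic performs only one rounding step per element.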
Float64x8 is a 512-bit SIMD vector of 8 float64 Add adds corresponding elements of two vectors.
Asm: VADDPD, CPU Feature: AVX512 Float32x16 converts from Float64x8 to Float32x16 Int16x32 converts from Float64x8 to Int16x32 Int32x16 converts from Float64x8 to Int32x16 Int64x8 converts from Float64x8 to Int64x8 Int8x64 converts from Float64x8 to Int8x64 Uint16x32 converts from Float64x8 to Uint16x32 Uint32x16 converts from Float64x8 to Uint32x16 Uint64x8 converts from Float64x8 to Uint64x8 Uint8x64 converts from Float64x8 to Uint8x64 CeilScaled rounds elements up with specified precision.
prec yields better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPD, CPU Feature: AVX512 CeilScaledResidue computes the difference after ceiling with specified precision.
prec yields better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPD, CPU Feature: AVX512 Compress performs a compression on vector x using mask by
selecting elements as indicated by mask and packing them into lower-indexed elements.
Asm: VCOMPRESSPD, CPU Feature: AVX512 ConcatPermute performs a full permutation of vector x, y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the bits needed to represent an index into xy are used from each element of indices.
Asm: VPERMI2PD, CPU Feature: AVX512 ConvertToFloat32 converts element values to float32.
The result vector's elements are rounded to the nearest value.
Asm: VCVTPD2PS, CPU Feature: AVX512 ConvertToInt32 converts element values to int32.
When a conversion is inexact, a truncated (round toward zero) value is returned.
If a converted result cannot be represented in int32, an implementation-defined
architecture-specific value is returned.
Asm: VCVTTPD2DQ, CPU Feature: AVX512 ConvertToInt64 converts element values to int64.
When a conversion is inexact, a truncated (round toward zero) value is returned.
If a converted result cannot be represented in int64, an implementation-defined
architecture-specific value is returned.
Asm: VCVTTPD2QQ, CPU Feature: AVX512 ConvertToUint32 converts element values to uint32.
When a conversion is inexact, a truncated (round toward zero) value is returned.
If a converted result cannot be represented in uint32, an implementation-defined
architecture-specific value is returned.
Asm: VCVTTPD2UDQ, CPU Feature: AVX512 ConvertToUint64 converts element values to uint64.
When a conversion is inexact, a truncated (round toward zero) value is returned.
If a converted result cannot be represented in uint64, an implementation-defined
architecture-specific value is returned.
Asm: VCVTTPD2UQQ, CPU Feature: AVX512 Div divides elements of two vectors.
Asm: VDIVPD, CPU Feature: AVX512 Equal returns x equals y, elementwise.
Asm: VCMPPD, CPU Feature: AVX512 Expand performs an expansion on a vector x whose elements are packed to lower parts.
The expansion distributes elements to the positions indicated by mask, in order from lower mask elements to upper.
Asm: VEXPANDPD, CPU Feature: AVX512 FloorScaled rounds elements down with specified precision.
prec yields better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPD, CPU Feature: AVX512 FloorScaledResidue computes the difference after flooring with specified precision.
prec yields better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPD, CPU Feature: AVX512 GetHi returns the upper half of x.
Asm: VEXTRACTF64X4, CPU Feature: AVX512 GetLo returns the lower half of x.
Asm: VEXTRACTF64X4, CPU Feature: AVX512 Greater returns x greater-than y, elementwise.
Asm: VCMPPD, CPU Feature: AVX512 GreaterEqual returns x greater-than-or-equals y, elementwise.
Asm: VCMPPD, CPU Feature: AVX512 IsNan checks if elements are NaN. Use as x.IsNan(x).
Asm: VCMPPD, CPU Feature: AVX512 Len returns the number of elements in a Float64x8 Less returns x less-than y, elementwise.
Asm: VCMPPD, CPU Feature: AVX512 LessEqual returns x less-than-or-equals y, elementwise.
Asm: VCMPPD, CPU Feature: AVX512 Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VMAXPD, CPU Feature: AVX512 Merge returns x but with elements set to y where m is false. Min computes the minimum of corresponding elements.
Asm: VMINPD, CPU Feature: AVX512 Mul multiplies corresponding elements of two vectors.
Asm: VMULPD, CPU Feature: AVX512 MulAdd performs a fused (x * y) + z.
Asm: VFMADD213PD, CPU Feature: AVX512 MulAddSub performs a fused (x * y) - z for odd-indexed elements, and (x * y) + z for even-indexed elements.
Asm: VFMADDSUB213PD, CPU Feature: AVX512 MulSubAdd performs a fused (x * y) + z for odd-indexed elements, and (x * y) - z for even-indexed elements.
Asm: VFMSUBADD213PD, CPU Feature: AVX512 NotEqual returns x not-equals y, elementwise.
Asm: VCMPPD, CPU Feature: AVX512 Permute performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 3 bits (values 0-7) of each element of indices are used.
Asm: VPERMPD, CPU Feature: AVX512 Reciprocal computes an approximate reciprocal of each element.
Asm: VRCP14PD, CPU Feature: AVX512 ReciprocalSqrt computes an approximate reciprocal of the square root of each element.
Asm: VRSQRT14PD, CPU Feature: AVX512 RoundToEvenScaled rounds elements with specified precision.
prec yields better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPD, CPU Feature: AVX512 RoundToEvenScaledResidue computes the difference after rounding with specified precision.
prec yields better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPD, CPU Feature: AVX512 Scale multiplies elements by a power of 2.
Asm: VSCALEFPD, CPU Feature: AVX512 SelectFromPairGrouped returns, for each of the four 128-bit subvectors
of the vectors x and y, the selection of two elements from the two
vectors x and y, where selector values in the range 0-1 specify
elements from x, and values in the range 2-3 specify elements 0-1
of y. When the selectors are constants, the selection can be
implemented in a single instruction.
If the selectors are not constant this will translate to a function
call.
Asm: VSHUFPD, CPU Feature: AVX512 SetHi returns x with its upper half set to y.
Asm: VINSERTF64X4, CPU Feature: AVX512 SetLo returns x with its lower half set to y.
Asm: VINSERTF64X4, CPU Feature: AVX512 Sqrt computes the square root of each element.
Asm: VSQRTPD, CPU Feature: AVX512 Store stores a Float64x8 to an array StoreMasked stores a Float64x8 to an array,
at those elements enabled by mask
Asm: VMOVDQU64, CPU Feature: AVX512 StoreSlice stores x into a slice of at least 8 float64s StoreSlicePart stores the 8 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 8 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x Sub subtracts corresponding elements of two vectors.
Asm: VSUBPD, CPU Feature: AVX512 TruncScaled truncates elements with specified precision.
prec yields better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPD, CPU Feature: AVX512 TruncScaledResidue computes the difference after truncating with specified precision.
prec yields better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPD, CPU Feature: AVX512
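Compress and Expand, described above, are inverse redistributions driven by a mask. They can be modeled in plain Go, using a bool array in place of a hardware mask (hypothetical scalar helpers, not part of the archsimd API; the tail handling here assumes the zeroing form of the instruction):

```go
package main

import "fmt"

// compress packs the elements of x selected by mask into the
// lower-indexed elements of the result; the remainder is zeroed.
func compress(x [8]float64, mask [8]bool) (r [8]float64) {
	j := 0
	for i, m := range mask {
		if m {
			r[j] = x[i]
			j++
		}
	}
	return
}

// expand distributes the packed lower elements of x to the positions
// where mask is true, in order from lower mask elements to upper.
func expand(x [8]float64, mask [8]bool) (r [8]float64) {
	j := 0
	for i, m := range mask {
		if m {
			r[i] = x[j]
			j++
		}
	}
	return
}

func main() {
	x := [8]float64{1, 2, 3, 4, 5, 6, 7, 8}
	mask := [8]bool{false, true, false, true, false, false, true, false}
	c := compress(x, mask)
	fmt.Println(c)               // [2 4 7 0 0 0 0 0]
	fmt.Println(expand(c, mask)) // [0 2 0 4 0 0 7 0]
}
```

Note that expand(compress(x, m), m) recovers exactly the masked-in elements of x.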
Float64x8 : expvar.Var
Float64x8 : fmt.Stringer
func BroadcastFloat64x8(x float64) Float64x8
func LoadFloat64x8(y *[8]float64) Float64x8
func LoadFloat64x8Slice(s []float64) Float64x8
func LoadFloat64x8SlicePart(s []float64) Float64x8
func LoadMaskedFloat64x8(y *[8]float64, mask Mask64x8) Float64x8
func Float32x16.AsFloat64x8() (to Float64x8)
func Float32x8.ConvertToFloat64() Float64x8
func Float64x2.Broadcast512() Float64x8
func Float64x8.Add(y Float64x8) Float64x8
func Float64x8.CeilScaled(prec uint8) Float64x8
func Float64x8.CeilScaledResidue(prec uint8) Float64x8
func Float64x8.Compress(mask Mask64x8) Float64x8
func Float64x8.ConcatPermute(y Float64x8, indices Uint64x8) Float64x8
func Float64x8.Div(y Float64x8) Float64x8
func Float64x8.Expand(mask Mask64x8) Float64x8
func Float64x8.FloorScaled(prec uint8) Float64x8
func Float64x8.FloorScaledResidue(prec uint8) Float64x8
func Float64x8.Masked(mask Mask64x8) Float64x8
func Float64x8.Max(y Float64x8) Float64x8
func Float64x8.Merge(y Float64x8, mask Mask64x8) Float64x8
func Float64x8.Min(y Float64x8) Float64x8
func Float64x8.Mul(y Float64x8) Float64x8
func Float64x8.MulAdd(y Float64x8, z Float64x8) Float64x8
func Float64x8.MulAddSub(y Float64x8, z Float64x8) Float64x8
func Float64x8.MulSubAdd(y Float64x8, z Float64x8) Float64x8
func Float64x8.Permute(indices Uint64x8) Float64x8
func Float64x8.Reciprocal() Float64x8
func Float64x8.ReciprocalSqrt() Float64x8
func Float64x8.RoundToEvenScaled(prec uint8) Float64x8
func Float64x8.RoundToEvenScaledResidue(prec uint8) Float64x8
func Float64x8.Scale(y Float64x8) Float64x8
func Float64x8.SelectFromPairGrouped(a, b uint8, y Float64x8) Float64x8
func Float64x8.SetHi(y Float64x4) Float64x8
func Float64x8.SetLo(y Float64x4) Float64x8
func Float64x8.Sqrt() Float64x8
func Float64x8.Sub(y Float64x8) Float64x8
func Float64x8.TruncScaled(prec uint8) Float64x8
func Float64x8.TruncScaledResidue(prec uint8) Float64x8
func Int16x32.AsFloat64x8() (to Float64x8)
func Int32x16.AsFloat64x8() (to Float64x8)
func Int32x8.ConvertToFloat64() Float64x8
func Int64x8.AsFloat64x8() (to Float64x8)
func Int64x8.ConvertToFloat64() Float64x8
func Int8x64.AsFloat64x8() (to Float64x8)
func Uint16x32.AsFloat64x8() (to Float64x8)
func Uint32x16.AsFloat64x8() (to Float64x8)
func Uint32x8.ConvertToFloat64() Float64x8
func Uint64x8.AsFloat64x8() (to Float64x8)
func Uint64x8.ConvertToFloat64() Float64x8
func Uint8x64.AsFloat64x8() (to Float64x8)
func Float64x8.Add(y Float64x8) Float64x8
func Float64x8.ConcatPermute(y Float64x8, indices Uint64x8) Float64x8
func Float64x8.Div(y Float64x8) Float64x8
func Float64x8.Equal(y Float64x8) Mask64x8
func Float64x8.Greater(y Float64x8) Mask64x8
func Float64x8.GreaterEqual(y Float64x8) Mask64x8
func Float64x8.IsNan(y Float64x8) Mask64x8
func Float64x8.Less(y Float64x8) Mask64x8
func Float64x8.LessEqual(y Float64x8) Mask64x8
func Float64x8.Max(y Float64x8) Float64x8
func Float64x8.Merge(y Float64x8, mask Mask64x8) Float64x8
func Float64x8.Min(y Float64x8) Float64x8
func Float64x8.Mul(y Float64x8) Float64x8
func Float64x8.MulAdd(y Float64x8, z Float64x8) Float64x8
func Float64x8.MulAddSub(y Float64x8, z Float64x8) Float64x8
func Float64x8.MulSubAdd(y Float64x8, z Float64x8) Float64x8
func Float64x8.NotEqual(y Float64x8) Mask64x8
func Float64x8.Scale(y Float64x8) Float64x8
func Float64x8.SelectFromPairGrouped(a, b uint8, y Float64x8) Float64x8
func Float64x8.Sub(y Float64x8) Float64x8
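ConcatPermute, as documented above, selects each result element from the concatenation of two vectors. A scalar sketch of that indexing (plain Go, hypothetical helper name):

```go
package main

import "fmt"

// concatPermute models Float64x8.ConcatPermute: xy is the concatenation
// of x (lower half) and y (upper half), and result element i is
// xy[indices[i]]. With 16 combined elements, only the low 4 bits of
// each index are needed, so the rest are masked off here.
func concatPermute(x, y [8]float64, indices [8]uint64) (r [8]float64) {
	var xy [16]float64
	copy(xy[:8], x[:])
	copy(xy[8:], y[:])
	for i, idx := range indices {
		r[i] = xy[idx&15]
	}
	return
}

func main() {
	x := [8]float64{0, 1, 2, 3, 4, 5, 6, 7}
	y := [8]float64{10, 11, 12, 13, 14, 15, 16, 17}
	idx := [8]uint64{15, 0, 8, 7, 1, 9, 2, 10}
	fmt.Println(concatPermute(x, y, idx)) // [17 0 10 7 1 11 2 12]
}
```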
Int16x16 is a 256-bit SIMD vector of 16 int16 Abs computes the absolute value of each element.
Asm: VPABSW, CPU Feature: AVX2 Add adds corresponding elements of two vectors.
Asm: VPADDW, CPU Feature: AVX2 AddPairs horizontally adds adjacent pairs of elements.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].
Asm: VPHADDW, CPU Feature: AVX2 AddPairsSaturated horizontally adds adjacent pairs of elements with saturation.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].
Asm: VPHADDSW, CPU Feature: AVX2 AddSaturated adds corresponding elements of two vectors with saturation.
Asm: VPADDSW, CPU Feature: AVX2 And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX2 AndNot performs a bitwise x &^ y.
Asm: VPANDN, CPU Feature: AVX2 Float32x8 converts from Int16x16 to Float32x8 Float64x4 converts from Int16x16 to Float64x4 Int32x8 converts from Int16x16 to Int32x8 Int64x4 converts from Int16x16 to Int64x4 Int8x32 converts from Int16x16 to Int8x32 Uint16x16 converts from Int16x16 to Uint16x16 Uint32x8 converts from Int16x16 to Uint32x8 Uint64x4 converts from Int16x16 to Uint64x4 Uint8x32 converts from Int16x16 to Uint8x32 Compress performs a compression on vector x using mask by
selecting elements as indicated by mask and packing them into lower-indexed elements.
Asm: VPCOMPRESSW, CPU Feature: AVX512VBMI2 ConcatPermute performs a full permutation of vector x, y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the bits needed to represent an index into xy are used from each element of indices.
Asm: VPERMI2W, CPU Feature: AVX512 CopySign returns the product of the first operand with -1, 0, or 1,
whichever constant is nearest to the value of the second operand.
Asm: VPSIGNW, CPU Feature: AVX2 DotProductPairs multiplies the elements and adds the pairs together,
yielding a vector of half as many elements with twice the input element size.
Asm: VPMADDWD, CPU Feature: AVX2 Equal returns x equals y, elementwise.
Asm: VPCMPEQW, CPU Feature: AVX2 Expand performs an expansion on a vector x whose elements are packed to lower parts.
The expansion distributes elements to the positions indicated by mask, in order from lower mask elements to upper.
Asm: VPEXPANDW, CPU Feature: AVX512VBMI2 ExtendToInt32 converts element values to int32.
The result vector's elements are sign-extended.
Asm: VPMOVSXWD, CPU Feature: AVX512 GetHi returns the upper half of x.
Asm: VEXTRACTI128, CPU Feature: AVX2 GetLo returns the lower half of x.
Asm: VEXTRACTI128, CPU Feature: AVX2 Greater returns x greater-than y, elementwise.
Asm: VPCMPGTW, CPU Feature: AVX2 GreaterEqual returns a mask whose elements indicate whether x >= y
Emulated, CPU Feature: AVX2 InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.
Asm: VPUNPCKHWD, CPU Feature: AVX2 InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.
Asm: VPUNPCKLWD, CPU Feature: AVX2 IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x.
x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y
Asm: VPTEST, CPU Feature: AVX Len returns the number of elements in an Int16x16 Less returns a mask whose elements indicate whether x < y
Emulated, CPU Feature: AVX2 LessEqual returns a mask whose elements indicate whether x <= y
Emulated, CPU Feature: AVX2 Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VPMAXSW, CPU Feature: AVX2 Merge returns x but with elements set to y where mask is false. Min computes the minimum of corresponding elements.
Asm: VPMINSW, CPU Feature: AVX2 Mul multiplies corresponding elements of two vectors.
Asm: VPMULLW, CPU Feature: AVX2 MulHigh multiplies elements and stores the high part of the result.
Asm: VPMULHW, CPU Feature: AVX2 Not returns the bitwise complement of x
Emulated, CPU Feature: AVX2 NotEqual returns a mask whose elements indicate whether x != y
Emulated, CPU Feature: AVX2 OnesCount counts the number of set bits in each element.
Asm: VPOPCNTW, CPU Feature: AVX512BITALG Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX2 Permute performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 4 bits (values 0-15) of each element of indices are used.
Asm: VPERMW, CPU Feature: AVX512 PermuteScalarsHiGrouped performs a grouped permutation of vector x using the supplied indices:
result =
{x[0], x[1], x[2], x[3], x[a+4], x[b+4], x[c+4], x[d+4],
x[8], x[9], x[10], x[11], x[a+12], x[b+12], x[c+12], x[d+12]}
Parameters a,b,c,d should have values between 0 and 3.
If a through d are constants, then an instruction will be inlined, otherwise
a jump table is generated.
Asm: VPSHUFHW, CPU Feature: AVX2 PermuteScalarsLoGrouped performs a grouped permutation of vector x using the supplied indices:
result =
{x[a], x[b], x[c], x[d], x[4], x[5], x[6], x[7],
x[a+8], x[b+8], x[c+8], x[d+8], x[12], x[13], x[14], x[15]}
Parameters a,b,c,d should have values between 0 and 3.
If a through d are constants, then an instruction will be inlined, otherwise
a jump table is generated.
Asm: VPSHUFLW, CPU Feature: AVX2 SaturateToInt8 converts element values to int8.
Conversion is done with saturation on the vector elements.
Results are packed into the low elements of the returned vector; its upper elements are zero-cleared.
Asm: VPMOVSWB, CPU Feature: AVX512 SaturateToUint8 converts element values to uint8.
Conversion is done with saturation on the vector elements.
Results are packed into the low elements of the returned vector; its upper elements are zero-cleared.
Asm: VPMOVSWB, CPU Feature: AVX512 Select128FromPair treats the 256-bit vectors x and y as a single vector of four
128-bit elements, and returns a 256-bit result formed by
concatenating the two elements specified by lo and hi.
For example,
{40, 41, 42, 43, 44, 45, 46, 47, 50, 51, 52, 53, 54, 55, 56, 57}.Select128FromPair(3, 0,
{60, 61, 62, 63, 64, 65, 66, 67, 70, 71, 72, 73, 74, 75, 76, 77})
returns {70, 71, 72, 73, 74, 75, 76, 77, 40, 41, 42, 43, 44, 45, 46, 47}.
lo and hi yield better performance when they are constants; non-constant values will be translated into a jump table.
lo, hi should be between 0 and 3, inclusive; other values may result in a runtime panic.
Asm: VPERM2I128, CPU Feature: AVX2 SetHi returns x with its upper half set to y.
Asm: VINSERTI128, CPU Feature: AVX2 SetLo returns x with its lower half set to y.
Asm: VINSERTI128, CPU Feature: AVX2 ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLW, CPU Feature: AVX2 ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift yields better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDW, CPU Feature: AVX512VBMI2 ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are filled with the sign bit.
Asm: VPSRAW, CPU Feature: AVX2 ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift yields better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDW, CPU Feature: AVX512VBMI2 ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVW, CPU Feature: AVX512 ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVW, CPU Feature: AVX512VBMI2 ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are filled with the sign bit.
Asm: VPSRAVW, CPU Feature: AVX512 ShiftRightConcat shifts each element of x to the right by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVW, CPU Feature: AVX512VBMI2 Store stores an Int16x16 to an array StoreSlice stores x into a slice of at least 16 int16s StoreSlicePart stores the elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 16 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x Sub subtracts corresponding elements of two vectors.
Asm: VPSUBW, CPU Feature: AVX2 SubPairs horizontally subtracts adjacent pairs of elements.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].
Asm: VPHSUBW, CPU Feature: AVX2 SubPairsSaturated horizontally subtracts adjacent pairs of elements with saturation.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].
Asm: VPHSUBSW, CPU Feature: AVX2 SubSaturated subtracts corresponding elements of two vectors with saturation.
Asm: VPSUBSW, CPU Feature: AVX2 ToMask converts from Int16x16 to Mask16x16; a mask element is set to true when the corresponding vector element is non-zero. TruncateToInt8 converts element values to int8.
Conversion is done with truncation on the vector elements.
Results are packed into the low elements of the returned vector; its upper elements are zero-cleared.
Asm: VPMOVWB, CPU Feature: AVX512 Xor performs a bitwise XOR operation between two vectors.
Asm: VPXOR, CPU Feature: AVX2
Int16x16 : expvar.Var
Int16x16 : fmt.Stringer
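The saturated arithmetic listed for Int16x16 below (AddSaturated, SubSaturated) clamps on overflow rather than wrapping. A per-element scalar model (plain Go, hypothetical helper names):

```go
package main

import "fmt"

// clamp16 clamps a 32-bit intermediate result to the int16 range,
// which is what the saturating vector instructions do per element.
func clamp16(v int32) int16 {
	if v > 32767 {
		return 32767
	}
	if v < -32768 {
		return -32768
	}
	return int16(v)
}

// addSat16 and subSat16 model Int16x16.AddSaturated and SubSaturated
// for a single element pair.
func addSat16(a, b int16) int16 { return clamp16(int32(a) + int32(b)) }
func subSat16(a, b int16) int16 { return clamp16(int32(a) - int32(b)) }

func main() {
	fmt.Println(addSat16(30000, 10000))  // 32767 (clamped, not wrapped)
	fmt.Println(subSat16(-30000, 10000)) // -32768 (clamped)
	fmt.Println(addSat16(100, 23))       // 123
}
```

By contrast, the plain Add and Sub methods wrap around on overflow like ordinary Go int16 arithmetic.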
func BroadcastInt16x16(x int16) Int16x16
func LoadInt16x16(y *[16]int16) Int16x16
func LoadInt16x16Slice(s []int16) Int16x16
func LoadInt16x16SlicePart(s []int16) Int16x16
func Float32x8.AsInt16x16() (to Int16x16)
func Float64x4.AsInt16x16() (to Int16x16)
func Int16x16.Abs() Int16x16
func Int16x16.Add(y Int16x16) Int16x16
func Int16x16.AddPairs(y Int16x16) Int16x16
func Int16x16.AddPairsSaturated(y Int16x16) Int16x16
func Int16x16.AddSaturated(y Int16x16) Int16x16
func Int16x16.And(y Int16x16) Int16x16
func Int16x16.AndNot(y Int16x16) Int16x16
func Int16x16.Compress(mask Mask16x16) Int16x16
func Int16x16.ConcatPermute(y Int16x16, indices Uint16x16) Int16x16
func Int16x16.CopySign(y Int16x16) Int16x16
func Int16x16.Expand(mask Mask16x16) Int16x16
func Int16x16.InterleaveHiGrouped(y Int16x16) Int16x16
func Int16x16.InterleaveLoGrouped(y Int16x16) Int16x16
func Int16x16.Masked(mask Mask16x16) Int16x16
func Int16x16.Max(y Int16x16) Int16x16
func Int16x16.Merge(y Int16x16, mask Mask16x16) Int16x16
func Int16x16.Min(y Int16x16) Int16x16
func Int16x16.Mul(y Int16x16) Int16x16
func Int16x16.MulHigh(y Int16x16) Int16x16
func Int16x16.Not() Int16x16
func Int16x16.OnesCount() Int16x16
func Int16x16.Or(y Int16x16) Int16x16
func Int16x16.Permute(indices Uint16x16) Int16x16
func Int16x16.PermuteScalarsHiGrouped(a, b, c, d uint8) Int16x16
func Int16x16.PermuteScalarsLoGrouped(a, b, c, d uint8) Int16x16
func Int16x16.Select128FromPair(lo, hi uint8, y Int16x16) Int16x16
func Int16x16.SetHi(y Int16x8) Int16x16
func Int16x16.SetLo(y Int16x8) Int16x16
func Int16x16.ShiftAllLeft(y uint64) Int16x16
func Int16x16.ShiftAllLeftConcat(shift uint8, y Int16x16) Int16x16
func Int16x16.ShiftAllRight(y uint64) Int16x16
func Int16x16.ShiftAllRightConcat(shift uint8, y Int16x16) Int16x16
func Int16x16.ShiftLeft(y Int16x16) Int16x16
func Int16x16.ShiftLeftConcat(y Int16x16, z Int16x16) Int16x16
func Int16x16.ShiftRight(y Int16x16) Int16x16
func Int16x16.ShiftRightConcat(y Int16x16, z Int16x16) Int16x16
func Int16x16.Sub(y Int16x16) Int16x16
func Int16x16.SubPairs(y Int16x16) Int16x16
func Int16x16.SubPairsSaturated(y Int16x16) Int16x16
func Int16x16.SubSaturated(y Int16x16) Int16x16
func Int16x16.Xor(y Int16x16) Int16x16
func Int16x32.GetHi() Int16x16
func Int16x32.GetLo() Int16x16
func Int16x8.Broadcast256() Int16x16
func Int32x16.SaturateToInt16() Int16x16
func Int32x16.TruncateToInt16() Int16x16
func Int32x8.AsInt16x16() (to Int16x16)
func Int32x8.SaturateToInt16Concat(y Int32x8) Int16x16
func Int64x4.AsInt16x16() (to Int16x16)
func Int8x16.ExtendToInt16() Int16x16
func Int8x32.AsInt16x16() (to Int16x16)
func Mask16x16.ToInt16x16() (to Int16x16)
func Uint16x16.AsInt16x16() (to Int16x16)
func Uint32x8.AsInt16x16() (to Int16x16)
func Uint64x4.AsInt16x16() (to Int16x16)
func Uint8x32.AsInt16x16() (to Int16x16)
func Uint8x32.DotProductPairsSaturated(y Int8x32) Int16x16
func Int16x16.Add(y Int16x16) Int16x16
func Int16x16.AddPairs(y Int16x16) Int16x16
func Int16x16.AddPairsSaturated(y Int16x16) Int16x16
func Int16x16.AddSaturated(y Int16x16) Int16x16
func Int16x16.And(y Int16x16) Int16x16
func Int16x16.AndNot(y Int16x16) Int16x16
func Int16x16.ConcatPermute(y Int16x16, indices Uint16x16) Int16x16
func Int16x16.CopySign(y Int16x16) Int16x16
func Int16x16.DotProductPairs(y Int16x16) Int32x8
func Int16x16.Equal(y Int16x16) Mask16x16
func Int16x16.Greater(y Int16x16) Mask16x16
func Int16x16.GreaterEqual(y Int16x16) Mask16x16
func Int16x16.InterleaveHiGrouped(y Int16x16) Int16x16
func Int16x16.InterleaveLoGrouped(y Int16x16) Int16x16
func Int16x16.Less(y Int16x16) Mask16x16
func Int16x16.LessEqual(y Int16x16) Mask16x16
func Int16x16.Max(y Int16x16) Int16x16
func Int16x16.Merge(y Int16x16, mask Mask16x16) Int16x16
func Int16x16.Min(y Int16x16) Int16x16
func Int16x16.Mul(y Int16x16) Int16x16
func Int16x16.MulHigh(y Int16x16) Int16x16
func Int16x16.NotEqual(y Int16x16) Mask16x16
func Int16x16.Or(y Int16x16) Int16x16
func Int16x16.Select128FromPair(lo, hi uint8, y Int16x16) Int16x16
func Int16x16.ShiftAllLeftConcat(shift uint8, y Int16x16) Int16x16
func Int16x16.ShiftAllRightConcat(shift uint8, y Int16x16) Int16x16
func Int16x16.ShiftLeft(y Int16x16) Int16x16
func Int16x16.ShiftLeftConcat(y Int16x16, z Int16x16) Int16x16
func Int16x16.ShiftRight(y Int16x16) Int16x16
func Int16x16.ShiftRightConcat(y Int16x16, z Int16x16) Int16x16
func Int16x16.Sub(y Int16x16) Int16x16
func Int16x16.SubPairs(y Int16x16) Int16x16
func Int16x16.SubPairsSaturated(y Int16x16) Int16x16
func Int16x16.SubSaturated(y Int16x16) Int16x16
func Int16x16.Xor(y Int16x16) Int16x16
func Int16x32.SetHi(y Int16x16) Int16x32
func Int16x32.SetLo(y Int16x16) Int16x32
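The ShiftLeftConcat family listed above behaves like a funnel shift: the bits vacated by shifting an element of x are filled from the corresponding element of the partner vector. A per-element scalar model for shift counts 0-15 (plain Go on raw uint16 bit patterns, hypothetical helper name; behavior for larger counts is not modeled here):

```go
package main

import "fmt"

// shiftLeftConcat models one element of ShiftLeftConcat for counts
// 0-15: x is shifted left and its emptied low bits are filled with the
// high bits of z, i.e. the high half of the 32-bit value (x:z) << s.
// Go defines over-wide shifts as 0, so the s == 0 case falls out
// naturally (z >> 16 is 0 and x is returned unchanged).
func shiftLeftConcat(x, z uint16, shift uint) uint16 {
	s := shift & 15
	return x<<s | z>>(16-s)
}

func main() {
	fmt.Printf("%04x\n", shiftLeftConcat(0x1234, 0xabcd, 4)) // 234a
}
```

ShiftRightConcat is the mirror image: the emptied high bits are filled from the low bits of the partner element.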
Int16x32 is a 512-bit SIMD vector of 32 int16 Abs computes the absolute value of each element.
Asm: VPABSW, CPU Feature: AVX512 Add adds corresponding elements of two vectors.
Asm: VPADDW, CPU Feature: AVX512 AddSaturated adds corresponding elements of two vectors with saturation.
Asm: VPADDSW, CPU Feature: AVX512 And performs a bitwise AND operation between two vectors.
Asm: VPANDD, CPU Feature: AVX512 AndNot performs a bitwise x &^ y.
Asm: VPANDND, CPU Feature: AVX512 Float32x16 converts from Int16x32 to Float32x16 Float64x8 converts from Int16x32 to Float64x8 Int32x16 converts from Int16x32 to Int32x16 Int64x8 converts from Int16x32 to Int64x8 Int8x64 converts from Int16x32 to Int8x64 Uint16x32 converts from Int16x32 to Uint16x32 Uint32x16 converts from Int16x32 to Uint32x16 Uint64x8 converts from Int16x32 to Uint64x8 Uint8x64 converts from Int16x32 to Uint8x64 Compress performs a compression on vector x using mask by
selecting elements as indicated by mask and packing them into lower-indexed elements.
Asm: VPCOMPRESSW, CPU Feature: AVX512VBMI2 ConcatPermute performs a full permutation of vector x, y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the bits needed to represent an index into xy are used from each element of indices.
Asm: VPERMI2W, CPU Feature: AVX512 DotProductPairs multiplies the elements and adds the pairs together,
yielding a vector of half as many elements with twice the input element size.
Asm: VPMADDWD, CPU Feature: AVX512 Equal returns x equals y, elementwise.
Asm: VPCMPEQW, CPU Feature: AVX512 Expand performs an expansion on a vector x whose elements are packed to lower parts.
The expansion distributes elements to the positions indicated by mask, in order from lower mask elements to upper.
Asm: VPEXPANDW, CPU Feature: AVX512VBMI2 GetHi returns the upper half of x.
Asm: VEXTRACTI64X4, CPU Feature: AVX512 GetLo returns the lower half of x.
Asm: VEXTRACTI64X4, CPU Feature: AVX512 Greater returns x greater-than y, elementwise.
Asm: VPCMPGTW, CPU Feature: AVX512 GreaterEqual returns x greater-than-or-equals y, elementwise.
Asm: VPCMPW, CPU Feature: AVX512 InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.
Asm: VPUNPCKHWD, CPU Feature: AVX512 InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.
Asm: VPUNPCKLWD, CPU Feature: AVX512 Len returns the number of elements in an Int16x32 Less returns x less-than y, elementwise.
Asm: VPCMPW, CPU Feature: AVX512 LessEqual returns x less-than-or-equals y, elementwise.
Asm: VPCMPW, CPU Feature: AVX512 Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VPMAXSW, CPU Feature: AVX512 Merge returns x but with elements set to y where m is false. Min computes the minimum of corresponding elements.
Asm: VPMINSW, CPU Feature: AVX512 Mul multiplies corresponding elements of two vectors.
Asm: VPMULLW, CPU Feature: AVX512 MulHigh multiplies elements and stores the high part of the result.
Asm: VPMULHW, CPU Feature: AVX512 Not returns the bitwise complement of x
Emulated, CPU Feature: AVX512 NotEqual returns x not-equals y, elementwise.
Asm: VPCMPW, CPU Feature: AVX512 OnesCount counts the number of set bits in each element.
Asm: VPOPCNTW, CPU Feature: AVX512BITALG Or performs a bitwise OR operation between two vectors.
Asm: VPORD, CPU Feature: AVX512 Permute performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 5 bits (values 0-31) of each element of indices are used.
Asm: VPERMW, CPU Feature: AVX512 PermuteScalarsHiGrouped performs a grouped permutation of vector x using the supplied indices:
result =
{x[0], x[1], x[2], x[3], x[a+4], x[b+4], x[c+4], x[d+4],
x[8], x[9], x[10], x[11], x[a+12], x[b+12], x[c+12], x[d+12],
x[16], x[17], x[18], x[19], x[a+20], x[b+20], x[c+20], x[d+20],
x[24], x[25], x[26], x[27], x[a+28], x[b+28], x[c+28], x[d+28]}
Parameters a,b,c,d should have values between 0 and 3.
If a through d are constants, then an instruction will be inlined, otherwise
a jump table is generated.
Asm: VPSHUFHW, CPU Feature: AVX512 PermuteScalarsLoGrouped performs a grouped permutation of vector x using the supplied indices:
result =
{x[a], x[b], x[c], x[d], x[4], x[5], x[6], x[7],
x[a+8], x[b+8], x[c+8], x[d+8], x[12], x[13], x[14], x[15],
x[a+16], x[b+16], x[c+16], x[d+16], x[20], x[21], x[22], x[23],
x[a+24], x[b+24], x[c+24], x[d+24], x[28], x[29], x[30], x[31]}
Parameters a,b,c,d should have values between 0 and 3.
If a through d are constants, then an instruction will be inlined, otherwise
a jump table is generated.
Asm: VPSHUFLW, CPU Feature: AVX512 SaturateToInt8 converts element values to int8.
Conversion is done with saturation on the vector elements.
Asm: VPMOVSWB, CPU Feature: AVX512 SetHi returns x with its upper half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512 SetLo returns x with its lower half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512 ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLW, CPU Feature: AVX512 ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDW, CPU Feature: AVX512VBMI2 ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are filled with the sign bit.
Asm: VPSRAW, CPU Feature: AVX512 ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDW, CPU Feature: AVX512VBMI2 ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVW, CPU Feature: AVX512 ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVW, CPU Feature: AVX512VBMI2 ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are filled with the sign bit.
Asm: VPSRAVW, CPU Feature: AVX512 ShiftRightConcat shifts each element of x to the right by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVW, CPU Feature: AVX512VBMI2 Store stores an Int16x32 to an array. StoreMasked stores an Int16x32 to an array,
at those elements enabled by mask.
Asm: VMOVDQU16, CPU Feature: AVX512 StoreSlice stores x into a slice of at least 32 int16s. StoreSlicePart stores the 32 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 32 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x. Sub subtracts corresponding elements of two vectors.
Asm: VPSUBW, CPU Feature: AVX512 SubSaturated subtracts corresponding elements of two vectors with saturation.
Asm: VPSUBSW, CPU Feature: AVX512 ToMask converts from Int16x32 to Mask16x32; each mask element is set to true when the corresponding vector element is non-zero. TruncateToInt8 converts element values to int8.
Conversion is done with truncation on the vector elements.
Asm: VPMOVWB, CPU Feature: AVX512 Xor performs a bitwise XOR operation between two vectors.
Asm: VPXORD, CPU Feature: AVX512
Int16x32 : expvar.Var
Int16x32 : fmt.Stringer
func BroadcastInt16x32(x int16) Int16x32
func LoadInt16x32(y *[32]int16) Int16x32
func LoadInt16x32Slice(s []int16) Int16x32
func LoadInt16x32SlicePart(s []int16) Int16x32
func LoadMaskedInt16x32(y *[32]int16, mask Mask16x32) Int16x32
func Float32x16.AsInt16x32() (to Int16x32)
func Float64x8.AsInt16x32() (to Int16x32)
func Int16x32.Abs() Int16x32
func Int16x32.Add(y Int16x32) Int16x32
func Int16x32.AddSaturated(y Int16x32) Int16x32
func Int16x32.And(y Int16x32) Int16x32
func Int16x32.AndNot(y Int16x32) Int16x32
func Int16x32.Compress(mask Mask16x32) Int16x32
func Int16x32.ConcatPermute(y Int16x32, indices Uint16x32) Int16x32
func Int16x32.Expand(mask Mask16x32) Int16x32
func Int16x32.InterleaveHiGrouped(y Int16x32) Int16x32
func Int16x32.InterleaveLoGrouped(y Int16x32) Int16x32
func Int16x32.Masked(mask Mask16x32) Int16x32
func Int16x32.Max(y Int16x32) Int16x32
func Int16x32.Merge(y Int16x32, mask Mask16x32) Int16x32
func Int16x32.Min(y Int16x32) Int16x32
func Int16x32.Mul(y Int16x32) Int16x32
func Int16x32.MulHigh(y Int16x32) Int16x32
func Int16x32.Not() Int16x32
func Int16x32.OnesCount() Int16x32
func Int16x32.Or(y Int16x32) Int16x32
func Int16x32.Permute(indices Uint16x32) Int16x32
func Int16x32.PermuteScalarsHiGrouped(a, b, c, d uint8) Int16x32
func Int16x32.PermuteScalarsLoGrouped(a, b, c, d uint8) Int16x32
func Int16x32.SetHi(y Int16x16) Int16x32
func Int16x32.SetLo(y Int16x16) Int16x32
func Int16x32.ShiftAllLeft(y uint64) Int16x32
func Int16x32.ShiftAllLeftConcat(shift uint8, y Int16x32) Int16x32
func Int16x32.ShiftAllRight(y uint64) Int16x32
func Int16x32.ShiftAllRightConcat(shift uint8, y Int16x32) Int16x32
func Int16x32.ShiftLeft(y Int16x32) Int16x32
func Int16x32.ShiftLeftConcat(y Int16x32, z Int16x32) Int16x32
func Int16x32.ShiftRight(y Int16x32) Int16x32
func Int16x32.ShiftRightConcat(y Int16x32, z Int16x32) Int16x32
func Int16x32.Sub(y Int16x32) Int16x32
func Int16x32.SubSaturated(y Int16x32) Int16x32
func Int16x32.Xor(y Int16x32) Int16x32
func Int16x8.Broadcast512() Int16x32
func Int32x16.AsInt16x32() (to Int16x32)
func Int32x16.SaturateToInt16Concat(y Int32x16) Int16x32
func Int64x8.AsInt16x32() (to Int16x32)
func Int8x32.ExtendToInt16() Int16x32
func Int8x64.AsInt16x32() (to Int16x32)
func Mask16x32.ToInt16x32() (to Int16x32)
func Uint16x32.AsInt16x32() (to Int16x32)
func Uint32x16.AsInt16x32() (to Int16x32)
func Uint64x8.AsInt16x32() (to Int16x32)
func Uint8x64.AsInt16x32() (to Int16x32)
func Uint8x64.DotProductPairsSaturated(y Int8x64) Int16x32
func Int16x32.Add(y Int16x32) Int16x32
func Int16x32.AddSaturated(y Int16x32) Int16x32
func Int16x32.And(y Int16x32) Int16x32
func Int16x32.AndNot(y Int16x32) Int16x32
func Int16x32.ConcatPermute(y Int16x32, indices Uint16x32) Int16x32
func Int16x32.DotProductPairs(y Int16x32) Int32x16
func Int16x32.Equal(y Int16x32) Mask16x32
func Int16x32.Greater(y Int16x32) Mask16x32
func Int16x32.GreaterEqual(y Int16x32) Mask16x32
func Int16x32.InterleaveHiGrouped(y Int16x32) Int16x32
func Int16x32.InterleaveLoGrouped(y Int16x32) Int16x32
func Int16x32.Less(y Int16x32) Mask16x32
func Int16x32.LessEqual(y Int16x32) Mask16x32
func Int16x32.Max(y Int16x32) Int16x32
func Int16x32.Merge(y Int16x32, mask Mask16x32) Int16x32
func Int16x32.Min(y Int16x32) Int16x32
func Int16x32.Mul(y Int16x32) Int16x32
func Int16x32.MulHigh(y Int16x32) Int16x32
func Int16x32.NotEqual(y Int16x32) Mask16x32
func Int16x32.Or(y Int16x32) Int16x32
func Int16x32.ShiftAllLeftConcat(shift uint8, y Int16x32) Int16x32
func Int16x32.ShiftAllRightConcat(shift uint8, y Int16x32) Int16x32
func Int16x32.ShiftLeft(y Int16x32) Int16x32
func Int16x32.ShiftLeftConcat(y Int16x32, z Int16x32) Int16x32
func Int16x32.ShiftRight(y Int16x32) Int16x32
func Int16x32.ShiftRightConcat(y Int16x32, z Int16x32) Int16x32
func Int16x32.Sub(y Int16x32) Int16x32
func Int16x32.SubSaturated(y Int16x32) Int16x32
func Int16x32.Xor(y Int16x32) Int16x32
Int16x8 is a 128-bit SIMD vector of 8 int16. Abs computes the absolute value of each element.
Asm: VPABSW, CPU Feature: AVX Add adds corresponding elements of two vectors.
Asm: VPADDW, CPU Feature: AVX AddPairs horizontally adds adjacent pairs of elements.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].
Asm: VPHADDW, CPU Feature: AVX AddPairsSaturated horizontally adds adjacent pairs of elements with saturation.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].
Asm: VPHADDSW, CPU Feature: AVX AddSaturated adds corresponding elements of two vectors with saturation.
Asm: VPADDSW, CPU Feature: AVX And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX AndNot performs a bitwise x &^ y.
Asm: VPANDN, CPU Feature: AVX Float32x4 converts from Int16x8 to Float32x4 Float64x2 converts from Int16x8 to Float64x2 Int32x4 converts from Int16x8 to Int32x4 Int64x2 converts from Int16x8 to Int64x2 Int8x16 converts from Int16x8 to Int8x16 Uint16x8 converts from Int16x8 to Uint16x8 Uint32x4 converts from Int16x8 to Uint32x4 Uint64x2 converts from Int16x8 to Uint64x2 Uint8x16 converts from Int16x8 to Uint8x16 Broadcast128 copies element zero of its (128-bit) input to all elements of
the 128-bit output vector.
Asm: VPBROADCASTW, CPU Feature: AVX2 Broadcast256 copies element zero of its (128-bit) input to all elements of
the 256-bit output vector.
Asm: VPBROADCASTW, CPU Feature: AVX2 Broadcast512 copies element zero of its (128-bit) input to all elements of
the 512-bit output vector.
Asm: VPBROADCASTW, CPU Feature: AVX512 Compress performs a compression on vector x using mask by
selecting elements as indicated by mask, and packs them to lower-indexed elements.
Asm: VPCOMPRESSW, CPU Feature: AVX512VBMI2 ConcatPermute performs a full permutation of vector x, y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the bits needed to represent an index into xy are used from each element of indices.
Asm: VPERMI2W, CPU Feature: AVX512 CopySign returns the product of the first operand with -1, 0, or 1,
whichever constant is nearest to the value of the second operand.
Asm: VPSIGNW, CPU Feature: AVX DotProductPairs multiplies the elements and adds the pairs together,
yielding a vector of half as many elements with twice the input element size.
Asm: VPMADDWD, CPU Feature: AVX Equal returns x equals y, elementwise.
Asm: VPCMPEQW, CPU Feature: AVX Expand performs an expansion on a vector x whose elements are packed to lower parts.
The expansion is to distribute elements as indexed by mask, from lower mask elements to upper in order.
Asm: VPEXPANDW, CPU Feature: AVX512VBMI2 ExtendLo2ToInt64x2 converts 2 lowest vector element values to int64.
The result vector's elements are sign-extended.
Asm: VPMOVSXWQ, CPU Feature: AVX ExtendLo4ToInt32x4 converts 4 lowest vector element values to int32.
The result vector's elements are sign-extended.
Asm: VPMOVSXWD, CPU Feature: AVX ExtendLo4ToInt64x4 converts 4 lowest vector element values to int64.
The result vector's elements are sign-extended.
Asm: VPMOVSXWQ, CPU Feature: AVX2 ExtendToInt32 converts element values to int32.
The result vector's elements are sign-extended.
Asm: VPMOVSXWD, CPU Feature: AVX2 ExtendToInt64 converts element values to int64.
The result vector's elements are sign-extended.
Asm: VPMOVSXWQ, CPU Feature: AVX512 GetElem retrieves a single constant-indexed element's value.
index results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPEXTRW, CPU Feature: AVX512 Greater returns x greater-than y, elementwise.
Asm: VPCMPGTW, CPU Feature: AVX GreaterEqual returns a mask whose elements indicate whether x >= y.
Emulated, CPU Feature: AVX InterleaveHi interleaves the elements of the high halves of x and y.
Asm: VPUNPCKHWD, CPU Feature: AVX InterleaveLo interleaves the elements of the low halves of x and y.
Asm: VPUNPCKLWD, CPU Feature: AVX IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x.
x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y.
Asm: VPTEST, CPU Feature: AVX Len returns the number of elements in an Int16x8. Less returns a mask whose elements indicate whether x < y.
Emulated, CPU Feature: AVX LessEqual returns a mask whose elements indicate whether x <= y.
Emulated, CPU Feature: AVX Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VPMAXSW, CPU Feature: AVX Merge returns x but with elements set to y where mask is false. Min computes the minimum of corresponding elements.
Asm: VPMINSW, CPU Feature: AVX Mul multiplies corresponding elements of two vectors.
Asm: VPMULLW, CPU Feature: AVX MulHigh multiplies elements and stores the high part of the result.
Asm: VPMULHW, CPU Feature: AVX Not returns the bitwise complement of x.
Emulated, CPU Feature: AVX NotEqual returns a mask whose elements indicate whether x != y.
Emulated, CPU Feature: AVX OnesCount counts the number of set bits in each element.
Asm: VPOPCNTW, CPU Feature: AVX512BITALG Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX Permute performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 3 bits (values 0-7) of each element of indices are used.
Asm: VPERMW, CPU Feature: AVX512 PermuteScalarsHi performs a permutation of vector x using the supplied indices:
result = {x[0], x[1], x[2], x[3], x[a+4], x[b+4], x[c+4], x[d+4]}
Parameters a,b,c,d should have values between 0 and 3.
If a through d are constants, then an instruction will be inlined, otherwise
a jump table is generated.
Asm: VPSHUFHW, CPU Feature: AVX512 PermuteScalarsLo performs a permutation of vector x using the supplied indices:
result = {x[a], x[b], x[c], x[d], x[4], x[5], x[6], x[7]}
Parameters a,b,c,d should have values between 0 and 3.
If a through d are constants, then an instruction will be inlined, otherwise
a jump table is generated.
Asm: VPSHUFLW, CPU Feature: AVX512 SaturateToInt8 converts element values to int8.
Conversion is done with saturation on the vector elements.
Results are packed to low elements in the returned vector; its upper elements are zero-cleared.
Asm: VPMOVSWB, CPU Feature: AVX512 SaturateToUint8 converts element values to uint8.
Conversion is done with saturation on the vector elements.
Results are packed to low elements in the returned vector; its upper elements are zero-cleared.
Asm: VPMOVSWB, CPU Feature: AVX512 SetElem sets a single constant-indexed element's value.
index results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPINSRW, CPU Feature: AVX ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLW, CPU Feature: AVX ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDW, CPU Feature: AVX512VBMI2 ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are filled with the sign bit.
Asm: VPSRAW, CPU Feature: AVX ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDW, CPU Feature: AVX512VBMI2 ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVW, CPU Feature: AVX512 ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVW, CPU Feature: AVX512VBMI2 ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are filled with the sign bit.
Asm: VPSRAVW, CPU Feature: AVX512 ShiftRightConcat shifts each element of x to the right by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVW, CPU Feature: AVX512VBMI2 Store stores an Int16x8 to an array. StoreSlice stores x into a slice of at least 8 int16s. StoreSlicePart stores the elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 8 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x. Sub subtracts corresponding elements of two vectors.
Asm: VPSUBW, CPU Feature: AVX SubPairs horizontally subtracts adjacent pairs of elements.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].
Asm: VPHSUBW, CPU Feature: AVX SubPairsSaturated horizontally subtracts adjacent pairs of elements with saturation.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].
Asm: VPHSUBSW, CPU Feature: AVX SubSaturated subtracts corresponding elements of two vectors with saturation.
Asm: VPSUBSW, CPU Feature: AVX ToMask converts from Int16x8 to Mask16x8; each mask element is set to true when the corresponding vector element is non-zero. TruncateToInt8 converts element values to int8.
Conversion is done with truncation on the vector elements.
Results are packed to low elements in the returned vector; its upper elements are zero-cleared.
Asm: VPMOVWB, CPU Feature: AVX512 Xor performs a bitwise XOR operation between two vectors.
Asm: VPXOR, CPU Feature: AVX
Int16x8 : expvar.Var
Int16x8 : fmt.Stringer
func BroadcastInt16x8(x int16) Int16x8
func LoadInt16x8(y *[8]int16) Int16x8
func LoadInt16x8Slice(s []int16) Int16x8
func LoadInt16x8SlicePart(s []int16) Int16x8
func Float32x4.AsInt16x8() (to Int16x8)
func Float64x2.AsInt16x8() (to Int16x8)
func Int16x16.GetHi() Int16x8
func Int16x16.GetLo() Int16x8
func Int16x8.Abs() Int16x8
func Int16x8.Add(y Int16x8) Int16x8
func Int16x8.AddPairs(y Int16x8) Int16x8
func Int16x8.AddPairsSaturated(y Int16x8) Int16x8
func Int16x8.AddSaturated(y Int16x8) Int16x8
func Int16x8.And(y Int16x8) Int16x8
func Int16x8.AndNot(y Int16x8) Int16x8
func Int16x8.Broadcast128() Int16x8
func Int16x8.Compress(mask Mask16x8) Int16x8
func Int16x8.ConcatPermute(y Int16x8, indices Uint16x8) Int16x8
func Int16x8.CopySign(y Int16x8) Int16x8
func Int16x8.Expand(mask Mask16x8) Int16x8
func Int16x8.InterleaveHi(y Int16x8) Int16x8
func Int16x8.InterleaveLo(y Int16x8) Int16x8
func Int16x8.Masked(mask Mask16x8) Int16x8
func Int16x8.Max(y Int16x8) Int16x8
func Int16x8.Merge(y Int16x8, mask Mask16x8) Int16x8
func Int16x8.Min(y Int16x8) Int16x8
func Int16x8.Mul(y Int16x8) Int16x8
func Int16x8.MulHigh(y Int16x8) Int16x8
func Int16x8.Not() Int16x8
func Int16x8.OnesCount() Int16x8
func Int16x8.Or(y Int16x8) Int16x8
func Int16x8.Permute(indices Uint16x8) Int16x8
func Int16x8.PermuteScalarsHi(a, b, c, d uint8) Int16x8
func Int16x8.PermuteScalarsLo(a, b, c, d uint8) Int16x8
func Int16x8.SetElem(index uint8, y int16) Int16x8
func Int16x8.ShiftAllLeft(y uint64) Int16x8
func Int16x8.ShiftAllLeftConcat(shift uint8, y Int16x8) Int16x8
func Int16x8.ShiftAllRight(y uint64) Int16x8
func Int16x8.ShiftAllRightConcat(shift uint8, y Int16x8) Int16x8
func Int16x8.ShiftLeft(y Int16x8) Int16x8
func Int16x8.ShiftLeftConcat(y Int16x8, z Int16x8) Int16x8
func Int16x8.ShiftRight(y Int16x8) Int16x8
func Int16x8.ShiftRightConcat(y Int16x8, z Int16x8) Int16x8
func Int16x8.Sub(y Int16x8) Int16x8
func Int16x8.SubPairs(y Int16x8) Int16x8
func Int16x8.SubPairsSaturated(y Int16x8) Int16x8
func Int16x8.SubSaturated(y Int16x8) Int16x8
func Int16x8.Xor(y Int16x8) Int16x8
func Int32x4.AsInt16x8() (to Int16x8)
func Int32x4.SaturateToInt16() Int16x8
func Int32x4.SaturateToInt16Concat(y Int32x4) Int16x8
func Int32x4.TruncateToInt16() Int16x8
func Int32x8.SaturateToInt16() Int16x8
func Int32x8.TruncateToInt16() Int16x8
func Int64x2.AsInt16x8() (to Int16x8)
func Int64x2.SaturateToInt16() Int16x8
func Int64x2.TruncateToInt16() Int16x8
func Int64x4.SaturateToInt16() Int16x8
func Int64x4.TruncateToInt16() Int16x8
func Int64x8.SaturateToInt16() Int16x8
func Int64x8.TruncateToInt16() Int16x8
func Int8x16.AsInt16x8() (to Int16x8)
func Int8x16.ExtendLo8ToInt16x8() Int16x8
func Mask16x8.ToInt16x8() (to Int16x8)
func Uint16x8.AsInt16x8() (to Int16x8)
func Uint32x4.AsInt16x8() (to Int16x8)
func Uint64x2.AsInt16x8() (to Int16x8)
func Uint8x16.AsInt16x8() (to Int16x8)
func Uint8x16.DotProductPairsSaturated(y Int8x16) Int16x8
func Int16x16.SetHi(y Int16x8) Int16x16
func Int16x16.SetLo(y Int16x8) Int16x16
func Int16x8.Add(y Int16x8) Int16x8
func Int16x8.AddPairs(y Int16x8) Int16x8
func Int16x8.AddPairsSaturated(y Int16x8) Int16x8
func Int16x8.AddSaturated(y Int16x8) Int16x8
func Int16x8.And(y Int16x8) Int16x8
func Int16x8.AndNot(y Int16x8) Int16x8
func Int16x8.ConcatPermute(y Int16x8, indices Uint16x8) Int16x8
func Int16x8.CopySign(y Int16x8) Int16x8
func Int16x8.DotProductPairs(y Int16x8) Int32x4
func Int16x8.Equal(y Int16x8) Mask16x8
func Int16x8.Greater(y Int16x8) Mask16x8
func Int16x8.GreaterEqual(y Int16x8) Mask16x8
func Int16x8.InterleaveHi(y Int16x8) Int16x8
func Int16x8.InterleaveLo(y Int16x8) Int16x8
func Int16x8.Less(y Int16x8) Mask16x8
func Int16x8.LessEqual(y Int16x8) Mask16x8
func Int16x8.Max(y Int16x8) Int16x8
func Int16x8.Merge(y Int16x8, mask Mask16x8) Int16x8
func Int16x8.Min(y Int16x8) Int16x8
func Int16x8.Mul(y Int16x8) Int16x8
func Int16x8.MulHigh(y Int16x8) Int16x8
func Int16x8.NotEqual(y Int16x8) Mask16x8
func Int16x8.Or(y Int16x8) Int16x8
func Int16x8.ShiftAllLeftConcat(shift uint8, y Int16x8) Int16x8
func Int16x8.ShiftAllRightConcat(shift uint8, y Int16x8) Int16x8
func Int16x8.ShiftLeft(y Int16x8) Int16x8
func Int16x8.ShiftLeftConcat(y Int16x8, z Int16x8) Int16x8
func Int16x8.ShiftRight(y Int16x8) Int16x8
func Int16x8.ShiftRightConcat(y Int16x8, z Int16x8) Int16x8
func Int16x8.Sub(y Int16x8) Int16x8
func Int16x8.SubPairs(y Int16x8) Int16x8
func Int16x8.SubPairsSaturated(y Int16x8) Int16x8
func Int16x8.SubSaturated(y Int16x8) Int16x8
func Int16x8.Xor(y Int16x8) Int16x8
Int32x16 is a 512-bit SIMD vector of 16 int32. Abs computes the absolute value of each element.
Asm: VPABSD, CPU Feature: AVX512 Add adds corresponding elements of two vectors.
Asm: VPADDD, CPU Feature: AVX512 And performs a bitwise AND operation between two vectors.
Asm: VPANDD, CPU Feature: AVX512 AndNot performs a bitwise x &^ y.
Asm: VPANDND, CPU Feature: AVX512 Float32x16 converts from Int32x16 to Float32x16 Float64x8 converts from Int32x16 to Float64x8 Int16x32 converts from Int32x16 to Int16x32 Int64x8 converts from Int32x16 to Int64x8 Int8x64 converts from Int32x16 to Int8x64 Uint16x32 converts from Int32x16 to Uint16x32 Uint32x16 converts from Int32x16 to Uint32x16 Uint64x8 converts from Int32x16 to Uint64x8 Uint8x64 converts from Int32x16 to Uint8x64 Compress performs a compression on vector x using mask by
selecting elements as indicated by mask, and packs them to lower-indexed elements.
Asm: VPCOMPRESSD, CPU Feature: AVX512 ConcatPermute performs a full permutation of vector x, y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the bits needed to represent an index into xy are used from each element of indices.
Asm: VPERMI2D, CPU Feature: AVX512 ConvertToFloat32 converts element values to float32.
Asm: VCVTDQ2PS, CPU Feature: AVX512 Equal returns x equals y, elementwise.
Asm: VPCMPEQD, CPU Feature: AVX512 Expand performs an expansion on a vector x whose elements are packed to lower parts.
The expansion is to distribute elements as indexed by mask, from lower mask elements to upper in order.
Asm: VPEXPANDD, CPU Feature: AVX512 GetHi returns the upper half of x.
Asm: VEXTRACTI64X4, CPU Feature: AVX512 GetLo returns the lower half of x.
Asm: VEXTRACTI64X4, CPU Feature: AVX512 Greater returns x greater-than y, elementwise.
Asm: VPCMPGTD, CPU Feature: AVX512 GreaterEqual returns x greater-than-or-equals y, elementwise.
Asm: VPCMPD, CPU Feature: AVX512 InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.
Asm: VPUNPCKHDQ, CPU Feature: AVX512 InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.
Asm: VPUNPCKLDQ, CPU Feature: AVX512 LeadingZeros counts the leading zeros of each element in x.
Asm: VPLZCNTD, CPU Feature: AVX512 Len returns the number of elements in an Int32x16. Less returns x less-than y, elementwise.
Asm: VPCMPD, CPU Feature: AVX512 LessEqual returns x less-than-or-equals y, elementwise.
Asm: VPCMPD, CPU Feature: AVX512 Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VPMAXSD, CPU Feature: AVX512 Merge returns x but with elements set to y where mask is false. Min computes the minimum of corresponding elements.
Asm: VPMINSD, CPU Feature: AVX512 Mul multiplies corresponding elements of two vectors.
Asm: VPMULLD, CPU Feature: AVX512 Not returns the bitwise complement of x.
Emulated, CPU Feature: AVX512 NotEqual returns x not-equals y, elementwise.
Asm: VPCMPD, CPU Feature: AVX512 OnesCount counts the number of set bits in each element.
Asm: VPOPCNTD, CPU Feature: AVX512VPOPCNTDQ Or performs a bitwise OR operation between two vectors.
Asm: VPORD, CPU Feature: AVX512 Permute performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 4 bits (values 0-15) of each element of indices are used.
Asm: VPERMD, CPU Feature: AVX512 PermuteScalarsGrouped performs a grouped permutation of vector x using the supplied indices:
result =
{ x[a], x[b], x[c], x[d], x[a+4], x[b+4], x[c+4], x[d+4],
x[a+8], x[b+8], x[c+8], x[d+8], x[a+12], x[b+12], x[c+12], x[d+12]}
Parameters a,b,c,d should have values between 0 and 3.
If a through d are constants, then an instruction will be inlined, otherwise
a jump table may be generated.
Asm: VPSHUFD, CPU Feature: AVX512 RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPROLD, CPU Feature: AVX512 RotateAllRight rotates each element to the right by the number of bits specified by the immediate.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPRORD, CPU Feature: AVX512 RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.
Asm: VPROLVD, CPU Feature: AVX512 RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.
Asm: VPRORVD, CPU Feature: AVX512 SaturateToInt16 converts element values to int16.
Conversion is done with saturation on the vector elements.
Asm: VPMOVSDW, CPU Feature: AVX512 SaturateToInt16Concat converts element values to int16.
With each 128-bit subvector as a group:
The converted group from the first input vector will be packed to the lower part of the result vector,
the converted group from the second input vector will be packed to the upper part of the result vector.
Conversion is done with saturation on the vector elements.
Asm: VPACKSSDW, CPU Feature: AVX512 SaturateToInt8 converts element values to int8.
Conversion is done with saturation on the vector elements.
Results are packed to low elements in the returned vector, its upper elements are zero-cleared.
Asm: VPMOVSDB, CPU Feature: AVX512 SaturateToUint8 converts element values to uint8.
Conversion is done with saturation on the vector elements.
Results are packed to low elements in the returned vector, its upper elements are zero-cleared.
Asm: VPMOVSDB, CPU Feature: AVX512 SelectFromPairGrouped returns, for each of the four 128-bit subvectors
of the vectors x and y, the selection of four elements from x and y,
where selector values in the range 0-3 specify elements from x and
values in the range 4-7 specify elements 0-3 of y.
When the selectors are constants and the selection can be implemented
in a single instruction, it will be; otherwise it requires two.
If the selectors are not constant, this will translate to a function
call.
Asm: VSHUFPS, CPU Feature: AVX512 SetHi returns x with its upper half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512 SetLo returns x with its lower half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512 ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLD, CPU Feature: AVX512 ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDD, CPU Feature: AVX512VBMI2 ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are filled with the sign bit.
Asm: VPSRAD, CPU Feature: AVX512 ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDD, CPU Feature: AVX512VBMI2 ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVD, CPU Feature: AVX512 ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVD, CPU Feature: AVX512VBMI2 ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are filled with the sign bit.
Asm: VPSRAVD, CPU Feature: AVX512 ShiftRightConcat shifts each element of x to the right by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVD, CPU Feature: AVX512VBMI2 Store stores an Int32x16 to an array. StoreMasked stores an Int32x16 to an array,
at those elements enabled by mask.
Asm: VMOVDQU32, CPU Feature: AVX512 StoreSlice stores x into a slice of at least 16 int32s StoreSlicePart stores the 16 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 16 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x Sub subtracts corresponding elements of two vectors.
Asm: VPSUBD, CPU Feature: AVX512 ToMask converts from Int32x16 to Mask32x16, mask element is set to true when the corresponding vector element is non-zero. TruncateToInt16 converts element values to int16.
Conversion is done with truncation on the vector elements.
Asm: VPMOVDW, CPU Feature: AVX512 TruncateToInt8 converts element values to int8.
Conversion is done with truncation on the vector elements.
Results are packed to low elements in the returned vector; its upper elements are zero-cleared.
Asm: VPMOVDB, CPU Feature: AVX512 Xor performs a bitwise XOR operation between two vectors.
Asm: VPXORD, CPU Feature: AVX512
Int32x16 : expvar.Var
Int32x16 : fmt.Stringer
func BroadcastInt32x16(x int32) Int32x16
func LoadInt32x16(y *[16]int32) Int32x16
func LoadInt32x16Slice(s []int32) Int32x16
func LoadInt32x16SlicePart(s []int32) Int32x16
func LoadMaskedInt32x16(y *[16]int32, mask Mask32x16) Int32x16
func Float32x16.AsInt32x16() (to Int32x16)
func Float32x16.ConvertToInt32() Int32x16
func Float64x8.AsInt32x16() (to Int32x16)
func Int16x16.ExtendToInt32() Int32x16
func Int16x32.AsInt32x16() (to Int32x16)
func Int16x32.DotProductPairs(y Int16x32) Int32x16
func Int32x16.Abs() Int32x16
func Int32x16.Add(y Int32x16) Int32x16
func Int32x16.And(y Int32x16) Int32x16
func Int32x16.AndNot(y Int32x16) Int32x16
func Int32x16.Compress(mask Mask32x16) Int32x16
func Int32x16.ConcatPermute(y Int32x16, indices Uint32x16) Int32x16
func Int32x16.Expand(mask Mask32x16) Int32x16
func Int32x16.InterleaveHiGrouped(y Int32x16) Int32x16
func Int32x16.InterleaveLoGrouped(y Int32x16) Int32x16
func Int32x16.LeadingZeros() Int32x16
func Int32x16.Masked(mask Mask32x16) Int32x16
func Int32x16.Max(y Int32x16) Int32x16
func Int32x16.Merge(y Int32x16, mask Mask32x16) Int32x16
func Int32x16.Min(y Int32x16) Int32x16
func Int32x16.Mul(y Int32x16) Int32x16
func Int32x16.Not() Int32x16
func Int32x16.OnesCount() Int32x16
func Int32x16.Or(y Int32x16) Int32x16
func Int32x16.Permute(indices Uint32x16) Int32x16
func Int32x16.PermuteScalarsGrouped(a, b, c, d uint8) Int32x16
func Int32x16.RotateAllLeft(shift uint8) Int32x16
func Int32x16.RotateAllRight(shift uint8) Int32x16
func Int32x16.RotateLeft(y Int32x16) Int32x16
func Int32x16.RotateRight(y Int32x16) Int32x16
func Int32x16.SelectFromPairGrouped(a, b, c, d uint8, y Int32x16) Int32x16
func Int32x16.SetHi(y Int32x8) Int32x16
func Int32x16.SetLo(y Int32x8) Int32x16
func Int32x16.ShiftAllLeft(y uint64) Int32x16
func Int32x16.ShiftAllLeftConcat(shift uint8, y Int32x16) Int32x16
func Int32x16.ShiftAllRight(y uint64) Int32x16
func Int32x16.ShiftAllRightConcat(shift uint8, y Int32x16) Int32x16
func Int32x16.ShiftLeft(y Int32x16) Int32x16
func Int32x16.ShiftLeftConcat(y Int32x16, z Int32x16) Int32x16
func Int32x16.ShiftRight(y Int32x16) Int32x16
func Int32x16.ShiftRightConcat(y Int32x16, z Int32x16) Int32x16
func Int32x16.Sub(y Int32x16) Int32x16
func Int32x16.Xor(y Int32x16) Int32x16
func Int32x4.Broadcast512() Int32x16
func Int64x8.AsInt32x16() (to Int32x16)
func Int8x16.ExtendToInt32() Int32x16
func Int8x64.AsInt32x16() (to Int32x16)
func Int8x64.DotProductQuadruple(y Uint8x64) Int32x16
func Int8x64.DotProductQuadrupleSaturated(y Uint8x64) Int32x16
func Mask32x16.ToInt32x16() (to Int32x16)
func Uint16x32.AsInt32x16() (to Int32x16)
func Uint32x16.AsInt32x16() (to Int32x16)
func Uint64x8.AsInt32x16() (to Int32x16)
func Uint8x64.AsInt32x16() (to Int32x16)
func Int32x16.Add(y Int32x16) Int32x16
func Int32x16.And(y Int32x16) Int32x16
func Int32x16.AndNot(y Int32x16) Int32x16
func Int32x16.ConcatPermute(y Int32x16, indices Uint32x16) Int32x16
func Int32x16.Equal(y Int32x16) Mask32x16
func Int32x16.Greater(y Int32x16) Mask32x16
func Int32x16.GreaterEqual(y Int32x16) Mask32x16
func Int32x16.InterleaveHiGrouped(y Int32x16) Int32x16
func Int32x16.InterleaveLoGrouped(y Int32x16) Int32x16
func Int32x16.Less(y Int32x16) Mask32x16
func Int32x16.LessEqual(y Int32x16) Mask32x16
func Int32x16.Max(y Int32x16) Int32x16
func Int32x16.Merge(y Int32x16, mask Mask32x16) Int32x16
func Int32x16.Min(y Int32x16) Int32x16
func Int32x16.Mul(y Int32x16) Int32x16
func Int32x16.NotEqual(y Int32x16) Mask32x16
func Int32x16.Or(y Int32x16) Int32x16
func Int32x16.RotateLeft(y Int32x16) Int32x16
func Int32x16.RotateRight(y Int32x16) Int32x16
func Int32x16.SaturateToInt16Concat(y Int32x16) Int16x32
func Int32x16.SelectFromPairGrouped(a, b, c, d uint8, y Int32x16) Int32x16
func Int32x16.ShiftAllLeftConcat(shift uint8, y Int32x16) Int32x16
func Int32x16.ShiftAllRightConcat(shift uint8, y Int32x16) Int32x16
func Int32x16.ShiftLeft(y Int32x16) Int32x16
func Int32x16.ShiftLeftConcat(y Int32x16, z Int32x16) Int32x16
func Int32x16.ShiftRight(y Int32x16) Int32x16
func Int32x16.ShiftRightConcat(y Int32x16, z Int32x16) Int32x16
func Int32x16.Sub(y Int32x16) Int32x16
func Int32x16.Xor(y Int32x16) Int32x16
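The ShiftLeftConcat/ShiftAllLeftConcat methods listed above are funnel shifts: the bits emptied at the low end of the shifted x are filled from the top of y. A minimal plain-Go sketch of the documented per-element rule; the helper shiftLeftConcat32 is illustrative only, not part of the archsimd API:

```go
package main

import "fmt"

// shiftLeftConcat32 models the per-element behavior documented for
// Int32x16.ShiftLeftConcat (VPSHLDVD): shift x left by the low 5 bits
// of count, filling the emptied low bits from the top of y.
func shiftLeftConcat32(x, y int32, count uint32) int32 {
	s := count & 31 // only the lower 5 bits of the shift count are used
	if s == 0 {
		return x // a zero shift leaves x unchanged
	}
	return int32(uint32(x)<<s | uint32(y)>>(32-s))
}

func main() {
	// The low 4 bits of the result come from the top of y (all ones here).
	fmt.Printf("%#x\n", shiftLeftConcat32(0x1, -1, 4)) // 0x1f
}
```

The same model with the shift direction reversed describes ShiftRightConcat, which fills emptied upper bits from the low end of the second operand.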
Int32x4 is a 128-bit SIMD vector of 4 int32 Abs computes the absolute value of each element.
Asm: VPABSD, CPU Feature: AVX Add adds corresponding elements of two vectors.
Asm: VPADDD, CPU Feature: AVX AddPairs horizontally adds adjacent pairs of elements.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].
Asm: VPHADDD, CPU Feature: AVX And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX AndNot performs a bitwise x &^ y.
Asm: VPANDN, CPU Feature: AVX Float32x4 converts from Int32x4 to Float32x4 Float64x2 converts from Int32x4 to Float64x2 Int16x8 converts from Int32x4 to Int16x8 Int64x2 converts from Int32x4 to Int64x2 Int8x16 converts from Int32x4 to Int8x16 Uint16x8 converts from Int32x4 to Uint16x8 Uint32x4 converts from Int32x4 to Uint32x4 Uint64x2 converts from Int32x4 to Uint64x2 Uint8x16 converts from Int32x4 to Uint8x16 Broadcast128 copies element zero of its (128-bit) input to all elements of
the 128-bit output vector.
Asm: VPBROADCASTD, CPU Feature: AVX2 Broadcast256 copies element zero of its (128-bit) input to all elements of
the 256-bit output vector.
Asm: VPBROADCASTD, CPU Feature: AVX2 Broadcast512 copies element zero of its (128-bit) input to all elements of
the 512-bit output vector.
Asm: VPBROADCASTD, CPU Feature: AVX512 Compress performs a compression on vector x using mask by
selecting elements as indicated by mask and packing them to lower-indexed elements.
Asm: VPCOMPRESSD, CPU Feature: AVX512 ConcatPermute performs a full permutation of vector x, y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the bits needed to represent an index into xy are used in indices' elements.
Asm: VPERMI2D, CPU Feature: AVX512 ConvertToFloat32 converts element values to float32.
Asm: VCVTDQ2PS, CPU Feature: AVX ConvertToFloat64 converts element values to float64.
Asm: VCVTDQ2PD, CPU Feature: AVX CopySign returns the product of the first operand with -1, 0, or 1,
whichever constant is nearest to the value of the second operand.
Asm: VPSIGND, CPU Feature: AVX Equal returns x equals y, elementwise.
Asm: VPCMPEQD, CPU Feature: AVX Expand performs an expansion on a vector x whose elements are packed to lower parts.
The expansion distributes the elements to the positions indicated by mask, from lower mask elements to upper, in order.
Asm: VPEXPANDD, CPU Feature: AVX512 ExtendLo2ToInt64x2 converts 2 lowest vector element values to int64.
The result vector's elements are sign-extended.
Asm: VPMOVSXDQ, CPU Feature: AVX ExtendToInt64 converts element values to int64.
The result vector's elements are sign-extended.
Asm: VPMOVSXDQ, CPU Feature: AVX2 GetElem retrieves a single constant-indexed element's value.
index results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPEXTRD, CPU Feature: AVX Greater returns x greater-than y, elementwise.
Asm: VPCMPGTD, CPU Feature: AVX GreaterEqual returns a mask whose elements indicate whether x >= y
Emulated, CPU Feature: AVX InterleaveHi interleaves the elements of the high halves of x and y.
Asm: VPUNPCKHDQ, CPU Feature: AVX InterleaveLo interleaves the elements of the low halves of x and y.
Asm: VPUNPCKLDQ, CPU Feature: AVX IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x.
x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y
Asm: VPTEST, CPU Feature: AVX LeadingZeros counts the leading zeros of each element in x.
Asm: VPLZCNTD, CPU Feature: AVX512 Len returns the number of elements in an Int32x4 Less returns a mask whose elements indicate whether x < y
Emulated, CPU Feature: AVX LessEqual returns a mask whose elements indicate whether x <= y
Emulated, CPU Feature: AVX Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VPMAXSD, CPU Feature: AVX Merge returns x but with elements set to y where mask is false. Min computes the minimum of corresponding elements.
Asm: VPMINSD, CPU Feature: AVX Mul multiplies corresponding elements of two vectors.
Asm: VPMULLD, CPU Feature: AVX MulEvenWiden multiplies even-indexed elements, widening the result.
Result[i] = v1.Even[i] * v2.Even[i].
Asm: VPMULDQ, CPU Feature: AVX Not returns the bitwise complement of x
Emulated, CPU Feature: AVX NotEqual returns a mask whose elements indicate whether x != y
Emulated, CPU Feature: AVX OnesCount counts the number of set bits in each element.
Asm: VPOPCNTD, CPU Feature: AVX512VPOPCNTDQ Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX PermuteScalars performs a permutation of vector x's elements using the supplied indices:
result = {x[a], x[b], x[c], x[d]}
Parameters a,b,c,d should have values between 0 and 3.
If a through d are constants, then an instruction will be inlined; otherwise
a jump table may be generated.
Asm: VPSHUFD, CPU Feature: AVX RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPROLD, CPU Feature: AVX512 RotateAllRight rotates each element to the right by the number of bits specified by the immediate.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPRORD, CPU Feature: AVX512 RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.
Asm: VPROLVD, CPU Feature: AVX512 RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.
Asm: VPRORVD, CPU Feature: AVX512 SaturateToInt16 converts element values to int16.
Conversion is done with saturation on the vector elements.
Asm: VPMOVSDW, CPU Feature: AVX512 SaturateToInt16Concat converts element values to int16.
With each 128-bit lane as a group:
The converted group from the first input vector will be packed to the lower part of the result vector,
the converted group from the second input vector will be packed to the upper part of the result vector.
Conversion is done with saturation on the vector elements.
Asm: VPACKSSDW, CPU Feature: AVX SaturateToInt8 converts element values to int8.
Conversion is done with saturation on the vector elements.
Results are packed to low elements in the returned vector; its upper elements are zero-cleared.
Asm: VPMOVSDB, CPU Feature: AVX512 SaturateToUint8 converts element values to uint8.
Conversion is done with saturation on the vector elements.
Results are packed to low elements in the returned vector; its upper elements are zero-cleared.
Asm: VPMOVUSDB, CPU Feature: AVX512 SelectFromPair returns the selection of four elements from the two
vectors x and y, where selector values in the range 0-3 specify
elements from x and values in the range 4-7 specify elements 0-3
of y. When the selectors are constants and the selection can be
implemented in a single instruction, it will be; otherwise it
requires two. a is the source index of the least element in the
output, and b, c, and d are the indices of the 2nd, 3rd, and 4th
elements in the output. For example,
{1,2,4,8}.SelectFromPair(2,3,5,7,{9,25,49,81}) returns {4,8,25,81}.
If the selectors are not constant this will translate to a function
call.
Asm: VSHUFPS, CPU Feature: AVX SetElem sets a single constant-indexed element's value.
index results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPINSRD, CPU Feature: AVX ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLD, CPU Feature: AVX ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDD, CPU Feature: AVX512VBMI2 ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are filled with the sign bit.
Asm: VPSRAD, CPU Feature: AVX ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDD, CPU Feature: AVX512VBMI2 ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVD, CPU Feature: AVX2 ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVD, CPU Feature: AVX512VBMI2 ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are filled with the sign bit.
Asm: VPSRAVD, CPU Feature: AVX2 ShiftRightConcat shifts each element of x to the right by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVD, CPU Feature: AVX512VBMI2 Store stores an Int32x4 to an array StoreMasked stores an Int32x4 to an array,
at those elements enabled by mask
Asm: VMASKMOVD, CPU Feature: AVX2 StoreSlice stores x into a slice of at least 4 int32s StoreSlicePart stores the 4 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 4 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x Sub subtracts corresponding elements of two vectors.
Asm: VPSUBD, CPU Feature: AVX SubPairs horizontally subtracts adjacent pairs of elements.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].
Asm: VPHSUBD, CPU Feature: AVX ToMask converts from Int32x4 to Mask32x4, mask element is set to true when the corresponding vector element is non-zero. TruncateToInt16 converts element values to int16.
Conversion is done with truncation on the vector elements.
Asm: VPMOVDW, CPU Feature: AVX512 TruncateToInt8 converts element values to int8.
Conversion is done with truncation on the vector elements.
Results are packed to low elements in the returned vector; its upper elements are zero-cleared.
Asm: VPMOVDB, CPU Feature: AVX512 Xor performs a bitwise XOR operation between two vectors.
Asm: VPXOR, CPU Feature: AVX
Int32x4 : expvar.Var
Int32x4 : fmt.Stringer
func BroadcastInt32x4(x int32) Int32x4
func LoadInt32x4(y *[4]int32) Int32x4
func LoadInt32x4Slice(s []int32) Int32x4
func LoadInt32x4SlicePart(s []int32) Int32x4
func LoadMaskedInt32x4(y *[4]int32, mask Mask32x4) Int32x4
func Float32x4.AsInt32x4() (to Int32x4)
func Float32x4.ConvertToInt32() Int32x4
func Float64x2.AsInt32x4() (to Int32x4)
func Float64x2.ConvertToInt32() Int32x4
func Float64x4.ConvertToInt32() Int32x4
func Int16x8.AsInt32x4() (to Int32x4)
func Int16x8.DotProductPairs(y Int16x8) Int32x4
func Int16x8.ExtendLo4ToInt32x4() Int32x4
func Int32x4.Abs() Int32x4
func Int32x4.Add(y Int32x4) Int32x4
func Int32x4.AddPairs(y Int32x4) Int32x4
func Int32x4.And(y Int32x4) Int32x4
func Int32x4.AndNot(y Int32x4) Int32x4
func Int32x4.Broadcast128() Int32x4
func Int32x4.Compress(mask Mask32x4) Int32x4
func Int32x4.ConcatPermute(y Int32x4, indices Uint32x4) Int32x4
func Int32x4.CopySign(y Int32x4) Int32x4
func Int32x4.Expand(mask Mask32x4) Int32x4
func Int32x4.InterleaveHi(y Int32x4) Int32x4
func Int32x4.InterleaveLo(y Int32x4) Int32x4
func Int32x4.LeadingZeros() Int32x4
func Int32x4.Masked(mask Mask32x4) Int32x4
func Int32x4.Max(y Int32x4) Int32x4
func Int32x4.Merge(y Int32x4, mask Mask32x4) Int32x4
func Int32x4.Min(y Int32x4) Int32x4
func Int32x4.Mul(y Int32x4) Int32x4
func Int32x4.Not() Int32x4
func Int32x4.OnesCount() Int32x4
func Int32x4.Or(y Int32x4) Int32x4
func Int32x4.PermuteScalars(a, b, c, d uint8) Int32x4
func Int32x4.RotateAllLeft(shift uint8) Int32x4
func Int32x4.RotateAllRight(shift uint8) Int32x4
func Int32x4.RotateLeft(y Int32x4) Int32x4
func Int32x4.RotateRight(y Int32x4) Int32x4
func Int32x4.SelectFromPair(a, b, c, d uint8, y Int32x4) Int32x4
func Int32x4.SetElem(index uint8, y int32) Int32x4
func Int32x4.ShiftAllLeft(y uint64) Int32x4
func Int32x4.ShiftAllLeftConcat(shift uint8, y Int32x4) Int32x4
func Int32x4.ShiftAllRight(y uint64) Int32x4
func Int32x4.ShiftAllRightConcat(shift uint8, y Int32x4) Int32x4
func Int32x4.ShiftLeft(y Int32x4) Int32x4
func Int32x4.ShiftLeftConcat(y Int32x4, z Int32x4) Int32x4
func Int32x4.ShiftRight(y Int32x4) Int32x4
func Int32x4.ShiftRightConcat(y Int32x4, z Int32x4) Int32x4
func Int32x4.Sub(y Int32x4) Int32x4
func Int32x4.SubPairs(y Int32x4) Int32x4
func Int32x4.Xor(y Int32x4) Int32x4
func Int32x8.GetHi() Int32x4
func Int32x8.GetLo() Int32x4
func Int64x2.AsInt32x4() (to Int32x4)
func Int64x2.SaturateToInt32() Int32x4
func Int64x2.TruncateToInt32() Int32x4
func Int64x4.SaturateToInt32() Int32x4
func Int64x4.TruncateToInt32() Int32x4
func Int8x16.AsInt32x4() (to Int32x4)
func Int8x16.DotProductQuadruple(y Uint8x16) Int32x4
func Int8x16.DotProductQuadrupleSaturated(y Uint8x16) Int32x4
func Int8x16.ExtendLo4ToInt32x4() Int32x4
func Mask32x4.ToInt32x4() (to Int32x4)
func Uint16x8.AsInt32x4() (to Int32x4)
func Uint32x4.AsInt32x4() (to Int32x4)
func Uint64x2.AsInt32x4() (to Int32x4)
func Uint8x16.AsInt32x4() (to Int32x4)
func Int32x4.Add(y Int32x4) Int32x4
func Int32x4.AddPairs(y Int32x4) Int32x4
func Int32x4.And(y Int32x4) Int32x4
func Int32x4.AndNot(y Int32x4) Int32x4
func Int32x4.ConcatPermute(y Int32x4, indices Uint32x4) Int32x4
func Int32x4.CopySign(y Int32x4) Int32x4
func Int32x4.Equal(y Int32x4) Mask32x4
func Int32x4.Greater(y Int32x4) Mask32x4
func Int32x4.GreaterEqual(y Int32x4) Mask32x4
func Int32x4.InterleaveHi(y Int32x4) Int32x4
func Int32x4.InterleaveLo(y Int32x4) Int32x4
func Int32x4.Less(y Int32x4) Mask32x4
func Int32x4.LessEqual(y Int32x4) Mask32x4
func Int32x4.Max(y Int32x4) Int32x4
func Int32x4.Merge(y Int32x4, mask Mask32x4) Int32x4
func Int32x4.Min(y Int32x4) Int32x4
func Int32x4.Mul(y Int32x4) Int32x4
func Int32x4.MulEvenWiden(y Int32x4) Int64x2
func Int32x4.NotEqual(y Int32x4) Mask32x4
func Int32x4.Or(y Int32x4) Int32x4
func Int32x4.RotateLeft(y Int32x4) Int32x4
func Int32x4.RotateRight(y Int32x4) Int32x4
func Int32x4.SaturateToInt16Concat(y Int32x4) Int16x8
func Int32x4.SelectFromPair(a, b, c, d uint8, y Int32x4) Int32x4
func Int32x4.ShiftAllLeftConcat(shift uint8, y Int32x4) Int32x4
func Int32x4.ShiftAllRightConcat(shift uint8, y Int32x4) Int32x4
func Int32x4.ShiftLeft(y Int32x4) Int32x4
func Int32x4.ShiftLeftConcat(y Int32x4, z Int32x4) Int32x4
func Int32x4.ShiftRight(y Int32x4) Int32x4
func Int32x4.ShiftRightConcat(y Int32x4, z Int32x4) Int32x4
func Int32x4.Sub(y Int32x4) Int32x4
func Int32x4.SubPairs(y Int32x4) Int32x4
func Int32x4.Xor(y Int32x4) Int32x4
func Int32x8.SetHi(y Int32x4) Int32x8
func Int32x8.SetLo(y Int32x4) Int32x8
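The selector rule documented for SelectFromPair (values 0-3 pick elements of x, values 4-7 pick elements 0-3 of y) can be modeled in plain Go. The helper below is an illustrative reference sketch, not the intrinsic, and reproduces the documented example:

```go
package main

import "fmt"

// selectFromPair models the documented selector rule of
// Int32x4.SelectFromPair: selector values 0-3 pick elements of x,
// and values 4-7 pick elements 0-3 of y.
func selectFromPair(x, y [4]int32, a, b, c, d uint8) [4]int32 {
	xy := append(x[:], y[:]...) // conceptual 8-element concatenation of x and y
	return [4]int32{xy[a], xy[b], xy[c], xy[d]}
}

func main() {
	x := [4]int32{1, 2, 4, 8}
	y := [4]int32{9, 25, 49, 81}
	fmt.Println(selectFromPair(x, y, 2, 3, 5, 7)) // [4 8 25 81]
}
```

SelectFromPairGrouped applies the same rule independently within each 128-bit half of a wider vector.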
Int32x8 is a 256-bit SIMD vector of 8 int32 Abs computes the absolute value of each element.
Asm: VPABSD, CPU Feature: AVX2 Add adds corresponding elements of two vectors.
Asm: VPADDD, CPU Feature: AVX2 AddPairs horizontally adds adjacent pairs of elements.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].
Asm: VPHADDD, CPU Feature: AVX2 And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX2 AndNot performs a bitwise x &^ y.
Asm: VPANDN, CPU Feature: AVX2 Float32x8 converts from Int32x8 to Float32x8 Float64x4 converts from Int32x8 to Float64x4 Int16x16 converts from Int32x8 to Int16x16 Int64x4 converts from Int32x8 to Int64x4 Int8x32 converts from Int32x8 to Int8x32 Uint16x16 converts from Int32x8 to Uint16x16 Uint32x8 converts from Int32x8 to Uint32x8 Uint64x4 converts from Int32x8 to Uint64x4 Uint8x32 converts from Int32x8 to Uint8x32 Compress performs a compression on vector x using mask by
selecting elements as indicated by mask and packing them to lower-indexed elements.
Asm: VPCOMPRESSD, CPU Feature: AVX512 ConcatPermute performs a full permutation of vector x, y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the bits needed to represent an index into xy are used in indices' elements.
Asm: VPERMI2D, CPU Feature: AVX512 ConvertToFloat32 converts element values to float32.
Asm: VCVTDQ2PS, CPU Feature: AVX ConvertToFloat64 converts element values to float64.
Asm: VCVTDQ2PD, CPU Feature: AVX512 CopySign returns the product of the first operand with -1, 0, or 1,
whichever constant is nearest to the value of the second operand.
Asm: VPSIGND, CPU Feature: AVX2 Equal returns x equals y, elementwise.
Asm: VPCMPEQD, CPU Feature: AVX2 Expand performs an expansion on a vector x whose elements are packed to lower parts.
The expansion distributes the elements to the positions indicated by mask, from lower mask elements to upper, in order.
Asm: VPEXPANDD, CPU Feature: AVX512 ExtendToInt64 converts element values to int64.
The result vector's elements are sign-extended.
Asm: VPMOVSXDQ, CPU Feature: AVX512 GetHi returns the upper half of x.
Asm: VEXTRACTI128, CPU Feature: AVX2 GetLo returns the lower half of x.
Asm: VEXTRACTI128, CPU Feature: AVX2 Greater returns x greater-than y, elementwise.
Asm: VPCMPGTD, CPU Feature: AVX2 GreaterEqual returns a mask whose elements indicate whether x >= y
Emulated, CPU Feature: AVX2 InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.
Asm: VPUNPCKHDQ, CPU Feature: AVX2 InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.
Asm: VPUNPCKLDQ, CPU Feature: AVX2 IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x.
x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y
Asm: VPTEST, CPU Feature: AVX LeadingZeros counts the leading zeros of each element in x.
Asm: VPLZCNTD, CPU Feature: AVX512 Len returns the number of elements in an Int32x8 Less returns a mask whose elements indicate whether x < y
Emulated, CPU Feature: AVX2 LessEqual returns a mask whose elements indicate whether x <= y
Emulated, CPU Feature: AVX2 Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VPMAXSD, CPU Feature: AVX2 Merge returns x but with elements set to y where mask is false. Min computes the minimum of corresponding elements.
Asm: VPMINSD, CPU Feature: AVX2 Mul multiplies corresponding elements of two vectors.
Asm: VPMULLD, CPU Feature: AVX2 MulEvenWiden multiplies even-indexed elements, widening the result.
Result[i] = v1.Even[i] * v2.Even[i].
Asm: VPMULDQ, CPU Feature: AVX2 Not returns the bitwise complement of x
Emulated, CPU Feature: AVX2 NotEqual returns a mask whose elements indicate whether x != y
Emulated, CPU Feature: AVX2 OnesCount counts the number of set bits in each element.
Asm: VPOPCNTD, CPU Feature: AVX512VPOPCNTDQ Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX2 Permute performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 3 bits (values 0-7) of each element of indices are used.
Asm: VPERMD, CPU Feature: AVX2 PermuteScalarsGrouped performs a grouped permutation of vector x using the supplied indices:
result = {x[a], x[b], x[c], x[d], x[a+4], x[b+4], x[c+4], x[d+4]}
Parameters a,b,c,d should have values between 0 and 3.
If a through d are constants, then an instruction will be inlined; otherwise
a jump table may be generated.
Asm: VPSHUFD, CPU Feature: AVX2 RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPROLD, CPU Feature: AVX512 RotateAllRight rotates each element to the right by the number of bits specified by the immediate.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPRORD, CPU Feature: AVX512 RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.
Asm: VPROLVD, CPU Feature: AVX512 RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.
Asm: VPRORVD, CPU Feature: AVX512 SaturateToInt16 converts element values to int16.
Conversion is done with saturation on the vector elements.
Asm: VPMOVSDW, CPU Feature: AVX512 SaturateToInt16Concat converts element values to int16.
With each 128-bit lane as a group:
The converted group from the first input vector will be packed to the lower part of the result vector,
the converted group from the second input vector will be packed to the upper part of the result vector.
Conversion is done with saturation on the vector elements.
Asm: VPACKSSDW, CPU Feature: AVX2 SaturateToInt8 converts element values to int8.
Conversion is done with saturation on the vector elements.
Results are packed to low elements in the returned vector; its upper elements are zero-cleared.
Asm: VPMOVSDB, CPU Feature: AVX512 SaturateToUint8 converts element values to uint8.
Conversion is done with saturation on the vector elements.
Results are packed to low elements in the returned vector; its upper elements are zero-cleared.
Asm: VPMOVUSDB, CPU Feature: AVX512 Select128FromPair treats the 256-bit vectors x and y as a single vector of four
128-bit elements, and returns a 256-bit result formed by
concatenating the two elements specified by lo and hi.
For example,
{40, 41, 42, 43, 50, 51, 52, 53}.Select128FromPair(3, 0, {60, 61, 62, 63, 70, 71, 72, 73})
returns {70, 71, 72, 73, 40, 41, 42, 43}.
lo, hi result in better performance when they are constants; non-constant values will be translated into a jump table.
lo, hi should be between 0 and 3, inclusive; other values may result in a runtime panic.
Asm: VPERM2I128, CPU Feature: AVX2 SelectFromPairGrouped returns, for each of the two 128-bit halves of
the vectors x and y, the selection of four elements from x and y,
where selector values in the range 0-3 specify elements from x and
values in the range 4-7 specify elements 0-3 of y.
When the selectors are constants and the selection can be
implemented in a single instruction, it will be; otherwise
it requires two. a is the source index of the least element in the
output, and b, c, and d are the indices of the 2nd, 3rd, and 4th
elements in the output. For example,
{1,2,4,8,16,32,64,128}.SelectFromPairGrouped(2,3,5,7,{9,25,49,81,121,169,225,289})
returns {4,8,25,81,64,128,169,289}.
If the selectors are not constant this will translate to a function
call.
Asm: VSHUFPS, CPU Feature: AVX SetHi returns x with its upper half set to y.
Asm: VINSERTI128, CPU Feature: AVX2 SetLo returns x with its lower half set to y.
Asm: VINSERTI128, CPU Feature: AVX2 ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLD, CPU Feature: AVX2 ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDD, CPU Feature: AVX512VBMI2 ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are filled with the sign bit.
Asm: VPSRAD, CPU Feature: AVX2 ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDD, CPU Feature: AVX512VBMI2 ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVD, CPU Feature: AVX2 ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVD, CPU Feature: AVX512VBMI2 ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are filled with the sign bit.
Asm: VPSRAVD, CPU Feature: AVX2 ShiftRightConcat shifts each element of x to the right by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVD, CPU Feature: AVX512VBMI2 Store stores an Int32x8 to an array StoreMasked stores an Int32x8 to an array,
at those elements enabled by mask
Asm: VMASKMOVD, CPU Feature: AVX2 StoreSlice stores x into a slice of at least 8 int32s StoreSlicePart stores the 8 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 8 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x Sub subtracts corresponding elements of two vectors.
Asm: VPSUBD, CPU Feature: AVX2 SubPairs horizontally subtracts adjacent pairs of elements.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].
Asm: VPHSUBD, CPU Feature: AVX2 ToMask converts from Int32x8 to Mask32x8, mask element is set to true when the corresponding vector element is non-zero. TruncateToInt16 converts element values to int16.
Conversion is done with truncation on the vector elements.
Asm: VPMOVDW, CPU Feature: AVX512 TruncateToInt8 converts element values to int8.
Conversion is done with truncation on the vector elements.
Results are packed to low elements in the returned vector; its upper elements are zero-cleared.
Asm: VPMOVDB, CPU Feature: AVX512 Xor performs a bitwise XOR operation between two vectors.
Asm: VPXOR, CPU Feature: AVX2
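The ShiftLeftConcat/ShiftRightConcat operations above are "funnel" shifts. As a plain-Go sketch of the per-lane semantics described in the docs (the helper name is illustrative, not part of the package):

```go
package main

import "fmt"

// shiftLeftConcat32 models one 32-bit lane of ShiftLeftConcat:
// shift x left by s, filling the emptied low bits with the high bits of z.
func shiftLeftConcat32(x, z, s uint32) uint32 {
	s &= 31 // only the lower 5 bits of the shift count are used
	if s == 0 {
		return x
	}
	return (x << s) | (z >> (32 - s))
}

func main() {
	fmt.Printf("%#x\n", shiftLeftConcat32(0x1, 0xF0000000, 4)) // prints 0x1f
}
```

ShiftRightConcat is the mirror image: x is shifted right and the low bits of z fill the emptied upper bits.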
Int32x8 : expvar.Var
Int32x8 : fmt.Stringer
func BroadcastInt32x8(x int32) Int32x8
func LoadInt32x8(y *[8]int32) Int32x8
func LoadInt32x8Slice(s []int32) Int32x8
func LoadInt32x8SlicePart(s []int32) Int32x8
func LoadMaskedInt32x8(y *[8]int32, mask Mask32x8) Int32x8
func Float32x8.AsInt32x8() (to Int32x8)
func Float32x8.ConvertToInt32() Int32x8
func Float64x4.AsInt32x8() (to Int32x8)
func Float64x8.ConvertToInt32() Int32x8
func Int16x16.AsInt32x8() (to Int32x8)
func Int16x16.DotProductPairs(y Int16x16) Int32x8
func Int16x8.ExtendToInt32() Int32x8
func Int32x16.GetHi() Int32x8
func Int32x16.GetLo() Int32x8
func Int32x4.Broadcast256() Int32x8
func Int32x8.Abs() Int32x8
func Int32x8.Add(y Int32x8) Int32x8
func Int32x8.AddPairs(y Int32x8) Int32x8
func Int32x8.And(y Int32x8) Int32x8
func Int32x8.AndNot(y Int32x8) Int32x8
func Int32x8.Compress(mask Mask32x8) Int32x8
func Int32x8.ConcatPermute(y Int32x8, indices Uint32x8) Int32x8
func Int32x8.CopySign(y Int32x8) Int32x8
func Int32x8.Expand(mask Mask32x8) Int32x8
func Int32x8.InterleaveHiGrouped(y Int32x8) Int32x8
func Int32x8.InterleaveLoGrouped(y Int32x8) Int32x8
func Int32x8.LeadingZeros() Int32x8
func Int32x8.Masked(mask Mask32x8) Int32x8
func Int32x8.Max(y Int32x8) Int32x8
func Int32x8.Merge(y Int32x8, mask Mask32x8) Int32x8
func Int32x8.Min(y Int32x8) Int32x8
func Int32x8.Mul(y Int32x8) Int32x8
func Int32x8.Not() Int32x8
func Int32x8.OnesCount() Int32x8
func Int32x8.Or(y Int32x8) Int32x8
func Int32x8.Permute(indices Uint32x8) Int32x8
func Int32x8.PermuteScalarsGrouped(a, b, c, d uint8) Int32x8
func Int32x8.RotateAllLeft(shift uint8) Int32x8
func Int32x8.RotateAllRight(shift uint8) Int32x8
func Int32x8.RotateLeft(y Int32x8) Int32x8
func Int32x8.RotateRight(y Int32x8) Int32x8
func Int32x8.Select128FromPair(lo, hi uint8, y Int32x8) Int32x8
func Int32x8.SelectFromPairGrouped(a, b, c, d uint8, y Int32x8) Int32x8
func Int32x8.SetHi(y Int32x4) Int32x8
func Int32x8.SetLo(y Int32x4) Int32x8
func Int32x8.ShiftAllLeft(y uint64) Int32x8
func Int32x8.ShiftAllLeftConcat(shift uint8, y Int32x8) Int32x8
func Int32x8.ShiftAllRight(y uint64) Int32x8
func Int32x8.ShiftAllRightConcat(shift uint8, y Int32x8) Int32x8
func Int32x8.ShiftLeft(y Int32x8) Int32x8
func Int32x8.ShiftLeftConcat(y Int32x8, z Int32x8) Int32x8
func Int32x8.ShiftRight(y Int32x8) Int32x8
func Int32x8.ShiftRightConcat(y Int32x8, z Int32x8) Int32x8
func Int32x8.Sub(y Int32x8) Int32x8
func Int32x8.SubPairs(y Int32x8) Int32x8
func Int32x8.Xor(y Int32x8) Int32x8
func Int64x4.AsInt32x8() (to Int32x8)
func Int64x8.SaturateToInt32() Int32x8
func Int64x8.TruncateToInt32() Int32x8
func Int8x16.ExtendLo8ToInt32x8() Int32x8
func Int8x32.AsInt32x8() (to Int32x8)
func Int8x32.DotProductQuadruple(y Uint8x32) Int32x8
func Int8x32.DotProductQuadrupleSaturated(y Uint8x32) Int32x8
func Mask32x8.ToInt32x8() (to Int32x8)
func Uint16x16.AsInt32x8() (to Int32x8)
func Uint32x8.AsInt32x8() (to Int32x8)
func Uint64x4.AsInt32x8() (to Int32x8)
func Uint8x32.AsInt32x8() (to Int32x8)
func Int32x16.SetHi(y Int32x8) Int32x16
func Int32x16.SetLo(y Int32x8) Int32x16
func Int32x8.Add(y Int32x8) Int32x8
func Int32x8.AddPairs(y Int32x8) Int32x8
func Int32x8.And(y Int32x8) Int32x8
func Int32x8.AndNot(y Int32x8) Int32x8
func Int32x8.ConcatPermute(y Int32x8, indices Uint32x8) Int32x8
func Int32x8.CopySign(y Int32x8) Int32x8
func Int32x8.Equal(y Int32x8) Mask32x8
func Int32x8.Greater(y Int32x8) Mask32x8
func Int32x8.GreaterEqual(y Int32x8) Mask32x8
func Int32x8.InterleaveHiGrouped(y Int32x8) Int32x8
func Int32x8.InterleaveLoGrouped(y Int32x8) Int32x8
func Int32x8.Less(y Int32x8) Mask32x8
func Int32x8.LessEqual(y Int32x8) Mask32x8
func Int32x8.Max(y Int32x8) Int32x8
func Int32x8.Merge(y Int32x8, mask Mask32x8) Int32x8
func Int32x8.Min(y Int32x8) Int32x8
func Int32x8.Mul(y Int32x8) Int32x8
func Int32x8.MulEvenWiden(y Int32x8) Int64x4
func Int32x8.NotEqual(y Int32x8) Mask32x8
func Int32x8.Or(y Int32x8) Int32x8
func Int32x8.RotateLeft(y Int32x8) Int32x8
func Int32x8.RotateRight(y Int32x8) Int32x8
func Int32x8.SaturateToInt16Concat(y Int32x8) Int16x16
func Int32x8.Select128FromPair(lo, hi uint8, y Int32x8) Int32x8
func Int32x8.SelectFromPairGrouped(a, b, c, d uint8, y Int32x8) Int32x8
func Int32x8.ShiftAllLeftConcat(shift uint8, y Int32x8) Int32x8
func Int32x8.ShiftAllRightConcat(shift uint8, y Int32x8) Int32x8
func Int32x8.ShiftLeft(y Int32x8) Int32x8
func Int32x8.ShiftLeftConcat(y Int32x8, z Int32x8) Int32x8
func Int32x8.ShiftRight(y Int32x8) Int32x8
func Int32x8.ShiftRightConcat(y Int32x8, z Int32x8) Int32x8
func Int32x8.Sub(y Int32x8) Int32x8
func Int32x8.SubPairs(y Int32x8) Int32x8
func Int32x8.Xor(y Int32x8) Int32x8
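ConcatPermute, which appears in the lists above, draws each result element from the concatenation of both inputs. A scalar sketch of that behavior (the helper is illustrative, not package API):

```go
package main

import "fmt"

// concatPermute models ConcatPermute: result[i] = xy[indices[i]], where xy
// is x (lower half) followed by y (upper half), and only the bits needed to
// index xy are taken from each index element.
func concatPermute(x, y, indices []int32) []int32 {
	xy := append(append([]int32{}, x...), y...)
	out := make([]int32, len(x))
	for i, idx := range indices {
		out[i] = xy[int(idx)&(len(xy)-1)] // len(xy) is a power of two
	}
	return out
}

func main() {
	x := []int32{0, 1, 2, 3}
	y := []int32{4, 5, 6, 7}
	fmt.Println(concatPermute(x, y, []int32{7, 0, 8, 3})) // index 8 wraps to 0
}
```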
Int64x2 is a 128-bit SIMD vector of 2 int64 Abs computes the absolute value of each element.
Asm: VPABSQ, CPU Feature: AVX512 Add adds corresponding elements of two vectors.
Asm: VPADDQ, CPU Feature: AVX And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX AndNot performs a bitwise x &^ y.
Asm: VPANDN, CPU Feature: AVX Float32x4 converts from Int64x2 to Float32x4 Float64x2 converts from Int64x2 to Float64x2 Int16x8 converts from Int64x2 to Int16x8 Int32x4 converts from Int64x2 to Int32x4 Int8x16 converts from Int64x2 to Int8x16 Uint16x8 converts from Int64x2 to Uint16x8 Uint32x4 converts from Int64x2 to Uint32x4 Uint64x2 converts from Int64x2 to Uint64x2 Uint8x16 converts from Int64x2 to Uint8x16 Broadcast128 copies element zero of its (128-bit) input to all elements of
the 128-bit output vector.
Asm: VPBROADCASTQ, CPU Feature: AVX2 Broadcast256 copies element zero of its (128-bit) input to all elements of
the 256-bit output vector.
Asm: VPBROADCASTQ, CPU Feature: AVX2 Broadcast512 copies element zero of its (128-bit) input to all elements of
the 512-bit output vector.
Asm: VPBROADCASTQ, CPU Feature: AVX512 Compress performs a compression on vector x using mask by
selecting the elements indicated by mask and packing them to lower-indexed elements.
Asm: VPCOMPRESSQ, CPU Feature: AVX512 ConcatPermute performs a full permutation of vector x, y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the bits needed to index xy are used in indices' elements.
Asm: VPERMI2Q, CPU Feature: AVX512 ConvertToFloat32 converts element values to float32.
Asm: VCVTQQ2PSX, CPU Feature: AVX512 ConvertToFloat64 converts element values to float64.
Asm: VCVTQQ2PD, CPU Feature: AVX512 Equal returns x equals y, elementwise.
Asm: VPCMPEQQ, CPU Feature: AVX Expand performs an expansion on a vector x whose elements are packed to lower parts.
The expansion distributes elements to the positions indicated by mask, from lower mask elements to upper, in order.
Asm: VPEXPANDQ, CPU Feature: AVX512 GetElem retrieves a single constant-indexed element's value.
index results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPEXTRQ, CPU Feature: AVX Greater returns x greater-than y, elementwise.
Asm: VPCMPGTQ, CPU Feature: AVX GreaterEqual returns a mask whose elements indicate whether x >= y
Emulated, CPU Feature: AVX. InterleaveHi interleaves the elements of the high halves of x and y.
Asm: VPUNPCKHQDQ, CPU Feature: AVX InterleaveLo interleaves the elements of the low halves of x and y.
Asm: VPUNPCKLQDQ, CPU Feature: AVX IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x.
x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y
Asm: VPTEST, CPU Feature: AVX LeadingZeros counts the leading zeros of each element in x.
Asm: VPLZCNTQ, CPU Feature: AVX512 Len returns the number of elements in an Int64x2. Less returns a mask whose elements indicate whether x < y.
Emulated, CPU Feature: AVX. LessEqual returns a mask whose elements indicate whether x <= y.
Emulated, CPU Feature: AVX. Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VPMAXSQ, CPU Feature: AVX512 Merge returns x but with elements set to y where mask is false. Min computes the minimum of corresponding elements.
Asm: VPMINSQ, CPU Feature: AVX512 Mul multiplies corresponding elements of two vectors.
Asm: VPMULLQ, CPU Feature: AVX512 Not returns the bitwise complement of x.
Emulated, CPU Feature: AVX. NotEqual returns a mask whose elements indicate whether x != y.
Emulated, CPU Feature: AVX. OnesCount counts the number of set bits in each element.
Asm: VPOPCNTQ, CPU Feature: AVX512VPOPCNTDQ Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPROLQ, CPU Feature: AVX512 RotateAllRight rotates each element to the right by the number of bits specified by the immediate.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPRORQ, CPU Feature: AVX512 RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.
Asm: VPROLVQ, CPU Feature: AVX512 RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.
Asm: VPRORVQ, CPU Feature: AVX512 SaturateToInt16 converts element values to int16.
Conversion is done with saturation on the vector elements.
Asm: VPMOVSQW, CPU Feature: AVX512 SaturateToInt32 converts element values to int32.
Conversion is done with saturation on the vector elements.
Asm: VPMOVSQD, CPU Feature: AVX512 SaturateToInt8 converts element values to int8.
Conversion is done with saturation on the vector elements.
Results are packed to low elements in the returned vector; its upper elements are zero-cleared.
Asm: VPMOVSQB, CPU Feature: AVX512 SaturateToUint8 converts element values to uint8.
Conversion is done with saturation on the vector elements.
Results are packed to low elements in the returned vector; its upper elements are zero-cleared.
Asm: VPMOVSQB, CPU Feature: AVX512 SelectFromPair returns the selection of two elements from the two
vectors x and y, where selector values in the range 0-1 specify
elements from x and values in the range 2-3 specify elements 0-1
of y. When the selectors are constants, the selection can be
implemented in a single instruction.
If the selectors are not constant this will translate to a function
call.
Asm: VSHUFPD, CPU Feature: AVX SetElem sets a single constant-indexed element's value.
index results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPINSRQ, CPU Feature: AVX ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLQ, CPU Feature: AVX ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDQ, CPU Feature: AVX512VBMI2 ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are filled with the sign bit.
Asm: VPSRAQ, CPU Feature: AVX512 ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDQ, CPU Feature: AVX512VBMI2 ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVQ, CPU Feature: AVX2 ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVQ, CPU Feature: AVX512VBMI2 ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are filled with the sign bit.
Asm: VPSRAVQ, CPU Feature: AVX512 ShiftRightConcat shifts each element of x to the right by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVQ, CPU Feature: AVX512VBMI2 Store stores an Int64x2 to an array. StoreMasked stores an Int64x2 to an array,
at those elements enabled by mask.
Asm: VMASKMOVQ, CPU Feature: AVX2 StoreSlice stores x into a slice of at least 2 int64s StoreSlicePart stores the 2 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 2 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x Sub subtracts corresponding elements of two vectors.
Asm: VPSUBQ, CPU Feature: AVX ToMask converts from Int64x2 to Mask64x2; a mask element is set to true when the corresponding vector element is non-zero. TruncateToInt16 converts element values to int16.
Conversion is done with truncation on the vector elements.
Asm: VPMOVQW, CPU Feature: AVX512 TruncateToInt32 converts element values to int32.
Conversion is done with truncation on the vector elements.
Asm: VPMOVQD, CPU Feature: AVX512 TruncateToInt8 converts element values to int8.
Conversion is done with truncation on the vector elements.
Results are packed to low elements in the returned vector; its upper elements are zero-cleared.
Asm: VPMOVQB, CPU Feature: AVX512 Xor performs a bitwise XOR operation between two vectors.
Asm: VPXOR, CPU Feature: AVX
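Compress, described above, packs the mask-selected elements to the low end of the result. A plain-Go model of one vector's worth of lanes (the mask is shown as []bool for illustration, and unselected result elements are assumed zeroed, as with zeroing masking):

```go
package main

import "fmt"

// compress models Compress: elements of x where mask is true are packed
// to the lowest indices of the result; remaining elements are zero.
func compress(x []int64, mask []bool) []int64 {
	out := make([]int64, len(x))
	j := 0
	for i, m := range mask {
		if m {
			out[j] = x[i]
			j++
		}
	}
	return out
}

func main() {
	fmt.Println(compress([]int64{10, 20, 30, 40}, []bool{true, false, true, false})) // [10 30 0 0]
}
```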
Int64x2 : expvar.Var
Int64x2 : fmt.Stringer
func BroadcastInt64x2(x int64) Int64x2
func LoadInt64x2(y *[2]int64) Int64x2
func LoadInt64x2Slice(s []int64) Int64x2
func LoadInt64x2SlicePart(s []int64) Int64x2
func LoadMaskedInt64x2(y *[2]int64, mask Mask64x2) Int64x2
func Float32x4.AsInt64x2() (to Int64x2)
func Float64x2.AsInt64x2() (to Int64x2)
func Float64x2.ConvertToInt64() Int64x2
func Int16x8.AsInt64x2() (to Int64x2)
func Int16x8.ExtendLo2ToInt64x2() Int64x2
func Int32x4.AsInt64x2() (to Int64x2)
func Int32x4.ExtendLo2ToInt64x2() Int64x2
func Int32x4.MulEvenWiden(y Int32x4) Int64x2
func Int64x2.Abs() Int64x2
func Int64x2.Add(y Int64x2) Int64x2
func Int64x2.And(y Int64x2) Int64x2
func Int64x2.AndNot(y Int64x2) Int64x2
func Int64x2.Broadcast128() Int64x2
func Int64x2.Compress(mask Mask64x2) Int64x2
func Int64x2.ConcatPermute(y Int64x2, indices Uint64x2) Int64x2
func Int64x2.Expand(mask Mask64x2) Int64x2
func Int64x2.InterleaveHi(y Int64x2) Int64x2
func Int64x2.InterleaveLo(y Int64x2) Int64x2
func Int64x2.LeadingZeros() Int64x2
func Int64x2.Masked(mask Mask64x2) Int64x2
func Int64x2.Max(y Int64x2) Int64x2
func Int64x2.Merge(y Int64x2, mask Mask64x2) Int64x2
func Int64x2.Min(y Int64x2) Int64x2
func Int64x2.Mul(y Int64x2) Int64x2
func Int64x2.Not() Int64x2
func Int64x2.OnesCount() Int64x2
func Int64x2.Or(y Int64x2) Int64x2
func Int64x2.RotateAllLeft(shift uint8) Int64x2
func Int64x2.RotateAllRight(shift uint8) Int64x2
func Int64x2.RotateLeft(y Int64x2) Int64x2
func Int64x2.RotateRight(y Int64x2) Int64x2
func Int64x2.SelectFromPair(a, b uint8, y Int64x2) Int64x2
func Int64x2.SetElem(index uint8, y int64) Int64x2
func Int64x2.ShiftAllLeft(y uint64) Int64x2
func Int64x2.ShiftAllLeftConcat(shift uint8, y Int64x2) Int64x2
func Int64x2.ShiftAllRight(y uint64) Int64x2
func Int64x2.ShiftAllRightConcat(shift uint8, y Int64x2) Int64x2
func Int64x2.ShiftLeft(y Int64x2) Int64x2
func Int64x2.ShiftLeftConcat(y Int64x2, z Int64x2) Int64x2
func Int64x2.ShiftRight(y Int64x2) Int64x2
func Int64x2.ShiftRightConcat(y Int64x2, z Int64x2) Int64x2
func Int64x2.Sub(y Int64x2) Int64x2
func Int64x2.Xor(y Int64x2) Int64x2
func Int64x4.GetHi() Int64x2
func Int64x4.GetLo() Int64x2
func Int8x16.AsInt64x2() (to Int64x2)
func Int8x16.ExtendLo2ToInt64x2() Int64x2
func Mask64x2.ToInt64x2() (to Int64x2)
func Uint16x8.AsInt64x2() (to Int64x2)
func Uint32x4.AsInt64x2() (to Int64x2)
func Uint64x2.AsInt64x2() (to Int64x2)
func Uint8x16.AsInt64x2() (to Int64x2)
func Int64x2.Add(y Int64x2) Int64x2
func Int64x2.And(y Int64x2) Int64x2
func Int64x2.AndNot(y Int64x2) Int64x2
func Int64x2.ConcatPermute(y Int64x2, indices Uint64x2) Int64x2
func Int64x2.Equal(y Int64x2) Mask64x2
func Int64x2.Greater(y Int64x2) Mask64x2
func Int64x2.GreaterEqual(y Int64x2) Mask64x2
func Int64x2.InterleaveHi(y Int64x2) Int64x2
func Int64x2.InterleaveLo(y Int64x2) Int64x2
func Int64x2.Less(y Int64x2) Mask64x2
func Int64x2.LessEqual(y Int64x2) Mask64x2
func Int64x2.Max(y Int64x2) Int64x2
func Int64x2.Merge(y Int64x2, mask Mask64x2) Int64x2
func Int64x2.Min(y Int64x2) Int64x2
func Int64x2.Mul(y Int64x2) Int64x2
func Int64x2.NotEqual(y Int64x2) Mask64x2
func Int64x2.Or(y Int64x2) Int64x2
func Int64x2.RotateLeft(y Int64x2) Int64x2
func Int64x2.RotateRight(y Int64x2) Int64x2
func Int64x2.SelectFromPair(a, b uint8, y Int64x2) Int64x2
func Int64x2.ShiftAllLeftConcat(shift uint8, y Int64x2) Int64x2
func Int64x2.ShiftAllRightConcat(shift uint8, y Int64x2) Int64x2
func Int64x2.ShiftLeft(y Int64x2) Int64x2
func Int64x2.ShiftLeftConcat(y Int64x2, z Int64x2) Int64x2
func Int64x2.ShiftRight(y Int64x2) Int64x2
func Int64x2.ShiftRightConcat(y Int64x2, z Int64x2) Int64x2
func Int64x2.Sub(y Int64x2) Int64x2
func Int64x2.Xor(y Int64x2) Int64x2
func Int64x4.SetHi(y Int64x2) Int64x4
func Int64x4.SetLo(y Int64x2) Int64x4
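The RotateLeft/RotateRight methods listed above rotate each lane independently; per lane this is ordinary bit rotation, as in math/bits. A sketch over uint64 lanes (helper name illustrative):

```go
package main

import (
	"fmt"
	"math/bits"
)

// rotateLeft models RotateLeft lane by lane: each element of x is rotated
// left by the count in the corresponding element of y (only the low 6 bits
// of the count matter for 64-bit lanes).
func rotateLeft(x, y []uint64) []uint64 {
	out := make([]uint64, len(x))
	for i := range x {
		out[i] = bits.RotateLeft64(x[i], int(y[i]&63))
	}
	return out
}

func main() {
	fmt.Println(rotateLeft([]uint64{0x8000000000000000, 1}, []uint64{1, 4})) // [1 16]
}
```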
Int64x4 is a 256-bit SIMD vector of 4 int64 Abs computes the absolute value of each element.
Asm: VPABSQ, CPU Feature: AVX512 Add adds corresponding elements of two vectors.
Asm: VPADDQ, CPU Feature: AVX2 And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX2 AndNot performs a bitwise x &^ y.
Asm: VPANDN, CPU Feature: AVX2 Float32x8 converts from Int64x4 to Float32x8 Float64x4 converts from Int64x4 to Float64x4 Int16x16 converts from Int64x4 to Int16x16 Int32x8 converts from Int64x4 to Int32x8 Int8x32 converts from Int64x4 to Int8x32 Uint16x16 converts from Int64x4 to Uint16x16 Uint32x8 converts from Int64x4 to Uint32x8 Uint64x4 converts from Int64x4 to Uint64x4 Uint8x32 converts from Int64x4 to Uint8x32 Compress performs a compression on vector x using mask by
selecting the elements indicated by mask and packing them to lower-indexed elements.
Asm: VPCOMPRESSQ, CPU Feature: AVX512 ConcatPermute performs a full permutation of vector x, y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the bits needed to index xy are used in indices' elements.
Asm: VPERMI2Q, CPU Feature: AVX512 ConvertToFloat32 converts element values to float32.
Asm: VCVTQQ2PSY, CPU Feature: AVX512 ConvertToFloat64 converts element values to float64.
Asm: VCVTQQ2PD, CPU Feature: AVX512 Equal returns x equals y, elementwise.
Asm: VPCMPEQQ, CPU Feature: AVX2 Expand performs an expansion on a vector x whose elements are packed to lower parts.
The expansion distributes elements to the positions indicated by mask, from lower mask elements to upper, in order.
Asm: VPEXPANDQ, CPU Feature: AVX512 GetHi returns the upper half of x.
Asm: VEXTRACTI128, CPU Feature: AVX2 GetLo returns the lower half of x.
Asm: VEXTRACTI128, CPU Feature: AVX2 Greater returns x greater-than y, elementwise.
Asm: VPCMPGTQ, CPU Feature: AVX2 GreaterEqual returns a mask whose elements indicate whether x >= y
Emulated, CPU Feature: AVX2. InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.
Asm: VPUNPCKHQDQ, CPU Feature: AVX2 InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.
Asm: VPUNPCKLQDQ, CPU Feature: AVX2 IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x.
x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y
Asm: VPTEST, CPU Feature: AVX LeadingZeros counts the leading zeros of each element in x.
Asm: VPLZCNTQ, CPU Feature: AVX512 Len returns the number of elements in an Int64x4. Less returns a mask whose elements indicate whether x < y.
Emulated, CPU Feature: AVX2. LessEqual returns a mask whose elements indicate whether x <= y.
Emulated, CPU Feature: AVX2. Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VPMAXSQ, CPU Feature: AVX512 Merge returns x but with elements set to y where mask is false. Min computes the minimum of corresponding elements.
Asm: VPMINSQ, CPU Feature: AVX512 Mul multiplies corresponding elements of two vectors.
Asm: VPMULLQ, CPU Feature: AVX512 Not returns the bitwise complement of x.
Emulated, CPU Feature: AVX2. NotEqual returns a mask whose elements indicate whether x != y.
Emulated, CPU Feature: AVX2. OnesCount counts the number of set bits in each element.
Asm: VPOPCNTQ, CPU Feature: AVX512VPOPCNTDQ Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX2 Permute performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 2 bits (values 0-3) of each element of indices are used.
Asm: VPERMQ, CPU Feature: AVX512 RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPROLQ, CPU Feature: AVX512 RotateAllRight rotates each element to the right by the number of bits specified by the immediate.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPRORQ, CPU Feature: AVX512 RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.
Asm: VPROLVQ, CPU Feature: AVX512 RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.
Asm: VPRORVQ, CPU Feature: AVX512 SaturateToInt16 converts element values to int16.
Conversion is done with saturation on the vector elements.
Asm: VPMOVSQW, CPU Feature: AVX512 SaturateToInt32 converts element values to int32.
Conversion is done with saturation on the vector elements.
Asm: VPMOVSQD, CPU Feature: AVX512 SaturateToInt8 converts element values to int8.
Conversion is done with saturation on the vector elements.
Results are packed to low elements in the returned vector; its upper elements are zero-cleared.
Asm: VPMOVSQB, CPU Feature: AVX512 SaturateToUint8 converts element values to uint8.
Conversion is done with saturation on the vector elements.
Results are packed to low elements in the returned vector; its upper elements are zero-cleared.
Asm: VPMOVSQB, CPU Feature: AVX512 Select128FromPair treats the 256-bit vectors x and y as a single vector of four
128-bit elements, and returns a 256-bit result formed by
concatenating the two elements specified by lo and hi.
For example,
{40, 41, 50, 51}.Select128FromPair(3, 0, {60, 61, 70, 71})
returns {70, 71, 40, 41}.
lo, hi result in better performance when they are constants; non-constant values will be translated into a jump table.
lo, hi should be between 0 and 3, inclusive; other values may result in a runtime panic.
Asm: VPERM2I128, CPU Feature: AVX2 SelectFromPairGrouped returns, for each of the two 128-bit halves of
the vectors x and y, the selection of two elements from the two
vectors x and y, where selector values in the range 0-1 specify
elements from x and values in the range 2-3 specify elements 0-1
of y. When the selectors are constants, the selection can be
implemented in a single instruction.
If the selectors are not constant this will translate to a function
call.
Asm: VSHUFPD, CPU Feature: AVX SetHi returns x with its upper half set to y.
Asm: VINSERTI128, CPU Feature: AVX2 SetLo returns x with its lower half set to y.
Asm: VINSERTI128, CPU Feature: AVX2 ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLQ, CPU Feature: AVX2 ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDQ, CPU Feature: AVX512VBMI2 ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are filled with the sign bit.
Asm: VPSRAQ, CPU Feature: AVX512 ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDQ, CPU Feature: AVX512VBMI2 ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVQ, CPU Feature: AVX2 ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVQ, CPU Feature: AVX512VBMI2 ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are filled with the sign bit.
Asm: VPSRAVQ, CPU Feature: AVX512 ShiftRightConcat shifts each element of x to the right by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVQ, CPU Feature: AVX512VBMI2 Store stores an Int64x4 to an array. StoreMasked stores an Int64x4 to an array,
at those elements enabled by mask.
Asm: VMASKMOVQ, CPU Feature: AVX2 StoreSlice stores x into a slice of at least 4 int64s StoreSlicePart stores the 4 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 4 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x Sub subtracts corresponding elements of two vectors.
Asm: VPSUBQ, CPU Feature: AVX2 ToMask converts from Int64x4 to Mask64x4; a mask element is set to true when the corresponding vector element is non-zero. TruncateToInt16 converts element values to int16.
Conversion is done with truncation on the vector elements.
Asm: VPMOVQW, CPU Feature: AVX512 TruncateToInt32 converts element values to int32.
Conversion is done with truncation on the vector elements.
Asm: VPMOVQD, CPU Feature: AVX512 TruncateToInt8 converts element values to int8.
Conversion is done with truncation on the vector elements.
Results are packed to low elements in the returned vector; its upper elements are zero-cleared.
Asm: VPMOVQB, CPU Feature: AVX512 Xor performs a bitwise XOR operation between two vectors.
Asm: VPXOR, CPU Feature: AVX2
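Select128FromPair, described above, picks two 128-bit halves out of the four available across x and y. A scalar model that reproduces the docs' own worked example (helper name illustrative):

```go
package main

import "fmt"

// select128FromPair models Int64x4.Select128FromPair: x and y are viewed as
// four 128-bit halves (0: x lo, 1: x hi, 2: y lo, 3: y hi); the result is
// the half selected by lo followed by the half selected by hi.
func select128FromPair(x, y [4]int64, lo, hi int) [4]int64 {
	halves := [4][2]int64{
		{x[0], x[1]}, {x[2], x[3]},
		{y[0], y[1]}, {y[2], y[3]},
	}
	return [4]int64{halves[lo][0], halves[lo][1], halves[hi][0], halves[hi][1]}
}

func main() {
	// The example from the docs: {40,41,50,51}.Select128FromPair(3, 0, {60,61,70,71})
	fmt.Println(select128FromPair([4]int64{40, 41, 50, 51}, [4]int64{60, 61, 70, 71}, 3, 0)) // [70 71 40 41]
}
```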
Int64x4 : expvar.Var
Int64x4 : fmt.Stringer
func BroadcastInt64x4(x int64) Int64x4
func LoadInt64x4(y *[4]int64) Int64x4
func LoadInt64x4Slice(s []int64) Int64x4
func LoadInt64x4SlicePart(s []int64) Int64x4
func LoadMaskedInt64x4(y *[4]int64, mask Mask64x4) Int64x4
func Float32x4.ConvertToInt64() Int64x4
func Float32x8.AsInt64x4() (to Int64x4)
func Float64x4.AsInt64x4() (to Int64x4)
func Float64x4.ConvertToInt64() Int64x4
func Int16x16.AsInt64x4() (to Int64x4)
func Int16x8.ExtendLo4ToInt64x4() Int64x4
func Int32x4.ExtendToInt64() Int64x4
func Int32x8.AsInt64x4() (to Int64x4)
func Int32x8.MulEvenWiden(y Int32x8) Int64x4
func Int64x2.Broadcast256() Int64x4
func Int64x4.Abs() Int64x4
func Int64x4.Add(y Int64x4) Int64x4
func Int64x4.And(y Int64x4) Int64x4
func Int64x4.AndNot(y Int64x4) Int64x4
func Int64x4.Compress(mask Mask64x4) Int64x4
func Int64x4.ConcatPermute(y Int64x4, indices Uint64x4) Int64x4
func Int64x4.Expand(mask Mask64x4) Int64x4
func Int64x4.InterleaveHiGrouped(y Int64x4) Int64x4
func Int64x4.InterleaveLoGrouped(y Int64x4) Int64x4
func Int64x4.LeadingZeros() Int64x4
func Int64x4.Masked(mask Mask64x4) Int64x4
func Int64x4.Max(y Int64x4) Int64x4
func Int64x4.Merge(y Int64x4, mask Mask64x4) Int64x4
func Int64x4.Min(y Int64x4) Int64x4
func Int64x4.Mul(y Int64x4) Int64x4
func Int64x4.Not() Int64x4
func Int64x4.OnesCount() Int64x4
func Int64x4.Or(y Int64x4) Int64x4
func Int64x4.Permute(indices Uint64x4) Int64x4
func Int64x4.RotateAllLeft(shift uint8) Int64x4
func Int64x4.RotateAllRight(shift uint8) Int64x4
func Int64x4.RotateLeft(y Int64x4) Int64x4
func Int64x4.RotateRight(y Int64x4) Int64x4
func Int64x4.Select128FromPair(lo, hi uint8, y Int64x4) Int64x4
func Int64x4.SelectFromPairGrouped(a, b uint8, y Int64x4) Int64x4
func Int64x4.SetHi(y Int64x2) Int64x4
func Int64x4.SetLo(y Int64x2) Int64x4
func Int64x4.ShiftAllLeft(y uint64) Int64x4
func Int64x4.ShiftAllLeftConcat(shift uint8, y Int64x4) Int64x4
func Int64x4.ShiftAllRight(y uint64) Int64x4
func Int64x4.ShiftAllRightConcat(shift uint8, y Int64x4) Int64x4
func Int64x4.ShiftLeft(y Int64x4) Int64x4
func Int64x4.ShiftLeftConcat(y Int64x4, z Int64x4) Int64x4
func Int64x4.ShiftRight(y Int64x4) Int64x4
func Int64x4.ShiftRightConcat(y Int64x4, z Int64x4) Int64x4
func Int64x4.Sub(y Int64x4) Int64x4
func Int64x4.Xor(y Int64x4) Int64x4
func Int64x8.GetHi() Int64x4
func Int64x8.GetLo() Int64x4
func Int8x16.ExtendLo4ToInt64x4() Int64x4
func Int8x32.AsInt64x4() (to Int64x4)
func Mask64x4.ToInt64x4() (to Int64x4)
func Uint16x16.AsInt64x4() (to Int64x4)
func Uint32x8.AsInt64x4() (to Int64x4)
func Uint64x4.AsInt64x4() (to Int64x4)
func Uint8x32.AsInt64x4() (to Int64x4)
func Int64x4.Add(y Int64x4) Int64x4
func Int64x4.And(y Int64x4) Int64x4
func Int64x4.AndNot(y Int64x4) Int64x4
func Int64x4.ConcatPermute(y Int64x4, indices Uint64x4) Int64x4
func Int64x4.Equal(y Int64x4) Mask64x4
func Int64x4.Greater(y Int64x4) Mask64x4
func Int64x4.GreaterEqual(y Int64x4) Mask64x4
func Int64x4.InterleaveHiGrouped(y Int64x4) Int64x4
func Int64x4.InterleaveLoGrouped(y Int64x4) Int64x4
func Int64x4.Less(y Int64x4) Mask64x4
func Int64x4.LessEqual(y Int64x4) Mask64x4
func Int64x4.Max(y Int64x4) Int64x4
func Int64x4.Merge(y Int64x4, mask Mask64x4) Int64x4
func Int64x4.Min(y Int64x4) Int64x4
func Int64x4.Mul(y Int64x4) Int64x4
func Int64x4.NotEqual(y Int64x4) Mask64x4
func Int64x4.Or(y Int64x4) Int64x4
func Int64x4.RotateLeft(y Int64x4) Int64x4
func Int64x4.RotateRight(y Int64x4) Int64x4
func Int64x4.Select128FromPair(lo, hi uint8, y Int64x4) Int64x4
func Int64x4.SelectFromPairGrouped(a, b uint8, y Int64x4) Int64x4
func Int64x4.ShiftAllLeftConcat(shift uint8, y Int64x4) Int64x4
func Int64x4.ShiftAllRightConcat(shift uint8, y Int64x4) Int64x4
func Int64x4.ShiftLeft(y Int64x4) Int64x4
func Int64x4.ShiftLeftConcat(y Int64x4, z Int64x4) Int64x4
func Int64x4.ShiftRight(y Int64x4) Int64x4
func Int64x4.ShiftRightConcat(y Int64x4, z Int64x4) Int64x4
func Int64x4.Sub(y Int64x4) Int64x4
func Int64x4.Xor(y Int64x4) Int64x4
func Int64x8.SetHi(y Int64x4) Int64x8
func Int64x8.SetLo(y Int64x4) Int64x8
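Masked and Merge, which appear throughout the method lists above, are the two basic mask-selection operations. As a scalar sketch of their documented semantics in plain Go (not the intrinsics, which require GOEXPERIMENT=simd):

```go
package main

import "fmt"

// masked returns x but with elements zeroed where mask is false
// (the documented behavior of Int64x4.Masked).
func masked(x []int64, mask []bool) []int64 {
	out := make([]int64, len(x))
	for i := range x {
		if mask[i] {
			out[i] = x[i]
		}
	}
	return out
}

// merge returns x but with elements set to y where mask is false
// (the documented behavior of Int64x4.Merge).
func merge(x, y []int64, mask []bool) []int64 {
	out := make([]int64, len(x))
	for i := range x {
		if mask[i] {
			out[i] = x[i]
		} else {
			out[i] = y[i]
		}
	}
	return out
}

func main() {
	m := []bool{true, false, true, false}
	fmt.Println(masked([]int64{1, 2, 3, 4}, m))                     // [1 0 3 0]
	fmt.Println(merge([]int64{1, 2, 3, 4}, []int64{9, 9, 9, 9}, m)) // [1 9 3 9]
}
```

As the package documentation notes, on AVX512 the compiler may fuse an operation followed by Masked or Merge into a single masked instruction.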
Int64x8 is a 512-bit SIMD vector of 8 int64 Abs computes the absolute value of each element.
Asm: VPABSQ, CPU Feature: AVX512 Add adds corresponding elements of two vectors.
Asm: VPADDQ, CPU Feature: AVX512 And performs a bitwise AND operation between two vectors.
Asm: VPANDQ, CPU Feature: AVX512 AndNot performs a bitwise x &^ y.
Asm: VPANDNQ, CPU Feature: AVX512 Float32x16 converts from Int64x8 to Float32x16 Float64x8 converts from Int64x8 to Float64x8 Int16x32 converts from Int64x8 to Int16x32 Int32x16 converts from Int64x8 to Int32x16 Int8x64 converts from Int64x8 to Int8x64 Uint16x32 converts from Int64x8 to Uint16x32 Uint32x16 converts from Int64x8 to Uint32x16 Uint64x8 converts from Int64x8 to Uint64x8 Uint8x64 converts from Int64x8 to Uint8x64 Compress performs a compression on vector x using mask by
selecting the elements indicated by mask and packing them into the lower-indexed elements.
Asm: VPCOMPRESSQ, CPU Feature: AVX512 ConcatPermute performs a full permutation of vector x, y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the low-order bits of each element of indices needed to index xy are used.
Asm: VPERMI2Q, CPU Feature: AVX512 ConvertToFloat32 converts element values to float32.
Asm: VCVTQQ2PS, CPU Feature: AVX512 ConvertToFloat64 converts element values to float64.
Asm: VCVTQQ2PD, CPU Feature: AVX512 Equal returns x equals y, elementwise.
Asm: VPCMPEQQ, CPU Feature: AVX512 Expand performs an expansion on a vector x whose elements are packed to lower parts.
The expansion distributes the packed elements into the positions indicated by mask, from lower mask elements to upper, in order.
Asm: VPEXPANDQ, CPU Feature: AVX512 GetHi returns the upper half of x.
Asm: VEXTRACTI64X4, CPU Feature: AVX512 GetLo returns the lower half of x.
Asm: VEXTRACTI64X4, CPU Feature: AVX512 Greater returns x greater-than y, elementwise.
Asm: VPCMPGTQ, CPU Feature: AVX512 GreaterEqual returns x greater-than-or-equals y, elementwise.
Asm: VPCMPQ, CPU Feature: AVX512 InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.
Asm: VPUNPCKHQDQ, CPU Feature: AVX512 InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.
Asm: VPUNPCKLQDQ, CPU Feature: AVX512 LeadingZeros counts the leading zeros of each element in x.
Asm: VPLZCNTQ, CPU Feature: AVX512 Len returns the number of elements in an Int64x8 Less returns x less-than y, elementwise.
Asm: VPCMPQ, CPU Feature: AVX512 LessEqual returns x less-than-or-equals y, elementwise.
Asm: VPCMPQ, CPU Feature: AVX512 Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VPMAXSQ, CPU Feature: AVX512 Merge returns x but with elements set to y where m is false. Min computes the minimum of corresponding elements.
Asm: VPMINSQ, CPU Feature: AVX512 Mul multiplies corresponding elements of two vectors.
Asm: VPMULLQ, CPU Feature: AVX512 Not returns the bitwise complement of x
Emulated, CPU Feature AVX512 NotEqual returns x not-equals y, elementwise.
Asm: VPCMPQ, CPU Feature: AVX512 OnesCount counts the number of set bits in each element.
Asm: VPOPCNTQ, CPU Feature: AVX512VPOPCNTDQ Or performs a bitwise OR operation between two vectors.
Asm: VPORQ, CPU Feature: AVX512 Permute performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 3 bits (values 0-7) of each element of indices is used
Asm: VPERMQ, CPU Feature: AVX512 RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPROLQ, CPU Feature: AVX512 RotateAllRight rotates each element to the right by the number of bits specified by the immediate.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPRORQ, CPU Feature: AVX512 RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.
Asm: VPROLVQ, CPU Feature: AVX512 RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.
Asm: VPRORVQ, CPU Feature: AVX512 SaturateToInt16 converts element values to int16.
Conversion is done with saturation on the vector elements.
Asm: VPMOVSQW, CPU Feature: AVX512 SaturateToInt32 converts element values to int32.
Conversion is done with saturation on the vector elements.
Asm: VPMOVSQD, CPU Feature: AVX512 SaturateToInt8 converts element values to int8.
Conversion is done with saturation on the vector elements.
Results are packed to the low elements of the returned vector; its upper elements are zero-cleared.
Asm: VPMOVSQB, CPU Feature: AVX512 SaturateToUint8 converts element values to uint8.
Conversion is done with saturation on the vector elements.
Results are packed to the low elements of the returned vector; its upper elements are zero-cleared.
Asm: VPMOVSQB, CPU Feature: AVX512 SelectFromPairGrouped returns, for each of the four 128-bit subvectors
of the vectors x and y, the selection of two elements from the two
vectors x and y, where selector values in the range 0-1 specify
elements from x and values in the range 2-3 specify elements 0-1
of y. When the selectors are constants, the selection can be
implemented in a single instruction.
If the selectors are not constant this will translate to a function
call.
Asm: VSHUFPD, CPU Feature: AVX512 SetHi returns x with its upper half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512 SetLo returns x with its lower half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512 ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLQ, CPU Feature: AVX512 ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDQ, CPU Feature: AVX512VBMI2 ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are filled with the sign bit.
Asm: VPSRAQ, CPU Feature: AVX512 ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDQ, CPU Feature: AVX512VBMI2 ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVQ, CPU Feature: AVX512 ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVQ, CPU Feature: AVX512VBMI2 ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are filled with the sign bit.
Asm: VPSRAVQ, CPU Feature: AVX512 ShiftRightConcat shifts each element of x to the right by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVQ, CPU Feature: AVX512VBMI2 Store stores an Int64x8 to an array StoreMasked stores an Int64x8 to an array,
at those elements enabled by mask
Asm: VMOVDQU64, CPU Feature: AVX512 StoreSlice stores x into a slice of at least 8 int64s StoreSlicePart stores the 8 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 8 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x Sub subtracts corresponding elements of two vectors.
Asm: VPSUBQ, CPU Feature: AVX512 ToMask converts from Int64x8 to Mask64x8; a mask element is set to true when the corresponding vector element is non-zero. TruncateToInt16 converts element values to int16.
Conversion is done with truncation on the vector elements.
Asm: VPMOVQW, CPU Feature: AVX512 TruncateToInt32 converts element values to int32.
Conversion is done with truncation on the vector elements.
Asm: VPMOVQD, CPU Feature: AVX512 TruncateToInt8 converts element values to int8.
Conversion is done with truncation on the vector elements.
Results are packed to the low elements of the returned vector; its upper elements are zero-cleared.
Asm: VPMOVQB, CPU Feature: AVX512 Xor performs a bitwise XOR operation between two vectors.
Asm: VPXORQ, CPU Feature: AVX512
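ConcatPermute's documented rule is to index into xy, the concatenation of x (lower half) and y (upper half), keeping only the bits of each index needed to address xy. A plain-Go scalar model of Int64x8.ConcatPermute (a sketch of the semantics, not the intrinsic):

```go
package main

import "fmt"

// concatPermute models Int64x8.ConcatPermute: xy is the concatenation of
// x (lower half) and y (upper half); each result element is xy[indices[i]],
// using only the low 4 bits of each index since xy has 16 elements.
func concatPermute(x, y [8]int64, indices [8]uint64) [8]int64 {
	var xy [16]int64
	copy(xy[:8], x[:])
	copy(xy[8:], y[:])
	var out [8]int64
	for i, idx := range indices {
		out[i] = xy[idx&15]
	}
	return out
}

func main() {
	x := [8]int64{0, 1, 2, 3, 4, 5, 6, 7}
	y := [8]int64{10, 11, 12, 13, 14, 15, 16, 17}
	// Indices 0-7 select from x, 8-15 from y; 16 and 31 wrap to 0 and 15.
	fmt.Println(concatPermute(x, y, [8]uint64{0, 8, 1, 9, 15, 7, 16, 31})) // [0 10 1 11 17 7 0 17]
}
```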
Int64x8 : expvar.Var
Int64x8 : fmt.Stringer
func BroadcastInt64x8(x int64) Int64x8
func LoadInt64x8(y *[8]int64) Int64x8
func LoadInt64x8Slice(s []int64) Int64x8
func LoadInt64x8SlicePart(s []int64) Int64x8
func LoadMaskedInt64x8(y *[8]int64, mask Mask64x8) Int64x8
func Float32x16.AsInt64x8() (to Int64x8)
func Float32x8.ConvertToInt64() Int64x8
func Float64x8.AsInt64x8() (to Int64x8)
func Float64x8.ConvertToInt64() Int64x8
func Int16x32.AsInt64x8() (to Int64x8)
func Int16x8.ExtendToInt64() Int64x8
func Int32x16.AsInt64x8() (to Int64x8)
func Int32x8.ExtendToInt64() Int64x8
func Int64x2.Broadcast512() Int64x8
func Int64x8.Abs() Int64x8
func Int64x8.Add(y Int64x8) Int64x8
func Int64x8.And(y Int64x8) Int64x8
func Int64x8.AndNot(y Int64x8) Int64x8
func Int64x8.Compress(mask Mask64x8) Int64x8
func Int64x8.ConcatPermute(y Int64x8, indices Uint64x8) Int64x8
func Int64x8.Expand(mask Mask64x8) Int64x8
func Int64x8.InterleaveHiGrouped(y Int64x8) Int64x8
func Int64x8.InterleaveLoGrouped(y Int64x8) Int64x8
func Int64x8.LeadingZeros() Int64x8
func Int64x8.Masked(mask Mask64x8) Int64x8
func Int64x8.Max(y Int64x8) Int64x8
func Int64x8.Merge(y Int64x8, mask Mask64x8) Int64x8
func Int64x8.Min(y Int64x8) Int64x8
func Int64x8.Mul(y Int64x8) Int64x8
func Int64x8.Not() Int64x8
func Int64x8.OnesCount() Int64x8
func Int64x8.Or(y Int64x8) Int64x8
func Int64x8.Permute(indices Uint64x8) Int64x8
func Int64x8.RotateAllLeft(shift uint8) Int64x8
func Int64x8.RotateAllRight(shift uint8) Int64x8
func Int64x8.RotateLeft(y Int64x8) Int64x8
func Int64x8.RotateRight(y Int64x8) Int64x8
func Int64x8.SelectFromPairGrouped(a, b uint8, y Int64x8) Int64x8
func Int64x8.SetHi(y Int64x4) Int64x8
func Int64x8.SetLo(y Int64x4) Int64x8
func Int64x8.ShiftAllLeft(y uint64) Int64x8
func Int64x8.ShiftAllLeftConcat(shift uint8, y Int64x8) Int64x8
func Int64x8.ShiftAllRight(y uint64) Int64x8
func Int64x8.ShiftAllRightConcat(shift uint8, y Int64x8) Int64x8
func Int64x8.ShiftLeft(y Int64x8) Int64x8
func Int64x8.ShiftLeftConcat(y Int64x8, z Int64x8) Int64x8
func Int64x8.ShiftRight(y Int64x8) Int64x8
func Int64x8.ShiftRightConcat(y Int64x8, z Int64x8) Int64x8
func Int64x8.Sub(y Int64x8) Int64x8
func Int64x8.Xor(y Int64x8) Int64x8
func Int8x16.ExtendLo8ToInt64x8() Int64x8
func Int8x64.AsInt64x8() (to Int64x8)
func Mask64x8.ToInt64x8() (to Int64x8)
func Uint16x32.AsInt64x8() (to Int64x8)
func Uint32x16.AsInt64x8() (to Int64x8)
func Uint64x8.AsInt64x8() (to Int64x8)
func Uint8x64.AsInt64x8() (to Int64x8)
func Int64x8.Add(y Int64x8) Int64x8
func Int64x8.And(y Int64x8) Int64x8
func Int64x8.AndNot(y Int64x8) Int64x8
func Int64x8.ConcatPermute(y Int64x8, indices Uint64x8) Int64x8
func Int64x8.Equal(y Int64x8) Mask64x8
func Int64x8.Greater(y Int64x8) Mask64x8
func Int64x8.GreaterEqual(y Int64x8) Mask64x8
func Int64x8.InterleaveHiGrouped(y Int64x8) Int64x8
func Int64x8.InterleaveLoGrouped(y Int64x8) Int64x8
func Int64x8.Less(y Int64x8) Mask64x8
func Int64x8.LessEqual(y Int64x8) Mask64x8
func Int64x8.Max(y Int64x8) Int64x8
func Int64x8.Merge(y Int64x8, mask Mask64x8) Int64x8
func Int64x8.Min(y Int64x8) Int64x8
func Int64x8.Mul(y Int64x8) Int64x8
func Int64x8.NotEqual(y Int64x8) Mask64x8
func Int64x8.Or(y Int64x8) Int64x8
func Int64x8.RotateLeft(y Int64x8) Int64x8
func Int64x8.RotateRight(y Int64x8) Int64x8
func Int64x8.SelectFromPairGrouped(a, b uint8, y Int64x8) Int64x8
func Int64x8.ShiftAllLeftConcat(shift uint8, y Int64x8) Int64x8
func Int64x8.ShiftAllRightConcat(shift uint8, y Int64x8) Int64x8
func Int64x8.ShiftLeft(y Int64x8) Int64x8
func Int64x8.ShiftLeftConcat(y Int64x8, z Int64x8) Int64x8
func Int64x8.ShiftRight(y Int64x8) Int64x8
func Int64x8.ShiftRightConcat(y Int64x8, z Int64x8) Int64x8
func Int64x8.Sub(y Int64x8) Int64x8
func Int64x8.Xor(y Int64x8) Int64x8
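ShiftLeftConcat and ShiftRightConcat are funnel shifts: the bits vacated in the shifted x are filled from the second operand. A scalar sketch of one 64-bit element of the left variant in plain Go (not the intrinsic; the shift count is assumed to already be reduced to the range 0-63):

```go
package main

import "fmt"

// shiftLeftConcat models one element of Int64x8.ShiftLeftConcat: x is
// shifted left by s bits and the emptied low bits are filled with the
// upper s bits of z. s is assumed to be in [0, 63].
func shiftLeftConcat(x, z uint64, s uint) uint64 {
	if s == 0 {
		return x
	}
	return x<<s | z>>(64-s)
}

func main() {
	x := uint64(0x00000000000000FF)
	z := uint64(0xA000000000000000)
	// Shift x left by 4; the top 4 bits of z (0xA) fill the low bits.
	fmt.Printf("%#x\n", shiftLeftConcat(x, z, 4)) // 0xffa
}
```

The right variant is symmetric: x shifts right and the low bits of z fill the emptied upper bits.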
Int8x16 is a 128-bit SIMD vector of 16 int8 Abs computes the absolute value of each element.
Asm: VPABSB, CPU Feature: AVX Add adds corresponding elements of two vectors.
Asm: VPADDB, CPU Feature: AVX AddSaturated adds corresponding elements of two vectors with saturation.
Asm: VPADDSB, CPU Feature: AVX And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX AndNot performs a bitwise x &^ y.
Asm: VPANDN, CPU Feature: AVX Float32x4 converts from Int8x16 to Float32x4 Float64x2 converts from Int8x16 to Float64x2 Int16x8 converts from Int8x16 to Int16x8 Int32x4 converts from Int8x16 to Int32x4 Int64x2 converts from Int8x16 to Int64x2 Uint16x8 converts from Int8x16 to Uint16x8 Uint32x4 converts from Int8x16 to Uint32x4 Uint64x2 converts from Int8x16 to Uint64x2 Uint8x16 converts from Int8x16 to Uint8x16 Broadcast128 copies element zero of its (128-bit) input to all elements of
the 128-bit output vector.
Asm: VPBROADCASTB, CPU Feature: AVX2 Broadcast256 copies element zero of its (128-bit) input to all elements of
the 256-bit output vector.
Asm: VPBROADCASTB, CPU Feature: AVX2 Broadcast512 copies element zero of its (128-bit) input to all elements of
the 512-bit output vector.
Asm: VPBROADCASTB, CPU Feature: AVX512 Compress performs a compression on vector x using mask by
selecting the elements indicated by mask and packing them into the lower-indexed elements.
Asm: VPCOMPRESSB, CPU Feature: AVX512VBMI2 ConcatPermute performs a full permutation of vector x, y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the low-order bits of each element of indices needed to index xy are used.
Asm: VPERMI2B, CPU Feature: AVX512VBMI CopySign returns the product of the first operand with -1, 0, or 1,
whichever constant is nearest to the value of the second operand.
Asm: VPSIGNB, CPU Feature: AVX DotProductQuadruple performs dot products on groups of 4 elements of x and y.
DotProductQuadruple(x, y).Add(z) will be optimized to the full form of the underlying instruction.
Asm: VPDPBUSD, CPU Feature: AVXVNNI DotProductQuadrupleSaturated performs dot products on groups of 4 elements of x and y.
DotProductQuadrupleSaturated(x, y).Add(z) will be optimized to the full form of the underlying instruction.
Asm: VPDPBUSDS, CPU Feature: AVXVNNI Equal returns x equals y, elementwise.
Asm: VPCMPEQB, CPU Feature: AVX Expand performs an expansion on a vector x whose elements are packed to lower parts.
The expansion distributes the packed elements into the positions indicated by mask, from lower mask elements to upper, in order.
Asm: VPEXPANDB, CPU Feature: AVX512VBMI2 ExtendLo2ToInt64x2 converts 2 lowest vector element values to int64.
The result vector's elements are sign-extended.
Asm: VPMOVSXBQ, CPU Feature: AVX ExtendLo4ToInt32x4 converts 4 lowest vector element values to int32.
The result vector's elements are sign-extended.
Asm: VPMOVSXBD, CPU Feature: AVX ExtendLo4ToInt64x4 converts 4 lowest vector element values to int64.
The result vector's elements are sign-extended.
Asm: VPMOVSXBQ, CPU Feature: AVX2 ExtendLo8ToInt16x8 converts 8 lowest vector element values to int16.
The result vector's elements are sign-extended.
Asm: VPMOVSXBW, CPU Feature: AVX ExtendLo8ToInt32x8 converts 8 lowest vector element values to int32.
The result vector's elements are sign-extended.
Asm: VPMOVSXBD, CPU Feature: AVX2 ExtendLo8ToInt64x8 converts 8 lowest vector element values to int64.
The result vector's elements are sign-extended.
Asm: VPMOVSXBQ, CPU Feature: AVX512 ExtendToInt16 converts element values to int16.
The result vector's elements are sign-extended.
Asm: VPMOVSXBW, CPU Feature: AVX2 ExtendToInt32 converts element values to int32.
The result vector's elements are sign-extended.
Asm: VPMOVSXBD, CPU Feature: AVX512 GetElem retrieves a single constant-indexed element's value.
index results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPEXTRB, CPU Feature: AVX512 Greater returns x greater-than y, elementwise.
Asm: VPCMPGTB, CPU Feature: AVX GreaterEqual returns a mask whose elements indicate whether x >= y
Emulated, CPU Feature AVX IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x.
x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y
Asm: VPTEST, CPU Feature: AVX Len returns the number of elements in an Int8x16 Less returns a mask whose elements indicate whether x < y
Emulated, CPU Feature AVX LessEqual returns a mask whose elements indicate whether x <= y
Emulated, CPU Feature AVX Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VPMAXSB, CPU Feature: AVX Merge returns x but with elements set to y where mask is false. Min computes the minimum of corresponding elements.
Asm: VPMINSB, CPU Feature: AVX Not returns the bitwise complement of x
Emulated, CPU Feature AVX NotEqual returns a mask whose elements indicate whether x != y
Emulated, CPU Feature AVX OnesCount counts the number of set bits in each element.
Asm: VPOPCNTB, CPU Feature: AVX512BITALG Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX Permute performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 4 bits (values 0-15) of each element of indices are used
Asm: VPERMB, CPU Feature: AVX512VBMI PermuteOrZero performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The lower four bits of each byte-sized index in indices select an element from x,
unless the index's sign bit is set in which case zero is used instead.
Asm: VPSHUFB, CPU Feature: AVX SetElem sets a single constant-indexed element's value.
index results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPINSRB, CPU Feature: AVX Store stores an Int8x16 to an array StoreSlice stores x into a slice of at least 16 int8s StoreSlicePart stores the elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 16 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x Sub subtracts corresponding elements of two vectors.
Asm: VPSUBB, CPU Feature: AVX SubSaturated subtracts corresponding elements of two vectors with saturation.
Asm: VPSUBSB, CPU Feature: AVX ToMask converts from Int8x16 to Mask8x16; a mask element is set to true when the corresponding vector element is non-zero. Xor performs a bitwise XOR operation between two vectors.
Asm: VPXOR, CPU Feature: AVX
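PermuteOrZero follows the documented VPSHUFB rule: the lower four bits of each byte-sized index select an element of x, unless the index's sign bit is set, in which case the result byte is zero. A plain-Go scalar model (a sketch, not the intrinsic):

```go
package main

import "fmt"

// permuteOrZero models Int8x16.PermuteOrZero: the lower four bits of each
// index select an element from x, unless the index's sign bit is set, in
// which case zero is produced instead.
func permuteOrZero(x, indices [16]int8) [16]int8 {
	var out [16]int8
	for i, idx := range indices {
		if idx < 0 { // sign bit set -> zero
			out[i] = 0
		} else {
			out[i] = x[idx&0x0F]
		}
	}
	return out
}

func main() {
	var x [16]int8
	for i := range x {
		x[i] = int8(i + 1) // elements 1..16
	}
	// Reverse the vector, but zero the first two outputs via the sign bit.
	idx := [16]int8{-1, -1, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0}
	fmt.Println(permuteOrZero(x, idx)) // [0 0 14 13 12 11 10 9 8 7 6 5 4 3 2 1]
}
```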
Int8x16 : expvar.Var
Int8x16 : fmt.Stringer
func BroadcastInt8x16(x int8) Int8x16
func LoadInt8x16(y *[16]int8) Int8x16
func LoadInt8x16Slice(s []int8) Int8x16
func LoadInt8x16SlicePart(s []int8) Int8x16
func Float32x4.AsInt8x16() (to Int8x16)
func Float64x2.AsInt8x16() (to Int8x16)
func Int16x16.SaturateToInt8() Int8x16
func Int16x16.SaturateToUint8() Int8x16
func Int16x16.TruncateToInt8() Int8x16
func Int16x8.AsInt8x16() (to Int8x16)
func Int16x8.SaturateToInt8() Int8x16
func Int16x8.SaturateToUint8() Int8x16
func Int16x8.TruncateToInt8() Int8x16
func Int32x16.SaturateToInt8() Int8x16
func Int32x16.SaturateToUint8() Int8x16
func Int32x16.TruncateToInt8() Int8x16
func Int32x4.AsInt8x16() (to Int8x16)
func Int32x4.SaturateToInt8() Int8x16
func Int32x4.SaturateToUint8() Int8x16
func Int32x4.TruncateToInt8() Int8x16
func Int32x8.SaturateToInt8() Int8x16
func Int32x8.SaturateToUint8() Int8x16
func Int32x8.TruncateToInt8() Int8x16
func Int64x2.AsInt8x16() (to Int8x16)
func Int64x2.SaturateToInt8() Int8x16
func Int64x2.SaturateToUint8() Int8x16
func Int64x2.TruncateToInt8() Int8x16
func Int64x4.SaturateToInt8() Int8x16
func Int64x4.SaturateToUint8() Int8x16
func Int64x4.TruncateToInt8() Int8x16
func Int64x8.SaturateToInt8() Int8x16
func Int64x8.SaturateToUint8() Int8x16
func Int64x8.TruncateToInt8() Int8x16
func Int8x16.Abs() Int8x16
func Int8x16.Add(y Int8x16) Int8x16
func Int8x16.AddSaturated(y Int8x16) Int8x16
func Int8x16.And(y Int8x16) Int8x16
func Int8x16.AndNot(y Int8x16) Int8x16
func Int8x16.Broadcast128() Int8x16
func Int8x16.Compress(mask Mask8x16) Int8x16
func Int8x16.ConcatPermute(y Int8x16, indices Uint8x16) Int8x16
func Int8x16.CopySign(y Int8x16) Int8x16
func Int8x16.Expand(mask Mask8x16) Int8x16
func Int8x16.Masked(mask Mask8x16) Int8x16
func Int8x16.Max(y Int8x16) Int8x16
func Int8x16.Merge(y Int8x16, mask Mask8x16) Int8x16
func Int8x16.Min(y Int8x16) Int8x16
func Int8x16.Not() Int8x16
func Int8x16.OnesCount() Int8x16
func Int8x16.Or(y Int8x16) Int8x16
func Int8x16.Permute(indices Uint8x16) Int8x16
func Int8x16.PermuteOrZero(indices Int8x16) Int8x16
func Int8x16.SetElem(index uint8, y int8) Int8x16
func Int8x16.Sub(y Int8x16) Int8x16
func Int8x16.SubSaturated(y Int8x16) Int8x16
func Int8x16.Xor(y Int8x16) Int8x16
func Int8x32.GetHi() Int8x16
func Int8x32.GetLo() Int8x16
func Mask8x16.ToInt8x16() (to Int8x16)
func Uint16x8.AsInt8x16() (to Int8x16)
func Uint32x4.AsInt8x16() (to Int8x16)
func Uint64x2.AsInt8x16() (to Int8x16)
func Uint8x16.AsInt8x16() (to Int8x16)
func Int8x16.Add(y Int8x16) Int8x16
func Int8x16.AddSaturated(y Int8x16) Int8x16
func Int8x16.And(y Int8x16) Int8x16
func Int8x16.AndNot(y Int8x16) Int8x16
func Int8x16.ConcatPermute(y Int8x16, indices Uint8x16) Int8x16
func Int8x16.CopySign(y Int8x16) Int8x16
func Int8x16.Equal(y Int8x16) Mask8x16
func Int8x16.Greater(y Int8x16) Mask8x16
func Int8x16.GreaterEqual(y Int8x16) Mask8x16
func Int8x16.Less(y Int8x16) Mask8x16
func Int8x16.LessEqual(y Int8x16) Mask8x16
func Int8x16.Max(y Int8x16) Int8x16
func Int8x16.Merge(y Int8x16, mask Mask8x16) Int8x16
func Int8x16.Min(y Int8x16) Int8x16
func Int8x16.NotEqual(y Int8x16) Mask8x16
func Int8x16.Or(y Int8x16) Int8x16
func Int8x16.PermuteOrZero(indices Int8x16) Int8x16
func Int8x16.Sub(y Int8x16) Int8x16
func Int8x16.SubSaturated(y Int8x16) Int8x16
func Int8x16.Xor(y Int8x16) Int8x16
func Int8x32.SetHi(y Int8x16) Int8x32
func Int8x32.SetLo(y Int8x16) Int8x32
func Uint8x16.DotProductPairsSaturated(y Int8x16) Int16x8
func Uint8x16.PermuteOrZero(indices Int8x16) Uint8x16
Int8x32 is a 256-bit SIMD vector of 32 int8 Abs computes the absolute value of each element.
Asm: VPABSB, CPU Feature: AVX2 Add adds corresponding elements of two vectors.
Asm: VPADDB, CPU Feature: AVX2 AddSaturated adds corresponding elements of two vectors with saturation.
Asm: VPADDSB, CPU Feature: AVX2 And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX2 AndNot performs a bitwise x &^ y.
Asm: VPANDN, CPU Feature: AVX2 Float32x8 converts from Int8x32 to Float32x8 Float64x4 converts from Int8x32 to Float64x4 Int16x16 converts from Int8x32 to Int16x16 Int32x8 converts from Int8x32 to Int32x8 Int64x4 converts from Int8x32 to Int64x4 Uint16x16 converts from Int8x32 to Uint16x16 Uint32x8 converts from Int8x32 to Uint32x8 Uint64x4 converts from Int8x32 to Uint64x4 Uint8x32 converts from Int8x32 to Uint8x32 Compress performs a compression on vector x using mask by
selecting the elements indicated by mask and packing them into the lower-indexed elements.
Asm: VPCOMPRESSB, CPU Feature: AVX512VBMI2 ConcatPermute performs a full permutation of vector x, y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the low-order bits of each element of indices needed to index xy are used.
Asm: VPERMI2B, CPU Feature: AVX512VBMI CopySign returns the product of the first operand with -1, 0, or 1,
whichever constant is nearest to the value of the second operand.
Asm: VPSIGNB, CPU Feature: AVX2 DotProductQuadruple performs dot products on groups of 4 elements of x and y.
DotProductQuadruple(x, y).Add(z) will be optimized to the full form of the underlying instruction.
Asm: VPDPBUSD, CPU Feature: AVXVNNI DotProductQuadrupleSaturated performs dot products on groups of 4 elements of x and y.
DotProductQuadrupleSaturated(x, y).Add(z) will be optimized to the full form of the underlying instruction.
Asm: VPDPBUSDS, CPU Feature: AVXVNNI Equal returns x equals y, elementwise.
Asm: VPCMPEQB, CPU Feature: AVX2 Expand performs an expansion on a vector x whose elements are packed to lower parts.
The expansion distributes the packed elements into the positions indicated by mask, from lower mask elements to upper, in order.
Asm: VPEXPANDB, CPU Feature: AVX512VBMI2 ExtendToInt16 converts element values to int16.
The result vector's elements are sign-extended.
Asm: VPMOVSXBW, CPU Feature: AVX512 GetHi returns the upper half of x.
Asm: VEXTRACTI128, CPU Feature: AVX2 GetLo returns the lower half of x.
Asm: VEXTRACTI128, CPU Feature: AVX2 Greater returns x greater-than y, elementwise.
Asm: VPCMPGTB, CPU Feature: AVX2 GreaterEqual returns a mask whose elements indicate whether x >= y
Emulated, CPU Feature AVX2 IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x.
x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y
Asm: VPTEST, CPU Feature: AVX Len returns the number of elements in an Int8x32 Less returns a mask whose elements indicate whether x < y
Emulated, CPU Feature AVX2 LessEqual returns a mask whose elements indicate whether x <= y
Emulated, CPU Feature AVX2 Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VPMAXSB, CPU Feature: AVX2 Merge returns x but with elements set to y where mask is false. Min computes the minimum of corresponding elements.
Asm: VPMINSB, CPU Feature: AVX2 Not returns the bitwise complement of x
Emulated, CPU Feature AVX2 NotEqual returns a mask whose elements indicate whether x != y
Emulated, CPU Feature AVX2 OnesCount counts the number of set bits in each element.
Asm: VPOPCNTB, CPU Feature: AVX512BITALG Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX2 Permute performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 5 bits (values 0-31) of each element of indices are used
Asm: VPERMB, CPU Feature: AVX512VBMI PermuteOrZeroGrouped performs a grouped permutation of vector x using indices:
result = {x_group0[indices[0]], x_group0[indices[1]], ..., x_group1[indices[16]], x_group1[indices[17]], ...}
The lower four bits of each byte-sized index in indices select an element from its corresponding group in x,
unless the index's sign bit is set in which case zero is used instead.
Each group is of size 128-bit.
Asm: VPSHUFB, CPU Feature: AVX2 Select128FromPair treats the 256-bit vectors x and y as a single vector of four
128-bit elements, and returns a 256-bit result formed by
concatenating the two elements specified by lo and hi.
For example,
{0x40, 0x41, ..., 0x4f, 0x50, 0x51, ..., 0x5f}.Select128FromPair(3, 0,
{0x60, 0x61, ..., 0x6f, 0x70, 0x71, ..., 0x7f})
returns {0x70, 0x71, ..., 0x7f, 0x40, 0x41, ..., 0x4f}.
lo, hi result in better performance when they are constants; non-constant values will be translated into a jump table.
lo, hi should be between 0 and 3, inclusive; other values may result in a runtime panic.
Asm: VPERM2I128, CPU Feature: AVX2 SetHi returns x with its upper half set to y.
Asm: VINSERTI128, CPU Feature: AVX2 SetLo returns x with its lower half set to y.
Asm: VINSERTI128, CPU Feature: AVX2 Store stores an Int8x32 to an array StoreSlice stores x into a slice of at least 32 int8s StoreSlicePart stores the elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 32 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x Sub subtracts corresponding elements of two vectors.
Asm: VPSUBB, CPU Feature: AVX2 SubSaturated subtracts corresponding elements of two vectors with saturation.
Asm: VPSUBSB, CPU Feature: AVX2 ToMask converts from Int8x32 to Mask8x32; a mask element is set to true when the corresponding vector element is non-zero. Xor performs a bitwise XOR operation between two vectors.
Asm: VPXOR, CPU Feature: AVX2
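The documented Select128FromPair example can be checked against a scalar model: x and y are viewed as four 16-byte lanes (x low, x high, y low, y high), and lo and hi pick the result's low and high lanes. A plain-Go sketch (not the intrinsic):

```go
package main

import "fmt"

// select128FromPair models Int8x32.Select128FromPair over bytes: the pair
// (x, y) is treated as four 16-byte lanes — x[0:16], x[16:32], y[0:16],
// y[16:32] — and the result is lane[lo] followed by lane[hi].
// lo and hi must be between 0 and 3, per the documentation.
func select128FromPair(x, y [32]byte, lo, hi uint8) [32]byte {
	lanes := [4][]byte{x[0:16], x[16:32], y[0:16], y[16:32]}
	var out [32]byte
	copy(out[0:16], lanes[lo])
	copy(out[16:32], lanes[hi])
	return out
}

func main() {
	var x, y [32]byte
	for i := 0; i < 32; i++ {
		x[i] = byte(0x40 + i) // 0x40..0x5f
		y[i] = byte(0x60 + i) // 0x60..0x7f
	}
	r := select128FromPair(x, y, 3, 0)
	// Matches the documented example: {0x70..0x7f, 0x40..0x4f}.
	fmt.Printf("%#x %#x\n", r[0], r[16]) // 0x70 0x40
}
```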
Int8x32 : expvar.Var
Int8x32 : fmt.Stringer
func BroadcastInt8x32(x int8) Int8x32
func LoadInt8x32(y *[32]int8) Int8x32
func LoadInt8x32Slice(s []int8) Int8x32
func LoadInt8x32SlicePart(s []int8) Int8x32
func Float32x8.AsInt8x32() (to Int8x32)
func Float64x4.AsInt8x32() (to Int8x32)
func Int16x16.AsInt8x32() (to Int8x32)
func Int16x32.SaturateToInt8() Int8x32
func Int16x32.TruncateToInt8() Int8x32
func Int32x8.AsInt8x32() (to Int8x32)
func Int64x4.AsInt8x32() (to Int8x32)
func Int8x16.Broadcast256() Int8x32
func Int8x32.Abs() Int8x32
func Int8x32.Add(y Int8x32) Int8x32
func Int8x32.AddSaturated(y Int8x32) Int8x32
func Int8x32.And(y Int8x32) Int8x32
func Int8x32.AndNot(y Int8x32) Int8x32
func Int8x32.Compress(mask Mask8x32) Int8x32
func Int8x32.ConcatPermute(y Int8x32, indices Uint8x32) Int8x32
func Int8x32.CopySign(y Int8x32) Int8x32
func Int8x32.Expand(mask Mask8x32) Int8x32
func Int8x32.Masked(mask Mask8x32) Int8x32
func Int8x32.Max(y Int8x32) Int8x32
func Int8x32.Merge(y Int8x32, mask Mask8x32) Int8x32
func Int8x32.Min(y Int8x32) Int8x32
func Int8x32.Not() Int8x32
func Int8x32.OnesCount() Int8x32
func Int8x32.Or(y Int8x32) Int8x32
func Int8x32.Permute(indices Uint8x32) Int8x32
func Int8x32.PermuteOrZeroGrouped(indices Int8x32) Int8x32
func Int8x32.Select128FromPair(lo, hi uint8, y Int8x32) Int8x32
func Int8x32.SetHi(y Int8x16) Int8x32
func Int8x32.SetLo(y Int8x16) Int8x32
func Int8x32.Sub(y Int8x32) Int8x32
func Int8x32.SubSaturated(y Int8x32) Int8x32
func Int8x32.Xor(y Int8x32) Int8x32
func Int8x64.GetHi() Int8x32
func Int8x64.GetLo() Int8x32
func Mask8x32.ToInt8x32() (to Int8x32)
func Uint16x16.AsInt8x32() (to Int8x32)
func Uint32x8.AsInt8x32() (to Int8x32)
func Uint64x4.AsInt8x32() (to Int8x32)
func Uint8x32.AsInt8x32() (to Int8x32)
func Int8x32.Add(y Int8x32) Int8x32
func Int8x32.AddSaturated(y Int8x32) Int8x32
func Int8x32.And(y Int8x32) Int8x32
func Int8x32.AndNot(y Int8x32) Int8x32
func Int8x32.ConcatPermute(y Int8x32, indices Uint8x32) Int8x32
func Int8x32.CopySign(y Int8x32) Int8x32
func Int8x32.Equal(y Int8x32) Mask8x32
func Int8x32.Greater(y Int8x32) Mask8x32
func Int8x32.GreaterEqual(y Int8x32) Mask8x32
func Int8x32.Less(y Int8x32) Mask8x32
func Int8x32.LessEqual(y Int8x32) Mask8x32
func Int8x32.Max(y Int8x32) Int8x32
func Int8x32.Merge(y Int8x32, mask Mask8x32) Int8x32
func Int8x32.Min(y Int8x32) Int8x32
func Int8x32.NotEqual(y Int8x32) Mask8x32
func Int8x32.Or(y Int8x32) Int8x32
func Int8x32.PermuteOrZeroGrouped(indices Int8x32) Int8x32
func Int8x32.Select128FromPair(lo, hi uint8, y Int8x32) Int8x32
func Int8x32.Sub(y Int8x32) Int8x32
func Int8x32.SubSaturated(y Int8x32) Int8x32
func Int8x32.Xor(y Int8x32) Int8x32
func Int8x64.SetHi(y Int8x32) Int8x64
func Int8x64.SetLo(y Int8x32) Int8x64
func Uint8x32.DotProductPairsSaturated(y Int8x32) Int16x16
func Uint8x32.PermuteOrZeroGrouped(indices Int8x32) Uint8x32
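The PermuteOrZeroGrouped methods indexed above follow the VPSHUFB byte-shuffle rule documented later in this page. As a plain-Go sketch of that rule (this models the documented semantics only; it does not import archsimd, which requires GOEXPERIMENT=simd):

```go
package main

import "fmt"

// permuteOrZeroGrouped models the documented rule: within each 16-byte
// (128-bit) group, the lower four bits of each index select a source
// byte from the same group of x; if the index's sign bit is set, the
// result byte is zero instead.
func permuteOrZeroGrouped(x, indices []int8) []int8 {
	out := make([]int8, len(x))
	for i := range x {
		idx := indices[i]
		if idx < 0 { // sign bit set: zero this lane
			continue
		}
		group := i / 16 * 16 // start of this 16-byte group
		out[i] = x[group+int(idx&0x0f)]
	}
	return out
}

func main() {
	x := make([]int8, 32)
	for i := range x {
		x[i] = int8(i)
	}
	// Reverse each 16-byte group; zero lane 0 of each group with a
	// negative index.
	idx := make([]int8, 32)
	for i := range idx {
		idx[i] = int8(15 - i%16)
	}
	idx[0], idx[16] = -1, -1
	fmt.Println(permuteOrZeroGrouped(x, idx))
}
```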
Int8x64 is a 512-bit SIMD vector of 64 int8. Abs computes the absolute value of each element.
Asm: VPABSB, CPU Feature: AVX512 Add adds corresponding elements of two vectors.
Asm: VPADDB, CPU Feature: AVX512 AddSaturated adds corresponding elements of two vectors with saturation.
Asm: VPADDSB, CPU Feature: AVX512 And performs a bitwise AND operation between two vectors.
Asm: VPANDD, CPU Feature: AVX512 AndNot performs a bitwise x &^ y.
Asm: VPANDND, CPU Feature: AVX512 Float32x16 converts from Int8x64 to Float32x16 Float64x8 converts from Int8x64 to Float64x8 Int16x32 converts from Int8x64 to Int16x32 Int32x16 converts from Int8x64 to Int32x16 Int64x8 converts from Int8x64 to Int64x8 Uint16x32 converts from Int8x64 to Uint16x32 Uint32x16 converts from Int8x64 to Uint32x16 Uint64x8 converts from Int8x64 to Uint64x8 Uint8x64 converts from Int8x64 to Uint8x64 Compress performs a compression on vector x using mask by
selecting the elements indicated by mask and packing them into the lower-indexed elements.
Asm: VPCOMPRESSB, CPU Feature: AVX512VBMI2 ConcatPermute performs a full permutation of vector x, y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the bits needed to represent an index into xy are used from each element of indices.
Asm: VPERMI2B, CPU Feature: AVX512VBMI DotProductQuadruple performs dot products on groups of 4 elements of x and y.
DotProductQuadruple(x, y).Add(z) will be optimized to the full form of the underlying instruction.
Asm: VPDPBUSD, CPU Feature: AVX512VNNI DotProductQuadrupleSaturated performs dot products on groups of 4 elements of x and y, with saturation.
DotProductQuadrupleSaturated(x, y).Add(z) will be optimized to the full form of the underlying instruction.
Asm: VPDPBUSDS, CPU Feature: AVX512VNNI Equal returns x equals y, elementwise.
Asm: VPCMPEQB, CPU Feature: AVX512 Expand performs an expansion on a vector x whose elements are packed into the lower lanes.
The expansion distributes those elements to the positions selected by mask, in order from the lowest mask element to the highest.
Asm: VPEXPANDB, CPU Feature: AVX512VBMI2 GetHi returns the upper half of x.
Asm: VEXTRACTI64X4, CPU Feature: AVX512 GetLo returns the lower half of x.
Asm: VEXTRACTI64X4, CPU Feature: AVX512 Greater returns x greater-than y, elementwise.
Asm: VPCMPGTB, CPU Feature: AVX512 GreaterEqual returns x greater-than-or-equals y, elementwise.
Asm: VPCMPB, CPU Feature: AVX512 Len returns the number of elements in an Int8x64 Less returns x less-than y, elementwise.
Asm: VPCMPB, CPU Feature: AVX512 LessEqual returns x less-than-or-equals y, elementwise.
Asm: VPCMPB, CPU Feature: AVX512 Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VPMAXSB, CPU Feature: AVX512 Merge returns x but with elements set to y where mask is false. Min computes the minimum of corresponding elements.
Asm: VPMINSB, CPU Feature: AVX512 Not returns the bitwise complement of x.
Emulated, CPU Feature: AVX512 NotEqual returns x not-equals y, elementwise.
Asm: VPCMPB, CPU Feature: AVX512 OnesCount counts the number of set bits in each element.
Asm: VPOPCNTB, CPU Feature: AVX512BITALG Or performs a bitwise OR operation between two vectors.
Asm: VPORD, CPU Feature: AVX512 Permute performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 6 bits (values 0-63) of each element of indices are used.
Asm: VPERMB, CPU Feature: AVX512VBMI PermuteOrZeroGrouped performs a grouped permutation of vector x using indices:
result = {x_group0[indices[0]], x_group0[indices[1]], ..., x_group1[indices[16]], x_group1[indices[17]], ...}
The lower four bits of each byte-sized index in indices select an element from its corresponding group in x,
unless the index's sign bit is set in which case zero is used instead.
Each group is of size 128-bit.
Asm: VPSHUFB, CPU Feature: AVX512 SetHi returns x with its upper half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512 SetLo returns x with its lower half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512 Store stores an Int8x64 to an array StoreMasked stores an Int8x64 to an array,
at those elements enabled by mask
Asm: VMOVDQU8, CPU Feature: AVX512 StoreSlice stores x into a slice of at least 64 int8s StoreSlicePart stores the 64 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 64 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x Sub subtracts corresponding elements of two vectors.
Asm: VPSUBB, CPU Feature: AVX512 SubSaturated subtracts corresponding elements of two vectors with saturation.
Asm: VPSUBSB, CPU Feature: AVX512 ToMask converts from Int8x64 to Mask8x64, mask element is set to true when the corresponding vector element is non-zero. Xor performs a bitwise XOR operation between two vectors.
Asm: VPXORD, CPU Feature: AVX512
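The ConcatPermute rule documented above ("only the needed bits to represent xy's index are used") can be modeled in scalar Go. This is a sketch of the documented semantics, not the archsimd implementation; it assumes the concatenated length is a power of two, as it is for every vector shape in this package:

```go
package main

import "fmt"

// concatPermute models the documented ConcatPermute rule: each index
// selects from xy, the concatenation of x (lower half) and y (upper
// half), and only the low bits needed to address xy are consulted.
func concatPermute(x, y []int8, indices []uint8) []int8 {
	xy := append(append([]int8{}, x...), y...)
	mask := len(xy) - 1 // len(xy) is a power of two for all vector shapes
	out := make([]int8, len(x))
	for i, idx := range indices {
		out[i] = xy[int(idx)&mask]
	}
	return out
}

func main() {
	x := []int8{0, 1, 2, 3}
	y := []int8{4, 5, 6, 7}
	// Index 9 exceeds len(xy)-1; only its low 3 bits (9&7 = 1) are used.
	fmt.Println(concatPermute(x, y, []uint8{7, 0, 9, 2})) // [7 0 1 2]
}
```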
Int8x64 : expvar.Var
Int8x64 : fmt.Stringer
func BroadcastInt8x64(x int8) Int8x64
func LoadInt8x64(y *[64]int8) Int8x64
func LoadInt8x64Slice(s []int8) Int8x64
func LoadInt8x64SlicePart(s []int8) Int8x64
func LoadMaskedInt8x64(y *[64]int8, mask Mask8x64) Int8x64
func Float32x16.AsInt8x64() (to Int8x64)
func Float64x8.AsInt8x64() (to Int8x64)
func Int16x32.AsInt8x64() (to Int8x64)
func Int32x16.AsInt8x64() (to Int8x64)
func Int64x8.AsInt8x64() (to Int8x64)
func Int8x16.Broadcast512() Int8x64
func Int8x64.Abs() Int8x64
func Int8x64.Add(y Int8x64) Int8x64
func Int8x64.AddSaturated(y Int8x64) Int8x64
func Int8x64.And(y Int8x64) Int8x64
func Int8x64.AndNot(y Int8x64) Int8x64
func Int8x64.Compress(mask Mask8x64) Int8x64
func Int8x64.ConcatPermute(y Int8x64, indices Uint8x64) Int8x64
func Int8x64.Expand(mask Mask8x64) Int8x64
func Int8x64.Masked(mask Mask8x64) Int8x64
func Int8x64.Max(y Int8x64) Int8x64
func Int8x64.Merge(y Int8x64, mask Mask8x64) Int8x64
func Int8x64.Min(y Int8x64) Int8x64
func Int8x64.Not() Int8x64
func Int8x64.OnesCount() Int8x64
func Int8x64.Or(y Int8x64) Int8x64
func Int8x64.Permute(indices Uint8x64) Int8x64
func Int8x64.PermuteOrZeroGrouped(indices Int8x64) Int8x64
func Int8x64.SetHi(y Int8x32) Int8x64
func Int8x64.SetLo(y Int8x32) Int8x64
func Int8x64.Sub(y Int8x64) Int8x64
func Int8x64.SubSaturated(y Int8x64) Int8x64
func Int8x64.Xor(y Int8x64) Int8x64
func Mask8x64.ToInt8x64() (to Int8x64)
func Uint16x32.AsInt8x64() (to Int8x64)
func Uint32x16.AsInt8x64() (to Int8x64)
func Uint64x8.AsInt8x64() (to Int8x64)
func Uint8x64.AsInt8x64() (to Int8x64)
func Int8x64.Add(y Int8x64) Int8x64
func Int8x64.AddSaturated(y Int8x64) Int8x64
func Int8x64.And(y Int8x64) Int8x64
func Int8x64.AndNot(y Int8x64) Int8x64
func Int8x64.ConcatPermute(y Int8x64, indices Uint8x64) Int8x64
func Int8x64.Equal(y Int8x64) Mask8x64
func Int8x64.Greater(y Int8x64) Mask8x64
func Int8x64.GreaterEqual(y Int8x64) Mask8x64
func Int8x64.Less(y Int8x64) Mask8x64
func Int8x64.LessEqual(y Int8x64) Mask8x64
func Int8x64.Max(y Int8x64) Int8x64
func Int8x64.Merge(y Int8x64, mask Mask8x64) Int8x64
func Int8x64.Min(y Int8x64) Int8x64
func Int8x64.NotEqual(y Int8x64) Mask8x64
func Int8x64.Or(y Int8x64) Int8x64
func Int8x64.PermuteOrZeroGrouped(indices Int8x64) Int8x64
func Int8x64.Sub(y Int8x64) Int8x64
func Int8x64.SubSaturated(y Int8x64) Int8x64
func Int8x64.Xor(y Int8x64) Int8x64
func Uint8x64.DotProductPairsSaturated(y Int8x64) Int16x32
func Uint8x64.PermuteOrZeroGrouped(indices Int8x64) Uint8x64
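Compress and Expand, described in the Int8x64 documentation above, are inverse packing operations. A scalar sketch of the documented semantics (plain Go, no archsimd import):

```go
package main

import "fmt"

// compress models the documented Compress: the mask-selected elements
// of x are packed, in order, into the lowest lanes of the result, and
// the remaining lanes are zeroed.
func compress(x []int8, mask []bool) []int8 {
	out := make([]int8, len(x))
	j := 0
	for i, m := range mask {
		if m {
			out[j] = x[i]
			j++
		}
	}
	return out
}

// expand models the documented Expand, the inverse operation: the
// lowest lanes of x are distributed, in order, to the mask-selected
// positions; unselected positions are zeroed.
func expand(x []int8, mask []bool) []int8 {
	out := make([]int8, len(x))
	j := 0
	for i, m := range mask {
		if m {
			out[i] = x[j]
			j++
		}
	}
	return out
}

func main() {
	x := []int8{10, 20, 30, 40}
	mask := []bool{false, true, false, true}
	fmt.Println(compress(x, mask)) // [20 40 0 0]
	fmt.Println(expand(x, mask))   // [0 10 0 20]
}
```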
Uint16x16 is a 256-bit SIMD vector of 16 uint16. Add adds corresponding elements of two vectors.
Asm: VPADDW, CPU Feature: AVX2 AddPairs horizontally adds adjacent pairs of elements.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].
Asm: VPHADDW, CPU Feature: AVX2 AddSaturated adds corresponding elements of two vectors with saturation.
Asm: VPADDUSW, CPU Feature: AVX2 And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX2 AndNot performs a bitwise x &^ y.
Asm: VPANDN, CPU Feature: AVX2 Float32x8 converts from Uint16x16 to Float32x8 Float64x4 converts from Uint16x16 to Float64x4 Int16x16 converts from Uint16x16 to Int16x16 Int32x8 converts from Uint16x16 to Int32x8 Int64x4 converts from Uint16x16 to Int64x4 Int8x32 converts from Uint16x16 to Int8x32 Uint32x8 converts from Uint16x16 to Uint32x8 Uint64x4 converts from Uint16x16 to Uint64x4 Uint8x32 converts from Uint16x16 to Uint8x32 Average computes the rounded average of corresponding elements.
Asm: VPAVGW, CPU Feature: AVX2 Compress performs a compression on vector x using mask by
selecting the elements indicated by mask and packing them into the lower-indexed elements.
Asm: VPCOMPRESSW, CPU Feature: AVX512VBMI2 ConcatPermute performs a full permutation of vector x, y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the bits needed to represent an index into xy are used from each element of indices.
Asm: VPERMI2W, CPU Feature: AVX512 Equal returns x equals y, elementwise.
Asm: VPCMPEQW, CPU Feature: AVX2 Expand performs an expansion on a vector x whose elements are packed into the lower lanes.
The expansion distributes those elements to the positions selected by mask, in order from the lowest mask element to the highest.
Asm: VPEXPANDW, CPU Feature: AVX512VBMI2 ExtendToUint32 converts element values to uint32.
The result vector's elements are zero-extended.
Asm: VPMOVZXWD, CPU Feature: AVX512 GetHi returns the upper half of x.
Asm: VEXTRACTI128, CPU Feature: AVX2 GetLo returns the lower half of x.
Asm: VEXTRACTI128, CPU Feature: AVX2 Greater returns a mask whose elements indicate whether x > y.
Emulated, CPU Feature: AVX2 GreaterEqual returns a mask whose elements indicate whether x >= y.
Emulated, CPU Feature: AVX2 InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.
Asm: VPUNPCKHWD, CPU Feature: AVX2 InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.
Asm: VPUNPCKLWD, CPU Feature: AVX2 IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x.
x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y
Asm: VPTEST, CPU Feature: AVX Len returns the number of elements in a Uint16x16 Less returns a mask whose elements indicate whether x < y.
Emulated, CPU Feature: AVX2 LessEqual returns a mask whose elements indicate whether x <= y.
Emulated, CPU Feature: AVX2 Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VPMAXUW, CPU Feature: AVX2 Merge returns x but with elements set to y where mask is false. Min computes the minimum of corresponding elements.
Asm: VPMINUW, CPU Feature: AVX2 Mul multiplies corresponding elements of two vectors.
Asm: VPMULLW, CPU Feature: AVX2 MulHigh multiplies elements and stores the high part of the result.
Asm: VPMULHUW, CPU Feature: AVX2 Not returns the bitwise complement of x.
Emulated, CPU Feature: AVX2 NotEqual returns a mask whose elements indicate whether x != y.
Emulated, CPU Feature: AVX2 OnesCount counts the number of set bits in each element.
Asm: VPOPCNTW, CPU Feature: AVX512BITALG Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX2 Permute performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 4 bits (values 0-15) of each element of indices are used.
Asm: VPERMW, CPU Feature: AVX512 PermuteScalarsHiGrouped performs a grouped permutation of vector x using the supplied indices:
result =
{x[0], x[1], x[2], x[3], x[a+4], x[b+4], x[c+4], x[d+4],
x[8], x[9], x[10], x[11], x[a+12], x[b+12], x[c+12], x[d+12]}
Each group is of size 128-bit.
Parameters a,b,c,d should have values between 0 and 3.
If a through d are constants, then an instruction will be inlined, otherwise
a jump table is generated.
Asm: VPSHUFHW, CPU Feature: AVX2 PermuteScalarsLoGrouped performs a grouped permutation of vector x using the supplied indices:
result = {x[a], x[b], x[c], x[d], x[4], x[5], x[6], x[7],
x[a+8], x[b+8], x[c+8], x[d+8], x[12], x[13], x[14], x[15]}
Parameters a,b,c,d should have values between 0 and 3.
If a through d are constants, then an instruction will be inlined, otherwise
a jump table is generated.
Asm: VPSHUFLW, CPU Feature: AVX2 Select128FromPair treats the 256-bit vectors x and y as a single vector of four
128-bit elements, and returns a 256-bit result formed by
concatenating the two elements specified by lo and hi.
For example,
{40, 41, 42, 43, 44, 45, 46, 47, 50, 51, 52, 53, 54, 55, 56, 57}.Select128FromPair(3, 0,
{60, 61, 62, 63, 64, 65, 66, 67, 70, 71, 72, 73, 74, 75, 76, 77})
returns {70, 71, 72, 73, 74, 75, 76, 77, 40, 41, 42, 43, 44, 45, 46, 47}.
lo and hi result in better performance when they are constants; non-constant values are translated into a jump table.
lo, hi should be between 0 and 3, inclusive; other values may result in a runtime panic.
Asm: VPERM2I128, CPU Feature: AVX2 SetHi returns x with its upper half set to y.
Asm: VINSERTI128, CPU Feature: AVX2 SetLo returns x with its lower half set to y.
Asm: VINSERTI128, CPU Feature: AVX2 ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLW, CPU Feature: AVX2 ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value is translated into a jump table.
Asm: VPSHLDW, CPU Feature: AVX512VBMI2 ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are zeroed.
Asm: VPSRLW, CPU Feature: AVX2 ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value is translated into a jump table.
Asm: VPSHRDW, CPU Feature: AVX512VBMI2 ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVW, CPU Feature: AVX512 ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVW, CPU Feature: AVX512VBMI2 ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are zeroed.
Asm: VPSRLVW, CPU Feature: AVX512 ShiftRightConcat shifts each element of x to the right by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVW, CPU Feature: AVX512VBMI2 Store stores a Uint16x16 to an array StoreSlice stores x into a slice of at least 16 uint16s StoreSlicePart stores the 16 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 16 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x Sub subtracts corresponding elements of two vectors.
Asm: VPSUBW, CPU Feature: AVX2 SubPairs horizontally subtracts adjacent pairs of elements.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].
Asm: VPHSUBW, CPU Feature: AVX2 SubSaturated subtracts corresponding elements of two vectors with saturation.
Asm: VPSUBUSW, CPU Feature: AVX2 TruncateToUint8 converts element values to uint8.
Conversion is done with truncation on the vector elements.
Results are packed to the low elements of the returned vector; its upper elements are zero-cleared.
Asm: VPMOVWB, CPU Feature: AVX512 Xor performs a bitwise XOR operation between two vectors.
Asm: VPXOR, CPU Feature: AVX2
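The Select128FromPair rule documented above can be checked against the documentation's own worked example. A scalar sketch for Uint16x16 (plain Go, no archsimd import; each 128-bit element is 8 lanes):

```go
package main

import "fmt"

// select128FromPair models the documented rule for Uint16x16: treat the
// pair (x, y) as four 128-bit elements {x_lo, x_hi, y_lo, y_hi} and
// concatenate the two elements chosen by lo and hi.
func select128FromPair(x []uint16, lo, hi uint8, y []uint16) []uint16 {
	quarters := [4][]uint16{x[:8], x[8:], y[:8], y[8:]}
	out := append([]uint16{}, quarters[lo]...)
	return append(out, quarters[hi]...)
}

func main() {
	x := []uint16{40, 41, 42, 43, 44, 45, 46, 47, 50, 51, 52, 53, 54, 55, 56, 57}
	y := []uint16{60, 61, 62, 63, 64, 65, 66, 67, 70, 71, 72, 73, 74, 75, 76, 77}
	// Reproduces the example in the documentation above.
	fmt.Println(select128FromPair(x, 3, 0, y))
	// [70 71 72 73 74 75 76 77 40 41 42 43 44 45 46 47]
}
```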
Uint16x16 : expvar.Var
Uint16x16 : fmt.Stringer
func BroadcastUint16x16(x uint16) Uint16x16
func LoadUint16x16(y *[16]uint16) Uint16x16
func LoadUint16x16Slice(s []uint16) Uint16x16
func LoadUint16x16SlicePart(s []uint16) Uint16x16
func Float32x8.AsUint16x16() (to Uint16x16)
func Float64x4.AsUint16x16() (to Uint16x16)
func Int16x16.AsUint16x16() (to Uint16x16)
func Int32x8.AsUint16x16() (to Uint16x16)
func Int64x4.AsUint16x16() (to Uint16x16)
func Int8x32.AsUint16x16() (to Uint16x16)
func Uint16x16.Add(y Uint16x16) Uint16x16
func Uint16x16.AddPairs(y Uint16x16) Uint16x16
func Uint16x16.AddSaturated(y Uint16x16) Uint16x16
func Uint16x16.And(y Uint16x16) Uint16x16
func Uint16x16.AndNot(y Uint16x16) Uint16x16
func Uint16x16.Average(y Uint16x16) Uint16x16
func Uint16x16.Compress(mask Mask16x16) Uint16x16
func Uint16x16.ConcatPermute(y Uint16x16, indices Uint16x16) Uint16x16
func Uint16x16.Expand(mask Mask16x16) Uint16x16
func Uint16x16.InterleaveHiGrouped(y Uint16x16) Uint16x16
func Uint16x16.InterleaveLoGrouped(y Uint16x16) Uint16x16
func Uint16x16.Masked(mask Mask16x16) Uint16x16
func Uint16x16.Max(y Uint16x16) Uint16x16
func Uint16x16.Merge(y Uint16x16, mask Mask16x16) Uint16x16
func Uint16x16.Min(y Uint16x16) Uint16x16
func Uint16x16.Mul(y Uint16x16) Uint16x16
func Uint16x16.MulHigh(y Uint16x16) Uint16x16
func Uint16x16.Not() Uint16x16
func Uint16x16.OnesCount() Uint16x16
func Uint16x16.Or(y Uint16x16) Uint16x16
func Uint16x16.Permute(indices Uint16x16) Uint16x16
func Uint16x16.PermuteScalarsHiGrouped(a, b, c, d uint8) Uint16x16
func Uint16x16.PermuteScalarsLoGrouped(a, b, c, d uint8) Uint16x16
func Uint16x16.Select128FromPair(lo, hi uint8, y Uint16x16) Uint16x16
func Uint16x16.SetHi(y Uint16x8) Uint16x16
func Uint16x16.SetLo(y Uint16x8) Uint16x16
func Uint16x16.ShiftAllLeft(y uint64) Uint16x16
func Uint16x16.ShiftAllLeftConcat(shift uint8, y Uint16x16) Uint16x16
func Uint16x16.ShiftAllRight(y uint64) Uint16x16
func Uint16x16.ShiftAllRightConcat(shift uint8, y Uint16x16) Uint16x16
func Uint16x16.ShiftLeft(y Uint16x16) Uint16x16
func Uint16x16.ShiftLeftConcat(y Uint16x16, z Uint16x16) Uint16x16
func Uint16x16.ShiftRight(y Uint16x16) Uint16x16
func Uint16x16.ShiftRightConcat(y Uint16x16, z Uint16x16) Uint16x16
func Uint16x16.Sub(y Uint16x16) Uint16x16
func Uint16x16.SubPairs(y Uint16x16) Uint16x16
func Uint16x16.SubSaturated(y Uint16x16) Uint16x16
func Uint16x16.Xor(y Uint16x16) Uint16x16
func Uint16x32.GetHi() Uint16x16
func Uint16x32.GetLo() Uint16x16
func Uint16x8.Broadcast256() Uint16x16
func Uint32x16.SaturateToUint16() Uint16x16
func Uint32x16.TruncateToUint16() Uint16x16
func Uint32x8.AsUint16x16() (to Uint16x16)
func Uint32x8.SaturateToUint16Concat(y Uint32x8) Uint16x16
func Uint64x4.AsUint16x16() (to Uint16x16)
func Uint8x16.ExtendToUint16() Uint16x16
func Uint8x32.AsUint16x16() (to Uint16x16)
func Uint8x32.SumAbsDiff(y Uint8x32) Uint16x16
func Int16x16.ConcatPermute(y Int16x16, indices Uint16x16) Int16x16
func Int16x16.Permute(indices Uint16x16) Int16x16
func Uint16x16.Add(y Uint16x16) Uint16x16
func Uint16x16.AddPairs(y Uint16x16) Uint16x16
func Uint16x16.AddSaturated(y Uint16x16) Uint16x16
func Uint16x16.And(y Uint16x16) Uint16x16
func Uint16x16.AndNot(y Uint16x16) Uint16x16
func Uint16x16.Average(y Uint16x16) Uint16x16
func Uint16x16.ConcatPermute(y Uint16x16, indices Uint16x16) Uint16x16
func Uint16x16.Equal(y Uint16x16) Mask16x16
func Uint16x16.Greater(y Uint16x16) Mask16x16
func Uint16x16.GreaterEqual(y Uint16x16) Mask16x16
func Uint16x16.InterleaveHiGrouped(y Uint16x16) Uint16x16
func Uint16x16.InterleaveLoGrouped(y Uint16x16) Uint16x16
func Uint16x16.Less(y Uint16x16) Mask16x16
func Uint16x16.LessEqual(y Uint16x16) Mask16x16
func Uint16x16.Max(y Uint16x16) Uint16x16
func Uint16x16.Merge(y Uint16x16, mask Mask16x16) Uint16x16
func Uint16x16.Min(y Uint16x16) Uint16x16
func Uint16x16.Mul(y Uint16x16) Uint16x16
func Uint16x16.MulHigh(y Uint16x16) Uint16x16
func Uint16x16.NotEqual(y Uint16x16) Mask16x16
func Uint16x16.Or(y Uint16x16) Uint16x16
func Uint16x16.Permute(indices Uint16x16) Uint16x16
func Uint16x16.Select128FromPair(lo, hi uint8, y Uint16x16) Uint16x16
func Uint16x16.ShiftAllLeftConcat(shift uint8, y Uint16x16) Uint16x16
func Uint16x16.ShiftAllRightConcat(shift uint8, y Uint16x16) Uint16x16
func Uint16x16.ShiftLeft(y Uint16x16) Uint16x16
func Uint16x16.ShiftLeftConcat(y Uint16x16, z Uint16x16) Uint16x16
func Uint16x16.ShiftRight(y Uint16x16) Uint16x16
func Uint16x16.ShiftRightConcat(y Uint16x16, z Uint16x16) Uint16x16
func Uint16x16.Sub(y Uint16x16) Uint16x16
func Uint16x16.SubPairs(y Uint16x16) Uint16x16
func Uint16x16.SubSaturated(y Uint16x16) Uint16x16
func Uint16x16.Xor(y Uint16x16) Uint16x16
func Uint16x32.SetHi(y Uint16x16) Uint16x32
func Uint16x32.SetLo(y Uint16x16) Uint16x32
Uint16x32 is a 512-bit SIMD vector of 32 uint16. Add adds corresponding elements of two vectors.
Asm: VPADDW, CPU Feature: AVX512 AddSaturated adds corresponding elements of two vectors with saturation.
Asm: VPADDUSW, CPU Feature: AVX512 And performs a bitwise AND operation between two vectors.
Asm: VPANDD, CPU Feature: AVX512 AndNot performs a bitwise x &^ y.
Asm: VPANDND, CPU Feature: AVX512 Float32x16 converts from Uint16x32 to Float32x16 Float64x8 converts from Uint16x32 to Float64x8 Int16x32 converts from Uint16x32 to Int16x32 Int32x16 converts from Uint16x32 to Int32x16 Int64x8 converts from Uint16x32 to Int64x8 Int8x64 converts from Uint16x32 to Int8x64 Uint32x16 converts from Uint16x32 to Uint32x16 Uint64x8 converts from Uint16x32 to Uint64x8 Uint8x64 converts from Uint16x32 to Uint8x64 Average computes the rounded average of corresponding elements.
Asm: VPAVGW, CPU Feature: AVX512 Compress performs a compression on vector x using mask by
selecting the elements indicated by mask and packing them into the lower-indexed elements.
Asm: VPCOMPRESSW, CPU Feature: AVX512VBMI2 ConcatPermute performs a full permutation of vector x, y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the bits needed to represent an index into xy are used from each element of indices.
Asm: VPERMI2W, CPU Feature: AVX512 Equal returns x equals y, elementwise.
Asm: VPCMPEQW, CPU Feature: AVX512 Expand performs an expansion on a vector x whose elements are packed into the lower lanes.
The expansion distributes those elements to the positions selected by mask, in order from the lowest mask element to the highest.
Asm: VPEXPANDW, CPU Feature: AVX512VBMI2 GetHi returns the upper half of x.
Asm: VEXTRACTI64X4, CPU Feature: AVX512 GetLo returns the lower half of x.
Asm: VEXTRACTI64X4, CPU Feature: AVX512 Greater returns x greater-than y, elementwise.
Asm: VPCMPUW, CPU Feature: AVX512 GreaterEqual returns x greater-than-or-equals y, elementwise.
Asm: VPCMPUW, CPU Feature: AVX512 InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.
Asm: VPUNPCKHWD, CPU Feature: AVX512 InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.
Asm: VPUNPCKLWD, CPU Feature: AVX512 Len returns the number of elements in a Uint16x32 Less returns x less-than y, elementwise.
Asm: VPCMPUW, CPU Feature: AVX512 LessEqual returns x less-than-or-equals y, elementwise.
Asm: VPCMPUW, CPU Feature: AVX512 Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VPMAXUW, CPU Feature: AVX512 Merge returns x but with elements set to y where mask is false. Min computes the minimum of corresponding elements.
Asm: VPMINUW, CPU Feature: AVX512 Mul multiplies corresponding elements of two vectors.
Asm: VPMULLW, CPU Feature: AVX512 MulHigh multiplies elements and stores the high part of the result.
Asm: VPMULHUW, CPU Feature: AVX512 Not returns the bitwise complement of x.
Emulated, CPU Feature: AVX512 NotEqual returns x not-equals y, elementwise.
Asm: VPCMPUW, CPU Feature: AVX512 OnesCount counts the number of set bits in each element.
Asm: VPOPCNTW, CPU Feature: AVX512BITALG Or performs a bitwise OR operation between two vectors.
Asm: VPORD, CPU Feature: AVX512 Permute performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 5 bits (values 0-31) of each element of indices are used.
Asm: VPERMW, CPU Feature: AVX512 PermuteScalarsHiGrouped performs a grouped permutation of vector x using the supplied indices:
result =
{ x[0], x[1], x[2], x[3], x[a+4], x[b+4], x[c+4], x[d+4],
x[8], x[9], x[10], x[11], x[a+12], x[b+12], x[c+12], x[d+12],
x[16], x[17], x[18], x[19], x[a+20], x[b+20], x[c+20], x[d+20],
x[24], x[25], x[26], x[27], x[a+28], x[b+28], x[c+28], x[d+28]}
Parameters a,b,c,d should have values between 0 and 3.
If a through d are constants, then an instruction will be inlined, otherwise
a jump table is generated.
Asm: VPSHUFHW, CPU Feature: AVX512 PermuteScalarsLoGrouped performs a grouped permutation of vector x using the supplied indices:
result =
{x[a], x[b], x[c], x[d], x[4], x[5], x[6], x[7],
x[a+8], x[b+8], x[c+8], x[d+8], x[12], x[13], x[14], x[15],
x[a+16], x[b+16], x[c+16], x[d+16], x[20], x[21], x[22], x[23],
x[a+24], x[b+24], x[c+24], x[d+24], x[28], x[29], x[30], x[31]}
Each group is of size 128-bit.
Parameters a,b,c,d should have values between 0 and 3.
If a through d are constants, then an instruction will be inlined, otherwise
a jump table is generated.
Asm: VPSHUFLW, CPU Feature: AVX512 SaturateToUint8 converts element values to uint8.
Conversion is done with saturation on the vector elements.
Asm: VPMOVUSWB, CPU Feature: AVX512 SetHi returns x with its upper half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512 SetLo returns x with its lower half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512 ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLW, CPU Feature: AVX512 ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value is translated into a jump table.
Asm: VPSHLDW, CPU Feature: AVX512VBMI2 ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are zeroed.
Asm: VPSRLW, CPU Feature: AVX512 ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value is translated into a jump table.
Asm: VPSHRDW, CPU Feature: AVX512VBMI2 ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVW, CPU Feature: AVX512 ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVW, CPU Feature: AVX512VBMI2 ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are zeroed.
Asm: VPSRLVW, CPU Feature: AVX512 ShiftRightConcat shifts each element of x to the right by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVW, CPU Feature: AVX512VBMI2 Store stores a Uint16x32 to an array StoreMasked stores a Uint16x32 to an array,
at those elements enabled by mask
Asm: VMOVDQU16, CPU Feature: AVX512 StoreSlice stores x into a slice of at least 32 uint16s StoreSlicePart stores the 32 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 32 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x Sub subtracts corresponding elements of two vectors.
Asm: VPSUBW, CPU Feature: AVX512 SubSaturated subtracts corresponding elements of two vectors with saturation.
Asm: VPSUBUSW, CPU Feature: AVX512 TruncateToUint8 converts element values to uint8.
Conversion is done with truncation on the vector elements.
Asm: VPMOVWB, CPU Feature: AVX512 Xor performs a bitwise XOR operation between two vectors.
Asm: VPXORD, CPU Feature: AVX512
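The ShiftAllLeftConcat and ShiftAllRightConcat descriptions above are 16-bit funnel shifts (VPSHLDW / VPSHRDW). A scalar sketch of the documented semantics for shift counts below the lane width (counts of 16 or more are not modeled faithfully here):

```go
package main

import "fmt"

// shiftAllLeftConcat models the documented left variant: each 16-bit
// lane of x is shifted left and the emptied low bits are filled from
// the top of the matching lane of y. Per the docs, only the lower 5
// bits of shift are used.
func shiftAllLeftConcat(x []uint16, shift uint8, y []uint16) []uint16 {
	s := uint(shift & 0x1f)
	out := make([]uint16, len(x))
	for i := range x {
		out[i] = x[i]<<s | y[i]>>(16-s)
	}
	return out
}

// shiftAllRightConcat models the right variant: the emptied high bits
// of the shifted x are filled from the bottom of y.
func shiftAllRightConcat(x []uint16, shift uint8, y []uint16) []uint16 {
	s := uint(shift & 0x1f)
	out := make([]uint16, len(x))
	for i := range x {
		out[i] = x[i]>>s | y[i]<<(16-s)
	}
	return out
}

func main() {
	fmt.Printf("%#04x\n", shiftAllLeftConcat([]uint16{0x00ff}, 8, []uint16{0xab00})[0])  // 0xffab
	fmt.Printf("%#04x\n", shiftAllRightConcat([]uint16{0xff00}, 8, []uint16{0x00ab})[0]) // 0xabff
}
```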
Uint16x32 : expvar.Var
Uint16x32 : fmt.Stringer
func BroadcastUint16x32(x uint16) Uint16x32
func LoadMaskedUint16x32(y *[32]uint16, mask Mask16x32) Uint16x32
func LoadUint16x32(y *[32]uint16) Uint16x32
func LoadUint16x32Slice(s []uint16) Uint16x32
func LoadUint16x32SlicePart(s []uint16) Uint16x32
func Float32x16.AsUint16x32() (to Uint16x32)
func Float64x8.AsUint16x32() (to Uint16x32)
func Int16x32.AsUint16x32() (to Uint16x32)
func Int32x16.AsUint16x32() (to Uint16x32)
func Int64x8.AsUint16x32() (to Uint16x32)
func Int8x64.AsUint16x32() (to Uint16x32)
func Uint16x32.Add(y Uint16x32) Uint16x32
func Uint16x32.AddSaturated(y Uint16x32) Uint16x32
func Uint16x32.And(y Uint16x32) Uint16x32
func Uint16x32.AndNot(y Uint16x32) Uint16x32
func Uint16x32.Average(y Uint16x32) Uint16x32
func Uint16x32.Compress(mask Mask16x32) Uint16x32
func Uint16x32.ConcatPermute(y Uint16x32, indices Uint16x32) Uint16x32
func Uint16x32.Expand(mask Mask16x32) Uint16x32
func Uint16x32.InterleaveHiGrouped(y Uint16x32) Uint16x32
func Uint16x32.InterleaveLoGrouped(y Uint16x32) Uint16x32
func Uint16x32.Masked(mask Mask16x32) Uint16x32
func Uint16x32.Max(y Uint16x32) Uint16x32
func Uint16x32.Merge(y Uint16x32, mask Mask16x32) Uint16x32
func Uint16x32.Min(y Uint16x32) Uint16x32
func Uint16x32.Mul(y Uint16x32) Uint16x32
func Uint16x32.MulHigh(y Uint16x32) Uint16x32
func Uint16x32.Not() Uint16x32
func Uint16x32.OnesCount() Uint16x32
func Uint16x32.Or(y Uint16x32) Uint16x32
func Uint16x32.Permute(indices Uint16x32) Uint16x32
func Uint16x32.PermuteScalarsHiGrouped(a, b, c, d uint8) Uint16x32
func Uint16x32.PermuteScalarsLoGrouped(a, b, c, d uint8) Uint16x32
func Uint16x32.SetHi(y Uint16x16) Uint16x32
func Uint16x32.SetLo(y Uint16x16) Uint16x32
func Uint16x32.ShiftAllLeft(y uint64) Uint16x32
func Uint16x32.ShiftAllLeftConcat(shift uint8, y Uint16x32) Uint16x32
func Uint16x32.ShiftAllRight(y uint64) Uint16x32
func Uint16x32.ShiftAllRightConcat(shift uint8, y Uint16x32) Uint16x32
func Uint16x32.ShiftLeft(y Uint16x32) Uint16x32
func Uint16x32.ShiftLeftConcat(y Uint16x32, z Uint16x32) Uint16x32
func Uint16x32.ShiftRight(y Uint16x32) Uint16x32
func Uint16x32.ShiftRightConcat(y Uint16x32, z Uint16x32) Uint16x32
func Uint16x32.Sub(y Uint16x32) Uint16x32
func Uint16x32.SubSaturated(y Uint16x32) Uint16x32
func Uint16x32.Xor(y Uint16x32) Uint16x32
func Uint16x8.Broadcast512() Uint16x32
func Uint32x16.AsUint16x32() (to Uint16x32)
func Uint32x16.SaturateToUint16Concat(y Uint32x16) Uint16x32
func Uint64x8.AsUint16x32() (to Uint16x32)
func Uint8x32.ExtendToUint16() Uint16x32
func Uint8x64.AsUint16x32() (to Uint16x32)
func Uint8x64.SumAbsDiff(y Uint8x64) Uint16x32
func Int16x32.ConcatPermute(y Int16x32, indices Uint16x32) Int16x32
func Int16x32.Permute(indices Uint16x32) Int16x32
func Uint16x32.Add(y Uint16x32) Uint16x32
func Uint16x32.AddSaturated(y Uint16x32) Uint16x32
func Uint16x32.And(y Uint16x32) Uint16x32
func Uint16x32.AndNot(y Uint16x32) Uint16x32
func Uint16x32.Average(y Uint16x32) Uint16x32
func Uint16x32.ConcatPermute(y Uint16x32, indices Uint16x32) Uint16x32
func Uint16x32.Equal(y Uint16x32) Mask16x32
func Uint16x32.Greater(y Uint16x32) Mask16x32
func Uint16x32.GreaterEqual(y Uint16x32) Mask16x32
func Uint16x32.InterleaveHiGrouped(y Uint16x32) Uint16x32
func Uint16x32.InterleaveLoGrouped(y Uint16x32) Uint16x32
func Uint16x32.Less(y Uint16x32) Mask16x32
func Uint16x32.LessEqual(y Uint16x32) Mask16x32
func Uint16x32.Max(y Uint16x32) Uint16x32
func Uint16x32.Merge(y Uint16x32, mask Mask16x32) Uint16x32
func Uint16x32.Min(y Uint16x32) Uint16x32
func Uint16x32.Mul(y Uint16x32) Uint16x32
func Uint16x32.MulHigh(y Uint16x32) Uint16x32
func Uint16x32.NotEqual(y Uint16x32) Mask16x32
func Uint16x32.Or(y Uint16x32) Uint16x32
func Uint16x32.Permute(indices Uint16x32) Uint16x32
func Uint16x32.ShiftAllLeftConcat(shift uint8, y Uint16x32) Uint16x32
func Uint16x32.ShiftAllRightConcat(shift uint8, y Uint16x32) Uint16x32
func Uint16x32.ShiftLeft(y Uint16x32) Uint16x32
func Uint16x32.ShiftLeftConcat(y Uint16x32, z Uint16x32) Uint16x32
func Uint16x32.ShiftRight(y Uint16x32) Uint16x32
func Uint16x32.ShiftRightConcat(y Uint16x32, z Uint16x32) Uint16x32
func Uint16x32.Sub(y Uint16x32) Uint16x32
func Uint16x32.SubSaturated(y Uint16x32) Uint16x32
func Uint16x32.Xor(y Uint16x32) Uint16x32
Uint16x8 is a 128-bit SIMD vector of 8 uint16.

Add adds corresponding elements of two vectors.
Asm: VPADDW, CPU Feature: AVX

AddPairs horizontally adds adjacent pairs of elements.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].
Asm: VPHADDW, CPU Feature: AVX

AddSaturated adds corresponding elements of two vectors with saturation.
Asm: VPADDUSW, CPU Feature: AVX

And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX

AndNot performs a bitwise x &^ y.
Asm: VPANDN, CPU Feature: AVX

AsFloat32x4 converts from Uint16x8 to Float32x4.
AsFloat64x2 converts from Uint16x8 to Float64x2.
AsInt16x8 converts from Uint16x8 to Int16x8.
AsInt32x4 converts from Uint16x8 to Int32x4.
AsInt64x2 converts from Uint16x8 to Int64x2.
AsInt8x16 converts from Uint16x8 to Int8x16.
AsUint32x4 converts from Uint16x8 to Uint32x4.
AsUint64x2 converts from Uint16x8 to Uint64x2.
AsUint8x16 converts from Uint16x8 to Uint8x16.

Average computes the rounded average of corresponding elements.
Asm: VPAVGW, CPU Feature: AVX

Broadcast128 copies element zero of its (128-bit) input to all elements of the 128-bit output vector.
Asm: VPBROADCASTW, CPU Feature: AVX2

Broadcast256 copies element zero of its (128-bit) input to all elements of the 256-bit output vector.
Asm: VPBROADCASTW, CPU Feature: AVX2

Broadcast512 copies element zero of its (128-bit) input to all elements of the 512-bit output vector.
Asm: VPBROADCASTW, CPU Feature: AVX512

Compress performs a compression on vector x using mask, selecting elements as indicated by mask and packing them to lower-indexed elements.
Asm: VPCOMPRESSW, CPU Feature: AVX512VBMI2

ConcatPermute performs a full permutation of vectors x and y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the bits needed to represent an index into xy are used in indices' elements.
Asm: VPERMI2W, CPU Feature: AVX512

Equal returns x equals y, elementwise.
Asm: VPCMPEQW, CPU Feature: AVX

Expand performs an expansion on a vector x whose elements are packed into the lower part, distributing them to the positions indicated by mask, from lower mask elements to upper in order.
Asm: VPEXPANDW, CPU Feature: AVX512VBMI2

ExtendLo2ToUint64x2 converts the 2 lowest vector elements to uint64.
The result vector's elements are zero-extended.
Asm: VPMOVZXWQ, CPU Feature: AVX

ExtendLo4ToUint32x4 converts the 4 lowest vector elements to uint32.
The result vector's elements are zero-extended.
Asm: VPMOVZXWD, CPU Feature: AVX

ExtendLo4ToUint64x4 converts the 4 lowest vector elements to uint64.
The result vector's elements are zero-extended.
Asm: VPMOVZXWQ, CPU Feature: AVX2

ExtendToUint32 converts element values to uint32.
The result vector's elements are zero-extended.
Asm: VPMOVZXWD, CPU Feature: AVX2

ExtendToUint64 converts element values to uint64.
The result vector's elements are zero-extended.
Asm: VPMOVZXWQ, CPU Feature: AVX512

GetElem retrieves a single constant-indexed element's value.
index results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPEXTRW, CPU Feature: AVX512

Greater returns a mask whose elements indicate whether x > y.
Emulated, CPU Feature: AVX

GreaterEqual returns a mask whose elements indicate whether x >= y.
Emulated, CPU Feature: AVX

InterleaveHi interleaves the elements of the high halves of x and y.
Asm: VPUNPCKHWD, CPU Feature: AVX

InterleaveLo interleaves the elements of the low halves of x and y.
Asm: VPUNPCKLWD, CPU Feature: AVX

IsZero returns true if all elements of x are zero.
This method compiles to VPTEST x, x.
x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y.
Asm: VPTEST, CPU Feature: AVX

Len returns the number of elements in a Uint16x8.

Less returns a mask whose elements indicate whether x < y.
Emulated, CPU Feature: AVX

LessEqual returns a mask whose elements indicate whether x <= y.
Emulated, CPU Feature: AVX

Masked returns x but with elements zeroed where mask is false.

Max computes the maximum of corresponding elements.
Asm: VPMAXUW, CPU Feature: AVX

Merge returns x but with elements set to y where mask is false.

Min computes the minimum of corresponding elements.
Asm: VPMINUW, CPU Feature: AVX

Mul multiplies corresponding elements of two vectors.
Asm: VPMULLW, CPU Feature: AVX

MulHigh multiplies elements and stores the high part of the result.
Asm: VPMULHUW, CPU Feature: AVX

Not returns the bitwise complement of x.
Emulated, CPU Feature: AVX

NotEqual returns a mask whose elements indicate whether x != y.
Emulated, CPU Feature: AVX

OnesCount counts the number of set bits in each element.
Asm: VPOPCNTW, CPU Feature: AVX512BITALG

Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX

Permute performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 3 bits (values 0-7) of each element of indices are used.
Asm: VPERMW, CPU Feature: AVX512

PermuteScalarsHi performs a permutation of vector x using the supplied indices:
result = {x[0], x[1], x[2], x[3], x[a+4], x[b+4], x[c+4], x[d+4]}
Parameters a, b, c, and d should have values between 0 and 3.
If a through d are constants, an instruction will be inlined; otherwise a jump table is generated.
Asm: VPSHUFHW, CPU Feature: AVX512

PermuteScalarsLo performs a permutation of vector x using the supplied indices:
result = {x[a], x[b], x[c], x[d], x[4], x[5], x[6], x[7]}
Parameters a, b, c, and d should have values between 0 and 3.
If a through d are constants, an instruction will be inlined; otherwise a jump table is generated.
Asm: VPSHUFLW, CPU Feature: AVX512

SetElem sets a single constant-indexed element's value.
index results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPINSRW, CPU Feature: AVX

ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLW, CPU Feature: AVX

ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate (only the lower 5 bits are used), then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDW, CPU Feature: AVX512VBMI2

ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are zeroed.
Asm: VPSRLW, CPU Feature: AVX

ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate (only the lower 5 bits are used), then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDW, CPU Feature: AVX512VBMI2

ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVW, CPU Feature: AVX512

ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVW, CPU Feature: AVX512VBMI2

ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are zeroed.
Asm: VPSRLVW, CPU Feature: AVX512

ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVW, CPU Feature: AVX512VBMI2

Store stores a Uint16x8 to an array.

StoreSlice stores x into a slice of at least 8 uint16s.

StoreSlicePart stores the 8 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 8 or more elements, the method is equivalent to x.StoreSlice.

String returns a string representation of SIMD vector x.

Sub subtracts corresponding elements of two vectors.
Asm: VPSUBW, CPU Feature: AVX

SubPairs horizontally subtracts adjacent pairs of elements.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].
Asm: VPHSUBW, CPU Feature: AVX

SubSaturated subtracts corresponding elements of two vectors with saturation.
Asm: VPSUBUSW, CPU Feature: AVX

TruncateToUint8 converts element values to uint8.
Conversion is done with truncation on the vector elements.
Results are packed to low elements in the returned vector; its upper elements are zero-cleared.
Asm: VPMOVWB, CPU Feature: AVX512

Xor performs a bitwise XOR operation between two vectors.
Asm: VPXOR, CPU Feature: AVX
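The Permute and ConcatPermute docs above describe how indices select elements. As a scalar sketch of those semantics for the 8-element case (hypothetical helper functions written in plain Go, not the package's intrinsics):

```go
package main

import "fmt"

// permute mimics x.Permute(indices) for an 8-element vector:
// only the low 3 bits of each index element are used.
func permute(x, indices [8]uint16) [8]uint16 {
	var r [8]uint16
	for i, idx := range indices {
		r[i] = x[idx&7] // low 3 bits select within x
	}
	return r
}

// concatPermute mimics x.ConcatPermute(y, indices): indices select
// from the 16-element concatenation of x (lower half) and y (upper half).
func concatPermute(x, y, indices [8]uint16) [8]uint16 {
	var xy [16]uint16
	copy(xy[:8], x[:])
	copy(xy[8:], y[:])
	var r [8]uint16
	for i, idx := range indices {
		r[i] = xy[idx&15] // low 4 bits index the concatenation
	}
	return r
}

func main() {
	x := [8]uint16{0, 1, 2, 3, 4, 5, 6, 7}
	y := [8]uint16{10, 11, 12, 13, 14, 15, 16, 17}
	fmt.Println(permute(x, [8]uint16{7, 6, 5, 4, 3, 2, 1, 0}))         // reverse x
	fmt.Println(concatPermute(x, y, [8]uint16{0, 8, 1, 9, 2, 10, 3, 11})) // interleave x and y
}
```

The real methods compile to single VPERMW/VPERMI2W instructions; the sketch only illustrates which elements end up where.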
Uint16x8 : expvar.Var
Uint16x8 : fmt.Stringer
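The StoreSlicePart rule documented above (store as many elements as fit in s) matches the clamping behavior of Go's built-in copy. A scalar sketch (hypothetical helper, not the package's API):

```go
package main

import "fmt"

// storeSlicePart mimics Uint16x8.StoreSlicePart: it stores as many of
// the 8 elements as will fit in s. copy already clamps to min(len(s), 8),
// and returns the number of elements stored.
func storeSlicePart(x [8]uint16, s []uint16) int {
	return copy(s, x[:])
}

func main() {
	x := [8]uint16{1, 2, 3, 4, 5, 6, 7, 8}
	short := make([]uint16, 3)
	n := storeSlicePart(x, short)
	fmt.Println(n, short) // only the first 3 elements fit
}
```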
func BroadcastUint16x8(x uint16) Uint16x8
func LoadUint16x8(y *[8]uint16) Uint16x8
func LoadUint16x8Slice(s []uint16) Uint16x8
func LoadUint16x8SlicePart(s []uint16) Uint16x8
func Float32x4.AsUint16x8() (to Uint16x8)
func Float64x2.AsUint16x8() (to Uint16x8)
func Int16x8.AsUint16x8() (to Uint16x8)
func Int32x4.AsUint16x8() (to Uint16x8)
func Int64x2.AsUint16x8() (to Uint16x8)
func Int8x16.AsUint16x8() (to Uint16x8)
func Uint16x16.GetHi() Uint16x8
func Uint16x16.GetLo() Uint16x8
func Uint16x8.Add(y Uint16x8) Uint16x8
func Uint16x8.AddPairs(y Uint16x8) Uint16x8
func Uint16x8.AddSaturated(y Uint16x8) Uint16x8
func Uint16x8.And(y Uint16x8) Uint16x8
func Uint16x8.AndNot(y Uint16x8) Uint16x8
func Uint16x8.Average(y Uint16x8) Uint16x8
func Uint16x8.Broadcast128() Uint16x8
func Uint16x8.Compress(mask Mask16x8) Uint16x8
func Uint16x8.ConcatPermute(y Uint16x8, indices Uint16x8) Uint16x8
func Uint16x8.Expand(mask Mask16x8) Uint16x8
func Uint16x8.InterleaveHi(y Uint16x8) Uint16x8
func Uint16x8.InterleaveLo(y Uint16x8) Uint16x8
func Uint16x8.Masked(mask Mask16x8) Uint16x8
func Uint16x8.Max(y Uint16x8) Uint16x8
func Uint16x8.Merge(y Uint16x8, mask Mask16x8) Uint16x8
func Uint16x8.Min(y Uint16x8) Uint16x8
func Uint16x8.Mul(y Uint16x8) Uint16x8
func Uint16x8.MulHigh(y Uint16x8) Uint16x8
func Uint16x8.Not() Uint16x8
func Uint16x8.OnesCount() Uint16x8
func Uint16x8.Or(y Uint16x8) Uint16x8
func Uint16x8.Permute(indices Uint16x8) Uint16x8
func Uint16x8.PermuteScalarsHi(a, b, c, d uint8) Uint16x8
func Uint16x8.PermuteScalarsLo(a, b, c, d uint8) Uint16x8
func Uint16x8.SetElem(index uint8, y uint16) Uint16x8
func Uint16x8.ShiftAllLeft(y uint64) Uint16x8
func Uint16x8.ShiftAllLeftConcat(shift uint8, y Uint16x8) Uint16x8
func Uint16x8.ShiftAllRight(y uint64) Uint16x8
func Uint16x8.ShiftAllRightConcat(shift uint8, y Uint16x8) Uint16x8
func Uint16x8.ShiftLeft(y Uint16x8) Uint16x8
func Uint16x8.ShiftLeftConcat(y Uint16x8, z Uint16x8) Uint16x8
func Uint16x8.ShiftRight(y Uint16x8) Uint16x8
func Uint16x8.ShiftRightConcat(y Uint16x8, z Uint16x8) Uint16x8
func Uint16x8.Sub(y Uint16x8) Uint16x8
func Uint16x8.SubPairs(y Uint16x8) Uint16x8
func Uint16x8.SubSaturated(y Uint16x8) Uint16x8
func Uint16x8.Xor(y Uint16x8) Uint16x8
func Uint32x4.AsUint16x8() (to Uint16x8)
func Uint32x4.SaturateToUint16() Uint16x8
func Uint32x4.SaturateToUint16Concat(y Uint32x4) Uint16x8
func Uint32x4.TruncateToUint16() Uint16x8
func Uint32x8.SaturateToUint16() Uint16x8
func Uint32x8.TruncateToUint16() Uint16x8
func Uint64x2.AsUint16x8() (to Uint16x8)
func Uint64x2.SaturateToUint16() Uint16x8
func Uint64x2.TruncateToUint16() Uint16x8
func Uint64x4.SaturateToUint16() Uint16x8
func Uint64x4.TruncateToUint16() Uint16x8
func Uint64x8.SaturateToUint16() Uint16x8
func Uint64x8.TruncateToUint16() Uint16x8
func Uint8x16.AsUint16x8() (to Uint16x8)
func Uint8x16.ExtendLo8ToUint16x8() Uint16x8
func Uint8x16.SumAbsDiff(y Uint8x16) Uint16x8
func Int16x8.ConcatPermute(y Int16x8, indices Uint16x8) Int16x8
func Int16x8.Permute(indices Uint16x8) Int16x8
func Uint16x16.SetHi(y Uint16x8) Uint16x16
func Uint16x16.SetLo(y Uint16x8) Uint16x16
func Uint16x8.Add(y Uint16x8) Uint16x8
func Uint16x8.AddPairs(y Uint16x8) Uint16x8
func Uint16x8.AddSaturated(y Uint16x8) Uint16x8
func Uint16x8.And(y Uint16x8) Uint16x8
func Uint16x8.AndNot(y Uint16x8) Uint16x8
func Uint16x8.Average(y Uint16x8) Uint16x8
func Uint16x8.ConcatPermute(y Uint16x8, indices Uint16x8) Uint16x8
func Uint16x8.Equal(y Uint16x8) Mask16x8
func Uint16x8.Greater(y Uint16x8) Mask16x8
func Uint16x8.GreaterEqual(y Uint16x8) Mask16x8
func Uint16x8.InterleaveHi(y Uint16x8) Uint16x8
func Uint16x8.InterleaveLo(y Uint16x8) Uint16x8
func Uint16x8.Less(y Uint16x8) Mask16x8
func Uint16x8.LessEqual(y Uint16x8) Mask16x8
func Uint16x8.Max(y Uint16x8) Uint16x8
func Uint16x8.Merge(y Uint16x8, mask Mask16x8) Uint16x8
func Uint16x8.Min(y Uint16x8) Uint16x8
func Uint16x8.Mul(y Uint16x8) Uint16x8
func Uint16x8.MulHigh(y Uint16x8) Uint16x8
func Uint16x8.NotEqual(y Uint16x8) Mask16x8
func Uint16x8.Or(y Uint16x8) Uint16x8
func Uint16x8.Permute(indices Uint16x8) Uint16x8
func Uint16x8.ShiftAllLeftConcat(shift uint8, y Uint16x8) Uint16x8
func Uint16x8.ShiftAllRightConcat(shift uint8, y Uint16x8) Uint16x8
func Uint16x8.ShiftLeft(y Uint16x8) Uint16x8
func Uint16x8.ShiftLeftConcat(y Uint16x8, z Uint16x8) Uint16x8
func Uint16x8.ShiftRight(y Uint16x8) Uint16x8
func Uint16x8.ShiftRightConcat(y Uint16x8, z Uint16x8) Uint16x8
func Uint16x8.Sub(y Uint16x8) Uint16x8
func Uint16x8.SubPairs(y Uint16x8) Uint16x8
func Uint16x8.SubSaturated(y Uint16x8) Uint16x8
func Uint16x8.Xor(y Uint16x8) Uint16x8
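Masked and Merge, as documented for Uint16x8 above, differ only in what fills the positions where the mask is false: zero for Masked, the corresponding element of y for Merge. A scalar sketch with the mask modeled as a bool array (hypothetical helpers, not the package's API):

```go
package main

import "fmt"

// masked mimics x.Masked(mask): elements are zeroed where mask is false.
func masked(x [8]uint16, mask [8]bool) [8]uint16 {
	var r [8]uint16
	for i, m := range mask {
		if m {
			r[i] = x[i]
		} // false lanes stay zero
	}
	return r
}

// merge mimics x.Merge(y, mask): elements are taken from y where mask is false.
func merge(x, y [8]uint16, mask [8]bool) [8]uint16 {
	r := x
	for i, m := range mask {
		if !m {
			r[i] = y[i]
		}
	}
	return r
}

func main() {
	x := [8]uint16{1, 2, 3, 4, 5, 6, 7, 8}
	y := [8]uint16{10, 20, 30, 40, 50, 60, 70, 80}
	m := [8]bool{true, false, true, false, true, false, true, false}
	fmt.Println(masked(x, m))
	fmt.Println(merge(x, y, m))
}
```

On AVX512 hardware the compiler can fold an operation followed by Masked or Merge into a single masked instruction, as noted in the package overview.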
Uint32x16 is a 512-bit SIMD vector of 16 uint32.

Add adds corresponding elements of two vectors.
Asm: VPADDD, CPU Feature: AVX512

And performs a bitwise AND operation between two vectors.
Asm: VPANDD, CPU Feature: AVX512

AndNot performs a bitwise x &^ y.
Asm: VPANDND, CPU Feature: AVX512

AsFloat32x16 converts from Uint32x16 to Float32x16.
AsFloat64x8 converts from Uint32x16 to Float64x8.
AsInt16x32 converts from Uint32x16 to Int16x32.
AsInt32x16 converts from Uint32x16 to Int32x16.
AsInt64x8 converts from Uint32x16 to Int64x8.
AsInt8x64 converts from Uint32x16 to Int8x64.
AsUint16x32 converts from Uint32x16 to Uint16x32.
AsUint64x8 converts from Uint32x16 to Uint64x8.
AsUint8x64 converts from Uint32x16 to Uint8x64.

Compress performs a compression on vector x using mask, selecting elements as indicated by mask and packing them to lower-indexed elements.
Asm: VPCOMPRESSD, CPU Feature: AVX512

ConcatPermute performs a full permutation of vectors x and y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the bits needed to represent an index into xy are used in indices' elements.
Asm: VPERMI2D, CPU Feature: AVX512

ConvertToFloat32 converts element values to float32.
Asm: VCVTUDQ2PS, CPU Feature: AVX512

Equal returns x equals y, elementwise.
Asm: VPCMPEQD, CPU Feature: AVX512

Expand performs an expansion on a vector x whose elements are packed into the lower part, distributing them to the positions indicated by mask, from lower mask elements to upper in order.
Asm: VPEXPANDD, CPU Feature: AVX512

GetHi returns the upper half of x.
Asm: VEXTRACTI64X4, CPU Feature: AVX512

GetLo returns the lower half of x.
Asm: VEXTRACTI64X4, CPU Feature: AVX512

Greater returns x greater-than y, elementwise.
Asm: VPCMPUD, CPU Feature: AVX512

GreaterEqual returns x greater-than-or-equals y, elementwise.
Asm: VPCMPUD, CPU Feature: AVX512

InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.
Asm: VPUNPCKHDQ, CPU Feature: AVX512

InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.
Asm: VPUNPCKLDQ, CPU Feature: AVX512

LeadingZeros counts the leading zeros of each element in x.
Asm: VPLZCNTD, CPU Feature: AVX512

Len returns the number of elements in a Uint32x16.

Less returns x less-than y, elementwise.
Asm: VPCMPUD, CPU Feature: AVX512

LessEqual returns x less-than-or-equals y, elementwise.
Asm: VPCMPUD, CPU Feature: AVX512

Masked returns x but with elements zeroed where mask is false.

Max computes the maximum of corresponding elements.
Asm: VPMAXUD, CPU Feature: AVX512

Merge returns x but with elements set to y where mask is false.

Min computes the minimum of corresponding elements.
Asm: VPMINUD, CPU Feature: AVX512

Mul multiplies corresponding elements of two vectors.
Asm: VPMULLD, CPU Feature: AVX512

Not returns the bitwise complement of x.
Emulated, CPU Feature: AVX512

NotEqual returns x not-equals y, elementwise.
Asm: VPCMPUD, CPU Feature: AVX512

OnesCount counts the number of set bits in each element.
Asm: VPOPCNTD, CPU Feature: AVX512VPOPCNTDQ

Or performs a bitwise OR operation between two vectors.
Asm: VPORD, CPU Feature: AVX512

Permute performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 4 bits (values 0-15) of each element of indices are used.
Asm: VPERMD, CPU Feature: AVX512

PermuteScalarsGrouped performs a grouped permutation of vector x using the supplied indices:
result =
{ x[a], x[b], x[c], x[d], x[a+4], x[b+4], x[c+4], x[d+4],
  x[a+8], x[b+8], x[c+8], x[d+8], x[a+12], x[b+12], x[c+12], x[d+12]}
Parameters a, b, c, and d should have values between 0 and 3.
If a through d are constants, an instruction will be inlined; otherwise a jump table is generated.
Asm: VPSHUFD, CPU Feature: AVX512

RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPROLD, CPU Feature: AVX512

RotateAllRight rotates each element to the right by the number of bits specified by the immediate.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPRORD, CPU Feature: AVX512

RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.
Asm: VPROLVD, CPU Feature: AVX512

RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.
Asm: VPRORVD, CPU Feature: AVX512

SaturateToUint16 converts element values to uint16.
Conversion is done with saturation on the vector elements.
Asm: VPMOVUSDW, CPU Feature: AVX512

SaturateToUint16Concat converts element values to uint16.
With each 128 bits as a group: the converted group from the first input vector is packed to the lower part of the result vector, and the converted group from the second input vector is packed to the upper part.
Conversion is done with saturation on the vector elements.
Asm: VPACKUSDW, CPU Feature: AVX512

SelectFromPairGrouped returns, for each of the four 128-bit subvectors of x and y, a selection of four elements from x and y, where selector values in the range 0-3 specify elements of x and values in the range 4-7 specify elements 0-3 of y.
When the selectors are constants and the selection can be implemented in a single instruction, it will be; otherwise it requires two.
If the selectors are not constant, this will translate to a function call.
Asm: VSHUFPS, CPU Feature: AVX512

SetHi returns x with its upper half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512

SetLo returns x with its lower half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512

ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLD, CPU Feature: AVX512

ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate (only the lower 5 bits are used), then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDD, CPU Feature: AVX512VBMI2

ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are zeroed.
Asm: VPSRLD, CPU Feature: AVX512

ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate (only the lower 5 bits are used), then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDD, CPU Feature: AVX512VBMI2

ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVD, CPU Feature: AVX512

ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVD, CPU Feature: AVX512VBMI2

ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are zeroed.
Asm: VPSRLVD, CPU Feature: AVX512

ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVD, CPU Feature: AVX512VBMI2

Store stores a Uint32x16 to an array.

StoreMasked stores a Uint32x16 to an array, at those elements enabled by mask.
Asm: VMOVDQU32, CPU Feature: AVX512

StoreSlice stores x into a slice of at least 16 uint32s.

StoreSlicePart stores the 16 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 16 or more elements, the method is equivalent to x.StoreSlice.

String returns a string representation of SIMD vector x.

Sub subtracts corresponding elements of two vectors.
Asm: VPSUBD, CPU Feature: AVX512

TruncateToUint16 converts element values to uint16.
Conversion is done with truncation on the vector elements.
Asm: VPMOVDW, CPU Feature: AVX512

TruncateToUint8 converts element values to uint8.
Conversion is done with truncation on the vector elements.
Results are packed to low elements in the returned vector; its upper elements are zero-cleared.
Asm: VPMOVDB, CPU Feature: AVX512

Xor performs a bitwise XOR operation between two vectors.
Asm: VPXORD, CPU Feature: AVX512
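The ShiftAllLeftConcat/ShiftAllRightConcat family above are funnel shifts (VPSHLDD/VPSHRDD): the bits shifted out of one operand are replaced by bits funneled in from the other. Per element, the left variant can be sketched in scalar Go as (hypothetical helper, not the intrinsic):

```go
package main

import "fmt"

// shiftAllLeftConcat sketches Uint32x16.ShiftAllLeftConcat for a single
// element: x is shifted left, and the upper bits of y fill the emptied
// lower bits. Only the lower 5 bits of shift are used. A Go shift of a
// uint32 by 32 yields 0, so the s == 0 case needs no special handling.
func shiftAllLeftConcat(x, y uint32, shift uint8) uint32 {
	s := uint(shift) & 31
	return x<<s | y>>(32-s)
}

func main() {
	// Shift left by 8: the top byte of y (0x9a) funnels into the low byte.
	fmt.Printf("%08x\n", shiftAllLeftConcat(0x12345678, 0x9abcdef0, 8))
}
```

The right-shift variant mirrors this: x is shifted right and the low bits of y fill the emptied upper bits.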
Uint32x16 : expvar.Var
Uint32x16 : fmt.Stringer
func BroadcastUint32x16(x uint32) Uint32x16
func LoadMaskedUint32x16(y *[16]uint32, mask Mask32x16) Uint32x16
func LoadUint32x16(y *[16]uint32) Uint32x16
func LoadUint32x16Slice(s []uint32) Uint32x16
func LoadUint32x16SlicePart(s []uint32) Uint32x16
func Float32x16.AsUint32x16() (to Uint32x16)
func Float32x16.ConvertToUint32() Uint32x16
func Float64x8.AsUint32x16() (to Uint32x16)
func Int16x32.AsUint32x16() (to Uint32x16)
func Int32x16.AsUint32x16() (to Uint32x16)
func Int64x8.AsUint32x16() (to Uint32x16)
func Int8x64.AsUint32x16() (to Uint32x16)
func Uint16x16.ExtendToUint32() Uint32x16
func Uint16x32.AsUint32x16() (to Uint32x16)
func Uint32x16.Add(y Uint32x16) Uint32x16
func Uint32x16.And(y Uint32x16) Uint32x16
func Uint32x16.AndNot(y Uint32x16) Uint32x16
func Uint32x16.Compress(mask Mask32x16) Uint32x16
func Uint32x16.ConcatPermute(y Uint32x16, indices Uint32x16) Uint32x16
func Uint32x16.Expand(mask Mask32x16) Uint32x16
func Uint32x16.InterleaveHiGrouped(y Uint32x16) Uint32x16
func Uint32x16.InterleaveLoGrouped(y Uint32x16) Uint32x16
func Uint32x16.LeadingZeros() Uint32x16
func Uint32x16.Masked(mask Mask32x16) Uint32x16
func Uint32x16.Max(y Uint32x16) Uint32x16
func Uint32x16.Merge(y Uint32x16, mask Mask32x16) Uint32x16
func Uint32x16.Min(y Uint32x16) Uint32x16
func Uint32x16.Mul(y Uint32x16) Uint32x16
func Uint32x16.Not() Uint32x16
func Uint32x16.OnesCount() Uint32x16
func Uint32x16.Or(y Uint32x16) Uint32x16
func Uint32x16.Permute(indices Uint32x16) Uint32x16
func Uint32x16.PermuteScalarsGrouped(a, b, c, d uint8) Uint32x16
func Uint32x16.RotateAllLeft(shift uint8) Uint32x16
func Uint32x16.RotateAllRight(shift uint8) Uint32x16
func Uint32x16.RotateLeft(y Uint32x16) Uint32x16
func Uint32x16.RotateRight(y Uint32x16) Uint32x16
func Uint32x16.SelectFromPairGrouped(a, b, c, d uint8, y Uint32x16) Uint32x16
func Uint32x16.SetHi(y Uint32x8) Uint32x16
func Uint32x16.SetLo(y Uint32x8) Uint32x16
func Uint32x16.ShiftAllLeft(y uint64) Uint32x16
func Uint32x16.ShiftAllLeftConcat(shift uint8, y Uint32x16) Uint32x16
func Uint32x16.ShiftAllRight(y uint64) Uint32x16
func Uint32x16.ShiftAllRightConcat(shift uint8, y Uint32x16) Uint32x16
func Uint32x16.ShiftLeft(y Uint32x16) Uint32x16
func Uint32x16.ShiftLeftConcat(y Uint32x16, z Uint32x16) Uint32x16
func Uint32x16.ShiftRight(y Uint32x16) Uint32x16
func Uint32x16.ShiftRightConcat(y Uint32x16, z Uint32x16) Uint32x16
func Uint32x16.Sub(y Uint32x16) Uint32x16
func Uint32x16.Xor(y Uint32x16) Uint32x16
func Uint32x4.Broadcast512() Uint32x16
func Uint64x8.AsUint32x16() (to Uint32x16)
func Uint8x16.ExtendToUint32() Uint32x16
func Uint8x64.AsUint32x16() (to Uint32x16)
func Float32x16.ConcatPermute(y Float32x16, indices Uint32x16) Float32x16
func Float32x16.Permute(indices Uint32x16) Float32x16
func Int32x16.ConcatPermute(y Int32x16, indices Uint32x16) Int32x16
func Int32x16.Permute(indices Uint32x16) Int32x16
func Uint32x16.Add(y Uint32x16) Uint32x16
func Uint32x16.And(y Uint32x16) Uint32x16
func Uint32x16.AndNot(y Uint32x16) Uint32x16
func Uint32x16.ConcatPermute(y Uint32x16, indices Uint32x16) Uint32x16
func Uint32x16.Equal(y Uint32x16) Mask32x16
func Uint32x16.Greater(y Uint32x16) Mask32x16
func Uint32x16.GreaterEqual(y Uint32x16) Mask32x16
func Uint32x16.InterleaveHiGrouped(y Uint32x16) Uint32x16
func Uint32x16.InterleaveLoGrouped(y Uint32x16) Uint32x16
func Uint32x16.Less(y Uint32x16) Mask32x16
func Uint32x16.LessEqual(y Uint32x16) Mask32x16
func Uint32x16.Max(y Uint32x16) Uint32x16
func Uint32x16.Merge(y Uint32x16, mask Mask32x16) Uint32x16
func Uint32x16.Min(y Uint32x16) Uint32x16
func Uint32x16.Mul(y Uint32x16) Uint32x16
func Uint32x16.NotEqual(y Uint32x16) Mask32x16
func Uint32x16.Or(y Uint32x16) Uint32x16
func Uint32x16.Permute(indices Uint32x16) Uint32x16
func Uint32x16.RotateLeft(y Uint32x16) Uint32x16
func Uint32x16.RotateRight(y Uint32x16) Uint32x16
func Uint32x16.SaturateToUint16Concat(y Uint32x16) Uint16x32
func Uint32x16.SelectFromPairGrouped(a, b, c, d uint8, y Uint32x16) Uint32x16
func Uint32x16.ShiftAllLeftConcat(shift uint8, y Uint32x16) Uint32x16
func Uint32x16.ShiftAllRightConcat(shift uint8, y Uint32x16) Uint32x16
func Uint32x16.ShiftLeft(y Uint32x16) Uint32x16
func Uint32x16.ShiftLeftConcat(y Uint32x16, z Uint32x16) Uint32x16
func Uint32x16.ShiftRight(y Uint32x16) Uint32x16
func Uint32x16.ShiftRightConcat(y Uint32x16, z Uint32x16) Uint32x16
func Uint32x16.Sub(y Uint32x16) Uint32x16
func Uint32x16.Xor(y Uint32x16) Uint32x16
func Uint8x64.AESDecryptLastRound(y Uint32x16) Uint8x64
func Uint8x64.AESDecryptOneRound(y Uint32x16) Uint8x64
func Uint8x64.AESEncryptLastRound(y Uint32x16) Uint8x64
func Uint8x64.AESEncryptOneRound(y Uint32x16) Uint8x64
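The Uint32x16 rotate operations listed above correspond, per element, to the scalar rotates in math/bits. A sketch of RotateAllLeft (hypothetical helper, not the intrinsic):

```go
package main

import (
	"fmt"
	"math/bits"
)

// rotateAllLeft sketches Uint32x16.RotateAllLeft using the scalar
// equivalent from math/bits; the real method compiles to a single VPROLD.
func rotateAllLeft(x [16]uint32, shift uint8) [16]uint32 {
	var r [16]uint32
	for i, v := range x {
		r[i] = bits.RotateLeft32(v, int(shift))
	}
	return r
}

func main() {
	x := [16]uint32{0x80000001} // remaining elements are zero
	// Rotating left by 1 wraps the top bit into the bottom: 0x00000003.
	fmt.Printf("%08x\n", rotateAllLeft(x, 1)[0])
}
```

RotateLeft/RotateRight (VPROLVD/VPRORVD) work the same way but take the per-element rotate counts from a second vector instead of an immediate.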
Uint32x4 is a 128-bit SIMD vector of 4 uint32 AESInvMixColumns performs the InvMixColumns operation in AES cipher algorithm defined in FIPS 197.
x is the chunk of w array in use.
result = InvMixColumns(x)
Asm: VAESIMC, CPU Feature: AVX, AES AESRoundKeyGenAssist performs some components of KeyExpansion in AES cipher algorithm defined in FIPS 197.
x is an array of AES words, but only x[0] and x[2] are used.
r is a value from the Rcon constant array.
result[0] = XOR(SubWord(RotWord(x[0])), r)
result[1] = SubWord(x[1])
result[2] = XOR(SubWord(RotWord(x[2])), r)
result[3] = SubWord(x[3])
rconVal results in better performance when it's a constant, a non-constant value will be translated into a jump table.
Asm: VAESKEYGENASSIST, CPU Feature: AVX, AES Add adds corresponding elements of two vectors.
Asm: VPADDD, CPU Feature: AVX AddPairs horizontally adds adjacent pairs of elements.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].
Asm: VPHADDD, CPU Feature: AVX And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX AndNot performs a bitwise x &^ y.
Asm: VPANDN, CPU Feature: AVX Float32x4 converts from Uint32x4 to Float32x4 Float64x2 converts from Uint32x4 to Float64x2 Int16x8 converts from Uint32x4 to Int16x8 Int32x4 converts from Uint32x4 to Int32x4 Int64x2 converts from Uint32x4 to Int64x2 Int8x16 converts from Uint32x4 to Int8x16 Uint16x8 converts from Uint32x4 to Uint16x8 Uint64x2 converts from Uint32x4 to Uint64x2 Uint8x16 converts from Uint32x4 to Uint8x16 Broadcast128 copies element zero of its (128-bit) input to all elements of
the 128-bit output vector.
Asm: VPBROADCASTD, CPU Feature: AVX2 Broadcast256 copies element zero of its (128-bit) input to all elements of
the 256-bit output vector.
Asm: VPBROADCASTD, CPU Feature: AVX2 Broadcast512 copies element zero of its (128-bit) input to all elements of
the 512-bit output vector.
Asm: VPBROADCASTD, CPU Feature: AVX512 Compress performs a compression on vector x using mask by
selecting elements as indicated by mask, and pack them to lower indexed elements.
Asm: VPCOMPRESSD, CPU Feature: AVX512 ConcatPermute performs a full permutation of vector x, y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the needed bits to represent xy's index are used in indices' elements.
Asm: VPERMI2D, CPU Feature: AVX512 ConvertToFloat32 converts element values to float32.
Asm: VCVTUDQ2PS, CPU Feature: AVX512 ConvertToFloat64 converts element values to float64.
Asm: VCVTUDQ2PD, CPU Feature: AVX512 Equal returns x equals y, elementwise.
Asm: VPCMPEQD, CPU Feature: AVX Expand performs an expansion on a vector x whose elements are packed to lower parts.
The expansion is to distribute elements as indexed by mask, from lower mask elements to upper in order.
Asm: VPEXPANDD, CPU Feature: AVX512 ExtendLo2ToUint64x2 converts 2 lowest vector element values to uint64.
The result vector's elements are zero-extended.
Asm: VPMOVZXDQ, CPU Feature: AVX ExtendToUint64 converts element values to uint64.
The result vector's elements are zero-extended.
Asm: VPMOVZXDQ, CPU Feature: AVX2 GetElem retrieves a single constant-indexed element's value.
index results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPEXTRD, CPU Feature: AVX Greater returns a mask whose elements indicate whether x > y
Emulated, CPU Feature: AVX GreaterEqual returns a mask whose elements indicate whether x >= y
Emulated, CPU Feature: AVX InterleaveHi interleaves the elements of the high halves of x and y.
Asm: VPUNPCKHDQ, CPU Feature: AVX InterleaveLo interleaves the elements of the low halves of x and y.
Asm: VPUNPCKLDQ, CPU Feature: AVX IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x.
x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y
Asm: VPTEST, CPU Feature: AVX LeadingZeros counts the leading zeros of each element in x.
Asm: VPLZCNTD, CPU Feature: AVX512 Len returns the number of elements in a Uint32x4 Less returns a mask whose elements indicate whether x < y
Emulated, CPU Feature: AVX LessEqual returns a mask whose elements indicate whether x <= y
Emulated, CPU Feature: AVX Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VPMAXUD, CPU Feature: AVX Merge returns x but with elements set to y where mask is false. Min computes the minimum of corresponding elements.
Asm: VPMINUD, CPU Feature: AVX Mul multiplies corresponding elements of two vectors.
Asm: VPMULLD, CPU Feature: AVX MulEvenWiden multiplies even-indexed elements, widening the result.
Result[i] = v1.Even[i] * v2.Even[i].
Asm: VPMULUDQ, CPU Feature: AVX Not returns the bitwise complement of x
Emulated, CPU Feature: AVX NotEqual returns a mask whose elements indicate whether x != y
Emulated, CPU Feature: AVX OnesCount counts the number of set bits in each element.
Asm: VPOPCNTD, CPU Feature: AVX512VPOPCNTDQ Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX PermuteScalars performs a permutation of vector x's elements using the supplied indices:
result = {x[a], x[b], x[c], x[d]}
Parameters a,b,c,d should have values between 0 and 3.
If a through d are constants, then an instruction will be inlined; otherwise
a jump table may be generated.
Asm: VPSHUFD, CPU Feature: AVX RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPROLD, CPU Feature: AVX512 RotateAllRight rotates each element to the right by the number of bits specified by the immediate.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPRORD, CPU Feature: AVX512 RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.
Asm: VPROLVD, CPU Feature: AVX512 RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.
Asm: VPRORVD, CPU Feature: AVX512 SHA1FourRounds performs 4 rounds of the main loop of the SHA-1 algorithm defined in FIPS 180-4.
x contains the state variables a, b, c and d from upper to lower order.
y contains the W array elements (with the state variable e added to the upper element) from upper to lower order.
result = the state variables a', b', c', d' updated after 4 rounds.
constant = 0 for the first 20 rounds of the loop, 1 for the next 20 rounds of the loop..., 3 for the last 20 rounds of the loop.
constant results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: SHA1RNDS4, CPU Feature: SHA SHA1Message1 does the XORing of step 1 of the message-schedule computation in the SHA-1 algorithm defined in FIPS 180-4.
x = {W3, W2, W1, W0}
y = {0, 0, W5, W4}
result = {W3^W5, W2^W4, W1^W3, W0^W2}.
Asm: SHA1MSG1, CPU Feature: SHA SHA1Message2 does the calculation of steps 3 and 4 of the message-schedule computation in the SHA-1 algorithm defined in FIPS 180-4.
x = result of step 2.
y = {W15, W14, W13}
result = {W19, W18, W17, W16}
Asm: SHA1MSG2, CPU Feature: SHA SHA1NextE calculates the state variable e' updated after 4 rounds in SHA1 algorithm defined in FIPS 180-4.
x contains the state variable a (before the 4 rounds), placed in the upper element.
y is the elements of W array for next 4 rounds from upper to lower order.
result = the elements of the W array for the next 4 rounds, with the updated state variable e' added to the upper element,
from upper to lower order.
For the last round of the loop, you can specify zero for y to obtain the e' value itself, or better, specify H4:0:0:0
for y to get e' added to H4. (Note that the value of e' is computed only from x, and values of y don't affect the
computation of the value of e'.)
Asm: SHA1NEXTE, CPU Feature: SHA SHA256Message1 does the sigma and addition of step 1 of the message-schedule computation in the SHA-256 algorithm defined in FIPS 180-4.
x = {W0, W1, W2, W3}
y = {W4, 0, 0, 0}
result = {W0+σ(W1), W1+σ(W2), W2+σ(W3), W3+σ(W4)}
Asm: SHA256MSG1, CPU Feature: SHA SHA256Message2 does the sigma and addition of step 3 of the message-schedule computation in the SHA-256 algorithm defined in FIPS 180-4.
x = result of step 2
y = {0, 0, W14, W15}
result = {W16, W17, W18, W19}
Asm: SHA256MSG2, CPU Feature: SHA SHA256TwoRounds does 2 rounds of the main loop to calculate the updated state variables in the SHA-256 algorithm defined in FIPS 180-4.
x = {h, g, d, c}
y = {f, e, b, a}
z = {W0+K0, W1+K1}
result = {f', e', b', a'}
The K array is a 64-DWORD constant array defined on page 11 of FIPS 180-4. Each element of the K array is to be added to
the corresponding element of the W array to make the input data z.
The updated state variables c', d', g', h' are not returned by this instruction, because they are equal to the input data
y (the state variables a, b, e, f before the 2 rounds).
Asm: SHA256RNDS2, CPU Feature: SHA SaturateToUint16 converts element values to uint16.
Conversion is done with saturation on the vector elements.
Asm: VPMOVUSDW, CPU Feature: AVX512 SaturateToUint16Concat converts element values to uint16.
With each 128-bit lane as a group:
The converted group from the first input vector will be packed to the lower part of the result vector,
the converted group from the second input vector will be packed to the upper part of the result vector.
Conversion is done with saturation on the vector elements.
Asm: VPACKUSDW, CPU Feature: AVX SelectFromPair returns the selection of four elements from the two
vectors x and y, where selector values in the range 0-3 specify
elements of x and values in the range 4-7 specify elements 0-3
of y. When the selectors are constants and the selection can be
implemented in a single instruction, it will be; otherwise it
requires two. a is the source index of the least element in the
output, and b, c, and d are the indices of the 2nd, 3rd, and 4th
elements in the output. For example,
{1,2,4,8}.SelectFromPair(2,3,5,7,{9,25,49,81}) returns {4,8,25,81}
If the selectors are not constant this will translate to a function
call.
Asm: VSHUFPS, CPU Feature: AVX SetElem sets a single constant-indexed element's value.
index results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPINSRD, CPU Feature: AVX ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLD, CPU Feature: AVX ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDD, CPU Feature: AVX512VBMI2 ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are zeroed.
Asm: VPSRLD, CPU Feature: AVX ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDD, CPU Feature: AVX512VBMI2 ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVD, CPU Feature: AVX2 ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVD, CPU Feature: AVX512VBMI2 ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are zeroed.
Asm: VPSRLVD, CPU Feature: AVX2 ShiftRightConcat shifts each element of x to the right by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVD, CPU Feature: AVX512VBMI2 Store stores a Uint32x4 to an array StoreMasked stores a Uint32x4 to an array,
at those elements enabled by mask
Asm: VMASKMOVD, CPU Feature: AVX2 StoreSlice stores x into a slice of at least 4 uint32s StoreSlicePart stores the 4 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 4 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x Sub subtracts corresponding elements of two vectors.
Asm: VPSUBD, CPU Feature: AVX SubPairs horizontally subtracts adjacent pairs of elements.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].
Asm: VPHSUBD, CPU Feature: AVX TruncateToUint16 converts element values to uint16.
Conversion is done with truncation on the vector elements.
Asm: VPMOVDW, CPU Feature: AVX512 TruncateToUint8 converts element values to uint8.
Conversion is done with truncation on the vector elements.
Results are packed to low elements in the returned vector; its upper elements are zero-cleared.
Asm: VPMOVDB, CPU Feature: AVX512 Xor performs a bitwise XOR operation between two vectors.
Asm: VPXOR, CPU Feature: AVX
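The Compress semantics documented above can be sketched in plain Go, without this package. This is a scalar model of the documented behavior, not the package API; compress4 is an illustrative helper name.

```go
package main

import "fmt"

// compress4 is a plain-Go model of the documented Compress semantics
// (VPCOMPRESSD): elements of x whose mask lane is set are packed into
// the lowest indices of the result, in order; the rest are zeroed.
func compress4(x [4]uint32, mask [4]bool) [4]uint32 {
	var out [4]uint32
	j := 0
	for i, keep := range mask {
		if keep {
			out[j] = x[i]
			j++
		}
	}
	return out
}

func main() {
	fmt.Println(compress4([4]uint32{10, 20, 30, 40}, [4]bool{true, false, true, false})) // [10 30 0 0]
}
```

Expand is the inverse direction: it distributes packed low elements back out to the mask-selected positions.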
Uint32x4 : expvar.Var
Uint32x4 : fmt.Stringer
func BroadcastUint32x4(x uint32) Uint32x4
func LoadMaskedUint32x4(y *[4]uint32, mask Mask32x4) Uint32x4
func LoadUint32x4(y *[4]uint32) Uint32x4
func LoadUint32x4Slice(s []uint32) Uint32x4
func LoadUint32x4SlicePart(s []uint32) Uint32x4
func Float32x4.AsUint32x4() (to Uint32x4)
func Float32x4.ConvertToUint32() Uint32x4
func Float64x2.AsUint32x4() (to Uint32x4)
func Float64x2.ConvertToUint32() Uint32x4
func Float64x4.ConvertToUint32() Uint32x4
func Int16x8.AsUint32x4() (to Uint32x4)
func Int32x4.AsUint32x4() (to Uint32x4)
func Int64x2.AsUint32x4() (to Uint32x4)
func Int8x16.AsUint32x4() (to Uint32x4)
func Uint16x8.AsUint32x4() (to Uint32x4)
func Uint16x8.ExtendLo4ToUint32x4() Uint32x4
func Uint32x4.Add(y Uint32x4) Uint32x4
func Uint32x4.AddPairs(y Uint32x4) Uint32x4
func Uint32x4.AESInvMixColumns() Uint32x4
func Uint32x4.AESRoundKeyGenAssist(rconVal uint8) Uint32x4
func Uint32x4.And(y Uint32x4) Uint32x4
func Uint32x4.AndNot(y Uint32x4) Uint32x4
func Uint32x4.Broadcast128() Uint32x4
func Uint32x4.Compress(mask Mask32x4) Uint32x4
func Uint32x4.ConcatPermute(y Uint32x4, indices Uint32x4) Uint32x4
func Uint32x4.Expand(mask Mask32x4) Uint32x4
func Uint32x4.InterleaveHi(y Uint32x4) Uint32x4
func Uint32x4.InterleaveLo(y Uint32x4) Uint32x4
func Uint32x4.LeadingZeros() Uint32x4
func Uint32x4.Masked(mask Mask32x4) Uint32x4
func Uint32x4.Max(y Uint32x4) Uint32x4
func Uint32x4.Merge(y Uint32x4, mask Mask32x4) Uint32x4
func Uint32x4.Min(y Uint32x4) Uint32x4
func Uint32x4.Mul(y Uint32x4) Uint32x4
func Uint32x4.Not() Uint32x4
func Uint32x4.OnesCount() Uint32x4
func Uint32x4.Or(y Uint32x4) Uint32x4
func Uint32x4.PermuteScalars(a, b, c, d uint8) Uint32x4
func Uint32x4.RotateAllLeft(shift uint8) Uint32x4
func Uint32x4.RotateAllRight(shift uint8) Uint32x4
func Uint32x4.RotateLeft(y Uint32x4) Uint32x4
func Uint32x4.RotateRight(y Uint32x4) Uint32x4
func Uint32x4.SelectFromPair(a, b, c, d uint8, y Uint32x4) Uint32x4
func Uint32x4.SetElem(index uint8, y uint32) Uint32x4
func Uint32x4.SHA1FourRounds(constant uint8, y Uint32x4) Uint32x4
func Uint32x4.SHA1Message1(y Uint32x4) Uint32x4
func Uint32x4.SHA1Message2(y Uint32x4) Uint32x4
func Uint32x4.SHA1NextE(y Uint32x4) Uint32x4
func Uint32x4.SHA256Message1(y Uint32x4) Uint32x4
func Uint32x4.SHA256Message2(y Uint32x4) Uint32x4
func Uint32x4.SHA256TwoRounds(y Uint32x4, z Uint32x4) Uint32x4
func Uint32x4.ShiftAllLeft(y uint64) Uint32x4
func Uint32x4.ShiftAllLeftConcat(shift uint8, y Uint32x4) Uint32x4
func Uint32x4.ShiftAllRight(y uint64) Uint32x4
func Uint32x4.ShiftAllRightConcat(shift uint8, y Uint32x4) Uint32x4
func Uint32x4.ShiftLeft(y Uint32x4) Uint32x4
func Uint32x4.ShiftLeftConcat(y Uint32x4, z Uint32x4) Uint32x4
func Uint32x4.ShiftRight(y Uint32x4) Uint32x4
func Uint32x4.ShiftRightConcat(y Uint32x4, z Uint32x4) Uint32x4
func Uint32x4.Sub(y Uint32x4) Uint32x4
func Uint32x4.SubPairs(y Uint32x4) Uint32x4
func Uint32x4.Xor(y Uint32x4) Uint32x4
func Uint32x8.GetHi() Uint32x4
func Uint32x8.GetLo() Uint32x4
func Uint64x2.AsUint32x4() (to Uint32x4)
func Uint64x2.SaturateToUint32() Uint32x4
func Uint64x2.TruncateToUint32() Uint32x4
func Uint64x4.SaturateToUint32() Uint32x4
func Uint64x4.TruncateToUint32() Uint32x4
func Uint8x16.AsUint32x4() (to Uint32x4)
func Uint8x16.ExtendLo4ToUint32x4() Uint32x4
func Float32x4.ConcatPermute(y Float32x4, indices Uint32x4) Float32x4
func Int32x4.ConcatPermute(y Int32x4, indices Uint32x4) Int32x4
func Uint32x4.Add(y Uint32x4) Uint32x4
func Uint32x4.AddPairs(y Uint32x4) Uint32x4
func Uint32x4.And(y Uint32x4) Uint32x4
func Uint32x4.AndNot(y Uint32x4) Uint32x4
func Uint32x4.ConcatPermute(y Uint32x4, indices Uint32x4) Uint32x4
func Uint32x4.Equal(y Uint32x4) Mask32x4
func Uint32x4.Greater(y Uint32x4) Mask32x4
func Uint32x4.GreaterEqual(y Uint32x4) Mask32x4
func Uint32x4.InterleaveHi(y Uint32x4) Uint32x4
func Uint32x4.InterleaveLo(y Uint32x4) Uint32x4
func Uint32x4.Less(y Uint32x4) Mask32x4
func Uint32x4.LessEqual(y Uint32x4) Mask32x4
func Uint32x4.Max(y Uint32x4) Uint32x4
func Uint32x4.Merge(y Uint32x4, mask Mask32x4) Uint32x4
func Uint32x4.Min(y Uint32x4) Uint32x4
func Uint32x4.Mul(y Uint32x4) Uint32x4
func Uint32x4.MulEvenWiden(y Uint32x4) Uint64x2
func Uint32x4.NotEqual(y Uint32x4) Mask32x4
func Uint32x4.Or(y Uint32x4) Uint32x4
func Uint32x4.RotateLeft(y Uint32x4) Uint32x4
func Uint32x4.RotateRight(y Uint32x4) Uint32x4
func Uint32x4.SaturateToUint16Concat(y Uint32x4) Uint16x8
func Uint32x4.SelectFromPair(a, b, c, d uint8, y Uint32x4) Uint32x4
func Uint32x4.SHA1FourRounds(constant uint8, y Uint32x4) Uint32x4
func Uint32x4.SHA1Message1(y Uint32x4) Uint32x4
func Uint32x4.SHA1Message2(y Uint32x4) Uint32x4
func Uint32x4.SHA1NextE(y Uint32x4) Uint32x4
func Uint32x4.SHA256Message1(y Uint32x4) Uint32x4
func Uint32x4.SHA256Message2(y Uint32x4) Uint32x4
func Uint32x4.SHA256TwoRounds(y Uint32x4, z Uint32x4) Uint32x4
func Uint32x4.ShiftAllLeftConcat(shift uint8, y Uint32x4) Uint32x4
func Uint32x4.ShiftAllRightConcat(shift uint8, y Uint32x4) Uint32x4
func Uint32x4.ShiftLeft(y Uint32x4) Uint32x4
func Uint32x4.ShiftLeftConcat(y Uint32x4, z Uint32x4) Uint32x4
func Uint32x4.ShiftRight(y Uint32x4) Uint32x4
func Uint32x4.ShiftRightConcat(y Uint32x4, z Uint32x4) Uint32x4
func Uint32x4.Sub(y Uint32x4) Uint32x4
func Uint32x4.SubPairs(y Uint32x4) Uint32x4
func Uint32x4.Xor(y Uint32x4) Uint32x4
func Uint32x8.SetHi(y Uint32x4) Uint32x8
func Uint32x8.SetLo(y Uint32x4) Uint32x8
func Uint8x16.AESDecryptLastRound(y Uint32x4) Uint8x16
func Uint8x16.AESDecryptOneRound(y Uint32x4) Uint8x16
func Uint8x16.AESEncryptLastRound(y Uint32x4) Uint8x16
func Uint8x16.AESEncryptOneRound(y Uint32x4) Uint8x16
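The SelectFromPair semantics documented above can be modeled in plain Go by indexing the 8-element concatenation of x and y. This is a scalar sketch, not the package API; selectFromPair is an illustrative name.

```go
package main

import "fmt"

// selectFromPair is a plain-Go model of the documented SelectFromPair
// semantics: selectors a..d index the 8-element concatenation of x
// (indices 0-3) and y (indices 4-7).
func selectFromPair(x [4]uint32, a, b, c, d uint8, y [4]uint32) [4]uint32 {
	xy := [8]uint32{x[0], x[1], x[2], x[3], y[0], y[1], y[2], y[3]}
	return [4]uint32{xy[a], xy[b], xy[c], xy[d]}
}

func main() {
	// The documentation's example: {1,2,4,8}.SelectFromPair(2,3,5,7,{9,25,49,81})
	fmt.Println(selectFromPair([4]uint32{1, 2, 4, 8}, 2, 3, 5, 7, [4]uint32{9, 25, 49, 81})) // [4 8 25 81]
}
```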
Uint32x8 is a 256-bit SIMD vector of 8 uint32 Add adds corresponding elements of two vectors.
Asm: VPADDD, CPU Feature: AVX2 AddPairs horizontally adds adjacent pairs of elements.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].
Asm: VPHADDD, CPU Feature: AVX2 And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX2 AndNot performs a bitwise x &^ y.
Asm: VPANDN, CPU Feature: AVX2 Float32x8 converts from Uint32x8 to Float32x8 Float64x4 converts from Uint32x8 to Float64x4 Int16x16 converts from Uint32x8 to Int16x16 Int32x8 converts from Uint32x8 to Int32x8 Int64x4 converts from Uint32x8 to Int64x4 Int8x32 converts from Uint32x8 to Int8x32 Uint16x16 converts from Uint32x8 to Uint16x16 Uint64x4 converts from Uint32x8 to Uint64x4 Uint8x32 converts from Uint32x8 to Uint8x32 Compress performs a compression on vector x using mask by
selecting the elements indicated by mask and packing them into lower-indexed elements.
Asm: VPCOMPRESSD, CPU Feature: AVX512 ConcatPermute performs a full permutation of vector x, y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the needed bits to represent xy's index are used in indices' elements.
Asm: VPERMI2D, CPU Feature: AVX512 ConvertToFloat32 converts element values to float32.
Asm: VCVTUDQ2PS, CPU Feature: AVX512 ConvertToFloat64 converts element values to float64.
Asm: VCVTUDQ2PD, CPU Feature: AVX512 Equal returns x equals y, elementwise.
Asm: VPCMPEQD, CPU Feature: AVX2 Expand performs an expansion on a vector x whose elements are packed to lower parts.
The expansion distributes the packed elements to the positions indicated by mask, from lower mask elements to upper, in order.
Asm: VPEXPANDD, CPU Feature: AVX512 ExtendToUint64 converts element values to uint64.
The result vector's elements are zero-extended.
Asm: VPMOVZXDQ, CPU Feature: AVX512 GetHi returns the upper half of x.
Asm: VEXTRACTI128, CPU Feature: AVX2 GetLo returns the lower half of x.
Asm: VEXTRACTI128, CPU Feature: AVX2 Greater returns a mask whose elements indicate whether x > y
Emulated, CPU Feature: AVX2 GreaterEqual returns a mask whose elements indicate whether x >= y
Emulated, CPU Feature: AVX2 InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.
Asm: VPUNPCKHDQ, CPU Feature: AVX2 InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.
Asm: VPUNPCKLDQ, CPU Feature: AVX2 IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x.
x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y
Asm: VPTEST, CPU Feature: AVX LeadingZeros counts the leading zeros of each element in x.
Asm: VPLZCNTD, CPU Feature: AVX512 Len returns the number of elements in a Uint32x8 Less returns a mask whose elements indicate whether x < y
Emulated, CPU Feature: AVX2 LessEqual returns a mask whose elements indicate whether x <= y
Emulated, CPU Feature: AVX2 Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VPMAXUD, CPU Feature: AVX2 Merge returns x but with elements set to y where mask is false. Min computes the minimum of corresponding elements.
Asm: VPMINUD, CPU Feature: AVX2 Mul multiplies corresponding elements of two vectors.
Asm: VPMULLD, CPU Feature: AVX2 MulEvenWiden multiplies even-indexed elements, widening the result.
Result[i] = v1.Even[i] * v2.Even[i].
Asm: VPMULUDQ, CPU Feature: AVX2 Not returns the bitwise complement of x
Emulated, CPU Feature: AVX2 NotEqual returns a mask whose elements indicate whether x != y
Emulated, CPU Feature: AVX2 OnesCount counts the number of set bits in each element.
Asm: VPOPCNTD, CPU Feature: AVX512VPOPCNTDQ Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX2 Permute performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 3 bits (values 0-7) of each element of indices are used.
Asm: VPERMD, CPU Feature: AVX2 PermuteScalarsGrouped performs a grouped permutation of vector x using the supplied indices:
result = {x[a], x[b], x[c], x[d], x[a+4], x[b+4], x[c+4], x[d+4]}
Parameters a,b,c,d should have values between 0 and 3.
If a through d are constants, then an instruction will be inlined; otherwise
a jump table is generated.
Asm: VPSHUFD, CPU Feature: AVX2 RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPROLD, CPU Feature: AVX512 RotateAllRight rotates each element to the right by the number of bits specified by the immediate.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPRORD, CPU Feature: AVX512 RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.
Asm: VPROLVD, CPU Feature: AVX512 RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.
Asm: VPRORVD, CPU Feature: AVX512 SaturateToUint16 converts element values to uint16.
Conversion is done with saturation on the vector elements.
Asm: VPMOVUSDW, CPU Feature: AVX512 SaturateToUint16Concat converts element values to uint16.
With each 128-bit lane as a group:
The converted group from the first input vector will be packed to the lower part of the result vector,
the converted group from the second input vector will be packed to the upper part of the result vector.
Conversion is done with saturation on the vector elements.
Asm: VPACKUSDW, CPU Feature: AVX2 Select128FromPair treats the 256-bit vectors x and y as a single vector of four
128-bit elements, and returns a 256-bit result formed by
concatenating the two elements specified by lo and hi.
For example,
{40, 41, 42, 43, 50, 51, 52, 53}.Select128FromPair(3, 0, {60, 61, 62, 63, 70, 71, 72, 73})
returns {70, 71, 72, 73, 40, 41, 42, 43}.
lo, hi result in better performance when they are constants; non-constant values will be translated into a jump table.
lo, hi should be between 0 and 3, inclusive; other values may result in a runtime panic.
Asm: VPERM2I128, CPU Feature: AVX2 SelectFromPairGrouped returns, for each of the two 128-bit halves of
the vectors x and y, the selection of four elements from x and y,
where selector values in the range 0-3 specify elements of x and
values in the range 4-7 specify elements 0-3 of y.
When the selectors are constants and the selection can be
implemented in a single instruction, it will be; otherwise it
requires two. a is the source index of the least element in the
output, and b, c, and d are the indices of the 2nd, 3rd, and 4th
elements in the output. For example,
{1,2,4,8,16,32,64,128}.SelectFromPairGrouped(2,3,5,7,{9,25,49,81,121,169,225,289})
returns {4,8,25,81,64,128,169,289}
If the selectors are not constant this will translate to a function
call.
Asm: VSHUFPS, CPU Feature: AVX SetHi returns x with its upper half set to y.
Asm: VINSERTI128, CPU Feature: AVX2 SetLo returns x with its lower half set to y.
Asm: VINSERTI128, CPU Feature: AVX2 ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLD, CPU Feature: AVX2 ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDD, CPU Feature: AVX512VBMI2 ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are zeroed.
Asm: VPSRLD, CPU Feature: AVX2 ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDD, CPU Feature: AVX512VBMI2 ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVD, CPU Feature: AVX2 ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVD, CPU Feature: AVX512VBMI2 ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are zeroed.
Asm: VPSRLVD, CPU Feature: AVX2 ShiftRightConcat shifts each element of x to the right by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVD, CPU Feature: AVX512VBMI2 Store stores a Uint32x8 to an array StoreMasked stores a Uint32x8 to an array,
at those elements enabled by mask
Asm: VMASKMOVD, CPU Feature: AVX2 StoreSlice stores x into a slice of at least 8 uint32s StoreSlicePart stores the 8 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 8 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x Sub subtracts corresponding elements of two vectors.
Asm: VPSUBD, CPU Feature: AVX2 SubPairs horizontally subtracts adjacent pairs of elements.
For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].
Asm: VPHSUBD, CPU Feature: AVX2 TruncateToUint16 converts element values to uint16.
Conversion is done with truncation on the vector elements.
Asm: VPMOVDW, CPU Feature: AVX512 TruncateToUint8 converts element values to uint8.
Conversion is done with truncation on the vector elements.
Results are packed to low elements in the returned vector; its upper elements are zero-cleared.
Asm: VPMOVDB, CPU Feature: AVX512 Xor performs a bitwise XOR operation between two vectors.
Asm: VPXOR, CPU Feature: AVX2
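The ...Concat funnel shifts documented above can be modeled lane-by-lane in plain Go. This sketch (shiftAllLeftConcat is an illustrative name, not part of the package) assumes the documented behavior: each element of x is shifted left and the vacated low bits are filled from the upper bits of y's corresponding element.

```go
package main

import "fmt"

// shiftAllLeftConcat is a plain-Go, lane-wise model of the documented
// ShiftAllLeftConcat (VPSHLDD) behavior: each element of x is shifted
// left by shift (only the lower 5 bits of shift are used), and the
// vacated low bits are filled from the upper bits of y's element.
func shiftAllLeftConcat(x, y []uint32, shift uint8) []uint32 {
	s := uint(shift) & 31
	out := make([]uint32, len(x))
	for i := range x {
		// Go defines a shift by >= 32 on uint32 as 0, so s == 0 yields x[i].
		out[i] = x[i]<<s | y[i]>>(32-s)
	}
	return out
}

func main() {
	fmt.Printf("%#x\n", shiftAllLeftConcat([]uint32{0x1}, []uint32{0x80000000}, 4)) // [0x18]
}
```

The right-shifting variant mirrors this, filling the vacated upper bits from the lower bits of y.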
Uint32x8 : expvar.Var
Uint32x8 : fmt.Stringer
func BroadcastUint32x8(x uint32) Uint32x8
func LoadMaskedUint32x8(y *[8]uint32, mask Mask32x8) Uint32x8
func LoadUint32x8(y *[8]uint32) Uint32x8
func LoadUint32x8Slice(s []uint32) Uint32x8
func LoadUint32x8SlicePart(s []uint32) Uint32x8
func Float32x8.AsUint32x8() (to Uint32x8)
func Float32x8.ConvertToUint32() Uint32x8
func Float64x4.AsUint32x8() (to Uint32x8)
func Float64x8.ConvertToUint32() Uint32x8
func Int16x16.AsUint32x8() (to Uint32x8)
func Int32x8.AsUint32x8() (to Uint32x8)
func Int64x4.AsUint32x8() (to Uint32x8)
func Int8x32.AsUint32x8() (to Uint32x8)
func Uint16x16.AsUint32x8() (to Uint32x8)
func Uint16x8.ExtendToUint32() Uint32x8
func Uint32x16.GetHi() Uint32x8
func Uint32x16.GetLo() Uint32x8
func Uint32x4.Broadcast256() Uint32x8
func Uint32x8.Add(y Uint32x8) Uint32x8
func Uint32x8.AddPairs(y Uint32x8) Uint32x8
func Uint32x8.And(y Uint32x8) Uint32x8
func Uint32x8.AndNot(y Uint32x8) Uint32x8
func Uint32x8.Compress(mask Mask32x8) Uint32x8
func Uint32x8.ConcatPermute(y Uint32x8, indices Uint32x8) Uint32x8
func Uint32x8.Expand(mask Mask32x8) Uint32x8
func Uint32x8.InterleaveHiGrouped(y Uint32x8) Uint32x8
func Uint32x8.InterleaveLoGrouped(y Uint32x8) Uint32x8
func Uint32x8.LeadingZeros() Uint32x8
func Uint32x8.Masked(mask Mask32x8) Uint32x8
func Uint32x8.Max(y Uint32x8) Uint32x8
func Uint32x8.Merge(y Uint32x8, mask Mask32x8) Uint32x8
func Uint32x8.Min(y Uint32x8) Uint32x8
func Uint32x8.Mul(y Uint32x8) Uint32x8
func Uint32x8.Not() Uint32x8
func Uint32x8.OnesCount() Uint32x8
func Uint32x8.Or(y Uint32x8) Uint32x8
func Uint32x8.Permute(indices Uint32x8) Uint32x8
func Uint32x8.PermuteScalarsGrouped(a, b, c, d uint8) Uint32x8
func Uint32x8.RotateAllLeft(shift uint8) Uint32x8
func Uint32x8.RotateAllRight(shift uint8) Uint32x8
func Uint32x8.RotateLeft(y Uint32x8) Uint32x8
func Uint32x8.RotateRight(y Uint32x8) Uint32x8
func Uint32x8.Select128FromPair(lo, hi uint8, y Uint32x8) Uint32x8
func Uint32x8.SelectFromPairGrouped(a, b, c, d uint8, y Uint32x8) Uint32x8
func Uint32x8.SetHi(y Uint32x4) Uint32x8
func Uint32x8.SetLo(y Uint32x4) Uint32x8
func Uint32x8.ShiftAllLeft(y uint64) Uint32x8
func Uint32x8.ShiftAllLeftConcat(shift uint8, y Uint32x8) Uint32x8
func Uint32x8.ShiftAllRight(y uint64) Uint32x8
func Uint32x8.ShiftAllRightConcat(shift uint8, y Uint32x8) Uint32x8
func Uint32x8.ShiftLeft(y Uint32x8) Uint32x8
func Uint32x8.ShiftLeftConcat(y Uint32x8, z Uint32x8) Uint32x8
func Uint32x8.ShiftRight(y Uint32x8) Uint32x8
func Uint32x8.ShiftRightConcat(y Uint32x8, z Uint32x8) Uint32x8
func Uint32x8.Sub(y Uint32x8) Uint32x8
func Uint32x8.SubPairs(y Uint32x8) Uint32x8
func Uint32x8.Xor(y Uint32x8) Uint32x8
func Uint64x4.AsUint32x8() (to Uint32x8)
func Uint64x8.SaturateToUint32() Uint32x8
func Uint64x8.TruncateToUint32() Uint32x8
func Uint8x16.ExtendLo8ToUint32x8() Uint32x8
func Uint8x32.AsUint32x8() (to Uint32x8)
func Float32x8.ConcatPermute(y Float32x8, indices Uint32x8) Float32x8
func Float32x8.Permute(indices Uint32x8) Float32x8
func Int32x8.ConcatPermute(y Int32x8, indices Uint32x8) Int32x8
func Int32x8.Permute(indices Uint32x8) Int32x8
func Uint32x16.SetHi(y Uint32x8) Uint32x16
func Uint32x16.SetLo(y Uint32x8) Uint32x16
func Uint32x8.Add(y Uint32x8) Uint32x8
func Uint32x8.AddPairs(y Uint32x8) Uint32x8
func Uint32x8.And(y Uint32x8) Uint32x8
func Uint32x8.AndNot(y Uint32x8) Uint32x8
func Uint32x8.ConcatPermute(y Uint32x8, indices Uint32x8) Uint32x8
func Uint32x8.Equal(y Uint32x8) Mask32x8
func Uint32x8.Greater(y Uint32x8) Mask32x8
func Uint32x8.GreaterEqual(y Uint32x8) Mask32x8
func Uint32x8.InterleaveHiGrouped(y Uint32x8) Uint32x8
func Uint32x8.InterleaveLoGrouped(y Uint32x8) Uint32x8
func Uint32x8.Less(y Uint32x8) Mask32x8
func Uint32x8.LessEqual(y Uint32x8) Mask32x8
func Uint32x8.Max(y Uint32x8) Uint32x8
func Uint32x8.Merge(y Uint32x8, mask Mask32x8) Uint32x8
func Uint32x8.Min(y Uint32x8) Uint32x8
func Uint32x8.Mul(y Uint32x8) Uint32x8
func Uint32x8.MulEvenWiden(y Uint32x8) Uint64x4
func Uint32x8.NotEqual(y Uint32x8) Mask32x8
func Uint32x8.Or(y Uint32x8) Uint32x8
func Uint32x8.Permute(indices Uint32x8) Uint32x8
func Uint32x8.RotateLeft(y Uint32x8) Uint32x8
func Uint32x8.RotateRight(y Uint32x8) Uint32x8
func Uint32x8.SaturateToUint16Concat(y Uint32x8) Uint16x16
func Uint32x8.Select128FromPair(lo, hi uint8, y Uint32x8) Uint32x8
func Uint32x8.SelectFromPairGrouped(a, b, c, d uint8, y Uint32x8) Uint32x8
func Uint32x8.ShiftAllLeftConcat(shift uint8, y Uint32x8) Uint32x8
func Uint32x8.ShiftAllRightConcat(shift uint8, y Uint32x8) Uint32x8
func Uint32x8.ShiftLeft(y Uint32x8) Uint32x8
func Uint32x8.ShiftLeftConcat(y Uint32x8, z Uint32x8) Uint32x8
func Uint32x8.ShiftRight(y Uint32x8) Uint32x8
func Uint32x8.ShiftRightConcat(y Uint32x8, z Uint32x8) Uint32x8
func Uint32x8.Sub(y Uint32x8) Uint32x8
func Uint32x8.SubPairs(y Uint32x8) Uint32x8
func Uint32x8.Xor(y Uint32x8) Uint32x8
func Uint8x32.AESDecryptLastRound(y Uint32x8) Uint8x32
func Uint8x32.AESDecryptOneRound(y Uint32x8) Uint8x32
func Uint8x32.AESEncryptLastRound(y Uint32x8) Uint8x32
func Uint8x32.AESEncryptOneRound(y Uint32x8) Uint8x32
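The comparison methods above (Equal, Greater, Less, and so on) produce masks that are then consumed by Merge or Masked. The following is a rough scalar sketch of those semantics, with plain Go slices standing in for Uint32x8 and Mask32x8; the helper names are illustrative, not part of the package.

```go
package main

import "fmt"

// greater models the documented Greater: an elementwise x > y comparison
// producing a per-element mask.
func greater(x, y []uint32) []bool {
	m := make([]bool, len(x))
	for i := range x {
		m[i] = x[i] > y[i]
	}
	return m
}

// merge models the documented Merge: keep x where mask is true,
// take y where mask is false.
func merge(x, y []uint32, mask []bool) []uint32 {
	r := make([]uint32, len(x))
	for i := range x {
		if mask[i] {
			r[i] = x[i]
		} else {
			r[i] = y[i]
		}
	}
	return r
}

func main() {
	x := []uint32{5, 1, 7, 2}
	y := []uint32{2, 9, 3, 8}
	// Greater followed by Merge computes an elementwise max.
	fmt.Println(merge(x, y, greater(x, y))) // [5 9 7 8]
}
```

The compare-then-Merge pattern is the one the package documentation says the compiler recognizes and may fuse into a single masked instruction on AVX512.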
Uint64x2 is a 128-bit SIMD vector of 2 uint64 values. Add adds corresponding elements of two vectors.
Asm: VPADDQ, CPU Feature: AVX And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX AndNot performs a bitwise x &^ y.
Asm: VPANDN, CPU Feature: AVX Float32x4 converts from Uint64x2 to Float32x4 Float64x2 converts from Uint64x2 to Float64x2 Int16x8 converts from Uint64x2 to Int16x8 Int32x4 converts from Uint64x2 to Int32x4 Int64x2 converts from Uint64x2 to Int64x2 Int8x16 converts from Uint64x2 to Int8x16 Uint16x8 converts from Uint64x2 to Uint16x8 Uint32x4 converts from Uint64x2 to Uint32x4 Uint8x16 converts from Uint64x2 to Uint8x16 Broadcast128 copies element zero of its (128-bit) input to all elements of
the 128-bit output vector.
Asm: VPBROADCASTQ, CPU Feature: AVX2 Broadcast256 copies element zero of its (128-bit) input to all elements of
the 256-bit output vector.
Asm: VPBROADCASTQ, CPU Feature: AVX2 Broadcast512 copies element zero of its (128-bit) input to all elements of
the 512-bit output vector.
Asm: VPBROADCASTQ, CPU Feature: AVX512 CarrylessMultiply computes one of four possible carryless
multiplications of selected high and low halves of x and y,
depending on the values of a and b, returning the 128-bit
product in the concatenated two elements of the result.
a selects the low (0) or high (1) element of x and
b selects the low (0) or high (1) element of y.
A carryless multiplication uses bitwise XOR instead of
add-with-carry, for example (in base two):
11 * 11 = 11 * (10 ^ 1) = (11 * 10) ^ (11 * 1) = 110 ^ 11 = 101
This also models multiplication of polynomials with coefficients
from GF(2) -- 11 * 11 models (x+1)*(x+1) = x**2 + (1^1)x + 1 =
x**2 + 0x + 1 = x**2 + 1 modeled by 101. (Note that "+" adds
polynomial terms, but coefficients "add" with XOR.)
Constant values of a and b will result in better performance;
otherwise the intrinsic may translate into a jump table.
Asm: VPCLMULQDQ, CPU Feature: AVX Compress performs a compression on vector x using mask,
selecting elements as indicated by mask and packing them into lower-indexed elements.
Asm: VPCOMPRESSQ, CPU Feature: AVX512 ConcatPermute performs a full permutation of vectors x and y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the bits needed to represent an index into xy are used in indices' elements.
Asm: VPERMI2Q, CPU Feature: AVX512 ConvertToFloat32 converts element values to float32.
Asm: VCVTUQQ2PSX, CPU Feature: AVX512 ConvertToFloat64 converts element values to float64.
Asm: VCVTUQQ2PD, CPU Feature: AVX512 Equal returns x equals y, elementwise.
Asm: VPCMPEQQ, CPU Feature: AVX Expand performs an expansion on a vector x whose elements are packed into the lower part.
The expansion distributes those elements to the positions indicated by mask, from lower mask elements to upper, in order.
Asm: VPEXPANDQ, CPU Feature: AVX512 GetElem retrieves a single constant-indexed element's value.
index results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPEXTRQ, CPU Feature: AVX Greater returns a mask whose elements indicate whether x > y.
Emulated, CPU Feature: AVX GreaterEqual returns a mask whose elements indicate whether x >= y.
Emulated, CPU Feature: AVX InterleaveHi interleaves the elements of the high halves of x and y.
Asm: VPUNPCKHQDQ, CPU Feature: AVX InterleaveLo interleaves the elements of the low halves of x and y.
Asm: VPUNPCKLQDQ, CPU Feature: AVX IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x.
x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y.
Asm: VPTEST, CPU Feature: AVX LeadingZeros counts the leading zeros of each element in x.
Asm: VPLZCNTQ, CPU Feature: AVX512 Len returns the number of elements in a Uint64x2. Less returns a mask whose elements indicate whether x < y.
Emulated, CPU Feature: AVX LessEqual returns a mask whose elements indicate whether x <= y.
Emulated, CPU Feature: AVX Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VPMAXUQ, CPU Feature: AVX512 Merge returns x but with elements set to y where mask is false. Min computes the minimum of corresponding elements.
Asm: VPMINUQ, CPU Feature: AVX512 Mul multiplies corresponding elements of two vectors.
Asm: VPMULLQ, CPU Feature: AVX512 Not returns the bitwise complement of x.
Emulated, CPU Feature: AVX NotEqual returns a mask whose elements indicate whether x != y.
Emulated, CPU Feature: AVX OnesCount counts the number of set bits in each element.
Asm: VPOPCNTQ, CPU Feature: AVX512VPOPCNTDQ Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPROLQ, CPU Feature: AVX512 RotateAllRight rotates each element to the right by the number of bits specified by the immediate.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPRORQ, CPU Feature: AVX512 RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.
Asm: VPROLVQ, CPU Feature: AVX512 RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.
Asm: VPRORVQ, CPU Feature: AVX512 SaturateToUint16 converts element values to uint16.
Conversion is done with saturation on the vector elements.
Asm: VPMOVUSQW, CPU Feature: AVX512 SaturateToUint32 converts element values to uint32.
Conversion is done with saturation on the vector elements.
Asm: VPMOVUSQD, CPU Feature: AVX512 SelectFromPair returns the selection of two elements from the two
vectors x and y, where selector values in the range 0-1 specify
elements from x and values in the range 2-3 specify elements 0-1
of y. When the selectors are constants, the selection can be
implemented in a single instruction.
If the selectors are not constant this will translate to a function
call.
Asm: VSHUFPD, CPU Feature: AVX SetElem sets a single constant-indexed element's value.
index results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPINSRQ, CPU Feature: AVX ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLQ, CPU Feature: AVX ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the
immediate (only the lower 6 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDQ, CPU Feature: AVX512VBMI2 ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are zeroed.
Asm: VPSRLQ, CPU Feature: AVX ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the
immediate (only the lower 6 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDQ, CPU Feature: AVX512VBMI2 ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVQ, CPU Feature: AVX2 ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the
corresponding elements in y (only the lower 6 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVQ, CPU Feature: AVX512VBMI2 ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are zeroed.
Asm: VPSRLVQ, CPU Feature: AVX2 ShiftRightConcat shifts each element of x to the right by the number of bits specified by the
corresponding elements in y (only the lower 6 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVQ, CPU Feature: AVX512VBMI2 Store stores a Uint64x2 to an array. StoreMasked stores a Uint64x2 to an array,
at those elements enabled by mask
Asm: VMASKMOVQ, CPU Feature: AVX2 StoreSlice stores x into a slice of at least 2 uint64s. StoreSlicePart stores the 2 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 2 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x. Sub subtracts corresponding elements of two vectors.
Asm: VPSUBQ, CPU Feature: AVX TruncateToUint16 converts element values to uint16.
Conversion is done with truncation on the vector elements.
Asm: VPMOVQW, CPU Feature: AVX512 TruncateToUint32 converts element values to uint32.
Conversion is done with truncation on the vector elements.
Asm: VPMOVQD, CPU Feature: AVX512 TruncateToUint8 converts element values to uint8.
Conversion is done with truncation on the vector elements.
Results are packed to low elements in the returned vector; its upper elements are zero-cleared.
Asm: VPMOVQB, CPU Feature: AVX512 Xor performs a bitwise XOR operation between two vectors.
Asm: VPXOR, CPU Feature: AVX
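The ConcatPermute indexing rule described above can be sketched in scalar Go. This is a model of the documented semantics on plain slices, not the VPERMI2Q intrinsic; the function name is illustrative.

```go
package main

import "fmt"

// concatPermute models ConcatPermute: xy is the concatenation of
// x (lower half) and y (upper half), and the result gathers
// xy[indices[i]] for each i. Only the bits needed to index xy are
// used, hence the modulo.
func concatPermute(x, y, indices []uint64) []uint64 {
	xy := append(append([]uint64{}, x...), y...)
	r := make([]uint64, len(x))
	for i, idx := range indices {
		r[i] = xy[idx%uint64(len(xy))]
	}
	return r
}

func main() {
	x := []uint64{10, 11}
	y := []uint64{20, 21}
	// Index 3 reaches into y's top element; index 0 into x's bottom.
	fmt.Println(concatPermute(x, y, []uint64{3, 0})) // [21 10]
}
```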
Uint64x2 : expvar.Var
Uint64x2 : fmt.Stringer
func BroadcastUint64x2(x uint64) Uint64x2
func LoadMaskedUint64x2(y *[2]uint64, mask Mask64x2) Uint64x2
func LoadUint64x2(y *[2]uint64) Uint64x2
func LoadUint64x2Slice(s []uint64) Uint64x2
func LoadUint64x2SlicePart(s []uint64) Uint64x2
func Float32x4.AsUint64x2() (to Uint64x2)
func Float64x2.AsUint64x2() (to Uint64x2)
func Float64x2.ConvertToUint64() Uint64x2
func Int16x8.AsUint64x2() (to Uint64x2)
func Int32x4.AsUint64x2() (to Uint64x2)
func Int64x2.AsUint64x2() (to Uint64x2)
func Int8x16.AsUint64x2() (to Uint64x2)
func Uint16x8.AsUint64x2() (to Uint64x2)
func Uint16x8.ExtendLo2ToUint64x2() Uint64x2
func Uint32x4.AsUint64x2() (to Uint64x2)
func Uint32x4.ExtendLo2ToUint64x2() Uint64x2
func Uint32x4.MulEvenWiden(y Uint32x4) Uint64x2
func Uint64x2.Add(y Uint64x2) Uint64x2
func Uint64x2.And(y Uint64x2) Uint64x2
func Uint64x2.AndNot(y Uint64x2) Uint64x2
func Uint64x2.Broadcast128() Uint64x2
func Uint64x2.CarrylessMultiply(a, b uint8, y Uint64x2) Uint64x2
func Uint64x2.Compress(mask Mask64x2) Uint64x2
func Uint64x2.ConcatPermute(y Uint64x2, indices Uint64x2) Uint64x2
func Uint64x2.Expand(mask Mask64x2) Uint64x2
func Uint64x2.InterleaveHi(y Uint64x2) Uint64x2
func Uint64x2.InterleaveLo(y Uint64x2) Uint64x2
func Uint64x2.LeadingZeros() Uint64x2
func Uint64x2.Masked(mask Mask64x2) Uint64x2
func Uint64x2.Max(y Uint64x2) Uint64x2
func Uint64x2.Merge(y Uint64x2, mask Mask64x2) Uint64x2
func Uint64x2.Min(y Uint64x2) Uint64x2
func Uint64x2.Mul(y Uint64x2) Uint64x2
func Uint64x2.Not() Uint64x2
func Uint64x2.OnesCount() Uint64x2
func Uint64x2.Or(y Uint64x2) Uint64x2
func Uint64x2.RotateAllLeft(shift uint8) Uint64x2
func Uint64x2.RotateAllRight(shift uint8) Uint64x2
func Uint64x2.RotateLeft(y Uint64x2) Uint64x2
func Uint64x2.RotateRight(y Uint64x2) Uint64x2
func Uint64x2.SelectFromPair(a, b uint8, y Uint64x2) Uint64x2
func Uint64x2.SetElem(index uint8, y uint64) Uint64x2
func Uint64x2.ShiftAllLeft(y uint64) Uint64x2
func Uint64x2.ShiftAllLeftConcat(shift uint8, y Uint64x2) Uint64x2
func Uint64x2.ShiftAllRight(y uint64) Uint64x2
func Uint64x2.ShiftAllRightConcat(shift uint8, y Uint64x2) Uint64x2
func Uint64x2.ShiftLeft(y Uint64x2) Uint64x2
func Uint64x2.ShiftLeftConcat(y Uint64x2, z Uint64x2) Uint64x2
func Uint64x2.ShiftRight(y Uint64x2) Uint64x2
func Uint64x2.ShiftRightConcat(y Uint64x2, z Uint64x2) Uint64x2
func Uint64x2.Sub(y Uint64x2) Uint64x2
func Uint64x2.Xor(y Uint64x2) Uint64x2
func Uint64x4.GetHi() Uint64x2
func Uint64x4.GetLo() Uint64x2
func Uint8x16.AsUint64x2() (to Uint64x2)
func Uint8x16.ExtendLo2ToUint64x2() Uint64x2
func Float64x2.ConcatPermute(y Float64x2, indices Uint64x2) Float64x2
func Int64x2.ConcatPermute(y Int64x2, indices Uint64x2) Int64x2
func Uint64x2.Add(y Uint64x2) Uint64x2
func Uint64x2.And(y Uint64x2) Uint64x2
func Uint64x2.AndNot(y Uint64x2) Uint64x2
func Uint64x2.CarrylessMultiply(a, b uint8, y Uint64x2) Uint64x2
func Uint64x2.ConcatPermute(y Uint64x2, indices Uint64x2) Uint64x2
func Uint64x2.Equal(y Uint64x2) Mask64x2
func Uint64x2.Greater(y Uint64x2) Mask64x2
func Uint64x2.GreaterEqual(y Uint64x2) Mask64x2
func Uint64x2.InterleaveHi(y Uint64x2) Uint64x2
func Uint64x2.InterleaveLo(y Uint64x2) Uint64x2
func Uint64x2.Less(y Uint64x2) Mask64x2
func Uint64x2.LessEqual(y Uint64x2) Mask64x2
func Uint64x2.Max(y Uint64x2) Uint64x2
func Uint64x2.Merge(y Uint64x2, mask Mask64x2) Uint64x2
func Uint64x2.Min(y Uint64x2) Uint64x2
func Uint64x2.Mul(y Uint64x2) Uint64x2
func Uint64x2.NotEqual(y Uint64x2) Mask64x2
func Uint64x2.Or(y Uint64x2) Uint64x2
func Uint64x2.RotateLeft(y Uint64x2) Uint64x2
func Uint64x2.RotateRight(y Uint64x2) Uint64x2
func Uint64x2.SelectFromPair(a, b uint8, y Uint64x2) Uint64x2
func Uint64x2.ShiftAllLeftConcat(shift uint8, y Uint64x2) Uint64x2
func Uint64x2.ShiftAllRightConcat(shift uint8, y Uint64x2) Uint64x2
func Uint64x2.ShiftLeft(y Uint64x2) Uint64x2
func Uint64x2.ShiftLeftConcat(y Uint64x2, z Uint64x2) Uint64x2
func Uint64x2.ShiftRight(y Uint64x2) Uint64x2
func Uint64x2.ShiftRightConcat(y Uint64x2, z Uint64x2) Uint64x2
func Uint64x2.Sub(y Uint64x2) Uint64x2
func Uint64x2.Xor(y Uint64x2) Uint64x2
func Uint64x4.SetHi(y Uint64x2) Uint64x4
func Uint64x4.SetLo(y Uint64x2) Uint64x4
func Uint8x16.GaloisFieldAffineTransform(y Uint64x2, b uint8) Uint8x16
func Uint8x16.GaloisFieldAffineTransformInverse(y Uint64x2, b uint8) Uint8x16
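The carryless multiplication documented for Uint64x2.CarrylessMultiply can be modeled in scalar Go: partial products are combined with XOR rather than add-with-carry. This is a reference sketch of the semantics, not the VPCLMULQDQ intrinsic; the function name is illustrative.

```go
package main

import "fmt"

// clmul64 carrylessly multiplies two 64-bit values, returning the full
// 128-bit product as (hi, lo). For each set bit i of y, the shifted
// multiplicand x<<i is XORed (not added) into the product.
func clmul64(x, y uint64) (hi, lo uint64) {
	for i := uint(0); i < 64; i++ {
		if y&(1<<i) != 0 {
			lo ^= x << i
			if i > 0 {
				hi ^= x >> (64 - i)
			}
		}
	}
	return hi, lo
}

func main() {
	// The documented example: 0b11 clmul 0b11 = 0b101.
	hi, lo := clmul64(3, 3)
	fmt.Printf("%b %b\n", hi, lo) // 0 101
}
```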
Uint64x4 is a 256-bit SIMD vector of 4 uint64 values. Add adds corresponding elements of two vectors.
Asm: VPADDQ, CPU Feature: AVX2 And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX2 AndNot performs a bitwise x &^ y.
Asm: VPANDN, CPU Feature: AVX2 Float32x8 converts from Uint64x4 to Float32x8 Float64x4 converts from Uint64x4 to Float64x4 Int16x16 converts from Uint64x4 to Int16x16 Int32x8 converts from Uint64x4 to Int32x8 Int64x4 converts from Uint64x4 to Int64x4 Int8x32 converts from Uint64x4 to Int8x32 Uint16x16 converts from Uint64x4 to Uint16x16 Uint32x8 converts from Uint64x4 to Uint32x8 Uint8x32 converts from Uint64x4 to Uint8x32 CarrylessMultiplyGrouped computes one of four possible carryless
multiplications of selected high and low halves of each of the two
128-bit lanes of x and y, depending on the values of a and b,
and returns the four 128-bit products in the result's lanes.
a selects the low (0) or high (1) elements of x's lanes and
b selects the low (0) or high (1) elements of y's lanes.
A carryless multiplication uses bitwise XOR instead of
add-with-carry, for example (in base two):
11 * 11 = 11 * (10 ^ 1) = (11 * 10) ^ (11 * 1) = 110 ^ 11 = 101
This also models multiplication of polynomials with coefficients
from GF(2) -- 11 * 11 models (x+1)*(x+1) = x**2 + (1^1)x + 1 =
x**2 + 0x + 1 = x**2 + 1 modeled by 101. (Note that "+" adds
polynomial terms, but coefficients "add" with XOR.)
Constant values of a and b will result in better performance;
otherwise the intrinsic may translate into a jump table.
Asm: VPCLMULQDQ, CPU Feature: AVX512VPCLMULQDQ Compress performs a compression on vector x using mask,
selecting elements as indicated by mask and packing them into lower-indexed elements.
Asm: VPCOMPRESSQ, CPU Feature: AVX512 ConcatPermute performs a full permutation of vectors x and y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the bits needed to represent an index into xy are used in indices' elements.
Asm: VPERMI2Q, CPU Feature: AVX512 ConvertToFloat32 converts element values to float32.
Asm: VCVTUQQ2PSY, CPU Feature: AVX512 ConvertToFloat64 converts element values to float64.
Asm: VCVTUQQ2PD, CPU Feature: AVX512 Equal returns x equals y, elementwise.
Asm: VPCMPEQQ, CPU Feature: AVX2 Expand performs an expansion on a vector x whose elements are packed into the lower part.
The expansion distributes those elements to the positions indicated by mask, from lower mask elements to upper, in order.
Asm: VPEXPANDQ, CPU Feature: AVX512 GetHi returns the upper half of x.
Asm: VEXTRACTI128, CPU Feature: AVX2 GetLo returns the lower half of x.
Asm: VEXTRACTI128, CPU Feature: AVX2 Greater returns a mask whose elements indicate whether x > y.
Emulated, CPU Feature: AVX2 GreaterEqual returns a mask whose elements indicate whether x >= y.
Emulated, CPU Feature: AVX2 InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.
Asm: VPUNPCKHQDQ, CPU Feature: AVX2 InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.
Asm: VPUNPCKLQDQ, CPU Feature: AVX2 IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x.
x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y.
Asm: VPTEST, CPU Feature: AVX LeadingZeros counts the leading zeros of each element in x.
Asm: VPLZCNTQ, CPU Feature: AVX512 Len returns the number of elements in a Uint64x4. Less returns a mask whose elements indicate whether x < y.
Emulated, CPU Feature: AVX2 LessEqual returns a mask whose elements indicate whether x <= y.
Emulated, CPU Feature: AVX2 Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VPMAXUQ, CPU Feature: AVX512 Merge returns x but with elements set to y where mask is false. Min computes the minimum of corresponding elements.
Asm: VPMINUQ, CPU Feature: AVX512 Mul multiplies corresponding elements of two vectors.
Asm: VPMULLQ, CPU Feature: AVX512 Not returns the bitwise complement of x.
Emulated, CPU Feature: AVX2 NotEqual returns a mask whose elements indicate whether x != y.
Emulated, CPU Feature: AVX2 OnesCount counts the number of set bits in each element.
Asm: VPOPCNTQ, CPU Feature: AVX512VPOPCNTDQ Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX2 Permute performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 2 bits (values 0-3) of each element of indices are used.
Asm: VPERMQ, CPU Feature: AVX512 RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPROLQ, CPU Feature: AVX512 RotateAllRight rotates each element to the right by the number of bits specified by the immediate.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPRORQ, CPU Feature: AVX512 RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.
Asm: VPROLVQ, CPU Feature: AVX512 RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.
Asm: VPRORVQ, CPU Feature: AVX512 SaturateToUint16 converts element values to uint16.
Conversion is done with saturation on the vector elements.
Asm: VPMOVUSQW, CPU Feature: AVX512 SaturateToUint32 converts element values to uint32.
Conversion is done with saturation on the vector elements.
Asm: VPMOVUSQD, CPU Feature: AVX512 Select128FromPair treats the 256-bit vectors x and y as a single vector of four
128-bit elements, and returns a 256-bit result formed by
concatenating the two elements specified by lo and hi.
For example,
{40, 41, 50, 51}.Select128FromPair(3, 0, {60, 61, 70, 71})
returns {70, 71, 40, 41}.
lo, hi result in better performance when they are constants; non-constant values will be translated into a jump table.
lo, hi should be between 0 and 3, inclusive; other values may result in a runtime panic.
Asm: VPERM2I128, CPU Feature: AVX2 SelectFromPairGrouped returns, for each of the two 128-bit halves of
the vectors x and y, the selection of two elements from the two
vectors x and y, where selector values in the range 0-1 specify
elements from x and values in the range 2-3 specify elements 0-1
of y. When the selectors are constants, the selection can be
implemented in a single instruction.
If the selectors are not constant this will translate to a function
call.
Asm: VSHUFPD, CPU Feature: AVX SetHi returns x with its upper half set to y.
Asm: VINSERTI128, CPU Feature: AVX2 SetLo returns x with its lower half set to y.
Asm: VINSERTI128, CPU Feature: AVX2 ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLQ, CPU Feature: AVX2 ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the
immediate (only the lower 6 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDQ, CPU Feature: AVX512VBMI2 ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are zeroed.
Asm: VPSRLQ, CPU Feature: AVX2 ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the
immediate (only the lower 6 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDQ, CPU Feature: AVX512VBMI2 ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVQ, CPU Feature: AVX2 ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the
corresponding elements in y (only the lower 6 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVQ, CPU Feature: AVX512VBMI2 ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are zeroed.
Asm: VPSRLVQ, CPU Feature: AVX2 ShiftRightConcat shifts each element of x to the right by the number of bits specified by the
corresponding elements in y (only the lower 6 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVQ, CPU Feature: AVX512VBMI2 Store stores a Uint64x4 to an array. StoreMasked stores a Uint64x4 to an array,
at those elements enabled by mask
Asm: VMASKMOVQ, CPU Feature: AVX2 StoreSlice stores x into a slice of at least 4 uint64s. StoreSlicePart stores the 4 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 4 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x. Sub subtracts corresponding elements of two vectors.
Asm: VPSUBQ, CPU Feature: AVX2 TruncateToUint16 converts element values to uint16.
Conversion is done with truncation on the vector elements.
Asm: VPMOVQW, CPU Feature: AVX512 TruncateToUint32 converts element values to uint32.
Conversion is done with truncation on the vector elements.
Asm: VPMOVQD, CPU Feature: AVX512 TruncateToUint8 converts element values to uint8.
Conversion is done with truncation on the vector elements.
Results are packed to low elements in the returned vector; its upper elements are zero-cleared.
Asm: VPMOVQB, CPU Feature: AVX512 Xor performs a bitwise XOR operation between two vectors.
Asm: VPXOR, CPU Feature: AVX2
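The Select128FromPair semantics described above can be sketched in scalar Go, using the example from the documentation. This models the documented behavior on plain slices, not the VPERM2I128 intrinsic; the function name is illustrative.

```go
package main

import "fmt"

// select128FromPair models Select128FromPair: x and y are viewed as four
// 128-bit elements (two uint64 each), numbered 0-1 within x and 2-3
// within y; the result concatenates element lo and element hi.
func select128FromPair(x, y []uint64, lo, hi int) []uint64 {
	xy := [][]uint64{x[0:2], x[2:4], y[0:2], y[2:4]}
	return append(append([]uint64{}, xy[lo]...), xy[hi]...)
}

func main() {
	x := []uint64{40, 41, 50, 51}
	y := []uint64{60, 61, 70, 71}
	// The documented example: elements 3 and 0 yield {70, 71, 40, 41}.
	fmt.Println(select128FromPair(x, y, 3, 0)) // [70 71 40 41]
}
```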
Uint64x4 : expvar.Var
Uint64x4 : fmt.Stringer
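The ShiftAllLeftConcat operation described above is a double-width "funnel" shift per element: x is shifted left and the emptied low bits are filled from the top of y. A single-element scalar sketch of that semantics (illustrative helper, not the VPSHLDQ intrinsic):

```go
package main

import "fmt"

// shiftAllLeftConcat models ShiftAllLeftConcat for one 64-bit element:
// shift x left by n bits and fill the emptied low bits with the upper
// n bits of y.
func shiftAllLeftConcat(x, y uint64, n uint) uint64 {
	if n == 0 {
		// In Go, y>>64 is already 0, so this case would fall out
		// naturally; kept explicit for clarity.
		return x
	}
	return x<<n | y>>(64-n)
}

func main() {
	fmt.Printf("%#x\n", shiftAllLeftConcat(0x1, 0x8000000000000000, 4)) // 0x18
}
```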
func BroadcastUint64x4(x uint64) Uint64x4
func LoadMaskedUint64x4(y *[4]uint64, mask Mask64x4) Uint64x4
func LoadUint64x4(y *[4]uint64) Uint64x4
func LoadUint64x4Slice(s []uint64) Uint64x4
func LoadUint64x4SlicePart(s []uint64) Uint64x4
func Float32x4.ConvertToUint64() Uint64x4
func Float32x8.AsUint64x4() (to Uint64x4)
func Float64x4.AsUint64x4() (to Uint64x4)
func Float64x4.ConvertToUint64() Uint64x4
func Int16x16.AsUint64x4() (to Uint64x4)
func Int32x8.AsUint64x4() (to Uint64x4)
func Int64x4.AsUint64x4() (to Uint64x4)
func Int8x32.AsUint64x4() (to Uint64x4)
func Uint16x16.AsUint64x4() (to Uint64x4)
func Uint16x8.ExtendLo4ToUint64x4() Uint64x4
func Uint32x4.ExtendToUint64() Uint64x4
func Uint32x8.AsUint64x4() (to Uint64x4)
func Uint32x8.MulEvenWiden(y Uint32x8) Uint64x4
func Uint64x2.Broadcast256() Uint64x4
func Uint64x4.Add(y Uint64x4) Uint64x4
func Uint64x4.And(y Uint64x4) Uint64x4
func Uint64x4.AndNot(y Uint64x4) Uint64x4
func Uint64x4.CarrylessMultiplyGrouped(a, b uint8, y Uint64x4) Uint64x4
func Uint64x4.Compress(mask Mask64x4) Uint64x4
func Uint64x4.ConcatPermute(y Uint64x4, indices Uint64x4) Uint64x4
func Uint64x4.Expand(mask Mask64x4) Uint64x4
func Uint64x4.InterleaveHiGrouped(y Uint64x4) Uint64x4
func Uint64x4.InterleaveLoGrouped(y Uint64x4) Uint64x4
func Uint64x4.LeadingZeros() Uint64x4
func Uint64x4.Masked(mask Mask64x4) Uint64x4
func Uint64x4.Max(y Uint64x4) Uint64x4
func Uint64x4.Merge(y Uint64x4, mask Mask64x4) Uint64x4
func Uint64x4.Min(y Uint64x4) Uint64x4
func Uint64x4.Mul(y Uint64x4) Uint64x4
func Uint64x4.Not() Uint64x4
func Uint64x4.OnesCount() Uint64x4
func Uint64x4.Or(y Uint64x4) Uint64x4
func Uint64x4.Permute(indices Uint64x4) Uint64x4
func Uint64x4.RotateAllLeft(shift uint8) Uint64x4
func Uint64x4.RotateAllRight(shift uint8) Uint64x4
func Uint64x4.RotateLeft(y Uint64x4) Uint64x4
func Uint64x4.RotateRight(y Uint64x4) Uint64x4
func Uint64x4.Select128FromPair(lo, hi uint8, y Uint64x4) Uint64x4
func Uint64x4.SelectFromPairGrouped(a, b uint8, y Uint64x4) Uint64x4
func Uint64x4.SetHi(y Uint64x2) Uint64x4
func Uint64x4.SetLo(y Uint64x2) Uint64x4
func Uint64x4.ShiftAllLeft(y uint64) Uint64x4
func Uint64x4.ShiftAllLeftConcat(shift uint8, y Uint64x4) Uint64x4
func Uint64x4.ShiftAllRight(y uint64) Uint64x4
func Uint64x4.ShiftAllRightConcat(shift uint8, y Uint64x4) Uint64x4
func Uint64x4.ShiftLeft(y Uint64x4) Uint64x4
func Uint64x4.ShiftLeftConcat(y Uint64x4, z Uint64x4) Uint64x4
func Uint64x4.ShiftRight(y Uint64x4) Uint64x4
func Uint64x4.ShiftRightConcat(y Uint64x4, z Uint64x4) Uint64x4
func Uint64x4.Sub(y Uint64x4) Uint64x4
func Uint64x4.Xor(y Uint64x4) Uint64x4
func Uint64x8.GetHi() Uint64x4
func Uint64x8.GetLo() Uint64x4
func Uint8x16.ExtendLo4ToUint64x4() Uint64x4
func Uint8x32.AsUint64x4() (to Uint64x4)
func Float64x4.ConcatPermute(y Float64x4, indices Uint64x4) Float64x4
func Float64x4.Permute(indices Uint64x4) Float64x4
func Int64x4.ConcatPermute(y Int64x4, indices Uint64x4) Int64x4
func Int64x4.Permute(indices Uint64x4) Int64x4
func Uint64x4.Add(y Uint64x4) Uint64x4
func Uint64x4.And(y Uint64x4) Uint64x4
func Uint64x4.AndNot(y Uint64x4) Uint64x4
func Uint64x4.CarrylessMultiplyGrouped(a, b uint8, y Uint64x4) Uint64x4
func Uint64x4.ConcatPermute(y Uint64x4, indices Uint64x4) Uint64x4
func Uint64x4.Equal(y Uint64x4) Mask64x4
func Uint64x4.Greater(y Uint64x4) Mask64x4
func Uint64x4.GreaterEqual(y Uint64x4) Mask64x4
func Uint64x4.InterleaveHiGrouped(y Uint64x4) Uint64x4
func Uint64x4.InterleaveLoGrouped(y Uint64x4) Uint64x4
func Uint64x4.Less(y Uint64x4) Mask64x4
func Uint64x4.LessEqual(y Uint64x4) Mask64x4
func Uint64x4.Max(y Uint64x4) Uint64x4
func Uint64x4.Merge(y Uint64x4, mask Mask64x4) Uint64x4
func Uint64x4.Min(y Uint64x4) Uint64x4
func Uint64x4.Mul(y Uint64x4) Uint64x4
func Uint64x4.NotEqual(y Uint64x4) Mask64x4
func Uint64x4.Or(y Uint64x4) Uint64x4
func Uint64x4.Permute(indices Uint64x4) Uint64x4
func Uint64x4.RotateLeft(y Uint64x4) Uint64x4
func Uint64x4.RotateRight(y Uint64x4) Uint64x4
func Uint64x4.Select128FromPair(lo, hi uint8, y Uint64x4) Uint64x4
func Uint64x4.SelectFromPairGrouped(a, b uint8, y Uint64x4) Uint64x4
func Uint64x4.ShiftAllLeftConcat(shift uint8, y Uint64x4) Uint64x4
func Uint64x4.ShiftAllRightConcat(shift uint8, y Uint64x4) Uint64x4
func Uint64x4.ShiftLeft(y Uint64x4) Uint64x4
func Uint64x4.ShiftLeftConcat(y Uint64x4, z Uint64x4) Uint64x4
func Uint64x4.ShiftRight(y Uint64x4) Uint64x4
func Uint64x4.ShiftRightConcat(y Uint64x4, z Uint64x4) Uint64x4
func Uint64x4.Sub(y Uint64x4) Uint64x4
func Uint64x4.Xor(y Uint64x4) Uint64x4
func Uint64x8.SetHi(y Uint64x4) Uint64x8
func Uint64x8.SetLo(y Uint64x4) Uint64x8
func Uint8x32.GaloisFieldAffineTransform(y Uint64x4, b uint8) Uint8x32
func Uint8x32.GaloisFieldAffineTransformInverse(y Uint64x4, b uint8) Uint8x32
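Compress and Expand, described above, are inverse distributions driven by a mask: Compress packs the selected elements to the low indices, Expand scatters packed low elements back to the mask's positions. A scalar sketch of both (plain slices standing in for the vector and mask types; helper names are illustrative):

```go
package main

import "fmt"

// compress models Compress: elements of x selected by mask are packed
// into the low indices of the result; remaining elements are zero.
func compress(x []uint64, mask []bool) []uint64 {
	r := make([]uint64, len(x))
	j := 0
	for i := range x {
		if mask[i] {
			r[j] = x[i]
			j++
		}
	}
	return r
}

// expand models Expand: the packed low elements of x are distributed,
// in order, to the positions where mask is true.
func expand(x []uint64, mask []bool) []uint64 {
	r := make([]uint64, len(x))
	j := 0
	for i := range x {
		if mask[i] {
			r[i] = x[j]
			j++
		}
	}
	return r
}

func main() {
	mask := []bool{true, false, true, false}
	packed := compress([]uint64{10, 20, 30, 40}, mask)
	fmt.Println(packed)               // [10 30 0 0]
	fmt.Println(expand(packed, mask)) // [10 0 30 0]
}
```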
Uint64x8 is a 512-bit SIMD vector of 8 uint64 values. Add adds corresponding elements of two vectors.
Asm: VPADDQ, CPU Feature: AVX512 And performs a bitwise AND operation between two vectors.
Asm: VPANDQ, CPU Feature: AVX512 AndNot performs a bitwise x &^ y.
Asm: VPANDNQ, CPU Feature: AVX512 Float32x16 converts from Uint64x8 to Float32x16 Float64x8 converts from Uint64x8 to Float64x8 Int16x32 converts from Uint64x8 to Int16x32 Int32x16 converts from Uint64x8 to Int32x16 Int64x8 converts from Uint64x8 to Int64x8 Int8x64 converts from Uint64x8 to Int8x64 Uint16x32 converts from Uint64x8 to Uint16x32 Uint32x16 converts from Uint64x8 to Uint32x16 Uint8x64 converts from Uint64x8 to Uint8x64 CarrylessMultiplyGrouped computes one of four possible carryless
multiplications of selected high and low halves of each of the four
128-bit lanes of x and y, depending on the values of a and b,
and returns the four 128-bit products in the result's lanes.
a selects the low (0) or high (1) elements of x's lanes and
b selects the low (0) or high (1) elements of y's lanes.
A carryless multiplication uses bitwise XOR instead of
add-with-carry, for example (in base two):
11 * 11 = 11 * (10 ^ 1) = (11 * 10) ^ (11 * 1) = 110 ^ 11 = 101
This also models multiplication of polynomials with coefficients
from GF(2) -- 11 * 11 models (x+1)*(x+1) = x**2 + (1^1)x + 1 =
x**2 + 0x + 1 = x**2 + 1 modeled by 101. (Note that "+" adds
polynomial terms, but coefficients "add" with XOR.)
Constant values of a and b will result in better performance;
otherwise the intrinsic may translate into a jump table.
Asm: VPCLMULQDQ, CPU Feature: AVX512VPCLMULQDQ Compress performs a compression on vector x using mask,
selecting elements as indicated by mask and packing them into lower-indexed elements.
Asm: VPCOMPRESSQ, CPU Feature: AVX512 ConcatPermute performs a full permutation of vectors x and y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the bits needed to represent an index into xy are used in indices' elements.
Asm: VPERMI2Q, CPU Feature: AVX512 ConvertToFloat32 converts element values to float32.
Asm: VCVTUQQ2PS, CPU Feature: AVX512 ConvertToFloat64 converts element values to float64.
Asm: VCVTUQQ2PD, CPU Feature: AVX512 Equal returns x equals y, elementwise.
Asm: VPCMPEQQ, CPU Feature: AVX512 Expand performs an expansion on a vector x whose elements are packed into the lower part.
The expansion distributes those elements to the positions indicated by mask, from lower mask elements to upper, in order.
Asm: VPEXPANDQ, CPU Feature: AVX512 GetHi returns the upper half of x.
Asm: VEXTRACTI64X4, CPU Feature: AVX512 GetLo returns the lower half of x.
Asm: VEXTRACTI64X4, CPU Feature: AVX512 Greater returns x greater-than y, elementwise.
Asm: VPCMPUQ, CPU Feature: AVX512 GreaterEqual returns x greater-than-or-equals y, elementwise.
Asm: VPCMPUQ, CPU Feature: AVX512 InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.
Asm: VPUNPCKHQDQ, CPU Feature: AVX512 InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.
Asm: VPUNPCKLQDQ, CPU Feature: AVX512 LeadingZeros counts the leading zeros of each element in x.
Asm: VPLZCNTQ, CPU Feature: AVX512 Len returns the number of elements in a Uint64x8 Less returns x less-than y, elementwise.
Asm: VPCMPUQ, CPU Feature: AVX512 LessEqual returns x less-than-or-equals y, elementwise.
Asm: VPCMPUQ, CPU Feature: AVX512 Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VPMAXUQ, CPU Feature: AVX512 Merge returns x but with elements set to y where mask is false. Min computes the minimum of corresponding elements.
Asm: VPMINUQ, CPU Feature: AVX512 Mul multiplies corresponding elements of two vectors.
Asm: VPMULLQ, CPU Feature: AVX512 Not returns the bitwise complement of x.
Emulated, CPU Feature: AVX512 NotEqual returns x not-equals y, elementwise.
Asm: VPCMPUQ, CPU Feature: AVX512 OnesCount counts the number of set bits in each element.
Asm: VPOPCNTQ, CPU Feature: AVX512VPOPCNTDQ Or performs a bitwise OR operation between two vectors.
Asm: VPORQ, CPU Feature: AVX512 Permute performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 3 bits (values 0-7) of each element of indices are used.
Asm: VPERMQ, CPU Feature: AVX512 RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPROLQ, CPU Feature: AVX512 RotateAllRight rotates each element to the right by the number of bits specified by the immediate.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPRORQ, CPU Feature: AVX512 RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.
Asm: VPROLVQ, CPU Feature: AVX512 RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.
Asm: VPRORVQ, CPU Feature: AVX512 SaturateToUint16 converts element values to uint16.
Conversion is done with saturation on the vector elements.
Asm: VPMOVUSQW, CPU Feature: AVX512 SaturateToUint32 converts element values to uint32.
Conversion is done with saturation on the vector elements.
Asm: VPMOVUSQD, CPU Feature: AVX512 SelectFromPairGrouped returns, for each of the four 128-bit subvectors
of the vectors x and y, the selection of two elements from the two
vectors x and y, where selector values in the range 0-1 specify
elements of x and values in the range 2-3 specify elements 0-1
of y. When the selectors are constants, the selection can be
implemented in a single instruction.
If the selectors are not constant, this will translate to a function
call.
Asm: VSHUFPD, CPU Feature: AVX512 SetHi returns x with its upper half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512 SetLo returns x with its lower half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512 ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLQ, CPU Feature: AVX512 ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDQ, CPU Feature: AVX512VBMI2 ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are zeroed.
Asm: VPSRLQ, CPU Feature: AVX512 ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the
immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDQ, CPU Feature: AVX512VBMI2 ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVQ, CPU Feature: AVX512 ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVQ, CPU Feature: AVX512VBMI2 ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are zeroed.
Asm: VPSRLVQ, CPU Feature: AVX512 ShiftRightConcat shifts each element of x to the right by the number of bits specified by the
corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVQ, CPU Feature: AVX512VBMI2 Store stores a Uint64x8 to an array StoreMasked stores a Uint64x8 to an array,
at those elements enabled by mask
Asm: VMOVDQU64, CPU Feature: AVX512 StoreSlice stores x into a slice of at least 8 uint64s StoreSlicePart stores the 8 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 8 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x Sub subtracts corresponding elements of two vectors.
Asm: VPSUBQ, CPU Feature: AVX512 TruncateToUint16 converts element values to uint16.
Conversion is done with truncation on the vector elements.
Asm: VPMOVQW, CPU Feature: AVX512 TruncateToUint32 converts element values to uint32.
Conversion is done with truncation on the vector elements.
Asm: VPMOVQD, CPU Feature: AVX512 TruncateToUint8 converts element values to uint8.
Conversion is done with truncation on the vector elements.
Results are packed into the low elements of the returned vector; its upper elements are zero-cleared.
Asm: VPMOVQB, CPU Feature: AVX512 Xor performs a bitwise XOR operation between two vectors.
Asm: VPXORQ, CPU Feature: AVX512
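The carryless multiplication described for CarrylessMultiplyGrouped above can be modeled in scalar Go. This is an illustrative sketch, not part of the archsimd package; the helper name clmul64 is hypothetical.

```go
package main

import "fmt"

// clmul64 multiplies two 64-bit values carrylessly: partial products are
// combined with XOR instead of add-with-carry, which is polynomial
// multiplication over GF(2). It returns the low 64 bits of the product.
func clmul64(x, y uint64) uint64 {
	var acc uint64
	for i := 0; i < 64; i++ {
		if y&(1<<i) != 0 {
			acc ^= x << i
		}
	}
	return acc
}

func main() {
	// 11 * 11 = 101 in base two, as in the doc comment above.
	fmt.Printf("%b\n", clmul64(0b11, 0b11)) // prints 101
}
```

The hardware VPCLMULQDQ instruction produces the full 128-bit product per lane; this sketch keeps only the low half for brevity.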
Uint64x8 : expvar.Var
Uint64x8 : fmt.Stringer
func BroadcastUint64x8(x uint64) Uint64x8
func LoadMaskedUint64x8(y *[8]uint64, mask Mask64x8) Uint64x8
func LoadUint64x8(y *[8]uint64) Uint64x8
func LoadUint64x8Slice(s []uint64) Uint64x8
func LoadUint64x8SlicePart(s []uint64) Uint64x8
func Float32x16.AsUint64x8() (to Uint64x8)
func Float32x8.ConvertToUint64() Uint64x8
func Float64x8.AsUint64x8() (to Uint64x8)
func Float64x8.ConvertToUint64() Uint64x8
func Int16x32.AsUint64x8() (to Uint64x8)
func Int32x16.AsUint64x8() (to Uint64x8)
func Int64x8.AsUint64x8() (to Uint64x8)
func Int8x64.AsUint64x8() (to Uint64x8)
func Uint16x32.AsUint64x8() (to Uint64x8)
func Uint16x8.ExtendToUint64() Uint64x8
func Uint32x16.AsUint64x8() (to Uint64x8)
func Uint32x8.ExtendToUint64() Uint64x8
func Uint64x2.Broadcast512() Uint64x8
func Uint64x8.Add(y Uint64x8) Uint64x8
func Uint64x8.And(y Uint64x8) Uint64x8
func Uint64x8.AndNot(y Uint64x8) Uint64x8
func Uint64x8.CarrylessMultiplyGrouped(a, b uint8, y Uint64x8) Uint64x8
func Uint64x8.Compress(mask Mask64x8) Uint64x8
func Uint64x8.ConcatPermute(y Uint64x8, indices Uint64x8) Uint64x8
func Uint64x8.Expand(mask Mask64x8) Uint64x8
func Uint64x8.InterleaveHiGrouped(y Uint64x8) Uint64x8
func Uint64x8.InterleaveLoGrouped(y Uint64x8) Uint64x8
func Uint64x8.LeadingZeros() Uint64x8
func Uint64x8.Masked(mask Mask64x8) Uint64x8
func Uint64x8.Max(y Uint64x8) Uint64x8
func Uint64x8.Merge(y Uint64x8, mask Mask64x8) Uint64x8
func Uint64x8.Min(y Uint64x8) Uint64x8
func Uint64x8.Mul(y Uint64x8) Uint64x8
func Uint64x8.Not() Uint64x8
func Uint64x8.OnesCount() Uint64x8
func Uint64x8.Or(y Uint64x8) Uint64x8
func Uint64x8.Permute(indices Uint64x8) Uint64x8
func Uint64x8.RotateAllLeft(shift uint8) Uint64x8
func Uint64x8.RotateAllRight(shift uint8) Uint64x8
func Uint64x8.RotateLeft(y Uint64x8) Uint64x8
func Uint64x8.RotateRight(y Uint64x8) Uint64x8
func Uint64x8.SelectFromPairGrouped(a, b uint8, y Uint64x8) Uint64x8
func Uint64x8.SetHi(y Uint64x4) Uint64x8
func Uint64x8.SetLo(y Uint64x4) Uint64x8
func Uint64x8.ShiftAllLeft(y uint64) Uint64x8
func Uint64x8.ShiftAllLeftConcat(shift uint8, y Uint64x8) Uint64x8
func Uint64x8.ShiftAllRight(y uint64) Uint64x8
func Uint64x8.ShiftAllRightConcat(shift uint8, y Uint64x8) Uint64x8
func Uint64x8.ShiftLeft(y Uint64x8) Uint64x8
func Uint64x8.ShiftLeftConcat(y Uint64x8, z Uint64x8) Uint64x8
func Uint64x8.ShiftRight(y Uint64x8) Uint64x8
func Uint64x8.ShiftRightConcat(y Uint64x8, z Uint64x8) Uint64x8
func Uint64x8.Sub(y Uint64x8) Uint64x8
func Uint64x8.Xor(y Uint64x8) Uint64x8
func Uint8x16.ExtendLo8ToUint64x8() Uint64x8
func Uint8x64.AsUint64x8() (to Uint64x8)
func Float64x8.ConcatPermute(y Float64x8, indices Uint64x8) Float64x8
func Float64x8.Permute(indices Uint64x8) Float64x8
func Int64x8.ConcatPermute(y Int64x8, indices Uint64x8) Int64x8
func Int64x8.Permute(indices Uint64x8) Int64x8
func Uint64x8.Add(y Uint64x8) Uint64x8
func Uint64x8.And(y Uint64x8) Uint64x8
func Uint64x8.AndNot(y Uint64x8) Uint64x8
func Uint64x8.CarrylessMultiplyGrouped(a, b uint8, y Uint64x8) Uint64x8
func Uint64x8.ConcatPermute(y Uint64x8, indices Uint64x8) Uint64x8
func Uint64x8.Equal(y Uint64x8) Mask64x8
func Uint64x8.Greater(y Uint64x8) Mask64x8
func Uint64x8.GreaterEqual(y Uint64x8) Mask64x8
func Uint64x8.InterleaveHiGrouped(y Uint64x8) Uint64x8
func Uint64x8.InterleaveLoGrouped(y Uint64x8) Uint64x8
func Uint64x8.Less(y Uint64x8) Mask64x8
func Uint64x8.LessEqual(y Uint64x8) Mask64x8
func Uint64x8.Max(y Uint64x8) Uint64x8
func Uint64x8.Merge(y Uint64x8, mask Mask64x8) Uint64x8
func Uint64x8.Min(y Uint64x8) Uint64x8
func Uint64x8.Mul(y Uint64x8) Uint64x8
func Uint64x8.NotEqual(y Uint64x8) Mask64x8
func Uint64x8.Or(y Uint64x8) Uint64x8
func Uint64x8.Permute(indices Uint64x8) Uint64x8
func Uint64x8.RotateLeft(y Uint64x8) Uint64x8
func Uint64x8.RotateRight(y Uint64x8) Uint64x8
func Uint64x8.SelectFromPairGrouped(a, b uint8, y Uint64x8) Uint64x8
func Uint64x8.ShiftAllLeftConcat(shift uint8, y Uint64x8) Uint64x8
func Uint64x8.ShiftAllRightConcat(shift uint8, y Uint64x8) Uint64x8
func Uint64x8.ShiftLeft(y Uint64x8) Uint64x8
func Uint64x8.ShiftLeftConcat(y Uint64x8, z Uint64x8) Uint64x8
func Uint64x8.ShiftRight(y Uint64x8) Uint64x8
func Uint64x8.ShiftRightConcat(y Uint64x8, z Uint64x8) Uint64x8
func Uint64x8.Sub(y Uint64x8) Uint64x8
func Uint64x8.Xor(y Uint64x8) Uint64x8
func Uint8x64.GaloisFieldAffineTransform(y Uint64x8, b uint8) Uint8x64
func Uint8x64.GaloisFieldAffineTransformInverse(y Uint64x8, b uint8) Uint8x64
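The Compress and Expand operations documented above are near-inverses of each other. A scalar Go sketch of their semantics (illustrative only, not the package's implementation; the helper names are hypothetical):

```go
package main

import "fmt"

// compress packs the elements of x selected by mask into the low-indexed
// slots of the result; remaining slots are zeroed (VPCOMPRESSQ semantics).
func compress(x [8]uint64, mask [8]bool) [8]uint64 {
	var out [8]uint64
	j := 0
	for i, m := range mask {
		if m {
			out[j] = x[i]
			j++
		}
	}
	return out
}

// expand distributes the low-indexed elements of x, in order, to the
// positions where mask is true; other positions are zeroed (VPEXPANDQ).
func expand(x [8]uint64, mask [8]bool) [8]uint64 {
	var out [8]uint64
	j := 0
	for i, m := range mask {
		if m {
			out[i] = x[j]
			j++
		}
	}
	return out
}

func main() {
	x := [8]uint64{1, 2, 3, 4, 5, 6, 7, 8}
	m := [8]bool{true, false, true, false, true, false, true, false}
	fmt.Println(compress(x, m)) // [1 3 5 7 0 0 0 0]
	fmt.Println(expand(compress(x, m), m))
}
```

Expanding a compressed vector with the same mask restores the selected elements to their original positions, with the unselected positions zeroed.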
Uint8x16 is a 128-bit SIMD vector of 16 uint8 AESDecryptLastRound performs a series of operations in the AES cipher algorithm defined in FIPS 197.
x is the state array, starting from low index to high are s00, s10, s20, s30, s01, ..., s33.
y is the chunk of dw array in use.
result = AddRoundKey(InvShiftRows(InvSubBytes(x)), y)
Asm: VAESDECLAST, CPU Feature: AVX, AES AESDecryptOneRound performs a series of operations in the AES cipher algorithm defined in FIPS 197.
x is the state array, starting from low index to high are s00, s10, s20, s30, s01, ..., s33.
y is the chunk of dw array in use.
result = AddRoundKey(InvMixColumns(InvShiftRows(InvSubBytes(x))), y)
Asm: VAESDEC, CPU Feature: AVX, AES AESEncryptLastRound performs a series of operations in the AES cipher algorithm defined in FIPS 197.
x is the state array, starting from low index to high are s00, s10, s20, s30, s01, ..., s33.
y is the chunk of w array in use.
result = AddRoundKey((ShiftRows(SubBytes(x))), y)
Asm: VAESENCLAST, CPU Feature: AVX, AES AESEncryptOneRound performs a series of operations in the AES cipher algorithm defined in FIPS 197.
x is the state array, starting from low index to high are s00, s10, s20, s30, s01, ..., s33.
y is the chunk of w array in use.
result = AddRoundKey(MixColumns(ShiftRows(SubBytes(x))), y)
Asm: VAESENC, CPU Feature: AVX, AES Add adds corresponding elements of two vectors.
Asm: VPADDB, CPU Feature: AVX AddSaturated adds corresponding elements of two vectors with saturation.
Asm: VPADDUSB, CPU Feature: AVX And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX AndNot performs a bitwise x &^ y.
Asm: VPANDN, CPU Feature: AVX Float32x4 converts from Uint8x16 to Float32x4 Float64x2 converts from Uint8x16 to Float64x2 Int16x8 converts from Uint8x16 to Int16x8 Int32x4 converts from Uint8x16 to Int32x4 Int64x2 converts from Uint8x16 to Int64x2 Int8x16 converts from Uint8x16 to Int8x16 Uint16x8 converts from Uint8x16 to Uint16x8 Uint32x4 converts from Uint8x16 to Uint32x4 Uint64x2 converts from Uint8x16 to Uint64x2 Average computes the rounded average of corresponding elements.
Asm: VPAVGB, CPU Feature: AVX Broadcast128 copies element zero of its (128-bit) input to all elements of
the 128-bit output vector.
Asm: VPBROADCASTB, CPU Feature: AVX2 Broadcast256 copies element zero of its (128-bit) input to all elements of
the 256-bit output vector.
Asm: VPBROADCASTB, CPU Feature: AVX2 Broadcast512 copies element zero of its (128-bit) input to all elements of
the 512-bit output vector.
Asm: VPBROADCASTB, CPU Feature: AVX512 Compress performs a compression on vector x using mask by
selecting elements as indicated by mask, and packs them into lower-indexed elements.
Asm: VPCOMPRESSB, CPU Feature: AVX512VBMI2 ConcatPermute performs a full permutation of vector x, y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the bits needed to represent an index into xy are used in each element of indices.
Asm: VPERMI2B, CPU Feature: AVX512VBMI ConcatShiftBytesRight concatenates x and y and shifts the result right by constant bytes.
The result vector will be the lower half of the concatenated vector.
constant results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPALIGNR, CPU Feature: AVX DotProductPairsSaturated multiplies the elements and adds the pairs together with saturation,
yielding a vector of half as many elements with twice the input element size.
Asm: VPMADDUBSW, CPU Feature: AVX Equal returns x equals y, elementwise.
Asm: VPCMPEQB, CPU Feature: AVX Expand performs an expansion on a vector x whose elements are packed into its lower part.
The expansion distributes elements as indexed by mask, from lower mask elements to upper, in order.
Asm: VPEXPANDB, CPU Feature: AVX512VBMI2 ExtendLo2ToUint64x2 converts the 2 lowest vector element values to uint64.
The result vector's elements are zero-extended.
Asm: VPMOVZXBQ, CPU Feature: AVX ExtendLo4ToUint32x4 converts the 4 lowest vector element values to uint32.
The result vector's elements are zero-extended.
Asm: VPMOVZXBD, CPU Feature: AVX ExtendLo4ToUint64x4 converts the 4 lowest vector element values to uint64.
The result vector's elements are zero-extended.
Asm: VPMOVZXBQ, CPU Feature: AVX2 ExtendLo8ToUint16x8 converts the 8 lowest vector element values to uint16.
The result vector's elements are zero-extended.
Asm: VPMOVZXBW, CPU Feature: AVX ExtendLo8ToUint32x8 converts the 8 lowest vector element values to uint32.
The result vector's elements are zero-extended.
Asm: VPMOVZXBD, CPU Feature: AVX2 ExtendLo8ToUint64x8 converts the 8 lowest vector element values to uint64.
The result vector's elements are zero-extended.
Asm: VPMOVZXBQ, CPU Feature: AVX512 ExtendToUint16 converts element values to uint16.
The result vector's elements are zero-extended.
Asm: VPMOVZXBW, CPU Feature: AVX2 ExtendToUint32 converts element values to uint32.
The result vector's elements are zero-extended.
Asm: VPMOVZXBD, CPU Feature: AVX512 GaloisFieldAffineTransform computes an affine transformation in GF(2^8):
x is a vector of 8-bit vectors, with each adjacent 8 as a group; y is a vector of 8x8 1-bit matrices;
b is an 8-bit vector. The affine transformation is y * x + b, with each element of y
corresponding to a group of 8 elements in x.
b results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VGF2P8AFFINEQB, CPU Feature: AVX512GFNI GaloisFieldAffineTransformInverse computes an affine transformation in GF(2^8),
with x inverted with respect to reduction polynomial x^8 + x^4 + x^3 + x + 1:
x is a vector of 8-bit vectors, with each adjacent 8 as a group; y is a vector of 8x8 1-bit matrices;
b is an 8-bit vector. The affine transformation is y * x + b, with each element of y
corresponding to a group of 8 elements in x.
b results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VGF2P8AFFINEINVQB, CPU Feature: AVX512GFNI GaloisFieldMul computes element-wise GF(2^8) multiplication with
reduction polynomial x^8 + x^4 + x^3 + x + 1.
Asm: VGF2P8MULB, CPU Feature: AVX512GFNI GetElem retrieves a single constant-indexed element's value.
index results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPEXTRB, CPU Feature: AVX512 Greater returns a mask whose elements indicate whether x > y.
Emulated, CPU Feature: AVX2 GreaterEqual returns a mask whose elements indicate whether x >= y.
Emulated, CPU Feature: AVX2 IsZero returns true if all elements of x are zero.
This method compiles to VPTEST x, x.
x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y
Asm: VPTEST, CPU Feature: AVX Len returns the number of elements in a Uint8x16 Less returns a mask whose elements indicate whether x < y.
Emulated, CPU Feature: AVX2 LessEqual returns a mask whose elements indicate whether x <= y.
Emulated, CPU Feature: AVX2 Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VPMAXUB, CPU Feature: AVX Merge returns x but with elements set to y where mask is false. Min computes the minimum of corresponding elements.
Asm: VPMINUB, CPU Feature: AVX Not returns the bitwise complement of x.
Emulated, CPU Feature: AVX NotEqual returns a mask whose elements indicate whether x != y.
Emulated, CPU Feature: AVX OnesCount counts the number of set bits in each element.
Asm: VPOPCNTB, CPU Feature: AVX512BITALG Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX Permute performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 4 bits (values 0-15) of each element of indices are used.
Asm: VPERMB, CPU Feature: AVX512VBMI PermuteOrZero performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The lower four bits of each byte-sized index in indices select an element from x,
unless the index's sign bit is set in which case zero is used instead.
Asm: VPSHUFB, CPU Feature: AVX SetElem sets a single constant-indexed element's value.
index results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPINSRB, CPU Feature: AVX Store stores a Uint8x16 to an array StoreSlice stores x into a slice of at least 16 uint8s StoreSlicePart stores the 16 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 16 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x Sub subtracts corresponding elements of two vectors.
Asm: VPSUBB, CPU Feature: AVX SubSaturated subtracts corresponding elements of two vectors with saturation.
Asm: VPSUBUSB, CPU Feature: AVX SumAbsDiff sums the absolute differences of the two input vectors, treating each adjacent 8 bytes as a group. The output
is a vector of word-sized elements in which each 4*n-th element contains the sum for the n-th input group. The other elements in the result vector are zeroed.
This method can be seen as computing the L1 distance of each adjacent 8-byte group of the two input vectors.
Asm: VPSADBW, CPU Feature: AVX Xor performs a bitwise XOR operation between two vectors.
Asm: VPXOR, CPU Feature: AVX
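GaloisFieldMul's per-byte semantics — GF(2^8) multiplication with reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11B) — can be modeled in scalar Go. An illustrative sketch, not the package's implementation; the helper name gfMul is hypothetical:

```go
package main

import "fmt"

// gfMul multiplies two GF(2^8) elements, reducing by the polynomial
// x^8 + x^4 + x^3 + x + 1 (0x11B), matching VGF2P8MULB's per-byte behavior.
func gfMul(a, b uint8) uint8 {
	var p uint8
	for i := 0; i < 8; i++ {
		if b&1 != 0 {
			p ^= a // carryless "add" of the current partial product
		}
		hi := a & 0x80
		a <<= 1
		if hi != 0 {
			a ^= 0x1B // reduce: 0x11B truncated to 8 bits
		}
		b >>= 1
	}
	return p
}

func main() {
	fmt.Printf("%#x\n", gfMul(0x02, 0x80)) // x * x^7 = x^8 ≡ x^4+x^3+x+1 = 0x1b
	fmt.Printf("%#x\n", gfMul(0x03, 0x03)) // (x+1)^2 = x^2+1 = 0x5
}
```

This is the same field arithmetic used by AES (FIPS 197), which is why the GFNI and AES instructions share the reduction polynomial.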
Uint8x16 : expvar.Var
Uint8x16 : fmt.Stringer
func BroadcastUint8x16(x uint8) Uint8x16
func LoadUint8x16(y *[16]uint8) Uint8x16
func LoadUint8x16Slice(s []uint8) Uint8x16
func LoadUint8x16SlicePart(s []uint8) Uint8x16
func Float32x4.AsUint8x16() (to Uint8x16)
func Float64x2.AsUint8x16() (to Uint8x16)
func Int16x8.AsUint8x16() (to Uint8x16)
func Int32x4.AsUint8x16() (to Uint8x16)
func Int64x2.AsUint8x16() (to Uint8x16)
func Int8x16.AsUint8x16() (to Uint8x16)
func Uint16x16.TruncateToUint8() Uint8x16
func Uint16x8.AsUint8x16() (to Uint8x16)
func Uint16x8.TruncateToUint8() Uint8x16
func Uint32x16.TruncateToUint8() Uint8x16
func Uint32x4.AsUint8x16() (to Uint8x16)
func Uint32x4.TruncateToUint8() Uint8x16
func Uint32x8.TruncateToUint8() Uint8x16
func Uint64x2.AsUint8x16() (to Uint8x16)
func Uint64x2.TruncateToUint8() Uint8x16
func Uint64x4.TruncateToUint8() Uint8x16
func Uint64x8.TruncateToUint8() Uint8x16
func Uint8x16.Add(y Uint8x16) Uint8x16
func Uint8x16.AddSaturated(y Uint8x16) Uint8x16
func Uint8x16.AESDecryptLastRound(y Uint32x4) Uint8x16
func Uint8x16.AESDecryptOneRound(y Uint32x4) Uint8x16
func Uint8x16.AESEncryptLastRound(y Uint32x4) Uint8x16
func Uint8x16.AESEncryptOneRound(y Uint32x4) Uint8x16
func Uint8x16.And(y Uint8x16) Uint8x16
func Uint8x16.AndNot(y Uint8x16) Uint8x16
func Uint8x16.Average(y Uint8x16) Uint8x16
func Uint8x16.Broadcast128() Uint8x16
func Uint8x16.Compress(mask Mask8x16) Uint8x16
func Uint8x16.ConcatPermute(y Uint8x16, indices Uint8x16) Uint8x16
func Uint8x16.ConcatShiftBytesRight(constant uint8, y Uint8x16) Uint8x16
func Uint8x16.Expand(mask Mask8x16) Uint8x16
func Uint8x16.GaloisFieldAffineTransform(y Uint64x2, b uint8) Uint8x16
func Uint8x16.GaloisFieldAffineTransformInverse(y Uint64x2, b uint8) Uint8x16
func Uint8x16.GaloisFieldMul(y Uint8x16) Uint8x16
func Uint8x16.Masked(mask Mask8x16) Uint8x16
func Uint8x16.Max(y Uint8x16) Uint8x16
func Uint8x16.Merge(y Uint8x16, mask Mask8x16) Uint8x16
func Uint8x16.Min(y Uint8x16) Uint8x16
func Uint8x16.Not() Uint8x16
func Uint8x16.OnesCount() Uint8x16
func Uint8x16.Or(y Uint8x16) Uint8x16
func Uint8x16.Permute(indices Uint8x16) Uint8x16
func Uint8x16.PermuteOrZero(indices Int8x16) Uint8x16
func Uint8x16.SetElem(index uint8, y uint8) Uint8x16
func Uint8x16.Sub(y Uint8x16) Uint8x16
func Uint8x16.SubSaturated(y Uint8x16) Uint8x16
func Uint8x16.Xor(y Uint8x16) Uint8x16
func Uint8x32.GetHi() Uint8x16
func Uint8x32.GetLo() Uint8x16
func Int8x16.ConcatPermute(y Int8x16, indices Uint8x16) Int8x16
func Int8x16.DotProductQuadruple(y Uint8x16) Int32x4
func Int8x16.DotProductQuadrupleSaturated(y Uint8x16) Int32x4
func Int8x16.Permute(indices Uint8x16) Int8x16
func Uint8x16.Add(y Uint8x16) Uint8x16
func Uint8x16.AddSaturated(y Uint8x16) Uint8x16
func Uint8x16.And(y Uint8x16) Uint8x16
func Uint8x16.AndNot(y Uint8x16) Uint8x16
func Uint8x16.Average(y Uint8x16) Uint8x16
func Uint8x16.ConcatPermute(y Uint8x16, indices Uint8x16) Uint8x16
func Uint8x16.ConcatShiftBytesRight(constant uint8, y Uint8x16) Uint8x16
func Uint8x16.Equal(y Uint8x16) Mask8x16
func Uint8x16.GaloisFieldMul(y Uint8x16) Uint8x16
func Uint8x16.Greater(y Uint8x16) Mask8x16
func Uint8x16.GreaterEqual(y Uint8x16) Mask8x16
func Uint8x16.Less(y Uint8x16) Mask8x16
func Uint8x16.LessEqual(y Uint8x16) Mask8x16
func Uint8x16.Max(y Uint8x16) Uint8x16
func Uint8x16.Merge(y Uint8x16, mask Mask8x16) Uint8x16
func Uint8x16.Min(y Uint8x16) Uint8x16
func Uint8x16.NotEqual(y Uint8x16) Mask8x16
func Uint8x16.Or(y Uint8x16) Uint8x16
func Uint8x16.Permute(indices Uint8x16) Uint8x16
func Uint8x16.Sub(y Uint8x16) Uint8x16
func Uint8x16.SubSaturated(y Uint8x16) Uint8x16
func Uint8x16.SumAbsDiff(y Uint8x16) Uint16x8
func Uint8x16.Xor(y Uint8x16) Uint8x16
func Uint8x32.SetHi(y Uint8x16) Uint8x32
func Uint8x32.SetLo(y Uint8x16) Uint8x32
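PermuteOrZero's indexing rule (VPSHUFB: the low four bits of each index select an element, and a set sign bit yields zero) in a scalar Go sketch — illustrative only, not the package's code, with a hypothetical helper name:

```go
package main

import "fmt"

// permuteOrZero models VPSHUFB on a 16-byte vector: the low 4 bits of each
// index select an element of x, unless the index's sign bit (0x80) is set,
// in which case the result byte is zero.
func permuteOrZero(x, indices [16]uint8) [16]uint8 {
	var out [16]uint8
	for i, idx := range indices {
		if idx&0x80 == 0 {
			out[i] = x[idx&0x0F]
		}
	}
	return out
}

func main() {
	var x [16]uint8
	for i := range x {
		x[i] = uint8(0x40 + i) // 0x40..0x4f
	}
	idx := [16]uint8{0x80, 1, 0x1F} // rest zero
	r := permuteOrZero(x, idx)
	fmt.Println(r[:3]) // [0 65 79]: zeroed, x[1], x[0x1F&0xF]=x[15]
}
```

The grouped 256-bit variant applies this same rule independently within each 128-bit half.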
Uint8x32 is a 256-bit SIMD vector of 32 uint8 AESDecryptLastRound performs a series of operations in the AES cipher algorithm defined in FIPS 197.
x is the state array, starting from low index to high are s00, s10, s20, s30, s01, ..., s33.
y is the chunk of dw array in use.
result = AddRoundKey(InvShiftRows(InvSubBytes(x)), y)
Asm: VAESDECLAST, CPU Feature: AVX512VAES AESDecryptOneRound performs a series of operations in the AES cipher algorithm defined in FIPS 197.
x is the state array, starting from low index to high are s00, s10, s20, s30, s01, ..., s33.
y is the chunk of dw array in use.
result = AddRoundKey(InvMixColumns(InvShiftRows(InvSubBytes(x))), y)
Asm: VAESDEC, CPU Feature: AVX512VAES AESEncryptLastRound performs a series of operations in the AES cipher algorithm defined in FIPS 197.
x is the state array, starting from low index to high are s00, s10, s20, s30, s01, ..., s33.
y is the chunk of w array in use.
result = AddRoundKey((ShiftRows(SubBytes(x))), y)
Asm: VAESENCLAST, CPU Feature: AVX512VAES AESEncryptOneRound performs a series of operations in the AES cipher algorithm defined in FIPS 197.
x is the state array, starting from low index to high are s00, s10, s20, s30, s01, ..., s33.
y is the chunk of w array in use.
result = AddRoundKey(MixColumns(ShiftRows(SubBytes(x))), y)
Asm: VAESENC, CPU Feature: AVX512VAES Add adds corresponding elements of two vectors.
Asm: VPADDB, CPU Feature: AVX2 AddSaturated adds corresponding elements of two vectors with saturation.
Asm: VPADDUSB, CPU Feature: AVX2 And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX2 AndNot performs a bitwise x &^ y.
Asm: VPANDN, CPU Feature: AVX2 Float32x8 converts from Uint8x32 to Float32x8 Float64x4 converts from Uint8x32 to Float64x4 Int16x16 converts from Uint8x32 to Int16x16 Int32x8 converts from Uint8x32 to Int32x8 Int64x4 converts from Uint8x32 to Int64x4 Int8x32 converts from Uint8x32 to Int8x32 Uint16x16 converts from Uint8x32 to Uint16x16 Uint32x8 converts from Uint8x32 to Uint32x8 Uint64x4 converts from Uint8x32 to Uint64x4 Average computes the rounded average of corresponding elements.
Asm: VPAVGB, CPU Feature: AVX2 Compress performs a compression on vector x using mask by
selecting elements as indicated by mask, and packs them into lower-indexed elements.
Asm: VPCOMPRESSB, CPU Feature: AVX512VBMI2 ConcatPermute performs a full permutation of vector x, y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the bits needed to represent an index into xy are used in each element of indices.
Asm: VPERMI2B, CPU Feature: AVX512VBMI ConcatShiftBytesRightGrouped concatenates x and y and shifts the result right by constant bytes.
The result vector will be the lower half of the concatenated vector.
This operation is performed in groups of 16 bytes.
constant results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPALIGNR, CPU Feature: AVX2 DotProductPairsSaturated multiplies the elements and adds the pairs together with saturation,
yielding a vector of half as many elements with twice the input element size.
Asm: VPMADDUBSW, CPU Feature: AVX2 Equal returns x equals y, elementwise.
Asm: VPCMPEQB, CPU Feature: AVX2 Expand performs an expansion on a vector x whose elements are packed into its lower part.
The expansion distributes elements as indexed by mask, from lower mask elements to upper, in order.
Asm: VPEXPANDB, CPU Feature: AVX512VBMI2 ExtendToUint16 converts element values to uint16.
The result vector's elements are zero-extended.
Asm: VPMOVZXBW, CPU Feature: AVX512 GaloisFieldAffineTransform computes an affine transformation in GF(2^8):
x is a vector of 8-bit vectors, with each adjacent 8 as a group; y is a vector of 8x8 1-bit matrices;
b is an 8-bit vector. The affine transformation is y * x + b, with each element of y
corresponding to a group of 8 elements in x.
b results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VGF2P8AFFINEQB, CPU Feature: AVX512GFNI GaloisFieldAffineTransformInverse computes an affine transformation in GF(2^8),
with x inverted with respect to reduction polynomial x^8 + x^4 + x^3 + x + 1:
x is a vector of 8-bit vectors, with each adjacent 8 as a group; y is a vector of 8x8 1-bit matrices;
b is an 8-bit vector. The affine transformation is y * x + b, with each element of y
corresponding to a group of 8 elements in x.
b results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VGF2P8AFFINEINVQB, CPU Feature: AVX512GFNI GaloisFieldMul computes element-wise GF(2^8) multiplication with
reduction polynomial x^8 + x^4 + x^3 + x + 1.
Asm: VGF2P8MULB, CPU Feature: AVX512GFNI GetHi returns the upper half of x.
Asm: VEXTRACTI128, CPU Feature: AVX2 GetLo returns the lower half of x.
Asm: VEXTRACTI128, CPU Feature: AVX2 Greater returns a mask whose elements indicate whether x > y.
Emulated, CPU Feature: AVX2 GreaterEqual returns a mask whose elements indicate whether x >= y.
Emulated, CPU Feature: AVX2 IsZero returns true if all elements of x are zero.
This method compiles to VPTEST x, x.
x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y
Asm: VPTEST, CPU Feature: AVX Len returns the number of elements in a Uint8x32 Less returns a mask whose elements indicate whether x < y.
Emulated, CPU Feature: AVX2 LessEqual returns a mask whose elements indicate whether x <= y.
Emulated, CPU Feature: AVX2 Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VPMAXUB, CPU Feature: AVX2 Merge returns x but with elements set to y where mask is false. Min computes the minimum of corresponding elements.
Asm: VPMINUB, CPU Feature: AVX2 Not returns the bitwise complement of x.
Emulated, CPU Feature: AVX2 NotEqual returns a mask whose elements indicate whether x != y.
Emulated, CPU Feature: AVX2 OnesCount counts the number of set bits in each element.
Asm: VPOPCNTB, CPU Feature: AVX512BITALG Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX2 Permute performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 5 bits (values 0-31) of each element of indices are used.
Asm: VPERMB, CPU Feature: AVX512VBMI PermuteOrZeroGrouped performs a grouped permutation of vector x using indices:
result = {x_group0[indices[0]], x_group0[indices[1]], ..., x_group1[indices[16]], x_group1[indices[17]], ...}
The lower four bits of each byte-sized index in indices select an element from its corresponding group in x,
unless the index's sign bit is set in which case zero is used instead.
Each group is 128 bits in size.
Asm: VPSHUFB, CPU Feature: AVX2 Select128FromPair treats the 256-bit vectors x and y as a single vector of four
128-bit elements, and returns a 256-bit result formed by
concatenating the two elements specified by lo and hi.
For example,
{0x40, 0x41, ..., 0x4f, 0x50, 0x51, ..., 0x5f}.Select128FromPair(3, 0,
{0x60, 0x61, ..., 0x6f, 0x70, 0x71, ..., 0x7f})
returns {0x70, 0x71, ..., 0x7f, 0x40, 0x41, ..., 0x4f}.
lo, hi result in better performance when they are constants; non-constant values will be translated into a jump table.
lo, hi should be between 0 and 3, inclusive; other values may result in a runtime panic.
Asm: VPERM2I128, CPU Feature: AVX2 SetHi returns x with its upper half set to y.
Asm: VINSERTI128, CPU Feature: AVX2 SetLo returns x with its lower half set to y.
Asm: VINSERTI128, CPU Feature: AVX2 Store stores a Uint8x32 to an array StoreSlice stores x into a slice of at least 32 uint8s StoreSlicePart stores the 32 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 32 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x Sub subtracts corresponding elements of two vectors.
Asm: VPSUBB, CPU Feature: AVX2 SubSaturated subtracts corresponding elements of two vectors with saturation.
Asm: VPSUBUSB, CPU Feature: AVX2 SumAbsDiff sums the absolute differences of the two input vectors, treating each run of 8 adjacent bytes as a group. The output
is a vector of word-sized elements in which each 4*n-th element contains the sum for the n-th input group. The other elements in the result vector are zeroed.
This method can be seen as computing the L1 distance between corresponding 8-byte groups of the two input vectors.
Asm: VPSADBW, CPU Feature: AVX2 Xor performs a bitwise XOR operation between two vectors.
Asm: VPXOR, CPU Feature: AVX2
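The SumAbsDiff semantics described above can be pinned down with a scalar reference. The following sketch (the name sumAbsDiff32 is hypothetical, not part of the package) mirrors what VPSADBW computes for a 32-byte vector:

```go
package main

import "fmt"

// sumAbsDiff32 is a scalar sketch of Uint8x32.SumAbsDiff (VPSADBW):
// each adjacent 8-byte group of x and y contributes one sum of absolute
// differences, placed in every 4th uint16 lane of the result.
func sumAbsDiff32(x, y [32]uint8) [16]uint16 {
	var out [16]uint16
	for g := 0; g < 4; g++ { // four 8-byte groups in 32 bytes
		var sum uint16
		for i := 0; i < 8; i++ {
			a, b := x[g*8+i], y[g*8+i]
			if a >= b {
				sum += uint16(a - b)
			} else {
				sum += uint16(b - a)
			}
		}
		out[g*4] = sum // lanes 4g+1 .. 4g+3 stay zero
	}
	return out
}

func main() {
	var x, y [32]uint8
	for i := range x {
		x[i] = uint8(i)
	}
	fmt.Println(sumAbsDiff32(x, y))
}
```

With y all zero, lane 4n is simply the sum of the n-th group of x, which makes the "4*n-th element" layout easy to see.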
Uint8x32 : expvar.Var
Uint8x32 : fmt.Stringer
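The Select128FromPair example above can be restated as a scalar sketch. This hypothetical select128FromPair (not the intrinsic) treats x and y as four 16-byte lanes and reproduces the documented example:

```go
package main

import "fmt"

// select128FromPair is a scalar sketch of the Select128FromPair
// (VPERM2I128) semantics: x and y form four 16-byte lanes
// {x_lo, x_hi, y_lo, y_hi}; the result is lane[lo] followed by lane[hi].
// lo and hi must be in [0, 3], matching the documented panic behavior.
func select128FromPair(lo, hi uint8, x, y [32]uint8) [32]uint8 {
	lanes := [4][]uint8{x[:16], x[16:], y[:16], y[16:]}
	var out [32]uint8
	copy(out[:16], lanes[lo])
	copy(out[16:], lanes[hi])
	return out
}

func main() {
	var x, y [32]uint8
	for i := range x {
		x[i] = uint8(0x40 + i) // {0x40, ..., 0x5f}
		y[i] = uint8(0x60 + i) // {0x60, ..., 0x7f}
	}
	// The documented example: (3, 0) yields {0x70, ..., 0x7f, 0x40, ..., 0x4f}.
	out := select128FromPair(3, 0, x, y)
	fmt.Printf("%#x %#x\n", out[0], out[16])
}
```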
func BroadcastUint8x32(x uint8) Uint8x32
func LoadUint8x32(y *[32]uint8) Uint8x32
func LoadUint8x32Slice(s []uint8) Uint8x32
func LoadUint8x32SlicePart(s []uint8) Uint8x32
func Float32x8.AsUint8x32() (to Uint8x32)
func Float64x4.AsUint8x32() (to Uint8x32)
func Int16x16.AsUint8x32() (to Uint8x32)
func Int32x8.AsUint8x32() (to Uint8x32)
func Int64x4.AsUint8x32() (to Uint8x32)
func Int8x32.AsUint8x32() (to Uint8x32)
func Uint16x16.AsUint8x32() (to Uint8x32)
func Uint16x32.SaturateToUint8() Uint8x32
func Uint16x32.TruncateToUint8() Uint8x32
func Uint32x8.AsUint8x32() (to Uint8x32)
func Uint64x4.AsUint8x32() (to Uint8x32)
func Uint8x16.Broadcast256() Uint8x32
func Uint8x32.Add(y Uint8x32) Uint8x32
func Uint8x32.AddSaturated(y Uint8x32) Uint8x32
func Uint8x32.AESDecryptLastRound(y Uint32x8) Uint8x32
func Uint8x32.AESDecryptOneRound(y Uint32x8) Uint8x32
func Uint8x32.AESEncryptLastRound(y Uint32x8) Uint8x32
func Uint8x32.AESEncryptOneRound(y Uint32x8) Uint8x32
func Uint8x32.And(y Uint8x32) Uint8x32
func Uint8x32.AndNot(y Uint8x32) Uint8x32
func Uint8x32.Average(y Uint8x32) Uint8x32
func Uint8x32.Compress(mask Mask8x32) Uint8x32
func Uint8x32.ConcatPermute(y Uint8x32, indices Uint8x32) Uint8x32
func Uint8x32.ConcatShiftBytesRightGrouped(constant uint8, y Uint8x32) Uint8x32
func Uint8x32.Expand(mask Mask8x32) Uint8x32
func Uint8x32.GaloisFieldAffineTransform(y Uint64x4, b uint8) Uint8x32
func Uint8x32.GaloisFieldAffineTransformInverse(y Uint64x4, b uint8) Uint8x32
func Uint8x32.GaloisFieldMul(y Uint8x32) Uint8x32
func Uint8x32.Masked(mask Mask8x32) Uint8x32
func Uint8x32.Max(y Uint8x32) Uint8x32
func Uint8x32.Merge(y Uint8x32, mask Mask8x32) Uint8x32
func Uint8x32.Min(y Uint8x32) Uint8x32
func Uint8x32.Not() Uint8x32
func Uint8x32.OnesCount() Uint8x32
func Uint8x32.Or(y Uint8x32) Uint8x32
func Uint8x32.Permute(indices Uint8x32) Uint8x32
func Uint8x32.PermuteOrZeroGrouped(indices Int8x32) Uint8x32
func Uint8x32.Select128FromPair(lo, hi uint8, y Uint8x32) Uint8x32
func Uint8x32.SetHi(y Uint8x16) Uint8x32
func Uint8x32.SetLo(y Uint8x16) Uint8x32
func Uint8x32.Sub(y Uint8x32) Uint8x32
func Uint8x32.SubSaturated(y Uint8x32) Uint8x32
func Uint8x32.Xor(y Uint8x32) Uint8x32
func Uint8x64.GetHi() Uint8x32
func Uint8x64.GetLo() Uint8x32
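The grouped-permute rule described for PermuteOrZeroGrouped (low four index bits select within a 128-bit group, a set sign bit yields zero) can be sketched scalarly; permuteOrZeroGrouped here is a hypothetical reference, not the VPSHUFB intrinsic:

```go
package main

import "fmt"

// permuteOrZeroGrouped is a scalar sketch of PermuteOrZeroGrouped
// (VPSHUFB) on a 256-bit vector: within each 16-byte group, the low
// 4 bits of each index select a byte from the same group of x, unless
// the index's sign bit is set, in which case the result byte is zero.
func permuteOrZeroGrouped(x [32]uint8, indices [32]int8) [32]uint8 {
	var out [32]uint8
	for i, idx := range indices {
		if idx < 0 {
			continue // sign bit set: element stays zero
		}
		group := (i / 16) * 16 // start of this index's 16-byte group
		out[i] = x[group+int(idx&0x0F)]
	}
	return out
}

func main() {
	var x [32]uint8
	for i := range x {
		x[i] = uint8(0x40 + i)
	}
	var idx [32]int8
	idx[0] = 15 // selects x[15] within group 0
	idx[16] = 1 // selects x[17] within group 1
	idx[1] = -1 // zeroed
	out := permuteOrZeroGrouped(x, idx)
	fmt.Println(out[0], out[16], out[1])
}
```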
func Int8x32.ConcatPermute(y Int8x32, indices Uint8x32) Int8x32
func Int8x32.DotProductQuadruple(y Uint8x32) Int32x8
func Int8x32.DotProductQuadrupleSaturated(y Uint8x32) Int32x8
func Int8x32.Permute(indices Uint8x32) Int8x32
func Uint8x32.Add(y Uint8x32) Uint8x32
func Uint8x32.AddSaturated(y Uint8x32) Uint8x32
func Uint8x32.And(y Uint8x32) Uint8x32
func Uint8x32.AndNot(y Uint8x32) Uint8x32
func Uint8x32.Average(y Uint8x32) Uint8x32
func Uint8x32.ConcatPermute(y Uint8x32, indices Uint8x32) Uint8x32
func Uint8x32.ConcatShiftBytesRightGrouped(constant uint8, y Uint8x32) Uint8x32
func Uint8x32.Equal(y Uint8x32) Mask8x32
func Uint8x32.GaloisFieldMul(y Uint8x32) Uint8x32
func Uint8x32.Greater(y Uint8x32) Mask8x32
func Uint8x32.GreaterEqual(y Uint8x32) Mask8x32
func Uint8x32.Less(y Uint8x32) Mask8x32
func Uint8x32.LessEqual(y Uint8x32) Mask8x32
func Uint8x32.Max(y Uint8x32) Uint8x32
func Uint8x32.Merge(y Uint8x32, mask Mask8x32) Uint8x32
func Uint8x32.Min(y Uint8x32) Uint8x32
func Uint8x32.NotEqual(y Uint8x32) Mask8x32
func Uint8x32.Or(y Uint8x32) Uint8x32
func Uint8x32.Permute(indices Uint8x32) Uint8x32
func Uint8x32.Select128FromPair(lo, hi uint8, y Uint8x32) Uint8x32
func Uint8x32.Sub(y Uint8x32) Uint8x32
func Uint8x32.SubSaturated(y Uint8x32) Uint8x32
func Uint8x32.SumAbsDiff(y Uint8x32) Uint16x16
func Uint8x32.Xor(y Uint8x32) Uint8x32
func Uint8x64.SetHi(y Uint8x32) Uint8x64
func Uint8x64.SetLo(y Uint8x32) Uint8x64
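The complementary Masked and Merge semantics documented above (zero where the mask is false, versus take y where the mask is false) can be sketched with a plain bool array standing in for Mask8x32; the names masked and merge are illustrative only:

```go
package main

import "fmt"

// masked is a scalar sketch of Uint8x32.Masked: elements are
// zeroed where mask is false.
func masked(x [32]uint8, mask [32]bool) [32]uint8 {
	var out [32]uint8
	for i := range x {
		if mask[i] {
			out[i] = x[i]
		}
	}
	return out
}

// merge is a scalar sketch of Uint8x32.Merge: elements are
// set to y where mask is false.
func merge(x, y [32]uint8, mask [32]bool) [32]uint8 {
	out := x
	for i := range out {
		if !mask[i] {
			out[i] = y[i]
		}
	}
	return out
}

func main() {
	var x, y [32]uint8
	var m [32]bool
	x[0], x[1], y[1] = 10, 20, 99
	m[0] = true
	fmt.Println(masked(x, m)[0], masked(x, m)[1], merge(x, y, m)[1])
}
```

This is the pattern the compiler looks for when fusing an operation with a following Masked or Merge into a single masked instruction on AVX512.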
Uint8x64 is a 512-bit SIMD vector of 64 uint8 AESDecryptLastRound performs a series of operations in the AES cipher algorithm defined in FIPS 197.
x is the state array, starting from low index to high are s00, s10, s20, s30, s01, ..., s33.
y is the chunk of dw array in use.
result = AddRoundKey(InvShiftRows(InvSubBytes(x)), y)
Asm: VAESDECLAST, CPU Feature: AVX512VAES AESDecryptOneRound performs a series of operations in the AES cipher algorithm defined in FIPS 197.
x is the state array, starting from low index to high are s00, s10, s20, s30, s01, ..., s33.
y is the chunk of dw array in use.
result = AddRoundKey(InvMixColumns(InvShiftRows(InvSubBytes(x))), y)
Asm: VAESDEC, CPU Feature: AVX512VAES AESEncryptLastRound performs a series of operations in the AES cipher algorithm defined in FIPS 197.
x is the state array, starting from low index to high are s00, s10, s20, s30, s01, ..., s33.
y is the chunk of w array in use.
result = AddRoundKey((ShiftRows(SubBytes(x))), y)
Asm: VAESENCLAST, CPU Feature: AVX512VAES AESEncryptOneRound performs a series of operations in the AES cipher algorithm defined in FIPS 197.
x is the state array, starting from low index to high are s00, s10, s20, s30, s01, ..., s33.
y is the chunk of w array in use.
result = AddRoundKey(MixColumns(ShiftRows(SubBytes(x))), y)
Asm: VAESENC, CPU Feature: AVX512VAES Add adds corresponding elements of two vectors.
Asm: VPADDB, CPU Feature: AVX512 AddSaturated adds corresponding elements of two vectors with saturation.
Asm: VPADDUSB, CPU Feature: AVX512 And performs a bitwise AND operation between two vectors.
Asm: VPANDD, CPU Feature: AVX512 AndNot performs a bitwise x &^ y.
Asm: VPANDND, CPU Feature: AVX512 Float32x16 converts from Uint8x64 to Float32x16 Float64x8 converts from Uint8x64 to Float64x8 Int16x32 converts from Uint8x64 to Int16x32 Int32x16 converts from Uint8x64 to Int32x16 Int64x8 converts from Uint8x64 to Int64x8 Int8x64 converts from Uint8x64 to Int8x64 Uint16x32 converts from Uint8x64 to Uint16x32 Uint32x16 converts from Uint8x64 to Uint32x16 Uint64x8 converts from Uint8x64 to Uint64x8 Average computes the rounded average of corresponding elements.
Asm: VPAVGB, CPU Feature: AVX512 Compress performs a compression on vector x using mask by
selecting the elements indicated by mask and packing them into the lower-indexed elements.
Asm: VPCOMPRESSB, CPU Feature: AVX512VBMI2 ConcatPermute performs a full permutation of vector x, y using indices:
result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half).
Only the bits needed to represent an index into xy are used from each element of indices.
Asm: VPERMI2B, CPU Feature: AVX512VBMI ConcatShiftBytesRightGrouped concatenates x and y and shifts the result right by constant bytes.
The result vector will be the lower half of the concatenated vector.
This operation is performed in groups of 16 bytes.
constant yields better performance when it is a compile-time constant; a non-constant value will be translated into a jump table.
Asm: VPALIGNR, CPU Feature: AVX512 DotProductPairsSaturated multiplies corresponding elements and adds the pairs together with saturation,
yielding a vector of half as many elements with twice the input element size.
Asm: VPMADDUBSW, CPU Feature: AVX512 Equal returns x equals y, elementwise.
Asm: VPCMPEQB, CPU Feature: AVX512 Expand performs an expansion of vector x, whose elements are packed into the lower positions.
The expansion distributes those elements to the positions selected by mask, filling from the lowest set mask element upward in order.
Asm: VPEXPANDB, CPU Feature: AVX512VBMI2 GaloisFieldAffineTransform computes an affine transformation in GF(2^8):
x is a vector of 8-bit elements, with each 8 adjacent elements forming a group; y is a vector of 8x8 1-bit matrices;
b is an 8-bit vector. The affine transformation is y * x + b, with each element of y
corresponding to a group of 8 elements in x.
b yields better performance when it is a compile-time constant; a non-constant value will be translated into a jump table.
Asm: VGF2P8AFFINEQB, CPU Feature: AVX512GFNI GaloisFieldAffineTransformInverse computes an affine transformation in GF(2^8),
with x inverted with respect to reduction polynomial x^8 + x^4 + x^3 + x + 1:
x is a vector of 8-bit elements, with each 8 adjacent elements forming a group; y is a vector of 8x8 1-bit matrices;
b is an 8-bit vector. The affine transformation is y * x + b, with each element of y
corresponding to a group of 8 elements in x.
b yields better performance when it is a compile-time constant; a non-constant value will be translated into a jump table.
Asm: VGF2P8AFFINEINVQB, CPU Feature: AVX512GFNI GaloisFieldMul computes element-wise GF(2^8) multiplication with
reduction polynomial x^8 + x^4 + x^3 + x + 1.
Asm: VGF2P8MULB, CPU Feature: AVX512GFNI GetHi returns the upper half of x.
Asm: VEXTRACTI64X4, CPU Feature: AVX512 GetLo returns the lower half of x.
Asm: VEXTRACTI64X4, CPU Feature: AVX512 Greater returns x greater-than y, elementwise.
Asm: VPCMPUB, CPU Feature: AVX512 GreaterEqual returns x greater-than-or-equals y, elementwise.
Asm: VPCMPUB, CPU Feature: AVX512 Len returns the number of elements in a Uint8x64 Less returns x less-than y, elementwise.
Asm: VPCMPUB, CPU Feature: AVX512 LessEqual returns x less-than-or-equals y, elementwise.
Asm: VPCMPUB, CPU Feature: AVX512 Masked returns x but with elements zeroed where mask is false. Max computes the maximum of corresponding elements.
Asm: VPMAXUB, CPU Feature: AVX512 Merge returns x but with elements set to y where mask is false. Min computes the minimum of corresponding elements.
Asm: VPMINUB, CPU Feature: AVX512 Not returns the bitwise complement of x
Emulated, CPU Feature AVX512 NotEqual returns x not-equals y, elementwise.
Asm: VPCMPUB, CPU Feature: AVX512 OnesCount counts the number of set bits in each element.
Asm: VPOPCNTB, CPU Feature: AVX512BITALG Or performs a bitwise OR operation between two vectors.
Asm: VPORD, CPU Feature: AVX512 Permute performs a full permutation of vector x using indices:
result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 6 bits (values 0-63) of each element of indices are used
Asm: VPERMB, CPU Feature: AVX512VBMI PermuteOrZeroGrouped performs a grouped permutation of vector x using indices:
result = {x_group0[indices[0]], x_group0[indices[1]], ..., x_group1[indices[16]], x_group1[indices[17]], ...}
The lower four bits of each byte-sized index in indices select an element from its corresponding group in x,
unless the index's sign bit is set, in which case zero is used instead.
Each group is 128 bits in size.
Asm: VPSHUFB, CPU Feature: AVX512 SetHi returns x with its upper half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512 SetLo returns x with its lower half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512 Store stores a Uint8x64 to an array StoreMasked stores a Uint8x64 to an array,
at those elements enabled by mask
Asm: VMOVDQU8, CPU Feature: AVX512 StoreSlice stores x into a slice of at least 64 uint8s StoreSlicePart stores the 64 elements of x into the slice s.
It stores as many elements as will fit in s.
If s has 64 or more elements, the method is equivalent to x.StoreSlice. String returns a string representation of SIMD vector x Sub subtracts corresponding elements of two vectors.
Asm: VPSUBB, CPU Feature: AVX512 SubSaturated subtracts corresponding elements of two vectors with saturation.
Asm: VPSUBUSB, CPU Feature: AVX512 SumAbsDiff sums the absolute differences of the two input vectors, treating each run of 8 adjacent bytes as a group. The output
is a vector of word-sized elements in which each 4*n-th element contains the sum for the n-th input group. The other elements in the result vector are zeroed.
This method can be seen as computing the L1 distance between corresponding 8-byte groups of the two input vectors.
Asm: VPSADBW, CPU Feature: AVX512 Xor performs a bitwise XOR operation between two vectors.
Asm: VPXORD, CPU Feature: AVX512
Uint8x64 : expvar.Var
Uint8x64 : fmt.Stringer
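The Compress and Expand operations above are inverse-shaped: Compress packs mask-selected elements downward, Expand distributes low elements to mask-selected positions. A scalar sketch (hypothetical names, mask modeled as []bool):

```go
package main

import "fmt"

// compress is a scalar sketch of the Compress (VPCOMPRESSB) semantics:
// mask-selected elements pack into the lowest positions; the tail is zero.
func compress(x []uint8, mask []bool) []uint8 {
	out := make([]uint8, len(x))
	j := 0
	for i, ok := range mask {
		if ok {
			out[j] = x[i]
			j++
		}
	}
	return out
}

// expand is a scalar sketch of the Expand (VPEXPANDB) semantics:
// the lowest elements of x distribute, in order, to the positions
// where mask is set.
func expand(x []uint8, mask []bool) []uint8 {
	out := make([]uint8, len(x))
	j := 0
	for i, ok := range mask {
		if ok {
			out[i] = x[j]
			j++
		}
	}
	return out
}

func main() {
	x := []uint8{1, 2, 3, 4}
	m := []bool{false, true, false, true}
	fmt.Println(compress(x, m), expand(x, m))
}
```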
func BroadcastUint8x64(x uint8) Uint8x64
func LoadMaskedUint8x64(y *[64]uint8, mask Mask8x64) Uint8x64
func LoadUint8x64(y *[64]uint8) Uint8x64
func LoadUint8x64Slice(s []uint8) Uint8x64
func LoadUint8x64SlicePart(s []uint8) Uint8x64
func Float32x16.AsUint8x64() (to Uint8x64)
func Float64x8.AsUint8x64() (to Uint8x64)
func Int16x32.AsUint8x64() (to Uint8x64)
func Int32x16.AsUint8x64() (to Uint8x64)
func Int64x8.AsUint8x64() (to Uint8x64)
func Int8x64.AsUint8x64() (to Uint8x64)
func Uint16x32.AsUint8x64() (to Uint8x64)
func Uint32x16.AsUint8x64() (to Uint8x64)
func Uint64x8.AsUint8x64() (to Uint8x64)
func Uint8x16.Broadcast512() Uint8x64
func Uint8x64.Add(y Uint8x64) Uint8x64
func Uint8x64.AddSaturated(y Uint8x64) Uint8x64
func Uint8x64.AESDecryptLastRound(y Uint32x16) Uint8x64
func Uint8x64.AESDecryptOneRound(y Uint32x16) Uint8x64
func Uint8x64.AESEncryptLastRound(y Uint32x16) Uint8x64
func Uint8x64.AESEncryptOneRound(y Uint32x16) Uint8x64
func Uint8x64.And(y Uint8x64) Uint8x64
func Uint8x64.AndNot(y Uint8x64) Uint8x64
func Uint8x64.Average(y Uint8x64) Uint8x64
func Uint8x64.Compress(mask Mask8x64) Uint8x64
func Uint8x64.ConcatPermute(y Uint8x64, indices Uint8x64) Uint8x64
func Uint8x64.ConcatShiftBytesRightGrouped(constant uint8, y Uint8x64) Uint8x64
func Uint8x64.Expand(mask Mask8x64) Uint8x64
func Uint8x64.GaloisFieldAffineTransform(y Uint64x8, b uint8) Uint8x64
func Uint8x64.GaloisFieldAffineTransformInverse(y Uint64x8, b uint8) Uint8x64
func Uint8x64.GaloisFieldMul(y Uint8x64) Uint8x64
func Uint8x64.Masked(mask Mask8x64) Uint8x64
func Uint8x64.Max(y Uint8x64) Uint8x64
func Uint8x64.Merge(y Uint8x64, mask Mask8x64) Uint8x64
func Uint8x64.Min(y Uint8x64) Uint8x64
func Uint8x64.Not() Uint8x64
func Uint8x64.OnesCount() Uint8x64
func Uint8x64.Or(y Uint8x64) Uint8x64
func Uint8x64.Permute(indices Uint8x64) Uint8x64
func Uint8x64.PermuteOrZeroGrouped(indices Int8x64) Uint8x64
func Uint8x64.SetHi(y Uint8x32) Uint8x64
func Uint8x64.SetLo(y Uint8x32) Uint8x64
func Uint8x64.Sub(y Uint8x64) Uint8x64
func Uint8x64.SubSaturated(y Uint8x64) Uint8x64
func Uint8x64.Xor(y Uint8x64) Uint8x64
func Int8x64.ConcatPermute(y Int8x64, indices Uint8x64) Int8x64
func Int8x64.DotProductQuadruple(y Uint8x64) Int32x16
func Int8x64.DotProductQuadrupleSaturated(y Uint8x64) Int32x16
func Int8x64.Permute(indices Uint8x64) Int8x64
func Uint8x64.Add(y Uint8x64) Uint8x64
func Uint8x64.AddSaturated(y Uint8x64) Uint8x64
func Uint8x64.And(y Uint8x64) Uint8x64
func Uint8x64.AndNot(y Uint8x64) Uint8x64
func Uint8x64.Average(y Uint8x64) Uint8x64
func Uint8x64.ConcatPermute(y Uint8x64, indices Uint8x64) Uint8x64
func Uint8x64.ConcatShiftBytesRightGrouped(constant uint8, y Uint8x64) Uint8x64
func Uint8x64.Equal(y Uint8x64) Mask8x64
func Uint8x64.GaloisFieldMul(y Uint8x64) Uint8x64
func Uint8x64.Greater(y Uint8x64) Mask8x64
func Uint8x64.GreaterEqual(y Uint8x64) Mask8x64
func Uint8x64.Less(y Uint8x64) Mask8x64
func Uint8x64.LessEqual(y Uint8x64) Mask8x64
func Uint8x64.Max(y Uint8x64) Uint8x64
func Uint8x64.Merge(y Uint8x64, mask Mask8x64) Uint8x64
func Uint8x64.Min(y Uint8x64) Uint8x64
func Uint8x64.NotEqual(y Uint8x64) Mask8x64
func Uint8x64.Or(y Uint8x64) Uint8x64
func Uint8x64.Permute(indices Uint8x64) Uint8x64
func Uint8x64.Sub(y Uint8x64) Uint8x64
func Uint8x64.SubSaturated(y Uint8x64) Uint8x64
func Uint8x64.SumAbsDiff(y Uint8x64) Uint16x32
func Uint8x64.Xor(y Uint8x64) Uint8x64
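GaloisFieldMul is documented above as element-wise GF(2^8) multiplication with reduction polynomial x^8 + x^4 + x^3 + x + 1 (the AES polynomial). A scalar sketch of that arithmetic, using a hypothetical gfMul helper:

```go
package main

import "fmt"

// gfMul is a scalar sketch of the GaloisFieldMul (VGF2P8MULB) semantics:
// carry-less multiplication in GF(2^8), reduced by the polynomial
// x^8 + x^4 + x^3 + x + 1 (0x11B).
func gfMul(a, b uint8) uint8 {
	var p uint8
	for i := 0; i < 8; i++ {
		if b&1 != 0 {
			p ^= a // add (XOR) the current shifted multiplicand
		}
		hi := a & 0x80
		a <<= 1
		if hi != 0 {
			a ^= 0x1B // reduce: x^8 ≡ x^4 + x^3 + x + 1
		}
		b >>= 1
	}
	return p
}

func main() {
	// The worked example from FIPS 197: {57} * {83} = {c1}.
	fmt.Printf("%#x\n", gfMul(0x57, 0x83))
}
```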
AES returns whether the CPU supports the AES feature.
AES is defined on all GOARCHes, but will only return true on
GOARCH amd64. AVX returns whether the CPU supports the AVX feature.
AVX is defined on all GOARCHes, but will only return true on
GOARCH amd64. AVX2 returns whether the CPU supports the AVX2 feature.
AVX2 is defined on all GOARCHes, but will only return true on
GOARCH amd64. AVX512 returns whether the CPU supports the AVX512F+CD+BW+DQ+VL features.
These five CPU features are bundled together, and no use of AVX-512
is allowed unless all of these features are supported together.
Nearly every CPU that has shipped with any support for AVX-512 has
supported all five of these features.
AVX512 is defined on all GOARCHes, but will only return true on
GOARCH amd64. AVX512BITALG returns whether the CPU supports the AVX512BITALG feature.
AVX512BITALG is defined on all GOARCHes, but will only return true on
GOARCH amd64. AVX512GFNI returns whether the CPU supports the AVX512GFNI feature.
AVX512GFNI is defined on all GOARCHes, but will only return true on
GOARCH amd64. AVX512VAES returns whether the CPU supports the AVX512VAES feature.
AVX512VAES is defined on all GOARCHes, but will only return true on
GOARCH amd64. AVX512VBMI returns whether the CPU supports the AVX512VBMI feature.
AVX512VBMI is defined on all GOARCHes, but will only return true on
GOARCH amd64. AVX512VBMI2 returns whether the CPU supports the AVX512VBMI2 feature.
AVX512VBMI2 is defined on all GOARCHes, but will only return true on
GOARCH amd64. AVX512VNNI returns whether the CPU supports the AVX512VNNI feature.
AVX512VNNI is defined on all GOARCHes, but will only return true on
GOARCH amd64. AVX512VPCLMULQDQ returns whether the CPU supports the AVX512VPCLMULQDQ feature.
AVX512VPCLMULQDQ is defined on all GOARCHes, but will only return true on
GOARCH amd64. AVX512VPOPCNTDQ returns whether the CPU supports the AVX512VPOPCNTDQ feature.
AVX512VPOPCNTDQ is defined on all GOARCHes, but will only return true on
GOARCH amd64. AVXVNNI returns whether the CPU supports the AVXVNNI feature.
AVXVNNI is defined on all GOARCHes, but will only return true on
GOARCH amd64. SHA returns whether the CPU supports the SHA feature.
SHA is defined on all GOARCHes, but will only return true on
GOARCH amd64.
var X86
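As the package documentation recommends, CPU features should be checked before using the corresponding operations. The sketch below shows the dispatch shape only: hasAVX2 stands in for a call like archsimd.X86.AVX2(), and the vector branch is left as a comment because archsimd itself is only available under GOEXPERIMENT=simd.

```go
package main

import "fmt"

// addBytes dispatches between a (elided) vector path and a scalar
// fallback. In real code the feature check would be
// archsimd.X86.AVX2(), queried once and reused.
func addBytes(dst, a, b []uint8, hasAVX2 bool) {
	if hasAVX2 {
		// Vector path would go here: LoadUint8x32Slice / Add / StoreSlice
		// over 32-byte chunks, with the scalar loop below handling the tail.
	}
	for i := range dst { // scalar fallback (and tail)
		dst[i] = a[i] + b[i]
	}
}

func main() {
	a := []uint8{1, 2, 3}
	b := []uint8{10, 20, 30}
	dst := make([]uint8, 3)
	addBytes(dst, a, b, false)
	fmt.Println(dst)
}
```

Note that on non-amd64 GOARCHes every feature method simply returns false, so this pattern degrades cleanly to the scalar path.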
Package-Level Functions (total 155)
BroadcastFloat32x16 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX512F
BroadcastFloat32x4 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX2
BroadcastFloat32x8 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX2
BroadcastFloat64x2 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX2
BroadcastFloat64x4 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX2
BroadcastFloat64x8 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX512F
BroadcastInt16x16 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX2
BroadcastInt16x32 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX512BW
BroadcastInt16x8 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX2
BroadcastInt32x16 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX512F
BroadcastInt32x4 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX2
BroadcastInt32x8 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX2
BroadcastInt64x2 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX2
BroadcastInt64x4 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX2
BroadcastInt64x8 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX512F
BroadcastInt8x16 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX2
BroadcastInt8x32 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX2
BroadcastInt8x64 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX512BW
BroadcastUint16x16 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX2
BroadcastUint16x32 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX512BW
BroadcastUint16x8 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX2
BroadcastUint32x16 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX512F
BroadcastUint32x4 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX2
BroadcastUint32x8 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX2
BroadcastUint64x2 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX2
BroadcastUint64x4 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX2
BroadcastUint64x8 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX512F
BroadcastUint8x16 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX2
BroadcastUint8x32 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX2
BroadcastUint8x64 returns a vector with the input
x assigned to all elements of the output.
Emulated, CPU Feature AVX512BW
ClearAVXUpperBits clears the high bits of Y0-Y15 and Z0-Z15 registers.
It is intended for transitioning from AVX to SSE, eliminating the
performance penalties caused by false dependencies.
Note: in the future the compiler may automatically generate the
instruction, making this function unnecessary.
Asm: VZEROUPPER, CPU Feature: AVX
LoadFloat32x16 loads a Float32x16 from an array
LoadFloat32x16Slice loads a Float32x16 from a slice of at least 16 float32s
LoadFloat32x16SlicePart loads a Float32x16 from the slice s.
If s has fewer than 16 elements, the remaining elements of the vector are filled with zeroes.
If s has 16 or more elements, the function is equivalent to LoadFloat32x16Slice.
LoadFloat32x4 loads a Float32x4 from an array
LoadFloat32x4Slice loads a Float32x4 from a slice of at least 4 float32s
LoadFloat32x4SlicePart loads a Float32x4 from the slice s.
If s has fewer than 4 elements, the remaining elements of the vector are filled with zeroes.
If s has 4 or more elements, the function is equivalent to LoadFloat32x4Slice.
LoadFloat32x8 loads a Float32x8 from an array
LoadFloat32x8Slice loads a Float32x8 from a slice of at least 8 float32s
LoadFloat32x8SlicePart loads a Float32x8 from the slice s.
If s has fewer than 8 elements, the remaining elements of the vector are filled with zeroes.
If s has 8 or more elements, the function is equivalent to LoadFloat32x8Slice.
LoadFloat64x2 loads a Float64x2 from an array
LoadFloat64x2Slice loads a Float64x2 from a slice of at least 2 float64s
LoadFloat64x2SlicePart loads a Float64x2 from the slice s.
If s has fewer than 2 elements, the remaining elements of the vector are filled with zeroes.
If s has 2 or more elements, the function is equivalent to LoadFloat64x2Slice.
LoadFloat64x4 loads a Float64x4 from an array
LoadFloat64x4Slice loads a Float64x4 from a slice of at least 4 float64s
LoadFloat64x4SlicePart loads a Float64x4 from the slice s.
If s has fewer than 4 elements, the remaining elements of the vector are filled with zeroes.
If s has 4 or more elements, the function is equivalent to LoadFloat64x4Slice.
LoadFloat64x8 loads a Float64x8 from an array
LoadFloat64x8Slice loads a Float64x8 from a slice of at least 8 float64s
LoadFloat64x8SlicePart loads a Float64x8 from the slice s.
If s has fewer than 8 elements, the remaining elements of the vector are filled with zeroes.
If s has 8 or more elements, the function is equivalent to LoadFloat64x8Slice.
LoadInt16x16 loads an Int16x16 from an array
LoadInt16x16Slice loads an Int16x16 from a slice of at least 16 int16s
LoadInt16x16SlicePart loads an Int16x16 from the slice s.
If s has fewer than 16 elements, the remaining elements of the vector are filled with zeroes.
If s has 16 or more elements, the function is equivalent to LoadInt16x16Slice.
LoadInt16x32 loads an Int16x32 from an array
LoadInt16x32Slice loads an Int16x32 from a slice of at least 32 int16s
LoadInt16x32SlicePart loads an Int16x32 from the slice s.
If s has fewer than 32 elements, the remaining elements of the vector are filled with zeroes.
If s has 32 or more elements, the function is equivalent to LoadInt16x32Slice.
LoadInt16x8 loads an Int16x8 from an array
LoadInt16x8Slice loads an Int16x8 from a slice of at least 8 int16s
LoadInt16x8SlicePart loads an Int16x8 from the slice s.
If s has fewer than 8 elements, the remaining elements of the vector are filled with zeroes.
If s has 8 or more elements, the function is equivalent to LoadInt16x8Slice.
LoadInt32x16 loads an Int32x16 from an array
LoadInt32x16Slice loads an Int32x16 from a slice of at least 16 int32s
LoadInt32x16SlicePart loads an Int32x16 from the slice s.
If s has fewer than 16 elements, the remaining elements of the vector are filled with zeroes.
If s has 16 or more elements, the function is equivalent to LoadInt32x16Slice.
LoadInt32x4 loads an Int32x4 from an array
LoadInt32x4Slice loads an Int32x4 from a slice of at least 4 int32s
LoadInt32x4SlicePart loads an Int32x4 from the slice s.
If s has fewer than 4 elements, the remaining elements of the vector are filled with zeroes.
If s has 4 or more elements, the function is equivalent to LoadInt32x4Slice.
LoadInt32x8 loads an Int32x8 from an array
LoadInt32x8Slice loads an Int32x8 from a slice of at least 8 int32s
LoadInt32x8SlicePart loads an Int32x8 from the slice s.
If s has fewer than 8 elements, the remaining elements of the vector are filled with zeroes.
If s has 8 or more elements, the function is equivalent to LoadInt32x8Slice.
LoadInt64x2 loads an Int64x2 from an array
LoadInt64x2Slice loads an Int64x2 from a slice of at least 2 int64s
LoadInt64x2SlicePart loads an Int64x2 from the slice s.
If s has fewer than 2 elements, the remaining elements of the vector are filled with zeroes.
If s has 2 or more elements, the function is equivalent to LoadInt64x2Slice.
LoadInt64x4 loads an Int64x4 from an array
LoadInt64x4Slice loads an Int64x4 from a slice of at least 4 int64s
LoadInt64x4SlicePart loads an Int64x4 from the slice s.
If s has fewer than 4 elements, the remaining elements of the vector are filled with zeroes.
If s has 4 or more elements, the function is equivalent to LoadInt64x4Slice.
LoadInt64x8 loads an Int64x8 from an array
LoadInt64x8Slice loads an Int64x8 from a slice of at least 8 int64s
LoadInt64x8SlicePart loads an Int64x8 from the slice s.
If s has fewer than 8 elements, the remaining elements of the vector are filled with zeroes.
If s has 8 or more elements, the function is equivalent to LoadInt64x8Slice.
LoadInt8x16 loads an Int8x16 from an array
LoadInt8x16Slice loads an Int8x16 from a slice of at least 16 int8s
LoadInt8x16SlicePart loads an Int8x16 from the slice s.
If s has fewer than 16 elements, the remaining elements of the vector are filled with zeroes.
If s has 16 or more elements, the function is equivalent to LoadInt8x16Slice.
LoadInt8x32 loads an Int8x32 from an array
LoadInt8x32Slice loads an Int8x32 from a slice of at least 32 int8s
LoadInt8x32SlicePart loads an Int8x32 from the slice s.
If s has fewer than 32 elements, the remaining elements of the vector are filled with zeroes.
If s has 32 or more elements, the function is equivalent to LoadInt8x32Slice.
LoadInt8x64 loads an Int8x64 from an array
LoadInt8x64Slice loads an Int8x64 from a slice of at least 64 int8s
LoadInt8x64SlicePart loads an Int8x64 from the slice s.
If s has fewer than 64 elements, the remaining elements of the vector are filled with zeroes.
If s has 64 or more elements, the function is equivalent to LoadInt8x64Slice.
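All of the SlicePart loaders above share one rule: copy as many elements as the slice provides, and zero-fill the rest. A scalar sketch of that contract for a hypothetical 16-byte case:

```go
package main

import "fmt"

// loadUint8x16SlicePartRef is a scalar sketch of the Load...SlicePart
// contract: elements beyond len(s) are zero-filled, and a slice with 16
// or more elements behaves exactly like the plain Slice loader.
func loadUint8x16SlicePartRef(s []uint8) [16]uint8 {
	var v [16]uint8
	copy(v[:], s) // copies min(len(s), 16) elements; the rest stay zero
	return v
}

func main() {
	v := loadUint8x16SlicePartRef([]uint8{1, 2, 3})
	fmt.Println(v[0], v[2], v[3], v[15])
}
```

The Store...SlicePart methods are the mirror image: they write only as many elements as fit in the destination slice.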
LoadMaskedFloat32x16 loads a Float32x16 from an array,
at those elements enabled by mask
Asm: VMOVDQU32.Z, CPU Feature: AVX512
LoadMaskedFloat32x4 loads a Float32x4 from an array,
at those elements enabled by mask
Asm: VMASKMOVD, CPU Feature: AVX2
LoadMaskedFloat32x8 loads a Float32x8 from an array,
at those elements enabled by mask
Asm: VMASKMOVD, CPU Feature: AVX2
LoadMaskedFloat64x2 loads a Float64x2 from an array,
at those elements enabled by mask
Asm: VMASKMOVQ, CPU Feature: AVX2
LoadMaskedFloat64x4 loads a Float64x4 from an array,
at those elements enabled by mask
Asm: VMASKMOVQ, CPU Feature: AVX2
LoadMaskedFloat64x8 loads a Float64x8 from an array,
at those elements enabled by mask
Asm: VMOVDQU64.Z, CPU Feature: AVX512
LoadMaskedInt16x32 loads an Int16x32 from an array,
at those elements enabled by mask
Asm: VMOVDQU16.Z, CPU Feature: AVX512
LoadMaskedInt32x16 loads an Int32x16 from an array,
at those elements enabled by mask
Asm: VMOVDQU32.Z, CPU Feature: AVX512
LoadMaskedInt32x4 loads an Int32x4 from an array,
at those elements enabled by mask
Asm: VMASKMOVD, CPU Feature: AVX2
LoadMaskedInt32x8 loads an Int32x8 from an array,
at those elements enabled by mask
Asm: VMASKMOVD, CPU Feature: AVX2
LoadMaskedInt64x2 loads an Int64x2 from an array,
at those elements enabled by mask
Asm: VMASKMOVQ, CPU Feature: AVX2
LoadMaskedInt64x4 loads an Int64x4 from an array,
at those elements enabled by mask
Asm: VMASKMOVQ, CPU Feature: AVX2
LoadMaskedInt64x8 loads an Int64x8 from an array,
at those elements enabled by mask
Asm: VMOVDQU64.Z, CPU Feature: AVX512
LoadMaskedInt8x64 loads an Int8x64 from an array,
at those elements enabled by mask
Asm: VMOVDQU8.Z, CPU Feature: AVX512
LoadMaskedUint16x32 loads a Uint16x32 from an array,
at those elements enabled by mask
Asm: VMOVDQU16.Z, CPU Feature: AVX512
LoadMaskedUint32x16 loads a Uint32x16 from an array,
at those elements enabled by mask
Asm: VMOVDQU32.Z, CPU Feature: AVX512
LoadMaskedUint32x4 loads a Uint32x4 from an array,
at those elements enabled by mask
Asm: VMASKMOVD, CPU Feature: AVX2
LoadMaskedUint32x8 loads a Uint32x8 from an array,
at those elements enabled by mask
Asm: VMASKMOVD, CPU Feature: AVX2
LoadMaskedUint64x2 loads a Uint64x2 from an array,
at those elements enabled by mask
Asm: VMASKMOVQ, CPU Feature: AVX2
LoadMaskedUint64x4 loads a Uint64x4 from an array,
at those elements enabled by mask
Asm: VMASKMOVQ, CPU Feature: AVX2
LoadMaskedUint64x8 loads a Uint64x8 from an array,
at those elements enabled by mask
Asm: VMOVDQU64.Z, CPU Feature: AVX512
LoadMaskedUint8x64 loads a Uint8x64 from an array,
at those elements enabled by mask
Asm: VMOVDQU8.Z, CPU Feature: AVX512
LoadUint16x16 loads a Uint16x16 from an array
LoadUint16x16Slice loads an Uint16x16 from a slice of at least 16 uint16s
LoadUint16x16SlicePart loads a Uint16x16 from the slice s.
If s has fewer than 16 elements, the remaining elements of the vector are filled with zeroes.
If s has 16 or more elements, the function is equivalent to LoadUint16x16Slice.
LoadUint16x32 loads a Uint16x32 from an array
LoadUint16x32Slice loads a Uint16x32 from a slice of at least 32 uint16s
LoadUint16x32SlicePart loads a Uint16x32 from the slice s.
If s has fewer than 32 elements, the remaining elements of the vector are filled with zeroes.
If s has 32 or more elements, the function is equivalent to LoadUint16x32Slice.
LoadUint16x8 loads a Uint16x8 from an array
LoadUint16x8Slice loads a Uint16x8 from a slice of at least 8 uint16s
LoadUint16x8SlicePart loads a Uint16x8 from the slice s.
If s has fewer than 8 elements, the remaining elements of the vector are filled with zeroes.
If s has 8 or more elements, the function is equivalent to LoadUint16x8Slice.
LoadUint32x16 loads a Uint32x16 from an array
LoadUint32x16Slice loads a Uint32x16 from a slice of at least 16 uint32s
LoadUint32x16SlicePart loads a Uint32x16 from the slice s.
If s has fewer than 16 elements, the remaining elements of the vector are filled with zeroes.
If s has 16 or more elements, the function is equivalent to LoadUint32x16Slice.
LoadUint32x4 loads a Uint32x4 from an array
LoadUint32x4Slice loads a Uint32x4 from a slice of at least 4 uint32s
LoadUint32x4SlicePart loads a Uint32x4 from the slice s.
If s has fewer than 4 elements, the remaining elements of the vector are filled with zeroes.
If s has 4 or more elements, the function is equivalent to LoadUint32x4Slice.
LoadUint32x8 loads a Uint32x8 from an array
LoadUint32x8Slice loads a Uint32x8 from a slice of at least 8 uint32s
LoadUint32x8SlicePart loads a Uint32x8 from the slice s.
If s has fewer than 8 elements, the remaining elements of the vector are filled with zeroes.
If s has 8 or more elements, the function is equivalent to LoadUint32x8Slice.
LoadUint64x2 loads a Uint64x2 from an array
LoadUint64x2Slice loads a Uint64x2 from a slice of at least 2 uint64s
LoadUint64x2SlicePart loads a Uint64x2 from the slice s.
If s has fewer than 2 elements, the remaining elements of the vector are filled with zeroes.
If s has 2 or more elements, the function is equivalent to LoadUint64x2Slice.
LoadUint64x4 loads a Uint64x4 from an array
LoadUint64x4Slice loads a Uint64x4 from a slice of at least 4 uint64s
LoadUint64x4SlicePart loads a Uint64x4 from the slice s.
If s has fewer than 4 elements, the remaining elements of the vector are filled with zeroes.
If s has 4 or more elements, the function is equivalent to LoadUint64x4Slice.
LoadUint64x8 loads a Uint64x8 from an array
LoadUint64x8Slice loads a Uint64x8 from a slice of at least 8 uint64s
LoadUint64x8SlicePart loads a Uint64x8 from the slice s.
If s has fewer than 8 elements, the remaining elements of the vector are filled with zeroes.
If s has 8 or more elements, the function is equivalent to LoadUint64x8Slice.
LoadUint8x16 loads a Uint8x16 from an array
LoadUint8x16Slice loads a Uint8x16 from a slice of at least 16 uint8s
LoadUint8x16SlicePart loads a Uint8x16 from the slice s.
If s has fewer than 16 elements, the remaining elements of the vector are filled with zeroes.
If s has 16 or more elements, the function is equivalent to LoadUint8x16Slice.
LoadUint8x32 loads a Uint8x32 from an array
LoadUint8x32Slice loads a Uint8x32 from a slice of at least 32 uint8s
LoadUint8x32SlicePart loads a Uint8x32 from the slice s.
If s has fewer than 32 elements, the remaining elements of the vector are filled with zeroes.
If s has 32 or more elements, the function is equivalent to LoadUint8x32Slice.
LoadUint8x64 loads a Uint8x64 from an array
LoadUint8x64Slice loads a Uint8x64 from a slice of at least 64 uint8s
LoadUint8x64SlicePart loads a Uint8x64 from the slice s.
If s has fewer than 64 elements, the remaining elements of the vector are filled with zeroes.
If s has 64 or more elements, the function is equivalent to LoadUint8x64Slice.
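Every Slice/SlicePart pair above follows the same pattern: the Slice form requires a full-length slice, while the SlicePart form tolerates a short one and zero-fills the missing tail lanes. A hedged sketch of the difference, assuming the signatures implied by the descriptions (a []uint32 argument returning a Uint32x4); building requires GOEXPERIMENT=simd:

```go
// Sketch: Slice vs SlicePart loads. Signatures are assumed from the
// doc text above, not verified against the real experimental API.
package main

import "simd/archsimd"

func main() {
	full := []uint32{1, 2, 3, 4, 5}
	tail := full[4:] // only 1 element

	// Slice form: s must have at least 4 elements.
	_ = archsimd.LoadUint32x4Slice(full)

	// SlicePart form: short slices are allowed; lanes 1..3 are
	// zero-filled here, which makes it the natural choice for
	// handling the final partial chunk of a loop.
	_ = archsimd.LoadUint32x4SlicePart(tail)
}
```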
Mask16x16FromBits constructs a Mask16x16 from a bitmap value, where 1 means set for the indexed element, 0 means unset.
Asm: KMOVW, CPU Feature: AVX512
Mask16x32FromBits constructs a Mask16x32 from a bitmap value, where 1 means set for the indexed element, 0 means unset.
Asm: KMOVW, CPU Feature: AVX512
Mask16x8FromBits constructs a Mask16x8 from a bitmap value, where 1 means set for the indexed element, 0 means unset.
Asm: KMOVW, CPU Feature: AVX512
Mask32x16FromBits constructs a Mask32x16 from a bitmap value, where 1 means set for the indexed element, 0 means unset.
Asm: KMOVD, CPU Feature: AVX512
Mask32x4FromBits constructs a Mask32x4 from a bitmap value, where 1 means set for the indexed element, 0 means unset.
Only the lower 4 bits of y are used.
Asm: KMOVD, CPU Feature: AVX512
Mask32x8FromBits constructs a Mask32x8 from a bitmap value, where 1 means set for the indexed element, 0 means unset.
Asm: KMOVD, CPU Feature: AVX512
Mask64x2FromBits constructs a Mask64x2 from a bitmap value, where 1 means set for the indexed element, 0 means unset.
Only the lower 2 bits of y are used.
Asm: KMOVQ, CPU Feature: AVX512
Mask64x4FromBits constructs a Mask64x4 from a bitmap value, where 1 means set for the indexed element, 0 means unset.
Only the lower 4 bits of y are used.
Asm: KMOVQ, CPU Feature: AVX512
Mask64x8FromBits constructs a Mask64x8 from a bitmap value, where 1 means set for the indexed element, 0 means unset.
Asm: KMOVQ, CPU Feature: AVX512
Mask8x16FromBits constructs a Mask8x16 from a bitmap value, where 1 means set for the indexed element, 0 means unset.
Asm: KMOVB, CPU Feature: AVX512
Mask8x32FromBits constructs a Mask8x32 from a bitmap value, where 1 means set for the indexed element, 0 means unset.
Asm: KMOVB, CPU Feature: AVX512
Mask8x64FromBits constructs a Mask8x64 from a bitmap value, where 1 means set for the indexed element, 0 means unset.
Asm: KMOVB, CPU Feature: AVX512
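All of the FromBits constructors share one convention: bit i of the argument controls element i of the mask. A hedged sketch (the argument type used here is an assumption; the KMOV* instructions listed above require AVX512, and the package only builds under GOEXPERIMENT=simd):

```go
// Sketch: building a mask from a bit pattern. The argument type is
// assumed, not verified against the real experimental API.
package main

import "simd/archsimd"

func main() {
	// 0b0101: elements 0 and 2 are set, elements 1 and 3 unset.
	// Per the doc above, only the lower 4 bits are used.
	m := archsimd.Mask32x4FromBits(0b0101)
	_ = m
}
```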
The pages are generated with Golds v0.8.3-preview. (GOOS=linux GOARCH=amd64)
Golds is a Go 101 project developed by Tapir Liu.