Contiguity (Row-Major Order, Fortran & MATLAB Column-Major Order, .stride, Traversing Memory to Print Tensor, data_ptr, What is Contiguous, Importance & Benefits, Common Operations for Contiguity & Non-Contiguity)
I stopped to fix index_select, then need to finish vdot, then atomicAdd.
Dimension Reordering Operations
Summary of Multiplication and Product Functions in PyTorch
| Multiplication Name | Function | Explanation | Symbol | Vector or Matrix |
|---|---|---|---|---|
| Tensor Multiplication | `.matmul`, `@` | Performs matrix multiplication or batch matrix multiplication | @ | Vectors/Matrices |
| Matrix Multiplication | `.mm` | Performs matrix multiplication only on 2D matrices | —— | 2D Matrices |
| Hadamard Product / Element-Wise Multiplication | `.mul`, `.multiply`, `*` | Performs element-wise multiplication | ⊙ | Vectors/Matrices |
| Dot Product / Inner Product / Scalar Product | `.dot`, `.inner` | `.dot` accepts vectors only; `.inner` also accepts matrices but contracts over the last dim only | · | Vectors/Matrices |
| Outer Product | `.outer` | —— | ⊗ | Vectors only |
| Kronecker Product | `.kron` | Generalization of the outer product from vectors to matrices | ⊗ | Vectors/Matrices |
| Cross Product | `.cross` | —— | × | Vectors only |
| Cartesian Product | `.cartesian_prod` | Returns a 2D tensor of all combination pairs (no multiplication) | × | Vectors only |
To use any of these operations, both tensors must be of the same data type.
- Even if one is float32 and the other is float64, PyTorch will raise a RuntimeError about mismatched dtypes.
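A minimal sketch of the failure and the usual fix (casting with `.to()`); the exact error message may vary by PyTorch version:

```python
import torch

a = torch.randn(2, 3)                        # float32 (default)
b = torch.randn(3, 2, dtype=torch.float64)   # float64

# a @ b  # RuntimeError: tensors must have the same dtype

c = a @ b.to(torch.float32)  # cast first, then multiply
```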
✅ Best Practice:
Use float32 unless you explicitly need:
- Mixed precision (float16) for speed
- Double precision (float64) for numerical stability
- Complex numbers for Fourier or signal ops
- More data types, like torch.bfloat16 or torch.qint8, are listed in the PyTorch documentation on data types.
Kernel Fusion: the primary benefit is saving memory bandwidth. Fused operations reduce the number of memory read/write cycles, leading to significant speedups.
- `addr` // outer product, then add to the matrix input (fused)
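A quick sketch of `torch.addr`, which fuses the outer product and the addition (it computes input + vec1 ⊗ vec2, with optional `beta`/`alpha` scaling):

```python
import torch

M = torch.zeros(4, 3)
v1 = torch.arange(1., 5.)   # size [4]
v2 = torch.arange(1., 4.)   # size [3]

# One fused call instead of torch.outer(v1, v2) followed by a separate add
out = torch.addr(M, v1, v2)  # M + v1 ⊗ v2, shape [4, 3]
```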
I still need to complete tensordot, cross, outer, kronecker, vdot, and so on…
torch.nn.Linear
torch.vdot
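This section is still a stub (see the to-do note above); as a minimal sketch, `torch.vdot` is a 1D dot product that conjugates its first argument, which only matters for complex dtypes:

```python
import torch

a = torch.tensor([1 + 2j, 3 - 1j])
b = torch.tensor([2 + 0j, 1 + 1j])

torch.vdot(a, b)  # conj(a) · b  (conjugates the first argument)
torch.dot(a, b)   # a · b        (no conjugation)
```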
.outer ⊗ (Outer Product)
- It’s very SIMPLE.
- It takes only 1-dimensional tensors.

- If `input` is a vector of size n and `vec2` is a vector of size m, then `out` must be a matrix of size (n × m).

```python
v1 = torch.tensor((1, 2, 3, 4))  # Size [4] --> treated as [4, 1]
v2 = torch.tensor((1, 2, 3))     # Size [3] --> treated as [1, 3]
torch.outer(v1, v2)              # Size [4, 3]
```

- If we have vectors $u$ (size n) and $v$ (size m), then $u \otimes v$ is a matrix with shape $(n \times m)$.
🔑 The outer product is the same as writing $u v^{\top}$ in matrix format. However, when working with 1D vectors, we typically omit the transpose for simplicity, since ⊗ already implies an outer product.
- To do the outer product using `matmul`, we can do the following:

```python
v1 = torch.tensor((1, 2, 3, 4)).reshape(4, 1)  # written in matrix format
v2 = torch.tensor((1, 2, 3)).reshape(3, 1)     # written in matrix format
outer = v1 @ v2.T                              # shape [4, 3]
```

```python
a = torch.tensor([[1], [2], [3]])
b = torch.tensor([[1], [2], [3]])

# INNER PRODUCT: aᵀ @ b → scalar
inner_product = torch.matmul(a.T, b)

# OUTER PRODUCT: a @ bᵀ → 3×3 matrix
outer_product = torch.matmul(a, b.T)
```
Outer product is symmetric up to transposition: $(u \otimes v)^{\top} = v \otimes u$.
Low-Rank Matrix Factorization
- It's a way to approximate a big matrix using two smaller matrices — so you save space, reduce noise, or learn meaningful patterns.
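A minimal sketch of this idea, using `torch.linalg.svd`: a rank-k approximation is a sum of k outer products of singular vectors, scaled by the singular values:

```python
import torch

M = torch.randn(100, 80)
U, S, Vh = torch.linalg.svd(M, full_matrices=False)

k = 10  # keep the top-10 components
# Sum of k outer products: M ≈ Σ s_i * (u_i ⊗ v_i)
M_approx = sum(S[i] * torch.outer(U[:, i], Vh[i, :]) for i in range(k))

# Storage: 100*80 = 8000 values vs. k*(100 + 80 + 1) = 1810 values
print(torch.linalg.matrix_norm(M - M_approx))  # approximation error
```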
Use Cases:
- Constructing Covariance Matrices
- Remember that $\Sigma = \mathbb{E}\big[(x - \mu)(x - \mu)^{\top}\big]$, and each term $(x - \mu)(x - \mu)^{\top}$ is an outer product.
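A small sketch of this view: the sample covariance is the average of per-sample outer products (ignoring the n−1 bias correction here for simplicity):

```python
import torch

X = torch.randn(500, 3)   # 500 samples, 3 features
mu = X.mean(dim=0)
Xc = X - mu               # centered samples

# Average of outer products, one per sample
cov = sum(torch.outer(x, x) for x in Xc) / len(Xc)

# Same thing in one shot
cov_fast = Xc.T @ Xc / len(Xc)
print(torch.allclose(cov, cov_fast, atol=1e-5))  # True
```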
.kron ⊗ (Kronecker Product) (Matrix Direct Product) (Generalized Outer Product)
- The Kronecker product (denoted $\otimes$) is a generalization of the outer product from vectors to matrices.
- Given an $m \times n$ matrix $A$ and a $p \times q$ matrix $B$, the Kronecker product $A \otimes B$ will be a matrix with a shape of $(mp) \times (nq)$.
```python
A = torch.tensor([[1, 2],
                  [3, 4]])
B = torch.tensor([[0, 5],
                  [6, 7]])
torch.kron(A, B)  # shape (4, 4)

mat1 = torch.eye(2)
mat2 = torch.arange(1, 5).reshape(2, 2)
torch.kron(mat1, mat2)  # block-diagonal: mat2 in the top-left and bottom-right blocks
```
The Kronecker product of two matrices A and B means: “Take every number in matrix A, and replace it with that number multiplied by all of matrix B.”
- In Matrix-Matrix Multiplication, we learned that each element of $C = AB$ is $c_{ij} = \sum_k a_{ik} b_{kj}$.
- In the Kronecker product, $(A \otimes B)$ is built from blocks: the $(i, j)$ block is $a_{ij} B$.
- It means: move to the block at position $(i, j)$ of the output, and fill it with $a_{ij}$ times the whole matrix $B$.
- ✳️ If one tensor has fewer dimensions —> PyTorch will automatically add extra dimensions (using `.unsqueeze()`) to the smaller one.

```python
a = torch.tensor([[1, 2]])   # shape: (1, 2)
b = torch.tensor([3, 4, 5])  # shape: (3,)
result = torch.kron(a, b)    # PyTorch treats b as shape (1, 3)
# Final result shape: (1*1, 2*3) = (1, 6)
```
Basic Properties
- It does not matter where we place multiplication with a scalar: $\lambda (A \otimes B) = (\lambda A) \otimes B = A \otimes (\lambda B)$
- Taking the transpose before carrying out the Kronecker product yields the same result as doing so afterwards: $(A \otimes B)^{\top} = A^{\top} \otimes B^{\top}$
- The Kronecker product is associative: $(A \otimes B) \otimes C = A \otimes (B \otimes C)$
- The Kronecker product is right-distributive: $(A + B) \otimes C = (A \otimes C) + (B \otimes C)$
- The Kronecker product is left-distributive: $A \otimes (B + C) = (A \otimes B) + (A \otimes C)$
- The product of two Kronecker products yields another Kronecker product: $(A \otimes B)(C \otimes D) = (AC) \otimes (BD)$
- The trace of the Kronecker product of two matrices is the product of the traces of the matrices: $\operatorname{tr}(A \otimes B) = \operatorname{tr}(A)\operatorname{tr}(B)$
- The trace of a square matrix is the sum of its diagonal elements.
Use Cases:
- Repeating a Pattern (Matrix Tiling)
- You’re building a large image or grid that follows a pattern — like tiles on a floor.
- Imagine a pattern like

```python
[[0, 1],
 [1, 0]]
```

and you want to repeat it across a 4×4 area, scaled differently in each region (brightness, weights, …).
- Think of it like: "Take this small checkerboard and copy it multiple times into a bigger grid, but scale each copy differently."

```python
# Tile pattern (e.g., black/white checkerboard)
tile = torch.tensor([[0, 1],
                     [1, 0]], dtype=torch.float32)

# Control pattern (e.g., brightness of each region)
pattern = torch.tensor([[1, 2],
                        [3, 4]], dtype=torch.float32)

# Kronecker product to scale and tile: each pattern entry scales one copy of the tile
big = torch.kron(pattern, tile)  # shape (4, 4)
```
- Combining Systems (Quantum / State Expansion)
- You have two small systems (like coin flips or binary states), and want to model the combined system.
- You want to simulate both together as one big system with 4 states (2 × 2).
- This is how quantum computing combines qubits.

```python
# 2-state systems (e.g., [1, 0] is "on", [0, 1] is "off")
sys_A = torch.tensor([[1.], [0.]])  # A is ON
sys_B = torch.tensor([[0.], [1.]])  # B is OFF

# Combined system (A ⊗ B): shape (4, 1), one entry per joint state
combined = torch.kron(sys_A, sys_B)
```
- Building Smart Weight Matrices (Machine Learning) 🧠
- You’re training a neural network, and one of your layers has a huge matrix of weights.
- You realize:
- It’s slow
- Takes too much memory
- But most of the values are based on a smaller pattern.
- You can use the Kronecker product to build that big matrix from small ones.

```python
# Base patterns
A = torch.tensor([[1, 2],
                  [3, 4]], dtype=torch.float32)
B = torch.tensor([[0.1, 0.2],
                  [0.3, 0.4]], dtype=torch.float32)

# Build a large structured weight matrix
W = torch.kron(A, B)  # W is 4x4

# Input to multiply (size must match)
x = torch.randn(4)

# Apply the layer
y = W @ x
```
- Efficient Neural Network Layer
- A paper called “Beyond Fully-Connected Layers with Quaternions: Parameterization of Hypercomplex Multiplications with 1/n Parameters” discusses this idea.
- Instead of training normal FC layers, we will have a PHM (Parameterized Hypercomplex Multiplication) layer.
- Each layer will have the weight matrix $H = \sum_{i=1}^{n} A_i \otimes S_i$, which is a sum of Kronecker products, and such a construction reduces the number of parameters to roughly $1/n$.
- “Instead of learning 1000 values, I’ll learn 10 values and repeat them smartly to build the big thing.”
Manual Implementation of Kronecker Product

```python
def manual_kron(A: torch.Tensor, B: torch.Tensor) -> torch.Tensor:
    a_rows, a_cols = A.shape
    b_rows, b_cols = B.shape
    # Shape of the output: (a_rows * b_rows, a_cols * b_cols)
    C = torch.zeros((a_rows * b_rows, a_cols * b_cols), dtype=A.dtype)
    for i in range(a_rows):
        for j in range(a_cols):
            # Multiply the scalar A[i, j] by the full matrix B
            C[i*b_rows:(i+1)*b_rows, j*b_cols:(j+1)*b_cols] = A[i, j] * B
    return C

# Note that we loop over matrix A, and for every element of it, we generate a block
```
Fully Vectorized Kronecker Product (2D matrices only)
- (Side note on NumPy: `axes=([1, 2], [1, 2])` tells `np.tensordot` to sum over the 2nd and 3rd dimensions of both arrays.)
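A minimal vectorized sketch using broadcasting (my own reconstruction, since the original block was lost in export): insert size-1 axes so every element of A lines up against all of B, then collapse the block axes:

```python
import torch

def vectorized_kron(A: torch.Tensor, B: torch.Tensor) -> torch.Tensor:
    a_rows, a_cols = A.shape
    b_rows, b_cols = B.shape
    # (a_rows, 1, a_cols, 1) * (1, b_rows, 1, b_cols)
    #   -> (a_rows, b_rows, a_cols, b_cols), then merge the block axes
    blocks = A[:, None, :, None] * B[None, :, None, :]
    return blocks.reshape(a_rows * b_rows, a_cols * b_cols)

# Sanity check against the built-in
A = torch.tensor([[1, 2], [3, 4]])
B = torch.tensor([[0, 5], [6, 7]])
print(torch.equal(vectorized_kron(A, B), torch.kron(A, B)))  # True
```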
.tensordot (Tensor Dot Product) (Generalized Contraction) HARD
- Many PyTorch users get stuck when they have to move beyond simple 2D matrix multiplication (`torch.matmul` or `@`) into 3D, 4D, or 5D tensors.

But what happens when you are doing deep learning and you hit 4D image tensors? Suddenly, you find yourself desperately using `.view()`, `.permute()`, and `.transpose()` just to get dimensions to line up so you can use `matmul`. It's messy, error-prone, and hard to read six months later.

What if I told you there is a single function that handles multiplying complex tensors without ever needing to reshape them first? Today, we are mastering `torch.tensordot`.

- What is Tensor Contraction?
- Don't let the name scare you. "Contraction" just means we are choosing specific dimensions (axes) from two different tensors, multiplying the elements along those axes, and summing them up.
- Because we sum them up, those dimensions "contract" —> they disappear from the final output.
- `.tensordot` takes three main arguments: your first tensor, your second tensor, and, most importantly, the `dims` parameter.
- Basic syntax: `torch.tensordot(A, B, dims)`
- The magic happens in `dims`:
  - A single integer n: contract the last n axes of A with the first n axes of B.
  - A tuple of two lists: explicit axes from A and B, `dims=([List A], [List B])`.
    - List A: the indices of the dimensions in the first tensor you want to contract.
    - List B: the indices of the dimensions in the second tensor you want to contract against them.
Case 0: Dot Product `.dot`

```python
A = torch.tensor([1, 2, 3])  # (3,)
B = torch.tensor([4, 5, 6])  # (3,)
result = torch.tensordot(A, B, dims=1)  # contract the 3 from A with the 3 from B
print(result)  # tensor(32)
```

Case 1: 2D Matrix Multiplication `.mm`
```python
A = torch.randn(3, 4)
B = torch.randn(4, 5)
torch.tensordot(A, B, dims=1).size()  # torch.Size([3, 5])
```

Case 2: dims = 0 → Outer Product
- dims = 0 means: “do NOT contract anything.”
- No summation, just multiplication one by one.
```python
A = torch.randn(4)
B = torch.randn(5)
torch.tensordot(A, B, dims=0).size()  # torch.Size([4, 5]), same as a normal outer product

# If A = [1, 2, 3, 4]
# and B = [6, 7, 8, 9, 10]
# then the result will be:
# [[ 6,  7,  8,  9, 10],
#  [12, 14, 16, 18, 20],
#  [18, 21, 24, 27, 30],
#  [24, 28, 32, 36, 40]]
```
- Now let’s say I have A of size (3, 4) and B of size (5), and I call `torch.tensordot(A, B, dims=0)`. What will happen?
- Say A is `[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]`
- And B is `[1, 2, 3, 4, 5]`
- Since `dims=0` means no contraction, this is again an outer product, and the result has shape (3, 4, 5).
- We take the first row of A and perform an outer product with B, then the second row of A with B, and so on…
- So, real quick: if `A = torch.randn(3, 4); B = torch.randn(4, 5)`, then `torch.tensordot(A, B, dims=0)` will result in a 4-D tensor by taking the outer product of every element of A with every element of B, which gives a tensor of shape (3, 4, 4, 5).
- Even though both tensors have a matching dimension of size 4, `dims=0` treats them as distinct, independent axes. It does not attempt to align or broadcast them.
- 🔑 With `dims=0`, tensordot generalizes `.outer()`: every element of A is multiplied with every element of B, producing a tensor whose shape is the concatenation of A's shape and B's shape. Don't reach for `dims=([ ], [ ])` to get an outer product; the only correct way to get an outer product with tensordot is `dims=0`.
Case 3 — dimension size 1

```python
A = torch.tensor([[1, 2, 3]])  # shape (1, 3)
B = torch.tensor([[4, 5, 6]])  # shape (1, 3)
result = torch.tensordot(A, B, dims=1)
print(result)
```

- With `dims=1`, we would need to contract the last dimension of A (size 3) against the first dimension of B (size 1). The sizes do not match, so this raises a RuntimeError: contracted dimensions must be equal.
Case 4 — explicit axes (Matrix Multiplication)
- Here, we say: contract the last axis of A (axis 1) with the first axis of B (axis 0).
- It's like normal matrix multiplication.

```python
A = torch.randn(2, 3)
B = torch.randn(3, 2)
result = torch.tensordot(A, B, dims=([1], [0]))  # same as dims=1
print(result)  # Size: [2, 2]
```

Case 5 — explicit axes (Double Contraction) ($A : B$)
- Tensor contractions can be thought of as the higher-dimensional equivalent of matrix-matrix multiplications.
- The symbol ":" means double contraction (also called double dot product), as in $A : B$.
- Double contraction is the tensor-analogue of the dot product, but applied twice.
- Dot product = contract one index from each tensor.
- Double dot product = contract two indices from each tensor.
- A double dot product between two tensors of orders m and n will result in a tensor of order $m + n - 4$, which is `A.dim() + B.dim() - 4`, because 2 axes have been removed from each tensor.

We are not doing matrix multiplication here per se; all we want is tensor contraction. This moves our thinking back to the dot product: vectors of matching lengths that we multiply element by element and sum to get a scalar.
Let’s understand through an example:
- If $A$ is of size $(4, 3, 2)$, which is 4 blocks of $3 \times 2$, and $B$ is of size $(2, 3, 5)$, which is 2 blocks of $3 \times 5$, let's say we want to do `torch.tensordot(A, B, dims=([2, 1], [0, 1]))`.
- You might think: well, the matrices are organized very well for Matrix Multiplication, where we want rows × columns, i.e., $(3 \times 2)$ blocks vs. $(3 \times 5)$ blocks.
- This is the biggest point of confusion with tensor contractions. We are not thinking of this as Matrix Multiplication, but rather as Vector Dot Products.
- We requested contracting using this mapping:

| A axis | B axis | Check |
|---|---|---|
| 2 (size 2) | 0 (size 2) | ✓ matches |
| 1 (size 3) | 1 (size 3) | ✓ matches |

- [IMPORTANT] But how does tensordot actually execute this contraction?
- To perform this operation efficiently, PyTorch (and NumPy) follows a three-step process: Permute → Reshape → Matrix Multiply.
  - You need to study `permute` to understand how it differs from `.reshape`/`.view`.
- Our goal is to have something like $(4, 6)$ @ $(6, 5)$, so applying `@` to them is easy.
- The answer that immediately comes to mind is: let's just flatten —> `A.reshape(A.shape[0], -1); B.reshape(-1, B.shape[-1])`. But it's not that simple, and let's understand why this needs `permute` first.
A = torch.randint(low=0, high=10, size=(4, 3, 2)) # integers 0–9 B = torch.randint(low=0, high=10, size=(2, 3, 5)) # integers 0–9 result = torch.tensordot(A, B, dims = ([2, 1], [0, 1])) # Let's say A : (4, 3, 2) tensor([[[0, 1], [5, 7], [9, 9]], [[2, 7], [3, 9], [4, 0]], [[2, 8], [4, 4], [7, 4]], [[1, 0], [5, 4], [8, 4]]]) B: (2, 3, 5) tensor([[[2, 7, 1, 8, 2], [9, 3, 6, 7, 3], [3, 0, 5, 4, 8]], [[3, 9, 1, 5, 1], [2, 6, 7, 7, 5], [5, 5, 1, 5, 2]]]) - In a normal 2D matrix multiplication, we loop
rowbyrowfrom A, andcolumnbycolumnfrom B, then in each iteration, we perform the dot product operation to get one cell value. - We have:
A.shape = (4, 3, 2) # think A[a,j, i]B.shape = (2, 3, 5) # think B[i, j, b]
- We want to perform:
result = torch.tensordot(A, B, dims=([2, 1], [0, 1]))
- We know the output should be of size (4, 5), as two dimensions has been contracted from each.
- Let’s trace
result[0,0]explicitly:a = 0(The cellb = 0i ∈ {0, 1}j ∈ {0, 1, 2}
- Now, since we are in need to double contract, the formula is sum over (i) then over (j) because we are doing
- Let’s perform the
The usefulness of permute and reshape functions is that they allow a contraction between a pair of tensors (which we call a binary tensor contraction) to be recast as a matrix multiplication.
Batch matrix multiplication is a special case of a tensor contraction.
```python
# 1. Permute: bring the contracted axes of both tensors into the same (j, i) order
A_perm = A.permute(0, 1, 2)  # (4, 3, 2), same as the original
B_perm = B.permute(1, 0, 2)  # (3, 2, 5), so its (j, i) axes line up with A's
```
- Flatten (Reshape) basically groups the double contraction into a one-shot operation.
- PyTorch effectively flattens the tensors so the contracted dimensions are grouped on the "inside."
- Flattening/Reshaping A: we keep dim 0 and combine dims 1 and 2 → (4, 6).
- Flattening/Reshaping B: we combine dims 0 and 1 and keep dim 2 → (6, 5).
- Now just multiply, and we get the final result of shape (4, 5).
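Putting steps 2 and 3 into code (a sketch continuing the permute snippet above):

```python
# 2. Reshape: group the contracted axes on the "inside"
A_flat = A_perm.reshape(4, 3 * 2)  # (4, 6)
B_flat = B_perm.reshape(3 * 2, 5)  # (6, 5)

# 3. One matrix multiplication
manual = A_flat @ B_flat           # (4, 5)

print(torch.equal(manual, torch.tensordot(A, B, dims=([2, 1], [0, 1]))))  # True
```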
```python
# Final result: (4, 5)
tensor([[134, 111, 134, 170, 141],
        [ 82, 140, 110, 151,  97],
        [113, 142, 101, 160, 108],
        [ 99,  66, 103, 123, 109]])
```

`.tensordot` generalizes:
- inner product `.dot`
- outer product `.outer`
- matrix multiply `.mm`
- batch matmul `.matmul`
- multi-axis contraction `.tensordot`
- Einstein summation patterns `.einsum`
What is the full geometric intuition behind tensor contractions???
Tensor contraction can happen between a 3D and a 2D tensor (the operands don't have to be of the same order).
https://www.youtube.com/watch?v=RxbL5i8gczg (Log from Tensor Contraction Section).
tensordot vs einsum
The Rule of Thumb
- Default to `einsum`: use it for 95% of your code (modeling code, layers, loss functions). It is self-documenting and handles permutations automatically.
- For readability & documentation: the equation `bij,bjk->bik` documents itself. You can instantly see it is a batch matrix multiplication. The equivalent `tensordot` requires you to mentally map indices to axis numbers.
- `einsum` supports 3+ tensors; `tensordot` is binary.
- Use `tensordot` when you know the contraction pattern and want the fastest GEMM-like (General Matrix Multiply) path.
- Use `tensordot` when you want guaranteed "single matmul" behavior, because a single `einsum` may be decomposed internally into several matmuls, plus adds, plus permutes, plus buffer allocations, …
  - This is because tensordot's behavior is always:
    1. Permute A and B so that the contracted axes are contiguous (if needed).
    2. Reshape both into 2D matrices.
    3. Do one matrix multiplication.
- If you know your contraction is really "one big GEMM", `tensordot` makes that structure explicit and reduces the risk of the framework doing something surprising.
- `tensordot` is constrained —> it only contracts equal-length dimensions.
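A small side-by-side sketch of the same 2D contraction in both spellings:

```python
import torch

A = torch.randn(3, 4)
B = torch.randn(4, 5)

out_td = torch.tensordot(A, B, dims=([1], [0]))  # axes by number
out_es = torch.einsum('ij,jk->ik', A, B)         # axes by name

print(torch.allclose(out_td, out_es))  # True
```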
.repeat (Whole Blocks)
- Similar to `numpy.tile()`
.repeat_interleave (Individual Elements)
- Similar to `numpy.repeat()`
- Repeats elements of a tensor.
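A quick sketch contrasting the two:

```python
import torch

x = torch.tensor([1, 2, 3])

# .repeat tiles whole blocks (like numpy.tile)
print(x.repeat(2))             # tensor([1, 2, 3, 1, 2, 3])

# .repeat_interleave repeats individual elements (like numpy.repeat)
print(x.repeat_interleave(2))  # tensor([1, 1, 2, 2, 3, 3])
```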
Slicing
What we studied so far does not cover any of the torch.nn classes and methods.
- All of the following things can appear in a computation graph, but not all of them are “trainable building blocks”.
- We can divide them into three categories:
- [1] TRAINABLE LAYERS
- [2] OPERATIONAL BUILDING BLOCKS (part of the graph but not trainable)
- [3] SUPPORT / META COMPONENTS (NOT part of the graph)
[1] TRAINABLE LAYERS
These absolutely become part of the real neural network structure.
[2] OPERATIONAL BUILDING BLOCKS
These do create computation graph connections, but they don’t hold weights, so they don’t show up in .parameters().
[3] SUPPORT / META COMPONENTS
These do not create operations in the graph. They are helpers.
- The brain & muscles (trainable modules): Conv, Linear, Embedding, Attention, LSTM…
- The joints and wiring (ops without parameters): ReLU, Pooling, Softmax, Dropout, reshape, matmul…
- The toolbox (utilities): ModuleList, prune, weight_norm, Lazy modules…
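A minimal sketch of the distinction between [1] and [2]: only modules that hold weights show up in `.parameters()`:

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(4, 8),  # [1] trainable layer: holds weight & bias
    nn.ReLU(),        # [2] operational block: in the graph, but no weights
    nn.Linear(8, 2),  # [1] trainable layer
)

# Only the Linear layers contribute parameters
for name, p in model.named_parameters():
    print(name, tuple(p.shape))
# 0.weight (8, 4), 0.bias (8,), 2.weight (2, 8), 2.bias (2,)
```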
torch.gather
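This section is still a stub; as a minimal sketch of the semantics, for `dim=1` the rule is `out[i][j] = input[i][index[i][j]]`:

```python
import torch

src = torch.tensor([[1, 2],
                    [3, 4]])
idx = torch.tensor([[0, 0],
                    [1, 0]])

# out[i][j] = src[i][ idx[i][j] ]  (dim=1 picks along columns)
print(torch.gather(src, dim=1, index=idx))
# tensor([[1, 1],
#         [4, 3]])
```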
Reproducibility (Seeds)
```python
import torch
import random
import numpy as np

seed = 42
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
if torch.cuda.is_available():
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # if using multi-GPU
```

Floating Point Associativity & atomicAdd
Floating-point addition is not “order independent”
For real numbers (math world), we have $(a + b) + c = a + (b + c)$.
But for floating-point numbers, that is not always true because of rounding.
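A tiny demonstration (the classic big-plus-small cancellation case):

```python
import torch

a = torch.tensor(1e20, dtype=torch.float32)
b = torch.tensor(-1e20, dtype=torch.float32)
c = torch.tensor(1.0, dtype=torch.float32)

print((a + b) + c)  # tensor(1.)
print(a + (b + c))  # tensor(0.) -- b + c rounds back to -1e20
```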
