Kaixi Hou's Log

CUDA Tips: nvcc’s -code, -arch, -gencode

3 minute read

Introduction: People are often confused by nvcc's -code, -arch, and -gencode options when compiling CUDA code. Although the official guidance explains the d...

Expected Data Types in Mixed Precision Cheatsheet

1 minute read

When training neural networks with the Keras API, we care about the data types and computation types since they are relevant to the convergence (numeric stab...

Understanding the GeLU Fusion with TF-Grappler Visualization Tool

2 minute read

Introduction: This post focuses on the GELU activation and showcases a debugging tool I created to visualize the TF op graphs. The Gaussian Error Linear Unit,...

Demystifying the BatchNorm-Add-ReLU Fusion

2 minute read

Introduction: My previous post, “Demystifying the Conv-Bias-ReLU Fusion”, introduced a common fusion pattern in deep learning models. This post, on the ot...

Sparse Data Structure: Sorting Indices with Any Sorter + Custom Comparators

3 minute read

Introduction: Recently, I have been working on a project involving sparse tensors in TensorFlow. Sparse tensors are used to represent tensors with many zeros. To sav...
