Visualize how transformers use key-value (KV) caching during autoregressive generation
Interactive visualization of BPE tokenization for GPT-4 and GPT-4o models