An independent LLM researcher with experience in distributed systems and machine learning, currently investigating approaches to shrinking the KVCache. Previously worked at Microsoft and AWS.
-
Reducing the KVCache's Footprint
- Implemented a Large Concept Model with a jointly trained encoder/decoder for KVCache compression
- Explored a shared projection for K and V as an alternative to GQA
- Implemented multi-head latent attention and MixAttention, two approaches to shrinking the KVCache during inference
- Identified patterns across heads that sparsely access V, requiring no additional training
- Investigated calibrating cluster centers for Q/K to partition and sparsify KVCache access
-
Hyperparameter Transfer
- Compared the hyperparameter-transfer approaches described by Yang et al. (2022) and Everett et al. (2024), reproducing Everett et al.'s finding that standard parameterization with scaling exponents outperforms muP when transferring hyperparameters from 37M to 1B parameters
- Implemented the power scheduler from Shen et al. (2024) to optimize batch size and training length
-
Other work
- Implemented Llama 3 and Gemma 2 within muGPT
- Designed a synthetic benchmark to assess the impact of position embeddings (RoPE, ALiBi, CoPE, NoPE) and attention modifications (e.g., KVCache reuse, multi-head latent attention) on long-context performance
- Languages: Python, C++, TypeScript
- Frameworks: JAX, Pallas, PyTorch, TensorFlow
- Cloud & Infrastructure: Google TPUs, AWS Lambda, CloudFormation
- Areas of Expertise: Transformers, Machine Learning Infrastructure, Distributed Systems
- BS in Computer Science from the University of Maryland, College Park (2021)
- Graduated in three years with the President's Scholarship
- Thomas Jefferson High School for Science and Technology (2018)