GPU Information

This document lists candidate data-centre-grade GPUs and the Large Language Models (LLMs) each could support. The table considers only the theoretical vRAM requirement of each model and marks which GPUs should have enough memory to run it for inference.

Note: The information here says nothing about the throughput or general performance of each model on each GPU.

Models / GPU

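| Model | Params (billions) | Bit Quantization | vRAM (GB) | A40 (48GB) | 2 x A40 (96GB) | A100 (80GB) | 2 x A100 (160GB) | H100 NVL (94GB) | 2 x H100 NVL (188GB) |
|---|---|---|---|---|---|---|---|---|---|
| `llama3.1:8B` | 8 | 4 | 4.47 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| | 8 | 8 | 8.94 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| | 8 | 16 | 17.88 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| `llama3.1:70B` | 70 | 4 | 39.12 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| | 70 | 8 | 78.23 | | ✓ | ✓ | ✓ | ✓ | ✓ |
| | 70 | 16 | 156.46 | | | | ✓ | | ✓ |
| `gemma2:9B` | 9 | 4 | 5.03 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| | 9 | 8 | 10.06 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| | 9 | 16 | 20.12 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| `gemma2:27B` | 27 | 4 | 15.09 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| | 27 | 8 | 30.17 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| | 27 | 16 | 60.35 | | ✓ | ✓ | ✓ | ✓ | ✓ |
| `mistral` | 7 | 4 | 3.91 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| | 7 | 8 | 7.82 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| | 7 | 16 | 15.65 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| `mistral-nemo` | 12 | 4 | 6.71 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| | 12 | 8 | 13.41 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| | 12 | 16 | 26.82 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| `mistral-small` | 22 | 4 | 12.29 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| | 22 | 8 | 24.59 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| | 22 | 16 | 49.17 | | ✓ | ✓ | ✓ | ✓ | ✓ |
| `mistral-large` | 123 | 4 | 68.73 | | ✓ | ✓ | ✓ | ✓ | ✓ |
| | 123 | 8 | 137.46 | | | | ✓ | | ✓ |
| | 123 | 16 | 274.93 | | | | | | |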

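Note: The vRAM requirement is an estimate calculated from the number of parameters and the quantization used, plus a 20% overhead, i.e. vRAM = ((No. params x quantization bits) / 8) x 1.2. The formula yields bytes; the table reports the result in units of 2^30 bytes, so 8 billion parameters at 4 bits gives 4.8 x 10^9 bytes, which appears in the table as 4.47.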
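The estimate can be reproduced with a few lines of Python. This is a minimal sketch, not part of the original document: the names `estimate_vram`, `gpus_that_fit` and `GPU_MEMORY_GB` are illustrative, and the fit check assumes that comparing the estimate directly against the capacity labels in the table header is how the ticks were derived (it reproduces the rows checked below).

```python
GIB = 1024 ** 3  # bytes per 2**30-byte gigabyte, the unit used in the vRAM column

# GPU memory capacities as labelled in the table header.
GPU_MEMORY_GB = {
    "A40": 48,
    "2 x A40": 96,
    "A100": 80,
    "2 x A100": 160,
    "H100 NVL": 94,
    "2 x H100 NVL": 188,
}


def estimate_vram(params_billions: float, bits: int, overhead: float = 0.2) -> float:
    """Estimate inference vRAM: weight bytes plus overhead, in units of 2**30 bytes."""
    weight_bytes = params_billions * 1e9 * bits / 8
    return weight_bytes * (1 + overhead) / GIB


def gpus_that_fit(params_billions: float, bits: int) -> list[str]:
    """Return the GPU configurations whose labelled capacity covers the estimate.

    Assumption: the estimate is compared directly against the capacity label.
    """
    need = estimate_vram(params_billions, bits)
    return [name for name, capacity in GPU_MEMORY_GB.items() if need <= capacity]


if __name__ == "__main__":
    for params, bits in [(8, 4), (22, 16), (70, 16), (123, 16)]:
        need = round(estimate_vram(params, bits), 2)
        print(f"{params}B @ {bits}-bit: {need} -> {gpus_that_fit(params, bits)}")
```

Changing the `overhead` argument shows how sensitive the borderline rows are: `mistral-small` at 16 bits, for instance, only just misses a single A40 with the 20% figure.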
