
What is a GPU?

A Graphics Processing Unit is a chip designed to do thousands of simple calculations at the same time. Originally built for rendering pixels, it is now the engine behind AI and scientific computing.

What it is

A GPU (Graphics Processing Unit) is a specialized processor designed to handle thousands of operations simultaneously. Think of it as a factory with 16,000 workers who can each do simple math — compared to a CPU, which is more like 16 expert engineers who can each solve complex problems.

GPUs were originally built to render graphics — calculating the color of millions of pixels 60 times per second. That requires doing the same math on different data, over and over, in parallel. This is called SIMD (Single Instruction, Multiple Data).

It turns out that training AI models has the same pattern: multiply millions of numbers together, add them up, repeat billions of times. That's why GPUs became the engine behind modern AI.
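
To make the pattern concrete, here is the same "adjust every pixel" step written two ways in Python with NumPy. This is only a sketch of the programming pattern (NumPy runs on the CPU); on a GPU, thousands of cores would each take a slice of the array:

import numpy as np

# Scalar thinking: one operation at a time, like a single CPU core.
def brighten_scalar(pixels, amount):
    out = []
    for p in pixels:          # one pixel per loop iteration
        out.append(p + amount)
    return out

# SIMD thinking: one instruction applied to every element at once.
def brighten_simd(pixels, amount):
    return pixels + amount    # the add is broadcast across all elements

pixels = np.random.randint(0, 200, size=10_000, dtype=np.int32)
assert np.array_equal(brighten_simd(pixels, 50),
                      np.array(brighten_scalar(pixels, 50)))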

How a GPU works

A modern GPU like the NVIDIA H100 contains 132 Streaming Multiprocessors (SMs), each with 128 CUDA cores. That's 16,896 cores total. Each core is simple — it can do one floating-point operation per clock cycle. On their own, the CUDA cores deliver roughly 67 TFLOPS of FP32 compute; the headline figure of 989.5 trillion FP16 operations per second (989.5 TFLOPS) comes from the chip's 528 specialized Tensor Cores working alongside them.
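
A rough sanity check on these numbers (a sketch: the ~1.98 GHz boost clock is the published H100 SXM5 figure, and one fused multiply-add counts as two floating-point operations):

# Peak FP32 throughput from the CUDA cores alone:
cuda_cores = 16_896
flops_per_core_per_clock = 2        # one fused multiply-add = 2 FLOPs
boost_clock_hz = 1.98e9             # ~1.98 GHz boost clock (published spec)

fp32_tflops = cuda_cores * flops_per_core_per_clock * boost_clock_hz / 1e12
print(f"{fp32_tflops:.0f} TFLOPS FP32")  # ~67 TFLOPS; the Tensor Cores
# supply the remaining ~15x jump to 989.5 TFLOPS at FP16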

GPUs also have their own dedicated memory called VRAM (Video RAM). The H100 has 80 GB of HBM3 memory with 3.35 TB/s bandwidth. This is separate from the system RAM — the GPU needs fast, local memory because it processes data so quickly that waiting for system memory would create a bottleneck.
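
Those two numbers imply a useful lower bound: any computation that touches every byte of VRAM once cannot finish faster than the memory can deliver the data. A quick sketch:

vram_bytes = 80e9            # 80 GB HBM3
bandwidth_bps = 3.35e12      # 3.35 TB/s

# Time to read all of VRAM once — a lower bound for any pass that
# touches every byte (e.g., streaming a model's weights):
t = vram_bytes / bandwidth_bps
print(f"{t * 1000:.1f} ms per full sweep")   # ~23.9 ms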

Key numbers — NVIDIA H100 SXM5
Cores: 16,896 CUDA + 528 Tensor Cores
VRAM: 80 GB HBM3
Bandwidth: 3.35 TB/s
FP16 TFLOPS: 989.5
TDP: 700W

Why it matters for cloud compute

When you rent a GPU cloud instance, you're renting access to one or more GPUs attached to a server. The GPU is usually the most expensive component — an 8×H100 instance on AWS costs ~$98/hour, and most of that cost is the GPUs.

Different GPUs have wildly different capabilities. An A100 and an H100 are both NVIDIA data center GPUs, but the H100 has roughly 3× the AI performance. Understanding GPU specs lets you pick the right hardware for your workload and avoid overpaying.
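
One way to frame that choice is price per unit of compute. A sketch using the ~$98/hour figure above and published dense FP16 Tensor Core throughput; the 8×A100 price is an assumption for illustration, and real prices vary by provider and region:

# Dollars per PFLOP-hour of FP16 compute (illustrative prices only).
instances = {
    "8xH100": {"price_per_hr": 98.0, "tflops_fp16": 8 * 989.5},
    "8xA100": {"price_per_hr": 33.0, "tflops_fp16": 8 * 312.0},  # assumed price
}

for name, spec in instances.items():
    pflops = spec["tflops_fp16"] / 1000
    print(f"{name}: ${spec['price_per_hr'] / pflops:.2f} per PFLOP-hour")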

This is exactly what the GIS specification captures — every GPU cloud offering described in a single, comparable JSON document.

How it appears in GIS

The gpu section of a GIS document describes the GPU hardware:

{
  "gpu": {
    "model": "nvidia-h100",
    "variant": "sxm5",
    "count": 1,
    "vram_gb": 80,
    "tflops_fp16": 989.5,
    "memory_bandwidth_tbps": 3.35,
    "interconnect": "nvlink",
    "architecture": "hopper"
  }
}

Every field maps to a concept you can learn about in this knowledge base: model → GPU Model Names; vram_gb → What is VRAM?; tflops_fp16 → What are TFLOPS?
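
As a minimal sketch, here is how you might read those fields in Python and derive a rough compute-to-bandwidth ratio (the filename is hypothetical, and the ratio is illustrative, not part of the spec):

import json

with open("offering.json") as f:     # hypothetical GIS document
    gpu = json.load(f)["gpu"]

# FLOPs available per byte of memory bandwidth: high values favor
# compute-bound workloads, low values favor memory-bound ones.
flops_per_byte = (gpu["tflops_fp16"] * 1e12) / (gpu["memory_bandwidth_tbps"] * 1e12)
print(f"{gpu['count']}x {gpu['model']} ({gpu['variant']}): "
      f"{gpu['vram_gb']} GB VRAM, {flops_per_byte:.0f} FLOPs per byte")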

Deep dive: GPU architecture

Modern NVIDIA GPUs are organized hierarchically:

  • GPC (Graphics Processing Cluster) — the top-level grouping
  • TPC (Texture Processing Cluster) — contains 2 SMs
  • SM (Streaming Multiprocessor) — the fundamental compute unit, contains CUDA cores, Tensor Cores, and shared memory
  • CUDA Core — executes one floating-point or integer operation per clock
  • Tensor Core — specialized for matrix multiply-accumulate (the core operation in neural networks)

For AI workloads, Tensor Cores are what matter most. They perform a small matrix multiply-accumulate (a 4×4 tile, in the original design) as a single operation, which is orders of magnitude faster than issuing the multiplies one at a time on CUDA cores. The H100's Tensor Cores support FP8 precision, enabling even faster training and inference.
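
To make "matrix multiply-accumulate" concrete, here is the operation a Tensor Core performs as one instruction, written out in NumPy (a sketch of the math, not of any hardware API):

import numpy as np

# One Tensor Core op: D = A @ B + C on a small tile.
# Inputs are low precision (e.g., FP16); the accumulator is wider (FP32).
A = np.random.rand(4, 4).astype(np.float16)
B = np.random.rand(4, 4).astype(np.float16)
C = np.zeros((4, 4), dtype=np.float32)

D = A.astype(np.float32) @ B.astype(np.float32) + C   # multiply-accumulate

# Done with CUDA cores, this tile needs 4*4*4 = 64 multiplies plus adds,
# issued individually; a Tensor Core handles the whole tile per instruction.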

The architecture generation (Hopper, Ampere, etc.) determines which features are available. Each generation typically brings new Tensor Core capabilities, more memory bandwidth, and better interconnects.

Key takeaways
  • A GPU has thousands of small cores optimized for parallel work
  • CPUs have fewer, more powerful cores for sequential tasks
  • GPUs dominate AI/ML because training is massively parallel
  • In GIS, the gpu section describes the GPU hardware in a cloud instance
  • Key GPU specs: model, VRAM, TFLOPS, memory bandwidth, interconnect