gpu.tflops_fp16 · Intermediate · 6 min read

What are TFLOPS?

TFLOPS (Tera Floating-Point Operations Per Second) measures how many trillion math operations a GPU can perform each second. It's the raw compute power metric.

What it is

TFLOPS stands for Tera Floating-Point Operations Per Second. 1 TFLOPS = one trillion (10¹²) floating-point math operations every second.

Think of it as the GPU's horsepower rating. A car's horsepower tells you how much mechanical work the engine can do. TFLOPS tells you how much mathematical work the GPU can do.

A floating-point operation is any basic math on decimal numbers: addition, subtraction, multiplication, or division. When a GPU has 990 TFLOPS at FP16, it can perform 990 trillion of these operations every second at half-precision (16-bit) floating point.
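For a feel of that scale, here is a back-of-the-envelope Python sketch (the matrix sizes are made up for illustration): multiplying an M×K matrix by a K×N matrix takes roughly 2·M·K·N floating-point operations, and dividing that count by the peak rate gives the best-case time.

# Rough scale check: one large matrix multiply at the H100's
# theoretical 990 TFLOPS (FP16). Sizes are hypothetical.
M = K = N = 16_384                 # hypothetical square matrices
flops = 2 * M * K * N              # ~2*M*K*N operations for a matmul
peak = 990e12                      # 990 trillion operations per second

print(f"work: {flops / 1e12:.1f} trillion operations")
print(f"best-case time: {flops / peak * 1e3:.2f} ms")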

Why precision changes everything

The same GPU has different TFLOPS ratings depending on the precision (number format) used. Lower precision = fewer bits per number = more operations per cycle.

Precision      Bits   H100 TFLOPS   Use Case
FP64           64     67            Scientific computing, financial modeling
FP32           32     134           General compute, some training
TF32           19     495           NVIDIA-specific, training acceleration
FP16 / BF16    16     990           AI training (most common)
FP8            8      1,979         AI inference, some training
INT8           8      1,979         Quantized inference

GIS uses FP16 TFLOPS as the standard metric because it's the most common precision for AI training and is reported by all major GPU vendors.
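To make "precision" concrete, a small NumPy sketch (the constant is arbitrary) shows how the same value is rounded at three of the widths in the table:

import numpy as np

# The same number stored at three of the precisions above.
# Fewer bits means coarser rounding but smaller values in memory,
# which is what lets the hardware execute more operations per second.
x = 3.14159265358979
for dtype in (np.float64, np.float32, np.float16):
    v = dtype(x)
    bits = np.dtype(dtype).itemsize * 8
    print(f"{np.dtype(dtype).name}: {float(v):.10f} ({bits} bits)")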

TFLOPS vs real-world performance

TFLOPS is a theoretical peak. Real-world performance is always lower because:

  • Memory bandwidth bottleneck — the GPU can compute faster than memory can feed it data. This is called being "memory-bound."
  • Utilization — not all cores are busy all the time. Real workloads rarely achieve 100% utilization.
  • Software overhead — kernel launches, synchronization, and data transfers all take time.

Rule of thumb
Real-world AI training typically achieves 30-60% of theoretical peak TFLOPS. A well-optimized workload on an H100 might sustain ~400-600 TFLOPS at FP16, not the theoretical 990.
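A minimal sketch of how that utilization figure is typically computed (the FLOP count and step time below are hypothetical, picked to land inside the 30-60% range):

# Utilization = achieved TFLOPS / theoretical peak TFLOPS.
peak_tflops = 990.0            # H100 FP16 theoretical peak
flops_per_step = 1.2e15        # hypothetical FLOPs in one training step
step_time_s = 2.4              # hypothetical measured wall-clock time

achieved_tflops = flops_per_step / step_time_s / 1e12   # 500 TFLOPS
utilization = achieved_tflops / peak_tflops              # ~0.51

print(f"achieved: {achieved_tflops:.0f} TFLOPS "
      f"({utilization:.0%} of theoretical peak)")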

This is why memory_bandwidth_tbps matters alongside TFLOPS. A GPU with high TFLOPS but low bandwidth will be bottlenecked. The H100's 3.35 TB/s bandwidth is designed to keep its 990 TFLOPS fed.
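One way to quantify that trade-off is the roofline-style break-even point: how many floating-point operations a kernel must perform per byte it moves before compute, rather than memory, becomes the limit. A quick sketch with the H100 figures from this article:

# Break-even arithmetic intensity for an H100, using the peak
# figures quoted above (real kernels sit below these peaks).
peak_flops = 990e12            # 990 TFLOPS at FP16
bandwidth = 3.35e12            # 3.35 TB/s memory bandwidth, in bytes/s

breakeven = peak_flops / bandwidth
print(f"break-even: ~{breakeven:.0f} FLOPs per byte moved")

# Kernels doing fewer FLOPs per byte than this are memory-bound:
# the cores wait on data and achieved TFLOPS stays far below 990.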

How it appears in GIS

TFLOPS appears in two places in a GIS document:

{
  "gpu": {
    "tflops_fp16": 989.5
  },
  "normalized": {
    "cost_per_tflop_hour": 0.00252
  }
}

gpu.tflops_fp16 is the raw compute power. normalized.cost_per_tflop_hour divides the hourly cost by TFLOPS to give you a price-performance ratio.

Lower cost_per_tflop_hour = more compute per dollar. This metric lets you compare a $2.49/hr H100 (990 TFLOPS) against a $1.10/hr A100 (312 TFLOPS) on equal footing.
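A short sketch of that normalization, using the example prices and TFLOPS figures quoted above (illustrative prices, not live quotes):

# cost_per_tflop_hour = hourly price / FP16 TFLOPS (lower is better).
offers = {
    "H100": {"usd_per_hour": 2.49, "tflops_fp16": 990},
    "A100": {"usd_per_hour": 1.10, "tflops_fp16": 312},
}

for name, o in offers.items():
    per_tflop = o["usd_per_hour"] / o["tflops_fp16"]
    print(f"{name}: ${per_tflop:.5f} per TFLOP-hour")

# H100 ~ $0.00252, A100 ~ $0.00353: the H100 delivers more compute
# per dollar despite its higher hourly price.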

Key takeaways
  • 1 TFLOPS = 1 trillion floating-point operations per second
  • TFLOPS vary by precision: FP64, FP32, FP16, BF16, FP8, INT8
  • GIS uses FP16 TFLOPS as the standard comparison metric
  • Higher TFLOPS ≠ always faster — memory bandwidth can be the bottleneck
  • cost_per_tflop_hour normalizes price against compute power