Credible technical evidence, not marketing numbers. Throughput is relative to Ray Data on identical hardware. Every row lists its dataset, hardware, command and environment — and links to a reproducible script.
Higher is better. Vane / Ray Data / Daft columns are relative throughput; baseline = Ray Data 1.0×.
Prefix bucketing groups similar-length prompts to cut padding waste, raising effective batch utilization on the same GPUs.
Image, audio, document and video workloads. Image-decode + CLIP and audio transcription are measured below; document and video are in progress.
A benchmark you can't reproduce is a marketing number. Everything here is pinned and scripted.
# clone, pin the environment, run git clone https://github.com/AstroVela/vane cd vane/benchmarks pip install -r requirements.lock # vLLM batch inference benchmark python bench_vllm.py \ --dataset s3://bench/prompts-66k.parquet \ --gpus 2 --bucketing prefix --runs 3