
Vane Data Docs

Vane Data helps teams build AI-ready datasets with DuckDB-compatible SQL, Python/Arrow UDFs, Ray-backed execution, and AI helper functions.

Use it when structured or semi-structured data needs SQL transformations plus Python libraries, model inference, embeddings, prompting, or distributed execution before it moves to training, search, analytics, or serving systems.

SQL is a first-class interface through con.sql(...) and con.execute(...); Python is available through UDFs and AI helpers when the pipeline needs custom logic.

Install the vane-ai package, then import it in Python as vane. Note that the package name and the module name differ.

Vane Data is a data processing layer, not a model server, vector database, workflow scheduler, or transactional database.

Choose a starting point

A. Python and performance workflows

For pipelines where Python code, model inference, GPU work, or provider-backed AI calls dominate cost and runtime.

  1. Python quickstart
  2. UDFs
  3. AI functions
  4. Embeddings at scale
  5. GPU inference UDF
  6. Performance tuning

B. SQL and lightweight workflows

For relational pipelines that should stay close to DuckDB-compatible SQL and add Python only where it is useful.

  1. SQL quickstart
  2. SQL vs Python
  3. SQL multimodal pipeline
  4. Doris integration
  5. Single-node deployment

Advanced: distributed execution

For workloads where data size, scan parallelism, GPU placement, or multi-node execution becomes the main design question.

  1. Architecture
  2. Execution model
  3. Ray cluster
  4. Sizing

Documentation map