Your AI.
Your rules.
Dedicated GPU instances with open-source LLMs. Documents never leave. Self-destructs when done.
1. Non-solicitation clause extends 24 months post-termination, covering all subsidiaries
2. Liquidated damages set at $500K per breach, enforceable without proof of actual loss
3. Governing law defaults to Delaware courts with mandatory arbitration waiver
What you get
Security By Architecture,
Not By Promise.
Dedicated GPU
Your own isolated GPU instance. No shared resources, no noisy neighbors, no compromises.
Self-Destructing Vaults
Workspace and all data obliterated on completion. Cryptographic proof of deletion.
Open Source Models
Fully auditable LLMs. No black boxes.
Full Audit Trail
Every query and response logged immutably.
Zero Knowledge
We never see your documents. Ever.
Network Isolation
No egress. Your data stays put.
Simple pricing
Pay As
You Go.
No subscriptions. No minimums. Spin up a vault, use it, destroy it. You only pay for GPU time consumed.
$0.50/hr
Starting rate · billed per minute · $0.008/min
FLASH
Qwen 2.5 7B
$0.50/hr
$0.008/min
TURBO
Qwen 3.5 35B
$2.50/hr
$0.04/min
ULTRA
GLM-5 744B
$20.00/hr
$0.33/min
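The per-minute billing above is simple enough to check yourself. A quick sketch (the tier names and hourly rates are taken from the table; the helper function itself is illustrative, not a VaultAI API):

```python
# Sketch of pay-as-you-go billing using the published hourly rates.
# Billing is per minute: hourly rate / 60, times minutes consumed.
RATES_PER_HOUR = {"FLASH": 0.50, "TURBO": 2.50, "ULTRA": 20.00}

def vault_cost(tier: str, minutes: float) -> float:
    """Cost in dollars for `minutes` of GPU time on `tier`."""
    per_minute = RATES_PER_HOUR[tier] / 60
    return round(per_minute * minutes, 2)

# 90 minutes of contract review on TURBO:
print(vault_cost("TURBO", 90))  # → 3.75
```

Note that the listed per-minute prices are rounded for display: $0.50/hr is $0.00833/min, shown as $0.008/min.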
FAQ
Frequently asked questions
Everything you need to know about VaultAI's security and privacy architecture.
How is VaultAI different from ChatGPT?
ChatGPT processes your documents on shared infrastructure, and your data may be used for training. VaultAI spins up a completely isolated GPU instance for each vault. Your documents never leave your dedicated environment, are never shared, and are never used for model training. When you destroy the vault, the data is permanently deleted.
How are my documents processed?
Your documents are uploaded directly to your dedicated GPU instance over an end-to-end encrypted connection and processed locally on that instance using open-source AI models. Our orchestration layer handles only metadata (vault status, billing) and never has access to your actual document content.
What happens when I destroy a vault?
When you destroy a vault, the dedicated GPU instance is completely terminated. All data, including uploaded documents, AI model memory, chat history, and generated outputs, is permanently deleted. The underlying storage is wiped and the instance is decommissioned. This process is irreversible by design.
Which models do you run?
We run open-source models served via vLLM, a high-throughput open-source inference engine, matched to our pricing tiers: Qwen 2.5 7B (FLASH), Qwen 3.5 35B (TURBO), and GLM-5 744B (ULTRA). All model weights are fully auditable and run entirely on your dedicated GPU instance; no data is sent to any third-party API. You choose the tier that fits your workload, from fast extraction to full-precision reasoning.
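Because vLLM serves an OpenAI-compatible HTTP API, software running inside a vault can talk to the local model with a standard chat-completion request. A minimal sketch; the endpoint URL, port, and model identifier are assumptions for illustration, not documented VaultAI specifics:

```python
import json

# Assumed local endpoint of the in-vault vLLM server (illustrative only).
ENDPOINT = "http://localhost:8000/v1/chat/completions"

def build_request(model: str, question: str) -> dict:
    """Build an OpenAI-style chat-completion payload for the local model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "temperature": 0.0,  # deterministic answers suit document extraction
    }

payload = build_request(
    "Qwen/Qwen2.5-7B-Instruct",
    "Summarize the termination clauses in the uploaded contract.",
)
print(json.dumps(payload, indent=2))
```

Posting this payload to the endpoint never touches the public internet, which is the point of the network-isolation design.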
Is VaultAI HIPAA compliant?
VaultAI's architecture is designed to support HIPAA compliance. Each vault runs on isolated infrastructure with encrypted storage, no shared resources, and complete data destruction on termination. We provide a Business Associate Agreement (BAA) for Enterprise customers. Full HIPAA compliance, however, also depends on your organization's overall security implementation.
How does billing work?
VaultAI uses simple pay-as-you-go pricing with no subscriptions or minimums. GPU time starts at $0.50/hr for the FLASH tier (Qwen 2.5 7B) and goes up to $20.00/hr for the ULTRA tier (GLM-5 744B). You only pay for the minutes you use. Add credit to your balance and start analyzing; no commitments.