API

Chat APIs

Two compatible endpoint shapes — /v1/chat/completions for OpenAI SDKs and /v1/messages for the Anthropic SDK. Same auth, same router, same bill.

Anything that already speaks to OpenAI or Anthropic works on Token Harbor with a single base-URL swap. Grab or rotate a key on the dashboard first; keys look like thk_live_….

OpenAI-compatible · `/v1/chat/completions`

Drop-in compatible with the OpenAI Chat Completions schema. Auth header is Authorization: Bearer thk_….

Non-streaming

Returns the full response in a single JSON body. Best for short completions and synchronous request paths.

import requests

response = requests.post(
    url="https://tokenharbor.ai/v1/chat/completions",
    headers={
        "Authorization": "Bearer thk_live_PASTE_YOURS_HERE",
        "Content-Type": "application/json",
    },
    json={
        "model": "tokenharbor/qwen3-max",
        "messages": [{"role": "user", "content": "Hi"}],
    },
)
print(response.json()["choices"][0]["message"]["content"])

On Windows PowerShell, use the PowerShell tab (Invoke-RestMethod). The aliased curl is actually Invoke-WebRequest, and even curl.exe mangles the JSON quoting — that's the usual cause of a bad range specification error, or an invalid_api_key when a long key gets a stray space or line-break. Always paste your key in full, with no spaces.

Streaming

Pass stream: true and the server emits Server-Sent Events so tokens arrive as they're generated.

import requests

with requests.post(
    url="https://tokenharbor.ai/v1/chat/completions",
    headers={
        "Authorization": "Bearer thk_live_PASTE_YOURS_HERE",
        "Content-Type": "application/json",
    },
    json={
        "model": "tokenharbor/qwen3-max",
        "messages": [{"role": "user", "content": "Write a haiku about the sea."}],
        "stream": True,
    },
    stream=True,
) as r:
    for line in r.iter_lines():
        if line:
            print(line.decode("utf-8"))

Claude-compatible · `/v1/messages`

Anthropic Messages API shape. Point the official @anthropic-ai/sdk or anthropic Python client at Token Harbor by setting base_url (or baseURL). Auth accepts either x-api-key (Anthropic default) or Authorization: Bearer — pass your thk_live_ key in either header.

Non-streaming

Returns a single message object with the assistant's reply in content[0].text. max_tokens is required by the Anthropic schema.

from anthropic import Anthropic

client = Anthropic(
    api_key="thk_live_PASTE_YOURS_HERE",
    base_url="https://tokenharbor.ai",
)

message = client.messages.create(
    model="tokenharbor/qwen3-max",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hi"}],
)
print(message.content[0].text)

Streaming

The Anthropic SDK exposes a typed event stream (message_start, content_block_delta, message_stop, etc.). Our adapter emits the same events so SDK handlers keep working.

from anthropic import Anthropic

client = Anthropic(
    api_key="thk_live_PASTE_YOURS_HERE",
    base_url="https://tokenharbor.ai",
)

with client.messages.stream(
    model="tokenharbor/qwen3-max",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a haiku about the sea."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

Model id is whatever appears in /models — every gateway model works on both endpoints. Routing to the cheapest healthy upstream happens internally regardless of which shape you call.

Prompt handling & steering

How we treat your prompt depends on the model you pick:

A specific model id (e.g. tokenharbor/qwen3-max, anthropic/claude-*) — pure pass-through. Your system and messages are forwarded byte-for-byte; we never add or rewrite anything.
th-orchestra (smart routing) — we add a small steering directive to improve behaviour (e.g. plan → todo → execute for agent tasks, or role-specific guidance). To turn it off and get pure routing with an untouched prompt, send the header X-TH-Steering: off (or body "th_steering": "off").

Every response includes an X-TH-Steering-Applied header so you always know what was applied: none, off, or the role/workflow that was injected.

Next steps

Browse the full catalog at /models — every entry shows the exact per-million input/output price.
Create, rotate, or revoke keys on the dashboard.

API

Chat APIs

Two compatible endpoint shapes — /v1/chat/completions for OpenAI SDKs and /v1/messages for the Anthropic SDK. Same auth, same router, same bill.

Anything that already speaks to OpenAI or Anthropic works on Token Harbor with a single base-URL swap. Grab or rotate a key on the dashboard first; keys look like thk_live_….

OpenAI-compatible · `/v1/chat/completions`

Drop-in compatible with the OpenAI Chat Completions schema. Auth header is Authorization: Bearer thk_….

Non-streaming

Returns the full response in a single JSON body. Best for short completions and synchronous request paths.

import requests

response = requests.post(
    url="https://tokenharbor.ai/v1/chat/completions",
    headers={
        "Authorization": "Bearer thk_live_PASTE_YOURS_HERE",
        "Content-Type": "application/json",
    },
    json={
        "model": "tokenharbor/qwen3-max",
        "messages": [{"role": "user", "content": "Hi"}],
    },
)
print(response.json()["choices"][0]["message"]["content"])

Streaming

Pass stream: true and the server emits Server-Sent Events so tokens arrive as they're generated.

import requests

with requests.post(
    url="https://tokenharbor.ai/v1/chat/completions",
    headers={
        "Authorization": "Bearer thk_live_PASTE_YOURS_HERE",
        "Content-Type": "application/json",
    },
    json={
        "model": "tokenharbor/qwen3-max",
        "messages": [{"role": "user", "content": "Write a haiku about the sea."}],
        "stream": True,
    },
    stream=True,
) as r:
    for line in r.iter_lines():
        if line:
            print(line.decode("utf-8"))

Claude-compatible · `/v1/messages`

Non-streaming

Returns a single message object with the assistant's reply in content[0].text. max_tokens is required by the Anthropic schema.

from anthropic import Anthropic

client = Anthropic(
    api_key="thk_live_PASTE_YOURS_HERE",
    base_url="https://tokenharbor.ai",
)

message = client.messages.create(
    model="tokenharbor/qwen3-max",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hi"}],
)
print(message.content[0].text)

Streaming

The Anthropic SDK exposes a typed event stream (message_start, content_block_delta, message_stop, etc.). Our adapter emits the same events so SDK handlers keep working.

from anthropic import Anthropic

client = Anthropic(
    api_key="thk_live_PASTE_YOURS_HERE",
    base_url="https://tokenharbor.ai",
)

with client.messages.stream(
    model="tokenharbor/qwen3-max",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a haiku about the sea."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

Model id is whatever appears in /models — every gateway model works on both endpoints. Routing to the cheapest healthy upstream happens internally regardless of which shape you call.

Prompt handling & steering

How we treat your prompt depends on the model you pick:

A specific model id (e.g. tokenharbor/qwen3-max, anthropic/claude-*) — pure pass-through. Your system and messages are forwarded byte-for-byte; we never add or rewrite anything.
th-orchestra (smart routing) — we add a small steering directive to improve behaviour (e.g. plan → todo → execute for agent tasks, or role-specific guidance). To turn it off and get pure routing with an untouched prompt, send the header X-TH-Steering: off (or body "th_steering": "off").

Every response includes an X-TH-Steering-Applied header so you always know what was applied: none, off, or the role/workflow that was injected.

Next steps

Browse the full catalog at /models — every entry shows the exact per-million input/output price.
Create, rotate, or revoke keys on the dashboard.

OpenAI-compatible · /v1/chat/completions

Non-streaming

Streaming

Claude-compatible · /v1/messages

Non-streaming

Streaming

Prompt handling & steering

Next steps

OpenAI-compatible · /v1/chat/completions

Non-streaming

Streaming

Claude-compatible · /v1/messages

Non-streaming

Streaming

Prompt handling & steering

Next steps

OpenAI-compatible · `/v1/chat/completions`

Claude-compatible · `/v1/messages`

OpenAI-compatible · `/v1/chat/completions`

Claude-compatible · `/v1/messages`