
AI Integration

Build AI-powered applications with OpenAI-compatible models in your Ploy workers.

Ploy provides built-in AI integration for workers: build AI-powered applications with env.AI.run(), or make direct OpenAI-compatible SDK calls when you need them.

Enabling AI

To use AI in your worker, add ai: true to your ploy.yaml:

ploy.yaml
kind: dynamic
ai: true

This injects an env.AI binding plus the raw escape-hatch environment variables:

  • AI - AI binding with env.AI.run(model, inputs, options?)
  • PLOY_AI_URL - OpenAI-compatible base URL for direct SDK or HTTP requests
  • PLOY_AI_TOKEN - Authentication token (automatically managed per deployment)

Local Development

When you run ploy dev, AI calls from your local worker are routed through an organization-scoped local development AI sandbox in Ploy.

  • If your checkout matches a deployed project, Ploy uses that project to choose the organization, then routes local AI usage through the organization's sandbox.
  • If your checkout does not match a deployed project and you belong to multiple organizations, set PLOY_DEV_AI_PROJECT_ID to any project in the target organization to choose the correct sandbox.
  • For full manual control, set both PLOY_DEV_AI_URL and PLOY_DEV_AI_TOKEN.

Example:

PLOY_DEV_AI_PROJECT_ID=proj_123 pnpm exec ploy dev
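For the fully manual case, both variables can be set inline the same way; the URL and token below are placeholders, not real endpoints:

```shell
# Route local dev through an explicit gateway (placeholder values).
PLOY_DEV_AI_URL="https://your-gateway.example.com/v1" \
PLOY_DEV_AI_TOKEN="your-token" \
pnpm exec ploy dev
```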

Basic AI Example

Here's a simple worker that calls an AI model:

src/index.ts
interface Env {
	AI: Ai;
	PLOY_AI_URL: string;
	PLOY_AI_TOKEN: string;
}

export default {
	async fetch(request: Request, env: Env): Promise<Response> {
		const url = new URL(request.url);

		// Health check endpoint
		if (url.pathname === "/health") {
			return new Response("ok");
		}

		// AI endpoint
		if (url.pathname === "/ai") {
			try {
				// Get parameters from query string
				const model = url.searchParams.get("model") || "glm-4.6v-flash";
				const prompt = url.searchParams.get("prompt") || "just reply 'OK'";

				const data = await env.AI.run(model, { prompt });
				return new Response(JSON.stringify(data), {
					status: 200,
					headers: { "Content-Type": "application/json" },
				});
			} catch (error) {
				return new Response(
					JSON.stringify({
						error: error instanceof Error ? error.message : String(error),
					}),
					{
						status: 500,
						headers: { "Content-Type": "application/json" },
					},
				);
			}
		}

		return new Response("hi!");
	},
};

Usage

# Basic request
curl https://your-deployment.ploy.app/ai

# Custom prompt
curl "https://your-deployment.ploy.app/ai?prompt=What is TypeScript?"

# Different model
curl "https://your-deployment.ploy.app/ai?model=gpt-4&prompt=Hello"

env.AI.run()

Ploy exposes env.AI.run(model, inputs, options?):

const response = await env.AI.run("auto", {
	prompt: "Hello, World",
});

env.AI.run() is the primary worker API. The raw PLOY_AI_URL and PLOY_AI_TOKEN variables remain available when you want to call the OpenAI-compatible gateway directly with your own client.

Examples

  • examples/ai-simple uses env.AI.run() for the basic request path.
  • examples/ai-streaming uses env.AI.run() with stream: true and returns SSE.
  • examples/ai-openai uses the OpenAI SDK with PLOY_AI_URL and PLOY_AI_TOKEN.

OpenAI-Compatible API

Ploy's AI integration uses the OpenAI chat completions format:

Request Format

{
  model: string;           // Model name (e.g., "glm-4.6v-flash", "gpt-4")
  messages: Array<{        // Conversation messages
    role: "system" | "user" | "assistant";
    content: string;
  }>;
  temperature?: number;    // Randomness (0-2, default: 1)
  max_tokens?: number;     // Maximum response length
  top_p?: number;         // Nucleus sampling (0-1, default: 1)
  stream?: boolean;       // Enable streaming responses
}
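As a sketch, a request body matching this shape can be assembled with a small helper. The helper name and defaults here are illustrative, not part of Ploy's API:

```typescript
interface ChatMessage {
	role: "system" | "user" | "assistant";
	content: string;
}

interface ChatRequest {
	model: string;
	messages: ChatMessage[];
	temperature?: number;
	max_tokens?: number;
	top_p?: number;
	stream?: boolean;
}

// Build a minimal single-turn request; callers can override the model.
function buildChatRequest(prompt: string, model = "glm-4.6v-flash"): ChatRequest {
	return {
		model,
		messages: [{ role: "user", content: prompt }],
		max_tokens: 256,
	};
}
```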

Response Format

{
	id: string;
	object: "chat.completion";
	created: number;
	model: string;
	choices: Array<{
		index: number;
		message: {
			role: "assistant";
			content: string;
		};
		finish_reason: string;
	}>;
	usage: {
		prompt_tokens: number;
		completion_tokens: number;
		total_tokens: number;
	}
}
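A response in this shape can be unpacked defensively. This sketch mirrors the fields above and nothing more; the function name is hypothetical:

```typescript
interface ChatCompletion {
	id: string;
	object: "chat.completion";
	created: number;
	model: string;
	choices: Array<{
		index: number;
		message: { role: "assistant"; content: string };
		finish_reason: string;
	}>;
	usage: {
		prompt_tokens: number;
		completion_tokens: number;
		total_tokens: number;
	};
}

// Pull out the assistant text and total token count, tolerating empty choices.
function extractReply(data: ChatCompletion): { content: string; totalTokens: number } {
	return {
		content: data.choices[0]?.message.content ?? "",
		totalTokens: data.usage.total_tokens,
	};
}
```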

Conversation Context

Build conversational applications by maintaining message history:

interface Message {
	role: "system" | "user" | "assistant";
	content: string;
}

export default {
	async fetch(request, env) {
		if (request.method !== "POST") {
			return new Response("Method Not Allowed", { status: 405 });
		}

		try {
			const { messages, model = "glm-4.6v-flash" } = await request.json();

			// Add system prompt
			const fullMessages: Message[] = [
				{
					role: "system",
					content: "You are a helpful assistant that provides concise answers.",
				},
				...messages,
			];

			const data = await env.AI.run(model, {
				messages: fullMessages,
			});
			return new Response(JSON.stringify(data), {
				headers: { "Content-Type": "application/json" },
			});
		} catch (error) {
			return new Response(
				JSON.stringify({
					error: error instanceof Error ? error.message : String(error),
				}),
				{
					status: 500,
					headers: { "Content-Type": "application/json" },
				},
			);
		}
	},
};
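Message history grows with every turn and can eventually exceed the model's context window. One simple mitigation (a sketch, not a Ploy feature) is to keep the system prompt plus only the most recent messages:

```typescript
interface HistoryMessage {
	role: "system" | "user" | "assistant";
	content: string;
}

// Keep a leading system message (if present) plus the last `maxMessages` turns.
function trimHistory(messages: HistoryMessage[], maxMessages: number): HistoryMessage[] {
	const system = messages[0]?.role === "system" ? [messages[0]] : [];
	const rest = messages.slice(system.length);
	return [...system, ...rest.slice(-maxMessages)];
}
```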

Direct SDK Usage

If you need a library that expects an OpenAI-compatible baseURL and apiKey, use the injected raw variables:

import OpenAI from "openai";

const client = new OpenAI({
	apiKey: env.PLOY_AI_TOKEN,
	baseURL: env.PLOY_AI_URL,
});

const response = await client.chat.completions.create({
	model: "auto",
	messages: [{ role: "user", content: "Hello" }],
});

Streaming

env.AI.run() also supports streaming:

const stream = await env.AI.run("auto", {
	prompt: "Write one short sentence.",
	stream: true,
});

return new Response(stream, {
	headers: {
		"Content-Type": "text/event-stream; charset=utf-8",
	},
});
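On the consuming side, the event stream arrives as `data:` lines terminated by `data: [DONE]`, the usual OpenAI-compatible convention. A minimal parser sketch, assuming each chunk carries `choices[0].delta.content`:

```typescript
// Concatenate content deltas from a complete SSE transcript.
// Assumes OpenAI-style chunks: {"choices":[{"delta":{"content":"..."}}]}.
function collectStreamedContent(transcript: string): string {
	let out = "";
	for (const line of transcript.split("\n")) {
		if (!line.startsWith("data:")) continue;
		const payload = line.slice("data:".length).trim();
		if (payload === "[DONE]") break;
		const chunk = JSON.parse(payload) as {
			choices?: Array<{ delta?: { content?: string } }>;
		};
		out += chunk.choices?.[0]?.delta?.content ?? "";
	}
	return out;
}
```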

Usage

This request exercises the Conversation Context handler shown earlier:

curl -X POST https://your-deployment.ploy.app/ai \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What is JavaScript?"},
      {"role": "assistant", "content": "JavaScript is a programming language..."},
      {"role": "user", "content": "How do I use async/await?"}
    ]
  }'

LangChain Integration

Use LangChain for advanced AI features like agents and tools:

Installation

package.json
{
	"dependencies": {
		"@langchain/core": "^1.1.0",
		"@langchain/openai": "^1.1.3",
		"langchain": "^1.1.1",
		"zod": "^3.24.1"
	}
}

Creating Tools

Define tools that the AI can use:

src/index.ts
import { ChatOpenAI } from "@langchain/openai";
import { createAgent, tool } from "langchain";
import * as z from "zod";

interface Env {
	PLOY_AI_URL: string;
	PLOY_AI_TOKEN: string;
}

// Define a tool for getting weather information
const getWeather = tool((input) => `It's always sunny in ${input.city}!`, {
	name: "get_weather",
	description: "Get the weather for a given city",
	schema: z.object({
		city: z.string().describe("The city to get the weather for"),
	}),
});

export default {
	async fetch(request: Request, env: Env): Promise<Response> {
		const url = new URL(request.url);

		if (url.pathname === "/health") {
			return new Response("ok");
		}

		if (url.pathname === "/ai") {
			try {
				const prompt = url.searchParams.get("prompt") || "just reply 'OK'";

				// Initialize OpenAI-compatible model
				const model = new ChatOpenAI({
					model: "glm-4.6v-flash",
					apiKey: env.PLOY_AI_TOKEN,
					configuration: {
						baseURL: env.PLOY_AI_URL,
					},
				});

				// Create agent with tools
				const agent = createAgent({
					model,
					tools: [getWeather],
				});

				// Invoke agent
				const result = await agent.invoke({
					messages: [{ role: "user", content: prompt }],
				});

				// Extract response
				const lastMessage = result.messages[result.messages.length - 1];
				const content =
					typeof lastMessage.content === "string"
						? lastMessage.content
						: JSON.stringify(lastMessage.content);

				// Return in OpenAI format
				return new Response(
					JSON.stringify({
						choices: [
							{
								message: {
									role: "assistant",
									content: content,
								},
							},
						],
					}),
					{
						status: 200,
						headers: { "Content-Type": "application/json" },
					},
				);
			} catch (error) {
				return new Response(
					JSON.stringify({
						error: error instanceof Error ? error.message : String(error),
					}),
					{
						status: 500,
						headers: { "Content-Type": "application/json" },
					},
				);
			}
		}

		return new Response("LangChain agent example!");
	},
};

Token Usage and Billing

Monitor token consumption in API responses:

const data = await response.json();

// Extract token usage
const usage = data.usage;
console.log(`Prompt tokens: ${usage.prompt_tokens}`);
console.log(`Completion tokens: ${usage.completion_tokens}`);
console.log(`Total tokens: ${usage.total_tokens}`);

// Track costs (example rates)
const cost = (
	usage.prompt_tokens * 0.00001 +
	usage.completion_tokens * 0.00002
).toFixed(4);
console.log(`Estimated cost: $${cost}`);

Token usage varies by model. Larger models (like GPT-5) cost more per token than smaller models (like GPT-4o-mini).
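The inline arithmetic above can be factored into a helper. The rates here are hypothetical examples, not Ploy's actual pricing:

```typescript
interface TokenUsage {
	prompt_tokens: number;
	completion_tokens: number;
	total_tokens: number;
}

// Hypothetical example rates in USD per token; substitute your model's real pricing.
const PROMPT_RATE_USD = 0.00001;
const COMPLETION_RATE_USD = 0.00002;

function estimateCostUSD(usage: TokenUsage): number {
	return (
		usage.prompt_tokens * PROMPT_RATE_USD +
		usage.completion_tokens * COMPLETION_RATE_USD
	);
}
```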

Examples

Summarization Service

export default {
	async fetch(request, env) {
		if (request.method !== "POST") {
			return new Response("Method Not Allowed", { status: 405 });
		}

		const { text } = await request.json();

		const response = await fetch(`${env.PLOY_AI_URL}/chat/completions`, {
			method: "POST",
			headers: {
				"Content-Type": "application/json",
				Authorization: `Bearer ${env.PLOY_AI_TOKEN}`,
			},
			body: JSON.stringify({
				model: "glm-4.6v-flash",
				messages: [
					{
						role: "system",
						content:
							"You are a summarization assistant. Provide concise summaries.",
					},
					{
						role: "user",
						content: `Summarize the following text:\n\n${text}`,
					},
				],
				max_tokens: 150,
			}),
		});

		const data = await response.json();
		const summary = data.choices[0].message.content;

		return new Response(JSON.stringify({ summary }), {
			headers: { "Content-Type": "application/json" },
		});
	},
};
