Zhipu AI · Active

GLM-4.7 Flash API

Free-tier GLM model, currently in open beta and well suited to testing and prototyping.

💰 Save up to 70% vs official Zhipu AI pricing

TL;DR

Price: $0.00/1M input · $0.00/1M output

Context: 128K · max 4,096 output tokens

Provider: Zhipu AI

Cost advantage: Cheaper than official API · No credit card

GLM-4.7 Flash — cheaper than the official Zhipu AI API

Access GLM-4.7 Flash through AI API Hub and pay less per token. Same OpenAI-compatible endpoint, lower cost.

Input: $0.00 / 1M tokens · Output: $0.00 / 1M tokens · Context window: 128K

Technical Specifications

Provider	Zhipu AI
Model Family	GLM-4.7 Flash
Release Date	2026-05
Context Window	128K
Max Output Tokens	4,096
Input Price	$0.00 / 1M tokens
Output Price	$0.00 / 1M tokens
Vision Support	No
Function Calling	No
JSON Mode	No
Streaming	Yes ✓
Fine Tuning	Not Available
Status	Active ✓

Overview

GLM-4.7 Flash is Zhipu AI's current free-tier GLM model, released in May 2026. It is in open beta, which makes it well suited to testing and prototyping.

For developers, the headline numbers are a 128K context window and up to 4,096 output tokens per response, enough headroom to send long inputs without chunking them. At $0.00/1M input and $0.00/1M output, it sits in the free tier, which suits high-volume pipelines where token cost dominates.

On the capability side, GLM-4.7 Flash lists two features: bilingual support and streaming. Fine-tuning isn't supported, so you work with the base model as-is. It's text-only, so route image or audio workloads elsewhere.

The practical appeal of routing GLM-4.7 Flash through AI API Hub is simplicity: one OpenAI-compatible endpoint, USDT and USDC payments, no credit card, and you can be calling the API in under 30 seconds by swapping your base URL.

What Makes GLM-4.7 Flash Different

How GLM-4.7 Flash is used

GLM-4.7 Flash is used for general-purpose text tasks — chat, summarization, drafting, classification, and extraction. It handles the standard text-in/text-out case reliably. For specialized workloads (coding, reasoning, vision), a purpose-tuned sibling may perform better.

Pricing position within Zhipu AI

GLM-4.7 Flash is the cheapest active model in Zhipu AI's lineup at $0.00/1M input; no sibling undercuts it. The most expensive sibling listed here, GLM-5.1, costs $0.85/1M input. At scale, routing high-volume calls here instead of a pricier sibling saves the full input-token cost.
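To put that saving in numbers, here is a minimal Python sketch that computes input-token cost at the per-1M prices quoted on this page. Sibling prices are derived from the differences stated in the sibling comparison below (GLM-4.7 Flash being $0.00), the monthly volume is a hypothetical figure, and output tokens are ignored because this page quotes sibling differences only as a single per-1M figure.

```python
# Input-token prices in USD per 1M tokens, as quoted on this page.
# Sibling prices are derived from their stated difference vs GLM-4.7 Flash ($0.00).
INPUT_PRICE_PER_1M = {
    "GLM-4.7 Flash": 0.00,
    "GLM-4.5-Air": 0.11,
    "GLM-5-Turbo": 0.70,
    "GLM-5.1": 0.85,
}


def input_cost(tokens: int, price_per_1m: float) -> float:
    """Return the USD cost of `tokens` input tokens at `price_per_1m` USD per 1M tokens."""
    return tokens / 1_000_000 * price_per_1m


# Hypothetical workload: 500M input tokens per month.
monthly_tokens = 500_000_000

for model, price in INPUT_PRICE_PER_1M.items():
    print(f"{model:<14} ${input_cost(monthly_tokens, price):>8.2f}/month")
```

At that hypothetical volume, GLM-5.1 would cost $425 per month in input tokens, against $0 for GLM-4.7 Flash.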

GLM-4.7 Flash's role in the lineup

Within Zhipu AI's lineup, GLM-4.7 Flash is the entry-level option: cheapest per token and designed for high-volume workloads. Other GLM-family variants offer more capability at higher cost. If you hit quality limits, step up to a mid-tier or flagship sibling.
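One way to act on "step up if you hit quality limits" is a model cascade: try the free model first and escalate only when a check on the reply fails. This is a minimal sketch, not an AI API Hub feature. Only `glm-4.7-flash` is a model ID confirmed on this page; `glm-5.1` is a hypothetical ID for a pricier sibling, and `is_good_enough` is a placeholder for whatever quality check fits your task. The client is passed in, so any OpenAI-compatible client object works.

```python
from typing import Callable, Sequence

# Cheapest first. "glm-4.7-flash" is from this page;
# "glm-5.1" is a hypothetical model ID for a pricier sibling.
DEFAULT_CASCADE = ("glm-4.7-flash", "glm-5.1")


def complete_with_cascade(
    client,
    prompt: str,
    is_good_enough: Callable[[str], bool],
    models: Sequence[str] = DEFAULT_CASCADE,
) -> tuple[str, str]:
    """Try each model in order and return (model, reply) for the first acceptable reply.

    If no reply passes `is_good_enough`, the last model's reply is returned.
    """
    if not models:
        raise ValueError("models must not be empty")
    reply = ""
    for model in models:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        reply = response.choices[0].message.content or ""
        if is_good_enough(reply):
            return model, reply
    # Nothing passed the check: fall back to the most capable model's answer.
    return models[-1], reply
```

Because most calls stop at the first model, the pricier sibling is only billed for the requests that actually need it.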

Real-world use cases

Typical deployments include customer support chatbots, content drafting and summarization, classification pipelines, and extraction workflows. GLM-4.7 Flash handles the standard text-in/text-out case reliably; route specialized tasks (vision, coding, reasoning) to purpose-tuned siblings.

vs sibling models

Compared with its siblings, which all share the same 128K context window, GLM-4.7 Flash is cheaper per 1M input tokens by $0.85 than GLM-5.1, by $0.70 than GLM-5-Turbo, and by $0.11 than GLM-4.5-Air. Choose GLM-4.7 Flash when cost per token is the priority.

API Examples

Python

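import os

from openai import OpenAI

# Read the key from the environment instead of hard-coding it
client = OpenAI(
    api_key=os.environ["API_KEY"],
    base_url="https://api.apiyihe.org/v1"  # AI API Hub's OpenAI-compatible endpoint
)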

response = client.chat.completions.create(
    model="glm-4.7-flash",
    messages=[
        {"role": "user", "content": "Hello"}
    ]
)

print(response.choices[0].message.content)

JavaScript / Node.js

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.API_KEY,
  baseURL: "https://api.apiyihe.org/v1"
});

const response = await client.chat.completions.create({
  model: "glm-4.7-flash",
  messages: [
    { role: "user", content: "Hello" }
  ]
});

console.log(response.choices[0].message.content);

cURL

curl https://api.apiyihe.org/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "glm-4.7-flash",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'

Supported Features

Vision / Image Input	❌ Not Available
Audio / Voice Input	❌ Not Available
Function Calling	❌ Not Available
JSON Mode	❌ Not Available
Streaming	✅ Supported
Fine-Tuning	❌ Not Available
Multimodal	❌ Not Available
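Streaming is the one request-level feature listed as supported, so here is a minimal sketch of consuming a streamed reply with the OpenAI Python SDK's `stream=True` option. It assumes AI API Hub passes through OpenAI-style streaming chunks, each carrying text in `choices[0].delta.content`. The client is passed in as a parameter so the logic can be exercised without network access; the usage lines at the bottom are commented out because they need the `openai` package and a valid key.

```python
from typing import Iterable, Iterator


def iter_text(chunks: Iterable) -> Iterator[str]:
    """Yield the text deltas from OpenAI-style chat completion chunks.

    Chunks with no choices, or whose delta carries no content
    (e.g. the initial role chunk or the final chunk), are skipped.
    """
    for chunk in chunks:
        if not chunk.choices:
            continue
        content = chunk.choices[0].delta.content
        if content:
            yield content


def stream_reply(client, prompt: str, model: str = "glm-4.7-flash") -> str:
    """Stream a reply, printing text as it arrives, and return the full reply."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,  # ask the server for incremental chunks
    )
    parts = []
    for piece in iter_text(stream):
        print(piece, end="", flush=True)
        parts.append(piece)
    print()
    return "".join(parts)


# Usage (requires the `openai` package and a valid key):
# import os
# from openai import OpenAI
# client = OpenAI(api_key=os.environ["API_KEY"],
#                 base_url="https://api.apiyihe.org/v1")
# stream_reply(client, "Hello")
```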

Benchmark Scores

Benchmark	Score
MMLU	Not Publicly Available
GPQA	Not Publicly Available
SWE-Bench	Not Publicly Available
HumanEval	Not Publicly Available
GSM8K	Not Publicly Available
MATH	Not Publicly Available
MMMU	Not Publicly Available

Scores, where available, come from official provider publications. "Not Publicly Available" marks benchmarks that have not yet been disclosed.

Pricing History

GLM-4.7 Flash was released in 2026-05 by Zhipu AI and is currently publicly available via AI API Hub.

Current Pricing: $0.00 per 1M input tokens · $0.00 per 1M output tokens. Pay-as-you-go with no minimum commitment.

Pricing Model: Token-based billing (pay per use). No subscription fees. No hidden costs.

💡 Zhipu AI occasionally updates pricing. AI API Hub reflects current pricing in real-time. All prices in USD. Pay with USDT or USDC — no currency conversion fees.

Compare Alternatives

GPT-5.5 · $5.00/1M

OpenAI · 256K context

GPT-5.4 · $2.50/1M

OpenAI · 256K context

GLM-4.7 Flash vs GPT-5.5 · GLM-4.7 Flash vs GPT-5.4 · GLM-4.7 Flash vs GPT-4.1 · GLM-4.7 Flash vs GPT-4.1 Mini

Frequently Asked Questions

What is GLM-4.7 Flash?

GLM-4.7 Flash is Zhipu AI's current free-tier GLM model, in open beta and well suited to testing and prototyping. It offers a 128K context window and supports bilingual use and streaming. You can access it through AI API Hub using USDT or USDC, with no credit card required.

How much does GLM-4.7 Flash cost?

GLM-4.7 Flash is priced at $0.00 per 1M input tokens and $0.00 per 1M output tokens, billed pay-as-you-go with no minimum. Through AI API Hub you can start with as little as $5 and scale from there.

GLM-4.7 Flash vs GPT-5.5?

They're built for different jobs. GLM-4.7 Flash costs $0.00/1M input with a 128K window; GPT-5.5 runs $5.00/1M input with 256K. GLM-4.7 Flash is the more cost-effective pick, while GPT-5.5 offers twice the context window. See the full side-by-side at /compare/glm-4.7-flash-vs-gpt-5.5/.

GLM-4.7 Flash context window?

GLM-4.7 Flash has a 128K context window, capable of processing up to 128,000 tokens in a single request. Maximum output tokens: 4,096.
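A minimal sketch of checking a request against these limits before sending it. It assumes the 128,000-token window is shared between the prompt and the completion (common for chat models, but not stated on this page) and that you already have a prompt token count, for example from a tokenizer or the `usage` field of an earlier response. `plan_max_tokens` is a hypothetical helper, not part of any SDK.

```python
CONTEXT_WINDOW = 128_000   # total tokens per request, per this page
MAX_OUTPUT_TOKENS = 4_096  # per-response output cap, per this page


def plan_max_tokens(prompt_tokens: int, requested: int = MAX_OUTPUT_TOKENS) -> int:
    """Return a max_tokens value that respects both the output cap and the window.

    Assumes prompt and completion share the context window.
    Raises ValueError if the prompt alone leaves no room for output.
    """
    if prompt_tokens < 0 or requested < 1:
        raise ValueError("prompt_tokens must be >= 0 and requested >= 1")
    room = CONTEXT_WINDOW - prompt_tokens
    if room <= 0:
        raise ValueError(
            f"prompt uses {prompt_tokens} tokens; window is {CONTEXT_WINDOW}"
        )
    return min(requested, MAX_OUTPUT_TOKENS, room)
```

For example, a 126,000-token prompt leaves room for 2,000 output tokens, while a 1,000-token prompt gets the full 4,096. Pass the result as `max_tokens=` to `client.chat.completions.create(...)`.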

Does GLM-4.7 Flash support function calling?

No, GLM-4.7 Flash does not natively support function calling. For function calling use cases, consider Zhipu AI's flagship models.
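Since neither function calling nor JSON mode is available, a common workaround is to ask for a JSON object in the prompt and validate the reply yourself. The sketch below is a hypothetical helper, not part of any SDK: it pulls the first JSON object out of a free-text reply using the standard library's `json.JSONDecoder.raw_decode`, assuming the model was told to return one JSON object and may wrap it in prose or Markdown fences.

```python
import json
from typing import Any


def extract_json_object(text: str) -> dict[str, Any]:
    """Return the first JSON object embedded in `text`.

    Tries to decode a JSON document at each '{' in turn, so surrounding
    prose or code fences are ignored. Raises ValueError if none decodes.
    """
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            obj, _end = decoder.raw_decode(text[start:])
            return obj
        except json.JSONDecodeError:
            # Not valid JSON starting here; try the next opening brace.
            start = text.find("{", start + 1)
    raise ValueError("no JSON object found in model output")
```

Validate the returned dict's keys and types before acting on it, since nothing on the model side enforces a schema.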

Is GLM-4.7 Flash multimodal?

No, GLM-4.7 Flash is a text-only model. For multimodal use cases, consider models with vision/audio capabilities.

GLM-4.7 Flash API rate limits?

GLM-4.7 Flash is rate-limited to 10K requests per minute (RPM). Higher-tier plans offer increased throughput. For high-volume production use, consider Zhipu AI's faster variant models.
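If you fan out many requests, a client-side limiter helps you stay under the quoted 10K RPM instead of collecting rate-limit errors. Below is a simple sliding-window limiter sketch; it is not an AI API Hub feature, and it assumes the limit is counted per API key over a rolling 60-second window, which this page does not specify. The clock and sleep functions are injectable so the behavior can be tested without waiting.

```python
import threading
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `max_calls` acquisitions in any rolling `period`-second window."""

    def __init__(self, max_calls: int = 10_000, period: float = 60.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._calls: deque = deque()  # timestamps of recent acquisitions
        self._lock = threading.Lock()

    def acquire(self) -> None:
        """Block until a call is allowed, then record it."""
        while True:
            with self._lock:
                now = self._clock()
                # Drop timestamps that have aged out of the window.
                while self._calls and now - self._calls[0] >= self.period:
                    self._calls.popleft()
                if len(self._calls) < self.max_calls:
                    self._calls.append(now)
                    return
                # Wait until the oldest call leaves the window.
                wait = self.period - (now - self._calls[0])
            self._sleep(wait)  # sleep outside the lock so other threads aren't blocked
```

Create one limiter per API key and call `limiter.acquire()` immediately before each `client.chat.completions.create(...)`.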

How to access GLM-4.7 Flash API?

Access GLM-4.7 Flash through AI API Hub: (1) Register at api.apiyihe.org/register?aff=8JZC, (2) Deposit USDT/USDC, (3) Get your API key instantly, (4) Use the OpenAI-compatible endpoint https://api.apiyihe.org/v1 with model name "glm-4.7-flash". Start building in under 30 seconds.

Get GLM-4.7 Flash API Access

Pay with USDT & USDC. Same model, up to 70% less.

Create Account