Google · Active

Gemini 2.5 Flash Lite API

Google's most affordable Gemini model. Ideal for high-volume, cost-sensitive applications like classification and simple extraction.

💰 Save up to 70% vs official Google pricing

TL;DR

Price: $0.10/1M input · $0.40/1M output

Context: 1M · max 8,192 output

Provider: Google

Cost advantage: Cheaper than official API · No credit card

Gemini 2.5 Flash Lite — cheaper than the official Google API

Access Gemini 2.5 Flash Lite through AI API Hub and pay less per token. Same OpenAI-compatible endpoint, lower cost.


Technical Specifications

Provider	Google
Model Family	Gemini 2.5 Flash Lite
Release Date	2026-05
Context Window	1M
Max Output Tokens	8,192
Input Price	$0.10 / 1M tokens
Output Price	$0.40 / 1M tokens
Vision Support	Yes ✓
Function Calling	No
JSON Mode	No
Streaming	Yes ✓
Fine Tuning	Not Available
Status	Active ✓

Overview

Gemini 2.5 Flash Lite is Google's most affordable Gemini model, released in 2026-05. It is ideal for high-volume, cost-sensitive applications such as classification and simple extraction.

For developers, the headline numbers are a 1M context window and up to 8,192 output tokens per response, which is enough headroom to send long inputs without chunking them. Priced at $0.10/1M input and $0.40/1M output, it sits in the budget tier and suits high-volume pipelines where token cost dominates.

On the capability side, Gemini 2.5 Flash Lite exposes two features: vision and streaming. Vision support means you can pass images alongside text, which is handy for document parsing or UI automation. Function calling and JSON mode are not available, and fine-tuning isn't supported, so you'll work with the base model.
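As a rough sketch of what passing an image alongside text might look like, the helper below builds a single chat message in the OpenAI-style content-parts format, with the image inlined as a base64 data URL. Whether the AI API Hub endpoint accepts exactly this shape for Gemini 2.5 Flash Lite is an assumption, and `build_vision_message` is a hypothetical helper name, not part of any SDK.

```python
import base64


def build_vision_message(image_bytes: bytes, prompt: str, mime_type: str = "image/png") -> dict:
    """Build one user message carrying text plus an inline image.

    Assumes the endpoint accepts OpenAI-style content parts
    ("text" and "image_url"); this page does not confirm that.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
            },
        ],
    }
```

The returned dict would go into the `messages` list of a `client.chat.completions.create(...)` call in place of a plain-text message.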

The practical appeal of routing Gemini 2.5 Flash Lite through AI API Hub is simplicity: one OpenAI-compatible endpoint, USDT & USDC payments, no credit card, and you're calling the API in under 30 seconds — just swap your base URL.

What Makes Gemini 2.5 Flash Lite Different

How Gemini 2.5 Flash Lite is used

Gemini 2.5 Flash Lite is used for document understanding, image-based Q&A, OCR-free receipt/invoice parsing, and screenshot analysis. Because it is also the cheapest model in Google's lineup, text-only workloads such as classification and simple extraction can stay on it rather than moving to a sibling.

Pricing position within Google

Gemini 2.5 Flash Lite is the cheapest active model in Google's lineup at $0.10/1M input; no sibling undercuts it. The most expensive sibling costs $2.00/1M, which is 20 times the price (1900% more). At scale, every 1M input tokens routed here instead of to that sibling saves $1.90.
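The percentage above can be checked with a few lines of arithmetic. The sketch below uses only the two input prices quoted on this page; the 500M-token volume is a hypothetical figure chosen for illustration, and output-token prices for the sibling are not given here, so they are left out.

```python
from decimal import Decimal

FLASH_LITE_INPUT = Decimal("0.10")        # USD per 1M input tokens (from this page)
PRICIEST_SIBLING_INPUT = Decimal("2.00")  # USD per 1M input tokens (from this page)


def percent_more(base: Decimal, other: Decimal) -> Decimal:
    """How much more `other` costs than `base`, in percent."""
    return (other - base) / base * 100


def input_savings(million_tokens: Decimal) -> Decimal:
    """USD saved on input tokens by using Flash Lite instead of the priciest sibling."""
    return (PRICIEST_SIBLING_INPUT - FLASH_LITE_INPUT) * million_tokens


print(percent_more(FLASH_LITE_INPUT, PRICIEST_SIBLING_INPUT))  # percent difference
print(input_savings(Decimal("500")))  # hypothetical volume: 500M input tokens
```

`Decimal` is used instead of `float` so that the price arithmetic stays exact.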

Gemini 2.5 Flash Lite's role in the lineup

Within Google's lineup, Gemini 2.5 Flash Lite is the entry-level option: cheapest per token and designed for high-volume workloads. Other Gemini family variants offer more capability at higher cost. If you hit quality limits, step up to a mid-tier or flagship sibling.

Real-world use cases

Real-world deployments: OCR-free receipt and invoice parsing, screenshot analysis for support tickets, image-based content moderation, and visual Q&A for accessibility tools. Gemini 2.5 Flash Lite reads images directly — no separate OCR step needed.
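Since this page lists JSON mode as unavailable, a receipt or invoice parser would likely have to ask for JSON in the prompt and then parse the reply defensively. The sketch below shows one way to do that; `extract_json` is a hypothetical helper, and the sample reply is made up purely to illustrate the parsing step.

```python
import json


def extract_json(reply: str) -> dict:
    """Parse the first JSON object found in a free-text model reply.

    Tries to decode an object at each "{" in turn, so surrounding
    prose before or after the object is ignored.
    """
    decoder = json.JSONDecoder()
    for start, char in enumerate(reply):
        if char != "{":
            continue
        try:
            value, _ = decoder.raw_decode(reply, start)
        except json.JSONDecodeError:
            continue  # not valid JSON from this brace; try the next one
        if isinstance(value, dict):
            return value
    raise ValueError("no JSON object found in reply")


# Made-up reply text, only to illustrate the parsing step.
sample = 'Here is the receipt:\n{"merchant": "Cafe", "total": 12.5}\nDone.'
print(extract_json(sample))
```

Raising `ValueError` when nothing parses lets the caller decide whether to retry the request or flag the document for manual review.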

vs sibling models

Compared with its siblings, Gemini 2.5 Flash Lite offers the same 1M context window at a lower or equal price: Gemini 2.5 Pro costs $1.15/1M more, Gemini 3.5 Flash costs $1.40/1M more, and Gemini 2.0 Flash is priced the same. Choose Gemini 2.5 Flash Lite when you need vision input at the lowest per-token cost.

API Examples

Python

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.apiyihe.org/v1"
)

response = client.chat.completions.create(
    model="gemini-2-5-flash-lite",
    messages=[
        {"role": "user", "content": "Hello"}
    ]
)

print(response.choices[0].message.content)
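Streaming is listed as supported, so a streamed variant of the request above might look like the sketch below. It assumes the endpoint returns standard chat-completion chunks when `stream=True` is passed, and that the key lives in an `API_KEY` environment variable; `join_stream` and `stream_reply` are hypothetical helper names.

```python
import os


def join_stream(chunks) -> str:
    """Concatenate the text deltas from an iterable of chat-completion chunks."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # some chunks may carry no choices at all
        delta = chunk.choices[0].delta.content
        if delta:  # content can be None on role-only or final chunks
            parts.append(delta)
    return "".join(parts)


def stream_reply(prompt: str) -> str:
    """Send one prompt with stream=True and return the assembled reply."""
    from openai import OpenAI  # imported here so join_stream works without the SDK

    client = OpenAI(
        api_key=os.environ["API_KEY"],  # assumed environment variable name
        base_url="https://api.apiyihe.org/v1",
    )
    stream = client.chat.completions.create(
        model="gemini-2-5-flash-lite",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    return join_stream(stream)
```

In an interactive application you would usually print each delta as it arrives rather than joining them at the end; the join is kept here so the function has a single return value.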

JavaScript / Node.js

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.API_KEY,
  baseURL: "https://api.apiyihe.org/v1"
});

const response = await client.chat.completions.create({
  model: "gemini-2-5-flash-lite",
  messages: [
    { role: "user", content: "Hello" }
  ]
});

console.log(response.choices[0].message.content);

cURL

curl https://api.apiyihe.org/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "gemini-2-5-flash-lite",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'

Supported Features

Vision / Image Input	✅ Supported
Audio / Voice Input	❌ Not Available
Function Calling	❌ Not Available
JSON Mode	❌ Not Available
Streaming	✅ Supported
Fine-Tuning	❌ Not Available
Multimodal	⚠️ Partial (image input only)

Benchmark Scores

Benchmark	Score
MMLU	Not Publicly Available
GPQA	Not Publicly Available
SWE-Bench	Not Publicly Available
HumanEval	Not Publicly Available
GSM8K	Not Publicly Available
MATH	Not Publicly Available
MMMU	Not Publicly Available

Scores are taken from official provider publications. "Not Publicly Available" marks benchmarks that have not yet been disclosed.

Pricing History

Gemini 2.5 Flash Lite was released in 2026-05 by Google and is currently publicly available via AI API Hub.

Current Pricing: $0.10 per 1M input tokens · $0.40 per 1M output tokens. Pay-as-you-go with no minimum commitment.

Pricing Model: Token-based billing (pay per use). No subscription fees. No hidden costs.
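To turn the per-token rates above into a budget figure, a small estimator is enough. The sketch below uses the listed $0.10 and $0.40 rates; the request count and per-request token sizes are hypothetical numbers chosen for illustration.

```python
from decimal import Decimal

INPUT_USD_PER_M = Decimal("0.10")   # USD per 1M input tokens (listed above)
OUTPUT_USD_PER_M = Decimal("0.40")  # USD per 1M output tokens (listed above)
MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimated USD cost of a workload at the listed per-token rates."""
    return (
        Decimal(input_tokens) / MILLION * INPUT_USD_PER_M
        + Decimal(output_tokens) / MILLION * OUTPUT_USD_PER_M
    )


# Hypothetical job: 100,000 requests, each ~2,000 input and ~200 output tokens.
requests = 100_000
print(estimate_cost(requests * 2_000, requests * 200))
```

For that hypothetical job the input side comes to $20 and the output side to $8, so output tokens are worth watching even though there are far fewer of them.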

💡 Google occasionally updates pricing. AI API Hub reflects current pricing in real time. All prices are in USD. Pay with USDT or USDC, with no currency conversion fees.

Compare Alternatives

GPT-5.5	$5.00/1M	OpenAI · 256K context
GPT-5.4	$2.50/1M	OpenAI · 256K context

Gemini 2.5 Flash Lite vs GPT-5.5
Gemini 2.5 Flash Lite vs GPT-5.4
Gemini 2.5 Flash Lite vs GPT-4.1
Gemini 2.5 Flash Lite vs GPT-4.1 Mini

Frequently Asked Questions

What is Gemini 2.5 Flash Lite?

Gemini 2.5 Flash Lite is Google's most affordable Gemini model, ideal for high-volume, cost-sensitive applications such as classification and simple extraction. It offers a 1M context window and supports vision and streaming. You can access it through AI API Hub using USDT or USDC, with no credit card required.

How much does Gemini 2.5 Flash Lite cost?

Gemini 2.5 Flash Lite is priced at $0.10 per 1M input tokens and $0.40 per 1M output tokens, billed pay-as-you-go with no minimum. Through AI API Hub you can start with as little as $5 and scale from there.

Gemini 2.5 Flash Lite vs GPT-5.5?

They're built for different jobs. Gemini 2.5 Flash Lite costs $0.10/1M input with a 1M window; GPT-5.5 runs $5.00/1M input with 256K. Gemini 2.5 Flash Lite is the more cost-effective pick for high-volume, cost-sensitive work. See the full side-by-side at /compare/gemini-2-5-flash-lite-vs-gpt-5.5/.

Gemini 2.5 Flash Lite context window?

Gemini 2.5 Flash Lite has a 1M context window, capable of processing up to 1,048,576 tokens in a single request. Maximum output tokens: 8,192.
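A quick pre-flight check can tell you whether a prompt leaves room for the reply. The sketch below uses the two limits quoted above and assumes prompt and output tokens share the same window, which this page does not confirm; `max_prompt_tokens` and `fits` are hypothetical helper names, and the token counts would have to come from a tokenizer of your choice.

```python
CONTEXT_WINDOW = 1_048_576  # tokens, as quoted above
MAX_OUTPUT = 8_192          # tokens, as quoted above


def max_prompt_tokens(reserved_output: int = MAX_OUTPUT) -> int:
    """Largest prompt that still leaves room for `reserved_output` tokens.

    Assumes prompt and output share one window; not confirmed by this page.
    """
    if not 0 < reserved_output <= MAX_OUTPUT:
        raise ValueError(f"reserved_output must be between 1 and {MAX_OUTPUT}")
    return CONTEXT_WINDOW - reserved_output


def fits(prompt_tokens: int, reserved_output: int = MAX_OUTPUT) -> bool:
    """True if a prompt of this size fits alongside the reserved output."""
    return prompt_tokens <= max_prompt_tokens(reserved_output)
```

Under that assumption, reserving the full 8,192 output tokens leaves 1,040,384 tokens for the prompt.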

Does Gemini 2.5 Flash Lite support function calling?

No, Gemini 2.5 Flash Lite does not natively support function calling. For function calling use cases, consider Google's flagship models.

Is Gemini 2.5 Flash Lite multimodal?

Partially — Gemini 2.5 Flash Lite supports vision (image input) but not native audio processing.

Gemini 2.5 Flash Lite API rate limits?

The rate limit for Gemini 2.5 Flash Lite is 10K requests per minute (RPM). Higher-tier plans offer increased throughput. For high-volume production use, consider Google's faster variant models.
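One way to stay under a requests-per-minute cap is to pace calls on the client side. The sketch below spaces calls evenly at 60 / RPM seconds apart; `Pacer` is a hypothetical class, and how the service actually measures or enforces the limit (per key, per account, bursts allowed or not) is not stated on this page.

```python
import time


class Pacer:
    """Space calls evenly so they stay at or below a requests-per-minute cap."""

    def __init__(self, rpm: int = 10_000, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / rpm  # seconds between calls; 0.006 s at 10K RPM
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = clock()

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay:
            self._sleep(delay)
        # Schedule the following slot one interval after this call's slot.
        self._next_allowed = max(now, self._next_allowed) + self.interval
        return delay
```

Calling `pacer.wait()` immediately before each API request is enough for a single-threaded worker; a multi-threaded client would also need a lock around `wait`.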

How to access Gemini 2.5 Flash Lite API?

Access Gemini 2.5 Flash Lite through AI API Hub: (1) Register at api.apiyihe.org/register?aff=8JZC, (2) Deposit USDT/USDC, (3) Get your API key instantly, (4) Use the OpenAI-compatible endpoint https://api.apiyihe.org/v1 with model name "gemini-2-5-flash-lite". Start building in under 30 seconds.

Get Gemini 2.5 Flash Lite API Access

Pay with USDT & USDC. Same model, up to 70% less.

Create an account