Gemini API Python Guide: Flash & Pro Integration
Introduction
Google's Gemini 2.0 models offer a 1M-token context window, native multimodal input, and code execution, and they can be called through an OpenAI-compatible API. This guide covers integrating Gemini through AI API Hub using the OpenAI SDK.
Why Gemini?
| Feature | Gemini 2.0 Flash | GPT-4o |
|---|---|---|
| Input price (per 1M tokens) | $1.50 | $2.50 |
| Output price (per 1M tokens) | $6.00 | $10.00 |
| Context window (tokens) | 1M | 128K |
| Multimodal input | Text, Image, Audio, Video | Text, Image |
| Google integration | Deep | None |
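To see what the price difference means for a concrete request, the arithmetic can be sketched as follows. The prices are the ones in the table above; the function names and the example workload are hypothetical and only illustrate the calculation.

```python
# Prices in USD per 1M tokens, copied from the comparison table
PRICES_PER_1M = {
    "gemini-2.0-flash": {"input": 1.50, "output": 6.00},
    "gpt-4o": {"input": 2.50, "output": 10.00},
}


def request_cost(model, input_tokens, output_tokens):
    """Cost in USD of one request at the table prices."""
    price = PRICES_PER_1M[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


def savings_percent(input_tokens, output_tokens):
    """Percentage saved by using Gemini 2.0 Flash instead of GPT-4o."""
    gemini = request_cost("gemini-2.0-flash", input_tokens, output_tokens)
    gpt = request_cost("gpt-4o", input_tokens, output_tokens)
    return round((1 - gemini / gpt) * 100, 1)


# Hypothetical workload: 100K input tokens, 2K output tokens
print(request_cost("gemini-2.0-flash", 100_000, 2_000))
print(savings_percent(100_000, 2_000))
```

Because both the input and the output price are 40% lower, the saving stays at 40% regardless of the input/output mix.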
Setup
import openai

client = openai.OpenAI(
    api_key="your-api-key",
    base_url="https://api.apiyihe.org/v1",
)
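Hardcoding the key is fine for a quick test, but in shared code it is usually safer to read it from the environment. A minimal sketch, assuming a hypothetical variable name `AI_API_HUB_KEY` (chosen for this example, not defined by the provider):

```python
import os

BASE_URL = "https://api.apiyihe.org/v1"  # endpoint used in this guide


def client_settings(env=None):
    """Build keyword arguments for openai.OpenAI(...) from the environment.

    AI_API_HUB_KEY is a hypothetical variable name used only in this sketch.
    """
    env = os.environ if env is None else env
    api_key = env.get("AI_API_HUB_KEY")
    if not api_key:
        raise RuntimeError("Set AI_API_HUB_KEY before creating the client")
    return {"api_key": api_key, "base_url": BASE_URL}
```

The client can then be created with `openai.OpenAI(**client_settings())`, which fails early with a clear message if the key is missing.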
Basic Usage
response = client.chat.completions.create(
    model="gemini-2.0-flash",
    messages=[
        {"role": "system", "content": "You are a research analyst."},
        {"role": "user", "content": "Analyze the impact of quantum computing on cybersecurity."},
    ],
    max_tokens=2000,
    temperature=0.7,
)
print(response.choices[0].message.content)
Streaming
stream = client.chat.completions.create(
    model="gemini-2.0-flash",
    messages=[{"role": "user", "content": "Explain blockchain technology"}],
    stream=True,
)
for chunk in stream:
    # Skip chunks that carry no choices or no text
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
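If the full text is needed after streaming (for logging or post-processing), the deltas can be accumulated instead of only printed. A minimal sketch, assuming chunks follow the OpenAI streaming shape used above; `collect_stream` is a hypothetical helper name:

```python
def collect_stream(stream):
    """Join the text deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in stream:
        # Some chunks carry no choices or no text (role-only or final chunks)
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            parts.append(delta)
    return "".join(parts)
```

Collecting parts in a list and joining once avoids repeated string concatenation on long responses.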
Large Context Processing
Gemini's 1M context window enables processing entire codebases or books in a single request.
from pathlib import Path

# Concatenate multiple source files into a single prompt context
files = ["api.py", "models.py", "services.py", "utils.py"]
codebase = ""
for name in files:
    source = Path("src", name).read_text(encoding="utf-8")
    # Label each file so the model can tell them apart
    codebase += f"\n# {name}\n{source}"

response = client.chat.completions.create(
    model="gemini-2.0-flash",
    messages=[{
        "role": "user",
        "content": f"Review this codebase and suggest architectural improvements:\n\n{codebase}",
    }],
    max_tokens=4000,
)
print(response.choices[0].message.content)
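Even with a 1M-token window, it can help to sanity-check the prompt size before sending a large request. The sketch below uses a rough rule of thumb of about 4 characters per token; that ratio is an assumption for illustration, not Gemini's actual tokenizer, so treat the result as an estimate only.

```python
CONTEXT_LIMIT = 1_000_000  # context window cited in this guide
CHARS_PER_TOKEN = 4        # rough heuristic, NOT Gemini's real tokenizer


def estimate_tokens(text):
    """Very rough token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN + 1


def fits_in_context(text, reserved_for_output=4000, limit=CONTEXT_LIMIT):
    """Return True if the estimated prompt plus the output budget fits the window."""
    return estimate_tokens(text) + reserved_for_output <= limit
```

Calling `fits_in_context(codebase)` before the request gives an early warning when the concatenated files are likely too large.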
Function Calling
# Describe the tool with the `tools` schema
# (`functions` / `function_call` are deprecated in the OpenAI API)
tools = [{
    "type": "function",
    "function": {
        "name": "search_google",
        "description": "Search the web for information",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

response = client.chat.completions.create(
    model="gemini-2.0-flash",
    messages=[{"role": "user", "content": "What's the latest news about AI regulation in 2026?"}],
    tools=tools,
    tool_choice="auto",
)
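When the model decides to call a function, the reply carries the call rather than a text answer, and your code has to run it. A minimal sketch, assuming the request was made with the `tools` parameter of the Chat Completions API so that the assistant message exposes `tool_calls`; the local `search_google` stub and the `run_tool_calls` helper are hypothetical:

```python
import json


def search_google(query):
    """Hypothetical stand-in for a real search backend."""
    return f"results for: {query}"


AVAILABLE_TOOLS = {"search_google": search_google}


def run_tool_calls(message):
    """Execute each tool call on an assistant message and build the reply messages."""
    results = []
    for call in message.tool_calls or []:
        handler = AVAILABLE_TOOLS.get(call.function.name)
        if handler is None:
            continue  # unknown tool name: skip it
        # Arguments arrive as a JSON-encoded string
        args = json.loads(call.function.arguments)
        results.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": handler(**args),
        })
    return results
```

The returned messages would be appended to the conversation, after the assistant message that requested them, and sent in a follow-up request so the model can write its final answer.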
Node.js Example
// Run as an ES module (for example a .mjs file) so top-level await is available
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "your-api-key",
  baseURL: "https://api.apiyihe.org/v1",
});

const r = await client.chat.completions.create({
  model: "gemini-2.0-flash",
  messages: [{ role: "user", content: "Summarize the latest AI research trends" }],
  max_tokens: 1000,
});
console.log(r.choices[0].message.content);
Common Issues
| Issue | Fix |
|---|---|
| Response cut off | Increase `max_tokens`; Gemini supports up to 8,192 output tokens |
| Slow with large context | Use Gemini 2.0 Flash (not Pro) for speed |
| Safety filters block content | Gemini applies strict safety filtering; rephrase sensitive queries |
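A cut-off response can also be detected in code. In the OpenAI chat completions format, `finish_reason` is `"length"` when generation stopped at the `max_tokens` limit; the sketch below assumes the Gemini endpoint reports it the same way. The helper names are hypothetical.

```python
MAX_OUTPUT_TOKENS = 8192  # output ceiling cited in the table above


def was_truncated(response):
    """Return True if the first choice stopped because it hit max_tokens."""
    return response.choices[0].finish_reason == "length"


def next_max_tokens(current):
    """Double the output budget for a retry, capped at the model ceiling."""
    return min(current * 2, MAX_OUTPUT_TOKENS)
```

A retry loop could call `was_truncated(response)` and, if it returns `True`, resend the request with `max_tokens=next_max_tokens(previous_value)`.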
FAQ
Q: Is Gemini cheaper than GPT-4o? A: Yes. At the prices listed above, Gemini 2.0 Flash costs 40% less than GPT-4o for both input and output tokens.
Q: Can I use Gemini for coding? A: Yes. Gemini supports code generation and execution. Quality is comparable to GPT-4o for most tasks.
Q: What's the advantage of the 1M context? A: Process entire books, codebases, or conversation histories without splitting or summarizing.