Gemini API Python Guide: Flash & Pro Integration

Introduction

Google Gemini 2.0 offers a 1M-token context window, native multimodal input, and code execution, and it can be called through an OpenAI-compatible API. This guide covers integrating Gemini through AI API Hub using the OpenAI SDK.

Why Gemini?

| Feature | Gemini 2.0 Flash | GPT-4o |
|---|---|---|
| Input Price (per 1M tokens) | $1.50 | $2.50 |
| Output Price (per 1M tokens) | $6.00 | $10.00 |
| Context Window (tokens) | 1M | 128K |
| Multimodal | Text, Image, Audio, Video | Text, Image |
| Google Integration | Deep | None |
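
To make the price difference concrete, here is a minimal sketch that turns the per-token prices in the table into a per-request cost. The `PRICES` dictionary, the `estimate_cost` helper, and the example workload (200K input tokens, 20K output tokens) are illustrative assumptions, not part of any SDK.

```python
# Prices in USD per 1M tokens, copied from the table above.
PRICES = {
    "gemini-2.0-flash": {"input": 1.50, "output": 6.00},
    "gpt-4o": {"input": 2.50, "output": 10.00},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical workload: 200K input tokens and 20K output tokens.
flash = estimate_cost("gemini-2.0-flash", 200_000, 20_000)
gpt4o = estimate_cost("gpt-4o", 200_000, 20_000)
print(f"Flash: ${flash:.2f}, GPT-4o: ${gpt4o:.2f}")
```

For this workload the Flash request comes to $0.42 against $0.70 for GPT-4o, which is the same 40% gap visible in the table.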

Setup

import openai

client = openai.OpenAI(
    api_key="your-api-key",
    base_url="https://api.apiyihe.org/v1"
)
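
Hard-coding the key is fine for a quick test, but in shared code it is usually safer to read it from the environment. The sketch below shows one way to do that; the variable name `AI_API_HUB_KEY` and the helper `load_api_key` are hypothetical and can be renamed freely.

```python
import os


def load_api_key(env_var: str = "AI_API_HUB_KEY") -> str:
    """Read the API key from the environment, failing early if it is missing."""
    # env_var is a hypothetical name chosen for this example.
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key
```

The result can then be passed to the client constructor above as `api_key=load_api_key()`.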

Basic Usage

response = client.chat.completions.create(
    model="gemini-2.0-flash",
    messages=[
        {"role": "system", "content": "You are a research analyst."},
        {"role": "user", "content": "Analyze the impact of quantum computing on cybersecurity."}
    ],
    max_tokens=2000,
    temperature=0.7
)

print(response.choices[0].message.content)

Streaming

stream = client.chat.completions.create(
    model="gemini-2.0-flash",
    messages=[{"role": "user", "content": "Explain blockchain technology"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
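
When the full reply is needed after streaming (for logging or for appending to the conversation history), the chunks can be accumulated while they are printed. This is a minimal sketch: `collect_stream` is a hypothetical helper, and it assumes each chunk has the `choices[0].delta.content` shape used in the loop above. It also skips chunks with an empty `choices` list, which some OpenAI-compatible backends may send.

```python
def collect_stream(stream) -> str:
    """Print streamed text as it arrives and return the full reply."""
    parts = []
    for chunk in stream:
        # Skip chunks that carry no choices (for example a trailing usage chunk).
        if not chunk.choices:
            continue
        text = chunk.choices[0].delta.content
        if text:
            print(text, end="", flush=True)
            parts.append(text)
    print()  # finish the line once the stream ends
    return "".join(parts)
```

Calling `reply = collect_stream(stream)` replaces the `for` loop above and leaves the complete text in `reply`.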

Large Context Processing

Gemini's 1M context window enables processing entire codebases or books in a single request.

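from pathlib import Path

# Concatenate multiple source files into a single context string
files = ["api.py", "models.py", "services.py", "utils.py"]
codebase = ""
for name in files:
    # Label each file with a Python-style comment so the model can tell them apart
    codebase += f"\n# {name}\n" + Path("src", name).read_text(encoding="utf-8")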

response = client.chat.completions.create(
    model="gemini-2.0-flash",
    messages=[{
        "role": "user",
        "content": f"Review this codebase and suggest architectural improvements:\n\n{codebase}"
    }],
    max_tokens=4000
)
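
Before sending a very large prompt, it can help to check roughly whether it fits. The sketch below uses a crude heuristic of about 4 characters per token, which is an assumption rather than Gemini's actual tokenizer, so treat the result as a ballpark figure only. The helper names are hypothetical, and reserving room for the reply inside the same budget is a deliberately conservative choice.

```python
CONTEXT_LIMIT = 1_000_000  # context window stated in this guide
CHARS_PER_TOKEN = 4        # assumption: rough average for English text and code


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN + 1


def fits_in_context(prompt: str, max_output_tokens: int) -> bool:
    """Check whether the prompt plus the reserved output budget fits the limit."""
    return estimate_tokens(prompt) + max_output_tokens <= CONTEXT_LIMIT
```

For example, `fits_in_context(codebase, 4000)` could be checked before the request above, falling back to fewer files if it returns `False`.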

Function Calling

# Tool definitions in the current "tools" format; the older
# functions/function_call parameters are deprecated in the OpenAI SDK.
tools = [{
    "type": "function",
    "function": {
        "name": "search_google",
        "description": "Search the web for information",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"]
        }
    }
}]

response = client.chat.completions.create(
    model="gemini-2.0-flash",
    messages=[{"role": "user", "content": "What's the latest news about AI regulation in 2026?"}],
    tools=tools,
    tool_choice="auto"
)
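
The model does not run `search_google` itself; it returns the function name and JSON-encoded arguments, and the calling code is expected to execute the function. The sketch below shows one way to read that request and dispatch it. `extract_calls` and `dispatch` are hypothetical helpers, and the code checks both the `tool_calls` list and the older single `function_call` field, since which one is populated depends on how the request was made.

```python
import json


def extract_calls(message) -> list:
    """Return (function name, parsed arguments) pairs requested by the model."""
    calls = []
    # Newer response shape: a list of tool calls.
    for tool_call in getattr(message, "tool_calls", None) or []:
        fn = tool_call.function
        calls.append((fn.name, json.loads(fn.arguments)))
    # Legacy response shape: a single function_call object.
    legacy = getattr(message, "function_call", None)
    if legacy is not None:
        calls.append((legacy.name, json.loads(legacy.arguments)))
    return calls


def dispatch(message, handlers: dict) -> list:
    """Run each requested function using a name -> callable mapping."""
    return [handlers[name](**args) for name, args in extract_calls(message)]
```

A call such as `dispatch(response.choices[0].message, {"search_google": my_search})` would then run your own `my_search(query=...)` implementation, where `my_search` is whatever search function you supply.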

Node.js Example

import OpenAI from "openai";
const client = new OpenAI({
  apiKey: "your-api-key",
  baseURL: "https://api.apiyihe.org/v1",
});
const r = await client.chat.completions.create({
  model: "gemini-2.0-flash",
  messages: [{ role: "user", content: "Summarize the latest AI research trends" }],
  max_tokens: 1000,
});
console.log(r.choices[0].message.content);

Common Issues

| Issue | Fix |
|---|---|
| Response cutoff | Increase max_tokens; Gemini supports up to 8,192 output tokens |
| Slow with large context | Use Gemini 2.0 Flash (not Pro) for speed |
| Safety filters block content | Gemini has strict safety filters; rephrase sensitive queries |
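
A cut-off response can be detected programmatically: in the OpenAI chat completions format, a reply that hit the token limit has `finish_reason` set to `"length"`. The sketch below retries with a larger budget when that happens. `was_truncated` and `complete_with_retry` are hypothetical helpers, the 8192 ceiling is taken from the table above, and it assumes the hub passes `finish_reason` through unchanged.

```python
def was_truncated(response) -> bool:
    """Return True if the reply stopped because it hit max_tokens."""
    return response.choices[0].finish_reason == "length"


def complete_with_retry(client, max_tokens=2000, ceiling=8192, **kwargs):
    """Call the API, doubling max_tokens until the reply is not cut off."""
    while True:
        response = client.chat.completions.create(max_tokens=max_tokens, **kwargs)
        # Stop when the reply is complete or the budget cannot grow further.
        if not was_truncated(response) or max_tokens >= ceiling:
            return response
        max_tokens = min(max_tokens * 2, ceiling)
```

Usage would look like `complete_with_retry(client, model="gemini-2.0-flash", messages=[...])`. Each retry is billed as a separate request, so a generous initial `max_tokens` may be cheaper for prompts that are known to produce long answers.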

FAQ

Q: Is Gemini cheaper than GPT-4o? A: Yes. At the prices listed above, Gemini 2.0 Flash costs 40% less than GPT-4o for both input and output tokens.

Q: Can I use Gemini for coding? A: Yes. Gemini supports code generation and execution. Quality is comparable to GPT-4o for most tasks.

Q: What's the advantage of the 1M context? A: Process entire books, codebases, or conversation histories without splitting or summarizing.