The Complete Guide to Prompting with Nano Banana Pro: How to Generate Stunning Images with AI

Introduction: What Is Nano Banana Pro?

Nano Banana Pro is Google’s most advanced image model to date. Google DeepMind introduced it as an image generation and editing model built on Gemini 3 Pro, designed to bridge the gap between imagination and professional execution.

Nano Banana Pro is a significant leap forward from previous-generation models, moving from “fun” image generation to “functional” professional asset production. It excels in text rendering, character consistency, visual synthesis, world knowledge (Search), and high-resolution (4K) output.

Across Google’s products and services, you now have a choice: the original Nano Banana for fast, fun editing, or Nano Banana Pro for complex compositions requiring the highest quality and visually sophisticated results.

The model stands apart through its reasoning-driven approach. Rather than simply matching keywords from your prompt, Nano Banana Pro understands intent, context, and real-world relationships. This intelligence enables it to generate images with unprecedented accuracy, consistency, and creative control.

Section 1: How to Access Nano Banana Pro

Before diving into prompts, you need to know how to reach the tool.

In the Gemini app, select “🍌 Create images” from the tools menu to access Nano Banana, then choose the “Fast,” “Thinking,” or “Pro” model from the model menu. Add a prompt or upload an image to edit.

Google AI Plus, Pro, and Ultra subscribers can regenerate an image with Nano Banana Pro by opening the three-dot menu and selecting “Redo with Pro.”

End users can access Nano Banana Pro in the Gemini app, but the best environment for developers and power users to prototype and test prompts is Google AI Studio. AI Studio is a playground for experimenting with all available AI models before writing any code, and it is also the entry point for building with the Gemini API. To use Nano Banana Pro in AI Studio, go to aistudio.google.com, sign in with your Google account, and select Nano Banana Pro (Gemini 3 Pro Image) from the model picker.

Note that, unlike Nano Banana, the Pro version has no free tier, so you need to select an API key with billing enabled.

Section 2: Technical Specifications You Should Know

Understanding what the model can technically do helps you write better prompts and set the right expectations.

Gemini 3.1 Flash Image (Nano Banana 2) supports a maximum of 131,072 input tokens, while Gemini 3 Pro Image (Nano Banana Pro) supports a maximum of 65,536 input tokens. Both models support a maximum of 32,768 output tokens.

Built-in generation capabilities exist for 1K, 2K, and 4K visuals. Gemini 3.1 Flash Image adds the smaller 512px (0.5K) resolution.

Both models support aspect ratios of 1:1, 3:2, 2:3, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, and 21:9.

With Nano Banana Pro, you can blend more elements than ever before, using up to 14 images and maintaining the consistency and resemblance of up to 5 people.
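The limits above lend themselves to a quick pre-flight check before you send a request. The following is a minimal sketch, not part of any Google SDK: the names SUPPORTED_ASPECT_RATIOS, SUPPORTED_SIZES, MAX_REFERENCE_IMAGES, and check_request are hypothetical, and the values are simply the Nano Banana Pro figures quoted in this section.

```python
# Hypothetical pre-flight check; values are the Nano Banana Pro limits quoted in this guide.
SUPPORTED_ASPECT_RATIOS = {"1:1", "3:2", "2:3", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"}
SUPPORTED_SIZES = {"1K", "2K", "4K"}
MAX_REFERENCE_IMAGES = 14


def check_request(aspect_ratio: str, size: str, reference_count: int = 0) -> list[str]:
    """Return a list of problems; an empty list means the request fits the stated limits."""
    problems = []
    if aspect_ratio not in SUPPORTED_ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio: {aspect_ratio}")
    if size not in SUPPORTED_SIZES:
        problems.append(f"unsupported size: {size}")
    if reference_count > MAX_REFERENCE_IMAGES:
        problems.append(f"too many reference images: {reference_count} > {MAX_REFERENCE_IMAGES}")
    return problems


print(check_request("16:9", "4K", reference_count=3))
print(check_request("7:5", "8K", reference_count=20))
```

Returning a list of problems rather than raising on the first one lets you report every issue with a request at once.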

The model renders clean text and supports detection/translation across 10 languages.

All generated images include built-in transparency features: you can simply upload an image to the Gemini app and ask if it was generated by Google AI. This verification is powered by SynthID, Google’s digital watermarking technology.

Section 3: The Foundation of a Great Prompt

Before trying any advanced techniques, it’s important to understand the core building blocks of an effective Nano Banana Pro prompt. Think of a prompt as a structured brief: start with the core subject, add the details that matter, then use constraints to narrow the output. This approach improves consistency and reduces “close, but not quite” results.

Break down what you want to generate into pieces, and list them from most important to least important. Earlier details have more influence on the final result.
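One way to practice this ordering is to rank each detail before assembling the prompt. The sketch below is illustrative only: order_details is a hypothetical helper, and the ranks and most of the sample details are made up for the example.

```python
# Illustrative helper: order prompt details from most to least important.
def order_details(details: dict[str, int]) -> str:
    """Join details into one prompt, lowest rank number (most important) first."""
    ranked = sorted(details.items(), key=lambda item: item[1])
    return ", ".join(text for text, _ in ranked)


prompt = order_details({
    "soft morning light": 3,
    "a fluffy calico cat wearing a tiny wizard hat": 1,
    "sitting on a stack of old books": 2,
    "watercolor illustration": 4,
})
print(prompt)
```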

According to Google’s official prompt guide, to achieve the best results and have more nuanced creative control, include the following elements in your prompt:

Subject: Who or what is in the image? Be specific — for example, a bartender, a stoic robot barista with glowing blue optics, or a fluffy calico cat wearing a tiny wizard hat.
Composition: Camera framing, such as close/medium/wide shot, angle, lens (e.g., 35mm), depth of field, and subject placement.
Action: What it’s doing and how it interacts with others or objects.
Location: Place, time, weather, and environment — indoors/outdoors, city/nature, etc.
Style & Medium: Photo/illustration/3D/watercolor/pixel, realism level, and art direction.
Editing Instructions: For modifying an existing image, be direct and specific — for example, “change the man’s tie to green” or “remove the car in the background.”

The Core Prompting Rules

Be concrete, not vague. Instead of “better” or “more premium,” specify “realistic photo, soft light, shallow depth of field, warm/cool contrast.” Put the important details first: subject, action, setting, and style should come before secondary details.

Avoid contradictions — for example, “minimal white background” and “dense complex background details” in the same prompt.

Use constraints to narrow results, especially “no text,” “no watermark,” “no extra people,” “no deformed hands,” or “no heavy blur.”
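These rules can be partly automated while drafting. The sketch below assumes a small, hypothetical list of vague terms (seeded from the two mentioned above) and a hypothetical add_constraints helper that appends the negative constraints as a final sentence. It is a drafting aid, not a feature of the model.

```python
import re

# Hypothetical word list, seeded from the vague terms mentioned above.
VAGUE_TERMS = ("better", "more premium")


def find_vague_terms(prompt: str) -> list[str]:
    """Return the vague terms that appear as whole words or phrases in the prompt."""
    lowered = prompt.lower()
    return [t for t in VAGUE_TERMS if re.search(rf"\b{re.escape(t)}\b", lowered)]


def add_constraints(prompt: str, constraints: list[str]) -> str:
    """Append negative constraints as a final sentence."""
    if not constraints:
        return prompt
    return f"{prompt.rstrip('. ')}. Constraints: {', '.join(constraints)}."


draft = "Make the product shot look better"
print(find_vague_terms(draft))

final = add_constraints(
    "Realistic photo of a ceramic mug, soft light, shallow depth of field",
    ["no text", "no watermark", "no extra people"],
)
print(final)
```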

Section 4: Five Prompting Frameworks (From Official Documentation)

Google’s official prompting guide for Nano Banana models identifies five core frameworks that cover the most common use cases. Here they are in detail.

Framework 1: Text-to-Image Generation (No Reference Images)

When starting from a blank canvas, a simple list of keywords won’t cut it. You need to describe the scene narratively. The official formula from Google Cloud is:

Formula: [Subject] + [Action] + [Location/Context] + [Composition] + [Style]

Example prompt (from Google Cloud’s official guide):

[Subject] A striking fashion model wearing a tailored brown dress, sleek boots, and holding a structured handbag. [Action] Posing with a confident, statuesque stance, slightly turned. [Location] A seamless, deep cherry red studio backdrop. [Composition] Medium-full shot, center-framed. [Style] Fashion magazine editorial, shot on medium-format analog film, pronounced grain, high saturation, cinematic lighting effect.

Nano Banana Pro is a “Thinking” model — it doesn’t just match keywords; it understands intent, physics, and composition. To get the best results, stop using “tag soups” (e.g., dog, park, 4k, realistic) and start acting like a Creative Director.
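The Framework 1 formula can be sketched as a small template that forces every slot to be filled and keeps the parts in formula order. TextToImagePrompt is a hypothetical class written for this guide, and the sample values are shortened from the example prompt above.

```python
from dataclasses import dataclass


@dataclass
class TextToImagePrompt:
    """Fields follow the formula order: subject, action, location, composition, style."""
    subject: str
    action: str
    location: str
    composition: str
    style: str

    def render(self) -> str:
        parts = [self.subject, self.action, self.location, self.composition, self.style]
        # Normalise trailing periods so each part reads as one sentence.
        return " ".join(p.strip().rstrip(".") + "." for p in parts)


prompt = TextToImagePrompt(
    subject="A striking fashion model wearing a tailored brown dress",
    action="Posing with a confident, statuesque stance, slightly turned",
    location="A seamless, deep cherry red studio backdrop",
    composition="Medium-full shot, center-framed",
    style="Fashion magazine editorial, shot on medium-format analog film",
).render()
print(prompt)
```

Because the dataclass has no default values, leaving out any of the five parts raises an error at construction time, which is a cheap guard against sliding back into keyword lists.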

Framework 2: Multimodal Generation (With Reference Images)

Gemini allows you to combine multiple reference images to guide the final output. This is perfect for maintaining character consistency or merging a specific product into a new environment.

Formula: [Reference images] + [Relationship instruction] + [New scenario]

Example prompt: “Using the attached napkin sketch as the structure and the attached fabric sample as the texture [References], transform this into a high-fidelity 3D armchair render [Relationship]. Place it in a sun-drenched, minimalist living room [New Scenario].”

When using uploaded images, clearly define the role of each one. For example: “Use Image A for the character’s pose, Image B for the art style, and Image C for the background environment.”

Best practices for reference images:

Define each image’s role clearly: “Use Image A for facial features, Image B for pose, Image C for background.”
Maintain consistency: Keep similar lighting and perspective across inputs.
Start simple: Begin with 2–3 images before attempting all 14, the maximum Nano Banana Pro supports.
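The role-assignment sentence can be generated from a plain list of roles, which keeps the labels in step with the upload order. This is a minimal sketch: describe_roles is a hypothetical helper, and it assumes images are referred to as Image A, Image B, and so on, as in the example above.

```python
import string

MAX_REFERENCE_IMAGES = 14  # limit quoted in this guide for Nano Banana Pro


def describe_roles(roles: list[str]) -> str:
    """Label uploads Image A, Image B, ... in upload order and state each one's role."""
    if len(roles) > MAX_REFERENCE_IMAGES:
        raise ValueError(f"at most {MAX_REFERENCE_IMAGES} reference images are supported")
    labelled = [f"Image {string.ascii_uppercase[i]} for {role}" for i, role in enumerate(roles)]
    return "Use " + ", ".join(labelled) + "."


print(describe_roles(["facial features", "pose", "background"]))
# Use Image A for facial features, Image B for pose, Image C for background.
```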

Framework 3: Image Editing

Editing requires a different mindset than generating. You already have a base image; your prompt needs to focus on what is changing and what is staying the same.

Semantic masking (inpainting): You can define a “mask” through text to edit a specific part of an image while leaving the rest untouched.

Conversational Editing Tip: The model is exceptionally good at understanding conversational edits. If an image is 80% correct, do not generate a new one from scratch. Instead, simply ask for the specific change you need.

The model also supports composition and style transfer with new references:

Adding elements: Upload a base image and an object image, and tell the model to combine them.
Style transfer: Upload a photo and ask the model to recreate its exact content in a different artistic style — for example, transforming a photo of a modern city street into a Van Gogh–style painting.

A style transfer example:

“Convert the image into a watercolor illustration: soft bleeding edges, paper texture, gentle washes. Colors should be brighter but not oversaturated. Do not add new elements. No text.”
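The editing mindset of stating what changes and what stays the same can be captured in a short helper. build_edit_prompt is a hypothetical function, and the items in the keep list are illustrative.

```python
def build_edit_prompt(change: str, keep: list[str]) -> str:
    """State the single change first, then list what must stay untouched."""
    prompt = change.strip().rstrip(".") + "."
    if keep:
        prompt += " Keep " + ", ".join(keep) + " unchanged."
    return prompt


print(build_edit_prompt("Change the man's tie to green", ["his face", "the background", "the lighting"]))
# Change the man's tie to green. Keep his face, the background, the lighting unchanged.
```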

Framework 4: Real-Time Web Search Integration

One of Nano Banana Pro’s most distinctive capabilities is its connection to live information.

Nano Banana Pro uses Google Search to generate imagery based on real-time data, current events, or factual verification, reducing hallucinations on timely topics.

Formula: [Source/Search request] + [Analytical task] + [Visual translation]

Example prompt (from Google Cloud’s official guide):

“[Search for current weather and date in San Francisco] + [Analytically, use this data to modify the scene — if raining, make it look grey and rainy] + [Visualize this in a miniature city-in-a-cup concept embedded within a realistic, modern smartphone UI.]”

More examples of real-time search prompts:

Data visualization: “Visualize the current stock value of the main tech companies and the current trends. For each add some explanation on what happened recently which could explain that trend.”
Event visualization: “Generate an infographic of the best times to visit the U.S. National Parks in 2025 based on current travel trends.”
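The three-part search formula can be sketched as a function that refuses to build a prompt when any part is missing. build_search_prompt is a hypothetical helper. It only assembles the text, and whether the model actually runs a search depends on the product or API settings you use.

```python
def build_search_prompt(search: str, analysis: str, visual: str) -> str:
    """Combine the three formula parts in order: search, analytical task, visual translation."""
    parts = (search, analysis, visual)
    if not all(p.strip() for p in parts):
        raise ValueError("all three parts of the formula are required")
    return " ".join(p.strip().rstrip(".") + "." for p in parts)


prompt = build_search_prompt(
    "Search for current weather and date in San Francisco",
    "Use this data to modify the scene: if raining, make it look grey and rainy",
    "Visualize this in a miniature city-in-a-cup concept embedded within a realistic, modern smartphone UI",
)
print(prompt)
```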

Framework 5: Text Rendering & Localization

Finally, an AI that actually spells correctly. Nano Banana Pro renders legible text in multiple languages, making it invaluable for creating mockups, infographics, and branded content without post-processing.

To get the best typographic results, follow these official rules from the Google Cloud documentation:

Use quotes: Enclose your desired words in quotes (e.g., “Happy Birthday” or “URBAN EXPLORER”).
Choose a font: Describe the typography style or font name. Prompt for a “bold, white, sans-serif font” or “Century Gothic 12px font.”
Translate and localize: Write your prompt in one language and specify a target language for the text output.
Text-first strategy: When generating text for an image, Gemini Image models work best if you first converse with it to generate the text concepts, and then ask for an image with that text.

Full example prompt from Google’s official documentation:

“A high-end, glossy commercial beauty shot of a sleek, minimalist nude-colored face moisturizer jar resting on a warm studio background. The lighting is soft and radiant. Next to the product, render three lines of text: ‘GLOW’ in a flowing, elegant Brush Script font; ‘10% OFF’ in a heavy, blocky Impact font; ‘Your First Order’ in a thin, minimalist Century Gothic font.” Then translate the text into Korean and Arabic.
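The quoting and font rules can be applied mechanically. The sketch below uses two hypothetical helpers, text_line and text_block, to build the text instruction from (text, font) pairs taken from the example above.

```python
def text_line(text: str, font: str) -> str:
    """Quote the literal text and name its font."""
    return f"'{text}' in a {font} font"


def text_block(lines: list[tuple[str, str]]) -> str:
    """Build a 'render N lines of text' instruction from (text, font) pairs."""
    noun = "line" if len(lines) == 1 else "lines"
    clauses = "; ".join(text_line(t, f) for t, f in lines)
    return f"Render {len(lines)} {noun} of text: {clauses}."


instruction = text_block([
    ("GLOW", "flowing, elegant Brush Script"),
    ("10% OFF", "heavy, blocky Impact"),
    ("Your First Order", "thin, minimalist Century Gothic"),
])
print(instruction)
```

Keeping the literal text separate from the font description makes it easy to swap in translated strings later without touching the rest of the prompt.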

One of Nano Banana Pro’s most powerful professional features is multilingual text rendering. The model supports a wide range of languages, including non-Latin scripts, which makes it useful for localizing marketing materials, menus, and signage. It can generate text in one language and then translate it into another while preserving the image’s original visual elements, lighting, and style, which helps keep your brand identity consistent across global campaigns.

Section 5: Prompting Like a Creative Director

With Nano Banana Pro’s capabilities, Google is putting advanced creative controls directly into users’ hands. Here are four studio-quality controls you should master, drawn directly from the official Google Cloud prompting guide.

1. Design Your Lighting

Tell the model exactly how the scene is illuminated:

Studio setups: Ask for a “three-point softbox setup” to evenly light a product.
Dramatic effects: Prompt for “Chiaroscuro lighting with harsh, high contrast” or “Golden hour backlighting creating long shadows.”
For portrait work, try: “Introduce harsh, directional light, appearing to come from above and slightly to the left, casting deep, defined shadows across the face. Only slivers of light illuminating the eyes and cheekbones, the rest of the face is in deep shadow.”

2. Choose Your Camera, Lens, and Focus

Use specific hardware and photographic terminology to control depth, distortion, and perspective:

Hardware: Dictate the exact camera type — ask for a shot taken on a GoPro for an immersive, distorted action feel, a Fujifilm for authentic color science, or a disposable camera for a raw, nostalgic flash aesthetic.
Lens: Force the perspective with “a low-angle shot with a shallow depth of field (f/1.8).” For vast scale, use “a wide-angle lens.” For intricate details, specify “a macro lens.”
You can also bring out the details of a composition by adjusting the depth of field or focal point — for example, “focusing on the flowers.”

3. Define Color Grading and Film Stock

If you want a nostalgic or gritty vibe, tell the model to render the image “as if on 1980s color film, slightly grainy.” For a modern, moody aesthetic, ask for “Cinematic color grading with muted teal tones.”

4. Emphasize Materiality and Texture

When generating logos, products, or characters, define their physical makeup. Don’t just ask for a suit jacket; ask for “navy blue tweed.” Instead of “armor,” describe “ornate elven plate armor, etched with silver leaf patterns.” If you are designing a mockup, specify the surface, like a “minimalist ceramic coffee mug.”
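The four controls can be treated as optional add-ons to a base prompt. The following sketch is an assumption about how you might organize them: apply_studio_controls and its labels are hypothetical, and the sample values reuse phrases from this section.

```python
def apply_studio_controls(base: str, lighting: str = "", camera: str = "",
                          color: str = "", material: str = "") -> str:
    """Append whichever of the four controls are provided, in a fixed order."""
    controls = [("Lighting", lighting), ("Camera", camera),
                ("Color grading", color), ("Materials", material)]
    extras = [f"{label}: {value}." for label, value in controls if value]
    return " ".join([base.strip().rstrip(".") + "."] + extras)


prompt = apply_studio_controls(
    "Product shot of a minimalist ceramic coffee mug",
    lighting="three-point softbox setup",
    camera="macro lens, shallow depth of field (f/1.8)",
    color="cinematic color grading with muted teal tones",
)
print(prompt)
```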

Section 6: The SCALIST Prompt Framework

For users who want a memorable, all-in-one prompting system, build prompts using the SCALIST framework: Subject, Composition, Action, Location, Image style, Specs (technical parameters), and Text rendering. Start broad, then add specific details that matter to your use case.

Here’s how to apply it in practice with an infographic use case:

Subject: A vertical timeline infographic about AI history
Composition: Symmetrical, evenly spaced, bold header at the top
Action: Showing major milestones from 2010 to 2025
Location: Digital screen format
Image style: Clean, modern, white background, thin grey dividers
Specs: Circular markers, smooth gradient accents in blue, 9:16 aspect ratio
Text rendering: Bold readable fonts, short text blocks

Full example prompt: “Create a clean, modern vertical timeline infographic illustrating the major milestones of AI evolution from 2010 to 2025. Use a white background, thin grey dividers, and circular markers for each year. Include minimal icons, short text blocks, and smooth gradient accents in blue. Ensure the layout is symmetrical, evenly spaced, and visually balanced, with a bold header at the top reading ‘Evolution of AI: 2010–2025.’”
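The SCALIST breakdown maps naturally onto a record with one field per letter. The Scalist class below is a hypothetical sketch for organizing a brief, and the field values are the ones from the infographic breakdown above.

```python
from dataclasses import dataclass, fields


@dataclass
class Scalist:
    subject: str
    composition: str
    action: str
    location: str
    image_style: str
    specs: str
    text_rendering: str

    def acronym(self) -> str:
        """First letters of the field names, in declaration order."""
        return "".join(f.name[0].upper() for f in fields(self))

    def render(self) -> str:
        """One labelled line per component, mirroring the breakdown above."""
        return "\n".join(
            f"{f.name.replace('_', ' ').capitalize()}: {getattr(self, f.name)}"
            for f in fields(self)
        )


brief = Scalist(
    subject="A vertical timeline infographic about AI history",
    composition="Symmetrical, evenly spaced, bold header at the top",
    action="Showing major milestones from 2010 to 2025",
    location="Digital screen format",
    image_style="Clean, modern, white background, thin grey dividers",
    specs="Circular markers, smooth gradient accents in blue, 9:16 aspect ratio",
    text_rendering="Bold readable fonts, short text blocks",
)
print(brief.acronym())
print(brief.render())
```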

Section 7: Character Consistency Across Multiple Images

One of the most impressive capabilities of Nano Banana Pro is its ability to keep characters consistent.

The model maintains the same person’s appearance across different scenes, poses, and environments. You can create entire visual narratives with recognizable characters that stay consistent throughout your campaign.

For comic panels or storyboards, Nano Banana Pro maintains facial features, proportions, and styling while placing characters in different poses, expressions, and environments.

Here is an advanced multi-character group prompt example used in official Google DeepMind documentation:

Example prompt: “A medium shot of the 14 fluffy characters sitting squeezed together side-by-side on a worn beige fabric sofa and on the floor. They are all facing forwards, watching a vintage, wooden-boxed television set placed on a low wooden table in front of the sofa. The room is dimly lit, with warm light from a window on the left and the glow from the TV illuminating the creatures’ faces and fluffy textures.”

For multi-person editorial shots, try: “Put these five people and this dog into a single image, they should fit into a stunning award-winning shot in the style of a fashion editorial. The identity of all five people and their attire and the dog must stay consistent throughout but they can and should be seen from different angles and distances as is most natural and suitable to the scene. Make the colour and lighting look natural on them all.”
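For storyboards, one simple way to support consistency on the prompt side is to repeat the same identity clause in every panel so that only the scene varies. This is an assumption about workflow, not a documented requirement of the model: panel_prompts is a hypothetical helper and the scenes are invented for illustration.

```python
def panel_prompts(identity: str, scenes: list[str]) -> list[str]:
    """Repeat one fixed identity clause in every panel so only the scene varies."""
    return [
        f"Panel {i}: {scene.strip().rstrip('.')}. "
        f"Keep the character's identity consistent: {identity}."
        for i, scene in enumerate(scenes, start=1)
    ]


identity = "same facial features, proportions, and styling as in the reference image"
prompts = panel_prompts(identity, [
    "The character waves from a train window",
    "The character reads a map in the rain",
    "The character arrives at a lighthouse at dusk",
])
for p in prompts:
    print(p)
```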

Section 8: Practical Prompts by Use Case

Product Photography

Example: “Generate product lifestyle photo maintaining brand aesthetic from references, using warm earth tones with blue accents as shown.”

Storyboard Creation

Provide the initial frame, and Nano Banana can generate the subsequent frames, which you can edit as needed to build a movie storyboard. You can then use an image-to-video function to turn the frames into a coherent video without changing the characters.

Sticker and Asset Generation

To create stickers, upload an image or describe your idea, and Nano Banana will generate detailed, vibrant sticker artwork from it.

Photo Editing Without Expertise

You can edit pictures with Nano Banana without professional photo-editing expertise. Supported edits include background replacement, lighting adjustment, color changes, and item replacement.

Brand Identity Systems

For a complete brand identity workflow, the official Google documentation showcases this two-step approach:

Step 1 — Logo generation: Describe the logo concept in full detail, including typography style, color palette, and visual metaphor.

Step 2 — Identity rollout: “Now create an identity system, one by one. Use 10 high-quality mockups with a variety of relevant products, ads, billboards, bus stops, etc. Generate one at a time, 16:9 each.”
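The "one at a time" rollout can be scripted as a list of follow-up prompts, one per mockup. The sketch below is hypothetical: rollout_prompts and its wording are illustrative and not taken from Google's documentation, and the surfaces are examples of the kinds of mockups mentioned above.

```python
def rollout_prompts(surfaces: list[str], aspect_ratio: str = "16:9") -> list[str]:
    """One follow-up prompt per mockup, so each image is generated one at a time."""
    total = len(surfaces)
    return [
        f"Mockup {i} of {total}: apply the logo and identity system to {surface}, {aspect_ratio}."
        for i, surface in enumerate(surfaces, start=1)
    ]


prompts = rollout_prompts(["a billboard", "a bus stop ad", "a tote bag"])
for p in prompts:
    print(p)
```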

Section 9: Common Mistakes to Avoid

Avoid overloading a single prompt. Describing 15 different elements in one prompt is a common mistake. Instead, focus on the 4–6 most important elements, generate a base image, then refine through conversation.

Specify camera settings and lighting. Omitting them limits your control over the final appearance. A good example: “85mm f/1.8 lens, natural window light from left, shallow depth of field.”

Consider your output format before generating. Generating without considering the final use leads to sizing and formatting issues. A good example: “9:16 for Instagram Stories.” For a pixel target such as 1200×628 for Facebook ads, which does not match any of the supported aspect ratios exactly, generate at the closest supported ratio (16:9) and crop.
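Because the model works with a fixed set of aspect ratios (listed in Section 2), a pixel target from an ad platform has to be mapped to the nearest one. The sketch below shows one way to do that: nearest_aspect_ratio is a hypothetical helper, and comparing ratios on a log scale is a design choice made here so that "twice as wide" and "twice as tall" count as equally far away.

```python
import math

# Aspect ratios listed in Section 2 of this guide.
SUPPORTED_ASPECT_RATIOS = ["1:1", "3:2", "2:3", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"]


def nearest_aspect_ratio(width: int, height: int) -> str:
    """Pick the supported ratio closest to width/height, compared on a log scale."""
    target = math.log(width / height)

    def distance(ratio: str) -> float:
        w, h = map(int, ratio.split(":"))
        return abs(math.log(w / h) - target)

    return min(SUPPORTED_ASPECT_RATIOS, key=distance)


print(nearest_aspect_ratio(1200, 628))   # 16:9
print(nearest_aspect_ratio(1080, 1920))  # 9:16
print(nearest_aspect_ratio(1080, 1350))  # 4:5
```

A 1200×628 target is roughly 1.91:1, which falls between 16:9 and 21:9 and sits closer to 16:9, so you would generate at 16:9 and crop to the final size.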

Nano Banana Pro prioritizes logical consistency — mixing too many visual rules reduces clarity and adds instability to textures.

Section 10: Going Further — Integrations

Nano Banana Pro and Nano Banana 2 are designed to work seamlessly with other generative creation models. Gemini 3 can help you write prompts and develop creative direction. You can create keyframes with Nano Banana to direct an animation, then use Veo to generate the video between them.

If you need to rapidly generate creative images — such as social ads, A/B test graphics, or Instagram/TikTok visuals — Nano Banana is your go-to tool for fast prototyping and high-volume experimentation.

If you’re producing brand assets, cross-language advertising (English/German/Japanese/Chinese), or high-resolution materials (2K/4K) and need consistency across multiple channels, Nano Banana Pro is the better fit: it excels in text clarity, multilingual support, brand consistency, high resolution, and professional control.

Section 11: Known Limitations

Google openly acknowledges areas still under development. From the official Gemini blog:

Visual and text fidelity: Rendering small text, fine details, and producing accurate spellings may not work perfectly.
Data and factual accuracy: Always verify the factual accuracy of data-driven visuals like diagrams and infographics.
Translation and localization: Multilingual text generation may still need improvement.
Complex edits and image blending: Advanced editing tasks like blending or lighting changes can sometimes produce unnatural artifacts.
Character features: While usually reliable, character consistency across edits may vary.

Quick-Reference Prompt Cheat Sheet

Goal	Prompt Structure
Fashion photo	Subject + pose + backdrop + shot type + film style
Text in image	Scene description + quoted text in “font name” font
Style transfer	Upload image + “Convert to [style], keep layout unchanged”
Multi-character scene	Upload reference images + describe scene with lighting
Infographic	Layout type + color palette + section count + text content
Real-time data viz	“Search for [X], visualize it as [format]”
Product mockup	Product description + surface material + lighting setup
Brand identity	Logo prompt → identity system prompt with mockups

Conclusion

Prompt writing in Nano Banana Pro is part art, part engineering. Keep it structured, logical, and expressive. The more you guide the model like a human designer, the closer your output will resemble professional work. Remember: clarity creates quality.

By following the official frameworks — text-to-image generation, multimodal composition, conversational editing, real-time search, and text rendering — and by applying Creative Director thinking to your lighting, lenses, color grading, and materials, you can unlock the full professional potential of Nano Banana Pro. Start simple, iterate with follow-up prompts, and let the model’s built-in reasoning do the heavy lifting.

Amit Shrivastava