Image Generation

The AgentDock Open Source Client includes a dedicated image generation feature that demonstrates how to integrate advanced AI capabilities into applications built with AgentDock Core.

Overview

The image generation page provides a full-featured interface for creating and editing images using Gemini's multimodal capabilities. It showcases how the Open Source Client extends beyond basic chat functionality to implement richer AI experiences.

Key Features

Text-to-Image Generation: Create images from text prompts
Image Editing: Upload and modify existing images
Image Gallery: View and manage previously generated images
Responsive Design: Works on mobile and desktop devices
Integration with Chat: Images can be sent from chat for editing

Implementation Details

The image generation functionality is implemented as a standalone page in the Open Source Client, showcasing:

Client-Server Architecture:
- Client-side UI components for image upload, prompt input, and result display
- Server-side actions for image generation using Gemini
Stateful UI:
- Local state management for image data and generation process
- Progress indicators and error handling
API Integration:
- Direct integration with Gemini's multimodal capabilities
- Image persistence API leveraging:
  - Vercel Blob: For storing generated images and returning their URLs when deployed to Vercel.
  - Browser localStorage: For storing image data (e.g., base64 or URLs) when running locally, providing temporary persistence during development.
UI Components:
- ImageUpload: Handles image file selection and preview
- ImagePromptInput: Provides an interface for entering generation prompts
- ImageResultDisplay: Shows generation results with download/share options
- ImageGallerySkeleton: Loading state for the image gallery
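
The local persistence path described above can be sketched roughly as follows. This is a minimal illustration, not the client's actual code: the `StoredImage` shape, the `image-gallery` key, the 20-item limit, and the helper names are all assumptions. The helpers take a small storage interface that `window.localStorage` satisfies in the browser, so the same logic can be exercised outside the browser with an in-memory stand-in.

```typescript
// Hypothetical record shape; not taken from the AgentDock source.
interface StoredImage {
  id: string;
  url: string; // a hosted URL when deployed, or a data: URL when running locally
  prompt: string;
  createdAt: number;
}

// The subset of the Web Storage API this sketch relies on.
// In the browser, window.localStorage satisfies this interface.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const GALLERY_KEY = "image-gallery"; // assumed key name

// Read the gallery, treating a missing or corrupted entry as an empty list.
function loadImages(store: KeyValueStore): StoredImage[] {
  const raw = store.getItem(GALLERY_KEY);
  if (raw === null) return [];
  try {
    const parsed: unknown = JSON.parse(raw);
    return Array.isArray(parsed) ? (parsed as StoredImage[]) : [];
  } catch {
    return [];
  }
}

// Put the newest image first, replace any entry with the same id,
// and cap the list so base64 payloads do not exhaust the storage quota.
function saveImage(
  store: KeyValueStore,
  image: StoredImage,
  limit = 20
): StoredImage[] {
  const others = loadImages(store).filter((item) => item.id !== image.id);
  const next = [image, ...others].slice(0, limit);
  store.setItem(GALLERY_KEY, JSON.stringify(next));
  return next;
}

// In-memory stand-in for environments without localStorage (tests, server code).
class MemoryStore implements KeyValueStore {
  private data = new Map<string, string>();
  getItem(key: string): string | null {
    return this.data.get(key) ?? null;
  }
  setItem(key: string, value: string): void {
    this.data.set(key, value);
  }
}
```

In a browser component, the same helpers would be called with `window.localStorage` in place of `MemoryStore`.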

Technical Architecture

The image generation feature follows this key flow:

UI components trigger server actions for generation.
Server actions call the Gemini API.
Server actions use a persistence API route (/api/images/store/add) to save the resulting image URL (from Vercel Blob when deployed); otherwise they may pass the image data back to the client.
Client-side code receives the image URL/data and stores it in localStorage for local persistence during development, updating the UI state.
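
The flow above can be sketched as a single function with its two external steps injected. Everything here is hypothetical and only illustrates the ordering: `generate` stands in for the server action that calls Gemini, and `persist` stands in for the call to the persistence route, returning a hosted URL when a remote store is available and `null` otherwise. The request and result shapes are assumptions, not the client's real types.

```typescript
// Hypothetical request/result shapes for illustration only.
interface GenerateRequest {
  prompt: string;
  sourceImage?: string; // optional image to edit, e.g. a data: URL
}

interface GenerateResult {
  imageData: string; // base64 data or a data: URL returned by the model call
}

interface FlowDeps {
  // Stand-in for the server action that calls the Gemini API.
  generate: (req: GenerateRequest) => Promise<GenerateResult>;
  // Stand-in for the persistence route; resolves to a hosted URL,
  // or null when no remote store is available (local development).
  persist: (imageData: string) => Promise<string | null>;
}

interface FlowOutcome {
  src: string; // what the UI should display
  persisted: boolean; // true when src is a hosted URL
}

async function generateAndStore(
  req: GenerateRequest,
  deps: FlowDeps
): Promise<FlowOutcome> {
  if (req.prompt.trim() === "") {
    throw new Error("Prompt must not be empty");
  }
  // Step 1: generate the image.
  const { imageData } = await deps.generate(req);
  // Step 2: try to persist it remotely.
  const url = await deps.persist(imageData);
  // Step 3: hand back either the hosted URL or the raw data,
  // which the client can then keep in localStorage and render.
  return url !== null
    ? { src: url, persisted: true }
    : { src: imageData, persisted: false };
}
```

Injecting both steps keeps the ordering logic independent of the deployment target, so the local and deployed paths differ only in what `persist` returns.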

Usage Example

Navigate to the Image Generation page
Enter a text prompt describing the desired image
Optionally upload an existing image to modify
Click "Generate" to create the image
View, download, or continue editing the generated image
Access previously generated images from the gallery
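
The steps above map naturally onto a small state machine. The reducer below is a sketch of how a page like this might track them, assuming hypothetical state and action names; it is not the client's actual state code. It is written as a pure function so it could be passed to React's `useReducer` or tested on its own.

```typescript
// Hypothetical UI state for the generation page.
interface GeneratorState {
  prompt: string;
  sourceImage: string | null; // uploaded image to modify, if any
  status: "idle" | "generating" | "done" | "error";
  resultUrl: string | null;
  error: string | null;
}

type GeneratorAction =
  | { type: "setPrompt"; prompt: string }
  | { type: "setSourceImage"; image: string | null }
  | { type: "generateStarted" }
  | { type: "generateSucceeded"; url: string }
  | { type: "generateFailed"; message: string }
  | { type: "editResult" };

const initialState: GeneratorState = {
  prompt: "",
  sourceImage: null,
  status: "idle",
  resultUrl: null,
  error: null,
};

function reducer(state: GeneratorState, action: GeneratorAction): GeneratorState {
  switch (action.type) {
    case "setPrompt":
      return { ...state, prompt: action.prompt };
    case "setSourceImage":
      return { ...state, sourceImage: action.image };
    case "generateStarted":
      // Ignore clicks with an empty prompt or while a request is in flight.
      if (state.prompt.trim() === "" || state.status === "generating") {
        return state;
      }
      return { ...state, status: "generating", error: null };
    case "generateSucceeded":
      return { ...state, status: "done", resultUrl: action.url };
    case "generateFailed":
      return { ...state, status: "error", error: action.message };
    case "editResult":
      // "Continue editing": the result becomes the next source image.
      if (state.resultUrl === null) return state;
      return {
        ...state,
        sourceImage: state.resultUrl,
        resultUrl: null,
        status: "idle",
      };
  }
}
```

Keeping `status` and `error` in one object makes it straightforward to drive a progress indicator and an error message from the same state.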

Integration with AgentDock Core

This feature demonstrates how the Open Source Client extends the capabilities of AgentDock Core by:

Leveraging the provider-agnostic API design to integrate with Gemini
Implementing specialized UI components for multimodal interactions
Managing state and persistence for complex AI workflows
Providing a complete reference implementation of an advanced AI feature

Future Enhancements

Potential future enhancements for the image generation feature include:

Support for additional image generation models
Enhanced image editing capabilities
Integration with other parts of the application
Advanced prompting techniques such as negative prompting
Collections and organization for generated images