# OpenAI Image Vision - Usage Examples

## Setup

Set up your API credentials using the agent's env_config tool:

```bash
# Set your OpenAI API key
env_config(action="set", key="OPENAI_API_KEY", value="sk-your-api-key-here")

# Optional: Set custom API base URL (for proxy or compatible services)
env_config(action="set", key="OPENAI_API_BASE", value="https://api.openai.com/v1")
```

## Example 1: Analyze a Local Image

```bash
bash scripts/vision.sh "/path/to/photo.jpg" "What's in this image?"
```

**Expected Output:**

```json
{
  "model": "gpt-4.1-mini",
  "content": "The image shows a beautiful landscape with mountains in the background and a lake in the foreground. The sky is clear with some clouds, and there are trees along the shoreline.",
  "usage": {
    "prompt_tokens": 1234,
    "completion_tokens": 45,
    "total_tokens": 1279
  }
}
```

## Example 2: Analyze an Image from URL

```bash
bash scripts/vision.sh "https://example.com/image.jpg" "Describe this image in detail"
```

## Example 3: Extract Text (OCR)

```bash
bash scripts/vision.sh "document.png" "Extract all text from this image"
```

**Use Case:** Extract text from screenshots, scanned documents, or photos of text.

## Example 4: Identify Objects

```bash
bash scripts/vision.sh "scene.jpg" "List all objects you can identify in this image"
```

## Example 5: Analyze Colors and Composition

```bash
bash scripts/vision.sh "artwork.jpg" "Describe the color palette and composition of this image"
```

## Example 6: Count Items

```bash
bash scripts/vision.sh "crowd.jpg" "How many people are in this image?"
```

## Example 7: Use Different Models

```bash
# Use gpt-4.1-mini (default, latest mini model)
bash scripts/vision.sh "image.jpg" "Analyze this" "gpt-4.1-mini"

# Use gpt-4.1 (most capable, best for complex analysis)
bash scripts/vision.sh "image.jpg" "Analyze this" "gpt-4.1"

# Use gpt-4o-mini (previous mini model)
bash scripts/vision.sh "image.jpg" "Analyze this" "gpt-4o-mini"
```

## Example 8: Complex Analysis

```bash
bash scripts/vision.sh "product.jpg" "Analyze this product image. Describe the product, its features, colors, and suggest what kind of marketing copy would work well for it."
```

## Example 9: Safety and Content Moderation

```bash
bash scripts/vision.sh "content.jpg" "Is there any inappropriate or unsafe content in this image?"
```

## Example 10: Technical Analysis

```bash
bash scripts/vision.sh "diagram.png" "Explain what this technical diagram represents and how it works"
```

## Integration with Agent

When the agent loads this skill, it will be available in the `` section. The agent can invoke it like this:

```bash
bash "/scripts/vision.sh" "user_uploaded_image.jpg" "What's in this image?"
```

The `` will be automatically provided by the skill system.

## Error Handling Examples

### Missing API Key

```bash
$ bash scripts/vision.sh "image.jpg" "What is this?"
{"error": "OPENAI_API_KEY environment variable is not set", "help": "Visit https://platform.openai.com/api-keys to get an API key"}
```

### File Not Found

```bash
$ bash scripts/vision.sh "nonexistent.jpg" "What is this?"
{"error": "Image file not found", "path": "nonexistent.jpg"}
```

### Unsupported Format

```bash
$ bash scripts/vision.sh "file.bmp" "What is this?"
{"error": "Unsupported image format", "extension": "bmp", "supported": ["jpg", "jpeg", "png", "gif", "webp"]}
```

### Missing Parameters

```bash
$ bash scripts/vision.sh
{"error": "Image path or URL is required", "usage": "bash vision.sh [model]"}
```

## Tips for Best Results

1. **Be Specific**: Ask clear, specific questions about what you want to know
2. **Image Quality**: Higher quality images generally produce better results
3. **Model Selection**:
   - Use `gpt-4.1` for complex analysis requiring the highest accuracy
   - Use `gpt-4.1-mini` (default) for most tasks; it is the latest mini model and offers a good balance of capability and cost
4. **Text Extraction**: For OCR tasks, ensure text is clearly visible and not too small
5. **Multiple Aspects**: You can ask about multiple things in one question
6. **Context**: Provide context in your question if needed (e.g., "This is a medical scan, what do you see?")

## Performance Notes

- **Local Files**: Automatically base64-encoded, which adds ~33% size overhead
- **URLs**: Passed directly to the API, with no encoding overhead
- **Timeout**: 60 seconds for API calls
- **Max Tokens**: 1000 tokens for responses (configurable in script)
- **Rate Limits**: Subject to your OpenAI API plan

## Supported Image Formats

- ✅ JPEG (`.jpg`, `.jpeg`)
- ✅ PNG (`.png`)
- ✅ GIF (`.gif`)
- ✅ WebP (`.webp`)
- ❌ BMP, TIFF, SVG, and other formats are not supported

## Cost Considerations

Vision API calls cost more than text-only calls because they include image tokens. Costs vary by:

- Model used (gpt-4.1 vs gpt-4.1-mini)
- Image size and resolution
- Length of response

Check OpenAI's pricing page for current rates: https://openai.com/pricing
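
Because image size feeds into cost, and local files grow by roughly a third when base64-encoded, it can help to inspect a file before sending it. The following is a minimal sketch and is not part of `scripts/vision.sh`: the function names `is_supported_image` and `b64_overhead` are hypothetical, and treating extensions case-insensitively is an assumption about how such a check might behave, not a description of what the script does.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers; they are NOT part of scripts/vision.sh.

# Succeeds (exit 0) if the file extension is one of the formats this
# document lists as supported: jpg, jpeg, png, gif, webp.
is_supported_image() {
  local ext="${1##*.}"
  # Assumption: compare case-insensitively, so "PHOTO.JPG" passes.
  ext=$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    jpg|jpeg|png|gif|webp) return 0 ;;
    *) return 1 ;;
  esac
}

# Prints the raw size, the base64-encoded size, and the integer
# percentage overhead for a local file.
b64_overhead() {
  local file="$1"
  local raw encoded
  # tr -d ' ' removes the padding some wc implementations emit.
  raw=$(wc -c < "$file" | tr -d ' ')
  # tr -d '\n' drops line wrapping so only payload characters are counted.
  encoded=$(base64 < "$file" | tr -d '\n' | wc -c | tr -d ' ')
  if [ "$raw" -eq 0 ]; then
    echo "raw=0 encoded=0 overhead=0%"
    return 0
  fi
  echo "raw=${raw} encoded=${encoded} overhead=$(( (encoded - raw) * 100 / raw ))%"
}

# Example usage (paths are placeholders):
#   is_supported_image "photo.jpg" && b64_overhead "photo.jpg"
```

Base64 emits 4 output characters for every 3 input bytes, which is where the ~33% figure comes from: a 300-byte file encodes to 400 characters.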