Lesson 4: Media Messages (Multimodality)

This lesson demonstrates the correct way to send images to a multimodal (vision-capable) LLM using the SDK's built-in message helpers.

Code: lesson_4_media_messages.mjs

The most important part of this lesson is the message payload. To link an image with a question, you send the media message first, followed immediately by the user message in the same request array.

// lesson_4_media_messages.mjs
// Merci SDK Tutorial: Lesson 4 - Using Media Messages (Multimodality)

// --- IMPORTS ---
// We import helpers for each distinct message type we will send.
import { MerciClient, createMediaMessage, createUserMessage } from '../lib/merci.2.14.0.mjs';
import { token } from '../secret/token.mjs';

const MODEL = 'openai-gpt-5-mini';

async function main() {
    console.log(`--- Merci SDK Tutorial: Lesson 4 - Media Messages (Model: ${MODEL}) ---`);

    try {
        // --- STEP 1: INITIALIZE THE CLIENT ---
        console.log('[STEP 1] Initializing MerciClient...');
        const client = new MerciClient({ token });

        // --- STEP 2: DEFINE PROMPT AND INPUT DATA ---
        // We define the path to our local image and the text prompt that refers to it.
        console.log('[STEP 2] Preparing prompt and input data...');
        const imagePath = './image.png';
        const userPrompt = "What is in this image? Describe it in a single, detailed sentence.";

        // --- STEP 3: CONFIGURE THE CHAT SESSION ---
        // No special configuration is needed on the session itself, just the right model.
        console.log('[STEP 3] Configuring the chat session...');
        const chatSession = client.chat.session(MODEL);

        // --- STEP 4: PREPARE THE MESSAGE PAYLOAD ---
        // THIS IS THE MOST IMPORTANT PART OF THE LESSON.
        // To link an image with a question, you send the media message first,
        // followed immediately by the user message in the same request.
        console.log('[STEP 4] Building the message array with separate media and user messages...');
        const messages = [
            await createMediaMessage(imagePath),
            createUserMessage(userPrompt)
        ];

        // --- STEP 5: EXECUTE THE REQUEST & PROCESS THE RESPONSE ---
        console.log('[STEP 5] Sending multimodal request and processing stream...');
        let finalResponse = '';
        process.stdout.write('🤖 Vision Assistant > ');

        for await (const event of chatSession.stream(messages)) {
            if (event.type === 'text') {
                process.stdout.write(event.content);
                finalResponse += event.content;
            }
        }
        process.stdout.write('\n');
        console.log('\n[INFO] Stream finished. Response fully received.');


        // --- FINAL RESULT ---
        console.log('\n\n--- FINAL RESULT ---');
        console.log(`🖼️ Media > ${imagePath}`);
        console.log(`👤 User > ${userPrompt}`);
        console.log(`🤖 Vision Assistant > ${finalResponse}`);
        console.log('--------------------');

    } catch (error) {
        // --- ROBUST ERROR HANDLING ---
        if (error.code === 'ENOENT') {
            console.error(`\n[FATAL ERROR] Image file not found at "${error.path}"`);
            console.error('  Please make sure the image file exists before running the script.');
            process.exit(1);
        }
        console.error('\n\n[FATAL ERROR] An error occurred during the operation.');
        console.error('  Message:', error.message);
        if (error.status) {
            console.error('  API Status:', error.status);
        }
        if (error.details) {
            console.error('  Details:', JSON.stringify(error.details, null, 2));
        }
        if (error.stack) {
            console.error('  Stack:', error.stack);
        }
        console.error('\n  Possible causes: Invalid token, network issues, or an API service problem.');
        process.exit(1); // Exit with a non-zero code to indicate failure.
    }
}

main().catch(console.error);
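
The ordering rule from Step 4 (media message first, user message immediately after) can be wrapped in a small function so it is hard to get wrong. The following is only a sketch: `buildImageQuestion` and `fakeHelpers` are hypothetical names, and the objects returned by the stand-in helpers are invented for the demo, since this lesson does not show what the SDK's helpers actually return. In real use you would pass the SDK's own `createMediaMessage` and `createUserMessage` instead.

```javascript
// Builds the two-message payload in the order the lesson requires:
// media message first, user message immediately after.
// `helpers` is injected so the sketch can run with stand-ins; in the lesson
// these would be the SDK's createMediaMessage and createUserMessage.
async function buildImageQuestion(imagePath, userPrompt, helpers) {
    const media = await helpers.createMediaMessage(imagePath); // awaited, as in the lesson
    const question = helpers.createUserMessage(userPrompt);
    return [media, question];
}

// Hypothetical stand-ins: the real helpers' return shapes are not shown in
// this lesson, so these objects are invented purely for the demo.
const fakeHelpers = {
    createMediaMessage: async (path) => ({ kind: 'media', path }),
    createUserMessage: (text) => ({ kind: 'user', text }),
};

buildImageQuestion('./image.png', 'What is in this image?', fakeHelpers)
    .then((messages) => console.log(messages.map((m) => m.kind).join(' -> ')));
```

Injecting the helpers keeps the ordering logic testable without a token or an image file.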

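The stream-handling loop from Step 5 can also be tried without a token or network access. The sketch below assumes only the event shape the listing reads (`event.type` and `event.content`). `mockStream` is a hypothetical stand-in for `chatSession.stream(messages)`, and the `'other'` event type is invented to show that non-text events are skipped.

```javascript
// Hypothetical stand-in for chatSession.stream(messages): NOT part of the SDK.
// It yields events shaped like the ones the lesson's loop reads (type + content).
async function* mockStream() {
    yield { type: 'text', content: 'Hello, ' };
    yield { type: 'other', content: 'ignored' }; // invented non-text event type
    yield { type: 'text', content: 'world.' };
}

// Accumulates only 'text' events into one string, mirroring Step 5.
async function collectText(stream) {
    let finalResponse = '';
    for await (const event of stream) {
        if (event.type === 'text') {
            finalResponse += event.content;
        }
    }
    return finalResponse;
}

collectText(mockStream()).then((text) => console.log(text));
```

Because `collectText` accepts any async iterable, the same function should work with the real stream, provided its events have the shape used in the listing.
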
Expected Output

Assuming you have an `image.png` file, the model will analyze the image and return a textual description. The sample run below used an image of the merci-sdk logo, so your output will differ for other images.

--- FINAL RESULT ---
🖼️ Media > ./image.png
👤 User > What is in this image? Describe it in a single, detailed sentence.
🤖 Vision Assistant > The image displays a modern, glossy logo featuring a large, three-dimensional 'M' symbol stylized as a flowing ribbon, transitioning from vibrant blue on the left to warm orange on the right, casting subtle shadows to enhance its depth, all positioned above the white, lowercase, sans-serif text "merci-sdk" against a solid dark navy blue background.
--------------------

Prerequisite

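Before running, create an image file named `image.png` in the same directory as the script, or change the `imagePath` variable to point to an existing image. A relative path such as `./image.png` is normally resolved against the current working directory, so run the script from that directory.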
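
This prerequisite can be checked up front instead of waiting for the `ENOENT` error. A minimal sketch, assuming Node.js: `imageExists` is a hypothetical helper name, not part of the SDK, and it relies only on the `access` function from Node's built-in `fs/promises` module.

```javascript
// Resolves to true if the file at imagePath can be accessed, false otherwise.
async function imageExists(imagePath) {
    // Loaded lazily so the snippet runs in both ES modules and CommonJS.
    const { access } = await import('node:fs/promises');
    try {
        await access(imagePath); // rejects (e.g. with ENOENT) when the path cannot be accessed
        return true;
    } catch {
        return false;
    }
}

imageExists('./image.png').then((ok) => {
    console.log(ok ? 'image.png found' : 'image.png is missing; create it or update imagePath');
});
```

A check like this could run before Step 4 so that a missing file is reported before any request is built.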