In this lab, you will develop an AI chatbot agent that creates personalized photobooks from a large collection of photos, guided by real-time user feedback.
While Large Language Models (LLMs) are powerful, they struggle with complex, multi-modal tasks like photobook creation. Most LLMs cannot effectively process 100 or more images at once, and even with smaller sets they often fail to capture the context and user intent needed for a meaningful arrangement.
To overcome these limitations, you’ll adopt an agentic approach: breaking the problem into manageable steps and using the LLM’s strengths within a larger system. Here, the chatbot orchestrates tools and reasoning steps rather than working in isolation.
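As a rough illustration of what "orchestrating tools" can mean, the sketch below runs a tiny agent loop in plain Python. Everything in it is hypothetical and not taken from the lab materials: the `Photo` type, the two tools, and `stub_planner`, which stands in for the LLM call that would normally decide which tools to invoke based on user feedback.

```python
from dataclasses import dataclass


@dataclass
class Photo:
    path: str
    caption: str  # assumed to come from an earlier multi-modal captioning step


def filter_photos(photos, keyword):
    """Hypothetical tool: keep photos whose caption mentions the keyword."""
    return [p for p in photos if keyword.lower() in p.caption.lower()]


def chunk_into_pages(photos, per_page=2):
    """Hypothetical tool: group photos into fixed-size pages."""
    return [photos[i:i + per_page] for i in range(0, len(photos), per_page)]


# Registry the agent uses to look up a tool by name.
TOOLS = {"filter_photos": filter_photos, "chunk_into_pages": chunk_into_pages}


def stub_planner(feedback):
    """Stand-in for the LLM: turns user feedback into an ordered list of tool calls."""
    return [
        {"tool": "filter_photos", "args": {"keyword": feedback}},
        {"tool": "chunk_into_pages", "args": {"per_page": 2}},
    ]


def run_agent(photos, feedback):
    """Execute each planned tool call, feeding one step's output into the next."""
    state = photos
    for call in stub_planner(feedback):
        state = TOOLS[call["tool"]](state, **call["args"])
    return state


photos = [
    Photo("a.jpg", "Kids playing on the beach"),
    Photo("b.jpg", "Mountain hike at dawn"),
    Photo("c.jpg", "Beach sunset"),
    Photo("d.jpg", "Sandcastle on the beach"),
]
pages = run_agent(photos, "beach")
```

In a real system the planner would be replaced by a model call, and the loop would likely feed tool results back to the model before it picks the next step.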
By the end of this lab, you will gain hands-on experience in:
- LLMs and Multi-modal LLMs
- Agentic System Design
- Structured Output Generation
- Tool Integration with Gemini Function Calling
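To make the structured-output topic more concrete, here is a minimal sketch of validating a model's JSON reply before the rest of the system uses it. The `PageLayout` schema, the `pages` key, and the `parse_layout` helper are assumptions made for illustration; the lab's actual schema and the Gemini API calls are not shown here.

```python
import json
from dataclasses import dataclass


@dataclass
class PageLayout:
    # Hypothetical schema for one photobook page.
    page_number: int
    photo_ids: list
    caption: str


def parse_layout(raw, known_ids):
    """Parse the model's JSON text and reject references to photos that do not exist."""
    data = json.loads(raw)
    pages = []
    for item in data["pages"]:
        page = PageLayout(
            page_number=int(item["page_number"]),
            photo_ids=list(item["photo_ids"]),
            caption=str(item["caption"]),
        )
        unknown = [pid for pid in page.photo_ids if pid not in known_ids]
        if unknown:
            raise ValueError(f"unknown photo ids: {unknown}")
        pages.append(page)
    return pages


# Example reply text, written by hand to mimic what a model might return.
reply = '{"pages": [{"page_number": 1, "photo_ids": ["p1", "p2"], "caption": "Day one"}]}'
layout = parse_layout(reply, known_ids={"p1", "p2", "p3"})
```

Checking photo IDs against the known collection is one simple guard against a model inventing images that are not in the user's set.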
Please contact haoyun.feng@arcknow.com for access to the lab.