Communication & Messaging (intermediate)
November 11, 2025
6 min read
45 minutes
Build a Voice-Powered Email Assistant That Works Through WhatsApp
Build an AI-powered voice email assistant with n8n and WhatsApp. Send, draft, and manage emails hands-free using voice commands.
By Mahedi Hasan Nadvee

Picture this: you're driving, walking the dog, or making coffee, and you suddenly remember that important email you need to send. Instead of fumbling with your phone's tiny keyboard, you simply open WhatsApp and say "Send an email to Sarah about the Q4 report." Done. Your AI assistant handles the rest.
That's not science fiction. It's what this n8n workflow makes possible today.
The problem is simple but universal: email management eats up hours of our day, and typing on mobile devices is painfully slow. Whether you're a busy founder, a remote team manager, or someone who just values their time, the friction between thought and action is real.
This workflow solves that by turning WhatsApp into your personal email command center. Send voice messages to draft emails, retrieve inbox content, or fire off quick replies. No typing required. The AI understands your intent, connects to your Gmail, looks up contacts from your database, and executes the action. Then it confirms everything back to you, also via voice if you prefer.
What You'll Need to Get Started
Before diving into the build, make sure you have these accounts and credentials ready:
- WhatsApp Business Cloud API access with a registered phone number and configured webhook
- Google Gemini API key for AI chat and voice transcription
- Gmail or Google Workspace account with OAuth2 credentials configured
- Airtable account for contact lookups and data storage
- OpenAI API key for text-to-speech responses
- n8n instance running with HTTPS enabled
The setup takes about 30 to 60 minutes if you have all credentials handy. The payoff? A hands-free email assistant that actually understands context.
Key Components in This Workflow
This automation is built with several powerful n8n nodes working in harmony:
- WhatsApp Trigger and Business Cloud nodes handle message reception and media retrieval
- Google Gemini transcribes voice messages and powers the AI agent's decision-making
- Switch and If nodes route messages based on type and determine response format
- Gmail Tool nodes send emails, create drafts, and retrieve inbox content
- Airtable Tool looks up contact information and email addresses
- OpenAI TTS node generates voice confirmations
- HTTP Request and Code nodes handle file downloads and MIME type conversions
Each piece serves a specific purpose, and together they create a seamless experience that feels almost magical.
How to Build This Voice Email Assistant
Step 1: Set Up WhatsApp Message Reception
Start by configuring the WhatsApp Trigger node to listen for incoming messages. This webhook connects to Meta's WhatsApp Business Cloud API and captures every message sent to your business number.
Add a Split Out node immediately after to handle edge cases where multiple messages arrive in a single payload. Then connect a Switch node that examines the message type. Text messages flow one way, audio messages another. This branching logic is crucial because voice and text need completely different processing paths.
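To make the branching concrete, here is a minimal Python sketch of what the Split Out and Switch nodes do together. It is an illustration, not the template's own code: the payload shape (a `messages` list whose items carry a `type` field) is an assumption based on the WhatsApp webhook format, and `split_messages` and `route` are hypothetical helper names.

```python
from typing import Any


def split_messages(payload: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one item per message, mirroring the Split Out node."""
    return list(payload.get("messages", []))


def route(message: dict[str, Any]) -> str:
    """Pick a branch name from the message type, mirroring the Switch node."""
    msg_type = message.get("type")
    if msg_type == "text":
        return "text"
    if msg_type == "audio":
        return "audio"
    # Images, stickers, locations, and so on are outside this workflow's scope.
    return "unsupported"


# Assumed sample payload containing two messages in a single delivery.
sample_payload = {
    "messages": [
        {"from": "15550001111", "type": "text",
         "text": {"body": "Send an email to Sarah"}},
        {"from": "15550001111", "type": "audio",
         "audio": {"id": "MEDIA_ID", "mime_type": "audio/ogg"}},
    ]
}

branches = [route(m) for m in split_messages(sample_payload)]
```

An explicit fallback branch is worth keeping so that unexpected message types do not silently reach the transcription path.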
For audio messages, chain together the WhatsApp Business Cloud node, which resolves the media ID from the incoming message into a download URL, and an HTTP Request node that downloads the audio file from that URL. This two-step dance is required by WhatsApp's API architecture.
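The two requests behind this step can be sketched in Python as follows. This only builds the requests and sends nothing. The Graph API version in `GRAPH_BASE`, the token and media ID placeholders, the example download URL, and both helper names are assumptions for illustration; in the workflow itself, the n8n nodes and their credentials handle these calls.

```python
import urllib.request

# Assumed Graph API version; use whichever version your Meta app targets.
GRAPH_BASE = "https://graph.facebook.com/v20.0"


def media_lookup_request(media_id: str, token: str) -> urllib.request.Request:
    """Step 1: ask the Cloud API where the media file for this ID lives."""
    return urllib.request.Request(
        f"{GRAPH_BASE}/{media_id}",
        headers={"Authorization": f"Bearer {token}"},
    )


def media_download_request(media_url: str, token: str) -> urllib.request.Request:
    """Step 2: fetch the file itself; the download URL also needs the token."""
    return urllib.request.Request(
        media_url,
        headers={"Authorization": f"Bearer {token}"},
    )


lookup = media_lookup_request("MEDIA_ID", "ACCESS_TOKEN")
# Placeholder URL; the real one comes from the response to the lookup request.
download = media_download_request("https://example.com/media-file", "ACCESS_TOKEN")
```

A frequent cause of a failed download is a missing Authorization header on the second request, so it helps to confirm the HTTP Request node uses the same WhatsApp credential as the lookup.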
Step 2: Transcribe Voice Messages with AI
Connect the downloaded audio to a Google Gemini node configured for audio transcription. Use the gemini-2.5-pro model, which handles voice transcription with impressive accuracy across different accents and audio quality levels.
The output arrives as raw text. Add an Edit Fields node to clean and structure this transcription into a consistent format. This standardization ensures that whether the user sent text or voice, your AI agent receives the same clean input structure.
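As a rough illustration of that standardization, the sketch below maps both paths onto one structure. The field names (`from`, `text`, `source`) and the `normalize_input` helper are hypothetical; the template's Edit Fields node may use different names.

```python
def normalize_input(sender: str, text: str, source: str) -> dict[str, str]:
    """Build the single input shape the agent sees for text and voice alike."""
    if source not in ("text", "voice"):
        raise ValueError(f"unexpected source: {source!r}")
    return {
        "from": sender,
        # Collapse line breaks and repeated spaces left over from transcription.
        "text": " ".join(text.split()),
        # Remember the original channel so the reply can match it later.
        "source": source,
    }


typed = normalize_input("15550001111", "Send an email to Sarah", "text")
spoken = normalize_input("15550001111", "  Send an email\n to Sarah  ", "voice")
```

Keeping the `source` field is what later allows the workflow to decide between a voice reply and a text reply.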
Step 3: Build the Email Intelligence Layer
This is where the magic happens. Create an Email Agent node and connect it to a Google Gemini Chat Model. In the system prompt, define the agent's personality and capabilities. Keep it strategic and action-oriented.
The prompt should instruct the AI to validate message content, format emails in HTML, ask clarifying questions when needed, and respond directly without unnecessary explanations. Think of this as programming the agent's personality and work ethic.
Connect three tools to this agent: Send Email for immediate sends, Create Draft for emails that need review, and Get many messages in Gmail for retrieving inbox content. Add the Airtable Tool node to enable contact lookups. When someone says "email John," the agent searches your Airtable contact database and retrieves the actual email address automatically.
The agent's tools use special expressions like $fromAI() so the model can fill in structured parameters itself: email addresses, subject lines, and message bodies. This bridges the gap between natural language and executable actions.
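The contact lookup described in this step can be pictured with a short Python sketch. It assumes Airtable-style records with `Name` and `Email` fields (the field names are assumptions, as is the `find_email` helper) and returns nothing when a name is missing or ambiguous, which is the point where the agent would ask a clarifying question.

```python
from typing import Any, Optional


def find_email(records: list[dict[str, Any]], name: str) -> Optional[str]:
    """Return the email for a spoken name, or None if absent or ambiguous."""
    query = name.strip().lower()
    if not query:
        return None
    matches = [
        record["fields"]["Email"]
        for record in records
        if query in record["fields"].get("Name", "").lower()
        and "Email" in record["fields"]
    ]
    # Exactly one match is safe to act on; anything else needs a follow-up.
    return matches[0] if len(matches) == 1 else None


# Assumed sample records in the shape Airtable returns them.
contacts = [
    {"id": "rec1", "fields": {"Name": "John Carter", "Email": "john@example.com"}},
    {"id": "rec2", "fields": {"Name": "Sarah Lee", "Email": "sarah.lee@example.com"}},
    {"id": "rec3", "fields": {"Name": "Sarah Khan", "Email": "sarah.khan@example.com"}},
]

john = find_email(contacts, "john")
sarah = find_email(contacts, "Sarah")
```

Refusing to guess between two contacts named Sarah is a deliberate choice here: a misdirected email costs more than one extra question.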
Step 4: Generate Intelligent Responses
After the Email Agent completes its task, add an If node to determine response format. This checks whether the original message was voice or text.
For voice responses, connect an OpenAI TTS node set to the nova voice. It generates natural-sounding speech from the confirmation message. Insert a Code node next that fixes a common MIME type issue, converting audio/mp3 to audio/mpeg for WhatsApp compatibility.
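The MIME type fix is small enough to show in full. The sketch below is written in Python for illustration, and the `mimeType` and `fileName` keys are assumed to match the metadata on the binary item; `fix_mime_type` is a hypothetical helper, and the template's own Code node may be written differently.

```python
# Labels that need rewriting before upload; extend if other cases turn up.
MIME_FIXES = {"audio/mp3": "audio/mpeg"}


def fix_mime_type(binary_meta: dict[str, str]) -> dict[str, str]:
    """Return a copy of the file metadata with a corrected MIME type."""
    fixed = dict(binary_meta)
    mime = fixed.get("mimeType", "")
    # Anything not listed in MIME_FIXES passes through unchanged.
    fixed["mimeType"] = MIME_FIXES.get(mime, mime)
    return fixed


tts_output = {"fileName": "reply.mp3", "mimeType": "audio/mp3"}
ready_for_upload = fix_mime_type(tts_output)
```

Only the label changes; the audio bytes are left untouched.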
Chain two WhatsApp Business Cloud nodes: one to upload the audio file and get a media ID, another to actually send the message with that ID. For text responses, route directly to a WhatsApp Business Cloud node that sends the confirmation as plain text.
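For reference, the two reply shapes can be sketched as message bodies in Python. The `build_reply` helper and its parameters are hypothetical, and the JSON layout follows the Cloud API's message format as an assumption; in the workflow, the WhatsApp Business Cloud nodes build these bodies for you.

```python
from typing import Any, Optional


def build_reply(to: str, confirmation: str, was_voice: bool,
                media_id: Optional[str] = None) -> dict[str, Any]:
    """Build the message body for a voice reply or a plain-text reply."""
    base = {"messaging_product": "whatsapp", "to": to}
    if was_voice and media_id:
        # Voice in, voice out: reference the audio uploaded in the previous step.
        return {**base, "type": "audio", "audio": {"id": media_id}}
    # Text in, or no uploaded audio available: fall back to plain text.
    return {**base, "type": "text", "text": {"body": confirmation}}


voice_reply = build_reply("15550001111", "Email sent to Sarah.", True, "UPLOADED_ID")
text_reply = build_reply("15550001111", "Email sent to Sarah.", False)
```

Falling back to text when no media ID is available means a failed upload still produces a confirmation.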
Step 5: Test and Refine the Flow
Activate the workflow and send a text message like "Send an email to Sarah about the meeting." Watch each node execute. The agent should generate a professional email and send it to Sarah.
If something breaks, check your API credentials first, then verify webhook configurations. The most common issues involve OAuth scopes for Gmail or incorrect WhatsApp phone number IDs.
Why This Workflow Changes Everything
The real power here isn't just automation. It's the removal of friction from a task everyone does daily. Drafting emails by voice while commuting can save minutes per email compared to typing on mobile. Multiply that across a week, and the reclaimed time adds up.
For remote teams, this becomes a universal interface. Team members in different time zones can fire instructions to the email agent without switching contexts. Sales professionals can respond to leads immediately after client calls. Executives can clear their inbox during morning walks.
The Airtable integration adds memory and context. Your agent can look up who people are, what company they work for, and any other details you store. It's not just processing commands; it's working with your relationships.
Beyond individual productivity, consider the accessibility implications. Voice-first interfaces open email management to people who struggle with traditional keyboards or screens. The workflow naturally accommodates different working styles and physical abilities.
Taking It Further
This workflow is a foundation, not a ceiling. You could extend it to schedule meetings by integrating calendar APIs, add sentiment analysis to prioritize urgent messages, or connect to CRM systems for automatic contact logging.
The Simple Memory node is included but disabled in this template. Activate it to give your agent conversation history, allowing multi-turn interactions like "Send that email" after discussing it over several messages.
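The idea behind that kind of memory can be sketched as a sliding window of recent turns per conversation. This is a conceptual Python illustration only: `WindowMemory`, its methods, and the choice of the sender's phone number as the session key are assumptions, not the node's actual implementation.

```python
from collections import defaultdict, deque


class WindowMemory:
    """Keep only the most recent turns for each conversation."""

    def __init__(self, window: int = 5) -> None:
        # One bounded queue per session; old turns fall off automatically.
        self._turns: dict[str, deque] = defaultdict(lambda: deque(maxlen=window))

    def add(self, session_id: str, role: str, text: str) -> None:
        self._turns[session_id].append((role, text))

    def history(self, session_id: str) -> list[tuple[str, str]]:
        return list(self._turns[session_id])


memory = WindowMemory(window=2)
memory.add("15550001111", "user", "Draft an email to Sarah about Q4")
memory.add("15550001111", "assistant", "Draft ready. Send it?")
memory.add("15550001111", "user", "Send that email")
recent = memory.history("15550001111")
```

With a window of two, the agent still sees its own draft confirmation when "Send that email" arrives, which is what makes the follow-up resolvable.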
Some users might add language detection to automatically respond in the same language as the incoming message. Others might integrate document storage to attach files from Google Drive or Dropbox based on voice commands.
The architecture handles all of this because it's built on flexible, composable nodes. Each addition is just another tool in the agent's toolkit.
Start Building Your Assistant Today
Email shouldn't chain you to a desk or force you to type on tiny screens. With n8n, WhatsApp, and a handful of AI services, you can build an assistant that works the way you actually work: by talking.
The workflow is ready to deploy. The tools are accessible. The only question is what you'll do with the time you get back.