How To Classify Gmail Messages With LLMs (v2)

2025-04-27

Back in February 2024 I posted a quick hack for filtering Gmail with GPT-3.5. It worked, but over the last year I kept pushing the envelope: fewer false positives, lower token costs, and zero inbox clutter.

Today's post walks through v2. You'll get:

- Structured‑output JSON instead of fragile Yes/No parsing.
- Four laser‑focused labels (AI/Newsletter, AI/Marketing, AI/Transactional, AI/Spam).
- Automatic archiving of low‑priority mail while true action‑items stay front‑and‑center (starred, not hidden).
- Domain‑level caching so repeat senders don't cost additional OpenAI calls.
- A few‑shot prompt that bumps accuracy ~7 pp on edge‑cases but only adds pennies per month.

If you already set up the original script you'll be done in <10 minutes. New readers can follow along from scratch.

What changed since v1?

Area	v1 (Feb 2024)	v2 (Apr 2025)
Model prompt	System + user turns → Yes/No	JSON‑schema structured output (+ few‑shot examples)
Categories	`Likely Spam`, `Reviewed`	`Newsletter`, `Marketing`, `Transactional`, `Spam`
Inbox behaviour	Anything not spam stayed	Four categories auto‑archived; everything else stays (starred if action‑required)
Caching	None	Domain cache cuts API calls by ~80%
Cost	≈ $1-2/mo	Similar cost despite better accuracy (fewer calls, efficient prompt)
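
To make the caching row concrete, here is a minimal sketch of a domain-level cache in plain JavaScript. A `Map` stands in for the script's `PropertiesService` store, and `fakeClassify` is a hypothetical stand-in for the OpenAI call, so the helper names and sample senders are illustrative assumptions rather than part of the script.

```javascript
// Minimal sketch of a domain-level cache (illustrative; not the script itself).
// A Map stands in for PropertiesService.getUserProperties().
const CACHEABLE = ['newsletter', 'marketing', 'transactional', 'spam'];

const makeCachedClassifier = (classify, store = new Map()) => {
  let apiCalls = 0;
  const classifyCached = (domain, body) => {
    if (store.has(domain)) return { category: store.get(domain), cached: true };
    apiCalls += 1;
    const result = classify(body);
    // Only bulk-mail categories are remembered; personal/other mail is re-checked each time.
    if (CACHEABLE.includes(result.category)) store.set(domain, result.category);
    return { ...result, cached: false };
  };
  return { classifyCached, getApiCalls: () => apiCalls };
};

// Hypothetical stand-in for the OpenAI call.
const fakeClassify = (body) =>
  ({ category: body.includes('unsubscribe') ? 'newsletter' : 'other' });

const { classifyCached, getApiCalls } = makeCachedClassifier(fakeClassify);
classifyCached('news.example.com', 'weekly tips, unsubscribe here');
classifyCached('news.example.com', 'another issue, unsubscribe here');
classifyCached('friend.example.org', 'lunch on friday?');
console.log(getApiCalls()); // 2: the second newsletter was served from the cache
```

The saving depends entirely on how often the same domains write to you, so the ~80% figure above should be read as one inbox's experience rather than a guarantee.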

Step 1 - Clean up & create the new labels

In Gmail's sidebar click “+ Create new label.”
Add the parent label AI if it doesn't exist.
Under that, create:
- AI/Newsletter
- AI/Marketing
- AI/Transactional
- AI/Spam
(Optional) Delete the old AI: Likely Spam / AI: Reviewed labels.

Step 2 - Drop‑in replacement Apps Script

Heads‑up: Put your own key and org ID in Project Settings ▸ Script Properties (as OPEN_AI_KEY and OPEN_AI_ORG, the names the script reads) so you're not hard‑coding secrets.
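
A missing property only shows up later as a failed API call, so a small guard that fails fast can save some head-scratching. The sketch below assumes the property names used in the script (`OPEN_AI_KEY`, `OPEN_AI_ORG`, `OPEN_AI_MODEL`); the `checkConfig` helper itself is hypothetical and takes a plain object so it can be tried outside Apps Script.

```javascript
// Hypothetical guard: verifies the script properties before any API call is made.
const checkConfig = (props) => {
  const missing = ['OPEN_AI_KEY', 'OPEN_AI_ORG'].filter((k) => !props[k]);
  if (missing.length) {
    throw new Error(`Missing script properties: ${missing.join(', ')}`);
  }
  // The model is optional; fall back to the default used in the script.
  return { OPEN_AI_MODEL: 'gpt-4.1', ...props };
};

// In Apps Script you would pass PropertiesService.getScriptProperties().getProperties().
const config = checkConfig({ OPEN_AI_KEY: 'sk-placeholder', OPEN_AI_ORG: 'org-placeholder' });
console.log(config.OPEN_AI_MODEL); // gpt-4.1
```
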

/* global GmailApp, UrlFetchApp, PropertiesService */
//--------------------------------------------------
// CONFIG
//--------------------------------------------------
const {
  OPEN_AI_KEY,
  OPEN_AI_ORG,

  // any model that supports structured outputs will work
  OPEN_AI_MODEL = 'gpt-4.1',
} = PropertiesService.getScriptProperties().getProperties();

//--------------------------------------------------
// HELPERS
//--------------------------------------------------
const labelPath   = (name) => `AI/${name}`;
const ensureLabel = (name) =>
  GmailApp.getUserLabelByName(labelPath(name)) ||
  GmailApp.createLabel(labelPath(name));

const getCachedCategory = (d) => PropertiesService.getUserProperties().getProperty(d);
const cacheCategory     = (d, c) => PropertiesService.getUserProperties().setProperty(d, c);

const sanitizeBody = (html) => html
  .replace(/<script[^]*?<\/script>/gi, '')
  .replace(/<style[^]*?<\/style>/gi, '')
  .replace(/<[^>]+>/g, ' ')
  .replace(/\s{2,}/g, ' ')
  .slice(0, 2000);

//--------------------------------------------------
// OPENAI  -  JSON‑schema with few‑shot examples
//--------------------------------------------------
const emailSchema = {
  name: 'email_classification',
  schema: {
    type: 'object',
    required: ['category','action_required','confidence'],
    additionalProperties: false,
    properties: {
      category: {
        type: 'string',
        enum: ['newsletter','marketing','transactional','spam','personal','other'],
      },
      action_required: { type: 'boolean' },
      confidence:      { type: 'number'  },
    },
  },
  strict: true,
};

const classifyWithOpenAI = (body) => {
  const messages = [
    { role: 'developer', content: [{type:'text', text:
`You are an AI email‑triage bot. Return ONLY JSON that matches the schema.

# Schema
{"category":"newsletter|marketing|transactional|spam|personal|other","action_required":<bool>,"confidence":<0‑1>}

# Examples
User: "📰 The Data Dive - Your weekly roundup of analytics tips."
Assistant: {"category":"newsletter","action_required":false,"confidence":0.96}

User: "Limited‑time offer - 50 % off our Pro plan. Click to upgrade!"
Assistant: {"category":"marketing","action_required":false,"confidence":0.94}

User: "Your Amazon order #112-3594663-483 has shipped."
Assistant: {"category":"transactional","action_required":false,"confidence":0.97}

User: "Congrats! You won $5 000 in crypto—claim now."
Assistant: {"category":"spam","action_required":false,"confidence":0.99}

# Task
Classify the next email and reply with JSON only.`}]},
    { role: 'user', content: [{ type:'text', text: body }] },
  ];

  const res = UrlFetchApp.fetch('https://api.openai.com/v1/chat/completions', {
    method: 'post',
    contentType: 'application/json',
    headers: { Authorization: `Bearer ${OPEN_AI_KEY}`, 'OpenAI-Organization': OPEN_AI_ORG },
    payload: JSON.stringify({
      model: OPEN_AI_MODEL,
      store: false,
      temperature: 0.3,
      top_p: 0.4,
      max_tokens: 64,
      response_format: { type: 'json_schema', json_schema: emailSchema },
      messages,
    }),
  });
  const { choices } = JSON.parse(res.getContentText());
  return JSON.parse(choices[0].message.content);
};

//--------------------------------------------------
// CLASSIFY + ROUTE
//--------------------------------------------------
const classifyThread = (thread) => {
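  const msg    = thread.getMessages().pop();
  // getFrom() may return "Name <user@example.com>"; keep only the bare address
  const from   = msg.getFrom().replace(/^.*<([^>]+)>.*$/, '$1').trim();
  const domain = from.replace(/^.*@/, '').toLowerCase();

  // Anyone you have already written to is treated as a personal contact
  if (GmailApp.search(`to:${from} in:sent`, 0, 1).length) return { category: 'personal' };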

  const cached = getCachedCategory(domain);
  if (cached) return { category: cached };

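  // getLabels() only returns user-created labels, so detect Gmail's Promotions tab via search
  if (GmailApp.search('in:inbox category:promotions', 0, 50).some(t => t.getId() === thread.getId()))
    return { category: 'marketing' };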

  const result = classifyWithOpenAI(sanitizeBody(msg.getBody()));
  if (['newsletter','marketing','transactional','spam'].includes(result.category))
    cacheCategory(domain, result.category);
  return result;
};

const routeThread = (thread, { category, action_required }) => {
  switch (category) {
    case 'newsletter':
    case 'marketing':
    case 'transactional':
    case 'spam': {
      thread.addLabel(ensureLabel(category.charAt(0).toUpperCase()+category.slice(1)));
      thread.moveToArchive();
      break;
    }
  }
  // Stars live on messages, not threads, so star the latest message
  if (action_required) thread.getMessages().pop().star();
};

function run() {
  GmailApp.getInboxThreads(0, 30).filter(t => t.isUnread()).forEach(t => {
    if (t.getLabels().some(l => l.getName().startsWith('AI/'))) return;
    routeThread(t, classifyThread(t));
  });
}

// One‑time helper
function createLabels() {
  ['Newsletter','Marketing','Transactional','Spam'].forEach(ensureLabel);
}

/* ===== Testing ===== */
const tests = () => {
  run()
}
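
`classifyWithOpenAI` assumes every response is well-formed. A more defensive variant might validate the reply before routing on it. The sketch below is one possible approach, not part of the original script: `parseClassification` is a hypothetical helper that takes an HTTP status and the response text (in Apps Script these would come from `getResponseCode()` and `getContentText()` on a `UrlFetchApp.fetch` call made with `muteHttpExceptions: true`) and falls back to `other`, so a failed call simply leaves the message in the inbox.

```javascript
const CATEGORIES = ['newsletter', 'marketing', 'transactional', 'spam', 'personal', 'other'];
const FALLBACK = { category: 'other', action_required: false, confidence: 0 };

// Hypothetical helper: returns a usable classification or a safe fallback.
const parseClassification = (status, text) => {
  if (status !== 200) return FALLBACK; // rate limits, auth errors, outages
  try {
    const { choices } = JSON.parse(text);
    const result = JSON.parse(choices[0].message.content);
    return CATEGORIES.includes(result.category) ? result : FALLBACK;
  } catch (err) {
    return FALLBACK; // malformed JSON, empty content, or missing choices
  }
};

const ok = JSON.stringify({
  choices: [{ message: { content: '{"category":"spam","action_required":false,"confidence":0.99}' } }],
});
console.log(parseClassification(200, ok).category);             // spam
console.log(parseClassification(429, 'rate limited').category); // other
```

Because `other` is never archived or cached, falling back to it means a bad response costs nothing more than one unclassified message.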

Step 3 - Triggers (if you're upgrading)

If you already had a time‑driven trigger pointing at run, you're set; no change is needed. New users: add a time‑driven trigger that calls run every 5 minutes, same as in the original post.
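
If you would rather create the trigger from code than from the Triggers panel, something like the following should work. `ScriptApp.newTrigger(...).timeBased().everyMinutes(5).create()` is the standard Apps Script call; the `installTrigger` wrapper and its `scriptApp` parameter are assumptions added here so the function can be exercised with a stub outside Apps Script.

```javascript
// Hypothetical helper: creates a 5-minute trigger for `run` unless one already exists.
// Pass the global ScriptApp when calling it inside Apps Script.
function installTrigger(scriptApp, handler = 'run') {
  const exists = scriptApp.getProjectTriggers()
    .some((t) => t.getHandlerFunction() === handler);
  if (exists) return false; // avoid stacking duplicate triggers
  scriptApp.newTrigger(handler).timeBased().everyMinutes(5).create();
  return true;
}
```

In the Apps Script editor, a one-line wrapper such as `function setup() { installTrigger(ScriptApp); }` can be run once to set things up.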

Costs, accuracy, and what's next

Accuracy improved ≈ 7 pp F1 on my dataset thanks to the few‑shot examples.
Cost stayed flat (fewer API calls offset the larger prompt). I'm still spending <$3/mo.
If price ever becomes an issue, batch 10 emails per call or drop the examples; your call.
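
As a rough illustration of the batching idea, the sketch below groups email bodies into chunks of ten and builds one numbered prompt per chunk. It is only a sketch: `chunk` and `buildBatchPrompt` are hypothetical helpers, and the response schema would also need to change (for example to an array of results keyed by index), which is not shown here.

```javascript
// Split a list into consecutive groups of at most `size` items.
const chunk = (items, size) => {
  const out = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
};

// Number each email so the model's answers can be matched back to their inputs.
const buildBatchPrompt = (bodies) =>
  bodies.map((body, i) => `### Email ${i + 1}\n${body}`).join('\n\n');

const bodies = Array.from({ length: 23 }, (_, i) => `body ${i + 1}`);
const batches = chunk(bodies, 10);
console.log(batches.map((b) => b.length)); // [ 10, 10, 3 ]
```

Batching also interacts with the `max_tokens: 64` limit in the script, which is sized for a single result and would need raising for ten.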

TL;DR

Kill the old Likely Spam / Reviewed labels.
Create the four new ones under AI/.
Paste the new script, set the key/org, click Run ➜ createLabels, authorise.
Enjoy an inbox that shows only what matters.