Custom Instructions & Knowledge Bases: Train Your AI on Your Data
Ai

Custom Instructions & Knowledge Bases: Train Your AI on Your Data

By Sawan Kumar
Share:
0 views
Last updated:

Quick Answer

Custom instructions shape how ChatGPT responds. Knowledge bases let AI reference your data. Vector databases enable production RAG systems.

Key Takeaways

  • 1Custom instructions take 15 minutes and are free
  • 2Custom GPTs with your best past work take 4 hours and cost $20/mo
  • 3Vector databases enable production RAG but require engineering effort

Custom Instructions & Knowledge Bases: Train Your AI on Your Data

Generic AI doesn't know your business, your voice, or your constraints. Custom instructions and knowledge bases make AI yours. Add your brand guidelines, past work, customer data—now every AI response is tailored.

Three Levels (Easy to Advanced)**

Level 1: Custom Instructions (Easiest, Free)**

What it does: You write a system prompt. Every time you use ChatGPT, it remembers these instructions. "Answer like a direct-response copywriter," "Always cite sources," "Use metric tons, not pounds."

How to set up:**

  1. ChatGPT Settings → Custom instructions
  2. Write 2 sections:
    • "What would you like the AI to know about you?" (You're a B2B marketer targeting founders, you use metric-driven language, you avoid buzzwords)
    • "How would you like the AI to respond?" (Cite sources, write for email first, format as bullet points, flag assumptions)
  3. Save. Done.

Real example:**

"What would you like ChatGPT to know about you?"
"I'm a direct-response copywriter and marketing strategist. I work with B2B SaaS founders. I focus on conversion metrics, not vanity metrics. I think Gary Halbert and Alex Hormozi are the best copywriters alive."

"How would you like ChatGPT to respond?"
"Write copy for email, not blog posts. Lead with a specific benefit, not a hook. Include a single strong CTA. Avoid buzzwords: revolutionary, disruptive, game-changing, leverage, synergy, AI-powered. If you make a claim, cite evidence. Flag assumptions."

Result: Every response is now tailored to your voice and constraints.

Level 2: Knowledge Base (Medium, Paid)**

What it does: You upload your own files (PDFs, docs, blog posts, past emails). ChatGPT and Claude can now reference your data when answering.

Tools:**

  • ChatGPT + Custom GPT (paid): Upload docs, create a custom GPT. Share with your team or public.
  • Claude + Projects (paid): Upload docs. Claude references them in context window.
  • Notion AI (paid): Your entire Notion workspace becomes context.
  • LlamaIndex (open-source): Build your own knowledge base retrieval system.

Real example:**

Upload: 50 past email campaigns (your best work), 100 LinkedIn posts (your voice), company style guide, brand guidelines, customer case studies.

Now ask ChatGPT: "Write an email to a founder on the fence about my coaching. Use my voice and past examples."

Result: ChatGPT pulls patterns from your past work and generates an email that sounds like you.

Level 3: Vector Database + Embeddings (Advanced, Production)**

What it does: Your data is converted to vectors (mathematical representations). When you ask a question, the system finds the most relevant data and feeds it to the AI. This is RAG (Retrieval-Augmented Generation).

Tools:**

  • Pinecone: Managed vector database. Easiest to get started.
  • Weaviate: Open-source vector DB. More control.
  • Supabase + pgvector: Vector storage on top of PostgreSQL.
  • Chroma: Lightweight, local vector store.

Real example:**

You have 1,000 customer support conversations. You want every new support AI response to be grounded in past conversations.

  1. Convert all 1,000 conversations to vectors (one-time setup, 1 hour)
  2. When a new customer question arrives, convert it to a vector
  3. Find the 5 most similar past conversations
  4. Feed those + the new question to ChatGPT: "Based on these similar past conversations, answer this new question."
  5. Result: Your AI now knows your support patterns and context.

Implementation Paths**

Path 1: Personal Use (1-2 hours setup)**

ChatGPT custom instructions + your 10 best past examples as a simple prompt.

Setup:

  1. Write custom instructions (15 min)
  2. Collect 10 past emails/posts you love in a text file (30 min)
  3. Paste into ChatGPT prompt: "Here are 10 examples of my voice: [examples]. Now [task]." (2 min per task)

Result: Your AI outputs sound like you.

Path 2: Team Use (4-8 hours setup)**

Create a Custom GPT with your brand guidelines and best past work. Share with your team.

Setup:**

  1. Write brand guidelines doc (1 hour)
  2. Collect 50–100 past emails, posts, designs (2 hours)
  3. Create Custom GPT: upload files, write system prompt (2 hours)
  4. Share with team link (instant)

Cost: $20/mo (ChatGPT Plus)

Result: Every team member now has access to your voice and data. Consistency across all outputs.

Path 3: Production Use (1-2 weeks setup, ongoing maintenance)**

Build a RAG system. Users ask questions; AI retrieves relevant context and answers.

Setup:**

  1. Collect and organize data (1 week)
  2. Set up vector database (Pinecone: 2 hours, or self-hosted Chroma: 4 hours)
  3. Build the retrieval pipeline (engineer: 1-2 days)
  4. Integrate with your app (engineer: 2-3 days)
  5. Test and monitor (ongoing)

Cost: $0–200/mo depending on volume (Pinecone) + engineering time

Result: Production AI system grounded in your data. Scales to millions of questions.

What Data to Include**

Best Data (High Signal)**

  • Past work you're proud of (best emails, posts, case studies)
  • Customer conversations (support, sales, feedback)
  • Internal docs (strategy, frameworks, guidelines)
  • Testimonials and case studies (proof of your approach)
  • Research and data (your unique insights)

Worst Data (Low Signal)**

  • Random content you didn't write
  • Outdated or contradictory docs
  • Sensitive information (passwords, customer PII—redact first)
  • Broken links or corrupted files
  • Too much data is worse than too little. If you upload 10,000 docs, the AI gets confused. Curate to 100–500 best examples.
  • Quality over quantity. 50 excellent emails beat 500 mediocre ones.
  • Update periodically. If you're not refreshing the data, it gets stale. Audit quarterly.
  • Watch for contradictions. If your data has conflicting advice ("be formal" + "be casual"), the AI will be confused. Resolve first.
  • PII is dangerous. Before uploading, redact customer names, email addresses, and sensitive info. Legal will thank you.

The Real Workflow**

Week 1: Set up custom instructions (free, 15 min). Test for a week.

Week 2–3: If working, create a Custom GPT with your best past work (4 hours, $20/mo). Share with team.

Month 2+: If custom GPT solves your problem, stay there. If you need production scale, build a RAG system (engineer time).

Don't over-engineer. Start simple. Upgrade when you hit the ceiling.

Want to build a custom AI system for your business? Email [email protected] for knowledge base strategy and implementation.

Frequently Asked Questions

Tags:
custom instructions
knowledge base
RAG
AI personalization
BestsellerRecommended for you

📚 Mastering AI with ChatGPT, Gemini & 25+ AI Tools

Create content, automate marketing, and transform your business using ChatGPT and 25+ AI tools. Trusted by 45,000+ students.

FreeMini-Course

Want to master Ai ?

Get free access to our mini-course and start learning with step-by-step video lessons from Sawan Kumar. Join 115,000+ students already learning.

No spam, ever. Unsubscribe anytime.

Bestseller

Mastering AI with ChatGPT, Gemini & 25+ AI Tools

Create content, automate marketing, and transform your business using ChatGPT and 25+ AI tools. Trusted by 45,000+ students.

$49$199
Enroll Now →

30-day money-back guarantee

Free Strategy Call

Want personalised help with Ai ?

Book a free 30-min call with Sawan — no pitch, just clarity.

Book a Free Call

115,000+ students trained