Build with AI May 21, 2026 By PatientCopilot Team

Training Your AI Bot on Your Practice's Knowledge Base — A Step-by-Step Guide

An untrained AI bot embarrasses your practice. Here is what to train on, what to never train on, and how to keep the knowledge base from going stale.

The single most common reason an AI bot fails in a practice is not the model; it is the knowledge base behind the model. A generic AI with no practice-specific knowledge will confidently tell patients the wrong hours, quote the wrong prices, and invent services you do not offer. The fix is not a fancier model. It is a well-built, well-maintained Knowledge Base (KB).

This guide walks through what to put in your KB, what to keep out of it, and how to keep it from rotting.

Why an untrained bot fails

Out of the box, a large language model knows general medicine, general business-hours conventions, and general scheduling logic. It does not know that your practice closes at 3pm on Fridays, that your new-patient exam is $89, that your providers do not treat patients under 12, or which insurance plans you actually accept.

Ask an untrained bot “do you accept Aetna?” and you will get a confident “yes, we accept Aetna,” because the model is pattern-matching on what dental and chiropractic practices typically accept. The patient shows up, you do not take Aetna, the patient is upset, and the front desk handles the cleanup.

Retrieval-Augmented Generation (RAG) is the fix. The Knowledge Base sits between the patient's question and the model: the question is used to retrieve the relevant snippets from your KB, the model sees those snippets, and it answers from them rather than from its general prior knowledge.
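To make the three steps concrete, here is a minimal Python sketch. It is not how Agent Studio retrieves content: real retrieval systems usually rank by embedding similarity, and simple word-overlap (Jaccard) scoring stands in for that here. The KB entries, the prompt wording, and the `retrieve` and `build_prompt` names are all hypothetical.

```python
import re

# Hypothetical KB entries for illustration; yours come from your Knowledge Base.
KB = {
    "hours": "We are open 8am to 5pm Monday through Thursday and close at 3pm on Fridays.",
    "pricing": "A new-patient exam is $89.",
    "insurance": "We accept Cigna and Delta Dental. We do not accept Aetna.",
}

def tokenize(text: str) -> set[str]:
    """Lowercase word tokens; a crude stand-in for an embedding model."""
    return set(re.findall(r"[a-z0-9$]+", text.lower()))

def retrieve(question: str, kb: dict[str, str], k: int = 2) -> list[tuple[str, float]]:
    """Rank entries by Jaccard word overlap with the question and return the top k."""
    q = tokenize(question)
    scored = []
    for entry_id, text in kb.items():
        t = tokenize(text)
        union = q | t
        scored.append((entry_id, len(q & t) / len(union) if union else 0.0))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

def build_prompt(question: str, kb: dict[str, str], k: int = 2) -> str:
    """Put only the retrieved snippets in front of the model, plus a fallback rule."""
    snippets = [kb[entry_id] for entry_id, score in retrieve(question, kb, k) if score > 0]
    context = "\n".join(f"- {s}" for s in snippets)
    return (
        "Answer the patient's question using ONLY the practice information below. "
        "If it does not cover the question, say the front desk will follow up.\n\n"
        f"Practice information:\n{context}\n\nQuestion: {question}"
    )

print(build_prompt("Do you accept Aetna?", KB))
```

Because the insurance entry states outright that the practice does not accept Aetna, the model has a grounded answer to quote instead of a guess. The fallback instruction gives it somewhere to go when the retrieved snippets do not cover the question.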

What to train on

Build the KB in this order. Each section is its own retrievable unit.

1. Services and pricing. Every service you offer, with name, brief description, duration, price (or “call for pricing” if you do not publish), and any prerequisites. This is the single highest-traffic category of questions.

2. Hours and locations. Each location’s address, phone, hours including holiday exceptions, parking notes, accessibility, and provider availability per location.

3. Insurance and payment. The exact list of accepted insurance plans. Self-pay rates. Payment plans. HSA/FSA acceptance. What patients should bring at check-in.

4. Provider profiles. Each provider’s name, credentials, specialties, what they do and do not treat, languages spoken, and any patient-facing bio content already on your website.

5. New-patient intake instructions. What to expect on the first visit, how long it will take, what to wear, what to bring, what forms to complete in advance. This category catches the “what should I expect?” anxiety questions.

6. Common FAQs. Pull from your front desk’s actual recurring questions, such as “do you treat kids?”, “do you offer telehealth?”, “can I bring a parent into the room?”, and “do you take walk-ins?” Most practices have 30-50 of these and have never written them down.

7. Policies. Cancellation, no-show, late-arrival, refund. Patients ask these constantly. Having the policy in the KB means the bot quotes it accurately instead of guessing.

8. Conditions and treatment basics. Educational content about the conditions you commonly treat. This helps the bot answer “what does an adjustment do for sciatica?” without overstepping into medical advice.
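One way to keep each entry a single retrievable unit is to hold the structured facts (name, duration, price, prerequisites) in one place and render each service into its own plain-language entry. The sketch below uses hypothetical `Service` and `to_kb_entry` names and an illustrative 60-minute duration; it shows the shape of an entry, not a platform import format.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Service:
    name: str
    description: str
    duration_min: int
    price_usd: Optional[int]  # None means the practice does not publish a price
    prerequisites: list[str] = field(default_factory=list)

def to_kb_entry(svc: Service) -> dict[str, str]:
    """Render one service as one self-contained, retrievable KB entry."""
    price = f"${svc.price_usd}" if svc.price_usd is not None else "call for pricing"
    lines = [
        f"{svc.name}: {svc.description}",
        f"Duration: about {svc.duration_min} minutes.",
        f"Price: {price}.",
    ]
    if svc.prerequisites:
        lines.append("Before booking: " + "; ".join(svc.prerequisites) + ".")
    return {"category": "Services and pricing", "title": svc.name, "body": "\n".join(lines)}

exam = Service("New-patient exam", "history, exam, and a treatment plan", 60, 89)
print(to_kb_entry(exam)["body"])
```

The same pattern carries over to hours, insurance, and provider profiles: one record in, one focused entry out.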

What NOT to train on

This part matters more than the inclusion list. The KB is queryable, and anything in it can surface in a patient-facing conversation. Therefore:

No PHI. No patient records, no charts, no appointment histories with identifying details. The KB is for practice knowledge, not patient knowledge. Patient-specific data belongs in your EHR/PM with access controls, not in a RAG store.
No staff personal info. Provider home phones, personal emails, payroll info, anything HR.
No financial internals. Margins, revenue, internal pricing analysis, vendor contracts.
No internal-only operational notes. “Provider X is leaving in March” — do not put that in a KB the bot can read until it is publicly announced.
No outdated content marked “do not use.” If it is in the KB and not flagged as deprecated, the retrieval system might surface it. Delete, do not annotate.
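Some of these can be caught mechanically before a document is uploaded. The sketch below is a hypothetical pre-upload screen with illustrative regex patterns for phone numbers, email addresses, dates that might be birth dates, SSN-shaped numbers, and “do not use” markers. It is a tripwire, not a HIPAA control: it cannot recognize names, diagnoses, or a sentence like “Provider X is leaving in March,” so a person still has to read what goes in. Public strings such as the office phone number are passed in as an allowlist.

```python
import re

# Illustrative patterns only; they catch obvious cases and miss many others.
PATTERNS = {
    "phone number": re.compile(r"\(?\b\d{3}\)?[\s.-]?\d{3}[\s.-]\d{4}\b"),
    "email address": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "date (possible DOB)": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "SSN-like number": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "deprecation marker": re.compile(r"do not use|deprecated|internal only", re.IGNORECASE),
}

def screen_entry(text: str, allowed: frozenset[str] = frozenset()) -> list[str]:
    """Return reasons to hold an entry for human review before it enters the KB.

    `allowed` holds public strings (e.g. the main office phone) that may stay.
    """
    reasons = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            if match.group(0) not in allowed:
                reasons.append(f"{label}: {match.group(0)}")
    return reasons

office_phone = "(323) 555-0100"  # hypothetical public number
print(screen_entry(f"Call us at {office_phone}.", allowed=frozenset({office_phone})))  # []
print(screen_entry("Old pricing - DO NOT USE"))
```

An empty list means only that nothing obvious matched, not that the entry is safe to upload.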

The four sources to train from

Inside Agent Studio’s Knowledge Bases, content comes from four ingestion sources:

1. Web crawler. Point it at your live website. The crawler ingests your services pages, FAQ, about, locations, etc. Fast way to bootstrap a KB from existing public content. Re-crawl when the site changes.

2. Documents. Upload PDFs, Word docs, and text files — internal SOPs, intake instruction sheets, policy documents. Useful for content that lives in your operations binder but is not on the website.

3. Rich-text entries. Manually written KB entries inside the platform. Best for fast iteration — when a front desk question keeps coming up, write a 100-word rich-text entry, save, done. The KB is searchable in 30 seconds.

4. Google Sheets. Connect a sheet for structured, frequently-changing data (provider schedules, current pricing, insurance acceptance lists). The KB re-syncs on a schedule, so updates in the sheet flow into the bot without re-deploying anything.
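Conceptually, a sheet re-sync reads the current rows and rebuilds the entry they feed. The sketch below is a hypothetical stand-in, not the platform's sync: it reads a CSV export of an insurance sheet (the `plan` and `accepted` columns are assumptions, not a required layout) and also reports what changed since the previous sync, which makes a weekly sanity check quick.

```python
import csv
import io

def parse_plans(csv_text: str) -> set[str]:
    """Accepted plans from a CSV export with hypothetical columns 'plan' and 'accepted'."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return {row["plan"].strip() for row in rows
            if row["accepted"].strip().lower() in {"yes", "y", "true"}}

def to_entry_text(plans: set[str]) -> str:
    """Rebuild the insurance entry from the sheet, so the sheet stays the source of truth."""
    return "Accepted insurance plans: " + ", ".join(sorted(plans)) + "."

def diff_plans(previous: set[str], current: set[str]) -> dict[str, list[str]]:
    """What changed between two syncs, for a person to sanity-check."""
    return {"added": sorted(current - previous), "removed": sorted(previous - current)}

last_week = parse_plans("plan,accepted\nCigna,yes\nAetna,no\nDelta Dental,yes\n")
this_week = parse_plans("plan,accepted\nCigna,yes\nAetna,yes\nDelta Dental,no\n")
print(to_entry_text(this_week))          # Accepted insurance plans: Aetna, Cigna.
print(diff_plans(last_week, this_week))  # {'added': ['Aetna'], 'removed': ['Delta Dental']}
```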

The right mix is usually: web crawler for the bulk, rich-text for FAQs and policies, Google Sheets for anything that changes weekly, documents for everything else.

How the Retrieval Tester audits gaps

The Retrieval Tester is the most underused tool in the platform. It lets you type a question — exactly as a patient would phrase it — and see what KB chunks the retrieval system returns, ranked by relevance score.

Use it like this:

Pull a list of 30-50 real patient questions from your front desk (or from your messaging history if you already have a bot running).
Run each through the Retrieval Tester.
Flag any question where the top result is wrong, irrelevant, or thin.
For each gap, decide: is the answer missing from the KB entirely, or is it there but poorly indexed? Add or rewrite as needed.
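If you keep each test question alongside the entry it should hit, the triage in steps 3 and 4 can be scripted. This is a hypothetical sketch: `audit`, the entry IDs, the canned scores, and the 0.5 threshold are illustrative, and relevance-score scales differ between systems, so calibrate any threshold against results you trust. The stub retriever returns canned results in place of a real Retrieval Tester run.

```python
from typing import Callable

Hit = tuple[str, float]  # (entry_id, relevance score), best first

def audit(test_set: dict[str, str],
          retrieve: Callable[[str], list[Hit]],
          min_score: float = 0.5) -> list[str]:
    """Flag questions whose top hit is missing, the wrong entry, or weakly scored."""
    flagged = []
    for question, expected_id in test_set.items():
        hits = retrieve(question)
        if not hits:
            flagged.append(f"MISSING {question!r}: nothing retrieved")
            continue
        top_id, top_score = hits[0]
        if top_id != expected_id:
            flagged.append(f"WRONG {question!r}: got {top_id}, expected {expected_id}")
        elif top_score < min_score:
            flagged.append(f"THIN {question!r}: {top_id} scored {top_score:.2f}")
    return flagged

# Canned results standing in for a real retrieval run.
canned = {
    "do you take walk-ins?": [("faq-walk-ins", 0.82)],
    "are you open friday afternoon?": [("faq-telehealth", 0.41), ("hours-main", 0.39)],
    "can my mom come in with me?": [("faq-parent-in-room", 0.31)],
}
test_set = {
    "do you take walk-ins?": "faq-walk-ins",
    "are you open friday afternoon?": "hours-main",
    "can my mom come in with me?": "faq-parent-in-room",
}
for line in audit(test_set, lambda q: canned.get(q, [])):
    print(line)
```

The WRONG and THIN flags are where step 4's judgment comes in: if the right entry exists, rewrite it in the phrasing patients use; if it does not, add one.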

Run this audit monthly for the first quarter, then quarterly. Practices that skip the Retrieval Tester end up with bots that “work great in testing” and fail on real patient questions because the test set and the real questions did not match.

Refresh and maintenance cadence

The KB rots if you do not maintain it. Build the rhythm in:

Weekly: Check the Google Sheets sources for changes (new insurance plans accepted, provider schedule adjustments) and confirm the scheduled re-sync picked them up; re-sync if it did not.
Monthly: Re-crawl the website. Run the Retrieval Tester against a saved test set of 20-30 patient questions and review any regressions.
Quarterly: Audit the FAQ entries against actual front-desk question patterns. Add new entries for emerging questions. Archive obsolete ones.
On any operational change: New provider, new service, price change, hours change, new location, insurance plan added or dropped → update the KB the same day.
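For the monthly regression check, recording a pass or fail per saved question is enough to spot what broke. This sketch assumes you record those results yourself, as a simple question-to-pass/fail mapping, after each Retrieval Tester run; `compare_runs` and the sample questions are hypothetical.

```python
def compare_runs(last_run: dict[str, bool], this_run: dict[str, bool]) -> dict[str, list[str]]:
    """Questions that regressed (passed before, fail now) or were fixed since the last run.

    A question missing from this run counts as regressed, so gaps are not silently ignored.
    """
    regressed = sorted(q for q, ok in last_run.items() if ok and not this_run.get(q, False))
    fixed = sorted(q for q, ok in this_run.items() if ok and not last_run.get(q, False))
    return {"regressed": regressed, "fixed": fixed}

last_month = {"do you take walk-ins?": True, "what is the no-show fee?": True,
              "do you offer telehealth?": False}
this_month = {"do you take walk-ins?": True, "what is the no-show fee?": False,
              "do you offer telehealth?": True}
print(compare_runs(last_month, this_month))
```

Checking regressions against the date of the last re-crawl or re-sync helps narrow down which change caused them.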

Treat the KB as a living asset, not a one-time setup. The practices that get the most out of AI are the ones where someone owns the KB.
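Ownership is easier to enforce when every entry carries an owner and a last-reviewed date. This hypothetical check lists entries with no owner or no review in 90 days, roughly the quarterly cadence; the `Entry` fields are assumptions about what you track yourself, not fields the platform is known to provide.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

@dataclass
class Entry:
    title: str
    owner: Optional[str]
    last_reviewed: date

def needs_attention(entries: list[Entry], today: date, max_age_days: int = 90) -> list[str]:
    """Entries with no owner, or not reviewed within max_age_days of today."""
    cutoff = today - timedelta(days=max_age_days)
    problems = []
    for e in entries:
        if not e.owner:
            problems.append(f"{e.title}: no owner")
        if e.last_reviewed < cutoff:
            problems.append(f"{e.title}: last reviewed {e.last_reviewed.isoformat()}")
    return problems

entries = [
    Entry("Cancellation policy", "Front desk lead", date(2026, 4, 1)),
    Entry("Holiday hours", None, date(2025, 12, 1)),
]
for problem in needs_attention(entries, today=date(2026, 5, 21)):
    print(problem)
```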

Common mistakes

Dumping the entire website blindly. Pages with thin content (privacy policy, generic SEO landers) pollute retrieval. Curate.
Writing entries that mix three topics. Each KB entry should answer one question or describe one thing. Long mixed entries return lower-quality retrievals.
Skipping plain-language rewrites. If your website says “we leverage advanced biomechanical analysis,” patients are not searching for that phrasing. Add plain-language equivalents.
No ownership. If the KB does not have an owner, it will not get maintained. Assign one person.
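The first mistake is easier to avoid if you decide which crawled pages to keep with an explicit rule. The sketch below is a hypothetical filter: the skipped path fragments and the 150-word floor are illustrative numbers to tune for your site, not platform defaults, and the URLs are placeholders.

```python
import re
from urllib.parse import urlparse

# Hypothetical path fragments for pages that rarely answer patient questions.
SKIP_PATH_HINTS = ("privacy", "terms", "cookie", "sitemap")

def keep_page(url: str, text: str, min_words: int = 150) -> bool:
    """Keep a crawled page only if its path is not boilerplate and it has real content."""
    path = urlparse(url).path.lower()
    if any(hint in path for hint in SKIP_PATH_HINTS):
        return False
    return len(re.findall(r"\w+", text)) >= min_words

pages = {
    "https://example.com/privacy-policy": "word " * 500,
    "https://example.com/services/sciatica": "word " * 200,
    "https://example.com/lp/best-chiropractor-near-me": "Book now!",
}
print([url for url, text in pages.items() if keep_page(url, text)])
```

A word count is a blunt proxy for thin content, so treat the pages it drops as candidates to review rather than automatic rejects.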

Where to go next

Build your first KB inside Knowledge Bases, then connect it to an agent in Agent Studio. For practices using SMS as the primary patient channel, see AI Text Messaging for how the KB feeds into the Conversation AI loop.

Tags:

#Knowledge Base #RAG #AI training #Conversation AI


PatientCopilot Team

Editorial Team


Ready to Get Started?

Start Your Free Trial Call (323) 826-7690

More from Our Blog

Healthcare Workflow Automation: 20 Templates to Steal

HIPAA-Compliant Patient Intake: What Actually Matters in 2026

HIPAA Texting Rules: What Healthcare Practices Can and Cannot Send
