The CTO's AI Readiness Checklist: 10 Questions Before You Build
Oscar Gallo
Published on April 7, 2026
Every CTO I talk to wants to build AI. Most aren't ready. Not because they lack ambition or budget. Because they haven't asked the hard questions first. Here are the 10 questions that separate the teams that ship from the teams that stall.
Six months ago, a Series B CTO called me in a panic. His board had approved a $400K AI initiative. His team had picked a model, spun up a vector database, and started building a customer support copilot. Three months in, they had a demo that impressed the board and a production system that impressed nobody.
The embeddings were noisy. The retrieval was slow. The answers hallucinated company policies that didn't exist. His engineers weren't bad. They were building without a baseline. Nobody had stopped to ask whether the foundation was there.
We spent the first week of my engagement doing what should've been done before a single line of code was written: a readiness assessment. Within five days, we'd identified three critical gaps (data quality, evaluation methodology, and cost modeling) that would've burned another $200K before anyone noticed.
That experience is why I built this checklist. It's the exact set of questions I run through in the first week of every fractional CTO engagement. If you answer these honestly, you'll know whether you're ready to build, or whether you need to do some foundational work first.
Why Readiness Matters More Than Speed
There's a stat that gets thrown around: 85% of AI projects fail. The number varies depending on the source, but the pattern is real. And the failures almost never trace back to the model. They trace back to everything around the model. The data, the infrastructure, the team, the expectations.
The companies that ship AI fastest aren't the ones that start coding first. They're the ones that pause long enough to make sure they're building on solid ground. A two-week readiness assessment saves you from a six-month rebuild.
I've seen this enough times now that I can predict which projects will ship and which will stall within the first conversation. The difference almost always comes down to whether the team has asked, and honestly answered, the questions below.
The 10 Questions
1. Do you have a specific business problem, or an AI solution looking for a problem?
This is question one for a reason. It kills more AI projects than any technical challenge.
"We should add AI to our product" is not a business problem. "Our support team spends 40% of their time answering questions that are already in our documentation." That's a business problem. One has a measurable outcome. The other is a technology looking for a reason to exist.
The test is simple: can you articulate the business metric that AI will move? Revenue, cost reduction, time savings, conversion rate, churn. Pick one. If you can't name the metric, you don't have a project. You have an experiment. Experiments are fine, but they get a different budget and a different timeline.
Every AI project I've shipped started with a number. "Reduce support ticket volume by 30%." "Cut content production costs by half." "Increase lead qualification accuracy from 60% to 85%." The number forces clarity. It forces you to define what success looks like before you start building.
If your answer is "we want to explore what AI does for us," that's honest, and it's okay. But call it what it is: a discovery phase, not a product initiative.
2. Is your data accessible, clean, and sufficient?
AI doesn't run on ambition. It runs on data. And most companies dramatically overestimate their data readiness.
Here's what I ask when I audit a company's data posture:
- Accessible: Can your engineers query the data they need through an API or database, or is it trapped in spreadsheets, PDFs, and someone's email inbox? If getting the data requires a manual export and a Slack message to finance, you have an access problem.
- Clean: Is the data consistent, labeled, and structured? Or are you dealing with duplicate records, missing fields, and three different naming conventions for the same customer? RAG on dirty data gives you confidently wrong answers, which is worse than no answers at all.
- Sufficient: Do you have enough data to solve the problem? For a RAG system, that might mean comprehensive documentation. For a fine-tuned model, it might mean thousands of labeled examples. For a classification task, it might mean balanced training sets across every category.
Most teams I work with score themselves a 7 out of 10 on data readiness. After the audit, they're usually a 4. That gap kills AI projects.
The fix isn't complicated, but it takes time. Budget for it. A data cleanup sprint before you write your first prompt is the highest-ROI week you'll spend.
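The "clean" check is the easiest of the three to script. A minimal sketch of a data-quality audit, assuming records are dicts with hypothetical fields like `email` and `customer_name` (swap in your own natural key and required fields):

```python
from collections import Counter

def audit_records(records, required_fields):
    """Quick data-quality audit: duplicate records and missing fields."""
    # Duplicate detection on a hypothetical natural key (normalized email).
    keys = [(r.get("email") or "").strip().lower() for r in records]
    dupes = sum(c - 1 for c in Counter(keys).values() if c > 1)

    # Count missing or empty required fields across all records.
    missing = Counter()
    for r in records:
        for f in required_fields:
            if not r.get(f):
                missing[f] += 1

    return {"total": len(records), "duplicates": dupes, "missing": dict(missing)}

report = audit_records(
    [
        {"email": "a@x.com", "customer_name": "Acme"},
        {"email": "A@x.com ", "customer_name": ""},   # dupe of the first, name missing
        {"email": "b@x.com", "customer_name": "Beta"},
    ],
    required_fields=["email", "customer_name"],
)
print(report)  # → {'total': 3, 'duplicates': 1, 'missing': {'customer_name': 1}}
```

Running something like this on day one gives you the evidence behind your data-readiness score instead of a gut-feel 7 out of 10.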
3. Have you defined what "good enough" looks like?
AI outputs are probabilistic. Traditional software is deterministic: a function either returns the right answer or it doesn't. An LLM returns an answer that's right most of the time, almost right some of the time, and confidently wrong occasionally.
If your team expects 100% accuracy, you'll never ship. The question isn't "will the AI be perfect?" It's "what error rate is acceptable, and what happens when it's wrong?"
Define these before you build:
- Accuracy threshold. What percentage of correct responses makes this useful? 90%? 95%? 80%? The answer depends on the stakes. A product recommendation engine can tolerate more errors than a medical triage system.
- Failure mode handling. When the AI is wrong, what happens? Does a human review it? Does the system flag low-confidence answers? Does it gracefully decline to answer?
- Edge case strategy. What inputs will the AI handle poorly? Every system has blind spots. Knowing yours in advance lets you build guardrails instead of apologizing to customers.
The teams that ship AI quickly are the ones that define "good enough" early and iterate from there. Waiting for perfection means you never ship.
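The failure-mode decisions above translate directly into routing logic. A minimal sketch, assuming a hypothetical `confidence` score attached to each answer (how you compute it, e.g. from log-probs or a verifier model, is its own design decision):

```python
from dataclasses import dataclass

@dataclass
class Answer:
    text: str
    confidence: float  # 0.0-1.0; from log-probs, a verifier model, etc.

def route(answer: Answer, threshold: float = 0.8) -> str:
    """Decide what happens to an AI answer based on its confidence."""
    if answer.confidence >= threshold:
        return "send"            # confident enough: show it to the user
    elif answer.confidence >= 0.5:
        return "human_review"    # plausible but uncertain: queue for a human
    else:
        return "decline"         # gracefully decline rather than guess

print(route(Answer("Reset it under Settings > Security.", 0.92)))  # → send
print(route(Answer("Maybe try reinstalling?", 0.61)))              # → human_review
```

The thresholds here are placeholders; the point is that "what error rate is acceptable" becomes a number in code, not a debate after launch.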
4. Does your team have the right skills, or the right support?
You don't need a machine learning PhD to ship AI products. What you need are engineers who understand APIs, can build data pipelines, and know how to evaluate whether something is working.
Here's the honest skills assessment:
- Can your engineers work with APIs and handle async workflows? If yes, they can integrate LLMs.
- Does someone on the team understand data pipelines? Ingestion, transformation, storage. The plumbing that feeds your AI system.
- Can your team design and run evaluations? This is the gap I see most often. Building the AI feature is one thing. Knowing whether it's working is another skill entirely.
If your team is strong on general engineering but AI-inexperienced, that's not a blocker. It's a coaching problem. The fastest path is pairing your engineers with someone who's shipped AI before, not replacing them. I wrote about this model in detail in my post on fractional CTO work for AI.
The worst move is hiring three junior "AI engineers" and hoping they figure it out. Senior guidance on the architecture decisions upfront saves months of wandering.
5. What's your build vs. buy decision framework?
Most teams get this wrong in one of two directions. They build custom infrastructure for problems that a $20/month API solves. Or they buy a platform that locks them into someone else's limitations when they should own the core logic.
Here's the framework I use:
Build when:
- AI is your core competitive advantage
- You need control over the model, the data pipeline, and the iteration speed
- Off-the-shelf solutions can't handle your specific data or domain
- You have the engineering capacity to maintain what you build
Buy when:
- AI is a utility feature, not your product's differentiator
- Time to market matters more than customization
- The vendor's solution covers 80%+ of your requirements
- You don't want to hire a team to maintain AI infrastructure
Hybrid when:
- You use a vendor API (OpenAI, Anthropic, etc.) but own the orchestration layer: your prompts, your retrieval logic, your evaluation pipeline
Most companies I work with land in the hybrid zone. They use foundation model APIs but build their own RAG pipeline, their own evaluation framework, and their own prompt management system. That gives you the speed of a vendor-hosted model and the control of owning everything around it.
6. Have you scoped a pilot that can ship in 30 days?
If your first AI project is a six-month initiative with twelve stakeholders and a 40-page requirements doc, it's going to fail. Not because the idea is bad, but because the feedback loop is too long.
The right first project has three properties:
- Narrow. It solves one specific problem for one specific user group. "AI-powered customer support" is not narrow. "Auto-draft responses for password reset tickets" is narrow.
- Measurable. You can quantify the impact within two weeks of launch. Ticket deflection rate. Time saved per query. Accuracy on a held-out test set.
- Deliverable in 30 days. If you can't ship it in a month, your scope is too big. Cut features until you can.
The 30-day pilot does two things. First, it proves feasibility. Does AI solve this problem with your data, your infrastructure, your constraints? Second, it builds organizational confidence. Nothing sells the next AI investment like a working product that moved a metric.
I've watched teams spend nine months on an AI strategy document. I've also watched teams ship a working prototype in three weeks that made the strategy conversation irrelevant. Shipping teaches you more than planning ever will.
7. Do you have an evaluation framework?
This is the question that separates teams that ship from teams that demo.
"It looks good" is not an evaluation. "The CEO liked the output" is not an evaluation. You need a systematic way to measure whether your AI system is working, and you need it before you start building, not after.
Here's what a real evaluation framework includes:
- A test dataset. A curated set of inputs with known-good outputs that you run every model change against. This is your regression suite for AI.
- Quantitative metrics. Accuracy, precision, recall, F1, whatever matters for your use case. For RAG systems, add retrieval relevance and answer groundedness.
- Human review loops. Automated metrics catch the obvious failures. Human reviewers catch the subtle ones: tone issues, factual errors the metrics miss, responses that are technically correct but practically useless.
- A/B testing infrastructure. Once you're in production, you need to compare model versions, prompt changes, and retrieval strategies against real user behavior.
If you're building without evals, you have no way to measure progress. You'll ship something, get negative feedback, make changes based on anecdotes instead of data, and oscillate between "it's amazing" and "it's broken" without ever knowing which is true.
Build your eval suite first. Then build your product. It sounds backwards, but it's the fastest path to production.
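A test dataset plus a pass/fail gate is enough to start. A minimal sketch, assuming a hypothetical `ask_model` callable and an exact-match grader (real graders are usually fuzzier: semantic similarity, or an LLM judge):

```python
def run_eval(ask_model, test_cases, threshold=0.9):
    """Run every test case and report accuracy against a threshold.

    test_cases: list of (input, expected_output) pairs -- your regression
    suite for AI. Run it on every prompt, model, or retrieval change.
    """
    failures = []
    for question, expected in test_cases:
        got = ask_model(question)
        # Exact match for the sketch; swap in a fuzzier grader later.
        if got.strip().lower() != expected.strip().lower():
            failures.append((question, expected, got))

    accuracy = 1 - len(failures) / len(test_cases)
    return accuracy, failures

# Usage with a stubbed model standing in for your real pipeline:
cases = [("capital of France?", "Paris"), ("2+2?", "4"), ("capital of Japan?", "Tokyo")]
stub = lambda q: {"capital of France?": "Paris", "2+2?": "4"}.get(q, "unknown")
acc, fails = run_eval(stub, cases)
print(f"accuracy={acc:.2f}, failures={len(fails)}")  # → accuracy=0.67, failures=1
```

Wire this into CI so a prompt tweak that tanks accuracy fails the build the same way a broken unit test would.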
8. What's your cost model for inference at scale?
The demo costs $0.02 per request. You show it to the board, everyone's excited, and you greenlight production. Then you launch to 10,000 users and your monthly OpenAI bill hits $25,000.
I've seen this exact scenario three times in the last year. The fix is modeling your costs before you scale, not after.
Things to account for:
- Token costs at real volume. How many requests per day? What's the average input/output length? Multiply by your model's per-token price. Then add a 2x buffer because usage always exceeds projections.
- Model selection economics. GPT-4o costs roughly 15x more than GPT-4o-mini per token. For many tasks, the smaller model is good enough. For some, it's not. Test both and know the accuracy-cost tradeoff.
- Caching and deduplication. If 30% of your queries are variations of the same question, a semantic cache can cut your costs by 30%. Simple wins compound fast.
- Embedding and retrieval costs. If you're running a RAG system, you're paying for embeddings on every query and every document update. These add up at scale.
- Infrastructure costs. Vector database hosting, GPU instances for self-hosted models, monitoring and logging tools. The model API is one line on the bill.
The teams that control AI costs build the cost model before they pick the model. They know their per-query budget and work backwards from there.
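The per-token arithmetic is worth writing down explicitly. A minimal sketch with illustrative prices (not any provider's current rates; plug in real numbers from your vendor's pricing page):

```python
def monthly_inference_cost(
    requests_per_day: float,
    avg_input_tokens: float,
    avg_output_tokens: float,
    input_price_per_1m: float,   # USD per 1M input tokens (check your provider)
    output_price_per_1m: float,  # USD per 1M output tokens
    cache_hit_rate: float = 0.0, # fraction of requests served by a semantic cache
    buffer: float = 2.0,         # usage always exceeds projections
) -> float:
    """Project monthly token spend, after caching, with a safety buffer."""
    requests = requests_per_day * 30 * (1 - cache_hit_rate)
    per_request = (
        avg_input_tokens * input_price_per_1m
        + avg_output_tokens * output_price_per_1m
    ) / 1_000_000
    return requests * per_request * buffer

# Illustrative prices only: $2.50/1M input tokens, $10.00/1M output tokens.
cost = monthly_inference_cost(5000, 1500, 400, 2.50, 10.00, cache_hit_rate=0.3)
print(f"${cost:,.0f}/month")
```

Run it across two or three candidate models before you commit, and the accuracy-cost tradeoff stops being a surprise on the first invoice.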
9. How will you handle AI's failure modes?
AI will hallucinate. It will give confidently wrong answers. It will work perfectly 95% of the time and catastrophically fail on the 5% that matters most. The question isn't whether this will happen. It's what happens when it does.
Every production AI system needs three things:
Guardrails. What can the AI not do? If it's a customer-facing system, can it promise discounts? Share pricing that isn't public? Give legal or medical advice? Define the boundaries explicitly and enforce them with system prompts, output filters, or both.
Confidence signals. Not all AI outputs are created equal. If your system can flag low-confidence responses and route them to a human, you've solved 80% of the hallucination problem. The worst AI systems present every answer with equal confidence.
Monitoring. You need to know when things go wrong in production, not when a customer complains on Twitter. Track response quality over time. Set up alerts for anomalies. Sample and review outputs regularly. AI systems degrade silently. If you're not watching, you won't notice until the damage is done.
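Guardrails and monitoring both start as unglamorous checks on every response. A minimal sketch of an output filter, with hypothetical forbidden patterns for a customer-facing support bot, run before any answer reaches a user:

```python
import re

# Hypothetical boundary rules -- define your own explicitly.
FORBIDDEN = [
    (re.compile(r"\b\d{1,3}\s?%\s?(off|discount)\b", re.I), "promised a discount"),
    (re.compile(r"\b(legal|medical)\s+advice\b", re.I), "gave regulated advice"),
]

def filter_output(answer: str):
    """Return (allowed, reasons). Log every block -- that's a monitoring signal."""
    reasons = [why for pattern, why in FORBIDDEN if pattern.search(answer)]
    return (len(reasons) == 0, reasons)

print(filter_output("Sure, here's 20% off your next renewal!"))
# → (False, ['promised a discount'])
print(filter_output("You can reset your password from the account page."))
# → (True, [])
```

Regex filters won't catch everything, which is exactly why the monitoring half matters: the log of blocked and sampled responses tells you where the guardrails need to grow.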
The difference between a liability and a useful tool is the failure handling. Build it into the architecture from day one, not as an afterthought in month three.
10. Do you have executive alignment on timeline, cost, and expectations?
The most technically ready team in the world will still fail if the CEO expects a production system in four weeks and the real timeline is four months.
Misaligned expectations kill more AI projects than bad architecture. I've watched CTOs get fired not because they built the wrong thing, but because they didn't manage the narrative around what AI can realistically deliver and when.
Before you start building, align on three things:
- Timeline. A realistic pilot takes 4-8 weeks. A production-grade system takes 3-6 months. If your board expects results in two weeks, either scope down dramatically or reset expectations now. Doing it later is harder and more expensive.
- Cost. AI isn't free after the initial build. There are ongoing inference costs, maintenance, and iteration. Make sure your budget accounts for the run cost, not only the build cost.
- What "v1" looks like. The first version will not do everything. Define what it will do, what it won't, and what comes in v2. Write it down. Get sign-off. Refer back to it when scope creep starts, and it will start.
The CTO's job is to build organizational understanding of what AI does, how long it takes, and what it costs. Building the system is only part of it. The teams that ship are the ones where leadership and engineering share the same mental model.
Scoring Yourself
Go through the 10 questions. For each one, give yourself an honest score:
- Clear yes: You've addressed this and have evidence to back it up.
- Partial: You've thought about it, but there are gaps or assumptions.
- No: You haven't addressed this, or you don't know the answer.
8-10 clear answers: You're ready. Start building. Your foundation is solid and you'll be debugging real problems, not foundational ones.
5-7 clear answers: You have gaps. Identify the weakest 2-3 areas and address them before committing serious engineering resources. This is a two-to-four week exercise, not a six-month delay.
Below 5: You need foundational work first. That's not failure. That's maturity. The worst thing you can do is start building on a shaky foundation and discover the cracks at scale.
Most teams that come to me are in the 5-7 range. They have strong engineering, good instincts, and two or three blind spots that would've cost them months. Closing those gaps before you build is the highest-leverage work I do.
What Comes After Readiness
Once you've answered these questions honestly, the path forward gets a lot clearer. You know your strengths. You know your gaps. You know what needs to happen before code gets written.
For teams that scored high, the next step is straightforward: scope a 30-day pilot, build your evaluation framework, and start shipping.
For teams with gaps, the next step is closing them. Sometimes that means a data cleanup sprint. Sometimes it means a cost modeling exercise. And sometimes it means bringing in someone who's closed these exact gaps before. Someone who compresses three months of learning into three weeks.
That's the model I described in my post on fractional CTO work for AI. It's not the right fit for everyone, but for teams that are strong on engineering and short on AI-specific experience, it's the fastest way to get from "almost ready" to "shipping."