AI in Recruitment: Hype vs Reality (A Founder Guide)

Almost every applicant tracking system on the market today says it has AI. Most of them do not, in any sense that would matter to an engineer. They have rule-based automation, keyword extraction, and the occasional sentence embedding model layered on top of a 2014 codebase. The marketing page says AI. The model behind the curtain is the same regular expression that has been there for a decade.

This is the gap between AI hype and AI reality in recruitment. It matters because the hiring outcomes are completely different depending on which one you actually buy.

What real AI in recruitment looks like

Real AI in recruitment means a model that reads a resume, extracts measurable outcomes (revenue generated, teams managed, systems shipped, problems solved), and ranks candidates by relevance to the role. The model produces a score and a written reasoning paragraph that a human can read and disagree with. It does not just count tokens. It does not just compute a vector distance. It interprets what the candidate actually did, the way a senior engineer or operator would. The McKinsey research on talent practices, summarized in The shape of talent in 2023 and 2024, shows skills-based hiring adoption climbed from 40% in 2020 to roughly 60% in 2024, which is exactly what an outcome-reading model is built to support. Hype-AI cannot do skills-based hiring well because it has no concept of skill, only of token presence. Real AI can.

The simplest test is to ask the vendor for the reasoning paragraph behind a score. If they cannot produce one, the score is a black box and the AI is marketing.

Hype AI: the patterns that keep showing up

Across most ATS demos in this market, the same handful of patterns gets sold as AI when it is something else.

Keyword frequency dressed up as semantic match. The model counts how often Python and AWS appear and calls the result a match score. Add stop-word filtering and a synonym dictionary and it gets called “natural language processing.” The candidate experience is unchanged: stuff your resume with keywords and you win.
Sentence embeddings as a thin wrapper. A more sophisticated version uses pretrained sentence embeddings to compute cosine similarity between the resume and the JD. This catches synonyms, but it is still text-to-text matching. A candidate who copy-pastes the JD into their resume summary gets a high score with zero actual experience.
Chatbot interviewers. A scripted form with an LLM front end. Useful for scheduling and triage. Not screening intelligence.
AI-generated summaries. The model summarizes the resume into bullet points, which is a productivity feature, not a ranking decision. The ranker underneath is still keyword-based.
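
To make the first two patterns concrete, here is a minimal sketch of what a keyword-count scorer and a text-similarity scorer reward. It is not any vendor's code. The function names and sample resumes are invented for illustration, and a bag-of-words cosine stands in for a pretrained sentence embedding, which is a simplification: real embeddings catch synonyms, but the copy-paste weakness is the same.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9+#]+", text.lower())


def keyword_score(resume, keywords):
    """Count how many times the target keywords appear in the resume."""
    counts = Counter(tokenize(resume))
    return sum(counts[k.lower()] for k in keywords)


def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words count vectors (embedding stand-in)."""
    a, b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical job description and resumes.
jd = "Senior backend engineer with Python and AWS experience building data pipelines"
honest = "Led a team of five that rebuilt the billing service and cut infrastructure cost by 30 percent"
stuffed = "Python AWS Python AWS Python AWS pipelines"
pasted = jd  # candidate copies the JD into their resume summary

print(keyword_score(honest, ["Python", "AWS"]), keyword_score(stuffed, ["Python", "AWS"]))
print(cosine_similarity(honest, jd), cosine_similarity(pasted, jd))
```

In both cases the resume with real outcomes loses to the one that mirrors the job description, which is the behavior candidates have learned to exploit.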

None of this is fraud. It is just generous labeling. It also happens to be the reason candidates have learned to game ATS so reliably. Jobscan documents the prevalence of keyword stuffing on real resumes, and the pattern only works because the screening systems reward it.

Why the hype is so durable

Three forces keep the gap between AI marketing and AI reality wide open in recruitment software. They are worth naming so you can ignore them.

The first is that the buyer is usually not a technical evaluator. Procurement is HR, the demo is in front of a head of talent, and the question “what is the actual model architecture” rarely gets asked. Vendors learn that they can ship the same regex they shipped in 2018 and call it AI as long as the demo looks clean. There is no penalty for the marketing.

The second is that legacy ATS are platform plays. Greenhouse, Lever, and Workable make money on workflow, not on screening accuracy. Adding real ranking AI is expensive engineering for a feature that does not show up cleanly in a procurement comparison sheet. So they ship the cheap version.

The third is that hiring is downstream of so many other things, like job description quality, sourcing channels, market conditions, and team brand, that bad screening hides inside noise. A team that hires twelve good engineers a year credits the process. The same team would have hired the same twelve people with a Google Sheet, because the screening was not doing the work the team thinks it was doing. Real AI screening is testable in a way that helps a founder distinguish actual lift from noise: you can compare the model’s top ten ranked candidates against your own top ten and see whether they overlap. Hype AI fails this test almost every time.
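
The top-ten comparison is simple enough to run in a few lines. This is a minimal sketch, assuming you have both rankings as ordered lists of candidate IDs; the function name and the sample IDs are made up for illustration.

```python
def top_k_overlap(model_ranking, human_ranking, k=10):
    """Fraction of the human's top-k candidates that also appear in the model's top-k."""
    model_top = set(model_ranking[:k])
    human_top = set(human_ranking[:k])
    if not human_top:
        return 0.0
    return len(model_top & human_top) / len(human_top)


# Hypothetical candidate IDs, best first.
model_ranking = ["c07", "c02", "c11", "c05", "c09"]
human_ranking = ["c02", "c05", "c03", "c07", "c01"]

print(top_k_overlap(model_ranking, human_ranking, k=3))
```

A low overlap does not prove the model is wrong, but it tells you where to start reading its reasoning against your own.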

What Amazon's recommendation systems taught us about hiring

Before CurriculoATS, our founder Dev spent years at Amazon working on search and recommendation systems. There is a lesson from that work that maps directly onto hiring AI: the only model worth shipping is one whose top-K results are good. Everything else is theater.

Amazon’s recommender does not ship a model that performs slightly better in average precision but ranks the wrong items at position 1. Position 1 decides whether the user clicks. The same is true of resume ranking. The recruiter looks at the top ten or fifteen, period. If the model’s top ten are wrong, the rest of the system does not save you.
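
A small sketch of why position matters, using precision at K. The relevance labels are invented: 1 means a candidate the recruiter would want to interview, 0 means not. Both rankings contain the same five good candidates and differ only in where they put them.

```python
def precision_at_k(relevance, k):
    """Share of the first k ranked items that are relevant (labels are 0 or 1)."""
    top = relevance[:k]
    return sum(top) / len(top) if top else 0.0


# Hypothetical relevance labels in ranked order, position 1 first.
ranking_a = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
ranking_b = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]

for name, ranking in (("A", ranking_a), ("B", ranking_b)):
    print(name, precision_at_k(ranking, 3), precision_at_k(ranking, len(ranking)))
```

Measured over the whole list the two rankings look identical. Measured over the first three positions, the ones a recruiter actually reads, one is perfect and the other is useless.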

That is why CurriculoATS is built around a multi-signal evaluation (quantified achievements, experience relevance, career trajectory, skills alignment) that produces a 0–100 composite score with a full written reasoning paragraph. The paragraph is the part that turns a black-box ranker into something a founder can actually trust. You can read it, disagree with it, override it. That is the difference between AI as decision support and AI as decision replacement, and it matters more than any model architecture detail. See the impact scoring page for the full breakdown.
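
As a rough illustration of the shape of that output, and not the actual CurriculoATS implementation, here is a sketch of a composite score that travels with its reasoning. The equal weights, the 0-to-1 signal scale, the class name, and the sample paragraph are all assumptions made for this example.

```python
from dataclasses import dataclass

# Hypothetical equal weights: an assumption made for this sketch only.
WEIGHTS = {
    "quantified_achievements": 0.25,
    "experience_relevance": 0.25,
    "career_trajectory": 0.25,
    "skills_alignment": 0.25,
}


@dataclass
class Evaluation:
    signals: dict   # signal name -> value between 0 and 1 (assumed scale)
    reasoning: str  # written paragraph a human can read, dispute, or override

    def composite(self) -> int:
        """Weighted sum of the signals, scaled to a 0-100 score."""
        total = sum(WEIGHTS[name] * self.signals[name] for name in WEIGHTS)
        return round(100 * total)


evaluation = Evaluation(
    signals={
        "quantified_achievements": 1.0,
        "experience_relevance": 0.5,
        "career_trajectory": 0.5,
        "skills_alignment": 1.0,
    },
    reasoning="Invented example: shipped a billing rewrite that cut costs, "
              "but most experience is in a different domain than this role.",
)

print(evaluation.composite(), evaluation.reasoning)
```

The point of the structure is that the number never appears without the paragraph next to it.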

Why the buyer’s question matters more than the vendor’s marketing

The way a founder asks about AI in a demo determines whether they end up with real ranking intelligence or with marketing dressed up as it. The vendors who win procurement bake-offs are not always the vendors with the best models; they are the vendors whose demos answer the questions buyers actually ask.

Most buyers ask the wrong questions. “Does it have AI?” is the wrong question, because every vendor will say yes. “What does the AI score?” is closer, because it forces the vendor to describe the model’s input. “Show me the reasoning behind a top-ranked candidate” is the right question, because it forces the vendor to produce the output. A vendor with real ranking intelligence can show this in 90 seconds. A vendor with hype AI cannot.

This is the single highest-signal moment in any ATS demo, and most founders skip it because the demo flow is built to keep them on workflow features. We’ve seen founders sit through 40-minute walkthroughs of pipeline boards and offer-letter templates, then sign annual contracts without ever asking the model to produce a paragraph for a specific candidate. The procurement gap is not knowledge of AI; it is willingness to interrupt the demo. A founder who asks for the reasoning paragraph in the first ten minutes filters out 70% of the legacy ATS market without having to read a single benchmark.

How a founder evaluates AI recruitment vendors in one demo

You do not need a procurement framework. You need three questions.

Show me the reasoning behind the top-ranked candidate. If the answer is a number with no explanation, the model is keyword-based. Move on.
Run my last hire’s resume through the system, the version they applied with before they joined. If the model would have ranked them in the top ten, the system has signal. If they would have been rejected, the model is broken.
What does this cost when I have ten roles open instead of three? Per-seat pricing punishes growing teams. Flat pricing scales with hire volume, which is what a startup actually cares about.

That is the whole demo. Thirty minutes is enough.

Founder questions

How do I know if an ATS uses real AI or just keyword matching?

Ask for the reasoning paragraph behind any candidate score. Real AI can explain why a candidate ranked where they did, in plain English, citing specific resume content. Keyword-based systems cannot, because there is no reasoning to explain, just a token-overlap calculation. If the demo skips this question, the AI is marketing.

Is AI in recruitment legal?

Yes, with constraints. NYC Local Law 144 requires a bias audit and candidate notice for automated employment decision tools used in NYC. The EU AI Act classifies hiring AI as high-risk under Annex III, with obligations enforceable from August 2026. Both regimes favor systems with documented reasoning, which is what an outcome-based model produces and a black-box keyword filter does not.

Will AI replace recruiters?

No. The realistic role for AI is reducing the screening load so a recruiter or founder spends time on candidates worth talking to instead of reading 200 resumes a week. The decision still belongs to the human. AI moves the top-of-funnel from a dumping ground to a triaged inbox.

What is the difference between automation and AI in hiring?

Automation runs predefined rules. Send this email when the candidate moves to stage two. Schedule a slot when both calendars are free. AI makes a judgment that requires reading and interpretation. Ranking 200 resumes by relevance to a role requires AI. Sending a confirmation email does not. Most ATS only do the second and call it the first.

Can AI screening produce a defensible audit trail?

Only if the model produces a written reasoning paragraph per candidate. NYC Local Law 144 requires bias audits and candidate notice for automated employment decision tools, and the EU AI Act’s Annex III obligations, enforceable from August 2026, require documented reasoning. A 73% match score with no explanation cannot be audited, disclosed, or defended in a regulator’s review. A paragraph that names specific outcomes and maps them to the role can. Founders should treat the reasoning paragraph as a compliance asset, not just a UX feature.

How much should AI screening cost?

Pricing in the ATS market varies widely. Workable starts at $149/mo and scales with employee count. Greenhouse contracts typically run $12,000+ per year. CurriculoATS Pro is $100/mo (currently $50/mo early bird), which is roughly an order of magnitude less for the same screening layer. The price difference reflects positioning, not a capability gap.

What to do next

If you have an ATS shortlist on your desk and you are not sure whether the AI is real, run the three-question demo on each one. Then read our AI ATS for founders page and the compare page to see how the major vendors stack up against an outcome-based model. The Starter tier on CurriculoATS is free for one active job, which is enough to test the ranker on a role you are running right now without committing to a contract.