Keyword Matching vs Outcome-Based ATS: Technical Comparison

What is ATS keyword matching? Keyword matching is a screening method in which an applicant tracking system scores a resume by how many job-description terms it contains. It is fast but shallow: it rewards resumes that repeat the right words and misses candidates who describe the same work differently. Outcome-based screening instead reads what a candidate actually built or delivered.

Every applicant tracking system uses some form of AI to rank resumes. The architectures fall into two categories: keyword-based matching, used by nearly every legacy platform, and outcome-based ranking, used by CurriculoATS. The difference is not marketing. It is a structural difference in how the model reads a resume, and it produces fundamentally different shortlists for the same set of candidates.

This is a technical comparison written for a startup founder who wants to understand what is actually happening under the hood when they buy ATS software, and why the gap between the two architectures cannot be closed by a better UX or by a cleverer model applied to the same input.

How keyword-based matching works

Keyword-based matching extracts tokens from the job description (Python, AWS, Kubernetes, Django, Postgres) and scans the resume for the same tokens. Each found token counts toward a match score. The basic implementation looks roughly like this:

    job_tokens = extract_keywords(job_description)
    resume_tokens = extract_keywords(resume_text)
    overlap = intersection(job_tokens, resume_tokens)
    score = len(overlap) / len(job_tokens)

More sophisticated versions use TF-IDF weighting (rare terms count more than common ones) or sentence embeddings (resumes whose vector is close to the JD vector rank higher, catching synonyms like “led” for “managed”). All three variants share the same underlying property: they are deterministic text-matching functions, and the resume is the input.

Jobscan documents how widespread keyword stuffing is in the candidate pool, and the prevalence is a direct response to the architecture: candidates who know the rules can reliably produce the output the model wants.
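The basic overlap score and the TF-IDF idea can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not any vendor's implementation: `extract_keywords` here is a hypothetical regex tokenizer with a toy stopword list, and the weighted variant uses only the inverse-document-frequency half of TF-IDF (the resume is treated as a set, so term frequency drops out). The embedding variant is omitted because it needs a trained model.

```python
import math
import re

# Hypothetical stand-in for extract_keywords(): lowercase word tokens minus a
# tiny illustrative stopword list. Real systems use skill dictionaries or NER.
STOPWORDS = {"a", "an", "and", "the", "with", "in", "on", "of", "to", "for"}


def extract_keywords(text: str) -> set[str]:
    """Return the set of lowercase tokens, keeping '+' and '#' (C++, C#)."""
    return {t for t in re.findall(r"[a-z0-9+#]+", text.lower()) if t not in STOPWORDS}


def overlap_score(job_description: str, resume_text: str) -> float:
    """Basic variant: share of JD keywords that also appear in the resume."""
    job_tokens = extract_keywords(job_description)
    if not job_tokens:
        return 0.0
    overlap = job_tokens & extract_keywords(resume_text)
    return len(overlap) / len(job_tokens)


def idf_weighted_score(job_description: str, resume_text: str, corpus: list[str]) -> float:
    """IDF-weighted variant: each JD keyword is weighted by a smoothed inverse
    document frequency over a corpus of resumes, so rare terms count more."""
    job_tokens = extract_keywords(job_description)
    docs = [extract_keywords(d) for d in corpus]

    def idf(token: str) -> float:
        df = sum(token in d for d in docs)  # number of resumes containing token
        return math.log((1 + len(docs)) / (1 + df)) + 1.0

    total = sum(idf(t) for t in job_tokens)
    if total == 0:
        return 0.0
    matched = job_tokens & extract_keywords(resume_text)
    return sum(idf(t) for t in matched) / total


jd = "Python, AWS, Kubernetes, Django, Postgres"
corpus = ["python aws", "python django", "python kubernetes"]  # toy resume pool

print(overlap_score(jd, "Built Django services on AWS with Postgres"))  # 0.6
# Plain overlap ties these two (0.2 each); IDF weighting prefers the rarer term.
print(idf_weighted_score(jd, "postgres", corpus) > idf_weighted_score(jd, "python", corpus))  # True
```

In the toy pool every resume mentions “python” and none mentions “postgres”, so a resume matching only the rare term outranks one matching only the common term, even though the plain overlap score ties them.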

The three structural vulnerabilities of keyword matching

1. Adversarial resume poisoning. Any deterministic text-matching algorithm can be gamed by candidates who know the rules. If the model rewards the presence of specific keywords, candidates inject those keywords in hidden white text at the bottom of the resume, in repeated bullet points, or in fake skills sections. Keyword-based systems are structurally vulnerable to this; the model has no concept of whether the keyword is there because the candidate did the work or because they figured out the filter. Even semantic-similarity models are vulnerable, just to a slightly different attack: copy-paste the JD into the resume summary and the cosine distance collapses to near-zero.

2. False positives on weak candidates. A candidate whose resume mentions “Python” four times outranks a candidate whose resume describes shipping a real-time fraud detection system. The model rewards token density, not depth. The recruiter sees the false positive at the top of the rank and moves it forward; the strong candidate sits at position 47 and never gets read. The seven-second resume scan documented by Ladders, Inc. means recruiters do not have time to manually correct the rank.

3. False negatives on strong candidates. The Harvard Business School research on hidden workers estimated that roughly 27 million Americans are “hidden workers” who want work or more hours, and it identified automated screening on rigid criteria as a major reason qualified candidates are overlooked. A senior engineer who wrote “built a streaming service” instead of “architected a real-time data pipeline” disappears. A candidate with non-traditional vocabulary, a gap in their work history, or a different industry framing gets filtered out before any human reviews them.
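The three vulnerabilities above can be reproduced with toy scorers. The sketch is hypothetical throughout: `JD_TERMS`, the tokenizer, and both scoring functions are illustrative stand-ins for a keyword-based ATS, and `density_score` is a count-based variant included because a set-based overlap score ignores repetition entirely.

```python
import re
from collections import Counter

# Toy scorers (hypothetical), standing in for a keyword-based ATS.
JD_TERMS = {"python", "real-time", "pipeline"}


def tokens(text: str) -> list[str]:
    """Lowercase tokens; hyphens kept so 'real-time' stays one token."""
    return re.findall(r"[a-z0-9+#-]+", text.lower())


def overlap_score(resume: str) -> float:
    """Share of JD terms that appear at least once."""
    return len(JD_TERMS & set(tokens(resume))) / len(JD_TERMS)


def density_score(resume: str) -> float:
    """Count-based variant: JD-term hits per resume token, so repetition pays."""
    toks = tokens(resume)
    counts = Counter(toks)
    return sum(counts[t] for t in JD_TERMS) / max(len(toks), 1)


# 1. Poisoning: appending the JD terms (e.g. as hidden white text) maxes the score.
honest = "Maintained internal reporting tools"
poisoned = honest + " python real-time pipeline"
print(overlap_score(honest), overlap_score(poisoned))  # 0.0 1.0

# 2. False positive: repetition beats a described system.
padder = "Python Python Python Python"
builder = "Shipped a real-time fraud detection system in Go"
print(density_score(padder) > density_score(builder))  # True

# 3. False negative: similar work, different words.
print(overlap_score("Built a streaming service"))              # 0.0
print(overlap_score("Architected a real-time data pipeline"))  # 0.6666666666666666
```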

How outcome-based ranking works

Outcome-based ranking inverts the mechanism. Instead of comparing the resume’s text to the JD’s text, the model extracts measurable accomplishments from the resume in four categories and scores them against the role’s outcome requirements. The four categories: revenue (deals closed, ARR generated, savings produced), teams (people managed, orgs scaled), systems (built, shipped, scaled in production), and problems (solved, complex, novel). The model returns a 0–100 composite score with a written reasoning paragraph that names the specific outcomes it found. The paragraph is the part that turns the system from a black box into a tool a human can audit.
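A toy version of the four-category extraction and the 0–100 composite might look like the following. It is an assumption-laden sketch, not CurriculoATS's model: the regex patterns, the role weights, and the rule that a category saturates at two supporting bullets are all hypothetical, and a production system reading free-form bullets would need something far more robust than regexes. What the sketch does preserve is the structural point: the score only moves when a bullet pairs an action with a concrete outcome.

```python
import re

# Hypothetical, deliberately crude patterns: each category needs an action verb
# tied to a concrete object, so a bare skill name matches nothing.
OUTCOME_PATTERNS = {
    "revenue": re.compile(r"\b(closed|generated|grew|saved)\b.*(\$|\barr\b|revenue|savings)", re.I),
    "teams": re.compile(r"\b(led|managed|hired|mentored)\b.*\b(team|engineers|people|org)\b", re.I),
    "systems": re.compile(r"\b(built|shipped|launched|scaled)\b.*\b(system|pipeline|service|platform)\b", re.I),
    "problems": re.compile(r"\b(solved|resolved|reduced|cut)\b", re.I),
}


def extract_outcomes(bullets: list[str]) -> dict[str, list[str]]:
    """Map each outcome category to the bullets that evidence it."""
    found: dict[str, list[str]] = {c: [] for c in OUTCOME_PATTERNS}
    for bullet in bullets:
        for category, pattern in OUTCOME_PATTERNS.items():
            if pattern.search(bullet):
                found[category].append(bullet)
    return found


def composite_score(found: dict[str, list[str]], role_weights: dict[str, int]) -> int:
    """0-100 composite: role-weighted evidence per category, where each
    category's strength saturates at two supporting bullets."""
    total = sum(role_weights.values())
    if total == 0:
        return 0
    earned = sum(w * min(len(found.get(c, [])), 2) / 2 for c, w in role_weights.items())
    return round(100 * earned / total)


role = {"systems": 5, "teams": 3, "revenue": 2}  # hypothetical role requirements
resume = [
    "Shipped a real-time fraud detection pipeline that processed 2M events/sec",
    "Led a team of six engineers",
]
found = extract_outcomes(resume)
print(composite_score(found, role))  # 40
print(composite_score(extract_outcomes(["Python, Python, Python"]), role))  # 0
```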

Architecturally, the model is reading the bullet rather than the keyword. “Shipped a real-time fraud detection pipeline that processed 2M events/sec” registers as a strong systems-shipped signal regardless of whether the bullet uses the words “streaming,” “pipeline,” “real-time,” or any specific framework. “Python, Python, Python” registers as nothing, because there is no outcome behind it. The two candidates produce inverted ranks compared to a keyword model, and the inversion is the entire point.
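The rank inversion can be shown with two one-line resumes. Both scorers below are hypothetical toys: `keyword_score` counts occurrences of made-up JD terms, and `outcome_score` is a single crude systems-shipped pattern. The builder's bullet contains none of the JD terms yet still registers on the outcome side.

```python
import re

JD_TERMS = {"python", "streaming", "kafka"}  # hypothetical JD keywords
SYSTEMS_SHIPPED = re.compile(r"\b(built|shipped|scaled)\b.*\b(pipeline|system|service)\b", re.I)


def keyword_score(resume: str) -> int:
    """Keyword side: count JD-term occurrences, so repetition pays."""
    return sum(tok in JD_TERMS for tok in re.findall(r"[a-z]+", resume.lower()))


def outcome_score(resume: str) -> int:
    """Outcome side (toy): 1 if the text describes a shipped system, else 0."""
    return int(bool(SYSTEMS_SHIPPED.search(resume)))


candidates = {
    "padder": "Python, Python, Python",
    "builder": "Shipped a real-time fraud detection pipeline that processed 2M events/sec",
}
by_keyword = sorted(candidates, key=lambda c: keyword_score(candidates[c]), reverse=True)
by_outcome = sorted(candidates, key=lambda c: outcome_score(candidates[c]), reverse=True)
print(by_keyword)  # ['padder', 'builder']
print(by_outcome)  # ['builder', 'padder']
```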

Why outcome-based ranking is harder to game

Adversarial robustness is the cleanest way to compare the two architectures. A keyword-based model can be fooled by any candidate who pastes tokens into a resume; the model has no defense, because the input it reads is the input the candidate controls. An outcome-based model can only be fooled by a candidate who fabricates accomplishments, which is a categorically harder attack. Inventing revenue numbers or claiming systems that were never shipped is fraud. Stuffing keywords is not. That jump in the cost of attack is what makes the outcome-based model resistant.

This is the same principle Amazon’s recommendation systems used a decade ago to protect against listing manipulation by sellers. Sellers can write whatever they want in a product title; they cannot fabricate purchase data, return rates, or verified review patterns. The ranker reads the harder-to-fabricate signal. Before CurriculoATS, our founder Dev spent years at Amazon working on search and recommendations, and the principle moved over almost intact: rank on signals the input cannot fabricate cheaply.

How the two architectures handle compliance differently

Modern hiring AI sits inside two regulatory regimes that matter for any startup hiring in NYC or the EU. NYC Local Law 144 requires bias audits and candidate notice for automated employment decision tools used in NYC, with annual third-party audits and public summary reporting. The EU AI Act classifies hiring AI as high-risk under Annex III, with obligations enforceable from August 2026, including mandatory risk assessments, technical documentation, bias testing, and human-oversight requirements.
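The bias audit under NYC Local Law 144 centers on impact ratios. Under the city's implementing rules (DCWP), a category's impact ratio is its selection rate divided by the selection rate of the most-selected category; for tools that output a score, the rules substitute a scoring rate, the share of a category scoring above the median of the full sample. The sketch below computes both. Category names, counts, and scores are made up, and a real audit also covers intersectional categories and must be performed by an independent auditor.

```python
from statistics import median


def impact_ratios(selected: dict[str, int], applicants: dict[str, int]) -> dict[str, float]:
    """Selection rate per category divided by the highest category's selection rate."""
    rates = {c: selected.get(c, 0) / n for c, n in applicants.items() if n > 0}
    best = max(rates.values(), default=0.0)
    return {c: (r / best if best else 0.0) for c, r in rates.items()}


def scoring_rate_ratios(scores: dict[str, list[float]]) -> dict[str, float]:
    """Score-based variant: share of each category scoring above the median of
    the pooled sample, divided by the highest category's share."""
    pooled = [s for group in scores.values() for s in group]
    if not pooled:
        return {}
    cut = median(pooled)
    rates = {c: sum(s > cut for s in g) / len(g) for c, g in scores.items() if g}
    best = max(rates.values(), default=0.0)
    return {c: (r / best if best else 0.0) for c, r in rates.items()}


# Made-up counts and scores, for illustration only.
applicants = {"group_a": 200, "group_b": 100}
selected = {"group_a": 40, "group_b": 10}
print(impact_ratios(selected, applicants))  # {'group_a': 1.0, 'group_b': 0.5}

scores = {"group_a": [80, 70, 60, 50], "group_b": [65, 55, 40, 30]}
print(scoring_rate_ratios(scores))
```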

Both regimes favor models with documented, explainable reasoning. An outcome-based ATS that produces a written paragraph per candidate satisfies the documentation and audit requirements far more cleanly than a black-box keyword model that returns a number with no explanation. The compliance cost difference between the two architectures is real: an outcome-based vendor can produce the audit artifacts directly from the model’s output, while a keyword-based vendor has to retrofit explanations or accept ongoing risk. This is not the marketing reason CurriculoATS uses the outcome-based architecture, but it is one of the reasons the architecture choice ages well.
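One way to read “audit artifacts directly from the model’s output” is a single immutable record per scored candidate that keeps the score and its reasoning together. The schema below is hypothetical: the field names, the validation rules, and the JSON serialization are assumptions for illustration, not a format required by either regime.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ScreeningRecord:
    """Hypothetical per-candidate audit record; all field names are illustrative."""
    candidate_id: str
    role_id: str
    score: int          # 0-100 composite score
    reasoning: str      # the written reasoning paragraph, stored verbatim
    model_version: str  # which model produced the score
    scored_at: str      # ISO-8601 UTC timestamp

    def __post_init__(self) -> None:
        # Reject records an auditor could not use.
        if not 0 <= self.score <= 100:
            raise ValueError("score must be between 0 and 100")
        if not self.reasoning.strip():
            raise ValueError("record is missing its reasoning paragraph")


def to_audit_json(record: ScreeningRecord) -> str:
    """Serialize one record as a stable, sorted-key JSON line."""
    return json.dumps(asdict(record), sort_keys=True)


rec = ScreeningRecord(
    candidate_id="cand-001",
    role_id="role-042",
    score=78,
    reasoning="Strong systems-shipped signal; weaker on revenue signal.",
    model_version="v1",
    scored_at=datetime.now(timezone.utc).isoformat(),
)
print(to_audit_json(rec))
```

A keyword model can fill in every field except `reasoning`, which is the point of the paragraph above: the explanation has to come from somewhere.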

What this means for a founder picking an ATS

The architectural difference shows up in the shortlist. Run the same hundred resumes through a keyword-based ATS and an outcome-based ATS and compare the top ten ranked candidates from each. The two lists will overlap by maybe 30–50%; the rest will be different people. The keyword list will include candidates who optimized their resumes for the filter. The outcome list will include candidates who actually shipped relevant work. The recruiter or founder reads one of those two lists.
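The overlap check takes a few lines. This sketch assumes each ATS can export a ranked list of candidate IDs (the IDs below are hypothetical). It reports the share of the top ten the two systems agree on and, more usefully, the names that appear in only one list, since those are the candidates where the architectures disagree.

```python
def top_k_overlap(rank_a: list[str], rank_b: list[str], k: int = 10) -> float:
    """Share of the top-k candidates that two rankings have in common."""
    size = min(k, len(rank_a), len(rank_b))
    if size == 0:
        return 0.0
    return len(set(rank_a[:size]) & set(rank_b[:size])) / size


def disagreements(rank_a: list[str], rank_b: list[str], k: int = 10) -> tuple[list[str], list[str]]:
    """Candidates in one top-k list but not the other: the ones worth reading."""
    a, b = set(rank_a[:k]), set(rank_b[:k])
    return sorted(a - b), sorted(b - a)


# Hypothetical candidate IDs ranked by each system.
keyword_rank = [f"c{i:02d}" for i in range(0, 10)]
outcome_rank = [f"c{i:02d}" for i in range(6, 16)]
print(top_k_overlap(keyword_rank, outcome_rank))  # 0.4
only_keyword, only_outcome = disagreements(keyword_rank, outcome_rank)
print(only_keyword)  # ['c00', 'c01', 'c02', 'c03', 'c04', 'c05']
print(only_outcome)  # ['c10', 'c11', 'c12', 'c13', 'c14', 'c15']
```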

Pricing reflects positioning rather than a capability gap. Workable starts at $149/mo and scales by employee bracket. Greenhouse runs roughly $50–$150 per seat per month, according to buyer reports on PriceLevel, with median contracts around $12,250/year. CurriculoATS Pro is $100/mo (currently $50/mo early bird) with unlimited team members. Against Greenhouse’s median contract, that is roughly an order of magnitude less on the relevant startup tier ($1,200/year at the Pro list price), though Workable’s entry price is much closer. Either way, the screening architecture is structurally different.
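The order-of-magnitude comparison is simple arithmetic on the figures quoted above. The 10-seat Greenhouse example is a hypothetical team size; real quotes depend on plan, seat count, and negotiation.

```python
# List prices as quoted in the text; real quotes vary by plan, seats, and contract.
GREENHOUSE_MEDIAN_PER_YEAR = 12_250
WORKABLE_START_PER_MONTH = 149
CURRICULO_PRO_PER_MONTH = 100
CURRICULO_EARLY_BIRD_PER_MONTH = 50


def per_year(per_month: int) -> int:
    """Annualize a monthly price."""
    return per_month * 12


def greenhouse_seat_range(seats: int) -> tuple[int, int]:
    """Annual range implied by the reported $50-$150 per seat per month."""
    return seats * 50 * 12, seats * 150 * 12


pro = per_year(CURRICULO_PRO_PER_MONTH)           # 1200
early = per_year(CURRICULO_EARLY_BIRD_PER_MONTH)  # 600
print(GREENHOUSE_MEDIAN_PER_YEAR / pro)           # about 10.2x
print(GREENHOUSE_MEDIAN_PER_YEAR / early)         # about 20.4x
print(per_year(WORKABLE_START_PER_MONTH) / pro)   # about 1.5x
print(greenhouse_seat_range(10))                  # (6000, 18000), hypothetical 10 seats
```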

Founder questions

Can a keyword-based ATS just upgrade to outcome-based ranking?

Not without a rewrite. The two architectures use different model topologies, different training data, and different output formats. Most legacy ATS were built around the keyword filter as the core ranking primitive; replacing it requires rebuilding the screening pipeline. This is why most vendors layer marketing language (‘AI-powered’) on top of the keyword model rather than actually changing the architecture.

Does outcome-based ranking work for entry-level roles?

Yes, but the signal looks different. An entry-level candidate’s outcomes come from internships, projects, and academic work. The model reads those the same way it reads a senior candidate’s revenue and team signals. The score is calibrated to the role; an entry-level resume is not penalized for lacking ARR signals on a role that does not require them.
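Role calibration can be sketched as normalizing the score over only the categories a role actually weights. The weights and per-category signal strengths below are hypothetical values chosen for illustration; the point is that a category the role does not weight (ARR for an entry-level role) cannot drag the score down.

```python
def calibrated_score(signals: dict[str, float], role_weights: dict[str, int]) -> int:
    """0-100 score normalized over only the categories the role weights.

    signals: hypothetical per-category evidence strength in [0, 1].
    Categories the role does not weight are ignored, not counted as zero.
    """
    total = sum(role_weights.values())
    if total == 0:
        return 0
    earned = sum(w * signals.get(c, 0.0) for c, w in role_weights.items())
    return round(100 * earned / total)


# Hypothetical weights: the entry-level role does not ask for revenue or team signals.
entry_level_role = {"systems": 3, "problems": 2}
senior_role = {"systems": 3, "problems": 2, "revenue": 3, "teams": 2}

# e.g. a shipped internship project and a capstone; no ARR, no direct reports.
new_grad = {"systems": 0.8, "problems": 0.6}

print(calibrated_score(new_grad, entry_level_role))  # 72
print(calibrated_score(new_grad, senior_role))       # 36
```

The same evidence scores 72 against the entry-level role and 36 against the senior one: missing ARR and team signals only count where the role asks for them.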

Is semantic similarity actually a kind of outcome-based model?

No. Semantic similarity is still text-to-text matching; it just uses learned embeddings instead of literal token overlap. The model has no concept of whether the candidate did the work, only of whether the candidate’s text is close to the JD’s text. A copy-pasted JD in the resume summary scores high on semantic similarity with zero actual experience. An outcome-based model reads what the bullet describes, not how close the bullet is to the JD’s wording.
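The copy-paste attack can be illustrated with cosine similarity. As a loud assumption, the sketch uses bag-of-words count vectors as a stand-in for learned sentence embeddings. Real embeddings would also give the genuine resume some similarity by catching paraphrase, but they share the property that matters here: a pasted JD sits at a cosine similarity near 1 (a cosine distance near 0) to the JD itself.

```python
import math
import re
from collections import Counter


def bow(text: str) -> Counter:
    """Bag-of-words count vector (stand-in for a learned embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Dot product over the product of vector norms; 0.0 for empty vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


jd = "Senior engineer to design real-time data pipelines in Python on AWS"
pasted = jd + ". " + jd  # JD copy-pasted into the summary, zero experience
real = "Shipped a fraud detection service that processed 2M events per second"

print(cosine_similarity(bow(jd), bow(pasted)))  # approximately 1.0
print(cosine_similarity(bow(jd), bow(real)))    # 0.0
```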

How does the outcome-based reasoning paragraph actually work?

The model produces a 60–100 word paragraph that names the specific resume content driving the score. An abbreviated example: “Strong systems-shipped signal: built and scaled a real-time fraud detection pipeline at $employer; led a team of six engineers; shipped to production in eight months. Weaker on revenue signal: no ARR or savings figures listed. Career trajectory consistent with the role’s seniority requirement.” That is what makes the rank trustworthy.

Can I see the comparison in action?

Yes. Run the same role through your current keyword-based ATS and the free Starter tier of CurriculoATS, then compare the top ten lists. The overlap is the most informative demo you can do. See resume screening for the model details.

What to do next

If you are evaluating ATS options right now, the architectural choice is the most important variable, and it is the one most demos paper over. Read the impact scoring page for the technical mechanism, then check the compare page for how CurriculoATS stacks up against Greenhouse, Lever, Workable, Ashby, and Manatal on the screening layer specifically. The Starter tier is free and is enough to verify the architectural difference on a real role.