We test our detector on thousands of human-written and AI-generated texts from the public RAID benchmark, including paraphrased, edited, and disguised content. Here’s what it actually does.
Real strengths, in plain language.
Trained on text from ChatGPT, Claude, Gemini, Llama, Mistral, and other widely used AI assistants. On text from today's top commercial models, it makes the right call close to 100% of the time.
Not fooled by look-alike characters (Cyrillic 'а' swapped for Latin 'a'), deliberate typos, weird spacing, or article deletions. These tricks have near-zero effect on the score.
When the score says 'very confident', it means it. A 90% score really means about 90% likely AI, not 50% rounded up, not 99% inflated. The percentage you see matches reality.
If wrongly flagging a human is the worst outcome, a tighter setting flags humans incorrectly only about 1 time in 100, while still catching roughly 98 of 100 AI texts.
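To show what a "tighter setting" means in practice, here is a minimal sketch of how a score cut-off can be chosen to hit a target false-positive rate on a labeled validation set. The function names and the toy scores are hypothetical and only illustrate the idea; this is not our production code.

```python
def threshold_for_fpr(human_scores, target_fpr):
    """Return a cut-off such that flagging scores strictly above it
    flags at most target_fpr of the human validation texts."""
    ranked = sorted(human_scores, reverse=True)
    allowed = int(target_fpr * len(ranked))  # how many humans we may flag
    if allowed >= len(ranked):
        return 0.0
    # At most `allowed` scores are strictly greater than ranked[allowed].
    return ranked[allowed]


def rates(human_scores, ai_scores, threshold):
    """False-positive rate and catch rate when flagging scores above threshold."""
    fpr = sum(s > threshold for s in human_scores) / len(human_scores)
    tpr = sum(s > threshold for s in ai_scores) / len(ai_scores)
    return fpr, tpr


# Toy validation data (made up for illustration only).
human_scores = [i / 100 for i in range(100)]   # 0.00 .. 0.99
ai_scores = [0.995] * 98 + [0.5] * 2

cutoff = threshold_for_fpr(human_scores, target_fpr=0.01)
fpr, tpr = rates(human_scores, ai_scores, cutoff)
print(cutoff, fpr, tpr)
```

Lowering the target rate pushes the cut-off up, which protects human writers at the cost of missing some AI texts. That is the trade-off behind the tighter setting.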
What our detector struggles with. Every AI detector has limits, and we'd rather tell you ours.
If someone takes AI output and substantially rewrites it in their own words, we catch about 8 in 10 of those and miss roughly 2 in 10. This is the hardest case in AI detection, and no tool solves it cleanly today.
Works best on paragraphs and full passages, roughly 250 words and up. Single sentences don't give the model enough to commit to a verdict, and we'll tell you when there's too little signal.
About 6 of every 100 genuinely human-written texts can be flagged as AI. For decisions that affect someone's grade, job, or reputation, treat our score as evidence, not a verdict. Always combine with human review.
Trained and benchmarked on English. We don't make claims about other languages until we've tested them properly.
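Why treat a flag as evidence rather than a verdict? How much a flag means depends on how common AI text is in the pile being checked. The sketch below works through that arithmetic with Bayes' rule. The 6% and 1% false-flag rates and the 98% catch rate come from the figures above. The 10% share of AI texts is an assumption chosen for illustration, and so is reusing the 98% catch rate for the default setting. The function name is hypothetical.

```python
def prob_ai_given_flag(prevalence, catch_rate, false_flag_rate):
    """Bayes' rule: of all flagged texts, what share is really AI?"""
    true_flags = prevalence * catch_rate
    false_flags = (1 - prevalence) * false_flag_rate
    if true_flags + false_flags == 0:
        raise ValueError("nothing would be flagged with these rates")
    return true_flags / (true_flags + false_flags)


# Assumption: 10% of submitted texts are AI-written.
default_setting = prob_ai_given_flag(0.10, catch_rate=0.98, false_flag_rate=0.06)
tight_setting = prob_ai_given_flag(0.10, catch_rate=0.98, false_flag_rate=0.01)

print(round(default_setting, 2))  # about 0.64
print(round(tight_setting, 2))    # about 0.92
```

Under these assumptions, roughly one in three flags at the default setting would land on a human writer, which is why a human should always review before a decision is made.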
Our detector reads your entire text in context: not sentence by sentence in isolation, but as a whole, the way a careful reviewer would. It looks at patterns most AI assistants share: word choices, sentence rhythm, transitions, and the small stylistic tells they leave behind.
The output is a single probability from 0 to 100% plus per-sentence highlights, so you can see where the signal is coming from instead of just an opaque verdict.
The values that guide how we build this tool.
We prioritize minimizing false positives. Mislabeling human-written text as AI-generated can have serious consequences, so precision comes before recall in our pipeline.
Submitted texts are not stored permanently or used for training. We process content for analysis and return results. Your writing stays yours.
AI detection is probabilistic, not absolute. We calibrate scores so the percentage you see matches the real probability, and we publish what the tool can and can't do.
We regularly benchmark our models against established datasets and real-world adversarial examples. As AI writing evolves, our detection keeps pace.
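For readers who want to see what "calibrated" means in numbers, here is a minimal sketch of a common check: group texts by score, then compare the average score in each group with the share of texts in that group that really were AI-written. The function names and toy data are hypothetical. This illustrates the general technique, not our internal evaluation code.

```python
from collections import defaultdict


def reliability_table(scores, labels, n_bins=10):
    """Bin scores (0..1) into equal-width bins and return, per bin,
    (bin index, count, mean score, observed fraction of AI texts)."""
    bins = defaultdict(list)
    for score, label in zip(scores, labels):
        idx = min(int(score * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        bins[idx].append((score, label))
    rows = []
    for idx in sorted(bins):
        pairs = bins[idx]
        mean_score = sum(s for s, _ in pairs) / len(pairs)
        ai_fraction = sum(y for _, y in pairs) / len(pairs)
        rows.append((idx, len(pairs), mean_score, ai_fraction))
    return rows


def expected_calibration_error(scores, labels, n_bins=10):
    """Size-weighted average gap between mean score and observed AI fraction."""
    total = len(scores)
    table = reliability_table(scores, labels, n_bins)
    return sum(n / total * abs(mean - frac) for _, n, mean, frac in table)


# Toy data: ten texts all scored 0.9; labels use 1 = AI, 0 = human.
well_calibrated = expected_calibration_error([0.9] * 10, [1] * 9 + [0])
overconfident = expected_calibration_error([0.9] * 10, [1] * 5 + [0] * 5)
print(well_calibrated, overconfident)
```

A gap near zero means the scores can be read as real probabilities. A large gap means the percentages are inflated or deflated.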
RAID (Robust AI Detection) is an independent public benchmark that pits AI detectors against thousands of real human-written and AI-generated texts, plus about a dozen ways people try to evade detection: paraphrasing, character swaps, deliberate typos, and more.
We run our model on RAID regularly and publish what it scores. We don’t grade our own homework.
Paste any text and get an instant AI detection analysis. Free, no account needed.
Sentence-level highlights included.