What does the Content Gap Quick-Scan actually measure?
It pulls both URLs server-side, strips nav/footer/script/style noise, then for each page extracts: the heading outline (H1 to H4), word count of the body content, the top 20 most-frequent meaningful words (3+ letters, stopwords removed), and the count of internal versus external links. The two pages are then put side by side and the tool produces three keyword lists: keywords your competitor uses but you don't, keywords you use but they don't, and keywords you both use.
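The three keyword lists are plain set arithmetic over each page's extracted keyword set. A minimal sketch (the keyword extraction itself is assumed to have already happened; the example inputs are made up):

```python
def keyword_gaps(yours: set[str], theirs: set[str]) -> dict[str, set[str]]:
    """Split two keyword sets into the three lists the scan reports."""
    return {
        "they_have_you_dont": theirs - yours,   # competitor-only keywords
        "you_have_they_dont": yours - theirs,   # your unique keywords
        "shared": yours & theirs,               # common ground
    }

# Hypothetical keyword sets for illustration:
gaps = keyword_gaps({"pricing", "setup", "api"}, {"pricing", "migration", "api"})
```

The "they have, you don't" bucket is the one most answers below refer back to, since it points at subtopics you may have skipped.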
A bigger word count means I should make my page longer, right?
Not automatically. Word count is a coarse signal: if your page is 800 words and the ranking competitor is 3,400, you probably do need more depth, but stuffing fluff to hit a target backfires. Use the heading outline diff and the "they have, you don't" keyword list to see which subtopics they cover that you skipped. Add sections that genuinely answer related questions, not paragraphs of filler. Quality plus completeness beats word count on its own every time.
How are the top keywords picked - is there any NLP behind this?
It is a frequency count, not real NLP. The body text is lower-cased, tokenised on whitespace and punctuation, filtered against a stopword list (the, and, is, etc.) and a minimum length of three letters, then ranked by how often each remaining word appears. That is enough to surface the dominant topics on most pages, but it can be fooled by repeated brand names, navigation phrases that survived the strip, or very technical jargon. Treat the lists as a starting point for human judgement, not a definitive topic model.
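The whole pipeline fits in a few lines. This is a simplified sketch, not the tool's actual code, and the stopword list here is a tiny stand-in for the real one:

```python
import re
from collections import Counter

STOPWORDS = {"the", "and", "is", "for", "with", "that", "this"}  # real list is much longer

def top_keywords(text: str, n: int = 20) -> list[tuple[str, int]]:
    """Lower-case, keep runs of 3+ letters (which also splits on punctuation
    and whitespace), drop stopwords, rank by frequency."""
    tokens = re.findall(r"[a-z]{3,}", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)
```

Note how the `{3,}` quantifier enforces the minimum length and the letters-only pattern does the tokenisation at the same time; a repeated brand name sails straight through this filter, which is exactly how the lists get skewed.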
Why might the tool produce sparse results for a page I know is rich?
The body extractor strips elements it identifies as boilerplate - header, nav, footer, script, style, aside. If a page wraps its main content in non-semantic <div>s without the right ARIA roles, some of that content can be mis-classified as boilerplate and dropped. Likewise, single-page apps that load content via JavaScript will appear nearly empty because server-side fetching only sees the initial HTML shell. If results look thin, view-source on the URL and confirm the content is actually present in the raw HTML.
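To see why non-semantic markup trips this up, here is a stripped-down illustration of tag-based boilerplate removal using only the standard library (the tool's real heuristics are more involved, but the failure mode is the same: anything inside a skipped element is gone):

```python
from html.parser import HTMLParser

BOILERPLATE = {"header", "nav", "footer", "script", "style", "aside"}

class BodyExtractor(HTMLParser):
    """Collect text, skipping anything nested inside a boilerplate element."""
    def __init__(self):
        super().__init__()
        self.depth = 0      # number of boilerplate ancestors currently open
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in BOILERPLATE:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in BOILERPLATE and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())

def extract_body(html: str) -> str:
    parser = BodyExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)
```

A JavaScript-rendered page simply never puts its content into the HTML this parser sees, which is why the server-side fetch comes back nearly empty.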
Should I copy the competitor's entire heading outline?
No - copying outlines produces derivative pages that struggle to differentiate. Use the side-by-side outline as a coverage check: are there subtopics or H2/H3 sections they include that you genuinely skipped? Add those if they fit your angle, but keep your own structure and voice. The "they have, you don't" keyword list is more useful than literal heading copying, because it points to themes rather than specific phrasings, leaving you free to write the headings in your own words.
Internal vs external link counts - what should I do with them?
Internal link count tells you how aggressively each page sends users deeper into its own site (ranking pages typically have 5-30 internal links). External link count signals trust and citation behaviour (1-5 outbound to authoritative sources is healthy; zero looks insular and 50+ looks like a link farm). If your page has notably fewer of either category than the competitor, audit the body for natural places to link out to references and back to your own related content.
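The internal/external split comes down to comparing each link's host against the page's own host after resolving relative URLs. A minimal sketch of that classification (function name and thresholds above are the tool's; this implementation is illustrative):

```python
from urllib.parse import urljoin, urlparse

def classify_links(page_url: str, hrefs: list[str]) -> tuple[int, int]:
    """Return (internal, external) link counts relative to the page's host."""
    host = urlparse(page_url).netloc
    internal = external = 0
    for href in hrefs:
        target = urlparse(urljoin(page_url, href))  # resolves relative hrefs
        if target.scheme not in ("http", "https"):
            continue  # skip mailto:, tel:, javascript:, etc.
        if target.netloc == host:
            internal += 1
        else:
            external += 1
    return internal, external
```

One caveat when comparing against a competitor: subdomains (`blog.example.com` vs `example.com`) count as external under a strict host match like this, so audit surprising numbers by hand.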