A regional information portal launches with 750 pages, correctly configured SSR, valid sitemaps, proper canonical tags, and no technical blocks — yet Google indexes nothing beyond the homepage. The diagnostic reveals that the barrier is not technical. It is economic. Google’s retrieval system evaluates every URL through a cost-to-value calculation that accounts for crawl demand, information gain, domain authority signals, and page-level retrieval cost, as well as historical user data, of which a freshly launched domain has none. When all factors return negative, the system withholds crawl resources entirely — regardless of how technically sound the site is. This case study traces the diagnostic process, identifies the compounding mechanisms, and maps the path from zero indexation to retrieval eligibility — the threshold a page must cross before ranking factors even apply.
That framing matters beyond this single portal. Any company launching a new content vertical — whether a SaaS knowledge base, a marketplace category expansion, or an aggregator entering a new market — faces the same retrieval economics. The technical audit passes. The pages look right. And Google ignores them.
The Site and Its Indexation Problem
The subject is a newly launched regional portal aggregating cultural events, places, and local news. Each place page presents contact details, opening hours, location data with Google Maps integration, public transport stops, and a 12-month event calendar. The technology stack runs Nuxt 3 with Server-Side Rendering confirmed both by source code inspection and Sitebulb’s Response vs Render analysis, which returned a 93/100 score with zero meta robots changes across 757 URLs. The server delivers fully pre-rendered HTML with structured data in JSON-LD format.
For weeks after launch, only the homepage entered Google’s index. Every subpage — places, events, news, categories — remained outside. No manual actions appeared in Google Search Console. No security issues flagged. The sitemaps processed successfully.
The instinct in most SEO diagnostics is to look for a technical block. A misconfigured robots.txt. A noindex tag hiding somewhere in the rendered DOM. A JavaScript rendering failure the crawler can’t parse. None of that applied here. The server returned status 200 with correct content-type headers, self-referencing canonicals checked out, and the URL Inspection tool’s live test confirmed Google could access, render, and read every page it was pointed at.
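For teams who want to reproduce that baseline check, a minimal sketch is shown below: it fetches the raw server HTML (no JavaScript execution) and reports status, content type, meta robots, canonical, and JSON-LD presence. The URL, the requests/BeautifulSoup tooling, and the chosen fields are illustrative assumptions, not the audit's actual toolchain.

```python
# Minimal pre-render sanity check: fetch the raw server HTML (no JS execution)
# and report the technical signals discussed above. URL and tooling are
# illustrative assumptions, not the audit's actual toolchain.
import requests
from bs4 import BeautifulSoup

def check_server_html(url: str) -> dict:
    resp = requests.get(url, timeout=10, headers={"User-Agent": "audit-sketch/0.1"})
    soup = BeautifulSoup(resp.text, "html.parser")

    robots = soup.find("meta", attrs={"name": "robots"})
    canonical = soup.find("link", attrs={"rel": "canonical"})

    return {
        "status": resp.status_code,                                  # expect 200
        "content_type": resp.headers.get("Content-Type"),            # expect text/html
        "meta_robots": robots.get("content") if robots else None,    # should not contain 'noindex'
        "canonical": canonical.get("href") if canonical else None,   # expect self-reference
        "has_jsonld": soup.find("script", attrs={"type": "application/ld+json"}) is not None,
    }

if __name__ == "__main__":
    # hypothetical URL, for illustration only
    print(check_server_html("https://example-portal.example/places/sample-place"))
```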
This is where many audits would stop — and that’s precisely where the actual diagnostic begins.
What the Crawl Data Revealed
A full crawl with JavaScript rendering enabled showed minimal discrepancy between server response and rendered DOM. The canonical delta was near-zero: 755 URLs unchanged, 2 modified, 1 created. SSR was functioning as intended. The rendering layer was not the problem.
Google Search Console’s Pages report told a different story. The “Why pages aren’t indexed” distribution broke down as follows:
| Status | URL Count |
|---|---|
| Discovered – currently not indexed | 342 |
| Crawled – currently not indexed | 13 |
| Redirect error | 5 |
| Page with redirect | 3 |
Six sitemaps had been submitted, collectively declaring over 750 URLs across news, events, event categories, places, and static pages. All returned “Success” status — Google read and processed every one. Not a single subpage reached the index.
The link portfolio told the rest of the story. Ahrefs reported a Domain Rating of 9, with 228 referring domains and 1,156 backlinks — but zero organic traffic and zero organic keywords. A site: search on the domain returned only the homepage. None of the institutions whose profiles the portal hosted linked back to their own pages on the site. No external mentions of any subpage existed anywhere.
Two additional issues surfaced in the code. The JSON-LD structured data on place pages contained an AI prompt artifact in the description field — a string fragment clearly generated during development and never cleaned. Discrepancies between Schema.org data and visible page content included mismatched URLs, phone numbers, and email addresses. The structured data declared isAccessibleForFree: false while a visible badge read “Free admission.” Beyond the structured data, breadcrumbs on place pages linked to categories under /events/ rather than /places/, the main descriptive section lacked an H2 heading, and rel="nofollow" was applied to links pointing at the institutions’ own official websites.
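Discrepancies of this kind can be surfaced programmatically. The sketch below compares a few JSON-LD fields against the visible page text; the field list, the "Free admission" pattern, and the extraction logic are assumptions made for illustration, since a real audit would map each Schema.org property to its specific on-page counterpart.

```python
# Sketch: surface mismatches between JSON-LD fields and the visible page text.
# Field names and matching logic are illustrative assumptions.
import json
import re
import requests
from bs4 import BeautifulSoup

def extract_jsonld(soup: BeautifulSoup) -> list[dict]:
    blocks = []
    for tag in soup.find_all("script", attrs={"type": "application/ld+json"}):
        try:
            blocks.append(json.loads(tag.string or ""))
        except json.JSONDecodeError:
            blocks.append({"_error": "invalid JSON-LD"})
    return blocks

def check_place_page(url: str) -> list[str]:
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    visible_text = soup.get_text(" ", strip=True)
    findings = []

    for data in extract_jsonld(soup):
        if not isinstance(data, dict):
            continue
        for field in ("telephone", "email", "url"):
            value = data.get(field)
            if value and str(value) not in visible_text:
                findings.append(f"{field} in JSON-LD ({value}) not found in visible content")
        # e.g. isAccessibleForFree: false while the page shows a "Free admission" badge
        if data.get("isAccessibleForFree") is False and re.search(r"free admission", visible_text, re.I):
            findings.append("isAccessibleForFree=false contradicts visible 'Free admission' badge")

    return findings
```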
These quality issues matter — but they are secondary to the structural diagnosis that the crawl data demands.
The 342:13 Ratio in Discovered – Currently Not Indexed
This ratio is the diagnostic key. It separates two fundamentally different indexation failures.
The 342 URLs sitting in “Discovered – currently not indexed” status mean Google knows they exist but never fetched them. The crawl scheduler looked at these URLs, weighed the predicted value against the crawl cost, and decided not to invest the resources. This is not a content quality judgment. Google has not seen the content — it made a pre-fetch decision based on domain-level signals and predicted information gain.
Contrast that with the 13 URLs sitting in “Crawled – currently not indexed.” Google fetched those pages, evaluated their content, and explicitly decided they did not merit inclusion in the index. A quality-level rejection — fundamentally different from never being fetched at all.
What the 342:13 split reveals — 96% of unindexed URLs never even fetched — is that the primary barrier sits at the crawl scheduling stage, not the content evaluation stage. Most SEO practitioners would focus on improving page quality to resolve an indexation problem. Here, that approach addresses only the 13. The 342 need a different intervention entirely: the domain must first earn the crawl investment before page-level quality even enters the equation.
For a B2B SaaS company launching a new documentation hub or resource center, this distinction is the difference between two capital allocation strategies for organic growth — one targets content quality, the other targets domain-level authority signals. If your new pages are being crawled and rejected, invest in content quality. If they’re discovered and never fetched, the investment needs to go into domain-level authority and external demand signals first — content improvements on pages Google hasn’t read are wasted spend.
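As a rough way to operationalize that decision rule, the sketch below tallies a Search Console "Why pages aren't indexed" export and flags which investment the split points to. The CSV path, the column name, the reason-string matching, and the 80% threshold are assumptions, not part of the original diagnostic.

```python
# Sketch: tally a Search Console "Why pages aren't indexed" CSV export and apply
# the decision rule above. File path, column name, reason strings, and the 80%
# threshold are assumptions; adjust them to the actual export.
import csv
from collections import Counter

def primary_barrier(csv_path: str) -> str:
    counts: Counter = Counter()
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            reason = row.get("Reason", "").lower()
            if "discovered" in reason and "not indexed" in reason:
                counts["discovered"] += 1
            elif "crawled" in reason and "not indexed" in reason:
                counts["crawled"] += 1

    total = counts["discovered"] + counts["crawled"]
    if total == 0:
        return "No 'Discovered/Crawled - currently not indexed' rows in this export."

    never_fetched = counts["discovered"] / total
    # 342:13 in this case study: ~96% never fetched, so the barrier is crawl demand
    if never_fetched > 0.8:
        return f"{never_fetched:.0%} never fetched: prioritize domain-level demand signals."
    return f"{1 - never_fetched:.0%} fetched and rejected: prioritize page-level content quality."
```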
A Negative Cost-to-Value Loop
No single cause explains the outcome; four forces act in combination.
The most fundamental: zero information gain. Every data point on the portal’s place pages — addresses, opening hours, contact details, transit stops, institutional descriptions — already exists in Google’s index from more authoritative sources. Official institution websites, Google Maps, Google Business Profiles, Facebook pages, and public transit authority feeds all provide this data. The portal aggregates it without adding information that would expand the index’s coverage. From the retrieval system’s perspective, indexing these pages produces no marginal value.
Google’s information gain scoring — described in patent literature as evaluating how much new information a page contributes relative to existing index coverage on the same topic — would assess these pages as near-zero gain. When a page’s content is a subset of what the index already holds from more authoritative sources, the system has no reason to allocate retrieval resources to it.
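Google's actual scoring is not public, but the underlying question can be approximated. The sketch below estimates how much of a candidate page's text is not already covered by existing authoritative sources, using simple word shingles; the shingle size and the interpretation are illustrative assumptions, not the method described in the patent literature.

```python
# Rough proxy for "information gain", not Google's actual scoring: how much of a
# candidate page's content is NOT already covered by more authoritative sources.
# The shingling approach and any thresholds are illustrative assumptions.
import re

def shingles(text: str, n: int = 5) -> set[tuple[str, ...]]:
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def novelty_ratio(candidate: str, existing_sources: list[str]) -> float:
    cand = shingles(candidate)
    if not cand:
        return 0.0
    covered = set()
    for source in existing_sources:
        covered |= cand & shingles(source)
    return 1 - len(covered) / len(cand)

# A place page whose address, hours, and description are copied from the official
# site and Google Business Profile would score near 0; original editorial near 1.
```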
Compounding the information gain problem: extremely low crawl demand. DR 9. Zero organic traffic. Zero keywords in the index. No external links pointing to any subpage. No branded queries targeting specific URLs. The sitemaps declare 750 URLs, but a sitemap creates supply — it does not create demand. Without behavioral signals, without external validation, without any indicator that users are searching for or navigating to these pages, Google’s crawl scheduler treats them as lowest priority.
Then there’s the cost side of the equation. Each page serves a heavy DOM — a double-rendered 12-month calendar (one for mobile, one for desktop), duplicated contact sections for responsive variants, thousands of lines of inline SVG for decorative icons, and extensive navigation markup. For a domain with near-zero crawl demand, the computational cost of fetching and processing such pages is disproportionate to the expected informational return.
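The weight argument is easy to quantify. A sketch like the one below measures raw HTML size, DOM node count, inline-SVG share, and preload directives for a rendered URL; the metrics chosen and any thresholds applied to them are assumptions, but they make the cost-reduction work discussed later in this article measurable before and after.

```python
# Sketch: quantify the cost side by measuring raw HTML weight, DOM node count,
# inline-SVG share, and preload directives for a page. Metric choices are
# illustrative assumptions.
import requests
from bs4 import BeautifulSoup

def page_cost_profile(url: str) -> dict:
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    svg_bytes = sum(len(str(svg)) for svg in soup.find_all("svg"))
    return {
        "html_kb": round(len(html.encode("utf-8")) / 1024, 1),
        "dom_nodes": len(soup.find_all(True)),
        "inline_svg_kb": round(svg_bytes / 1024, 1),
        "preload_links": len(soup.find_all("link", attrs={"rel": "preload"})),
    }
```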
The fourth force closes the loop: no fresh offsite signals. None of the institutions whose profiles are hosted on the portal link back. No social media citations. No mentions in other publications. This creates a self-reinforcing cycle — without external signals there is no crawl demand, without crawling there is no indexation, without indexation there is no traffic, and without traffic there are no external signals. The loop is closed.
These four factors do not operate independently. They compound. A site with low domain authority but genuinely unique content can still earn indexation — information gain provides the incentive. A site aggregating existing data but backed by strong external signals can still earn crawl investment — demand overrides the low-gain assessment. When both information gain and crawl demand are near zero simultaneously, the retrieval system has no economic reason to act.
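The compounding can be expressed as a toy model: treat the expected value of a crawl as the product of information gain and crawl demand, compared against a fixed fetch cost. This is a conceptual illustration, not a documented Google formula, and the numbers below are arbitrary.

```python
# Toy model of the compounding argument, not a documented Google formula.
# Expected value scales with BOTH information gain and crawl demand, so either
# factor near zero collapses the term while the fetch cost stays fixed.
def crawl_worthwhile(info_gain: float, crawl_demand: float, fetch_cost: float) -> bool:
    expected_value = info_gain * crawl_demand  # multiplicative: both factors must be non-zero
    return expected_value > fetch_cost

print(crawl_worthwhile(0.9, 0.3, 0.2))    # True: unique content offsets weak demand
print(crawl_worthwhile(0.1, 0.9, 0.05))   # True: strong demand offsets low gain
print(crawl_worthwhile(0.05, 0.02, 0.2))  # False: both near zero, the portal's position
```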
The Retrieval Systems Behind This Decision
The diagnosis maps directly onto documented Google systems — and understanding the scale at which those systems operate makes the cost-to-value logic unavoidable.
Google crawls trillions of pages. It indexes a fraction. During the US vs Google antitrust trial, Pandu Nayak — Google’s VP of Search — testified that the index contained approximately 400 billion documents as of 2020. That is not a record of the web. It is a curated selection. Nayak was explicit about the economics: page sizes are growing, metadata per document is growing, and for a fixed storage capacity, each increase means fewer documents can be indexed. As he put it under oath, there is a tradeoff between the volume of data, the diminishing returns of additional data, and the cost of processing it — and Google stops at the point where value stops justifying cost. A new regional portal with 750 pages of aggregated data that Google already holds from better sources sits on the wrong side of that equation. At planetary scale, every URL competes for retrieval resources against hundreds of billions of alternatives. The system is not ignoring this site — it is rationally deprioritizing it.
Google’s crawl budget documentation defines crawl demand as a function of URL popularity and staleness — how outdated the indexed data might be. For URLs that have never been indexed, staleness is undefined and popularity is zero. Crawl demand, by definition, approaches the minimum. This is confirmed fact, documented by Google on developers.google.com.
Navboost — confirmed through DOJ trial documents — uses behavioral user data including SERP clicks, time on page, and return-to-results patterns to modulate ranking and, based on available evidence, indirectly influence crawl priorities. For a domain with zero organic traffic, Navboost generates no positive signals. The behavioral feedback loop that would normally help a growing domain accelerate its crawl allocation simply does not activate.
Documents from the Google Leak and DOJ trial indicate that Google assigns quality scores at the domain level, not only to individual pages. References to “siteAuthority” suggest that a new domain with no ranking history, no entity-level authority signals, and no links from authoritative sources receives a low domain-level quality score — which depresses crawl priority for every URL under that domain regardless of individual page quality. This inference is well-supported by the leaked documentation, though the exact scoring mechanism remains undisclosed.
The Helpful Content system adds another layer. Google’s documentation describes it as a classifier evaluating whether content was created primarily for users or for search engines. One documented negative signal is a large volume of pages with repetitive structure and minimal unique content. The portal’s place page template — identical UI structure, identical calendar and map components, with only a single paragraph of unique description varying between pages — fits this pattern closely.
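One way to check whether a template fits that pattern is to measure how much of each page's text is boilerplate shared across sibling pages versus unique to the page. The sentence-frequency approach below is a crude proxy chosen for illustration; it is not the Helpful Content classifier.

```python
# Sketch: estimate how much of each page's text is template boilerplate shared
# across sibling pages versus unique to the page. A crude proxy for illustration,
# not the Helpful Content classifier.
import re
from collections import Counter

def sentences(text: str) -> list[str]:
    return [s.strip() for s in re.split(r"[.!?]\s+", text) if s.strip()]

def unique_content_share(pages: dict[str, str]) -> dict[str, float]:
    # Count how many pages each sentence appears on; sentences shared by many
    # pages are treated as template boilerplate.
    freq = Counter(s for text in pages.values() for s in set(sentences(text)))
    threshold = max(2, int(0.5 * len(pages)))
    shares = {}
    for url, text in pages.items():
        sents = sentences(text)
        unique = [s for s in sents if freq[s] < threshold]
        shares[url] = len(unique) / len(sents) if sents else 0.0
    return shares

# A place-page template where only one descriptive paragraph varies would score
# low for nearly every URL, matching the repetitive-structure pattern above.
```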
One point that the URL Inspection tool’s documentation makes explicit but many practitioners miss: the “What isn’t tested” section states that pages must conform to quality and security guidelines that the tool does not verify. A passing live URL test — “Google can access,” “Page is indexable” — is a necessary condition for indexation but not a sufficient one. The green checkmarks in URL Inspection create false confidence when the actual barriers are economic, not technical.
Shifting the Economics — Where Capital Should Go
Breaking the negative loop requires changing the cost-to-value calculation at each stage, prioritized by impact.
The highest-impact intervention is generating genuine information gain. The site must produce content that Google does not already hold from more authoritative sources. Original editorial content — reportages with unique perspective, thematic guides connecting multiple entities within a cultural district, post-event reviews, interviews with organizers — would shift the information gain score from near-zero to positive. This is not a question of “more content” but of content that expands index coverage. For a B2B SaaS company facing a similar indexation barrier on a new knowledge base, the same principle applies: if your documentation restates what the official API docs already say, the retrieval system has no reason to index a second copy. The value must be additive — unique analysis, integration examples, practitioner perspective.
Next in priority: building offsite demand signals. Obtaining links from the institutions whose profiles the site hosts would create the most targeted signal possible — an authoritative source confirming that the page about their institution has value. Foundational links from directories, industry platforms, and social profiles establish baseline domain authority. Social media activity pointing to specific subpages begins building the behavioral demand that feeds crawl scheduling. In capital allocation terms, the first $5,000 in this vertical should go entirely toward earning those institutional backlinks and establishing external sources of truth — not toward producing more aggregated pages.
Third: reducing the cost side of the equation. Moving the 12-month calendar to client-side lazy loading instead of SSR-rendering it in two responsive variants, eliminating DOM duplication for mobile and desktop sections, replacing inline SVGs with a sprite, and removing excessive preload directives would significantly slim the page weight. This doesn’t change the value side of the equation, but it improves the ratio — making each page cheaper for Google to process when crawl demand does arrive.
After the fundamental economics shift, the identified quality issues warrant attention. The AI artifact in the structured data, the data discrepancies between Schema.org markup and visible content, the broken breadcrumb hierarchy, the nofollow on institution links, the missing heading structure — all of these depress quality assessment. But fixing them in isolation, without addressing information gain and crawl demand, would not unlock indexation. You’d have a technically cleaner site that Google still ignores.
The sequence matters. Information gain and offsite signals first. Cost reduction second. Quality fixes third. That prioritization reflects how the retrieval pipeline actually evaluates — and it is the order in which capital should be deployed.
Frequently Asked Questions
Does “Discovered – currently not indexed” always mean low crawl demand?
In most cases, yes. This status indicates Google found the URL — through a sitemap or internal link — but the crawl scheduler has not allocated resources to fetch it. The most common causes are low domain authority, absence of external demand signals, and predicted low information gain. It is a crawl-priority decision, not a content-quality decision, because Google has not yet seen the content.
Can submitting more sitemaps or using the URL Inspection request fix this?
Neither addresses the underlying economics. Sitemaps declare URL supply — they do not create demand. The URL Inspection tool’s “Request Indexing” function can accelerate processing for individual URLs, but at scale (342 URLs in this case), it does not change the crawl scheduler’s domain-level priority assessment. The fix is structural: improve the signals that drive crawl demand.
How does this relate to the Helpful Content system?
The Helpful Content classifier evaluates whether pages are created primarily for users or for search engines. A template-driven site with hundreds of structurally identical pages and minimal unique content per page can trigger negative signals from this classifier. In this case, the Helpful Content assessment likely operates as a secondary factor — the primary barrier is crawl demand, but the template pattern would affect quality evaluation for the 13 URLs that were actually fetched and rejected.