data.world 67 C
🛡️ SEO 67 🤖 GEO 67 ⚡ Perf 59 🏗️ Arch 79

data.world — Global SEODiff Score 67/100


At 69/100, the ACRI for data.world indicates strong fundamentals in AI extractability, surpassing the majority of indexed sites. The rendering approach is hybrid, with a moderate ghost ratio of 15% — most content is accessible without JS, though some elements are client-rendered. A 12.1× bloat ratio is typical for sites in this tech tier — not wasteful, but streamlining could further boost extractability. Only 1 schema block is present — adding Organization, WebSite, and Breadcrumb schemas would significantly improve structured data coverage. All major AI bot user-agents (GPTBot, ClaudeBot, CCBot, Google-Extended) are permitted by robots.txt, ensuring broad AI crawler access.

67
C — Global SEODiff Score
Comprehensive search visibility assessment
Strong foundations, but Performance (59) is your bottleneck.
🎯 Top Fix: Monitor weekly to catch regressions early
🔬 Automated SEODiff Assessment · Snapshot: Mar 15, 2026 · 📋 API
📈 ACRI Trend: 2 snapshots (Mar 4, Mar 15)
🔔 Recent AI Indexing Activity
No recent changes detected by adaptive crawler.
🧮 Score Transparency — How is this calculated?
🛡️ Traditional SEO (25% weight): 67 × 0.25 = 16.8
🤖 AI Readiness / GEO (40% weight): 67 × 0.40 = 26.8
⚡ Performance (20% weight): 59 × 0.20 = 11.8
🏗️ Architecture & Trust (15% weight): 79 × 0.15 = 11.8
Weighted sum = 16.8 + 26.8 + 11.8 + 11.8 = 67.2
Global SEODiff Score = 67 (C)
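
The arithmetic above can be reproduced with a short sketch. The pillar scores and weights are the ones listed in the breakdown; rounding the weighted sum to the nearest integer is an assumption, and the mapping from score to letter grade is not documented here, so it is left out.

```python
# Pillar scores and weights as listed in the score breakdown.
PILLARS = {
    "Traditional SEO": (67, 0.25),
    "AI Readiness / GEO": (67, 0.40),
    "Performance": (59, 0.20),
    "Architecture & Trust": (79, 0.15),
}


def global_score(pillars):
    """Return the weighted sum of (score, weight) pairs."""
    return sum(score * weight for score, weight in pillars.values())


weighted = global_score(PILLARS)
print(round(weighted, 1))  # weighted sum to one decimal place
print(round(weighted))     # assumed rounding to the published integer score
```

Because Performance carries a 20% weight, each 5-point gain in that pillar moves the global score by 1 point.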
📊 ACRI Sub-Scores (AI Readiness Detail)
Bot Access: 100 (avg 92)
Rendering: 94 (avg 93)
Structure: 21 (avg 35)
Schema: 42 (avg 9)
Tech Stack: 50 (avg 63)
🔀
Visibility Delta: Google vs AI
Google (Tranco): Top 3%, Rank #33761
Gap: +33 pts
AI (ACRI): Top 36%, Score 69/100

data.world punches above its weight in AI — AI visibility exceeds Google ranking. This is a competitive moat worth protecting. ACRI measures technical crawler readiness. Read the methodology →

Why data.world ranks here

Tech stack: Custom / Proprietary
Industry:
Rendering: Hybrid
Schema coverage: 1 block
Token bloat: 12.1×

Fastest improvements

  • Reduce token bloat (navigation/footer/code) so agents reach your main content faster (see Token Bloat).
  • Create an llms.txt file so AI crawlers can discover your content structure without heavy crawling. Generate llms.txt →
  • Run a full entropy audit to find which DOM regions waste the most tokens. Run Entropy Audit →
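
As a rough illustration of the llms.txt suggestion in the list above, the sketch below renders a minimal file following the public llms.txt proposal (an H1 title, a blockquote summary, then H2 sections of Markdown links). The summary reuses the og:description reported for this page and the link is one of the crawled blog URLs; the section name and the link note are hypothetical placeholders.

```python
def build_llms_txt(title, summary, sections):
    """Render a minimal llms.txt document (Markdown) from plain data.

    sections maps a heading to a list of (name, url, note) tuples.
    """
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        for name, url, note in links:
            lines.append(f"- [{name}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


llms_txt = build_llms_txt(
    title="data.world",
    summary=(
        "Discover data and metadata in seconds and develop data products "
        "and analytics that drive your business."
    ),
    sections={
        # Hypothetical section name and note; the URL comes from the crawl table.
        "Blog": [
            (
                "AI Data Catalogs: What They Are & Why They Matter",
                "https://data.world/blog/ai-data-catalog/",
                "overview article",
            )
        ]
    },
)
print(llms_txt)
```

The result would be served at the site root as /llms.txt.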
🧪

JavaScript Rendering Check

We check what AI crawlers miss when they skip JavaScript execution.

🛡️

Traditional SEO

67/100 · 25% of Global Score · 🟢 High Confidence

📝 Title Tag

38 chars
Good length

Optimal range: 30–60 characters for SERP display.

📋 Meta Description

103 chars
Too short

Optimal range: 120–160 characters for snippet control.
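
The two length checks can be sketched as a single helper. The ranges are the ones quoted in this report; the sample string is the og:description listed under Social / OpenGraph, which is assumed here to match the meta description because it is also 103 characters long.

```python
TITLE_RANGE = (30, 60)    # characters, per the Title Tag check
META_RANGE = (120, 160)   # characters, per the Meta Description check


def check_length(text, low, high):
    """Return (length, verdict) for a tag value against an optimal range."""
    n = len(text)
    if n < low:
        return n, "Too short"
    if n > high:
        return n, "Too long"
    return n, "Good length"


# Assumed stand-in for the meta description (taken from og:description).
description = (
    "Discover data and metadata in seconds and develop data products "
    "and analytics that drive your business."
)
print(check_length(description, *META_RANGE))
```

Adding roughly 17 to 57 characters of concrete detail would bring this description into the 120–160 range.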

🔤 Heading Hierarchy

  • ✓ Exactly 1 <h1> tag — found 1
  • ✓ Has <h2> headings — found 5
  • ✓ <h2> not before <h1>

🔍 Indexability

  • ✓ Canonical tag present → https://data.world/
  • ✓ No noindex directive
  • ✓ Meta viewport set
  • ✓ HTML lang attribute → en
  • ➖ Hreflang tags — N/A (single language site)
  • ✓ Googlebot allowed by robots.txt

🌐 Social / OpenGraph

  • ✓ og:title — The Data Catalog Platform
  • ✓ og:description — Discover data and metadata in seconds and develop data products and analytics that drive your business.
  • ✓ og:image — preview
  • ✓ twitter:card — summary_large_image
📐 How the SEO Pillar score is calculated

SEO Pillar = Title (20 pts) + Meta Desc (20 pts) + Heading Hierarchy (20 pts) + Indexability (20 pts) + Social/OG (20 pts)

Each sub-score is derived from the checks above. Canonical tag, lang attribute, og:image, and a single H1 are the highest-impact items.

🤖

AI Readiness / GEO

67/100 · 40% of Global Score · 🟢 High Confidence

This pillar aggregates citation share, hallucination risk, bot access, schema health, and content extractability. The individual diagnostic sections below contribute to this score.

🚨

Hallucination Risk


Is AI lying about your brand? This panel measures how likely LLMs are to hallucinate facts when extracting information from your page.


🤖 Bot Access Matrix

GPTBot (OpenAI)
Allowed
ClaudeBot (Anthropic)
Allowed
CCBot (Common Crawl)
Allowed
Google-Extended
Allowed
Googlebot
Allowed
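
A matrix like the one above can be approximated with Python's standard `urllib.robotparser`. This is a sketch, not SEODiff's implementation: the robots.txt content below is a hypothetical sample rather than data.world's real file, and the user-agent tokens are the ones named in this report.

```python
from urllib.robotparser import RobotFileParser

AI_BOTS = ["GPTBot", "ClaudeBot", "CCBot", "Google-Extended", "Googlebot"]


def bot_access(robots_txt, url, bots=AI_BOTS):
    """Return {bot: allowed} for a URL under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in bots}


# Hypothetical robots.txt that blocks only CCBot.
SAMPLE = """\
User-agent: *
Allow: /

User-agent: CCBot
Disallow: /
"""

matrix = bot_access(SAMPLE, "https://example.com/")
for bot, allowed in matrix.items():
    print(bot, "Allowed" if allowed else "Blocked")
```

Parsing the text directly keeps the check offline; to test a live site, the same parser can load a URL with `set_url()` and `read()`.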

👻 Rendering (Ghost Ratio)

Ghost Ratio: 15% (scale: 0% = safe, 100% = risk)
Status: Server-Side Rendered (Safe)
Rendering Type: Hybrid

📊 Structure & Information Density

Structure Grade: 21/100 (Low)
Structured Elements: 11 elements (11 lists, 0 rows, 0 headers)
Total Words: 816
Raw Density: 1.4%
💡 Low structure score (21/100). Your content appears as a wall of text with few structured HTML elements. You have 11 list items, 0 table rows, 0 table headers. Convert features into <ul> lists and data into <table> elements to help AI models extract structured information.
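
The element counts above can be approximated with the standard `html.parser` module. How SEODiff actually counts structured elements is not documented here, so this is only a sketch that tallies `<li>`, `<tr>`, and `<th>` start tags; both HTML samples are hypothetical.

```python
from html.parser import HTMLParser


class StructureCounter(HTMLParser):
    """Count list items, table rows, and table headers in an HTML string."""

    TRACKED = {"li": "lists", "tr": "rows", "th": "headers"}

    def __init__(self):
        super().__init__()
        self.counts = {"lists": 0, "rows": 0, "headers": 0}

    def handle_starttag(self, tag, attrs):
        key = self.TRACKED.get(tag)
        if key:
            self.counts[key] += 1


def count_structure(html):
    counter = StructureCounter()
    counter.feed(html)
    counter.close()
    return counter.counts


# Hypothetical before/after markup for the same feature list.
wall_of_text = "<p>Catalog, governance, lineage.</p>"
structured = """
<ul><li>Catalog</li><li>Governance</li><li>Lineage</li></ul>
<table>
  <tr><th>Plan</th><th>Seats</th></tr>
  <tr><td>Example</td><td>10</td></tr>
</table>
"""
print(count_structure(wall_of_text))
print(count_structure(structured))
```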

🏷️ Schema Health

Organization Schema: ✅ Present
Product / Service Schema: ⚠️ Not Found
Total Schema Blocks: 1 block (Basic, low value for AI)

Schema Coverage Map

2/7 schema types detected
✅ Organization
❌ Product/Service
❌ Breadcrumb
❌ FAQ
❌ Article
✅ WebSite
💡 Product / Service schema missing. AI models don't know this is a SaaS product. Add Product or SoftwareApplication schema so AI understands what you offer and can surface pricing/features.
💡 BreadcrumbList schema missing. AI cannot understand your site hierarchy or how pages relate to each other.
💡 FAQ schema missing. Adding FAQPage schema lets AI models directly extract Q&A pairs for Featured Snippets and chatbot answers.
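
For the missing BreadcrumbList, one possible approach is to derive the JSON-LD from the URL path. The sketch below uses the schema.org `BreadcrumbList` and `ListItem` types; the URL is one of the crawled blog pages, while the labels "Home" and "Blog" are hypothetical and would come from the site's own navigation.

```python
import json
from urllib.parse import urlsplit


def breadcrumb_jsonld(url, names):
    """Build BreadcrumbList JSON-LD with one ListItem per path level.

    names[0] labels the site root; each later name labels one path segment.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if len(names) != len(segments) + 1:
        raise ValueError("need one name for the root plus one per path segment")
    base = f"{parts.scheme}://{parts.netloc}"
    items = [{"@type": "ListItem", "position": 1, "name": names[0], "item": base + "/"}]
    path = ""
    for position, segment in enumerate(segments, start=2):
        path += "/" + segment
        items.append({
            "@type": "ListItem",
            "position": position,
            "name": names[position - 1],
            "item": base + path + "/",
        })
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": items,
    }


crumbs = breadcrumb_jsonld(
    "https://data.world/blog/ai-data-catalog/",
    ["Home", "Blog", "AI Data Catalogs: What They Are & Why They Matter"],
)
print(json.dumps(crumbs, indent=2))
```

The printed JSON would go inside a `<script type="application/ld+json">` element on the page, in the same way as the FAQ patch in this report.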

📐 AI Efficiency Metrics

Extractability: 49/100 (AI models can partially extract answers from this page)
Crawl Cost: Medium (50/100), moderate for AI crawlers to process
Blocklist Risk: None (0 of 5 AI crawlers blocked)

Token Bloat

Useful Content: 8% (31.3 KB)
🗑️ Bloat: 92% (346.8 KB)
Token Bloat Ratio: 12.1× (Normal)
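
These figures are consistent with a simple ratio of raw HTML size to visible text size. That definition is an assumption inferred from the numbers in this report (378.1 KB raw HTML and 31.3 KB visible text, both listed under Tech Stack), not a documented formula.

```python
def bloat_metrics(raw_html_kb, visible_text_kb):
    """Derive bloat size, useful-content share, and bloat ratio from two sizes."""
    bloat_kb = raw_html_kb - visible_text_kb
    return {
        "bloat_kb": round(bloat_kb, 1),
        "useful_pct": round(100 * visible_text_kb / raw_html_kb),
        "bloat_ratio": round(raw_html_kb / visible_text_kb, 1),
    }


print(bloat_metrics(378.1, 31.3))
```

Under this definition, halving the raw HTML while keeping the same visible text would halve the ratio.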

Multimodal Readiness

Visual Context: 95% Optimized for Vision
Image Alt Coverage: 21 / 22 images have alt text

TDM Rights

TDM-Reservation Header: Not set
X-Robots-Tag (noai): Not set

🔥 Structural Entropy Check

Entropy: 0 (Poor) · Token Bloat: High
Noise Ratio: 91.7% · SNR: 0.09 · Signal: 8014 / Noise: 88781 tokens
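
The noise ratio and SNR follow from the two token counts on the line above. The formulas below (noise over total, signal over noise) are assumptions that happen to reproduce the reported values.

```python
def entropy_metrics(signal_tokens, noise_tokens):
    """Compute noise ratio (percent of all tokens) and signal-to-noise ratio."""
    total = signal_tokens + noise_tokens
    return {
        "noise_ratio_pct": round(100 * noise_tokens / total, 1),
        "snr": round(signal_tokens / noise_tokens, 2),
    }


print(entropy_metrics(8014, 88781))
```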

🔬 AI-Crawler Simulation

See your website the way AI crawlers do. CSS stripped, structure labeled, content chunked.

🤖

AI Answer Preview


See how AI models summarize your site. Left: your actual content. Right: what the LLM extracts and says about you.

🧠

The LLM Interpretation

AI-VERIFIED

SEODiff AI analyzed the extracted content of data.world and produced this structured business intelligence. Fields marked SEMANTIC VOID indicate information the AI could not find — a critical gap in your site’s machine-readability.

Core Offering
Data.world provides a data catalog platform that helps enterprises discover, govern, and manage their data assets, enabling data-driven decision-making and
Target Audience
Data Leaders, Data Engineers, Data Governance Professionals, Analysts & Business Users
Pricing Model
Not specified - Platform-as-a-Service (PaaS)
🔗 Integration Partners
Snowflake, Google BigQuery, JIRA
🏆 Competitive Moat
AI-powered data discovery and governance, agile data stack development methodology, and a flexible metadata model.
📊 Content Depth
8/10
🔄 Programmatic SEO Signals
Integration directory pages, Template comparison pages
⚡ Key Pain Points
• No structured FAQ schema
• Thin landing pages for features
Analyzed by SEODiff AI · 2026-03-02

🔧 Tech Stack

AI-Readiness Score: 50/100
Server:
CDN:
HTTP Status: 200
Load Time: 650 ms
Raw HTML Size: 378.1 KB
Visible Text Size: 31.3 KB

Performance & Speed

59/100 · 20% of Global Score · 🟢 High Confidence

⏱️ Time to First Byte

650 ms
Slow — bots may time out or deprioritise

Google considers <200 ms "good". AI crawlers may have even shorter timeouts.

📦 Page Weight

DOM nodes: 1003
HTML payload: 378 KB
Heavy page; consider reducing DOM complexity

🗄️ Cache & CDN

  • ✗ Cache-Control header
  • ✗ CDN cache status
  • ✗ CDN detected

🔬 Tracker Tax

Tracker scripts: 0
Third-party domains: 0
Token overhead: 0.0%
Minimal tracker load; clean signal for bots
📐 How the Performance Pillar score is calculated

Perf Pillar = TTFB (35 pts) + Page Weight (25 pts) + Cache/CDN (20 pts) + Tracker Tax (20 pts)

TTFB <200 ms = full marks. DOM >3000 or payload >300 KB incurs heavy penalties. Tracker scripts beyond 5 reduce score.
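
The thresholds named above can be turned into a simple checklist. The report does not state how many points each threshold costs, so this sketch only flags which limits a page trips; the inputs are the values measured for data.world.

```python
def performance_flags(ttfb_ms, dom_nodes, payload_kb, tracker_scripts):
    """Return the Performance Pillar thresholds that a page exceeds."""
    flags = []
    if ttfb_ms >= 200:
        flags.append("TTFB at or above 200 ms")
    if dom_nodes > 3000:
        flags.append("DOM above 3000 nodes")
    if payload_kb > 300:
        flags.append("payload above 300 KB")
    if tracker_scripts > 5:
        flags.append("more than 5 tracker scripts")
    return flags


# Measured values from this report: 650 ms TTFB, 1003 DOM nodes, 378 KB, 0 trackers.
flags = performance_flags(650, 1003, 378, 0)
print(flags)
```

For this page, the flags point at server response time and HTML payload rather than DOM size or trackers.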

🏗️

Architecture & Trust

79/100 · 15% of Global Score · 🟢 High Confidence

🗺️ Sitemap & Robots

  • ✗ Sitemap declared in robots.txt
  • ✓ Googlebot allowed
  • ✓ GPTBot allowed
  • ✓ ClaudeBot allowed

🔗 Linking

Internal links: 85
External links: 11
Good internal linking; helps crawlers discover content

🔒 Security & Trust

  • ✓ HSTS header (Strict-Transport-Security)
  • ✗ Content-Security-Policy header
  • ✓ HTTP status 200 OK (got 200)

♿ Accessibility Signals

  • ✓ HTML lang attribute → en
  • ✓ Meta viewport for mobile
  • ✓ Single H1 for screen readers
📐 How the Architecture Pillar score is calculated

Arch Pillar = Sitemap & Robots (30 pts) + Linking (25 pts) + Security (25 pts) + Accessibility (20 pts)

Having a valid sitemap, allowing AI bots, HSTS, and a good internal link count are the highest-impact items.

🏅 AI-Verified Trust Badge

Your site scores 47/100. Reach 80+ to unlock the green "AI-Verified" badge. Fix the issues below to improve your score.

AI-Verified badge for data.world
Pending Audit — score below 80 threshold
<a href="https://seodiff.io/radar/domains/data.world" rel="noopener"><img src="https://seodiff.io/api/v1/badge?domain=data.world" alt="AI-Verified by SEODiff" width="280" height="52"></a>

💡 Paste in your site footer, GitHub README, or email signature. Badge updates automatically as your score changes.

Deep Crawl Analysis · 85 pages · Deep-10

Homepage ACRI: 47 (single-page score)
Δ delta: +22 (subpages outperform homepage)
Site-Wide ACRI: 70 (avg across 85 pages · range 0–85)
Topical Cohesion: 13% (Topical Drift; TF-IDF cosine similarity)
Total Words: 207303
Avg Bloat: 15.4×
RAG Fractures: 7
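
Topical cohesion is described here only as TF-IDF cosine similarity. The sketch below shows one way such a number could be computed with the standard library: smoothed TF-IDF vectors per page, then the mean pairwise cosine. The whitespace tokenizer, the smoothing, and the averaging are all assumptions, and the sample texts are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn tokenised documents into sparse TF-IDF dicts (smoothed IDF)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def topical_cohesion(texts):
    """Mean pairwise cosine similarity across pages (assumed aggregation)."""
    if len(texts) < 2:
        raise ValueError("need at least two pages to compare")
    vectors = tfidf_vectors([text.lower().split() for text in texts])
    pairs = [(i, j) for i in range(len(vectors)) for j in range(i + 1, len(vectors))]
    return sum(cosine(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)


# Hypothetical page texts: two on-topic pages and one off-topic page.
pages = ["data catalog governance", "data catalog pricing", "pizza recipe"]
print(round(topical_cohesion(pages), 2))
```

A low value such as the 13% reported here would mean that most page pairs share little weighted vocabulary.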
⚠️
7 RAG-Chunking Fractures Detected

Poorly formatted tables or pricing grids on 7 pages will be split incorrectly during RAG chunking, causing AI models to hallucinate prices and features.
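
The fracture described above can be illustrated with a toy example: a fixed-size chunker cuts through a table row, while a row-aware chunker keeps each row whole and repeats the header. The pricing rows are hypothetical, and real RAG pipelines typically chunk by tokens rather than characters.

```python
def fixed_chunks(text, size):
    """Naive chunker: cut every `size` characters, ignoring structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def row_chunks(rows, size):
    """Row-aware chunker: never split a row; repeat the header in each chunk."""
    header, *body = rows
    chunks, current = [], [header]
    for row in body:
        candidate = "\n".join(current + [row])
        if len(candidate) > size and len(current) > 1:
            chunks.append("\n".join(current))
            current = [header, row]
        else:
            current.append(row)
    chunks.append("\n".join(current))
    return chunks


# Hypothetical pricing table, one row per line.
PRICING = [
    "Plan | Seats | Price",
    "Starter | 5 | 10",
    "Team | 25 | 40",
    "Scale | 100 | 150",
]

naive = fixed_chunks("\n".join(PRICING), 40)
aware = row_chunks(PRICING, 40)
print(naive)
print(aware)
```

With the naive chunker, a retriever can return a price without its plan name, which is the kind of split that leads a model to pair the wrong price with a feature.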

Page Type ACRI Token Bloat Words Status
https://data.world/blog/10-tips-for-building-your-data-warehouse-and-data-catalog-together/
10 tips for building your data warehouse and data catalog together | data.world
blog 85 8.6× 4205
https://data.world/blog/5-questions-with-tejas-manohar/
5 Questions with Tejas Manohar | data.world
pricing 85 9.4× 3512 💰 Pricing
https://data.world/blog/5-questions-with-lars-albertsson/
5 Questions with Lars Albertsson | data.world
blog 85 9.9× 3269
https://data.world/blog/5-questions-with-nick-schrock/
5 Questions with Nick Schrock | data.world
pricing 85 9.1× 3450 💰 Pricing
https://data.world/blog/ai-data-management/
5 advantages of AI for data management | data.world
pricing 85 9.1× 4213 💰 Pricing
https://data.world/blog/5-questions-with-sarah-catanzaro/
5 Questions with Sarah Catanzaro | data.world
blog 85 9.6× 3367
https://data.world/blog/a-starting-toolkit-for-humanity-navigate-the-road-to-a-future-of-ai-empowered-conscious-capitalism/
A starting toolkit for humanity: Navigate the road to a future of AI-empowered Conscious Capitalism | data.world
pricing 85 8.1× 4207 💰 Pricing
https://data.world/blog/5-questions-with-cindi-howson/
5 Questions with Cindi Howson | data.world
pricing 85 8.4× 4159 💰 Pricing
https://data.world/blog/5-questions-with-dj-patil/
5 Questions with DJ Patil | data.world
pricing 85 9.7× 3280 💰 Pricing
https://data.world/blog/5-questions-with-emily-hawkins/
5 Questions with Emily Hawkins | data.world
blog 85 9.1× 3505
https://data.world/blog/5-questions-with-erik-bernhardsson/
5 Questions with Erik Bernhardsson | data.world
blog 85 8.6× 3615
https://data.world/blog/5-questions-with-jans-aasman-2/
5 Questions with Jans Aasman | data.world
blog 85 9.8× 3379
https://data.world/blog/ai-data-catalog/
AI Data Catalogs: What They Are & Why They Matter | data.world
pricing 85 8.0× 4619 💰 Pricing
https://data.world/blog/ai-hallucination/
What are AI Hallucinations? Examples & Mitigation Techniques | data.world
blog 85 8.2× 4677
https://data.world/blog/ai-for-large-enterprises-benefits-use-cases-and-trends/
AI for Large Enterprises: Benefits, Use Cases & Trends | data.world
pricing 85 6.8× 6125 💰 Pricing
https://data.world/blog/ai-renaissance-2-0/
Chatbots, Knowledge Graphs, and the Agents Accelerating Enterprise Creativity in Renaissance 2.0 | data.world
pricing 85 5.7× 8988 💰 Pricing
https://data.world/blog/attention-ai-technologists-it-s-time-to-merge-intentionality-with-philosophy-ethics-and-law/
Attention AI technologists, it’s time to merge intentionality with philosophy, ethics, and law | data.world
pricing 85 5.5× 7028 💰 Pricing
https://data.world/blog/below-the-waterline-of-the-ai-iceberg-data-s-evolution-history-and-exponential-rise/
Below the Waterline of the AI Iceberg: Data’s Evolution, History, and Exponential Rise | data.world
pricing 85 5.6× 6800 💰 Pricing
https://data.world/blog/atlan-pricing/
Atlan Pricing: What You Need to Know | data.world
pricing 75 10.4× 3633 ⚠️ RAG Fracture
https://data.world/blog/5-questions-with-emil-eifrem/
5 Questions with Emil Eifrem | data.world
pricing 75 10.8× 2976 💰 Pricing
Showing 20 of 85 pages.
📂
Health by Sub-Directory
Average ACRI and top issues aggregated by URL path prefix
Path Pages Avg ACRI Ghost % Bloat Top Issue
/blog/ 80 74 0% 16.4× High JS Bloat
/about/ 1 0 0% 0.0× Low AI Readiness
/products/ 1 0 0% 0.0× Low AI Readiness
/case-studies/ 1 0 0% 0.0× Low AI Readiness
/pricing/ 1 0 0% 0.0× Low AI Readiness
/product/ 1 0 0% 0.0× Low AI Readiness
🔗
Outbound External Citations
0 unique external domains cited across 85 pages
servicenow.com ×80
careers.servicenow.com ×80
developer.data.world ×80
facebook.com ×78
twitter.com ×78
linkedin.com ×78
page.data.world ×13
gartner.com ×6

Scores update automatically each month. Create a free account for on-demand re-crawls (3/month free).

🔌 API Access

Pull this data programmatically. All sub-page metrics are available via our public API.

curl https://seodiff.io/api/v1/deep10/domain/data.world

Get your free API key — 100 requests/month included.
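
The curl call above translates to a few lines of standard-library Python. The endpoint URL is the one shown in this report; the JSON response format is an assumption, and because the report does not say how the API key is passed, no authentication is included.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

API_BASE = "https://seodiff.io/api/v1/deep10/domain/"


def deep10_url(domain):
    """Build the Deep-10 endpoint URL for a domain."""
    return API_BASE + quote(domain, safe="")


def fetch_deep10(domain, timeout=10):
    """Fetch the report for a domain; assumes the endpoint returns JSON."""
    request = Request(deep10_url(domain), headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


print(deep10_url("data.world"))
```

Calling `fetch_deep10("data.world")` performs the network request; the field names in the result should be checked against the actual response before relying on them.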

🔗 Similar Sites

Domains with a similar tech stack, industry, and AI readiness profile to data.world. Compare side-by-side.

Domain ACRI AI Score Tech Stack Token Bloat Schema
data.world (this site) 47 69 Custom / Proprietary 12.1× 1
amherstma.gov 57 73 Custom / Proprietary 13.2× 0 Compare →
i10audio.com 57 72 Custom / Proprietary 10.7× 0 Compare →
hubsync.com 57 75 Custom / Proprietary 6.4× 1 Compare →
wbr.com 57 71 Custom / Proprietary 7.6× 0 Compare →
xn--80adrlem4a.mp3tm.net 57 78 Custom / Proprietary 7.4× 0 Compare →
Compare All 5 Similar Sites →

📊 Semantic Share of Voice

How often would an AI cite data.world when users ask about topics in this domain's niche? We run entity queries through our 188k-page search index and measure citation probability.


🩹

Remediation Patches

COPY-PASTE

Auto-generated code fixes tailored to data.world. Copy and paste these into your codebase to improve AI visibility. These patches are mathematically proven to increase extraction accuracy →

Reduce Token Bloat
Medium Impact ⏱ 1–2 hrs
Only 8% of your HTML is useful content. AI crawlers waste context window tokens on bloat.
html
<!-- Move inline CSS to external stylesheets -->
<link rel="stylesheet" href="/css/main.css">

<!-- Move inline scripts to external files with defer -->
<script src="/js/app.js" defer></script>

<!-- Remove duplicate navigation blocks -->
<!-- Keep only ONE <nav> in the <header> -->

<!-- Ensure <main> wraps your primary content -->
<main>
  <!-- Your content here — this is what AI sees first -->
</main>
Add FAQ Schema
Medium Impact ⏱ 10 min
FAQ schema lets AI models directly extract Q&A pairs. This is the easiest way to get featured in AI responses.
html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is data.world?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Add your answer here: describe what data.world does in 1-2 sentences."
      }
    },
    {
      "@type": "Question",
      "name": "How does data.world work?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Explain the key features and how users interact with data.world."
      }
      }
    }
  ]
}
</script>
📈

Projected Impact

ROI EST.

If you apply the patches above, here's the estimated improvement for data.world:

Current Score: 69
Projected Score: 77
Improvement: +8 pts
Reduce token bloat: +5 pts
Add FAQ schema: +3 pts

*Estimates based on SEODiff's scoring model. Actual results depend on implementation quality.

📋 Data Export

Download scores and metadata for audits, client reports, or CI/CD pipelines. Exports contain computed metrics only (no copyrighted content).

All data is generated automatically and updated with each crawl.


🧭 Self-Diffing (Private Layer)

For owned domains, combine this world snapshot with private drift + regression history.
Template Drift: Track in My Site
Drift → Traffic Impact: In development (coming soon)
Regression Incidents: Track in My Site
Internal Linking: Deep Audit graph
Semantic Structure: GEO view in Deep Audit
Content Quality: Thin/duplicate tracking

🕒 History

Score over time: Available in My Site history
Drift events: Template timeline + incidents
Drift → Revenue Attribution: Coming soon
Schema/rendering/extractability changes: Tracked per scan in project history