Anthropic · ★★ · Frequent · Hard · Crawler · Dedup

A26 · Design a Web Crawler

Verified source

Prompt: "Design a Web Crawler." Sources: Exponent, programhelp.net. Credibility: B/C.

Core mechanics are identical to O10 (the OpenAI version). Anthropic's version often emphasizes multi-threaded or async workers and rate control. See O10 for the full architecture.
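A minimal sketch of the async-worker and rate-control emphasis, assuming Python's `asyncio`: several workers share one frontier queue, a `seen` set deduplicates URLs before they are enqueued, and a per-host limiter enforces a minimum interval between requests to the same host. `FAKE_WEB`, `fetch`, `HostRateLimiter`, and the interval value are hypothetical stand-ins (the fetch is faked so the sketch runs offline); O10's architecture may divide these pieces differently.

```python
import asyncio
from urllib.parse import urlsplit

# Hypothetical in-memory "web" so the sketch runs without network access.
FAKE_WEB = {
    "http://a.com/": ["http://a.com/1", "http://b.com/"],
    "http://a.com/1": ["http://a.com/"],
    "http://b.com/": ["http://a.com/1", "http://b.com/x"],
    "http://b.com/x": [],
}


class HostRateLimiter:
    """Hypothetical helper: enforces a minimum delay between requests to one host."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self.next_allowed: dict[str, float] = {}
        self.locks: dict[str, asyncio.Lock] = {}

    async def wait(self, host: str) -> None:
        # One lock per host: same-host requests queue up, other hosts proceed.
        lock = self.locks.setdefault(host, asyncio.Lock())
        async with lock:
            now = asyncio.get_running_loop().time()
            ready_at = self.next_allowed.get(host, now)
            if ready_at > now:
                await asyncio.sleep(ready_at - now)
            self.next_allowed[host] = max(now, ready_at) + self.min_interval


async def fetch(url: str) -> list[str]:
    """Stand-in for an HTTP fetch plus link extraction (hypothetical)."""
    await asyncio.sleep(0)
    return FAKE_WEB.get(url, [])


async def crawl(seeds: list[str], workers: int = 4, min_interval: float = 0.01) -> list[str]:
    frontier: asyncio.Queue = asyncio.Queue()
    seen: set[str] = set()  # URL dedup: each URL is enqueued at most once
    visited: list[str] = []
    limiter = HostRateLimiter(min_interval)

    for url in seeds:
        if url not in seen:
            seen.add(url)
            frontier.put_nowait(url)

    async def worker() -> None:
        while True:
            url = await frontier.get()
            try:
                await limiter.wait(urlsplit(url).netloc)
                visited.append(url)
                for link in await fetch(url):
                    if link not in seen:
                        seen.add(link)
                        frontier.put_nowait(link)
            finally:
                # New links are enqueued before this, so join() cannot finish early.
                frontier.task_done()

    tasks = [asyncio.create_task(worker()) for _ in range(workers)]
    await frontier.join()
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return visited


if __name__ == "__main__":
    print(sorted(asyncio.run(crawl(["http://a.com/"]))))
```

Holding the per-host lock while sleeping serializes requests to a single host without blocking workers that are fetching from other hosts.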

Anthropic-specific follow-ups

  • How would you safely filter out sensitive or restricted content during crawling?
  • How do you handle dynamic, JS-rendered pages? (Render only a subset of pages in a headless browser; headless rendering is expensive.)
  • How do you avoid creating legal or ethical issues (robots.txt compliance, copyright)?
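One way to sketch answers to these follow-ups, assuming Python's standard library: a pre-fetch `admit()` check that drops blocklisted hosts and paths disallowed by robots.txt (via `urllib.robotparser.RobotFileParser`), and a post-fetch `needs_headless()` heuristic that routes pages with scripts but almost no static text to the expensive headless-browser subset. `BLOCKED_HOSTS`, `ROBOTS_TXT`, `USER_AGENT`, and the 200-character threshold are hypothetical. A host blocklist covers only a narrow slice of content filtering, and the heuristic will misclassify some pages.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

USER_AGENT = "ExampleCrawler"           # hypothetical user-agent string
BLOCKED_HOSTS = {"restricted.example"}  # hypothetical policy blocklist

# Hypothetical robots.txt bodies keyed by host. A real crawler would fetch
# https://<host>/robots.txt once per host and cache it with a TTL.
ROBOTS_TXT = {"example.com": "User-agent: *\nDisallow: /private/\n"}

_robots_cache: dict[str, RobotFileParser] = {}


def _robots_for(host: str) -> RobotFileParser:
    parser = _robots_cache.get(host)
    if parser is None:
        parser = RobotFileParser()
        # A missing robots.txt is parsed as empty, which allows everything.
        parser.parse(ROBOTS_TXT.get(host, "").splitlines())
        _robots_cache[host] = parser
    return parser


def admit(url: str) -> bool:
    """Pre-fetch check: drop blocklisted hosts and robots.txt-disallowed paths."""
    host = urlsplit(url).hostname or ""
    if any(host == b or host.endswith("." + b) for b in BLOCKED_HOSTS):
        return False
    return _robots_for(host).can_fetch(USER_AGENT, url)


class _PageStats(HTMLParser):
    """Counts <script> tags and visible text characters outside script/style."""

    def __init__(self) -> None:
        super().__init__()
        self.scripts = 0
        self.text_chars = 0
        self._hidden_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._hidden_depth += 1
            if tag == "script":
                self.scripts += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._hidden_depth:
            self._hidden_depth -= 1

    def handle_data(self, data):
        if not self._hidden_depth:
            self.text_chars += len(data.strip())


def needs_headless(html: str, min_text_chars: int = 200) -> bool:
    """Post-fetch heuristic: scripts plus little static text suggests JS rendering."""
    stats = _PageStats()
    stats.feed(html)
    stats.close()
    return stats.scripts > 0 and stats.text_chars < min_text_chars
```

`RobotFileParser` also provides `crawl_delay()`, which could feed a per-host rate limit.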

Related study-guide topics