Thanks for the inspiration. After checking it, I'll use the output of the following command in the robots.txt of my doc:
❯ bash -c '(
 curl -fsS --tlsv1.3 https://codeberg.org/robots.txt | \
 tac | \
 grep -A999 "^Disallow: /$" | \
 grep -m1 -B999 "^[[:space:]]*$" | \
 tac
 curl -fsS --tlsv1.3 https://raw.githubusercontent.com/ai-robots-txt/ai.robots.txt/main/robots.txt
 ) | sort -ru'
User-agent: omgilibot
User-agent: omgili
User-agent: meta-externalagent
User-agent: img2dataset
User-agent: facebookexternalhit
User-agent: cohere-ai
User-agent: anthropic-ai
User-agent: YouBot
User-agent: Webzio-Extended
User-agent: VelenPublicWebCrawler
User-agent: Timpibot
User-agent: Scrapy
User-agent: PetalBot
User-agent: PerplexityBot
User-agent: Omgilibot
User-agent: Omgili
User-agent: OAI-SearchBot
User-agent: Meta-ExternalFetcher
User-agent: Meta-ExternalAgent
User-agent: ImagesiftBot
User-agent: ICC-Crawler
User-agent: GoogleOther-Video
User-agent: GoogleOther-Image
User-agent: GoogleOther
User-agent: Google-Extended
User-agent: GPTBot
User-agent: FriendlyCrawler
User-agent: FacebookBot
User-agent: Diffbot
User-agent: ClaudeBot
User-agent: Claude-Web
User-agent: ChatGPT-User
User-agent: CCBot
User-agent: Bytespider
User-agent: Applebot-Extended
User-agent: Applebot
User-agent: Amazonbot
User-agent: Ai2Bot-Dolma
User-agent: AI2Bot
Disallow: /
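For anyone who wants to see what the pipeline does without hitting the network, here is a minimal sketch that runs the same steps on inline sample data. The function names, the `ExampleBot*` agents and both sample files are made up for illustration, and `LC_ALL=C` is my assumption to pin the sort order (the command above uses whatever locale is active). It assumes GNU `tac`, `grep` and `sort`.

```shell
# Sketch of the pipeline above on inline sample data instead of live URLs.
# ExampleBotA/B/C and both sample files are hypothetical.

# Read a robots.txt on stdin and print its last group ending in "Disallow: /":
# the lines from the preceding blank line down to that directive.
last_disallow_all_group() {
  tac |                                 # reverse the file: last group comes first
    grep -A999 '^Disallow: /$' |        # keep "Disallow: /" and up to 999 lines after it
    grep -m1 -B999 '^[[:space:]]*$' |   # keep up to 999 lines before the first blank line
    tac                                 # restore the original line order
}

# Merge that group with a second list and deduplicate.
# LC_ALL=C pins a byte-wise order (an assumption, not part of the original command).
merge_blocklists() {
  local site_robots=$1 extra_list=$2
  {
    printf '%s\n' "$site_robots" | last_disallow_all_group
    printf '%s\n' "$extra_list"
  } | LC_ALL=C sort -ru
}

site_sample='User-agent: *
Disallow: /private

User-agent: ExampleBotA
User-agent: ExampleBotB
Disallow: /'

extra_sample='User-agent: ExampleBotB
User-agent: ExampleBotC
Disallow: /'

merge_blocklists "$site_sample" "$extra_sample"
```

The reverse sort is what makes the result a usable robots.txt group: in byte order "D" sorts before "U", so with `-r` the single `Disallow: /` line lands after all `User-agent:` lines. The blank separator line that the second `grep` keeps sorts to the very end.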
#ai #seo #robotstxt #crawler