You might have guessed it already: we are struggling with excessive crawling today. We have once again blocked several large IP ranges, but have not yet been able to identify the new actor.

We are working on restoring service availability and fine-tuning our rate-limiting.

If someone is interested in implementing an improved native rate-limiting in #Forgejo that also protects other instances from abusive crawlers, please reach out 😉

In reply to Codeberg.org

We block all of Azure now. Users whine, but if they want good service they shouldn't be using an address in the bad part of Internet Town.

In reply to Codeberg.org

Just wondering what you folks use in front of Forgejo. I'm seeing abusive crawling as well, but my instance is a small personal one in my homelab, so it's really annoying to lose bandwidth to abusive actors. I'm considering putting a self-hostable WAF in front of my homelab services.

In reply to Felipe M.

@fmartingr We're using haproxy and have a custom blacklist loaded here: codeberg.org/Codeberg-Infrastr…

It's not public (yet), but we should probably consider opening it. We would need to check that it only contains publicly known IP addresses, though. I'm not fully up to date on how the law treats publishing the IP ranges of bad actors. ~f
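Codeberg's list itself is not shown here, and in production haproxy does the matching. Purely to illustrate what checking a client against a list of blocked ranges involves, here is a minimal Python sketch using the standard `ipaddress` module. The ranges are reserved documentation prefixes (RFC 5737 and RFC 3849), not entries from Codeberg's list, and `is_blocked` is a hypothetical helper.

```python
import ipaddress

# Hypothetical blocklist: reserved documentation prefixes (RFC 5737, RFC 3849),
# NOT ranges taken from Codeberg's actual list.
BLOCKED_RANGES = [
    "192.0.2.0/24",
    "198.51.100.0/24",
    "2001:db8::/32",
]

# Parse once up front; ip_network() raises ValueError on malformed entries.
BLOCKED_NETWORKS = [ipaddress.ip_network(cidr) for cidr in BLOCKED_RANGES]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked range."""
    addr = ipaddress.ip_address(client_ip)
    # An address is never "in" a network of the other IP version (the check
    # just returns False), so IPv4 and IPv6 ranges can share one list.
    return any(addr in net for net in BLOCKED_NETWORKS)


if __name__ == "__main__":
    for ip in ["192.0.2.17", "203.0.113.5", "2001:db8::1"]:
        print(ip, "blocked" if is_blocked(ip) else "allowed")
```

A linear scan like this is fine for a few hundred ranges; for very large lists a prefix tree (radix trie) scales better.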

In reply to Codeberg.org

I was considering trying something like CrowdSec, but I'm unsure how they handle bad IP ranges and what they consider "bad actors". Something like that, but with community-maintained lists like the ones ad blockers use, could be nice. Will take a look :blobfoxeyes: Thanks, and I hope you resolve the issue soon!

In reply to Codeberg.org

Does "Aceville" ring a bell for anyone by chance? Probably related to Tencent? We are blocking one IP range after another...

In reply to IAG

@iagondiscord The problem is that it's not targeting Codeberg. It's the #AIgoldrush. The web has already been crawled completely, just not by everyone yet. So startups launch their #crawlers, carelessly and explicitly ignoring robots.txt to get the #biggestdata.

It does not matter if the web can no longer serve humanity due to this. Training the #AI is the only thing that matters.

Maybe a bit like a sacrifice for faith.

~f #goldrush
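For context: robots.txt is purely advisory, so nothing technically stops a crawler from ignoring it. A compliant crawler checks it before every request. The minimal sketch below uses Python's standard `urllib.robotparser` module; the robots.txt contents, the bot name `SomeBot`, and the example.org URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shuts out one named AI crawler entirely and asks
# everyone else to skip /private and wait 10 seconds between requests.
ROBOTS_TXT = """\
User-agent: ClaudeBot
Disallow: /

User-agent: *
Disallow: /private
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(user_agent: str, url: str) -> bool:
    """The question a well-behaved crawler asks before each request."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(may_fetch("ClaudeBot/1.0", "https://example.org/repo"))     # False
    print(may_fetch("SomeBot/2.0", "https://example.org/private/x"))  # False
    print(may_fetch("SomeBot/2.0", "https://example.org/repo"))       # True
    print(parser.crawl_delay("SomeBot/2.0"))                          # 10
```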

In reply to Codeberg.org

@iagondiscord ... of course not only startups funded by megacorps, but megacorps on their own, too.
@IAG
In reply to Codeberg.org

I won't be able to provide an implementation, but for a better understanding: how does rate-limiting work now, and what kind of improvement would help in your current scenario?

In reply to Daniel Böhmer

@dboehmer One of the primary constraints of the current rate-limiting is that there is only a global counter that increases for each request.

So a user watching Forgejo Actions logs scroll by fires a lot of small requests and can run into the limit. Meanwhile, a botnet distributed over many, many IP addresses does not trigger the rate-limiting at all, because each server only sends a few requests.
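One possible direction for both constraints, sketched under assumptions (this is not Forgejo's code or API): keep a token bucket per client key instead of a single counter, let each request carry a cost so cheap polling drains the budget more slowly than expensive operations, and optionally key by network prefix so hosts from the same range share one budget. `KeyedLimiter`, `network_key`, and the /24 and /48 prefix lengths are hypothetical choices for illustration.

```python
import ipaddress
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Bucket:
    tokens: float  # remaining request budget
    last: float    # time of the last refill, in seconds


@dataclass
class KeyedLimiter:
    """One token bucket per key instead of a single shared counter.

    Hypothetical design for illustration, not Forgejo's implementation.
    rate:  tokens refilled per second
    burst: bucket capacity, i.e. the largest burst a key may send at once
    """
    rate: float
    burst: float
    buckets: Dict[str, Bucket] = field(default_factory=dict)

    def allow(self, key: str, cost: float = 1.0, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        bucket = self.buckets.get(key)
        if bucket is None:
            # New keys start with a full bucket.
            bucket = self.buckets[key] = Bucket(tokens=self.burst, last=now)
        # Refill in proportion to elapsed time, capped at the burst size.
        bucket.tokens = min(self.burst, bucket.tokens + (now - bucket.last) * self.rate)
        bucket.last = now
        if bucket.tokens >= cost:
            bucket.tokens -= cost
            return True
        return False


def network_key(ip: str) -> str:
    """Map an address to its /24 (IPv4) or /48 (IPv6) network.

    The prefix lengths are illustrative assumptions, not recommendations.
    """
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 48
    return str(ipaddress.ip_network(f"{addr}/{prefix}", strict=False))


if __name__ == "__main__":
    # 20 hosts from one /24 each send a single request at t = 0.
    hosts = [f"198.51.100.{i}" for i in range(1, 21)]

    per_ip = KeyedLimiter(rate=1.0, burst=5.0)
    print(sum(per_ip.allow(h, now=0.0) for h in hosts))  # 20: every host has a full bucket

    per_net = KeyedLimiter(rate=1.0, burst=5.0)
    print(sum(per_net.allow(network_key(h), now=0.0) for h in hosts))  # 5: they share one bucket
```

Prefix keying only helps when a botnet is concentrated in a few ranges, and it can also penalize legitimate users behind a shared network; a real implementation would additionally need to evict idle buckets.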

In reply to Codeberg.org

I know how you feel... Same on gitnet.fr.

if ($http_user_agent ~* "facebookexternalhit|bytespider|Amazonbot|ClaudeBot|AhrefsBot") {
    return 429;
}
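To check which User-Agent strings a rule like this catches before deploying it, the same match can be reproduced in Python: nginx's `~*` operator is a case-insensitive, unanchored regular-expression match, which `re.search` with `re.IGNORECASE` mirrors. `status_for` is a hypothetical helper and the sample strings are made up. Keep in mind that this only stops crawlers that identify themselves honestly, since the User-Agent header is trivially spoofed.

```python
import re

# Same alternation as the nginx rule. nginx's ~* is a case-insensitive,
# unanchored regex match, which re.search with re.IGNORECASE mirrors.
BLOCKED_UA = re.compile(
    r"facebookexternalhit|bytespider|Amazonbot|ClaudeBot|AhrefsBot",
    re.IGNORECASE,
)


def status_for(user_agent: str) -> int:
    """Return 429 for a listed crawler, else 200 (stand-in for 'serve normally')."""
    return 429 if BLOCKED_UA.search(user_agent) else 200


if __name__ == "__main__":
    for ua in ["Mozilla/5.0 (compatible; Bytespider)",
               "Mozilla/5.0 (X11; Linux x86_64) Firefox/128.0"]:
        print(status_for(ua), ua)
```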

In reply to Codeberg.org

I thought Codeberg used its own solution for Git? Are you actually using Forgejo? Awesome! Technically, the rate-limiting problem should be solvable by installing and configuring CrowdSec and enabling it at the reverse-proxy level, but yeah, something in Forgejo itself would be even better.

In reply to Codeberg.org

Note that @fdroidorg can't build apps hosted on Codeberg anymore because of this. Its build server clones the repo for each app and soon gets HTTP 429 (Too Many Requests) responses.
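On the client side, a build server that clones many repositories from one host can at least space out its retries after a 429 instead of retrying immediately. A minimal sketch that assumes nothing about F-Droid's actual build tooling: `retry_delay` and its `base` and `cap` defaults are hypothetical. It honors a `Retry-After` header when the server sends one in its delay-seconds form (HTTP-date values are not parsed here) and otherwise uses capped exponential backoff with full jitter.

```python
import random
from typing import Optional


def retry_delay(attempt: int, retry_after: Optional[str] = None,
                base: float = 5.0, cap: float = 300.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based) after an HTTP 429.

    Honors Retry-After in delay-seconds form; HTTP-date values fall back to
    backoff. Otherwise returns a random wait in [0, min(cap, base * 2**attempt)].
    """
    if retry_after is not None and retry_after.strip().isdigit():
        return float(retry_after.strip())
    # Clamp the exponent so very large attempt counts cannot overflow a float.
    ceiling = min(cap, base * 2 ** min(attempt, 32))
    return random.uniform(0.0, ceiling)


if __name__ == "__main__":
    print(retry_delay(0, retry_after="30"))  # 30.0
    for attempt in range(4):
        print(attempt, round(retry_delay(attempt), 1))
```

Full jitter spreads retries from many parallel clones over time, so they do not all hit the server again at the same moment.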
In reply to Torsten Grote

@grote
This is interesting feedback. There have been no restrictive changes to the rate-limiting: the last two changes over the past three months were both increases.

We have blocked several offending IP ranges. Is there information about which hosting providers F-Droid uses?
@fdroidorg

In reply to Codeberg.org

@grote @fdroidorg Where our production build servers are located is not public information, and it is not necessarily static. But I imagine it would be easy to figure out their IP addresses by looking at the logs on the Codeberg side. To my knowledge, we have never been blocked by any other Git/SCM host before.
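Finding a particular client in the logs usually starts with counting requests per source; grouping by network prefix also surfaces crawlers spread across a range. A minimal sketch, assuming each log line starts with the client IP as in the common and combined access-log formats (haproxy's default log format differs and would need its own parsing). `top_sources` and the /24 and /48 prefix lengths are hypothetical choices.

```python
import ipaddress
from collections import Counter
from typing import Iterable, List, Tuple


def top_sources(lines: Iterable[str], n: int = 10,
                v4_prefix: int = 24, v6_prefix: int = 48) -> List[Tuple[str, int]]:
    """Count requests per network prefix and return the n busiest.

    Assumes each line starts with the client IP, as in the common and
    combined access-log formats; other lines are skipped.
    """
    counts: Counter = Counter()
    for line in lines:
        first_field = line.split(" ", 1)[0]
        try:
            addr = ipaddress.ip_address(first_field)
        except ValueError:
            continue  # not an IP address: garbage or a different log format
        prefix = v4_prefix if addr.version == 4 else v6_prefix
        counts[str(ipaddress.ip_network(f"{addr}/{prefix}", strict=False))] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    sample = [
        '198.51.100.7 - - [01/Jan/2025:00:00:00 +0000] "GET /a HTTP/1.1" 200 512',
        '198.51.100.9 - - [01/Jan/2025:00:00:01 +0000] "GET /b HTTP/1.1" 200 512',
        '203.0.113.5 - - [01/Jan/2025:00:00:02 +0000] "GET /c HTTP/1.1" 200 512',
    ]
    print(top_sources(sample))  # [('198.51.100.0/24', 2), ('203.0.113.0/24', 1)]
```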
