I may be wrong, but my understanding of the firehose is it is pre-moderation; don’t you worry that your workload may read horrific image content?
Two reasons I'm not too worried about it:
1. This particular script is client-side, so it reads directly from the Jetstream WebSocket (rather than any service that I host).
2. I also have another server-side version that does store data. But in both cases, the script only captures the ATProto URI and alt text, not the image content itself. The media link is present in the Firehose data but I don't record or fetch it.
Some of the resulting data does end up being moderated later, e.g. when I went to review suspected bots, some of the accounts had already been banned on the main Bluesky instance.
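To make the "URI and alt text only" point concrete, here is a minimal sketch of that kind of filter. It is not the actual script: the function name `extract_alt_text` is hypothetical, and it assumes the Jetstream JSON event layout (`kind`, `did`, and a `commit` object carrying `operation`, `collection`, `rkey` and `record`) plus the `app.bsky.embed.images` embed shape. The image blob reference is present in the event but is never read, stored, or fetched.

```python
from typing import Optional


def extract_alt_text(event: dict) -> Optional[dict]:
    """Return the post's AT URI and its non-empty alt texts, or None.

    Assumes a decoded Jetstream JSON event. Only the `alt` strings are read
    from each image entry; the blob reference under `image` is ignored.
    """
    if event.get("kind") != "commit":
        return None

    commit = event.get("commit") or {}
    if commit.get("operation") != "create":
        return None
    if commit.get("collection") != "app.bsky.feed.post":
        return None

    embed = (commit.get("record") or {}).get("embed") or {}
    # Quote posts with media wrap the images embed under "media".
    if embed.get("$type") == "app.bsky.embed.recordWithMedia":
        embed = embed.get("media") or {}
    if embed.get("$type") != "app.bsky.embed.images":
        return None

    # Keep only non-empty alt strings; never touch img["image"].
    alts = [img.get("alt", "") for img in embed.get("images", [])]
    alts = [alt for alt in alts if alt]
    if not alts:
        return None

    uri = f"at://{event['did']}/{commit['collection']}/{commit['rkey']}"
    return {"uri": uri, "alts": alts}
```

In a client-side reader, each WebSocket message would be JSON-decoded and passed through a function like this, so nothing beyond the URI and alt strings is kept.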