Personal adventures in software engineering. And sometimes other things.

Bluesky bot deep dive

As you may know, I made a Bluesky bot.

So let’s go over the things I learned about ATProto, Bluesky and their API, and Heroku, as well as the weird things I ended up having to do in the code to make things work.

Stuff to know about Heroku

Before we cover the other things, let’s talk about Heroku, because the way Heroku operates dictated several of the bot’s design decisions: how it gets built and deployed, how it is scheduled, and what it can keep in memory while it is watching for and making posts.

Heroku builds, deploys, and provides environment vars for me

Heroku does a few things that are extremely convenient if not absolutely required for this use case:

  1. they’ll monitor a GitHub repo branch to pull, build, and deploy
  2. they have what they call Buildpacks, one of which supports building with tsc
  3. they have their own support for environment variables which are automatically reloaded into a fresh worker when they change

So the end result is that locally I’m just running ts-node for testing, but Heroku is pulling down a branch of my repo, running npm install and tsc to get a deployable package, putting it on a tiny instance, and then running npm start to actually get the worker going. Secrets are pulled from environment variables so I can avoid hard-coding things I’d prefer people didn’t find just by searching through GitHub (including the service URL).
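To illustrate the "secrets come from environment variables" part, here is a minimal sketch of a fail-fast config loader. The helper names (`requireEnv`, `loadBotConfig`) and the variable names (`BSKY_SERVICE_URL`, `BSKY_IDENTIFIER`, `BSKY_APP_PASSWORD`) are made up for this example and are not necessarily what the bot uses.

```typescript
// Hypothetical helper: read a required setting from an environment-like map.
type Env = Record<string, string | undefined>;

function requireEnv(name: string, env: Env): string {
    const value = env[name];
    if (value === undefined || value === '') {
        throw new Error(`Missing required environment variable: ${name}`);
    }
    return value;
}

// Hypothetical variable names; the real bot may call them something else.
function loadBotConfig(env: Env) {
    return {
        service: requireEnv('BSKY_SERVICE_URL', env),
        identifier: requireEnv('BSKY_IDENTIFIER', env),
        password: requireEnv('BSKY_APP_PASSWORD', env),
    };
}
```

In a worker this would probably be called once at start-up as `loadBotConfig(process.env)`, so a missing secret shows up in the logs right away instead of at the first post.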

Procfile basics

Heroku expects your project to include a Procfile, which tells Heroku how to run your application. You can run a website or a worker and, optionally, a clock. A website will only be started up when your site is requested (super cool for hobby sites like this one that never see traffic), while a worker can run forever or be periodically started by a clock. For my purposes, I have two options:

  1. worker only: start the app, run forever, it manages its own internal run interval
  2. clock + worker: start the clock, run forever, clock defines the run interval for instantiating and running a worker

So here’s the thing: a clock runs 24/7/365 and uses compute hours the whole time, and then your worker uses more whenever it’s spawned. So if I already want to run my worker every minute or so, I’m not saving any compute hours by running a clock and only periodically running my worker. To keep my hours low, I just set up a single long-running worker:

Procfile
worker: npm start

Then if I end up hacking more stuff onto my worker, I can just refactor to keep myself sane but continue to use the single worker.
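A minimal sketch of option 1, where the worker owns its schedule, might look like the following. The names `runForever` and `sleep` are assumptions for illustration, and the `shouldStop` callback exists only so the loop can be exercised in a test.

```typescript
// Resolve after the given number of milliseconds.
function sleep(ms: number): Promise<void> {
    return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical loop: run one task, wait, repeat until shouldStop() says so.
// Errors are logged instead of rethrown so one bad run doesn't kill the worker.
async function runForever(
    task: () => Promise<void>,
    intervalMs: number,
    shouldStop: () => boolean = () => false,
): Promise<void> {
    while (!shouldStop()) {
        try {
            await task();
        } catch (err) {
            console.error('run failed', err);
        }
        await sleep(intervalMs);
    }
}
```

The worker entry point would then be something like `runForever(checkAndPost, 60_000)`, with `checkAndPost` standing in for whatever the bot does on each run. Awaiting the task before sleeping means two runs never overlap.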

Workers get restarted periodically

The only real wrinkle here is that Heroku arbitrarily, about once a day, restarts worker instances. This is beneficial in that I don’t have to worry about memory management that might require periodic restarts, but less beneficial in that anything that I need to keep track of in memory will of course be periodically wiped.

I basically just figured I could overcome this and forged ahead.

My very elemental understanding of the AT Protocol

Okay, so Bluesky is one of those fancy modern social applications that is decentralized by way of the Authenticated Transfer Protocol (or ATProto for short).

I’m not going to dive deeply into this since

  1. it’s just plain not my area of expertise
  2. even my mental model for it is fuzzy and I don’t care to clarify it right now
  3. I’m not the type to build something on this

But the gist is something like: a common protocol lets you build arbitrary applications backed by shared data structures, which allows a central, well-known access point (like Bluesky) to accept your authentication and access your data because it looks just like theirs, and so aggregate it in their own site (and vice versa). It’s a pretty rough, probably partially incorrect understanding, I know, and that’s fine.

In order to use Bluesky’s API, you basically need to understand that the client is not a Bluesky client, it’s an ATProto client, and the data structures are not lean because they’re intended to be generic and used widely for whatever your particular application is. So a Bluesky bot that simply posts random text periodically (like mine) is pretty straightforward, but the way that things are related is not as straightforward as I had thought it would be.

Interfacing with Bluesky and their API

Authentication and sessions

Authentication and re-authentication are very straightforward. Create a Bluesky account for your bot, generate an app password for it, authenticate with the ATP agent using that info stored in secrets, and you’re set:

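import { AtpAgent, AtpSessionData, AtpSessionEvent } from '@atproto/api';

// keep the latest session around so it can be resumed later
let atpSessionData: AtpSessionData | undefined = undefined;
const agent = new AtpAgent({
    service: 'https://bsky.social',
    // the agent calls this whenever the session changes
    persistSession: (event: AtpSessionEvent, session?: AtpSessionData) => {
        atpSessionData = session;
    }
});
// ...snip...
await agent.login({ identifier: id, password: pass });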

Then if we want to resume a session rather than do a full login call:

await agent.resumeSession(atpSessionData);

I’m not sure how important it is that we resumeSession over login but I was having trouble with posts over multiple runs, and this is one of the things I added support for.
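One way to combine the two calls is to try the saved session first and fall back to a full login. The sketch below is an assumption about how that could be wired up, not the bot's actual code: `ensureSession` is a made-up helper, and `SessionAgent` is a stand-in interface covering just the two agent methods shown above so the example stays self-contained.

```typescript
// Minimal shape of the two agent methods used here; a stand-in for the real
// client so this sketch does not need the library installed.
interface SessionAgent<S> {
    login(opts: { identifier: string; password: string }): Promise<unknown>;
    resumeSession(session: S): Promise<unknown>;
}

// Hypothetical helper: resume the saved session if there is one, and fall back
// to a full login when there is none or resuming fails.
async function ensureSession<S>(
    agent: SessionAgent<S>,
    saved: S | undefined,
    credentials: { identifier: string; password: string },
): Promise<'resumed' | 'logged-in'> {
    if (saved !== undefined) {
        try {
            await agent.resumeSession(saved);
            return 'resumed';
        } catch (err) {
            console.warn('resumeSession failed, logging in again', err);
        }
    }
    await agent.login(credentials);
    return 'logged-in';
}
```

Since the saved session only lives in memory, the first run after every restart would take the login path.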

There are additional hoops to jump through to get post content

This is a problem I ran into but never went to the trouble of solving, since I had no interest in doing the additional work for proper OAuth when a simple sign-in was working for posting. I initially thought I could get my own post history, but I could only get records with DIDs. Checking the content of those posts would mean extra work to pass the DIDs off and fetch the content (which must be possible, but I assume only through OAuth endpoints), so I had to come up with a way to determine what I had and had not posted that didn’t actually require a diff.

Since I knew I had a long-running worker, I ended up simply noting the start-up time for my worker and keeping a record of handled advisory IDs. Once all was said and done, I had a check like this:

const insideRunInterval = postedInIntervalMs(serviceAdvisory.Timestamp, RUN_INTERVAL_MS);
const insideStartUpIntervalAndNotPosted = postedSinceStartUp(serviceAdvisory.Timestamp) && knownPostedIds.indexOf(serviceAdvisory.Id) < 0;
return insideRunInterval || insideStartUpIntervalAndNotPosted;

It makes it possible for the bot to miss posts, but it’s rather unlikely so long as it’s running all the time.
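The two helpers in that check are not shown, so here is a guess at what they could look like. Everything in this sketch is an assumption: the `Advisory` shape, the 60-second `RUN_INTERVAL_MS`, the explicit `now` and `startedAt` parameters (added so the logic can be tested), and the use of a `Set` instead of an array for the posted IDs.

```typescript
// Guessed shape of an advisory; only the two fields the check uses.
interface Advisory {
    Id: string;
    Timestamp: Date;
}

const RUN_INTERVAL_MS = 60_000;           // assumed run interval
const knownPostedIds = new Set<string>(); // in memory, so wiped on every restart

// True if the advisory was published within the last intervalMs.
function postedInIntervalMs(timestamp: Date, intervalMs: number, now: Date): boolean {
    const age = now.getTime() - timestamp.getTime();
    return age >= 0 && age <= intervalMs;
}

// True if the advisory was published after this worker started.
function postedSinceStartUp(timestamp: Date, startedAt: Date): boolean {
    return timestamp.getTime() >= startedAt.getTime();
}

// Same shape as the check above, with the clock passed in explicitly.
function shouldPost(advisory: Advisory, now: Date, startedAt: Date): boolean {
    const insideRunInterval = postedInIntervalMs(advisory.Timestamp, RUN_INTERVAL_MS, now);
    const insideStartUpIntervalAndNotPosted =
        postedSinceStartUp(advisory.Timestamp, startedAt) && !knownPostedIds.has(advisory.Id);
    return insideRunInterval || insideStartUpIntervalAndNotPosted;
}
```

The gap this leaves is the one described above: an advisory published while the worker is down and older than one run interval by the time it comes back matches neither branch.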

I ended up having to work in my own rate limiter

Like any sane API, public or otherwise, Bluesky has rate limits. My initial look at these indicated that, depending on how they bucket calls, I could possibly just get away with rapid-fire posting whenever advisories are posted. Sometimes ten or more can be posted, and my bot is already operating on a delay, so two or three might have been naturally posted since their site was last updated.

I found that when I was posting a bunch, I would actually get responses from Bluesky indicating success, but then the posts would not be there. I assumed this was due to rate limits, so I give to you the weirdest thing that I think is in this code:

return Promise.all(posts.map(async (post, index) => {
    // force a wait: the Nth post is delayed by N * 500 ms
    await new Promise(resolve => setTimeout(resolve, 500 * index));

    await agent.post({ text: post.message });
    return post.id;
}));

Yep, I used setTimeout to force a 500 ms interval between posts. But it worked!
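For comparison, a sequential version of the same idea could look like the sketch below. This is an alternative, not what the bot does: `postSequentially` and `PendingPost` are made-up names, and `send` stands in for something like `(text) => agent.post({ text })` so the example runs without the real client.

```typescript
// Assumed shape of a queued post: an advisory ID plus the text to send.
interface PendingPost {
    id: string;
    message: string;
}

// Post one at a time, pausing delayMs between posts, and return the IDs that
// were sent. If a send fails, the returned promise rejects and later posts
// are not attempted.
async function postSequentially(
    posts: PendingPost[],
    send: (text: string) => Promise<unknown>,
    delayMs: number,
): Promise<string[]> {
    const postedIds: string[] = [];
    for (let i = 0; i < posts.length; i++) {
        if (i > 0) {
            await new Promise((resolve) => setTimeout(resolve, delayMs));
        }
        await send(posts[i].message);
        postedIds.push(posts[i].id);
    }
    return postedIds;
}
```

The difference from the staggered `Promise.all` is that the pause starts after the previous request finishes, so a slow response can't squeeze two posts closer together than the delay.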

Other notes

Dates reflecting a certain timezone

The public API I was using was providing ready-to-print timezone-adjusted strings, so I ended up having to use timezone-adjusted nows for the server to compare and figure out if things need to be posted. Little bit of a weird quirk.

So, to get an LA timezone-adjusted now, you end up with this sort of grisly thing:

export function getPtNow() {
    return new Date(new Date().toLocaleString('en-US', { timeZone: 'America/Los_Angeles' }));
}

But then I was able to just convert the API’s timestamps to Dates, have my own timezone-adjusted now and then everything was fine.
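If re-parsing a locale string feels fragile, a variation on the same trick is to ask `Intl.DateTimeFormat` for the individual date parts and build the Date from numbers. This is a sketch of an alternative, not the code the bot runs; `getZonedNow` is a made-up name, and the optional `instant` parameter is there so the function can be tested against a fixed moment.

```typescript
// Build a Date whose local fields (getHours(), getDate(), ...) show the wall
// clock time in the given IANA time zone at the given instant.
function getZonedNow(timeZone: string, instant: Date = new Date()): Date {
    const parts = new Intl.DateTimeFormat('en-US', {
        timeZone,
        hour12: false,
        year: 'numeric',
        month: 'numeric',
        day: 'numeric',
        hour: 'numeric',
        minute: 'numeric',
        second: 'numeric',
    }).formatToParts(instant);

    const get = (type: string): number => {
        const part = parts.find((p) => p.type === type);
        if (!part) {
            throw new Error(`Missing date part: ${type}`);
        }
        return Number(part.value);
    };

    // Some runtimes report midnight as hour 24 when hour12 is false.
    const hour = get('hour') % 24;
    return new Date(get('year'), get('month') - 1, get('day'), hour, get('minute'), get('second'));
}
```

As with `getPtNow`, the result is not a true instant, just a Date carrying another zone's wall clock time, so it should only be compared against Dates built the same way (like the ones parsed from the API's strings).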

Logging

Originally I followed what I’d call the hobby standard, which is just a bunch of console.logs to help me keep track of operations. Yes, we should all use the debugger, no, we do not always feel like it’s a good use of time. Once all was said and done, after troubleshooting locally and then “in prod,” I ended up with something far more robust than I originally thought I’d want:

export class Logger {

    private readonly logLevel: Logger.LogLevel;

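    constructor(logLevel: string | undefined) {
        // Fall back to DEBUG when the level is missing or not a known name;
        // an unknown name would otherwise leave logLevel undefined and break shouldLog().
        const key = logLevel?.toUpperCase() as keyof typeof Logger.LogLevel | undefined;
        this.logLevel = (key && Logger.LogLevel[key]) || Logger.LogLevel.DEBUG;
    }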

    debug(message?: any, ...optionalParams: any[]): void {
        this.shouldLog(Logger.LogLevel.DEBUG) && console.debug(`DEBUG ${message}`, ...optionalParams);
    }

    // ...snip info, warn, error...

    shouldLog(level: Logger.LogLevel): boolean {
        return level.valueOf() >= this.logLevel.valueOf();
    }
}

export namespace Logger {
    export enum LogLevel {
        DEBUG = 1,
        INFO = 2,
        WARN = 3,
        ERROR = 4
    }
}

Log level is set in Heroku’s environment variables so that I can, without a deploy, adjust the level and monitor things.

Conclusion

It was harder to get things working with the service API I ended up using than to post things to Bluesky. Overall dumb simple, easy to get started, probably much more complex if I wanted to write a whole client (I don’t).

At this point, I’ve just been thinking about other things that I might want to use my worker for…