Personal adventures in software engineering. And sometimes other things.

I made a Bluesky bot

References / TLDR

The bot posts to Bluesky here.

The repo is here.

I’ve been trying to move away from Twitter.

I don’t want to get too far into the weeds here, but Twitter is just filled with garbage these days. The things it thinks I want to see no longer match what I actually want, and I don’t like being there, so I’m out. However…

One thing Twitter is still the best at is getting live updates for real-life things I want to know about minute-by-minute. For me, that’s train schedules. If I’m catching a train, I can follow an appropriate Twitter account and get an update when my train is 5, 10, or 60 minutes late. And that’s very useful information.

But Metrolink only posts to Twitter, and whatever tooling they use causes a 5+ minute delay if you’re just checking their site manually (as far as I can tell, they’re using default Cloudflare cache settings, which is not great).

First things first: conceptually, what am I doing?

  1. Node application that runs on a schedule in the cloud
  2. Run a headless browser
  3. Load Twitter, scrape content, somehow figure out what we haven’t reposted
  4. Re-post to Bluesky
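Step 3’s “somehow figure out what we haven’t reposted” is the only stateful part of that plan. A minimal sketch of the bookkeeping might look like this, assuming each upstream update carries a stable `id` (an assumption). The `Update` type, `selectUnposted`, and `runOnce` are hypothetical names, and the fetch and post functions are injected placeholders, not any real Twitter or Bluesky API:

```typescript
// Hypothetical shape of one upstream update; assumes the source exposes a stable id.
interface Update {
  id: string;
  text: string;
}

// Keep only updates not reposted yet, also dropping duplicates within the same batch.
// Does not mutate `seen`; ids are recorded only after a successful post (see runOnce).
function selectUnposted(updates: Update[], seen: ReadonlySet<string>): Update[] {
  const inBatch = new Set<string>();
  return updates.filter((u) => {
    if (seen.has(u.id) || inBatch.has(u.id)) return false;
    inBatch.add(u.id);
    return true;
  });
}

// One scheduled run: fetch, filter, repost. `fetchUpdates` and `post` are placeholders
// for the data source and the Bluesky client. Returns how many updates were posted.
async function runOnce(
  fetchUpdates: () => Promise<Update[]>,
  post: (text: string) => Promise<void>,
  seen: Set<string>,
): Promise<number> {
  const fresh = selectUnposted(await fetchUpdates(), seen);
  for (const u of fresh) {
    await post(u.text);
    seen.add(u.id); // mark as seen only once the post succeeded, so failures get retried
  }
  return fresh.length;
}
```

Keeping `seen` in memory is the simplest option, but it is lost whenever the process restarts, so a long-running bot would probably want to persist it somewhere.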

Some of the open questions at this point are:

  • What are my options for cloud running?
  • What headless browser stuff is available for Node?
  • How intense is it going to be to scrape Twitter?

First attempt: let’s scrape Twitter!

You know how insanely difficult it is to scrape Twitter? Like, they’ve really put the work in. I’m sure it’s possible, but for a random person just trying to get train updates, it’s not worth trying to scrape. But it did get my project going.

I started with:

  • Node project using ts-node so I can write clean, organized, smile-inducing code.
  • puppeteer as the headless browser to scrape stuff with

Got a Twitter account sorted, got a CSS selector sorted for pulling posts, got Puppeteer loading Twitter and logging in and then…my account got locked. I mean, clearly it was going to happen, but it happened within a couple of test runs over the course of an hour, and I don’t think that’s going to be sustainable.

Account creation is more involved than I can automate, and running this in the cloud means logging in and making the same request on a schedule, which is 100% going to get my account insta-locked every second or third run. If I want to repost with speed, that’s not viable. Next idea.

So I pointed puppeteer at Metrolink’s own site instead and got that fully working locally. But then I started looking at how I’d host this thing, and I realized I’d have to do some heavy lifting to build a host environment with Chrome installed for Puppeteer to use. Google Cloud Run uses Docker images, so I could probably just do that? But if I’m being honest, that just feels so heavy…

So instead I installed jsdom and made a simple fetch to see if I could get that working. And it did! That’s cool. But the page makes an async request for advisory info, so I need that to complete.

Wait a second…the request they make is just an open service call. And it returns plain JSON! This is great! I don’t need any of this shit!

Finally: let’s just use this open API call

So here’s what we have:

  1. Node app using TypeScript that needs to get built as part of the deploy process
  2. Simple fetches for JSON data

This makes for a very slim package.json with no heavy dependencies, so the host environment doesn’t need to do anything beyond running tsc and then npm start. Nice. Clean. Simple.
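Because the page’s advisory request is an open endpoint that returns plain JSON, the whole “scrape” collapses into a single fetch. A rough sketch, assuming Node 18+ (for the global `fetch`) and a hypothetical response that is an array of objects with `id` and `message` fields. The real endpoint URL and field names aren’t given here, so the `Advisory` type, `parseAdvisories`, and `fetchAdvisories` are illustrative only:

```typescript
// Hypothetical response shape; the real endpoint's field names may differ.
interface Advisory {
  id: string;
  message: string;
}

// Narrow untrusted JSON to Advisory[], silently dropping entries that don't match,
// so an upstream format change skips bad rows instead of crashing the bot.
function parseAdvisories(data: unknown): Advisory[] {
  if (!Array.isArray(data)) return [];
  return data.filter((item: unknown): item is Advisory => {
    if (typeof item !== "object" || item === null) return false;
    const rec = item as Record<string, unknown>;
    return typeof rec["id"] === "string" && typeof rec["message"] === "string";
  });
}

// Fetch the open JSON endpoint directly: no headless browser, no DOM.
// `url` is a placeholder for whatever service call the advisory page makes.
async function fetchAdvisories(url: string): Promise<Advisory[]> {
  const res = await fetch(url, { headers: { Accept: "application/json" } });
  if (!res.ok) {
    throw new Error(`advisory request failed with HTTP ${res.status}`);
  }
  return parseAdvisories(await res.json());
}
```

Validating the shape instead of casting with `as Advisory[]` is a deliberate choice here: an open, undocumented endpoint can change without notice.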

I still need a host, though. And I’m not sure I want to use Google Cloud Run, because then I have to do the whole Docker dance, and that’s just not a fun part of software development for me. Don’t get me wrong, Docker is very cool and useful and largely good, but I’m trying to be super lean here and I’d prefer to skip it.

Enter Heroku

I started looking for lightweight cloud hosts and stumbled upon the “free” options, where you get a set number of compute hours for a low flat fee. Heroku used to allow 1000 compute hours for free (that’s wild), but then word got out that you could run a Discord bot for free, and you can imagine what happened. Anyway, they still have a flat $5/mo plan with 1000 compute hours (a 31-day month is only 744 hours, so one app can run around the clock), plus they handle deploys for you straight from GitHub, plus their Node.js buildpack runs the TypeScript build for me, so now I have:

  1. TypeScript Node app that Heroku builds for me
  2. Auto-deploys from a target GitHub branch
  3. No Docker, no images, no special host setup
  4. Single file in the project that tells Heroku how I want the app to run
  5. Run 24/7/365 for a flat $5/mo
  6. A shiny host environment that’s running 24/7 that I can add whatever nonsense I want to

Nice. Clean. Simple.

The goods

The bot posts to Bluesky here. Unfortunately, because of the caching, the post delay is notable: 5 to 30 minutes. I was running every minute, but that felt like a lot more traffic than I wanted to put on them, so I settled on every four minutes once I got all the kinks worked out.
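Polling on a fixed interval from a single always-on process (on Heroku, something like a worker dyno) could look roughly like this sketch. `startPolling` and `POLL_INTERVAL_MS` are hypothetical names, and the guard is an assumption about what’s sensible rather than a description of the bot’s actual code: it skips a tick if the previous run hasn’t finished, so a slow request never stacks up overlapping runs.

```typescript
// Four minutes between polls, matching the interval described above.
const POLL_INTERVAL_MS = 4 * 60 * 1000;

// Run `task` immediately and then every `intervalMs`. If a run is still in progress
// when the next tick fires, that tick is skipped instead of overlapping it.
// Returns a function that stops further ticks.
function startPolling(task: () => Promise<void>, intervalMs: number): () => void {
  let running = false;
  const tick = async (): Promise<void> => {
    if (running) return;
    running = true;
    try {
      await task();
    } catch (err) {
      // Log and keep going: one failed request shouldn't kill a 24/7 process.
      console.error("poll failed:", err);
    } finally {
      running = false;
    }
  };
  void tick();
  const timer = setInterval(() => void tick(), intervalMs);
  return () => clearInterval(timer);
}
```

Usage would be a single call at startup, e.g. `startPolling(checkAndRepost, POLL_INTERVAL_MS)`, where `checkAndRepost` stands in for whatever fetch-and-post function the bot uses.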

The repo is here if you want to poke around. If I have the motivation, I’ll put together a post that dives into the code. It’s not the cleanest thing I’ve ever put together, but given the constraints of working on Heroku I’m pretty happy with it, and at this point it has no glaring issues that require any maintenance. It just works.