Backend & Systems Engineer
Sish
I like knowing why things break more than just making them work.
focus: concurrency · correctness · event-driven design
status: learning in public

I was eleven when I started writing scripts. Not to build anything useful; I just wanted to see how the stuff I used every day actually worked under the hood. That curiosity pulled me in further than I expected.

I taught myself through trial and error, mostly by breaking things and figuring out why. I gravitated toward backend work because that is where the real responsibility lives. Right now I am mostly thinking about event-driven architecture and the problems that come with concurrent state: race conditions, ordering guarantees, and what happens when a system partially fails.

Almost everything I know came from building something and watching it fall apart in an interesting way. The question that stuck with me is not "what should this do" but "what happens when something goes wrong."

Stats · 2026
4
Production systems
~4.5 years
Writing software
Node.js
Primary runtime
Full Stack
Focus area
JavaScript · 4–5 yrs
TypeScript · 2–3 yrs
HTML · 4 yrs
CSS · 4 yrs
Python · 2 yrs
Bash / Shell · 2 yrs
Automation Platform
Mikami
A Discord bot I kept adding features to until the whole thing became impossible to touch. So I rebuilt it.
Modules register themselves at startup so the router never needs to know what features exist
Per-user locks stop two commands from stepping on each other's state
Rate limit responses get queued and retried at the right time instead of immediately
Open system docs →
Probabilistic Engine
AutoBets
A betting engine that broke in a way I did not expect, then taught me what concurrent state actually means.
Balance read happens inside the lock so it cannot go stale between check and use
Ledger write has to succeed before any balance is touched, no exceptions
Every transaction is replayable from the ledger
Open system docs →

Case studies.

These are things I built to figure out how software actually behaves once it is running. None of them stayed simple. Each one broke in a way I did not predict, and that is where the learning happened.

Automation Platform
Mikami
Started as a Discord bot, ended up as a lesson in what software architecture actually means when a codebase starts fighting back.
Module isolation so adding one feature cannot break another
Rate limit scheduling that treats 429s as timing signals, not errors
Per-user concurrency locks on shared state
Read the case study →
Probabilistic Engine
AutoBets
A wager system that exposed every wrong assumption I had about concurrent state and made me rebuild the whole pipeline.
TOCTOU prevention by reading inside the lock, not before it
Ledger write as a hard precondition before any state changes
Failure paths that exit cleanly before touching anything
Read the case study →
Generation System
Dork Generator Pro
A tool that takes a small set of inputs and expands them into thousands of structured search queries. The interesting part was keeping the output useful at scale.
Combinatorial query construction from pools of operators and templates
Deduplication that holds across large output sets
Generation logic, config, and interface kept completely separate
Read the case study →
CLI Toolkit
Dorky-Dorker
A CLI toolkit for generating and processing search queries. What started as one script became eight isolated tools when the data kept growing.
Template placeholders that multiply the output space without extra input
Staged pipeline processing to avoid memory pressure at scale
Each tool runs as its own subprocess so failures stay contained
Read the case study →
Automation · Discord API · Node.js
Mikami
Visit
A Discord automation platform and what it taught me about writing software that can actually change.
01
Problem

The first version of Mikami worked fine until I tried to add something new. Every feature was tangled up with every other one. Fixing a bug in the moderation logic would quietly break the economy system. There was no safe place to make a change because everything knew too much about everything else.

I did not have a bot with separate features. I had a single file that happened to respond to Discord events. The frustration was specific enough that I wanted to actually understand what people meant when they talked about software architecture, not just read about it.

02
System Design

The redesign flipped the dependency direction. Instead of the router importing modules, modules register themselves into the router at startup. The router just dispatches events to whatever handlers have been registered. It never needs to know what features exist. Adding a module means writing one file and calling register. Nothing else changes.

// modules announce themselves at startup
loader.register([
  ModerationModule, // owns its handlers, validators, data
  EconomyModule,    // isolated, knows nothing about Moderation
  AnalyticsModule,  // observes events, never blocks them
])

event → router.resolve(cmd)  // find the registered handler
validator.check(args)        // validate before any module sees it
middleware.permissions()     // check access
module.execute(ctx)          // dispatch to the right module
analytics.track(result)      // async, never on the critical path
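The registration pattern above can be sketched as a runnable toy. Everything here is illustrative — the module shapes and names are mine, not Mikami's actual code — but it shows why the router never has to change when features are added:

```javascript
// Minimal self-registering module router (a sketch, not Mikami's real code).
// The router owns a map of command handlers; modules add themselves at startup.
class Router {
  constructor() {
    this.handlers = new Map();
  }
  register(modules) {
    // each module declares the commands it owns; the router never imports them
    for (const mod of modules) {
      for (const [cmd, handler] of Object.entries(mod.commands)) {
        this.handlers.set(cmd, handler);
      }
    }
  }
  dispatch(cmd, ctx) {
    const handler = this.handlers.get(cmd);
    if (!handler) return { ok: false, error: `unknown command: ${cmd}` };
    return handler(ctx);
  }
}

// hypothetical modules — each one is a self-contained unit
const EconomyModule = {
  commands: { balance: (ctx) => ({ ok: true, balance: ctx.balance }) },
};
const ModerationModule = {
  commands: { warn: (ctx) => ({ ok: true, warned: ctx.user }) },
};

const router = new Router();
router.register([EconomyModule, ModerationModule]);

const result = router.dispatch('balance', { balance: 100 });
// adding a new module later means one entry in register(); dispatch() never changes
```

The dependency points inward: modules know about the router's register contract, the router knows nothing about any module.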
03
Key Challenge

The rate limiting problem was the most annoying to figure out. The first time Mikami hit a 429, I caught the error and retried right away, which produced another 429, which I retried again. I made it worse before I understood what was actually happening.

A 429 is not a failure. It is the API telling you exactly when to send that request. Once I saw it that way, the fix was obvious: give each route its own queue, read the retry-after header, and hold the request until the bucket opens back up. Nothing gets dropped. The rate limiter went from causing cascading errors to being invisible.
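The bucket logic can be shown with a simulated clock. This is a sketch under my own naming, not the real rate limiter, and it simplifies Discord's bucket semantics down to a single "next free time":

```javascript
// Per-route rate-limit bucket with a simulated clock (a sketch, not Discord's
// full bucket semantics). A 429 sets the bucket's next-free time from
// retry-after; later requests are scheduled at or after that time, never sooner.
class RouteBucket {
  constructor() {
    this.nextFreeAt = 0; // ms timestamp when the bucket reopens
  }
  // returns the earliest time this request may be sent
  scheduleAt(now) {
    return Math.max(now, this.nextFreeAt);
  }
  // on a 429, the retry-after value says exactly when to try again
  on429(now, retryAfterMs) {
    this.nextFreeAt = now + retryAfterMs;
  }
}

const bucket = new RouteBucket();
const t1 = bucket.scheduleAt(1000); // bucket open → send immediately (1000)
bucket.on429(1000, 500);            // API says: retry in 500 ms
const t2 = bucket.scheduleAt(1001); // held until 1500, not retried at 1001
```

Because each route gets its own bucket, a burst on one endpoint never delays requests on another.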

04
What I Learned

I had read about dependency inversion before building Mikami. I did not really get it until I had to live with the alternative. When the core does not depend on its features, you can add, remove or replace a feature without touching anything else. The direction of the dependency matters as much as the dependency itself.

Gateway disconnects are not edge cases. Discord has a resume protocol because disconnects are expected. Treating reconnection as the default path rather than the fallback changed how reliable the whole thing felt in practice.

Game Engine · Transactions · Concurrency
AutoBets
GitHub
A wager system that made me rebuild everything once I understood what concurrency actually does to shared state.
01
Problem

AutoBets started simple. User submits a wager, system rolls an outcome, balance updates. It worked fine when only one person was betting. When two users submitted wagers at the same moment, both pipelines read the same balance, both passed validation, both committed. The balance was decremented twice from the same starting value and nothing threw an error.

The system did exactly what I told it to. The issue was I had not thought about what it means for two things to touch the same state at the same time.

02
System Design

Every wager became a transaction where each step only runs if the previous one succeeded. The most important decision was where to put the balance read relative to the lock. It has to happen inside the lock. Reading before acquiring it means the value you are acting on can be stale by the time you hold it.

// every step depends on the one before it
wager.submit(userId, amount) →
validator.check(amount)   // fast reject, no lock yet
lock.acquire(userId)      // lock first
balance.read(userId)      // read inside lock, always fresh
validator.checkBalance()  // validate against real current state
rng.generate(params)
ledger.write({...})       // must succeed, precondition for mutation
balance.mutate(userId)
lock.release(userId)
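The race itself can be reproduced with a deterministic simulation. This is not the real pipeline — it just replays the two interleavings by hand: reads taken before serialization versus each read-check-commit running as one indivisible unit:

```javascript
// Deterministic simulation of the TOCTOU race (not the real AutoBets code).
// "Unlocked": both wagers read the balance before either commits — the
// interleaving that produced the double-spend.
function unlockedDoubleWager(startBalance, amount) {
  const readA = startBalance; // pipeline A reads
  const readB = startBalance; // pipeline B reads the SAME value
  let balance = startBalance;
  let committed = 0;
  if (readA >= amount) { balance = readA - amount; committed++; } // A commits
  if (readB >= amount) { balance = readB - amount; committed++; } // B overwrites A
  return { balance, committed }; // two wagers accepted against one balance
}

// "Locked": each wager's read-validate-commit runs as one unit, so the
// second wager sees the first one's committed balance.
function lockedDoubleWager(startBalance, amount) {
  let balance = startBalance;
  let committed = 0;
  for (let i = 0; i < 2; i++) {
    const fresh = balance; // read INSIDE the critical section
    if (fresh >= amount) { balance = fresh - amount; committed++; }
  }
  return { balance, committed };
}

const broken = unlockedDoubleWager(100, 80); // { balance: 20, committed: 2 }
const safe = lockedDoubleWager(100, 80);     // { balance: 20, committed: 1 }
```

In the unlocked version both 80-unit wagers pass validation against a 100-unit balance: one decrement is silently lost and the system has effectively overdrafted without throwing anything.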
03
Key Challenge

The hardest thing to figure out was where the lock boundary had to be. My original code read the balance before acquiring the lock. That sounds fine until you think about what can happen in the gap between the read and the lock. Another pipeline runs, modifies the balance, releases. Now your read is stale and you are about to act on a number that no longer exists.

Moving the read inside the lock closed that gap completely. The second problem was the order of the ledger write and the balance update. I originally updated the balance first. If the ledger write then failed, the balance had changed with no record anywhere. Flipping the order meant a failed ledger write leaves everything clean. A failed balance update after the ledger write leaves an orphaned record that is findable and fixable. I chose the failure I could live with.
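The ordering decision can be sketched in a few lines. The names here are mine, not the real pipeline's, but the shape is the point: the ledger write runs first, and a throw there means the balance is never touched:

```javascript
// Ledger-first ordering sketch (illustrative names, not the real pipeline).
// The ledger write is a hard precondition: if it throws, the balance is
// never mutated and the system stays clean.
function commitWager(state, userId, delta, ledgerWrite) {
  const before = state.balances.get(userId);
  ledgerWrite({ userId, delta, balanceBefore: before, balanceAfter: before + delta });
  // only reached if the ledger write succeeded
  state.balances.set(userId, before + delta);
}

const state = { balances: new Map([['u1', 100]]), ledger: [] };
const okWrite = (rec) => state.ledger.push(rec);
const failWrite = () => { throw new Error('ledger unavailable'); };

commitWager(state, 'u1', -30, okWrite); // balance 70, one ledger record

let failed = false;
try {
  commitWager(state, 'u1', -30, failWrite);
} catch (e) {
  failed = true; // ledger failed → balance untouched, state clean
}
```

Flipping the two calls recreates the original bug: a failed record write after a successful mutation leaves a changed balance with no trace of why.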

04
What I Learned

Concurrency bugs are quiet by nature. The system does not know it is wrong. That is what makes them genuinely worse than regular bugs. The thing that stuck with me is that a lock and a read are not two separate decisions. They are one unit. Separating them creates the exact window you were trying to close.

The ordering lesson was broader. When you sequence operations you are choosing which failures are possible. There is no arrangement that makes all failures go away. The question is which failure leaves you in a state you can actually reason about. Silent corruption is always the worst outcome, not because it is the most dramatic but because you might not even know it happened.

Generation System · Python · PyQt6
Dork Generator Pro
GitHub
A search operator tool that turns a small input set into thousands of structured queries. The real problem was keeping that output useful rather than just large.
01
Problem

Search engines have advanced operators such as site:, filetype:, and inurl: that are useful for security research and SEO. One keyword combined with a reasonable set of operators and parameters can produce hundreds of distinct patterns. Writing them by hand gets tedious fast.

I wanted to automate the construction and generate thousands of structured, deduplicated queries from a small input set. The interesting engineering problem was not the searching. It was the generation itself and how to keep the output meaningful as the numbers got large.

02
System Design

Three isolated layers. The generation engine handles the combinatorial logic and knows nothing about the interface. Configuration controls the engine without touching its internals. The PyQt6 interface handles input and output without any knowledge of how queries actually get built.

// generation pipeline
config.load(keyword, queryType, fileTypes, size)
engine.generate(config) →
template.select(queryType)        // pick a pattern shape
pools.sample(params, fileTypes)   // fill in the values
query.construct(template, values) // build the query string
dedup.check(query)                // reject if already seen
output.append(query)              // add to result set
repeat until output.size === config.size
03
Key Challenge

The problem was combinatorial explosion. When a template has multiple placeholders and each one draws from a list of values, the number of possible outputs grows multiplicatively. Four placeholder types with 20 options each give 160,000 combinations. The naive approach builds all of that in memory before writing anything, which at that scale just crashes.

I fixed it by expanding one placeholder type per pass and writing the intermediate output to disk before moving to the next pass. The working set in memory at any point is one intermediate file, not the final product. That kept memory bounded regardless of how large the target output was, and the generator could scale without any changes to the engine itself.
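The one-placeholder-per-pass idea looks like this as a toy. The template syntax and pools are illustrative, and arrays stand in for the intermediate files that would live on disk in the real tool:

```javascript
// Staged expansion sketch: expand one placeholder type per pass. Each array
// here stands in for an intermediate file on disk; only the current pass's
// input and output need to exist at any moment.
function expandStaged(template, pools) {
  let stage = [template]; // pass 0: the bare template
  for (const [placeholder, values] of Object.entries(pools)) {
    const next = []; // the next intermediate "file"
    for (const item of stage) {
      for (const value of values) {
        next.push(item.replaceAll(`{${placeholder}}`, value));
      }
    }
    stage = next; // the previous intermediate can now be discarded
  }
  return stage;
}

const queries = expandStaged('site:{site} filetype:{ext}', {
  site: ['example.com', 'example.org'],
  ext: ['pdf', 'xls', 'doc'],
});
// output size is the product of the pool sizes: 2 × 3 = 6
```

Each pass multiplies the set by one pool, so the final count is the product of the pool sizes, but memory is only ever proportional to a single stage rather than the full product.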

04
What I Learned

The quality of a generator's output is decided before generation starts, not during it. A generator sampling randomly from poorly designed pools produces a lot of technically unique results that are not actually useful. The interesting work is in what goes into the pools and how templates are shaped. Everything after that is just mechanics.

Staged writes are not an optimisation. When the output space is too large to hold in memory, the right structure is to never let it exist all at once. That is the correct model, not a workaround.

CLI Toolkit · Python · Data Processing
Dorky-Dorker
GitHub
A CLI toolkit for generating and processing search queries. What started as one script grew into eight when the data and the feature needs kept expanding.
01
Problem

The Dork Generator produced large query datasets and once I had them I needed to work with them — shuffle, split, extract, check. Each operation became its own script. After a few of them I had a pile of disconnected tools with no consistent way to run them or chain them together.

The friction was not any individual tool. It was the lack of structure around them. I wanted to understand how you keep a growing set of small programs usable as the collection gets bigger without each one turning into its own maintenance problem.

02
System Design

Each tool is a self-contained script with one job. A CLI launcher acts as a dispatcher. It shows a menu, reads the selection, and invokes the chosen module as a subprocess. The launcher has no idea what the tools do. It only knows their names and entry points. Adding a new tool means writing one script and registering its name. The launcher never changes.

// launcher dispatches to isolated subprocesses
launcher.run() →
menu.display(modules)         // list registered tools
menu.select()                 // user picks one
subprocess.run(module.path)   // invoke as isolated process
module executes independently // no shared state with launcher
return to menu on exit

// registered modules
generate · shuffle · reverse · extract
split · suggest · check · numgen
03
Key Challenge

The main design decision was whether modules should share state with the launcher or run as fully isolated subprocesses. Shared state is easier to wire up. Modules can read a common context, pass data around, build on each other's output. But it means every module is coupled to every other one through that shared context, and a bug in one can silently corrupt something another module depends on.

Subprocess isolation makes isolation the default. Each module runs in its own process, reads its own inputs, writes its own outputs and exits. There is no shared state to corrupt. The tradeoff is you cannot directly pass data between modules at runtime. But a broken module fails visibly in its own process without touching anything else. That is a trade I will take every time.

04
What I Learned

The subprocess dispatch model made the toolkit easy to extend almost by accident. Because the launcher has no knowledge of the modules, adding a tool requires zero changes to existing code. Once I understood why that worked, I could think about it deliberately rather than stumbling into it. A system is easy to extend when adding a feature means writing something new rather than modifying something existing.

Isolation also changes what failures look like. Shared state systems fail in tangled ways that are hard to trace. Isolated systems fail locally and visibly. When debugging eight independent tools, knowing exactly which one failed and exactly what state it was in makes an enormous difference.

How I think
through problems.

Problems I ran into while building, did not immediately understand, and had to actually work through. Writing them down is how I figure out what I learned.

Problem

In AutoBets, two simultaneous wagers from different users could both pass balance validation and both commit because each one read a balance the other had not modified yet. The resulting state was wrong. Nothing had thrown an error.

This is a TOCTOU race. The state you checked at the time of check is not the state you act on at the time of use. The window between those two moments is where the bug lives.

Reasoning

A wager is not a single calculation. It is a sequence of steps that need to behave as one indivisible unit. Once I saw it that way the fix became obvious. The lock has to come before the read, not after. The read and the mutation have to happen inside the same window with nothing in between.

Solution
  1. Acquire the per-user mutex lock before reading anything
  2. Read and validate the balance inside that window
  3. Generate and validate the outcome
  4. Write the ledger record
  5. Mutate the balance
  6. Release the lock

The lock comes first. Every step that depends on the balance happens inside it. A second wager for the same user cannot start until the first has fully committed and released.

Problem

Discord enforces rate limits per route bucket. When Mikami hit them, I caught the 429 and retried immediately. Which produced another 429. Which I retried again. The backlog grew and the problem compounded. I made it worse before I understood it.

Reasoning

A 429 is the API telling you to wait a specific amount of time and try again at that exact moment. Once I thought of it that way, the right approach was clear. Treat outbound requests as a queue, give each route bucket its own cooldown state, and schedule retries at the right moment rather than immediately.

Solution
  • All outbound requests go through a central scheduler rather than being sent directly
  • Each route bucket tracks its own remaining count and reset window independently
  • A 429 causes the request to be requeued at the exact retry-after time, nothing gets dropped
  • Bursts of activity queue cleanly instead of producing cascading errors
Problem

Early AutoBets updated the balance first, then wrote the transaction record. I did not think about the ordering until I asked what happens if the record write fails. The balance would have changed with no trace of why. No way to detect it. No way to reconstruct it.

Silent inconsistency is the worst kind of failure. Not because it is dramatic but because it is invisible.

Solution

Reversing the order changed the failure mode entirely. The ledger write became a precondition. If it fails, nothing else runs. The balance is never touched without a record.

validation → RNG →
ledger.write()   // must succeed first
balance.mutate() // only reaches here if above ok

A failed ledger write leaves the system clean. A failed balance mutation leaves an orphaned record, which is detectable and fixable. Either way there is no silent inconsistency.

Problem

The first Dork Generator substituted all placeholder types in one pass and built the entire dataset in memory before writing anything. With four placeholder types drawing from lists of 20 to 50 values, outputs reached hundreds of thousands of entries. The process was killed by memory pressure before finishing.

Reasoning

The output does not need to fully exist before any of it is written. Each intermediate stage only needs to exist long enough to produce the next one.

Solution
  • One substitution pass per placeholder type, expand one variable, write to disk, move to next pass
  • Memory at any point is proportional to a single intermediate file, not the final product
  • Output can grow to any size without changing the engine
  • Each pass is independently inspectable, which makes debugging much easier
Problem

My first instinct for Dorky-Dorker was to have the launcher share state with its modules through a common context object. That made inter-tool communication easy but it also meant a bug in one module could silently corrupt state that another module depended on. Failures became entangled and tracing them was genuinely painful.

Reasoning

Shared state makes the failure surface of one module equal to the failure surface of the whole system. If a tool runs in isolation with its own process, its own inputs and its own outputs, its failure mode is local by construction.

Solution
  • Each module runs as an independent subprocess with no shared memory with the launcher
  • Modules communicate through files only, read inputs, write outputs, exit
  • A failing module crashes visibly in its own process without touching anything else
  • Adding a new tool requires zero changes to existing code, just register a name and write a script

What each system assumes to be true.

Every system rests on assumptions its design does not verify. Writing them down is how you start to know where it will break.

Mikami
  • Gateway delivers events in order within a session
  • Each user has at most one active command pipeline at a time
  • Module handlers are stateless and all state lives in the data layer
  • All API requests must respect Discord route bucket limits
  • Analytics events are non-critical and loss is acceptable but delay is not
AutoBets
  • Each user has a single active wager pipeline at any moment
  • Ledger writes must succeed before balance mutation with no exceptions
  • Balance reads must occur inside the lock window, never before acquiring it
  • RNG failure must exit before any state mutation occurs
  • Transaction records are immutable once written
Dork Generator Pro
  • Pool sizes must be large enough that collision probability stays low at target output size
  • Templates define shape only and pools define values and neither layer touches the other
  • Configuration is read-only to the engine and it never modifies its own config
  • Deduplication set is the source of truth for uniqueness and no query bypasses it
Dorky-Dorker
  • Each module is an isolated subprocess with no shared state with the launcher
  • Intermediate files between staged passes are the only cross-pass communication
  • Preset lists are the engine's only external dependency and are swappable without code changes
  • Launcher never imports module code and only invokes by path

Failure modes
and responses.

I find it useful to think through what can go wrong before it does. This is a record of the failure scenarios I have considered for these systems, what breaks, how the system responds, and what that response actually guarantees.

Failure Scenario · System Response · Guarantee
Network
Discord drops the gateway connection
Resume session using the last known sequence number. Discord's resume protocol replays missed events. If resume fails, full reconnect.
Guarantee: No events dropped on clean disconnect
Integrity
Ledger write fails mid-wager
Ledger write is a hard precondition for balance mutation. If the write fails, execution stops. Balance never changes. Clean state.
Guarantee: No balance mutation without a ledger record
Integrity
RNG generation fails
Outcome generation is validated before any state is touched. Failure exits the pipeline. No lock held. No balance mutated.
Guarantee: Wager rejected before any state mutation
Concurrency
Two wagers from same user simultaneously
Per-user mutex ensures only one pipeline runs per user. The second request blocks until the first commits, then reads the already-updated balance, so the check is TOCTOU-safe.
Guarantee: Strict serial ordering per user
Network
API request hits rate limit (429)
Scheduler reads the retry-after header and requeues the request. Not dropped. Not retried immediately. Route bucket cooldown is respected.
Guarantee: Every queued request eventually completes
Concurrency
User wagers more than their balance
Two-stage validation. Pre-lock check rejects obvious overdrafts fast. Second check runs inside the lock against the live balance. The in-lock check is what actually prevents overdraft under concurrency.
Guarantee: No overdraft under any concurrency level
Integrity
Generator output approaches combinatorial space size
Deduplication set rejects collisions. Pool sizes are designed so collision probability stays low well past the target output size. Generator keeps sampling until the target is reached.
Guarantee: Output contains no duplicate queries
Integrity
Staged expansion pass runs out of memory mid-generation
Each pass writes its output file before the next pass begins. A failed pass leaves the previous intermediate file intact. The run can be resumed from the last completed pass.
Guarantee: No data loss on partial expansion failure
Integrity
A Dorky-Dorker module crashes or exits with an error
Module runs as an isolated subprocess. Crash is contained to that process. Launcher stays running, returns to the menu, and can invoke any other module normally.
Guarantee: Module failure never affects the launcher or other tools
Integrity
Preset list file is missing or malformed at generation time
Engine validates preset files at startup before starting any substitution passes. A missing or unreadable file causes an immediate exit with a clear error. No partial output is written.
Guarantee: No silent generation with incomplete inputs

Notes on things
I had to think through.

Writing is how I test whether I have actually understood something. These are problems I ran into while building, written the way that made them click for me.

/notes/race-conditions
TOCTOU: why where the read sits matters as much as the lock itself

TOCTOU stands for Time-Of-Check to Time-Of-Use. It describes the gap between when you observe a piece of state and when you act on that observation. In that gap another operation can change what you saw.

In AutoBets, two users submitted wagers at the same time. Both pipelines read the same balance, both found it sufficient, and both committed. The balance got decremented twice from the same starting value because neither operation was aware of the other.

Adding a mutex is not enough on its own. The question is where the read happens relative to acquiring the lock. If you read before the lock, the value you're acting on can go stale by the time you hold it. The read has to happen inside the protected window to mean anything.

// wrong, read before lock
balance = read(userId)    // TOCTOU window opens here
lock.acquire(userId)
validate(balance, amount) // stale

// correct, read inside lock
lock.acquire(userId)
balance = read(userId)    // always fresh
validate(balance, amount) // safe

The ordering is the guarantee. The lock and the read are not two separate things. They are one unit. Once I understood that, the rest of the wager pipeline was straightforward to reason about.

/notes/discord-rate-limits
Rate limits as scheduling signals, not failures

Discord enforces rate limits per route bucket. Each endpoint has its own limit, its own window, and a retry-after header it sends back when you have exceeded it. The first time I hit a 429 in Mikami, I caught the error and retried. Which caused another 429. Which I retried again.

The shift that fixed it was realising a 429 is not an error in the usual sense. It is a scheduling instruction. The API is working exactly as intended and telling you to send that request at a specific point in the future. Treating it as a failure to retry immediately completely misses the point.

scheduler.enqueue(request) →
check bucket.remaining →
if depleted: wait(bucket.resetAfter) →
send request →
on 429: requeue after retry-after →
on success: decrement bucket.remaining

Structuring the outbound layer as a per-bucket queue solved both problems at once. Requests get sent at the right time, and a burst on one endpoint does not affect timing on another. Nothing gets dropped and everything gets scheduled.

/notes/transaction-ledgers
Ordering two operations to choose which failure mode you can live with

If you update the balance first and the ledger write then fails, you have a state change with no record of it. No error to surface, no audit trail, no way to know what the correct state should be. The inconsistency is silent by construction.

Reversing the order changes the failure mode. If the ledger write fails, nothing else runs and the balance is untouched. If the ledger write succeeds and the balance update then fails, there is an orphaned record. That is recoverable. You can detect it by comparing records to balances and replay or reverse it. You know exactly what happened.

// correct ordering
ledger.write({ userId, amount, outcome, balanceBefore, balanceAfter }) // record intent
balance.mutate(userId, delta) // only if write succeeded

// ledger fail → clean state
// balance fail → orphan, detectable, replayable

The principle that came out of this: sequence operations so the worst reachable failure leaves you with something you can reason about. Silent inconsistency is harder to deal with than a detectable error, even if the detectable error is messier at first.

/notes/combinatorial-generation
Why the output space of a generator has to be designed, not discovered

When you combine inputs from multiple pools, the output space grows multiplicatively. A template with four placeholder types each drawing from 30 values can produce 810,000 combinations. That sounds like abundance but the structure of that space determines whether the outputs are useful, and random sampling cannot fix a poorly designed space.

The problem I ran into was that a naive generator samples randomly and rejects duplicates. That works when the output size is small relative to the combinatorial space. As the target size approaches the space size, the generator spends more and more time producing duplicates and discarding them. If you ask for more outputs than the space contains, it loops forever.

The fix was recognising that deduplication is not where this problem should be solved. Deduplication catches collisions after the fact. The real solution is pool and template design that keeps the space large enough that collision probability stays negligible at any realistic target size. The generator's job is mechanical. The design work happens before it starts.

// collision probability rises as output approaches space size
space_size = |pool_A| × |pool_B| × |pool_C| × |templates|
target_size = what you asked for

// safe: target much smaller than space_size
collision_prob ≈ target² / (2 × space_size)

// danger zone: target approaches space_size
// generator slows, eventually loops indefinitely
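Putting numbers through the birthday-style estimate makes the danger zone concrete. The estimate is only a rough approximation, valid while the target is much smaller than the space; once it climbs past 1 it has stopped being a probability and the generator is deep in duplicate territory:

```javascript
// Numeric check of the birthday-style collision estimate (rough approximation,
// valid only while target is much smaller than the space).
function collisionProb(target, spaceSize) {
  return (target * target) / (2 * spaceSize);
}

const spaceSize = 30 ** 4; // four pools of 30 values each → 810,000 combinations

const safe = collisionProb(100, spaceSize);    // tiny — duplicates are rare
const risky = collisionProb(10000, spaceSize); // blows past 1 — the estimate
// has broken down, meaning the generator spends most of its time discarding
// duplicates at this target size
```

At a target of 100 queries the estimate is well under one percent; at 10,000 it is no longer meaningful as a probability, which is exactly the regime where a sample-and-reject generator starts to thrash.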

Staged expansion handles a different but related problem which is memory pressure. Expanding one placeholder type per pass and writing intermediate results to disk keeps memory proportional to one file. These are two separate concerns. Output space design prevents logical failure. Staged processing prevents resource failure.

/notes/process-isolation
Shared state makes failures entangled. Isolated processes make them local.

When I first designed Dorky-Dorker, the plan was to have the launcher maintain a shared context that modules could read from and write to. A tool finishes, leaves its results in the context, the next tool picks them up. Intuitive and easy to think about.

The problem showed up when something went wrong. A module that crashed or produced unexpected output did not just fail. It left the shared context in a partially modified state. The next module read that state, behaved unexpectedly, and produced output that was wrong in a way that looked plausible. Debugging meant tracing through several tools to find where the corruption started.

Shared state couples the failure modes of every component that touches it. One module's bug becomes every subsequent module's problem. The failure surface of the system is the union of all its parts.

// shared state model, failure propagates
launcher.context = {}
module_A.run(context) // corrupts context on error
module_B.run(context) // reads corrupted state, wrong output

// subprocess model, failure is contained
subprocess.run('module_a.py') // crashes in its own process
subprocess.run('module_b.py') // unaffected, reads its own inputs

Subprocess isolation makes each module's failure local by construction. A crash stays in that process. The launcher catches the exit code, reports it, and continues. No shared state is corrupted. Every other module is unaffected. The cost is that modules cannot share data in memory and communication goes through files. That is a real constraint but predictable local failures are worth it.

Get in touch.

Open to conversations about backend engineering, interesting system problems, or anything worth building. I respond to every message.

Currently Available

I am sixteen and still early in all of this. But I have a real foundation in backend architecture, a clear sense of how I approach problems, and a genuine interest in working on things that require careful thinking. If that sounds useful, reach out.

Backend Engineering Bot Development Systems Design Open Source