“Native” multi-process debugging#

tractor ships the thing every multiprocessing user has wished for and quietly assumed was impossible: a multi-process debugger that just works.

Drop await tractor.pause() — or, with greenback installed, a plain builtin breakpoint() — anywhere in any actor: the root, a child, a grandchild, a sync helper function, even an asyncio task inside an “infected” actor. A full-featured pdbp REPL opens in that process, with syntax-highlighted source listings, tab completion and sticky mode, attached to your one terminal.

Under the hood every REPL entry acquires a tree-global tty mutex via an IPC request to the root actor, so prompts from concurrent pauses and crashes never interleave. ctrl-c is shielded while any REPL is live, so a stray SIGINT can’t vaporize the tree out from under you. And in debug mode any uncaught error drops you into a crash REPL: first in the failing child, then again at each parent as the boxed RemoteActorError climbs the supervision tree.

No remote-pdb sockets, no set_trace() port juggling, no ptrace attach dance: the debugger semantics you already know, transparently extended across an entire process tree. Because tractor is a structured concurrency (SC) runtime, the debugger composes with supervision instead of fighting it — quit a REPL and errors keep propagating exactly like trio taught you, ending in clean, zombie-free teardown.

We’re pretty sure it’s the (first ever?) “native” debugging UX for multi-process Python B)

Enabling debug mode#

Pass debug_mode=True to your runtime entrypoint, either tractor.open_nursery() (which forwards it to the implicitly opened root actor) or tractor.open_root_actor() directly:

async with tractor.open_nursery(
    debug_mode=True,    # arm the whole actor tree
) as an:
    ...

This arms the debug machinery tree-wide:

crash handling is enabled in every actor: uncaught errors enter a REPL before they propagate,
the internal tty-lock module is auto-exposed over RPC to every subactor (this is what makes the one-terminal handoff work),
console logging is bumped to include PDB-level status msgs so you can see REPL acquire/release events as they happen.

You can instead flip it on for just one child, letting its siblings crash-and-burn the normal way:

portal = await an.start_actor(
    'sketchy_worker',
    debug_mode=True,    # OR-ed with the tree-wide flag
)

See examples/debugging/per_actor_debug.py for a runnable proof of the selective style.

Note

Debug mode requires the child-side runtime to be trio-native so that the tty-lock IPC dialog works; it’s currently supported on the 'trio' (default) and 'main_thread_forkserver' spawn backends and raises RuntimeError for any other start_method.

Your first pause point#

tractor.pause() is the SC-aware, multi-process spelling of the stdlib’s breakpoint(). In the root actor it looks almost boring:

examples/debugging/root_actor_breakpoint.py#

import trio
import tractor

async def main():

    async with tractor.open_root_actor(
        debug_mode=True,
    ):

        await trio.sleep(0.1)

        await tractor.pause()

        await trio.sleep(0.1)

if __name__ == '__main__':
    trio.run(main)

Run it and you get a (Pdb+) prompt parked on the pause() line; type c (continue) and the program finishes normally.

The exact same call works from any subactor, no matter how deep in the tree:

examples/debugging/subactor_breakpoint.py#

import trio
import tractor


async def breakpoint_forever():
    '''
    Indefinitely re-enter debugger in child actor.

    '''
    while True:
        await trio.sleep(0.1)
        await tractor.pause()


async def main():

    async with tractor.open_nursery(
        debug_mode=True,
        loglevel='cancel',
    ) as n:

        portal = await n.run_in_actor(
            breakpoint_forever,
        )
        await portal.wait_for_result()


if __name__ == '__main__':
    trio.run(main)

Each loop iteration the child actor requests the terminal from the root over IPC, REPLs you, then releases it on c. Pause points are re-entrant-safe: repeat calls from the same task are no-op’d and other local tasks queue politely for the REPL.

When you get bored, type q (quit): the resulting bdb.BdbQuit is boxed and shipped to the parent like any other remote error XD — causality is preserved even for your debugging mistakes.

Crash REPLs: errors climb the tree#

Pause points are only half the story. With debug mode armed, any uncaught error anywhere in the tree triggers what we call crash handling mode:

examples/debugging/subactor_error.py#

import trio
import tractor


async def name_error():
    getattr(doggypants)  # noqa (on purpose)


async def main():
    async with tractor.open_nursery(
        debug_mode=True,
    ) as an:

        # TODO: ideally the REPL arrives at this frame in the parent,
        # ABOVE the @api_frame of `Portal.run_in_actor()` (which
        # should eventually not even be a portal method ... XD)
        # await tractor.pause()
        p: tractor.Portal = await an.run_in_actor(name_error)

        # with this style, should raise on this line
        await p.wait_for_result()

        # with this alt style should raise at `open_nursery()`
        # return await p.wait_for_result()


if __name__ == '__main__':
    trio.run(main)

What happens when the child hits that (very intentional) NameError:

a REPL opens in the crashed child first — you inspect the raising frame, its locals, the works, right inside the failed process,
when you quit, the error is boxed into a RemoteActorError and relayed to the parent,
the parent (here the root) gets its own crash REPL with the rendered remote traceback,
quit again and the nursery tears the tree down — errors keep propagating per SC rules, no zombies left behind.

You debug the failure at every hop of the supervision tree, which for multi-hop trees means you can chase an error from the leaf that raised it all the way up to the root that supervises it.
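The hop-by-hop flow can be modelled without any processes at all. What follows is a toy sketch only: `ToyRemoteError`, `climb()` and the `on_crash` callback are hypothetical names invented for illustration and are not part of tractor. The real runtime relays a RemoteActorError over IPC and opens a pdbp REPL where this sketch merely records the hop.

```python
class ToyRemoteError(Exception):
    '''
    Hypothetical stand-in for a boxed remote error: it only
    remembers the original exception type and which actor raised it.

    '''
    def __init__(self, boxed_type: type[BaseException], src: str):
        super().__init__(f'{boxed_type.__name__} from {src!r}')
        self.boxed_type = boxed_type
        self.src = src


def climb(
    path: list[str],  # actor names ordered leaf-first, root last
    exc: BaseException,
    on_crash,  # called once per hop, where a crash REPL would open
) -> BaseException:
    err: BaseException = exc
    for i, actor in enumerate(path):
        on_crash(actor, err)  # "REPL" in this actor, then the user quits
        if i == 0:
            # leaving the raising actor: box the live error for relay
            err = ToyRemoteError(type(exc), src=actor)
    return err


hops: list[tuple[str, str]] = []
final = climb(
    ['name_error', 'root'],
    NameError("name 'doggypants' is not defined"),
    on_crash=lambda actor, err: hops.append((actor, type(err).__name__)),
)
```

The child sees the live NameError while every parent above it sees the boxed form, whose `boxed_type` still points at the original exception class.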

Need to skip REPL entry for certain exceptions? Pass a predicate via open_root_actor(debug_filter=...); by default cancellation-only exception (groups) don’t engage the REPL.
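A filter predicate might look like the sketch below. It assumes, and this is not confirmed by the text above, that the callable receives the raised exception and returns True when the REPL should engage. `skip_timeouts` is a hypothetical name; check the signature of `debug_filter` in your installed version before relying on this.

```python
def skip_timeouts(exc: BaseException) -> bool:
    '''
    Hypothetical predicate: engage the crash REPL for everything
    except timeouts and user interrupts.

    ASSUMPTION: returning `True` means "enter the REPL".

    '''
    return not isinstance(exc, (TimeoutError, KeyboardInterrupt))


# Assumed usage, commented out since it needs a running actor tree:
#
# async with tractor.open_root_actor(
#     debug_mode=True,
#     debug_filter=skip_timeouts,
# ):
#     ...
```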

One terminal, many actors#

So how do N processes share one tty without garbling it? The root actor owns stdio for the whole tree and guards it with a FIFO mutex; every subactor REPL entry is an IPC lock request to the root. At most one actor-task in the entire tree can own the terminal at a time, so prompts never interleave, ever.

Figure: sequence diagram of two subactors serializing pdb REPL access through the root actor's tty lock. Every REPL entry serializes through the root actor’s tty lock; `continue`-ing one REPL hands the terminal to the next waiter, FIFO style.
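The FIFO handoff can be sketched in a single process with a stdlib lock. This is only an analogy: tractor's real lock lives in the root actor and is requested over IPC, whereas here an `asyncio.Lock` (whose acquisition is documented as fair, first waiter first) stands in for it. The `repl_session()` helper and the in-memory `log` are hypothetical.

```python
import asyncio


async def repl_session(
    name: str,
    tty_lock: asyncio.Lock,
    log: list[str],
) -> None:
    # only the lock holder may "print a prompt"
    async with tty_lock:
        log.append(f'{name}: enter')
        await asyncio.sleep(0)  # stand-in for the user typing, then `c`
        log.append(f'{name}: exit')


async def demo() -> list[str]:
    tty_lock = asyncio.Lock()  # plays the root actor's tty mutex
    log: list[str] = []
    # all three "actors" request the terminal at (nearly) the same time
    await asyncio.gather(*(
        repl_session(name, tty_lock, log)
        for name in ('bp_forever', 'name_error', 'root')
    ))
    return log


log = asyncio.run(demo())
```

Each `enter` is always followed by the matching `exit` before the next session begins, which is the property that keeps prompts from interleaving.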

The runtime’s teardown paths cooperate too: a cancelling parent always waits for any live REPL to release before reaping children, so the debugger never gets yanked out from under you mid-keystroke.

Here’s the showpiece: one daemon child re-entering tractor.pause() forever inside a stream, while its sibling repeatedly raises a NameError:

examples/debugging/multi_daemon_subactors.py#

import tractor
import trio


async def breakpoint_forever():
    "Indefinitely re-enter debugger in child actor."
    try:
        while True:
            yield 'yo'
            await tractor.pause()
    except BaseException:
        tractor.log.get_console_log().exception(
            'Cancelled while trying to enter pause point!'
        )
        raise


async def name_error():
    "Raise a ``NameError``"
    getattr(doggypants)  # noqa


async def main():
    '''
    Test breakpoint in a streaming actor.

    '''
    async with tractor.open_nursery(
        debug_mode=True,
    ) as an:
        p0 = await an.start_actor('bp_forever', enable_modules=[__name__])
        p1 = await an.start_actor('name_error', enable_modules=[__name__])

        # retrieve results
        async with p0.open_stream_from(breakpoint_forever) as stream:

            # triggers the first name error
            try:
                await p1.run(name_error)
            except tractor.RemoteActorError as rae:
                assert rae.boxed_type is NameError

            async for i in stream:

                # try the failing subactor a second time and this time
                # let the error propagate up to the parent/nursery.
                await p1.run(name_error)


if __name__ == '__main__':
    trio.run(main)

What you’ll actually see#

Running it looks roughly like this (uids, tracebacks and source listings elided; REPL order can vary with who wins the lock race):

$ python examples/debugging/multi_daemon_subactors.py

Opening a pdb REPL in paused actor: ('bp_forever', '<uuid>')
<highlighted source around the `await tractor.pause()` line>
(Pdb+) c

Opening a pdb REPL in crashed actor: ('name_error', '<uuid>')
<live traceback: NameError: name 'doggypants' is not defined>
(Pdb+) q

Opening a pdb REPL in crashed actor: ('root', '<uuid>')
<boxed RemoteActorError traceback relayed from 'name_error'>
(Pdb+) q

Two (then three) processes, one terminal, zero confusion: c-ing out of the paused daemon’s REPL releases the tty lock, which immediately hands the prompt to the crashed sibling; quit that and the error propagates as a fully-rendered RemoteActorError to the parent where one final crash REPL catches it before clean, zombie-free teardown.

For maximum drama run multi_nested_subactors_error_up_through_nurseries.py (under examples/debugging/) which pulls the same trick across a three-deep process tree — the tty lock keeps every prompt orderly the whole way up.

Post-mortem, on demand#

Crash handling is automatic, but you can also enter a REPL on a live exception manually with tractor.post_mortem() — the actor-aware equivalent of pdb.post_mortem() — from inside any except block in any actor (kwargs: tb= for an explicit traceback, plus shield= and hide_tb=):

examples/debugging/pm_in_subactor.py#

import trio
import tractor


@tractor.context
async def name_error(
    ctx: tractor.Context,
):
    '''
    Raise a `NameError`, catch it and enter `.post_mortem()`, then
    expect the `._rpc._invoke()` crash handler to also engage.

    '''
    try:
        getattr(doggypants)  # noqa (on purpose)
    except NameError:
        await tractor.post_mortem()
        raise


async def main():
    '''
    Test 3 `PdbREPL` entries:
      - one in the child due to manual `.post_mortem()`,
      - another in the child due to runtime RPC crash handling.
      - final one here in parent from the RAE.

    '''
    # XXX NOTE: ideally the REPL arrives at this frame in the parent
    # ONE UP FROM the inner ctx block below!
    async with tractor.open_nursery(
        debug_mode=True,
        # loglevel='cancel',
    ) as an:
        p: tractor.Portal = await an.start_actor(
            'child',
            enable_modules=[__name__],
        )

        # XXX should raise `RemoteActorError[NameError]`
        # AND be the active frame when REPL enters!
        try:
            async with p.open_context(name_error) as (ctx, first):
                assert first
        except tractor.RemoteActorError as rae:
            assert rae.boxed_type is NameError

            # manually handle in root's parent task
            await tractor.post_mortem()
            raise
        else:
            raise RuntimeError('IPC ctx should have remote errored!?')


if __name__ == '__main__':
    trio.run(main)

This example demos three REPL entries from one error:

the child’s manual post_mortem() inside its except,
the runtime’s automatic crash handler in the same child once the error re-raises out of the RPC task,
a manual post_mortem() in the parent on the received RemoteActorError, whose .boxed_type faithfully reports the original NameError.

Pausing from sync code#

No await? No problem. tractor.pause_from_sync() brings the same tree-aware REPL to plain synchronous functions — handy when the suspect code is three helpers deep and decidedly not async.

It’s powered by greenback, which is optional, so you need to:

install it (it ships in tractor’s sync_pause dependency group),
enable it at runtime entry:

async with tractor.open_nursery(
    debug_mode=True,
    maybe_enable_greenback=True,
) as an:
    ...

With that armed, sync code can pause from three different caller environments: the main trio thread, trio.to_thread bg threads, and (see the next section) asyncio tasks in infected actors. The greenback “portal” hops back into the trio loop to do the lock/REPL dance on your behalf:

examples/debugging/sync_bp.py (the sync fn, excerpt)#

def sync_pause(
    use_builtin: bool = False,
    error: bool = False,
    hide_tb: bool = True,
    pre_sleep: float|None = None,
):
    if pre_sleep:
        time.sleep(pre_sleep)

    if use_builtin:
        breakpoint(hide_tb=hide_tb)

    else:
        # TODO: maybe for testing some kind of cm style interface
        # where the `._set_trace()` call doesn't happen until block
        # exit?
        # assert get_lock().ctx_in_debug is None
        # assert get_debug_req().repl is None
        tractor.pause_from_sync()
        # assert get_debug_req().repl is None

    if error:
        raise RuntimeError('yoyo sync code error')

examples/debugging/sync_bp.py (called in a subactor, excerpt)#

@tractor.context
async def start_n_sync_pause(
    ctx: tractor.Context,
):
    actor: tractor.Actor = tractor.current_actor()
    disable_pdbp_color()

    # sync to parent-side task
    await ctx.started()

    print(f'Entering `sync_pause()` in subactor: {actor.uid}\n')
    sync_pause()
    print(f'Exited `sync_pause()` in subactor: {actor.uid}\n')

The full script also exercises the hairier root-actor bg-thread cases (and documents their remaining sharp edges) if you want the deep lore.

The builtin `breakpoint()` override#

When debug mode boots with greenback available, tractor wires Python’s PEP 553 hook so the builtin breakpoint() becomes the actor-aware sync pause, by exporting:

PYTHONBREAKPOINT=tractor.devx.debug._sync_pause_from_builtin

That means third-party and legacy code containing bare breakpoint() calls debugs correctly inside your actor tree with zero edits (the override even forwards kwargs like hide_tb to the underlying pause machinery, as shown in the excerpt above).
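The kwarg forwarding relies on standard PEP 553 behaviour: the builtin `breakpoint(*args, **kws)` passes its arguments straight through to `sys.breakpointhook`. A minimal stdlib-only sketch, where `recording_hook` is a hypothetical stand-in for tractor's hook:

```python
import sys

calls: list[tuple[tuple, dict]] = []


def recording_hook(*args, **kwargs) -> None:
    # a real hook would open a REPL; this one only records its inputs
    calls.append((args, kwargs))


prior_hook = sys.breakpointhook
sys.breakpointhook = recording_hook
try:
    breakpoint(hide_tb=True)  # kwargs reach the hook untouched
finally:
    sys.breakpointhook = prior_hook  # always put the old hook back
```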

Warning

Without greenback (or with maybe_enable_greenback=False, the default), debug_mode=True instead blocks the builtin breakpoint(): sys.breakpointhook is swapped for a raiser and PYTHONBREAKPOINT=0 is set. A naive breakpoint() from some random process would clobber the shared tty, so we’d rather hand you a loud RuntimeError with install instructions.

Both the hook and the env var are restored to their prior values on runtime exit — see examples/debugging/restore_builtin_breakpoint.py for the proof.
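The block-then-restore behaviour can be approximated with the stdlib alone. The sketch below is not tractor's implementation: `block_builtin_breakpoint()` and its error message are hypothetical, and it only mirrors the two steps described above (swap `sys.breakpointhook` for a raiser, set `PYTHONBREAKPOINT=0`) plus the restore on exit.

```python
import os
import sys
from contextlib import contextmanager


@contextmanager
def block_builtin_breakpoint():
    prior_hook = sys.breakpointhook
    prior_env = os.environ.get('PYTHONBREAKPOINT')

    def raiser(*args, **kwargs):
        raise RuntimeError('builtin breakpoint() is blocked here')

    sys.breakpointhook = raiser
    os.environ['PYTHONBREAKPOINT'] = '0'
    try:
        yield
    finally:
        # restore both the hook and the env var to their prior values
        sys.breakpointhook = prior_hook
        if prior_env is None:
            os.environ.pop('PYTHONBREAKPOINT', None)
        else:
            os.environ['PYTHONBREAKPOINT'] = prior_env


hook_before = sys.breakpointhook
env_before = os.environ.get('PYTHONBREAKPOINT')
blocked_msg = None
try:
    with block_builtin_breakpoint():
        breakpoint()  # raises instead of clobbering the shared tty
except RuntimeError as err:
    blocked_msg = str(err)
```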

Breakpoints inside `asyncio` tasks#

Yes, even “infected asyncio” actors get the goods. Spawn a child with infect_asyncio=True (trio runs as a guest on the asyncio loop inside it) and, with debug mode + greenback armed, every asyncio task started via tractor.to_asyncio is automatically granted a greenback portal — so a plain builtin breakpoint() (or tractor.pause_from_sync()) inside an asyncio.Task joins the same single-terminal, tree-locked REPL flow:

examples/debugging/asyncio_bp.py#

'''
Examples of using the builtin `breakpoint()` from an `asyncio.Task`
running in a subactor spawned with `infect_asyncio=True`.

'''
import asyncio

import trio
import tractor
from tractor import (
    to_asyncio,
    Portal,
)


async def aio_sleep_forever():
    await asyncio.sleep(float('inf'))


async def bp_then_error(
    chan: to_asyncio.LinkedTaskChannel,

    raise_after_bp: bool = True,

) -> None:

    # sync with `trio`-side (caller) task
    chan.started_nowait('start')

    # NOTE: what happens here inside the hook needs some refinement..
    # => seems like it's still `.debug._set_trace()` but
    #    we set `Lock.local_task_in_debug = 'sync'`, we probably want
    #    some further, at least, meta-data about the task/actor in debug
    #    in terms of making it clear it's `asyncio` mucking about.
    breakpoint()  # asyncio-side

    # short checkpoint / delay
    await asyncio.sleep(0.5)  # asyncio-side

    if raise_after_bp:
        raise ValueError('asyncio side error!')

    # TODO: test case with this so that it gets cancelled?
    else:
        # XXX NOTE: this is required in order to get the SIGINT-ignored
        # hang case documented in the module script section!
        await aio_sleep_forever()


@tractor.context
async def trio_ctx(
    ctx: tractor.Context,
    bp_before_started: bool = False,
):

    # this will block until the ``asyncio`` task sends a "first"
    # message, see first line in above func.
    async with (
        to_asyncio.open_channel_from(
            bp_then_error,
            # raise_after_bp=not bp_before_started,
        ) as (chan, first),

        trio.open_nursery() as tn,
    ):
        assert first == 'start'

        if bp_before_started:
            await tractor.pause()  # trio-side

        await ctx.started(first)  # trio-side

        tn.start_soon(
            to_asyncio.run_task,
            aio_sleep_forever,
        )
        await trio.sleep_forever()


async def main(
    bps_all_over: bool = True,

    # TODO, WHICH OF THESE HAZ BUGZ?
    cancel_from_root: bool = False,
    err_from_root: bool = False,

) -> None:

    async with tractor.open_nursery(
        debug_mode=True,
        maybe_enable_greenback=True,
        # loglevel='devx',
    ) as an:
        ptl: Portal = await an.start_actor(
            'aio_daemon',
            enable_modules=[__name__],
            infect_asyncio=True,
            debug_mode=True,
            # loglevel='cancel',
        )

        async with ptl.open_context(
            trio_ctx,
            bp_before_started=bps_all_over,
        ) as (ctx, first):

            assert first == 'start'

            # pause in parent to ensure no cross-actor
            # locking problems exist!
            await tractor.pause()  # trio-root

            if cancel_from_root:
                await ctx.cancel()

            if err_from_root:
                assert 0
            else:
                await trio.sleep_forever()


        # TODO: case where we cancel from trio-side while asyncio task
        # has debugger lock?
        # await ptl.cancel_actor()


if __name__ == '__main__':

    # works fine B)
    trio.run(main)

    # will hang and ignores SIGINT !!
    # NOTE: you'll need to send a SIGQUIT (via ctl-\) to kill it
    # manually..
    # trio.run(main, True)

Note the interleave: a breakpoint() on the asyncio side, tractor.pause() on the trio side of the same actor, and another pause up in the root — all serialized through the one tty lock with no cross-actor (or cross-event-loop!) clobbering.

One catch: asyncio tasks spawned out-of-band — i.e. not via tractor.to_asyncio, typically by some third-party aio lib — have no portal bestowed, so a sync pause from one raises a loud RuntimeError telling you to greenback.ensure_portal() first. See the caveats below.

Teardown debugging: the shielded pause#

Cancellation is trio’s bread and butter, which raises an awkward question: how do you REPL inside an already-cancelled scope, say while debugging some teardown sequence? A bare pause() would itself be cancelled at its next checkpoint.

The answer is await tractor.pause(shield=True), which wraps the lock acquisition and REPL session in a shielded cancel scope (post_mortem(shield=True) works the same way):

examples/debugging/shielded_pause.py#

import trio
import tractor


async def cancellable_pause_loop(
    task_status: trio.TaskStatus[trio.CancelScope] = trio.TASK_STATUS_IGNORED
):
    with trio.CancelScope() as cs:
        task_status.started(cs)
        for _ in range(3):
            try:
                # ON first entry, there is no level triggered
                # cancellation yet, so this cp does a parent task
                # ctx-switch so that this scope raises for the NEXT
                # checkpoint we hit.
                await trio.lowlevel.checkpoint()
                await tractor.pause()

                cs.cancel()

                # parent should have called `cs.cancel()` by now
                await trio.lowlevel.checkpoint()

            except trio.Cancelled:
                print('INSIDE SHIELDED PAUSE')
                await tractor.pause(shield=True)
        else:
            # should raise it again, bubbling up to parent
            print('BUBBLING trio.Cancelled to parent task-nursery')
            await trio.lowlevel.checkpoint()


async def pm_on_cancelled():
    async with trio.open_nursery() as tn:
        tn.cancel_scope.cancel()
        try:
            await trio.sleep_forever()
        except trio.Cancelled:
            # should also raise `Cancelled` since
            # we didn't pass `shield=True`.
            try:
                await tractor.post_mortem(hide_tb=False)
            except trio.Cancelled as taskc:

                # should enter just fine, in fact it should
                # be debugging the internals of the previous
                # sin-shield call above Bo
                await tractor.post_mortem(
                    hide_tb=False,
                    shield=True,
                )
                raise taskc

        else:
            raise RuntimeError('Dint cancel as expected!?')


async def cancelled_before_pause(
):
    '''
    Verify that using a shielded pause works despite surrounding
    cancellation called state in the calling task.

    '''
    async with trio.open_nursery() as tn:
        cs: trio.CancelScope = await tn.start(cancellable_pause_loop)
        await trio.sleep(0.1)

    assert cs.cancelled_caught

    await pm_on_cancelled()


async def main():
    async with tractor.open_nursery(
        debug_mode=True,
    ) as n:
        portal: tractor.Portal = await n.run_in_actor(
            cancelled_before_pause,
        )
        await portal.wait_for_result()

        # ensure the same works in the root actor!
        await pm_on_cancelled()


if __name__ == '__main__':
    trio.run(main)

If you forget, tractor has your back: an unshielded pause() from a cancelled scope fails fast with a hint suggesting await tractor.pause(shield=True) instead of silently never REPL-ing.

Go ahead, mash ctrl-c#

While any REPL is live the runtime installs a custom SIGINT handler tree-wide so that a reflexive ctrl-c (or five) can’t nuke your debug session:

the actor that owns the REPL ignores the interrupt and simply re-flushes the prompt — keep mashing, it’s fine,
the root actor ignores SIGINT while a still-IPC-connected child holds the tty lock, so the supervisor won’t tear down the tree out from under the debugger,
if the lock state has gone stale — the locking child died or its IPC channel dropped — the root cancels the stale lock scope and restores trio’s default handler, so ctrl-c works again exactly when it should.

The handler is uninstalled and trio’s own SIGINT semantics restored every time a REPL releases (on continue / quit).
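The install-then-restore pattern can be sketched with the stdlib `signal` module. This is a single-process analogy under stated assumptions: `repl_sigint_shield()` is a hypothetical helper, it covers only the "REPL owner ignores the interrupt" case, and it omits the tree-wide and stale-lock logic described above.

```python
import signal
from contextlib import contextmanager


@contextmanager
def repl_sigint_shield(on_sigint):
    '''
    Hypothetical helper: swallow SIGINT while the block is active,
    then put the previous handler back.

    Must run in the main thread (a `signal.signal()` requirement).

    '''
    prior = signal.signal(
        signal.SIGINT,
        lambda signum, frame: on_sigint(),
    )
    try:
        yield
    finally:
        signal.signal(signal.SIGINT, prior)


handler_before = signal.getsignal(signal.SIGINT)
swallowed: list[str] = []

with repl_sigint_shield(lambda: swallowed.append('re-flush prompt')):
    # simulate the user hitting ctrl-c while the "REPL" is live
    signal.raise_signal(signal.SIGINT)

handler_after = signal.getsignal(signal.SIGINT)
```

No KeyboardInterrupt escapes the `with` block, and the handler that was installed beforehand is active again afterwards.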

Live task-tree dumps#

Sometimes there’s no error to catch — the tree is just hung and you want to know where. For that tractor integrates stackscope: send a signal, get a full trio task-tree dump from every actor in the tree.

Enable it any of three ways:

open_root_actor(enable_stack_on_sig=True) (or via open_nursery() which forwards it),
set TRACTOR_ENABLE_STACKSCOPE=1 in the env — it’s inherited through the process tree so every (sub)actor arms the handler at boot,
call tractor.devx.enable_stack_on_sig() directly.

It’s intentionally not gated on debug_mode so you can leave it armed in plain runs. Then, when the hang strikes, signal the tree with SIGUSR1.

Tip

No need to hunt down pids — pattern-match the original cmdline with pkill:

$ pkill --signal SIGUSR1 -f "python example_script.py"

Each actor dumps its entire trio task tree (full nursery recursion via stackscope.extract()) to its tty and tees it to /tmp/tractor-stackscope-<pid>.log — so the trace survives even under captured-stdio harnesses — then relays the signal on to its children, parent-before-child, until the whole tree has reported in.
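The "signal in, dump out, tee to a per-pid file" shape can be illustrated with the stdlib. Note the swapped-in technique: this sketch uses `traceback.format_stack()` instead of stackscope, so it captures only the interrupted Python stack of one process rather than a trio task tree, and it does not relay the signal to children. The handler name and log path are hypothetical, and SIGUSR1 is unavailable on Windows.

```python
import os
import signal
import sys
import tempfile
import traceback

# hypothetical per-pid log path (not the one tractor uses)
log_path = os.path.join(
    tempfile.gettempdir(),
    f'toy-stackdump-{os.getpid()}.log',
)


def dump_stack(signum, frame) -> None:
    text = ''.join(traceback.format_stack(frame))
    sys.stderr.write(text)  # to the terminal
    with open(log_path, 'a') as log_file:
        log_file.write(text)  # survives captured-stdio harnesses


signal.signal(signal.SIGUSR1, dump_stack)

# stand-in for `pkill --signal SIGUSR1 ...` from another shell
signal.raise_signal(signal.SIGUSR1)

with open(log_path) as log_file:
    dump = log_file.read()
```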

Try it yourself with the demo script, which deliberately hangs a subactor in a shielded sleep:

examples/debugging/shield_hang_in_sub.py#

'''
Verify we can dump a `stackscope` tree on a hang.

'''
import os
import platform
import signal

import trio
import tractor

@tractor.context
async def start_n_shield_hang(
    ctx: tractor.Context,
):
    # actor: tractor.Actor = tractor.current_actor()

    # sync to parent-side task
    await ctx.started(os.getpid())

    print('Entering shield sleep..')
    with trio.CancelScope(shield=True):
        await trio.sleep_forever()  # in subactor

    # XXX NOTE ^^^ since this shields, we expect
    # the zombie reaper (aka T800) to engage on
    # SIGINT from the user and eventually hard-kill
    # this subprocess!


async def main(
    from_test: bool = False,
) -> None:

    if platform.system() != 'Darwin':
        tpt = 'uds'
    else:
        # XXX, precisely we can't use pytest's tmp-path generation
        # for tests.. apparently because:
        #
        # > The OSError: AF_UNIX path too long in macOS Python occurs
        # > because the path to the Unix domain socket exceeds the
        # > operating system's maximum path length limit (around 104
        #
        # WHICH IS just, wtf hilarious XD
        tpt = 'tcp'

    async with (
        tractor.open_nursery(
            debug_mode=True,
            enable_stack_on_sig=True,
            loglevel='devx',  # XXX REQUIRED log level!
            enable_transports=[tpt],
            # maybe_enable_greenback=True,
            # ^TODO? maybe a "smarter" way todo all this is how
            # `modden` does with a rtv serialized through the osenv?
        ) as an,
    ):
        ptl: tractor.Portal = await an.start_actor(
            'hanger',
            enable_modules=[__name__],
            debug_mode=True,
        )
        async with ptl.open_context(
            start_n_shield_hang,
        ) as (ctx, cpid):

            _, proc, _ = an._children[
                ptl.chan.aid.uid
            ]
            assert cpid == proc.pid

            print(
                'Yo my child hanging..?\n'
                # "i'm a user who wants to see a `stackscope` tree!\n"
            )

            # XXX simulate the wrapping test's "user actions"
            # (i.e. if a human didn't run this manually but wants to
            # know what they should do to reproduce test behaviour)
            if from_test:
                print(
                    f'Sending SIGUSR1 to {cpid!r}!\n'
                )
                os.kill(
                    cpid,
                    signal.SIGUSR1,
                )

                # simulate user cancelling program
                await trio.sleep(0.5)
                os.kill(
                    os.getpid(),
                    signal.SIGINT,
                )
            else:
                # actually let user send the ctl-c
                await trio.sleep_forever()  # in root


if __name__ == '__main__':
    trio.run(main)

(That trio.CancelScope(shield=True) hang also shows off the zombie reaper: ctrl-c the root and the un-cancellable child still gets hard-reaped — if you can create a zombie it is a bug.)

Crash handling for sync and CLI code#

All of the above rides on the actor runtime, but crashes don’t politely wait for trio.run(). For plain sync code — think typer/click CLI endpoints, config parsing, anything pre-runtime — there’s a sync context manager that wraps the same pdbp post-mortem UX:

from tractor.devx import open_crash_handler

def main():    # any sync code, no runtime required
    with open_crash_handler() as boxed:
        run_my_cli_thing()

By default any BaseException (minus an ignore set defaulting to KeyboardInterrupt and trio.Cancelled) enters the REPL then re-raises on exit; pass raise_on_exit=False to suppress instead and introspect the boxed.value afterward. The catch/ignore sets and a repl_fixture are all tweakable.
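To make the `raise_on_exit` and `boxed.value` semantics concrete, here is a toy analog built on `contextlib`. It is a sketch under stated assumptions, not tractor's code: `toy_crash_handler()` and its `on_crash` callback are hypothetical, and the callback stands in for opening the post-mortem REPL so the example can run unattended.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def toy_crash_handler(
    on_crash,  # stand-in for "open a post-mortem REPL"
    ignore: tuple[type[BaseException], ...] = (KeyboardInterrupt,),
    raise_on_exit: bool = True,
):
    boxed = SimpleNamespace(value=None)
    try:
        yield boxed
    except BaseException as err:
        if isinstance(err, ignore):
            raise  # ignored types pass straight through, no "REPL"
        boxed.value = err
        on_crash(err)
        if raise_on_exit:
            raise
        # otherwise fall through: the error is suppressed


seen: list[str] = []
with toy_crash_handler(
    on_crash=lambda err: seen.append(type(err).__name__),
    raise_on_exit=False,
) as boxed:
    raise ValueError('bad config')
```

With `raise_on_exit=False` execution continues after the `with` block and the captured exception stays available for inspection on `boxed.value`.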

For the classic --pdb CLI-flag pattern use the conditional variant:

import typer
from tractor.devx import maybe_open_crash_handler

app = typer.Typer()

@app.command()    # a `typer` (or `click`) endpoint
def cmd(pdb: bool = False):
    with maybe_open_crash_handler(pdb=pdb):
        ...

REPL niceties and hooks#

Every REPL in this guide is a pdbp instance (the maintained fork-and-fix of pdb++) pre-configured by tractor:

pygments syntax highlighting in listings and tracebacks,
tab completion — including an automatic fixup for libedit-compiled CPythons (e.g. uv-distributed pythons),
sticky mode available via the sticky command (off by default),
no long-line truncation (terminal resizes behave),
the (Pdb+) prompt, ll, hidden-frames support and the rest of the pdb++ goodies you may already know.

Internal runtime frames are traceback-hidden so the REPL lands exactly on your pause()-call or crash frame, never on tractor guts.

Finally, if your app owns the terminal (TUIs, fullscreen dashboards) pass repl_fixture=<your ctx mngr> to pause(), post_mortem() or open_crash_handler(): it’s entered just before the REPL engages (return False to skip entry entirely) and exited on release — perfect for suspending and restoring your screen around a debug session.

Caveats and platform notes#

An honest list of the current rough edges:

Windows: the debugger has no CI coverage on Windows at all (the entire test module is skipped there); manual testing has shown it can work, but you’re in uncharted territory, so reports are welcome!
macOS: supported but with rough edges: special-cased prompt re-flushing for bash-on-darwin, a few tooling tests skipped on CI, and the AF_UNIX ~104-char socket-path limit forces some examples (like the stackscope demo above) to fall back from 'uds' to 'tcp' transport. Wonder if all of it’ll work on OS X? So do we.
CPython 3.14: greenback (via greenlet) doesn’t support 3.14 yet, so pause_from_sync() and the builtin breakpoint() override are limited to CPython 3.13 and older for now. The async APIs, pause() and post_mortem(), need no greenback and work everywhere.
out-of-band asyncio tasks: sync pauses from aio tasks not spawned via tractor.to_asyncio raise a RuntimeError (no greenback portal was bestowed); run await greenback.ensure_portal() inside such a task first.
nested-tree ctrl-c edges: SIGINT relay through intermediary parents that aren’t themselves in debug mode still has known rough edges — see #320.
captured stdio: pytest-style output capture can hang a pause(); use a real terminal (or a pty à la pexpect, which is how tractor’s own suite drives every one of these examples).