Error Management
Current status of Teespring app
Emojis not allowed in stores
Failing to do charity payouts
...
What's the main problem?
It's not that we are not fixing our errors
The problem is that we are not acknowledging them
Potential solutions
Ignore our errors
Take responsibility for our errors
Error management in fulfillment
Roles
Triager role
Once an error happens, assigns it to a specific team based on the ownership document
Rotates on a weekly basis
Fixer role
Prioritizes the error
Fixes the error
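A minimal sketch of the two triager rules above (weekly rotation, assignment by ownership), not the actual setup. The names `TRIAGERS`, `OWNERSHIP`, `triager_for`, and `owner_for`, the engineer names, and the area keys are all hypothetical; the real ownership document and rotation schedule live elsewhere.

```ruby
require "date"

# Hypothetical ownership document: code area => owning team.
OWNERSHIP = {
  "fulfillment_requests" => "FUN"
}.freeze

# Hypothetical pool of engineers who take turns as triager.
TRIAGERS = %w[alex sam kim].freeze

# Picks the triager for a date. The ISO week number changes once a week,
# so everyone in the pool gets a one-week shift in turn.
def triager_for(date)
  TRIAGERS[date.cweek % TRIAGERS.length]
end

# Looks up the owning team for an area; areas missing from the
# ownership document are flagged instead of being silently dropped.
def owner_for(area)
  OWNERSHIP.fetch(area, "UNOWNED")
end
```

Using the ISO week keeps the rotation stateless, at the cost of an uneven handover around the new year; a real schedule would more likely come from the on-call tool.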
Error management in fulfillment
Flow of an error
Joan writes this method
def imma_break
  nil.fulfillment_requests
end
Honeybadger pings @admin-triager that a new error happened in admin
Triager digs into the error and decides #FUN will own it
FUN gets pinged in Slack that a bug story was assigned to them
FUN prioritizes it and fixes it
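The first steps of this flow can be sketched in plain Ruby, as an illustration only. `report` is a hypothetical stand-in for the error tracker; it builds a hash locally and does not call Honeybadger, JIRA, or Slack.

```ruby
# The method from the slide: calling a method on nil raises NoMethodError.
def imma_break
  nil.fulfillment_requests
end

# Hypothetical stand-in for the error tracker: records what happened
# and who should be pinged, without talking to any real service.
def report(error)
  {
    error_class: error.class.name,
    message: error.message,
    ping: "@admin-triager"
  }
end

# Step 1: the error happens. Step 2: it is reported to the triager.
notice =
  begin
    imma_break
  rescue NoMethodError => e
    report(e)
  end

# Step 3: the triager decides who owns it (here, by hand).
notice[:owner] = "FUN"
```

From here the remaining steps are human ones: the owning team prioritizes the bug story and ships the fix.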
Error management in fulfillment
Integrations
Honeybadger-JIRA: Create issue
JIRA-Honeybadger: Mark issue as resolved
PagerDuty-Slack: Add the on-call engineer to the @admin-triager user group
Honeybadger-Slack: Ping @admin-triager when new error occurs
Documentation
Admin error management docs
Some thoughts
Errors will live with us forever.
It's not about fixing errors, it's about paying attention to them.
As engineers, if we don't take care of our errors, nobody else will.