Error Management

Current status of Teespring app

What's the main problem?

  • It's not that we are not fixing our errors
  • The problem is that we are not acknowledging them

Potential solutions

  1. Ignore our errors
  2. Take responsibility for our errors

Error management in fulfillment

Roles

  1. Triager role
    • Once an error happens, assigns it to a specific team based on the ownership document
    • Rotates on a weekly basis
  2. Fixer role
    • Prioritizes the error
    • Fixes the error
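
The triager role can be sketched in a few lines of Ruby. This is only an illustration: the ownership table, the roster names, and the helpers `triager_for` and `owner_for` are all hypothetical, and the real ownership document may look quite different.

```ruby
require "date"

# Hypothetical ownership document: maps an app area to the owning team.
# The "admin" => "FUN" entry is an assumption made for illustration.
OWNERSHIP = {
  "admin" => "FUN",
}.freeze

# Hypothetical triager roster; the names are placeholders.
TRIAGERS = %w[alice bob carol].freeze

# Returns the triager on duty for the ISO week containing `date`,
# so the role rotates on a weekly basis.
def triager_for(date)
  TRIAGERS[date.cweek % TRIAGERS.length]
end

# Returns the owning team for an area, or nil when the ownership
# document has no entry and the triager has to decide by hand.
def owner_for(area)
  OWNERSHIP[area]
end
```

For example, `owner_for("admin")` returns `"FUN"`, and `triager_for` returns the same person for every day of a given week.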

Flow of an error

  1. Joan writes this method

       def imma_break
         # nil has no fulfillment_requests method, so this raises NoMethodError
         nil.fulfillment_requests
       end

  2. Honeybadger pings @admin-triager that a new error happened in admin
  3. Triager digs into the error and decides #FUN will own it
  4. FUN gets pinged in Slack that a bug story was assigned to them
  5. FUN prioritizes it and fixes it
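
To see what the triager would be looking at in step 3, the method from step 1 can be run in isolation. The sketch below uses plain Ruby only; `describe_failure` is a hypothetical helper, and it does not show how Honeybadger itself captures or reports the error.

```ruby
def imma_break
  nil.fulfillment_requests
end

# Hypothetical helper: calls the broken method and summarizes the
# exception it raises. Returns nil if nothing was raised.
def describe_failure
  imma_break
  nil
rescue NoMethodError => e
  {
    error_class: e.class.name, # name of the exception class
    missing_method: e.name,    # method that was called but does not exist
    receiver: e.receiver,      # object the method was called on (nil here)
  }
end
```

The summary shows a `NoMethodError` for `fulfillment_requests` called on `nil`, which is the kind of detail a triager would use to decide which team owns the fix.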

Integrations

Documentation

Some thoughts

  1. Errors will live with us forever.
  2. It's not about fixing errors, it's about paying attention to them.
  3. As engineers, if we don't take care of our errors, nobody else will.