Covers asynchronous notification pipelines, third-party push integrations (APNs, FCM, Twilio), and achieving reliability guarantees.
A notification system sends critical alerts to users across multiple channels. In interviews, establish support for:
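A synchronous design (sending emails/pushes directly from the API thread) ties API latency and availability to third-party providers: when a provider is slow or down, request threads block and failures propagate to callers. The system must be asynchronous, using **Message Queues** to decouple the API from delivery and pools of workers to scale throughput.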
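The decoupling described above can be sketched with an in-process queue standing in for the broker. This is a minimal illustration, not a production design: `notification_queue`, `handle_api_request`, `send_via_provider`, and `run_workers` are hypothetical names, and a real system would use an external broker rather than Python's `queue.Queue`.

```python
import queue
import threading

# Hypothetical in-process stand-in for a message broker (assumption).
notification_queue = queue.Queue()
delivered = []
delivered_lock = threading.Lock()


def send_via_provider(notification):
    """Placeholder for a slow third-party call (APNs, FCM, Twilio)."""
    with delivered_lock:
        delivered.append(notification)


def handle_api_request(user_id, message):
    """API thread: enqueue and return immediately, never calling the provider."""
    notification_queue.put({"user_id": user_id, "message": message})
    return "accepted"


def worker():
    """Worker: pull notifications off the queue and make the slow provider call."""
    while True:
        item = notification_queue.get()
        try:
            if item is None:  # sentinel value: shut this worker down
                return
            send_via_provider(item)
        finally:
            notification_queue.task_done()


def run_workers(count):
    """Start `count` workers; throughput scales by raising this number."""
    threads = [threading.Thread(target=worker, daemon=True) for _ in range(count)]
    for t in threads:
        t.start()
    return threads
```

The API handler's latency now depends only on the enqueue, so a provider outage shows up as queue growth rather than as failed API requests.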
Notifications must not be lost. We achieve this by:
• Storing a persistent log of all notification statuses (e.g. Sent, Failed, Retrying) in a database.
• Using message brokers with persistence (e.g., Kafka's on-disk log, or RabbitMQ durable queues with persistent messages).
• Implementing **retry queues**. If a SendGrid request fails with a 503, workers put the notification back into a retry queue with exponential backoff.
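The retry behaviour in the last bullet can be sketched as follows. The retry budget, base delay, cap, the set of retryable status codes, and the use of random jitter are all assumptions chosen for illustration; `process`, `backoff_delay`, and the list-based `retry_queue` are hypothetical names, and the status strings mirror the ones in the status log above.

```python
import random

MAX_ATTEMPTS = 5           # assumed retry budget
BASE_DELAY_SECONDS = 1.0   # assumed delay before the first retry
MAX_DELAY_SECONDS = 60.0   # assumed upper bound on any delay
RETRYABLE_STATUS = (429, 500, 502, 503, 504)  # assumed transient failures


def backoff_delay(attempt, jitter=True):
    """Delay before retry number `attempt` (1-based): base * 2^(attempt-1), capped."""
    delay = min(MAX_DELAY_SECONDS, BASE_DELAY_SECONDS * 2 ** (attempt - 1))
    if jitter:
        # Randomize so that many failed sends do not retry at the same instant.
        delay = random.uniform(0, delay)
    return delay


def process(notification, send, retry_queue, status_log):
    """Attempt one delivery; on a retryable failure, schedule a retry or give up."""
    status_code = send(notification)
    notification_id = notification["id"]
    if 200 <= status_code < 300:
        status_log[notification_id] = "Sent"
        return
    attempt = notification.get("attempt", 0) + 1
    if status_code in RETRYABLE_STATUS and attempt < MAX_ATTEMPTS:
        status_log[notification_id] = "Retrying"
        retry_queue.append(({**notification, "attempt": attempt}, backoff_delay(attempt)))
    else:
        # Non-retryable error or retry budget exhausted.
        status_log[notification_id] = "Failed"
```

Capping the attempts keeps a permanently failing notification from circulating forever; in practice such messages are often moved to a dead-letter queue for inspection.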
While we want at-least-once delivery, sending the same payment alert twice is a terrible user experience.
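Deduplication Strategy: When an event occurs, generate an idempotency key (e.g., transaction_id + event_type). Before sending a notification, a worker tries to claim the key in Redis with a single atomic set-if-absent that also sets an expiry (e.g., SET with the NX and EX options). A separate "check, then set" would let two workers both see the key as missing and both send. If the key already exists, the worker discards the duplicate request.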
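A minimal sketch of this deduplication check, using an in-memory class as a stand-in for Redis so the example runs on its own. `InMemoryKeyStore`, `set_if_absent`, `send_once`, the `notif:` key prefix, and the 24-hour default TTL are all hypothetical; the stand-in is not thread-safe, whereas Redis performs the set-if-absent atomically on the server.

```python
import time


class InMemoryKeyStore:
    """Hypothetical stand-in for Redis: set-if-absent with an expiry."""

    def __init__(self, clock=time.monotonic):
        self._expiry = {}
        self._clock = clock

    def set_if_absent(self, key, ttl_seconds):
        """Return True if the key was claimed, False if it is already live."""
        now = self._clock()
        expires_at = self._expiry.get(key)
        if expires_at is not None and expires_at > now:
            return False
        self._expiry[key] = now + ttl_seconds
        return True


def idempotency_key(transaction_id, event_type):
    # Key format is an assumption; any stable combination of the two works.
    return f"notif:{transaction_id}:{event_type}"


def send_once(store, transaction_id, event_type, send, ttl_seconds=86400.0):
    """Send only if this (transaction, event) pair has not been claimed yet."""
    key = idempotency_key(transaction_id, event_type)
    if not store.set_if_absent(key, ttl_seconds):
        return False  # duplicate: discard
    send(transaction_id, event_type)
    return True
```

One design caveat: because the key is claimed before the send, a send that fails afterwards would block its own retry until the key expires, so a retry path has to either release the key on failure or bypass this check.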
Users shouldn't receive 50 spam notifications an hour.
• **Opt-in/Preference Check:** API servers first look up the user's settings to verify the user hasn't disabled the channel (e.g., marketing email = disabled).
• **User-level rate limiter:** Limit marketing pushes to, e.g., 3 per day per user, and drop any excess notifications.
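The two checks above might be combined as in the sketch below. It assumes a fixed per-day counter held in memory and treats a channel as enabled when the user has no stored preference; both are illustrative choices, and `DailyRateLimiter`, `should_send`, and the `marketing_push` channel name are hypothetical. A shared store with expiring counters would replace the dictionary in a multi-server deployment.

```python
from collections import defaultdict

DAILY_MARKETING_LIMIT = 3  # the example limit from the text


class DailyRateLimiter:
    """Fixed-window counter keyed by (user, calendar day)."""

    def __init__(self, limit):
        self.limit = limit
        self._counts = defaultdict(int)

    def allow(self, user_id, day):
        key = (user_id, day)
        if self._counts[key] >= self.limit:
            return False  # over the limit: the caller drops the notification
        self._counts[key] += 1
        return True


def should_send(user_prefs, limiter, user_id, channel, day):
    """Preference check first, then the per-user rate limit."""
    # Assumption: a channel with no stored preference counts as enabled.
    if not user_prefs.get(user_id, {}).get(channel, True):
        return False
    return limiter.allow(user_id, day)
```

Checking preferences before the limiter means a notification to an opted-out user never consumes part of that user's daily quota.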