Contentsquare needed notification functionality for many use cases within its platform. The company created a generic solution spanning multiple services as part of its microservice architecture. During the implementation, the developers had to improve observability and overcome some scalability challenges.
Notifications at Contentsquare can be used for anything from password resets to critical alerts about exceeding API quotas and are delivered via email, Slack, or Microsoft Teams, depending on users’ preferences. The company opted to gradually roll out notification-related features to allow for performance and scalability improvements should they be required.
Notification Components (Source: Contentsquare Engineering Blog)
Contentsquare's platform uses a microservices architecture, and the notification subsystem comprises several microservices. The Notification Consumer processes messages from the Notifications topic in Apache Kafka. The Mailer Service delivers email notifications and uses the EJS templating engine to render email content from preconfigured templates. The Integration Service handles Slack and Microsoft Teams notifications, composing JSON message bodies based on Slack's Block Kit or Microsoft Teams Adaptive Cards. Finally, the Slack Service and the Microsoft Teams Service (depicted below) send the notification messages to the Slack and Microsoft Teams APIs, respectively.
Microservices for Sending Notifications to Slack and Teams (Source: Contentsquare Engineering Blog)
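As a rough illustration of what composing these message bodies can involve, the sketch below builds a Slack Block Kit payload and a Teams message carrying an Adaptive Card from the same notification. The Notification class and both function names are hypothetical; the article does not describe Contentsquare's actual schema or code.

```python
import json
from dataclasses import dataclass


@dataclass
class Notification:
    """Hypothetical notification shape; the real schema is not described in the article."""
    title: str
    body: str


def to_slack_blocks(n: Notification) -> dict:
    """Build a Slack message body from Block Kit header and section blocks."""
    return {
        "text": n.title,  # plain-text fallback for clients that cannot render blocks
        "blocks": [
            {"type": "header", "text": {"type": "plain_text", "text": n.title}},
            {"type": "section", "text": {"type": "mrkdwn", "text": n.body}},
        ],
    }


def to_teams_card(n: Notification) -> dict:
    """Wrap an Adaptive Card in the message envelope used for Teams attachments."""
    card = {
        "$schema": "http://adaptivecards.io/schemas/adaptive-card.json",
        "type": "AdaptiveCard",
        "version": "1.4",
        "body": [
            {"type": "TextBlock", "text": n.title, "weight": "Bolder", "size": "Medium"},
            {"type": "TextBlock", "text": n.body, "wrap": True},
        ],
    }
    return {
        "type": "message",
        "attachments": [
            {"contentType": "application/vnd.microsoft.card.adaptive", "content": card}
        ],
    }


if __name__ == "__main__":
    alert = Notification("API quota exceeded", "Usage reached 95% of the monthly quota.")
    print(json.dumps(to_slack_blocks(alert), indent=2))
    print(json.dumps(to_teams_card(alert), indent=2))
```

Keeping the channel-specific formatting in separate functions mirrors the split described above: one service composes the bodies, and thin per-channel services only need to post them.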
Joseph-Emmanuel Banzio, a software engineer at Contentsquare, described the team’s experience of rolling out notification functionality:
We encountered several bottlenecks along the way that led us to scale and enhance the reliability of our system. One notable challenge was the fact that we initially used a single Kafka topic for inter-microservice communication, before creating the Notifications topic. This had been working well before we launched the beta of real-time alerts.
Aside from moving notification alerts to a dedicated Kafka topic, the team optimized notification storage to avoid high read latency and implemented a data retention mechanism to remove old notification records. Another problem that required investigation was that some users were not receiving emails. The cause turned out to be an incorrect SPF (Sender Policy Framework) configuration, which the security team fixed.
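A data retention job of this kind can be sketched in a few lines. The example below is an assumption-laden illustration, not Contentsquare's implementation: it uses an in-memory SQLite table, a hypothetical notifications schema, and an arbitrary 90-day retention window, none of which come from the article.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def create_store() -> sqlite3.Connection:
    """Create an in-memory table standing in for the notification store (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE notifications ("
        "id INTEGER PRIMARY KEY, user_id TEXT NOT NULL, created_at INTEGER NOT NULL)"
    )
    # An index on the timestamp keeps both the purge and time-ordered reads cheap.
    conn.execute("CREATE INDEX idx_notifications_created_at ON notifications (created_at)")
    return conn


def add_notification(conn: sqlite3.Connection, user_id: str, created_at: datetime) -> None:
    """Store a notification with its creation time as epoch seconds."""
    conn.execute(
        "INSERT INTO notifications (user_id, created_at) VALUES (?, ?)",
        (user_id, int(created_at.timestamp())),
    )


def purge_old(conn: sqlite3.Connection, now: datetime, retention_days: int = 90) -> int:
    """Delete records older than the retention window and return how many were removed."""
    cutoff = int((now - timedelta(days=retention_days)).timestamp())
    cur = conn.execute("DELETE FROM notifications WHERE created_at < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    now = datetime(2024, 6, 1, tzinfo=timezone.utc)
    store = create_store()
    add_notification(store, "u1", now - timedelta(days=200))
    add_notification(store, "u1", now - timedelta(days=5))
    print("purged:", purge_old(store, now))
```

Running such a purge on a schedule bounds the table size, which is one common way to keep read latency stable as notification volume grows.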
To help troubleshoot email notification issues, the team built a dedicated email observability solution that periodically retrieves delivery events collected by the third-party email service and stores them within Contentsquare's platform. This approach provides end-to-end visibility into the email notification flow.
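The periodic retrieval described here can be pictured as a cursor-based sync loop. The sketch below is only one plausible shape for it: DeliveryEvent, DeliveryEventStore, and the fetch_since callable are hypothetical stand-ins, since the article names neither the email provider nor its API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class DeliveryEvent:
    """Hypothetical delivery event as a provider might report it."""
    event_id: str
    message_id: str
    status: str      # for example "delivered" or "bounced"
    timestamp: int   # epoch seconds


class DeliveryEventStore:
    """Keeps a local copy of provider events so a message's history can be queried."""

    def __init__(self) -> None:
        self._events: Dict[str, DeliveryEvent] = {}
        self.cursor = 0  # timestamp of the newest event seen so far

    def sync(self, fetch_since: Callable[[int], Iterable[DeliveryEvent]]) -> int:
        """Pull events at or after the cursor, skip duplicates, and return the number stored."""
        added = 0
        for event in fetch_since(self.cursor):
            if event.event_id not in self._events:
                self._events[event.event_id] = event
                added += 1
            self.cursor = max(self.cursor, event.timestamp)
        return added

    def history(self, message_id: str) -> List[DeliveryEvent]:
        """Return the stored events for one email, oldest first."""
        matching = [e for e in self._events.values() if e.message_id == message_id]
        return sorted(matching, key=lambda e: e.timestamp)


if __name__ == "__main__":
    provider_log = [
        DeliveryEvent("e1", "m1", "sent", 100),
        DeliveryEvent("e2", "m1", "bounced", 130),
    ]
    store = DeliveryEventStore()
    # Stand-in for a call to the third-party service's events endpoint.
    store.sync(lambda since: [e for e in provider_log if e.timestamp >= since])
    print([e.status for e in store.history("m1")])
```

Fetching from an inclusive cursor and deduplicating by event ID makes each run safe to repeat, so a scheduler can call sync at a fixed interval without double-counting events.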
During the rollout, the developers also worked on improving the observability of the platform. They created a Kibana dashboard to monitor and analyze logs and a Grafana dashboard to monitor the cloud resources used by the notification microservices. Additionally, the team expanded the monitoring of the production Kafka cluster to ensure that resource utilization and consumer group lag stayed within acceptable limits. In the future, the team plans to add resiliency against system failures and to reduce delivery latency to achieve near-real-time notification delivery.
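For readers unfamiliar with the metric, consumer group lag is the gap between the latest offset written to each partition and the offset the consumer group has committed. The sketch below computes it from two offset maps; the function names and the threshold are hypothetical, and treating a partition with no committed offset as lagging from offset 0 is a simplification.

```python
from typing import Dict


def partition_lag(end_offsets: Dict[int, int], committed: Dict[int, int]) -> Dict[int, int]:
    """Lag per partition: log end offset minus committed offset, never below zero.

    Simplification: a partition with no committed offset is counted from offset 0.
    """
    return {
        partition: max(end - committed.get(partition, 0), 0)
        for partition, end in end_offsets.items()
    }


def total_lag(end_offsets: Dict[int, int], committed: Dict[int, int]) -> int:
    """Sum of the per-partition lag for one consumer group on one topic."""
    return sum(partition_lag(end_offsets, committed).values())


def lag_exceeds(end_offsets: Dict[int, int], committed: Dict[int, int], threshold: int) -> bool:
    """Return True when total lag is above a (hypothetical) alerting threshold."""
    return total_lag(end_offsets, committed) > threshold


if __name__ == "__main__":
    end = {0: 1500, 1: 980, 2: 400}
    done = {0: 1480, 1: 980}
    print(partition_lag(end, done))
    print(lag_exceeds(end, done, threshold=100))
```

A steadily growing total indicates that consumers are falling behind producers, which for a notification pipeline translates directly into delayed delivery.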