About the Role
We are seeking a skilled Software Developer to join our Middleware Reliability Engineering team. In this role, you will enhance the observability strategy for our middleware infrastructure, helping us maintain Always On availability for critical payment systems.
You'll work with a global team to implement modern monitoring solutions, integrate with AI/ML frameworks, and drive automation initiatives that ensure exceptional reliability and performance of our middleware services.
What You'll Do
- Observability Transformation: Lead the review and enhancement of our observability strategy for the middleware portfolio, integrating with Prometheus, Grafana, and Splunk to ensure comprehensive and effective coverage
- Migration Support: Play a key role in our migration from Splunk to the ELK Stack, ensuring a seamless transition and improved functionality
- AI/ML Integration: Develop and maintain integrations with internal AI/ML frameworks to enhance operational support and incident response
- OTEL Implementation: Evaluate where OpenTelemetry (OTEL) can be adopted across our middleware stack, and implement it
- Cloud Observability: Design and implement observability strategies for middleware products deployed on public cloud platforms (AWS, Azure, GCP)
- Technology Mastery: Develop deep expertise in our middleware technologies, including web servers, application servers, Hazelcast, and IBM DataPower
- Automation Development: Identify and implement automation opportunities using modern tools and frameworks
Requirements
Strong software development background with experience in observability tools, middleware technologies, and cloud platforms. Knowledge of AI/ML integration and automation development is essential.