Mission of the Role: Architect and lead the delivery of high-quality and reliable solutions through creative problem-solving and technical expertise to address our business problems on a frequent and regular cadence. Enable Engineers on your team to improve the quality and impact of their work and delivery. Evangelize reliability-as-a-feature through monitoring, service-level objectives, automation, everything-as-code, and testing.
Essential Functions and Impact Areas:
• Provide technical leadership and guidance to the SRE team, driving best practices in reliability engineering, automation, and service management.
• Set the direction for SRE projects, aligning them with organizational goals, and ensuring successful execution from concept to delivery.
• Help define and instrument Service-Level Objectives to ensure an excellent customer experience.
• Lead initiatives to improve system resilience and scalability.
• Host postmortems to share learnings, discover gaps, embrace transparency, and improve reliability across our services.
• Lead projects from inception to completion.
• Participate in an on-call rotation to help resolve incidents.
Minimum Skills & Requirements:
• 7+ years of experience building infrastructure solutions in AWS using Infrastructure-as-Code technologies such as Terraform or CloudFormation.
• 7+ years of experience working with Docker containers and related orchestration technologies (such as Kubernetes or ECS).
• 7+ years of experience building and deploying CI/CD pipelines.
• Experience with AWS, Docker, Kubernetes, Terraform, Python, PHP, and Laravel.
• Experience with architectural patterns of large, high-scale applications, such as well-designed APIs and database schemas.
• Experience leading projects and initiatives that are wide in scale and complex in nature.
• Experience working collaboratively in cross-functional teams with engineers in product and data groups.
• Deep technical expertise: writes, debugs, and refactors code while being mindful of tradeoffs, scalability, architecture, and code cleanliness.
• Demonstrates mastery of their craft to solve problems in automation, infrastructure, and/or developer tooling.
• Reliability & Quality: experience leveraging observability tooling and practices such as SLOs to help engineering teams own the reliability and quality of the software they build.
• Leadership: define and deliver large, complex projects that may include coordination with non-technical stakeholders. Help define the SRE function and be a champion for it throughout the organization.
Why You’ll Love Working at Curology:
• Competitive salary and equity packages
• Company Performance Incentive Plan
• Comprehensive benefits: medical, dental, and vision insurance for employees; flexible spending account; 401(k); mental health & wellness programs
• $75 WFH stipend (remote employees)
• Home office setup stipend (remote employees)
• Minimum Time Off policy (unlimited PTO, with at least 3 weeks off) for exempt employees
• 11 company-observed holidays
• Additional holidays: Curology days off (1 per quarter), 1 annual floating holiday (employee’s choice), and Gratitude Week (employees take the full week of Thanksgiving off; business-critical teams observe different days)
• Paid parental leave
• Employee donation matching program
• Company-sponsored events
• Free subscription to Curology or Agency