Senior Site Reliability Engineer at OpenGov | Remote

Imagine yourself here!

OpenGov is a mission-driven, fast-growing, Series D, venture-backed startup (investors include Andreessen Horowitz, Formation 8, and Emerson Collective). Our Board of Directors includes iconic Silicon Valley executives John Chambers (former Cisco Chairman and CEO) and Marc Andreessen (named to Time Magazine’s list of the 100 most influential people in the world).

OpenGov is the leader in modern cloud software for local governments and state agencies. More than 1,600 governments (and growing fast!) use our products in our mission to power more effective and accountable government.

OpenGov is a 2022 Top Workplaces USA award winner and a Forbes 2022 America’s Best Startup Employer!

Job Summary:

The Sr. Site Reliability Engineer II will lead a highly skilled team that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems for managing local governments’ assets and infrastructure. This role and its team will be responsible for ensuring that OpenGov’s cloud services, both our internally critical and our externally visible systems, are reliable, maintain uptime appropriate to customers’ needs, and improve at a fast rate. Additionally, SREs are responsible for constantly monitoring the capacity and performance of OpenGov’s systems.

Responsibilities:


•Build and develop software and write automation

•Manage highly available cloud infrastructure for continuous integration, automated software releases, infrastructure automation, and monitoring

•Monitor and maintain infrastructure and applications

•Administer, support, and troubleshoot Microsoft Windows and Linux servers

•Provide operational services that are foundational to the product architecture and used across multiple products, such as Kafka, authentication/authorization, and MongoDB

•Lead the aspects of deployment related to schema migrations, security (e.g., secrets management), reliability, and scalability

•Provide incident response support and troubleshoot production systems, including optimizing their performance, scale, utilization, and cost

Requirements and Preferred Experience:

•BS/MS in Computer Science or equivalent required

•Minimum of 6 – 8 years of industry experience, including 5 years as an SRE at SaaS companies, required

•Ability to find, diagnose, troubleshoot, and solve issues at any level of the stack or within the organization

•Experience building highly available and reliable software

•Ability to easily identify potential issues with resource exhaustion, latency, traffic patterns, and errors

•Excellent understanding of monitoring and scaling Windows Servers in production on public clouds

•Experience with AWS and its APIs

•Development and administration experience in Linux environments, with distributions such as Debian and Ubuntu

•Programming skills in languages such as Java, Python, Ruby, Go, C, or C++

•Demonstrable experience in creating high-performance and highly scalable services

•Possess strong verbal and written communication skills

•Exhibit a good balance between strategic direction and tactical execution

•Possess a strong orientation towards delivering results incrementally

•Demonstrate a sense of high-level ownership and proactively get involved with stakeholder discussions

Work Location
