Our Partner is an award-winning, independent research company providing data and intelligence to the financial industry. Investment professionals use Our Partner’s online products to obtain key information for business-critical decision-making. Our Partner’s clients include fund managers, legal firms, institutional investors, financial advisors and placement agents.
Primarily a data company, Our Partner is also a fast-moving fintech with a strong growth record and a worldwide staff base. Owned by our founder, directors and employee shareholders, we care passionately about our customers, our brand and the employees who make it all happen.
As a Site Reliability Engineer, you will operate across Our Partner's full suite of services, supporting our clients around the world. We are responsible for designing, building and operating our infrastructure, middleware and CI/CD systems to ensure our teams have access to the best tools available. We combine problem-solving skills with software and systems engineering to take a proactive approach to building fault-tolerant and secure systems, improving observability and zealously automating away toil.
Roles and Responsibilities:
- Design, operate and support Our Partner’s infrastructure, middleware and internal services, while seeking to improve their performance, availability, scalability, latency and efficiency.
- Drive technical excellence in everything we do, fostering a culture of data-driven reliability, monitoring and automation, following SRE best practices.
- Work alongside development teams to design and develop scalable, highly available services, and establish an effective build framework for continuous deployment and self-service automation.
- As part of an on-call rotation, work on incident resolution and engage various teams (including third parties) for support escalation.
Key Requirements for this Role:
- 5+ years of strong, relevant experience.
- Expertise in Amazon Web Services (AWS) cloud administration, including services such as EC2, S3, ELB, RDS, IAM, Route 53, Auto Scaling Groups, Lambda, CloudWatch, CloudFormation and Security Groups.
- Expertise in containerisation with Kubernetes and Docker, and familiarity with the microservice architecture pattern. Able to define and troubleshoot container configuration; further experience with platform architecture is highly desirable.
- Expertise with configuration management technologies including Terraform and Ansible, as well as associated paradigms such as Infrastructure as Code and Immutable Infrastructure.
- CI/CD – comfortable with build pipelines in e.g. TeamCity, Jenkins or Concourse; may have deployed the platform itself.
- Hands-on experience developing in one or more programming or scripting languages (e.g. PowerShell, Bash, Python, JavaScript, Golang), within an SCM environment (e.g. Bitbucket, GitHub).
- Networking – knowledge of routing & switching protocols as well as DNS, firewalling, load-balancing and global traffic management.
- Persistence technologies – familiar with database technologies (NoSQL/SQL) and broker/queuing technologies, including knowledge of HA/clustering.
- Logging, monitoring and alerting – expertise in the usage (and, desirably, the deployment) of platforms such as ELK, Splunk and CloudWatch, to enable forensic log analysis and system tuning as well as data-driven performance analysis (i.e. SLI/SLO) and capacity planning.
- Systems administration – Linux (across multiple distributions) and Windows, including storage management (e.g. LVM, RAID) and security practices (e.g. SSH, SSL/TLS, HMAC, IPS/IDS).