Job number: JN -072024-37412 Posted: 2024-09-11

Infrastructure Manager

Lead the infrastructure and production environment
7 - 12 million yen | Tokyo | Information Technology | IT Management

Job details

Company overview
Our client is a leading global certificate authority that provides PKI and Identity and Access Management services, giving enterprises a platform to manage internal and external identities for the Internet of Everything.
Responsibilities
Ownership of the infrastructure and the production environment

  • Treat the production environment with care, prioritizing its stability both while changes are being implemented and during normal operation.
  • Make final decisions on technical changes and deployment plans. As the ultimate authority within the team, take full responsibility for reviewing, approving, and ensuring the safety and effectiveness of all technical changes and deployment strategies. Approach each decision with a thorough risk assessment, considering both immediate impacts and long-term implications to maintain system integrity and reliability.


Being a playing manager

  • Directly manage a very small technical team by overseeing daily operations, providing technical guidance, and ensuring the team meets task deadlines and quality standards.
  • Stay hands-on by actively contributing to coding, system administration, and operational tasks alongside team members.
  • Serve as a bridge between the technical team and other departments (business, QA, development), ensuring alignment on objectives.
  • Regularly report to senior leadership on team progress, task statuses, technical challenges, and achievements. Ensure strategic alignment with organizational goals and objectives, and seek guidance and approval for major decisions and initiatives.


Cloud infrastructure

  • Design, implementation, and maintenance using Terraform
  • Security
      • Review and update of the current design
      • Secure design and implementation of new components
  • Stability (availability, consistency)
      • Easy deployments without service interruption
      • Auto-recovery
  • Scalability
      • Stable performance regardless of the number of users
  • Ownership of production and test environments

Development and QA team support

  • Support and implementation of tools (pipelines, etc.) that allow the Development and QA teams to create, reset, and destroy test environments and to deploy to those environments

BCP

  • Readiness to address production environment issues at any time, should they occur (you are nevertheless expected to do everything possible to prevent them in the first place)
  • Planning and execution of exercises
  • Incident investigation


Monitoring

  • Design and implementation of monitoring mechanisms
  • Alert setup


System logs

  • Collection of system logs
  • Setup of error detection and alerting mechanisms
  • Setup of anomaly detection mechanisms


Infrastructure costs

  • Regular optimization of infrastructure costs


Periodic maintenance

  • Product updates and management


Software updates

  • Detection and updating of outdated and/or vulnerable software (at the infrastructure level)
  • Management of updates to AWS managed services
Requirements
Main requirements

  • At least 3 years of production environment experience, meaning responsibility for the stability of a production system
  • Thorough understanding of modern cloud technologies (AWS) and containers
  • Understanding of how web applications work at the infrastructure level


Desired technical skills

  • Good knowledge of AWS (managed services, ECS, RDS, EC2, KMS, VPC networking, etc.)
  • Good knowledge of Docker
  • Good understanding of the HTTP protocol and how web applications work at the infrastructure level
  • Good understanding of networking
  • Good understanding of Terraform
  • Experience with pipelines (Jenkins, Bitbucket, GitLab, GitHub, etc.)
  • Fair understanding of SSL and PKI, including mutual SSL (client certificates), asymmetric and symmetric encryption, and signing
  • Fair understanding of how DNS works
  • Fair understanding of relational databases
  • Fair understanding of site reliability engineering
  • Comfort with at least one scripting language
  • Understanding of common types of attacks and vulnerabilities in web applications
  • Some experience with Nginx
  • AWS certification

Salary: 7 - 12 million yen
Location: Tokyo

Dejiddolgor Suuri
BRS Consultant, Tech Services