SRE Tech Lead

Lead the SRE team / Flextime / Comprehensive benefits / Put your English skills to use

8 - 12 million yen / Tokyo / Information Technology / Site Reliability Engineer

Job details

Company overview: Our client is a leading Japanese internet services and e-commerce company operating across a wide range of business areas, including online shopping malls. Its technology-driven, customer-centric services are used in Japan and abroad. Its businesses span e-commerce, finance, and digital content, with services including online shopping platforms, cloud services, and internet banking, as well as point programs and media content that enhance customer engagement. The company is also actively expanding both domestically and internationally to strengthen its competitiveness in the global market.
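Responsibilities: The company is growing rapidly, making service reliability, scalability, and operational efficiency critically important.
This position is essential to leading the SRE team and maximizing the reliability and performance of the web front-end and back-end systems.
As SRE team lead, you will lead the building and improvement of infrastructure spanning public and private clouds, and you will be responsible for managing product quality, delivery, and reliability.
This is a key position that supports the company's business growth from a technical standpoint.
We look forward to applications from visionary candidates who can inspire the team and lead it to new heights of operational excellence.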

━━━━━━━━━━━━━━━

■Responsibilities
SRE Strategy & Roadmap Development: Define and drive the execution of the SRE strategy and technical roadmap to enhance service reliability, performance, and scalability.
Observability Platform Leadership: Lead the management and improvement of monitoring, alerting, logging, and tracing tools, driving the establishment of optimal observability environments for each product.
Service Quality Definition & Achievement: Define Service Level Objectives (SLOs) and Service Level Agreements (SLAs), and plan and execute improvement activities to achieve them. Drive the adoption and operation of Error Budgets.
Performance & Latency Improvement: Identify bottlenecks in service performance and latency, and direct and oversee the team in proposing and implementing solutions.
Incident Management & Troubleshooting: Act as an incident commander during production outages, leading rapid restoration efforts. Conduct Root Cause Analysis (RCA) and drive the implementation of preventative measures.
Operational Efficiency & Automation: Promote automation of operational processes to reduce toil, building an efficient and scalable operational framework.
Team Management & Development: Provide technical guidance, mentorship, and performance evaluations for SRE team members, contributing to the overall skill enhancement and performance of the team.
Cross-functional Collaboration: Strengthen collaboration with product development teams, infrastructure teams, security teams, and other relevant departments, fostering a DevOps culture and strong cooperative relationships.

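■Department overview
The Incentive Platform Department (INPD) is responsible for developing and operating the company's points and coupons.
Aiming to maximize the value of points, it drives improvements and contributes to the ecosystem.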

■Work environment
Flextime system
Comprehensive benefits, including a stock option plan and a retirement allowance plan

━━━━━━━━━━━━━━━
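Requirements: [Required]
5+ years of hands-on experience in SRE or infrastructure engineering
2+ years of experience as a team leader or technical leader
Experience building and operating production systems in public cloud (AWS, GCP, Azure, etc.) or private cloud environments
Extensive experience designing, building, operating, and scaling Kubernetes environments
In-depth knowledge of and hands-on experience building and operating modern monitoring, alerting, and logging tools (Prometheus, Grafana, ELK Stack, Datadog, etc.)
In-depth knowledge of UNIX-like operating system internals and networking
In-depth knowledge of IP network systems and protocols (TCP/IP, HTTP, etc.) and troubleshooting experience
Experience building automated workflows with CI/CD tools (Jenkins, CircleCI, GitLab CI/CD, etc.)
Experience developing operational automation tools and scripts in scripting languages such as Shell and Python
Advanced business-level English or higher
Internal communication is primarily in English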
Salary: 8 - 12 million yen
Location: Tokyo

BRS Consultant
Jonathan Spence
Tech Services
