Join Shopee & Work with Me!

I am currently working for Shopee (Singapore) as an SRE. If you enjoy this blog, you may be a good fit for this position as well, and we are hiring!

Here is the JD. But you can also talk to me directly if you have any questions. My contact information is listed on this page: https://www.kawabangga.com/connect . CVs/Resumes can be sent to the email on the right side of this blog.

 

TechOps Engineer: SRE – Middleware/DFS SRE

  • Engineering and Technology
  • Experienced
  • Singapore

official JD link

The Engineering and Technology team is at the core of the Shopee platform development. The team is made up of a group of passionate engineers from all over the world, striving to build the best systems with the most suitable technologies. Our engineers do not merely solve problems at hand; we build foundations for a long-lasting future. We don’t limit ourselves on what we can or can’t do; we take matters into our own hands even if it means drilling down to the bottom layer of the computing platform. Shopee’s hyper-growing business scale has transformed most “innocent” problems into huge technical challenges, and there is no better place to experience it first-hand if you love technologies as much as we do.

The mission of the SRE (Site Reliability Engineering) team is to ensure the efficient and sustainable operation of Shopee 24×7, and to build and maintain large-scale, highly available, high-performance distributed systems with availability and performance as the guiding requirements. SRE is a new discipline formed by combining traditional software engineering with technical operations. The SRE team needs to dive deep into the Shopee development lines to ensure that the system remains highly scalable as it evolves rapidly. From the perspective of stability and performance, its scope includes business system design, components of the basic platform (middleware, container scheduling, caching, object storage, etc.), OS optimization, and data center and network optimization. We replace the inefficient and complicated processes of the traditional operation and maintenance model with engineering and service-based approaches, and are committed to building a sound monitoring system to improve the efficiency of incident handling.

Job Description:

  • Responsible for maintaining middleware and distributed file systems such as Redis, Ceph, Kafka, etc.
  • Responsible for tech architecture review, capacity planning, cost optimisation, tracking and troubleshooting, and building a component monitoring system to maintain overall stability and efficiency.
  • Responsible for the maintenance and development of the middleware ops automation platform, and for improving the operation and maintenance management of middleware and distributed file systems.
  • Act as owner and first incident responder for middleware/DFS components.

Requirements:

  • Bachelor’s or higher degree in Computer Science, Engineering, Information Systems or related fields
  • Familiar with middleware or distributed file systems such as Redis/Kafka/Spark/RabbitMQ/ELK
  • A solid programming foundation; familiar with common Python/Go backend development frameworks
  • More than 3 years of experience in related fields; familiar with large-scale operation and maintenance
  • Excellent communication, organizational and teamwork skills; able to adapt to a diverse international working environment, with working proficiency in English

Skills below are optional but preferable:

  • Experience developing automation operation platforms for Redis/Kafka/Ceph is preferred
  • Experience with HDFS/Ceph development is preferred
  • Experience with Service Mesh is preferred

 

TechOps/DevOps Engineer: Cloud Native Developer

  • Engineering and Technology
  • Experienced
  • Singapore

official JD link

The Engineering and Technology team is at the core of the Shopee platform development. The team is made up of a group of passionate engineers from all over the world, striving to build the best systems with the most suitable technologies. Our engineers do not merely solve problems at hand; we build foundations for a long-lasting future. We don’t limit ourselves on what we can or can’t do; we take matters into our own hands even if it means drilling down to the bottom layer of the computing platform. Shopee’s hyper-growing business scale has transformed most “innocent” problems into huge technical challenges, and there is no better place to experience it first-hand if you love technologies as much as we do.

The mission of the Ops-Dev team is to strengthen Tech Ops' ability to control and manage massive resources and traffic in a highly efficient, accurate and consistent way. The team provides production-grade software, intelligent engines and stable system architectures, and is devoted to building a DevOps ecosystem that integrates all resources and tools and eliminates the gap between Ops and Dev. Its main scope covers the Global Traffic Schedule and Management Platform (NLB, ALB, GSLB, Hybrid CDN, DNS, etc.), the Hybrid Cloud Resource Schedule and Management Platform (Bromo, Hybrid Cloud Management, Mesos, Kubernetes, Container, PHM, VM, CICD, etc.), and Internal Systems (CMDB, SPACE, TOC, etc.).

Job Description:

  • Build “Service Oriented” platforms based on Kubernetes; evolve Shopee Cloud Native infrastructures and empower Shopee businesses via Cloud Native technology stacks.
  • Design, implement and maintain Shopee Cloud Native platforms; embrace Cloud Native ecosystems to speed up service delivery and make development more efficient.
  • Keep improving the stability, scalability, sustainability and security of Shopee Cloud Native platforms; ensure that they run smoothly.
  • Develop and implement automation and engineering solutions; detect and fix potential problems in advance via TDD (Test Driven Development), chaos engineering and regular fire drills.

Requirements:

  • Bachelor’s or higher degree in Computer Science or related fields.
  • Passionate about coding and programming, innovation, and solving challenging problems.
  • In-depth understanding of computer science fundamentals (data structures and algorithms, operating systems, networks, databases, etc).
  • Strong and hands-on experience with at least one of the programming languages: Go, Python, C++, Java.
  • Strong logical thinking abilities.

Skills below are optional but preferable:

  • SRE background, with hands-on experience on massive-scale systems.
  • Experience with the Cloud Native technology stack, such as Kubernetes, Prometheus, CoreDNS, Istio, Helm, etcd, Jenkins, etc.
  • Experience in the design and development of large-scale systems and platforms.
  • Contributed to open-source projects.
  • Published papers at top conferences like ASPLOS, EuroSys, NSDI, OSDI, etc.