Site Reliability Engineer

  • Field IT
  • Employment Full-time
  • Vacancy number VAC-10005855
  • Location Veldhoven
  • Contract type Secondment via YER
  • Industry IT & Telecom

About this vacancy


ASML is one of the world’s leading manufacturers of semiconductor-chip-making equipment. A majority of the world’s microchips receive their critical lithographic patterning in machines made by ASML. In addition, ASML produces metrology tools and advanced applications to analyze and optimize the performance of the customer production process.

Job Mission

Troubleshoot short-term problems and turn them into structural improvements to our distributed data and compute platform infrastructure. Be accurate and precise, and help drive up the aggregate availability of the installations of these distributed computing systems in Korea, Taiwan, Israel, China, the US and elsewhere. Be part of the compute platform that is one of the main pillars under the production of the next generation of microchips for Apple, Samsung and many others.


Site Reliability Engineering is a new concept for ASML, so you will be breaking new ground. The SRE is expected to work on customer installations worldwide as well as on the test and integration systems running in Veldhoven. The ‘Site’ whose reliability you are expected to drive upwards is one or more of the many installations of the Virtual Computing Platform (VCP) around the world. This platform, currently under development, will be the foundation for the applications developed in house by other teams. These applications take data from ASML scanners and ASML YieldStar equipment and combine it into real-time production corrections and scanner process diagnostics. The corrections are sent back to the ASML production equipment. Failure of the platform would mean failure of the customer’s production facility (TSMC, Samsung, Intel, etc.). Hence we have an uptime requirement of four nines (99.99%). As a true distributed computing expert you will have your own view on such a baseline requirement, which might be a nice topic to discuss during an interview.
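
To make the uptime requirement above concrete, here is a minimal Python sketch of the downtime budget it implies. It is an illustration only, not ASML's actual availability definition: the function names are hypothetical, a year is taken as 365 days, and the chained calculation assumes that components fail independently and that every component must be up for the platform to be up.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years (assumption)


def downtime_budget_minutes(nines: int, period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Allowed downtime in minutes for an availability of `nines` nines over the period."""
    unavailability = 10 ** -nines  # e.g. 4 nines -> 0.0001
    return period_minutes * unavailability


def serial_availability(*components: float) -> float:
    """Availability of a chain in which every component must be up.

    Assumes independent failures, which is a simplification.
    """
    result = 1.0
    for availability in components:
        result *= availability
    return result


if __name__ == "__main__":
    # Four nines over one year: 52.56 minutes of allowed downtime.
    print(f"{downtime_budget_minutes(4):.2f} minutes per year")
    # Two hypothetical dependencies at 99.99% each fall short of four nines overall.
    print(f"{serial_availability(0.9999, 0.9999):.8f}")
```

Under these assumptions, four nines leaves less than an hour of downtime per year per installation, and stacking dependencies that each meet four nines yields a combined availability below four nines, which is one reason reducing moving parts helps.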

The Managed Operations (MO) department, active 24/7 in 3 geographical locations (time zones), sits between the customer and the SRE team. As such, monitoring and alert handling are not in the scope of the SRE at ASML at the moment. Where MO cannot address a problem, the SRE steps in to help solve it. It is the task of the SRE team to enable their MO counterparts to handle alerts without escalation, through clear documentation and well-defined automated corrective actions. A great SRE will take the lessons learned from an incident and use them to improve the system in the next release. Via automation, automation and automation, plus reduction of moving parts, upgrades of critical components or additional alerting, the SRE tries to bring the number of alerts back to ‘0’. The time that is saved is spent on adding features and capabilities to the platform to further drive the applications road map of ASML.

Responsibilities of the SRE:

  • Create awareness in other teams of the methods and procedures we use, to help them prevent repetitive help requests
  • Help application developers to understand the infrastructure / cluster / system
  • “We are the team that is in charge of understanding & explaining how the system fits into the customer’s ecosystem”
  • Share knowledge/mindset with other teams (dev/infra engineers)
  • Work cross-functionally, sharing knowledge between infra engineers
  • Contribute towards building VCP as a Product which meets ASML standards of quality
  • Increase stability and reliability of VCP by automated testing and automation
  • Customer satisfaction and product reliability
  • Improve the functionality and reliability of VCP
  • Translate customer ecosystem needs to engineering deliverables
  • Find the broken pieces of the puzzle at system/cluster level
  • Combine individual ‘stories’ into a complete book
  • Make the VCP reliable by improving system resilience (bug-fixing and beyond)
  • Resolve bugs in a sustainable way (implement regression tests, design structural fixes)
  • Ambassador of predictable component life cycle management
  • Technical road map maintenance (App life cycle management)
  • Support feature and service requests from the field
  • Suggest improvements to our technical solutions and way of working, and implement them in alignment with your team and their stakeholders


ASML is a successful Dutch high-tech enterprise that produces complex lithography systems used by chip manufacturers in the production of integrated circuits. ASML is at the cutting edge of this technology and delivers systems to all the world's leading chip manufacturers. ASML's employees are among the most creative talents in the fields of physics, mathematics, chemistry, mechanical engineering and software. Every day they collaborate in close-knit multidisciplinary teams in which members listen to and learn from one another and exchange ideas. It is the ideal environment for professional development and personal growth.

ASML is headquartered in Veldhoven, the Netherlands.


You will be employed by YER and seconded to ASML. We offer:

  • Good employee benefits (e.g. work-life balance, pension, lease car, bonus model)
  • Challenging assignments
  • Excellent guidance from your consultant and YER's back office
  • Development opportunities, including the YER Talent Development Programme with a personal coach
  • Intensive support for international candidates (including Dutch lessons, tax-return and accommodation assistance)
  • A cooperative, results-driven and relationship-driven way of working
  • Friendly atmosphere and open culture
  • Community/network with other technology professionals from a variety of multinationals
  • Events and master classes with interesting speakers and attractive companies



Required qualifications & experiences

  • Knowledge of, and practical experience with, distributed computing systems (a must!)
  • Experience with build and release infrastructure: Maven, Nexus, Bamboo, GitHub
  • Familiar with at least one scripting language (Python)
  • Experience with Ansible
  • Linux expert


Highly valued qualifications & experiences

  • Experience with DC/OS
  • Experience with introducing new technology at zero downtime, including data migration
  • Fan of automated testing and qualification, where possible as part of a CI/CD pipeline
  • Affinity for digging deep into the details of networking issues
  • Available to work (remotely) outside regular office hours when it turns out that our attempt to build a fail-safe system has not yet succeeded. We really want this to be the exception, not the rule

Personal skills

  • Problem solving/Go-fix mentality
  • No is not an answer/Open to Challenges
  • Think out of the box
  • Look through the customer’s eyes
  • Automate everything
  • Positive attitude
  • Collaboration with stakeholders
  • Curiosity: understand how the system works
  • Broad obsession with e.g. Java, Python, APIs, Ansible
  • Ability to dive deep into a specific topic
  • Drive to build a more secure, faster, more reliable VCP
  • Keep in mind that we are not Netflix: we tend to choose proven technology over the latest and greatest in order to keep meeting the four nines
  • Think logically and use that ability to solve problems
  • Be able to combine the individual elements and requests into a system design
  • Share knowledge, work in pairs
  • End-to-end knowledge for VCP support (skill set)
  • Operations / support mindset
  • Have fun

Keywords: Ansible, Kubernetes, DC/OS, D2iQ, Mesosphere, HDFS, MongoDB, Docker, UCR, Spring Boot, Splunk, Linux, HDP, Bamboo, Nexus, JIRA, Scrum, RHEV, RHEL