Job title: ML Site Reliability Engineer
Company: JPMorgan Chase & Co.
Job description:

Organization Description:
The Chief Technology Office oversees enabling components inclusive of the top quality engineering and architecture tools and practices, key program management and processes as well as the technology workforce strategy required to make us a leading technology company for our customers, clients and colleagues around the world.

Employer Description:
JPMorgan Chase & Co., one of the oldest financial institutions, offers innovative financial solutions to millions of consumers, small businesses and many of the world’s most prominent corporate, institutional and government clients under the J.P. Morgan and Chase brands. Our history spans over 200 years and today we are a leader in investment banking, consumer and small business banking, commercial banking, financial transaction processing and asset management.

We recognize that our people are our strength and the diverse talents they bring to our global workforce are directly linked to our success. We are an equal opportunity employer and place a high value on diversity and inclusion at our company. We do not discriminate on the basis of any protected attribute, including race, religion, color, national origin, gender, sexual orientation, gender identity, gender expression, age, marital or veteran status, pregnancy or disability, or any other basis protected under applicable law. In accordance with applicable law, we make reasonable accommodations for applicants’ and employees’ religious practices and beliefs, as well as any mental health or physical disability needs.

Job Description:
The Chief Technology Office (CTO) within JPMorgan Chase develops products to guide technology across the firm globally, removing inefficiencies and streamlining how we deliver quality business applications. We’re continuing to evolve from building next-gen platforms to guiding architectures that unlock their capabilities, automating how we take code from inception to production.
We’re focused on optimizing how apps are designed for the future, targeting solutions that are portable across multiple cloud platforms to stay resilient, scalable and maintainable. The CTO team invests heavily in researching bleeding-edge and emerging technologies (Applied Research Engineering) such as ML on code, AR/VR, and quantum computing, with potential application across the firm, with the goal of cultivating knowledge and converting it to new products and features. This position is within the Applied Research Engineering team within the CTO org.

Responsibilities:
- Implement SRE frameworks to support the AI/ML solutions, and ensure the highest level of SLA through operational excellence
- Engage with the development team throughout the life cycle to help develop software for reliability and scale, ensuring minimal refactoring or changes
- Perform analytics on previous incidents and usage patterns to better predict issues and take proactive actions
- Troubleshoot priority incidents, facilitate blameless post-mortems and ensure permanent closure of incidents
- Perform the L1/L2/L3 support activities for the Production Support project with analysis and design work, including impact of requirements across all system components
- Build and drive adoption of greater self-healing and resiliency patterns
- Provide support to develop and improve the quality of technical engineering documentation
- Provide support to drive the maturity of the software development lifecycle
- Champion a DevOps model so that services are automated and elastic across all platforms
- Help coach and mentor less experienced team members
- Write operational documentation and a knowledge base of known issues with solutions
- Participate in the 24×7 support coverage as needed

Requirements:
- Minimum Bachelor of Science degree in Computer Science, Software Engineering, Electrical Engineering, Computer Engineering or related field
- 8+ years of IT experience with expertise in supporting Enterprise Cloud infrastructure (AWS, Azure, GCP) and AI/ML solutions
- Excellent problem-solving and debugging skills
- Strong interpersonal skills; able to work independently as well as in a team
- Hands-on experience with cloud-based technologies and tools, especially in deployment, monitoring and operations, such as Datadog, Prometheus, Splunk, Elasticsearch, Grafana
- Experience with database systems including Postgres, Cassandra, Elasticsearch
- Experience with data pipelining and a good understanding of Airflow, Kafka, etc.
- Experience with DevOps tools such as GitLab, Ansible, Docker, Kubernetes
- Experience with Continuous Integration platforms like Jenkins, Terraform
- Proficient in writing automation scripts using Python
- Good understanding of the HTTP stack, web servers like Apache and NGINX, load balancers, firewalls, etc.
- Familiar with tools such as Apache Airflow, Luigi, DVC, MLflow, etc.

Desirable:
- You believe in continuous learning, sharing best practices, encouraging and elevating less experienced colleagues as they learn.
- You have a strong commitment to SRE best practices.
Location: Bangalore, Karnataka
Job date: Wed, 23 Jun 2021 22:36:08 GMT