Company Profile
Morgan Stanley is a leading global financial services firm providing a wide range of investment banking, securities, wealth management and investment management services. With offices in more than 41 countries, the Firm’s employees serve clients worldwide including corporations, governments, institutions and individuals. For further information about Morgan Stanley, please visit www.morganstanley.com.

Reliability & Production Engineering
Resiliency Engineering is a production-oriented discipline focused on improving service availability, latency, scalability, performance, and efficiency for technology products at Morgan Stanley. Our core infrastructure processes hundreds of millions of transactions and we serve assets of more than a trillion dollars daily. This role will be responsible for the design and implementation of the platform and the corresponding frameworks, applications and gameday exercises for testing critical applications at scale. If this scale resonates with you, come join us.

Job Profile
We are looking for a strong technologist and a doer who is willing to lead by example by being hands-on every day. This role will support the Institutional Securities and Wealth Management brokerage Operations platforms, which include diverse technologies. The role is expected to improve the reliability of our systems by working with the software developers and infrastructure engineering teams to develop automated reliability solutions. Automation will help evolve our systems, increasing our velocity in managing uptime and mitigating risk. We are looking for a candidate who will improve our codebase in a growing area of the Firm that offers significant potential for career growth.

Responsibilities:
Opportunity to drive a modern observability platform by modernising the existing monitoring tool set and integrating new observability tools with end-user applications.
Working with a team of incredibly talented and dedicated peers with hands-on experience in cutting-edge cloud and observability products
A chance to share best practices and create innovative application monitoring standards and logging solutions to modernise observability
Providing telemetry and logging capabilities to developers and SRE organizations as part of the Firm's DevOps efforts

Primary Skills
At least 5 years of relevant experience
Hands-on Python, Shell or Perl developer
Experience of one or more of the following observability tools: Loki, Elasticsearch, Prometheus, Datadog, Zabbix, Grafana or Splunk.
Site Reliability Engineering (SRE) experience, or Production Management experience with an interest in pursuing an SRE role
Experience setting up and configuring Grafana/Prometheus
Experience configuring dashboards, alerts and alarms
Experience integrating with different tools, including API integrations and reports
Exposure to analyzing dashboard data and making recommendations
Good communication and problem-solving skills to debug issues and integrate using JSON, APIs and script-driven CI/CD pipelines
In-depth experience installing, configuring and maintaining Grafana.
Experience with Data visualization using Grafana
Experience creating Grafana dashboards to display time-based data plots
Experience enhancing existing Grafana dashboards and adding new dashboards
Cloud experience installing, configuring and maintaining Grafana.
Strong verbal and written communication skills as well as presentation skills
Comfortable working with customers, with a focus on customer interaction and client experience
Ability to provide mentoring and contribute to local office leadership
Proficiency with the Linux operating system and databases
Understanding of how various software components involved in enterprise service delivery interact: web servers, application servers, databases, web services, mainframes, network attached storage, and so forth

Good To Have Skills
Experience of programming languages, preferably Java.
Experience with DevOps tooling
Able to persuade stakeholders and champion effective techniques through product development
Experience of integrating end user applications with monitoring and APM tools.
Experience instrumenting applications for transaction tracing and metric collection
Working experience of regular expressions in data extraction from log files
Understanding and familiarity with log parsing
Design and monitor tools to measure a particular problem or the contribution of a particular technique over time
Familiarity with the network and application monitoring space is a plus
Due to the role's integration within the APM team, prior experience in Application Performance Management (APM) is a big plus.
Database experience with at least one of the following: MySQL, DB2 or MSSQL
Understanding of enterprise-architecture concepts: 3-tier architecture, high-availability/disaster recovery (active-active data centers, redundant switch stacks, and so forth)

Morgan Stanley is an equal opportunities employer. We work to provide a supportive and inclusive environment where all individuals can maximise their full potential. Our skilled and creative workforce is comprised of individuals drawn from a broad cross section of the global communities in which we operate and who reflect a variety of backgrounds, talents, perspectives and experiences. Our strong commitment to a culture of inclusion is evident through our constant focus on recruiting, developing and advancing individuals based on their skills and talents.