TEKsystems Staff Data Center Engineer in Ashburn, Virginia

Candidates MUST live within 45 minutes of the worksite location:

21890 Uunet Drive, Ashburn, Virginia, 20147, United States

Description:

We have a terrific opportunity in our Systems Engineering team for an intelligent and motivated Staff Datacenter Engineer who is enthusiastic about datacenter deployment and site reliability for large-scale consumer online services. As a member of our Data Center Engineering and Operations team, you will be responsible for the day-to-day operations of our co-location data centers and assume a critical role in maintaining the overall uptime, performance, and capacity of the SiriusXM+Pandora service. You will bring your solid experience to bear in supporting the various services we manage and take on interesting, mission-critical projects as part of a fast-paced, highly collaborative team. We hold ourselves to high standards and take pride in our work.

Duties and Responsibilities:

● Plan and facilitate datacenter expansions and build-outs for new and existing footprints.

● Monitor datacenter power consumption and environmental conditions across existing footprints.

● Lead a team of datacenter operations engineers in accomplishing various day-to-day tasks.

● Facilitate weekly planning meetings with the datacenter team.

● Collaborate with internal teams to define project requirements for hardware deployments.

● Collaborate with our sysadmin team to keep our PXE infrastructure up to date and help create new boot methods within the environment.

● Create Debian live images for the Data Center Engineering team to use for troubleshooting, disk wiping, performance testing, etc.

● Create zero-touch provisioning methods and processes in Python to streamline hardware deployments.

● Review current automated processes, update as needed, and look for opportunities for greater efficiencies.

● Determine if automated processes can be containerized and if so, develop a migration plan, refactor, and deploy the process to our internal private cloud.

● Maintain existing monitoring processes and implement new functionality/metrics.

● Create (or update) documentation on all automated processes and monitoring infrastructure.

● Plan, schedule and perform upgrades/maintenance on infrastructure hardware.

● Manage vendor relations with manufacturers and VARs.

● Develop methodologies for hardware stress testing, performance reports, and ways to compare various architectures and configurations.

● Work hands-on with datacenter infrastructure provisioning and server/network equipment deployments.

● Rack, cable, and provision a large inventory of servers, switches, PDUs, and consoles alongside a team of engineers.

● Perform initial configuration of systems as defined by our standard operating procedures (BIOS configuration, PXE OS installs, DNS updates, etc.).

● Diagnose complex technical problems, provide detailed analysis/root cause as well as remediation/mitigation recommendations.

● Plan and assist with hardware life-cycle management, from provisioning through retirement and decommissioning.

● Manage RMA processes with various vendors.

● Maintain an up-to-date inventory list of all hardware equipment across our datacenters.

● Implement best-practice methodology for maintaining a datacenter environment.

● Document and track all assigned datacenter related issues and tasks via our internal ticketing system in a timely fashion.

Minimum Qualifications:

● BA/BS in Information Technology, Computer Science, or a related field (or equivalent experience).

● IT Certifications such as RHCE or similar are a plus.

● Minimum 8 years of combined data center and Linux administration related experience, with at least 4 years of day-to-day hands-on experience in an enterprise-scale datacenter environment.

Requirements and General Skills:

● Self-motivated, continuous learner, appreciates challenge, comfortable and effective working in new areas that require experimentation and rapid problem solving.

● Excellent time management skills, with the ability to prioritize and multitask, and work under shifting deadlines in a fast-paced environment.

● Strong understanding of x86 server hardware architecture and subsystems as they relate to configuration, triage, and certification in a large-scale server environment.

● Knowledgeable in datacenter best practices including but not limited to cabling, power balancing, cooling and airflow optimization, inventory tracking, capacity planning and host/service diversity.

● Strong interpersonal skills with the ability to lead as well as work in a team environment.

● Meticulous attention to detail and strong organization skills.

● Past experience as a team lead or as a people manager is a plus.

● Experience mentoring datacenter engineers.

● Take pride in keeping a clean and tidy work environment within the datacenter co-location.

● Ability to lift and carry equipment up to 75 pounds safely and reliably on a regular basis.

● Excellent written and verbal communication skills.

● Participate in a 24x7x365 on-call rotation.

● Up to 15% travel.

● Must have legal right to work in the U.S.

Technical Skills:

● Demonstrated proficiency in monitoring stacks such as Prometheus, Alertmanager, and Grafana.

● Hands-on experience with PXE boot, UEFI, AMI BIOS distributions, BMC/iDRAC implementation.

● Experience creating and executing Ansible playbooks.

● Experience with Docker containers.

● Basic understanding of HashiCorp Nomad/Consul/Vault.

● Practical professional knowledge of Linux and full network stack from NIC firmware to TCP/IP.

● Expertise with SAN and NAS arrays such as NetApp, Isilon, Pure Storage, and Brocade.

● Familiarity with Bitbucket and Git.

● Familiarity with performance testing and reporting tools, such as Phoronix, FIO, Stream and others.

● Experience with ISC DHCP and BIND DNS operations.

● Intermediate scripting skills in Python and familiarity with OOP concepts.

● Significant knowledge of Linux kernel drivers, kernel tuning, and debugging hardware compatibility issues.

● Basic understanding of subnetting, DHCP Relays, network load balancing, and ARP.

● Working knowledge of package management tools such as APT and RPM.

Skills:

Data center, Linux, PXE, Docker, Python, cabling, rack and stack, Red Hat

Top Skills Details:

Data center, Linux, PXE, Docker, Python

Additional Skills & Qualifications:

Day to Day in this Role:

• General operations of a data center

• Automating PDUs

• Daily tasks around automation; automation is extensive at SXM

• Automation practices, e.g., refactoring to the latest version of Python

• Racking, stacking, and cabling

• Need to be able to think outside of the box

• Red Hat certification is a plus: RHCSA is acceptable, RHCE is better

Current 3 Staff Data Center Engineers at SiriusXM:

• John Nguyen - https://www.linkedin.com/in/john-nguyen-18abb13/

• Johnny Kisor - https://www.linkedin.com/in/johnkisor/

• Nick Chan - https://www.linkedin.com/in/nick-chan-7a70a21/

Experience Level:

Expert Level

About TEKsystems:

We're partners in transformation. We help clients activate ideas and solutions to take advantage of a new world of opportunity. We are a team of 80,000 strong, working with over 6,000 clients, including 80% of the Fortune 500, across North America, Europe and Asia. As an industry leader in Full-Stack Technology Services, Talent Services, and real-world application, we work with progressive leaders to drive change. That's the power of true partnership. TEKsystems is an Allegis Group company.

The company is an equal opportunity employer and will consider all applications without regard to race, sex, age, color, religion, national origin, veteran status, disability, sexual orientation, gender identity, genetic information or any characteristic protected by law.
