Staff Data Engineer

Added: less than a minute ago
Location: Arlington, VA (on-site)
Type: Full time
Salary: Not specified

Staff Data Engineer

Twenty is seeking a Staff Data Engineer for an on-site position in its Arlington, VA office to architect and lead the development of data infrastructure that powers our cyber operations applications and capabilities. We're looking for someone with 8+ years of experience in data engineering and architecture, with mastery-level expertise in ETL pipeline development, data lake architecture, and schema design for complex datasets, plus proven leadership experience mentoring engineers and driving technical initiatives. In this role, you'll architect scalable data lakes that aggregate cyber operations data from diverse sources, lead the design of sophisticated graph database schemas that capture relationships across cyber networks, personas, physical world entities, and electromagnetic spectrum data, establish best practices for AI/ML data preparation workflows, and mentor junior data engineers while driving technical excellence across the data platform. You'll join a world-class product and engineering team that delivers mission-critical solutions for U.S. national security, working in both cloud and on-premises environments to build data infrastructure that operates at machine speed. If you're passionate about solving complex data architecture challenges while leading technical initiatives and making a direct impact on national security, we want to talk to you.

About the Company

At Twenty, we're taking on one of the most critical challenges of our time: defending democracies in the digital age. We develop revolutionary technologies that operate at the intersection of cyber and electromagnetic domains, where the speed of operations exceeds human sensing and complexity transcends conventional boundaries. Our team doesn't just solve problems – we deliver game-changing outcomes that directly impact national security. We're pragmatic optimists who understand that while our mission of protecting America and its allies is challenging, success is possible.

Role Details

Technical Leadership & Architecture

  • Lead the design and architecture of enterprise-scale data platforms that support mission-critical cyber operations

  • Define technical vision and roadmap for data infrastructure, balancing operational needs with scalability and performance

  • Evaluate and recommend engineering courses of action for data platform enhancements and technology adoption

  • Drive technical decision-making for complex data architecture challenges across multiple systems and teams

  • Establish data engineering standards, best practices, and design patterns across the organization

  • Lead architecture review sessions and provide technical guidance on data-intensive projects

Data Lake Architecture & Management

  • Architect and implement highly scalable, multi-petabyte data lake solutions on AWS to support our applications and cyber operations workflows

  • Design sophisticated data ingestion frameworks that collect and consolidate data from diverse sources including network traffic, threat intelligence feeds, sensor data, operational logs, and electromagnetic spectrum monitoring

  • Implement advanced data partitioning, compression, and storage optimization strategies to enable sub-second querying across massive datasets (see the illustrative sketch after this list)

  • Establish comprehensive data governance frameworks including data lineage, cataloging, metadata management, and quality monitoring

  • Design data mesh architectures that enable domain-oriented data ownership while maintaining central governance

  • Lead capacity planning and cost optimization efforts for petabyte-scale data infrastructure
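
For illustration only, the following is a minimal sketch of the kind of partitioned, compressed storage layout described above, using pyarrow to write a date- and source-partitioned Parquet dataset; the columns, partition keys, and output path are hypothetical placeholders rather than details of Twenty's actual platform.

```python
# Illustrative only: write event records as a date- and source-partitioned
# Parquet dataset with pyarrow. The columns, partition keys, and output path
# are hypothetical placeholders, not details from the listing.
import pyarrow as pa
import pyarrow.parquet as pq

records = {
    "event_time": ["2024-01-01T00:00:00Z", "2024-01-01T00:05:00Z"],
    "event_date": ["2024-01-01", "2024-01-01"],   # coarse partition key
    "source": ["netflow", "dns"],                 # secondary partition key
    "payload": ['{"bytes": 1024}', '{"qname": "example.com"}'],
}
table = pa.table(records)

# Hive-style partitioning keeps engines such as Athena or Spark from scanning
# the whole lake; pyarrow writes Snappy-compressed Parquet files by default.
pq.write_to_dataset(
    table,
    root_path="cyber_events",   # an s3:// URI also works via pyarrow's S3 filesystem
    partition_cols=["event_date", "source"],
)
```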

ETL Pipeline Development & Optimization

  • Architect robust, fault-tolerant ETL pipelines that transform raw cyber operation data into structured formats for analysis at scale

  • Design complex data validation and quality assurance frameworks that ensure data integrity throughout multi-stage pipelines

  • Implement hybrid streaming and batch processing architectures to handle both real-time operational data and historical analysis

  • Lead development of sophisticated data enrichment processes that augment datasets with threat intelligence, geolocation data, temporal context, and multi-domain correlations

  • Design self-healing pipeline architectures with comprehensive error handling, retry logic, and monitoring systems (see the sketch after this list)

  • Drive performance optimization initiatives achieving significant improvements in throughput and latency
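
As a purely illustrative companion to the retry and error-handling bullet above, here is a minimal two-task pipeline skeleton, assuming a recent Apache Airflow 2.x release as the orchestrator (Airflow is named in the qualifications below); the DAG id, task names, and callables are hypothetical placeholders.

```python
# Illustrative only: a minimal fault-tolerant pipeline skeleton, assuming
# Apache Airflow 2.x. The DAG id, task names, and callables are hypothetical.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_events():
    # Placeholder: pull a batch of raw events from an upstream source.
    return ["raw-event-1", "raw-event-2"]


def transform_events():
    # Placeholder: validate, normalize, and enrich the extracted batch.
    pass


with DAG(
    dag_id="cyber_events_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",
    catchup=False,
    default_args={
        # Retries with exponential backoff are the core of the "self-healing"
        # behavior described above; alerting and monitoring sit alongside this.
        "retries": 3,
        "retry_delay": timedelta(minutes=5),
        "retry_exponential_backoff": True,
    },
):
    extract = PythonOperator(task_id="extract", python_callable=extract_events)
    transform = PythonOperator(task_id="transform", python_callable=transform_events)
    extract >> transform
```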

Schema Design & Graph Database Development

  • Lead the design and implementation of enterprise-grade graph database schemas that model complex relationships across multiple operational domains (see the sketch after this list):

    • Cyber network infrastructure, communication patterns, and exploitation chains

    • Cyber personas, attribution chains, and threat actor relationships

    • Physical world entities, geospatial relationships, and cross-domain connections

    • Electromagnetic spectrum data including WiFi, IoT protocols, and RF signal patterns

  • Architect sophisticated data models that balance query performance, analytical flexibility, and storage efficiency

  • Design schema evolution strategies and implement zero-downtime migration frameworks

  • Develop comprehensive ontologies and taxonomies that enable consistent data representation across diverse intelligence datasets

  • Lead graph query optimization efforts and establish indexing strategies for complex traversal patterns

  • Mentor engineers on graph modeling best practices and advanced Cypher query techniques
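
For illustration only, the sketch below shows the general shape of such a cross-domain schema using the official neo4j Python driver against a Neo4j 5 instance; the node labels, relationship types, and connection details are hypothetical examples, not Twenty's actual data model.

```python
# Illustrative only: a cross-domain graph schema sketch using the official
# neo4j Python driver against a Neo4j 5 instance. Labels, relationship types,
# and connection details are hypothetical placeholders.
from neo4j import GraphDatabase

URI = "neo4j://localhost:7687"   # placeholder
AUTH = ("neo4j", "password")     # placeholder

SCHEMA_STATEMENTS = [
    # Uniqueness constraints double as indexes, giving fast entry points for traversals.
    "CREATE CONSTRAINT host_ip IF NOT EXISTS FOR (h:Host) REQUIRE h.ip IS UNIQUE",
    "CREATE CONSTRAINT persona_handle IF NOT EXISTS FOR (p:Persona) REQUIRE p.handle IS UNIQUE",
    "CREATE CONSTRAINT emitter_id IF NOT EXISTS FOR (e:RfEmitter) REQUIRE e.emitter_id IS UNIQUE",
]

# One write linking a cyber persona, a network host, and an RF emitter.
EXAMPLE_WRITE = """
MERGE (h:Host {ip: $ip})
MERGE (p:Persona {handle: $handle})
MERGE (e:RfEmitter {emitter_id: $emitter_id})
MERGE (p)-[:OPERATES]->(h)
MERGE (h)-[:OBSERVED_NEAR]->(e)
"""


def apply_schema(driver):
    with driver.session() as session:
        for stmt in SCHEMA_STATEMENTS:
            session.run(stmt)


def link_entities(driver):
    with driver.session() as session:
        session.run(EXAMPLE_WRITE, ip="203.0.113.7",
                    handle="persona-17", emitter_id="rf-0042")


if __name__ == "__main__":
    with GraphDatabase.driver(URI, auth=AUTH) as driver:
        apply_schema(driver)
        link_entities(driver)
```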

Collaboration & Strategic Planning

  • Partner closely with backend engineers, data scientists, cyber operations experts, and forward-deployed analysts to understand evolving data requirements

  • Work with SRE teams to ensure data infrastructure reliability and operational excellence in secure government environments

  • Engage with government stakeholders to translate operational needs into technical data solutions

  • Provide technical leadership in customer engagements and capability demonstrations

  • Contribute to long-term technical strategy and roadmap planning for data platforms

Qualifications

Technical Skills & Experience

  • 8+ years of professional experience in data engineering, data architecture, or related roles with increasing technical leadership

  • Expert-level proficiency with AWS data services including S3, Athena, Kinesis, Lake Formation, and Redshift

  • Proven experience leading technical projects and mentoring junior data engineers

  • Advanced programming skills in Python and/or Golang for building production-grade data pipelines and frameworks

  • Extensive experience designing and implementing complex ETL pipelines using Apache Airflow or similar orchestration frameworks at scale

  • Deep expertise in data modeling techniques for both relational and NoSQL databases, including dimensional modeling and data vault methodologies

  • Mastery-level experience with graph databases (Neo4j, AWS Neptune, or similar) including advanced schema design, query optimization, and performance tuning using Cypher or Gremlin

  • Expert knowledge of big data processing frameworks such as Apache Spark, Flink, or similar technologies for petabyte-scale processing

  • Advanced SQL skills and proven experience with query optimization, indexing strategies, and performance tuning for massive datasets

  • Extensive experience with data lake architectures, modern data warehouse solutions (Snowflake, Redshift, Databricks), and lakehouse patterns

  • Deep understanding of data serialization formats (Parquet, Avro, ORC, JSON, Protocol Buffers) and optimization techniques

  • Expert knowledge of streaming data architectures, event-driven processing, and CDC (Change Data Capture) patterns

  • Advanced experience with containerization (Docker, Kubernetes) and infrastructure as code (Terraform, CloudFormation)

  • Understanding of distributed systems principles, consensus algorithms, and fault tolerance patterns

Security & Compliance

  • Deep understanding of data security best practices including encryption at rest and in transit, key management, and secure data sharing

  • Extensive experience implementing role-based access controls, data classification schemes, and audit logging

  • Knowledge of data privacy principles, compliance requirements for government systems, and secure data handling in classified environments

  • Understanding of zero-trust architecture principles and secure data pipeline design

Leadership & Communication Skills

  • Demonstrated experience mentoring data engineers and leading technical teams

  • Proven ability to organize development workflows, manage project delivery, and coordinate cross-functional initiatives

  • Strong communication skills with ability to explain complex technical concepts to diverse audiences including executives and government stakeholders

  • Experience conducting thorough code reviews and establishing data engineering standards

  • Track record of driving technical decision-making and architectural improvements

Education

  • Bachelor's degree in Computer Science, Data Science, Information Systems, or related field; Master's degree preferred

  • Equivalent practical experience in lieu of formal education may be considered for exceptional candidates

Security Requirements

  • Must be eligible to obtain a U.S. Government security clearance

  • Ability to work on-site in Arlington, VA with occasional travel to Fort Meade, MD

Distinguishing Qualifications

  • Previous experience as a technical lead or senior data engineer in government, defense, or intelligence applications

  • Track record of building data infrastructure for mission-critical systems with high availability requirements

  • Background in cybersecurity data analysis, threat intelligence platforms, or SIEM systems

  • Expert knowledge of graph analytics and network analysis for cyber operations and threat hunting

  • Deep understanding of cyber domain data including network flows, DNS logs, PCAP analysis, threat feeds, and vulnerability data

  • Expertise in geospatial data processing, analysis, and visualization

  • Experience with data mesh architectures, federated data systems, or multi-tenant data platforms

  • AWS certifications (Data Analytics, Solutions Architect, or similar) or other relevant data engineering certifications

  • Previous experience working with data scientists and ML engineers building production AI systems

  • Publications, conference talks, or recognized contributions to the data engineering community

Additional Skills

  • Expert knowledge of DataOps practices and CI/CD for data pipelines

  • Advanced understanding of cost optimization strategies for cloud data infrastructure at scale

  • Experience with data visualization platforms and building executive-level analytics dashboards

  • Deep knowledge of message queue systems (NATS, Kafka, RabbitMQ, Amazon SQS/SNS)

  • Experience designing APIs for data access and integration (REST, GraphQL, gRPC)

  • Understanding of multi-cloud or hybrid cloud data architectures

  • Expertise with data observability platforms and lineage tracking tools (Monte Carlo, Datadog, Great Expectations)

  • Experience with column-oriented databases and analytical query engines

  • Knowledge of data compression algorithms and storage optimization techniques

  • Experience with real-time analytics and complex event processing systems
