
Sayari

Data Engineering Intern

Job Posted 10 Days Ago
Reposted 10 Days Ago
Remote
Hiring Remotely in United States
Internship
Assist the Data Engineering team in collecting global data, maintaining ETL pipelines, and developing new ones for Sayari Graph.
The summary above was generated by AI

About Sayari: 

Sayari is the counterparty and supply chain risk intelligence provider trusted by government agencies, multinational corporations, and financial institutions. Its intuitive network analysis platform surfaces hidden risk through integrated corporate ownership, supply chain, trade transaction and risk intelligence data from over 250 jurisdictions. Sayari is headquartered in Washington, D.C., and its solutions are used by thousands of frontline analysts in over 35 countries.


Our company culture is defined by a dedication to our mission of using open data to enhance visibility into global commercial and financial networks, a passion for finding novel approaches to complex problems, and an understanding that diverse perspectives create optimal outcomes. We embrace cross-team collaboration, encourage training and learning opportunities, and reward initiative and innovation. If you like working with supportive, high-performing, and curious teams, Sayari is the place for you.


Internship Description:

Sayari is looking for an intern to join its Data Engineering team! Sayari’s flagship product, Sayari Graph, provides instant access to structured business information from billions of corporate, legal, and trade records. As a member of Sayari's data team you will work with our Product and Software Engineering teams to collect data from around the globe, maintain existing ETL pipelines, and develop new pipelines that power Sayari Graph.


Our application tier is built primarily in TypeScript, runs in Kubernetes, and is backed by Postgres, Cassandra, Elasticsearch, and Memgraph. Our data ingest tier runs on Spark, processing terabytes of data collected from hundreds of data sources. The platform allows users to explore a large knowledge graph built from hundreds of millions of structured and unstructured records from over 200 countries and in 30 languages. As part of this team, you'll have the chance to contribute to our growing library of open-source work, including Trellis, our WebGL-powered network visualization library.


This is a remote, paid internship with an expected commitment of 20 to 30 hours per week.

Job Responsibilities:

  • Write and deploy crawling scripts to collect source data from the web
  • Write and run data transformers in Scala Spark to standardize bulk data sets
  • Write and run modules in Python to parse entity references and relationships from source data
  • Diagnose and fix bugs reported by internal and external users
  • Analyze and report on internal datasets to answer questions and inform feature work
  • Work collaboratively within and across engineering teams using basic agile principles
  • Give and receive feedback through code reviews

Required Skills & Experience:

  • Experience with Python and/or a JVM language (e.g., Scala)
  • Experience working collaboratively with git

Desired Skills & Experience:

  • Experience with Apache Spark and Apache Airflow
  • Experience working on a cloud platform like GCP, AWS, or Azure
  • Understanding of or interest in knowledge graphs

What We Offer: 

  • A collaborative and positive culture: your team will be as smart and driven as you
  • Limitless growth and learning opportunities
  • A strong commitment to diversity, equity, and inclusion
  • Team building events & opportunities


Sayari is an equal opportunity employer and strongly encourages diverse candidates to apply. We believe diversity and inclusion mean our team members should reflect the diversity of the United States. No employee or applicant will face discrimination or harassment based on race, color, ethnicity, religion, age, gender, gender identity or expression, sexual orientation, disability status, veteran status, genetics, or political affiliation. We strongly encourage applicants of all backgrounds to apply.

Top Skills

Cassandra
Elasticsearch
Kubernetes
Memgraph
Postgres
Spark
Typescript

Similar Jobs

24 Days Ago
Remote
Arizona, USA
29-31 Annually
Internship
Legal Tech
As a Data Engineering Intern, you will assist with ETL processes and data integration, orchestrate data flows using various tools, and collaborate with teams in an agile environment.
Top Skills: Airflow, Git, Google Cloud Platform, Linux, Python, Snowflake, SQL
24 Days Ago
Remote
California, USA
29-31 Annually
Internship
Legal Tech
As a Data Engineering Intern, you will work with the Business Intelligence team on ETL processes, utilizing SQL and Google Cloud tools to manage data flows and collaborate with various teams in an agile environment.
Top Skills: Airflow, Git, GCP, Linux, Python, Snowflake, SQL
24 Days Ago
Remote
Utah, USA
29-31 Annually
Internship
Legal Tech
As a Data Engineering Intern, you will assist the Business Intelligence team in ETL processes, utilize SQL and Airflow, and collaborate with various teams in an agile environment.
Top Skills: Airflow, Git, GCP, Linux, Python, Snowflake, SQL

What you need to know about the Seattle Tech Scene

Home to tech titans like Microsoft and Amazon, Seattle punches far above its weight in innovation. But its surrounding mountains, sprinkled with world-famous hiking trails and climbing routes, make the city a destination for outdoorsy types as well. Established as a logging town before shifting to shipbuilding and logistics, the Emerald City is now known for its contributions to aerospace, software, biotech and cloud computing. And its status as a thriving tech ecosystem is attracting out-of-town companies looking to establish new tech and engineering hubs.

Key Facts About Seattle Tech

  • Number of Tech Workers: 287,000; 13% of overall workforce (2024 CompTIA survey)
  • Major Tech Employers: Amazon, Microsoft, Meta, Google
  • Key Industries: Artificial intelligence, cloud computing, software, biotechnology, game development
  • Funding Landscape: $3.1 billion in venture capital funding in 2024 (Pitchbook)
  • Notable Investors: Madrona, Fuse, Tola, Maveron
  • Research Centers and Universities: University of Washington, Seattle University, Seattle Pacific University, Allen Institute for Brain Science, Bill & Melinda Gates Foundation, Seattle Children’s Research Institute