Sayari

Data Engineer Intern - Web Crawling

Job Posted 10 Days Ago Reposted 10 Days Ago
Remote
Hiring Remotely in United States
Internship
As a Data Engineer Intern, you will enhance and maintain Sayari’s web crawling framework, ensuring scalability and reliability while collaborating with engineering teams.
The summary above was generated by AI

About Sayari: 

Sayari is the counterparty and supply chain risk intelligence provider trusted by government agencies, multinational corporations, and financial institutions. Its intuitive network analysis platform surfaces hidden risk through integrated corporate ownership, supply chain, trade transaction and risk intelligence data from over 250 jurisdictions. Sayari is headquartered in Washington, D.C., and its solutions are used by thousands of frontline analysts in over 35 countries.


Our company culture is defined by a dedication to our mission of using open data to enhance visibility into global commercial and financial networks, a passion for finding novel approaches to complex problems, and an understanding that diverse perspectives create optimal outcomes. We embrace cross-team collaboration, encourage training and learning opportunities, and reward initiative and innovation. If you like working with supportive, high-performing, and curious teams, Sayari is the place for you.


Internship Description:

Sayari is looking for a Data Engineer Intern specializing in web crawling to join its Data Engineering team! Sayari has developed a robust web crawling project that collects hundreds of millions of documents every year from a diverse set of sources around the world. These documents serve as source records for Sayari’s flagship graph product, a global network of corporate and trade entities and relationships. As a member of Sayari’s data team, your primary objective will be to maintain and improve Sayari’s web crawling framework, with an emphasis on scalability and reliability. You will work with our Product and Software Engineering teams to ensure our crawling deployment meets product requirements and integrates efficiently with our ETL pipelines.


This is a remote, paid internship with an expected workload of 20-30 hours a week.

Job Responsibilities:

  • Investigate and implement web crawlers for new sources
  • Maintain and improve existing crawling infrastructure
  • Improve metrics and reporting for web crawling
  • Help improve and maintain ETL processes
  • Contribute to development and design of Sayari’s data product

Required Skills & Experience:

  • Experience with Python
  • Experience managing web crawling at scale with any framework; Scrapy is a plus
  • Experience working with Kubernetes
  • Experience working collaboratively with Git
  • Experience working with selectors such as XPath, CSS, and JMESPath
  • Experience with browser developer tools (Chrome/Firefox)

Desired Skills & Experience:

  • Experience with Apache projects such as Spark, Avro, NiFi, and Airflow
  • Experience with the Postgres and/or RocksDB datastores
  • Experience working on a cloud platform like GCP, AWS, or Azure
  • Working knowledge of API frameworks, primarily REST
  • Understanding of or interest in knowledge graphs
  • Experience with *nix environments
  • Experience with reverse engineering
  • Proficient in bypassing anti-crawling techniques
  • Experience with JavaScript

What We Offer: 

  • A collaborative and positive culture - your team will be as smart and driven as you
  • Limitless growth and learning opportunities
  • A strong commitment to diversity, equity, and inclusion
  • Team building events & opportunities

Sayari is an equal opportunity employer and strongly encourages diverse candidates to apply. We believe diversity and inclusion mean our team members should reflect the diversity of the United States. No employee or applicant will face discrimination or harassment based on race, color, ethnicity, religion, age, gender, gender identity or expression, sexual orientation, disability status, veteran status, genetics, or political affiliation. We strongly encourage applicants of all backgrounds to apply.

Top Skills

Data Engineering
ETL
Software Engineering
Web Crawling
