Company Overview
We are building Protege to solve the biggest unmet need in AI: getting access to the right training data. The process today is time-intensive, incredibly expensive, and often ends in failure. The Protege platform facilitates the secure, efficient, and privacy-centric exchange of AI training data, starting in the healthcare industry.
Solving AI's data problem is a generational opportunity. The company that succeeds will be one of the largest in AI — and in tech.
Summary
The Applied Data Scientist bridges the gap between our data assets and our customers' needs in our healthcare vertical. They play a key role in ensuring our datasets are well-matched to the AI models our customers are building and well-understood by those customers. This role requires healthcare data expertise, extensive experience with statistical analysis, and some customer collaboration.
We are open to part-time, temp-to-hire, or full-time arrangements for this role. Part-time would require at least 20 hours per week.
Responsibilities
- Data Analysis: Conduct feasibility analyses by querying healthcare datasets to assess patient cohort availability based on complex inclusion/exclusion criteria (e.g., procedures, diagnoses, diversity, longitudinal completeness, regulatory constraints).
- Trade-off Assessments: Assess privacy-preservation techniques to maximize dataset utility.
- Customer Collaboration: Work directly with prospective customers to understand their data requirements and help curate the best data assets for their use cases.
- Data Strategy: Identify gaps in our data offerings and provide insights to our partnerships team on the highest-priority data acquisitions.
- Data Quality Assurance: Evaluate potential data partnerships, ensuring the data is high-quality, well-documented, and commercially viable.
Technical Skill Set
- Data Expertise: Experience working with healthcare/medical datasets (some combination of imaging, EHR, genomic, claims, and pathology data), as well as comfort with SQL, R, and/or Python for data analysis. The bigger the datasets you have worked with, the better!
- Longitudinal & Cohort Analysis: Ability to evaluate datasets for completeness over time, ensuring sufficient patient follow-up and retention for model training.
- Diversity & Bias Mitigation: Knowledge of techniques to assess and improve dataset diversity across demographics, geographies, and clinical subpopulations.
- Privacy-Preserving Technologies: Familiarity with de-identification techniques such as Safe Harbor and Expert Determination.
Qualifications
- 2+ years of experience in a health data role (e.g., biomedical informatics, computational biology, AI/ML in healthcare) or equivalent experience, such as a Ph.D. or Master's in healthcare economics, statistics, or data science with a healthcare focus.
- Excellent communication skills with the ability to translate complex data concepts.
- Proficiency in Snowflake and a statistical programming language (SQL, R, or Python), including writing complex queries and working with large datasets.
- Experience in a customer-facing role preferred.