Evaluating LLM outputs for safety requires consistent, nuanced, high-quality human judgment, especially in ambiguous edge cases where guidelines are open to interpretation.
What I did:
Reviewed human annotations of LLM responses for safety and policy compliance (RLHF and SFT data)
Provided ground-truth decisions in complex or unclear scenarios
Helped refine what "correct" labeling looks like in practice
Impact:
Improved quality and consistency of safety annotations
Strengthened the quality of evaluation data used to assess LLM behaviour
Brought clarity to ambiguous labeling scenarios
Job titles are inconsistent, unstructured, and difficult to analyze, making it hard to search, group, or compare roles across datasets.
What I did:
Built a normalized job title taxonomy to standardize variations
Defined an ontology layer (relationships between roles)
Structured the data to support better search, filtering, and grouping
Impact:
Dramatically improved search experience and discovery
Improved consistency across job data
Enabled more accurate grouping of roles, adding value to search
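As a rough illustration, the normalization step can be sketched as a cleanup pass plus a taxonomy lookup. The titles, canonical forms, and function name below are hypothetical examples for this sketch, not the actual taxonomy:

```python
import re

# Hypothetical taxonomy entries: cleaned variant -> canonical title
TAXONOMY = {
    "sr software engineer": "Senior Software Engineer",
    "senior swe": "Senior Software Engineer",
    "software developer": "Software Engineer",
    "ml engineer": "Machine Learning Engineer",
}

def normalize_title(raw):
    """Lowercase, strip punctuation and extra spaces, then look up the taxonomy."""
    key = re.sub(r"[^a-z0-9 ]", "", raw.lower())
    key = re.sub(r"\s+", " ", key).strip()
    return TAXONOMY.get(key)  # None when the variant is not yet mapped

print(normalize_title("Sr. Software Engineer"))  # Senior Software Engineer
```

Unmapped variants return None, which makes it easy to surface them for review and grow the taxonomy over time.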
Job descriptions present hybrid/remote/on-site information in many different formats and combinations, making them hard to label automatically, even though this information is critical to job seekers.
What I did:
Designed a labeling workflow using the Gemini API to classify and label job description attributes
Built a process to validate and standardize outputs
Combined LLM-generated labels with structured validation
Impact:
Labeled and validated job data, adding new information for 350 companies (previously, the catalog had this information only for 200 companies)
Created a scalable approach for similar datasets
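The validate-and-standardize step can be sketched roughly as follows. The label set, synonym map, and function name are assumptions for illustration; the Gemini API call itself is omitted, and we assume it returns free-form text such as "Remote" or "Hybrid (2 days onsite)":

```python
# Controlled vocabulary for work-arrangement labels (tuple keeps order deterministic)
ALLOWED = ("remote", "hybrid", "on-site")

# Common LLM output variants mapped onto the vocabulary (illustrative entries)
SYNONYMS = {"onsite": "on-site", "in-office": "on-site", "wfh": "remote"}

def standardize_label(raw):
    """Return a canonical label, or None when the LLM output cannot be trusted."""
    token = SYNONYMS.get(raw.strip().lower(), raw.strip().lower())
    if token in ALLOWED:
        return token
    # Fall back to a substring check for verbose outputs like "hybrid (2 days onsite)"
    for label in ALLOWED:
        if label in token:
            return label
    return None  # route to manual review instead of guessing

print(standardize_label("Remote"))  # remote
print(standardize_label("wfh"))     # remote
```

Outputs that do not resolve to a known label return None, so they can be sent to manual review rather than entering the catalog unchecked.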