Evaluating LLM outputs for safety requires consistent, nuanced, high-quality human judgment, especially in ambiguous edge cases where guidelines are open to interpretation.
What I did:
Reviewed human annotations of LLM responses for safety and policy compliance (RLHF and SFT data)
Provided ground-truth decisions in complex or unclear scenarios
Helped refine what "correct" labeling looks like in practice
Impact:
Improved quality and consistency of safety annotations
Strengthened the quality of evaluation data used to assess LLM behaviour
Brought clarity to ambiguous labeling scenarios
Job titles are inconsistent, unstructured, and difficult to analyze, making it hard to search, group, or compare roles across datasets.
What I did:
Built a normalized job title taxonomy to standardize variations
Defined an ontology layer (relationships between roles)
Structured the data to support better search, filtering, and grouping
Impact:
Dramatically improved search experience and discovery
Improved consistency across job data
Enabled more accurate grouping of roles, adding value to search
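As a rough illustration, the normalization step can be sketched as a cleanup pass plus a taxonomy lookup. The titles, canonical forms, and function name below are hypothetical examples for this sketch, not the actual taxonomy:

```python
import re

# Hypothetical taxonomy entries: cleaned variant -> canonical title
TAXONOMY = {
    "sr software engineer": "Senior Software Engineer",
    "senior swe": "Senior Software Engineer",
    "software developer": "Software Engineer",
    "ml engineer": "Machine Learning Engineer",
}

def normalize_title(raw):
    """Lowercase, strip punctuation and extra spaces, then look up the taxonomy."""
    key = re.sub(r"[^a-z0-9 ]", "", raw.lower())
    key = re.sub(r"\s+", " ", key).strip()
    return TAXONOMY.get(key)  # None when the variant is not yet mapped

print(normalize_title("Sr. Software Engineer"))  # Senior Software Engineer
```

Unmapped variants return None, which makes it easy to surface them for review and grow the taxonomy over time.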
Job descriptions present hybrid/remote/on-site information in many different formats and combinations, making them hard to label automatically, even though this information is critical to job seekers.
What I did:
Designed a labeling workflow using the Gemini API to classify and label job description attributes
Built a process to validate and standardize outputs
Combined LLM-generated labels with structured validation
Impact:
Labeled and validated job data, adding new information for 350 companies (previously, the catalog had this information only for 200 companies)
Created a scalable approach for similar datasets
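The validate-and-standardize step can be sketched roughly as follows. The label set, synonym map, and function name are assumptions for illustration; the Gemini API call itself is omitted, and we assume it returns free-form text such as "Remote" or "Hybrid (2 days onsite)":

```python
# Controlled vocabulary for work-arrangement labels (tuple keeps order deterministic)
ALLOWED = ("remote", "hybrid", "on-site")

# Common LLM output variants mapped onto the vocabulary (illustrative entries)
SYNONYMS = {"onsite": "on-site", "in-office": "on-site", "wfh": "remote"}

def standardize_label(raw):
    """Return a canonical label, or None when the LLM output cannot be trusted."""
    token = SYNONYMS.get(raw.strip().lower(), raw.strip().lower())
    if token in ALLOWED:
        return token
    # Fall back to a substring check for verbose outputs like "hybrid (2 days onsite)"
    for label in ALLOWED:
        if label in token:
            return label
    return None  # route to manual review instead of guessing

print(standardize_label("Remote"))  # remote
print(standardize_label("wfh"))     # remote
```

Outputs that do not resolve to a known label return None, so they can be sent to manual review rather than entering the catalog unchecked.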