Job Overview
We are in search of a dedicated and skilled Data Analyst to become a vital part of our innovative AI & Threat Analytics team. This role is pivotal in advancing our autofill classification models through meticulous management, optimization, and analysis of datasets. This position is fully remote, with a hybrid option available for candidates in the El Dorado Hills, CA, or Chicago, IL, metro areas.
Key Responsibilities
• Oversee the comprehensive lifecycle of data collection, cleansing, and preprocessing for HTML-centric datasets utilized in machine learning initiatives.
• Employ web analysis tools to extract and structure data from DOM environments for training and validating models.
• Collaborate with machine learning engineers on feature engineering experiments, ensuring that training datasets align with model specifications.
• Generate and enhance synthetic datasets via large language models (LLMs) to improve the quality and balance of training data.
• Conduct data analysis using dimensionality reduction techniques, such as t-SNE, PCA, and UMAP, to assess feature efficacy and enhance dataset quality.
• Automate data workflows to optimize data processing, transformation, and manipulation.
• Maintain comprehensive documentation of data workflows, methodologies, and processes to guarantee lineage and reproducibility.
• Develop validation and data quality protocols to uphold consistency and integrity across all datasets.
Required Skills
• 2+ years of hands-on experience as a Data Analyst, preferably in cybersecurity or machine learning environments.
• Proficient in Python for data manipulation and analysis, utilizing libraries such as Pandas and NumPy.
• Extensive experience with web analysis tools (e.g., Selenium, BeautifulSoup) and a strong grasp of HTML and DOM structures to facilitate data extraction and preprocessing.
• Familiar with natural language processing (NLP) techniques such as tokenization, stop-word removal, and lemmatization for processing text data.
• Experience in generating synthetic datasets and employing LLMs to augment machine learning data.
• Ability to work effectively in collaboration with machine learning engineers and other technical teams.
• Strong analytical problem-solving capabilities and a meticulous approach to ensuring data quality and governance.
• Knowledgeable in utilizing cloud platforms (AWS, GCP, Azure) for data storage and processing.
• Bachelor’s degree in Data Science, Statistics, Computer Science, or a related discipline, or equivalent professional experience.
• Because the role involves work with GovCloud, every candidate must be a US Person.
Career Growth Opportunities
Joining our esteemed organization provides significant opportunities for professional development and advancement within the dynamic fields of data analysis and machine learning, allowing you to enhance your expertise and further your career.
Company Culture And Values
We are committed to fostering a diverse and inclusive workplace, promoting collaboration, innovation, and a strong sense of community among our employees.
Compensation And Benefits
• Comprehensive medical, dental, and vision insurance, including domestic partnership coverage.
• Employer-paid life insurance and supplemental life insurance options for employees, spouses, and children.
• Voluntary short/long-term disability insurance.
• 401(k) plan with both Roth and traditional options available.
• Generous paid time off (PTO) plan that acknowledges your dedication and tenure, including bereavement and jury duty leave.
• Competitive annual bonuses.
We celebrate diversity and are committed to creating an inclusive environment for all employees.
Employment Type: Full-Time