TSVIEL BEN-SHABAT

Data Scientist

Summary

Data Scientist with 6+ years of experience in data analysis, machine learning, and statistical modeling. Proficient in Python for extracting insights from complex datasets. Holds an M.Sc. in Statistics, with extensive knowledge of unsupervised learning techniques and AI frameworks. Passionate about leveraging cutting-edge technologies and expanding expertise through personal projects, including successful implementations using PyTorch. Committed to driving data-driven decision-making and delivering impactful solutions for business growth.

Overview

7 years of professional experience
6 years of post-secondary education

Work History

Data Scientist

Silk
02.2022 - Current
  • Anomaly Detection for Operational Risk:
    Developed a novel tree-based anomaly detection algorithm that assigns decision tree leaves to system components and computes anomaly scores based on probability ratios.
  • Performance & Cost Optimization:
    Collaborated with engineers to optimize system benchmarks and reduce cloud resource usage, ensuring high reproducibility of performance tests.
  • Cross-Functional Analytics:
    Integrated statistical methods into multi-dimensional data pipelines for improved risk monitoring and operational efficiency.

Teaching Assistant

Technion
01.2021 - 12.2021
  • Assisted in a course covering Network Graph Analysis, Recommender Systems, and Blockchain, among other topics.
  • Helped with grading assignments and tests, providing constructive feedback to students based on results.
  • Supported classroom activities, tutoring, and reviewing work.

Software Engineer

Kaminario
01.2018 - 12.2020
  • Designed, automated, and deployed microservices in a Kubernetes environment.
  • Developed an automated storage capacity prediction microservice using ELK ML, providing clients with clear OpEx projections six months ahead.

Student Data Analyst

Kaminario
01.2016 - 12.2018
  • Extracted and analyzed Call Home data to create monitoring and insights, reducing manual monitoring costs by $200K annually.
  • Updated and developed scripts and queries to extract and analyze data from multiple sources.

Education

Master of Science - Statistics

Technion, Israel
04.2019 - 01.2022
  • Cum Laude; GPA: 95; Awarded the M.Sc. Student Award for interdisciplinary data science research.
  • Research integrated unsupervised learning, game theory, and advanced statistical methodologies.
  • Authored and co-authored multiple academic papers.

Bachelor of Science - Computer Information Systems

Technion, Israel
04.2014 - 01.2018
  • Cum Laude; GPA: 88.6; Recognized on the President’s and Dean’s Lists.
  • Graduated with a project on a smart robot packaging system, emphasizing innovative application of information systems engineering.

Skills

Regression

Timeline

Data Scientist - Silk
02.2022 - Current
Teaching Assistant - Technion
01.2021 - 12.2021
Technion - Master of Science, Statistics
04.2019 - 01.2022
Software Engineer - Kaminario
01.2018 - 12.2020
Student Data Analyst - Kaminario
01.2016 - 12.2018
Technion - Bachelor of Science, Computer Information Systems
04.2014 - 01.2018

Projects

Anomaly Detection

  • Methodology: Developed a novel tree-based algorithm using a standard decision tree. Forced the tree’s leaves to correspond to system components and employed supervised learning (with known component labels) to compute probabilities for each time sample. Derived a score using 1 – (min(probabilities)/max(probabilities)) and flagged an anomaly when the 24-hour average exceeded 80% (see the sketch below).
  • Outcomes/Impact: Enabled detection of significant deviations across system components, effectively addressing the limitations of isolation forests—which are unsupervised and limited to time series data.
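
A minimal Python sketch of the scoring step, using scikit-learn and pandas. It assumes a timestamp-indexed feature matrix X and known component labels y; the tree depth and the column layout are illustrative, not the production configuration.

    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier

    def anomaly_scores(X: pd.DataFrame, y: pd.Series) -> pd.Series:
        # Fit a supervised tree on component labels; leaves approximate components.
        tree = DecisionTreeClassifier(max_depth=5, random_state=0)  # depth is an assumption
        tree.fit(X, y)
        proba = tree.predict_proba(X)  # per-sample class probabilities
        score = 1.0 - proba.min(axis=1) / proba.max(axis=1)
        return pd.Series(score, index=X.index, name="anomaly_score")

    def flag_anomalies(scores: pd.Series) -> pd.Series:
        # Flag samples whose rolling 24-hour mean score exceeds the 80% threshold.
        return scores.rolling("24h").mean() > 0.8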

Stable-Performance

  • Methodology: Replaced a binary search approach with a constant jump search strategy combined with a Mann-Whitney test to evaluate performance variability in cloud-based storage systems. This method accounted for the logarithmic nature of performance gains and ensured that deviations signaled genuine system issues rather than noise (see the sketch below).
  • Outcomes/Impact: Achieved highly reproducible performance tests; by reducing variability, any observed poor performance reliably indicated a real system problem, thereby improving overall test reliability and troubleshooting.
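
A minimal sketch of the search-and-test logic, assuming a run_benchmark callable that returns a sample of throughput measurements for a given load level; the step size, load range, and significance level are illustrative.

    from scipy.stats import mannwhitneyu

    def is_significant_gain(previous, current, alpha=0.05):
        # Mann-Whitney U test: is the current load level genuinely better than the previous one?
        _, p_value = mannwhitneyu(current, previous, alternative="greater")
        return p_value < alpha

    def find_stable_peak(run_benchmark, start=8, step=8, max_load=128):
        # Constant-jump search over load levels instead of a binary search.
        best_load, best_sample = start, run_benchmark(start)
        for load in range(start + step, max_load + 1, step):
            sample = run_benchmark(load)
            if not is_significant_gain(best_sample, sample):
                break  # gains have flattened out on the logarithmic curve
            best_load, best_sample = load, sample
        return best_load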

Faster-Performance-Testing

  • Methodology: Identified that traditional 5-minute performance tests were excessive. Demonstrated that taking the median of the first 45 seconds of data yielded results 99.99% as accurate as the full 5-minute average (see the sketch below).
  • Outcomes/Impact: Reduced test runtime by 85% per thread, enabling more frequent testing (from once a week to every two nights) and significantly cutting cloud costs by reducing virtual machine deployment time.
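
The comparison behind this result can be sketched in a few lines of Python; it assumes one throughput sample per second, which is an illustrative sampling rate.

    import numpy as np

    def compare_windows(samples_per_second):
        # Traditional result: mean over the full 5-minute (300-sample) run.
        full_mean = np.mean(samples_per_second[:300])
        # Proposed result: median over the first 45 seconds only.
        short_median = np.median(samples_per_second[:45])
        relative_error = abs(short_median - full_mean) / full_mean
        return short_median, full_mean, relative_error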

Zone Critic

  • Methodology: Tackled frequent test failures due to resource shortages by implementing a Multi-Armed Bandit algorithm with exponential decay. This dynamically updated and tracked success ratios across cloud zones, reacting swiftly to abrupt changes while filtering out transient noise (see the sketch below).
  • Outcomes/Impact: Reduced test failures due to missing resources from an average of 10% down to less than 1%, ensuring a more reliable automated testing process.
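
A minimal sketch of the decayed bandit; the decay factor and exploration rate are assumed values rather than the tuned production settings.

    import random
    from collections import defaultdict

    class ZoneBandit:
        def __init__(self, zones, decay=0.9, epsilon=0.1):
            self.zones = list(zones)
            self.decay = decay          # exponential decay of old observations
            self.epsilon = epsilon      # exploration rate
            self.successes = defaultdict(float)
            self.attempts = defaultdict(float)

        def update(self, zone, succeeded):
            # Decay history so the ratio reacts quickly to abrupt capacity changes.
            self.successes[zone] = self.decay * self.successes[zone] + float(succeeded)
            self.attempts[zone] = self.decay * self.attempts[zone] + 1.0

        def ratio(self, zone):
            return self.successes[zone] / self.attempts[zone] if self.attempts[zone] else 1.0

        def pick_zone(self):
            # Epsilon-greedy choice: mostly the zone with the best decayed success ratio.
            if random.random() < self.epsilon:
                return random.choice(self.zones)
            return max(self.zones, key=self.ratio)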

Test Coverage

  • Methodology: Collaborated with QA leadership to define key feature categories and employed a large language model API to analyze test descriptions for these features. Generated visual reports (e.g., word clouds) detailing test coverage along with associated costs and success ratios (see the sketch below).
  • Outcomes/Impact: Streamlined test planning for new features, saving significant time and effort for QA leads when navigating complex test suites.
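
A hedged sketch of the categorization step; call_llm stands in for whichever LLM API is in use, and the category list is invented for illustration only.

    from collections import Counter
    from wordcloud import WordCloud

    CATEGORIES = ["snapshots", "replication", "resizing", "failover"]  # illustrative only

    def categorize(test_description, call_llm):
        prompt = (
            "Which of these feature categories does the following test cover? "
            "Categories: " + ", ".join(CATEGORIES) + ".\n"
            "Test: " + test_description + "\nAnswer with a comma-separated list."
        )
        answer = call_llm(prompt)  # hypothetical wrapper around the LLM API
        return [c for c in CATEGORIES if c in answer.lower()]

    def coverage_cloud(test_descriptions, call_llm):
        counts = Counter(cat for d in test_descriptions for cat in categorize(d, call_llm))
        return WordCloud(width=800, height=400).generate_from_frequencies(counts)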

Bug Duplicates

  • Methodology: Built a microservice that combines minimal information (error messages and failing test steps) with BERT embeddings and cosine similarity to flag duplicate bug reports (see the sketch below).
  • Outcomes/Impact: Flagged potential duplicate bug reports with approximately 90% accuracy (10% false positive rate), thereby saving cloud resources and developer time by reducing redundant investigations.
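
A minimal sketch of the similarity check with sentence-transformers; the encoder name and the 0.9 cut-off are assumptions, not the production values.

    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")  # any BERT-style encoder works here

    def is_duplicate(new_bug, known_bugs, threshold=0.9):
        # Embed the error message plus failing test step and compare with known reports.
        new_vec = model.encode(new_bug, convert_to_tensor=True)
        known_vecs = model.encode(known_bugs, convert_to_tensor=True)
        similarities = util.cos_sim(new_vec, known_vecs)  # 1 x N cosine similarity matrix
        return bool(similarities.max() >= threshold)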

IOPulse

  • Methodology: Developed a cost optimization solution for PV2 disks on Azure (details proprietary until patent protection). The approach ensures that costs are directly tied to actual resource consumption.
  • Outcomes/Impact: Provides a mechanism to avoid overpaying for disk resources, optimizing cloud costs by ensuring payment only for what is consumed.

Performance Analyzer

  • Methodology: Created a data pipeline that processes system monitoring data (iostat, VDBench, SAR) to identify resource bottlenecks. The pipeline detects the point of maximum system performance and then analyzes the surrounding data to determine whether the issue lies in network, compute, or another resource area (see the sketch below).
  • Outcomes/Impact: Aids in diagnosing system performance issues by accurately pinpointing resource bottlenecks, thereby enabling targeted configuration improvements and performance tuning.
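
A rough sketch of the triage step, assuming the monitoring sources are already merged into one timestamp-indexed DataFrame; the column names and the 90% thresholds are placeholders.

    import pandas as pd

    def diagnose(merged: pd.DataFrame, window: str = "30s") -> str:
        # Locate the point of maximum system performance, then inspect the data around it.
        peak_time = merged["iops"].idxmax()
        around = merged.loc[peak_time - pd.Timedelta(window): peak_time + pd.Timedelta(window)]
        if around["cpu_busy_pct"].mean() > 90:
            return "compute-bound"
        if around["net_util_pct"].mean() > 90:
            return "network-bound"
        return "likely storage or configuration issue"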

Chiplet Members

  • Methodology: Conducted latency tests between every pair of CPU cores on chiplet-based architectures. Clustered the latency measurements into two groups: lower latency (cores within the same chiplet) and higher latency (cores in different chiplets). Built graphs from low-latency connections and computed cliques to automatically determine chiplet membership (see the sketch below).
  • Outcomes/Impact: Uncovered hidden chiplet structures (not disclosed by cloud providers), enabling potential optimizations in inter-core communication and offering a deeper understanding of the underlying architecture.
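
A minimal sketch of the clustering-and-cliques step with scikit-learn and networkx, assuming a dict of measured latencies keyed by core pairs.

    import numpy as np
    import networkx as nx
    from sklearn.cluster import KMeans

    def chiplet_groups(latency):
        # latency: {(core_i, core_j): measured latency between the two cores}
        pairs = list(latency.items())
        values = np.array([v for _, v in pairs]).reshape(-1, 1)
        # Two clusters: intra-chiplet (low latency) vs. inter-chiplet (high latency).
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(values)
        low_cluster = labels[np.argmin(values)]  # cluster containing the fastest pair
        graph = nx.Graph()
        graph.add_edges_from(pair for (pair, _), lab in zip(pairs, labels) if lab == low_cluster)
        # Maximal cliques over low-latency edges approximate chiplet membership.
        return [set(clique) for clique in nx.find_cliques(graph)]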

ExtractAI (Faster Reporting Delivery)

  • Methodology: Leveraged iterative large language model (LLM) API calls to automate the extraction and summarization of information from a vast Elasticsearch database containing system events, call-home emails, and counters.
  • Outcomes/Impact: Accelerated report generation and streamlined data insights, reducing the reporting burden and enabling quicker access to critical system metrics.

Ransomware Ingest Pipeline

  • Methodology: Built a cost-effective data ingestion pipeline using Apache Beam (Dataflow on GCP), Cloud Scheduler, and BigQuery, completing development in under two weeks (see the sketch below).
  • Outcomes/Impact: Delivered a scalable pipeline for big data processing that efficiently removes duplicates and streamlines data ingestion.
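
A rough sketch of the pipeline shape in Apache Beam's Python SDK; the bucket, key field, project, and table names are placeholders, and the target table is assumed to exist.

    import json
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    def run():
        options = PipelineOptions(runner="DataflowRunner", project="my-project", region="us-central1")
        with beam.Pipeline(options=options) as p:
            (
                p
                | "Read" >> beam.io.ReadFromText("gs://my-bucket/events/*.json")
                | "Parse" >> beam.Map(json.loads)
                | "KeyById" >> beam.Map(lambda record: (record["event_id"], record))
                | "GroupById" >> beam.GroupByKey()
                | "DropDuplicates" >> beam.Map(lambda kv: next(iter(kv[1])))  # one record per id
                | "Write" >> beam.io.WriteToBigQuery(
                    "my-project:security.ransomware_events",
                    write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                    create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,  # table assumed to exist
                )
            )

    if __name__ == "__main__":
        run()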

Spot TTL

  • Methodology: Applied survival analysis techniques to usage history metrics to evaluate system performance under varying conditions (see the sketch below).
  • Outcomes/Impact: Although not deployed, the analysis revealed that almost all tests can run on spot machines and that some zones are indeed better than others.
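
A minimal sketch of the survival estimate with the lifelines library, assuming a history DataFrame with runtime and preemption columns (the column names are illustrative).

    import pandas as pd
    from lifelines import KaplanMeierFitter

    def spot_survival(history: pd.DataFrame) -> pd.DataFrame:
        # One row per spot VM: how long it ran and whether it was preempted.
        kmf = KaplanMeierFitter()
        # Runs that finished without preemption are right-censored observations.
        kmf.fit(durations=history["runtime_minutes"], event_observed=history["preempted"])
        return kmf.survival_function_

    # Comparing the curves per zone shows which zones can safely host long-running tests:
    # per_zone = {zone: spot_survival(df) for zone, df in history.groupby("zone")}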

Ransomware Detection

  • Methodology: Employed statistical anomaly detection methods informed by extensive research into ransomware operational patterns.
  • Outcomes/Impact: Developed an end-user feature for detecting ransomware activity, enhancing system security through early threat identification.

Background Compactor Research

  • Methodology: Analyzed periodic customer behavior to identify inefficiencies, demonstrating how the existing algorithm contributed to high latency during critical periods.
  • Outcomes/Impact: Reduced product latency by optimizing algorithm performance, leading to improved system responsiveness.