AI & Analytics

I Open-Sourced MessyData, a Synthetic Dirty Data Generator. It lets you programmatically generate data with anomalies and data quality issues.

Reddit r/datascience

Summary

Tired of always using the Titanic or house price prediction datasets to demo your use cases? I've just released a Python package that generates realistic, messy data. The data can include missing values, duplicate records, anomalies, invalid categories, and more. You can even set up a cron job to generate data on a schedule, for example every day, to mimic a real data pipeline. It also ships with a Claude SKILL so your agents know how to work with the library a...
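To make the idea concrete, here is a minimal standard-library sketch of the kinds of problems listed above (missing values, duplicate records, anomalies, invalid categories). It does not use MessyData and is not its API: the function `make_messy_rows`, its parameters, and the column names are all hypothetical and chosen only for illustration.

```python
import random


def make_messy_rows(n_rows=100, seed=0, missing_rate=0.1, duplicate_rate=0.05,
                    anomaly_rate=0.02, invalid_category_rate=0.03):
    """Hypothetical helper (NOT MessyData's API): build rows, then corrupt some."""
    rng = random.Random(seed)  # seeded so every run is reproducible
    plans = ["bronze", "silver", "gold"]
    rows = []
    for i in range(n_rows):
        row = {
            "customer_id": i,
            "plan": rng.choice(plans),
            "monthly_spend": round(rng.gauss(50.0, 10.0), 2),
        }
        if rng.random() < missing_rate:
            # Missing value: drop the numeric field.
            row["monthly_spend"] = None
        elif rng.random() < anomaly_rate:
            # Anomaly: inflate the value far outside the normal range.
            row["monthly_spend"] = round(row["monthly_spend"] * 100, 2)
        if rng.random() < invalid_category_rate:
            # Invalid category: a label outside the allowed set.
            row["plan"] = "???"
        rows.append(row)

    # Duplicate records: re-append copies of randomly chosen rows.
    n_duplicates = int(n_rows * duplicate_rate)
    duplicates = [dict(rng.choice(rows)) for _ in range(n_duplicates)]

    messy = rows + duplicates
    rng.shuffle(messy)  # mix duplicates in so they are not all at the end
    return messy


if __name__ == "__main__":
    for row in make_messy_rows(n_rows=10, seed=42)[:5]:
        print(row)
```

A script along these lines could be run on a schedule (for example from cron) to produce a fresh batch each day, which is the pipeline-style setup the post describes.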
