intermediate

Data Cleaning and Preprocessing

#data cleaning #preprocessing #data analysis #quality control #data wrangling

This prompt guides users through the process of cleaning and preparing raw data for analysis.

You are a data cleaning and preprocessing specialist. Your task is to explain the key steps and techniques for cleaning and preparing raw data for analysis. In your response, cover the following aspects:

1. Identifying common data quality issues (missing values, outliers, duplicates, inconsistencies)
2. Techniques for handling missing data (imputation methods, deletion strategies)
3. Approaches to dealing with outliers (statistical methods, domain knowledge)
4. Methods for detecting and handling duplicate records
5. Data standardization and normalization techniques
6. Handling categorical data and text preprocessing
7. Data validation strategies
8. Tools and libraries commonly used for data cleaning

Provide practical examples and code snippets where applicable. Explain when to apply different techniques and the potential consequences of inappropriate data cleaning choices. Include real-world scenarios where proper data cleaning significantly impacted analysis outcomes. Conclude with a checklist that data analysts can follow when approaching a new dataset.
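As a rough illustration of the kind of snippet a response to this prompt might include, the sketch below touches four of the listed aspects: text standardization, duplicate removal, median imputation, and outlier flagging. It is not part of the prompt itself. The sample records, the `clean` function, and the `age_outlier` field are all hypothetical, and the 1.5 × IQR threshold is one common convention rather than a universal rule. It uses only the Python standard library; a real project would more likely use a library such as pandas.

```python
import statistics

# Hypothetical sample records; None marks a missing value.
rows = [
    {"id": 1, "city": " Paris ", "age": 34},
    {"id": 2, "city": "paris", "age": None},
    {"id": 2, "city": "paris", "age": None},   # exact duplicate of the row above
    {"id": 3, "city": "Lyon", "age": 29},
    {"id": 4, "city": "LYON", "age": 31},
    {"id": 5, "city": "Nice", "age": 240},     # implausible value
]


def clean(records):
    # 1. Standardize text: trim whitespace and normalize capitalization.
    normalized = [{**r, "city": r["city"].strip().title()} for r in records]

    # 2. Drop exact duplicates, keeping the first occurrence.
    seen, unique = set(), []
    for r in normalized:
        key = (r["id"], r["city"], r["age"])
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # 3. Impute missing ages with the median of the observed ages.
    observed = [r["age"] for r in unique if r["age"] is not None]
    median_age = statistics.median(observed)
    imputed = [
        {**r, "age": median_age if r["age"] is None else r["age"]}
        for r in unique
    ]

    # 4. Flag (rather than delete) values outside the 1.5 * IQR fences.
    q1, _, q3 = statistics.quantiles(
        [r["age"] for r in imputed], n=4, method="inclusive"
    )
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [{**r, "age_outlier": not (low <= r["age"] <= high)} for r in imputed]


cleaned = clean(rows)
for row in cleaned:
    print(row)
```

Flagging outliers instead of dropping them leaves the final decision to someone with domain knowledge, in line with the prompt's request to explain the consequences of inappropriate cleaning choices.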