Data Cleaning & Preprocessing Script Suggestor & Explanation
Act as a data engineer specializing in Python and Excel data cleaning. Provide step-by-step data cleaning scripts and explanations for the following messy dataset: [snippet: e-commerce customer data with missing values in Order_Value, duplicate Customer_IDs, inconsistent date formats (MM/DD/YYYY vs. DD-MM-YYYY), and text strings in numerical columns]. For both Python (Pandas) and Excel, provide: 1) Script/formula steps to fix each issue, 2) Explanation of why each step is necessary, 3) Validation checks to ensure data quality post-cleaning.
Implementation Guide
This prompt helps data analysts clean messy datasets efficiently and can save 6+ hours of manual preprocessing. Given a description of the dataset's issues, ChatGPT or Claude generates step-by-step cleaning scripts for both Python (Pandas) and Excel, explains the purpose of each step, and adds validation checks to confirm data quality. The output takes the guesswork out of data cleaning for common issues such as missing values, duplicates, inconsistent formats, and text in numerical columns. It suits e-commerce, healthcare, and SaaS data, from small Excel datasets to large datasets processed in Python. The validation checks help confirm that the cleaned data is ready for analysis and modeling.
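To give a sense of the kind of Pandas output the prompt is aiming for, here is a minimal sketch rather than a definitive script. The column names Order_Value and Customer_ID come from the prompt; the Order_Date column name, the sample rows, the choice to keep the first row per Customer_ID, and the median fill for missing order values are all assumptions made for illustration and should be adapted to the real dataset.

```python
import pandas as pd


def parse_mixed_dates(s: pd.Series) -> pd.Series:
    """Parse MM/DD/YYYY first, then fall back to DD-MM-YYYY for the rest."""
    us = pd.to_datetime(s, format="%m/%d/%Y", errors="coerce")
    eu = pd.to_datetime(s, format="%d-%m-%Y", errors="coerce")
    return us.fillna(eu)


def clean_orders(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()

    # Text in a numeric column: strip "$", commas, and spaces, then coerce
    # anything still non-numeric (e.g. "N/A") to NaN.
    stripped = out["Order_Value"].astype(str).str.replace(r"[$,\s]", "", regex=True)
    out["Order_Value"] = pd.to_numeric(stripped, errors="coerce")

    # Inconsistent date formats: normalize both formats to datetime.
    # "Order_Date" is a hypothetical column name.
    out["Order_Date"] = parse_mixed_dates(out["Order_Date"])

    # Duplicate Customer_IDs: keep the first row per ID (assumption).
    out = out.drop_duplicates(subset="Customer_ID", keep="first").reset_index(drop=True)

    # Missing values: fill with the median (assumption; dropping the rows
    # or flagging them may suit some analyses better).
    out["Order_Value"] = out["Order_Value"].fillna(out["Order_Value"].median())
    return out


def validate(df: pd.DataFrame) -> dict:
    """Post-cleaning checks; every value should be True."""
    return {
        "order_value_is_numeric": bool(pd.api.types.is_numeric_dtype(df["Order_Value"])),
        "no_missing_order_value": bool(df["Order_Value"].notna().all()),
        "unique_customer_ids": bool(df["Customer_ID"].is_unique),
        "all_dates_parsed": bool(df["Order_Date"].notna().all()),
    }


# Made-up sample rows that reproduce the issues named in the prompt.
raw = pd.DataFrame(
    {
        "Customer_ID": [101, 102, 102, 103, 104],
        "Order_Value": ["$1,200.50", "89.99", "89.99", None, "N/A"],
        "Order_Date": ["03/25/2024", "25-03-2024", "25-03-2024", "12/01/2023", "07-11-2023"],
    }
)

cleaned = clean_orders(raw)
checks = validate(cleaned)
print(cleaned)
print(checks)
```

Deduplicating before imputing keeps repeated rows from skewing the median. If a date string could be valid in both formats, the order of the two parsing attempts in parse_mixed_dates decides the result, so ambiguous inputs deserve a manual check.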