Data Analytics and AI

Data cleaning = making raw data ready for analysis.

Let’s take an example dataset of daily sales:

Date Product Sales
01-08-25 Prodct A 100
02-08-25 Product A -50
03-08-25 Product B 200
04-08-25 Product A 100000
05-08-25 Product C  

👉 Step 1: Check for spelling mistakes

  • “Prodct A” vs “Product A” → same product but typed wrong.

    • Excel fix: use Find & Replace or create a clean “Product List” and use Data Validation.


👉 Step 2: Handle missing values

  • 05-08-25 has no sales value.

    • Option 1: Replace with 0 if it means “no sales.”

    • Option 2: Fill with average if it’s truly missing data.


👉 Step 3: Handle wrong/illogical values

  • Sales = -50 (negative sales) → maybe a refund? Need to confirm.

  • Sales = 100000 → looks like a typo (maybe it was 1000).

    • Excel fix: use Filters or Conditional Formatting to highlight outliers.


👉 Step 4: Consistent formatting

  • Dates should all be in one format (dd-mm-yy).

  • Product names consistent (no extra spaces).


✅ After cleaning, the table looks like this:

Date Product Sales
01-08-25 Product A 100
02-08-25 Product A 0
03-08-25 Product B 200
04-08-25 Product A 1000
05-08-25 Product C 150

Now it’s ready for analysis.


👉 Question for you:
If you see a row with negative sales, what would you do first — delete it, or try to understand why it’s negative?