Topic: Data Cleaning – It’s Not Our Job to Fix Your Mess
Let’s be direct:
If your data is broken, we’re not cleaning it. We’re sending it back.
I know that’s hard to hear, but you need to hear it. We want to use your data to deliver a service or to measure our program’s success, and we can’t do that when the data is wrong. We could change it ourselves, but then we’d be responsible for the change, and that’s a risk we can’t afford. In any system that values legal accountability and data integrity, cleaning should be designed in, not tacked on.

Why We Don’t “Fix” Bad Data
- It’s a legal risk – Changing submissions can invalidate audit trails and open agencies to litigation
- It breaks trust – Edits behind the scenes lead to misinformation and public doubt
- It delays progress – Manual corrections slow everything down
- It’s unfair – Only the data originator knows what the record should say, so the originator has to be the one to fix it
We should not fix submitted data because doing so transfers legal ownership and accountability from the original source to the person or system that made the change. In regulated environments like government, healthcare, or finance, altering source data can:
- Compromise the integrity of audit trails
- Violate records retention policies
- Undermine the data’s admissibility in court or in compliance reviews
Every data point submitted is a record of action, intent, or declaration. Once it is modified, even with good intentions, the original context is lost, and the altered version may no longer reflect the truth. The correct approach is to log errors, flag issues, and return the data to its originator for correction, preserving both legal defensibility and the trustworthiness of the system.
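Here is a minimal sketch of flagging a problem while leaving the submitted record untouched. It assumes records arrive as dictionaries. The function names, field names, and status value are hypothetical, not taken from any real system. The idea is to hash the record as received, write the problem to a separate log entry, and be able to show afterwards that the source did not change.

```python
import hashlib
import json
from datetime import datetime, timezone


def fingerprint(record: dict) -> str:
    """Return a SHA-256 hash of the record exactly as submitted."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def flag_issue(record: dict, field: str, problem: str) -> dict:
    """Build a log entry describing a problem. The record is never edited."""
    return {
        "source_hash": fingerprint(record),  # ties the entry to the original
        "field": field,
        "problem": problem,
        "status": "RETURNED_TO_SUBMITTER",   # hypothetical status tag
        "logged_at": datetime.now(timezone.utc).isoformat(),
    }


# Hypothetical submission containing an impossible date.
submission = {"case_id": "A-1001", "start_date": "2024-02-30"}

before = fingerprint(submission)
entry = flag_issue(submission, "start_date", "not a valid calendar date")
after = fingerprint(submission)

# Matching hashes show the source record was not altered by flagging it.
print(before == after)
print(entry["status"])
```

The log entry travels back to the originator with the record. The hash gives both sides a way to confirm they are talking about the same unaltered submission.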
What Data Cleaning Really Means
Data cleaning isn’t about fixing mistakes. It’s about building smart systems that:
- Reject bad data on entry
- Flag errors without modifying source records
- Push responsibility back to the original data submitter
- Preserve traceability through detailed logs and status tags
Our message is simple: own your data quality, or own the consequences.
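Rejecting bad data on entry can be sketched like this. The schema, field names, and status values are assumptions made up for illustration. The check returns a decision and a list of reasons, and it never edits the record it inspects.

```python
from dataclasses import dataclass, field

REQUIRED_FIELDS = ("case_id", "start_date", "amount")  # hypothetical schema


@dataclass
class Decision:
    status: str                                  # "ACCEPTED" or "REJECTED"
    errors: list = field(default_factory=list)   # reasons sent to the submitter


def check_on_entry(record: dict) -> Decision:
    """Inspect a record and decide whether to accept it. Never modifies it."""
    errors = []
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        if value is None or str(value).strip() == "":
            errors.append(f"{name}: required field is missing or blank")

    amount = record.get("amount")
    if amount not in (None, ""):
        try:
            if float(amount) < 0:
                errors.append("amount: must not be negative")
        except (TypeError, ValueError):
            errors.append("amount: must be a number")

    return Decision("REJECTED" if errors else "ACCEPTED", errors)


# Hypothetical records: one clean, one with two problems.
good = check_on_entry({"case_id": "A-1001", "start_date": "2024-01-15", "amount": "100"})
bad = check_on_entry({"case_id": "A-1002", "start_date": "", "amount": "-5"})

print(good.status)
print(bad.status)
for reason in bad.errors:
    print(" -", reason)
```

A rejected record goes back with its full list of reasons, so the submitter can fix everything in one pass.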
Designing for Better Data: Tools That Prevent Garbage
1. Intuitive System Design
Make it easy for your customers or the people you serve to do the right thing. That means:
- Smart forms with auto-fill, real-time validation, and tooltips
- Drop-down menus instead of free text
- Required fields that don’t let you move forward with incomplete info
- Clear error messages that tell users exactly what went wrong
If your form or system is hard to use, users will guess. And when people guess, your data suffers.
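As a rough sketch of what “clear error messages” can look like behind a form, the validators below return either nothing or a message that names the bad value and says how to fix it. The drop-down options, field names, and message wording are hypothetical.

```python
from datetime import date
from typing import Optional

ALLOWED_STATES = {"CA", "NY", "TX"}  # hypothetical drop-down options


def validate_state(value: str) -> Optional[str]:
    """Return None if the value is an allowed option, else a specific message."""
    if value in ALLOWED_STATES:
        return None
    options = ", ".join(sorted(ALLOWED_STATES))
    return f"State '{value}' is not in the list. Choose one of: {options}."


def validate_birth_date(value: str) -> Optional[str]:
    """Return None for a real, non-future ISO date, else a specific message."""
    try:
        parsed = date.fromisoformat(value)
    except ValueError:
        return (
            f"Birth date '{value}' is not a real date. "
            "Use YYYY-MM-DD, for example 1990-04-15."
        )
    if parsed > date.today():
        return f"Birth date '{value}' is in the future. Check the year."
    return None


# Hypothetical form input: one good value and one bad value per field.
for message in (
    validate_state("CA"),
    validate_state("Calif"),
    validate_birth_date("1990-04-15"),
    validate_birth_date("1990-02-30"),
):
    print(message)
```

Each message tells the user which value failed and what a correct one looks like, so nobody has to guess.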
2. Newsletters and Alerts on Data Quality
Make data quality a conversation, not just a correction.
Start a recurring newsletter or bulletin with:
- Top 3 rejection reasons this month
- How-to tips for avoiding common errors
- Quick links to updated templates and FAQs
- A spotlight on teams submitting high-quality files
Pro tip: Transparency builds culture. When people know where they’re going wrong, they improve.
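If rejections are already logged with a reason, the “top 3 rejection reasons” list for a bulletin takes only a few lines to produce. This sketch assumes a log of dictionaries with a `reason` key. The log structure and the sample entries are invented for illustration.

```python
from collections import Counter

# Hypothetical rejection log for one month.
rejection_log = [
    {"reason": reason}
    for reason in (
        ["missing required field"] * 4
        + ["invalid date format"] * 3
        + ["unknown program code"] * 2
        + ["duplicate case_id"] * 1
    )
]


def top_reasons(log: list, n: int = 3) -> list:
    """Return the n most frequent rejection reasons as (reason, count) pairs."""
    return Counter(entry["reason"] for entry in log).most_common(n)


for reason, count in top_reasons(rejection_log):
    print(f"{count} x {reason}")
```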
3. Training for Bulk File Submitters
Bulk uploads are often the biggest offenders, but they are also the easiest to fix once submitters are trained properly.
Offer:
- Detailed file layout guides with clear field definitions
- Examples of correct and incorrect formats
- Short recorded demos or live walkthroughs
- “Test your file” tools that let users check their work before uploading
Don’t just hand them a template—teach them how to use it.
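A “test your file” tool does not need to be elaborate. The sketch below assumes a CSV upload with a fixed column layout. The column names are hypothetical, and a real checker would add the type and format rules from your file layout guide. It reports problems by line and column and changes nothing in the file.

```python
import csv
import io

EXPECTED_COLUMNS = ["case_id", "start_date", "amount"]  # hypothetical layout


def check_file(text: str) -> list:
    """Return a list of problems found in CSV text. An empty list means it passed."""
    problems = []
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    if header != EXPECTED_COLUMNS:
        problems.append(
            f"header: expected columns {EXPECTED_COLUMNS}, found {header}"
        )
        return problems  # row checks are meaningless with the wrong header

    for row in reader:
        line_no = reader.line_num  # line in the file where this row ended
        if None in row:
            problems.append(f"line {line_no}: too many values")
        for column in EXPECTED_COLUMNS:
            # Short rows come back with None for the missing columns.
            if not (row[column] or "").strip():
                problems.append(f"line {line_no}, {column}: value is blank")
    return problems


# Hypothetical file with one blank start_date on line 3.
sample = "case_id,start_date,amount\nA-1,2024-01-15,100\nA-2,,250\n"
for problem in check_file(sample):
    print(problem)
```

Submitters run this before uploading. If the list comes back empty, the file is ready to send. If not, they have the exact lines to fix.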
What a Healthy Data System Looks Like
- Upfront validation to catch errors before they pollute the system
- Error logs that return bad records to the source
- No silent fixing—all issues are transparent and traceable
- Ownership culture—if you broke it, you fix it
- Continuous education—newsletters, alerts, and training embedded into operations
Final Word: Keep the Data Honest, or Take It Back
You submitted it? You own it.
If your file is wrong, is missing fields, or isn’t formatted properly, we’re sending it back with an error log and a smile. Not because we don’t care, but because we care too much to get it wrong.
We’re not fixing your data. We’re fixing the system that lets bad data in.
Design better. Communicate often. Train your team.
Because when data’s done right, nobody has to clean it.