Data Cleansing
How many times have you made a decision based on the wrong information? Having the right information is key to being successful. That’s why accurate data is critical to everyday operations.
Data cleansing, also known as data scrubbing, is the process of ensuring that a set of data is correct and accurate. During data cleansing, records are checked for accuracy and consistency, and either corrected or deleted as necessary. Data cleansing can occur within a single set of records, or between multiple sets of data that need to be merged or that will work together.
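To make the "corrected or deleted" idea concrete, here is a minimal sketch in Python for a single set of records. The field names (name, email), the sample rows, and the rule that an email must contain "@" are assumptions made up for illustration, not rules from any particular system.

```python
def cleanse(records):
    """Return cleaned copies of the records: corrected where possible, dropped otherwise."""
    seen_emails = set()
    cleaned = []
    for rec in records:
        # Correct: collapse stray whitespace and normalize capitalization.
        name = " ".join(rec.get("name", "").split()).title()
        email = rec.get("email", "").strip().lower()
        # Delete: without a plausible email there is nothing to correct against.
        if "@" not in email:
            continue
        # Delete: duplicate of a record we already kept.
        if email in seen_emails:
            continue
        seen_emails.add(email)
        cleaned.append({"name": name, "email": email})
    return cleaned


# Hypothetical sample data: one messy row, one duplicate, one bad email.
raw = [
    {"name": "  ada   lovelace ", "email": "Ada@Example.com "},
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "Grace Hopper", "email": "not-an-email"},
]
print(cleanse(raw))
```

In a real cleanse you would probably log the dropped records rather than silently skipping them, so someone can review what was deleted.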
With the power of accurate information … you could:
- make the best decision and course of action (correct situational understanding)
- gain perspective of an obstacle you didn’t have before (learn new techniques)
- accomplish more (enhance productivity)
- prevent fires (figuratively and literally)
- change the course of history (correctly know your strategy vs. enemy strategy)
- save money and save time (know what works best in each situation)
- prevent disaster (accurate weather prediction)
- increase value (correct stock & bond gain/loss)
- prevent a war (accurate government intelligence)
- develop new technology (understand true needs of a culture)
- feed, heal and comfort disaster areas (accurate news coverage)
- save a life (correct doctor diagnosis)
For this reason, it’s critical to find out all the facts and to be diligent about the impact that having the right information can have… not only on you, but on the world.
Stop, hammertime….

For example, data mining on inaccurate, “DIRTY DATA” can be a waste of time for both the data engineer and the data requestor. If the requestor bases their decisions on faulty data, it could hinder their outcome. Data integrity plays a bigger role than most people anticipate. Most of the time, data gets dirty because of laziness, poor input planning and inconsistent input. Here are some ideas that will help your team during a data cleanse.
STEP 1
Ask the right questions.
- What is the data source (manual or automated)?
- Is the input consistent (drop-down lists or free form)?
- Are you capturing all the data you might need or are you capturing data you won’t need? (lean vs. bulk)
- Who will need the data and what will they want to see?
- How long does it take to pull the data vs. how frequently do they want the data?
- Do you understand what they are looking for vs. what they asked for?
- What is the best method to interpret the data to help the requestor understand the results?
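The input consistency question is one you can often answer from the data itself. As a rough sketch, assuming a field that was supposed to come from a drop-down list but was typed in free form, you can count the values that fall outside the allowed set. The allowed values and sample entries below are made up for illustration.

```python
from collections import Counter

# Hypothetical values the drop-down list would have offered.
ALLOWED_STATES = {"CA", "NY", "TX"}


def profile_column(values, allowed):
    """Count how often each value outside the allowed set appears."""
    return Counter(v for v in values if v not in allowed)


# Hypothetical free-form entries for the same field.
entries = ["CA", "ca", "Calif.", "NY", "TX", "ca"]
report = profile_column(entries, ALLOWED_STATES)
print(report.most_common())
```

A long list of unexpected values usually means the input was free form, and it tells you which corrections will pay off first.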
STEP 2
Plan your steps
- Decide the best course before taking action (don’t waste time developing on the fly… work with the end in mind and work backwards).
- Ask the stakeholders before taking a step (ensure that the end result you plan to deliver will contain all they need and be easy to understand).
- Document new delivery methods before execution (it gives you a guide, and if you need to recreate the work, you can always refer to your map).
STEP 3
Step your plans – Execute
- Follow your plan and adjust it as needed as you go, while keeping stakeholders informed.
- ALWAYS double check before you deliver.
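One way to make the double check repeatable is to script it. The sketch below is only an illustration: the function name, the two checks, and the sample rows are assumptions, and your own list of checks will depend on what you are delivering.

```python
def check_before_delivery(source_rows, report_rows, required_fields):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    # A report built from the source should not contain more rows than the source.
    if len(report_rows) > len(source_rows):
        problems.append("report has more rows than the source")
    # Every delivered row should have a value in each required field.
    for i, row in enumerate(report_rows):
        missing = [f for f in required_fields if row.get(f) in (None, "")]
        if missing:
            problems.append(f"row {i} is missing: {', '.join(missing)}")
    return problems


# Hypothetical data: three source rows summarized into two report rows.
source = [{"id": 1}, {"id": 2}, {"id": 3}]
report_rows = [
    {"region": "West", "total": 120},
    {"region": "", "total": 80},
]
issues = check_before_delivery(source, report_rows, ["region", "total"])
print(issues)
```

If the list comes back with anything in it, fix the data or explain the gap to the requestor before you send the report.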
With a little time spent on ensuring accuracy, you will increase the trust of your requestor and ensure the best decisions are made.
This entry was posted on April 26, 2010 at 10:37 am, and is filed under Reporting.