When it comes to data, it is crucial that its quality is preserved throughout any process. A large risk while handling data is that it could be lost, unintentionally altered, or incorrectly calculated, which could lead to disastrous results further down the pipeline.
And so, validation tests should play a core part of your process in SolveXia.
There is no one-size-fits-all set of validation tests that can be applied to all data, so it would not be prudent to blindly replicate every validation test described in this article without considering the data used in your own process. Instead, aim to take away the core principles these tests highlight, so that you can maintain a high level of data quality in your processes.
From here on, this article describes common validation tests that could be applied at various stages in a typical process. As above, the aim is to leave you with a strong set of principles that these tests adhere to.
It is very common at the beginning of a process for data to be uploaded by the user, which is then moved into other files where the relevant calculations are made.
However, to ensure that these calculations are made on correct data, it is wise to first test that the data itself is actually correct ('correct' being an intentionally vague term, which we will unpack in the various tests covered later in this article!).
Checking column structure
One of the easiest, yet most effective, checks is simply to test whether the data arrives in the column structure that the Designer of the SolveXia process had in mind.
To better illustrate this, here's an example of a calculation file in a process where columns A to C are regularly replaced with the new month's sales data.
As you can see in the formula box at the top of the screenshot, the weekly report table (on the right) depends on the data in columns A to C. And so, the integrity of the weekly report depends heavily on the quality of the data in columns A to C.
The screenshot below shows how the step in SolveXia has been configured to import this data.
However, suppose that IT has changed the format of this sales data that you usually upload straight into SolveXia, such as:
Note that there are now "Branch" and "Age" columns, and the "Sales" and "Date" columns have been re-arranged.
So, without any validations in place, the wrong data will be pasted into columns A to C of the template file, and the weekly report will display incorrect results:
To prevent this from happening, we need a simple validation test on the column structure of our input file:
In the bottom right, we can have a simple table that runs this check.
However, first we will need to add an additional step to our process that copies just the column names from the original source data file into this table:
So, with this extra control cell in place, we can pair the check with a "Decision Control Step" in the process to immediately alert the user, through an email for instance.
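SolveXia runs this check inside the process itself, but the underlying logic can be sketched in plain Python. The function name and the expected column headers below are illustrative, not taken from the original process:

```python
# Headers the calculation file was designed around (illustrative names).
EXPECTED_COLUMNS = ["Date", "Product", "Sales"]

def check_column_structure(actual_columns):
    """Compare the uploaded file's headers to the expected structure,
    in order. Returns (ok, message) so a decision step can branch on
    the result and alert the user when the structure has changed."""
    actual = list(actual_columns)
    if actual == EXPECTED_COLUMNS:
        return True, "Column structure OK"
    return False, f"Expected columns {EXPECTED_COLUMNS}, got {actual}"

# The re-arranged file from IT fails the check:
ok, message = check_column_structure(["Branch", "Date", "Age", "Sales"])
```

An exact, ordered comparison is deliberately strict: even a harmless-looking re-ordering (as in the "Sales"/"Date" example above) would silently corrupt the weekly report, so it is safer to stop and alert than to guess.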
Data integrity checks
Contextual sense checks
To further ensure the quality of your data and process, implementing contextual sense checks based on known business rules can also help ensure that your process behaves as intended.
Contextual sense checks, as the name suggests, depend on the process at hand. The aim of these checks is to ensure that your data makes sense and follows known business logic throughout your process, so it is wise to implement them regularly throughout the process.
Their effectiveness also depends on how well the process designer understands the business logic, as implementing sensible and effective sense checks requires subject-matter knowledge.
For example, suppose we have a process that handles customer email data. A simple and effective sense check is to flag any email address in the dataset that doesn't contain a "@" character. This makes the user aware of issues in the data, whether from faulty data input or from the process reading the wrong column (rather than the intended email field).
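As a sketch of that check in plain Python (the function name is illustrative, and the rule is deliberately loose — real email validation is far more involved than this):

```python
def flag_invalid_emails(emails):
    """Return the values that fail a minimal email sanity check:
    exactly one '@' with non-empty text on both sides. The goal is
    to surface obviously bad data, not to fully validate addresses."""
    def looks_valid(value):
        parts = str(value).split("@")
        return len(parts) == 2 and all(parts)
    return [e for e in emails if not looks_valid(e)]

# A plain name and a truncated address are both flagged:
suspects = flag_invalid_emails(["jane@example.com", "john smith", "x@"])
```

If this list comes back non-empty, the process could route to an alert step rather than continuing, exactly as with the column-structure control cell.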
Alternatively, suppose we have a process that handles daily stock price data. A sense check in this case could include a step that flags whether any particular stock has shown a daily return greater than +100% or less than -100%. Although in some cases this could be a false positive, it is still worthwhile to make the user aware of it in case it is a true positive.
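The same return-threshold check can be sketched as follows. The ±100% threshold and the function name are illustrative; the right cut-off depends on the instruments your process actually handles:

```python
def flag_extreme_returns(prices, threshold=1.0):
    """Flag day-over-day returns whose magnitude exceeds the threshold
    (1.0 == 100%). `prices` is an ordered list of daily closing prices
    for one stock; returns a list of (day_index, return) pairs."""
    flags = []
    for i in range(1, len(prices)):
        daily_return = prices[i] / prices[i - 1] - 1
        if abs(daily_return) > threshold:
            flags.append((i, daily_return))
    return flags

# A jump from 10.5 to 25 (about +138%) is flagged for review:
flags = flag_extreme_returns([10.0, 10.5, 25.0, 24.0])
```

As the article notes, a flag here is a prompt for human review rather than an automatic rejection, since genuine extreme moves do occasionally occur.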