Automation is inevitable if government agencies are to meet their growing audit, FOI, eDiscovery, security and other regulatory obligations. Recently, Castlepoint engaged with a small specialist Federal agency. The agency had been running a manual sentencing process for two years, which involved reviewing content in a legacy shared drive and classifying it against AFDA Express and the agency's Records Authority (RA). It wanted to explore automation as an alternative.
We implemented Castlepoint and ran it across the same drive, registering, classifying and sentencing the content using Artificial Intelligence (AI).
In parallel, we selected a random sample of files and had a qualified records manager sentence them manually, blind to the results of both the original manual activity and the Castlepoint audit.
Castlepoint uses Natural Language Processing (Machine Learning) to recognise key terms and concepts in documents and data held within systems.
A combination of supervised and unsupervised learning models is employed to match, categorise and sentence the terms and concepts.
Castlepoint then uses the model outputs to select the Records Authority, taxonomy and ontology that best fit the data.
The previous manual process had involved:
Approximately 1 million items were sentenced in this way. The projected time to complete sentencing of the 4.6 million items in the drive was 223 weeks, equivalent to about A$300,000 in APS6 FTE salary.
Our team installed Castlepoint, and used it to run an automated classification and sentencing process. The high-level process looks like this:
After installation, the audit took 30 days to run across all 4.6 million items (average 40,000 items a day, or 280,000 per week – one every two seconds).
The cost of a Castlepoint enterprise license for this type of organisation is under A$20,000 per annum, so the per-item cost fell from about 7 cents to about 0.4 cents (A$0.004). This would continue to reduce as Castlepoint ran across more systems, as the license is for the whole enterprise. Overall, we reduced the cost of processing by about 95% per item.
The other huge benefit of automation is extensibility. We ran some other key use cases with Castlepoint, with the following projected results:
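The per-item comparison can be reproduced from the rounded figures quoted above. This is a minimal sketch with illustrative variable names. It assumes the annual license cost is spread over the same 4.6 million items, and it ignores installation and staff time, which the article does not cost out.

```python
# Figures quoted in the article (rounded); variable names are illustrative.
MANUAL_COST_AUD = 300_000   # projected salary cost to sentence the whole drive manually
LICENSE_COST_AUD = 20_000   # upper bound on the annual enterprise license
ITEMS = 4_600_000           # items in the drive


def cents_per_item(total_aud: float, items: int) -> float:
    """Convert a total cost in dollars to a per-item cost in cents."""
    return total_aud * 100 / items


manual_cents = cents_per_item(MANUAL_COST_AUD, ITEMS)
automated_cents = cents_per_item(LICENSE_COST_AUD, ITEMS)
reduction_pct = (1 - automated_cents / manual_cents) * 100

print(f"Manual:    {manual_cents:.2f} cents per item")
print(f"Automated: {automated_cents:.2f} cents per item")
print(f"Reduction: {reduction_pct:.1f}%")
```

With these inputs the manual cost comes to about 6.5 cents per item and the automated cost to about 0.43 cents, a reduction of a little over 93%. Rounding the manual figure up to 7 cents and the automated figure down to 0.4 cents brings the result closer to the 95% quoted.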
Our projected Return on Investment for Castlepoint was under one month to 100% cost recovery.
One thing to note is that manual classification should still be the gold standard. No AI can make inferences and assessments about the content and context of a document with the sophistication of a human brain.
But the benefits we can achieve from having human evaluators are quickly undermined by the sheer scale of the problem. To categorise all of these records in the timeframe, the manual sentencer had to sentence on average 20,970 items a week, which required them to sentence nine items per minute (one every seven seconds). There is no time, in this model, to read and understand the entire document.
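The throughput figures can be checked with a few lines of arithmetic. In this sketch the item counts come from the article, but the 37.5-hour working week is an assumption made here to convert the weekly manual total into a per-minute rate. The automated rate is treated as running around the clock.

```python
MANUAL_ITEMS_PER_WEEK = 20_970     # from the article
AUTOMATED_ITEMS_PER_DAY = 40_000   # from the article
WORK_HOURS_PER_WEEK = 37.5         # assumption: a standard full-time week

# Manual sentencing only happens during working hours.
manual_per_minute = MANUAL_ITEMS_PER_WEEK / (WORK_HOURS_PER_WEEK * 60)
manual_seconds_per_item = 60 / manual_per_minute

# The automated audit is assumed to run 24 hours a day, 7 days a week.
automated_seconds_per_item = 24 * 60 * 60 / AUTOMATED_ITEMS_PER_DAY
automated_per_week = AUTOMATED_ITEMS_PER_DAY * 7
speedup = automated_per_week / MANUAL_ITEMS_PER_WEEK

print(f"Manual:    {manual_per_minute:.1f} items/minute, "
      f"one every {manual_seconds_per_item:.1f} s of working time")
print(f"Automated: one every {automated_seconds_per_item:.2f} s, "
      f"{automated_per_week:,} items/week")
print(f"Weekly throughput ratio: {speedup:.1f}x")
```

Under that assumption the manual rate works out to a little over nine items per minute, and the automated weekly throughput is roughly thirteen times the manual one.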
As a result, our validation exercise identified some key issues.
We found that 75% of the manually classified records in the sample were potentially under-classified, and should be retained 40% longer than currently planned. This under-classification was caused by assessing the records based on their title and a (necessarily) quick scan of their content, which did not allow the sentencer to identify small (but key) portions of text that elevated the item from a 7-year class to a 10-year class, for example.
We also found that a lot of content was assigned Normal Administrative Practice (NAP) Classification for ad hoc deletion, as part of a necessary strategy to expedite the sentencing activity. Items were marked as NAP based on risk-based decisions, such as their format. Castlepoint’s assessment of the records marked as NAP indicated that most actually needed to be retained.
Castlepoint had a 100% success rate in retention application, compared to the in-depth records manager sentencing action. Castlepoint also used the retention on individual items to calculate the retention of the whole aggregation, reducing the number of disposition actions required.
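The article does not describe how Castlepoint derives the retention of an aggregation from its items. One common records-management convention is that an aggregation is kept for as long as its longest-retained item, and the sketch below assumes that rule. The register rows, names and retention periods are all hypothetical.

```python
from collections import defaultdict

# Hypothetical register rows: (aggregation, item name, retention in years).
items = [
    ("Project A", "minutes.docx", 7),
    ("Project A", "contract.pdf", 10),
    ("Project A", "agenda.docx", 7),
    ("Project B", "roster.xlsx", 7),
]


def aggregation_retention(rows):
    """Return the longest item retention found in each aggregation.

    Assumption: the aggregation inherits the maximum retention of its items.
    """
    longest = defaultdict(int)
    for aggregation, _name, years in rows:
        longest[aggregation] = max(longest[aggregation], years)
    return dict(longest)


result = aggregation_retention(items)
print(result)
```

Sentencing at the aggregation level in this way means one disposition action per aggregation instead of one per item, which is the reduction in actions the paragraph above refers to.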
The sentencer could not open some types of files, meaning they could not be sentenced at all. Attachments in emails couldn’t be opened; hidden files, system files, files with overlong names, and zip files were excluded. In this share, the most common file types included .properties, .bat, .html and .gz, which the sentencer couldn’t read. Castlepoint was able to read all of these.
We found that the requirement to rename files introduced some classification mistakes based on typographical errors. Transposing two digits changed the applied Class, and as such the sentence (e.g. 20314/20334 vs. 20344). Also, opening the file to appraise it can change the metadata, which affects the sentence calculation, so care must be taken here. Castlepoint avoided these issues by maintaining a standalone register of all items, and not modifying the source.
Disposition decisions need to be defensible. When we give a business owner a file name, out of context with its other related records, they are not able to make informed decisions about whether the sentence is appropriate without themselves also reviewing the document. Castlepoint provided the key phrases that were used to make the decision, so the owner could simply review these to validate the sentence.
Class isn’t the only consideration when it comes to disposition. We also need to know if the content could be subject to a retention hold or disposition freeze. We need to know if it relates to any key work that is ongoing, as it may still have real value to the organisation. On the flip side, we need to know if it’s a risky item. If it contains PII, details about Spent Convictions, sensitive commercial or other confidential information, or classified information, it may need to be disposed of more expeditiously than less sensitive items (and handled differently).
Castlepoint identified over 500 items with sensitive content, and over 2,500 items subject to a freeze or hold. It also flagged actionable events, including deletions, classification downgrades, unauthorised modifications, or any other action we wanted visibility of. And we also created a taxonomy of ‘high value’ terms, so that the agency can easily see and protect information that is of interest to the executive, key projects, or current regulatory activities.
Artificial Intelligence has the advantage and relative luxury of being able to read every single word in a document, extremely quickly, and can scan and re-scan 24 hours a day without a break. AI doesn’t suffer from decision fatigue, or compassion fatigue, eye strain, or even a sore finger from clicking and scrolling. AI is a machine, and we can use that to our advantage to do the heavy lifting for us. Making the machine read all of the words, and apply all of the rules, frees up our subject matter experts in records management to add real value, and more easily make decisions.
So automation can help us make sure that:
Rachael Graves is CEO of Castlepoint Systems. w: https://www.castlepoint.systems