The following blog was originally published on G2 Crowd.
Like many technical terms, optical character recognition – or OCR – actually describes a fairly straightforward concept.
OCR technology allows a computer to “read” text the way a human brain does. When you convert a scanned PDF to an editable document, you're using OCR.
You might be wondering, “Can't computers already read text?” After all, computers are literally made to process huge volumes of information coded in letters, numbers, and symbols every day. Likewise, much of the neuroscience field conceives of the brain as a network of processing power, which suggests overlap between the way computers and human brains operate.
The truth is that this is probably a projection: human brains have to work hard to compute as efficiently as our machines do, and computers have to work just as hard to interpret images (not to mention sounds, emotions, and body language).
If you've ever tried and failed to Ctrl-F a presentation slide or image file, you know exactly what it feels like to use software equipped with poor OCR technology.
Without OCR, what's called “scanned text” (as opposed to, say, a Word document) looks about as meaningful to your computer as does a dinner menu to an infant. Without the ability to recognize characters like letters and numbers in a visual way – as our eyes and brains do – a computer sees scanned text as an image, no different from a photograph of a flock of birds silhouetted against a pale sky.
At the dawn of the 1950s, David H. Shepard, a cryptanalyst at what eventually became the National Security Agency (NSA), anticipated this problem. He went on to invent one of the first OCR machines: “Gismo” could read documents produced on a typewriter, and it joined other early OCR machines in influencing the development of standardized fonts, credit card readers, and even technology to help blind people read.
Now, of course, OCR is crucial in any software designed to archive, search, and monitor documents from old newspapers to contracts.
How does OCR work?
If you thought of OCR as teaching a computer to recognize the meaning of certain shapes, similar to the way we teach children what the letter A looks and sounds like, you wouldn't be wrong.
What makes OCR tricky is that the human brain is much better equipped than a computer to cope with the inconsistencies of written language (e.g. different fonts and handwriting styles).
Good OCR technology thus boils down to optimizing either a program's pattern recognition or its feature detection. With the former approach, a computer learns to recognize a set of different iterations of a given pattern – such as the shape of the letter A – starting with a few common standardized fonts and (hopefully) becoming sophisticated enough to recognize it in longhand.
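To make the pattern recognition idea concrete, here is a minimal sketch in Python. It is a toy, not how any particular OCR product works: the 5x5 bitmaps, the TEMPLATES table, and the similarity and recognize functions are all assumptions made up for illustration. Each stored template is one known "iteration" of a letter, and an unknown glyph is assigned to whichever template it overlaps with most.

```python
# Toy pattern recognition: compare a glyph bitmap against stored templates.
# The 5x5 bitmaps and all names here are illustrative assumptions.

TEMPLATES = {
    "A": [
        ".###.",
        "#...#",
        "#####",
        "#...#",
        "#...#",
    ],
    "L": [
        "#....",
        "#....",
        "#....",
        "#....",
        "#####",
    ],
    "T": [
        "#####",
        "..#..",
        "..#..",
        "..#..",
        "..#..",
    ],
}


def similarity(glyph, template):
    """Return the fraction of pixels that match between two same-sized bitmaps."""
    total = 0
    matches = 0
    for glyph_row, template_row in zip(glyph, template):
        for g, t in zip(glyph_row, template_row):
            total += 1
            matches += (g == t)
    return matches / total


def recognize(glyph):
    """Return (letter, score) for the best-matching stored template."""
    scores = {letter: similarity(glyph, tpl) for letter, tpl in TEMPLATES.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# A slightly "smudged" A: one pixel differs from the stored template.
noisy_a = [
    ".###.",
    "#...#",
    "####.",
    "#...#",
    "#...#",
]

letter, score = recognize(noisy_a)
print(letter, score)
```

The weakness of this approach shows up quickly: a glyph in a different size or typeface no longer lines up pixel for pixel, so the program needs a new template for every variation it is expected to read.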
Feature detection, on the other hand, breaks down the pattern of a given letter into its basic parts or features so that a program can memorize the letter by the relationship between those parts, allowing for a level of precision much closer to that of human brains. With feature detection, good OCR technology conceivably allows a computer to recognize even an obscure handwritten signature.
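Feature detection can be sketched in the same toy setting. Instead of comparing pixels, the hypothetical code below reduces each glyph to two features chosen purely for illustration: how many enclosed loops it has, and whether it is left-right symmetric. The bitmaps, the FEATURE_TABLE, and the function names are assumptions, and real systems use far richer features, but the point carries over: two differently sized renderings of the letter A produce the same description.

```python
# Toy feature detection: describe a glyph by its parts rather than its pixels.
# Features, bitmaps, and names are illustrative assumptions.


def count_holes(bitmap):
    """Count enclosed background regions (one in 'A', two in 'B')."""
    rows, cols = len(bitmap), len(bitmap[0])
    seen = set()

    def touches_border(start):
        """Flood-fill the background region at `start`; True if it reaches the edge."""
        stack = [start]
        seen.add(start)
        reached_edge = False
        while stack:
            r, c = stack.pop()
            if r in (0, rows - 1) or c in (0, cols - 1):
                reached_edge = True
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                inside = 0 <= nr < rows and 0 <= nc < cols
                if inside and bitmap[nr][nc] == "." and (nr, nc) not in seen:
                    seen.add((nr, nc))
                    stack.append((nr, nc))
        return reached_edge

    holes = 0
    for r in range(rows):
        for c in range(cols):
            if bitmap[r][c] == "." and (r, c) not in seen:
                if not touches_border((r, c)):
                    holes += 1
    return holes


def features(bitmap):
    """Return (number of holes, is the glyph left-right symmetric?)."""
    symmetric = all(row == row[::-1] for row in bitmap)
    return count_holes(bitmap), symmetric


# Each letter is stored as a short description, not as a picture.
FEATURE_TABLE = {
    (1, True): "A",
    (2, False): "B",
    (0, True): "T",
    (0, False): "L",
}


def classify(bitmap):
    """Look up the glyph's feature description; '?' if it is unknown."""
    return FEATURE_TABLE.get(features(bitmap), "?")


small_a = [".###.", "#...#", "#####", "#...#", "#...#"]
large_a = ["..###..", ".#...#.", "#.....#", "#######", "#.....#", "#.....#"]
letter_b = ["####.", "#...#", "####.", "#...#", "####."]

print(classify(small_a), classify(large_a), classify(letter_b))
```

Because the description depends on the relationship between the parts rather than on exact pixel positions, the small and large A are treated as the same letter without storing a separate template for each.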
Why is OCR important to contract management?
If the point of contract management software is to save time and costs by digitalizing the processes of storing, locating, and monitoring contracts, that software needs to be better – and faster – at sifting through those contracts than humans. Knowing what a challenge this can be for computers, you can see the crucial ways OCR bridges that gap in several key areas:
Speed – Once a contract repository is digitized, contract administrators need to be able to sift and search through high volumes of information – often scanned from hard copies – efficiently. Sophisticated OCR enables computer software to locate and pinpoint details from party names to renewal dates instantly, with the click of a button.
Agility – Contract systems are messy. Protocols change over time, hard-copy drafts get revised and distributed, and multiple versions of the same contract proliferate. Contract management software must be able to consistently read a diversity of documents and field types (such as names, dates, and boilerplate text) – not to mention different fonts and handwriting styles – so that contract administrators can access the information they need in a standardized, streamlined fashion.
Accuracy – When it comes to contracts, the devil is in the details. Missing a renewal date or failing to comply with certain stipulations can cost your business vast amounts of time and money, making attention to detail paramount. But if your computer can't read the renewal dates on old contracts, how can you keep up? Fortunately, using contract management software with OCR can help improve accuracy on many fronts.
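Once OCR has turned a scanned contract into text, finding a detail such as a renewal date becomes an ordinary text search. The short Python sketch below illustrates that step under stated assumptions: the sample contract text, the "Renewal Date:" label, the MM/DD/YYYY format, and the find_renewal_date function are all hypothetical, and real contracts will vary far more than this.

```python
import re
from datetime import datetime

# Hypothetical text as it might come out of an OCR pass over a scanned contract.
ocr_text = """
SERVICE AGREEMENT between Acme Corp and Globex LLC.
Effective Date: 01/15/2021
Renewal Date: 01/15/2024
"""

# Assumes the date follows a "Renewal Date:" label in MM/DD/YYYY form.
RENEWAL_PATTERN = re.compile(r"Renewal Date:\s*(\d{2}/\d{2}/\d{4})", re.IGNORECASE)


def find_renewal_date(text):
    """Return the renewal date found in the text, or None if there is none."""
    match = RENEWAL_PATTERN.search(text)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%m/%d/%Y").date()


print(find_renewal_date(ocr_text))
```

Without the OCR step there would be no text for the search to run on, only an image of the page.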
At its core, OCR enables a program to read and monitor details more precisely than even the best contract administrator, making it a must-have in whichever contract management software a company decides to use.
From improving accuracy to enhancing efficiency, there’s more to OCR than meets the eye.