What is Optical Character Recognition?

September 2020
Optical Character Recognition, or OCR, is the process of converting visual text, usually text that exists in a physical form, into a digital version of the same text. There are a number of reasons to do this, including allowing physical text to be digitally stored, manipulated, edited, shared or otherwise interacted with in ways that would not be possible with physical copies. OCR is commonly found in areas such as data entry systems, passport recognition, and technology designed to assist people who are blind or visually impaired.

Before OCR, the only means to convert text to a digital format was to manually read and input the data. Naturally, there are a number of benefits to implementing OCR solutions when compared to human methods. As with many tasks where machines replace humans, there are large efficiencies in terms of both time and error; machines can scan and handle large amounts of data in seconds and, on clearly printed text, can maintain a level of accuracy that is difficult for humans to match, particularly at scale. In addition, without the need for rest, machines can carry out operations around the clock. This makes it possible not only to process increasingly large amounts of data but also to process data as soon as it is received, whatever the time, rather than having to wait on office hours. That possibility becomes even more attractive when working on a global scale or across multiple time zones.


Another benefit of implementing machines that use OCR to convert data is that machines can be programmed to read coded text and instantly process its data, leading to developments like the inclusion of a Machine Readable Zone (MRZ) in documents. This allows data to be compacted into relatively short strings of alphanumeric characters, as can be found in passports, so that a lot of information can be conveyed in minimal space and read reliably by a machine. With this possibility, more efficient solutions can be implemented, such as machine-based border control at airports, drastically cutting down on the time and effort required to identify people and removing a lot of frustration.
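One reason a machine can trust such a short string is that passport MRZ fields carry check digits. The following is a minimal sketch of that calculation, assuming the weighting scheme described in ICAO Doc 9303 (repeating weights 7, 3, 1; letters A to Z valued 10 to 35; the filler character < valued 0). The function names are illustrative and are not taken from any particular OCR product.

```python
def mrz_char_value(ch: str) -> int:
    """Map one MRZ character to its numeric value (assumed ICAO 9303 rules)."""
    if "0" <= ch <= "9":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10  # A=10 ... Z=35
    if ch == "<":
        return 0  # filler character
    raise ValueError(f"character not allowed in an MRZ: {ch!r}")


def mrz_check_digit(field: str) -> int:
    """Weighted sum of the field's characters, modulo 10."""
    weights = (7, 3, 1)  # repeated cyclically along the field
    total = sum(
        mrz_char_value(ch) * weights[i % 3] for i, ch in enumerate(field)
    )
    return total % 10


# Document number from the ICAO specimen passport; its printed check digit is 6.
print(mrz_check_digit("L898902C3"))
```

If the digit computed from the characters the OCR has read does not match the digit printed in the MRZ, at least one character was probably misread, and the scan can be rejected or retried.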


By using machines rather than humans, other important benefits of OCR arise in the areas of privacy and security. OCR software can be included in products such as mobile phones or scanners, allowing them to read and process the data without sending it elsewhere and without it being seen by human eyes. This means that the data is securely handled on an individual device rather than being sent to a centralised location with a greater risk of being hacked or misused, and that the sensitive data is read only by a machine rather than being shared with another person.


Despite the many benefits of OCR, it does bring potential problems. One of the main issues occurs when documents are handwritten or use less common characters; in these situations, some machines may have difficulty recognising individual characters, whole sections or even entire documents, leading to errors. A similar issue arises where documents use a non-standard format. This is particularly problematic when working on a global scale, as the layout of documentation in different countries does not always match the global norm. The final main problem arises when people who are unaware of the location or purpose of an MRZ need to interact with it. This can be an issue in identity verification when users are asked to take a picture of their passport, or a selfie with their passport, but the angle of the photograph or the positioning of a hand obscures the MRZ, making it impossible for the OCR to read it. However, as OCR becomes increasingly standard and its adoption increases globally, all of these problems are likely to become less and less frequent.
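An obscured or misread MRZ can often be detected automatically before a photograph is accepted. The sketch below is one hypothetical way to do this, not a description of any specific product: it assumes the passport (TD3) layout from ICAO Doc 9303, that is, two lines of 44 characters drawn from A to Z, 0 to 9 and <, with check digits following the document number, date of birth and expiry date on the second line. The name mrz_looks_readable and its helper are made up for illustration.

```python
import re
from typing import Sequence

# Assumed TD3 (passport) layout: each MRZ line is exactly 44 allowed characters.
MRZ_LINE = re.compile(r"[A-Z0-9<]{44}")
WEIGHTS = (7, 3, 1)


def _check_digit(field: str) -> str:
    """Check digit for a field, returned as a one-character string."""
    total = 0
    for i, ch in enumerate(field):
        if ch == "<":
            value = 0
        elif ch.isdigit():
            value = int(ch)
        else:
            value = ord(ch) - ord("A") + 10
        total += value * WEIGHTS[i % 3]
    return str(total % 10)


def mrz_looks_readable(lines: Sequence[str]) -> bool:
    """Return True if OCR output looks like a complete, consistent passport MRZ."""
    if len(lines) != 2:
        return False
    if not all(MRZ_LINE.fullmatch(line) for line in lines):
        return False  # cropped, covered, or containing stray characters
    upper, lower = lines
    if not upper.startswith("P"):
        return False  # passports start with the document code P
    # (field, printed check digit) pairs on the second line
    checks = [
        (lower[0:9], lower[9]),     # document number
        (lower[13:19], lower[19]),  # date of birth, YYMMDD
        (lower[21:27], lower[27]),  # expiry date, YYMMDD
    ]
    return all(_check_digit(field) == digit for field, digit in checks)


# ICAO specimen data; the first line is padded to 44 characters with fillers.
line1 = "P<UTOERIKSSON<<ANNA<MARIA".ljust(44, "<")
line2 = "L898902C36UTO7408122F1204159ZE184226B<<<<<10"
print(mrz_looks_readable([line1, line2]))       # complete MRZ
print(mrz_looks_readable([line1, line2[:30]]))  # partly covered by a hand
```

A check like this only shows that the text is internally consistent. It cannot confirm that the document itself is genuine.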


At Blockpass, we use OCR in our KYC Connect® solution to read data from users’ identity documents to provide instant and painless identity verification services (provided users don’t accidentally obscure the MRZ!). Users can check their photographs before submitting them to ensure the MRZ is readable, making sure that the string of alphanumeric characters at the bottom is clearly visible. Whilst we employ solutions like OCR to meet the requirements we have, we also develop our own technology and constantly work to update the service we provide for both businesses and users in areas such as privacy, usability and efficiency. Future developments in conjunction with the Blockpass Identity Lab at Edinburgh Napier University may lead to situations where, through our KYC Connect® solution, OCR on a device is all that’s needed to establish a verifiable identity!


The Blockpass platform is fully automated and hosted in the cloud, with no integration or setup fee. Businesses can sign up to the KYC Connect® console in a matter of minutes, test out the service, and start conducting identity document verification, KYC and AML checks. Sign up for FREE at