daiR: OCR with Google Document AI in R

Optical character recognition (OCR) promises to open vast bodies of historical data to scientific inquiry, but it can be cumbersome when documents are noisy. The past 18 months have seen the launch of new OCR processors with vastly improved accuracy. In this seminar, Thomas Hegghammer will give an overview of the latest tools and present a new R package that offers access to the most powerful of them all, Google Document AI.

[Image: Arabic text as picture]

The R package is available at dair.info.

Tags: R, OCR, Political Science, MENA, Data Science
Published Apr. 8, 2021 9:50 AM - Last modified Jan. 28, 2024 2:48 PM