Why the country’s most strategic dataset is written by hand, and what it means for AI sovereignty.

Most of the AI conversation at FutureTech MeetUp: AI First circled around what models can already do. The keynote from Dmytro Voitekh — AI/ML Lead at Mriia and adviser on artificial intelligence to the Deputy Minister of Economy — went the other way. His subject was what machines still can’t see: the vast body of Ukrainian information that has never made it into their training.

Millions of handwritten documents sit in archives and government offices, and to an algorithm they remain a blind spot. Closing that blind spot is precisely the purpose of RUKOPYS, Ukraine’s first open, annotated dataset of handwritten text, built to train AI and advance optical character recognition.

The missing layer of digitalisation

The Ministry of Digital Transformation of Ukraine has announced the release of roughly 10 terabytes of data from the State Archives to broaden the base used to train Ukraine’s national language model. The catch: much of that material simply cannot be read by a machine, or by a person either.

Voitekh estimates that up to 80% of the data in the state archives is either written in a hand so old-fashioned that even a human struggles to decipher it, or set on pages so faded that almost nothing remains.

And the challenge isn’t text recognition in the classic sense. 

The machine first has to grasp the structure of a document and the types of objects on the page: where the handwriting is, where the print is, where the table is. Only then can it read for context, since the top of a page often signals what’s written at the bottom. Voitekh noted:

What you can pull from the standard tools on the market isn’t adapted to the full range of problems.

What the task demands is a sophisticated agentic system that doesn’t yet exist in ready-made form.
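
To make the layout-first order concrete, here is a minimal Python sketch. It is not the RUKOPYS pipeline or any existing tool: the `Region` type, the region labels and the recognizer functions are all hypothetical, and the layout detector that would produce the regions is assumed to exist elsewhere. It only shows the idea of routing each detected region to a suitable reader and passing along what was already read higher on the page.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Region:
    kind: str                         # illustrative labels: "handwriting", "print", "table"
    bbox: Tuple[int, int, int, int]   # (x0, y0, x1, y1) in pixels


# A recognizer receives the region plus the text already read above it.
Recognizer = Callable[[Region, str], str]


def read_page(regions: List[Region],
              recognizers: Dict[str, Recognizer]) -> List[Tuple[str, str]]:
    """Read regions top to bottom, routing each one by its detected kind."""
    # Sort by top edge, then left edge, as a simple stand-in for reading order.
    ordered = sorted(regions, key=lambda r: (r.bbox[1], r.bbox[0]))
    context = ""
    results: List[Tuple[str, str]] = []
    for region in ordered:
        text = recognizers[region.kind](region, context)
        results.append((region.kind, text))
        # Accumulate context so later regions can use what came before.
        context = (context + "\n" + text).strip()
    return results
```

In practice each recognizer would wrap a real model; the point of the sketch is only that reading happens after, and depends on, structure detection.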

Teaching a model to say «no»

You can always force an off-the-shelf model to read a faded page. The trouble is that the result will bear no relation to reality. For archives, courts and registries, a confident AI fabrication is far more dangerous than an honest «I can’t read this».

So the quality of the dataset is measured not only by how much the model recognised, but by where it correctly declined.
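
One way to picture that kind of measurement is a tally that treats declining as a first-class outcome. The Python sketch below is an assumption for illustration, not the scoring used by RUKOPYS: the outcome names and the convention that `None` means "declined" (for a prediction) or "illegible" (for the ground truth) are invented here.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def tally_outcomes(samples: Iterable[Tuple[Optional[str], Optional[str]]]) -> Counter:
    """Count outcomes for (prediction, truth) pairs.

    None as a prediction means the model declined to read;
    None as the truth means annotators marked the fragment illegible.
    """
    outcomes: Counter = Counter()
    for prediction, truth in samples:
        if truth is None:
            # Illegible fragment: declining is right, any text is a fabrication.
            outcomes["correct_decline" if prediction is None else "fabrication"] += 1
        elif prediction is None:
            # Legible fragment the model refused to read.
            outcomes["missed_read"] += 1
        else:
            outcomes["correct_read" if prediction == truth else "wrong_read"] += 1
    return outcomes


def fabrication_rate(outcomes: Counter) -> float:
    """Share of illegible fragments on which the model produced text anyway."""
    illegible = outcomes["correct_decline"] + outcomes["fabrication"]
    return outcomes["fabrication"] / illegible if illegible else 0.0
```

Under this framing, a model that reads slightly less but never invents text on illegible fragments can be preferable to one with higher raw recognition and a high fabrication rate.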

Why this is a question of sovereignty

From there, Voitekh laid out why owning your own data and models is an engineering necessity rather than a patriotic flourish.

  • Masking personal data in typed digital text is straightforward. In a scan, it is not: that job requires a dedicated detection model of its own. Otherwise, the data has to be routed abroad through foreign services, which many organisations’ internal policies simply forbid.
  • There is also the dependency risk. For ordinary text tasks, you can switch from one model to another without consequence. But with handwriting, Ukrainian in particular, no one can say whether, for instance, the next release of Gemini will read it. This kind of data is vanishingly rare, and a prompt that works today may fail tomorrow.
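
The first point can be illustrated with a small Python sketch. In a scan there is no string to search, so masking means blanking pixel regions, and those regions have to come from a detector. The code assumes the bounding boxes are already supplied by such a model (hypothetical here) and represents a grayscale image as a plain list of rows to stay dependency-free.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), end-exclusive, in pixels


def redact(image: List[List[int]], boxes: List[Box], fill: int = 0) -> List[List[int]]:
    """Return a copy of a grayscale image with every box painted over."""
    out = [row[:] for row in image]          # leave the original scan untouched
    height = len(out)
    width = len(out[0]) if out else 0
    for x0, y0, x1, y1 in boxes:
        # Clamp each box to the image so out-of-range detections are harmless.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                out[y][x] = fill
    return out
```

The painting step is trivial; the hard part, as the article notes, is the detection model that decides which boxes contain personal data, and running it locally is what keeps the scans from leaving the organisation.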

The «Handwritten to Data» hackathon

RUKOPYS draws on strikingly varied sources — school homework, exam papers from two universities, State Archives material, and submissions from the Radio Dictation of National Unity. Built on top of this dataset is an open hackathon, Handwritten to Data, which anyone can join.

In the first month of the competition, participants’ solutions built on local models already outperformed the third-party service used for auto-annotation by 7%. In other words, Ukrainian work is edging ahead of one of the best solutions on the market.

The hackathon metric scores more than recognised text: it also rewards the accuracy with which a model detects and classifies the regions of a page.
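
The article does not spell out the hackathon formula, so the following Python sketch is only an assumed example of how a metric could reward both region detection and text. It uses two standard building blocks, intersection over union (IoU) for boxes and character error rate (CER) for text; the 0.5 threshold and the `region_score` function are illustrative choices, not the competition's rules.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)
Labeled = Tuple[Box, str, str]           # (box, region kind, text)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def region_score(pred: Labeled, truth: Labeled, iou_threshold: float = 0.5) -> float:
    """0 if the region is mislocated or misclassified, otherwise 1 - CER."""
    p_box, p_kind, p_text = pred
    t_box, t_kind, t_text = truth
    if p_kind != t_kind or iou(p_box, t_box) < iou_threshold:
        return 0.0
    cer = edit_distance(p_text, t_text) / max(1, len(t_text))
    return max(0.0, 1.0 - cer)
```

With a score of this shape, perfect transcription earns nothing if the model put the text in the wrong place or mistook handwriting for print, which is the behaviour the metric described above is meant to encourage.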

The hackathon’s lead technology partner is De Novo, a member company of the Diia.City United Association. The final will be held at the Kyiv School of Economics.

The project’s first beneficiaries are already known: eDozvil, the Ministry of Economy’s service; the Mriia education platform, which aims to free teachers from grading homework; the National Police; and the State Archives.

Voitekh framed the goal of the initiative as giving Ukraine’s AI the ability to see more — and, beyond that, providing the first push for an ecosystem in which domains, derivative models and synthetic datasets begin to emerge.

 

Thank you

Our FutureTech MeetUp: AI First was made possible by the support of Diia.City United’s trusted partners: AI HOUSE, IT SmartFlex and HPE by Sophela.

We’re equally grateful to all the friends of the Association who helped make the evening truly special: Diia.City, Ukrainian Startup Fund, American Chamber of Commerce in Ukraine, Ukrainian Corporate Governance Academy, Vuzoll, Challenger Accelerator, Radar Tech, De Novo, DOU, Defender Media, Marketer, dev.ua, AIN.UA, Tala Water, Underwood Brewery, Kyiv Kraut, Kombucha Wild, BOX Catering and Sheriff Holding.