India’s First Vision-Language Foundational Model for Documents


BharatGen is a government-supported initiative for developing India-centric multimodal large language models. A team representing BharatGen from the International Institute of Information Technology, Hyderabad (IIIT-H) and the Indian Institute of Technology, Bombay (IIT-B) has launched Patram-7B-Instruct, India’s first vision-language foundational model built from scratch for complex document understanding tasks.

Patram is part of the BharatGen suite of multimodal large language models being created with funding from the Department of Science and Technology (DST). Patram-7B-Instruct is a 7-billion-parameter vision-language AI model trained on a large and diverse collection of Indian documents. It is designed to analyze and understand scanned or photographed documents and respond to natural-language instructions. The model is now freely available as an open-source release on Hugging Face and MeitY IndiaAI’s AIKosh platform.

Despite its compact size, Patram outperforms several larger international models, including DeepSeek-VL-2, on key benchmarks such as DocVQA and VisualMRC. It also shows strong results on Patram-Bench, a custom benchmark reflecting real-world Indian document scenarios.

Patram was officially unveiled on June 2, 2025, by Shri Jitendra Singh, Hon’ble Minister of State for Science and Technology, at the BharatGen National Summit in New Delhi, in the presence of Prof. Abhay Karandikar, DST Secretary; Shri Kris Gopalakrishnan, Chair of the MGB-NMICPS; Shri Abhishek Singh, Additional Secretary, MeitY; and other dignitaries. Prof. P. J. Narayanan, Director of IIIT Hyderabad, also attended.

Prof. P. J. Narayanan, Director, IIIT Hyderabad, said, “Patram marks a significant step as India designs state-of-the-art foundational models. With this launch, we integrate language available in all forms: as text, as speech, and as images. This can power multimodal applications with integrated vision-language intelligence.”

Patram was developed in just five months by a team based at IIIT Hyderabad, consisting of engineers (alumni) and student interns, with support from IIIT-H and TiH-IoT, IIT Bombay. The team was led by Dr. Ravi Kiran Sarvadevabhatla, Associate Professor at IIIT-Hyderabad, and Dr. Ganesh Ramakrishnan, Professor at IIT-Bombay.

Dr. Ravi Kiran Sarvadevabhatla, Associate Professor at IIIT-Hyderabad and lead researcher on the project, said, “With Patram, we’ve built a model that understands the unique structure and diversity of Indian documents. This is just the beginning of what India can achieve in vision-language AI.”

Alongside Patram, DocBodh, a generative AI suite for Indic document intelligence, was also launched. DocBodh is designed for use across sectors like governance, education, law, and business.

This initiative reinforces India’s commitment to building open, inclusive, and cutting-edge AI infrastructure that aligns with national goals such as Digital India and Atmanirbhar Bharat.


About IIIT-Hyderabad

The International Institute of Information Technology, Hyderabad (IIIT-H) is an autonomous research university founded in 1998. It focuses on the core areas of Information Technology, such as Computer Science and Electronics and Communications, and on their applications in other domains through interdisciplinary research with strong social impact. Its research domains include Visual Information Technologies, Human Language Technologies, Data Engineering, VLSI and Embedded Systems, Computer Architecture, Wireless Communications, Algorithms and Information Security, Robotics, Building Science, Earthquake Engineering, Computational Natural Sciences and Bioinformatics, IT in Agriculture, and e-Governance.


