We introduce OpenMedLLM-70B, a large language model trained on 42M biomedical papers and 2.4M variant pairs for medical interpretation. Our model achieves 78.9% on the MedQA benchmark, surpassing GPT-4 (71.3%) and Gemini Ultra (69.8%). We release all model weights, training data, and evaluation code under the Apache 2.0 license.
We present OpenBioLLM-70B and OpenBioLLM-8B, open biomedical language models that achieve state-of-the-art performance on MedQA (89.4%), PubMedQA (81.2%), and BioASQ-11B (74.6%). Both models are trained on 42M papers, with instruction tuning on clinical dialogues and USMLE-style questions.
We introduce a multimodal LLM that jointly models molecular SMILES notation, amino acid sequences, and clinical trial data to predict drug-target binding affinity and ADMET properties with state-of-the-art accuracy on BindingDB and ChEMBL benchmarks.
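The abstract above does not describe the model's actual encoders, so as a rough illustration of what "jointly modeling SMILES and amino acid sequences" means at the input level, here is a minimal, hypothetical featurization sketch: the ligand and target are each mapped to a feature vector and concatenated into one input for a downstream affinity predictor. The vocabularies and the bag-of-characters encoding are placeholder assumptions, standing in for whatever learned tokenizers and encoders the paper uses.

```python
# Hypothetical sketch of joint SMILES + protein-sequence featurization for
# binding-affinity prediction. The vocabularies and the bag-of-characters
# encoding are illustrative placeholders, not the paper's architecture.

SMILES_VOCAB = list("CNOScno()=#123456[]@+-")   # toy SMILES character set
AA_VOCAB = list("ACDEFGHIKLMNPQRSTVWY")          # 20 standard amino acids

def bag_of_chars(seq, vocab):
    """Count-vector of characters: a trivial stand-in for a learned encoder."""
    return [seq.count(ch) for ch in vocab]

def joint_features(smiles, protein):
    """Concatenate ligand and target features into one joint input vector."""
    return bag_of_chars(smiles, SMILES_VOCAB) + bag_of_chars(protein, AA_VOCAB)

feats = joint_features("CCO", "MKV")  # ethanol vs. a toy tripeptide
print(len(feats))  # dimensionality = len(SMILES_VOCAB) + len(AA_VOCAB) = 42
```

In a real system the concatenated (or cross-attended) representation would feed a regression head trained against measured affinities such as those in BindingDB or ChEMBL; this sketch only shows the shape of the joint input.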
We develop a vision-language model trained on 2M+ annotated histopathology images from Indian cancer centers. Our model achieves 94.2% accuracy in cancer grading and 91.8% in tumor boundary delineation across 18 cancer types.
We present the first medical language model trained specifically on Indian population genomes. Using 10,247 whole-genome sequences across 100+ ethnic groups, we show significant improvements over Western-trained models on South Asian variant pathogenicity classification.