📊 Medical Datasets

Curated, open-access datasets for training and evaluating medical AI models: variants, sequences, clinical trials, and more.

🏥
deepcog-ai · ✓ Verified
GenomeIndia-10K-WGS

Whole-genome sequences from 10,000 Indian individuals across 100+ ethnic groups. Includes SNP, INDEL, and structural variant annotations. De-identified and ethics-approved.

EHR · Population · VCF
10,247 Samples
2.4M Clinical pairs
4.8 TB Size
Apache 2.0 License
🔬
deepcog-ai · ✓ Verified
ClinVar-Pathogenic-2024

Curated subset of ClinVar with 450,000 pathogenic and likely-pathogenic variants, enriched with functional evidence, ACMG classifications, and literature links.

Clinical · JSON
450K Variants
82K Genes
12 GB Size
CC BY 4.0 License
📄
deepcog-ai · aiims-delhi
BioMed-Papers-42M

42 million biomedical research papers from PubMed, PMC Open Access, and preprints. Preprocessed for LLM training with structured metadata, abstracts, and full texts where available.

Literature · Pretraining · JSONL
42M Papers
1995–2024 Coverage
820 GB Size
Mixed License
💊
deepcog-ai
DrugTarget-SMILES-2M

2 million drug-target interaction pairs with SMILES notation, protein sequences, binding affinities, ADMET properties, and clinical outcome data from ChEMBL and BindingDB.

Drug Discovery · SMILES · CSV
2.1M Pairs
340K Compounds
8.3 GB Size
Apache 2.0 License
🧫
iit-madras-bioai · ✓ Verified
HistoPath-India-2M

2 million annotated histopathology images from Indian cancer centers, covering 18 cancer types. Includes WHO grading, tumor boundaries, and pathologist consensus labels.

Pathology · Vision · TIFF
2.1M Images
18 Cancer types
14 TB Size
CC BY 4.0 License
🏆
deepcog-ai · ✓ Benchmark
GeneTuring-Benchmark-v2

The MedQA benchmark suite for evaluating medical AI models across 12 tasks including variant classification, gene function prediction, and clinical report generation.

Benchmark · Evaluation · JSON
12 Tasks
48K Test cases
2.1 GB Size
Apache 2.0 License