From: The TREC 2004 genomics track categorization task: classifying full text biomedical documents
| File contents | Training data count | Test data count |
|---|---|---|
| Documents – PMIDs | 504 | 378 |
| Genes – Gene symbol, MGI identifier, and gene name for all genes used | 1294 | 777 |
| Document-gene pairs – PMID-gene pairs | 1418 | 877 |
| Positive examples – PMIDs | 178 | 149 |
| Positive examples – PMID-gene pairs | 346 | 295 |
| Positive examples – PMID-gene-domain tuples | 589 | 495 |
| Positive examples – PMID-gene-domain-evidence tuples | 640 | 522 |
| Positive examples – all PMID-gene-GO-evidence tuples | 872 | 693 |
| Negative examples – PMIDs | 326 | 229 |
| Negative examples – PMID-gene pairs | 1072 | 582 |
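
The counts in the table can be cross-checked arithmetically: positive and negative examples should sum to the totals for both documents and document-gene pairs. The following is a minimal sketch of that check, with the numbers copied from the table by hand. The dictionary keys and the helper name `split_is_consistent` are illustrative assumptions, not part of the track's data files.

```python
# Counts copied from the table above as (training, test) pairs.
# Key names are illustrative, not identifiers from the track data.
counts = {
    "documents": (504, 378),
    "document_gene_pairs": (1418, 877),
    "positive_pmids": (178, 149),
    "negative_pmids": (326, 229),
    "positive_pairs": (346, 295),
    "negative_pairs": (1072, 582),
}


def split_is_consistent(total, positive, negative):
    """Return True if positive + negative equals total in both data sets."""
    return all(
        p + n == t
        for t, p, n in zip(counts[total], counts[positive], counts[negative])
    )


docs_ok = split_is_consistent("documents", "positive_pmids", "negative_pmids")
pairs_ok = split_is_consistent(
    "document_gene_pairs", "positive_pairs", "negative_pairs"
)

# Fraction of documents that are positive examples, (training, test).
positive_rate = tuple(
    round(p / t, 3)
    for p, t in zip(counts["positive_pmids"], counts["documents"])
)

print(docs_ok, pairs_ok, positive_rate)
```

Both checks pass for the figures as printed, and roughly 35% of training documents and 39% of test documents are positive examples.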