For 2020, we present
CAMDA encourages an open contest, where all analyses of the contest data sets are of interest, not limited to the questions suggested here. There is an online forum for the free discussion of the contest data sets and their analysis, in which you are encouraged to participate.
We look forward to a lively contest!
From the comprehensive description of genomic, transcriptomic and epigenomic changes of cancers provided by Genomic Data Commons (GDC, formerly at TCGA), the main goal of this challenge is to develop and demonstrate novel methods for gaining new biological insights or improving support for Precision Medicine. Innovation can come from
Examine algorithm performance in real-world clinical settings! We know that many approaches work well on some data sets yet not on others. Here we challenge you to demonstrate a unified single approach that matches or outperforms the current state-of-the-art for
and for at least one of the less well studied
Please visit and participate in the open CAMDA data integration forum for free discussion related to this contest.
Analysis suggestions:
Biological:
Technical:
(e.g., less-well studied cancers)?
Contest data comprises raw and pre-processed data from matched molecular profiles with complementary clinical information.
For convenience, we provide a local copy of the data. In addition, anonymized RNA-seq read level data are now available.
Please sign up to announcements from the CAMDA data integration forum for alerts.
Please read and accept the data download agreement for access.
Owing to safety and toxicity issues, attrition in drug discovery and development remains a significant concern, and there are strong efforts to identify and mitigate risk as early as possible. Drug-induced liver injury (DILI) is one of the primary liabilities in drug development and regulatory clearance due to the limited performance of mandated preclinical models. There is a pressing need to evaluate alternative methods for predicting severe DILI, the main concern of the regulatory agencies. Increasing evidence suggests that multiple factors, including the interactions between drug properties and host factors (i.e., patient information), contribute to the DILI effect of a drug (Journal of Hepatology 63). With great hopes being placed in modern approaches from statistics and machine learning applied to genome-scale profiling data, a critical question remains whether we can better integrate, understand, and exploit information from multiple complementary studies of chemical compounds. Specifically, this means exploring chemical descriptors of the drugs (Mold2, Journal of Chemical Information and Modeling 48), cell-based screening of pathway perturbations of the drugs (Toxicology in the 21st Century/Tox21, Nature Communications 7), gene expression patterns induced by them (Broad Institute Connectivity Map/CMap, Science 313, Nature Reviews Cancer 7, Cell 171), as well as host factors from the FDA Adverse Event Reporting System database (FAERS).
This CAMDA challenge focuses on understanding or predicting a drug’s potential to cause acute liver failure, the most severe type of DILI. To support the development of supervised machine learning approaches, we retrieved DILI severity information from the FDA-approved drug labeling and now provide a new set of training labels for 422 drugs, indicating their potential to cause acute liver failure. In addition, we acquired a validation set of 195 drugs with blinded labels, which should be predicted. In the 2020 challenge, instead of relying solely on gene expression data, we have extended the predictors with Mold2 chemical descriptors, host factor information (age and gender of the patients) from the FDA FAERS database, and pathway perturbation data from Tox21. Moreover, we have narrowed down last year's challenge CMap L1000 gene expression data set to cover six cell lines potentially most relevant to liver (i.e., PHH, HEPG2, HA1E, A375, MCF7, PC3). Analysis teams are encouraged to develop models using these predictors individually and/or in combination.
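As an illustration of using the predictors "individually and/or in combination", the following minimal sketch joins per-drug feature blocks by drug identifier and keeps only the drugs that appear in every selected block. The block names, drug identifiers, values, and the `combine_blocks` helper are hypothetical and are not part of the contest data or tooling; the actual file formats may differ.

```python
from typing import Dict, List, Sequence

# Hypothetical in-memory layout: drug identifier -> feature vector.
FeatureTable = Dict[str, List[float]]


def combine_blocks(blocks: Dict[str, FeatureTable], use: Sequence[str]) -> FeatureTable:
    """Concatenate the selected predictor blocks for each drug found in all of them."""
    if not use:
        raise ValueError("select at least one predictor block")
    # Keep only drugs that have values in every selected block.
    shared = set(blocks[use[0]])
    for name in use[1:]:
        shared &= set(blocks[name])
    combined: FeatureTable = {}
    for drug in sorted(shared):
        row: List[float] = []
        for name in use:
            row.extend(blocks[name][drug])
        combined[drug] = row
    return combined


# Toy values for illustration only; these are not contest data.
toy_blocks = {
    "mold2": {"drug_a": [0.1, 0.2], "drug_b": [0.3, 0.4], "drug_c": [0.5, 0.6]},
    "tox21": {"drug_a": [1.0], "drug_b": [0.0]},
    "l1000": {"drug_a": [2.5, -1.0], "drug_c": [0.7, 0.2]},
}

single = combine_blocks(toy_blocks, ["mold2"])
pair = combine_blocks(toy_blocks, ["mold2", "tox21"])
all_three = combine_blocks(toy_blocks, ["mold2", "tox21", "l1000"])
```

Restricting to the shared drugs is one simple design choice; imputing missing blocks instead would retain more drugs at the cost of additional assumptions.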
Analysis suggestions:
Contest data comprise anonymized processed expression profiles from the Broad Institute Human L1000 epsilon platform. Complementary information includes Mold2 chemical descriptors of the drugs, Tox21 cell-based screening of pathway perturbations of the drugs, and FAERS information. Toxicity labels were compiled by the US FDA.
A local copy of relevant subsets of the data, including labels, is available now.
Please sign up to announcements from the CAMDA toxicogenomics forum for alerts and for free discussion related to this contest.
Please read and accept the data download agreement for access.
MetaSUB is creating a global genetic cartography of urban spaces, based on extensive sampling of mass-transit systems and other public areas across the globe. In a strategic partnership, an extended set of data from global City Sampling Days is first introduced through the annual CAMDA contests. CAMDA delegates thus receive access to over a thousand novel MetaSUB samples, comprising over a terabase of whole genome shotgun (WGS) metagenomics data. The primary data set covers over 20 cities around the world, with tens of samples per city, providing a unique resource for the study of biodiversity within and across geographic locations as well as ecological niches.
To support a better understanding of the relation between metagenomic profiles and location specificity or ecological niche, a set of over a thousand features describing climate conditions is provided, as well as biome classifications for each city and its neighbouring areas.
Further extended global coverage can be achieved through complementary 16S rRNA studies contributing thousands of samples: the Earth Microbiome Project and "A global atlas of the dominant bacteria found in soil". For a range of MetaSUB Boston reference samples we now provide both WGS and 16S profiles, allowing a first systematic link of WGS and 16S resources.
Together, these unique multi-source data sets will make it possible to build novel models that predict the ecological niche type or even the origin location of samples from cities seen for the very first time. Performance can be tested on an independent test set of over 50 new 'mystery' samples, including locations from cities not sampled before.
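One way to estimate how a model might behave on cities it has never seen is to hold out all samples of one city at a time during validation. The sketch below builds such leave-one-city-out splits; the city labels and the `leave_one_city_out` helper are hypothetical, and this is only one possible validation scheme, not a prescribed contest protocol.

```python
from typing import Iterator, List, Sequence, Tuple


def leave_one_city_out(cities: Sequence[str]) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_city, train_indices, test_indices), one split per city."""
    for held_out in sorted(set(cities)):
        # Every sample from the held-out city goes to the test side,
        # so the model never sees that city during training.
        train = [i for i, city in enumerate(cities) if city != held_out]
        test = [i for i, city in enumerate(cities) if city == held_out]
        yield held_out, train, test


# Hypothetical city label for each sample, in sample order.
sample_cities = ["city_a", "city_a", "city_b", "city_c", "city_b"]
splits = list(leave_one_city_out(sample_cities))

for held_out, train_idx, test_idx in splits:
    print(held_out, "train:", train_idx, "test:", test_idx)
```

Because the held-out city label never appears in training, the prediction target in such a setup would need to be something that transfers across cities, for example a climate or biome class, rather than the city name itself.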
Please visit and participate in the open CAMDA meta-genomics forum for free discussion related to this contest.
Analysis suggestions:
A key challenge in metagenomic forensics is the construction of a microbiome fingerprint that allows the geographical origin of a sample to be predicted even when no reference samples from that location are known.
Typical considerations include:
The primary data set is now available. It contains: i) hundreds of samples with WGS raw reads from urban locations, provided by the MetaSUB Consortium; ii) over a thousand weather/climate features for the cities, as well as city and neighbouring biome classifications.
In addition, 16S sequencing-based OTUs for thousands of soil samples from all over the world, taken from the two projects mentioned above, are also available.
With the set of mystery samples (now available), try to:
Please sign up to announcements from the CAMDA meta-genomics forum for alerts.
To obtain a copy of our data, please read and accept the data download agreement.