Welcome to the Rocky 2021 Conference. Please click on the links below to access the Rocky website and the list of posters:

CONFERENCE RESOURCES
Rocky Website
Poster Presentation List without abstracts
Poster Presentation List with abstracts


Wednesday, December 1
 

TBA

SLACK CHANNEL - Rocky Communications
The Rocky Conference will be using a Slack account https://rockyconference.slack.com to communicate throughout the event.

If you have not already done so, please join the Slack Channel here

Visit Slack Help for how to use the App.

Wednesday December 1, 2021 TBA

4:00pm MST

Registration
The Rocky Conference registration desk is located on the bottom floor of the Viceroy Hotel, where you can pick up your badge and tickets and check in.

Please note:
COVID-19 INFORMATION AND POLICIES FOR ROCKY 2021 ATTENDEES
All Rocky conference attendees are required to follow the COVID mandates of the United States CDC (Centers for Disease Control and Prevention). In addition, upon check-in at the Rocky conference registration desk, all Rocky 2021 attendees will be required to:
  • Show documentation with proof of COVID-19 vaccination, OR the results of a negative COVID test taken within 72 hours of arrival at the Rocky registration desk, AND
  • Wear masks at all indoor Rocky-hosted activities, per local Pitkin County requirements.
If you are required to get tested before leaving the United States, the document at the link below includes a link to Pitkin County (Snowmass/Aspen) testing sites, along with more details and where to find more information. We have also pinned this message to the top of the Slack channel. https://docs.google.com/document/d/1vpxiOz6G2djvGLcPQZR6taJzq0gXHVNGB1nXsZFqnxc/edit?usp=sharing


Wednesday December 1, 2021 4:00pm - 6:00pm MST
Viceroy - Lower Level
 
Thursday, December 2
 

8:00am MST

Breakfast
Thursday December 2, 2021 8:00am - 9:00am MST
Ballroom Foyer

8:00am MST

Registration
The Rocky Conference registration desk is located on the bottom floor of the Viceroy Hotel, where you can pick up your badge and tickets and check in.


Thursday December 2, 2021 8:00am - 5:00pm MST
Viceroy - Lower Level

9:00am MST

Keynote 1 - Humans “in-the-loop”: Practical Recommendations for Enhancing the Trustworthiness of AI Development for Healthcare - Diane M. Korngiebel, DPhil
DIANE M. KORNGIEBEL, DPhil
Bioethics Team, Google
Affiliate Associate Professor in the Department of Biomedical Informatics & Medical Education and the Department of Bioethics & Humanities, University of Washington School of Medicine
United States

Biography (.pdf)


Dr. Korngiebel argues that, rather than relying mainly on human-in-the-loop oversight of AI, trustworthy AI starts with trustworthy AI development. She describes some of her recent NIH-funded research on creating an ethics framework for AI development and reports preliminary findings from the project’s first aim, mapping decision points in the AI development process. Then, drawing on that project and on her experience as an applied ethicist embedded in bioinformatics, she offers practical recommendations for planning how to “loop in” humans during AI development.

Presenters

Diane Korngiebel

Google



Thursday December 2, 2021 9:00am - 9:45am MST
Ballroom Salon 1

9:45am MST

OP 01 - Adventures in Integrating Biobank Scale Data with Genomic, Phenotypic and Imaging Data from Other Sources

Presenting Author: Ben Busby, DNAnexus

Abstract: We are creating a system to extend the cohort-level analysis of a biobank to additional data types in order to enable mechanistic understanding. While all public data should be FAIR, privacy concerns prevent full access to all metadata fields for much of the data in question. Therefore, we have harmonized the metadata on select fields of several hundred thousand pilot datasets, allowing researchers to see all of the tangential datasets available for their cohorts in what we are calling a “Boolean Knowledge Graph”. This “try before you buy” approach allows researchers to understand the data landscape surrounding their cohort before investing in the paperwork and/or computational work necessary to import those datasets. Once datasets are selected and access is granted, models can be generated that make predictions about the granular facets of these datasets, and the results of these models can be appended to the original dataset. In one specific example, we crossed phenotypes from the UK Biobank with refine.bio data to see all RNA-seq data available for any particular disease cohort. If a salient hypothesis emerges from a given set of RNA-seq data, we can import a variety of other biomedical data using OMOP and other standards, leveraging the same phased approach. While the tools we have developed are useful, we hope that the insights we have gained in exploring these use cases are more widely applicable to the integration of biomedical data.


Presenters

Ben Busby

DNAnexus


Thursday December 2, 2021 9:45am - 9:55am MST
Ballroom Salon 1

9:55am MST

OP 02 - ProTaxa: software to easily perform phylogenomic analyses for prokaryotic taxa

Presenting Author: Joseph Wirth, Harvey Mudd College

Co-Author(s):
Eliot Bush, Harvey Mudd College

Abstract: The nucleotide sequences of 16S ribosomal RNA (rRNA) genes have been used to inform the taxonomic placement of prokaryotes for several decades, but recent research has demonstrated that whole-genome approaches can better resolve the evolutionary relationships of organisms, especially when taxa are closely related. However, the vast number of publicly available 16S rRNA gene sequences makes this gene useful for obtaining a rough estimate of the phylogeny for a given taxon. Unfortunately, reliance on 16S rRNA as the sole phylogenetic marker often causes closely related organisms to be omitted from taxonomic analyses. In addition, NCBI Taxonomy is not an authoritative database for prokaryotic taxonomy. Although it is roughly accurate, the database has many erroneous entries, especially in the designation of type material. While existing tools facilitate taxonomic placement, their genome-selection methods leave much to be desired. For example, the TYpe strain Genome Server (TYGS) uses a proprietary genome database, and the Microbial Genome Atlas (MiGA) relies on relationships provided by NCBI's Taxonomy database. ProTaxa was developed to resolve these issues in a freely accessible (open-source) way. It cross-references NCBI's Taxonomy database with the List of Prokaryotic names with Standing in Nomenclature (LPSN), a more definitive resource for prokaryotic taxonomy, which allows easy linking to NCBI’s sequence databases. The software also employs a unique strategy to find closely related genomes that would otherwise be omitted, by identifying and using phylogenetic markers specific to the input genome. These approaches greatly improve taxonomic placements and are largely automated.


Presenters

Joseph Wirth

Postdoctoral Scholar in Interdisciplinary Computation, Harvey Mudd College
I am a microbiologist and only recently began learning computational biology. I am interested in the ecology, evolution, and physiology of microbes, especially those organisms that are environmentally or clinically relevant. I seek to apply a combination of genetic, biochemical, physiological...


Thursday December 2, 2021 9:55am - 10:05am MST
Ballroom Salon 1

10:05am MST

OP 03 - Predicting functional consequences of mutations using molecular interaction network features

Presenting Author: Kivilcim Ozturk, University of California San Diego

Co-Author(s):
Hannah Carter, University of California San Diego

Abstract: Variant interpretation remains a central challenge for precision medicine. Missense variants are particularly difficult to understand because they change only a single amino acid in a protein sequence yet can have large and varied effects on protein activity. Numerous tools have been developed to identify missense variants with putative disease consequences from protein sequence and structure. However, biological function arises through higher-order interactions among proteins and molecules within cells. We therefore sought to capture the potential of missense mutations to perturb protein-protein interaction networks by integrating protein structure and interaction data. We developed 16 network-based annotations for missense mutations that provide information orthogonal to the features classically used to prioritize variants. We then evaluated them within a proven machine-learning framework for variant effect prediction across multiple benchmark datasets and demonstrated their potential to improve variant classification. Interestingly, network features yielded larger performance gains for classifying somatic mutations than for germline variants, possibly due to different constraints on which mutations are tolerated at the cellular versus organismal level. Our results suggest that modeling the potential of variants to perturb context-specific interactome networks is a fruitful strategy for advancing in silico variant effect prediction.


Thursday December 2, 2021 10:05am - 10:15am MST
Ballroom Salon 1

10:15am MST

OP 04 - Co-evolution based machine-learning for predicting functional interactions between human genes

Presenting Author: Yuval Tabach, The Hebrew University-Hadassah Medical School

Abstract: Over the next decade, more than a million eukaryotic species are expected to be fully sequenced. This has the potential to improve our understanding of genotype-phenotype crosstalk, gene function, and gene interactions, and to answer evolutionary questions. Here, we develop a machine-learning approach that uses phylogenetic profiles across 1154 eukaryotic species. This method integrates co-evolution across eukaryotic clades to predict functional interactions between human genes and the context for these interactions. We benchmarked our approach and found a 14% performance increase (auROC) compared with previous methods. Using this approach, we enabled functional annotation of less-studied genes. We focused on DNA repair and verified that 9 of the top 50 predicted genes have been identified elsewhere, with others previously prioritized by high-throughput screens. Overall, our approach enables better annotation of function and functional interactions and facilitates the understanding of evolutionary processes underlying co-evolution. This work is accompanied by a webserver available at: https://mlpp.cs.huji.ac.il.


Presenters

Yuval Tabach

Hebrew University
genomics, evolution, genetic diseases, Data, Cancer, RNA toxicity, DNA repair


Thursday December 2, 2021 10:15am - 10:25am MST
Ballroom Salon 1

10:25am MST

Break
Thursday December 2, 2021 10:25am - 10:45am MST
Ballroom Foyer

10:45am MST

OP 05 - Scalable estimation of microbial co-occurrence networks with Variational Autoencoders

Presenting Author: James Morton, Simons Foundation

Co-Author(s):
Justin Silverman, Pennsylvania State University
Gleb Tikhonov, University of Helsinki
Harri Lähdesmäki, Aalto University
Richard Bonneau, Simons Foundation

Abstract: Estimating microbe-microbe interactions is critical for understanding the ecological laws governing microbial communities. Rapidly decreasing sequencing costs have created new opportunities to estimate microbe-microbe interactions across thousands of uncultured, unknown microbes. However, typical microbiome datasets are very high-dimensional, and accurate estimation of microbial correlations requires tens of thousands of samples, exceeding the computational capabilities of existing methods. Furthermore, the vast majority of microbiome studies collect compositional metagenomics data, which introduce a negative bias when computing microbe-microbe correlations. The Multinomial Logistic Normal (MLN) distribution has been shown to be effective at inferring microbe-microbe correlations; however, scalable Bayesian inference for these distributions has remained elusive. Here, we show that carefully constructed Variational Autoencoders (VAEs) augmented with the Isometric Log-ratio (ILR) transform can estimate low-rank MLN distributions thousands of times faster than existing methods. These VAEs can be trained on tens of thousands of samples, enabling co-occurrence inference across tens of thousands of microbes without regularization. The latent embedding distances computed from these VAEs are competitive with existing beta-diversity methods across a variety of mouse and human microbiome classification and regression tasks, with notable improvements on longitudinal studies.
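The isometric log-ratio (ILR) transform mentioned above maps a composition of D strictly positive parts to D-1 unconstrained real coordinates, so that Gaussian machinery can be applied in the transformed space. Below is a minimal sketch. It assumes the common pivot-coordinate basis and strictly positive abundances (for example, after adding a pseudocount). The abstract does not say which orthonormal basis the authors use, and `ilr_pivot` is a hypothetical helper, not part of their software.

```python
import math
from typing import List, Sequence


def ilr_pivot(x: Sequence[float]) -> List[float]:
    """Pivot-coordinate ILR transform of one composition (hypothetical helper).

    Assumes every part is > 0. Returns D-1 real coordinates; any other
    orthonormal ILR basis gives coordinates that differ only by a rotation.
    """
    if any(v <= 0 for v in x):
        raise ValueError("ILR requires strictly positive parts (add a pseudocount)")
    logs = [math.log(v) for v in x]
    d = len(logs)
    coords = []
    for i in range(d - 1):
        rest = logs[i + 1:]                # parts after the pivot part
        mean_rest = sum(rest) / len(rest)  # log of their geometric mean
        k = d - i - 1                      # number of parts after the pivot
        coords.append(math.sqrt(k / (k + 1)) * (logs[i] - mean_rest))
    return coords


# Illustrative read counts for four taxa (made-up numbers).
counts = [12, 30, 3, 55]
coords = ilr_pivot(counts)  # three unconstrained coordinates
```

The coordinates depend only on log-ratios, so rescaling a sample (raw counts versus proportions) leaves them unchanged. The Euclidean distance between ILR vectors also equals the Aitchison distance between the compositions.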


Presenters

James Morton

Simons Foundation


Thursday December 2, 2021 10:45am - 10:55am MST
Ballroom Salon 1

10:55am MST

OP 06 - Geographical Support Vector Machines (GSVM) for the Analysis of Spatial Data

Presenting Author: Shachi Patel, University of Kansas Medical Center

Abstract: Analysis of geographical data presents a unique challenge because of the spatial correlation among the variables. Geographically Weighted Regression (GWR) has been developed as a tool to capture the strong effect of local variations. However, GWR is a parametric technique and therefore requires an assumption about the functional form of the relationship between the response and the independent variables. Support Vector Machines (SVM), on the other hand, do not require a specific relationship between the response and independent variables, yet few studies have incorporated spatial weights of geographical data into SVM. We therefore developed a method called Geographical Support Vector Machines (GSVM), which combines geographically related data with SVM. This approach fits a separate SVM for each local context, weighting observations by their distance to that context. We tested our method on two datasets: a urologist dataset and an election results dataset. For the urologist dataset, we built a model to predict the counties that exhibit an increase in urologist availability from 2010 to 2018, trained on the increase in urologist availability from 2000 to 2010 with each county's socioeconomic variables as predictors. For the election dataset, we used the 2012 election results and the population and socioeconomic variables of each county to predict the 2016 election results. In both datasets, the GSVM model performed significantly better than SVM. In conclusion, we have developed a non-parametric spatial analysis technique that can estimate an arbitrary functional relationship between predictors and responses in geographically correlated data.
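The weighting step described above can be sketched as follows. This is a minimal illustration with two assumptions: a Gaussian distance-decay kernel of the kind commonly used in geographically weighted regression, and planar (projected) coordinates. The abstract does not state which kernel or bandwidth the authors use, and `gaussian_weights` is a hypothetical helper name. Each local SVM could then be fit with these per-observation weights, for example through the `sample_weight` argument of scikit-learn's `SVC.fit`.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def gaussian_weights(center: Point, points: Sequence[Point], bandwidth: float) -> List[float]:
    """Weight each observation by its distance to one local context (hypothetical helper).

    Uses w = exp(-0.5 * (d / bandwidth) ** 2): the observation at the center
    gets weight 1, and weights decay smoothly with Euclidean distance d.
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    cx, cy = center
    weights = []
    for px, py in points:
        d = math.hypot(px - cx, py - cy)
        weights.append(math.exp(-0.5 * (d / bandwidth) ** 2))
    return weights


# Illustrative county centroids (made-up coordinates), weighted around the first county.
centroids = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0)]
w = gaussian_weights(centroids[0], centroids, bandwidth=2.0)
```

Repeating this for each county and fitting one weighted SVM per context gives the structure the abstract describes. The bandwidth controls how local each model is and would normally be tuned, for example by cross-validation.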


Thursday December 2, 2021 10:55am - 11:05am MST
Ballroom Salon 1

11:05am MST

OP 07 - sciCAN: Single-cell chromatin accessibility and gene expression data integration via Cycle-consistent Adversarial Network

Presenting Author: Yang Xu, The University of Tennessee, Knoxville

Abstract: As booming single-cell sequencing technologies bring a surge of high-dimensional data that come from different sources and represent cellular systems with different features, the challenge of integrating single-cell sequencing data across modalities grows accordingly. Here, we present a novel adversarial approach (sciCAN) to integrate single-cell chromatin accessibility and gene expression data in an unsupervised manner. We benchmarked sciCAN against 3 state-of-the-art (SOTA) methods on 5 different ATAC-seq/RNA-seq datasets and demonstrated that sciCAN integrated the data with a better balance of mutual transfer between modalities than the other 3 SOTA methods; sciCAN and Seurat had the best integration performance. Next, we applied sciCAN to PBMC RNA-seq and ATAC-seq data and showed that the integrated representation learned by sciCAN preserved the HSC-centered hematopoiesis hierarchy in both modalities. Finally, we used sciCAN to jointly cluster single-cell CRISPR-screened K562 RNA-seq and ATAC-seq data and identified a subcluster enriched for similar markers in both modalities, suggesting a common effect after CRISPR perturbation.


Presenters

Yang Xu

The University of Tennessee


Thursday December 2, 2021 11:05am - 11:15am MST
Ballroom Salon 1

11:15am MST

OP 08 - TADMaster: A Comprehensive Web-based Tool for Analysis of Topologically Associated Domain Detection Methods

Presenting Author: Oluwatosin Oluwadare, University of Colorado, Colorado Springs, USA

Co-Author(s):
Sean Higgins, University of Colorado, Colorado Springs
Allen Westcott, University of Colorado, Colorado Springs
Victor Akpokiro, University of Colorado, Colorado Springs

Abstract: Chromosome conformation capture and its derivatives have provided a substantial amount of genetic data for understanding the structure of chromosomes. These data have led to the identification of topologically associated domains (TADs), which play a critical role in the local structure and function of a chromosome. There are numerous computational methods for identifying these regions; however, it is not trivial to directly compare the results of one method with another. In this work, we propose TADMaster, a web-based visualization and analysis suite that directly compares the outputs of two or more TAD-identification methods. TADMaster provides a direct analysis of the identified regions via a heat map, along with graphs comparing methods by the size and number of identified regions, their boundaries, the totality of the domains, and the amount of domain overlap. Finally, methods are clustered using principal component analysis and t-distributed stochastic neighbor embedding to provide additional metrics for comparison. In addition, TADMaster provides multiple normalizations and state-of-the-art methods for identifying TADs in a variety of Hi-C data formats. Alternatively, users can submit their own TADs from any TAD detection algorithm for comparison. Ultimately, we expect TADMaster to be a valuable tool for researchers working in or exploring chromatin genomics who seek to understand the hidden and hierarchical organization of chromosome and genome structure. TADMaster can be accessed at http://tadmaster.io.


Presenters

Oluwatosin Oluwadare

Assistant Professor, University of Colorado
My research focuses on projects such as genome structure prediction, chromosome and genome features extraction, and the application of machine learning algorithms to biological datasets. Click on this link to see details about my research group, Oluwadare Lab...


Thursday December 2, 2021 11:15am - 11:25am MST
Ballroom Salon 1

11:25am MST

OP 09 - Error Modelled Gene Expression Analysis (EMOGEA)

Presenting Author: Tobias Karakach, Dalhousie University

Co-Author(s):
Jasmine Barra, Dalhousie University
Federico Taverna, Dalhousie University

Abstract: Serial RNA-seq studies of bulk samples are widespread and provide an opportunity for improved understanding of gene regulation during, e.g., development or response to an incremental dose. In addition, widely popular single-cell RNA-seq (scRNA-seq) data implicitly exhibit serial characteristics because measured gene expression values recapitulate cellular transitions. Unfortunately, serial RNA-seq data continue to be analyzed by methods that ignore this ordinal structure and yield results that are difficult to interpret. Here, we present Error Modelled Gene Expression Analysis (EMOGEA), a principled framework for analyzing RNA-seq data that incorporates measurement uncertainty into the analysis while introducing a special formulation for modeling data acquired as a function of time or other continuous variables. Because it incorporates uncertainties, EMOGEA is specifically suited for RNA-seq studies in which low-count transcripts with small fold-changes lead to significant biological effects. Such transcripts include signaling mRNAs and non-coding RNAs (ncRNAs), which are known to exhibit low levels of expression. Missing values are handled by assigning them disproportionately large uncertainties, which makes EMOGEA particularly useful for scRNA-seq data. We demonstrate the utility of this framework by extracting a cascade of gene expression waves from a well-designed RNA-seq study of zebrafish embryogenesis and from a scRNA-seq study of mouse pre-implantation development, and we provide unique biological insights into the regulation of genes in each wave. For non-ordinal measurements, we show that EMOGEA has a much higher rate of true positive calls and a vanishingly small false positive rate.


Presenters

Tobias Karakach

Assistant Professor, Dalhousie University


Thursday December 2, 2021 11:25am - 11:35am MST
Ballroom Salon 1

11:35am MST

Box Lunch Pickup (For those who purchased in advance)
Box lunches are available for pickup at the conference registration desk for those who purchased tickets in advance.

Thursday December 2, 2021 11:35am - 11:50am MST
Ballroom Foyer

11:35am MST

Ski Break
We take a break for those who wish to ski, rest, or visit town. Enjoy, and we will see you at 4:00pm.


Thursday December 2, 2021 11:35am - 4:00pm MST

4:00pm MST

Keynote 2 - Dramatically Elevated Proteomic Risk Profiles Predict COVID-19 Severity - Michael A. Hinterberg, PhD
MICHAEL A. HINTERBERG, PhD
Senior Bioinformatics Scientist
SomaLogic, Inc.
United States

LinkedIn Profile

COVID-19 is strongly linked to cardiovascular disease and its mechanisms, and it has additional multi-organ effects. Despite evidence that people with existing cardiovascular disease and risk factors are generally at higher risk for severe COVID-19, traditional clinical risk factors and measurements are often insufficient in the context of acute COVID-19. Using SomaScan® to measure 7,000 proteins simultaneously, along with developed and validated tests for cardiovascular risk, we show substantial changes in circulating plasma proteins, observed within hours or days, that are associated with COVID-19 severity and mortality. Elevated predicted cardiovascular risk is predictive of COVID-19 severity and outperforms established cardiovascular clinical biomarkers. Additionally, predictive proteomic models for kidney and liver health, and models for cardiometabolic fitness, are significantly associated with COVID-19 outcomes. These results provide unique and important insight into cardiovascular disorders and multi-organ dysfunction associated with COVID-19 and broaden the applicability of proteomics to novel disease research.

Presenters

Michael A. Hinterberg

Senior Scientist, SomaLogic
Proteomics, translational, and clinical medicine; cardiovascular disease, COVID-19, and exercise physiology; R-based data analysis; Colorado mountains and outdoor sports



Thursday December 2, 2021 4:00pm - 4:30pm MST
Ballroom Salon 1

4:30pm MST

OP 10 - New machine learning approaches to estimate the functional consequence of mutations in diverse human populations

Presenting Author: Yuval Itan, Icahn School of Medicine at Mount Sinai

Co-Author(s):
Cigdem Sevim Bayrak, Icahn School of Medicine at Mount Sinai
Yiming Wu, Icahn School of Medicine at Mount Sinai
David Stein, Icahn School of Medicine at Mount Sinai
David Cooper, Cardiff University
Peter Stenson, Cardiff University
Avner Schlessinger, Icahn School of Medicine at Mount Sinai
Judy Cho, Icahn School of Medicine at Mount Sinai

Abstract: The genome of a patient with a genetic disease contains about 20,000 non-synonymous variants, of which only one (or a few) is disease-causing. Current computational methods cannot predict the functional consequence of a mutation, that is, whether it results in gain-of-function (GOF) or loss-of-function (LOF). Moreover, computational predictions of mutation pathogenicity still lack specificity when applied to diverse human genetic data. Here, we present two novel approaches to address these shortcomings: (1) a machine learning study that computationally differentiates GOF from LOF mutations, using natural language processing (NLP) and feature selection to generate the first large-scale database of human inherited GOF and LOF mutations; and (2) a deep learning neural network approach that classifies mutations by human phenotype ontology (HPO) disease group. We demonstrate the utility of combining our state-of-the-art approaches with gold-standard methods in case-control studies across different diseases, including severe COVID-19 and inflammatory bowel disease (IBD), where we discovered novel genetic etiologies.



Thursday December 2, 2021 4:30pm - 4:40pm MST
Ballroom Salon 1

4:40pm MST

OP 11 - Assessing Equivalent and Inverse Change in Genes between Diverse Experiments

Presenting Author: Lisa Neums, University of Kansas Medical Center

Abstract: It is important to identify when two exposures affect a molecular marker (e.g., a gene’s expression) in similar ways, for example, to learn that a new drug has a similar effect to an existing drug. Currently, statistically robust approaches for comparing the equivalence of effect sizes obtained from two independently run treatment-versus-control comparisons have not been developed. Here, we propose two approaches for evaluating the equivalence of effect sizes from two independent studies: a bootstrap test of the Equivalent Change Index (ECI), which we previously developed, and Two One-Sided t-Tests (TOST) performed directly on the difference in log-fold changes. We used a series of simulation studies to compare the two tests on the basis of balanced accuracy and F1-score. We found that TOST is ineffective at identifying equivalently changed genes (F1-score = 0) because it is too conservative, while the ECI bootstrap test shows good performance (F1-score = 0.96). Furthermore, when we applied both tests to publicly available microarray expression data from pancreatic cancer tumor tissue and peripheral blood mononuclear cells (PBMC), TOST was not able to identify any equivalently or inversely changed genes, whereas the ECI bootstrap test identified genes associated with pancreatic cancer. In conclusion, a bootstrap test of the ECI is a promising new statistical approach for determining whether two diverse studies show similar differential expression of genes, and it can help identify genes that are similarly influenced by a specific treatment or exposure.
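The TOST comparison described above can be sketched for a single gene. This minimal version assumes that the difference in log-fold changes between the two studies, its standard error, and an equivalence margin are already available. To stay within the Python standard library, it substitutes a normal (z) approximation for the t distribution. It is not the authors' implementation, and the ECI bootstrap test is not shown.

```python
import math
from typing import Tuple


def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tost_equivalence(diff: float, se: float, margin: float,
                     alpha: float = 0.05) -> Tuple[float, bool]:
    """Two one-sided tests (normal approximation) for |true difference| < margin.

    H0a: difference <= -margin, tested with z = (diff + margin) / se
    H0b: difference >= +margin, tested with z = (diff - margin) / se
    Equivalence is declared only if both one-sided nulls are rejected,
    i.e. if the larger of the two p-values is below alpha.
    """
    if se <= 0 or margin <= 0:
        raise ValueError("se and margin must be positive")
    p_lower = 1.0 - normal_cdf((diff + margin) / se)  # one-sided test against -margin
    p_upper = normal_cdf((diff - margin) / se)        # one-sided test against +margin
    p_value = max(p_lower, p_upper)
    return p_value, p_value < alpha


# Illustrative log-fold changes of 1.2 and 1.1 for one gene in two studies.
p, equivalent = tost_equivalence(diff=1.2 - 1.1, se=0.15, margin=0.5)
```

With a large standard error, for example `tost_equivalence(0.0, 0.5, 0.5)`, equivalence is not declared even though the point estimate is exactly zero. This illustrates the conservativeness reported above.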


Presenters

Lisa Neums

University of Kansas


Thursday December 2, 2021 4:40pm - 4:50pm MST
Ballroom Salon 1

4:50pm MST

OP 12 - Recent Advances in Sequence Assembly Using Flow Decomposition

Presenting Author: Brendan Mumey, Montana State University

Abstract: This presentation summarizes recent work to improve assembly and quantification methods for RNA-seq, metagenomics, and viral quasi-species assembly based on better algorithms for flow decomposition.
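Flow decomposition splits a flow on a directed acyclic graph into weighted source-to-sink paths; in assembly, the graph might be a splice or variation graph with read-coverage weights, and each path a candidate transcript or strain. The presentation's specific algorithms are not described here. As a minimal sketch, the code below implements the classic greedy-width heuristic, which repeatedly removes the path with the largest bottleneck weight. The edge-dictionary graph encoding and the function names are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def topological_order(edges: Dict[Edge, float]) -> List[str]:
    """Kahn's algorithm over the nodes that appear in `edges`."""
    indeg: Dict[str, int] = defaultdict(int)
    succ: Dict[str, List[str]] = defaultdict(list)
    nodes = set()
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
        nodes.update((u, v))
    queue = [n for n in nodes if indeg[n] == 0]
    order = []
    while queue:
        n = queue.pop()
        order.append(n)
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    if len(order) != len(nodes):
        raise ValueError("graph has a cycle")
    return order


def greedy_width_decomposition(flow: Dict[Edge, float], source: str, sink: str,
                               eps: float = 1e-9) -> List[Tuple[List[str], float]]:
    """Repeatedly extract the widest (max-bottleneck) source->sink path and subtract it."""
    remaining = {e: w for e, w in flow.items() if w > eps}
    order = topological_order(flow)  # stays valid as edges are removed
    paths = []
    while True:
        out: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
        for (u, v), w in remaining.items():
            out[u].append((v, w))
        # best[n] = largest bottleneck over source->n paths in the remaining flow
        best: Dict[str, float] = {source: float("inf")}
        parent: Dict[str, str] = {}
        for u in order:
            if u not in best:
                continue
            for v, w in out[u]:
                width = min(best[u], w)
                if width > best.get(v, 0.0):
                    best[v] = width
                    parent[v] = u
        if sink not in best:
            break  # no flow left between source and sink
        path = [sink]
        while path[-1] != source:  # walk back along the widest path
            path.append(parent[path[-1]])
        path.reverse()
        width = best[sink]
        for u, v in zip(path, path[1:]):
            remaining[(u, v)] -= width
            if remaining[(u, v)] <= eps:
                del remaining[(u, v)]  # the bottleneck edge always drops out
        paths.append((path, width))
    return paths


# Illustrative flow: 4 units take s->a->b->t and 3 units take s->a->t.
example_flow = {("s", "a"): 7, ("a", "b"): 4, ("a", "t"): 3, ("b", "t"): 4}
paths = greedy_width_decomposition(example_flow, "s", "t")
```

Greedy-width is only a heuristic. It does not always find a decomposition with the fewest paths; minimum flow decomposition is NP-hard in general.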


Presenters

Brendan Mumey

Montana State University


Thursday December 2, 2021 4:50pm - 5:00pm MST
Ballroom Salon 1

5:00pm MST

OP 13 - About the Research Data Alliance

Presenting Author: Stephanie Hagstrom, RDA-US

Abstract: The Research Data Alliance (RDA) is a non-profit organization created to provide a neutral space where its members, international researchers and data experts, meet to address the challenges of managing the proliferation of research data. The community addresses topics such as data sharing, data management, certification of data repositories, funder mandates, disciplinary and interdisciplinary interoperability, and technological aspects. The presentation will introduce the RDA, its structure, and how those attending Rocky can contribute to improving data management.


Presenters

Stephanie Hagstrom

Director of Community Development, RDA-US


Thursday December 2, 2021 5:00pm - 5:10pm MST
Ballroom Salon 1

5:10pm MST

Break
Thursday December 2, 2021 5:10pm - 5:30pm MST
Ballroom Foyer

5:30pm MST

OP 14 - Crossing complexity of space-filling curves reveals new principles of genome folding

Presenting Author: Nicholas Kinney, Virginia College of Osteopathic Medicine

Co-Author(s):
Molly Hickman, Virginia Tech
Ramu Anandakrishnan, Virginia College of Osteopathic Medicine
Harold Garner, Virginia College of Osteopathic Medicine

Abstract: Space-filling curves have been used for decades to study the folding principles of globular proteins, compact polymers, and chromatin. Different types of curves can be distinguished by their folding principles. Random (equilibrium) curves tend to have abundant knots and tangles; crumpled (Hilbert) curves, on the other hand, lack knots and tangles. This latter class of curves is thought to be biologically favorable, particularly as models of genome folding in actively dividing cells. Indeed, cell division requires robust segregation of DNA. The present work investigates a new property of space-filling curves: the crossing complexity. Briefly, chain crossings are tallied in the same way that strand breaks and ligations occur during mitosis. Crossing complexity is then compared for equilibrium and Hilbert curves, with two main results. First, Hilbert curves limit entanglement between chromosomes. Second, Hilbert curves do not limit entanglement in a rudimentary model of S-phase DNA. Our second result is particularly surprising yet easily rationalized with a geometric argument. Future work will seek to reconstruct space-filling curves directly from chromosome proximity ligation experiments, and a candidate algorithm is discussed. If successful, this will lead to a better understanding of the folding principles that govern the human genome.
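The sketch below illustrates the crumpled (Hilbert) curves mentioned above. It generates the 2D Hilbert curve on an n-by-n grid using the well-known index-to-coordinate conversion. It shows only the curve family: the crossing-complexity measure and the chromatin models in this work are not reproduced, and `hilbert_d2xy` and `hilbert_curve` are hypothetical names.

```python
from typing import List, Tuple


def hilbert_d2xy(n: int, d: int) -> Tuple[int, int]:
    """Map index d in [0, n*n) to (x, y) on an n-by-n Hilbert curve (n a power of 2)."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate/flip the current s-by-s sub-square so consecutive pieces join end to end.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_curve(n: int) -> List[Tuple[int, int]]:
    """All grid points in Hilbert order; consecutive points are lattice neighbors."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of 2")
    return [hilbert_d2xy(n, d) for d in range(n * n)]


print(hilbert_curve(2))  # [(0, 0), (0, 1), (1, 1), (1, 0)]
```

Consecutive indices always land on neighboring grid cells, and the curve fills each quadrant completely before moving to the next. As a result, points that are close along the chain stay close in space.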


Thursday December 2, 2021 5:30pm - 5:40pm MST
Ballroom Salon 1

5:40pm MST

OP 15 - The Cultural Evolution of Vaccine Hesitancy: Modeling the Interaction between Beliefs and Vaccination Behaviors
OP-15
The Cultural Evolution of Vaccine Hesitancy: Modeling the Interaction between Beliefs and Vaccination Behaviors

Presenting Author: Kerri-Ann Anderson, Vanderbilt University

Co-Author(s):
Nicole Creanza, Vanderbilt University

Abstract: Vaccine-preventable diseases (VPDs), such as measles, pertussis, and polio, have resurged in the developed world as a result of decreasing vaccination coverage due to increased vaccine hesitancy. The current COVID-19 pandemic demonstrates the complexities of health behaviors and underscores the relevance of these behaviors to public health. Society, culture, and individual motivations affect health-related decisions, and health perceptions and behaviors can change as cultures evolve. In recent years, mathematical models of disease dynamics have begun to incorporate aspects of human behavior; however, they do not address how cultural beliefs influence these behaviors, or how these behaviors in turn impact cultural beliefs. Using a mathematical modeling framework, we explore the effects of cultural evolution on vaccine hesitancy and vaccination behavior. With this model, we shed light on the facets of cultural evolution that facilitate vaccine hesitancy, ultimately affecting levels of vaccination coverage and VPD outbreak risk. We show vaccine confidence and cultural selection pressures are driving forces of vaccination behavior, leading to a general pattern in which the spread of vaccine confidence leads to high vaccination coverage. We then demonstrate that an assortative preference among vaccine-hesitant individuals can lead to increased vaccine hesitancy and lower vaccination coverage. Further, we show that vaccine mandates can foster vaccine hesitancy despite high vaccination coverage, whereas vaccine scarcity can result in the opposite pattern of high vaccine confidence but low vaccination coverage. We present our model as a generalizable framework for exploring cultural evolution when beliefs influence, but do not strictly dictate, human behaviors.


Presenters

Kerri-Ann Anderson

Ph.D. Candidate, Vanderbilt University


Thursday December 2, 2021 5:40pm - 5:50pm MST
Ballroom Salon 1

5:50pm MST

OP 16 - Automated and systematic verification and validation increases quality and long-term reuse of models
OP-16
Automated and systematic verification and validation increases quality and long-term reuse of models

Presenting Author: Natasa Miskov-Zivanov, University of Pittsburgh

Abstract: Although modeling is an important component of the research pipeline in biology, there is most often no systematic or standardized approach for assessing the quality of models and annotating them, which reduces their trustworthiness and reuse potential. Moreover, most model design and documentation steps are still done manually. Creating useful and reliable models of cellular signaling requires thorough and careful information extraction, knowledge assembly, and comprehensive model verification and validation, which can take months or even years. The verification step (assessing whether the model structure is correct by finding support for all of its elements and interactions) and the validation step (evaluating model behavior against experimental observations and data) usually occur iteratively with model expansion before the model can be used to make predictions or provide explanations. The objective of our work is to develop an architecture that will allow researchers to automatically verify, assess the quality of, annotate, and expand their models using available literature and model databases. We have developed several methods and tools that automatically verify models using information from literature and databases, and that test for contradictions between new knowledge and existing models. Our tools can process large amounts of information from the literature and compare it with models within seconds, a task that would take days to complete manually and would likely be error-prone. Outcomes of this work will help increase the long-term reuse of models and aid computational and systems biology researchers in assembling or selecting models of trusted quality.
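To make the verification step more concrete, here is a minimal Python sketch that classifies each interaction in a signaling model as supported, contradicted, or unsupported by a set of interactions extracted from the literature. The (source, target, sign) representation and the function `verify_model` are hypothetical simplifications for illustration; they are not the authors' tools, which handle much richer model and literature representations.

```python
from typing import Iterable, NamedTuple


class Interaction(NamedTuple):
    source: str
    target: str
    sign: int  # +1 = activation, -1 = inhibition


def verify_model(model: Iterable[Interaction],
                 literature: Iterable[Interaction]) -> dict[str, list[Interaction]]:
    """Sort model interactions by whether extracted literature evidence supports them."""
    # Collect every sign reported in the literature for each (source, target) pair.
    evidence: dict[tuple[str, str], set[int]] = {}
    for item in literature:
        evidence.setdefault((item.source, item.target), set()).add(item.sign)

    report: dict[str, list[Interaction]] = {"supported": [], "contradicted": [], "unsupported": []}
    for edge in model:
        signs = evidence.get((edge.source, edge.target), set())
        if edge.sign in signs:
            report["supported"].append(edge)
        elif signs:  # the pair is reported, but only with the opposite sign
            report["contradicted"].append(edge)
        else:
            report["unsupported"].append(edge)
    return report
```

With a toy model containing EGFR → RAS (+1), RAS → RAF (+1), and PTEN → AKT (+1), and toy literature evidence containing EGFR → RAS (+1) and PTEN → AKT (-1), the first edge is reported as supported, the PTEN edge as contradicted, and RAS → RAF as unsupported (no evidence found).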


Thursday December 2, 2021 5:50pm - 6:00pm MST
Ballroom Salon 1

6:00pm MST

OP 17 - SBMLWebApp: Web-based Simulation, Steady-State Analysis, and Parameter Estimation of Systems Biology Models
OP-17
SBMLWebApp: Web-based Simulation, Steady-State Analysis, and Parameter Estimation of Systems Biology Models

Presenting Author: Andreas Dräger, University of Tübingen

Co-Author(s):
Takahiro G. Yamada, Keio University
Kaito Ii, Hewlett-Packard
Matthias König, Humboldt University of Berlin
Martina Feierabend, University of Tübingen
Akira Funahashi, Keio University

Abstract:
Summary: In systems biology, biological phenomena are often modeled with ordinary differential equations (ODEs) and distributed in SBML, the de facto standard file format. The primary analyses performed with such models are dynamic simulation, steady-state analysis, and parameter estimation. These methods are mathematically formalized, and libraries implementing them have been published. Several tools exist to create, simulate, or visualize models encoded in SBML. However, setting up an analysis environment is a major hurdle for non-modelers, so providing easy access to fundamental analyses of ODE models remains a significant challenge. To address this issue, we developed SBMLWebApp, a web-based service that executes SBML-based simulations, steady-state analysis, and parameter estimation directly in the browser, without any setup or prior knowledge. SBMLWebApp visualizes the result of each analysis, displays it as a numerical table, and provides the results for download. SBMLWebApp also allows users to select and analyze SBML models directly from the BioModels Database. Taken together, SBMLWebApp provides barrier-free access to an analysis environment for simulation, steady-state analysis, and parameter estimation of SBML models. SBMLWebApp is implemented in Java™ on an Apache Tomcat® web server and uses COPASI, the Systems Biology Simulation Core Library (SBSCL), and LibSBMLSim as simulation engines.

Availability:
SBMLWebApp is licensed under MIT with source code available from https://github.com/TakahiroYamada/SBMLWebApp. The program runs online at http://simulate-biology.org.

Keywords:
SBML; kinetic models; time-course simulation; steady-state simulation; parameter estimation; model calibration; software; web application
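As a rough illustration of what a time-course simulation and a steady-state check involve for an ODE model, the sketch below integrates a toy reversible reaction A <-> B with mass-action kinetics using a fixed-step fourth-order Runge-Kutta scheme, and compares the end point with the analytical steady state. The model, the rate constants `kf` and `kr`, and the function names are assumptions for illustration; the code does not read SBML and is not how SBMLWebApp or its engines (COPASI, SBSCL, LibSBMLSim) work internally.

```python
def simulate_reversible(a0: float, b0: float, kf: float, kr: float,
                        t_end: float, dt: float = 0.01) -> list[tuple[float, float, float]]:
    """Integrate dA/dt = -(kf*A - kr*B), dB/dt = +(kf*A - kr*B) with classic RK4.

    Returns a list of (t, A, B) samples, one per step (toy model, hypothetical names).
    """
    def rhs(a: float, b: float) -> tuple[float, float]:
        flux = kf * a - kr * b  # net mass-action flux from A to B
        return -flux, flux

    t, a, b = 0.0, a0, b0
    trajectory = [(t, a, b)]
    for _ in range(int(round(t_end / dt))):
        da1, db1 = rhs(a, b)
        da2, db2 = rhs(a + 0.5 * dt * da1, b + 0.5 * dt * db1)
        da3, db3 = rhs(a + 0.5 * dt * da2, b + 0.5 * dt * db2)
        da4, db4 = rhs(a + dt * da3, b + dt * db3)
        a += dt / 6 * (da1 + 2 * da2 + 2 * da3 + da4)
        b += dt / 6 * (db1 + 2 * db2 + 2 * db3 + db4)
        t += dt
        trajectory.append((t, a, b))
    return trajectory


def analytic_steady_state(a0: float, b0: float, kf: float, kr: float) -> tuple[float, float]:
    """Steady state where kf*A = kr*B, using conservation of A + B."""
    total = a0 + b0
    return total * kr / (kf + kr), total * kf / (kf + kr)
```

For example, starting from A = 1 and B = 0 with kf = 2 and kr = 1, the simulated trajectory settles at A ≈ 1/3 and B ≈ 2/3, matching `analytic_steady_state(1.0, 0.0, 2.0, 1.0)`. Real tools solve for steady states directly and use adaptive, stiff-capable integrators rather than a fixed RK4 step.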


Presenters

Andreas Dräger

Juniorprofessor, University of Tübingen
Andreas Dräger is the junior professor of computational systems biology of infections and antimicrobial-resistant pathogens at the University of Tübingen in Germany. His team aims to combat the spread of threatening infectious diseases using mathematical modeling and computer simulation...


Thursday December 2, 2021 6:00pm - 6:10pm MST
Ballroom Salon 1

6:30pm MST

Dinner - Il Poggio (6:30PM Shuttle/7PM DINNER)
Dinner is at 7:00 pm at Il Poggio Restaurant in upper Snowmass Village, 57 Elbert Ln, Snowmass Village, CO 81615.

A shuttle bus will run from the Viceroy Hotel between 6:30 pm and 7:00 pm, with return trips to the hotel starting at 8:30 pm.

You may also take the Skittle ski lift from lower Snowmass Village to upper Snowmass Village.

Il Poggio Restaurant Website

Thursday December 2, 2021 6:30pm - 9:00pm MST
Off Site
 
Friday, December 3
 

8:00am MST

Breakfast
Friday December 3, 2021 8:00am - 9:00am MST
Ballroom Foyer

8:00am MST

Registration
The Rocky Conference registration desk is located on the bottom floor of the Viceroy Hotel where you can pick up your badge, tickets, and check-in.  

Please note:
COVID-19 INFORMATION AND POLICIES FOR ROCKY 2021 ATTENDEES
All ROCKY conference attendees are required to follow the United States CDC (Centers for Disease Control and Prevention) COVID mandates, AND all Rocky 2021 attendees, upon check-in at the Rocky conference registration desk, will be required to:
  • Show documentation with proof of COVID-19 vaccination, OR the results of a negative COVID test taken within 72 hours of arrival at the Rocky registration desk AND,  
  • Wear masks at all indoor Rocky-hosted activities - per local Pitkin County requirements.
If you need to be tested before leaving the United States, the document linked below includes a link to Pitkin County (Snowmass/Aspen) testing sites, along with further details. This information is also pinned to the top of the Slack channel.  https://docs.google.com/document/d/1vpxiOz6G2djvGLcPQZR6taJzq0gXHVNGB1nXsZFqnxc/edit?usp=sharing

Friday December 3, 2021 8:00am - 5:00pm MST
Viceroy - Lower Level

9:00am MST

Keynote 3 - Single-cell Biology in a Software 2.0 World - David Van Valen, PhD
DAVID VAN VALEN, PhD
Assistant Professor
Biology and Biological Engineering
Caltech
United States

Biosketch (pdf)

Single-cell Biology in a Software 2.0 World

Multiplexed imaging methods can measure the expression of dozens of proteins while preserving spatial information. While these methods open an exciting new window into the biology of human tissues, interpreting the images they generate with single-cell resolution remains a significant challenge. Current approaches to this problem in tissues rely on identifying cell nuclei, which results in inaccurate estimates of cellular phenotype and morphology. In this work, we overcome this limitation by combining multiplexed imaging’s ability to image nuclear and membrane markers with large-scale data annotation and deep learning. We describe the construction of TissueNet, an image dataset containing more than a million paired whole-cell and nuclear annotations across eight tissue types and five imaging platforms. We also present Mesmer, a single model trained on this dataset that can perform nuclear and whole-cell segmentation with human-level accuracy across tissue types and imaging platforms. We show that Mesmer accurately measures cell morphology in tissues, opening up a new observable for quantifying cellular phenotypes in tissues and harmonizing disparate datasets. We make this model available to users of all backgrounds through both cloud-native and on-premises software. Finally, we describe ongoing work to develop similar resources and models for dynamic live-cell imaging data.

Presenters

David Ashley Van Valen

Assistant Professor, California Institute of Technology



Friday December 3, 2021 9:00am - 9:45am MST
Ballroom Salon 1

9:45am MST

OP 18 - A comprehensive benchmarking of WGS-based structural variant callers
OP-18
A comprehensive benchmarking of WGS-based structural variant callers


Presenting Author: Varuni Sarwal, University of California Los Angeles

Co-Author(s):
Sebastian Niehus, Berlin Institute of Health (BIH), Anna-Louisa-Karsch-Str. 2, 10178 Berlin, Germany
Eleazar Eskin, University of California Los Angeles
Jonathan Flint, University of California Los Angeles
Serghei Mangul, serghei.mangul@gmail.com

Abstract: Advances in whole-genome sequencing (WGS) promise to enable accurate and comprehensive structural variant (SV) discovery. Dissecting SVs from WGS data presents many challenges, and a plethora of SV-detection methods have been developed. Currently, there is little evidence that investigators can use to select appropriate SV-detection tools. In this project, we evaluated the performance of SV-detection tools on mouse and human WGS data using a comprehensive PCR-confirmed gold standard set of SVs and the GIAB variant set, respectively. In contrast to previous benchmarking studies, our mouse gold standard dataset included a complete set of SVs, allowing us to report both the precision and the sensitivity of SV-detection methods. Our study evaluates the ability of the methods to detect deletions, which provides an optimistic estimate of SV-detection performance, since methods that fail to detect deletions are likely to miss more complex SVs. We found that SV-detection tools varied widely in performance, with several methods providing a good balance between sensitivity and precision. Manta was the top-performing tool on both mouse and human data, with F-scores consistently above 0.6. Additionally, we determined which SV callers are best suited for low- and ultra-low-pass sequencing data and for different deletion-length categories. We hope the results of this benchmarking study will help researchers choose appropriate variant-calling tools based on the organism, data coverage, and deletion length.
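The abstract's point that a complete gold standard makes both precision and sensitivity measurable can be illustrated with a small scoring sketch. The minimal Python code below assumes deletions are half-open intervals, counts a call and a gold-standard deletion as matching when they share at least 50% reciprocal overlap, and reports the balanced F-score (F1). The matching rule, the threshold, and all names are illustrative assumptions, not the criteria used in the study.

```python
from typing import NamedTuple, Sequence


class Deletion(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def reciprocal_overlap(a: Deletion, b: Deletion) -> float:
    """Overlap length as the smaller fraction of either deletion's length."""
    if a.chrom != b.chrom:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


def score_calls(calls: Sequence[Deletion], gold: Sequence[Deletion],
                min_ro: float = 0.5) -> tuple[float, float, float]:
    """Return (precision, sensitivity, F1) for deletion calls against a complete gold set."""
    def hit(x: Deletion, pool: Sequence[Deletion]) -> bool:
        return any(reciprocal_overlap(x, y) >= min_ro for y in pool)

    matched_calls = sum(hit(c, gold) for c in calls)  # true-positive calls
    matched_gold = sum(hit(g, calls) for g in gold)   # gold deletions recovered
    precision = matched_calls / len(calls) if calls else 0.0
    sensitivity = matched_gold / len(gold) if gold else 0.0
    denom = precision + sensitivity
    f_score = 2 * precision * sensitivity / denom if denom else 0.0
    return precision, sensitivity, f_score
```

Sensitivity can only be computed because every true deletion is known; with an incomplete truth set, unmatched calls cannot be confidently labeled as false positives and missed deletions cannot be counted.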


Presenters

Varuni Sarwal

Undergraduate student, UC Los Angeles


Friday December 3, 2021 9:45am - 9:55am MST
Ballroom Salon 1

9:55am MST

OP 19 - Exploring hypotheses of small cell lung cancer growth mechanisms using Bayesian multimodel inference
OP-19
Exploring hypotheses of small cell lung cancer growth mechanisms using Bayesian multimodel inference

Presenting Author: Samantha Beik, Vanderbilt University

Co-Author(s):
Leonard Harris, University of Arkansas
Vito Quaranta, Vanderbilt University
Carlos Lopez, Vanderbilt University

Abstract: Small cell lung cancer (SCLC) is a phenotypically heterogeneous disease, comprising multiple cellular subtypes within a tumor that exhibit differential sensitivity to drug treatments. SCLC heterogeneity is hypothesized to be responsible for rapid development of chemotherapy resistance, leading to the dismal 6% five-year survival rate for this disease. Experimental results from several studies suggest that treatment alters tumor composition from an initial makeup of phenotypic subtype(s) to different, less-treatment-sensitive subtypes. We hypothesize that this change arises from phenotypic transitions, rather than outgrowth of subclone(s) selected for by treatment, and that interactions between subtypes are key for tumor survival. We set out to use mathematical modeling to investigate the theoretical basis for SCLC tumor growth, but soon realized that analysis of only one interpretation of SCLC data (one model) would be flawed, and turned to multimodel inference (MMI) to address this issue. We move beyond traditional information theoretic MMI to a fully Bayesian approach, applying MMI to population dynamics models fit to SCLC tumor steady-state data. We extend our findings beyond a ranking of models toward a probabilistic view of subtype behaviors, determining that the existence of reversible phenotypic transitions is highly likely in SCLC. Our results highlight what knowledge is supported by the data and where more experiments are needed, with an aim to modulate tumor composition and decrease treatment resistance. This is sorely needed in SCLC, for which the survival rate has barely improved in decades. With sensible treatment options, the burden of this aggressive disease can be decreased.
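The step from ranking models to a probabilistic view of subtype behaviors can be sketched with Bayesian model averaging: each candidate model gets a posterior probability from its marginal likelihood (evidence) and prior, and the probability of a hypothesis is the total posterior mass of the models that include it. The minimal Python sketch below assumes the log evidences have already been estimated (how the authors estimated them is not stated here); the function names and the example feature flags are hypothetical.

```python
import math
from typing import Optional, Sequence


def posterior_model_probs(log_evidence: Sequence[float],
                          prior: Optional[Sequence[float]] = None) -> list[float]:
    """P(model | data) from log marginal likelihoods and prior model probabilities."""
    k = len(log_evidence)
    if prior is None:
        prior = [1.0 / k] * k  # uniform prior over the candidate models
    log_post = [le + math.log(p) for le, p in zip(log_evidence, prior)]
    shift = max(log_post)  # log-sum-exp trick to avoid overflow/underflow
    weights = [math.exp(lp - shift) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


def feature_probability(post: Sequence[float], has_feature: Sequence[bool]) -> float:
    """Model-averaged probability of a hypothesis, e.g. 'reversible transitions exist'."""
    return sum(p for p, flag in zip(post, has_feature) if flag)
```

For two models with equal priors whose evidences differ by a factor of 3, the posterior probabilities are 0.25 and 0.75; if only the second model includes reversible phenotypic transitions, the averaged probability of that feature is 0.75.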


Presenters

Samantha Beik

Vanderbilt University


Friday December 3, 2021 9:55am - 10:05am MST
Ballroom Salon 1

10:05am MST

OP 20 - Identifying and Classifying Goals for Scientific Knowledge
OP-20
Identifying and Classifying Goals for Scientific Knowledge

Presenting Author: Mayla Boguslav, University of Colorado Anschutz Medical Campus

Co-Author(s):
Nourah M Salem, University of Colorado Anschutz Medical Campus
Elizabeth K White, National Jewish Health
Sonia M Leach, National Jewish Health
Lawrence E Hunter, University of Colorado Anschutz Medical Campus

Abstract:
Motivation: Science progresses by posing good questions, yet biomedical text mining has paid little attention to them. We propose a novel task for biomedical natural language processing: identifying and characterizing the questions stated in the biomedical literature. Formally, the task is to identify and characterize statements of ignorance, i.e., statements indicating that scientific knowledge is missing or incomplete. Such technology could have many significant impacts, from the training of PhD students to ranking publications and prioritizing funding based on particular questions of interest. The work presented here is intended as a first step towards these goals.

Results: We present a novel ignorance taxonomy, driven by the role that statements of ignorance play in research, that identifies specific goals for future scientific knowledge. Using this taxonomy and reliable annotation guidelines (inter-annotator agreement above 80%), we created a gold standard ignorance corpus of 60 full-text documents from the prenatal nutrition literature with over 10,000 annotations, and used it to train classifiers that achieved F1 scores above 0.80.

Availability and implementation: The corpus and source code are freely available for download at https://github.com/UCDenver-ccp/Ignorance-Question-Work. The source code is implemented in Python.

Contact: Mayla.Boguslav@CUAnschutz.edu



Friday December 3, 2021 10:05am - 10:15am MST
Ballroom Salon 1

10:15am MST

OP 21 - The Egyptian Center for Genome and Microbiome Research: A Funded North-South Collaboration for Capacity Building and Genomics Research Sustainability
OP-21
The Egyptian Center for Genome and Microbiome Research: A Funded North-South Collaboration for Capacity Building and Genomics Research Sustainability

Presenting Author: Ramy Aziz, Faculty of Pharmacy, Cairo University

Co-Author(s):
Mostafa Elshahed, Oklahoma State University
Noha Youssef, Oklahoma State University
Aymen Yassin, Faculty of Pharmacy, Cairo University

Abstract: The widening scientific and technological gap between the Global North and the Global South is alarming. Bioinformatics has the potential to narrow that gap because it relies on human resources rather than costly equipment and supplies. However, bioinformatics research gains much power when investigators generate their own data (typically multi-omic data), which is economically challenging for low- and lower-middle-income countries. Key factors for successful capacity building in the Global South are technology transfer, repatriation of highly trained individuals, and genuine, synergistic, symbiotic North-South and South-South collaborations. Here, I present a successful example of North-South collaboration that aims to build capacity for genome and microbiome research at an academic institution in Egypt: “The Center for Genome and Microbiome Research” was established with short-term funding to build a sustainable, collaborative center of excellence for innovative research, led by US-based and repatriated Egyptian principal investigators together with a second generation of junior scientists from both countries. The center was founded with three main missions: (i) to conduct world-class research through continuous collaboration between the participating institutions; (ii) to provide training in microbiomics and genomics to a wide range of Egyptian and regional scientists; and (iii) to sustain itself by providing services to academic and industrial entities. Since it started in 2018, the project has progressed successfully, with major research projects such as the Nile River microbiome project, a hospital microbiome project, and various genome projects, in addition to services and training in different aspects of genome and microbiome sciences.



Friday December 3, 2021 10:15am - 10:25am MST
Ballroom Salon 1

10:25am MST

Break
Friday December 3, 2021 10:25am - 10:45am MST
Ballroom Foyer

10:45am MST

OP 22 - The Ramp Atlas: Facilitating tissue-specific ramp sequence analyses across humans and SARS-CoV-2
OP-22
The Ramp Atlas: Facilitating tissue-specific ramp sequence analyses across humans and SARS-CoV-2

Presenting Author: Justin Miller, University of Kentucky; Brigham Young University

Co-Author(s):
Taylor Meurs, Brigham Young University
Matthew Hodgman, University of Kentucky
Benjamin Song, Brigham Young University
Kyle Miller, Utah Valley University
Mark Ebbert, University of Kentucky
John Kauwe, Brigham Young University
Perry Ridge, Brigham Young University

Abstract: Ramp sequences are essential genetic regulatory regions that counterintuitively slow initial translation, which ultimately maximizes overall translational efficiency by evenly spacing ribosomes and limiting downstream ribosomal collisions. Because relative codon adaptiveness differs widely between tissues, we predicted that the presence of a gene-specific ramp sequence would change between tissues without any change in the underlying genetic code, ultimately resulting in differential tissue-specific gene expression. Here, we present the first comprehensive analysis of tissue-specific ramp sequences and report 3,108 genes with ramp sequences that change between tissues. The Ramp Atlas (https://ramps.byu.edu/), an accompanying web portal, shows that the presence of a ramp sequence significantly correlates with higher gene expression in the Functional Annotation of Mammalian Genomes (FANTOM5) dataset (odds = 1.1152; p-value = 3.00×10^-99), the Genotype-Tissue Expression (GTEx) Project dataset (odds = 1.1578; p-value = 9.48×10^-155), the Human Protein Atlas dataset (odds = 1.1947; p-value = 1.27×10^-306), and a consensus dataset (odds = 1.1477; p-value = 1.00×10^-254). We also identified seven SARS-CoV-2 genes and seven human SARS-CoV-2 entry factor genes with tissue-specific ramp sequences that are present more frequently in tissues the virus is known to infect (p-value = 0.009918), which may help explain viral proliferation within those tissues. The Ramp Atlas facilitates wider adoption and application of ramp sequences through interactive graphics and an online programmatic interface. We propose that future ramp sequence calculations consider the ramp sequence variability that may occur within an organism based on tissue-specific codon optimality, and that variable ramp sequences may be an additional mechanism for regulating tissue- and cell-type-specific differential gene expression that warrants further exploration.
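The association results above are reported as odds values slightly above 1. As a reminder of what such a value means, the minimal Python sketch below computes an odds ratio and a Woolf (log-scale) 95% confidence interval from a 2x2 table of gene counts (ramp vs. no ramp, higher vs. not higher expression). The table layout and counts are hypothetical, and the abstract does not say which test produced the reported p-values, so this is not the authors' analysis.

```python
import math


def odds_ratio(a: int, b: int, c: int, d: int, z: float = 1.96) -> tuple[float, float, float]:
    """Odds ratio with a Woolf confidence interval for a 2x2 table (hypothetical layout).

                    higher expression   not higher
        ramp               a                b
        no ramp            c                d
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell; apply a continuity correction first")
    or_value = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = math.exp(math.log(or_value) - z * se_log)
    upper = math.exp(math.log(or_value) + z * se_log)
    return or_value, lower, upper
```

An odds ratio of about 1.15 means the odds of higher expression are roughly 15% greater for genes with a ramp sequence; with tens of thousands of genes, even such a modest effect can reach very small p-values.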


Presenters

Justin Miller

University of Kentucky


Friday December 3, 2021 10:45am - 10:55am MST
Ballroom Salon 1

10:55am MST

OP 23 - The systematic assessment for the completeness of metadata information accompanying omics studies
OP-23
The systematic assessment for the completeness of metadata information accompanying omics studies

Presenting Author: Yu-Ning Huang, University of Southern California

Co-Author(s):
Serghei Mangul, University of Southern California
Anushka Rajesh, University of Southern California
Jieting Hu, University of Southern California
Ruiwei Guo, University of Southern California
Man Yee Wong, University of Southern California
Jiaqi Fu, University of Southern California
Elizabeth Ling, University of Southern California
Irina Nakashidze, Batumi Shota Rustaveli State University
Steven Beringer, University of Southern California
Aditya Sarkar, Indian Institute of Technology Mandi

Abstract: Genomic data are easily accessible owing to the ubiquity of public genomic repositories that allow researchers to share their study datasets. However, improperly annotated and incomplete metadata accompanying the raw data make it almost impossible for researchers to reuse the data directly from public repositories for secondary analysis, which may slow the progress of biomedical discovery. Our study aims to assess the completeness of metadata accompanying omics studies, both in the publications and in their associated online repositories, and to identify how data sharing could be made more reliable. The study began with a literature survey to find studies in seven therapeutic fields: sepsis, tuberculosis, cystic fibrosis, cardiovascular disease, acute myeloid leukemia, inflammatory bowel disease, and Alzheimer’s disease. We used computational tools (Python scripts) to extract metadata from the public repositories, manually checked the availability of metadata in both publications and repositories, and then summarized and visualized the results statistically. By comparing metadata availability across the two sources (original publications and online repositories), we observed discrepancies between the omics data and the corresponding metadata on public repositories. Based on our results, we advocate a standardized "checklist" for researchers submitting study results and data to public repositories. Our study opens a comprehensive discussion of this potential solution for bridging the gap between omics data and metadata in repositories.
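The comparison described above (which metadata fields are present in the publication versus the repository record) can be sketched in a few lines. The minimal Python sketch below scores completeness against a checklist and lists fields reported in a publication but missing from its repository record. The checklist fields, the placeholder values treated as missing, and the function names are hypothetical; they are not the fields or scripts used in the study.

```python
# Hypothetical checklist; a real one would be defined per data type and field.
REQUIRED_FIELDS = ("organism", "tissue", "sex", "age", "disease_status", "sequencing_platform")
MISSING_VALUES = (None, "", "NA")  # placeholders treated as "not reported"


def has_field(record: dict, field: str) -> bool:
    """True if the record reports a non-placeholder value for the field."""
    return record.get(field) not in MISSING_VALUES


def completeness(record: dict) -> float:
    """Fraction of checklist fields reported in a metadata record."""
    present = sum(has_field(record, f) for f in REQUIRED_FIELDS)
    return present / len(REQUIRED_FIELDS)


def discrepancies(publication: dict, repository: dict) -> list[str]:
    """Checklist fields reported in the publication but missing from the repository."""
    return [f for f in REQUIRED_FIELDS
            if has_field(publication, f) and not has_field(repository, f)]
```

For a publication that reports all six fields and a repository record that reports only organism and tissue (with sex given as "NA"), the repository completeness is 2/6 and the discrepancies are sex, age, disease_status, and sequencing_platform.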


Presenters

Yu-Ning Huang

University of Southern California


Friday December 3, 2021 10:55am - 11:05am MST
Ballroom Salon 1

11:05am MST

OP 24 - A Tale of Two Systems: Influenza and COVID-19
OP-24
A Tale of Two Systems: Influenza and COVID-19

Presenting Author: Christian Forst, Icahn School of Medicine at Mount Sinai, New York, NY

Abstract: The ongoing SARS-CoV-2 pandemic threatens public health and the economy, urging the scientific community to join efforts in the search for cures. Influenza and COVID-19 are both respiratory diseases caused by airborne RNA viruses, both trigger a massive interferon response as the host's first line of defense against infection, and scientific advances from studying influenza infection have the potential to benefit the search for cures for SARS-CoV-2 infection. Indeed, growing evidence shows that co-infections of influenza and SARS-CoV-2 were common during the pandemic and carry a greater risk of poor outcomes, and that influenza A pre-infection promotes SARS-CoV-2 infectivity, potentially by increasing ACE2 expression. Here we present a comprehensive, multi-scale network analysis of the systems response to the virus. We have developed methods that integrate single-cell and bulk transcriptomic data, and we further related these integrated data to clinical outcomes. With this approach we were able to identify cell-population-specific key regulators and host processes that are hijacked by the virus to its advantage and that contribute to the severity of these infectious diseases.


Friday December 3, 2021 11:05am - 11:15am MST
Ballroom Salon 1

11:15am MST

OP 25 - Genomic epidemiology of the SARS-CoV-2 Delta variant in Arizona USA
OP-25
Genomic epidemiology of the SARS-CoV-2 Delta variant in Arizona USA

Presenting Author: Matthew Scotch, Arizona State University

Co-Author(s):
Temitope Faleye, Arizona State University
Arvind Varsani, Arizona State University
Rolf Halden, Arizona State University

Abstract: We examined the genomic epidemiology of the delta variant of SARS-CoV-2 in Arizona to understand its evolution, introduction, and subsequent spread in the State.

We downloaded full genomes from GISAID, keeping sequences with Arizona county metadata along with a collection of representative global sequences. We trimmed the 5’ and 3’ UTRs, used NextClade to filter out poor-quality consensus sequences, and then removed sequences from counties with low sample sizes. We used BEAST v1.10 to create global and intra-state phylogeographic models under different tree priors, molecular clocks, and DNA substitution models. We included a generalized linear model (GLM) to quantify the importance of predictors of virus spread. Using Tracer v1.7.1, we examined the ESS values of model parameters and checked for convergence. Using SpreaD3 v0.9.7.1, we calculated Bayes factors (BF) for the best-supported transmission rates between Arizona counties.

We found that Maricopa County was the source of the most highly supported transmission to other Arizona counties, even after downsampling to reduce bias. Of the 14 routes with BF ≥ 3, Maricopa County was the origin of the top five. Seven-day rolling averages of both cases and deaths in the county of origin were the best predictors of SARS-CoV-2 spread. Our analysis found several independent introductions of the Delta variant into Arizona from North America and Europe.

Results highlight the importance of genomic sequencing and genomic epidemiology for monitoring local spread of public health threats, including variants of SARS-CoV-2.
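Two simple computations in this abstract are the seven-day rolling averages of cases and deaths used as GLM predictors and the BF ≥ 3 cutoff for calling a transmission route well supported. The minimal Python sketch below shows both, assuming a trailing (not centered) window; the route labels and Bayes factor values in any usage are hypothetical placeholders. It does not reproduce the BEAST GLM or the SpreaD3 Bayes factor calculation.

```python
from typing import Optional


def trailing_7day_mean(daily: list[float]) -> list[Optional[float]]:
    """Trailing seven-day mean of daily counts; None until seven days are available."""
    out: list[Optional[float]] = []
    window_sum = 0.0
    for i, value in enumerate(daily):
        window_sum += value
        if i >= 7:
            window_sum -= daily[i - 7]  # drop the day that just left the window
        out.append(window_sum / 7 if i >= 6 else None)
    return out


def well_supported_routes(bayes_factors: dict[tuple[str, str], float],
                          threshold: float = 3.0) -> list[tuple[str, str]]:
    """(origin, destination) routes with BF >= threshold, strongest first."""
    kept = [route for route, bf in bayes_factors.items() if bf >= threshold]
    return sorted(kept, key=lambda route: bayes_factors[route], reverse=True)
```

For instance, a series of seven days at 7 cases followed by one day at 14 gives rolling averages of 7.0 and then 8.0, and a hypothetical set of routes {("County A", "County B"): 12.0, ("County B", "County C"): 2.0, ("County A", "County C"): 5.0} keeps only the two routes from County A, ordered by BF.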


Presenters

Matthew Scotch

Professor, Arizona State University
Matthew Scotch is Professor of Biomedical Informatics in the College of Health Solutions and Assistant Director of the Biodesign Center for Environmental Health Engineering at Arizona State University. His research focuses on genomic epidemiology and bioinformatics of RNA viruses with a particular interest in influenza A viruses. Current projects include studying approaches to advance genomic epidemiology by enrichment of virus sequence metadata (funding...


Friday December 3, 2021 11:15am - 11:25am MST
Ballroom Salon 1

11:25am MST

OP 26 - Predicting protein level changes from transcript level data
OP-26
Predicting protein level changes from transcript level data

Presenting Author: Edward Lau, University of Colorado School of Medicine

Abstract: Proteins perform the majority of biological functions. It follows that gene signatures from transcriptomics data would differ in biological relevance depending on how well they predict protein levels. We revisit how well transcript-level changes predict protein-level changes at gene-wise granularity, using current sequencing and mass spectrometry data sets and comparing several statistical learning approaches. The result adds to emerging evidence for a biological basis of RNA-protein non-correlation that varies by cellular component and pathway. We identified proteins whose levels are nonlinearly related to their transcript levels, as well as proteins that are better predicted by the transcripts of other genes than by their own. We propose a strategy to analyze and prioritize transcript signatures in RNA sequencing data and apply it to examine striated muscle aging mechanisms.


Friday December 3, 2021 11:25am - 11:35am MST
Ballroom Salon 1

11:35am MST

Box Lunch Pick up (For those who purchased in advance)
Box lunches are available for pickup at the conference registration desk for those who purchased tickets in advance.

Friday December 3, 2021 11:35am - 11:50am MST
Ballroom Foyer

11:35am MST

Ski Break

Friday December 3, 2021 11:35am - 4:00pm MST
Personal Time

4:00pm MST

Keynote 4 - Mapping and Navigating the Human Regulatory Genome - Wouter Meuleman, PhD
WOUTER MEULEMAN, PhD
Principal Investigator
Altius Institute for Biomedical Sciences
United States

Biography (web)

Mapping and Navigating the Human Regulatory Genome


This year marks the 20th anniversary of the sequencing of the human genome in 2001. Since then, many large-scale data generation and analysis efforts have built upon this work by producing genome-wide maps and annotations. Most recently, we have systematically delineated and annotated accessible DNA elements in the human genome by integrating more than 700 genome-wide maps of chromatin accessibility, resulting in a single high-definition annotation. Additionally, we have developed simple information-theoretic metrics (epilogos) to integrate chromatin state data across nearly 1,000 cell types and states. Despite these and many other efforts, systems for efficiently navigating genomic maps at scale are still lacking. At the same time, consumer-facing web businesses such as Zillow, Spotify, and Amazon have long understood the value of learning from patterns collected across large corpora of data to better serve customers, maximize investment returns, and prioritize future directions. This gap between current practice in genomics and its ultimate potential is the overarching motivation for our work. We are coupling massive amounts of genomics data to powerful recommendation engines and related machine learning approaches to generate insights not otherwise obtainable. These ideas represent an essential and inevitable transition towards “augmented genomics”, a new field in which the work of genome scientists is supplemented (not replaced!) by data-driven machine intelligence.

Presenters

Wouter Meuleman

Principal Investigator, Altius Institute for Biomedical Sciences
Wouter Meuleman’s research focuses on how the regulatory genome is organized and what the functional implications of this organization are. His long-term research goal is to make “augmented genomics” a reality: a new field in which the work of genome scientists is supplemented...



Friday December 3, 2021 4:00pm - 4:30pm MST
Ballroom Salon 1

4:30pm MST

OP 27 - Morphology and gene expression profiling provide complementary information for mapping cell state

Presenting Author: Gregory Way, University of Colorado Anschutz

Co-Author(s):
Ted Natoli, Broad Institute of MIT and Harvard
Adeniyi Adeboye, Broad Institute of MIT and Harvard
Lev Litichevskiy, Broad Institute of MIT and Harvard
Andrew Yang, Broad Institute of MIT and Harvard
Xiaodong Lu, Broad Institute of MIT and Harvard
Juan Caicedo, Broad Institute of MIT and Harvard
Beth Cimini, Broad Institute of MIT and Harvard
Kyle Karhohs, Broad Institute of MIT and Harvard
David Logan, Pfizer
Mohammad Rohban, Imaging Platform
Maria Kost-Alimova, Center for the Development of Therapeutics
Kate Hartland, Center for the Development of Therapeutics
Michael Bornholdt, Imaging Platform
Niranj Chandrasekaran, Imaging Platform
Marzieh Haghighi, Imaging Platform
Shantanu Singh, Imaging Platform
Aravind Subramanian, Cancer Program
Anne Carpenter, Imaging Platform

Abstract: Deep profiling of cell states can provide a broad picture of the biological changes that occur in disease, after mutation, or in response to drug or chemical treatments. Morphological and gene expression profiling, for example, can cost-effectively capture thousands of features in thousands of samples across perturbations, but it is unclear to what extent the two modalities capture overlapping versus complementary mechanistic information. Here, using the L1000 and Cell Painting assays to profile gene expression and cell morphology, respectively, we perturb A549 lung cancer cells with 1,327 small molecules from the Drug Repurposing Hub at six doses. We determine that the two assays capture both shared and complementary information for mapping cell state. Compared with L1000, Cell Painting captures a higher proportion of reproducible compounds and mechanisms and yields more diverse samples, but measures fewer distinct groups of features. In a deep learning analysis, L1000 predicted more compound mechanisms of action (MOA). Overall, the two assays together provide a complementary view of drug mechanisms for follow-up analyses. Our analysis answers fundamental biological questions comparing the two modalities and, given the numerous applications of profiling in biology, provides guidance for planning experiments that profile cells to detect distinct cell types, disease phenotypes, and responses to chemical or genetic perturbations.
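The abstract compares the assays by how many compounds yield reproducible profiles but does not state the exact criterion. A minimal sketch, assuming a "percent replicating" style test: a compound counts as reproducible when the median pairwise Pearson correlation of its replicate profiles exceeds the 95th percentile of a null distribution built from non-replicate pairs. The function names, the nearest-rank percentile, and the toy profiles are illustrative assumptions, not the authors' pipeline.

```python
import math
import statistics
from itertools import combinations
from typing import Dict, List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation between two equal-length feature profiles."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (norm_a * norm_b)


def median_replicate_correlation(replicates: List[Sequence[float]]) -> float:
    """Median pairwise correlation among one compound's replicate profiles."""
    return statistics.median(pearson(a, b) for a, b in combinations(replicates, 2))


def fraction_reproducible(
    compounds: Dict[str, List[Sequence[float]]],
    null_correlations: Sequence[float],
    quantile: float = 0.95,
) -> float:
    """Fraction of compounds whose median replicate correlation exceeds the
    null distribution's `quantile` (nearest-rank percentile; an assumption)."""
    ordered = sorted(null_correlations)
    threshold = ordered[math.ceil(quantile * len(ordered)) - 1]
    hits = sum(
        median_replicate_correlation(reps) > threshold for reps in compounds.values()
    )
    return hits / len(compounds)
```

Running the same function on L1000 and Cell Painting profiles of the same compounds would give one number per assay that can be compared directly.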


Presenters

Gregory Way

Assistant Professor, University of Colorado Anschutz


Friday December 3, 2021 4:30pm - 4:40pm MST
Ballroom Salon 1

4:40pm MST

OP 28 - Improving the interpretability of random forest models of genetic association in the presence of non-additive interactions

Presenting Author: Alena Orlenko, University of Pennsylvania

Co-Author(s):
Jason Moore, University of Pennsylvania

Abstract: Non-additive interactions among genes are frequently associated with a number of phenotypes, including complex diseases such as Alzheimer’s disease, diabetes, and cardiovascular disease. Detecting these interactions requires careful selection of analytical methods, because some machine learning algorithms are unable, or underpowered, to detect or model feature interactions that exhibit non-additivity. The Random Forest (RF) method is often used in these efforts because it can detect and model non-additive interactions. RF can also estimate feature importance scores natively, which allows the model to be interpreted in terms of the rank order and effect size of each feature's association with the outcome. This is especially important in epidemiological and clinical studies, where the results of predictive modeling may guide the future direction of research. Alternative ways to interpret the model include permutation feature importance, which measures how much predictive performance drops when a feature's values are randomly shuffled, and Shapley additive explanations (SHAP), which draw on cooperative game theory. Currently, it is unclear which RF feature importance metric best estimates the true informative contribution of features in genetic association analysis.

To address this issue and improve the interpretability of RF predictions, we compared different methods for feature importance estimation on real and simulated datasets with non-additive interactions. We detected a discrepancy between the metrics on the real-world datasets and further established that the permutation feature importance metric provides more precise feature importance rankings for the simulated datasets with non-additive interactions.
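Permutation feature importance can be illustrated without a full random forest. The sketch below is a minimal, stdlib-only version: the fitted model is replaced by a hypothetical `xor_model` that has learned a pure XOR interaction between two binary features (neither is informative on its own), and a third feature is noise. Encoding genotypes as 0/1 rather than 0/1/2 allele counts is a simplifying assumption.

```python
import random
from typing import Callable, List, Sequence

Row = List[int]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[Row], int],
    X: List[Row],
    y: Sequence[int],
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Mean drop in accuracy when each feature column is shuffled independently."""
    rng = random.Random(seed)
    baseline = accuracy(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the outcome
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, [predict(row) for row in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy data: outcome is the XOR of features 0 and 1; feature 2 is noise.
data_rng = random.Random(42)
X = [[data_rng.randint(0, 1) for _ in range(3)] for _ in range(400)]
y = [row[0] ^ row[1] for row in X]


def xor_model(row: Row) -> int:
    """Hypothetical stand-in for a fitted RF that has captured the interaction."""
    return row[0] ^ row[1]


importances = permutation_importance(xor_model, X, y)
print(importances)
```

Shuffling either interacting feature drops accuracy to roughly chance, while the unused feature scores exactly zero, which is why this metric can rank interacting features that have no marginal effect.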


Presenters

Alena Orlenko

University of Pennsylvania


Friday December 3, 2021 4:40pm - 4:50pm MST
Ballroom Salon 1

4:50pm MST

OP 29 - Germline modifiers of the tumor immune microenvironment reveal drivers of immunotherapy response

Presenting Author: Meghana Pagadala, UCSD

Co-Author(s):
Victoria Wu, Moores Cancer Center
Hyo Kim, UCSD
Andrea Castro, UCSD
James Talwar, UCSD
Cristian Gonzalez-Colin, La Jolla Institute of Immunology
Steven Cao, UCSD
Benjamin Schmiedel, La Jolla Institute of Immunology
Shervin Goudarzi, Canyon Crest Academy
Divya Kirani, UCSD
Rany Salem, UCSD
Gerald Morris, UCSD
Olivier Harismendy, Moores Cancer Center
Sandip Patel, Moores Cancer Center
Jill Mesirov, UCSD
Maurizio Zanetti, Moores Cancer Center
Chi-Ping Day, National Institutes of Health
Chun Fan, UCSD
Wesley Thompson, UCSD
Glenn Merlino, National Institutes of Health
Eva Pérez-Guijarro, National Institutes of Health
J Silvio Gutkind, Moores Cancer Center
Pandurangan Vijayanand, La Jolla Institute of Immunology
Hannah Carter, UCSD

Abstract: With the continued promise of immunotherapy as an avenue for treating cancer, understanding how host genetics shapes the tumor immune microenvironment (TIME) is essential to tailoring cancer risk screening and treatment strategies. Using genotypes from over 8,000 individuals of European ancestry in The Cancer Genome Atlas and 137 heritable tumor immune phenotype components (IP components), we identified and investigated 532 TIME-SNPs. Focusing on 77 variants relevant to cancer risk, survival, or treatment response, we explored their potential to reveal novel targets for immunotherapy. Many variants overlapped regions with histone marks indicating active transcription and influenced gene activity in specific immune cell subsets, such as macrophages and dendritic cells. Genes implicated by TIME-SNPs, such as LAIR1, TREX1, CTSS, CTSW, and LILRB2, were differentially expressed between responders and non-responders to immune-checkpoint blockade (ICB) in preclinical studies. Of these, LILRB2 and LAIR1 have already been identified as putative targets for immunotherapy. Here we found that inhibiting CTSS led to better tumor control and survival in murine models, alone or in combination with anti-PD-1. Collectively, we show that an integrative approach can link host genetics to TIME characteristics, informing novel biomarkers for cancer risk and target identification in immunotherapy.


Friday December 3, 2021 4:50pm - 5:00pm MST
Ballroom Salon 1

5:00pm MST

OP 30 - Genome Skimming by Shotgun Sequencing to Address Longstanding Questions of Lichen Species Diversity

Presenting Author: Jeffrey Clancy, Brigham Young University

Co-Author(s):
Steve Leavitt, Brigham Young University

Abstract: With advances in sequencing technologies, genome-scale data now play a pivotal role in resolving long-standing evolutionary and taxonomic questions. Genomic data can be particularly useful in non-model and understudied organismal groups, providing novel insight into questions that previously remained in the realm of speculation. Here we explore the utility of genome data for inferring species boundaries and evolutionary relationships in a symbiotic fungal species occurring in alpine and polar habitats worldwide. The nominal taxon is morphologically variable, and the interpretation of this variation has been debated for over 200 years. We sampled over 300 specimens from populations worldwide and delimited candidate species partitions using the standard DNA barcoding marker for fungi. Representative samples were then selected for high-throughput short-read shotgun sequencing to validate their evolutionary independence using thousands of independent gene regions. We generated alignments for 1,209 single-copy nuclear genes (2.27 Mb total), in addition to an alignment spanning most of the mitochondrial genome (65.4 kb). Over 70 candidate species were inferred from the fungal DNA barcoding data, and these species hypotheses were consistently supported by the genome-scale data. This phylogenomic validation approach provides compelling evidence that the DNA-based candidate species represent evolutionarily distinct species-level lineages. Although these lineages were diagnosable using the standard DNA barcoding marker for fungi, high levels of phenotypic variation were commonly observed among specimens within candidate species-level partitions, highlighting the limited utility of traditional taxonomic approaches. Genome skimming can provide powerful data to help resolve longstanding questions of species boundaries in taxonomically challenging groups.



Friday December 3, 2021 5:00pm - 5:10pm MST
Ballroom Salon 1

5:10pm MST

Break
Friday December 3, 2021 5:10pm - 5:30pm MST
Ballroom Foyer

5:30pm MST

OP 31 - Response Signatures of SARS-CoV2 Infection Identified Using Deep Learning Approaches on Single Cell RNA-Seq Data

Presenting Author: Mario Flores, University of Texas at San Antonio

Abstract: One of the mysteries of Coronavirus Disease 2019 (COVID-19) is why some people suffer severe symptoms, even life-threatening complications, while others have no symptoms or only mild ones. Several studies have related the severity of COVID-19 to features of the immune system that make some groups more vulnerable to this viral infection. The goal of this study is to elucidate the response signatures of COVID-19 infection by identifying gene markers and activation patterns of cells in patients with different degrees of severity. Specifically, single-cell RNA-Seq (scRNA-Seq) datasets from severe and mild cases were compared with those from uninfected individuals using a deep learning approach.


Presenters

Mario Flores

Assistant Professor, University of Texas at San Antonio


Friday December 3, 2021 5:30pm - 5:40pm MST
Ballroom Salon 1

5:40pm MST

OP 32 - Multispecies cities in the Anthropocene: bioremediation and biomining potential of the Gowanus Canal Microbiome, an urban Superfund site
OP-32
Microbial survival in the Anthropocene: Bioremediation and biomining potential of a superfund site - the Gowanus Canal

Presenting Author: Chandrima Bhattacharya, Weill Cornell Medicine

Co-Author(s):
Rupobrata Panja, CSIR-Institute of Minerals and Materials Technology
Ian Quate
Matthew Seibert
Ellen Jorgensen
Christopher Mason, Weill Cornell Medicine
Sergios-Orestis Kolokotronis, SUNY
Elizabeth Henaff, NYU

Abstract: The environment of the Gowanus Canal in New York City is emblematic of the many post-industrial Superfund sites across the country. Many of these locations were once important hubs for manufacturing or research and development and have since been abandoned, leaving a legacy of toxicity and pollutants not only in the canal itself but also in the surrounding areas. We explore microbial bioremediation of hazardous polluted sites as a promising field of study, especially because these microbes can potentially be mined for novel secondary metabolites, for molecules related to microbial multidrug resistance, and for species with extreme adaptability. We present the largest metagenomic analysis of Gowanus Canal sediment, comprising both a longitudinal study and a depth-based study. We identify extremophiles as well as marine and freshwater sediment species and demonstrate enrichment of bioremediation-related metabolic pathways. These include pathways for remediating industrial pollutants of historical significance to the area's industrialization, such as heavy metals and organic pollutants. We also identify a cluster of genes related to antimicrobial resistance in the Canal microbiome. Our findings on the Gowanus Canal microcosm open the way to discovering novel species and secondary metabolites from biosynthetic gene clusters in other extreme environments. We conclude that microbes associated with extreme environments, including Superfund sites, can adapt not only to remediate and clean up hazardous material but also to produce secondary metabolites of potential biological significance that could make life better in the Anthropocene.



Friday December 3, 2021 5:40pm - 5:50pm MST
Ballroom Salon 1

5:50pm MST

OP 33 - Higher-order Markov models for metagenomic sequence classification

Presenting Author: Rajeev Azad, University of North Texas

Co-Author(s):
David Burks, University of North Texas

Abstract: Alignment-free, stochastic models derived from k-mer distributions of reference genome sequences have a rich history in DNA sequence classification. In particular, variants of Markov models have been used extensively. Higher-order Markov models, however, have been used with caution and perhaps sparingly, primarily because of insufficient training data and computational power. Advances in sequencing technology and computation have since made it possible to exploit the predictive power of higher-order models. We therefore revisited higher-order Markov models and assessed their performance in classifying metagenomic sequences. We compared higher-order models (HOMs, 9th order or higher) with the interpolated Markov model, the interpolated context model, and lower-order models (8th order or lower) on metagenomic datasets constructed from sequenced prokaryotic genomes. Our results show that HOMs outperform the other models in classifying metagenomic fragments as short as 100 nt at all taxonomic ranks, and at lower ranks when the fragment size is increased to 250 nt. HOMs were also significantly more accurate than local alignment, which is widely relied upon for taxonomic classification of metagenomic sequences. A novel software implementation written in C++ performs classification faster than existing Markovian metagenomic classifiers and can therefore be used as a standalone classifier or in conjunction with existing taxonomic classifiers for more robust classification of metagenomic sequences. The software is available at https://github.com/djburks/SMM.
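The classification idea can be sketched in a few lines: train one k-th order Markov chain per reference taxon by counting which base follows each k-mer context, then assign a fragment to the taxon whose chain gives it the highest log-likelihood. This is an illustrative Python sketch, not the C++ SMM implementation linked above; the add-one smoothing, the class and function names, and the toy "genomes" are assumptions, and the demo uses order 3 only because the toy references are tiny (the abstract's HOMs are 9th order or higher).

```python
import math
from collections import defaultdict
from typing import DefaultDict, Dict

ALPHABET = "ACGT"


class MarkovModel:
    """k-th order Markov chain over DNA with add-one (Laplace) smoothing."""

    def __init__(self, order: int) -> None:
        self.order = order
        # context k-mer -> next base -> count
        self.counts: DefaultDict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))

    def fit(self, reference: str) -> "MarkovModel":
        k = self.order
        for i in range(k, len(reference)):
            self.counts[reference[i - k:i]][reference[i]] += 1
        return self

    def log_likelihood(self, fragment: str, pseudocount: float = 1.0) -> float:
        k = self.order
        total = 0.0
        for i in range(k, len(fragment)):
            # .get() avoids inserting unseen contexts into the defaultdict
            next_counts = self.counts.get(fragment[i - k:i], {})
            n_context = sum(next_counts.values())
            p = (next_counts.get(fragment[i], 0) + pseudocount) / (
                n_context + pseudocount * len(ALPHABET)
            )
            total += math.log(p)
        return total


def classify(fragment: str, models: Dict[str, MarkovModel]) -> str:
    """Assign the fragment to the taxon whose model gives the highest log-likelihood."""
    return max(models, key=lambda taxon: models[taxon].log_likelihood(fragment))


models = {
    "taxon_A": MarkovModel(3).fit("ACGT" * 50),
    "taxon_B": MarkovModel(3).fit("AATTGGCC" * 30),
}
print(classify("ACGTACGTACGT", models))
```

The number of contexts grows as 4^k, which is why higher orders only became practical once enough reference sequence and memory were available.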


Friday December 3, 2021 5:50pm - 6:00pm MST
Ballroom Salon 1

6:00pm MST

OP 34 - DEGAS: Mapping clinical metrics to spatial transcriptomics with deep learning

Presenting Author: Justin Couetil, Indiana University School of Medicine

Co-Author(s):
Justin Couetil, Indiana University School of Medicine
Jie Zhang, Indiana University School of Medicine
Kun Huang, Indiana University School of Medicine
Travis Johnson, Indiana University School of Medicine

Abstract: To search for links between cancer genotype and phenotype, we developed the DEGAS framework to map disease information onto spatially resolved tumors.

In the era of precision medicine, spatial transcriptomics (ST) offers a unique opportunity to characterize tumor morphology and transcriptional heterogeneity simultaneously. We train deep transfer learning networks on ST and bulk RNA-seq data with disease information (e.g., survival, treatment response, disease status, risk factors) to infer these characteristics spatially on the ST slide. Using breast cancer data from TCGA, normal tissue from GTEx, and three 10x Genomics ST datasets of breast ductal adenocarcinoma, we identify high-risk regions of tumor tissue that align with 76-84% of the clusters derived from ST data alone. This shows that we can infer clinical attributes while preserving the transcriptional differences within the ST slide.

Our methodology includes gold-standard preprocessing, feature selection, model training, post-processing, and data visualization tools. This represents a robust framework for using clinical data to identify regions of tumor that may reflect resistance to certain therapies, carry certain mutations, or have RNA signatures corresponding to lifestyle risk factors such as smoking. As spatial transcriptomics becomes higher-resolution and less costly, we hope our framework can serve as a “spotlight” that shows researchers which subpopulations and spatial organizations of tumor cells may contribute to a patient’s clinical trajectory.

We plan to develop multimodal DEGAS models, allowing researchers to use this framework to link clinical phenotype to genomic (e.g. circulating tumor DNA), histologic, transcriptional, and proteomic data.



Friday December 3, 2021 6:00pm - 6:10pm MST
Ballroom Salon 1

6:10pm MST

6:30pm MST

Poster Presentations and Reception
The Poster Reception will start at 6:30 pm after we rearrange the room following the last abstract talk on this day.  

Poster List without Abstracts
Poster List with Abstracts  

FOR PRESENTERS

Poster Session Hours
The Poster Session with the authors present will be on Friday evening. Poster Presenters must be available for presentation during the scheduled poster session.

POSTER NUMBER ASSIGNMENTS
Posters are assigned to numbers here:  Poster List without Abstracts

POSTER SIZE
The poster board dimensions are 4 feet high x 4 feet wide. Tacks will be provided for securing your poster to the board.
Schedule:
Friday, Dec 3, 12pm – 6pm: SET UP POSTERS (maximum size 4 feet high x 4 feet wide)
Location: Viceroy Hotel Ballroom

Friday, Dec 3, 6:30pm – 8:30pm: POSTER SESSION
Location: Viceroy Hotel Ballroom
  • 6:30pm – 7:30pm: Authors with even-numbered posters present
  • 7:30pm – 8:30pm: Authors with odd-numbered posters present
  • 8:30pm: Authors, please remove posters from the boards at the end of this session
FURTHER QUESTIONS

rocky@iscb.org


Friday December 3, 2021 6:30pm - 8:30pm MST
Ballroom Salon 3
 
Saturday, December 4
 

8:00am MST

Breakfast
Saturday December 4, 2021 8:00am - 9:00am MST
Ballroom Foyer

8:00am MST

Registration
Saturday December 4, 2021 8:00am - 11:00am MST
Viceroy - Lower Level

9:00am MST

Keynote 5 - NLP Sandbox: Overcoming Data Access Barriers to Reliably Assess the Performance of NLP Tools - Thomas Schaffter, PhD
THOMAS SCHAFFTER, PhD
Lead of Research & Benchmarking Technology Workstream
Senior Bioinformatics & Full Stack Engineer
Sage Bionetworks
United States

LinkedIn Profile


Critical patient information derived from academic research, health care, and clinical trials is off-limits for traditional data-to-model benchmarking of NLP tools, in which data are transferred or downloaded into a new environment to be colocated with the executable model. Existing barriers include restricted access to prohibitively large or sensitive data. In addition to data access constraints, we also lack effective frameworks for assessing the performance and generalizability of NLP tools.

The NLP Sandbox adopts a model-to-data approach that enables NLP developers to assess the performance of their tools on public and private datasets. When a developer submits a tool, partner organizations (e.g., hospitals, universities) automatically provision the tool, run it, and evaluate its performance against their private data in a secure environment. Upon successful completion, the partner organization reports the tool's performance, and this report is automatically published to the NLP Sandbox leaderboards.

The first series of NLP tasks that the NLP Sandbox supports is the annotation of Protected Health Information (PHI) in clinical notes. These tasks have been identified through our collaboration with the National Center for Data to Health (CD2H). Submitted tools are currently evaluated on the dataset of the 2014 i2b2 NLP De-identification Challenge and private data from MCW. Additional data sites are currently being onboarded (Mayo Clinic, UW).
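In a model-to-data setup, only summary scores need to leave the partner site. A minimal sketch of such a scoring step for PHI annotation, assuming exact-match, span-level precision, recall, and F1 (the abstract does not say which metric the NLP Sandbox uses); `span_prf` and the `(start, end, type)` span representation are hypothetical.

```python
from typing import Set, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, PHI type), e.g. (0, 8, "NAME")


def span_prf(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 between gold and predicted PHI spans."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Only these aggregate numbers, not the clinical notes, would be reported back.
gold = {(0, 8, "NAME"), (20, 30, "DATE")}
predicted = {(0, 8, "NAME"), (40, 45, "DATE")}
print(span_prf(gold, predicted))
```

Partial-overlap or token-level matching are common alternatives and would give different scores on the same predictions.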


Presenters

Thomas Schaffter

Lead of the Research & Benchmarking Technology Workstream, Sage Bionetworks
Celebrating 10+ years of contributions to the DREAM Challenges and to the development of research and benchmarking technologies for biomedical computational tools.



Saturday December 4, 2021 9:00am - 9:30am MST
Ballroom Salon 1

9:30am MST

OP 35 - Scientific Reproducibility and Research Automation with QIIME 2 Provenance Replay

Presenting Author: Christopher Keefe, Northern Arizona University

Co-Author(s):
J. Gregory Caporaso, Northern Arizona University

Abstract: Bioinformatics workflows are often complex, consisting of dozens or hundreds of processes, with variation possible in input data, method selection, and parameterization at each step. This complexity creates known challenges for study organization, reporting, and reproducibility. Reproducibility of analyses is further complicated by the diversity of computer and software systems, differences between which may prevent successful study replication. QIIME 2, a prominent free and open platform for microbiome science, packages the full history (i.e., “provenance”) of every analysis result within the result itself, including software versions, methods, parameters, and user-provided metadata.

Here we present software tools for reproducibility and automation, which use this provenance data to validate results and replay analyses. Beginning with any QIIME 2 Result, checksum-based validation allows us to confirm the integrity of the data artifact or visualization. “Provenance replay” produces executable files capable of replicating the Result in question from the original input data, providing a robust tool for methods reproducibility. These executables may also be applied directly to similarly structured data, modified, or extended, supporting results reproducibility and generalization. This facilitates the automation of repeated analyses and reduces record-keeping, training, and communication burdens in collaborative research contexts. The software, a work in progress, will be extended to allow reproduction of full computational analyses from collections of data artifacts and visualizations. Additional benefits include the ability to output full-analysis citation lists, identify results impacted by known software bugs, and conduct meta-analyses of research methods.
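Checksum-based validation works by recording a digest for every file when a result is written and recomputing the digests later. A minimal sketch of that idea on an unpacked directory, assuming MD5 digests and a plain dictionary manifest; the function names and layout are hypothetical and make no claim about QIIME 2's actual archive format or API.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def md5_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """MD5 digest of a file, read in chunks so large files fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Record a digest for every file under `root`, keyed by relative POSIX path."""
    return {
        p.relative_to(root).as_posix(): md5_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def validate(root: Path, manifest: Dict[str, str]) -> List[str]:
    """Return files that are missing, altered, or not listed in the manifest."""
    current = build_manifest(root)
    problems = [rel for rel, digest in manifest.items() if current.get(rel) != digest]
    problems += [rel for rel in current if rel not in manifest]
    return sorted(problems)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "data").mkdir()
        (root / "data" / "table.tsv").write_text("feature\tcount\nA\t1\n")
        manifest = build_manifest(root)
        print(validate(root, manifest))  # nothing flagged yet
        (root / "data" / "table.tsv").write_text("feature\tcount\nA\t2\n")
        print(validate(root, manifest))  # the edited file is flagged
```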


Saturday December 4, 2021 9:30am - 9:40am MST
Ballroom Salon 1

9:40am MST

OP 36 - Robustness and reproducibility of computational genomics tools

Presenting Author: Serghei Mangul, USC

Co-Author(s):
Pelin Icer Baykal, ETH Zurich

Abstract: Reproducibility and robustness are two important factors in assessing the accuracy of bioinformatics analyses with genomic tools. Assessing tools against these criteria requires repeating experiments across laboratory facilities, which is usually costly, time-consuming, and sometimes impossible. In this study, we report the development of CompRep, a novel, scalable method that generates computational replicates by altering the properties of sequencing data. Computational replicates are created by randomly shuffling the order of reads and by taking the reverse complement of the reads. Although our method cannot capture the full variability across real technical replicates, it provides a robust lower-bound estimate of the reproducibility of genomic tools.

We analyzed two groups of genomic tools: read alignment tools and structural variant (SV) detection tools. We observed that for some genomic tools, handling reverse-complemented data is much more challenging than handling randomly shuffled data. This analysis reveals substantial variability across genomic tools.

The model we propose will enable broad biomedical communities to easily assess the robustness and reproducibility of genomic tools, allowing them to choose tools that produce consistent results across technical replicates. Furthermore, our method will enable routine, large-scale robustness and reproducibility evaluation of newly published tools at no cost.
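The two transformations described above are simple to express. A minimal sketch, assuming reads are held in memory as `(name, sequence, quality)` tuples; the function names are hypothetical and this is not the CompRep code. A robust tool should give equivalent results on the original reads and on either replicate.

```python
import random
from typing import List, Tuple

Read = Tuple[str, str, str]  # (read name, sequence, per-base quality string)

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def shuffled_replicate(reads: List[Read], seed: int) -> List[Read]:
    """Same reads, new random order; the input list is left unchanged."""
    replicate = list(reads)
    random.Random(seed).shuffle(replicate)
    return replicate


def revcomp_replicate(reads: List[Read]) -> List[Read]:
    """Reverse-complement every read; qualities are reversed to stay aligned with bases."""
    return [(name, reverse_complement(seq), qual[::-1]) for name, seq, qual in reads]


reads = [("r1", "ACGTT", "IIIIF"), ("r2", "GGA", "FFI")]
print(shuffled_replicate(reads, seed=1))
print(revcomp_replicate(reads))
```

Both replicates contain exactly the same information as the input, so any change in a tool's output reflects the tool rather than the data.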


Presenters

Serghei Mangul

Assistant Professor, University of Southern California
Dr. Mangul is an Assistant Professor in the Department of Clinical Pharmacy at the University of Southern California School of Pharmacy. His lab designs, develops and applies novel and robust data-driven, computational approaches that will accelerate the diffusion of genomics and...


Saturday December 4, 2021 9:40am - 9:50am MST
Ballroom Salon 1

9:50am MST

OP 37 - Define and visualize pathological architectures of human tissues from spatially resolved transcriptomics using deep learning

Presenting Author: Yuzhou Chang, The Ohio State University

Co-Author(s):
Fei He, Northeast Normal University
Fei He, School of Information Science and Technology
Juexin Wang, University of Missouri
Dong Xu, University of Missouri

Abstract: Spatially resolved transcriptomics provides a new way to define spatial contexts and understand biological functions in complex diseases. Although some computational frameworks can characterize spatial context through various clustering methods, detailed spatial architectures and functional zonation often cannot be revealed and localized because these methods have limited capacity to incorporate spatial information. We present RESEPT, a deep-learning framework for characterizing and visualizing tissue architecture from spatially resolved transcriptomics. Given gene expression or RNA velocity as input, RESEPT learns a three-dimensional embedding from the spatial transcriptomics data with a spatial retained graph neural network. The embedding is then visualized by mapping its dimensions to the color channels of an RGB image, which is segmented with a supervised convolutional neural network model. On a benchmark of sixteen 10x Genomics Visium spatial transcriptomics datasets from the human cortex, RESEPT accurately infers and visualizes the tissue architecture. Notably, for the in-house Alzheimer's disease (AD) samples, RESEPT can localize cortical layers and cell types based on predefined region- or cell-type-specific genes and further provides critical insights into the identification of amyloid-beta plaques. Interestingly, in a glioblastoma sample, RESEPT distinguishes tumor-enriched regions, non-tumor regions, and regions of neuropil with infiltrating tumor cells, supporting clinical and prognostic cancer applications.
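Turning a three-dimensional embedding into an image amounts to treating each embedding dimension as one color channel. A minimal sketch, assuming independent min-max scaling of each dimension to 0-255 (the abstract does not specify RESEPT's scaling); `embedding_to_rgb` is a hypothetical name, and placing each pixel at its spot's slide coordinates is left out.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]


def embedding_to_rgb(embedding: Sequence[Sequence[float]]) -> List[Pixel]:
    """Min-max scale each of the three embedding dimensions independently to 0..255."""
    if any(len(vector) != 3 for vector in embedding):
        raise ValueError("expected a 3-dimensional embedding per spot")
    lows = [min(v[c] for v in embedding) for c in range(3)]
    highs = [max(v[c] for v in embedding) for c in range(3)]
    pixels: List[Pixel] = []
    for v in embedding:
        channels = []
        for c in range(3):
            span = highs[c] - lows[c]
            scaled = (v[c] - lows[c]) / span if span else 0.0  # constant channel -> 0
            channels.append(round(255 * scaled))
        pixels.append((channels[0], channels[1], channels[2]))
    return pixels


spots = [[0.0, 0.0, 0.0], [1.0, 2.0, 3.0], [0.2, 0.4, 0.6]]
print(embedding_to_rgb(spots))
```

Spots with similar embeddings get similar colors, so tissue regions appear as contiguous color patches that an image segmentation network can delineate.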


Saturday December 4, 2021 9:50am - 10:00am MST
Ballroom Salon 1

10:00am MST

OP 38 - Identifying Viruses from Host Genomes and Deep Learning for Prediction of Viral Integration Sites

Presenting Author: Zhongming Zhao, University of Texas Health Science Center at Houston

Abstract: Viral infections are common in nature. Detecting viruses in host genomes effectively and efficiently, and tracking how viruses interact with host genomes, are major challenges. We recently developed an algorithm called VERSE (Virus intEgration sites through iterative Reference SEquence customization), which can effectively detect viruses carrying viral mutations in next-generation sequencing data. VERSE improves detection by customizing reference genomes. Using 19 human tumors and cancer cell lines as test data, we demonstrated that VERSE substantially enhanced the sensitivity of virus integration site detection. VERSE has been used by large network projects such as the International Cancer Genome Consortium (ICGC; 25k whole-genome sequencing datasets). We next manually collected and curated viral integration sites (VISs; 77,632 sites in total) from published work and made them publicly available through VISDB (Viral Integration Site DataBase). Furthermore, we developed DeepVISP, a deep learning method for viral integration site prediction and motif discovery. DeepVISP is based on a deep convolutional neural network (CNN) model with an attention architecture. We demonstrated that DeepVISP can accurately predict oncogenic VISs in the human genome using our curated benchmark integration data for three viruses: hepatitis B virus (HBV), human papillomavirus (HPV), and Epstein-Barr virus (EBV). Compared with six classical machine learning methods, DeepVISP achieves higher accuracy and more robust performance for all three viruses by automatically learning informative features and essential genomic positions from the DNA sequences alone. A user-friendly web server is available for predicting putative oncogenic VISs in the human genome with DeepVISP.
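CNN models that learn "only from the DNA sequences" typically start by one-hot encoding the sequence and sliding learned filters along it. A minimal sketch of those two steps, with a hand-set filter standing in for a learned one; it does not reproduce DeepVISP's architecture or attention layer, and the function names and the "CG" filter are illustrative assumptions.

```python
from typing import List, Sequence

BASES = "ACGT"


def one_hot(seq: str) -> List[List[int]]:
    """L x 4 matrix (columns A, C, G, T); N or other characters become all-zero rows."""
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def conv1d_scores(
    encoded: Sequence[Sequence[int]], kernel: Sequence[Sequence[float]]
) -> List[float]:
    """Valid 1-D cross-correlation of a (width x 4) kernel over a one-hot sequence,
    as computed by a single convolutional filter without bias or activation."""
    width = len(kernel)
    return [
        sum(encoded[i + k][c] * kernel[k][c] for k in range(width) for c in range(4))
        for i in range(len(encoded) - width + 1)
    ]


# Hand-set filter that responds to the dinucleotide "CG".
cg_filter = [[0, 1, 0, 0], [0, 0, 1, 0]]
print(conv1d_scores(one_hot("ACGTCG"), cg_filter))
```

In a trained network, many such filters are learned from labeled integration and background sites, and high-scoring filter positions can be read out as candidate motifs.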


Presenters

Zhongming Zhao

Professor, University of Texas
I am doing bioinformatics and genomics work.


Saturday December 4, 2021 10:00am - 10:10am MST
Ballroom Salon 1

10:10am MST

Break
Saturday December 4, 2021 10:10am - 10:30am MST
Ballroom Foyer

10:30am MST

OP 39 - Quantification and visualization of the tumor microenvironment heterogeneity from spatial transcriptomic experiments

Presenting Author: Oscar Ospina, Moffitt Cancer Center

Co-Author(s):
Alex Soupir, Moffitt Cancer Center
Christopher Wilson, Moffitt Cancer Center
Anders Berglund, Moffitt Cancer Center
Inna Smalley, Moffitt Cancer Center
Kenneth Tsai, Moffitt Cancer Center

Abstract: Spatially resolved transcriptomics (ST) allows better assessment of tissue structure and function. In cancer research, ST promises to deepen our understanding of the tumor microenvironment and lead to improved cancer prognosis and therapies. We present spatialGE, an R package for visualizing and quantifying gene expression heterogeneity from ST experiments. Our software adapts geostatistical methods for 1) generating high-resolution gene expression surfaces via spatial interpolation and 2) quantifying spatial heterogeneity measures that can be compared against clinical information (e.g., patient survival). In addition, spatialGE includes 3) spot-level cell deconvolution methods; 4) a fast spatially informed clustering approach (STClust); and 5) a new data structure that allows multiple ST samples to be stored and analyzed simultaneously. To demonstrate the utility of spatialGE, we used a publicly available ST dataset from stage III melanoma lymph node biopsies [Thrane et al. (2018); Cancer Research]. Spatial variation in expression was observed in a number of genes, including key cancer- and immune-related genes such as PMEL and IGLL5. After applying deconvolution methods (e.g., xCell, ESTIMATE), B cells showed high spatial variation across the sampled locations. Moreover, the tissue sections with the most non-uniform spatial distributions of B cells (as quantified by Moran’s I and Geary’s C) came from the patient with the longest survival time. These results support the hypothesis that spatial heterogeneity in the tumor microenvironment is a potential predictor of patient outcomes.
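Moran’s I summarizes whether nearby spots have similar values (I > 0), dissimilar values (I < 0), or no spatial pattern (I near 0). It is defined as I = (N / W) · Σᵢⱼ wᵢⱼ (xᵢ − x̄)(xⱼ − x̄) / Σᵢ (xᵢ − x̄)², where W is the sum of all weights. A minimal sketch, assuming binary weights that link spots within a distance threshold; the weighting scheme is an assumption and spatialGE may define neighbors differently (Visium spots, for instance, sit on a hexagonal lattice).

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float]


def morans_i(values: Sequence[float], coords: Sequence[Coord], max_dist: float = 1.0) -> float:
    """Global Moran's I with binary weights: w_ij = 1 if spots i != j lie within max_dist."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    numerator = 0.0
    weight_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= max_dist:
                numerator += dev[i] * dev[j]
                weight_sum += 1.0
    denominator = sum(d * d for d in dev)
    return (n / weight_sum) * (numerator / denominator)


# 4 x 4 grid of spots with rook (up/down/left/right) neighbors.
grid = [(float(x), float(y)) for y in range(4) for x in range(4)]
checkerboard = [float((int(x) + int(y)) % 2) for x, y in grid]  # maximally dispersed
halves = [1.0 if x < 2 else 0.0 for x, y in grid]                # spatially clustered
print(morans_i(checkerboard, grid), morans_i(halves, grid))
```

Applied to a deconvolved cell-type score (e.g., B cell abundance per spot), a high positive I indicates that the cell type is concentrated in patches rather than spread evenly.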


Presenters
Oscar Ospina

Moffitt Cancer Center


Saturday December 4, 2021 10:30am - 10:40am MST
Ballroom Salon 1

10:40am MST

OP 40 - Inferring Pediatric Sickle Disease Genotypes from Molecular Mechanistic Knowledge

Presenting Author: Tiffany Callahan, University of Colorado Anschutz Medical Campus

Co-Author(s):
Jordan M. Wyrwa, University of Colorado Anschutz Medical Campus
William A. Baumgartner Jr, University of Colorado Anschutz Medical Campus
Lawrence E Hunter, University of Colorado Anschutz Medical Campus
Michael G Kahn, University of Colorado Anschutz Medical Campus

Abstract: Morbidity and mortality from sickle cell disease (SCD) vary widely. Effectively treating SCD requires genotype information. Electronic health records are a valuable source of both individual- and population-level data, but most do not contain genomic data. The objective of this work was to examine whether Med2Mech, a joint learning framework for inferring molecular characterizations of patients from clinical data and publicly available biomedical data, could be used to detect SCD genotypes. Clinical data were obtained for 2,646 pediatric rare disease (816 SCD) and 10,000 control patients from the Children's Hospital of Colorado (CHCO). Genotype data were obtained for 198 (51 HbSC, 147 HbSS) pediatric SCD patients from the Gene Expression Omnibus (GEO). Patient representations were built using Med2Mech, and Kruskal-Wallis nonparametric ANOVAs were used to determine whether the mean-rank cosine similarity of the CHCO patient groups to the SCD GEO patients differed. Results revealed that CHCO SCD patients were significantly more similar to GEO patients with their respective genotypes than were other rare disease and control patients (HbSS [n=454]: χ²(3) = 80,760.30, p < 0.001; HbSC [n=347]: χ²(3) = 27,820.50, p < 0.001). Further, the genotypes inferred by Med2Mech revealed that 14.4% of CHCO SCD patients had at least one potentially erroneous diagnosis and 35.3% had no occurrence of any relevant primary diagnosis. These preliminary findings support using Med2Mech to infer, from publicly available resources, important patient-level data such as genotypes that would otherwise be unavailable.
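The comparison described above can be sketched in two steps: cosine similarity between patient embedding vectors, then a Kruskal-Wallis H test on those similarity values across patient groups. This is a hedged illustration only. The embeddings, the single reference profile, and the group labels are made-up stand-ins for Med2Mech representations (which are not reproduced here), the function names are hypothetical, and the H statistic below omits the tie correction.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def rank_with_ties(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """H = 12/(N(N+1)) * sum(R_g^2 / n_g) - 3(N+1), with df = len(groups) - 1."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = rank_with_ties(pooled)
    total, start = 0.0, 0
    for g in groups:
        total += sum(ranks[start:start + len(g)]) ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1), len(groups) - 1

# Hypothetical 2-D embeddings compared against one hypothetical genotype reference.
reference = [1.0, 0.2]
groups = {
    "SCD": [[0.9, 0.3], [1.0, 0.1], [0.8, 0.2]],
    "other rare disease": [[0.4, 0.9], [0.5, 0.8], [0.6, 0.7]],
    "control": [[0.1, 1.0], [0.0, 0.9], [0.2, 1.0]],
}
sims = [[cosine_similarity(p, reference) for p in pts] for pts in groups.values()]
h, df = kruskal_wallis_h(sims)
print(f"H({df}) = {h:.2f}")  # prints H(2) = 7.20 (groups perfectly separated by rank)
```

Because the test works on ranks, it asks whether one group's similarities tend to sit higher than another's, which matches the "mean rank" wording of the abstract; under the null hypothesis H approximately follows a χ² distribution with df degrees of freedom.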


Presenters
Tiffany Callahan

PhD Student, University of Colorado
Computational Biologist, data scientist, and knowledge engineer, interested in pursuing opportunities at the intersection of computer science, natural language processing, and machine learning. My PhD thesis leverages graph representation learning and probabilistic reasoning of biological...


Saturday December 4, 2021 10:40am - 10:50am MST
Ballroom Salon 1

10:50am MST

OP 41 - COPD subtypes identified from blood RNA-seq data using single sample gene network perturbations

Presenting Author: Panayiotis (Takis) Benos, University of Pittsburgh

Co-Author(s):
Kristina Buschur, Columbia University
Craig Riley, University of Pittsburgh
Aabida Saferali, Brigham and Women's Hospital
Peter Castaldi, Brigham and Women's Hospital
Grace Zhang, University of Pittsburgh
R. Graham Barr, Columbia University
Frank Sciurba, University of Pittsburgh
Craig Hersh, Brigham and Women's Hospital

Abstract: Chronic Obstructive Pulmonary Disease (COPD) diagnosis is based on spirometric measures. However, COPD is heterogeneous in the rate of progression, response to treatment, and symptom burden. Identifying COPD subtypes from easily accessible tissue is thus very important for disease management.

A network perturbation approach was used to identify gene expression network changes in single samples from COPD patients, and the resulting single-sample perturbation vectors were used to cluster patients into subtypes. We identified four COPD subtypes in a training cohort of 617 former smokers from COPDGene. The four subtypes differ in symptom severity, clinical characteristics, and mortality. Two of the clusters are considered "mild" but differ in corticosteroid use; another contains the most severe patients, while the last is "intermediate". These results were validated in a second COPDGene cohort (n=769). Additionally, we identified several significantly dysregulated genes across subtypes, including DSP and GSTM1, which have previously been associated with COPD through GWAS. These findings may constitute a significant step towards COPD subtyping, and the identified subtypes could be used to stratify new patients and inform disease prognosis.
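The abstract does not spell out how the single-sample perturbations are computed or which clustering algorithm was used. As a hedged illustration only, the sketch below swaps in LIONESS (Linear Interpolation to Obtain Network Estimates for Single Samples), a different published single-sample method: with N samples, the network for sample q is estimated as e_q = N(e_all - e_without_q) + e_without_q, using Pearson co-expression edges. Treating e_q - e_all as the "perturbation vector", and clustering those vectors with plain k-means, are assumptions made here for illustration; all function names and the toy matrix are hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def coexpression_edges(samples):
    """Pearson r for every gene pair; `samples` is a list of per-sample expression vectors."""
    genes = list(zip(*samples))  # genes[g] = expression of gene g across samples
    return [pearson(genes[a], genes[b]) for a, b in combinations(range(len(genes)), 2)]

def lioness_vectors(samples):
    """LIONESS-style edge estimates per sample, returned as e_q - e_all (assumed perturbation)."""
    n = len(samples)
    e_all = coexpression_edges(samples)
    vectors = []
    for q in range(n):
        e_minus = coexpression_edges(samples[:q] + samples[q + 1:])
        e_q = [n * (a - m) + m for a, m in zip(e_all, e_minus)]
        vectors.append([eq - a for eq, a in zip(e_q, e_all)])
    return vectors

def kmeans(points, k, iters=100):
    """Lloyd's k-means; the first k points seed the centroids so results are deterministic."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new.append([sum(col) / len(members) for col in zip(*members)] if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return labels

# Hypothetical expression matrix: 6 samples x 3 genes.
samples = [[1.0, 2.0, 3.0], [2.0, 1.0, 5.0], [3.0, 4.0, 2.0],
           [4.0, 3.0, 6.0], [5.0, 6.0, 1.0], [6.0, 5.0, 4.0]]
print(kmeans(lioness_vectors(samples), k=2))
```

Each sample ends up as a vector with one entry per gene pair, describing how that sample shifts the co-expression network away from the cohort-wide estimate; patients whose shifts look alike land in the same cluster.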



Presenters
Takis Benos

University of Pittsburgh


Saturday December 4, 2021 10:50am - 11:00am MST
Ballroom Salon 1

11:30am MST

Keynote 6 - Who Has long-Covid? A Small and Big Data Approach - Melissa Haendel, PhD
MELISSA HAENDEL, PhD, FACMI
University of Colorado
United States

Who Has long-Covid? A Small and Big Data Approach

Post-acute sequelae of SARS-CoV-2 infection, or long-COVID, have severely impacted recovery from the pandemic for patients and society alike. This new disease is characterized by evolving, heterogeneous symptoms, which not only make it a challenge to derive an unambiguous long-COVID definition but also hamper clinicians' ability to offer effective and timely treatment. Clinicians and patients report distinct albeit overlapping spectra of symptoms, making long-COVID classification difficult for diagnosis and care management; the clinical view alone is therefore incomplete. We have used the Human Phenotype Ontology to classify symptoms reported by patients and clinicians, which can provide subclasses of long-COVID and a foundation for improved patient diagnosis and care management. Electronic health records (EHRs) could also be a good source of data for rapidly identifying patients with long-COVID. However, these overlapping and incomplete spectra of symptoms make harvesting the correct data from heterogeneous EHR databases a significant challenge. Using the National COVID Cohort Collaborative’s (N3C) EHR repository, we developed XGBoost machine learning models to identify potential long-COVID patients. We examined demographics, healthcare utilization, diagnoses, and medications for 97,995 adult COVID-19 patients. Our models identified potential long-COVID patients with high accuracy; important features included the rate of healthcare utilization, patient age, dyspnea, and other diagnoses and medications. Combinatorial approaches such as those presented here are especially useful in the face of a new disease with different patient trajectories and few treatment options, and can provide the basis for research studies and treatment strategies.

Presenters
Melissa Haendel

Chief Research Informatics Officer, University of Colorado
Melissa Haendel is the Chief Research Informatics Officer at the University of Colorado Anschutz Medical Campus. She directs the Center for Data to Health (CD2H), the Monarch Initiative, and the National COVID Cohort Collaborative. Her background is molecular genetics and developmental...



Saturday December 4, 2021 11:30am - 12:00pm MST
Ballroom Salon 1

12:00pm MST

 
Rocky 2021: Dec 1 - 4, 2021, Snowmass Village, CO, USA