Towards Trustworthy AI-Generated Text Xiuying Chen, Ph.D. Student, Computer Science Mar 24, 15:00 - 17:00 B3 L5 R5209 language models AI-generated text The emergence of large language models in text generation has markedly transformed our technological environment, significantly impacting our daily digital interactions.
Student Poster Competition: The KAUST Research Conference on Computational Advances in Structural Biology Xin Gao, Program Chair, Computer Science May 2, 12:15 - 14:00 B9 Hallway structural biology The Computational Bioscience Research Center (CBRC) will be holding a student poster competition as part of its yearly conference. This poster competition is open to KAUST students whose research is relevant to the conference theme ‘Computational Advances in Structural Biology’. To register, please click on the register button below. Once registered, you will receive instructions and submission codes for uploading your poster electronically. The deadline for submissions is April 1, 2023. The poster competition will be judged by an external panel of experts, and prizes will be awarded
KAUST Research Conference 2023 on Computational Advances in Structural Biology Xin Gao, Program Chair, Computer Science May 1, 08:00 - May 3, 17:00 B4 B5 A0215 structural biology The Computational Bioscience Research Center (CBRC) is pleased to announce the KAUST Research Conference 2023 on Computational Advances in Structural Biology. The conference will be held from May 1-3, 2023 at the Auditorium between Buildings 4 and 5. The visualization of the structure of a molecule, organelle or larger entity is key to understanding its biological function. Thus, structural biology is an extremely powerful means of unraveling the fundamentals of life, and also of disease. Consequently, structural biology is at the heart of medical therapies, including drug design, and has enabled the
Drug Repositioning through the Development of Diverse Computational Methods using Machine Learning, Deep Learning, and Graph Mining Maha Thafar, Ph.D. Student, Computer Science Jun 30, 08:30 - 10:30 KAUST Computational biology machine learning Deep learning graph mining In this dissertation, we combined artificial intelligence and machine/deep learning with chemical and biological properties to develop several computational methods to solve biomedical domain problems, specifically drug repositioning, and demonstrated their efficiency and capability. We developed three network-based DTI prediction methods using machine learning, graph embedding, and graph mining. These methods significantly improved prediction performance, and the best-performing method reduced the error rate by more than 33% across all datasets compared to the best state-of-the-art method. As it is more insightful to predict continuous values that indicate how tightly the drug binds to a specific target, we conducted a comparison study of current regression-based methods that predict drug-target binding affinities (DTBA). Our methods demonstrated their efficiency and capability by achieving high prediction performance and identifying therapeutic targets for several cancer types. We further conducted a lung cancer case study whose findings support the novel predicted targets.
Towards Accurate Biomedical Genomics Anywhere Anytime - Public Colloquium Xin Gao, Program Chair, Computer Science Nov 24, 14:00 - 15:30 KAUST In this talk, I will first give an overview of the research activities in the Structural and Functional Bioinformatics Group (http://sfb.kaust.edu.sa). I will then focus on our efforts to develop computational methods to tackle key open problems in Nanopore sequencing. In particular, I will introduce our recent work on a collection of computational methods to decode raw electrical current signal sequences into DNA sequences, to simulate raw Nanopore signals, and to efficiently and accurately align electrical current signal sequences with DNA sequences. I will further introduce their applications in biomedicine and healthcare.
Machine Learning Models for Biomedical Ontology Integration and Analysis Fatima Zohra Smaili, Ph.D. Student, Computer Science Sep 3, 16:00 - 17:00 KAUST Biological knowledge is widely represented in the form of ontologies and ontology-based annotations. The structure and information contained in ontologies and their annotations make them valuable for use in machine learning, data analysis and knowledge extraction tasks. In this thesis, we propose the first approaches that can exploit all of the information encoded in ontologies, both formal and informal, to learn feature embeddings of biological concepts and biological entities based on their annotations to ontologies by applying transfer learning on the literature. To optimize learning that combines ontologies and natural language data such as the literature, we also propose a new approach that uses self-normalization with a deep Siamese neural network to improve learning from both the formal knowledge within ontologies and textual data. We validate the proposed algorithms by applying them to generate feature representations of proteins, and of genes and diseases.
Novel computational methods for promoter identification and analysis Ramzan Umarov, Ph.D. Student, Computer Science Feb 16, 16:00 - 18:00 B2 L5 R5209 machine learning promoters protein coding RNA genes TSS prediction tools In this dissertation, I present the methods I have developed for prediction of promoters for different organisms. Instead of focusing on the classification accuracy of the discrimination between promoter and non-promoter sequences, I predict the exact positions of the TSS inside the genomic sequences, testing every possible location. The developed methods significantly outperform the previous promoter prediction programs by considerably reducing the number of false positive predictions. Specifically, to reduce the false positive rate, the models are adaptively and iteratively trained by changing the distribution of samples in the training set based on the false positive errors made in the previous iteration. The new methods are used to gain insights into the design principles of the core promoters. Using model analysis, I have identified the most important core promoter elements and their effect on the promoter activity. I have developed a novel general approach to detect long range interactions in the input of a deep learning model, which was used to find related positions inside the promoter region. The final model was applied to the genomes of different species without a significant drop in the performance, demonstrating a high generality of the developed method.
Computational Analysis of Transcriptional Regulation after Single and Multiple Drug Administration. Trisevgeni Rapakoulia, Ph.D., Computer Science Jul 17, 10:00 - 12:00 B3 L5 R5209 Transcriptomics RNA genes drug effects breast cancer cell conversion With the advances in transcriptomic analysis, the monitoring of genome-wide gene expression provides a powerful approach for determining the action of drugs. In this thesis, we analyzed the transcriptional responses of cells treated with drugs either alone or in combinations to explore their effects in two different applications: breast cancer therapy and cell conversion.
Promoters identification and analysis Ramzan Umarov, Ph.D. Student, Computer Science Mar 7, 16:00 - 18:00 B1 L2 R2202 promoters protein coding RNA genes TSS prediction tools A promoter is a key region involved in the differential transcription regulation of protein-coding and RNA genes. The gene-specific architecture of promoter sequences makes it extremely difficult to devise a general strategy for their computational identification.
AI4GH Seminar Series - Computational Modeling of Malaria Metabolism Reveals Different Stages and Species Nutrient Preferences and Drug Targets Alyaa M Mohamed, Ph.D., Bioscience Nov 25, 12:00 - 13:00 B2 R5220 genome plasmodial infections Malaria kills nearly half a million people a year, and over 1 billion people are at risk of becoming infected by the parasite. Plasmodial infections are difficult to treat for a myriad of reasons, but the ability of the organism to remain latent in hosts and its complex life cycles contribute greatly to the difficulty of treating malaria.
AI4GH - Computational Modeling of Malaria Metabolism Reveals Different Stages and Species Nutrient Preferences and Drug Targets Alyaa M Mohamed, Ph.D., Bioscience Nov 25, 12:00 - 13:00 B2 R5220 Malaria kills nearly half a million people a year, and over 1 billion people are at risk of becoming infected by the parasite. Plasmodial infections are difficult to treat for a myriad of reasons, but the ability of the organism to remain latent in hosts and its complex life cycles contribute greatly to the difficulty of treating malaria.
AI4GH - Vector Representation of Biological Entities Based on Ontologies and Their Annotations Fatima Zohra Smaili, Ph.D. Student, Computer Science Nov 18, 12:00 - 13:00 B2 R5220 Biological knowledge is widely represented in the form of ontology-based annotations: ontologies describe the phenomena assumed to exist within a domain, and the annotations associate a biological entity with a set of phenomena within the domain.
AI4GH Seminar Series - Towards Rational Design of Biosynthesis Pathways Meshari Alazmi, Ph.D., Computer Science Nov 11, 12:00 - 13:00 B2 R5220 machine learning bioinformatics structural biology systems biology Recent advances in genome editing and metabolic engineering have enabled the precise construction of de novo biosynthesis pathways for high-value natural products. One important design decision in engineering heterologous biosynthesis systems is which foreign metabolic genes to introduce into a given host organism.
AI4GH Seminar Series - Genome-scale Regression Analysis Reveals a Linear Relationship for Promoters and Enhancers After Combinatorial Drug Treatment Trisevgeni Rapakoulia, Ph.D., Computer Science Nov 7, 12:00 - 13:00 B2 R5220 machine learning bioinformatics Drug Combinations drug effects cancer Drug combination therapy for the treatment of cancers and other multifactorial diseases has the potential of increasing the therapeutic effect while reducing the likelihood of drug resistance. In order to reduce the time and cost spent on comprehensive screens, methods are needed which can model additive effects of possible drug combinations.
Towards rational design of biosynthesis pathways Meshari Alazmi, Ph.D., Computer Science Oct 24, 17:00 - 18:30 B3 R5209 machine learning bioinformatics structural biology systems biology Recent advances in genome editing and metabolic engineering have enabled the precise construction of de novo biosynthesis pathways for high-value natural products. One important design decision in engineering heterologous biosynthesis systems is which foreign metabolic genes to introduce into a given host organism.
Neural Inductive Matrix Factorization for Predicting Disease-Gene Associations Siqing Hou, M.S., Computer Science Apr 18, 10:00 - 11:30 B3 R5208 bioinformatics machine learning Disease-Gene Associations In silico prioritization of undiscovered associations can help find causal genes of newly discovered diseases. Some existing methods are based on known associations and side information of diseases and genes. We explore the possibility of using a neural network model, Neural Inductive Matrix Completion (NIMC), for disease-gene prediction.
Computational and Statistical Interface to Big Data Xin Gao, Program Chair, Computer Science Mar 19, 08:00 - Mar 21, 17:00 B9 L2 H2 We are now in the fourth paradigm of science: Data Science. The massive amount of structured and unstructured data has posed new challenges and opportunities to the fields of computer science and statistics. Traditional computational and statistical methods for data storage, curation, sharing, querying, updating, visualization, analysis, and privacy have been shown to fail in the big data scenario due to the unprecedented volume, velocity, variety, veracity and value of big data. This conference will bring together a number of prominent researchers in Computer Science and Statistics with common interests and active research in big data, as well as researchers at KAUST who regularly generate or face big data, such as those in bioscience and Red Sea research.
Mining Genome-Scale Growth Phenotype Data through Constant-Column Biclustering Majed Alzahrani, Ph.D., Computer Science May 17, 15:00 - 17:00 B3 L5 R5209 data mining machine learning Computational biology Growth phenotype profiling of genome-wide gene-deletion strains across stress conditions can offer a clear picture of how the essentiality of genes depends on environmental conditions. In this dissertation, we first demonstrate that detecting such "co-fit" gene groups can be cast as a less well-studied problem in biclustering, i.e., constant-column biclustering. Despite significant advances in biclustering techniques, very few were designed for mining growth phenotype data.