Marcet-Houben M, Księżopolska E, Gabaldón T.
BMC Genomics. 2024; 25 (1)
DOI: 10.1186/s12864-024-10979-8
Abstract
Background
The Nakaseomyces clade is formed by at least nine described species among which three can be pathogenic to humans, namely Nakaseomyces glabratus (Candida glabrata), the second most-common cause of candidiasis worldwide, and two rarer emerging pathogens: Nakaseomyces (Candida) nivarensis and Nakaseomyces (Candida) bracarensis. Early comparative genomics analyses identified parallel expansions of subtelomeric adhesin genes in N. glabratus and N. nivarensis/bracarensis, and suggested possible links with the emergence of the virulence potential in these species. However, as shown for N. glabratus, the proper assessment of subtelomeric genes is hindered by the use of incomplete assemblies and reliance on a single isolate.
Results
Here we sequenced seven N. bracarensis isolates and reconstructed chromosome level assemblies of two divergent strains. We show that N. bracarensis isolates belong to two diverging clades that have slightly different genomic structures. We identified the set of encoded adhesins in the two complete assemblies, and uncovered the presence of a novel adhesin motif, found mainly in N. bracarensis. Our analysis revealed a larger adhesin content in N. bracarensis than previously reported, and similar in size to that of N. glabratus. We confirm the independent adhesin expansion in these two species, which could relate to their different levels of virulence.
Conclusion
N. bracarensis clinical isolates belong to at least two differentiated clades. We describe a novel repeat motif found in N. bracarensis adhesins, which helps in their identification. Adhesins underwent independent expansions in N. glabratus and N. bracarensis, leading to repertoires that are qualitatively different but quantitatively similar. Given that adhesins are considered virulence factors, some of the observed differences could contribute to variations in virulence capabilities between N. glabratus and N. bracarensis.
Muntión S, Sánchez-Luis E, Díez-Campelo M, Blanco JF, Sánchez-Guijo F, De Las Rivas J.
Int J Mol Sci. 2024; 25 (22)
DOI: 10.3390/ijms252211906
Abstract
In this paper, we present a comparative analysis of the transcriptomic profile of three different human cell types: hematopoietic stem cells (HSCs), bone marrow-derived mesenchymal stem cells (MSCs) and fibroblasts (FIBs). The work aims to identify unique genes that are differentially expressed as specific markers of bone marrow-derived MSCs, and to achieve this undertakes a detailed analysis of three independent datasets that include quantification of the global gene expression profiles of three primary cell types: HSCs, MSCs and FIBs. A robust bioinformatics method, called GlobalTest, is used to assess the specific association between one or more genes expressed in a sample and the outcome variable, that is, the 'cell type' provided as a single univariate response. This outcome variable is predicted for each sample tested, based on the expression profile of the specific genes that are used as input to the test. The precision of the tests is calculated along with the statistical sensitivity and specificity for each gene in each dataset, yielding four genes that mark MSCs with high accuracy. Among these, the best performer is the protein-coding gene Transgelin (TAGLN, Gene ID: 6876) (with a Positive Predictive Value > 0.96 and FDR < 0.001), which identifies MSCs better than any of the currently used standard markers: ENG (CD105), THY1 (CD90) or NT5E (CD73). The results are validated by RT-qPCR, providing novel gene biomarkers specific for human MSCs.
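As a minimal illustration of the per-gene metrics mentioned in this abstract (positive predictive value, sensitivity and specificity), the following Python sketch computes them from a 2x2 confusion matrix. The function name and the counts are hypothetical and are not taken from the paper; this is not the GlobalTest procedure itself, only the arithmetic behind the reported metrics.

```python
def marker_metrics(tp, fp, tn, fn):
    """Return PPV, sensitivity and specificity for a binary marker call.

    tp/fp/tn/fn are counts of true positives, false positives,
    true negatives and false negatives (hypothetical inputs).
    """
    ppv = tp / (tp + fp)            # fraction of positive calls that are correct
    sensitivity = tp / (tp + fn)    # fraction of true MSC samples recovered
    specificity = tn / (tn + fp)    # fraction of non-MSC samples correctly rejected
    return {"ppv": ppv, "sensitivity": sensitivity, "specificity": specificity}


# Hypothetical counts, for illustration only.
example = marker_metrics(tp=48, fp=2, tn=98, fn=2)
```

With these made-up counts, the PPV is 48/50, the sensitivity is 48/50 and the specificity is 98/100.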
Feys S, Cardinali-Benigni M, Lauwers HM, Jacobs C, Stevaert A, Gonçalves SM, Cunha C, Debaveye Y, Hermans G, Heylen J, Humblet-Baron S, Lagrou K, Maessen L, Meersseman P, Peetermans M, Redondo-Rios A, Seldeslachts L, Starick MR, Thevissen K, Vande Velde G, Vandenbriele C, Vanderbeke L, Wilmer A, Naesens L, van de Veerdonk FL, Van Weyenbergh J, Gabaldón T, Wauters J, Carvalho A.
Am J Respir Crit Care Med. 2024; 210 (10)
DOI: 10.1164/rccm.202401-0145oc
Abstract
Rationale: The influence of the lung bacterial microbiome, including potential pathogens, in patients with influenza-associated pulmonary aspergillosis (IAPA) or coronavirus disease (COVID-19)-associated pulmonary aspergillosis (CAPA) has yet to be explored. Objectives: To explore the composition of the lung bacterial microbiome and its association with viral and fungal infection, immunity, and outcome in severe influenza versus COVID-19 with or without aspergillosis. Methods: We performed a retrospective study in mechanically ventilated patients with influenza and COVID-19 with or without invasive aspergillosis in whom BAL for bacterial culture (with or without PCR) was obtained within 2 weeks after ICU admission. In addition, 16S rRNA gene sequencing data and viral and bacterial load of BAL samples from a subset of these patients, and of patients requiring noninvasive ventilation, were analyzed. We integrated 16S rRNA gene sequencing data with existing immune parameter datasets. Measurements and Main Results: Potential bacterial pathogens were detected in 20% (28/142) of patients with influenza and 37% (104/281) of patients with COVID-19, whereas aspergillosis was detected in 38% (54/142) of patients with influenza and 31% (86/281) of patients with COVID-19. A significant association between bacterial pathogens in BAL fluid and 90-day mortality was found only in patients with influenza, particularly patients with IAPA. Patients with COVID-19, but not patients with influenza, showed increased proinflammatory pulmonary cytokine responses to bacterial pathogens. Conclusions: Aspergillosis is more frequently detected in the lungs of patients with severe influenza than bacterial pathogens. Detection of bacterial pathogens associates with worse outcome in patients with influenza, particularly in those with IAPA, but not in patients with COVID-19. The immunological dynamics of tripartite viral-fungal-bacterial interactions deserve further investigation.
Langschied F, Bordin N, Cosentino S, Fuentes-Palacios D, Glover N, Hiller M, Hu Y, Huerta-Cepas J, Coelho LP, Iwasaki W, Majidian S, Manzano-Morales S, Persson E, Richards TA, Gabaldón T, Sonnhammer E, Thomas PD, Dessimoz C, Ebersberger I.
Genome Biol Evol. 2024; 16 (10)
DOI: 10.1093/gbe/evae224
Abstract
The era of biodiversity genomics is characterized by large-scale genome sequencing efforts that aim to represent each living taxon with an assembled genome. Generating knowledge from this wealth of data has not kept up with this pace. We here discuss major challenges to integrating these novel genomes into a comprehensive functional and evolutionary network spanning the tree of life. In summary, the expanding datasets create a need for scalable gene annotation methods. To trace gene function across species, new methods must seek to increase the resolution of ortholog analyses, e.g. by extending analyses to the protein domain level and by accounting for alternative splicing. Additionally, the scope of orthology prediction should be pushed beyond well-investigated proteomes. This demands the development of specialized methods for the identification of orthologs to short proteins and noncoding RNAs and for the functional characterization of novel gene families. Furthermore, protein structures predicted by machine learning are now readily available, but this new information is yet to be integrated with orthology-based analyses. Finally, an increasing focus should be placed on making orthology assignments adhere to the findable, accessible, interoperable, and reusable (FAIR) principles. This fosters green bioinformatics by avoiding redundant computations and helps integrating diverse scientific communities sharing the need for comparative genetics and genomics information. It should also help with communicating orthology-related concepts in a format that is accessible to the public, to counteract existing misinformation about evolution.
Malmierca-Merlo P, Sánchez-Garcia R, Grillo-Risco R, Pérez-Díez I, Català-Senent JF, de la Iglesia-Vayá M, Hidalgo MR, Garcia-Garcia F.
Biol Sex Differ. 2024; 15 (1)
DOI: 10.1186/s13293-024-00646-8
Cantarero-Cuenca A, Gonzalez-Jimenez A, Martínez-Núñez GM, Garrido-Sánchez L, Ranea JAG, Tinahones FJ.
J Mol Med (Berl). 2024; 102 (11)
DOI: 10.1007/s00109-024-02475-z
Abstract
Epigenetic alterations play a pivotal role in conditions influenced by environmental factors such as overweight and obesity. Many of these changes are tissue-specific, which complicates their study, since obtaining human tissue is a complex and invasive practice. While blood is widely used as a surrogate biomarker, evidence found in blood cannot be directly extrapolated to tissue. Moreover, the intricacies of metabolic diseases add a new layer of complexity, as obesity leads to significant alterations in adipose tissue, potentially causing associated pathologies that can disrupt existing correlations seen in healthy individuals. Here, our objective was to determine which epigenetic markers exhibit correlations between blood and adipose tissue, regardless of the metabolic status. We collected paired blood and adipose tissue samples from 64 patients, with morbid obesity or non-obese, and employed the MethylationEPIC 850K array for analysis. We found that only a small fraction, specifically 4.3% (corresponding to 34,825 CpG sites), of the sites showed statistically significant correlations (R ≥ 0.6) between blood and adipose tissue. Within this subset, 5327 CpG sites exhibited a strong correlation (R ≥ 0.8) between blood and adipose tissue. Our findings suggest that the majority of epigenetic markers in peripheral blood do not reliably reflect changes occurring in visceral adipose tissues. However, it is important to note that there exists a distinct set of epigenetic markers that can indeed mirror changes in adipose tissue within blood samples. KEY MESSAGES: More than 8% of methylation sites exhibit similarity between blood and adipose tissues, regardless of BMI. The correlation percentage between blood and adipose tissue is strongly influenced by gender. The principal genes implicated in this correlation are related to metabolism or the immunological system.
Malmierca-Merlo P, Sánchez-Garcia R, Grillo-Risco R, Pérez-Díez I, Català-Senent JF, de la Iglesia-Vayá M, Hidalgo MR, Garcia-Garcia F.
Biol Sex Differ. 2024; 15 (1)
DOI: 10.1186/s13293-024-00640-0
Abstract
Background
While sex-based differences in various health scenarios have been thoroughly acknowledged in the literature, we lack sufficient tools and methods that allow for an in-depth analysis of sex as a variable in biomedical research. To fill this knowledge gap, we created MetaFun as an easy-to-use web-based tool to meta-analyze multiple transcriptomic datasets with a sex-based perspective to gain major statistical power and biological soundness.
Description
MetaFun is a complete suite that allows the analysis of transcriptomics data and the exploration of the results at all levels, performing single-dataset exploratory analysis, differential gene expression, gene set functional enrichment, and finally, combining results in a functional meta-analysis. Which biological processes, molecular functions or cellular components are altered in a common pattern in different transcriptomic studies when comparing male and female patients? This and other biological questions of interest can be answered with the use of MetaFun. This tool is available at https://bioinfo.cipf.es/metafun while additional help can be found at https://gitlab.com/ubb-cipf/metafunweb/-/wikis/Summary.
Conclusions
Overall, MetaFun is the first open-access web-based tool to identify consensus biological functions across multiple transcriptomic datasets, helping to elucidate sex differences in numerous diseases. Its use will facilitate the generation of novel biological knowledge that can be used in the research and application of Personalized Medicine considering the sex of patients.
Andreu Z, Hidalgo MR, Masiá E, Romera-Giner S, Malmierca-Merlo P, López-Guerrero JA, García-García F, Vicent MJ.
Cell Mol Life Sci. 2024; 81 (1)
DOI: 10.1007/s00018-024-05403-z
Abstract
Identifying novel breast cancer biomarkers will improve patient stratification, enhance therapeutic outcomes, and help develop non-invasive diagnostics. We compared the proteomic profiles of whole-cell and exosomal samples of representative breast cancer cell subtypes to evaluate the potential of extracellular vesicles as non-invasive disease biomarkers in liquid biopsies. Overall, differentially-expressed proteins in whole-cell and exosome samples (which included markers for invasion, metastasis, angiogenesis, and drug resistance) effectively discriminated subtypes; furthermore, our results confirmed that the proteomic profile of exosomes reflects breast cancer cell-of-origin, which underscores their potential as disease biomarkers. Our study will contribute to identifying biomarkers that support breast cancer patient stratification and developing novel therapeutic strategies. We include an open, interactive web tool to explore the data as a molecular resource that can explain the role of these protein signatures in breast cancer classification.
Casimiro-Soriguer CS, Pérez-Florido J, Robles EA, Lara M, Aguado A, Rodríguez Iglesias MA, Lepe JA, García F, Pérez-Alegre M, Andújar E, Jiménez VE, Camino LP, Loruso N, Ameyugo U, Vazquez IM, Lozano CM, Chaves JA, Dopazo J.
Sci Rep. 2024; 14 (1)
DOI: 10.1038/s41598-024-70107-0
Abstract
The One Health approach, recognizing the interconnectedness of human, animal, and environmental health, has gained significance amid emerging zoonotic diseases and antibiotic resistance concerns. This paper aims to demonstrate the utility of a collaborative tool, the SIEGA, for monitoring infectious diseases across domains, fostering a comprehensive understanding of disease dynamics and risk factors, highlighting the pivotal role of One Health surveillance systems. Raw whole-genome sequencing data are processed through different species-specific open software tools that additionally report the presence of genes associated with antimicrobial resistance and virulence. The SIEGA application is a Laboratory Information Management System that allows customizing reports, detecting transmission chains, and promptly alerting on alarming genetic similarities. The SIEGA initiative has successfully accumulated a comprehensive collection of more than 1900 bacterial genomes, including Salmonella enterica, Listeria monocytogenes, Campylobacter jejuni, Escherichia coli, Yersinia enterocolitica and Legionella pneumophila, showcasing its potential in monitoring pathogen transmission, resistance patterns, and virulence factors. SIEGA enables customizable reports and prompt detection of transmission chains, highlighting its contribution to enhancing vigilance and response capabilities. Here we show the potential of genomics in One Health surveillance when supported by an appropriate bioinformatic tool. By facilitating precise disease control strategies and antimicrobial resistance management, SIEGA enhances global health security and reduces the burden of infectious diseases. The integration of health data from humans, animals, and the environment, coupled with advanced genomics, underscores the importance of a holistic One Health approach in mitigating health threats.
Bars-Cortina D, Ramon E, Rius-Sansalvador B, Guinó E, Garcia-Serrano A, Mach N, Khannous-Lleiffe O, Saus E, Gabaldón T, Ibáñez-Sanz G, Rodríguez-Alonso L, Mata A, García-Rodríguez A, Obón-Santacana M, Moreno V.
BMC Genomics. 2024; 25 (1)
DOI: 10.1186/s12864-024-10621-7
Abstract
Background
Gut dysbiosis has been associated with colorectal cancer (CRC), the third most prevalent cancer in the world. This study compares microbiota taxonomic and abundance results obtained by 16S rRNA gene sequencing (16S) and whole shotgun metagenomic sequencing to investigate their reliability for bacteria profiling. The experimental design included 156 human stool samples from healthy controls, advanced (high-risk) colorectal lesion patients (HRL), and CRC cases, with each sample sequenced using both 16S and shotgun methods. We thoroughly compared both sequencing technologies at the species, genus, and family annotation levels, the abundance differences in these taxa, sparsity, alpha and beta diversities, ability to train prediction models, and the similarity of the microbial signature derived from these models.
Results
As expected, the results showed that 16S detects only part of the gut microbiota community revealed by shotgun, although some genera were only profiled by 16S. The 16S abundance data was sparser and exhibited lower alpha diversity. In lower taxonomic ranks, shotgun and 16S highly differed, partially due to a disagreement in reference databases. When considering only shared taxa, the abundance was positively correlated between the two strategies. We also found a moderate correlation between the shotgun and 16S alpha-diversity measures, as well as their PCoAs. Regarding the machine learning models, only some of the shotgun models showed some degree of predictive power in an independent test set, but we could not demonstrate a clear superiority of one technology over the other. Microbial signatures from both sequencing techniques revealed taxa previously associated with CRC development, e.g., Parvimonas micra.
Conclusions
Shotgun and 16S sequencing provide two different lenses to examine microbial communities. While we have demonstrated that they can unravel common patterns (including microbial signatures), shotgun often gives a more detailed snapshot than 16S, both in depth and breadth. In contrast, 16S will tend to show only part of the picture, giving greater weight to dominant bacteria in a sample. Therefore, we recommend choosing one or another sequencing technique before launching a study. Specifically, shotgun sequencing is preferred for stool microbiome samples and in-depth analyses, while 16S is more suitable for tissue samples and studies with targeted aims.
Carceller H, Hidalgo MR, Escartí MJ, Nacher J, de la Iglesia-Vayá M, García-García F.
Biol Sex Differ. 2024; 15 (1)
DOI: 10.1186/s13293-024-00635-x
Abstract
Background
Schizophrenia is a severe neuropsychiatric disorder characterized by altered perception, mood, and behavior that profoundly impacts patients and society despite its relatively low prevalence. Sex-based differences have been described in schizophrenia epidemiology, symptomatology and outcomes. Different studies have explored the impact of schizophrenia on the brain transcriptome; however, we lack a consensus transcriptomic profile that considers sex and differentiates specific cerebral regions.
Methods
We performed a systematic review of bulk RNA-sequencing studies of post-mortem brain samples. Then, we performed differential expression analysis on each study and summarized their results with region-specific meta-analyses (prefrontal cortex and hippocampus) and a global all-studies meta-analysis. Finally, we used the consensus transcriptomic profiles to functionally characterize the impact of schizophrenia in males and females by protein-protein interaction networks, enriched biological processes and dysregulated transcription factors.
Results
We discovered the sex-based dysregulation of 265 genes in the prefrontal cortex, 1,414 genes in the hippocampus and 66 genes in the all-studies meta-analyses. The functional characterization of these gene sets unveiled increased processes related to immune response functions in the prefrontal cortex in male and the hippocampus in female schizophrenia patients and the overexpression of genes related to neurotransmission and synapses in the prefrontal cortex of female schizophrenia patients. Considering a meta-analysis of all brain regions available, we encountered the relative overexpression of genes related to synaptic plasticity and transmission in females and the overexpression of genes involved in organizing genetic information and protein folding in male schizophrenia patients. The protein-protein interaction networks and transcription factors activity analyses supported these sex-based profiles.
Conclusions
Our results report multiple sex-based transcriptomic alterations in specific brain regions of schizophrenia patients, which provides new insight into the role of sex in schizophrenia. Moreover, we unveil a partial overlapping of inflammatory processes in the prefrontal cortex of males and the hippocampus of females.
Ksiezopolska E, Schikora-Tamarit MÀ, Carlos Nunez-Rodriguez J, Gabaldón T.
Front Cell Infect Microbiol. 2024; 14
DOI: 10.3389/fcimb.2024.1416509
Abstract
The limited number of available antifungal drugs and the increasing number of fungal isolates that show drug or multidrug resistance pose a serious medical threat. Several yeast pathogens, such as Nakaseomyces glabratus (Candida glabrata), show a remarkable ability to develop drug resistance during treatment through the acquisition of genetic mutations. However, how stable this resistance and the underlying mutations are in non-selective conditions remains poorly characterized. The stability of acquired drug resistance has fundamental implications for our understanding of the appearance and spread of drug-resistant outbreaks and for defining efficient strategies to combat them. Here, we used an in vitro evolution approach to assess the stability under optimal growth conditions of resistance phenotypes and resistance-associated mutations that were previously acquired under exposure to antifungals. Our results reveal a remarkable stability of the resistant phenotype and the underlying mutations in a significant number of evolved populations, which conserved their phenotype for at least two months in the absence of drug-selective pressure. We observed a higher stability of anidulafungin resistance over fluconazole resistance, and of resistance-conferring point mutations as compared with aneuploidies. In addition, we detected accumulation of novel mutations in previously altered resistance-associated genes in non-selective conditions, which suggest a possible compensatory role. We conclude that acquired resistance, particularly to anidulafungin, is a long-lasting phenotype, which has important implications for the persistence and propagation of drug-resistant clinical outbreaks.
Farré-Gil D, Arcon JP, Laughton CA, Orozco M.
Nucleic Acids Res. 2024; 52 (12)
DOI: 10.1093/nar/gkae444
Abstract
We present CGeNArate, a new model for molecular dynamics simulations of very long segments of B-DNA in the context of biotechnological or chromatin studies. The developed method uses a coarse-grained Hamiltonian with trajectories that are back-mapped to the atomistic resolution level with extreme accuracy by means of Machine Learning Approaches. The method is sequence-dependent and reproduces very well not only local, but also global physical properties of DNA. The efficiency of the method allows us to recover with a reduced computational effort high-quality atomic-resolution ensembles of segments containing many kilobases of DNA, entering into the gene range or even the entire DNA of certain cellular organelles.
Rodríguez-Mejías S, Degli-Esposti S, González-García S, Parra-Calderón CL.
J Biomed Inform. 2024; 156
DOI: 10.1016/j.jbi.2024.104670
Abstract
Background
Art. 50 of the proposal for a Regulation on the European Health Data Space (EHDS) states that "health data access bodies shall provide access to electronic health data only through a secure processing environment, with technical and organizational measures and security and interoperability requirements".
Objective
To identify specific security measures that nodes participating in health data spaces shall implement based on the results of the IMPaCT-Data project, whose goal is to facilitate the exchange of electronic health records (EHR) between public entities based in Spain and the secondary use of this information for precision medicine research in compliance with the General Data Protection Regulation (GDPR).
Data and methods
This article presents an analysis of 24 out of a list of 72 security measures identified in the Spanish National Security Scheme (ENS) and adopted by members of the federated data infrastructure developed during the IMPaCT-Data project.
Results
The IMPaCT-Data case helps clarify roles and responsibilities of entities willing to participate in the EHDS by reconciling technical system notions with the legal terminology. Most relevant security measures for Data Space Gatekeepers, Enablers and Prosumers are identified and explained.
Conclusion
The EHDS can only be viable as long as the fiduciary duty of care of public health authorities is preserved; this implies that the secondary use of personal data shall contribute to the public interest and/or to protect the vital interests of the data subjects. This condition can only be met if all nodes participating in a health data space adopt the appropriate organizational and technical security measures necessary to fulfill their role.
Del Olmo V, Gabaldón T.
Curr Opin Microbiol. 2024; 80
DOI: 10.1016/j.mib.2024.102491
Abstract
Hybridisation is the crossing of two divergent lineages that give rise to offspring carrying an admixture of both parental genomes. Genome sequencing has revealed that this process is common in the Saccharomycotina, where a growing number of hybrid strains or species, including many pathogenic ones, have been recently described. Hybrids can display unique traits that may drive adaptation to new niches, and some pathogenic hybrids have been shown to have higher prevalence over their parents in human and environmental niches, suggesting a higher fitness and potential to colonise humans. Here, we discuss how hybridisation and its genomic and phenotypic outcomes can shape the evolution of fungal species and may play a role in the emergence of new pathogens.
Loucera-Muñecas C, Canal-Rivero M, Ruiz-Veguilla M, Carmona R, Bostelmann G, Garrido-Torres N, Dopazo J, Crespo-Facorro B.
Sci Rep. 2024; 14 (1)
DOI: 10.1038/s41598-024-60297-y
Abstract
The relation of antipsychotics with severe Coronavirus Disease 19 (COVID-19) outcomes has been a matter of debate since the beginning of the pandemic. To date, controversial results have been published on this issue. We aimed to test whether antipsychotics might exert adverse or protective effects against fatal outcomes derived from COVID-19. We conducted a population-based retrospective cohort study (January 2020 to November 2020) comprising inpatients (15,968 patients) who were at least 18 years old and had a laboratory-confirmed COVID-19 infection. Two sub-cohorts were delineated, comprising a total of 2536 inpatients: individuals who either had no prescription medication or were prescribed an antipsychotic within the 15 days preceding hospitalization. We conducted survival and odds ratio analyses to assess the association between antipsychotic use and mortality, reporting both unadjusted and covariate-adjusted results. We computed the average treatment effects, using the untreated group as the reference, and the average treatment effect on the treated, focusing solely on the antipsychotic-treated population. Among the eight antipsychotics found to be in use, only aripiprazole showed a significant decrease in the risk of death from COVID-19 [adjusted odds ratio (OR) = 0.86; 95% CI, 0.79-0.93, multiple-testing adjusted p-value < 0.05]. Importantly, these findings were consistent for both covariate-adjusted and unadjusted analyses. Aripiprazole has been shown to have a differentiated beneficial effect in protecting against fatal clinical outcome in COVID-19 infected individuals. We speculate that the differential effect of aripiprazole on controlling immunological pathways and inducible inflammatory enzymes that are critical in COVID-19 illness may be associated with our findings herein.
Ahaik I, Nunez-Rodríguez JC, Abrini J, Bouhdid S, Gabaldón T.
J Fungi (Basel). 2024; 10 (6)
DOI: 10.3390/jof10060373
Abstract
The incidence of Candida infections has increased in the last decade, posing a serious threat to public health. Appropriately facing this challenge requires precise epidemiological data on species and antimicrobial resistance incidence, but many countries lack appropriate surveillance programs. This study aims to bridge this gap for Morocco by identifying and phenotyping a year-long collection of clinical isolates (n = 93) from four clinics in Tetouan. We compared the current standard in species identification with molecular methods and assessed susceptibility to fluconazole and anidulafungin. Our results identified limitations in currently used diagnostic approaches, and revealed that C. albicans ranks as the most prevalent species with 60 strains (64.52%), followed by C. glabrata with 14 (15.05%), C. parapsilosis with 6 (6.45%), and C. tropicalis with 4 (4.30%). In addition, we report the first identification of C. metapsilosis in Morocco. Susceptibility results for fluconazole revealed that some isolates were approaching MIC resistance breakpoints in C. albicans (2) and C. glabrata (1). Our study also identified anidulafungin-resistant strains in C. albicans (1), C. tropicalis (1), and C. krusei (2), rendering the two strains from the latter species multidrug-resistant due to their innate resistance to fluconazole. These results raise concerns about species identification and antifungal resistance in Morocco and highlight the urgent need for more accurate methods and preventive strategies to combat fungal infections in the country.
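The species prevalences quoted in this abstract can be checked with simple arithmetic. The short Python sketch below recomputes each percentage from the reported strain counts and the total of 93 isolates; the variable names are illustrative assumptions, while the counts are those given in the abstract.

```python
# Strain counts as reported in the abstract (n = 93 isolates in total).
counts = {
    "C. albicans": 60,
    "C. glabrata": 14,
    "C. parapsilosis": 6,
    "C. tropicalis": 4,
}
total_isolates = 93

# Percentage of the collection represented by each species, to two decimals.
percentages = {
    species: round(100 * n / total_isolates, 2) for species, n in counts.items()
}
```

The recomputed values agree with the percentages stated in the abstract (64.52%, 15.05%, 6.45% and 4.30%).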
Alvarez-Romero C, Polo-Molina A, Sánchez-Úbeda EF, Jimenez-De-Juan C, Cuadri-Benitez MP, Rivas-Gonzalez JA, Portela J, Palacios R, Rodriguez-Morcillo C, Muñoz A, Parra-Calderon CL, Nieto-Martin MD, Ollero-Baturone M, Hernández-Quiles C.
JMIR Form Res. 2024; 8
DOI: 10.2196/52344
Abstract
Background
Functional impairment is one of the most decisive prognostic factors in patients with complex chronic diseases. A more significant functional impairment indicates that the disease is progressing, which requires implementing diagnostic and therapeutic actions that stop the exacerbation of the disease.
Objective
This study aimed to predict alterations in the clinical condition of patients with complex chronic diseases by predicting the Barthel Index (BI), to assess their clinical and functional status using an artificial intelligence model and data collected through an internet of things mobility device.
Methods
A 2-phase pilot prospective single-center observational study was designed. During both phases, patients were recruited, and a wearable activity tracker was allocated to gather physical activity data. Patients were categorized into class A (BI≤20; total dependence), class B (20<BI≤60), and class C (BI>60; moderate or mild dependence, or independent). Data preprocessing and machine learning techniques were used to analyze mobility data. A decision tree was used to achieve a robust and interpretable model. To assess the quality of the predictions, several metrics including the mean absolute error, median absolute error, and root mean squared error were considered. Statistical analysis was performed using SPSS and Python for the machine learning modeling.
Results
Overall, 90 patients with complex chronic diseases were included: 50 during phase 1 (class A: n=10; class B: n=20; and class C: n=20) and 40 during phase 2 (class B: n=20 and class C: n=20). Most patients (n=85, 94%) had a caregiver. The mean value of the BI was 58.31 (SD 24.5). Concerning mobility aids, 60% (n=52) of patients required no aids, whereas the others required walkers (n=18, 20%), wheelchairs (n=15, 17%), canes (n=4, 7%), and crutches (n=1, 1%). Regarding clinical complexity, 85% (n=76) met the criteria for patients with polypathology with a mean of 2.7 (SD 1.25) categories, 69% (n=61) met the frailty criteria, and 21% (n=19) met the criteria for patients with complex chronic diseases. The most characteristic symptoms were dyspnea (n=73, 82%), chronic pain (n=63, 70%), asthenia (n=62, 68%), and anxiety (n=41, 46%). Polypharmacy was present in 87% (n=78) of patients. The most important variables for predicting the BI were identified as the maximum step count during evening and morning periods and the absence of a mobility device. The model exhibited consistency in the median prediction error with a median absolute error close to 5 in the training, validation, and production-like test sets. The model accuracy for identifying the BI class was 91%, 88%, and 90% in the training, validation, and test sets, respectively.
Conclusions
Using commercially available mobility recording devices makes it possible to identify different mobility patterns and relate them to functional capacity in patients with polypathology according to the BI without using clinical parameters.
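As an illustration of the three error metrics named in the abstract above (mean absolute error, median absolute error, and root mean squared error), a minimal Python sketch follows. The function name `regression_errors` and the Barthel Index values are hypothetical and chosen for illustration only; they are not study data, and this is not the authors' code.

```python
import math
import statistics


def regression_errors(y_true, y_pred):
    """Return MAE, median absolute error and RMSE for paired values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    # Absolute error of each prediction
    abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return {
        "mae": statistics.fmean(abs_err),
        "median_ae": statistics.median(abs_err),
        "rmse": math.sqrt(statistics.fmean(e * e for e in abs_err)),
    }


# Hypothetical Barthel Index values (assumption, not taken from the study).
observed = [20, 45, 60, 85, 100]
predicted = [25, 40, 70, 85, 90]
errors = regression_errors(observed, predicted)
```

The median absolute error is less sensitive to a few large misses than the RMSE, which may be why a median-based figure is useful alongside the other two.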
Rodríguez-Pérez H, Ciuffreda L, Hernández-Beeftink T, Guillen-Guio B, Domínguez D, Corrales A, Espinosa E, Alcoba-Florez J, Lorenzo-Salazar JM, González-Montelongo R, Villar J, Flores C.
medRxiv; 2024.
DOI: 10.1101/2024.04.08.24305484
Abstract
Background
Previous metabarcoding studies based on 16S rRNA sequencing in patients with extrapulmonary sepsis have found early pulmonary dysbiosis associated with a poor prognosis. To further discern this association, here we aimed to better characterize the pulmonary bacterial communities in these patients by leveraging metagenomics and to evaluate if the presence of antibiotic resistance genes (ARGs) could explain the higher mortality of the patients.
Material and methods
Metagenomic sequencing was performed using the Nextera XT Library Prep Kit and HiSeq 4000 (Illumina Inc.) on tracheal aspirate samples that were obtained within 24 h from diagnosis from patients with extrapulmonary sepsis admitted to the Intensive Care Unit (ICU). Analysis involved MetaSpades for contig assembly, Kraken2 and Metaphlan4 for taxonomic classification, and CARD and GTDB-tk for ARGs annotation and assignment to the bacterial species. The relationship between the presence of antibiotic resistance and ICU mortality was evaluated using the Wilcoxon test and logistic regression models adjusting for clinical and demographic variables.
Results
In total, 127 different ARGs were detected, circumscribed to only seven patients. The most common ARGs found were from the antibiotic groups of aminoglycosides and beta-lactams, both present in most of the patients. These ARGs were found almost entirely linked to Klebsiella pneumoniae, Escherichia coli, and Pseudomonas aeruginosa. The results also show a significant enrichment of ARGs among patients who died while admitted in the ICU (57%, 95% confidence interval [CI]: 18-90%) compared to surviving patients (20%, 95% CI: 7-40%) (p=0.022). Analyses adjusting for clinical and demographic variables did not alter this result.
Conclusion
Metagenomic sequencing has allowed an unprecedented characterization of the sepsis lung microbiome showing that antibiotic resistance is common among these patients. The results also suggest a relationship between the early accumulation of ARGs in the lung and ICU mortality in patients with extrapulmonary sepsis. Studies in independent samples will be needed to validate our findings.
Gallego D, Serrano M, Cordoba-Caballero J, Gámez A, Seoane P, Perkins JR, Ranea JAG, Pérez B.
Biochim Biophys Acta Mol Basis Dis. 2024; 1870 (5)
DOI: 10.1016/j.bbadis.2024.167163
Abstract
PMM2-CDG (MIM # 212065), the most common congenital disorder of glycosylation, is caused by the deficiency of phosphomannomutase 2 (PMM2). It is a multisystemic disease of variable severity that particularly affects the nervous system; however, its molecular pathophysiology remains poorly understood. Currently, there is no effective treatment. We performed an RNA-seq based transcriptomic study using patient-derived fibroblasts to gain insight into the mechanisms underlying the clinical symptomatology and to identify druggable targets. Systems biology methods were used to identify cellular pathways potentially affected by PMM2 deficiency, including Senescence, Bone regulation, Cell adhesion and Extracellular Matrix (ECM) and Response to cytokines. Functional validation assays using patients' fibroblasts revealed defects related to cell proliferation, cell cycle, the composition of the ECM and cell migration, and showed a potential role of the inflammatory response in the pathophysiology of the disease. Furthermore, treatment with a previously described pharmacological chaperone reverted the differential expression of some of the dysregulated genes. The results presented from transcriptomic data might serve as a platform for identifying therapeutic targets for PMM2-CDG, as well as for monitoring the effectiveness of therapeutic strategies, including pharmacological candidates and mannose-1-P, drug repurposing.
Giannoula A, Comas M, Castells X, Estupiñán-Romero F, Bernal-Delgado E, Sanz F, Sala M.
J Am Med Inform Assoc. 2024; 31 (4)
DOI: 10.1093/jamia/ocad251
Abstract
Objectives
Long-term breast cancer survivors (BCS) constitute a complex group of patients whose number is expected to keep rising, so that dedicated long-term clinical follow-up is necessary.
Materials and methods
A dynamic time warping-based unsupervised clustering methodology is presented in this article for the identification of temporal patterns in the care trajectories of 6214 female BCS of a large longitudinal retrospective cohort of Spain. The extracted care-transition patterns are graphically represented using directed network diagrams with aggregated patient and time information. A control group consisting of 12 412 females without breast cancer is also used for comparison.
Results
The use of radiology and hospital admission are explored as patterns of special interest. In the generated networks, a more intense and complex use of certain healthcare services (eg, radiology, outpatient care, hospital admission) is shown and quantified for the BCS. Higher mortality rates and numbers of comorbidities are observed in various transitions and compared with non-breast cancer. It is also demonstrated how a wealth of patient and time information can be revealed from individual service transitions.
Discussion
The presented methodology permits the identification and descriptive visualization of temporal patterns of the usage of healthcare services by the BCS that would otherwise remain hidden in the trajectories.
Conclusion
The results could provide the basis for better understanding the BCS' circulation through the health system, with a view to more efficiently predicting their forthcoming needs and thus designing more effective personalized survivorship care plans.
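The abstract above relies on dynamic time warping (DTW), which compares two sequences that may be stretched or shifted in time. A minimal sketch of the classic DTW distance follows, assuming numeric sequences and an absolute-difference local cost. How the study encodes care trajectories and how it clusters them is not stated in the abstract, so the function name `dtw_distance` and the example sequences are hypothetical.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance with |x - y| as local cost."""
    if not a or not b:
        raise ValueError("sequences must be non-empty")
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a only, step in b only, step in both
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Invented example sequences (assumption, not study data).
same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
different = dtw_distance([1, 2, 3], [3, 2, 1])
```

In the first pair the repeated value is absorbed by the warping, so the distance is zero even though the sequences differ in length. A pairwise matrix of such distances could then be passed to an unsupervised clustering method.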
Novoa J, López-Ibáñez J, Chagoyen M, Ranea JAG, Pazos F.
Database (Oxford). 2024; 2024
DOI: 10.1093/database/baae025
Abstract
The CoMentG resource contains millions of relationships between terms of biomedical interest obtained from the scientific literature. At the core of the system is a methodology for detecting significant co-mentions of concepts in the entire PubMed corpus. That method was applied to nine sets of terms covering the most important classes of biomedical concepts: diseases, symptoms/clinical signs, molecular functions, biological processes, cellular compartments, anatomic parts, cell types, bacteria and chemical compounds. We obtained more than 7 million relationships between more than 74 000 terms, and many types of relationships were not available in any other resource. As the terms were obtained from widely used resources and ontologies, the relationships are given using the standard identifiers provided by them and hence can be linked to other data. A web interface allows users to browse these associations, searching for relationships for a set of terms of interests provided as input, such as between a disease and their associated symptoms, underlying molecular processes or affected tissues. The results are presented in an interactive interface where the user can explore the reported relationships in different ways and follow links to other resources. Database URL: https://csbg.cnb.csic.es/CoMentG/.
Casimiro-Soriguer CS, Perez-Florido J, Lara M, Camacho-Martinez P, Merino-Diaz L, Pupo-Ledo I, de Salazar A, Fuentes A, Viñuela L, Chueca N, Martinez-Martinez L, Lorusso N, Lepe JA, Dopazo J, Garcia F.
Health Sci Rep. 2024; 7 (3)
DOI: 10.1002/hsr2.1965
Abstract
Background and aim
Until the May 2022 Monkeypox (MPXV) outbreak, which spread rapidly to many non-endemic countries, the virus was considered a viral zoonosis limited to some African countries. The Andalusian circuit of genomic surveillance was rapidly applied to characterize the MPXV outbreak in the South of Spain.
Methods
Whole genome sequencing was used to obtain the genomic profiles of samples collected across the south of Spain, representative of all the provinces of Andalusia. Phylogenetic analysis was used to study the relationship of the isolates and the available sequences of the 2022 outbreak.
Results
Whole genome sequences of a total of 160 MPXV viruses from the different provinces that reported cases were obtained. Interestingly, we report the sequences of MPXV viruses obtained from two patients who died. While one of the isolates bore no noteworthy mutations that explain a potential heightened virulence, in another patient the second consecutive genome sequence, performed after the administration of tecovirimat, uncovered a mutation within the A0A7H0DN30 gene, known to be a prime target for tecovirimat in its Vaccinia counterpart. In general, a low number of mutations were observed in the sequences reported, which were very similar to the reference of the 2022 outbreak (OX044336), as expected from a DNA virus. The samples likely correspond to several introductions of the circulating MPXV viruses from the last outbreak. The virus sequenced from one of the two patients that died presented a mutation in a gene that bears potential connections to drug resistance. This mutation was absent in the initial sequencing before treatment.
Esteban-Medina M, de la Oliva Roque VM, Herráiz-Gil S, Peña-Chilet M, Dopazo J, Loucera C.
Comput Struct Biotechnol J. 2024; 23
DOI: 10.1016/j.csbj.2024.02.027
Abstract
We introduce drexml, a command line tool and Python package for rational data-driven drug repurposing. The package employs machine learning and mechanistic signal transduction modeling to identify drug targets capable of regulating a particular disease. In addition, it employs explainability tools to contextualize potential drug targets within the functional landscape of the disease. The methodology is validated in Fanconi Anemia and Familial Melanoma, two distinct rare diseases where there is a pressing need for solutions. In the Fanconi Anemia case, the model successfully predicts previously validated repurposed drugs, while in the Familial Melanoma case, it identifies a promising set of drugs for further investigation.
Chorostecki U, Saus E, Gabaldón T.
Front Microbiol. 2024; 15
DOI: 10.3389/fmicb.2024.1362067
Abstract
Understanding the intricate roles of RNA molecules in virulence and host-pathogen interactions can provide valuable insights into combatting infections and improving human health. Although much progress has been achieved in understanding transcriptional regulation during host-pathogen interactions in diverse species, more is needed to know about the structure of pathogen RNAs. This is particularly true for fungal pathogens, including pathogenic yeasts of the Candida genus, which are the leading cause of hospital-acquired fungal infections. Our work addresses the gap between RNA structure and their biology by employing genome-wide structure probing to comprehensively explore the structural landscape of mRNAs and long non-coding RNAs (lncRNAs) in the four major Candida pathogens. Specifically focusing on mRNA, we observe a robust correlation between sequence conservation and structural characteristics in orthologous transcripts, significantly when sequence identity exceeds 50%, highlighting structural feature conservation among closely related species. We investigate the impact of single nucleotide polymorphisms (SNPs) on mRNA secondary structure. SNPs within 5' untranslated regions (UTRs) tend to occur in less structured positions, suggesting structural constraints influencing transcript regulation. Furthermore, we compare the structural properties of coding regions and UTRs, noting that coding regions are generally more structured than UTRs, consistent with similar trends in other species. Additionally, we provide the first experimental characterization of lncRNA structures in Candida species. Most lncRNAs form independent subdomains, similar to human lncRNAs. Notably, we identify hairpin-like structures in lncRNAs, a feature known to be functionally significant. 
Comparing hairpin prevalence between lncRNAs and protein-coding genes, we find enrichment in lncRNAs across Candida species, humans, and Arabidopsis thaliana, suggesting a conserved role for these structures. In summary, our study offers valuable insights into the interplay between RNA sequence, structure, and function in Candida pathogens, with implications for gene expression regulation and potential therapeutic strategies against Candida infections.
Flook M, Rojano E, Gallego-Martinez A, Escalera-Balsera A, Perez-Carpena P, Moleon MDC, Gonzalez-Aguado R, Rivero de Jesus V, Domínguez-Durán E, Frejo L, G Ranea JA, Lopez-Escamez JA.
Genes Immun. 2024; 25 (2)
DOI: 10.1038/s41435-024-00260-z
Abstract
Meniere Disease (MD) is a chronic inner ear disorder characterized by vertigo attacks, sensorineural hearing loss, tinnitus, and aural fullness. Extensive evidence supporting the inflammatory etiology of MD has been found; therefore, by using transcriptome analysis, we aim to describe the inflammatory variants of MD. We performed bulk RNAseq on 45 patients with definite MD and 15 healthy controls. MD patients were classified according to their basal levels of IL-1β into 2 groups: high and low. Differential expression analysis was performed using the ExpHunter Suite, and cell type proportion was evaluated using the estimation algorithms xCell, ABIS, and CIBERSORTx. MD patients showed 15 differentially expressed genes (DEG) compared to controls. The top DEGs include IGHG1 (p = 1.64 × 10⁻⁶) and IGLV3-21 (p = 6.28 × 10⁻³), supporting a role in the adaptive immune response. Cytokine profiling defines a subgroup of patients with high levels of IL-1β with up-regulation of IL6 (p = 7.65 × 10⁻⁸) and INHBA (p = 3.39 × 10⁻⁷) genes. Transcriptomic data from peripheral blood mononuclear cells support a proinflammatory subgroup of MD patients with high levels of IL6 and an increase in naïve B-cells and memory CD8+ T cells.
Sinaci AA, Gencturk M, Alvarez-Romero C, Laleci Erturkmen GB, Martinez-Garcia A, Escalona-Cuaresma MJ, Parra-Calderon CL.
Comput Struct Biotechnol J. 2024; 24
DOI: 10.1016/j.csbj.2024.02.014
Abstract
Objective
This paper introduces a privacy-preserving federated machine learning (ML) architecture built upon Findable, Accessible, Interoperable, and Reusable (FAIR) health data. It aims to devise an architecture for executing classification algorithms in a federated manner, enabling collaborative model-building among health data owners without sharing their datasets.
Materials and methods
Utilizing an agent-based architecture, a privacy-preserving federated ML algorithm was developed to create a global predictive model from various local models. This involved formally defining the algorithm in two steps: data preparation and federated model training on FAIR health data and constructing the architecture with multiple components facilitating algorithm execution. The solution was validated by five healthcare organizations using their specific health datasets.
Results
Five organizations transformed their datasets into Health Level 7 Fast Healthcare Interoperability Resources via a common FAIRification workflow and software set, thereby generating FAIR datasets. Each organization deployed a Federated ML Agent within its secure network, connected to a cloud-based Federated ML Manager. System testing was conducted on a use case aiming to predict 30-day readmission risk for chronic obstructive pulmonary disease patients, and the federated model achieved an accuracy rate of 87%.
Discussion
The paper demonstrated a practical application of privacy-preserving federated ML among five distinct healthcare entities, highlighting the value of FAIR health data in machine learning when utilized in a federated manner that ensures privacy protection without sharing data.
Conclusion
This solution effectively leverages FAIR datasets from multiple healthcare organizations for federated ML while safeguarding sensitive health datasets, meeting legislative privacy and security requirements.
Marcet-Houben M, Cruz F, Gómez-Garrido J, Alioto TS, Nunez-Rodriguez JC, Mesanza N, Gut M, Iturritxa E, Gabaldon T.
mSystems. 2024; 9 (3)
DOI: 10.1128/msystems.00928-23
Abstract
Lecanosticta acicola is the causal agent for brown spot needle blight that affects pine trees across the northern hemisphere. Based on marker genes and microsatellite data, two distinct lineages have been identified that were introduced into Europe on two separate occasions. Despite their overall distinct geographic distribution, they have been found to coexist in regions of northern Spain and France. Here, we present the first genome-wide study of Lecanosticta acicola, including assembly of the reference genome and a population genomics analysis of 70 natural isolates from northern Spain. We show that most of the isolates belong to the southern lineage but show signs of introgression with northern lineage isolates, indicating mating between the two lineages. We also identify phenotypic differences between the two lineages based on the activity profiles of 20 enzymes, with introgressed strains being more phenotypically similar to members of the southern lineage. In conclusion, we show ongoing genetic admixture between the two main lineages of L. acicola in a region of recent expansion.
Importance
Lecanosticta acicola is a fungal pathogen causing severe defoliation, growth reduction, and even death in more than 70 conifer species. Despite the increasing incidence of this species, little is known about its population dynamics. Two divergent lineages have been described that have now been found together in regions of France and Spain, but it is unknown how these mixed populations evolve. Here we present the first reference genome for this important plant pathogenic fungus and use it to study the population genomics of 70 isolates from an affected forest in the north of Spain. We find signs of introgression between the two main lineages, indicating that active mating is occurring in this region, which could favor the appearance of novel traits in this species. We also study the phenotypic differences across this population based on enzymatic activities on 20 compounds.
Niarakis A, Ostaszewski M, Mazein A, Kuperstein I, Kutmon M, Gillespie ME, Funahashi A, Acencio ML, Hemedan A, Aichem M, Klein K, Czauderna T, Burtscher F, Yamada TG, Hiki Y, Hiroi NF, Hu F, Pham N, Ehrhart F, Willighagen EL, Valdeolivas A, Dugourd A, Messina F, Esteban-Medina M, Peña-Chilet M, Rian K, Soliman S, Aghamiri SS, Puniya BL, Naldi A, Helikar T, Singh V, Fernández MF, Bermudez V, Tsirvouli E, Montagud A, Noël V, Ponce-de-Leon M, Maier D, Bauch A, Gyori BM, Bachman JA, Luna A, Piñero J, Furlong LI, Balaur I, Rougny A, Jarosz Y, Overall RW, Phair R, Perfetto L, Matthews L, Rex DAB, Orlic-Milacic M, Gomez LCM, De Meulder B, Ravel JM, Jassal B, Satagopam V, Wu G, Golebiewski M, Gawron P, Calzone L, Beckmann JS, Evelo CT, D'Eustachio P, Schreiber F, Saez-Rodriguez J, Dopazo J, Kuiper M, Valencia A, Wolkenhauer O, Kitano H, Barillot E, Auffray C, Balling R, Schneider R, COVID-19 Disease Map Community.
Front Immunol. 2023; 14
DOI: 10.3389/fimmu.2023.1282859
Abstract
Introduction
The COVID-19 Disease Map project is a large-scale community effort uniting 277 scientists from 130 Institutions around the globe. We use high-quality, mechanistic content describing SARS-CoV-2-host interactions and develop interoperable bioinformatic pipelines for novel target identification and drug repurposing.
Methods
Extensive community work allowed an impressive step forward in building interfaces between Systems Biology tools and platforms. Our framework can link biomolecules from omics data analysis and computational modelling to dysregulated pathways in a cell-, tissue- or patient-specific manner. Drug repurposing using text mining and AI-assisted analysis identified potential drugs, chemicals and microRNAs that could target the identified key factors.
Results
Results revealed drugs already tested for anti-COVID-19 efficacy, providing a mechanistic context for their mode of action, and drugs already in clinical trials for treating other diseases, never tested against COVID-19.
Discussion
The key advance is that the proposed framework is versatile and expandable, offering a significant upgrade in the arsenal for virus-host interactions and other complex pathologies.
Esteban-Medina M, Loucera C, Rian K, Velasco S, Olivares-González L, Rodrigo R, Dopazo J, Peña-Chilet M.
J Transl Med. 2024; 22 (1)
DOI: 10.1186/s12967-024-04911-7
Abstract
Background
Retinitis pigmentosa is the prevailing genetic cause of blindness in developed nations with no effective treatments. In the pursuit of unraveling the intricate dynamics underlying this complex disease, mechanistic models emerge as a tool of proven efficiency rooted in systems biology, to elucidate the interplay between RP genes and their mechanisms. The integration of mechanistic models and drug-target interactions under the umbrella of machine learning methodologies provides a multifaceted approach that can boost the discovery of novel therapeutic targets, facilitating further drug repurposing in RP.
Methods
By mapping Retinitis Pigmentosa-related genes (obtained from Orphanet, OMIM and HPO databases) onto KEGG signaling pathways, a collection of signaling functional circuits encompassing Retinitis Pigmentosa molecular mechanisms was defined. Next, a mechanistic model of the so-defined disease map, where the effects of interventions can be simulated, was built. Then, an explainable multi-output random forest regressor was trained using normal tissue transcriptomic data to learn causal connections between targets of approved drugs from DrugBank and the functional circuits of the mechanistic disease map. The involvement of selected target genes was validated in rd10 mice, a murine model of Retinitis Pigmentosa.
Results
A mechanistic functional map of Retinitis Pigmentosa was constructed resulting in 226 functional circuits belonging to 40 KEGG signaling pathways. The method predicted 109 targets of approved drugs in use with a potential effect over circuits corresponding to nine hallmarks identified. Five of those targets were selected and experimentally validated in rd10 mice: Gabre, Gabra1 (GABARα1 protein), Slc12a5 (KCC2 protein), Grin1 (NR1 protein) and Glr2a. As a result, we provide a resource to evaluate the potential impact of drug target genes in Retinitis Pigmentosa.
Conclusions
The possibility of building actionable disease models in combination with machine learning algorithms to learn causal drug-disease interactions opens new avenues for boosting drug discovery. Such mechanistically-based hypotheses can guide and accelerate the experimental validations prioritizing drug target candidates. In this work, a mechanistic model describing the functional disease map of Retinitis Pigmentosa was developed, identifying five promising therapeutic candidates targeted by approved drugs. Further experimental validation will demonstrate the efficiency of this approach for a systematic application to other rare diseases.
Alfonsín G, Berral-González A, Rodríguez-Alonso A, Quiroga M, De Las Rivas J, Figueroa A.
Int J Mol Sci. 2024; 25 (3)
DOI: 10.3390/ijms25031919
Abstract
The consensus molecular subtypes (CMSs) classification of colorectal cancer (CRC) is a system for patient stratification that can be potentially applied to therapeutic decisions. Hakai (CBLL1) is an E3 ubiquitin-ligase that induces the ubiquitination and degradation of E-cadherin, inducing epithelial-to-mesenchymal transition (EMT), tumour progression and metastasis. Using bioinformatic methods, we have analysed CBLL1 expression on a large integrated cohort of primary tumour samples from CRC patients. The cohort included survival data and was divided into consensus molecular subtypes. Colon cancer tumourspheres were used to analyse the expression of stem cancer cells markers via RT-PCR and Western blotting. We show that CBLL1 gene expression is specifically associated with canonical subtype CMS2. WNT target genes LGR5 and c-MYC show a similar association with CMS2 as CBLL1. These mRNA levels are highly upregulated in cancer tumourspheres, while CBLL1 silencing shows a clear reduction in tumoursphere size and in stem cell biomarkers. Importantly, CMS2 patients with high CBLL1 expression displayed worse overall survival (OS), which is similar to that associated with CMS4 tumours. Our findings reveal CBLL1 as a specific biomarker for CMS2 and the potential of using CMS2 with high CBLL1 expression to stratify patients with poor OS.
Llera-Oyola J, Carceller H, Andreu Z, Hidalgo MR, Soler-Sáez I, Gordillo F, Gómez-Cabañes B, Roson B, de la Iglesia-Vayá M, Mancuso R, Guerini FR, Mizokami A, García-García F.
Biol Sex Differ. 2024; 15 (1)
DOI: 10.1186/s13293-024-00588-1
Abstract
Background
The incidence of Alzheimer's disease (AD), the most frequent cause of dementia, is expected to increase as life expectancies rise across the globe. While sex-based differences in AD have previously been described, there remain uncertainties regarding any association between sex and disease-associated molecular mechanisms. Studying sex-specific expression profiles of regulatory factors such as microRNAs (miRNAs) could contribute to more accurate disease diagnosis and treatment.
Methods
A systematic review identified six studies of microRNA expression in AD patients that incorporated information regarding the biological sex of samples in the Gene Expression Omnibus repository. A differential microRNA expression analysis was performed, considering disease status and patient sex. Subsequently, results were integrated within a meta-analysis methodology, with a functional enrichment of meta-analysis results establishing an association between altered miRNA expression and relevant Gene Ontology terms.
Results
Meta-analyses of miRNA expression profiles in blood samples revealed the alteration of sixteen miRNAs in female and 22 miRNAs in male AD patients. We discovered nine miRNAs commonly overexpressed in both sexes, suggesting a shared miRNA dysregulation profile. Functional enrichment results based on miRNA profiles revealed sex-based differences in biological processes; most affected processes related to ubiquitination, regulation of different kinase activities, and apoptotic processes in males, but RNA splicing and translation in females. Meta-analyses of miRNA expression profiles in brain samples revealed the alteration of six miRNAs in female and four miRNAs in male AD patients. We observed a single underexpressed miRNA in female and male AD patients (hsa-miR-767-5p); however, the functional enrichment analysis for brain samples did not reveal any specifically affected biological process.
Conclusions
Sex-specific meta-analyses supported the detection of differentially expressed miRNAs in female and male AD patients, highlighting the relevance of sex-based information in biomedical data. Further studies on miRNA regulation in AD patients should meet the criteria for comparability and standardization of information.
Lucena-Padros H, Bravo-Gil N, Tous C, Rojano E, Seoane-Zonjic P, Fernández RM, Ranea JAG, Antiñolo G, Borrego S.
Biomolecules. 2024; 14 (2)
DOI: 10.3390/biom14020164
Abstract
Hirschsprung's disease (HSCR) is a rare developmental disorder in which enteric ganglia are missing along a portion of the intestine. HSCR has a complex inheritance, with RET as the major disease-causing gene. However, the pathogenesis of HSCR is still not completely understood. Therefore, we applied a computational approach based on multi-omics network characterization and clustering analysis for HSCR-related gene/miRNA identification and biomarker discovery. Protein-protein interaction (PPI) and miRNA-target interaction (MTI) networks were analyzed by DPClusO and BiClusO, respectively, and finally, the biomarker potential of miRNAs was computationally screened by miRNA-BD. In this study, a total of 55 significant gene-disease modules were identified, allowing us to propose 178 new HSCR candidate genes and two biological pathways. Moreover, we identified 12 key miRNAs with biomarker potential among 137 predicted HSCR-associated miRNAs. Functional analysis of new candidates showed that enrichment terms related to gene ontology (GO) and pathways were associated with HSCR. In conclusion, this approach has allowed us to decipher new clues of the etiopathogenesis of HSCR, although molecular experiments are further needed for clinical validations.
Perpiñá-Clérigues C, Mellado S, Galiana-Roselló C, Fernández-Regueras M, Marcos M, García-García F, Pascual M.
Biol Sex Differ. 2024; 15 (1)
DOI: 10.1186/s13293-024-00584-5
Abstract
Background
Alcohol use disorder (AUD) is one of the most common psychiatric disorders, with the consumption of alcohol considered a leading cause of preventable deaths worldwide. Lipids play a crucial functional role in cell membranes; however, we know little about the role of lipids in extracellular vesicles (EVs) as regulatory molecules and disease biomarkers.
Methods
We employed a sensitive lipidomic strategy to characterize lipid species from the plasma EVs of AUD patients to evaluate functional roles and enzymatic activity networks to improve the knowledge of lipid metabolism after alcohol consumption. We analyzed plasma EV lipids from AUD females and males and healthy individuals to highlight lipids with differential abundance and biologically interpreted lipidomics data using LINEX2, which evaluates enzymatic dysregulation using an enrichment algorithm.
Results
Our results show, for the first time, that AUD females exhibited more significant substrate-product changes in lysophosphatidylcholine/phosphatidylcholine lipids and phospholipase/acyltransferase activity, which are potentially linked to cancer progression and neuroinflammation. Conversely, AUD males suffer from dysregulated ceramide and sphingomyelin lipids involving sphingomyelinase, sphingomyelin phosphodiesterase, and sphingomyelin synthase activity, which relates to hepatotoxicity. Notably, the analysis of plasma EVs from AUD females and males demonstrates enrichment of lipid ontology terms associated with "negative intrinsic curvature" and "positive intrinsic curvature", respectively.
Conclusions
Our methodological developments support an improved understanding of lipid metabolism and regulatory mechanisms, which contribute to the identification of novel lipid targets and the discovery of sex-specific clinical biomarkers in AUD.
Ramon E, Obón-Santacana M, Khannous-Lleiffe O, Saus E, Gabaldón T, Guinó E, Bars-Cortina D, Ibáñez-Sanz G, Rodríguez-Alonso L, Mata A, García-Rodríguez A, Moreno V.
Int J Mol Sci. 2024; 25 (2)
DOI: 10.3390/ijms25021181
Abstract
Colorectal cancer (CRC), the third most common cancer globally, has shown links to disturbed gut microbiota. While significant efforts have been made to establish a microbial signature indicative of CRC using shotgun metagenomic sequencing, the challenge lies in validating this signature with 16S ribosomal RNA (16S) gene sequencing. The primary obstacle is reconciling the differing outputs of these two methodologies, which often lead to divergent statistical models and conclusions. In this study, we introduce an algorithm designed to bridge this gap by mapping shotgun-derived taxa to their 16S counterparts. This mapping enables us to assess the predictive performance of a shotgun-based microbiome signature using 16S data. Our results demonstrate a reduction in performance when applying the 16S-mapped taxa in the shotgun prediction model, though it retains statistical significance. This suggests that while an exact match between shotgun and 16S data may not yet be feasible, our approach provides a viable method for comparative analysis and validation in the context of CRC-associated microbiome research.
Fuster-Martínez I, Català-Senent JF, Hidalgo MR, Roig FJ, Esplugues JV, Apostolova N, García-García F, Blas-García A.
J Pathol. 2024; 262 (3)
DOI: 10.1002/path.6242
Abstract
High-fat diet (HFD) mouse models are widely used in research to develop medications to treat non-alcoholic fatty liver disease (NAFLD), as they mimic the steatosis, inflammation, and hepatic fibrosis typically found in this complex human disease. The aims of this study were to identify a complete transcriptomic signature of these mouse models and to characterize the transcriptional impact exerted by different experimental anti-steatotic treatments. For this reason, we conducted a systematic review and meta-analysis of liver transcriptomic studies performed in HFD-fed C57BL/6J mice, comparing them with control mice and HFD-fed mice receiving potential anti-steatotic treatments. Analyzing 21 studies broaching 24 different treatments, we obtained a robust HFD transcriptomic signature that included 2,670 differentially expressed genes and 2,567 modified gene ontology biological processes. Treated HFD mice generally showed a reversion of this HFD signature, although the extent varied depending on the treatment. The biological processes most frequently reversed were those related to lipid metabolism, response to stress, and immune system, whereas processes related to nitrogen compound metabolism were generally not reversed. When comparing this HFD signature with a signature of human NAFLD progression, we identified 62 genes that were common to both; 10 belonged to the group that were reversed by treatments. Altered expression of most of these 10 genes was confirmed in vitro in hepatocytes and hepatic stellate cells exposed to a lipotoxic or a profibrogenic stimulus, respectively. In conclusion, this study provides a vast amount of information about transcriptomic changes induced during the progression and regression of NAFLD and identifies some relevant targets. Our results may help in the assessment of treatment efficacy, the discovery of unmet therapeutic targets, and the search for novel biomarkers. © 2024 The Authors. 
The Journal of Pathology published by John Wiley & Sons Ltd on behalf of The Pathological Society of Great Britain and Ireland.
Schikora-Tamarit MÀ, Gabaldón T.
Nat Microbiol. 2024; 9 (1)
DOI: 10.1038/s41564-023-01547-z
Abstract
Understanding how microbial pathogens adapt to treatments, humans and clinical environments is key to infer mechanisms of virulence, transmission and drug resistance. This may help improve therapies and diagnostics for infections with a poor prognosis, such as those caused by fungal pathogens, including Candida. Here we analysed genomic variants across approximately 2,000 isolates from six Candida species (C. glabrata, C. auris, C. albicans, C. tropicalis, C. parapsilosis and C. orthopsilosis) and identified genes under recent selection, suggesting a highly complex clinical adaptation. These involve species-specific and convergently affected adaptive mechanisms, such as adhesion. Using convergence-based genome-wide association studies we identified known drivers of drug resistance alongside potentially novel players. Finally, our analyses reveal an important role of structural variants and suggest an unexpected involvement of (para)sexual recombination in the spread of resistance. Our results provide insights on how opportunistic pathogens adapt to human-related environments and unearth candidate genes that deserve future attention.
Cordoba-Caballero J, Perkins JR, García-Criado F, Gallego D, Navarro-Sánchez A, Moreno-Estellés M, Garcés C, Bonet F, Romá-Mateo C, Toro R, Perez B, Sanz P, Kohl M, Rojano E, Seoane P, Ranea JAG.
Brief Bioinform. 2024; 25 (2)
DOI: 10.1093/bib/bbae060
Abstract
A wide range of approaches can be used to detect microRNA (miRNA)-target gene pairs (mTPs) from expression data, differing in the ways the gene and miRNA expression profiles are calculated, combined and correlated. However, there is no clear consensus on which is the best approach across all datasets. Here, we have implemented multiple strategies and applied them to three distinct rare disease datasets that comprise smallRNA-Seq and RNA-Seq data obtained from the same samples, obtaining mTPs related to the disease pathology. All datasets were preprocessed using a standardized, freely available computational workflow, DEG_workflow. This workflow includes coRmiT, a method to compare multiple strategies for mTP detection. We used it to investigate the overlap of the detected mTPs with predicted and validated mTPs from 11 different databases. Results show that there is no clear best strategy for mTP detection applicable to all situations. We therefore propose integrating the results of the different strategies by selecting, for each miRNA, the strategy with the highest odds ratio. We applied this selection-integration method to the datasets and showed it to be robust to changes in the predicted and validated mTP databases. Our findings have important implications for miRNA analysis. coRmiT is implemented as part of the ExpHunterSuite Bioconductor package available from https://bioconductor.org/packages/ExpHunterSuite.
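The selection step described in this abstract (keeping, for each miRNA, the strategy with the highest odds ratio) can be illustrated with a minimal sketch. This is not the coRmiT implementation. The function names, the strategy labels, the layout of the 2x2 table (detected/undetected pairs versus pairs present/absent in a reference database) and the 0.5 pseudocount are all assumptions made for illustration.

```python
def odds_ratio(tp, fp, fn, tn, pseudo=0.5):
    """Odds ratio of a 2x2 table; the pseudocount (an assumption) avoids division by zero."""
    return ((tp + pseudo) * (tn + pseudo)) / ((fp + pseudo) * (fn + pseudo))


def select_best_strategy(tables):
    """Pick, for each miRNA, the strategy with the highest odds ratio.

    tables maps (mirna, strategy) to counts (tp, fp, fn, tn), assumed here to mean:
      tp: pairs detected and present in the reference database
      fp: pairs detected but absent from the database
      fn: pairs not detected but present in the database
      tn: pairs neither detected nor present
    Returns {mirna: (strategy, odds_ratio)}.
    """
    best = {}
    for (mirna, strategy), counts in tables.items():
        value = odds_ratio(*counts)
        if mirna not in best or value > best[mirna][1]:
            best[mirna] = (strategy, value)
    return best


# Hypothetical counts for two miRNAs and two correlation strategies.
example_tables = {
    ("miRNA_A", "pearson"): (10, 5, 5, 80),
    ("miRNA_A", "spearman"): (4, 10, 11, 75),
    ("miRNA_B", "pearson"): (1, 9, 9, 81),
    ("miRNA_B", "spearman"): (8, 2, 2, 88),
}
best_by_mirna = select_best_strategy(example_tables)
```

With these made-up counts, different miRNAs end up with different winning strategies, which is the situation the per-miRNA selection is meant to handle.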
Gómez-García I, Ladehesa-Pineda M, Diaz-Tocados J, López-Medina C, Abalos-Aguilera M, Ruiz-Vilches D, Paz-Lopez G, Gonzalez-Jimenez A, Ranea J, Escudero-Contreras A, Moreno-Indias I, Tinahones F, Collantes-Estévez E, Ruiz-Limón P.
Front Endocrinol (Lausanne). 2024; 15
DOI:
Schiavinato M, Del Olmo V, Muya VN, Gabaldón T.
Comput Struct Biotechnol J. 2023; 21
DOI: 10.1016/j.csbj.2023.11.003
Abstract
Heterozygosity is a genetic condition in which two or more alleles are found at a genomic locus. Individuals that are the offspring of genetically divergent yet still interfertile parents (e.g. hybrids) are highly heterozygous. One of the most studied aspects in the genomes of these individuals is the loss of their original heterozygosity (LOH) when multi-allelic sites lose one of their two alleles by converting it to the other, or by remaining hemizygous at that site. The region undergoing LOH may involve a single nucleotide polymorphism (SNP) or a longer stretch of DNA. LOH is deeply interconnected with adaptation but the in silico techniques to infer evolutionary relevant LOH blocks are hardly standardised, and a general tool to infer and analyse them across genomic contexts and species is missing. Here, we present JLOH, a computational toolkit for the inference and exploration of LOH blocks in genomes with at least 1% heterozygosity. JLOH only requires commonly available genomic sequencing data as input. Starting from mapped reads, called variants and a reference genome sequence, JLOH infers candidate LOH blocks based on SNP density (SNPs/kbp) and read coverage per position. Considering that most organisms that undergo extensive LOH are hybrids, JLOH has been designed to capture any subgenomic LOH pattern, assigning each LOH block to its subgenome of origin.
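The idea of flagging candidate LOH blocks from SNP density (SNPs/kbp) can be sketched as follows. This is a simplified illustration and not JLOH's algorithm: it ignores read coverage and subgenome assignment, and the window size, the density threshold and the function name are assumptions chosen only to make the example concrete.

```python
def candidate_loh_blocks(het_snp_positions, chrom_length, window=1000,
                         max_snps_per_kbp=1.0):
    """Return half-open (start, end) intervals with low heterozygous SNP density.

    het_snp_positions: 0-based positions of heterozygous SNPs on one chromosome.
    Windows whose density is at or below the threshold are kept, and adjacent
    kept windows are merged into a single block.
    """
    n_windows = (chrom_length + window - 1) // window
    counts = [0] * n_windows
    for pos in het_snp_positions:
        counts[pos // window] += 1

    blocks = []
    for i, count in enumerate(counts):
        start = i * window
        end = min(start + window, chrom_length)  # the last window may be shorter
        density = count / ((end - start) / 1000)  # SNPs per kbp
        if density <= max_snps_per_kbp:
            if blocks and blocks[-1][1] == start:
                blocks[-1] = (blocks[-1][0], end)  # extend the adjacent block
            else:
                blocks.append((start, end))
    return blocks


# Hypothetical 4.5 kbp chromosome: dense SNPs in the first and fourth windows.
snps = list(range(0, 1000, 100)) + [2500] + list(range(3000, 3960, 80))
blocks = candidate_loh_blocks(snps, chrom_length=4500)
```

A real analysis would also need to check read coverage before calling a block, since a region with no called SNPs may simply be poorly covered.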
Lacruz-Pleguezuelos B, Piette O, Garranzo M, Pérez-Serrano D, Milešević J, Espinosa-Salinas I, Ramírez de Molina A, Laguna T, Carrillo de Santa Pau E.
Database (Oxford). 2023; 2023
DOI: 10.1093/database/baad075
Abstract
Food-drug interactions (FDIs) occur when a food item alters the pharmacokinetics or pharmacodynamics of a drug. FDIs can be clinically relevant, as they can hamper or enhance the therapeutic effects of a drug and impact both its efficacy and its safety. However, knowledge of FDIs in clinical practice is limited. This is partially due to the lack of resources focused on FDIs. Here, we describe FooDrugs, a database that centralizes FDI knowledge retrieved from two different approaches: a natural language processing pipeline that extracts potential FDIs from scientific documents and clinical trials and a molecular similarity approach based on the comparison of gene expression alterations caused by foods and drugs. The FooDrugs database stores a total of 3 430 062 potential FDIs, with 1 108 429 retrieved from scientific documents and 2 321 633 inferred from molecular data. This resource aims to provide researchers and clinicians with a centralized repository for potential FDI information that is free and easy to use. Database URL: https://zenodo.org/records/8192515 Database DOI: https://doi.org/10.5281/zenodo.6638469.
Del Olmo V, Mixão V, Fotedar R, Saus E, Al Malki A, Księżopolska E, Nunez-Rodriguez JC, Boekhout T, Gabaldón T.
Nat Commun. 2023; 14 (1)
DOI: 10.1038/s41467-023-42679-4
Abstract
Hybridisation is a common event in yeasts often leading to genomic variability and adaptation. The yeast Candida orthopsilosis is a human-associated opportunistic pathogen belonging to the Candida parapsilosis species complex. Most C. orthopsilosis clinical isolates are hybrids resulting from at least four independent crosses between two parental lineages, of which only one has been identified. The rare presence or total absence of parentals amongst clinical isolates is hypothesised to be a consequence of a reduced pathogenicity with respect to their hybrids. Here, we sequence and analyse the genomes of environmental C. orthopsilosis strains isolated from warm marine ecosystems. We find that a majority of environmental isolates are hybrids, phylogenetically closely related to hybrid clinical isolates. Furthermore, we identify the missing parental lineage, thus providing a more complete overview of the genomic evolution of this species. Additionally, we discover phenotypic differences between the two parental lineages, as well as between parents and hybrids, under conditions relevant for pathogenesis. Our results suggest a marine origin of C. orthopsilosis hybrids, with intrinsic pathogenic potential, and pave the way to identify pre-existing environmental adaptations that rendered hybrids more prone than parental lineages to colonise and infect the mammalian host.
Loucera C, Carmona R, Esteban-Medina M, Bostelmann G, Muñoyerro-Muñiz D, Villegas R, Peña-Chilet M, Dopazo J.
Virol J. 2023; 20 (1)
DOI: 10.1186/s12985-023-02195-9
Abstract
Purpose
Despite the extensive vaccination campaigns in many countries, COVID-19 is still a major worldwide health problem because of its associated morbidity and mortality. Therefore, finding efficient treatments as fast as possible is a pressing need. Drug repurposing constitutes a convenient alternative when the need for new drugs in an unexpected medical scenario is urgent, as is the case with COVID-19.
Methods
Using data from a central registry of electronic health records (the Andalusian Population Health Database), the effect of prior consumption of drugs for other indications previous to the hospitalization with respect to patient outcomes, including survival and lymphocyte progression, was studied on a retrospective cohort of 15,968 individuals, comprising all COVID-19 patients hospitalized in Andalusia between January and November 2020.
Results
Covariate-adjusted hazard ratios and analysis of lymphocyte progression curves support a significant association between consumption of 21 different drugs and better patient survival. In contrast, one drug, furosemide, was associated with a significant increase in patient mortality.
Conclusions
In this study we have taken advantage of the availability of a regional clinical database to study the effect of drugs, which patients were taking for other indications, on their survival. The large size of the database allowed us to control covariates effectively.
Alvarez-Romero C, Martínez-García A, Bernabeu-Wittel M, Parra-Calderón CL.
Health Res Policy Syst. 2023; 21 (1)
DOI: 10.1186/s12961-023-01026-1
Abstract
Background
Digital transformation in healthcare and the growth of health data generation and collection are important challenges for the secondary use of healthcare records in the health research field. Likewise, due to the ethical and legal constraints for using sensitive data, understanding how health data are managed by dedicated infrastructures called data hubs is essential to facilitating data sharing and reuse.
Methods
To capture the different data governance behind health data hubs across Europe, a survey focused on analysing the feasibility of linking individual-level data between data collections and the generation of health data governance patterns was carried out. The target audience of this study was national, European, and global data hubs. In total, the designed survey was sent to a representative list of 99 health data hubs in January 2022.
Results
In total, 41 survey responses received until June 2022 were analysed. Stratification methods were performed to cover the different levels of granularity identified in some data hubs' characteristics. Firstly, a general pattern of data governance for data hubs was defined. Afterward, specific profiles were defined, generating specific data governance patterns through the stratifications in terms of the kind of organization (centralized versus decentralized) and role (data controller or data processor) of the health data hub respondents.
Conclusions
The analysis of the responses from health data hub respondents across Europe provided a list of the most frequent aspects, which concluded with a set of specific best practices on data management and governance, taking into account the constraints of sensitive data. In summary, a data hub should work in a centralized way, providing a Data Processing Agreement and a formal procedure to identify data providers, as well as data quality control, data integrity and anonymization methods.
Jiménez-Santos MJ, Nogueira-Rodríguez A, Piñeiro-Yáñez E, López-Fernández H, García-Martín S, Gómez-Plana P, Reboiro-Jato M, Gómez-López G, Glez-Peña D, Al-Shahrour F.
Nucleic Acids Res. 2023; 51 (W1)
DOI: 10.1093/nar/gkad412
Abstract
Genomics studies routinely confront researchers with long lists of tumor alterations detected in patients. Such lists are difficult to interpret since only a minority of the alterations are relevant biomarkers for diagnosis and for designing therapeutic strategies. PanDrugs is a methodology that facilitates the interpretation of tumor molecular alterations and guides the selection of personalized treatments. To do so, PanDrugs scores gene actionability and drug feasibility to provide a prioritized evidence-based list of drugs. Here, we introduce PanDrugs2, a major upgrade of PanDrugs that, in addition to somatic variant analysis, supports a new integrated multi-omics analysis which simultaneously combines somatic and germline variants, copy number variation and gene expression data. Moreover, PanDrugs2 now considers cancer genetic dependencies to extend tumor vulnerabilities providing therapeutic options for untargetable genes. Importantly, a novel intuitive report to support clinical decision-making is generated. PanDrugs database has been updated, integrating 23 primary sources that support >74K drug-gene associations obtained from 4642 genes and 14 659 unique compounds. The database has also been reimplemented to allow semi-automatic updates to facilitate maintenance and release of future versions. PanDrugs2 does not require login and is freely available at https://www.pandrugs.org/.
Alonso-Moreda N, Berral-González A, De La Rosa E, González-Velasco O, Sánchez-Santos JM, De Las Rivas J.
Int J Mol Sci. 2023; 24 (13)
DOI: 10.3390/ijms241310765
Abstract
In the last two decades, many detailed full transcriptomic studies on complex biological samples have been published and included in large gene expression repositories. These studies primarily provide a bulk expression signal for each sample, including multiple cell-types mixed within the global signal. The cellular heterogeneity in these mixtures does not allow the activity of specific genes in specific cell types to be identified. Therefore, inferring relative cellular composition is a very powerful tool to achieve a more accurate molecular profiling of complex biological samples. In recent decades, computational techniques have been developed to solve this problem by applying deconvolution methods, designed to decompose cell mixtures into their cellular components and calculate the relative proportions of these elements. Some of them only calculate the cell proportions (supervised methods), while other deconvolution algorithms can also identify the gene signatures specific for each cell type (unsupervised methods). In this work, five deconvolution methods (CIBERSORT, FARDEEP, DECONICA, LINSEED and ABIS) were implemented and used to analyze blood and immune cells, and also cancer cells, in complex mixture samples (using three bulk expression datasets). Our study provides three analytical tools (corrplots, cell-signature plots and bar-mixture plots) that allow a thorough comparative analysis of the cell mixture data. The work indicates that CIBERSORT is a robust method optimized for the identification of immune cell-types, but not as efficient in the identification of cancer cells. We also found that LINSEED is a very powerful unsupervised method that provides precise and specific gene signatures for each of the main immune cell types tested: neutrophils and monocytes (of the myeloid lineage), B-cells, NK cells and T-cells (of the lymphoid lineage), and also for cancer cells.
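As a rough illustration of what a supervised (reference-based) deconvolution does, the sketch below estimates cell-type proportions from a bulk profile and a known signature matrix by non-negative least squares, solved with projected gradient descent. It does not reproduce any of the five methods named in the abstract. The function name, the toy signature values and the iteration count are assumptions.

```python
def deconvolve(signature, bulk, iterations=2000):
    """Estimate non-negative proportions p such that signature * p approximates bulk.

    signature: list of rows (genes), each a list of values per cell type.
    bulk: list with one bulk expression value per gene.
    Returns proportions normalised to sum to 1.
    """
    n_genes, n_types = len(signature), len(signature[0])
    # Step size 1 / trace(S^T S); the trace bounds the largest eigenvalue of S^T S.
    step = 1.0 / sum(v * v for row in signature for v in row)
    p = [1.0 / n_types] * n_types
    for _ in range(iterations):
        residual = [sum(row[k] * p[k] for k in range(n_types)) - bulk[g]
                    for g, row in enumerate(signature)]
        gradient = [sum(signature[g][k] * residual[g] for g in range(n_genes))
                    for k in range(n_types)]
        # Gradient step followed by projection onto the constraint p >= 0.
        p = [max(0.0, p[k] - step * gradient[k]) for k in range(n_types)]
    total = sum(p)
    return [v / total for v in p] if total > 0 else p


# Toy example: three genes, two cell types, bulk mixed at 30% / 70%.
toy_signature = [[10.0, 1.0], [1.0, 10.0], [5.0, 5.0]]
toy_bulk = [0.3 * 10.0 + 0.7 * 1.0, 0.3 * 1.0 + 0.7 * 10.0, 5.0]
proportions = deconvolve(toy_signature, toy_bulk)
```

Unsupervised methods differ in that the signature matrix itself is unknown and must be inferred together with the proportions.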
Pérez-Díez I, Andreu Z, Hidalgo MR, Perpiñá-Clérigues C, Fantín L, Fernandez-Serra A, de la Iglesia-Vaya M, Lopez-Guerrero JA, García-García F.
Cancers (Basel). 2023; 15 (11)
DOI: 10.3390/cancers15112887
Abstract
Pancreatic ductal adenocarcinoma (PDAC) prognoses and treatment responses remain devastatingly poor due partly to the highly heterogeneous, aggressive, and immunosuppressive nature of this tumor type. The intricate relationship between the stroma, inflammation, and immunity remains poorly understood in the PDAC microenvironment. Here, we performed a meta-analysis of stroma- and immune-related gene expression in the PDAC microenvironment to improve disease prognosis and therapeutic development. We selected 21 PDAC studies from the Gene Expression Omnibus and ArrayExpress databases, including 922 samples (320 controls and 602 cases). Differential gene enrichment analysis identified 1153 significantly dysregulated genes in PDAC patients that contribute to a desmoplastic stroma and an immunosuppressive environment (the hallmarks of PDAC tumors). The results highlighted two gene signatures related to the immune and stromal environments that cluster PDAC patients into high- and low-risk groups, impacting patients' stratification and therapeutic decision making. Moreover, the immune genes HCP5, SLFN13, IRF9, IFIT2, and IFI35 are linked to the prognosis of PDAC patients for the first time.
Martínez-García A, Alvarez-Romero C, Román-Villarán E, Bernabeu-Wittel M, Luis Parra-Calderón C.
Heliyon. 2023; 9 (5)
DOI: 10.1016/j.heliyon.2023.e15733
Abstract
Background
The FAIR principles, under the open science paradigm, aim to improve the Findability, Accessibility, Interoperability and Reusability of digital data. In this sense, the FAIR4Health project aimed to apply the FAIR principles in the health research field. For this purpose, a workflow and a set of tools were developed to apply FAIR principles in health research datasets, and validated through the demonstration of the potential impact that this strategy has on health research management outcomes.
Objective
This paper aims to describe the analysis of the impact on health research management outcomes of the FAIR4Health solution.
Methods
To analyse the impact on health research management outcomes in terms of time and economic savings, a survey was designed and sent to experts on data management with expertise in the use of the FAIR4Health solution. Then, differences between the time and costs needed to perform the techniques with (i) standalone research, and (ii) using the proposed solution, were analyzed.
Results
In the context of the health research management outcomes, the survey analysis concluded that 56.57% of the time and 16800 EUR per month could be saved if the FAIR4Health solution is used.
Conclusions
Adopting FAIR principles in health research through the FAIR4Health solution saves time and, consequently, costs in the execution of research involving data management techniques.
Daneshnia F, de Almeida Júnior JN, Ilkit M, Lombardi L, Perry AM, Gao M, Nobile CJ, Egger M, Perlin DS, Zhai B, Hohl TM, Gabaldón T, Colombo AL, Hoenigl M, Arastehfar A.
Lancet Microbe. 2023; 4 (6)
DOI: 10.1016/s2666-5247(23)00067-8
Abstract
Candida parapsilosis is one of the most common causes of life-threatening candidaemia, particularly in premature neonates, individuals with cancer of the haematopoietic system, and recipients of organ transplants. Historically, drug-susceptible strains have been linked to clonal outbreaks. However, studies conducted worldwide since 2018 have reported severe outbreaks among adults caused by fluconazole-resistant strains. Outbreaks caused by fluconazole-resistant strains are associated with high mortality rates and can persist despite strict infection control strategies. The emergence of resistance threatens the efficacy of azoles, which are the most widely used class of antifungals and the only available oral treatment option for candidaemia. The fact that most patients infected with fluconazole-resistant strains are azole-naive underscores the high potential adaptability of fluconazole-resistant strains to diverse hosts, environmental niches, and reservoirs. Another concern is the multidrug-resistant and echinocandin-tolerant C parapsilosis isolates, which emerged in 2020. Raising awareness, establishing effective clinical interventions, and understanding the biology and pathogenesis of fluconazole-resistant C parapsilosis are urgently needed to improve treatment strategies and outcomes.
Perpiñá-Clérigues C, Mellado S, Català-Senent JF, Ibáñez F, Costa P, Marcos M, Guerri C, García-García F, Pascual M.
Biol Sex Differ. 2023; 14 (1)
DOI: 10.1186/s13293-023-00502-1
Abstract
Background
Lipids represent essential components of extracellular vesicles (EVs), playing structural and regulatory functions during EV biogenesis, release, targeting, and cell uptake. Importantly, lipidic dysregulation has been linked to several disorders, including metabolic syndrome, inflammation, and neurological dysfunction. Our recent results demonstrated the involvement of plasma EV microRNAs as possible amplifiers and biomarkers of neuroinflammation and brain damage induced by ethanol intoxication during adolescence. Considering the possible role of plasma EV lipids as regulatory molecules and biomarkers, we evaluated how acute ethanol intoxication differentially affected the lipid composition of plasma EVs in male and female adolescents and explored the participation of the immune response.
Methods
Plasma EVs were extracted from humans and wild-type (WT) and Toll-like receptor 4 deficient (TLR4-KO) mice. Preprocessing and exploratory analyses were conducted after the extraction of EV lipids and data acquisition by mass spectrometry. Comparisons between ethanol-intoxicated and control human female and male individuals and ethanol-treated and untreated WT and TLR4-KO female and male mice were used to analyze the differential abundance of lipids. Annotation of lipids into their corresponding classes and a lipid set enrichment analysis were carried out to evaluate biological functions.
Results
We demonstrated, for the first time, that acute ethanol intoxication induced a higher enrichment of distinct plasma EV lipid species in human female adolescents than in males. We observed a higher content of the PA, LPC, unsaturated FA, and FAHFA lipid classes in females, whereas males showed enrichment in PI. These lipid classes participate in the formation, release, and uptake of EVs and the activation of the immune response. Moreover, we observed changes in EV lipid composition between ethanol-treated WT and TLR4-KO mice (e.g., enrichment of glycerophosphoinositols in ethanol-treated WT males), and the sex-based differences in lipid abundance are more notable in WT mice than in TLR4-KO mice. All data and results generated have been made openly available on a web-based platform (http://bioinfo.cipf.es/sal).
Conclusions
Our results suggest that binge ethanol drinking in human female adolescents leads to a higher content of plasma EV lipid species associated with EV biogenesis and the propagation of neuroinflammatory responses than in males. In addition, we discovered greater differences in lipid abundance between sexes in WT mice compared to TLR4-KO mice. Our findings also support the potential use of EV-enriched lipids as biomarkers of ethanol-induced neuroinflammation during adolescence.
Guaita-Cespedes M, Grillo-Risco R, Hidalgo MR, Fernández-Veledo S, Burks DJ, de la Iglesia-Vayá M, Galán A, Garcia-Garcia F.
Biol Sex Differ. 2023; 14 (1)
DOI: 10.1186/s13293-023-00506-x
Abstract
Background
As the housekeeping genes (HKG) generally involved in maintaining essential cell functions are typically assumed to exhibit constant expression levels across cell types, they are commonly employed as internal controls in gene expression studies. Nevertheless, the expression profile of HKG may vary according to different variables, introducing systematic errors into experimental results. Sex bias can indeed affect expression; however, to date, sex has not typically been considered as a biological variable.
Methods
In this study, we evaluate the expression profiles of six classical housekeeping genes (four metabolic: GAPDH, HPRT, PPIA, and UBC, and two ribosomal: 18S and RPL19) to determine expression stability in adipose tissues (AT) of Homo sapiens and Mus musculus and check sex bias and their overall suitability as internal controls. We also assess the expression stability of all genes included in distinct whole-transcriptome microarrays available from the Gene Expression Omnibus database to identify sex-unbiased housekeeping genes (suHKG) suitable for use as internal controls. We apply a novel computational strategy based on meta-analysis techniques to identify any sexual dimorphisms in mRNA expression stability in AT and to properly validate potential candidates.
Results
Just over half of the considered studies properly reported the sex of the human samples; however, not enough female mouse samples were found to be included in this analysis. We found differences in the HKG expression stability in humans between female and male samples, with females presenting greater instability. We propose a suHKG signature including experimentally validated classical HKG like PPIA and RPL19 and novel potential markers for human AT and discarding others like the extensively used 18S gene due to its sex-based variability in adipose tissue. Orthologs have also been assayed and proposed for a mouse WAT suHKG signature. All results generated during this study are readily available by accessing an open web resource (https://bioinfo.cipf.es/metafun-HKG) for consultation and reuse in further studies.
Conclusions
This sex-based research proves that certain classical housekeeping genes fail to function adequately as controls when analyzing human adipose tissue considering sex as a variable. We confirm RPL19 and PPIA suitability as sex-unbiased human and mouse housekeeping genes derived from sex-specific expression profiles, and propose new ones such as RPS8 and UBB.
Çubuk C, Loucera C, Peña-Chilet M, Dopazo J.
Int J Mol Sci. 2023; 24 (8)
DOI: 10.3390/ijms24087450
Abstract
The reprogramming of metabolism is a recognized cancer hallmark. It is well known that different signaling pathways regulate and orchestrate this reprogramming that contributes to cancer initiation and development. However, recent evidence is accumulating, suggesting that several metabolites could play a relevant role in regulating signaling pathways. To assess the potential role of metabolites in the regulation of signaling pathways, both metabolic and signaling pathway activities of Breast invasive Carcinoma (BRCA) have been modeled using mechanistic models. Gaussian Processes, powerful machine learning methods, were used in combination with SHapley Additive exPlanations (SHAP), a recent methodology that conveys causality, to obtain potential causal relationships between the production of metabolites and the regulation of signaling pathways. A total of 317 metabolites were found to have a strong impact on signaling circuits. The results presented here point to a crosstalk between signaling and metabolic pathways that is more complex than previously thought.
Rodi M, Gross C, Sandri TL, Berner L, Marcet-Houben M, Kocak E, Pogoda M, Casadei N, Köhler C, Kreidenweiss A, Agnandji ST, Gabaldón T, Ossowski S, Held J.
Front Cell Infect Microbiol. 2023; 13
DOI: 10.3389/fcimb.2023.1159814
Abstract
Introduction
Mansonella species are filarial parasites that infect humans worldwide. Although these infections are common, knowledge of the pathology and diversity of the causative species is limited. Furthermore, the lack of sequencing data for Mansonella species shows that their research is neglected. Apart from Mansonella perstans, a potential new species called Mansonella sp "DEUX" has been identified in Gabon, which is prevalent at high frequencies. We aimed to further determine if Mansonella sp "DEUX" is a genotype of M. perstans, or if these are two sympatric species.
Methods
We screened individuals in the area of Fougamou, Gabon for Mansonella mono-infections and generated de novo assemblies from the respective samples. For evolutionary analysis, a phylogenetic tree was reconstructed, and the differences and divergence times are presented. In addition, mitogenomes were generated and phylogenies based on 12S rDNA and cox1 were created.
Results
We successfully generated whole genomes for M. perstans and Mansonella sp "DEUX". Phylogenetic analysis based on annotated protein sequences supports the hypothesis of two distinct species. The inferred evolutionary analysis suggested that M. perstans and Mansonella sp "DEUX" separated around 778,000 years ago. Analysis based on mitochondrial marker genes supports our hypothesis of two sympatric human Mansonella species.
Discussion
The results presented indicate that Mansonella sp "DEUX" is a new Mansonella species. These findings reflect the neglect of this research topic, and the availability of whole genome data will allow further investigations of these species.
Gundogdu P, Alamo I, Nepomuceno-Chamorro IA, Dopazo J, Loucera C.
Biology (Basel). 2023; 12 (4)
DOI: 10.3390/biology12040579
Abstract
Single-cell RNA sequencing is increasing our understanding of the behavior of complex tissues or organs, by providing unprecedented details on the complex cell type landscape at the level of individual cells. Cell type definition and functional annotation are key steps to understanding the molecular processes behind the underlying cellular communication machinery. However, the exponential growth of scRNA-seq data has made the task of manually annotating cells unfeasible, due not only to an unparalleled resolution of the technology but also to an ever-increasing heterogeneity of the data. Many supervised and unsupervised methods have been proposed to automatically annotate cells. Supervised approaches for cell-type annotation outperform unsupervised methods except when new (unknown) cell types are present. Here, we introduce SigPrimedNet, an artificial neural network approach that leverages (i) efficient training by means of a sparsity-inducing signaling circuits-informed layer, (ii) feature representation learning through supervised training, and (iii) unknown cell-type identification by fitting an anomaly detection method on the learned representation. We show that SigPrimedNet can efficiently annotate known cell types while keeping a low false-positive rate for unseen cells across a set of publicly available datasets. In addition, the learned representation acts as a proxy for signaling circuit activity measurements, which provide useful estimations of the cell functionalities.
Upchurch S, Palumbo E, Adams J, Bujold D, Bourque G, Nedzel J, Graham K, Kagda MS, Assis P, Hitz B, Righi E, Guigó R, Wold BJ, GA4GH RNA-Seq Task Team.
Bioinformatics. 2023; 39 (4)
DOI: 10.1093/bioinformatics/btad126
Abstract
Summary
Large-scale sharing of genomic quantification data requires standardized access interfaces. In this Global Alliance for Genomics and Health project, we developed RNAget, an API for secure access to genomic quantification data in matrix form. RNAget provides for slicing matrices to extract desired subsets of data and is applicable to all expression matrix-format data, including RNA sequencing and microarrays. Further, it generalizes to quantification matrices of other sequence-based genomics such as ATAC-seq and ChIP-seq.
Availability and implementation
https://ga4gh-rnaseq.github.io/schema/docs/index.html.
Bueno-Fortes S, Berral-Gonzalez A, Sánchez-Santos JM, Martin-Merino M, De Las Rivas J.
Bioinform Adv. 2023; 3 (1)
DOI: 10.1093/bioadv/vbad037
Abstract
Motivation
Modern genomic technologies allow us to perform genome-wide analysis to find gene markers associated with the risk and survival in cancer patients. Accurate risk prediction and patient stratification based on robust gene signatures is a key path forward in personalized treatment and precision medicine. Several authors have proposed the identification of gene signatures to assign risk in patients with breast cancer (BRCA), and some of these signatures have been implemented within commercial platforms in the clinic, such as Oncotype and Prosigna. However, these platforms are black boxes in which the influence of selected genes as survival markers is unclear and where the risk scores provided cannot be clearly related to the standard clinicopathological tumor markers obtained by immunohistochemistry (IHC), which guide clinical and therapeutic decisions in breast cancer.
Results
Here, we present a framework to discover a robust list of gene expression markers associated with survival that can be biologically interpreted in terms of the three main biomolecular factors (IHC clinical markers: ER, PR and HER2) that define clinical outcome in BRCA. To test and ensure the reproducibility of the results, we compiled and analyzed two independent datasets with a large number of tumor samples (1024 and 879) that include full genome-wide expression profiles and survival data. Using these two cohorts, we obtained a robust subset of gene survival markers that correlate well with the major IHC clinical markers used in breast cancer. The geneset of survival markers that we identify (which includes 34 genes) significantly improves the risk prediction provided by the genesets included in the commercial platforms: Oncotype (16 genes) and Prosigna (50 genes, i.e. PAM50). Furthermore, some of the genes identified have recently been proposed in the literature as new prognostic markers and may deserve more attention in current clinical trials to improve breast cancer risk prediction.
Availability and implementation
All data integrated and analyzed in this research will be available on GitHub (https://github.com/jdelasrivas-lab/breastcancersurvsign), including the R scripts and protocols used for the analyses.
Supplementary information
Supplementary data are available at Bioinformatics Advances online.
Piñero J, Rodriguez Fraga PS, Valls-Margarit J, Ronzano F, Accuosto P, Lambea Jane R, Sanz F, Furlong LI.
Comput Struct Biotechnol J. 2023; 21
DOI: 10.1016/j.csbj.2023.03.014
Abstract
The use of molecular biomarkers to support disease diagnosis, monitor its progression, and guide drug treatment has gained traction in the last decades. While only a dozen biomarkers have been approved for their exploitation in the clinic by the FDA, many more are evaluated in the context of translational research and clinical trials. Furthermore, the information on which biomarkers are measured, for which purpose, and in relation to which conditions is not readily accessible: biomarkers used in clinical studies available through resources such as ClinicalTrials.gov are described as free text, posing significant challenges in finding, analyzing, and processing them by both humans and machines. We present a text mining strategy to identify proteomic and genomic biomarkers used in clinical trials and classify them according to the methodologies by which they are measured. We find more than 3000 biomarkers used in the context of 2600 diseases. By analyzing this dataset, we uncover patterns of use of biomarkers across therapeutic areas over time, including the biomarker type and their specificity. These data are made available at the Clinical Biomarker App at https://www.disgenet.org/biomarkers/, a new portal that enables the exploration of biomarkers extracted from the clinical studies available at ClinicalTrials.gov and enriched with information from the scientific literature. The App features several metrics that assess the specificity of the biomarkers, facilitating their selection and prioritization. Overall, the Clinical Biomarker App is a valuable and timely resource about clinical biomarkers, to accelerate biomarker discovery, development, and application.
López-López D, Roldán G, Fernández-Rueda JL, Bostelmann G, Carmona R, Aquino V, Perez-Florido J, Ortuño F, Pita G, Núñez-Torres R, González-Neira A, CSVS Crowdsourcing Group, Peña-Chilet M, Dopazo J.
Hum Genomics. 2023; 17 (1)
DOI: 10.1186/s40246-023-00466-8
Abstract
Background
Despite being a very common type of genetic variation, the distribution of copy-number variations (CNVs) in the population is still poorly understood. The knowledge of the genetic variability, especially at the level of the local population, is a critical factor for distinguishing pathogenic from non-pathogenic variation in the discovery of new disease variants.
Results
Here, we present the SPAnish Copy Number Alterations Collaborative Server (SPACNACS), which currently contains copy number variation profiles obtained from more than 400 genomes and exomes of unrelated Spanish individuals. By means of a collaborative crowdsourcing effort, whole genome and whole exome sequencing data, produced by local genomic projects and for other purposes, are continuously collected. Once both the Spanish ancestry and the lack of kinship with other individuals in SPACNACS have been checked, the CNVs are inferred for these sequences and used to populate the database. A web interface allows querying the database with different filters that include ICD10 upper categories. This allows discarding samples from the disease under study and obtaining pseudo-control CNV profiles from the local population. We also show here additional studies on the local impact of CNVs in some phenotypes and on pharmacogenomic variants. SPACNACS can be accessed at: http://csvs.clinbioinfosspa.es/spacnacs/.
Conclusion
SPACNACS facilitates disease gene discovery by providing detailed information of the local variability of the population and exemplifies how to reuse genomic data produced for other purposes to build a local reference database.
Núñez-Moreno G, Tamayo A, Ruiz-Sánchez C, Cortón M, Mínguez P.
Hum Genet. 2023; 142 (4)
DOI: 10.1007/s00439-023-02539-z
Abstract
DNA variants altering the pre-mRNA splicing process represent an underestimated cause of human genetic diseases. Their association with disease traits should be confirmed using functional assays from patient cell lines or alternative models to detect aberrant mRNAs. Long-read sequencing is a suitable technique to identify and quantify mRNA isoforms. Available isoform detection and/or quantification tools are generally designed for whole transcriptome analysis. However, experiments focusing on genes of interest need more precise data fine-tuning and visualization tools. Here we describe VIsoQLR, an interactive analyzer, viewer and editor for the semi-automated identification and quantification of known and novel isoforms using long-read sequencing data. VIsoQLR is tailored to thoroughly analyze mRNA expression in splicing assays of selected genes. Our tool takes sequences aligned to a reference, and for each gene, it defines consensus splice sites and quantifies isoforms. VIsoQLR introduces features to edit the splice sites through dynamic and interactive graphics and tables, allowing accurate manual curation. Known isoforms detected by other methods can also be imported as references for comparison. A benchmark against two other popular transcriptome-based tools shows VIsoQLR's accurate performance on both detection and quantification of isoforms. Here, we present VIsoQLR principles and features and its applicability in a case study example using nanopore-based long-read sequencing. VIsoQLR is available at https://github.com/TBLabFJD/VIsoQLR.
Perez-Florido J, Casimiro-Soriguer CS, Ortuño F, Fernandez-Rueda JL, Aguado A, Lara M, Riazzo C, Rodriguez-Iglesias MA, Camacho-Martinez P, Merino-Diaz L, Pupo-Ledo I, de Salazar A, Viñuela L, Fuentes A, Chueca N, The Andalusian Covid-Sequencing Initiative, García F, Dopazo J, Lepe JA.
Int J Mol Sci. 2023; 24 (3)
DOI: 10.3390/ijms24032419
Abstract
Recombination is an evolutionary strategy to quickly acquire new viral properties inherited from the parental lineages. The systematic survey of the SARS-CoV-2 genome sequences of the Andalusian genomic surveillance strategy has allowed the detection of an unexpectedly high number of co-infections, which constitute the ideal scenario for the emergence of new recombinants. Whole genome sequencing of SARS-CoV-2 has been carried out as part of the genomic surveillance programme. Sample sources included the main hospitals in the Andalusia region. In addition to the increase of co-infections and known recombinants, three novel SARS-CoV-2 delta-omicron and omicron-omicron recombinant variants with two break points have been detected. Our observations document an epidemiological scenario in which co-infection and recombination are detected more frequently. Finally, we describe a family case in which co-infection is followed by the detection of a recombinant made from the two co-infecting variants. This increased number of recombinants raises the risk of emergence of recombinant variants with increased transmissibility and pathogenicity.
de la Fuente L, Del Pozo-Valero M, Perea-Romero I, Blanco-Kelly F, Fernández-Caballero L, Cortón M, Ayuso C, Mínguez P.
Int J Mol Sci. 2023; 24 (2)
DOI: 10.3390/ijms24021661
Abstract
Screening for pathogenic variants in the diagnosis of rare genetic diseases can now be performed on all genes thanks to the application of whole exome and genome sequencing (WES, WGS). Yet the repertoire of gene-disease associations is not complete. Several computer-based algorithms and databases integrate distinct gene-gene functional networks to accelerate the discovery of gene-disease associations. We hypothesize that the ability of every type of information to extract relevant insights is disease-dependent. We compiled 33 functional networks classified into 13 knowledge categories (KCs) and observed large variability in their ability to recover genes associated with 91 genetic diseases, as measured using efficiency and exclusivity. We developed GLOWgenes, a network-based algorithm that applies random walk with restart to evaluate KCs' ability to recover genes from a given list associated with a phenotype and modulates the prediction of new candidates accordingly. Comparison with other integration strategies and tools shows that our disease-aware approach can boost the discovery of new gene-disease associations, especially for the less obvious ones. KC contribution also varies if obtained using recently discovered genes. Applied to 15 unsolved WES, GLOWgenes proposed three new genes to be involved in the phenotypes of patients with syndromic inherited retinal dystrophies.
Niarakis A, Ostaszewski M, Mazein A, Kuperstein I, Kutmon M, Gillespie ME, Funahashi A, Acencio ML, Hemedan A, Aichem M, Klein K, Czauderna T, Burtscher F, Yamada TG, Hiki Y, Hiroi NF, Hu F, Pham N, Ehrhart F, Willighagen EL, Valdeolivas A, Dugourd A, Messina F, Esteban-Medina M, Peña-Chilet M, Rian K, Soliman S, Aghamiri SS, Puniya BL, Naldi A, Helikar T, Singh V, Fernández MF, Bermudez V, Tsirvouli E, Montagud A, Noël V, de Leon MP, Maier D, Bauch A, Gyori BM, Bachman JA, Luna A, Pinero J, Furlong LI, Balaur I, Rougny A, Jarosz Y, Overall RW, Phair R, Perfetto L, Matthews L, Rex DAB, Orlic-Milacic M, Cristobal MGL, De Meulder B, Ravel JM, Jassal B, Satagopam V, Wu G, Golebiewski M, Gawron P, Calzone L, Beckmann JS, Evelo CT, D’Eustachio P, Schreiber F, Saez-Rodriguez J, Dopazo J, Kuiper M, Valencia A, Wolkenhauer O, Kitano H, Barillot E, Auffray C, Balling R, Schneider R, the COVID-19 Disease Map Community.
bioRxiv; 2022.
DOI: 10.1101/2022.12.17.520865
Abstract
The COVID-19 Disease Map project is a large-scale community effort uniting 277 scientists from 130 Institutions around the globe. We use high-quality, mechanistic content describing SARS-CoV-2-host interactions and develop interoperable bioinformatic pipelines for novel target identification and drug repurposing. Community-driven and highly interdisciplinary, the project is collaborative and supports community standards, open access, and the FAIR data principles. The coordination of community work allowed for an impressive step forward in building interfaces between Systems Biology tools and platforms. Our framework links key molecules highlighted from broad omics data analysis and computational modeling to dysregulated pathways in a cell-, tissue- or patient-specific manner. We also employ text mining and AI-assisted analysis to identify potential drugs and drug targets and use topological analysis to reveal interesting structural features of the map. The proposed framework is versatile and expandable, offering a significant upgrade in the arsenal used to understand virus-host interactions and other complex pathologies.
López-Cerdán A, Andreu Z, Hidalgo MR, Grillo-Risco R, Català-Senent JF, Soler-Sáez I, Neva-Alejo A, Gordillo F, de la Iglesia-Vayá M, García-García F.
Biol Sex Differ. 2022; 13 (1)
DOI: 10.1186/s13293-022-00477-5
Abstract
Background
In recent decades, increasing longevity (among other factors) has fostered a rise in Parkinson's disease incidence. Although not exhaustively studied in this devastating disease, the impact of sex represents a critical variable in Parkinson's disease as epidemiological and clinical features differ between males and females.
Methods
To study sex bias in Parkinson's disease, we conducted a systematic review to select sex-labeled transcriptomic data from three relevant brain tissues: the frontal cortex, the striatum, and the substantia nigra. We performed differential expression analysis on each study chosen. Then we summarized the individual differential expression results with three tissue-specific meta-analyses and a global all-tissues meta-analysis. Finally, results from the meta-analysis were functionally characterized using different functional profiling approaches.
Results
The tissue-specific meta-analyses linked Parkinson's disease to the enhanced expression of MED31 in the female frontal cortex and the dysregulation of 237 genes in the substantia nigra. The global meta-analysis detected 15 genes with sex-differential patterns in Parkinson's disease, which participate in mitochondrial function, oxidative stress, neuronal degeneration, and cell death. Furthermore, functional analyses identified pathways, protein-protein interaction networks, and transcription factors that differed by sex. While male patients exhibited changes in oxidative stress based on metal ions, inflammation, and angiogenesis, female patients exhibited dysfunctions in mitochondrial and lysosomal activity, antigen processing and presentation functions, and glutamic and purine metabolism. All results generated during this study are readily available by accessing an open web resource (http://bioinfo.cipf.es/metafun-pd/) for consultation and reuse in further studies.
Conclusions
Our in silico approach has highlighted sex-based differential mechanisms in typical Parkinson's disease hallmarks (inflammation, mitochondrial dysfunction, and oxidative stress). Additionally, we have identified specific genes and transcription factors for male and female Parkinson's disease patients that represent potential candidate biomarkers for diagnosis.
Sorzano COS, Vilas JL, Ramírez-Aportela E, Krieger J, Del Hoyo D, Herreros D, Fernandez-Giménez E, Marchán D, Macías JR, Sánchez I, Del Caño L, Fonseca-Reyna Y, Conesa P, García-Mena A, Burguet J, García Condado J, Méndez García J, Martínez M, Muñoz-Barrutia A, Marabini R, Vargas J, Carazo JM.
Faraday Discuss. 2022; 240 (0)
DOI: 10.1039/d2fd00059h
Abstract
The number of maps deposited in public databases (Electron Microscopy Data Bank, EMDB) determined by cryo-electron microscopy has quickly grown in recent years. With this rapid growth, it is critical to guarantee their quality. So far, map validation has primarily focused on the agreement between maps and models. From the image processing perspective, the validation has been mostly restricted to using two half-maps and the measurement of their internal consistency. In this article, we suggest that map validation can be taken much further from the point of view of image processing if 2D classes, particles, angles, coordinates, defoci, and micrographs are also provided. We present a progressive validation scheme that qualifies a result validation status from 0 to 5 and offers three optional qualifiers (A, W, and O) that can be added. The simplest validation state is 0, while the most complete would be 5AWO. This scheme has been implemented in a website https://biocomp.cnb.csic.es/EMValidationService/ to which reconstructed maps and their ESI can be uploaded.
Gabaldón T, Hittinger CT.
Front Fungal Biol. 2022; 3
DOI: 10.3389/ffunb.2022.1063609
Marcet-Houben M, Alvarado M, Ksiezopolska E, Saus E, de Groot PWJ, Gabaldón T.
BMC Biol. 2022; 20 (1)
DOI: 10.1186/s12915-022-01412-1
Abstract
Background
Candida glabrata is an opportunistic yeast pathogen thought to have a large genetic and phenotypic diversity and a highly plastic genome. However, the lack of chromosome-level genome assemblies representing this diversity limits our ability to accurately establish how chromosomal structure and gene content vary across strains.
Results
Here, we expanded publicly available assemblies by using long-read sequencing technologies in twelve diverse strains, obtaining a final set of twenty-one chromosome-level genomes spanning the known C. glabrata diversity. Using comparative approaches, we inferred variation in chromosome structure and determined the pan-genome, including an analysis of the adhesin gene repertoire. Our analysis uncovered four new adhesin orthogroups and inferred a rich ancestral adhesion repertoire, which was subsequently shaped through a still ongoing process of gene loss, gene duplication, and gene conversion.
Conclusions
C. glabrata has a largely stable pan-genome except for a highly variable subset of genes encoding cell wall-associated functions. Adhesin repertoire was established for each strain and showed variability among clades.
Pérez-Granado J, Piñero J, Furlong LI.
Front Genet. 2022; 13
DOI: 10.3389/fgene.2022.1006903
Abstract
Our knowledge of complex disorders has increased in the last years thanks to the identification of genetic variants (GVs) significantly associated with disease phenotypes by genome-wide association studies (GWAS). However, we do not understand yet how these GVs functionally impact disease pathogenesis or their underlying biological mechanisms. Among the multiple post-GWAS methods available, fine-mapping and colocalization approaches are commonly used to identify causal GVs, meaning those with a biological effect on the trait, and their functional effects. Despite the variety of post-GWAS tools available, there is no guideline for method eligibility or validity, even though these methods work under different assumptions when accounting for linkage disequilibrium and integrating molecular annotation data. Moreover, there is no benchmarking of the available tools. In this context, we have applied two different fine-mapping and colocalization methods to the same GWAS on major depression (MD) and expression quantitative trait loci (eQTL) datasets. Our goal is to perform a systematic comparison of the results obtained by the different tools. To that end, we have evaluated their results at different levels: fine-mapped and colocalizing GVs, their target genes and tissue specificity according to gene expression information, as well as the biological processes in which they are involved. Our findings highlight the importance of fine-mapping as a key step for subsequent analysis. Notably, the colocalizing variants, altered genes and targeted tissues differed between methods, even regarding their biological implications. This contribution illustrates an important issue in post-GWAS analysis with relevant consequences on the use of GWAS results for elucidation of disease pathobiology, drug target prioritization and biomarker discovery.
Naranjo-Ortiz MA, Molina M, Fuentes D, Mixão V, Gabaldón T.
Gigascience. 2022; 11
DOI: 10.1093/gigascience/giac088
Abstract
Background
Recent technological developments have made genome sequencing and assembly highly accessible and widely used. However, the presence in sequenced organisms of certain genomic features such as high heterozygosity, polyploidy, aneuploidy, heterokaryosis, or extreme compositional biases can challenge current standard assembly procedures and result in highly fragmented assemblies. Hence, we hypothesized that genome databases must contain a nonnegligible fraction of low-quality assemblies that result from such intrinsic genomic factors.
Findings
Here we present Karyon, a Python-based toolkit that uses raw sequencing data and de novo genome assembly to assess several parameters and generate informative plots to assist in the identification of noncanonical genomic traits. Karyon includes automated de novo genome assembly and variant calling pipelines. We tested Karyon by diagnosing 35 highly fragmented publicly available assemblies from 19 different Mucorales (Fungi) species.
Conclusions
Our results show that 10 (28.57%) of the assemblies presented signs of unusual genomic configurations, suggesting that these are common, at least for some lineages within the Fungi.
Loucera C, Perez-Florido J, Casimiro-Soriguer CS, Ortuño FM, Carmona R, Bostelmann G, Martínez-González LJ, Muñoyerro-Muñiz D, Villegas R, Rodriguez-Baño J, Romero-Gomez M, Lorusso N, Garcia-León J, Navarro-Marí JM, Camacho-Martinez P, Merino-Diaz L, Salazar A, Viñuela L, The Andalusian Covid-Sequencing Initiative, Lepe JA, Garcia F, Dopazo J.
Viruses. 2022; 14 (9)
DOI: 10.3390/v14091893
Abstract
Objectives
More than two years into the COVID-19 pandemic, SARS-CoV-2 still remains a global public health problem. Successive waves of infection have produced new SARS-CoV-2 variants with new mutations for which the impact on COVID-19 severity and patient survival is uncertain.
Methods
A total of 764 SARS-CoV-2 genomes, sequenced from COVID-19 patients hospitalized from 19 February 2020 to 30 April 2021, along with their clinical data, were used for survival analysis.
Results
A significant association of B.1.1.7, the alpha lineage, with patient mortality (log hazard ratio (LHR) = 0.51, C.I. = [0.14,0.88]) was found upon adjustment by all the covariates known to affect COVID-19 prognosis. Moreover, survival analysis of mutations in the SARS-CoV-2 genome revealed that 27 of them were significantly associated with higher mortality of patients. Most of these mutations were located in the genes coding for the S, ORF8, and N proteins.
Conclusions
This study illustrates how a combination of genomic and clinical data can provide solid evidence for the impact of viral lineage on patient survival.
Jiménez-Santos MJ, García-Martín S, Fustero-Torre C, Di Domenico T, Gómez-López G, Al-Shahrour F.
Mol Oncol. 2022; 16 (21)
DOI: 10.1002/1878-0261.13286
Abstract
Tumour heterogeneity is one of the main characteristics of cancer and can be categorised into inter- or intratumour heterogeneity. This heterogeneity has been revealed as one of the key causes of treatment failure and relapse. Precision oncology is an emerging field that seeks to design tailored treatments for each cancer patient according to epidemiological, clinical and omics data. This discipline relies on bioinformatics tools designed to compute scores to prioritise available drugs, with the aim of helping clinicians in treatment selection. In this review, we describe the current approaches for therapy selection depending on which type of tumour heterogeneity is being targeted and the available next-generation sequencing data. We cover intertumour heterogeneity studies and individual treatment selection using genomics variants, expression data or multi-omics strategies. We also describe intratumour dissection through clonal inference and single-cell transcriptomics, in each case providing bioinformatics tools for tailored treatment selection. Finally, we discuss how these therapy selection workflows could be integrated into the clinical practice.
Loucera C, Carmona R, Esteban-Medina M, Bostelmann G, Muñoyerro-Muñiz D, Villegas R, Peña-Chilet M, Dopazo J.
medRxiv; 2022.
DOI: 10.1101/2022.08.14.22278751
Abstract
Despite the extensive vaccination campaigns in many countries, COVID-19 is still a major worldwide health problem because of its associated morbidity and mortality. Therefore, finding efficient treatments as fast as possible is a pressing need. Drug repurposing constitutes a convenient alternative when the need for new drugs in an unexpected medical scenario is urgent, as is the case with COVID-19. Using data from a central registry of electronic health records (the Andalusian Population Health Database, BPS), the effect on patient survival of consumption of drugs for other indications prior to hospitalization was studied in a retrospective cohort of 15,968 individuals, comprising all COVID-19 patients hospitalized in Andalusia between January and November 2020. Covariate-adjusted hazard ratios and analysis of lymphocyte progression curves support a significant association between consumption of 21 different drugs and better patient survival. In contrast, one drug, furosemide, was associated with a significant increase in patient mortality.
Snyder M, Iraola-Guzmán S, Saus E, Gabaldón T.
Cancers (Basel). 2022; 14 (16)
DOI: 10.3390/cancers14163866
Abstract
Colorectal cancer (CRC) is the third most prevalent cancer worldwide, with nearly two million newly diagnosed cases each year. The survival of patients with CRC greatly depends on the cancer stage at the time of diagnosis, with worse prognosis for more advanced cases. Consequently, considerable effort has been directed towards improving population screening programs for early diagnosis and identifying prognostic markers that can better inform treatment strategies. In recent years, long non-coding RNAs (lncRNAs) have been recognized as promising molecules, with diagnostic and prognostic potential in many cancers, including CRC. Although large-scale genome and transcriptome sequencing surveys have identified many lncRNAs that are altered in CRC, most of their roles in disease onset and progression remain poorly understood. Here, we critically review the variety of detection methods and types of supporting evidence for the involvement of lncRNAs in CRC. In addition, we provide a reference catalog that features the most clinically relevant lncRNAs in CRC. These lncRNAs were selected based on recent studies sorted by stringent criteria for both supporting experimental evidence and reproducibility.
Iancu IF, Perea-Romero I, Núñez-Moreno G, de la Fuente L, Romero R, Ávila-Fernandez A, Trujillo-Tiebas MJ, Riveiro-Álvarez R, Almoguera B, Martín-Mérida I, Del Pozo-Valero M, Damián-Verde A, Cortón M, Ayuso C, Minguez P.
Int J Mol Sci. 2022; 23 (15)
DOI: 10.3390/ijms23158431
Abstract
The introduction of NGS in genetic diagnosis has increased the repertoire of variants and genes involved and the amount of genomic information produced. We built an allelic-frequency (AF) database for a heterogeneous cohort of genetic diseases to explore the aggregated genomic information and boost diagnosis in inherited retinal dystrophies (IRD). We retrospectively selected 5683 index-cases with clinical exome sequencing tests available, 1766 with IRD and the rest with diverse genetic diseases. We calculated a subcohort's IRD-specific AF and compared it with suitable pseudocontrols. For non-solved IRD cases, we prioritized variants with a significant increment of frequencies, with eight variants that may help to explain the phenotype, and 10/11 of uncertain significance that were reclassified as probably pathogenic according to ACMG. Moreover, we developed a method to highlight genes with more frequent pathogenic variants in IRD cases than in pseudocontrols weighted by the increment of benign variants in the same comparison. We identified 18 genes for further studies that provided new insights in five cases. This resource can also help one to calculate the carrier frequency in IRD genes. A cohort-specific AF database assists with variants and genes prioritization and operates as an engine that provides a new hypothesis in non-solved cases, augmenting the diagnosis rate.
Moya-García AA, González-Jiménez A, Moreno F, Stephens C, Lucena MI, Ranea JAG.
Genes (Basel). 2022; 13 (7)
DOI: 10.3390/genes13071292
Abstract
Among adverse drug reactions, drug-induced liver injury presents particular challenges because of its complexity, and the underlying mechanisms are still not completely characterized. Our knowledge of the topic is limited and based on the assumption that a drug acts on one molecular target. We have leveraged drug polypharmacology, i.e., the ability of a drug to bind multiple targets and thus perturb several biological processes, to develop a systems pharmacology platform that integrates all drug-target interactions. Our analysis sheds light on the molecular mechanisms of drugs involved in drug-induced liver injury and provides new hypotheses to study this phenomenon.
Pérez-Granado J, Piñero J, Medina-Rivera A, Furlong LI.
Genes (Basel). 2022; 13 (7)
DOI: 10.3390/genes13071259
Abstract
Understanding the molecular basis of major depression is critical for identifying new potential biomarkers and drug targets to alleviate its burden on society. Leveraging available GWAS data and functional genomic tools to assess regulatory variation could help explain the role of major depression-associated genetic variants in disease pathogenesis. We have conducted a fine-mapping analysis of genetic variants associated with major depression and applied a pipeline focused on gene expression regulation by using two complementary approaches: cis-eQTL colocalization analysis and alteration of transcription factor binding sites. The fine-mapping process uncovered putative causally associated variants whose proximal genes were linked with major depression pathophysiology. Four colocalizing genetic variants altered the expression of five genes, highlighting the role of SLC12A5 in neuronal chloride homeostasis and MYRF in nervous system myelination and oligodendrocyte differentiation. The transcription factor binding analysis revealed the potential role of rs62259947 in modulating P4HTM expression by altering the YY1 binding site, altogether regulating hypoxia response. Overall, our pipeline could prioritize putative causal genetic variants in major depression. More importantly, it can be applied when only index genetic variants are available. Finally, the presented approach enabled the proposal of mechanistic hypotheses of these genetic variants and their role in disease pathogenesis.
Leis A, Casadevall D, Albanell J, Posso M, Macià F, Castells X, Ramírez-Anguita JM, Martínez Roldán J, Furlong LI, Sanz F, Ronzano F, Mayer MA.
JMIR Cancer. 2022; 8 (3)
DOI: 10.2196/39003
Abstract
Background
A cancer diagnosis is a source of psychological and emotional stress, which is often sustained for long periods and may lead to depressive disorders. Depression is one of the most common psychological conditions in patients with cancer. According to the Global Cancer Observatory, breast and colorectal cancers are the most prevalent cancers in both sexes and across all age groups in Spain.
Objective
This study aimed to compare the prevalence of depression in patients before and after the diagnosis of breast or colorectal cancer, as well as to assess the usefulness of the analysis of free-text clinical notes in 2 languages (Spanish or Catalan) for detecting depression in combination with encoded diagnoses.
Methods
We carried out an analysis of the electronic health records from a general hospital by considering the different sources of clinical information related to depression in patients with breast and colorectal cancer. This analysis included ICD-9-CM (International Classification of Diseases, Ninth Revision, Clinical Modification) diagnosis codes and unstructured information extracted by mining free-text clinical notes via natural language processing tools based on Systematized Nomenclature of Medicine Clinical Terms that mentions symptoms and drugs used for the treatment of depression.
Results
We observed that the percentage of patients diagnosed with depressive disorders significantly increased after cancer diagnosis in the 2 types of cancer considered (breast and colorectal cancers). Mining free-text clinical notes identified more patients with depression than relying exclusively on ICD-9-CM codes, increasing the number of patients diagnosed with depression by 34.8% (441/1269). In addition, the number of patients with depression who received chemotherapy was higher than the number who did not receive this treatment, with significant differences (P<.001).
Conclusions
This study provides new clinical evidence of the depression-cancer comorbidity and supports the use of natural language processing for extracting and analyzing free-text clinical notes from electronic health records, contributing to the identification of additional clinical data that complements those provided by coded data to improve the management of these patients.
Loucera C, Perez-Florido J, Casimiro-Soriguer CS, Ortuño FM, Carmona R, Bostelmann G, Martínez-González LJ, Muñoyerro-Muñiz D, Villegas R, Rodriguez-Baño J, Romero-Gomez M, Lorusso N, Garcia-León J, Navarro-Marí JM, Camacho-Martinez P, Merino-Diaz L, de Salazar A, Viñuela L, The Andalusian COVID-19 sequencing initiative, Lepe JA, Garcia F, Dopazo J.
medRxiv; 2022.
DOI: 10.1101/2022.07.07.22277353
Abstract
After more than two years of the COVID-19 pandemic, SARS-CoV-2 remains a global public health problem. Successive waves of infection have produced new SARS-CoV-2 variants with new mutations whose impact on COVID-19 severity and patient survival is uncertain. A total of 764 SARS-CoV-2 genomes sequenced from COVID-19 patients, hospitalized from 19th February 2020 to 30th April 2021, along with their clinical data, were used for survival analysis. A significant association of B.1.1.7, the alpha lineage, with patient mortality (Log Hazard ratio LHR=0.51, C.I.=[0.14,0.88]) was found upon adjustment by all the covariates known to affect COVID-19 prognosis. Moreover, survival analysis of mutations in the SARS-CoV-2 genome rendered 27 of them significantly associated with higher mortality of patients. Most of these mutations were located in the S, ORF8 and N proteins. This study illustrates how a combination of genomic and clinical data provides solid evidence on the impact of viral lineage on patient survival.
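For readers more used to hazard ratios than log hazard ratios, the reported estimate can be converted by exponentiation. The snippet below is a minimal sketch, assuming the interval is given on the same log scale as the point estimate (it is symmetric around 0.51, which is consistent with that reading). The function name `log_hr_to_hr` is illustrative and not taken from the study.

```python
import math

def log_hr_to_hr(log_hr, ci_low, ci_high):
    """Exponentiate a log hazard ratio and its interval bounds."""
    return math.exp(log_hr), math.exp(ci_low), math.exp(ci_high)

# Values quoted in the abstract for the B.1.1.7 (alpha) lineage.
hr, hr_low, hr_high = log_hr_to_hr(0.51, 0.14, 0.88)
print(f"HR = {hr:.2f}, CI = [{hr_low:.2f}, {hr_high:.2f}]")
# prints: HR = 1.67, CI = [1.15, 2.41]
```

Under that assumption, the interval on the hazard ratio scale excludes 1, which matches the abstract's description of the association as significant.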
Alvarez-Romero C, Martinez-Garcia A, Ternero Vega J, Díaz-Jimènez P, Jimènez-Juan C, Nieto-Martín MD, Román Villarán E, Kovacevic T, Bokan D, Hromis S, Djekic Malbasa J, Beslać S, Zaric B, Gencturk M, Sinaci AA, Ollero Baturone M, Parra Calderón CL.
JMIR Med Inform. 2022; 10 (6)
DOI: 10.2196/35307
Abstract
Background
Owing to the nature of health data, their sharing and reuse for research are limited by legal, technical, and ethical implications. To address that challenge and to facilitate and promote the discovery of scientific knowledge, the Findable, Accessible, Interoperable, and Reusable (FAIR) principles help organizations to share research data in a secure, appropriate, and useful way for other researchers.
Objective
The objective of this study was the FAIRification of existing health research data sets and the application of a federated machine learning architecture on top of the FAIRified data sets of different health research performing organizations. The entire FAIR4Health solution was validated through the assessment of a federated model for real-time prediction of 30-day readmission risk in patients with chronic obstructive pulmonary disease (COPD).
Methods
The application of the FAIR principles on health research data sets in 3 different health care settings enabled a retrospective multicenter study for the development of specific federated machine learning models for the early prediction of 30-day readmission risk in patients with COPD. This predictive model was generated upon the FAIR4Health platform. Finally, an observational prospective study with 30 days follow-up was conducted in 2 health care centers from different countries. The same inclusion and exclusion criteria were used in both retrospective and prospective studies.
Results
Clinical validation was demonstrated through the implementation of federated machine learning models on top of the FAIRified data sets from different health research performing organizations. The federated model for predicting the 30-day hospital readmission risk was trained using retrospective data from 4944 patients with COPD. The assessment of the predictive model was performed using the data of 100 recruited (22 from Spain and 78 from Serbia) out of 2070 observed (records viewed) patients during the observational prospective study, which was executed from April 2021 to September 2021. Significant accuracy (0.98) and precision (0.25) of the predictive model generated upon the FAIR4Health platform were observed. Therefore, the generated prediction of 30-day readmission risk was confirmed in 87% (87/100) of cases.
Conclusions
Implementing a FAIR data policy in health research performing organizations to facilitate data sharing and reuse is relevant and needed, following the discovery, access, integration, and analysis of health research data. The FAIR4Health project proposes a technological solution in the health domain to facilitate alignment with the FAIR principles.
López-Sánchez M, Loucera C, Peña-Chilet M, Dopazo J.
Hum Mol Genet. 2022; 31 (12)
DOI: 10.1093/hmg/ddac007
Abstract
Recent studies have demonstrated a relevant role of the host genetics in the coronavirus disease 2019 (COVID-19) prognosis. Most of the 7000 rare diseases described to date have a genetic component, typically highly penetrant. However, this vast spectrum of genetic variability remains yet unexplored with respect to possible interactions with COVID-19. Here, a mathematical mechanistic model of the COVID-19 molecular disease mechanism has been used to detect potential interactions between rare disease genes and the COVID-19 infection process and downstream consequences. Out of the 2518 disease genes analyzed, causative of 3854 rare diseases, a total of 254 genes have a direct effect on the COVID-19 molecular disease mechanism and 207 have an indirect effect revealed by a significant strong correlation. This remarkable potential of interaction occurs for >300 rare diseases. Mechanistic modeling of COVID-19 disease map has allowed a holistic systematic analysis of the potential interactions between the loss of function in known rare disease genes and the pathological consequences of COVID-19 infection. The results identify links between disease genes and COVID-19 hallmarks and demonstrate the usefulness of the proposed approach for future preventive measures in some rare diseases.
Alvarez-Romero C, Martínez-García A, Sinaci AA, Gencturk M, Méndez E, Hernández-Pérez T, Liperoti R, Angioletti C, Löbe M, Ganapathy N, Deserno TM, Almada M, Costa E, Chronaki C, Cangioli G, Cornet R, Poblador-Plou B, Carmona-Pírez J, Gimeno-Miguel A, Poncel-Falcó A, Prados-Torres A, Kovacevic T, Zaric B, Bokan D, Hromis S, Djekic Malbasa J, Rapallo Fernández C, Velázquez Fernández T, Rochat J, Gaudet-Blavignac C, Lovis C, Weber P, Quintero M, Perez-Perez MM, Ashley K, Horton L, Parra Calderón CL.
Open Res Eur. 2022; 2
DOI: 10.12688/openreseurope.14349.2
Abstract
Due to the nature of health data, its sharing and reuse for research are limited by ethical, legal and technical barriers. The FAIR4Health project facilitated and promoted the application of FAIR principles in health research data, derived from the publicly funded health research initiatives to make them Findable, Accessible, Interoperable, and Reusable (FAIR). To confirm the feasibility of the FAIR4Health solution, we performed two pathfinder case studies to carry out federated machine learning algorithms on FAIRified datasets from five health research organizations. The case studies demonstrated the potential impact of the developed FAIR4Health solution on health outcomes and social care research. Finally, we promoted the sharing and reuse of the FAIRified data in the European Union health research community, defining an effective EU-wide strategy for the use of FAIR principles in health research and preparing the ground for a roadmap for health research institutions. This scientific report presents a general overview of the FAIR4Health solution: from the FAIRification workflow design to translate raw data/metadata to FAIR data/metadata in the health research domain to the FAIR4Health demonstrators' performance.
Carmona-Pírez J, Poblador-Plou B, Poncel-Falcó A, Rochat J, Alvarez-Romero C, Martínez-García A, Angioletti C, Almada M, Gencturk M, Sinaci AA, Ternero-Vega JE, Gaudet-Blavignac C, Lovis C, Liperoti R, Costa E, Parra-Calderón CL, Moreno-Juste A, Gimeno-Miguel A, Prados-Torres A.
Int J Environ Res Public Health. 2022; 19 (4)
DOI: 10.3390/ijerph19042040
Abstract
The current availability of electronic health records represents an excellent research opportunity on multimorbidity, one of the most relevant public health problems nowadays. However, it also poses a methodological challenge due to the current lack of tools to access, harmonize and reuse research datasets. In FAIR4Health, a European Horizon 2020 project, a workflow to implement the FAIR (findability, accessibility, interoperability and reusability) principles on health datasets was developed, as well as two tools aimed at facilitating the transformation of raw datasets into FAIR ones and the preservation of data privacy. As part of this project, we conducted a multicentric retrospective observational study to apply the aforementioned FAIR implementation workflow and tools to five European health datasets for research on multimorbidity. We applied a federated frequent pattern growth association algorithm to identify the most frequent combinations of chronic diseases and their association with mortality risk. We identified several clinically plausible multimorbidity patterns consistent with the literature, some of which were strongly associated with mortality. Our results show the usefulness of the solution developed in FAIR4Health to overcome the difficulties in data management and highlight the importance of implementing a FAIR data policy to accelerate responsible health research.
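The idea of finding frequent combinations of chronic diseases can be illustrated with a small sketch. This is not the federated frequent pattern growth algorithm used in the study: it is a brute-force count of co-occurring conditions on a single, invented toy dataset, and the function name, disease labels, and support threshold are all hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_patterns(records, min_support, max_size=3):
    """Return {combination: support} for disease combinations of size 2..max_size.

    Support is the fraction of patients whose record contains the combination.
    Brute-force enumeration; FP-growth reaches the same result more efficiently.
    """
    counts = Counter()
    for diseases in records:
        items = sorted(set(diseases))
        for size in range(2, max_size + 1):
            counts.update(combinations(items, size))
    total = len(records)
    return {p: c / total for p, c in counts.items() if c / total >= min_support}

# Invented toy records, one set of chronic conditions per patient.
patients = [
    {"diabetes", "hypertension"},
    {"diabetes", "hypertension", "heart failure"},
    {"hypertension", "heart failure"},
    {"diabetes"},
]
patterns = frequent_patterns(patients, min_support=0.5)
for pattern, support in sorted(patterns.items()):
    print(pattern, support)
```

In a federated setting the raw records would stay at each site and only aggregated counts would be combined, which this single-site sketch does not attempt to model.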
Casimiro-Soriguer CS, Loucera C, Peña-Chilet M, Dopazo J.
Sci Rep. 2022; 12 (1)
DOI: 10.1038/s41598-021-04182-y
Abstract
The gut microbiome is gaining interest because of its links with several diseases, including colorectal cancer (CRC), as well as the possibility of being used to obtain non-intrusive predictive disease biomarkers. Here we performed a meta-analysis of 1042 fecal metagenomic samples from seven publicly available studies. We used an interpretable machine learning approach based on functional profiles, instead of the conventional taxonomic profiles, to produce a highly accurate predictor of CRC with better precision than those of previous proposals. Moreover, this approach is also able to discriminate samples with adenoma, which makes it very promising for CRC prevention by detecting early stages in which intervention is easier and more effective. In addition, interpretable machine learning methods allow extracting features relevant for the classification, which reveals basic molecular mechanisms accounting for the changes undergone by the microbiome functional landscape in the transition from healthy gut to adenoma and CRC conditions. Functional profiles have demonstrated superior accuracy in predicting CRC and adenoma conditions compared with taxonomic profiles and additionally, in a context of explainable machine learning, provide useful hints on the molecular mechanisms operating in the microbiota behind these conditions.
Gundogdu P, Loucera C, Alamo-Alvarez I, Dopazo J, Nepomuceno I.
BioData Min. 2022; 15 (1)
DOI: 10.1186/s13040-021-00285-4
Abstract
Background
Single-cell RNA sequencing (scRNA-seq) data provide valuable insights into cellular heterogeneity, which is significantly improving the current knowledge on biology and human disease. One of the main applications of scRNA-seq data analysis is the identification of new cell types and cell states. Deep neural networks (DNNs) are among the best methods to address this problem. However, this performance comes at the cost of a lack of interpretability in the results. In this work we propose an intelligible pathway-driven neural network to correctly solve cell-type related problems at single-cell resolution while providing a biologically meaningful representation of the data.
Results
In this study, we explored deep neural networks constrained by several types of prior biological information, e.g. signaling pathway information, as a way to reduce the dimensionality of the scRNA-seq data. We have tested the proposed biologically-based architectures on thousands of cells of human and mouse origin across a collection of public datasets in order to check the performance of the model. Specifically, we tested the architecture across different validation scenarios that try to mimic how unknown cell types are clustered by the DNN and how it correctly annotates cell types by querying a database in a retrieval problem. Moreover, our approach proved to be comparable to other less interpretable DNN approaches constrained by using protein-protein interactions gene regulation data. Finally, we show how the latent structure learned by the network could be used to visualize and to interpret the composition of human single cell datasets.
Conclusions
Here we demonstrate how the integration of pathways, which convey fundamental information on functional relationships between genes, with DNNs, that provide an excellent classification framework, results in an excellent alternative to learn a biologically meaningful representation of scRNA-seq data. In addition, the introduction of prior biological knowledge in the DNN reduces the size of the network architecture. Comparative results demonstrate a superior performance of this approach with respect to other similar approaches. As an additional advantage, the use of pathways within the DNN structure enables easy interpretability of the results by connecting features to cell functionalities by means of the pathway nodes, as demonstrated with an example with human melanoma tumor cells.
Loucera C, Peña-Chilet M, Esteban-Medina M, Muñoyerro-Muñiz D, Villegas R, Lopez-Miranda J, Rodriguez-Baño J, Túnez I, Bouillon R, Dopazo J, Quesada Gomez JM.
Sci Rep. 2021; 11 (1)
DOI: 10.1038/s41598-021-02701-5
Abstract
COVID-19 is a major worldwide health problem because of the associated acute respiratory distress syndrome and mortality. Several lines of evidence have suggested a relationship between the vitamin D endocrine system and severity of COVID-19. We present a survival study on a retrospective cohort of 15,968 patients, comprising all COVID-19 patients hospitalized in Andalusia between January and November 2020. Based on a central registry of electronic health records (the Andalusian Population Health Database, BPS), prescription of vitamin D or its metabolites within 15-30 days before hospitalization was recorded. The effect of prescription of vitamin D (metabolites) for other indications prior to hospitalization was studied with respect to patient survival. Kaplan-Meier survival curves and hazard ratios support an association between prescription of these metabolites and patient survival. This association was stronger for calcifediol (Hazard Ratio, HR = 0.67, with 95% confidence interval, CI, of [0.50-0.91]) than for cholecalciferol (HR = 0.75, with 95% CI of [0.61-0.91]), when prescribed 15 days prior to hospitalization. Although the relation is maintained, there is a general decrease of this effect when a longer period of 30 days prior to hospitalization is considered (calcifediol HR = 0.73, with 95% CI [0.57-0.95] and cholecalciferol HR = 0.88, with 95% CI [0.75-1.03]), suggesting that the association was stronger when the prescription was closer to the hospitalization.
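As background for the Kaplan-Meier curves mentioned above, the product-limit estimator can be sketched in a few lines. This is a generic illustration on invented toy data, not the study's analysis code; the function name `kaplan_meier` and the follow-up times are hypothetical.

```python
from collections import Counter

def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time for each patient
    events: 1 if death was observed at that time, 0 if the patient was censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # deaths and censorings leaving the risk set
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve

# Invented toy cohort: days of follow-up and whether death was observed.
durations = [5, 8, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(durations, events)
for time, prob in curve:
    print(time, round(prob, 3))
```

Censored patients reduce the number at risk without lowering the curve, which is what lets the estimator use incomplete follow-up. Comparing two such curves, for example prescribed versus not prescribed, is typically followed by a hazard ratio estimate such as those quoted in the abstract.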