Statistical Inference from High Dimensional Data II

A special issue of Entropy (ISSN 1099-4300). This special issue belongs to the section "Statistical Physics".

Deadline for manuscript submissions: closed (31 December 2021) | Viewed by 7903

Special Issue Editor

Guest Editor
Department of Computer Science, Faculty of Computer Science, University of A Coruña, CITIC, 15071 A Coruña, Spain
Interests: machine learning; feature selection; complex biological systems; cancer systems; bioinformatics; biomedical data science; computational biology

Special Issue Information

Dear Colleagues,

Continuous improvement and cost reduction in next-generation sequencing platforms are enabling a better understanding of multifactorial and complex pathologies such as cancer. This is a typical problem in which the amount of data matters and in which the so-called curse of dimensionality also arises: the number of variables is many orders of magnitude greater than the number of cases. In this Special Issue, we welcome contributions that apply statistical inference or machine learning approaches to the characterization of complex pathologies using -omic data. We strongly encourage interdisciplinary work with real data (TCGA, HMP, clinicogenomic data, or related datasets) and heterogeneous data integration (clinical, genomic, proteomic, and so on).

This Special Issue solicits submissions in areas including, but not limited to, the following:

  • Applications based on statistical inference from high dimensional data;
  • Dimensionality reduction with imbalanced biological datasets;
  • Applications based on feature selection (e.g., text processing, bioinformatics, medical informatics and natural language processing);
  • Applications based on Information Theory for data integration (e.g., semantic interoperability, clustering, classification);
  • Applications based on feature selection methods using meta-heuristic search methods such as genetic algorithms, particle swarm optimization and so on;
  • Applications based on feature extraction (e.g., PCA, LDA);
  • Applications based on prior knowledge (e.g., ontologies, pathways).

Volume I: Special Issue "Statistical Inference from High Dimensional Data"

Dr. Carlos Fernandez-Lozano
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Entropy is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Feature selection
  • Machine learning
  • Statistical inference
  • Dimensionality
  • Complex biological systems
  • Multifactorial diseases
  • Computational biology
  • Bioinformatics
  • Information theory
  • Large-scale data analysis
  • Data mining

Published Papers (3 papers)

Research

15 pages, 2365 KiB  
Article
A Method for Neutralizing Entropy Measurement-Based Ransomware Detection Technologies Using Encoding Algorithms
by Jaehyuk Lee and Kyungroul Lee
Entropy 2022, 24(2), 239; https://doi.org/10.3390/e24020239 - 04 Feb 2022
Cited by 13 | Viewed by 2259
Abstract
Ransomware consists of malicious codes that restrict users from accessing their own files while demanding a ransom payment. Since the advent of ransomware, new and variant ransomwares have caused critical damage around the world, thus prompting the study of detection and prevention technologies against ransomware. Ransomware encrypts files, and encrypted files have a characteristic of increasing entropy. Due to this characteristic, a defense technology has emerged for detecting ransomware-infected files by measuring the entropy of clean and encrypted files based on a derived entropy threshold. Accordingly, attackers have applied a method in which entropy does not increase even if the files are encrypted, such that the ransomware-infected files cannot be detected through changes in entropy. Therefore, if the attacker applies a base64 encoding algorithm to the encrypted files, files infected by ransomware will have a low entropy value. This can eventually neutralize the technology for detecting files infected from ransomware based on entropy measurement. Therefore, in this paper, we propose a method to neutralize ransomware detection technologies using a more sophisticated entropy measurement method by applying various encoding algorithms including base64 and various file formats. To this end, we analyze the limitations and problems of the existing entropy measurement-based ransomware detection technologies using the encoding algorithm, and we propose a more effective neutralization method of ransomware detection technologies based on the analysis results.
(This article belongs to the Special Issue Statistical Inference from High Dimensional Data II)
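
The entropy effect described in this abstract can be illustrated with a minimal sketch. It is not the authors' method: it only assumes that uniformly random bytes are a reasonable stand-in for an encrypted file, and the names `shannon_entropy`, `ciphertext_like`, and `encoded` are hypothetical. Byte-level Shannon entropy of random data is close to 8 bits per byte, whereas base64 output draws on a 64-character alphabet and so cannot exceed 6 bits per byte.

```python
import base64
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # frequency of each byte value
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Assumption for illustration: random bytes stand in for an encrypted file.
# The length is a multiple of 3, so the base64 output carries no '=' padding.
ciphertext_like = os.urandom(60_000)
encoded = base64.b64encode(ciphertext_like)

raw_entropy = shannon_entropy(ciphertext_like)
encoded_entropy = shannon_entropy(encoded)

print(f"random bytes: {raw_entropy:.3f} bits/byte")
print(f"base64 text:  {encoded_entropy:.3f} bits/byte")
```

A detector that flags files above a fixed entropy threshold somewhere between these two values would, under these assumptions, miss the encoded file even though its content is still encrypted.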

18 pages, 1187 KiB  
Article
Statistical Approach of Gene Set Analysis with Quantitative Trait Loci for Crop Gene Expression Studies
by Samarendra Das and Shesh N. Rai
Entropy 2021, 23(8), 945; https://doi.org/10.3390/e23080945 - 23 Jul 2021
Viewed by 2057
Abstract
Genome-wide expression study is a powerful genomic technology to quantify expression dynamics of genes in a genome. In gene expression study, gene set analysis has become the first choice to gain insights into the underlying biology of diseases or stresses in plants. It also reduces the complexity of statistical analysis and enhances the explanatory power of the obtained results from the primary downstream differential expression analysis. The gene set analysis approaches are well developed in microarrays and RNA-seq gene expression data analysis. These approaches mainly focus on analyzing the gene sets with gene ontology or pathway annotation data. However, in plant biology, such methods may not establish any formal relationship between the genotypes and the phenotypes, as most of the traits are quantitative and controlled by polygenes. The existing Quantitative Trait Loci (QTL)-based gene set analysis approaches only focus on the over-representation analysis of the selected genes while ignoring their associated gene scores. Therefore, we developed an innovative statistical approach, GSQSeq, to analyze the gene sets with trait enriched QTL data. This approach considers the associated differential expression scores of genes while analyzing the gene sets. The performance of the developed method was tested on five different crop gene expression datasets obtained from real crop gene expression studies. Our analytical results indicated that the trait-specific analysis of gene sets was more robust and successful through the proposed approach than existing techniques. Further, the developed method provides a valuable platform for integrating the gene expression data with QTL data.
(This article belongs to the Special Issue Statistical Inference from High Dimensional Data II)

Review

12 pages, 305 KiB  
Review
Virtual Gene Concept and a Corresponding Pragmatic Research Program in Genetical Data Science
by Łukasz Huminiecki
Entropy 2022, 24(1), 17; https://doi.org/10.3390/e24010017 - 23 Dec 2021
Viewed by 2506
Abstract
Mendel proposed an experimentally verifiable paradigm of particle-based heredity that has been influential for over 150 years. The historical arguments have been reflected in the near past as Mendel’s concept has been diversified by new types of omics data. As an effect of the accumulation of omics data, a virtual gene concept forms, giving rise to genetical data science. The concept integrates genetical, functional, and molecular features of the Mendelian paradigm. I argue that the virtual gene concept should be deployed pragmatically. Indeed, the concept has already inspired a practical research program related to systems genetics. The program includes questions about functionality of structural and categorical gene variants, about regulation of gene expression, and about roles of epigenetic modifications. The methodology of the program includes bioinformatics, machine learning, and deep learning. Education, funding, careers, standards, benchmarks, and tools to monitor research progress should be provided to support the research program.
(This article belongs to the Special Issue Statistical Inference from High Dimensional Data II)