Algorithms for Non-negative Matrix Factorisation

A special issue of Algorithms (ISSN 1999-4893). This special issue belongs to the section "Databases and Data Structures".

Deadline for manuscript submissions: closed (31 March 2023) | Viewed by 8033

Special Issue Editors


Guest Editor
Institute for Teaching and Learning Innovation, University of Queensland, St Lucia 4072, Australia
Interests: data science; topic modelling; deep learning; algorithm usability and interpretation; learning analytics

Guest Editor
QUT Centre for Data Science, School of Computer Science, Queensland University of Technology, GPO Box 2434, Brisbane, QLD 4001, Australia
Interests: data mining; machine learning; multi-view image and text mining; dimensionality reduction and manifold learning; transfer learning

Special Issue Information

Dear Colleagues,

Non-negative matrix factorization (NMF) was introduced by Lee and Seung (1999), and topic modeling was subsequently popularized by Blei et al. (2003) with the publication of the latent Dirichlet allocation (LDA) algorithm. Both NMF and LDA are able to discover latent topics within a document collection, but they use different mathematical approaches. NMF algorithms decompose a matrix into two or more reduced-dimension non-negative matrices. One of the advantages of NMF is that it can identify meaningful subcomponents of high-dimensional data from a variety of modalities. For example, it can be used to decompose audio signals or images into their component parts, or to identify clusters of similar documents or related molecules. NMF is a powerful tool for data analysis and pattern recognition and has found applications across many domains, including signal processing, computer vision, image compression, gene expression data analysis, clustering, text mining, topic modelling, social network analysis, audio source separation, bioinformatics and chemoinformatics. Many different algorithms have been developed for NMF, including Lanczos-based methods, active-set methods and variational approaches.
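The decomposition just described can be sketched with the classic multiplicative update rules of Lee and Seung (1999). The following NumPy snippet is a minimal illustration; the toy matrix and function name are ours, not from any paper in this issue:

```python
import numpy as np

def nmf(V, k, n_iter=500, eps=1e-9, seed=0):
    """Factorize a non-negative matrix V (m x n) into W (m x k) and H (k x n)
    using the multiplicative update rules of Lee and Seung (1999)."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)  # update H; stays non-negative
        W *= (V @ H.T) / (W @ H @ H.T + eps)  # update W; stays non-negative
    return W, H

# Toy "document-term" matrix built from two non-negative components.
V = np.array([[5, 4, 0, 0],
              [10, 8, 0, 0],
              [0, 0, 4, 5],
              [0, 0, 12, 15]], dtype=float)
W, H = nmf(V, k=2)
print(np.linalg.norm(V - W @ H))  # reconstruction error; small for this rank-2 matrix
```

Each row of H can then be read as a topic's term profile, and each row of W as a document's topic loadings.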

NMF research has focused on the design of novel NMF algorithms with improved initialization techniques, better convergence properties, and the ability to take additional domain information and constraints as input. In recent years, there has been growing interest in using NMF for data augmentation in the field of deep learning. Algorithms is a leading academic publication and seeks high-quality papers on the latest advances in NMF algorithms, a rapidly growing area of research.

Researchers are invited to submit original papers on all aspects of non-negative matrix factorization, including but not limited to:

  • Novel NMF algorithms with improved convergence properties;
  • Novel variants of LDA, NMF and neural-inspired algorithms for topic modeling;
  • Initialization techniques utilizing but not limited to genetic algorithms;
  • Numerical analysis and comparison of existing NMF algorithms (e.g., Lanczos, active sets methods and variational approaches);
  • Deep NMF architectures for topic modeling;
  • NMF algorithms for constrained clustering and the inclusion of domain information (e.g., vectors obtained from large language models such as BERT or other transformer models);
  • Neural network architectures augmented with NMF layers;
  • The use of NMF to assist with the interpretation of large language models (using transformer architectures) and deep neural networks;
  • Applications of NMF within varied domain areas, including but not limited to natural language processing, bioinformatics and computer vision;
  • Semi-NMF and simultaneous NMF algorithms;
  • Improving the scalability of neural-inspired algorithms for topic modeling;
  • Algorithms to support cross-lingual topic modeling;
  • Topic modeling algorithms that allow for the inclusion of domain information (e.g., vectors obtained from large language models such as BERT or transformer models);
  • The use of topic modeling algorithms to assist with the interpretation of large language models (using transformer architectures) and deep neural networks;
  • Applications of topic modeling algorithms within varied domain areas including but not limited to natural language processing, bioinformatics and computer vision;
  • Smart user interfaces for user interaction and steering of topic modeling algorithm output.

Dr. Aneesha Bakharia
Dr. Khanh Luong
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Algorithms is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (4 papers)


Research

16 pages, 2412 KiB  
Article
Continuous Semi-Supervised Nonnegative Matrix Factorization
by Michael R. Lindstrom, Xiaofu Ding, Feng Liu, Anand Somayajula and Deanna Needell
Algorithms 2023, 16(4), 187; https://doi.org/10.3390/a16040187 - 30 Mar 2023
Cited by 1 | Viewed by 1216
Abstract
Nonnegative matrix factorization can be used to automatically detect topics within a corpus in an unsupervised fashion. The technique amounts to an approximation of a nonnegative matrix as the product of two nonnegative matrices of lower rank. In certain applications it is desirable to extract topics and use them to predict quantitative outcomes. In this paper, we show that nonnegative matrix factorization can be combined with regression on a continuous response variable by minimizing a penalty function that adds a weighted regression error to a matrix factorization error. We show theoretically that as the weighting increases, the regression error in training decreases weakly. We test our method on synthetic data and real data coming from Rate My Professors reviews to predict an instructor's rating from the text in their reviews. In practice, when used as a dimensionality reduction method (when the number of topics chosen in the model is fewer than the true number of topics), the method performs better than doing regression after topics are identified, both during training and testing, and it retains interpretability.
(This article belongs to the Special Issue Algorithms for Non-negative Matrix Factorisation)
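A schematic reading of the penalized objective this abstract describes, in notation we have assumed (X the document-term matrix, y the continuous response, lambda the regression weight; the paper's own notation may differ):

```latex
\min_{W \ge 0,\; H \ge 0,\; \theta} \;
  \underbrace{\lVert X - W H \rVert_F^2}_{\text{factorization error}}
  \;+\; \lambda \,
  \underbrace{\lVert y - W \theta \rVert_2^2}_{\text{regression error}}
```

As lambda grows, the optimizer trades factorization fidelity for regression fit, consistent with the abstract's claim that the training regression error decreases weakly in the weighting.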

19 pages, 1004 KiB  
Article
A Comparison of Different Topic Modeling Methods through a Real Case Study of Italian Customer Care
by Gabriele Papadia, Massimo Pacella, Massimiliano Perrone and Vincenzo Giliberti
Algorithms 2023, 16(2), 94; https://doi.org/10.3390/a16020094 - 8 Feb 2023
Cited by 6 | Viewed by 2649
Abstract
The paper deals with the analysis of conversation transcriptions between customers and agents in a call center of a customer care service. The objective is to support the analysis of text transcription of human-to-human conversations, to obtain reports on customer problems and complaints, and on the way an agent has solved them. The aim is to provide customer care service with a high level of efficiency and user satisfaction. To this aim, topic modeling is considered since it facilitates insightful analysis from large documents and datasets, such as a summarization of the main topics and topic characteristics. This paper presents a performance comparison of four topic modeling algorithms: (i) Latent Dirichlet Allocation (LDA); (ii) Non-negative Matrix Factorization (NMF); (iii) Neural-ProdLDA (Neural LDA); and (iv) Contextualized Topic Models (CTM). The comparison study is based on a database containing real conversation transcriptions in Italian Natural Language. Experimental results and different topic evaluation metrics are analyzed in this paper to determine the most suitable model for the case study. The gained knowledge can be exploited by practitioners to identify the optimal strategy and to perform and evaluate topic modeling on Italian natural language transcriptions of human-to-human conversations. This work can be an asset for grounding applications of topic modeling and can be inspiring for similar case studies in the domain of customer care quality.
(This article belongs to the Special Issue Algorithms for Non-negative Matrix Factorisation)
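A minimal sketch of how such an LDA-versus-NMF comparison is typically set up with scikit-learn; the toy corpus and parameters are illustrative and not taken from the paper:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.decomposition import NMF, LatentDirichletAllocation

docs = [
    "the router dropped my connection again",
    "connection keeps dropping on the new router",
    "I was billed twice on my last invoice",
    "the invoice shows a duplicate billing charge",
]

# LDA is usually fit on raw term counts; NMF is usually paired with TF-IDF.
counts = CountVectorizer().fit_transform(docs)
tfidf = TfidfVectorizer().fit_transform(docs)

lda_topics = LatentDirichletAllocation(n_components=2, random_state=0).fit_transform(counts)
nmf_topics = NMF(n_components=2, init="nndsvda", random_state=0).fit_transform(tfidf)

# Each document now has a 2-dimensional topic loading under each model,
# which is the starting point for coherence and other topic-quality metrics.
print(lda_topics.shape, nmf_topics.shape)
```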

16 pages, 1471 KiB  
Article
Topic Scaling: A Joint Document Scaling–Topic Model Approach to Learn Time-Specific Topics
by Sami Diaf and Ulrich Fritsche
Algorithms 2022, 15(11), 430; https://doi.org/10.3390/a15110430 - 16 Nov 2022
Cited by 1 | Viewed by 1569
Abstract
This paper proposes a new methodology to study sequential corpora by implementing a two-stage algorithm that learns time-based topics with respect to a scale of document positions and introduces the concept of Topic Scaling, which ranks learned topics within the same document scale. The first stage ranks documents using Wordfish, a Poisson-based document-scaling method, to estimate document positions that serve, in the second stage, as a dependent variable to learn relevant topics via a supervised Latent Dirichlet Allocation. This novelty brings two innovations in text mining as it explains document positions, whose scale is a latent variable, and ranks the inferred topics on the document scale to match their occurrences within the corpus and track their evolution. Tested on the U.S. State of the Union two-party addresses, this inductive approach reveals that each party dominates one end of the learned scale with interchangeable transitions that follow the parties' term of office, while it shows for the corpus of German economic forecasting reports a shift in the narrative style adopted by economic institutions following the 2008 financial crisis. Besides a demonstrated high accuracy in predicting in-sample document positions from topic scores, this method unfolds further hidden topics that differentiate similar documents by increasing the number of learned topics to expand potential nested hierarchical topic structures. Compared to other popular topic models, Topic Scaling learns topics with respect to document similarities without specifying a time frequency to learn topic evolution, thus capturing broader topic patterns than dynamic topic models and yielding more interpretable outputs than a plain Latent Dirichlet Allocation.
(This article belongs to the Special Issue Algorithms for Non-negative Matrix Factorisation)

16 pages, 1180 KiB  
Article
SepFree NMF: A Toolbox for Analyzing the Kinetics of Sequential Spectroscopic Data
by Renata Sechi, Konstantin Fackeldey, Surahit Chewle and Marcus Weber
Algorithms 2022, 15(9), 297; https://doi.org/10.3390/a15090297 - 24 Aug 2022
Viewed by 1496
Abstract
This work addresses the problem of determining the number of components from sequential spectroscopic data analyzed by non-negative matrix factorization without separability assumption (SepFree NMF). These data are stored in a matrix M of dimension "measured times" versus "measured wavenumbers" and can be decomposed to obtain the spectral fingerprints of the states and their evolution over time. SepFree NMF assumes a memoryless (Markovian) process to underlie the dynamics and decomposes M so that M=WH, with W representing the components' fingerprints and H their kinetics. However, the rank of this decomposition (i.e., the number of physical states in the process) has to be guessed from pre-existing knowledge on the observed process. We propose a measure for determining the number of components via the computation of the minimal memory effect resulting from the decomposition; by quantifying how much the obtained factorization deviates from the Markovian property, we are able to score factorizations with different numbers of components. In this way, we estimate the number of different entities which contribute to the observed system, and we can extract kinetic information without knowing the characteristic spectra of the single components. This manuscript provides the mathematical background as well as an analysis of computer-generated and experimental sequentially measured Raman spectra.
(This article belongs to the Special Issue Algorithms for Non-negative Matrix Factorisation)
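The rank-selection problem this abstract addresses can be illustrated by scanning candidate ranks. The paper's memory-effect (non-Markovianity) score is its own contribution, so this hypothetical sketch scores factorizations by plain reconstruction error instead (scikit-learn assumed):

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(1)
# Synthetic "measured times x measured wavenumbers" matrix M = WH built
# from 3 component fingerprints (rows of H) and their kinetics (columns of W).
H_true = rng.random((3, 40))
W_true = rng.random((60, 3))
M = W_true @ H_true

errors = {}
for k in (1, 2, 3, 4, 5):
    model = NMF(n_components=k, init="nndsvda", random_state=0, max_iter=500)
    W = model.fit_transform(M)
    errors[k] = np.linalg.norm(M - W @ model.components_)

# The error drops sharply up to the true number of components (3 here) and
# then levels off; SepFree NMF replaces this heuristic with a memory-based score.
print(errors)
```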
