Article

Perceiving Conflict of Interest Experts Recommendation System Based on a Machine Learning Approach

Data Science Laboratory, Advanced Institute of Convergence Technology (AICT), 145 Gwanggyo-ro, Yeongtong-gu, Suwon-si 16229, Gyeonggi-do, Republic of Korea
*
Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(4), 2214; https://doi.org/10.3390/app13042214
Submission received: 28 November 2022 / Revised: 24 January 2023 / Accepted: 7 February 2023 / Published: 9 February 2023
(This article belongs to the Special Issue Advances in Recommender Systems and Information Retrieval)

Abstract

Academic societies and funding bodies that conduct peer reviews need to select the best reviewers in each field to ensure publication quality. Conventional approaches for reviewer selection focus on evaluating expertise based on research relevance by subject or discipline. This paper proposes an improved conflict-of-interest (CoI)-aware reviewer recommendation process that combines five expertise indices with graph analysis techniques. The approach collects metadata from academic databases and extracts candidates based on research field similarity using text mining; the candidate scores are then calculated and ranked through a professionalism-index-based analysis. The highly connected subgraphs (HCS) algorithm is used to cluster similar researchers based on their association or intimacy in the researcher network. The proposed method is evaluated with the root mean square error (RMSE) between the subject field of a publication and the research fields of the recommended experts, using keywords of papers published in Korean journals over the past five years. The results show that the system configures a group of Top-K reviewers with an RMSE of 0.76. The proposed method can be applied to academic societies and national research management systems to realize fair and efficient screening and management.

1. Introduction

Academic research and national research and development (R&D) projects have increased globally every year [1,2,3]. However, continuous efforts are required to ensure the quality of publications (e.g., research papers, patents, reports, and proposals) and of the processes that support and manage accurate and effective peer review. Although peer review has been instrumental in academic societies and national research management institutions over the years, it remains significantly challenging to provide accurate and effective reviewer recommendations [3,4,5,6,7,8]. Manually selecting reviewers from a pool of experts in the field [3,9] is time-consuming [3,7,10,11] and can suffer from bias. Currently, delays in the peer review process can be attributed to the time spent by journal editors in selecting appropriate reviewers and the time taken by reviewers to accept the requests [9]. Only a small number of researchers are selected because reviewers need to possess a high level of expertise in the target field [6]. Additional reviewers also need to be secured to ensure fairness in the peer review process and to respond rapidly to the demands of recruiting reviewers [4,6,12].
Several advanced approaches have been recently proposed to improve the quality of recommendations and to support professional reviewers accurately and automatically. Several heuristic algorithms automatically assign reviewers to papers effectively [6,7]. Further, intelligent decision-support models [4,8,13] have improved the overall grouping quality when determining the best reviewer allocation. A content-based reviewer recommendation system supports various information search technologies that evaluate research proposals or papers [2,10]. Most of these approaches are based on the reviewer’s level of expertise, and they can be considered an optimization problem that ranks the reviewer for a specific objective.
In real-world systems, factors such as quality, state-of-the-art, productivity, contribution, persistence, conflict of interest (CoI), and diversity of reviewers should be considered for reviewer selection. Conventional approaches for reviewer selection evaluate expertise by utilizing subject and discipline relevance through simple keyword matching. Most researchers update information on their own in the expert pool, and because research management institutions have no enforcing power to request periodic information updates, the recency and reputation of reviewer recommendations can degrade when only existing information is used. The research topics and fields of interest of a researcher may change over time; however, the history of these research activities is not updated regularly [10,14], which can lead to an assignment bias.
The number of papers published by a researcher in the past is used as a representative indicator of academic contribution. However, because important research outputs other than papers are also relevant, a more precise calculation method than simple counting is required. Attributes such as the quality of the paper, the contribution of the author, and the time of publication should be considered.
Another problem is that CoIs often exist in the process of selecting reviewers. Therefore, the potential CoIs among researchers need to be considered to improve the quality and reliability of the peer review processes [3,4]. For example, a bias in reporting and interpretation can occur if institutions fail to ensure fairness and transparency in the peer review process. These limitations can cause misinterpretations of the author’s viewpoint and lead to the rejection of excellent scientific work or a potentially successful project proposal [10].
This paper presents a solution that improves the quality of reviewer recommendation considering the topic’s relevance to the reviewer’s recent research direction, academic contribution, and CoI. The proposed configuration periodically collects researcher data, such as papers, patents, and reports, from multiple sources such as open-access databases and websites, and automates the construction and renewal of the expert pool. To design a recommendation system based on field similarity, the proposed system extracts candidates from similar fields through subject extraction and classification of input literature using text mining and SVM-based multiclassifiers. Then, candidate scores are aggregated and ranked by considering accuracy, productivity, persistence, contribution, and quality. The CoI aims to exclude direct stakeholders such as author–reviewer and potential stakeholders such as reviewer–reviewer for diversity. The highly connected subgraphs (HCS) [15] algorithm is used to cluster a set of nodes with strong intimacy based on the intimacy of the researcher on the researcher network; this algorithm selects nodes classified as weakly connected relationships by excluding a set of nodes that are strong connections in the network. Our system effectively improves the existing reviewer recommendation system and operation status by allowing academic societies and national research management institutions to automatically configure an expert group with higher fitness based on recent data at the time of recommendation.
The major contributions of this paper are as follows:
(1)
The academic information of more than 100,000 experts is automatically updated to recommend the best experts in each field.
(2)
Five professionalism indices (accuracy, productivity, persistence, contribution, and quality) are proposed to determine expertise, and a novel ranking-based reviewer recommendation method that considers multiple attributes is presented using natural language processing and machine learning.
(3)
We address direct CoI such as those between authors–reviewers and exclude potential stakeholders among reviewers–reviewers to ensure diversity and balance using graph-based machine learning.
(4)
The experimental results confirm that our model effectively considers the reviewer’s research direction and manuscript relevance in accurately recognizing the recency of expertise and resolving CoI.
The rest of this paper is organized as follows. Section 2 introduces related work and background knowledge of the algorithm applied to the recommendation system. Section 3 describes the proposed system, and Section 4 presents the experimental environment and evaluation measures to verify system performance. The experimental results and analysis are provided in Section 5, and, finally, the conclusions and future work are summarized in Section 6.

2. Related Work

2.1. Reviewer Recommendation System

2.1.1. Recommendation System

A recommendation system recommends items to users using collaborative or content-based filtering methods [16]. The former learns from the past behavior of the user and similar decisions made by other users and predicts items (or ratings for items) of interest to the user [10,17]. The latter is based on information retrieval and utilizes a series of individual characteristics to recommend additional items with similar attributes [2,10,18]. The properties of similar items are identified through information generated from user profile data [1] or history, and items are assigned to categories according to predefined criteria. The items preferred by the user and the Top-N items of a category with high keyword similarity are included in the recommended list. These two approaches can be combined into a hybrid recommendation system [4,19]; moreover, new technologies such as social networks and semantic information can be introduced into such recommendation systems [20].

2.1.2. Reviewer Recommendation

The emergence of big data generated by scientific writing has inspired researchers to use such data to facilitate the tasks involved in the research process. Reviewer recommendation procedures involve recognizing expertise by tracking research experience or interest in the target field [10,16]. Researchers also leave behind outputs in their research fields, such as papers, patents, and research reports. Given this information, approaches that provide reviewer recommendations by judging the expertise of researchers are being actively studied, and they fall into three categories:
(i)
Profile-centric: In this category, the reviewers’ profiles are generated and matched with the keywords of the papers to be reviewed. The methods used in this approach range from information retrieval, topic modeling, and natural language processing to self-evaluation and bidding (preference based). Thus far, several studies have been conducted to allocate reviewers to manuscripts using methods such as web mining, latent semantic indexing [2,21], topic modeling, and fuzzy modeling. For example, Protasiewicz, Pedrycz, Kozłowski, Dadas, Stanisławek, Kopacz, and Gałężewska [10] proposed a content-based recommendation system that selects reviewers to evaluate research proposals or papers; its information search process includes publication classification, author disambiguation, keyword extraction, and full-text indexing, and its recommendation method combines cosine similarity among keywords with full-text indexing. Xie, Li, Zhang, Pan, and Han [20] proposed a topic-specific contextual feature model that analyzes users’ social activities and recommends reviewers. Chughtai, Lee, Shahzadi, Kabir, and Hassan [21] proposed an ontology-based model using linked open data to recommend reviewers. Nguyen, Sánchez-Hernández, Agell, Rovira, and Angulo [6] proposed a greedy algorithm that ranks each candidate reviewer through various scores, such as topic consistency, recency, and quality, using an ordered weighted averaging (OWA) operator. The research interests of the reviewers are taken directly from the mentioned website and sorted (automatically or by a reviewer) based on the topic of interest; the paper–reviewer similarity is then calculated as an OWA function of reviewer expertise based on the paper’s topic coverage needs, date, quality, and reviewer availability. Mirzaei et al. [22] presented thematic modeling methods that extract topics from the text content of papers and reviewers using probabilistic latent semantic analysis; they clustered reviewers’ publications and used cosine distances to identify the potential research areas most similar to query papers and reviewers. The relevance score was then calculated using the Kullback–Leibler divergence measure. However, matching people and objects efficiently using cosine measurements between keywords can be challenging because different terms may have similar meanings, which can make fitting difficult. Therefore, Tayal et al. [23] proposed fuzzy logic to represent the expertise level of reviewers in different areas; the expertise level is calculated based on the reviewer’s workload balance, CoI prevention, and individual preferences through multiple keyword mapping of the proposal. Shon et al. [24] proposed a system that measures similarity using keywords with weights based on fuzzy logic, and they automatically matched research proponents and reviewers. Zhao, Zhang, Duan, Chen, Zhang, and Tang [5] used the word2vec-based word mover’s distance algorithm for calculating the text distance between the reviewers’ research interests and manuscripts, and they used the label information to classify reviewers. These methods determine the subject relationship between the reviewer and the manuscript through textual information.
(ii)
Group-centric: This category clusters the manuscripts into groups based on topic similarity and then assigns reviewers. Hettich and Pazzani [25] proposed a text mining approach to group proposals using keywords for identifying and assigning reviewers to proposals. However, proposals with similar research areas can be incorrectly grouped for several reasons. For example, the keyword can contain incomplete information regarding the manuscript contents; keywords are provided by applicants who may have subjective views and misunderstandings and are only representative of some of the research proposals; and manual grouping is performed by department or program managers of funding agencies, who may have a different understanding of the field of study without the appropriate knowledge to assign proposals to the right group. Sun, Ma, Fan, and Wang [4] and Liu, Wang, Ma, and Sun [13] reviewed classification studies that focus on reviewer expertise. Further, research has focused on knowledge-based decision-support tools for reviewer classification or grouping practices. For example, Duan, Tan, Zhao, Wang, Chen, and Zhang [3] proposed a sentence pair modeling approach using neural network models trained to learn the field relationship between subject and abstract pairs. The field similarity between the title and abstract was first converted into a similarity score between the manuscript and examiner’s thesis, and then into that between the manuscript and examiner.
(iii)
Pattern-based and time-range approaches: The pattern-mining problem is a subtask of association rule mining. Many studies have distinguished the representativeness of keywords, which assigns weights based on the importance of extracted keywords for association rule mining. Bibliographic datasets are collected in the pattern-based reviewer recommendation approach, and reviewer allocation is treated as a pattern-mining problem. The time-range approach considers expertise that changes over time. Dehghan et al. [26] proposed a novel method to find T-shaped reviewers in address networks to investigate the effect of time on expertise. The method involved searching for reviewers based on the temporal change in different expertise trees after taking snapshots of the expertise tree at regular time intervals.
However, it remains difficult to find a combination of multiple attributes (e.g., up-to-dateness in research direction, CoI, and diversity in reviewer groups) of significance besides subject relevance in the existing approaches. Therefore, this study aims to improve reviewer recommendation quality by considering several aspects such as subject relevance, quality, productivity, persistence, state-of-the-art, accuracy, CoI, and diversity.

2.2. Conflict of Interest (CoI)

In the peer review process, a CoI is a scenario that can undermine the professional judgment of a reviewer because of an interest between the reviewer and the author. In practice, academia (1) manually selects reviewers through editors and academic organizers, (2) asks the authors submitting a paper to suggest reviewers themselves, and (3) requires authors and reviewers to manually declare all CoIs. These passive methods are error-prone, time-consuming, and tedious for both reviewers and editors.
Recently, research on CoI has been conducted in the form of establishing and utilizing social networks. Research on social networks include studies that focus on networks formed based on relationships and interactions between people rather than the information about each individual. These social networks can be applied to the organization of researchers’ networks to share knowledge among researchers or to support researcher search and collaborative research. The researcher network connects researchers based on joint research achievements, joint tasks, and academic ties, and it can be used to manage human resources at the national level, select reviewers for task screening, and support collaborative research. Pradhan et al. [27] considered potential CoI in various contexts by studying co-author graphs, wherein nodes represent authors and edges represent papers co-written in academic networks. The number of co-authors formed by researchers in an academic network is used as an indicator for measuring the sociability or influence of a researcher. Yan et al. [28] identified other potential CoI in the relationships between researchers and researchers (co-authors, colleagues, and advisor–advisee) and between institutions (cooperation between members and colleagues of two organizations).
However, most studies on CoI have focused on network configuration using co-author relationships and weighting based on the number of co-authors. In academic relationships, direct CoIs such as advisor–advisee relations should be considered when the author and reviewer belong to the same institution or group, and recent joint research relationships should also be included. Simply connecting all of these complex interests into a single network is not meaningful: it is not known exactly whether every connection represents a CoI, and even a direct relationship may involve low intimacy. Therefore, it is necessary to cluster researchers according to relevance or intimacy in the researcher network.
We present a researcher clustering technique for effectively constructing collaborative groups between researchers in a network with weighted edges. To this end, we selected the HCS algorithm; each weight expresses intimacy, reflecting the strength of the connection between researchers obtained by analyzing co-author relationships, advisor–advisee relationships, and same-affiliation or same-group relationships. The HCS algorithm clusters sets of strongly connected nodes on a graph by removing a minimal number of weakly connecting edges [29]. This method separates researchers with weak intimacy from those with strong intimacy, making it possible to form groups of reviewers with weak mutual intimacy, and thus little potential CoI, regardless of whether a CoI has been declared.

3. Proposed Methodology

The flow of the proposed recommendation system is largely divided into data collection, keyword extraction and classification, and recommendation processes, as shown in Figure 1. In the recommendation task, we describe a method for aggregating multiple attributes, such as accuracy, productivity, persistence, contribution, and quality, to enable ranking. Then, we perform filtering to eliminate potential CoIs from the candidate extraction results of the recommendation process to obtain reviewer recommendations.

3.1. Data Acquisition

The experts include those who have obtained a Ph.D. or higher degree in Korea and can work as researchers or professors. To evaluate their expertise, the basic information necessary for analysis must first be collected. In this respect, the metadata accumulated in academic databases is representative data through which past and current research trends can be analyzed systematically from various aspects. Each year, researchers submit numerous papers and undergo rigorous reviews by groups of reviewers. Academic databases contain many beneficial data sources, which makes them a very realistic and ideal choice. Table 1 summarizes a list of major academic databases containing journals and papers published by leading academic institutions in Korea. The Korea Citation Index (KCI) provides information on registered journals that have been selected through the journal evaluation process of the National Research Foundation of Korea, as well as on published papers. The Korean Research Memory (KRM) provides a vast number of search results from research projects, reports, and papers, in connection with the KCI, SCI, and SCOPUS databases. The RISS and DBpia provide various academic information, including domestic academic journals, proceedings, theses, and publications. The Korean Researcher Information (KRI) provides national-scale researcher achievement information that integrates and links the research achievement information of each university at the Korea Academic Promotion Foundation.
The expert dataset used in this paper was collected using the open APIs provided by Korean academic databases, as well as a crawler that we developed ourselves. The crawler collects metadata from published papers, patents, and reports every quarter; examples of the collected metadata are described in Table 2. We collected information on approximately 1.1 million papers published between 1970 and 2022 and on 500,000 Korean researchers. We then check the data for inconsistencies in the data preprocessor and generate an expert profile by field. The metadata stored in the database is updated annually to reflect the latest research directions of the experts.

3.2. Preprocessing Collected Data

Textual data collected from academic databases often contain stop words, which often affect the efficiency of machine learning. Therefore, we preprocess the collected data; this step is largely divided into expert profile generation and keyword extraction. Expert profile generation is used to build the pool of experts needed to recommend experts suitable for the requested field based on the extracted keywords. Keyword extraction aims to find experts in similar fields through machine learning-based literature classification, and it aims to facilitate subsequent experiments. The detailed steps are shown in Figure 2.

3.2.1. Name Disambiguation

We conducted author name disambiguation and duplicate detection to create complete reviewer profiles. The ambiguity of author names must be considered because there may be multiple persons with the same name or authors who use multiple names in different places; disambiguation was achieved using personal profiles and various properties extracted from publications, such as name, affiliated institution, major, email, and birth date. Matching rules were applied sequentially to each author name in the database; once a rule identified the paper’s author, the remaining rules were skipped. The matching rules were designed based on the available data and experiments. The attributes considered, in order, were (1) name and email, (2) affiliated institution, (3) major, and (4) date of birth. As a result, a total of 113,465 expert profiles were created. Expert profiles include name, affiliation, major, research field, keywords, email, and date of birth. The next step is to perform keyword extraction and classification tasks on each expert’s academic metadata and to add the results to the profile to build a pool of experts by field.
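For illustration, the sequential rule matching can be sketched in Python as follows; the record fields and dictionary representation are our assumptions, as the production schema is not described here.

```python
# A minimal sketch of the sequential matching rules described above; the
# record fields (name, email, affiliation, major, birth_date) are
# illustrative, as the paper does not specify the production schema.
def same_author(candidate: dict, record: dict) -> bool:
    """Apply the matching rules in order; the first rule that fires decides."""
    if candidate.get("name") != record.get("name"):
        return False
    rules = [
        lambda c, r: bool(c.get("email")) and c["email"] == r.get("email"),              # (1) name and email
        lambda c, r: bool(c.get("affiliation")) and c["affiliation"] == r.get("affiliation"),  # (2) institution
        lambda c, r: bool(c.get("major")) and c["major"] == r.get("major"),              # (3) major
        lambda c, r: bool(c.get("birth_date")) and c["birth_date"] == r.get("birth_date"),     # (4) date of birth
    ]
    # any() short-circuits, so later rules are skipped once one identifies the author.
    return any(rule(candidate, record) for rule in rules)
```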

3.2.2. Keyword Extraction

The keyword extraction process consists of two steps: (1) extracting a keyword list from the collected academic metadata, and (2) performing automatic keyword extraction based on the publications and project histories of the experts. In this study, the term frequency–inverse document frequency (TF-IDF) value for each keyword was generated by vectorizing the titles and abstracts of academic articles. The core keywords that effectively summarize the document were extracted by applying importance weights based on TextRank [30]. Furthermore, compound nouns that should be treated as one keyword in the document were processed by expanding conventional keyword extraction centered on single nouns. Stop word handling is essential in the TF-IDF [31] algorithm because it retains important words that represent the subject while filtering out high-frequency common words. Once the stop words were removed from all documents, the vocabulary size was reduced further by eliminating words that appeared less than three times. Since most publications are in Korean, KoNLPy (https://konlpy.org (accessed on 13 August 2022)), which is a Korean morpheme analysis package, was used to remove stop words by word-class tagging after dividing sentences into morphemes. For example, “A Study on the Behavior Modeling and Detection Method of Financial Fraud Using Ensemble Learning” can be divided into word sets “Behavior Modeling,” “Detection Method,” “Financial Fraud,” and “Ensemble Learning.” The English text uses NLTK (https://www.nltk.org (accessed on 13 August 2022)) to separate sentences from the document text and to analyze and extract keywords.
The TF-IDF indicates statistical values that represent the importance of words in a specific document to extract keywords. The TF-IDF of the kth word t in the jth document d is given as
$$\mathrm{TF\text{-}IDF}(t_k, d_j) = \mathrm{TF}(t_k, d_j) \times \mathrm{IDF}(t_k) \tag{1}$$
where TF represents the frequency of a specific word t in document d. Further, we assume that keywords that frequently appear in a document are more important. The TF is divided by the total frequency of all words for normalization as
$$\mathrm{TF}(t_k, d_j) = \frac{f_{k,j}}{\max_{z} f_{z,j}} \tag{2}$$
However, a high-frequency word can correspond to a stop word. Therefore, IDF, which digitalizes the frequency of word t in a document set d, is calculated. The total number of documents is divided by the number of documents containing word t, and its log is obtained as
$$\mathrm{IDF}(t_k) = \log \frac{N}{1 + n_k} \tag{3}$$
where $N$ denotes the total number of documents and $n_k$ represents the number of documents in which the kth word $t$ appears. Here, one is added to the denominator to prevent the case where no document contains word $t$. Words appearing in fewer documents have a higher IDF and are more likely to be extracted as keywords. Each word can be represented by a word weight between zero and one through TF-IDF. A document–keyword matrix is established when the Top-N words with high word weights are selected as keywords through the above process, as summarized in Table 3.
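A minimal Python sketch of this extraction step is shown below, assuming KoNLPy’s Okt tagger and scikit-learn’s TfidfVectorizer; the stop-word list and cutoff settings are placeholders rather than the exact configuration used in the system.

```python
# A minimal sketch of the TF-IDF keyword extraction step, assuming KoNLPy's
# Okt tagger and scikit-learn; the stop-word list is a placeholder, and
# min_df=3 approximates the "appeared less than three times" cutoff at the
# document-frequency level.
from konlpy.tag import Okt
from sklearn.feature_extraction.text import TfidfVectorizer

okt = Okt()
STOP_WORDS = {"연구", "방법", "분석"}  # illustrative Korean stop words

def tokenize(text):
    # Split into morphemes, keep nouns, and drop stop words and single syllables.
    return [w for w in okt.nouns(text) if w not in STOP_WORDS and len(w) > 1]

def top_keywords(docs, n=10):
    vectorizer = TfidfVectorizer(tokenizer=tokenize, min_df=3)
    tfidf = vectorizer.fit_transform(docs)       # document-keyword matrix (Table 3)
    vocab = vectorizer.get_feature_names_out()
    keywords = []
    for row in tfidf.toarray():
        order = row.argsort()[::-1][:n]          # Top-N words by TF-IDF weight
        keywords.append([vocab[i] for i in order if row[i] > 0])
    return keywords
```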
A word weight graph was generated using the above document–keyword matrix, and the TextRank algorithm was applied. TextRank creates a graph with words as nodes and the co-occurrence relationships between words as edges; scores and rankings are then assigned to the word nodes, which are the vertices of the graph. To extract core keywords, TextRank generates a co-occurrence graph between words, and to extract key sentences, it generates a sentence-similarity graph. The ranking of each node is calculated by running PageRank on each graph as
$$PR(u) = c \times \sum_{v \in B_u} \frac{PR(v)}{N_v} + (1 - c) \times \frac{1}{N} \tag{4}$$
Keywords with high rankings are determined to be core words and are extracted [21]. In (4), $(1 - c)$, $B_u$, and $N_v$ represent the random-jump probability, the set of backlinks of $u$, and the number of links of vertex $v$, respectively. Each vertex $v$ divides its ranking by $N_v$ and delivers the value to the linked page $u$; a node with many backlinks is therefore determined to be a high-ranking node.
A sentence graph must be created to extract key sentences using TextRank. In the graph, each sentence becomes a node, and the edge weight represents the similarity between sentences. Cosine similarity is used to measure the similarity between documents or sentences, and TextRank utilizes the scale of similarity between sentences defined by
$$\mathrm{sim}(s_1, s_2) = \frac{\left| \{ w_k \mid w_k \in S_1 \wedge w_k \in S_2 \} \right|}{\log \left| S_1 \right| + \log \left| S_2 \right|} \tag{5}$$
The number of words that appear in both sentences is divided by the sum of the log values of the number of words in each sentence.
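For reference, the sentence-graph ranking can be sketched with an off-the-shelf PageRank implementation as follows, using Eq. (5) as the edge weight; the networkx library and the upstream sentence tokenization are assumptions.

```python
# A minimal TextRank sketch using networkx's PageRank; the sentence graph uses
# Eq. (5) as the edge weight, and sentences are assumed to arrive as word sets.
import math
from itertools import combinations
import networkx as nx

def sentence_similarity(s1, s2):
    """Eq. (5): shared words normalized by the log sizes of both sentences."""
    if len(s1) < 2 or len(s2) < 2:
        return 0.0
    return len(s1 & s2) / (math.log(len(s1)) + math.log(len(s2)))

def rank_sentences(sentences, damping=0.85):
    g = nx.Graph()
    g.add_nodes_from(range(len(sentences)))
    for i, j in combinations(range(len(sentences)), 2):
        w = sentence_similarity(sentences[i], sentences[j])
        if w > 0:
            g.add_edge(i, j, weight=w)
    # PageRank iteration corresponding to Eq. (4), with alpha as the damping factor c.
    scores = nx.pagerank(g, alpha=damping, weight="weight")
    return sorted(scores, key=scores.get, reverse=True)  # sentence indices by rank
```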

3.3. Documents Categorization Using SVM-Based Classifier

Korea currently has a “classification table for academic research” provided by the National Research Foundation of Korea (NRF) that can be used to classify academic journals. The major categories are divided into a total of eight classes, as listed in Table 4. Each major classification is further subdivided into middle classifications that focus on more specific research areas. However, published academic journal papers are assigned only a major classification at the paper level; no more detailed classification is given below that. For accurate expert recommendation, the actual system environment requires specific papers to be classified into categories below the major division. Therefore, we focus on the automatic classification of the intermediate fields of the “Academic Research Field Classification Table” using a multiclassifier based on support vector machines (SVMs).

3.3.1. SVM Algorithm

Text classification is used to assign text to predefined semantic categories or a set of labels; a text may belong to multiple categories rather than exactly one. The SVM is a powerful tool for solving practical binary classification problems; the basic concept is to obtain an optimal hyperplane that distinguishes objects belonging to two classes using positive or negative values. SVMs can effectively handle linearly separable classification problems. In the areas of classification and regression analysis, SVMs have been shown to perform better than conventional machine learning methods. Beltrán et al. [32] used SVMs, neural networks, and linear discriminant analysis to classify Chilean wine types. Sanjaa and Chuluun [33] used linear SVM algorithms to detect invasive, unknown malware samples. Arun Kumar and Gopal [34] applied SVMs to file classification and found that SVMs outperform other classification models in terms of performance and required training time.

3.3.2. Development of SVM Classifiers for Multiclass Problems

The SVM is a binary classifier, and therefore, a method that can perform multiclass classification needs to be designed. To this end, methods that create independent SVMs for each class and merge the results have been used; however, such methods degrade the generalization performance of SVMs and become noise sensitive when the number of classes is large and the amount of data within each class is small.
To address this issue, we improve classification performance by assigning classification priorities to the classifiers of academic fields with high emergence rates through a statistics-based model of the keywords in the input literature. This model uses a large external text corpus to calculate the appearance frequency of the input literature’s keywords in each field of the “Academic Research Field Classification Table”. The calculated academic fields are then sorted in descending order of keyword appearance rate, and the classifiers corresponding to the top 60% are applied sequentially. Figure 3 presents a schematic of the proposed classification-priority SVM. An SVM-based multiclassifier is generated for each academic field, and the academic field with the highest classification weight produced across the classifiers is selected.
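A minimal sketch of this classification-priority scheme is given below, assuming one binary SVM per academic field trained on shared TF-IDF features; only the top-60% cutoff comes from the description above, and the remaining details are illustrative.

```python
# A minimal sketch of the classification-priority SVM, assuming one binary
# LinearSVC per academic field over shared TF-IDF features; the data
# structures are illustrative, and only the top-60% cutoff comes from the text.
import numpy as np
from sklearn.svm import LinearSVC

def classify_with_priority(x, field_svms, keyword_freq, top_ratio=0.6):
    """x: feature vector; field_svms: {field: fitted LinearSVC};
    keyword_freq: {field: appearance rate of the input keywords in that field}."""
    # Sort fields by keyword appearance rate and keep the top 60% as candidates.
    ordered = sorted(keyword_freq, key=keyword_freq.get, reverse=True)
    candidates = ordered[: max(1, int(len(ordered) * top_ratio))]
    # Apply each candidate field's classifier sequentially; keep the largest margin.
    best_field, best_score = None, -np.inf
    for field in candidates:
        score = float(field_svms[field].decision_function(x.reshape(1, -1))[0])
        if score > best_score:
            best_field, best_score = field, score
    return best_field
```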

3.3.3. Expert Pool Implementation

The expert pool is constructed in terms of the research field based on the “Academic Research Field Classification Table” using the subject keyword of the expert profile and an SVM-based multiclassifier. A reviewer pool was established for the remaining seven domain areas, excluding complex studies in the large classification system, based on researcher characteristics (major, research field) and the independence and expertise of the academic field. Subclassification was ignored, and the researchers were automatically classified into the middle classification of each domain or into the most similar area through the classification model.
For example, the middle classification categories for “engineering” include mechanical, transportation, electronic, information and communication, electrical, and architectural engineering. If a reviewer’s major is mechanical engineering and the main research is related to automobiles, it is classified as engineering, mechanical engineering, and automotive engineering. The reviewer classification of predefined recommended domains is illustrated as shown in Figure 4.

3.4. Expertise Score Computation and Reviewer Ranking

An objective expert discrimination technique is required to find researchers with expertise in a specific research field. The proposed approach evaluates candidates’ expertise using five expertise indices (accuracy, productivity, persistence, contribution, and quality), as illustrated in Figure 5. The persistence index assigns penalties based on the time elapsed between the publication of a paper and the current time point; the rationale is that the more recent the research in the published papers being evaluated, the more likely the reviewer is to accept the request for review. The contribution index uses the author type and the record of participation in academic activities for the evaluation. Accuracy is evaluated by measuring the similarity between the input paper’s keyword vector and the keyword vectors of papers previously published by the researcher.

3.4.1. Cosine Similarity

Cosine similarity comprehensively considers the candidate’s research field keywords and the frequency and timing of publications on similar subjects to evaluate the expertise of a reviewer candidate for the submitted manuscript. First, in a given domain, the subject relevance of the reviewer candidate is calculated using the TF-IDF matrix of the candidate keyword vectors that are highly related to the manuscript’s subject keywords. The similarity technique treats the two keyword sets as two vectors. The degree of proximity can be determined by applying cosine or Pearson correlation similarity, which are representative methods for calculating keyword similarity. Cosine similarity evaluates the similarity of two vectors by measuring the cosine of the angle between them in an inner product space, whereas Pearson correlation similarity measures the similarity between two vectors using correlation instead of distance.
In this study, we quantify the subject relevance of reviewer candidates using cosine similarity, which is widely used for calculating string similarity. The cosine similarity for the keyword vectors A and B is given as
$$\text{Similarity} = \cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{\sum_{i=1}^{n} A_i \times B_i}{\sqrt{\sum_{i=1}^{n} (A_i)^2} \times \sqrt{\sum_{i=1}^{n} (B_i)^2}} \tag{6}$$
The length or angle of the two keyword vectors can be obtained using a dot product, which calculates the sum of the products of the TF-IDF values of each keyword. The numerator represents the dot product of two keyword vectors. The similarity score of the “expertise keyword-reviewer” pair is derived for each user keyword. Reviewers with high similarity are selected as candidates, and the number of selected candidates is at least three times more than the number of reviewers required for each field.
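In code, the shortlisting step might look like the following sketch; the three-times rule follows the description above, while the vector representation and data structures are assumptions.

```python
# A minimal sketch of the shortlisting step based on Eq. (6); manuscript and
# candidate keyword vectors are assumed to be TF-IDF vectors of equal length.
import numpy as np

def cosine_similarity(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b) / denom if denom else 0.0

def shortlist(manuscript_vec, candidate_vecs, reviewers_needed):
    # Keep at least three times the number of reviewers required for the field.
    scored = sorted(candidate_vecs.items(),
                    key=lambda kv: cosine_similarity(manuscript_vec, kv[1]),
                    reverse=True)
    return scored[: 3 * reviewers_needed]
```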

3.4.2. Candidate Score Computation (C-Score)

In a given domain, the subject relevance of a reviewer candidate is calculated using the TF-IDF matrix of the candidate keyword vectors that are highly related to the manuscript’s subject keywords. The evaluated reviewer’s expertise (determined through their thesis, patent, and publishing history) is then aggregated into a total score that allows ranking, using time weights that reflect the difference between the publication and current times. Expertise may vary depending on published research achievements related to the domain because the frequency of publication of papers and patents differs for each candidate. Further, a candidate’s expertise may vary depending on the time elapsed since conducting the study: the longer the elapsed publication time, the longer the period of inactivity in the research field, which reduces expertise in the domain. Therefore, the weight of expertise needs to be lowered, or candidates penalized, to account for changes in interest. We introduce the Riemann zeta function as a weight to reflect the difference between the study and current points in time for each related keyword that undergoes similarity evaluation within the candidate group:
$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s} \tag{7}$$
In this study, the Riemann zeta function is evaluated for $s > 1$, with $s$ determined by the time interval between the publication of the study deliverable and the current time point (the criteria for these time intervals can vary depending on the situation or intention). The time weight $\zeta(s)$ is calculated from the current time point $y$ and the publication time point $y_i$. The weights vary each year because the candidate’s recent research direction is assumed to be strong information that reflects the candidate’s interest in the domain.
Let $K_i$ represent the research career for a specific keyword at a specific time, let $y$ and $y_i$ represent the time of recommendation and the time of the research, respectively, and let $C$ represent the number of projects performed for related research in year $y_i$. For the zeta function $\zeta(s)$ with $s > 1$, a penalty score interval can be defined using the convergence of the series: when the difference $(y - y_i)$ between the current and publication time points is zero, $s = 1.1$, and $s$ increases by 0.1 for each additional year in the interval $(y - y_i)$. Thus, $\zeta(s)$ converts the difference $(y - y_i)$, divided by 10 and added to 1.1, into a penalty score. The converted figure is then multiplied by the quantitative frequency of publications, which carries a lower weight than recency because the number of related publications is an important indicator of a career even when the work is not recent. If the time weights were not applied (i.e., the frequency of publications $C$ were used without the time interval $(y - y_i)$), candidates with many past publications could be assigned a high ranking even with no recent publications; the recency of expertise would be ignored, and only the frequency of publications would be reflected. However, both timing and frequency matter, as demonstrated by the following case, when we compare the professional level $K_i$ using
$$K_i = \zeta\!\left(1.1 + \frac{y - y_i}{10}\right) \times C \tag{8}$$
Candidate A has three publications in 2021, three in 2018, and two in 2014, whereas candidate B has three in 2021 and four in 2019. In this case, candidate A has $C = 8$ and $K_i = 42.7$, whereas candidate B has $C = 7$ and $K_i = 43.9$. Candidate B therefore has fewer publications than candidate A but more recent ones, and is ranked higher.
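This comparison can be reproduced qualitatively with SciPy’s implementation of the Riemann zeta function, as in the sketch below; the recommendation year is assumed to be 2022, and the exact scores may differ from the figures above depending on scaling, but candidate B ranks above candidate A either way.

```python
# A sketch of the time-weighted score of Eq. (8) using SciPy's Riemann zeta;
# the recommendation year (2022) is an assumption, and the exact K_i constants
# in the text may reflect additional scaling, but the ordering (B above A) holds.
from scipy.special import zeta

def expertise_score(pubs_by_year, y_now):
    """Sum zeta(1.1 + (y_now - y_i) / 10) * C_i over publication years y_i."""
    return sum(zeta(1.1 + (y_now - y_i) / 10) * c for y_i, c in pubs_by_year.items())

cand_a = {2021: 3, 2018: 3, 2014: 2}   # C = 8 publications in total
cand_b = {2021: 3, 2019: 4}            # C = 7, but more recent
print(expertise_score(cand_a, 2022) < expertise_score(cand_b, 2022))  # True
```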
The expertise score is calculated through the weighted sum over the times and numbers of studies based on (8) for each reviewer keyword as
$$\text{C-score} = \sum_{i=1}^{N} K_i \tag{9}$$
The objective fitness can be determined by normalizing the weighted sum of every reviewer candidate to a value between 0 and 1.

3.5. Avoiding Conflict of Interest Based on Researcher Network

3.5.1. Rules for Avoiding CoI

In real-world systems, the CoI check should consider not only direct interests between authors and reviewers but also potential interests among reviewers. In the latter case, the field of study may be narrow, which limits the number of researchers in the field. This can lead to problems if there is no diversity among the reviewer members (e.g., most of them belong to the same advocacy group). The review process could become biased if interests exist among some members of, say, a five-member reviewer group; it is therefore important both to avoid CoIs with the manuscript’s author and to form a diverse group of reviewers (Figure 6).
In this study, we exclude author–reviewer and reviewer–reviewer pairs that have conducted joint studies in the last 10 years, that belong to the same institution or group, or that have an advisor–advisee relationship. This prevents bias within the reviewer group and addresses potential CoIs.

3.5.2. Highly Connected Subgraphs (HCS) Algorithm

The HCS algorithm is used to detect CoIs in the researcher network. The HCS algorithm is a graph partitioning algorithm based on graph theory; it clusters sets of strongly connected nodes by splitting the graph along weakly connected relationships with a minimal number of edge removals. This can be used to separate researchers with weak intimacy from those with strong intimacy.
A network of researchers is first established using the author information (main author, co-author, corresponding author, etc.) from the collected academic metadata. In the researcher network, each node is an author (researcher), and links are defined by the three relationship types described above (co-authorship, advisor–advisee, and same affiliation or group). After data exploration, we set the inter-researcher connection strength and node size in the network based on the number of co-authored papers and the number of co-authors. The HCS algorithm then clusters researchers based on the connection information between them, as shown in Algorithm 1. Finally, the system excludes sets of strongly connected nodes and selects reviewers from nodes classified as weakly connected.
Algorithm 1 Highly Connected Subgraphs
HCS(G(V, E))
begin
    (H, H̄, C) ← MINCUT(G)
    if G is highly connected then
        return G
    else
        HCS(H)
        HCS(H̄)
    end if
end
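A compact Python rendering of Algorithm 1 over a networkx graph might look as follows; the highly connected test uses the unweighted criterion of Hartuv and Shamir [15] (edge connectivity greater than n/2), whereas the production system additionally weights edges by intimacy.

```python
# A compact recursive HCS sketch over a networkx graph, following Algorithm 1;
# the highly-connected test uses the unweighted criterion of Hartuv and
# Shamir [15] (edge connectivity > n/2), whereas the production system also
# weights edges by intimacy.
import networkx as nx

def highly_connected(g):
    return nx.edge_connectivity(g) > g.number_of_nodes() / 2

def hcs(g):
    """Return the list of highly connected node clusters of g."""
    if g.number_of_nodes() <= 1:
        return [set(g.nodes)]
    if not nx.is_connected(g):
        return [c for part in nx.connected_components(g)
                for c in hcs(g.subgraph(part).copy())]
    if highly_connected(g):
        return [set(g.nodes)]
    # MINCUT(G): remove a minimum edge cut, splitting g into H and H-bar.
    h = g.copy()
    h.remove_edges_from(nx.minimum_edge_cut(g))
    clusters = []
    for part in nx.connected_components(h):
        clusters.extend(hcs(g.subgraph(part).copy()))
    return clusters
```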

4. Experimental Evaluation

The accuracy of matching the subject field of the academic paper with the recommended reviewer research field was evaluated to verify the proposed approach. The degree of matching refers to comparing thesis data with pre-stored reviewer data for recommending a reviewer in a field similar to the input thesis data. There are two main criteria for system success:
(1)
The group of reviewers configured using the system should be the same or better than the results obtained through existing simple matching.
(2)
The system should save considerable time.
According to these criteria, this paper compares subject matching and proposed system allocation methods using bibliographic datasets of papers listed in the KCI in the past five years (2017–2021). Further, the efficiency of the proposed approach is also evaluated. All experiments are performed on PCs running Windows 11 64-bit with Intel® Core™ i7-10700K CPU @ 3.80 GHz, 64.0 GB RAM, and a solid state drive.

4.1. Dataset

Given that the actual process of recommending reviewers for academic conferences or journals is typically not accessible, we created a paper–reviewer dataset from the KCI open-access database. We randomly selected bibliographic data from papers published in the past five years in seven major academic fields to generate this dataset. The dataset contains a total of 47,475 papers and includes vital information such as author identification (name and institution), paper content (title, abstract, and keywords), publication date, and research categorization.

4.2. Evaluation Metric

We believe that the study field of a recommended reviewer needs to match the requested domain, including the detailed subcategories. Expertise is evaluated as the consistency between the subject field of the submitted manuscript and the expert’s current research field. The matching experiment uses the author, affiliation, and five keywords from the previously collected thesis dataset for each domain as input to the system, and it recommends five reviewers; all members of the ranked reviewer group are evaluated. The following metric is defined to evaluate the proposed method quantitatively: the root mean square error (RMSE), which is the square root of the mean square error (MSE) and which assigns smaller weights to observations with large prediction errors than the MSE does.
$$\text{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( k_{a,i} - k_{r,i} \right)^2} \tag{10}$$
where $k_{a,i}$ denotes the academic field of the ith user keyword vector and $k_{r,i}$ denotes the main research field of the reviewer having research experience related to $k_{a,i}$; $N$ represents the total number of test data points. Equation (10) helps account for the difference in the number of datasets per subject field and its effect on the cumulative matching score over the multidimensional domains. The smaller the difference between $k_{a,i}$ and $k_{r,i}$, the lower the RMSE and the better the performance; that is, the accuracy of matching the subject field of academic papers with the research fields of reviewers is high.
We assume that only the result scores of the five recommended reviewers are calculated through reviewer matching when computing matching accuracy. The score matching each reviewer’s research field with the subject field of the submitted manuscript is calculated as follows: only the degree of matching between the major and middle classifications is considered, and subclassifications and below are ignored. One point is assigned if both the major and middle categories match; 0.5 points are assigned if the major classification matches but the middle classification does not; and zero points are assigned if both fail to match. The matching accuracy is measured using system logs.
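One plausible reading of this scoring rule and of Eq. (10) is sketched below; treating one minus the match score as the per-recommendation error $k_{a,i} - k_{r,i}$ is our assumption, since the exact mapping is not spelled out.

```python
# One plausible reading of the matching score and of Eq. (10); treating
# (1 - match score) as the per-recommendation error is our assumption,
# since the exact mapping is not spelled out in the text.
import math

def match_score(paper_field, reviewer_field):
    """Fields are (major, middle) tuples: 1.0 if both match, 0.5 if only the
    major classification matches, 0.0 otherwise."""
    if paper_field[0] != reviewer_field[0]:
        return 0.0
    return 1.0 if paper_field[1] == reviewer_field[1] else 0.5

def rmse(pairs):
    errors = [(1.0 - match_score(p, r)) ** 2 for p, r in pairs]
    return math.sqrt(sum(errors) / len(errors))

# Example: of five recommended reviewers, three match fully, one only at the
# major level, and one not at all.
pairs = [(("Engineering", "CS"), ("Engineering", "CS"))] * 3 + \
        [(("Engineering", "CS"), ("Engineering", "EE"))] + \
        [(("Engineering", "CS"), ("Humanities", "History"))]
print(rmse(pairs))  # 0.5
```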
In addition, if there are fewer than 2000 thesis data points in the academic subject field of a specific domain, the average RMSE value of the related fields is used instead. This condition is applied to treat fields with a small or insufficient number of data samples using a sample size that considers the distribution characteristics and population size.

5. Analysis of Results

5.1. Performance Analysis of Keyword Extraction

In this section, we compare the performance of keyword extraction using TF-IDF and KeyBERT [35] on a sample of 1000 randomly selected papers. We only use the title and abstract information to extract the keywords, and measure the similarity between the extracted keywords and the author-assigned keywords to evaluate the performance of each method. The results show that TF-IDF provides better performance than KeyBERT on the KCI dataset, as shown in Figure 7.

5.2. Recommendation Evaluation

The keyword matching approach was used as a baseline to verify the proposed approach. Compared with the baseline method, the proposed system considers the keyword similarity between the submitted paper and the reviewer, the recency of the research period, the frequency of research activities, and CoIs. In particular, the recency of the research period is weighted according to the difference in each period, which reduces the expertise score over time. The experiment evaluated the accuracy of matching research fields using keywords from the publications of reviewer candidates. For multidimensional domains, the accuracy of matching the subject field of academic papers with the research field of the recommended reviewer in the proposed system showed an RMSE of 0.76.
We conclude that adding multiple attributes has almost no adverse effect on matching accuracy relative to the baseline (RMSE = 0.75), which considers only the keyword matching frequency, while helping to recommend reviewers more appropriately. The matching accuracy of the reviewers by domain is illustrated in Figure 8. The RMSE over the multidimensional domains was measured considering the distribution characteristics of the population and the number of data samples: the per-field RMSE values output from reviewer matching were weighted by the number of data points in each field and divided by the total dataset size, yielding an overall RMSE of 0.76.
Table 5 summarizes the comparison results with the baseline and proposed approaches. For example, a randomly entered paper in the computer science engineering domain has keywords artificial intelligence, big data, cloud, and blockchain as the subject. The baseline matches the keywords, and the higher frequency of keywords in publications suggests higher expertise. The baseline domain was engineering, and only one reviewer majoring in computer engineering was accurately matched; reviewers from other fields were matched from the 1st to 5th. Further, the recency of expertise has no effect on the ranking of the reviewer. In the baseline method, ranks 1 and 2 have almost similar expertise, with no significant difference in the number of studies conducted. However, the timing of studies shows that reviewer No. 2 has more recent studies and may be more effective. The accuracy of reviewer allocation based on the field may be poor because baseline decisions are influenced by the number of studies conducted. In contrast, all recommended reviewers were from similar domains and majors when using our approach. The matching scores were calculated by assigning high weights to the recentness-of-expertise from the 1st to 5th places.

5.3. Efficiency Evaluation

We set up and verified scenarios in which reviewers are assigned by academic societies to evaluate the efficiency of the proposed Top-N reviewer group selection algorithm. The Korea Computer Congress (KCC 2021) published 746 of the 810 submitted papers, and manually selecting the 252 reviewers took the editors considerable time. We measured the total time required to recommend the Top-N reviewer groups (N = 2–10) 1000 times with and without activating the CoI prevention rules (Figure 9). Times of 1863, 1879, 1890, and 1902 s were required to recommend the Top-2, -4, -6, and -8 reviewer groups, respectively, without considering CoI; times of 1854, 1888, 1890, and 1902 s, respectively, were required when considering CoI. The experimental results show that applying the CoI prevention rules makes no practical difference to the recommendation time. Thus, a list of reviewers can be recommended within approximately 2 s per paper using the proposed system. In the performance evaluation of the REST API to which the proposed algorithm is applied, all responses were processed within an average of 170 s when 200 users made simultaneous requests; the API can process 44 requests per minute (Figure 10).

6. Conclusions and Future Work

The expertise index technique proposed in this paper simultaneously captures the contributions, persistence, and up-to-dateness of researchers in a specific domain by considering their performance statistics (number of publications, author types, etc.). We confirmed that applying the HCS algorithm to cluster similar researchers based on intimacy is effective in preventing CoIs. The proposed system incorporates the latest knowledge in the domain at each recommendation point while giving priority to reviewers with considerable research experience. Experiments on the datasets show that the proposed approach can match reviewers and queries effectively and efficiently. We significantly improved reviewer recommendation performance and demonstrated that error measurements such as the RMSE show good performance considering the high level of expertise and recent research directions.
The proposed approach has four major contributions:
(1)
Considering other important attributes such as state-of-the-art and diversity compared to the topic matching method, the proposed method improved recommendation accuracy and achieved more objective and reliable results.
(2)
The potential CoI between authors–reviewers and reviewers–reviewers was minimized. The system users can select or remove CoI functions to provide coordination and feedback if required.
(3)
Many time-consuming chores that administrators perform manually were reduced significantly, and considerable time savings were achieved.
(4)
Matching accuracy and system efficiency were verified through experiments, and the system load was effectively reduced.
Thus, the proposed approach provides robust support for real-world managers to effectively and efficiently perform reviewer recommendations for proposals.
In future studies, we intend to collect academic data from overseas researchers and to expand the model to meet global services. Further, as the proposed approach exhibited deviations when the number of data points in each field decreased, future work will address this issue and improve the recommendation accuracy. Moreover, we will incorporate the selection of reviewers with high acceptance rates for review requests through social-network-based reputation analysis, which can be extended to the online reputation inquiries of other companies and individuals, management services, and job search services.

Author Contributions

Conceptualization, G.S.; methodology, G.S., Y.I. and M.C.; software, Y.I. and M.C.; validation, Y.I. and M.C.; formal analysis, Y.I.; investigation, Y.I.; resources, G.S.; data curation, Y.I. and M.C.; writing—original draft preparation, Y.I.; writing—review and editing, G.S.; visualization, Y.I. and M.C.; supervision, G.S.; project administration, G.S.; funding acquisition, G.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partly supported by an Institute of Information and Communications Technology Planning and Evaluation (IITP) grant funded by the Korean government (MSIT) (No.2020-0-01963, All-in-one software platform for virtual (untact) conferences) and the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. 2020R1F1A1076924).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Ethical review and approval were waived for this study because the researcher information used for the expert recommendation comes from publicly available open scholarly databases and was used in a de-identified manner.

Data Availability Statement

Data is unavailable due to privacy or ethical restrictions.

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. Framework of the Top-K reviewer recommendation system without CoI.
Figure 2. Document preprocessing steps.
Figure 3. SVM-based multiclass classifier.
Figure 4. Expert datasets retention status by domain.
Figure 5. Expertise factor.
Figure 6. Rules for avoiding CoI.
Figure 7. Comparison of the similarity of each document with author-assigned keywords in different approaches.
Figure 8. Accuracy (as captured by the RMSE value) of reviewer matching by domain.
Figure 9. Efficiency performance comparison of deactivated CoI and activated CoI.
Figure 10. Efficiency performance comparison of proposed algorithms.
Table 1. Summary of the data sources in Korea.

Data Source | Description | URL
KCI | Provides information on registered journals selected through the journal assessment of the National Research Foundation of Korea, as well as their published papers. | https://www.kci.go.kr (accessed on 15 June 2022)
KRM | Provides a search service over a vast amount of research results, including research tasks, results reports, and papers indexed in databases such as KCI, SCI, and SCOPUS. | https://www.krm.or.kr (accessed on 23 June 2022)
RISS | Integrated search service for domestic and foreign academic papers, theses, books, and open lectures, operated by the Korea Education and Research Information Service (KERIS). | http://www.riss.kr (accessed on 12 June 2022)
DBpia | Korean academic information portal that searches domestic academic journals and dissertations and provides a full-text service for publications of leading Korean academic societies. | http://www.dbpia.co.kr (accessed on 12 June 2022)
ScienceON | Provides the knowledge infrastructure needed by researchers in one place by linking and converging scientific information, national R&D information, research data, information analysis services, and research infrastructure. | https://scienceon.kisti.re.kr (accessed on 12 June 2022)
NTIS | National science and technology knowledge information portal that provides information on national R&D projects, including projects, tasks, researchers, and achievements, in one place. | https://www.ntis.go.kr (accessed on 13 June 2022)
KRI | Operated by the Korea Academic Promotion Foundation; provides nationwide researcher achievement information by integrating and linking the research achievement information of each university. | https://www.kri.go.kr (accessed on 15 June 2022)
Table 2. Description of the paper metadata.

Metadata | Description
ID | Paper ID
Title | Title of paper
Abstract | Abstract of paper
Authors | Name, affiliation, major, email, and position of the authors as listed
Year | Year of publication
Keywords | Author keywords
Publisher | Publisher name
URI | Uniform resource identifier
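
As a concrete illustration, the following minimal Python sketch shows one way the metadata fields of Table 2 could be represented as a typed record; the class and field names here are our own choices for illustration and are not taken from the system's actual codebase.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Author:
    """One author entry as listed on a paper (cf. the Authors row of Table 2)."""
    name: str
    affiliation: str
    major: str
    email: str
    position: str

@dataclass
class PaperMetadata:
    """Illustrative record mirroring the paper metadata fields of Table 2."""
    id: str                                              # Paper ID
    title: str                                           # Title of paper
    abstract: str                                        # Abstract of paper
    authors: List[Author] = field(default_factory=list)  # Authors as listed
    year: int = 0                                        # Year of publication
    keywords: List[str] = field(default_factory=list)    # Author keywords
    publisher: str = ""                                  # Publisher name
    uri: str = ""                                        # Uniform resource identifier
```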
Table 3. Sample of the TF-IDF document–keyword matrix.

Document | Palmitic Acid | csf | Inflammatory | Macrophage | Receptor | Cell
Doc1 | 0.407893 | 0.395015 | 0.376864 | 0.338935 | 0.335846 | 0.237157
Doc2 | 0 | 0 | 0 | 0 | 0 | 0
Doc3 | 0 | 0 | 0.135315 | 0.654413 | 0 | 0
Doc4 | 0.168545 | 0.145716 | 0.221245 | 0 | 0 | 0
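
For readers who want to reproduce a matrix of this shape, the sketch below builds a small TF-IDF document–keyword matrix with scikit-learn. The toy corpus is invented for illustration, and restricting the vectorizer to a fixed keyword vocabulary is our assumption for keeping the columns aligned with Table 3, not necessarily how the system computes it.

```python
# Minimal sketch (not the authors' code) of computing a TF-IDF
# document-keyword matrix shaped like Table 3, using scikit-learn.
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus; the real system derives text from collected paper metadata.
docs = [
    "palmitic acid csf inflammatory macrophage receptor cell",
    "an unrelated document about entirely different topics",
    "inflammatory macrophage macrophage response",
    "palmitic acid csf inflammatory signaling",
]

# Fixing the vocabulary makes each matrix column correspond to one keyword,
# as in Table 3; ngram_range=(1, 2) lets "palmitic acid" match as a bigram.
keywords = ["palmitic acid", "csf", "inflammatory", "macrophage", "receptor", "cell"]
vectorizer = TfidfVectorizer(vocabulary=keywords, ngram_range=(1, 2))
tfidf = vectorizer.fit_transform(docs)  # sparse matrix of shape (4, 6)

for i, row in enumerate(tfidf.toarray(), start=1):
    print(f"Doc{i}:", [round(v, 6) for v in row])
```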
Table 4. Academic research classification system of the NRF of Korea.

Field | Intermediate Fields | Specific Fields | Detailed Fields
Humanities | 23 | 167 | 298
Social science | 22 | 269 | 479
Natural science | 13 | 135 | 371
Engineering | 28 | 310 | 457
Medicine and pharmacy | 39 | 409 | 648
Arts and physical education | 7 | 64 | 132
Agriculture, fisheries, and oceanography | 12 | 104 | 61
Compound science | 8 | 93 | 22
Total | 152 | 1551 | 2468
Table 5. Comparison of the baseline and our approach. Each entry lists Domain (Field); Keyword (Y, C); C-Score.

Rank 1
Baseline: Social science (Social work); Cloud (2016, 3), Big Data (2016, 2), Block Chain (2019, 2), AI (2019, 1); C-Score 98.78.
Our approach: Engineering (Computer engineering); Cloud (2021, 2; 2020, 1), Big Data (2021, 2; 2020, 2), Block Chain (2019, 1), AI (2021, 2); C-Score 98.84.

Rank 2
Baseline: Engineering (Electrical engineering); Cloud (2018, 1; 2020, 1), Big Data (2020, 2), Block Chain (2020, 1), AI (2019, 2); C-Score 97.58.
Our approach: Engineering (Computer engineering); Cloud (2020, 2), Big Data (2020, 1; 2019, 1), Block Chain (2019, 1), AI (2020, 2; 2019, 1); C-Score 98.38.

Rank 3
Baseline: Medicine and pharmacy (Medicine); Cloud (2020, 2), Big Data (2015, 1), Block Chain (2020, 2), AI (2020, 1); C-Score 95.26.
Our approach: Engineering (Computer engineering); Cloud (2020, 1), Big Data (2019, 2), Block Chain (2019, 1), AI (2020, 1; 2019, 2); C-Score 96.96.

Rank 4
Baseline: Engineering (Computer engineering); Cloud (2020, 2), Big Data (2020, 2), AI (2020, 2); C-Score 72.56.
Our approach: Engineering (Computer engineering); Cloud (2019, 1), Big Data (2017, 2), Block Chain (2018, 2), AI (2019, 1; 2017, 1); C-Score 96.81.

Rank 5
Baseline: Social science (Business administration); Big Data (2020, 1), AI (2020, 2; 2019, 1); C-Score 58.79.
Our approach: Engineering (Computer engineering); Cloud (2018, 3), Big Data (2017, 1; 2014, 2), Block Chain (2017, 1), AI (2017, 1; 2015, 1); C-Score 96.07.
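
To make the ranking step behind Table 5 concrete, the following sketch shows how a Top-K list could be assembled from precomputed candidate scores with a simple cluster-based CoI filter. The C-Score computation itself (combining the five expertise indices) is assumed to happen elsewhere, and all identifiers and data below are hypothetical.

```python
# Minimal sketch (under the stated assumptions) of selecting Top-K reviewers
# from precomputed C-Scores while filtering CoI candidates by researcher-
# network cluster, in the spirit of the rules summarized in Figure 6.
from typing import Dict, List, Set, Tuple

def top_k_reviewers(
    c_scores: Dict[str, float],         # candidate ID -> precomputed C-Score
    candidate_cluster: Dict[str, int],  # candidate ID -> HCS cluster ID
    author_clusters: Set[int],          # cluster IDs containing the authors
    k: int = 5,
) -> List[Tuple[str, float]]:
    # Drop candidates who share a network cluster with any author (CoI proxy).
    eligible = {
        cid: score for cid, score in c_scores.items()
        if candidate_cluster.get(cid) not in author_clusters
    }
    # Rank the remaining candidates by C-Score and keep the Top-K.
    return sorted(eligible.items(), key=lambda kv: kv[1], reverse=True)[:k]

# Hypothetical usage: "r6" is excluded for sharing cluster 7 with an author.
scores = {"r1": 98.84, "r2": 98.38, "r3": 96.96, "r4": 96.81, "r5": 96.07, "r6": 99.10}
clusters = {"r1": 1, "r2": 2, "r3": 3, "r4": 4, "r5": 5, "r6": 7}
print(top_k_reviewers(scores, clusters, author_clusters={7}, k=5))
```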