Next Article in Journal
Fate of Duplicated Neural Structures
Next Article in Special Issue
Information Theory: Deep Ideas, Wide Perspectives, and Various Applications
Previous Article in Journal
Statistical and Proactive Analysis of an Inter-Laboratory Comparison: The Radiocarbon Dating of the Shroud of Turin
Previous Article in Special Issue
Project Management Monitoring Based on Expected Duration Entropy
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Letter

Inferring Authors’ Relative Contributions to Publications from the Order of Their Names When Default Order Is Alphabetical

Department of Industrial Engineering, Tel Aviv University, Tel-Aviv 69978, Israel
Entropy 2020, 22(9), 927; https://doi.org/10.3390/e22090927
Submission received: 29 January 2020 / Revised: 19 August 2020 / Accepted: 20 August 2020 / Published: 24 August 2020
(This article belongs to the Special Issue Applications of Information Theory to Industrial and Service Systems)

Abstract

:
In attributing individual credit for co-authored academic publications, one issue is how to apportion (unequal) credit, based on the order of authorship. Apportioning credit for completed joint undertakings has always been a challenge. Academic promotion committees are faced with such tasks regularly, when trying to infer a candidate’s contribution to an article they coauthored with others. We propose a method for achieving this goal in disciplines (such as the author’s) where the default order is alphabetical. The credits are those maximizing Shannon entropy subject to order constraints.

1. Introduction

More and more published research is a collaboration of several researchers (Shapiro et al., 1994 [1]). As various promotion committees need to know and estimate the contribution and the quality of an individual researcher, that raises the bibliometric issue of apportioning individual credit by the “fractional counting“ of joint publications (e.g., references [2,3,4]). Abbas (2011) [2] proposes a set of indices to evaluate the quality of research produced by an author, while Egghe (2008) [4] focuses on a mathematical theory of the h- and g-index in the case of the fractional counting of authorship.
In this paper, we focus on disciplines where the default order of authors is alphabetical (e.g., Social Science and Mathematics; Liu and Fang 2014 [5]). Thus, for example, an order such as (B, A, C) indicates that B’s contribution was “significantly” larger than A’s, while C’s was not. We shall thus assume that each discipline has a “standard”, where if, for example, B’s contribution exceeds A’s by more than the standard, their order will be switched. Therefore, the default order (A, B, C) indicates that neither B’s nor C’s excess contribution exceeds the standard. Other than these inferences, which become constraints on the fractional contributions, we shall assume that the contributions are as uncertain as possible.
Suppose there is a disciplinary standard ε ,   0 < ε < 1 , such that the alphabetical order is not changed unless the difference in contributions, in favor of the alphabetically latter author, is deemed to be larger than ε . Thus, if, for example, the order of three authors is (B, A, C) (meaning that the author who is alphabetically second is listed before the one who is alphabetically first), it reflects the fact that B’s contribution exceeds A’s by more than ε , while C’s contribution does not exceed A’s (and thus also not B’s) by more than ε .   A large value of ε indicates strong adherence to an alphabetical order, while small values correspond to a high sensitivity to the actual relative contributions. The standard ε   may or may not be known to those wishing to evaluate the contributions.
We shall make use of the constrained maximal (Shannon) entropy approach, reflecting the most diffused contribution distribution that satisfies the implications of the limited information given by the order. Constrained maximal entropy has been used, among other applications, in physics [6] and finance [7], where the constraints were the mean and/or variance of the distribution. Our constraints are simpler, so solving the problem is often rather trivial. First, we deal with estimating the mean contribution, and then, we propose an appropriate multivariate distribution.

2. Mean Contribution

Start with two authors (A and B), with the respective unknown expected contributions p for A and 1 p for B, which we wish to infer. Note that although p is a share and not a probability, we shall perform probability operations on it. The entropy function,
p log p 1 p log 1 p , is concave in p with its maximum at p = 1 2 , regardless of the base of the logarithm (e.g., Cover and Thomas 2006 [8]). We shall assume that each author’s contribution is at least δ δ < 1 4 . Now, the order (A, B) implies that p > 1 p ε , i.e., that p > 1 ε 2 . Since 1 ε 2 < 1 2 ε , it follows that the constrained entropy is maximized at p * = 1 2 . Thus, the authors are deemed to have contributed equally(!). The order (B, A) implies that 1 p > p + ε , i.e., that
p < 1 ε 2 < 1 2 , so p * = 1 ε 2 ,   ε < 1 . If 1 2 1 2 ε < δ , then p = δ . Thus, if, for example, ε = 1 4 , A’s mean contribution is estimated at 3 8 . If ε = 1 4 A’s mean contribution is m a x 1 8 , δ , where the low value reflects A’s demotion despite the high threshold.
For three authors A, B and C, and respective unknown mean shares p ,   q ,   1 p q , the unconstrained entropy is jointly concave and maximized at 1 3 ,   1 3 ,   1 3 . Now, if the order is (A, B, C), it follows that p > q ε and q > 1 p q ε . The feasible solution with the highest entropy is 1 3 ε 2 ,   1 3 ,   1 3 + ε 2 . If the order is (B, A, C), the constraints are q > p + ε , q > 1 p q ε and 0 p + q 1 . Among the feasible solutions, the one that maximizes the entropy is 1 3 2 2 ε   ,   1 3 + 1 3 ε ,   1 3 + 1 3 ε   , assuming ε < 1 2 (note that our notation lists the relative contributions in the alphabetical order of the authors).
Note that C’s contribution, 1 3 + 1 3 ε , is deemed to be larger that A’s 1 3 2 3 ε , even though they appear later in the order (but not by more than 2 ε ).
For (B, C, A), the solution is 1 3 2 3 ε ,   1 3 + 1 3 ε ,   1   3 + 1 3 ε ,   ε < 1 2 . If 1 3 2 3 ε < δ , then we have δ , δ + ε ,   δ + ε . Note that if ε is large, then as A was nevertheless demoted to last, its contribution is deemed to be negligible (another school of thought is that if all the authors were essential to the research, they should receive equal credit. If one adopts the philosophy that all the authors were essential to the creation of the paper, a possible approach would be to average the above order-dependent shares with equal shares. Thus, for example, the order (B, C, A) will result in 1 2 1 3 ε ,   1 3 + ε 2 ,   1 3 + ε 2 + 1 3 ,   1 3 ,   1 3 = 1 3 ε 2 ,   1 3 + ε 4 ,   1 3 + ε 4 ,   ε < 2 3 ).
For (A, C, B), the solutions is 1 3 1 3 ε ,   1 3 1 3 ε ,   1 3 + 2 3 ε ,   ε < 1 . If 1 3 1 3 ε < δ , then δ ,   δ ,   1 2 δ . If ε < 1 , then the allocation is δ , δ ,   1 2 δ , so A’s and B’s contribution is deemed negligible. The intuition here is that despite the high requirement for reversing an order ε > 1 , C has overtaken B.
For (C, B, A), the solution is 1 3 ε ,   1 3 , 1 3 + ε , ε + δ < 1 .
Consider now the case of four authors, whose mean contributions p ,   q ,   r ,   1 p q r we wish to find.
For (A, B, C, D), the mean contributions need to satisfy
q p < ε  
r q < ε  
r q < ε  
1 p q r r < ε  
r > 1 p q ε 2
  Max entropy is attained at
1 4 3 2 ε ,   1 4 1 2 ε ,   1 4 + 1 2 ε ,   1 4 + 3 2 ε , ε < 1 6
For (D, C, B, A), the mean contributions need to satisfy
1 p q r > r + ε
r < 1 p q ε 2
r > q + ε
p < q ε
  Max entropy is attained at
1 4 3 2 ε ,   1 4 1 2 ε ,   1 4 + 1 2 ε ,   1 4 + 3 2 ε ,   ε < 1 6 .
If ε > 1 6 , then the allocation is 0 ,   1 6 ,   1 3 ,   1 2   . For (A, B, D, C), we obtain 1 4 ε ,   1   4 ,   1 4 ,   1 4 + ε , and so forth.
Note that in all cases, the expected contributions are either independent of ε or dependent on it linearly. Thus, if ε   is a random variable (with some subjective distribution), the only change required is the substitution of E ε for ε wherever it appears.

3. Joint Distribution of Relative Contributions

We shall now assume that the joint distribution of the relative contributions is believed to be Dirichlet (e.g., Kotz, Balakrishnan and Johnson 2000, 40.1 [9]). That is, the joint density of the relative contributions p 1 , , p m is
f p 1 , , p m p 1 , , p m = Γ j = 0 m θ j j = 1 m Γ θ j 1 j = 1 m p j θ 0 1 j = 1 m p j θ j 1 ,       p j 0 ,     j = 1 , , m ,       j = 1 m p j 1 .
We have θ 0 = 1 , so E p i = θ i 1 + j = 1 m θ j ,   i = 1 , , m . Note that the marginal density of p i is beta θ i ,   j = 0 m θ j θ i , so E p i = θ i j = 0 m θ j ,
and
Var   p i = θ i j = 0 m θ j θ i j = 1 m θ j 2 j = 1 m θ j + 1
and
corr   p i ,   p j = θ i θ j k = 0 m θ k θ i k = 0 m θ k θ j .
As we wish to allocate the whole credit to the authors, we have θ 0 = 1 ,
So E p i = θ i 1 + j = 1 m θ j ,   i = 1 , , m .
Note that if we have already estimated (inferred) the mean relative contributions p 1 , , p m , then, to maintain these ratios, we need to have θ i = k p i ,   i = 1 , , m , for some k > 0 . The choice of k will determine the parameters of the (Dirichlet) distribution.

4. Effect of Number of Authors

How does the number of authors affect the relative contribution of one of them? Consider, for concreteness, A’s contribution in orders where he/she is last:
2 .   B A 1 2 ε 2   ,   ε < 1 .
3 a .   C B A δ
3 b .   B C A 1 3 ε ,   ε < 1 3
We see that, in the case of three authors, A’s relative contribution depends on the order of B and C; in 3a, C needs to be rewarded for “overtaking “ B (as well as A), which reduces A’s contributions.
4 a .   D C B A 1 4 3 2 ε ,   ε < 1 6
4 b .   B C D A 1 4 3 ε ,   ε < 1 12  
Now, for 3b and 4a, the relative contribution of A is:
B A = 1 2 ε 2 B C A = 1 3 ε = 3 1 ε 2 1 3 ε ,   B C A = 1 3 ε D C B A = 1 4 3 2 ε = 4 1 3 ε 3 1 6 ε .
The former ratio is larger than the latter for, and only for, ε < 15 + 294 36 0.07 . Thus, no general conclusion is possible.
For 3a and 4a, B A B C A is larger if ε > 7 + 25 + 18 δ 2 12 .
For 4b, the former ratio is larger if 1 < ε < 13 12 .
For n authors in alphabetical internal order with A at the end, A’s contribution is 1 n n 1 2 ε ,   ε < 2 n 1 n . If ε > 2 n 1 n   , A’s contribution is δ .

5. Short Discussion and Concluding Remarks

In this paper, we attempted to quantify the significance of deviations from some “natural“ or “default“ order of authors. We assumed that the unknown relative contributions maximize entropy subject to constraints reflecting default order reversal.
Future study could further demonstrate and validate the proposed method by empirical and data-driven methods. For example, a comparison could be made of academics’ rankings by the H-index (or some other measures), compared to authors’ rankings by their total fractional journal paper contributions. Another option is to apply this study to journals where the authors’ contributions are required and published, while comparing it to the order of the authors’ names. Clearly, a survey could be conducted over some well-cited papers, asking the authors to ascribe their fractional contributions, while comparing them to those determined by the proposed method.
We note that the proposed measure, if and when it will be used for personal evaluation and promotion, should be combined with qualitative assessments, as often done in T&P committees. Otherwise, “automated” evaluation metrics alone could encourage game-playing among collaborators, which would be an uninvited outcome.
Finally, let us note that several related applications could use, with required modifications, the proposed method—for example, for assessing the contribution of programmers and AI agents in tasks distributed and published over the Internet. One scenario with some similarity to the considered problem is the following.
In sports, some media “power rank” (PR) teams during the season. The PR is usually consistent with the number of wins the teams have achieved, but not always (there might be other factors such as recent injuries, recent performance, etc.).
Therefore, suppose that the number of wins of two teams A and B n , m are such that n = m + 1 , while team B is power ranked before A. Thus, 1 p > p + ε p * = 1 2 1 2 ε .
If n = m + 2 ,   1 p > p + 2 ε p * = 1 2 ε .

Funding

This research was partially funded by the Koret Foundation grant for “Smart Cities and Digital Living 2030”.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Shapiro, D.W.; Wenger, N.S.; Shapiro, M.F. The Contributions of Authors to Multi authored Biomedical Research Papers. J. Am. Med Assoc. 1994, 271, 438–442. [Google Scholar] [CrossRef]
  2. Abbas, A. Weighted Indices for Evaluating the Quality of Research with Multiple Authorship. Scientometrics 2011, 88, 107–131. [Google Scholar] [CrossRef] [Green Version]
  3. Tol, S.J. Credit where Credit’s Due: Accounting for Co-authorship in Citation Counts. Scientometrics 2011, 89, 291–299. [Google Scholar] [CrossRef] [Green Version]
  4. Egghe, L. Mathematical Theory of the h-and g-index in case of Fractional counting of Authorship. J. Am. Soc. Inf. Sci. Technol. 2008, 59, 1608–1616. [Google Scholar] [CrossRef]
  5. Liu, X.; Fang, H. The Impact of Publications from Mainland China on Trends in Alphabetical Authorship. Scientometrics 2014, 99, 865–879. [Google Scholar] [CrossRef]
  6. Jaynes, E.T. Probability Theory in Science and Engineering. In Colloquium Lectures in Pure and Applied Science; Field Research Laboratory, Socony Mobil Oil Company: Dallas, TX, USA, 1958. [Google Scholar]
  7. Cozzolino, J.M.; Zahner, M.J. The Maximum-Entropy Distribution of the Future Market Price of a Stock. Oper. Res. 1973, 21, 1200–1211. [Google Scholar] [CrossRef]
  8. Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; John Wiley & Sons: Hoboken, NJ, USA, 1999. [Google Scholar]
  9. Kotz, S.; Johnson, N.L.; Balakrishnan, N. Continuous Multivariate Distributions: Model and Applications, 2nd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2000. [Google Scholar]

Share and Cite

MDPI and ACS Style

Gerchak, Y. Inferring Authors’ Relative Contributions to Publications from the Order of Their Names When Default Order Is Alphabetical. Entropy 2020, 22, 927. https://doi.org/10.3390/e22090927

AMA Style

Gerchak Y. Inferring Authors’ Relative Contributions to Publications from the Order of Their Names When Default Order Is Alphabetical. Entropy. 2020; 22(9):927. https://doi.org/10.3390/e22090927

Chicago/Turabian Style

Gerchak, Yigal. 2020. "Inferring Authors’ Relative Contributions to Publications from the Order of Their Names When Default Order Is Alphabetical" Entropy 22, no. 9: 927. https://doi.org/10.3390/e22090927

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop