Communication
Peer-Review Record

Musical Novices Are Unable to Judge Musical Quality from Brief Video Clips: A Failed Replication of Tsay (2014)

by Jonathan M. P. Wilbiks * and Sung Min Yi
Submission received: 28 September 2022 / Revised: 1 November 2022 / Accepted: 7 November 2022 / Published: 9 November 2022

Round 1

Reviewer 1 Report

I found this paper well written and well conceived, and believe it should be published. The finding by Tsay (2014) rippled through the field, and is often discussed and cited in research on visual contributions to music perception, as well as in discussions of adjudication bias. However, the original finding remains peculiar and surprising, so I am not surprised it is being questioned here. The total citation count of Tsay's work is substantial but not massive, and the work is sometimes taken as "proof" that performance assessments are informed by visual signals.

In reading this paper, though, I did worry that people who don't know the field might view the original finding as bizarre and without clear scientific significance, and hence this non-replication as a bit pointless. Readers might focus on the fact that the current findings demonstrate what should be obvious: that people without a lot of training in music have difficulty judging the relative quality of fragments of elite-level performances. The current findings are hardly surprising and almost seem obvious; what was surprising was Tsay's original finding.

Readers might not appreciate the real-world and scientific significance of Tsay's original finding, and hence they might not find the current replication very illuminating. 

To address this concern, it may be valuable for the authors to provide a more compelling discussion of questions such as: Why is the original finding scientifically and practically important? What was its impact? Why is the current (non)replication important? The authors do discuss such points, but I was left worrying that outsiders might need more.

The null result in the current manuscript also leaves questions unanswered. All we know is that novices can't reliably pick a winning performance from short fragments of elite-level performances, whether via auditory, visual, or auditory-visual clips. Would trained and experienced adjudicators be able to "pick" the winner based on the sounded music? Or would they be unreliable too? What would happen if there were larger differences between the quality of performances? At what point would musical novices be capable of adjudicating performances, and would both auditory and visual aspects of performance inform judgments? There is more room for such discussion points to be made, but I appreciate that the authors may prefer (and may be required) to keep the paper short.

On balance, I found this a clear and clean study, and given the impact of Tsay's original finding, I'd like to see this work published. 

Author Response

Readers might not appreciate the real-world and scientific significance of Tsay's original finding, and hence they might not find the current replication very illuminating. 

To address this concern, it may be valuable for the authors to provide a more compelling discussion of questions such as: Why is the original finding scientifically and practically important? What was its impact? Why is the current (non)replication important? The authors do discuss such points, but I was left worrying that outsiders might need more.

  • We have included a discussion in the introduction about the importance of the original finding, as well as the implications of the failed replication.

The null result in the current manuscript also leaves questions unanswered. All we know is that novices can't reliably pick a winning performance from short fragments of elite-level performances, whether via auditory, visual, or auditory-visual clips. Would trained and experienced adjudicators be able to "pick" the winner based on the sounded music? Or would they be unreliable too? What would happen if there were larger differences between the quality of performances? At what point would musical novices be capable of adjudicating performances, and would both auditory and visual aspects of performance inform judgments? There is more room for such discussion points to be made, but I appreciate that the authors may prefer (and may be required) to keep the paper short.

  • The reviewer is correct that this paper is intended to be a short report covering only the specific stimulus conditions that were tested. In the future, we intend to pursue questions such as these; accordingly, we have included some of these ideas in the discussion.

On balance, I found this a clear and clean study, and given the impact of Tsay's original finding, I'd like to see this work published. 

  • We thank the Reviewer for this feedback on our manuscript.

Reviewer 2 Report

The paper reports a replication of Tsay (2014) and finds that musical non-experts perform close to chance in judging the musical quality of video-only performance excerpts. This study is part of a project with the aim of replicating key studies in the literature. I find the study well done and well presented.

I have some questions that may be irrelevant, as I am admittedly not too familiar with the presentation style of this type of replication study; the comments below thus explain my "Recommendation for authors" gradings, which are generally "Can be improved". Not all may be applicable.

General comments:

My main concern is the lack of discussion about the whole study as such, not only the replication. This is also exemplified by the Mehr et al. study: P2L97-98 concluded that "minor changes in methods generated significantly or even opposite results" - wouldn't this also be true for any study with the same setup? In P1L26-27 you describe the condition, but you don't discuss the magnitude of the differences between the performances; how bad was the worst, and how good was the best? Considering that these were three finalist performances, were they almost equally good, differing only in details not covered by the excerpts? Is it reasonable to assume that anyone can judge quality from a short video fragment (as you show, we can't)?

The previous results from the literature that you compare (citations 3-16) all involve very different conditions (experts, non-experts, music style, visual cues, study aims). You do a great job of summarizing and relating the research, but I still miss a discussion of the *real* impact of the results evidenced in your study and what we really learn. Instead, the lack of detail about each included study confuses the reader: for instance, P3L99, why do you juxtapose Brimhall's experiment with citations 6 and 7 in the same paragraph when the studies are otherwise hard to compare? Another example is how you compare citation 9 with citation 12 (P3L132), which studied very different visual stimuli. In short, it would be useful to read a discussion that is more critical of the differences in methods and materials (or, possibly, one that focuses more on the replicated study itself without the ambition to generalize).

Some details that can be found in the cited studies are left out; these are often important and could be included here as well. Were there experts in [3], [4], [5]? Were the results the same for non-experts and experts in [6]? What is meant by "register" in [9]? What are the ancillary gestures in [12] compared to [9]?

Expertise seems to be a very important factor throughout the paper, but the background, conclusions, and methods are not particularly stringent in how expertise is handled. If it is really important, then you should describe how you have ensured correct recruitment (see P4L194 and P5L211). Are you certain that all your participants can be described as "musical novices"? Could experts have participated out of intrinsic (interest) or extrinsic (monetary compensation) motivations and answered the initial questions untruthfully? Would this have an impact on the result?

Details that could be improved:

Consider including a table summarising the results of previous studies; it could be helpful.

P1L8: The first sentence in the abstract, about "thin slicing", does not recur in the main part of the paper. Consider either dropping it from the abstract or including it in the paper. I think it makes sense to include it.

P3L128: "because it is the most effective factor" - among the included factors in the study.

P4L163: "were also curious" - please rephrase; all researchers were probably curious.

P4L173: This is the only place where you include anything about relevance for the industry; please consider moving it to the general discussion (as it is interesting).

P4L176-185: This, arguably, has nothing to do with perception and may be due to a totally different set of factors, of which gender injustice is a main candidate. I would recommend removing this part or moving it to the impact discussion.

P4L190: The only typo spotted: "[1]. and".

P5L236: consider making it very clear what you mean by "this phenomenon" to avoid any misinterpretation.

P5L248: You write this later, but even here you can mention that it is true for very short video clips. Alternatively, consider removing the possible redundancy with P6L252-254.

References and citations:

You use two different citation styles: Author (Year) and square brackets. Not all citations, e.g. Ambady, are listed in the references. One instance of Tsay seems to have the wrong year.

Author Response

My main concern is the lack of discussion about the whole study as such, not only the replication. This is also exemplified by the Mehr et al. study: P2L97-98 concluded that "minor changes in methods generated significantly or even opposite results" - wouldn't this also be true for any study with the same setup? In P1L26-27 you describe the condition, but you don't discuss the magnitude of the differences between the performances; how bad was the worst, and how good was the best? Considering that these were three finalist performances, were they almost equally good, differing only in details not covered by the excerpts? Is it reasonable to assume that anyone can judge quality from a short video fragment (as you show, we can't)?

  • We have added additional information in the introduction about the study conditions from Mehr et al. We agree that judging quality from such a short fragment is extremely challenging, regardless of experience; demonstrating this was the intent of reporting a failed replication of the original study.

The previous results from the literature that you compare (citations 3-16) all involve very different conditions (experts, non-experts, music style, visual cues, study aims). You do a great job of summarizing and relating the research, but I still miss a discussion of the *real* impact of the results evidenced in your study and what we really learn. Instead, the lack of detail about each included study confuses the reader: for instance, P3L99, why do you juxtapose Brimhall's experiment with citations 6 and 7 in the same paragraph when the studies are otherwise hard to compare? Another example is how you compare citation 9 with citation 12 (P3L132), which studied very different visual stimuli. In short, it would be useful to read a discussion that is more critical of the differences in methods and materials (or, possibly, one that focuses more on the replicated study itself without the ambition to generalize). Some details that can be found in the cited studies are left out; these are often important and could be included here as well. Were there experts in [3], [4], [5]? Were the results the same for non-experts and experts in [6]? What is meant by "register" in [9]? What are the ancillary gestures in [12] compared to [9]?

  • The intention in this section was to provide some information about research that exists on the analysis of visual and other elements of musical assessment, and in doing so provide the reader with some background information about this field of study. However, we do not believe that it is necessary for us to provide large amounts of methodological information about these studies in a short report such as this one.

Expertise seems to be a very important factor throughout the paper, but the background, conclusions, and methods are not particularly stringent in how expertise is handled. If it is really important, then you should describe how you have ensured correct recruitment (see P4L194 and P5L211). Are you certain that all your participants can be described as "musical novices"? Could experts have participated out of intrinsic (interest) or extrinsic (monetary compensation) motivations and answered the initial questions untruthfully? Would this have an impact on the result?

  • It is possible that people with musical experience may have participated, and we have no way to filter them out. We have added a sentence to acknowledge this.

Details that could be improved:

Consider including a table summarising the results of previous studies; it could be helpful.

P1L8: The first sentence in the abstract, about "thin slicing", does not recur in the main part of the paper. Consider either dropping it from the abstract or including it in the paper. I think it makes sense to include it.

  • We have added a fuller discussion of thin slicing in the paper.

P3L128: "because it is the most effective factor" - among the included factors in the study.

  • This change has been made.

P4L163: "were also curious" - please rephrase; all researchers were probably curious.

  • This change has been made.

P4L173: This is the only place where you include anything about relevance for the industry; please consider moving it to the general discussion (as it is interesting).

  • This has been moved into the general discussion.

P4L176-185: This, arguably, has nothing to do with perception and may be due to a totally different set of factors, of which gender injustice is a main candidate. I would recommend removing this part or moving it to the impact discussion.

  • We would argue this is better suited to remain in the introduction, as it sets up the interaction between evaluator expertise and stimulus modality. As such, we have left it in the current location.

P4L190: The only typo spotted: "[1]. and".

  • This has been changed – we thank the Reviewer for noting it.

P5L236: consider making it very clear what you mean by "this phenomenon" to avoid any misinterpretation.

  • This has been clarified.

P5L248: You write this later, but even here you can mention that it is true for very short video clips. Alternatively, consider removing the possible redundancy with P6L252-254.

  • This change has been made.

References and citations:

You use two different citation styles: Author (Year) and square brackets. Not all citations, e.g. Ambady, are listed in the references. One instance of Tsay seems to have the wrong year.

  • We have included all references in square brackets as per the requirements of the journal. There are two Tsay references with different years on similar topics.

Reviewer 3 Report

Dear authors, I read with much interest your replication paper on audio-visual interaction in non-musicians' processing of short music clips.

I suggest a list of points that can help you improve the paper: 

1) "Thin slicing" appears as a crucial concept in the abstract, but then no direct reference is made to it throughout the introduction.

2) When discussing the study you are replicating, more details about the methodology would be beneficial and would allow a more direct comparison between the two studies. E.g., what was the duration of visual exposure in Tsay? It seems to be 6 seconds, but please specify it.

3) I would recommend making the introduction less "list-like"; at the moment it reads as a chronological sequence of relevant studies. The section is well-written, but I think more work is needed to increase readability and perhaps to make the conceptual links more evident.

4) Line 211: Explain how such a self-evaluation can be reliable for the study. Were there quantifiable variables, e.g., fewer than 3/5/7 years of musical study?

5) Line 211: Did you exclude subjects due to their musical expertise? How many? 

6) Line 209: Any link to/reference for/information about the Gorilla platform?

7) Suggested background concepts: you might broaden your conceptual background by referring to multisensory integration and crossmodal associations.

8) Suggested background literature: 

Godøy, R. I., & Leman, M. (Eds.). (2010). Musical gestures: Sound, movement, and meaning. Routledge; in particular, Leman, M. (2010). Music, gesture, and the formation of embodied meaning. In Musical gestures: Sound, movement, and meaning (pp. 138-165).

Leman, M., Nijs, L., & Di Stefano, N. (2017). On the Role of the Hand in the Expression of Music. In The Hand (pp. 175-192). Springer, Cham.

Leman, M., & Maes, P. J. (2015). The role of embodiment in the perception of music. Empirical Musicology Review, 9(3-4), 236-246.

Timmers, R., Endo, S., Bradbury, A., & Wing, A. M. (2014). Synchronization and leadership in string quartet performance: a case study of auditory and visual cues. Frontiers in Psychology, 5, 645.

Vuoskoski, J. K., Thompson, M. R., Clarke, E. F., & Spence, C. (2014). Crossmodal interactions in the perception of expressivity in musical performance. Attention, Perception, & Psychophysics, 76(2), 591-604.

Author Response

I suggest a list of points that can help you improve the paper: 

1) "Thin slicing" appears as a crucial concept in the abstract, but then no direct reference is made to it throughout the introduction.

- We have included thin slicing in the introduction.

2) When discussing the study you are replicating, more details about the methodology would be beneficial and would allow a more direct comparison between the two studies. E.g., what was the duration of visual exposure in Tsay? It seems to be 6 seconds, but please specify it.

- We have included more methodological details as per the original Tsay study.

3) I would recommend making the introduction less "list-like"; at the moment it reads as a chronological sequence of relevant studies. The section is well-written, but I think more work is needed to increase readability and perhaps to make the conceptual links more evident.

- We have tried to rework the introduction to make it more readable.

4) Line 211: Explain how such a self-evaluation can be reliable for the study. Were there quantifiable variables, e.g., fewer than 3/5/7 years of musical study?

- An explanation has been included in the method.

5) Line 211: Did you exclude subjects due to their musical expertise? How many? 

- Participants who reported being musically trained were immediately rejected (before completing the musical task).

6) Line 209: Any link to/reference for/information about the Gorilla platform?

- A link to Gorilla has been included.

7) Suggested background concepts: you might broaden your conceptual background by referring to multisensory integration and crossmodal associations.

- We have added some references to the literature suggested in point 8.
