Selected Papers from the 2022 Summer Conference on Applied Data Science (SCADS)

A special issue of Analytics (ISSN 2813-2203).

Deadline for manuscript submissions: closed (31 May 2023)

Special Issue Editor


Dr. R. Jordan Crouser
Guest Editor
Smith College, Northampton, MA, USA
Interests: visual analytics; visualization; human-computer interaction; machine learning; data science

Special Issue Information

Dear Colleagues,

The papers in this Special Issue are derived from work conducted at the inaugural Summer Conference on Applied Data Science (SCADS), hosted by the Laboratory for Analytic Sciences (LAS) at North Carolina State University (NCSU) in Raleigh, NC, during the summer of 2022. This 8-week, in-person workshop brought together data science expertise from academia, industry, and government to address a "Grand Challenge" problem of interest to the broad community of knowledge workers in the Intelligence Community (IC) and beyond. Roughly 40 university professors, students, industry professionals, and government researchers participated. Beyond the Grand Challenge, SCADS provided a hands-on learning opportunity for every participant and fostered collaborative partnerships across affiliations. The inspiration for SCADS stemmed from recommendations made in high-level documents from the National Security Commission on AI (NSCAI) and the Center for Strategic and International Studies (CSIS), among others.

Dr. R. Jordan Crouser
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Analytics is an international peer-reviewed open access quarterly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1000 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • knowledge graphs
  • text summarization
  • recommender systems
  • human-machine teaming

Published Papers (4 papers)


Research

14 pages, 405 KiB  
Article
occams: A Text Summarization Package
by Clinton T. White, Neil P. Molino, Julia S. Yang and John M. Conroy
Analytics 2023, 2(3), 546-559; https://doi.org/10.3390/analytics2030030 - 30 Jun 2023
Abstract
Extractive text summarization selects a small subset of sentences from a document that gives good “coverage” of the document. When given a set of term weights indicating the importance of the terms, the concept of coverage may be formalized into a combinatorial optimization problem known as the budgeted maximum coverage problem. Extractive methods in this class are known to be among the best of classic extractive summarization systems. This paper gives a synopsis of the software package occams, a multilingual extractive single- and multi-document summarization package based on an algorithm giving an optimal approximation to the budgeted maximum coverage problem. The occams package is written in Python and provides an easy-to-use modular interface, allowing it to work in conjunction with popular Python NLP packages such as nltk, stanza, or spacy.
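
The greedy heuristic for the budgeted maximum coverage problem is straightforward to illustrate. The sketch below is a minimal, self-contained Python illustration of that general technique, not the occams package's actual API: at each step it selects the affordable sentence with the greatest marginal weighted coverage per word, subject to a total word budget. The tokenization and term weights are placeholders.

```python
# Minimal sketch of greedy sentence selection for budgeted maximum
# coverage. Illustrative only -- NOT the occams package's actual API.

def greedy_budgeted_summary(sentences, term_weights, budget):
    """Select sentences maximizing weighted term coverage within a word budget.

    sentences    : list of sentence strings
    term_weights : dict mapping term -> importance weight
    budget       : maximum total number of words in the summary
    """
    # Precompute each sentence's term set and word cost.
    candidates = [(s, set(s.lower().split()), len(s.split())) for s in sentences]
    covered, summary, used = set(), [], 0

    while True:
        best, best_ratio = None, 0.0
        for cand in candidates:
            _, terms, cost = cand
            if cost == 0 or used + cost > budget:
                continue
            # Marginal gain: total weight of terms not yet covered.
            gain = sum(term_weights.get(t, 0.0) for t in terms - covered)
            if gain / cost > best_ratio:
                best, best_ratio = cand, gain / cost
        if best is None:
            break  # no affordable sentence adds new weighted coverage
        sent, terms, cost = best
        summary.append(sent)
        covered |= terms
        used += cost
        candidates.remove(best)
    return summary
```

Note that the classical approximation guarantee for budgeted maximum coverage (Khuller, Moss, and Naor) additionally requires comparing the greedy solution against the best single affordable sentence; this sketch omits that step, and the optimal-approximation algorithm underlying occams is the subject of the paper itself.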

21 pages, 308 KiB  
Article
Preliminary Perspectives on Information Passing in the Intelligence Community
by Jeremy E. Block, Ilana Bookner, Sharon Lynn Chu, R. Jordan Crouser, Donald R. Honeycutt, Rebecca M. Jonas, Abhishek Kulkarni, Yancy Vance Paredes and Eric D. Ragan
Analytics 2023, 2(2), 509-529; https://doi.org/10.3390/analytics2020028 - 15 Jun 2023
Abstract
Analyst sensemaking research typically focuses on individuals or small groups conducting intelligence tasks. This has helped researchers understand information retrieval tasks and how people communicate information. As part of the grand challenge of the Summer Conference on Applied Data Science (SCADS) to build a system that can generate tailored daily reports (TLDR) for intelligence analysts, we conducted a qualitative interview study with analysts to better understand information passing in the intelligence community. While our results are preliminary, we expect that this work will contribute to a better understanding of the information ecosystem of the intelligence community, how institutional dynamics affect information passing, and what implications these have for a TLDR system. This work describes our involvement in and work completed during SCADS. Although preliminary, our findings indicate that information passing is both a formal and an informal process and that it often follows professional networks, owing especially to the small population and the specialized nature of the work. We call attention to the need for future analysis of information ecosystems to better support tailored information retrieval features.

9 pages, 581 KiB  
Article
The AI Learns to Lie to Please You: Preventing Biased Feedback Loops in Machine-Assisted Intelligence Analysis
by Jonathan Stray
Analytics 2023, 2(2), 350-358; https://doi.org/10.3390/analytics2020020 - 18 Apr 2023
Cited by 2
Abstract
Researchers are starting to design AI-powered systems to automatically select and summarize the reports most relevant to each analyst, which raises the issue of bias in the information presented. This article focuses on the selection of relevant reports without an explicit query, a task known as recommendation. Drawing on previous work documenting the existence of human-machine feedback loops in recommender systems, this article reviews potential biases and mitigations in the context of intelligence analysis. Such loops can arise when behavioral “engagement” signals such as clicks or user ratings are used to infer the value of displayed information. Even worse, there can be feedback loops in the collection of intelligence information because users may also be responsible for tasking collection. Avoiding misalignment feedback loops requires an alternate, ongoing, non-engagement signal of information quality. Existing evaluation scales for intelligence product quality and rigor, such as the IC Rating Scale, could provide ground-truth feedback. This sparse data can be used in two ways: for human supervision of average performance and to build models that predict human survey ratings for use at recommendation time. Both techniques are widely used today by social media platforms. Open problems include the design of an ideal human evaluation method, the cost of skilled human labor, and the sparsity of the resulting data.
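
To make the two proposed uses of sparse quality ratings concrete, here is a minimal sketch, assuming a hypothetical feature representation, model choice, and blend weight (none of which come from the article): a small regression model is trained on the few expert-rated reports, its predicted quality is blended with the engagement signal at ranking time, and the same ratings support periodic human auditing, so engagement alone cannot close the feedback loop.

```python
# Sketch of blending sparse expert quality ratings into recommendation.
# The model choice, features, and blend weight are hypothetical
# illustrations, not details from the article or any deployed system.

import numpy as np
from sklearn.linear_model import Ridge

def fit_quality_model(rated_features, expert_ratings):
    """Train a predictor of expert ratings (e.g., IC Rating Scale scores)
    on the small set of human-rated reports."""
    model = Ridge(alpha=1.0)  # regularization helps with sparse labels
    model.fit(rated_features, expert_ratings)
    return model

def rank_reports(features, engagement, quality_model, blend=0.5):
    """Score each report as a convex combination of its engagement signal
    and its predicted quality; return indices from best to worst."""
    predicted_quality = quality_model.predict(features)
    scores = (1 - blend) * engagement + blend * predicted_quality
    return np.argsort(-scores)

def audit_average_quality(expert_ratings):
    """The second use: human supervision of average performance, watching
    for drift induced by the recommender's own feedback loop."""
    return float(np.mean(expert_ratings))
```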

13 pages, 348 KiB  
Article
Metric Ensembles Aid in Explainability: A Case Study with Wikipedia Data
by Grant Forbes and R. Jordan Crouser
Analytics 2023, 2(2), 315-327; https://doi.org/10.3390/analytics2020017 - 7 Apr 2023
Abstract
In recent years, as machine learning models have become larger and more complex, it has become both more difficult and more important to be able to explain and interpret their results, both to prevent model errors and to inspire confidence in end users. As such, there has been significant and growing interest in explainability as a highly desirable trait for a model to have. Similarly, there has been much recent attention on ensemble methods, which aim to aggregate results from multiple (often simple) models or metrics in order to outperform models that optimize for only a single metric. We argue that this latter line of work can actually assist with the former: a model that optimizes for several metrics has some base level of explainability baked in, and this explainability can be leveraged not only for user confidence but also to fine-tune the weights between the metrics themselves in an intuitive way. We demonstrate a case study of such a benefit, in which we obtain clear, explainable results based on an aggregate of five simple metrics of relevance, using Wikipedia data as a proxy for a large text-based recommendation problem. We demonstrate not only that these metrics’ simplicity and multiplicity can be leveraged for explainability, but also that this very explainability can lead to an intuitive fine-tuning process that improves the model itself.
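
As an illustration of the kind of aggregation the paper describes, the sketch below combines toy relevance metrics with explicit weights; the metric definitions and weights are invented for this example and are not the five metrics used in the paper. Because the aggregate score decomposes into per-metric contributions, a user can see which metric drove a ranking and adjust that metric's weight directly, which is the intuitive fine-tuning loop described above.

```python
# Sketch of an explainable metric ensemble. The metrics and weights are
# toy examples, not the five relevance metrics from the paper.

def term_overlap(doc, query):
    """Fraction of query terms that appear in the document."""
    d, q = set(doc.lower().split()), set(query.lower().split())
    return len(d & q) / max(len(q), 1)

def brevity(doc, query):
    """Mildly favor shorter documents."""
    return 1.0 / (1.0 + len(doc.split()) / 100.0)

METRICS = {"term_overlap": term_overlap, "brevity": brevity}

def ensemble_score(doc, query, weights):
    """Return the aggregate relevance score and a per-metric breakdown,
    so every ranking decision is explainable in terms of its parts."""
    contributions = {
        name: weights[name] * metric(doc, query)
        for name, metric in METRICS.items()
    }
    return sum(contributions.values()), contributions

# A user who disagrees with a ranking can inspect the breakdown and nudge
# the offending weight, e.g. weights["brevity"] -= 0.1, then re-rank.
```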
