TY - JOUR
T1 - An analysis of design process and performance in distributed data science teams
AU - Maier, Torsten
AU - DeFranco, Joanna
AU - McComb, Christopher
N1 - Publisher Copyright:
© 2019, Emerald Publishing Limited.
PY - 2019/10/4
Y1 - 2019/10/4
AB - Purpose: Often, it is assumed that teams are better at solving problems than individuals working independently. However, recent work in engineering, design and psychology contradicts this assumption. This study aims to examine the behavior of teams engaged in data science competitions. Crowdsourced competitions have seen increased use for software development and data science, and platforms often encourage teamwork among participants. Design/methodology/approach: We specifically examine the teams participating in data science competitions hosted by Kaggle. We analyze the data provided by Kaggle to compare the effects of team size and interaction frequency on team performance. We also contextualize these results through a semantic analysis. Findings: This work demonstrates that groups of individuals working independently may outperform interacting teams on average, but that small, interacting teams are more likely to win competitions. The semantic analysis revealed differences in forum participation, verb usage and pronoun usage when comparing top- and bottom-performing teams. Research limitations/implications: These results reveal a perplexing tension that must be explored further: true teams may experience better performance with higher cohesion, but nominal teams may perform even better on average with essentially no cohesion. Limitations of this research include not factoring in team member experience level and reliance on extant data. Originality/value: These results are potentially of use to designers of crowdsourced data science competitions, as well as managers and contributors to distributed software development projects.
UR - http://www.scopus.com/inward/record.url?scp=85074262139&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85074262139&partnerID=8YFLogxK
DO - 10.1108/TPM-03-2019-0024
M3 - Article
AN - SCOPUS:85074262139
SN - 1352-7592
VL - 25
SP - 419
EP - 439
JO - Team Performance Management
JF - Team Performance Management
IS - 7-8
ER -