---
title : "Impulsivity as a trait in domestic dogs (_Canis familiaris_): A systematic review and meta-analysis"
shorttitle: "Impulsivity as a trait in domestic dogs"

author:
  - name: Jessica Barela
    affiliation: '1'
    email: jbarela2@huskers.unl.edu
    address: B83 East Stadium, University of Nebraska-Lincoln, Lincoln, NE, USA 68588. ORCID 0000-0001-8359-2619
  - name: Yasmin Worth
    affiliation: '1'
  - name: Jeffrey R. Stevens
    affiliation: '1'
    corresponding: yes
    email: jeffrey.r.stevens@gmail.com
    address: B83 East Stadium, University of Nebraska-Lincoln, Lincoln, NE, USA 68588. ORCID 0000-0003-2375-1360


affiliation:
  - id: '1'
    institution: Department of Psychology, Center for Brain, Biology & Behavior, University of Nebraska-Lincoln, Lincoln, NE, USA


authornote: |
  \textbf{This preprint has been published in the \textit{Journal of Comparative Psychology}. This paper is not the copy of record and may not exactly replicate the authoritative document published in the APA journal. The final article is available at: \href{https://doi.org/10.1037/com0000352}{https://doi.org/10.1037/com0000352}.} 

  PsyArXiv preprint: https://doi.org/10.31234/osf.io/ctfns

  Version: 2024-03-29 

  Jeffrey R. Stevens, \orcidlink{0000-0003-2375-1360} [https://orcid.org/0000-0003-2375-1360](https://orcid.org/0000-0003-2375-1360). 
  
  Data, analysis scripts, supplementary materials, and reproducible research materials are available at https://osf.io/z6svt/. Pre-registration materials are available at https://osf.io/bsyxk.  This research was funded by a National Science Foundation grant NSF-1658837.
 
 
abstract: >
 Impulsivity is a critical component of dog (_Canis familiaris_) behavior that owners often want to curtail. Though studies of dog impulsivity have examined their inability to wait and to inhibit inappropriate behaviors, it is not clear whether impulsivity is a behavioral trait with consistent characteristics across contexts. For this project, we conducted a systematic review and meta-analysis to investigate whether impulsivity exists as a behavioral trait in domestic dogs. Under a pre-registered protocol, we processed over 10,000 bibliographic database records to uncover 13 articles with multiple impulsivity tasks assessed in the same subjects. Across 31 pairs of impulsivity tasks, 28 failed to detect a correlation in performance between tasks and 3 detected a correlation. For 15 correlations of impulsivity tasks with the owner's perception of their dog's impulsivity, 10 were not correlated, while 5 were correlated. A formal meta-analysis on one pair of tasks (A-not-B task and Cylinder task) tested across seven different studies showed no overall correlation between the tasks. Our systematic review and meta-analysis found little indication of consistent relationships between impulsivity levels across tasks for dogs. Therefore, at the moment, we do not have good evidence of impulsivity as a behavioral trait that transfers across contexts, suggesting that perhaps we should focus on the context-specific nature of impulsivity in dogs.
 
  
keywords          : "canine, impulsivity, inhibitory control, meta-analysis, self-control"
# # wordcount         : "5794"

bibliography      : ["r-references.bib", "barela_etal_2024.bib"]
csl               : "barela_etal_2024.csl"

figsintext        : yes
figurelist        : no
tablelist         : no
footnotelist      : no
lineno            : no
quote_labels      : yes

header-includes: 
  - \usepackage{orcidlink}
  - \usepackage[normalem]{ulem}
  - |
    \makeatletter
    \renewcommand{\paragraph}{\@startsection{paragraph}{4}{\parindent}%
      {0\baselineskip \@plus 0.2ex \@minus 0.2ex}%
      {-1em}%
      {\normalfont\normalsize\bfseries\typesectitle}}
    
    \renewcommand{\subparagraph}[1]{\@startsection{subparagraph}{5}{1em}%
      {0\baselineskip \@plus 0.2ex \@minus 0.2ex}%
      {-\z@\relax}%
      {\normalfont\normalsize\bfseries\itshape\hspace{\parindent}{#1}\textit{\addperi}}{\relax}}
    \makeatother

class             : "pub"
keep_tex          : "TRUE"
output            : papaja::apa6_pdf

---

```{r include = FALSE, echo = FALSE}
library(dplyr)
library(here)
library(rmarkdown)
library(knitr)
library(kableExtra)
library(papaja)
library(RoBMA)

source("barela_etal_2024_rcode.R")
# load(here("barela_etal_2023.RData"))
r_refs(file = "r-references.bib")
my_citations <- cite_r(file = "r-references.bib", pkgs = c("here", "kableExtra", "knitr", "lubridate", "papaja", "PRISMA2020", "readxl", "RoBMA", "robvis", "tidyverse"), omit = FALSE)


```

Domestic dogs (_Canis familiaris_) often engage in behaviors such as eating food from countertops, chasing after squirrels when on walks, or even playing too boisterously with other pets or humans in the household. To peacefully coexist with humans, dogs must inhibit their urges to engage in these harmful or inappropriate behaviors; they must curb their _impulsivity_ [@Stevens.etal.2022a].  For example, dogs are often expected to resist destroying furniture, urinating in the house, and jumping up on guests in order to be considered a “good dog”. The aim of this project is to conduct a systematic review and meta-analysis to investigate whether impulsivity exists as a behavioral trait in domestic dogs.

Impulsivity is a multifaceted concept that includes a wide range of behaviors such as an inability to wait, a preference for risky outcomes, a tendency to act without thinking of the consequences, and/or an inability to inhibit inappropriate behaviors [@Evenden.1999a; @Reynolds.etal.2006; @Stevens.2017c]. Impulsivity connects to several other behavioral aspects such as self-control, delay of gratification, inhibitory control, and risk taking, among many others. The wide scope of behaviors that fall under impulsivity has led researchers to carve up this concept into two subtypes: impulsive action (or disinhibition) and impulsive choice (or decision making) [@Reynolds.etal.2006; @Winstanley.etal.2006; @Stevens.2017c]. Impulsive action describes the failure to inhibit an action or the inability to withhold a response. Impulsive choice involves choosing between rewards with different costs. One type of impulsive choice is intertemporal choice, which involves choosing between a smaller, more immediate reward and a larger reward that comes further in the future [@Read.2004; @Stevens.2010a]. Another important concept related to impulsivity is inhibitory control, which refers to an individual’s ability to resist the urge to act in a way that is immediately tempting but ultimately harmful or counterproductive [@Bray.etal.2014; @Fagnani.etal.2016a].

Impulsivity is critical in how dogs fulfill a wide range of roles in today’s society. They offer companionship as pets, provide protection in military and law enforcement settings, provide assistance to individuals with a wide range of disabilities, and help detect drugs, bombs, and even disease [@Bray.etal.2014; @Olsen.2018]. Better understanding of impulsivity and the factors that influence it may help dogs be more effective in the roles that they serve. For instance, the absence of distractibility may predict success for drug detection and guide dogs [@Maejima.etal.2007; @Batt.etal.2008; @Bray.etal.2014]. Also, the top five reasons owners relinquished their dogs to animal shelters include biting, house soiling, aggression towards humans, escaping, and being destructive indoors [@Salman.etal.2000; @Olsen.2018]. In other words, dogs are often surrendered to shelters as a result of impulsive behaviors. Therefore, an increased understanding of impulsivity may allow us to more effectively breed, train, socialize, and place dogs to live and work alongside us.

Due to impulsivity's wide scope, research on its origins and the factors that have influenced it has many implications for the human-dog bond. Organizations and individuals have increasingly shown interest in temperament tests that may assess useful, predictable behavioral tendencies in working and companion dogs [@Taylor.Mills.2006]. These tests, if accurate, could assist individuals in selecting more effective working and service dogs, as well as more efficiently matching owners with companion dogs whose individual characteristics suit their lifestyle [@Fratkin.etal.2013]. The idea that makes these temperament tests feasible is that dogs possess behavioral tendencies that are stable over time and consistent across contexts [@Taylor.Mills.2006]. Therefore, if we are to develop reliable ways of testing and predicting impulsivity in domestic dogs, it is important to first determine if impulsivity may be a behavioral trait in this species.

Over the last several years, researchers studying dogs have increasingly investigated whether impulsivity is a behavioral trait in dogs by assessing whether individual dogs show consistency in impulsivity across different situations, usually in different behavioral tasks. @Olsen.2018 reviewed the literature investigating executive function and summarized what behavioral tasks have been used to study this concept in dogs. The review included several impulsivity measures, such as the Cylinder task, Detour Fence task, A-not-B task, and Delay Discounting task. While Olsen's review focused on executive function associated with individual tasks in isolation from one another, we are interested in how they relate and whether there is consistency in performance across these tasks. For instance, @Bray.etal.2014 and @Fagnani.etal.2016a tested the same dogs in an A-not-B task and a Cylinder task, with both tasks designed to measure inhibitory control. @Brucks.etal.2017a tested dogs in a battery of four common inhibitory control tests. Though some studies of impulsivity as a trait demonstrate correlations across tasks [e.g., @Muller.etal.2016], many do not show consistency in behavior. Because of these mixed results, we take a meta-analytic approach to assess the evidence for impulsivity as a behavioral trait in dogs.

<@~{#dias}

The overall objective of this study was to conduct a comprehensive systematic review and meta-analysis of studies that measure dogs’ performance in multiple impulsivity tasks to compare overall relationships between tasks, as well as assess the evidence for a behavioral trait of impulsivity. There were two main aims of this analysis. The first aim was to examine studies in dogs that measure impulsivity in multiple tasks to find which tasks have been studied together and whether behavioral responses correlate between tasks. The second aim was to investigate relationships between behavioral measures of dog impulsivity and owner perceptions of dog impulsivity. If impulsivity is a behavioral trait, then it is possible that owners can reliably assess impulsivity, and therefore their assessments should correlate with behavioral measures. We used the Dog Impulsivity Assessment Scale (DIAS) as our measure of owner perception of dog impulsivity [@Wright.etal.2011b], as it is validated and is frequently used in the literature. This scale has 18 items and is composed of three subscales: behavioral regulation, responsiveness, and aggression.

~@>

To achieve our aims, we conducted a database search for studies including either multiple impulsivity tasks or at least one impulsivity task and the DIAS. We then summarized the data to investigate possible correlations between impulsivity tasks, as well as owner perceptions and behavioral measures of dog impulsivity. For pairs of tasks with enough studies, we conducted a formal meta-analysis to estimate overall effect sizes for these correlations.

# Methods

We pre-registered this study and followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines for conducting a reproducible systematic review and meta-analysis [@Moreau.Gamble.2020]. Figure \ref{fig:prisma} summarizes the search in a PRISMA flowchart built with the _PRISMA2020_ package [@R-PRISMA2020].

## Literature Search

We searched databases to find studies with multiple impulsivity tasks tested on the same dogs. We refer to the output of the databases as _records_ since they only include citation information. _Reports_ are the actual journal articles and dissertations that we accessed to evaluate individual _studies_, which are the descriptions and results of the experiments conducted by the report authors. Reports may have more than one study.

A database search was conducted on 2022-06-18 using PsycINFO, Scopus, and Web of Science. We included the following search terms in the abstract field of each database: `(canine OR dog OR dogs) AND (impulsiv* OR inhibit* OR discount* OR “delayed gratification” OR “delay of gratification” OR “self control” OR “self-control”)`. According to @Bensky.etal.2013, the first report investigating impulsivity in dogs was published in 2003. Therefore, we limited the search to reports published in 2003 and afterwards. For PsycINFO and Scopus we used the date range of 2003 to 2022. For Web of Science we set the date range as 2003-01-01 to 2022-12-31 (though the search date was 2022-06-18). To ensure literature saturation, we also scanned the reference lists of the final set of included reports to check for reports that met our inclusion criteria but were missed in our database search (we found no additional reports). The search resulted in `r format(n_total_records, big.mark = ",")` records for review: `r nrow(psychinfo)` from PsycINFO, `r format(nrow(scopus), big.mark = ",")` from Scopus, and `r format(nrow(wos), big.mark = ",")` from Web of Science.

\begin{figure*}
\caption{\newline PRISMA Flowchart \label{fig:prisma}}
\begin{center}
\includegraphics[width=0.8\linewidth]{"figures/prisma_chart.png"}
\end{center}

\textit{Note.} Figure built using \textit{PRISMA2020} (Haddaway et al., 2022). Figure used with permission under a CC-BY4.0 license: Barela et al. (2023); available at https://doi.org/10.31234/osf.io/ctfns.
\end{figure*}

## Screening Process and Exclusion Criteria

To be included in our analysis, reports had to meet several criteria. Only complete, published reports that were available in English in an electronic format were included in the analysis. All reports had to be original, experimental work; observational work was not included. Relevant dissertations were included; however, if duplicate studies existed between dissertations and journal articles, the journal article was included in the analysis.

After our initial database search, we merged results across the three databases and filtered out duplicated records based on DOI or journal information. The remaining `r format(n_unique_records, big.mark = ",")` records were then screened in two phases. First, the title and abstract of each record were screened, and records were excluded if they: (1) did not use domestic dogs as the study subjects, (2) did not use one or more behavioral measures of impulsivity in dogs, or (3) were not experimental research studies. For the purpose of our analysis, we considered a record to use dogs as study subjects when the study incorporated live, behaving domestic dogs. We excluded studies that used only canine tissue/cells or if they investigated only wild canids. If wild canids and domestic dogs were both subjects of a study, the record was included; however, only the data from domestic dogs was analyzed. There were no restrictions on the age, sex, or neuter status of the dogs used in the studies. Records that raised the possibility of inclusion but did not clearly meet the criteria were included for further assessment.
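The merge-and-deduplicate step described above can be sketched in base R. This is only an illustration under stated assumptions, not the project's actual script: the toy records, the column names (`title`, `doi`, `journal`, `year`), and the fallback matching key built from journal, year, and title are all hypothetical.

```r
# Hypothetical toy exports from three databases (all values are made up)
psycinfo_demo <- data.frame(
  title = c("Study A", "Study B"),
  doi = c("10.1000/a", NA),
  journal = c("Journal X", "Journal Y"),
  year = c(2014, 2016),
  stringsAsFactors = FALSE
)
scopus_demo <- data.frame(
  title = c("Study A", "Study C"),
  doi = c("10.1000/A", "10.1000/c"),
  journal = c("Journal X", "Journal Z"),
  year = c(2014, 2018),
  stringsAsFactors = FALSE
)
wos_demo <- data.frame(
  title = c("study b", "Study C"),
  doi = c(NA, "10.1000/c"),
  journal = c("Journal Y", "Journal Z"),
  year = c(2016, 2018),
  stringsAsFactors = FALSE
)

# Merge the three exports into one table
all_records <- rbind(psycinfo_demo, scopus_demo, wos_demo)

# Normalize text so that case and stray whitespace do not hide duplicates
normalize <- function(x) tolower(trimws(x))

# Assumed matching key: the DOI when present, otherwise journal + year + title
fallback_key <- paste(normalize(all_records$journal),
                      all_records$year,
                      normalize(all_records$title),
                      sep = "|")
all_records$key <- ifelse(is.na(all_records$doi),
                          fallback_key,
                          normalize(all_records$doi))

# Keep the first occurrence of each key
unique_records <- all_records[!duplicated(all_records$key), ]
nrow(unique_records)
```

Matching on a normalized DOI first and falling back to bibliographic fields is one plausible way to catch records that lack a DOI in some exports. Real exports would likely need further cleaning, such as stripping DOI URL prefixes.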

After this first round of screening, we downloaded and evaluated the reports for the remaining `r nrow(reviewed_records)` records. During this round, reports were excluded if they: (1) were not complete, original reports available in English, (2) did not include either two impulsivity tasks or one impulsivity task and the overall score for the Dog Impulsivity Assessment Scale [DIAS, @Wright.etal.2011b], or (3) did not report the statistics required to obtain a correlation coefficient. The reports that met the criteria at this point were included in our final group of `r nrow(analyzed_records)` reports.

## Information Extracted from Studies

Our `r nrow(analyzed_records)` reports comprised `r nrow(analyzed_records) - 1` journal articles and 1 dissertation. These reports included a total of `r nrow(analyzed_studies)` studies since a report could include multiple studies or study populations (Table \ref{tab:studies}). The final set of studies contained a total of `r length(tasks) - 1` different impulsivity tasks plus the DIAS, which was used in `r sum(analyzed_records$dias)` of the studies. Table \ref{tab:tasks} lists the impulsivity tasks included in our analysis, as well as the specific measure used for each task. For each study, we recorded population characteristics when available, including total sample size, numbers of males/females, neuter status of dogs, type of dog (pet, working, free-ranging, captive, or shelter dogs), and whether a specific breed of dog or type of working dog was used as the subject (e.g., border collies, sled working dogs). For each study, we compiled all task pairs (pairwise combinations of tasks and/or task/DIAS combinations) that could result in a correlation coefficient. Some studies had multiple measures for the same task (e.g., accuracy and latency), so we included all measures of all task pairs. This resulted in `r nrow(measure_pairs)` unique study measures. For each study measure, we recorded the impulsivity task(s) and/or DIAS used in the task pair, the type of effect size, the effect size value, the sample size, and whether the authors reported the effect as statistically significant (i.e., $p$ < 0.05). To avoid the "double-counting" associated with multiple measures for the same task pair [@Harrer.etal.2021], we selected a single measure for each task in a pair. To select the measure, we prioritized measures that (1) were used in multiple studies, (2) were scaled in the direction of higher values representing more impulsivity, and (3) were commonly used in the literature.
Once we removed extra measures, we had `r nrow(measure_pairs_trimmed)` pairs of measures to evaluate.

\renewcommand{\arraystretch}{1.2}

```{r studies}
#| fig.pos = "t"
options(knitr.kable.NA = '--')
opts_knit$set(kable.force.latex = TRUE) 
kable(studies_table, booktabs = TRUE, format = "latex", escape = FALSE, linesep = "", table.envir = "table*",
      align = "llccccl",
      col.names = (c("No.", "Study", "Sample size", "Dog type", "Sex ratio", "Neutered status", "Tasks")),
      caption = "Study Characteristics") |> 
  column_spec(2:2, width = "4.75cm") |> 
  column_spec(3:5, width = "1.1cm") |> 
  column_spec(6:6, width = "1.25cm") |> 
  column_spec(7:7, width = "4cm") |> 
  kable_styling(font_size = 10) |> 
  # footnote("\linebreak \textit{Note}.", general_title = "", threeparttable = TRUE, escape = FALSE, fixed_small_size = TRUE) |> 
  I()
```


```{r tasks}
#| fig.pos = "h"
options(knitr.kable.NA = '')
task_table |> 
  mutate(task = sub("Delay Discounting", "Delay Discounting$^{\\\\dagger}$", task),
         task = sub("Social Inhibition", "Social Inhibition$^{\\\\dagger}$", task),
         task = sub("Spatial Impulsivity", "Spatial Impulsivity$^{\\\\dagger}$", task),
         measure = kableExtra::linebreak(measure),
         study = kableExtra::linebreak(study)) |>
  kable(booktabs = TRUE, format = "latex", linesep = "", escape = FALSE, table.envir = "table*",
      align = "llll",
      col.names = c("Task", "Description", "Measure", "Study"),
      caption = "Tasks and Measures") |> 
  column_spec(2, width = "5.5cm") |> 
  column_spec(3, width = "4cm") |>
  column_spec(4, width = "2.5cm") |>
  kable_styling(font_size = 8) |> 
  footnote(general = "Table used with permission under a CC-BY4.0 license: Barela et al. (2023); available at https://doi.org/10.31234/osf.io/ctfns.\\\\newline", 
           symbol = c("represents measures scaled with higher values representing less impulsivity. Correlation coefficients for these measures were multiplied by $-$1 to ensure all positive correlations represent higher impulsivity. For studies with multiple measures in a task, \\\\sout{strikethrough} signals removal of the associated measure from the analysis.", "represents impulsive choice tasks; all other tasks are impulsive action."), 
                      general_title = "Note: ",
           threeparttable = TRUE, escape = FALSE)
```

\normalsize

## Data Reliability

After the initial database search, two reviewers (JB and YW) screened the title and abstract of a subset of the records from the first round of screening. First, both reviewers screened 100 randomly chosen records individually. For each record, the reviewers investigated whether the inclusion criteria were met, and, if not, the reason for exclusion was noted. The reviewers then compared their responses, and, after 100% agreement was reached, the two reviewers split the remaining records for this round of screening. After the first round of screening, the two reviewers screened the remaining `r nrow(reviewed_records)` reports, focusing on the Methods sections. During this stage, both reviewers individually screened 20 randomly chosen reports and decided if each report met the criteria to be included in the final analysis. If a report did not meet the inclusion criteria, then the reason for exclusion was noted. Once 100% agreement was reached on whether each report should be included or the reason for exclusion, a single reviewer (JB) screened the remaining reports. This reviewer also extracted and recorded the required data from each report.

## Transparency and Openness

We report how we determined our sample size, all data exclusions (if any), all manipulations, and all measures in the study. We analyzed data from the project using `r my_citations`. The manuscript was created using *rmarkdown*  [Version `r packageVersion("rmarkdown")`, @R-rmarkdown_a] and *papaja* [Version `r packageVersion("papaja")`, @R-papaja]. Data, analysis scripts, supplementary materials, and reproducible research materials are available at the Open Science Framework (https://osf.io/z6svt/). We pre-registered our design and analysis plan at the Open Science Framework (https://osf.io/bsyxk).

## Data Analysis

The first step in our data analysis was to convert any other types of effect sizes to correlation coefficients. All effect sizes were already presented as correlation coefficients, though some were presented as Pearson correlations and others as Spearman correlations. For measures scaled such that higher values meant _less_ impulsivity, we reversed the sign of the correlation coefficient (if both measures in a correlation were scaled this way, we reversed the sign twice, leaving the original sign). Thus, for all correlation coefficients, positive values indicated that _more_ impulsivity in one task was associated with _more_ impulsivity in the other.
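The sign-alignment rule can be illustrated with a short base R sketch. The data frame, its column names, and the correlation values are hypothetical and serve only to show the arithmetic. Each measure scaled toward _less_ impulsivity contributes a factor of $-1$, so two reversed measures cancel out.

```r
# Hypothetical correlations; measure names and values are illustrative only
pairs_demo <- data.frame(
  measure_a = c("errors", "accuracy", "accuracy"),
  measure_b = c("errors", "errors", "wait time"),
  r = c(0.20, -0.30, 0.10),
  # TRUE when higher values of the measure mean LESS impulsivity
  reversed_a = c(FALSE, TRUE, TRUE),
  reversed_b = c(FALSE, FALSE, TRUE)
)

# Each reversed measure flips the sign once; two reversals cancel
sign_flip <- ifelse(pairs_demo$reversed_a, -1, 1) *
  ifelse(pairs_demo$reversed_b, -1, 1)

# Aligned coefficients: positive values mean more impulsivity in both tasks
pairs_demo$r_aligned <- pairs_demo$r * sign_flip
pairs_demo[, c("measure_a", "measure_b", "r", "r_aligned")]
```

In this toy example, only the second row changes sign, because exactly one of its two measures is scaled toward less impulsivity.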

Our first analysis generated a matrix of the effect sizes for all impulsivity task pairs. Within each cell of (half of) the matrix, we aggregated the correlation coefficient, sample size, and citation for each study that correlated that pair of tasks.

For any cell in the table that had three or more studies in it, we conducted a robust Bayesian, model-averaged meta-analysis. We input the correlation coefficients and sample sizes into the `RoBMA()` function in the _RoBMA_ package [@R-RoBMA], using default priors (standard normal distribution on effect sizes, inverse gamma distribution with $\alpha$ = 1 and $\beta$ = 0.15 on heterogeneity, two two-sided weight functions with cut-points at (0.05) and (0.05, 0.10) and parameters $\alpha$  = (1, 1) and (1, 1, 1), and the default point priors on the null hypotheses) [@Bartos.etal.2022]. This function allowed us to test not only if there is evidence of an effect but also evidence for between-study heterogeneity (whether there is variation in true effect sizes across studies) and publication bias [@Maier.etal.2022]. This analysis used model averaging to calculate (1) a Bayesian estimate of the effect size across all studies, (2) a Bayes factor for evidence supporting the hypothesis of an effect, (3) a Bayes factor for evidence supporting the presence of between-study heterogeneity, and (4) a Bayes factor for evidence supporting the presence of publication bias [@Maier.etal.2022]. We set the prior hypothesis probability to 0.50 for the effect size, heterogeneity, and publication bias. 
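For readers who want intuition for how correlations and sample sizes are pooled, the following base R sketch computes a simple fixed-effect estimate on the Fisher $z$ scale. This is a swapped-in frequentist calculation for illustration, not the robust Bayesian model-averaged analysis described above: it ignores between-study heterogeneity and publication bias. The values in `r_demo` and `n_demo` are hypothetical.

```r
# Hypothetical correlations and sample sizes (not data from this review)
r_demo <- c(0.10, -0.05, 0.20)
n_demo <- c(30, 50, 40)

# Fisher z transformation and its approximate sampling variance
z <- atanh(r_demo)
v <- 1 / (n_demo - 3)

# Inverse-variance weights and the pooled fixed-effect estimate
w <- 1 / v
z_pooled <- sum(w * z) / sum(w)
se_pooled <- sqrt(1 / sum(w))

# 95% confidence interval on the z scale, back-transformed to r
ci_z <- z_pooled + c(-1, 1) * qnorm(0.975) * se_pooled
r_pooled <- tanh(z_pooled)
ci_r <- tanh(ci_z)

round(c(estimate = r_pooled, lower = ci_r[1], upper = ci_r[2]), 3)
```

Larger studies receive more weight because the sampling variance of Fisher's $z$ shrinks with sample size. A confidence interval that includes 0 would be consistent with no overall correlation, though this rough check does not replace the Bayes factors reported in the Results.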

# Results

Our first analysis aggregated all of the correlation coefficients for each pair of tasks or task/DIAS pair for each study (Table \ref{tab:taskpairs}). Across `r nrow(measure_pairs_trimmed)` task/task or task/DIAS pairs, study authors reported `r task_dias_pairs_count$n[task_dias_pairs_count$reported_significant == "No"]` pairs as not demonstrating a statistically significant correlation and `r task_dias_pairs_count$n[task_dias_pairs_count$reported_significant == "Yes"]` pairs as demonstrating a correlation. For the `r nrow(task_pairs)` task/task pairs alone, `r task_pairs_count$n[task_pairs_count$reported_significant == "No"]` were not correlated, while `r task_pairs_count$n[task_pairs_count$reported_significant == "Yes"]` were correlated. For the `r nrow(dias_pairs)` task/DIAS pairs alone, `r dias_pairs_count$n[dias_pairs_count$reported_significant == "No"]` were not correlated, while `r dias_pairs_count$n[dias_pairs_count$reported_significant == "Yes"]` were correlated. Thus, the vast majority of task/task pairs showed no correlation. Notably, one of the three task/task correlations was between two versions of the same task: A-not-B Barrier and A-not-B Cup. The other two task/task correlations both included the Wait-for-Treat task. The task/DIAS pairs fared better, with DIAS overall scores correlating with performance in the A-not-B Cup, Delay Discounting, Delay of Gratification, and Spatial Impulsivity tasks.

```{r taskpairs, warning = FALSE}
task_dias_pairs_table |> 
  rename("Task" = "task_a") |> 
  mutate_all(linebreak) |> 
  select(-c(`A-not-B Barrier`, Box, `Delay Discounting`, Leash, `Spatial Impulsivity`)) |> 
  filter(!Task %in% c("Detour Fence", "Social Inhibition", "Wait-for-Treat")) |> 
  kable(booktabs = TRUE, format = "latex", escape = FALSE, linesep = "\\addlinespace", # taskpairs
        align = "l",
        caption = "Task Pair Correlations") |> 
  landscape() |> 
  kable_styling(font_size = 8, latex_options = "scale_down") |> 
  column_spec(1, width = "1.6cm") |> 
  column_spec(2, width = "2.3cm") |> 
  column_spec(3, width = "1.8cm") |> 
  column_spec(4, width = "2.2cm") |> 
  # column_spec(5, width = "1.9cm") |> 
  column_spec(5, width = "2.2cm") |> 
  column_spec(6, width = "2cm") |> 
  column_spec(7, width = "2.1cm") |> 
  column_spec(8, width = "1.9cm") |> 
  column_spec(9, width = "2.2cm") |>
  column_spec(10, width = "2.3cm") |> 
  footnote(c("\\\\linebreak \\\\textit{Note}. Cells include correlation coefficient type (Pearson's $r$ or Spearman's $\\\\rho$), sample size (in parenthesis), correlation coefficient value, and study number [in brackets]. Correlation coefficients reported with $p < 0.05$ are \\\\textbf{bolded with an asterisk*}. Correlation coefficients were multiplied by $-$1 to ensure all positive correlations represent higher impulsivity for both tasks. DIAS is the Dog Impulsivity Assessment Scale (Wright et al., 2011)."),
           general_title = "", threeparttable = TRUE, escape = FALSE, fixed_small_size = TRUE)
```

Only two task pairs included three or more studies, thereby meeting our criterion for conducting a formal meta-analysis. Because one of those pairs (Spatial Impulsivity and DIAS) was recently the subject of a meta-analysis that included new studies published after our search deadline [@Stevens.etal.2022a], we did not conduct a meta-analysis for that pair. To summarize @Stevens.etal.2022a, two out of six studies found a correlation between Spatial Impulsivity and DIAS scores. A Bayesian, model-averaged meta-analysis found anecdotal evidence of no correlation between Spatial Impulsivity and DIAS scores, but further studies are needed to confirm this result.

The other task pair with three or more studies was A-not-B Cup and Cylinder. None of the seven studies that tested both of these tasks on the same dogs reported a statistically significant correlation. `r metaanalysis_text` Thus, there was no evidence for a correlation in responses between the A-not-B Cup and Cylinder tasks (Figure \ref{fig:forest}). Further, there was no evidence for publication bias favoring studies that found a correlation. Even when correcting for publication bias, the conditional PET-PEESE estimate of the overall effect did not differ from 0 (PET-PEESE: `r paste0(printnum(pet_mean, digits = 3), " [", printnum(pet_lowerci, digits = 3), ", ", printnum(pet_upperci, digits = 3), "]")`). There was also weak evidence against between-study heterogeneity, suggesting that the effect sizes were relatively homogeneous across studies, despite the use of different measures within the tasks.


\begin{figure*}
\caption{\newline Meta-Analysis Forest Plot for A-not-B Cup and Cylinder Tasks \label{fig:forest}}
\begin{center}
\includegraphics[width=0.8\linewidth]{"figures/anotb-cylinder-forestplot.png"}
\end{center}


\end{figure*}

# Discussion

We examined almost 10,000 bibliographic records to discover `r nrow(analyzed_records)` reports that tested multiple impulsivity tasks or owner surveys of dog impulsivity within the same dogs. This resulted in `r nrow(measure_pairs_trimmed)` task/task or task/survey pairs of which only `r task_dias_pairs_count$n[task_dias_pairs_count$reported_significant == "Yes"]` found a correlation. Correlations were more common between tasks and the survey than between different tasks. We also conducted a formal meta-analysis for correlations between the A-not-B Cup task and Cylinder task (N=7 studies). We found no evidence for a correlation in performance between these two tasks. Thus, overall, our systematic review and meta-analysis found little evidence for consistent relationships between impulsivity levels across tasks.


## Implications

For the `r nrow(task_pairs)` task/task pairs (not including DIAS survey data), only `r task_pairs_count$n[task_pairs_count$reported_significant == "Yes"]` (`r printnum(task_pairs_count$n[task_pairs_count$reported_significant == "Yes"] / nrow(task_pairs) * 100, digits = 1)`%) were correlated. One of the correlated pairs involved two versions of the same task, the A-not-B Barrier and A-not-B Cup tasks [@Kelly.etal.2019]. It is not surprising that two such similar tasks result in consistent performance. Moreover, this study only tested 15 dogs, and small sample sizes can result in inflated effect sizes [@Gelman.Carlin.2014]. A separate study comparing these two tasks with a larger sample size did not find a correlation [@Vernouillet.etal.2018]. The other two correlated pairs both involved the Wait-for-Treat task [@Muller.etal.2016]. In this task, a treat was placed in front of the dog, a "wait" command was given, and the dog was supposed to wait for a "go" command before retrieving the treat. Though this task clearly requires inhibition, performance may reflect training more than impulsivity [@Muller.etal.2016]. Thus, even in the cases where we observe correlations between impulsivity tasks, they occur with small sample sizes, between similar tasks, or when training may account for performance.

The lack of correlations across tasks may seem surprising. However, work in human impulsivity also shows that different impulsivity tasks and surveys do not combine into a single behavioral trait [@Malle.Neubauer.1991; @Wingrove.Bond.1997; @Smith.etal.2007]. @MacKillop.etal.2016 separated out impulsive action (the failure to inhibit an action or the inability to withhold a response) from impulsive choice (choosing between rewards with different costs) and impulsive personality traits (self-reported attributions of self-regulatory capacity). Though the authors found correlations between tasks within the categories, they did not find correlations between categories. A review of the construct of impulsivity in human studies corroborates distinct subtypes of impulsivity [@Dick.etal.2010]. However, a key difference between human and animal work is that, while humans do show correlations between tasks _within_ the subtypes of impulsivity (e.g., impulsive action and impulsive choice), animal work shows that different tasks within a subtype are not necessarily related in birds [@Logan.etal.2022], rats [@Peterson.etal.2015], or primates [@Addessi.etal.2013; @Blanchard.Hayden.2015; @Parrish.etal.2018]. Indeed, our meta-analysis of performance in A-not-B Cup and Cylinder tasks (tasks within the impulsive action subtype) showed no correlation across seven studies (Figure \ref{fig:forest}). Even the similar Cylinder and Detour Fence tasks, which both involve motor inhibition of not moving directly toward food but instead detouring around a transparent obstacle, were not correlated in two studies.
In fact, the Cylinder task was compared with other tasks in `r nrow(filter(task_dias_pairs, (grepl("cylinder", correlate_a) | grepl("cylinder", correlate_b)) & !grepl("DIAS", correlate_b)))` different comparisons (`r nrow(filter(task_dias_pairs, (grepl("cylinder", correlate_a) | grepl("cylinder", correlate_b)) & !grepl("DIAS", correlate_b))) - 1` of which involved other impulsive action tasks) but did not correlate with the other task in any of them. Thus, even within a subtype of impulsivity, animals do not show strong evidence for the behavioral trait of impulsivity, and dogs seem to follow this pattern.

Though behavioral tasks do not seem to correlate, it is possible for dog owners to extract a 'personality trait' from their dog's behavior. The DIAS provides a survey for owners to do just that. For the `r nrow(dias_pairs)` task/DIAS survey pairs, owner reports of impulsivity were correlated with task performance in `r dias_pairs_count$n[dias_pairs_count$reported_significant == "Yes"]` (`r printnum(dias_pairs_count$n[dias_pairs_count$reported_significant == "Yes"] / nrow(dias_pairs) * 100, digits = 1)`%) studies across four different tasks: A-not-B Cup, Delay Discounting, Delay of Gratification, and Spatial Impulsivity. This rate is higher than that observed between behavioral tasks. Three of these studies with correlations were conducted by the authors of the DIAS [@Wright.etal.2012a; @Brady.etal.2018], but two studies were independent of the DIAS authors [@Brucks.etal.2017a; @Olsen.2019]. Two of these studies (A-not-B Cup and Spatial Impulsivity) used small sample sizes (N=13--15), raising the possibility of inflated effect sizes. For both tasks, studies with larger sample sizes failed to find correlations [A-not-B Cup: @Olsen.2019; Spatial Impulsivity: @Mongillo.etal.2019]. Further replications of the Spatial Impulsivity studies not included here (due to publication after the search deadline) also failed to find a correlation between owner-perceived impulsivity and performance on the Spatial Impulsivity tasks [@Stevens.etal.2022a]. Moreover, a Bayesian meta-analysis of all six studies found anecdotal evidence of no correlation between DIAS and Spatial Impulsivity performance. Interestingly, DIAS scores correlated with performance in the Delay Discounting [@Wright.etal.2012a] and Delay of Gratification [@Brucks.etal.2017a] tasks, two classic tasks for assessing intertemporal choice, or preferences for immediate vs.\ delayed rewards.

For task/DIAS pairs, 4 of the 5 correlations occurred for tasks associated with impulsive choice: Delay Discounting, Delay of Gratification, and Spatial Impulsivity. These tasks pit smaller, sooner or closer rewards against larger, later or more distant rewards. This is perhaps surprising given that only 1 of the 18 DIAS questions references aspects of intertemporal or spatial choice ("Dog is not very patient"). In fact, most of the DIAS questions do not directly ask about impulsivity unless they do so in a general way ("Dog is considered to be very impulsive") or by focusing on impulsive action ("Dog does not think before it acts", "Dog appears to have a lot of control over how it responds"). Most DIAS questions reference excitement, persistence, trainability, aggression, neophobia, and reactivity. Therefore, it remains unclear what aspects of dog behavior the DIAS is capturing in its assessment of impulsivity.

In summary, we do not have strong evidence for impulsivity as a single behavioral trait. Perhaps this is not surprising given the multifaceted nature of impulsivity and the few correlations that we observe between tasks in humans and other species. Yet characterizing impulsivity as a behavioral trait---if it exists---could be useful for canine science, as dog owners and handlers often want dogs to inhibit their impulses. We also have some evidence of impulsivity mapping onto important behavior in working dogs. Impulsivity (measured via inhibition in the Cylinder task) was associated with success in an explosive detection task in a population of police explosive search dogs [@Tiira.etal.2020]. Persistence and problem solving were not associated with their detection success. @Lazarowski.etal.2020a did not find a relationship between Cylinder task success and detection dog performance, but they did find that success in an A-not-B Barrier task was associated with performance. Thus, the ability to inhibit impulses in behavioral tasks may predict real-world performance for working dogs. This relationship suggests that assessing impulsivity may help in the selection of working dogs for training programs.


## Limitations

While systematic reviews and meta-analyses can be a useful way to aggregate the literature to examine larger-scale patterns, they also come with limitations. One of the primary and most pernicious limitations is publication bias---only a subset of studies end up getting published, and often the published studies are biased toward demonstrating effects [@Scherer.etal.2018; @Siddaway.etal.2019; @Harrer.etal.2021]. Interestingly, for this review, only `r printnum(task_dias_pairs_count$n[task_dias_pairs_count$reported_significant == "Yes"] / nrow(measure_pairs_trimmed) * 100, digits = 1)`% of the tested task pairs or task/survey pairs reported statistically significant effects. Thus, though publication bias is possible, it might not be as pervasive as it is in other areas. Furthermore, a direct test for publication bias in the A-not-B Cup and Cylinder task comparisons showed no evidence for publication bias.

A larger problem for this review is the quality of the data in the literature. Individual studies can vary in sample sizes, methodological rigor, and generalizability, which can bias the outcomes of systematic reviews and meta-analyses [@Siddaway.etal.2019; @Harrer.etal.2021]. Therefore, researchers have developed _risk of bias_ criteria to score individual studies. Here, we used Nudelman and Otto's [-@Nudelman.Otto.2020] generic Risk of Bias Utilized for Surveys Tool (ROBUST) to assess the studies included in our review. This tool categorizes the risk of bias for the following criteria: sampling frame (correspondence between theoretical population and sampled population), participant recruitment (description of subject recruitment), acceptability of exclusion rate, sufficiency of sample size, demographic variables (reporting of demographic information), reliability of measurements, setting (appropriateness of experimental setting), and data management (appropriate handling of outliers and missing data) (see in-depth description of each criterion in Supplementary Materials). We categorized each study for each criterion (Figure S1). In general, the studies had relatively low risk of bias for most criteria (Figure S2).

By far the greatest risk of bias resulted from low sample sizes---most of the studies analyzed here had fewer than 25 subjects (median: `r median(studies_table$sample_size)`, range: `r min(studies_table$sample_size)`--`r max(studies_table$sample_size)`, Table \ref{tab:studies}; Figure S3). Correlations are notoriously unstable with small sample sizes, resulting in multiple types of inferential errors [@Schonbrodt.Perugini.2013; @Gelman.Carlin.2014; @Knudson.Lindsey.2014]. Therefore, it is possible both that statistically significant correlations are false positives and that statistically non-significant correlations are false negatives. In our data, for three task pairs that had multiple studies and at least one significant correlation, the study with the larger sample size reported a non-significant correlation (Table \ref{tab:taskpairs}). In one task pair, the study with the larger sample size showed a correlation, while the smaller study did not.
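
To illustrate why small samples make correlations unstable, the following sketch simulates sample correlations from a population with a known true correlation and keeps only the statistically significant ones. It is a minimal illustration with simulated data and is not part of our analysis: the function names (`critical_r`, `simulate_r`), the true correlation of 0.2, and the number of replicates are assumptions chosen for the example.

```r
# Illustrative simulation only; all values below are assumptions, not study data.

# Smallest |r| that reaches two-tailed significance for a sample of size n
critical_r <- function(n, alpha = 0.05) {
  df <- n - 2
  t_crit <- qt(1 - alpha / 2, df)
  t_crit / sqrt(t_crit^2 + df)
}

# Draw `reps` sample correlations of size n from a bivariate normal
# population with true correlation rho
simulate_r <- function(n, rho, reps = 5000) {
  replicate(reps, {
    x <- rnorm(n)
    y <- rho * x + sqrt(1 - rho^2) * rnorm(n)
    cor(x, y)
  })
}

set.seed(2024)
rho <- 0.2                                   # assumed true correlation
r_small <- simulate_r(n = 15, rho = rho)     # 15 subjects per simulated study
sig_small <- r_small[abs(r_small) >= critical_r(15)]

critical_r(15)                        # significance threshold for n = 15
mean(abs(sig_small))                  # average size of the significant estimates
mean(abs(r_small) >= critical_r(15))  # proportion of simulated studies reaching significance
```

With 15 subjects, a correlation must exceed roughly 0.51 in absolute value to reach significance at the 0.05 level. Under these assumptions, every significant estimate therefore overstates the true correlation of 0.2, while most simulated studies fail to detect it at all.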

<@~{#sample-size1}

Sample size is a specific concern for the meta-analysis. For the A-not-B Cup/Cylinder task meta-analysis, sample sizes ranged from `r min(anotb_cylinder_data$n)` to `r max(anotb_cylinder_data$n)`. Underpowered studies can influence meta-analysis parameter estimates; however, having at least two well-powered studies can mitigate these issues [@Turner.etal.2013]. While ideally we would like to have more studies with larger samples, we have two with 30 or more, potentially providing some stability in our estimates despite the other studies with smaller sample sizes. Moreover, our estimate of between-study heterogeneity was low, suggesting that, even with small samples, we do not seem to have large sampling effects differing across studies. This is reassuring given the variation observed in dogs in breed, sex, and neuter status, some of which can influence impulsivity [@Junttila.etal.2021; @Junttila.etal.2022]. Nevertheless, as is often the case with meta-analyses [@Valentine.etal.2010], we cannot draw strong conclusions and must call for more studies with larger sample sizes to address these questions.

~@>

<@~{#sample-size2}

A key aim of canine behavioral science more generally should be to increase sample sizes to improve robustness of results. Not only are low sample sizes susceptible to sampling bias generally, but variability is critical in dogs specifically due to the potential for breed differences. This is evident in the high heritability measures for inhibitory control in particular [@Gnanadesikan.etal.2020]. Across four cognitive factors derived from 11 tasks, inhibitory control had by far the highest heritability across a sample of more than 1,500 dogs. Given that breeds differ in impulsivity [@Junttila.etal.2022], the breed composition of small samples could have large influences on the impulsivity levels measured, which could contribute additional variance and weaken statistical inference.

~@>

Another key potential contributor to bias is the reliability of the measures. One surprising outcome from this review was the variation in the measures used for the same task. Four of the tasks had three different measures for the same task, sometimes between studies, sometimes within studies (Table \ref{tab:tasks}). For our aggregation and meta-analysis, we selected a single measure for each task pair. We prioritized measures that were used in multiple studies, were scaled in the direction of higher values representing more impulsivity, and were commonly used in the literature. While not an arbitrary choice, these selection rules could have biased our results, and other measures might have resulted in different outcomes. In addition to measure selection, measure quality is important as well. Measures vary in their precision, variability, and objectivity. For example, for the A-not-B Cup task, the measure of _first location search_ has only two or three possible outcomes, which restricts the variability needed for robust correlations. The measure of _number of trials before correct_, however, is a count of trial numbers (0 to infinity), so it has the potential for more variability. Lastly, measures include uncertainty, and single point estimates may not accurately capture the underlying construct. Almost none of the studies included here [the exception being @Brady.etal.2018] included test-retest validation of their measures. Therefore, the measure values may lack precision, making the correlations less accurate.
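
The cost of a coarse measure can be illustrated with a short simulation. The sketch below is not part of our analysis and uses simulated data only: it assumes two continuous underlying scores with a true correlation of 0.5 and then records one of them as a binary outcome (a median split), analogous to a measure with only two possible values. The variable names and parameter values are hypothetical.

```r
# Illustrative simulation only; all values below are assumptions, not study data.
set.seed(2024)
n <- 1e5      # large n so that sampling error is negligible
rho <- 0.5    # assumed true correlation between the underlying scores

task_a <- rnorm(n)
task_b <- rho * task_a + sqrt(1 - rho^2) * rnorm(n)

# Coarsen task A to a binary outcome (above vs. below the median)
task_a_binary <- as.numeric(task_a > median(task_a))

r_continuous <- cor(task_a, task_b)      # correlation using the continuous score
r_binary <- cor(task_a_binary, task_b)   # correlation using the binary outcome

r_continuous
r_binary
```

Under these assumptions, the correlation computed from the binary measure is smaller than the one computed from the continuous scores (by a factor of roughly 0.8 for a median split of normally distributed scores), even though the underlying relationship is unchanged.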

Only two task pairs included enough studies to warrant meta-analysis. One of those pairs was recently meta-analyzed [@Stevens.etal.2022a], so we only conducted a single meta-analysis here: A-not-B Cup and Cylinder. A key limitation of this analysis is the number of studies included---only seven. Though this is relatively high for repeated studies in canine behavioral science, meta-analytic methods, especially estimating between-study heterogeneity, perform better with more studies included [@Harrer.etal.2021]. Other studies [@Lazarowski.etal.2020a; @Tiira.etal.2020] have tested both A-not-B Cup and Cylinder in the same dogs, but they did not report the effect size (though both sets of authors state there was no correlation). Similarly, other studies measured A-not-B Cup/DIAS [@Cavalli.etal.2018] and Spatial Impulsivity/DIAS [@Riemer.etal.2014] without reporting correlation coefficients. We recommend that authors always report effect sizes for these correlations with an eye toward future meta-analyses.


## Future directions

Impulsivity is closely tied to aspects of canine-human interaction: from detection dogs inhibiting the impulse to track non-target scents to pet dogs refraining from eating waste on walks that is scrumptious to them but disgusting to us. Much of dog training focuses on impulse control. Thus, the line between impulsivity as a personality trait and a trained behavior is blurred. Dogs trained differently for impulse control could therefore vary in performance, with the training generalizing more to some impulsivity tasks than others. The close relationship between impulsivity and training has two important implications going forward. First, we need to think more carefully about the construct of impulsivity, its different subtypes, and its susceptibility to training. Is the impulsive action/choice distinction sufficient for categorizing impulsivity, or do we need a more nuanced approach? Does impulse control training apply uniformly across all subtypes of impulsivity, or does it focus on particular subtypes, thereby breaking any potential relationship across tasks? Second, given the strong connection between training and impulsivity, we must be more sophisticated in collecting data on training for individual dogs. The current general surveys on training [e.g., C-BARQ, @Hsu.Serpell.2003] may not be nuanced enough to properly measure impulse control training. To dissociate impulsivity from training, as researchers, we must agree on a standard metric for training and measure it along with any measures of impulsivity. We need to focus more effort on understanding the relationships between impulsivity and training.

In a previous meta-analysis of correlations between spatial impulsivity performance and owner perception of impulsivity (via the DIAS survey), different research labs produced different outcomes [@Stevens.etal.2022a]. One of the more interesting possible explanations for these mixed results is potential cultural differences between study populations. Those studies occurred in the United Kingdom, Italy, and the United States. Residents of different countries have different attitudes about, interactions with, and experience with training their dogs [@Bradshaw.Goodwin.1999; @Serpell.2004; @Wan.etal.2009; @Amici.etal.2019]. A key cultural difference relevant here is the frequency of spaying/neutering [@Diesel.etal.2010; @Trevejo.etal.2011], because neutering may influence impulsivity [@Fadel.etal.2016]. Thus, cultural differences should be accounted for in canine behavioral science. Moreover, variability in how different labs conduct their studies could account for potential differences seen across studies. Standardization of experimental methods would go a long way to ensure comparability across studies. One solution to this is to engage in big team science. The ManyDogs Project is a consortium of dog behavior researchers interested in conducting the same study across many labs all over the world [@ManyDogsProject.etal.2022]. Since different breeds may show different levels of impulsivity [@Junttila.etal.2022], large sample sizes may address sampling variability problems caused by differences in breed compositions across studies. Implementing impulsivity tasks and the DIAS across a wide range of labs will not only overcome the low sample size problem rampant in this area but also ensure standard methods and allow the analysis of potential cultural differences.

An active area of research in human and rat impulsivity explores the mechanisms underlying this construct [@Robbins.Dalley.2017]. Though studies of genetic [@Hejjas.etal.2007; @Kubinyi.etal.2012], neural [@Cook.etal.2016a], and hormonal [@Rayment.etal.2020; @Junttila.etal.2021] mechanisms of impulsivity are increasing in canine science, we lack a coherent research program on the underpinnings of impulsivity in dogs, and further investigation could prove fruitful [@Olsen.2018]. Understanding the genetic, neural, and hormonal influences on impulsivity has critical implications for the breeding, selection, and training of not only pet dogs but also working and service dogs.

## Conclusion

Currently, we have little evidence for a behavioral trait of impulsivity in dogs. Performance rarely correlates across impulsivity tasks, and owner perceptions of impulsivity often do not match behavioral measures. This may not be too surprising given what we know about the multifaceted nature of impulsivity and the lack of strong signals of a trait in humans and other animals. Moreover, dog owners and handlers expressly train for impulse control, potentially interfering with our ability to accurately measure it as a trait. Further, many of the studies evaluating impulsivity in dogs suffer from small sample sizes, which can lead to weak statistical analyses. Larger-scale studies with a clearer conceptual foundation for the nature of impulsivity and robust measures of impulsivity are needed to verify whether impulsive action and choice do carry over across contexts. Understanding the extent and limits of impulsivity in dogs is critical to the canine-human bond.


## Author Contributions

JB: Data curation, Investigation, Methodology, Project administration, Validation, Writing – original draft; YW: Investigation, Validation; JS: Conceptualization, Data curation, Formal analysis, Funding acquisition, Methodology, Project administration, Resources, Software, Supervision, Visualization, Writing – original draft.

## Conflict of interest

The authors declared that no conflicts of interest exist.

## Data Availability

The data and analysis code are available at: https://osf.io/z6svt/.

## Ethics statement

An ethics statement is not applicable because this study is based exclusively on published literature.

---
nocite: |
  @Brady.etal.2018, @Bray.etal.2014, @Brucks.etal.2017a, @Brucks.etal.2019, @Fagnani.etal.2016a, @Fagnani.etal.2016, @Kelly.etal.2019, @Marshall-Pescini.etal.2015, @Mongillo.etal.2019, @Muller.etal.2016, @Olsen.2019, @Vernouillet.etal.2018, @Wright.etal.2012a, @Barela.etal.2023
...


# References
\scriptsize

<div id="refs"></div>