Trade-Offs in Furry Research: [adjective][species] vs. The IARP

Guest post by Courtney “Nuka” Plante, PhD social psychology student at the University of Waterloo, furry, and co-founder of the International Anthropomorphic Research Project.

To paraphrase Lao Tzu: “There are many paths to enlightenment.” This statement is just as true of science as it is of philosophy or spiritual fulfillment. In science, knowledge is seldom gained through one perfectly-designed study that single-handedly topples all preexisting theories. Rather, the progress of science involves the convergence of dozens or hundreds of studies by numerous scientists, all with different approaches to the topic at hand.

The reason science progresses this way becomes apparent as you delve into the empirical literature. Search as you might through the annals of scientific history, you will never find a perfectly-designed study. Every study has its confounding variables, alternative explanations, limitations in generalizability, problematic variable operationalizations; the list goes on and on.

Why are there no perfect studies? Is history rife with bad scientists who simply cannot design a good study?

Quite the opposite, I contend. Designing a “perfect study” is about skillfully deciding how to approach the unavoidable tradeoffs that arise in empirical science. It is about crafting your study so that its strengths speak to the question of interest, while its weaknesses are either minimal or, at very least, overcome through converging evidence with other studies.

Those empirically studying the furry fandom are bound to these same realities when it comes to study design. Whether it’s the folks at [adjective][species] or the social scientists who comprise the International Anthropomorphic Research Project (IARP), they both aim to answer questions about the furry fandom through the collection and interpretation of data. While this overarching goal may be the same, the individual questions and the means of collecting data differ drastically. Asking whether the [a][s] approach or the IARP approach is “better” or “more correct” is misguided, as it overlooks the fact that both approaches can provide converging evidence through complementary methodologies, and together they can answer a broader range of questions than either approach could by itself.

To illustrate this point, I outline several of the tradeoffs that we have to contend with when deciding how to study the furry fandom. By emphasizing how [a][s] and the IARP differ in our approaches to these tradeoffs, and by showing the merit in both positions, I hope to show that the cause of furry science is advanced through researchers pursuing both approaches.

Trade-off #1: Research Questions – Description vs. Inference

The first challenge a furry researcher faces is deciding which question(s) they wish to answer. At first glance, this may seem like little more than plucking the first inquisitive thought about furries from one’s mind. In reality, a research question must be precisely honed to be useful. Consider, for example, a question like “What’s the deal with furries?” This question is too vague to be pursued: how would one even begin to collect data to answer such a question? While this is a particularly extreme example, we, as researchers, are often asked by curious, well-intentioned furries questions that are similarly vague or ill-defined, or that do not lend themselves well to empirical answers:

“What makes foxes different?” – Different from what? How do you define “different”?

“Are furries screwed up?” – Compared to who? How do you define “screwed up”?

“What about people who say they’re not furries, but they’re clearly furries?” – How do you even begin to find and study such a group? How do you know when you have found one? And what is meant by “what about”?

Such examples hopefully illustrate the importance of having a clear, well-defined research question that can be answered empirically. The questions that guide our research generally fall into two camps:

Description – These questions ask about the frequency, average, or range of something. They may include questions such as “What percent of the furry fandom is homosexual?” (approximately 22%), “What is the age of the average furry?” (approximately 22 years old), or “Are there more foxes or wolves in the furry fandom?” (wolves). Descriptive questions are “snapshots” of the furry fandom.

Inference – These questions ask for a conclusion to be reached on the basis of some kind of comparison. This usually involves comparing one group to another, one characteristic to another, or comparing something over time. They may include questions such as “Are foxes more likely to be outgoing than wolves?”, “Do furries have a better sense of community than anime fans?”, or “Does spending more time in the fandom reduce furries’ likelihood of developing depression?”

Descriptive questions are more frequently asked of us by furries themselves, who are most interested in learning about the state of the fandom. In contrast, academics (e.g. psychologists) are less interested in getting an accurate picture of the furry fandom and tend to be more interested in broader implications (e.g. what furries can tell us about fandoms in general) or mechanisms (e.g., what causes furries to feel a sense of community). As such, academics tend to ask more inferential questions.

[a][s] caters primarily to a furry audience: the vast majority of its readers are furries, and it is run by furries. As such, [a][s] tends to pursue, collect data on, and report answers to descriptive questions that interest furries. In contrast, the IARP, while catering to a furry audience, also aims to publish research in academic journals and has, on its team, a number of non-furry social psychologists. As such, the IARP tends to pursue more inferential questions, questions that aim to either understand the mechanisms underlying phenomena in the furry fandom or to generalize beyond the furry fandom to other groups or to humanity more generally.

Both approaches serve an important purpose. Descriptive approaches help us to better understand the “landscape” of the furry fandom, while inferential questions help us to understand the mechanisms driving participation in the furry fandom, and how the furry fandom relates to other fandoms. Where both approaches provide converging evidence (e.g. showing that 21% of furries identify as completely heterosexual and that this number is significantly less than in the general population), they bolster our confidence in the obtained findings.

Trade-off #2: Presentation – Simplicity vs. Accuracy

A consideration that is perhaps less central to study design, but is crucial to reporting, is the way we present our data. Decisions about data presentation are guided by two primary factors: the nature of the data and the audience who reads the data.

Descriptive questions yield descriptive data, which lends itself nicely to a multitude of visualizations (e.g., pie charts, area charts, etc.). Summary statistics, such as averages, ranges, and standard deviations, can also be used. Such decisions depend both on the perceived knowledge of the reader and the intended medium of the presentation. For example: a good visualization can convey a tremendous amount of data very efficiently, and lends itself well to concise summaries which can be easily shared and distributed. In contrast, a paragraph of summary statistics, while perhaps more informative, may be less readily parsed by readers, may not feel as intuitive, and may not be as easily shared as a single figure. Moreover, if the audience lacks the statistical sophistication to understand what a measure of variance is, presenting such statistics may confuse the reader, effectively undermining the point of presenting the data at all.

This is the tradeoff of data presentation: conveying the data as accurately as possible while making it as memorable and easily readable as possible. Almost inevitably, the more accurate the researcher attempts to be with the data, the more complex the presentation becomes. Imagine, for example, a simple statistic like “The average convention-going furry is 26 years old”. This number, in and of itself, is relatively straightforward: a person can walk away remembering “con-going furries are about 26 years old on average”. In contrast, if I said “Convention-going furries ranged from 18-66 years old, with a mean of 26 years and a standard deviation of 2.5 years in a distribution that was significantly positively skewed”, while this would be a more accurate picture of the age data, it may be overwhelming for a reader who has no concept of the significance of the data’s range, distribution shape, or variance statistics.
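To make the contrast concrete, here is a minimal Python sketch. The ages below are made-up illustrative values, not IARP or [a][s] data, and the `describe_ages` helper is a hypothetical name introduced only for this example. It reports the one-number summary alongside the fuller description of the same data.

```python
import statistics

def describe_ages(ages):
    """Return the simple summary (mean) alongside a fuller description."""
    mean = statistics.mean(ages)
    sample_sd = statistics.stdev(ages)       # sample standard deviation
    population_sd = statistics.pstdev(ages)  # used for the skewness estimate
    # Moment-based skewness: the average cubed z-score.
    # Positive values indicate a long tail of older respondents.
    skewness = sum(((a - mean) / population_sd) ** 3 for a in ages) / len(ages)
    return {
        "mean": mean,
        "min": min(ages),
        "max": max(ages),
        "sd": sample_sd,
        "skewness": skewness,
    }

# Hypothetical ages: mostly people in their twenties, plus a few older attendees.
ages = [18, 19, 20, 21, 22, 22, 23, 24, 25, 26, 28, 31, 40, 66]
summary = describe_ages(ages)

print(f"Simple: the average respondent is about {summary['mean']:.1f} years old")
print(
    f"Fuller: range {summary['min']}-{summary['max']}, "
    f"SD {summary['sd']:.1f}, skewness {summary['skewness']:.2f}"
)
```

Both lines describe the same data; the second is more complete, but it only helps a reader who already knows what a standard deviation or a skewed distribution implies.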

While this trade-off is a slight nuisance in descriptive data, it becomes a large issue in inferential data. Take, for instance, a question about the relationship between fursonas and life satisfaction. One way of reporting this data would be to say “in general, furries whose fursonas were significantly different from their non-fursona selves tended to have lower life satisfaction”. It would be more accurate, however, to state that “the extent to which a person indicated that their fursona differed from their non-fursona self significantly negatively predicted life satisfaction, B = -.455, p < .001” (and, if one wanted to be particularly stats savvy, they could throw in the t-value, MSE, and calculations of Cohen’s d). Such a sentence, while again more factually correct, would be far less likely to be useful to a casual reader, who only wanted to know whether or not fursonas were related to life satisfaction.

If the goal of the researcher is to inform a general furry audience, they have to write at a level that will be understood by this audience. While the approach of the IARP is to provide some statistics where it is possible to do so without compromising the ability to easily understand the results, we tend to err on the side of a more readable presentation of the data, with the rationale that if the reader takes away nothing from the finding because they cannot interpret it, then we might as well have not presented it at all.

Trade-off #3: Sample – Representativeness vs. Convenience

One trade-off plaguing researchers in the social sciences is the tension between wanting to acquire a representative sample and the inconvenience of acquiring such a sample.

At first blush, it may seem like a matter of laziness: if the goal is to do good science, one should always strive to get the most representative sample possible. It is bad science, after all, to make claims about the entirety of the furry fandom when you have only sampled a small, biased subset of the broader fandom. While this may be true, two important factors need to be considered: pragmatics and the reality of sampling.

Pragmatically, there are only so many resources available to researchers. [a][s] operates within the constraints of a website, whereas the IARP, although able to conduct online surveys and attend conventions, is bound by the realities of budget limitations and an ethics board. As a result, [a][s] and the IARP collect different samples of furries.

[a][s] is consistently able to reach a large sample of online furries, usually larger than that of the IARP. [a][s] samples are also a better representation of the broader fandom because they are able to include minors, who comprise a significant portion of the furry fandom. The IARP are unable to study minors due to the limitations of ethics boards, which require parental consent. In contrast, the IARP routinely collects data from furries at conventions, who may have less of a web presence and thus be otherwise missed by the [a][s] survey.

Each sample, taken in isolation, suffers from significant limitations. But taken together, they overcome each other’s weaknesses and, where the data is in accordance (e.g., [a][s] estimates that about 21% of furries self-identify as exclusively heterosexual while the IARP estimates this number to be approximately 24%), we have greater confidence in the data’s representativeness of the fandom as a whole.

The reality of sampling is such that it is impossible to get a perfectly representative sample of any group. The most representative sample of a group would involve studying every member of the group itself. In the furry fandom, where boundaries are ill-defined and where furries may not attend conventions or know about the IARP or [a][s] online surveys, it is impossible to get a perfectly representative sample of the fandom as a whole. Instead, we, as researchers, do our best to maximize the size of our samples, to collect samples as broadly as possible, to avoid systematic biases where pragmatically possible, and to qualify our findings by recognizing the limitations of our samples.

As an illustration of this last point, the IARP has stressed that, until 2012, the “control” sample was significantly flawed in that it was composed of people who, while not identifying as a furry, were nevertheless attending a furry convention or were taking the online survey which had been advertised on furry websites. As such, all comparisons to these control groups were to be qualified by a recognition that they were not ideal control groups. More recently, efforts have been made, using online crowdsourcing websites such as Mechanical Turk, to create a control group that is more representative of a general American population. While, again, it is not a perfectly representative sample, and itself suffers from its own biases (e.g., people with enough internet savvy to use Mechanical Turk), it nonetheless represents a systematic improvement in sampling techniques. This striving for improved sampling techniques, coupled with the use of converging evidence from multiple samples, provides the best solution to this trade-off.

Trade-off #4: Study Design – Correlational vs. Experimental Design

A trade-off that goes hand-in-hand with the description vs. inference trade-off is that between a correlational and an experimental design.

A correlational study involves simultaneously assessing several variables of interest. Perhaps the most common example of a correlational study is a survey, the type commonly employed by [a][s] and the IARP. With a correlational study design, a researcher is able to not only provide descriptive statistics regarding the state of the furry fandom, but they are also able to look for patterns in participants’ responses: for example, are participants who report having been in the fandom for longer more likely to identify as homosexual?

By running simple statistical tests (e.g. chi-squared tests, comparing the magnitude of correlations against zero, etc.), researchers using correlational designs can identify trends, but with an important caveat: correlation does not equal causation. These analyses only allow researchers to see that there is, indeed, a relationship between two variables: say, for example, between length of time in the fandom and sexual orientation. Such designs, however, say nothing about the direction of this pattern or which of several alternative explanations is true: is it the case that homosexual furries are more likely to remain in the fandom for longer, that as furries spend more time in the fandom they become more homosexual, or is there a third variable (e.g., identifying with a stigmatized minority group) that predicts both homosexual identification and length of time spent in the furry fandom?
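As a rough illustration of such a test, the following Python sketch computes a chi-squared test of independence by hand. The counts are invented for illustration and are not survey results, and the `chi_squared_statistic` helper is a name assumed for this sketch rather than anything [a][s] or the IARP uses.

```python
def chi_squared_statistic(table):
    """Pearson chi-squared statistic for a contingency table (list of rows of counts)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic

# Hypothetical counts.
# Rows: under 5 years in the fandom, 5 or more years in the fandom.
# Columns: identifies as homosexual, does not.
observed = [
    [30, 170],
    [50, 150],
]

CRITICAL_VALUE_DF1 = 3.841  # chi-squared cutoff for 1 degree of freedom, alpha = .05

statistic = chi_squared_statistic(observed)
related = statistic > CRITICAL_VALUE_DF1
print(f"chi-squared = {statistic:.2f}; related at alpha = .05: {related}")
```

A statistic above the cutoff says only that the two variables are related in this made-up table. Nothing in the calculation distinguishes between the competing explanations listed above.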

While correlational data can be useful when it comes to providing a snapshot of the fandom and the pattern of results between variables, a simple correlational design cannot lead to conclusions about causal order (although there are some correlational designs, including longitudinal designs and cross-lagged panel designs, which can better address these problems relating to “which variable came first?”). On the upside, correlational studies are relatively easy to implement and can be relatively simple to analyze.

In contrast to correlational designs, researchers can employ experimental designs. The IARP regularly conducts experiments in its own work. Experimental designs involve the manipulation of a variable with random assignment of participants to the experimental conditions, and then assessing whether or not this manipulation led to systematic differences in the variables of interest.

For example: in a recent study, the IARP created three versions of its survey: one that asked furries to compare themselves to anime fans, one that asked furries to compare themselves to sports fans, and a control condition where furries made no comparisons. These different versions were mixed up and handed out to participants at a convention at random, so that each participant wound up in one of these three conditions by chance. The variable of interest in this study was furries’ beliefs about whether or not being furry was something biologically determined. Because furries were randomly assigned to one of the three conditions, there were no systematic differences between the groups except for which comparison (or no comparison) they were asked to make. As such, any systematic differences between the groups could only have been caused by the manipulation of comparison group. Because of this, the differences that we observed (furries who were asked to compare to anime fans were more likely to say that being furry was something biologically determined than furries who compared to sports fans) were said to be caused by our manipulation, and not by something else.
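The random assignment step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the condition labels, the participant IDs, and the `assign_conditions` helper are hypothetical, and the study described above shuffled printed surveys rather than running code.

```python
import random
from collections import Counter

# Hypothetical labels for the three survey versions.
CONDITIONS = ["anime comparison", "sports comparison", "no comparison"]

def assign_conditions(participant_ids, conditions, seed=None):
    """Randomly assign participants to conditions in (near-)equal numbers."""
    rng = random.Random(seed)
    # Build a stack of survey versions that cycles through the conditions,
    # then shuffle it, much like mixing up printed surveys before handing them out.
    versions = [conditions[i % len(conditions)] for i in range(len(participant_ids))]
    rng.shuffle(versions)
    return dict(zip(participant_ids, versions))

participants = [f"P{n:03d}" for n in range(1, 31)]  # 30 hypothetical participants
assignment = assign_conditions(participants, CONDITIONS, seed=42)
group_sizes = Counter(assignment.values())

for condition in CONDITIONS:
    print(f"{condition}: {group_sizes[condition]} participants")
```

Because the version a participant receives depends only on the shuffle, characteristics such as age or fursona species should not differ systematically between the groups, which is what licenses the causal reading of any difference that emerges.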

Experimental designs, unlike correlational designs, allow us to make statements about the direction of causation, which allows us to test questions about psychological mechanisms. As you can see, however, such designs can be far more difficult to implement, and are typically limited to studying a small handful of manipulated variables at a time. They also require a high degree of precision, as an ineffective or problematic manipulation can lead to interpretation difficulties. Finally, such designs may also be highly artificial, as attempts to manipulate the independent variable, while effective, may not reflect naturally occurring events, leading some to conclude that differences between conditions are artificial and do not reflect real-world patterns of findings.

While it may be tempting to say that all questions should be answered with experimental designs, it again comes down to a matter of pragmatics and appropriateness. Descriptive questions, for example, may not warrant experimental designs or the use of inferential statistics to interpret them: if a person simply wants to know what percentage of the fandom identifies as homosexual, an experiment is not necessary. Furthermore, because experimental designs can be so resource-consuming, we simply have neither the time nor the resources to run every experiment we would like to run.

Correlational studies are often far easier to run and interpret. While it would be nice to know the direction of causation for some effects, it is often implausible or simply impossible to manipulate some variables. For example: if a person wanted to know whether or not having a fox fursona led to systematic differences as compared to having a wolf fursona, a true experiment would involve randomly assigning some participants to have a fox fursona and others to have a wolf fursona. Given that people create their own fursonas, this is implausible as a design. Another example of this involves biological sex: given that we cannot “randomly assign” people to be biologically male or female, questions about direction of causation in these regards are impossible to test experimentally.

To summarize this section, while [a][s] tends to employ correlational designs and the IARP tends to employ a greater mix of experimental and correlational designs, this does not mean any one design is more valid or useful than another. While a significant correlation from an [a][s] survey may not mean that we can necessarily make claims about direction of causation, such studies may provide converging evidence, alongside an experiment or longitudinal study from the IARP, which helps to strengthen the conclusions we make about our findings.

Trade-off #5: Survey Design – Brevity vs. Data Breadth

The last trade-off I will mention is the trade-off between a survey’s brevity and its breadth. [a][s] surveys have always been shorter than IARP surveys, which can often exceed 200 questions in length. While, at first blush, this might seem to suggest that IARP surveys are simply “better”, there are a number of significant limitations that arise from having a longer survey:

  • First, longer surveys require more resources: more payment for participants to complete a longer survey, more time to analyze more variables, and (at least in the case of printed surveys) larger printing costs.
  • Second, larger surveys may dissuade some participants. For example, if, at a convention, the IARP is attempting to hand out a 10-page survey, there will likely be potential participants who will turn down the survey simply because it will involve a time investment they are unwilling to make. Far from being trivial, this may represent a potential bias in the sample, which may only be including furries who are particularly motivated to take a long survey instead of doing other events at a convention.
  • Third, longer surveys carry with them the possibility of participant fatigue: participants completing the first ten questions of a survey are likely more attentive, thoughtful, and patient than participants who complete the last ten questions of a 250-question survey. This leads to the possibility that data collected in longer surveys may be of poorer quality than data collected through shorter surveys.

As with the previous trade-offs, the message should be clear: while there are advantages to having surveys that include hundreds of variables, the weaknesses of such designs should be complemented through the implementation of shorter surveys. Shorter surveys are strong in their ability to draw more potential participants and to retain participant attention. This is why [a][s], with its shorter surveys, is a welcome complement to the lengthier IARP studies.

Conclusion

I started this article off by suggesting that there are no perfect studies, nor is there one perfect way to summarize, analyze, and present the data from any one study. As my (admittedly brief) review of the different trade-offs involved hopefully makes clear, researchers are just as much artists as they are scientists, balancing each of these trade-offs to create studies that maximize the ability to answer the question of interest while minimizing weaknesses.

The best strategy for overcoming the weaknesses inherent in any one design, interpretation strategy, or presentation is through converging methodologies: approaching the same topic from multiple directions. The IARP believes, as we hope the folks at [a][s] do, that having multiple researchers who share the same goal of better understanding the furry fandom, and who set upon this goal through different approaches, represents the ideal way to get the fullest, most accurate picture of the furry fandom and to provide the best answers to the most questions, from furries and non-furry academics alike.

For these reasons, I choose to see the IARP and [a][s] not as rivals, but as partners in science, each seeking a slightly different, but converging path toward enlightenment.

4 thoughts on “Trade-Offs in Furry Research: [adjective][species] vs. The IARP”

  1. I think the “perfect study” is done independently by multiple people in different places, more than once – a single study is practically never very conclusive :)

    1. What you’re describing is replication, and pretty much every scientist would agree with you :) Of course, you could also add that the “perfect study” is not only done independently by independent researchers in different locations, but they are also done using conceptually similar, but not identical, operationalizations of the variables of interest, to show the robustness of the obtained effect (e.g. one researcher measuring aggression by how loud a noise blast participants give, while another researcher measures it by how much hot sauce you give to someone). Such conceptual replications not only show the robustness of an effect, but can be crucial to ruling out potential methodological confounds and establishing important boundary conditions of the effect. :)

  2. Something I would like to see more of out of these studies is results that are normalized.

    More than likely, Nuka knows what I’m talking about, but for the benefit of other readers I’ll go into a little more detail. As an example, the article gives a figure that 22% of furries identify as fully heterosexual and states that this is significantly lower than the normal population. But how do the numbers compare when they are demographically normalized? Let’s consider two demographics here – age and gender (there may be other demographics such as ethnicity and religious affiliation that will affect the results but for the sake of this example we’ll focus on these two). We are also told that the average furry fan is 22 years old, and let’s say furries are 80% male (an offhand guess since I’m too lazy to look up the actual figure from the surveys). Could part of the reason that the 22% figure is so low is because younger people in general are less likely to identify as fully heterosexual, or because males are less likely to do so than females? After all, the average age of the general population is probably somewhere in the late 30s and is about 49% male. Thus, to “normalize” the data we would need to remove these variables from the equation, by comparing, for instance, the figures for 20-24 year old male furries to 20-24 year old males in the general population. You would do the same for 20-24 year old females, 18-19 year old males, 18-19 year old females, 25-29 year old males and females, and so forth. These demographic subsets are likely to be of interest in themselves, but to aggregate them and compare them to the general population, you would need to “weight” the results in each group from the general population to match their proportion of the furries surveyed. (Or maybe not – here again Nuka would likely know better than me what the standard methods are for normalizing the data.)

    1. Thanks for the excellent comment! =^_^= It raises several points:

      a) You raise an important point regarding the demographics of our sample and the importance of comparable control groups when conducting experiments or making inferences about the demographics of one group as compared to another group. Just like there are no “perfect” studies, there are no perfect control groups – groups in which to compare the group under study. This is true of our research on furries: Were we to find a group whose composition is similar to that of furries (e.g. approximately 85% male, early twenties, predominantly white, liberal-minded, and about 50% atheist/agnostic), then, yes, when comparing furries to that particular group, if we found significant differences in the prevalence of heterosexuality, we could infer that they may have something to do with “being furry”. That said, using such a control group severely limits our ability to generalize to a broader population: when most people ask “are furries, you know… gay?”, the unspoken comparison they’re making in their head is “the rest of us”, or population in general. As such, we at the IARP have tried our best to sample several different possible control groups, including representative general population samples, samples of undergraduate college students, samples from related fandoms (e.g., anime), and samples from people who are fur-friendly but who are not furries. There is no perfect solution, and picking such control groups represents a trade-off between the extent to which you can generalize your results to a broader population and the extent to which you can claim that it’s “furry”, and not a confounding variable, that’s causing the observed difference.

      b) With regard to the “normalization” to which you are referring, there are two ways one can approach the topic. The first, as you suggest, is to “compare apples to apples” – that is, find a comparable group on the variables of interest. The benefit of such a strategy is that it’s reasonably straightforward and easy to do. The downside is that you can never create a perfectly matched sample: you can pick a few variables of interest (e.g. matching samples on age and gender), but with a near-infinite number of possible confounds (some of which, admittedly, are more likely than others, such as political orientation, religious beliefs, socio-economic status, education, etc…), at some point it becomes impractical to create a comparable group. As you suggested, one can focus on a single variable of interest (e.g. age), and see how furries compare to similarly-aged groups. And, indeed, it’s a good suggestion that, in the future, we may well try to do in our data presentation. Given that often times, however, [a][s] and the IARP are asked questions about furries specifically (e.g. “What percentage of furries are homosexual?”), which are more demographic-type questions, such comparisons are not necessary to answer these types of questions.

      A second approach to data “normalization” is actually something that we, at the IARP, do very routinely: statistically controlling for the effect of other variables, or creating “covariates”. More often than not, when attempting to answer inferential questions (e.g., trying to make an argument about, say, the happiness of furries compared to a control group comprised of a sample of the general American adult population), we recognize fully that there are plenty of differences between the samples. As such, when we’re conducting our analyses, be they t-tests, ANOVAs, regression analyses, or structural equation modeling, we statistically control for the confounding effects of age, gender, religious, and political differences by including them as covariates in these analyses. Doing this “partials out” these influences: that is, if there is a difference between furries and non-furries that is caused by, say, age, we “remove” that difference first, and then look at the remaining difference between the groups to see whether or not, after removing the effect of age, there is still any difference left to be explained. This is, of course, a simplification of the process, but that’s essentially what we do statistically when we run analyses for papers. We do this instead of presenting side-by-side descriptions of each of the groups of interest (e.g. furries at 20 vs. non-furries at 20, furries at 25 vs. non-furries at 25) because, more often than not, age is not actually a variable we’re interested in, it’s a nuisance variable we want to control for the effects of, or to rule it out as a potential alternate explanation.

      So, in short: we typically don’t present data in a “side-by-side” manner (comparing all the 20 year old furries to 20 year old non-furries) mostly because it’s a question that few people seem interested in, in and of itself. Furries usually just want to know about furries, and social scientists usually want to rule age and other confounding variables out of the picture and see if there’s something about “furry” that makes it unique or which generalizes to, say, fandoms more broadly. That said, it may be worth, in a future data presentation, presenting the data this way, not only as a way of “showing our work” so to speak, but also because it may yield some unexpected insights :)
