I am delighted to see Stivala’s piece on geodesic cycle length, which responds to and goes considerably beyond my 2017 JOSS article. This article (1) regularizes the terminology I used; (2) replicates my analyses using exponential random graph models; and (3) applies these models to other data sets to examine the degree to which these models predict geodesic cycle length. All of these constitute a welcome (and impressively done) contribution. Yet I also have a sense that some of the motivation of this paper is to establish the superiority of the ERGM approach, and to treat all others as, at best, fallbacks.^{1}

Here, I will argue that the social networks community is increasingly moving towards an ill-considered ritualization of ERGMs, and in such a way as to undermine the distinctiveness of network analysis/mathematical sociology, which had been the great hold-out against the “saming-of-everything” associated with the ideational sink of mainstream sociology. If we do not change course, we will import into our own field the contradictions that have, in the past generation, been recognized, but not solved, in mainstream statistical practice. I first address the contribution made by Stivala to the substantive problem at hand, then discuss the ritualism in current usage. I propose that close consideration demonstrates that the attributes of the ERGM model that Stivala suggests make it superior to other techniques are more deleterious than advantageous. I try to demonstrate that the collapse of the ERGM into the general linear modeling paradigm tends to lead us, and in this case has led Stivala, to make the same interpretive errors that bedevil current social statistics more generally. I close by making a few suggestions as to where we can go in the future.

Regarding the substantive claims, I think that Stivala here shows some of the advantages of using the same model on many networks. He confirms my arguments about the geodesic cycles being surprisingly large, and surprisingly small, in Patricia’s first two graphs (respectively). Stivala also notes that this is not quite true for the final graph, an analysis I had not done, as this graph had seemed to me to be the same as the second except for the addition of a heteroplanar component, which complicates things. (I think that analyzing the two planes together leads to problematic results, but I still should have done the analyses that Stivala does and reported these results.) Further, Stivala takes a set of both real-world and fictional networks and shows that in none of these does a straightforward, out-of-the-box parameterization of an ERGM fail to reproduce the largest geodesic cycle, or indeed the distribution of geodesic lengths. As Stivala says, this supports the argument that Patricia’s mental model was such that the geodesic was an important consideration. Two different null models (the

Thus Stivala provides much stronger evidence than had I that there is something comparatively unusual in the networks made by Patricia. Whether this supports my argument as to the fundamentally

In my

Cases in which researchers change the model or drop data to achieve fit may be less common now than when I wrote ^{2}

Of course, sometimes science tells us that our first questions were unanswerable, and redirects us to ones that we

I also do not deny that ritualism can help a field move towards increased reliability, the value of which is not to be underestimated. The opposite pole of ritualism – frenetic individualist innovation, never doing the same things twice – is just as incompatible with scientific advance as is ritualism. It may sound glorious to call for an end to fundamentalist orthodoxy, and to “let a thousand flowers bloom.” But when every piece is both a substantive claim and a methodological innovation (which did tend to characterize the earlier period of social networks research), something is wrong, even if a good time is had by all. What we should be prizing, then, are robust techniques that allow for comparable, theoretically relevant answers across a wide variety of data sets. Stivala is confident that the ERGM is just such a technique. I am not so sure, and I ^{3}

This seems like a good time to consider the purported advantages. But before simply assuming that we know what to tally up as a plus and what a minus, it is worth being clear as to what we want our models to

In the late twentieth century, ideas coming from mathematical sociology had suggested some strong models for informal social structures, especially the notion of balance, still a topic of serious investigation today (

Their next major effort was the p_{1} model, leading to a large family of interpretable models that generated probability distributions of graphs.

Shelby Haberman had noted the relation of the p_{1} model to loglinear models, and Stanley Wasserman jumped on this, trying to push a general loglinear modeling framework as our “go-to” for social network analysis (the reason

But just as the very success of Nelder and Wedderburn’s unification came at a great cost of allowing sociologists to have a single (and singularly false) vision of society in their minds (

Let me back up for a moment, to one of the earliest models for random graphs, the ^{4}

Harrison White, whose great mathematical vision, based on Lévi-Strauss’s structuralism, was really the inspiration for much of network analysis worthy of the name, once wrote a minor paper, “Parameterize!” (2000). He noted that the great work of his that energized social network analysis – his studies of kinship systems, and then his transposition of these to informal networks via the notions of structural equivalence and role algebras – had nary a parameter in it. He considered this a fault, and was excited that, in his work on markets, he was going to be able to reduce the variation into a single exponentiated parameter. He hoped to do what science does – to look for relations between invariants. What sociology does is something very different, and it is (unlike most physical sciences) based on the Great Divide between the left hand side and the right hand side, and the notion that all parameters are fundamentally of the same (intellectual) nature. I wish that he had warned his readers that not

In terms of programming convenience, there is much to be said for the capacity to reconceive any formal data analysis as an application of a linear model of some very general form. But in terms of the direction of our capacity to generate important arguments about the world, especially structural models, it can prove deleterious. I propose that we think not so much about the

Recall that when the excitement began for what became the ERGM, it did not have to do with the fitting algorithm used, nor with the notion that parameters were maximum likelihood estimates, but with the finding that a Markov graph could (assuming homogeneity of local effects) be factored in a
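For readers who want the form in symbols, the exponential-family graph distribution under discussion can be sketched as follows (standard ERGM notation; my addition, not the paper’s):

```latex
% General exponential-family form for a random graph G:
P_\theta(G = g) \;=\; \frac{1}{\kappa(\theta)}
  \exp\!\Big( \sum_{A} \theta_A \, z_A(g) \Big),
\qquad
\kappa(\theta) \;=\; \sum_{g'} \exp\!\Big( \sum_{A} \theta_A \, z_A(g') \Big),
```

where the sum runs over configuration types $A$, $z_A(g)$ counts configurations of type $A$ in the graph $g$, and $\kappa(\theta)$ normalizes over all graphs on the node set. Frank and Strauss’s factorization result is that, for a Markov graph with homogeneous local effects, the sufficient statistics $z_A$ reduce to counts of edges, $k$-stars, and triangles.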

Stivala takes for granted that it is a point in favor of the ERGM that it can take various other covariates into account. Indeed it often ^{5}^{6}

This point was made quite clearly by

Indeed, in some interesting cases, individual covariates are as likely as not to be endogenous to structure. If the actual structure of a high school is the ranked cliques model, the highest ranking cliques may control access to certain extracurriculars which, if “taken into account” in the model (say, by entering a dummy variable for SHARED_EXTRACURRICULAR), might lead us to reject the ranked cliques model. The point is not that it never makes sense to enter dyadic or individual covariates, but that we must beware of falling back into the strange sociological conviction that the best model excludes no significant predictors (only, perhaps, relying on a claimed causal order to refrain from including post-treatment confounders). This conviction is based on an untenable assumption – and this is one of the few planks on which both those dedicated to causal analysis and those opposed to it can agree – that partialed coefficients on the right side can be treated as “effects,” whose precise metaphysical nature is left unexplored. The use of the ELM reinforces this, and I think we can even see this in Stivala’s analyses.

Stivala argues that the ERGM has the advantage of being able to take nodal attributes into account, and does this in Models 2 and 3 of Table 3, the first suggesting that Christian alters have higher degree, and the second that (not surprisingly) those in the Sphere of the Blue Flame are more likely to be tied to one another than are random pairs of nodes. But if one examines Patricia’s maps, we see that this Sphere is only one of four large clumps of nodes (there are two components, the larger of which easily breaks into three pieces with one or two cuts to separate each). She has labeled this one, but what if she had labeled another (for example, the one to the left of the Sphere of the Blue Flame)? We of course would find homophily here as well. What if she had labeled this “Sphere of Ju-Ju” after the most central actor? What if she in fact had labeled every “wheel” (every structure consisting of a hub and its spokes), and we were to take this into account? As we added more and more of these seemingly nodal attributes, our structural parameters would of course change. But in a particular way – we can imagine that, at the end of the day, we would no longer have any idea as to the nature of the structure, because we had misparameterized it as nodal attributes! It is just the nightmare that would cause Harrison White to faint in horror – all structure had been turned back into seemingly individual variables. And yet it is difficult to prevent this sort of regress once one decides to envision one’s job as fitting an ELM. We are ineluctably drawn to add parameters, and the only way to keep track of what we are doing is to treat each parameter as “an effect,” thereby reifying it and projecting into our vision of the world that which is the most convenient interpretation for each that we can think of. And I think we see the way the ELMification of the ERGM pushes our interpretations in Stivala’s discussion of Patricia’s maps.

The sorts of interpretive slippages I will point to in Stivala’s arguments are, I think, characteristic of the way in which sociology has had to make use of the ELM, and how increasingly we see social network researchers interpreting the ERGM. Widespread or not, however, these issues get to the heart of the choice before us, and so I want to look carefully at how, after all the work done, the results of the ERGM are interpreted. Stivala writes, “Given an observed network, we estimate parameters for local effects, such as closure (clustering), activity (greater tendency to have ties), homophily, and so on. The sign (positive for the effect occurring more than by chance, negative for less than by chance) and significance tell us about these processes, taking dependency into account. That is, the parameter tells us about the process occurring significantly more or less than by chance, given all the other effects in the model occurring simultaneously.” This is the way we generally write about our models in sociology. We tend to take the ambiguity of the term “effects” (which can refer simply to a certain type of statistical predictor, but carries connotations of causality) as a cover for stretching a bit beyond what we really are doing.^{7}

But in this case, I do not think that Stivala is correct to say that parameters in ERGMs should be interpreted as giving us ^{8}

A possible example here is raised by Stivala, in noting that ^{9}

One might acknowledge the force of this critique but excuse such interpretive slippage (from parameter to effect, from effect to process) as a commonly cut corner in social statistics – we rarely explicitly remind the reader that our results are only interpretable if our assumed model is correct (one might argue), and so most of us bear this in mind as a mental reservation, making the omission of explicit mention innocuous. But such an omission cannot be seen as innocuous if it is made as part of an argument for the superiority of a certain model!

One might also accept my argument in principle, but demand that evidence be shown of a concrete misinterpretation. Such evidence may rarely be forthcoming, if all we have is a single model with parameters crying out for tendentious interpretation. This is where the virtues of comparing results across ^{10}

One will note here the unwarranted leap from the GWDEGREE parameter, which really describes the ceteris paribus degree

I have the feeling that many social networkers go from the admirable properties of the ^{11}

Even if the model is correct, its coefficients are not necessarily indicative of any particular

What do we want from ERGMs? One common answer among non-network-researchers is that we want them to adjust our models for dyadic data to deal with the statistical non-independence of observations. For example, someone is interested in high school friendship formation, and has a model including observed covariates, but a conventional logistic regression on these covariates, even if the model is correct, will not reach the maximum likelihood estimates of the parameters because of the violation of conventional sampling axioms. Since, however, we invariably do ^{12}

One possibility would be to return to the notion that we are attempting to generate known distributions of graphs – the parameters are merely a flexible way of doing what was being done in, say, the U|MAN analysis via combinatorics. (This is the interpretation of
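To make the U|MAN idea concrete, here is a minimal sketch (my illustration, not code from this paper) of sampling a directed graph uniformly from all graphs with a fixed dyad census of Mutual, Asymmetric, and Null dyads; the function name `sample_uman` is my own.

```python
import random

def sample_uman(n, n_mutual, n_asym, n_null, rng=random):
    """Draw one directed graph uniformly at random from all graphs on
    n nodes whose dyad census is exactly (n_mutual, n_asym, n_null) --
    i.e., a single sample from the U|MAN distribution."""
    dyads = [(i, j) for i in range(n) for j in range(i + 1, n)]
    if n_mutual + n_asym + n_null != len(dyads):
        raise ValueError("census must account for all n*(n-1)/2 dyads")
    # Assign each unordered pair a dyad type by shuffling the census labels.
    labels = ["M"] * n_mutual + ["A"] * n_asym + ["N"] * n_null
    rng.shuffle(labels)
    arcs = set()
    for (i, j), lab in zip(dyads, labels):
        if lab == "M":                                   # mutual: both arcs
            arcs.update([(i, j), (j, i)])
        elif lab == "A":                                 # asymmetric: one arc,
            arcs.add((i, j) if rng.random() < 0.5 else (j, i))  # random direction
        # "N" (null): no arcs for this pair
    return arcs
```

The combinatoric point is that the parameters of interest (here the dyad counts) directly define the family of graphs, rather than emerging as coefficients to be interpreted after the fact.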

We see some residue of this way of thinking about ERGMs in the concern with fit. Models with a low R^{2} were once considered to be “bad models.” We no longer think this way – the degree of residual variance can be high in a model with important (and not merely statistically significant) predictors (and, as Mike Hout used to say, “Who wants to live in an R^{2}=0.9 world, anyway?”). But in ERGM modeling, “fit” is used (quite properly) in a different way – if we cannot reproduce the graph statistics, then we are not actually generating the family of graphs that we might be claiming is the class which includes the observed.

It is, therefore, quite understandable that fit becomes of great concern to those using ERGMs. Still, fit is only a means to an end of making meaningful statements about the world – it is not an end in itself. We know that bad models (e.g., those that condition on post-treatment confounders, for those doing causal modeling) can have better fits than good models. Yet I see with the current use of ERGMs a tendency to slip into a working consensus that one’s job is to fit data, and that the better the fit, the better the job that has been done – even if this does not advance our understanding. I think there are two problems with this. One is that it can lead researchers to prefer ad hoc elaborated models that fit any particular case (or that

For this reason, I think that the out-of-the-box models that Stivala uses are the

The second problem with the emphasis on fit is that it actually flies in the face of what I at any rate hold to be the most successful procedure for using statistics to build social scientific knowledge, namely, falsification. The key evidence supporting my argument about the root social networks schema being spatial was not the capacity of a spatial network model to fit the data – this capacity should be obvious upon inspection of the raw data. Rather, it was the

Let us see whether fit is a good guide for determining when a model is helping us by considering the difference between the ERGM results for Patricia’s 1992 and 1993 networks. Stivala compares the results of the ERGM and the

The implication seems to be that we should be comparatively happier with the ERGM (compared to the

The model fits this statistic – but what does that tell us about the

Our knowledge, then, comes less from fit than from

Let me give as an example of how we best learn from ERGMs using a paper by

Had Gondal and McLean

I am grateful to Stivala not only for taking seriously (as I myself do) the analysis of these somewhat strange data, but also, by placing this analysis in comparative perspective, for strengthening the conclusion. And I am grateful to Stivala for connecting this idea to proper mathematical vocabulary. But I am also grateful to Stivala for giving us the opportunity to reflect on current practice, and on where we are going. It would be a shame if we were to commit ourselves to a monoculturalism that aligns us with the thoughtways of the ELM, precisely what structural sociology was trying to escape. But I do not mean to argue that the problem with the current use of the ERGM-as-ELM is that it is supporting a rather fundamentalist orientation among some adherents (and I certainly do not think that this characterizes all users, let alone the pivotal developers of the method). Rather, I think what we need to do is to re-awaken our interest in the ERGM-not-as-ELM.

Whether or not any particular model cast as an exponential function and fit using an MCMC method is an advance, we should recognize that the core approach that lies at the heart of the ERGM is a beautiful and generative idea. Indeed, it turns out that the same fundamental vision underlies the random graph model and the Gibbs sampling used to identify it, and is deeply connected to the pseudolikelihood as well. This is the Boltzmann equation, and the notion that there are consistent analogies that can be made between graph configurations and physical systems with variable energy levels. It is this sort of interest in pursuing elegant and rigorous mathematical derivations that separated the field of mathematical sociology from statistics-as-generally-understood (which concerned itself largely with issues of inference).^{13}
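The analogy can be written out explicitly (standard statistical-mechanics notation; my gloss, not the author’s):

```latex
% Boltzmann distribution over states s with energy E(s):
P(s) \;=\; \frac{e^{-E(s)/kT}}{Z}, \qquad Z \;=\; \sum_{s} e^{-E(s)/kT}.
% Reading a random graph model the same way, with "energy"
% H(g) = -\sum_A \theta_A z_A(g) for graph statistics z_A:
P_\theta(g) \;=\; \frac{e^{-H(g)}}{\kappa(\theta)}, \qquad
\kappa(\theta) \;=\; \sum_{g'} e^{-H(g')}.
```

Here the configuration parameters $\theta_A$ play the role of interaction potentials, and the normalizing constant $\kappa(\theta)$ is the partition function; Gibbs sampling exploits exactly this correspondence by updating one tie at a time conditional on the rest.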

We should be more enthusiastic about the equivalent of an Ideal Gas Law that can set the direction for plausibly cumulative social science than Actual Gas Fits that only predict the past using parameters that we all agree to pretend are meaningful. There is nothing outlandish in proposing that we pursue such

^{∗}) Models for Social Networks.”