Introduction:

In the world of science, researchers are rewarded for the quality of their publications. But sometimes they are also rewarded for the relative ordering of author names, which we may call the first-author model. These two credit-assignment models encourage different kinds of citation behaviour, so they probably lead to different citation networks.

Mathematicians and physicists adhere to the alphabetical ordering of author names, which gives scientists an incentive to find brilliant collaborators. On the other hand, when relative author ordering matters, as is the case in biology, we might expect scientists to prioritise both finding brilliant collaborators and securing first authorship, probably not in equal measure.

To a first-order approximation, we may understand the difference between these two credit-assignment systems by comparing the number of alphabetical orderings with the number of first-author orderings as a function of $N$, the number of co-authors.

Alphabetical order:

Traditionally, in mathematics and physics, a group of researchers who co-author a paper list their names in alphabetical order by default, so we have:

\begin{equation} \forall N \in \mathbb{N}, A(N)=1 \end{equation}

where $A(N)$ stands for the number of alphabetical orderings as a function of $N$. Although some information may be lost by adhering to alphabetical ordering, one of its advantages is that it reduces the risk of internal friction within the group of authors.

An upper bound on author orderings:

In a world as complex as ours, each author might have their own metric. If we treat each author as a node in a fully-connected graph and assume that each of its ${N \choose 2}$ edges carries one of three possible labels (ahead, behind, or tied), then $3^{N \choose 2}$ orderings are possible.

However, most of these orderings aren’t linear orders. To obtain a linear order, all authors must agree to use a single metric. How does this consensus emerge? Politics? Meritocracy? I have no idea. In any case, if $F(N)$ is the number of first-author orderings, it’s reasonable to believe that:

\begin{equation} \forall N \in \mathbb{N}, F(N) \ll 3^{N \choose 2} \end{equation}

where $3^{N \choose 2}$ counts the maximally diverse case, in which each pair of authors may be compared independently.
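To see how few of the $3^{N \choose 2}$ edge labellings describe a consistent ranking, here is a minimal brute-force sketch. It assumes that the three labels mean "ahead of", "behind" and "tied with", and it calls a labelling consistent when a single score per author reproduces every label (a ranking with ties). The helper names are hypothetical and chosen only for illustration.

```python
from itertools import combinations, product


def edge_labellings(n):
    """Yield every assignment of '>', '<' or '=' to the edges of the complete graph on n authors."""
    edges = list(combinations(range(n), 2))
    for labels in product("><=", repeat=len(edges)):
        yield dict(zip(edges, labels))


def is_consistent(labelling, n):
    """True if a single score per author reproduces every edge label.

    For an edge (i, j): '>' means i is ahead of j, '<' means j is ahead of i,
    and '=' means they are tied. Scores in 0..n-1 suffice, because a ranking
    with ties on n authors has at most n distinct levels.
    """
    for scores in product(range(n), repeat=n):
        if all(
            (lab == ">" and scores[i] > scores[j])
            or (lab == "<" and scores[i] < scores[j])
            or (lab == "=" and scores[i] == scores[j])
            for (i, j), lab in labelling.items()
        ):
            return True
    return False


def count_consistent(n):
    """Number of edge labellings that describe a genuine ranking (ties allowed)."""
    return sum(is_consistent(lab, n) for lab in edge_labellings(n))


for n in range(1, 5):
    total = 3 ** (n * (n - 1) // 2)  # 3^{N choose 2}
    print(n, total, count_consistent(n))
```

For $N = 3$ only 13 of the 27 labellings are consistent, and for $N = 4$ only 75 of 729; the rest contain cycles such as "1 ahead of 2, 2 ahead of 3, 3 ahead of 1".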

First-author orderings:

If no ties between authors are possible, then each ordering corresponds to a Hamiltonian path in the fully-connected graph with $N$ nodes, and there are $N!$ such paths. Since allowing ties can only add orderings, we have:

\begin{equation} F(N) \geq N! \end{equation}

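If we do allow ties, then each of the $N-1$ edges in a Hamiltonian path has two options, $>$ or $=$. Counting every path together with its choice of edge labels, in general we have the following (strictly speaking, this overcounts rankings that contain ties, because several paths then describe the same ranking):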

\begin{equation} F(N) = 2^{N-1} \cdot N! \end{equation}
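A quick enumeration confirms the $2^{N-1} \cdot N!$ count and shows how many distinct rankings with ties these path-label pairs actually describe. This is an illustrative sketch: the helper names are hypothetical, and a ranking is represented, by assumption, as a tuple of tied groups, best group first.

```python
from itertools import permutations, product
from math import factorial


def labelled_paths(n):
    """Yield (path, labels) pairs for n >= 1 authors.

    `path` is a Hamiltonian path of the complete graph (a permutation of the
    authors, first author first); `labels` gives, for each of its n - 1 edges,
    '>' (the earlier author is strictly ahead) or '=' (the two are tied).
    """
    for path in permutations(range(n)):
        for labels in product(">=", repeat=n - 1):
            yield path, labels


def as_ranking(path, labels):
    """Collapse a (path, labels) pair into the ranking it describes:
    a tuple of tied groups, best group first."""
    groups = [{path[0]}]
    for author, label in zip(path[1:], labels):
        if label == "=":
            groups[-1].add(author)  # tied with the previous author
        else:
            groups.append({author})  # strictly behind the previous author
    return tuple(frozenset(g) for g in groups)


for n in range(1, 6):
    pairs = list(labelled_paths(n))
    distinct = {as_ranking(path, labels) for path, labels in pairs}
    print(n, len(pairs), 2 ** (n - 1) * factorial(n), len(distinct))
```

For $N = 1, \ldots, 5$ the number of pairs matches $2^{N-1} \cdot N!$ exactly (1, 4, 24, 192, 1920), while the distinct rankings number 1, 3, 13, 75 and 541 (the ordered Bell numbers). The pair count is therefore an upper bound on distinct rankings.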

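In a group of $N$ co-authors, an author who comes first on their own (heading the path and not tied with the next author) leaves the remaining $N-1$ authors to be ordered in $F(N-1)$ ways, so the fraction of orderings in which a particular author comes first is: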

\begin{equation} \frac{F(N-1)}{F(N)} = \frac{1}{2N} \end{equation}

so there’s a risk that selfish behaviour increases with the number of co-authors: as this fraction shrinks, you may need to do more work to convince the other co-authors that you contributed more than they did.
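The $\frac{1}{2N}$ fraction can be checked by brute force for small $N$. This sketch counts path-label pairs, as in the formula for $F(N)$, and reads "comes first" as heading the path with a strict $>$ to the next author; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product


def strictly_first_fraction(n, author=0):
    """Fraction of (path, labels) pairs in which `author` heads the path and is
    strictly ahead of (not tied with) the next author. Requires n >= 2."""
    if n < 2:
        raise ValueError("need at least two co-authors")
    hits = total = 0
    for path in permutations(range(n)):
        for labels in product(">=", repeat=n - 1):
            total += 1
            if path[0] == author and labels[0] == ">":
                hits += 1
    return Fraction(hits, total)


for n in range(2, 7):
    # The two columns should agree: the enumerated fraction and 1/(2N).
    print(n, strictly_first_fraction(n), Fraction(1, 2 * n))
```

By symmetry the result does not depend on which author is chosen, so `author=0` is only a convenience.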

Furthermore, we may intuit that $F(N) \ll 3^{N \choose 2}$, but we can make this comparison precise using:

\begin{equation} \forall e \leq A \leq B, \frac{A}{B} \leq \frac{\ln A}{\ln B} \end{equation}
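For completeness, here is why this inequality holds: $x / \ln x$ is non-decreasing for $x \geq e$, and since $\ln A \geq 1$ and $\ln B \geq 1$ are both positive, we may cross-multiply.

\begin{equation} \frac{d}{dx}\left(\frac{x}{\ln x}\right) = \frac{\ln x - 1}{(\ln x)^2} \geq 0 \quad \text{for } x \geq e \end{equation}

\begin{equation} e \leq A \leq B \implies \frac{A}{\ln A} \leq \frac{B}{\ln B} \implies \frac{A}{B} \leq \frac{\ln A}{\ln B} \end{equation}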

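Using the above inequality, together with Stirling's approximation $N! \approx (\frac{N}{e})^N$ (up to a factor of order $\sqrt{N}$) and ${N \choose 2} \approx \frac{N^2}{2}$, we find that: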

\begin{equation} \frac{2^{N-1} \cdot N!}{3^{N \choose 2}} \sim \frac{2^{N-1} (\frac{N}{e})^{N}}{3^{\frac{N^2}{2}}} \leq \frac{2N \ln N}{\frac{N^2}{2} \ln 3} < \frac{4 \ln N}{N} \end{equation}

so the extent to which first-author orderings can capture a diversity of views vanishes faster than $\frac{4 \ln N}{N}$. From this analysis I infer that the first-author model is better suited to small groups of authors.
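A numerical check of this bound is sketched below. It works in log space because the ratio underflows floating point almost immediately, and it uses the exact counts $2^{N-1} \cdot N!$ and $3^{N \choose 2}$ rather than Stirling's approximation; the function names are hypothetical.

```python
from math import comb, factorial, log


def log_ratio(n):
    """ln(2^(n-1) * n! / 3^C(n,2)), computed in log space to avoid underflow."""
    return (n - 1) * log(2) + log(factorial(n)) - comb(n, 2) * log(3)


def log_bound(n):
    """ln(4 ln(n) / n), the logarithm of the bound quoted in the text (n >= 2)."""
    return log(4 * log(n) / n)


for n in (2, 3, 5, 10, 20, 50):
    print(n, round(log_ratio(n), 2), round(log_bound(n), 2))
```

The logarithm of the exact ratio falls roughly quadratically in $N$ (it is dominated by $-{N \choose 2} \ln 3$), whereas the logarithm of the bound falls only like $\ln N$. The bound is therefore very loose, but it is an upper bound, which is all the conclusion relies on.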

Discussion:

At this point I must acknowledge that this is only the beginning of a mathematical analysis, which still needs to be refined. How can we model the outcome of sequential self-centred behaviour under both paradigms?

What kind of citation dynamics do these different credit-assignment models encourage? What if the same authors regularly co-author papers? These are questions to be addressed in a future article.