SECTION I

## INTRODUCTION

By its very definition, bibliometrics is a set of methods to quantitatively analyze scientific and technical literature. As such, when used in research assessment, the purpose of bibliometric indicators is to quantify the performance and/or impact of a large set of scholarship products (referred to, in the following, also as “research products” or simply as “papers”). The ideal process to perform such an evaluation would simply be to rely on the very same analysis one uses for assessing the quality of a single paper: asking independent peer experts to read all products and score their impact. Regrettably, such direct assessment cannot scale to the current (and increasing) rate of production and volume of research products.

Since product quality cannot be directly evaluated, one therefore needs a suitable measurement proxy, the most widely accepted being citation analysis, which is based upon the assumption that a citation (from another scholarship product) is a positive reflection on the scientific quality of a paper. Such a choice is not immune from criticism [1], e.g., citation count is unable to distinguish between positive citations to seminal work and negative citations to poor work or incorrect results; furthermore it is also unable to discern between rhetorical citations, added as simple pointers to a specific area, and those with a true meaning of scientific remuneration. Notwithstanding these flaws, citation count undoubtedly still remains the most accepted approach by the scientific community.

It is however worth stressing that alternative indicators have recently been proposed, such as the number of on-line views or downloads [2] and the number of hits in scientific on-line bookmarking tools such as Zotero and Mendeley [3]. They have attracted a burst of interest, and first studies have recently appeared offering initial validation of their capability to measure scientific impact [4]; although very promising, they are not yet mature enough to be considered valid alternatives to more classical, citation-based, journal indicators.

The crude number of citations can be directly employed as a proxy for the quality of a single paper, but such counts must be aggregated and averaged to obtain indicators of the scientific impact of a journal. Indeed, such indicators are used widely by librarians for ranking, evaluating, categorizing, and comparing journals.1 Each metric has its own particular features, but in general, they have all been designed to provide rankings of and insight into journal quality based on citation analysis. Ideally, any indicator should meet at least these three fundamental criteria:

• It should reflect the true scientific quality of a journal. Highly respected publications, such as Science or Nature will have a much higher indicator than a low-reputation “Journal of Obscurity”;
• It should be consistent over time. Because publication reputations build over many years, an accurate index should not exhibit large fluctuations over a limited time period;
• It should be very robust (and possibly immune) to external manipulation. Explicit actions creating an index increase not corresponding to an increase in the actual quality of the journal should be difficult (if not impossible) to implement.

In the last half century, many measures have been proposed, each aiming to fulfill the three requirements given above. While a complete taxonomy is beyond the scope of this manuscript, we provide descriptions and analyses of ten such measures which have attracted greater attention, focusing in particular on the six which are currently computed in the Thomson Journal Citation Reports (JCR) [5] and in Scopus [6]. The definitions and properties of these indexes are reported in Appendix.

The manuscript makes a qualitative comparison of different bibliometric indicators and is organized as follows. In Section II we describe features and flaws of the most widespread (and misused) bibliometric indicator, namely the Impact Factor (IF) [7], [8]. Section III reports the main features of the most widely employed alternatives to the IF, highlighting their advantages and differences with respect to it, as well as introducing the important distinction between popularity and prestige measures. Section IV explores what information on journal quality one may hope to extract from the indicators and highlights that the quality of a journal is sufficiently complicated that it cannot be measured by any single indicator. Conversely, multiple indicators offer a more balanced perspective on a journal's impact, and we will show that they may even be employed as a warning of the possible existence of manipulative practices. Finally, Section V presents conclusions, and Table I reports, for the reader's quick reference, a list of all acronyms used throughout the manuscript.

TABLE I LIST OF ACRONYMS AND PRINCIPAL SYMBOLS USED IN THIS PAPER WITH THE CORRESPONDING SIGNIFICANCE

SECTION II

## THE IMPACT FACTOR AND ITS FLAWS

The IF [7], [8] (cf. Appendix part A), introduced by Garfield in 1972 as a measure of journal impact and computed by normalizing the expected relationship between journal size and number of received citations, is certainly the most widespread and well known among the bibliometric indexes, but is also the simplest and most crude. Commonly mentioned flaws2 of the IF are:

i) the absence of normalization for different citation practices and traditions in the various scientific areas [11], or of the distinction of the “quality” of the journal which is the source of each citation [13];
ii) the bias caused by the choice of the 2-year time window for collecting article citations; as Garfield himself in fact pointed out [14], by changing the “2-year based period to calculate impact, some kinds of journals are found to have higher impact;”
iii) the lack of transparency in the data used for its computation;
iv) the relative vulnerability of the IF to active manipulation practices [15], since:
   a) IF is generated from citations to all content, including non peer-reviewed content like editorials. Such content does not count in the denominator of the IF (being non-citable) but can inflate the numerator;3
   b) IF counts self-citations (citations from a given journal to itself), which creates a very favorable condition for the editors of journals (and possibly others involved in their management) to engineer its value by artfully placed self-citations [16], [17].

The first two criticisms are both self-explanatory and fairly straightforward to address in a computational manner (cf. Section III). As to point iii), Thomson uses proprietary data for IF computation, which are not audited and have exhibited many inconsistencies, as shown by [25], [26], [27], [28], where attempts to replicate or to predict the reported values have generally failed. Problem iv-a) of citations to non peer-reviewed content can also be addressed computationally, as explained again in Section III, but criticism iv-b) is more subtle since it may also extend to the area of ethical misconduct, where both numerical evidence and survey-based studies provide indications of the existence of the problem, as is explained in detail in Sections II-A and II-B.
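As a minimal illustration of problem iv-a), the Python sketch below computes a two-year IF in which citations to all content enter the numerator while only citable items enter the denominator. The function name `impact_factor` and all the numbers are hypothetical and serve only to show the inflation effect; they are not taken from the JCR.

```python
def impact_factor(cites_to_citable, cites_to_noncitable, n_citable):
    """Two-year IF for year Y (illustrative sketch).

    Numerator: citations received in year Y by everything the journal
    published in years Y-1 and Y-2, including non-citable items such as
    editorials. Denominator: number of citable items from Y-1 and Y-2.
    """
    if n_citable <= 0:
        raise ValueError("n_citable must be positive")
    return (cites_to_citable + cites_to_noncitable) / n_citable


# Hypothetical journal: 200 citable items receiving 300 citations.
baseline = impact_factor(300, 0, 200)

# Same journal, with 60 further citations pointing at editorials:
# the numerator grows while the denominator is unchanged.
inflated = impact_factor(300, 60, 200)

print(baseline, inflated)
```

In this made-up case the extra citations to editorials raise the IF from 1.5 to 1.8 without any change in the citable content.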

### A. Some Trends in Journal Self-Citations

Appropriate use of self-citations is a complex issue, since they cannot and should not be considered as a negative practice per se. On the contrary, a large number of self-citations can be reasonably expected when high-impact journals are the submission target, especially when there is a single top-level publication which is the reference target of a small community. Furthermore, authors can be naturally expected to cite their own previous work in the same area, and it is also reasonable to expect that authors publishing in high impact journals are more experienced and successful, and possess a long track record of publications in the same venue in which they are publishing. Often, these authors are also part of large groups which leads to a multiplicative effect for this phenomenon.

Yet, inconsistent and/or inexplicable increases in self-citations for several journals across different scientific areas suggest that deliberate IF manipulation (such as the practices discussed in Section II-B) may indeed exist and be increasingly widespread. Fig. 1(a) reports the percentage of self-cites ($\%SC^{{\rm IF}}_{i}$ for journal $i$) used for the computation of the IF from 2000 to 2011 for three different journals: the International Journal of Hydrogen Energy (IJHE), Laser and Particle Beams (LPB) and Cortex (CO). Each of them belongs to one or more Subject Categories (SC) in very different scientific areas, namely Energy & Fuels, Environmental Sciences, Physics, Atomic, Molecular & Chemical and Chemistry, Physical for IJHE; Physics, Applied for LPB; and Behavioral Science and Neuroscience for CO. An increasing trend is clearly present, sometimes with dramatic jumps. Of course, there is no evidence in these cases of coercive practices aiming at acquiring self-citations, so that their presence could be due to fully legitimate reasons such as those mentioned above. Also, it is difficult to determine how many journals exhibit similar trends in the scholarly publishing arena and therefore to appreciate the extent of this phenomenon. Yet, both the steep slope and the very high peak (78% in 2008) reached by LPB caused Thomson to temporarily suspend the journal in 2009 due to the excessive number of self-cites [51], [52]. The journal was reinstated in the 2010 JCR once the level of self-cites dropped to more “normal” values according to Thomson.4

Fig. 1. (a) Percentages of self-citations used in the IF calculation $\%SC^{{\rm IF}}_{i}$ and (b) number of self-citations used in the IF calculation per published paper $R^{{\rm IF}}_{i}$ in the period 2000–2011 for IJHE (SC: Energy & Fuels; Environmental Sciences, Physics, Atomic, Molecular & Chemical; Chemistry, Physical), Laser and Particle Beams (LPB) (SC: Physics, Applied) and Cortex (CO) (SC: Behavioral Science; Neuroscience). LPB evaluation is absent in 2009 since Thomson decided to suspend the journal from the JCR due to the excessive influence of self-cites.

The conclusions of the above paragraph are further strengthened by Fig. 1(b), which shows for the same three journals the trend of the self-citation rate $R^{{\rm IF}}_{i}$ per published paper for journal $i$ used in the IF calculation, namely, by using the notation of the Appendix and for a given year $Y_{n}$,

$$R^{{\rm IF}}_{i}={{c_{ii}^{\Delta_{2}\Delta_{1}}}\over{p_{i}^{\Delta_{2}}}}$$

where $c_{ii}^{\Delta_{2}\Delta_{1}}$ is the number of self-cites of journal $i$ to articles published in the year set $\Delta_{1}=\{Y_{n-2},Y_{n-1}\}$ from papers published in the year set $\Delta_{2}=\{Y_{n}\}$, and $p_{i}^{\Delta_{2}}$ is the number of papers published in the year set $\Delta_{2}$. In other words, $R^{{\rm IF}}_{i}$ represents the average number of self-cites included in the reference list of each paper published by journal $i$ in year $Y_{n}$ which enter the IF computation; with respect to $\%SC^{{\rm IF}}_{i}$, the coefficient $R^{{\rm IF}}_{i}$ has the advantage of representing the influence on the IF of the behavior of each journal, normalized to its size. Interestingly enough, $R^{{\rm IF}}_{i}$ shows a trend which is (at least until 2008) almost perfectly monotonically increasing, and the peak value of $R^{{\rm IF}}_{i}$ for CO in 2010 is larger than that of LPB in 2008, which could also be seen as a potential indication of a similar problematic self-citation pattern.
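The difference between the two coefficients can be made concrete with a short Python sketch. It assumes that $\%SC^{{\rm IF}}_{i}$ is the share of self-cites among all citations entering the IF numerator, and it uses invented counts for two hypothetical journals; the function names are illustrative only.

```python
def self_citation_share(self_cites, total_cites):
    """%SC^IF: percentage of the citations entering the IF numerator
    that come from the journal itself (assumed reading of the text)."""
    return 100.0 * self_cites / total_cites


def self_citation_rate(self_cites, papers_in_year):
    """R^IF: self-cites to the two previous years per paper published
    in the current year, i.e. c_ii / p_i in the notation above."""
    return self_cites / papers_in_year


# Two hypothetical journals with identical self-citing behavior
# (390 self-cites from 130 papers) but different external attention.
share_a = self_citation_share(390, 500)    # few external citations
share_b = self_citation_share(390, 1950)   # many external citations
rate_a = self_citation_rate(390, 130)
rate_b = self_citation_rate(390, 130)

print(share_a, share_b, rate_a, rate_b)
```

Both journals have the same rate (3 self-cites per paper), while the percentage drops from 78% to 20% purely because the second journal receives more external citations. This is why $R^{{\rm IF}}_{i}$ isolates the journal's own behavior better than $\%SC^{{\rm IF}}_{i}$.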

### B. Ethical Issues

As far as ethical misconduct is concerned, the use of self-citations to manipulate IFs has been reported in the last decade in the areas of psychology [16] and medicine [18], and has more recently been shown to be spreading in the area of mathematics. In the latter area, [19] showed how self-citation manipulation (along with other unethical behavior such as editorial board members using their own conference papers to boost the citation counts) allowed the International Journal of Nonlinear Sciences and Numerical Simulation (IJNSNS) to dominate the IF ranking in the SC Mathematics, Applied, where it took first place from 2006 to 2009 (generally by a wide margin), and second place in 2005.

TABLE II SELF-CITATIONS LEVEL IN TERMS OF PERCENTAGE OF SELF-CITES $\%SC^{{\rm IF}}_{i}$ AND SELF-CITATION RATE (I.E., PER PAPER) $R^{{\rm IF}}_{i}$ FOR THE JOURNALS WHICH WERE BOTH AMONG THE TEN MOST COERCING JOURNALS IN [20] AND APPEARED IN THE THOMSON JCR. SUP INDICATES THAT THE TITLE WAS SUPPRESSED IN THE JCR BY THOMSON FOR THAT YEAR

It is worth investigating whether specific cases of ethical misconduct, such as that reported for the IJNSNS and those of the journals found to be coercers in [20], exhibit self-citation patterns similar to those reported in Section II-A.

The supporting material of [20] also includes the list of the coercing journals, and names 93 publications which were reported as coercers once, 31 twice, 18 three times, 7 four times, 6 five times, 8 six times, 2 seven times, 1 each eight and nine times, and 8 ten or more times. Table II reports the values from 2008–2011 of the coefficients $\%SC^{{\rm IF}}_{i}$ and $R^{{\rm IF}}_{i}$ for the 6 titles which were both among the top ten coercers in [20] and which are also included in the JCR. Despite the difficulty in drawing strong conclusions from such a reduced set of data, Table II indicates that, not surprisingly, the number of coercions appears to correlate better with the self-citation rate $R^{{\rm IF}}_{i}$ than with the percentage of self-cites $\%SC^{{\rm IF}}_{i}$ (the latter of which, as highlighted in Section II-A, also depends on the number of citations from external sources, i.e., not only on the behavior of the journal itself w.r.t. self-cites). Also, a large number of coercions does not seem to correspond to any particularly unusual trend in either $\%SC^{{\rm IF}}_{i}$ or $R^{{\rm IF}}_{i}$, except in the case of the Journal of Retailing and, even more so, the Journal of Consumer Psychology, whose self-citation pattern was considered so critical by Thomson as to cause its exclusion from the 2011 JCR. These findings support the opinion, reported in [21], of Phil Davis, a scholarly-publishing consultant and regular columnist of the Scholarly Kitchen [22], who “worries that survey biases may have affected the numbers” even if “ultimately you can say that the behaviour exists and that it's a problem at a few journals”.

The data pertaining to IJNSNS are reported in Table III. Even if $\%SC^{{\rm IF}}_{i}$ can be generally considered to be relatively large, in 2005–2009 when most of the issues highlighted in [19] took place, it has a concave trend with minimum in 2008. This is not surprising since the potentially unethical practice involved citations coming from other sources, mainly related conferences, which cannot be detected by performing this analysis. Interestingly, the trend in $R^{{\rm IF}}_{i}$ reveals an increase after 2004, even if the ultimate values are not large enough to reach any strong conclusion in the absence of additional evidence such as that reported in [19].

TABLE III SELF-CITATIONS LEVEL IN TERMS OF COEFFICIENTS $\%SC^{{\rm IF}}_{i}$ AND $R^{{\rm IF}}_{i}$ FOR THE IJNSNS SINCE ITS INSERTION IN THE THOMSON JCR. THE DATA IN 2011 ARE 0% AND N/A AS NO DATA IS REPORTED IN THE JCR FOR THE NUMBER OF PAPERS PUBLISHED IN 2011 (SO THAT ALSO THE NUMBER OF SELF-CITES IS 0); IJNSNS WAS REGULARLY PUBLISHED IN 2011 AND THE JOURNAL IS NOT BANNED FROM THE JCR SINCE IT CAN BE ACCESSED ON LINE AND IS NOT PRESENT IN THE LIST [51].

SECTION III

## ALTERNATIVES TO THE IF: POPULARITY VS PRESTIGE MEASURES, AND THEIR COMPUTATIONAL FEATURES

To both offer a solution to the IF's flaws mentioned in Section II and to complement the information attainable from the IF itself, a plethora of alternative bibliometric indicators have been proposed over the last 30 years. Table IV lists the IF along with 8 of these indexes, their acronyms, and the data bases from which citations are acquired to compute them. A summary of the indexes' properties used in computation and features is given in Table IV, while the definitions and complete description of their peculiarities are reported in Appendix.

TABLE IV FEATURES OF SEVERAL BIBLIOMETRIC INDEXES

In this section, we will discuss their main properties and how they address the three requirements of an index mentioned in the Introduction, as well as how they cope with the IF's flaws. We will also draw attention to the aspects which will allow us to highlight the pros and cons of each of them in Section IV. In order to provide quick indexing for the reader, the primary properties are highlighted in bold and underlined fonts at the beginning of each paragraph; furthermore, the reading of this section assumes at least familiarity with the introductory definitions of the first paragraph of the Appendix.

Time window. A reduction of bias caused by the IF's limited two-year time window is achieved by increasing the citation window to either three years for both the Source Normalized Impact per Paper (SNIP) [29] and the Scimago Journal Rank (SJR) [30], or to five years for the Five Year Impact Factor (5YIF), the Journal to Field Impact Score (JFIS) [32], the Audience Factor (AF) [31], the Influence Weight (IW) [11], the Eigenfactor (EF) and the Article Influence (AI) [33], [34], [35]. Clearly, this addresses point ii) in Section II-A.

Quantifying popularity or prestige. A second and very important difference [24] is between indexes measuring popularity and those measuring prestige. These two concepts are based on the distinctions between the two factors which contribute to determine the status of an element in a social network, namely the number of endorsements and the prestige of the sources of the endorsements. There is a clear difference for publications too: for example, journals publishing only review articles may be very popular and cited often by any kind of source including low-impact publications, while highly specialized journals receive citations coming from a smaller but often highly qualified audience working at the cutting-edge of technology. IF, 5YIF, JFIS, and SNIP all consider the crude number of citations received by each paper as a measure of their value and are therefore popularity measures. Prestige measures include AF, IW, EF, AI and SJR, all of which weight citations based on their source. As such, they explicitly consider the “quality” of the journal which is the source of each citation and thereby address a flaw in the IF (second part of point i) in Section II-A). IW, EF, AI and SJR achieve this result by using recursive citation weighting. For the latter three indicators, the set of scholarly papers is represented as a network, where each node corresponds to a journal and each link indicates citations from one journal to another [33], [34], [35]. Furthermore, the connections between nodes are oriented and weighted: large numbers of citations correspond to strong weights and the orientation of the connection indicates the direction of the citations. The use of this network representation allows the exploitation of well known and effective algorithmic tools to extract information from the structure of the data. 
Important examples of these tools are the eigenvector centrality [36], [37], used to quantify the popularity or the status of an individual in a social or communication network; and the Page Rank (PR) [38], used by Google to rank the importance of websites by considering the hyperlink structure of the world wide web. Eigenvector centrality is indeed the concept grounding the EF, AI and SJR metrics as well [30], [31], [32], [33], [35] and, as such, the importance of a journal depends on where it is located in this structured network of citation links, that is, a journal is central in the mesh of citations if it is largely referenced, especially from other well connected journals. The IW is also a prestige measure and, as reported in Appendix (see Subsections E and F, particularly at the end of the latter), can be interpreted as a special case of the AI; historically it was the first one proposed [11] and was one of the elements which led Brin and Page to develop the idea of the PR algorithm [38].
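The idea of recursive citation weighting on a journal network can be sketched with a plain power iteration in Python (using NumPy). This is a simplified illustration of eigenvector centrality and not the actual EF, AI or SJR algorithm, whose definitions are given in the Appendix: the function name `prestige_scores`, the three-journal matrix and the choice to drop self-citations are assumptions made for the example.

```python
import numpy as np


def prestige_scores(citations, n_iter=500, tol=1e-12):
    """Eigenvector-centrality style scores for a journal citation network.

    citations[i][j] is the number of citations from journal j to journal i.
    Each column is normalized by the total outgoing citations of journal j,
    so a citation from a journal with a short reference list weighs more.
    """
    C = np.array(citations, dtype=float)
    np.fill_diagonal(C, 0.0)                 # ignore journal self-citations
    col_sums = C.sum(axis=0)
    if np.any(col_sums == 0):
        raise ValueError("every journal must cite at least one other journal")
    H = C / col_sums                         # column-stochastic matrix
    v = np.full(C.shape[0], 1.0 / C.shape[0])
    for _ in range(n_iter):
        v_new = H @ v                        # prestige flows along citations
        v_new /= v_new.sum()
        converged = np.abs(v_new - v).sum() < tol
        v = v_new
        if converged:
            break
    return v


# Hypothetical 3-journal network (rows: cited journal, columns: citing journal).
example = [[0, 30, 10],
           [20, 0, 10],
           [5, 5, 0]]
scores = prestige_scores(example)
print(scores)
```

In this toy network the first journal obtains the largest score because it is cited heavily by the second journal, which is itself well cited; the score of each journal therefore depends on who cites it, not only on how often.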

Interestingly, although its definition is not based on a recurrence equation (see Subsection C), the AF can also be interpreted as a special case of the AI (see also the end of Subsection F).

Desirable insensitivities. As shown in [23], both AF and IW possess two desirable properties, namely insensitivity to insignificant journals (i.e., the index is almost invariant if non-significant journals are not included in the data base) and insensitivity to field differences (i.e., the indicator is on average the same for two different areas that cite other fields only minimally). As reported in [23], these properties are to some extent also “inherited” by the AI.

Kind of measured performance. IW is a measure of a journal's performance per reference and not per article, as is the case for the AI and the SJR, as well as the IF, 5YIF, JFIS and SNIP, while the EF measures the total performance of the journal and therefore, ceteris paribus, journals publishing more papers tend to have a larger EF.5

Normalization. To account for different citation practices among various subjects/fields, all new indexes rely on either an implicit or an explicit form of normalization, thereby addressing another of the IF's flaws (first part of point i) in Section II-A). IW, EF, AI and SJR all provide implicit normalization. Explicit normalization on the cited-side is included in the JFIS, using the average number of citations received by all papers published in a given Subject Category. In the SNIP, explicit normalization on the citing-side is included, using the average number of citations contained in the set of papers present in the data base and citing papers published in publications in the same data base within the prescribed time window.

Similarly, the AF also possesses a citing-side normalization using the ratio between the average number of citations of the papers published in the journal of the same area and the same figure for the specific journal under consideration.

Included/excluded document types. JFIS, SNIP, and SJR eliminate inaccuracies caused by citations from non-peer-reviewed sources by considering the same list for both “citable items” considered in the numerator of each indicator and “publication items” considered at the denominator of each indicator. EF and AI reduce inaccuracies by reducing the influence of citable items which are not publication items. As such, item iv)-a) in Section II-A is either completely addressed or at least notably reduced.

Self-citations. As already stated in Section II-A, appropriate inclusion of self-citations in bibliometric indexes is a complex issue. JFIS, SNIP, AF, and IW include all self-citations and therefore do not offer any advantage over the IF. At the other extreme, EF and AI ignore all self-citations, which obviously makes these indexes very difficult to manipulate. Counting self-citations is, however, a subtle issue, since, as already mentioned, there may be good reasons to include them. To try to address this issue, SJR takes a more balanced approach towards self-citations by allowing each journal to receive a maximum percentage of self-cites equal to 33%, which of course limits room for possible index inflation without penalizing normal self-citation behavior. In any case EF, AI and (with some pros and cons) SJR successfully address issue iv)-b) in Section II-A.
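One possible reading of the 33% cap is sketched below in Python. The text only states that self-cites may not exceed 33% of the total; how exactly the SJR enforces this is not specified here, so the function `cap_self_citations` and its formula are assumptions chosen for illustration: self-cites are retained only up to the amount that keeps their share of the counted citations at or below the cap.

```python
def cap_self_citations(self_cites, external_cites, max_share=0.33):
    """Number of self-cites retained so that they represent at most
    `max_share` of all counted citations (assumed interpretation).

    If s self-cites are kept next to x external ones, their share is
    s / (s + x); solving s / (s + x) <= max_share gives
    s <= max_share * x / (1 - max_share).
    """
    if not 0.0 <= max_share < 1.0:
        raise ValueError("max_share must lie in [0, 1)")
    limit = max_share * external_cites / (1.0 - max_share)
    return min(float(self_cites), limit)


# Hypothetical journal with ordinary behavior: 10 self-cites, 90 external.
normal = cap_self_citations(10, 90)

# Hypothetical journal dominated by self-cites: 80 self-cites, 20 external.
capped = cap_self_citations(80, 20)

print(normal, capped, capped / (capped + 20))
```

The first journal is unaffected, since its self-cites are only 10% of the total, whereas the second retains fewer than 10 of its 80 self-cites, which limits the room for inflation.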

As a final remark, and despite not being reported in Table IV, use of an $h$-type index would well complement the information extracted by other bibliometric indexes, as suggested in [56]. The $h$-index of a journal should not be computed from the journal's creation date, but rather with respect to a definite time window, e.g., one or more years. Such a computation window will prevent strong biases against (relatively) old journals and, as such, a journal-level $h$-index is different from what was originally proposed by Hirsch for individual scientist evaluation [43]. Such an indicator with a 5-year time window $(h_{5})$ has recently been adopted by Google Scholar [44] for the purpose of establishing a freely available ranking of scholarly publications6 and may therefore become an important reference in the future.
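A windowed journal-level $h$-index can be sketched in a few lines of Python. The helper names and the list of (publication year, citations) pairs are hypothetical; the sketch only illustrates how restricting the computation to a time window changes the result.

```python
def h_index(citation_counts):
    """Largest h such that h items have at least h citations each."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def windowed_h_index(papers, first_year, last_year):
    """h-index restricted to papers published in [first_year, last_year].

    `papers` is an iterable of (publication_year, citations) pairs.
    """
    return h_index(c for year, c in papers if first_year <= year <= last_year)


# Hypothetical journal: one old, very highly cited paper plus recent ones.
papers = [(2005, 120), (2008, 9), (2009, 7), (2010, 4), (2011, 3), (2012, 1)]

h_all_time = h_index(c for _, c in papers)
h_5 = windowed_h_index(papers, 2008, 2012)
print(h_all_time, h_5)
```

With these invented numbers the all-time value is 4 and the 5-year value is 3: the old, highly cited paper no longer contributes once the window is applied.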

SECTION IV

## QUANTITY AND KIND OF INFORMATION OBTAINABLE BY VARIOUS BIBLIOMETRIC INDICATORS: FACTOR ANALYSIS AND LINEAR REGRESSION

Each of the proposed bibliometric indexes discussed above offers one or more advantages over the IF. Yet, each index captures its own data of interest while sometimes ignoring or deemphasizing other data and/or has its own disadvantages. In this section, we analyze the information that one can extract from various bibliometric indicators introduced in Section III. To do so, we illustrate some comparisons and criticisms of the three most commonly used indexes, namely EF, AI, and SJR, to which the $h$-index and the SNIP are added as additional references; we then review the main results of two studies which analyzed several measures including EF, AI, SJR and $h$ (but not the SNIP which was introduced in 2010) via a Principal Component Analysis (PCA) [46] and we conclude by showing how the use of a simple multiple linear regression model may successfully highlight anomalies in the trend of one specific indicator (the IF in particular).

As mentioned in Section III and very clearly highlighted in [40], EF and AI have the advantages that:

• they measure prestige by weighting citations depending on the source so those which come from highly prestigious journals such as Science, Nature, Physical Review Letters, Proceedings of the IEEE,… are considered more important than those coming from the “Journal of Obscurity;”
• they use a citation window of five years which contributes to a reduction of year-to-year fluctuations and which better captures most papers' citation impact for the vast majority of disciplines;
• they offer an implicit normalization by accounting for the citation intensity of a journal. In other words, since the number of citations given by journal $j$ to journal $i$ in (12) of Subsection F in Appendix is normalized by the total number of citations of all papers published by journal $j$, citations from journals with short lists of references are implicitly considered more important than citations from journals with very long bibliographies (such as those publishing review papers only). This also provides some normalization with respect to different areas with different citation practices.

EF and AI, however, both ignore all self-citations, an approach with both pros and cons. As already mentioned, ignoring them obviously makes these indexes very difficult to manipulate; however, it also fully discounts legitimate self-citations, especially in small scientific communities where one journal could be the undisputed reference.

The SJR possesses features similar to those of EF and AI, with the (marginal) difference of using a citation window of three years instead of five and the (more important) difference of including self-citations up to a maximum of 33% of the total, which offers the advantage of including legitimate self-cites but leaves some room for potential manipulation.

The SNIP offers trade-offs between adaptivity to citation practices across fields and to what it measures. It includes a subject-based normalization by considering the characteristics of a properly defined subject field [29], but measures popularity rather than prestige.

The most important positive features of the $h$-index include [56] the insensitivity to an “accidental excess” in the number of non-cited papers and in the number of citations for one or few very highly cited contributions, as well as the capability to combine both the effect of the number of papers published and the citation rate, thus reducing the apparent overperformance of small journals as measured by IF. The most important criticism is related to its lack of consistency as defined in [45].7

These simple examples illustrate clearly, as was first stated in [48], that “one single impact measure might not be sufficient to describe citation characteristics of journals.” Such a statement is even further strengthened by the analysis performed independently by [39] and [41], both of which performed a PCA [46] of the matrix containing the correlations between the rankings produced by, respectively, 39 and 13 existing and proposed different bibliometric measures. The 39 measures in [39] include, in addition to EF, AI, $h$-index and SJR, several other citation based metrics (Total Number of Cites, Immediacy Index,…) taken from the 2007 JCR and the Scimago web site [53] and others which are based on usage log data resulting from the MESUR project [54]. The 13 measures in [41] are based on the analysis of citations acquired from the 2006 and 2007 JCRs and downloaded from the Scimago web site also for 2006 and 2007.8

Fig. 2 provides a qualitative illustration of the main results reported in [39] and [41]. Their conclusions include that:

1. the scientific impact is (at least) a two-dimensional variable that cannot be fully captured by any single indicator. In fact, the cumulative variance (i.e., the “amount of information”) captured by the PCA [46] is sufficiently large $({>}70\%)$ when (at least) 2 components are taken, since it amounts to 83.4% for the analysis in [39] and 74.6% for the one in [41];
2. different clusters of measures exist, distinguishing, in particular, between popularity and prestige measures. In Fig. 2, black ovals represent regions occupied by citation based metrics, where popularity and prestige indicators are clearly separated, while the green oval contains the region where usage based metrics are positioned. Interestingly, despite being a prestige measure, the SJR is positioned in both analyses very close to the IF. On the contrary, the $h$-index is somewhat better positioned and is equidistant between prestige and popularity measures.

Furthermore, [39] observed that the IF “is not positioned at the core of this construct, but at its periphery, and should thus be used with caution,” and certainly not considered as “the only measure of choice”. Conversely, prestige indicators such as the EF and the AI should be used to complement the information on the impact of the journal.

Fig. 2. Qualitative sketch illustrating the correlations between 37 [39] and 13 [41] rankings with respect to different bibliometric measures mapped onto the first two principal components of the PCA. The cumulative variance of the first two factors (i.e., the “amount of information” captured by considering only the first two components of the PCA, indicated as PCA1 and PCA2) is 83.4% for the analysis in [39] and 74.6% for the one in [41]. Data were taken from the 2007 edition of the JCR and the Scimago web site as well as from the MESUR project (http://mesur.lanl.gov/MESUR.html) for [39] and from the 2007 and 2006 editions of the JCR and from the Scimago web site for [41]. Clusters of metrics are clearly identifiable: black ovals represent regions occupied by citation based metrics (including those based on popularity and prestige), while the green oval contains the region where usage based metrics are positioned. Note that the IF is not located at the core of the construct but at its periphery, highlighting that it is able to capture only part of the meaningful information.
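The kind of analysis summarized above, a PCA of the correlations between the rankings produced by several indicators, can be sketched in Python with NumPy. The data below are synthetic and are not those of [39] or [41]: six hypothetical indicators are generated from two latent factors (labelled popularity and prestige) plus noise, and all names are illustrative.

```python
import numpy as np


def rank_columns(X):
    """Replace each column by the ranks (0 = smallest) of its entries."""
    return np.argsort(np.argsort(X, axis=0), axis=0).astype(float)


def explained_variance_ratios(X):
    """PCA of the rank-correlation matrix of the indicators in X.

    X has one row per journal and one column per indicator. Returns the
    fraction of variance captured by each principal component, largest first.
    """
    R = np.corrcoef(rank_columns(X), rowvar=False)  # Spearman correlations
    eigvals = np.linalg.eigvalsh(R)[::-1]           # eigenvalues, descending
    return eigvals / eigvals.sum()


# Synthetic data: 500 journals, two latent factors, six noisy indicators.
rng = np.random.default_rng(0)
M = 500
popularity = rng.normal(size=M)
prestige = rng.normal(size=M)


def noisy(latent):
    """A hypothetical indicator: one latent factor plus measurement noise."""
    return latent + 0.3 * rng.normal(size=M)


X = np.column_stack([noisy(popularity), noisy(popularity), noisy(popularity),
                     noisy(prestige), noisy(prestige), noisy(prestige)])

ratios = explained_variance_ratios(X)
print(np.cumsum(ratios)[:2])   # cumulative variance of the first two components
```

On such data no single component captures most of the variance, while the first two together do, which mirrors the qualitative conclusion that impact is at least two-dimensional.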

A simple example of the kind of information one can obtain by using more than one measure is shown in Table V(a)–(d), which reports the ranking with respect to the IF, AI and EF (denoted as RK-IF, RK-AI and RK-EF, respectively) for four of the journals considered in Section II, i.e., (a) the IJNSNS (SC: Mathematics, Applied), (b) the IJHE (SC: Energy & Fuels), (c) the LPB (SC: Physics, Applied), and (d) the CO (SC: Behavioral Science). As can be seen by comparing the data of Table V(a) and those reported in Table III and in Section II-B, the large differences in RK-IF for the IJNSNS with respect to those of both RK-AI and RK-EF highlight the anomaly in citation patterns reported in [19]. Similarly, a very large discrepancy exists for LPB in 2007 and 2008 [see Table V(c)], which could again be considered as a good indicator of the anomalies which led Thomson to suppress the title for the 2009 JCR. On the contrary, the more limited differences in RK-IF, RK-AI and RK-EF for IJHE and CO may lead one to conclude that the increasing trends in Fig. 1(a) and (b) (including the large peak of $R^{{\rm IF}}_{i}$ in 2010 for CO) are not related to any particular anomaly.9

TABLE V RANKING WITH RESPECT TO THE IF (RK-IF), THE AI (RK-AI), AND THE EF (RK-EF) FOR: (A) THE IJNSNS (SC: MATHEMATICS, APPLIED), (B) THE IJHE (SC: ENERGY & FUELS), (C) THE LPB (SC: PHYSICS, APPLIED), AND (D) THE CO (SC: BEHAVIORAL SCIENCE). THE TOTAL NUMBER OF TITLES FOR THE YEARS 2007–2011 IS (165, 175, 204, 236, 245) FOR MATHEMATICS, APPLIED, (64, 67, 71, 79, 81) FOR ENERGY & FUELS, (94, 95, 108, 118, 125) FOR PHYSICS, APPLIED, AND (45, 47, 49, 48, 48) FOR BEHAVIORAL SCIENCE. SUP INDICATES A TITLE SUPPRESSED FROM THE JCR THAT YEAR

Instead of correlating the rankings with respect to various indicators, a similar conclusion can be reached in a more quantitative way by performing a multiple linear regression [57] where the IF is the dependent variable and the EF and AI are the explanatory variables. More precisely, let us consider the data set $\{{\rm IF}_{i},{\rm EF}_{i},{\rm AI}_{i}\}_{i=1}^{M}$ of $M$ statistical units, which are the journals in the JCR edition of a given year for which the IF, the AI and the EF have been computed. We then assume that the relationship between the dependent variable ${\rm IF}_{i}^{{\rm Pr}}$ and $({\rm EF}_{i},{\rm AI}_{i})$ has the form $${\rm IF}_{i}^{{\rm Pr}}=\alpha_{{\rm EF}}{\rm EF}_{i}+\alpha_{{\rm AI}}{\rm AI}_{i}+e\eqno{\hbox{(1)}}$$ where the coefficients $(\alpha_{{\rm EF}},\alpha_{{\rm AI}}, e)$ are determined as those defining the best fitting plane for the $M$ statistical units in the least-squares sense. The coefficients obtained by this procedure for the JCR editions from 2007 to 2011 are reported in Table VI, which also shows the corresponding value of $M$. As an example, Fig. 3(a) reports a plot of the points $({\rm AI}_{i},{\rm EF}_{i},{\rm IF}_{i})$ for most of the 5862 journals contained in the 2007 edition of the JCR for which all 3 indicators exist, as well as the corresponding best fitting plane. The multiple linear regression model (1) can be used to determine, for each journal, the value ${\rm IF}_{i}^{{\rm Pr}}$, that is, the IF that a generic journal $i$ would have if its quality could be determined by the corresponding EF and AI alone. Given the fact that the latter indicators discard self-citations, an excessively large positive difference ${\rm IF}_{i}-{\rm IF}_{i}^{{\rm Pr}}$ can naturally be considered as an indication that IF manipulation is taking place for the journal. Fig. 3(b) reports, for IJHE, LPB, and CO, the relative difference $({\rm IF}_{i}-{\rm IF}_{i}^{{\rm Pr}})/{\rm IF}_{i}^{{\rm Pr}}$ for the years 2007 to 2011, together with the corresponding mean value ${\bf E}{\left[({\rm IF}_{i}-{\rm IF}_{i}^{{\rm Pr}})/{\rm IF}_{i}^{{\rm Pr}}\right]}$ computed with respect to the whole JCR, whose value is reported, together with the corresponding standard deviation $\sigma$, in the last two rows of Table VI. The data illustrate that, while for CO and IJHE the values of $({\rm IF}_{i}-{\rm IF}_{i}^{{\rm Pr}})/{\rm IF}_{i}^{{\rm Pr}}$ always fall within an interval of amplitude 3 $\sigma$ centered on the JCR mean value (represented by the gray error bars in the plot of Fig. 3(b)) or are very close to it, the values for LPB in 2007 and 2008 lie far outside it, which offers further ground for Thomson's decision to exclude the journal from the JCR in 2009.
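The fitting procedure behind (1) and the relative difference plotted in Fig. 3(b) can be sketched as follows. This is only a minimal illustration in plain Python, under stated assumptions: the data are invented and noise-free (they are not JCR values), the helper names `det3`, `fit_plane` and `relative_difference` are hypothetical, and the normal equations are solved with Cramer's rule simply because only three unknowns are involved.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(ef, ai, impact):
    """Least-squares fit of IF = a_EF * EF + a_AI * AI + e (model (1))."""
    cols = [ef, ai, [1.0] * len(ef)]  # regressors: EF, AI, constant term
    # Normal equations (X^T X) beta = X^T y for the three unknowns.
    xtx = [[sum(u * v for u, v in zip(cu, cv)) for cv in cols] for cu in cols]
    xty = [sum(u * y for u, y in zip(cu, impact)) for cu in cols]
    d = det3(xtx)
    beta = []
    for k in range(3):
        # Cramer's rule: replace column k of X^T X with X^T y.
        mk = [[xty[r] if col == k else xtx[r][col] for col in range(3)]
              for r in range(3)]
        beta.append(det3(mk) / d)
    return tuple(beta)  # (alpha_EF, alpha_AI, e)


def relative_difference(ef_i, ai_i, if_i, coeffs):
    """(IF_i - IF_i^Pr) / IF_i^Pr for one journal."""
    a_ef, a_ai, e = coeffs
    predicted = a_ef * ef_i + a_ai * ai_i + e
    return (if_i - predicted) / predicted


# Invented toy data lying exactly on the plane IF = 20*EF + 1.5*AI + 0.3.
ef = [0.002, 0.010, 0.005, 0.020, 0.001, 0.015]
ai = [0.4, 0.9, 1.5, 2.0, 0.7, 1.1]
impact = [20.0 * x + 1.5 * z + 0.3 for x, z in zip(ef, ai)]

coeffs = fit_plane(ef, ai, impact)
# A hypothetical journal whose IF is twice the value predicted from EF and AI.
suspect = relative_difference(0.010, 0.8, 3.4, coeffs)
print(coeffs, suspect)
```

In an actual analysis the relative difference would be computed for every journal of the edition, and each journal would then be compared with the mean and standard deviation of that distribution, as done in Table VI.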

Fig. 3. (a) Plot of the points $({\rm AI}_{i},{\rm EF}_{i},{\rm IF}_{i})$ for the 5862 journals contained in the 2007 edition of the JCR for which all 3 indicators have been computed, and of the corresponding best fitting plane ${\rm IF}_{i}^{{\rm Pr}}=\alpha_{{\rm EF}}{\rm EF}_{i}+\alpha_{{\rm AI}}{\rm AI}_{i}+e$ with the parameters reported in the 2007 column of Table VI. The color of the points is set in accordance with a typical temperature scale, and only the part of the plot corresponding to the largest aggregation of data points is reported; (b) plot of the relative difference $({\rm IF}_{i}-{\rm IF}_{i}^{{\rm Pr}})/{\rm IF}_{i}^{{\rm Pr}}$ for IJHE, LPB, and CO between 2007 and 2011, together with the corresponding mean ${\bf E}{\left[({\rm IF}_{i}-{\rm IF}_{i}^{{\rm Pr}})/{\rm IF}_{i}^{{\rm Pr}}\right]}$ computed for the entire JCR, in which ${\pm}{3}\sigma$ error bars are also shown.

TABLE VI VALUE OF THE MULTIPLE LINEAR REGRESSION MODEL COEFFICIENTS $(\alpha_{{\rm EF}},\alpha_{{\rm AI}}, e)$ FOR THE $M$ JOURNALS REPORTED IN THE JCR IN YEARS FROM 2007 TO 2011 FOR WHICH THE VALUE OF THE IF, EF, AND AI WAS COMPUTED. THE LAST 2 ROWS SHOW THE MEAN VALUE ${\bf E}{\left[{{{\rm IF}_{i}-{\rm IF}_{i}^{{\rm Pr}}}\over{{\rm IF}_{i}^{{\rm Pr}}}}\right]}$ AND THE STANDARD DEVIATION $\sigma$ OF THE RELATIVE DIFFERENCE BETWEEN THE REAL VALUE ${\rm IF}_{i}$ OF THE IF AND THE CORRESPONDING VALUE ${\rm IF}_{i}^{{\rm Pr}}$ PREDICTED USING THE MULTIPLE LINEAR REGRESSION MODEL
SECTION V

## CONCLUSION

The scientific impact of journals as evaluated by bibliometrics is a complicated, multi-dimensional construct which cannot be captured by any single measure. Furthermore, the well-known and widely employed IF is not at the center of the “metrics space” and can therefore be considered to express only a rather particular aspect of scientific impact, which in no way supports its current status of “gold standard.” Many other bibliometric indicators exist whose features address the IF's flaws. In particular, prestige indicators such as the EF and the AI should be selected as a complement to popularity measures, the IF in particular, to better characterize the quality of each journal.

There is an additional and very important advantage in promoting the use of more than one bibliometric index. One cannot stress enough that, being an average measure of a particularly skewed distribution of the number of citations received by each manuscript, any journal-based metric is simply not designed to capture the qualities of individual papers [49]. A similar statement also holds when one needs to evaluate the publications of individual scholars or, in general, any small collection of papers. Thomson-Reuters itself stresses [50] that “The impact factor should be used with informed peer review. In the case of academic evaluation for tenure it is sometimes inappropriate to use the impact of the source journal to estimate the expected frequency of a recently published article,” and similar considerations also apply to any other measure one may consider to evaluate journal quality. Despite this, the IF is more and more often misused as a central element in the assessment of the performance of scientists for hiring, tenure and promotion, and as a fundamental factor in evaluating and scoring research proposals, a practice which is as deplorable as it is unfortunately widespread. This fact, in addition to the misplaced emphasis on the IF, which has become the de facto single measure for journal evaluation, can be considered the core reason for the increase in the number of reported attempts to manipulate it. In similar cases, the well-known Goodhart's law in economics [19], [55] warns us that when a measure becomes a target, it ceases to be a good measure.

It is therefore time for the scientific community to take a stand against such behavior, by spreading this knowledge, encouraging the use of multiple bibliometric indicators, and by taking a position on the proper use of bibliometrics to evaluate journals' quality in the scholarly literature. At the same time, the community should condemn any attempt to manipulate bibliometric indicators and also their improper use for the evaluation of single scientists, whose work can only be correctly measured by the opinions of competent peers after a careful reading.

## APPENDIX

For completeness, this appendix reports the definitions of all the bibliometric indicators discussed in this paper. Here, we use the standard notation used in the bibliometric literature for variables. For the first time, to the best of the author's knowledge, the notation is consistently maintained for all indicators, which makes it easier to compare their features. Let us assume that the data base contains $N$ journals and let $\Delta_{\alpha}$ and $\Delta_{\beta}$ indicate two different sets of years, where the elements of the first set precede the elements of the second one. Let us indicate with $c_{ij}^{\Delta_{\beta}\Delta_{\alpha}}$ the number of citations given to papers published in journal $j$ in period $\Delta_{\alpha}$ from papers published in journal $i$ in period $\Delta_{\beta}$, and with $p_{i}^{\Delta_{\beta}}$ ($p_{j}^{\Delta_{\alpha}}$) the number of papers published by journal $i$ ($j$) in period $\Delta_{\beta}$ ($\Delta_{\alpha}$), so that $C_{i\rightarrow}^{\Delta_{\beta}\Delta_{\alpha}}=\sum_{j=1}^{N}c_{ij}^{\Delta_{\beta}\Delta_{\alpha}}$ indicates the total number of citations originating from journal $i$ (hence the subscript $i\rightarrow$) in period $\Delta_{\beta}$ to all journals indexed in the data base and published in period $\Delta_{\alpha}$, and $C_{\rightarrow j}^{\Delta_{\beta}\Delta_{\alpha}}=\sum_{i=1}^{N}c_{ij}^{\Delta_{\beta}\Delta_{\alpha}}$ indicates the total number of citations received by all papers published in journal $j$ (hence the subscript $\rightarrow j$) in period $\Delta_{\alpha}$ originating from all journals in the data base in period $\Delta_{\beta}$.

### A. Impact Factor and Five-Year Impact Factor

For a given year $Y_{n}$, let $\Delta_{1}=\{Y_{n-2},Y_{n-1}\}$ and $\Delta_{2}=\{Y_{n}\}$, so that the Impact Factor of journal $i$ is defined as $${\rm IF}_{i}={{C_{\rightarrow i}^{\Delta_{2}\Delta_{1}}}\over{p_{i}^{\Delta_{1}}}}\eqno{\hbox{(2)}}$$ For the Five-Year Impact Factor ${\rm 5YIF}_{i}$ the above definition is formally the same, with an identical choice for $\Delta_{2}$ and with $\Delta_{1}=\{Y_{n-5},Y_{n-4},\ldots,Y_{n-1}\}$. With this, ${\rm IF}_{i}$ and ${\rm 5YIF}_{i}$ represent the average number of citations received in year $Y_{n}$ by each paper published in journal $i$ in the previous two years $\{Y_{n-2},Y_{n-1}\}$ or five years $\{Y_{n-5},Y_{n-4},\ldots, Y_{n-1}\}$, respectively.
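As an illustration of the notation, the quantities $C_{i\rightarrow}$, $C_{\rightarrow j}$ and the ratio in (2) can be computed from a small citation matrix as in the sketch below. The three-journal matrix `c` and the paper counts `p1` are invented for the example, and the function names are hypothetical; `c[i][j]` stands for the citations given by journal $i$ in $\Delta_{2}$ to papers of journal $j$ published in $\Delta_{1}$.

```python
def citations_given(c, i):
    """C_{i->}: total citations given by journal i (row sum of c)."""
    return sum(c[i])


def citations_received(c, j):
    """C_{->j}: total citations received by journal j (column sum of c)."""
    return sum(row[j] for row in c)


def impact_factor(c, p1, i):
    """Ratio (2): citations received in Delta_2 per paper published in Delta_1."""
    return citations_received(c, i) / p1[i]


# Invented toy data: 3 journals; the diagonal holds journal self-citations,
# which the ratio in (2) does not exclude.
c = [[10, 4, 6],
     [2, 8, 0],
     [3, 5, 1]]
p1 = [5, 10, 7]  # papers published by each journal in Delta_1

for i in range(3):
    print(i, citations_given(c, i), citations_received(c, i),
          impact_factor(c, p1, i))
```

The same function yields the five-year variant when `c` and `p1` are built over the longer window $\Delta_{1}=\{Y_{n-5},\ldots,Y_{n-1}\}$.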

### B. Journal to Field Impact Score

For a given set of years (usually 4, but 5 or more have also been used [32]) $\Delta_{1}=\{Y_{n-3}, Y_{n-2}, Y_{n-1}, Y_{n}\}$, let $\Delta_{2}=\{Y_{n-2},Y_{n-1},Y_{n}\}$, $\Delta_{3}=\{Y_{n-1},Y_{n}\}$ and $\Delta_{4}=\{Y_{n}\}$. Let $c(\xi)_{ij}^{\Delta_{\beta}\Delta_{\alpha}}$, for $\xi=A,R,L,$ and $N$, be the number of citations given by all papers published in journal $i$ in period $\Delta_{\beta}$ to the different document types (articles, reviews, letters and notes, respectively) published in journal $j$ in period $\Delta_{\alpha}$, whose total number is $\xi_{j}^{\Delta_{\alpha}}$, and let $C(\xi)_{\rightarrow j}^{\Delta_{\beta}\Delta_{\alpha}}=\sum_{i=1}^{N}c(\xi)_{ij}^{\Delta_{\beta}\Delta_{\alpha}}$ be the corresponding total number of citations received by publication items $\xi$ in journal $j$ in period $\Delta_{\alpha}$ originating from all journals in the data base in period $\Delta_{\beta}$. The Journal to Field Impact Score for journal $i$ is defined as $${\rm JFIS}_{i}={\displaystyle{\sum\limits_{k=1}^{4}\sum\limits_{\xi\in\{A,R,L,N\}}{{C(\xi)_{\rightarrow i}^{\Delta_{k}\Delta_{k}}}\over{\xi_{i}^{\Delta_{k}}}}}\over\displaystyle{\sum\limits_{j\in SC}\sum\limits_{k=1}^{4}\sum\limits_{\xi\in\{A,R,L,N\}}{{C(\xi)_{\rightarrow j}^{\Delta_{k}\Delta_{k}}}\over{\xi_{j}^{\Delta_{k}}}}}}\eqno{\hbox{(3)}}$$ where the denominator is the average number of citations received by all journals in the same area (i.e., Subject Category (SC) in WoS) as journal $i$.

### C. Audience Factor

The definition of the Audience Factor [31] is similar to that of the IF except that citations are weighted based on the journal from which they originate. More specifically, define ${\cal J}$ to be a specific set of journals (not necessarily a SC); then, for a given year $Y_{n}$ and with $\Delta_{1}=\{Y_{n-5},Y_{n-4},\ldots,Y_{n-1}\}$ and $\Delta_{2}=\{Y_{n}\}$, for journal $i$ we have $${\rm AF}_{i}={{1}\over{p_{i}^{\Delta_{2}}}}\sum_{j=1}^{N}{\displaystyle{{{\sum\nolimits_{k\in{\cal J}}C_{k\rightarrow}^{\Delta_{2}\Delta_{1}}}\over{\sum\nolimits_{k\in{\cal J}}p_{k}^{\Delta_{2}}}}}\over\displaystyle{{{C_{j\rightarrow}^{\Delta_{2}\Delta_{1}}}\over{p_{j}^{\Delta_{2}}}}}}c_{ji}^{\Delta_{2}\Delta_{1}}\eqno{\hbox{(4)}}$$ Note that the AF performs a normalization with respect to the citing side, while the JFIS definition is based on a normalization with respect to the cited side of the expression.
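The citing-side weighting in (4) can be made concrete with a short sketch. It follows (4) term by term under the following assumptions: the toy matrix `c` and the paper counts `p2` are invented, the function name `audience_factor` is hypothetical, the reference set ${\cal J}$ defaults to the whole data base, and journals that give no citations are skipped since their weight would be undefined.

```python
def audience_factor(c, p2, i, group=None):
    """Audience Factor of journal i according to (4).

    c[j][i] -- citations given by journal j (in Delta_2) to journal i (in Delta_1)
    p2[j]   -- papers published by journal j in Delta_2
    group   -- indices of the reference set J (default: every journal)
    """
    n = len(c)
    group = list(range(n)) if group is None else list(group)
    given = [sum(row) for row in c]  # C_{j->} for every journal
    # Average number of citations given per paper within the reference set J.
    mean_propensity = sum(given[k] for k in group) / sum(p2[k] for k in group)
    total = 0.0
    for j in range(n):
        if given[j] == 0:
            continue  # journal j gives no citations, so it contributes nothing
        # Citations coming from journals with long reference lists weigh less.
        weight = mean_propensity / (given[j] / p2[j])
        total += weight * c[j][i]
    return total / p2[i]


# Invented toy data: 3 journals.
c = [[10, 4, 6],
     [2, 8, 0],
     [3, 5, 1]]
p2 = [4, 2, 3]

print([audience_factor(c, p2, i) for i in range(3)])
```

In this toy case journals 0 and 1 give 5 citations per paper and journal 2 gives 3, against a data base average of 39/9, so citations originating from journal 2 receive a weight larger than 1.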

### D. Source Normalized Impact Per Paper

For a given year $Y_{n}$, let $\Delta_{1}=\{Y_{n-3},Y_{n-2},Y_{n-1}\}$ and $\Delta_{2}=\{Y_{n}\}$. The Source Normalized Impact per Paper (SNIP) [29] of journal $i$ is defined as the ratio $${\rm SNIP}_{i}={{{\rm RIP}_{i}}\over{{\rm RDCP}_{i}}}\eqno{\hbox{(5)}}$$ where ${\rm RIP}_{i}$ is the Raw Impact per Paper of journal $i$, i.e., the average number of citations received per paper published in journal $i$ in period $\Delta_{1}$ from papers published in all journals present in the data base in period $\Delta_{2}$, $${\rm RIP}_{i}={{C_{\rightarrow i}^{\Delta_{2}\Delta_{1}}}\over{p_{i}^{\Delta_{1}}}}\eqno{\hbox{(6)}}$$ so that, by comparing (6) and (2), one immediately sees that the only difference between ${\rm RIP}_{i}$ and ${\rm IF}_{i}$ is the length of the citation window $\Delta_{1}$. The ${\rm RDCP}_{i}$ is the Relative Database Citation Potential of journal $i$, which is the average number of citations contained in any paper citing journal $i$ in period $\Delta_{1}$, normalized in such a way that the median journal in the database has ${\rm RDCP}_{i}=1$, $${\rm RDCP}_{i}=\theta{{\sum\nolimits_{j\in{\cal I}}C_{j\rightarrow}^{\Delta_{2}\Delta_{1}}}\over{p_{j\rightarrow i}^{\Delta_{2}\Delta_{1}}}}\eqno{\hbox{(7)}}$$ where ${\cal I}$ is the set of journals published in the data base in period $\Delta_{2}$ which cite papers published in journal $i$ in period $\Delta_{1}$, $p_{j\rightarrow i}^{\Delta_{2}\Delta_{1}}$ is the number of papers published in journal $j$ of this set in period $\Delta_{2}$ citing papers published in journal $i$ in period $\Delta_{1}$, and $\theta$ is a normalizing constant chosen to have ${\rm RDCP}_{i}=1$ for the median journal in the data base.10

### E. Influence Weight

For a given year $Y_{n}$, let $\Delta_{1}=\{Y_{n-5},Y_{n-4},\ldots,Y_{n-1}\}$ and $\Delta_{2}=\{Y_{n}\}$. The Influence Weight [11] for journal $i$ is computed as $${\rm IW}_{i}=\lim_{k\to\infty}{\rm IW}_{i}[k]\eqno{\hbox{(8)}}$$ where, at each recursion step, ${\rm IW}_{i}[k]$ is the solution of the following system of linear equations $$\eqalignno{&{\rm IW}_{i}[k]=\sum_{j=1}^{N}{{c_{ji}^{\Delta_{2}\Delta_{1}}}\over{C_{i\rightarrow}^{\Delta_{2}\Delta_{1}}}}{\rm IW}_{j}[k-1]\cr&{\displaystyle{\sum\limits_{i=1}^{N}{\rm IW}_{i}[k] C_{i\rightarrow}^{\Delta_{2}\Delta_{1}}}\over\displaystyle{\sum\limits_{i=1}^{N}C_{i\rightarrow}^{\Delta_{2}\Delta_{1}}}}=1&{\hbox{(9)}}}$$ The normalization term $C_{i\rightarrow}^{\Delta_{2}\Delta_{1}}$ in the denominator makes the IW a measure of journal average performance per reference and not per paper. To get an indication of the latter, one therefore needs to compute $${\rm IPP}_{i}={{{\rm IW}_{i}C_{i\rightarrow}^{\Delta_{2}\Delta_{1}}}\over{p_{i}^{\Delta_{1}}}}\eqno{\hbox{(10)}}$$
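The recursion (8)–(9) can be sketched as a simple fixed-point iteration in which every journal's weight is the sum of the citations it receives, each weighted by the citing journal's own weight, divided by the citations it gives. The sketch below is only an illustration under stated assumptions: the function names and the toy data are hypothetical, every journal is assumed to give at least one citation, the number of iterations is fixed rather than chosen by a convergence test, and the normalization in (9) is enforced by rescaling after each step.

```python
def influence_weights(c, iterations=200):
    """Iterate the recursion (9), with c[j][i] = citations from journal j to i."""
    n = len(c)
    given = [sum(row) for row in c]  # C_{i->}; assumed nonzero for every journal
    total = sum(given)
    w = [1.0] * n
    for _ in range(iterations):
        # Weighted citations received, divided by citations given.
        new = [sum(c[j][i] * w[j] for j in range(n)) / given[i]
               for i in range(n)]
        # Rescale so that sum_i IW_i * C_{i->} / sum_i C_{i->} = 1.
        scale = sum(new[i] * given[i] for i in range(n)) / total
        w = [x / scale for x in new]
    return w


def influence_per_paper(c, p1, iterations=200):
    """IPP_i = IW_i * C_{i->} / p_i, as in (10)."""
    w = influence_weights(c, iterations)
    return [w[i] * sum(c[i]) / p1[i] for i in range(len(c))]


# Invented toy data: 3 journals.
c = [[10, 4, 6],
     [2, 8, 0],
     [3, 5, 1]]
p1 = [5, 10, 7]

print(influence_weights(c))
print(influence_per_paper(c, p1))
```

When the citation matrix is symmetric, every journal receives exactly as many citations as it gives and all weights remain equal to 1, which is a convenient sanity check for an implementation.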

### F. Eigenfactor and Article Influence

For a given year $Y_{n}$, let $\Delta_{1}=\{Y_{n-5},Y_{n-4},\ldots,Y_{n-1}\}$ and $\Delta_{2}=\{Y_{n}\}$. As stated in Section I, the computation of the EF takes advantage of the entire journal network and is based on a recursive equation similar to the PR algorithm. More precisely [34], for each journal $i=1,\ldots, M$ one needs to compute the value $$\pi_{i}=\lim_{k\to\infty}\pi_{i}[k]\eqno{\hbox{(11)}}$$ where, at each recursion step, $\pi_{i}[k]$ is the solution of the following system of linear equations $$\eqalignno{\pi_{i}[k]=&\, (1-\alpha){{p_{i}^{\Delta_{2}}}\over{\sum\nolimits_{l=1}^{M}p_{l}^{\Delta_{2}}}}\cr&+\alpha\left(\sum_{\matrix{j=1, j\ne i\cr j\notin{\cal D}}}^{M}{{c_{ji}^{\Delta_{2}\Delta_{1}}}\over\displaystyle{\sum\limits_{l=1,l\ne j}^{M}c_{jl}^{\Delta_{2}\Delta_{1}}}}\pi_{j}[k-1]\right.\cr&\qquad\quad\left.+{{p_{i}^{\Delta_{1}}}\over\displaystyle{\sum\limits_{l=1}^{M}p_{l}^{\Delta_{1}}}}\sum_{j\in{\cal D}}\pi_{j}[k-1]\right)\cr\sum_{i=1}^{M}\pi_{i}[k]=&\, 1&{\hbox{(12)}}}$$ where:

• the parameter $\alpha$ must be less than 1 to ensure convergence of (12) (usually $\alpha=0.85$ [33])11;
• ${\cal D}$ is the set of dangling nodes in the journal network, i.e., the journals that are only cited, but do not cite any other journal in time window $\Delta_{1}$. To avoid losing their prestige $(\sum_{j\in{\cal D}}\pi_{j}[k-1])$, this is redistributed weighted with respect to the “journal relative prestige due to size,” i.e., the ratio between the number of papers published by the journal in period $\Delta_{1}$ with respect to the total for journals in the data base $p_{i}^{\Delta_{1}}/\sum_{l=1}^{M}p_{l}^{\Delta_{1}}$.

Once system (12) is solved, the EF for journal $i$ can be defined as $${\rm EF}_{i}=100\sum_{j=1}^{M}{{\pi_{j}}\over{C_{j\rightarrow}^{\Delta_{2}\Delta_{1}}}}c_{ji}^{\Delta_{2}\Delta_{1}}\eqno{\hbox{(13)}}$$
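The recursion (11)–(12) and the final step (13) can be sketched in a few lines of Python. This is not the official Eigenfactor implementation but a minimal illustration of the equations as written here, under the following assumptions: self-citations are removed by zeroing the diagonal of the citation matrix, dangling journals (those giving no citations) are skipped in (13), the iteration count is fixed, and the function name `eigenfactor` and the toy data are hypothetical.

```python
def eigenfactor(c, p_teleport, p_dangling, alpha=0.85, iterations=200):
    """Iterate (12) and apply (13).

    c[j][i]    -- citations from journal j (Delta_2) to journal i (Delta_1)
    p_teleport -- paper counts used in the (1 - alpha) term of (12)
    p_dangling -- paper counts used to redistribute dangling-node prestige
    Returns the pair (pi, ef).
    """
    m = len(c)
    # Discard self-citations, since the sums in (12) skip j == i.
    h = [[0 if i == j else c[i][j] for j in range(m)] for i in range(m)]
    given = [sum(row) for row in h]
    dangling = [j for j in range(m) if given[j] == 0]
    a = [x / sum(p_teleport) for x in p_teleport]
    b = [x / sum(p_dangling) for x in p_dangling]
    pi = [1.0 / m] * m
    for _ in range(iterations):
        lost = sum(pi[j] for j in dangling)  # prestige held by dangling nodes
        pi = [(1 - alpha) * a[i]
              + alpha * (sum(h[j][i] / given[j] * pi[j]
                             for j in range(m) if given[j] > 0)
                         + b[i] * lost)
              for i in range(m)]
    # Equation (13), evaluated over the journals that actually give citations.
    ef = [100 * sum(pi[j] / given[j] * h[j][i]
                    for j in range(m) if given[j] > 0)
          for i in range(m)]
    return pi, ef


# Invented toy data: 3 journals, none of them dangling.
c = [[10, 4, 6],
     [2, 8, 0],
     [3, 5, 1]]
p = [5, 10, 7]

pi, ef = eigenfactor(c, p, p)
print(pi, ef)
```

With no dangling journals the returned EF values add up to 100, which matches the additivity property discussed in the text; when dangling journals are present, the prestige they hold is not passed on through (13) in this sketch.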

Note that ${\rm EF}_{i}$ depends through (12) on the citations journal $i$ receives from all other journals, weighted both by the journal citation potential (i.e., the total number of citations given by that specific journal to any other) and by the journal prestige coefficient (i.e., its own coefficient $\pi_{j}$). Note also that (because $j\ne i$, i.e., $c_{ii}^{\Delta_{2}\Delta_{1}}=0$) self-citations are not considered. If we assume $\alpha=1$, another useful way to interpret the above recursion is to notice that the $\pi_{i}$ are the elements of the eigenvector associated with the unit eigenvalue of a Markov chain whose nodes are the journals in the collection and whose transition matrix is proportional to the fraction of citations linking each journal $j$ to journal $i$. This corresponds to considering the scores as the result of the following random process [34]:

“Imagine that a researcher is to spend all eternity in the library randomly following citations within scientific periodicals. The researcher begins by picking a random journal in the library. From this volume a random citation is selected. The researcher then walks over to the journal referenced by this citation. The probability of doing so corresponds to the fraction of citations linking the referenced journal to the referring one, conditioned on the fact that the researcher starts from the referring journal. From this new volume the researcher now selects another random citation and proceeds to that journal. This process is repeated ad infinitum.”

Of course, the above researcher will more often read journals that receive a large number of citations, and will reach them coming from journals that are also highly cited. The percentage of the time that the model researcher visits a given journal in the walk through the library is given by the elements of the eigenvector associated with the unit eigenvalue of the transition matrix of the Markov chain modeling the process. Each element of this eigenvector gives a quantity which is proportionally related to the EF score of the corresponding journal. So when we report that IEEE Transactions on Information Theory in 2010 had an EF score of 0.06987, the physical significance of this is that almost 0.07% of the time, the model researcher would have visited that periodical. It is also worthwhile to stress that:

• the Eigenfactor score is a per-journal measure and therefore tends to increase with the number of papers published, since having more articles one can expect them to be visited more often. On the other hand, it may be useful for librarians to determine the expected impact of the journal in their readership;
• because by its very definition the sum of the Eigenfactor scores over the entire collection of publications is equal to 100%, Eigenfactors are additive, that is, the percentage of time a researcher spends at a given group of journals is the sum of their EFs.12

The AI is computed starting from the EF and normalizing its value with respect to the total number of papers $p_{i}^{\Delta_{2}}$ published by the corresponding journal, i.e., $${\rm AI}_{i}=\beta{{{\rm EF}_{i}}\over{p_{i}^{\Delta_{2}}}}\eqno{\hbox{(14)}}$$ where $\beta$ is a normalization constant chosen so that ${\rm AI}={1}$ for the journal corresponding to the median of the distribution of the Article Influence values in the set of journals under consideration. The Article Influence score therefore measures the influence, per article, of a given journal and as such is directly comparable to the IF; a value above 1 indicates an above-average performance for that journal.
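The normalization in (14) amounts to dividing each journal's EF by its number of papers and rescaling so that the median journal scores 1. A minimal sketch follows, in which the EF values and paper counts are invented and the function name `article_influence` is hypothetical.

```python
from statistics import median


def article_influence(ef, p):
    """AI_i = beta * EF_i / p_i, with beta set so that the median AI equals 1."""
    ratios = [e / n for e, n in zip(ef, p)]  # EF per published paper
    beta = 1.0 / median(ratios)              # normalization constant of (14)
    return [beta * r for r in ratios]


# Invented toy values for three journals.
ef = [50.0, 30.0, 20.0]
p = [10, 10, 5]

print(article_influence(ef, p))  # [1.25, 0.75, 1.0]
```

Here the third journal has the median EF per paper and therefore receives an AI of exactly 1, although it has the smallest EF of the three.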

Finally, as reported in [23]:

• for $\alpha=0$ one immediately finds, by comparing the definitions (12)–(14) and (4), that ${\rm AI}_{i}\propto{\rm AF}_{i}$, assuming that the number of papers $p_{i}^{\Delta_{1}}$ published by journal $i$ in period $\Delta_{1}$ is proportional to the number of papers $p_{i}^{\Delta_{2}}$ published by the same journal in period $\Delta_{2}$;
• for $\alpha=1$ one gets, again from the definitions (12)–(14) and (8)–(10), that ${\rm AI}_{i}\propto{\rm IPP}_{i}$.

### G. Scimago Journal Rank

The SJR [30] belongs, like the EF and the AI, to the family of bibliometric indexes related to the PR algorithm. Let us consider a specific year $Y_{n}$, and let $\Delta_{1}=\{Y_{n-3},Y_{n-2},Y_{n-1}\}$ and $\Delta_{2}=\{Y_{n}\}$. As in the previous case, the computation of the SJR requires determining, for each journal $i=1,\ldots, M$, the value $$\sigma_{i}=\lim_{k\to\infty}\sigma_{i}[k]\eqno{\hbox{(15)}}$$ where, at each recursion step, $\sigma_{i}[k]$ is the solution of the following system of linear equations $$\eqalignno{\sigma_{i}[k]=&\,{{(1-\gamma-\delta)}\over{M}}+\delta{{p_{i}^{\Delta_{1}}}\over{\sum\nolimits_{l=1}^{M}p_{l}^{\Delta_{1}}}}\cr&+\gamma\left({{p_{i}^{\Delta_{1}}}\over{\sum\nolimits_{l=1}^{M}p_{l}^{\Delta_{1}}}}\sum_{j\in{\cal D}}\sigma_{j}[k-1]\right.\cr&\left.+\Lambda[k-1]\sum_{j=1, j\ne i}^{M}{{c_{ji}^{\Delta_{2}\Delta_{1}}}\over{C_{j\rightarrow}^{\Delta_{2}\Delta_{1}}}}\sigma_{j}[k-1]\right)\cr\sum_{i=1}^{M}\sigma_{i}[k]=&\,1&{\hbox{(16)}}}$$ where $\delta$ and $\gamma$ are two constants set to weight, respectively, the amount of influence of the number of articles published and of the number of citations received in determining the level of prestige (typically $\gamma=0.9$ and $\delta=0.0999$, consistently with the 10% share mentioned below), while ${\cal D}$ is the set of dangling nodes in window $\Delta_{1}$. It is worth mentioning the significance of each term in (16). The first two elements of the RHS, making up 10% of a journal's prestige value, are constant through the iterations and account, respectively, for the simple existence of the journal in the data base and for the fraction of articles it publishes in the entire network.

As far as the third element is concerned, the addend weighted by $\Lambda[k-1]$ represents the prestige transferred from each journal $j$ citing journal $i$, which is weighted by the fraction of citations which journal $j$ gives to journal $i$ with respect to its total in the citation window $\Delta_{1}$. Note also that, since only citations in the period $\Delta_{1}$ are considered in the process of transferring prestige, the quantity $$\Lambda [k-1]={{1-\sum\nolimits_{j\in{\cal D}}\sigma_{j}[k-1]}\over{\sum\nolimits_{m=1}^{M}\sum\nolimits_{n=1}^{M}{{c_{nm}^{\Delta_{2}\Delta_{1}}}\over{C_{n\rightarrow}^{\Delta_{2}\Delta_{1}}}}\sigma_{n}[k-1]}}$$ has been introduced to avoid losing prestige in the transfer process, since not all citations of each paper are considered. To do so, one simply multiplies by the above coefficient $\Lambda$, which is just the ratio between the total prestige to be transferred (i.e., 1 minus the prestige held by dangling nodes) and the one distributed through the citations falling in window $\Delta_{1}$. The other addend aims at redistributing the amount of prestige lost to dangling nodes, and does so in proportion to the fraction of papers published in journal $i$ with respect to the total in the data base.

Once system (16) has been solved, the SJR of journal $i$ can be computed as $${\rm SJR}_{i}={{\sigma_{i}}\over{p_{i}^{\Delta_{1}}}}\eqno{\hbox{(17)}}$$

From the above equation one readily gets that the SJR is a measure of quality per paper and it is therefore comparable to the AI but not directly to the EF. From this point of view, it is also interesting to notice that, even when one sets $\gamma+\delta=\alpha$, the solution of (16) does not lead to ${\rm EF}_{i}$, due to the absence of the equivalent of (13) in the set of equations defining the SJR. This offers a possible explanation of what Bollen et al. found in their analysis [39], in which they showed that the SJR's placement in the “space” of impact measures was closer to the IF than to the EF and other PR-based measures.
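A rough sketch of the iteration (15)–(16) and of the ratio (17) is given below. It is not the Scimago implementation and rests on several assumptions that go beyond the text: $C_{j\rightarrow}$ is taken as the full row sum of the toy citation matrix (self-citations included) while the transfer term skips $j=i$; $\Lambda$ is computed over exactly the citations used in the transfer term, so that the prestige values add up to 1 by construction; the iteration count is fixed; and the function names are hypothetical. The values $\gamma=0.9$ and $\delta=0.0999$ are passed explicitly so that the first two terms of (16) carry 10% of the total prestige.

```python
def sjr_prestige(c, p1, gamma, delta, iterations=200):
    """Iterate (16): gamma weights the citation term, delta the article term."""
    m = len(c)
    given = [sum(row) for row in c]  # C_{j->}, self-citations included
    dangling = [j for j in range(m) if given[j] == 0]
    a = [x / sum(p1) for x in p1]    # fraction of papers of each journal
    s = [1.0 / m] * m
    for _ in range(iterations):
        lost = sum(s[j] for j in dangling)  # prestige held by dangling nodes
        # Prestige transferred through citations, self-citations excluded.
        raw = [sum(c[j][i] / given[j] * s[j]
                   for j in range(m) if j != i and given[j] > 0)
               for i in range(m)]
        moved = sum(raw)
        # Lambda rescales the transferred prestige to 1 - lost (assumption:
        # same citations as in `raw`); 0.0 if no journal cites another one.
        lam = (1 - lost) / moved if moved > 0 else 0.0
        s = [(1 - gamma - delta) / m + delta * a[i]
             + gamma * (a[i] * lost + lam * raw[i])
             for i in range(m)]
    return s


def sjr(c, p1, gamma, delta, iterations=200):
    """SJR_i = sigma_i / p_i, as in (17)."""
    s = sjr_prestige(c, p1, gamma, delta, iterations)
    return [s_i / p_i for s_i, p_i in zip(s, p1)]


# Invented toy data: 3 journals.
c = [[10, 4, 6],
     [2, 8, 0],
     [3, 5, 1]]
p1 = [5, 10, 7]

print(sjr_prestige(c, p1, gamma=0.9, delta=0.0999))
print(sjr(c, p1, gamma=0.9, delta=0.0999))
```

Since the result is divided by the number of papers, the output of `sjr` is a per-paper quantity, which is why the comparison with the AI rather than with the EF is the natural one.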

### H. $h$-Index

Similarly to what was defined by Hirsch for scientists [43], a journal has an $h$-index of $h$ if $h$ of the $N$ papers it has published have at least $h$ citations each and the other $N-h$ have no more than $h$ citations each. The use of this indicator to complement the IF in journal evaluation was first suggested in [56]; unlike the case of a scientist, and to compensate for the journal age, $N$ may also refer only to the most recent papers published in a journal (say, those of the last 5 or 10 years).
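The definition translates directly into a few lines of code. The sketch below uses a hypothetical function name, `h_index`, and replays the journals JA and JB of the consistency example given in footnote 7, whose citation counts are taken from that footnote.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the `rank` most cited papers all have >= rank citations
        else:
            break
    return h


ja = [4, 4, 4, 4]  # journal JA: four papers with 4 citations each
jb = [5, 5, 5]     # journal JB: three papers with 5 citations each
print(h_index(ja), h_index(jb))                    # 4 3
print(h_index(ja + [5]), h_index(jb + [5]))        # 4 4
print(h_index(ja + [5, 5]), h_index(jb + [5, 5]))  # 4 5
```

The three printed lines reproduce the ranking reversal described in the footnote: JA starts ahead, the two journals tie after one additional paper each, and JB overtakes JA after the second one.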

## Footnotes

G. Setti is with the Department of Engineering (ENDIF), University of Ferrara, Ferrara 44100, Italy, and also with the Advanced Research Center on Electronic Systems, University of Bologna, Bologna 40125, Italy. Corresponding author: G. Setti (gianluca.setti@unife.it)

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

1We wish to stress that the scope of this paper is to give an overview of journal indicators as a way to measure journal impact. Other uses (and, most often, misuses) of these indicators will not be considered here, apart from a brief discussion in the concluding section.

2We will not consider here other criticisms related to the consistency of the data base (e.g., the accuracy of data capture [9]) or the coverage of the data base (all-inclusive vs. quality-based inclusive [10]), since these factors will uniformly affect, ceteris paribus, any bibliometric indicator.

3The most spectacular effect of such non-peer reviewed content was a 43% increase in the 1992 IF of the Lancet [48].

4Thomson maintains a list of suppressed titles which are those that “were found to have anomalous citation patterns resulting in a significant distortion of the IF, so that the rank does not accurately reflect the journal's citation performance in the literature.” [51], [52]

5Note that the absence of a normalization by the size of the journal (which is implicit in every per-article measure) can be considered a desirable feature. In fact, one may argue that publishing only high impact papers is increasingly more difficult with an increasing journal size and also that large size journals provide potentially more benefits to the overall growth of the scientific community and of the impact of its produced results than very small ones publishing only 15–20 very high quality contributions per year.

6This includes not only journals and conference proceedings but also archival repositories such as arXiv.org.

7Adapting the example in [45], assume that journal JA has a better performance index than JB which is based on the citations given to the papers they both publish. Suppose next that both journals obtain one additional paper each, both with the same number of citations. It would be natural to assume that JA still has better performance than JB. If this happens the performance index is said to be consistent. Interestingly, the $h$-index is not consistent. Suppose that JA has four papers with 4 citations each and JB has three papers with five citations each, then $h_{5}({\rm JA})=4$ and $h_{5}({\rm JB})=3$. If both journals receive another paper with five citations, then the relative performance changes since $h_{5}({\rm JA})=h_{5}({\rm JB})=4$ (and if an additional paper with five citations arrives for both journals, then JB even outperforms JA).

8See [41], p. 1329, for a thorough description of the data consistency.

9Note that the particularly low value for RK-EF for IJHE may also be due to the large and increasing number of papers published by it in recent years.

10i.e., $\theta$ is the inverse of the median of the quantity $\sum_{j\in{\cal I}}C_{j\rightarrow}^{\Delta_{2}\Delta_{1}}/p_{j\rightarrow i}^{\Delta_{2}\Delta_{1}}$.

11It can be proven that if $\alpha<1$ this system of linear equations always has a unique solution. For $\alpha=1$, the system of linear equations has a unique solution if the journal citation matrix $[c_{ij}^{\Delta_{2}\Delta_{1}}]$ is irreducible [23].

12This property can be very useful for librarians to compare the value of specific journal bundles.
