https://guides.lib.umich.edu/citation
Defining Citation Analysis
What is it?
Citation analysis is the study of the impact and assumed quality of an article, an author, or an institution based on the number of times works and/or authors have been cited by others.
Why use it?
- To find out how much impact a particular article has had by showing which authors based some work upon it or cited it as an example within their own papers.
- To find out more about a field or topic, e.g., by reading the papers that cite a seminal work in that area.
- To determine how much impact a particular author has had by looking at the number of times his/her work has been cited by others.
Comparing Citation Analysis Sources
Here is a quick summary of what to expect from the three best-known citation analysis tools.

| | Web of Science | Scopus | Google Scholar |
|---|---|---|---|
| Subject Focus | | | Theoretically, all disciplines |
| Coverage | | | |
| Time Span | Some journals from 1900 | Some journals from the 1820s | Some citations as far back as the 1660s and 1670s |
| Updated | Weekly | Daily | Unknown but generally quick |
| Strengths | | | |
| Weaknesses | | | |
https://rdrr.io/github/massimoaria/bibliometrix/man/bibliometrix-package.html
bibliometrix-package: An R-Tool for Comprehensive Science Mapping Analysis
Description
Tool for quantitative research in scientometrics and bibliometrics. It provides various routines for importing bibliographic data from SCOPUS (<http://scopus.com>), Clarivate Analytics Web of Science (<http://www.webofknowledge.com/>), Dimensions (<https://www.dimensions.ai/>), Cochrane Library (<http://www.cochranelibrary.com/>) and PubMed (<https://www.ncbi.nlm.nih.gov/pubmed/>) databases, performing bibliometric analysis and building networks for co-citation, coupling, scientific collaboration and co-word analysis.
Details
INSTALLATION
- Stable version from CRAN:
install.packages("bibliometrix")
- Or development version from GitHub:
install.packages("devtools")
devtools::install_github("massimoaria/bibliometrix")
- Load "bibliometrix"
library('bibliometrix')
DATA LOADING AND CONVERTING
The export file can be read into R using the function *readFiles* (an example from the bibliometrix vignettes):
D <- readFiles("http://www.bibliometrix.org/datasets/savedrecs.bib")
D is a large character vector. The *readFiles* argument contains the names of files downloaded from SCOPUS, Clarivate Analytics WoS, or the Cochrane CDSR website.
The function *readFiles* combines all the text files into a single large character vector and converts the text to UTF-8 encoding.
e.g., D <- readFiles("file1.txt", "file2.txt", ...)
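As an illustration of what *readFiles* does, here is a rough base-R equivalent (a sketch, not the package's actual implementation; the function name read_files is made up):

```r
# Rough stand-in for readFiles: read several export files,
# convert each line to UTF-8, and combine them into one character vector.
read_files <- function(...) {
  files <- c(...)
  enc2utf8(unlist(lapply(files, readLines, warn = FALSE)))
}
```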
The object D can be converted into a data frame using the function *convert2df*:
M <- convert2df(D, dbsource = "isi", format = "bibtex")
*convert2df* creates a bibliographic data frame with cases corresponding to manuscripts and variables corresponding to Field Tags in the original export file. Each manuscript contains several elements, such as authors' names, title, keywords, and other information. All these elements constitute the bibliographic attributes of a document, also called metadata. Data frame columns are named using the standard Clarivate Analytics WoS Field Tag codification.
BIBLIOMETRIC ANALYSIS
The first step is to perform a descriptive analysis of the bibliographic data frame. The function *biblioAnalysis* calculates main bibliometric measures using this syntax:
results <- biblioAnalysis(M, sep = ";")
The function *biblioAnalysis* returns an object of class "bibliometrix".
To summarize the main results of the bibliometric analysis, use the generic function *summary*. It displays the main information about the bibliographic data frame and several tables, such as annual scientific production, top manuscripts per number of citations, most productive authors, most productive countries, total citations per country, most relevant sources (journals), and most relevant keywords. *summary* accepts two additional arguments: *k*, a formatting value that sets the number of rows in each table, and *pause*, a logical value (TRUE or FALSE) that enables or disables pausing during screen scrolling. For example, with k = 10 the first 10 authors, the first 10 sources, and so on are displayed.
S <- summary(object = results, k = 10, pause = FALSE)
Some basic plots can be drawn using the generic function *plot*:
plot(x = results, k = 10, pause = FALSE)
BIBLIOGRAPHIC NETWORK MATRICES
A manuscript's attributes are connected to each other through the manuscript itself: author(s) to journal, keywords to publication date, etc. These connections of different attributes generate bipartite networks that can be represented as rectangular matrices (Manuscripts x Attributes). Furthermore, scientific publications regularly contain references to other scientific works. This generates a further network, namely a co-citation or coupling network. These networks are analyzed in order to capture meaningful properties of the underlying research system, and in particular to determine the influence of bibliometric units such as scholars and journals.
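The bipartite-network idea can be made concrete with a small base-R sketch (invented toy data, independent of bibliometrix): if A is a binary manuscripts x references incidence matrix, the co-citation network is t(A) %*% A and the coupling network is A %*% t(A).

```r
# Toy incidence matrix: 3 manuscripts (rows) x 4 cited references (columns);
# A[i, j] == 1 means manuscript i cites reference j.
A <- matrix(c(1, 1, 0, 0,
              1, 1, 1, 0,
              0, 1, 1, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("paper", 1:3), paste0("ref", 1:4)))

# Co-citation: how often each pair of references is cited by the same manuscript.
cocitation <- t(A) %*% A

# Bibliographic coupling: how many cited references each pair of manuscripts shares.
coupling <- A %*% t(A)

cocitation["ref1", "ref2"]  # refs 1 and 2 are cited together by papers 1 and 2, so 2
```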
*biblioNetwork* function
The function *biblioNetwork* calculates, starting from a bibliographic data frame, the most frequently used networks: Coupling, Co-citation, Co-occurrences, and Collaboration. *biblioNetwork* uses two arguments to define the network to compute:
- the *analysis* argument can be "co-citation", "coupling", "collaboration", or "co-occurrences";
- the *network* argument can be "authors", "references", "sources", "countries", "universities", "keywords", "author_keywords", "titles", or "abstracts".
For example, the following code calculates a classical co-citation network:
NetMatrix <- biblioNetwork(M, analysis = "co-citation", network = "references", sep = ". ")
VISUALIZING BIBLIOGRAPHIC NETWORKS
All bibliographic networks can be graphically visualized or modeled. Using the function *networkPlot*, you can plot a network created by *biblioNetwork* using R routines.
The main argument of *networkPlot* is *type*, which indicates the network map layout: circle, kamada-kawai, mds, etc.
In the following, we propose some examples.
### Country Scientific Collaboration
# Create a country collaboration network
M <- metaTagExtraction(M, Field = "AU_CO", sep = ";")
NetMatrix <- biblioNetwork(M, analysis = "collaboration", network = "countries", sep = ";")
# Plot the network
net <- networkPlot(NetMatrix, n = dim(NetMatrix)[1], Title = "Country Collaboration", type = "circle", size = TRUE, remove.multiple = FALSE, labelsize = 0.8)
### Co-Citation Network
# Create a co-citation network
NetMatrix <- biblioNetwork(M, analysis = "co-citation", network = "references", sep = ". ")
# Plot the network
net <- networkPlot(NetMatrix, n = 30, Title = "Co-Citation Network", type = "fruchterman", size = TRUE, remove.multiple = FALSE, labelsize = 0.7, edgesize = 5)
### Keyword co-occurrences
# Create keyword co-occurrences network
NetMatrix <- biblioNetwork(M, analysis = "co-occurrences", network = "keywords", sep = ";")
# Plot the network
net <- networkPlot(NetMatrix, normalize = "association", weighted = TRUE, n = 30, Title = "Keyword Co-occurrences", type = "fruchterman", size = TRUE, edgesize = 5, labelsize = 0.7)
CO-WORD ANALYSIS: THE CONCEPTUAL STRUCTURE OF A FIELD
The aim of co-word analysis is to map the conceptual structure of a framework using word co-occurrences in a bibliographic collection. The analysis can be performed through dimensionality reduction techniques such as Multidimensional Scaling (MDS), Correspondence Analysis (CA), or Multiple Correspondence Analysis (MCA). Here, we show an example using the function *conceptualStructure*, which performs a CA or MCA to draw the conceptual structure of the field and K-means clustering to identify clusters of documents which express common concepts. Results are plotted on a two-dimensional map. *conceptualStructure* includes natural language processing (NLP) routines (see the function *termExtraction*) to extract terms from titles and abstracts. In addition, it implements the Porter stemming algorithm to reduce inflected (or sometimes derived) words to their word stem, base, or root form.
# Conceptual Structure using keywords (method="CA")
CS <- conceptualStructure(M, field = "ID", method = "CA", minDegree = 4, k.max = 8, stemming = FALSE, labelsize = 10, documents = 10)
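The general recipe (a similarity matrix, a low-dimensional embedding, then clustering) can be sketched in base R on an invented keyword co-occurrence matrix. This only illustrates the idea behind *conceptualStructure*, not its actual implementation; the K-means step uses fixed initial centers so the toy example is deterministic.

```r
# Invented symmetric keyword co-occurrence matrix (counts of joint appearances).
co <- matrix(c(10, 8, 1, 0,
                8, 9, 2, 1,
                1, 2, 7, 6,
                0, 1, 6, 8),
             nrow = 4,
             dimnames = list(c("network", "graph", "index", "citation"),
                             c("network", "graph", "index", "citation")))

# Turn co-occurrence (similarity) into a distance and embed the terms
# on a two-dimensional map with classical multidimensional scaling.
coords <- cmdscale(as.dist(max(co) - co), k = 2)

# Cluster the keyword profiles; initial centers are fixed to two rows
# so that this toy example converges deterministically.
clusters <- kmeans(co, centers = co[c("network", "index"), ])$cluster
```

On this toy data, "network"/"graph" and "index"/"citation" end up in the same clusters, mirroring the map-plus-clusters output of the real function.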
HISTORICAL DIRECT CITATION NETWORK
The historiographic map is a graph proposed by E. Garfield to represent a chronological network map of the most relevant direct citations resulting from a bibliographic collection. The function *histNetwork* generates a chronological direct citation network matrix which can be plotted using *histPlot*:
# Create a historical citation network
histResults <- histNetwork(M, n = 20, sep = ". ")
# Plot the historical direct citation network
net <- histPlot(histResults, size = FALSE, label = TRUE, arrowsize = 0.5)
Author(s)
Massimo Aria [cre, aut], Corrado Cuccurullo [aut]
Maintainer: Massimo Aria <aria@unina.it>
References
Aria, M. & Cuccurullo, C. (2017). *bibliometrix*: An R-tool for comprehensive science mapping analysis, *Journal of Informetrics*, 11(4), pp 959-975, Elsevier, DOI: 10.1016/j.joi.2017.08.007 (https://doi.org/10.1016/j.joi.2017.08.007).
Cuccurullo, C., Aria, M., & Sarto, F. (2016). Foundations and trends in performance management. A twenty-five years bibliometric analysis in business and public administration domains, *Scientometrics*, DOI: 10.1007/s11192-016-1948-8 (https://doi.org/10.1007/s11192-016-1948-8).
Cuccurullo, C., Aria, M., & Sarto, F. (2015). Twenty years of research on performance management in business and public administration domains. Presentation at the *Correspondence Analysis and Related Methods conference (CARME 2015)* in September 2015 (http://www.bibliometrix.org/documents/2015Carme_cuccurulloetal.pdf).
Sarto, F., Cuccurullo, C., & Aria, M. (2014). Exploring healthcare governance literature: systematic review and paths for future research. *Mecosan* (http://www.francoangeli.it/Riviste/Scheda_Rivista.aspx?IDarticolo=52780&lingua=en).
Cuccurullo, C., Aria, M., & Sarto, F. (2013). Twenty years of research on performance management in business and public administration domains. In *Academy of Management Proceedings* (Vol. 2013, No. 1, p. 14270). Academy of Management (https://doi.org/10.5465/AMBPP.2013.14270abstract).
massimoaria/bibliometrix documentation built on March 9, 2020, 3:58 p.m.
https://rdrr.io/cran/CITAN/man/CITAN-package.html
CITAN-package: CITation ANalysis toolpack
Description
CITAN is a library of functions useful in — but not limited to — quantitative research in the field of scientometrics. It contains various tools for preprocessing bibliographic data retrieved from, e.g., Elsevier's SciVerse Scopus and computing bibliometric impact of individuals. Moreover, some functions dealing with Pareto-Type II (GPD) and Discretized Pareto-Type II statistical models are included (e.g., Zhang-Stephens and MLE estimators, goodness-of-fit and two-sample tests, confidence intervals for the theoretical Hirsch index etc.). They may be used to describe and analyze many phenomena encountered in the social sciences.
Details
Fair and objective assessment methods for individual scientists have been the focus of scientometricians' attention since the very beginning of their discipline. A quantitative expression of some characteristics of the publication-citation process is assumed to be a predictor of broadly conceived scientific competence. It may be used, e.g., in building decision support systems for scientific quality control.
The h-index, proposed by J.E. Hirsch (2005), is among the most popular scientific impact indicators. An author who has published n papers has a Hirsch index equal to H if each of H of those publications was cited at least H times, and each of the remaining n-H items was cited no more than H times. This simple bibliometric tool quickly received much attention in the academic community and became a subject of intensive research. It was noted that, contrary to earlier approaches, i.e. publication count, citation count, etc., this measure captures both the productivity and the impact of an individual.
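Under this definition, the h-index is straightforward to compute; a minimal base-R sketch (the function name h_index is made up for illustration):

```r
# h-index: the largest H such that at least H of the papers
# have at least H citations each.
h_index <- function(citations) {
  sorted <- sort(citations, decreasing = TRUE)
  # Since sorted is non-increasing, sorted[i] >= i holds exactly for i = 1..H.
  sum(sorted >= seq_along(sorted))
}

h_index(c(10, 8, 5, 4, 3))  # -> 4: four papers have at least 4 citations each
```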
In a broader perspective, this issue is a special case of the so-called Producer Assessment Problem (PAP; see Gagolewski, Grzegorzewski, 2010b).
Consider a producer (e.g. a writer, scientist, artist, craftsman) and a nonempty set of his products (e.g. books, papers, works, goods). Suppose that each product is given a rating (of quality, popularity, etc.) which is a single number in I=[a,b], where a denotes the lowest admissible valuation. We typically choose I=[0,∞] (an interval in the extended real line). Some instances of the PAP are listed below.
| | Producer | Products | Rating method | Discipline |
|---|---|---|---|---|
| A | Scientist | Scientific articles | Number of citations | Scientometrics |
| B | Scientific institute | Scientists | The h-index | Scientometrics |
| C | Web server | Web pages | Number of in-links | Webometrics |
| D | Artist | Paintings | Auction price | Auctions |
| E | Billboard company | Advertisements | Sale results | Marketing |
Each possible state of a producer's activity can therefore be represented by a point x ∈ I^n for some n. Our aim is thus to construct and analyze, both theoretically and empirically, aggregation operators (cf. Grabisch et al., 2009) which can be used for rating producers. A family of such functions should take the two following aspects of a producer's quality into account:
- the ability to make highly-rated products,
- overall productivity, n.
For some more formal considerations please refer to (Gagolewski, Grzegorzewski, 2011).
To preprocess and analyze bibliometric data (cf. Gagolewski, 2011) retrieved from, e.g., Elsevier's SciVerse Scopus, we need the RSQLite package. It is an interface to the free SQLite database management system (see http://www.sqlite.org/). All data are stored in a so-called Local Bibliometric Storage (LBS), created with the lbsCreate function.
The data frames Scopus_ASJC and Scopus_SourceList contain various information on the current source coverage of SciVerse Scopus. They may be needed during the creation of the LBS; see lbsCreate for more details. License information: these data are publicly available and hence no special permission is needed to redistribute them (information from Elsevier).
CITAN is able to import publication data from Scopus CSV files (saved with settings "Output: complete format" or "Output: Citations only"; see Scopus_ReadCSV). Note that the output limit in Scopus is 2000 entries per file. Therefore, to perform bibliometric research we often need to divide the query results into many parts. CITAN is able to merge them back, even if records are repeated.
The data may be accessed via functions from the DBI interface. However, some typical tasks may be automated using, e.g., lbsDescriptiveStats (basic description of the whole sample or its subsets, called 'Surveys'), lbsGetCitations (gather citation sequences of selected authors), and lbsAssess (mass-compute impact functions' values for given citation sequences).
There are also some helpful functions (at an **EXPERIMENTAL** stage) which use the RGtk2 library (see Lawrence, Lang, 2010) to display suggestions on which documents or authors should be merged; see lbsFindDuplicateTitles and lbsFindDuplicateAuthors.
For a complete list of functions, call library(help = "CITAN").
Keywords: Hirsch's h-index, Egghe's g-index, L-statistics, S-statistics, bibliometrics, scientometrics, informetrics, webometrics, aggregation operators, arity-monotonicity, impact functions, impact assessment.
Author(s)
Marek Gagolewski
References
GTK+ Project, http://www.gtk.org
SQLite DBMS, http://www.sqlite.org/
Dubois D., Prade H., Testemale C. (1988). Weighted fuzzy pattern matching, Fuzzy Sets and Systems 28, 313-331.
Egghe L. (2006). Theory and practise of the g-index, Scientometrics 69(1), 131-152.
Gagolewski M., Grzegorzewski P. (2009). A geometric approach to the construction of scientific impact indices, Scientometrics 81(3), 617-634.
Gagolewski M., Debski M., Nowakiewicz M. (2009). Efficient algorithms for computing ”geometric” scientific impact indices, Research Report of Systems Research Institute, Polish Academy of Sciences RB/1/2009.
Gagolewski M., Grzegorzewski P. (2010a). S-statistics and their basic properties, In: Borgelt C. et al (Eds.), Combining Soft Computing and Statistical Methods in Data Analysis, Springer-Verlag, 281-288.
Gagolewski M., Grzegorzewski P. (2010b). Arity-monotonic extended aggregation operators, In: Hullermeier E., Kruse R., Hoffmann F. (Eds.), Information Processing and Management of Uncertainty in Knowledge-Based Systems, CCIS 80, Springer-Verlag, 693-702.
Gagolewski M. (2011). Bibliometric Impact Assessment with R and the CITAN Package, Journal of Informetrics 5(4), 678-692.
Gagolewski M., Grzegorzewski P. (2011a). Axiomatic characterizations of (quasi-) L-statistics and S-statistics and the Producer Assessment Problem, In: Proceedings of the 7th Conference of the European Society for Fuzzy Logic and Technology (EUSFLAT/LFA 2011), Atlantis Press, 53-58.
Grabisch M., Pap E., Marichal J.-L., Mesiar R. (2009). Aggregation Functions, Cambridge University Press.
Gagolewski M., Grzegorzewski P. (2011b). Possibilistic analysis of arity-monotonic aggregation operators and its relation to bibliometric impact assessment of individuals, International Journal of Approximate Reasoning 52(9), 1312-1324.
Hirsch J.E. (2005). An index to quantify an individual's scientific research output, Proceedings of the National Academy of Sciences 102(46), 16569-16572.
Kosmulski M. (2007). MAXPROD - A new index for assessment of the scientific output of an individual, and a comparison with the h-index, Cybermetrics 11(1).
Lawrence M., Lang D.T. (2010). RGtk2: A graphical user interface toolkit for R, Journal of Statistical Software 37(8), 1-52.
Woeginger G.J. (2008). An axiomatic characterization of the Hirsch-index, Mathematical Social Sciences 56(2), 224-232.
Zhang J., Stephens M.A. (2009). A New and Efficient Estimation Method for the Generalized Pareto Distribution, Technometrics 51(3), 316-325.