https://guides.lib.umich.edu/citation
Defining Citation Analysis
What is it?
Citation analysis is the study of the impact and assumed quality of an article, an author, or an institution based on the number of times works and/or authors have been cited by others.
Why use it?
- To find out how much impact a particular article has had by showing which authors based some work upon it or cited it as an example within their own papers.
- To find out more about a field or topic, e.g., by reading the papers that cite a seminal work in that area.
- To determine how much impact a particular author has had by looking at the number of times his/her work has been cited by others.
Comparing Citation Analysis Sources
Here is a quick summary of what to expect from the three best-known citation analysis tools.

| | Web of Science | Scopus | Google Scholar |
|---|---|---|---|
| Subject Focus | | | Theoretically, all disciplines |
| Coverage | | | |
| Time Span | Some journals from 1900 | Some journals from the 1820s | Some citations as far back as the 1660s and 1670s |
| Updated | Weekly | Daily | Unknown but generally quick |
| Strengths | | | |
| Weaknesses | | | |
https://rdrr.io/github/massimoaria/bibliometrix/man/bibliometrix-package.html
bibliometrix-package: An R-Tool for Comprehensive Science Mapping Analysis
Description
Tool for quantitative research in scientometrics and bibliometrics. It provides various routines for importing bibliographic data from SCOPUS (<http://scopus.com>), Clarivate Analytics Web of Science (<http://www.webofknowledge.com/>), Dimensions (<https://www.dimensions.ai/>), Cochrane Library (<http://www.cochranelibrary.com/>) and PubMed (<https://www.ncbi.nlm.nih.gov/pubmed/>) databases, performing bibliometric analysis and building networks for co-citation, coupling, scientific collaboration and co-word analysis.
Details
INSTALLATION
- Stable version from CRAN:
install.packages("bibliometrix")
- Or development version from GitHub:
install.packages("devtools")
devtools::install_github("massimoaria/bibliometrix")
- Load "bibliometrix"
library('bibliometrix')
DATA LOADING AND CONVERTING
The export file can be read into R using the function *readFiles* (an example from the bibliometrix vignettes):
D <- readFiles("http://www.bibliometrix.org/datasets/savedrecs.bib")
D is a large character vector. The *readFiles* argument contains the names of files downloaded from SCOPUS, Clarivate Analytics WoS, or the Cochrane CDSR website.
The function *readFiles* combines all the text files into a single large character vector and converts the text to UTF-8 encoding.
e.g., D <- readFiles("file1.txt", "file2.txt", ...)
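As an illustration of what *readFiles* does, here is a rough base-R equivalent (a sketch, not the package's actual implementation; the function name read_files is made up):

```r
# Rough stand-in for readFiles: read several export files,
# convert each line to UTF-8, and combine them into one character vector.
read_files <- function(...) {
  files <- c(...)
  enc2utf8(unlist(lapply(files, readLines, warn = FALSE)))
}
```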
The object D can be converted into a data frame using the function *convert2df*:
M <- convert2df(D, dbsource = "isi", format = "bibtex")
*convert2df* creates a bibliographic data frame with cases corresponding to manuscripts and variables corresponding to Field Tags in the original export file. Each manuscript contains several elements, such as authors' names, title, keywords, and other information. All these elements constitute the bibliographic attributes of a document, also called metadata. Data frame columns are named using the standard Clarivate Analytics WoS Field Tag codification.
BIBLIOMETRIC ANALYSIS
The first step is to perform a descriptive analysis of the bibliographic data frame. The function *biblioAnalysis* calculates main bibliometric measures using this syntax:
results <- biblioAnalysis(M, sep = ";")
The function *biblioAnalysis* returns an object of class "bibliometrix".
To summarize the main results of the bibliometric analysis, use the generic function *summary*. It displays the main information about the bibliographic data frame and several tables, such as annual scientific production, top manuscripts per number of citations, most productive authors, most productive countries, total citations per country, most relevant sources (journals), and most relevant keywords. *summary* accepts two additional arguments: *k*, a formatting value that sets the number of rows in each table, and *pause*, a logical value (TRUE or FALSE) that enables or disables pausing during screen scrolling. For example, with k = 10 the first 10 authors, the first 10 sources, and so on are displayed.
S <- summary(object = results, k = 10, pause = FALSE)
Some basic plots can be drawn using the generic function *plot*:
plot(x = results, k = 10, pause = FALSE)
BIBLIOGRAPHIC NETWORK MATRICES
A manuscript's attributes are connected to each other through the manuscript itself: author(s) to journal, keywords to publication date, etc. These connections of different attributes generate bipartite networks that can be represented as rectangular matrices (Manuscripts x Attributes). Furthermore, scientific publications regularly contain references to other scientific works. This generates a further network, namely a co-citation or coupling network. These networks are analyzed in order to capture meaningful properties of the underlying research system, and in particular to determine the influence of bibliometric units such as scholars and journals.
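The bipartite-network idea can be made concrete with a small base-R sketch (invented toy data, independent of bibliometrix): if A is a binary manuscripts x references incidence matrix, the co-citation network is t(A) %*% A and the coupling network is A %*% t(A).

```r
# Toy incidence matrix: 3 manuscripts (rows) x 4 cited references (columns);
# A[i, j] == 1 means manuscript i cites reference j.
A <- matrix(c(1, 1, 0, 0,
              1, 1, 1, 0,
              0, 1, 1, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("paper", 1:3), paste0("ref", 1:4)))

# Co-citation: how often each pair of references is cited by the same manuscript.
cocitation <- t(A) %*% A

# Bibliographic coupling: how many cited references each pair of manuscripts shares.
coupling <- A %*% t(A)

cocitation["ref1", "ref2"]  # refs 1 and 2 are cited together by papers 1 and 2, so 2
```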
*biblioNetwork* function
The function *biblioNetwork* calculates, starting from a bibliographic data frame, the most frequently used networks: Coupling, Co-citation, Co-occurrences, and Collaboration. *biblioNetwork* uses two arguments to define the network to compute:
- the *analysis* argument can be "co-citation", "coupling", "collaboration", or "co-occurrences";
- the *network* argument can be "authors", "references", "sources", "countries", "universities", "keywords", "author_keywords", "titles", or "abstracts".
For example, the following code calculates a classical co-citation network:
NetMatrix <- biblioNetwork(M, analysis = "co-citation", network = "references", sep = ". ")
VISUALIZING BIBLIOGRAPHIC NETWORKS
All bibliographic networks can be graphically visualized or modeled. Using the function *networkPlot*, you can plot a network created by *biblioNetwork* using R routines.
The main argument of *networkPlot* is *type*, which indicates the network map layout: circle, kamada-kawai, mds, etc.
In the following, we propose some examples.
### Country Scientific Collaboration
# Create a country collaboration network
M <- metaTagExtraction(M, Field = "AU_CO", sep = ";")
NetMatrix <- biblioNetwork(M, analysis = "collaboration", network = "countries", sep = ";")
# Plot the network
net <- networkPlot(NetMatrix, n = dim(NetMatrix)[1], Title = "Country Collaboration", type = "circle", size = TRUE, remove.multiple = FALSE, labelsize = 0.8)
### Co-Citation Network
# Create a co-citation network
NetMatrix <- biblioNetwork(M, analysis = "co-citation", network = "references", sep = ". ")
# Plot the network
net <- networkPlot(NetMatrix, n = 30, Title = "Co-Citation Network", type = "fruchterman", size = TRUE, remove.multiple = FALSE, labelsize = 0.7, edgesize = 5)
### Keyword co-occurrences
# Create keyword co-occurrences network
NetMatrix <- biblioNetwork(M, analysis = "co-occurrences", network = "keywords", sep = ";")
# Plot the network
net <- networkPlot(NetMatrix, normalize = "association", weighted = TRUE, n = 30, Title = "Keyword Co-occurrences", type = "fruchterman", size = TRUE, edgesize = 5, labelsize = 0.7)
CO-WORD ANALYSIS: THE CONCEPTUAL STRUCTURE OF A FIELD
The aim of co-word analysis is to map the conceptual structure of a framework using word co-occurrences in a bibliographic collection. The analysis can be performed through dimensionality reduction techniques such as Multidimensional Scaling (MDS), Correspondence Analysis (CA), or Multiple Correspondence Analysis (MCA). Here, we show an example using the function *conceptualStructure*, which performs a CA or MCA to draw the conceptual structure of the field and K-means clustering to identify clusters of documents which express common concepts. Results are plotted on a two-dimensional map. *conceptualStructure* includes natural language processing (NLP) routines (see the function *termExtraction*) to extract terms from titles and abstracts. In addition, it implements the Porter stemming algorithm to reduce inflected (or sometimes derived) words to their word stem, base, or root form.
# Conceptual Structure using keywords (method="CA")
CS <- conceptualStructure(M, field = "ID", method = "CA", minDegree = 4, k.max = 8, stemming = FALSE, labelsize = 10, documents = 10)
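The general recipe (a similarity matrix, a low-dimensional embedding, then clustering) can be sketched in base R on an invented keyword co-occurrence matrix. This only illustrates the idea behind *conceptualStructure*, not its actual implementation; the K-means step uses fixed initial centers so the toy example is deterministic.

```r
# Invented symmetric keyword co-occurrence matrix (counts of joint appearances).
co <- matrix(c(10, 8, 1, 0,
                8, 9, 2, 1,
                1, 2, 7, 6,
                0, 1, 6, 8),
             nrow = 4,
             dimnames = list(c("network", "graph", "index", "citation"),
                             c("network", "graph", "index", "citation")))

# Turn co-occurrence (similarity) into a distance and embed the terms
# on a two-dimensional map with classical multidimensional scaling.
coords <- cmdscale(as.dist(max(co) - co), k = 2)

# Cluster the keyword profiles; initial centers are fixed to two rows
# so that this toy example converges deterministically.
clusters <- kmeans(co, centers = co[c("network", "index"), ])$cluster
```

On this toy data, "network"/"graph" and "index"/"citation" end up in the same clusters, mirroring the map-plus-clusters output of the real function.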
HISTORICAL DIRECT CITATION NETWORK
The historiographic map is a graph proposed by E. Garfield to represent a chronological network map of the most relevant direct citations resulting from a bibliographic collection. The function *histNetwork* generates a chronological direct citation network matrix which can be plotted using *histPlot*:
# Create a historical citation network
histResults <- histNetwork(M, n = 20, sep = ". ")
# Plot the historical direct citation network
net <- histPlot(histResults, size = FALSE, label = TRUE, arrowsize = 0.5)
Author(s)
Massimo Aria [cre, aut], Corrado Cuccurullo [aut]
Maintainer: Massimo Aria <aria@unina.it>
References
Aria, M. & Cuccurullo, C. (2017). *bibliometrix*: An R-tool for comprehensive science mapping analysis, *Journal of Informetrics*, 11(4), pp 959-975, Elsevier, DOI: 10.1016/j.joi.2017.08.007 (https://doi.org/10.1016/j.joi.2017.08.007).
Cuccurullo, C., Aria, M., & Sarto, F. (2016). Foundations and trends in performance management. A twenty-five years bibliometric analysis in business and public administration domains, *Scientometrics*, DOI: 10.1007/s11192-016-1948-8 (https://doi.org/10.1007/s11192-016-1948-8).
Cuccurullo, C., Aria, M., & Sarto, F. (2015). Twenty years of research on performance management in business and public administration domains. Presentation at the *Correspondence Analysis and Related Methods conference (CARME 2015)* in September 2015 (http://www.bibliometrix.org/documents/2015Carme_cuccurulloetal.pdf).
Sarto, F., Cuccurullo, C., & Aria, M. (2014). Exploring healthcare governance literature: systematic review and paths for future research. *Mecosan* (http://www.francoangeli.it/Riviste/Scheda_Rivista.aspx?IDarticolo=52780&lingua=en).
Cuccurullo, C., Aria, M., & Sarto, F. (2013). Twenty years of research on performance management in business and public administration domains. In *Academy of Management Proceedings* (Vol. 2013, No. 1, p. 14270). Academy of Management (https://doi.org/10.5465/AMBPP.2013.14270abstract).
massimoaria/bibliometrix documentation built on March 9, 2020, 3:58 p.m.
https://rdrr.io/cran/CITAN/man/CITAN-package.html
CITAN-package: CITation ANalysis toolpack
Description
CITAN is a library of functions useful in — but not limited to — quantitative research in the field of scientometrics. It contains various tools for preprocessing bibliographic data retrieved from, e.g., Elsevier's SciVerse Scopus and computing bibliometric impact of individuals. Moreover, some functions dealing with Pareto-Type II (GPD) and Discretized Pareto-Type II statistical models are included (e.g., Zhang-Stephens and MLE estimators, goodness-of-fit and two-sample tests, confidence intervals for the theoretical Hirsch index etc.). They may be used to describe and analyze many phenomena encountered in the social sciences.
Details
Fair and objective assessment methods for individual scientists have been the focus of scientometricians' attention since the very beginning of their discipline. A quantitative expression of some characteristics of the publication-citation process is assumed to be a predictor of broadly conceived scientific competence. It may be used, e.g., in building decision support systems for scientific quality control.
The h-index, proposed by J.E. Hirsch (2005), is among the most popular scientific impact indicators. An author who has published n papers has a Hirsch index equal to H if each of H of those publications was cited at least H times, and each of the remaining n-H items was cited no more than H times. This simple bibliometric tool quickly received much attention in the academic community and became a subject of intensive research. It was noted that, contrary to earlier approaches, i.e. publication count, citation count, etc., this measure captures both the productivity and the impact of an individual.
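Under this definition, the h-index is straightforward to compute; a minimal base-R sketch (the function name h_index is made up for illustration):

```r
# h-index: the largest H such that at least H of the papers
# have at least H citations each.
h_index <- function(citations) {
  sorted <- sort(citations, decreasing = TRUE)
  # Since sorted is non-increasing, sorted[i] >= i holds exactly for i = 1..H.
  sum(sorted >= seq_along(sorted))
}

h_index(c(10, 8, 5, 4, 3))  # -> 4: four papers have at least 4 citations each
```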
In a broader perspective, this issue is a special case of the so-called Producer Assessment Problem (PAP; see Gagolewski, Grzegorzewski, 2010b).
Consider a producer (e.g. a writer, scientist, artist, craftsman) and a nonempty set of his products (e.g. books, papers, works, goods). Suppose that each product is given a rating (of quality, popularity, etc.) which is a single number in I=[a,b], where a denotes the lowest admissible valuation. We typically choose I=[0,∞] (an interval in the extended real line). Some instances of the PAP are listed below.
| | Producer | Products | Rating method | Discipline |
|---|---|---|---|---|
| A | Scientist | Scientific articles | Number of citations | Scientometrics |
| B | Scientific institute | Scientists | The h-index | Scientometrics |
| C | Web server | Web pages | Number of in-links | Webometrics |
| D | Artist | Paintings | Auction price | Auctions |
| E | Billboard company | Advertisements | Sale results | Marketing |
Each possible state of a producer's activity can therefore be represented by a point x ∈ I^n for some n. Our aim is thus to construct and analyze, both theoretically and empirically, aggregation operators (cf. Grabisch et al., 2009) which can be used for rating producers. A family of such functions should take the two following aspects of a producer's quality into account:
- the ability to make highly-rated products,
- overall productivity, n.
For some more formal considerations please refer to (Gagolewski, Grzegorzewski, 2011).
To preprocess and analyze bibliometric data (cf. Gagolewski, 2011) retrieved from, e.g., Elsevier's SciVerse Scopus, we need the RSQLite package. It is an interface to the free SQLite database management system (see http://www.sqlite.org/). All data are stored in a so-called Local Bibliometric Storage (LBS), created with the lbsCreate function.
The data frames Scopus_ASJC and Scopus_SourceList contain various information on the current source coverage of SciVerse Scopus. They may be needed during the creation of the LBS; see lbsCreate for more details. License information: these data are publicly available and hence no special permission is needed to redistribute them (information from Elsevier).
CITAN is able to import publication data from Scopus CSV files (saved with settings "Output: complete format" or "Output: Citations only"; see Scopus_ReadCSV). Note that the output limit in Scopus is 2000 entries per file. Therefore, to perform bibliometric research we often need to divide the query results into many parts. CITAN is able to merge them back, even if records are repeated.
The data may be accessed via functions from the DBI interface. However, some typical tasks may be automated using, e.g., lbsDescriptiveStats (basic description of the whole sample or its subsets, called 'Surveys'), lbsGetCitations (gather citation sequences of selected authors), and lbsAssess (mass-compute impact functions' values for given citation sequences).
There are also some helpful functions (at an **EXPERIMENTAL** stage) which use the RGtk2 library (see Lawrence, Lang, 2010) to display suggestions on which documents or authors should be merged; see lbsFindDuplicateTitles and lbsFindDuplicateAuthors.
For a complete list of functions, call library(help = "CITAN").
Keywords: Hirsch's h-index, Egghe's g-index, L-statistics, S-statistics, bibliometrics, scientometrics, informetrics, webometrics, aggregation operators, arity-monotonicity, impact functions, impact assessment.
Author(s)
Marek Gagolewski
References
GTK+ Project, http://www.gtk.org
SQLite DBMS, http://www.sqlite.org/
Dubois D., Prade H., Testemale C. (1988). Weighted fuzzy pattern matching, Fuzzy Sets and Systems 28, 313-331.
Egghe L. (2006). Theory and practise of the g-index, Scientometrics 69(1), 131-152.
Gagolewski M., Grzegorzewski P. (2009). A geometric approach to the construction of scientific impact indices, Scientometrics 81(3), 617-634.
Gagolewski M., Debski M., Nowakiewicz M. (2009). Efficient algorithms for computing ”geometric” scientific impact indices, Research Report of Systems Research Institute, Polish Academy of Sciences RB/1/2009.
Gagolewski M., Grzegorzewski P. (2010a). S-statistics and their basic properties, In: Borgelt C. et al (Eds.), Combining Soft Computing and Statistical Methods in Data Analysis, Springer-Verlag, 281-288.
Gagolewski M., Grzegorzewski P. (2010b). Arity-monotonic extended aggregation operators, In: Hullermeier E., Kruse R., Hoffmann F. (Eds.), Information Processing and Management of Uncertainty in Knowledge-Based Systems, CCIS 80, Springer-Verlag, 693-702.
Gagolewski M. (2011). Bibliometric Impact Assessment with R and the CITAN Package, Journal of Informetrics 5(4), 678-692.
Gagolewski M., Grzegorzewski P. (2011a). Axiomatic characterizations of (quasi-) L-statistics and S-statistics and the Producer Assessment Problem, In: Proceedings of the 7th Conference of the European Society for Fuzzy Logic and Technology (EUSFLAT/LFA 2011), Atlantis Press, 53-58.
Grabisch M., Pap E., Marichal J.-L., Mesiar R. (2009). Aggregation Functions, Cambridge University Press.
Gagolewski M., Grzegorzewski P. (2011b). Possibilistic analysis of arity-monotonic aggregation operators and its relation to bibliometric impact assessment of individuals, International Journal of Approximate Reasoning 52(9), 1312-1324.
Hirsch J.E. (2005). An index to quantify an individual's scientific research output, Proceedings of the National Academy of Sciences 102(46), 16569-16572.
Kosmulski M. (2007). MAXPROD - A new index for assessment of the scientific output of an individual, and a comparison with the h-index, Cybermetrics 11(1).
Lawrence M., Lang D.T. (2010). RGtk2: A graphical user interface toolkit for R, Journal of Statistical Software 37(8), 1-52.
Woeginger G.J. (2008). An axiomatic characterization of the Hirsch-index, Mathematical Social Sciences 56(2), 224-232.
Zhang J., Stephens M.A. (2009). A New and Efficient Estimation Method for the Generalized Pareto Distribution, Technometrics 51(3), 316-325.