class: center, middle, inverse, title-slide

.title[
# Exploring past literature:
A bibliometric approach
]
.subtitle[
## R confeRence 2022
]
.date[
### November 27, 2022
]

---
name: about-me
layout: false
class: about-me-slide, inverse, middle, center

## About me
<img style="border-radius: 50%;" src="figs/tengku_hanis.jpg" width="150px"/>

### Tengku Muhammad Hanis Mokhtar

#### MBBCh, MSc (Medical Statistics), <br>PhD Student (Public Health Epidemiology)

.fade[Universiti Sains Malaysia<br>Kelantan, Malaysia]

[
tengkuhanis.netlify.app](https://tengkuhanis.netlify.app/) [
Twitter
@tmhanis](https://twitter.com/tmhanis) [
GitHub
@tengku-hanis](https://github.com/tengku-hanis)<br> [
LinkedIn
@Tengku Muhammad Hanis](https://www.linkedin.com/in/tengku-muhammad-hanis-9a7222144/)

---

## Outline

### - A bit about bibliometrics
### - Demo in R (not a live demo)
### - Recap: brief steps for a bibliometric analysis

---

## What is bibliometrics?

- Basically, an analysis of bibliographic information

- What we can do with bibliometric analysis:
    - Evaluate research progress
    - Quantitatively summarise research output
    - "Map" research contributions by author, institution, etc.
    - Explore research trends

- When to do a bibliometric analysis?
    - The literature is too large to review by hand
    - The scope (research question) is broad

- A lot of bibliometric papers are available on [Google Scholar (>45K papers)](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=allintitle%3A+bibliometrics+OR+bibliometric&btnG=)

---

## Bibliographic information

<img src="figs/paper-both.png" width="76%" style="display: block; margin: auto;" />

---

## Types of analysis

<img src="figs/type_analysis.png" width="90%" style="display: block; margin: auto;" />

---
class: middle, center
background-color: white

## Bibliometrics using bibliometrix

---

## Data

- Data were downloaded from [Scopus](https://www.scopus.com/search/form.uri?display=basic&clear=t&origin=searchadvanced&txGid=f676aa0f046baff36a28dd0a11c7427e#basic) through USM's access

- Search terms used:
_**(TITLE("covid-19") OR TITLE("covid 19") OR TITLE("2019-nCov") OR TITLE("SARS-CoV-2") OR TITLE("2019 novel coronavirus") OR TITLE("coronavirus disease 2019") OR TITLE("coronavirus disease-19")) AND (TITLE-ABS-KEY(Malaysia))**_

- The search results were further filtered to:
    1. Article
    2. Review
    3. Conference paper
    4. Data paper

- Final data: 1,265 papers

---

## Load packages

```r
library(tidyverse)
library(bibliometrix)
```

## Read data

For the Scopus database, the data should be downloaded in BibTeX format.

```r
dat <- convert2df("data/covid_msia.bib", dbsource = "scopus", format = "bibtex")
```

## Descriptive

```r
res <- biblioAnalysis(dat)
summary(res, k = 10)
```

`k` sets the number of results to display. Since the output is too long to show here, it is available [here](https://tengku-hanis.github.io/add_rconf2022/).

---

## Basic plots

```r
plot(res, k = 10)
```

.panelset.sideways[
.panel[.panel-name[**Plot 1**]

<img src="index_files/figure-html/basic plots2-1.png" width="80%" style="display: block; margin: auto;" />
]

.panel[.panel-name[**Plot 2**]

<img src="index_files/figure-html/basic plots3-1.png" width="80%" style="display: block; margin: auto;" />
]

.panel[.panel-name[**Plot 3**]

<img src="index_files/figure-html/basic plots4-1.png" width="80%" style="display: block; margin: auto;" />
]

.panel[.panel-name[**Plot 4**]

<img src="index_files/figure-html/basic plots5-1.png" width="80%" style="display: block; margin: auto;" />
]

.panel[.panel-name[**Plot 5**]

<img src="index_files/figure-html/basic plots6-1.png" width="80%" style="display: block; margin: auto;" />
]
]

---

## Collaborations

### Institutions

```r
# Add the authors' country field (AU_CO) to the data
MT <- metaTagExtraction(dat, Field = "AU_CO", sep = ";")

# Create the institutions collaboration network
inst_net <- biblioNetwork(MT, analysis = "collaboration", network = "universities")

# Plot network (seed fixed so the layout is reproducible)
set.seed(123)
int_collab <- networkPlot(inst_net, n = 20, cluster = "louvain", 
                          Title = "Institutions collaboration", type = "circle", 
                          size.cex = TRUE)
```

<img src="index_files/figure-html/inst_collab1-1.png" width="30%" style="display: block; margin: auto;" />

---

Let's look at just 2 institutions from each of the 3 clusters suggested by the Louvain algorithm.
```r
int_collab$cluster_res %>% 
  group_by(cluster) %>% 
  slice(1:2)
```

```
## # A tibble: 6 × 5
## # Groups:   cluster [3]
##   vertex                         cluster btw_centrality clos_centrality pagera…¹
##   <chr>                            <dbl>          <dbl>           <dbl>    <dbl>
## 1 universiti kebangsaan malaysia       1          15.3           0.0256   0.0658
## 2 universiti teknologi mara            1          10.2           0.0244   0.0573
## 3 university of malaya                 2          23.5           0.0256   0.0721
## 4 universiti sains malaysia            2          54.2           0.0278   0.0643
## 5 universiti malaysia terengganu       3           2.11          0.0238   0.0477
## 6 monash university malaysia           3          15.4           0.025    0.0497
## # … with abbreviated variable name ¹pagerank_centrality
```

The top 5 most active institutions, based on the institutions collaboration network:

```r
int_collab$nodeDegree %>% 
  slice(1:5) %>% 
  remove_rownames()
```

```
##                             node    degree
## 1      universiti sains malaysia 1.0000000
## 2 universiti kebangsaan malaysia 0.9946043
## 3           university of malaya 0.9262590
## 4      universiti teknologi mara 0.6402878
## 5      universiti putra malaysia 0.5359712
```

---

### Countries

```r
# Create the countries collaboration network
country_net <- biblioNetwork(MT, analysis = "collaboration", network = "countries")

# Plot network (seed fixed so the layout is reproducible)
set.seed(123)
country_collab <- networkPlot(country_net, n = 30, cluster = "louvain", 
                              Title = "Countries collaboration", type = "circle", 
                              size.cex = TRUE)
```

<img src="index_files/figure-html/country_collab1-1.png" width="45%" style="display: block; margin: auto;" />

---

Let's look at just 3 countries from each of the 2 clusters suggested by the Louvain algorithm.

```r
country_collab$cluster_res %>% 
  group_by(cluster) %>% 
  slice(1:3)
```

```
## # A tibble: 6 × 5
## # Groups:   cluster [2]
##   vertex         cluster btw_centrality clos_centrality pagerank_centrality
##   <chr>            <dbl>          <dbl>           <dbl>               <dbl>
## 1 malaysia             1          36.6           0.0244              0.0329
## 2 singapore            1          29.7           0.0233              0.0327
## 3 australia            1          27.2           0.0233              0.0327
## 4 china                2          21.2           0.0222              0.0374
## 5 indonesia            2           9.34          0.0204              0.0370
## 6 united kingdom       2          21.2           0.0222              0.0374
```

The top 5 most active countries, based on the countries collaboration network:
```r
country_collab$nodeDegree %>% 
  slice(1:5) %>% 
  remove_rownames()
```

```
##             node    degree
## 1       malaysia 1.0000000
## 2 united kingdom 0.1579826
## 3          china 0.1469606
## 4      indonesia 0.1456246
## 5            usa 0.1319305
```

---

## Lotka's law

Lotka's law describes the scientific productivity of authors in a given research area.

```r
L <- lotka(res)
L$p.value
```

```
## [1] 0.01455341
```

The two-sample Kolmogorov-Smirnov test shows a significant difference between the observed and theoretical distributions (p = 0.015). We can plot the two distributions to compare them.

```r
# Theoretical distribution with Beta = 2
Theoretical <- 10^(log10(L$C) - 2 * log10(L$AuthorProd[, 1]))

# Data wrangling for plotting
ldata <- 
  L$AuthorProd %>% 
  bind_cols(theory = Theoretical) %>% 
  pivot_longer(cols = c(Freq, theory), names_to = "distribution", values_to = "val_distr") %>% 
  mutate(distribution = as.factor(distribution), 
         distribution = fct_recode(distribution, Observed = "Freq", Theoretical = "theory"))
```

---

```r
ggplot(ldata, aes(N.Articles, val_distr, color = distribution)) +
  geom_line(linewidth = 1) +
  labs(color = "Distribution:") +
  ylab("Frequency of authors") +
  xlab("Number of articles") +
  theme_minimal() + 
  theme(legend.position = "top") +
  annotate("text", label = paste0("P-value = ", round(L$p.value, 3)), 
           x = 12, y = 0.1, size = 4)
```

<img src="index_files/figure-html/lotka3 hidden-1.png" width="40%" style="display: block; margin: auto;" />

In the plot, the observed distribution lies below the theoretical one, and the test above indicates that the difference is significant.

---

## Trending keywords

```r
trend_kw <- fieldByYear(dat, field = "DE", timespan = c(2020, 2022),
                        min.freq = 3, n.items = 6, graph = FALSE, labelsize = 3)
trend_kw$graph + theme_bw()
```

<img src="index_files/figure-html/trending keywords-1.png" width="40%" style="display: block; margin: auto;" />

---

## Biblioshiny

<img src="figs/biblioshiny.gif" width="60%" style="display: block; margin: auto;" />

---

## Steps for bibliometric analysis

1. Define scope, objectives and selection criteria
2. Specify techniques (based on objectives)
3. Come up with search terms
    - Always check with the databases whether the terms are valid
4. Search for data on databases:
    - Scopus, WoS, Digital Science Dimensions, PubMed, Lens or Cochrane
5. Review downloaded data/abstracts if needed (brief review)
6. Run the bibliometric analysis and report the findings

---

## References

- Donthu, N., Kumar, S., Mukherjee, D., Pandey, N., & Lim, W. M. (2021). How to conduct a bibliometric analysis: An overview and guidelines. Journal of Business Research, 133, 285–296. https://doi.org/10.1016/j.jbusres.2021.04.070

- Aria, M., & Cuccurullo, C. (2017). bibliometrix: An R-tool for comprehensive science mapping analysis. Journal of Informetrics, 11(4), 959–975. https://doi.org/10.1016/j.joi.2017.08.007

- Hood, W. W., & Wilson, C. S. (2001). The literature of bibliometrics, scientometrics, and informetrics. Scientometrics, 52, 291. https://doi.org/10.1023/A:1017919924342

---
class: center, middle

<img src="figs/logo2.png" width="60%" />

# Thanks!

.center[.footnote[
Material: https://tinyurl.com/biblio-rconference2022]]
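
---

## Appendix: Lotka's law by hand

A minimal sketch to accompany the Lotka's law slides, using base R only. The author counts are made up for illustration and are not from the Scopus data. The constant `C` is simply chosen so that the theoretical shares sum to 1 over the observed range; this is a simplifying assumption and may differ from how `lotka()` estimates `L$C`. The sketch also checks that the log10 expression used on the slides, `10^(log10(C) - 2 * log10(n))`, is the same quantity as `C / n^2`.

```r
# Hypothetical productivity table (illustrative numbers, not from the deck's data)
n_articles <- 1:5                      # number of articles per author
n_authors  <- c(600, 140, 60, 35, 20)  # authors with that many articles

# Observed share of authors at each productivity level
observed <- n_authors / sum(n_authors)

# Inverse-square form of Lotka's law: f(n) = C / n^2
# Assumption: C is set so the theoretical shares sum to 1 over n = 1..5
C <- 1 / sum(1 / n_articles^2)
theoretical <- C / n_articles^2

# The log10 form used on the slides gives the same values
stopifnot(isTRUE(all.equal(10^(log10(C) - 2 * log10(n_articles)), theoretical)))

# Side-by-side comparison of observed and theoretical shares
lotka_tbl <- data.frame(n_articles, observed, theoretical)
print(lotka_tbl)
```

With real data, `L$AuthorProd` and `L$C` from `lotka()` take the place of the hypothetical vectors and constant used here.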