Exploring past literature: A bibliometric approach

class: center, middle, inverse, title-slide

.title[
# Exploring past literature: <br>A bibliometric approach
]
.subtitle[
## R confeRence 2022
]
.date[
### November 27, 2022
]

---

name: about-me
layout: false
class: about-me-slide, inverse, middle, center

## About me

<div>
<style type="text/css">.xaringan-extra-logo {
width: 60px;
height: 70px;
z-index: 0;
background-image: url(figs/R_MY.jpg);
background-size: contain;
background-repeat: no-repeat;
position: absolute;
top:1em;right:1em;
}
</style>
<script>(function () {
  let tries = 0
  function addLogo () {
    if (typeof slideshow === 'undefined') {
      tries += 1
      if (tries < 10) {
        setTimeout(addLogo, 100)
      }
    } else {
      document.querySelectorAll('.remark-slide-content:not(.title-slide):not(.inverse):not(.hide_logo)')
        .forEach(function (slide) {
          const logo = document.createElement('div')
          logo.classList = 'xaringan-extra-logo'
          logo.href = null
          slide.appendChild(logo)
        })
    }
  }
  document.addEventListener('DOMContentLoaded', addLogo)
})()</script>
</div>

### Tengku Muhammad Hanis Mokhtar

#### MBBCh, MSc (Medical Statistics), <br>PhD Student (Public Health Epidemiology)

.fade[Universiti Sains Malaysia<br>Kelantan, Malaysia]

[<svg aria-hidden="true" role="img" viewBox="0 0 640 512" style="height:1em;width:1.25em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:currentColor;overflow:visible;position:relative;"><path d="M579.8 267.7c56.5-56.5 56.5-148 0-204.5c-50-50-128.8-56.5-186.3-15.4l-1.6 1.1c-14.4 10.3-17.7 30.3-7.4 44.6s30.3 17.7 44.6 7.4l1.6-1.1c32.1-22.9 76-19.3 103.8 8.6c31.5 31.5 31.5 82.5 0 114L422.3 334.8c-31.5 31.5-82.5 31.5-114 0c-27.9-27.9-31.5-71.8-8.6-103.8l1.1-1.6c10.3-14.4 6.9-34.4-7.4-44.6s-34.4-6.9-44.6 7.4l-1.1 1.6C206.5 251.2 213 330 263 380c56.5 56.5 148 56.5 204.5 0L579.8 267.7zM60.2 244.3c-56.5 56.5-56.5 148 0 204.5c50 50 128.8 56.5 186.3 15.4l1.6-1.1c14.4-10.3 17.7-30.3 7.4-44.6s-30.3-17.7-44.6-7.4l-1.6 1.1c-32.1 22.9-76 19.3-103.8-8.6C74 372 74 321 105.5 289.5L217.7 177.2c31.5-31.5 82.5-31.5 114 0c27.9 27.9 31.5 71.8 8.6 103.9l-1.1 1.6c-10.3 14.4-6.9 34.4 7.4 44.6s34.4 6.9 44.6-7.4l1.1-1.6C433.5 260.8 427 182 377 132c-56.5-56.5-148-56.5-204.5 0L60.2 244.3z"/></svg> tengkuhanis.netlify.app](https://tengkuhanis.netlify.app/)
[<svg aria-label="Twitter" role="img" viewBox="0 0 512 512" style="height:1em;width:1em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:currentColor;overflow:visible;position:relative;"><title>Twitter</title><path d="M459.37 151.716c.325 4.548.325 9.097.325 13.645 0 138.72-105.583 298.558-298.558 298.558-59.452 0-114.68-17.219-161.137-47.106 8.447.974 16.568 1.299 25.34 1.299 49.055 0 94.213-16.568 130.274-44.832-46.132-.975-84.792-31.188-98.112-72.772 6.498.974 12.995 1.624 19.818 1.624 9.421 0 18.843-1.3 27.614-3.573-48.081-9.747-84.143-51.98-84.143-102.985v-1.299c13.969 7.797 30.214 12.67 47.431 13.319-28.264-18.843-46.781-51.005-46.781-87.391 0-19.492 5.197-37.36 14.294-52.954 51.655 63.675 129.3 105.258 216.365 109.807-1.624-7.797-2.599-15.918-2.599-24.04 0-57.828 46.782-104.934 104.934-104.934 30.213 0 57.502 12.67 76.67 33.137 23.715-4.548 46.456-13.32 66.599-25.34-7.798 24.366-24.366 44.833-46.132 57.827 21.117-2.273 41.584-8.122 60.426-16.243-14.292 20.791-32.161 39.308-52.628 54.253z"/></svg> @tmhanis](https://twitter.com/tmhanis)
[<svg aria-label="GitHub" role="img" viewBox="0 0 496 512" style="height:1em;width:0.97em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:currentColor;overflow:visible;position:relative;"><title>GitHub</title><path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"/></svg> @tengku-hanis](https://github.com/tengku-hanis)<br>
[<svg aria-label="LinkedIn" role="img" viewBox="0 0 448 512" style="height:1em;width:0.88em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:currentColor;overflow:visible;position:relative;"><title>LinkedIn</title><path d="M416 32H31.9C14.3 32 0 46.5 0 64.3v383.4C0 465.5 14.3 480 31.9 480H416c17.6 0 32-14.5 32-32.3V64.3c0-17.8-14.4-32.3-32-32.3zM135.4 416H69V202.2h66.5V416zm-33.2-243c-21.3 0-38.5-17.3-38.5-38.5S80.9 96 102.2 96c21.2 0 38.5 17.3 38.5 38.5 0 21.3-17.2 38.5-38.5 38.5zm282.1 243h-66.4V312c0-24.8-.5-56.7-34.5-56.7-34.6 0-39.9 27-39.9 54.9V416h-66.4V202.2h63.7v29.2h.9c8.9-16.8 30.6-34.5 62.9-34.5 67.2 0 79.7 44.3 79.7 101.9V416z"/></svg> @Tengku Muhammad Hanis](https://www.linkedin.com/in/tengku-muhammad-hanis-9a7222144/)

---

## Outline

### - A bit about bibliometrics
### - Demo in R (not a live demo)
### - Recap: brief steps for a bibliometric analysis

---

## What is bibliometrics?

- Basically, an analysis of bibliographic information
- What we can do with bibliometric analysis:
  - Evaluate research progress
  - Quantitatively summarise research output
  - "Mapping" research contributions by author, institution, etc
  - Explore research trends
- When to do a bibliometric analysis?
  - There is too much literature to review manually
  - The scope (research question) is broad
- Many bibliometric papers are available on [Google Scholar (>45K papers)](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=allintitle%3A+bibliometrics+OR+bibliometric&btnG=)
  
---

## Bibliographic information

---

## Type of analysis

---
class: middle, center
background-color: white

## Bibliometrics using bibliometrix

---

## Data

- Data were downloaded from [Scopus](https://www.scopus.com/search/form.uri?display=basic&clear=t&origin=searchadvanced&txGid=f676aa0f046baff36a28dd0a11c7427e#basic) through USM's access
- Search terms used:  
  _**(TITLE("covid-19") OR TITLE("covid 19") OR TITLE("2019-nCov") OR TITLE("SARS-CoV-2") OR TITLE("2019 novel coronavirus") OR TITLE("coronavirus disease 2019") OR TITLE("coronavirus disease-19")) AND (TITLE-ABS-KEY(Malaysia))**_
- The search results were further limited to these document types:
  1. Article
  2. Review 
  3. Conference paper
  4. Data paper
- Final data - 1,265 papers

---

## Load packages

```r
library(tidyverse)
library(bibliometrix)
```

## Read data

Data from the Scopus database should be downloaded in BibTeX format.

```r
dat <- convert2df("data/covid_msia.bib", dbsource = "scopus", format = "bibtex")
```
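
Before any analysis, it can help to check that the imported records match the search filters. The sketch below uses a small made-up data frame (`toy_dat`, hypothetical values) in place of `dat`. It assumes the usual bibliometrix field tags `DT` (document type) and `PY` (publication year), and that document types are stored in upper case.

```r
# Toy stand-in for the data frame returned by convert2df() (values are made up)
toy_dat <- data.frame(
  TI = c("PAPER A", "PAPER B", "PAPER C", "PAPER D"),
  DT = c("ARTICLE", "REVIEW", "ARTICLE", "CONFERENCE PAPER"),
  PY = c(2020, 2021, 2021, 2022),
  stringsAsFactors = FALSE
)

# Document types that the Scopus search was limited to
allowed_types <- c("ARTICLE", "REVIEW", "CONFERENCE PAPER", "DATA PAPER")

# Count records per document type and per publication year
type_counts <- table(toy_dat$DT)
year_counts <- table(toy_dat$PY)

# Records whose document type falls outside the filter (ideally none)
unexpected <- toy_dat[!toy_dat$DT %in% allowed_types, ]

print(type_counts)
print(year_counts)
print(nrow(unexpected))
```

With the real data, the same checks would be run on `dat` instead of `toy_dat`.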

## Descriptive

```r
res <- biblioAnalysis(dat)
summary(res, k = 10)
```

`k` sets the number of top results displayed in each table. Since the output is too long to show here, it is available [here](https://tengku-hanis.github.io/add_rconf2022/).
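
To get a feel for what such a summary contains, the sketch below counts papers per author and per year by hand. It is only an illustration on made-up data (`toy_dat` and the author names are hypothetical), and it assumes that the `AU` field holds author names separated by `;`. It is not how `biblioAnalysis()` is implemented.

```r
# Toy records: AU holds semicolon-separated author names (made-up values)
toy_dat <- data.frame(
  AU = c("LEE A;TAN B", "TAN B", "LEE A;ALI C;TAN B"),
  PY = c(2020, 2021, 2021),
  stringsAsFactors = FALSE
)

# Split the author strings and count how many papers each author appears on
authors <- unlist(strsplit(toy_dat$AU, ";", fixed = TRUE))
author_counts <- sort(table(authors), decreasing = TRUE)

# Keep only the k most productive authors, mirroring the role of `k`
k <- 2
top_k <- head(author_counts, k)

# Annual scientific production: number of papers per year
annual_production <- table(toy_dat$PY)

print(top_k)
print(annual_production)
```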

---

## Basic plots

```r
plot(res, k = 10)
```

.panelset.sideways[
.panel[.panel-name[**Plot 1**]
<img src="index_files/figure-html/basic plots2-1.png" width="80%" style="display: block; margin: auto;" />
]

.panel[.panel-name[**Plot 2**]
<img src="index_files/figure-html/basic plots3-1.png" width="80%" style="display: block; margin: auto;" />
]

.panel[.panel-name[**Plot 3**]
<img src="index_files/figure-html/basic plots4-1.png" width="80%" style="display: block; margin: auto;" />
]

.panel[.panel-name[**Plot 4**]
<img src="index_files/figure-html/basic plots5-1.png" width="80%" style="display: block; margin: auto;" />
]

.panel[.panel-name[**Plot 5**]
<img src="index_files/figure-html/basic plots6-1.png" width="80%" style="display: block; margin: auto;" />
]
]

---

## Collaborations

### Institutions

```r
# Add the country tag (AU_CO); it is used for the countries network later
MT <- metaTagExtraction(dat, Field = "AU_CO", sep = ";")

# Create the institution collaboration network
inst_net <- biblioNetwork(MT, analysis = "collaboration", network = "universities")

# Plot network
set.seed(123)
int_collab <- networkPlot(inst_net, n = 20, cluster = "louvain", Title = "Institutions collaboration",
    type = "circle", size.cex = TRUE)
```
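
A collaboration network can be thought of as the product of a paper-by-institution matrix with itself. The sketch below builds one by hand on made-up data (the matrix `A` and the institution names are hypothetical). Scaling the degree so that the most connected node equals 1 is an assumption about how the plot's node degree is reported, not a confirmed detail of bibliometrix.

```r
# Rows are papers, columns are institutions; 1 means the institution is on the paper
A <- matrix(
  c(1, 1, 0,
    1, 0, 1,
    1, 1, 0,
    0, 0, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("paper", 1:4), c("inst_a", "inst_b", "inst_c"))
)

# Co-occurrence matrix: entry [i, j] counts the papers shared by institutions i and j
collab <- crossprod(A)   # same as t(A) %*% A
diag(collab) <- 0        # drop self-links

# Weighted degree (total shared papers), scaled so the top node equals 1 (assumption)
strength <- rowSums(collab)
scaled_degree <- strength / max(strength)

print(collab)
print(round(scaled_degree, 3))
```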

---

Let's look at the first 2 institutions in each of the 3 clusters identified by the Louvain algorithm.

```r
int_collab$cluster_res %>%
    group_by(cluster) %>%
    slice(1:2)
```

```
## # A tibble: 6 × 5
## # Groups:   cluster [3]
##   vertex                         cluster btw_centrality clos_centrality pagera…¹
##   <chr>                            <dbl>          <dbl>           <dbl>    <dbl>
## 1 universiti kebangsaan malaysia       1          15.3           0.0256   0.0658
## 2 universiti teknologi mara            1          10.2           0.0244   0.0573
## 3 university of malaya                 2          23.5           0.0256   0.0721
## 4 universiti sains malaysia            2          54.2           0.0278   0.0643
## 5 universiti malaysia terengganu       3           2.11          0.0238   0.0477
## 6 monash university malaysia           3          15.4           0.025    0.0497
## # … with abbreviated variable name ¹pagerank_centrality
```
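
The clustering step can be tried on its own with igraph, which bibliometrix depends on. This is a minimal sketch on a made-up weighted network (the matrix `m` and node names are hypothetical): two tightly linked groups joined by one weak link. It illustrates Louvain community detection in general, and is not the exact call that `networkPlot()` makes internally.

```r
library(igraph)

# Hypothetical collaboration weights: nodes a-c and d-f form two tight groups
m <- matrix(0, 6, 6, dimnames = list(letters[1:6], letters[1:6]))
m[1:3, 1:3] <- 3          # strong links inside the first group
m[4:6, 4:6] <- 3          # strong links inside the second group
m["c", "d"] <- 1          # one weak link between the groups
m["d", "c"] <- 1
diag(m) <- 0              # no self-links

# Build an undirected weighted graph and run Louvain community detection
g <- graph_from_adjacency_matrix(m, mode = "undirected", weighted = TRUE, diag = FALSE)
set.seed(123)
comm <- cluster_louvain(g)

# Cluster label for each node
clusters <- setNames(as.integer(membership(comm)), V(g)$name)
print(clusters)
```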

The top 5 most active institutions based on the institutions collaboration plot.

```r
int_collab$nodeDegree %>%
    slice(1:5) %>%
    remove_rownames()
```

```
##                             node    degree
## 1      universiti sains malaysia 1.0000000
## 2 universiti kebangsaan malaysia 0.9946043
## 3           university of malaya 0.9262590
## 4      universiti teknologi mara 0.6402878
## 5      universiti putra malaysia 0.5359712
```

---

### Countries

```r
# Create the country collaboration network
country_net <- biblioNetwork(MT, analysis = "collaboration", network = "countries")

# Plot network
set.seed(123)
country_collab <- networkPlot(country_net, n = 30, cluster = "louvain", Title = "Countries collaboration",
    type = "circle", size.cex = TRUE)
```

---

Let's look at the first 3 countries in each of the 2 clusters identified by the Louvain algorithm.

```r
country_collab$cluster_res %>%
    group_by(cluster) %>%
    slice(1:3)
```

```
## # A tibble: 6 × 5
## # Groups:   cluster [2]
##   vertex         cluster btw_centrality clos_centrality pagerank_centrality
##   <chr>            <dbl>          <dbl>           <dbl>               <dbl>
## 1 malaysia             1          36.6           0.0244              0.0329
## 2 singapore            1          29.7           0.0233              0.0327
## 3 australia            1          27.2           0.0233              0.0327
## 4 china                2          21.2           0.0222              0.0374
## 5 indonesia            2           9.34          0.0204              0.0370
## 6 united kingdom       2          21.2           0.0222              0.0374
```

The top 5 most active countries based on the countries collaboration plot.

```r
country_collab$nodeDegree %>%
    slice(1:5) %>%
    remove_rownames()
```

```
##             node    degree
## 1       malaysia 1.0000000
## 2 united kingdom 0.1579826
## 3          china 0.1469606
## 4      indonesia 0.1456246
## 5            usa 0.1319305
```

---

## Lotka's law

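Lotka's law describes the distribution of scientific productivity among authors in a given research area: the number of authors who publish *n* papers is roughly proportional to 1/*n*².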

```r
L <- lotka(res)
L$p.value
```

```
## [1] 0.01455341
```

The two-sample Kolmogorov-Smirnov test (p = 0.015) showed a significant difference between the observed and theoretical distributions.
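
The idea behind this comparison can be sketched in base R. The author counts below are made up, and the steps (a log-log regression to estimate the exponent, then `ks.test()` against an inverse-square curve) are an illustration of the approach under those assumptions, not necessarily the exact procedure inside `lotka()`.

```r
# Hypothetical author productivity table: number of authors with n articles
prod <- data.frame(
  n_articles = 1:5,
  n_authors  = c(3600, 850, 420, 230, 150)
)
prod$freq <- prod$n_authors / sum(prod$n_authors)  # observed share of authors

# Estimate the exponent from a log-log fit: log10(freq) = log10(C) - beta * log10(n)
fit  <- lm(log10(freq) ~ log10(n_articles), data = prod)
beta <- -unname(coef(fit)[2])
C    <- 10^unname(coef(fit)[1])

# Theoretical shares under Lotka's law with beta fixed at 2
theoretical <- C / prod$n_articles^2

# Two-sample Kolmogorov-Smirnov test: observed vs theoretical shares
ks <- ks.test(prod$freq, theoretical, exact = FALSE)

print(round(beta, 2))
print(ks$p.value)
```

In this toy example the estimated exponent is close to 2 and the test does not reject, unlike the COVID-19 data above, where the p-value falls below 0.05.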

We can further plot the two distributions.

```r
# Theoretical distribution with Beta = 2
Theoretical <- 10^(log10(L$C) - 2 * log10(L$AuthorProd[, 1]))

# Data wrangling for plotting
ldata <- L$AuthorProd %>%
    bind_cols(theory = Theoretical) %>%
    pivot_longer(cols = c(Freq, theory), names_to = "distribution", values_to = "val_distr") %>%
    mutate(distribution = as.factor(distribution), distribution = fct_recode(distribution,
        Observed = "Freq", Theoretical = "theory"))
```

---

```r
ggplot(ldata, aes(N.Articles, val_distr, color = distribution)) + geom_line(linewidth = 1) +
    labs(color = "Distribution:") + ylab("Frequency of authors") + xlab("Number of articles") +
    theme_minimal() + theme(legend.position = "top") + annotate("text", label = paste0("P-value = ",
    round(L$p.value, 3)), x = 12, y = 0.1, size = 4)
```

The Lotka's law plot indicates that the observed distribution is lower than the theoretical (expected) distribution, in line with the significant test result.

---

## Trending keywords

```r
trend_kw <- fieldByYear(dat, field = "DE", timespan = c(2020, 2022), min.freq = 3,
    n.items = 6, graph = FALSE, labelsize = 3)
trend_kw$graph + theme_bw()
```
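
A rough, hand-rolled version of this idea is to split the author keywords and summarise each one by how often and when it appears. The sketch below uses made-up records (`toy_dat` is hypothetical) and assumes that the `DE` field holds keywords separated by `;`. It is in the same spirit as `fieldByYear()`, but is not its exact computation.

```r
# Toy records: DE holds semicolon-separated author keywords (made-up values)
toy_dat <- data.frame(
  DE = c("COVID-19;LOCKDOWN", "COVID-19;VACCINE",
         "VACCINE;MENTAL HEALTH", "COVID-19;VACCINE"),
  PY = c(2020, 2021, 2022, 2022),
  stringsAsFactors = FALSE
)

# One row per keyword occurrence, paired with the publication year
kw_list <- strsplit(toy_dat$DE, ";", fixed = TRUE)
kw_long <- data.frame(
  keyword = unlist(kw_list),
  year    = rep(toy_dat$PY, lengths(kw_list)),
  stringsAsFactors = FALSE
)

# Median publication year and frequency for each keyword
kw_summary <- aggregate(year ~ keyword, data = kw_long, FUN = median)
names(kw_summary)[2] <- "median_year"
kw_summary$freq <- as.vector(table(kw_long$keyword)[kw_summary$keyword])

# Most frequent keywords first
kw_summary <- kw_summary[order(-kw_summary$freq), ]
print(kw_summary)
```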

---

## Biblioshiny

---

## Steps for bibliometric analysis

1. Define scope, objectives and selection criteria 
2. Specify techniques (based on objectives)
3. Come up with search terms; always check with the databases whether the terms are valid
4. Data searching on databases:
  - Scopus, Web of Science (WoS), Digital Science Dimensions, PubMed, Lens or Cochrane
5. Review downloaded data/abstracts if needed (brief review)
6. Run the bibliometric analysis and report the findings

---

## References

- Donthu, N., Kumar, S., Mukherjee, D., Pandey, N., & Lim, W. M. (2021). How to conduct a bibliometric analysis: An overview and guidelines. Journal of Business Research, 133, 285–296. https://doi.org/10.1016/j.jbusres.2021.04.070
- Aria, M., & Cuccurullo, C. (2017). bibliometrix: An R-tool for comprehensive science mapping analysis. Journal of Informetrics, 11(4), 959–975. https://doi.org/10.1016/j.joi.2017.08.007
- Hood, W. W., & Wilson, C. S. (2001). The literature of bibliometrics, scientometrics, and informetrics. Scientometrics, 52, 291. https://doi.org/10.1023/A:1017919924342

---
class: center, middle

<img src="figs/logo2.png" width="60%" />
# Thanks!

.center[.footnote[<svg aria-hidden="true" role="img" viewBox="0 0 640 512" style="height:1em;width:1.25em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:currentColor;overflow:visible;position:relative;"><path d="M579.8 267.7c56.5-56.5 56.5-148 0-204.5c-50-50-128.8-56.5-186.3-15.4l-1.6 1.1c-14.4 10.3-17.7 30.3-7.4 44.6s30.3 17.7 44.6 7.4l1.6-1.1c32.1-22.9 76-19.3 103.8 8.6c31.5 31.5 31.5 82.5 0 114L422.3 334.8c-31.5 31.5-82.5 31.5-114 0c-27.9-27.9-31.5-71.8-8.6-103.8l1.1-1.6c10.3-14.4 6.9-34.4-7.4-44.6s-34.4-6.9-44.6 7.4l-1.1 1.6C206.5 251.2 213 330 263 380c56.5 56.5 148 56.5 204.5 0L579.8 267.7zM60.2 244.3c-56.5 56.5-56.5 148 0 204.5c50 50 128.8 56.5 186.3 15.4l1.6-1.1c14.4-10.3 17.7-30.3 7.4-44.6s-30.3-17.7-44.6-7.4l-1.6 1.1c-32.1 22.9-76 19.3-103.8-8.6C74 372 74 321 105.5 289.5L217.7 177.2c31.5-31.5 82.5-31.5 114 0c27.9 27.9 31.5 71.8 8.6 103.9l-1.1 1.6c-10.3 14.4-6.9 34.4 7.4 44.6s34.4 6.9 44.6-7.4l1.1-1.6C433.5 260.8 427 182 377 132c-56.5-56.5-148-56.5-204.5 0L60.2 244.3z"/></svg> Material: https://tinyurl.com/biblio-rconference2022]]