class: center, middle, inverse, title-slide

.title[
# Exploring the COVID-19 Research Landscape in Malaysia: A Bibliometric Case Study
]
.subtitle[
## R confeRence 2024
]
.author[
### Tengku Muhammad Hanis Bin Tengku Mokhtar, PhD
]
.institute[
### Founder and Academic Trainer, Jom Research
]
.date[
### October 27, 2024
]

---
name: about-me
layout: false
class: about-me-slide, inverse, middle, center

## About me

<img style="border-radius: 50%;" src="tengku_hanis.jpg" width="150px"/>

<p style="font-size:20pt; font-weight:bold; font-family:lato">Tengku Muhammad Hanis Bin Tengku Mokhtar</p>

**Founder and Academic Trainer, Jom Research**
**MBBCh, MSc (Medical Statistics), <br>PhD (Public Health Epidemiology)**

[
tengkuhanis.netlify.app](https://tengkuhanis.netlify.app/)
[Twitter @tmhanis](https://twitter.com/tmhanis)
[GitHub @tengku-hanis](https://github.com/tengku-hanis)<br>
[LinkedIn @Tengku Muhammad Hanis](https://www.linkedin.com/in/tengku-muhammad-hanis-9a7222144/)
[
jomresearch.netlify.app](https://jomresearch.netlify.app/)

---

## COVID-19

- Coronavirus disease (COVID-19) is an infectious disease caused by the SARS-CoV-2 virus
- Previously, we were also affected by severe acute respiratory syndrome (SARS) and Middle East respiratory syndrome (MERS)
- The virus has affected countries globally, including Malaysia
- Since the outbreak, extensive research has been conducted to understand and manage the disease
- Thus, we can use a bibliometric analysis to explore the landscape of this research area

<img src="covid_pic.jpg" width="40%" style="display: block; margin: auto;" />

---

## Bibliometric analysis

- Bibliometric analysis is an umbrella term for a set of analyses
- Basically, it refers to an analysis of bibliographic information
- Bibliometric analysis is exploratory in nature
- In this talk, we are going to use both R (bibliometrix) and Python (pyBibX)

<img src="biblio_summary.png" width="70%" style="display: block; margin: auto;" />

---

## Data and search strategy

| Parameters    | Details             |
| ------------- |:-------------------:|
| Database      | Scopus |
| Date searched | September 30, 2024 |
| Search terms  | TITLE ( covid-19 ) OR TITLE ( covid ) OR TITLE ( 2019-ncov ) OR TITLE ( coronavirus ) AND ( LIMIT-TO ( AFFILCOUNTRY , "Malaysia" ) ) AND ( LIMIT-TO ( DOCTYPE , "ar" ) OR LIMIT-TO ( DOCTYPE , "cp" ) OR LIMIT-TO ( DOCTYPE , "re" ) OR LIMIT-TO ( DOCTYPE , "ch" ) OR LIMIT-TO ( DOCTYPE , "ed" ) OR LIMIT-TO ( DOCTYPE , "bk" ) ) |
| Results       | 5,995 |

---

## Analysis in R

Packages:

``` r
library(dplyr)
library(ggplot2)
library(bibliometrix)
```

Descriptive:

``` r
res <- biblioAnalysis(covid_data)
summary(res, k = 10)
```

```
## 
## 
## MAIN INFORMATION ABOUT DATA
## 
##  Timespan                              2008 : 2025 
##  Sources (Journals, Books, etc)        2238 
##  Documents                             5932 
##  Annual Growth Rate %                  8.5 
##  Document Average Age                  1.97 
##  Average citations per doc             14.37 
##  Average citations per year per doc    3.967 
```

---

## Analysis in R (cont.)
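The summary output on the previous slide reports an annual growth rate and an average citation count per document. As a rough illustration of what such figures mean, here is a minimal Python sketch. It assumes the growth rate is a compound annual growth rate between the first and last publication years, which the slides do not confirm. The function names and the toy records are hypothetical and are not part of bibliometrix or pyBibX.

``` python
from collections import Counter


def annual_growth_rate(years):
    """Compound annual growth rate (%) of yearly document counts.

    Assumption: growth is measured between the first and the last year only.
    """
    counts = Counter(years)
    first, last = min(counts), max(counts)
    span = last - first
    if span == 0:
        return 0.0
    return ((counts[last] / counts[first]) ** (1 / span) - 1) * 100


def mean_citations(citations):
    """Average number of citations per document."""
    return sum(citations) / len(citations)


# Hypothetical toy records: (publication year, citation count)
records = [
    (2020, 10), (2020, 4),
    (2021, 7), (2021, 3), (2021, 0),
    (2022, 2), (2022, 1), (2022, 5), (2022, 0),
]
years = [year for year, _ in records]
cites = [cited for _, cited in records]

growth = annual_growth_rate(years)
average = mean_citations(cites)
print(f"Annual growth rate %: {growth:.2f}")
print(f"Average citations per doc: {average:.2f}")
```

With these toy records, the yearly count doubles from 2 to 4 over two years, so the sketch prints a growth rate of 41.42 and an average of 3.56 citations per document.

---

## Analysis in R (cont.)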
Basic plots:

``` r
allplots <- plot(res) # we get 5 plots here
```

``` r
allplots$AnnualScientProd
```

<img src="r/asp.png" width="65%" style="display: block; margin: auto;" />

---

## Analysis in R (cont.)

``` r
allplots$AverTotCitperYear
```

<img src="r/atc.png" width="80%" style="display: block; margin: auto;" />

---

## Analysis in R (cont.)

Top authors in the research area:

``` r
authorProdOverTime(covid_data, k = 10)
```

<img src="r/top_author.png" width="80%" style="display: block; margin: auto;" />

---

## Analysis in R (cont.)

Thematic map:

``` r
thematicMap(covid_data, field = 'DE', n.labels = 5)
```

<img src="r/tm.png" width="75%" style="display: block; margin: auto;" />

---

## Analysis in Python

Install libraries:

- Make sure to install the reticulate package first (if using RStudio)
- Some of the functions do not run well in RStudio

``` r
reticulate::py_install('pyBibX')
```

Load libraries:

``` python
import numpy as np
import pandas as pd
import textwrap
from pyBibX.base import pbx_probe

# read data
```

---

## Analysis in Python (cont.)

Most common n-grams from authors' keywords (entry = 'kwa'):

``` python
rm = ['2019', 'disease', '19', 'pandemic']  # remove a few words
cov_data.get_top_ngrams(view = 'notebook', entry = 'kwa', ngrams = 2,
                        stop_words = ['en'], rmv_custom_words = rm,
                        wordsn = 15)
```

<img src="python/topbigrams.png" width="72%" style="display: block; margin: auto;" />

---

## Analysis in Python (cont.)

Top keyword evolution over the last 10 years:

``` python
rm2 = ['pandemic', 'covid', 'covid-19', 'sars-cov-2', 'coronavirus',
       'malaysia', 'covid-19 pandemic']
cov_data.plot_evolution_year(view = 'notebook', stop_words = ['en'],
                             rmv_custom_words = rm2, key = 'kwa', topn = 5,
                             txt_font_size = 12, start = 2015, end = 2024)
```

<img src="python/kw_evo0.png" width="50%" style="display: block; margin: auto;" />

---

## Analysis in Python (cont.)
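The keyword evolution plot on the previous slide ranks the most frequent author keywords in each year after dropping a custom list of words. The idea can be sketched in plain Python as follows. This is only an illustration of the counting step, not pyBibX's implementation; the function `top_keywords_by_year` and the toy records are hypothetical.

``` python
from collections import Counter, defaultdict


def top_keywords_by_year(records, remove, topn=1, start=None, end=None):
    """Return the `topn` most frequent keywords for each year in [start, end]."""
    per_year = defaultdict(Counter)
    for year, keywords in records:
        # Skip documents outside the requested time window
        if start is not None and year < start:
            continue
        if end is not None and year > end:
            continue
        for keyword in keywords:
            keyword = keyword.strip().lower()
            # Drop empty strings and the custom words we do not want to rank
            if keyword and keyword not in remove:
                per_year[year][keyword] += 1
    return {year: per_year[year].most_common(topn) for year in sorted(per_year)}


# Hypothetical toy records: (publication year, author keywords)
records = [
    (2018, ["coronavirus", "MERS"]),
    (2020, ["COVID-19", "online learning", "mental health"]),
    (2020, ["COVID-19", "online learning"]),
    (2021, ["COVID-19", "mental health", "vaccine"]),
    (2021, ["mental health", "Malaysia"]),
]
remove = {"covid-19", "coronavirus", "malaysia"}

evolution = top_keywords_by_year(records, remove, topn=1, start=2020, end=2021)
print(evolution)
```

For the toy data this prints `{2020: [('online learning', 2)], 2021: [('mental health', 2)]}`. The 2018 record falls outside the window and is ignored.

---

## Analysis in Python (cont.)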
Let's focus on the first 5 years (2015-2019):

``` python
cov_data.plot_evolution_year(view = 'notebook', stop_words = ['en'],
                             rmv_custom_words = rm2, key = 'kwa', topn = 5,
                             txt_font_size = 12, start = 2015, end = 2019)
```

<img src="python/kw_evo1.png" width="52%" style="display: block; margin: auto;" />

---

## Analysis in Python (cont.)

Next, let's see the next 5 years (2020-2024):

``` python
cov_data.plot_evolution_year(view = 'notebook', stop_words = ['en'],
                             rmv_custom_words = rm2, key = 'kwa', topn = 5,
                             txt_font_size = 12, start = 2019, end = 2024)
```

<img src="python/kw_evo2.png" width="52%" style="display: block; margin: auto;" />

---

## Analysis in Python (cont.)

Tree map of countries:

``` python
cov_data.tree_map(entry = 'ctr', topn = 20, size_x = 30, size_y = 15,
                  txt_font_size = 22)
```

<img src="python/treemap_country.png" width="75%" style="display: block; margin: auto;" />

---

## Analysis in Python (cont.)

Further details on country collaboration:

``` python
cov_data.network_adj_map(view = 'notebook', connections = False,
                         country_lst = [])
```

<img src="python/country_collab.png" width="55%" style="display: block; margin: auto;" />

---

## Analysis in Python (cont.)
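A country collaboration map is, at its core, a count of how often two countries appear on the same document. A minimal sketch of that counting step is shown below, assuming each document comes with a list of author countries. It is not how pyBibX or bibliometrix builds its network; the function `collaboration_counts` and the toy documents are hypothetical.

``` python
from collections import Counter
from itertools import combinations


def collaboration_counts(doc_countries):
    """Count co-occurrences of country pairs across documents."""
    pairs = Counter()
    for countries in doc_countries:
        # Count each country once per document, in a stable (sorted) order
        unique = sorted(set(countries))
        for first, second in combinations(unique, 2):
            pairs[(first, second)] += 1
    return pairs


# Hypothetical toy data: author countries for four documents
docs = [
    ["Malaysia", "Indonesia"],
    ["Malaysia", "United Kingdom", "Indonesia"],
    ["Malaysia"],
    ["Malaysia", "Malaysia", "India"],
]

pairs = collaboration_counts(docs)
print(pairs.most_common(1))
```

For the toy documents, the strongest link is Indonesia and Malaysia with two shared documents, so this prints `[(('Indonesia', 'Malaysia'), 2)]`. Single-country documents contribute no pairs.

---

## Analysis in Python (cont.)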
Alternatively, we can use the bibliometrix package from R:

``` r
# Extract authors' countries
author_country <- metaTagExtraction(dat, Field = "AU_CO")

# Country collab network
country_collab <- biblioNetwork(author_country, analysis = "collaboration",
                                network = "countries")

# Plot the network
set.seed(123)
networkPlot(country_collab, n = 20, cluster = "none",
            Title = "Collaboration between top 20 countries using bibliometrix (R)",
            type = "auto", size.cex = TRUE)
```

<img src="r/country_collab_r.png" width="50%" style="display: block; margin: auto;" />

---

## Conclusion

- We have a healthy research landscape, especially in terms of the availability of experts and collaboration between countries
- We see a few dominant research areas, such as COVID-19 research related to online learning and mental health
- COVID-19 research in Malaysia before 2019 was almost non-existent
- COVID-19 research related to AI, especially deep learning, has yet to be extensively studied in Malaysia

---

## Some remarks

- Both bibliometrix (R) and pyBibX (Python) have their own advantages
- bibliometrix vs pyBibX:
  - pyBibX is a little bit heavier to run (biased opinion)
  - pyBibX takes a bit more effort to run smoothly in RStudio
  - pyBibX is a larger package (and probably more powerful) than bibliometrix
- Both packages have several overlapping functions
- Both packages have limited documentation

<img src="bibliometrix_logo.png" width="40%" style="background-color: #4131e8; padding:2px;" /><img src="pybibx_logo.png" width="40%" style="background-color: #4131e8; padding:2px;" />

---

## Suggested readings

- About bibliometric analysis:
  - [bibliometrix : An R-tool for comprehensive science mapping analysis](https://doi.org/10.1016/j.joi.2017.08.007)
  - [How to conduct a bibliometric analysis: An overview and guidelines.
Journal of Business Research](https://doi.org/10.1016/j.jbusres.2021.04.070)
  - [Preliminary guideline for reporting bibliometric reviews of the biomedical literature (BIBLIO): a minimum requirements](https://doi.org/10.1186/s13643-023-02410-2)
- pyBibX: https://github.com/Valdecy/pyBibX
- bibliometrix: https://github.com/massimoaria/bibliometrix

---
class: center, middle

# Thank you!

.center[
Slides: https://tengku-hanis.github.io/bibliocovidmalaysia/
]

<img src="qrcode.png" width="25%" />