Generating a conflict of interest form automatically
Overview
While most of my grant proposals go to the NIH, which thankfully does not require an arcane conflict of interest (COI) document, I am sometimes part of a grant proposal that goes to the NSF or another agency that requires one. Instead of just asking for any actual conflicts of interest, these documents ask one to list every co-author from the last N years, which is fairly stupid these days, when most papers in the biomedical sciences have many co-authors. I hope the agencies soon get rid of this (in my opinion) pointless document. Until then, I have to do it.
I don’t want to retrieve all my co-authors and fill in the form by hand. In a previous post, I showed how one can use the bibliometrix R package to analyze a set of publications. Among other things, this approach returns all co-authors, which I will use here to make the COI table almost completely automated.
The RMarkdown/Quarto file to run this analysis is here.
Required packages
library(dplyr)
library(knitr)
library(flextable)
#remotes::install_github('massimoaria/bibliometrix') #uncomment to install the development version from GitHub
library(bibliometrix)
Loading data
As explained in a previous post, currently the best way to get all my papers is to download them from NIH’s “My Bibliography” and export them in MEDLINE format. Then read in the file with the code below.
#read bib file, turn file of references into data frame
pubs <- bibliometrix::convert2df("medline.txt", dbsource = "pubmed", format = "pubmed")
Converting your pubmed collection into a bibliographic dataframe
Done!
Generating affiliation field tag AU_UN from C1: Done!
Each row of the data frame created by the convert2df function is a publication, and the columns contain information about each publication. For a list of what each column variable codes for, see the bibliometrix website.
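To make that structure concrete, here is a small hypothetical stand-in for pubs. The values are invented for illustration; I’m assuming, as the code further down relies on, that AF holds the full author names of a publication separated by semicolons and that PY holds the publication year as a number.

```r
# Hypothetical stand-in for the data frame returned by convert2df():
# one row per publication, values invented purely for illustration.
pubs_demo <- data.frame(
  TI = c("FIRST PAPER", "SECOND PAPER", "THIRD PAPER"),  # titles
  AF = c("DOE, JANE;SMITH, JOHN",                         # full author names,
         "SMITH, JOHN;LEE, ANNA;DOE, JANE",               # separated by ";"
         "OLDER, AUTHOR;DOE, JANE"),
  PY = c(2019, 2018, 2015),                               # publication year
  stringsAsFactors = FALSE
)

nrow(pubs_demo)                              # one row per publication: 3
strsplit(pubs_demo$AF[2], split = ";")[[1]]  # the authors of the second paper
```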
Getting the right time period
The specific funding agency I’m currently writing a COI for (NIFA) requires co-authors from the last 3 years, so let’s get them. I don’t know if they mean 3 full years. I’m doing this in mid-2020, so to be on the safe side, I go back to 2017.
period_start = 2017
pubs_new = pubs[pubs$PY >= period_start, ] #keep publications from period_start onward
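If you reuse this for a different year or window, the start year can be computed instead of hard-coded. The coi_start_year() helper below is hypothetical (it is not part of bibliometrix), and the toy data frame only illustrates a base R behavior worth knowing: if any PY value were missing, the logical subsetting above would return an all-NA row for it, while wrapping the condition in which() drops such rows.

```r
# Hypothetical helper (not part of bibliometrix): first calendar year to keep
# when an agency asks for co-authors from the last n_years. Going back to
# January of (current year - n_years) always covers at least n_years full years.
coi_start_year <- function(n_years, today = Sys.Date()) {
  as.integer(format(today, "%Y")) - n_years
}

coi_start_year(3, today = as.Date("2020-06-15"))   # 2017, matching the choice above

# Toy data (invented) showing what happens with a missing publication year.
pubs_toy <- data.frame(AU = c("A", "B", "C"), PY = c(2019, NA, 2015))
pubs_toy[pubs_toy$PY >= 2017, ]          # 2 rows: the 2019 paper plus an all-NA row
pubs_toy[which(pubs_toy$PY >= 2017), ]   # 1 row: only the 2019 paper
```

With the helper, the filtering line could read pubs_new = pubs[which(pubs$PY >= coi_start_year(3)), ].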
I need the full names of the authors. They are stored for each publication in the AF field. This is the only information I need for the COI form. I pull it out, then do a bit of processing to get it in the right shape, then remove duplicates and sort.
allauthors = paste0(pubs_new$AF, collapse = ";") #merge all authors into one string
allauthors2 = unlist(strsplit(allauthors, split = ";")) #split into a vector of authors
authors = sort(unique(allauthors2)) #get unique authors, sorted alphabetically
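The same three steps can be run on a small, made-up stand-in for pubs_new, which makes it easy to inspect the intermediate results without a MEDLINE file. The trimws() call is my addition, assuming stray spaces around the separators can occur; without it, " LEE, ANNA" and "LEE, ANNA" would be counted as two different people.

```r
# Made-up stand-in for pubs_new (not real publication data).
pubs_new_demo <- data.frame(
  AF = c("DOE, JANE;SMITH, JOHN", "SMITH, JOHN; LEE, ANNA;DOE, JANE"),
  PY = c(2019, 2018),
  stringsAsFactors = FALSE
)

allauthors  <- paste0(pubs_new_demo$AF, collapse = ";")   # one long string
allauthors2 <- unlist(strsplit(allauthors, split = ";"))  # one name per element
allauthors2 <- trimws(allauthors2)                        # assumption: strip stray spaces
authors     <- sort(unique(allauthors2))                  # unique names, alphabetical

authors   # "DOE, JANE" "LEE, ANNA" "SMITH, JOHN"
```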
Note that I originally did the above steps using biblioAnalysis(pubs_new). However, this function/approach broke in a recent version of the package, and I realized that I can just use a few base R commands to get what I need, which is the approach shown above. If you use the biblioAnalysis() function, the authors are in the Authors field of the returned object.
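Once the names are collected, they still need to end up in the form. Here is a minimal sketch that puts them into a table and writes it to a CSV file for copying over. The column names, the blank Affiliation column, and the file name are assumptions about what a typical COI template asks for, so adjust them to the agency’s actual form.

```r
# Stand-in for the real, sorted author vector produced above.
authors <- c("DOE, JANE", "LEE, ANNA", "SMITH, JOHN")

# Hypothetical table layout: one row per co-author, affiliation left blank
# to be filled in by hand.
coi_table <- data.frame(
  Name = authors,
  Affiliation = "",
  stringsAsFactors = FALSE
)

# Hypothetical output file; write it wherever is convenient.
out_file <- file.path(tempdir(), "coi_coauthors.csv")
write.csv(coi_table, out_file, row.names = FALSE)
```

Since flextable is loaded, flextable::flextable(coi_table) should give a formatted version of the same table in the rendered document.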
Discussion
These kinds of COI documents that ask for all co-authors are, in my opinion, antiquated and should go away. In the meantime, a somewhat automated approach makes the task manageable. I will still have to make a few manual adjustments to the table, but overall it’s not too bad. I’m still glad that NIH does not require this.