KWICer: Producing an annotated bibliography from a set of PDFs by quantifying keywords

March 3, 2026

This code rapidly creates an annotated bibliography that helps users navigate and synthesize a large body of literature. Users supply a set of PDF or DOCX files that they have identified as relevant to their question, along with lists of relevant search terms. The code converts the input documents into TXT files, trims the files to exclude extraneous text such as the references section of a scientific paper, and transforms the body of each document into tokens that are easily searchable. It then counts the number of times each supplied search term occurs in each source, identifies mentions of North American states and provinces, and writes the results to a CSV file. Sorting this CSV file by the frequency of relevant search terms identifies the documents most likely to address specific aspects of a research question. The code also generates a set of figures that characterize the nature and content of the sources, including 1) a map that shows the number of sources that referenced each North American state or province, 2) a graph that shows the number of sources that referenced different ecosystem types, and 3) a heatmap that shows the number of sources that mention intersecting search terms. Users can customize their search term lists to apply this code to their specific research question.
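The trim, tokenize, and count steps described above can be illustrated with a minimal Python sketch. This is not KWICer's own code, and the release's implementation language is not stated here. All function names (`trim_references`, `tokenize`, `count_terms`, `build_table`, `cooccurrence`, `to_csv`), the heading heuristic used to find the references section, and the sample texts are assumptions made for illustration only. The sketch starts from text that has already been extracted from the PDF or DOCX files.

```python
import csv
import io
import re
from collections import Counter
from itertools import combinations

# Assumed heuristic: a line consisting only of one of these headings starts the references.
REF_HEADING = re.compile(r"(?im)^\s*(references|literature cited|bibliography)\s*$")

def trim_references(text):
    """Drop everything from the last references-style heading onward."""
    matches = list(REF_HEADING.finditer(text))
    return text[:matches[-1].start()] if matches else text

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def count_terms(tokens, terms):
    """Count each search term; multi-word terms are matched as token sequences."""
    counts = {}
    for term in terms:
        words = tokenize(term)
        n = len(words)
        if n == 0:
            counts[term] = 0
            continue
        counts[term] = sum(
            1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words
        )
    return counts

def build_table(docs, terms):
    """Return one row per source: its name plus a count for every search term."""
    rows = []
    for name, text in docs.items():
        tokens = tokenize(trim_references(text))
        rows.append({"source": name, **count_terms(tokens, terms)})
    return rows

def cooccurrence(rows, terms):
    """Count the sources in which both terms of a pair appear at least once."""
    pairs = Counter()
    for row in rows:
        present = [t for t in terms if row[t] > 0]
        pairs.update(combinations(present, 2))
    return pairs

def to_csv(rows, terms):
    """Serialize the count table as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["source", *terms], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Made-up sample texts standing in for converted documents.
docs = {
    "paper_a.txt": "Wetland drought effects. Drought in Colorado wetland sites.\n"
                   "References\nSmith 2020 drought.",
    "paper_b.txt": "Grassland fire and drought in Nebraska.",
}
terms = ["drought", "wetland", "grassland fire"]

rows = build_table(docs, terms)
pairs = cooccurrence(rows, terms)
print(to_csv(rows, terms))
```

Sorting `rows` by one term's count gives the kind of ranking the CSV is meant to support, and the pair counts from `cooccurrence` are the sort of values a term-intersection heatmap could plot. Matching on whole tokens means "drought" does not match "droughts"; a real tool might stem terms or match prefixes instead.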

Publication Year	2026
Title	KWICer: Producing an annotated bibliography from a set of PDFs by quantifying keywords
DOI	10.5066/P1476GUY
Authors	Lydia N Bailey, Dana M Varner, Sarah E Whipple
Product Type	Software Release
Record Source	USGS Asset Identifier Service (AIS)
USGS Organization	Fort Collins Science Center
Rights	This work is licensed under CC BY 4.0

Lydia N Bailey, PhD

Biologist

Dana M Varner, PhD

Biologist

Sarah Whipple, Ph.D.

Biologist, National CASC
