Package 'prismaid'

Title: Open Science AI Tools for Systematic, Protocol-Based Literature Reviews
Description: prismAId leverages generative AI models to screen and extract data from scientific literature. It provides efficient, replicable, and user-friendly methods for conducting systematic reviews. Designed in line with Open Science practices, prismAId requires no coding skills to use.
Authors: Riccardo Boero [aut, cre] (ORCID: <https://orcid.org/0000-0002-7468-9096>)
Maintainer: Riccardo Boero <[email protected]>
License: AGPL (>= 3)
Version: 0.12.0
Built: 2026-06-03 13:41:52 UTC
Source: https://github.com/open-and-sustainable/prismaid

Help Index


Convert Files to Different Formats

Description

Converts files in the input directory to the requested formats.

Usage

Convert(input_dir, selected_formats, tika_address = "", single_file = "",
  ocr_only = FALSE)

Arguments

input_dir

Directory containing files to convert

selected_formats

Comma-separated list of target formats (e.g., "pdf,docx,html")

tika_address

Tika server address for OCR fallback (e.g., "localhost:9998"). Empty string disables OCR fallback. Defaults to "".

single_file

Convert only the specified PDF (PDF format only). Defaults to "".

ocr_only

Force OCR for PDFs via Tika (PDF format only). Defaults to FALSE.

Details

This function converts files in a directory to specified formats.

Value

A string indicating the result of the conversion process

Examples

## Not run: 
Convert("/path/to/files", "pdf,docx")

Convert("/path/to/files", "pdf", "localhost:9998")

Convert("/path/to/files", "pdf", "localhost:9998", "", TRUE)

## End(Not run)
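
Because selected_formats is a single comma-separated string, it can be convenient to assemble it from an R character vector. The following is a minimal sketch, not part of the package: the helper name build_format_string is hypothetical, and the call to Convert() is left commented out because it needs the prismaid package and real input files.

```r
# Hypothetical helper (not part of prismaid): collapse a character vector
# of formats into the comma-separated string that Convert() expects.
build_format_string <- function(formats) {
  formats <- unique(trimws(formats))
  formats <- formats[nzchar(formats)]
  if (length(formats) == 0) stop("at least one format is required")
  paste(formats, collapse = ",")
}

selected_formats <- build_format_string(c("pdf", "docx", " html "))
print(selected_formats)

input_dir <- "/path/to/files"  # placeholder path, as in the examples
if (!dir.exists(input_dir)) {
  message("Input directory not found; skipping conversion: ", input_dir)
}

# Not run: requires the prismaid package and real input files.
# Convert(input_dir, selected_formats)
```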

Download Files from URL List

Description

Reads a file containing a list of URLs (one per line) and downloads the files.

Usage

DownloadURLList(path)

Arguments

path

Path to the file containing URLs

Details

This function downloads files from URLs listed in a file.

Value

A string indicating the result of the download process

Examples

## Not run: 
DownloadURLList("/path/to/url_list.txt")

## End(Not run)
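
The URL list is a plain text file with one URL per line, so it can be prepared from R before calling the function. This is a minimal sketch under stated assumptions: the URLs are illustrative placeholders, the file is written to a temporary directory, and the call to DownloadURLList() is commented out because it needs the prismaid package and network access.

```r
# Illustrative URLs only; replace them with the documents you need.
urls <- c(
  "https://example.org/paper1.pdf",
  "https://example.org/paper2.pdf"
)

# Write one URL per line, the layout DownloadURLList() reads.
url_file <- file.path(tempdir(), "url_list.txt")
writeLines(urls, url_file)

# Read the file back to confirm its contents before handing it over.
stopifnot(identical(readLines(url_file), urls))

# Not run: requires the prismaid package and network access.
# DownloadURLList(url_file)
```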

Download PDFs from Zotero

Description

Downloads PDF documents from a Zotero collection using the Zotero API. The TOML configuration must contain a '[zotero]' table and may contain an optional '[revaise]' block to document the download in a RevAIse record.

Usage

DownloadZotero(toml_content)

Arguments

toml_content

Zotero TOML configuration as a string

Details

This function downloads PDFs from Zotero using a TOML configuration.

Value

A string indicating the result of the download process

Examples

## Not run: 
toml_content <- paste(readLines("zotero.toml"), collapse = "\n")
DownloadZotero(toml_content)

## End(Not run)
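
Since the configuration must contain a '[zotero]' table, a quick check of the string before the call can catch an incomplete file early. The sketch below is only illustrative: has_toml_table is a hypothetical helper, not a prismaid function, it matches table headers textually rather than parsing TOML, and the keys inside each table are omitted because this page does not list them.

```r
# Hypothetical helper (not part of prismaid): TRUE if some line of the
# TOML text is exactly the header of the given table, e.g. "[zotero]".
has_toml_table <- function(toml, table) {
  lines <- trimws(strsplit(toml, "\n", fixed = TRUE)[[1]])
  any(lines == paste0("[", table, "]"))
}

# Placeholder content: the settings inside the table are deliberately
# omitted, since they are not documented on this help page.
toml_content <- paste(
  "[zotero]",
  "# ... Zotero settings go here ...",
  sep = "\n"
)

stopifnot(has_toml_table(toml_content, "zotero"))
if (!has_toml_table(toml_content, "revaise")) {
  message("No optional [revaise] block found in the configuration.")
}

# Not run: requires the prismaid package and Zotero credentials.
# DownloadZotero(toml_content)
```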

Run Review

Description

Runs a review of the manuscripts described by the input configuration. The input data must be structured in TOML format, consisting of several sections and parameters.

Usage

RunReview(input_string)

Arguments

input_string

A string containing the TOML configuration for the review.

Details

This function interfaces with a shared library to perform a review process on the input data.

[project]

  • name: A string representing the project title. Example: "Use of LLM for systematic review".

  • author: The name of the project author. Example: "John Doe".

  • version: The version number for the project configuration. Example: "1.0".

[project.configuration]

  • input_directory: The file path to the directory containing manuscripts to be reviewed. Example: "/path/to/txt/files".

  • results_file_name: The path and base name for saving results (file extension will be added automatically). Example: "/path/to/save/results".

  • output_format: The format for output results. Options: "csv" (default) or "json".

  • log_level: Determines logging verbosity:

    • "low": Minimal logging (default).

    • "medium": Logs to standard output.

    • "high": Logs to a file (see user manual for details).

  • duplication: Runs model queries twice for debugging purposes. Options: "yes" or "no" (default).

  • cot_justification: Requests chain-of-thought justification from the model. Options: "yes" or "no" (default).

  • summary: Generates and saves summaries of manuscripts. Options: "yes" or "no" (default).

[project.llm]

  • Configuration for LLMs, supporting multiple instances (llm.1, llm.2, etc.) for ensemble reviews.

  • Parameters for each LLM include:

    • provider: The LLM service provider. Options: "OpenAI", "GoogleAI", "Cohere", "Anthropic", "DeepSeek", or "Perplexity".

    • api_key: API key for the provider. If empty, environment variables will be checked.

    • model: Model name. Options vary by provider:

      • OpenAI: "gpt-5-nano", "gpt-5-mini", "gpt-5.2", "gpt-5.1", "gpt-5", "o4-mini", "o3-mini", "o3", "o1-mini", "o1", "gpt-4.1-nano", "gpt-4.1-mini", "gpt-4.1", "gpt-4o-mini", "gpt-4o", "gpt-4-turbo", "gpt-3.5-turbo", or "" (default for cost optimization).

      • GoogleAI: "gemini-3-flash-preview", "gemini-3-pro-preview", "gemini-2.5-flash-lite", "gemini-2.5-flash", "gemini-2.5-pro", "gemini-2.0-flash-lite", "gemini-2.0-flash", "gemini-1.5-flash", "gemini-1.5-pro", or "" (default for cost optimization).

      • Cohere: "command-a-reasoning-08-2025", "command-a-03-2025", "command-r-08-2024", "command-r7b-12-2024", "command-r-plus", "command-r", "command-light", "command", or "" (default for cost optimization).

      • Anthropic: "claude-4-5-haiku", "claude-4-5-sonnet", "claude-4-5-opus", "claude-4-0-opus", "claude-4-0-sonnet", "claude-3-7-sonnet", "claude-3-5-sonnet", "claude-3-5-haiku", "claude-3-opus", "claude-3-sonnet", "claude-3-haiku", or "" (default for cost optimization).

      • DeepSeek: "deepseek-chat", "deepseek-reasoner", or "" (default for cost optimization).

      • Perplexity: "sonar-deep-research", "sonar-reasoning-pro", "sonar-pro", "sonar", or "" (default for cost optimization).

    • temperature: Controls model randomness. Range: 0 to 1 (0 to 2 for GoogleAI). Lower values reduce randomness.

    • tpm_limit: Tokens per minute limit before delaying prompts. Default: 0 (no delay).

    • rpm_limit: Requests per minute limit before delaying prompts. Default: 0 (no delay).
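
The temperature range depends on the provider, which is easy to get wrong when several LLM instances are configured. A minimal sketch of a range check follows; valid_temperature is a hypothetical helper, not part of prismaid, and it encodes only the ranges stated above (0 to 1, or 0 to 2 for GoogleAI).

```r
# Hypothetical helper (not part of prismaid): check a temperature value
# against the documented range for the given provider.
valid_temperature <- function(provider, temperature) {
  upper <- if (identical(provider, "GoogleAI")) 2 else 1
  is.numeric(temperature) && length(temperature) == 1 &&
    !is.na(temperature) && temperature >= 0 && temperature <= upper
}

print(valid_temperature("OpenAI", 0.2))    # within 0 to 1
print(valid_temperature("OpenAI", 1.5))    # above the upper bound of 1
print(valid_temperature("GoogleAI", 1.5))  # within 0 to 2
```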

[prompt]

  • Defines the main components of the prompt for reviews.

  • persona: Optional text specifying the model's role. Example: "You are an experienced scientist...".

  • task: Required text framing the task for the model. Example: "Map the concepts discussed in a scientific paper...".

  • expected_result: Required text describing the expected output structure in JSON.

  • definitions: Optional text defining concepts to clarify instructions. Example: "'Interest rate' is defined as...".

  • example: Optional example to illustrate concepts.

  • failsafe: Specifies a fallback if the concepts cannot be identified. Example: "Respond with an empty '' value if concepts are unclear".

[review]

  • Defines the keys and possible values in the JSON object for the review.

  • Example entries:

    • [review.1]: key = "interest rate", values = [""]

    • [review.2]: key = "regression models", values = ["yes", "no"]

    • [review.3]: key = "geographical scale", values = ["world", "continent", "river basin"]
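
The numbered review entries can be generated from an R list instead of being typed by hand. The sketch below is an illustration under stated assumptions: review_toml is a hypothetical helper, not part of prismaid, and the layout it emits (a '[review]' header followed by '[review.N]' tables) is inferred from the entries listed above rather than confirmed by this page.

```r
# Keys and allowed values, taken from the example entries above.
review_items <- list(
  "interest rate" = "",
  "regression models" = c("yes", "no"),
  "geographical scale" = c("world", "continent", "river basin")
)

# Hypothetical helper (not part of prismaid): render one [review.N] table
# per item. The table layout is an assumption, see the lead-in.
review_toml <- function(items) {
  blocks <- vapply(seq_along(items), function(i) {
    values <- paste(sprintf('"%s"', items[[i]]), collapse = ", ")
    sprintf('[review.%d]\nkey = "%s"\nvalues = [%s]',
            i, names(items)[i], values)
  }, character(1))
  paste(c("[review]", blocks), collapse = "\n")
}

review_section <- review_toml(review_items)
cat(review_section, "\n")
```

The helper does not escape quotes or backslashes, so it is only suitable for simple keys and values like the ones shown.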

Value

A string indicating the result of the review process.

Examples

RunReview("example input")
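
A fuller configuration, assembled from the parameters described in Details, might look like the sketch below. Several parts are assumptions rather than documented facts: the table names '[project.llm.1]' and '[review.N]' are inferred from the "llm.1" and "[review.1]" notation, the wording of expected_result is invented for illustration, and the paths are placeholders. The call to RunReview() is commented out because it needs the prismaid package, input files, and an API key.

```r
config <- '
[project]
name = "Use of LLM for systematic review"
author = "John Doe"
version = "1.0"

[project.configuration]
input_directory = "/path/to/txt/files"
results_file_name = "/path/to/save/results"
output_format = "csv"
log_level = "low"
duplication = "no"
cot_justification = "no"
summary = "no"

# Assumed table name for the first LLM instance (llm.1).
[project.llm.1]
provider = "OpenAI"
api_key = ""
model = ""
temperature = 0.2
tpm_limit = 0
rpm_limit = 0

[prompt]
persona = "You are an experienced scientist..."
task = "Map the concepts discussed in a scientific paper..."
# Illustrative wording only; adapt to the structure you expect.
expected_result = "Output a JSON object with the keys and values listed below."
failsafe = "Respond with an empty value if concepts are unclear"

[review.1]
key = "regression models"
values = ["yes", "no"]

[review.2]
key = "geographical scale"
values = ["world", "continent", "river basin"]
'

# Textual sanity check (not a TOML parser): confirm the main tables exist.
required <- c("[project]", "[project.configuration]", "[prompt]")
config_lines <- trimws(strsplit(config, "\n", fixed = TRUE)[[1]])
missing_tables <- setdiff(required, config_lines)
stopifnot(length(missing_tables) == 0)

# Not run: requires the prismaid package, input files, and an API key.
# RunReview(config)
```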

Screen Manuscripts for Systematic Review

Description

Processes a list of manuscripts applying multiple filters to identify which should be excluded from a systematic review. Supports both rule-based and AI-assisted screening.

Usage

Screening(input_string)

Arguments

input_string

A string containing the TOML configuration for screening.

Details

This function screens manuscripts to identify items for exclusion based on various criteria.

The input data must be structured in a TOML format with the following sections:

[project]

  • name: Project title. Example: "Screening for climate change literature".

  • author: Name of the project author. Example: "Jane Smith".

  • version: Version number for the configuration. Example: "1.0".

  • input_file: Path to CSV/JSON file containing manuscripts to screen. Example: "/path/to/manuscripts.csv".

  • output_file: Path where screening results will be saved. Example: "/path/to/screening_results".

  • text_column: Name of column containing text or path to text files. Example: "abstract" or "text_file_path".

  • identifier_column: Column name for unique manuscript identifiers. Example: "doi" or "id".

  • output_format: Format for results. Options: "csv" or "json".

  • log_level: Logging verbosity. Options: "low", "medium", or "high".

[filters.deduplication]

  • enabled: Whether to apply deduplication. Options: true or false.

  • use_ai: Use AI for semantic similarity detection. Options: true or false.

  • compare_fields: List of fields to compare. Example: ["title", "abstract", "doi"].

[filters.language]

  • enabled: Whether to filter by language. Options: true or false.

  • accepted_languages: List of accepted language codes. Example: ["en", "es", "fr"].

  • use_ai: Use AI for language detection. Options: true or false.

[filters.article_type]

  • enabled: Whether to filter by article type. Options: true or false.

  • use_ai: Use AI for article classification. Options: true or false.

  • exclude_reviews: Exclude review articles. Options: true or false.

  • exclude_editorials: Exclude editorial articles. Options: true or false.

  • exclude_letters: Exclude letters to editor. Options: true or false.

  • exclude_theoretical: Exclude theoretical papers. Options: true or false.

  • exclude_empirical: Exclude empirical studies. Options: true or false.

  • exclude_methods: Exclude methodology papers. Options: true or false.

  • exclude_single_case: Exclude single case studies. Options: true or false.

  • exclude_sample: Exclude sample-based studies. Options: true or false.

  • include_types: List of specific article types to include. Example: ["research", "case_study"].

[filters.topic_relevance]

  • enabled: Whether to filter by topic relevance. Options: true or false.

  • use_ai: Use AI for relevance scoring. Options: true or false.

  • topics: List of topic descriptions. Example: ["climate change impacts", "adaptation strategies"].

  • min_score: Minimum relevance score (0-1) to include. Example: 0.7.

  • score_weights.keyword_match: Weight for keyword matching (0-1). Example: 0.3.

  • score_weights.concept_match: Weight for concept matching (0-1). Example: 0.4.

  • score_weights.field_relevance: Weight for field relevance (0-1). Example: 0.3.
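
To make the relationship between the weights and min_score concrete, here is a small worked sketch. It rests on a loud assumption: this page does not say how Screening() combines the three components, so the weighted sum below is only one plausible reading, and both combined_score and the component scores are hypothetical.

```r
# Example weights and threshold from the parameter descriptions above.
score_weights <- c(keyword_match = 0.3, concept_match = 0.4,
                   field_relevance = 0.3)
min_score <- 0.7

# The example weights sum to 1 (compared with a tolerance, not ==).
stopifnot(isTRUE(all.equal(sum(score_weights), 1)))

# ASSUMPTION: component scores in [0, 1] are combined as a weighted sum.
# The actual rule used by Screening() is not specified on this page.
combined_score <- function(components, weights) {
  stopifnot(setequal(names(components), names(weights)))
  sum(components[names(weights)] * weights)
}

# Hypothetical component scores for one manuscript.
components <- c(keyword_match = 1.0, concept_match = 0.5,
                field_relevance = 1.0)
score <- combined_score(components, score_weights)

# Under this assumption the manuscript is kept when score >= min_score.
keep <- score >= min_score
print(keep)
```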

[filters.llm] (Optional; required if any filter sets use_ai = true)

  • Configuration for AI models, supporting multiple instances (llm.1, llm.2, etc.).

  • Parameters for each LLM:

    • provider: LLM service provider. Options: "OpenAI", "GoogleAI", "Cohere", "Anthropic", "DeepSeek", or "Perplexity".

    • api_key: API key for the provider. If empty, environment variables will be checked.

    • model: Model name (see RunReview documentation for available models per provider).

    • temperature: Controls randomness (0-1, or 0-2 for GoogleAI).

    • tpm_limit: Tokens per minute limit. Default: 0 (no limit).

    • rpm_limit: Requests per minute limit. Default: 0 (no limit).

Value

A string indicating the result of the screening process.

Examples

## Not run: 
config <- '
[project]
name = "Climate Literature Screening"
author = "Research Team"
version = "1.0"
input_file = "/data/manuscripts.csv"
output_file = "/results/screening"
text_column = "abstract"
identifier_column = "doi"
output_format = "csv"
log_level = "medium"

[filters.deduplication]
enabled = true
use_ai = false
compare_fields = ["title", "doi"]

[filters.language]
enabled = true
accepted_languages = ["en"]
use_ai = false

[filters.article_type]
enabled = true
use_ai = false
exclude_reviews = true
exclude_editorials = true
'
Screening(config)

## End(Not run)
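
When a filter sets use_ai = true, the configuration also needs an LLM section. The fragment below sketches that case together with a simple pre-flight check. It is illustrative only: the table name '[filters.llm.1]' is inferred from the "llm.1" notation and is an assumption, the '[project]' table is omitted for brevity, and llm_config_ok is a hypothetical helper that inspects the text line by line rather than parsing TOML.

```r
# Configuration fragment; a complete file also needs the [project] table.
config_ai <- '
[filters.language]
enabled = true
accepted_languages = ["en"]
use_ai = true

# Assumed table name for the first LLM instance (llm.1).
[filters.llm.1]
provider = "OpenAI"
api_key = ""
model = ""
temperature = 0.2
tpm_limit = 0
rpm_limit = 0
'

# Hypothetical pre-flight check (not part of prismaid): if any filter sets
# use_ai = true, some [filters.llm] or [filters.llm.N] table must exist.
llm_config_ok <- function(toml) {
  lines <- trimws(strsplit(toml, "\n", fixed = TRUE)[[1]])
  uses_ai <- any(grepl("^use_ai[[:space:]]*=[[:space:]]*true$", lines))
  has_llm <- any(grepl("^\\[filters\\.llm(\\.[0-9]+)?\\]$", lines))
  !uses_ai || has_llm
}

stopifnot(llm_config_ok(config_ai))

# Not run: requires the prismaid package, input data, and an API key.
# Screening(config_ai)
```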