| Title: | Open Science AI Tools for Systematic, Protocol-Based Literature Reviews |
|---|---|
| Description: | prismAId leverages generative AI models to screen and extract data from scientific literature. It provides efficient, replicable, and user-friendly methods for conducting systematic reviews. Designed in line with Open Science practices, prismAId requires no coding skills to use. |
| Authors: | Riccardo Boero [aut, cre] (ORCID: <https://orcid.org/0000-0002-7468-9096>) |
| Maintainer: | Riccardo Boero <[email protected]> |
| License: | AGPL (>= 3) |
| Version: | 0.12.0 |
| Built: | 2026-06-03 13:41:52 UTC |
| Source: | https://github.com/open-and-sustainable/prismaid |
Converts files in the input directory to the requested formats.
Convert(input_dir, selected_formats, tika_address = "", single_file = "", ocr_only = FALSE)
| Argument | Description |
|---|---|
| input_dir | Directory containing files to convert |
| selected_formats | Comma-separated list of target formats (e.g., "pdf,docx,html") |
| tika_address | Tika server address for OCR fallback (e.g., "localhost:9998"). Empty string disables OCR fallback. Defaults to "". |
| single_file | Convert only the specified PDF (PDF format only). Defaults to "". |
| ocr_only | Force OCR for PDFs via Tika (PDF format only). Defaults to FALSE. |
This function converts files in a directory to specified formats.
A string indicating the result of the conversion process
## Not run:
Convert("/path/to/files", "pdf,docx")
Convert("/path/to/files", "pdf", "localhost:9998")
Convert("/path/to/files", "pdf", "localhost:9998", "", TRUE)
## End(Not run)
Reads a file containing a list of URLs (one per line) and downloads the files.
DownloadURLList(path)
| Argument | Description |
|---|---|
| path | Path to the file containing URLs |
This function downloads files from URLs listed in a file.
A string indicating the result of the download process
## Not run:
DownloadURLList("/path/to/url_list.txt")
## End(Not run)
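The URL list is a plain text file with one URL per line. The sketch below shows one way to create such a file from R before calling DownloadURLList. The URLs are placeholders, and the package name `prismaid` used in the guarded call is an assumption based on the source repository name.

```r
# Placeholder URLs, for illustration only
urls <- c(
  "https://example.org/paper1.pdf",
  "https://example.org/paper2.pdf"
)

# Write one URL per line, the layout described for DownloadURLList()
url_file <- file.path(tempdir(), "url_list.txt")
writeLines(urls, url_file)

# Run the download only in an interactive session with the package installed
# (package name "prismaid" is assumed here)
if (interactive() && requireNamespace("prismaid", quietly = TRUE)) {
  result <- prismaid::DownloadURLList(url_file)
  print(result)
}
```

Keeping the list in a file under version control also documents which sources were retrieved for the review.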
Downloads PDF documents from a Zotero collection using the Zotero API. The TOML configuration must contain a '[zotero]' table and may contain an optional '[revaise]' block to document the download in a RevAIse record.
DownloadZotero(toml_content)
| Argument | Description |
|---|---|
| toml_content | Zotero TOML configuration as a string |
This function downloads PDFs from Zotero using a TOML configuration.
A string indicating the result of the download process
## Not run:
toml_content <- paste(readLines("zotero.toml"), collapse = "\n")
DownloadZotero(toml_content)
## End(Not run)
The input data must be structured in a TOML format, consisting of several sections and parameters.
RunReview(input_string)
| Argument | Description |
|---|---|
| input_string | A string representing the input data. |
This function interfaces with a shared library to perform a review process on the input data.
[project]
name: A string representing the project title. Example: "Use of LLM for systematic review".
author: The name of the project author. Example: "John Doe".
version: The version number for the project configuration. Example: "1.0".
[project.configuration]
input_directory: The file path to the directory containing manuscripts to be reviewed. Example: "/path/to/txt/files".
results_file_name: The path and base name for saving results (file extension will be added automatically). Example: "/path/to/save/results".
output_format: The format for output results. Options: "csv" (default) or "json".
log_level: Determines logging verbosity:
"low": Minimal logging (default).
"medium": Logs to standard output.
"high": Logs to a file (see user manual for details).
duplication: Runs model queries twice for debugging purposes. Options: "yes" or "no" (default).
cot_justification: Requests chain-of-thought justification from the model. Options: "yes" or "no" (default).
summary: Generates and saves summaries of manuscripts. Options: "yes" or "no" (default).
[project.llm]
Configuration for LLMs, supporting multiple instances (llm.1, llm.2, etc.) for ensemble reviews.
Parameters for each LLM include:
provider: The LLM service provider. Options: "OpenAI", "GoogleAI", "Cohere", "Anthropic", "DeepSeek", or "Perplexity".
api_key: API key for the provider. If empty, environment variables will be checked.
model: Model name. Options vary by provider:
OpenAI: "gpt-5-nano", "gpt-5-mini", "gpt-5.2", "gpt-5.1", "gpt-5", "o4-mini", "o3-mini", "o3", "o1-mini", "o1", "gpt-4.1-nano", "gpt-4.1-mini", "gpt-4.1", "gpt-4o-mini", "gpt-4o", "gpt-4-turbo", "gpt-3.5-turbo", or "" (default for cost optimization).
GoogleAI: "gemini-3-flash-preview", "gemini-3-pro-preview", "gemini-2.5-flash-lite", "gemini-2.5-flash", "gemini-2.5-pro", "gemini-2.0-flash-lite", "gemini-2.0-flash", "gemini-1.5-flash", "gemini-1.5-pro", or "" (default for cost optimization).
Cohere: "command-a-reasoning-08-2025", "command-a-03-2025", "command-r-08-2024", "command-r7b-12-2024", "command-r-plus", "command-r", "command-light", "command", or "" (default for cost optimization).
Anthropic: "claude-4-5-haiku", "claude-4-5-sonnet", "claude-4-5-opus", "claude-4-0-opus", "claude-4-0-sonnet", "claude-3-7-sonnet", "claude-3-5-sonnet", "claude-3-5-haiku", "claude-3-opus", "claude-3-sonnet", "claude-3-haiku", or "" (default for cost optimization).
DeepSeek: "deepseek-chat", "deepseek-reasoner", or "" (default for cost optimization).
Perplexity: "sonar-deep-research", "sonar-reasoning-pro", "sonar-pro", "sonar", or "" (default for cost optimization).
temperature: Controls model randomness. Range: 0 to 1 (0 to 2 for GoogleAI). Lower values reduce randomness.
tpm_limit: Tokens per minute limit before delaying prompts. Default: 0 (no delay).
rpm_limit: Requests per minute limit before delaying prompts. Default: 0 (no delay).
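As a rough illustration of the ensemble setup described above, the fragment below defines two LLM instances in a TOML string. The table names `[project.llm.1]` and `[project.llm.2]` are an assumption derived from the "llm.1, llm.2" numbering; the model names are taken from the lists above, and the temperature value is arbitrary.

```r
# Two LLM instances for an ensemble review.
# ASSUMPTION: instances are written as numbered sub-tables of [project.llm].
llm_toml <- '
[project.llm.1]
provider = "OpenAI"
api_key = ""
model = "gpt-4o-mini"
temperature = 0.2
tpm_limit = 0
rpm_limit = 0

[project.llm.2]
provider = "GoogleAI"
api_key = ""
model = "gemini-2.5-flash"
temperature = 0.2
tpm_limit = 0
rpm_limit = 0
'

# Print the fragment so it can be pasted into a full configuration
cat(llm_toml)
```

Leaving `api_key` empty relies on the environment-variable lookup mentioned above, which keeps credentials out of the configuration file.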
[prompt]
Defines the main components of the prompt for reviews.
persona: Optional text specifying the model's role. Example: "You are an experienced scientist...".
task: Required text framing the task for the model. Example: "Map the concepts discussed in a scientific paper...".
expected_result: Required text describing the expected output structure in JSON.
definitions: Optional text defining concepts to clarify instructions. Example: "'Interest rate' is defined as...".
example: Optional example to illustrate concepts.
failsafe: Specifies a fallback if the concepts cannot be identified. Example: "Respond with an empty '' value if concepts are unclear".
[review]
Defines the keys and possible values in the JSON object for the review.
Example entries:
[review.1]: key = "interest rate", values = [""]
[review.2]: key = "regression models", values = ["yes", "no"]
[review.3]: key = "geographical scale", values = ["world", "continent", "river basin"]
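When a review has many items, the `[review.N]` tables can be generated rather than typed by hand. The helper below is a hypothetical convenience function (`render_review` is not part of prismAId) that renders the three example entries above from a named R list.

```r
# Review items as a named list: name = key, value = allowed values
review_items <- list(
  "interest rate"      = c(""),
  "regression models"  = c("yes", "no"),
  "geographical scale" = c("world", "continent", "river basin")
)

# Hypothetical helper: render one [review.N] table per item
render_review <- function(items) {
  blocks <- vapply(seq_along(items), function(i) {
    # Quote each allowed value and join them into a TOML array body
    values <- paste0('"', items[[i]], '"', collapse = ", ")
    sprintf('[review.%d]\nkey = "%s"\nvalues = [%s]', i, names(items)[i], values)
  }, character(1))
  paste(blocks, collapse = "\n\n")
}

review_toml <- render_review(review_items)
cat(review_toml, "\n")
```

The resulting string can be appended to the rest of the configuration before it is passed to RunReview.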
A string indicating the result of the review process.
RunReview("example input")
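A fuller call might look like the sketch below, which assembles the sections documented above into one TOML string. Several details are assumptions: the `[project.llm.1]` and `[review.1]` table layout, the placeholder wording of `expected_result` and `failsafe`, and the package name `prismaid` in the guarded call. Paths are the placeholder paths from the parameter descriptions.

```r
# Minimal review configuration built from the documented sections.
# ASSUMPTION: LLM instances and review items are numbered sub-tables.
review_config <- '
[project]
name = "Use of LLM for systematic review"
author = "John Doe"
version = "1.0"

[project.configuration]
input_directory = "/path/to/txt/files"
results_file_name = "/path/to/save/results"
output_format = "csv"
log_level = "low"
duplication = "no"
cot_justification = "no"
summary = "no"

[project.llm.1]
provider = "OpenAI"
api_key = ""
model = ""
temperature = 0
tpm_limit = 0
rpm_limit = 0

[prompt]
persona = "You are an experienced scientist..."
task = "Map the concepts discussed in a scientific paper..."
expected_result = "Return a JSON object with the keys and values listed below."
failsafe = "Respond with an empty value if concepts are unclear."

[review.1]
key = "regression models"
values = ["yes", "no"]
'

# Run the review only in an interactive session with the package installed
# (package name "prismaid" is assumed here)
if (interactive() && requireNamespace("prismaid", quietly = TRUE)) {
  result <- prismaid::RunReview(review_config)
  print(result)
}
```

An empty `model` selects the provider default described above, and an empty `api_key` defers to environment variables.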
Processes a list of manuscripts applying multiple filters to identify which should be excluded from a systematic review. Supports both rule-based and AI-assisted screening.
Screening(input_string)
| Argument | Description |
|---|---|
| input_string | A string containing the TOML configuration for screening. |
This function screens manuscripts to identify items for exclusion based on various criteria.
The input data must be structured in a TOML format with the following sections:
[project]
name: Project title. Example: "Screening for climate change literature".
author: Name of the project author. Example: "Jane Smith".
version: Version number for the configuration. Example: "1.0".
input_file: Path to CSV/JSON file containing manuscripts to screen. Example: "/path/to/manuscripts.csv".
output_file: Path where screening results will be saved. Example: "/path/to/screening_results".
text_column: Name of column containing text or path to text files. Example: "abstract" or "text_file_path".
identifier_column: Column name for unique manuscript identifiers. Example: "doi" or "id".
output_format: Format for results. Options: "csv" or "json".
log_level: Logging verbosity. Options: "low", "medium", or "high".
[filters.deduplication]
enabled: Whether to apply deduplication. Options: true or false.
use_ai: Use AI for semantic similarity detection. Options: true or false.
compare_fields: List of fields to compare. Example: "title", "abstract", "doi".
[filters.language]
enabled: Whether to filter by language. Options: true or false.
accepted_languages: List of accepted language codes. Example: "en", "es", "fr".
use_ai: Use AI for language detection. Options: true or false.
[filters.article_type]
enabled: Whether to filter by article type. Options: true or false.
use_ai: Use AI for article classification. Options: true or false.
exclude_reviews: Exclude review articles. Options: true or false.
exclude_editorials: Exclude editorial articles. Options: true or false.
exclude_letters: Exclude letters to editor. Options: true or false.
exclude_theoretical: Exclude theoretical papers. Options: true or false.
exclude_empirical: Exclude empirical studies. Options: true or false.
exclude_methods: Exclude methodology papers. Options: true or false.
exclude_single_case: Exclude single case studies. Options: true or false.
exclude_sample: Exclude sample-based studies. Options: true or false.
include_types: Specific article types to include. Example: "research", "case_study".
[filters.topic_relevance]
enabled: Whether to filter by topic relevance. Options: true or false.
use_ai: Use AI for relevance scoring. Options: true or false.
topics: List of topic descriptions. Example: "climate change impacts", "adaptation strategies".
min_score: Minimum relevance score (0-1) to include. Example: 0.7.
score_weights.keyword_match: Weight for keyword matching (0-1). Example: 0.3.
score_weights.concept_match: Weight for concept matching (0-1). Example: 0.4.
score_weights.field_relevance: Weight for field relevance (0-1). Example: 0.3.
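The documentation above does not state how the three weights are combined into a relevance score. Purely as an illustration, the sketch below assumes a weighted average of three component scores and compares it with `min_score`; the component scores are made-up values and the combination rule is an assumption, not prismAId's documented behavior.

```r
# Weights and threshold taken from the example values above
weights <- c(keyword_match = 0.3, concept_match = 0.4, field_relevance = 0.3)
min_score <- 0.7

# Hypothetical component scores for one manuscript (illustrative values only)
components <- c(keyword_match = 0.9, concept_match = 0.8, field_relevance = 0.5)

# Check the documented 0-1 bounds before using the values
stopifnot(all(weights >= 0 & weights <= 1), min_score >= 0, min_score <= 1)

# ASSUMPTION: the overall score is a weighted average of the components
score <- sum(weights * components[names(weights)]) / sum(weights)

# The manuscript is kept when the score reaches the threshold
keep <- score >= min_score
print(round(score, 2))
print(keep)
```

With these numbers the score is about 0.74, so the manuscript would pass a threshold of 0.7 under the assumed rule.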
[filters.llm] (Optional, required if any filter has use_ai = true)
Configuration for AI models, supporting multiple instances (llm.1, llm.2, etc.).
Parameters for each LLM:
provider: LLM service provider. Options: "OpenAI", "GoogleAI", "Cohere", "Anthropic", "DeepSeek", or "Perplexity".
api_key: API key for the provider. If empty, environment variables will be checked.
model: Model name (see RunReview documentation for available models per provider).
temperature: Controls randomness (0-1, or 0-2 for GoogleAI).
tpm_limit: Tokens per minute limit. Default: 0 (no limit).
rpm_limit: Requests per minute limit. Default: 0 (no limit).
A string indicating the result of the screening process.
## Not run:
config <- '
[project]
name = "Climate Literature Screening"
author = "Research Team"
version = "1.0"
input_file = "/data/manuscripts.csv"
output_file = "/results/screening"
text_column = "abstract"
identifier_column = "doi"
output_format = "csv"
log_level = "medium"

[filters.deduplication]
enabled = true
use_ai = false
compare_fields = ["title", "doi"]

[filters.language]
enabled = true
accepted_languages = ["en"]
use_ai = false

[filters.article_type]
enabled = true
use_ai = false
exclude_reviews = true
exclude_editorials = true
'
Screening(config)
## End(Not run)
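For AI-assisted screening, a filter with use_ai = true needs an accompanying LLM configuration. The sketch below enables the topic relevance filter and adds one model. The `[filters.llm.1]` table name is an assumption based on the "llm.1, llm.2" numbering, and the package name `prismaid` in the guarded call is likewise assumed.

```r
# Screening configuration with an AI-assisted topic relevance filter.
# ASSUMPTION: LLM instances are written as numbered sub-tables of [filters.llm].
ai_screening_config <- '
[project]
name = "Climate Literature Screening"
author = "Research Team"
version = "1.0"
input_file = "/data/manuscripts.csv"
output_file = "/results/screening"
text_column = "abstract"
identifier_column = "doi"
output_format = "csv"
log_level = "medium"

[filters.topic_relevance]
enabled = true
use_ai = true
topics = ["climate change impacts", "adaptation strategies"]
min_score = 0.7

[filters.llm.1]
provider = "OpenAI"
api_key = ""
model = "gpt-4o-mini"
temperature = 0.0
tpm_limit = 0
rpm_limit = 0
'

# Run the screening only in an interactive session with the package installed
# (package name "prismaid" is assumed here)
if (interactive() && requireNamespace("prismaid", quietly = TRUE)) {
  result <- prismaid::Screening(ai_screening_config)
  print(result)
}
```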