AI for Research — Tony Myers

Research lifecycle

Match the model to the task

The question is not whether AI can be inserted into research. It is whether the contribution is defensible, auditable, and proportionate to the evidence. The stages below show where AI can add genuine value — and where the risks concentrate.

Conception

Data collection

Analysis

Interpretation

Writing

Peer review

Capability alignment: match the prompt to the research task

Each pipeline stage has a productive AI role and a clear boundary. The principle throughout: deploy AI for generation, critique, transformation, and organisation. Never use it as a final authority.

Use AI to structure inquiry; use scholarly databases and expert judgement to establish evidence.
Pipeline stage	Best AI role	Do not use for
Idea conception	Critical sparring partner	Sole creator of hypotheses
Literature search	Search-term planner	Citation source or factual database
Data collection	Wording and bias checker	Generating synthetic data
Data analysis	Code drafter and assumption checker	Analytical authority
Writing	Clarity and structure editor	Primary author
Peer review	Reviewer-simulation and response organiser	Final reviewer or evaluator

Voice interview bot

A voice-enabled qualitative interview bot can support data collection by conducting semi-structured interviews using natural speech. The example below uses DeepSeek for completions, ElevenLabs for text-to-speech, and Groq Whisper for speech-to-text. This approach has been used in practice — for example, in a pilot study capturing participant experiences of rugby taster sessions, with full ethical approval and a participant information sheet (OPIS) in place. As with any data collection method, standard ethical requirements apply: institutional ethical approval, informed consent, and appropriate data handling must all be secured before deployment.

Demo: Voice-enabled interview bot using DeepSeek, ElevenLabs TTS, and Groq Whisper STT.

View the GitHub example

Sandboxed analysis

Use AI to draft deterministic R or Python code, not to calculate complex statistics inside a free-text chat window. A sandboxed environment runs the generated code in isolation, which protects the host system and makes the analysis reproducible. Julius AI provides one such sandbox, with model selection, a connected code environment, and R/Python support.
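As a minimal sketch of what "deterministic code" means in practice, the script below computes a paired mean difference with a percentile bootstrap interval using only the Python standard library. The function name `bootstrap_mean_diff`, the seed, and the data values are illustrative assumptions, not results from any study. The fixed seed and explicit code make the result re-runnable and inspectable, which a number typed into a chat window is not.

```python
import random
import statistics


def bootstrap_mean_diff(pre, post, n_boot=2000, seed=42, alpha=0.05):
    """Paired mean difference with a percentile bootstrap interval."""
    if len(pre) != len(post):
        raise ValueError("pre and post must be paired and of equal length")
    diffs = [after - before for before, after in zip(pre, post)]
    rng = random.Random(seed)  # fixed seed: same inputs give the same output
    boot_means = sorted(
        statistics.fmean(rng.choices(diffs, k=len(diffs)))
        for _ in range(n_boot)
    )
    lower = boot_means[int((alpha / 2) * n_boot)]
    upper = boot_means[int((1 - alpha / 2) * n_boot) - 1]
    return statistics.fmean(diffs), lower, upper


# Hypothetical paired scores, for illustration only.
pre = [41.2, 38.5, 44.0, 39.9, 42.3, 40.1]
post = [43.0, 39.1, 45.2, 41.5, 42.0, 42.4]

estimate, lower, upper = bootstrap_mean_diff(pre, post)
print(f"mean difference = {estimate:.2f}, 95% interval [{lower:.2f}, {upper:.2f}]")
```

Running the script twice gives identical numbers, so a co-author or reviewer can audit the calculation line by line.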

Demo: Using Julius AI's sandboxed R environment with model selection and connected data.

Julius AI for data workflows

AI-assisted qualitative analysis

Interpretative Phenomenological Analysis (IPA) and other qualitative frameworks can benefit from AI as an analytical companion — not a replacement for the researcher's interpretive labour. The tool below accepts a research question, researcher reflexive statement, and interview transcripts, then generates exploratory themes using DeepSeek. The reflexive statement is threaded through the analysis to surface the researcher's own biases and assumptions, maintaining the double hermeneutic that IPA demands.

Demo: IPA Analysis Tool with reflexive statement support and DeepSeek completions.

View the GitHub repository

Focused notebooks

NotebookLM-style systems are strongest when the source set is curated and narrow — typically 5 to 15 closely related sources addressing a defined question. Avoid dumping an entire field into one mega-notebook and treating the summary as synthesis. The model cannot perform systematic review methodology; it can help you navigate and interrogate a pre-selected corpus.

NotebookLM help

Local manuscript review

A locally-hosted review pipeline can provide confidential feedback on draft manuscripts without exposing unpublished work to cloud APIs. The example below runs on Apple Silicon using MLX with quantised open-weight models (Qwen 3.6, Gemma 4), providing structured methodological critique via a FastAPI web interface with separate Chat and Review modes. This solves the confidentiality problem — the manuscript never leaves the machine — but it does not solve the accountability problem. The researcher must still evaluate every critique the model produces.

Running multiple models against the same manuscript is instructive: each model tends to identify different genuine weaknesses while also introducing its own distinct factual errors. One model may correctly flag an inferential gap between large F-values and trivial adjusted R² differences; another may erroneously describe a flexible distribution method as "distribution-free." This pattern — broader critical coverage at the cost of model-specific errors — reinforces why multi-model comparison and researcher verification are both necessary.

Demo: Local LLM manuscript reviewer with model switching (Qwen 3.6 35B, Gemma 4 26B).

Architecture-driven prompting

Prompting is context design

Transformer-based systems generate likely continuations from the context window. That makes prompting a methodological act: you are constraining the probability distribution, not asking an oracle.

The sampling parameters below are relevant to researchers using API access or local models. If you are using a standard chat interface (ChatGPT, Claude, Gemini), the platform manages these settings and you can skip to the prompt framework.

temperature

Lower values (0.0–0.3) give more consistent, repeatable outputs; they do not make the output more accurate. Higher values (0.7–1.0) suit divergent idea generation.

top_p

Limits sampling to the smallest set of most likely tokens whose cumulative probability reaches the threshold. A value of 0.9 means the model samples only from the tokens that together make up the top 90% of probability mass.

top_k

Hard-cuts the candidate token pool to a fixed size. A value of 40 means only the 40 most probable next tokens are considered.

These parameters interact. Setting temperature to 0 makes top-p and top-k irrelevant (greedy decoding). In practice, adjust temperature first and leave the others at defaults unless you have a specific reason to constrain further. Most chat platforms do not expose these controls.
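The interaction between the three parameters can be sketched on a toy next-token distribution. The function below is an illustration written for this guide, not the code of any particular inference library; the token names and logit values are hypothetical, and real implementations may apply the filters in a different order.

```python
import math
import random


def filtered_distribution(logits, temperature=1.0, top_k=None, top_p=None):
    """Return {token: probability} after temperature, top-k and top-p filtering."""
    if temperature <= 0:
        # Greedy decoding: all probability mass on the single most likely token.
        best = max(logits, key=logits.get)
        return {best: 1.0}

    # Temperature rescales the logits before the softmax.
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(weights.values())
    ranked = sorted(
        ((tok, w / total) for tok, w in weights.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )

    if top_k is not None:
        ranked = ranked[:top_k]  # keep only the k most probable tokens

    if top_p is not None:
        kept, cumulative = [], 0.0
        for tok, prob in ranked:
            kept.append((tok, prob))
            cumulative += prob
            if cumulative >= top_p:  # smallest set that reaches the threshold
                break
        ranked = kept

    # Renormalise so the surviving probabilities sum to 1.
    mass = sum(prob for _, prob in ranked)
    return {tok: prob / mass for tok, prob in ranked}


def sample(distribution, rng):
    """Draw one token according to the filtered probabilities."""
    tokens = list(distribution)
    return rng.choices(tokens, weights=[distribution[t] for t in tokens], k=1)[0]


# Hypothetical next-token logits, for illustration only.
logits = {"increase": 2.0, "decrease": 1.0, "change": 0.5, "banana": -3.0}

greedy = filtered_distribution(logits, temperature=0)
cool = filtered_distribution(logits, temperature=0.3)
warm = filtered_distribution(logits, temperature=1.0)
nucleus = filtered_distribution(logits, temperature=1.0, top_p=0.9)
top_two = filtered_distribution(logits, temperature=1.0, top_k=2)

print(sample(warm, random.Random(0)))
```

In this toy case, lowering the temperature concentrates probability on "increase", top_p = 0.9 removes the implausible "banana", and top_k = 2 keeps only the two most likely tokens.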

The 7-part research prompt

This framework structures prompts so that the model receives enough context to generate useful output while the constraints prevent the most common failure modes: hallucinated citations, overclaiming, and unsupported causal language.

Role — define the expert lens the model should adopt.

Task — specify the action: evaluate, draft, compare, extract.

Context — state the study design, purpose, and stage of work.

Evidence — supply the data, text, or output to work with.

Constraints — forbid unsupported claims, invented citations, speculation.

Format — require tables, code blocks, headings, or structured output.

Verification — ask for uncertainty flags, missing evidence, and checks needed.

# Research critique prompt: paste into any capable model
Role: Act as a critical methodological adviser.
Task: Evaluate the supplied material for factual, statistical, and methodological reliability.
Context: I am preparing an academic research output and need conservative feedback before submission.
Evidence: [Paste the source text, table, output, transcript, or model results here.]
Constraints: Use only the supplied material unless I explicitly ask for external sources. Do not invent citations, missing facts, or statistical values. Distinguish clearly between fact, interpretation, and speculation.
Output format: Use a table with these columns: Claim | Support status | Evidence | Concern | Required check | Conservative revision.
Verification: Flag overclaiming, causal language without experimental design, missing uncertainty, mismatches between evidence and conclusion, and any claim that needs independent database or code verification.
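For researchers working through an API or a local model, the seven parts can be assembled programmatically so that no section is silently omitted. The class below is a minimal sketch; the name `ResearchPrompt` and its field names are assumptions made for illustration, and the example text is abbreviated from the prompt above.

```python
from dataclasses import dataclass, fields


@dataclass
class ResearchPrompt:
    role: str
    task: str
    context: str
    evidence: str
    constraints: str
    output_format: str
    verification: str

    def render(self) -> str:
        """Join the seven parts in a fixed order, refusing empty sections."""
        parts = []
        for f in fields(self):
            text = getattr(self, f.name).strip()
            if not text:
                raise ValueError(f"Prompt section '{f.name}' is empty")
            label = f.name.replace("_", " ").capitalize()
            parts.append(f"{label}: {text}")
        return "\n\n".join(parts)


prompt = ResearchPrompt(
    role="Act as a critical methodological adviser.",
    task="Evaluate the supplied material for statistical reliability.",
    context="Draft results section for a journal submission.",
    evidence="[Paste the results paragraph here.]",
    constraints="Use only the supplied material. Do not invent citations.",
    output_format="Table: Claim | Support status | Concern | Required check.",
    verification="Flag overclaiming and any claim needing independent checks.",
)
print(prompt.render())
```

Raising an error on an empty section is a deliberate choice: a prompt missing its constraints or verification step is the case most likely to produce overclaiming.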

Worked example — what good output looks like

Suppose you paste a draft results paragraph that reads: "Sprint training significantly improved VO₂max (p = 0.03), demonstrating that high-intensity intervals are superior to steady-state training for aerobic adaptation."

A well-configured model should return something like this:

Claim	Sprint training is "superior" for aerobic adaptation
Support status	Partially supported
Evidence	p = 0.03 for within-group change in VO₂max
Concern	"Superior" implies a between-group comparison, but only a within-group p-value is supplied. No effect size, no confidence interval, no comparison condition reported. "Significantly" conflates statistical and practical significance.
Required check	Report the between-group comparison statistic and effect size. Check whether the study design supports a causal claim (randomised? controlled?).
Conservative revision	"Sprint training was associated with a pre-to-post increase in VO₂max (mean difference = X, 95% CI [Y, Z], p = 0.03). Comparison with steady-state training requires the between-group analysis reported in Table N."

The model catches the overclaiming, flags the missing effect size, and distinguishes within-group from between-group evidence. This is the kind of output the 7-part prompt is designed to elicit. If the model instead validates the original claim, the prompt constraints need tightening or the model is not suitable for this task.

Verification

Fluency is not evidence

Agreement between polished outputs is not the same thing as independent corroboration. Two models trained on overlapping data can produce the same confident, wrong answer. Verification needs its own workflow, separate from generation.

Why hallucination is structural, not accidental

If a generative large language model cannot reliably classify a statement as true or false, it will, as a matter of statistics, sometimes generate false statements. Hallucinations are not broken code: they are a natural consequence of the model doing exactly what it was trained to do, which is to make the best statistical guess possible based on its training distribution (Kalai et al., 2025). This means hallucination cannot be fully eliminated through prompt engineering alone; it must be managed through verification workflows, source checking, and multi-model comparison.

From single answers to plausible answer sets

Never ask a generative model for "the correct interpretation." Instead, require it to provide a defined range of interpretations and to evaluate the evidentiary weight for each.

Conservative

Grounded firmly in the supplied evidence. Lowest risk of hallucination. Claims only what the data directly supports.

Ask: "What evidence supports this?"

Moderate

Synthesises the supplied text with standard domain knowledge. Makes reasonable inferences but introduces additional assumptions.

Ask: "What additional evidence is needed?"

Speculative

Extrapolates broader implications beyond the evidence. High uncertainty. Useful for hypothesis generation, not for claims.

Ask: "What facts would count against this?"

The VALID-AI checklist

This checklist was developed for this guide as a mnemonic for the minimum verification steps a researcher should perform on any AI-generated content before it enters a manuscript or analysis pipeline.

Verify sources

Confirm that every citation exists and says what the model claims it says. Check DOIs, page numbers, and author lists against the original database entry.

Assess authority

Prioritise peer-reviewed and primary material over plausible grey literature. Models can generate convincing-sounding references to reports and working papers that do not exist.

Look for bias

Inspect what is omitted: methods, geographies, populations, theoretical positions, and languages not represented in the output. Models reflect training data distributions, not the full evidence base.

Identify limits

Separate supported findings from interpretation and speculation. If the model does not distinguish these itself, the output cannot be trusted without manual classification.

Document provenance

Record the model name and version, the full prompt, the source set provided, the date of generation, and the verification checks performed. This documentation enables reproducibility and audit.
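The "Document provenance" step can be kept as a small structured record saved alongside the analysis files. The sketch below is one possible layout, not a standard; the class name `ProvenanceRecord`, the field names, and the example values (model name, file name, dates) are all hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ProvenanceRecord:
    model_name: str
    model_version: str
    prompt: str
    sources: list
    generated_on: str  # ISO date, e.g. "2026-05-01"
    checks_performed: list = field(default_factory=list)

    def to_json(self) -> str:
        """Serialise the record, adding a hash of the exact prompt text."""
        record = asdict(self)
        # The hash lets an auditor confirm the stored prompt was not altered.
        record["prompt_sha256"] = hashlib.sha256(
            self.prompt.encode("utf-8")
        ).hexdigest()
        return json.dumps(record, indent=2, sort_keys=True)


# Hypothetical values, for illustration only.
record = ProvenanceRecord(
    model_name="example-model",
    model_version="2026-05",
    prompt="Role: Act as a critical methodological adviser. ...",
    sources=["draft_results_v3.docx"],
    generated_on="2026-05-01",
    checks_performed=[
        "citations checked against database",
        "statistics re-run in R",
    ],
)
print(record.to_json())
```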

RAG helps, but it is not magic

Retrieval-augmented generation (RAG) moves from closed-book pattern completion to open-book, source-grounded generation. It can reduce hallucination by anchoring answers in a trusted corpus, but it introduces its own failure modes: retrieval misses, context-window truncation, and false confidence from partial matches.
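The retrieval step, and the "retrieval miss" failure mode, can be illustrated with a toy bag-of-words retriever. This is a deliberately simple sketch: production RAG systems typically use embedding models and vector stores, and the passage texts, source identifiers, and the 0.2 threshold here are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-cased word counts; a crude stand-in for an embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, corpus, min_score=0.2):
    """Return (source_id, score), or (None, score) when nothing matches well."""
    query = bag_of_words(question)
    scored = [
        (cosine(query, bag_of_words(text)), source_id)
        for source_id, text in corpus.items()
    ]
    best_score, best_id = max(scored)
    if best_score < min_score:
        return None, best_score  # a miss: better to say so than to guess
    return best_id, best_score


# Hypothetical passages, for illustration only.
corpus = {
    "source_a": "Participants completed six weeks of sprint interval training.",
    "source_b": "Steady state training produced a small increase in aerobic capacity.",
}

hit, hit_score = retrieve("How many weeks of sprint interval training?", corpus)
miss, miss_score = retrieve("What was the dropout rate?", corpus)
print(hit, round(hit_score, 2), miss, miss_score)
```

The second question has no support in the corpus, so the retriever returns `None`. A system without such a threshold would pass the least-bad passage to the model, which is one route to false confidence from partial matches.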

When AI gets statistics wrong

The video below demonstrates a case where a model produces a confident but incorrect interpretation of statistical output. This is not a rare edge case — it is the default risk when statistical reasoning is delegated to a language model without independent verification. The model may identify the correct test, report plausible numbers, and still misinterpret what they mean.

Demo: An LLM producing confident but incorrect interpretation of statistical output.

Three levels of RAG use

Exploratory RAG — useful for orientation, question generation, and finding candidate passages in a curated source set. Acceptable for early-stage literature scanning.

Rigorous synthesis — requires defined inclusion criteria, paper-level extraction, traceable notes, and independent checking. RAG alone cannot perform this; it can assist with navigation within a pre-screened corpus.

StatsRAG pattern — a direct response to the kind of misinterpretation shown above. Build an auditable statistical specification, verify it against a trusted local reference library, then produce a verdict card covering compliance, metric integrity, direction, and source support. This is the approach used in tools like the StatsRAG project for Bayesian analysis specification.
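A verdict card of the kind described above might be represented as follows. This is a hypothetical sketch, not the StatsRAG project's code: the field names follow the four checks named in the text, but how each check is computed here (string match on the model name, a numeric tolerance, a sign comparison, a lookup in a reference dictionary) is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class VerdictCard:
    compliance: bool        # reported model matches the pre-specified model
    metric_integrity: bool  # reported number matches the recomputed number
    direction: bool         # reported sign matches the recomputed sign
    source_support: bool    # the method appears in the trusted reference library

    @property
    def passed(self) -> bool:
        return all(
            (self.compliance, self.metric_integrity, self.direction, self.source_support)
        )


def build_verdict(spec, reported, recomputed, reference_library, tolerance=0.01):
    """Compare an LLM-reported result with the specification and a re-run."""
    return VerdictCard(
        compliance=reported["model"] == spec["model"],
        metric_integrity=abs(reported["estimate"] - recomputed["estimate"]) <= tolerance,
        direction=(reported["estimate"] > 0) == (recomputed["estimate"] > 0),
        source_support=spec["model"] in reference_library,
    )


# Hypothetical values: the model reports the right magnitude with the wrong sign.
spec = {"model": "bayesian_linear_regression"}
reference_library = {"bayesian_linear_regression": "local reference notes"}
reported = {"model": "bayesian_linear_regression", "estimate": 0.42}
recomputed = {"estimate": -0.42}

card = build_verdict(spec, reported, recomputed, reference_library)
print(card, card.passed)
```

Here the card fails on direction and metric integrity even though the model name and the reference lookup are fine, which is the pattern of a plausible-looking but wrong interpretation.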

Demo: StatsRAG — verifying LLM-generated statistical output against a trusted reference library.

IBM explainer on RAG

Safety protocol for AI-assisted analysis

Avoid	Relying on an LLM for direct calculation of complex statistics or numerical datasets.
Instead	Generate deterministic R or Python code and run it in a controlled environment. Inspect the code before execution.
Avoid	Accepting test selection or model specification without manual verification of the design and assumptions.
Instead	Check normality, variance structure, outliers, dependence, units, sample size, and model assumptions against the study design.
Avoid	Pasting AI-generated interpretation into a manuscript without independent checking.
Instead	Constrain the prompt, re-run with variations, compare outputs across models, and verify claims against the original data and published sources.

Secure AI choices

Choose the smallest exposure that fits the job

The newest secure options offer considerably stronger data protection than public consumer chat, but "secure" still depends on your institution, licence terms, data classification policy, region, retention settings, and whether features like web grounding or third-party connectors are enabled. No single answer works for every institution.

Public web AI

ChatGPT, Gemini, Claude (free/consumer tiers), Perplexity

Best for low-risk brainstorming, exploring public information, and learning how models behave. Do not upload unpublished manuscripts, sensitive participant data, or confidential grant material. Consumer tier data handling varies by provider and changes frequently — check the current terms.

Campus enterprise AI

ChatGPT Edu, Microsoft 365 Copilot Chat, Gemini for Workspace, Claude for Work

Stronger contractual controls, typically with no-training clauses and regional data residency. However, local policy decides what data classifications are permitted. Check your institution's AI acceptable use policy and the specific enterprise agreement before uploading anything beyond public data.

Managed secure cloud

Azure AI Foundry, AWS Bedrock, Google Vertex AI

Tenant-level governance, audit logging, region selection, and model-provider separation. Suitable for serious deployments with institutional data. Requires technical setup and ongoing administration — not a plug-and-play option for individual researchers.

Local or self-hosted AI

Ollama, LM Studio, Open WebUI, AnythingLLM, Jan, GPT4All

Run models on your own hardware. Nothing leaves the machine. This maximises confidentiality but does not maximise accuracy — local models are typically smaller and less capable than frontier cloud models. Best for peer-review assistance, manuscript critique, and code generation where the researcher can verify every output.

A simple classification rule

Public data: Use any appropriate tool, but verify and cite independently. The convenience of AI does not reduce the citation standard.

Internal or unpublished work: Use approved enterprise/campus AI or a managed secure cloud service. Check retention and training-exclusion clauses.

Sensitive, identifiable, or embargoed data: Use approved local, self-hosted, or institutionally governed platforms only. This includes participant data, clinical records, and pre-publication findings under embargo.
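The rule can be written down as a small lookup so that a lab or project has one explicit place to record it. The sketch below is illustrative only: the enum names are invented, and mapping "institutionally governed platforms" to the managed secure cloud tier is an assumption. Local policy, not this table, has the final say.

```python
from enum import Enum


class DataClass(Enum):
    PUBLIC = "public data"
    INTERNAL = "internal or unpublished work"
    SENSITIVE = "sensitive, identifiable, or embargoed data"


class Tier(Enum):
    PUBLIC_WEB = "public web AI"
    CAMPUS_ENTERPRISE = "campus enterprise AI"
    MANAGED_CLOUD = "managed secure cloud"
    LOCAL = "local or self-hosted AI"


# Encodes only the tiers the rule names for each class of data.
PERMITTED = {
    DataClass.PUBLIC: set(Tier),
    DataClass.INTERNAL: {Tier.CAMPUS_ENTERPRISE, Tier.MANAGED_CLOUD},
    # Assumption: "institutionally governed" is read as the managed cloud tier.
    DataClass.SENSITIVE: {Tier.LOCAL, Tier.MANAGED_CLOUD},
}


def is_permitted(data_class, tier, institution_approved=False):
    """True if the tier fits the data class and, where required, is approved."""
    if tier not in PERMITTED[data_class]:
        return False
    if data_class is DataClass.PUBLIC:
        return True
    return institution_approved  # non-public data always needs approval


print(is_permitted(DataClass.SENSITIVE, Tier.PUBLIC_WEB, institution_approved=True))
print(is_permitted(DataClass.INTERNAL, Tier.CAMPUS_ENTERPRISE, institution_approved=True))
```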

Current platform guide

Tools worth knowing in 2026

These are not endorsements. They are a researcher's map: what each platform is good for, what to check, and when to consider alternatives. All links and descriptions were checked in May 2026, but product details, model access, pricing, data retention and privacy settings change frequently. Check current provider documentation before using any tool with non-public research data.

Cloud · Enterprise

ChatGPT (OpenAI)

Strong general-purpose model with web browsing, code execution, and image generation. Enterprise and Edu tiers offer no-training guarantees. Free/Plus tiers may use conversations for model improvement unless opted out.

Secure tiers available

Cloud · Enterprise

Claude (Anthropic)

Emphasis on careful reasoning, document work and long-context processing. Strong for manuscript critique, coding and structured analysis. Claude for Work provides enterprise data controls; check the current context limits and plan features.

Secure tiers available

Cloud · Enterprise

Gemini (Google)

Deep integration with Google Workspace. Gemini in Docs, Sheets, and Slides is useful for faculty already in the Google ecosystem. Workspace data protection policies apply to enterprise customers.

Secure tiers available

Cloud · Enterprise

Microsoft 365 Copilot Chat

Grounded in your Microsoft 365 data (SharePoint, Teams, email). Useful for institutional knowledge retrieval. Commercial data protection means prompts and responses are not used for training.

Secure tiers available

Literature · Search

Elicit

AI-assisted literature review and data extraction. Searches Semantic Scholar, extracts structured data from papers, and supports screening workflows. Useful for scoping reviews and evidence mapping.

Literature

Literature · Citation

Scite

Shows how a paper has been cited — supporting, contrasting, or mentioning — across the literature. Useful for assessing the reception of a specific finding and identifying disputes.

Literature

Literature · Screening

Rayyan

Systematic review management with AI-assisted screening. Supports blind review, conflict resolution, and PRISMA-compatible export. Free for individual researchers.

Literature

Literature · Notebooks

NotebookLM (Google)

Source-grounded chat over uploaded documents. Best with 5–15 curated sources on a focused topic. Generates audio overviews and summaries. Does not replace systematic search or formal synthesis.

Literature · Secure (Workspace)

Analysis · Sandbox

Julius AI

Data analysis platform with sandboxed code execution in R and Python. Connects to data sources, generates code, and produces visualisations. Useful for exploratory analysis and teaching statistical workflows.

Analysis

Analysis · Search

Perplexity

AI search with visible source links. Useful for quick orientation and finding recent publications. Paid tiers may offer more capable models and longer outputs. Not a substitute for systematic database searching.

Analysis

Local · Desktop

LM Studio

Desktop application for running open models locally. Easy model discovery, download, and chat. Good entry point for researchers new to local AI. Supports GGUF-format models on CPU and GPU.

Local

Local · Desktop

Jan

Open-source desktop AI with a clean interface. Supports local models, API connections, and extensions. Good for researchers who want offline chat without terminal commands.

Local

Local · Desktop

GPT4All

Privacy-focused desktop client from Nomic. Runs quantised models locally with a simple GUI. Includes a local document Q&A feature for small corpora.

Local

Local · Research lab

Ollama + Open WebUI

Command-line model server (Ollama) paired with a browser-based interface (Open WebUI). Supports model switching, RAG, tool calling and multi-user access. A flexible local setup for research teams and demonstrations.

Local

Local · RAG

AnythingLLM

Desktop and server application for local RAG. Upload documents, build a vector store, and chat with local or cloud models grounded in your own data. Good for building a private knowledge base.

Local

Cloud · Privacy

Duck.ai (DuckDuckGo)

Anonymous access to a rotating set of third-party models with no account required. DuckDuckGo describes requests as proxied to reduce identifying metadata and says it has contractual no-training arrangements with model providers. Model availability and limits change, so check the current Duck.ai documentation before relying on a specific model.

Secure

Cloud · Privacy

Lumo (Proton)

Privacy-focused AI assistant from the makers of Proton Mail. Proton describes Lumo as running on Proton-controlled infrastructure with zero-access encryption for saved conversations and a temporary mode for ephemeral chats. This may suit sensitive drafting or file review where the capability is sufficient, but check current terms, model options and institutional requirements.

Secure

Browser · Privacy

Brave Leo

AI assistant built into the Brave browser with a choice of hosted models. Brave describes requests as proxied and chat history as stored locally unless users choose otherwise. Useful for page-level tasks such as summarisation, translation and Q&A grounded in the current tab; verify current privacy claims and model availability before using it for research material.

Secure

Cloud · Privacy

Okara

Private multi-model AI workspace with encryption and collaboration features. It may be useful for structured research workflows where the provider's current terms match the data classification, but it is newer and less institutionally tested than the large enterprise platforms. Check the current documentation before uploading non-public data.

Secure

Community · Models

Hugging Face

The primary hub for open-weight models. Inspect model cards, licences, benchmark results, and community Spaces before downloading. Essential for evaluating which model is appropriate for a given task.

Local · Literature

Authorship and peer review

Local AI solves confidentiality. It does not solve accountability.

Some uses of AI in the peer-review and writing process are highly appropriate; some are risky; and unverified generation is academic misconduct regardless of where the model runs. The spectrum below applies to any model, cloud or local.

Use case	Rating	Researcher responsibility
Gap analysis and red teaming	Highly appropriate	The researcher must independently evaluate which critiques are valid and decide what to act on.
Grammar, wording, and structure	Highly appropriate	The meaning and argument must remain human-generated. All changes must be reviewed and approved.
Substantial drafting of text	Problematic	Risks false synthesis, fabricated citations, and authorship blur. If used at all, every claim must be independently verified and the contribution disclosed.
Unverified paste-in	Inappropriate	The author cannot vouch for accuracy, originality, or source integrity. This constitutes academic misconduct under most institutional and journal policies.

Disclosure statement templates

Adapt these to the target journal. Always include the tool name, version where available, specific task, date range, and the human verification performed.

Copy editing: I used [tool and version] to suggest grammar, wording, and structure improvements to human-authored text. I reviewed all changes and take full responsibility for the final manuscript.

Analysis support: I used [tool and version] to draft R/Python code and identify possible model assumptions. All analyses were executed in [software and version], checked against the dataset and relevant statistical references, and revised by the authors.

Red teaming: I used [local/approved tool and version] to identify possible limitations, missing literature, and overclaims. Suggestions were independently evaluated and only incorporated after author review and verification against primary sources.

Non-negotiable research rules

Never cite anything you have not read and verified yourself.
Never treat AI output as scholarly evidence.
Always verify AI-suggested sources in academic databases (Scopus, Web of Science, PubMed, Google Scholar).
Use AI to assist thinking, not replace it.
Write your own argument and own the final judgement.

Try it yourself: AI peer review of your own paper

Paste your abstract and methods section into any capable model using the prompt below. Then evaluate the output: did the model identify a real weakness? Did it fabricate a concern? How does it compare to actual reviewer feedback you have received?

You are a sceptical methodological reviewer for an academic journal. Your job is to find weaknesses, not to validate. Read the material below and evaluate every substantive claim for evidential support.
Context: I am preparing a manuscript for peer-reviewed publication. I want conservative, evidence-grounded feedback before submission. I do not want encouragement; I want problems.
=== MATERIAL TO REVIEW ===
[PASTE YOUR ABSTRACT AND METHODS SECTION HERE]
=== END OF MATERIAL ===
Constraints:
- Use ONLY the supplied text.
- Do not invent missing information or fabricate citations.
- If a claim cannot be assessed from the material provided, say so explicitly and state what additional information would be needed.
- Do not soften your language.
Output format: A numbered table with these columns:
Claim | Support status | Evidence from text | Concern | What needs checking | Suggested revision
Support status must be one of: directly supported, partially supported, unsupported, not assessable.
After the table, write one paragraph summarising the three most serious methodological issues.
Finally: list any claims where you were uncertain about your own assessment and explain why. If you have no concerns about a claim, do not fabricate one.

Debrief questions: Did the model identify a real weakness you had not noticed? Did it fabricate a concern that does not withstand scrutiny? Did different models produce different critiques? This exercise demonstrates both the power and the limits of AI-assisted review, and why the researcher must evaluate every critique independently.

Future direction

From co-intelligence to managed agents

The trajectory points towards a shift from back-and-forth prompting to agentic workflows, where the researcher becomes less a prompt typist and more a manager of objectives, constraints, tools, audit logs, and checkpoints. This raises the verification burden rather than removing it.

Agentic research workflows

Coding and research agents (such as those in Claude Code, Cursor, and Windsurf) can execute multi-step tasks: searching literature, writing and running code, iterating on errors. The appeal is real, but so are the risks. Agents require explicit permission boundaries, sandboxed execution environments, source-boundary constraints (preventing the agent from citing material outside a defined corpus), and human review gates at each decision point. Unsupervised agent runs that modify data or submit outputs are not currently defensible in an academic context.

Practical example: A coding agent could be tasked with writing a Bayesian power analysis in R, running it, checking convergence diagnostics, and producing a summary table, but the researcher must review the model specification, prior choices, and interpretation before the output enters a protocol or manuscript.
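Two of the controls described above, a permission boundary and a human review gate, can be sketched in a few lines. This is an illustrative pattern, not the mechanism used by any specific agent product; `run_gated`, `within_boundary`, and the directory names are hypothetical, and the `approve` callable stands in for a real prompt to the researcher.

```python
from pathlib import Path


class GateRefused(Exception):
    """Raised when an agent action is blocked before it runs."""


def within_boundary(target, allowed_root):
    """True if target resolves to a location inside allowed_root."""
    target_path = Path(target).resolve()
    root_path = Path(allowed_root).resolve()
    return target_path == root_path or root_path in target_path.parents


def run_gated(description, target, allowed_root, approve, action):
    """Run action only if the target is in bounds and a human approves."""
    if not within_boundary(target, allowed_root):
        raise GateRefused(f"{target} is outside {allowed_root}")
    if not approve(description):
        raise GateRefused(f"Reviewer declined: {description}")
    return action()


log = []


def write_summary():
    # Stand-in for the real side effect (for example, writing a file).
    log.append("summary written")
    return "done"


result = run_gated(
    description="write summary table",
    target="project/outputs/summary.csv",
    allowed_root="project",
    approve=lambda description: True,  # replace with a real human prompt
    action=write_summary,
)
print(result, log)
```

Resolving the path before checking it matters: a target such as `project/../secrets/data.csv` looks as if it sits under `project` but resolves outside it.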

Demo: Claude Code building SecurXamine — an agentic coding workflow with human review gates.

Demo: OpenAI Codex generating a Bayesian 3D visualisation as an agentic task.

Fine-tuning and low-rank adaptation (LoRA)

LoRA allows specialisation of a foundation model for a narrow task — such as evaluating statistical claims in academic prose or classifying methodological frameworks — without the cost or data requirements of full fine-tuning. A LoRA adapter trained on 100 annotated examples can meaningfully shift model behaviour on a focused task. However, dataset quality determines everything: garbage in, confidently wrong garbage out. Licensing of the base model, evaluation against held-out test sets, and monitoring for distributional drift over time all matter. Fine-tuning is powerful and accessible, but it is not a shortcut to a reliable domain expert.
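The cost saving comes from the low-rank structure: instead of updating a full weight matrix of size d_out × d_in, LoRA trains two small matrices of sizes d_out × r and r × d_in and adds their product to the frozen weights. The arithmetic below shows the difference for a single layer; the layer size and rank are illustrative values, not taken from any particular model.

```python
def lora_parameter_counts(d_out, d_in, rank):
    """Trainable parameters for full fine-tuning vs. a LoRA adapter on one layer."""
    full = d_out * d_in              # every entry of the weight matrix
    adapter = rank * (d_out + d_in)  # B is d_out x r, A is r x d_in
    return full, adapter


# Illustrative sizes: one 4096 x 4096 projection with a rank-8 adapter.
full, adapter = lora_parameter_counts(d_out=4096, d_in=4096, rank=8)
print(f"full: {full:,}  adapter: {adapter:,}  ratio: {full // adapter}x")
```

For this layer the adapter trains 65,536 parameters instead of 16,777,216, a 256-fold reduction, which is why adapters are practical on modest hardware.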

Practical example: A LoRA adapter trained on annotated statistical claims (correct vs. overclaimed vs. under-reported) can be applied to a small open-weight model to produce a manuscript screening tool. The adapter adds domain specificity; the base model provides language capability. The researcher must still validate the tool against known-good and known-bad examples before trusting its output.

Demo: Fine-tuning a small thinking model (Ouro) with LoRA adapters for statistical claim evaluation.

Beyond standard transformer architectures

Current transformer limitations (finite context windows, no persistent memory, limited planning) are active research frontiers. Future systems may combine symbolic reasoning, retrieval, planning modules, and model-based generation. Mixture-of-experts architectures (already deployed in models like Qwen and Gemini) improve efficiency by activating only a subset of expert subnetworks for each token. State-space models and recurrent alternatives may reduce the quadratic cost of attention on long sequences. None of these architectural advances will eliminate the need for researcher verification; they will change the shape of the errors rather than removing them.

Sources

Primary and official links

All links were checked for this build in May 2026. Where a source is behind a paywall, the DOI or arXiv preprint is provided.

Research method and risk

Vaswani et al., Attention Is All You Need (2017). The foundational transformer architecture paper. Essential context for understanding why prompt design matters.
Liu et al., Lost in the Middle (TACL, 2024). Demonstrates positional bias: models attend more to information at the start and end of the context window, with reduced recall for middle-positioned content.
Kalai, Nachum, Vempala & Zhang, Why Language Models Hallucinate (2025). Shows that hallucination is a mathematical consequence of imperfect fact classification, not a fixable bug. The foundation for the verification-first approach taken throughout this guide.
Towards end-to-end automation of AI research (Nature, 2026). Recent exploration of agentic AI in the research pipeline. Demonstrates both capability and the verification challenges that remain.

Security and enterprise AI

Tools and local AI

Elicit: AI-assisted literature review and evidence extraction.
Scite: Citation context analysis (supporting, contrasting, mentioning).
Rayyan: Systematic review screening and management.
Julius AI: Sandboxed data analysis with R and Python.
Hugging Face: Model hub, model cards, community Spaces, and datasets.
Duck.ai: DuckDuckGo's anonymous, proxied AI chat. No account required.
Lumo (Proton): Zero-access encrypted AI on Proton-controlled European servers.
Brave Leo: Browser-integrated AI with privacy-focused request handling.
Okara: Encrypted multi-model AI workspace with client-side key generation.
Ollama: Local model server for macOS, Linux, and Windows.
LM Studio: Desktop application for running quantised open models.
Open WebUI: Browser-based interface for Ollama and other backends.
AnythingLLM: Local RAG and document Q&A platform.

AI is a collaborator, not evidence

Security and accountability are separate

Verification is structural
