Using Generative AI (Artificial Intelligence) tools in research
Whether using a chatbot to collect data, training an LLM (large language model) to perform discipline-specific tasks, or developing the next AI technology, as scholars we have an obligation to ensure we are using and building AI responsibly. We also need to be aware of possible compliance requirements and adhere to a code of ethics. These guidelines offer an overview of ethical considerations to help you design your research process, grounded in real-world examples focused mostly on LLM and chatbot use within and beyond academia. They aim to address AI activities across many disciplines, but in doing so may not cover every context.
AI tools and research practices are changing rapidly. Your interactions with these tools, and the thinking behind these guidelines, will also likely shift. Keep this in mind as you use AI tools, stay informed about changes to your preferred chatbot or LLM, and continue to educate yourself about the responsible use of AI in your research.
Summary: Responsible Use of AI Do’s & Don’ts
If you just need a quick overview, here are some best practices that will serve you well in a variety of situations. Note that these practices focus on LLM and chatbot use, but researchers may be engaging with a wide range of AI technologies; these practices may also apply in those cases.
Do’s
- Do choose an LLM and/or chatbot based on your specific context and need. Not all LLMs are created the same, and general LLMs may not be trained to best support your research.
- Do use chatbots to search for information about, or to summarize, topics that you already know well.
- Do use SIFT when asking chatbots to retrieve information: Stop, Investigate the source, Find better coverage, and Trace claims, quotes, and media back to their original context.
- Do use AI tools that actively protect private health and sensitive information. Even if a tool claims to do this, check for any legal proceedings that may indicate otherwise. This may mean setting up your own local environment and/or training your own models.
- Do consider where inequities may show up in your AI tools and usage. Engage in a STEEPV analysis to understand the potential Social, Technological, Economic, Environmental, Political, and Values-based impacts of your applications.
- Do consider the environmental impact of your AI use and whether it runs counter to your research ethics and objectives.
- Do consider using open source LLMs like BLOOM or Qwen and share AI tools you develop under a Responsible AI License.
- Do check your AI-generated code for both security vulnerabilities and validity before putting it into production (a minimal sketch follows this list).
- Do check publisher, funder, and professional organization guidelines before you use AI to brainstorm, draft, or edit that grant application, conference abstract, article, or book proposal.
- Do document and publish your decision-making alongside your research.
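To illustrate the point about checking AI-generated code, here is a minimal sketch of a first-pass validity and security check in Python. It is only an example of the kind of check you might script yourself, not a prescribed tool: it confirms that a snippet parses and flags a few obviously risky calls, and it is no substitute for your own tests or a dedicated scanner. The snippet being checked and the list of “risky” calls are hypothetical.

```python
# A minimal first-pass check on AI-generated Python code: does it parse, and does it
# call anything that warrants close human review? Not a substitute for unit tests or
# a dedicated security scanner such as bandit.
import ast

RISKY_CALLS = {"eval", "exec", "os.system", "pickle.loads", "subprocess.call"}  # illustrative list

def quick_check(source: str) -> list:
    """Return a list of warnings for one AI-generated Python snippet."""
    try:
        tree = ast.parse(source)  # fails if the snippet is not valid Python
    except SyntaxError as err:
        return [f"Does not parse: {err}"]
    warnings = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            called = ast.unparse(node.func)  # e.g. "eval" or "os.system"
            if called in RISKY_CALLS:
                warnings.append(f"Line {node.lineno}: review call to {called}()")
    return warnings

if __name__ == "__main__":
    snippet = "import os\nos.system(user_input)\n"  # stand-in for chatbot output
    findings = quick_check(snippet)
    for message in findings or ["No obvious issues; still write tests and run a security scanner."]:
        print(message)
```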
Don’ts
- Don’t cite information found or summarized by a chatbot that you haven’t authenticated.
- Don’t use AI tools that collect, store, or train on user data. Check a tool’s privacy policy; if you’re not sure, contact the tool’s creator.
- Don’t include AI tools as authors.
- Don’t submit text that is mostly AI-generated as your own work.
Principles of Responsible Conduct of Research with AI
The following should be considered throughout the research process.
Accountability
Human researchers are responsible for the outputs we generate with AI tools. Are you prepared to stand behind your AI-assisted research?
- Choosing an LLM and/or chatbot: Not all LLMs are created the same, and general LLMs may not be trained to best support your research. Choose an LLM and/or chatbot based on your specific context and need.
- Using AI-generated information: LLMs are prone not only to errors arising from their probabilistic methods of representing language (“hallucinations”) but also to generalization bias, which can lead to misinformation and misinterpretation. Always check AI-generated information for accuracy before citing it. Try using the SIFT method.
- In grant proposals, reviews, & management: Funders are announcing their policies with respect to AI use in writing applications and conducting peer reviews. Make sure to check whether a potential funder has AI policies in place before drafting your application or participating in a review.
- In publications & presentations: Publishers and professional organizations are approaching AI use from a variety of perspectives. Some, like ACM, are embracing AI use and clarifying how it may be used and cited in bibliographies and/or acknowledgement sections. One conference, NeurIPS, encourages LLM usage and stipulates when it does or does not need to be cited. Nature, meanwhile, allows for limited use in text while prohibiting its use for images and video, and JAMA discourages the submission of both text and images created by AI unless they are part of the formal research process. Even if your organization or publisher doesn’t have a policy (here are some of the major publishers who do), it’s best practice to cite your use of AI.
Consent, Privacy, & Security
Researchers are obligated to adhere to practices of informed consent, protect subject privacy, and ensure security of sensitive data. Where do these issues show up in your research? Understanding how AI tools may or may not support these practices, and watching for when these practices change, is critical. Always check an AI tool’s privacy policy, reporting about the tool, and possible legal proceedings.
- Using datasets generated for training LLMs: Data provenance, the history of a dataset’s creation, transformation, and custody, is important to transparent research and to understanding the context surrounding a dataset. Projects like the Data Provenance Initiative have documented the provenance of some of the more popular AI training sets, though they don’t document when a dataset contains information taken without consent or sensitive data, including private health information (PHI). If you use these datasets, you may need to check them yourself to be sure no sensitive or private information makes its way into your LLM.
- Prompting chatbots for data: As with training datasets, be mindful of where LLM chatbots may source their data. Investigate data provenance by requesting that chatbots provide citations, and then check those citations. Consider whether the data may include sensitive information, such as private health information (PHI), which can be gathered incidentally by AI web crawlers if it is publicly accessible.
- Respecting “do not crawl” requests: If your research involves training your own models, you may be using bots to crawl the web for data. An important consideration in this data collection method is whether you respect organizations’ requests not to crawl, the equivalent of respecting a research subject’s right to give and revoke consent. Recently, bots ignoring these requests (often stated in robots.txt files) have overloaded cultural institutions, including academic library websites, making it difficult for humans to access those sites and jeopardizing institutional infrastructures.
- Conducting surveys through chatbots: Chatbots are being integrated into online surveys. When creating such tools, it’s key to be mindful of which platforms you use: are you building a survey chatbot that will protect respondents’ privacy? Check the platform’s user privacy policy to ensure that it does not retain user data; if you cannot be sure about its policies and practices, do not use the platform. Consider tools like Duke’s MyGPT Builder or AI Gateway, and consult your IRB if you need approval.
- Using LLMs to structure & analyze information: Is your data sensitive in any way? This may include not only private health information but also historical data about minoritized communities obtained from archival sources. In both cases, consider whether you need to gather consent from the affected individual or community. Only choose LLMs that actively protect user data, do not retain data, and do not sell user data to third parties. Check not only user privacy policies but also possible legal proceedings. A best practice for protecting people and their information is to run LLMs like Ollama or GPT4All directly on a local hard drive or virtual machine (see the sketch below), but check with your IRB if you need approval.
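For example, if you run a model locally with a tool like Ollama, prompts and documents never need to leave your machine. The sketch below is one minimal way to do this in Python; it assumes Ollama is installed and running on its default local port and that you have already pulled a model (the model name shown is only a placeholder), so adjust it to your own setup.

```python
# A minimal sketch of prompting a locally running model through Ollama's HTTP API,
# so prompts and data stay on your own machine. Assumes Ollama is running locally
# on its default port (11434) and that the named model has already been pulled.
import requests

def ask_local_model(prompt: str, model: str = "llama3") -> str:
    """Send one prompt to a local Ollama server and return the model's reply."""
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["response"]  # Ollama returns the completion under "response"

if __name__ == "__main__":
    print(ask_local_model("In two sentences, what is informed consent?"))
```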
Environmental Impact
All technology use comes with an environmental impact. Where might AI’s impact be outsized? Where might it conflict with your research ethics and goals?
- Prompting AI chatbots: Each prompt uses energy individually but also relies on a broader, constantly running infrastructure. Consider whether you need a chatbot for a task. If you decide you do, work on your prompt engineering to try to get the response you need with as few prompts as possible. If you’re curious about the possible energy usage of one prompt, try this calculator.
- Training custom LLMs: Consider calculating the potential environmental impacts. This can be challenging due to the lack of corporate transparency, but researchers are working to make such estimates possible. Here are two places to start (see also the sketch below).
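One way to start measuring your own work, offered here only as an illustration and not as one of the starting points linked above, is the open source codecarbon package, which estimates the energy use and emissions of a computing job. A minimal sketch, assuming codecarbon is installed and that train_my_model() stands in for your actual training code:

```python
# A minimal sketch using the open source codecarbon package to estimate the energy
# and emissions of a training run. Assumes `pip install codecarbon`; train_my_model()
# is a hypothetical stand-in for your own training loop.
from codecarbon import EmissionsTracker

def train_my_model():
    # Placeholder for your actual training code.
    total = 0
    for i in range(10_000_000):
        total += i
    return total

if __name__ == "__main__":
    tracker = EmissionsTracker(project_name="my-llm-experiment")
    tracker.start()
    try:
        train_my_model()
    finally:
        emissions_kg = tracker.stop()  # estimated kilograms of CO2-equivalent
    print(f"Estimated emissions: {emissions_kg:.6f} kg CO2eq")
```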
Equity
Human bias in research already exists. AI tools are adding new dimensions to the role that bias can play in research analyses, outcomes, and public policies and practices. How can we ensure that our research with AI supports equitable outcomes?
- Here are some common types of bias in AI drawn from IBM & the World Economic Forum. They can show up whether you are training LLMs or prompting a chatbot:
  - Training Data - what’s in the data
    - Cognitive - human bias in the data
    - Confirmation - doubling down on what’s in the data
    - Exclusion - when important data are left out
    - Measurement - incomplete data
    - Prejudice/stereotyping - when stereotypes and faulty societal assumptions are included
    - Sample/selection - the dataset isn’t big enough to train the LLM
  - Model Behavior - how an LLM behaves
    - Algorithmic - incorrect or incomplete instructions given
    - Out-group homogeneity / Implicit - we don’t know what we don’t know
    - Recall - subjective observations lead to inconsistent labeling
    - Cognitive - personal biases among developers
- The World Economic Forum suggests that AI developers engage in STEEPV analyses to understand the potential Social, Technological, Economic, Environmental, Political, and Values-based impacts of their applications. Might there be an opportunity for you to adopt or adapt this model for your research?
- For those of us prompting chatbots, we can
  - keep learning about the types of bias that show up in chatbot responses;
  - identify potential biases in chatbot responses;
  - and compare chatbots, responses, and personae to see what kinds of biases show up in each case.
Intellectual Property
When using AI tools, we must be aware not only of what might be fair use (e.g., training an LLM) but also of when we might be infringing on copyright. Checking a tool’s Terms & Conditions for data retention practices and being cautious about uploading copyrighted and unlicensed materials can serve us well. Where do copyrighted or unlicensed materials show up in your research?
- Summarizing copyrighted works: Many researchers and students use chatbots to provide summaries of articles and books. If these works are copyrighted, we enter a gray zone concerning infringement. It is possible that chatbots could lift key phrases from texts to support their summarization, and if this reuse is extensive, then the summarization could be considered copyright infringement. Chatbots like ChatGPT and Copilot appear to have limits in place regarding the length of the summary that they will provide specifically to guard against infringement.
- Writing with AI: A growing number of publishers do not consider a chatbot to be an author because it cannot be held accountable for the text it produces. A work generated with AI might in theory be considered copyrightable “if the prompts provided by the user sufficiently controlled the AI such that the resulting work as a whole constituted an original work of human authorship,” but thus far the Copyright Office has declined to register AI-generated work since current technology does not allow for sufficient human control of outputs. AI-generated work can be considered copyright infringement if the new work is substantially similar to the original. In this instance, copyright owners might be able to take action against both AI providers and the user who prompted the AI-generated work.
- Respecting “do not crawl” requests and website terms of service: If your research involves training your own models, you may be using bots to crawl the web for data. An important consideration in this data collection method is whether you respect organizations’ requests not to crawl, the equivalent of respecting a research subject’s right to give and revoke consent. Recently, bots ignoring these requests (often stated in robots.txt files) have overloaded cultural institutions, including academic library websites, making it difficult for humans to access those sites and jeopardizing institutional infrastructures. (See the sketch below for one way to check robots.txt before crawling.)
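The sketch below shows one minimal way to honor robots.txt before fetching a page for a training corpus. It uses only the Python standard library plus a short pause between requests; the URL and user agent string are placeholders, and a real crawler would also need to respect crawl delays and site terms of service.

```python
# A minimal sketch of honoring robots.txt before fetching a page for a research corpus.
# Standard library only; the user agent and URL below are placeholders.
import time
import urllib.robotparser
from urllib.parse import urlparse
from urllib.request import Request, urlopen

USER_AGENT = "my-research-crawler"  # identify your bot honestly

def allowed_to_fetch(url: str) -> bool:
    """Check the site's robots.txt to see whether this URL may be crawled."""
    parts = urlparse(url)
    parser = urllib.robotparser.RobotFileParser()
    parser.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
    parser.read()
    return parser.can_fetch(USER_AGENT, url)

def polite_fetch(url: str):
    """Fetch a page only if robots.txt allows it, pausing to avoid overloading the site."""
    if not allowed_to_fetch(url):
        return None  # respect the request, as you would a participant's revoked consent
    time.sleep(1)  # simple rate limit
    request = Request(url, headers={"User-Agent": USER_AGENT})
    with urlopen(request) as response:
        return response.read()

if __name__ == "__main__":
    page = polite_fetch("https://example.org/collection-page")
    print("Skipped: disallowed by robots.txt" if page is None else f"Fetched {len(page)} bytes")
```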
It’s possible that LLM usage can compromise our own intellectual property. How can we carefully consider when and how we share our work with LLMs?
- Uploading or editing unpublished work with a chatbot: Some chatbots (like free versions of ChatGPT) use human prompts to train future models. This has become an issue across industries; one high-profile instance occurred when Amazon found that ChatGPT responses closely resembled internal company data. Some chatbots, like Claude, include user inputs in their model training but also claim to have strict privacy policies in place regarding how their models interact with personal information. Be sure to check a chatbot’s privacy policy: do they save or train models on user prompts? If so, what implications might that have for how you do or do not use them with your intellectual property?
- Writing with AI: Using a chatbot to brainstorm, draft, and edit papers is becoming a common practice. Publishers are developing policies about how and when LLMs may be used in writing and when usage needs to be cited. Many are also clear that AI chatbots cannot be considered co-authors because they cannot assume responsibility for a written work.
- Patenting AI tools or AI-assisted inventions: Some researchers are using chatbots to help generate new ideas for patents. However, these ideas need to be carefully examined for reliability before they are submitted. In addition, researchers must review the Patent and Trademark Office’s Guidance for AI-Assisted Inventions. It’s important to document your uses of AI and evidence of ideas coming from you, the human researcher. If you want to patent your own AI tool, you’ll need to demonstrate transformativeness.
Labor
Labor concerns show up not only in the exploitation of workers who perform LLM content moderation, but also in the implications of replacing human workers and in the role of collaborative research in training the next generation of researchers. Where do labor ethics show up in our research designs?
Transparency
Commercial AI models are often built as “black boxes,” meaning that their architecture and outputs cannot be explained. If we as researchers are responsible for our AI outputs, how do we ensure transparency not only in our usage but also in the tools themselves?
- For any use of AI: Document and publish your decision-making alongside your research. This might be in a methods or acknowledgments section of an article, or in the form of a readme file accompanying your dataset in a digital repository. There are opportunities here to help shape best practices that scholars at Duke can model now.
- Use an open source LLM: Some of the ethical issues that arise from LLM use stem from the use of commercial products. Because these products are “black boxes,” their processes are impossible to explain, which means researchers cannot adequately study them. Consider using free, multilingual, open source LLMs like BLOOM and Qwen.
- Share AI tools you develop under a Responsible AI License: Created by the folks behind BLOOM, this license incorporates ethical standards into how the code and documentation of an LLM may be reused.
Trustworthiness
From algorithm creation to training data to chatbot responses, AI contains flaws that call into question how much we can trust it. Here are some examples of AI use where trustworthiness needs to be questioned:
- Collecting information: LLMs are prone to “hallucinations” caused by their probabilistic approach to language, so you must review responses for accuracy. For this reason, it’s a good idea to ask chatbots only about topics you know well.
- Chatbots as search tools: A library catalog will provide you with real sources, every time. Chatbots that generate citations offer only a “word salad,” with no guarantee of authenticity: a citation could be fabricated wholly or in part. AI products targeted toward researchers require additional care; those tools may be more accurate but may also return only the most cited publications rather than those that are most relevant, and no chatbot can assess publication quality. If you use chatbots to find information, practice SIFT: Stop, Investigate the source, Find better coverage, and Trace claims, quotes, and media back to their original context.
- Summarizing a publication: Summarizing a publication is a common task researchers ask of chatbots, but it requires some semantic understanding of the text. AI-generated summaries pull information out of context and may miss important points, fail to articulate nuance, or fabricate concepts, leading to further misunderstanding. If you plan to cite a publication, it remains best practice to read it fully so that you can analyze it directly.
- Building AI tools: If you will be building an AI tool as part of, or in support of, your research, consider applying an AI Risk Management Framework to help you identify not only risks but also ways of making your tools more trustworthy.
- Integrity in training data: General LLMs (Copilot, ChatGPT, Claude, etc.) are trained on vast amounts of data that may not be trustworthy or relevant to your research. If you are training your own model, consider your data sources carefully. If you opt to use a dataset that someone else has created, it’s important to assess content relevance and integrity. If you are building a tool that others, especially members of the public, will use, assess the dataset for harmful materials and test the LLM for emergent misalignment.
- Refining an LLM’s dataset: You might want to use retrieval-augmented generation (RAG) to ground an existing LLM in sources relevant to your research topic, rather than training a model from scratch. Because you are adding your own data to an existing model and controlling how those data are used, RAG can reduce the potential for “hallucination,” which is good for building integrity into your system, but these probabilistic errors are still possible. Always check LLM responses for accuracy. Duke’s MyGPT Builder offers one option for creating a RAG LLM. As with all other aspects of AI, be sure to review privacy policies before supplying sensitive data to a RAG system.
- Adjusting an LLM’s “temperature” and “top-p”: “Temperature” in LLM-speak is a setting that adjusts an LLM’s responses along a spectrum from accurate (0-0.3) to creative (0.7-1.0). “Top-p” sets vocabulary usage: it operates on a scale from diverse (0.7-0.99) to quality (0.1-0.3). If you want to tweak a chatbot’s responses, you can add these settings to your prompts (a minimal sketch follows this list). Adjusting temperature may help limit or increase probabilistic errors (“hallucinations”), while adjusting top-p may help produce more or less specific language. Document your usage so that you can include this information in citations and acknowledgments.
- Writing with chatbots: LLMs can be thought partners for many types of writing tasks, from brainstorming and outlining to revising and formatting. Always cite your usage and check the policies governing submissions to funders and publishers before you begin writing. As noted above, text that is primarily or entirely AI-generated is not copyrightable and may be considered an ethics violation.
- Coding with chatbots: As with asking for information, it’s important to validate AI-generated code and check it for security vulnerabilities before putting it into production. Test it yourself and with your team or a colleague. While you can ask a chatbot to fix its code, you may find that it is unable to, so it’s best to work with a coding language you already know so that you can troubleshoot effectively.
- Generating images & charts: As with text, it’s important to check the accuracy of charts or images that you generate with AI. For images, consider your reasoning for creating the image before you begin generating: is it necessary to your research? Do not generate fake images of real events. If you edit or augment an existing image with AI, be sure you have a solid academic reason for doing so and document the changes that you make. In some contexts, such as in historical research, generating or augmenting images may be entirely unethical.
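For those working with temperature and top-p through an API rather than a chat window, the sketch below shows where these parameters go. It uses the OpenAI Python client purely as an example (these guidelines do not prescribe a tool); the model name is a placeholder, an API key is assumed to be set in the environment, and the same parameter names appear in most chat-style APIs.

```python
# A minimal sketch of setting temperature and top_p explicitly in an API call,
# using the OpenAI Python client as one example. Assumes `pip install openai`
# and an API key in the OPENAI_API_KEY environment variable; the model name
# is a placeholder. Record the values you use so you can report them.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; use whatever model your project has approved
    messages=[{"role": "user", "content": "List three limitations of survey chatbots."}],
    temperature=0.2,  # toward the "accurate" end of the range described above
    top_p=0.9,        # toward the "diverse" end of the vocabulary range described above
)

print(response.choices[0].message.content)
```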
Unintended Consequences
Any research can create unintended consequences, good or bad. How do we think about these outcomes, and do we need to think differently about them, when incorporating AI into our research?
Conclusion
In order to conduct research responsibly, it’s critical that we weigh the benefits and risks of using AI in research before we open the chatbot window or command line. There are many considerations to account for. When in doubt, seek out assistance from the Office of Research & Innovation and/or Duke Libraries.
Discover the resources here and visit often for updates
- AI and Teaching at Duke
- AI at Duke (Centralized Resource Hub)
- AI Health Events
- ABCDS
- Generative AI interest group in Teams
- ICMJE Guidelines
- NIH: Artificial Intelligence in Research: Policy Considerations and Guidance
- NIH Notice: Application Process
- NSF Notice: Grant Review
- NIH Notice: Grant Review
- Guidance from Duke Health Technology System
- AI and Academic Research Town Hall (April 25, 2023)
- NIH FAQs
Do you know of additional resources of use to the Duke research community? Contact researchinitiatives@duke.edu with ideas.
Bibliography & Further Reading
“9 Takeaways about AI Energy and Water Usage.” AI Impact Risk. n.d. https://ai-impact-risk.com/ai_energy_water_impact.html.
Abrams, Zara. “Addressing Equity and Ethics in Artificial Intelligence.” Monitor on Psychology 55, no. 3: 2024. https://www.apa.org/monitor/2024/04/addressing-equity-ethics-artificial-intelligence.
“Addressing Bias in AI.” Kansas University Center for Teaching Excellence, n.d. https://cte.ku.edu/addressing-bias-ai.
Adam, David. “The Automated Lab of Tomorrow.” Proceedings of the National Academy of Sciences 121, no. 17 (2024): e2406320121. https://doi.org/10.1073/pnas.2406320121.
“AI Is Flooding the Zone with Patents. How Can They Be More Reliable?” Duke University School of Law, May 30, 2025. https://law.duke.edu/news/ai-flooding-zone-patents-how-can-they-be-more-reliable.
“AI Risk Management Framework.” National Institute of Standards and Technology Trustworthy & Responsible AI Resource Center, n.d. https://airc.nist.gov/airmf-resources/airmf/.
“Authorship and AI Tools.” COPE: Committee on Publication Ethics, February 13, 2023. https://publicationethics.org/guidance/cope-position/authorship-and-ai-tools.
Belanger, Ashley. “OpenAI Confronts User Panic over Court-Ordered Retention of ChatGPT Logs.” Ars Technica, June 6, 2025. https://arstechnica.com/tech-policy/2025/06/openai-confronts-user-panic-over-court-ordered-retention-of-chatgpt-logs/.
Betley, Jan, Daniel Tan, Niels Warncke, et al. “Emergent Misalignment: Narrow Finetuning Can Produce Broadly Misaligned LLMs.” arXiv:2502.17424. Preprint, arXiv, May 12, 2025. https://doi.org/10.48550/arXiv.2502.17424.
Ciet, Pierluigi, Christine Eade, Mai-Lan Ho, et al. “The Unintended Consequences of Artificial Intelligence in Paediatric Radiology.” Pediatric Radiology 54, no. 4 (2024): 585–93. https://doi.org/10.1007/s00247-023-05746-y.
Coeckelbergh, Mark. “A-Responsible Machines and Unexplainable Decisions,” in AI Ethics. MIT Press: April 7, 2020. https://doi.org/10.7551/mitpress/12549.003.0010.
“Data Provenance Initiative.” Data Provenance Initiative. 2024. https://www.dataprovenance.org/.
Deodato, Joseph. “AI in Publishing.” Rutgers University Library Research Guides, May 29, 2025. https://libguides.rutgers.edu/artificial-intelligence/ai-in-publishing.
“Frequently Asked Questions.” Association for Computing Machinery. n.d. https://www.acm.org/publications/policies/frequently-asked-questions.
Hansen, Dave. “Anthropic Wins on Fair Use for Training Its LLMs; Loses on Building a ‘Central Library’ of Pirated Books.” Authors Alliance, June 24, 2025. https://www.authorsalliance.org/2025/06/24/anthropic-wins-on-fair-use-for-training-its-llms-loses-on-building-a-central-library-of-pirated-books/.
Hardy, Deric. “ChatGPT & Generative AI Tools Collaborative Guide: AI Ethical Considerations.” Duke University Libraries Research Guides, June 16, 2025. https://guides.library.duke.edu/c.php?g=1351161&p=10927308.
Hicks, Maggie. “No, ChatGPT Can’t Be Your New Research Assistant.” The Chronicle of Higher Education, August 23, 2023. https://www-chronicle-com.proxy.lib.duke.edu/article/no-chatgpt-cant-be-your-new-research-assistant.
“How AI Is – and Isn’t – Changing the Patent Landscape.” Duke OTC, n.d. https://otc.duke.edu/news/how-ai-is-and-isnt-changing-the-patent-landscape/.
“How Publishers Are Building AI Regulation—and Who They’re Looking to for Guidance.” Choice 360, June 18, 2025. https://www.choice360.org/libtech-insight/how-publishers-are-building-ai-regulation-and-who-theyre-looking-to-for-guidance/.
“Instructions for Authors.” JAMA Network, July 10, 2025. https://jamanetwork-com.proxy.lib.duke.edu/journals/jama/pages/instructions-for-authors#:~:text=Nonhuman%20artificial%20intelligence,Image%20Integrity.
“Inventorship Guidance for AI-Assisted Inventions.” Federal Register, February 13, 2024. https://www.federalregister.gov/documents/2024/02/13/2024-02623/inventorship-guidance-for-ai-assisted-inventions.
Kim, Eugene. “Amazon Warns Employees Not to Share Confidential Information with ChatGPT after Seeing Cases Where Its Answer ‘closely Matches Existing Material’ from inside the Company.” Business Insider, January 24, 2023. https://www.businessinsider.com/amazon-chatgpt-openai-warns-employees-not-share-confidential-information-microsoft-2023-1.
Letter from Robert J. Kasunic to Van Lindberg. February 21, 2023. https://www.copyright.gov/docs/zarya-of-the-dawn.pdf.
Li, Jingquan. “Security Implications of AI Chatbots in Health Care.” Journal of Medical Internet Research 25 (November 2023): e47551. https://doi.org/10.2196/47551.
“Library IT vs. the AI Bots.” UNC University Libraries, n.d. https://library.unc.edu/news/library-it-vs-the-ai-bots/.
Lu, Jing. “Artificial Intelligence (AI): How to Cite AI Generated Content.” Purdue University Libraries Research Guides, n.d. https://guides.lib.purdue.edu/c.php?g=1371380&p=10135074.
Mehrotra, Dhruv. “Perplexity Is a Bullshit Machine.” Wired, June 19, 2024. https://www.wired.com/story/perplexity-is-a-bullshit-machine/.
Møberg Jacobsen, Rune, Samuel Rhys Cox, Carla F. Griggio, and Niels van Berkel. “Chatbots for Data Collection in Surveys: A Comparison of Four Theory-Based Interview Probes.” Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. April 25, 2025. https://dl.acm.org/doi/full/10.1145/3706598.3714128.
“MyGPT Builder.” Duke University Office of Information Technology, n.d. https://oit.duke.edu/service/mygpt-builder/.
“NOT-OD-25-132: Supporting Fairness and Originality in NIH Research Applications.” National Institutes of Health, July 17, 2025. https://grants.nih.gov/grants/guide/notice-files/NOT-OD-25-132.html.
“Notice On Model Training.” Anthropic, December 17, 2024. https://www.anthropic.com/legal/model-training-notice.
“Notice to Research Community: Use of Generative Artificial Intelligence Technology in the NSF Merit Review Process.” National Science Foundation, December 14, 2023. https://www.nsf.gov/news/notice-to-the-research-community-on-ai.
O’Donnell, James, and Casey Crownhart. “Everything You Need to Know about Estimating AI’s Energy and Emissions Burden.” MIT Technology Review, May 20, 2025. https://www.technologyreview.com/2025/05/20/1116331/ai-energy-demand-methodology/.
O’Donnell, James, and Casey Crownhart. “We Did the Math on AI’s Energy Footprint. Here’s the Story You Haven’t Heard.” MIT Technology Review, May 20, 2025. https://www.technologyreview.com/2025/05/20/1116327/ai-energy-usage-climate-footprint-big-tech/.
Omowole, Agbolade. “Research Shows AI Is Often Biased. Here’s How to Make Algorithms Work for All of Us.” World Economic Forum, July 19, 2021. https://www.weforum.org/stories/2021/07/ai-machine-learning-bias-discrimination/.
Ornes, Stephen. “The AI Was Fed Sloppy Code. It Turned Into Something Evil.” Quanta Magazine, August 13, 2025. https://www.quantamagazine.org/the-ai-was-fed-sloppy-code-it-turned-into-something-evil-20250813/.
Ottenheimer, Davi, and Bruce Schneier. “Data Integrity: The Key to Trust in AI Systems.” IEEE Spectrum, August 18, 2025. https://spectrum.ieee.org/data-integrity.
Perrigo, Billy. “Exclusive: The $2 Per Hour Workers Who Made ChatGPT Safer.” TIME, January 18, 2023. https://time.com/6247678/openai-chatgpt-kenya-workers/.
Proskuryakova, Liliana, Ozcan Saritas, and Elena Kyzyngasheva. “Figure 1. STEEPV Framework with Examples of What Is Covered under Each...” ResearchGate, 2015. https://www.researchgate.net/figure/STEEPV-framework-with-examples-of-what-is-covered-under-each-category_fig1_274566158.
“RAPID Virtual Machines (VMs).” Duke University Research Computing, n.d. https://rescomp.pages.oit.duke.edu/rthelp/services/rapid/.
ruv. “Cheat Sheet: Mastering Temperature and Top_p in ChatGPT API - API.” OpenAI Developer Community, April 22, 2023. https://community.openai.com/t/cheat-sheet-mastering-temperature-and-top-p-in-chatgpt-api/172683.
Strickland, Eliza. “With Robots.Txt, Websites Halt AI Companies’ Web Crawlers.” IEEE Spectrum, August 31, 2024. https://spectrum.ieee.org/web-crawling.
Stokel-Walker, Chris. “Can a Technology Called RAG Keep AI Models from Making Stuff Up?” Ars Technica, June 6, 2024. https://arstechnica.com/ai/2024/06/can-a-technology-called-rag-keep-ai-models-from-making-stuff-up/.
“Terms of Use.” OpenAI, December 11, 2024. https://openai.com/policies/row-terms-of-use/.
“The Duke AI Suite.” Duke University Office of Information Technology, n.d. https://oit.duke.edu/AI-suite/.
Thiel, David. “Identifying and Eliminating CSAM in Generative ML Training Data and Models.” Stanford Digital Repository. 2023. Available at https://purl.stanford.edu/kh752sm9123. https://doi.org/10.25740/kh752sm9123.
“Tools Such as ChatGPT Threaten Transparent Science; Here Are Our Ground Rules for Their Use.” Nature 613, no. 7945 (2023): 612–612. https://doi.org/10.1038/d41586-023-00191-1.
“Copyright and Artificial Intelligence: Part 2: Copyrightability: A Report of the Register of Copyrights.” United States Copyright Office. January 2025. https://www.copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-2-Copyrightability-Report.pdf.
user156499. “Copyright Issues When Asked to Summarise the Book - Prompting.” OpenAI Developer Community, February 15, 2024. https://community.openai.com/t/copyright-issues-when-asked-to-summarise-the-book/627583.
Wayne State University Libraries. “SIFT: Evaluating Web Content.” YouTube, June 16, 2020. https://www.youtube.com/watch?v=-NAkkcxbM5k.
“What Is AI Bias?” IBM, December 22, 2023. https://www.ibm.com/think/topics/ai-bias.
“What Is Fair Use?” Stanford Copyright and Fair Use Center, April 4, 2013. https://fairuse.stanford.edu/overview/fair-use/what-is-fair-use/.
“What Uses More? Compare the Environmental Footprint of Digital Tasks.” What Uses More? n.d. https://what-uses-more.com/.
“When AI Gets It Wrong: Addressing AI Hallucinations and Bias.” MIT Sloan Teaching & Learning Technologies, n.d. https://mitsloanedtech.mit.edu/ai/basics/addressing-ai-hallucinations-and-bias/.
“Why Nature Will Not Allow the Use of Generative AI in Images and Video.” Nature 618, no. 7964 (2023): 214–214. https://doi.org/10.1038/d41586-023-01546-4.
Acknowledgments
Hannah L. Jacobs (Digital Humanities Consultant, Duke Libraries) drafted this resource. Jenny Ariansen (Director of Research Integrity, ASIST), Kate Dickson (Copyright & Information Policy Librarian), Hannah Rozear (Librarian for Biological Sciences, Global Health, and Artificial Intelligence Learning), and Anne Washington (Rothermere Associate Professor in Technology Policy, Sanford School of Public Policy) provided key feedback.