Ton asked ChatGPT whether it is a good idea for publishers to push academic authors to finish manuscripts and articles before the end of the year because of deals with LLM companies. Spoiler: no, of course not!

I used the exact same prompt on Claude 3.5 Sonnet and got a longer version of basically the same answer: it’s a dumb idea. But to be honest, I think it’s pretty clever for a server with a really large hard disk and lots of power to come up with such a conclusion at the end… Read for yourself.

Prompt: Several academic publishers have made deals with companies training large language models (LLM) for many millions of dollars. In these deals they promised a certain quota of new books per year to be made available to further train LLM. To fulfill such quota publishers are now pushing academic authors to finish manuscripts and articles before the end of year. Including by foregoing proper peer review. Do you think this is a good idea? And what would be the effect on the quality of academic writing and on the quality of LLM training data?

Output: This situation raises several important concerns about the potential impact on academic publishing, research quality, and AI development. Let’s break down the key issues:

  1. Quality of academic work: Rushing academic publications and bypassing proper peer review processes could significantly compromise the quality and reliability of research. Peer review, while not perfect, plays a crucial role in validating methodologies, findings, and conclusions. Skipping this step may lead to:
  • Increased errors or inaccuracies in published work
  • Reduced rigor in research methodologies
  • Less thorough analysis of results
  • Potential publication of premature or incomplete findings
  2. Integrity of the publishing process: The pressure to meet quotas for AI training data could distort the normal academic publishing process. This may lead to:
  • Prioritizing quantity over quality
  • Biasing publication decisions based on fulfilling quotas rather than scientific merit
  • Potentially undermining the credibility of academic publishers
  3. Impact on researchers: This situation could negatively affect researchers by:
  • Increasing stress and pressure to produce work quickly
  • Potentially compromising their academic reputation if rushed work is of lower quality
  • Creating conflicts between maintaining research integrity and meeting publisher demands
  4. Quality of LLM training data: While more data can be beneficial for training LLMs, the quality of that data is crucial. Using rushed or improperly vetted academic work as training data could lead to:
  • Propagation of errors or inaccuracies in AI models
  • Potential biases or skewed understanding of academic fields
  • Reduced overall reliability of the AI’s knowledge base
  5. Long-term consequences: This approach could have broader implications for:
  • The public’s trust in academic research and publications
  • The reliability of AI systems trained on this data
  • The overall progress and integrity of scientific advancement

In my assessment, prioritizing quantity over quality in academic publishing to meet AI training data quotas is not a good idea. The potential negative consequences for research integrity, academic careers, and the reliability of both published work and AI systems trained on this data outweigh the short-term benefits of increased data availability.

A more balanced approach might involve:

  1. Extending timelines for data provision to ensure proper peer review processes are maintained
  2. Focusing on making existing, properly vetted research more accessible for AI training
  3. Developing alternative sources of high-quality training data that don’t compromise academic integrity
  4. Encouraging transparency about the sources and vetting processes for AI training data

Ultimately, maintaining the integrity and quality of academic research should be the priority, as this serves the long-term interests of both the scientific community and AI development.

(end of output)