Many workers on platforms like Amazon Mechanical Turk appear to be using AI language models such as GPT-3 to complete their tasks. Because this work often ends up as training data for machine learning models, the practice raises concerns about reduced output quality and increased bias.
Human Labor & AI Models:
- AI systems depend heavily on human labor, which many corporations source through platforms like Amazon Mechanical Turk.
- Workers on these platforms perform tasks such as data labeling and annotation, transcribing, and describing situations.
- This data is used to train AI models, allowing them to perform similar tasks on a larger scale.
Experiment by EPFL Researchers:
- Researchers at the École polytechnique fédérale de Lausanne (EPFL) in Switzerland conducted an experiment involving workers on Amazon Mechanical Turk.
- The workers were tasked with summarizing abstracts of medical research papers.
- A significant portion of the submitted work appeared to be generated by AI models, possibly to let workers complete more tasks and increase their income.
How the Researchers Detected AI Use:
- The research team developed a methodology to determine whether submitted work was human-generated or AI-generated.
- They trained a classifier to distinguish human-written from AI-written text, and they logged keystrokes to detect whether workers pasted in text copied from AI systems.
- The classifier's predictions were validated by cross-checking them against the collected keystroke data.
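As an illustration only (not the EPFL team's actual pipeline), the sketch below shows one common way such a human-vs-AI text classifier might be built: TF-IDF n-gram features fed into a logistic regression model with scikit-learn. The example texts and labels are toy placeholders; in practice the training data would be summaries with known provenance.

```python
# Illustrative sketch only -- not the EPFL researchers' actual classifier.
# Label convention assumed here: 0 = human-written, 1 = AI-generated.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Toy placeholder data; real training data would be summaries whose origin
# is known (e.g. written under observation vs. produced by prompting an LLM).
human_summaries = [
    "The study followed 120 patients over two years and found modest benefits.",
    "Results were mixed, with some side effects reported in the treatment group.",
]
ai_summaries = [
    "This comprehensive study demonstrates significant improvements across all outcomes.",
    "The findings highlight the importance of further research in this promising area.",
]

texts = human_summaries + ai_summaries
labels = [0] * len(human_summaries) + [1] * len(ai_summaries)

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.5, random_state=0, stratify=labels
)

# Pipeline: word uni/bi-gram TF-IDF features into a linear classifier.
clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=1),
    LogisticRegression(max_iter=1000),
)
clf.fit(X_train, y_train)

print(classification_report(y_test, clf.predict(X_test), zero_division=0))
```

The keystroke cross-check described above could, in principle, be as simple as flagging submissions where large blocks of text appear without corresponding typing activity, though the paper's exact heuristics may differ.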
The Drawbacks and Future of Using AI in Crowdsourced Work:
- Training AI models on data generated by other AI could result in a decrease in quality, more bias, and potential inaccuracies.
- Responses generated by AI systems are seen as bland and lacking the complexity and creativity of human-generated responses.
- Researchers suggest that as AI improves, the nature of crowdsourced work may change with the potential of AI replacing some workers.
- The possibility of collaboration between humans and AI models in generating responses is also suggested.
The Importance of Human Data:
- Human data is considered the gold standard because it is representative of the humans that AI is meant to serve.
- The researchers emphasize that what they often aim to study in crowdsourced data is precisely the imperfections of human responses.
- This suggests that measures may be introduced in the future to prevent AI use on such platforms and ensure that genuinely human data is collected.
PS: I run an ML-powered news aggregator that uses AI to summarize the best tech news from 40+ outlets (TheVerge, TechCrunch…). If you liked this analysis, you’ll love the content you’ll receive from this tool!
That’s interesting, I hadn’t thought of that aspect before.