Apple has made its position clear in response to recent claims that it uses YouTube videos to train AI models. The tech giant confirmed that a specific dataset, which included YouTube subtitles, was used to train its open-source OpenELM language model. However, this model does not contribute to any consumer-facing AI or machine learning features.
OpenELM from Apple: Not a Consumer Good, But a Research Tool
Apple emphasized that OpenELM was developed as a research tool and does not power any of its customer-oriented products, including Apple Intelligence. This clarification follows a Wired report, based on a Proof News investigation, which revealed that several tech companies, including Apple, had utilized subtitles from thousands of YouTube videos in their AI training processes.
YouTube Data: A Small Part of a Diverse Training Set
YouTube subtitles made up only a small fraction of the training dataset, which drew on a wide range of sources: lecture transcripts from MIT and Harvard, news coverage from The Wall Street Journal and NPR, and content from popular YouTubers. This breadth was intended to give the AI models a comprehensive training ground.
Balancing Innovation and Privacy: Apple’s Approach to AI
Reiterating its commitment to user privacy, Apple said that its web crawler collects only data that is publicly available or licensed for use in training Apple Intelligence models. The company insists that neither user interactions nor private user data are used to train its AI models.
OpenELM: Advancing Open-Source AI Development
Apple’s OpenELM language model uses a layer-wise scaling strategy that varies how parameters are allocated across the layers of the transformer, improving accuracy for a given parameter budget. By releasing the model publicly, Apple aims to support the broader AI research community and foster progress in open-source large language model development.
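To make the idea concrete, here is a minimal Python sketch of what layer-wise scaling can look like: instead of giving every transformer layer the same width, per-layer attention-head counts and feed-forward sizes are interpolated across the depth of the network. The function name and the specific bounds are illustrative assumptions, not Apple's published OpenELM configuration.

```python
# Illustrative sketch of layer-wise scaling, not Apple's actual implementation.
# Early layers receive fewer parameters and later layers receive more, rather
# than allocating a uniform width to every layer.

from dataclasses import dataclass


@dataclass
class LayerConfig:
    num_heads: int          # attention heads in this layer
    ffn_multiplier: float   # FFN hidden size as a multiple of the model dimension


def layer_wise_scaling(
    num_layers: int,
    min_heads: int = 4,       # placeholder bound, not an OpenELM value
    max_heads: int = 16,      # placeholder bound, not an OpenELM value
    min_ffn_mult: float = 0.5,
    max_ffn_mult: float = 4.0,
) -> list[LayerConfig]:
    """Linearly interpolate per-layer widths from the first layer to the last."""
    configs = []
    for i in range(num_layers):
        t = i / max(num_layers - 1, 1)  # 0.0 at the first layer, 1.0 at the last
        heads = round(min_heads + t * (max_heads - min_heads))
        ffn_mult = min_ffn_mult + t * (max_ffn_mult - min_ffn_mult)
        configs.append(LayerConfig(num_heads=heads, ffn_multiplier=ffn_mult))
    return configs


if __name__ == "__main__":
    for depth, cfg in enumerate(layer_wise_scaling(num_layers=8)):
        print(f"layer {depth}: heads={cfg.num_heads}, ffn_mult={cfg.ffn_multiplier:.2f}")
```

The design intent is that shallow layers, which tend to capture simpler patterns, need fewer parameters than deeper layers, so spreading the same total budget unevenly can yield better accuracy than a uniform allocation.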