AI training is expensive: Can only technology “giants” afford it?
23:34 01/06/2024
Researchers say data is the key to building more intelligent and capable AI systems. Two text generation models illustrate the point: Llama 3 from Meta and OLMo from the Allen Institute for Artificial Intelligence (AI2). Although the two have almost the same architecture, Llama 3 was trained on a larger amount of data, so it performs better.
However, data quality is just as important as quantity. AI models operate on the principle of “garbage in, garbage out”, so training data has to be filtered and checked for quality.
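As a rough illustration of what such filtering can look like, here is a minimal Python sketch. It is not the pipeline used for Llama 3, OLMo or any other model. The function names `normalize` and `filter_corpus` and the thresholds `min_words` and `max_symbol_ratio` are hypothetical, chosen only for this example. The sketch drops texts that are too short, texts made up mostly of symbols, and exact duplicates.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def filter_corpus(docs, min_words=5, max_symbol_ratio=0.3):
    """Keep documents that pass three simple checks (illustrative thresholds).

    A document is dropped if it has fewer than `min_words` words, if more than
    `max_symbol_ratio` of its characters are neither letters, digits nor
    whitespace, or if an identical normalized text was already kept.
    """
    seen = set()  # hashes of normalized texts already kept
    kept = []
    for doc in docs:
        norm = normalize(doc)
        # Check 1: too short (also guards against empty strings).
        if not norm or len(norm.split()) < min_words:
            continue
        # Check 2: mostly symbols, which often signals markup or debris.
        symbols = sum(1 for ch in norm if not (ch.isalnum() or ch.isspace()))
        if symbols / len(norm) > max_symbol_ratio:
            continue
        # Check 3: exact duplicate after normalization.
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        kept.append(doc)
    return kept


if __name__ == "__main__":
    sample = [
        "The quick brown fox jumps over the lazy dog.",
        "the quick  brown fox jumps over the lazy dog.",
        "too short",
        "$$$ ### @@@ !!! %%% ^^^ &&&",
        "Data quality matters as much as data quantity.",
    ]
    for text in filter_corpus(sample):
        print(text)
```

Real data pipelines typically go much further, for example with near-duplicate detection and learned quality classifiers, and that extra work is part of what makes high-quality data expensive.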
The race for data can lead to problems. Experts fear that the focus on large, high-quality data sets will concentrate AI development in the hands of a few companies with big budgets. Those companies could monopolize data sets and stifle innovation by others.
Additionally, data collection is sometimes not transparent. Some AI companies have pulled data from sources such as YouTube videos and Google Maps reviews without asking permission from content owners or creators. Some companies are even considering using copyright-protected content to train their models.
Another problem is the use of cheap labor in developing countries to label training data. These workers are paid low wages, receive no benefits and are exposed to violent content for long periods of time.
Commercial data deals are not entirely fair either. OpenAI has spent hundreds of millions of dollars to buy content rights, a sum far beyond the budgets of most research groups, non-profit organizations and startups.
With the AI training data market expected to grow strongly, data platforms are charging higher fees. This hurts the AI research community as a whole because smaller groups cannot afford them.
However, there are some independent efforts to build open data sets that are free for everyone. EleutherAI, a nonprofit research group, is collaborating with the University of Toronto and other institutions to build The Pile v2, a collection of billions of text snippets.
The question is whether these efforts can keep up with the major technology corporations. If data collection and testing still depend on financial resources, the answer is likely no, at least until a research breakthrough levels the playing field.