Data: An expensive necessity for AI and Machine Learning

AI and Machine Learning are two of the biggest buzzwords in information technology and startups. Since 2002, more than $3.8B has been raised by AI companies [10]. Nonetheless, the biggest players are established companies such as Google and Facebook. This might just be temporary, as they have the “benefit” of size and economies of scale, but some argue that ownership of data is the real reason. Accumulating useful, high-quality data is not only expensive but also very time consuming. So how can AI and Machine Learning flourish in a free market, and not just inside big corporations with gigantic data sets?

As we learned from our last guest speakers, data centres, which are usually owned by external operators (Google, Amazon, etc.), not only hold a lot of data but have also raised a very high barrier to entry into the data market. Can an AI startup really compete with a company such as Google or Facebook, which generates billions of user-related data points every day that it can use to train smart algorithms? Some organisations, such as the US government (data.gov), are already taking steps towards making data more publicly available to promote innovation. There are concrete examples in history showing that opening up a market can cause great breakthroughs. One example is the move from a Windows-dominated world to the open Web, which led to a surge in software innovation [12].
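
To make the open-data point concrete, here is a minimal sketch (in Python) of how a small team could search that public catalog programmatically. It assumes the CKAN-style package_search endpoint that catalog.data.gov exposes; the query term and the fields read from the response are illustrative, not a definitive integration.

import json
import urllib.parse
import urllib.request

# Search the data.gov catalog (CKAN-style API) for datasets matching a query.
# The query term and row count are placeholders for illustration.
query = urllib.parse.quote("machine learning")
url = f"https://catalog.data.gov/api/3/action/package_search?q={query}&rows=5"

with urllib.request.urlopen(url) as response:
    payload = json.load(response)

# Each result is a dataset ("package") with metadata and downloadable resources.
for dataset in payload["result"]["results"]:
    print(dataset["title"])
    for resource in dataset.get("resources", []):
        print("  -", resource.get("format"), resource.get("url"))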

While many see the problem with restricted data sets and the high entry barrier that comes along with them, arguably no one knows a good solution. Should governments restrict the privatisation of data and/or force companies to open APIs to their data sets? And if that’s the case, which data should be accessible, and with what interconnections? Should we move away from treating large parts of our data as commodities and make them public with distributed technologies such as the blockchain? Or should we just let the market decide and put our trust in the experts who work on developing AI every day?

This blog post raised more questions than it answered. Data ownership is, and will remain, an extremely sensitive and important issue for us to tackle, and the time to start thinking about it is now. It is in all of our interests to make sure that AI is developed by many companies with different interests, not just a few big ones. In the end, I want to be able to choose among a vast set of options and services. What do you think? I look forward to discussing your thoughts and suggestions!

Sources

  1. https://hbr.org/2015/03/data-monopolists-like-google-are-threatening-the-economy
  2. https://www.economist.com/news/briefing/21721634-how-it-shaping-up-data-giving-rise-new-economy
  3. https://itif.org/publications/2017/03/06/myth-data-monopoly-why-antitrust-concerns-about-data-are-overblown
  4. https://www.forbes.com/sites/bernardmarr/2016/12/06/what-is-the-difference-between-artificial-intelligence-and-machine-learning/#20f1074b2742
  5. http://sloanreview.mit.edu/article/how-big-data-is-empowering-ai-and-machine-learning-at-scale/
  6. http://www.huffingtonpost.com/james-canton/from-big-data-to-artifici_b_10817892.html
  7. https://www.quora.com/How-much-data-is-enough-to-train-a-deep-NN-model
  8. https://www.quora.com/What-is-the-recommended-minimum-training-dataset-size-to-train-a-deep-neural-network
  9. https://stats.stackexchange.com/questions/200895/how-much-data-for-deep-learning
  10. https://www.cbinsights.com/blog/artificial-intelligence-top-startups/
  11. https://deeplearning4j.org/data-for-deep-learning.html
  12. https://www.technologyreview.com/s/533856/who-owns-big-data/

One comment on “Data: An expensive necessity for AI and Machine Learning”

  1. Hi Dean – I think you touched on one of the most important problems for companies trying to create sustainable value in the AI space. AI as a technology is somewhat complicated, but isn’t so hard that it becomes a meaningful barrier to entry for competitors. There are lots of software libraries, web sites and online courses that people can use to learn what they need to implement AI and overcome that barrier. Only companies employing the most innovative researchers (Google Brain and DeepMind, OpenAI, Facebook, etc.) can maintain an edge on the technology itself. Everybody else needs to sustain their competitive position in some other way; capturing large amounts of proprietary data is one way to do it.

    While many companies, nonprofits and governments are starting to publish large amounts of data publicly, I don’t think that will be enough to level the playing field. The problem is that there is an advantage to having both gigantic scale and broad scope of data; large enterprises have both, but startups have neither.

    Governments forcing companies to make their data public could create privacy issues and disincentives for companies that accumulate data, which would slow down innovation and investment.

    One approach is for AI-centric startups to work with big tech venture funds (like Google’s Gradient Ventures) to get access to expertise and data to develop their company. Thoughts?

