Why Big Data will not have all the answers for consumer marketing

Written By Josh Toh Junxian

Elliot, a fictional character from USA Network's "Mr Robot" [1], is a vigilante hacker who suffers from a social anxiety disorder. Elliot compensates for the skills he lacks in real-life communication with his hacking abilities. He constructs his interpretation of the real world by tapping into the digital lives of his friends and co-workers. While he gets it right at times, his conjectures are more frequently distorted or inflated.

"Mr Robot", with an ensemble of writers who are real-life hackers, has been widely credited for its hacker realism [2]. Yet the show’s rise to popularity is attributed more accurately to its timely debut in a world that is increasingly apprehensive of its burgeoning digital footprint [3]. On the flip side, businesses are jumping on the bandwagon to capture and analyze Big Data for consumer insights to better their trade.

(Video source [24])

It is from this cultural reference that we discuss the limitations of consumer Big Data, consisting of personal, social and transactional information, in providing genuine marketing insights for businesses.

In an infamous article, Anderson, Editor-in-Chief of Wired [4], writes:

This is a world where massive amounts of data and applied mathematics replace every other tool that might be brought to bear. Out with every theory of human behavior, from linguistics to sociology. Forget taxonomy, ontology, and psychology. Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves.

This bold claim has drawn a backlash of critiques, even from advocates of quantitative research [5;6]. The inclination to regard technically synthesized or selected data as more neutral, and hence more reliable, to the point of making theory redundant, has been termed Dataism [6]. Opposing voices challenge the underlying assumption that Big Data alone can provide all the answers [6]. They fear that legitimizing a direct relationship between Big Data and humans will lead to inadequate profiles being created to represent real individuals [7;8]. When handled carelessly, Big Data can distort knowledge of the consumer by altering the lens through which an accurate understanding is formed [9].

To put it simply, they think Big Data does not have all the answers.

To elaborate, let’s examine these discrepancies across three types of consumer Big Data: articulated, behavioral and profile data.


1) Articulated data are riddled with ego and fear

Social media, websites and forums present a database of consumer conversations for marketers. They look to these Big Data for answers to questions such as: What do consumers look for in a product or service? What are their points of satisfaction and pain in relation to the brand? What motivates brand advocates and detractors?

However, user-generated digital selves have been found to vary across platforms and may deviate from their owners' real-life identities [10]. In fact, much like delivering a speech on stage in front of an audience, online articulation is largely motivated by the need to present oneself before others. As a result, the Big Data collected may often be tainted by layers of messaging crafted to win the approval of others [11].

In addition, consumers are increasingly aware of police agencies that monitor publicly articulated data. Because data recorded online endure, users must think carefully before they post, to avoid the risk of Big Data being used against them not only at present but also in any foreseeable future event [7]. In fact, the ‘right to be forgotten’, a phrase coined by Viktor Mayer-Schönberger [12] and made popular by policy makers and by non-governmental and civil rights organizations [13], is a topic of much debate and a major concern among users. This in turn results in a possible ‘chilling effect’, characterized by online articulation that is restrained and self-censored to guard against the dangers of unintended disclosure [14]. To a large extent, such data reflect a filtered public engagement with internet technologies [9] and may not accurately represent the consumer for market research purposes.

2) Behavioral data are contextual

A simple online errand readily yields a trail of behavioral data. With the help of cookies and website tracking tags, marketers seek answers from Big Data to questions such as: How do consumer behaviors differ across digital platforms? How do historical transactions influence their subsequent actions? How do they respond to an ad?

Digitally savvy consumers Joe and Ben are both looking to buy lightbulbs. While surfing the web, they encounter the same well-crafted ad for discounted lightbulbs from a nearby retailer. Since a lightbulb is a low-ticket item, the instrumentalist approach may be to click the ad and make the purchase, unless the two are differently motivated. Joe used to have a 50-watt bulb but is now looking for a higher-watt bulb that will not burn the house down. He is looking for a trusted brand that can provide quality advice for his purchase. Ben, on the other hand, would like to experiment with colored bulbs but is unsure of his decision. He is looking for a brand that offers a hassle-free return policy.


While there is certainly value in analyzing abstract Big Data, both in its holistic form and in its finer forms for hyper-personalization, behavioral data loses some of its meaning when taken out of context [6]. Each data point is created within a unique production setting that is influential in shaping consumer decisions. Big Data, which presents a sheer volume of peripheral information for consideration, makes the analysis of each individual context difficult to manage effectively at scale. Retaining contextual findings is even more challenging when data are condensed to fit into analytical models [5].

Consumer researchers are encouraged to consider the relationships between objects, physicality, and the different contexts and rituals of life when evaluating virtual behaviors. Big Data, when studied exclusively, risks becoming detached from social reality. In other words, when real-life scenarios are reduced to an oversimplification or, at best, an approximation, the quality of the analyses and insights derived from Big Data is expected to decline [6].

3) Profile data are bounded

In the digital marketplace, it is increasingly common for consumers to trade their personal information in exchange for access to a website, white paper, eBook or webinar. On social media, consumer profile data including age, gender, marital status, country, occupation and preferences are also often revealed in constructing a symbolic digital representation. Marketers leverage Big Data to answer questions such as: Who are likely consumers for the brand? Where can these consumers be found online? When is the best time to release a new product?

However, there is an ongoing challenge in acquiring accurate profile data. Studies have shown that most consumers have at some point resisted providing their personal data. In fact, a large proportion of them have admitted to falsifying information in exchange for access to online resources [15]. A lack of trust, based on a general impression of the soliciting website, was found to be a significant hurdle to data disclosure. However, exchange fairness, in the form of a social contract that gives users the ability to control the disclosure of their data, does help alleviate privacy concerns [15].


In addition, an excessive focus on Big Data risks becoming an end in itself. The specialized tools of Big Data in particular carry inherent limitations and restrictions that need to be carefully considered before data collection [5]. One such consideration is understanding the sample. For example, social media users do not represent ‘all people’ but rather a specific subset that depends on the digital platform chosen for Big Data mining. There are also complexities in interpreting accounts and users. A single user can have multiple accounts, while multiple users can manage a single account. Accounts can also be fictitious, fashioned as ‘bots’ that replicate content without direct human intervention. The way social media platforms release their data can further undermine sampling effectiveness. Twitter Inc., for example, only provides a fraction of user information for public access through its APIs [5].
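The gap between accounts and users described above can be made concrete with a toy example. The sketch below is purely illustrative: the `Account` type, the handles and the `operators` field are hypothetical, and the field assumes a ground truth (who actually posts from each account) that a researcher mining a platform usually cannot observe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    handle: str            # hypothetical account name
    operators: frozenset   # people posting from this account; empty for a bot


def summarize(accounts):
    """Compare the number of accounts with the number of distinct people behind them."""
    people = set()
    bots = 0
    for account in accounts:
        if not account.operators:
            bots += 1  # no human operator: content is replicated automatically
        people |= account.operators
    return {
        "accounts": len(accounts),
        "bots": bots,
        "distinct_people": len(people),
    }


# Hypothetical sample: one person with two accounts, one shared brand
# account run by two people, and one bot.
ACCOUNTS = [
    Account("@alice", frozenset({"alice"})),
    Account("@alice_alt", frozenset({"alice"})),
    Account("@brand", frozenset({"bob", "carol"})),
    Account("@repost_bot", frozenset()),
]

print(summarize(ACCOUNTS))
```

In this made-up sample, four accounts correspond to three people and one bot, so treating each account as one consumer would misstate the sample even before platform bias is considered.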

As shown above, the underlying assumptions that Big Data are inherently relevant, hygienic and exhaustive in their depiction of consumer phenomena are highly contested in marketing research [6]. Consumer Big Data, as argued by many, possess the potential for valued contribution only when complemented with qualitative research [6].

Thick Data, a term coined by ethnographer Wang, is “data brought to light using qualitative, ethnographic research methods that uncover people’s emotions, stories, and models of their world” [16]. To elucidate the potential value of Thick Data, let us compare it with Big Data from three perspectives.

(Image source [16])

(Image source [16])

1) Breadth // depth

The advantage of Big Data is the availability of a wide variety of data that can be retrieved, handled and responded to in real time [9]. For instance, a popular method for Big Data research on tweets, used by Axel Bruns and colleagues [17], among others, applies Big Data analyses to conceptualize network structures and dynamics. While providing useful insights into the overall landscape of networks, such analyses fail to reveal the underlying meanings that networks, tweets and platforms hold for their users.

In contrast to Big Data, Thick Data can uncover deep research insights at very modest scales. For example, in the ethnographic study conducted by Veinot [18], the researcher shadowed a vault inspector at a hydroelectric utility company to examine the information practices of blue-collar workers and symbolic representations. Veinot’s study was credited for its detailed insights, which help define ‘information practices’ for a minority group and in unconventional research spaces. Despite the low number of participants involved, her work contributes to the field in a way that cannot be achieved by Big Data research on millions of Facebook or Twitter accounts [5].

2) Computationality // philosophy

With the ability to search, combine and cross-reference large data sets, Big Data is often falsely believed to offer a higher form of intelligence, with computations that were not previously possible by human means [5]. Instead, Big Data creates destabilizing aggregates of information based on computationality that, at times, defy philosophical explanation [19]. Computationality, if falsely understood as an ontotheology, can mislead by reframing the constitution of knowledge, the processes used to conduct and analyze research, and the perception of reality [5]. Thick Data, in contrast to Big Data, is grounded in cultural theories that provide the philosophical balance required when it is used in conjunction with Big Data.

3) Facts // interpretation

There is a common misconception that quantitative research generates facts while qualitative research interprets people's stories [5]. In fact, no amount of data is capable of generating insights without first being imagined as data [20]. While Big Data computational scientists tend to assert their conclusions as facts, any mathematical model, no matter how sound, involves a great degree of human interpretation to make sense of the data. Hence not all numbers in Big Data can be assumed to be unbiased in generating insights [5]. For example, Leinweber [21] showed, based on analysis of Big Data, that a correlation exists between fluctuations in the S&P 500 stock index and Bangladesh’s butter production, a relationship that is bogus yet would pass as an insight if judged on the ‘numbers’ alone.
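The mechanism behind such bogus correlations can be sketched in a few lines. The example below is not Leinweber's analysis and uses no real market or butter data: it generates purely random series (the function names, the seed and the sizes are assumptions chosen for illustration) and then searches them for the one that best "explains" an equally random target.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def random_walk(rng, length):
    """A series of cumulative random steps with no underlying meaning."""
    level, walk = 0.0, []
    for _ in range(length):
        level += rng.gauss(0, 1)
        walk.append(level)
    return walk


def best_spurious_match(n_candidates=1000, length=20, seed=0):
    """Search unrelated random series for the one most correlated with a random target."""
    rng = random.Random(seed)
    target = random_walk(rng, length)
    best_index, best_r = -1, 0.0
    for i in range(n_candidates):
        r = pearson(target, random_walk(rng, length))
        if abs(r) > abs(best_r):
            best_index, best_r = i, r
    return best_index, best_r


index, r = best_spurious_match()
print(f"Candidate {index} correlates with the target at r = {r:.2f}")
```

Because every series here is noise by construction, any strong correlation the search reports is an artifact of trying many candidates. Telling such a result apart from a genuine relationship takes interpretation and domain knowledge, not more numbers.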

As observed in its departure from the charts of Gartner’s hype cycle in 2015 [22], Big Data implementation is now widely perceived as a mainstream phenomenon across many industries and practices. As suggested by Suchman [23], “we are our tools.” Hence it is in the best interest of any consumer marketing researcher to critically question the underlying assumptions, values and biases associated with Big Data analyses [5]. Finally, as outlined in this article and in a few others [6], the true promise of Big Data research may well be realized when it is coupled with Thick Data or other qualitative research methods for more holistic outcomes.


1     IMDB (2016). IMDB: Mr Robot, Available Online: http://www.imdb.com/title/tt4158110/ [Accessed 20 November 2016].

2     Fowler, B. (2016). How the TV show Mr. Robot won the prize for hacker realism, Available Online: http://phys.org/news/2016-09-tv-robot-won-prize-hacker.html [Accessed 20 November 2016].

3     Seitz, M. (2016). Why Mr. Robot’s film references are subtler than you think, Available Online: http://www.vulture.com/2016/07/mr-robot-film-reference-close-read.html [Accessed 20 November 2016].

4     Anderson, C. (2008). The end of theory: the data deluge makes the scientific method obsolete, Available Online: https://www.wired.com/2008/06/pb-theory/ [Accessed 20 November 2016].

5     Boyd, D. & Crawford, K. (2012). Critical questions for Big Data, Information, Communication & Society, vol. 15, no. 5, pp. 662-679, Available Online: http://dx.doi.org/10.1080/1369118X.2012.678878 [Accessed 20 November 2016].

6     Lohmeier, C. (2014). The researcher and the never-ending field: reconsidering Big Data and digital ethnography, in Martin Hand, Sam Hillyard (eds.) Big Data? Qualitative Approaches to Digital Research, Emerald Group Publishing Limited, pp.75 - 89, Available Online: http://dx.doi.org/10.1108/S1042-319220140000013005 [Accessed 20 November 2016].

7     Andrejevic, M. (2013). Infoglut: How too much information is changing the way we think and know, New York, NY: Routledge.

8     Gandy, O. (1993). The panoptic sort: A political economy of personal information, Boulder, CO: Westview Press.

9     Trottier, D. (2014). Big Data ambivalence: visions and risks in practice, in Martin Hand, Sam Hillyard (eds.) Big Data? Qualitative Approaches to Digital Research, Emerald Group Publishing Limited, pp.51 – 72, Available Online: http://dx.doi.org/10.1108/S1042-319220140000013004 [Accessed 20 November 2016].

10   Cheung, C. (2000). A Home on the Web, in Web.Studies: Rewiring Media Studies for the Digital Age, ed. David Gauntlett, London, England: Arnold, pp. 43–51.

11   Schau, H. C. & Gilly, M. C. (2003). We Are What We Post? Self-Presentation in Personal Web Space, Journal of Consumer Research, vol. 30, no. 3, pp. 385-404.

12   Mayer-Schönberger, V. (2009). Delete: The virtue of forgetting in the digital age, Princeton, NJ: Princeton University Press.

13   Rosen, J. (2012). The right to be forgotten, Available Online: https://www.stanfordlawreview.org/online/privacy-paradox-the-right-to-be-forgotten [Accessed 20 November 2016].

14   BBC. (2012). Social media laws to be discussed in wake of prosecutions, Available Online: http://www.bbc.co.uk/news/technology-19910865 [Accessed 20 November 2016].

15   Li, H., Sarathy, R. & Xu, H. (2011). The role of affect and cognition on online consumers' decision to disclose personal information to unfamiliar online vendors, Decision Support Systems, vol. 51, no. 3, pp. 434-445, Available Online: http://www.sciencedirect.com/science/article/pii/S0167923611000467 [Accessed 20 November 2016].

16   Wang, T. (2013). Why Big Data needs Thick Data, Available Online: https://medium.com/ethnography-matters/why-big-data-needs-thick-data-b4b3e75e3d7#.g0cjp3j4s [Accessed 20 November 2016].

17   Bruns, A. & Burgess, J. E. (2012). Researching news discussion on Twitter: New methodologies, Journalism Studies, vol. 13, no. 5-6, pp. 801-814.

18   Veinot, T. (2007). The eyes of the power company: workplace information practices of a vault inspector, The Library Quarterly, vol. 77, no. 2, pp. 157–180.

19   Berry, D. (2011). The computational turn: thinking about the digital humanities, Culture Machine, vol. 12, Available Online: http://www.culturemachine.net/index.php/cm/article/view/440/470 [Accessed 20 November 2016].

20   Gitelman, L. (2011). Notes for the Upcoming Collection ‘Raw Data’ is an Oxymoron, Available Online: https://mitpress.mit.edu/books/raw-data-oxymoron [Accessed 20 November 2016].

21   Leinweber, D. (2007). Stupid data miner tricks: overfitting the S&P 500, The Journal of Investing, vol. 16, no. 1, pp. 15–22.

22   Woodie, A. (2015). Why Gartner dropped Big Data off the hype cycle, Available Online: https://www.datanami.com/2015/08/26/why-gartner-dropped-big-data-off-the-hype-curve/ [Accessed 20 November 2016].

23   Suchman, L. (2011). Consuming anthropology, in Interdisciplinarity: Reconfigurations of the Social and Natural Sciences, Routledge, London, Available Online: http://www.lancs.ac.uk/fass/doc_library/sociology/Suchman_consuming_anthroploogy.pdf [Accessed 20 November 2016].

24   Akadiluted. (2015). Mr Robot: Elliot Hacks his psychiatrist Krista, [Video Online] Available Online: https://www.youtube.com/watch?v=eLo-XXvFn2k [Accessed 20 November 2016].