Chapter 3.10

Data Aggregation and Analysis

"Mass psychology is not simply a summation of individual psychologies; that is a prime theorem of social psychodynamics---not just my opinion; no exception has ever been found to this theorem."

Robert A. Heinlein, Methuselah's Children

These days you hear the buzzword "Big Data" thrown around everywhere, as if it were something new and exciting. It isn't, really; it is just the natural extension of data aggregation and analysis techniques that have existed for decades, applied to the surprising amount of personal information that people share with the world on the internet through social media, blogging platforms, and online communities. What can be inferred from all this data is what the buzz is really about, and it is where many companies make their money.

At its core, Big Data is simple. It starts with basic techniques for parsing the data available on the internet in formats such as HTML, XML, and JSON. Concepts from the semantic web then let developers infer more meaning from this data than would otherwise be possible. Finally, large volumes of this data are aggregated so that global trends and individual preferences can be exploited for profit. Combined with techniques from artificial intelligence for designing knowledge-based systems, this simple set of techniques can be turned into a multi-billion-dollar enterprise.

These kinds of tasks are Lisp's core strength. Symbolic computation is well suited to representing knowledge-based systems and artificial intelligence problems, and it can give newcomers to the Big Data business a significant edge over their monolithic corporate competitors.

In this chapter we will look at the challenges of scraping websites and XML and JSON feeds; parsing the data intelligently; storing and indexing large quantities of data; analyzing and graphing that data in meaningful ways; and writing AI agents that can automate this process for you. We will also look at privacy considerations and at your responsibility to handle, use, and protect personal information appropriately.

Exercise 3.10.1

Lisp-Based HTTP Clients



Exercise 3.10.2

Scraping the Web



Exercise 3.10.3

Parsing XML and HTML



Exercise 3.10.4

Parsing JSON



Exercise 3.10.5

Data Aggregation



Exercise 3.10.6

Targeted Data Mining



Project 3.10.7

An Extensible Knowledge Engine


