# Coding for data

This is the textbook for a course teaching _data science_: coding for data.

We go into some detail about what we mean by "data science" in the [next
section](what-is-data-science), but here is the one-line summary:

> Data science is an approach to data analysis with a foundation in code
> and algorithms.

The textbook aims to teach you this approach.

We designed the textbook so you can learn from it without taking the course. If
you read this textbook carefully and do the exercises, you will have a solid
foundation for learning more about data science.

## The background you need

You do not need any previous experience in programming to use this book. We
aim to teach you the programming you need as we go.

The book deliberately uses very little formal mathematics. Instead, we show
how the procedures work with code. We hope you agree that this can be much
easier to understand, especially for those without much background in
mathematics.

## A summary of the book

We start by asking some questions about the real world that force us to think
about the effects of chance. Then we use some simple code to ask the computer
to _simulate_ the effects of chance. We find that this allows us to draw
important conclusions about events in the real world.
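As a small taste of what simulating chance looks like, here is a minimal sketch using only Python's standard `random` module. The question is a hypothetical one chosen for illustration: if you toss a fair coin 10 times, how often do you get 7 or more heads? Instead of working out the answer with probability formulas, we ask the computer to repeat the experiment many times and count. The function name `simulate_heads` and its parameters are our own illustrative choices.

```python
import random


def simulate_heads(n_tosses=10, n_trials=10_000, seed=42):
    """Estimate the chance of at least 7 heads in n_tosses fair coin tosses.

    Hypothetical example: repeat the experiment n_trials times and
    return the proportion of trials with 7 or more heads.
    """
    rng = random.Random(seed)  # seeded generator, so results are repeatable
    count = 0
    for _ in range(n_trials):
        # Each toss is heads with probability 0.5.
        heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
        if heads >= 7:
            count += 1
    return count / n_trials


estimate = simulate_heads()
print(estimate)
```

The exact answer for this question is 176 / 1024, about 0.17, and the simulated proportion should land close to it. Adding more trials brings the estimate closer still.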

We continue with this approach throughout the textbook. First we look at some
data from the real world, then we think about the procedures we would need to
draw conclusions from these data. Next we go through the code you need,
implement the procedures in code, and draw our conclusions.

By the end of the course, we will have covered simulation of the real world
using random numbers, drawing conclusions about differences between groups
using random permutation, looking for straight-line relationships using
regression, and allowing for multiple explanatory measures with multiple
regression. We also look at how to express the uncertainty of our estimates
using confidence intervals.

## Inspired by the Berkeley textbook

This textbook draws heavily on the approach in the excellent [Computational and
Inferential Thinking](https://www.inferentialthinking.com) textbook for the
data science course at the University of California, Berkeley: [The Foundations
of Data Science](http://data8.org/). As noted in various places, some pages in
this textbook are early versions of pages in the Berkeley textbook. Many thanks
to the main authors, [Ani
Adhikari](http://statistics.berkeley.edu/people/ani-adhikari) and [John
DeNero](http://denero.org). [David Wagner](https://www.cs.berkeley.edu/~daw)
wrote a [truly beautiful chapter on machine learning and
classification](https://inferentialthinking.com/chapters/17/Classification.html),
which you can find in the current Berkeley textbook. Previous editions of this
textbook had versions of those pages, but we removed them: they were such
a substantial and original piece of work that we felt uncomfortable including
them in a textbook that is largely our own.

## Many sections are interactive

You can interact with many of the sections in this textbook using the
"Interact" button at the top of the page. This takes you to a free online
service where you can execute the code in the section to generate the tables
and figures. We encourage you to play with these interactive sections by
changing the code and running it.
