This course in statistics is intended for all Honours students in Economics. The aim is to get students to understand what the discipline of statistics is, why it is important, and how to use it, especially with economic data. The specialised application of statistics to economics is called econometrics, and it is the topic of later courses in the Honours program, although students in the Major, and even the Minor, often take those courses as well.
The mathematical requirements for the course are not very heavy, but students should have a reasonable knowledge of, and ability to work with, both the differential and the integral calculus. Some acquaintance with linear algebra, particularly matrix algebra, is also desirable.
Course Outline:
The course outline can be found by following this link for the PDF version; it is alternatively available from this link in HTML. Although the outline has official status regarding administrative matters, this website is the most important resource: it will be updated regularly with information, assignments, and so forth.
Announcements:
This is just a reminder that this webpage covers only the first term of Economics 257. The evaluation for the full-year course will not be complete until after the exam session at the end of the winter term.
The midterm exam is scheduled in class time 08.30-10.00 on Monday October 20.
Our TA for this term is Miroslav Zhao. His email is miroslav.zhao@mail.mcgill.ca. His office hours are on Wednesdays 16.30-17.30 in Leacock 112.
My own office hours, in Leacock 321C, are on Tuesdays and Thursdays from a little after 10.00 until a little before 13.00.
Textbooks:
There is no single textbook required for the course. Two that are suitable are as follows.
Here you will find the appendix to Chapter 7 of the textbook, with the proofs of the theorems, including the Chebychev inequality.
The complete Chapter 10 of the textbook is found at this link.
Exercises
In response to requests for exercises that help you understand statistics better and prepare for assignments and exams, here are some sources.
Software:
I have no specific instructions or recommendations about appropriate software, for assignments or other uses. If you have no preferred software of your own, or if you are having trouble with the available software for running regressions, simulations, and so on, you might like to try my own software, Ects. The documentation is available, most (though not all) of it in English, and all of it in French. For convenience, you can find the first volume here (in English), and the second volume here.
The paper reached by following this link is now a bit old (2009), but it contains a pretty comprehensive list of software available for econometrics and statistics.
Log of material covered:
Our first class was held on September 3. After the usual preliminaries, we embarked on Chapter 1 of Galbraith's textbook, entitled Statistical Reasoning. A set of examples is presented, in which various circumstances are described where statistical reasoning is useful.
The first, on gambling and lotteries, allowed us to think about how notions of probability and statistics were originally developed, by people who hoped to make money by gambling. They were of course disappointed. A much subtler example came next, in which the idea of information in uncertain situations was introduced. Giving qualitative and quantitative descriptions of numerical data sets is obviously important, and an example led to the concept of the density of a sample. The distinction between a population and a sample drawn from it was made, and we were led to formulate the question of how information about the population can be inferred from a sample.
On September 8, we continued and finished Chapter 1 of the textbook. First, we looked at the example on memory in random processes, distinguishing successions of coin tosses, without memory, from weather forecasts, or predictions about the stock market, where memory of various sorts may have an influence.
The next two examples dealt with association and conditional association. The concept of covariates was introduced, and it was pointed out that, for statistical conclusions to have any validity, it is necessary to control for one, or more likely many, covariates. In an experimental situation, this is easy to do, but, when one has to rely on observations that one cannot control, things are typically much harder. What is considered the best way to proceed, if possible, is to run an RCT, or randomised controlled trial, with a treatment group and a control group.
The last example in the chapter was about prediction and forecasting. A couple of examples from machine learning illustrated this.
In Chapter 2, different types of economic or financial data are discussed. In macroeconometrics, a very common data type is time series. An economic variable like GDP, or the inflation rate, is observed at different points in time, and the results grouped into an ordered set, with each observation carrying a time stamp. At the other extreme, the data in a cross-section are all collected at the same time from different entities, like households, firms, provinces, etc. Panel data are collections of observations on a set of cross-sectional units that are observed through time, so that each observation carries two indices, the time, and one that corresponds to the cross-sectional unit.
We saw some examples of graphical presentations of these data types, noting that three dimensions are needed for panel data. Some of these presentations are hard to interpret, unless the data set is ordered in some way.
We completed the study of Chapter 2 on September 10. Most of our time was spent on transformations of data series. One such is motivated by the fact that a particular series (US GDP seasonally adjusted) is closely matched by an exponential growth function. The actual transformation is to replace the raw data by their logarithms. After this, exponential growth looks more like a straight line.
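To see why, note that if a series grows at a constant exponential rate, its logarithm is a linear function of time:
\[
  Y_t = A\,e^{g t} \quad\Longrightarrow\quad \log Y_t = \log A + g\,t,
\]
so that, after the transformation, a plot of the series against time is (approximately, for actual data) a straight line with slope equal to the growth rate g.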
Another transformation replaces nominal sums of money, or growth rates, by real values, which take account of inflation. A base period has to be selected, in which real and nominal coincide. Seasonally adjusted data are made available by statistical agencies in an attempt to separate seasonal variation from variation caused by other economic activity.
It is often desirable to transform a time series in levels into one of proportionate changes, as this can uncover features of the series that are not easily visible in the levels. Sometimes the reverse is the case.
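A related point, worth recording here, is that the proportionate change from one period to the next is well approximated by the difference of the logs when the change is small:
\[
  \frac{Y_t - Y_{t-1}}{Y_{t-1}} \;\approx\; \log Y_t - \log Y_{t-1},
\]
which is one reason why the log transformation and growth rates go hand in hand.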
Financial data are very different from macroeconomic data, and are much more precise. This means that we often need different techniques to deal with them.
The mathematical material on the exponential and logarithmic functions started us off on September 15. A few corrections were made to the Appendix to Chapter 2 of the textbook.
Chapter 3 deals with various summary statistics that can be used to describe, or characterise, a data set. First come measures of central tendency: these include the sample mean, median, and mode. The quantiles of a distribution are defined in terms of the order statistics, and they include quartiles, quintiles, deciles, vigintiles, and percentiles. Then came measures of dispersion, with the variance and its square root, the standard deviation, as well as the range and the inter-quartile range. Box-whisker plots were introduced at this point, after which came the coefficients of skewness and kurtosis.
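For reference, if the sample is y_1, ..., y_n, the usual definitions (conventions differ slightly across textbooks, for instance in the choice of divisor) are
\[
  \bar y = \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad
  s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar y)^2,
\]
with the coefficient of skewness defined as the third moment of the y_i about \bar y divided by s^3, and the coefficient of kurtosis as the fourth such moment divided by s^4, so that both are dimensionless.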
It was pointed out on September 17 that the coefficients of skewness and kurtosis are dimensionless as defined. This is not the case for the variance, or for the covariance of two variables. For the covariance, this can be corrected by using instead the correlation, which is dimensionless and is also restricted to the [-1, 1] interval.
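In symbols, if s_{xy} denotes the sample covariance of two variables x and y, and s_x and s_y their standard deviations, then the correlation is
\[
  r_{xy} = \frac{s_{xy}}{s_x\,s_y}, \qquad -1 \le r_{xy} \le 1,
\]
and dividing by the product of the standard deviations is what removes the units of both variables.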
At the end of Chapter 3, we are sternly warned that correlation in no way implies causation. Causation has no sense outside of a model that tries to account for empirical reality. Before one can infer causation, it is necessary to recognise some mechanism whereby the cause can give rise to the effect.
Next came Chapter 4, which is of a philosophical nature. The main themes come from the work of Karl Popper. It is impossible to prove that some theory is true, but one single example can show that it is false. For any finite set of empirical facts, it is in principle possible to find an infinite number of theories that are compatible with these facts. Scientific progress is thus achieved when theories are falsified, which can be achieved with a small amount of empirical evidence. A notable example is how Newton's laws were falsified in the early twentieth century by observations that were compatible with Einstein's theories of relativity.
A problem for philosophers is that of induction. This term refers to the way humans like to generalise from some specific examples to formulating theories or hypotheses meant to apply generally. Since a theory can never be proved, induction is not a foolproof method, and it may well lead to a theory that is later falsified. We started with this, as discussed at the end of Chapter 4, on September 22.
We then embarked on Part II of the textbook, beginning with Chapter 5. The topic of the chapter is Probability Theory. Mathematical probability is a formalisation of the idea of frequency - how many times does a coin come down heads in a large number of tosses? These tosses constitute a random experiment, of which the outcome is not known in advance with certainty. We may have a notion that the coin is fair, so that the probability of heads is one half. But if we carry out the experiment, we may find that far more than half of the tosses are heads, or perhaps far fewer. This would lead us to update our notion of the probability, and prefer an a posteriori probability. This is how we can learn from an experiment.
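As a small illustration, not taken from the textbook, here is a sketch in Python that simulates repeated tosses of a fair coin and reports the observed frequency of heads; the frequency settles near one half as the number of tosses grows.

    import random

    def heads_frequency(n_tosses, p_heads=0.5, seed=42):
        """Simulate n_tosses of a coin with P(heads) = p_heads and
        return the observed frequency of heads."""
        rng = random.Random(seed)
        heads = sum(1 for _ in range(n_tosses) if rng.random() < p_heads)
        return heads / n_tosses

    # The observed frequency approaches the probability as the number
    # of tosses grows (this is the frequency idea made concrete).
    for n in (10, 100, 10_000):
        print(n, heads_frequency(n))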
In order to proceed with mathematical probability, we need some ideas from set theory. There are two binary operations defined on sets: intersection and union. There is also a unary operation, where we define the complement of a set. The two binary operations satisfy the properties of commutativity, associativity, and distributivity, and, when combined with the complement operation, they satisfy the de Morgan laws.
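As a quick check of the de Morgan laws on a small finite example (again my own illustration), Python's built-in set type can stand in for subsets of an outcome space:

    # An outcome space and two subsets, chosen arbitrarily for illustration.
    omega = set(range(10))
    A = {0, 1, 2, 3}
    B = {2, 3, 4, 5}

    def complement(S, universe=omega):
        """Complement of S relative to the outcome space omega."""
        return universe - S

    # De Morgan: the complement of a union is the intersection of the
    # complements, and the complement of an intersection is the union
    # of the complements.
    assert complement(A | B) == complement(A) & complement(B)
    assert complement(A & B) == complement(A) | complement(B)
    print("Both de Morgan laws hold for this example.")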
As we saw on September 24, the operations of set theory can be illustrated using Venn diagrams. Much of axiomatic probability can be so illustrated. There is a fine distinction between axioms and definitions. For probability, we define a probability space as a triple. The first element is the outcome space, usually denoted Ω. Subsets of the outcome space are called events, and the set of events has to satisfy the axioms of a sigma algebra (σ-algebra), denoted 𝓕, which means that it is closed under complementation and under countable unions, and hence also under countable intersections. The σ-algebra 𝓕 is the second element of the triple. If we have just (Ω, 𝓕), this constitutes a measurable space.
Probabilities may be assigned to events, elements of 𝓕. A probability measure must also satisfy some axioms, and these allow us to prove various identities involving the probabilities of events. If the probability measure is denoted P, then we have the probability space as the triple (Ω, 𝓕, P).
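In brief, the axioms require that P(A) ≥ 0 for every event A, that P(Ω) = 1, and that P is countably additive over pairwise disjoint events. A typical identity that can then be proved is
\[
  P(A \cup B) = P(A) + P(B) - P(A \cap B),
\]
where the subtraction corrects for the double counting of the outcomes in A ∩ B.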
It is often necessary to count the number of ways in which some things can be done. Functions that are very useful in this context are the factorial, and the combination and permutation functions. Their use was illustrated by counting the number of hands that can be dealt using playing cards.
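For example (the hands considered in class may have been different), the number of distinct five-card hands that can be dealt from a standard deck of 52 playing cards is the number of combinations of 52 objects taken 5 at a time:
\[
  \binom{52}{5} = \frac{52!}{5!\,47!} = 2{,}598{,}960.
\]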
Conditional probability was the main theme of the class on September 29. From the definition of the probability of an event conditional on another event one can derive Bayes' Theorem, which follows from the fact that the operation of intersection is commutative. The theorem can be stated in several different ways, and numerous results follow from it in combination with the general properties of a probability measure.
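In its simplest form, for two events A and B of positive probability, the definition of conditional probability gives P(A ∩ B) = P(A | B)P(B) = P(B | A)P(A), and so
\[
  P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)},
\]
which is Bayes' Theorem for two events.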
A somewhat counter-intuitive result was examined, where we computed the probability that an individual had a rather rare condition when a diagnostic test that is not infallible gave a positive result. Although the test has very low error rates, the computed probability was very low.
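A worked version with made-up numbers, not necessarily those used in class: suppose that the condition affects one person in a thousand, that the test detects the condition with probability 0.99 when it is present, and that it gives a false positive with probability 0.05 when it is absent. Bayes' Theorem then gives
\[
  P(\text{condition} \mid \text{positive})
  = \frac{0.99 \times 0.001}{0.99 \times 0.001 + 0.05 \times 0.999}
  \approx 0.019,
\]
so that, under these assumptions, fewer than two positive results in a hundred correspond to a true case, simply because the condition is so rare.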
In Chapter 6, distributions of random variables are the principal topic. A real-valued random variable is a mapping from the outcome space Ω to the real line, and, as such, can have different probability measures superimposed on it. The most straightforward way of doing so is to postulate a cumulative distribution function, or CDF. This function has various essential properties, and is sufficient to characterise the distribution of the random variable completely.
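Explicitly, the CDF of a random variable X is the function
\[
  F(x) = P(X \le x), \qquad x \in \mathbb{R},
\]
and its essential properties are that it is non-decreasing, right-continuous, and tends to 0 as x → −∞ and to 1 as x → +∞.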
On October 2 we went on to look at other properties of CDFs, and introduced EDFs, or empirical distribution functions, of samples. These are necessarily discrete, since a sample must be of finite size. But there are other discrete distributions, where the number of discrete points that can be realised is infinite. For a discrete distribution, we saw that another form of complete characterisation of the distribution is the probability mass function, or PMF, which specifies a positive probability for each of the possibly infinite number of points of the distribution.
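For a sample y_1, ..., y_n, the EDF assigns probability 1/n to each observation:
\[
  \hat F_n(x) = \frac{1}{n}\sum_{i=1}^{n} \mathrm{I}(y_i \le x),
\]
where I(·) is the indicator function, equal to 1 when its argument is true and to 0 otherwise. The result is a step function that jumps by 1/n (or by a multiple of 1/n if there are ties) at each order statistic.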
The concept of the support of a distribution was introduced. It is the set of all points that are possible realisations of, or drawings from, the distribution. It is sometimes necessary, with continuous distributions, to include in the support any points that can be reached as limits of a sequence of points in the support. This can also be stated by saying that the support is a closed set.
When a distribution is continuous, the probability that some single value is realised is zero. Thus we prefer to argue in terms of the probability of intervals. The density of a continuous distribution is a function that, when integrated over an interval, gives the probability of that interval. When a density is integrated over the whole real line, from minus infinity to plus infinity, the answer must be one. The density is the derivative of the CDF.
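In symbols, if f is the density and F the CDF of a continuous random variable X, then
\[
  P(a < X \le b) = \int_a^b f(x)\,dx = F(b) - F(a),
  \qquad \int_{-\infty}^{\infty} f(x)\,dx = 1,
  \qquad f(x) = F'(x)
\]
wherever the derivative exists.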
A graphical device that can give a summary description of a distribution, continuous or discrete, is the histogram. Separate intervals, or cells, of the support are defined, and the histogram shows the probabilities of these intervals.
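A minimal sketch in Python (my own illustration; any statistical package does the same job) of how the counts in the cells become estimated probabilities:

    import numpy as np

    rng = np.random.default_rng(0)
    sample = rng.normal(size=1000)   # an artificial sample, for illustration only

    # Divide the range of the sample into cells and count the observations
    # that fall into each one.
    counts, edges = np.histogram(sample, bins=10)

    # Dividing the counts by the sample size gives the estimated probability
    # of each cell: the proportion of the sample that it contains.
    probabilities = counts / sample.size
    for left, right, p in zip(edges[:-1], edges[1:], probabilities):
        print(f"[{left:6.2f}, {right:6.2f}): {p:.3f}")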
The family of Normal distributions is what is called a location-scale family. A Normal distribution is completely characterised by two parameters, the expectation and the variance. The standard Normal distribution has expectation zero and variance one, and any other Normal distribution can be generated from the standard Normal. The Normal density is a bit complicated, but should be remembered. It is an example of a distribution that is symmetric about its expectation.
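For the record, the density of the Normal distribution with expectation μ and variance σ² is
\[
  f(x) = \frac{1}{\sigma\sqrt{2\pi}}
         \exp\Bigl(-\frac{(x-\mu)^2}{2\sigma^2}\Bigr),
  \qquad x \in \mathbb{R},
\]
and, if Z follows the standard Normal distribution, then X = μ + σZ follows the N(μ, σ²) distribution, which is what is meant by calling the family a location-scale family.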
Assignments:
When an assignment is due on a certain date, that means that, if it is emailed to the TA before midnight of that day, it is considered to be on time.
The first assignment, dated September 22, can be found by following this link. It is due on Tuesday September 30.
The second assignment, dated October 6, can be found by following this link. It is due on Monday October 13.
In order to encourage the use of the Linux operating system, here is a link to an article by James MacKinnon, in which he gives valuable information about what software is appropriate for the various tasks econometricians and statisticians wish to undertake.
To send me email, click here or write directly to russell.davidson@mcgill.ca.
URL: http://russell-davidson.research.mcgill.ca/e257