541 Multivariate Analysis « Oleg Melnikov's website @Rice

541 Multivariate Analysis

Dear Stat 541 Students,

Links: course webpage, syllabus, HW, Dr. Scott’s Home Page

Tech Resources: LyX, Knitr, LyX+Knitr Instructions, RStudio, Revolution R, CamScanner (for iPhone), pdfFactory, ke

HW Tips and Comments:

11/14

Presentation slides on Kernel Methods
Data for your final projects:

UCI Machine Learning Repository

pick a dataset based on your needs (type of problem, size, etc.)

Market Matrix

pick a dataset based on its visual representation

US Census Bureau

Population estimates

HW 5

Post-grading remarks:

General: note that your numeric results in these problems are approximations (rounded numbers)!! So, use the $\approx$ symbol liberally (and appropriately) 🙂
#5.3.2: If you review all versions of theorems 5.3.1 and 5.3.2, you’ll find ones that are useful for solving this problem. Remember that you can modify (or should simplify) your “multivariate” hypothesis to become univariate. So $H_{0}:\mu_{1}=\mu_{2}\iff H_{0}:\mathbf{R}\boldsymbol{\mu}=0$ for $\mathbf{R}=\left(1,-1\right)$
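Written out, the reduction in #5.3.2 is a one-line computation. This is only a sketch of the step, treating $\mathbf{R}$ as a $1\times2$ row vector and assuming $\boldsymbol{\mu}=\left(\mu_{1},\mu_{2}\right)'$:

```latex
\[
\mathbf{R}\boldsymbol{\mu}
=\begin{pmatrix}1 & -1\end{pmatrix}
 \begin{pmatrix}\mu_{1}\\ \mu_{2}\end{pmatrix}
=\mu_{1}-\mu_{2},
\qquad\text{so}\qquad
\mathbf{R}\boldsymbol{\mu}=0\iff\mu_{1}=\mu_{2}.
\]
```

The product is a scalar, which is what makes the hypothesis univariate.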

HW 4

#4.1.1: be careful with dimensions. Work it out carefully!
#4.2.6 has a small typo. See below in Typos…
#4.2.12: small typo. See below.
Post-grading remarks:

4.1.1. Just a clarification on notation.

Unambiguous notation always shows the relationship between the function being integrated/differentiated and the variable of integration/differentiation.
Make sure you understand that $f$ is the name of a function, defined by some relationship like $f(x)=x^2$. Normally, $f(x)$ is the value of the function $f$ at the point $x$; $f(x)$ is itself a function only if $x$ is also a function.
Correct derivative notation will show the relationship between the function and the variable of differentiation: $f'=f'\left(\mathbf{x}\right)=D_{\mathbf{x}}f\left(\mathbf{x}\right)=\frac{d}{d\mathbf{x}}f\left(\mathbf{x}\right)=\nabla_{\mathbf{x}}f\left(\mathbf{x}\right)$
Correct integral notation will do the same: $\int f=\int f\left(\mathbf{x}\right)d\mathbf{x}$
The following notation is potentially ambiguous because it does not show how the function and the variable of differentiation/integration (or the limit point) are related: $\frac{d}{d\mathbf{x}}f$, $\int f\,d\mathbf{x}$, $\lim f\left(\mathbf{x}\right)=a$

HW 3

Post-grading remarks:

#3.2.6: It’s a direct application of Thm. 3.2.4 (which relies on Thm. 3.2.3), but with an important twist: Do not assume a normal distribution! This problem wants you to read the book carefully (esp. 1st paragraph of p.64) to recognize that you can use Thm. 3.2.3 and 3.2.4 for any distribution with the given first two moments. If you assumed a normal distribution in this problem, I wonder if you carefully read the book 🙂 The lack of a normality assumption should have raised concern and made you double-check the text 🙂

Another very common mistake in this problem was mixing incorrect dimensions… Why does this not make sense (in the context of the problem): $\mathbb{E}\left[\left(x_{1},x_{2}\right)\mid \cdots\right]=\begin{bmatrix}\mu_{1}\\ \mu_{2} \end{bmatrix}+\cdots$ ?

#3.3.4: The proof of the theorem is already given to you. In such a case you don’t need to do much, but I still expect you to fill in the small details: correct dimensions, theorems used, etc. I do not want to see a rewrite of the original problem and answer. I need a solution. And, if the solution is already given, add details that help me see you understand it 🙂 Surprisingly, not everyone was able to do the dimensions correctly! Please make sure you understand the notation on p.8: $\mathbf{X}=\begin{bmatrix}\mathbf{x}_{1}'\\ \vdots\\ \mathbf{x}_{n}' \end{bmatrix}\overset{\text{OR}}{=}\begin{bmatrix}\mathbf{x}_{\left(1\right)} & \cdots & \mathbf{x}_{\left(p\right)}\end{bmatrix}$. Pay attention to subscripts!!!

$\mathbf{x}_{1}'$ is a row (horizontal) vector from the original matrix $\mathbf{X}$. It is frequently written as $\mathbf{x}_{1}$, which makes it a vertical (usual) vector that still represents a transposed row from $\mathbf{X}$. Such a vector is an observation with values of multiple covariates (variables). It makes no sense to take its average because the variables are likely of different units, mixed type (discrete, continuous, categorical), etc.
$\mathbf{x}_{\left(1\right)}$ is a column vector, which represents all observations of one particular variable. This is where mean, median, variance, etc. all make sense.
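The row/column distinction can be checked on a toy data matrix. The sketch below is only an illustration: the numbers, the units (cm and kg), and the helper names `row`, `column` and `mean` are all made up for this example and are not from the text.

```python
# Toy data matrix X with n = 3 observations (rows) and p = 2 variables (columns).
# The values and units are hypothetical: column 1 is height in cm, column 2 is weight in kg.
X = [
    [170.0, 65.0],
    [180.0, 80.0],
    [160.0, 50.0],
]

n, p = len(X), len(X[0])


def row(X, r):
    """Observation x_r (1-based): one unit measured on all p variables."""
    return X[r - 1]


def column(X, j):
    """Variable x_(j) (1-based): all n observations of variable j."""
    return [obs[j - 1] for obs in X]


def mean(values):
    return sum(values) / len(values)


x_1 = row(X, 1)          # mixes cm and kg, so its average is meaningless
x_col_1 = column(X, 1)   # all entries in cm, so its average is the mean height

# The sample mean vector has one entry per variable: length p, not n.
x_bar = [mean(column(X, j)) for j in range(1, p + 1)]
print(x_bar)
```

Note that `x_bar` has length `p`, which is a quick dimension check for the kind of mistake described above.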

#3.4.5: I know some of you were exhausted from problem 3.3.4, but simply rewriting the book is not acceptable. I need to see your contribution, not a verbatim copy of the steps that the author provides as a guideline to the proof. You still need to show me how the theorems are applied, work out the matrix algebra, etc. Fill in the details! Any verbatim copy of the text doesn’t convey understanding and will earn zero points (especially if no credit is given to the author).

HW 2

#2.2.3: see Typos list below
I’ll be seeking improvements in your computational project: follow my newly minted guidelines below.
Post-grading remarks:

Matrix Inverses (def. on p.458): an inverse of a matrix must be both its left and its right inverse, unless the original matrix is explicitly recognized as square. It’s a subtle but important detail.

A proof cannot begin with “Suppose $\boldsymbol{\Sigma}^{-1}=…$” because that assumes the inverse already exists and has the stated form, so your proof begins with what you need to prove! Of course, you are then only showing that there is no contradiction and everything is peachy 🙂

That’s similar to “Suppose I have an oil rig. Well, I have no proof of not having an oil rig, so it’s correct: I have an oil rig!” LOL. It doesn’t really prove anything :)))

A proper way to construct your proof in such a case: “Let $\mathbf{A}=…$”, then check that $\mathbf{A}\boldsymbol{\Sigma}=\boldsymbol{\Sigma}\mathbf{A}=\mathbf{I}$ and conclude that $\mathbf{A}$ must be $\boldsymbol{\Sigma}^{-1}$ from the definition on p.458.

That’s similar to “Suppose an oil rig exists (still a big assumption 🙂 ). It has official ownership documents; and, they show my name as the sole owner. Hence, (by whatever ownership laws), I have an oil rig.” Makes sense?
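The “Let $\mathbf{A}=…$, then check both products” construction can be mirrored numerically. This is a minimal sketch on a made-up $2\times2$ example, not the matrix from the HW problem; `Sigma`, `A` and the helper functions are hypothetical names chosen for illustration.

```python
from fractions import Fraction as F


def matmul(A, B):
    """Plain matrix product of two lists-of-rows."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


# Hypothetical example matrix (exact arithmetic avoids rounding issues).
Sigma = [[F(2), F(1)], [F(1), F(2)]]

# Step 1: propose a candidate A, without calling it an inverse yet.
A = [[F(2, 3), F(-1, 3)], [F(-1, 3), F(2, 3)]]

# Step 2: check BOTH products against the identity.
left_ok = matmul(A, Sigma) == identity(2)
right_ok = matmul(Sigma, A) == identity(2)

# Step 3: only now conclude that A is the inverse of Sigma.
is_inverse = left_ok and right_ok
print(is_inverse)
```

The order of the steps is the point: the candidate is defined first, and the name "inverse" is earned only after both checks pass.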

Definition: A square matrix $\mathbf{A}$ is a projection matrix if it is idempotent. Note that this says nothing about symmetry. In fact, a projection matrix is symmetric exactly when it is an orthogonal projection.
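A small check shows that idempotency and symmetry are separate properties. The two matrices below are illustrative examples chosen here (an oblique and an orthogonal projection in the plane), not matrices from the text, and the helper names are hypothetical.

```python
def matmul(A, B):
    """Plain matrix product of two lists-of-rows."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def transpose(A):
    return [list(col) for col in zip(*A)]


def is_idempotent(P):
    return matmul(P, P) == P


def is_symmetric(P):
    return transpose(P) == P


# Oblique projection onto the x-axis along the direction (1, -1): idempotent, not symmetric.
oblique = [[1, 1], [0, 0]]

# Orthogonal projection onto the line spanned by (1, 1): idempotent and symmetric.
orthogonal = [[0.5, 0.5], [0.5, 0.5]]

print(is_idempotent(oblique), is_symmetric(oblique))
print(is_idempotent(orthogonal), is_symmetric(orthogonal))
```

Both matrices satisfy the definition of a projection matrix; only the second is symmetric.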

HW 1

Tips

don’t be afraid to derive 1st and 2nd derivatives w.r.t. a vector argument (i.e. compute the gradient $\nabla$ and the Hessian)
setting the derivative to zero only gives you critical points (either minimum, maximum, or saddle) and is not sufficient to find minima; check the second-order condition (the Hessian) as well
keep track of your dimensions!!! It helps to regularly (for your reference) write down the dimensions of your variables.
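To see why a zero gradient alone is not enough, here is a minimal sketch for functions of two variables. The functions $f(x,y)=x^{2}-y^{2}$ and $g(x,y)=x^{2}+y^{2}$ are examples picked for illustration, and `classify_critical_point` is a hypothetical helper that applies the standard second-derivative test to a symmetric $2\times2$ Hessian.

```python
def classify_critical_point(hessian):
    """Second-derivative test for a symmetric 2x2 Hessian [[a, b], [b, d]]."""
    (a, b), (c, d) = hessian
    if b != c:
        raise ValueError("Hessian must be symmetric")
    det = a * d - b * c
    if det > 0:
        # Both eigenvalues share the sign of a.
        return "local minimum" if a > 0 else "local maximum"
    if det < 0:
        # Eigenvalues of opposite sign.
        return "saddle point"
    return "inconclusive"


def grad_f(x, y):
    """Gradient of f(x, y) = x^2 - y^2."""
    return (2 * x, -2 * y)


def grad_g(x, y):
    """Gradient of g(x, y) = x^2 + y^2."""
    return (2 * x, 2 * y)


hessian_f = [[2, 0], [0, -2]]   # constant Hessian of f
hessian_g = [[2, 0], [0, 2]]    # constant Hessian of g

# Both gradients vanish at the origin, yet only g has a minimum there.
print(grad_f(0, 0), classify_critical_point(hessian_f))
print(grad_g(0, 0), classify_critical_point(hessian_g))
```

Both functions have a critical point at the origin, but the Hessian of $f$ is indefinite there, so the origin is a saddle for $f$ and a minimum only for $g$.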

Typo in text:

p.456, eq.(A.2.3k) should have $|\mathbf{A}+\mathbf{ab}'|=…$

I prefer typed up HW, submitted electronically with file name in the form: 541_HW#_FirstName_LastName_m_d

For example: 541_HW1_Oleg_Melnikov_9_3
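If you generate your files from a script, the name can be built programmatically. This is a small sketch only: `hw_filename` is a hypothetical helper, it assumes that `m_d` means the month and day of the date without leading zeros (as in the example above), and the year in the usage line is arbitrary.

```python
from datetime import date


def hw_filename(course, hw_number, first_name, last_name, when):
    """Build a name of the form 541_HW#_FirstName_LastName_m_d (no file extension)."""
    # Assumption: m_d is month_day without leading zeros, as in 9_3.
    return f"{course}_HW{hw_number}_{first_name}_{last_name}_{when.month}_{when.day}"


# The year is arbitrary here; only month and day enter the file name.
example = hw_filename(541, 1, "Oleg", "Melnikov", date(2013, 9, 3))
print(example)
```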

Post-grading remarks:

Everyone did very well on theoretical part! Yeah-ha.
Strangely, the computational part was equally challenging for most of you. Please see my freshly minted guidelines below.
#1.4.2: See Typos section below for more details.

Students divided into

those claiming $\mathbf{S}\ge\mathbf{0}\Longrightarrow\mathbf{S}^{-1}\ge\mathbf{0}$ (citing some mysterious “result from basic linear algebra”)

This result is incorrect as stated (the zero matrix is a counterexample), so it cannot be used without a proof. I took off a point.

and those who explicitly looked at the case $\mathbf{S}>\mathbf{0}\Longrightarrow\mathbf{S}^{-1}>\mathbf{0}$

This result is correct, although one case was left unattended.

In HW solutions, you can use:

Concepts covered in the assigned text: Multivariate Analysis by K.V. Mardia, et al

The text has some typos and I’ll keep a list of them below. If you note them, share them with others and let me know as well.

Ex: p.456, eq.(A.2.3k) should have $|\mathbf{A}+\mathbf{ab}'|=…$

In-class concepts allowed by professor
Concepts learned in prerequisite courses: 405, 410, real-valued calculus and basic linear algebra

Using other resources:

Any key result not covered by above needs to be proved before it can be used in HW 🙂

otherwise, it’s difficult to judge the knowledge of the class material

and grading is more difficult because other resources have a different build up towards their results (also different notation, etc.)

References in proofs: my former students already know that I ask you to cite justification for equations, definitions and theorems. For example,

$\mathbf{M}\overset{\text{by definition}}{=}\mathbf{X}'\mathbf{X}$ is vague and may not convey your knowledge of definitions. There are too many definitions; and, I don’t know which is referenced.
I prefer: $\mathbf{M}\overset{\text{by hypothesis (or assumption)}}{=}\mathbf{X}'\mathbf{X}$, if this is given in the problem statement

or, $\mathbf{M}\overset{\text{by eq.(1.4.10), p.11.}}{=}\mathbf{X}'\mathbf{X}$, if you are referencing the text

Typed HW: always preferred and appreciated 🙂

Grading is faster, with fewer misunderstandings 🙂
Typing forces you to be exact in your notation, helping you better understand the structure of your variables. This is super important in an MVA course, where it’s easy to confuse vector, scalar, and matrix symbols and operations.

I will take off points, if I can’t understand notation or variables.
And, I will reward for consistent and clean expression of mathematics 🙂

However you choose to write, write clearly and unambiguously. Don’t let the grader guess the notation.

LyX (visual LaTeX) is easy to use and allows combining R code/output in your HW (via Sweave or knitr)

If in doubt about HW requirements, ask! Just e-mail me and/or the professor.

It’s better to double check any requirements before submitting your HW.

Submission guidelines and format:
- Your HW should carry: your name, course #, HW #, date, page #s
- For e-submission use file name: 541_HW#_FirstName_LastName_m_d (ex: 541_HW1_Oleg_Melnikov_9_3)
- Submit problems in order they are assigned.
- Submit only what’s required (not more or less) and clearly mark each problem and subproblem
- Clearly mark your answers.
- Clearly label plots and data sets (units, axes, variables, values, …). Make sure your plots are legible (use color plots only if submitting in color, etc.)
- Simplify fractions; Try to keep your calculus answers in terms of $e, \pi, \ln{2}, \sqrt{3}$, etc. It’s ok to use decimal values from R output.
- Keep HW solutions clean, organized, stapled, well identified; write clearly (especially, when referencing vectors, matrices, random variables, observations, etc.)
- When deriving your key functions (like cdf and pdf), always state the domain on which the function makes sense (i.e. remains a valid cdf or pdf)
- If in doubt, ask!
Grading computing projects:

Herein, by “OUTPUT” I mean any plots, pictures, diagrams, data summaries, data tables, statistical models, etc. that highlight the story from the data. You are the storyteller and outputs should support your story.
My expectations:

Legible, properly labeled and scaled output with colors and markers appropriate for your presentation (black and white, color, etc.) !!! Don’t overcrowd with labels.
Output should show only what you want me to see. Not everything!
It’s easier to talk about output, if you already circled, bolded or otherwise marked the point of interest
Project should have more analysis than output
Key lines of your code (I prefer descriptive and meaningful code comments, as if I do not know the language and you are telling me what the code line does)
Always credit your data!!! Include URLs where data can be found and provider’s name.

Tell me a story!

Start by stating the hypothesis you are trying to prove or disprove
Continue with story-like transitions from one output to the next.
Each output must be included to support your assumption (whether it does or doesn’t)
Impress me with your logical interpretation of the output and what you want me to see in it.
Convince me that your interpretation is, in fact, drawn from output
What were you hoping to see before seeing output and whether you saw the expected (or any surprises)
Describe any unusual details of the output (outliers, linear or nonlinear relation, etc.)
What other output would be helpful and why (if you had the time…)?
Tell me a story, show me a motion picture that makes me go “WOW, I didn’t see this before!”; not a collection of vines
Make a conclusion that is convincing, not another probabilistic statement of uncertainty.

For every computing project, check if your story is cohesive and has your supportive evidence
Google for help on interpreting outputs.
Here are some links I found (let me know, if you find better ones):
http://biostat.mc.vanderbilt.edu/wiki/pub/Main/TheresaScott/Interpret.Graphs.TAScott.handout.pdf
http://www.sjsu.edu/faculty/gerstman/StatPrimer/anscombe1973.pdf
http://stattrek.com/regression/influential-points.aspx

Grading Rubric:

Late HW penalty 10% per day
No HW is accepted after HW is released
Any special circumstance must be valid & verifiable (e.g. doctor’s note with doctor’s contact info)
10 pts per problem, partial credits will be given
Deductions up to:

1 pt:

undefined variables/functions
wrong notation, domains, dimensions
missing domain on key derived functions (especially, cdf and pdf)

2 pts:

unjustified key equalities, implications (justify with theorems, definitions, etc.; cite theorem numbers, …)
disorganized and messy HW solutions
wrong integration, differentiation, limits, justification, plot, algebra

all pts:

incomplete or no solution (I do not accept answers without worked out solutions), incorrect setup, wrong problem

all pts and report to professor and Rice Honor Council: cheating, plagiarism, copying (from another student, public source, etc.)

Typos in Multivariate Analysis by K.V. Mardia, et al

p.22, #1.4.2. A tricky detail that cannot be overlooked (in the context of real matrices, of course).

In general, a covariance matrix is p.s.d., i.e. $\mathbf{S}\ge\mathbf{0}$ (p.11/top), yes, even for continuous distributions (albeit less likely); and we always have $\mathbf{S}>\mathbf{0}\Longrightarrow\mathbf{S}^{-1}>\mathbf{0}$ (p.14/bottom and appendix).
However, it’s not true that $\mathbf{S}\ge\mathbf{0}\Longrightarrow\mathbf{S}^{-1}\ge\mathbf{0}$. In fact, an inverse does not even exist for a singular p.s.d. matrix such as the zero matrix.
The solution lies in the context of the problem statement, in a subtle detail: merely writing down $\mathbf{S}^{-1}$ implies that it exists, implying that $\mathbf{S}$ is non-singular. Now you can combine this statement with $\mathbf{S}\ge\mathbf{0}$ to get $\mathbf{S}>\mathbf{0}$ (a non-singular p.s.d. matrix is p.d.), and hence $\mathbf{S}^{-1}>\mathbf{0}$
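The three cases can be checked on small symmetric $2\times2$ matrices. The sketch below uses made-up matrices and hypothetical helper names (`is_psd2`, `is_pd2`, `inverse2`); definiteness is tested via principal minors, which is valid for symmetric $2\times2$ matrices.

```python
from fractions import Fraction as F


def det2(S):
    return S[0][0] * S[1][1] - S[0][1] * S[1][0]


def is_psd2(S):
    """Symmetric 2x2 matrix is p.s.d. iff all principal minors are >= 0."""
    return S[0][0] >= 0 and S[1][1] >= 0 and det2(S) >= 0


def is_pd2(S):
    """Symmetric 2x2 matrix is p.d. iff the leading principal minors are > 0."""
    return S[0][0] > 0 and det2(S) > 0


def inverse2(S):
    """Return the inverse of a 2x2 matrix, or None if it is singular."""
    d = det2(S)
    if d == 0:
        return None
    return [[S[1][1] / d, -S[0][1] / d], [-S[1][0] / d, S[0][0] / d]]


zero = [[F(0), F(0)], [F(0), F(0)]]       # p.s.d., singular
rank_one = [[F(1), F(1)], [F(1), F(1)]]   # p.s.d., singular
pos_def = [[F(2), F(1)], [F(1), F(2)]]    # p.s.d. and non-singular, hence p.d.

for S in (zero, rank_one, pos_def):
    S_inv = inverse2(S)
    print(is_psd2(S), is_pd2(S), S_inv is not None and is_pd2(S_inv))
```

Only the non-singular matrix has an inverse, and that inverse is again p.d., which matches the argument above.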

p.54, #2.2.3: not really a typo, but an implication

Just clarifying: $\delta\in\mathbb{R}\backslash\left\{ 0\right\}$, which should be clear from the context 🙂 If $\delta=0$, then the author would not distinguish the first element of the vector structure and would have defined $\boldsymbol{\mu}_{1}=\mathbf{0}$ instead of $\boldsymbol{\mu}_{1}'=\left(\delta,\mathbf{0}'\right)$. Note that the context implies that $\delta$ is a number (if it were a vector, vertical of course, it would be written as $\boldsymbol{\delta}'$ for the vector $\boldsymbol{\mu}_{1}$ to make sense in this definition).

Much can be implied from these brief definitions, but sometimes we all can miss details. So, if in doubt, just ask!

p.456, eq.(A.2.3k) should have $|\mathbf{A}+\mathbf{ab}'|=…$
p.117, Problem 4.2.6 should read “…$\boldsymbol{\hat{\mu}}$ is given by …”
p.118, Problem 4.2.12 should read “$\hat{\boldsymbol{\mu}}=\bar{\boldsymbol{x}}$” instead of “$\hat{\mu}=\bar{x}$”
