M384G/M374G: Regression Analysis - Fall, 2005


Unique Numbers: 59035/58930

Time: MWF 10-11

Room: RLM 6.116

Class Web Page: http://www.ma.utexas.edu/users/mks/384G05/384G05home.html

Instructor: Smith http://www.ma.utexas.edu/users/mks/index.html

Office hours: Office hours for the first three class days are:

Office hours for the second week of classes and regular office hours will be announced in class and on my web page when they are set. I will need to cancel office hours now and then to accommodate meetings, oral exams, etc. I will try to give you several days' advance notice when this happens. If it is impossible for you to make office hours, I will try to arrange individual appointments at other times. However, I am not available MWF before 11.

Text: Applied Regression Including Computing and Graphics, Cook and Weisberg, Wiley, 1999

Topics covered: With some omissions, we will cover Chapters 1-17 (Parts I and II) of the text, supplemented by derivations of some results (mainly in Chapter 6) and occasional short additional topics.

Focus of the Course: The course will be a mix of theory and application. Some results will be derived, so that you get some feel for why things are as they are. Most of what you will be expected to do is apply the theory with understanding. This means I will expect you to think about what you do. Do not expect rules that you can use in a mechanical manner.

Prerequisites: An upper division undergraduate course in statistics such as M358K or M378K, or a similar graduate course. Students enrolled for the undergraduate course (M374G) also need permission of the instructor. Since introductory statistics courses vary widely in their content and emphasis, it is inevitable that most students will be fuzzy on some of the prerequisite material. Therefore, I will give very brief reviews of most undergraduate concepts we use and expect you to consult the appendices of the references listed below or other suitable sources as needed to fill in any gaps in your particular background. The references by Mendenhall and Sincich, Neter et al., Rice, and Ross listed below are especially appropriate for this purpose.

Also, some acquaintance with linear algebra (e.g., multiplying matrices, understanding matrix inverses, linear dependence and independence) will be assumed eventually. If your linear algebra background is weak, consult Section 7.9.1 of the text and/or Chapter 5 of the reference by Neter et al. (listed below) now, so that you are up to speed before we start using matrix notation.

Assignments and grading: Course grades will be based mainly on problem sets to be turned in approximately every two weeks. There will also be a take-home midterm and a take-home final. The final exam will be due at noon, Monday, December 19 (the end of the final exam date and time listed in the course schedule for this course).  Two homework grades will be dropped, to allow for a normal amount of illness, emergencies, bad weeks, and a learning curve. The midterm and final exams will each count equally with each of the remaining homework assignments in determining the course grade, but the exam grades will not be dropped.
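
As an illustration only, the weighting described above can be sketched as a small calculation. This sketch assumes that the two lowest homework scores are the ones dropped and that every homework and exam is scored on the same scale; neither assumption is stated in this syllabus, and since grades are based only "mainly" on these items, the result is not a final course grade. The function name and the scores are hypothetical.

```python
def course_average(homework, midterm, final, n_dropped=2):
    """Equal-weight average of the kept homework scores and both exams."""
    # Assumption: the lowest n_dropped homework scores are the ones dropped.
    kept = sorted(homework)[n_dropped:]
    # Each exam counts the same as one remaining homework; exams are never dropped.
    scores = kept + [midterm, final]
    return sum(scores) / len(scores)


# Hypothetical scores for seven problem sets and the two take-home exams.
homework = [90, 70, 85, 0, 95, 80, 88]
print(course_average(homework, midterm=82, final=91))
```

With seven problem sets, five homework scores remain after dropping two, so each exam carries one seventh of the weight in this sketch.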

Assignments and exams for M 384G and M 374G will have substantial overlap, but some differences. In some cases they will be the same, but usually the M 374G assignment will have either one (typically more challenging) problem deleted, or parts (usually the more challenging parts) of some M 384G problems deleted, or will have a less challenging problem substituted for one on the M 384G assignment.

I will expect you to write up your homework solutions carefully. Do not hand in a rough draft. In particular, I expect you to:

1. Write in complete sentences.

2. Organize your presentation. In particular, put computer output and graphs as close as possible to the place where you discuss or refer to them. (Please do not put them at the end of each problem.) This often requires cutting and pasting (either by hand or computer). In some cases, writing on your computer output will work.

3. Do not hand in computer output that you have not referred to in your discussion. Again, this may require cutting and pasting. But be sure to include computer output that you have referred to in your discussion.

4. Explain your reasoning clearly. The quality of your reasoning will be an important consideration in your grade, especially as the semester progresses and you have more options available to consider. Do not expect full credit if you do not give reasons for your answers or if you do not interpret your output in the context of the problem.

5. Do not include extra possibilities in the hope that I will give you credit if you have the right answer or explanation along with some incorrect ones.

6. Write legibly.

I will also sometimes assign questions for discussion. Be sure to think about these before we discuss them in class. Their purpose is to help you understand (and avoid misunderstanding!) some of the subtleties involved in the concepts and their application.

I will also try to give you reading assignments so you can preview material if you choose.

Often I will post class lecture notes on the class home page the night before lectures. Be sure to check.

Policy on late work: I am willing to accept one slightly late homework assignment from each student. "Slightly late" means after class on the day the assignment is due, but before the grader picks up homework. Late exams may be subject to a late penalty. I am always willing to accept assignments early. They may be slipped under my door if I am not available. Extenuating circumstances will be handled on a case-by-case basis. In particular, according to Section 51.911 of the Texas Education Code, a student who misses an examination, work assignment, or other project due to the observance of a religious holy day must be given an opportunity to complete the work missed within a reasonable time after the absence, provided that he or she has properly notified each instructor. It is the policy of The University of Texas at Austin that the student must notify each instructor at least fourteen days prior to the classes scheduled on dates he or she will be absent to observe a religious holy day. For religious holidays that fall within the first two weeks of the semester, the notice should be given on the first day of the semester. Alternate arrangements will be made as soon as possible after notification.

Computer software: I will expect you to learn how to use Arc regression software. The textbook integrates instructions on using Arc into the text, and includes an Appendix that serves as an Arc user's manual. Arc is especially designed for regression and includes some features not available in other software packages. I will accept use of other software on assignments when the special features of Arc are not needed, provided:

1. You don't ask me for help with the software. (In particular, it is your responsibility to put the data into a format appropriate for the software.)

2. It can do what is needed.

3. You don't use it to replace doing your part (in particular, thinking) on homework.

4. You interpret output assuming I am unfamiliar with the software.

Arc availability: Arc is available free for Windows, Macintosh, and Unix platforms at http://www.stat.umn.edu/arc/. (Note: This website is more direct than the one given on p. 545 of the textbook.) If you have your own computer, you may want to download your own copy. Arc has been installed on the Math Department computer system for your use in Math Department student labs. You can sign up for an account in the new "big lab" in RLM 7.122. It might also be possible to use Arc on other University computers by downloading it onto a suitable portable disk.

Cautions Regarding Arc:

1. There has been a bug in the Lisp-Stat program in which Arc is written that messes up histograms when the window is resized. So be cautious in resizing windows for histograms.

2. In the past, at least one student had problems with Arc after upgrading to Windows 2000 Professional. This was reported to the developers of Arc, but they had received no additional reports of the problem. If you use Windows 2000 Professional, please be alert for possible problems and check the Arc website above for possible updates.

Copying and Printing from Arc: Arc does not support printing directly. However, text from Arc can be copied and pasted to a word processing program, then edited and printed. The Windows and Mac versions of Arc also support copying and pasting of graphics. The Unix version (which is the version available on math department computers) does not support copying graphics. However, graphics may be saved in PostScript format (using "Save to file"), then converted to another format, then imported into the Star Office word processor available on the math department computers. See "Using Arc and Star Office on Math Department Computers" at http://www.ma.utexas.edu/users/mks/384G04/arcstoffice.html for more information.

Data: Data needed for problems in the textbook comes with Arc. If I assign other data problems, I will put the data on the math department computer system and on the web for students using other computers. More details will be given as the need arises.

Ethical matters:

Statistical ethics: Statistics consists of a collection of tools which, like any tools, can be used either for good or for ill. It is your responsibility as a citizen of the world to be sure not to misuse these tools. I encourage you to read the Ethical Guidelines for Statistical Practice developed by the American Statistical Association, available on the web at http://www.amstat.org/profession/index.cfm?fuseaction=ethicalstatistics.

Authorized collaboration: Since the University defines collaboration that is not specifically authorized as academic dishonesty, I need to tell you what collaboration is authorized in this class.

The following type of collaboration is authorized on homework, but not on exams: Working on homework with someone who is at roughly the same stage of progress as you, provided both parties contribute in roughly equal quantity and quality (in particular, thinking) to whatever problem or problem parts they collaborate on.

The following types of collaboration are not authorized:

            1. Working together with one person the doer and one the follower.

            2. Any type of copying; this includes splitting up a problem so that different people do different parts, obtaining solutions from students who took the course previously, or consulting any kind of solutions manual for the textbook.

            3. Any type of collaboration on exams.

Academic dishonesty aside, asking anyone, "How do I do this problem?" (as opposed to questions like, "How do I carry out this detail of this technique?" or, "I'm not sure whether to proceed this way or this way; here is my thinking about each possibility; am I missing something?") is cheating -- cheating yourself and your future employer, since it avoids the most important part of statistics: thinking.

Students with Disabilities: Please notify me as soon as possible of any modification/adaptation you may require to accommodate a disability-related need. You will be requested to provide documentation to the Dean of Students' Office, in order that the most appropriate accommodations can be determined. Specialized services are available on campus through Services for Students with Disabilities. For more information, contact the Office of the Dean of Students at 471-6259, 471-4641 TTY.

Additional references: Although I believe that our textbook is the best regression textbook available, I realize that no textbook is just right for everyone at all times. Here are some suggestions if you need to consult another text. However, do not try to find solutions to homework problems in another textbook. I expect you to think in doing homework problems. If you look up the solution, you have largely defeated the purpose of the problem.

(I have not put any of these on reserve; please let me know if you think I need to do so.)

Chatterjee S, Price B, Regression Analysis By Example, 2nd ed, 1991. New York: John Wiley & Sons, QA 278.2 C5 1991. Has less of an emphasis on linear algebra than our textbook.

Cook, R. D., Regression Graphics: Ideas for Studying Regression through Graphics, 1998, New York, Wiley, QA 278.2 C6647 1998 PMA. Gives the theory behind the graphical techniques used in our textbook.

Draper and Smith, Applied Regression Analysis, 3rd ed., New York, Wiley, 1998, QA 276 D68 1998 Physics-Math-Astronomy Library (Also available as an e-book through UTNetCAT.) An earlier edition of this book has been used as a text for this course in the past. There is a third edition, but the library does not have it.

Graybill, Franklin A. and Hariharan K. Iyer, Regression analysis: concepts and applications, Duxbury, Belmont, Calif., 1994 QA 278.2 G73 1994. Slightly less advanced than our textbook. Weak point: Examples tend to be made-up rather than real. Possible strong point (depending on how you plan to use regression): Emphasizes the use of regression for prediction.

Mendenhall, William and Terry Sincich, A Second Course in Statistics: Regression Analysis, 5th edition, Prentice Hall, 1996, HF 1017 M46 1996. Slightly less advanced than this course. The first chapter has a review of basic statistics.

Montgomery, Douglas C. and Elizabeth Peck, Introduction to Linear Regression Analysis, 2nd ed., New York, Wiley, 1992 QA 278.2 M65 1992. Another book that has been used as a text for this course.

Neter, John, Michael Kutner, Christopher Nachtsheim and William Wasserman, Applied Linear Regression Models, 3rd edition, Chicago, Irwin, 1996, QA 278.2 A65 1996. A third book which has been used as a text for this course.

Rice, John A. Mathematical Statistics and Data Analysis, 2nd ed. Duxbury, 1994, QA 276.12 R53 1994. A textbook on introductory mathematical statistics that would be suitable if you need review of prerequisite material.

Ross, Sheldon M., Introduction to probability and statistics for engineers and scientists, New York, N.Y., Wiley, 1987, TA 340 R67 1987 Engineering Library. Another textbook on introductory statistics that might be useful for review or reference.

Ryan, Thomas P., Modern Regression Methods, Wiley, 1997, QA 278.2 R93 1997. Often terse, but has summaries of recent developments.

Sen, Ashish and Srivastava, M., Regression Analysis: Theory, Methods and Applications, Springer, 1990, QA 278.2 S46 1990.

Stapleton, James H., Linear Statistical Models, Wiley, 1995, QA 279 S695 1995. Regression and analysis of variance from a linear algebra and geometric point of view.

Weisberg, Sanford, Applied Linear Regression, 3rd edition, Wiley, 2005, QA 278.2 W44. A lot of overlap with the textbook.