MATH/CMPU 144: Foundations of Data Science Fall 2023 Syllabus


Key Information at a Glance:

Classroom
  • Lecture and Lab: Rockefeller 210
Times
  • Lecture: M/W 12:00–1:15pm
  • Lab: F 1:00–3:00pm
Website: Course Moodle page
Instructor: Simon Hoellerbauer
Instructor Email: shoellerbauer@vassar.edu
Instructor Office: Rockefeller 206
Office Hours

Dr. Hoellerbauer:

  • Mondays: 1:30pm–3:00pm
    • In-Person, Drop-In, Rocky 206
    • On Zoom, link on Moodle page, sign up via Calendly
  • Tuesdays: 3:30pm–5:00pm
    • On Zoom, link on Moodle page, sign up via Calendly
  • Fridays: 3:00pm–4:00pm
    • In-Person, Drop-In, Rocky 206 (or Rocky 210, after lab)
    • On Zoom, link on Moodle page, sign up via Calendly
  • By appointment (please don’t hesitate to ask about different times!)

Megan Burnett (Coach):

  • Tuesdays: 5-6 PM - Rocky 201
  • Thursdays: 3-4 PM - Rocky 203
  • Saturdays: 3-5 PM - Rocky 112
Textbooks

Please note that there are free versions of all of these books! The links below lead to the free versions.

Assessment
Item               Percent of Grade
Homeworks          30%
Labs               20%
Project            40%
Class Engagement   10%

The target audience for this course is students interested in data science, social science, computer science, quantitative analysis, statistics, and coding, among many other topics. It is ideal for students wanting to learn quantitative approaches to solving important problems and develop marketable analytic and data management skills.

In this course, we will talk about the technical aspects of data science – such as coding, data wrangling, data visualization, and model building – as well as the social aspects of data science – data quality, data ethics, and causal implications. We will do so using real-world data. Because data science combines aspects of computer science and statistics, the course is very quantitative in nature. A core part of the course is gaining familiarity with the programming language R and the tidyverse suite of R packages that facilitate data science.

This course is introductory in nature, but we will be diving quite deep into data science. This course assumes no prerequisites besides high school algebra. You do not need to have any coding or stats experience to succeed in this class. If you do have some experience in either of these topics, you will still find a lot that is new in this class. If you have taken several computer science or statistics courses, this class may be a bit too elementary for you. Feel free to come chat with me about it if you are unsure!


Course Structure

This class has two in-class components: lectures that involve in-class activities, and weekly labs. Out of class, it involves (some) readings and periodic homework assignments. It is highly interactive. I will rarely lecture for a full class period, because we learn best by doing. The readings come most often from the two main textbooks (see below), but also include newspaper articles and some academic journal articles.

We will use Ed Discussion (look out for a sign-up link) as a Wiki-style question and answer/discussion forum. This makes it easier for the instructor to answer common questions and also allows students to crowd-source answers. It allows you to write in code with proper formatting and with syntax highlighting, making it slightly easier to get proper feedback.

Unless indicated, you are expected to have completed the readings and assignments by the date they are listed in the course schedule.

Assignments and Grading

Homeworks (30%)

There will be five homework assignments due as noted in the schedule, almost always on Wednesdays (except for Homework 5, which is due on a Friday). They are weighted equally (so each is worth 6% of your overall grade). These homework assignments are due by 11:59pm on the days indicated, unless we decide something different in class or I announce a different due time. All homeworks are to be completed individually. You will typically have one week to complete a homework assignment.

Labs (20%)

There are weekly labs (except for certain project workdays indicated in the schedule). All labs are weighted equally. Labs are evaluated for effort and completion, not necessarily correctness. In each lab session, you will complete an assignment in which you apply topics from that week. Each lab assignment is due by 11:59pm on the Monday after it is released, but labs are designed to be completed during the lab time.

We have a Department of Computer Science coach who will help during lab.

Project (40%)

The class has a capstone final project for which you, working in groups, will conduct and present an original data analysis on a dataset of your (collective) choice. You will be assigned to project groups once the Add Period has ended.

The aim of the project is for you to apply concepts and techniques we will cover in this course. You can use an existing dataset (you may not reuse data used in this course for examples, labs, and assignments) or collect your own data using a survey or an experiment [1].

The project consists of the following components (the weight of each component in the overall project grade is given in parentheses):

  1. Proposal (7.5%)
  2. Exploratory Data Analysis (20%)
  3. Written Report Draft (10%)
  4. Written Report (30%)
  5. Presentation (30%)
  6. Team member evaluation (2.5%)
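
As a rough illustration of how the component weights above combine, here is a minimal Python sketch. It assumes each component is scored on a 0-100 scale and that the project grade is a plain weighted average; the function name and the example scores are hypothetical, and any individual adjustment based on team member evaluations is not modeled.

```python
# Component weights within the project grade, in percent, as listed above.
PROJECT_WEIGHTS = {
    "proposal": 7.5,
    "exploratory_data_analysis": 20.0,
    "written_report_draft": 10.0,
    "written_report": 30.0,
    "presentation": 30.0,
    "team_member_evaluation": 2.5,
}


def project_grade(scores):
    """Return the weighted average of component scores (each 0-100).

    Assumption: a simple weighted average; not an official formula.
    """
    total_weight = sum(PROJECT_WEIGHTS.values())  # the weights sum to 100
    weighted_sum = sum(
        weight * scores[name] for name, weight in PROJECT_WEIGHTS.items()
    )
    return weighted_sum / total_weight


# Hypothetical scores, for illustration only.
example_scores = {
    "proposal": 95,
    "exploratory_data_analysis": 88,
    "written_report_draft": 100,
    "written_report": 90,
    "presentation": 85,
    "team_member_evaluation": 100,
}
print(round(project_grade(example_scores), 2))
```

Because the project is worth 40% of the course grade, each point of the project grade contributes 0.4 points to the overall grade.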

At the end of the project, you will evaluate your own and your group members’ contribution to the project. I will take these evaluations into account when assigning individual grades.

More information about the project and the individual components will be provided later in the semester.

To use a grace day for a project component, all group members have to decide to use a grace day. You cannot use a grace day on the final report and presentation. If you think your group might not be able to turn those in on time, please contact me ASAP.

Class Engagement (10%)

Class engagement is what you may often see called “participation” in other classes. While participation is important, I know that not everyone participates in the same way. I encourage you all to be involved in class and during labs. What I am really looking for, however, is engagement in the course. All of the following demonstrate engagement in the course:

  • Participation in class discussions
  • Participation in individual and group in-class activities
  • Being active on Ed Discussion (asking or answering questions)
  • Coming to office hours
  • Emailing me with questions

Extra Credit Opportunities

The Data Science and Society Initiative is organizing a colloquium series this semester where data scientists will come to present on data science topics. If you attend and write a two (2) page, double-spaced reflection on the speaker’s presentation, including what you learned and how it relates to what you have learned in class, you will get one (1) percentage point of extra credit on a homework of your choice. You can earn this extra credit at most three (3) times. However, you are strongly encouraged to attend all of the talks!

There may be other extra credit opportunities introduced by me during the semester, but please do not ask for extra credit opportunities for you in particular – if it is offered, extra credit will always come from me and will always be available to all students.

Grade Breakdown

I will use the following grading scale:

  • A: 100-93; A-: < 93-90
  • B+: < 90-87; B: < 87-83; B-: < 83-80
  • C+: < 80-77; C: < 77-73; C-: < 73-70
  • D+: < 70-67; D: < 67-60
  • F: < 60

Some professors make subjective decisions about rounding up or down in certain ranges (93-94, for example). This has always struck me as unfair and subjective. This grading scale makes it clear exactly what percentage you need to get for a particular letter grade. I will not do any rounding beyond this.
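
To make the no-rounding rule concrete, here is a minimal Python sketch of the scale above. It reads each range as "at least the lower cutoff and below the next grade's cutoff"; the function and constant names are hypothetical.

```python
# Lower cutoffs taken from the grading scale above, highest first.
CUTOFFS = [
    (93, "A"), (90, "A-"),
    (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
    (67, "D+"), (60, "D"),
]


def letter_grade(percent):
    """Map a course percentage to a letter grade with no rounding."""
    for cutoff, letter in CUTOFFS:
        if percent >= cutoff:
            return letter
    return "F"


print(letter_grade(93))     # A
print(letter_grade(92.99))  # A-
print(letter_grade(59.9))   # F
```

Under this reading, a 92.99 is an A- and not an A, which matches the statement that no rounding is done beyond the scale itself.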

Class Texts and Software

Texts

We will be primarily using three textbooks, two of which are entirely online:

  • R for Data Science (2nd Edition) by Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund (R4DS in schedule). We will use this book to gain a better understanding of key concepts in data science and how to implement them in R.

  • Introduction to Modern Statistics by Mine Çetinkaya-Rundel and Johanna Hardin (IMS in schedule). We will use this book to introduce topics from statistics that are key to data science (All topics in statistics are relevant to data science but we won’t cover them all in this course).

Please note that the online versions of these textbooks are free! If you want a physical copy of these books, you will have to purchase them (I want to stress that in no way shape or form are you expected to purchase the physical copies). The links above lead to the free online versions.

  • OpenIntro Statistics by David Diez, Mine Çetinkaya-Rundel, Christopher Barr, and OpenIntro (OIS in schedule). You can get a free PDF version of this book here: https://www.openintro.org/book/os/

    There is not an html version of this book, unfortunately. This third textbook is a bit more technical. We will not use it nearly as much as the other two books, but it is a good resource for those interested in statistics.

If you need a physical copy of any of these textbooks for accessibility reasons, please let me know ASAP!

Other readings may be posted to Moodle or linked in the course schedule.

Note About Reading

I will not quiz you on the reading materials. I strongly recommend, but do not require, that you read them before lecture, as they will make lecture easier for you to parse. That said, if you ask me a conceptual question, I will always first ask you if you’ve had a chance to read up on that topic in the course texts yet.

Reading More About R

If you are interested in R as a programming language (a topic that we will touch on in this course but will not go into in major detail), feel free to check out Hadley Wickham’s [2] excellent Advanced R. You can access its contents for free online here: https://adv-r.hadley.nz/. I also have a physical copy of this book in my office if you would like to peruse it.

Software

There are software requirements for the course. Students must download and install R, a free statistical program available at http://cran.r-project.org/, as well as RStudio (also free), which is available at https://www.rstudio.com/products/rstudio/download/. Please follow the instructions for downloading both here: https://moderndive.netlify.app/1-getting-started.html#installing

I believe that installing and maintaining programs like RStudio and R on one’s personal computer is a very useful skill and experience. However, I do not want this to be an obstacle to learning class material. If you do not have a laptop, or have a laptop that cannot run RStudio or R for any reason, please let me know; Vassar has access to an RStudio Server, which will let you access RStudio and use R in a web browser.

Detailed Course Policies

Office Hours and Contact Policy

I will hold two sets of office hours during the week:

  • Mondays: 1:30-3:00pm
  • Wednesdays: 4:30–6:00pm.

During these times, I will be available on Zoom (please see link on Moodle page) and in-person (Rocky 206). However, if you would like to talk to me over Zoom during office hours, I request that you please sign up for a 15-minute slot via this link: https://calendly.com/simon_hoellerbauer/office-hours. You may sign up for two slots (but no more, please) if needed.

The drop-in hours are first-come, first-served. If you arrive before the end of office hours, I will do my best to get to you (that is, I won’t turn away people who have been waiting just because office hours have officially ended).

If I have to change my office hours for any reason, I will let you know. If these times do not work for you, please email me; I am more than happy to schedule a meeting at a different time. I will be available by Zoom on Tuesdays and Thursdays, if needed.

You are encouraged to come to my office hours and to contact me with any questions you may have, even if you just want to chat. You can come to office hours to talk about the class, but my office hours are open more generally as well. I am more than happy to talk about research interests, data science, quantitative social (and political) science, data science careers, classes at Vassar, and many other things.

For general questions about course materials, I encourage you to use Ed Discussion. If your question involves code, I would prefer that you use Ed Discussion, because it lets you write code with proper formatting and syntax highlighting. If it is a question about an assignment, you can make a private question that only I can see.

I will try to respond to emails and private questions on Ed Discussion as soon as possible, although I cannot guarantee a same-day response. Therefore, I encourage you to ask me questions about assignments and projects as far in advance as possible, which will hopefully help you get in the habit of working on assignments well before they are due.

Attendance

Presence in the classroom is a key factor in student success in a course. I will take attendance at the beginning of each lecture and lab meeting. You are allowed two (2) unexcused absences from lecture and one (1) unexcused absence from lab.

Please contact me for excused absences. Excused absences are generally limited to short-term or long-term illness, personal emergencies, varsity athletic participation, and religious holidays, although I am willing to discuss absences that do not exactly fit under one of those categories on a case-by-case basis before the date of the planned absence. If I do not promptly reply to an email related to absences, please follow up with me.

After two unexcused absences from lecture, students will be penalized a letter grade on their class engagement grade – if you do not come to lecture, you cannot engage in class. After one unexcused absence from lab, students will be penalized a letter grade on their lab grade. Each additional absence in either case will result in a further letter grade penalty. Please note that the potential consequences for unexcused absences are different for the two in-class aspects of the class.

Students who are consistently late for lecture or lab may also see the corresponding grades reduced. Excessive unexcused absences (more than 25% of class meetings, lab meetings or a combination of the two) may also be grounds for failing or dismissal from the course.

Late Work

I have a grace days policy in my class. During the semester, you have three (3) grace days to use on any assignment or combination of assignments. One grace day allows you to turn in an assignment one day (24 hours) late. The new due time would be the same time of the day, but one day later. For example, if a homework is due on September 15 at 11:59 PM, and you take a grace day, you can turn in this homework up to 11:59 PM on September 16. Weekends (a Saturday-Sunday period) count as one grace day. If you want to use a grace day on an assignment, you must indicate so on the assignment itself.

If you do not use a grace day and have not talked to me beforehand, I will deduct a letter grade (10 percentage points) per day that an assignment is late from the maximum grade you can receive. I will then grade your assignment as normal and weight it so that it cannot exceed this new maximum grade. For example, if you turn in an assignment one day late and do not use a grace day, the highest grade you can receive is a 90. If you then receive an 85 on the assignment, your actual grade will be 0.85 * 90 = 76.5. I do this because it helps me separate out where you lose points in ways that are not related to the lateness of your assignment.
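
The late-penalty arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and flooring the maximum grade at 0 for very late work is an assumption that the policy does not state.

```python
def late_adjusted_grade(raw_score, days_late):
    """Scale a raw score (0-100) by the reduced maximum grade.

    The maximum drops 10 percentage points per late day. Flooring it
    at 0 is an assumption, not something the policy specifies.
    """
    max_grade = max(0, 100 - 10 * days_late)
    return raw_score * max_grade / 100


# The example from the text: an 85 turned in one day late.
print(late_adjusted_grade(85, 1))  # 76.5
```

An on-time assignment (`days_late=0`) keeps its raw score, since the maximum grade stays at 100.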

What Grace Days Do Not Need to Be Used For

If you exhaust your grace days and think you may need more time on an assignment, please contact me, but please note that I will only grant extensions in extraordinary circumstances.

However, please keep in mind that the idea behind grace days is to give students a bit of flexibility and ownership over deadlines. If you want to have an extra day of vacation, need a mental health day, want to spread out some of your assignments a bit more, or an assignment is taking you longer than expected, use a grace day! You are not expected to use grace days in extraordinary circumstances. If you are observing a religious rite or holiday that impacts your ability to complete an assignment, for example, or if you are consistently struggling physically, mentally, or emotionally, and that is preventing you from making progress, you do not have to use your grace days for extensions on assignments. In the case of religious observances, please just send me an email. In other instances, I request that you also reach out to the Dean of Studies office (email: dos@vassar.edu) because they can help triage and coordinate with all of your instructors, not just me.

Academic Integrity [3]

In a class setting, cooperative work has both benefits and pitfalls. Peers learn a lot by explaining things to each other. But it can also be easy to stumble into a passive mindset where you’re not really assimilating the concepts. In this course, you are allowed, and in fact encouraged, to discuss course content with your peers, including homework assignments. However, you must always write your own code and written answers, except for the final project, where you can share everything, including code, with your project partners, as you will turn in one assignment [4].

In addition, in general, you are not allowed to share code in any way for any assignment (except, as stated above, the final project, and then only within your group). You are also not allowed to turn in code obtained from others or online.

Do not post publicly on Ed Discussion about homework and other assignments. Ed Discussion is only to be used to ask conceptual questions about class materials. You can write me a private message to ask about homework and other assignments.

To make it totally clear, you can use the following guidelines to determine what collaboration is allowed on assignments that are to be turned in:

What is Cheating?

  • Sharing code or other electronic files: either by copying, retyping, looking at, or supplying a copy of a file from this or a previous semester. Also not allowed is verbal or other description of one person’s code to another.
  • Sharing written assignments: Looking at, copying, or supplying an assignment.
  • Using others’ code: Using code from this or previous offerings of this class, from other courses at Vassar or other institutions, or from elsewhere (e.g., software or code found on the Internet).
  • Looking at others’ code: Although mentioned above, it bears repeating. Looking at other students’ code or allowing others to look at yours is cheating. This includes one person looking at code and describing it to another. There is no notion of looking “too much”, since no looking is allowed at all.

What is not Cheating?

  • Clarifying ambiguities or vague points in class handouts or textbooks.
  • Using code from the textbook or from the class web pages is always OK.

These guidelines will be slightly relaxed for in-class activities, which will not be turned in. For these you will be put in groups; you still need to write your own code, but you can work together to come up with solutions and can look at each other’s code.

Please remember that I am here to help and that all you have to do is ask for assistance if you need it. You do not have to face the course alone; I just want to make sure that all students are best situated to learn and practice the course material.

Generative AI

Generative AI – such as ChatGPT, Bing Chat, or Anthropic Claude – can be powerful tools to help you in writing and debugging code. At times during the semester, we may encourage you to try these tools in class and in lab. For this class, you may also use generative AI on your own when you are studying or working on homework assignments.

If you use generative AI while working on a homework assignment, you must include a comment acknowledging the use of these tools and how you used them, and you may be asked to include a transcript showing your interactions. (This will help us to understand both the difficulties students are having in the class and the ways that generative AI can help them!)

We will talk about this a bit in class, but please note that generative AI is designed to mimic human output; it is not designed to output the truth. Any mistakes resulting from code taken from a generative AI count as your own.

In addition, be careful that you are using generative AI to help you learn rather than as a way to avoid learning! If I feel that students are relying too heavily on generative AI, I reserve the right to start holding regular in-class paper quizzes.

Regrade Requests

Requests for regrades have a time window. They cannot be submitted until at least 48 hours have passed since the assignment was returned (a cool-down period), and then they will only be accepted within three weeks of an assignment being returned (a statute of limitations). To request a regrade, you must submit a written memo (two pages max) explaining what aspect of your original grade you think was in error.
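
For illustration, the regrade window described above can be computed with Python's standard `datetime` module. The function names are hypothetical, and treating both boundaries as inclusive is an assumption; ask if you are near either edge of the window.

```python
from datetime import datetime, timedelta

COOL_DOWN = timedelta(hours=48)           # earliest a request may be submitted
STATUTE_OF_LIMITATIONS = timedelta(weeks=3)  # latest a request is accepted


def regrade_window(returned_at):
    """Return (opens, closes) for regrade requests on one assignment."""
    return returned_at + COOL_DOWN, returned_at + STATUTE_OF_LIMITATIONS


def can_request_regrade(returned_at, now):
    """True if `now` falls inside the window (boundaries assumed inclusive)."""
    opens, closes = regrade_window(returned_at)
    return opens <= now <= closes


# Hypothetical example: an assignment returned on October 2, 2023 at 10:00am.
returned = datetime(2023, 10, 2, 10, 0)
opens, closes = regrade_window(returned)
print(opens)   # 2023-10-04 10:00:00
print(closes)  # 2023-10-23 10:00:00
```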

Please note that you do not have to do this if you think there is an error in the assignment or in the calculation of a grade. Just bring this to my attention.

Electronic Policy

Please put away all cell phones while class is in session. You are permitted to use laptops in class. On most days we will be doing some sort of activity for which you will have to use a computer [5]. Please realize that I can tell when you are looking at materials that are not related to the class. When taking notes, I strongly encourage you to not use your laptops in class, as studies have shown that using pen and paper is better for comprehension and understanding, while laptop use can decrease participation.

COVID Policies

Although the situation has improved markedly, COVID-19 is still a presence in our lives, and there are many individuals for whom COVID-19 is still a significant risk, for a variety of reasons. I will be wearing a mask to teach and encourage all of you to wear a mask while in the classroom as well, in particular if you are having any kind of upper-respiratory system symptoms (masks protect against more than just COVID-19!); the rooms in Rocky are small and air circulation is not the best.

I will require masks during in-person office hours because my office is quite small.

Teaching Philosophy

I view my role as a teacher as a support person for you, my students. Because of my background and education, I have knowledge that I will strive to communicate to you, which is why lectures do form an important part of this course. My primary goal as a teacher, however, is to make you feel engaged and active and to help you learn skills that you will be able to use outside of the contexts of this course and even of this field of study. As such, I believe that active engagement with the course material is essential to helping you learn, and I structure the course in such a way that there are plenty of ways in which to participate and be active, as I recognize that not all students learn in the same way. At the same time, I do not believe that surface-level skimming of a topic is all that useful; therefore, this class is more detail-oriented than other introductory courses may be, without being overwhelming. Finally, I am always open to feedback; I want to make sure that you are getting both what you want and need from this course.

Discrimination and Harassment

I want to remind everyone that we are bound to abide by Vassar’s policies regarding discrimination and harassment, which you can read on page 16 in the Vassar College Regulations.

Names and Pronouns [6]

As noted by the Office of Equal Opportunity and Affirmative Action/Title IX, Vassar is committed to diversity, inclusion, equity, and non-discrimination. Many people might use a name that is different from their current legal name. In all areas of campus, we refer to people by the names, in addition to the pronouns, that they use for themselves. Students are invited to share their names and the pronouns that they use. Students are also encouraged to use gender-neutral language, if they aren’t sure of someone’s pronouns.

Resources for Students

If you are having any trouble during your time at Vassar, please remember that there are resources for you and people who want to help you on campus. If you feel like you are struggling mentally, physically, or emotionally and it is impacting your academic experience at Vassar, please don’t hesitate to reach out to me, your advisor, your class advisor, and especially the Dean of Studies office (email: dos@vassar.edu). The ones with the most ability to effect change for you will be the Dean of Studies office (they will also be able to connect you with all of the resources on campus), so I strongly recommend that you reach out to them as soon as you feel like you are having some difficulty.

They will usually let me know (without sharing details), but it can help me assist you in the course if you also tell me when you reach out to them. I never expect you to share with me any details about why you are reaching out to them.

Q-Center Peer Support Services

This course fulfills the quantitative analysis (QA) requirement for graduation. All Vassar students have access to free, drop-in, peer-to-peer quantitative tutoring at the Quantitative Reasoning Center (Q-Center). Quantitative tutors (Q-Tutors) excel in a variety of STEM courses. They are typically available Sunday-Thursday 3pm-11pm while classes are in session. Q-Tutors who specialize in Mathematics and Physics are located in the Main Library, Room 122 behind the Writing Center. Q-Tutors who specialize in Chemistry and Economics are located in the Main Library, Room 88 near Special Collections. If you have a quantitative question beyond these four disciplines, Q-Tutors are available to attempt to help you with this question or will help direct you to someone else who may be better able to help. Schedules and other important information can be found at https://offices.vassar.edu/ltrc/qrc/.

Academic Accommodations

Academic accommodations are available for students registered with the Office for Accessibility and Educational Opportunity (AEO). Students in need of disability (ADA/504) accommodations should schedule an appointment with me early in the semester to discuss any accommodations for this course that have been approved by the Office for Accessibility and Educational Opportunity, as indicated in your AEO accommodation letter.

Title IX Resources

Vassar College is committed to providing a safe learning environment for all students that is free of all forms of discrimination and sexual harassment, including sexual assault, domestic violence, dating violence, and stalking. If you (or someone you know) has experienced or experiences any of these incidents, know that you are not alone. Vassar College has staff members trained to support you in navigating campus life, accessing health and counseling services, providing academic and housing accommodations, helping with legal protective orders, and more.

Please be aware all Vassar faculty members are “responsible employees,” which means that if you tell me about a situation involving sexual harassment, sexual assault, dating violence, domestic violence, or stalking, I must share that information with the Title IX Coordinator. Although I have to make that notification, you will control how your case will be handled, including whether or not you wish to pursue a formal complaint. Our goal is to make sure you are aware of the range of options available to you and have access to the resources you need.

If you wish to speak to someone privately, you can contact any of the following on-campus resources:

  • Counseling Service (counselingservice.vassar.edu, 845-437-5700)
  • Health Service (healthservice.vassar.edu, 845-437-5800)
  • Rachel Gellert, SAVP (Support, Advocacy, and Violence Prevention) director (845-437-7863)
  • SAVP advocate, available 24/7 by calling the CRC at 845-437-7333

The SAVP website (savp.vassar.edu) and the Title IX section of the EOAA website (eoaa.vassar.edu/title-ix/) have more information, as well as links to both on- and off-campus resources.

Changes to Syllabus and Schedule

Although I do not expect to, I reserve the right to make changes to this syllabus and schedule when necessary. I will always let you know when this occurs. For the most up-to-date syllabus, please always look on Moodle. I will never add more assignments; I will only ever remove them from the schedule (in some cases I may tweak readings).

Schedule

Please note that this schedule is filterable and searchable.

Footnotes

  1. If you choose to go this route, you must check with me before writing your proposal; otherwise, I will not approve it.

  2. Hadley Wickham is a statistician, the creator of the ggplot2 package and the tidyverse, and Chief Scientist at RStudio. He’s from New Zealand, hence the .nz in the links here and above.

  3. Adapted from the DATA 144 syllabus created by Professors Monika Hu and Jason Waterman.

  4. There is to be no sharing between groups, to be clear.

  5. If you do not have access to a laptop, please let me know as soon as possible, and we will find a solution.

  6. Adapted from statement written by Professors Jacob Smith and Abbie Erler, Kenyon College.