
Introduction to Data Science I & II

Introduction

Dan L. Nicolae, Michael J. Franklin, Amanda R. Kube Jotte, Evelyn Campbell, Susanna Lange, Will Trimble, and Jesse London

Forthcoming…

Acknowledgements

Jupyter Book was originally created by Sam Lau and Chris Holdgraf with support of the UC Berkeley Data Science Education Program and the Berkeley Institute for Data Science.

Introduction to Data Science with Python

Arvind Krishna, Lizhen Shi, Emre Besler, and Arend Kuyper

September 20, 2022

This book is developed for the course STAT303-1 (Data Science with Python-1). The first two chapters of the book are a review of Python and will be covered very quickly. Students are expected to know the contents of these chapters beforehand, or be willing to learn them quickly. Students may use the STAT201 book (https://nustat.github.io/Intro_to_programming_for_data_sci/) to review the Python basics required for the STAT303 sequence. The core part of the course begins with the third chapter, Reading data.

Please feel free to let the instructors know of any typos or mistakes in this book, or to share general feedback.

Data Science


Data Science is a combination of multiple disciplines that uses statistics, data analysis, and machine learning to analyze data and to extract knowledge and insights from it.

What is Data Science?

Data Science is about data gathering, analysis and decision-making.

Data Science is about finding patterns in data through analysis and using them to make predictions.

By using Data Science, companies are able to make:

  • Better decisions (should we choose A or B?)
  • Predictive analysis (what will happen next?)
  • Pattern discoveries (find patterns, or hidden information, in the data)

Where is Data Science Needed?

Data Science is used in many industries in the world today, e.g. banking, consultancy, healthcare, and manufacturing.

Examples of where Data Science is needed:

  • For route planning: to discover the best shipping routes
  • To foresee delays for flights, ships, trains, etc. (through predictive analysis)
  • To create promotional offers
  • To find the best-suited time to deliver goods
  • To forecast the next year's revenue for a company
  • To analyze the health benefits of training
  • To predict who will win elections
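As a rough illustration of the revenue-forecasting item in the list above, the sketch below fits a straight line to a few yearly figures and extends it one year ahead. The revenue numbers and the helper name `fit_line` are made up for this example, and a straight-line trend is only an assumption; real forecasts usually need more data and a more careful model.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and how x and y vary together
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical yearly revenue (in millions), invented for illustration only
years = [2019, 2020, 2021, 2022]
revenue = [10.0, 12.0, 14.0, 16.0]

slope, intercept = fit_line(years, revenue)
forecast_2023 = slope * 2023 + intercept
print(f"Trend: {slope:.2f} per year, forecast for 2023: {forecast_2023:.2f}")
```

Because the invented figures grow by the same amount every year, the fitted line passes through all of them; with noisy data the line would only approximate the trend.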

Data Science can be applied in nearly every part of a business where data is available. Examples are:

  • Consumer goods
  • Stock markets
  • Logistics companies


How Does a Data Scientist Work?

A Data Scientist requires expertise in several areas:

  • Machine Learning
  • Programming (Python or R)
  • Mathematics

A Data Scientist must find patterns within the data. Before the patterns can be found, the data must be organized in a standard format.

Here is how a Data Scientist works:

  • Ask the right questions - To understand the business problem.
  • Explore and collect data - From databases, web logs, customer feedback, etc.
  • Extract the data - Transform the data to a standardized format.
  • Clean the data - Remove erroneous values from the data.
  • Find and replace missing values - Check for missing values and replace them with a suitable value (e.g. an average value).
  • Normalize data - Scale the values to a practical range (e.g. 140 cm is smaller than 1.8 m, yet the number 140 is larger than 1.8, so values must be brought to a common scale before they are compared).
  • Analyze data, find patterns and make future predictions.
  • Represent the result - Present the result with useful insights in a way the "company" can understand.
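The extract, missing-value, and normalize steps in the list above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: the height records are invented, the helper names `fill_missing` and `min_max_scale` are hypothetical, and mean imputation with min-max scaling is just one of several reasonable choices.

```python
from statistics import mean


def fill_missing(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    average = mean(observed)
    return [average if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly so the smallest is 0 and the largest is 1."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


# Hypothetical height records in mixed units, with one missing value
raw_heights = [(140, "cm"), (1.8, "m"), (None, "cm"), (165, "cm")]

# Extract: convert every record to centimetres so the numbers are comparable
heights_cm = [
    None if value is None else (value * 100 if unit == "m" else value)
    for value, unit in raw_heights
]

# Find and replace missing values, then normalize
filled = fill_missing(heights_cm)
scaled = min_max_scale(filled)
print(scaled)
```

Converting units first matters: scaling the raw numbers 140 and 1.8 directly would rank the 1.8 m person as the shortest.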

Where to Start?

In this tutorial, we will start by presenting what data is and how data can be analyzed.

You will learn how to use statistics and mathematical functions to make predictions.


Instructors

  • Prof. Eric Grimson
  • Prof. John Guttag
  • Dr. Ana Bell

Departments

  • Electrical Engineering and Computer Science

As Taught In

  • Computer Science
  • Probability and Statistics

Introduction to Computational Thinking and Data Science: Assignments

Please review the Style Guide (PDF) before attempting the problem sets. We have compiled a list of other Python resources that you may find helpful in this Additional Python Resources document (PDF). It contains links to other online textbooks on Python, debugging tools, and fun online coding challenges.

Solutions for the problem sets are not available.

Problem Set 1: Space Cows Transportation (ZIP) (This ZIP file contains: 1 .pdf file, 2 .txt files, and 3 .py files)

Problem Set 2: Fastest Way to Get Around MIT (ZIP) (This ZIP file contains: 1 .pdf file, 1 .txt file, and 2 .py files)

Problem Set 3: Robot Simulation (ZIP) (This ZIP file contains: 1 .pdf file, 4 .pyc files, and 4 .py files)

Problem Set 4: Simulating the Spread of Disease and Bacteria Population (ZIP) (This ZIP file contains: 1 .pdf file and 2 .py files)

Problem Set 5: Modeling Global Warming (ZIP - 2.3MB) (This ZIP file contains: 1 .pdf file, 1 .csv file, and 2 .py files)


Introduction to Data Science

A Python Approach to Concepts, Techniques and Applications

  • © 2024
  • Latest edition
  • Laura Igual
  • Santi Seguí

Departament de Matemàtiques i Informàtica, Universitat de Barcelona, Barcelona, Spain


  • Describes tools and techniques that demystify data science
  • Discusses Python extensions, techniques and modules to perform statistical analysis and machine learning
  • Includes case studies, and supplies code examples and data at an associated website

Part of the book series: Undergraduate Topics in Computer Science (UTICS)


Table of contents (12 chapters)

  • Front Matter
  • Introduction to Data Science (Laura Igual, Santi Seguí)
  • Data Science Tools (Eloi Puertas)
  • Descriptive Statistics
  • Statistical Inference
  • Supervised Learning
  • Regression Analysis
  • Unsupervised Learning
  • Network Analysis
  • Recommender Systems
  • Basics of Natural Language Processing
  • Deep Learning
  • Responsible Data Science
  • Back Matter

  • Data Science
  • Parallel Computing
  • Python Programming
  • Graph Analysis

About this book

This accessible and classroom-tested textbook/reference presents an introduction to the fundamentals of the interdisciplinary field of data science. The coverage spans key concepts from statistics, machine/deep learning and responsible data science, useful techniques for network analysis and natural language processing, and practical applications of data science such as recommender systems or sentiment analysis. 

Topics and features:  

  • Provides numerous practical case studies using real-world data throughout the book 
  • Supports understanding through hands-on experience of solving data science problems using Python 
  • Describes concepts, techniques and tools for statistical analysis, machine learning, graph analysis, natural language processing, deep learning and responsible data science
  • Reviews a range of applications of data science, including recommender systems and sentiment analysis of text data 
  • Provides supplementary code resources and data at an associated website 

This practically focused textbook provides an ideal introduction to the field for upper-tier undergraduate and beginning graduate students from computer science, mathematics, statistics, and other technical disciplines. The work is also eminently suitable for professionals on continuing-education short courses and for researchers following self-study courses.

Authors and Affiliations

About the authors

Dr. Laura Igual is an Associate Professor at the Departament de Matemàtiques i Informàtica, Universitat de Barcelona, Spain. Dr. Santi Seguí is an Associate Professor at the same institution.

The authors wish to mention that some chapters were co-written by Jordi Vitrià, Eloi Puertas, Petia Radeva, Oriol Pujol, Sergio Escalera.

Bibliographic Information

Book Title: Introduction to Data Science

Book Subtitle: A Python Approach to Concepts, Techniques and Applications

Authors: Laura Igual, Santi Seguí

Series Title: Undergraduate Topics in Computer Science

DOI: https://doi.org/10.1007/978-3-031-48956-3

Publisher: Springer Cham

eBook Packages: Computer Science, Computer Science (R0)

Copyright Information: The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2024

Softcover ISBN: 978-3-031-48955-6 Published: 13 April 2024

eBook ISBN: 978-3-031-48956-3 Published: 12 April 2024

Series ISSN: 1863-7310

Series E-ISSN: 2197-1781

Edition Number: 2

Number of Pages: XIV, 246

Number of Illustrations: 4 b/w illustrations, 78 illustrations in colour

Topics: Data Structures and Information Theory, Artificial Intelligence, Data Mining and Knowledge Discovery, Python


Programming for Data Science

Teaching data scientists the tools they need to use computers to do data science

Creative Commons License

Advanced Python for Data Science Assignment 3

All assignments beginning with Assignment 3 are to be submitted via GitHub. Create a new repository for each assignment using your NetID, an underscore ‘_’, the word ‘assignment’ and the assignment number, all in lower case. For example, if your NetID is aaa11 then the repository for this assignment would be aaa11_assignment3. Clone this repository to your local computer. Commit any files that you are required to create as part of the assignment, then push the changes back to GitHub. Whatever is in the repository at the due date will be graded.

An astrophysicist colleague was recently complaining about how long it was taking to run an N-body simulation. “It’s really just a simple calculation, and I’m only simulating four planets, but it takes nearly a minute and a half to run one simulation. I really need it done in under 30 seconds.” You kindly offer to take a look at the code to see if it is possible to speed it up. Your colleague provides you with a link to the source.

Although your colleague said the code was simple, it is still fairly complex, so you decide to tackle the problem in stages. A first scan of the code reveals a number of potential areas that could be improved. These include:

  • Reducing function call overhead
  • Using alternatives to membership testing of lists
  • Using local rather than global variables
  • Using data aggregation to reduce loop overheads

As you’re a cautious programmer, you decide to address each of these in turn. This will ensure that it is possible to check the program is still working correctly after each change, and to assess the performance improvement that the change achieved. You are also aware that the program has to be maintained by others in the future, so you want to make sure that the changes do not make this more difficult, especially if the performance improvement is only minor.
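Two of the areas listed above, membership testing and local versus global variables, can be illustrated in isolation. The sketch below is not taken from nbody.py: the body names, the `SCALE` constant, and all function names are hypothetical, and the timings it prints will vary by machine and Python version.

```python
import timeit

# Hypothetical names, used only to illustrate the technique
BODIES = ["sun", "jupiter", "saturn", "uranus", "neptune"]
BODY_SET = set(BODIES)  # same items, but hashed for fast lookup


def count_members_list(names):
    """Membership test against a list scans the list element by element."""
    return sum(1 for name in names if name in BODIES)


def count_members_set(names):
    """Membership test against a set uses a hash lookup instead of a scan."""
    return sum(1 for name in names if name in BODY_SET)


SCALE = 0.01  # global constant, looked up by name on every loop iteration


def scaled_sum_global(values):
    total = 0.0
    for v in values:
        total += v * SCALE  # global lookup inside the loop
    return total


def scaled_sum_local(values, scale=SCALE):
    """Binding the constant as a default argument makes it a local variable."""
    total = 0.0
    for v in values:
        total += v * scale  # local lookup inside the loop
    return total


if __name__ == "__main__":
    names = ["neptune", "pluto"] * 1000
    values = list(range(10000))
    for func, arg in [
        (count_members_list, names),
        (count_members_set, names),
        (scaled_sum_global, values),
        (scaled_sum_local, values),
    ]:
        seconds = timeit.timeit(lambda: func(arg), number=100)
        print(f"{func.__name__}: {seconds:.4f} s")
```

Each pair of functions returns the same result, so correctness can be checked before the timings are compared, which mirrors the change-then-verify approach described above.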

For each of these areas, create a new version of nbody.py (call them nbody_1.py, nbody_2.py, etc.) and commit them to the repository. You may also add a file with any other optimizations that you find. At the beginning of each file, put a comment indicating if the change made the most improvement, second most, etc. Finally, create another file called nbody_opt.py that contains all of the optimizations you made. Put a comment at the top indicating the relative speedup of the optimized version compared to the original version. Calculate the relative speedup (R) as follows:

[Equation image: definition of the relative speedup R]

Are you able to get it to run in under 30 seconds?
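One way to check the 30-second target is to time both versions with `time.perf_counter`. The sketch below is an assumption-laden illustration: `time_call`, `speedup_ratio`, `slow_sum`, and `fast_sum` are hypothetical names, the two sum functions merely stand in for the original and optimized programs, and the ratio of original time to optimized time is one common convention that may not match the formula the assignment specifies for R.

```python
import time


def time_call(func, *args, repeats=3):
    """Return the best wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


def speedup_ratio(t_original, t_optimized):
    """ASSUMED convention: original time divided by optimized time."""
    return t_original / t_optimized


def slow_sum(n):
    """Stand-in for the original program: sums 0..n-1 with a loop."""
    total = 0
    for i in range(n):
        total += i
    return total


def fast_sum(n):
    """Stand-in for the optimized program: closed-form sum of 0..n-1."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    n = 1_000_000
    assert slow_sum(n) == fast_sum(n)  # verify results match before timing
    t_slow = time_call(slow_sum, n)
    t_fast = time_call(fast_sum, n)
    print(f"original: {t_slow:.4f} s, optimized: {t_fast:.6f} s")
    print(f"ratio (assumed convention): {speedup_ratio(t_slow, t_fast):.1f}")
```

Under this assumed convention, going from roughly 90 seconds to 30 seconds would correspond to a ratio of 3.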


This repository contains project assignments for GE 461 (Introduction to Data Science) course taken at Bilkent University [2022-2023 Spring]

esattok/introduction-to-data-science

Bilkent University GE 461 Project (2022-2023 Spring)

This repository contains the project assignments for the GE 461 (Introduction to Data Science) course taken at Bilkent University in the 2022-2023 Spring semester.

Project Assignments:

  • project-01 : Given a dataset, perform linear regression analysis using the R programming language
  • project-02 : Develop a handwritten digit recognition system
  • project-03 : Implement and test an artificial neural network (ANN) regressor
  • project-04 : Analyze wearable-sensor data from a group of subjects performing either a fall action (F) or a non-fall action
  • Each assignment is independent from the others
  • Each assignment targets a specific area in Data Science
