Global - (+34) 91 414 89 50 | N. America - (800) 757 6543 contact@solidq.com

COURSE: Data Science with R and SQL Server

UPCOMING DATES

There are no scheduled classes for this course at the moment. Please send us an information request to find out more.

An introduction to the R language, to statistics, data mining, and machine learning with R, and to using data science in SQL Server and the Microsoft BI stack.

R is the most popular environment and language for statistical analyses, data mining, and machine learning. A managed and scalable version of R runs in SQL Server, Power BI, and Azure ML. The main topic of the course is the R language. However, the course also shows how to use the other languages and tools available in the MS BI suite for data science applications, including Python, T-SQL, Power BI, Azure ML, and Excel. The labs focus on R; the demos also show the code in other languages.

 

COURSE DELIVERY OPTIONS

4-DAY CLASSROOM

The course will take place in a classroom with no more than 12 students in order to maintain a good level of interactivity.

PRIVATE ONSITE

The course will take place in your company’s facilities. Request a quote here.

Course Benefits

What does this course cover?

As an open source project, R is the most popular analytical engine and programming language for data scientists worldwide. The number of libraries with new analytical functions is enormous and continuously growing. However, there are also some drawbacks. R is a programming language, so you have to learn it to use it. Open source development also means less control over the code. Finally, the free R engine is not scalable.

Microsoft added support for R code in SQL Server 2016 and continues to support it in later versions. A parallelized, highly scalable execution engine executes the R scripts. However, not every R library is allowed in the managed environments.

Python is another language supported by SQL Server that can be used for data science applications. Python support was introduced in SQL Server 2017. The attendees of the course will learn the basics of the Python language as well.

Attendees of this course learn to program with R from scratch. Basic R code is introduced using the free R engine and the RStudio IDE. The lifecycle of a data science project is explained in detail. The attendees learn how to perform the data overview and how to do the most tedious task in a project, the data preparation. After data overview and preparation, the analytical part begins with intermediate statistics for analyzing associations between pairs of variables. Then the course introduces more advanced methods for researching linear dependencies.
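As an illustration of what analyzing the association between a pair of continuous variables can look like, here is a minimal sketch in Python, the course's second language. It computes the Pearson correlation coefficient using only the standard library. The function name and the toy data are invented for this example and are not taken from the course material; the labs themselves use R.

```python
import math


def pearson_correlation(x, y):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy data, invented for illustration only.
age = [25, 32, 41, 50, 58]
income = [30000, 42000, 51000, 64000, 70000]
r = pearson_correlation(age, income)
print(f"Correlation between age and income: {r:.3f}")
```

A coefficient close to 1 or -1 points to a strong linear association, while a value near 0 points to a weak one. It does not say anything about causality.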

Too many variables in a model can be a problem in their own right. The course shows how to do feature selection, starting with the basics of matrix calculations. Then the course switches to more advanced data mining and machine learning analyses, including supervised and unsupervised learning. The course also introduces modern topics, including forecasting, text mining, and reinforcement learning.
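One simple way to reduce the number of variables is a correlation filter: keep a feature only if it is not strongly correlated with a feature that was already kept. The sketch below shows this idea in plain Python. It is one possible approach chosen for illustration, not necessarily the method taught in the course, and the function names, the 0.9 threshold, and the toy data are all assumptions.

```python
import math


def correlation(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(features, threshold=0.9):
    """Greedily keep features whose |r| with every kept feature is <= threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(correlation(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy data: the two height columns carry the same information.
features = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "commute_km": [12, 3, 25, 8, 15],
}
selected = drop_correlated(features)
print(selected)
```

The result depends on the order of the features, because the first feature of a correlated pair is the one that is kept. Methods such as principal component analysis, listed in the course outline, address the same problem by combining variables instead of dropping them.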

Finally, the attendees learn through labs how to use R code in SQL Server, Azure ML, and Power BI, and through demos how to use Python inside all of the tools mentioned.

Prerequisites: What do you need to know?

Attendees should have a basic understanding of data analysis and basic familiarity with SQL Server tools.

Course Material

Every attendee gets a PDF copy of all slides, plus all code and solutions for the demos presented and for the lab exercises.

In addition, every attendee gets an electronic version of the Data Science with SQL Server Quick Start Guide book by Dejan Sarka, Packt, 2018.

Classroom Setup

Each attendee works on a prepared computer, in a virtual machine with the following software pre-installed:

  • SQL Server 2017 or 2019 Database Engine with ML Services (In-Database)
  • AdventureWorksDW2017 demo database
  • Microsoft R Client
  • RStudio IDE
  • SQL Server Management Studio 

Expert Mentors

In previous real-world projects, our instructors have faced the same problems you are facing now. Learn from experienced professionals.

Language

Delivery is possible in English, Slovenian, Serbian, and Croatian; the material is in English.

Course Format

This seminar consists of instructor presentations and individual work during labs. During labs, the attendees mainly use the R language.

Interaction and Q&A

In all of our training courses, you will have the chance to ask individual questions and to work through specific issues.

Course coverage

Compare R vs Python

Prepare data for analytical tasks

Visualize associations between variables

Execute matrix operations

Get familiar with unsupervised learning methods

Get familiar with supervised learning methods

COURSE OUTLINE

Module 0. Introduction 

Module 1. Introducing data science and R 

  • What are statistics, data mining, machine learning… 
  • Data science projects and their lifecycle
  • Introducing R 
  • R tools 
  • R data structures 
  • Lab 1

Module 2. Introducing Python 

  • Basic syntax and objects 
  • Data manipulation with NumPy and Pandas 
  • Visualizations with matplotlib and seaborn libraries 
  • Data science with Scikit-Learn 
  • Lab 2: Discussion R vs Python 

Module 3. Data overview 

  • Datasets, cases and variables 
  • Types of variables 
  • Introductory statistics for discrete variables 
  • Descriptive statistics for continuous variables 
  • Basic graphs 
  • Sampling, confidence level, confidence interval 
  • Lab 2 

Module 4. Data preparation

  • Derived variables
  • Missing values and outliers
  • Smoothing and normalization
  • Time series
  • Training and test sets
  • Lab 3

Module 5. Associations between two variables and visualizations of associations

  • Covariance and correlation
  • Contingency tables and chi-squared test
  • T-test and analysis of variance
  • Bayesian inference
  • Linear models
  • Lab 4

Module 6. Feature selection and matrix operations

  • Feature selection in linear models
  • Basic matrix algebra
  • Principal component analysis
  • Exploratory factor analysis
  • Lab 5

Module 7. Unsupervised learning

  • Hierarchical clustering
  • K-means clustering
  • Association rules
  • Lab 6

Module 8. Supervised learning

  • Neural networks
  • Logistic regression
  • Decision and regression trees
  • Random forests
  • Gradient boosting trees
  • K-nearest neighbors
  • Lab 7

Module 9. Modern topics

  • Support vector machines
  • Time series
  • Text mining
  • Deep learning
  • Reinforcement learning
  • Lab 8

Module 10. R in SQL Server and MS BI

  • ML Services (In-Database) structure
  • Executing external scripts in SQL Server
  • Storing a model and performing native predictions
  • R in Azure ML and Power BI
  • Lab 9
