    IPN-Dharma AI Lab

    This is an IPN CIC - DHARMA initiative to provide an Artificial Intelligence Laboratory that motivates researchers, professors, and students to take advantage of the courses, resources, and tools of the industry's main technology platforms in the areas of Machine Learning, Data Science, Cloud Computing, Artificial Intelligence, and the Internet of Things, with the purpose of generating practical experience through a peer-to-peer, objective-based learning model.

    Level 2: Contextual Knowledge

    Data Engineering, Big Data, and Machine Learning on GCP

    This program builds the skills you need to advance your career and provides training to support your preparation for the industry-recognized Google Cloud Professional Data Engineer certification.

    What you will learn:
    • Identify the purpose and value of the key Big Data and Machine Learning products in Google Cloud.
    • Use Cloud SQL and Dataproc to migrate existing MySQL and Hadoop/Pig/Spark/Hive workloads to Google Cloud.
    • Employ BigQuery to carry out interactive data analysis.
    • Choose between different data processing products on Google Cloud.
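
    To make the migration objective above concrete: moving an existing Spark job to Google Cloud often amounts to resubmitting the same JAR to a Dataproc cluster. The minimal sketch below uses the google-cloud-dataproc Python client; the project ID, region, cluster name, bucket, and job class are hypothetical placeholders, not details taken from this program.

        from google.cloud import dataproc_v1

        # Hypothetical values -- replace with your own project, region, and cluster.
        project_id = "my-project"
        region = "us-central1"

        # The job controller must be addressed through the regional endpoint.
        job_client = dataproc_v1.JobControllerClient(
            client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
        )

        # Reuse the Spark JAR from the existing on-premises Hadoop/Spark workload.
        job = {
            "placement": {"cluster_name": "my-dataproc-cluster"},
            "spark_job": {
                "main_class": "org.example.WordCount",
                "jar_file_uris": ["gs://my-bucket/jars/wordcount.jar"],
            },
        }

        # Submit the job and block until it finishes.
        operation = job_client.submit_job_as_operation(
            request={"project_id": project_id, "region": region, "job": job}
        )
        response = operation.result()
        print(f"Job finished with state: {response.status.state.name}")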

    Courses in this program

    1) Google Cloud Big Data and Machine Learning Fundamentals

    This course introduces participants to the big data capabilities of Google Cloud. Through a combination of presentations, demos, and hands-on labs, participants get an overview of Google Cloud and a detailed view of the data processing and machine learning capabilities. This course showcases the ease, flexibility, and power of big data solutions on Google Cloud.

    What you will learn:
    • Identify the purpose and value of the key Big Data and Machine Learning products in Google Cloud.
    • Use Cloud SQL and Dataproc to migrate existing MySQL and Hadoop/Pig/Spark/Hive workloads to Google Cloud.
    • Employ BigQuery to carry out interactive data analysis.
    • Choose between different data processing products on Google Cloud.
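
    As a hedged illustration of the BigQuery objective above, interactive analysis boils down to sending standard SQL to BigQuery and iterating over the rows. The sketch below queries a real public dataset (bigquery-public-data.usa_names.usa_1910_current); the project ID is a placeholder.

        from google.cloud import bigquery

        # The project ID is a placeholder; authentication comes from the environment
        # (e.g. GOOGLE_APPLICATION_CREDENTIALS or gcloud application-default login).
        client = bigquery.Client(project="my-project")

        query = """
            SELECT name, SUM(number) AS total
            FROM `bigquery-public-data.usa_names.usa_1910_current`
            GROUP BY name
            ORDER BY total DESC
            LIMIT 10
        """

        # BigQuery runs the query server-side; we only page through the results.
        for row in client.query(query).result():
            print(f"{row.name}: {row.total}")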

    Estimated effort: 12 hours

    Language: Spanish & English

    Link: Coursera (Spanish)

    Link: Coursera (English)

    2) Modernizing Data Lakes and Data Warehouses with GCP

    The two key components of any data pipeline are data lakes and data warehouses. This course highlights use cases for each type of storage and dives into the available data lake and warehouse solutions on Google Cloud in technical detail. It also describes the role of a data engineer, the benefits of a successful data pipeline to business operations, and why data engineering should be done in a cloud environment.

    What you will learn:
    • Understand the differences between data lakes and data warehouses, the two key components of any data pipeline.
    • Explore use-cases for each type of storage and dive into the available data lake and warehouse solutions on Google Cloud in technical detail.
    • Understand the role of a data engineer and the benefits of a successful data pipeline to business operations.
    • Examine why data engineering should be done in a cloud environment.
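
    A small sketch of the lake-versus-warehouse distinction in practice: raw files sit in a Cloud Storage data lake, and a load job copies them into a BigQuery warehouse table for analysis. The bucket, dataset, and table names below are hypothetical.

        from google.cloud import bigquery

        client = bigquery.Client(project="my-project")  # placeholder project

        # Describe how the raw CSV in the data lake should be interpreted.
        job_config = bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.CSV,
            skip_leading_rows=1,
            autodetect=True,  # let BigQuery infer the schema
        )

        # Load from the Cloud Storage data lake into the BigQuery warehouse.
        load_job = client.load_table_from_uri(
            "gs://my-data-lake/raw/orders.csv",   # hypothetical lake object
            "my-project.analytics.orders",        # hypothetical warehouse table
            job_config=job_config,
        )
        load_job.result()  # wait for the load to complete

        table = client.get_table("my-project.analytics.orders")
        print(f"Loaded {table.num_rows} rows into the warehouse.")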

    Estimated effort: 7 hours

    Language: Spanish & English

    Link: Coursera (Spanish)

    Link: Coursera (English)

    3) Building Batch Data Pipelines on GCP

    Data pipelines typically fall under one of the Extract-Load (EL), Extract-Load-Transform (ELT), or Extract-Transform-Load (ETL) paradigms. This course describes which paradigm should be used, and when, for batch data. It also covers several Google Cloud technologies for data transformation, including BigQuery, running Spark on Dataproc, pipeline graphs in Cloud Data Fusion, and serverless data processing with Dataflow. Learners get hands-on experience building data pipeline components on Google Cloud.

    What you will learn:
    • Review the different methods of data loading (EL, ELT, and ETL) and when to use each.
    • Run Hadoop on Dataproc, leverage Cloud Storage, and optimize Dataproc jobs.
    • Use Dataflow to build your data processing pipelines.
    • Manage data pipelines with Data Fusion and Cloud Composer.
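
    As a rough sketch of the Dataflow objective above, a batch pipeline in Apache Beam reads files from Cloud Storage, transforms the records, and writes them to BigQuery. All bucket, project, dataset, and field names are hypothetical; swapping DataflowRunner for DirectRunner runs the same pipeline locally.

        import apache_beam as beam
        from apache_beam.options.pipeline_options import PipelineOptions

        # Placeholder options; use DirectRunner for local testing.
        options = PipelineOptions(
            runner="DataflowRunner",
            project="my-project",
            region="us-central1",
            temp_location="gs://my-bucket/tmp",
        )

        def parse_line(line):
            # Hypothetical CSV layout: user,event,value
            user, event, value = line.split(",")
            return {"user": user, "event": event, "value": float(value)}

        with beam.Pipeline(options=options) as p:
            (
                p
                | "Read" >> beam.io.ReadFromText(
                    "gs://my-bucket/raw/events.csv", skip_header_lines=1)
                | "Parse" >> beam.Map(parse_line)
                | "Write" >> beam.io.WriteToBigQuery(
                    "my-project:analytics.events",
                    schema="user:STRING,event:STRING,value:FLOAT",
                    write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                )
            )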

    Estimated effort: 17 hours

    Language: Spanish & English

    Link: Coursera (Spanish)

    Link: Coursera (English)

    4) Building Resilient Streaming Analytics Systems on GCP

    Processing streaming data is becoming increasingly popular as streaming enables businesses to get real-time metrics on business operations. This course covers how to build streaming data pipelines on Google Cloud. Pub/Sub is described for handling incoming streaming data. The course also covers how to apply aggregations and transformations to streaming data using Dataflow, and how to store processed records to BigQuery or Cloud Bigtable for analysis. Learners will get hands-on experience building streaming data pipeline components on Google Cloud.

    What you will learn:
    • Understand use-cases for real-time streaming analytics.
    • Use Pub/Sub asynchronous messaging service to manage data events.
    • Write streaming pipelines and run transformations where necessary.
    • Interoperate Dataflow, BigQuery and Pub/Sub for real-time streaming and analysis.
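
    A minimal sketch of the streaming objectives above, assuming a hypothetical Pub/Sub topic that carries JSON events with user and value fields: Dataflow reads from Pub/Sub, aggregates over one-minute windows, and writes the results to BigQuery. Every resource name below is a placeholder.

        import json

        import apache_beam as beam
        from apache_beam.options.pipeline_options import PipelineOptions
        from apache_beam.transforms.window import FixedWindows

        # Streaming pipelines must set streaming=True; names below are placeholders.
        options = PipelineOptions(
            streaming=True,
            runner="DataflowRunner",
            project="my-project",
            region="us-central1",
            temp_location="gs://my-bucket/tmp",
        )

        with beam.Pipeline(options=options) as p:
            (
                p
                | "ReadPubSub" >> beam.io.ReadFromPubSub(
                    topic="projects/my-project/topics/events")
                | "Decode" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
                | "Window" >> beam.WindowInto(FixedWindows(60))  # 1-minute windows
                | "KeyByUser" >> beam.Map(lambda e: (e["user"], float(e["value"])))
                | "SumPerUser" >> beam.CombinePerKey(sum)
                | "Format" >> beam.Map(lambda kv: {"user": kv[0], "total": kv[1]})
                | "WriteBQ" >> beam.io.WriteToBigQuery(
                    "my-project:analytics.user_totals",
                    schema="user:STRING,total:FLOAT",
                    write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                )
            )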

    Estimated effort: 8 hours

    Language: Spanish & English

    Link: Coursera (Spanish)

    Link: Coursera (English)

    5) Smart Analytics, Machine Learning, and AI on GCP

    Incorporating machine learning into data pipelines increases the ability of businesses to extract insights from their data. This course covers several ways machine learning can be included in data pipelines on Google Cloud, depending on the level of customization required. For little to no customization, the course covers AutoML. For more tailored machine learning capabilities, it introduces Notebooks and BigQuery machine learning (BigQuery ML). The course also covers how to productionize machine learning solutions using Kubeflow. Learners get hands-on experience building machine learning models on Google Cloud.

    What you will learn:
    • Identify options for incorporating machine learning into data pipelines on Google Cloud, depending on the level of customization required.
    • Use AutoML for problems that need little to no customization.
    • Build more tailored models with Notebooks and BigQuery ML (see the sketch after this list).
    • Productionize machine learning solutions with Kubeflow.
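
    The sketch referenced in the list above shows the BigQuery ML path: a model is trained and then used for prediction entirely with SQL sent through the BigQuery client. The dataset, table, columns, and label below are hypothetical examples, not course materials.

        from google.cloud import bigquery

        client = bigquery.Client(project="my-project")  # placeholder project

        # Train a logistic regression model directly inside BigQuery (BigQuery ML).
        train_sql = """
            CREATE OR REPLACE MODEL `my-project.analytics.purchase_model`
            OPTIONS (model_type = 'logistic_reg', input_label_cols = ['purchased']) AS
            SELECT country, pageviews, purchased
            FROM `my-project.analytics.sessions`
        """
        client.query(train_sql).result()  # training runs as a query job

        # Batch prediction with the trained model, also expressed in SQL.
        predict_sql = """
            SELECT country, pageviews, predicted_purchased
            FROM ML.PREDICT(
                MODEL `my-project.analytics.purchase_model`,
                (SELECT country, pageviews FROM `my-project.analytics.sessions`))
        """
        for row in client.query(predict_sql).result():
            print(row.country, row.pageviews, row.predicted_purchased)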

    Estimated effort: 9 hours

    Language: Spanish & English

    Link: Coursera (Spanish)

    Link: Coursera (English)
