ADTA 5240.

Team – B: [Every document and screenshot should be named as ‘Team-B’]

Overview:

This week you will work with your group on your final project.

Objectives

  • Apply concepts learned about the Hadoop ecosystem
  • Apply concepts learned in data preparation to preprocess data
  • Construct an external table using basic SQL commands in BigQuery
  • Develop queries in BigQuery
  • Construct a well-defined schema using basic HQL commands in Hive
  • Develop queries in Hive
  • Develop queries in Spark

Instructions

  • Each group will research their assigned use case. They will select a static dataset and a streaming data source from the approved list provided, or locate another and obtain the instructors’ approval.
  • Each group will create an executive summary. This summary should be between 400 and 550 words, not including the title page, references, or other supporting documents. It should read like a summary of your presentation: introducing the use case project, stepping through the data lifecycle, identifying the tools and applications used during each phase of the data lifecycle, and concluding with the next steps for the data science or analyst teams. The executive summary should be formatted in Times New Roman, 12-point, with one-inch margins.
  • Each group will create a document with screenshots that covers creating the project and storage for their use case in GCP, setting up their Hadoop ecosystem, processing their static and streaming data, and running queries in BigQuery, Hive, and Spark to ensure the quality of the data for the data science or analyst teams. At each step, the team will take screenshots of their work and present them in a Word document with brief explanations. Each description should state the application used, the task performed, and why it was performed. Do not include how-to instructions.
  • Each group will create a presentation that tells a story using the data lifecycle as a guide, and they will present their work during the designated time. You may be creative with the PowerPoint presentation, but it should be a professional business presentation. Each member of the group should speak. After the presentation, the group will take questions from the audience. The presentation should be 10 to 15 minutes in length.

Meeting_Notes_Template:

I have provided ‘Meeting_Notes_Template.docx’. Please fill in the provided template.

Approved Data Sources:

I have provided ‘Approved Data Sources.pdf’. Please select two datasets from the approved data sources listed for any of the use cases in the PDF.

PPT and Word:

  • Topic: Use cases from the discussion post
  • Data: Use approved data sources (two or more)
  • Executive Summary (25%): This paper should be between 400 and 550 words, not including the title page, code, and references.
  • Screenshots (25%): These screenshots should show how you applied what you learned. Create a new project in GCP for this use case.
  • Presentation (50%): The group will present.
  • Grading: This project is worth 20% of your final course grade. The executive summary makes up 25% of the project grade, the screenshots 25%, and the presentation 50%.
  • Document Type: Word and PPT

Executive Summary Requirements:

  • 400 to 550 words, not counting the title page, references, or supporting documents
  • Title page: Organization Name, Logo, Use case, group number, and group members
  • Introduction: Introduce the use case and its purpose (Example: Data Engineering Request)
  • Body: Step through the data lifecycle with your use case and the tasks you did
  • Conclusion: Summarize and discuss the next steps for the data science and analyst teams
  • Double-spaced
  • Word Document
  • References

Application Screenshots Requirements:

  • GCP project & storage
  • Hadoop
  • OpenRefine
  • BigQuery
  • Hive
  • Spark
  • Include an explanation (3-10 sentences) with the screenshots stating the application used and the task performed.

Supporting Documents:

  • Reference page
  • Meeting notes or Task board
  • Data Sheet – List of data sources and any additional information, such as the website address
  • Other documents
  • Word Document

Meeting Notes Template:

  • Date:
  • Start and End time:
  • Attendees:
  • Note-taker:
  • Notes:
  • Decisions:
  • Action Items:

Task board:

  • Create a Task board using MS Teams – Planner, Excel, or Word
  • Task board Columns: To-Do, In Progress, Review, Done
  • Task Info: Description, Owner, Due Date

Presentation Requirements:

  • Business casual
  • TELL A STORY
  • Every group member must present
  • 10-15 minutes to present
  • 2 minutes for questions
  • 10-20 PowerPoint Slides
  • Title page: Organization Name, Logo, Use case, group number, and group members
  • Outline or Agenda
  • Every step of the data lifecycle – No definitions
  • Hive and Spark SQL comparison chart
  • A few of your screenshots (No more than 5)
  • Cite the source on the slide if not your own words