Connect your data: Upload a dataset

    After creating your hypothesis, it's time to think about connecting your data.

    Let's look at a typical data science workflow. It usually starts with a Hypothesis or Business Understanding stage, which sets the scope and frames the problem. We cover that stage in a separate article of this knowledge base.

    The main stages of a typical workflow are illustrated below, and in this article, we will focus on the Connect stage.

    [Figure: Workflow]

    Here are some things to think about during this stage:

    Where is your data normally stored?

    To get your data into Clarofy for analysis, we must first understand where the data is coming from and how we might ‘query’ or request it to get what we want. You might be using a query language like SQL, or querying your historian directly, to create a table.

    Here’s a list of things that can be helpful when writing your query.

    Specific Variables

    What are the variables of interest for your analysis? Your hypothesis should be in place before the analytical workflow begins, because it determines which variables you need.

    Location of data

    Is your data stored on a local PC, or perhaps in your organisation’s ‘Data Lake’ or database?

    Basic equations between variables

    Which of your variables do you want to relate to each other? You might want to add, subtract, or otherwise manipulate a variable from one location to get a more reflective analysis.

    Relationships between different tables in the database (join types)

    Some tables may need to be joined. What is the primary key that the join will be made on? What type of join will be most useful (left, inner, outer)?

    Time periods and other filters

    It’s helpful to have extra data, but too much will slow down your analysis. Use filters, such as a date range, so that you keep only the most relevant data.

    Aggregations/Summarisation

    This is particularly important when your data is very granular (minutely or 5-minutely), because that volume can cause delays. Look for opportunities to aggregate by time or by category.
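
    The considerations above can be combined in a single query. The following is a minimal sketch, not a Clarofy feature: it uses Python's built-in sqlite3 module with an in-memory database, and the table name `readings`, its columns, and the sample values are all hypothetical. It selects specific variables, relates two of them with a basic equation, filters to one day, and aggregates minutely readings to hourly averages.

```python
import sqlite3

# Hypothetical minutely readings: timestamp, feed rate (t/h), water flow (m3/h).
rows = [
    ("2022-04-01 08:00:00", 100.0, 40.0),
    ("2022-04-01 08:01:00", 110.0, 44.0),
    ("2022-04-01 09:00:00", 120.0, 60.0),
    ("2022-04-01 09:01:00", 80.0, 40.0),
    ("2022-04-02 08:00:00", 90.0, 45.0),  # excluded by the time filter below
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (ts TEXT, feed_tph REAL, water_m3h REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)

query = """
SELECT
    strftime('%Y-%m-%d %H:00', ts) AS hour_start,    -- aggregate by hour
    AVG(feed_tph)                  AS avg_feed_tph,  -- a specific variable
    AVG(water_m3h / feed_tph)      AS avg_water_per_tonne  -- a basic equation
FROM readings
WHERE ts >= '2022-04-01 00:00:00'                    -- time-period filter
  AND ts <  '2022-04-02 00:00:00'
GROUP BY hour_start
ORDER BY hour_start
"""
hourly = conn.execute(query).fetchall()
conn.close()

for hour_start, avg_feed, avg_ratio in hourly:
    print(hour_start, round(avg_feed, 2), round(avg_ratio, 3))
```

    Filtering and aggregating in the query itself means that only the hourly summary leaves the database, which keeps the file you upload small.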

     

    Once you have the data you need for your analysis, you might need to join different tables or combine them into a single table. A few ways to join data tables are outlined below.

    [Figure: Ways to join data tables]
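
    To make the difference between join types concrete, here is a small sketch using Python's built-in sqlite3 module. The tables `production` and `crews`, the key `shift_id`, and the values are hypothetical. An inner join keeps only the rows whose key appears in both tables, while a left join keeps every row of the first table and fills the gaps with empty values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE production (shift_id INTEGER PRIMARY KEY, tonnes REAL);
CREATE TABLE crews      (shift_id INTEGER PRIMARY KEY, crew TEXT);
INSERT INTO production VALUES (1, 500.0), (2, 450.0), (3, 520.0);
INSERT INTO crews      VALUES (1, 'A'), (2, 'B');  -- no crew recorded for shift 3
""")

# Same query text for both joins; only the join type changes.
template = """
SELECT p.shift_id, p.tonnes, c.crew
FROM production AS p
{join_type} JOIN crews AS c ON c.shift_id = p.shift_id
ORDER BY p.shift_id
"""

inner_rows = conn.execute(template.format(join_type="INNER")).fetchall()
left_rows = conn.execute(template.format(join_type="LEFT")).fetchall()
conn.close()

print(inner_rows)  # shift 3 is dropped: it has no match in crews
print(left_rows)   # shift 3 is kept, with crew set to None
```

    If dropping unmatched rows would hide missing data, a left join is usually the safer starting point, because the empty values show you where the gaps are.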

    When connecting to your data, other issues that may come up are:

    • Aggregations - Do you need to aggregate by time (e.g. make everything an hourly average) or by another category (e.g. take the maximum pump speed for each crew)?
    • Units - If you are working across different regions or locations, you're likely to encounter different units for the same measurement (e.g. t, kg, lb).
    • Data types - Calculations normally require numeric data. During download or connection, the data type (e.g. continuous, discrete, boolean, string, date) sometimes changes, so check that each column still has the type you expect.
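
    The three issues above can be checked before you upload a file. The following sketch uses only Python's standard library. The column names, the sample values, and the assumption that "t" means a metric tonne are all hypothetical. It converts text values to numbers, normalises mass to tonnes, and aggregates by crew.

```python
import csv
import io

# Hypothetical export: every value arrives as text and masses use mixed units.
raw = """crew,mass,unit,pump_speed_rpm
A,2.5,t,1450
A,1800,kg,1500
B,4409.24,lb,1480
B,n/a,t,1390
"""

# Conversion factors to metric tonnes (1 lb = 0.45359237 kg).
TO_TONNES = {"t": 1.0, "kg": 0.001, "lb": 0.00045359237}


def to_float(text):
    """Return text as a float, or None if it is not numeric."""
    try:
        return float(text)
    except ValueError:
        return None


total_tonnes = {}    # total mass per crew, in tonnes
max_pump_speed = {}  # maximum pump speed per crew, in rpm

for row in csv.DictReader(io.StringIO(raw)):
    crew = row["crew"]
    mass = to_float(row["mass"])
    speed = to_float(row["pump_speed_rpm"])
    if mass is not None:  # skip non-numeric entries such as "n/a"
        tonnes = mass * TO_TONNES[row["unit"]]
        total_tonnes[crew] = total_tonnes.get(crew, 0.0) + tonnes
    if speed is not None:
        max_pump_speed[crew] = max(max_pump_speed.get(crew, speed), speed)

print(total_tonnes)
print(max_pump_speed)
```

    Skipping non-numeric values silently is a design choice for this sketch. In a real check you may prefer to count or log them so that you know how much data was affected.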

    Clarofy can accept .xls, .xlsx, .csv and .txt files. We are currently working on different methods to input your data so stay tuned for updates!

    [Figure: Connect page]

    Previous Article - Business Understanding: Verify a Hypothesis

    Before we plunge into an analytics project, it's important to ask questions and discuss the 'business understanding' with all the major stakeholders.

    Read Previous Article

    Next Article - Prepare Your Data: What needs to be done before data is ready for analysis?

    Once your data is connected, it usually needs to be cleaned and manipulated before analysis.

    Read Next Article