
Best Practices for Managing DataSets

Version 1

 

Important: When referencing this page outside of Knowledge Base, use this link:  https://knowledge.domo.com?cid=bestpracticesdatasets

General Best Practices

  • Make sure that every DataSet you create or import has a name and a description with specific details about what that DataSet contains.

  • Refrain from including the name of the connector you’re using in the DataSet name, to avoid redundancy.

  • Don’t add date ranges to the DataSet name (for example, “Google Analytics 2016”). Most DataSets are set up on an automated schedule, so a dated name quickly goes out of date. Avoiding dates also saves you from having to rename the DataSet, and every other DataSet with a date in its name, later on.

  • For DataSet descriptions, write a basic description of what data is being pulled, such as spend, impressions, and so on. You can also include the update frequency in the description if that detail is important to typical users.

  • Always designate an owner of the DataSet. This person should be responsible for that data. If the person who sets up a DataSet does not transfer it to the owner responsible for maintaining it, then the creator (not the presumed owner) receives the alert when that DataSet breaks, while the presumed owner, who is tasked with fixing it, never hears about the failure.

  • Because data lineage is hard to trace in DataSets, add the input and output DataSets for the DataFlow to your DataFlow DataSet descriptions so future users can track down information relevant to the DataSet or DataFlow. In addition, include the run frequency of the DataFlow and whether it is automated or on a schedule. A filled-in example follows the suggested template below.

    • Suggested DataFlow DataSet Descriptions:

      • Input DataSets:

      • Output DataSets:

      • Run Frequency:
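
    • A filled-in description might look like the following (the DataSet names and schedule are illustrative, not taken from a real instance):

      • Input DataSets: RAW_Kablinko_Sizmek_Performance_Analytics

      • Output DataSets: PROD_Kablinko_Marketing_Performance_CALC

      • Run Frequency: Daily at 6:00 AM (automated)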

  • When creating DataFlows that contain a calculated field, add a prefix or a suffix such as “CALC” so future users know there is a calculated field in the DataFlow. Troubleshooting the DataFlow later is much easier when you are aware of the calculations.

  • DataSets should be named using the following template: TYPE_ClientInfo_Source_ReportName (for example, RAW_Kablinko_Sizmek_Performance_Analytics)

  • Recommended Naming Prefixes and What Each Means

    • RAW_: Raw data file that is pulled directly from a source. Transformations will be done on this data with Magic ETL or MySQL.

    • INT_: Intermediary DataSet for a DataFlow. Usually the output of a DataFlow that prepares the data to be merged with another source. Add the name of the related PROD DataSet to the description.

    • DEV_: Data is being audited. Change to PROD when audited.

    • PROD_: Production. Used for final DataSets. These are DataSets you can build cards on.

    • TEMP_: Used for test, development, and ad-hoc DataSets. These should be periodically audited to determine necessity.

  • Recommended Naming Suffix

    • CALC: Calculated fields that have been added through the ETL process.
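
  • Example of these conventions applied across a pipeline (the INT_ and PROD_ names below are illustrative extensions of the RAW_ example above):

    • RAW_Kablinko_Sizmek_Performance_Analytics: raw data pulled from the Sizmek connector.

    • INT_Kablinko_Sizmek_Performance_Analytics: DataFlow output prepared for merging with another source.

    • PROD_Kablinko_Marketing_Performance_CALC: final DataSet, with calculated fields, that cards are built on.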

  • Before you select the Share Calculation option when creating a Beast Mode, determine whether sharing is worthwhile. If the calculation is a common field many people will be using, it’s a good idea to share it. If it’s a one-time, ad hoc function, it is better not to share it, because it may end up causing problems for other users who don’t understand how it is supposed to be used in context.
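
A calculation worth sharing is typically a standard metric. As a minimal sketch (the field names `Spend` and `Impressions` are assumptions, not fields from this article), a shared CPM Beast Mode might look like this:

CASE WHEN SUM(`Impressions`) = 0 THEN 0
     ELSE SUM(`Spend`) / SUM(`Impressions`) * 1000
END

A one-off bucketing expression built for a single ad hoc card, by contrast, is usually better left unshared.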

Data Governance

  • If your company wants to create an extensive governance model, you have the option of adding comments to a Beast Mode calculation for a DataSet. These comments, which are essentially metadata within the Beast Mode calculation itself, can record the author of the calculation, its creation date, and a description of what it does. That way, other people can access this useful information when they are deciding whether to use the calculation in their cards or determining whom to talk to if they have questions about it.

    • Example Metadata for Beast Mode Identifying Author, Created Date, Brief Description

/* Author:
   Created Date:
   Description:
*/
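
A filled-in version might look like the following, placed above the calculation itself (the name, date, and metric are illustrative):

/* Author: A. Analyst
   Created Date: 2019-04-15
   Description: Cost per click; total spend divided by total clicks.
*/
SUM(`Spend`) / SUM(`Clicks`)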
 

  • If you are doing a test connection on a new DataSet, always set it to update manually instead of setting up an automatic feed. You don’t want a test or problem DataSet updating on a regular schedule.

  • Have the MajorDomo or another key stakeholder audit DataSets on a regular basis to check for redundant DataSets, DataSets that have zero cards, and DataSets with no DataFlows connected to them.

  • The MajorDomo can audit DataSets in the Data Center by sorting them by number of cards. If you find DataSets with zero cards connected to them, you can either delete them or reach out to the owner of the DataSets to determine why they are inside of Domo and, if necessary, ask the owner to delete them.

  • Unused DataFlows are a bit trickier to identify, but if a DataFlow hasn’t run in one to two months, that’s an indication that there is no schedule on it; you may be able to delete it, or investigate with the owner to find out what’s going on with it.

  • When setting up data governance, make individual assignments for each tool. For example, you could assign one person to audit Workbench, one to audit all of the social data, one to audit Magic ETL data, and so on. Having an owner for each data type ensures those resources are always being audited. This can help you uphold your DataSet requirements and decrease the number of issues that are brought to the Center of Excellence.

  • Have a process in place for the user to

    • Upload the data.

    • Validate that the data is correct, either in Workbench or another tool.

    • Validate the data again in Domo to make sure your numbers are appearing as expected.

    • Go through every step of the process to ensure your data is correct and you can build cards from it.

    • After you have validated the data, make sure the DataSet works as expected in Analyzer.

    • After the card is built, have your data owner validate the card, as there may be a dimension that the person who built the card missed with regard to filtering.

By having validation steps throughout the entire process, you make sure that the numbers are appearing as expected, that no details are missing from a complex data structure, and that there are no calculations that don’t make sense.
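
For the validation steps inside Domo, one lightweight option is an aggregate sanity check in a MySQL DataFlow. The sketch below assumes a hypothetical input table and columns; the idea is to compare these totals against the source system:

SELECT
  COUNT(*)     AS row_count,   -- compare with the source's row count
  SUM(`Spend`) AS total_spend, -- compare with the source's spend total
  MIN(`Date`)  AS first_date,  -- confirm the expected date range loaded
  MAX(`Date`)  AS last_date
FROM raw_kablinko_sizmek_performance_analytics;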

  • Have those who audit data always check for errors and run failures. Make sure these people have credentials and access to the relevant accounts if they are not the credential owner.

  • It is good practice to audit the Data Center once a month to make sure all the important DataFlows and DataSets are running. Also, confirm that all credentials are working and re-authenticate any that are out of date.