Data Processing Scenarios (ETL) - Your Feedback Matters

As you may already know, we are evaluating the importance of ETL data processing among our users and considering ways to improve user experiences for common ETL (extract, transform, load) processes within .NET apps. To help us finalize our plans, please take a moment to review this post and share your feedback with us.

Important Note: This blog post is conceptual in nature - we have yet to produce a working ETL prototype. Accordingly, all screenshots and code snippets are mere mockups.

Common ETL Scenarios

Please estimate the relevance of the following TOP 3 user stories – user stories that we selected after analyzing hundreds of user responses in our initial survey.

User Story #1: Aggregate Data from Multiple Data Sources Using a Web Service (Web App)

Scenario/Need

A Web developer creates a backend that prepares data in a format required by the client JavaScript framework. Input data is collected from multiple web services in JSON format. Once collected, data must be consolidated and grouped.

Our ETL Solution Ideas

  1. Create and deploy an ETL model in code.
    You should use our ETL API to write the model in code and pass it to EtlExecutor.

Pseudo-code:

[ApiController]
[Route("api/[controller]")]
public class OrdersController : ControllerBase {
    [HttpGet]
    public IEnumerable<object> GetOrders() {

        var sqlNode1 = new SqlNode() {
            ConnectionString = "localdatabaseconnectionstring",
            Sql = "select * from employees"
        };
        var calculatedColumn = new CalculatedColumn(sqlNode1) {
            ColumnName = "FullName",
            Expression = "[FirstName] + [SecondName]"
        };

        var jsonNode2 = new JsonNode() {
            Uri = "https://sample.com/api/orders"
        };
        var parseDate = new ParseDateNode(jsonNode2) {
            ColumnName = "Date",
            Format = "yyyy-MM-dd HH:mm"
        };
        var filterNode = new FilterNode(parseDate) {
            FilterString = "IsToday([Date])"
        };
        var joinNode = new JoinNode(calculatedColumn, filterNode) {
            SourceColumn = "id",
            TargetColumn = "employeeId"
        };
        var groupByNode = new GroupByNode(joinNode) {
            GroupBy = new string[] { "FullName" },
            SummaryList = new SummaryInfo[] {
                new SummaryInfo() { Column = "Price", Summary = "Sum", NewColumnName = "TotalSales"},
                new SummaryInfo() { Column = "Price", Summary = "Max", NewColumnName = "MaxSales"}
            }
        };
        var renameColumn = new RenameColumn(groupByNode) {
            OldName = "FullName",
            NewName = "Employee name"
        };
        var outputJsonNode = new OutputJsonNode(renameColumn) {
            Indent = false
        };
        var executor = new EtlExecutor(Mode.Debug);
        return executor.Run(outputJsonNode);
    }
}
  2. Create and deploy an ETL model via a visual editor.

You can use the ‘Add New Item’ dialog in Visual Studio to create the new ETL model (“DevExpress ETL”) and customize it with a visual editor. See “Workflow Mockup” in the Technical Design section for screenshots.

You can then attach a pre-built ETL model to the EtlExecutor class in a controller class.

Pseudo-code:

[ApiController]
[Route("api/[controller]")]
public class OrdersController : ControllerBase {
    [HttpGet]
    public IEnumerable<object> GetOrders() {
        var executor = new EtlExecutor(Mode.Debug);
        return executor.Run("./Model/orders.dxetl");
    }
}

User Story #2: Pre-process Data from External Databases/File Sources and Analyze Information via UI Components (Desktop App)

Scenario/Need

A desktop application developer implements an analytics module in a WinForms application. Source data is loaded from a SQL database and Excel files. Data must be consolidated and pre-processed before it is passed to the UI component (such as a Data Grid control).

Our ETL Solution Ideas

  1. Use the ETL API to write the model in code.

Pseudo-code:

private DataGridView dataGridView1 = new DataGridView();
...
var sqlNode1 = new SqlNode() {
    ConnectionString = "localdatabaseconnectionstring",
    Sql = "select * from employees"
};

var jsonNode2 = new JsonNode() {
    Uri = "https://sample.com/api/orders"
};
var filterNode = new FilterNode(jsonNode2) {
    FilterString = "IsToday([Date])"
};
var joinNode = new JoinNode(sqlNode1, filterNode) {
    SourceColumn = "id",
    TargetColumn = "employeeId"
};
var groupByNode = new GroupByNode(joinNode) {
    GroupBy = new string[] { "FullName" },
    SummaryList = new SummaryInfo[] {
        new SummaryInfo() { Column = "Price", Summary = "Sum", NewColumnName = "TotalSales"},
        new SummaryInfo() { Column = "Price", Summary = "Max", NewColumnName = "MaxSales"}
    }
};

dataGridView1.DataSource = new ETLDataSource(groupByNode);
  2. Create and deploy an ETL model via a visual editor.

You can use the ‘Add New Item’ dialog in Visual Studio to create the new ETL model (“DevExpress ETL”) and customize it with a visual editor. See “Workflow Mockup” in the Technical Design section for screenshots.

You can then attach a pre-built ETL model to the Grid control using the ETL Data Source from the Create a New Data Source dialog.
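If you prefer to keep the binding in code, the ETLDataSource class from the first snippet could presumably load a pre-built model file as well. The following is a mockup under that assumption — the file path and the string-accepting constructor are hypothetical, mirroring EtlExecutor.Run("./Model/orders.dxetl") from User Story #1:

Pseudo-code:

```csharp
// Mockup only: assumes ETLDataSource can load a .dxetl model file directly.
dataGridView1.DataSource = new ETLDataSource("./Model/orders.dxetl");
```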

User Story #3: Set up a Data Warehouse with Transformation Steps in Code

Scenario/Need

A developer creates an application that collects data from multiple data sources, transforms data (join, group, etc) and sends the resulting dataset to a database or a Warehouse.

Our ETL Solution Idea

  1. Create an ETL model in code.

Pseudo-code:

var sqlNode1 = new SqlNode() {
    ConnectionString = "localdatabaseconnectionstring",
    Sql = "select * from employees"
};
var calculatedColumn = new CalculatedColumn(sqlNode1) {
    ColumnName = "FullName",
    Expression = "[FirstName] + [SecondName]"
};

var jsonNode2 = new JsonNode() {
    Uri = "https://sample.com/api/orders"
};
var parseDate = new ParseDateNode(jsonNode2) {
    ColumnName = "Date",
    Format = "yyyy-MM-dd HH:mm"
};
var filterNode = new FilterNode(parseDate) {
    FilterString = "IsToday([Date])"
};
var joinNode = new JoinNode(calculatedColumn, filterNode) {
    SourceColumn = "id",
    TargetColumn = "employeeId"
};
var groupByNode = new GroupByNode(joinNode) {
    GroupBy = new string[] { "FullName" },
    SummaryList = new SummaryInfo[] {
        new SummaryInfo() { Column = "Price", Summary = "Sum", NewColumnName = "TotalSales"},
        new SummaryInfo() { Column = "Price", Summary = "Max", NewColumnName = "MaxSales"}
    }
};
var renameColumn = new RenameColumn(groupByNode) {
    OldName = "FullName",
    NewName = "Employee name"
};
var outputJsonNode = new OutputJsonNode(renameColumn) {
    Indent = false
};
var executor = new EtlExecutor(Mode.Debug);
var jsonString = executor.Run(outputJsonNode);
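User Story #3 says the result is sent to a database or warehouse, yet the snippet above ends by serializing to JSON. As a sketch of what the Load step might look like, the tail of the model could swap OutputJsonNode for a SQL sink — OutputSqlNode, LoadMode, and their properties are assumptions, not a confirmed API:

Pseudo-code:

```csharp
// Mockup only: replaces the OutputJsonNode tail above with a hypothetical
// database/warehouse sink for the Load step.
var outputSqlNode = new OutputSqlNode(renameColumn) {
    ConnectionString = "warehouseconnectionstring",
    TargetTable = "SalesByEmployee",
    LoadMode = LoadMode.Merge // complete, incremental, or merge (update/insert)
};
new EtlExecutor(Mode.Debug).Run(outputSqlNode);
```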

Options: Fine-tune an existing transformation function/parameter or create a custom transformation.

You can then create fully custom transformation steps:

class CustomStep : IEtlNode {
    public IDataVectors Eval(IDataVectors dataFlow, CustomStepOptions options) {
        // Append a new column computed from the target column.
        dataFlow.Add(options.NewColumnName, CustomTransformation(dataFlow[options.TargetColumn]));
        return dataFlow;
    }
}
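A custom node would presumably be wired into a pipeline the same way as the built-in nodes. The snippet below is a mockup under that assumption — the upstream-node constructor and the shape of CustomStepOptions are not defined above and are hypothetical:

Pseudo-code:

```csharp
// Mockup only: assumes custom nodes accept an upstream node plus options,
// like the built-in transformation nodes.
var customStep = new CustomStep(parseDate) {
    Options = new CustomStepOptions {
        TargetColumn = "Price",
        NewColumnName = "AdjustedPrice"
    }
};
```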

You can also customize existing transformation steps:

public class CustomSummary : ICustomSummary {
    decimal accumulator = decimal.MinValue; // start below any value so the max can be tracked
    public void AddValue(decimal value) {
        if(accumulator < value) {
            accumulator = value;
        }
    }
    public decimal GetResult() {
        return accumulator;
    }
}
var groupByNode = new GroupByNode(joinNode) {
    GroupBy = new string[] { "FullName" },
    SummaryList = new SummaryInfo[] {
        new SummaryInfo() { Column = "Price", Summary = new CustomSummary(), NewColumnName = "MaxSales" }
    }
};
  1. Place a breakpoint near the required operation and run the application in debug mode to examine the current variable state. Optional: examine the model’s diagram and execute specific operations (steps).

  2. Access any application’s .NET entity when debugging the ETL model. To apply changes to the ETL model, rebuild your application (no in-place modifications in debug mode).

Technical Design of Our ETL Solutions (Survey)

Based on your data processing/business requirements, please tell us what you consider most relevant (you can also leave a comment if necessary):

Workflow Mockup - Create and deploy an ETL Model


  1. Use the Add New Item dialog in Visual Studio to add a new ETL model (“DevExpress ETL”).
  2. Define the ETL model via a visual editor.

· Add the first step (in this example, add a Web Service data source)

· Add more steps based on resulting data schema


  3. Write the EtlExecutor code in a controller / bind a component to the data source.

Free DevExpress Products – Get Your Copy Today

The following free DevExpress product offers remain available. Should you have any questions about the free offers below, please submit a ticket via the DevExpress Support Center at your convenience. We’ll be happy to follow up.
Customer141807
This is a magnificent component!! We are very excited about this feature!!
10 September 2021
Stefano Paparesta
Fantastic !!!!
10 September 2021
Jonatas Hudler

Very interesting and promising feature indeed!

In our case, we mostly need ETL to pair with the Reporting and Dashboard (especially!) suites. That means the ETL process will be defined at runtime, since each customer has their own requirements/needs.

So I was wondering if this could also be deployed and used by our end users?

In the real world, a designer is good to sell the idea, but when complexity kicks in, a DSL / scripting language would be preferable. If there were a way to make a simple scripting language that would convert to the equivalent code, that would be a dream.

Something like this (just a stub):


    PARAMETER StartDate = {LocalDateTimeThisMonth()} // Some Criteria expression
    PARAMETER EndDate = {AddDays(AddMonth(LocalDateTimeThisMonth, 1), -1)}

    DATASOURCE SomeSqlSource1 = SQLSERVER 'connection_string'
    DATASOURCE SomeSqlSource2 = ORACLE 'connection_string'
    DATASOURCE SomeHttpSource = JSON 'http://some.service.com/resource'

    DATASOURCE JoinSource = LEFT JOIN SomeSqlSource1.{"FieldA1", "FieldB1"} SomeSqlSource2.{"FieldA2", "FieldB2"}
    DATASOURCE JoinSource2 = LEFT JOIN JoinSource.{"FieldA1", "FieldB1"} SomeHttpSource.{"FieldA3", "FieldB3"}

    FILTER JoinSource2 = {[FieldDateTime] >= ?StartDate And [FieldDateTime] <= ?EndDate}

    RETURN {JoinSource2}
10 September 2021
Hedi Guizani
I wanted to share my experience and reflections about a DevExpress ETL engine.

*My Current Approach and Reflections
1. To be able to get data from different kinds of data sources, I have used "CData .NET connectors" [https://www.cdata.com/ado/].
2. I had to make these connectors available for XPO via custom XPO providers. [This was a pain: the documentation is poor. Suggestion: a standard XPO provider for SQL-92 compliant databases, or at least a very detailed example project.]
3. For the visual part, I have used your great but not perfect Dashboard designer:
- Configure the data sources.
- Use the QueryDesigner to federate data.
- Create a GridItem/PivotItem for further manipulation of the data output.
- Read and export data from the GridItem/PivotItem.
**Challenges:
- The Dashboard API does not have strong data transformation tools.
- The Dashboard API is not platform independent.
- Performance issues (mainly because reading and exporting data from the GridItem/PivotItem requires visual components to be created).

*Recommendations:

[The Extract step]
- I don't think the main goal of DevExpress is to provide a provider for every data source out there; other specialized companies have already done it, and done it right: "CData .NET connectors" [https://www.cdata.com/ado/]. Many BI software companies use them (Tableau, MS Power BI...).
- The creation of custom XPO providers should be less painful so that we can easily integrate any data source.
- The federation data source should be better than it is today (https://docs.devexpress.com/XtraReports/400917/detailed-guide-to-devexpress-reporting/bind-reports-to-data/data-federation).
- I think the QueryDesigner is a great tool to query data, but it can be largely improved (SQL editing...) (https://www.activequerybuilder.com/ is a better component for this task).
- SQL is still the easiest way to get data.
- The SqlDataSource (https://docs.devexpress.com/CoreLibraries/DevExpress.DataAccess.Sql.SqlDataSource) should be agnostic and able to do data federation between other sources (much like the data source engine in dashboards or reports).
- In my experience, ETL uses many parameters, so an "end user data entry form designer" for entering parameter values has become essential (similar to reports or dashboards, but hopefully better [they are not very end-user friendly]).

[The Transform step]
- The DevExpress Expression/Criteria engine is a killer feature and very helpful, especially for runtime configuration.
- Please keep improving the language as well as the criteria and expression editors.
- I also think the criteria engine (or a derived one) should evolve into a transformation (procedural) language, able to execute actions like removing the first line, executing loops, and so on.
A great example is IrAuthor from InRule (https://inrule.com/platform-overview/author-test/) (https://www.youtube.com/watch?v=J2x15gUHcbI&t=1s).
Power BI also has a great transformation language: (https://docs.microsoft.com/en-us/power-bi/transform-model/desktop-query-overview).

- A workflow engine may help a lot here, especially since the deprecation of the MS workflow engine is preventing many XAF developers (using the workflow module) from moving to .NET 5+ and Blazor.
- A background execution engine should be easy to integrate with the ETL engine (async tasks, parallel tasks, batched tasks, scheduled <CRON> tasks); Hangfire and Hangfire Pro are a perfect example.

[The Load step]
- ETL systems are usually used in import/export data scenarios, business intelligence, or data synchronization operations.
- It would be really great if DevExpress provided a standard API for data synchronization between the ETL API and XPO.
- Real-time data sources (MQTT, OPC UA, AMQP...) are a must-have in some of today's business solutions, especially those involving real-time monitoring (IoT, energy consumption...).
- For BI, the dashboard engine (keep up the great work) needs some improvement on the platform-agnostic side: on the data side [MultiDimensionalData object], and on the visual side [agnostic dashboard custom items...].

*Perspectives:
A well-designed ETL engine can easily be transformed and used as a business rule engine (decision management framework).
[https://inrule.com/platform-overview/] [https://www.drools.org/]

Thank you for the great work

11 September 2021
JeePee [NL]
Have a look at https://www.altova.com/mapforce how it can be shaped...
13 September 2021
Alex (DevExpress Support)

@Everyone: thank you so much for your great feedback. Your thoughts are especially useful at this early stage of development.


@Jonatas, 

An ETL tool for end-users represents a natural evolution of this product – something that can be integrated into our existing solutions such as BI Dashboard or Reports. We have not yet determined what would work best for actual end-users (visual designer or scripting).

13 September 2021
Baldur Fürchau

If you use a data warehouse (DWH) to simplify the ETL, many things become easy:

The ETL is defined once for each destination table (Sales, Orders, ...) and loads the data into the DWH. Transforms can mostly be done with SQL or a scripting extension.
The loading can then be done as often as needed: complete, incremental, or merged (update or insert). These are IT tasks, because that is where the know-how about the data lives.

DevExpress components then use only one data source, the data warehouse.
ETL for end users (inexperienced with SQL and performance considerations) isn't necessary.

Here you can use all the filters/grouping/computing needed for the dashboards, and, really, I don't miss ETL specifics here. What I need are more extensions for data consumption and analysis.

16 September 2021
Anton Kolesnik 2

Hi; Azure Data Factory and PowerBI/PowerQuery cover most of these options in the cloud, and SSIS on premises. Not sure what solution you are planning to deliver, but there is significant competition :)

28 September 2021
Baldur Fürchau

I think the ETL itself is not the point of interest (for us); the point is the usage scenarios for frontends.
You have a great toolbox like the Dashboard (Web/Forms/...), which is very useful.
You integrate your standard controls into the dashboard, but with less functionality than the original controls.
Our need is for the full power of your controls to be usable within a dashboard:
- Gantt, Diagram, Sankey, Sunburst, Tree also on the web.
- More configuration options for the chart.
- "Action binding" between widgets (e.g. parallel drilldown).
- A simple way to rearrange drilldown nodes across widgets without the designer.
- 3D charting (the native dxChart does it).

And so on.
I'm sorry, but ETL processing is the smallest part of data analysis and data presentation. ETL is done by us within minutes, but it takes hours and days to design the result in a usable and understandable way.

We have made 14 dashboard extensions so far to reach standard behaviour in the underlying controls. I have a lot of customer wishes to resolve.

But I have no additional wishes for our ETL process ;-).
The end users (our customers) don't use Visual Studio.
Our end users are non-programmers: controllers, logistics/warehouse leaders, product managers, CEOs, and so on.

28 September 2021
Paul Fuller

As others have commented, there are very rich and powerful tools in the DWH / ETL space. However, I think there is a very good potential space for DX to fill: the ingestion of data through code- and metadata-driven processing pipelines, with either a target DBMS/files or DX UI components as the destination. There are many cases where there are gaps to fill, or where specialized tools need to be built instead of using the dominant ETL tools. C#, LINQ, and DX are my preferred combination of technologies, but other components like Linq2Excel, FileHelpers, and Newtonsoft are needed at present.

I'm very supportive of DX getting further into this area.

28 September 2021
Peter Niederberghaus
Just to make sure I understand your sample code: I'm missing the processing of sqlNode1 here:

var sqlNode1 = new SqlNode() {
ConnectionString = "localdatabaseconnectionstring",
Sql = "select * from employees"
};
var calculatedColumn = new CalculatedColumn() {
ColumnName = "FullName",
Expression = "[FirstName] + [SecondName]"
};


Should sqlNode1 be an argument of CalculatedColumn()?

var calculatedColumn = new CalculatedColumn(sqlNode1) {
 
28 September 2021
Marcio Matos
For me, having a web dashboard interface for the end user to do DATA FEDERATION is a much higher priority. The federation that exists today, carried out in code, does not serve the end user, who often only needs to join or relate a simple Excel spreadsheet to an existing SQL query. It's a shame this subject hasn't evolved. Today the end user can only use one data source in a dashboard item.

Another point that I consider most relevant is to improve the Query Builder inside the panel: add the ability to collapse and expand the chosen tables as a way to organize the model. Currently, if you include many tables, it is difficult to relate them.
29 September 2021
Alex (DevExpress Support)
Once again, I'd like to thank everyone for your feedback; it helps us focus on the most important use cases.

@Peter: yes, exactly. Thank you for the comment, I will fix the code snippet.

@Marcio: good point. We plan to enhance Query Builder and Data Federation in 2022 updates.

1 October 2021
Henrik Werkström
+1 @Marcio

Also, the ability to create subqueries to join on would be awesome and would solve many everyday issues we encounter, e.g. adding the same table to a query many times with specific filters on each instance.
4 October 2021
Alexander Krakhotko (Xafari team)
Hi!
It would be nice to provide compatibility with an existing DSL, for example the Power Query M formula language.
8 November 2021
RaffaeleZurlo
With which DevExpress component will the ETL model be installed?
19 November 2021
NHIF

If this could include something like Debezium to enable event streaming from databases, it would revolutionize data processing in the .NET world.
27 December 2021
