C# - Loop through multiple HTML tables in HTML Agility Pack
I followed the example in the link below and was able to parse an HTML table into a DataTable.
http://blog.ditran.net/parsing-html-table-to-c-usable-datalist/
However, I am not able to parse multiple tables. When I traverse the tr elements, the first tr of each table holds the column names and the rest hold the data. Using this logic, I store the table data in a dictionary (a list of key/value pairs) and send it to the ToDataTable function.
Can anyone advise on how I can loop through multiple tables and implement the same logic? I'd appreciate it.
var tRowList = doc.DocumentNode.SelectNodes("//tr");
foreach (HtmlNode tRow in tRowList)
{
    if (previousRowSpanList.Count > 0)
    {
        theDict = previousRowSpanList[0];
        previousRowSpanList.Remove(theDict); // remove it from the list
        isWorkingWithRowSpan = true;
    }
    else
    {
        theDict = new List<KeyValuePair<string, string>>();
        isWorkingWithRowSpan = false;
    }
    var tCellList = tRow.SelectNodes("td|th");
    tCelCount = tCellList.Count;
    if (tCelCount > 0 && !(tCelCount == 1 && string.IsNullOrEmpty(tCellList[0].InnerText.Trim())))
    {
        //colOrder = 1;
        isNullEntireRow = true;
        for (int colIndex = 0; colIndex < tCelCount; colIndex++)
        {
            cell = tCellList[colIndex];
            colInnerText = cell.InnerText.Replace("&nbsp;", " ").Trim();
            if (!string.IsNullOrEmpty(colInnerText))
                isNullEntireRow = false;
//
static DataTable ToDataTable(List<List<KeyValuePair<string, string>>> list)
{
    DataTable result = new DataTable();
    if (list.Count == 0)
        return result;
    // The first row holds the column names.
    result.Columns.AddRange(list.First().Select(r => new DataColumn(r.Value)).ToArray());
    // The remaining rows hold the data.
    list = list.Skip(1).ToArray().ToList();
    list.ForEach(r => result.Rows.Add(r.Select(c => c.Value).Cast<object>().ToArray()));
    return result;
}
Sample HTML:
<table>
<tbody>
<tr><td style="background-color:#a9f5a9;font-weight:bold;" class="center">node</td><td style="background-color:#a9f5a9;font-weight:bold;" class="center">logtime</td><td style="background-color:#a9f5a9;font-weight:bold;" class="center">hardware</td><td style="background-color:#a9f5a9;font-weight:bold;" class="center">prcstate a</td><td style="background-color:#a9f5a9;font-weight:bold;" class="center">prcstate b</td><td style="background-color:#a9f5a9;font-weight:bold;" class="center">cluster</td><td style="background-color:#a9f5a9;font-weight:bold;" class="center">raid</td><td style="background-color:#a9f5a9;font-weight:bold;" class="center">ad replication a</td><td style="background-color:#a9f5a9;font-weight:bold;" class="center">ad replication b</td><td style="background-color:#a9f5a9;font-weight:bold;" class="center">file replication a</td><td style="background-color:#a9f5a9;font-weight:bold;" class="center">file replication b</td><td style="background-color:#a9f5a9;font-weight:bold;" class="center">hcstart result</td></tr>
<tr><td class="center">dtmscb1</td><td class="center">2016-08-26 16:40</td><td class="center">apg43l</td><td class="center">active</td><td class="center">passive</td><td class="center">-</td><td class="center">-</td><td class="center">-</td><td class="center">-</td><td class="center">-</td><td class="center">-</td><td style="background-color:#ff0000;color:#ffffff;font-weight:bold;" class="center">not ok</td></tr>
<tr><td class="center">msc9</td><td class="center">2016-08-26 16:40</td><td class="center">apg40c/4</td><td class="center">passive</td><td class="center">active</td><td class="center">ok</td><td class="center">ok</td><td class="center">ok</td><td class="center">ok</td><td style="background-color:#ff0000;color:#ffffff;font-weight:bold;" class="center">not ok</td><td class="center">ok</td><td class="center">-</td></tr>
</tbody>
</table>
<table>
<tbody>
<tr><td style="background-color:#a9f5a9;" class="center">node type</td><td style="background-color:#a9f5a9;" class="center">node</td><td style="background-color:#a9f5a9;" class="center">log time</td><td style="background-color:#a9f5a9;" class="center">new mon. alarms</td><td style="background-color:#a9f5a9;" class="center">mon. alarms total</td><td style="background-color:#a9f5a9;" class="center">other alarms</td><td style="background-color:#a9f5a9;" class="center">mml</td></tr>
<tr><td class="center">bsc</td><td class="center">bmbsc1</td><td class="center">2016-08-26 16:45</td><td class="center">0</td><td style="background-color:#ff0000;color:#ffffff;font-weight:bold;" class="center">46</td><td class="center">445</td><td class="center">ok</td></tr>
<tr><td class="center">bsc</td><td class="center">bmbsc2c</td><td class="center">2016-08-26 16:45</td><td class="center">0</td><td style="background-color:#ff0000;color:#ffffff;font-weight:bold;" class="center">27</td><td class="center">609</td><td class="center">ok</td></tr>
<tr><td class="center">bsc</td><td class="center">cybsc1</td><td class="center">2016-08-26 16:45</td><td style="background-color:#ff0000;color:#ffffff;font-weight:bold;" class="center">1</td><td style="background-color:#ff0000;color:#ffffff;font-weight:bold;" class="center">45</td><td class="center">665</td><td class="center">ok</td></tr>
<tr><td class="center">bsc</td><td class="center">cybsc2c</td><td class="center">2016-08-26 16:45</td><td class="center">0</td><td style="background-color:#ff0000;color:#ffffff;font-weight:bold;" class="center">30</td><td class="center">849</td><td class="center">ok</td></tr>
<tr><td class="center">msc-bc</td><td class="center">cymscb1</td><td class="center">2016-08-26 16:45</td><td class="center">0</td><td style="background-color:#ff0000;color:#ffffff;font-weight:bold;" class="center">38</td><td class="center">283</td><td class="center">ok</td></tr>
<tr><td class="center">bsc</td><td class="center">dtbsc1</td><td class="center">2016-08-26 16:45</td><td class="center">0</td><td style="background-color:#ff0000;color:#ffffff;font-weight:bold;" class="center">48</td><td class="center">201</td><td class="center">ok</td></tr>
<tr><td class="center">bsc</td><td class="center">dtbsc2</td><td class="center">2016-08-26 16:45</td><td style="background-color:#ff0000;color:#ffffff;font-weight:bold;" class="center">1</td><td style="background-color:#ff0000;color:#ffffff;font-weight:bold;" class="center">31</td><td class="center">310</td><td class="center">ok</td></tr>
<tr><td class="center">msc-bc</td><td class="center">dtmscb1</td><td class="center">2016-08-26 16:45</td><td class="center">0</td><td style="background-color:#ff0000;color:#ffffff;font-weight:bold;" class="center">25</td><td class="center">130</td><td class="center">ok</td></tr>
<tr><td class="center">hlr</td><td class="center">hlr1</td><td class="center">2016-08-26 16:45</td><td class="center">0</td><td style="background-color:#ff0000;color:#ffffff;font-weight:bold;" class="center">16</td><td class="center">12</td><td class="center">ok</td></tr>
<tr><td class="center">hlr</td><td class="center">hlr2</td><td class="center">2016-08-26 16:45</td><td class="center">0</td><td style="background-color:#ff0000;color:#ffffff;font-weight:bold;" class="center">24</td><td class="center">10</td><td class="center">ok</td></tr>
<tr><td class="center">msc-s</td><td class="center">msc10</td><td class="center">2016-08-26 16:45</td><td class="center">0</td><td style="background-color:#ff0000;color:#ffffff;font-weight:bold;" class="center">48</td><td class="center">79</td><td class="center">ok</td></tr>
<tr><td class="center">msc-s</td><td class="center">msc9</td><td class="center">2016-08-26 16:45</td><td class="center">0</td><td style="background-color:#ff0000;color:#ffffff;font-weight:bold;" class="center">46</td><td class="center">131</td><td class="center">ok</td></tr>
</tbody>
</table>
I'll keep my first answer for reference. The method below splits the original HTML string into an array, with each string element containing the HTML of one table:
public static string[] ParseHtmlSplitTables(string htmlString)
{
    string[] result = new string[] { };
    if (!string.IsNullOrWhiteSpace(htmlString))
    {
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(htmlString);
        var tableNodes = doc.DocumentNode.SelectNodes("//table");
        // SelectNodes returns null when no table is found.
        if (tableNodes != null)
        {
            result = Array.ConvertAll<HtmlNode, string>(tableNodes.ToArray(), n => n.OuterHtml);
        }
    }
    return result;
}
With that result you can then parse each table:
string[] htmlTables = ParseHtmlSplitTables(htmlString);
foreach (string html in htmlTables)
{
    List<List<KeyValuePair<string, string>>> parseResult = ParseHtmlToDataTable(html);
    DataTable dataTable = ToDataTable(parseResult);
}
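As an alternative to splitting the document into strings and parsing each one again, the rows can probably be selected relative to each table node. The sketch below is only an illustration, not the method from the linked blog post: the class name MultiTableParser and the method ParseAllTables are hypothetical, and it assumes that the first row of every table holds unique column names, that tables are not nested, and that no cell uses rowspan or colspan.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using HtmlAgilityPack;

public static class MultiTableParser
{
    // Hypothetical helper (not from the original post):
    // builds one DataTable per <table> element in the HTML string.
    public static List<DataTable> ParseAllTables(string htmlString)
    {
        var tables = new List<DataTable>();
        if (string.IsNullOrWhiteSpace(htmlString))
            return tables;

        var doc = new HtmlDocument();
        doc.LoadHtml(htmlString);

        // SelectNodes returns null when nothing matches.
        var tableNodes = doc.DocumentNode.SelectNodes("//table");
        if (tableNodes == null)
            return tables;

        foreach (HtmlNode tableNode in tableNodes)
        {
            // ".//tr" is relative to the current table;
            // "//tr" would select the rows of the whole document again.
            var rowNodes = tableNode.SelectNodes(".//tr");
            if (rowNodes == null)
                continue;

            var table = new DataTable();
            bool isHeaderRow = true;

            foreach (HtmlNode rowNode in rowNodes)
            {
                var cellNodes = rowNode.SelectNodes("td|th");
                if (cellNodes == null)
                    continue;

                // Decode entities such as &nbsp; and trim the cell text.
                string[] values = cellNodes
                    .Select(c => HtmlEntity.DeEntitize(c.InnerText).Trim())
                    .ToArray();

                if (isHeaderRow)
                {
                    // Assumption: the header names are unique within a table.
                    foreach (string name in values)
                        table.Columns.Add(name);
                    isHeaderRow = false;
                }
                else
                {
                    // Ignore any cells beyond the number of header columns.
                    table.Rows.Add(values.Take(table.Columns.Count).Cast<object>().ToArray());
                }
            }

            tables.Add(table);
        }

        return tables;
    }
}
```

Calling MultiTableParser.ParseAllTables(htmlString) on the sample HTML should give a list with two DataTable objects, one per table, which can then be processed in a foreach loop. Tables that rely on rowspan would still need the row-span handling from the question's code.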