C# - Loop through multiple HTML tables in HTML Agility Pack


I followed the example in the link below and was able to parse an HTML table into a DataTable.

http://blog.ditran.net/parsing-html-table-to-c-usable-datalist/

However, I am not able to parse multiple tables. When I traverse the tr elements, the first tr in each table holds the column names and the rest hold the data. Using that logic, I store the table data in a dictionary and send it to the ToDataTable function.

Can anyone advise on how I can loop through multiple tables and implement the same logic? I would appreciate it.

    var tRowList = doc.DocumentNode.SelectNodes("//tr");
    foreach (HtmlNode tRow in tRowList)
    {
        if (previousRowSpanList.Count > 0)
        {
            theDict = previousRowSpanList[0];
            previousRowSpanList.Remove(theDict);        // remove it from the list
            isWorkingWithRowSpan = true;
        }
        else
        {
            theDict = new List<KeyValuePair<string, string>>();
            isWorkingWithRowSpan = false;
        }
        var tCellList = tRow.SelectNodes("td|th");
        tCelCount = tCellList.Count;
        if (tCelCount > 0 &&
            !(tCelCount == 1 && string.IsNullOrEmpty(tCellList[0].InnerText.Trim())))
        {
            //colOrder = 1;
            isNullEntireRow = true;
            for (int colIndex = 0; colIndex < tCelCount; colIndex++)
            {
                cell = tCellList[colIndex];
                colInnerText = cell.InnerText.Replace("&nbsp;", " ").Trim();
                if (!string.IsNullOrEmpty(colInnerText))
                    isNullEntireRow = false;

//

    static DataTable ToDataTable(List<List<KeyValuePair<string, string>>> list)
    {
        DataTable result = new DataTable();
        if (list.Count == 0)
            return result;

        // The first row supplies the column names.
        result.Columns.AddRange(
            list.First().Select(r => new DataColumn(r.Value)).ToArray()
        );

        // The remaining rows supply the data.
        list = list.Skip(1).ToArray().ToList();
        list.ForEach(r => result.Rows.Add(r.Select(c => c.Value).Cast<object>().ToArray()));

        return result;
    }

Sample HTML:

<table> <tbody> <tr><td style="background-color:#a9f5a9;font-weight:bold;" class="center">node</td><td style="background-color:#a9f5a9;font-weight:bold;" class="center">logtime</td><td style="background-color:#a9f5a9;font-weight:bold;" class="center">hardware</td><td style="background-color:#a9f5a9;font-weight:bold;" class="center">prcstate a</td><td style="background-color:#a9f5a9;font-weight:bold;" class="center">prcstate b</td><td style="background-color:#a9f5a9;font-weight:bold;" class="center">cluster</td><td style="background-color:#a9f5a9;font-weight:bold;" class="center">raid</td><td style="background-color:#a9f5a9;font-weight:bold;" class="center">ad replication a</td><td style="background-color:#a9f5a9;font-weight:bold;" class="center">ad replication b</td><td style="background-color:#a9f5a9;font-weight:bold;" class="center">file replication a</td><td style="background-color:#a9f5a9;font-weight:bold;" class="center">file replication b</td><td style="background-color:#a9f5a9;font-weight:bold;" class="center">hcstart result</td></tr> <tr><td class="center">dtmscb1</td><td class="center">2016-08-26 16:40</td><td class="center">apg43l</td><td class="center">active</td><td class="center">passive</td><td class="center">-</td><td class="center">-</td><td class="center">-</td><td class="center">-</td><td class="center">-</td><td class="center">-</td><td style="background-color:#ff0000;color:#ffffff;font-weight:bold;" class="center">not ok</td></tr> <tr><td class="center">msc9</td><td class="center">2016-08-26 16:40</td><td class="center">apg40c/4</td><td class="center">passive</td><td class="center">active</td><td class="center">ok</td><td class="center">ok</td><td class="center">ok</td><td class="center">ok</td><td style="background-color:#ff0000;color:#ffffff;font-weight:bold;" class="center">not ok</td><td class="center">ok</td><td class="center">-</td></tr> </tbody> </table>   <table> <tbody> <tr><td style="background-color:#a9f5a9;" class="center">node 
type</td><td style="background-color:#a9f5a9;" class="center">node</td><td style="background-color:#a9f5a9;" class="center">log time</td><td style="background-color:#a9f5a9;" class="center">new mon. alarms</td><td style="background-color:#a9f5a9;" class="center">mon. alarms total</td><td style="background-color:#a9f5a9;" class="center">other alarms</td><td style="background-color:#a9f5a9;" class="center">mml</td></tr> <tr><td class="center">bsc</td><td class="center">bmbsc1</td><td class="center">2016-08-26 16:45</td><td class="center">0</td><td style="background-color:#ff0000;color:#ffffff;font-weight:bold;" class="center">46</td><td class="center">445</td><td class="center">ok</td></tr> <tr><td class="center">bsc</td><td class="center">bmbsc2c</td><td class="center">2016-08-26 16:45</td><td class="center">0</td><td style="background-color:#ff0000;color:#ffffff;font-weight:bold;" class="center">27</td><td class="center">609</td><td class="center">ok</td></tr> <tr><td class="center">bsc</td><td class="center">cybsc1</td><td class="center">2016-08-26 16:45</td><td style="background-color:#ff0000;color:#ffffff;font-weight:bold;" class="center">1</td><td style="background-color:#ff0000;color:#ffffff;font-weight:bold;" class="center">45</td><td class="center">665</td><td class="center">ok</td></tr> <tr><td class="center">bsc</td><td class="center">cybsc2c</td><td class="center">2016-08-26 16:45</td><td class="center">0</td><td style="background-color:#ff0000;color:#ffffff;font-weight:bold;" class="center">30</td><td class="center">849</td><td class="center">ok</td></tr> <tr><td class="center">msc-bc</td><td class="center">cymscb1</td><td class="center">2016-08-26 16:45</td><td class="center">0</td><td style="background-color:#ff0000;color:#ffffff;font-weight:bold;" class="center">38</td><td class="center">283</td><td class="center">ok</td></tr> <tr><td class="center">bsc</td><td class="center">dtbsc1</td><td class="center">2016-08-26 16:45</td><td 
class="center">0</td><td style="background-color:#ff0000;color:#ffffff;font-weight:bold;" class="center">48</td><td class="center">201</td><td class="center">ok</td></tr> <tr><td class="center">bsc</td><td class="center">dtbsc2</td><td class="center">2016-08-26 16:45</td><td style="background-color:#ff0000;color:#ffffff;font-weight:bold;" class="center">1</td><td style="background-color:#ff0000;color:#ffffff;font-weight:bold;" class="center">31</td><td class="center">310</td><td class="center">ok</td></tr> <tr><td class="center">msc-bc</td><td class="center">dtmscb1</td><td class="center">2016-08-26 16:45</td><td class="center">0</td><td style="background-color:#ff0000;color:#ffffff;font-weight:bold;" class="center">25</td><td class="center">130</td><td class="center">ok</td></tr> <tr><td class="center">hlr</td><td class="center">hlr1</td><td class="center">2016-08-26 16:45</td><td class="center">0</td><td style="background-color:#ff0000;color:#ffffff;font-weight:bold;" class="center">16</td><td class="center">12</td><td class="center">ok</td></tr> <tr><td class="center">hlr</td><td class="center">hlr2</td><td class="center">2016-08-26 16:45</td><td class="center">0</td><td style="background-color:#ff0000;color:#ffffff;font-weight:bold;" class="center">24</td><td class="center">10</td><td class="center">ok</td></tr> <tr><td class="center">msc-s</td><td class="center">msc10</td><td class="center">2016-08-26 16:45</td><td class="center">0</td><td style="background-color:#ff0000;color:#ffffff;font-weight:bold;" class="center">48</td><td class="center">79</td><td class="center">ok</td></tr> <tr><td class="center">msc-s</td><td class="center">msc9</td><td class="center">2016-08-26 16:45</td><td class="center">0</td><td style="background-color:#ff0000;color:#ffffff;font-weight:bold;" class="center">46</td><td class="center">131</td><td class="center">ok</td></tr> </tbody> </table> 

I'll keep my first answer for reference. The method below splits the original HTML string into an array, with each string element containing the HTML of one table:

    public static string[] ParseHtmlSplitTables(string htmlString)
    {
        string[] result = new string[] { };

        if (!String.IsNullOrWhiteSpace(htmlString))
        {
            HtmlDocument doc = new HtmlDocument();
            doc.LoadHtml(htmlString);

            // SelectNodes returns null when no table is found.
            var tableNodes = doc.DocumentNode.SelectNodes("//table");
            if (tableNodes != null)
            {
                result = Array.ConvertAll<HtmlNode, string>(tableNodes.ToArray(), n => n.OuterHtml);
            }
        }

        return result;
    }

With this result you can proceed to parse each table:

    string[] htmlTables = ParseHtmlSplitTables(htmlString);

    foreach (string html in htmlTables)
    {
        List<List<KeyValuePair<string, string>>> parseResult = ParseHtmlToDataTable(html);

        DataTable dataTable = ToDataTable(parseResult);
    }
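As a possible alternative to splitting the HTML into strings and re-parsing each one, the rows can be selected relative to each table node with `.//tr`. The sketch below is not the method from the answer above: `TableParser` and `ParseTables` are hypothetical names, and it assumes the HtmlAgilityPack NuGet package, tables that are not nested, unique header names, no rowspan or colspan handling, and data rows with no more cells than the header row.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class TableParser
{
    // Hypothetical helper (not from the original post): builds one DataTable per <table>.
    public static List<DataTable> ParseTables(string html)
    {
        var tables = new List<DataTable>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var tableNodes = doc.DocumentNode.SelectNodes("//table");
        if (tableNodes == null)
            return tables;

        foreach (HtmlNode tableNode in tableNodes)
        {
            // The leading "." keeps the query relative to this table;
            // a bare "//tr" would search the whole document again.
            var rows = tableNode.SelectNodes(".//tr");
            if (rows == null)
                continue;

            var dataTable = new DataTable();
            bool isHeaderRow = true;

            foreach (HtmlNode row in rows)
            {
                var cells = row.SelectNodes("td|th");
                if (cells == null)
                    continue;

                // Same cell clean-up as in the question's code.
                string[] values = cells
                    .Select(c => c.InnerText.Replace("&nbsp;", " ").Trim())
                    .ToArray();

                if (isHeaderRow)
                {
                    // Assumption: the first row of each table holds unique column names.
                    foreach (string name in values)
                        dataTable.Columns.Add(name);
                    isHeaderRow = false;
                }
                else
                {
                    dataTable.Rows.Add(values.Cast<object>().ToArray());
                }
            }

            tables.Add(dataTable);
        }

        return tables;
    }
}
```

Calling `TableParser.ParseTables(htmlString)` on the sample HTML should then yield one DataTable per table, each with its own columns, because the header row is detected per table rather than once for the whole document.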
