Network (local network and Internet) programming under Windows with Delphi.

Extract the HTML from a page loaded in TWebBrowser

Question:
How can I get the HTML from a web page that I loaded in TWebBrowser? I want to clip some of the page's content.

Answer:

You can use the browser's Document property. It exposes a lot of interesting properties:

  • Document.All
  • Document.bgColor
  • Document.Body.innerHTML
  • Document.Body.Style.overflowX
  • Document.Body.Style.overflowY
  • Document.Body.Style.zoom
  • Document.cookie
  • Document.documentElement.innerHTML
  • Document.documentElement.innerText
  • Document.FileSize
  • Document.Frames
  • Document.Images
  • Document.LastModified
  • Document.Links
  • Document.Location.Protocol
  • Document.ParentWindow
  • Document.ParentWindow.ScrollBy(iX: Integer; iY: Integer)
  • Document.Selection
  • Document.Title
  • Document.URL

Of these, Body.innerHTML will serve our purpose. The only limitation of this solution is that it gives us the HTML as the web browser currently holds it, which may differ from what 'View Source' in Internet Explorer would show. If the original HTML file included JavaScript that generates content dynamically, like this:

<script language='JavaScript'>
document.write('Hello Visitor');
</script>

then the function below returns the document as rendered, including the generated text 'Hello Visitor', rather than the file exactly as the server sent it. To get the original file, look in the browser cache or use something other than TWebBrowser to fetch the page.

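// tested with Delphi 6, should work in Delphi 5 as well
uses
  HTTPApp, MSHTML; // MSHTML declares IHTMLDocument2

procedure TForm1.WebBrowser1DocumentComplete(Sender: TObject;
  const pDisp: IDispatch; var URL: OleVariant);
var
  Document: IHTMLDocument2;
  s: string;
begin
  // Document is nil until a page has been loaded
  if not Assigned(WebBrowser1.Document) then
    Exit;
  Document := WebBrowser1.Document as IHTMLDocument2;

  // the body element may not be available yet
  if not Assigned(Document.body) then
    Exit;

  // HTML of the page body as the browser currently holds it
  s := Document.body.innerHTML;

  // process this string to extract contents
end;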
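
Some of the other properties from the list above can be read in the same way. The following is only a sketch: ListDocumentInfo and Memo1 (a TMemo on the form) are made-up names for illustration, and Supports is assumed to come from SysUtils as in Delphi 6.

```delphi
uses
  SysUtils, MSHTML;

// Hypothetical helper: writes the title, URL and link targets of the
// loaded page into Memo1 (an assumed TMemo on the form).
procedure TForm1.ListDocumentInfo;
var
  Doc: IHTMLDocument2;
  Links: IHTMLElementCollection;
  Anchor: IHTMLAnchorElement;
  i: Integer;
begin
  // nothing to report until a page has been loaded
  if not Assigned(WebBrowser1.Document) then
    Exit;
  Doc := WebBrowser1.Document as IHTMLDocument2;

  Memo1.Lines.Add('Title: ' + Doc.title);
  Memo1.Lines.Add('URL: ' + Doc.url);

  Links := Doc.links;
  for i := 0 to Links.length - 1 do
    // item returns an IDispatch; entries that are not anchor
    // elements (for example image map areas) are skipped
    if Supports(Links.item(i, 0), IHTMLAnchorElement, Anchor) then
      Memo1.Lines.Add(Anchor.href);
end;
```

Like the handler above, this should be called once the document has finished loading, for example from OnDocumentComplete.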

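If you need the page source as the server delivered it, one option that avoids TWebBrowser is the URLDownloadToFile function from the UrlMon unit. This is a minimal sketch: DownloadOriginalSource is a made-up helper name, and the URL and file name in the usage comment are placeholders.

```delphi
uses
  Windows, UrlMon;

// Hypothetical helper: saves the resource at AURL to AFileName.
// No scripts are executed, so the file holds the markup as delivered.
function DownloadOriginalSource(const AURL, AFileName: string): Boolean;
begin
  // URLDownloadToFile returns S_OK on success
  Result := URLDownloadToFile(nil, PChar(AURL), PChar(AFileName), 0, nil) = S_OK;
end;

// Usage (placeholder values):
//   if DownloadOriginalSource('http://www.example.com/', 'C:\page.html') then
//     ShowMessage('Saved');
```

Keep in mind that this requests the page again (possibly from the Internet Explorer cache), so a server-generated page may differ from the one shown in the browser.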
Generated 0:01:41 on Nov 18, 2017