Thread: HTML parsing

Page 1 of 3
Results 1 to 10 of 23
  1. #1 HTML parsing 
    Community Veteran

    mige5's Avatar
    Join Date
    Aug 2008
    Posts
    5,528
    Thanks given
    573
    Thanks received
    1,410
    Rep Power
    2114
    EDIT: I guess this is solved.
    Number of page #1 releases with most views & posts: (Updated: 2023)
    RS2 server section: 1
    RS2 client section: 2

  2. #2  
    Registered Member
    Project's Avatar
    Join Date
    Dec 2010
    Posts
    2,669
    Thanks given
    1,043
    Thanks received
    820
    Rep Power
    1101
    Does it matter what language/program the parser is?

  3. #3  
    Community Veteran

    mige5's Avatar
    Quote Originally Posted by Project View Post
    Does it matter what language/program the parser is?
    Needs to be Java, as the rest of my tool is in Java.

  4. #4  
    Registered Member
    Project's Avatar
    Quote Originally Posted by mige5 View Post
    needs to be java, as the rest of the tool I have is in java..
    Looks like jsoup would be the best route for you then.

  5. #5  
    Registered Member
    hc747's Avatar
    Join Date
    Dec 2013
    Age
    26
    Posts
    1,474
    Thanks given
    3,312
    Thanks received
    691
    Rep Power
    1098
    Check that the line contains "img src=", then take everything between the quotes:
    -> int start = line.indexOf("img src=\"") + 9; line = line.substring(start, line.indexOf('"', start));

  6. #6  
    Community Veteran

    mige5's Avatar
    Quote Originally Posted by hc747 View Post
    Check that the line contains "img src="
    -> int start = line.indexOf("img src=\"") + 9; line = line.substring(start, line.indexOf('"', start));
    Umm, not sure if it's used elsewhere. Would there be an easy way to check that it is inside this: <div class="stories">

  7. #7  
    Registered Member
    Project's Avatar
    Quote Originally Posted by mige5 View Post
    umm.. not sure if its used elsewhere.. would there be an easy way to check that it is inside this: <div class="stories">
    Try Java regex to extract the text between tags.

    You could make a filter that looks for
    Code:
    data-login-required="true">
        
    		
    		<img src="
    as the first string
    and the second would be
    Code:
    ">//THIS LINK HERE
    	
    	</a>
    
    <!--</div>-->
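A rough sketch of the regex idea, using only java.util.regex. The class name, method name, and sample URL are made up for illustration, and it assumes the src value is always wrapped in double quotes. Instead of matching the whole surrounding markup, it just captures whatever sits between the quotes of each img src attribute:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ImgSrcRegex {
    // Matches an <img ...> tag and captures the double-quoted src value in group 1.
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\ssrc\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Returns the src value of every <img> tag found in the given HTML text.
    static List<String> extractImgSrc(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        // Hypothetical snippet shaped like the markup quoted above.
        String html = "<a href=\"#\" data-login-required=\"true\">\n"
                + "\t<img src=\"https://example.com/a.png\">\n"
                + "</a>";
        for (String link : extractImgSrc(html)) {
            System.out.println(link);
        }
    }
}
```

Regex is fine for a quick one-off like this, but it will break on single-quoted or unquoted attributes, so a real parser is the safer choice for anything long-lived.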

  8. #8  
    Registered Member
    hc747's Avatar
    Quote Originally Posted by mige5 View Post
    umm.. not sure if its used elsewhere.. would there be an easy way to check that it is inside this: <div class="stories">
    An easy enough way, yes.
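One possible way to do that with plain String methods, as a sketch only: find where the stories div starts, pick some marker that comes after the section, and only collect img links between those two positions. The class name, method name, and the end marker are assumptions; what actually follows the stories block on the real page needs to be checked by hand.

```java
import java.util.ArrayList;
import java.util.List;

final class StoriesImgScanner {
    // Collects the values of <img src="..."> that appear between startMarker and endMarker.
    // If endMarker is not found, everything after startMarker is scanned.
    static List<String> imgSrcBetween(String html, String startMarker, String endMarker) {
        List<String> links = new ArrayList<>();
        int from = html.indexOf(startMarker);
        if (from < 0) {
            return links; // section not present on this page
        }
        int to = html.indexOf(endMarker, from);
        if (to < 0) {
            to = html.length();
        }
        String key = "<img src=\"";
        int pos = html.indexOf(key, from);
        while (pos >= 0 && pos < to) {
            int valueStart = pos + key.length();
            int valueEnd = html.indexOf('"', valueStart);
            if (valueEnd < 0 || valueEnd > to) {
                break; // unterminated attribute or ran past the section
            }
            links.add(html.substring(valueStart, valueEnd));
            pos = html.indexOf(key, valueEnd);
        }
        return links;
    }

    public static void main(String[] args) {
        // Hypothetical page: one image outside the section, two inside it.
        String html = "<img src=\"logo.png\"><div class=\"stories\">"
                + "<a><img src=\"one.png\"></a><a><img src=\"two.png\"></a>"
                + "<!--end of stories--><img src=\"footer.png\">";
        System.out.println(imgSrcBetween(html, "<div class=\"stories\">", "<!--end of stories-->"));
    }
}
```

This does not try to match nested closing div tags, which is why it relies on an explicit end marker rather than looking for the matching </div>.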

  9. #9  
    Super Donator
    _sky's Avatar
    Join Date
    Aug 2015
    Posts
    151
    Thanks given
    116
    Thanks received
    72
    Rep Power
    59
    Quote Originally Posted by mige5 View Post
    Well did some googling and seemed like jsoup is the most recommended html parser.. havent really looked into it yet, as currently a bit busy.

    Anyway this is just part of the html page.. but yeah, I would need something that gets me all the img links I marked below:
    Jsoup is very easy to get into, it's imo by far the best html parser for java.
    You can simply get the src of all images of a webpage like this.

    Code:
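    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    // Fetch and parse the page, then print the src of every <img>.
    Document doc = Jsoup.connect(YOUR_URL).get();
    Elements images = doc.select("img");
    for (Element element : images) {
    	String src = element.attr("src");
    	System.out.println(src);
    }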
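    If you only want images inside a specific div, change your doc.select() call to
    Code:
    doc.select("div.DIV_CLASS_NAME img");
    The space matches images at any depth inside the div. "div.DIV_CLASS_NAME > img" would only match images that are direct children of it.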
    Lemme know if you got it working!

  11. #10  
    Community Veteran

    mige5's Avatar
    Quote Originally Posted by _sky View Post
    Jsoup is very easy to get into, it's imo by far the best html parser for java.
    You can simply get the src of all images of a webpage like this.

    Code:
    Document doc = Jsoup.connect(YOUR_URL).get();
    Elements images = doc.select("img");
    for (Element element : images) {
    	String src = element.attr("src");
     	System.out.println(src);
    }
    If you only want images inside a specific div, all you have to do is edit your doc.select() to
    Code:
    doc.select("div.DIV_CLASS_NAME > img");
    Lemme know if you got it working!
    It's not finding the links.

    Just noticed that if I view the page source or save it as HTML, I can't see the links anywhere either. But if I use "inspect element", there's more stuff in there, including the links.
