Lifelesspeople.com


PHP Parsing and XML Files Trouble

 
Scott
tutorialtoday.com


Joined: 24 Mar 2005
Posts: 2650
Location: Mississauga, Ontario

Posted: Mon Feb 18, 2008 10:27 pm    Post subject: PHP Parsing and XML Files Trouble

I am going to be as descriptive as I can, but I don't know if I can give enough information. The problem I am having is that the XML handler skips data just because the main value is blank. By that I mean this is getting skipped:

Code:
<tag attrib="example"><![CDATA[ ]]></tag>


Most of them have a value inside the CDATA section, but some may not, and (as in this example) I still need the "attrib" value to be read so it can be inserted as a placeholder.

I have tried adding this:

Code:
xml_parser_set_option($xml_parser, XML_OPTION_SKIP_WHITE, false);


but it still skips it.

Can this even be fixed, or will I need to put something into the blank tags when generating the XML file so they don't get skipped?
_________________
Tutorial Management Script - Version 1.4 Released
TutorialToday - Up and running, submit your tutorials!
Linux Tutorials - Coming Soon
 
ClickFanatic
Est. 2005


Joined: 18 Jan 2005
Posts: 4100
Location: A particular geographic area

Posted: Tue Feb 19, 2008 8:05 am

The value for XML_OPTION_SKIP_WHITE is an integer. I am not sure if this will make a difference (considering PHP's loose typing), but try the following instead:
Code:
xml_parser_set_option($xml_parser, XML_OPTION_SKIP_WHITE, 0);
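
One way to see what this option changes is to run the same element through xml_parse_into_struct() with both settings and compare the entries. This is only a sketch: the <root> wrapper, the helper names parseWithSkipWhite() and findTag(), and the use of xml_parse_into_struct() are assumptions, since the thread does not show the actual parsing code. Case folding is turned off so names keep their original case.

```php
<?php
// Hypothetical sample: the blank element from the first post inside a made-up <root>.
$xml = '<root><tag attrib="example"><![CDATA[ ]]></tag></root>';

// Parse the sample into a flat struct array with the given SKIP_WHITE setting.
function parseWithSkipWhite(string $xml, int $skipWhite): array
{
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0); // keep names as written
    xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, $skipWhite);
    $values = [];
    $index = [];
    xml_parse_into_struct($parser, $xml, $values, $index);
    return $values;
}

// Return the struct entry for the first <tag> element, or null if there is none.
function findTag(array $values): ?array
{
    foreach ($values as $entry) {
        if ($entry['tag'] === 'tag') {
            return $entry;
        }
    }
    return null;
}

$kept    = findTag(parseWithSkipWhite($xml, 0));
$skipped = findTag(parseWithSkipWhite($xml, 1));

// Print both entries to compare what each setting keeps.
print_r($kept);
print_r($skipped);
```

If the attribute shows up in both dumps, the option only decides whether the blank value is stored, and the missing data is more likely lost somewhere in the handler code.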

_________________
Captain Jell-O Buster from the Future
[img]http://feeds.feedburner.com/sparepencil.1.gif[/img]
 
Scott
tutorialtoday.com


Joined: 24 Mar 2005
Posts: 2650
Location: Mississauga, Ontario

Posted: Tue Feb 19, 2008 3:02 pm

ClickFanatic wrote:
The value for XML_OPTION_SKIP_WHITE is an integer. I am not sure if this will make a difference (considering PHP's loose typing), but try the following instead:
Code:
xml_parser_set_option($xml_parser, XML_OPTION_SKIP_WHITE, 0);


Doesn't make a difference.
 
LP-SolidRaven
Dictator of the Dump


Joined: 06 Jun 2004
Posts: 7297
Location: The cheese is made out of moon

Posted: Wed Feb 20, 2008 8:34 am

If you only intend to use PHP 5 you could always use SimpleXML. On the other hand, you could always try to get the lines that were missed with preg_match_all.
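
A minimal sketch of the SimpleXML route, assuming PHP 5 or later with the SimpleXML extension available. The <root> wrapper, the sample data and the $rows array are made up for the example; only the <tag attrib="..."> shape comes from the first post.

```php
<?php
// Hypothetical sample document; only the <tag> shape comes from the thread.
$xml = '<root>'
     . '<tag attrib="filled"><![CDATA[some text]]></tag>'
     . '<tag attrib="example"><![CDATA[ ]]></tag>'
     . '</root>';

$doc = simplexml_load_string($xml);
if ($doc === false) {
    throw new RuntimeException('The XML could not be parsed.');
}

$rows = [];
foreach ($doc->tag as $tag) {
    $rows[] = [
        'attrib' => (string) $tag['attrib'], // read the attribute regardless of the body
        'text'   => trim((string) $tag),     // whitespace-only bodies become ''
    ];
}

print_r($rows);
```

Each <tag> produces one row whether or not its body is blank, so the attribute can still be used as a placeholder.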
_________________
Quote:

<bart416> I just realized something
<bart416> we celebrate the fact that this piece of rock made one rotation around a glowing ball of plasma that is kept together due to its own gravity well
<njsg> HAPPY NEW YEAR
<Easter> ^^
 
exsanguination
Forum Regular


Joined: 27 Apr 2005
Posts: 418
Location: Australia

Posted: Wed Feb 20, 2008 6:09 pm

SolidRaven wrote:
On the other hand you could always try to get the lines that were missing out with preg_match_all


Bad suggestion much? You are manipulating XML, not strings.


I assume you are using the built-in SAX-style parser?

So you'd have your content handler like this:


Code:
function startElement($parser, $element, $attributes) {}


right?

Then you can just enumerate the attributes like:

Code:

foreach ($attributes as $key => $value) {
    echo "$key = $value \n";
}


Does that not work?

Also, instead of having an empty CDATA section, why not have an empty tag, i.e.:
Code:

<element attribute="sample" />
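
Putting the handler pieces above together, here is a rough sketch of how the attributes could be captured in the start handler so they don't depend on the element having any text. The sample document, the $seen and $current variables and the closures are made up for illustration. Case folding is switched off because the parser upper-cases tag and attribute names by default, which is easy to trip over when comparing names.

```php
<?php
// Hypothetical input modelled on the element from the first post.
$xml = '<root>'
     . '<tag attrib="filled"><![CDATA[some text]]></tag>'
     . '<tag attrib="example"><![CDATA[ ]]></tag>'
     . '</root>';

$seen = [];      // one entry per <tag>, created as soon as the element opens
$current = null; // index into $seen while inside a <tag>, otherwise null

$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0); // keep names as written

xml_set_element_handler(
    $parser,
    // Start handler: record the attribute here, before any character data arrives.
    function ($p, $name, $attrs) use (&$seen, &$current) {
        if ($name === 'tag') {
            $seen[] = ['attrib' => $attrs['attrib'] ?? null, 'text' => ''];
            $current = count($seen) - 1;
        }
    },
    // End handler: stop collecting text for this element.
    function ($p, $name) use (&$current) {
        if ($name === 'tag') {
            $current = null;
        }
    }
);

// Character data may arrive in several pieces, so append rather than assign.
xml_set_character_data_handler($parser, function ($p, $data) use (&$seen, &$current) {
    if ($current !== null) {
        $seen[$current]['text'] .= $data;
    }
});

if (!xml_parse($parser, $xml, true)) {
    throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
}

print_r($seen);
```

If the entry is only created inside the character data handler instead, blank elements can look as if they were skipped.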
 
Scott
tutorialtoday.com


Joined: 24 Mar 2005
Posts: 2650
Location: Mississauga, Ontario

Posted: Wed Feb 20, 2008 9:56 pm

I've tried print_r($attrib), and it doesn't list the blank tags; they keep getting skipped. I was hoping I was just missing some sort of option, but I guess I will have to make a little workaround, whether that is text that acts as a placeholder or just self-closing the tag as you mentioned above (<tag value="value" />).

Also, I think SolidRaven meant that you read the file into a string and use preg_match_all on the string, which would probably work with the right regex.
 
LP-SolidRaven
Dictator of the Dump


Joined: 06 Jun 2004
Posts: 7297
Location: The cheese is made out of moon

Posted: Thu Feb 21, 2008 12:09 pm

exsanguination wrote:
SolidRaven wrote:
On the other hand you could always try to get the lines that were missing out with preg_match_all


Bad suggestion much? You are manipulating XML, not strings.

If you're only reading it, you can perfectly well use preg_match_all to find the things that the XML parser didn't catch. It's not uncommon to use such methods.

Quote:

Also instead of having an empty CDATA section, why not have an empty tag

In most cases you don't have a lot of choice about your input syntax.
 
krt
...


Joined: 11 Jan 2005
Posts: 4765
Location: Down Under

Posted: Thu Feb 21, 2008 5:18 pm

What XML parser are you using, Scott? No XML parser should have trouble with an empty <tag></tag> or <tag/>.

Quote:
If you're only reading it you can perfectly use preg_match_all to find the things that the xml parser didn't catch. It's not uncommon to use such methods.

Maybe in cases where the XML is not well formed and you are not the one responsible for the XML file. I don't think either case applies here.
 
Scott
tutorialtoday.com


Joined: 24 Mar 2005
Posts: 2650
Location: Mississauga, Ontario

Posted: Thu Feb 21, 2008 6:30 pm

I'm using the built-in parser:

http://ca.php.net/manual/en/ref.xml.php

Anyway, I just made a little workaround: when the XML file is generated and the value is blank (or whitespace), it is replaced with some placeholder text, and when the file is parsed, any value matching that text is filled into the template as blank.
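
Roughly, the workaround looks like the sketch below. The marker string and the function names encodeValue(), decodeValue() and buildTag() are stand-ins made up for this post, not the real ones from the script.

```php
<?php
// Hypothetical marker; the real string used by the script is not shown in this thread.
const BLANK_PLACEHOLDER = '__BLANK_7f3a__';

// Generator side: swap a blank (or whitespace-only) value for the marker.
function encodeValue(string $value): string
{
    return trim($value) === '' ? BLANK_PLACEHOLDER : $value;
}

// Parser side: turn the marker back into an empty string for the template.
function decodeValue(string $value): string
{
    return $value === BLANK_PLACEHOLDER ? '' : $value;
}

// Build one element the way the generator might.
// Note: a value containing "]]>" would break the CDATA section; not handled here.
function buildTag(string $attrib, string $value): string
{
    return sprintf(
        '<tag attrib="%s"><![CDATA[%s]]></tag>',
        htmlspecialchars($attrib, ENT_QUOTES),
        encodeValue($value)
    );
}

echo buildTag('example', ' '), "\n";
```

Real values pass through untouched, so only the blank ones ever see the marker.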
 
ClickFanatic
Est. 2005


Joined: 18 Jan 2005
Posts: 4100
Location: A particular geographic area

Posted: Fri Feb 22, 2008 10:40 am

Golden rule. Don't change the format, fix the parser.
This really should be your long-term solution to the problem... the workaround is dirty.
 
LP-SolidRaven
Dictator of the Dump


Joined: 06 Jun 2004
Posts: 7297
Location: The cheese is made out of moon

Posted: Fri Feb 22, 2008 1:13 pm

Scott wrote:
I'm using the built in parser:

http://ca.php.net/manual/en/ref.xml.php

Anyways, I just made a little work around, so when it generates the XML file and the value is blank (or a white-space) it replaces it with some placeholder text and when it is parsed if that text is there the template is just filled in to be blank.

You could do that on the fly with a regex when you load the xml data...
 
Scott
tutorialtoday.com


Joined: 24 Mar 2005
Posts: 2650
Location: Mississauga, Ontario

Posted: Fri Feb 22, 2008 2:23 pm

Yes, it is a bit of a dirty solution, but no one will ever set the value of this specific thing to a random string like the one I have set it to. The values are snippets of HTML which form a layout.
 
exsanguination
Forum Regular


Joined: 27 Apr 2005
Posts: 418
Location: Australia

Posted: Mon Feb 25, 2008 8:37 pm

SolidRaven wrote:
exsanguination wrote:
SolidRaven wrote:
On the other hand you could always try to get the lines that were missing out with preg_match_all


Bad suggestion much? You are manipulating XML, not strings.

If you're only reading it you can perfectly use preg_match_all to find the things that the xml parser didn't catch. It's not uncommon to use such methods.


ClickFanatic has it right: file a bug report if it's seriously not working properly.

I know it's not uncommon to use regex to parse XML (and HTML). Doesn't mean it's a good idea.

The whole idea behind SAX XML parsers, like the one Scott is using, is that you don't need to have the entire XML document in memory; you read in what you need, then throw away the rest.

Using regex is terribly inefficient because you need to read the entire XML document and have it in memory, then compile and run your regex over it. As soon as your document gets any kind of size and complexity, you are going to find that it wasn't such a good idea (insert hammer / nail quote in here).
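
To show what reading only what you need looks like in practice, here is a minimal sketch that feeds a file to the built-in parser in 4 KB chunks. The temporary file, its contents and the $count counter are made up for the example; only the element shape comes from Scott's first post.

```php
<?php
// Write a hypothetical sample document to a temporary file so there is something to stream.
$path = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents(
    $path,
    '<root>' . str_repeat('<tag attrib="example"><![CDATA[ ]]></tag>', 1000) . '</root>'
);

$count = 0; // number of <tag> elements that carried an "attrib" attribute

$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0); // keep names as written
xml_set_element_handler(
    $parser,
    function ($p, $name, $attrs) use (&$count) {
        if ($name === 'tag' && isset($attrs['attrib'])) {
            $count++;
        }
    },
    function ($p, $name) {
        // Nothing to do when an element closes in this sketch.
    }
);

// Feed the file to the parser one chunk at a time; only the chunk is held in memory.
$handle = fopen($path, 'rb');
while (!feof($handle)) {
    $chunk = fread($handle, 4096);
    if (!xml_parse($parser, $chunk, feof($handle))) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
}
fclose($handle);
unlink($path);

echo $count, "\n";
```

The third argument to xml_parse() tells the parser whether the chunk is the last one, which is why feof() is checked again after each read.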
 
LP-SolidRaven
Dictator of the Dump


Joined: 06 Jun 2004
Posts: 7297
Location: The cheese is made out of moon

Posted: Tue Feb 26, 2008 12:07 pm

exsanguination wrote:

ClickFanatic has it right: file a bug report if it's seriously not working properly.

They'll have it fixed in a couple of months probably. Assuming they're not busy writing up a big useless feature for some java freak who wants more java stuff in php.

Quote:
I know it's not uncommon to use regex to parse XML (and HTML). Doesn't mean it's a good idea.

For most documents regex really isn't as bad as you make it sound.
If it's not larger than 10kb regex will do just fine.

Quote:
The whole idea behind SAX XML parsers, like the one Scott is using, is that you don't need to have the entire XML document in memory; you read in what you need, then throw away the rest.

Using regex is terribly inefficient because you need to read the entire XML document and have it in memory, then compile and run your regex over it. As soon as your document gets any kind of size and complexity, you are going to find that it wasn't such a good idea (insert hammer / nail quote in here).

1) He didn't mention size
2) PHP handles regex rather efficiently compared to other languages. (It automatically caches compiled patterns, so compilation only happens once, and it can remember somewhere around 4000 patterns if I remember correctly.)
3) You don't have an XML parser everywhere; regex, on the other hand, is compiled into pretty much every PHP install. On top of that, nothing prevents you from parsing a file like a SAX parser in PHP without using a module. You can perfectly well read in one line at a time and then apply a regex to it. And because it caches, it won't be terribly inefficient as you claim.
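
Something like the sketch below is what I mean by the line-at-a-time approach. It assumes each element sits on its own line with exactly one attrib attribute, so it breaks as soon as an element spans lines or the attributes change. The $lines array stands in for reading the file with fgets(), and the sample data, $pattern and $found are made up for the example.

```php
<?php
// Hypothetical input with one element per line; in real use each line would come from fgets().
$lines = [
    '<root>',
    '<tag attrib="filled"><![CDATA[some text]]></tag>',
    '<tag attrib="example"><![CDATA[ ]]></tag>',
    '</root>',
];

// Matches one <tag> element whose CDATA body sits on the same line.
// Group 1 is the attribute value, group 2 the (possibly blank) CDATA content.
$pattern = '/<tag\s+attrib="([^"]*)">\s*<!\[CDATA\[(.*?)\]\]>\s*<\/tag>/';

$found = [];
foreach ($lines as $line) {
    if (preg_match($pattern, $line, $m)) {
        $found[] = ['attrib' => $m[1], 'text' => $m[2]];
    }
}

print_r($found);
```

The pattern is compiled once and reused for every line, which is the caching I mentioned in point 2.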
 
All times are GMT - 6 Hours
All trademarks and copyrights on this page are owned by their respective owners. Posts and comments are owned by the poster. Everything else © 2001 - 2007 Lifelesspeople.com