Scott tutorialtoday.com

Joined: 24 Mar 2005 Posts: 2650 Location: Mississauga, Ontario
Posted: Mon Feb 18, 2008 10:27 pm
Post subject: PHP Parsing and XML Files Trouble
I'll be as descriptive as I can, but I'm not sure I can give enough information. The problem is that the XML handler skips data just because the main value is blank. For example, this gets skipped:
Code:
<tag attrib="example"><![CDATA[ ]]></tag>
Most of these tags have a value inside the CDATA section, but some may not, and in those cases I still need the "attrib" value (from this example) to be read so it can be inserted as a placeholder.
I have tried adding this:
Code:
xml_parser_set_option($xml_parser, XML_OPTION_SKIP_WHITE, false);
but it still gets skipped.
Can this be fixed at all, or will I need to put something in the blank tags when generating the XML file so they don't get skipped?
_________________
Tutorial Management Script - Version 1.4 Released
TutorialToday - Up and running, submit your tutorials!
Linux Tutorials - Coming Soon
ClickFanatic Est. 2005

Joined: 18 Jan 2005 Posts: 4100 Location: A particular geographic area
Posted: Tue Feb 19, 2008 8:05 am
The value for XML_OPTION_SKIP_WHITE is an integer. I am not sure if this will make a difference (considering PHP's loose typing), but try the following instead:
Code:
xml_parser_set_option($xml_parser, XML_OPTION_SKIP_WHITE, 0);
_________________
Captain Jell-O Buster from the Future
[img]http://feeds.feedburner.com/sparepencil.1.gif[/img]
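A quick way to see what XML_OPTION_SKIP_WHITE actually changes is to run the same snippet through xml_parse_into_struct with the option off and on. As far as I can tell from how PHP's xml extension behaves, the option only decides whether a whitespace-only value gets a "value" key in that output; the element and its attributes are reported either way, and a callback registered with xml_set_character_data_handler still receives the whitespace. This is a sketch, not a diagnosis of the original script: the <root> wrapper and the parseWithSkipWhite() helper are made up for the example. Note that the default case folding upper-cases names, so "attrib" comes back as "ATTRIB".

```php
<?php
// Sketch: compare xml_parse_into_struct() output with XML_OPTION_SKIP_WHITE off (0) and on (1).
// parseWithSkipWhite() and the <root> wrapper are hypothetical, made up for this example.
function parseWithSkipWhite(string $xml, int $skipWhite): array
{
    $parser = xml_parser_create(); // case folding is on by default
    xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, $skipWhite);

    $values = [];
    $index = [];
    if (!xml_parse_into_struct($parser, $xml, $values, $index)) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
    return $values;
}

$xml = '<root><tag attrib="example"><![CDATA[ ]]></tag></root>';

foreach ([0, 1] as $skipWhite) {
    echo "XML_OPTION_SKIP_WHITE = $skipWhite\n";
    print_r(parseWithSkipWhite($xml, $skipWhite));
}
```

With the option set to 1, the TAG entry should keep its type ("complete") and its ATTRIB attribute and only lose the "value" key.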
Scott tutorialtoday.com

Joined: 24 Mar 2005 Posts: 2650 Location: Mississauga, Ontario
Posted: Tue Feb 19, 2008 3:02 pm
| ClickFanatic wrote: | The value for XML_OPTION_SKIP_WHITE is an integer. I am not sure if this will make a difference (considering PHP's loose typing), but try the following instead:
| Code: | | xml_parser_set_option($xml_parser, XML_OPTION_SKIP_WHITE, 0); |
|
Doesn't make a difference.
LP-SolidRaven Dictator of the Dump

Joined: 06 Jun 2004 Posts: 7297 Location: The cheese is made out of moon
Posted: Wed Feb 20, 2008 8:34 am
If you only intend to use PHP5, you could always use SimpleXML. Alternatively, you could try to pull out the missing lines with preg_match_all.
_________________
| Quote: |
<bart416> I just realized something
<bart416> we celebrate the fact that this piece of rock made one rotation around a glowing ball of plasma that is kept together due to its own gravity well
<njsg> HAPPY NEW YEAR
<Easter> ^^
|
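Following up on the SimpleXML idea, here is a minimal sketch assuming PHP 5 or later with the SimpleXML extension enabled. The <root> wrapper and the sample values are made up; the <tag attrib="..."> layout is the one from the first post. Casting a SimpleXMLElement to string returns its CDATA text, and trim() makes a whitespace-only section count as blank, so the "example" entry still shows up, just with an empty value.

```php
<?php
// Sketch: read the "attrib" attribute of every <tag>, including ones whose CDATA is blank.
// The <root> wrapper and the sample values are hypothetical.
$xml = <<<XML
<root>
  <tag attrib="header"><![CDATA[<div id="header">]]></tag>
  <tag attrib="example"><![CDATA[ ]]></tag>
</root>
XML;

$doc = simplexml_load_string($xml);
if ($doc === false) {
    throw new RuntimeException('Could not parse the XML document');
}

$placeholders = [];
foreach ($doc->tag as $tag) {
    // (string) pulls out the attribute value and the CDATA text;
    // trim() turns a whitespace-only section into ''.
    $placeholders[(string) $tag['attrib']] = trim((string) $tag);
}

print_r($placeholders);
```

Unlike the built-in SAX-style parser, SimpleXML does not fold names to upper case, so the attribute is read as 'attrib'.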
exsanguination Forum Regular
Joined: 27 Apr 2005 Posts: 418 Location: Australia
Posted: Wed Feb 20, 2008 6:09 pm
| SolidRaven wrote: | | On the other hand you could always try to get the lines that were missing out with preg_match_all |
Bad suggestion much? You are manipulating XML, not strings.
I assume you are using the built-in SAX-style parser?
So you'd have your content handler like this:
Code:
function startElement($parser, $element, $attributes) {}
Right?
Then you can just enumerate the attributes like this:
Code:
foreach ($attributes as $key => $value) {
    echo "$key = $value \n";
}
Does that not work?
Also, instead of having an empty CDATA section, why not use an empty tag, i.e.:
Code:
<element attribute="sample" />
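Putting those two suggestions together, here is a self-contained sketch with the built-in parser's xml_set_element_handler. It feeds in a <tag> with a blank CDATA section, an empty <tag></tag> and a self-closing <tag />, and collects every attribute the start-element handler sees. It uses a closure instead of the named startElement function, and the sample document and the $seen collector are hypothetical. Because XML_OPTION_CASE_FOLDING is on by default, tag and attribute names arrive upper-cased.

```php
<?php
// Sketch: enumerate attributes in the start-element handler of PHP's built-in XML parser.
// The sample document and the $seen collector are hypothetical.
$seen = [];

$parser = xml_parser_create();
xml_set_element_handler(
    $parser,
    function ($parser, $element, array $attributes) use (&$seen) {
        // With XML_OPTION_CASE_FOLDING on (the default), names arrive upper-cased.
        foreach ($attributes as $key => $value) {
            $seen[] = "$element.$key = $value";
        }
    },
    function ($parser, $element) {
        // No end-element work is needed for this example.
    }
);

$xml = '<root>'
     . '<tag attrib="cdata"><![CDATA[ ]]></tag>'
     . '<tag attrib="empty"></tag>'
     . '<tag attrib="selfclosing" />'
     . '</root>';

if (!xml_parse($parser, $xml, true)) {
    throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
}

print_r($seen);
```

All three variants should reach the start-element handler with their attribute; only the character-data callbacks differ between them.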
Scott tutorialtoday.com

Joined: 24 Mar 2005 Posts: 2650 Location: Mississauga, Ontario
Posted: Wed Feb 20, 2008 9:56 pm
I've tried a print_r($attrib), and it doesn't list the blank tags that keep getting skipped. I was hoping I was just missing some sort of option, but I guess I'll have to make a little workaround, whether that's text that acts as a placeholder or just closing the tag as you mentioned above (<tag value="value" />).
Also, I think SolidRaven meant that you read the file into a string and use preg_match_all on that string, which would probably work with the right regex.
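If you do go the string route, a rough preg_match_all sketch might look like this. It assumes every element is written exactly as <tag attrib="..."><![CDATA[...]]></tag> with a double-quoted attribute, which is a property of this particular file rather than of XML in general; single quotes, extra attributes or nested tags would slip past it. The sample document is hypothetical.

```php
<?php
// Sketch: pull each tag's "attrib" value and CDATA body out of the raw XML string.
// Assumes the exact <tag attrib="..."><![CDATA[...]]></tag> layout; this is not a general XML parser.
$xml = '<root>'
     . '<tag attrib="header"><![CDATA[<div id="header">]]></tag>'
     . '<tag attrib="example"><![CDATA[ ]]></tag>'
     . '</root>';

// Group 1: the attribute value; group 2: the CDATA contents (may be blank).
$pattern = '/<tag\s+attrib="([^"]*)"\s*>\s*<!\[CDATA\[(.*?)\]\]>\s*<\/tag>/s';

$found = [];
if (preg_match_all($pattern, $xml, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        // trim() makes a whitespace-only CDATA section count as blank.
        $found[$match[1]] = trim($match[2]);
    }
}

print_r($found);
```

PREG_SET_ORDER keeps each attribute next to its own CDATA body, so building the lookup array is a single loop.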
LP-SolidRaven Dictator of the Dump

Joined: 06 Jun 2004 Posts: 7297 Location: The cheese is made out of moon
Posted: Thu Feb 21, 2008 12:09 pm
| exsanguination wrote: | | SolidRaven wrote: | | On the other hand you could always try to get the lines that were missing out with preg_match_all |
Bad suggestion much? You are manipulating XML, not strings.
|
If you're only reading it, you can perfectly well use preg_match_all to find the things the XML parser didn't catch. It's not uncommon to use such methods.
| Quote: |
Also instead of having an empty CDATA section, why not have an empty tag |
In most cases you don't have a lot of choice about your input syntax.
krt ...

Joined: 11 Jan 2005 Posts: 4765 Location: Down Under
Posted: Thu Feb 21, 2008 5:18 pm
What XML parser are you using, Scott? No XML parser should have trouble with an empty <tag></tag> or <tag/>.
| Quote: | | If you're only reading it you can perfectly use preg_match_all to find the things that the xml parser didn't catch. It's not uncommon to use such methods. |
Maybe in cases where the XML is not well-formed and you are not the one responsible for the XML file. I don't think either case applies here.
Scott tutorialtoday.com

Joined: 24 Mar 2005 Posts: 2650 Location: Mississauga, Ontario
Posted: Thu Feb 21, 2008 6:30 pm
I'm using the built-in parser:
http://ca.php.net/manual/en/ref.xml.php
Anyway, I just made a little workaround: when the XML file is generated and a value is blank (or whitespace), it is replaced with some placeholder text, and when the file is parsed, any value containing that text is filled into the template as blank.
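The placeholder workaround described above amounts to a round trip like the following sketch. The marker string and the encodeValue()/decodeValue() functions are hypothetical stand-ins for whatever the script actually uses; the only real requirement is that the marker can never occur as a genuine value.

```php
<?php
// Sketch of the placeholder workaround: blank values are swapped for a marker on export
// and swapped back to '' on import. BLANK_MARKER and both functions are hypothetical.
const BLANK_MARKER = '__BLANK_PLACEHOLDER__';

function encodeValue(string $value): string
{
    // Blank or whitespace-only values become the marker so the element is never empty.
    return trim($value) === '' ? BLANK_MARKER : $value;
}

function decodeValue(string $value): string
{
    // The marker turns back into an empty string when the template is filled in.
    return $value === BLANK_MARKER ? '' : $value;
}

$snippet = encodeValue('   ');
$xmlLine = '<tag attrib="example"><![CDATA[' . $snippet . ']]></tag>';
echo $xmlLine, "\n";
echo var_export(decodeValue($snippet), true), "\n";
```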
ClickFanatic Est. 2005

Joined: 18 Jan 2005 Posts: 4100 Location: A particular geographic area
Posted: Fri Feb 22, 2008 10:40 am
Golden rule. Don't change the format, fix the parser.
This really should be your long-term solution to the problem... the workaround is dirty.
LP-SolidRaven Dictator of the Dump

Joined: 06 Jun 2004 Posts: 7297 Location: The cheese is made out of moon
Posted: Fri Feb 22, 2008 1:13 pm
| Scott wrote: | I'm using the built in parser:
http://ca.php.net/manual/en/ref.xml.php
Anyways, I just made a little work around, so when it generates the XML file and the value is blank (or a white-space) it replaces it with some placeholder text and when it is parsed if that text is there the template is just filled in to be blank. |
You could do that on the fly with a regex when you load the XML data...
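SolidRaven's on-the-fly variant would rewrite the document just before parsing instead of changing the generator. Here is a sketch with preg_replace, assuming blank values always appear as a CDATA section containing nothing but whitespace; the markBlankCdata() name and the '__BLANK__' marker are made up for the example.

```php
<?php
// Sketch: before parsing, replace whitespace-only CDATA sections with a marker string.
// markBlankCdata() and the marker value are hypothetical. The marker ends up in a
// preg_replace() replacement string, so it must not contain '$' or '\'.
function markBlankCdata(string $xml, string $marker): string
{
    return preg_replace('/<!\[CDATA\[\s*\]\]>/', '<![CDATA[' . $marker . ']]>', $xml);
}

$raw = '<root><tag attrib="example"><![CDATA[ ]]></tag>'
     . '<tag attrib="body"><![CDATA[<p>Hi</p>]]></tag></root>';
$prepared = markBlankCdata($raw, '__BLANK__');
echo $prepared, "\n";
```

Sections with real content are left alone because \s* cannot match anything other than whitespace.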
Scott tutorialtoday.com

Joined: 24 Mar 2005 Posts: 2650 Location: Mississauga, Ontario
Posted: Fri Feb 22, 2008 2:23 pm
Yes, it is a bit of a dirty solution, but no one will ever set the value of this specific field to a random string like the one I chose. The values are snippets of an HTML file that together form a layout.
exsanguination Forum Regular
Joined: 27 Apr 2005 Posts: 418 Location: Australia
Posted: Mon Feb 25, 2008 8:37 pm
| SolidRaven wrote: | | exsanguination wrote: | | SolidRaven wrote: | | On the other hand you could always try to get the lines that were missing out with preg_match_all |
Bad suggestion much? You are manipulating XML, not strings.
|
If you're only reading it you can perfectly use preg_match_all to find the things that the xml parser didn't catch. It's not uncommon to use such methods. |
ClickFanatic has it right: file a bug report if it's seriously not working properly.
I know it's not uncommon to use regex to parse XML (and HTML). That doesn't mean it's a good idea.
The whole idea behind SAX XML parsers, like the one Scott is using, is that you don't need to have the entire XML document in memory; you read in what you need and then throw away the rest.
Using regex is terribly inefficient because you need to read the entire XML document into memory, then compile and run your regex over it. As soon as your document gets to any real size or complexity, you are going to find that it wasn't such a good idea (insert hammer/nail quote here).
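To illustrate the memory argument: the built-in parser accepts a document in pieces through repeated xml_parse() calls with the final flag set to false, so a file can be streamed with fread() instead of being loaded whole. In this sketch the collectAttribs() helper, the chunk size and the php://memory stream standing in for a real file are all arbitrary choices; the tiny 16-byte chunks in the usage line are only there to force element boundaries to fall across chunks.

```php
<?php
// Sketch: feed the built-in (SAX-style) parser a stream in fixed-size chunks,
// so the whole document never has to sit in memory at once.
// collectAttribs() and the php://memory stand-in are hypothetical.
function collectAttribs($stream, int $chunkSize = 8192): array
{
    $attribs = [];
    $parser = xml_parser_create();
    xml_set_element_handler(
        $parser,
        function ($parser, $name, array $attrs) use (&$attribs) {
            // Names are case-folded by default, hence 'TAG' and 'ATTRIB'.
            if ($name === 'TAG' && isset($attrs['ATTRIB'])) {
                $attribs[] = $attrs['ATTRIB'];
            }
        },
        function ($parser, $name) {
        }
    );

    while (!feof($stream)) {
        $chunk = fread($stream, $chunkSize);
        if ($chunk === false) {
            throw new RuntimeException('Read error');
        }
        if (!xml_parse($parser, $chunk, false)) {
            throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
        }
    }
    // Signal end of input so the parser can check that the document is complete.
    if (!xml_parse($parser, '', true)) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
    return $attribs;
}

$stream = fopen('php://memory', 'r+');
fwrite($stream, '<root><tag attrib="a"><![CDATA[ ]]></tag><tag attrib="b" /></root>');
rewind($stream);
print_r(collectAttribs($stream, 16));
fclose($stream);
```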
LP-SolidRaven Dictator of the Dump

Joined: 06 Jun 2004 Posts: 7297 Location: The cheese is made out of moon
Posted: Tue Feb 26, 2008 12:07 pm
| exsanguination wrote: |
ClickFanatic has it right, file a bug report if its seriously not working properly.
|
They'll probably have it fixed in a couple of months, assuming they're not busy writing up a big useless feature for some Java freak who wants more Java stuff in PHP.
| Quote: | | I know its not uncommon to use regex to parse xml (and HTML). Doesn't mean its a good idea. |
For most documents, regex really isn't as bad as you make it sound.
If it's not larger than 10 KB, regex will do just fine.
| Quote: | The whole idea behind SAX XML parsers, like the one Scott is using, is that you don't need to have the entire xml document in memory, you read in what you need then throw away the rest.
Using regex is terribly inefficient because you need to read the entire XML and have it in memory, then compile and run your regex over it. Soon as you document gets any kind of size and complex you are going to find that it wasn't such a good idea (insert hammer / nail quote in here). |
1) He didn't mention size.
2) PHP handles regex rather efficiently compared to other languages. (It caches compiled patterns automatically, so each pattern is only compiled once, and it can remember somewhere around 4000 patterns, if I remember correctly.)
3) You don't have an XML parser everywhere; regex, on the other hand, is compiled into pretty much every PHP install. On top of that, nothing prevents you from parsing a file SAX-style in PHP without using a module. You can perfectly well read it in one line at a time and apply a regex to each line. And because the patterns are cached, it won't be as terribly inefficient as you claim.
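SolidRaven's third point, reading one line at a time and applying a cached pattern to each line, might look roughly like this sketch. It only works if each <tag ...>...</tag> element sits on a single line, which is an assumption about how this particular file is generated, not something XML guarantees. The scanLines() name and the php://memory stand-in for a real file are hypothetical.

```php
<?php
// Sketch: scan a stream line by line with preg_match, never holding the whole file in memory.
// Assumes one complete <tag attrib="..."><![CDATA[...]]></tag> element per line;
// only the first match on each line is used.
function scanLines($stream): array
{
    // PCRE caches the compiled pattern, so reusing it on every line is cheap.
    $pattern = '/<tag\s+attrib="([^"]*)"\s*>\s*<!\[CDATA\[(.*?)\]\]>\s*<\/tag>/';
    $found = [];
    while (($line = fgets($stream)) !== false) {
        if (preg_match($pattern, $line, $m)) {
            $found[$m[1]] = trim($m[2]);
        }
    }
    return $found;
}

$stream = fopen('php://memory', 'r+');
fwrite($stream, "<root>\n<tag attrib=\"header\"><![CDATA[<div>]]></tag>\n<tag attrib=\"example\"><![CDATA[ ]]></tag>\n</root>\n");
rewind($stream);
print_r(scanLines($stream));
fclose($stream);
```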