

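While importing a large number of posts, GAE buckled under the load, threw a DeadlineExceededError, and died.
This is a GAE limit: every request has to finish within a fixed amount of time.
So I went searching for a way around it...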

http://groups.google.com/group/google-appengine/browse_thread/thread/e37dcd0f38a2f96a?hl=en&pli=1 

This is a discussion in the GAE group that covers a lot of key points on datastore optimization:

If you only need some of the properties for the query that needs 100+
results, you'll need to create a separate set of entities with just those
properties, and query those.  Similarly, if you want the query to return
just the keys, you'll need entities containing the properties that are the
subjects of query filters and the keys for the full entities.

100+ entities in a single request is a lot, especially with 40 properties on
each entity.  Smaller entities will get() faster, but you might also
consider avoiding needing so many results at once. If you're hitting request
timeouts and really need that much data, you could spread the requests
across multiple requests using JavaScript.  This won't reduce the total user
time for the complete result set, but you could reduce perceived latency by
displaying the first 20 results immediately while the remaining 80 are
fetched.

If you're trying to deliver the results all at once like in a downloadable
spreadsheet, you'll have to get clever, maybe use memcache as a workspace
and build it over multiple requests.

  
"maybe use memcache as a workspace and build it over multiple requests."
That looks like a workable solution: split the parsed data into chunks and park them in memcache,
then run an AJAX loop over them, the way GAE unit does.
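
A minimal sketch of that idea, under stated assumptions: `FakeCache` is an in-memory stand-in I made up so the example runs anywhere (on GAE the `set`, `get` and `delete` functions of `google.appengine.api.memcache` would play that role), and `stage_import`, `process_chunk`, `CHUNK_SIZE` and the key layout are hypothetical names, not code from the actual importer. The first request stages the parsed posts in chunks; each later AJAX request handles exactly one chunk, so no single request runs long.

```python
import json

CHUNK_SIZE = 20  # hypothetical batch size; pick one that fits in a single request


class FakeCache(object):
    """In-memory stand-in for a memcache client (get/set/delete only)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value
        return True

    def get(self, key):
        return self._data.get(key)

    def delete(self, key):
        self._data.pop(key, None)


def stage_import(cache, job_id, posts):
    """First request: split the parsed posts into chunks and park them in the cache."""
    chunks = [posts[i:i + CHUNK_SIZE] for i in range(0, len(posts), CHUNK_SIZE)]
    for n, chunk in enumerate(chunks):
        cache.set("%s:chunk:%d" % (job_id, n), json.dumps(chunk))
    return len(chunks)


def process_chunk(cache, job_id, n, save_post):
    """Each following AJAX request: import one chunk and tell the caller what is next."""
    key = "%s:chunk:%d" % (job_id, n)
    raw = cache.get(key)
    if raw is None:
        # The cache may evict or flush entries at any time, so the caller
        # has to be ready to re-upload and stage the file again.
        return {"status": "missing", "chunk": n}
    for post in json.loads(raw):
        save_post(post)
    cache.delete(key)
    return {"status": "ok", "chunk": n, "next": n + 1}
```

The browser side would simply call the chunk handler with n = 0, 1, 2, ... until it has gone through the chunk count returned by the staging request, restarting the upload if a chunk comes back as missing.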

It really isn't easy to get right, though - - !



6 Responses to “Finally hit GAE's bottleneck while doing an import”

  1. Robert Says:

    This is also why I gave up on importing WordPress's XML format. Instead I used a Python script that imports one post per request.

  2. LinCong.JavaTech Says:

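    Yes, that's a good idea too... for my own use it is actually enough.
    The WordPress XML import is in fact already finished.
    You just have to run the import several times; it automatically bypasses the results of earlier runs. Each step of the import is wrapped in a transaction so nothing gets corrupted. I also found that Django signals cost quite a bit of processing time, and I optimized around the datastore's entity group and key mechanisms: fewer queries, fetching directly by key or key_name with get() instead.

    GAE's main weak points:
    1. The 1 MB limit on Python variables and HTTP POST size
    2. The limit on processing time
    3. The limits on handling lots of random access. The guy in the thread I linked gave up in the end: every page view was costing him a few pennies, and he was close to tears. It looks like GAE isn't suited to e-commerce platform applications...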

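  3. Possible solutions:
    1. An AJAX loop: split the work across multiple requests. memcache can serve as a temporary workspace (for non-critical data only; otherwise one memcache flush and it's gone).
    2. Same as above, plus query optimization. Random queries are expensive on GAE, so wherever possible turn queries into get-by-key lookups, organize your datastore layout well, and make use of GAE's entity group optimizations. For example, my structure is category - post - comments, so each category is one group (that's how it is organized for now; it may not be a sound choice).
    3. Be clear about what GAE can do and what it isn't suited for...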

  4. cnbohu Says:

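    My WXR export from WordPress is a bit over 600 KB, and the import always ends with "500: the server is tangled up..."
    and below that I get to see South Park.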

  5. evertobe Says:

    I recommend importing with the local script:
    cd into apps\import_wxp\
    and run import.py (import.py -h shows the usage).
    Here is an example:
    import.py -f c:/wordpress.xml -m evertobe@gmail.com -a inforsphere -s 6.latest.inforsphere.appspot.com

  6. Nic Says:

    Could you take a look at this error? When I run import.py -h it says some modules can't be found:

    C:\Program Files\Google\google_appengine\ihere\apps\import_wxp>import.py -h
    Traceback (most recent call last):
      File "C:\Program Files\Google\google_appengine\ihere\apps\import_wxp\import.py", line 26, in <module>
        init_env()
      File "C:\Program Files\Google\google_appengine\ihere\apps\import_wxp\import.py", line 10, in init_env
        from appenginepatch.appenginepatcher.patch import patch_all, setup_logging
      File "E:\Program Files\Google\google_appengine\ihere\common\appenginepatch\appenginepatcher\patch.py", line 7, in <module>
      File "C:/Program Files/Google/google_appengine\google\appengine\ext\db\__init__.py", line 88, in <module>
        from google.appengine.api import datastore
      File "C:/Program Files/Google/google_appengine\google\appengine\api\datastore.py", line 47, in <module>
        from google.appengine.datastore import datastore_index
      File "C:/Program Files/Google/google_appengine\google\appengine\datastore\datastore_index.py", line 53, in <module>
        from google.appengine.api import validation
      File "C:/Program Files/Google/google_appengine\google\appengine\api\validation.py", line 44, in <module>
        import yaml
    ImportError: No module named yaml

  7. evertobe Says:

    The yaml module needs to be installed; it ships with Google's SDK. On Windows, open cmd, cd to C:\Program Files\Google\google_appengine\lib\yaml, run python setup.py install, and that should do it.
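
The comments above describe importing one post at a time, skipping posts that an earlier run already stored, and looking entities up directly by key name instead of querying. A rough sketch of how those pieces could fit together follows. It is not the actual import.py: `iter_wxr_items`, `key_name_for` and `import_post` are hypothetical helpers, a plain dict stands in for the datastore, and deriving the key name from the post's link is my own assumption. The only thing taken from the WXR format itself is that posts are RSS `<item>` elements with `<title>` and `<link>` children.

```python
import hashlib
import xml.etree.ElementTree as ET


def iter_wxr_items(xml_text):
    """Yield one small dict per <item> element of a WXR (RSS-based) export."""
    root = ET.fromstring(xml_text)
    for item in root.iter("item"):
        yield {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        }


def key_name_for(post):
    """Derive a stable key name so the same post always maps to the same entity."""
    return "post-" + hashlib.sha1(post["link"].encode("utf-8")).hexdigest()


def import_post(store, post):
    """Store one post; return False if an earlier run already imported it.

    `store` is a dict standing in for the datastore. The membership test is a
    direct lookup by key name, which is the cheap access path, not a query.
    """
    name = key_name_for(post)
    if name in store:
        return False
    store[name] = post
    return True
```

A driver script would loop over `iter_wxr_items(...)` and send each post in its own request; because the key name is deterministic, rerunning the whole import after a timeout only writes the posts that are still missing.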
